8/12/2019 Source Coding Compression
Source Coding - Compression
Most topics are from Digital Communications by Simon Haykin,
Chapter 9, Sections 9.1-9.4
Fundamental Limits on Performance
Given an information source and a noisy channel, there are:
1) a limit on the minimum number of bits per symbol
2) a limit on the maximum rate for reliable communication
These are Shannon's theorems.
Information Theory
Let the source alphabet be
    S = {s_0, s_1, ..., s_{K-1}}
with probabilities of occurrence
    P(s_k) = p_k,  k = 0, 1, ..., K-1,  where  sum_{k=0}^{K-1} p_k = 1
Assume a discrete memoryless source (DMS).
What is the measure of information?
Uncertainty, Information, and Entropy (cont.)
Interrelations between information and uncertainty (surprise):
- No surprise, no information.
- If A is a surprise and B is another surprise, what is the total
  information of A and B occurring simultaneously?
The amount of information is related to the inverse of the probability
of occurrence:
    Info ~ 1 / Prob.
    Info(A and B) = Info(A) + Info(B)
    I(s_k) = log(1 / p_k)
Property of Information
1) I(s_k) = 0 for p_k = 1
2) I(s_k) >= 0 for 0 <= p_k <= 1
3) I(s_k) > I(s_i) for p_k < p_i
4) I(s_k s_i) = I(s_k) + I(s_i) if s_k and s_i are statistically independent
* Custom is to use logarithms of base 2.
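These properties are easy to check numerically. A minimal sketch in Python (the function name `self_information` is my own, not from the text):

```python
import math

def self_information(p):
    """Information content I(s_k) = log2(1/p_k), in bits."""
    return math.log2(1 / p)

# Property 1: a certain event (p = 1) carries no information.
print(self_information(1.0))         # 0.0
# Property 4: additivity for independent events, I(p*q) = I(p) + I(q).
print(self_information(0.25 * 0.5))  # 3.0  (= 2.0 + 1.0)
```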
Average Length
For a code C with associated probabilities p(c), the average length is
defined as
    l_a(C) = sum_{c in C} p(c) * l(c)
We say that a prefix code C is optimal if for all prefix codes C',
l_a(C) <= l_a(C').
Relationship to Entropy
Theorem (lower bound): For any probability distribution p(S) with
associated uniquely decodable code C,
    H(S) <= l_a(C)
Theorem (upper bound): For any probability distribution p(S) with
associated optimal prefix code C,
    l_a(C) <= H(S) + 1
Coding Efficiency
    n = Lmin / La
where La is the average code-word length.
From Shannon's theorem, La >= H(S).
Thus Lmin = H(S), and
    n = H(S) / La
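The bounds and the efficiency ratio can be computed directly. A small sketch (helper names are my own); for a dyadic source, whose probabilities are powers of 1/2, a code with lengths log2(1/p_k) achieves efficiency 1:

```python
import math

def entropy(probs):
    """H(S) = sum of p_k * log2(1/p_k), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs)

def average_length(probs, lengths):
    """La = sum of p_k * l_k over the code-word lengths l_k."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]          # code-word lengths of an optimal prefix code
H = entropy(probs)
La = average_length(probs, lengths)
print(H, La, H / La)            # 1.75 1.75 1.0
```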
Kraft-McMillan Inequality
Theorem (Kraft-McMillan): For any uniquely decodable code C,
    sum_{c in C} 2^(-l(c)) <= 1
Also, for any set of lengths L such that
    sum_{l in L} 2^(-l) <= 1
there is a prefix code C such that
    l(c_i) = l_i,  i = 1, ..., |L|
NOTE: The Kraft-McMillan inequality does not tell us whether a given
code is prefix-free or not.
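The inequality is a one-line check on a set of lengths. A sketch (the function name is my own):

```python
def kraft_sum(lengths):
    """Sum of 2^(-l) over a set of code-word lengths."""
    return sum(2.0 ** -l for l in lengths)

# These lengths satisfy the inequality (with equality), so a prefix
# code with exactly these lengths exists.
print(kraft_sum([1, 2, 3, 3]))  # 1.0
# These violate it: no uniquely decodable code can have these lengths.
print(kraft_sum([1, 1, 2]))     # 1.25
```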
Prefix Codes
A prefix code is a variable-length code in which no codeword is a
prefix of another codeword.
e.g. a = 0, b = 110, c = 111, d = 10
Can be viewed as a binary tree with message values at the leaves and
0s or 1s on the edges.
[Figure: binary code tree with leaves a, b, c, d and 0/1 edge labels]
Some Prefix Codes for Integers

n   Binary   Unary    Split
1   ..001    0        1|
2   ..010    10       10|0
3   ..011    110      10|1
4   ..100    1110     110|00
5   ..101    11110    110|01
6   ..110    111110   110|10

Many other fixed prefix codes: Golomb, phased-binary, subexponential, ...
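The unary column of the table above follows a simple rule that can be sketched in a few lines (with the same 0-terminated convention the table uses, where I read each unary codeword as n-1 ones followed by a zero):

```python
def unary(n):
    """Unary prefix code for n >= 1: (n-1) ones, then a terminating zero."""
    return "1" * (n - 1) + "0"

for n in range(1, 7):
    print(n, unary(n))
# 1 0
# 2 10
# 3 110
# 4 1110
# 5 11110
# 6 111110
```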
Data Compression Methods
Data compression implies sending or storing a smaller number of bits.
Although many methods are used for this purpose, in general these
methods can be divided into two broad categories: lossless and lossy
methods.
Run Length Coding
Introduction: What is RLE?
A compression technique that represents data as (value, run length)
pairs. A run length is defined as the number of consecutive equal
values.
e.g. 1110011111  --RLE-->  (1,3) (0,2) (1,5)
     (values 1, 0, 1 with run lengths 3, 2, 5)
Introduction (cont.)
Compression effectiveness depends on the input: the input must contain
consecutive runs of equal values to maximize compression.
Best case: all values the same. Any length can be represented with
just two values.
Worst case: no repeating values. The compressed data is twice the
length of the original!
RLE should only be used in situations where we know for sure that the
data has repeating values.
Run-length encoding example
Run-length encoding for two symbols
Encoder Results
Input:  4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0
Output: 4,1,5,2,2,1,7,1,3,1,6,1,9,2,10,6,0,2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
Best case:
Input:  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Output: 0,16,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
Worst case:
Input:  0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Output: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1
(The -1 entries pad a fixed-size output buffer; valid output ends at
the first -1.)
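The first example above can be reproduced with a short sketch (without the -1 padding of the fixed-size output buffer, which I leave out here):

```python
def rle_encode(data):
    """Collapse runs of equal values into (value, run_length) pairs."""
    pairs = []
    for v in data:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1        # extend the current run
        else:
            pairs.append([v, 1])     # start a new run
    return [tuple(p) for p in pairs]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back to the original sequence."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out

data = [4, 5, 5, 2, 7, 3, 6, 9, 9, 10, 10, 10, 10, 10, 10, 0, 0]
print(rle_encode(data))
# [(4, 1), (5, 2), (2, 1), (7, 1), (3, 1), (6, 1), (9, 2), (10, 6), (0, 2)]
```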
Huffman Coding
Huffman Codes
Huffman Algorithm:
Start with a forest of trees, each consisting of a single vertex
corresponding to a message s and with weight p(s).
Repeat:
- Select the two trees with minimum-weight roots p1 and p2.
- Join them into a single tree by adding a root with weight p1 + p2.
Example
p(a) = .1, p(b) = .2, p(c) = .2, p(d) = .5

Start:  a(.1)  b(.2)  c(.2)  d(.5)
Step 1: merge a(.1) and b(.2) into a tree of weight (.3)
Step 2: merge (.3) and c(.2) into a tree of weight (.5)
Step 3: merge (.5) and d(.5) into the final tree of weight (1.0)

Resulting codes: a = 000, b = 001, c = 01, d = 1
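The merge loop can be sketched with a priority queue. This is my own minimal sketch, not the book's pseudocode; the exact 0/1 labelling (and hence the bit patterns) depends on tie-breaking, so only the code lengths are guaranteed to match the example above:

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bits}."""
    # Each heap entry: (weight, tiebreak counter, partial code table).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two minimum-weight roots
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))  # new root, weight p1 + p2
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.1, "b": 0.2, "c": 0.2, "d": 0.5})
print({s: len(c) for s, c in sorted(codes.items())})
# {'a': 3, 'b': 3, 'c': 2, 'd': 1}  -> average length 1.8 bits/symbol
```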
Encoding and Decoding
Encoding: start at the leaf of the Huffman tree and follow the path to
the root. Reverse the order of the bits and send.
Decoding: start at the root of the Huffman tree and take the branch
for each bit received. On reaching a leaf, output the message and
return to the root.
[Figure: the Huffman tree of the previous example]
There are even faster methods that can process 8 or 32 bits at a time.
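For a prefix code, the root-to-leaf walk can equivalently be done by matching accumulated bits against a codeword table. A sketch using the codes a=000, b=001, c=01, d=1:

```python
def huffman_decode(bits, codes):
    """Consume bits left to right; emit a symbol whenever the accumulated
    bits match a codeword (unambiguous because the code is prefix-free)."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:       # reached a leaf
            out.append(inverse[current])
            current = ""             # return to the root
    return "".join(out)

codes = {"a": "000", "b": "001", "c": "01", "d": "1"}
print(huffman_decode("000011", codes))  # acd
```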
Huffman Codes: Pros & Cons
Pros:
- The Huffman algorithm generates an optimal prefix code.
Cons:
- If the ensemble changes, the frequencies and probabilities change,
  and the optimal coding changes. E.g., in text compression, symbol
  frequencies vary with context.
- Re-computing the Huffman code requires running through the entire
  file in advance?!
- The code itself must be saved or transmitted too?!
Lempel-Ziv Algorithms
LZ77 (Sliding Window)
- Variants: LZSS (Lempel-Ziv-Storer-Szymanski)
- Applications: gzip, Squeeze, LHA, PKZIP, ZOO
LZ78 (Dictionary Based)
- Variants: LZW (Lempel-Ziv-Welch), LZC (Lempel-Ziv-Compress)
- Applications: compress, GIF, CCITT (modems), ARC, PAK
Traditionally LZ77 was better but slower, but the gzip version is
almost as fast as any LZ78 variant.
Lempel-Ziv Encoding
Lempel-Ziv (LZ) encoding is an example of a category of algorithms
called dictionary-based encoding. The idea is to create a dictionary
(a table) of strings used during the communication session. If both
the sender and the receiver have a copy of the dictionary, then
previously encountered strings can be substituted by their index in
the dictionary to reduce the amount of information transmitted.
Compression
In this phase there are two concurrent events: building an indexed
dictionary and compressing a string of symbols. The algorithm extracts
the smallest substring that cannot be found in the dictionary from the
remaining uncompressed string. It then stores a copy of this substring
in the dictionary as a new entry and assigns it an index value.
Compression occurs when the substring, except for the last character,
is replaced with the index found in the dictionary. The process then
inserts the index and the last character of the substring into the
compressed string.
An example of Lempel-Ziv encoding
Decompression
Decompression is the inverse of the compression process. The process
extracts the substrings from the compressed string and replaces each
index with the corresponding entry in the dictionary, which is empty
at first and built up gradually. The idea is that when an index is
received, there is already an entry in the dictionary corresponding to
that index.
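A sketch of this inverse process, assuming the same (index, character) token format as in the compression phase (my own convention, with index 0 meaning "empty prefix"):

```python
def lz78_decompress(tokens):
    """Rebuild the string: each (index, char) token expands to the
    dictionary entry at `index` (empty string for 0) plus `char`."""
    entries = []      # entry i lives at entries[i - 1]
    out = []
    for index, ch in tokens:
        substring = (entries[index - 1] if index > 0 else "") + ch
        entries.append(substring)  # the dictionary grows exactly as in compression
        out.append(substring)
    return "".join(out)

# Tokens produced by the compression phase for the string "ABBCBCABA":
print(lz78_decompress([(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A")]))
# ABBCBCABA
```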
An example of Lempel-Ziv decoding