Multimedia Systems
Entropy Coding
Course Presentation
Mahdi Amiri
March 2014
Sharif University of Technology
Source and Channel Coding: Shannon's Separation Principle
Assumptions:
Single source and user
Unlimited complexity and delay
Claude E. Shannon, 1916-2001
Information source: generates the information we want to transmit or store.
Source coding: reduces the number of bits needed to store or transmit the relevant information.
Channel coding: increases the number of bits, or changes them, to protect against channel errors.
What about joint source and channel coding?
Coding related elements in a communication system.
Source Coding: Motivation
Data storage and transmission cost money.
Use the fewest bits possible to represent the information source.
Pro:
Less memory, less transmission time.
Cons:
Extra processing required.
Distortion (if using lossy compression).
Data has to be decompressed before it can be presented, which may cause delay.
Source Coding: Principles
Example: the source coder shall represent the video signal by the minimum number of (binary) symbols without exceeding an acceptable level of distortion.
Two principles are utilized:
1. Properties of the information source that are known a priori result in redundant information that need not be transmitted ("redundancy reduction").
2. The human observer does not perceive certain deviations of the received signal from the original ("irrelevancy reduction").
Approaches:
Lossless coding: completely reversible; exploits principle 1 only.
Lossy coding: not reversible; exploits principles 1 and 2.
Data Compression: Lossless and Lossy
Lossless
Exact reconstruction is possible.
Applied to general data.
Lower compression rates.
Examples: Run-length, Huffman, Lempel-Ziv.
Lossy
Higher compression rates.
Applied to audio, image and video.
Examples: CELP, JPEG, MPEG-2.
Data Compression: Codec (Encoder and Decoder)
General structure of a codec:
Encoder: Original signal → T (transform, prediction) → Q (quantization) → E (entropy encoder) → Compressed bit-stream
Decoder: Compressed bit-stream → E⁻¹ (entropy decoder) → Q⁻¹ (dequantization) → T⁻¹ (inverse transform) → Reconstructed signal
Entropy Coding: Selected Topics and Algorithms
Run-length encoding
Fixed Length Coding (FLC)
Variable Length Coding (VLC)
Huffman Coding Algorithm
Entropy, Definition
Lempel-Ziv (LZ77)
Lempel-Ziv-Welch (LZW)
Arithmetic Coding
Lossless Compression: Run-Length Encoding (RLE)
BBBBHHDDXXXXKKKKWWZZZZ → 4B2H2D4X4K2W4Z
Image of a rectangle, one (value, run-length) pair list per row:
0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40
RLE is used in fax machines. A minimal coder sketch follows below.
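A minimal run-length coder sketch in Python for the count-symbol format shown above (illustrative only, not the fax-specific RLE):

```python
def rle_encode(data: str) -> str:
    """Collapse each run of identical symbols into <count><symbol>."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append(f"{j - i}{data[i]}")
        i = j
    return "".join(out)


def rle_decode(coded: str) -> str:
    """Expand each <count><symbol> pair back into a run."""
    out, count = [], ""
    for ch in coded:
        if ch.isdigit():
            count += ch          # accumulate multi-digit counts
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


assert rle_encode("BBBBHHDDXXXXKKKKWWZZZZ") == "4B2H2D4X4K2W4Z"
assert rle_decode("4B2H2D4X4K2W4Z") == "BBBBHHDDXXXXKKKKWWZZZZ"
```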
Lossless Compression: Fixed Length Coding (FLC)
A simple example
The message to code: ►♣♣♠☻►♣☼►☻
5 different symbols → at least 3 bits per symbol
Message length: 10 symbols
Codeword table: each symbol is assigned a distinct 3-bit codeword.
Total bits required to code: 10*3 = 30 bits
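A quick sanity check of this count, as a sketch:

```python
from math import ceil, log2

n_symbols, msg_len = 5, 10
bits_per_symbol = ceil(log2(n_symbols))   # 5 symbols -> 3 bits each
print(bits_per_symbol * msg_len)          # 30 bits for the 10-symbol message
```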
Lossless Compression: Variable Length Coding (VLC)
Intuition: symbols that are more frequent should get shorter codes; but since codeword lengths differ, there must be a way of telling where each codeword ends.
The message to code: ►♣♣♠☻►♣☼►☻
Codeword table: to identify the end of a codeword as soon as it arrives, no codeword may be a prefix of another codeword (a prefix-free code).
Total bits required to code: 3*2 + 3*2 + 2*2 + 1*3 + 1*3 = 22 bits (versus 30 bits with FLC)
How do we find the optimal codeword table? (Huffman's algorithm, next.) A decoding sketch for a prefix code follows below.
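To see why the prefix property gives instant decodability, here is a decoding sketch; the codeword table is hypothetical (the slide's actual table is not reproduced) and the five symbols are renamed A-E:

```python
# Hypothetical prefix code for five symbols: no codeword is a prefix of another.
CODES = {"00": "A", "01": "B", "10": "C", "110": "D", "111": "E"}

def vlc_decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODES:         # the end of a codeword is recognized immediately
            out.append(CODES[buf])
            buf = ""
    assert buf == "", "bit-stream ended in the middle of a codeword"
    return "".join(out)

print(vlc_decode("0001110110"))  # -> ABDD
```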
Lossless Compression: VLC, Example Application
Morse code is a non-prefix code; it needs a separator symbol (a pause) for unique decodability.
Lossless Compression: Huffman Coding Algorithm
Step 1: Take the two least probable symbols in the alphabet (they get the longest codewords, of equal length, differing only in the last digit).
Step 2: Combine these two symbols into a single symbol, and repeat.
P(n): probability of symbol number n.
Here there are 9 symbols; e.g., the symbols can be the letters 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'. A construction sketch follows below.
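A compact construction sketch; the probability table below is an assumed example, not the slide's 9-symbol distribution:

```python
import heapq
from itertools import count

def huffman_codes(probs: dict) -> dict:
    """Build a prefix code by repeatedly merging the two least probable subtrees."""
    tiebreak = count()                     # gives the heap a total order on ties
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)  # the two least probable subtrees
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_codes({"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}))
# -> {'b': '00', 'c': '01', 'd': '100', 'e': '101', 'a': '11'}
```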
Lossless Compression: Huffman Coding Algorithm
Paper: "A Method for the Construction of
Minimum-Redundancy Codes“, 1952
Results in "prefix-free codes“
Most efficient: no other mapping will produce a smaller average output size, provided the actual symbol frequencies agree with those used to create the code.
David A. Huffman, 1925-1999
Cons:
The entire data has to be scanned in advance to find the symbol frequencies.
'Minimum redundancy' is not favorable for error correction techniques (bits are not predictable if, e.g., one is missing).
Does not support blocks of symbols: Huffman is designed to code single characters only, so at least one bit is required per character; e.g., a word of 8 characters requires at least an 8-bit code.
Entropy Coding: Entropy, Definition
The entropy, H, of a discrete random variable X is a measure of the
amount of uncertainty associated with the value of X.
Measure of information content (in bits)
A quantitative measure of the disorder of a system
Information theory point of view:
H(X) = Σ_{x ∈ X} P(x) · log₂(1 / P(x))
X → the information source
P(x) → the probability that symbol x in X will occur
It is impossible to compress data such that the average number of bits per symbol is less than the Shannon entropy of the source (over a noiseless channel).
The intuition behind the formula:
P(x) ↑ ⇒ amount of uncertainty ↓ ⇒ H ∼ 1 / P(x)
Bringing it to the world of bits: H ∼ log₂(1 / P(x)) = I(x), the information content of x
Weighting each term by P(x) and summing ⇒ the average number of bits required to encode each possible value
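A direct transcription of the formula as a sketch, evaluated on the SQUEEZE distribution used later in these slides:

```python
from math import log2

def entropy(probs) -> float:
    """H = sum of P(x) * log2(1 / P(x)) over symbols with nonzero probability."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# P('E') = 3/7 and 1/7 for each of 'S', 'Q', 'U', 'Z' (the SQUEEZE example):
print(entropy([3/7, 1/7, 1/7, 1/7, 1/7]))  # about 2.13 bits/symbol, below 3-bit FLC
```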
Lossless Compression: Lempel-Ziv (LZ77)
Algorithm for compression of character sequences.
Assumption: sequences of characters are repeated.
Idea: replace a character sequence by a reference to an earlier occurrence.
1. Define a search buffer = (a portion of) the recently encoded data, and a look-ahead buffer = the not-yet-encoded data.
2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.
3. Produce the output <offset, length, next_character>:
offset + length = reference to the earlier occurrence
next_character = the first character following the match in the look-ahead buffer
A tokenizer sketch follows below.
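A brute-force sketch of the three steps above; the window sizes are arbitrary choices, and real coders bound and bit-pack the three fields:

```python
def lz77_encode(data: str, search_size: int = 255, lookahead: int = 15):
    """Emit <offset, length, next_character> triples, longest match first."""
    tokens, pos = [], 0
    while pos < len(data):
        start = max(0, pos - search_size)
        best_off, best_len = 0, 0
        for off in range(1, pos - start + 1):        # candidate match distances
            length = 0
            while (length < lookahead and pos + length < len(data) - 1
                   and data[pos - off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        tokens.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1
    return tokens

print(lz77_encode("abracadabra"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (2, 1, 'd'), (7, 3, 'a')]
```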
Lossless Compression: Lempel-Ziv-Welch (LZW)
Drops the search buffer and keeps an explicit dictionary.
Produces only the output <index>.
Used by Unix "compress", GIF, V.42bis, TIFF.
Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo (␣ denotes the blank symbol; initial dictionary, inferred from the output: ␣=1, a=2, b=3, o=4, w=5)
Dictionary progress at the 12th entry.
Encoder output sequence so far: 5 2 3 3 2 1
Lossless Compression: Lempel-Ziv-Welch (LZW)
Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo
Dictionary progress at the end of the above example.
Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
An encoder sketch that reproduces this sequence follows below.
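An encoder sketch; with the blank written as an ordinary space and the inferred initial dictionary, it reproduces the slide's output sequence:

```python
def lzw_encode(data: str, alphabet) -> list:
    """Emit the index of the longest known string, then extend the dictionary."""
    dictionary = {s: i + 1 for i, s in enumerate(alphabet)}  # 1-based, as on the slide
    out, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch                        # keep extending the match
        else:
            out.append(dictionary[current])      # emit the longest match
            dictionary[current + ch] = len(dictionary) + 1
            current = ch
    out.append(dictionary[current])              # flush the final match
    return out

print(lzw_encode("wabba wabba wabba wabba woo woo woo", [" ", "a", "b", "o", "w"]))
# [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
```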
Lossless Compression: Arithmetic Coding
Encodes a block of symbols into a single number, a fraction n where 0.0 ≤ n < 1.0.
Step 1: Divide the interval [0,1) into sub-intervals based on the probabilities of the symbols in the current context → the dividing model.
Step 2: Divide the interval corresponding to the current symbol into sub-intervals based on the dividing model of Step 1.
Step 3: Repeat Step 2 for all symbols in the block of symbols.
Step 4: Encode the block of symbols with a single number in the
final resulting range. Use the corresponding binary number in this
range with the smallest number of bits.
See the encoding and decoding examples in the following slides
Lossless Compression: Arithmetic Coding, Encoding
Example: SQUEEZE
Using FLC: 3 bits per symbol → 7*3 = 21 bits
P('E') = 3/7; P('S') = P('Q') = P('U') = P('Z') = 1/7
We can encode the word SQUEEZE with a
single number in [0.64769-0.64772) range.
The binary number in this range with the
smallest number of bits is 0.101001011101,
which corresponds to 0.647705 decimal. The '0.'
prefix does not have to be transmitted because
every arithmetic coded message starts with this
prefix. So we only need to transmit the sequence
101001011101, which is only 12 bits.
Dividing model: sub-intervals of [0,1) assigned according to the probabilities above. An interval-narrowing sketch follows below.
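A minimal interval-narrowing sketch using exact fractions. The ordering of symbols inside the dividing model (E, Q, S, U, Z) is inferred; with it, the final interval matches the slide's [0.64769, 0.64772):

```python
from fractions import Fraction as F

# Dividing model: sub-intervals of [0, 1), ordered E, Q, S, U, Z (inferred).
MODEL = {"E": (F(0, 7), F(3, 7)), "Q": (F(3, 7), F(4, 7)), "S": (F(4, 7), F(5, 7)),
         "U": (F(5, 7), F(6, 7)), "Z": (F(6, 7), F(7, 7))}

def arith_interval(message: str):
    low, high = F(0), F(1)
    for sym in message:
        lo, hi = MODEL[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi  # zoom into the symbol's slot
    return low, high   # any number in [low, high) encodes the whole message

low, high = arith_interval("SQUEEZE")
print(float(low), float(high))   # ~0.64769 and ~0.64772
```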
Lossless Compression: Arithmetic Coding, Decoding
Input Probabilities: P(‘A’)=60%, P(‘B’)=20%, P(‘C’)=10%, P(‘<space>’)=10%
Decoding the input value of 0.538
Dividing model from the input probabilities → 60% | 20% | 10% | 10%
The fraction 0.538 falls into the sub-interval [0, 0.6) → the first decoded symbol is 'A'.
The sub-region containing the point is successively subdivided in the same way as the dividing model.
Since 0.538 is within the interval [0.48, 0.54), the second symbol of the message must have been 'C'.
Since 0.538 falls within the interval [0.534, 0.54), the third symbol of the message must have been '<space>'.
The protocol in this example designates <space> as the termination symbol, so this is the end of the decoding process. A decoder sketch follows below.
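A decoder sketch for this example, following the slide's model and <space> termination rule:

```python
MODEL = [("A", 0.0, 0.6), ("B", 0.6, 0.8), ("C", 0.8, 0.9), ("<space>", 0.9, 1.0)]

def arith_decode(value: float, stop: str = "<space>"):
    out = []
    while True:
        for sym, lo, hi in MODEL:
            if lo <= value < hi:                   # which sub-interval holds the point?
                out.append(sym)
                value = (value - lo) / (hi - lo)   # rescale that sub-interval to [0, 1)
                break
        if out[-1] == stop:                        # <space> terminates the message
            return out

print(arith_decode(0.538))   # ['A', 'C', '<space>']
```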
Lossless Compression: Arithmetic Coding
Pros
Typically has a better compression ratio than
Huffman coding.
Cons
High computational complexity.
The patent situation had a crucial influence on decisions about implementing arithmetic coding (many of those patents have now expired).
Thank You
Multimedia Systems
Entropy Coding
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/
Next Session: Color Space