Lec05, Entropy Coding, v1.04.pptce.sharif.edu/courses/92-93/2/ce342-1/resources... ·...

Multimedia Systems, Entropy Coding, Course Presentation, Mahdi Amiri, March 2014, Sharif University of Technology


Multimedia Systems

Entropy Coding

Course Presentation

Mahdi Amiri

March 2014

Sharif University of Technology


Source and Channel Coding: Shannon's Separation Principle

Assumptions:

Single source and user

Unlimited complexity and delay

Information Source: generates the information we want to transmit or store.

Source Coding: reduces the number of bits needed to store or transmit the relevant information.

Channel Coding: increases the number of bits, or changes them, to protect against channel errors.

What about joint source and channel coding?

Figure: Coding-related elements in a communication system.

Claude E. Shannon, 1916-2001


Source Coding: Motivation

Data storage and transmission cost money.

Use the fewest possible bits to represent the information source.

Pros:

Less memory, less transmission time.

Cons:

Extra processing required.

Distortion (if using lossy compression).

Data has to be decompressed before it can be presented, which may cause delay.


Source Coding: Principles

Example

The source coder shall represent the video signal with the minimum number of (binary) symbols without exceeding an acceptable level of distortion.

Two principles are utilized:

1. Properties of the information source that are known a priori result in redundant information that need not be transmitted ("redundancy reduction").

2. The human observer does not perceive certain deviations of the received signal from the original ("irrelevancy reduction").

Approaches:

Lossless coding: completely reversible; exploits principle 1 only.

Lossy coding: not reversible; exploits principles 1 and 2.


Data Compression: Lossless and Lossy

Lossless

Exact reconstruction is possible.

Applied to general data.

Lower compression ratios.

Examples: Run-length, Huffman, Lempel-Ziv.

Lossy

Higher compression ratios.

Applied to audio, image and video.

Examples: CELP, JPEG, MPEG-2.


Data Compression: Codec (Encoder and Decoder)

Encoder: original signal → T (transform, prediction) → Q (quantization) → E (entropy encoder) → compressed bit-stream

Decoder: compressed bit-stream → E⁻¹ (entropy decoder) → Q⁻¹ (dequantization) → T⁻¹ (inverse transform) → reconstructed signal

Figure: General structure of a codec.


Entropy Coding: Selected Topics and Algorithms

Run-length encoding

Fixed Length Coding (FLC)

Variable Length Coding (VLC)

Huffman Coding Algorithm

Entropy, Definition

Lempel-Ziv (LZ77)

Lempel-Ziv-Welch (LZW)

Arithmetic Coding


Lossless Compression: Run-Length Encoding (RLE)

BBBBHHDDXXXXKKKKWWZZZZ → 4B2H2D4X4K2W4Z

Image of a rectangle (each image row written as value,run-length pairs):

0,40

0,40

0,10 1,20 0,10

0,10 1,1 0,18 1,1 0,10

0,10 1,1 0,18 1,1 0,10

0,10 1,1 0,18 1,1 0,10

0,10 1,1 0,18 1,1 0,10

0,10 1,20 0,10

0,40

RLE is used in fax machines.
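The run-length idea above can be sketched in a few lines of Python. This is only an illustration: the function names `rle_encode` and `rle_row` are made up here, and the count-then-symbol text format simply mirrors the string example on this slide.

```python
from itertools import groupby


def rle_encode(text):
    """Encode a string as count-then-symbol runs, e.g. 'BBBB' -> '4B'."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


def rle_row(pixels):
    """Encode one image row as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


encoded = rle_encode("BBBBHHDDXXXXKKKKWWZZZZ")

# One 40-pixel row of the rectangle image: 10 background, 20 set, 10 background.
row = rle_row([0] * 10 + [1] * 20 + [0] * 10)
```

Note that the text format is ambiguous if the symbols themselves can be digits; a real format would need fixed-width counts or an escape convention.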


Lossless Compression: Fixed Length Coding (FLC)

A simple example

The message to code: ►♣♣♠☻►♣☼►☻

5 different symbols → at least 3 bits per symbol

Message length: 10 symbols

Total bits required to code: 10*3 = 30 bits

Codeword table
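As a rough check of the numbers on this slide, the sketch below builds a fixed-length table and counts the bits. The codeword assignment (symbols numbered in sorted order) is an assumption made for illustration; the slide's own codeword table is not reproduced in this transcript.

```python
from math import ceil, log2


def flc_table(message):
    """Assign every distinct symbol a codeword of the same length."""
    alphabet = sorted(set(message))
    width = max(1, ceil(log2(len(alphabet))))  # bits per symbol
    # Hypothetical assignment: number the symbols in sorted order.
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(alphabet)}


message = "►♣♣♠☻►♣☼►☻"
table = flc_table(message)
bitstream = "".join(table[sym] for sym in message)
total_bits = len(bitstream)
```

With 5 distinct symbols, 2 bits would only cover 4 codewords, so 3 bits per symbol is the minimum for a fixed-length code.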


Lossless Compression: Variable Length Coding (VLC)

Intuition: Symbols that occur more frequently should get shorter codewords. Since the codewords then differ in length, there must be a way to tell where each codeword ends.

The message to code: ►♣♣♠☻►♣☼►☻

Total bits required to code: 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits

Codeword table

To identify the end of a codeword as soon as it arrives, no codeword can be a prefix of another codeword.

How to find the optimal codeword table?
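The prefix condition can be illustrated with a short sketch. The table `codes` below is hypothetical (the slide's own codeword table is not reproduced in this transcript); it was chosen only so that frequent symbols get shorter codewords and no codeword is a prefix of another.

```python
def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    return not any(
        a != b and b.startswith(a) for a in words for b in words
    )


def vlc_encode(message, codes):
    return "".join(codes[sym] for sym in message)


def vlc_decode(bits, codes):
    """Emit a symbol as soon as the bits read so far form a codeword."""
    inverse = {cw: sym for sym, cw in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


# Hypothetical table, not the one from the slide.
codes = {"►": "00", "♣": "01", "☻": "10", "♠": "110", "☼": "111"}
message = "►♣♣♠☻►♣☼►☻"
decoded = vlc_decode(vlc_encode(message, codes), codes)
```

Because the table is prefix-free, the decoder never has to look ahead or backtrack: the first codeword match is always the right one.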


Lossless Compression: VLC, Example Application

Morse code

Not a prefix code

Needs a separator symbol for unique decodability


Lossless Compression: Huffman Coding Algorithm

Step 1: Take the two least probable symbols in the alphabet (they get the longest codewords, which have equal length and differ only in the last digit).

Step 2: Combine these two symbols into a single symbol, and repeat.

P(n): Probability of symbol number n

Here there are 9 symbols, e.g. the symbols can be the alphabet letters ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’.
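The two steps above can be sketched with a priority queue. This is a minimal illustration, not the slide's worked example: the four-symbol distribution `probs` is made up, since the probabilities of the 9-symbol figure are not available in this transcript, and the function name `huffman_code` is likewise only illustrative.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Return {symbol: bitstring} for a dict of symbol probabilities or counts."""
    if len(weights) == 1:
        return {sym: "0" for sym in weights}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 1: take the two least probable (possibly merged) symbols.
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        # Step 2: combine them; their codewords differ in one prepended bit.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical distribution chosen for illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codes = huffman_code(probs)
average_length = sum(probs[s] * len(codes[s]) for s in probs)
```

Different tie-breaking rules can produce different codeword tables, but all of them have the same average length.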


Lossless Compression: Huffman Coding Algorithm

David A. Huffman, 1925-1999

Paper: "A Method for the Construction of Minimum-Redundancy Codes", 1952

Results in "prefix-free codes"

Most efficient: no other mapping of individual symbols to codewords will produce a smaller average output size, if the actual symbol frequencies agree with those used to create the code.

Cons:

Have to run through the entire data in advance to find the frequencies.

‘Minimum-Redundancy’ is not favorable for error correction techniques (bits are not predictable if e.g. one is missing).

Does not support blocks of symbols: Huffman is designed to code single characters only. Therefore at least one bit is required per character, e.g. a word of 8 characters requires at least an 8-bit code.


Entropy Coding: Entropy, Definition

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

$$H(X) = \sum_{x \in X} P(x) \cdot \log_2 \frac{1}{P(x)}$$

X → Information source

P(x) → Probability that symbol x in X will occur

Information Theory Point of View

Measure of information content (in bits)

A quantitative measure of the disorder of a system

It is impossible to losslessly compress the data such that the average number of bits per symbol is less than the Shannon entropy of the source (in a noiseless channel).

The Intuition Behind the Formula

Amount of uncertainty: $P(x) \uparrow \;\Rightarrow\; H \downarrow \;\Rightarrow\; H \sim \frac{1}{P(x)}$

Bringing it to the world of bits: $\Rightarrow\; H \sim \log_2 \frac{1}{P(x)} = I(x)$, the information content of x

Weighted average number of bits required to encode each possible value: $\Rightarrow\; \times P(x)$ and $\sum$

Claude E. Shannon, 1916-2001
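The formula can be evaluated directly. The sketch below is illustrative only: the helper names are made up, and the symbol probabilities are estimated from relative frequencies in the 10-symbol example message used earlier in the lecture, which is an assumption about the source rather than something the slides state.

```python
from collections import Counter
from math import log2


def entropy(probabilities):
    """H = sum of P(x) * log2(1 / P(x)) over all symbols with P(x) > 0."""
    return sum(p * log2(1 / p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Entropy in bits per symbol, using relative frequencies as P(x)."""
    counts = Counter(message)
    return entropy(c / len(message) for c in counts.values())


uniform = entropy([0.25, 0.25, 0.25, 0.25])   # four equally likely symbols
skewed = entropy([0.5, 0.25, 0.125, 0.125])   # same alphabet, less uncertainty
h_message = empirical_entropy("►♣♣♠☻►♣☼►☻")
```

For the example message this gives roughly 2.17 bits per symbol, so under this frequency model no symbol-by-symbol lossless code can use fewer than about 21.7 bits for the 10 symbols.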


Lossless Compression: Lempel-Ziv (LZ77)

Algorithm for compression of character sequences

Assumption: Sequences of characters are repeated

Idea: Replace a character sequence by a reference to an earlier occurrence

1. Define a:

search buffer = (portion of) recently encoded data

look-ahead buffer = not yet encoded data

2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer

3. Produce the output <offset, length, next_character>

offset + length = reference to earlier occurrence

next_character = the first character following the match in the look-ahead buffer
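The three steps above can be sketched as a brute-force encoder with a matching decoder. Several details are assumptions made for illustration and are not specified on the slide: the search buffer size (`window=16`), counting the offset backwards from the current position, letting a match run on into the look-ahead buffer, and always reserving one character for next_character.

```python
def lz77_encode(data, window=16):
    """Return a list of (offset, length, next_character) triples."""
    out, i = [], 0
    while i < len(data):
        best_offset, best_length = 0, 0
        for j in range(max(0, i - window), i):      # candidate match starts
            length = 0
            # Stop one character early so next_character always exists.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = i - j, length
        out.append((best_offset, best_length, data[i + best_length]))
        i += best_length + 1
    return out


def lz77_decode(triples):
    buf = []
    for offset, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-offset])    # copy one by one so overlaps work
        buf.append(ch)
    return "".join(buf)


triples = lz77_encode("abcabcabc")
restored = lz77_decode(triples)
```

A practical encoder would replace the quadratic search with a hash table or similar index over the search buffer; the output format stays the same.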


Lossless Compression: Lempel-Ziv-Welch (LZW)

Drops the search buffer and keeps an explicit dictionary

Produces only output <index>

Used by Unix "compress", GIF, V.42bis, TIFF

Example: wabbapwabbapwabbapwabbapwoopwoopwoo

Progress clip at 12th entry

Encoder output sequence so far: 5 2 3 3 2 1


Lossless Compression: Lempel-Ziv-Welch (LZW)

Example: wabbapwabbapwabbapwabbapwoopwoopwoo

Progress clip at the end of the above example

Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
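The example can be replayed with a short encoder sketch. The initial dictionary is an assumption: the dictionary figure is not reproduced in this transcript, so the indices p=1, a=2, b=3, o=4, w=5 are inferred from the first output codes (5 2 3 3 2 1 for "wabbap").

```python
def lzw_encode(text, alphabet):
    """Return the list of dictionary indices emitted by an LZW encoder."""
    # Initial dictionary: single symbols, numbered from 1 in the given order.
    dictionary = {sym: i for i, sym in enumerate(alphabet, start=1)}
    output, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                       # keep extending the match
        else:
            output.append(dictionary[current])  # emit the longest known string
            dictionary[current + ch] = len(dictionary) + 1  # learn a new entry
            current = ch
    if current:
        output.append(dictionary[current])
    return output


# Assumed initial dictionary order: p=1, a=2, b=3, o=4, w=5.
codes = lzw_encode("wabbapwabbapwabbapwabbapwoopwoopwoo", "pabow")
```

The decoder can rebuild the same dictionary from the index stream alone, which is why only indices need to be transmitted.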


Lossless Compression: Arithmetic Coding

Encodes the block of symbols into a single number, a fraction n where 0.0 ≤ n < 1.0.

Step 1: Divide the interval [0,1) into subintervals based on the probability of the symbols in the current context → Dividing model.

Step 2: Divide the interval corresponding to the current symbol into subintervals based on the dividing model of step 1.

Step 3: Repeat Step 2 for all symbols in the block of symbols.

Step 4: Encode the block of symbols with a single number in the final resulting range. Use the corresponding binary number in this range with the smallest number of bits.

See the encoding and decoding examples in the following slides


Lossless Compression: Arithmetic Coding, Encoding

Example: SQUEEZE

Using FLC: 3 bits per symbol → 7*3 = 21 bits

P(‘E’) = 3/7; P(‘S’) = P(‘Q’) = P(‘U’) = P(‘Z’) = 1/7

Dividing Model

We can encode the word SQUEEZE with a single number in the range [0.64769, 0.64772). The binary number in this range with the smallest number of bits is 0.101001011101, which corresponds to 0.647705 decimal. The '0.' prefix does not have to be transmitted because every arithmetic coded message starts with this prefix. So we only need to transmit the sequence 101001011101, which is only 12 bits.
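The encoding example can be reproduced with exact fractions. One detail is an assumption: the dividing-model figure is not available in this transcript, so the subintervals are laid out in alphabetical order (E, Q, S, U, Z), an ordering chosen here because it lands in the range quoted above. The function names are illustrative.

```python
from fractions import Fraction
from math import ceil


def build_model(probs):
    """Map each symbol to its [low, high) slice of [0, 1), in dict order."""
    model, low = {}, Fraction(0)
    for sym, p in probs.items():
        model[sym] = (low, low + p)
        low += p
    return model


def encode_interval(message, model):
    """Narrow [0, 1) once per symbol; return the final [low, high)."""
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = model[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width


def shortest_binary(low, high):
    """Shortest bit string b such that 0.b (binary) lies in [low, high)."""
    k = 1
    while True:
        n = ceil(low * 2**k)            # smallest k-bit fraction >= low
        if Fraction(n, 2**k) < high:
            return format(n, f"0{k}b")
        k += 1


# Assumed symbol order: E, Q, S, U, Z.
probs = {"E": Fraction(3, 7), "Q": Fraction(1, 7), "S": Fraction(1, 7),
         "U": Fraction(1, 7), "Z": Fraction(1, 7)}
low, high = encode_interval("SQUEEZE", build_model(probs))
bits = shortest_binary(low, high)
```

Exact fractions keep the sketch simple; practical coders use fixed-precision integer arithmetic with renormalization instead, since the interval shrinks with every symbol.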


Lossless Compression: Arithmetic Coding, Decoding

Input probabilities: P(‘A’)=60%, P(‘B’)=20%, P(‘C’)=10%, P(‘<space>’)=10%

Decoding the input value of 0.538

Dividing model from the input probabilities → A: [0, 0.6), B: [0.6, 0.8), C: [0.8, 0.9), <space>: [0.9, 1.0)

The fraction 0.538 (the circular point in the figure) falls into the sub-interval [0, 0.6) → the first decoded symbol is 'A'.

The subregion containing the point is successively subdivided in the same way as the dividing model.

Since 0.538 is within the interval [0.48, 0.54), the second symbol of the message must have been 'C'.

Since 0.538 falls within the interval [0.534, 0.54), the third symbol of the message must have been '<space>'.

The internal protocol in this example indicates <space> as the termination symbol, so we consider this the end of the decoding process.
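The decoding walk-through can be sketched as a loop that keeps subdividing the interval containing the value. The symbol order A, B, C, <space> follows the intervals used in this example; the function name and the use of exact fractions are choices made here for illustration.

```python
from fractions import Fraction


def arithmetic_decode(value, probs, terminator):
    """Decode symbols until the terminator symbol is produced."""
    low, width = Fraction(0), Fraction(1)
    decoded = []
    while True:
        cumulative = Fraction(0)
        for sym, p in probs.items():
            s_low = low + width * cumulative
            s_high = s_low + width * p
            if s_low <= value < s_high:
                decoded.append(sym)
                low, width = s_low, width * p   # zoom into this subinterval
                break
            cumulative += p
        else:
            raise ValueError("value is outside the current interval")
        if decoded[-1] == terminator:
            return "".join(decoded)


probs = {"A": Fraction(6, 10), "B": Fraction(2, 10),
         "C": Fraction(1, 10), " ": Fraction(1, 10)}
message = arithmetic_decode(Fraction("0.538"), probs, terminator=" ")
```

Without a termination symbol (or a transmitted message length) the decoder could keep subdividing forever, which is why the example reserves <space> for that role.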


Lossless Compression: Arithmetic Coding

Pros

Typically has a better compression ratio than Huffman coding.

Cons

High computational complexity.

The patent situation had a crucial influence on decisions about implementing arithmetic coding (many of the patents have now expired).


Thank You

FIND OUT MORE AT...

1. http://ce.sharif.edu/~m_amiri/

2. http://www.dml.ir/

Next Session: Color Space