Transcript of Willems, ISIT 2013
LOSSLESS SOURCE CODING ALGORITHMS
Frans M.J. Willems

INTRODUCTION
HUFFMAN-TUNSTALL: Binary IID Sources; Huffman Code; Tunstall Code
ENUMERATIVE CODING: Lexicographical Ordering; FV: Pascal-∆ Method; VF: Petry Code
ARITHMETIC CODING: Intervals; Universal Coding, Individual Redundancy
CONTEXT-TREE WEIGHTING: IID, unknown θ; Tree Sources; Context Trees; Coding Probabilities, Redundancy
REPETITION TIMES: LZ77; Repetition Times, Kac; Repetition-Time Algorithm; Achieving Entropy
CONCLUSION
Lossless Source Coding Algorithms
Frans M.J. Willems
Department of Electrical Engineering, Eindhoven University of Technology
ISIT 2013, Istanbul, Turkey
Choosing a Topic

POSSIBLE TOPICS:
Multi-user Information Theory (with Edward van der Meulen (KUL), Andries Hekstra)
Lossless Source Coding (with Tjalling Tjalkens, Yuri Shtarkov (IPPI), Paul Volf)
Watermarking, Embedding, and Semantic Coding (with Martin van Dijk, Ton Kalker (Philips Research))
Biometrics (with Tanya Ignatenko)

LOSSLESS SOURCE CODING ALGORITHMS: WHY?
Not many sessions at ISIT 2012! Is lossless source coding DEAD?
Lossless Source Coding is about UNDERSTANDING data. Universal Lossless Source Coding focuses on FINDING STRUCTURE in data. MDL principle [Rissanen].
ALGORITHMS are fun (Piet Schalkwijk).
Lecture Structure

TUTORIAL: binary case, my favorite algorithms, ...
REMARKS: open problems, ...
Outline

1 INTRODUCTION
2 HUFFMAN and TUNSTALL: Binary IID Sources; Huffman Code; Tunstall Code
3 ENUMERATIVE CODING: Lexicographical Ordering; FV: Pascal-∆ Method; VF: Petry Code
4 ARITHMETIC CODING: Intervals; Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING: IID, unknown θ; Binary Tree-Sources; Context Trees; Coding Probabilities
6 REPETITION TIMES: LZ77; Repetition Times, Kac; Repetition-Time Algorithm; Achieving Entropy
7 CONCLUSION
Binary Sources, Sequences, IID

Binary Source: x_1 x_2 · · · x_N

The binary source produces a sequence x_1^N = x_1 x_2 · · · x_N with components in {0, 1}, with probability P(x_1^N).

Definition (Binary IID Source)
For an independent identically distributed (i.i.d.) source with parameter θ, 0 ≤ θ ≤ 1,
P(x_1^N) = ∏_{n=1}^{N} P(x_n), where P(1) = θ and P(0) = 1 − θ.

A sequence x_1^N containing N − w zeros and w ones has probability P(x_1^N) = (1 − θ)^{N−w} θ^w.

Entropy IID Source
The ENTROPY of this source is h(θ) ≜ (1 − θ) log2(1/(1 − θ)) + θ log2(1/θ) (bits).
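The entropy and sequence probabilities above are easy to check numerically. The following Python sketch (an illustration, not part of the original lecture) computes h(θ) and P(x_1^N) for the binary IID source:

```python
from math import log2

def h(theta: float) -> float:
    """Binary entropy h(theta) in bits; h(0) = h(1) = 0 by convention."""
    if theta in (0.0, 1.0):
        return 0.0
    return (1 - theta) * log2(1 / (1 - theta)) + theta * log2(1 / theta)

def seq_prob(theta: float, seq: str) -> float:
    """P(x_1^N) = (1 - theta)^(N - w) * theta^w for an IID source."""
    w = seq.count("1")
    return (1 - theta) ** (len(seq) - w) * theta ** w

print(round(h(0.3), 3))         # 0.881, the value used in the examples below
print(round(seq_prob(0.3, "010"), 3))  # 0.7 * 0.3 * 0.7 = 0.147
```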
Fixed-to-Variable (FV) Length Codes

IDEA: Give more probable sequences shorter codewords than less probable sequences.

Definition (FV-Length Code)
A FV-length code assigns to source sequence x_1^N a binary codeword c(x_1^N) of length L(x_1^N). The rate of a FV code is
R ≜ E[L(X_1^N)] / N (code-symbols/source-symbol).

GOAL: We would like to find decodable FV-length codes that MINIMIZE this rate.
Prefix Codes

Definition (Prefix code)
In a prefix code no codeword is the prefix of any other codeword.

We focus on prefix codes. Codewords in a prefix code can be regarded as leaves in a rooted tree. Prefix codes lead to instantaneous decodability.

Example

x_1^N   c(x_1^N)   L(x_1^N)
00      0          1
01      10         2
10      110        3
11      111        3

[Figure: code tree rooted at ∅ with leaves 0, 10, 110, 111.]
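Instantaneous decodability can be seen on the example code above: each codeword is recognized the moment its last bit arrives. A minimal decoding sketch (illustration only, not from the slides):

```python
# The prefix code from the example: source block -> codeword.
CODE = {"00": "0", "01": "10", "10": "110", "11": "111"}
DECODE = {c: s for s, c in CODE.items()}

def decode(bits: str) -> str:
    """Decode a concatenation of codewords, one bit at a time."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:          # a full codeword has just been seen
            out.append(DECODE[buf])
            buf = ""
    assert buf == "", "stream ended in the middle of a codeword"
    return "".join(out)

encoded = "".join(CODE[s] for s in ["00", "11", "01"])  # "011110"
print(decode(encoded))  # "001101"
```

No lookahead is ever needed: because no codeword prefixes another, the buffer matches at most one codeword boundary.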
Prefix-Codes (cont.)

Theorem (Kraft, 1949)
(a) The lengths of the codewords in a prefix code satisfy Kraft's inequality
∑_{x_1^N ∈ X^N} 2^{−L(x_1^N)} ≤ 1.
(b) For codeword lengths satisfying Kraft's inequality there exists a prefix code with these lengths.

This leads to:

Theorem (Fano, 1961)
(a) Any prefix code satisfies
E[L(X_1^N)] ≥ H(X_1^N) = N h(θ),
or equivalently R ≥ h(θ). The minimum is achieved if and only if L(x_1^N) = −log2(P(x_1^N)) (ideal codeword length) for all x_1^N ∈ X^N with nonzero P(x_1^N).
(b) There exist prefix codes with
E[L(X_1^N)] < H(X_1^N) + 1 = N h(θ) + 1,
or equivalently R < h(θ) + 1/N.
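As a quick numerical check (not in the original slides), the example code's lengths satisfy Kraft's inequality with equality, and the ideal codeword lengths −log2 P(x_1^N) can be computed directly:

```python
from math import log2

def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum of 2^(-L)."""
    return sum(2.0 ** -L for L in lengths)

# The prefix code of the earlier example has lengths 1, 2, 3, 3.
print(kraft_sum([1, 2, 3, 3]))  # 1.0: the code tree is full (complete)

# Ideal codeword lengths -log2 P(x_1^N) for N = 2, theta = 0.3:
probs = {"00": 0.49, "01": 0.21, "10": 0.21, "11": 0.09}
for seq, p in sorted(probs.items()):
    print(seq, round(-log2(p), 2))  # non-integer in general, hence the "+1"
```

The ideal lengths are generally non-integer, which is exactly why part (b) of Fano's theorem carries the "+1" slack.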
Huffman's Code

Definition (Optimal FV-length Code)
A code that minimizes the expected codeword-length E[L(X_1^N)] (and the rate R) is called optimal.

Theorem (Huffman, 1952)
The Huffman construction leads to an optimal FV-length code.

CONSTRUCTION:
Consider the set of probabilities {P(x_1^N), x_1^N ∈ X^N}.
Replace the two smallest probabilities by a probability which is their sum. Label the branches from these two smallest probabilities to their sum with code-symbols "0" and "1".
Continue like this until only one probability (equal to 1) is left.

Obviously Huffman's code results in E[L(X_1^N)] < H(X_1^N) + 1 = N h(θ) + 1, and therefore R < h(θ) + 1/N.
Huffman's Construction

Example
Let N = 3 and θ = 0.3, then h(0.3) = 0.881.

[Figure: Huffman tree for the eight sequence probabilities .343, .147, .147, .147, .063, .063, .063, .027; the successive merges produce .090, .126, .216, .294, .363, .637, and finally 1.00.]

Now E[L(X_1^N)] = 4(.027 + .063 + .063 + .063) + 3(.147 + .147) + 2(.147 + .343) = 2.726. Therefore R = 2.726/3 = 0.909.
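The merge-based construction can be sketched in a few lines. The Python illustration below (not the lecture's code; tie-breaking among equal probabilities is arbitrary, but the expected length of any Huffman code is the same) reproduces the slide's numbers:

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Huffman's construction: repeatedly merge the two smallest
    probabilities; return the codeword length of each symbol."""
    heap = [(p, [sym]) for sym, p in probs.items()]
    lengths = {sym: 0 for sym in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, syms0 = heapq.heappop(heap)
        p1, syms1 = heapq.heappop(heap)
        for sym in syms0 + syms1:   # each merge adds one code bit
            lengths[sym] += 1
        heapq.heappush(heap, (p0 + p1, syms0 + syms1))
    return lengths

theta, N = 0.3, 3
probs = {"".join(x): (1 - theta) ** x.count("0") * theta ** x.count("1")
         for x in product("01", repeat=N)}
L = huffman_lengths(probs)
EL = sum(probs[s] * L[s] for s in probs)
print(round(EL, 3), round(EL / N, 3))  # 2.726 and 0.909, matching the slide
```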
Remarks: Huffman Code

Note that R ↓ h(θ) when N → ∞.
Always E[L(X_1^N)] ≥ 1. For θ ≈ 0 a Huffman code has expected codeword length E[L(X_1^N)] ≈ 1 and rate R ≈ 1/N.
Better bounds exist for Huffman codes than E[L(X_1^N)] < H(X_1^N) + 1. E.g. Gallager [1978] showed that
E[L(X_1^N)] − H(X_1^N) ≤ max_{x_1^N} P(x_1^N) + 0.086.
Adaptive Huffman Codes (Gallager [1978]).
Variable-to-Fixed (VF) Length Codes

IDEA: Parse the source output into variable-length segments of roughly the same probability. Code all these segments with codewords of fixed length.

Definition (VF-Length Code)
A VF-length code is defined by a set of variable-length source segments. Each segment x* in the set gets a unique binary codeword c(x*) of length L. The length of a segment x* is denoted as N(x*). The rate of a VF-code is
R ≜ L / E[N(X*)] (code-symbols/source-symbol).

GOAL: We would like to find parsable VF-length codes that MINIMIZE this rate.
Proper and Complete Segment Sets

Definition (Proper-and-Complete Segment Sets)
A set of source segments is proper-and-complete if each semi-infinite source sequence has a unique prefix in this segment set.

We focus on proper-and-complete segment sets. Segments in a proper-and-complete set can be regarded as leaves in a rooted tree. Such sets guarantee instantaneous parsability.

Example

x*     N(x*)   c(x*)
1      1       11
01     2       10
001    3       01
000    3       00

[Figure: segment tree rooted at ∅ with leaves 1, 01, 001, 000.]
Proper and Complete Segment Sets: Leaf-Node Lemma

Assume that the source is IID with parameter θ. Consider a set of segments and all their prefixes. Depict them in a tree. The segments are leaves, the prefixes nodes. Note that all the nodes and leaves have a probability, e.g. P(10) = θ(1 − θ). Let F(·) be a function on nodes and leaves.

[Figure: tree with root F(∅), children F(0) and F(1); F(1) has children F(10) and F(11); F(10) has children F(100) and F(101).]

Lemma (Massey, 1983)
∑_{l ∈ leaves} P(l) [F(l) − F(∅)] = ∑_{n ∈ nodes} P(n) ∑_{s ∈ sons of n} (P(s)/P(n)) [F(s) − F(n)].

Let F(x*) = number of edges from x* to the root; then
E[N(X*)] = ∑_{x* ∈ nodes} P(x*).
Let F(x*) = −log2 P(x*); then
H(X*) = E[N(X*)] h(θ).
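The first consequence of the lemma (expected segment length equals the sum of node probabilities) can be checked on the earlier example segment set {1, 01, 001, 000}. The numbers below are an illustration, not from the slides:

```python
theta = 0.3

def P(seg: str) -> float:
    """IID probability of a binary segment; the root has P = 1."""
    w = seg.count("1")
    return (1 - theta) ** (len(seg) - w) * theta ** w

# Leaves of the example segment tree, and its nodes (root and proper prefixes).
leaves = ["1", "01", "001", "000"]
nodes = ["", "0", "00"]

direct = sum(P(l) * len(l) for l in leaves)   # E[N(X*)] by definition
via_lemma = sum(P(n) for n in nodes)          # path-length consequence of the lemma
print(round(direct, 4), round(via_lemma, 4))  # both equal 2.19
```

Both routes give E[N(X*)] = 0.3·1 + 0.21·2 + 0.147·3 + 0.343·3 = 1 + 0.7 + 0.49 = 2.19.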
Proper-and-Complete Segment Sets: Result

Theorem
For any proper-and-complete segment set with no more than 2^L segments,
L ≥ H(X*) = E[N(X*)] h(θ),
or R = L/E[N(X*)] ≥ h(θ).

More precisely, since
R = L / E[N(X*)] = (L / H(X*)) h(θ),
we should make H(X*) as close as possible to L; hence all segments should have roughly the same probability.
Tunstall's Code

Consider 0 < θ ≤ 1/2.

Definition (Optimal VF-length Code)
A code that maximizes the expected segment-length E[N(X*)] is called optimal. Such a code minimizes the rate R.

Theorem (Tunstall, 1967)
The Tunstall construction leads to an optimal code.

CONSTRUCTION:
Start with the empty segment ∅, which has unit probability.
As long as the number of segments is smaller than 2^L, replace a segment s with largest probability P(s) by two segments s0 and s1. The probabilities of the new segments (leaves) are P(s0) = P(s)(1 − θ) and P(s1) = P(s)θ.

The Tunstall construction results in H(X*) ≥ L − log2(1/θ) and therefore R ≤ (L / (L + log2 θ)) h(θ) (Jelinek and Schneider [1972]).
Tunstall's Construction

Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.

[Figure: Tunstall tree grown from the root (probability 1.00) by repeatedly splitting the most probable leaf into a 0-son and a 1-son; the seven split nodes have probabilities 1.00, .700, .490, .343, .300, .240, .210, and the resulting 2^3 = 8 leaves have probabilities .168, .072, .103, .147, .147, .063, .210, .090.]

Now E[N(X*)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283, and therefore R = 3/3.283 = 0.914.
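Tunstall's split-the-most-probable-leaf rule translates directly into a priority queue. The sketch below (an illustration, not the lecture's code) reproduces the slide's numbers for L = 3 and θ = 0.3:

```python
import heapq

def tunstall_segments(theta: float, L: int):
    """Tunstall's construction: repeatedly split the most probable
    segment until there are 2^L segments (leaves)."""
    heap = [(-1.0, "")]            # max-heap via negated probabilities
    while len(heap) < 2 ** L:
        p, seg = heapq.heappop(heap)           # most probable leaf
        heapq.heappush(heap, (p * (1 - theta), seg + "0"))
        heapq.heappush(heap, (p * theta, seg + "1"))
    return {seg: -p for p, seg in heap}

theta, L = 0.3, 3
segs = tunstall_segments(theta, L)
EN = sum(p * len(s) for s, p in segs.items())  # expected segment length
print(round(EN, 3), round(L / EN, 3))          # 3.283 and 0.914, as on the slide
```

Note the duality with Huffman's construction: Huffman merges the two least probable leaves, Tunstall splits the single most probable one.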
Tunstall’s Construction
http://find/
-
8/18/2019 Willems is It 2013
33/102
LOSSLESS SOURCECODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION
HUFFMAN-TUNSTALL
Binary IID Sources
Huffman Code
Tunstall Code
ENUMERATIVE CODING
Lexicographical Ordering
FV: Pascal-∆ Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,Individual Redundancy
CONTEXT-TREEWEIGHTING
IID, unknown θ
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.
1.00
.700
.300
0
1
.490
.210
0
1
.343
.147
0
1
.240
.103
0
1
.210
.090
0
1
.168
.072
0
1
.147
.063
0
1
Now E [N (X ∗)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 andtherefore R = 3/3.283 = 0.914.
Tunstall’s Construction
http://find/
Remarks: Tunstall Code
Note that R ↓ h(θ) when L → ∞. For θ ≈ 0 a Tunstall code has expected segment length E[N(X∗)] ≈ 2^L − 1 and rate R ≈ L/(2^L − 1). Better than Huffman for L = N.

In each step of the Tunstall procedure, a leaf with the largest probability is changed into a node. This leads to:

The largest increase in expected segment length (Massey LN-lemma), and P(n) ≥ P(l) for all nodes n and leaves l. Therefore for any two leaves l and l′ we can say that (taking n the parent of l)

P(l) ≥ θP(n) ≥ θP(l′).

So leaves cannot differ too much in probability. This fact is used to lower-bound the segment entropy: H(X∗) ≥ L − log2(1/θ).

Optimal VF-length codes can also be found by fixing a number γ and defining a node to be internal if its probability is ≥ γ (Khodak, 1969). The size of the segment set is not completely controllable now.

Run-length Codes (Golomb [1966]).
Outline
1 INTRODUCTION
2 HUFFMAN and TUNSTALL
  Binary IID Sources
  Huffman Code
  Tunstall Code
3 ENUMERATIVE CODING
  Lexicographical Ordering
  FV: Pascal-∆ Method
  VF: Petry Code
4 ARITHMETIC CODING
  Intervals
  Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING
  IID, unknown θ
  Binary Tree-Sources
  Context Trees
  Coding Probabilities
6 REPETITION TIMES
  LZ77
  Repetition Times, Kac
  Repetition-Time Algorithm
  Achieving Entropy
7 CONCLUSION
Lexicographical Ordering
IDEA:

Sequences having the same weight (and probability) only need to be INDEXED. The binary representation of the index can be taken as codeword.

Definition (Lexicographical Ordering, Index)

In a lexicographical ordering (0 < 1) we say that x_1^N < x̃_1^N if x_n < x̃_n at the first position n where the two sequences differ; the index i_S(x_1^N) of x_1^N in a set S is the number of sequences in S that precede it.
Theorem (Cover, 1973)

From the sequence x_1^N ∈ S we can compute the index

i_S(x_1^N) = Σ_{n=1,…,N: x_n=1} #S(x_1, x_2, …, x_{n−1}, 0),

where #S(x_1, x_2, …, x_k) denotes the number of sequences in S having prefix x_1, …, x_k. Moreover, from the index i_S(x_1^N) the sequence x_1^N can be computed if the numbers #S(x_1, x_2, …, x_{n−1}, 0) for n = 1, …, N are available. The index of a sequence can be represented by a codeword of fixed length ⌈log2 |S|⌉.

Example

Index i_S(10100) = #S(0) + #S(100) = (4 choose 2) + (2 choose 1) = 6 + 2 = 8; hence, since |S| = 10, the corresponding codeword is 1000.
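For the weight classes used here (all length-N sequences of a given weight), the prefix counts #S(⋯) are binomial coefficients, so both directions of Cover's theorem fit in a few lines. A sketch (function names are mine):

```python
from math import comb

def index_of(x):
    """Cover's index of binary sequence x within the set of all
    sequences of the same length and weight (number of ones)."""
    idx, ones_left = 0, sum(x)
    for n, bit in enumerate(x):
        if bit == 1:
            # sequences that agree on x[:n] but have a 0 at position n
            idx += comb(len(x) - n - 1, ones_left)
            ones_left -= 1
    return idx

def sequence_of(idx, length, weight):
    """Inverse mapping: recover the sequence from its index."""
    x = []
    for n in range(length):
        c = comb(length - n - 1, weight)   # sequences with a 0 here
        if weight > 0 and idx >= c:
            x.append(1)
            idx -= c
            weight -= 1
        else:
            x.append(0)
    return x
```

`index_of([1, 0, 1, 0, 0])` gives 8, matching the example, and `sequence_of(8, 5, 2)` recovers the sequence.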
FV: Pascal-Triangle Method (cont.)
First note that

H(X_1^N) = H(X_1^N, w(X_1^N)) = H(W) + H(X_1^N | W).

If we use enumerative coding for X_1^N given weight w, since all sequences with a fixed weight have equal probability,

E[L(X_1^N | W)] = Σ_{w=0,1,…,N} P(w) ⌈log2 (N choose w)⌉ < Σ_{w=0,1,…,N} P(w) log2 (N choose w) + 1 = H(X_1^N | W) + 1.

If W is encoded using a Huffman code we obtain

E[L(X_1^N)] = E[L(W)] + E[L(X_1^N | W)] ≤ H(W) + 1 + H(X_1^N | W) + 1 = H(X_1^N) + 2.

Worse than Huffman, but no big code table is needed.
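The bound E[L(X_1^N | W)] < H(X_1^N | W) + 1 is easy to check numerically; in this sketch the weight W is treated with an ideal code rather than an explicit Huffman code, and the function name and parameter values are mine:

```python
from math import comb, ceil, log2

def weight_index_lengths(N, theta):
    """Return (E[L(X|W)], H(X|W)) for the weight-then-index scheme:
    given weight w, the index costs ceil(log2 C(N, w)) bits."""
    EL = H = 0.0
    for w in range(N + 1):
        # probability of weight w for IID bits with P(1) = theta
        Pw = comb(N, w) * (1 - theta) ** (N - w) * theta ** w
        EL += Pw * ceil(log2(comb(N, w)))
        H += Pw * log2(comb(N, w))
    return EL, H

EL, H = weight_index_lengths(10, 0.3)
```

For N = 10 and θ = 0.3 the expected index length indeed lies between H(X|W) and H(X|W) + 1.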
VF: Petry Code
IDEA:

Modify the Tunstall segment sets such that the segments can be indexed.

Again let 0 < θ ≤ 1/2. It can be shown that a proper-and-complete segment set is a Tunstall set (maximal E[N(X∗)] given the number of segments) if and only if for all nodes n and all leaves l

P(n) ≥ P(l).

Consequence

If the segments x∗ in a proper-and-complete segment set satisfy

P(x∗−1) > γ ≥ P(x∗),

where x∗−1 is x∗ without the last symbol, this segment set is a Tunstall set. Constant γ determines the size of the set. Since

P(x∗) = (1 − θ)^{n0(x∗)} θ^{n1(x∗)},

where n0(x∗) is the number of zeros in x∗ and n1(x∗) the number of ones in x∗, this is equivalent to

A·n0(x∗−1) + B·n1(x∗−1) < C ≤ A·n0(x∗) + B·n1(x∗)

for A = −log_b(1 − θ), B = −log_b θ, C = −log_b γ, and some log-base b.
VF: Petry Code (cont.)
Note that log-base b has to satisfy

1 = (1 − θ) + θ = b^{−A} + b^{−B}.

For special values of θ, A and B are integers. E.g. for θ = (1 − θ)^2 we obtain A = 1 and B = 2 for b = (1 + √5)/2. Now C can also be assumed to be integer. The corresponding codes are called Petry codes.

Definition (Petry (Schalkwijk), 1982)

Fix integers A and B. The segments x∗ in a proper-and-complete Petry segment set satisfy

A·n0(x∗−1) + B·n1(x∗−1) < C ≤ A·n0(x∗) + B·n1(x∗).

Integer C can be chosen to control the size of the set.

Linear Array

Petry codes can be implemented using a linear array.
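The threshold rule translates directly into a recursive enumeration of the segment set; a sketch (the function name is mine), shown for the golden-ratio case A = 1, B = 2:

```python
def petry_segments(A, B, C):
    """Proper-and-complete Petry segment set: a string x* is a segment
    iff the cost A*n0 + B*n1 first reaches or exceeds C at its last symbol."""
    segments = []
    def grow(s, cost):          # cost = A*n0(s) + B*n1(s)
        if cost >= C:
            segments.append(s)
        else:
            grow(s + "0", cost + A)
            grow(s + "1", cost + B)
    grow("", 0)
    return segments

sizes = [len(petry_segments(1, 2, C)) for C in range(1, 8)]
```

For A = 1, B = 2 the set sizes σ(C) come out as 2, 3, 5, 8, 13, 21, 34: Fibonacci numbers.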
VF: Petry Code (cont.)
Theorem (Tjalkens & W. (1987))

A Petry code with parameters A < B and C is a Tunstall code for parameter q, where q = b^{−B} when b is the solution of b^{−A} + b^{−B} = 1. For arbitrary θ the rate satisfies

log2 σ(C) / E[N(X∗)] ≤ ((C + B − 1)/C) · (h(θ) + d(θ||q)).

Example

In the table, the value of q for several A and B:

A\B     2      3      4      5
 1    0.382  0.318  0.276  0.245
 2           0.430  0.382  0.346
 3                  0.450  0.412
Remarks: VF Petry Code
Note that log2 σ(C)/E[N(X∗)] ↓ h(θ) + d(θ||q) when C → ∞, hence a Petry code achieves entropy for θ = q.

Tjalkens and W. investigated VF-length Petry codes for Markov sources, again with a linear array for each state.

VF-length universal enumerative solutions exist (Lawrence [1977], Tjalkens and W. [1992]).

The numbers in the linear array show exponential behaviour. Also an array 2^{−i/M}·f for i = 1, …, M can be used, through which we make steps (Tjalkens [PhD, 1987]). This reduces the storage complexity and is similar to Rissanen [1976] multiplication-avoiding arithmetic coding (generalized Kraft inequality).
Idea Elias
Elias:

If source sequences are ORDERED LEXICOGRAPHICALLY then codewords can be COMPUTED SEQUENTIALLY from the source sequence using conditional PROBABILITIES of the next symbol given the previous ones, and vice versa.
Source Intervals
Definition

Order the source sequences x_1^N ∈ {0,1}^N lexicographically according to 0 < 1. Now, to each source sequence x_1^N ∈ {0,1}^N there corresponds a source-interval

I(x_1^N) = [Q(x_1^N), Q(x_1^N) + P(x_1^N))

with

Q(x_1^N) = Σ_{x̃_1^N < x_1^N} P(x̃_1^N).
A codeword c with length L can be regarded as a binary fraction .c. If we concatenate this codeword with others the corresponding fraction can increase, but by no more than 2^{−L}.

Definition

To a codeword c(x_1^N) with length L(x_1^N) there corresponds a code interval

J(x_1^N) = [.c(x_1^N), .c(x_1^N) + 2^{−L(x_1^N)}).

Note that J(x_1^N) ⊂ [0, 1).
Arithmetic coding: Encoding and Decoding
Procedure

ENCODING: Choose c such that the code interval ⊆ source interval, i.e.

[.c, .c + 2^{−L}) ⊆ [Q(x_1^N), Q(x_1^N) + P(x_1^N)).

DECODING: Is possible since there is only one source interval that contains the code interval.

Theorem

For sequence x_1^N with source-interval I(x_1^N) = [Q(x_1^N), Q(x_1^N) + P(x_1^N)) take c(x_1^N) as the codeword with

L(x_1^N) ≜ ⌈log2 (1/P(x_1^N))⌉ + 1,
.c(x_1^N) ≜ ⌈Q(x_1^N) · 2^{L(x_1^N)}⌉ · 2^{−L(x_1^N)}.

Then

J(c(x_1^N)) ⊆ I(x_1^N)

and

L(x_1^N) < log2 (1/P(x_1^N)) + 2,

i.e. less than two bits above the ideal codeword length.
Example
I.I.D. source with θ = 0.2 and N = 2.
Source and code intervals:

x     I(x)            c(x)      J(x)
00    [0.00, 0.64)    00        [0, 1/4)
01    [0.64, 0.80)    1011      [11/16, 3/4)
10    [0.80, 0.96)    1101      [13/16, 7/8)
11    [0.96, 1.00)    111110    [31/32, 63/64)
Source-intervals are disjoint ⇒ code-intervals are disjoint ⇒ the prefix condition holds.
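The example (θ = 0.2, N = 2) can be reproduced by implementing the theorem with exact fractions; a sketch, with names of my choosing and θ = P(1):

```python
from fractions import Fraction
from math import ceil, log2

def codeword(x, theta=Fraction(1, 5)):
    """Elias codeword for an IID source with P(1) = theta:
    L = ceil(log2 1/P) + 1 and .c = ceil(Q * 2^L) * 2^-L."""
    Q, P = Fraction(0), Fraction(1)
    for bit in x:
        p0 = P * (1 - theta)         # probability of continuing with a 0
        if bit == 0:
            P = p0
        else:
            Q, P = Q + p0, P - p0    # all sequences with a 0 here precede x
    L = ceil(log2(1 / P)) + 1
    return format(ceil(Q * 2 ** L), f"0{L}b")
```

This yields the four codewords 00, 1011, 1101, 111110 of the example.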
Arithmetic Coding: Sequential Computation (Elias)
Example (Connection to Cover's formula)

Let L = 3 and θ = 0.2.

[Figure: binary tree of all prefixes ∅, 0, 1, 00, 01, 10, 11, 000, …, 111.]

Q(101) = P(0) + P(100) = 0.8 + 0.2 · 0.8 · 0.8 = 0.928.
P(101) = P(1)P(0)P(1) = 0.2 · 0.8 · 0.2 = 0.032.
In general

Q(x_1^N) = Σ_{n=1,…,N: x_n=1} P(x_1, x_2, …, x_{n−1}, 0),

P(x_1^N) = Π_{n=1}^{N} P(x_n | x_1, x_2, …, x_{n−1}).

Sequential Computation

If we have access to P(x_1, x_2, …, x_n, 0) and P(x_1, x_2, …, x_n, 1) after having processed P(x_1, x_2, …, x_n) for n = 1, 2, …, N, we can compute I(x_1^N) sequentially.
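The sequential computation can be sketched as follows (names are mine; `cond_prob` returns P(next symbol = 0 | prefix)):

```python
def source_interval(x, cond_prob):
    """Sequentially compute the source interval I(x) = [Q, Q + P)
    from conditional probabilities of the next symbol."""
    Q, P = 0, 1
    for n, bit in enumerate(x):
        p0 = P * cond_prob(x[:n])    # P(x_1 ... x_{n-1}, 0)
        if bit == 0:
            P = p0
        else:
            Q, P = Q + p0, P - p0
    return Q, P
```

For the IID example above (θ = 0.2), `source_interval((1, 0, 1), lambda prefix: 0.8)` returns Q = 0.928 and P = 0.032 up to float rounding.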
Universal Coding
Coding Probabilities

If the actual probabilities P(x_1^N) are not known, arithmetic coding is still possible if instead of P(x_1^N) we use coding probabilities P_c(x_1^N) satisfying

P_c(x_1^N) > 0 for all x_1^N, and Σ_{x_1^N} P_c(x_1^N) = 1.

Then

L(x_1^N) < log2 (1/P_c(x_1^N)) + 2.

[Figure: the encoder maps x_1^N to c(x_1^N) and the decoder maps c(x_1^N) back to x_1^N; both use P_c(x_1 ⋯ x_{n−1}, 0) and P_c(x_1 ⋯ x_{n−1}, 1) for n = 1, …, N.]

PROBLEM: How do we choose the coding probabilities P_c(x_1^N)?
Individual Redundancy
Definition

The individual redundancy ρ(x_1^N) of a sequence x_1^N is defined as

ρ(x_1^N) = L(x_1^N) − log2 (1/P(x_1^N)),

i.e. codeword-length minus ideal codeword-length.

Bound on the Individual Redundancy

Arithmetic coding based on coding probabilities {P_c(x_1^N), x_1^N ∈ {0,1}^N} yields

ρ(x_1^N) < log2 (1/P_c(x_1^N)) + 2 − log2 (1/P(x_1^N)) = log2 (P(x_1^N)/P_c(x_1^N)) + 2.

We say that the CODING redundancy is < 2 bits. The coding probabilities should be as large as possible (as close as possible to the actual probabilities). Next we focus on the remaining part of the individual redundancy, log2 (P(x_1^N)/P_c(x_1^N)).
Remarks: Arithmetic Coding
Shannon [1948] already described the relation between codewords and intervals, with ordered probabilities however; called the Shannon-Fano code.

Shannon-Fano-Elias: arbitrary ordering, but not sequential.

Finite-precision issues of arithmetic coding were solved by Pasco [1976] and Rissanen [1976].
CTW: Universal Codes
IDEA:
Find good coding probabilities for sources with UNKNOWN PARAMETERS and STRUCTURE. Use WEIGHTING!
Coding for a Binary IID Source, Unknown θ
Definition (Krichevsky-Trofimov estimator (1981))

A good coding probability P_c(x_1^N) for a sequence x_1^N that contains a zeroes and b = N − a ones is

P_e(a, b) = ∫_{θ=0}^{1} (1/(π √((1 − θ)θ))) · (1 − θ)^a θ^b dθ.

(Dirichlet-(1/2, 1/2) prior, "weighting").

Theorem

Upper bound on the PARAMETER redundancy:

log2 (P(x_1^N)/P_c(x_1^N)) = log2 ((1 − θ)^a θ^b / P_e(a, b)) ≤ (1/2) log2(a + b) + 1 = (1/2) log2(N) + 1

for all θ and x_1^N with a zeros and b ones.

The probability of a sequence with a zeroes and b ones followed by a zero is

P_e(a + 1, b) = ((a + 1/2)/(a + b + 1)) · P_e(a, b),

hence SEQUENTIAL COMPUTATION is possible!
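The sequential update turns the integral into a simple product; a sketch using exact rationals (the function name is mine):

```python
from fractions import Fraction

def kt_prob(bits):
    """Krichevsky-Trofimov coding probability P_e(a, b), built up
    sequentially: a zero after a zeros and b ones has probability
    (a + 1/2) / (a + b + 1)."""
    a = b = 0
    p = Fraction(1)
    for bit in bits:
        if bit == 0:
            p *= Fraction(2 * a + 1, 2 * (a + b + 1))
            a += 1
        else:
            p *= Fraction(2 * b + 1, 2 * (a + b + 1))
            b += 1
    return p
```

For instance `kt_prob([0, 1])` = 1/8 and `kt_prob([0, 0, 1, 1])` = 3/128.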
Individual Redundancy Binary IID source
The total individual redundancy

ρ(x_1^N) < log2 ((1 − θ)^a θ^b / P_e(a, b)) + 2 ≤ (1/2) log2(N) + 1 + 2

for all θ and x_1^N with a zeroes and b ones.

Shtarkov [1988]: the (1/2) log2 N behaviour is asymptotically optimal for individual redundancy for N → ∞ (NML-estimator)!

Rissanen [1984]: also for expected redundancy the (1/2) log2 N behaviour is asymptotically optimal.
CTW: Binary Tree-Sources
Definition

[Figure: context tree with leaves 1, 10, 00; the context of x_n is a suffix of ⋯ x_{n−2} x_{n−1}.]

(tree-) model M = {00, 10, 1}, with parameters θ_1 = 0.1, θ_10 = 0.3, θ_00 = 0.5:

P(X_n = 1 | ⋯, X_{n−1} = 1) = 0.1
P(X_n = 1 | ⋯, X_{n−2} = 1, X_{n−1} = 0) = 0.3
P(X_n = 1 | ⋯, X_{n−2} = 0, X_{n−1} = 0) = 0.5
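Generating data from this tree source is a useful sanity check; a sketch (function name and context encoding are mine, with contexts written as tuples whose most recent symbol comes last):

```python
import random

def tree_source(n, model, past=(0, 0), seed=42):
    """Emit n bits from a binary tree source. `model` maps context
    suffixes (most recent symbol last) to theta = P(next bit = 1)."""
    rng = random.Random(seed)
    x = list(past)                       # initial context symbols
    for _ in range(n):
        for s, theta in model.items():   # exactly one suffix matches
            if tuple(x[len(x) - len(s):]) == s:
                x.append(1 if rng.random() < theta else 0)
                break
    return x[len(past):]

model = {(1,): 0.1, (1, 0): 0.3, (0, 0): 0.5}
sample = tree_source(1000, model)
```

Because the model is proper and complete, exactly one context suffix matches the past at every step.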
CTW: Problem, Concepts
PROBLEM: What are good coding probabilities for sequences x_1^N produced by a tree-source with

an unknown tree-model,

and unknown parameters?

CONCEPTS:

CONTEXT TREE (Rissanen [1983])

WEIGHTING: If P_1(x) and P_2(x) are two alternative coding probabilities for sequence x, then the weighted probability

P_w(x) ≜ (P_1(x) + P_2(x))/2 ≥ (1/2) max(P_1(x), P_2(x)),

thus we lose at most a factor of 2, which is one bit in redundancy.
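The one-bit loss is easy to verify numerically; the two probabilities below are arbitrary illustrative values:

```python
from math import log2

p1, p2 = 0.0125, 0.0008        # two hypothetical coding probabilities
pw = (p1 + p2) / 2             # weighted coding probability
# extra codeword length w.r.t. the better alternative, in bits
loss = log2(max(p1, p2)) - log2(pw)
assert loss <= 1.0
```

Since pw ≥ max(p1, p2)/2, the codeword based on pw is at most one bit longer than one based on the better of the two alternatives.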
Context Trees
Definition (Context Tree)

[Figure: depth-3 context tree with root ∅; nodes 0, 1; 00, 10, 01, 11; and leaves 000, 100, 010, 110, 001, 101, 011, 111.]

Node s contains the sequence of source symbols that have occurred following context s. Depth is D.
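The way the context tree splits the source sequence into per-context subsequences can be sketched as follows (names are mine; at least D past symbols are assumed given):

```python
def split_by_context(x, past, D):
    """For every context s of depth 0..D, collect the subsequence of
    source symbols that occurred following s (the content of node s)."""
    nodes = {}
    history = list(past)
    for bit in x:
        for d in range(D + 1):
            s = tuple(history[len(history) - d:])   # last d symbols
            nodes.setdefault(s, []).append(bit)
        history.append(bit)
    return nodes
```

Each source symbol is recorded in the root ∅ (the empty context) and in one node per depth along its context path.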
The context tree splits up the sequence into subsequences
Example