Information and Coding Theory (CO349) – doc.ic.ac.ukherbert/teaching/ICA1S.pdf
Information and Coding Theory (CO349)
Coding and Decoding

Herbert [email protected]
Department of Computing, Imperial College London

Autumn 2017
Slide 1 of 40
Information and Codes
What will be covered in this course?
• Data and Information Representation
• Data and Information Transmission

We will look at these problems from a quantitative, statistical point of view as well as from an algebraic point of view.

In German, Dutch, Italian, etc.: CS is Informatik / Informatica, i.e. Information Science rather than Computer Science.
Slide 2 of 40
Practicalities
Two Lecturers
Herbert [email protected] 31
2 weeks until 30 October
Open-book coursework test 26 October (or 30?)

Mahdi [email protected] 31
2 weeks from 2 November
Open-book coursework test 23 November

Exam: Week 11, 11–15 December 2017, 2 hours (answer 3 out of 4 questions).
Different classes, different background, different applications.
Slide 3 of 40
Overview – Information (Theory)
Part I: Information Theory
• Simple Codes and Coding
• Revision: Probabilities
• Information Representation
• Information Transmission
• Information Manifestation

Part II: Coding Theory

• Revision: Linear Algebra
• Linear Codes
• Algebraic Error-Correcting Codes
• Information in Cryptography
Slide 4 of 40
Text Books
• Norman L. Biggs: Codes – An Introduction to Information, Communication and Cryptography, Springer, 2008.
• Ron Roth: Introduction to Coding Theory, Cambridge University Press, 2006.
• David Applebaum: Probability and Information, Cambridge University Press, 1996/2008.
• Robert McEliece: The Theory of Information and Coding, Cambridge University Press, 2002.
• T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley-Interscience, 2006.
• Robert B. Ash: Information Theory, Wiley, 1965; reprint: Dover, 1980.
Slide 5 of 40
Information and Security
Side-Channel Attacks (Kocher, 1996)
The problem appears in attacks against public-key encryption algorithms like RSA. In (optimised) versions of de/encryption (using modular exponentiation), properties of the secret key determine the execution time.

How much information about the secret key is revealed?

Differential Privacy (Dwork, 2006)
In large (statistical) databases an attacker can try to reveal information about individuals (de-anonymise them), e.g. there are only 3 under-25s with hair loss registered, and there are two people getting hair-loss treatment in SW7. Kim is in the database, is 21, and lives at 170 Queen's Gate.
How much information about individuals is revealed?
Slide 6 of 40
Information – A Measure of Surprise
The Entropy of a probability distribution p is:

H(p) = −∑_x p(x) log2(p(x)).

Example (Cover & Thomas)
Consider a Cheltenham horse race with 8 horses with winning chances (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64); then the entropy is 2.

Label the horses 0, 10, 110, 1110, 111100, 111101, 111110, 111111; then we need only 2 bits on average to report the winner.

For equally strong horses (1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8) we have an entropy of 3, and the minimal message length is also 3 bits.
Slide 7 of 40
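The horse-race numbers can be checked directly. A minimal sketch of the entropy formula (the function name `entropy` is ours, not from the slides):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) * log2(p(x)), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

# Winning chances from the Cheltenham example
race = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
print(entropy(race))       # 2.0
print(entropy([1/8] * 8))  # 3.0
```

The `if px > 0` guard follows the usual convention 0 · log2(0) = 0, so distributions with impossible outcomes are handled too.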
Surprise vs Prejudice
A father and son are on a fishing trip in the mountains of Wales. On the way back home their car has a serious accident.

The father is killed instantly and declared dead at the site of the accident. However, the son is severely injured and taken by ambulance to the nearest hospital.

When the son is brought into the operating theatre the surgeon exclaims "I can't do this, he is my son."
Scientific American, 1980s
Slide 8 of 40
Everything You Always Wanted to Know About Logarithms But Were Afraid to Ask

Defined (implicitly) via the equation:

x = b^(log_b(x))

Common logarithms: b = 2, b = 10, and b = e = 2.71828…

log_b(xy) = log_b(x) + log_b(y)
log_b(x^n) = n · log_b(x), …, log_b(x) = log_a(x) / log_a(b)

Example
log_2(64) = 6 as 2^6 = 64; log_2(1/64) = −6 as 1/64 = 2^(−6) (2^(−1) = 1/2).
Slide 9 of 40
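The change-of-base rule is what lets a calculator (or library) without a log_2 key compute binary logarithms. A small sketch (the function name `log_b` is ours):

```python
import math

def log_b(x, b):
    """Change of base: log_b(x) = log_a(x) / log_a(b), here with a = e."""
    return math.log(x) / math.log(b)

print(round(log_b(64, 2)))    # 6, since 2**6 == 64
print(round(log_b(1/64, 2)))  # -6, since 1/64 == 2**-6
# Product rule: log_b(x*y) = log_b(x) + log_b(y)
print(math.isclose(log_b(8 * 4, 2), log_b(8, 2) + log_b(4, 2)))  # True
```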
Messages and Strings
Definition
An alphabet is a finite set S; its members s ∈ S are referred to as symbols.

Definition
A message m in the alphabet S is a finite sequence of elements of S:

m = s_1 s_2 · · · s_n with s_i ∈ S, 1 ≤ i ≤ n.

A message is often also referred to as a string or word in S. The number n ∈ N is the length of m.

Notation for length: |m| = n; prefix: m|_k = s_1 s_2 · · · s_k (with k ≤ n).
Slide 10 of 40
Notation
Consider also the string of length zero, often denoted ε, with |ε| = 0.

By S^0 we denote the set of words containing only ε; S^n denotes the set of all strings of length n; and

S* = S^0 ∪ S^1 ∪ S^2 ∪ · · ·

Example
With the alphabet S = B = {0, 1} we can produce all binary messages of a certain length n. Concretely, for example:

B^3 = {000, 001, 010, 011, 100, 101, 110, 111}
Slide 11 of 40
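The sets S^n are easy to enumerate mechanically. A sketch using the standard library (the function name `words` is ours):

```python
from itertools import product

def words(alphabet, n):
    """S^n: all strings of length n over the given alphabet."""
    return [''.join(t) for t in product(alphabet, repeat=n)]

B3 = words('01', 3)
print(B3)       # ['000', '001', '010', '011', '100', '101', '110', '111']
print(len(B3))  # 8, i.e. 2**3
```

In general |S^n| = |S|^n, which the `len` check confirms for B^3.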
Coding
Definition
Let S and T be two alphabets. A code c is an injective function

c : S → T*

For each symbol s ∈ S the string c(s) ∈ T* is called the codeword for s. The set of all codewords

C = {c(s) | s ∈ S}

is also referred to as the code.

For |T| = 2, e.g. T = B, we have a binary code, for |T| = 3 a ternary code, and for |T| = b a b-ary code.
Slide 12 of 40
Example: Morse Code

Example
Developed from 1836 by Samuel F. B. Morse (1791–1872). This is a code c : {A, …, Z} → {•, –}* (with variations):

A • –       H • • • •   O – – –     V • • • –
B – • • •   I • •       P • – – •   W • – –
C – • – •   J • – – –   Q – – • –   X – • • –
D – • •     K – • –     R • – •     Y – • – –
E •         L • – • •   S • • •     Z – – • •
F • • – •   M – –       T –
G – – •     N – •       U • • –

Most famous (after April 1912) perhaps:

SOS ↦ • • • – – – • • •
Slide 13 of 40
Decoding
Definition
A code c : S → T* is extended to S* as follows: Given a string s_1 s_2 · · · s_n in S* we define

c(s_1 s_2 · · · s_n) = c(s_1) c(s_2) · · · c(s_n)

Note: This way we overload the function c (polymorphism).

Definition
A code c : S → T* is uniquely decodable (short: UD) if the extended code function c : S* → T* is injective.

This means that every string in T* corresponds to at most one message in S*.
Slide 14 of 40
Example
Example
Let S = {x, y, z} and take a code c : S → B* defined as

x ↦ 0    y ↦ 10    z ↦ 11

Take the original stream yzxxzyxzyy · · · in S* and encode it.

Is it always possible to decode the encoded stream in T* = B*?

Example
Same code as above. Consider the encoded stream

1000111000 · · ·

What is the original stream? Is it uniquely determined?
Slide 15 of 40
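The symbol-wise extension of c makes encoding a simple concatenation. A sketch using the example code above (the dictionary layout and function name are ours):

```python
# The example code: x -> 0, y -> 10, z -> 11
code = {'x': '0', 'y': '10', 'z': '11'}

def encode(message, code):
    """Extended code: c(s1 s2 ... sn) = c(s1) c(s2) ... c(sn)."""
    return ''.join(code[s] for s in message)

print(encode('yzxxzyxzyy', code))  # 10110011100111010
```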
The Decoding Problem

Given a sequence in T* can we (sequentially) decode it?

Example
Same code, with C = {0, 10, 11}. Consider 100111000 · · ·
Take the prefix of length 1, the first 1: it is not in the codebook C.
Take the prefix of length 2, i.e. 10: this matches y.
Consider the remainder stream 0111000 · · ·
Take the prefix of length 1, i.e. 0: this matches x.
Consider the remainder stream 111000 · · ·
Take the prefix of length 1, i.e. 1: it is not in the codebook C.
Take the prefix of length 2, i.e. 11: this matches z.

Example
Without the 'pause' in Morse code we could not decode it: IEEE (• •  •  •  •) becomes an undelimited • • • • •, which could equally be H E or E I E E.
Slide 16 of 40
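The sequential procedure above (grow a prefix until it matches a codeword, emit the symbol, reset) can be sketched as follows; it relies on the code being prefix-free, and the names are ours:

```python
code = {'x': '0', 'y': '10', 'z': '11'}
codebook = {cw: s for s, cw in code.items()}  # codeword -> symbol

def decode(stream, codebook):
    """Sequential decoding for a prefix-free code."""
    out, buf = [], ''
    for t in stream:
        buf += t                       # extend the current prefix
        if buf in codebook:
            out.append(codebook[buf])  # codeword matched: emit and reset
            buf = ''
    if buf:
        raise ValueError('stream ended inside a codeword')
    return ''.join(out)

print(decode('100111000', codebook))  # yxzyxx
```

Because the code is PF, the first codeword that matches is the only one that can: no longer codeword extends it.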
Prefix-Free

Definition
A code c : S → T* is prefix-free (short: PF) if there is no pair of codewords q = c(s) and q′ = c(s′) such that

q′ = qr for some non-empty word r ∈ T*.

Example
Let S = {w, x, y, z} with c : S → B* defined by

w ↦ 10    x ↦ 01    y ↦ 11    z ↦ 011

This code is not PF. It is not UD either, because:

wxwy ↦ 10011011 as well as wzz ↦ 10011011
Slide 17 of 40
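Whether a finite code is prefix-free can be checked by comparing all pairs of codewords. A quadratic sketch (the function name is ours):

```python
def is_prefix_free(codewords):
    """True iff no codeword is a proper prefix of another codeword."""
    for q in codewords:
        for q_prime in codewords:
            if q != q_prime and q_prime.startswith(q):
                return False  # q' = q r with r non-empty
    return True

print(is_prefix_free({'0', '10', '11'}))          # True
print(is_prefix_free({'10', '01', '11', '011'}))  # False: 011 = 01 + 1
```

For large codes a trie would do this in time linear in the total codeword length, but the pairwise check matches the definition most directly.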
PF implies UD
Theorem
If a code c : S → T* is prefix-free then it is uniquely decodable.

Proof (Indirect).
Take x = x_1 x_2 · · · x_m and y = y_1 y_2 · · · y_n such that c(x) = c(y), with c(x) = c(x_1) c(x_2) · · · c(x_m) and c(y) = c(y_1) c(y_2) · · · c(y_n).

The prefixes (of length k) must be the same, c(x)|_k = c(y)|_k, for any k. It might be that |c(x_1)| ≠ |c(y_1)|.

Assume w.l.o.g. |c(x_1)| ≤ |c(y_1)|, and take k = |c(x_1)|; then c(x)|_k = c(x_1) = c(y_1)|_k and thus c(y_1) = c(x_1) r for some r ∈ T*. If r is non-empty this contradicts PF; otherwise c(x_1) = c(y_1) and we repeat the argument on the remaining parts of c(x) = c(y). Formally: induction on the lengths m and n.
Slide 18 of 40
Strings as Paths in Trees
If we concentrate on binary codes with T = B then we can see any possible codeword in C as a (finite) path in a binary tree:

[Figure: a binary tree; from each node the left edge is labelled 0 and the right edge 1, so every path from the root spells a binary string.]
In general, we have to consider a b-ary tree for |T | = b.
Slide 19 of 40
Codes as (Sub)-Trees

The codewords in C can be represented by the end-points of the corresponding paths. If a code C is PF then no descendant of a codeword's node can itself represent a codeword in C.

Example
The code C = {0, 10, 110, 111} is PF; its corresponding tree is

[Figure: the code tree with leaves labelled 0, 10, 110 and 111.]
Slide 20 of 40
Parameters of a Code
Definition
Given a code c : S → T*, let n_i denote the number of symbols s ∈ S which are encoded by strings of length i, i.e.

n_i = |{s ∈ S : |c(s)| = i}| = |{s ∈ S : c(s) ∈ T^i}|

We refer to n_1, n_2, · · · , n_M as the parameters of c, where M is the maximal length of a codeword.

The number of all potential codewords of length i is given by b_i = |T^i| = b^i; in particular, b = |T| = b_1.

Note that n_i/b_i gives the fraction of possible codewords of length i which is actually used by the code ("filling rate").
Slide 21 of 40
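The parameters n_1, …, n_M are just a histogram of codeword lengths. A sketch (the function name is ours):

```python
from collections import Counter

def parameters(codewords):
    """Map each length i to n_i, the number of codewords of that length."""
    return dict(sorted(Counter(len(q) for q in codewords).items()))

print(parameters({'0', '10', '110', '111'}))  # {1: 1, 2: 1, 3: 2}
```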
Kraft-McMillan Number

Definition
Given a code c : S → T* with parameters n_1, n_2, · · · , n_M, the Kraft-McMillan number of c is defined as:

K = ∑_{i=1}^{M} n_i/b^i = n_1/b + n_2/b^2 + · · · + n_M/b^M

Example
Consider a binary code, i.e. b = b_1 = 2, with parameters n_1 = 1, n_2 = 1, n_3 = 2 (as in the last example); then b_i = 2^i, i.e. b_1 = 2, b_2 = 4, b_3 = 8, and

K = 1/2 + 1/4 + 2/8 = 8/8 = 1
Slide 22 of 40
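Computing K with exact rationals avoids floating-point noise in the fractions. A sketch (the names are ours):

```python
from fractions import Fraction

def kraft_number(params, b=2):
    """K = sum over lengths i of n_i / b**i, for params {i: n_i}."""
    return sum(Fraction(n, b**i) for i, n in params.items())

print(kraft_number({1: 1, 2: 1, 3: 2}))  # 1  (= 1/2 + 1/4 + 2/8)
print(kraft_number({2: 2, 3: 3, 4: 1}))  # 15/16
```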
Existence of PF Codes
Theorem (L.G. Kraft, 1949)
Given an alphabet T with |T| = b and (code) parameters n_1, n_2, · · · , n_M ∈ N. If K ≤ 1 then there exists a PF b-ary code c : S → T* with these parameters.

Example
Construct a PF binary code with n_2 = 2, n_3 = 3, n_4 = 1.
First check that K = 8/16 + 6/16 + 1/16 = 15/16 ≤ 1.

• Codewords of length 2: take any two, e.g. 00 and 01.
• Codewords of length 3: we can't take 00· · · or 01· · ·, but we can take any of the form 1 s_2 s_3 with s_2, s_3 ∈ B; in fact we need only three of 100, 101, 110, 111, say the first three, leaving 111 unused.
• Codeword(s) of length 4: take either 1110 or 1111.
Slide 23 of 40
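The constructive step in this example can be automated: process lengths in increasing order, always extending the prefixes that are still free. A sketch of this greedy construction (my own illustration, assuming $K \le 1$ has already been checked so the construction cannot run out of prefixes):

```python
def build_pf_code(params, alphabet):
    """Greedily build a prefix-free code with n_i codewords of length i.
    params maps length i -> n_i; assumes K <= 1, so enough prefixes remain."""
    codewords, prefixes = [], [""]  # free prefixes of the previous length
    for length in range(1, max(params) + 1):
        # extend every free prefix by one alphabet symbol
        candidates = [p + s for p in prefixes for s in alphabet]
        need = params.get(length, 0)
        codewords += candidates[:need]  # use the first n_i as codewords
        prefixes = candidates[need:]    # the rest remain free as prefixes
    return codewords

print(build_pf_code({2: 2, 3: 3, 4: 1}, "01"))
# ['00', '01', '100', '101', '110', '1110']
```

The output reproduces exactly the choices made in the example above: 00, 01 of length 2, then 100, 101, 110 of length 3, then 1110 of length 4.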
Proof of Kraft’s Theorem

Proof.
Any partial sum of the sum defining $K$ also fulfills $\sum_i \frac{n_i}{b^i} \le 1$, e.g.

$\frac{n_1}{b^1} \le 1$, $\quad \frac{n_1}{b^1} + \frac{n_2}{b^2} \le 1$, $\quad \frac{n_1}{b^1} + \frac{n_2}{b^2} + \frac{n_3}{b^3} \le 1$, etc.

This implies:

$n_1 \le b$, $\quad n_2 \le b(b^1 - n_1)$, $\quad n_3 \le b(b^2 - n_1 b^1 - n_2)$, etc.

or in general:

$n_i \le b\left(b^{i-1} - n_1 b^{i-2} - n_2 b^{i-3} - \cdots - n_{i-1}\right)$.

• Codeword length 1: since $n_1 \le b^1$ we can choose any $n_1$ of the $b$ words of length 1; $b - n_1$ words of length 1 remain as prefixes for words of length $2, 3, \ldots$
Slide 24 of 40
Proof of Kraft’s Theorem (cont.)

Proof (cont.)

• Codeword length 2: there are $b^2$ words of length 2, but $b^1 n_1$ of them are unavailable (they start with a used length-1 codeword), so $b^2 - b^1 n_1$ are left. Since $n_2 \le b(b - n_1)$ we can make our choices; thereafter $b^2 - n_1 b - n_2$ prefixes of length 2 are left.

• Codewords of length $i$: there are always $b^{i-1} - n_1 b^{i-2} - n_2 b^{i-3} - \cdots - n_{i-1}$ prefixes of length $i-1$ left, and from $K \le 1$ we have

$n_i \le b\left(b^{i-1} - n_1 b^{i-2} - n_2 b^{i-3} - \cdots - n_{i-1}\right)$

so we can choose $n_i$ codewords of length $i$ and enough prefixes remain.

Formally, the argument proceeds by induction on the maximal length of codewords.
Slide 25 of 40
Encoding Strings

Consider a code $c : S \to T^*$ with parameters $n_1, n_2, \ldots, n_M$. We aim to encode strings of length $r$. Let $q_r(i)$ be the number of strings of length $r$ in $S^*$ which are encoded as a string of length $i$ in $T^*$.

Example
Take $c : S \to T^*$ with parameters $n_1 = 2$, $n_2 = 3$; what is $q_2(i)$?
Given $x = x_1 x_2 \in S^2$, the codeword $c(x) \in T^*$ has length 2, 3 or 4.

• $|c(x_1 x_2)| = 2$: then $|c(x_1)| = 1 = |c(x_2)|$; since $n_1 = 2$ we have $q_2(2) = 2 \times 2 = 4$.
• $|c(x_1 x_2)| = 3$: then $|c(x_1)| = 1$ and $|c(x_2)| = 2$ or vice versa, so we get $q_2(3) = 2 \times 3 + 3 \times 2 = 12$.
• $|c(x_1 x_2)| = 4$: then $|c(x_1)| = 2 = |c(x_2)|$, thus $q_2(4) = 3 \times 3 = 9$.
Slide 26 of 40
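These counts can be verified by brute force over a concrete code with these parameters. A sketch (my own example: the slides only fix $n_1 = 2$, $n_2 = 3$, so I pick a ternary PF code a→0, b→1, c→20, d→21, e→22 with exactly those parameters):

```python
from itertools import product
from collections import Counter

# a concrete code with parameters n_1 = 2, n_2 = 3 (hypothetical, ternary alphabet)
c = {"a": "0", "b": "1", "c": "20", "d": "21", "e": "22"}

def q(r):
    """q_r(i): how many source strings of length r encode to a target string of length i."""
    lengths = (sum(len(c[s]) for s in word) for word in product(c, repeat=r))
    return Counter(lengths)

print(q(2))  # counts: q_2(2) = 4, q_2(3) = 12, q_2(4) = 9
```

All $5^2 = 25$ source strings of length 2 are accounted for: $4 + 12 + 9 = 25$.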
Generating Functions

Generating functions are an important tool for investigating sequences, e.g. enumerations of combinatorial objects.

Definition
For a sequence of numbers $q(1), q(2), q(3), \ldots$ the generating function $Q(x)$ is given as the polynomial (formal power series) in the unknown variable $x$:

$Q(x) = q(1)x + q(2)x^2 + q(3)x^3 + \cdots$

For a code $c : S \to T^*$ we consider, for $1 \le i \le rM$:

$q_r(i) = |\{s \in S^r \mid |c(s)| = i\}|$

and the generating function:

$Q_r(x) = q_r(1)x + q_r(2)x^2 + q_r(3)x^3 + \cdots + q_r(rM)x^{rM}$
Slide 27 of 40
Counting Principle (CP)

Theorem
Given a UD code $c : S \to T^*$ with $|c(s)| \le M$ for all $s \in S$ and generating functions $Q_r(x)$; then for all $r \ge 1$:

$Q_r(x) = Q_1(x)^r$.

Example
Take $c : S \to T^*$ with parameters $n_1 = 2$, $n_2 = 3$ as above. What is the relationship between $Q_1$ and $Q_2$?
Clearly $q_1(1) = 2 = n_1$ and $q_1(2) = 3 = n_2$, and thus

$Q_1(x) = 2x + 3x^2 \quad\text{and}\quad Q_2(x) = 4x^2 + 12x^3 + 9x^4$

and thus indeed $(Q_1(x))^2 = Q_1(x) Q_1(x) = Q_2(x)$.
Slide 28 of 40
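The identity $Q_r(x) = Q_1(x)^r$ can be checked numerically: multiplying polynomials is convolving their coefficient lists. A small sketch (my own illustration):

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Q1(x) = 2x + 3x^2 as the coefficient list [q(0), q(1), q(2)]
Q1 = [0, 2, 3]
Q2 = poly_mul(Q1, Q1)
print(Q2)  # [0, 0, 4, 12, 9], i.e. Q2(x) = 4x^2 + 12x^3 + 9x^4
```

The result matches the $q_2(i)$ counts obtained by direct enumeration on the previous slide.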
Proof of the Counting Principle

Proof*.
Consider $Q_1(x)^r = (n_1 x + n_2 x^2 + n_3 x^3 + \cdots + n_M x^M)^r$ with $n_j = q_1(j)$. What is the coefficient $n$ of $x^k$ for $1 \le k \le rM$? To obtain any term of the form $n x^k$ we multiply $r$ terms from $Q_1(x)$, i.e. $n x^k = n_{i_1} x^{i_1} \times n_{i_2} x^{i_2} \times \cdots \times n_{i_r} x^{i_r}$ with $i_1 + i_2 + \cdots + i_r = k$, and then $n = n_{i_1} n_{i_2} \cdots n_{i_r}$. The coefficient of $x^k$ is therefore

$\sum_{i_1 + i_2 + \cdots + i_r = k} n_{i_1} n_{i_2} \cdots n_{i_r}$

Now consider $Q_r(x)$, or rather $q_r(k)$, i.e. the number of words of length $k$ encoded from strings of length $r$: each such codeword is formed as $c(x_1) c(x_2) \cdots c(x_r)$; we can choose any $x_j$ with $|c(x_j)| = i_j$ as long as $i_1 + i_2 + \cdots + i_r = k$, in which case there are $n_{i_1} n_{i_2} \cdots n_{i_r}$ possible (input) strings. In total we again have to consider all decompositions $i_1 + i_2 + \cdots + i_r = k$, so $q_r(k)$ equals the coefficient of $x^k$ in $Q_1(x)^r$.
Slide 29 of 40
Kraft-McMillan Number for UD Codes

Theorem (B. McMillan, 1956)
If there exists a uniquely decodable code $c : S \to T^*$ with parameters $n_1, n_2, \ldots, n_M \in \mathbb{N}$ then $K \le 1$.

$\exists\,\text{UD} \implies K \le 1 \implies \exists\,\text{PF}$, and $\text{PF} \implies \text{UD}$

Corollary
The following statements are equivalent for codes $c : S \to T^*$:
• There exists a UD code with parameters $n_1, n_2, \ldots, n_M$.
• It holds that $K \le 1$ for parameters $n_1, n_2, \ldots, n_M$.
• There exists a PF code with parameters $n_1, n_2, \ldots, n_M$.
Slide 30 of 40
Proof of McMillan’s Theorem

Proof (indirect)*.
Fix $r \ge 1$ and take $x = \frac{1}{b}$ in $Q_r(x)$; we get (with $M = \max_s |c(s)|$):

$Q_r\left(\tfrac{1}{b}\right) = \frac{q_r(1)}{b} + \frac{q_r(2)}{b^2} + \frac{q_r(3)}{b^3} + \cdots + \frac{q_r(rM)}{b^{rM}}$.

UD implies $q_r(j) \le b^j$ and thus $\frac{q_r(j)}{b^j} \le 1$ for all $1 \le j \le rM$, and therefore $Q_r\left(\tfrac{1}{b}\right) \le rM$. From $K = Q_1\left(\tfrac{1}{b}\right)$ and by the CP we have:

$K^r = \left(Q_1\left(\tfrac{1}{b}\right)\right)^r = Q_r\left(\tfrac{1}{b}\right) \le rM \quad\text{thus}\quad \frac{K^r}{r} \le M \;\;\forall r$.

Suppose $K > 1$, i.e. $K = 1 + k$ with $k > 0$; then $K^r = (1+k)^r = 1 + rk + \frac{1}{2}r(r-1)k^2 + \cdots$, hence $K^r > \frac{1}{2}r(r-1)k^2$, i.e. $\frac{K^r}{r} > \frac{1}{2}(r-1)k^2$, which for large $r$ contradicts $\frac{K^r}{r} \le M$.
Slide 31 of 40
Some Probability Theory

Probability theory is concerned with quantifying or measuring the chances that certain events happen.

We consider an event space, i.e. a finite set $\Omega$, together with a set $\mathcal{B} \subseteq \mathcal{P}(\Omega)$ of measurable sets in $\Omega$ which form a Boolean algebra (closed under union, intersection and complement). For finite sets one can use the power-set $\mathcal{B} = \mathcal{P}(\Omega)$.

Probabilities are then assigned via a measure, i.e. a function $\Pr : \mathcal{B} \to \mathbb{R}$ (also written $m : \mathcal{B} \to \mathbb{R}$ or $\mu : \mathcal{B} \to \mathbb{R}$).

Note: For (uncountably) infinite $\Omega$ one needs to develop a more general measure theory (e.g. with $\mathcal{B}$ a $\sigma$-algebra) to overcome problems like $\mu([0,1]) = 1$ but $\mu(\{x\}) = 0$ for all $x \in [0,1]$.
Slide 32 of 40
Finite Probability Spaces

Consider a finite measurable space $(\Omega, \mathcal{B})$ with $|\Omega| = n$.

Definition
A probability (measure) $\Pr : \mathcal{B} \to \mathbb{R}$ on $(\Omega, \mathcal{B})$ has to fulfill:
• $\Pr(\Omega) = 1$.
• $0 \le \Pr(A) \le 1$ for all $A \in \mathcal{B}$.
• $\Pr(A \cup B) = \Pr(A) + \Pr(B)$ for $A \cap B = \emptyset$.

Some further rules (which follow from the axioms above):
• $\Pr(\emptyset) = 0$,
• $\Pr(\bar{A}) = 1 - \Pr(A)$,
• $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$,
• etc.
Slide 33 of 40
Random Distributions

For finite probability spaces $(\Omega, \mathcal{B}, \Pr)$ with $|\Omega| = n$, we can define a probability (measure) via the atoms $\omega \in \Omega$.

Definition
A probability distribution is a function $p : \Omega \to [0,1]$ with

$\sum_{\omega \in \Omega} p(\omega) = 1$.

If we enumerate the elements of $\Omega$ in some arbitrary way as $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ then we can also represent $p$ by a (row) vector in $\mathbb{R}^n$:

$p = (p(\omega_1), p(\omega_2), \ldots, p(\omega_n))$
Slide 34 of 40
Random Variables

Probability distributions on a finite $\Omega$ define a probability (measure) in the obvious way:

$\Pr(A) = \sum_{\omega \in A} p(\omega)$.

Definition
A random variable is a function $X : \Omega \to \mathbb{R}$.

We can represent random variables as (column) vectors in $\mathbb{R}^n$.

Example
Consider a die. The event space describes the top face of the die, i.e. $\Omega = \{⚀, ⚁, ⚂, ⚃, ⚄, ⚅\}$; define $X$ as the random variable counting the number of pips, e.g. $X(⚄) = 5$ etc.
Slide 35 of 40
Moments in Probability

Expectation Value. For a random variable $X$ and probability distribution $p$ we define:

$E(X) = \sum_{\omega \in \Omega} p(\omega) X(\omega) = \sum_i p_i X_i = \mu_X$

One can show: $E(X + Y) = E(X) + E(Y)$ and $E(\alpha X) = \alpha E(X)$.

Example
Consider a (fair) die and the previous random variable $X$; then

$E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6}$

Variance. For a random variable $X$ and distribution $p$ we define:

$\mathrm{Var}(X) = E\left((X - E(X))^2\right) = E(X^2) - (E(X))^2$.

The standard deviation is $\sigma_X = \sqrt{\mathrm{Var}(X)}$.

Slide 36 of 40
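For a finite distribution both moments are one-line sums. A sketch for the fair die (my own illustration, using exact fractions so the results come out as $\frac{21}{6} = \frac{7}{2}$ and $\frac{35}{12}$ rather than floats):

```python
from fractions import Fraction

def E(p, X):
    """Expectation of random variable X (list of values) under distribution p."""
    return sum(pi * xi for pi, xi in zip(p, X))

def Var(p, X):
    """Variance via E(X^2) - (E(X))^2."""
    return E(p, [x * x for x in X]) - E(p, X) ** 2

p = [Fraction(1, 6)] * 6   # fair die
X = [1, 2, 3, 4, 5, 6]     # number of pips on the top face
print(E(p, X))             # 7/2, i.e. 3.5
print(Var(p, X))           # 35/12
```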
Bayes Theorem and Independence

Given two subsets $A$ and $B$ in a probability space $(\Omega, \mathcal{B}, \Pr)$, the conditional probability of $A$ given that $B$ has happened is

$\Pr_B(A) = \Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$

One can show Bayes Theorem, which states:

$\Pr_A(B) = \frac{\Pr_B(A)\Pr(B)}{\Pr(A)} = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)} = \Pr(B \mid A)$.

Given two subsets $A$ and $B$ in a probability space $(\Omega, \mathcal{B}, \Pr)$, $A$ and $B$ are (probabilistically) independent if

$\Pr(B) = \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$, or equivalently $\Pr(A \cap B) = \Pr(A)\Pr(B)$.
Slide 37 of 40
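These identities can be checked mechanically on a small finite space. A sketch using a fair die with the (hypothetical) events A = "even" and B = "at most 4", my own example:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def Pr(A):
    """Uniform measure on the die."""
    return Fraction(len(A), len(omega))

A = {2, 4, 6}     # "even"
B = {1, 2, 3, 4}  # "at most 4"

cond = lambda X, Y: Pr(X & Y) / Pr(Y)  # Pr(X | Y)

# Bayes: Pr(B | A) = Pr(A | B) Pr(B) / Pr(A)
assert cond(B, A) == cond(A, B) * Pr(B) / Pr(A)

# here A and B happen to be independent: Pr(A and B) = Pr(A) Pr(B)
print(Pr(A & B), Pr(A) * Pr(B))  # 1/3 1/3
```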
Products and Probability

Given two probability spaces $(\Omega_1, \Pr_1)$ and $(\Omega_2, \Pr_2)$; to keep things simple use $\mathcal{B}_i = \mathcal{P}(\Omega_i)$. We can define a probability $\Pr$ on the cartesian product $\Omega = \Omega_1 \times \Omega_2$ via:

$\Pr(\langle \omega_1, \omega_2 \rangle) = \Pr_1(\omega_1)\Pr_2(\omega_2)$

If $\Pr_1$ and $\Pr_2$ correspond to probability distributions $p_1$ and $p_2$, and $p$ to $\Pr$, then $p = p_1 \otimes p_2$, i.e. the tensor product.

Caveat: Not all distributions on $\Omega_1 \times \Omega_2$ are a product.

Example
Consider $\Omega_1 = \{0, 1\}$ and $\Omega_2 = \{z, o\}$. The probability with $\Pr(\langle 0, z \rangle) = \Pr(\langle 1, o \rangle) = \frac{1}{2}$ and $\Pr(\langle 0, o \rangle) = \Pr(\langle 1, z \rangle) = 0$ cannot be represented as a product $p_1 \otimes p_2$.
However, one can show: $p = \frac{1}{2}(1,0) \otimes (1,0) + \frac{1}{2}(0,1) \otimes (0,1)$.
Slide 38 of 40
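Both claims of the example can be verified directly. A sketch (my own illustration; for $2 \times 2$ distributions, a product $p_1 \otimes p_2$ always satisfies $p_{00}\,p_{11} = p_{01}\,p_{10}$, since both sides equal $ab(1-a)(1-b)$):

```python
def tensor(p, q):
    """Tensor product of two distribution vectors, flattened row-major."""
    return [pi * qj for pi in p for qj in q]

# correlated distribution on {0,1} x {z,o}, ordered (0,z), (0,o), (1,z), (1,o)
p = [0.5, 0.0, 0.0, 0.5]

# product criterion fails: p[0]*p[3] = 0.25 but p[1]*p[2] = 0, so p is no product
assert p[0] * p[3] != p[1] * p[2]

# but p is the mixture (1/2)(1,0)x(1,0) + (1/2)(0,1)x(0,1):
mix = [0.5 * a + 0.5 * b
       for a, b in zip(tensor([1, 0], [1, 0]), tensor([0, 1], [0, 1]))]
assert mix == p
```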
Tensor/Kronecker Product

Given an $n \times m$ matrix $A$ and a $k \times l$ matrix $B$:

$A = \begin{pmatrix} a_{11} & \ldots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nm} \end{pmatrix} \qquad B = \begin{pmatrix} b_{11} & \ldots & b_{1l} \\ \vdots & \ddots & \vdots \\ b_{k1} & \ldots & b_{kl} \end{pmatrix}$

The tensor or Kronecker product $A \otimes B$ is an $nk \times ml$ matrix:

$A \otimes B = \begin{pmatrix} a_{11}B & \ldots & a_{1m}B \\ \vdots & \ddots & \vdots \\ a_{n1}B & \ldots & a_{nm}B \end{pmatrix}$

Special cases are square matrices ($n = m$ and $k = l$) and vectors (row: $n = k = 1$; column: $m = l = 1$).
Slide 39 of 40
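The block structure translates directly into code. A sketch with matrices as nested lists (my own illustration; numpy users would reach for `numpy.kron` instead):

```python
def kron(A, B):
    """Kronecker product of matrices given as nested lists:
    an (nk) x (ml) matrix built from the blocks a_ij * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
for row in kron(A, B):
    print(row)
# [0, 1, 0, 2]
# [1, 0, 2, 0]
# [0, 3, 0, 4]
# [3, 0, 4, 0]
```

The row-vector case ($n = k = 1$) recovers the tensor product of distributions from the previous slide.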
Correlation

The covariance of two random variables $X$ and $Y$ is:

$\mathrm{Cov}(X, Y) = E\left((X - E(X))(Y - E(Y))\right) = E(XY) - E(X)E(Y)$

The correlation coefficient is $\rho(X, Y) = \mathrm{Cov}(X, Y)/(\sigma_X \sigma_Y)$.

For independent random variables $X$ and $Y$, i.e. if we have $\Pr((X = x_j) \cap (Y = y_k)) = \Pr(X = x_j)\Pr(Y = y_k)$:

$E(XY) = E(X)E(Y) \quad\text{and}\quad \mathrm{Cov}(X, Y) = \rho(X, Y) = 0$.

Note that $\rho(X, Y) = 0$ does not imply independence.

Example
$\Pr(X = x) = \frac{1}{3}$ for $x = -1, 0, 1$ and $Y = X^2$; then $\rho(X, Y) = 0$.
Slide 40 of 40
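The example computes in a few lines; since $Y = X^2$ is a function of $X$, the two are clearly dependent, yet the covariance vanishes. A sketch (my own illustration):

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = [Fraction(1, 3)] * 3

# expectation of f(X) under the uniform distribution p
E = lambda f: sum(pi * f(x) for pi, x in zip(p, xs))

EX, EY = E(lambda x: x), E(lambda x: x * x)   # Y = X^2
cov = E(lambda x: x * (x * x)) - EX * EY      # Cov(X, Y) = E(XY) - E(X)E(Y)
print(cov)  # 0
```

Here $E(XY) = E(X^3) = 0$ and $E(X) = 0$, so the covariance, and with it $\rho(X, Y)$, is zero despite the dependence.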