Information and Coding Theory (CO349)
Coding and Decoding

Herbert Wiklicky
herbert@doc.ic.ac.uk

Department of Computing
Imperial College London

Autumn 2017

Slide 1 of 40

Information and Codes

What will be covered in this course?

• Data and Information Representation
• Data and Information Transmission

We will look at these problems from a quantitative, statistical point of view as well as from an algebraic point of view.

In German, Dutch, Italian, etc., CS is Informatik – Informatica: Information Science rather than Computer Science.

Slide 2 of 40

Practicalities

Two Lecturers

Herbert Wiklicky, herbert@doc.ic.ac.uk, 31

2 weeks until 30 October; open-book coursework test 26 October (or 30?)

Mahdi Cheraghchi, [email protected], 31

2 weeks from 2 November; open-book coursework test 23 November

Exam: Week 11, 11-15 December 2017, 2 hours (answer 3 out of 4 questions).

Different classes, different background, different applications.

Slide 3 of 40

Overview – Information (Theory)

Part I: Information Theory

• Simple Codes and Coding
• Revision: Probabilities
• Information Representation
• Information Transmission
• Information Manifestation

Part II: Coding Theory

• Revision: Linear Algebra
• Linear Codes
• Algebraic Error-Correcting Codes
• Information in Cryptography

Slide 4 of 40

Text Books

• Norman L. Biggs: Codes – An Introduction to Information, Communication and Cryptography, Springer, 2008.
• Ron Roth: Introduction to Coding Theory, Cambridge University Press, 2006.
• David Applebaum: Probability and Information, Cambridge University Press, 1996/2008.
• Robert McEliece: The Theory of Information and Coding, Cambridge University Press, 2002.
• T.M. Cover, J.A. Thomas: Elements of Information Theory, Wiley-Interscience, 2006.
• Robert B. Ash: Information Theory, Wiley, 1965; reprint: Dover, 1980.

Slide 5 of 40

Information and Security

Side-Channel Attacks (Kocher, 1996)
The problem appears in attacks against public-key encryption algorithms like RSA. In (optimised) versions of de/encoding (using modular exponentiation), properties of the secret key determine the execution time.

How much information about the secret key is revealed?

Differential Privacy (Dwork, 2006)
In large (statistical) databases an attacker can try to reveal information about individuals (de-anonymise them). E.g., there are only 3 people under 25 with hair loss registered, and there are two people getting hair-loss treatment in SW7; Kim is on the database, is 21, and lives at 170 Queen's Gate.

How much information about individuals is revealed?

Slide 6 of 40

Information – A Measure of Surprise

The Entropy of a probability distribution p is:

H(p) = −∑x p(x) log2(p(x)).

Example (Cover & Thomas)
Consider a Cheltenham Horse Race with 8 horses with winning chances (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64); then the entropy is 2.

Label the horses 0, 10, 110, 1110, 111100, 111101, 111110, 111111; then we need only 2 bits on average to report the winner.

For equally strong horses (1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8) we have an entropy of 3, and the minimal message length is also 3 bits.

Slide 7 of 40
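A quick numerical check of this example, as a minimal Python sketch (the probabilities and labels are those from the slide):

```python
import math

def entropy(p):
    # Shannon entropy, in bits, of a probability distribution p.
    return -sum(q * math.log2(q) for q in p if q > 0)

p = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
labels = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]

print(entropy(p))                                  # 2.0
print(sum(q * len(w) for q, w in zip(p, labels)))  # 2.0 bits on average
print(entropy([1/8] * 8))                          # 3.0
```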

Surprise vs Prejudice

A father and son are on a fishing trip in the mountains of Wales. On the way back home their car has a serious accident.

The father is immediately killed and declared dead at the site of the accident. However, the son is severely injured and driven by ambulance to the nearest hospital.

When the son is brought into the operating theatre, the surgeon exclaims: "I can't do this, he is my son."

Scientific American, 1980s

Slide 8 of 40

Everything You Always Wanted to Know About Logarithms But Were Afraid to Ask

Defined (implicitly) via the equation:

x = b^(logb(x))

Common logarithms: b = 2, b = 10, and b = 2.71828 . . . = e

logb(xy) = logb(x) + logb(y)
logb(x^n) = n logb(x), . . . , logb(x) = loga(x) / loga(b)

Example
log2(64) = 6 as 2^6 = 64; log2(1/64) = −6 as 1/64 = 2^(−6) (2^(−1) = 1/2).

Slide 9 of 40
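These identities are easy to verify numerically; a small sketch (math.log(x, b) computes logb(x)):

```python
import math

assert math.isclose(math.log2(64), 6)      # since 2**6 == 64
assert math.isclose(math.log2(1/64), -6)   # since 1/64 == 2**-6
# Product rule: log_b(xy) = log_b(x) + log_b(y)
assert math.isclose(math.log2(8 * 4), math.log2(8) + math.log2(4))
# Change of base: log_b(x) = log_a(x) / log_a(b)
assert math.isclose(math.log(100, 10), math.log(100) / math.log(10))
```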

Messages and Strings

Definition
An alphabet is a finite set S; its members s ∈ S are referred to as symbols.

Definition
A message m in the alphabet S is a finite sequence of elements of S:

m = s1 s2 · · · sn with si ∈ S, 1 ≤ i ≤ n.

A message is often also referred to as a string or word in S. The number n ∈ N is the length of m.

Notation for length: |m| = n; prefix: m|k = s1 s2 · · · sk (with k ≤ n).

Slide 10 of 40

Notation

We also consider the string of length zero, often denoted ε, with |ε| = 0.

By S^0 we denote the set of words containing only ε; S^n denotes the set of all strings of length n; and

S∗ = S^0 ∪ S^1 ∪ S^2 ∪ · · ·

Example

With the alphabet S = B = {0, 1} we can produce all binary messages of a certain length n. Concretely, for example:

B^3 = {000, 001, 010, 011, 100, 101, 110, 111}

Slide 11 of 40
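S^n is easy to generate mechanically; a minimal sketch using itertools:

```python
from itertools import product

S = "01"  # the binary alphabet B

def strings_of_length(S, n):
    # All strings in S^n: n-fold Cartesian product, joined into strings.
    return ["".join(t) for t in product(S, repeat=n)]

print(strings_of_length(S, 3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
print(strings_of_length(S, 0))  # [''] -- S^0 contains only epsilon
```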

Coding

Definition
Let S and T be two alphabets. A code c is an injective function

c : S → T∗

For each symbol s ∈ S the string c(s) ∈ T∗ is called the codeword for s. The set of all codewords

C = {c(s) | s ∈ S}

is also referred to as the code.

For |T| = 2, e.g. T = B, we have a binary code; for |T| = 3 a ternary code; and for |T| = b a b-ary code.

Slide 12 of 40
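In code, a code c is naturally a dictionary from symbols to codewords, and injectivity means no two symbols share a codeword. A sketch (the concrete symbols and codewords below are the ones used in a later example):

```python
def is_code(c):
    # c: dict mapping symbols to codewords. A code must be injective:
    # distinct symbols must get distinct codewords.
    codewords = list(c.values())
    return len(set(codewords)) == len(codewords)

c = {"x": "0", "y": "10", "z": "11"}  # a binary code, |T| = 2
print(is_code(c))       # True
print(set(c.values()))  # the code C = {'0', '10', '11'}
```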

Example: Morse Code

Example
Developed from 1836 by Samuel F. B. Morse (1791–1872). This is a code c : {A, . . . , Z} → {•, –}∗ (with variations):

A • –        H • • • •    O – – –      V • • • –
B – • • •    I • •        P • – – •    W • – –
C – • – •    J • – – –    Q – – • –    X – • • –
D – • •      K – • –      R • – •      Y – • – –
E •          L • – • •    S • • •      Z – – • •
F • • – •    M – –        T –
G – – •      N – •        U • • –

Most famous (after April 1912) perhaps:

SOS ↦ • • •  – – –  • • • ≡ • • • – – – • • •

Slide 13 of 40

Decoding

Definition
A code c : S → T∗ is extended to S∗ as follows: given a string s1 s2 · · · sn in S∗ we define

c(s1 s2 · · · sn) = c(s1) c(s2) · · · c(sn)

Note: This way we overload the function c (polymorphism).

Definition
A code c : S → T∗ is uniquely decodable (short UD) if the extended code function c : S∗ → T∗ is injective.

This means that every string in T∗ corresponds to at most one message in S∗.

Slide 14 of 40

Example

Example

Let S = {x, y, z} and take a code c : S → B∗ defined as

x ↦ 0, y ↦ 10, z ↦ 11

Take the original stream yzxxzyxzyy · · · in S∗ and encode it.

Is it always possible to decode the encoded stream in T∗ = B∗?

Example

Same code as above. Consider the encoded stream

1000111000 · · ·

What is the original stream? Is it uniquely determined?

Slide 15 of 40
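Encoding with the extended code is just concatenation of codewords; a minimal sketch answering the first question:

```python
c = {"x": "0", "y": "10", "z": "11"}

def encode(c, message):
    # Extended code: c(s1 s2 ... sn) = c(s1) c(s2) ... c(sn)
    return "".join(c[s] for s in message)

print(encode(c, "yzxxzyxzyy"))  # '10110011100111010'
```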

The Decoding Problem

Given a sequence in T∗, can we (sequentially) decode it?

Example
Same code, with C = {0, 10, 11}. Consider 100111000 · · ·
Take the prefix of length 1, i.e. 1: it is not in the codebook C.
Take the prefix of length 2, i.e. 10: this matches y.
Consider the remainder stream 0111000 · · ·
Take the prefix of length 1, i.e. 0: this matches x.
Consider the remainder stream 111000 · · ·
Take the prefix of length 1, i.e. 1: it is not in the codebook C.
Take the prefix of length 2, i.e. 11: this matches z.

Example
Without the 'pause' in Morse code we could not decode it: IEEE is •• • • •, but without pauses it is just five dots, which could equally be read as EEEEE, SI, SEE, etc.

Slide 16 of 40
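The sequential procedure above is exactly greedy prefix matching; a minimal sketch (it assumes the code is prefix-free, so the first codeword that matches is the right one):

```python
def decode(c, stream):
    # Greedy sequential decoding for a prefix-free code: grow a buffer
    # bit by bit and emit a symbol whenever the buffer is a codeword.
    inverse = {w: s for s, w in c.items()}
    out, buf = [], ""
    for bit in stream:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("stream ends inside a codeword")
    return "".join(out)

c = {"x": "0", "y": "10", "z": "11"}
print(decode(c, "100111000"))  # 'yxzyxx'
```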

Prefix-Free

Definition
A code c : S → T∗ is prefix-free (short PF) if there is no pair of codewords q = c(s) and q′ = c(s′) such that

q′ = qr for some non-empty word r ∈ T∗.

Example

Let S = {w, x, y, z} with c : S → B∗ defined by

w ↦ 10, x ↦ 01, y ↦ 11, z ↦ 011

This code is not PF. It is not UD either, because:

wxwy ↦ 10011011 as well as wzz ↦ 10011011

Slide 17 of 40
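Prefix-freeness is easy to test directly, and the counterexample above can be replayed; a sketch:

```python
def is_prefix_free(C):
    # No codeword may be a proper prefix of another codeword.
    return not any(q != r and r.startswith(q) for q in C for r in C)

print(is_prefix_free({"0", "10", "11"}))  # True

C = {"w": "10", "x": "01", "y": "11", "z": "011"}
print(is_prefix_free(set(C.values())))    # False: 01 is a prefix of 011

enc = lambda m: "".join(C[s] for s in m)
print(enc("wxwy"), enc("wzz"))            # both '10011011' -- not UD
```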

PF implies UD

Theorem
If a code c : S → T∗ is prefix-free then it is uniquely decodable.

Proof (indirect).

Take x = x1 x2 · · · xm and y = y1 y2 · · · yn such that c(x) = c(y), with c(x) = c(x1) c(x2) · · · c(xm) and c(y) = c(y1) c(y2) · · · c(yn).

The prefixes (of length k) must be the same, c(x)|k = c(y)|k, for any k. It might be that |c(x1)| ≠ |c(y1)|.

Assume w.l.o.g. |c(x1)| ≤ |c(y1)| and take k = |c(x1)|; then c(x)|k = c(x1) = c(y1)|k and thus c(y1) = c(x1) r for some r ∈ T∗. If r is non-empty this contradicts PF; otherwise c(x1) = c(y1), hence x1 = y1, and we repeat the argument on the rest of c(x) = c(y). Formally: induction on the lengths m and n.

Slide 18 of 40

Strings as Paths in Trees

If we concentrate on binary codes with T = B then we can see any possible codeword in C as a (finite) path in a binary tree:

[Figure: infinite binary tree; the two edges leaving each node are labelled 0 and 1, so every binary string corresponds to a path from the root.]

In general, we have to consider a b-ary tree for |T| = b.

Slide 19 of 40

Codes as (Sub)-Trees

The codewords in C can be represented by the end-points of the corresponding paths. If a code C is PF then no descendant of a codeword's end-point can itself represent a codeword in C.

Example

The code C = {0, 10, 110, 111} is PF; its corresponding tree is

[Figure: binary tree with codewords marked at the end-points 0, 10, 110 and 111; no marked node lies on the path to another.]

Slide 20 of 40

Parameters of a Code

Definition
Given a code c : S → T∗, let ni denote the number of symbols s ∈ S which are encoded by strings of length i, i.e.

ni = |{s ∈ S : |c(s)| = i}| = |{s ∈ S : c(s) ∈ T^i}|

We refer to n1, n2, · · · , nM as the parameters of c, where M is the maximal length of a codeword.

The number of all potential codewords of length i is given by bi = |T^i|; in particular, b = |T| = b1.

Note that ni/bi gives the fraction of possible codewords of length i which is actually used by the code (a "filling rate").

Slide 21 of 40
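The parameters can be read off any code by counting codeword lengths; a sketch using the PF code from the previous example:

```python
from collections import Counter

def parameters(C):
    # n_i = number of codewords of length i, for i = 1..M.
    lengths = Counter(len(w) for w in C)
    M = max(lengths)
    return [lengths.get(i, 0) for i in range(1, M + 1)]

C = {"0", "10", "110", "111"}
print(parameters(C))  # [1, 1, 2], i.e. n1 = 1, n2 = 1, n3 = 2
```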

Kraft-McMillan Number

Definition
Given a code c : S → T∗ with parameters n1, n2, · · · , nM, the Kraft-McMillan number of c is defined as:

K = ∑_{i=1}^{M} ni / b^i = n1/b + n2/b^2 + · · · + nM/b^M

Example

Consider a binary code, i.e. b = b1 = 2. The parameters of the code are n1 = 1, n2 = 1, n3 = 2 (as in the last example); then bi = 2^i, i.e. b1 = 2, b2 = 4, b3 = 8, and

K = 1/2 + 1/4 + 2/8 = 8/8 = 1

Slide 22 of 40
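Computing K from the parameters, with exact fractions; a sketch:

```python
from fractions import Fraction

def kraft_mcmillan(params, b):
    # K = sum over i of n_i / b**i, for parameters n_1..n_M.
    return sum(Fraction(n, b**i) for i, n in enumerate(params, start=1))

print(kraft_mcmillan([1, 1, 2], 2))     # 1      (= 1/2 + 1/4 + 2/8)
print(kraft_mcmillan([0, 2, 3, 1], 2))  # 15/16  (the next example)
```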

Existence of PF Codes

Theorem (L.G. Kraft, 1949)

Given an alphabet T with |T| = b and (code) parameters n1, n2, · · · , nM ∈ N. If K ≤ 1, there exists a PF b-ary code c : S → T∗ with these parameters.

Example

Construct a PF binary code with n2 = 2, n3 = 3, n4 = 1.
First check that K = 8/16 + 6/16 + 1/16 = 15/16 ≤ 1.

• Codewords of length 2: take any two, e.g. 00 and 01.
• Codewords of length 3: we can't take 00 · · · or 01 · · · , but we can take all of the form 1 s2 s3 with s2, s3 ∈ B; in fact we need only three of 100, 101, 110, 111, e.g. all but 111.
• Codeword(s) of length 4: take either 1110 or 1111.

• Codeword(s) of length 4: Take either 1110 or 1111.

Slide 23 of 40
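The construction behind the theorem can be run mechanically: assign codewords in order of increasing length, always taking the lexicographically first string that extends no already-chosen codeword. A sketch of this greedy idea (not a particular implementation from the notes); it succeeds whenever K ≤ 1:

```python
from itertools import product

def pf_code_from_parameters(params, T="01"):
    # Greedily pick n_i codewords of length i, skipping any candidate
    # that has an already-chosen codeword as a prefix.
    chosen = []
    for i, n in enumerate(params, start=1):
        for cand in ("".join(t) for t in product(T, repeat=i)):
            if n == 0:
                break
            if not any(cand.startswith(w) for w in chosen):
                chosen.append(cand)
                n -= 1
    return chosen

print(pf_code_from_parameters([0, 2, 3, 1]))
# ['00', '01', '100', '101', '110', '1110'] -- the choices from the slide
```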

Proof of Kraft’s Theorem

Proof.
Any partial sum of the sum defining K also fulfills ∑ ni/b^i ≤ 1, e.g.

n1/b ≤ 1,  n1/b + n2/b^2 ≤ 1,  n1/b + n2/b^2 + n3/b^3 ≤ 1,  etc.

This implies:

n1 ≤ b,  n2 ≤ b(b − n1),  n3 ≤ b(b^2 − n1 b − n2),  etc.

or in general:

ni ≤ b (b^(i−1) − n1 b^(i−2) − n2 b^(i−3) − . . . − n_(i−1)).

• Codewords of length 1: since n1 ≤ b1 = b we can choose any n1 of them; and b − n1 words of length 1 remain as prefixes for words of length 2, 3, . . .

Slide 24 of 40

Proof of Kraft’s Theorem (cont.)

Proof (cont.)

• Codewords of length 2: there are b^2 codewords of length 2, but b·n1 of them are unavailable, so b^2 − b·n1 are left. But we have n2 ≤ b(b − n1), so we can make our choices. Thereafter b^2 − n1 b − n2 prefixes of length 2 are left.

• Codewords of length i: there are always b^(i−1) − n1 b^(i−2) − n2 b^(i−3) − . . . − n_(i−1) prefixes of length i − 1 left, and from K ≤ 1 we have

ni ≤ b (b^(i−1) − n1 b^(i−2) − n2 b^(i−3) − . . . − n_(i−1))

so we can choose the ni codewords of length i and enough prefixes remain.

Formally, by induction on the maximal length of codewords.

Slide 25 of 40

Encoding Strings

Consider a code c : S → T∗ with parameters n1, n2, . . . , nM. We aim to encode strings of length r. Let q_r(i) be the number of strings of length r in S∗ that are encoded as a string of length i in T∗.

Example

Take c : S → T∗ with parameters n1 = 2, n2 = 3; what is q2(i)? Given x = x1 x2 ∈ S^2, the codeword c(x) ∈ T∗ has length 2, 3 or 4.

• For |c(x1 x2)| = 2: then |c(x1)| = 1 = |c(x2)|; since n1 = 2 we have q2(2) = 2 × 2 = 4.

• For |c(x1 x2)| = 3: then |c(x1)| = 1 and |c(x2)| = 2 or vice versa, so we get q2(3) = 2 × 3 + 3 × 2 = 12.

• For |c(x1 x2)| = 4: then |c(x1)| = 2 = |c(x2)|, thus we know that q2(4) = 3 × 3 = 9.

Slide 26 of 40
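q_r(i) can be checked by brute force over all strings of length r; a sketch reproducing q2 for parameters n1 = 2, n2 = 3 (the concrete symbols and codewords below are an illustrative choice with exactly those parameters):

```python
from itertools import product
from collections import Counter

# An illustrative code with parameters n1 = 2, n2 = 3.
c = {"a": "0", "b": "1", "v": "00", "w": "01", "x": "10"}

def q(r):
    # q_r(i) = number of source strings of length r whose encoding
    # has length i; count over all |S|**r strings.
    enc = lambda m: "".join(c[s] for s in m)
    return Counter(len(enc(m)) for m in product(c, repeat=r))

print(sorted(q(2).items()))  # [(2, 4), (3, 12), (4, 9)]
```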

Generating Functions

Generating functions are an important tool for investigating sequences, e.g. enumerations of combinatorial objects.

Definition
For a sequence of numbers q(1), q(2), q(3), … the generating function Q(x) is given as the polynomial (formal power series) in the unknown variable x:

Q(x) = q(1) x + q(2) x^2 + q(3) x^3 + ⋯

For a code c : S → T^* we consider, for 1 ≤ i ≤ rM:

q_r(i) = |{ s : |s| = r, |c(s)| = i }|

and the generating function:

Q_r(x) = q_r(1) x + q_r(2) x^2 + q_r(3) x^3 + ⋯ + q_r(rM) x^{rM}

Slide 27 of 40
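
Computationally, a generating function of this kind is just its list of coefficients. A brief sketch (our own) representing Q_1(x) = 2x + 3x^2 from the running example with numpy; evaluating it at x = 1/b recovers the Kraft number K, which reappears in the proof of McMillan's theorem below:

    import numpy as np
    from numpy.polynomial import polynomial as P

    Q1 = np.array([0.0, 2.0, 3.0])       # coefficients of x^0, x^1, x^2

    # K = Q1(1/b) = n_1/b + n_2/b^2; for b = 3 this is exactly 1
    print(P.polyval(1/3, Q1))            # 1.0 (up to rounding)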

Counting Principle (CP)

Theorem
Given a UD code c : S → T^* with |c(s)| ≤ M for all s ∈ S and generating functions Q_r(x), then for all r ≥ 1:

Q_r(x) = (Q_1(x))^r.

Example
Take c : S → T^* with parameters n_1 = 2, n_2 = 3 as above. What is the relationship between Q_1 and Q_2? Clearly q_1(1) = 2 = n_1 and q_1(2) = 3 = n_2, and thus

Q_1(x) = 2x + 3x^2 and Q_2(x) = 4x^2 + 12x^3 + 9x^4,

and thus indeed (Q_1(x))^2 = Q_1(x) Q_1(x) = Q_2(x).

Slide 28 of 40
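
Since multiplying polynomials convolves their coefficient lists, the counting principle can be checked numerically; a small sketch (our own) for the running example:

    import numpy as np
    from numpy.polynomial import polynomial as P

    Q1 = [0, 2, 3]                       # Q1(x) = 2x + 3x^2
    Q2 = P.polymul(Q1, Q1)               # polynomial product = convolution
    print(Q2)                            # [ 0.  0.  4. 12.  9.]

The result encodes Q_2(x) = 4x^2 + 12x^3 + 9x^4, matching the brute-force counts q_2(i) computed earlier.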

Proof of the Counting Principle

Proof ∗.
Consider Q_1(x)^r = (n_1 x + n_2 x^2 + n_3 x^3 + ⋯ + n_M x^M)^r with n_j = q_1(j). What is the coefficient n of x^k for 1 ≤ k ≤ rM? To obtain any term of the form n x^k we multiply r terms from Q_1(x):

n x^k = n_{i_1} x^{i_1} × n_{i_2} x^{i_2} × ⋯ × n_{i_r} x^{i_r} with i_1 + i_2 + ⋯ + i_r = k,

and then n = n_{i_1} n_{i_2} ⋯ n_{i_r}. The coefficient of x^k is therefore

Σ_{i_1 + i_2 + ⋯ + i_r = k} n_{i_1} n_{i_2} ⋯ n_{i_r}.

Now consider Q_r(x), i.e. q_r(k), the number of codewords of length k encoded from strings of length r: each such codeword is formed as c(x_1)c(x_2)⋯c(x_r); we can choose any x_j with |c(x_j)| = i_j as long as i_1 + i_2 + ⋯ + i_r = k, in which case there are n_{i_1} n_{i_2} ⋯ n_{i_r} possible (input) strings. In total we again sum over all ways of decomposing k = i_1 + i_2 + ⋯ + i_r, so both coefficients agree.

Slide 29 of 40

Kraft-McMillan Number for UD Codes

Theorem (B. McMillan, 1956)
If there exists a uniquely decodable code c : S → T^* with parameters n_1, n_2, …, n_M ∈ N then K ≤ 1.

∃ UD =⇒ K ≤ 1 =⇒ ∃ PF and PF =⇒ UD

Corollary
The following statements are equivalent for codes c : S → T^*:
• There exists a UD code with parameters n_1, n_2, …, n_M.
• It holds that K ≤ 1 for the parameters n_1, n_2, …, n_M.
• There exists a PF code with parameters n_1, n_2, …, n_M.

Slide 30 of 40
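
As a computation, the Kraft-McMillan criterion is a one-liner; a minimal sketch (our own) deciding whether a UD (equivalently, PF) code with given parameters can exist:

    from fractions import Fraction

    def kraft_number(ns, b):
        """K = sum of n_i / b^i for n_i codewords of length i."""
        return sum(Fraction(n, b ** i) for i, n in enumerate(ns, 1))

    print(kraft_number([2, 3], b=3))     # 1    -> a UD/PF code exists
    print(kraft_number([2, 3], b=2))     # 7/4  -> no such code exists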

Proof of McMillan’s Theorem

Proof (indirect) ∗.
Fix r ≥ 1 and take x = 1/b in Q_r(x); with M = max |c(s)| we get:

Q_r(1/b) = q_r(1)/b + q_r(2)/b^2 + q_r(3)/b^3 + ⋯ + q_r(rM)/b^{rM}.

UD implies q_r(i) ≤ b^i (an injective encoding can map at most b^i source strings to the b^i words of length i), and thus q_r(j)/b^j ≤ 1 for all 1 ≤ j ≤ rM; therefore Q_r(1/b) ≤ rM. From K = Q_1(1/b) and by CP we have:

K^r = (Q_1(1/b))^r = Q_r(1/b) ≤ rM, thus K^r/r ≤ M for all r.

Suppose K > 1, i.e. K = 1 + k with k > 0; then K^r = (1 + k)^r = 1 + rk + (1/2) r(r−1) k^2 + ⋯, thus in particular K^r > (1/2) r(r−1) k^2, or K^r/r > (1/2)(r−1) k^2, which, for large r, contradicts K^r/r ≤ M.

Slide 31 of 40

Some Probability Theory

Probability theory is concerned with quantifying or measuring the chances that certain events can happen.

We consider an event space, i.e. a finite set Ω and a set B ⊆ P(Ω) of measurable sets in Ω which forms a Boolean algebra (based on union, intersection and complement). For finite sets one can use the power-set B = P(Ω).

Probabilities are then assigned via a measure, i.e. a function Pr : B → R or m : B → R or µ : B → R.

Note: For (uncountably) infinite Ω one needs to develop a more general measure theory (e.g. with B a σ-algebra) to overcome problems like µ([0,1]) = 1 but µ({x}) = 0 for all x ∈ [0,1].

Slide 32 of 40

Finite Probability Spaces

Consider a finite measurable space (Ω,B) with |Ω| = n.

Definition
A probability (measure) Pr : B → R on (Ω,B) has to fulfil:
• Pr(Ω) = 1.
• 0 ≤ Pr(A) ≤ 1 for all A ∈ B.
• Pr(A ∪ B) = Pr(A) + Pr(B) for A ∩ B = ∅.

Some further rules (which follow from the axioms above):
• Pr(∅) = 0,
• Pr(Ω \ A) = 1 − Pr(A),
• Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B),
• etc.

Slide 33 of 40
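
These axioms and the derived rules are easy to sanity-check on a finite example; a small sketch (our own) with the uniform (counting) measure on six outcomes:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}
    def Pr(A):                           # uniform probability measure
        return Fraction(len(A & omega), len(omega))

    A, B = {1, 2, 3}, {3, 4}
    assert Pr(omega) == 1 and Pr(set()) == 0
    assert Pr(omega - A) == 1 - Pr(A)                 # complement rule
    assert Pr(A | B) == Pr(A) + Pr(B) - Pr(A & B)     # inclusion-exclusion
    print(Pr(A | B))                                  # 2/3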

Random Distributions

For finite probability spaces (Ω,B,Pr) with |Ω| = n, we can define a probability (measure) via the atoms ω ∈ Ω.

Definition
A probability distribution is a function p : Ω → [0,1] with

Σ_{ω∈Ω} p(ω) = 1.

If we enumerate the elements of Ω in some arbitrary way as Ω = {ω_1, ω_2, …, ω_n} then we can also represent p by a (row) vector in R^n:

p = (p(ω_1), p(ω_2), …, p(ω_n))

Slide 34 of 40

Random Variables

Probability distributions on a finite Ω define a probability (measure) in the obvious way:

Pr(A) = Σ_{ω∈A} p(ω).

Definition
A random variable is a function X : Ω → R.

We can represent random variables as (column) vectors in R^n.

Example
Consider a die. The event space describes the top face of the die, i.e. Ω = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}; define X as the variable which counts the number of eyes, e.g. X(⚄) = 5 etc.

Slide 35 of 40
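
With p as a row vector and X as a column vector, events and random variables become plain array operations; a short sketch (our own) for the fair die:

    import numpy as np

    p = np.full(6, 1/6)                  # distribution of a fair die
    X = np.arange(1, 7)                  # number of eyes on each face

    A = [0, 1]                           # event "one or two eyes" (as indices)
    print(p[A].sum())                    # Pr(A) = 1/3
    print(p @ X)                         # p . X = 3.5, the expectation below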

Moments in Probability

Expectation Value. For a random variable X and probability distribution p we define:

E(X) = Σ_{ω∈Ω} p(ω) X(ω) = Σ_i p_i X_i = µ_X

One can show: E(X + Y) = E(X) + E(Y) and E(αX) = αE(X).

Example
Consider a (fair) die and the previous random variable X, then

E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 21/6

Variance. For a random variable X and distribution p we define:

Var(X) = E((X − E(X))^2) = E(X^2) − (E(X))^2.

The standard deviation is σ_X = √Var(X).

Slide 36 of 40
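
A quick numerical check of these formulas for the fair die (a sketch, our own):

    import numpy as np

    p = np.full(6, 1/6)
    X = np.arange(1, 7)

    EX = p @ X                           # 21/6 = 3.5
    var = p @ (X - EX) ** 2              # E((X - E(X))^2)
    print(EX, var)                       # 3.5 2.916...
    print(np.isclose(var, p @ X**2 - EX**2))   # True: the two forms agree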

Bayes Theorem and Independence

Given two subsets A and B in a probability space (Ω,B,Pr), the conditional probability of A given that B has happened is

Pr_B(A) = Pr(A | B) = Pr(A ∩ B) / Pr(B)

One can show Bayes Theorem, which states:

Pr_A(B) = Pr_B(A) Pr(B) / Pr(A) = Pr(A | B) Pr(B) / Pr(A) = Pr(B | A).

Given two subsets A and B in a probability space (Ω,B,Pr), A and B are (probabilistically) independent if

Pr(B) = Pr(B | A) = Pr(A ∩ B) / Pr(A), or equivalently Pr(A ∩ B) = Pr(A) Pr(B).

Slide 37 of 40
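
A small sketch (our own) verifying the conditional probability and Bayes Theorem on the die space from above:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}
    Pr = lambda A: Fraction(len(A & omega), len(omega))
    cond = lambda A, B: Pr(A & B) / Pr(B)             # Pr(A | B)

    A = {2, 4, 6}                        # "even number of eyes"
    B = {4, 5, 6}                        # "at least four eyes"
    assert cond(B, A) == cond(A, B) * Pr(B) / Pr(A)   # Bayes Theorem
    print(cond(A, B))                    # Pr(A | B) = 2/3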

Products and Probability

Given two probability spaces (Ω_1,Pr_1) and (Ω_2,Pr_2); to keep things simple use B_i = P(Ω_i). We can define a probability Pr on the cartesian product Ω = Ω_1 × Ω_2 via:

Pr(〈ω_1, ω_2〉) = Pr_1(ω_1) Pr_2(ω_2)

If Pr_1 and Pr_2 correspond to probability distributions p_1 and p_2, and p to Pr, then p = p_1 ⊗ p_2, i.e. the tensor product.

Caveat: Not all distributions on Ω_1 × Ω_2 are a product.

Example
Consider Ω_1 = {0,1} and Ω_2 = {z,o}; the probability with Pr(〈0,z〉) = Pr(〈1,o〉) = 1/2 and Pr(〈0,o〉) = Pr(〈1,z〉) = 0 cannot be represented as a product p_1 ⊗ p_2. However, one can show: p = (1/2)(1,0) ⊗ (1,0) + (1/2)(0,1) ⊗ (0,1).

Slide 38 of 40
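
Both claims are easy to check with numpy, where the tensor product of distribution vectors is np.kron (a sketch, our own):

    import numpy as np

    p1 = np.array([0.5, 0.5])            # marginal on Omega_1 = {0, 1}
    p2 = np.array([0.5, 0.5])            # marginal on Omega_2 = {z, o}
    print(np.kron(p1, p2))               # product distribution: [0.25 0.25 0.25 0.25]

    # the correlated distribution from the example above
    p = 0.5 * np.kron([1, 0], [1, 0]) + 0.5 * np.kron([0, 1], [0, 1])
    print(p)                             # [0.5 0.  0.  0.5]

Its marginals are again p1 and p2, but p1 ⊗ p2 is uniform, so p is not a product.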

Tensor/Kronecker Product

Given an n × m matrix A and a k × l matrix B:

A = ( a_11 … a_1m )        B = ( b_11 … b_1l )
    (  ⋮   ⋱   ⋮  )            (  ⋮   ⋱   ⋮  )
    ( a_n1 … a_nm )            ( b_k1 … b_kl )

The tensor or Kronecker product A ⊗ B is an nk × ml matrix:

A ⊗ B = ( a_11 B … a_1m B )
        (   ⋮    ⋱    ⋮   )
        ( a_n1 B … a_nm B )

Special cases are square matrices (n = m and k = l) and vectors (row: n = k = 1, column: m = l = 1).

Slide 39 of 40
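
numpy implements exactly this block construction; a short check (our own):

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])
    B = np.array([[0, 1]])               # a 1 x 2 matrix

    print(np.kron(A, B))                 # 2 x 4: each a_ij becomes the block a_ij * B
    # [[0 1 0 2]
    #  [0 3 0 4]]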

Correlation

The covariance of two random variables X and Y is:

Cov(X,Y) = E((X − E(X))(Y − E(Y))) = E(XY) − E(X) E(Y)

The correlation coefficient is ρ(X,Y) = Cov(X,Y)/(σ_X σ_Y).

For independent random variables X and Y – i.e. if we have Pr((X = x_j) ∩ (Y = y_k)) = Pr(X = x_j) Pr(Y = y_k):

E(XY) = E(X) E(Y) and Cov(X,Y) = ρ(X,Y) = 0.

Note that ρ(X,Y) = 0 does not imply independence.

Example
Pr(X = x) = 1/3 for x = −1, 0, 1 and Y = X^2; then ρ(X,Y) = 0.

Slide 40 of 40
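
A sketch (our own) computing the covariance in this uncorrelated-but-dependent example:

    import numpy as np

    p = np.full(3, 1/3)                  # Pr(X = x) = 1/3 for x = -1, 0, 1
    X = np.array([-1.0, 0.0, 1.0])
    Y = X ** 2                           # fully determined by X

    cov = p @ (X * Y) - (p @ X) * (p @ Y)
    print(cov)                           # 0.0: uncorrelated, yet not independent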
