EE514a – Information Theory I
Fall Quarter 2019
Prof. Jeff Bilmes
University of Washington, Seattle
Department of Electrical & Computer Engineering
https://class.ece.uw.edu/514/bilmes/ee514_fall_2019/
Lecture 4 - Oct 7th, 2019
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F1/50(pg.1/244)
-
Logistics Review
Class Road Map - IT-I
L1 (9/25): Overview, Communications, Information, Entropy
L2 (9/30): Entropy, Mutual Information, KL-Divergence
L3 (10/2): More KL, Jensen, more Venn, Log Sum, Data Proc. Inequality
L4 (10/7): Data Proc. Ineq., thermodynamics, Stats, Fano
L5 (10/9): Modes of Convergence, AEP, Source Coding
L6 (10/14):
L7 (10/16):
L8 (10/21):
L9 (10/23):
L10 (10/28):
L11 (10/30):
L12 (11/4):
LXX (11/6): In class midterm exam
LXX (11/11): Veterans Day holiday
L13 (11/13):
L14 (11/18):
L15 (11/20):
L16 (11/25):
L17 (11/27):
L18 (12/2):
L19 (12/4):
LXX (12/10): Final exam
Finals Week: December 9th–13th.
-
Cumulative Outstanding Reading
Read chapters 1 and 2 in our book (Cover & Thomas, “Information Theory”) (including Fano’s inequality).
Read chapters 3 and 4 in our book (Cover & Thomas, “Information Theory”).
-
Homework
Homework 1, on our assignment dropbox (https://canvas.uw.edu/courses/1319497/assignments), was due Tuesday, Oct 8th, 11:55pm.
Homework 2 will be out by early next week.
-
Fano’s Inequality: Summary
Consider the following situation, where we send X through a noisy channel, receive Y, and do further processing:

X → noisy channel p(y|x) → Y → processing g(·) → X̂
X̂ is an estimate of X.
An error occurs if X ≠ X̂. How do we measure the error? With its probability, Pe ≜ p(X ≠ X̂). Intuitively, conditional entropy should tell us something about the error possibilities; in fact, we have
Theorem 4.2.7 (Fano’s Inequality)
H(Pe) + Pe log(|X| − 1) ≥ H(X|X̂) ≥ H(X|Y)   (4.34)
-
Fano’s Inequality
Theorem 4.2.7
H(Pe) + Pe log(|X| − 1) ≥ H(X|X̂) ≥ H(X|Y)   (4.28)
So Pe = 0 requires that H(X|Y) = 0! Note, the theorem simplifies to (and implies) 1 + Pe log(|X|) ≥ H(X|Y), or

Pe ≥ (H(X|Y) − 1) / log |X|   (4.29)
yielding a lower bound on the error.
This will be used to prove the converse to Shannon’s coding theorem, i.e., that any code with probability of error → 0 as the block length increases must have a rate R < C = the capacity of the channel (to be defined).
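The weakened bound (4.29) is easy to evaluate numerically. Below is a small Python sketch; the 3-symbol symmetric channel and its crossover probability are hypothetical choices for illustration, not taken from the slides:

```python
import math

def entropy(ps):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def cond_entropy(joint):
    """H(X|Y) in bits, from a joint table joint[x][y] = p(x, y)."""
    nx, ny = len(joint), len(joint[0])
    h = 0.0
    for y in range(ny):
        py = sum(joint[x][y] for x in range(nx))
        if py > 0:
            h += py * entropy([joint[x][y] / py for x in range(nx)])
    return h

# Hypothetical setup: X uniform over 3 symbols, sent through a symmetric
# channel that delivers the correct symbol with probability 1/2.
joint = [[(0.5 if x == y else 0.25) / 3 for y in range(3)] for x in range(3)]
hxy = cond_entropy(joint)            # H(X|Y) = 1.5 bits for this channel
pe_lower = (hxy - 1) / math.log2(3)  # weakened Fano bound (4.29), about 0.315
```

For this channel the best (MAP) decoder errs with probability 0.5, which indeed exceeds the Fano lower bound.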
-
Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding
An interesting bound on probability of equality
Lemma 4.3.1
Let X, X′ be two independent r.v.s with X ∼ p(x) and X′ ∼ r(x), with x, x′ ∈ X (same alphabet). Lower bound on the cross collision probability:
p(X = X′) ≥ max( 2^(−H(p)−D(p||r)), 2^(−H(r)−D(r||p)) )   (4.1)
Proof.
2^(−H(p)−D(p||r)) = 2^( Σ_x p(x) log p(x) + Σ_x p(x) log(r(x)/p(x)) )   (4.2)
= 2^( Σ_x p(x) log r(x) )   (4.3)
≤ Σ_x p(x) 2^(log r(x))   (4.4)
= Σ_x p(x) r(x)   (4.5)
= p(X = X′)   (4.6)

where (4.4) follows from Jensen’s inequality applied to the convex function t ↦ 2^t.
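A quick numerical sanity check of Lemma 4.3.1; the two distributions below are hypothetical examples on a 3-letter alphabet:

```python
import math

def H(p):
    """Entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def D(p, r):
    """KL divergence D(p||r) in bits (assumes r > 0 wherever p > 0)."""
    return sum(pi * math.log2(pi / ri) for pi, ri in zip(p, r) if pi > 0)

p = [0.5, 0.3, 0.2]   # X  ~ p
r = [0.2, 0.3, 0.5]   # X' ~ r, independent of X

cross_collision = sum(pi * ri for pi, ri in zip(p, r))       # p(X = X')
bound = max(2 ** (-H(p) - D(p, r)), 2 ** (-H(r) - D(r, p)))  # RHS of (4.1)
assert cross_collision >= bound
```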
-
An interesting bound on probability of equality
Thus, taking p(x) = r(x), the probability that two i.i.d. random variables X, X′ are the same can be bounded as follows:

P(X = X′) ≥ 2^(−H(p))   (4.7)

P(X = X′) = Σ_x p(x)^2 is known as the probability of collision.

The above improves the standard collision probability bound (with n = |X|):

Σ_x p(x)^2 = Σ_x [(p(x) − 1/n) + 1/n]^2   (4.8)
= 1/n + Σ_x (p(x) − 1/n)^2 ≥ 1/n   (4.9)

(the cross terms vanish since Σ_x p(x) = 1),
so equality is achieved when p(x) = 1/n (the uniform distribution), at which point also P(X = X′) = 2^(−H(p)) = 2^(−log n). Many other probabilistic quantities can be bounded in terms of entropic quantities, as we will see throughout the course.
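The same check for the i.i.d. case, comparing the entropic bound (4.7) against the standard 1/n bound (4.9); the skewed distribution is a hypothetical example:

```python
import math

def H(p):
    """Entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

n = 4
uniform = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]  # hypothetical non-uniform example

for p in (uniform, skewed):
    collision = sum(pi ** 2 for pi in p)  # P(X = X')
    assert collision >= 2 ** (-H(p))      # entropic bound (4.7)
    assert collision >= 1 / n             # standard bound (4.9)

# For the uniform distribution both bounds are tight: P(X = X') = 1/n = 2^(-log n);
# for the skewed one, the entropic bound is strictly stronger than 1/n.
```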
-
Weak Law of Large Numbers (WLLN)
Let X1, X2, . . . be a sequence of i.i.d. r.v.s with EXi = µ < ∞, and let Sn = X1 + · · · + Xn. Then for any ε > 0, the WLLN says that

p(|(1/n)Sn − µ| > ε) → 0 as n → ∞   (4.10)

Written as (1/n)Sn →p µ (converges in probability).

(1/n)Sn gets as close to µ as we want, and the variance of (1/n)Sn gets arbitrarily small, if n gets big enough.

SLLN: if all r.v.s have finite 2nd moments, i.e., E(Xi^2) < ∞, then (1/n)Sn → µ almost surely.
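A Monte Carlo sketch of the WLLN; the Bernoulli source and the constants below are arbitrary illustrative choices:

```python
import random

# Monte Carlo illustration of the WLLN for i.i.d. Bernoulli(0.3) draws
# (so mu = 0.3): the chance the sample mean misses mu by more than eps
# shrinks as the block length n grows.
random.seed(0)
mu, eps, trials = 0.3, 0.05, 200

def miss_rate(n):
    """Empirical estimate of p(|S_n/n - mu| > eps) over `trials` runs."""
    misses = 0
    for _ in range(trials):
        s = sum(random.random() < mu for _ in range(n))
        misses += abs(s / n - mu) > eps
    return misses / trials

rates = {n: miss_rate(n) for n in (10, 100, 10_000)}
```

Printing `rates` shows the miss probability dropping toward zero as n increases, exactly as (4.10) predicts.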
-
Aside: modes of convergence
Xn → X almost surely (written Xn →a.s. X) if the set

{ω ∈ Ω : Xn(ω) → X(ω) as n → ∞}   (4.11)

is an event with probability 1. So all such events combined together have probability 1, and any event left out has probability zero, according to the underlying probability measure.

Xn → X in the rth mean (r ≥ 1) (written Xn →r X) if E|Xn^r| < ∞ for all n and E|Xn − X|^r → 0 as n → ∞.   (4.12)

Xn → X in probability (written Xn →p X) if p(|Xn − X| > ε) → 0 as n → ∞, for all ε > 0.   (4.13)

Xn → X in distribution (written Xn →D X) if

p(Xn ≤ x) → p(X ≤ x) as n → ∞   (4.14)

for all points x at which FX(x) = p(X ≤ x) is continuous.
-
Aside: modes of convergence, expanded definition
Expanded description of convergence in probability.
Let X1, X2, . . . be a sequence of i.i.d. r.v.s with EX = µ < ∞. Then ∀ε > 0, ∀δ > 0, ∃n0 such that for all n > n0, we have

p(|(1/n)Sn − µ| > ε) < δ   (4.15)

Equivalently,

p(|(1/n)Sn − µ| ≤ ε) > 1 − δ   (4.16)
-
Aside: modes of convergence, implications
Some modes of convergence are stronger than others. That is:

a.s. ⇒ p,   rth mean ⇒ p (∀r ≥ 1),   p ⇒ D.

Also, if r > s ≥ 1, then convergence in rth mean ⇒ convergence in sth mean.

Different versions of things like the law of large numbers differ only in the strength of their required modes of convergence.
Keep this in mind when considering the AEP which we turn to next.
-
Asymptotic Equipartition Property (AEP)
At first, we’ll emphasize mostly intuition
Please read chapter 3 if you haven’t yet. If you have read it, read it again.
Consider blocks of outcomes of random variables (i.e., random vectors of length n); n = the block length.

Let X1, X2, . . . , Xn be i.i.d. r.v.s, all distributed according to p (we say Xi ∼ p(x)).

As before, ∀i, Xi ∈ {a1, a2, . . . , aK}, so K possible symbols (an alphabet or state space of size K).

For the n random variables (X1, X2, . . . , Xn), there are K^n possible outcomes.
-
Towards AEP
We wish to encode these K^n outcomes with binary digit strings (i.e., code words) of length m. Hence, M = 2^m possible code words.

We can represent the encoder as follows:

Source messages {X1, X2, . . . , Xn}, with Xi ∈ {a1, a2, . . . , aK} (K^n possible messages, n source letters in each source message) → Encoder → Code words {Y1, Y2, . . . , Ym}, with Yi ∈ {0, 1} (2^m possible messages, m total bits).

Example: English letters would have K = 26 (alphabet size K); a “source message” consists of n letters. We want to have a code word for every possible source message; what condition must we have?
M = 2^m ≥ K^n ⇒ m ≥ n log K   (4.17)
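Condition (4.17) as a two-line helper; the 10-letter message length is an arbitrary example:

```python
import math

def min_code_bits(K, n):
    """Smallest m with 2**m >= K**n, i.e., m = ceil(n * log2(K))."""
    return math.ceil(n * math.log2(K))

# English-letter example from the slide: K = 26.
m1 = min_code_bits(26, 1)    # 5 bits suffice for a single letter
m10 = min_code_bits(26, 10)  # bits needed for a 10-letter source message
```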
-
Towards AEP
A question on rate: How many bits are used per source letter?
R = rate = (log M)/n = m/n ≥ log K bits per source letter   (4.18)
Not surprising; e.g., for English we need ⌈log K⌉ = 5 bits. Question: can we use fewer bits than this per source letter (on average) and still have essentially no error?
Yes.
How? One way: some source messages would not have a code.
(Diagram: the encoder maps some source messages to code words and the rest to garbage.)
I.e., code words only assigned to a subset of the source messages!
-
Towards AEP
Any source message assigned to garbage would, if we wish to send that message, result in an error.

Alternatively, and perhaps less distressingly, rather than throw some messages into the trash, we could assign to them long code words, and assign the non-garbage messages short code words.

(Diagram: high-probability source messages → short code words; the remaining messages → long code words.)

In either case, if n gets big enough, we can make the code such that the probability of getting one of those error source messages (or long-code-word source messages) is very small!
-
Probability of Source Words
The probability of a source word can be expressed as

p(X1 = x1, X2 = x2, . . . , Xn = xn) = Π_{i=1}^n p(Xi = xi)   (4.19)

Recall, the Shannon/Hartley information about an event {X = x} is I(x) = − log p(x), so the information of the joint event {x1, x2, . . . , xn} is:

I(x1, x2, . . . , xn) = − log p(x1, x2, . . . , xn)   (4.20)
= − log Π_{i=1}^n p(xi) = Σ_{i=1}^n − log p(xi) = Σ_{i=1}^n I(xi)   (4.21)
Also note that EI(X) = H(X).
The WLLN says that (1/n)Sn →p µ, where Sn is the sum of n i.i.d. r.v.s each with mean µ = EXi.
So, I(Xi) is also a random variable with mean H(X).
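Since the I(Xi) are i.i.d. with mean H(X), the WLLN says the per-letter information (1/n) Σ_i I(xi) should concentrate around H(X). A small simulation on a hypothetical 3-symbol source:

```python
import math
import random

# Empirical check that the per-letter information of an i.i.d. source
# concentrates around H(X), as the WLLN predicts.
random.seed(1)
p = {'a': 0.5, 'b': 0.25, 'c': 0.25}            # hypothetical source
H = -sum(q * math.log2(q) for q in p.values())  # = 1.5 bits

def per_letter_info(n):
    """Draw a length-n source word and return (1/n) * sum_i I(x_i)."""
    xs = random.choices(list(p), weights=p.values(), k=n)
    return sum(-math.log2(p[x]) for x in xs) / n

est = per_letter_info(100_000)  # close to H for large n
```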
-
WLLN and entropy
Combining the above, we get
\[
\frac{1}{n} \sum_{i=1}^{n} I(X_i) \xrightarrow[n \to \infty]{p} H(X) \tag{4.22}
\]
Thus, if $n$ is big enough, we have that (this is where it gets cool):
\begin{align}
\frac{1}{n} \sum_{i=1}^{n} I(x_i) &\approx H(X) \quad \text{when } \forall i,\ x_i \sim p(x) \tag{4.23}\\
\Rightarrow\ -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i) &\approx H(X) \tag{4.24}\\
\Rightarrow\ -\log \prod_{i=1}^{n} p(x_i) &\approx n H(X) \tag{4.25}\\
\Rightarrow\ -\log p(x_1, x_2, \ldots, x_n) &\approx n H(X) \tag{4.26}\\
\Rightarrow\ p(x_1, \ldots, x_n) &\approx 2^{-n H(X)} \tag{4.27}
\end{align}
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F18/50(pg.57/244)
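The convergence in Eq. (4.22) can be observed empirically. A hedged Python sketch, assuming a Bernoulli(0.2) source (an illustrative choice): it draws i.i.d. sequences of increasing length and compares $-\frac{1}{n}\log_2 p(x_1,\ldots,x_n)$ against $H(X)$:

```python
import math
import random

random.seed(1)

# Assumed example source: Bernoulli(p) with p = 0.2.
p = 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy, bits

def normalized_neg_log_prob(n):
    """Draw x_1..x_n i.i.d. Bernoulli(p); return -(1/n) log2 p(x_1..x_n)."""
    k = sum(random.random() < p for _ in range(n))  # number of 1s in the draw
    log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
    return -log_prob / n

for n in (10, 100, 10_000):
    print(n, normalized_neg_log_prob(n), H)
# As n grows, -(1/n) log2 p(x^n) concentrates around H(X) ~ 0.722 bits.
```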
-
Towards AEP
We repeat this last equation: when $n$ is large enough, we have
\[
p(x_1, \ldots, x_n) \approx 2^{-nH(X)} \quad \text{when } \forall i,\ x_i \sim p(x) \tag{4.28}
\]
Note, perhaps somewhat startlingly, the r.h.s. is the probability, and it does not depend on the specific sequence instance $x_1, x_2, \ldots, x_n$!
So, if $n$ gets large enough, pretty much all sequences that happen have the same probability . . .
. . . and that probability is equal to $2^{-nH}$.
Those sequences that have that probability (which means pretty much all of them that occur) are called the typical sequences, represented by the set $A$.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F19/50(pg.63/244)
-
Said again: Almost all events are almost equally probable
If $X_1, X_2, \ldots, X_n$ are i.i.d. and $X_i \sim p(x)$ for all $i$, and
if $n$ is large enough,
then for any sample $x_1, x_2, \ldots, x_n$:
The probability of the sample is essentially independent of the sample, i.e.,
\[
p(x_1, \ldots, x_n) \approx 2^{-nH(X)} \tag{4.29}
\]
where $H(X)$ is the entropy of $p(x)$.
Thus, there can only be $2^{nH}$ such samples, and it may be that $2^{nH} \ll K^n$.
Those samples that will happen are called typical, and they are represented by $A_\epsilon^{(n)}$.
Thus, a large portion of $\mathcal{X}^n$ essentially won't happen, i.e., it could be that $2^{nH} \approx |A_\epsilon^{(n)}| \ll |\mathcal{X}^n| = K^n$.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F20/50(pg.68/244)
-
The Typical Set
Let $A_\epsilon^{(n)}$ be the set of typical sequences (i.e., those with probability $2^{-nH(X)}$).
If "all" events have the same probability $p$, then there are $1/p$ of them.
So the number of typical sequences is
\[
|A_\epsilon^{(n)}| \approx 2^{nH(X)}. \tag{4.30}
\]
Thus, to represent (or code for) the typical sequences, we need only $nH(X)$ bits, so we take
\[
m = nH(X) \tag{4.31}
\]
in the encoder model. Thus, the rate of the code is $H(X)$.
[Figure: an encoder maps source messages $\{X_1, X_2, \ldots, X_n\}$, $X_i \in \{a_1, a_2, \ldots, a_K\}$ ($K^n$ possible messages, $n$ source letters in each source msg) to code words $\{Y_1, Y_2, \ldots, Y_m\}$, $Y_i \in \{0, 1\}$ ($2^m$ possible messages, $m$ total bits).]
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F21/50(pg.75/244)
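For a small binary source, the typical set can be enumerated exhaustively to check $|A_\epsilon^{(n)}|$ against $2^{nH(X)}$. A Python sketch; $p$, $\epsilon$, and $n$ are assumed example values, and "typical" is taken here to mean $-\frac{1}{n}\log_2 p(x^n)$ within $\epsilon$ of $H$:

```python
import math
from itertools import product

# Assumed example parameters for a Bernoulli(p) source.
p, eps, n = 0.3, 0.1, 16
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy, bits

typical = []
total_prob_typical = 0.0
for seq in product((0, 1), repeat=n):
    k = sum(seq)
    prob = p**k * (1 - p)**(n - k)
    # seq is eps-typical if -(1/n) log2 p(seq) is within eps of H.
    if abs(-math.log2(prob) / n - H) <= eps:
        typical.append(seq)
        total_prob_typical += prob

print(len(typical), 2 ** (n * H))  # |A_eps^(n)| vs. 2^{nH}
print(total_prob_typical)          # approaches 1 as n increases
```

With these values only sequences with 4, 5, or 6 ones qualify, so the typical set is a small slice of all $2^{16}$ sequences yet already carries most of the probability.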
-
Code Only The Typical Set
If we take $m = nH$, then the average number of bits per source alphabet letter (i.e., the rate) becomes
\[
\frac{m}{n} = H, \quad \text{which could be} \ll \log K \tag{4.32}
\]
So to summarize, we have three uses and interpretations of entropy here for source coding:
1. The probability of a typical sequence is $2^{-nH(X)}$.
2. The number of typical sequences is $2^{nH(X)}$.
3. The number of bits per source symbol is $H(X)$, when we code only for the typical set.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F22/50(pg.79/244)
-
AEP setup
Consider Bernoulli trials $X_1, X_2, \ldots, X_n$ i.i.d. with $p(X_i = 1) = p = 1 - p(X_i = 0)$.
The probability of a single sequence is
\[
p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum_i x_i}(1-p)^{n - \sum_i x_i} \tag{4.33}
\]
There are $2^n$ possible sequences.
Do they all have the same probability? No: consider when $p = 0.1$, $(1-p) = 0.9$; the sequence of all zeros is much more likely.
What is the most probable sequence? When $p = 0.1$, the sequence of all 0s.
Do the sequences that collectively have "any" probability all have the same probability? Depends what we mean by "any", but for small $n$, no. But as $n$ gets large, something funny happens and "yes" becomes a more appropriate answer.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F23/50(pg.84/244)
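Eq. (4.33) and the Q&A above can be illustrated directly. A small Python sketch, with $p = 0.1$ as in the example:

```python
# Probability of an i.i.d. Bernoulli(p) sequence, per Eq. (4.33):
# p(x_1..x_n) = p^{sum x_i} (1-p)^{n - sum x_i}.
p, n = 0.1, 10

def seq_prob(seq):
    """Probability of a specific 0/1 sequence under i.i.d. Bernoulli(p)."""
    k = sum(seq)
    return p**k * (1 - p)**(len(seq) - k)

all_zeros = (0,) * n
all_ones = (1,) * n
one_one = (1,) + (0,) * (n - 1)

print(seq_prob(all_zeros))  # 0.9^10: the single most probable sequence
print(seq_prob(all_ones))   # 0.1^10: vastly less likely
print(seq_prob(one_one))    # every sequence with exactly one 1 has prob 0.1 * 0.9^9
```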
-
AEP setup
Notational reminder: $H = H(X)$ is the entropy of a single random variable distributed as $p(x)$.
Can we predict the probability that a particular sequence has a particular probability value? I.e.,
\[
\Pr(p(X_1, X_2, \ldots, X_n) = \alpha) = \,? \tag{4.34}
\]
Note: "$p(X_1, X_2, \ldots, X_n)$" is a random variable! It is a random probability, and it is a true random variable since it is a probability that is a function of a set of random variables.
It turns out that
\[
\Pr\bigl(p(X_1, X_2, \ldots, X_n) \approx 2^{-nH}\bigr) \approx 1 \tag{4.35}
\]
if $n$ is large enough.
In English, this can be read as: almost all events (that occur collectively with any appreciable probability) are all equally likely.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F24/50(pg.93/244)
-
Ex: Bernoulli trials
Let Sn ∼ Binomial(n, p) with Sn = X1 + X2 + · · · + Xn, Xi ∼ Bernoulli(p).

Hence, ESn = np and var(Sn) = npq, and

p(Sn = k) = \binom{n}{k} p^k q^{n−k} (4.36)

Then, the expression 2^{−nH} can be viewed in an intuitive way. I.e., with q = 1 − p,

2^{−nH(p)} = 2^{−n(−p log p − (1−p) log(1−p))} (4.37)

= 2^{np log p + n(1−p) log(1−p)} (4.38)

= p^{np} q^{nq} (4.39)
Here H = H(p) is the binary entropy of a Bernoulli(p) variable.
np is the expected number of 1’s.
nq is the expected number of 0’s.
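The identity in Eq. (4.39) is easy to verify numerically. The snippet below is a quick sanity check (not from the slides; p = 0.3 and n = 20 are arbitrary illustrative values).

```python
import math

# Numeric sanity check of Eq. (4.39): 2^(-n H(p)) = p^(np) * q^(nq), q = 1 - p.
p, n = 0.3, 20
q = 1 - p
H = -p * math.log2(p) - q * math.log2(q)  # binary entropy H(p), in bits

lhs = 2 ** (-n * H)
rhs = p ** (n * p) * q ** (n * q)
print(lhs, rhs)  # the two agree up to floating-point rounding
```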
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F25/50(pg.98/244)
-
Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding
Towards AEP
In other words, essentially the only sequences that occur are those in which the numbers of 1s and 0s are roughly equal to their expected values.
No other sequences have any appreciable probability!
The sequence X1, X2, . . . , Xn was assumed i.i.d., but this can be extended to Markov chains, and to ergodic stationary random processes.
But before doing any of that, we need more formalism.
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F26/50(pg.106/244)
-
Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding
Asymptotic Equipartition Property (AEP)
Theorem 4.5.1 (AEP)
If X1, X2, . . . , Xn are i.i.d. and Xi ∼ p(x) for all i, then
−(1/n) log p(X1, X2, . . . , Xn) → H(X) in probability (4.40)
Proof.
−(1/n) log p(X1, X2, . . . , Xn) = −(1/n) log ∏_{i=1}^{n} p(Xi) (4.41)

= −(1/n) ∑_i log p(Xi) → −E log p(X) in probability (4.42)

= H(X) (4.43)

where the convergence in (4.42) is the weak law of large numbers applied to the i.i.d. random variables log p(Xi).
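Theorem 4.5.1 can be seen in action with a small simulation (illustrative only, not part of the proof; the three-symbol distribution below is an assumed example): the sample average of −log2 p(Xi) settles onto H(X) as n grows.

```python
import math
import random

# Simulation of the AEP/WLLN step: -(1/n) * sum_i log2 p(Xi) -> H(X).
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
H = -sum(px * math.log2(px) for px in pmf.values())  # H(X) = 1.5 bits

rng = random.Random(1)
symbols = list(pmf)
weights = [pmf[s] for s in symbols]
n = 100_000
sample = rng.choices(symbols, weights=weights, k=n)

# Empirical average of the per-symbol surprisal -log2 p(Xi)
avg = -sum(math.log2(pmf[x]) for x in sample) / n
print(H, avg)  # avg lies close to H(X) = 1.5
```

Since the surprisals −log2 p(Xi) are i.i.d. with mean H(X), the WLLN guarantees the empirical average is within any fixed tolerance of H(X) with probability approaching 1.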
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F27/50(pg.110/244)
-
Fano WLLN & Modes of Conv. Towards AEP AEP Source Coding
Typical Set
Definition 4.5.2 (Typical Set)
The typical set A_ε^{(n)} w.r.t. p(x) is the set of sequences (x1, x2, . . . , xn) ∈ X^n with the property that

2^{−n(H(X)+ε)} ≤ p(x1, x2, . . . , xn) ≤ 2^{−n(H(X)−ε)} (4.44)
Equivalently, we may write A_ε^{(n)} as

A_ε^{(n)} = {(x1, x2, . . . , xn) : |−(1/n) log p(x1, . . . , xn) − H| < ε} (4.45)
The typical set thus consists of those sequences whose log probability lies in the range −nH ± nε.
A_ε^{(n)} has a number of interesting properties.
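For small n the typical set can be enumerated exhaustively. The sketch below is a hypothetical example of Definition 4.5.2 (the parameters p = 0.3, n = 10, ε = 0.2 are illustrative choices, not from the slides): it applies the membership test of Eq. (4.45) to every Bernoulli(p) sequence and confirms the two-sided bound of Eq. (4.44).

```python
import math
from itertools import product

# Enumerate the typical set A_eps^(n) for i.i.d. Bernoulli(p) sequences.
p, n, eps = 0.3, 10, 0.2
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

typical = []      # probabilities of the typical sequences
prob_mass = 0.0   # total probability of the typical set
for seq in product((0, 1), repeat=n):
    ones = sum(seq)
    prob = p ** ones * (1 - p) ** (n - ones)
    if abs(-math.log2(prob) / n - H) < eps:   # membership test, Eq. (4.45)
        typical.append(prob)
        prob_mass += prob

# Every member also satisfies the two-sided bound in Eq. (4.44).
lo, hi = 2 ** (-n * (H + eps)), 2 ** (-n * (H - eps))
assert all(lo <= prob <= hi for prob in typical)
print(len(typical), prob_mass)
```

Even at n = 10 the typical set already carries most of the probability mass, while containing far fewer than all 2^n sequences; both effects sharpen as n grows.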
Prof. Jeff Bilmes EE514a/Fall 2019/Info. Theory I – Lecture 4 - Oct 7th, 2019 L4 F28/50(pg.116/244)