Probability Theory Refresher
Mário A. T. Figueiredo
Instituto Superior Técnico & Instituto de Telecomunicações
Lisboa, Portugal
LxMLS 2016: Lisbon Machine Learning School
July 21, 2016
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 1 / 40
Probability theory
The study of probability has roots in games of chance.
Great names of science: Cardano, Fermat, Pascal, Laplace, Kolmogorov, Bernoulli, Poisson, Cauchy, Boltzmann, de Finetti, ...
A natural tool to model uncertainty, information, knowledge, belief, observations, ...
...and thus also learning, decision making, inference, science, ...
What is probability?
Classical definition: P(A) = N_A / N
...with N mutually exclusive, equally likely outcomes, N_A of which result in the occurrence of A. (Laplace, 1814)
Example: P(randomly drawn card is ♣) = 13/52.
Example: P(getting 1 in throwing a fair die) = 1/6.
Frequentist definition: P(A) = lim_{N→∞} N_A / N
...the relative frequency of occurrence of A in an infinite number of trials.
Subjective probability: P(A) is a degree of belief. (de Finetti, 1930s)
...gives meaning to P(“it will rain tomorrow”).
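The frequentist definition suggests a simple simulation: estimate P(A) as the relative frequency N_A / N for a large number of trials N. A minimal sketch (the die example and the helper names are illustrative, not from the slides):

```python
import random

def estimate_probability(event, trial, n=100_000, seed=0):
    """Estimate P(A) as the relative frequency N_A / N over n trials."""
    rng = random.Random(seed)
    hits = sum(event(trial(rng)) for _ in range(n))
    return hits / n

# Fair die: the estimate should approach 1/6 ≈ 0.1667 as n grows.
p = estimate_probability(event=lambda x: x == 1,
                         trial=lambda rng: rng.randint(1, 6))
print(round(p, 3))
```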
Key concepts: Sample space and events
Sample space X = set of possible outcomes of a random experiment.
Examples:
- Tossing two coins: X = {HH, TH, HT, TT}
- Roulette: X = {1, 2, ..., 36}
- Draw a card from a shuffled deck: X = {A♣, 2♣, ..., Q♦, K♦}.
An event A is a subset of X: A ⊆ X (also written A ∈ 2^X).
Examples:
- “exactly one H in a 2-coin toss”: A = {TH, HT} ⊂ {HH, TH, HT, TT}.
- “odd number in the roulette”: B = {1, 3, ..., 35} ⊂ {1, 2, ..., 36}.
- “drawn a ♥ card”: C = {A♥, 2♥, ..., K♥} ⊂ {A♣, ..., K♦}
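Sample spaces and events map directly onto Python sets; a small sketch of the two-coin example:

```python
# Sample space for tossing two coins, and the event "exactly one H".
sample_space = {"HH", "TH", "HT", "TT"}
A = {outcome for outcome in sample_space if outcome.count("H") == 1}

assert A == {"TH", "HT"}
assert A <= sample_space          # an event is a subset of the sample space
print(len(A) / len(sample_space)) # classical P(A) = 2/4 = 0.5
```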
Key concepts: Sample space and events
Sample space X = set of possible outcomes of a random experiment.
(More delicate) examples:
- Distance travelled by a tossed die: X = R+
- Location of the next rain drop on a given square tile: X = R^2
Properly handling the continuous case requires deeper concepts:
- Let Σ be a collection of subsets of X: Σ ⊆ 2^X
- Σ is a σ-algebra if
  - A ∈ Σ ⇒ A^c ∈ Σ
  - A_1, A_2, ... ∈ Σ ⇒ ⋃_{i=1}^∞ A_i ∈ Σ
- Corollary: if Σ ⊆ 2^X is a σ-algebra, then ∅ ∈ Σ and X ∈ Σ
- Example in R^n: the collection of Lebesgue-measurable sets is a σ-algebra.
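For a finite sample space the σ-algebra axioms can be checked mechanically; in the finite case countable unions reduce to pairwise unions. A sketch verifying that the power set of a finite X is a σ-algebra (the helper names are ours):

```python
from itertools import chain, combinations

def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))}

def is_sigma_algebra(sigma, X):
    """Check closure under complement and (finite) union over sample space X."""
    X = frozenset(X)
    closed_complement = all(X - A in sigma for A in sigma)
    closed_union = all(A | B in sigma for A in sigma for B in sigma)
    return closed_complement and closed_union

X = {"HH", "HT", "TH", "TT"}
sigma = powerset(X)
assert is_sigma_algebra(sigma, X)
assert frozenset() in sigma and frozenset(X) in sigma  # the corollary
```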
Kolmogorov’s Axioms for Probability
Probability is a function that maps events A into the interval [0, 1].
Kolmogorov’s axioms (1933) for probability P : Σ → [0, 1]:
- For any A, P(A) ≥ 0
- P(X) = 1
- If A_1, A_2, ... ⊆ X are disjoint events, then P(⋃_i A_i) = ∑_i P(A_i)
From these axioms, many results can be derived. Examples:
- P(∅) = 0
- C ⊂ D ⇒ P(C) ≤ P(D)
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- P(A ∪ B) ≤ P(A) + P(B) (union bound)
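The derived identities above can be checked concretely on a finite, equally likely sample space (the roulette events below are our illustration):

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability on a finite, equally likely sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

X = set(range(1, 37))             # roulette numbers 1..36
A = {x for x in X if x % 2 == 1}  # odd number
B = {x for x in X if x <= 12}     # first dozen

# Inclusion-exclusion, the union bound, and P(empty) = 0:
assert prob(A | B, X) == prob(A, X) + prob(B, X) - prob(A & B, X)
assert prob(A | B, X) <= prob(A, X) + prob(B, X)
assert prob(set(), X) == 0
```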
Conditional Probability and Independence
If P(B) > 0, P(A|B) = P(A ∩ B) / P(B) (conditional probability of A, given B)
...satisfies all of Kolmogorov’s axioms:
- For any A ⊆ X, P(A|B) ≥ 0
- P(X|B) = 1
- If A_1, A_2, ... ⊆ X are disjoint, then P(⋃_i A_i | B) = ∑_i P(A_i|B)
Independence: A, B are independent (denoted A ⊥⊥ B) if and only if P(A ∩ B) = P(A)P(B).
Conditional Probability and Independence
If P(B) > 0, P(A|B) = P(A ∩ B) / P(B)
Events A, B are independent (A ⊥⊥ B) ⇔ P(A ∩ B) = P(A)P(B).
Relationship with conditional probabilities: A ⊥⊥ B ⇔ P(A|B) = P(A)
Example: X = “52 cards”, A = {3♥, 3♠, 3♦, 3♣}, and B = {A♥, 2♥, ..., K♥}; then P(A) = 1/13, P(B) = 1/4, and
P(A ∩ B) = P({3♥}) = 1/52
P(A)P(B) = (1/13)(1/4) = 1/52
P(A|B) = P(“3” | “♥”) = 1/13 = P(A)
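The card example can be verified exhaustively by enumerating the deck:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["♥", "♠", "♦", "♣"]
deck = {r + s for r, s in product(ranks, suits)}

A = {c for c in deck if c.startswith("3")}  # "the card is a 3"
B = {c for c in deck if c.endswith("♥")}    # "the card is a ♥"

def P(E):
    return Fraction(len(E), len(deck))

assert P(A) == Fraction(1, 13) and P(B) == Fraction(1, 4)
assert P(A & B) == P(A) * P(B)               # A ⊥⊥ B
assert Fraction(len(A & B), len(B)) == P(A)  # P(A|B) = P(A)
```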
Bayes’ Theorem
Law of total probability: if A_1, ..., A_n are a partition of X,
P(B) = ∑_i P(B|A_i) P(A_i) = ∑_i P(B ∩ A_i)
Bayes’ theorem: if {A_1, ..., A_n} is a partition of X,
P(A_i|B) = P(B ∩ A_i) / P(B) = P(B|A_i) P(A_i) / ∑_j P(B|A_j) P(A_j)
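A worked instance of both formulas, with a hypothetical two-event partition (disease / no disease) and illustrative numbers of our own choosing:

```python
from fractions import Fraction

# Hypothetical partition: A1 = "has disease", A2 = "does not"; B = "tests positive".
P_D  = Fraction(1, 100)             # prior P(A1): 1% prevalence (assumed)
P_Dc = 1 - P_D                      # P(A2)
P_pos_given_D  = Fraction(99, 100)  # P(B|A1): sensitivity (assumed)
P_pos_given_Dc = Fraction(5, 100)   # P(B|A2): false-positive rate (assumed)

# Law of total probability, then Bayes' theorem:
P_pos = P_pos_given_D * P_D + P_pos_given_Dc * P_Dc
P_D_given_pos = P_pos_given_D * P_D / P_pos
print(P_D_given_pos)  # prints 1/6: a positive test is far from conclusive
```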
Random Variables
A (real) random variable (RV) is a function X : X → R
- Discrete RV: the range of X is countable (e.g., N or {0, 1})
- Continuous RV: the range of X is uncountable (e.g., R or [0, 1])
- Example: number of heads in tossing two coins, X = {HH, HT, TH, TT}, with X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0. Range of X = {0, 1, 2}.
- Example: distance traveled by a tossed coin; range of X = R+.
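A random variable being literally a function from outcomes to reals is easy to render in code; the two-coin example:

```python
# A random variable as a function from outcomes to reals:
# the number of heads in two coin tosses.
sample_space = ["HH", "HT", "TH", "TT"]
X = lambda outcome: outcome.count("H")

values = [X(w) for w in sample_space]
assert values == [2, 1, 1, 0]
assert set(values) == {0, 1, 2}  # the range of the RV
```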
Random Variables: Distribution Function
Distribution function: F_X(x) = P({ω ∈ X : X(ω) ≤ x})
Example: number of heads in tossing 2 coins; range(X) = {0, 1, 2}.
Probability mass function (discrete RV): f_X(x) = P(X = x), with
F_X(x) = ∑_{x_i ≤ x} f_X(x_i).
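For the two-coin example the pmf is f_X(0) = 1/4, f_X(1) = 1/2, f_X(2) = 1/4, and the distribution function is the running sum:

```python
# pmf and CDF for the number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def cdf(x):
    """F_X(x) = sum of f_X(x_i) over x_i <= x (a right-continuous step function)."""
    return sum(p for xi, p in pmf.items() if xi <= x)

assert cdf(-1) == 0.0
assert cdf(0) == 0.25
assert cdf(1) == 0.75   # 0.25 + 0.5
assert cdf(2.5) == 1.0
```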
Properties of Distribution Functions

$F_X : \mathbb{R} \to [0, 1]$ is the distribution function of some RV $X$ iff:

- it is non-decreasing: $x_1 < x_2 \Rightarrow F_X(x_1) \le F_X(x_2)$;
- $\lim_{x \to -\infty} F_X(x) = 0$;
- $\lim_{x \to +\infty} F_X(x) = 1$;
- it is right-continuous: $\lim_{x \to z^+} F_X(x) = F_X(z)$.

Further properties:

- $P(X = x) = f_X(x) = F_X(x) - \lim_{z \to x^-} F_X(z)$;
- $P(z < X \le y) = F_X(y) - F_X(z)$;
- $P(X > x) = 1 - F_X(x)$.
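Two of these identities can be checked numerically; this is a sketch (not from the slides), reusing the two-heads pmf:

```python
# Check that the jump of F at x equals f_X(x), and that
# P(z < X <= y) = F(y) - F(z), for a small discrete pmf.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    return sum(p for xi, p in pmf.items() if xi <= x)

eps = 1e-9
for x, fx in pmf.items():
    jump = F(x) - F(x - eps)  # F_X(x) - lim_{z -> x^-} F_X(z)
    assert abs(jump - fx) < 1e-6

# P(0 < X <= 2) = f(1) + f(2) = 0.75 = F(2) - F(0)
assert abs((F(2) - F(0)) - 0.75) < 1e-12
print("identities hold")
```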
Important Discrete Random Variables

Uniform: $X \in \{x_1, \ldots, x_K\}$, pmf $f_X(x_i) = 1/K$.

Bernoulli RV: $X \in \{0, 1\}$, pmf

$$f_X(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

which can be written compactly as $f_X(x) = p^x (1-p)^{1-x}$.

Binomial RV: $X \in \{0, 1, \ldots, n\}$ (sum of $n$ independent Bernoulli RVs),

$$f_X(x) = \text{Binomial}(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}$$

with binomial coefficients ("$n$ choose $x$"): $\binom{n}{x} = \dfrac{n!}{(n-x)!\, x!}$.
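The Bernoulli and Binomial pmfs above translate directly to code; a sketch (not from the slides), using `math.comb` for the binomial coefficient:

```python
import math

def bernoulli_pmf(x, p):
    # p^x (1-p)^(1-x), for x in {0, 1}
    return p**x * (1 - p)**(1 - x)

def binomial_pmf(x, n, p):
    # (n choose x) p^x (1-p)^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

print(bernoulli_pmf(1, 0.3))    # 0.3
print(binomial_pmf(2, 4, 0.5))  # 6 * 0.5^4 = 0.375
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))  # pmf sums to 1
```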
Other Important Discrete Random Variables

Geometric($p$): $X \in \mathbb{N}$, pmf $f_X(x) = p(1-p)^{x-1}$ (e.g., the number of trials until the first success).

Poisson($\lambda$): $X \in \mathbb{N} \cup \{0\}$, pmf $f_X(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$.

Notice that $\sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{\lambda}$, thus $\sum_{x=0}^{\infty} f_X(x) = 1$.

"...the probability of the number of independent occurrences in a fixed (time/space) interval, if these occurrences have a known average rate."
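As a numerical sketch (not from the slides) of the normalization remark, the Poisson and Geometric pmfs both sum to 1 over their supports:

```python
import math

def poisson_pmf(x, lam):
    # e^{-lam} lam^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

def geometric_pmf(x, p):
    # p (1-p)^{x-1}, for x = 1, 2, ... (trials until first success)
    return p * (1 - p)**(x - 1)

lam = 3.0
partial = sum(poisson_pmf(x, lam) for x in range(50))
print(partial)  # partial sum approaches 1, since sum_x lam^x/x! = e^lam

geo = sum(geometric_pmf(x, 0.25) for x in range(1, 2000))
print(geo)      # also approaches 1
```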
Random Variables: Distribution Function

Distribution function: $F_X(x) = P(\{\omega \in \mathcal{X} : X(\omega) \le x\})$

Example: continuous RV with uniform distribution on $[a, b]$.

Probability density function (pdf, continuous RV): $f_X(x)$, with

$$F_X(x) = \int_{-\infty}^{x} f_X(u)\, du, \qquad P(X \in [c, d]) = \int_{c}^{d} f_X(x)\, dx, \qquad P(X = x) = 0.$$
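For the uniform-on-$[a,b]$ example, both the pdf and the distribution function have closed forms; a sketch (not from the slides, values $a=2$, $b=6$ are illustrative):

```python
# Uniform RV on [a, b]: pdf is 1/(b-a) on the interval, and the
# distribution function is the integral of the pdf up to x.
a, b = 2.0, 6.0

def pdf(x):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def F(x):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(F(5) - F(3))  # P(X in [3, 5]) = 0.5
print(F(1), F(7))   # 0.0 1.0
```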
Important Continuous Random Variables

Uniform: $f_X(x) = \text{Uniform}(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{if } x \in [a, b] \\ 0 & \text{if } x \notin [a, b] \end{cases}$ (previous slide).

Gaussian: $f_X(x) = \mathcal{N}(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.

Exponential: $f_X(x) = \text{Exp}(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$.
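The Gaussian density can be coded straight from the formula; below is a sketch (not from the slides) with a crude Riemann-sum check that it integrates to roughly 1 (parameters $\mu=1$, $\sigma^2=4$ are illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    # N(x; mu, sigma^2) = exp(-(x-mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return math.exp(-(x - mu)**2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 1.0, 4.0
dx = 0.001
# Riemann sum over roughly +/- 10 standard deviations around the mean
total = sum(gaussian_pdf(mu + k * dx, mu, sigma2) * dx
            for k in range(-20000, 20000))
print(total)  # approximately 1.0
```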
Expectation of Random Variables

Expectation:

$$E(X) = \begin{cases} \displaystyle\sum_i x_i\, f_X(x_i) & X \in \{x_1, \ldots, x_K\} \subset \mathbb{R} \\ \displaystyle\int_{-\infty}^{\infty} x\, f_X(x)\, dx & X \text{ continuous} \end{cases}$$

Example: Bernoulli, $f_X(x) = p^x (1-p)^{1-x}$, for $x \in \{0, 1\}$: $E(X) = 0\,(1-p) + 1\,p = p$.

Example: Binomial, $f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$, for $x \in \{0, \ldots, n\}$: $E(X) = n\,p$.

Example: Gaussian, $f_X(x) = \mathcal{N}(x; \mu, \sigma^2)$: $E(X) = \mu$.

Linearity of expectation: $E(X + Y) = E(X) + E(Y)$; $E(\alpha X) = \alpha\, E(X)$, $\alpha \in \mathbb{R}$.
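The Binomial example can be checked by summing $x_i f_X(x_i)$ directly against the closed form $E(X) = np$; a sketch (not from the slides, $n=10$, $p=0.3$ are illustrative):

```python
import math

# E(X) = sum_x x * f_X(x) for a Binomial(n, p) pmf,
# compared against the closed form E(X) = n p.
n, p = 10, 0.3
pmf = {x: math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

EX = sum(x * f for x, f in pmf.items())
print(EX, n * p)  # both approximately 3.0
```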
Expectation of Functions of Random Variables

$$E(g(X)) = \begin{cases} \displaystyle\sum_i g(x_i)\, f_X(x_i) & X \text{ discrete}, \; g(x_i) \in \mathbb{R} \\ \displaystyle\int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx & X \text{ continuous} \end{cases}$$

Example: variance, $\operatorname{var}(X) = E\big((X - E(X))^2\big) = E(X^2) - E(X)^2$.

Example: Bernoulli variance, $E(X^2) = E(X) = p$, thus $\operatorname{var}(X) = p(1-p)$.

Example: Gaussian variance, $E\big((X - \mu)^2\big) = \sigma^2$.

Probability as the expectation of an indicator, $1_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$:

$$P(X \in A) = \int_A f_X(x)\, dx = \int 1_A(x)\, f_X(x)\, dx = E(1_A(X)).$$
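Both ideas above — variance as $E(X^2) - E(X)^2$ and probability as the expectation of an indicator — can be verified for a Bernoulli RV; a sketch (not from the slides, $p = 0.3$ is illustrative):

```python
# E(g(X)) = sum_x g(x) f_X(x) for a discrete RV.
p = 0.3
pmf = {0: 1 - p, 1: p}

def E(g):
    return sum(g(x) * f for x, f in pmf.items())

EX = E(lambda x: x)
EX2 = E(lambda x: x**2)
print(EX2 - EX**2)  # var(X) = p(1-p) = 0.21

indicator = lambda x: 1 if x == 1 else 0  # 1_A with A = {1}
print(E(indicator))                       # P(X in A) = E(1_A(X)) = p
```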
Two (or More) Random Variables

Joint pmf of two discrete RVs: $f_{X,Y}(x, y) = P(X = x \wedge Y = y)$. Extends trivially to more than two RVs.

Joint pdf of two continuous RVs: $f_{X,Y}(x, y)$, such that

$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\, dx\, dy, \quad A \in \sigma(\mathbb{R}^2).$$

Extends trivially to more than two RVs.

Marginalization:

$$f_Y(y) = \begin{cases} \displaystyle\sum_x f_{X,Y}(x, y) & \text{if } X \text{ is discrete} \\ \displaystyle\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx & \text{if } X \text{ is continuous} \end{cases}$$

Independence: $X \perp\!\!\!\perp Y \Leftrightarrow f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$, which implies $E(XY) = E(X)\, E(Y)$ (but the converse does not hold).
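Marginalization and the independence factorization can be checked on a small joint pmf table; a sketch (not from the slides; the table values are illustrative and chosen to factorize):

```python
# Joint pmf over two binary RVs, stored as {(x, y): probability}.
f_XY = {(0, 0): 0.28, (0, 1): 0.42, (1, 0): 0.12, (1, 1): 0.18}

# Marginalization: f_X(x) = sum_y f_XY(x, y), and likewise for f_Y.
f_X = {x: sum(p for (xi, y), p in f_XY.items() if xi == x) for x in (0, 1)}
f_Y = {y: sum(p for (x, yi), p in f_XY.items() if yi == y) for y in (0, 1)}

# Independence check: f_XY(x, y) == f_X(x) f_Y(y) for every cell.
independent = all(abs(f_XY[(x, y)] - f_X[x] * f_Y[y]) < 1e-12
                  for x in (0, 1) for y in (0, 1))
print(f_X, f_Y, independent)
```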
Conditionals and Bayes' Theorem

Conditional pmf (discrete RVs):

$$f_{X|Y}(x|y) = P(X = x \mid Y = y) = \frac{P(X = x \wedge Y = y)}{P(Y = y)} = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

Conditional pdf (continuous RVs): $f_{X|Y}(x|y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$ ...the meaning is technically delicate.

Bayes' theorem: $f_{X|Y}(x|y) = \dfrac{f_{Y|X}(y|x)\, f_X(x)}{f_Y(y)}$ (pdf or pmf).

Also valid in the mixed case (e.g., $X$ continuous, $Y$ discrete).
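Bayes' theorem for discrete pmfs can be sketched directly from the formula (not from the slides; the prior and likelihood values are illustrative):

```python
# f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y), with the marginal
# f_Y(y) obtained by summing f_{Y|X}(y|x) f_X(x) over x.
f_X = {0: 0.6, 1: 0.4}               # prior on X
f_Y_given_X = {0: {0: 0.9, 1: 0.1},  # f_{Y|X}(y|x), keyed by x then y
               1: {0: 0.2, 1: 0.8}}

y = 1
f_Y = sum(f_Y_given_X[x][y] * f_X[x] for x in f_X)  # f_Y(y) = 0.38
posterior = {x: f_Y_given_X[x][y] * f_X[x] / f_Y for x in f_X}
print(posterior)  # a valid pmf over x: it sums to 1
```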
Joint, Marginal, and Conditional Probabilities: An Example

A pair of binary variables X, Y ∈ {0, 1}, with joint pmf:

  fX,Y(0, 0) = 1/5,   fX,Y(0, 1) = 2/5,
  fX,Y(1, 0) = 1/10,  fX,Y(1, 1) = 3/10.

Marginals:

  fX(0) = 1/5 + 2/5 = 3/5,    fX(1) = 1/10 + 3/10 = 4/10,
  fY(0) = 1/5 + 1/10 = 3/10,  fY(1) = 2/5 + 3/10 = 7/10.

Conditional probabilities, e.g.: fX|Y(0|1) = fX,Y(0, 1) / fY(1) = (2/5) / (7/10) = 4/7.
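The conditionals in this example can be verified exactly with rational arithmetic; a brief sketch using the table above:

```python
from fractions import Fraction as F

# The joint pmf from the example, with exact fractions.
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5), (1, 0): F(1, 10), (1, 1): F(3, 10)}

# Marginal of Y, then the conditional pmf fX|Y(x|y) = fX,Y(x, y) / fY(y).
f_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}
cond = {(x, y): joint[x, y] / f_Y[y] for x in (0, 1) for y in (0, 1)}
```

For each fixed y, the conditional probabilities cond[(0, y)] and cond[(1, y)] sum to one, as they must.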
An Important Multivariate RV: Multinomial

Multinomial: X = (X_1, ..., X_K), X_i ∈ {0, ..., n}, such that ∑_i X_i = n,

  fX(x_1, ..., x_K) = (n choose x_1 x_2 ··· x_K) p_1^{x_1} p_2^{x_2} ··· p_K^{x_K},  if ∑_i x_i = n;
  fX(x_1, ..., x_K) = 0,  if ∑_i x_i ≠ n,

where the multinomial coefficient is

  (n choose x_1 x_2 ··· x_K) = n! / (x_1! x_2! ··· x_K!).

Parameters: p_1, ..., p_K ≥ 0, such that ∑_i p_i = 1.

Generalizes the binomial from binary to K classes.

Example: tossing n independent fair dice, p_1 = ··· = p_6 = 1/6;
x_i = number of outcomes with i dots. Of course, ∑_i x_i = n.
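The pmf above can be computed directly from the definition; a minimal sketch, evaluated on the fair-dice example:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """fX(x_1..x_K) = n!/(x_1!···x_K!) · p_1^{x_1}···p_K^{x_K}, n = sum(counts)."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)           # multinomial coefficient
    p = 1.0
    for x, q in zip(counts, probs):
        p *= q ** x
    return coef * p

# Six fair dice, one outcome of each face: coefficient 6! = 720, probability 720/6^6.
p = multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6)
```

With K = 2 this reduces to the binomial pmf, as the slide notes.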
An Important Multivariate RV: Gaussian

Multivariate Gaussian: X ∈ Rⁿ,

  fX(x) = N(x; µ, C) = (1 / √det(2π C)) exp( −(1/2) (x − µ)ᵀ C⁻¹ (x − µ) ).

Parameters: vector µ ∈ Rⁿ and matrix C ∈ Rⁿˣⁿ.
Expected value: E(X) = µ. Meaning of C: next slide.
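A sketch of the density formula using NumPy (assumed available); note det(2πC) = (2π)ⁿ det(C) for an n×n matrix:

```python
import numpy as np

def gaussian_pdf(x, mu, C):
    """N(x; mu, C) = exp(-(x-mu)^T C^{-1} (x-mu) / 2) / sqrt(det(2*pi*C))."""
    d = x - mu
    norm = np.sqrt(np.linalg.det(2 * np.pi * C))   # det(2*pi*C) = (2*pi)^n det(C)
    return np.exp(-0.5 * d @ np.linalg.solve(C, d)) / norm

mu = np.zeros(2)
C = np.array([[2.0, 0.0], [0.0, 0.5]])             # illustrative covariance, det = 1
val = gaussian_pdf(mu, mu, C)                      # peak value: 1 / sqrt((2*pi)^2 det C)
```

Using `solve` instead of explicitly inverting C is the standard numerically safer choice.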
Covariance, Correlation, and all that...

Covariance between two RVs:

  cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(X Y) − E(X) E(Y).

Relationship with variance: var(X) = cov(X, X).

Correlation: corr(X, Y) = ρ(X, Y) = cov(X, Y) / (√var(X) √var(Y)) ∈ [−1, 1].

  X ⊥⊥ Y ⇔ fX,Y(x, y) = fX(x) fY(y) ⇒ (but ⇍) cov(X, Y) = 0.

Covariance matrix of a multivariate RV, X ∈ Rⁿ:

  cov(X) = E[(X − E(X))(X − E(X))ᵀ] = E(X Xᵀ) − E(X) E(X)ᵀ.

Covariance of a Gaussian RV: fX(x) = N(x; µ, C) ⇒ cov(X) = C.
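The "⇒ but ⇍" above has a classic counterexample: X uniform on {−1, 0, 1} and Y = X², which are uncorrelated yet dependent. A sketch with exact arithmetic:

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}, Y = X^2: the joint pmf puts mass 1/3 on each pair.
pmf = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}   # fX,Y(x, y)

E_X  = sum(x * p for (x, y), p in pmf.items())        # 0
E_Y  = sum(y * p for (x, y), p in pmf.items())        # 2/3
E_XY = sum(x * y * p for (x, y), p in pmf.items())    # 0
cov = E_XY - E_X * E_Y                                # 0 -> uncorrelated

# ...yet not independent: fX,Y(0, 0) = 1/3 differs from fX(0) fY(0) = 1/9.
f_X0 = sum(p for (x, y), p in pmf.items() if x == 0)
f_Y0 = sum(p for (x, y), p in pmf.items() if y == 0)
```

So zero covariance does not imply independence, exactly as the slide's one-way arrow states.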
More on Expectations and Covariances

Let A ∈ Rⁿˣⁿ be a matrix and a ∈ Rⁿ a vector.

1. If E(X) = µ and Y = A X, then E(Y) = A µ;
2. If E(X) = µ and Y = X − µ, then E(Y) = 0;
3. If cov(X) = C and Y = A X, then cov(Y) = A C Aᵀ;
4. If cov(X) = C and Y = aᵀX ∈ R, then var(Y) = aᵀC a ≥ 0;
5. If cov(X) = C and Y = C^{−1/2} X, then cov(Y) = I.

Combining the 2nd and 5th facts (center, then whiten) is called standardization.
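The transformation facts can be checked directly in NumPy (assumed available); the covariance C and matrix A below are illustrative:

```python
import numpy as np

C = np.array([[2.0, 0.6], [0.6, 1.0]])   # an illustrative positive-definite covariance
A = np.array([[1.0, 2.0], [0.0, 1.0]])

# Fact 3: cov(AX) = A C A^T (computed symbolically rather than by sampling).
cov_Y = A @ C @ A.T

# Fact 5 (whitening): with Y = C^{-1/2} X, cov(Y) = C^{-1/2} C C^{-1/2} = I.
w, V = np.linalg.eigh(C)                         # eigendecomposition of symmetric C
C_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T # symmetric inverse square root
whitened = C_inv_sqrt @ C @ C_inv_sqrt
```

The eigendecomposition route is one standard way to build C^{−1/2}; a Cholesky factor also works.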
Statistical Inference

Scenario: observed RV Y, which depends on unknown variable(s) X.
Goal: given an observation Y = y, infer X.

Two main philosophies:
  Frequentist: X = x is fixed, but unknown;
  Bayesian: X is a RV with pdf/pmf fX(x) (the prior); prior ⇔ knowledge about X.

In both philosophies, a central object is fY|X(y|x), which goes by several names: likelihood function, observation model, ...

This is not machine learning! fY,X(y, x) is assumed known.

In the Bayesian philosophy, all the knowledge about X is carried by

  fX|Y(x|y) = fY|X(y|x) fX(x) / fY(y) = fY,X(y, x) / fY(y)

...the posterior (or a posteriori) pdf/pmf.
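A minimal Bayesian-inference sketch: the unknown X is a coin bias restricted to a discrete grid (an illustrative discretization, not from the slides), and Y is the number of heads in n tosses:

```python
from math import comb

grid = [0.1, 0.3, 0.5, 0.7, 0.9]               # candidate values of X (illustrative)
prior = {x: 1 / len(grid) for x in grid}       # flat prior fX(x)

n, y = 10, 7                                   # observed: 7 heads in 10 tosses
def likelihood(y, x):                          # binomial observation model fY|X(y|x)
    return comb(n, y) * x**y * (1 - x)**(n - y)

unnorm = {x: likelihood(y, x) * prior[x] for x in grid}   # fY|X(y|x) fX(x)
f_Y_y = sum(unnorm.values())                              # evidence fY(y)
posterior = {x: u / f_Y_y for x, u in unnorm.items()}     # fX|Y(x|y)
```

The posterior concentrates near 0.7, the grid value most compatible with 7 heads in 10 tosses.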
Statistical Inference

The posterior pdf/pmf fX|Y(x|y) has all the information/knowledge about X, given Y = y (conditionality principle).

How to make an optimal "guess" x̂ about X, given this information?

Need to define "optimal": a loss function L(x̂, x) ≥ 0 measures the "loss"/"cost" incurred by "guessing" x̂ if the truth is x.

The optimal Bayesian decision minimizes the expected loss:

  x̂Bayes = arg min_x̂ E[L(x̂, X) | Y = y]

where

  E[L(x̂, X) | Y = y] = ∫ L(x̂, x) fX|Y(x|y) dx,   continuous (estimation)
  E[L(x̂, X) | Y = y] = ∑_x L(x̂, x) fX|Y(x|y),    discrete (classification)
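To make the expected-loss minimization concrete, a sketch for a discrete X; the posterior values and the asymmetric loss are illustrative assumptions:

```python
# Bayes decision: pick the guess minimizing posterior expected loss.
posterior = {0: 0.6, 1: 0.4}          # fX|Y(x|y) for the observed y (illustrative)

def L(guess, x):                      # asymmetric loss: missing x = 1 costs more
    if guess == x:
        return 0.0
    return 5.0 if x == 1 else 1.0

expected_loss = {g: sum(L(g, x) * p for x, p in posterior.items()) for g in (0, 1)}
x_bayes = min(expected_loss, key=expected_loss.get)
```

Note that x̂Bayes = 1 here even though the posterior mode is 0: with an asymmetric loss, the optimal decision need not be the most probable value.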
Classical Statistical Inference Criteria

Consider X ∈ {1, ..., K} (a discrete/classification problem).

Adopt the 0/1 loss: L(x̂, x) = 0 if x̂ = x, and 1 otherwise.

Optimal Bayesian decision:

  x̂Bayes = arg min_x̂ ∑_{x=1}^{K} L(x̂, x) fX|Y(x|y)
         = arg min_x̂ (1 − fX|Y(x̂|y))
         = arg max_x̂ fX|Y(x̂|y) ≡ x̂MAP

MAP = maximum a posteriori criterion.

The same criterion can be derived for continuous X.
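The 0/1-loss derivation can be checked numerically: the expected loss of guessing g is 1 − fX|Y(g|y), so minimizing it is the same as maximizing the posterior. An illustrative posterior:

```python
# With the 0/1 loss, the Bayes decision coincides with the posterior mode (MAP).
posterior = {1: 0.2, 2: 0.5, 3: 0.3}          # fX|Y(x|y), illustrative values

expected_loss = {g: sum((0 if g == x else 1) * p for x, p in posterior.items())
                 for g in posterior}           # equals 1 - posterior[g]
x_bayes = min(expected_loss, key=expected_loss.get)
x_map = max(posterior, key=posterior.get)
```

Both criteria select the same value, as the derivation on the slide shows.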
Classical Statistical Inference Criteria

Consider the MAP criterion x̂_MAP = arg max_x f_{X|Y}(x|y).

From Bayes law:

x̂_MAP = arg max_x [ f_{Y|X}(y|x) f_X(x) / f_Y(y) ] = arg max_x f_{Y|X}(y|x) f_X(x)

...so we only need to know the posterior f_{X|Y}(x|y) up to a normalization factor.

It is also common to write: x̂_MAP = arg max_x ( log f_{Y|X}(y|x) + log f_X(x) )

If the prior is flat, f_X(x) = C, then x̂_MAP = arg max_x f_{Y|X}(y|x) ≡ x̂_ML

ML = maximum likelihood criterion.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 29 / 40
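Since the evidence f_Y(y) does not depend on x, MAP can be computed entirely from the log-likelihood and log-prior. A sketch with toy numbers (the distributions below are illustrative assumptions):

```python
import math

# Toy likelihood f_{Y|X}(y|x) and prior f_X(x) over X in {1, 2, 3};
# the MAP estimate maximizes log-likelihood + log-prior, ignoring the evidence.
loglik = {x: math.log(p) for x, p in {1: 0.1, 2: 0.6, 3: 0.3}.items()}
logprior = {x: math.log(p) for x, p in {1: 0.5, 2: 0.25, 3: 0.25}.items()}
x_map = max(loglik, key=lambda x: loglik[x] + logprior[x])
print(x_map)  # 2
```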
Statistical Inference: Example

Observed: n i.i.d. (independent, identically distributed) Bernoulli RVs, Y = (Y_1, ..., Y_n), with Y_i ∈ {0, 1}.
Common pmf: f_{Y_i|X}(y|x) = x^y (1 − x)^{1−y}, where x ∈ [0, 1].

Likelihood function: f_{Y|X}(y_1, ..., y_n | x) = ∏_{i=1}^n x^{y_i} (1 − x)^{1−y_i}

Log-likelihood function: log f_{Y|X}(y_1, ..., y_n | x) = n log(1 − x) + log( x / (1 − x) ) ∑_{i=1}^n y_i

Maximum likelihood: x̂_ML = arg max_x f_{Y|X}(y|x) = (1/n) ∑_{i=1}^n y_i

Example: n = 10, observed y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1), x̂_ML = 7/10.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 30 / 40
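For Bernoulli observations, the ML estimate is just the empirical frequency of ones; a minimal sketch using the slide's data:

```python
# ML estimate of the Bernoulli parameter: the sample mean of the observations.
y = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # the observed sequence from the example
n = len(y)
x_ml = sum(y) / n   # arg max of the likelihood = (1/n) * sum_i y_i
print(x_ml)  # 0.7
```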
Statistical Inference: Example (Continuation)

Observed: n i.i.d. (independent, identically distributed) Bernoulli RVs.

Likelihood: f_{Y|X}(y_1, ..., y_n | x) = ∏_{i=1}^n x^{y_i} (1 − x)^{1−y_i} = x^{∑_i y_i} (1 − x)^{n − ∑_i y_i}

How to express the knowledge that (e.g.) X is around 1/2? A convenient choice: a conjugate prior, i.e., the form of the posterior = the form of the prior.

- In our case, the Beta pdf: f_X(x) ∝ x^{α−1} (1 − x)^{β−1}, with α, β > 0.
- Posterior: f_{X|Y}(x|y) ∝ x^{α−1+∑_i y_i} (1 − x)^{β−1+n−∑_i y_i}
- MAP: x̂_MAP = (α + ∑_i y_i − 1) / (α + β + n − 2)
- Example: α = 4, β = 4, n = 10, y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1): x̂_MAP = 0.625 (recall x̂_ML = 0.7).

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 31 / 40
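The Beta-posterior mode above is a closed-form expression; a minimal sketch with the slide's numbers:

```python
# MAP estimate under a Beta(alpha, beta) prior: the mode of the Beta posterior.
y = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]
alpha, beta = 4, 4
n, s = len(y), sum(y)
x_map = (alpha + s - 1) / (alpha + beta + n - 2)
print(x_map)  # 0.625
```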
Another Classical Statistical Inference Criterion

Consider that X ∈ ℝ (continuous/estimation problem).

Adopt the squared error loss: L(x̂, x) = (x̂ − x)²

Optimal Bayesian decision:

x̂_Bayes = arg min_{x̂} E[(x̂ − X)² | Y = y] = arg min_{x̂} ( x̂² − 2 x̂ E[X | Y = y] ) = E[X | Y = y] ≡ x̂_MMSE

MMSE = minimum mean squared error criterion.

Does not apply to classification problems.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 32 / 40
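A quick numerical sanity check of this result (a sketch with a toy posterior, not from the slides): the minimizer of the posterior expected squared loss is the posterior mean.

```python
# Sanity check: the minimizer of E[(xhat - X)^2 | Y = y] over a grid
# matches the posterior mean E[X | Y = y].
xs = [0.0, 0.5, 1.0]   # toy posterior support for X given Y = y
ps = [0.2, 0.5, 0.3]   # toy posterior probabilities

mean = sum(x * p for x, p in zip(xs, ps))  # E[X | Y = y]

def risk(xhat):
    return sum(p * (xhat - x) ** 2 for x, p in zip(xs, ps))

grid = [i / 1000 for i in range(1001)]
best = min(grid, key=risk)
print(mean, best)  # both come out at the posterior mean (0.55)
```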
Back to the Bernoulli Example

Observed: n i.i.d. (independent, identically distributed) Bernoulli RVs.

Likelihood: f_{Y|X}(y_1, ..., y_n | x) = ∏_{i=1}^n x^{y_i} (1 − x)^{1−y_i} = x^{∑_i y_i} (1 − x)^{n − ∑_i y_i}

- In our case, the Beta pdf: f_X(x) ∝ x^{α−1} (1 − x)^{β−1}, with α, β > 0.
- Posterior: f_{X|Y}(x|y) ∝ x^{α−1+∑_i y_i} (1 − x)^{β−1+n−∑_i y_i}
- MMSE: x̂_MMSE = (α + ∑_i y_i) / (α + β + n)
- Example: α = 4, β = 4, n = 10, y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1): x̂_MMSE ≃ 0.611 (recall that x̂_MAP = 0.625, x̂_ML = 0.7).

A conjugate prior is equivalent to "virtual" counts; this is often called smoothing in NLP and ML.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 33 / 40
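The "virtual counts" reading can be sketched directly: the posterior mean adds α pseudo-ones and β pseudo-zeros to the data. (The connection to add-one smoothing below is an observation of ours, not stated on the slide.)

```python
# "Virtual counts" view of the conjugate prior: the posterior mean adds alpha
# pseudo-ones and beta pseudo-zeros to the data. Beta(1, 1) recovers classic
# add-one (Laplace) smoothing.
y = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]
n, s = len(y), sum(y)

def x_mmse(alpha, beta):
    return (alpha + s) / (alpha + beta + n)

print(round(x_mmse(4, 4), 3))  # 0.611 (the slide's example)
print(round(x_mmse(1, 1), 3))  # 0.667, i.e. (s + 1) / (n + 2)
```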
The Bernstein-von Mises Theorem

In the previous example we had n = 10, y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1), thus ∑_i y_i = 7.

With a Beta prior with α = 4 and β = 4, we had

x̂_ML = 0.7,  x̂_MAP = (3 + ∑_i y_i) / (6 + n) = 0.625,  x̂_MMSE = (4 + ∑_i y_i) / (8 + n) ≃ 0.611

Now consider n = 100 and ∑_i y_i = 70, with the same Beta(4, 4) prior:

x̂_ML = 0.7,  x̂_MAP = 73/106 ≃ 0.689,  x̂_MMSE = 74/108 ≃ 0.685

...both Bayesian estimates are now much closer to the ML estimate.

This illustrates an important result in Bayesian inference, the Bernstein-von Mises theorem: under (mild) conditions,

lim_{n→∞} x̂_MAP = lim_{n→∞} x̂_MMSE = x̂_ML

The message: if you have a lot of data, priors don't matter much.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 34 / 40
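The convergence can be watched numerically: keep the empirical frequency of ones fixed at 0.7 and grow n under the same Beta(4, 4) prior (a sketch extending the slide's two data points):

```python
# Bernstein-von Mises in action: with the fraction of ones held at 0.7,
# the MAP and MMSE estimates approach the ML estimate as n grows.
alpha, beta = 4, 4
for n in [10, 100, 1000, 10000]:
    s = round(0.7 * n)                                 # 70% ones, as on the slide
    x_map = (alpha + s - 1) / (alpha + beta + n - 2)   # posterior mode
    x_mmse = (alpha + s) / (alpha + beta + n)          # posterior mean
    print(n, round(x_map, 4), round(x_mmse, 4))        # both tend to 0.7
```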
Important Inequalities

Cauchy-Schwarz inequality for RVs: E(|X Y|) ≤ √( E(X²) E(Y²) )

Recall that a real function g is convex if, for any x, y, and α ∈ [0, 1],
g(αx + (1 − α)y) ≤ α g(x) + (1 − α) g(y)

Jensen's inequality: if g is a real convex function, then E(g(X)) ≥ g(E(X)).

Examples: E(X)² ≤ E(X²) ⇒ var(X) = E(X²) − E(X)² ≥ 0;
E(log X) ≤ log E(X), for X a positive RV (log is concave, so the inequality flips).

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 35 / 40
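Jensen's inequality for the convex g(x) = x² can be illustrated numerically; the gap E(X²) − E(X)² is exactly the (nonnegative) sample variance (a sketch with simulated Gaussian draws):

```python
# Numerical illustration of Jensen's inequality with the convex g(x) = x^2:
# E[g(X)] >= g(E[X]) for any sample, since the gap is the sample variance.
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
n = len(samples)
mean = sum(samples) / n
e_g = sum(x * x for x in samples) / n   # E[X^2]
g_e = mean * mean                        # (E[X])^2
print(e_g >= g_e)  # True
```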
Entropy and all that...

Entropy of a discrete RV X ∈ {1, ..., K}: H(X) = − ∑_{x=1}^K f_X(x) log f_X(x)

Positivity: H(X) ≥ 0; H(X) = 0 ⇔ f_X(i) = 1 for exactly one i ∈ {1, ..., K}.

Upper bound: H(X) ≤ log K; H(X) = log K ⇔ f_X(x) = 1/K, for all x ∈ {1, ..., K}.

Entropy is a measure of the uncertainty/randomness of X.

For a continuous RV X, the differential entropy is h(X) = − ∫ f_X(x) log f_X(x) dx

h(X) can be positive or negative. Example: if f_X(x) = Uniform(x; a, b), then h(X) = log(b − a).

If f_X(x) = N(x; μ, σ²), then h(X) = (1/2) log(2πeσ²).

If var(Y) = σ², then h(Y) ≤ (1/2) log(2πeσ²): the Gaussian maximizes differential entropy for a given variance.

Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 36 / 40
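The discrete entropy and its log K upper bound can be checked directly (a minimal sketch; the skewed pmf is an illustrative assumption):

```python
# Discrete entropy H(X) = -sum f(x) log f(x); the uniform pmf attains
# the upper bound log K, and any other pmf falls below it.
import math

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0)

K = 4
uniform = [1 / K] * K
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(skewed) <= entropy(uniform))          # True
print(abs(entropy(uniform) - math.log(K)) < 1e-9)   # True: uniform attains log K
```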
![Page 156: Probability Theory Refresher - LxMLS 2020lxmls.it.pt/2016/Lecture_0.pdf · Probability Theory Refresher M ario A. T. Figueiredo Instituto Superior T ecnico & Instituto de Telecomunica˘c~oes](https://reader035.fdocuments.us/reader035/viewer/2022081406/5f0f42267e708231d44344f8/html5/thumbnails/156.jpg)
Entropy and all that...
Entropy of a discrete RV X ∈ {1, ...,K}: H(X) = −K∑x=1
fX(x) log fX(x)
Positivity: H(X) ≥ 0 ;H(X) = 0 ⇔ fX(i) = 1, for exactly one i ∈ {1, ...,K}.
Upper bound: H(X) ≤ logK ;H(X) = logK ⇔ fX(x) = 1/k, for all x ∈ {1, ...,K}
Measure of uncertainty/randomness of X
Continuous RV X, differential entropy: h(X) = −∫fX(x) log fX(x) dx
h(X) can be positive or negative. Example, iffX(x) = Uniform(x; a, b), h(X) = log(b− a).
If fX(x) = N (x;µ, σ2), then h(X) = 12 log(2πeσ2).
If var(Y ) = σ2, then h(Y ) ≤ 12 log(2πeσ2)
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 36 / 40
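The discrete entropy above can be sketched in a few lines of Python; `entropy` is a hypothetical helper name, not something from the slides, and the printed values just illustrate the positivity and log K bounds stated above.

```python
import math

def entropy(pmf, base=math.e):
    """Shannon entropy H(X) = -sum_x f(x) log f(x); terms with f(x) = 0
    contribute 0, by the usual convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # attains the upper bound log K
peaked  = [1.0, 0.0, 0.0, 0.0]       # deterministic: attains the lower bound 0
print(entropy(uniform))              # log 4 ≈ 1.3863
print(entropy(peaked))               # 0.0
```

With `base=2` the entropy is measured in bits, so a fair coin gives exactly 1 bit.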
Kullback-Leibler divergence

Kullback-Leibler divergence (KLD) between two pmfs:

D(fX ‖ gX) = ∑_{x=1}^{K} fX(x) log( fX(x) / gX(x) )

Positivity: D(fX ‖ gX) ≥ 0; D(fX ‖ gX) = 0 ⇔ fX(x) = gX(x), for all x ∈ {1, ..., K}.

KLD between two pdfs:

D(fX ‖ gX) = ∫ fX(x) log( fX(x) / gX(x) ) dx

Positivity: D(fX ‖ gX) ≥ 0; D(fX ‖ gX) = 0 ⇔ fX(x) = gX(x), almost everywhere.
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 37 / 40
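A minimal sketch of the pmf version of the KLD in Python (the function name and the example pmfs are mine, not from the slides); it also shows that the KLD is not symmetric, so it is a divergence rather than a distance:

```python
import math

def kl_divergence(f, g):
    """D(f || g) = sum_x f(x) log(f(x)/g(x)).
    Assumes g(x) > 0 wherever f(x) > 0 (absolute continuity)."""
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.25, 0.25]
g = [1/3, 1/3, 1/3]
print(kl_divergence(f, g))   # > 0, since f differs from g
print(kl_divergence(f, f))   # 0.0: the KLD vanishes iff the pmfs coincide
```

Note that `kl_divergence(f, g)` and `kl_divergence(g, f)` generally differ.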
Mutual information
Mutual information (MI) between two random variables:

I(X; Y) = D(fX,Y ‖ fX fY)

Positivity: I(X; Y) ≥ 0; I(X; Y) = 0 ⇔ X and Y are independent.

MI is a measure of dependency between two random variables.
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 38 / 40
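Since I(X; Y) is just the KLD between the joint pmf and the product of its marginals, it can be computed directly from a joint pmf table. A sketch under that definition (the helper name and the two example joints are mine):

```python
import math

def mutual_information(joint):
    """I(X;Y) = D(f_{X,Y} || f_X f_Y) for a joint pmf given as a 2-D list,
    joint[i][j] = P(X = i, Y = j)."""
    fx = [sum(row) for row in joint]            # marginal of X (row sums)
    fy = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)
    return sum(p * math.log(p / (fx[i] * fy[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

independent = [[0.25, 0.25], [0.25, 0.25]]  # joint equals product of marginals
dependent   = [[0.5, 0.0], [0.0, 0.5]]      # X determines Y exactly
print(mutual_information(independent))      # 0.0
print(mutual_information(dependent))        # log 2
```

The second example attains the maximum for binary variables: knowing X removes all (log 2 nats of) uncertainty about Y.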
Other Stuff
Not covered, but also very important for machine learning:

Exponential families

Basic inequalities (Markov, Chebyshev, etc.)

Stochastic processes (Markov chains, hidden Markov models, ...)
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 39 / 40
Recommended Reading (Probability and Statistics)
K. Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012 (Chapter 2).

L. Wasserman, "All of Statistics: A Concise Course in Statistical Inference", Springer, 2004.
Mario A. T. Figueiredo (IST & IT) LxMLS 2016: Probability Theory July 21, 2016 40 / 40