Probability and Statistics
Part 1. Probability Concepts and Limit Theorems
Chang-han Rhee
Stanford University
Sep 19, 2011 / CME001
Outline

Probability Concepts
- Probability Space
- Random Variables
- Expectation
- Conditional Probability and Expectation

Limit Theorems
- Modes of Convergence
- Law of Large Numbers
- Central Limit Theorem
Probability of an Event
(in a random experiment)
The relative frequency of an event when a random experiment is repeated many times.

e.g. coin flip, dice roll, roulette
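This frequency interpretation is easy to check by simulation. A minimal Python sketch (the helper name `relative_frequency` is ours, not a library function):

```python
import random

def relative_frequency(event, experiment, n_trials, seed=0):
    """Estimate P(event) by the fraction of trials in which it occurs."""
    rng = random.Random(seed)
    hits = sum(event(experiment(rng)) for _ in range(n_trials))
    return hits / n_trials

# Coin flip: relative frequency of heads for a fair coin.
p_heads = relative_frequency(lambda w: w == "H",
                             lambda rng: rng.choice("HT"), 100_000)

# Die roll: relative frequency of an odd number.
p_odd = relative_frequency(lambda w: w % 2 == 1,
                           lambda rng: rng.randint(1, 6), 100_000)
```

Both estimates settle near 1/2 as the number of trials grows.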
Sample Space

Set of all possible outcomes.
- Single coin flip: Ω = {H, T}
- Two coin flips: Ω = {(H,H), (H,T), (T,H), (T,T)}
- Single die roll: Ω = {1, 2, 3, 4, 5, 6}
- Two dice rolls:
  Ω = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
        (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
        (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
        (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
        (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
        (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }
Event
Subset of the sample space.
- Single coin flip: the event that the coin lands heads
  A = {H}
- Two coin flips: the event that the first coin lands heads
  A = {(H,H), (H,T)}
- Single die roll: the event that the die shows an odd number
  A = {1, 3, 5}
- Two dice rolls: the event that the sum is 4
  A = {(1,3), (2,2), (3,1)}
Ω = {(H,H), (H,T), (T,H), (T,T)} is the sample space.
The event "the first coin lands heads" is the subset {(H,H), (H,T)}.
The outcome "both coins land tails" is the single element (T,T).
Probability
Definition
A set function P is called a probability if
- 0 ≤ P(A) ≤ 1 for each event A
- P(Ω) = 1 (Unitarity)
- for each sequence A1, A2, ... of mutually disjoint events,
  P( ∪_{i=1}^∞ Ai ) = ∑_{i=1}^∞ P(Ai)   (Countable Additivity)
Back to Examples
- Fair Coin:
  P(∅) = 0, P({H}) = 1/2, P({T}) = 1/2, P({H, T}) = 1
- Biased Coin (p ∈ [0, 1]):
  P(∅) = 0, P({H}) = p, P({T}) = 1 − p, P({H, T}) = 1
Random Variables
A random variable is a function from the sample space to the real numbers.
e.g.
- Winnings in a single coin flip:
  X(H) = 1, X(T) = −1
- First roll, second roll, and sum of two dice:
  X(i, j) = i, Y(i, j) = j, Z(i, j) = i + j
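A random variable really is just a function on Ω. A small Python sketch with the two-dice example above, using exact arithmetic via `fractions`:

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice; each outcome has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

X = lambda w: w[0]          # first roll
Y = lambda w: w[1]          # second roll
Z = lambda w: w[0] + w[1]   # sum

# P(Z = 4): add up the probabilities of the outcomes that Z maps to 4.
p_z4 = sum(p for w in omega if Z(w) == 4)
```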
Discrete Random Variables

A discrete random variable X assumes values in a discrete subset S of R.
The distribution of a discrete random variable is completely described by a probability mass function pX : R → [0, 1] such that

P(X = x) = pX(x)

e.g.
- [Bernoulli] X ∼ Ber(p) if X ∈ {0, 1} and
  P(X = 1) = 1 − P(X = 0) = p
  i.e., pX(1) = p and pX(0) = 1 − p
- [Binomial] X ∼ Bin(n, p) if X ∈ {0, 1, ..., n} and
  pX(k) = (n choose k) p^k (1 − p)^(n−k)
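Both pmfs above are one-liners in Python (`math.comb` gives the binomial coefficient). A quick sanity check that the Bin(10, 0.3) pmf sums to 1 and that Ber(p) coincides with Bin(1, p):

```python
from math import comb

def binomial_pmf(k, n, p):
    """pX(k) = (n choose k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """pX(1) = p, pX(0) = 1 - p."""
    return p if k == 1 else 1 - p

# A pmf must sum to 1 over the support.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```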
Continuous Random Variables

A continuous random variable X assumes values in R.
The distribution of a continuous random variable is completely described by a probability density function fX : R → R+ such that

P(a ≤ X ≤ b) = ∫_a^b fX(x) dx

e.g.
- [Uniform] X ∼ Unif(a, b), a < b, if
  fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
- [Gaussian/Normal] X ∼ N(µ, σ²), µ ∈ R, σ² > 0, if
  fX(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
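Both densities can be checked numerically: integrating a pdf over (essentially) its whole support should give 1. A sketch with a simple midpoint rule (the helper `integrate` is ours, not a library routine):

```python
from math import exp, pi, sqrt

def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu, sigma2):
    return exp(-((x - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

area_unif = integrate(lambda x: uniform_pdf(x, 2.0, 5.0), 2.0, 5.0)
area_norm = integrate(lambda x: normal_pdf(x, 0.0, 1.0), -8.0, 8.0)
```

The normal density is cut off at ±8 standard deviations, where the neglected tail mass is negligible.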
Probability Distribution∗
Each random variable induces another probability PX : 2^R → [0, 1] on the real line through the following:
PX((−∞, x]) := P(X ≤ x)
We often denote the distribution function by FX:
FX(x) := P(X ≤ x)
[NOTATION] The right-hand sides of the previous displays are shorthand notation for the following:

P(X ≤ x) := P({ω ∈ Ω : X(ω) ≤ x})
Note: Distributions can be identical even if the supporting probability space is different.
e.g.
X(H) = 1, X(T) = −1   (X defined on a coin flip)
Y(i) = 1 if i is odd, −1 if i is even   (Y defined on a die roll)
X and Y have the same distribution.
Joint Distribution
Two random variables X and Y induce a probability PX,Y on R^2:

PX,Y((−∞, x] × (−∞, y]) = P(X ≤ x, Y ≤ y)

A collection of random variables X1, X2, ..., Xn induces a probability PX1,...,Xn on R^n:

PX1,...,Xn((−∞, x1] × ··· × (−∞, xn]) = P(X1 ≤ x1, ..., Xn ≤ xn)
The joint distribution of two discrete random variables X and Y assuming values in SX and SY is completely described by a joint probability mass function pX,Y : R × R → [0, 1] such that

P(X = x, Y = y) = pX,Y(x, y)

The joint distribution of two continuous random variables X and Y is completely described by a joint probability density function fX,Y : R × R → R+ such that

P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y(u, v) dv du
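For two fair dice the joint pmf is uniform on 36 pairs, and marginals fall out by summing the joint pmf; a sketch in exact arithmetic:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (X, Y) = (first roll, second roll).
joint = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}

# Marginal pmf of X: sum the joint pmf over y.
marginal_x = {i: sum(joint[(i, j)] for j in range(1, 7)) for i in range(1, 7)}

# Joint distribution function value P(X <= 2, Y <= 3).
cdf_2_3 = sum(p for (i, j), p in joint.items() if i <= 2 and j <= 3)
```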
Expectation
For a discrete random variable X, the expectation of X is

E[X] = ∑_{x∈S} x pX(x)

For a continuous random variable Y, the expectation of Y is

E[Y] = ∫_{−∞}^∞ y fY(y) dy
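For a fair die the discrete formula gives E[X] = (1 + 2 + ··· + 6)/6 = 7/2; in code:

```python
from fractions import Fraction

# Fair die: pX(x) = 1/6 for x in {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum of x * pX(x) over the support.
mean = sum(x * px for x, px in pmf.items())
```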
Computation of Expectation
We can also compute the expectation of g(X) and g(Y) as follows:

E[g(X)] = ∑_{x∈S} g(x) pX(x)

and

E[g(Y)] = ∫_{−∞}^∞ g(y) fY(y) dy
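The same pmf lets us compute E[g(X)] without ever finding the distribution of g(X); e.g. E[X²] for a fair die (the helper `expect` is ours):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def expect(g):
    """E[g(X)] = sum of g(x) * pX(x) over the support."""
    return sum(g(x) * px for x, px in pmf.items())

mean_of_square = expect(lambda x: x * x)    # E[X^2] = 91/6
square_of_mean = expect(lambda x: x) ** 2   # (E[X])^2 = 49/4
```

Note E[X²] ≠ (E[X])²; their difference is the variance (later slide).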
Properties of Expectation
- Linearity: E[aX + bY] = aE[X] + bE[Y]
- Monotonicity: X ≤ Y =⇒ E[X] ≤ E[Y]
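Linearity can be verified by brute-force enumeration, e.g. for the two dice rolls X and Y with a = 2, b = −3:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)
E = lambda f: sum(f(w) * p for w in omega)  # expectation by enumeration

a, b = 2, -3
lhs = E(lambda w: a * w[0] + b * w[1])               # E[aX + bY]
rhs = a * E(lambda w: w[0]) + b * E(lambda w: w[1])  # aE[X] + bE[Y]
```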
Probability as an Expectation

[NOTATION] We denote the indicator function of A by IA(·):

IA(ω) = 1 if ω ∈ A, 0 if ω ∉ A

Probability can be written as an expectation:

PX(A) = E IA(X)

More generally,

P(A) = E IA
Summary Statistics
- Mean: E[X]
- Variance: var(X) = E[(X − EX)²] = EX² − (EX)²
- Standard Deviation: σ(X) = √var(X)
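The two variance formulas agree, as a fair-die check shows (var(X) = 35/12 here):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
E = lambda g: sum(g(x) * px for x, px in pmf.items())

mean = E(lambda x: x)
var_centered = E(lambda x: (x - mean) ** 2)     # E[(X - EX)^2]
var_shortcut = E(lambda x: x * x) - mean ** 2   # EX^2 - (EX)^2
```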
Outline
Probability ConceptsProbability SpaceRandom VariablesExpectationConditional Probability and Expectation
Limit TheoremsModes of ConvergenceLaw of Large NumbersCentral Limit Theorem
24
Conditional Probability

The conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B)
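The definition is directly computable by enumeration; e.g. for two dice, the probability that the sum is 8 given that the first die shows 3:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)
P = lambda event: sum(p for w in omega if event(w))

A = lambda w: w[0] + w[1] == 8   # sum is 8
B = lambda w: w[0] == 3          # first die shows 3

# P(A|B) = P(A and B) / P(B)
p_A_given_B = P(lambda w: A(w) and B(w)) / P(B)
```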
Conditional Probability Mass and Density
If X and Y are both discrete random variables with joint probability mass function pX,Y(x, y),

P(X = x|Y = y) = pX|Y(x|y) := pX,Y(x, y) / pY(y)

If X and Y are both continuous random variables with joint density function fX,Y(x, y),

P(a ≤ X ≤ b|Y = y) = ∫_a^b fX|Y(x|y) dx

where

fX|Y(x|y) = fX,Y(x, y) / fY(y)
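A discrete sketch: with X the first roll and Y the sum of two fair dice, the conditional pmf pX|Y follows from the joint pmf by the formula above:

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = first roll and Y = sum of two fair dice.
joint = {}
for i, j in product(range(1, 7), repeat=2):
    key = (i, i + j)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

def cond_pmf(x, y):
    """pX|Y(x|y) = pX,Y(x, y) / pY(y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return joint.get((x, y), Fraction(0)) / p_y
```

For each fixed y, the conditional pmf is itself a pmf: it sums to 1 over x.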
Independence
Two events A and B are independent if
P(A ∩ B) = P(A)P(B)
Two random variables X and Y are independent if
P(X ≤ x,Y ≤ y) = P(X ≤ x)P(Y ≤ y)
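Both definitions can be checked by enumeration on two dice: the two rolls are independent, while a roll and the sum are not. A sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)
P = lambda event: sum(p for w in omega if event(w))

# X = first roll and Y = second roll: independent.
lhs = P(lambda w: w[0] <= 3 and w[1] <= 5)
rhs = P(lambda w: w[0] <= 3) * P(lambda w: w[1] <= 5)

# X and Z = X + Y: not independent.
lhs_z = P(lambda w: w[0] <= 1 and w[0] + w[1] <= 3)
rhs_z = P(lambda w: w[0] <= 1) * P(lambda w: w[0] + w[1] <= 3)
```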
Conditional Expectation
(Discrete Random Variable)
For discrete random variables X and Y, the conditional expectation of X given Y = y is

E[X|Y = y] = ∑_{x∈S} x pX|Y(x|y) = ∑_{x∈S} x P(X = x, Y = y) / P(Y = y)
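Continuing the two-dice example (X = first roll, Y = sum), the formula gives E[X | Y = y] by direct enumeration:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def cond_expect(y):
    """E[X | Y = y] with X = first roll, Y = sum of the two rolls."""
    p_y = sum(p for w in omega if w[0] + w[1] == y)
    return sum(w[0] * p for w in omega if w[0] + w[1] == y) / p_y
```

By symmetry of the two rolls, E[X | Y = 7] = 7/2.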
Conditional Expectation
(Continuous Random Variable)
For continuous random variables X and Y, the conditional expectation of X given Y = y is

E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx = ∫_{−∞}^∞ x fX,Y(x, y) / fY(y) dx
Properties of Conditional Expectation
- Linearity: E[aX + bY|Z] = aE[X|Z] + bE[Y|Z]
- Monotonicity: X ≤ Y =⇒ E[X|Z] ≤ E[Y|Z]
Almost Sure Convergence
Let X1, X2, ... be a sequence of random variables. We say that Xn converges almost surely to X∞ as n → ∞ if

P(Xn → X∞ as n → ∞) = 1

We use the notation Xn →a.s. X∞ to denote almost sure convergence, or convergence with probability 1.
Lp Convergence
[NOTATION] For p > 0, we denote the p-norm of X by ∥X∥p:

∥X∥p := (E|X|^p)^(1/p)

Let X1, X2, ... be a sequence of random variables. For p > 0, we say that Xn converges to X∞ in pth mean if

∥Xn − X∞∥p → 0

as n → ∞.

We use the notation Xn →Lp X∞ to denote convergence in pth mean, or Lp convergence.
Convergence in Probability
Let X1, X2, ... be a sequence of random variables. We say that Xn converges in probability to X∞ if for each ϵ > 0,

P(|Xn − X∞| > ϵ) → 0

as n → ∞.

We use the notation Xn →p X∞ to denote convergence in probability.
Weak Convergence
Let X1, X2, ... be a sequence of random variables. We say that Xn converges weakly to X∞ if

P(Xn ≤ x) → P(X∞ ≤ x)

as n → ∞ for each x at which P(X∞ ≤ ·) is continuous.

We use the notation Xn ⇒ X∞ or Xn →D X∞ to denote weak convergence, or convergence in distribution.
Implications
Almost sure convergence =⇒ convergence in probability
Lp convergence =⇒ convergence in probability
Convergence in probability =⇒ weak convergence
Weak Law of Large Numbers
Theorem (Weak Law of Large Numbers)
Suppose that X1, X2, ... is a sequence of i.i.d. r.v.-s such that E|X1| < ∞. Then,

(1/n)(X1 + ··· + Xn) →p EX1 as n → ∞
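A simulation of the sample mean of Unif(0, 1) draws (here EX1 = 1/2) illustrates the law:

```python
import random

def sample_mean(n, seed=0):
    """Mean of n i.i.d. Unif(0, 1) draws; EX1 = 1/2."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# The sample mean concentrates around 1/2 as n grows.
means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
```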
Strong Law of Large Numbers
Theorem (Strong Law of Large Numbers)
Suppose that X1, X2, ... is a sequence of i.i.d. r.v.-s such that EX1 exists. Then,

(1/n)(X1 + ··· + Xn) →a.s. EX1 as n → ∞
Central Limit Theorem
Theorem
Suppose that the Xi's are i.i.d. r.v.-s with common finite variance σ². Then, if Sn = X1 + ··· + Xn,

(Sn − nE[X1]) / √n ⇒ σ N(0, 1)

as n → ∞.

From here, we can deduce the following approximation:

(1/n) Sn − E[X1] ≈D (σ/√n) N(0, 1)
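A simulation sketch: standardize the sum of n = 30 Unif(0, 1) draws (EX1 = 1/2, σ² = 1/12) and compare with the standard normal, which puts mass ≈ 0.6827 in [−1, 1]:

```python
import random
from math import sqrt

def standardized_sum(n, rng):
    """(Sn - n*EX1) / (sigma * sqrt(n)) for i.i.d. Unif(0, 1) draws."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / (sqrt(1 / 12) * sqrt(n))

rng = random.Random(0)
samples = [standardized_sum(30, rng) for _ in range(20_000)]

# By the CLT this fraction should be close to P(|N(0,1)| <= 1).
frac_within_1 = sum(abs(z) <= 1 for z in samples) / len(samples)
```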