Probability basics - Home Page | NYU Courantcfgranda/pages/DSGA1002... · Probability basics DS GA...
Transcript of Probability basics - Home Page | NYU Courantcfgranda/pages/DSGA1002... · Probability basics DS GA...
Probability basics
DS GA 1002 Probability and Statistics for Data Sciencehttp://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda
Probability spaces
Conditional probability
Independence
General approach
Probabilistic modeling
1. Model phenomenon of interest as an experiment with several (possiblyinfinite) mutually exclusive outcomes
2. Group these outcomes in sets called events
3. Assign probabilities to the different events
Probability space
A probability space is a triple (Ω,F ,P) consisting of
I A sample space Ω, which contains all possible outcomes of theexperiment
I A set of events F , which must be a σ-algebra
I A probability measure P that assigns probabilities to the events in F
Sample space
Sample spaces can be
I Discrete: coin toss, score of a basketball game, number of people thatshow up at a party . . .
I Continuous: intervals of R or Rn used to model time, position,temperature, . . .
σ-algebra
A σ-algebra F is a collection of sets in Ω such that
1. If a set S ∈ F then Sc ∈ F
2. If the sets S1,S2 ∈ F , then S1 ∪ S2 ∈ FAlso infinite sequences; if S1,S2, . . . ∈ F then ∪∞i=1Si ∈ F
3. Ω ∈ F
Basketball game
I Cleveland Cavaliers are playing the Golden State Warriors
I Sample space
Ω := Cavs 1−Warriors 0,Cavs 0−Warriors 1, . . . ,Cavs 101−Warriors 97, . . ..
I Several possible σ-algebras
I If we want high granularity we can choose the power set of scores
I If we only care who wins
F := Cavs win,Warriors win,Cavs or Warriors win, ∅
Probability measure
Function over the sets in F such that
1. P (S) ≥ 0 for any event S ∈ F
2. If S1,S2 ∈ F are disjoint then
P (S1 ∪ S2) = P (S1) + P (S2)
Also countably infinite sequences of disjoint sets: S1,S2, . . . ∈ F
P(limn→∞
∪ni=1Si
)= lim
n→∞
n∑i=1
P (Si )
3. P (Ω) = 1
Properties of a probability measure
I P (∅) = 0
I If A ⊆ B then P (A) ≤ P (B)
I P (A ∪ B) = P (A) + P (B)− P (A ∩ B)
I To simplify notation, we set
P (A,B,C ) := P (A ∩ B ∩ C )
Important
I Probability measure only assigns probabilities to events in theσ-algebra
I Simpler σ-algebras can make our life easy
P (Cavs win) =12
P (Warriors win) =12
P (Cavs or Warriors win) = 1
P (∅) = 0
Probability spaces
Conditional probability
Independence
Definition
The conditional probability of an event S ′ ∈ F given S is
P(S ′|S
):=
P (S ′ ∩ S)
P (S)
P (·|S) is a valid probability measure
To simplify notation, we set
P (S |A,B,C ) := P (S |A ∩ B ∩ C )
Example: Flights and rain
Probabilistic model for late arrivals at an airport
Ω = late and rain, late and no rain,on time and rain, on time and no rain
F = power set of Ω,
P (late, no rain) =220, P (on time, no rain) =
1420,
P (late, rain) =320, P (on time, rain) =
120
Example: Flights and rain
P (late|rain)
=P (late, rain)
P (rain)
=P (late, rain)
P (late, rain) + P (on time, rain)
=34
Example: Flights and rain
P (late|rain) =P (late, rain)
P (rain)
=P (late, rain)
P (late, rain) + P (on time, rain)
=34
Example: Flights and rain
P (late|rain) =P (late, rain)
P (rain)
=P (late, rain)
P (late, rain) + P (on time, rain)
=34
Example: Flights and rain
P (late|rain) =P (late, rain)
P (rain)
=P (late, rain)
P (late, rain) + P (on time, rain)
=34
Chain rule
For any pair of events A and B
P (A ∩ B) = P (A) P (B|A) = P (B) P (A|B)
For any sequence of events S1,S2, S3, . . .
P (∩iSi ) = P (S1) P (S2|S1) P (S3|S1 ∩ S2) . . .
=∏i
P(Si | ∩i−1
j=1 Sj
)
Law of Total Probability
If A1,A2, . . . ∈ F is a partition of Ω
I Ai and Aj are disjoint if i 6= j
I Ω = ∪iAi
For any set S ∈ F
P (S) =∑i
P (S ∩ Ai ) =∑i
P (Ai ) P (S |Ai )
Example: Flights and rain (continued)
P (rain) = 0.2P (late|rain) = 0.75
P (late|no rain) = 0.125
P (late)
= P (rain) P (late|rain) + P (no rain) P (late|no rain)
= 0.2 · 0.75 + 0.8 · 0.125 = 0.25
Example: Flights and rain (continued)
P (rain) = 0.2P (late|rain) = 0.75
P (late|no rain) = 0.125
P (late) = P (rain) P (late|rain) + P (no rain) P (late|no rain)
= 0.2 · 0.75 + 0.8 · 0.125 = 0.25
Example: Flights and rain (continued)
P (rain) = 0.2P (late|rain) = 0.75
P (late|no rain) = 0.125
P (late) = P (rain) P (late|rain) + P (no rain) P (late|no rain)
= 0.2 · 0.75 + 0.8 · 0.125 = 0.25
Important!
P (A|B) 6= P (B|A)
Bayes’ Rule
Let A1,A2, . . . ∈ F be a partition of Ω
For any set S ∈ F
P (Ai |S) =P (Ai ) P (S |Ai )∑j P (S |Aj) P (Aj)
Example: Flights and rain (continued)
P (rain) = 0.2P (late|rain) = 0.75P (late|no rain) = 0.125
P (rain|late) ?
Example: Flights and rain (continued)
P (rain|late)
=P (rain, late)
P (late)
=P (late|rain) P (rain)
P (late|rain) P (rain) + P (late|no rain) P (no rain)
=0.75 · 0.2
0.75 · 0.2 + 0.125 · 0.8= 0.6
Example: Flights and rain (continued)
P (rain|late) =P (rain, late)
P (late)
=P (late|rain) P (rain)
P (late|rain) P (rain) + P (late|no rain) P (no rain)
=0.75 · 0.2
0.75 · 0.2 + 0.125 · 0.8= 0.6
Example: Flights and rain (continued)
P (rain|late) =P (rain, late)
P (late)
=P (late|rain) P (rain)
P (late|rain) P (rain) + P (late|no rain) P (no rain)
=0.75 · 0.2
0.75 · 0.2 + 0.125 · 0.8= 0.6
Example: Flights and rain (continued)
P (rain|late) =P (rain, late)
P (late)
=P (late|rain) P (rain)
P (late|rain) P (rain) + P (late|no rain) P (no rain)
=0.75 · 0.2
0.75 · 0.2 + 0.125 · 0.8= 0.6
Probability spaces
Conditional probability
Independence
Definition
Two sets A,B are independent if
P (A|B) = P (A)
or equivalently
P (A ∩ B) = P (A) P (B)
Conditional independence
A,B are conditionally independent given C if
P (A|B,C ) = P (A|C )
where P (A|B,C ) := P (A|B ∩ C ), or equivalently
P (A ∩ B|C ) = P (A|C ) P (B|C )
Conditional independence does not imply independence
Probabilistic model for taxi availability, flight delay and weather
P (rain) = 0.2P (late|rain) = 0.75P (late|no rain) = 0.125P (taxi|rain) = 0.1P (taxi|no rain) = 0.6
Given rain and no rain, late and taxi are conditionally independent
Are they also independent? P (taxi) = P (taxi|late)?
Conditional independence does not imply independence
P (taxi)
= P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late)
=P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Conditional independence does not imply independence
P (taxi) = P (taxi|rain) P (rain) + P (taxi|no rain) P (no rain)
= 0.1 · 0.2 + 0.6 · 0.8 = 0.5
P (taxi|late) =P (taxi, late)
P (late)
=P (taxi, late, rain) + P (taxi, late, no rain)
P (late)
=P (t|l, r) P (l|r) P (r) + P (t|l, no r) P (l|no r) P (no r)
P (l)
=P (taxi|r) P (late|r) P (r) + P (taxi|no r) P (late|no r) P (no r)
P (late)
=0.1 · 0.75 · 0.2 + 0.6 · 0.125 · 0.8
0.25= 0.3
They are not independent
Independence does not imply conditional independence
Probabilistic model for mechanical problems, weather and delays
P (rain) = 0.2P (late|rain) = 0.75P (late|no rain) = 0.125P (problem) = 0.1P (late|problem) = 0.7P (late|no problem) = 0.2P (late|no rain, problem) = 0.5
problem and no rain are independent
Are they also conditionally independent given late?P (problem|late, no rain) = P (problem|late) ?
Independence does not imply conditional independence
P (problem|late)
=P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain)
=P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4
Independence does not imply conditional independence
P (problem|late) =P (late, problem)
P (late)
=P (late|p) P (p)
P (late|p) P (p) + P (late|no p) P (no p)
=0.7 · 0.1
0.7 · 0.1 + 0.2 · 0.9= 0.28
P (problem|late, no rain) =P (late, no rain, problem)
P (late, no rain)
=P (late|no rain, p) P (no rain|p) P (p)
P (late|no rain) P (no rain)
=P (late|no rain, problem) P (no rain) P (problem)
P (late|no rain) P (no rain)
=0.5 · 0.10.125
= 0.4