Lecture 18: Central Limit Theorem - Stanford...
Lecture 18: Central Limit Theorem
Lisa Yan, August 6, 2018
Announcements
PS5 due today
◦ Pain poll
PS6 out today
◦ Due next Monday 8/13 (1:30pm) (will not be accepted after Wed 8/15)
◦ Programming part: Java, C, C++, or Python
Lecture Notes published for all material
Summary from last time
P(X ≥ a) ≤ E[X]/a for all a > 0, if X ≥ 0 (Markov's inequality)
P(|X − µ| ≥ k) ≤ σ²/k² for all k > 0, if E[X] = µ, Var(X) = σ² (Chebyshev's inequality)
P(X ≥ a) ≤ σ²/(σ² + a²) for all a > 0, if E[X] = 0, Var(X) = σ² (one-sided Chebyshev's inequality)
E[g(X)] ≥ g(E[X]) if g″(x) ≥ 0 for all x (Jensen's inequality)
(and others)
Bounds on sample mean
Consider a sample of n i.i.d. random variables X1, X2, …, Xn, where each Xk has distribution F with E[Xk] = µ and Var(Xk) = σ²

Sample mean: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ

E[X̄] = µ, Var(X̄) = σ²/n
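These identities are easy to sanity-check by simulation. Below is a standard-library Python sketch (not from the lecture), using a fair 6-sided die, for which µ = 3.5 and σ² = 35/12:

```python
import random
from statistics import mean, pvariance

# Empirical check of E[Xbar] = mu and Var(Xbar) = sigma^2/n,
# using rolls of a fair die: mu = 3.5, sigma^2 = 35/12 ~ 2.917.
random.seed(0)
n, trials = 25, 20_000
sample_means = [mean(random.randint(1, 6) for _ in range(n))
                for _ in range(trials)]

print(mean(sample_means))       # should be close to mu = 3.5
print(pvariance(sample_means))  # should be close to (35/12)/25 ~ 0.1167
```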
Summary from last time
Weak Law of Large Numbers: lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0
→ "probability goes to zero in the limit"
→ Will still have a deviation in the limit (at most ε)
→ We have an infinite number of non-zero probabilities

Strong Law of Large Numbers: P(lim_{n→∞} X̄ = µ) = 1
→ Implies WLLN
→ After some number n, P(|X̄ − µ| ≥ ε) = 0
→ We have a finite number of non-zero probabilities
Where are we going with all of this?

As engineers, we want to:
1. Model things that can be random. (RVs)
2. Find average values of situations that let us make good decisions. (expectation)

We perform multiple experiments to help us with engineering.
Counts and averages are both sums of RVs.
Weeks 4 and 5:
Learning how to model multiple RVs
This week (week 6):
• Finding expectation of complex situations
• Defining sampling and using existing data
• Mathematical bounds w.r.t. modeling the average

Week 7: bringing it all together
• Sampling + average = Central Limit Theorem
• Samples + modeling = finding the best model parameters given data
Goals for today
Central Limit Theorem!
◦ Confidence intervals, re-defined
◦ Estimates for sums of IID RVs
Introduction to parameter estimation
Central Limit Theorem

If E[Xᵢ] = µ and Var(Xᵢ) = σ²:

Sample means of IID RVs are normally distributed.
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(µ, σ²/n)

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
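A quick empirical illustration (a Python sketch, not part of the slides): standardize sums of IID Exponential(1) RVs, which have µ = σ = 1, and compare the empirical CDF to Φ:

```python
import random
from math import sqrt
from statistics import NormalDist

# CLT in action: standardized sums of n IID Exponential(1) draws
# (mu = sigma = 1) behave like N(0, 1) for moderately large n.
random.seed(0)
n, trials = 50, 20_000
zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n) / sqrt(n)
      for _ in range(trials)]

# Empirical CDF at z = 1 should be near Phi(1) ~ 0.8413.
frac = sum(z <= 1 for z in zs) / trials
print(frac, NormalDist().cdf(1))
```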
CLT explains a lot
[Figure: PMF P(X = x) of X ∼ Bin(n = 200, p = 0.6) for x from 70 to 150, overlaid with its normal approximation. The normal approximation to the binomial is good when np(1−p) > 10.]
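The figure's claim can be checked numerically. This Python sketch (not from the slides; the cutoff x = 110 is an arbitrary choice for illustration) compares the exact binomial CDF with the continuity-corrected normal approximation:

```python
from math import comb, sqrt
from statistics import NormalDist

# Normal approximation to X ~ Bin(n=200, p=0.6); here np(1-p) = 48 > 10.
n, p = 200, 0.6
mu, sd = n * p, sqrt(n * p * (1 - p))   # mu = 120, sd ~ 6.93

# Exact P(X <= 110) vs. continuity-corrected normal P(Y <= 110.5)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(111))
approx = NormalDist(mu, sd).cdf(110.5)
print(exact, approx)
```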
A short history of the CLT
1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre
1823: Pierre-Simon Laplace extends de Moivre's work to approximating Bin(n,p) with Normal
1901: Alexandr Lyapunov provides precise definition and rigorous proof of CLT
2012: Drake releases "Started from the Bottom"
• As of July 4th, he has a total of 190 songs
• Mean quality of subsamples of songs is normally distributed
(thanks to the Central Limit Theorem)
Implications of CLT
Anything that is a sum/average of many independent random variables is approximately normal
…meaning in real life, many things are normally distributed:
◦ Exam scores: sum of individual problems
◦ Movie ratings: averages of independent viewer scores
◦ Polling:
  ◦ Ask 100 people if they will vote for candidate X
  ◦ p1 = # "yes"/100
  ◦ Sum of Bernoulli RVs (each person independently says "yes" w.p. p)
  ◦ Repeat this process with different groups to get p1, …, pn (different sample statistics)
  ◦ Normal distribution over sample means pk
  ◦ Confidence interval: "How likely is it that an estimate for true p is close?"
No more convolution when n → ∞!

For independent discrete random variables X and Y, and Z = X + Y,
p_Z(z) = P(X + Y = z) = Σₓ p_X(x)·p_Y(z − x) = Σ_y p_X(z − y)·p_Y(y)
p_Z is defined as the convolution of p_X and p_Y.

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
(caveat: must be independent and identically distributed)
CLT vs. Water speed stat

Water Pokemon speed statistic, Xᵢ:
• E[Xᵢ] = 65, Var(Xᵢ) = 510
• 126 Water Pokemon
• Let Y = average of 50 Water Pokemon. What is the distribution of Y?

Solution:
Y = X̄ = (1/50) Σᵢ₌₁⁵⁰ Xᵢ
E[Y] = E[Xᵢ] = 65
Var(Y) = Var(Xᵢ)/50 = 510/50 = 10.2
By CLT, X̄ ∼ N(µ, σ²/n), so Y ∼ N(65, 10.2)
Confidence interval, fully defined
Consider a sample of IID RVs X1, X2, …, Xn, where n is large,
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ, Var(X̄) = σ²/n
S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)², SE = S/√n

def For large n, a 100(1–α)% confidence interval of µ is defined as:
(X̄ − z_{α/2}·SE, X̄ + z_{α/2}·SE), i.e., X̄ ± z_{α/2}·(SE)
where Φ(z_{α/2}) = 1 − (α/2).

ex: 95% confidence interval: X̄ ± 1.96·(SE)
α = 0.05, α/2 = 0.025, Φ(z_{α/2}) = 0.975, z_{α/2} = 1.96
In other words: 95% of the time that we sample, the true µ will be in the interval X̄ ± 1.96·(SE)
NOT: µ is 95% likely to be in this particular interval.
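The definition translates directly to code. A Python sketch (not from the slides; `confidence_interval` is a hypothetical helper name), using `statistics.NormalDist` for Φ⁻¹:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# A 100(1-alpha)% CI for mu is Xbar +/- z_{alpha/2} * SE,
# where Phi(z_{alpha/2}) = 1 - alpha/2 and SE = S / sqrt(n).
def confidence_interval(xs, alpha=0.05):
    n = len(xs)
    xbar = mean(xs)
    se = stdev(xs) / sqrt(n)                 # SE = S / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return (xbar - z * se, xbar + z * se)

z95 = NormalDist().inv_cdf(0.975)
print(round(z95, 2))  # 1.96, matching the 95% example above
```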
SE with confidence interval

Idle CPUs are the bane of our existence.
• A large company monitors 225 computers (single CPU) for idle hours in order to estimate the average number of idle hours per CPU.
• Sample had X̄ = 11.6 hrs, S² = 16.81 hrs² (so S = 4.1 hrs)
Give the 90% confidence interval for the estimate of µ, mean idle hrs/CPU.

Solution:
α = 0.10, α/2 = 0.05, Φ(z_{α/2}) = 0.95, z_{α/2} = 1.645
Standard error: SE = S/√n = 4.1/√225 = 4.1/15 ≈ 0.273
90% confidence interval: X̄ ± z_{α/2}·(SE) → 11.6 ± 1.645·(4.1/15) → (11.15, 12.05)
Interpret: 90% of the time that such an interval is computed, the true µ is in it.
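The same computation in Python (a sketch, not from the slides), reproducing (11.15, 12.05):

```python
from math import sqrt
from statistics import NormalDist

# Idle-CPU example: n = 225, Xbar = 11.6, S^2 = 16.81, 90% CI.
n, xbar, s2 = 225, 11.6, 16.81
se = sqrt(s2) / sqrt(n)           # 4.1 / 15 ~ 0.273
z = NormalDist().inv_cdf(0.95)    # alpha = 0.10 -> z_{alpha/2} ~ 1.645
lo, hi = xbar - z * se, xbar + z * se
print(round(lo, 2), round(hi, 2)) # 11.15 12.05
```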
Break
Attendance: tinyurl.com/cs109summer2018
Sampling distribution
CLT tells us the sampling distribution of X̄ is approx. normal when n is large.
"large" n: n > 30 (larger is better: n > 100)
We can also use CLT to decide values of n for good confidence intervals
http://onlinestatbook.com/stat_sim/sampling_dist/
Estimating clock running time

Want to test the runtime of an algorithm.
• Run the algorithm n times and measure the average time
• Know the variance is σ² = 4 sec²
• Want to estimate the mean runtime: µ = t
How large should n be so that P(average time is within 0.5 of t) = 0.95?

Solution:
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(t, σ²/n) (by Central Limit Theorem; SD: σ/√n)
P(t − 0.5 ≤ X̄ ≤ t + 0.5) = 0.95
= P(−0.5 ≤ X̄ − t ≤ 0.5)          (X̄ − t) ∼ N(0, σ²/n)
= P(−0.5√n/σ ≤ Z ≤ 0.5√n/σ)      where Z = (X̄ − t)/(σ/√n) ∼ N(0, 1)
Estimating clock running time (continued)

Solution (with σ = 2, so 0.5√n/σ = √n/4):
P(t − 0.5 ≤ X̄ ≤ t + 0.5) = P(−0.5 ≤ X̄ − t ≤ 0.5) = P(−√n/4 ≤ Z ≤ √n/4) = 0.95
= Φ(√n/4) − Φ(−√n/4) = Φ(√n/4) − (1 − Φ(√n/4)) = 2Φ(√n/4) − 1 = 0.95
⇒ Φ(√n*/4) = 0.975. Solve for n*: √n*/4 = 1.96 ⇒ √n* = 7.84 ⇒ n* = 7.84² ≈ 62

What say you, Chebyshev?
P(|X̄ − t| ≤ 0.5) ≥ 0.95 ⇒ P(|X̄ − t| ≥ 0.5) ≤ 0.05
Chebyshev: P(|X̄ − t| ≥ 0.5) ≤ Var(X̄)/0.5² = (4/n)/0.25 = 16/n ≤ 0.05 ⇒ n ≥ 320
Thanks for playing, Pafnuty!
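Both sample-size answers can be reproduced with a short Python sketch (not from the slides; σ² = 4 as given in the problem):

```python
from math import ceil
from statistics import NormalDist

# Sample size via CLT vs. via Chebyshev, with sigma = 2,
# half-width eps = 0.5, and confidence 0.95.
sigma, eps, conf = 2.0, 0.5, 0.95

# CLT: 2*Phi(eps*sqrt(n)/sigma) - 1 >= conf  =>  sqrt(n) >= z * sigma / eps
z = NormalDist().inv_cdf((1 + conf) / 2)         # ~ 1.96
n_clt = ceil((z * sigma / eps) ** 2)             # -> 62

# Chebyshev: P(|Xbar - t| >= eps) <= sigma^2 / (n * eps^2) <= 1 - conf
n_cheb = ceil(sigma**2 / (eps**2 * (1 - conf)))  # -> 320
print(n_clt, n_cheb)
```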
The power of the central limit theorem

Sample means of IID RVs are normally distributed.
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(µ, σ²/n)
• Can estimate confidence intervals
• Can give probabilities on the sample mean

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
• Can estimate any sum of IID RVs (if µ, σ² known) ← our next goal
Crashing website

Let X = number of visitors to a website/minute: X ∼ Poi(100)
• The server crashes if ≥ 120 requests/minute.
What is P(crash in next minute)?

Solution 1: (exact, using Poisson)
P(X ≥ 120) = Σᵢ₌₁₂₀^∞ e^(−100)·100^i / i! ≈ 0.0282

Solution 2: (using CLT)
Recall: a sum of IID Poissons is Poisson; CLT says sums are normally distributed (as n → ∞).
Let Poi(100) ∼ Σᵢ₌₁ⁿ Xᵢ where Xᵢ ∼ Poi(100/n), so Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²) = N(100, 100).
Using CLT on a discrete sum → continuity correction:
P(X ≥ 120) ≈ P(Y ≥ 119.5) = P((Y − 100)/√100 ≥ (119.5 − 100)/√100) = 1 − Φ(1.95) ≈ 0.0256

Attempted solution 3: (using one-sided Chebyshev)
P(X ≥ 120) = P(X ≥ E[X] + 20) ≤ σ²/(σ² + 20²) = 100/(100 + 20²) = 0.2
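A Python sketch (not from the slides) reproducing solutions 1 and 2:

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

# Solution 1 (exact Poisson tail) vs. Solution 2 (CLT + continuity correction).
lam = 100

# Exact: P(X >= 120) = 1 - P(X <= 119) for X ~ Poi(100)
exact = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(120))

# CLT: X is approx. N(100, 100); continuity correction uses 119.5
approx = 1 - NormalDist(lam, sqrt(lam)).cdf(119.5)
print(round(exact, 4), round(approx, 4))
```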
Summary: Approximations

Binomial: n trials, P(success) = p, independent trials
→ Normal (continuity correction), when p is medium and np(1–p) > 10, independent trials
→ Poisson, when p < 0.05, n > 20, and/or slight dependence

Poisson: a Poisson is a sum of IID Poissons where n is arbitrarily large
→ Normal (by Central Limit Theorem, continuity correction)

Sum of IID RVs with known E[X], Var(X)
→ Normal (by Central Limit Theorem, continuity correction)
Another dice game

You will roll 10 6-sided dice (X1, X2, …, X10)
Let: X = total value of all 10 dice = X1 + X2 + … + X10
Win if: X ≤ 25 or X ≥ 45. What is P(win)?
Roll!!!!!!!
And now the truth (according to CLT)…
Solution:
Want: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45)
By CLT, X = Σᵢ₌₁¹⁰ Xᵢ is approx. N(nµ, nσ²), where E[Xᵢ] = µ = 3.5 and Var(Xᵢ) = σ² = 35/12
X ∼ N(10·3.5 = 35, 10·(35/12) = 350/12)
With continuity correction:
1 − P(25 < X < 45) ≈ 1 − P(25.5 ≤ X ≤ 44.5)
= 1 − P((25.5 − 35)/√(350/12) ≤ Z ≤ (44.5 − 35)/√(350/12))
≈ 1 − (2Φ(1.76) − 1) ≈ 2(1 − 0.9608) = 0.0784
Recap: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45) = 0.0784
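The CLT answer can be cross-checked by Monte Carlo simulation; a Python sketch (not from the slides):

```python
import random
from math import sqrt
from statistics import NormalDist

# Monte Carlo estimate of P(win) vs. the CLT value for 10 dice.
random.seed(0)
trials = 200_000
wins = 0
for _ in range(trials):
    total = sum(random.randint(1, 6) for _ in range(10))
    wins += (total <= 25 or total >= 45)
mc = wins / trials

# CLT: X approx. N(35, 350/12), with continuity correction
nd = NormalDist(35, sqrt(350 / 12))
clt = 1 - (nd.cdf(44.5) - nd.cdf(25.5))
print(round(mc, 3), round(clt, 4))
```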
Finding Statistics

What do you know?               What statistic can you find?
RVs: distribution with parameters → anything your heart desires
Sample of population            → bootstrapping: estimates of E[X] or Var(X), confidence interval, p-values
E[X] or Var(X)                  → probability bounds
Sum of IID RVs?                 → CLT
Now…

What do you know?               What statistic can you find?
Distribution with parameters    → anything your heart desires (RVs)

What if you don't know the parameters of your distribution?
Parameter estimation

def parametric model – a probability distribution that has parameters θ.

Model        Distribution                  Parameters θ
Bernoulli    Ber(p)                        θ = p
Poisson      Poi(λ)                        θ = λ
Multinomial  Multinomial(p1, p2, …, pm)    θ = (p1, p2, …, pm)
Uniform      Unif(α, β)                    θ = (α, β)
Normal       Normal(µ, σ²)                 θ = (µ, σ²)
etc.

Note θ can be a vector of parameters!
Why do we care?

"Point estimate" of parameter: find the best single value for the parameter estimate (as opposed to a distribution of θ̂)
• Better understanding of the process producing the data
• Future predictions based on the model
• Simulation of future processes

def Parameter estimation
1. We observe data that has a known model with unknown parameter θ.
2. Use the data to estimate the model parameters as θ̂.
Note: estimators θ̂ are RVs estimating parameter θ
(e.g., sample mean: θ̂ = X̄ estimates population mean θ = µ)

Machine Learning: learn a model of the system → do stuff with the model
Estimator bias

def bias of estimator: E[θ̂] − θ
• "unbiased" estimator: bias = 0

Example: Sample Mean
Estimator: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    Estimating: µ
Bias: E[X̄] = µ (unbiased estimator)

Example 2: Sample Variance
Estimator: S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²    Estimating: σ²
Bias: E[S²] = σ² (unbiased estimator)
Estimator consistency

def consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for ε > 0
• As we get more data, the estimate θ̂ should deviate from the true θ by at most a small amount.
• Actually known as "weak" consistency

Example: Sample Mean
Estimator: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    Estimating: µ
By strong law of large numbers (SLLN): P(lim_{n→∞} X̄ = µ) = 1
Implying weak law of large numbers (WLLN): lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0 for ε > 0
Equivalently: lim_{n→∞} P(|X̄ − µ| < ε) = 1 (consistent estimator)
Sample variance consistency

def biased estimator: E[θ̂] − θ ≠ 0
def consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for ε > 0
As n → ∞, the estimate θ̂ should deviate from the true θ by at most a small amount.

Are the two estimators below for variance σ² biased? Consistent?

1. S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² (sample variance)
unbiased: E[S²] = σ²
consistent: S² = (n/(n−1)) [ (1/n) Σᵢ₌₁ⁿ Xᵢ² − X̄² ],
P(lim_{n→∞} (1/n) Σᵢ₌₁ⁿ Xᵢ² = E[X²]) = 1, P(lim_{n→∞} X̄² = µ²) = 1 (by SLLN, Xᵢ i.i.d.)

2. A = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
biased: E[A] = E[((n−1)/n)·S²] = ((n−1)/n)·σ²
consistent: for similar reasons as #1, where now lim_{n→∞} (n−1)/n = 1
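The bias result is easy to see by simulation. A Python sketch (not from the slides), drawing small normal samples and comparing the two variance estimators:

```python
import random
from statistics import mean

# Compare the two variance estimators on small samples (n = 5) of
# N(0, 1) draws, whose true variance is sigma^2 = 1.
random.seed(0)
n, trials = 5, 50_000

s2_unbiased, s2_biased = [], []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased.append(ss / (n - 1))  # divide by n-1: E[S^2] = sigma^2
    s2_biased.append(ss / n)          # divide by n: E[A] = (n-1)/n * sigma^2

print(mean(s2_unbiased))  # close to sigma^2 = 1.0
print(mean(s2_biased))    # close to (n-1)/n * sigma^2 = 0.8
```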