Lecture 18: Central Limit Theorem - Stanford...
Lecture 18: Central Limit Theorem
Lisa Yan, August 6, 2018
Announcements
PS5 due today
◦ Pain poll
PS6 out today
◦ Due next Monday 8/13 (1:30pm) (will not be accepted after Wed 8/15)
◦ Programming part: Java, C, C++, or Python
Lecture Notes published for all material
Summary from last time
P(X ≥ a) ≤ E[X]/a for all a > 0, if X ≥ 0 (Markov's inequality)
P(|X − µ| ≥ k) ≤ σ²/k² for all k > 0, if E[X] = µ, Var(X) = σ² (Chebyshev's inequality)
P(X ≥ a) ≤ σ²/(σ² + a²) for all a > 0, if E[X] = 0, Var(X) = σ² (one-sided Chebyshev's inequality)
E[g(X)] ≥ g(E[X]) if g″(x) ≥ 0 for all x (Jensen's inequality)
(and others)
Bounds on sample mean
Consider a sample of n i.i.d. random variables X1, X2, …, Xn, where each Xk has distribution F with E[Xk] = µ and Var(Xk) = σ²

Sample mean: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ

E[X̄] = µ, Var(X̄) = σ²/n
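These identities are easy to sanity-check by simulation. Below is a standard-library Python sketch (not from the lecture), using a fair 6-sided die, for which µ = 3.5 and σ² = 35/12:

```python
import random
from statistics import mean, pvariance

# Empirical check of E[Xbar] = mu and Var(Xbar) = sigma^2/n,
# using rolls of a fair die: mu = 3.5, sigma^2 = 35/12 ~ 2.917.
random.seed(0)
n, trials = 25, 20_000
sample_means = [mean(random.randint(1, 6) for _ in range(n))
                for _ in range(trials)]

print(mean(sample_means))       # should be close to mu = 3.5
print(pvariance(sample_means))  # should be close to (35/12)/25 ~ 0.1167
```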
Summary from last time
Weak Law of Large Numbers: lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0
→ "probability goes to zero in the limit"
→ Will still have a deviation in the limit (at most ε)
→ We have an infinite number of non-zero probabilities

Strong Law of Large Numbers: P(lim_{n→∞} X̄ = µ) = 1
→ Implies WLLN
→ After some number n, P(|X̄ − µ| ≥ ε) = 0
→ We have a finite number of non-zero probabilities
Where are we going with all of this?

As engineers, we want to:
1. Model things that can be random. (RVs)
2. Find average values of situations that let us make good decisions. (expectation)

We perform multiple experiments to help us with engineering.
Counts and averages are both sums of RVs.
Weeks 4 and 5:
Learning how to model multiple RVs
This week (week 6):
• Finding expectation of complex situations
• Defining sampling and using existing data
• Mathematical bounds w.r.t. modeling the average

Week 7: bringing it all together
• Sampling + average = Central Limit Theorem
• Samples + modeling = finding the best model parameters given data
Goals for today
Central Limit Theorem!
◦ Confidence intervals, re-defined
◦ Estimates for sums of IID RVs
Introduction to parameter estimation
Central Limit Theorem

If E[Xᵢ] = µ and Var(Xᵢ) = σ²:

Sample means of IID RVs are normally distributed.
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(µ, σ²/n)

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
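A quick empirical illustration (a Python sketch, not part of the slides): standardize sums of IID Exponential(1) RVs, which have µ = σ = 1, and compare the empirical CDF to Φ:

```python
import random
from math import sqrt
from statistics import NormalDist

# CLT in action: standardized sums of n IID Exponential(1) draws
# (mu = sigma = 1) behave like N(0, 1) for moderately large n.
random.seed(0)
n, trials = 50, 20_000
zs = [(sum(random.expovariate(1.0) for _ in range(n)) - n) / sqrt(n)
      for _ in range(trials)]

# Empirical CDF at z = 1 should be near Phi(1) ~ 0.8413.
frac = sum(z <= 1 for z in zs) / trials
print(frac, NormalDist().cdf(1))
```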
CLT explains a lot
[Figure: PMF P(X = x) of X ∼ Bin(n = 200, p = 0.6) for x from 70 to 150, overlaid with its normal approximation. The normal approximation to the binomial is good when np(1−p) > 10.]
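The figure's claim can be checked numerically. This Python sketch (not from the slides; the cutoff x = 110 is an arbitrary choice for illustration) compares the exact binomial CDF with the continuity-corrected normal approximation:

```python
from math import comb, sqrt
from statistics import NormalDist

# Normal approximation to X ~ Bin(n=200, p=0.6); here np(1-p) = 48 > 10.
n, p = 200, 0.6
mu, sd = n * p, sqrt(n * p * (1 - p))   # mu = 120, sd ~ 6.93

# Exact P(X <= 110) vs. continuity-corrected normal P(Y <= 110.5)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(111))
approx = NormalDist(mu, sd).cdf(110.5)
print(exact, approx)
```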
A short history of the CLT
1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre
1823: Pierre-Simon Laplace extends de Moivre's work to approximating Bin(n,p) with Normal
1901: Alexandr Lyapunov provides precise definition and rigorous proof of CLT
2012: Drake releases "Started from the Bottom"
• As of July 4th, he has a total of 190 songs
• Mean quality of subsamples of songs is normally distributed
(thanks to the Central Limit Theorem)
Implications of CLT
Anything that is a sum/average of many independent random variables is approximately normal
…meaning in real life, many things are normally distributed:
◦ Exam scores: sum of individual problems
◦ Movie ratings: averages of independent viewer scores
◦ Polling:
  ◦ Ask 100 people if they will vote for candidate X
  ◦ p1 = # "yes"/100
  ◦ Sum of Bernoulli RVs (each person independently says "yes" w.p. p)
  ◦ Repeat this process with different groups to get p1, …, pn (different sample statistics)
  ◦ Normal distribution over sample means pk
  ◦ Confidence interval: "How likely is it that an estimate for true p is close?"
No more convolution when n → ∞!

For independent discrete random variables X and Y, and Z = X + Y,
p_Z(z) = P(X + Y = z) = Σₓ p_X(x)·p_Y(z − x) = Σ_y p_X(z − y)·p_Y(y)
p_Z is defined as the convolution of p_X and p_Y.

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
(caveat: must be independent and identically distributed)
CLT vs. Water speed stat

Water Pokemon speed statistic, Xᵢ:
• E[Xᵢ] = 65, Var(Xᵢ) = 510
• 126 Water Pokemon
• Let Y = average of 50 Water Pokemon. What is the distribution of Y?

Solution:
Y = X̄ = (1/50) Σᵢ₌₁⁵⁰ Xᵢ
E[Y] = E[Xᵢ] = 65
Var(Y) = Var(Xᵢ)/50 = 510/50 = 10.2
By CLT, X̄ ∼ N(µ, σ²/n), so Y ∼ N(65, 10.2)
Confidence interval, fully defined
Consider a sample of IID RVs X1, X2, …, Xn, where n is large,
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ, Var(X̄) = σ²/n
S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)², SE = S/√n

def For large n, a 100(1–α)% confidence interval of µ is defined as:
(X̄ − z_{α/2}·SE, X̄ + z_{α/2}·SE), i.e., X̄ ± z_{α/2}·(SE)
where Φ(z_{α/2}) = 1 − (α/2).

ex: 95% confidence interval: X̄ ± 1.96·(SE)
α = 0.05, α/2 = 0.025, Φ(z_{α/2}) = 0.975, z_{α/2} = 1.96
In other words: 95% of the time that we sample, the true µ will be in the interval X̄ ± 1.96·(SE)
NOT: µ is 95% likely to be in this particular interval.
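The definition translates directly to code. A Python sketch (not from the slides; `confidence_interval` is a hypothetical helper name), using `statistics.NormalDist` for Φ⁻¹:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# A 100(1-alpha)% CI for mu is Xbar +/- z_{alpha/2} * SE,
# where Phi(z_{alpha/2}) = 1 - alpha/2 and SE = S / sqrt(n).
def confidence_interval(xs, alpha=0.05):
    n = len(xs)
    xbar = mean(xs)
    se = stdev(xs) / sqrt(n)                 # SE = S / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return (xbar - z * se, xbar + z * se)

z95 = NormalDist().inv_cdf(0.975)
print(round(z95, 2))  # 1.96, matching the 95% example above
```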
SE with confidence interval

Idle CPUs are the bane of our existence.
• A large company monitors 225 computers (single CPU) for idle hours in order to estimate the average number of idle hours per CPU.
• Sample had X̄ = 11.6 hrs, S² = 16.81 hrs² (so S = 4.1 hrs)
Give the 90% confidence interval for the estimate of µ, mean idle hrs/CPU.

Solution:
α = 0.10, α/2 = 0.05, Φ(z_{α/2}) = 0.95, z_{α/2} = 1.645
Standard error: SE = S/√n = 4.1/√225 = 4.1/15 ≈ 0.273
90% confidence interval: X̄ ± z_{α/2}·(SE) → 11.6 ± 1.645·(4.1/15) → (11.15, 12.05)
Interpret: 90% of the time that such an interval is computed, the true µ is in it.
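The same computation in Python (a sketch, not from the slides), reproducing (11.15, 12.05):

```python
from math import sqrt
from statistics import NormalDist

# Idle-CPU example: n = 225, Xbar = 11.6, S^2 = 16.81, 90% CI.
n, xbar, s2 = 225, 11.6, 16.81
se = sqrt(s2) / sqrt(n)           # 4.1 / 15 ~ 0.273
z = NormalDist().inv_cdf(0.95)    # alpha = 0.10 -> z_{alpha/2} ~ 1.645
lo, hi = xbar - z * se, xbar + z * se
print(round(lo, 2), round(hi, 2)) # 11.15 12.05
```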
Break
Attendance: tinyurl.com/cs109summer2018
Sampling distribution
CLT tells us the sampling distribution of X̄ is approx. normal when n is large.
"large" n: n > 30 (larger is better: n > 100)
We can also use CLT to decide values of n for good confidence intervals
http://onlinestatbook.com/stat_sim/sampling_dist/
Estimating clock running time

Want to test the runtime of an algorithm.
• Run the algorithm n times and measure the average time
• Know the variance is σ² = 4 sec²
• Want to estimate the mean runtime: µ = t
How large should n be so that P(average time is within 0.5 of t) = 0.95?

Solution:
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(t, σ²/n) (by Central Limit Theorem; SD: σ/√n)
P(t − 0.5 ≤ X̄ ≤ t + 0.5) = 0.95
= P(−0.5 ≤ X̄ − t ≤ 0.5)          (X̄ − t) ∼ N(0, σ²/n)
= P(−0.5√n/σ ≤ Z ≤ 0.5√n/σ)      where Z = (X̄ − t)/(σ/√n) ∼ N(0, 1)
Estimating clock running time (continued)

Solution (with σ = 2, so 0.5√n/σ = √n/4):
P(t − 0.5 ≤ X̄ ≤ t + 0.5) = P(−0.5 ≤ X̄ − t ≤ 0.5) = P(−√n/4 ≤ Z ≤ √n/4) = 0.95
= Φ(√n/4) − Φ(−√n/4) = Φ(√n/4) − (1 − Φ(√n/4)) = 2Φ(√n/4) − 1 = 0.95
⇒ Φ(√n*/4) = 0.975. Solve for n*: √n*/4 = 1.96 ⇒ √n* = 7.84 ⇒ n* = 7.84² ≈ 62

What say you, Chebyshev?
P(|X̄ − t| ≤ 0.5) ≥ 0.95 ⇒ P(|X̄ − t| ≥ 0.5) ≤ 0.05
Chebyshev: P(|X̄ − t| ≥ 0.5) ≤ Var(X̄)/0.5² = (4/n)/0.25 = 16/n ≤ 0.05 ⇒ n ≥ 320
Thanks for playing, Pafnuty!
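Both sample-size answers can be reproduced with a short Python sketch (not from the slides; σ² = 4 as given in the problem):

```python
from math import ceil
from statistics import NormalDist

# Sample size via CLT vs. via Chebyshev, with sigma = 2,
# half-width eps = 0.5, and confidence 0.95.
sigma, eps, conf = 2.0, 0.5, 0.95

# CLT: 2*Phi(eps*sqrt(n)/sigma) - 1 >= conf  =>  sqrt(n) >= z * sigma / eps
z = NormalDist().inv_cdf((1 + conf) / 2)         # ~ 1.96
n_clt = ceil((z * sigma / eps) ** 2)             # -> 62

# Chebyshev: P(|Xbar - t| >= eps) <= sigma^2 / (n * eps^2) <= 1 - conf
n_cheb = ceil(sigma**2 / (eps**2 * (1 - conf)))  # -> 320
print(n_clt, n_cheb)
```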
The power of the central limit theorem

Sample means of IID RVs are normally distributed.
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ ∼ N(µ, σ²/n)
• Can estimate confidence intervals
• Can give probabilities on the sample mean

Sums of IID RVs are normally distributed.
Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²)
• Can estimate any sum of IID RVs (if µ, σ² known) ← our next goal
Crashing website

Let X = number of visitors to a website/minute: X ∼ Poi(100)
• The server crashes if ≥ 120 requests/minute.
What is P(crash in next minute)?

Solution 1: (exact, using Poisson)
P(X ≥ 120) = Σᵢ₌₁₂₀^∞ e^(−100)·100^i / i! ≈ 0.0282

Solution 2: (using CLT)
Recall: a sum of IID Poissons is Poisson; CLT says sums are normally distributed (as n → ∞).
Let Poi(100) ∼ Σᵢ₌₁ⁿ Xᵢ where Xᵢ ∼ Poi(100/n), so Σᵢ₌₁ⁿ Xᵢ ∼ N(nµ, nσ²) = N(100, 100).
Using CLT on a discrete sum → continuity correction:
P(X ≥ 120) ≈ P(Y ≥ 119.5) = P((Y − 100)/√100 ≥ (119.5 − 100)/√100) = 1 − Φ(1.95) ≈ 0.0256

Attempted solution 3: (using one-sided Chebyshev)
P(X ≥ 120) = P(X ≥ E[X] + 20) ≤ σ²/(σ² + 20²) = 100/(100 + 20²) = 0.2
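A Python sketch (not from the slides) reproducing solutions 1 and 2:

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

# Solution 1 (exact Poisson tail) vs. Solution 2 (CLT + continuity correction).
lam = 100

# Exact: P(X >= 120) = 1 - P(X <= 119) for X ~ Poi(100)
exact = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(120))

# CLT: X is approx. N(100, 100); continuity correction uses 119.5
approx = 1 - NormalDist(lam, sqrt(lam)).cdf(119.5)
print(round(exact, 4), round(approx, 4))
```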
Summary: Approximations

Binomial: n trials, P(success) = p, independent trials
→ Normal (continuity correction), when p is medium and np(1–p) > 10, independent trials
→ Poisson, when p < 0.05, n > 20, and/or slight dependence

Poisson: a Poisson is a sum of IID Poissons where n is arbitrarily large
→ Normal (by Central Limit Theorem, continuity correction)

Sum of IID RVs with known E[X], Var(X)
→ Normal (by Central Limit Theorem, continuity correction)
Another dice game

You will roll 10 6-sided dice (X1, X2, …, X10)
Let: X = total value of all 10 dice = X1 + X2 + … + X10
Win if: X ≤ 25 or X ≥ 45. What is P(win)?
Roll!!!!!!!
And now the truth (according to CLT)…
Solution:
Want: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45)
By CLT, X = Σᵢ₌₁¹⁰ Xᵢ is approx. N(nµ, nσ²), where E[Xᵢ] = µ = 3.5 and Var(Xᵢ) = σ² = 35/12
X ∼ N(10·3.5 = 35, 10·(35/12) = 350/12)
With continuity correction:
1 − P(25 < X < 45) ≈ 1 − P(25.5 ≤ X ≤ 44.5)
= 1 − P((25.5 − 35)/√(350/12) ≤ Z ≤ (44.5 − 35)/√(350/12))
≈ 1 − (2Φ(1.76) − 1) ≈ 2(1 − 0.9608) = 0.0784
Recap: P(X ≤ 25 or X ≥ 45) = 1 − P(25 < X < 45) = 0.0784
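The CLT answer can be cross-checked by Monte Carlo simulation; a Python sketch (not from the slides):

```python
import random
from math import sqrt
from statistics import NormalDist

# Monte Carlo estimate of P(win) vs. the CLT value for 10 dice.
random.seed(0)
trials = 200_000
wins = 0
for _ in range(trials):
    total = sum(random.randint(1, 6) for _ in range(10))
    wins += (total <= 25 or total >= 45)
mc = wins / trials

# CLT: X approx. N(35, 350/12), with continuity correction
nd = NormalDist(35, sqrt(350 / 12))
clt = 1 - (nd.cdf(44.5) - nd.cdf(25.5))
print(round(mc, 3), round(clt, 4))
```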
Finding Statistics

What do you know?               What statistic can you find?
RVs: distribution with parameters → anything your heart desires
Sample of population            → bootstrapping: estimates of E[X] or Var(X), confidence interval, p-values
E[X] or Var(X)                  → probability bounds
Sum of IID RVs?                 → CLT
Now…

What do you know?               What statistic can you find?
Distribution with parameters    → anything your heart desires (RVs)

What if you don't know the parameters of your distribution?
Parameter estimation

def parametric model – a probability distribution that has parameters θ.

Model        Distribution                  Parameters θ
Bernoulli    Ber(p)                        θ = p
Poisson      Poi(λ)                        θ = λ
Multinomial  Multinomial(p1, p2, …, pm)    θ = (p1, p2, …, pm)
Uniform      Unif(α, β)                    θ = (α, β)
Normal       Normal(µ, σ²)                 θ = (µ, σ²)
etc.

Note θ can be a vector of parameters!
Why do we care?

"Point estimate" of parameter: find the best single value for the parameter estimate (as opposed to a distribution of θ̂)
• Better understanding of the process producing the data
• Future predictions based on the model
• Simulation of future processes

def Parameter estimation
1. We observe data that has a known model with unknown parameter θ.
2. Use the data to estimate the model parameters as θ̂.
Note: estimators θ̂ are RVs estimating parameter θ
(e.g., sample mean: θ̂ = X̄ estimates population mean θ = µ)

Machine Learning: learn a model of the system → do stuff with the model
Estimator bias

def bias of estimator: E[θ̂] − θ
• "unbiased" estimator: bias = 0

Example: Sample Mean
Estimator: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    Estimating: µ
Bias: E[X̄] = µ (unbiased estimator)

Example 2: Sample Variance
Estimator: S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²    Estimating: σ²
Bias: E[S²] = σ² (unbiased estimator)
Estimator consistency

def consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for ε > 0
• As we get more data, the estimate θ̂ should deviate from the true θ by at most a small amount.
• Actually known as "weak" consistency

Example: Sample Mean
Estimator: X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ    Estimating: µ
By strong law of large numbers (SLLN): P(lim_{n→∞} X̄ = µ) = 1
Implying weak law of large numbers (WLLN): lim_{n→∞} P(|X̄ − µ| ≥ ε) = 0 for ε > 0
Equivalently: lim_{n→∞} P(|X̄ − µ| < ε) = 1 (consistent estimator)
Sample variance consistency

def biased estimator: E[θ̂] − θ ≠ 0
def consistent estimator: lim_{n→∞} P(|θ̂ − θ| < ε) = 1 for ε > 0
As n → ∞, the estimate θ̂ should deviate from the true θ by at most a small amount.

Are the two estimators below for variance σ² biased? Consistent?

1. S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² (sample variance)
unbiased: E[S²] = σ²
consistent: S² = (n/(n−1)) [ (1/n) Σᵢ₌₁ⁿ Xᵢ² − X̄² ],
P(lim_{n→∞} (1/n) Σᵢ₌₁ⁿ Xᵢ² = E[X²]) = 1, P(lim_{n→∞} X̄² = µ²) = 1 (by SLLN, Xᵢ i.i.d.)

2. A = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
biased: E[A] = E[((n−1)/n)·S²] = ((n−1)/n)·σ²
consistent: for similar reasons as #1, where now lim_{n→∞} (n−1)/n = 1
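The bias result is easy to see by simulation. A Python sketch (not from the slides), drawing small normal samples and comparing the two variance estimators:

```python
import random
from statistics import mean

# Compare the two variance estimators on small samples (n = 5) of
# N(0, 1) draws, whose true variance is sigma^2 = 1.
random.seed(0)
n, trials = 5, 50_000

s2_unbiased, s2_biased = [], []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased.append(ss / (n - 1))  # divide by n-1: E[S^2] = sigma^2
    s2_biased.append(ss / n)          # divide by n: E[A] = (n-1)/n * sigma^2

print(mean(s2_unbiased))  # close to sigma^2 = 1.0
print(mean(s2_biased))    # close to (n-1)/n * sigma^2 = 0.8
```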