Cécile Ané - University of...

Confidence intervals Cécile Ané Stat 371 Spring 2006

Cécile Ané - University of Wisconsin–Madison, pages.stat.wisc.edu/~ane/st371/notes/chap6.pdf

Confidence intervals

Cécile Ané

Stat 371

Spring 2006

Outline

1 Building a confidence interval

2 Planning a study: how much should I sample?

3 Conditions for validity

4 Confidence intervals for proportions

Example (problem 6.10)

Part of a study on the development of the thymus gland: weights of thymus glands from 5 chick embryos after 14 days of incubation:

29.6 21.5 28.0 34.6 44.9 (mg)

We want to know µ, the mean weight of thymus glands in the entire population of chick embryos after 14 days of incubation (in the same incubator).

$\bar{y} = 31.72$ is our best estimate for µ. How good is this estimate? How far is µ from 31.72?

Standard error of the mean

We know the standard deviation of $\bar{Y}$ is $\sigma/\sqrt{n}$. But we don’t know σ. Hopefully, the standard deviation of the data, s, is close to σ.

$$\mathrm{SE}_{\bar{y}} = \frac{s}{\sqrt{n}}$$

is the standard error of the mean.

It is an estimate of the standard deviation of $\bar{Y}$. The deviation of $\bar{Y}$ from its mean µ is an error we will make. $\mathrm{SE}_{\bar{y}}$ gives us an idea of how far $\bar{y}$ is from µ.

What happens to s when the sample size increases?

s stays about the same, as it gets closer to σ.

What happens to $\mathrm{SE}_{\bar{y}}$ when the sample size increases?

It becomes smaller and smaller, as $\bar{y}$ gets closer to µ.
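
As a small illustration, the standard error for the chick thymus data can be computed directly in R. This is only a sketch; the variable names are chosen here for illustration and are not from the notes.

```r
# Chick thymus gland weights (mg) from problem 6.10
thymus <- c(29.6, 21.5, 28.0, 34.6, 44.9)

n    <- length(thymus)
ybar <- mean(thymus)    # sample mean, the estimate of mu
s    <- sd(thymus)      # sample standard deviation (uses n - 1)
se   <- s / sqrt(n)     # standard error of the mean

round(c(mean = ybar, sd = s, se = se), 2)
```

The rounded values agree with the 31.72, 8.73 and 3.90 used in the notes.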

Mechanics of a confidence interval

1 Choose a confidence level. Typically, 95%. Polls use 90% or 95%.

2 Find the value t such that $P\{-t \le T \le t\} = \text{confidence level}$. It also means

$$P\{T \ge t\} = (1 - \text{confidence level})/2$$

Refer to the Student distribution in Table 4 (back cover), and use degrees of freedom df = n − 1.

3 Construct the interval: $\bar{y} \pm t\,\mathrm{SE}_{\bar{y}}$, i.e.

$$(\bar{y} - t\,\mathrm{SE}_{\bar{y}},\ \bar{y} + t\,\mathrm{SE}_{\bar{y}})$$

4 Conclude:

We are 90% confident that the mean thymus gland weight of all chick embryos at the age of 14 days of incubation (in the conditions of the experiment) is between 23.41 and 40.03 mg.

Mechanics of a confidence interval: Example

1 Confidence level. We will do both 90% and 95%.

2 Find the value t such that $P\{T \ge t\} = .05$ for level 90% and .025 for level 95%.
Degrees of freedom: df = 5 − 1 = 4.
The t-table gives t = 2.13 for 90% confidence and t = 2.77 for 95% confidence. With R:
> qt(.975, df=4)
[1] 2.776445

3 Interval: We had $\bar{y} = 31.72$ and s = 8.73, so $\mathrm{SE}_{\bar{y}} = 8.73/\sqrt{5} = 3.90$.
Radius of interval (bull’s eye): $t \times \mathrm{SE}_{\bar{y}} = 8.31$ (90% confidence) and 10.81 (95% confidence).
The interval is 31.72 ± 8.31 or 31.72 ± 10.81, i.e.
(23.41, 40.03) for 90% confidence
(20.91, 42.53) for 95% confidence

4 Conclude.
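
The steps above can be sketched as a short R function. The helper name `mean_ci` is hypothetical (it is not part of the notes or of base R); it only wraps `qt`, `sd` and `mean`.

```r
thymus <- c(29.6, 21.5, 28.0, 34.6, 44.9)   # chick thymus weights (mg)

# t-based confidence interval for a mean (hypothetical helper for illustration)
mean_ci <- function(y, level = 0.95) {
  n  <- length(y)
  se <- sd(y) / sqrt(n)                        # standard error of the mean
  t  <- qt(1 - (1 - level) / 2, df = n - 1)    # upper-tail area (1 - level)/2
  mean(y) + c(-1, 1) * t * se                  # lower and upper bounds
}

ci90 <- mean_ci(thymus, level = 0.90)
ci95 <- mean_ci(thymus, level = 0.95)
round(ci90, 2)
round(ci95, 2)
```

Because R carries unrounded values of t and of the standard error, the bounds can differ from the hand computation in the second decimal place.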

The Student distribution

[Figure: density curves plotted from −10 to 10 (vertical axis: density, 0.0 to 0.3), with the standard normal curve labeled and the value 1.64 marked.]
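
To see numerically how the Student distribution relates to the standard normal, one can compare critical values. This sketch assumes the 1.64 on the figure marks the upper 5% point of the standard normal; the degrees of freedom listed are arbitrary choices for illustration.

```r
# Upper 5% critical values: standard normal versus Student t
z  <- qnorm(0.95)
df <- c(2, 4, 13, 30, 1000)
t  <- qt(0.95, df = df)

round(z, 3)
round(rbind(df = df, t = t), 3)
```

The t critical values are larger than the normal one (heavier tails) and approach it as the degrees of freedom grow.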

Degrees of freedom: n − 1

Recall that

$$s^2 = \frac{1}{n-1}\left((y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2\right)$$

Suppose n = 3 and $y_1 - \bar{y} =$ ___ , $y_2 - \bar{y} =$ ___ . What is $y_3 - \bar{y}$?
If we specify all but one of the deviations, we can compute the last one. The variance is completely specified by n − 1 deviations, or n − 1 pieces of information. Here the variance is:

df = # pieces of information needed for computing $s^2$.

Imagine a sample with a single observation.
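
A quick check in R of why only n − 1 deviations are free: the deviations from the sample mean always sum to zero, so the last one is determined by the others. The chick data are reused here purely as an example.

```r
y   <- c(29.6, 21.5, 28.0, 34.6, 44.9)   # chick thymus weights (mg)
n   <- length(y)
dev <- y - mean(y)                       # deviations from the sample mean

sum(dev)                                 # zero, up to floating-point rounding

# The last deviation is recovered from the first n - 1
last <- -sum(dev[-n])
c(recovered = last, actual = dev[n])

s2 <- sum(dev^2) / (n - 1)               # same value as var(y)
s2

var(1)                                   # single observation: NA, spread cannot be estimated
```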

Back to the milk example

n = 14 cows, $\bar{y} = 36.2$ lbs, s = 9.76 lbs.
Find a 95% confidence interval for the population mean.

2 We want an area of .025 above t, and df = 13: $t = t_{.025,13} = 2.16$.

3 $\mathrm{SE} = 9.76/\sqrt{14} = 2.61$ lbs, and the multiplier is t = 2.16.
Radius is then 2.16 × 2.61 = 5.6, and the interval is 36.2 ± 5.6, i.e. (30.6, 41.8).

Conclusion
We are 95% confident that the average daily milk yield of a cow in the herd the cows were sampled from is between 30.6 and 41.8 lbs.
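
When only the summary statistics are at hand, the same computation can be sketched in R with `qt`. Variable names are illustrative.

```r
# Summary statistics from the milk example
n    <- 14
ybar <- 36.2    # lbs
s    <- 9.76    # lbs

se     <- s / sqrt(n)
tmult  <- qt(0.975, df = n - 1)    # area .025 above t, df = 13
radius <- tmult * se
ci     <- ybar + c(-1, 1) * radius

round(c(se = se, t = tmult, radius = radius), 2)
round(ci, 1)
```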

Confidence interval with R

You need to have the raw data.

> milk
 [1] 19 23 26 30 32 34 37 37 39 41 44 44 46 55
> t.test(milk)

        One sample t-test

data:  milk
t = 13.8833, df = 13, p-value = 3.571e-09
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 30.57901 41.84956
sample estimates:
mean of x
 36.21429

> t.test(milk, conf.level = .80)
...
80 percent confidence interval:
 32.69239 39.73618

True or False?

95% CI for the mean daily milk yield: (30.6, 41.8).

True: With the same data, a 99% confidence interval would be larger.

False: In a second sample of the same size (14 cows), there is a 95% chance that the sample mean will be in (30.6, 41.8).

False: The probability is 95% that the sample mean is in (30.6, 41.8).

False: The probability is 95% that the population mean is in (30.6, 41.8).

True: The confidence is 95% that the population mean is in (30.6, 41.8).

False: In the population, 95% of all daily milk yields are in (30.6, 41.8).

False: In the sample, 95% of all daily milk yields are in (30.6, 41.8).
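
The 95% describes the procedure: over many repeated samples, about 95% of the intervals built this way contain the population mean. A small simulation can illustrate this. The population values `mu` and `sigma`, the seed, and the number of repetitions are arbitrary assumptions made for this sketch, not values from the notes.

```r
set.seed(371)                      # arbitrary seed, for reproducibility only

mu    <- 36                        # hypothetical population mean
sigma <- 10                        # hypothetical population SD
n     <- 14                        # sample size, as in the milk example
reps  <- 10000                     # number of simulated samples

covered <- replicate(reps, {
  y  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(y)$conf.int         # 95% interval from this sample
  ci[1] <= mu && mu <= ci[2]       # does it contain the true mean?
})

coverage <- mean(covered)
coverage                           # should be close to 0.95
```

Each individual interval either contains µ or it does not; the 95% is the long-run fraction that do.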

Planning a study: how big should n be?

When planning a study, this is a question we always ask.

How many people am I going to interview?

How many blood samples do I need?

How many plants do I need to grow?

Trade-off between accuracy and cost. We want just the right number n to reach the conclusion.

We need to set a goal. Ex:

Polls: “margin of error” at least as small as 1%.

Chick thymus gland weight: we will repeat the experiment, but with different incubation conditions. We want the interval radius ≤ 1.5 mg.

Or: we want the SE at least as small as a given size: SE ≤ 0.75 mg.

Planning a study: how big should n be?

Solving this problem requires a guess for the population SD. It usually involves preliminary data.
Chicks: guess is that SD = s = 8.73 mg.
Aim: SE ≤ 0.75 mg.
Then we solve $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$ for n:

$$n = \left(\frac{\text{guessed SD}}{\text{desired SE}}\right)^2$$

$n = (8.73/0.75)^2 = 135.4896$ (no unit!).
We would sample 136 embryos.
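
The same planning computation can be sketched in R. The first goal reproduces the calculation above. The second goal (95% interval radius at most 1.5 mg) is not worked out in the notes; the search below is one possible approach, assuming the same guessed SD and a t multiplier that depends on n.

```r
sd_guess <- 8.73    # guessed SD (mg), from the preliminary chick data

# Goal 1: SE at most 0.75 mg, so n = (guessed SD / desired SE)^2, rounded up
n_se <- ceiling((sd_guess / 0.75)^2)
n_se

# Goal 2: 95% interval radius at most 1.5 mg.
# The multiplier t depends on n, so increase n until the radius is small enough.
radius <- function(n) qt(0.975, df = n - 1) * sd_guess / sqrt(n)
n_rad <- 2
while (radius(n_rad) > 1.5) n_rad <- n_rad + 1
n_rad
```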

Conditions for validity

1 Most importantly: the sampling process needs to be like random sampling. Independence of observations, sampled from the target population. At the end, we should draw conclusions about the appropriate population.

If the sampling process is biased, toward Wisconsin farms or toward large farms for instance, the confidence interval will greatly overstate the confidence we should have.

2 The observations $Y_1, \ldots, Y_n$ should be from a normal distribution if n is small, so that $\bar{Y}$ is approximately normal.

Detecting non-normality - Normal probability plot
(section 4.4 in the textbook)

How do we know whether condition 2 is met? How can we tell that $Y_1, \ldots, Y_n$ are normally distributed?
Compare the spacing among observations with the spacing expected from a normal distribution. We order the data.
Milk yield, n = 14:

19 23 26 30 32 34 37 37 39 41 44 44 46 55

If $Y_1, \ldots, Y_{14}$ come from N(0, 1), we could also order them, and computers can calculate the expected value of the lowest, the second lowest, ..., the largest. We get the “z-scores”:

−1.8 −1.24 −.92 −.67 −.46 −.27 −.09 .09 .27 .46 .67 .92 1.24 1.8

Is the spacing of the data looking like the “normal” spacing? We plot the y’s vs. the z-scores.

Detecting non-normality - Normal probability plot

[Figure: two panels, each titled “Normal Q−Q Plot”; horizontal axis: Theoretical Quantiles (−1 to 1), vertical axis: Sample Quantiles (20 to 55).]

If the points are close to a line, then we can say the data are normally distributed.

It is hard to tell with small sample sizes.

R demo and examples.
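
A minimal sketch of such a plot for the milk data follows. The notes do not say how their z-scores were computed; the values from `qnorm(ppoints(14))`, which is what `qqnorm` uses by default, agree with them to two decimals.

```r
milk <- c(19, 23, 26, 30, 32, 34, 37, 37, 39, 41, 44, 44, 46, 55)

# Approximate expected standard normal values for the 14 ordered observations
z <- qnorm(ppoints(length(milk)))
round(z, 2)

# Sample quantiles against theoretical quantiles, with a reference line
qqnorm(milk)
qqline(milk)
```

Points lying close to the reference line suggest that the normality condition is reasonable for these data.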

Confidence intervals for proportions

Example: What is the probability of getting the flu if one has gotten the shot during the Fall, and is in contact with the virus in the winter?

Experiment: Randomly sample n = 30 persons, get them the shot in the Fall. Expose them to the virus in December.
Y = # of persons in the experiment who get the disease (the shot didn’t give them enough protection). We observe y = 5.

p = true value in the population: proportion or probability.

$\hat{p} = Y/n$ observed value. Ex: $\hat{p} = 5/30 = 0.17$.

Goal: 95% confidence interval for p.

Confidence intervals for proportions

Sampling distribution of $\hat{p}$: same shape as the binomial, but on values 0, 1/n, 2/n, . . . , (n − 1)/n, 1.

Distribution of $\hat{p}$

Mean of $\hat{p}$: $E(\hat{p}) = p$, standard deviation: $\sqrt{p(1-p)/n}$.

If n is large enough for np ≥ 5 and n(1 − p) ≥ 5, then $\hat{p}$’s distribution is approximately $N\left(p, \sqrt{p(1-p)/n}\right)$.

$\hat{p}$ lies in $p \pm 1.96\sqrt{p(1-p)/n}$ in 95% of the experiments, i.e.

p lies in $\hat{p} \pm 1.96\sqrt{p(1-p)/n}$ in 95% of the experiments.

Confidence intervals for proportions

First idea: plug in $\hat{p}$ in place of p and use

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

as a 95% confidence interval.

Ex: n = 30, y = 5 flu cases, so that $\hat{p} = 5/30 = .17$ and $\sqrt{.17(1-.17)/30} = .07$. We get the interval

.17 ± 1.96 × .07 = .17 ± .13 i.e. (.033, .300)

BUT: this does not work very well. p lies in $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ in

84% of the experiments if n = 10 and p = .3,

65% of the experiments if n = 10 and p = .1.
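
These coverage figures can be checked exactly rather than by simulation: for each possible outcome y, build the plug-in interval, then add up the binomial probabilities of the outcomes whose interval contains the true p. The function name `wald_coverage` is hypothetical and used only for this sketch.

```r
# Exact coverage of the plug-in 95% interval for a given n and true p
wald_coverage <- function(n, p, z = 1.96) {
  y    <- 0:n                                  # every possible outcome
  phat <- y / n
  se   <- sqrt(phat * (1 - phat) / n)
  hit  <- (phat - z * se <= p) & (p <= phat + z * se)
  sum(dbinom(y[hit], size = n, prob = p))      # probability the interval contains p
}

# Flu example: n = 30, y = 5
phat <- 5 / 30
ci   <- phat + c(-1, 1) * 1.96 * sqrt(phat * (1 - phat) / 30)
round(ci, 3)

cov_p3 <- wald_coverage(10, 0.3)
cov_p1 <- wald_coverage(10, 0.1)
round(c(cov_p3, cov_p1), 2)
```

Both coverages fall well below the nominal 95%, which is the problem the notes point out.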

Confidence intervals for proportions

Instead: We pretend we have 4 more observations (i.e. sample size is n + 4) and that out of those 4 extra observations, there are 2 successes and 2 failures (i.e. # successes is Y + 2).

$$\tilde{p} = \frac{y+2}{n+4} \quad\text{and}\quad \mathrm{SE}_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

A 95% confidence interval for p is

$$\tilde{p} \pm 1.96\,\mathrm{SE}_{\tilde{p}}$$

p lies in $\tilde{p} \pm 1.96\,\mathrm{SE}_{\tilde{p}}$ in

95.2% of the experiments if n = 10 and p = .3,

93% of the experiments if n = 10 and p = .1.

We are no longer overestimating our confidence!
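
The coverage of the adjusted interval can be computed exactly in the same spirit, by summing binomial probabilities over the outcomes whose interval contains p. The function name `plus4_coverage` is hypothetical; the sketch assumes the interval uses n + 4 in the standard error, as defined above.

```r
# Exact coverage of the "add 2 successes and 2 failures" 95% interval
plus4_coverage <- function(n, p, z = 1.96) {
  y      <- 0:n
  ptilde <- (y + 2) / (n + 4)
  se     <- sqrt(ptilde * (1 - ptilde) / (n + 4))
  hit    <- (ptilde - z * se <= p) & (p <= ptilde + z * se)
  sum(dbinom(y[hit], size = n, prob = p))
}

cov4_p3 <- plus4_coverage(10, 0.3)
cov4_p1 <- plus4_coverage(10, 0.1)
round(c(cov4_p3, cov4_p1), 3)
```

Both values are close to 95%, in line with the percentages quoted above.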

Confidence intervals for proportions

Example: n = 30, y = 5 flu cases.

We get $\tilde{p} = (5+2)/(30+4) = .21$ and $\mathrm{SE}_{\tilde{p}} = \sqrt{.21 \times .79/34} = .07$.

Our 95% confidence interval is (.070, .342).
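
This worked example can be reproduced with a few lines of R. The helper `plus4_ci` is a hypothetical name used only for this sketch.

```r
# Adjusted 95% confidence interval for a proportion:
# add 2 successes and 2 failures before computing the estimate and its SE
plus4_ci <- function(y, n, z = 1.96) {
  ptilde <- (y + 2) / (n + 4)
  se     <- sqrt(ptilde * (1 - ptilde) / (n + 4))
  c(estimate = ptilde, se = se,
    lower = ptilde - z * se, upper = ptilde + z * se)
}

res <- plus4_ci(y = 5, n = 30)   # flu example
round(res, 3)
```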

Proportions: How big should n be?

How many people should I sample so that my margin of error is at most 1%?

margin of error = 1.96 × SE, so it means SE at most about 0.5%, i.e. $\mathrm{SE}_{\tilde{p}} \le 0.005$. But $\mathrm{SE}_{\tilde{p}}$ is

$$\mathrm{SE}_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} \le \sqrt{\frac{1/4}{n+4}}$$

We then need (safe choice)

$$n = \frac{1}{4(\text{desired SE})^2} - 4$$

Example: for SE at most 0.005, we need n ≥ 10,000 − 4. That’s why polls are usually done on several thousand people.
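
The safe-choice formula can be sketched in R as follows. The helper name `n_for_prop_se` is hypothetical; the rounding step is only there to guard against floating-point noise before rounding up.

```r
# Worst-case (p = 1/2) sample size so that the SE of the adjusted proportion
# is at most se_target
n_for_prop_se <- function(se_target) {
  n_exact <- 1 / (4 * se_target^2) - 4
  ceiling(round(n_exact, 6))     # round first to avoid floating-point artifacts
}

n_poll <- n_for_prop_se(0.005)
n_poll

# Check: worst-case margin of error with this n
margin <- 1.96 * sqrt(0.25 / (n_poll + 4))
margin
```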