STT 200 – Lecture 5, section 23,24 Recitation 12 (4/2/2013)

TA: Zhen Zhang
Email: [email protected]
Office hour: 3-4 PM Tuesday, C500 WH (office tel.: 432-3342)
Help room: 9:00 AM-1:00 PM Monday, A102 WH
Class meets on Tuesday:
12:40 – 1:30 PM, A224 WH, Section 23
1:50 – 2:40 PM, A234 WH, Section 24


Example (sampling distribution)

Recall that the data we used last time contain “yes/no” responses from a population of 400 people who were asked whether they have wireless internet access at home. The population proportion of “yes” is p = 0.5575.

We draw many random samples of size n; the sampling distribution of p̂ can be approximated by a Normal model with mean p and standard deviation sqrt(p(1 − p)/n).

What if we don’t know p?

[Figure: sampling distribution of p̂ (horizontal axis from 0.3 to 0.8), with p = 0.5575 marked.]
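As a rough numeric check of the Normal approximation, the center and spread of the sampling distribution can be computed directly. This is only a sketch: p = 0.5575 comes from the slide, while n = 37 is borrowed from Appendix 1 and is an assumption here.

```r
# Assumed values: p from the slide, n = 37 as used in Appendix 1.
p <- 0.5575
n <- 37

# Standard deviation of the sampling distribution of p-hat.
sd_phat <- sqrt(p * (1 - p) / n)

# Center and spread of the approximating Normal model.
c(mean = p, sd = sd_phat)

# Range that contains about 95% of the sample proportions under this model.
p + c(-1, 1) * qnorm(0.975) * sd_phat
```

The second result depends on p, which is exactly what we do not know in practice; that is the motivation for replacing p with p̂ in a confidence interval.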


Example (confidence interval)

To study p, we draw a sample of size n, obtain p̂, and construct a 95% confidence interval using p̂ ± z* sqrt(p̂(1 − p̂)/n). We are 95% confident that p lies within this interval. “95% confident” means that if we draw samples and construct intervals many times, approximately 95% of the intervals will cover p.

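[Figure: simulated 95% confidence intervals from repeated samples (vertical axis from 0.2 to 0.8); see Appendix 1 for the R code.]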


Example (check conditions)

To validate the confidence interval, we need to check several conditions:

Independence condition: the responses in the sample are chosen independently.

Randomness condition: the responses in the sample are chosen randomly. We used the table of random digits from 1 to 400.

10% condition: the sample size is less than 10% of the population size, 400.

Success/failure condition: the sample contains at least 10 yeses and 10 nos.
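The two numeric conditions above can be checked in a couple of lines of R. This is a sketch, not part of the original slides: the helper name check_conditions and the count of 21 yeses are made up for illustration, while n = 37 and N = 400 follow Appendix 1. Independence and randomness depend on how the sample was drawn and cannot be verified from the numbers alone.

```r
# Hypothetical helper (not from the slides): checks the 10% condition and the
# success/failure condition for a sample of size n from a population of size N.
check_conditions <- function(yes, n, N) {
  no <- n - yes
  c(ten_percent     = n < 0.10 * N,            # sample is less than 10% of the population
    success_failure = yes >= 10 && no >= 10)   # at least 10 yeses and 10 nos
}

# n = 37 and N = 400 as in Appendix 1; 21 yeses is an assumed count.
check_conditions(yes = 21, n = 37, N = 400)
```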


Construct confidence interval step by step

To construct a confidence interval for p with confidence level C:

Determine the critical value z*, using either the Normal table or R/calculator. To use R/calculator, note that the total area below z* is (C+1)/2, so we can find z* using qnorm((C+1)/2) in R, or invNorm((C+1)/2) in DISTR on a TI-83 Plus calculator. For example, in a calculator: for 95% confidence, invNorm(0.975) = 1.960; for 90% confidence, invNorm(0.95) = 1.645.

Find the standard error SE(p̂) = sqrt(p̂(1 − p̂)/n).

The margin of error is ME = z* × SE(p̂).

The confidence interval is p̂ ± ME.
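The steps above can be sketched as a small R function. The function name prop_ci and the sample of 21 yeses out of 37 are assumptions for illustration; only qnorm((C+1)/2) is taken from the slide.

```r
# Hypothetical helper (not from the slides): one-proportion z-interval.
prop_ci <- function(yes, n, C = 0.95) {
  phat  <- yes / n                       # sample proportion
  zstar <- qnorm((C + 1) / 2)            # critical value: area below z* is (C+1)/2
  se    <- sqrt(phat * (1 - phat) / n)   # standard error of p-hat
  me    <- zstar * se                    # margin of error
  c(phat = phat, zstar = zstar, me = me, lower = phat - me, upper = phat + me)
}

# Assumed sample: 21 yeses out of n = 37, at 95% confidence.
round(prop_ci(yes = 21, n = 37, C = 0.95), 4)
```

Returning every intermediate quantity, rather than only the endpoints, makes it easy to compare each step with a hand calculation.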


Understand confidence interval backwards

If a 95% confidence interval for p is given, can you figure out p̂, the margin of error, and the sample size?

Ans. p̂ is the midpoint of the interval, that is, the average of the two endpoints, so

p̂ = (lower endpoint + upper endpoint)/2,

and the margin of error is half of the width, or |endpoint − midpoint|:

ME = (upper endpoint − lower endpoint)/2.

Now since ME = z* sqrt(p̂(1 − p̂)/n), we have

n = (z*)^2 p̂(1 − p̂)/ME^2.
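To see the backwards calculation with numbers, here is a sketch that uses the interval from Chapter 19 #27 later in this recitation (5.15% to 12.55% at 95% confidence, from a survey of 226 students). The helper name ci_backwards is hypothetical.

```r
# Hypothetical helper (not from the slides): recover p-hat, ME and n from an interval.
ci_backwards <- function(lower, upper, C = 0.95) {
  phat  <- (lower + upper) / 2                    # midpoint of the interval
  me    <- (upper - lower) / 2                    # half of the width
  zstar <- qnorm((C + 1) / 2)                     # critical value for level C
  n     <- zstar^2 * phat * (1 - phat) / me^2     # solve ME = z* sqrt(phat(1-phat)/n) for n
  c(phat = phat, me = me, n = n)
}

# Interval from Chapter 19 #27: (0.0515, 0.1255) at 95% confidence.
ci_backwards(0.0515, 0.1255)
```

The recovered n comes out close to, but not exactly, 226 because the endpoints were rounded to four decimal places.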


Relationship

The margin of error determines the width of the confidence interval. The following simulation shows the relationship between ME and each of p̂, n, and C when the other two are fixed. For example, if the confidence level C increases, z* will increase, so ME will increase, and the confidence interval is wider.

[Figure: three panels plotting margin of error against (1) p̂, with fixed confidence level = 95%, n = 100, and p = 0.5 marked; (2) sample size n, with fixed confidence level = 95%, p = 0.5; (3) confidence level C, with fixed n = 100, p = 0.5. See Appendix 2 for the R code.]


Sample size determination

Recall that from ME = z* sqrt(p̂(1 − p̂)/n), we have:

n = (z*)^2 p̂(1 − p̂)/ME^2.

Suppose we have a target margin of error and confidence level, and want to determine the sample size of the data we will collect. We need to guess p̂.

If “it is believed” or some “national study” gives a value for the population proportion p, we can use it in place of p̂.

We can also use p̂ from a pilot sample if we have one.

If we have no idea at all about p̂, we can use a conservative guess based on the “worst” scenario: p̂(1 − p̂) reaches its maximum when p̂ = 0.5, which corresponds to the largest required sample size.
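The sample size formula can be sketched in R as follows. The helper name sample_size, the 5% target margin of error, and the prior guess of 0.25 are assumptions for illustration; the default p_guess = 0.5 is the conservative choice described above.

```r
# Hypothetical helper (not from the slides): smallest n whose margin of error
# is at most `me` at confidence level C, given a guess for the proportion.
sample_size <- function(me, C = 0.95, p_guess = 0.5) {
  zstar <- qnorm((C + 1) / 2)
  ceiling(zstar^2 * p_guess * (1 - p_guess) / me^2)   # always round up
}

sample_size(me = 0.05, C = 0.95)                   # conservative guess p = 0.5
sample_size(me = 0.05, C = 0.95, p_guess = 0.25)   # assumed prior guess of 0.25
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.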


NEED SOME COFFEE?


Chapter 19 (Page 504): #7:

Which statements are true?

a) For a given sample size, higher confidence means a smaller margin of error.

b) For a specified confidence level, larger samples provide smaller margins of error.

c) For a fixed margin of error, larger samples provide greater confidence.

d) For a given confidence level, halving the margin of error requires a sample twice as large.


Chapter 19 (Page 504): #8:

Which statements are true?

a) For a given sample size, reducing the margin of error will mean lower confidence.

b) For a certain confidence level, you can get a smaller margin of error by selecting a bigger sample.

c) For a fixed margin of error, smaller samples will mean lower confidence.

d) For a given confidence level, a sample 9 times as large will make a margin of error one third as big.
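Exercises #7 and #8 both turn on how the margin of error scales with the sample size and the confidence level. A quick way to test each statement is to compute the margin of error under the two scenarios and compare. The sketch below assumes p̂ = 0.5 and a starting sample size of 100; the helper name moe is hypothetical.

```r
# Hypothetical helper (not from the slides): margin of error of a
# one-proportion z-interval.
moe <- function(n, phat = 0.5, C = 0.95) {
  qnorm((C + 1) / 2) * sqrt(phat * (1 - phat) / n)
}

n0 <- 100                                # assumed starting sample size
moe(n0) / moe(4 * n0)                    # shrink factor when the sample is 4 times as large
moe(n0) / moe(9 * n0)                    # shrink factor when the sample is 9 times as large
moe(n0, C = 0.99) > moe(n0, C = 0.90)    # does higher confidence widen the interval?
```

Because n enters through sqrt(n), the margin of error shrinks by the square root of the factor by which the sample grows.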


Chapter 19 (Page 505): #14:

11% of a random sample of 1003 adults approved of attempts to clone a human.

a) Find the margin of error if we want 95% confidence.

ME = 1.96*sqrt(0.11*0.89/1003) ≈ 0.019, or about 1.9%.

b) Explain what that margin of error means.

The pollsters are 95% confident that the true proportion of adults who approve of attempts to clone humans is within 1.9% of the estimated 11%.

c) If we only need to be 90% confident, will the margin of error be larger or smaller? Explain.

Smaller, since the critical value decreases as the confidence level decreases.

d) Find that margin of error.

ME = 1.645*sqrt(0.11*0.89/1003) ≈ 0.016, or about 1.6%.

e) In general, if all other aspects of the situation remain the same, would smaller samples produce a smaller or larger margin of error?

Larger.


Chapter 19 (Page 506): #27:

In a random survey of 226 college students, 20 reported being “only” children. Estimate the proportion of students nationwide who are “only” children.

a) Check conditions for constructing a confidence interval.

The students’ birth orders are likely to be independent. The sample was random and consisted of less than 10% of the population. There were 20 successes and 206 failures (both at least 10).

b) Construct a 95% confidence interval.

p̂ = 20/226 ≈ 0.0885, and the margin of error is 1.96*sqrt(0.0885*0.9115/226) ≈ 0.0370.

Hence the confidence interval is 0.0885 ± 0.0370 = (0.0515, 0.1255).

c) Interpret your interval.

We are 95% confident that between 5.15% and 12.55% of all college students are “only” children.

d) Explain what “95% confidence” means in this context.

If we were to select repeated samples like this, we’d expect about 95% of the confidence intervals we created to contain the true proportion of all college students who are “only” children.


Chapter 19 (Page 506): #28:

74% of 1644 randomly selected college freshmen returned to college the next year. Estimate the national freshman-to-sophomore retention rate.

a) Verify that the conditions are met.

It’s a random sample; both 74% and 26% of 1644 are greater than 10.

b) Construct a 98% confidence interval.

The critical value is invNorm((1+0.98)/2) = 2.326, hence the margin of error = 2.326*sqrt(0.74*0.26/1644) ≈ 0.0252.

Hence the confidence interval is 0.74 ± 0.0252 = (0.7148, 0.7652).

c) Interpret your interval.

We’re 98% confident that between 71.48% and 76.52% of college freshmen return to college for their sophomore year.

d) Explain what “98% confidence” means in this context.

If we were to select repeated samples like this, we’d expect about 98% of the confidence intervals we created to contain the true proportion of all college freshmen who return to be sophomores.


Sample size determination

At a university, it’s believed that 25% of adults over 30 love Statistics. We wish to see if this percentage is the same among the 18 to 25 age group.

a) How many of this younger age group must we survey in order to estimate the proportion of those who love Statistics to within 5% with 90% confidence?

With 90% confidence, the critical value is z* = 1.645. Thus

n = (z*)^2 p̂(1 − p̂)/ME^2 = 1.645^2*0.25*0.75/0.05^2 ≈ 202.95.

So the required sample size is 203.

b) If we want to cut the margin of error in half, how many of this younger age group must we survey? Do you have any concerns about this sample? Explain.

With ME = 0.025, n = 1.645^2*0.25*0.75/0.025^2 ≈ 811.8.

So the required sample size is 812.

This large sample might be larger than 10% of the population.


APPENDIX 1

R codes for example:

# Please import the data we had in the recitation 11 slides; otherwise this won't work.
haswi <- c("Yes","Yes","Yes","No","Yes","No","Yes","No" ... ...

p <- mean(haswi == "Yes")          # population proportion of "Yes"
N <- length(haswi)                 # population size
n <- 37                            # sample size
replica <- 10000                   # number of simulated samples
set.seed(241)
phats <- numeric(replica)
interval <- matrix(0, replica, 2)
zstar <- qnorm((1 + 0.95) / 2)     # critical value for 95% confidence
for (t in 1:replica) {
  mysamples <- haswi[sample(1:N, size = n)]
  ph <- sum(mysamples == "Yes") / n
  moe <- zstar * sqrt(ph * (1 - ph) / n)
  phats[t] <- ph
  interval[t, ] <- c(ph - moe, ph + moe)
}
phats <- na.omit(phats)

# Plot the first B intervals; the red horizontal line marks the true p.
dev.new(width = 12, height = 6)    # portable replacement for win.graph(w=12,h=6)
par(xaxt = 'n', mar = c(.8, 2, .8, .8))
B <- 100
plot(1:B, ylim = range(interval[1:B, ]) + 1 * c(-.01, .01), type = 'n', ylab = '', xlab = '')
grid(col = 'gray60')
abline(h = p, col = 'red', lwd = 2)
for (t in 1:B) {
  lines(x = c(t, t), y = interval[t, ], col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 1], 2), col = 'gray40', lwd = 2)
  lines(x = t + c(-.3, .2), y = rep(interval[t, 2], 2), col = 'gray40', lwd = 2)
  points(x = t, y = mean(interval[t, ]), pch = 16, cex = .8, col = 'blue2')
}

# Proportion of the first B intervals that cover p.
mean(p >= interval[1:B, 1] & p <= interval[1:B, 2])


APPENDIX 2

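R codes for the simulation study of the relationship between the margin of error and the sample proportion, sample size, and confidence level.

# Margin of error as a function of sample proportion p, confidence level z, and sample size n.
a <- function(p = 0.5, z = 0.95, n = 100) return(qnorm((1 + z) / 2) * sqrt(p * (1 - p) / n))
ps <- seq(0, 1, length.out = 1e3)

dev.new(width = 9, height = 4)     # portable replacement for win.graph(w=9,h=4)
par(mfrow = c(1, 3), mar = c(4, 4, 0, 0) + 1, cex.lab = 2, yaxt = 'n', cex.sub = 1.3)

# Panel 1: margin of error against the sample proportion.
plot(a(ps) ~ ps, type = 'l', xlab = expression(hat(p)), ylab = 'margin of error', lwd = 2,
     sub = "fixed confidence level = 95%, n = 100")
grid(col = 'gray70')
abline(v = 0.5, col = 'red2', lwd = 2)
text(y = 0, x = 0.65, labels = "p = 0.5", col = 'red2', cex = 1.2)

# Panel 2: margin of error against the sample size.
ns <- seq(30, 150, by = 1)
plot(a(n = ns) ~ ns, type = 'l', xlab = 'sample size n', ylab = '', lwd = 2,
     sub = "fixed confidence level = 95%, p = 0.5")
grid(col = 'gray70')

# Panel 3: margin of error against the confidence level.
zs <- seq(0.8, 0.99, length.out = 1e3)
plot(a(z = zs) ~ zs, type = 'l', xlab = 'confidence level C', ylab = '', lwd = 2,
     sub = "fixed n = 100, p = 0.5")
grid(col = 'gray70')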

Thank you.