Two-Sample Proportions Inference


Sampling Distributions for the Difference in Proportions

When tossing pennies, the probability of the coin landing on heads is 0.5. However, when spinning the coin, the probability of the coin landing on heads is 0.4. Let’s investigate.

Pairs of students will be given pennies and assigned to either flip or spin the penny.

Looking at the sampling distribution of the difference in sample proportions:

What is the mean of the difference in sample proportions (flip - spin)?

$\mu_{\hat{p}_f - \hat{p}_s} = 0.1$

What is the standard deviation of the difference in sample proportions (flip - spin)?

$\sigma_{\hat{p}_f - \hat{p}_s} = 0.14$

Can the sampling distribution of difference in sample proportions (flip - spin) be approximated by a normal distribution?

Yes, since $n_1p_1 = 12.5$, $n_1(1-p_1) = 12.5$, $n_2p_2 = 10$, $n_2(1-p_2) = 15$, so all are at least 5.

What is the probability that the difference in proportions (flipped – spun) is at least .25?

$P(\hat{p}_f - \hat{p}_s \ge .25) = .1420$
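The three answers above can be checked numerically. The sketch below is only an illustration: it assumes 25 flips and 25 spins, a sample size the slides never state but which follows from $n_1p_1 = 12.5$ and $n_2p_2 = 10$. The helper `normal_cdf` is a name introduced here, not something from the slides.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_flip, p_spin = 0.5, 0.4
n_flip = n_spin = 25  # assumption: inferred from n1*p1 = 12.5 and n2*p2 = 10

# Mean and standard deviation of (p-hat flip) - (p-hat spin)
mean_diff = p_flip - p_spin
sd_diff = sqrt(p_flip * (1 - p_flip) / n_flip + p_spin * (1 - p_spin) / n_spin)

# P(difference >= 0.25) under the normal approximation
z = (0.25 - mean_diff) / sd_diff
prob = 1 - normal_cdf(z)

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, P(diff >= .25) = {prob:.4f}")
```

The printed probability should land very close to the .1420 on the slide.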

Assumptions:

• Two independent SRSs from populations (or randomly assigned treatments)

• Populations at least 10n

• Normal approximation for both:

$n_1p_1 \ge 5$, $n_1(1-p_1) \ge 5$

$n_2p_2 \ge 5$, $n_2(1-p_2) \ge 5$

The National Sleep Foundation asked a random sample of U.S. adults questions about their sleep habits. One of the questions asked about snoring. Of the 995 respondents, 37% of adults reported that they snored at least a few nights a week during the past year.

Would you expect that percentage to be the same for all age groups? Split into two age categories, 26% of the 184 people under 30 snored, compared with 39% of the 811 in the older group.

Is this difference of 13% real, or due only to natural fluctuations in the sample we’ve chosen?

              18-29   30 and over   Total
Snore            48           318     366
Don’t snore     136           493     629
Total           184           811     995

HYPOTHESIS TEST! For the true DIFFERENCE between snoring rates!

Steps:

1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context

Assumptions:

• Two independent SRSs from populations (or randomly assigned treatments)

• Populations at least 10n

• Normal approximation for both:

$n_1p_1 \ge 10$, $n_1(1-p_1) \ge 10$

$n_2p_2 \ge 10$, $n_2(1-p_2) \ge 10$

              18-29      30 and over   Total
Snore          48 (26%)    318 (39%)     366
Don’t snore   136          493           629
Total         184          811           995

Assumptions:

• Ages are independent of each other in the random sample

• 995 is less than 10% of all adults

• Normal approximation for both:

$n_1p_1 = 184(.26) \approx 48 \ge 10$, $n_1(1-p_1) = 184(.74) \approx 136 \ge 10$

$n_2p_2 = 811(.39) \approx 316 \ge 10$, $n_2(1-p_2) = 811(.61) \approx 495 \ge 10$
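The large-counts part of the assumptions can be sketched as a small check. The function name `large_counts_ok` and its `threshold` parameter are hypothetical, introduced only for this illustration. It uses the observed counts from the table directly (48 and 318 snorers) instead of n times a rounded percentage, which is why the slide's 316 and 495 differ slightly from the table's 318 and 493.

```python
def large_counts_ok(successes, n, threshold=10):
    """Return True when both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Counts from the snoring table: 48 of 184 (ages 18-29), 318 of 811 (30 and over)
young_ok = large_counts_ok(48, 184)
old_ok = large_counts_ok(318, 811)

print("young group OK:", young_ok)
print("older group OK:", old_ok)
```

Passing `threshold=5` instead would reproduce the looser rule used earlier for the sampling-distribution slides.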

Hypothesis statements:

H0: p1 - p2 = 0

Ha: p1 - p2 > 0

Ha: p1 - p2 < 0

Ha: p1 - p2 ≠ 0

Be sure to define both p1 & p2!

H0: p1 = p2

Ha: p1 > p2

Ha: p1 < p2

Ha: p1 ≠ p2

Hypothesis statements:

H0: There is no difference in snoring rates in the two age groups

$p_{old} - p_{young} = 0$, that is, $p_1 = p_2$

Ha: There is a difference in snoring rates in the two age groups

$p_{old} - p_{young} \ne 0$, that is, $p_1 \ne p_2$

Since we assume that the population proportions are equal in the null hypothesis, the variances are equal.

Therefore, we combine the variances!

$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$

Formula for Hypothesis test:

Test statistic = (statistic - parameter) / (SD of statistic)

Test statistic:

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}}$

$n_{young} = 184$, $n_{old} = 811$

$\hat{p}_{young} = \frac{48}{184} = .261$, $\hat{p}_{old} = \frac{318}{811} = .392$

$\hat{p}_{combined} = \frac{48 + 318}{184 + 811} = .3678$

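Test statistic:

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} = \frac{.392 - .261}{\sqrt{\frac{(.3678)(.6322)}{811} + \frac{(.3678)(.6322)}{184}}} = 3.33$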

P-value = 2(area to right of z = 3.33) = .0008
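The snoring calculation can be reproduced from the raw counts. This is a minimal sketch, not a reference implementation: `two_prop_z_test` and `normal_cdf` are names made up for this illustration, and the p-value is two-sided to match Ha: $p_1 \ne p_2$.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sided z test of H0: p1 - p2 = 0. Returns (z, p_value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)          # combined proportion p-hat_c
    se = sqrt(p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2)
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))    # area in both tails
    return z, p_value

# Older group first (318 of 811), then younger group (48 of 184)
z, p_value = two_prop_z_test(318, 811, 48, 184)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The z value should agree with the 3.33 above. The p-value can differ from the slide's .0008 in the last digit because of rounding in the table lookup.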

“Since the p-value < (>) α, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha.”

Conclusion:

Since the p-value is less than my significance level, I reject the null hypothesis of no difference. There is sufficient evidence to suggest that there is a difference in the rate of snoring between older adults and younger adults.

Formula for Hypothesis test:

Test statistic = (statistic - parameter) / (SD of statistic)

$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Usually p1 – p2 = 0
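The slides write the standard deviation in the test statistic in two ways: once as a sum of two fractions and once with the pooled term factored out. A short algebra step, using $\hat{p}_c$ for the combined proportion, suggests they are the same quantity:

```latex
\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}
  = \sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```

Either form can be used in the calculations; they give identical z values.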

Example - Student Retention

A group of college students were asked what they thought the “issue of the day” was. Without a pause, the class almost to a person said “student retention”. The class then went out and obtained a random sample (questionable) and asked the question, “Do you plan on returning next year?”

The responses, along with the gender of the person responding, are summarized in the following table.

                 Response
Gender     Yes     No     Maybe
Male       211     45        19
Female     141     32         9

Test to see if the proportion of students planning on returning is the same for both genders at the 0.05 level of significance.

Assumptions: The two samples are independently chosen random samples. Sample is less than 10% of college population.

The sample sizes are large enough since

$n_1p_1 = 211 \ge 10$, $n_1(1-p_1) = 64 \ge 10$

$n_2p_2 = 141 \ge 10$, $n_2(1-p_2) = 41 \ge 10$

so an approximately normal model can be used.

Significance level: α = 0.05


pm = true proportion of males who plan on returning

pf = true proportion of females who plan on returning

nm = 275 (number of males surveyed)

nf = 182 (number of females surveyed)

$\hat{p}_m = \frac{211}{275} = .7672$ (sample proportion of males who plan on returning)

$\hat{p}_f = \frac{141}{182} = .7747$ (sample proportion of females who plan on returning)

Null hypothesis: H0: pm – pf = 0

Alternate hypothesis: Ha: pm – pf ≠ 0

Calculations:

$\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{211 + 141}{275 + 182} = \frac{352}{457} = 0.7702$

Test statistic:

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1-\hat{p}_c)}{n_1} + \frac{\hat{p}_c(1-\hat{p}_c)}{n_2}}} = \frac{0.76727 - 0.77473}{\sqrt{\frac{0.77024(1 - 0.77024)}{275} + \frac{0.77024(1 - 0.77024)}{182}}} = \frac{-0.0074525}{0.040198} = -0.19$

P-value:

The P-value for this test is 2 times the area under the z curve to the left of the computed z = -0.19.

P-value = 2(0.4247) = 0.8494

Conclusion:

Since P-value = 0.849 > 0.05 = α, the hypothesis H0 is not rejected at significance level 0.05.

There is not sufficient evidence that the return rate is different for males and females.

Example 4: A forest in Oregon has an infestation of spruce moths. In an effort to control the moth, one area has been regularly sprayed from airplanes. In this area, a random sample of 495 spruce trees showed that 81 had been killed by moths. A second nearby area receives no treatment. In this area, a random sample of 518 spruce trees showed that 92 had been killed by the moth. Do these data indicate that the proportion of spruce trees killed by the moth is different for these areas?

Assumptions:

•Have 2 independent SRS of spruce trees

•Both distributions are approximately normal since n1p1=81, n1(1-p1)=414, n2p2=92, n2(1-p2)=426 and all > 10

•Population of spruce trees is at least 10,130.

H0: p1 = p2

Ha: p1 ≠ p2

where p1 is the true proportion of trees killed by moths in the treated area and p2 is the true proportion of trees killed by moths in the untreated area

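$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.16 - .18}{\sqrt{.17(.83)\left(\frac{1}{495} + \frac{1}{518}\right)}} = -0.59$

(The proportions are shown rounded to two decimals; z is computed from the unrounded values 81/495, 92/518, and the combined proportion 173/1013.)

P-value = 0.5547

α = 0.05

Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the proportion of spruce trees killed by the moth is different for these areas.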
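The spruce-moth slide displays the proportions rounded to two decimals (.16, .18, .17), but the reported z comes from the unrounded counts. The sketch below compares the two; `pooled_z` is a helper name invented for this illustration.

```python
from math import sqrt

def pooled_z(p1_hat, p2_hat, p_pooled, n1, n2):
    """z statistic for H0: p1 = p2 using a pooled standard deviation."""
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

n1, n2 = 495, 518   # treated area, untreated area
x1, x2 = 81, 92     # trees killed in each sample

# Unrounded proportions taken straight from the counts
z_exact = pooled_z(x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2), n1, n2)

# The two-decimal values shown on the slide
z_rounded = pooled_z(0.16, 0.18, 0.17, n1, n2)

print(f"from counts: z = {z_exact:.2f}")
print(f"from rounded proportions: z = {z_rounded:.2f}")
```

Because the two sample proportions are so close, rounding them early changes z noticeably, so it seems safer to carry the fractions through and round only at the end.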

              18-29   30 and over   Total
Snore            48           318     366
Don’t snore     136           493     629
Total           184           811     995

Back to snoring… What if I wanted to know the true difference in the population proportion of young adults who snore and the proportion of older adults who snore?

CONFIDENCE INTERVAL! For the true DIFFERENCE between snoring rates!

What are the steps for performing a confidence interval?

1.) Identify the interval by name or formula (CI for two-sample proportion)

2.) Assumptions

• Two independent SRSs from populations (or randomly assigned treatments)

• Populations at least 10n

• Normal approximation for both:

$n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$

$n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$

3.) Calculations

4.) Conclusion (in context of problem)

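Formula for confidence interval:

CI = statistic ± (critical value)(SD of statistic)

$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Note: use p-hat when p is not known

The square root is the standard error! The whole term after the ± is the margin of error!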

Two-sample Confidence Interval for Proportions

Conditions for inference have previously been met

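$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

$= (.392 - .261) \pm 1.96\sqrt{\frac{.392(1 - .392)}{811} + \frac{.261(1 - .261)}{184}}$

$= .131 \pm .0718$

$= (.0592, .2028)$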

We are 95% confident that the proportion of people who snore is between 5.92% and 20.28% higher for older adults than for younger adults.
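The interval for the snoring data can be sketched in a few lines. The function name `two_prop_ci` and its default `z_star=1.96` (the 95% critical value) are choices made for this illustration. Unlike the hypothesis test, the standard error here is not pooled, since no equality of proportions is assumed.

```python
from math import sqrt

def two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * se                      # margin of error
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin

# Older group (318 of 811) minus younger group (48 of 184)
low, high = two_prop_ci(318, 811, 48, 184)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The endpoints should match the slide's (.0592, .2028) to about three decimals; the slide rounds the sample proportions before computing.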

Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. What is the shape & standard error of the sampling distribution of the difference in the proportions of people with no visible scars between the two groups?

Since n1p1=259, n1(1-p1)=57, n2p2=94, n2(1-p2)=325 and all > 5, then the distribution of difference in proportions is approximately normal.

$S.E. = \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = 0.0296$

Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. What is a 95% confidence interval of the difference in proportion of people who had no visible scars between the plasma compress treatment & control group?

Assumptions:

•Have 2 independent randomly assigned treatment groups

•Both distributions are approximately normal since n1p1=259, n1(1-p1)=57, n2p2=94, n2(1-p2)=325 and all > 5

•Population of burn patients is at least 7350.

Since these are all burn patients, we can add 316 + 419 = 735.

If not the same – you MUST list separately.

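$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.82 - .22) \pm 1.96\sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = (.537, .654)$

(The proportions are shown rounded to two decimals; the interval is computed from the unrounded values 259/316 and 94/419.)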

We are 95% confident that the true proportion of people who had no visible scars with the plasma compress treatment is between 53.7% and 65.4% higher than for the control group.

Example 2: Suppose that researchers want to estimate the difference in proportions of people who are against the death penalty in Texas & in California. If the two sample sizes are the same, what size sample is needed to be within 2% of the true difference at 90% confidence?

$.02 = 1.645\sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$

$.02 = 1.645\sqrt{\frac{.25 + .25}{n}}$

Since both n’s are the same size, you have common denominators – so add!

n = 3383
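Solving the margin-of-error equation for n can be sketched as below. The function name `n_per_group` is hypothetical; the defaults of 0.5 for both proportions follow the slide's conservative choice when nothing is known about the true values.

```python
from math import ceil

def n_per_group(margin, z_star, p1=0.5, p2=0.5):
    """Smallest equal sample size per group so that z* times SE is within margin."""
    # From margin = z* * sqrt((p1(1-p1) + p2(1-p2)) / n), solved for n
    raw_n = (z_star / margin) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(raw_n)  # always round up so the margin is not exceeded

n = n_per_group(0.02, 1.645)
print(n)  # 3383
```

Rounding up rather than to the nearest integer matters here: the unrounded value is about 3382.5, and 3382 would leave the margin slightly above 2%.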

Do you think that the proportion of defective PEANUT M&Ms is higher than the proportion of defective PLAIN M&Ms?
