Chapter 11: Asking and Answering Questions About the Difference Between Two Population Proportions

Created by Kathy Fritz

Let’s review notation:

Proportion of successes in Population 1: p1

Proportion of successes in Population 2: p2

                            Sample Size    Sample Proportion of Successes
Sample from Population 1    n1             $\hat{p}_1$
Sample from Population 2    n2             $\hat{p}_2$

Subscripts are added to distinguish between the two populations. You can either use 1 and 2, or you can use descriptive subscripts like W for women and M for men.

In this chapter, only the case where the two samples are independent samples is considered.

Estimating the Difference Between Two Population Proportions

Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

Large-Sample Confidence Interval

Interpreting Confidence Intervals

Investigators at Madigan Army Medical Center tested using duct tape to remove warts versus the more traditional freezing treatment.

Suppose that the duct tape treatment will successfully remove 50% of warts and that the traditional freezing treatment will successfully remove 60% of warts.

Some people seem to think that duct tape can fix anything . . . even remove warts!

Let’s investigate the sampling distribution of $\hat{p}_{\text{freeze}} - \hat{p}_{\text{tape}}$.

pfreeze = the true proportion of warts that are successfully removed by freezing

ptape = the true proportion of warts that are successfully removed by using duct tape

Suppose we repeatedly treated 100 warts using the traditional freezing treatment and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{\text{freeze}}$, which is centered at pfreeze = .6 with standard deviation

$$\sigma_{\hat{p}_{\text{freeze}}} = \sqrt{\frac{.6(.4)}{100}}$$

Suppose we repeatedly treated 100 warts using the duct tape method and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{\text{tape}}$, which is centered at ptape = .5 with standard deviation

$$\sigma_{\hat{p}_{\text{tape}}} = \sqrt{\frac{.5(.5)}{100}}$$

Randomly take one of the sample proportions for the freezing treatment and one of the sample proportions for the duct tape treatment and find the difference. Doing this repeatedly, we will create the sampling distribution of $\hat{p}_{\text{freeze}} - \hat{p}_{\text{tape}}$, which is centered at .6 − .5 = .1 with standard deviation

$$\sigma_{\hat{p}_{\text{freeze}} - \hat{p}_{\text{tape}}} = \sqrt{\frac{.6(.4)}{100} + \frac{.5(.5)}{100}}$$
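The repeated-sampling idea in the wart example can be tried out with a short simulation. The sketch below is only an illustration: the function name, the number of repetitions (5000), and the random seed are assumptions chosen here, not part of the original study. It uses the supposed success rates of .6 and .5 and 100 warts per treatment.

```python
import math
import random
import statistics


def simulate_differences(p1, p2, n1, n2, reps, seed=0):
    """Simulate reps values of (sample proportion 1 - sample proportion 2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Each wart is removed with probability p1 (or p2), independently.
        phat1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        phat2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        diffs.append(phat1 - phat2)
    return diffs


# Freezing: p = .6, duct tape: p = .5, 100 warts treated with each method.
diffs = simulate_differences(0.6, 0.5, 100, 100, reps=5000)

sim_mean = statistics.mean(diffs)
sim_sd = statistics.stdev(diffs)
theory_sd = math.sqrt(0.6 * 0.4 / 100 + 0.5 * 0.5 / 100)

print("simulated mean of differences:", round(sim_mean, 3))
print("simulated sd of differences:  ", round(sim_sd, 3))
print("theoretical sd:               ", round(theory_sd, 3))
```

The simulated differences should cluster near .6 − .5 = .1, and their spread should be close to the theoretical value computed on the last line.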

General Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

If $\hat{p}_1 - \hat{p}_2$ is the difference in sample proportions for independently selected random samples, then the following rules hold:

1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$

This rule says that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is centered at the actual value of the difference in population proportions. This means that the sample differences tend to cluster around the value of the actual population difference.

2. $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$

This rule specifies the standard error of $\hat{p}_1 - \hat{p}_2$. The value of the standard error describes how much the $\hat{p}_1 - \hat{p}_2$ values tend to vary from the actual population difference.

3. If both n1 and n2 are large, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal.

The sample sizes can be considered large if n1p1 ≥ 10, n1(1 – p1) ≥ 10, n2p2 ≥ 10, and n2(1 – p2) ≥ 10.
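As a quick check of these rules, the following sketch computes the center, the standard error, and the large-sample condition for given population proportions and sample sizes. The function name `difference_distribution` is a hypothetical helper introduced here for illustration; the numbers are the supposed values from the wart example.

```python
import math


def difference_distribution(p1, p2, n1, n2):
    """Return (mean, standard error, large-sample flag) for phat1 - phat2."""
    mean = p1 - p2                                            # rule 1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # rule 2
    # Rule 3 applies when all four expected counts are at least 10.
    counts = (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    is_large = all(count >= 10 for count in counts)
    return mean, se, is_large


# Wart example: pfreeze = .6, ptape = .5, 100 warts per treatment.
mean, se, is_large = difference_distribution(0.6, 0.5, 100, 100)
print(round(mean, 2), round(se, 2), is_large)
```

With these inputs the expected counts are 60, 40, 50, and 50, so the normal approximation in rule 3 is reasonable.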

A Large-Sample Confidence Interval for a Difference in Population Proportions

Appropriate when the following conditions are met:

1. The samples are independent random samples from the populations of interest (or the samples are selected in a way that is reasonable to regard each sample as representative of the corresponding population).

2. The sample sizes are large. This condition is met when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

When these conditions are met, a confidence interval for the difference in population proportions is

$$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
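The interval formula can be sketched as a small function. This is a minimal illustration, not a definitive implementation: the function name `two_proportion_ci` and the example counts (120 of 200 and 90 of 200) are made up here, and the z critical value comes from Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2.

    x1, x2 are the numbers of successes; n1, n2 are the sample sizes.
    """
    # n*phat equals the success count, so the large-sample condition
    # reduces to: every success and failure count is at least 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("sample sizes are not large enough")

    phat1 = x1 / n1
    phat2 = x2 / n2
    # z critical value that leaves (1 - confidence)/2 in each tail
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
    diff = phat1 - phat2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts, used only to show the call.
lower, upper = two_proportion_ci(120, 200, 90, 200, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```

The function only checks the sample-size condition. Whether the samples are independent and representative still has to be judged from how the data were collected.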

Confidence Intervals Continued . . .

Interpretation of Confidence Interval

You can be confident that the actual value of the difference in population proportions is included in the computed interval. In a given problem, this statement should be in context.

Interpretation of Confidence Level

The confidence level specifies the long-run percentage of the time that this method will be successful in capturing the actual difference in population proportions.

As part of a study, people in a sample of 258 cell phone users ages 20 to 39 were asked if they use their cell phones to stay connected while they are in bed. The same question was also asked of each person in a sample of 129 cell phone users ages 40 to 49.

The study found that 168 of the 258 people in the sample of 20- to 39-year-olds and 61 of the 129 people in the sample of 40- to 49-year-olds said that they sleep with their phones.

How much greater is the proportion who use a cell phone to stay connected in bed for cell phone users ages 20 to 39 than for those ages 40 to 49?

Step 1 (Estimate): You want to estimate p1 – p2, where p1 is the proportion of cell phone users ages 20 to 39 who sleep with their phones and p2 is the proportion of cell phone users ages 40 to 49 who sleep with their phones.

Cell Phones in Beds Continued . . .

Step 2 (Method): Because the answers to the four key questions are 1) estimation, 2) sample data, 3) one categorical variable, and 4) two samples, a large-sample confidence interval for a difference in population proportions will be considered. A confidence level of 90% will be used.

Step 3 (Check):

• The sample sizes are large enough because $n_1\hat{p}_1 = 168 \ge 10$, $n_1(1 - \hat{p}_1) = 90 \ge 10$, $n_2\hat{p}_2 = 61 \ge 10$, and $n_2(1 - \hat{p}_2) = 68 \ge 10$.

• No information was provided regarding how the samples were selected. We must assume that the samples were selected in a reasonable way.

Step 4 (Calculation): With $\hat{p}_1 = 168/258 = 0.651$, $\hat{p}_2 = 61/129 = 0.473$, and a z critical value of 1.645 for 90% confidence,

$$(0.651 - 0.473) \pm 1.645\sqrt{\frac{0.651(0.349)}{258} + \frac{0.473(0.527)}{129}} = 0.178 \pm 0.087 = (0.09, 0.27)$$

Step 5 (Communicate Results): Assuming that the samples were selected in a reasonable way, you can be 90% confident that the actual difference in the proportion of cell phone users who sleep with their cell phone for 20- to 39-year-olds and for 40- to 49-year-olds is between 0.09 and 0.27.

The method used to construct this interval estimate is successful in capturing the actual value of the difference in population proportions about 90% of the time.
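The interval reported for the cell phone example can be reproduced step by step. The sketch below uses only the counts given in the example (168 of 258 and 61 of 129) and a 90% confidence level; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

# Sample data from the cell phone example
x1, n1 = 168, 258   # ages 20 to 39 who sleep with their phones
x2, n2 = 61, 129    # ages 40 to 49 who sleep with their phones

phat1 = x1 / n1
phat2 = x2 / n2

# 90% confidence leaves 5% in each tail of the z curve
z_crit = NormalDist().inv_cdf(0.95)

se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
margin = z_crit * se

lower = (phat1 - phat2) - margin
upper = (phat1 - phat2) + margin
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimal places, the endpoints agree with the interval (0.09, 0.27) used in Step 5.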

Interpreting Confidence Intervals for a Difference

Case: Both endpoints of the confidence interval for p1 – p2 are positive

Interpretation: If p1 – p2 is positive, it means you think p1 is greater than p2 and the interval gives an estimate of how much greater.

Example: (0.24, 0.36) You think that p1 is greater than p2 by somewhere between 0.24 and 0.36.

Case: Both endpoints of the confidence interval for p1 – p2 are negative

Interpretation: If p1 – p2 is negative, it means you think p1 is less than p2 and the interval gives an estimate of how much less.

Example: (-0.14, -0.06) You think that p1 is less than p2 by somewhere between 0.06 and 0.14.

Case: 0 is included in the confidence interval

Interpretation: If the confidence interval includes 0, a plausible value for p1 – p2 is zero.

Example: (-0.14, 0.09) Because 0 is included in the confidence interval, it is possible that the two population proportions could be equal.
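The three cases can be written as a small decision rule. The function below is a hypothetical helper introduced only to make the logic explicit; its wording follows the examples above.

```python
def interpret_interval(lower, upper):
    """Describe a confidence interval (lower, upper) for p1 - p2."""
    if lower > 0:
        # Both endpoints positive
        return (f"p1 is greater than p2 by somewhere between "
                f"{lower} and {upper}")
    if upper < 0:
        # Both endpoints negative: report the sizes as positive amounts
        return (f"p1 is less than p2 by somewhere between "
                f"{-upper} and {-lower}")
    # Otherwise 0 lies in the interval
    return "0 is plausible, so p1 and p2 could be equal"


for interval in [(0.24, 0.36), (-0.14, -0.06), (-0.14, 0.09)]:
    print(interval, "->", interpret_interval(*interval))
```

In a real problem the sentence should still be rewritten in context, naming the two populations and the characteristic being compared.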

Testing Hypotheses About the Difference Between Two Population Proportions

A Large-Sample Test for a Difference in Two Population Proportions

Appropriate when the following conditions are met:

1. The samples are independent random samples from the populations of interest (or the samples are selected in a way that is reasonable to regard each sample as representative of the corresponding population).

2. The sample sizes are large. This condition is met when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

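A Large-Sample Test for a Difference in Two Population Proportions Continued . . .

When these conditions are met, the following test statistic can be used to test the null hypothesis H0: p1 – p2 = 0:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$$

where $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the combined estimate of the common proportion if H0 is true.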

A Large-Sample Test for a Difference in Two Population Proportions Continued . . .

Null Hypothesis: H0: p1 – p2 = 0

When the Alternative Hypothesis Is . . .    The P-value Is . . .

Ha: p1 – p2 > 0    Area under the z curve to the right of the calculated value of the test statistic

Ha: p1 – p2 < 0    Area under the z curve to the left of the calculated value of the test statistic

Ha: p1 – p2 ≠ 0    2·(area to right of z) if z is positive, or 2·(area to left of z) if z is negative
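The test statistic and the three P-value rules can be sketched together. This is an illustration under stated assumptions: the function name `two_proportion_z_test`, its `alternative` argument, and the example counts (120 of 200 and 90 of 200) are invented here, and the areas under the z curve come from the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: p1 - p2 = 0.

    alternative is "greater" (Ha: p1 - p2 > 0), "less" (Ha: p1 - p2 < 0),
    or "two-sided" (Ha: p1 - p2 != 0). Returns (z, p_value).
    """
    phat1 = x1 / n1
    phat2 = x2 / n2
    # Combined estimate of the common proportion when H0 is true
    phat_c = (x1 + x2) / (n1 + n2)
    se = math.sqrt(phat_c * (1 - phat_c) / n1 + phat_c * (1 - phat_c) / n2)
    z = (phat1 - phat2) / se

    area_left = NormalDist().cdf(z)
    area_right = 1 - area_left
    if alternative == "greater":
        p_value = area_right
    elif alternative == "less":
        p_value = area_left
    elif alternative == "two-sided":
        # Twice the tail area beyond z, in the direction of z's sign
        p_value = 2 * min(area_left, area_right)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical counts, used only to show the call.
z, p_two = two_proportion_z_test(120, 200, 90, 200)
_, p_greater = two_proportion_z_test(120, 200, 90, 200, alternative="greater")
_, p_less = two_proportion_z_test(120, 200, 90, 200, alternative="less")
print(round(z, 2), round(p_two, 4), round(p_greater, 4), round(p_less, 4))
```

Taking twice the smaller tail area gives the same result as the two-sided rule in the table, because the smaller tail is the one on the side of z.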

Another Way to Write Hypothesis Statements:

H0: p1 – p2 = 0 can be written as H0: p1 = p2

Ha: p1 – p2 > 0 can be written as Ha: p1 > p2

Ha: p1 – p2 < 0 can be written as Ha: p1 < p2

Ha: p1 – p2 ≠ 0 can be written as Ha: p1 ≠ p2

A survey was conducted by Gallup to investigate public opinion on issues related to rising gas prices. Each person in a representative sample of low-income adult Americans (annual income less than $30,000) and each person in an independently selected representative sample of high-income adult Americans (annual income greater than $75,000) was asked whether he or she would consider buying an electric car if gas prices continued to rise.

Is there convincing evidence that the proportion who would never consider buying an electric car is different for low-income adult Americans than for high-income adult Americans?

In the low-income sample, 65% said that they would not buy an electric car no matter how high gas prices were to rise. In the high-income sample, 59% responded this way. Suppose sample sizes were both 300.

Gas Prices Continued . . .

Step 1 (Hypotheses):

H0: p1 – p2 = 0

Ha: p1 – p2 ≠ 0

where

p1 = proportion of low-income adult Americans who would never consider buying an electric car

p2 = proportion of high-income adult Americans who would never consider buying an electric car

Step 2 (Method): Because the answers to the four key questions are 1) hypothesis testing, 2) sample data, 3) one categorical variable, and 4) two samples, consider a large-sample hypothesis test for a difference in population proportions. In this situation, because neither type of error is much more serious than the other, you might choose a value of 0.05 for α.

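Step 3 (Check):

• The sample sizes are large enough because $n_1\hat{p}_1 = 300(0.65) = 195 \ge 10$, $n_1(1 - \hat{p}_1) = 300(0.35) = 105 \ge 10$, $n_2\hat{p}_2 = 300(0.59) = 177 \ge 10$, and $n_2(1 - \hat{p}_2) = 300(0.41) = 123 \ge 10$.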

• From the study description, you know that the samples were independently selected. You also know that Gallup believed the samples were selected in a way that would result in representative samples of adult Americans in the two income groups.

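Step 4 (Calculations):

Combined estimate: $\hat{p}_c = \frac{300(0.65) + 300(0.59)}{300 + 300} = 0.62$

Test statistic:

$$z = \frac{0.65 - 0.59}{\sqrt{\frac{0.62(0.38)}{300} + \frac{0.62(0.38)}{300}}} \approx \frac{0.06}{0.04} = 1.50$$

P-value = 2 · P(z > 1.50) = 0.1336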
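The Step 4 calculation can be checked numerically. The sketch below uses only the values given in the example (sample proportions 0.65 and 0.59, both sample sizes 300); the variable names are chosen here. It carries full precision rather than rounding, so it gives z of about 1.51 and a P-value of about 0.13, slightly different from the rounded z = 1.50 and P-value of 0.1336 shown on the slide. The decision at α = 0.05 is the same either way.

```python
import math
from statistics import NormalDist

# Values given in the gas prices example
phat1, n1 = 0.65, 300   # low-income sample
phat2, n2 = 0.59, 300   # high-income sample
alpha = 0.05

# Combined estimate of the common proportion when H0 is true
phat_c = (n1 * phat1 + n2 * phat2) / (n1 + n2)

se = math.sqrt(phat_c * (1 - phat_c) / n1 + phat_c * (1 - phat_c) / n2)
z = (phat1 - phat2) / se

# Two-sided alternative: twice the area beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print("z =", round(z, 2))
print("P-value =", round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```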

Step 5 (Communicate Results):

Decision: 0.1336 > 0.05, Fail to reject H0

Conclusion: Based on the sample data, you are not convinced that there is a difference in the proportions who would never consider buying an electric car for low-income and high-income adult Americans.

Avoid These Common Mistakes

1. Remember that the results of a hypothesis test can never show strong support for the null hypothesis. In two-sample situations, this means that you shouldn’t be convinced that there is no difference between two population proportions based on the outcome of a hypothesis test.

2. If you have complete information (a census) for both populations, there is no need to carry out a hypothesis test or to construct a confidence interval – in fact, it would be inappropriate to do so.

3. Don’t confuse statistical significance with practical significance. In the two-sample setting, it is possible to be convinced that two population proportions are not equal even in situations where the actual difference between them is small enough that it is of no practical use.

After rejecting a null hypothesis of no difference, it is useful to look at a confidence interval estimate of the difference to get a sense of practical significance.

4. Correctly interpreting confidence intervals in the two-sample case is more difficult than in the one-sample case, so take particular care when providing two-sample confidence interval interpretations.

Because the two-sample confidence interval estimates a difference (p1 – p2), the most important thing to note is whether or not the interval includes 0.
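The point in mistake 3 about statistical versus practical significance can be illustrated with made-up numbers. In the sketch below, the counts (502,000 of 1,000,000 and 500,000 of 1,000,000) are hypothetical and chosen only to show how very large samples can make a tiny difference statistically significant. The 95% confidence level is also an assumption.

```python
import math
from statistics import NormalDist

# Hypothetical data: two very large samples with nearly equal proportions
x1, n1 = 502_000, 1_000_000
x2, n2 = 500_000, 1_000_000

phat1 = x1 / n1
phat2 = x2 / n2
diff = phat1 - phat2

# Large-sample z test of H0: p1 - p2 = 0 (two-sided)
phat_c = (x1 + x2) / (n1 + n2)
se_test = math.sqrt(phat_c * (1 - phat_c) * (1 / n1 + 1 / n2))
z = diff / se_test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for p1 - p2
z_crit = NormalDist().inv_cdf(0.975)
se_ci = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
lower = diff - z_crit * se_ci
upper = diff + z_crit * se_ci

print("P-value:", round(p_value, 4))
print("95% interval:", (round(lower, 4), round(upper, 4)))
```

Here the P-value falls below 0.05, so H0 would be rejected, yet the interval shows the difference is well under half a percentage point. Whether a difference that small matters is a practical question that the test alone cannot answer.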