Comparing Two Proportions

48
COMPARING TWO PROPORTIONS Chapter 22

description

Chapter 22. Comparing Two Proportions. Objectives. Sampling distribution of the difference between two proportions 2-proportion z-interval Pooling 2-proportion z-test. Comparing Two Proportions. - PowerPoint PPT Presentation

Transcript of Comparing Two Proportions

Page 1: Comparing Two Proportions

COMPARING TWO PROPORTIONS

Chapter 22

Page 2: Comparing Two Proportions

Objectives

Sampling distribution of the difference between two proportions

2-proportion z-interval Pooling 2-proportion z-test

Page 3: Comparing Two Proportions

Comparing Two Proportions

Comparisons between two percentages are much more common than questions about isolated percentages. And they are more interesting.

We often want to know how two groups differ, whether a treatment is better than a placebo control, or whether this year’s results are better than last year’s.

Page 4: Comparing Two Proportions

Another Ruler In order to examine the difference

between two proportions, we need another ruler—the standard deviation of the sampling distribution model for the difference between two proportions.

Recall that standard deviations don’t add, but variances do. In fact, the variance of the sum or difference of two independent random quantities is the sum of their individual variances.

Page 5: Comparing Two Proportions

Proportions observed in independent random samples are independent. Thus, we can add their variances. So…

The standard deviation of the difference between two sample proportions is

Thus, the standard error is

The Standard Deviation of the Difference Between Two Proportions

1 1 2 21 2

1 2

ˆ ˆ p q p qSD p pn n

1 1 2 21 2

1 2

ˆ ˆ ˆ ˆˆ ˆ p q p qSE p pn n

Page 6: Comparing Two Proportions

Assumptions and Conditions Independence Assumptions:

Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment.

The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population.

Independent Groups Assumption: The two groups we’re comparing must be independent of each other.

Page 7: Comparing Two Proportions

Assumptions and Conditions

Sample Size Condition: Each of the groups must be big

enough… Success/Failure Condition: Both groups

are big enough that at least 10 successes and at least 10 failures have been observed in each.

Page 8: Comparing Two Proportions

The Sampling Distribution

We already know that for large enough samples, each of our proportions has an approximately Normal sampling distribution.

The same is true of their difference.

Page 9: Comparing Two Proportions

Provided that the sampled values are independent, the samples are independent, and the samples sizes are large enough, the sampling distribution of is modeled by a Normal model with Mean:

Standard deviation:

The Sampling Distribution

1 2p p

1 1 2 21 2

1 2

ˆ ˆ p q p qSD p pn n

1 2ˆ ˆp p

Page 10: Comparing Two Proportions

The Sampling Distribution

The approximately normal sampling distribution

for ( 1 – 2):p̂ p̂

Page 11: Comparing Two Proportions

When the conditions are met, we are ready to find the confidence interval for the difference of two proportions:

The confidence interval is

where

The critical value z* depends on the particular confidence level, C, that you specify.

Two-Proportion z-Interval

1 1 2 21 2

1 2

ˆ ˆ ˆ ˆˆ ˆ p q p qSE p pn n

1 2 1 2ˆ ˆ ˆ ˆp p z SE p p

Page 12: Comparing Two Proportions

Example: 2-Proportion z-Interval

A recent study of 1000 randomly chosen residents in each of two randomly selected states indicated that the percent of people living in those states who were born in foreign countries was 6.5% for State A and 1.7% for State B. Find a 99% confidence interval for the difference between the proportions of foreign-born residents for these two states.

Page 13: Comparing Two Proportions

Procedure: 2-Proportion z-Interval

1) Check Conditions1) Independence2) Randomization3) 10% condition4) Success/Failure

2) Calculation Confidence Interval

3) Conclusiono In context

1 2 1 2ˆ ˆ ˆ ˆp p z SE p p 1 1 2 21 2

1 2

ˆ ˆ ˆ ˆˆ ˆ p q p qSE p pn n

Page 14: Comparing Two Proportions

Solution

1) Check Conditions1) Independence: Groups from different randomly

selected states should be independent.2) Randomization: Each sample was drawn

randomly from its respective state.3) 10% condition: Reasonable that the population

of both states is greater than 10,000.4) Success/Failure:

ˆ 1000(.065) 65 10ˆ 1000(.017) 17 10ˆ 1000(.935) 935 10ˆ 1000(.983) 983 10

A A

B B

A A

B B

n pn pn qn q

Page 15: Comparing Two Proportions

Solution

2) Calculate 99% Confidence Interval

ˆ ˆ ˆ ˆ.065, .017 and (.065 .017) .048A B A Bp p p p

1 2 1 2ˆ ˆ ˆ ˆp p z SE p p

* 2.576z

1 1 2 21 2

1 2

ˆ ˆ ˆ ˆ (.065)(.935) (.017)(.983)ˆ ˆ .00881000 1000

p q p qSE p p

n n

2.576(.0088) .0227me

99% CI: .048 .0227 or (.0253,.0707)

Page 16: Comparing Two Proportions

Solution

Conclusion I am 99% confident that the percent

difference of foreign-born residents for states A and B is between 2.5% and 7.1%.

Page 17: Comparing Two Proportions

Your Turn Suppose the AGA would like to compare the

proportion of homes heated by gas in the South with the corresponding proportion in the North. AGA selected a random sample of 60 homes located in the South and found that 34 of the homes use gas for heating fuel. AGA also randomly sampled 80 homes in the North and found 42 used gas for heating. Construct a 90% confidence interval for the difference between the proportions of Southern homes and Northern homes which are heated by gas.

Page 18: Comparing Two Proportions

Solution

1) Check Conditions1) Independence: Samples from different parts of

the country (north and south) should be independent.

2) Randomization: Each sample was drawn randomly.

3) 10% condition: Reasonable that the population in the North is greater than 800 and the population in the South is greater than 600.

4) Success/Failure:ˆ 80(.525) 42 10ˆ 60(.567) 34 10ˆ 80(.475) 38 10ˆ 60(.433) 26 10

N N

S S

N N

S S

n pn pn qn q

42ˆ: .5258034ˆ: .56760

N

S

North p

South p

Page 19: Comparing Two Proportions

Solution

2) Calculate 99% Confidence Interval

ˆ ˆ ˆ ˆ.567, .525 and (.567 .525) .042S N S Np p p p

1 2 1 2ˆ ˆ ˆ ˆp p z SE p p

* 1.645z

1 1 2 21 2

1 2

ˆ ˆ ˆ ˆ (.567)(.433) (.525)(.475)ˆ ˆ .084960 80

p q p qSE p p

n n

1.645(.0849) .140me

90% CI: .042 .14 or ( .098,.182)

Page 20: Comparing Two Proportions

Solution

Conclusion I am 90% confident that the percent difference of

homes in the South and the North is between -9.8% and 18.2%.

Because the 90% confidence interval contains the value 0, our samples have not indicated a real difference between the proportions of Southern homes and Northern homes which are heated by gas. The proportion of Southern homes could be smaller than the corresponding proportion for Northern homes by as much as 9.8%, or the proportion of Southern homes could exceed the proportion for Northern homes by as much as 18.2%.

Page 21: Comparing Two Proportions

TI-84: 2-Proportion z-Interval

Press STAT key, choose TESTS, and then choose B: 2-PropZInterval…

Adjust the settings; x1: the number selected sample 1 n1: sample 1 sample size x2: the number selected sample 2 n2: sample 2 sample size C-level: confidence level

Select “Calculate”

Page 22: Comparing Two Proportions

Example – Using the TI-84 How much does the cholesterol-lowering drug

Gemfibrozil help reduce the risk of heart attack? We compare the incidence of heart attack over a 5-year period for two random samples of middle-aged men taking either a placebo or the drug. Find a 90% Confidence Interval for the decease in the percentage of middle-aged men having heart attacks on the cholesterol-lowering drug.

H. attack

n

Drug 56 2051 2.73%

Placebo 84 2030 4.14%

Page 23: Comparing Two Proportions

Solution

90% Confidence Interval (.0047, .02345)

Conclusion We are 90% confident that the percent

difference of middle-aged men who suffer a heart attack is 0.47% to 2.3% lower when taking the cholesterol-lowering drug.

Page 24: Comparing Two Proportions

TWO PROPORTION Z-TEST

Page 25: Comparing Two Proportions

Everyone into the Pool The typical hypothesis test for the

difference in two proportions is the one of no difference. In symbols, H0: p1 – p2 = 0.

Since we are hypothesizing that there is no difference between the two proportions, that means that the standard deviations for each proportion are the same.

Since this is the case, we combine (pool) the counts to get one overall proportion.

Page 26: Comparing Two Proportions

The pooled proportion is

where and

If the numbers of successes are not whole numbers, round them first. (This is the only time you should round values in the middle of a calculation.)

Everyone into the Pool

1 2

1 2

ˆ pooledSuccess Successp

n n

1 1 1ˆSuccess n p 2 2 2ˆSuccess n p

Page 27: Comparing Two Proportions

We then put this pooled value into the formula, substituting it for both sample proportions in the standard error formula:

Everyone into the Pool

1 21 2

ˆ ˆ ˆ ˆˆ ˆ pooled pooled pooled pooled

pooled

p q p qSE p p

n n

Page 28: Comparing Two Proportions

Compared to What? We’ll reject our null hypothesis if we see a

large enough difference in the two proportions.

How can we decide whether the difference we see is large? Just compare it with its standard deviation.

Unlike previous hypothesis testing situations, the null hypothesis doesn’t provide a standard deviation, so we’ll use a standard error (here, pooled).

Page 29: Comparing Two Proportions

The conditions for the two-proportion z-test are the same as for the two-proportion z-interval.

We are testing the hypothesis H0: p1 – p2 = 0, or, equivalently, H0: p1 = p2.

The alternative hypothesis is either; Ha: p1≠p2, Ha: p1<p2, or Ha: p1>p2.

Because we hypothesize that the proportions are equal, we pool them to find

Two-Proportion z-Test

1 2

1 2

ˆ pooledSuccess Successp

n n

Page 30: Comparing Two Proportions

We use the pooled value to estimate the standard error:

Now we find the test statistic:

When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.

Two-Proportion z-Test

1 2

1 2

ˆ ˆ( ) 0ˆ ˆ( )pooled

p pzSE p p

1 21 2

ˆ ˆ ˆ ˆˆ ˆ pooled pooled pooled pooled

pooled

p q p qSE p p

n n

Page 31: Comparing Two Proportions

Procedure: 2-Proportion z-Test

1. Check conditions2. Hypothesis3. Test Statistic

Z test statistic

4. P-value5. Conclusion

1 2

1 2

ˆ ˆ( ) 0ˆ ˆ( )pooled

p pzSE p p

1 2

1 2

ˆ pooledSuccess Successp

n n

1 2

1 2

ˆ ˆ ˆ ˆˆ ˆ pooled pooled pooled pooled

pooled

p q p qSE p p

n n

Page 32: Comparing Two Proportions

Example: 2-Proportion z-Test

A researcher wants to know whether there is a difference in AP-Statistics exam failure rates between rural and suburban students. She randomly selects 107 rural students and 143 suburban students who took the exam. Thirty rural students failed to pass their exam, while 45 suburban students failed the exam. Is there a significant difference in failure rates for these groups?

Page 33: Comparing Two Proportions

Solution1) Check Conditions

1. Independence: Rural and Suburban student groups are independent.

2. Randomization: Samples were randomly drawn.3. 10% Condition: It is reasonable to believe that

the population of rural exam takers is greater than 1070 and the population of suburban exam takers is greater than 1430.

4. Success/Failures:success1=30>10success2=45>10failure1=(107-30)=77>10failure2=(143-45)=98>10

Page 34: Comparing Two Proportions

Solution

2) Hypothesis Null hypothesis – H0: pR=pS (There is no

difference in failure rates.) Alternative hypothesis – Ha: pR≠pS

(There is a difference in failure rates.)

Page 35: Comparing Two Proportions

Solution

3) Test Statistic

1 2

1 2

ˆ ˆ( ) 0 .035 0 .59ˆ ˆ( ) .059pooled

p pz

SE p p

1 2

1 2

30 45ˆ .3107 143pooled

Success Successp

n n

1 21 2

ˆ ˆ ˆ ˆ 1 1ˆ ˆ (.3)(.7) .059107 143

pooled pooled pooled pooledpooled

p q p qSE p p

n n

30 45ˆ ˆ ˆ ˆ.280, .315, and .280 .315 .035107 143R S R Sp p p p

Page 36: Comparing Two Proportions

Solution4) P-value

z = -.59 and a two-tailed test P-value = 2P(z > .59) = 2(.2776) P-value = .5552

5) ConclusionThe P-value .5552 is too high, thus we fail to reject the null hypothesis. There is insufficient evidence to show a difference between the failure rates of rural and suburban students on the AP Statistics exam.

Page 37: Comparing Two Proportions

Your Turn: 2-Proportion z-Test In recent years there has been a trend toward both parents

working outside the home. Do working mothers experience the same burdens and family pressures as their spouses? A popular belief is that the proportion of working mothers who feel they have enough spare time for themselves is significantly less than the corresponding proportion of working fathers. In order to test this claim, independent random samples of 100 working mothers and 100 working fathers were selected and their views on spare time for themselves were recorded. A summary of the data is given in the table below.

Is the belief that the proportion of working mothers who feel they have enough spare time for themselves less than the corresponding proportion of working fathers supported at significance level α=.01 by the sample information?

Working Mothers Working Fathers

Number sampled 100 100Number who feel they have enough spare time

37 56

Page 38: Comparing Two Proportions

Solution1) Check Conditions

1. Independence: Samples were independent.2. Randomization: Samples were randomly drawn.3. 10% Condition: It is reasonable to believe that

the population of mothers and fathers is greater than 1000.

4. Success/Failures:success1=37>10success2=56>10failure1=(100-37)=63>10failure2=(100-56)=44>10

Page 39: Comparing Two Proportions

Solution

2) Hypothesis Null hypothesis – H0: pM=pF (There is no

difference between Mothers and Fathers who feel they have enough spare time for themselves.)

Alternative hypothesis – Ha: pM<pF (The proportion of working mothers who feel they have enough spare time for themselves is less than the corresponding proportion of working fathers .)

Page 40: Comparing Two Proportions

Solution

3) Test Statistic

1 2

1 2

ˆ ˆ( ) 0 .19 0 2.70ˆ ˆ( ) .0705pooled

p pz

SE p p

1 2

1 2

37 56ˆ .465100 100pooled

Success Successp

n n

1 21 2

ˆ ˆ ˆ ˆ 1 1ˆ ˆ (.465)(.535) .0705100 100

pooled pooled pooled pooledpooled

p q p qSE p p

n n

37 56ˆ ˆ ˆ ˆ.37, .56, and .37 .56 .19100 100M F M Fp p p p

Page 41: Comparing Two Proportions

Solution4) P-value

z = -2.70 and a left-tailed test P-value = P(z < -2.70) = .0035 P-value = .0035

5) ConclusionThe p-value = .0035 is less than α = .01, thus we reject the null hypothesis. There is sufficient evidence at the 1% significance level to conclude that the proportion of working mothers who feel they have enough spare time for themselves is less than the corresponding proportion of working fathers.

Page 42: Comparing Two Proportions

TI-84: 2-Proportion z-Test

Press STAT key, choose TESTS, and then choose 6: 2-PropZTest…

Adjust the settings; x1: the number selected sample 1 n1: sample 1 sample size x2: the number selected sample 2 n2: sample 2 sample size p1: select type of test (≠p2, <p2, >p2)

Select “Calculate”

Page 43: Comparing Two Proportions

Example Using TI-84 To market a new white wine, a winemaker decides to use

two different advertising agencies, one operating in the east, one in the west. After the white wine has been on the market for 8 months, independent random samples of wine drinkers are taken from each of the two regions and questioned concerning their white wine preference. The numbers favoring the new brand are shown in the table. Is there evidence that the proportion of wine drinkers in the west who prefer the new wine is greater than the corresponding proportion of wine drinkers in the east. Test at the .02 significance level.

East WestNumber sampled 516 438Number who prefer the new white wine 18 23

Page 44: Comparing Two Proportions

Solution H0:pW=pE Ha:pW>pE 2-PropZTest

x1: 23 n1: 438 x2: 18 n2: 516 p1: >p2

z = 1.338 and P-value = .0905 Conclusion

The P-value .0905 is greater than α =.02, thus we fail to reject the null hypothesis. There is insufficient evidence at the .02 significance level to conclude that the proportion of wine drinkers in the west who prefer the new white wine is greater than the corresponding proportion of wine drinkers in the east.

Page 45: Comparing Two Proportions

What Can Go Wrong? Don’t use two-sample proportion methods

when the samples aren’t independent. These methods give wrong answers when the

independence assumption is violated. Don’t apply inference methods when there

was no randomization. Our data must come from representative random

samples or from a properly randomized experiment.

Don’t interpret a significant difference in proportions causally. Be careful not to jump to conclusions

about causality.

Page 46: Comparing Two Proportions

What have we learned?

We’ve now looked at inference for the difference in two proportions.

Perhaps the most important thing to remember is that the concepts and interpretations are essentially the same—only the mechanics have changed slightly.

Page 47: Comparing Two Proportions

What have we learned? (cont.) Hypothesis tests and confidence intervals

for the difference in two proportions are based on Normal models. Both require us to find the standard error of the

difference in two proportions. We do that by adding the variances of the two

sample proportions, assuming our two groups are independent.

When we test a hypothesis that the two proportions are equal, we pool the sample data; for confidence intervals we don’t pool.

Page 48: Comparing Two Proportions

Assignment

Pg 519 – 522: #1 – 21 odd