CHAPTER 10 Comparing Two Populations or Groups€¦ · · 2018-04-04PERFORM a significance test...

The Practice of Statistics, 5th Edition

Starnes, Tabor, Yates, Moore

Bedford Freeman Worth Publishers

CHAPTER 10Comparing Two Populations or Groups

10.1

Comparing Two Proportions

Learning Objectives

After this section, you should be able to:

The Practice of Statistics, 5th Edition 2

✓ DESCRIBE the shape, center, and spread of the sampling

distribution of the difference of two sample proportions.

✓ DETERMINE whether the conditions are met for doing inference

about p1 − p2.

✓ CONSTRUCT and INTERPRET a confidence interval to compare

two proportions.

✓ PERFORM a significance test to compare two proportions.



Introduction

Suppose we want to compare the proportions of individuals with a

certain characteristic in Population 1 and Population 2. Let’s call these

parameters of interest p1 and p2. The ideal strategy is to take a

separate random sample from each population and to compare the

sample proportions with that characteristic.

What if we want to compare the effectiveness of Treatment 1 and

Treatment 2 in a completely randomized experiment? This time, the

parameters p1 and p2 that we want to compare are the true proportions

of successful outcomes for each treatment. We use the proportions of

successes in the two treatment groups to make the comparison.


The Sampling Distribution of a Difference Between Two Proportions

To explore the sampling distribution of the difference between two

proportions, let’s start with two populations having a known proportion

of successes.

✓ At School 1, 70% of students did their homework last night

✓ At School 2, 50% of students did their homework last night.

Suppose the counselor at School 1 takes an SRS of 100 students and

records the sample proportion that did their homework.

School 2’s counselor takes an SRS of 200 students and records the

sample proportion that did their homework.

What can we say about the difference ˆ p 1 - ˆ p 2 in the sample proportions?



Using Fathom software, we generated an SRS of 100 students from

School 1 and a separate SRS of 200 students from School 2. The

difference in sample proportions was then calculated and plotted. We

repeated this process 1000 times.

What do you notice about the shape, center, and spreadof the sampling distribution of ˆ p 1 - ˆ p 2 ?


The Sampling Distribution of the Difference Between Sample Proportions

Choose an SRS of size n1 from Population 1 with proportion of

successes p1 and an independent SRS of size n2 from Population 2

with proportion of successes p2.


Both ˆ p 1 and ˆ p 2 are random variables. The statistic ˆ p 1 - ˆ p 2 is the differenceof these two random variables. In Chapter 6, we learned that for any two independent random variables X and Y,

mX -Y = mX - mY and s X -Y

2 = sX

2 + sY

2

Shape When n1p1, n1(1- p1), n2 p2 and n2(1- p2) are all at least 10, the

sampling distribution of ˆ p 1 - ˆ p 2 is approximately Normal.

Spread The standard deviation of the sampling distribution of ˆ p 1 - ˆ p 2 is

p1(1- p1)

n1

+p2(1- p2)

n2

as long as each sample is no more than 10% of its population (10% condition).



Suppose that there are two large high schools, each with more than

2000 students, in a certain town. At School 1, 70% of students did their

homework last night. Only 50% of the students at School 2 did their

homework last night. The counselor at School 1 takes an SRS of 100

students and records the proportion that did homework. School 2’s

counselor takes an SRS of 200 students and records the proportion that

did homework

a) Describe the shape, center, and spread of the sampling distribution of ˆ p 1 - ˆ p 2.

Because n1p1 =100(0.7) = 70, n1(1- p1) =100(0.30) = 30, n2 p2 = 200(0.5) =100

and n2(1- p2) = 200(0.5) =100 are all at least 10, the sampling distribution

of ˆ p 1 - ˆ p 2 is approximately Normal.

Its mean is p1 - p2 = 0.70 -0.50 = 0.20.

Its standard deviation is

0.7(0.3)

100+

0.5(0.5)

200= 0.058.


American made cars

Nathan and Kyle both work for the Department of Motor Vehicles

(DMV), but they live in different states. In Nathan’s state, 80% of the

registered cars are made by American manufacturers. In Kyle’s state,

only 60% of the registered cars are made by American manufacturers.

Nathan selects a random sample of 100 cars in his state and Kyle

selects a random sample of 70 cars in his state. Let Ƹ𝑝𝑁 − Ƹ𝑝𝐾 be the

difference in the sample proportion of cars made by American

manufacturers.

Problem:

a)What is the shape of the sampling distribution of Ƹ𝑝𝑁 − Ƹ𝑝𝐾? Why?

Because 100(0.80) = 80, 100(1 – 0.80) = 20, 70(0.60) = 42, and 70(1 –

0.60) = 28 are all ≥ 10, the sampling distribution of Ƹ𝑝𝑁 − Ƹ𝑝𝐾 is

approximately Normal.


American made cars








manufacturers.

Problem:

b) Find the mean of the sampling distribution. Show your work.

The mean is 𝜇 ො𝑝𝑁− ො𝑝𝐾 = 0.80 − 0.60 = 0.20.


American made cars








manufacturers.

Problem:

c) Find the standard deviation of the sampling distribution. Show your

work.

Because there are at least 10(100) = 1000 cars in Nathan’s state and at

least 10(70) = 700 cars in Kyle’s state, the standard deviation is

𝜎 ො𝑝𝑁− ො𝑝𝐾 =0.80(0.20)

100+0.60(0.40)

70= 0.0709


Confidence Intervals for p1 – p2

When data come from two random samples or two groups in a randomized

experiment, the statistic ˆ p 1 - ˆ p 2 is our best guess for the value of p1 -p2 . We

can use our familiar formula to calculate a confidence interval for p1 -p2:

statistic± (critical value) × (standard deviation of statistic)

If the Normal condition is met, we find the critical value z* for the given

confidence level from the standard Normal curve.



Conditions For Constructing A Confidence Interval About A Difference In Proportions

• Random: The data come from two independent random samples or

from two groups in a randomized experiment.

o 10%: When sampling without replacement, check that

n1 ≤ (1/10)N1 and n2 ≤ (1/10)N2.

•

Because we don't know the values of the parameters p1 and p2, we replace them

in the standard deviation formula with the sample proportions. The result is the

standard error of the statistic ˆ p 1 - ˆ p 2 : ˆ p 1(1- ˆ p 1)

n1

+ˆ p 2(1- ˆ p 2)

n2



Two-Sample z Interval for a Difference Between Two Proportions


Presidential approval

Many news organizations conduct polls asking adults in the United States if

they approve of the job the president is doing. How did President Obama’s

approval rating change from October 2012 to October 2013? According to a

Gallup poll of 1500 randomly selected U.S. adults on October 2–4, 2012,

52% approved of Obama’s job performance. A Gallup poll of 1500 randomly

selected U.S. adults on October 5–7, 2013, showed that 46% approved of

Obama’s job performance.

Problem:

(a) Calculate the standard error of the sampling distribution of the difference

in the sample proportions (2013 – 2012). Interpret this value.

𝑆𝐸 ො𝑝13− ො𝑝12 =0.46(0.54)

1500+0.52(0.48)

1500= 0.0182

If we were to take many random samples of 1500 U.S. adults in October 2012 and

October 2013, the difference in the sample proportions who approve of President

Obama will typically be 0.0182 from the true difference in proportions.



Many news organizations conduct polls asking adults in the United States if they

approve of the job the president is doing. How did President Obama’s approval rating

change from October 2012 to October 2013? According to a Gallup poll of 1500

randomly selected U.S. adults on October 2–4, 2012, 52% approved of Obama’s job

performance. A Gallup poll of 1500 randomly selected U.S. adults on October 5–7,

2013, showed that 46% approved of Obama’s job performance.

Problem:

(b) Use the results of these polls to construct and interpret a 90% confidence interval

for the change in Obama’s approval rating among all US adults from October 2012 to

October 2013.

State: We want to estimate 𝑝13 − 𝑝12 at the 90% confidence level where

𝑝13 is the true proportion of al US adults who approved of President Obama’s job

performance in October 2013 and

𝑝12 is the true proportion of all US adults who approved of President Obama’s job

performance in October 2012.









Problem:



October 2013.

Plan: We should use a two-sample z interval for 𝑝13 − 𝑝12 if the conditions are met.

Random: The data came from independent random samples.

10%: There were more than 10(1500) = 15,000 US adults in both October 2013 and

October 2012

Large Counts: 𝑛13 Ƹ𝑝13 = 1500 0.46 = 690, 𝑛13(1 − Ƹ𝑝13) = 1500 0.54 = 810,

𝑛12 Ƹ𝑝12 = 1500 0.52 = 780, and 𝑛12(1 − Ƹ𝑝12) = 1500 0.48 = 720 are all at least 10.









Problem:



October 2013.

Do: Ƹ𝑝13 − Ƹ𝑝12 ± 𝑧∗ො𝑝13 1− ො𝑝13

𝑛13+

ො𝑝12 1− ො𝑝12

𝑛12= 0.46 − 0.52 ±

1.6450.46 1−0.46

1500+

0.52 1−0.52

1500= −0.06 ± 0.030 = (−0.090,−0.030)









Problem:



October 2013.

Conclude: We are 90% confident that the interval from −0.090 to –0.030

captures the true change in the proportion of U.S. adults who approve of

President Obama’s job performance from October 2012 to October 2013.









Problem:

(c) Based on your interval, is there convincing evidence that Obama’s job approval

rating has changed?

Because 0 is not included in the interval, we have convincing evidence that

President Obama’s approval rating has changed from October 2012 to

October 2013.

Section Summary

In this section, we learned how to…


✓ DESCRIBE the shape, center, and spread of the sampling distribution

of the difference of two sample proportions.

✓ DETERMINE whether the conditions are met for doing inference

about p1 − p2.

✓ CONSTRUCT and INTERPRET a confidence interval to compare two

proportions.



PAGE 629

2, 6, 10Homework

CHAPTER 10 Comparing Two Populations or Groups€¦ · · 2018-04-04PERFORM a significance test...

Documents

Transcript of CHAPTER 10 Comparing Two Populations or Groups€¦ · · 2018-04-04PERFORM a significance test...