Two-Sample Proportions Inference
Transcript of Two-Sample Proportions Inference
Sampling Distributions for the Difference in Proportions
When tossing pennies, the probability of the coin landing on heads is 0.5. However, when spinning the coin, the probability of the coin landing on heads is 0.4. Let’s investigate.
Pairs of students will be given pennies and assigned to either flip or spin the penny.
Looking at the sampling distribution of the difference in sample proportions:
What is the mean of the difference in sample proportions (flip - spin)?
$\mu_{\hat{p}_f - \hat{p}_s} = 0.1$
What is the standard deviation of the difference in sample proportions (flip - spin)?
$\sigma_{\hat{p}_f - \hat{p}_s} = 0.14$
Can the sampling distribution of the difference in sample proportions (flip - spin) be approximated by a normal distribution?
Yes, since n1p1 = 12.5, n1(1-p1) = 12.5, n2p2 = 10, and n2(1-p2) = 15 are all at least 5.
What is the probability that the difference in proportions (flipped - spun) is at least .25?
$P(\hat{p}_f - \hat{p}_s \ge 0.25) = 0.1420$
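The answers to the penny questions can be checked numerically. The sketch below assumes 25 flips and 25 spins; that sample size is not stated on the slide and is inferred from the check n1p1 = 12.5 with p1 = 0.5. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Probabilities of heads from the slide; n = 25 per group is an assumption
# inferred from n1*p1 = 12.5 with p1 = 0.5.
p_flip, p_spin = 0.5, 0.4
n_flip = n_spin = 25

# Mean and standard deviation of (p_hat_flip - p_hat_spin)
mean_diff = p_flip - p_spin
sd_diff = sqrt(p_flip * (1 - p_flip) / n_flip + p_spin * (1 - p_spin) / n_spin)

# P(p_hat_flip - p_hat_spin >= 0.25) under the normal approximation
prob = 1 - NormalDist(mean_diff, sd_diff).cdf(0.25)

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.2f}, P(diff >= 0.25) = {prob:.4f}")
```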
Assumptions:
• Two independent SRS’s from populations (or randomly assigned treatments)
• Populations at least 10n
• Normal approximation for both:
$n_1 p_1 \ge 5$, $n_1(1 - p_1) \ge 5$
$n_2 p_2 \ge 5$, $n_2(1 - p_2) \ge 5$
The National Sleep Foundation asked a random sample of U.S. adults questions about their sleep habits. One of the questions asked about snoring. Of the 995 respondents, 37% of adults reported that they snored at least a few nights a week during the past year.
Would you expect that percentage to be the same for all age groups? Split into two age categories, 26% of the 184 people under 30 snored, compared with 39% of the 811 in the older group.
Is this difference of 13% real, or due only to natural fluctuations in the sample we’ve chosen?
              18-29   30 and over   Total
Snore            48           318     366
Don’t snore     136           493     629
Total           184           811     995
HYPOTHESIS TEST!
For the true DIFFERENCE between snoring rates!
Steps:
1) Assumptions
2) Hypothesis statements & define parameters
3) Calculations
4) Conclusion, in context
Assumptions:
• Two independent SRS’s from populations (or randomly assigned treatments)
• Populations at least 10n
• Normal approximation for both:
$n_1 p_1 \ge 10$, $n_1(1 - p_1) \ge 10$
$n_2 p_2 \ge 10$, $n_2(1 - p_2) \ge 10$
              18-29      30 and over   Total
Snore         48 (26%)   318 (39%)     366
Don’t snore   136        493           629
Total         184        811           995
Assumptions:
• Ages are independent of each other in the random sample
• 995 is less than 10% of all adults
• Normal approximation for both:
$n_1 p_1 = 184(.26) \approx 48 \ge 10$, $n_1(1 - p_1) = 184(.74) \approx 136 \ge 10$
$n_2 p_2 = 811(.39) \approx 316 \ge 10$, $n_2(1 - p_2) = 811(.61) \approx 495 \ge 10$
Hypothesis statements:
H0: p1 - p2 = 0   (equivalently, H0: p1 = p2)
Ha: p1 - p2 > 0   (equivalently, Ha: p1 > p2)
Ha: p1 - p2 < 0   (equivalently, Ha: p1 < p2)
Ha: p1 - p2 ≠ 0   (equivalently, Ha: p1 ≠ p2)
Be sure to define both p1 & p2!
Hypothesis statements:
H0: There is no difference in snoring rates in the two age groups
pold - pyoung = 0 (p1 = p2)
Ha: There is a difference in snoring rates in the two age groups
pold - pyoung ≠ 0 (p1 ≠ p2)
Since we assume that the population proportions are equal in the null hypothesis, the variances are equal.
Therefore, we combine the variances!
$\hat{p}_c = \dfrac{x_1 + x_2}{n_1 + n_2}$
Formula for Hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
Test statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$
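$n_{\text{young}} = 184$, $n_{\text{old}} = 811$
$\hat{p}_{\text{young}} = \dfrac{48}{184} = .261$, $\hat{p}_{\text{old}} = \dfrac{318}{811} = .392$
$\hat{p}_{\text{combined}} = \dfrac{48 + 318}{184 + 811} = .3678$
Test statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} = \dfrac{.392 - .261}{\sqrt{\dfrac{(.3678)(.6322)}{811} + \dfrac{(.3678)(.6322)}{184}}} = 3.33$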
P-value = 2(area to right of z = 3.33) = .0008
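The pooled calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `two_prop_z_test` is a hypothetical helper, not something from the slides. Because it uses unrounded values, its P-value may differ in the last digit from the table-based .0008.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test of H0: p1 - p2 = 0 against a two-sided Ha."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Combined (pooled) proportion, used because H0 assumes p1 = p2
    p_c = (x1 + x2) / (n1 + n2)
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (p1_hat - p2_hat) / se
    # Two-sided P-value: twice the tail area beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Snoring data: group 1 = 30 and over (318 of 811), group 2 = 18-29 (48 of 184)
z, p_value = two_prop_z_test(318, 811, 48, 184)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```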
“Since the p-value < (>) α, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha.”
Conclusion:
Since the p-value is less than my significance level, I reject the null hypothesis of no difference. There is sufficient evidence to suggest that there is a difference in the rate of snoring between older adults and younger adults.
Formula for Hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Usually p1 - p2 = 0
Example - Student Retention
A group of college students was asked what they thought was the “issue of the day”. Without a pause, the class almost to a person said “student retention”. The class then went out and obtained a random sample (questionable) and asked the question, “Do you plan on returning next year?”
The responses, along with the gender of the person responding, are summarized in the following table.
                Response
Gender     Yes    No    Maybe
Male       211    45    19
Female     141    32     9
Test to see if the proportion of students planning on returning is the same for both genders at the 0.05 level of significance.
Assumptions: The two samples are independently chosen random samples. The sample is less than 10% of the college population.
The sample sizes are large enough since
n1p1 = 211 ≥ 10, n1(1 - p1) = 64 ≥ 10
n2p2 = 141 ≥ 10, n2(1 - p2) = 41 ≥ 10
so an approximately normal model can be used.
Significance level: α = 0.05
pm = true proportion of males who plan on returning
pf = true proportion of females who plan on returning
nm = 275 (number of males surveyed)
nf = 182 (number of females surveyed)
$\hat{p}_m = \dfrac{211}{275} = .7672$ (sample proportion of males who plan on returning)
$\hat{p}_f = \dfrac{141}{182} = .7747$ (sample proportion of females who plan on returning)
Null hypothesis: H0: pm - pf = 0
Alternate hypothesis: Ha: pm - pf ≠ 0
Calculations:
$\hat{p}_c = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \dfrac{211 + 141}{275 + 182} = \dfrac{352}{457} = 0.7702$
Test statistic:
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}} = \dfrac{0.76727 - 0.77473}{\sqrt{\dfrac{0.77024(1 - 0.77024)}{275} + \dfrac{0.77024(1 - 0.77024)}{182}}} = \dfrac{-0.0074525}{0.040198} = -0.19$
P-value:
The P-value for this test is 2 times the area under the z curve to the left of the computed z = -0.19.
P-value = 2(0.4247) = 0.8494
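The retention test can be rebuilt directly from the response table. The sketch below is illustrative, with a hypothetical dictionary layout; it treats "No" and "Maybe" together as "not planning to return", which matches the counts 64 and 41 used in the sample-size check. With the unrounded z (about -0.185) the P-value comes out near 0.853, slightly different from 0.8494, which uses z rounded to -0.19.

```python
from math import sqrt
from statistics import NormalDist

# Survey counts from the table; "No" and "Maybe" both count as
# "not planning to return" (45 + 19 = 64 males, 32 + 9 = 41 females).
responses = {
    "male": {"yes": 211, "no": 45, "maybe": 19},
    "female": {"yes": 141, "no": 32, "maybe": 9},
}

yes = {group: counts["yes"] for group, counts in responses.items()}
n = {group: sum(counts.values()) for group, counts in responses.items()}

p_m = yes["male"] / n["male"]
p_f = yes["female"] / n["female"]

# Pooled proportion and test statistic for H0: pm - pf = 0
p_c = (yes["male"] + yes["female"]) / (n["male"] + n["female"])
se = sqrt(p_c * (1 - p_c) * (1 / n["male"] + 1 / n["female"]))
z = (p_m - p_f) / se

# Two-sided P-value: twice the area to the left of -|z|
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"n_m = {n['male']}, n_f = {n['female']}, z = {z:.3f}, P-value = {p_value:.3f}")
```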
Conclusion:
Since P-value = 0.849 > 0.05 = α, the hypothesis H0 is not rejected at significance level 0.05.
There is not sufficient evidence that the return rate is different for males and females.
Example 4: A forest in Oregon has an infestation of spruce moths. In an effort to control the moth, one area has been regularly sprayed from airplanes. In this area, a random sample of 495 spruce trees showed that 81 had been killed by moths. A second nearby area receives no treatment. In this area, a random sample of 518 spruce trees showed that 92 had been killed by the moth. Do these data indicate that the proportion of spruce trees killed by the moth is different for these areas?
Assumptions:
•Have 2 independent SRS of spruce trees
•Both distributions are approximately normal since n1p1=81, n1(1-p1)=414, n2p2=92, n2(1-p2)=426 and all > 10
•Population of spruce trees is at least 10,130.
H0: p1 = p2
Ha: p1 ≠ p2
where p1 is the true proportion of trees killed by moths in the treated area and p2 is the true proportion of trees killed by moths in the untreated area
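$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{.16 - .18}{\sqrt{.17(.83)\left(\dfrac{1}{495} + \dfrac{1}{518}\right)}} = -0.59$
P-value = 0.5547
α = 0.05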
Since p-value > α, I fail to reject H0. There is not sufficient evidence to suggest that the proportion of spruce trees killed by the moth is different for these areas.
              18-29   30 and over   Total
Snore            48           318     366
Don’t snore     136           493     629
Total           184           811     995
Back to snoring… What if I wanted to know the true difference between the population proportion of young adults who snore and the proportion of older adults who snore?
CONFIDENCE INTERVAL!
For the true DIFFERENCE between snoring rates!
What are the steps for performing a confidence interval?
1.) Identify the interval by name or formula (CI for two-sample proportion)
2.) Assumptions
• Two independent SRS’s from populations (or randomly assigned treatments)
• Populations at least 10n
• Normal approximation for both:
$n_1 p_1 \ge 10$, $n_1(1 - p_1) \ge 10$
$n_2 p_2 \ge 10$, $n_2(1 - p_2) \ge 10$
3.) Calculations
4.) Conclusion (in context of problem)
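Formula for confidence interval:
CI = statistic ± (critical value)(SD of statistic)
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Note: use p-hat when p is not known
Standard error: the square-root term
Margin of error: z* times the standard error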
Two-sample Confidence Interval for Proportions
Conditions for inference have previously been met
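$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
$(.392 - .261) \pm 1.96 \sqrt{\dfrac{.392(1 - .392)}{811} + \dfrac{.261(1 - .261)}{184}}$
$= .131 \pm .0718$
$= (.0592, .2028)$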
We are 95% confident that the proportion of people who snore is between 5.92 and 20.28 percentage points higher for older adults than for younger adults.
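The interval can be recomputed from the raw counts. This is a minimal sketch; `two_prop_ci` is a hypothetical helper name, and the default z* = 1.96 is the 95% critical value used on the slide. Working from unrounded sample proportions, the endpoints land near .059 and .203, which agrees with the slide's (.0592, .2028) to about three decimal places.

```python
from math import sqrt

def two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """Unpooled confidence interval for p1 - p2 (z_star = 1.96 gives 95%)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error uses each sample's own proportion (no pooling)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * se
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin

# Snoring data: group 1 = 30 and over (318 of 811), group 2 = 18-29 (48 of 184)
low, high = two_prop_ci(318, 811, 48, 184)
print(f"95% CI for p_old - p_young: ({low:.4f}, {high:.4f})")
```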
Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. What is the shape & standard error of the sampling distribution of the difference in the proportions of people with no visible scars between the two groups?
Since n1p1 = 259, n1(1-p1) = 57, n2p2 = 94, n2(1-p2) = 325 and all > 5, the distribution of the difference in proportions is approximately normal.
$S.E. = \sqrt{\dfrac{.82(.18)}{316} + \dfrac{.22(.78)}{419}} = 0.0296$
Example 1: At Community Hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. What is a 95% confidence interval of the difference in proportion of people who had no visible scars between the plasma compress treatment & control group?
Assumptions:
•Have 2 independent randomly assigned treatment groups
•Both distributions are approximately normal since n1p1=259, n1(1-p1)=57, n2p2=94, n2(1-p2)=325 and all > 5
•Population of burn patients is at least 7350.
Since these are all burn patients, we can add 316 + 419 = 735.
If the two groups do not come from the same population, you MUST list the population condition separately for each.
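$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (.82 - .22) \pm 1.96 \sqrt{\dfrac{.82(.18)}{316} + \dfrac{.22(.78)}{419}} = (.537, .654)$
(The endpoints come from the unrounded proportions 259/316 and 94/419.)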
We are 95% confident that the true proportion of people who had no visible scars is between 53.7 and 65.4 percentage points higher for the plasma compress treatment group than for the control group.
Example 2: Suppose that researchers want to estimate the difference in proportions of people who are against the death penalty in Texas & in California. If the two sample sizes are the same, what size sample is needed to be within 2% of the true difference at 90% confidence?
$.02 = 1.645 \sqrt{\dfrac{.5(.5)}{n} + \dfrac{.5(.5)}{n}}$
Since both n’s are the same size, you have common denominators, so add!
$.02 = 1.645 \sqrt{\dfrac{.25 + .25}{n}}$
n = 3383
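Solving the margin-of-error equation for n can be sketched as follows. The code assumes the table value z* = 1.645 for 90% confidence and the conservative guess p = 0.5 for both states, as on the slide; the variable names are illustrative.

```python
from math import ceil

z_star = 1.645   # table critical value for 90% confidence, as on the slide
margin = 0.02    # desired margin of error (within 2%)
p_guess = 0.5    # conservative guess: maximizes p(1 - p)

# margin = z* * sqrt(2 * p(1 - p) / n)  =>  n = 2 * p(1 - p) * (z* / margin)^2
n_exact = 2 * p_guess * (1 - p_guess) * (z_star / margin) ** 2

# Always round up so the margin of error is no larger than requested
n = ceil(n_exact)
print(n)
```

Rounding up rather than to the nearest integer is what gives 3383 per sample.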
Do you think that the proportion of defective PEANUT M&Ms is higher than the proportion of defective PLAIN M&Ms?