Slide 1
© 2009 Econ-2030-Applied Statistics-Dr. Tadesse.
Chapter 11: Comparisons Involving Proportions
and a Test of Independence
Inferences About the Difference Between Two Population Proportions
Test of Independence: Contingency Tables
Hypothesis Test for Proportions of a Multinomial Population
Slide 2
Inferences About the Difference Between Two Population Proportions
Interval Estimation of p1 - p2
Hypothesis Tests About p1 - p2
Slide 3
Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
Expected Value
$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$
Standard Deviation (Standard Error)
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
where: n1 = size of sample taken from population 1
n2 = size of sample taken from population 2
Slide 4
Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.
The sample sizes are sufficiently large if all of these conditions are met:
n1p1 ≥ 5 n1(1 - p1) ≥ 5
n2p2 ≥ 5 n2(1 - p2) ≥ 5
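Slide 5
Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
[Figure: normal sampling distribution of $\bar{p}_1 - \bar{p}_2$, centered at p1 - p2, with standard error $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$]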
Slide 6
Interval Estimation of p1 - p2
Interval Estimate
$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
Slide 7
Interval Estimation of p1 - p2
Example: Market Research Associates
Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product.
The new campaign has been initiated with TV and newspaper advertisements running for three weeks.
Slide 8
Interval Estimation of p1 - p2
A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product.
Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?
Slide 9
Point Estimator of the Difference Between Two Population Proportions
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign
$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign
$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign
$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$$
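The normal approximation used for the interval estimate rests on the large-sample conditions stated earlier. A quick check for this example is sketched here; because the population proportions are unknown, it substitutes the sample proportions .48 and .40 for p1 and p2, which is an assumption rather than something the slides state.

```latex
\begin{aligned}
n_1\bar{p}_1 &= 250(.48) = 120, & n_1(1 - \bar{p}_1) &= 250(.52) = 130,\\
n_2\bar{p}_2 &= 150(.40) = 60,  & n_2(1 - \bar{p}_2) &= 150(.60) = 90.
\end{aligned}
```

All four quantities are well above 5, so treating the sampling distribution as approximately normal looks reasonable for these samples.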
Slide 10
Interval Estimation of p1 - p2
For α = .05, z.025 = 1.96:
$$.48 - .40 \pm 1.96\sqrt{\frac{.48(.52)}{250} + \frac{.40(.60)}{150}}$$
.08 ± 1.96(.0510)
.08 ± .10
Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.
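The interval calculation for the Market Research Associates example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `two_proportion_interval` and its parameters are illustrative choices, not part of the course material, and z = 1.96 is hard-coded as the default for a 95% interval.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z=1.96):
    """Interval estimate of p1 - p2 using the unpooled standard error."""
    p1_bar = x1 / n1                      # sample proportion, population 1
    p2_bar = x2 / n2                      # sample proportion, population 2
    diff = p1_bar - p2_bar                # point estimate of p1 - p2
    se = math.sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    margin = z * se                       # margin of error
    return diff - margin, diff + margin

# After the campaign: 120 of 250 aware; before: 60 of 150 aware.
lower, upper = two_proportion_interval(120, 250, 60, 150)
print(f"{lower:.2f} to {upper:.2f}")
```

Because the interval includes 0, it does not by itself establish that awareness increased.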
Slide 11
Hypothesis Tests about p1 - p2
Hypotheses
Left-tailed: H0: p1 - p2 ≥ 0 Ha: p1 - p2 < 0
Right-tailed: H0: p1 - p2 ≤ 0 Ha: p1 - p2 > 0
Two-tailed: H0: p1 - p2 = 0 Ha: p1 - p2 ≠ 0
We focus on tests involving no difference between the two population proportions (i.e. p1 = p2)
Slide 12
Hypothesis Tests about p1 - p2
Pooled Estimate of Standard Error of $\bar{p}_1 - \bar{p}_2$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where:
$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$
Slide 13
Hypothesis Tests about p1 - p2
Test Statistic
$$z = \frac{(\bar{p}_1 - \bar{p}_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Slide 14
Hypothesis Tests about p1 - p2
Example: Market Research Associates
Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?
Slide 15
Hypothesis Tests about p1 - p2
1. Develop the hypotheses.
H0: p1 - p2 ≤ 0
Ha: p1 - p2 > 0
p1 = proportion of the population of households “aware” of the product after the new campaign
p2 = proportion of the population of households “aware” of the product before the new campaign
Slide 16
Hypothesis Tests about p1 - p2
2. Specify the level of significance. α = .05
3. Compute the value of the test statistic.
$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$
$$z = \frac{(.48 - .40) - 0}{.0514} = \frac{.08}{.0514} = 1.56$$
Slide 17
Hypothesis Tests about p1 - p2
Using the Critical Value Approach
4. Determine the critical value and rejection rule.
For α = .05, z.05 = 1.645
Reject H0 if z ≥ 1.645
5. Compare the test statistic with the critical value.
Because 1.56 < 1.645, we cannot reject H0.
We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
Slide 18
Hypothesis Tests about p1 - p2
Using the p-Value Approach
4. Compute the p-value.
For z = 1.56, the p-value = .0594
5. Compare the p-value with the significance level.
Because p-value > α = .05, we cannot reject H0.
We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
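The pooled test for the Market Research Associates example can be sketched in Python using only the standard library. The function name `two_proportion_z_test` is a hypothetical helper introduced for illustration; the upper-tail area is computed from `math.erfc`, which is one standard way to evaluate a normal tail probability and is not taken from the slides.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed z test of H0: p1 - p2 <= 0 with a pooled standard error."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    # Pooled estimate; identical to (n1*p1_bar + n2*p2_bar) / (n1 + n2).
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / se
    # Upper-tail area of the standard normal distribution at z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p_value = two_proportion_z_test(120, 250, 60, 150)
alpha = 0.05
print(f"z = {z:.2f}, reject H0: {p_value <= alpha}")
```

The code carries z at full precision instead of rounding it to 1.56 before finding the tail area, so its p-value can differ slightly from the table-based .0594; the decision not to reject H0 is the same.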
Slide 19
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, fi , for each of the k categories.
3. Assuming H0 is true, compute the expected frequency, ei , in each category by multiplying the category probability by the sample size.
Slide 20
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.
Slide 21
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
5. Rejection rule:
p-value approach: Reject H0 if p-value ≤ α
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_\alpha$
where α is the significance level and there are k - 1 degrees of freedom
Slide 22
Multinomial Distribution Goodness of Fit Test
Example:
Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.
Slide 23
Multinomial Distribution Goodness of Fit Test
The number of homes sold of each model for 100 sales over the past two years is shown below.
Model     Colonial   Log   Split-Level   A-Frame
# Sold    30         20    35            15
Slide 24
Multinomial Distribution Goodness of Fit Test
The Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not all equal to .25 (that is, not pC = .25, pL = .25, pS = .25, and pA = .25)
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame
Slide 25
Multinomial Distribution Goodness of Fit Test
Rejection Rule
With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom
Reject H0 if p-value ≤ .05 or χ² ≥ 7.815.
[Figure: chi-square distribution with the critical value 7.815 marked; "Do Not Reject H0" to the left of 7.815, "Reject H0" to the right]
Slide 26
Multinomial Distribution Goodness of Fit Test
Expected Frequencies
e1 = .25(100) = 25 e2 = .25(100) = 25
e3 = .25(100) = 25 e4 = .25(100) = 25
Test Statistic
$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25}$$
= 1 + 1 + 4 + 4 = 10
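The goodness of fit steps (expected frequencies, test statistic, degrees of freedom, comparison with the critical value) can be sketched as a short Python function. The name `goodness_of_fit` and its return layout are assumptions made for illustration; the critical value 7.815 is the table value for 3 degrees of freedom used in this example, not something the code derives.

```python
def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    n = sum(observed)
    # Expected frequency = hypothesized category probability * sample size.
    expected = [p * n for p in probabilities]
    if min(expected) < 5:
        raise ValueError("every expected frequency should be 5 or more")
    chi_square = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return chi_square, len(observed) - 1, expected

# Finger Lakes Homes: colonial, log, split-level, A-frame.
observed = [30, 20, 35, 15]
chi_square, df, expected = goodness_of_fit(observed, [0.25] * 4)

critical_value = 7.815            # chi-square table value, alpha = .05, df = 3
reject_h0 = chi_square >= critical_value
print(chi_square, df, reject_h0)
```

Raising an error when an expected frequency falls below 5 is one possible design choice; it mirrors the note that the chi-square approximation requires expected frequencies of 5 or more.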
Slide 27
Multinomial Distribution Goodness of Fit Test
Conclusion Using the Critical Value Approach
χ² = 10 > 7.815
We reject, at the .05 level of significance, the assumption that there is no home style preference.
Slide 28
Multinomial Distribution Goodness of Fit Test
Conclusion Using the p-Value Approach
Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838
Because χ² = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.
The p-value ≤ α. We can reject the null hypothesis.
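The table only brackets the p-value. For 3 degrees of freedom specifically, the upper-tail area of the chi-square distribution has a closed form, sketched below with the standard library. The helper name `chi_square_upper_tail_df3` is hypothetical and the formula applies to df = 3 only; for other degrees of freedom a statistics library (for example SciPy's `scipy.stats.chi2.sf`) would be the usual route.

```python
import math

def chi_square_upper_tail_df3(x):
    """Upper-tail area of a chi-square distribution with exactly 3 df.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which is valid for 3 degrees of freedom only.
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi_square_upper_tail_df3(10)
print(f"p-value is between .01 and .025: {0.01 < p_value < 0.025}")
```

As a sanity check on the formula, evaluating it at the table entry 7.815 gives an area very close to .05.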
Slide 29
Test of Independence: Contingency Tables
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed frequency, fij, for each cell of the contingency table.
3. Compute the expected frequency, eij, for each cell.
$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$
Slide 30
Test of Independence: Contingency Tables
4. Compute the test statistic.
$$\chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
5. Determine the rejection rule.
Reject H0 if p-value ≤ α or $\chi^2 \ge \chi^2_\alpha$,
where α is the significance level and, with n rows and m columns, there are (n - 1)(m - 1) degrees of freedom.
Slide 31
Contingency Table (Independence) Test
Example
Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.
Slide 32
Contingency Table (Independence) Test
The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $99,000 or less or more than $99,000.
Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000    18         6     19            12
> $99,000    12         14    16            3
Slide 33
Contingency Table (Independence) Test
Hypotheses
H0: Price of the home is independent of the style of the home that is purchased
Ha: Price of the home is not independent of the style of the home that is purchased
Slide 34
Contingency Table (Independence) Test
Expected Frequencies
Price     Colonial   Log   Split-Level   A-Frame   Total
≤ $99K    18         6     19            12        55
> $99K    12         14    16            3         45
Total     30         20    35            15        100
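The table of observed counts and totals is all that is needed to work out the expected frequencies with the (row total)(column total)/(sample size) rule. The worked values below are a sketch; the indexing, with row 1 as the $99,000-or-less row (total 55) and row 2 as the more-than-$99,000 row (total 45), is a labeling assumption made here.

```latex
\begin{aligned}
e_{11} &= \tfrac{(55)(30)}{100} = 16.5, & e_{12} &= \tfrac{(55)(20)}{100} = 11, &
e_{13} &= \tfrac{(55)(35)}{100} = 19.25, & e_{14} &= \tfrac{(55)(15)}{100} = 8.25,\\
e_{21} &= \tfrac{(45)(30)}{100} = 13.5, & e_{22} &= \tfrac{(45)(20)}{100} = 9, &
e_{23} &= \tfrac{(45)(35)}{100} = 15.75, & e_{24} &= \tfrac{(45)(15)}{100} = 6.75.
\end{aligned}
```

Every expected frequency is at least 5, so the chi-square approximation is appropriate for this table.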
Slide 35
Contingency Table (Independence) Test
Rejection Rule
With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.815$
Reject H0 if p-value ≤ .05 or χ² ≥ 7.815
Test Statistic
$$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75}$$
= .1364 + 2.2727 + . . . + 2.0833 = 9.149
Slide 36
Contingency Table (Independence) Test
Conclusion Using the Critical Value Approach
χ² = 9.149 > 7.815
We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.
Slide 37
Contingency Table (Independence) Test
Conclusion Using the p-Value Approach
Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838
Because χ² = 9.149 is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.
The p-value ≤ α. We can reject the null hypothesis.
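The whole independence test for the Finger Lakes Homes table can be sketched in Python with the standard library alone. The function name `independence_test` is an illustrative assumption, and the p-value line uses a closed form that holds only for 3 degrees of freedom, which is what this 2 x 4 table gives.

```python
import math

# Observed counts: colonial, log, split-level, A-frame.
observed = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]

def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            # Expected frequency under independence.
            e_ij = row_totals[i] * col_totals[j] / n
            chi_square += (f_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

chi_square, df = independence_test(observed)

# Upper-tail area; this closed form is valid for df = 3 only.
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {p_value <= 0.05}")
```

Summing all eight cell contributions at full precision gives a statistic of about 9.149, and the computed p-value falls between .025 and .05, in line with the table lookup.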