Chapter 11a: Comparisons Involving Proportions and a Test of Independence

Transcript posted 22-Dec-2015.

• Inference about the Difference between the Proportions of Two Populations

• Hypothesis Test for Proportions of a Multinomial Population

$H_0\colon p_1 - p_2 = 0 \qquad H_a\colon p_1 - p_2 \neq 0$


Inferences About the Difference between the Proportions of Two Populations

• Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

• Interval Estimation of $p_1 - p_2$

• Hypothesis Tests about $p_1 - p_2$


Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

• Expected Value

$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$

• Standard Deviation

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

where: $n_1$ = size of sample taken from population 1
$n_2$ = size of sample taken from population 2
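The two formulas above can be checked by simulation. The Python sketch below assumes hypothetical population proportions $p_1 = .48$ and $p_2 = .40$ together with the MRA sample sizes $n_1 = 250$ and $n_2 = 150$; it evaluates the standard-deviation formula and compares it, and $p_1 - p_2$, with the spread and center of simulated values of $\bar{p}_1 - \bar{p}_2$. The function names, the 4,000 repetitions, and the fixed seed are illustrative choices, not part of the slides.

```python
import math
import random
import statistics


def sd_of_difference(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard deviation of pbar1 - pbar2 for independent samples of sizes n1 and n2."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def simulate_differences(p1: float, n1: int, p2: float, n2: int,
                         reps: int = 4000, seed: int = 1) -> list:
    """Draw `reps` pairs of independent samples and return the values of pbar1 - pbar2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Each sampled unit is a "success" with probability p; pbar is the share of successes.
        pbar1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        pbar2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        diffs.append(pbar1 - pbar2)
    return diffs


# Hypothetical population values; the MRA sample sizes are reused for illustration.
p1, n1, p2, n2 = 0.48, 250, 0.40, 150
diffs = simulate_differences(p1, n1, p2, n2)
print(f"mean: formula {p1 - p2:.4f}, simulated {statistics.mean(diffs):.4f}")
print(f"sd:   formula {sd_of_difference(p1, n1, p2, n2):.4f}, "
      f"simulated {statistics.stdev(diffs):.4f}")
```

The formula gives a standard deviation of about .0510 for these values; the simulated figures vary slightly with the seed but should sit close to .08 and .0510.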


Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.

The sample sizes are sufficiently large if all of these conditions are met:

$n_1 p_1 \ge 5 \qquad n_1(1 - p_1) \ge 5$

$n_2 p_2 \ge 5 \qquad n_2(1 - p_2) \ge 5$
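The four conditions can be checked mechanically. Because $p_1$ and $p_2$ are unknown in practice, this sketch assumes the sample proportions are substituted for them (a common working convention, not stated on the slide); the function name `large_sample_ok` is hypothetical.

```python
def large_sample_ok(n1: int, p1: float, n2: int, p2: float, minimum: float = 5) -> bool:
    """Check n1*p1, n1*(1 - p1), n2*p2 and n2*(1 - p2) against the minimum of 5."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(value >= minimum for value in counts)


# MRA samples, with the sample proportions standing in for the unknown p1 and p2
print(large_sample_ok(250, 120 / 250, 150, 60 / 150))  # True: 120, 130, 60, 90
# Hypothetical small sample: 20 * 0.10 = 2 falls below 5
print(large_sample_ok(20, 0.10, 150, 0.40))             # False
```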


Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

[Figure: normal curve for $\bar{p}_1 - \bar{p}_2$ centered at $p_1 - p_2$, with standard deviation $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$; horizontal axis $\bar{p}_1 - \bar{p}_2$.]


Interval Estimation of $p_1 - p_2$

• Interval Estimate

$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\, s_{\bar{p}_1 - \bar{p}_2}$$

• Point Estimator of $\sigma_{\bar{p}_1 - \bar{p}_2}$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
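A minimal Python sketch of the interval estimate, assuming the data are summarized as success counts ($x_1$ of $n_1$ and $x_2$ of $n_2$). It uses the standard library's `statistics.NormalDist().inv_cdf` to obtain $z_{\alpha/2}$; the function name `diff_proportions_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95):
    """Large-sample interval estimate of p1 - p2 from x1 successes in n1 and x2 in n2."""
    pbar1, pbar2 = x1 / n1, x2 / n2
    point = pbar1 - pbar2
    # s: point estimator of the standard deviation, built from the sample proportions
    s = math.sqrt(pbar1 * (1 - pbar1) / n1 + pbar2 * (1 - pbar2) / n2)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * s
    return point - margin, point + margin


low, high = diff_proportions_ci(120, 250, 60, 150)
print(f"{low:.3f} to {high:.3f}")  # -0.020 to 0.180
```

Applied to the MRA counts (120 of 250 after the campaign, 60 of 150 before), it reproduces the interval of roughly -.02 to +.18.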


Example: MRA

MRA (Market Research Associates) is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product.

The new campaign has been initiated with TV and newspaper advertisements running for three weeks.


Example: MRA

A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product.

Does the data support the position that the advertising campaign has provided an increased awareness of the client’s product?


Point Estimator of the Difference Between the Proportions of Two Populations

$p_1$ = proportion of the population of households “aware” of the product after the new campaign
$p_2$ = proportion of the population of households “aware” of the product before the new campaign
$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign
$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign

$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$$


Interval Estimate of $p_1 - p_2$: Large-Sample Case

For $\alpha = .05$, $z_{.025} = 1.96$:

$$.48 - .40 \pm 1.96\sqrt{\frac{.48(.52)}{250} + \frac{.40(.60)}{150}}$$

$$.08 \pm 1.96(.0510)$$

$$.08 \pm .10$$

Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.


Using Excel to Develop an Interval Estimate of $p_1 - p_2$

Formula Worksheet

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2) |
| 2 | No | Yes | Sample Size | 250 | 150 |
| 3 | Yes | No | No. of "Yes" | `=COUNTIF(A2:A251,"Yes")` | `=COUNTIF(B2:B151,"Yes")` |
| 4 | Yes | Yes | Samp. Propor. | `=D3/D2` | `=E3/E2` |
| 5 | No | Yes | | | |
| 6 | Yes | No | Confid. Coeff. | 0.95 | |
| 7 | No | No | Lev. of Signif. | `=1-D6` | |
| 8 | No | Yes | z Value | `=NORMSINV(1-D7/2)` | |
| 9 | Yes | No | | | |
| 10 | No | No | Std. Error | `=SQRT(D4*(1-D4)/D2+E4*(1-E4)/E2)` | |
| 11 | Yes | Yes | Marg. of Error | `=D8*D10` | |
| 12 | Yes | No | | | |
| 13 | Yes | Yes | Pt. Est. of Diff. | `=D4-E4` | |
| 14 | No | Yes | Lower Limit | `=D13-D11` | |
| 15 | Yes | Yes | Upper Limit | `=D13+D11` | |

Note: Rows 16-251 are not shown.


Using Excel to Develop an Interval Estimate of $p_1 - p_2$

Value Worksheet

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2) |
| 2 | No | Yes | Sample Size | 250 | 150 |
| 3 | Yes | No | No. of "Yes" | 120 | 60 |
| 4 | Yes | Yes | Samp. Propor. | 0.48 | 0.40 |
| 5 | No | Yes | | | |
| 6 | Yes | No | Confid. Coeff. | 0.95 | |
| 7 | No | No | Lev. of Signif. | 0.05 | |
| 8 | No | Yes | z Value | 1.960 | |
| 9 | Yes | No | | | |
| 10 | No | No | Std. Error | 0.0510 | |
| 11 | Yes | Yes | Marg. of Error | 0.0999 | |
| 12 | Yes | No | | | |
| 13 | Yes | Yes | Pt. Est. of Diff. | 0.080 | |
| 14 | No | Yes | Lower Limit | -0.020 | |
| 15 | Yes | Yes | Upper Limit | 0.180 | |

Note: Rows 16-251 are not shown.


Hypothesis Tests about $p_1 - p_2$

• Hypotheses

$H_0\colon p_1 - p_2 \le 0$
$H_a\colon p_1 - p_2 > 0$

• Test Statistic

$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}$$


Hypothesis Tests about $p_1 - p_2$

• Point Estimator of $\sigma_{\bar{p}_1 - \bar{p}_2}$ where $p_1 = p_2$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where:

$$\bar{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$
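The pooled calculation can be sketched as follows, assuming success counts as inputs and a hypothesized difference of 0 (the case in which $p_1 = p_2$ and the pooled estimator applies); the function name `pooled_z_statistic` is hypothetical.

```python
import math


def pooled_z_statistic(x1: int, n1: int, x2: int, n2: int):
    """Return (pbar, s, z) for testing p1 - p2 when the hypothesized difference is 0."""
    pbar1, pbar2 = x1 / n1, x2 / n2
    # Pooled estimator of the common proportion, valid because H0's boundary sets p1 = p2
    pbar = (n1 * pbar1 + n2 * pbar2) / (n1 + n2)
    # Point estimator of the standard deviation of pbar1 - pbar2 under p1 = p2
    s = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    hypothesized_diff = 0.0
    z = ((pbar1 - pbar2) - hypothesized_diff) / s
    return pbar, s, z


pbar, s, z = pooled_z_statistic(120, 250, 60, 150)
print(f"pbar = {pbar:.2f}, s = {s:.4f}, z = {z:.3f}")  # pbar = 0.45, s = 0.0514, z = 1.557
```

With the MRA counts it gives $\bar{p} = .45$, $s_{\bar{p}_1 - \bar{p}_2} \approx .0514$ and $z \approx 1.557$.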


Hypothesis Tests about $p_1 - p_2$

Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?


Hypothesis Tests about $p_1 - p_2$

$p_1$ = proportion of the population of households “aware” of the product after the new campaign
$p_2$ = proportion of the population of households “aware” of the product before the new campaign

• Hypotheses

$H_0\colon p_1 - p_2 \le 0$
$H_a\colon p_1 - p_2 > 0$


Hypothesis Tests about $p_1 - p_2$

• Rejection Rule

Reject $H_0$ if $z > 1.645$

• Test Statistic

$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$

$$z = \frac{(.48 - .40) - 0}{.0514} = \frac{.08}{.0514} = 1.56$$


Hypothesis Tests about $p_1 - p_2$

• Conclusion

$z = 1.56 < 1.645$. Do not reject $H_0$.

Using $\alpha = .05$, we cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
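The rejection rule and the equivalent p-value comparison for this upper-tail test can be sketched with the standard library's `statistics.NormalDist`. The sketch uses the unrounded test statistic 1.557 (shown as 1.56 above); the helper name `upper_tail_z_test` is hypothetical.

```python
from statistics import NormalDist


def upper_tail_z_test(z: float, alpha: float):
    """Return (critical value, p-value, reject?) for H0: p1 - p2 <= 0 vs Ha: p1 - p2 > 0."""
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha)  # z_alpha, the upper-tail cutoff
    p_value = 1 - standard_normal.cdf(z)           # area to the right of z
    return critical, p_value, z > critical


critical, p_value, reject = upper_tail_z_test(1.557, 0.05)
print(f"critical = {critical:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
# critical = 1.645, p-value = 0.060, reject H0: False
```

The p-value of about .060 exceeds $\alpha = .05$, which leads to the same decision as comparing $z$ with 1.645.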


Using Excel to Conduct a Hypothesis Test about $p_1 - p_2$

Formula Worksheet

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2) |
| 2 | No | Yes | Sample Size | 250 | 150 |
| 3 | Yes | No | No. of "Yes" | `=COUNTIF(A2:A251,"Yes")` | `=COUNTIF(B2:B151,"Yes")` |
| 4 | Yes | Yes | Samp. Propor. | `=D3/D2` | `=E3/E2` |
| 5 | No | Yes | | | |
| 6 | Yes | No | Lev. of Signif. | 0.05 | |
| 7 | No | No | Crit. Val. (upper) | `=NORMSINV(1-D6)` | |
| 8 | No | Yes | | | |
| 9 | Yes | No | Pt. Est. of Diff. | `=D4-E4` | |
| 10 | No | No | Hypoth. Value | 0 | |
| 11 | Yes | Yes | | | |
| 12 | Yes | No | Pool. Est. of p | `=(D2*D4+E2*E4)/(D2+E2)` | |
| 13 | Yes | Yes | Standard Error | `=SQRT(D12*(1-D12)*(1/D2+1/E2))` | |
| 14 | No | Yes | Test Statistic | `=(D9-D10)/D13` | |
| 15 | Yes | Yes | p-Value | `=1-NORMSDIST(D14)` | |
| 16 | Yes | No | Conclusion | `=IF(D15<D6,"Reject","Do Not Reject")` | |

Note: Rows 17-251 are not shown.


Using Excel to Conduct a Hypothesis Test about $p_1 - p_2$

Value Worksheet

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 1 | Sur2 | Sur1 | | Survey 2 (from Popul. 1) | Survey 1 (from Popul. 2) |
| 2 | No | Yes | Sample Size | 250 | 150 |
| 3 | Yes | No | No. of "Yes" | 120 | 60 |
| 4 | Yes | Yes | Samp. Propor. | 0.48 | 0.40 |
| 5 | No | Yes | | | |
| 6 | Yes | No | Lev. of Signif. | 0.05 | |
| 7 | No | No | Crit. Val. (upper) | 1.645 | |
| 8 | No | Yes | | | |
| 9 | Yes | No | Pt. Est. of Diff. | 0.08 | |
| 10 | No | No | Hypoth. Value | 0 | |
| 11 | Yes | Yes | | | |
| 12 | Yes | No | Pool. Est. of p | 0.450 | |
| 13 | Yes | Yes | Standard Error | 0.0514 | |
| 14 | No | Yes | Test Statistic | 1.557 | |
| 15 | Yes | Yes | p-Value | 0.060 | |
| 16 | Yes | No | Conclusion | Do Not Reject | |

Note: Rows 17-251 are not shown.


Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, $f_i$, for each of the $k$ categories.

3. Assuming $H_0$ is true, compute the expected frequency, $e_i$, in each category by multiplying the category probability by the sample size.

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject $H_0$ if $\chi^2 > \chi^2_\alpha$ (where $\alpha$ is the significance level and there are $k - 1$ degrees of freedom).
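Steps 3 and 4 can be expressed as a short function. This is a sketch that assumes the observed frequencies and hypothesized category probabilities are given as parallel lists; the function name `chi_square_gof` is hypothetical. It returns the statistic together with the $k - 1$ degrees of freedom used in step 5.

```python
def chi_square_gof(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom) for a multinomial goodness-of-fit test."""
    if len(observed) != len(probabilities):
        raise ValueError("need one hypothesized probability per category")
    n = sum(observed)
    # Step 3: expected frequency e_i = category probability x sample size
    expected = [p * n for p in probabilities]
    # Step 4: sum of (f_i - e_i)^2 / e_i over the k categories
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, len(observed) - 1


stat, df = chi_square_gof([30, 20, 35, 15], [0.25] * 4)
print(stat, df)  # 10.0 3
```

With the Finger Lakes Homes sales counts (30, 20, 35, 15) and equal probabilities of .25, it returns a statistic of 10 with 3 degrees of freedom.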


Example: Finger Lakes Homes (A)

• Multinomial Distribution Goodness of Fit Test

Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.


Example: Finger Lakes Homes (A)

• Multinomial Distribution Goodness of Fit Test

The number of homes sold of each model for 100 sales over the past two years is shown below.

| Model | Colonial | Log | Split-Level | A-Frame |
|---|---|---|---|---|
| # Sold | 30 | 20 | 35 | 15 |


Multinomial Distribution Goodness of Fit Test

• Hypotheses

$H_0\colon p_C = p_L = p_S = p_A = .25$

$H_a$: The population proportions are not $p_C = .25$, $p_L = .25$, $p_S = .25$, and $p_A = .25$

where:

$p_C$ = population proportion that purchase a colonial
$p_L$ = population proportion that purchase a log cabin
$p_S$ = population proportion that purchase a split-level
$p_A$ = population proportion that purchase an A-frame


Multinomial Distribution Goodness of Fit Test

• Rejection Rule

With $\alpha = .05$ and $k - 1 = 4 - 1 = 3$ degrees of freedom:

Reject $H_0$ if $\chi^2 > 7.815$.

[Figure: chi-square distribution with 3 degrees of freedom; "Do Not Reject $H_0$" region to the left of 7.815, "Reject $H_0$" region to the right.]
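The critical value 7.815 comes from a chi-square table. As a cross-check, this sketch computes it directly: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and then inverts it by bisection. The function names are hypothetical, and the numerical method is a swapped-in technique, not something described on the slides.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Lower-tail area P(chi-square_df <= x) via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum_n y^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, 10_000):
        term *= y / (a + n)
        total += term
        if term < total * 1e-15:   # remaining terms are negligible
            break
    # Multiply by y^a e^(-y) / Gamma(a), computed on the log scale for stability
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_critical(alpha: float, df: int) -> float:
    """Value with upper-tail area alpha, found by bisection on the CDF."""
    lo, hi = 0.0, max(1.0, float(df))
    while 1 - chi2_cdf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k = 4                      # Colonial, Log, Split-Level, A-Frame
critical = chi2_critical(0.05, k - 1)
print(f"{critical:.3f}")   # 7.815
```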


Multinomial Distribution Goodness of Fit Test

• Expected Frequencies

$e_1 = .25(100) = 25 \qquad e_2 = .25(100) = 25$

$e_3 = .25(100) = 25 \qquad e_4 = .25(100) = 25$

• Test Statistic

$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$


Multinomial Distribution Goodness of Fit Test

• Conclusion

$\chi^2 = 10 > 7.815$

We reject, at the .05 level of significance, the assumption that there is no home style preference.


Using Excel to Conduct a Goodness of Fit Test

Worksheet (showing data)

| | A | B |
|---|---|---|
| 1 | House | Style |
| 2 | 1 | Col |
| 3 | 2 | Log |
| 4 | 3 | Log |
| 5 | 4 | Afr |
| 6 | 5 | Col |
| 7 | 6 | Spl |
| 8 | 7 | Afr |
| 9 | 8 | Col |
| 10 | 9 | Afr |
| 11 | 10 | Log |
| 12 | 11 | Spl |

Note: Rows 13-101 are not shown.


Using Excel to Conduct a Goodness of Fit Test

Formula Worksheet

| | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|
| 1 | | Hyp. | Observed | Expect. | | Sq'd. | Sq. Diff./ |
| 2 | Categ. | Prop. | Frequency | Freq. | Diff. | Diff. | Exp. Freq. |
| 3 | Col. | 0.25 | `=COUNTIF(B2:B101,"Col")` | `=D3*$E$7` | `=E3-F3` | `=G3^2` | `=H3/F3` |
| 4 | Log | 0.25 | `=COUNTIF(B2:B101,"Log")` | `=D4*$E$7` | `=E4-F4` | `=G4^2` | `=H4/F4` |
| 5 | Split-L | 0.25 | `=COUNTIF(B2:B101,"Spl")` | `=D5*$E$7` | `=E5-F5` | `=G5^2` | `=H5/F5` |
| 6 | A-Fr. | 0.25 | `=COUNTIF(B2:B101,"Afr")` | `=D6*$E$7` | `=E6-F6` | `=G6^2` | `=H6/F6` |
| 7 | Total | | `=SUM(E3:E6)` | | | | `=SUM(I3:I6)` |
| 8 | | | | | | | |
| 9 | | Categories | 4 | | | | |
| 10 | | Test Statistic | `=I7` | | | | |
| 11 | | Degr. of Free. | `=E9-1` | | | | |
| 12 | | p-Value | `=CHIDIST(E10,E11)` | | | | |

Note: Columns A-B and rows 13-101 are not shown.


Using Excel to Conduct a Goodness of Fit Test

Value Worksheet

| | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|
| 1 | | Hyp. | Observed | Expect. | | Sq'd. | Sq. Diff./ |
| 2 | Categ. | Prop. | Frequency | Freq. | Diff. | Diff. | Exp. Freq. |
| 3 | Col. | 0.25 | 30 | 25 | 5 | 25 | 1 |
| 4 | Log | 0.25 | 20 | 25 | -5 | 25 | 1 |
| 5 | Split-L | 0.25 | 35 | 25 | 10 | 100 | 4 |
| 6 | A-Fr. | 0.25 | 15 | 25 | -10 | 100 | 4 |
| 7 | Total | | 100 | | | | 10 |
| 8 | | | | | | | |
| 9 | | Categories | 4 | | | | |
| 10 | | Test Statistic | 10 | | | | |
| 11 | | Degr. of Free. | 3 | | | | |
| 12 | | p-Value | 0.0186 | | | | |

Note: Columns A-B and rows 13-101 are not shown.
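The worksheet's p-value of 0.0186 can be cross-checked without Excel. For exactly 3 degrees of freedom the chi-square upper-tail area has the closed form $P(\chi^2_3 > x) = 2\,[1 - \Phi(\sqrt{x})] + \sqrt{2x/\pi}\, e^{-x/2}$, which this sketch evaluates with the standard library's `statistics.NormalDist`. The closed form holds only for 3 degrees of freedom, and the function name `chi2_sf_df3` is hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf_df3(x: float) -> float:
    """Upper-tail area P(chi-square_3 > x); this closed form is valid only for 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    root = math.sqrt(x)
    # 2 * (1 - Phi(sqrt(x))) is the 1-df tail; the second term extends it to 3 df
    return 2 * (1 - NormalDist().cdf(root)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(10)
print(f"{p_value:.4f}")  # 0.0186
```

Evaluating the same function at the critical value 7.815 returns approximately .05, consistent with the rejection rule.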