Copyright ©2011 Brooks/Cole, Cengage Learning
More about Inference
for Categorical Variables
Chapter 15
Principal Question:
Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?
15.1 Chi-Square Test for Two-Way Tables
• Data displayed in a contingency or two-way table.
• Each combination of row/column is a cell of table.
• Two types of conditional percents: row and column.
• Row percents: percents across a row, based on total number in the row.
• Column percents: percents down a column, based on total number in the column.
• If one variable is explanatory, use it to define rows and use row percents.
Recall: Five steps for assessing statistical significance.
Step 1: Null and alternative hypotheses
H0: The two variables are not related.
Ha: The two variables are related.
Sometimes “associated” is used instead of “related.”
Example 15.1 Ear Infections and Xylitol
Experiment: n = 533 children randomized to 3 groups.
Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge
Response = Did the child have an ear infection?
Only 16.2% of children in the Xylitol Gum group had an ear infection.
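The row percents described earlier can be sketched for this example as follows. The slides do not reproduce the full table, so the counts below are an assumption: they are consistent with n = 533, the 16.2% figure above, and the chi-square statistic of 6.69 quoted later in Example 15.4. The names `XYLITOL` and `row_percents` are illustrative only.

```python
# Assumed counts for Example 15.1 (not shown on the slides).
# Each row is a treatment group: (ear infection = yes, ear infection = no).
XYLITOL = {
    "Placebo gum": (49, 129),
    "Xylitol gum": (29, 150),
    "Xylitol lozenge": (39, 137),
}


def row_percents(table):
    """Convert each row of counts to percents of that row's own total."""
    result = {}
    for label, counts in table.items():
        row_total = sum(counts)
        result[label] = [100.0 * count / row_total for count in counts]
    return result


# Treatment is the explanatory variable, so it defines the rows.
for label, (yes_pct, no_pct) in row_percents(XYLITOL).items():
    print(f"{label:<16} yes {yes_pct:5.1f}%   no {no_pct:5.1f}%")
```

Because treatment is explanatory, the row percents compare infection rates across groups directly.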
Example 15.1 Infections and Xylitol
H0: p1 = p2 = p3 (no relationship between treatment and outcome)
Ha: p1, p2, p3 are not all the same (there is a relationship)
Let p1 = proportion who would get an ear infection in the population given placebo gum
p2 = proportion who would get an ear infection in the population given xylitol gum
p3 = proportion who would get an ear infection in the population given xylitol lozenges
Example 15.2 Making Friends
Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference?
H0: No difference in distribution of responses of men and women (no relationship between gender and response)
Ha: There is a difference in distribution of responses of men and women (there is a relationship between gender and response)
Tech Note: Homogeneity and Independence
There are two variations of the general hypothesis statements, depending on the method of sampling.
• If samples have been taken from separate populations, the null hypothesis statement is a statement of homogeneity (sameness) among the populations.
• If a sample has been taken from a single population, and two categorical variables measured for each individual, the statement of no relationship is a statement of independence between the two variables.
Step 2: Chi-square Statistic and Necessary Conditions
Compute expected count for each cell:
$\text{Expected count} = \dfrac{\text{Row total} \times \text{Column total}}{\text{Total } n}$
Compute test statistic by totaling over all cells:
$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (i.e., if the null hypothesis were true).
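A minimal sketch of the two computations above for a table stored as a list of rows. The function names are illustrative, and the counts are assumed for the xylitol example (they are consistent with n = 533 and the statistic of 6.69 reported in Example 15.4, but the slides do not show them).

```python
def expected_counts(table):
    """Expected count for each cell if H0 is true: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Total of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp_count) ** 2 / exp_count
        for obs_row, exp_row in zip(table, expected)
        for obs, exp_count in zip(obs_row, exp_row)
    )


# Rows: placebo gum, xylitol gum, xylitol lozenge; columns: infection yes, no.
# Assumed counts (not reproduced on the slides).
observed = [[49, 129], [29, 150], [39, 137]]

print([[round(e, 2) for e in row] for row in expected_counts(observed)])
print(round(chi_square_statistic(observed), 2))
```

Keeping the table as plain nested lists makes the row/column totals explicit; a statistics library would normally do the same arithmetic.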
More on the Chi-square Statistic
Large difference → evidence of a relationship.
Guidelines for large sample:
1. All expected counts should be greater than 1.
2. At least 80% of the cells should have an expected count greater than 5.
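The two guidelines can be checked mechanically on a table of expected counts. This is a sketch with a hypothetical function name; the example tables are made up for illustration and are not from the text.

```python
def meets_large_sample_guidelines(expected):
    """Check the guidelines on a table of expected counts:
    1. every expected count is greater than 1, and
    2. at least 80% of the cells have an expected count greater than 5.
    """
    cells = [e for row in expected for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_above_five = sum(1 for e in cells if e > 5) / len(cells)
    return all_above_one and share_above_five >= 0.80


# Hypothetical tables of expected counts (not from the text).
ok_table = [[2.0, 10.0], [6.0, 8.0], [7.0, 9.0]]         # 5 of 6 cells > 5, none <= 1
sparse_table = [[2.0, 4.0], [6.0, 8.0]]                 # only 2 of 4 cells > 5
tiny_cell = [[0.5, 20.0], [30.0, 40.0], [50.0, 60.0]]   # one cell not above 1

for name, table in [("ok", ok_table), ("sparse", sparse_table), ("tiny", tiny_cell)]:
    print(name, meets_large_sample_guidelines(table))
```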
Example 15.3 Infections and Xylitol
Expected count for “Placebo Gum, Yes Infection” cell:
Expected Counts:
Example 15.3 Infections and Xylitol
Chi-square Test Statistic:
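A worked version of this calculation. The cell counts are not reproduced in this section; the values used here (49 yes / 129 no for placebo gum, 29 / 150 for xylitol gum, 39 / 137 for xylitol lozenge) are an assumption, consistent with n = 533, the 16.2% rate in the Xylitol Gum group, and the statistic of 6.69 given in Example 15.4. The column totals are then 117 (yes) and 416 (no).

```latex
\begin{aligned}
E_{\text{Placebo, Yes}} &= \frac{178 \times 117}{533} \approx 39.07 \\
\chi^2 &= \frac{(49-39.07)^2}{39.07} + \frac{(129-138.93)^2}{138.93}
        + \frac{(29-39.29)^2}{39.29} + \frac{(150-139.71)^2}{139.71} \\
       &\quad + \frac{(39-38.63)^2}{38.63} + \frac{(137-137.37)^2}{137.37} \\
       &\approx 2.52 + 0.71 + 2.70 + 0.76 + 0.00 + 0.00 \approx 6.69
\end{aligned}
```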
Step 3: p-value of Chi-square Test
p-value = probability the chi-square test statistic could have been as large or larger if the null hypothesis were true.
Large test statistic → evidence of a relationship. So how large is enough to declare significance?
The chi-square probability distribution is used to find the p-value.
Degrees of freedom df = (Rows – 1)(Columns – 1) = (r – 1)(c – 1)
Chi-square Distributions
• Skewed to the right distributions.
• Minimum value is 0.
• Indexed by the degrees of freedom (df).
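Table A.5 is a lookup of the upper-tail area of these distributions. As a minimal standard-library sketch (an alternative to the table, not the book's method), the area can be computed directly for integer df using the known closed-form series; `degrees_of_freedom` and `chi2_sf` are illustrative names. In practice a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` would usually be used instead.

```python
import math


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r x c two-way table."""
    return (rows - 1) * (cols - 1)


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for X ~ chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) (1 + x/3 + x^2/(3*5) + ...),
    # with (df - 1) / 2 terms inside the parentheses.
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


# Example 15.4: a 3 x 2 table with chi-square statistic 6.69.
df = degrees_of_freedom(3, 2)
print(df, round(chi2_sf(6.69, df), 3))
```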
Example 15.4 Infections and Xylitol
Chi-square statistic was 6.69
df = (3 – 1)(2 – 1) = 2
p-value = 0.035
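For df = 2 the chi-square upper tail has the closed form P(χ² ≥ x) = e^(−x/2), so this p-value can be checked by hand:

```latex
\text{p-value} = P(\chi^2_2 \ge 6.69) = e^{-6.69/2} = e^{-3.345} \approx 0.035
```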
Finding the p-value from Table A.5:
Look in the corresponding “df” row of Table A.5. Scan across until you find where the statistic falls.
• If the value of the statistic falls between two table entries, the p-value is between the values of p (column headings) for these entries.
• If the value of the statistic is larger than the entry in the rightmost column (labeled p = 0.001), the p-value is less than 0.001 (p < 0.001).
• If the value of the statistic is smaller than the entry in the leftmost column (labeled p = 0.50), the p-value is greater than 0.50 (p > 0.50).
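The table lookup can be mimicked in code. Because the upper-tail area decreases as the statistic grows, locating the statistic between two table entries is the same as locating the exact p-value between the two column headings. This sketch assumes only the column headings mentioned in these slides (the printed Table A.5 may have more); `LEVELS`, `chi2_sf` and `bracket_p_value` are illustrative names.

```python
import math

# Column headings (p-values) mentioned in these slides, largest to smallest.
# Assumption: the printed Table A.5 may have additional columns.
LEVELS = [0.50, 0.10, 0.075, 0.05, 0.025, 0.01, 0.001]


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for chi-square with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def bracket_p_value(stat, df, levels=LEVELS):
    """Return (lower, upper) bounds on the p-value, as a table lookup would.

    lower is None when p < smallest level; upper is None when p > largest level.
    """
    p = chi2_sf(stat, df)
    if p > levels[0]:
        return (levels[0], None)
    if p < levels[-1]:
        return (None, levels[-1])
    for upper, lower in zip(levels, levels[1:]):
        if lower <= p <= upper:
            return (lower, upper)
    raise ValueError("levels must be sorted from largest to smallest")


for stat, df in [(6.69, 2), (8.12, 4), (8.515, 2)]:
    print(stat, df, bracket_p_value(stat, df))
```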
Example 15.5 Infections and Xylitol
There is a statistically significant relationship between the risk of an ear infection and the preventative treatment.
Chi-square statistic was 6.69
df = (3 – 1)(2 – 1) = 2
.025 < p-value < .05
Example 15.6 A Moderate p-Value
Table has three rows and three columns. The computed chi-square statistic is 8.12. Degrees of freedom are df = (3 – 1)(3 – 1) = 4.
Finding the p-value: Scan the df = 4 row in Table A.5 and the value of 8.12 is between the entries 7.78 (p = 0.10) and 8.50 (p = 0.075). Thus, the p-value is between 0.075 and 0.10.
0.075 < p-value < 0.10
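For even df the upper-tail area also has a closed form; for df = 4 it is P(χ² ≥ x) = e^(−x/2)(1 + x/2). This gives a quick check that 8.12 does land between the 0.10 and 0.075 entries:

```latex
P(\chi^2_4 \ge 8.12) = e^{-4.06}\,(1 + 4.06) \approx 0.01725 \times 5.06 \approx 0.087
```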
Steps 4 and 5: Making a Decision and Reporting a Conclusion
Two equivalent rules: Reject H0 when …
• p-value ≤ 0.05
• Chi-square statistic is greater than the entry in the 0.05 column of Table A.5 (the critical value).
Large test statistic → small p-value → evidence a real relationship exists in the population.
Note: For 2 × 2 tables (df = 1), a test statistic of 3.84 or larger is significant at the 0.05 level.
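A small sketch of the decision rule for the 2 × 2 case. For df = 1 the upper-tail area is erfc(√(x/2)), and the 3.84 cut-off is the rounded 0.05 critical value (1.96² ≈ 3.8416). The function names and the statistics 4.50 and 2.00 are hypothetical, chosen only to fall on either side of 3.84.

```python
import math


def p_value_df1(stat):
    """Upper-tail area for chi-square with df = 1: P(X >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def decide(p_value, alpha=0.05):
    """Step 4: reject H0 when the p-value is at most the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# The 2 x 2 cut-off of 3.84 corresponds to a p-value of about 0.05.
print(round(p_value_df1(3.84), 3))
print(decide(p_value_df1(4.50)))   # hypothetical statistic above 3.84
print(decide(p_value_df1(2.00)))   # hypothetical statistic below 3.84
```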
Reporting a Conclusion
Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).
Ways to write “do not reject H0”
• The relationship between smoking and drinking alcohol is not statistically significant.
• The proportions of smokers who never drink, drink occasionally, and drink often are not significantly different from the proportions of non-smokers who do so.
• There is insufficient evidence to conclude that there is a relationship in the population between smoking and drinking alcohol.
Reporting a Conclusion
Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).
Ways to write “reject H0”
• There is a statistically significant relationship between smoking and drinking alcohol.
• The proportions of smokers who never drink, drink occasionally, and drink often are not the same as the proportions of non-smokers who do so.
• Smokers have significantly different drinking behavior than non-smokers.
Example 15.8 Making Friends
Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference?
df = (2 – 1)(3 – 1) = 2.
Table A.5: the chi-square value of 8.515 falls between the entries in the 0.025 column (7.38) and the 0.01 column (9.21), so 0.01 < p-value < 0.025.
There is a statistically significant relationship at the 0.05 level.
There appears to be a difference between men and women in the distribution of responses that would be given if the populations were asked this question.
Supporting Analyses
To learn about the specific nature of the relationship:
• Description of row (or column) percents.
• Bar chart of counts or percents.
• Examination of each cell’s “contribution to chi-square.” Cells with the largest values have contributed most to the significance of the relationship and deserve attention in any description of the relationship.
• Confidence intervals for important proportions or for differences between proportions.
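A sketch of two of these supporting analyses for the xylitol example: ranking each cell's contribution to chi-square, and an approximate 95% confidence interval for the difference between two proportions (the usual large-sample interval with an unpooled standard error). The counts are assumed (consistent with n = 533, 16.2% and χ² = 6.69, but not shown on the slides), and the function names are illustrative.

```python
import math

# Assumed counts: rows = treatments, columns = infection yes / no.
ROWS = ["Placebo gum", "Xylitol gum", "Xylitol lozenge"]
COLS = ["Yes", "No"]
OBSERVED = [[49, 129], [29, 150], [39, 137]]


def contributions(table):
    """Each cell's (observed - expected)^2 / expected; these sum to the chi-square statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            cells.append((observed - expected) ** 2 / expected)
        result.append(cells)
    return result


def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Approximate 95% CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


contrib = contributions(OBSERVED)
ranked = sorted(
    ((contrib[i][j], ROWS[i], COLS[j]) for i in range(len(ROWS)) for j in range(len(COLS))),
    reverse=True,
)
for value, row, col in ranked:
    print(f"{row:<16} {col:<4} {value:.2f}")

# Placebo gum vs xylitol gum, proportion with an ear infection.
low, high = diff_proportion_ci(49, 178, 29, 179)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Under these assumed counts the "Yes" cells for xylitol gum and placebo gum contribute most, which points to the gum comparison as the main source of the relationship.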
Chi-Square Test or Z-Test for Difference in Two Proportions?
Does it make a difference?
• If desired Ha has no specific direction (two-sided), the two tests give exactly the same p-value. The squared value of the z-statistic equals the chi-square statistic.
• If desired Ha has a direction (one-sided), the z-test should be used.
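A numerical check of the two-sided equivalence, using hypothetical data (30 of 100 versus 45 of 100). The z-statistic uses the pooled proportion, as in the usual test of H0: p1 = p2; the function names are illustrative.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2 x 2 table [[x1, n1 - x1], [x2, n2 - x2]]."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical data: 30 of 100 in group 1 and 45 of 100 in group 2.
z = two_proportion_z(30, 100, 45, 100)
chi2 = chi_square_2x2(30, 100, 45, 100)
print(round(z ** 2, 4), round(chi2, 4))

# Two-sided z p-value, 2 * (1 - Phi(|z|)), equals the df = 1 chi-square p-value.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
p_chi_square = math.erfc(math.sqrt(chi2 / 2))
print(round(p_two_sided, 4), round(p_chi_square, 4))
```

A one-sided z-test would instead use only one tail, which the chi-square statistic (being squared) cannot distinguish.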
15.3 Testing Hypotheses about One Categorical Variable: GOF
Goodness of Fit (GOF) Test
Step 1: Determine the null and alternative hypotheses.
H0: The probabilities for k categories are p1, p2, . . . , pk.
Ha: Not all probabilities specified in H0 are correct.
Note: Probabilities in the null hypothesis must sum to 1.
Goodness of Fit (GOF) Test
Step 2: Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic.
If at least 80% of the expected counts are greater than 5 and none are less than 1, compute
$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
where the expected count for the ith category is computed as $n p_i$.
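A minimal sketch of Step 2: check that the null probabilities sum to 1, form the expected counts n·p_i, verify the data conditions, and compute the statistic. The function name and the die data are hypothetical, used only for illustration.

```python
def gof_statistic(observed, probs, tol=1e-9):
    """Goodness-of-fit chi-square: sum of (O - E)^2 / E with E_i = n * p_i.

    Raises ValueError if H0 is malformed or the data conditions are not met.
    """
    if len(observed) != len(probs):
        raise ValueError("need one probability per category")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]
    if any(e < 1 for e in expected):
        raise ValueError("an expected count is below 1")
    if sum(1 for e in expected if e > 5) / len(expected) < 0.80:
        raise ValueError("fewer than 80% of expected counts exceed 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, H0: each face has probability 1/6,
# so every expected count is 60 * 1/6 = 10.
rolls = [8, 12, 9, 11, 10, 10]
print(round(gof_statistic(rolls, [1 / 6] * 6), 2))
```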
Goodness of Fit (GOF) Test
Step 3: Assuming the null hypothesis is true, find the p-value. Use chi-square distribution with df = k – 1.
Step 4: Decide whether or not the result is statistically significant based on the p-value. The result is statistically significant if the p-value ≤ α (the significance level, usually 0.05).
Step 5: Report the conclusion in the context of the situation.
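Steps 3 and 4 can be sketched as follows, with df = k − 1. The upper-tail area is computed with the closed-form series for integer df (a stand-in for Table A.5); `chi2_sf`, `gof_test` and the example statistics are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for chi-square with integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def gof_test(statistic, k, alpha=0.05):
    """Steps 3 and 4: p-value from chi-square with df = k - 1, then the decision."""
    df = k - 1
    p_value = chi2_sf(statistic, df)
    return df, p_value, p_value <= alpha


# Hypothetical die example: statistic 1.0 with k = 6 categories.
df, p, significant = gof_test(1.0, 6)
print(df, round(p, 3), "significant" if significant else "not significant")
```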
Example 15.15 Pennsylvania Daily Number
State lottery game: Three-digit number made by drawing a digit between 0 and 9 from each of three different containers.
Focus = draws from the first container. If numbers are randomly selected, each value would be equally likely to occur.
H0: p = 1/10 for each of the 10 possible digits
Ha: Not H0
Example 15.15 Daily Number
Data: n = 500 days between 7/19/99 and 11/29/00
Example 15.15 Daily Number
Chi-square goodness of fit statistic:
From Table A.5: df = k – 1 = 10 – 1 = 9
p-value > 0.50
Result is not statistically significant; the null hypothesis is not rejected.
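The conditions and degrees of freedom for this example follow directly from n = 500 and p = 1/10. Writing χ²₉(0.50) for the df = 9 table entry in the p = 0.50 column (about 8.34), "p-value > 0.50" means the computed statistic was below that entry:

```latex
\begin{aligned}
E_i &= n\,p_i = 500 \times \tfrac{1}{10} = 50 \quad (i = 0, 1, \dots, 9), \text{ all greater than 5} \\
\text{df} &= k - 1 = 10 - 1 = 9 \\
\text{p-value} > 0.50 &\iff \chi^2 < \chi^2_{9}(0.50) \approx 8.34
\end{aligned}
```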