Chapter 9 Analysis of Two-Way Tables. Two-way (i.e. contingency) tables: to classify & analyze...

Chapter 9

Analysis of Two-Way Tables

Two-way (i.e. contingency) tables: to classify & analyze categorical data:

Binomial counts: ‘success’ vs. ‘failure’

Proportions: binomial count divided by total sample size

We’ll later see that inference via two-way tables is an alternative—with advantages & disadvantages—to the z-test for comparing two sample proportions:

. prtest hsci, by(white)

An advantage of two-way tables is that they can compare more than two groups (i.e. variables with more than two categories).

A disadvantage of two-way tables is that they can only do two-sided hypothesis tests.

Here’s a two-way table:

. tab hsci white, cell

            Nonwhite     White     Total
not hsci          50       103       153
               25.0%     51.5%     76.5%
hsci               5        42        47
                2.5%     21.0%     23.5%
Total             55       145       200
               27.5%     72.5%    100.0%

The row variable: hsci vs. not hsci. The column variable: white vs. nonwhite.

Cells: each combination of values for the two variables (50, 103, 5, 42).

Joint distributions: Each cell’s percentage of the total sample (50/200=.250; 103/200=.515; 5/200=.025; 42/200=.210).

The marginal frequencies: the row totals (153, 47) & the column totals (55, 145).

The marginal distributions: each row total/sample total (76.5%, 23.5%). Each column total/sample total (27.5%, 72.5%).
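
The definitions so far (joint distribution, marginal frequencies, marginal distributions) can be checked by hand. The following is a minimal sketch in Python rather than Stata, using only the cell counts from the hsci-by-white table; the variable names are illustrative assumptions, not part of the course materials.

```python
# Observed cell counts from the hsci-by-white table.
counts = {
    "not hsci": {"nonwhite": 50, "white": 103},
    "hsci": {"nonwhite": 5, "white": 42},
}
columns = ("nonwhite", "white")

# Total sample size.
n = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell count divided by the total sample size.
joint = {r: {c: counts[r][c] / n for c in columns} for r in counts}

# Marginal frequencies: the row totals and the column totals.
row_totals = {r: sum(counts[r].values()) for r in counts}
col_totals = {c: sum(counts[r][c] for r in counts) for c in columns}

# Marginal distributions: each marginal frequency divided by the total.
row_marginal = {r: t / n for r, t in row_totals.items()}
col_marginal = {c: t / n for c, t in col_totals.items()}

print("joint:", joint)
print("row marginal:", row_marginal)
print("column marginal:", col_marginal)
```

The printed proportions should agree with the percentages Stata reports under the cell option.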

Here are the same data displayed as column conditional probabilities:

. tab hsci white, col nofreq

            nonwhite     white     Total
no            90.91%    71.03%    76.50%
yes            9.09%    28.97%    23.50%
Total        100.00%   100.00%   100.00%

The conditional distributions (i.e. conditional probabilities): Column: divide each cell count by its column total.

Here are the same data displayed as row conditional probabilities:

. tab hsci white, row nofreq

            nonwhite     white     Total
not hsci      32.68%    67.32%   100.00%
hsci          10.64%    89.36%   100.00%
Total         27.50%    72.50%   100.00%

The conditional distributions (i.e. conditional probabilities): Row: divide each cell count by its row total.
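
As a check on the two kinds of conditional distribution, here is a short Python sketch (Python rather than Stata, with illustrative variable names) that divides each cell count by its column total and by its row total.

```python
# Observed cell counts from the hsci-by-white table.
counts = {
    "not hsci": {"nonwhite": 50, "white": 103},
    "hsci": {"nonwhite": 5, "white": 42},
}
columns = ("nonwhite", "white")

row_totals = {r: sum(counts[r].values()) for r in counts}
col_totals = {c: sum(counts[r][c] for r in counts) for c in columns}

# Column conditional distributions: cell count / column total.
col_cond = {r: {c: counts[r][c] / col_totals[c] for c in columns} for r in counts}

# Row conditional distributions: cell count / row total.
row_cond = {r: {c: counts[r][c] / row_totals[r] for c in columns} for r in counts}

for r in counts:
    col_pct = [f"{100 * col_cond[r][c]:.2f}%" for c in columns]
    row_pct = [f"{100 * row_cond[r][c]:.2f}%" for c in columns]
    print(f"{r:<9} column %: {col_pct}  row %: {row_pct}")
```

Each column of col_cond sums to 1, and each row of row_cond sums to 1, which is a quick way to tell the two apart.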

Tip: It’s usually best to compute conditional distributions (i.e. probabilities) across the categories of the explanatory variable.

E.g., tab hsci white, col: computes the conditional distributions across the categories of the explanatory variable race-ethnicity (i.e. white vs. nonwhite).

Alternatively, you may want to compare joint distributions (i.e. cell counts/total sample): tab hsci white, cell

We’ve discussed the following:

row variables

column variables

cells: each combination of values for the two variables.

joint distributions: each cell’s percentage of the total sample.

marginal frequencies

marginal distributions: each marginal frequency/total sample size

column conditional distributions: divide each column cell count by its column total count

row conditional distributions: divide each row cell count by its row total count.

And we’ve said that typically it’s best to compute the conditional distributions (i.e. probabilities) across the categories of the explanatory variable.

Or that it may be preferable to compare joint distributions (i.e. compare the cell probabilities).

Let’s next consider conceptual problems of two-way tables.

Simpson’s Paradox

An NSF study found that the median salary of newly graduated female engineers & scientists was just 73% of the median salary for males. Here are women’s median salaries in the 16 fields as a percentage of male salaries:

94% 96% 98% 95% 85% 85% 84% 100% 103% 100% 107% 93% 104% 93% 106% 100%

How can it be that, on average, the women earn just 73% of the median salary for males, since no listed % falls below 84%?

Because women are disproportionately located in the lower-paying fields of engineering & science.

That is, ‘field of science & engineering’ is a lurking variable (i.e. an unmeasured confounding variable) that influences the observed association between gender & salary.

Simpson’s Paradox: the reversal of a bivariate relationship due to the influence of a lurking variable.

Aggregating data has the effect of ignoring one or more lurking variables.

Another example: comparing hospital mortality rates.

Yet another: comparing airline on-time rates.

Conclusion from Simpson’s Paradox

Always be on the lookout for lurking variables with aggregated data!

A bivariate relationship may change direction when a third, control variable is introduced.
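
To make the reversal concrete, here is a small Python sketch for the hospital-mortality comparison. The counts are hypothetical numbers chosen only to illustrate the effect (they are not data from the slides or from any study), and the lurking variable assumed here is the patient's condition on admission.

```python
# HYPOTHETICAL (deaths, patients) by hospital and patient condition.
data = {
    "A": {"good": (6, 600), "poor": (57, 1500)},
    "B": {"good": (8, 600), "poor": (8, 200)},
}
conditions = ("good", "poor")

# Mortality rate within each condition (i.e. controlling for condition).
within = {
    cond: {h: data[h][cond][0] / data[h][cond][1] for h in data}
    for cond in conditions
}

# Mortality rate after aggregating over condition (condition ignored).
overall = {
    h: sum(data[h][c][0] for c in conditions) / sum(data[h][c][1] for c in conditions)
    for h in data
}

for cond in conditions:
    print(cond, {h: f"{100 * r:.1f}%" for h, r in within[cond].items()})
print("overall", {h: f"{100 * r:.1f}%" for h, r in overall.items()})
```

In these made-up numbers hospital A has the lower death rate within each condition but the higher death rate overall, because A treats far more patients in poor condition.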

What’s a control variable?

Holding a variable constant makes it a control variable: doing so removes the part of the bivariate relationship that was caused by the control variable.

That is, controlling for a variable neutralizes its influence on the observed relationship.

E.g., controlling for field of science & engineering.

E.g., controlling for race/ethnicity.

To repeat, holding a variable constant removes its statistical effects from the bivariate association being examined.

Doing so ensures (more or less) that a bivariate relationship is assessed apart from the influence of the controlled variable: e.g., the relationship between a Montessori school program & student IQ scores, holding constant social class.

What’s better: statistical control or experimental control?

The answer returns us to the matter of observational study versus experimental study (see Moore/McCabe, chapter 3).

Good experimental design controls for all possible lurking variables. Why?

But statistical control cannot do so. Why not?

Moreover, statistical control is weakened by the imprecision of measurement of variables.

But we can’t experiment on everything.

Let’s consider the following variant on Simpson’s Paradox:

A bivariate association may not appear until a third, control variable is introduced.

The apparent absence of the bivariate relationship is called spurious non-association.

E.g., no association between years of education & level of income in post-WW II data, until controlling for age of respondents.

Conclusion from Spurious Non-Association

Explore not just bivariate relationships but also multivariate relationships among all the variables of potential practical or theoretical relevance.

Here’s how to add a control variable to a two-way table in Stata:

bys female: tab hsci white, cell chi2

female = male

hsci        nonwhite     white     Total
0                 21        41        62
              23.08%    45.05%    68.13%
1                  2        27        29
               2.20%    29.67%    31.87%
Total             23        68        91
              25.27%    74.73%   100.00%

Pearson chi2(1) = 7.6120   Pr = 0.006

female = female

hsci        nonwhite     white     Total
0                 29        62        91
              26.61%    56.88%    83.49%
1                  3        15        18
               2.75%    13.76%    16.51%
Total             32        77       109
              29.36%    70.64%   100.00%

Pearson chi2(1) = 1.6744   Pr = 0.196

This example introduces a test of statistical significance for two-way tables.

The test is based on the Chi-square statistic.

Note in the example that the two-way table for females is not statistically significant (Pr = 0.196), whereas the table for males is (Pr = 0.006).

How do two-way tables & their test of significance evaluate the data?

They do so by comparing expected & observed cell counts in terms of proportional distributions.

Back to the two-way table without the control variable:

. tab hsci white, cell

            Nonwhite     White     Total
not hsci          50       103       153
               25.0%     51.5%     76.5%
hsci               5        42        47
                2.5%     21.0%     23.5%
Total             55       145       200
               27.5%     72.5%    100.0%

Describing Relations in Two-Way Tables

The original data must be counts.

Inference for two-way tables: compare the observed cell counts to the expected cell counts; then compute the Chi-square significance test.

We begin by computing the expected cell counts: row total times column total, divided by total sample size.

Premise: the null hypothesis of ‘statistical independence’ (i.e. no association between the variables) characterizes the data.

Expected cell counts: row total times column total, divided by total sample size.

            nonwhite     white     Total
no                50       103       153
yes                5        42        47
Total             55       145       200

. di (153*55)/200
42.075
. di (153*145)/200
110.925
. di (47*55)/200
12.925
. di (47*145)/200
34.075

How do the expected cell counts compare to the observed cell counts: Do the conditional probabilities appear to be equal for nonwhites & whites across no-hsci & yes hsci?
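
The same expected counts can be computed for every cell at once. This is a minimal Python sketch (Python rather than Stata; the variable names are illustrative) of the rule "row total times column total, divided by total sample size".

```python
# Observed counts: rows are no / yes (hsci), columns are nonwhite / white.
observed = [[50, 103], [5, 42]]

row_totals = [sum(row) for row in observed]        # 153, 47
col_totals = [sum(col) for col in zip(*observed)]  # 55, 145
n = sum(row_totals)                                # 200

# Expected count under independence: row total * column total / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

for obs_row, exp_row in zip(observed, expected):
    print("observed:", obs_row, " expected:", [round(e, 3) for e in exp_row])
```

Comparing the two lists cell by cell shows fewer nonwhite students in honors science (5) than independence would predict (about 12.9).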

Expected count for each cell: its row total times its column total, divided by the total sample size.

Each expected cell count is based on the proportion of the total sample accounted for by its entire row & by its entire column.

The expected counts are what we would observe under independence (i.e. no association): that is, if the conditional distribution of honors science were the same for nonwhites & whites.

That is, each expected cell count reflects the null hypothesis of statistical independence (i.e. no association):

that the proportion of non-white honors science students is simply the proportion of non-white students in the population.

that the proportion of white honors science students is simply the proportion of white students in the population.

What’s the alternative hypothesis?

Chi-Square Test Assumptions

Random sample

Two categorical variables

Count data

Expected cell counts of at least 5 in at least 80% of the cells & of at least 1 in every cell (best if all expected counts are at least 5)

If the assumptions are fulfilled, use the Chi-square test:

tab hsci white, chi2

If the numbers of observations per cell don’t meet the assumptions, use ‘Fisher’s exact test’ (a non-parametric test, which may be very slow):

tab hsci white, exact
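
For intuition about what an exact test does, here is a rough Python sketch of a two-sided Fisher exact P-value for a 2x2 table. It holds the row & column totals fixed and sums the hypergeometric probabilities of all tables no more likely than the observed one. This is one common definition of the two-sided P-value; whether it matches Stata's output digit for digit is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


p_value = fisher_exact_two_sided(50, 103, 5, 42)
print(f"Fisher exact two-sided P = {p_value:.4f}")
```

Because every possible table is enumerated, the work grows quickly with sample size & table dimensions, which is why exact tests can be slow.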

Chi-square statistic: measures how much the observed cell counts in a two-way table diverge from the expected cell counts.

It’s therefore a test of independence:

Ho: the variables are independent from each other

Ha: they are not independent from each other

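Step 1: Chi-square = the sum, over all cells, of (observed cell count – expected cell count) squared, divided by the cell’s expected count

Step 2: df = (# rows – 1) x (# columns – 1), where the rows & columns are the categories of the row & column variables

Step 3: Chi-square significance test = the P-value of the Chi-square statistic under the Chi-square distribution with df degrees of freedom (the statistic is not divided by df)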
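
The three steps can be sketched for the hsci-by-white table as follows. This is Python rather than Stata, with illustrative variable names. The P-value shortcut in Step 3 uses the fact that a Chi-square variable with 1 degree of freedom is a squared standard normal, so it applies only when df = 1; that restriction is an assumption of this sketch, not of the test itself.

```python
from math import erfc, sqrt

# Observed counts: rows are no / yes (hsci), columns are nonwhite / white.
observed = [[50, 103], [5, 42]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Step 1: sum over all cells of (observed - expected)^2 / expected.
chi2 = 0.0
for row, rt in zip(observed, row_totals):
    for obs, ct in zip(row, col_totals):
        expected = rt * ct / n
        chi2 += (obs - expected) ** 2 / expected

# Step 2: degrees of freedom = (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Step 3: upper-tail P-value. With df = 1, P(X > chi2) = erfc(sqrt(chi2 / 2)).
if df != 1:
    raise ValueError("this shortcut for the P-value is valid only for df = 1")
p_value = erfc(sqrt(chi2 / 2))

print(f"Pearson chi2({df}) = {chi2:.4f}   Pr = {p_value:.3f}")
```

For this table the result should agree with Stata's `tab hsci white, chi2` output (chi2(1) = 8.7613, Pr = 0.003).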

Chi-square statistic: non-negative values only (it is a sum of squared terms)

Has a distinct distribution for each degree of freedom (see Moore/McCabe)

Two-sided hypothesis test only

Chi-square Test: To Repeat…

Chi-square statistic: measures how much the observed cell counts in a two-way table diverge from the expected cell counts.

That is, it compares the sample distribution with a hypothesized distribution.

It’s a test of statistical independence (Ho: no association; Ha: association).

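Step 1: Chi-square = the sum, over all cells, of (observed cell count – expected cell count) squared, divided by the cell’s expected count

Step 2: df = (# rows – 1) x (# columns – 1), where the rows & columns are the categories of the row & column variables

Step 3: Chi-square significance test = the P-value of the Chi-square statistic under the Chi-square distribution with df degrees of freedom (the statistic is not divided by df)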

Hypothesis Test

Ho: proportion in hsci among whites = proportion in hsci among nonwhites

Ha: proportion in hsci among whites ~= proportion in hsci among nonwhites (i.e. two-sided alternative)

Chi-square test: two-sided alternative hypothesis only.

. tab hsci white, cell chi2

            nonwhite     white     Total
no                50       103       153
               25.0%     51.5%     76.5%
yes                5        42        47
                2.5%     21.0%     23.5%
Total             55       145       200
               27.5%     72.5%    100.0%

Pearson chi2(1) = 8.7613   Pr = 0.003

Conclusion: Reject the null hypothesis.

Let’s repeat the earlier example to see what happens when we add a control variable to the two-way table:

bys female: tab hsci white, cell chi2

female = male

hsci        nonwhite     white     Total
0                 21        41        62
              23.08%    45.05%    68.13%
1                  2        27        29
               2.20%    29.67%    31.87%
Total             23        68        91
              25.27%    74.73%   100.00%

Pearson chi2(1) = 7.6120   Pr = 0.006

female = female

hsci        nonwhite     white     Total
0                 29        62        91
              26.61%    56.88%    83.49%
1                  3        15        18
               2.75%    13.76%    16.51%
Total             32        77       109
              29.36%    70.64%   100.00%

Pearson chi2(1) = 1.6744   Pr = 0.196

But when adding a control variable, beware of the consequences for sub-sample sizes.

If the sub-samples are too small, it may be hard to obtain statistical significance.

So always check the size of sub-samples.

Remember: at least 80% of the cells should have expected counts >= 5, & every cell should have an expected count of at least 1.
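
Checking sub-sample sizes can be automated. The Python sketch below applies the 80% / minimum-of-1 rule to the male & female sub-tables, on the assumption that the rule refers to expected counts (the usual statement of the rule). The function names are hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def meets_count_rule(table):
    """True if >= 80% of expected counts are >= 5 and all are >= 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_large = sum(e >= 5 for e in cells) / len(cells)
    return share_large >= 0.8 and min(cells) >= 1


# Observed counts from the by-gender tables (rows: hsci 0 / 1; columns: nonwhite / white).
subtables = {
    "male": [[21, 41], [2, 27]],
    "female": [[29, 62], [3, 15]],
}

for label, table in subtables.items():
    smallest = min(e for row in expected_counts(table) for e in row)
    print(f"{label}: smallest expected count = {smallest:.2f}, "
          f"rule met = {meets_count_rule(table)}")
```

Note that an observed count can be small (e.g. 2 nonwhite males in hsci) while the corresponding expected count is still above 5.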

We can compare two population proportions either by the Chi-square test or by the two-sample z-test. For a 2x2 table the two give exactly the same two-sided result, because the Chi-square statistic equals the square of the z statistic.

Chi-square test advantage: can compare more than two populations (e.g., SES by race-ethnicity in hsb2.dta); but the original data must be counts.

z-test advantage: can test either one-sided or two-sided alternatives; the original data may be counts or proportions.
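
The link between the two tests can be verified numerically. This Python sketch (Python rather than Stata; variable names are illustrative) computes the pooled two-sample z statistic from the counts 5 of 55 nonwhite & 42 of 145 white students in hsci, then squares it.

```python
from math import erfc, sqrt

x1, n1 = 5, 55     # nonwhite: number in hsci, group size
x2, n2 = 42, 145   # white: number in hsci, group size

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference under Ho (pooled proportion).
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pooled

# Two-sided P-value, and the one-sided P-value for Ha: diff < 0.
p_two_sided = erfc(abs(z) / sqrt(2))
p_lower = 0.5 * erfc(-z / sqrt(2))

print(f"z = {z:.5f}, z squared = {z ** 2:.4f}")
print(f"two-sided P = {p_two_sided:.4f}, P(Z < z) = {p_lower:.4f}")
```

The squared z statistic should match the Pearson chi2(1) value for the same table, & only the z-test offers the one-sided P-value.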

. prtest hsci, by(white)

Two-sample test of proportion           nonwhite: Number of obs =       55
                                           white: Number of obs =      145
------------------------------------------------------------------------------
Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
nonwhite |   .0909091   .0387638    2.34521   0.0190     .0149335    .1668847
   white |   .2896552   .0376696    7.68936   0.0000     .2158241    .3634863
---------+--------------------------------------------------------------------
    diff |  -.1987461   .0540521                        -.3046863   -.0928059
         |  under Ho:   .0671451   -2.95995   0.0031
------------------------------------------------------------------------------

Ho: proportion(nonwhite) - proportion(white) = diff = 0

Ha: diff < 0            Ha: diff ~= 0             Ha: diff > 0
z = -2.960              z = -2.960                z = -2.960
P < z = 0.0015          P > |z| = 0.0031          P > z = 0.9985

. tab hsci white, cell chi2

            nonwhite     white     Total
no                50       103       153
               25.0%     51.5%     76.5%
yes                5        42        47
                2.5%     21.0%     23.5%
Total             55       145       200
               27.5%     72.5%    100.0%

Pearson chi2(1) = 8.7613   Pr = 0.003

Pr = .003 for both the Chi-square test & the two-sided z-test. Note that (-2.95995) squared = 8.7613.

Other Useful Stata Commands

findit tabchi: displays observed & expected frequencies, various types of residuals (raw, pearson, adjusted), & various tests of significance.

findit tabout: to make publication-style contingency & other tables

See the following class documents: ‘Making contingency tables in Stata’; ‘Making working & publication-style tables in Stata’.

For greater depth concerning contingency tables & their various significance tests, see Agresti & Finlay, Statistical Methods for the Social Sciences, chap. 8.

Summary: Two-Way Tables

Two-way tables: categorical data—binomial counts (‘success’ vs. ‘failure’) or proportions (binomial counts divided by the total sample size); but the data must be counts

Row variables? Column variables? Cells?

Marginal frequencies? Marginal distributions?

Joint distributions?

Row & column conditional distributions?

How to compute expected cell frequencies? What do they represent?

Null hypothesis? Alternative hypothesis?

How to compute the Chi-square test?

How to compute its degrees of freedom?

Chi-square assumptions?

Advantages/disadvantages of inference via two-way tables versus inference via z-test for two sample proportions?

Chi-square test of significance: equals the square of the z-test for comparing sample proportions, but the Chi-square test requires the original data to be counts.

Simpson’s Paradox: aggregating the data ignores lurking variables.

Moral of the story: beware of the relations portrayed in aggregated data (i.e. look out for lurking variables)!

Spurious non-association: a bivariate association appears only when a third, control variable is introduced.

Moral of this story: the same as for Simpson’s Paradox.

Finally, when should we use contingency tables & the Chi-square test?

As part of bivariate exploratory data analysis, including in preparation for regression analysis

Or when we don’t have enough observations to do regression analysis (i.e. perhaps categorize the data and do cross-tabs).

Here’s how to do the tables for Moore/McCabe, problem 9.1:

. tabulate educ age [freq=years], chi2

                a1        a2        a3     Total
e1           5,325     9,152    16,035    30,512
e2          14,061    24,070    18,320    56,451
e3          11,659    19,926     9,662    41,247
e4          10,342    19,878     8,005    38,225
Total       41,387    73,026    52,022   166,435

Pearson chi2(6) = 9.6e+03   Pr = 0.000
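
The same calculation extends to tables larger than 2x2. As a rough check on the output above, this Python sketch (illustrative names; Python rather than Stata) recomputes the Chi-square statistic & its degrees of freedom from the 4x3 table of counts.

```python
# Observed counts: rows e1-e4 (educ), columns a1-a3 (age).
observed = [
    [5325, 9152, 16035],
    [14061, 24070, 18320],
    [11659, 19926, 9662],
    [10342, 19878, 8005],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum over all 12 cells of (observed - expected)^2 / expected.
chi2 = 0.0
for row, rt in zip(observed, row_totals):
    for obs, ct in zip(row, col_totals):
        expected = rt * ct / n
        chi2 += (obs - expected) ** 2 / expected

# (4 rows - 1) * (3 columns - 1) = 6 degrees of freedom.
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"Pearson chi2({df}) = {chi2:.1f}")
```

The statistic should come out close to the 9.6e+03 that Stata reports. With a sample this large, even modest departures from independence produce a very large Chi-square.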

. tabulate educ age [freq=years], cell chi2

                a1        a2        a3     Total
e1            3.20      5.50      9.63     18.33
e2            8.45     14.46     11.01     33.92
e3            7.01     11.97      5.81     24.78
e4            6.21     11.94      4.81     22.97
Total        24.87     43.88     31.26    100.00

Pearson chi2(6) = 9.6e+03   Pr = 0.000

The ‘row’ or ‘col’ options may be preferable to use.