Sociology 601 Class12: October 8, 2009 The Chi-Squared Test (8.2) – expected frequencies –...


8.2 Chi-squared statistical significance test for contingency tables.

                           Support tax reform?
                           Yes      No     Total
Support        Yes         150     100      250
environment?   No          200      50      250
               Total       350     150      500

• "Is the level of support for the environment independent of the level of support for tax reform?"
  – If not, the two measures may have some causal link worth investigating.
  – Q: which causes which?

2x2 table: a z-test for proportions

• With a 2x2 table, we can use a z-test for independent-sample proportions (review 7.2).

. prtesti 250 .6 250 .8

Two-sample test of proportion                  x: Number of obs =       250
                                               y: Number of obs =       250
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |         .6   .0309839                      .5392727    .6607273
           y |         .8   .0252982                      .7504164    .8495836
-------------+----------------------------------------------------------------
        diff |        -.2        .04                     -.2783986   -.1216014
             |  under Ho:   .0409878    -4.88   0.000
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =  -4.8795
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0000         Pr(|Z| < |z|) = 0.0000          Pr(Z > z) = 1.0000
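As a cross-check on the prtesti log, here is a minimal Python sketch of the same two-sample z-test, assuming the usual pooled standard error under Ho (the "under Ho" row) and the unpooled standard error for the confidence interval. The function name `two_proportion_ztest` is hypothetical, not a library API.

```python
import math

def two_proportion_ztest(n1, p1, n2, p2):
    """Two-sample z-test of Ho: p1 == p2 for independent samples."""
    diff = p1 - p2
    # Unpooled standard error (Stata's "Std. Err." for diff, used for the CI)
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Pooled proportion and standard error under Ho (Stata's "under Ho" row)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_h0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_h0
    # Two-sided p-value, P(|Z| > |z|), from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, se_h0, z, p_two_sided

# x: tax-reform support among environmentalists (150/250 = .6)
# y: tax-reform support among non-environmentalists (200/250 = .8)
diff, se_diff, se_h0, z, p = two_proportion_ztest(250, 0.6, 250, 0.8)
print(f"diff = {diff:.1f}, SE = {se_diff:.2f}, SE under Ho = {se_h0:.7f}")
print(f"z = {z:.4f}, two-sided p = {p:.1e}")
```

The printed values should agree with the log: diff = -.2, Std. Err. = .04, .0409878 under Ho, and z = -4.8795.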

Moving beyond 2x2 tables:

• Comparing conditional probabilities works when there are only two groups to compare and only two possible outcomes in each group.

• The chi-squared (χ²) test is a more flexible technique for making such comparisons.

• The χ² test starts from the null hypothesis that every cell has the frequency you would expect if the two variables were independent.

• fe is the expected count for each cell.

– fe = row total * column total / table total (A&F 254)

– fe = total N * unconditional row probability * unconditional column probability

– fe = column N * unconditional row probability

• A test for the whole table combines the observed-versus-expected comparisons for every cell.
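A minimal Python sketch of the first formula, applied to the tax-reform/environment table; `expected_counts` is a hypothetical helper name. The last lines check that the other two formulas give the same value for cell (1,1).

```python
def expected_counts(observed):
    """Expected count for each cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: support environment? (Yes, No); columns: support tax reform? (Yes, No)
observed = [[150, 100],
            [200, 50]]
fe = expected_counts(observed)
print(fe)  # [[175.0, 75.0], [175.0, 75.0]]

# The other two formulas, for cell (1,1)
n, row1_total, col1_total = 500, 250, 350
via_probabilities = n * (row1_total / n) * (col1_total / n)  # N * P(row) * P(column)
via_column_n = col1_total * (row1_total / n)                  # column N * P(row)
print(round(via_probabilities, 6), round(via_column_n, 6))   # 175.0 175.0
```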


Calculating expected cell counts:

• The expected cell count is the count we would expect in a cell if
  – environmental support among tax reform advocates and among tax reform opponents were identical, or if
  – environmental support among tax reform advocates were the same as environmental support among the whole sample, or if
  – tax reform support among environmentalists were the same as among non-environmentalists.

• 50% of the sample supports environmental spending, so
  – fe(1,1) = .5 * 350 = 175
  – fe(1,1) = 250 * 350 / 500 = 175 (A&F)
  – fe(1,1) = 500 * P(tax reform = Yes) * P(environment = Yes) = 500 * (350/500) * (250/500) = 175

• fe(1,2) = 75
• fe(2,1) = 175
• fe(2,2) = 75

Testing independence of support for tax reform and environmental spending:

• New approach: a chi-squared test for the independence of attitudes toward taxes and the environment.

• Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

  – where fo is the observed count in each cell

  – and where fe is the expected count for each cell, assuming that attitudes toward taxes are the same for people who support environmental issues as for people who do not.

Assumptions and hypothesis for a chi-squared test:

• Assumptions:
  – two categorical variables (for this course)
  – random sample or stratified random sample
  – fe ≥ 5 for all cells

• Hypothesis: Ho: the two variables are statistically independent.
  – This means that the distribution of each variable does not depend on the value of the other variable.

Using expected cell counts to calculate a Chi-squared test statistic

• The test statistic is analogous to a t-statistic…
  – but the form of the equation makes it difficult to see that the χ² statistic is a difference between the observed and expected values, divided by an estimate of the typical variation we would expect from random sampling error.

• Test statistic:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(150-175)^2}{175} + \frac{(100-75)^2}{75} + \frac{(200-175)^2}{175} + \frac{(50-75)^2}{75} = 3.5714 + 8.3333 + 3.5714 + 8.3333 = 23.81$$
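The same arithmetic as a short Python sketch; the helper name `chi_squared` is hypothetical. For a 2x2 table, Pearson's χ² equals the square of the pooled two-proportion z statistic, so it should match z = -4.8795 from the prtesti output for the same table.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic: sum over all cells of (fo - fe)**2 / fe."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (fo - fe) ** 2 / fe
    return stat

chi2 = chi_squared([[150, 100], [200, 50]])
print(round(chi2, 2))  # 23.81

# z from the prtesti output for the same 2x2 table
z = -4.8795
print(round(z ** 2, 2))  # 23.81
```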


Degrees of freedom for a Chi-squared statistic:

• We now have a test statistic: χ² = 23.81.
• How do we assign a p-value to this?
• Step 1: calculate the degrees of freedom.
  – Given the row and column marginal totals, how many cells must we fill in before the rest are determined automatically?
  – Answer: 1 in this case, so df = 1.
  – General answer: df = (r-1)*(c-1), where r is the number of rows and c is the number of columns.
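A minimal Python sketch of both answers. `degrees_of_freedom` and `fill_2x2_from_margins` are hypothetical helper names; the second one shows that, with the margins fixed, choosing one cell of a 2x2 table determines the other three by subtraction.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def fill_2x2_from_margins(a, row_totals, col_totals):
    """Hypothetical helper: once cell (1,1) and the margins are known,
    the other three cells of a 2x2 table follow by subtraction."""
    b = row_totals[0] - a      # rest of row 1
    c = col_totals[0] - a      # rest of column 1
    d = row_totals[1] - c      # rest of row 2
    return [[a, b], [c, d]]

print(degrees_of_freedom(2, 2))                            # 1
print(degrees_of_freedom(3, 2))                            # 2
print(fill_2x2_from_margins(150, [250, 250], [350, 150]))  # [[150, 100], [200, 50]]
```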


p-value for a Chi-squared statistic:

• Assign a p-value to the statistic: χ² = 23.81, df = 1.

• Given the degrees of freedom, look up the p-value.
  – Go to Table C on page 670.
  – Go down to the row for df = 1.
  – Move across the χ² values to the largest tabled value that is smaller than the measured χ².

– Look up the corresponding p-value at the top of the column: p < .001

– The chi-squared test is always a 1-tailed test: we always use the right tail of the distribution.
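Instead of reading Table C, the right-tail probability can be computed directly. Below is a stdlib-only Python sketch for integer df (the name `chi2_sf` is hypothetical), using a standard recurrence for the regularized upper incomplete gamma function; in practice a statistics library such as `scipy.stats.chi2.sf` does this for you.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-squared variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function and y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2:
        # y**a * exp(-y) / Gamma(a + 1), computed on the log scale for stability
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q

print(chi2_sf(23.81, 1) < 0.001)   # True: p < .001, as in Table C
print(round(chi2_sf(3.84, 1), 3))  # 0.05: the df = 1 cutoff for p = .05
```

Only the right tail is computed, matching the point that the chi-squared test always uses the right tail.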


Do your own chi-squared test:

• You watch 50 beachcombers to see if they are wearing sandals and if they are wearing shorts

• Observed counts:

                       Wearing shorts?
                       Yes     No     Total
Sandals?     Yes        20     10      30
             No         10     10      20
             Total      30     20      50

• Q: Does a beachcomber's chance of wearing sandals depend on whether they are wearing shorts?

Chi-Squared Tests for tables larger than 2X2

• Here is a command to run a chi-squared test on the gender and partyid data from the 1996 GSS (cf. 8.1)

. tab partyid3 sex, chi2

            | respondents sex
   partyid3 |      male     female |     Total
------------+----------------------+----------
   Democrat |       350        627 |       977
Independent |       514        557 |     1,071
 Republican |       400        407 |       807
------------+----------------------+----------
      Total |     1,264      1,591 |     2,855

          Pearson chi2(2) =  43.4391   Pr = 0.000

• Add expected cell counts

. tab partyid3 sex, chi2 exp

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

            | respondents sex
   partyid3 |      male     female |     Total
------------+----------------------+----------
   Democrat |       350        627 |       977
            |     432.5      544.5 |     977.0
------------+----------------------+----------
Independent |       514        557 |     1,071
            |     474.2      596.8 |   1,071.0
------------+----------------------+----------
 Republican |       400        407 |       807
            |     357.3      449.7 |     807.0
------------+----------------------+----------
      Total |     1,264      1,591 |     2,855
            |   1,264.0    1,591.0 |   2,855.0

          Pearson chi2(2) =  43.4391   Pr = 0.000
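A minimal Python sketch that recomputes the expected frequencies and Pearson χ² for this 3x2 table from the observed counts in the Stata output; `expected_and_chi2` is a hypothetical helper name.

```python
def expected_and_chi2(observed):
    """Return the expected counts and the Pearson chi-squared statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    return expected, stat

# partyid3 by sex, 1996 GSS counts from the Stata output
# Rows: Democrat, Independent, Republican; columns: male, female
observed = [[350, 627],
            [514, 557],
            [400, 407]]
expected, chi2 = expected_and_chi2(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
print(round(chi2, 4), "with df =", (len(observed) - 1) * (len(observed[0]) - 1))
```

The rounded expected counts should match Stata's (432.5, 544.5, 474.2, 596.8, 357.3, 449.7), and the statistic should match Pearson chi2(2) = 43.4391.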

8.3 When not to do a chi-squared test

1.) Do not do a Chi-squared test when the expected value of a cell is less than 5.

The Problem: The total χ² is 6.28, so p < .05, but 4.5 of that total comes from one cell with fe = 2.

(It is okay to do a chi-squared test if a cell has an expected value of at least 5 and an observed value below 5!)

                   Party identification
Age         Democrat   Indep.   Republican   Total
<65          42 (40)   5 (8)     33 (32)       80
≥65           8 (10)   5 (2)      7 (8)        20
Total        50        10        40           100

(Observed counts, with expected counts in parentheses.)
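A minimal Python sketch of the check this slide describes: compute each cell's expected count and its share of χ², and flag cells with fe < 5. The helper `chi2_cell_report` and its `min_expected` parameter are hypothetical names.

```python
def chi2_cell_report(observed, min_expected=5):
    """Per-cell expected counts and chi-squared contributions, plus cells with fe < min_expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    contributions = [[(fo - fe) ** 2 / fe for fo, fe in zip(obs_row, exp_row)]
                     for obs_row, exp_row in zip(observed, expected)]
    small_cells = [(i, j) for i, row in enumerate(expected)
                   for j, fe in enumerate(row) if fe < min_expected]
    return expected, contributions, small_cells

# Rows: age <65, age >=65; columns: Democrat, Independent, Republican
observed = [[42, 5, 33],
            [8, 5, 7]]
expected, contributions, small_cells = chi2_cell_report(observed)
total = sum(sum(row) for row in contributions)
print(round(total, 2))      # 6.28
print(small_cells)          # [(1, 1)], 0-based: the >=65 Independent cell
print(contributions[1][1])  # 4.5
```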

A small sample alternative to a chi-squared test

When the sample size is too small for a chi-squared test, you may treat the contingency table as a small sample comparison of two population proportions.

This means you should do a Fisher’s exact test for population proportions.

Fisher's exact test also works on large samples, but its lengthy computations can bog down the computer, especially for tables of 5X4 or larger.
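To show what "exact" means, here is a minimal Python sketch for the 2x2 case only (Stata's `exact` option also handles larger tables). With the margins fixed, cell (1,1) follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one. The function name `fisher_exact_2x2` and the small table `[[3, 1], [1, 3]]` are illustrative assumptions.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    With all margins held fixed, cell a follows a hypergeometric distribution.
    The two-sided p-value adds up the probabilities of every table that is
    no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(cell (1,1) = x) given the margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables exactly as likely as the observed one are counted
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

print(fisher_exact_2x2([[3, 1], [1, 3]]))                 # about 0.486
print(fisher_exact_2x2([[150, 100], [200, 50]]) < 0.001)  # True
```

Enumerating every possible table is what makes the method slow for large N or many cells.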

Fisher's exact test in Stata

(Not necessary in this case because of the large N.)

. tab partyid3 sex, chi exact

Enumerating sample-space combinations:
stage 3:  enumerations = 1
stage 2:  enumerations = 158
stage 1:  enumerations = 0

            | respondents sex
   partyid3 |      male     female |     Total
------------+----------------------+----------
   Democrat |       350        627 |       977
Independent |       514        557 |     1,071
 Republican |       400        407 |       807
------------+----------------------+----------
      Total |     1,264      1,591 |     2,855

          Pearson chi2(2) =  43.4391   Pr = 0.000
           Fisher's exact =                 0.000

When not to do a chi-squared test (#2)

2.) Do not do a Chi-squared test for cell values that are not observed frequencies.

The Problem: If you use percentages, you misstate the sample size as 100.

             Voted in last election?
Sex          Yes     No     Total
Women        35%     15%     50%
Men          20%     30%     50%
Total        55%     45%    100%
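A minimal Python sketch of why percentages mislead: for fixed proportions, χ² grows in proportion to N. Entering the percentages as counts treats the sample as N = 100; the 1,000-person table below is a hypothetical sample with the same percentages, not data from the slide. The helper `chi_squared` is a hypothetical name.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic: sum over all cells of (fo - fe)**2 / fe."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            stat += (fo - fe) ** 2 / fe
    return stat

# Percentages entered as if they were counts (implies N = 100)
percent_table = [[35, 15],   # women: voted yes, no
                 [20, 30]]   # men: voted yes, no
# Hypothetical: the same percentages observed in a sample of 1,000 people
count_table = [[350, 150],
               [200, 300]]
chi2_percent = chi_squared(percent_table)
chi2_counts = chi_squared(count_table)
print(round(chi2_percent, 2), round(chi2_counts, 2))  # 9.09 90.91
```

Same percentages, ten times the sample, ten times the χ²: the test needs the real counts.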

When not to do a chi-squared test (#3)

3.) Do not do a Chi-squared test to find a difference in population proportions for dependent samples.

The Problem: You want to know if the speech changed people's opinions. A χ² test would tell you if opinions after the speech depend on opinions before the speech.

Number supporting the death penalty:

                         After hearing speech
Before speech          Yes      No     Total
Yes                     80      20      100
No                      40      60      100
Total                  120      80      200
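The slide does not name an alternative; one commonly used test for dependent (before/after) proportions is McNemar's test, which uses only the people who changed their answer. A minimal Python sketch, assuming rows are opinions before the speech and columns are opinions after it:

```python
import math

# Rows: before the speech (Yes, No); columns: after the speech (Yes, No)
table = [[80, 20],
         [40, 60]]
n = sum(sum(row) for row in table)

p_before = sum(table[0]) / n                # 100 / 200
p_after = (table[0][0] + table[1][0]) / n   # 120 / 200

# McNemar's test compares the two kinds of changers
yes_to_no = table[0][1]   # 20
no_to_yes = table[1][0]   # 40
mcnemar_chi2 = (yes_to_no - no_to_yes) ** 2 / (yes_to_no + no_to_yes)  # df = 1
p_value = math.erfc(math.sqrt(mcnemar_chi2 / 2))  # right tail of chi-squared, df = 1

print(p_before, p_after)       # 0.5 0.6
print(round(mcnemar_chi2, 2))  # 6.67
print(p_value < 0.05)          # True
```

This asks whether support moved from 50% to 60%, which is the question the speech example poses, rather than whether before and after opinions are associated.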