CHAPTER Lisl Dennis/Getty Imagesvirtual.yosemite.cc.ca.us/jcurl/Math134 4 s/ch23-547-580.pdf · ......



P1: PBU/OVY P2: PBU/OVY QC: PBU/OVY T1: PBU

GTBL011-23 GTBL011-Moore-v18.cls June 21, 2006 21:6

CHAPTER 23

In this chapter we cover...

Two-way tables
The problem of multiple comparisons
Expected counts in two-way tables
The chi-square test
Using technology
Cell counts required for the chi-square test
Uses of the chi-square test
The chi-square distributions
The chi-square test and the z test∗
The chi-square test for goodness of fit∗

Lisl Dennis/Getty Images

Two Categorical Variables: The Chi-Square Test

The two-sample z procedures of Chapter 21 allow us to compare the proportions of successes in two groups, either two populations or two treatment groups in an experiment. In the first example in Chapter 21 (page 513), we compared young men and young women by looking at whether or not they lived with their parents. That is, we looked at a relationship between two categorical variables, gender (female or male) and “Where do you live?” (with parents or not). In fact, the data include three more outcomes for “Where do you live?”: in another person’s home, in your own place, and in group quarters such as a dormitory. When there are more than two outcomes, or when we want to compare more than two groups, we need a new statistical test. The new test addresses a general question: is there a relationship between two categorical variables?

Two-way tables

We saw in Chapter 6 that we can present data on two categorical variables in a two-way table of counts. That’s our starting point. Here is an example.

EXAMPLE 23.1 Health care: Canada and the United States

Canada has universal health care. The United States does not, but often offers more elaborate treatment to patients with access. How do the two systems compare in treating heart attacks? A comparison of random samples of 2600 U.S. and 400 Canadian heart attack patients found that “the Canadian patients typically stayed in the hospital one day longer (P = 0.009) than the U.S. patients but had a much lower rate of cardiac catheterization (25 percent vs. 72 percent, P < 0.001), coronary angioplasty (11 percent vs. 29 percent, P < 0.001), and coronary bypass surgery (3 percent vs. 14 percent, P < 0.001).”1

The study then looked at many outcomes a year after the heart attack. There was no significant difference in the patients’ survival rate. Another key outcome was the patients’ own assessment of their quality of life relative to what it had been before the heart attack. Here are the data for the patients who survived a year:

Quality of life      Canada   United States
Much better              75             541
Somewhat better          71             498
About the same           96             779
Somewhat worse           50             282
Much worse               19              65
Total                   311            2165

The two-way table in Example 23.1 shows the relationship between two categorical variables. The explanatory variable is the patient’s country, Canada or the United States. The response variable is quality of life a year after a heart attack, with 5 categories. The two-way table gives the counts for all 10 combinations of values of these variables. Each of the 10 counts occupies a cell of the table.

It is hard to compare the counts because the U.S. sample is much larger. Here are the percents of each sample with each outcome:

Quality of life      Canada   United States
Much better             24%             25%
Somewhat better         23%             23%
About the same          31%             36%
Somewhat worse          16%             13%
Much worse               6%              3%
Total                  100%            100%

In the language of Chapter 6 (page 153), these are the conditional distributions of outcomes, given the patients’ nationality. The differences are not large, but slightly higher percents of Canadians thought their quality of life was “somewhat worse” or “much worse.” Figure 23.1 compares the two distributions. We want to know if there is a significant difference between the two distributions of outcomes.
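The percents above can be reproduced directly from the counts. Here is a minimal Python sketch; the variable and function names are our own choices for illustration, not part of the study.

```python
# Observed counts from Example 23.1 (patients who survived a year).
outcomes = ["Much better", "Somewhat better", "About the same",
            "Somewhat worse", "Much worse"]
counts = {
    "Canada": [75, 71, 96, 50, 19],
    "United States": [541, 498, 779, 282, 65],
}

def conditional_distribution(column):
    """Return each count as a percent of its column total."""
    total = sum(column)
    return [100 * c / total for c in column]

percents = {country: conditional_distribution(col)
            for country, col in counts.items()}

# Print one row per outcome, one percent per country.
for i, outcome in enumerate(outcomes):
    row = "  ".join(f"{percents[c][i]:5.1f}%" for c in counts)
    print(f"{outcome:<16} {row}")
```

Dividing by the column total, not the table total, is what makes each column a conditional distribution that adds to 100%.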


[Figure: bar graph. Vertical axis: Percent (0 to 40). Horizontal axis: the five outcomes from “Much worse” to “Much better,” with one bar for Canada and one for the U.S. at each outcome. Callout: These bars compare “somewhat worse” percents: 16% in Canada, 13% in the United States.]

FIGURE 23.1 Bar graph comparing quality of life a year after a heart attack in Canada and the United States, for Example 23.1.


APPLY YOUR KNOWLEDGE

23.1 Smoking among French men. Smoking remains more common in much of Europe than in the United States. In the United States, there is a strong relationship between education and smoking: well-educated people are less likely to smoke. Does a similar relationship hold in France? Here is a two-way table of the level of education and smoking status (nonsmoker, former smoker, moderate smoker, heavy smoker) of a sample of 459 French men aged 20 to 60 years.2 The subjects are a random sample of men who visited a health center for a routine checkup. We are willing to consider them an SRS of men from their region of France.

                              Smoking Status
Education           Nonsmoker   Former   Moderate   Heavy
Primary school             56       54         41      36
Secondary school           37       43         27      32
University                 53       28         36      16

(a) What percent of men with a primary school education are nonsmokers? Former smokers? Moderate smokers? Heavy smokers? These percents should add to 100% (up to roundoff error). They form the conditional distribution of smoking, given a primary education.

(b) In a similar way, find the conditional distributions of smoking among men with a secondary education and among men with a university education. Make a table that presents the three conditional distributions. Be sure to include a “Total” column showing that each row adds to 100%.

(c) Compare the three conditional distributions. Is there any clear relationship between education and smoking?

23.2 Attitudes toward recycled products. Recycling is supposed to save resources. Some people think recycled products are lower in quality than other products, a fact that makes recycling less practical. Here are data on attitudes toward coffee filters made of recycled paper.3

              Think the quality of the recycled product is
              Higher   The same   Lower
Buyers            20          7       9
Nonbuyers         29         25      43

(a) It appears that people who have bought the recycled filters have more positive opinions than those who have not. Give percents to back up this claim. Make a bar graph that compares your percents for buyers and nonbuyers.

(b) Association does not prove causation. Explain how buying recycled filters might improve a person’s opinion of their quality. Then explain how the opinion a person holds might influence his or her decision to buy or not. You see that the cause-and-effect relationship might go in either direction.

The problem of multiple comparisons

The null hypothesis in Example 23.1 is that there is no difference between the distributions of outcomes in Canada and the United States. Put more generally, the null hypothesis is that there is no relationship between two categorical variables,

H0: there is no relationship between nationality and quality of life

The alternative hypothesis says that there is a relationship but does not specify any particular kind of relationship,

Ha: there is some relationship between nationality and quality of life

Any difference between the Canadian and American distributions means that the null hypothesis is false and the alternative hypothesis is true. The alternative hypothesis is not one-sided or two-sided. We might call it “many-sided” because it allows any kind of difference.

With only the methods we already know, we might start by comparing the proportions of patients in the two nations with “much better” quality of life, using the two-sample z test for proportions. We could similarly compare the proportions with each of the other outcomes: five tests in all, with five P-values. This is a bad idea. The P-values belong to each test separately, not to the collection of five tests together. Think of the distinction between the probability that a basketball player makes a free throw and the probability that she makes all of five free throws. CAUTION: When we do many individual tests or confidence intervals, the individual P-values and confidence levels don’t tell us how confident we can be in all of the inferences taken together.
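A little arithmetic makes the free-throw distinction concrete. The sketch below rests on two assumptions of our own: a player who makes 70% of her free throws, and five independent tests each run at the 5% level. Five tests on the same two-way table are not really independent, so the second figure is only a rough guide to how quickly false alarms accumulate.

```python
# Illustrative assumptions (not from the study): a 70% free-throw shooter,
# and five independent significance tests, each at the 5% level.
p_single_shot = 0.70
n = 5

# Probability of making one free throw versus all five in a row.
p_all_five = p_single_shot ** n

# Chance that at least one of five independent tests rejects a true
# null hypothesis when each test alone has a 5% false-alarm rate.
alpha = 0.05
p_at_least_one_false_alarm = 1 - (1 - alpha) ** n

print(f"one shot: {p_single_shot:.2f}, all five: {p_all_five:.3f}")
print(f"one test: {alpha:.2f}, at least one of five: "
      f"{p_at_least_one_false_alarm:.3f}")
```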

Because of this, it’s cheating to pick out the largest of the five differences and then test its significance as if it were the only comparison we had in mind. For example, the “much worse” proportions in Example 23.1 are significantly different (P = 0.0047) if we compare just this one outcome. But is it surprising that the most different proportions among five outcomes differ by this much? That’s a different question.
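For readers who want to check the single-comparison P-value quoted above, here is a short Python sketch of the two-sample z test for proportions. It assumes the pooled standard error and a two-sided alternative; the function name is ours.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-sample z test for equal proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided Normal tail area: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# "Much worse" counts: 19 of 311 Canadians, 65 of 2165 Americans.
z, p_value = two_proportion_z_test(19, 311, 65, 2165)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

The calculation treats this outcome as if it were the only comparison planned, which is exactly why the resulting P-value overstates the evidence.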

The problem of how to do many comparisons at once with an overall measure of confidence in all our conclusions is common in statistics. This is the problem of multiple comparisons. Statistical methods for dealing with multiple comparisons usually have two steps:

1. An overall test to see if there is good evidence of any differences among the parameters that we want to compare.

2. A detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.

The overall test, though more complex than the tests we met earlier, is often reasonably straightforward. The follow-up analysis can be quite elaborate. In our basic introduction to statistical practice, we will concentrate on the overall test, along with data analysis that points to the nature of the differences.

APPLY YOUR KNOWLEDGE

23.3 Nonsmokers and education in France. In the setting of Exercise 23.1, consider only the proportions of nonsmokers in the three populations of men with primary, secondary, and university education. Do three significance tests of the three null hypotheses

H0: p_primary = p_secondary

H0: p_primary = p_university

H0: p_secondary = p_university

against the two-sided alternatives. Give P-values for each test. These three P-values don’t tell us how often the three proportions for the three education groups will be spread this far apart just by chance.

23.4 Who’s online? A sample survey by the Pew Internet and American Life Project asked a random sample of adults about use of the Internet and about the type of community they lived in. Following is the two-way table:4


                       Community Type
                  Rural   Suburban   Urban
Internet users      433       1072     536
Nonusers            463        627     388

(a) Give three 95% confidence intervals, for the percents of adults in rural, suburban, and urban communities who use the Internet.

(b) Explain clearly why we are not 95% confident that all three of these intervals capture their respective population proportions.

He started it!

A study of deaths in bar fights showed that in 90% of the cases, the person who died started the fight. You shouldn’t believe this. If you killed someone in a fight, what would you say when the police ask you who started the fight? After all, dead men tell no tales.

Expected counts in two-way tables

Our general null hypothesis H0 is that there is no relationship between the two categorical variables that label the rows and columns of a two-way table. To test H0, we compare the observed counts in the table with the expected counts, the counts we would expect (except for random variation) if H0 were true. If the observed counts are far from the expected counts, that is evidence against H0. It is easy to find the expected counts.

EXPECTED COUNTS

The expected count in any cell of a two-way table when H0 is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

EXAMPLE 23.2 Observed versus expected counts

Let’s find the expected counts for the quality-of-life study. Here is the two-way table with row and column totals:

Quality of life      Canada   United States   Total
Much better              75             541     616
Somewhat better          71             498     569
About the same           96             779     875
Somewhat worse           50             282     332
Much worse               19              65      84
Total                   311            2165    2476

The expected count of Canadians with much better quality of life a year after a heart attack is

$$\frac{\text{row 1 total} \times \text{column 1 total}}{\text{table total}} = \frac{(616)(311)}{2476} = 77.37$$


Here is the table of all 10 expected counts:

Quality of life      Canada   United States   Total
Much better           77.37          538.63     616
Somewhat better       71.47          497.53     569
About the same       109.91          765.09     875
Somewhat worse        41.70          290.30     332
Much worse            10.55           73.45      84
Total                   311            2165

As this table shows, the expected counts have exactly the same row and column totals (up to roundoff error) as the observed counts. That’s a good way to check your work.

To see how the data diverge from the null hypothesis, compare the observed counts with these expected counts. You see, for example, that 19 Canadians reported much worse quality of life, whereas we would expect only 10.55 if the null hypothesis were true.
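The whole table of expected counts follows from the row and column totals alone. The Python sketch below applies the Expected Counts formula to every cell; the function name and list layout are our own illustrative choices.

```python
# Observed counts from Example 23.2; rows are outcomes, columns are
# (Canada, United States).
observed = [
    [75, 541],   # Much better
    [71, 498],   # Somewhat better
    [96, 779],   # About the same
    [50, 282],   # Somewhat worse
    [19, 65],    # Much worse
]

def expected_counts(table):
    """Expected count = row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{value:7.2f}" for value in row))
```

Summing each column of `expected` and comparing with 311 and 2165 is the same check on the totals that the text recommends.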

Why the formula works

Where does the formula for an expected cell count come from? Think of a basketball player who makes 70% of her free throws in the long run. If she shoots 10 free throws in a game, we expect her to make 70% of them, or 7 of the 10. Of course, she won’t make exactly 7 every time she shoots 10 free throws in a game. There is chance variation from game to game. But in the long run, 7 of 10 is what we expect. In more formal language, if we have n independent tries and the probability of a success on each try is p, we expect np successes.

Now go back to the count of Canadians with much better quality of life a year after a heart attack. The proportion of all 2476 subjects with much better quality of life is

$$\frac{\text{count of successes}}{\text{table total}} = \frac{\text{row 1 total}}{\text{table total}} = \frac{616}{2476}$$

Think of this as p, the overall proportion of successes. If H0 is true, we expect (except for random variation) this same proportion of successes in both countries. So the expected count of successes among the 311 Canadians is

$$np = (311)\left(\frac{616}{2476}\right) = 77.37$$

That’s the formula in the Expected Counts box.

APPLY YOUR KNOWLEDGE

23.5 Smoking among French men. The two-way table in Exercise 23.1 displays data on the education and smoking behavior of a sample of French men. The null hypothesis says that there is no relationship between these variables. That is, the distribution of smoking is the same for all three levels of education.


(a) Find the expected counts for each smoking status among men with a university education. This is one row of the two-way table of expected counts. Find the row total and verify that it agrees with the row total for the observed counts.

(b) We conjecture that men with a university education smoke less than the null hypothesis calls for. How does comparing the observed and expected counts in this row confirm this conjecture?

23.6 Attitudes toward recycled products. Exercise 23.2 describes a comparison of the attitudes of people who do and don’t buy coffee filters made of recycled paper. The null hypothesis “no relationship” says that in the population of all consumers, the proportions who hold each attitude are the same for buyers and nonbuyers.

(a) Find the expected cell counts if this hypothesis is true and display them in a two-way table. Add the row and column totals to your table and check that they agree with the totals for the observed counts.

(b) Are there any large deviations between the observed counts and the expected counts? What kind of relationship between the two variables do these deviations point to?

The chi-square test

The statistical test that tells us whether the observed differences between Canada and the United States are statistically significant compares the observed and expected counts. The test statistic that makes the comparison is the chi-square statistic.

CHI-SQUARE STATISTIC

The chi-square statistic is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum is over all cells in the table.

The chi-square statistic is a sum of terms, one for each cell in the table. In the quality-of-life example, 75 Canadian patients reported much better quality of life. The expected count for this cell is 77.37. So the term of the chi-square statistic from this cell is

$$\frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(75 - 77.37)^2}{77.37} = \frac{5.617}{77.37} = 0.073$$
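Repeating this calculation for all 10 cells and adding the terms gives the statistic. Here is a minimal Python sketch for the quality-of-life table; the function name is ours, and the expected counts are recomputed from the totals and not rounded to two decimals.

```python
# Observed counts; rows are outcomes, columns are (Canada, United States).
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]

def chi_square_statistic(table):
    """Sum (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

x2 = chi_square_statistic(observed)
print(f"X^2 = {x2:.3f}")
```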


Think of the chi-square statistic X² as a measure of the distance of the observed counts from the expected counts. Like any distance, it is always zero or positive, and it is zero only when the observed counts are exactly equal to the expected counts. Large values of X² are evidence against H0 because they say that the observed counts are far from what we would expect if H0 were true. CAUTION: Although the alternative hypothesis Ha is many-sided, the chi-square test is one-sided because any violation of H0 tends to produce a large value of X². Small values of X² are not evidence against H0.

Using technology

Calculating the expected counts and then the chi-square statistic by hand is a bit time-consuming. As usual, software saves time and always gets the arithmetic right. Figure 23.2 (pages 556 and 557) shows output for the chi-square test for the quality-of-life data from a graphing calculator, two statistical programs, and a spreadsheet program.

EXAMPLE 23.3 Chi-square from software

The outputs differ in the information they give. All except the Excel spreadsheet tell us that the chi-square statistic is X² = 11.725, with P-value 0.020. There is quite good evidence that the distributions of outcomes are different in Canada and the United States.

The two statistical programs repeat the two-way table of observed counts and add the row and column totals. Both programs offer additional information on request. We asked CrunchIt! to add the column percents that enable us to compare the Canadian and American distributions. The chi-square statistic is a sum of 10 terms, one for each cell in the table. We asked Minitab to give the expected count and the contribution to chi-square for each cell. The top-left cell has expected count 77.4 and chi-square term 0.073, just as we calculated. Look at the 10 terms. More than half the value of X² (6.766 out of 11.725) comes from just one cell. This points to the most important difference between the two countries: a higher proportion of Canadians report much worse quality of life. Most of the rest of X² comes from two other cells: more Canadians report somewhat worse quality of life, and fewer report about the same quality.

Excel is as usual more awkward than software designed for statistics. It lacks a menu selection for the chi-square test. You must program the spreadsheet to calculate the expected cell counts and then use the CHITEST worksheet formula. This gives the P-value but not the test statistic itself. You can of course program the spreadsheet to find the value of X². The Excel output shows the observed and expected cell counts and the P-value.
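To see what the software does when it turns X² into a P-value, here is a small Python sketch that uses only the standard library. It relies on a closed form for the upper tail of a chi-square distribution that holds when the degrees of freedom are even, which covers the DF = 4 reported in the output for this table. The function name is ours, and the sketch is not a replacement for a statistics package.

```python
import math

def chi_square_p_value_even_df(x2, df):
    """Upper-tail area of the chi-square distribution, even df only.

    For even df the tail area has a closed form:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x2 / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# X^2 and DF as reported by the software output for Example 23.3.
p_value = chi_square_p_value_even_df(11.725, 4)
print(f"P = {p_value:.4f}")
```

For odd degrees of freedom no such elementary formula applies, and a library routine or a table is the practical route.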

The chi-square test is the overall test for detecting relationships between two categorical variables. If the test is significant, it is important to look at the data to learn the nature of the relationship. We have three ways to look at the quality-of-life data:

• Compare appropriate percents: which outcomes occur in quite different percents of Canadian and American patients? This is the method we learned in Chapter 6.

• Compare observed and expected cell counts: which cells have more or fewer observations than we would expect if H0 were true?

• Look at the terms of the chi-square statistic: which cells contribute the most to the value of X²?

TI-83

[TI-83 screen image]

CrunchIt!

Cell format: Count (Column percent)

                   Canada           USA              Total
Much better        75 (24.12%)      541 (24.99%)     616 (24.88%)
Somewhat better    71 (22.83%)      498 (23%)        569 (22.98%)
About the same     96 (30.87%)      779 (35.98%)     875 (35.34%)
Somewhat worse     50 (16.08%)      282 (13.03%)     332 (13.41%)
Much worse         19 (6.109%)      65 (3.002%)      84 (3.393%)
Total              311 (100.00%)    2165 (100.00%)   2476 (100.00%)

Statistic     DF   Value       P-value
Chi-square    4    11.725485   0.0195

FIGURE 23.2 Output from the TI-83 graphing calculator, CrunchIt!, Minitab, and Excel for the two-way table in the quality-of-life study (continued).
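The second and third ways of looking at the data lend themselves to a short calculation. The Python sketch below computes each cell's term, sorts the cells from largest to smallest contribution, and reports whether the cell holds more or fewer observations than expected. The layout of the records is our own choice.

```python
labels = ["Much better", "Somewhat better", "About the same",
          "Somewhat worse", "Much worse"]
countries = ["Canada", "United States"]
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# One record per cell: (contribution, outcome, country, observed, expected).
cells = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total
        cells.append(((obs - exp) ** 2 / exp, labels[i], countries[j], obs, exp))

# Largest contributions first.
cells.sort(reverse=True)
for term, outcome, country, obs, exp in cells[:3]:
    direction = "more" if obs > exp else "fewer"
    print(f"{outcome:<15} {country:<14} observed {obs:4d}, "
          f"expected {exp:7.2f} ({direction} than expected), term {term:.3f}")
```

Sorting by contribution puts the cells that drive X² at the top, so the follow-up comparison of percents can start there.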

EXAMPLE 23.4 Canada and the United States: conclusions

There is a significant difference between the distributions of quality of life reported by Canadian and American patients a year after a heart attack. All three ways of comparing the distributions agree that the main difference is that a higher proportion of Canadians report that their quality of life is worse than before their heart attack. Other response variables measured in the study agree with this conclusion.

Excel

Observed    Canada    USA
            75        541
            71        498
            96        779
            50        282
            19        65

Expected    Canada    USA
            77.37     538.63
            71.47     497.53
            109.91    765.09
            41.7      290.3
            10.55     73.45

CHITEST(B2:C6,B9:C13)    0.019482

Minitab

                    Canada       USA       All
Much better             75       541       616
                      77.4     538.6     616.0
                    0.0728    0.0105         *

Somewhat better         71       498       569
                      71.5     497.5     569.0
                    0.0031    0.0004         *

About the same          96       779       875
                     109.9     765.1     875.0
                    1.7593    0.2527         *

Somewhat worse          50       282       332
                      41.7     290.3     332.0
                    1.6515    0.2372         *

Much worse              19        65        84
                      10.6      73.4      84.0
                    6.7660    0.9719         *

All                    311      2165      2476
                     311.0    2165.0    2476.0
                         *         *         *

Cell Contents:      Count
                    Expected count
                    Contribution to Chi-square

(This key identifies the output for each cell in the table.)

Pearson Chi-Square = 11.725, DF = 4, P-Value = 0.020

FIGURE 23.2 (continued).

The broader conclusion, however, is controversial. Americans are likely to point to the better outcomes produced by their much more intensive treatment. Canadians reply that the differences are small, that there was no significant difference in survival, and that the American advantage comes at high cost. The resources spent on expensive treatment of heart attack victims could instead be spent on providing basic health care to the many Americans who lack it.

There is an important message here: although statistical studies shed light on issues of public policy, statistics alone rarely settles complicated questions such as “Which kind of health care system works better?”

APPLY YOUR KNOWLEDGE

23.7 Smoking among French men. In Exercises 23.1 and 23.5, you began to analyze data on the smoking status and education of French men. Figure 23.3 displays the Minitab output for the chi-square test applied to these data.

(a) Starting from the observed and expected counts in the output, calculate the four terms of the chi-square statistic for the bottom row (university education). Verify that your work agrees with Minitab’s “Contribution to Chi-square” up to roundoff error.

              Nonsmoker    Former   Moderate     Heavy       All
Primary              56        54         41        36       187
                  59.48     50.93      42.37     34.22    187.00
                 0.2038    0.1856     0.0443    0.0924         *

Secondary            37        43         27        32       139
                  44.21     37.85      31.49     25.44    139.00
                 1.1769    0.6996     0.6414    1.6928         *

University           53        28         36        16       133
                  42.31     36.22      30.14     24.34    133.00
                 2.7038    1.8655     1.1414    2.8576         *

All                 146       125        104        84       459
                 146.00    125.00     104.00     84.00    459.00
                      *         *          *         *         *

Cell Contents:   Count
                 Expected count
                 Contribution to Chi-square

Pearson Chi-Square = 13.305, DF = 6, P-Value = 0.038

FIGURE 23.3 Minitab output for the two-way table of education level and smoking status among French men, for Exercise 23.7.


(b) According to Minitab, what is the value of the chi-square statistic X² and the P-value of the chi-square test?

(c) Look at the “Contribution to Chi-square” entries in Minitab’s display. Which terms contribute the most to X²? Write a brief summary of the nature and significance of the relationship between education and smoking.

23.8 Attitudes toward recycled products. In Exercises 23.2 and 23.6 you began to analyze data on consumer attitudes toward recycled products. Figure 23.4 gives CrunchIt! output for these data.

(a) Starting from the observed and expected counts, find the six terms of the chi-square statistic and then the statistic X² itself. Check your work against the computer output.

(b) What is the P-value for the test? Explain in simple language what it means to reject H0 in this setting.

(c) Which cells contribute the most to X²? What kind of relationship do these terms in combination with the row percents in the table point to?

Cell format: Count (Row percent), Expected count

              Higher          The same        Lower           Total
Buyers        20 (55.56%)     7 (19.44%)      9 (25%)         36 (100.00%)
              13.26           8.662           14.08

Nonbuyers     29 (29.9%)      25 (25.77%)     43 (44.33%)     97 (100.00%)
              35.74           23.34           37.92

Total         49 (36.84%)     32 (24.06%)     52 (39.1%)      133 (100.00%)

Statistic     DF   Value      P-value
Chi-square    2    7.638116   0.0219

FIGURE 23.4 CrunchIt! output for the study of consumer attitudes toward recycled products, for Exercise 23.8.

Cell counts required for the chi-square test

The chi-square test, like the z procedures for comparing two proportions, is an approximate method that becomes more accurate as the counts in the cells of the table get larger. We must therefore check that the counts are large enough to trust the P-value. Fortunately, the chi-square approximation is accurate for quite modest counts. Here is a practical guideline.5


CELL COUNTS REQUIRED FOR THE CHI-SQUARE TEST

You can safely use the chi-square test with critical values from the chi-square distribution when no more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater. In particular, all four expected counts in a 2 × 2 table should be 5 or greater.

Note that the guideline uses expected cell counts. The expected counts for the quality of life study of Example 23.1 appear in the Minitab output in Figure 23.2. The smallest expected count is 10.6, so the data easily meet the guideline for safe use of chi-square.
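The guideline is easy to turn into a mechanical check. The Python sketch below is one reading of the box above; the function name is ours, and it expects a table of expected (not observed) counts.

```python
def meets_cell_count_guideline(expected):
    """Check the chapter's guideline on a table of expected counts.

    Requires: no more than 20% of expected counts below 5, every expected
    count at least 1, and all four counts at least 5 in a 2 x 2 table.
    """
    cells = [value for row in expected for value in row]
    if any(value < 1 for value in cells):
        return False
    is_two_by_two = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_two_by_two:
        return all(value >= 5 for value in cells)
    share_below_five = sum(value < 5 for value in cells) / len(cells)
    return share_below_five <= 0.20

# Expected counts for the quality-of-life study (Example 23.2).
quality_of_life = [[77.37, 538.63], [71.47, 497.53], [109.91, 765.09],
                   [41.70, 290.30], [10.55, 73.45]]
print(meets_cell_count_guideline(quality_of_life))
```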

APPLY YOUR KNOWLEDGE

23.9 Does chi-square apply? Figure 23.3 displays Minitab output for data on French men. Using the information in the output, verify that the data meet the cell count requirement for use of chi-square.

23.10 Does chi-square apply? Figure 23.4 displays CrunchIt! output for data on consumer attitudes toward recycled products. Using the information in the output, verify that the data meet the cell count requirement for use of chi-square.

Uses of the chi-square test

Two-way tables can arise in several ways. The study of the quality of life of heart attack patients compared two independent random samples, one in Canada and the other in the United States. The design of the study fixed the sizes of the two samples. The next example illustrates a different setting, in which all the observations come from just one sample.

EXAMPLE 23.5 Extracurricular activities and grades

STATE: North Carolina State University studied student performance in a course required by its chemical engineering major. Students must earn at least a C in the course in order to continue in the major. One question of interest was the relationship between time spent in extracurricular activities and success in the course. Students were asked to estimate how many hours per week they spent on extracurricular activities (less than 2, 2 to 12, or greater than 12). The CrunchIt! output in Figure 23.5 shows the two-way table of extracurricular activity time and course grade for the 119 students who answered the question.⁶

FORMULATE: Carry out a chi-square test for

H0: there is no relationship between extracurricular activity time and course grade


Ha: there is some relationship between these two variables

Compare column percents or observed versus expected cell counts or terms of chi-square to see the nature of the relationship.

SOLVE: First check the guideline for use of chi-square. The expected cell counts appear in the output in Figure 23.5. Two of the expected counts are quite small, 5.513 and 2.487. But all the expected counts are greater than 1, and only 1 out of 6 (17%) is less than 5. We can safely use chi-square. The output shows that there is a significant relationship (X² = 6.926, P = 0.0313). The column percents show an interesting pattern: students who spend low and high amounts of time on extracurricular activities are both less likely to earn a C or better than students who spend a moderate amount of time.

CONCLUDE: We find that 75% of students in the moderate extracurricular activity group succeed in the course, compared with 55% in the low group and only 38% in the high group. These differences in success percents are significant (P = 0.03). Because there are few students in the low and (especially) high groups, we now wish that the questionnaire had not lumped 2 to 12 hours together. We should also look at other data that might help explain the pattern. For example, are the “low extracurricular” students more often employed? Or are they students with low GPAs who are struggling despite lots of study time?

FIGURE 23.5 CrunchIt! output for the two-way table of course grade and extracurricular activities, for Example 23.5.


Pay attention to the nature of the data in Example 23.5:

• We do not have three separate samples of students with low, moderate, and high extracurricular activity. We have a single group of 119 students, each classified in two ways (extracurricular activity and course grade).

• The data (except for small nonresponse) cover all of the students enrolled in this course in one semester. We might regard this as a sample of students enrolled in the course over several years. But we might also regard these 119 students as the entire population rather than a sample from a larger population.

One of the most useful properties of chi-square is that it tests the null hypothesis “the row and column variables are not related to each other” whenever this hypothesis makes sense for a two-way table. It makes sense when we are comparing a categorical response in two or more samples, as when we compared quality of life for patients in Canada and the United States. The hypothesis also makes sense when we have data on two categorical variables for the individuals in a single sample, as when we examined grades and extracurricular activities for a sample of college students. The hypothesis “no relationship” makes sense even if the single sample is an entire population. Statistical significance has the same meaning in all these settings: “A relationship this strong is not likely to happen just by chance.” This makes sense whether the data are a sample or an entire population.

USES OF THE CHI-SQUARE TEST

Use the chi-square test to test the null hypothesis

H0: there is no relationship between two categorical variables

when you have a two-way table from one of these situations:

• Independent SRSs from each of two or more populations, with each individual classified according to one categorical variable. (The other variable says which sample the individual comes from.)

• A single SRS, with each individual classified according to both of two categorical variables.

APPLY YOUR KNOWLEDGE

23.11 Majors for men and women in business. A study of the career plans of young women and men sent questionnaires to all 722 members of the senior class in the


College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Here are the data from the students who responded:⁷

                  Female   Male
Accounting          68      56
Administration      91      40
Economics            5       6
Finance             61      59

This is an example of a single sample classified according to two categorical variables (gender and major).

(a) Describe the differences between the distributions of majors for women and men with percents, with a bar graph, and in words.

(b) Verify that the expected cell counts satisfy the requirement for use of chi-square.

(c) Test the null hypothesis that there is no relationship between the gender of students and their choice of major. Give a P-value.

(d) Which two cells have the largest terms of the chi-square statistic? How do the observed and expected counts differ in these cells? (This should strengthen your conclusions in (a).)

(e) What percent of the students did not respond to the questionnaire? Why does this nonresponse weaken conclusions drawn from these data?

The chi-square distributions

Software usually finds P-values for us. The P-value for a chi-square test comes from comparing the value of the chi-square statistic with critical values for a chi-square distribution.

THE CHI-SQUARE DISTRIBUTIONS

The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.

The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. The P-value is the area to the right of X² under the density curve of this chi-square distribution.
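To make the box concrete, here is a minimal Python sketch of the whole calculation for a two-way table. It is an illustration under stated assumptions, not the routine used by any statistical package: the function names `chi2_upper_tail` and `chi_square_test` are hypothetical, and the tail area relies on finite-sum formulas that hold only when the degrees of freedom are a whole number (which is always the case for two-way tables). The counts are those shown in the CrunchIt! output of Figure 23.4.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square density, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-tailed normal area for sqrt(x), plus a finite series
    # sum over k <= (df-3)/2 of x^k / (1 * 3 * ... * (2k + 1)).
    total, term = 0.0, 1.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    series = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + series


def chi_square_test(observed):
    """Return (X2, degrees of freedom, P-value) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # expected count
            x2 += (obs - exp) ** 2 / exp              # this cell's term
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, chi2_upper_tail(x2, df)


# Figure 23.4: rows are buyers and nonbuyers; columns are higher, the same, lower.
x2, df, p = chi_square_test([[20, 7, 9], [29, 25, 43]])
print(f"X2 = {x2:.3f}, df = {df}, P = {p:.4f}")
```

Run on these counts, the sketch agrees with the statistic, degrees of freedom, and P-value reported in Figure 23.4, up to rounding.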


FIGURE 23.6 Density curves for the chi-square distributions with 1, 4, and 8 degrees of freedom. Chi-square distributions take only positive values and are right-skewed.

Figure 23.6 shows the density curves for three members of the chi-square family of distributions. As the degrees of freedom increase, the density curves become less skewed and larger values become more probable. Table E in the back of the book gives critical values for chi-square distributions. You can use Table E if you do not have software that gives you P-values for a chi-square test.

EXAMPLE 23.6 Using the chi-square table

The two-way table of 5 outcomes by 2 countries for the quality-of-life study has 5 rows and 2 columns. That is, r = 5 and c = 2. The chi-square statistic therefore has degrees of freedom

(r − 1)(c − 1) = (5 − 1)(2 − 1) = (4)(1) = 4

Three of the outputs in Figure 23.2 give 4 as the degrees of freedom.

df = 4
p     .02      .01
x*    11.67    13.28

The observed value of the chi-square statistic is X² = 11.725. Look in the df = 4 row of Table E. The value X² = 11.725 falls between the 0.02 and 0.01 critical values of the chi-square distribution with 4 degrees of freedom. Remember that the chi-square test is always one-sided. So the P-value of X² = 11.725 is between 0.02 and 0.01. The outputs in Figure 23.2 show that the P-value is 0.0195, close to 0.02.
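The table lookup in Example 23.6 can be mimicked in code. The sketch below is illustrative only: `bracket` and `upper_tail_df4` are hypothetical names, and the dictionary holds just the two Table E entries quoted in the example rather than the full row. The closed form exp(−x/2)(1 + x/2) for the upper tail area is a standard fact that applies specifically to 4 degrees of freedom; it is used here only to check the table entries.

```python
import math


def bracket(x2, table_row):
    """Return the two tail areas whose critical values enclose x2, or None."""
    # Sort the (tail area, critical value) pairs by critical value.
    entries = sorted(table_row.items(), key=lambda item: item[1])
    for (p_left, x_left), (p_right, x_right) in zip(entries, entries[1:]):
        if x_left <= x2 <= x_right:
            return p_left, p_right
    return None


def upper_tail_df4(x):
    """Area to the right of x for the chi-square distribution with 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


r, c = 5, 2                                # 5 outcomes by 2 countries
df = (r - 1) * (c - 1)
table_e_df4 = {0.02: 11.67, 0.01: 13.28}   # the two entries quoted in Example 23.6

print("degrees of freedom:", df)
print("P-value lies between:", bracket(11.725, table_e_df4))
for p, crit in table_e_df4.items():
    print(f"tail area beyond {crit}: {upper_tail_df4(crit):.4f} (table says {p})")
print(f"tail area beyond 11.725: {upper_tail_df4(11.725):.4f}")
```

A value of X² outside the entries supplied returns `None`, which corresponds to reading off the end of the row in the printed table.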

We know that all z and t statistics measure the size of an effect in the standard scale centered at zero. We can roughly assess the size of any z or t statistic by the 68–95–99.7 rule, though this is exact only for z. The chi-square statistic does not


have any such natural interpretation. But here is a helpful fact: the mean of any chi-square distribution is equal to its degrees of freedom. In Example 23.6, X² would have mean 4 if the null hypothesis were true. The observed value X² = 11.725 is so much larger than 4 that we suspect it is significant even before we look at Table E.

APPLY YOUR KNOWLEDGE

23.12 Attitudes toward recycled products. The CrunchIt! output in Figure 23.4 gives 2 degrees of freedom for the table in Exercise 23.2.

(a) Verify that this is correct.

(b) The computer gives the value of the chi-square statistic as X² = 7.638. Between what two entries in Table E does this value lie? What does the table tell you about the P-value?

(c) What is the mean value of the statistic X² if the null hypothesis is true? How does the observed value of X² compare with this mean?

23.13 Smoking among French men. The Minitab output in Figure 23.3 gives the degrees of freedom for the table of education and smoking status as DF = 6.

(a) Show that this is correct for a table with 3 rows and 4 columns.

(b) Minitab gives the chi-square statistic as Chi-Square 13.305. Between which two entries in Table E does this value lie? Verify that Minitab’s result P-Value = 0.038 lies between the tail areas for these values.

The chi-square test and the z test∗

One use of the chi-square test is to compare the proportions of successes in any number of groups. If the r rows of the two-way table are r groups and the columns are “success” and “failure,” the counts form an r × 2 table. P-values come from the chi-square distribution with r − 1 degrees of freedom. If r = 2, we are comparing just two proportions. We now have two ways to do this: the z test from Chapter 21 and the chi-square test with 1 degree of freedom for a 2 × 2 table. These two tests always agree. In fact, the chi-square statistic X² is just the square of the z statistic, and the P-value for X² is exactly the same as the two-sided P-value for z. We recommend using the z test to compare two proportions because it gives you the choice of a one-sided test and is related to a confidence interval for the difference p1 − p2.
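The agreement between the two tests is easy to confirm numerically. The following sketch is a hedged illustration, not a prescribed procedure: the function names are invented for this example, the z statistic is assumed to use the pooled sample proportion in its standard error, and the counts are those of the gastric freezing experiment described in Exercise 23.14. Both P-values are computed with the complementary error function from Python's standard library.

```python
import math


def two_sample_z(s1, n1, s2, n2):
    """Two-sample z statistic for H0: p1 = p2, with a pooled standard error."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (s1 / n1 - s2 / n2) / se


def chi_square_2x2(s1, n1, s2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    observed = [[s1, n1 - s1], [s2, n2 - s2]]
    row_totals = [n1, n2]
    col_totals = [s1 + s2, (n1 - s1) + (n2 - s2)]
    total = n1 + n2
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / total
            x2 += (observed[i][j] - exp) ** 2 / exp
    return x2


# 28 of 82 gastric-freezing patients improved; 30 of 78 placebo patients improved.
z = two_sample_z(28, 82, 30, 78)
x2 = chi_square_2x2(28, 82, 30, 78)

p_two_sided_z = math.erfc(abs(z) / math.sqrt(2))   # area in both normal tails
p_chi_square = math.erfc(math.sqrt(x2 / 2))        # upper tail, 1 degree of freedom

print(f"z = {z:.4f}, z squared = {z * z:.4f}, X2 = {x2:.4f}")
print(f"two-sided P for z = {p_two_sided_z:.4f}, P for X2 = {p_chi_square:.4f}")
```

The two printed statistics and the two printed P-values match to the displayed precision, which is the agreement the text describes.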

APPLY YOUR KNOWLEDGE

23.14 Treating ulcers. Gastric freezing was once a recommended treatment for ulcers in the upper intestine. Use of gastric freezing stopped after experiments showed it had no effect. One randomized comparative experiment found that 28 of the 82

∗The remainder of the material in this chapter is optional.


gastric-freezing patients improved, while 30 of the 78 patients in the placebo group improved.⁸ We can test the hypothesis of “no difference” between the two groups in two ways: using the two-sample z statistic or using the chi-square statistic.

(a) Check the conditions required for both tests, given in the boxes on pages 521 and 560. The conditions are very similar, as they ought to be.

(b) State the null hypothesis with a two-sided alternative and carry out the z test. What is the P-value, exactly from software or approximately from the bottom row of Table C?

(c) Present the data in a 2 × 2 table. Use the chi-square test to test the hypothesis from (b). Verify that the X² statistic is the square of the z statistic. Use software or Table E to verify that the chi-square P-value agrees with the z result (up to the accuracy of the tables if you do not use software).

(d) What do you conclude about the effectiveness of gastric freezing as a treatment for ulcers?

The chi-square test for goodness of fit∗

The most common and most important use of the chi-square statistic is to test the hypothesis that there is no relationship between two categorical variables. A variation of the statistic can be used to test a different kind of null hypothesis: that a categorical variable has a specified distribution. Here is an example that illustrates this use of chi-square.

More chi-square tests

There are other chi-square tests for hypotheses more specific than “no relationship.” A sociologist places people in classes by social status, waits ten years, then classifies the same people again. The row and column variables are the classes at the two times. She might test the hypothesis that there has been no change in the overall distribution of social status in the group. Or she might ask if moves up in status are balanced by matching moves down. These and other null hypotheses can be tested by variations of the chi-square test.

EXAMPLE 23.7 Never on Sunday?

Births are not evenly distributed across the days of the week. Fewer babies are born on Saturday and Sunday than on other days, probably because doctors find weekend births inconvenient. Exercise 1.4 (page 10) gives national data that demonstrate this fact.

A random sample of 140 births from local records shows this distribution across the days of the week:

Day       Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Births     13     23     24     20     27     18     15

Sure enough, the two smallest counts of births are on Saturday and Sunday. Do these data give significant evidence that local births are not equally likely on all days of the week?

The chi-square test answers the question of Example 23.7 by comparing observed counts with expected counts under the null hypothesis. The null hypothesis for births says that they are evenly distributed. To state the hypotheses carefully, write the discrete probability distribution for days of birth:


Day Sun. Mon. Tue. Wed. Thu. Fri. Sat.

Probability p1 p2 p3 p4 p5 p6 p7

The null hypothesis says that the probabilities are the same on all days. In that case, all 7 probabilities must be 1/7. So the null hypothesis is

H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7

The alternative hypothesis says that days are not all equally probable:

Ha: not all pi = 1/7

As usual in chi-square tests, Ha is a “many-sided” hypothesis that simply says that H0 is not true. The chi-square statistic is also as usual:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The expected count for an outcome with probability p is np, as we saw in the discussion following Example 23.2. Under the null hypothesis, all the probabilities pi are the same, so all 7 expected counts are equal to

$$np_i = 140 \times \frac{1}{7} = 20$$

These expected counts easily satisfy our guideline for using chi-square. The chi-square statistic is

$$X^2 = \sum \frac{(\text{observed count} - 20)^2}{20} = \frac{(13 - 20)^2}{20} + \frac{(23 - 20)^2}{20} + \cdots + \frac{(15 - 20)^2}{20} = 7.6$$

This new use of X² requires a different degrees of freedom. To find the P-value, compare X² with critical values from the chi-square distribution with degrees of freedom one less than the number of values the birth day can take. That’s 7 − 1 = 6 degrees of freedom. From Table E, we see that X² = 7.6 is smaller than the smallest entry in the df = 6 row, which is the critical value for tail area 0.25. The P-value is therefore greater than 0.25 (software gives the more exact value P = 0.269). These 140 births don’t give convincing evidence that births are not equally likely on all days of the week.

df = 6
p     .25     .20
x*    7.84    8.56
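Because the mean of a chi-square distribution equals its degrees of freedom, X² = 7.6 is not far from the value of about 6 that we would expect when the null hypothesis is true. A simulation shows the same thing without Table E. The sketch below is an illustration only: the function names, the 2000 repetitions, and the random seed are arbitrary choices made for this example. Each repetition assigns 140 births to the 7 days with equal probability and records the resulting value of X².

```python
import random


def gof_statistic(counts, expected):
    """Chi-square statistic when every outcome has the same expected count."""
    return sum((obs - expected) ** 2 for obs in counts) / expected


def simulate_null(n_births=140, n_days=7, n_sims=2000, seed=1):
    """Values of X2 from repeated samples in which all days are equally likely."""
    rng = random.Random(seed)
    expected = n_births / n_days
    stats = []
    for _ in range(n_sims):
        counts = [0] * n_days
        for _ in range(n_births):
            counts[rng.randrange(n_days)] += 1   # pick a day at random
        stats.append(gof_statistic(counts, expected))
    return stats


observed_x2 = gof_statistic([13, 23, 24, 20, 27, 18, 15], 20.0)
stats = simulate_null()

mean_x2 = sum(stats) / len(stats)
share_as_large = sum(s >= observed_x2 for s in stats) / len(stats)

print(f"observed X2 = {observed_x2:.1f}")
print(f"average simulated X2 = {mean_x2:.2f}")
print(f"share of simulated X2 at least as large = {share_as_large:.3f}")
```

The average of the simulated statistics should land near 6, and the share of simulated values at least as large as 7.6 should be roughly in line with the P-value of 0.269 from software. The exact figures depend on the seed and the number of repetitions.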

The chi-square test applied to the hypothesis that a categorical variable has a specified distribution is called the test for goodness of fit. The idea is that the test assesses whether the observed counts “fit” the distribution. The only differences between the test of fit and the test for a two-way table are that the expected counts


are based on the distribution specified by the null hypothesis and that the degrees of freedom are one less than the number of possible outcomes in this distribution. Here are the details.

THE CHI-SQUARE TEST FOR GOODNESS OF FIT

A categorical variable has k possible outcomes, with probabilities p1, p2, p3, . . . , pk. That is, pi is the probability of the ith outcome. We have n independent observations from this categorical variable.

To test the null hypothesis that the probabilities have specified values

$$H_0\colon\ p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_k = p_{k0}$$

use the chi-square statistic

$$X^2 = \sum \frac{(\text{count of outcome } i - np_{i0})^2}{np_{i0}}$$

The P-value is the area to the right of X² under the density curve of the chi-square distribution with k − 1 degrees of freedom.

In Example 23.7, the outcomes are days of the week, with k = 7. The null hypothesis says that the probability of a birth on the ith day is pi0 = 1/7 for all days. We observe n = 140 births and count how many fall on each day. These are the counts used in the chi-square statistic.
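The recipe in the box translates directly into code. The following sketch is a minimal illustration: `goodness_of_fit` is a hypothetical name rather than a function from any library, and the final comparison uses only the Table E entry quoted in Example 23.7 (7.84 for tail area 0.25 with 6 degrees of freedom) instead of computing an exact P-value.

```python
import math


def goodness_of_fit(counts, null_probs):
    """Return (X2, degrees of freedom, expected counts) for a test of fit."""
    if len(counts) != len(null_probs):
        raise ValueError("need one null probability per outcome")
    if not math.isclose(sum(null_probs), 1.0):
        raise ValueError("null probabilities must sum to 1")
    n = sum(counts)
    expected = [n * p for p in null_probs]        # n * p_i0 for each outcome
    x2 = sum((obs - exp) ** 2 / exp for obs, exp in zip(counts, expected))
    return x2, len(counts) - 1, expected


births = [13, 23, 24, 20, 27, 18, 15]             # Sun. through Sat., Example 23.7
x2, df, expected = goodness_of_fit(births, [1 / 7] * 7)

critical_value_p25 = 7.84                         # Table E, df = 6, tail area 0.25
print(f"X2 = {x2:.1f}, df = {df}")
print("P-value greater than 0.25:", x2 < critical_value_p25)
```

Because the null probabilities are passed in as a list, the same function handles null hypotheses in which the outcomes are not equally likely.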


APPLY YOUR KNOWLEDGE

23.15 Saving birds from windows. Many birds are injured or killed by flying into windows. It appears that birds don’t see windows. Can tilting windows down so that they reflect earth rather than sky reduce bird strikes? Place six windows at the edge of a woods: two vertical, two tilted 20 degrees, and two tilted 40 degrees. During the next four months, there were 53 bird strikes, 31 on the vertical window, 14 on the 20-degree window, and 8 on the 40-degree window.⁹ If the tilt has no effect, we expect strikes on all three windows to have equal probability. Test this null hypothesis. What do you conclude?

23.16 More on birth days. Births really are not evenly distributed across the days of the week. The data in Example 23.7 failed to reject the null hypothesis that all days are equally likely because of random variation in a quite small number of births. Here are data on 700 births in the same locale:

Day       Sun.   Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
Births     84    110    124    104     94    112     72


(a) The null hypothesis is that all days are equally probable. What are the probabilities specified by this null hypothesis? What are the expected counts for each day in 700 births?

(b) Calculate the chi-square statistic for goodness of fit.

(c) What are the degrees of freedom for this statistic? Do these 700 births give significant evidence that births are not equally probable on all days of the week?

23.17 Course grades. Most students in a large statistics course are taught by teaching assistants (TAs). One section is taught by the course supervisor, a senior professor. The distribution of grades for the hundreds of students taught by TAs this semester was

Grade          A       B       C       D/F
Probability    0.32    0.41    0.20    0.07

The grades assigned by the professor to students in his section were

Grade    A     B     C     D/F
Count    22    38    20    11

(These data are real. We won’t say when and where, but the professor was not the author of this book.)

(a) What percents of each grade did students in the professor’s section earn? In what ways does this distribution of grades differ from the TA distribution?

(b) Because the TA distribution is based on hundreds of students, we are willing to regard it as a fixed probability distribution. If the professor’s grading follows this distribution, what are the expected counts of each grade in his section?

(c) Does the chi-square test for goodness of fit give good evidence that the professor’s grades follow a different distribution? (State hypotheses, check the guideline for using chi-square, give the test statistic and its P-value, and state your conclusion.)

23.18 What’s your sign? The University of Chicago’s General Social Survey (GSS) is the nation’s most important social science sample survey. For reasons known only to social scientists, the GSS regularly asks its subjects their astrological sign. Here are the counts of responses in the most recent year this question was asked:¹⁰

Sign     Aries    Taurus    Gemini    Cancer    Leo    Virgo
Count    225      222       241       240       260    250

Sign     Libra    Scorpio    Sagittarius    Capricorn    Aquarius    Pisces
Count    243      214        200            216          224         244

If births are spread uniformly across the year, we expect all 12 signs to be equally likely. Are they? Follow the four-step process in your answer.


CHAPTER 23 SUMMARY

The chi-square test for a two-way table tests the null hypothesis H0 that there is no relationship between the row variable and the column variable. The alternative hypothesis Ha says that there is some relationship but does not say what kind.

The test compares the observed counts of observations in the cells of the table with the counts that would be expected if H0 were true. The expected count in any cell is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

The chi-square statistic is

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The chi-square test compares the value of the statistic X² with critical values from the chi-square distribution with (r − 1)(c − 1) degrees of freedom. Large values of X² are evidence against H0, so the P-value is the area under the chi-square density curve to the right of X².

The chi-square distribution is an approximation to the distribution of the statistic X². You can safely use this approximation when all expected cell counts are at least 1 and no more than 20% are less than 5.

If the chi-square test finds a statistically significant relationship between the row and column variables in a two-way table, do data analysis to describe the nature of the relationship. You can do this by comparing well-chosen percents, comparing the observed counts with the expected counts, and looking for the largest terms of the chi-square statistic.

STATISTICS IN SUMMARY

Here are the most important skills you should have acquired from reading this chapter.

A. TWO-WAY TABLES

1. Understand that the data for a chi-square test must be presented as a two-way table of counts of outcomes.

2. Use percents to describe the relationship between any two categorical variables, starting from the counts in a two-way table.

B. INTERPRETING CHI-SQUARE TESTS

1. Locate the chi-square statistic, its P-value, and other useful facts (row or column percents, expected counts, terms of chi-square) in output from your software or calculator.


2. Use the expected counts to check whether you can safely use the chi-square test.

3. Explain what null hypothesis the chi-square statistic tests in a specific two-way table.

4. If the test is significant, compare percents, compare observed with expected cell counts, or look for the largest terms of the chi-square statistic to see what deviations from the null hypothesis are most important.

C. DOING CHI-SQUARE TESTS BY HAND

1. Calculate the expected count for any cell from the observed counts in a two-way table. Check whether you can safely use the chi-square test.

2. Calculate the term of the chi-square statistic for any cell, as well as the overall statistic.

3. Give the degrees of freedom of a chi-square statistic. Make a quick assessment of the significance of the statistic by comparing the observed value with the degrees of freedom.

4. Use the chi-square critical values in Table E to approximate the P-value of a chi-square test.

CHECK YOUR SKILLS

The National Survey of Adolescent Health interviewed several thousand teens (grades 7 to 12). One question asked was “What do you think are the chances you will be married in the next ten years?” Here is a two-way table of the responses by sex:¹¹

                                 Female    Male
Almost no chance                   119      103
Some chance, but probably not      150      171
A 50-50 chance                     447      512
A good chance                      735      710
Almost certain                    1174      756

23.19 The number of female teenagers in the sample is

(a) 4877. (b) 2625. (c) 2252.

23.20 The percent of the females in the sample who responded “almost certain” is about

(a) 44.7%. (b) 39.6%. (c) 33.6%.

23.21 The percent of the females in the sample who responded “almost certain” is

(a) higher than the percent of males who felt this way.

(b) about the same as the percent of males who felt this way.

(c) lower than the percent of males who felt this way.

23.22 The expected count of females who respond “almost certain” is about

(a) 464.6. (b) 891.2. (c) 1038.8.


23.23 The term in the chi-square statistic for the cell of females who respond “almost certain” is about

(a) 17.6. (b) 15.6. (c) 0.1.

23.24 The degrees of freedom for the chi-square test for this two-way table are

(a) 4. (b) 8. (c) 20.

23.25 The null hypothesis for the chi-square test for this two-way table is

(a) Equal proportions of female and male teenagers are almost certain they will be married in ten years.

(b) There is no difference between female and male teenagers in their distributions of opinions about marriage.

(c) There are equal numbers of female and male teenagers.

23.26 The alternative hypothesis for the chi-square test for this two-way table is

(a) Female and male teenagers do not have the same distribution of opinions about marriage.

(b) Female teenagers are more likely than male teenagers to think it is almost certain they will be married in ten years.

(c) Female teenagers are less likely than male teenagers to think it is almost certain they will be married in ten years.

23.27 Software gives chi-square statistic X² = 69.8 for this table. From the table of critical values, we can say that the P-value is

(a) between 0.0025 and 0.001.

(b) between 0.001 and 0.0005.

(c) less than 0.0005.

23.28 The most important fact that allows us to trust the results of the chi-square test is that

(a) the sample is large, 4877 teenagers in all.

(b) the sample is close to an SRS of all teenagers.

(c) all of the cell counts are greater than 100.

CHAPTER 23 EXERCISES

If you have access to software or a graphing calculator, use it to speed your analysis of the data in these exercises. Exercises 23.29 to 23.38 are suitable for hand calculation if necessary.

23.29 Who’s online? A sample survey by the Pew Internet and American Life Project asked a random sample of adults about use of the Internet and about the type of community they lived in. Here, repeated from Exercise 23.4, is the two-way table:

                        Community Type
                  Rural    Suburban    Urban
Internet users     433       1072       536
Nonusers           463        627       388


(a) Give a 95% confidence interval for the difference between the proportions of rural and suburban adults who use the Internet.

(b) What is the overall pattern of the relationship between Internet use and community type? Is the relationship statistically significant?


23.30 Child care workers. A large study of child care used samples from the datatapes of the Current Population Survey over a period of several years. The result isclose to an SRS of child care workers. The Current Population Survey has threeclasses of child care workers: private household, nonhousehold, and preschoolteacher. Here are data on the number of blacks among women workers in thesethree classes:12

Total Black

Household 2455 172Nonhousehold 1191 167Teachers 659 86

(a) What percent of each class of child care workers is black?

(b) Make a two-way table of class of worker by race (black or other).

(c) Can we safely use the chi-square test? What null and alternative hypothesesdoes X2 test?

(d) The chi-square statistic for this table is X2 = 53.194. What are its degrees offreedom? What is the mean of X2 if the null hypothesis is true? Use Table E toapproximate the P-value of the test.

(e) What do you conclude from these data?

23.31 Free speech for racists? The General Social Survey (GSS) for 2002 asked this question: “Consider a person who believes that Blacks are genetically inferior. If such a person wanted to make a speech in your community claiming that Blacks are inferior, should he be allowed to speak, or not?” Here are the responses, broken down by the race of the respondent:13

               Black   White   Other

Allowed           67     476      35
Not allowed       53     252      17

(a) Because the GSS is essentially an SRS of all adults, we can combine the races in these data and give a 99% confidence interval for the proportion of all adults who would allow a racist to speak. Do this.

(b) Find the column percents and use them to compare the attitudes of the three racial groups. How significant are the differences found in the sample?

23.32 Do you use cocaine? Sample surveys on sensitive issues can give different results depending on how the question is asked. A University of Wisconsin study divided 2400 respondents into 3 groups at random. All were asked if they had ever used cocaine. One group of 800 was interviewed by phone; 21% said they had used cocaine. Another 800 people were asked the question in a one-on-one personal interview; 25% said “Yes.” The remaining 800 were allowed to make an anonymous written response; 28% said “Yes.”14 Are there statistically significant differences among these proportions? State the hypotheses, convert the information given into a two-way table of counts, give the test statistic and its P-value, and state your conclusions.

23.33 Ethnicity and seat belt use. How does seat belt use vary with drivers’ race or ethnic group? The answer depends on gender (males are less likely to buckle up) and also on location. Here are data on a random sample of male drivers observed in Houston:15

            Drivers   Belted

Black           369      273
Hispanic        540      372
White           257      193

(a) The table gives the number of drivers in each group and the number of these who were wearing seat belts. Make a two-way table of group by belted or not.

(b) Are there statistically significant differences in seat belt use among men in these three groups? If there are, describe the differences.

23.34 Did the randomization work? After randomly assigning subjects to treatments in a randomized comparative experiment, we can compare the treatment groups to see how well the randomization worked. We hope to find no significant differences among the groups. A study of how to provide premature infants with a substance essential to their development assigned infants at random to receive one of four types of supplement, called PBM, NLCP, PL-LCP, and TG-LCP.16

(a) The subjects were 77 premature infants. Outline the design of the experiment if 20 are assigned to the PBM group and 19 to each of the other treatments.

(b) The random assignment resulted in 9 females in the TG-LCP group and 11 females in each of the other groups. Make a two-way table of group by gender and do a chi-square test to see if there are significant differences among the groups. What do you find?
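The random assignment described in part (a) can be illustrated with a short Python sketch. It is only one possible way to carry out a completely randomized design: the subject labels 1 to 77, the seed, and the function name are assumptions made for illustration and do not come from the study.

```python
import random


def assign_groups(subject_ids, group_sizes, seed=None):
    """Shuffle the subjects, then deal them out to the groups in order."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    groups = {}
    start = 0
    for name, size in group_sizes.items():
        groups[name] = sorted(shuffled[start:start + size])
        start += size
    return groups


# Group sizes from part (a): 20 infants get PBM, 19 get each other supplement.
sizes = {"PBM": 20, "NLCP": 19, "PL-LCP": 19, "TG-LCP": 19}

# Hypothetical labels 1..77 for the infants; the seed only makes the run repeatable.
groups = assign_groups(range(1, 78), sizes, seed=2006)
for name, members in groups.items():
    print(name, len(members), members)
```

Because every ordering of the shuffled labels is equally likely, each infant has the same chance of landing in any given position, which is what the design calls for.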

23.35 Opinions about the death penalty. “Do you favor or oppose the death penalty for persons convicted of murder?” When the General Social Survey asked this question in its 2002 survey, the responses of people whose highest education was a bachelor’s degree and of people with a graduate degree were as follows:17

            Favor   Oppose

Bachelor      135       71
Graduate       64       50

(a) Is there evidence that the proportions of all people at these levels of education who favor the death penalty differ? Find the two sample proportions, the z statistic, and its P-value.


(b) Is there evidence that the opinions of all people at these levels of education differ? Find the chi-square statistic X² and its P-value. If your work is correct, X² should be the same as z² and the two P-values should be identical.
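The agreement between the two tests claimed in part (b) can be checked numerically. The Python sketch below is an illustration using only the standard library; the variable names are assumptions. It uses the pooled two-sample z statistic, and it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard Normal variable, so both P-values can be written with the complementary error function.

```python
import math

# Counts from the table in Exercise 23.35: rows are [favor, oppose].
table = {"Bachelor": [135, 71], "Graduate": [64, 50]}

# Two-sample z statistic with a pooled standard error.
(f1, o1), (f2, o2) = table["Bachelor"], table["Graduate"]
n1, n2 = f1 + o1, f2 + o2
p1, p2 = f1 / n1, f2 / n2
pooled = (f1 + f2) / (n1 + n2)
se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided Normal P-value

# Chi-square statistic for the same 2 x 2 table.
rows = list(table.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)
x2 = sum(
    (rows[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
p_x2 = math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square with 1 df

print(f"z^2 = {z * z:.4f}, X^2 = {x2:.4f}")
print(f"P (z test) = {p_z:.4f}, P (chi-square) = {p_x2:.4f}")
```

The match holds only for the two-sided z test; a one-sided alternative has no chi-square counterpart.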

23.36 Unhappy rats and tumors. Some people think that the attitude of cancer patients can influence the progress of their disease. We can’t experiment with humans, but here is a rat experiment on this theme. Inject 60 rats with tumor cells and then divide them at random into two groups of 30. All the rats receive electric shocks, but rats in Group 1 can end the shock by pressing a lever. (Rats learn this sort of thing quickly.) The rats in Group 2 cannot control the shocks, which presumably makes them feel helpless and unhappy. We suspect that the rats in Group 1 will develop fewer tumors. The results: 11 of the Group 1 rats and 22 of the Group 2 rats developed tumors.18

(a) State the null and alternative hypotheses for this investigation. Explain why the z test rather than the chi-square test for a 2 × 2 table is the proper test.

(b) Carry out the test and report your conclusion.

23.37 Regulating guns. The National Gun Policy Survey, conducted by the National Opinion Research Center at the University of Chicago, asked a random sample of adults many questions about regulation of guns in the United States. One of the questions was “Do you think there should be a law that would ban possession of handguns except for the police and other authorized persons?” Figure 23.7 displays Minitab output that includes the two-way table of response versus the respondents’ highest level of education.19

         Less than  High school         Some      College Postgraduate
       high school     graduate      college     graduate       degree          All

Yes             58           84          169           98           77          486
             50.00        39.44        36.50        42.06        43.75        40.47
            2.6055       0.0558       1.7989       0.1463       0.4690            *

No              58          129          294          135           99          715
             50.00        60.56        63.50        57.94        56.25        59.53
            1.7710       0.0379       1.2228       0.0994       0.3188            *

All            116          213          463          233          176         1201
            100.00       100.00       100.00       100.00       100.00       100.00
                 *            *            *            *            *            *

Cell Contents:      Count
                    % of Column
                    Contribution to Chi-square

Pearson Chi-Square = 8.525, DF = 4, P-Value = 0.074

FIGURE 23.7 Minitab output for the sample survey responses of Exercise 23.37.

(a) The column percents show the breakdown of responses separately for each level of education. Which education groups show particularly high and low support for the proposed law? Which education group’s responses contribute the most to the size of the chi-square statistic? Is there a consistent direction in the relationship, such as “people with more education are more likely to support strong gun laws”?

(b) Verify the degrees of freedom given by Minitab. How does the value of the chi-square statistic compare with its mean under the null hypothesis? What do you conclude from the chi-square test?
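To see where the numbers in the Minitab output of Figure 23.7 come from, the following Python sketch recomputes each cell's contribution to chi-square, the statistic itself, and the P-value from the counts alone. It is a sketch under stated assumptions: the variable names are illustrative, and the P-value line uses the closed form for the upper tail of a chi-square distribution with exactly 4 degrees of freedom, exp(-x/2)(1 + x/2), which does not apply to other degrees of freedom.

```python
import math

levels = ["Less than high school", "High school graduate", "Some college",
          "College graduate", "Postgraduate degree"]
yes = [58, 84, 169, 98, 77]    # counts from the "Yes" row of Figure 23.7
no = [58, 129, 294, 135, 99]   # counts from the "No" row of Figure 23.7

col_totals = [y + n for y, n in zip(yes, no)]
total = sum(col_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = []
for row in (yes, no):
    row_total = sum(row)
    contributions.append([
        (obs - row_total * ct / total) ** 2 / (row_total * ct / total)
        for obs, ct in zip(row, col_totals)
    ])

x2 = sum(sum(r) for r in contributions)
df = (2 - 1) * (len(levels) - 1)             # (rows - 1) * (columns - 1)
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)   # valid only because df == 4

for level, c_yes, c_no in zip(levels, *contributions):
    print(f"{level:22s} yes: {c_yes:.4f}  no: {c_no:.4f}")
print(f"X^2 = {x2:.3f}, df = {df}, P-value = {p_value:.3f}")
```

The printed contributions should match the third entry in each cell of the Minitab table, and the last line should match its summary line up to rounding.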

23.38 I think I’ll be rich by age 30. A sample survey of young adults (aged 19 to 25) asked, “What do you think are the chances you will have much more than a middle-class income at age 30?” The CrunchIt! output in Figure 23.8 shows the two-way table and related information, omitting a few subjects who refused to respond or who said they were already rich.20

Cell format: Count (Column percent)

                                    Male            Female             Total

Almost no chance             98 (3.985%)       96 (4.056%)       194 (4.02%)
Some, but probably not      286 (11.63%)         426 (18%)      712 (14.75%)
A 50-50 chance              720 (29.28%)       696 (29.4%)     1416 (29.34%)
A good chance               758 (30.83%)      663 (28.01%)     1421 (29.44%)
Almost certain              597 (24.28%)      486 (20.53%)     1083 (22.44%)
Total                     2459 (100.00%)    2367 (100.00%)    4826 (100.00%)

Statistic      DF      Value       P-Value
Chi-square      4      43.94552    <0.0001

FIGURE 23.8 CrunchIt! output for the sample survey responses of Exercise 23.38.


Use the output as the basis for a discussion of the differences between young men and young women in assessing their chances of being rich by age 30.

The remaining exercises concern larger tables that require software for easy analysis. Follow the Formulate, Solve, and Conclude steps of the four-part process in your answers to these exercises. It may be helpful to restate in your own words the State information given in the exercise.

23.39 Students and catalog shopping. What is the most important reason that students buy from catalogs? The answer may differ for different groups of students. Here are results for samples of American and East Asian students at a large midwestern university:21

                        American   Asian

Save time                     29      10
Easy                          28      11
Low price                     17      34
Live far from stores          11       4
No pressure to buy            10       3
Other reason                  20       7

Total                        115      69

Describe the most important differences between American and Asian students. Is there a significant overall difference between the two distributions of responses?

23.40 Where do young adults live? A survey by the National Institutes of Health asked a random sample of young adults (aged 19 to 25), “Where do you live now? That is, where do you stay most often?” We earlier (page 513) compared the proportions of men and women who lived with their parents. Here now is the full two-way table (omitting a few who refused to answer and one who claimed to be homeless):22

                          Female   Male

Parents’ home                923    986
Another person’s home        144    132
Own place                   1294   1129
Group quarters               127    119

What are the most important differences between young men and women? Are their choices of living places significantly different?

23.41 How are schools doing? The nonprofit group Public Agenda conducted telephone interviews with a stratified sample of parents of high school children. There were 202 black parents, 202 Hispanic parents, and 201 white parents. One question asked was “Are the high schools in your state doing an excellent, good, fair or poor job, or don’t you know enough to say?” Here are the survey results:23

                Black     Hispanic   White
                parents   parents    parents

Excellent          12        34         22
Good               69        55         81
Fair               75        61         60
Poor               24        24         24
Don’t know         22        28         14

Total             202       202        201

Are the differences in the distributions of responses for the three groups of parents statistically significant? What departures from the null hypothesis “no relationship between group and response” contribute most to the value of the chi-square statistic? Write a brief conclusion based on your analysis.


23.42 The Mediterranean diet. Cancer of the colon and rectum is less common in the Mediterranean region than in other Western countries. The Mediterranean diet contains little animal fat and lots of olive oil. Italian researchers compared 1953 patients with colon or rectal cancer with a control group of 4154 patients admitted to the same hospitals for unrelated reasons. They estimated consumption of various foods from a detailed interview, then divided the patients into three groups according to their consumption of olive oil. Here are some of the data:24

                          Olive Oil

                  Low   Medium   High   Total

Colon cancer      398      397    430    1225
Rectal cancer     250      241    237     728
Controls         1368     1377   1409    4154

(a) Is this study an experiment? Explain your answer.

(b) The investigators report that “less than 4% of cases or controls refused to participate.” Why does this fact strengthen our confidence in the results?

(c) The researchers conjectured that high olive oil consumption would be more common among patients without cancer than among patients with colon cancer or rectal cancer. What do the data say?

23.43 Market research. Before bringing a new product to market, firms carry out extensive studies to learn how consumers react to the product and how best to advertise its advantages. Here are data from a study of a new laundry detergent.25 The subjects are people who don’t currently use the established brand that the new product will compete with. Give subjects free samples of both detergents. After they have tried both for a while, ask which they prefer. The answers may depend on other facts about how people do laundry.

                                          Laundry Practices

                           Soft water,   Soft water,   Hard water,   Hard water,
                           warm wash     hot wash      warm wash     hot wash

Prefer standard product        53            27            42            30
Prefer new product             63            29            68            42

How do laundry practices (water hardness and wash temperature) influence the choice of detergent? In which settings does the new detergent do best? Are the differences between the detergents statistically significant?

Support for political parties. Political parties anxiously ask what groups of people support them. The General Social Survey (GSS) asked its 2002 sample, “Generally speaking, do you usually think of yourself as a Republican, Democrat, Independent, or what?” Here is a large two-way table breaking down the responses by age group:26

                                           Age Group

                                 18–30   31–40   41–55   56–89

Strong Democrat                     60      83     113     151
Not strong Democrat                 99     126     138     148
Independent, near Democrat          72      56      77      62
Independent                        152     124     149     102
Independent, near Republican        53      41      50      54
Not strong Republican               90      85     133     138
Strong Republican                   42      56      89     127
Other party                          9      12      14      13

Exercises 23.44 to 23.46 are based on this table.

23.44 Other parties. The GSS is essentially an SRS of American adults. Give a 95% confidence interval for the proportion of adults who support “other parties.”

23.45 Party support. Make a 2 × 4 table by combining the counts in the three rows that mention Democrat and in the three rows that mention Republican and ignoring strict independents and supporters of other parties. We might think of this table as comparing all adults who lean Democrat and all adults who lean Republican. How does support of the two major parties differ among age groups?

23.46 Politics and age. Use the full table to analyze the differences in political party support among age groups. The sample is so large that the differences are bound to be highly significant, but give the chi-square statistic and its P-value nonetheless. The main challenge is in seeing what the data say. Does the full table yield any insights not found in the compressed table you analyzed in the previous exercise?


EESEE CASE STUDIES

The Electronic Encyclopedia of Statistical Examples and Exercises (EESEE) is available on the text CD and Web site. These more elaborate stories, with data, provide settings for longer case studies. Here are some suggestions for EESEE stories that apply the chi-square test.

23.47 Read the EESEE story “Surgery in a Blanket.” Write a report that answers Questions 1, 3, 5, 6, and 7 for this case study.

23.48 Read the EESEE story “Trilobite Bites.” Write a report that answers Questions 1, 2, 4, and 5 for this case study.