Chi-Square Test ( χ 2 )


Transcript of Chi-Square Test ( χ 2 )

Page 1: Chi-Square Test ( χ 2 )

Chi-Square Test (χ2)

χ is the Greek letter “chi”

Page 2: Chi-Square Test ( χ 2 )

When is the Chi-Square Test used?

The chi-square test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

The chi-square test is also used to test whether two categorical variables are independent of each other.

If there is a significant difference, the computed χ2 exceeds the critical value for the stated significance level σ (equivalently, the p-value is smaller than σ). Usual values of σ are 1%, 5%, or 10%.

Note that the significance level (σ here, more commonly written α) is always given in the problem.

Page 3: Chi-Square Test ( χ 2 )

Chi-Square Test Requirements

1. Quantitative data, recorded as counts per category.
2. One or more categories.
3. Independent observations.
4. Adequate sample size (at least 10; a common rule of thumb also asks for an expected frequency of at least 5 in each category).
5. Simple random sample.
6. Data in frequency form.
7. All observations must be used.

Page 4: Chi-Square Test ( χ 2 )

How to find the value of χ2?

Consider this problem: Carl, the manager of a car dealership, did not want to stock cars that were bought less frequently because of their unpopular color. The five colors that he ordered were red, yellow, green, blue, and white. According to Carl, the expected frequencies, or the number of customers choosing each color, should follow last year's percentages. He felt 20% would choose yellow, 30% would choose red, 10% would choose green, 10% would choose blue, and 30% would choose white. He then took a random sample of 150 customers and asked them their color preferences. Is there a significant difference between the observed and expected frequencies? σ = 5%

Page 5: Chi-Square Test ( χ 2 )

Color Preferences for 150 Customers:

Category Color   Observed Frequency
Yellow           35
Red              50
Green            30
Blue             10
White            25

Page 6: Chi-Square Test ( χ 2 )

We are testing whether Carl’s expected frequencies “fit” the observed frequencies. That is why this use of the chi-square test is called a goodness-of-fit test: it measures how well a set of expected frequencies fits the observed data.

Page 7: Chi-Square Test ( χ 2 )

We must first state our hypotheses (Ho and Ha).

Null hypothesis (Ho): There is no significant difference between the expected and observed frequencies.

Alternative hypothesis (Ha): There is a significant difference between the expected and observed frequencies.

In other words, if the computed χ2 falls within our area of rejection (the region beyond our chi critical value), we reject our null hypothesis. Otherwise, we fail to reject it.

Page 8: Chi-Square Test ( χ 2 )

Formula for Calculating χ2

χ2 = Σ [ (O - E)^2 / E ]

where O is the observed frequency and E is the expected frequency, and the sum runs over all categories.

We already know the observed frequencies, which were listed in the table on Page 5. We need to find the expected frequencies.

Page 9: Chi-Square Test ( χ 2 )

How to get Expected Frequency (E)

We multiply the total number of customers by the percentage for each color.

For Yellow: 150 x 0.2 = 30

For Red: 150 x 0.3 = 45

… and so on. The values are tabulated in the next slide.

Page 10: Chi-Square Test ( χ 2 )

Category Color   Observed Frequency   Expected Frequency
Yellow           35                   30
Red              50                   45
Green            30                   15
Blue             10                   15
White            25                   45
Total:           150                  150
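The expected column can be reproduced with a few lines of Python. This is a minimal sketch that assumes only the sample size and Carl's percentages from the problem statement; the variable names are illustrative.

```python
# Expected frequency for each color = sample size x hypothesised percentage.
sample_size = 150

# Carl's expected percentages from the problem statement.
expected_percent = {
    "Yellow": 20,
    "Red": 30,
    "Green": 10,
    "Blue": 10,
    "White": 30,
}

# The percentages must cover the whole sample.
assert sum(expected_percent.values()) == 100

expected = {
    color: sample_size * percent / 100
    for color, percent in expected_percent.items()
}

for color, count in expected.items():
    print(f"{color}: {count:g}")
```

The expected frequencies add up to the sample size, which is a quick sanity check before computing χ2.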

Page 11: Chi-Square Test ( χ 2 )

Getting χ2

To get χ2, we square the difference between each observed frequency and its expected frequency, divide each square by the corresponding expected frequency, and add up the results.

For our first category (Yellow): 35 - 30 = 5; 5^2 = 25; 25 / 30 (E) = 5/6

For the second category (Red): 50 - 45 = 5; 5^2 = 25; 25 / 45 (E) = 5/9

Doing the same for the remaining three categories and adding all five values gives our chi-square statistic.

Page 12: Chi-Square Test ( χ 2 )

Category Color   Observed Frequency   Expected Frequency   O-E    (O-E)^2   (O-E)^2/E
Yellow           35                   30                     5      25        0.83
Red              50                   45                     5      25        0.56
Green            30                   15                    15     225       15.00
Blue             10                   15                    -5      25        1.67
White            25                   45                   -20     400        8.89

χ2 = 26.95 (sum of the rounded terms; about 26.94 without intermediate rounding)
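The same sum can be checked in code. The sketch below uses the observed and expected frequencies from the table; `chi_square_statistic` is a helper name chosen for illustration. It works with unrounded terms, so it gives about 26.94, while adding the two-decimal terms by hand gives 26.95.

```python
# Observed and expected frequencies, in the order Yellow, Red, Green, Blue, White.
observed = [35, 50, 30, 10, 25]
expected = [30, 45, 15, 15, 45]


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2_stat = chi_square_statistic(observed, expected)
print(round(chi2_stat, 2))
```

The small gap between the two figures comes only from rounding and does not affect the decision.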

Page 13: Chi-Square Test ( χ 2 )

Calculating our Chi Critical Value (χc2)

To get χc2, we look up our critical value in the chi-square table using the degrees of freedom (Df) and the significance level (σ).

Df = number of categories - 1 = 5 - 1 = 4
σ = 0.05

Page 14: Chi-Square Test ( χ 2 )

Df = 4
σ = 0.05

χc2 = 9.49

χ2 = 26.95
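The table lookup can be cross-checked without a statistics library. For 4 degrees of freedom, and only in that case, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2) * (1 + x/2). The sketch below uses it to confirm that 9.49 leaves about 5% in the tail and to estimate the p-value of the statistic; `chi2_sf_df4` is a hypothetical helper name.

```python
import math


def chi2_sf_df4(x):
    """Upper-tail probability P(X > x) for a chi-square variable with Df = 4.

    Valid only for 4 degrees of freedom, where the tail equals
    exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2) * (1 + x / 2)


tail_at_critical = chi2_sf_df4(9.49)  # should be close to the 0.05 level
p_value = chi2_sf_df4(26.95)          # tail area beyond the computed statistic

print(f"tail area beyond 9.49: {tail_at_critical:.4f}")
print(f"p-value for 26.95: {p_value:.6f}")
```

A p-value far below 0.05 agrees with the comparison of the statistic against the critical value.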

Page 15: Chi-Square Test ( χ 2 )

Conclusion: χ2 (26.95) is much larger than our chi critical value χc2 (9.49). On the chi-square distribution graph, our chi statistic lies inside the area of rejection, which is the region beyond the chi critical value. We therefore reject our null hypothesis at the 5% significance level: the observed color preferences do not fit Carl’s expected distribution.

Page 16: Chi-Square Test ( χ 2 )

TEST FOR INDEPENDENCE

Problem: In a certain town, there are about one million eligible voters. A simple random sample of 10,000 eligible voters was chosen to study the relationship between sex and participation in the last election. The contingency table is shown on Page 18. We want to find out whether sex and voting are independent. σ = 0.05

Page 17: Chi-Square Test ( χ 2 )

Null and Alternate Hypotheses

Ho: Sex and voting are independent.

Ha: Sex and voting are dependent.

Page 18: Chi-Square Test ( χ 2 )

OBSERVED FREQUENCIES (Contingency Table)

              Men     Women   Total
Voted         2792    3591    6383
Didn’t vote   1486    2131    3617
Total         4278    5722    10000

Page 19: Chi-Square Test ( χ 2 )

Formula for Expected Frequency

The expected frequency of each cell is given by the formula:

Expected frequency = (Row Total x Column Total) / Grand Total

For example, for men who voted: (6383 x 4278) / 10000 ≈ 2731.

Page 20: Chi-Square Test ( χ 2 )

EXPECTED FREQUENCIES (rounded to whole numbers)

              Men     Women   Total
Voted         2731    3652    6383
Didn’t vote   1547    2070    3617
Total         4278    5722    10000
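The expected table follows mechanically from the row and column totals of the observed counts. A minimal sketch, with illustrative variable names:

```python
# Observed counts: rows are voted / didn't vote, columns are men / women.
observed = [
    [2792, 3591],
    [1486, 2131],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell = row total x column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# The slides report the expected counts as whole numbers.
expected_rounded = [[round(e) for e in row] for row in expected]
print(expected_rounded)
```

Before rounding, the first cell is 2730.6474, so the whole-number table is a slight approximation.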

Page 21: Chi-Square Test ( χ 2 )

Summary of Frequencies

                    Observed   Expected   (O-E)^2/E
Men voted           2792       2731       1.363
Men didn’t vote     1486       1547       2.405
Women voted         3591       3652       1.019
Women didn’t vote   2131       2070       1.798

χ2 = 6.58 (the sum of the four terms above)
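The statistic can be recomputed in code as a check on the arithmetic. The sketch below (helper names are illustrative) evaluates it twice: once with whole-number expected counts as in the slides, where the four terms add up to about 6.58, and once with unrounded expected counts, which gives about 6.66.

```python
# Observed counts: rows are voted / didn't vote, columns are men / women.
observed = [
    [2792, 3591],
    [1486, 2131],
]


def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(obs, exp):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


exact_expected = expected_counts(observed)
rounded_expected = [[round(e) for e in row] for row in exact_expected]

chi2_exact = chi_square_statistic(observed, exact_expected)
chi2_rounded = chi_square_statistic(observed, rounded_expected)
print(f"unrounded expected counts: {chi2_exact:.2f}")
print(f"whole-number expected counts: {chi2_rounded:.2f}")
```

Both values lead to the same decision at the 5% level; the difference is purely a rounding effect.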

Page 22: Chi-Square Test ( χ 2 )

Computing Df (degrees of freedom)

The degrees of freedom for the chi-square test of independence is:

Df = (rows - 1) * (columns - 1)

Df = (2 - 1) * (2 - 1) = 1 * 1 = 1

We then get our chi critical value by looking up significance level 0.05 and Df of 1 in the chi-square table.

Page 23: Chi-Square Test ( χ 2 )

χc2 = 3.84

χ2 = 6.58 (the sum of the four (O-E)^2/E terms in the summary table)
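With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function as erfc(sqrt(x/2)). The sketch below uses that identity to check the critical value and to estimate a p-value, taking the statistic as the sum of the four tabulated terms (about 6.58); `chi2_sf_df1` is a hypothetical helper name.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with Df = 1.

    Valid only for 1 degree of freedom, where X is the square of a
    standard normal variable.
    """
    return math.erfc(math.sqrt(x / 2))


tail_at_critical = chi2_sf_df1(3.84)  # should be close to the 0.05 level
p_value = chi2_sf_df1(6.58)           # tail area beyond the computed statistic

print(f"tail area beyond 3.84: {tail_at_critical:.4f}")
print(f"p-value for 6.58: {p_value:.4f}")
```

The p-value comes out near 0.01, below the 0.05 significance level.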

Page 24: Chi-Square Test ( χ 2 )

Conclusion:

Since our χ2 is greater than χc2 (3.84), we reject our null hypothesis and conclude that sex and voting are dependent in the town.

Page 25: Chi-Square Test ( χ 2 )

Summary of Formulas:

Goodness of Fit: Df = number of categories - 1

Test for Independence (Contingency Table): Df = (rows-1)(columns-1)