Post on 02-Jun-2018
8/10/2019 Ch. 12 Examples-Illustrations
Ch. 12 Examples & Illustrations
An illustration of the shapes of the $\chi^2$ distribution is given below for df equal to 2, 5 and 10. You can
see that it progressively approaches a symmetrical-looking shape.
[Figure: Chi-square distribution curves for df = 2, df = 5 and df = 10]
df = 5: P(lower) = .9999, P(upper) = .0001 at Chi-square = 25.74
df = 10: P(lower) = 1.0000, P(upper) = 1.00E-05 at Chi-square = 41.30
The tables for the $\chi^2$ distribution for various degrees of freedom from 1 to 100 are given at the
back of the book. The degrees of freedom are indicated in the first column. For any degree of
freedom, the $\chi^2$ value given in the tables is the value that has the area to its right
indicated by the subscript of $\chi^2$ in the top row. For example, for 10 degrees of freedom the
$\chi^2$ value that has 10 percent of the area to its right is 15.9871. The $\chi^2$ distribution is
generally used for the Goodness of Fit Test or the Test of Independence.
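The table lookup described above can also be checked numerically. The following is a minimal sketch, not the method used to build the book's tables: it assumes the standard series expansion of the regularized incomplete gamma function, and the function name `chi2_upper_tail` is hypothetical, introduced only for illustration.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0   # a chi-square variable is a gamma variable with shape df/2
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    n = a
    for _ in range(10000):
        n += 1.0
        term *= z / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    lower_area = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower_area

# Table example from the text: df = 10 and chi-square = 15.9871
# should leave roughly 10 percent of the area to the right.
print(round(chi2_upper_tail(15.9871, 10), 3))
```

The same function can be used to turn a calculated test statistic into a p-value, since the p-value of a Chi-square test is the area to the right of the statistic.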
1. The $\chi^2$ Test of Goodness of Fit in the Case of Equal Expected Frequencies
Example 1: A marketing manager for a manufacturer of sports cards plans to begin a new series
with the pictures and playing statistics of six former major league baseball players. She sets up a
booth and sells 120 cards. She wants to find out whether the picture of any player (with playing
statistics) makes a significant difference in the sale of the sports cards or not.
H0: Sales are independent of the player's profile, i.e. all players are treated equally
H1: At least one player is not treated the same as the others
The sample results are given below:
Player    No. of cards (fo)    Expected Freq (fe)
A                13                   20
B                30                   20
C                14                   20
D                15                   20
E                28                   20
F                20                   20
_____________________________________________
Total           120                  120
Note that a general convention is to consider n sufficiently large for the Chi-square test if all the
expected frequencies (fe) are at least equal to 5. If the expected frequency of any cell falls
below 5, it is better to combine it with another category. The formula to calculate Chi-square
is given below:
$\chi^2 = \sum (f_o - f_e)^2 / f_e$
This statistic is distributed as $\chi^2$ with k-1 degrees of freedom, where k is the number of
categories. We have to subtract one because only five of the six frequencies can be arbitrarily
determined once the total is fixed. The sixth is determined when the five others and the total are
given. Let us do the calculations indicated by the formula as follows:
fo - fe    (fo - fe)²    (fo - fe)² / fe
  -7           49             2.45
  10          100             5.00
  -6           36             1.80
  -5           25             1.25
   8           64             3.20
   0            0             0.00
Total                        13.70 = calculated $\chi^2$
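The hand calculation above can be reproduced in a few lines. This is only a sketch of the arithmetic in Example 1; the function name `gof_equal_expected` is hypothetical, and the observed counts are those of the six players in the sample.

```python
# Observed card sales for players A to F (Example 1)
observed = {"A": 13, "B": 30, "C": 14, "D": 15, "E": 28, "F": 20}

def gof_equal_expected(observed_counts):
    """Chi-square statistic and df when every category has the same expected frequency."""
    n = sum(observed_counts)
    k = len(observed_counts)
    fe = n / k  # equal expected frequency, here 120 / 6 = 20
    chi_sq = sum((fo - fe) ** 2 / fe for fo in observed_counts)
    return chi_sq, k - 1

chi_sq, df = gof_equal_expected(list(observed.values()))
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The df returned is k-1 because, as noted above, only five of the six frequencies are free once the total of 120 is fixed.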
For 6-1 = 5 degrees of freedom, the table value is $\chi^2_{.05}$ = 11.0705.
Since the calculated test statistic (13.70) exceeds the table value, we reject the null hypothesis
that all players are treated equally at the 5% level of significance. In other words, the sample
does not fit the hypothesis of a uniform distribution, or equal expected frequencies. But
$\chi^2_{.01}$ is 15.0863, which the calculated statistic does not exceed, so we
cannot reject the null hypothesis at the 1% level. The impact of the players' photos and profiles is
somewhat significant but not strongly significant.
The MegaStat result is given below. In Excel, create two columns: one for observed and one for
expected frequencies. Then go to Chi-square/Crosstab, select Goodness of Fit, fill in the input
section of the dialogue box, and put 0 for the number of parameters estimated to get the
results. The following table shows all the calculations we did above using a calculator. It gives a
value of the calculated test statistic exactly equal to what we obtained.
Goodness of Fit Test
observed    expected     O - E    (O - E)² / E
      13      20.000    -7.000       2.450
      30      20.000    10.000       5.000
      14      20.000    -6.000       1.800
      15      20.000    -5.000       1.250
      28      20.000     8.000       3.200
      20      20.000     0.000       0.000
     120     120.000     0.000      13.700
13.70 chi-square
5 df
.0176 p-value
The p-value indicates that we can reject the null hypothesis at the 5% level but not at the 1% level.
2. The $\chi^2$ Test of Goodness of Fit in the Case of Unequal Expected Frequencies
Example 2: A recent (hypothetical) national survey of hospital admissions for people between 25
and 50 years of age who had hospital admissions during a two-year period showed that 40% had 1
admission only, 20% had 2 admissions, 14% had 3 admissions, 10% had 4 admissions, 8% had
5 admissions, 6% had 6 admissions and only 2% had 7 or more admissions. The mayor of a
small city claims that his city is much healthier than the national average. He even cites the
percentages for the two extreme categories. He says that 44% of the local population in the given
age group have only one hospital admission (compared to 40% nationally) and that the percentage
with 6 or more admissions is only 5%, compared to 8% nationally. His claim was in fact based on a
sample of 400 randomly selected people in the specified age group who were interviewed by a
local newspaper. It was revealed that 176 people had only 1 admission, 75 had 2 admissions, 50
had 3 admissions, 44 had 4 admissions, 35 had 5 admissions, 15 had 6 admissions and only 5 had
7 or more admissions. Is the claim of the mayor valid? Test at 5% and 10%.
Looking at the two extreme categories, the mayor's claim seems to have strong evidence. But
statisticians at the local university wanted to test the claim using more scientific methods. Does
the overall data support the mayor's claim?
The null hypothesis in this case is that the proportions in all the categories (number of hospital
admissions) in the local population are the same as in the national population. The alternative
hypothesis is that the local and national patterns (or percentages) are different. We will obtain
the expected frequencies by multiplying the percentages in the national survey by the total number
of observations in the local survey. For example, the expected frequency for only one admission is
0.40*400 = 160 (assuming equality between local and national percentages). The following table
will make it clear.
Admissions   National %    fe     fo    fo - fe   (fo - fe)²   (fo - fe)² / fe
1                40       160    176       16        256           1.600
2                20        80     75       -5         25           0.313
3                14        56     50       -6         36           0.643
4                10        40     44        4         16           0.400
5                 8        32     35        3          9           0.281
6                 6        24     15       -9         81           3.375
7+                2         8      5       -3          9           1.125
Total           100       400    400        0        ---           7.737
The calculated test statistic is 7.737 and the degrees of freedom are 7-1 = 6.
For this df the table gives $\chi^2_{.10}$ = 10.6446 and $\chi^2_{.05}$ = 12.5916. Thus the null
hypothesis of no difference between the national and local populations with respect to the number
of hospital admissions cannot be rejected even at the 10% level. When the scientific hypothesis
testing method was applied, the mayor's claim was found to lack strong evidence in the data,
although initially it seemed to have some evidence.
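The steps of Example 2 can be sketched as follows: scale the national percentages to the local sample size, compute the statistic, and compare it with the two table values quoted above. The function name `gof_from_percentages` is hypothetical; the critical values are copied from the text rather than computed.

```python
national_pct = [40, 20, 14, 10, 8, 6, 2]       # 1, 2, ..., 6, 7+ admissions
local_observed = [176, 75, 50, 44, 35, 15, 5]  # sample of 400 local residents

def gof_from_percentages(pct, observed):
    """Expected frequencies from reference percentages, plus the chi-square statistic."""
    n = sum(observed)
    expected = [p * n / 100 for p in pct]
    chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return expected, chi_sq

expected, chi_sq = gof_from_percentages(national_pct, local_observed)

# Convention from Example 1: every expected frequency should be at least 5
print("all fe >= 5:", all(fe >= 5 for fe in expected))

# Table values for df = 7 - 1 = 6, as quoted in the text
critical = {"10%": 10.6446, "5%": 12.5916}
for level, value in critical.items():
    decision = "reject" if chi_sq > value else "do not reject"
    print(f"{level} level: chi-square {chi_sq:.3f} vs {value}: {decision} H0")
```

Working with the percentages as whole numbers and dividing by 100 at the end keeps the expected frequencies exact (160, 80, 56, ...), which makes them easy to compare with the table.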
To the computer it does not matter whether the case is that of equal expected frequencies or
unequal expected frequencies. The process is the same.
Goodness of Fit Test
observed    expected     O - E    (O - E)² / E
     176     160.000    16.000       1.600
      75      80.000    -5.000       0.313
      50      56.000    -6.000       0.643
      44      40.000     4.000       0.400
      35      32.000     3.000       0.281
      15      24.000    -9.000       3.375
       5       8.000    -3.000       1.125
     400     400.000     0.000       7.737
7.74 chi-square
6 df
.2580 p-value
The p-value clearly supports our above conclusion.
3. Chi-Square Test of Independence using Contingency Table
Example 3: A sample of 500 individuals was collected to study whether the letter grade has a
significant impact on income 10 years after graduation. Suppose income level is divided
into three (arbitrary) groups: High Income, Middle Income and Low Income. The observed
frequencies are shown in the Contingency Table below:
Table of observed frequencies of Income level by Letter Grade
                    Grade
Income      A      B      C      D    Total
High       18     14     12      6       50
Middle     52     70    100     78      300
Low        20     26     58     46      150
Total      90    110    170    130      500
We have the observed frequencies and need to find the expected frequencies. After that the
formula for the test statistic is the same as in the case of the Goodness of Fit test. The formula
for the expected frequencies is based on the null hypothesis that the rows and columns are
independent of each other.
If $f_{e,ij}$ denotes the expected frequency in cell (i,j), then
$f_{e,ij} = (\text{Row } i \text{ total} \times \text{Column } j \text{ total}) / \text{Grand total}$
For example, the expected frequency in cell (1,1), the upper left corner cell, would be
50*90/500 = 9, whereas the observed frequency is 18. It is also customary to show both types of
frequencies in the same table so that pairwise differences can be easily calculated. The row and
column totals for the observed and expected frequencies must be identical. Therefore, if you
have to do rounding, keep this in mind.
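The expected-frequency formula can be applied to every cell at once. The sketch below is illustrative only; the function name `expected_frequencies` is hypothetical, and the data are the observed counts of Example 3.

```python
# Observed frequencies: rows are income levels, columns are grades A, B, C, D
observed = [
    [18, 14, 12, 6],     # High
    [52, 70, 100, 78],   # Middle
    [20, 26, 58, 46],    # Low
]

def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print(row)

# Row totals of the expected table must match those of the observed table
print([sum(row) for row in expected] == [sum(row) for row in observed])
```

In this example every expected frequency is a whole number, so no rounding is needed and the totals agree exactly.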
Table of observed and expected frequencies of Income level by Letter Grade
(expected frequencies in parentheses)
                          Grade
Income      A           B           C            D         Total
High      18 (9)     14 (11)     12 (17)       6 (13)        50
Middle    52 (54)    70 (66)    100 (102)     78 (78)       300
Low       20 (27)    26 (33)     58 (51)      46 (39)       150
Total     90         110         170          130           500
The degrees of freedom formula is: df = (number of rows-1)*(number of columns-1).
In the present example this gives (3-1)*(4-1) = 6 degrees of freedom.
$\chi^2 = \sum (f_o - f_e)^2 / f_e = (18-9)^2/9 + (14-11)^2/11 + \dots + (46-39)^2/39 = 20.93$
Note that there are 12 terms in the above sum, one for each cell. The table values for 6 df are
$\chi^2_{.05}$ = 12.5916 and $\chi^2_{.01}$ = 16.8119. The calculated test statistic
exceeds both. Therefore, the null hypothesis of independence is rejected even at the 1% level.
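The 12-term sum is tedious by hand, so a short check may help. This sketch takes the observed and expected frequencies directly from the tables above; the function name `independence_statistic` is hypothetical.

```python
observed = [
    [18, 14, 12, 6],     # High
    [52, 70, 100, 78],   # Middle
    [20, 26, 58, 46],    # Low
]
expected = [
    [9, 11, 17, 13],
    [54, 66, 102, 78],
    [27, 33, 51, 39],
]

def independence_statistic(obs, exp):
    """Sum of (fo - fe)^2 / fe over all cells, and df = (rows - 1) * (columns - 1)."""
    chi_sq = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(obs, exp)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(obs) - 1) * (len(obs[0]) - 1)
    return chi_sq, df

chi_sq, df = independence_statistic(observed, expected)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The largest single contribution comes from the High/A cell, (18-9)^2/9 = 9, which alone accounts for a large share of the total.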
The MegaStat results are given below. Looking at the low p-value, we can reject independence at
the 1% level and conclude that letter grade does matter for future income: not a big surprise.
(Note that in MegaStat for contingency tables, you do not need to enter the expected frequencies;
only provide the observed frequencies.)
Chi-square Contingency Table Test for Independence
                            A         B         C         D      Total
HIGH    Observed           18        14        12         6         50
        Expected         9.00     11.00     17.00     13.00      50.00
        O - E            9.00      3.00     -5.00     -7.00       0.00
        (O - E)² / E     9.00      0.82      1.47      3.77      15.06
MED     Observed           52        70       100        78        300
        Expected        54.00     66.00    102.00     78.00     300.00
        O - E           -2.00      4.00     -2.00      0.00       0.00
        (O - E)² / E     0.07      0.24      0.04      0.00       0.36
LOW     Observed           20        26        58        46        150
        Expected        27.00     33.00     51.00     39.00     150.00
        O - E           -7.00     -7.00      7.00      7.00       0.00
        (O - E)² / E     1.81      1.48      0.96      1.26       5.52
Total   Observed           90       110       170       130        500
        Expected        90.00    110.00    170.00    130.00     500.00
        O - E            0.00      0.00      0.00      0.00       0.00
        (O - E)² / E    10.89      2.55      2.47      5.03      20.93
20.93 chi-square
6 df
.0019 p-value