Ch. 12 Examples-Illustrations

  • 8/10/2019 Ch. 12 Examples-Illustrations

    Ch. 12 Examples & Illustrations

    An illustration of the shapes of the $\chi^2$ distribution is given below for df equal to 2, 5 and 10.
    You can see that it progressively approaches a symmetrical-looking shape.

    [Figure: Chi-square distribution, three panels for df = 2, df = 5 and df = 10.
    Values marked on the panels: for df = 5, P(lower) = .9999, P(upper) = .0001, Chi-square = 25.74;
    for df = 10, P(lower) = 1.0000, P(upper) = 1.00E-05, Chi-square = 41.30.]

    The tables for the $\chi^2$ distribution for degrees of freedom from 1 to 100 are given at the
    back of the book. The degrees of freedom are indicated in the first column. For any number of
    degrees of freedom, the $\chi^2$ value given in the table is the value that has the area to its
    right indicated by the subscript of $\chi^2$ in the top row. For example, for 10 degrees of
    freedom the $\chi^2$ value that has 10 percent of the area to its right is 15.9871. The $\chi^2$
    distribution is generally used for the Goodness of Fit Test or the Test of Independence.
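
    The table lookup described above can also be reproduced numerically. The following is a minimal
    sketch, not the method used to build the book's tables: it assumes an even number of degrees of
    freedom (where the upper-tail area has a simple closed form) and finds the table value by
    bisection. The function names are hypothetical and chosen only for illustration.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Area to the right of x under a chi-square curve with an EVEN df.

    Closed form for even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


def chi2_table_value(alpha, df, upper_bound=1000.0):
    """Bisection search for the x whose upper-tail area equals alpha."""
    lo, hi = 0.0, upper_bound
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid  # tail area too small: move left
    return (lo + hi) / 2.0


# 10 degrees of freedom, 10 percent of the area to the right
value_10df_10pct = chi2_table_value(0.10, 10)
print(round(value_10df_10pct, 3))  # 15.987
```

    The bisection works because the upper-tail area decreases steadily as x grows, so there is
    exactly one x for each area between 0 and 1.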

    1. The $\chi^2$ Test of Goodness of Fit in the Case of Equal Expected Frequencies

    Example 1: A marketing manager for a manufacturer of sports cards plans to begin a new series
    with the pictures and playing statistics of six former major league baseball players. She sets up a
    booth and sells 120 cards. She wants to find out whether the picture of any player (with playing
    statistics) makes a significant difference in the sale of the sports cards.

    H0: Sales are independent of the player's profile, i.e. all players are treated equally
    H1: At least one player is not treated the same as the others

    The sample results are given below:

    Player    No. of cards (fo)    Expected freq. (fe)
    A         13                   20
    B         30                   20
    C         14                   20
    D         15                   20
    E         28                   20
    F         20                   20
    _____________________________________________
    Total     120                  120

    Note that a general convention is to consider n sufficiently large for the Chi-square test if all
    the expected frequencies (fe) are at least 5. If the expected frequency of any cell falls below 5,
    it is better to combine it with another category. The formula to calculate Chi-square is given
    below:

    $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

    This statistic is distributed as $\chi^2$ with k-1 degrees of freedom, where k is the number of
    categories. We have to subtract one because only five of the six frequencies can be arbitrarily
    determined once the total is fixed. The sixth is determined when the five others and the total are
    given. Let us do the calculations indicated by the formula as follows:

    fo - fe    (fo - fe)^2    (fo - fe)^2 / fe
    -7         49             2.45
    10         100            5.00
    -6         36             1.80
    -5         25             1.25
    8          64             3.20
    0          0              0
    Total                     13.7 = $\chi^2$ calculated
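
    The same arithmetic can be checked in a few lines of code. This is a minimal sketch using the
    card counts from Example 1; the helper name chi_square_statistic is hypothetical, and the
    expected frequencies are taken as total/k, which is what the equal-frequency hypothesis implies.

```python
# Observed card sales for players A to F (from the table in Example 1)
observed = [13, 30, 14, 15, 28, 20]

total = sum(observed)          # 120 cards sold
k = len(observed)              # 6 categories
expected = [total / k] * k     # equal expected frequencies: 20 each


def chi_square_statistic(f_obs, f_exp):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))


chi2_calculated = chi_square_statistic(observed, expected)
df = k - 1                     # one degree of freedom is lost to the fixed total

print(round(chi2_calculated, 2), df)  # 13.7 5
```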

    For 6-1 = 5 degrees of freedom, the table value of $\chi^2_{.05}$ is 11.0705.
    Since the calculated test statistic (13.7) exceeds the table value, we reject the Null hypothesis
    of Independence at the 5% level. In other words, we can say with 95% confidence that the sample
    does not fit the hypothesis of a Uniform Distribution, i.e. equal expected frequencies. But
    $\chi^2_{.01}$ is 15.0863, which the calculated statistic does not exceed, so we cannot have 99%
    confidence in this statement. The impact of the players' photos and profiles is significant at the
    5% level but not at the 1% level.

    The MegaStat result is given below. In Excel create two columns: one for observed and one for
    expected frequencies. Then go to Chi-square/cross tab. Then select Goodness of Fit and, in the
    dialogue box, fill in the input section, enter 0 for the number of parameters estimated, and you
    get the results. The following table shows all the calculations we did above using a calculator.
    It gives a value of the calculated test statistic exactly equal to what we obtained.

    Goodness of Fit Test

    observed    expected    O - E      (O - E)^2 / E
    13          20.000      -7.000     2.450
    30          20.000      10.000     5.000
    14          20.000      -6.000     1.800
    15          20.000      -5.000     1.250
    28          20.000      8.000      3.200
    20          20.000      0.000      0.000
    120         120.000     0.000      13.700

    13.70 chi-square
    5 df
    .0176 p-value

    The p-value (.0176) lies between .01 and .05, so we can reject the Null at the 5% level but not at the 1% level.
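
    The p-value reported for Example 1 can be checked without statistical software. The sketch below
    is an illustration, not MegaStat's own algorithm: it uses the standard closed-form expression
    for the upper-tail area of a chi-square distribution with an odd number of degrees of freedom.
    The function name is hypothetical.

```python
import math


def chi2_p_value_odd_df(x, df):
    """Upper-tail area P(chi-square > x) for an ODD number of df.

    For odd df the area equals
        erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * S,
    where S = sum over r = 1..(df-1)/2 of sqrt(x)^(2r-1) / (1*3*...*(2r-1)).
    """
    if df <= 0 or df % 2 != 1:
        raise ValueError("this sketch only handles odd, positive df")
    root = math.sqrt(x)
    series = 0.0
    term = root                      # r = 1 term: sqrt(x) / 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)      # next term: multiply by x / (2r + 1)
    normal_part = math.erfc(root / math.sqrt(2.0))
    return normal_part + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series


p_value = chi2_p_value_odd_df(13.7, 5)
print(round(p_value, 4))

# Decision at the two levels discussed in the text
print("reject at 5%:", p_value < 0.05)
print("reject at 1%:", p_value < 0.01)
```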

    2. The $\chi^2$ Test of Goodness of Fit in the Case of Unequal Expected Frequencies

    Example 2: A recent (hypothetical) national survey of hospital admissions for people between 25
    and 50 years of age who had hospital admissions during a two-year period showed that 40% had 1
    admission only, 20% had 2 admissions, 14% had 3 admissions, 10% had 4 admissions, 8% had 5
    admissions, 6% had 6 admissions and only 2% had 7 or more admissions. The mayor of a small city
    claims that his city is much healthier than the national average. He even cites the percentages
    for the two extreme categories. He says that 44% of the local population in the given age group
    had only one hospital admission (compared to 40% nationally) and that the percentage with 6 or
    more admissions is only 5%, compared to 8% nationally. His claim was in fact based on a sample of
    400 randomly selected people in the specified age group who were interviewed by a local
    newspaper. It was revealed that 176 people had only 1 admission, 75 had 2 admissions, 50 had 3
    admissions, 44 had 4 admissions, 35 had 5 admissions, 15 had 6 admissions and only 5 had 7 or
    more admissions. Is the claim of the mayor valid? Test at 5% and 10%.

    Looking at the two extreme categories, the mayor's claim seems to have strong evidence. But
    statisticians at the local university wanted to test the claim using more scientific methods. Does
    the overall data support the mayor's claim?

    The Null hypothesis in this case is that the percentages in all the categories (number of hospital
    admissions) in the local population are the same as in the national population. The alternative
    hypothesis is that the local and national patterns (or percentages) are different. We obtain the
    expected frequencies by multiplying the percentages in the national survey by the total number of
    observations in the local survey. For example, the expected frequency for only one admission is
    0.40*400 = 160 (assuming equality between local and national percentages). The following table
    makes this clear.

    Admissions    National %    fe     fo     fo - fe    (fo - fe)^2    (fo - fe)^2 / fe
    1             40            160    176    16         256            1.600
    2             20            80     75     -5         25             0.313
    3             14            56     50     -6         36             0.643
    4             10            40     44     4          16             0.400
    5             8             32     35     3          9              0.281
    6             6             24     15     -9         81             3.375
    7+            2             8      5      -3         9              1.125
    Total         100           400    400    0          ---            7.737
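
    The table above can be reproduced in code. This is a minimal sketch using the figures from
    Example 2; the variable names are illustrative only. Percentages are kept as whole numbers and
    divided by 100 at the end so that the expected frequencies come out exact.

```python
# National percentages for 1, 2, 3, 4, 5, 6 and 7+ admissions (from the survey)
national_percent = [40, 20, 14, 10, 8, 6, 2]

# Local sample counts for the same categories (n = 400)
observed = [176, 75, 50, 44, 35, 15, 5]

n = sum(observed)

# Expected frequency = national share of the category times the local sample size
expected = [pct * n / 100 for pct in national_percent]

# Chi-square statistic: sum of (fo - fe)^2 / fe
chi2_calculated = sum(
    (fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)
)
df = len(observed) - 1

print(expected)                        # [160.0, 80.0, 56.0, 40.0, 32.0, 24.0, 8.0]
print(round(chi2_calculated, 3), df)   # 7.737 6
```

    Every expected frequency here is at least 5, so no categories need to be combined.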

    The calculated test statistic is 7.737 and the degrees of freedom are 7-1 = 6.
    For this df the table gives $\chi^2_{.10}$ = 10.6446 and $\chi^2_{.05}$ = 12.5916. Since 7.737 is
    below both values, the Null hypothesis of no difference between the national and local populations
    with respect to the number of hospital admissions cannot be rejected even at the 10% level. The
    mayor's claim was found to lack strong evidence from the data when the scientific hypothesis
    testing method was applied, although initially it seemed to have some evidence.

    To the computer it does not matter whether the case is that of equal expected frequencies or

    unequal expected frequencies. The process is the same.

    Goodness of Fit Test

    observed    expected    O - E      (O - E)^2 / E
    176         160.000     16.000     1.600
    75          80.000      -5.000     0.313
    50          56.000      -6.000     0.643
    44          40.000      4.000      0.400
    35          32.000      3.000      0.281
    15          24.000      -9.000     3.375
    5           8.000       -3.000     1.125
    400         400.000     0.000      7.737

    7.74 chi-square
    6 df
    .2580 p-value

    The p-value clearly supports our above conclusion.
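
    For an even number of degrees of freedom the upper-tail area has a simple closed form, so the
    p-value for Example 2 can be checked by hand. The expression below is a standard result for
    6 df, shown here only as a sketch of the check, with the intermediate values rounded:

```latex
P(\chi^2_{6} > x) = e^{-x/2}\left(1 + \frac{x}{2} + \frac{(x/2)^2}{2}\right),
\qquad
P(\chi^2_{6} > 7.737) \approx e^{-3.8685}\,(1 + 3.8685 + 7.4827) \approx 0.258
```

    Since 0.258 is larger than 0.10, the Null hypothesis is not rejected at either the 5% or the
    10% level.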

    3. Chi-Square Test of Independence using Contingency Table

    Example 3: A sample of 500 individuals was collected to study whether the letter grade has a
    significant impact on income 10 years after graduation. Suppose income level is divided
    into three (arbitrary) groups: High Income, Middle Income and Low Income. The observed
    frequencies are shown in the Contingency Table below:

    Table of observed frequencies of Income level by Letter Grade

                        Grade
    Income    A      B      C      D      Total
    High      18     14     12     6      50
    Middle    52     70     100    78     300
    Low       20     26     58     46     150
    Total     90     110    170    130    500

    We have the observed frequencies and need to find the expected frequencies. After that, the
    formula for the test statistic is the same as in the case of the Goodness of Fit test. The formula
    for the expected frequencies is based on the Null Hypothesis that the rows and columns are
    independent of each other.

    If $f_{e,ij}$ denotes the expected frequency in cell (i,j), then

    $f_{e,ij} = \frac{(\text{Row } i \text{ total}) \times (\text{Column } j \text{ total})}{\text{Grand Total}}$

    For example, the expected frequency in cell (1,1), the upper left corner cell, would be
    50*90/500 = 9, whereas the observed frequency is 18. It is also customary to show both types of
    frequencies in the same table so that pairwise differences can be easily calculated. The row and
    column totals for the observed and expected frequencies must be identical. Therefore, if you
    have to do rounding, keep this in mind.
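
    The expected-frequency rule can be applied to the whole table at once. The following is a
    minimal sketch using the observed counts from Example 3; the variable names are illustrative.

```python
# Observed frequencies: rows are income levels, columns are grades A, B, C, D
observed = {
    "High":   [18, 14, 12, 6],
    "Middle": [52, 70, 100, 78],
    "Low":    [20, 26, 58, 46],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]           # [50, 300, 150]
col_totals = [sum(col) for col in zip(*rows)]     # [90, 110, 170, 130]
grand_total = sum(row_totals)                     # 500

# Expected frequency for cell (i, j) = row i total * column j total / grand total
expected = [
    [rt * ct / grand_total for ct in col_totals]
    for rt in row_totals
]

for label, exp_row in zip(observed, expected):
    print(label, exp_row)

# Row and column totals of the expected table must match the observed ones
assert [sum(r) for r in expected] == row_totals
assert [sum(c) for c in zip(*expected)] == col_totals
```

    In this example no rounding is needed because every expected frequency is a whole number.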

    Table of observed and expected frequencies of Income level by Letter Grade
    (expected frequencies in parentheses)

                          Grade
    Income    A          B          C            D          Total
    High      18 (9)     14 (11)    12 (17)      6 (13)     50
    Middle    52 (54)    70 (66)    100 (102)    78 (78)    300
    Low       20 (27)    26 (33)    58 (51)      46 (39)    150
    Total     90         110        170          130        500

    The degrees of freedom formula is: df = (number of rows - 1)*(number of columns - 1). In the present example this gives (3-1)*(4-1) = 6 degrees of freedom.

    $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(18-9)^2}{9} + \frac{(14-11)^2}{11} + \dots + \frac{(46-39)^2}{39} = 20.93$

    Note that there are 12 terms in the above sum, one for each cell. The table values for 6 df are
    $\chi^2_{.05}$ = 12.5916 and $\chi^2_{.01}$ = 16.8119. The calculated test statistic exceeds
    both. Therefore, the Null Hypothesis of Independence is rejected even at the 1% level.
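
    The 12-term sum and the comparison with the table values can be sketched as follows. The
    function name chi_square_independence is hypothetical, and the two critical values are simply
    copied from the text rather than computed.

```python
# Observed frequencies: rows High, Middle, Low; columns grades A, B, C, D
observed = [
    [18, 14, 12, 6],
    [52, 70, 100, 78],
    [20, 26, 58, 46],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (fo - fe) ** 2 / fe                  # one term per cell

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
print(round(statistic, 2), df)  # 20.93 6

# Table values for 6 df quoted in the text
critical_05, critical_01 = 12.5916, 16.8119
print("reject at 5%:", statistic > critical_05)  # reject at 5%: True
print("reject at 1%:", statistic > critical_01)  # reject at 1%: True
```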

    The MegaStat results are given below. Looking at the low p-value we can say with 99%
    confidence that letter grade does matter for future income: not a big surprise. (Note that in
    MegaStat for contingency tables, you do not need to enter the expected frequencies; only provide
    the observed frequencies.)

    Chi-square Contingency Table Test for Independence

                              A         B         C         D         Total
    HIGH     Observed         18        14        12        6         50
             Expected         9.00      11.00     17.00     13.00     50.00
             O - E            9.00      3.00      -5.00     -7.00     0.00
             (O - E)^2 / E    9.00      0.82      1.47      3.77      15.06
    MED      Observed         52        70        100       78        300
             Expected         54.00     66.00     102.00    78.00     300.00
             O - E            -2.00     4.00      -2.00     0.00      0.00
             (O - E)^2 / E    0.07      0.24      0.04      0.00      0.36
    LOW      Observed         20        26        58        46        150
             Expected         27.00     33.00     51.00     39.00     150.00
             O - E            -7.00     -7.00     7.00      7.00      0.00
             (O - E)^2 / E    1.81      1.48      0.96      1.26      5.52
    Total    Observed         90        110       170       130       500
             Expected         90.00     110.00    170.00    130.00    500.00
             O - E            0.00      0.00      0.00      0.00      0.00
             (O - E)^2 / E    10.89     2.55      2.47      5.03      20.93

    20.93 chi-square
    6 df
    .0019 p-value