Chi Square Goodness of Fit Testfnl

  • 8/8/2019 Chi Square Goodness of Fit Testfnl

    MASINDE MULIRO UNIVERSITY OF SCIENCE & TECHNOLOGY

    COMPUTER SCIENCE DEPARTMENT

    PCT 911: ADVANCE RESEARCH METHODS

    TASK

    Chi-square goodness-of-fit test (χ²) for Stolen Vehicles

    SUBMITTED BY

    NAME: NAHASON MATOKE

    REG NO: SIT/H/004/10

    SUBMITTED TO:

    DR. G. WANYEMBI

    KAKAMEGA

    Chi Sqr Assignment 2

    Purpose: Test for distributional adequacy

    The chi-square test (Snedecor and Cochran, 1989) is used to test if a sample of data came from a population with a specific distribution.

    An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any

    univariate distribution for which you can calculate the cumulative distribution function.

    The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes).

    This is actually not a restriction since for non-binned data you can simply calculate a

    histogram or frequency table before generating the chi-square test. However, the values of

    the chi-square test statistic are dependent on how the data is binned. Another disadvantage

    of the chi-square test is that it requires a sufficient sample size in order for the chi-square

    approximation to be valid.

    The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial and the Poisson. The Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions.

    Definition

    The chi-square test is defined for the hypothesis:

    H0: The data follow a specified distribution.

    Ha: The data do not follow the specified distribution.

    Test Statistic: For the chi-square goodness-of-fit computation, the data are divided into k bins and the test statistic is defined as:

    $$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

    where O_i is the observed frequency for bin i and E_i is the expected frequency for bin i.
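The test statistic defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi_square_statistic` and the two-bin coin-toss counts are hypothetical and do not come from the assignment.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-bin example (not from the assignment): 100 coin tosses,
# 48 heads and 52 tails observed, 50 of each expected.
toy_statistic = chi_square_statistic([48, 52], [50, 50])
print(round(toy_statistic, 2))  # 0.16
```

The function takes frequencies that are already binned, which matches the requirement that the test be applied to data put into classes.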

    The expected frequencies

    The Kenya Evening Star, Nov. 7, 2009, reported the following information for a random sample of 1000 stolen cars for the previous year:

    170 were Fords, 300 Toyotas, 210 Nissans, 190 Hyundais, and 130 Peugeots.

    Use the χ² goodness-of-fit test at a significance level of 0.01 to test the hypothesis that the proportions stolen are identical to the population make proportions.

    Suppose it is established that 15% of all cars are Fords, 35% are Toyotas, 20% are Nissans, 15% are Hyundais, and 15% are Peugeots.

    The Observed Stolen Vehicles

                       Ford   Toyota   Nissan  Hyundai  Peugeot    Total
    Stolen (Oij)        170      300      210      190      130     1000

    Percentage of vehicles stolen for each make:

    (Stolen make / Total stolen) * 100

                       Ford   Toyota   Nissan  Hyundai  Peugeot    Total
    Stolen (Oij) %       17       30       21       19       13      100

    Total vehicles

    Total vehicles = (stolen / percentage of stolen vehicles) * 100 = 1000 for each make, e.g. Ford: (170 / 17) * 100 = 1000.

    Therefore

    Expected Stolen Frequencies (Stolen Vehicles)

    Given that 15% of all cars are Fords, 35% are Toyotas, 20% are Nissans, 15% are Hyundais, and 15% are Peugeots:

    15% Ford of the 1000 stolen vehicles sampled = 0.15 * 1000 = 150

                       Ford   Toyota   Nissan  Hyundai  Peugeot
    Eij %               15%      35%      20%      15%      15%
    Eij                 150      350      200      150      150
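The expected frequencies above follow from multiplying each population share by the sample size. A minimal sketch, assuming the shares and the sample of 1000 stated in the problem; the variable names are illustrative only.

```python
SAMPLE_SIZE = 1000  # stolen cars in the newspaper sample

# Population share of each make, in percent, as stated in the problem.
population_percent = {
    "Ford": 15,
    "Toyota": 35,
    "Nissan": 20,
    "Hyundai": 15,
    "Peugeot": 15,
}

# The shares must cover the whole population.
assert sum(population_percent.values()) == 100

# Expected count for each make: E = share * N.
expected = {
    make: percent * SAMPLE_SIZE / 100
    for make, percent in population_percent.items()
}

for make, count in expected.items():
    print(f"{make:<8}{count:>7.0f}")
```

Working in whole percentages keeps the arithmetic exact, so the expected counts sum to the sample size without rounding error.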

    Test the null hypothesis

    Make         Oij   Eij   Oij - Eij    (Oij - Eij)² / Eij
    Ford         170   150          20           2.666666667
    Toyota       300   350         -50           7.142857143
    Nissan       210   200          10                   0.5
    Hyundai      190   150          40           10.66666667
    Peugeot      130   150         -20           2.666666667
    Sum                                          23.64285714
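The table above can be reproduced with a short script, which also makes it easy to see which make contributes most to the statistic. This is a sketch using the observed and expected counts from the worked example; the variable names are illustrative.

```python
makes = ["Ford", "Toyota", "Nissan", "Hyundai", "Peugeot"]
observed = [170, 300, 210, 190, 130]
expected = [150, 350, 200, 150, 150]

# One row per make: (make, O, E, O - E, (O - E)^2 / E).
rows = [
    (make, o, e, o - e, (o - e) ** 2 / e)
    for make, o, e in zip(makes, observed, expected)
]

# The chi-square statistic is the sum of the last column.
chi_square = sum(row[4] for row in rows)

for make, o, e, diff, contribution in rows:
    print(f"{make:<8}{o:>5}{e:>5}{diff:>6}{contribution:>12.4f}")
print(f"{'Sum':<8}{'':>16}{chi_square:>12.4f}")
```

Hyundai contributes the largest single term, so it accounts for the biggest share of the departure from the expected frequencies.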

    That is, chi-square is the sum of the squared difference between the observed (Oij) and the expected (Eij) data (or the deviation, d), divided by the expected data, in all possible categories.

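    Assessing significance levels:

    For a chi-square goodness-of-fit test in which no parameters are estimated from the data, the degrees of freedom equal the number of categories minus one. (The formula (c-1)(r-1) belongs to the chi-square test for independence; it happens to give the same number here if the observed and expected counts are treated as 2 columns and the makes as 5 rows.)

    Df = k - 1

    = 5 - 1

    = 4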

    Thus the value calculated from the formula above is compared with values in the chi-

    square distribution table (Bissonnette, 2006). We reject the null hypothesis if the chi-

    squared value is greater than the critical value (what is called the upper critical value).
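The table lookup described above can also be checked numerically. The sketch below relies on a closed form for the upper-tail probability that holds only when the degrees of freedom are even, and finds the critical value by bisection. The function names are hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^j / j! for j = 0 .. df/2 - 1.
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def upper_critical_value(alpha, df, lo=0.0, hi=200.0, tol=1e-9):
    """Bisection search for the x at which P(X > x) equals alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_sf_even_df(mid, df) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0


ALPHA, DF = 0.01, 4
CHI_SQUARE = 23.643  # statistic from the worked example

critical = upper_critical_value(ALPHA, DF)
p_value = chi_square_sf_even_df(CHI_SQUARE, DF)
reject_null = CHI_SQUARE > critical

print(f"critical value = {critical:.3f}")
print(f"reject H0: {reject_null}")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent decisions; the script computes both so either can be reported.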

    Conclusion

    The chi-square statistic for these data is 23.643 with 4 degrees of freedom (5 categories - 1). The critical value at p = .01 is 13.277.

    Since 23.643 is larger than 13.277, the observed frequencies differ from the expected frequencies by enough to reject the null hypothesis.

    State the conclusion you can draw from the observations made

    Test the null hypothesis

    Set up the hypothesis for the chi-square goodness-of-fit test:

    H0. Null hypothesis: In the chi-square goodness-of-fit test, the null hypothesis assumes that there is no significant difference between the observed and the expected value.

    Ha. Alternative hypothesis: In the chi-square goodness-of-fit test, the alternative hypothesis assumes that there is a significant difference between the observed and the expected value.

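    The calculated value of χ² (23.643) is much higher than the table value (13.277), which means that the difference between the observed and expected frequencies is unlikely to be due to chance alone. The result is significant at the 0.01 level.

    Hence, the null hypothesis does not hold: the proportions of makes among stolen cars differ from the population make proportions.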