Lec 1 Comparing Multiple KW Test 1

91-501 Compare Several Populations with Unknown Distributions (the Kruskal-Wallis test)


  • 91-501

    Compare Several Populations with

    Unknown Distributions

    (the Kruskal-Wallis test)

  • The Kruskal-Wallis (KW) Test for Comparing

    Populations with Unknown Distributions

    A nonparametric test for comparing population

    medians by Kruskal and Wallis (KW Test)

    The KW procedure tests the null hypothesis that k samples from possibly different populations actually originate from similar populations, at least as far as their central tendencies, or medians, are concerned. The test assumes that the variables under consideration have underlying continuous distributions.

    In what follows, assume we have k samples, and the sample size of the i-th sample is n_i, where i = 1, 2, ..., k.

  • Test based on Ranks of Combined Data

    In the computation of the KW statistic, each observation is replaced by its rank in an ordered combination of all the k samples.

    By this we mean that the data from the k samples combined are ranked in a single series. The smallest observation is replaced by a rank of 1, the next smallest by a rank of 2, and so on, until the largest observation is replaced by the rank N, where N is the total number of observations in all the samples (N is the sum of the n_i).
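    The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the helper name `average_ranks` is hypothetical, and giving tied values the average of their positions is an assumption taken from the worked examples later in these slides (where two values of 2.4 each get rank 4.5). The data are the pooled growth figures from the investment-firm example.

```python
def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their positions."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# Pooled observations from the investment-firm example (firms A, B, C, D in order).
pooled = [4.2, 4.6, 3.9, 4.0,
          3.3, 2.4, 2.6, 3.8, 2.8,
          1.9, 2.4, 2.1, 2.7, 1.8,
          3.5, 3.1, 3.7, 4.1, 4.4]
ranks = average_ranks(pooled)
print(ranks)
```

    The first four entries of `ranks` belong to firm A, the next five to firm B, and so on, so the ranks can be split back into the original samples by position.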

  • Compute the sum of the ranks for

    each sample

    The next step is to compute the Sum of

    the Ranks for each of the original samples.

    The KW test determines whether these

    sums of ranks are so different by sample

    that they are not likely to have all come

    from the same population.
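    As a rough sketch of this step, the snippet below sums the ranks per sample, using the ranks listed in Step 1 of the investment-firm example. The variable names are illustrative. Two checks are included that the slides do not spell out but that follow from the ranking scheme: the rank sums must add up to N(N+1)/2, and if all samples came from the same population each sample's mean rank would be expected to lie near (N+1)/2.

```python
# Ranks per sample, copied from Step 1 of the investment-firm example.
ranks_by_sample = {
    "A": [17, 19, 14, 15],
    "B": [10, 4.5, 6, 13, 8],
    "C": [2, 4.5, 3, 7, 1],
    "D": [11, 9, 12, 16, 18],
}

rank_sums = {name: sum(r) for name, r in ranks_by_sample.items()}
n_total = sum(len(r) for r in ranks_by_sample.values())

# Sanity check: the ranks 1..N always add up to N(N+1)/2.
assert sum(rank_sums.values()) == n_total * (n_total + 1) / 2

# Mean rank per sample, to compare samples of unequal size.
mean_ranks = {name: rank_sums[name] / len(r) for name, r in ranks_by_sample.items()}
expected_mean_rank = (n_total + 1) / 2

print(rank_sums)
print(mean_ranks, expected_mean_rank)
```

    Here the mean ranks of firms A and D sit well above the expected value of 10, and that of firm C well below it, which is the kind of spread the KW statistic measures.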

  • Test statistic follows a

    χ² distribution

    It can be shown that if the k samples come from the same population, that is, if the null hypothesis is true, then the test statistic, H, used in the KW procedure is distributed approximately as a chi-square statistic with df = k - 1, provided that the sample sizes of the k samples are not too small (say, n_i > 4 for all i). H is defined as follows:

    \[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]

    where

    k = number of samples (groups)

    n_i = number of observations for the i-th sample or group

    N = total number of observations (sum of all the n_i)

    R_i = sum of ranks for group i
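    The definition of H translates directly into code. The function below is a minimal sketch (the name `kruskal_wallis_h` and its argument layout are illustrative choices, not from the lecture); it takes the rank sums R_i and sample sizes n_i and evaluates 12/(N(N+1)) times the sum of R_i²/n_i, minus 3(N+1). It is tried on the rank sums from the investment-firm example that follows.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)."""
    n_total = sum(sizes)  # N, the total number of observations
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums and sample sizes from the investment-firm example (firms A, B, C, D).
h = kruskal_wallis_h([65, 41.5, 17.5, 66], [4, 5, 5, 5])
print(round(h, 3))
```

    Rounded to three decimals, the result agrees with the value 13.678 worked out by hand in the example.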

  • Example

    The following data are from a comparison of four investment firms. The observations represent the percentage of growth during a three-month period for recommended funds. Firm A has four observations (its fifth cell is empty); firms B, C, and D have five each.

    A      B      C      D
    4.2    3.3    1.9    3.5
    4.6    2.4    2.4    3.1
    3.9    2.6    2.1    3.7
    4.0    3.8    2.7    4.1
    -      2.8    1.8    4.4

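  • Step 1: Express the data in terms

    of their ranks

    Tied observations share the average of the ranks they occupy: the two values of 2.4 hold positions 4 and 5, so each receives rank 4.5.

           A      B      C      D
           17     10     2      11
           19     4.5    4.5    9
           14     6      3      12
           15     13     7      16
           -      8      1      18
    SUM    65     41.5   17.5   66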

  • Compute the test statistic

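    The general formula for the test statistic is

    \[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]

    With N = 19 and sample sizes 4, 5, 5, and 5, the corresponding H test statistic is

    \[ H = \frac{12}{19(20)} \left( \frac{65^2}{4} + \frac{41.5^2}{5} + \frac{17.5^2}{5} + \frac{66^2}{5} \right) - 3(20) = 13.678 \]

    From the chi-square table, the critical value for α = 0.05 with d.o.f. = k - 1 = 3 is 7.81. Since 13.678 > 7.81, we reject the null hypothesis (H0). Note that the rejection region for the KW test procedure is one-sided, since we only reject the null hypothesis when the H statistic is too large.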
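    The table lookup can be cross-checked numerically. The sketch below is an illustration under a stated restriction: it uses the closed-form upper-tail probability of the chi-square distribution that holds only for 3 degrees of freedom (the case k = 4 in this example), so that nothing beyond the standard library is needed. The function name `chi2_sf_df3` is hypothetical.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.o.f.

    Closed form valid for 3 degrees of freedom only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


h = 13.678       # test statistic from the example
critical = 7.81  # tabulated critical value for alpha = 0.05, d.o.f. = 3

p_value = chi2_sf_df3(h)
reject = h > critical  # one-sided: reject only when H is too large

print(f"tail area beyond the critical value: {chi2_sf_df3(critical):.3f}")
print(f"approximate p-value for H = {h}: {p_value:.4f}; reject H0: {reject}")
```

    The tail area beyond 7.81 comes out at about 0.05, matching the table, and the approximate p-value for H = 13.678 is well below 0.05, consistent with rejecting H0.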

  • Example: Four heat-treatment schedules are applied to a certain metal material, and the internal stress is measured. A total of 24 samples are tested, with 6 samples treated under each schedule. The results are shown in the following table. Using the Kruskal-Wallis method, determine whether the four treatment schedules produce the same result in terms of internal stress.

           Schedule A      Schedule B      Schedule C      Schedule D
           value  rank     value  rank     value  rank     value  rank
           4.5    21.5     3.8    12       3.5    8        3.0    4
           5.0    24       4.0    16.5     4.5    21.5     2.8    3
           3.5    8        3.9    13.5     3.2    5        2.2    2
           3.7    11       4.2    19       2.1    1        3.4    6
           4.8    23       3.6    10       3.5    8        4.0    16.5
           4.0    16.5     4.4    20       4.0    16.5     3.9    13.5
    SUM           104             91              60              45

  • \[ H = \frac{12}{24(25)} \left( \frac{104^2}{6} + \frac{91^2}{6} + \frac{60^2}{6} + \frac{45^2}{6} \right) - 3(25) = 7.41 \]

    From the chi-square table, the critical value for α = 0.05 with d.o.f. = k - 1 = 3 is 7.81. Since 7.41 < 7.81, we fail to reject the null hypothesis (H0), i.e., we do not have strong evidence against the hypothesis that the central tendency (median) of the internal stress is the same under the four heat-treatment schedules.
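    The whole procedure for the heat-treatment example can be reproduced end to end from the raw values. The following is a minimal sketch with illustrative names: it ranks each observation by counting how many pooled values lie below it and placing it in the middle of any tie, sums the ranks per schedule, and evaluates H. Like the slides, it applies no correction for ties.

```python
# Internal-stress measurements from the table, six per schedule.
schedules = {
    "A": [4.5, 5.0, 3.5, 3.7, 4.8, 4.0],
    "B": [3.8, 4.0, 3.9, 4.2, 3.6, 4.4],
    "C": [3.5, 4.5, 3.2, 2.1, 3.5, 4.0],
    "D": [3.0, 2.8, 2.2, 3.4, 4.0, 3.9],
}

pooled = [x for values in schedules.values() for x in values]
n_total = len(pooled)  # N = 24


def midrank(x):
    """Rank of x in the pooled data; tied values share their average rank."""
    below = sum(v < x for v in pooled)   # observations strictly smaller than x
    equal = sum(v == x for v in pooled)  # observations tied with x (including x)
    return below + (equal + 1) / 2


rank_sums = {name: sum(midrank(x) for x in values)
             for name, values in schedules.items()}

h = (12.0 / (n_total * (n_total + 1))
     * sum(rank_sums[name] ** 2 / len(values) for name, values in schedules.items())
     - 3 * (n_total + 1))

critical = 7.81  # chi-square table, alpha = 0.05, d.o.f. = 3
print(rank_sums)
print(round(h, 2), "reject H0" if h > critical else "fail to reject H0")
```

    The rank sums match the SUM row of the table (104, 91, 60, 45), and H rounds to 7.41, below the critical value.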