Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability...

38
Comparing Two Samples: Part I

Transcript of Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability...

Page 1: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Comparing Two Samples: Part I

Page 2: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Normality CheckFrequency histogram

(Skewness & Kurtosis)Probability plot, K-S test

Normality CheckFrequency histogram

(Skewness & Kurtosis)Probability plot, K-S test

Descriptive statistics

Descriptive statistics

Measurements(data)

Measurements(data)

Mean, SD, SEM, 95% confidence interval

Mean, SD, SEM, 95% confidence interval

YES

Check the Homogeneity of Variance

Check the Homogeneity of Variance

Data transformation

Data transformation

NO

Data transformation

Data transformation

NO

Median, range, Q1 and Q3

Median, range, Q1 and Q3

Non-ParametricTest(s) For 2 samples: Mann-WhitneyFor 2-paired samples: Wilcoxon For >2 samples:Kruskal-WallisSheirer-Ray-Hare

Non-ParametricTest(s) For 2 samples: Mann-WhitneyFor 2-paired samples: Wilcoxon For >2 samples:Kruskal-WallisSheirer-Ray-Hare

Parametric TestsStudent’s t tests for 2 samples; ANOVA for 2 samples; post hoc tests for multiple comparison of means

Parametric TestsStudent’s t tests for 2 samples; ANOVA for 2 samples; post hoc tests for multiple comparison of means

YES

Page 3: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Procedures for comparing two samples

• Test normality of the data; if passed, then

• Compare the measurements both of location (central

value) and dispersion (spread or range) of the

thickness indices between the two sites

• If there is a difference only in the measure of location, a parametric statistical test based upon the difference between the two sample means

Page 4: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

One sample t test: t = (X - )/SX • The 2-tailed t test for significant difference between a

mean longevity of horses and a hypothesized population mean of = 22 yr

• Ho: = 22 year

• Age at death (in year) of 25 horses:

t = (24.23 - 22)/ (4.25/25) = 2.624

= n - 1 = 25 - 1 = 24

t 0.05 (2), 24 = 2.064

Thus reject Ho0.01<p<0.02

Age (yr)17.20 20.90 23.10 25.80 28.1018.00 21.00 23.40 26.00 28.6018.70 21.70 23.80 26.30 29.3019.80 22.30 24.20 27.20 30.1020.30 22.60 24.60 27.60 35.10

mean 24.23SD 4.25SE 0.85

SX = Standard error of mean = SD/n

Page 5: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Confidence interval for mean• A researcher often needs to know how close a sample

mean (x) is to the population mean ().

For example, the amount of dissolved/dispersed petroleum hydrocarbons (DDPH) in the upper ocean.

It is impossible to sample the entire ocean so the population mean at any time is unknown. If data of the population follow the normality, the sample mean will be normally distributed.

• The standard deviation (or standard error ) of the mean can be obtained by:

SX = S.E.M. = S/n

• In general, the population mean lies within one standard deviation (S) of the sample mean with about 68% confidence. It is usual to quote 95% confidence intervals by multiplying S.E.M. with the (1) appropriate value of z, namely 1.96 (for n >30) or (2) appropriate value of t, df = n -1 (for n < 30).

Page 6: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Confidence interval for mean• e.g. 50 measurements of DDPH in the upper ocean were made.

The sample mean = 4.75 ppb and S = 3.99 ppb

Then SX = 3.99/ 50 = 0.5643

The 95% confidence intervals are obtained as

X z 0.05 SX = 4.75 (1.96)(0.5643) = 4.75 1.105 ppb

i.e. 5.855 and 3.633 ppb

• e.g. Only 10 measurements of DDPH in the upper ocean were made. The sample mean = 78.2 ppb and S = 38.6 ppb

Then SX = 38.6/ 10 = 12.206

The 95% confidence intervals are obtained as

X t 0.05, n-1 SX = 78.2 (2.262)(12.206) = 78.2 27.61 ppb

Page 7: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Two-tailed and one-tailed tests• If we do not know which of the means is greater than the other,

we reject extreme values in either direction (i.e. HA: a b); this procedure is known as a two-tailed test.

• However, if sufficient information is available for us to specify the direction of the population means, we could frame the alternative hypothesis as HA: a > b or a < b

• For example, comparison of total organic carbon (TOC) concentrations in the sewage effluent before and after being upgraded to an advance treatment. In this case, a reduced in TOC would be expected after improvement of the sewage treatment, and therefore, HA: before > after

• The corresponding critical value of z for 1-tailed tests is slightly different from those for 2-tailed tests (see Table B3).

Page 8: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

The difference between two sample means with limited data

• If n<30, the above method gives an unreliable estimate of z

• This problem was solved by ‘Student’ who introduced the t-test early in the 20th century

• Similar to z-test, but instead of referring to z, a value of t is required (Table B3)

• df = 2n - 2 for n1 = n2

• For all degrees of freedom below infinity, the curve appears leptokurtic compared with the normal distribution, and this property becomes extreme at small degrees of freedom.

Page 9: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

A B

Mea

n m

easu

red

unit

Comparison of two samples

Page 10: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

A B

Mea

sure

d un

it

Error bar = 2 SD

Two-sample t test

Page 11: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

A B

Mea

sure

d un

it

Error bar = 2 SD

Two-sample t test

Page 12: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

A B

Mea

sure

d un

it

Error bar = 2 SD

Page 13: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

• If the measure of dispersion (i.e. homogeneity of variance) also differs significantly between the samples:

– data transformation procedure: if there was no significant difference between the variances of transformed data, then a parametric test could be applied; otherwise you should consider

– non-parametric test or Welch’s approximate t’ (Zars p. 128-129)

Importance of Equal Variance

Page 14: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Assumptions for z and t tests• For the t and z tests to be valid:

– the measurements should be at least approximately normally distributed and

– with similar sample variances (i.e., homoscedastic or homogeneity of variance)

• Need to check Normality of both datasets

• Homoscedasticity can be checked by a F-test

The largest sample varianceF ratio = The smallest variance

Page 15: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Check for homogeneity of variance

Ho: equal variance between the two samples

HA: unequal variances

• Given that Sa2 = 0.2898 (n = 15) and Sb

2 = 0.1701 (n =12),

• Then, F ratio = 0.28929/0.1701 = 1.703

• Use the F table (Table B4, Zar 99, App.34),

Critical F 0.05(2), 14, 11 = 3.36 > 1.703

Accept Ho

• If Ho rejected, then transform data and redo F test

• If still failed, non-parametric or Welch’s approximate t’

Page 16: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

An example • Birds’ eggshells are thought

to be influenced by acid rain which reduces the egg thickness index (egg shell mass/ surface area; mg cm-2)

• We investigate gulls’ eggs at two nesting sites: a control site and a site affected by acid rain

• Taking a single egg at random from a number of nests at each site; determining the egg thickness index

Page 17: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Two sample Z test• Ho: no significant difference in the the thickness

indices between the two sites

• i.e. the thickness indices belong to the same population (or probability density function). Then the population means for the two sites will be the same and their difference will be zero.

• Based on the central limit theorem, a collection of sample means will follow a normal distribution. It can also be shown that the distribution of the difference of two means (Xa - Xb) will also be normally distributed.

• Recall the distribution of z: if (Xa - Xb) can be divided by an appropriate standard deviation, a value of z can be calculated:

Page 18: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Two sample Z test• The standard deviation of (Xa - Xb) =

[(1/n)(Sa2 + Sb

2)]

• Then, Z = (Xa - Xb)/[(1/n)(Sa2 + Sb

2)]

• This equation can be used provided the number of measurements in each sample is reasonably large (n > 30), the measurements are approximately normally distributed and n in each sample is the same.

• Ho: a = b ; HA: a b

• If the sample means are similar, the z value will be small and Ho will be accepted. The critical values of z can be obtained from the t table, Table B3, (df = ).

Page 19: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Two sample Z test• The following data were obtained at two gull nesting sites.

50 eggs were taken at random from each site, with 1 egg taken randomly from each nest. Egg thickness index results (mg cm-2)

Site a (control) b

n 50 50

mean 33.2 31.89

S2 16.41 17.39

Ho: a = b ; HA: a b

z = (Xa - Xb)/[(1/n)(Sa2 + Sb

2)] = (33.2-31.89)/[(1/50)(16.41 + 17.39)]

z = 1.593

As critical z=0.05, df = = 1.96, accept Ho.

Remember to always check the homogeneity of variance before running the t test.

Page 20: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Exercise• The following data were obtained at two gull nesting sites.

50 eggs were taken at random from each site, with 1 egg taken randomly from each nest. Egg thickness index results (mg cm-2)

Site a (control) b

n 50 50

mean 33.2 27.8

S2 16.41 15.21

Test the Ho: a = b (HA: a b) using the two sample z test

z = (Xa - Xb)/[(1/n)(Sa2 + Sb

2)]

Please do it later !

Remember to always check the homogeneity of variance before running the t test.

Page 21: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

A t test with equal measurements in each sample (na = nb)• e.g. The chemical oxygen demand (COD) is measured at two

industrial effluent outfalls, a and b, as part of consent procedure. Test the null hypothesis: Ho: a = b while HA: a b (Given that the data are normally distributed)

a b3.48 3.892.99 3.193.32 2.804.17 4.313.78 3.424.00 3.413.20 3.554.40 2.403.85 2.994.52 3.083.09 3.313.62 4.52

n 12 12

mean 3.701 3.406

S2 0.257 0.366

sp2 = (SS1+ SS2) / (υ1+ υ2) = [(0.257 × 11) + (0.366 × 11)]/(11+11) =

0.312

sX1 – X2 = √(sp2/n1 + sp

2/n2) = √(0.312/12) × 2 = 0.228

t = (X1 – X2) / sX1 – X2 = (3.701 – 3.406) / 0.228 = 1.294

df = 2n - 2 = 22

t = 0.05, df = 22, 2-tailed = 2.074 > t observed = 1.294, p > 0.05

The calculated t-value is less than the critical t value.

Thus, accept Ho.

SS = sum of square = S2 × υ

Remember to always check the homogeneity of variance before running the t test.

Example

Page 22: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

• Growth of 30-weeks old non-transgenic and transgenic tilapia was determined by measuring the body mass (wet weight). Since transgenic fish cloned with growth hormone (GH) related gene OPAFPcsGH are known to grow faster in other fish species (Rahman et al. 2001), it is hypothesized that HA: transgenic > non-transgenic while the null hypothesis is given as Ho: transgenic non-transgenic

Example

Page 23: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Ho: transgenic non-transgenic

HA: transgenic > non-transgenic

Given that mass (g) of tilapia are normally distributed.

n 8 8

mean 625.0 306.25

S2 6028.6 5798.2

transgenic non-transgenic700 305680 280500 275510 250670 490670 275620 275650 300

sp2 = (SS1+ SS2) / (υ1+ υ2) = 5913.4

sX1 – X2 = √(sp2/n1 + sp

2/n2) = 38.45

t = (X1 – X2) / sX1 – X2 = 8.29

df = 2n - 2 = 14

t = 0.05, df = 14, 1-tailed = 1.761 << 8.29 ; p < 0.001

The t-value is greater than the critical t value. Thus, reject Ho.

Remember to always check the homogeneity of variance before running the t test.

Example

Page 24: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

• e.g. The data are human blood-clotting time (in minutes) of individuals given one of two different drugs. It is hypothesized that Ho: a = b while HA: a b (Given that the data are normally distributed)

n 6 7

mean 8.75 9.74

S2 0.339 0.669

sp2 = (SS1+ SS2) / (υ1+ υ2) = 0.519

sX1 – X2 = √(sp2/n1 + sp

2/n2) = 0.401

t = (X1 – X2) / sX1 – X2 = -2.470

t = 0.05, df = 6 + 7 -2 = 11, 2-tailed = 2.201 < 2.470; 0.02<p<0.05

Thus, reject Ho but accept HA.

Given drug a Given drug b

8.80 9.908.40 9.007.90 11.108.70 9.609.10 8.709.60 10.40

9.50

Example

Page 25: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Power of the two-sample t test

• Power of two-sample t test is greatest when the number of measurements in each sample is the same (n1 = n2)

• Power of two-sample t test for different numbers of measurements in each sample (n1 n2) is smaller

• When n1 n2, effective n = 2n1n2 /(n1+n2)

– e.g. n1=6, n2=7,

effective n = 2(6 × 7)/ (6+7) = 6.46,

which is smaller than the average of 6 and 7 (6.5)

• Therefore, we should always use ‘balanced design’ where possible.

Page 26: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Normality CheckFrequency histogram

(Skewness & Kurtosis)Probability plot, K-S test

Normality CheckFrequency histogram

(Skewness & Kurtosis)Probability plot, K-S test

Descriptive statistics

Descriptive statistics

Measurements(data)

Measurements(data)

Mean, SD, SEM, 95% confidence interval

Mean, SD, SEM, 95% confidence interval

YES

Check the Homogeneity of Variance

Check the Homogeneity of Variance

Data transformation

Data transformation

NO

Data transformation

Data transformation

NO

Median, range, Q1 and Q3

Median, range, Q1 and Q3

Non-ParametricTest(s) For 2 samples: Mann-WhitneyFor 2-paired samples: Wilcoxon For >2 samples:Kruskal-WallisSheirer-Ray-Hare

Non-ParametricTest(s) For 2 samples: Mann-WhitneyFor 2-paired samples: Wilcoxon For >2 samples:Kruskal-WallisSheirer-Ray-Hare

Parametric TestsStudent’s t tests for 2 samples; ANOVA for 2 samples; post hoc tests for multiple comparison of means

Parametric TestsStudent’s t tests for 2 samples; ANOVA for 2 samples; post hoc tests for multiple comparison of means

YES

F-test

z testt tests

Mann-Whitneytest

For comparison of two independent samples

Page 27: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Why do we need to transform the data into normal distribution for parametric tests

when we can apply non-parametric tests?

Parametric tests are more powerful and more sensitive than non-parametric ones.

Non-parametric methods for comparison of two independent samples

Page 28: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Non-parametric tests for two samples - the Mann-Whitney test

• Non-parametric tests are also called ‘distribution-free’ test because they are independent of the underlying population distribution, the only assumption being independence of observations and continuity of the variable which is being measured. Most of these tests involve a ranking procedure

• Mann-Whitney test is useful for two independent samples and can be used as an alternative to the t-test, particularly where the assumptions for t-test cannot be demonstrated

• It can also be applied to ordinal data

• It is usually assumed that the distribution of the measurements in the two samples are of the same general form (shown by frequency histogram or stem-and-leaf plot)

• This test can be undertaken with any sample size. But the method of calculating the test statistic (U) depends upon the size of n.

Page 29: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Mann-Whitney test - basic principleMann-Whitney test - basic principle

35 24 123 21 19 17 15 9 4 3Case 1

35 24 123 21 19 17 15 9 4 3Case 2

35 24 123 21 19 17 15 9 4 3Case 3

35 24 123 21 19 17 15 9 4 3Case 4

Group A Group B Unequal medians

Page 30: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Non-parametric tests for two samples - the Mann-Whitney test

• In Mann-Whitney test, the null hypothesis cannot be stated in terms of population parameters, but is defined as the equality of the medians of the populations from which the two samples are drawn.

• Example: Mann-Whiney test where n is small

Sample A Sample B

9 10

12 15

6 11

7 13

18

• These measurements are combined and ranked in descending order:

B B B A B B A A A

18 15 13 12 11 10 9 7 6

Example

Page 31: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Example I: Mann-Whiney test (2-tailed test)

• These measurements are combined and ranked in descending order :

B B B A B B A A A

18 15 13 12 11 10 9 7 6

• Ho: equal medians

• Summation of ranks for A score

R2 = 4 + 7 + 8 + 9 = 28

• Summation of ranks for B score

R1 = 1 + 2 + 3 + 5 + 6 = 17

• U = n1n2 + [n1(n1 + 1)/2] - R1

• where the greatest measurement (smaller sum of ranks) in either of the two groups is given as rank 1 (R1); Thus group B is R1 in this case.

• U = (5)(4) + [(5)(5 +1)/2] - 17 = 18

• U0.05(2), 5, 4 = U0.05(2), 4, 5 = 19 > 18 ; thus accept Ho; i.e. equal medians

Example

Page 32: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Sample A Sample B

9 10

12 15

6 11

7 13

18

Sample A Sample B

9 10

12 15

6 11

7 13

18

Sample A Sample B

12 18

9 15

7 13

6 11

10

Sample A Sample B

12 18

9 15

7 13

6 11

10

Sample A Rank Sample B Rank

12 4 18 1

9 7 15 2

7 8 13 3

6 9 11 5

10 6

Sample A Rank Sample B Rank

12 4 18 1

9 7 15 2

7 8 13 3

6 9 11 5

10 6

18 15 13 12 11 10 9 7 6

Ho: equal medians

U = n1n2 + [n1(n1 + 1)/2] - R1

where the greatest measurement in either of the two groups is given as rank 1 (R1); Thus group B is R1 in this case.

U = (5)(4) + [(5)(5 +1)/2] - 17 = 18

U0.05(2), 5, 4 = U0.05(2), 4, 5 = 19 > 18 ; thus accept Ho; i.e. equal medians

R1 = 17 R2 = 28

Example

Page 33: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Sample A Sample B

9 10

11 15

6 12

7 13

18

Sample A Sample B

9 10

11 15

6 12

7 13

18

Sample A Sample B

11 18

9 15

7 13

6 12

10

Sample A Sample B

11 18

9 15

7 13

6 12

10

Sample A Rank Sample B Rank

11 5 18 1

9 7 15 2

7 8 13 3

6 9 12 4

10 6

Sample A Rank Sample B Rank

11 5 18 1

9 7 15 2

7 8 13 3

6 9 12 4

10 6

U = n1n2 + [n1(n1 + 1)/2] - R1

U = (5)(4) + [(5)(5 +1)/2] - 16 = 19

U0.05(2), 5, 4 = U0.05(2), 4, 5 = 19; thus reject Ho at p = 0.05; i.e. unequal medians

R1 = 16 R2 = 29

18 15 13 12 11 10 9 7 6

Example

Page 34: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Sample A Sample B

9 11

10 15

6 12

7 13

18

Sample A Sample B

9 11

10 15

6 12

7 13

18

Sample A Sample B

10 18

9 15

7 13

6 12

11

Sample A Sample B

10 18

9 15

7 13

6 12

11

Sample A Rank Sample B Rank

11 6 18 1

9 7 15 2

7 8 13 3

6 9 12 4

11 5

Sample A Rank Sample B Rank

11 6 18 1

9 7 15 2

7 8 13 3

6 9 12 4

11 5

U = n1n2 + [n1(n1 + 1)/2] - R1

U = (5)(4) + [(5)(5 +1)/2] - 15 = 20

U0.05(2), 5, 4 = U0.05(2), 4, 5 = 19 < 20; thus reject Ho at p = 0.02; i.e. unequal medians

R1 = 15 R2 = 32

18 15 13 12 11 10 9 7 6

Example

Page 35: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Example II: Mann-Whiney test (2-tailed test)

• The following data provide representative soil moisture contents on south and north-facing slopes under grassland in June. The Mann-Whitney test is used to test the null hypothesis that the population medians of the two samples are the same.

• A variance ratio test will show that the measurements are heteroscedastic.

• Also, the measurements are percentages, which would not be expected to be normally distributed.

Soil Moisture (% dry wt)North-facing Rank South-facing Rank

73.0 1 70.7 472.2 2 60.0 1671.3 3 55.9 1770.1 5 54.2 1869.4 6 53.0 1969.1 7 50.5 2069.1 8 49.4 2168.0 9 48.5 2267.6 10 47.1 2367.2 11 46.4 2466.5 12 45.3 2566.1 13 41.6 2665.4 14 40.6 2763.6 15 39.5 28

38.3 2933.4 3026.7 31

Example

Page 36: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

• A variance ratio test will show that the measurements are heteroscedastic.

• Also, the measurements are percentages, which would not be expected to be normally distributed.

• U = n1n2 + [n1(n1 + 1)/2] - R1

• U = (14)(17) + [(14)(14 + 1)/2] - 116 = 227

• U0.05(2), 14, 17 = 169 < 227

• p < 0.001, thus reject Ho

• The medians of the two samples are significantly different

n = 14 R1 = 116 n = 17 R2 = 380

Soil Moisture (% dry wt)North-facing Rank South-facing Rank

73.0 1 70.7 472.2 2 60.0 1671.3 3 55.9 1770.1 5 54.2 1869.4 6 53.0 1969.1 7 50.5 2069.1 8 49.4 2168.0 9 48.5 2267.6 10 47.1 2367.2 11 46.4 2466.5 12 45.3 2566.1 13 41.6 2665.4 14 40.6 2763.6 15 39.5 28

38.3 2933.4 3026.7 31

Example

Page 37: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Non-parametric tests for two samples - the Mann-Whitney test for n > 20

• Critical tables for U become unwieldy as n increases above 20 and Mann & Whitney obtained z as a function of U and n as shown below:

• z = (U - n1n2/2)/[n1n2(n1 + n2 + 1)/12]

• Suppose that the test value of U was found to be 148 with n1 = 16 and n2 = 29. Then

• z = (148 - (16 29/ 2)/[(16)(29)(16 + 29 + 1)/12] = -84/42.174 = -1.99

• For a 2-tailed test, the modulus (1.99) is taken.

• Referring Table B3, df = infinity, at p = 0.05, z = 1.96 < 1.99

• Thus, reject Ho

Page 38: Comparing Two Samples: Part I. Normality Check Frequency histogram (Skewness & Kurtosis) Probability plot, K-S test Normality Check Frequency histogram.

Important Notes

• Comparisons between two samples can be made with reference to the difference between the sample means or medians

• Comparison between sample means are made by calculating their difference and dividing by an appropriate sample standard deviation

• Where the sample size is large (e.g. n > 50) a two-sample z test can be used. For small samples, a Student’s t-test can be used

• The parametric t and z tests should be applied to independent interval/ ratio measurements which are at least approximately normally distributed

• A simple test of homoscedasticity (F ratio test) should be applied prior to application of t and z tests

• For non-normally distributed or heteroscedastic data, the non-parametric Mann-Whitney test can be utilized