Mann Whitney U For comparison data

20
Mann Whitney U For comparison data

description

Mann Whitney U For comparison data. Using Mann Whitney U. Non-parametric i.e. no assumptions are made about data fitting a normal distribution Is used to compare the medians of two sets of data It measures the overlap between the two data sets - PowerPoint PPT Presentation

Transcript of Mann Whitney U For comparison data

Page 1: Mann Whitney U For comparison data

Mann Whitney UFor comparison data

Page 2: Mann Whitney U For comparison data

Using Mann Whitney U

Non-parametric i.e. no assumptions are made about data fitting a normal distribution

Is used to compare the medians of two sets of data

It measures the overlap between the two data sets

You must have between 6 and 20 replicates of data

The data sets can have unequal numbers of replicates

Page 3: Mann Whitney U For comparison data

Frequency silver birch tree height at Rushey Plain

0

2

4

6

8

10

12

10 to14

15 to19

20 to24

25 to29

30 to34

35 to39

40 to44

45 to49

50 to54

55 to59

60 to64

Tree height

Fre

qu

ency

Example of normally distributed data

Page 4: Mann Whitney U For comparison data

Frequency of silver birch and beech trees at Broom Hill

0

2

4

6

8

10

12

10 to14

15 to19

20 to24

25 to29

30 to34

35 to39

40 to44

45 to49

50 to54

55 to59

60 to64

Tree height

Fre

qu

ency

Silver Birch

Beech

When to use Mann-Whitney U-test

Curve not normally distributed ie. non parametric

Compares overlap between two data sets

Page 5: Mann Whitney U For comparison data

The EquationU1 = n1 x n2 + ½ n2 (n2 + 1) - R2

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

Page 6: Mann Whitney U For comparison data

The EquationU1 = n1 x n2 + ½ n2 (n2 + 1) - R2

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

Where:

U1 = Mann - Whitney U for data set 1

n1 = Sample size of data set 1

R1 = Sum of the ranks of data set 1

U2 = Mann - Whitney U for data set 2

n2 = Sample size of data set 2

R2 = Sum of the ranks of data set 2

Page 7: Mann Whitney U For comparison data

1. Establish the Null Hypothesis H0 (this is always the negative form. i.e. there is no significant correlation between the variables) and the alternative hypothesis (H1).

Method

H0 - There is no significant difference between the variable at Site 1 and Site 2

H1 - There is a significant difference between the variable at Site 1 and Site 2

Page 8: Mann Whitney U For comparison data

2. Copy your data into the table below as variable x and variable y and label the data sets

Rank 1 R1

Data Set 1Beech Hill (m) 23 21 23 20 24 25 22

Data Set 2Rushey Plain (m)

16 18 19 17 20 21

Rank 2 R2

Page 9: Mann Whitney U For comparison data

Rank 1 R1

Data Set 1Beech Hill (m)

23 21 23 20 24 25 22

Data Set 2Rushey Plain (m)

16 18 19 17 20 21

Rank 2 R2

Start from the lowest and put the numbers in order:

16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25

3. Treat both sets of data as one data set and rank them in increasing order (the lowest data value gets the lowest rank)

Page 10: Mann Whitney U For comparison data

1 2 3 4

16 17 18 19 20 20 21 21 22 23 23 24 25

When you have data values of the same value, they must have the same rank.

Take the ranks you would normally assign (5 and 6) and add them together (11) and divide the ranks between the data values(5.5)

1

16 17 18 19 20 20 21 21 22 23 23 24 25

The lowest data value gets a rank of 1

5 6

1 5.5 5.5

16 17 18 19 20 20 21 21 22 23 23 24 25

The same thing is done for all data values that are the same

Page 11: Mann Whitney U For comparison data

When you have data values of the same value, they must have the same rank.

Take the ranks you would normally assign (5 and 6) and add them together (11) and divide the ranks between the data values(5.5)

The lowest data value gets a rank of 1

5 6 7 8 10 11

1 2 3 4 5.5 5.5 7.5 7.5 9 10.5 10.5 12 13

16 17 18 19 20 20 21 21 22 23 23 24 25

The assigned ranks can then be put into the table

Page 12: Mann Whitney U For comparison data

Rank 1 R1 10.5 7.5 10.5 5.5 12 13 9

Data Set 1Beech Hill (m) 23 21 23 20 24 25 22

Data Set 2Rushey Plain (m)

16 18 19 17 20 21

Rank 2 R2 1 3 4 2 5.5 7.5

4. Sum the ranks for each set of data ( R)

R1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68 R1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68

R2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23 R2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23

Page 13: Mann Whitney U For comparison data

5. Calculate the number of samples in each data set (n)

Count the number of samples in each of the data sets

Rank 1 R1 10.5 7.5 10.5 5.5 12 13 9

Data Set 1Beech Hill (m) 23 21 23 20 24 25 22

Data Set 2Rushey Plain (m)

16 18 19 17 20 21

Rank 2 R2 1 3 4 2 5.5 7.5

n1 = 7 n1 = 7

n2 = 6 n2 = 6

Page 14: Mann Whitney U For comparison data

U1 = n1 x n2 + ½ n2 (n2 + 1) - R2

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

It is a good idea to break the equations down into three bite size chunks that will then give you a very easy three figure sum

U1 = n1 x n2 + ½ n2 (n2 + 1) - R2

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

6. Calculate the Values for U1 and U2 using the equations

Page 15: Mann Whitney U For comparison data

6. Calculate the Values for U1 and U2 using the equations

U1 = n1 x n2 + ½ n2 (n2 + 1) - R2

(7x6) +3(6+1) -23

(7x6) +3(7) -23U1 = 42 +21 -23 = 40U1 = 42 +21 -23 = 40

U2 = 42 +28 -68 = 2U2 = 42 +28 -68 = 2

(7x6) +3.5(8) -68

(7x6) +3.5(7+1) -68

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

Page 16: Mann Whitney U For comparison data

If the U value is less than (or equal to) the critical value then there is a significant difference between the data

sets and the null hypothesis can be rejected

The smallest U value is U2 = 2

6. Compare the smallest U value against the table of critical values

Page 17: Mann Whitney U For comparison data

Critical Values for the Mann Whitney U test

Value of n2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

n1

1

2 0 0 0 0 1 1 1 1

3 0 1 1 2 2 3 3 4 4 5 5

4 0 1 2 3 4 4 5 6 7 8 9 10

5 0 1 2 3 5 6 7 8 9 11 12 13 14

6 1 2 3 5 6 8 10 11 13 14 16 17 19

7 1 3 5 6 8 10 12 14 16 18 20 22 24

8 0 2 4 6 8 10 13 15 17 19 22 24 26 29

9 0 2 4 7 10 12 15 17 20 23 26 28 31 34

10 0 3 5 8 11 14 17 20 23 26 29 33 36 39

11 0 3 6 9 13 16 19 23 26 30 33 37 40 44

12 1 4 7 11 14 18 22 26 29 33 37 41 45 49

13 1 4 8 12 16 20 24 28 33 37 41 45 50 54

14 1 5 9 13 17 22 26 31 36 40 45 50 55 59

15 1 5 10 14 19 24 29 34 39 44 49 54 59 64

(at the 0.05 or 95% confidence level i.e. we are 95% confident our data was not due to chance)

We us the values of n1 and n2 to find our critical value

Page 18: Mann Whitney U For comparison data

Is there a significant difference?

Is 2 (our smallest U value) smaller or larger than 6 (our critical value from the Mann Whitney Table)? SmallerSmaller

If the U value is less than (or equal to) the critical value then there is a significant difference between the data sets and the

null hypothesis can be rejected

The smallest U value is less than the critical value; therefore the null hypothesis is rejected

The alternative Hypothesis can be accepted – There is a significant difference between the tree heights of Beech Hill and Rushey Plain

Page 19: Mann Whitney U For comparison data

Use the following data to calculate U values independently

Rank 1 R1

Velocity cm.s-1

Pools12 12 5 6 14 9 20 8

Velocity cm.s-1

Riffles54 55 61 56 31 47 68 54

Rank 2 R2

Rank 1 R1

Abundance of Gammarus pulex

Pools27 39 43 2 0 1 3 9

Abundance of Gammarus pulex

Riffles85 16 80 18 3 5 63 150

Rank 2 R2

Abundance of Gammarus pulex in pools and riffles of an Exmoor stream

Page 20: Mann Whitney U For comparison data

Key questions

Is there a significant relationship?

Which data value/s would you consider to be anomalous and why?

What graph would you use to present this data?