Mann Whitney U For comparison data
description
Transcript of Mann Whitney U For comparison data
Mann Whitney UFor comparison data
Using Mann Whitney U
Non-parametric i.e. no assumptions are made about data fitting a normal distribution
Is used to compare the medians of two sets of data
It measures the overlap between the two data sets
You must have between 6 and 20 replicates of data
The data sets can have unequal numbers of replicates
Frequency silver birch tree height at Rushey Plain
0
2
4
6
8
10
12
10 to14
15 to19
20 to24
25 to29
30 to34
35 to39
40 to44
45 to49
50 to54
55 to59
60 to64
Tree height
Fre
qu
ency
Example of normally distributed data
Frequency of silver birch and beech trees at Broom Hill
0
2
4
6
8
10
12
10 to14
15 to19
20 to24
25 to29
30 to34
35 to39
40 to44
45 to49
50 to54
55 to59
60 to64
Tree height
Fre
qu
ency
Silver Birch
Beech
When to use Mann-Whitney U-test
Curve not normally distributed ie. non parametric
Compares overlap between two data sets
The EquationU1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
The EquationU1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
Where:
U1 = Mann - Whitney U for data set 1
n1 = Sample size of data set 1
R1 = Sum of the ranks of data set 1
U2 = Mann - Whitney U for data set 2
n2 = Sample size of data set 2
R2 = Sum of the ranks of data set 2
1. Establish the Null Hypothesis H0 (this is always the negative form. i.e. there is no significant correlation between the variables) and the alternative hypothesis (H1).
Method
H0 - There is no significant difference between the variable at Site 1 and Site 2
H1 - There is a significant difference between the variable at Site 1 and Site 2
2. Copy your data into the table below as variable x and variable y and label the data sets
Rank 1 R1
Data Set 1Beech Hill (m) 23 21 23 20 24 25 22
Data Set 2Rushey Plain (m)
16 18 19 17 20 21
Rank 2 R2
Rank 1 R1
Data Set 1Beech Hill (m)
23 21 23 20 24 25 22
Data Set 2Rushey Plain (m)
16 18 19 17 20 21
Rank 2 R2
Start from the lowest and put the numbers in order:
16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25
3. Treat both sets of data as one data set and rank them in increasing order (the lowest data value gets the lowest rank)
1 2 3 4
16 17 18 19 20 20 21 21 22 23 23 24 25
When you have data values of the same value, they must have the same rank.
Take the ranks you would normally assign (5 and 6) and add them together (11) and divide the ranks between the data values(5.5)
1
16 17 18 19 20 20 21 21 22 23 23 24 25
The lowest data value gets a rank of 1
5 6
1 5.5 5.5
16 17 18 19 20 20 21 21 22 23 23 24 25
The same thing is done for all data values that are the same
When you have data values of the same value, they must have the same rank.
Take the ranks you would normally assign (5 and 6) and add them together (11) and divide the ranks between the data values(5.5)
The lowest data value gets a rank of 1
5 6 7 8 10 11
1 2 3 4 5.5 5.5 7.5 7.5 9 10.5 10.5 12 13
16 17 18 19 20 20 21 21 22 23 23 24 25
The assigned ranks can then be put into the table
Rank 1 R1 10.5 7.5 10.5 5.5 12 13 9
Data Set 1Beech Hill (m) 23 21 23 20 24 25 22
Data Set 2Rushey Plain (m)
16 18 19 17 20 21
Rank 2 R2 1 3 4 2 5.5 7.5
4. Sum the ranks for each set of data ( R)
R1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68 R1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68
R2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23 R2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23
5. Calculate the number of samples in each data set (n)
Count the number of samples in each of the data sets
Rank 1 R1 10.5 7.5 10.5 5.5 12 13 9
Data Set 1Beech Hill (m) 23 21 23 20 24 25 22
Data Set 2Rushey Plain (m)
16 18 19 17 20 21
Rank 2 R2 1 3 4 2 5.5 7.5
n1 = 7 n1 = 7
n2 = 6 n2 = 6
U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
It is a good idea to break the equations down into three bite size chunks that will then give you a very easy three figure sum
U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
6. Calculate the Values for U1 and U2 using the equations
6. Calculate the Values for U1 and U2 using the equations
U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
(7x6) +3(6+1) -23
(7x6) +3(7) -23U1 = 42 +21 -23 = 40U1 = 42 +21 -23 = 40
U2 = 42 +28 -68 = 2U2 = 42 +28 -68 = 2
(7x6) +3.5(8) -68
(7x6) +3.5(7+1) -68
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
If the U value is less than (or equal to) the critical value then there is a significant difference between the data
sets and the null hypothesis can be rejected
The smallest U value is U2 = 2
6. Compare the smallest U value against the table of critical values
Critical Values for the Mann Whitney U test
Value of n2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
n1
1
2 0 0 0 0 1 1 1 1
3 0 1 1 2 2 3 3 4 4 5 5
4 0 1 2 3 4 4 5 6 7 8 9 10
5 0 1 2 3 5 6 7 8 9 11 12 13 14
6 1 2 3 5 6 8 10 11 13 14 16 17 19
7 1 3 5 6 8 10 12 14 16 18 20 22 24
8 0 2 4 6 8 10 13 15 17 19 22 24 26 29
9 0 2 4 7 10 12 15 17 20 23 26 28 31 34
10 0 3 5 8 11 14 17 20 23 26 29 33 36 39
11 0 3 6 9 13 16 19 23 26 30 33 37 40 44
12 1 4 7 11 14 18 22 26 29 33 37 41 45 49
13 1 4 8 12 16 20 24 28 33 37 41 45 50 54
14 1 5 9 13 17 22 26 31 36 40 45 50 55 59
15 1 5 10 14 19 24 29 34 39 44 49 54 59 64
(at the 0.05 or 95% confidence level i.e. we are 95% confident our data was not due to chance)
We us the values of n1 and n2 to find our critical value
Is there a significant difference?
Is 2 (our smallest U value) smaller or larger than 6 (our critical value from the Mann Whitney Table)? SmallerSmaller
If the U value is less than (or equal to) the critical value then there is a significant difference between the data sets and the
null hypothesis can be rejected
The smallest U value is less than the critical value; therefore the null hypothesis is rejected
The alternative Hypothesis can be accepted – There is a significant difference between the tree heights of Beech Hill and Rushey Plain
Use the following data to calculate U values independently
Rank 1 R1
Velocity cm.s-1
Pools12 12 5 6 14 9 20 8
Velocity cm.s-1
Riffles54 55 61 56 31 47 68 54
Rank 2 R2
Rank 1 R1
Abundance of Gammarus pulex
Pools27 39 43 2 0 1 3 9
Abundance of Gammarus pulex
Riffles85 16 80 18 3 5 63 150
Rank 2 R2
Abundance of Gammarus pulex in pools and riffles of an Exmoor stream
Key questions
Is there a significant relationship?
Which data value/s would you consider to be anomalous and why?
What graph would you use to present this data?