Test for Outliers
7/30/2019 Test for Outliers
Test For Outliers
One important aim of statistical testing is to recognize the presence or absence of outliers.
Outliers in a series of measurements are extraordinarily small or large observations compared with the bulk of the data.
Several test procedures exist for detecting outliers in data; here we will look at Dixon's Q-test.
The Q-test is one of the most frequently used outlier test procedures.
-
Test For Outliers
The Q-test uses the range of the measurements and can be applied even when only a few data are available.
The n measurements are arranged in ascending order.
The very small value to be tested as an outlier is denoted by $x_1$ and the very large value by $x_n$.
The test statistic is then calculated as given on the next slide.
-
Test For Outliers
For the smallest value:
$$Q_1 = \frac{x_2 - x_1}{x_n - x_1}$$
-
Test For Outliers
For the largest value:
$$Q_n = \frac{x_n - x_{n-1}}{x_n - x_1}$$
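A minimal Python sketch of the two formulas above, assuming the measurements arrive as an unsorted list of floats. The function name `dixon_q` is hypothetical and not from the source:

```python
def dixon_q(values):
    """Return (Q_1, Q_n) for Dixon's Q-test on the smallest and largest value.

    Q_1 = (x_2 - x_1) / (x_n - x_1)
    Q_n = (x_n - x_{n-1}) / (x_n - x_1)
    """
    x = sorted(values)  # arrange the measurements in ascending order
    if len(x) < 3:
        raise ValueError("the Q-test needs at least three measurements")
    spread = x[-1] - x[0]  # range x_n - x_1
    if spread == 0:
        raise ValueError("all values are equal, so Q is undefined")
    q_1 = (x[1] - x[0]) / spread    # gap between the two smallest values, relative to the range
    q_n = (x[-1] - x[-2]) / spread  # gap between the two largest values, relative to the range
    return q_1, q_n


if __name__ == "__main__":
    print(dixon_q([5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]))
```

Called with the benzo[a]pyrene data of Example 2.8, it returns approximately 0.083 and 0.75.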
-
Test For Outliers
The null hypothesis, i.e., that the considered measurement is not an outlier, is accepted if the quantity $Q \le Q(1-\alpha; n)$. If $Q > Q(1-\alpha; n)$, we reject the null hypothesis and say that the value is an outlier.
Critical Q values for selected significance levels α and numbers of measurements n are given in the table and in your book in Table 2.10.
-
Example 2.8
Trace analysis of polycyclic aromatic hydrocarbons (PAH) in a soil gave the following values for the trace constituent benzo[a]pyrene, in mg/kg dry weight:
5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15
Apply the Q-test to check whether the smallest and the largest value might be outliers.
-
Example 2.8
First we arrange the data in ascending order:
5.00, 5.10, 5.10, 5.15, 5.20, 5.30, 6.20
Then we can calculate the Q value for both the smallest and the largest value. For the smallest value:
$$Q_1 = \frac{x_2 - x_1}{x_n - x_1} = \frac{5.10 - 5.00}{6.20 - 5.00} = 0.083$$
-
Example 2.8
For the largest value:
$$Q_n = \frac{x_n - x_{n-1}}{x_n - x_1} = \frac{6.20 - 5.30}{6.20 - 5.00} = 0.75$$
For α = 0.01 we can use Table 2.10 and obtain the table value
Q(1 - 0.01 = 0.99; n = 7) = 0.64
Since $Q_1$ (0.083) is much smaller than the table value, we cannot eliminate the smallest value as an outlier.
However, $Q_n$ (0.75) is larger than the table value, and for this reason we can eliminate the largest value as an outlier.
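The comparison with the table value can be sketched as follows. The critical value 0.64 is simply hard-coded from the slide; the script does not look up Table 2.10 itself:

```python
# Example 2.8: benzo[a]pyrene in soil (mg/kg dry weight)
data = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
Q_CRIT = 0.64  # Q(0.99; n = 7), copied from Table 2.10 as quoted on the slide

x = sorted(data)
spread = x[-1] - x[0]
q_smallest = (x[1] - x[0]) / spread
q_largest = (x[-1] - x[-2]) / spread

for label, value, q in [("smallest", x[0], q_smallest), ("largest", x[-1], q_largest)]:
    # H0 ("not an outlier") is kept when Q <= Q_CRIT and rejected otherwise
    verdict = "outlier" if q > Q_CRIT else "not an outlier"
    print(f"{label} value {value:.2f}: Q = {q:.3f} -> {verdict}")
```

Only 6.20 exceeds the critical value, which matches the conclusion above.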
-
F test Example
The confidence interval is:
$$\frac{14.5}{10.8}(0.255) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{14.5}{10.8}(3.59)$$
$$0.34 \le \frac{\sigma_1^2}{\sigma_2^2} \le 4.82$$
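The interval arithmetic can be checked with a few lines of Python. The four numbers are copied from the slide; the two multipliers 0.255 and 3.59 are presumably F-table factors, but their degrees of freedom are not shown, so treat them as given inputs:

```python
# Confidence interval for the variance ratio sigma1^2 / sigma2^2.
# All four numbers are copied from the slide; the two factors are
# presumably F-table values, but their degrees of freedom are not given.
var_ratio = 14.5 / 10.8
lower_factor, upper_factor = 0.255, 3.59

lower = var_ratio * lower_factor
upper = var_ratio * upper_factor
print(f"{lower:.2f} <= sigma1^2 / sigma2^2 <= {upper:.2f}")
```

It prints the bounds 0.34 and 4.82 stated above.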
-
Grubbs' Test for Outliers
It can be applied to series of 3 to 150 measurements.
The null hypothesis, according to which $x^*$ is not an outlier within the measurement series of n values, is accepted at level α if the test quantity T satisfies:
$$T = \frac{|x^* - \bar{x}|}{s} \le T_{\text{table}}(1-\alpha; n)$$
By use of the test quantity T, the distance of the suspicious value $x^*$ from the mean $\bar{x}$ is determined and related to the standard deviation s of the measurements.
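As a hedged sketch, the Grubbs test quantity can be written as a small Python function. It assumes s is the sample standard deviation (n - 1 denominator, as computed by `statistics.stdev`); the function name `grubbs_t` is hypothetical:

```python
import statistics


def grubbs_t(values, suspect):
    """Grubbs test quantity T = |x* - mean| / s for a suspicious value x*.

    s is the sample standard deviation (n - 1 in the denominator).
    """
    if not 3 <= len(values) <= 150:
        raise ValueError("the test is stated for series of 3 to 150 measurements")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return abs(suspect - mean) / s


if __name__ == "__main__":
    data = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
    print(round(grubbs_t(data, 6.20), 2))
```

The result is then compared with $T_{\text{table}}(1-\alpha; n)$ from the table.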
-
Grubbs' Test for Outliers
Example 2.9
The data for the trace analysis of benzo[a]pyrene from the previous example are also used in Grubbs' test.
The mean of the data was 5.29 and the standard deviation was 0.411.
Then we can calculate the T values for the smallest and the largest value as:
$$T_1 = \frac{\bar{x} - x_1}{s} = \frac{5.29 - 5.00}{0.411} = 0.71$$
$$T_n = \frac{x_n - \bar{x}}{s} = \frac{6.20 - 5.29}{0.411} = 2.21$$
-
Grubbs' Test for Outliers
Example 2.9
The table value at α = 0.01 is
T(1 - 0.01 = 0.99; n = 7) = 2.10
As a result, the test result is not significant for the smallest value (T = 0.71) but is significant for the largest value (T = 2.21).
So the largest value is an outlier.
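The complete Example 2.9 decision can be reproduced as below. Computing the mean and standard deviation directly from the data (rather than from the rounded 5.29 and 0.411) gives the same T values to two decimals; the critical value 2.10 is copied from the slide, not computed:

```python
import statistics

data = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
T_CRIT = 2.10  # T(0.99; n = 7), as quoted on the slide

mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1)
t_smallest = (mean - min(data)) / s
t_largest = (max(data) - mean) / s

print(f"mean = {mean:.2f}, s = {s:.3f}")
for label, t in [("smallest", t_smallest), ("largest", t_largest)]:
    # H0 ("not an outlier") is rejected when T exceeds the table value
    verdict = "outlier" if t > T_CRIT else "not an outlier"
    print(f"{label}: T = {t:.2f} -> {verdict}")
```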
-
Non-parametric Tests for Method Comparison
The tests that we have seen so far all require that the data be normally distributed. When this is not the case, distribution-free methods need to be used.
These methods do not require parameters such as the mean and standard deviation used in the previous tests.
For that reason, they are called non-parametric methods.
These methods require more replicate measurements.
They do not use the values of the quantitative variables themselves.
They use the ranks of the data and are based on counting.
-
Non-parametric Tests for Method Comparison
We will look at two examples of non-parametric tests.
These are:
the Mann-Whitney U-test for the comparison of independent samples
the Wilcoxon T-test for paired measurements
When normality is doubtful, you should always check your results with these tests, especially in the case of small samples.
-
The Mann-Whitney U-test
This test is based on ranking the samples by taking both groups (group A and group B) of the data together.
It gives rank 1 to the lowest result, rank 2 to the second lowest, etc.
If $n_1$ and $n_2$ are the numbers of data in the groups with the smaller and the larger number of results, respectively, and $R_1$ and $R_2$ are the sums of the ranks in these two groups, then we can set up the equations:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
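The two U equations translate directly into code. This is a sketch only; the function name `mann_whitney_u` is hypothetical. The consistency check $U_1 + U_2 = n_1 n_2$ follows from the fact that all ranks 1 to $n_1 + n_2$ together sum to $(n_1 + n_2)(n_1 + n_2 + 1)/2$:

```python
def mann_whitney_u(n1, n2, r1, r2):
    """Return (U_1, U_2) from the group sizes and the rank sums."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    # consistency check: U_1 + U_2 must equal n1 * n2
    if abs((u1 + u2) - n1 * n2) > 1e-9:
        raise ValueError("rank sums are inconsistent with the group sizes")
    return u1, u2


if __name__ == "__main__":
    u1, u2 = mann_whitney_u(8, 10, 70.5, 100.5)
    print(u1, u2, "-> test statistic U =", min(u1, u2))
```

The usage line plugs in the group sizes and rank sums of the worked example that follows.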
-
The Mann-Whitney U-test
The smaller of the two U values is used to evaluate the test.
When there are ties, the average of the tied ranks is given.
The Mann-Whitney test compares the medians of the two samples.
The smaller the difference between the medians, the smaller the difference between $U_1$ and $U_2$.
The hypotheses are:
$$H_0: U_1 = U_2$$
$$H_1: U_1 \ne U_2$$
-
The Mann-Whitney U-test
Example: the following two groups of measurements are to be compared.
A      B
11.2   10.9
13.7   11.2
14.8   12.1
11.2   12.4
15.0   15.5
16.1   14.6
17.3   13.5
10.9   10.8
10.8
11.7
Here the lowest result, 10.8, is given rank 1.
Since 10.8 occurs twice, once in group A and once in group B, both values are given the average of ranks 1 and 2:
$$\text{Rank} = \frac{1 + 2}{2} = 1.5$$
-
The Mann-Whitney U-test
If we set the hypotheses as
$$H_0: U_1 = U_2 \qquad H_1: U_1 \ne U_2$$
this will be a two-sided test.
Group Result Rank Group Result Rank
A 10.8 1.5 B 12.4 10
B 10.8 1.5 B 13.5 11
A 10.9 3.5 A 13.7 12
B 10.9 3.5 B 14.6 13
A 11.1 5 A 14.8 14
A 11.2 6.5 A 15.0 15
B 11.2 6.5 B 15.5 16
A 11.7 8 A 16.1 17
B 12.1 9 A 17.3 18
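Ranking with ties can be sketched by pooling both groups, sorting them, and giving tied values the mean of the ranks they occupy. The group values below are taken exactly as they appear in the ranking table above; `average_ranks` is a hypothetical helper name:

```python
from collections import defaultdict


def average_ranks(values):
    """Map each distinct value to its rank (1 = smallest); ties share the mean rank."""
    positions = defaultdict(list)
    for rank, v in enumerate(sorted(values), start=1):
        positions[v].append(rank)
    return {v: sum(r) / len(r) for v, r in positions.items()}


# Group values as they appear in the ranking table
group_a = [10.8, 10.9, 11.1, 11.2, 11.7, 13.7, 14.8, 15.0, 16.1, 17.3]
group_b = [10.8, 10.9, 11.2, 12.1, 12.4, 13.5, 14.6, 15.5]

rank_of = average_ranks(group_a + group_b)  # rank both groups together
r_a = sum(rank_of[v] for v in group_a)
r_b = sum(rank_of[v] for v in group_b)
print(f"R(B) = {r_b}, R(A) = {r_a}")
```

Because ranks are looked up by value, every occurrence of a tied value receives the same averaged rank, as the rule above requires.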
-
The Mann-Whitney U-test
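$R_1$ is the sum of the ranks in group B (the smaller group, $n_1 = 8$):
$$R_1 = 1.5 + 3.5 + 6.5 + 9 + 10 + 11 + 13 + 16 = 70.5$$
$R_2$ is the sum of the ranks in group A ($n_2 = 10$):
$$R_2 = 1.5 + 3.5 + 5 + 6.5 + 8 + 12 + 14 + 15 + 17 + 18 = 100.5$$
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 8 \cdot 10 + \frac{8 \cdot 9}{2} - 70.5 = 45.5$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 = 8 \cdot 10 + \frac{10 \cdot 11}{2} - 100.5 = 34.5$$
Notice that $U_1 + U_2 = n_1 n_2 = 80$, and
$$U = \min(U_1, U_2) = 34.5$$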
-
The Mann-Whitney U-test
From the table (Appendix, Table 4), a critical value of 17 is found for a two-sided test with $n_1 = 8$ and $n_2 = 10$.
If an observed U value is less than or equal to the value in the table, the null hypothesis may be rejected at the significance level of the table.
Since our calculated value U = 34.5 is larger than 17, we conclude that there is no significant difference between the two groups.
-
The Mann-Whitney U-test
We can now check whether the data used in this test show any tendency toward a normal distribution.
Groups A and B separately:
j    raw A   ranked A   (j-0.5)/10   raw B   ranked B   (j-0.5)/8
1    11.20   10.80      0.05         10.90   10.80      0.06
2    13.70   10.90      0.15         11.20   10.90      0.19
3    14.80   11.20      0.25         12.10   11.20      0.31
4    11.20   11.20      0.35         12.40   12.10      0.44
5    15.00   11.70      0.45         15.50   12.40      0.56
6    16.10   13.70      0.55         14.60   13.50      0.69
7    17.30   14.80      0.65         13.50   14.60      0.81
8    10.90   15.00      0.75         10.80   15.50      0.94
9    10.80   16.10      0.85
10   11.70   17.30      0.95
Groups A and B combined:
j    ranked A+B   (j-0.5)/18
1    10.80        0.03
2    10.80        0.08
3    10.90        0.14
4    10.90        0.19
5    11.20        0.25
6    11.20        0.31
7    11.20        0.36
8    11.70        0.42
9    12.10        0.47
10   12.40        0.53
11   13.50        0.58
12   13.70        0.64
13   14.60        0.69
14   14.80        0.75
15   15.00        0.81
16   15.50        0.86
17   16.10        0.92
18   17.30        0.97
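The (j - 0.5)/n columns are plotting positions: after sorting, the j-th of n values is paired with (j - 0.5)/n. A minimal sketch that regenerates these columns from the raw data (the helper name `plotting_positions` is hypothetical):

```python
def plotting_positions(values):
    """Sort the values and pair the j-th one (j = 1..n) with (j - 0.5) / n."""
    x = sorted(values)
    n = len(x)
    return [(v, (j - 0.5) / n) for j, v in enumerate(x, start=1)]


raw_a = [11.20, 13.70, 14.80, 11.20, 15.00, 16.10, 17.30, 10.90, 10.80, 11.70]
raw_b = [10.90, 11.20, 12.10, 12.40, 15.50, 14.60, 13.50, 10.80]

for name, data in [("A", raw_a), ("B", raw_b), ("A and B", raw_a + raw_b)]:
    print(name)
    for v, p in plotting_positions(data):
        print(f"  {v:6.2f}  {p:.2f}")
```

Plotting each value against its position on a normal probability scale gives a roughly straight line when the data are close to normally distributed.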
-
The Mann-Whitney U-test
[Figure: normal probability plot of Probability (0 to 1) versus Measurement (10 to 18) for Group A, Group B, and A and B combined]
-
Wilcoxon Matched Pairs Signed-Rank test
In this test, the differences $d_i$ of the paired data are calculated first.
These $d_i$ values are ranked without regard to sign, starting with the smallest value.
Then each rank is given the same sign as the corresponding difference.
If there are ties, the same rule (take the average rank) is applied as in the Mann-Whitney test.
If any $d_i$ value is zero, you can either drop it from the analysis or assign it a rank of (p + 1)/2, in which p is the number of zero differences.
In this case, half of the zero differences take a negative and the other half a positive rank.
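The ranking rules above can be sketched as follows. This version drops zero differences (the first of the two options mentioned) and uses average ranks for ties; `signed_ranks` is a hypothetical name:

```python
def signed_ranks(differences):
    """Wilcoxon signed ranks: rank |d| with average ranks for ties, then restore the sign.

    Zero differences are dropped, so the result only covers the non-zero d values.
    """
    d = [x for x in differences if x != 0]
    abs_sorted = sorted(abs(x) for x in d)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return [rank_of[abs(x)] if x > 0 else -rank_of[abs(x)] for x in d]


if __name__ == "__main__":
    r = [114, 49, 100, 20, 90, 106, 100, 95, 160, 110]
    t = [116, 42, 95, 10, 94, 100, 96, 102, 150, 104]
    print(signed_ranks([a - b for a, b in zip(r, t)]))
```

Applied to the differences d = R - T of the example that follows, it reproduces the signed-rank column of that table.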
-
Wilcoxon Matched Pairs Signed-Rank test
The null hypothesis is that methods A and B are equivalent:
$$H_0: A = B \qquad H_1: A \ne B$$
If $H_0$ is true, the sum of all ranks for positive differences ($T_+$) would be expected to be close to the sum for negative differences ($T_-$).
The test statistic for the two-sided case is then the Wilcoxon T:
$$T = \min(T_+, T_-)$$
The smaller the value of T, the larger the significance of the difference.
-
Wilcoxon Matched Pairs Signed-Rank test
Let's now work through the example:
sample R T d=R-T rank signed rank
1 114 116 -2 1 -1
2 49 42 7 7.5 7.5
3 100 95 5 4 4
4 20 10 10 9.5 9.5
5 90 94 -4 2.5 -2.5
6 106 100 6 5.5 5.5
7 100 96 4 2.5 2.5
8 95 102 -7 7.5 -7.5
9 160 150 10 9.5 9.5
10 110 104 6 5.5 5.5
-
Wilcoxon Matched Pairs Signed-Rank test
The critical (table) values of T as a function of n and α are given in Table 5 of the appendix.
In our example, the ranks of all positive differences add up to $T_+ = 44.0$ and those of all negative differences to $T_- = 11.0$, so $T = \min(T_+, T_-) = 11.0$.
If the calculated T value is equal to or smaller than the table value, the null hypothesis is rejected.
For α = 0.05 and n = 10 in our two-sided test, the table value is T = 8.
Since 11.0 > 8, the null hypothesis is accepted and we conclude that there is no significant difference between the two methods.
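The final decision can be checked from the signed ranks listed in the example table. The critical value 8 is hard-coded from the slide, not computed:

```python
# Signed ranks exactly as listed in the example table
signed = [-1, 7.5, 4, 9.5, -2.5, 5.5, 2.5, -7.5, 9.5, 5.5]
T_CRIT = 8  # critical T for n = 10, two-sided, alpha = 0.05 (value given on the slide)

n = len(signed)
t_plus = sum(rank for rank in signed if rank > 0)
t_minus = -sum(rank for rank in signed if rank < 0)
# the ranks 1..n always add up to n(n + 1) / 2
if t_plus + t_minus != n * (n + 1) / 2:
    raise ValueError("signed ranks do not cover 1..n")

t_stat = min(t_plus, t_minus)
decision = "reject H0" if t_stat <= T_CRIT else "accept H0"
print(f"T+ = {t_plus}, T- = {t_minus}, T = {t_stat} -> {decision}")
```

The check that $T_+ + T_- = n(n+1)/2 = 55$ is a quick way to catch ranking mistakes before comparing with the table.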