Test for Outliers

7/30/2019 Test for Outliers


One of the important aims of statistical tests is to recognize the presence or absence of outliers.

Outliers in a series of measurements are extraordinarily small or large observations compared with the bulk of the data.

There are several test procedures for detecting outliers in data, and we will look at Dixon's Q-test.

The Q-test is one of the most frequently used outlier test procedures.



The Q-test uses the range of the measurements and can be applied even when only a few data are available.

The n measurements are arranged in ascending order.

The very small value to be tested as an outlier is denoted by x1 and the very large value by xn.

The test statistic is then calculated as given below.



For the smallest one:

\[ Q_1 = \frac{|x_2 - x_1|}{|x_n - x_1|} \]



For the largest one:

\[ Q_n = \frac{|x_n - x_{n-1}|}{|x_n - x_1|} \]



The null hypothesis, i.e., that the considered measurement is not an outlier, is accepted if the quantity Q < Q(1-α; n).

If Q > Q(1-α; n), we reject the null hypothesis and say that the value is an outlier.

Critical Q values for selected significance levels and numbers of measurements n are given in the table and in your book in Table 2.10.



Example 2.8

Trace analysis of polycyclic aromatic hydrocarbons (PAH) in a soil revealed the following values for the trace constituent benzo[a]pyrene, in mg/kg dry weight:

5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15

Apply the Q-test to check whether the smallest and the largest value might be outliers.



First we need to arrange the data in ascending order:

5.00, 5.10, 5.10, 5.15, 5.20, 5.30, 6.20

Then we can calculate the Q value for both the smallest and the largest value. For the smallest value:

\[ Q_1 = \frac{|x_2 - x_1|}{|x_n - x_1|} = \frac{|5.10 - 5.00|}{|6.20 - 5.00|} = 0.083 \]



For the largest value:

\[ Q_n = \frac{|x_n - x_{n-1}|}{|x_n - x_1|} = \frac{|6.20 - 5.30|}{|6.20 - 5.00|} = 0.75 \]

For α = 0.01 we can use Table 2.10 and obtain the table value as

Q(1 - 0.01 = 0.99; n = 7) = 0.64

Since the Q value for the smallest measurement (Q_1 = 0.083) is much smaller than the table value, we cannot eliminate the smallest value as an outlier.

However, the Q value for the largest measurement (Q_n = 0.75) is larger than the table value, and for this reason we can eliminate the largest value as an outlier.
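The arithmetic of Example 2.8 can be checked with a short Python sketch. The function name `dixon_q` is an illustrative choice, not something from the slides, and the critical value 0.64 is simply the table value quoted above rather than something the code derives.

```python
def dixon_q(values):
    """Return (Q_1, Q_n): Dixon's Q for the smallest and the largest value."""
    x = sorted(values)  # the test works on the data in ascending order
    if len(x) < 3:
        raise ValueError("need at least three measurements")
    data_range = x[-1] - x[0]
    if data_range == 0:
        raise ValueError("all values are equal, so Q is undefined")
    q_smallest = (x[1] - x[0]) / data_range    # gap between the two lowest values
    q_largest = (x[-1] - x[-2]) / data_range   # gap between the two highest values
    return q_smallest, q_largest


# Benzo[a]pyrene values from Example 2.8, in mg/kg dry weight.
bap = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
Q_CRIT = 0.64  # table value Q(0.99; n=7) quoted in the text

q1, qn = dixon_q(bap)
print(f"Q_1 = {q1:.3f}, flagged as outlier: {q1 > Q_CRIT}")
print(f"Q_n = {qn:.3f}, flagged as outlier: {qn > Q_CRIT}")
```

Only the largest value, 6.20, exceeds the critical value, which matches the conclusion of the example.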


F test Example

Confidence interval is

\[ (0.255)\,\frac{14.5}{10.8} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{14.5}{10.8}\,(3.59) \]

\[ 0.34 \le \frac{\sigma_1^2}{\sigma_2^2} \le 4.82 \]



Grubbs's Test for Outliers

It can be applied to series of measurements consisting of 3 to 150 measurements.

The null hypothesis, according to which x* is not an outlier within the measurement series of n values, is accepted at level α if the test quantity T satisfies:

\[ T = \frac{|\bar{x} - x^*|}{s} < T(1-\alpha; n) \]

With the test quantity T, the distance of the suspicious value from the mean is determined and related to the standard deviation of the measurements.




Example 2.9

The data for the trace analysis of benzo[a]pyrene from the previous example are also used in Grubbs's test.

The mean of the data was 5.29 and the standard deviation was 0.411.

Then we can calculate the T values for the smallest and the largest values as

\[ T_1 = \frac{|\bar{x} - x_1^*|}{s} = \frac{|5.29 - 5.00|}{0.411} = 0.71 \]

\[ T_n = \frac{|\bar{x} - x_n^*|}{s} = \frac{|5.29 - 6.20|}{0.411} = 2.21 \]



The table value at α = 0.01 is

T(1 - 0.01 = 0.99; n = 7) = 2.10

As a result, the test is not significant for the smallest value (0.71 < 2.10) but is significant for the largest value (2.21 > 2.10).

So the largest value is an outlier.
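Example 2.9 can be reproduced with a minimal Python sketch. The helper name `grubbs_t` is hypothetical, `statistics.stdev` is the sample standard deviation (divisor n - 1), which is assumed to be the s used on the slides, and the critical value 2.10 is taken from the text rather than computed.

```python
from statistics import mean, stdev


def grubbs_t(values, suspect):
    """Grubbs's test quantity T = |mean - suspect| / s for one suspect value."""
    return abs(mean(values) - suspect) / stdev(values)


# Benzo[a]pyrene values from Example 2.8, in mg/kg dry weight.
bap = [5.30, 5.00, 5.10, 5.20, 5.10, 6.20, 5.15]
T_CRIT = 2.10  # table value T(0.99; n=7) quoted in the text

t_small = grubbs_t(bap, min(bap))  # suspect: smallest value, 5.00
t_large = grubbs_t(bap, max(bap))  # suspect: largest value, 6.20

print(f"mean = {mean(bap):.2f}, s = {stdev(bap):.3f}")
print(f"T_1 = {t_small:.2f}, significant: {t_small > T_CRIT}")
print(f"T_n = {t_large:.2f}, significant: {t_large > T_CRIT}")
```

Working from the unrounded mean and standard deviation gives the same two-decimal T values as the hand calculation with 5.29 and 0.411.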


Non-parametric Tests for Method Comparison

The tests that we have seen so far all require that the data be normally distributed.

If this is not the case, distribution-free methods need to be used.

These methods do not require parameters such as the mean and standard deviation used in the previous tests.

For that reason, they are called non-parametric methods.

These methods require more replicate measurements.

They do not use the values of the quantitative variables.

They use the ranks of the data and are based on counting.


We will look at two examples of non-parametric tests. These are:

The Mann-Whitney U-test for the comparison of independent samples

The Wilcoxon T-test for paired measurements

When normality is doubtful, you should always consider these tests, especially in the case of small samples.



The Mann-Whitney U-test

This test is based on ranking the results of both groups (group A and group B) taken together.

It gives rank 1 to the lowest result, rank 2 to the second lowest, etc.

If n1 and n2 are the numbers of data in the groups with the smaller and the larger number of results, respectively, and R1 and R2 are the sums of the ranks in these two groups, then we can set up the equations as:

\[ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \]

\[ U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 \]




The smaller of the two U values is used to evaluate the test.

When there are ties, the tied results are each given the average of their ranks.

The Mann-Whitney test compares the medians of the two samples.

The smaller the difference between the medians, the smaller the difference between U1 and U2.

\[ H_0: U_1 = U_2 \]

\[ H_1: U_1 \neq U_2 \]




Example: The following two groups of measurements are to be compared.

A       B
11.2    10.9
13.7    11.2
14.8    12.1
11.2    12.4
15.0    15.5
16.1    14.6
17.3    13.5
10.9    10.8
10.8
11.7

The lowest result, 10.8, would be given rank 1.

Since 10.8 occurs twice, once in group A and once in group B, both are given the average rank:

\[ \text{Rank} = \frac{1 + 2}{2} = 1.5 \]




If we set the hypotheses as

\[ H_0: U_1 = U_2 \]

\[ H_1: U_1 \neq U_2 \]

this will be a two-sided test.

Group  result  rank    Group  result  rank
A      10.8     1.5    B      12.4    10
B      10.8     1.5    B      13.5    11
A      10.9     3.5    A      13.7    12
B      10.9     3.5    B      14.6    13
A      11.1     5      A      14.8    14
A      11.2     6.5    A      15.0    15
B      11.2     6.5    B      15.5    16
A      11.7     8      A      16.1    17
B      12.1     9      A      17.3    18
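The ranking step, including the average-rank rule for ties, can be sketched in Python as follows. The function name `average_ranks` is an illustrative choice, and the data are the results as they appear in the rank table, which lists 11.1 for one group A result.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Results as listed in the rank table (10 values in group A, 8 in group B).
group_a = [10.8, 10.9, 11.1, 11.2, 11.7, 13.7, 14.8, 15.0, 16.1, 17.3]
group_b = [10.8, 10.9, 11.2, 12.1, 12.4, 13.5, 14.6, 15.5]

ranks = average_ranks(group_a + group_b)  # rank both groups together
rank_sum_a = sum(ranks[:len(group_a)])
rank_sum_b = sum(ranks[len(group_a):])
print(f"rank sum A = {rank_sum_a}, rank sum B = {rank_sum_b}")
```

The two rank sums always add up to N(N + 1)/2 for N pooled results (here 18 * 19 / 2 = 171), which is a quick check on any hand-built rank table.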




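R1 is the sum of the ranks in group B:

R1 = 1.5 + 3.5 + 6.5 + 9 + 10 + 11 + 13 + 16 = 70.5

R2 is the sum of the ranks in group A:

R2 = 1.5 + 3.5 + 5 + 6.5 + 8 + 12 + 14 + 15 + 17 + 18 = 100.5

\[ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 = 8 \cdot 10 + \frac{8 (8 + 1)}{2} - 70.5 = 45.5 \]

\[ U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 = 8 \cdot 10 + \frac{10 (10 + 1)}{2} - 100.5 = 34.5 \]

Notice that

\[ U_1 + U_2 = n_1 n_2 \]

\[ U = \min(U_1, U_2) = 34.5 \]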




From the table (Appendix, Table 4), for a two-sided test with n1 = 8 and n2 = 10, a value of 17 is found.

If an observed U value is less than or equal to the value in the table, the null hypothesis may be rejected at the significance level of the table.

Since our calculated value (U = 34.5) is larger than 17, we conclude that there is no significant difference between the two groups.
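The step from the rank sums to the decision can be sketched in Python. The function name `mann_whitney_u` is hypothetical. The rank sums 70.5 and 100.5 and the critical value 17 are taken from the text rather than recomputed.

```python
def mann_whitney_u(n1, n2, r1, r2):
    """Return (U1, U2) from the group sizes and the rank sums R1 and R2."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


n1, n2 = 8, 10         # group B (smaller group) and group A (larger group)
r1, r2 = 70.5, 100.5   # rank sums of group B and group A from the text
U_CRIT = 17            # two-sided table value for n1=8, n2=10 quoted in the text

u1, u2 = mann_whitney_u(n1, n2, r1, r2)
u = min(u1, u2)        # the smaller U value is the test statistic

# Consistency check: the two U values always add up to n1 * n2.
assert u1 + u2 == n1 * n2

reject_h0 = u <= U_CRIT
print(f"U1 = {u1}, U2 = {u2}, U = {u}, reject H0: {reject_h0}")
```

Here U = 34.5 is above the table value, so the null hypothesis is retained, in line with the conclusion above.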


The Mann-Whitney U-test

We can now check whether the data used in this test show any tendency toward a normal distribution.

sample  raw A  raw B  ranked A  ranked B  (j-0.5)/10  (j-0.5)/8  ranked A and B  (j-0.5)/18
1       11.20  10.90  10.80     10.80     0.05        0.06       10.80           0.03
2       13.70  11.20  10.90     10.90     0.15        0.19       10.80           0.08
3       14.80  12.10  11.20     11.20     0.25        0.31       10.90           0.14
4       11.20  12.40  11.20     12.10     0.35        0.44       10.90           0.19
5       15.00  15.50  11.70     12.40     0.45        0.56       11.20           0.25
6       16.10  14.60  13.70     13.50     0.55        0.69       11.20           0.31
7       17.30  13.50  14.80     14.60     0.65        0.81       11.20           0.36
8       10.90  10.80  15.00     15.50     0.75        0.94       11.70           0.42
9       10.80         16.10               0.85                   12.10           0.47
10      11.70         17.30               0.95                   12.40           0.53
11                                                               13.50           0.58
12                                                               13.70           0.64
13                                                               14.60           0.69
14                                                               14.80           0.75
15                                                               15.00           0.81
16                                                               15.50           0.86
17                                                               16.10           0.92
18                                                               17.30           0.97
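The probability columns of the table follow the rule (j - 0.5)/n, where j is the position of a value in the sorted data and n is the number of values. A minimal Python sketch, with the hypothetical helper name `plotting_positions` and the raw group B values from the table:

```python
def plotting_positions(n):
    """Cumulative probabilities (j - 0.5) / n for positions j = 1..n."""
    return [(j - 0.5) / n for j in range(1, n + 1)]


# Raw group B values from the table; sorting gives the "ranked B" column.
group_b = [10.9, 11.2, 12.1, 12.4, 15.5, 14.6, 13.5, 10.8]

# Pair each sorted value with its cumulative probability.
for value, p in zip(sorted(group_b), plotting_positions(len(group_b))):
    print(f"{value:5.2f}  {p:.2f}")
```

Plotting these pairs on normal probability paper, or against the corresponding normal quantiles, gives a roughly straight line when the data are close to normally distributed.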


Figure: Normal probability plot (x-axis: Measurement, 10.00 to 18.00; y-axis: Probability, 0.00 to 1.00) for Group A, Group B, and A and B combined.



Wilcoxon Matched Pairs Signed-Rank test

In this test, the differences (di) of the paired data are calculated first.

These di values are then ranked without regard to sign, starting with the smallest value.

Each rank is then given the same sign as the corresponding difference.

If there are ties, the same rule (take the average) is applied as in the Mann-Whitney test.

If any di value is zero, you can either drop it from the analysis or assign it a rank of (p+1)/2, in which p is the number of zero differences.

In the latter case, half of the zero differences take a negative rank and the other half a positive rank.




The null hypothesis is that the methods A and B are equivalent.

\[ H_0: A = B \]

\[ H_1: A \neq B \]

If H0 is true, it would be expected that the sum of all ranks for positive differences (T+) would be close to the sum for negative differences (T-).

For the two-sided case, the test statistic of the Wilcoxon T-test is calculated as: T = min(T+, T-)

The smaller the value of T, the larger the significance of the difference.




Let's now do the example.

sample  R    T    d=R-T  rank  signed rank
1       114  116  -2     1     -1
2       49   42    7     7.5    7.5
3       100  95    5     4      4
4       20   10    10    9.5    9.5
5       90   94   -4     2.5   -2.5
6       106  100   6     5.5    5.5
7       100  96    4     2.5    2.5
8       95   102  -7     7.5   -7.5
9       160  150   10    9.5    9.5
10      110  104   6     5.5    5.5




The critical (table) values of T as a function of n and α are given in Table 5 of the appendix.

In our example, the ranks of all positive differences add up to T+ = 44.0 and those of all negative differences to T- = 11.0, so T = min(T+, T-) = 11.0.

If the calculated T value is equal to or smaller than the table value, the null hypothesis is rejected.

For α = 0.05 and n = 10 in our two-sided test, the table value is T = 8.

Since 11.0 is larger than 8, the null hypothesis is accepted and we can conclude that there is no significant difference between the two methods.
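The whole Wilcoxon example can be retraced with a short Python sketch. The function name `signed_ranks` is hypothetical, zero differences are simply dropped (one of the two options described earlier), and the critical value 8 is the table value quoted in the text.

```python
def signed_ranks(differences):
    """Rank |d| from smallest to largest (ties share the mean rank), then restore signs."""
    nonzero = [d for d in differences if d != 0]  # zero differences are dropped here
    ordered = sorted(abs(d) for d in nonzero)

    def mean_rank(a):
        first = ordered.index(a) + 1       # lowest rank occupied by this value
        count = ordered.count(a)           # number of tied results
        return first + (count - 1) / 2     # mean of the ranks the ties occupy

    return [mean_rank(abs(d)) if d > 0 else -mean_rank(abs(d)) for d in nonzero]


# R and T columns of the example table.
r_values = [114, 49, 100, 20, 90, 106, 100, 95, 160, 110]
t_values = [116, 42, 95, 10, 94, 100, 96, 102, 150, 104]
T_CRIT = 8  # two-sided table value for alpha = 0.05, n = 10, quoted in the text

d = [r - t for r, t in zip(r_values, t_values)]
ranks = signed_ranks(d)
t_plus = sum(s for s in ranks if s > 0)    # sum of ranks of positive differences
t_minus = -sum(s for s in ranks if s < 0)  # sum of ranks of negative differences
t_stat = min(t_plus, t_minus)

print(f"T+ = {t_plus}, T- = {t_minus}, T = {t_stat}, reject H0: {t_stat <= T_CRIT}")
```

The signed ranks produced this way match the last column of the example table, and T+ and T- together add up to n(n + 1)/2 = 55 for the n = 10 non-zero differences.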
