Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error

Sergei Zhilin, [email protected]
Altai State University, Barnaul, Russia


Plan

• Fitting under interval error

• Simple method for outlier detection

• Geometric correction of satellite images

• Connections between the proposed approach and other theories

• Conclusions

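Fitting under Interval Error

• Black box approach: $y = f(x, \beta) + \varepsilon$

[Diagram: a black box with inputs $x_1, x_2, \dots, x_p$ and output $y$]

• Input variables $x = (x_1, \dots, x_p)$ are measured without error

• Output variable $y$ is measured with error

• $f$ is a modeling function with known structure

• $\beta$ are the model parameters to be estimated

• $\varepsilon$ is the measurement error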

Fitting under Interval Error

• The classical statistical approach often assumes that the measurement error is normal

• In real-life applications the error is better described as interval than as normal

• “Interval” means “unknown but bounded”: the error lies in $[-\varepsilon_j, \varepsilon_j]$, where $\varepsilon_j$ is the error bound of the $j$-th measurement, $j = 1, \dots, n$

• There are no other assumptions about the error

Fitting under Interval Error

• The structure of the modeling function $f(x, \beta)$ is assumed fixed

• Each row $(x_j, y_j, \varepsilon_j)$ of the measurements table constrains the possible values of the parameter $\beta$ to the set

$$S_j = \{\beta : y_j - \varepsilon_j \le f(x_j, \beta) \le y_j + \varepsilon_j\}, \quad j = 1, \dots, n$$

• Values of the parameter consistent with all constraints form the uncertainty set

$$A = \bigcap_{j=1}^{n} S_j$$
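
The membership test behind the sets $S_j$ and $A$ can be sketched in a few lines of Python. This is a minimal illustration, assuming a straight-line model $f(x, \beta) = \beta_1 + \beta_2 x$; the function names and the three data rows are hypothetical and are not taken from the slides.

```python
def f(x, beta):
    """Straight-line model f(x, beta) = beta1 + beta2 * x (an assumed example)."""
    return beta[0] + beta[1] * x


def in_S(beta, row):
    """True if beta satisfies the constraint S_j of one row (x_j, y_j, eps_j)."""
    x, y, eps = row
    return y - eps <= f(x, beta) <= y + eps


def in_A(beta, rows):
    """True if beta lies in the uncertainty set A, the intersection of all S_j."""
    return all(in_S(beta, row) for row in rows)


# Hypothetical measurements table: rows of (x_j, y_j, eps_j).
rows = [(0.0, 1.0, 0.5), (1.0, 2.0, 0.5), (2.0, 3.0, 0.5)]

print(in_A((1.0, 1.0), rows))  # the line y = 1 + x passes through every interval
print(in_A((1.0, 2.0), rows))  # the line y = 1 + 2x misses the intervals at x = 1 and x = 2
```

A single violated row is enough to put a parameter value outside $A$, which is why one outlier can make the whole set empty.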

Fitting under Interval Error

• Fitting data with the model $y = \beta_1 + \beta_2 x$

[Figure, two panels: in the $(x, y)$ domain, the set of feasible models; in the $(\beta_1, \beta_2)$ domain, the uncertainty set $A$]

• If the uncertainty set $A$ is unbounded, there is not enough data to build the model

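Fitting under Interval Error

• Problems that may be stated with respect to the uncertainty set $A$

– Model parameters estimation

• Interval estimates of $\beta$: $[\underline{\beta}_1, \overline{\beta}_1], \dots, [\underline{\beta}_p, \overline{\beta}_p]$, where

$$\underline{\beta}_i = \min_{\beta \in A} \beta_i, \quad \overline{\beta}_i = \max_{\beta \in A} \beta_i, \quad i = 1, \dots, p$$

• Point estimates of $\beta$: $\hat{\beta}_1, \dots, \hat{\beta}_p$, where

$$\hat{\beta}_i = \tfrac{1}{2}\left(\underline{\beta}_i + \overline{\beta}_i\right), \quad i = 1, \dots, p$$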
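
For a straight-line model the interval and point estimates of the parameters can be computed without an LP solver, because a bounded uncertainty set is a polygon and a linear function reaches its extremes at a vertex. The sketch below uses that brute-force substitute for the min/max problems. It assumes the model $y = \beta_1 + \beta_2 x$ and a non-empty, bounded uncertainty set; the data and the function name `vertices_of_A` are hypothetical.

```python
from itertools import combinations

# Hypothetical measurements table: rows of (x_j, y_j, eps_j).
rows = [(0.0, 1.0, 0.5), (1.0, 2.0, 0.5), (2.0, 3.0, 0.5)]


def vertices_of_A(rows, tol=1e-9):
    """Vertices of A for the line model y = beta1 + beta2 * x.

    Each row gives two boundary lines beta1 + x_j * beta2 = y_j -/+ eps_j.
    A vertex of A is an intersection of two boundary lines that satisfies
    every constraint.
    """
    lines = [(x, y + s * eps) for x, y, eps in rows for s in (-1.0, 1.0)]
    found = []
    for (x1, c1), (x2, c2) in combinations(lines, 2):
        if x1 == x2:
            continue  # parallel boundary lines never intersect
        b2 = (c2 - c1) / (x2 - x1)
        b1 = c1 - x1 * b2
        if all(abs(y - b1 - x * b2) <= eps + tol for x, y, eps in rows):
            found.append((b1, b2))
    return found


verts = vertices_of_A(rows)
if not verts:
    raise ValueError("the uncertainty set is empty or has no vertices")

estimates = []  # one (lower, upper, point) triple per parameter
for i in range(2):
    lo = min(v[i] for v in verts)
    hi = max(v[i] for v in verts)
    estimates.append((lo, hi, (lo + hi) / 2))
    print(f"beta_{i + 1}: interval [{lo:.3f}, {hi:.3f}], point {(lo + hi) / 2:.3f}")
```

For these three rows both parameters get the interval [0.5, 1.5] and the point estimate 1. With more than two parameters, or a set that may be unbounded, a linear programming solver is the safer choice.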

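Fitting under Interval Error

• Problems that may be stated with respect to the uncertainty set $A$

– Prediction of the output variable value for fixed values of input variables

• Interval estimate of $y$: $[\underline{y}(x), \overline{y}(x)]$, where

$$\underline{y}(x) = \min_{\beta \in A} \beta^{T} x, \quad \overline{y}(x) = \max_{\beta \in A} \beta^{T} x$$

• Point estimate of $y$:

$$\hat{y}(x) = \tfrac{1}{2}\left(\underline{y}(x) + \overline{y}(x)\right)$$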
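
As a worked illustration (the two measurements are hypothetical, not from the slides), take the model $y = \beta_1 + \beta_2 x$, so that the regressor vector is $(1, x)$, with rows $(x_j, y_j, \varepsilon_j) = (0, 1, 0.5)$ and $(2, 3, 0.5)$. The two constraints bound $\beta_1$ and $\beta_1 + 2\beta_2$ independently, so the prediction bounds follow by writing the predicted value as a combination of these two quantities:

```latex
\begin{align*}
A &= \{\beta : 0.5 \le \beta_1 \le 1.5,\; 2.5 \le \beta_1 + 2\beta_2 \le 3.5\} \\
\beta_1 + 3\beta_2 &= \tfrac{3}{2}(\beta_1 + 2\beta_2) - \tfrac{1}{2}\beta_1 \\
\underline{y}(3) &= \tfrac{3}{2}\cdot 2.5 - \tfrac{1}{2}\cdot 1.5 = 3, \qquad
\overline{y}(3) = \tfrac{3}{2}\cdot 3.5 - \tfrac{1}{2}\cdot 0.5 = 5, \qquad
\hat{y}(3) = 4 \\
\beta_1 + \beta_2 &= \tfrac{1}{2}(\beta_1 + 2\beta_2) + \tfrac{1}{2}\beta_1 \\
\underline{y}(1) &= \tfrac{1}{2}\cdot 2.5 + \tfrac{1}{2}\cdot 0.5 = 1.5, \qquad
\overline{y}(1) = \tfrac{1}{2}\cdot 3.5 + \tfrac{1}{2}\cdot 1.5 = 2.5, \qquad
\hat{y}(1) = 2
\end{align*}
```

The interval at $x = 3$, outside the measured range, is twice as wide as the one at $x = 1$, between the two measurements.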

Fitting under Interval Error

• All the above problems make sense only if the uncertainty set is not empty

• Possible reasons for the emptiness of the uncertainty set

– Presence of outliers in the data set

– Wrong structure assumed for the modeling function

Simple method for outlier detection

• Core idea

– An outlier may be treated as a measurement with an underestimated error (i.e. the actual measurement error is greater than the error bound $\varepsilon_j$ declared for it)

– What are the lower bounds $\varepsilon_j'$ on the actual errors that make the uncertainty set non-empty?

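Simple method for outlier detection

• How much must we stretch the declared error interval in order to «correct» an outlier?

[Figure, two panels: in the variables domain $(x, y)$ and in the parameters domain $(\beta_1, \beta_2)$; the declared error $\varepsilon_j$ and the stretched error $\varepsilon_j'$ are marked]

• Let $\varepsilon_j' = w_j \cdot \varepsilon_j$. What is $w_j$?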

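Simple method for outlier detection

• Weights $w_j$ may be found from the following optimization problem

$$\min_{w,\, \beta} \sum_{j=1}^{n} w_j \qquad (1)$$

$$y_j - w_j \varepsilon_j \le f(x_j, \beta) \le y_j + w_j \varepsilon_j, \quad j = 1, \dots, n \qquad (2)$$

$$w_j \ge 1, \quad j = 1, \dots, n \qquad (3)$$

– (2): uncertainty set constraints with movable bounds

– (3): we can only enlarge error intervals…

• …or “freeze” some of the error intervals, restricting (3) to the first $k$ measurements and fixing the rest

$$w_j \ge 1, \quad j = 1, \dots, k \qquad (3)$$

$$w_j = 1, \quad j = k+1, \dots, n \qquad (4)$$

• Some of the measurements are obtained with equal errors, so their weights are tied within each such group

$$w_{j_1} = w_{j_2} = \dots, \quad \dots, \quad w_{m_1} = w_{m_2} = \dots \qquad (5)$$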
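
When the model is linear in the parameters, say $f(x_j, \beta) = x_j^{T}\beta$ (an assumption made here for illustration), problem (1)-(3) is a linear program in the joint unknowns $(\beta, w)$. Splitting each two-sided constraint (2) into two one-sided inequalities gives a form that a generic LP solver accepts:

```latex
\begin{align*}
\min_{\beta,\, w} \quad & \sum_{j=1}^{n} w_j \\
\text{subject to} \quad
&  x_j^{T}\beta - \varepsilon_j w_j \le  y_j, && j = 1, \dots, n, \\
& -x_j^{T}\beta - \varepsilon_j w_j \le -y_j, && j = 1, \dots, n, \\
& -w_j \le -1, && j = 1, \dots, n.
\end{align*}
```

There are $p + n$ unknowns and $3n$ inequality constraints. Constraints of type (4) and (5) are linear equalities in $w$, so adding them keeps the problem linear.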

Simple method for outlier detection

• Example: data with outliers which give an empty uncertainty set

#     Measurement method    x     y        $\varepsilon$    $w$
1     A                     1     2.13     0.20             1.000
2     A                     2     2.95     0.20             1.000
3     A                     3     5.01     0.20             4.686
4     A                     4     4.99     0.20             1.000
5     A                     5     5.97     0.20             1.000
6     B                     6     7.04     0.40             1.000
7     B                     7     8.02     0.40             1.000
8     C                     8     8.15     0.40             1.343
9     C                     9     10.01    0.40             1.000
10    D                     10    10.98    0.50             1.000

[Figure: the data and the line $y = \beta_1 + \beta_2 x$ in the $(x, y)$ domain, $x$ from 1 to 10, $y$ from 1 to 11]

• 1st attempt: solution of LPP (1)-(3), shown in column $w$

– Measurement #3 ($w = 4.686$) looks like an outlier caused by a blunder. Let’s try to exclude it.

– Measurement #8 ($w = 1.343$) is not so explicit. We need to examine the precision of method C.

Simple method for outlier detection

• Example

#     Measurement method    x     y        $\varepsilon$    $w$
1     A                     1     2.13     0.20             1.000
2     A                     2     2.95     0.20             1.000
3     A                     3     5.01     0.20             excluded
4     A                     4     4.99     0.20             1.000
5     A                     5     5.97     0.20             1.000
6     B                     6     7.04     0.40             1.000
7     B                     7     8.02     0.40             1.000
8     C                     8     8.15     0.40             1.143
9     C                     9     10.01    0.40             1.143
10    D                     10    10.98    0.50             1.000

[Figure: the data and the line $y = \beta_1 + \beta_2 x$ in the $(x, y)$ domain, $x$ from 1 to 10, $y$ from 1 to 11]

• 2nd attempt: solution of (1) subject to (2)-(3) and $w_8 = w_9$, with measurement #3 excluded

– Is the precision of method C overestimated by ~14%?

Summary

In order to correct an inconsistent data set we have to answer the following questions:

1. Is the outlier #3 really caused by a blunder?

2. Is the outlier #8 caused by a blunder OR is the precision of the method C overestimated?
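
Both attempts can be reproduced with a short script. The slides solve a linear programming problem; the sketch below swaps in a brute-force search that is valid only for a two-parameter line $y = \beta_1 + \beta_2 x$: the cost $\sum w_j$ is convex and piecewise linear in $(\beta_1, \beta_2)$, so it is enough to evaluate it at the crossings of its kink lines. The data are the example table from the slides; the function names `weights` and `solve` are hypothetical.

```python
from itertools import combinations

# Example data from the slides: measurement number -> (x_j, y_j, eps_j).
ROWS = {
    1: (1, 2.13, 0.20), 2: (2, 2.95, 0.20), 3: (3, 5.01, 0.20),
    4: (4, 4.99, 0.20), 5: (5, 5.97, 0.20), 6: (6, 7.04, 0.40),
    7: (7, 8.02, 0.40), 8: (8, 8.15, 0.40), 9: (9, 10.01, 0.40),
    10: (10, 10.98, 0.50),
}


def weights(beta, rows, groups=()):
    """Smallest w_j >= 1 that make beta feasible; rows in a group share one w."""
    w = {k: max(1.0, abs(y - beta[0] - beta[1] * x) / eps)
         for k, (x, y, eps) in rows.items()}
    for group in groups:
        shared = max(w[k] for k in group)
        for k in group:
            w[k] = shared
    return w


def solve(rows, groups=()):
    """Minimise sum(w_j) over lines y = beta1 + beta2 * x by brute force.

    Kink lines of the cost are stored as (p, q, r), meaning
    p * beta1 + q * beta2 = r; candidate optima are their crossings.
    """
    kinks = []
    for x, y, eps in rows.values():
        kinks += [(1.0, x, y - eps), (1.0, x, y + eps)]  # interval end points
    for group in groups:
        for i, j in combinations(group, 2):
            (xi, yi, ei), (xj, yj, ej) = rows[i], rows[j]
            for s in (-1.0, 1.0):  # scaled residuals equal up to sign
                kinks.append((1 / ei - s / ej, xi / ei - s * xj / ej,
                              yi / ei - s * yj / ej))
    best = None
    for (p1, q1, r1), (p2, q2, r2) in combinations(kinks, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:
            continue  # parallel kink lines
        beta = ((r1 * q2 - r2 * q1) / det, (p1 * r2 - p2 * r1) / det)
        w = weights(beta, rows, groups)
        cost = sum(w.values())
        if best is None or cost < best[0]:
            best = (cost, beta, w)
    return best


# 1st attempt: all ten measurements, problem (1)-(3).
_, beta_1st, w_1st = solve(ROWS)
print("1st attempt:", {k: round(v, 3) for k, v in w_1st.items()})

# 2nd attempt: measurement #3 excluded and w8 = w9 imposed.
rows_2nd = {k: row for k, row in ROWS.items() if k != 3}
_, beta_2nd, w_2nd = solve(rows_2nd, groups=[(8, 9)])
print("2nd attempt:", {k: round(v, 3) for k, v in w_2nd.items()})
```

The first run gives weights of about 4.686 for measurement #3 and 1.343 for #8, and the second gives about 1.143 for #8 and #9, matching the values in the tables. For models with more parameters the enumeration grows quickly, and an LP solver is the practical route.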

Geometric correction of satellite images

[Figure: the distorted image with axes $(x, y)$ and the target coordinate system with axes $(u, v)$; ground control points are marked «+»]

• Ground Control Points

#     Source coordinates    Target coordinates
      x       y             u           v
1     2935    3072          14486.30    5991.49
2     2045    2745          14349.30    5927.55
14    1795    2714          14309.60    5919.30

– Source coordinates are pointed by the operator on the screen with an error ≥ 1 pixel

– Target coordinates are obtained using high-precision methods (GPS, large-scale maps)

• Geometric transformation

$$x = a_{00} + a_{10} u + a_{01} v + a_{11} uv + a_{20} u^2 + a_{02} v^2$$

$$y = b_{00} + b_{10} u + b_{01} v + b_{11} uv + b_{20} u^2 + b_{02} v^2$$

• Outliers are detected «on the fly» and the operator is notified about the error

• After correction of outliers and building the transformation, the target image is built
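
The second-order polynomial transformation is linear in its coefficients, so each ground control point supplies one interval constraint of the same kind as before for $x$ and one for $y$. A minimal sketch of the transformation itself follows; the function names and the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def design_row(u, v):
    """Regressors of the second-order polynomial: 1, u, v, uv, u^2, v^2."""
    return [1.0, u, v, u * v, u * u, v * v]


def transform(u, v, a, b):
    """Map target coordinates (u, v) to source coordinates (x, y).

    a and b hold the coefficients (a00, a10, a01, a11, a20, a02) and
    (b00, b10, b01, b11, b20, b02), in the order used by design_row.
    """
    row = design_row(u, v)
    x = sum(c * t for c, t in zip(a, row))
    y = sum(c * t for c, t in zip(b, row))
    return x, y


# Hypothetical coefficients: a shift and a scale, no higher-order terms.
a = [10.0, 2.0, 0.0, 0.0, 0.0, 0.0]
b = [-5.0, 0.0, 3.0, 0.0, 0.0, 0.0]
print(transform(1.0, 2.0, a, b))  # (12.0, 1.0)
```

With `design_row(u_j, v_j)` as the regressor vector, the six coefficients of $a$ (and, separately, of $b$) play the role of $\beta$ in the fitting problem.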

Geometric correction of satellite images

[Figure: resulting image with ground control points]

[Figure: resulting image with positional uncertainty map; scale: positional uncertainty $(\overline{x} - \underline{x}) + (\overline{y} - \underline{y})$, pixels]

Connections with other theories

• Proposed approach and inconsistent linear programming problems

– When outliers are present in the data, most of the problems with respect to the uncertainty set may be stated as inconsistent linear programming problems

– The simple outlier detection method may be regarded as one of the possible ways to correct an inconsistent linear programming problem by building a minimal-cost approximation of it by a proper linear programming problem

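Connections with other theories

• Proposed approach and robust estimation

$$\min_{w,\, \beta} \sum_{j=1}^{n} w_j \qquad (1)$$

$$y_j - w_j \varepsilon_j \le f(x_j, \beta) \le y_j + w_j \varepsilon_j, \quad j = 1, \dots, n \qquad (2)$$

$$w_j \ge 1, \quad j = 1, \dots, n \qquad (3)$$

– (2): uncertainty set constraints with movable bounds

– (3): we can only enlarge error intervals…

• Replace (3) with

$$w_j \ge 0, \quad j = 1, \dots, n \qquad (3')$$

– We allow the error intervals to be scaled freely (to expand and to contract)

• Solution $(\beta^*, w^*)$ of (1), (2), (3') gives

– $\beta^*$ is the M-estimator of the parameters known as $L_1$

– Weight function: $W(x) = 1/|x|$

– Residuals: $w_j^* \cdot \varepsilon_j$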
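
The link to the $L_1$ estimator can be made explicit with one elimination step. This is a short derivation sketch, assuming every error bound $\varepsilon_j$ is positive: for a fixed $\beta$, constraint (2) requires $w_j \varepsilon_j \ge |y_j - f(x_j, \beta)|$, and with only $w_j \ge 0$ imposed the cheapest feasible weight meets this bound with equality.

```latex
\begin{align*}
w_j^{*}(\beta) &= \frac{|y_j - f(x_j, \beta)|}{\varepsilon_j}, \\
\min_{w,\,\beta} \sum_{j=1}^{n} w_j
  &= \min_{\beta} \sum_{j=1}^{n} \frac{|y_j - f(x_j, \beta)|}{\varepsilon_j}, \\
|y_j - f(x_j, \beta^{*})| &= w_j^{*}\, \varepsilon_j .
\end{align*}
```

So $\beta^*$ minimizes a sum of absolute residuals scaled by $1/\varepsilon_j$, that is, a weighted least-absolute-deviations fit. With constraint (3) instead of (3'), every residual inside its declared interval costs the same, $w_j = 1$.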

Conclusions

• Outlier detection is a necessary tool in fitting experimental data

• The interval error model provides an effective means of solving the outlier detection problem

• The proposed approach is based on a simple idea and is easy to implement

• The proposed approach provides a flexible way to express and take into account a priori information