Robust regression - Robust estimation of regression...

43
Introduction Robust regression Examples Conclusion Robust regression Robust estimation of regression coefficients in linear regression model Jiří Franc Czech Technical University Faculty of Nuclear Sciences and Physical Engineering Department of Mathematics Jiří Franc Robust regression 16. 11. 2009

Transcript of Robust regression - Robust estimation of regression...

Page 1: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Robust regressionRobust estimation of regression coefficients in linear regression

model

Jiří Franc

Czech Technical UniversityFaculty of Nuclear Sciences and Physical Engineering

Department of Mathematics

Jiří Franc Robust regression 16. 11. 2009

Page 2: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Outline

1 IntroductionRegression analysisClassical regression estimators

2 Robust regressionL-estimatorsM-estimatorsR-estimatorsS-estimatorsLeast median of squares (LMS)Least trimmed squares (LTS)

3 Examples

4 Conclusion

Jiří Franc Robust regression 16. 11. 2009

Page 3: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Linear regression model

Definition

The multiple linear regression model is the model

Yi = Xi,1β01 + Xi,2β

02 + · · ·+ Xi,pβ

0p + ei = XT

i β0 + ei i = 1 . . . n.

Where

n is the sample size,

Xi=(Xi,1,Xi,2, . . . ,Xi,p)T is called explanatory variables or regressors and

it is a sequence of deterministic p-dimensional vectors or a sequence ofrandom variables,

Yi is called response variable (dependent variable) and it is i-th elementof the random sequence of observations,

β0=(β01 , β

02 , . . . , β

0p)T is a p-dimensional vector of true regression

coefficients,

ei is called the sequence of disturbances (error terms), it representsunexplained variation in the dependent variable and it is a sequence ofrandom variables.

Jiří Franc Robust regression 16. 11. 2009

Page 4: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Linear regression model

Let rewrite the previous definition with n equations in matrix notation.

DefinitionY1Y2...

Yn

=

X1,1 X1,2 . . . X1,pX2,1 X2,2 . . . X2,p

......

. . ....

Xn,1 Xn,2 · · · Xn,n

β01β0

2...β0

p

+

e1e2...en

Equivalently,

Y = Xβ0 + e

Jiří Franc Robust regression 16. 11. 2009

Page 5: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Classical regression estimators

Ordinary least squares method (OLS)

βOLS = arg minβ∈Rp

n∑i=1

(Yi − XTi β)2 = arg min

β∈Rp(Y − Xβ)T (Y − Xβ)

βOLS = (XTX)−1XTY

In the certain conditions OLS is the best linear unbiased estimatorof β0.

In the certain conditions OLS is the best estimator among allunbiased estimators (ordinary least squares method is best formultiple regression when the iid errors are normal distributed.)

OLS is not robust and consequently often gives false result for realdata (even a single regression outlier can totally offset the OLSestimator).

Jiří Franc Robust regression 16. 11. 2009

Page 6: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Motivation

An example of outliers in y-direction and OLS estimate.

● ● ● ● ● ● ● ● ● ● ● ● ●●

●●

●● ●

50 55 60 65 70

05

1015

20

Number of International Calls (in tens of millions) from Belgium dependence on Year

Year

NoC

alls

LegendDatasetOLS−estimate

Jiří Franc Robust regression 16. 11. 2009

Page 7: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Motivation

An example of outliers in y-direction and OLS estimate.

0 1 2 3 4 5

12

34

56

X

Y

LegendDatasetOLS−estimate

Jiří Franc Robust regression 16. 11. 2009

Page 8: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Motivation

An example of outliers in x-direction and OLS estimate.

0 2 4 6 8 10

05

1015

X

Y

LegendDatasetOLS−estimate

Jiří Franc Robust regression 16. 11. 2009

Page 9: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Robust regression

The main aims of robust statistics:

description of the structure best fitting the bulk of the data.

identification of deviating data points (outliers) or deviatingsubstructures for further treatment.

identification of highly influential data points (leverage points) or atleast warning about them.

deal with unsuspected serial correlations.

Two ways how to deal with regression outliers:

Regression diagnostics: where certain quantities are computedfrom the data with the purpose of pinpointing influential points,after which these outliers can be removed or corrected.

Robust regression: which tries to devise estimators that are not sostrongly affected by outliers.

Jiří Franc Robust regression 16. 11. 2009

Page 10: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Robust regression

The main aims of robust statistics:

description of the structure best fitting the bulk of the data.

identification of deviating data points (outliers) or deviatingsubstructures for further treatment.

identification of highly influential data points (leverage points) or atleast warning about them.

deal with unsuspected serial correlations.

Two ways how to deal with regression outliers:

Regression diagnostics: where certain quantities are computedfrom the data with the purpose of pinpointing influential points,after which these outliers can be removed or corrected.

Robust regression: which tries to devise estimators that are not sostrongly affected by outliers.

Jiří Franc Robust regression 16. 11. 2009

Page 11: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

L-estimators

α− trimmed estimators

β(Lte ,α) =1

n − 2 bnαc

n−bnαc∑i=bnαc+1

Y(i),

where (Y(1), . . . ,Y(n)) is the order statistic and α is definite coefficient:α ∈ 〈0, 1〉.

α− regression quantile (Koenker, Bassett (1978))

β(Lrq ,α) = arg minβ∈Rp

n∑i=1

ρα(ri ),

where ri (β) = Yi − XTi β i = 1 . . . n and ρα(ri ) =

{α |ri | if ri ≥ 0(α− 1) |ri | if ri < 0

.

L-estimators are scale equivariant and regression equivariant.However the breakdown point is still 0% and for α = 0.5 holds, 0.5-regressionquantile estimator is least absolute values estimator.

Jiří Franc Robust regression 16. 11. 2009

Page 12: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

L-estimators

Least absolute values method

βL1 = arg minβ∈Rp

n∑i=1

∣∣Yi − XTi β∣∣ = arg min

β∈Rp

n∑i=1

|ri (β)|

Least maximum deviation method

βL∞ = arg minβ∈Rp

(maxi∈n

|ri (β)|)

Jiří Franc Robust regression 16. 11. 2009

Page 13: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

M-estimators are based on the idea of replacing the squared residuals used inOLS estimation by another function of the residuals.

M-estimators

β(M) = arg minβ∈Rp

n∑i=1

ρ(ri (β)),

where ρ is a symmetric function with a unique minimum at zero.

Differentiating this expression with respect to the regression coefficients yields:

M-estimatorsn∑

i=1

ψ(ri )Xi = 0,

where ψ is the derivative of ρ. The M-estimate is obtained by solving thissystem of p equations.

Jiří Franc Robust regression 16. 11. 2009

Page 14: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

OLS , L1 are also M-estimators with ψ(t) = t for OLS and ψ(t) = sgn(t)for L1 estimate.

M-estimators are unfortunately not scale equivariant even if they areregression equivariant. Hence one has to studentizate the M-estimators byan estimate of scale of disturbances σ necessarily.

β(M) = arg minβ∈Rp

n∑i=1

ρ

(ri (β)

σ

),

One possibility is to use the median absolute deviation (MAD):

σ = C mediani

(∣∣∣∣ri −medianj

(rj)∣∣∣∣) ,

where C is a constant (correction factor), which depends on the distribution.For normally distributed data C = 1.4826.

Jiří Franc Robust regression 16. 11. 2009

Page 15: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

Let vectors (Yi ,XTi )T are iid with distribution function F (Y ,X). If the function

ρ has an absolutely continuously derivation ψ, σ = 1 for simplicity and thefunctional T (F ) corresponding to the M-estimator then the functional T (F ) isthe solution of ∫

ψ(Y − XTT (F ))XdF (X,Y ) = 0.

DefineM(ψ,F ) :=

∫ψ′(Y − XTT (F ))XXTdF (X,Y ).

Then the influence function of T at a distribution F (on Rp × R) is given by

IF (X0,Y0,T ,F ) = M−1(ψ,F )X0ψ(Y0 − X0TT (F )).

The influence function with respect of Y0 can by bounded by choice of ψ , butthe influence function of M-estimators is unbounded in respect of X0.

The breakdown point of M-estimators is 0% due to the vulnerability to leveragepoints.

Jiří Franc Robust regression 16. 11. 2009

Page 16: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

Maronna and Yoahai (1981) showed, under certain conditions, thatM-estimators are consistent and asymptoticly normal with asymptoticcovariance matrix

V (T ,F ) = E[IF (X,Y ,T ,F )IFT (X,Y ,T ,F )

]=

=∫

IF (X,Y ,T ,F )IFT (X,Y ,T ,F )dF (X,Y ) == M(ψ,F )−1Q(ψ,F )M(ψ,F )−1

whereM(ψ,F ) =

∫ψ′(Y − XTT (F ))XXTdF (X,Y )

Q(ψ,F ) =∫ψ2(Y − XTT (F ))XXTdF (X,Y )

If disturbances are normal distributed (ei ∼ N(0, σ0)) and iid , the multivariateM-estimator has the asymptotic covariance matrix

V (ψ,Φ) =E [(XXT )−1]σ2

e,

where e is the asymptotic efficiency defined by

e =

(∫ψ′dΦ

)2∫ψ2dΦ

.

Jiří Franc Robust regression 16. 11. 2009

Page 17: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

Huber minimax M-estimator

ψ(t) =

{t if t < bb sgn(t) if t ≥ b

where b is a constant.

Andrew M-estimator

ψ(t) =

{sin(t) if − π ≤ |t| < π

0 otherwise

Tukey M-estimator

ψ(t) =

t(1−

( tc

)2)2

if |t| < c

0 otherwise

where c is a constant.

Descending minimaxM-estimator

ψ(t) =

t if |t| < ab sgn(t) tanh

[ 12b(c − |t|)

]if a ≤ |t| < c

0 otherwise

where a, b and c are constants.

Hampel M-estimator

ψ(t) =

t if |t| < aa sgn(t) if a ≤ |t| < bc−|t|c−b sgn(t) if b ≤ |t| < c0 otherwise

where a, b and c are constants.

Jiří Franc Robust regression 16. 11. 2009

Page 18: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

Huber minimax (b=1)

t

psi(t

)

Jiří Franc Robust regression 16. 11. 2009

Page 19: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

Hampel (a=1,b=2,c=3)

t

psi(t

)

Jiří Franc Robust regression 16. 11. 2009

Page 20: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

Descending minimax (a=1,b=1.2,c=3)

t

psi(t

)

Jiří Franc Robust regression 16. 11. 2009

Page 21: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

−4 −2 0 2 4

−1.

0−

0.5

0.0

0.5

1.0

Andrew

t

psi(t

)

Jiří Franc Robust regression 16. 11. 2009

Page 22: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

M-estimators

−4 −2 0 2 4

−0.

50.

00.

5

Tukey (c=3)

t

psi(t

)

Jiří Franc Robust regression 16. 11. 2009

Page 23: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

GM-estimators

Generalized M-estimators are introduced in order to bounding theinfluence function of outlying Xi ’s by means of some weight function w .

GM-estimators

β(GM) = arg minβ∈Rp

n∑i=1

w(Xi )ρ(ri (β))

σ

The definition can be rewrite ton∑

i=1

w(Xi )ψ( riσ

)Xi = 0.

Unfortunately Maronna, Buston and Yohai (1979) showed that thebreakdown point of GM-estimators can be no better than a certain valuethat decrease as a function of p−1, where p is the number of regressioncoefficients.

Jiří Franc Robust regression 16. 11. 2009

Page 24: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

GM-estimators

The algorithm of Iteratively reweighted least squares with GM-estimatesbased on some ψ function is following.

1 The first elementary estimate β(OLS) of β0.

2 Count the residuals ri (β) = Yi − Yi = Yi − XTi β i = 1 . . . n.

3 Count the estimate σ of σ.

(e.g. MAD: σ = 1.4826 mediani

(∣∣∣∣ri −medianj

(rj)∣∣∣∣) )

4 Count the weights wi .

(e.g. Andrew’s ψ function: wi =ψ( ri

σ )riσ

)

5 Update the estimate β by performing a weighted least squares withthe weights wiCalculate β(WLS) = (XTWX)−1XTWY

6 Go back to item 2 and iterate until convergence

Jiří Franc Robust regression 16. 11. 2009

Page 25: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

R-estimators

R-estimation is procedure based on the ranks of the residuals. The idea ofusing rank statistics has been extended to the domain of multipleregression by Jurečková(1971) and Jaeckel(1972).

Let Ri (β) be the rank of Yi − XiTβ, further let an(i), i = 1, . . . , n be a

nondecreasing scores function, satisfying

n∑i=1

an(i) = 0.

R-estimator

β(R,ai ) = arg minβ∈Rp

n∑i=1

an(Ri (β))(Yi − XTi β) = arg min

β∈Rp

n∑i=1

an(Ri )ri .

Jiří Franc Robust regression 16. 11. 2009

Page 26: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

R-estimators

Some possibilities for the scores function an(i) are:

Wilcoxon scores

an(i) = i − n + 12

Median scores

an(i) = sgn(

i − n + 12

)Bounded normal scores

an(i) = min(

c,max{

Φ−1(

in + 1

),−c

}),

where c is a parameter .

Van der Waerden scores

an(i) = Φ−1(

in + 1

)

Jiří Franc Robust regression 16. 11. 2009

Page 27: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

R-estimators

The most widely used R-estimator is the estimator with Wilcoxonscores function, because the resulting R-estimator have a relativelysimple structure.

R-estimators are automatically scale equivariant.

The breakdown point of R-estimators is still 0% due to thevulnerability to leverage points.

In the case of regression with intercept, the constant term isestimated separately.

In order to solve the R-estimates, one has to find the minimum ofthe objective function by varying the unknown parameters.

Jiří Franc Robust regression 16. 11. 2009

Page 28: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

S-estimators

S-estimators were introduced by Rousseeuw and Yohai (1984), they arederived from a scale statistic in an implicit way and they are regression,scale and affine equivariant.

S-estimators are defined by minimization of the dispersion of the residualsby:

S-estimator

β(S,c,K) = arg minβ∈Rp

s(r1(β), . . . , rn(β)) = arg minβ∈Rp

s(β),

where s(β) is estimator of scale defined as the solution of equation

1n

n∑i=1

ρ

(ri (β)

s(β)

)= K .

Jiří Franc Robust regression 16. 11. 2009

Page 29: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

S-estimators

The function ρ must satisfy the following conditions:

C1 ρ is symmetric and continuously differentiable functionand ρ(0) = 0.

C2 There exists parameter c > 0 such that ρ is strictlyincreasing on 〈0, c〉 and constant on 〈c ,∞〉.

For the S-estimator with breakdown point of 50% one extra condition hasto be add.

C3Kρ(c)

=12

An example: Tukey’s biweight function ρ.

ρ(x) =

{x2

2 − x4

2c2 + x6

6c4 if |x | ≤ cc2

6 otherwise,

where c is a parameter from condition C2.Jiří Franc Robust regression 16. 11. 2009

Page 30: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

S-estimators

Under certain conditions, the behavior of β(S) at the normal model,where (Yi , xi ) are iid random variables satisfying Yi = xT

i β0 + ei with

ei ≈ N(0, σ2) is given by

limn→∞

√n(β(S) − β0

)= N

(0,

E[XTX

]−1 (∫ψ2dΦ

)(∫ψ′dΦ

)2).

Consequently the asymptotic efficiency is defined by

e :=

(∫ψ′dΦ

)2(∫ψ2dΦ

) .The same formulas was derived also for the M-estimator.

Jiří Franc Robust regression 16. 11. 2009

Page 31: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

S-estimators

An example of the S-estimator corresponding to the Tukey’s function ρ withbreakdown point of 50% is K = EΦ[ρ] and c = 1.547.

EΦ[ρ] =+∞∫−∞

ρ(x)φ(x)dx = 2−c∫−∞

c2

6 φxdx +c∫

−c

x2

2 − x4

2c2 + x6

6c4 φ(x)dx =

= c2

3 Φ(−c) + 12

c∫−c

x2φ(x)dx − 12c2

c∫−c

x4φ(x)dx + 16c4

c∫−c

x6φ(x)dx =

= c2

3 Φ(−c) + 12

(1− 2Φ(−c)− (2c)e−

c22

)−

− 12c2

(3− 6Φ(−c)−

(2c3 + 6c

)e−

c22

)+

+ 16c4

(15− 30Φ(−c)−

(2c5 + 10c3 + 30c

)e−

c22

)and

Kρ(c)

= α ∈ (0; 0.5]

Jiří Franc Robust regression 16. 11. 2009

Page 32: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

S-estimators

The asymptotic efficiency of the S-estimator corresponding to theTukey’s function ρ for different values of the breakdown point ε∗.

ε∗ e c K50% 28.7% 1.547 0.199545% 37.0% 1.756 0.231240% 42.6% 1.988 0.263435% 56.0% 2.251 0.295730% 66.1% 2.560 0.327825% 75.9% 2.937 0.359320% 84.7% 3.420 0.389915% 91.7% 4.096 0.419410% 96.6% 5.182 0.4475

MM-estimators: high-breakdown and high-efficiency estimators, wherethe initial estimate is obtained with an S-estimator, and it is thenimproved with an M-estimator.

Jiří Franc Robust regression 16. 11. 2009

Page 33: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

LMS-estimator

LMS is probably the first really applicable 50% breakdown pointestimator introduced by Rousseeuw (1984). The idea of the LMS isto replace the sum operator by a median, which is very robust.

Least median of squares estimator

β(LMS) = arg minβ∈Rp

(med

i(r2

i (β))

).

There always exists a solution for the LMS estimator.The LMS estimator is regression equivariant , scale equivariantand affine equivariant.If p > 1 then the breakdown point of the LMS method is

ε∗ := ([n/2]− p + 2)/n.

The LMS is not asymptotically normal.

Jiří Franc Robust regression 16. 11. 2009

Page 34: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

LMS-estimator

LMS is probably the first really applicable 50% breakdown pointestimator introduced by Rousseeuw (1984). The idea of the LMS isto replace the sum operator by a median, which is very robust.

Least median of squares estimator

β(LMS) = arg minβ∈Rp

(med

i(r2

i (β))

).

There always exists a solution for the LMS estimator.The LMS estimator is regression equivariant , scale equivariantand affine equivariant.If p > 1 then the breakdown point of the LMS method is

ε∗ := ([n/2]− p + 2)/n.

The LMS is not asymptotically normal.

Jiří Franc Robust regression 16. 11. 2009

Page 35: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

LTS-estimator

Least trimmed squares estimator

β(LTS) = arg minβ∈Rp

=h∑

i=1

r2(i)(β),

where r2(1) ≤ . . . ≤ r2

(n) are the ordered squared residuals.

There always exists a solution for the LTS-estimator.

The LTS estimator is regression equivariant , scale equivariant andaffine equivariant.

If p > 1, h = [n/2] + [(p + 1)/2] then the breakdown point of theLTS-estimator is

ε∗ := ([n − p]/2 + 1)/n.

The LTS has the same asymptotic efficiency at the normaldistribution as the Huber-type of M-estimator.

Jiří Franc Robust regression 16. 11. 2009

Page 36: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

LTS-estimator

Least trimmed squares estimator

β(LTS) = arg minβ∈Rp

=h∑

i=1

r2(i)(β),

where r2(1) ≤ . . . ≤ r2

(n) are the ordered squared residuals.

There always exists a solution for the LTS-estimator.

The LTS estimator is regression equivariant , scale equivariant andaffine equivariant.

If p > 1, h = [n/2] + [(p + 1)/2] then the breakdown point of theLTS-estimator is

ε∗ := ([n − p]/2 + 1)/n.

The LTS has the same asymptotic efficiency at the normaldistribution as the Huber-type of M-estimator.

Jiří Franc Robust regression 16. 11. 2009

Page 37: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

LTS-estimator

Least trimmed squares estimator

β(LTS) = arg minβ∈Rp

=h∑

i=1

r2(i)(β),

where r2(1) ≤ . . . ≤ r2

(n) are the ordered squared residuals.

The drawback of the LTS is that the objective function requiressorting of the squared residuals, which takes O(n log n) operations.

Robust high breakdown point estimators like the LTS can be verysensitive to a very small change of data or to a deletion of even onepoint from data set (i.e. small change of data can really cause alarge change of the estimate).

Jiří Franc Robust regression 16. 11. 2009

Page 38: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Examples

● ● ● ● ● ● ● ● ● ● ● ● ●●

●●

●● ●

50 55 60 65 70

05

1015

20

Number of International Calls (in tens of millions) from Belgium dependence on Year

Year

NoC

alls

LegendDatasetOLSM−HampelM−HuberMM−TukeyLTS

Jiří Franc Robust regression 16. 11. 2009

Page 39: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Examples

An example of outliers in y-direction.

0 1 2 3 4 5

12

34

56

X

Y

LegendDatasetOLSM−HampelM−HuberMM−TukeyLTS

Jiří Franc Robust regression 16. 11. 2009

Page 40: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Examples

An example of outliers in x-direction.

0 2 4 6 8 10 12

−5

05

1015

X

Y

LegendDatasetOLSM−HampelM−HuberMM−TukeyLTS

Jiří Franc Robust regression 16. 11. 2009

Page 41: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Examples

Robust regression in R

m.OLS <- lm(NoCalls~(Year),data=dataB)

library(MASS)

m.Huber <- rlm(NoCalls~(Year),data=dataB,psi = psi.huber)m.Hampel <- rlm(NoCalls~(Year),data=dataB,psi = psi.hampel)m.Tukey <- rlm(NoCalls~(Year),data=dataB,psi = psi.bisquare)m.MM.Bisq <- rlm(NoCalls~(Year),data=dataB, method=’MM’)

library(robustbase) - a package of basic robust statistics.

m.LTS <- ltsreg(NoCalls~(Year),data=dataB)

library(robust) - a package of robust methods.

Jiří Franc Robust regression 16. 11. 2009

Page 42: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

Conclusion and Discussion

Thank you for attention. The discussion is open.

Jiří Franc Robust regression 16. 11. 2009

Page 43: Robust regression - Robust estimation of regression ...artax.karlin.mff.cuni.cz/~adaml5am/Seminar/0910z/Franc-prez.pdf · Introduction Robust regression Examples Conclusion Robust

Introduction Robust regression Examples Conclusion

References

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel,W. A. (1986), Robust Statistics: The Approach Based onInfluence Functions, J. Wiley, New York.Rousseeuw, P. J. and Leroy, A. M. (1987), Robust Regressionand Outlier Detection. J. Wiley, New York.Jurečková, J. (2001). Robustní statistické metody, Karolinum,PragueVíšek, J. Á. (2000): Regression with high breakdown point.Robust 2000, 2001, 324 - 356.

Jiří Franc Robust regression 16. 11. 2009