Transcript of Seminar on Robust Regression Methods
ROBUST REGRESSION METHOD
A Seminar Presentation by Sumon Jose
Under the Guidance of Dr. Jessy John
February 24, 2015
SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 1 / 69
CONTENTS
1 INTRODUCTION
2 REVIEW
3 ROBUSTNESS & RESISTANCE
4 APPROACH
5 STRENGTHS & WEAKNESSES
6 M-ESTIMATORS
7 DELIVERY TIME PROBLEM
8 ANALYSIS
9 PROPERTIES
10 SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
11 REFERENCE
INTRODUCTION
Performance Evaluation - Geethu Anna Jose
REVIEW
The classical linear regression model relates the dependent or response variables yi to the independent explanatory variables xi1, xi2, ..., xip for i = 1, ..., n, such that

yi = xiᵀβ + εi,  i = 1, ..., n  (1)

where xiᵀ = (xi1, xi2, ..., xip), εi denotes the error term and β = (β1, β2, ..., βp)ᵀ.
REVIEW
The expected value of yi, called the fitted value, is

ŷi = xiᵀβ  (2)

and one can use this to calculate the residual for the i-th case:

ri = yi − ŷi  (3)

In the case of the simple linear regression model we may calculate the values of β0 and β1 using the following formulae:
REVIEW
β1 = ( ∑i yi xi − (∑i yi)(∑i xi)/n ) / ( ∑i xi² − (∑i xi)²/n )  (4)

β0 = ȳ − β1 x̄  (5)

where the sums run over i = 1, ..., n. The vector of fitted values ŷi corresponding to the observed values yi may be expressed as follows:

ŷ = Xβ  (6)
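As a quick illustration, equations (4) and (5) translate directly into code. This is a minimal sketch (the function name `simple_ols` is ours, not from the seminar):

```python
import numpy as np

def simple_ols(x, y):
    """Estimate beta0 and beta1 of y = beta0 + beta1*x via equations (4) and (5)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Equation (4): corrected sum of cross-products over corrected sum of squares
    beta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / \
            (np.sum(x ** 2) - np.sum(x) ** 2 / n)
    # Equation (5): the fitted line passes through the point of means
    beta0 = np.mean(y) - beta1 * np.mean(x)
    return beta0, beta1

# Noise-free data from y = 1 + 2x, so the estimates recover the line exactly
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
b0, b1 = simple_ols(x, y)
print(b0, b1)  # 1.0 2.0
```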
REVIEW
Limitations of the Least Squares Estimator
Extremely sensitive to deviations from the model assumptions (a normal distribution is assumed for the errors).
Drastically changed by the effect of outliers.
REVIEW
What About Deleting Outliers Before Analysis?
Not all outliers are erroneous data; they could be exceptional occurrences.
Some such outliers could be the result of factors not considered in the current study.
So, in general, unusual observations are not always bad observations. Moreover, in large data sets it is often very difficult to spot the outlying data.
ROBUSTNESS AND RESISTANCE
Resistant Regression Estimators
Definition
Resistant regression estimators are primarily concerned with robustness of validity, meaning that their main concern is to prevent unusual observations from affecting the estimates produced.
ROBUSTNESS AND RESISTANCE
Robust Regression Estimators
Definition
They are concerned with both robustness of efficiency and robustness of validity, meaning that they should also maintain a small sampling variance even when the data do not fit the assumed distribution.
ROBUSTNESS AND RESISTANCE
⇒ In general, robust regression estimators aim to fit a model that describes the majority of a sample.
⇒ Their robustness is achieved by giving the data different weights.
⇒ Whereas in least squares estimation, all data are treated equally.
APPROACH
Robust estimation methods are powerful tools for detecting outliers in complicated data sets.
But unless the data are very well behaved, different estimators will give different estimates.
On their own, they do not provide a final model.
A healthy approach is to employ both robust regression methods and the least squares method, and to compare the results.
STRENGTHS & WEAKNESSES
Finite Sample Breakdown Point
Definition
Breakdown Point (BDP) is a measure of the resistance of an estimator. The BDP of a regression estimator is the smallest fraction of contamination that can cause the estimator to 'break down' and no longer represent the trend of the data.
STRENGTHS & WEAKNESSES
When an estimator breaks down, the estimate it produces from the contaminated data can be arbitrarily far from the estimate it would give were the data uncontaminated.
STRENGTHS & WEAKNESSES
In order to describe the BDP mathematically, define T as a regression estimator, Z as a sample of n data points, and T(Z) = β̂. Let Z′ be the corrupted sample in which m of the original data points are replaced with arbitrary values. The maximum effect that could be caused by such contamination is

effect(m; T, Z) = sup_{Z′} |T(Z′) − T(Z)|  (7)
STRENGTHS & WEAKNESSES
When (7) is infinite, an outlier can have an arbitrarily large effect on T. The BDP of T at the sample Z is therefore defined as:

BDP(T, Z) = min{ m/n : effect(m; T, Z) is infinite }  (8)
STRENGTHS & WEAKNESSES
The least squares estimator, for example, has a breakdown point of 1/n because just one leverage point can cause it to break down. As the number of data points increases, the breakdown point tends to 0, and so the least squares estimator is said to have a BDP of 0%.
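A tiny simulation makes the 1/n breakdown concrete: moving a single response arbitrarily far drags the least squares slope arbitrarily far from the truth. This is our own illustration (true slope 1.5 assumed), not an example from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.1, size=x.size)

def ols_slope(x, y):
    # Slope from the least squares fit of y on [1, x]
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

clean_slope = ols_slope(x, y)    # close to the true 1.5

# Contaminate just one of the 20 observations with a huge error
y_bad = y.copy()
y_bad[-1] += 1e4
bad_slope = ols_slope(x, y_bad)  # pulled far away from 1.5

print(round(clean_slope, 2), round(bad_slope, 2))
```

Making the contamination larger pushes the fitted slope without bound, which is exactly the sense in which equation (7) becomes infinite for least squares.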
STRENGTHS & WEAKNESSES
Remark
The highest breakdown point one can hope for is 50%, since if more than half the data are contaminated one cannot differentiate between 'good' and 'bad' data.
STRENGTHS & WEAKNESSES
Relative Efficiency of an Estimator
Definition
The efficiency of an estimator of a particular parameter is defined as the ratio of its minimum possible variance to its actual variance. Strictly, an estimator is considered 'efficient' when this ratio is one.
STRENGTHS & WEAKNESSES
High efficiency is crucial for an estimator if the intention is to use an estimate from sample data to make inferences about the larger population from which the sample was drawn.
STRENGTHS & WEAKNESSES
Relative Efficiency
Relative efficiency compares the efficiency of an estimator to that of a well-known method.
In the context of regression, estimators are compared to the least squares estimator, which is the most efficient estimator when the errors are normally distributed.
STRENGTHS & WEAKNESSES
Given two estimators T1 and T2 for a population parameter β, where T1 is the most efficient estimator possible and T2 is less efficient, the relative efficiency of T2 is calculated as the ratio of the mean squared error of T1 to the mean squared error of T2:

Efficiency(T1, T2) = E[(T1 − β)²] / E[(T2 − β)²]  (9)
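Equation (9) is easy to check by simulation. For a normal population the sample mean plays the role of T1 and the sample median of T2, and the classical result is an efficiency of about 2/π ≈ 0.64. The setup below is our own illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, n, reps = 0.0, 100, 5000

# reps independent samples of size n from N(beta, 1)
samples = rng.normal(beta, 1.0, size=(reps, n))

# Monte Carlo mean squared errors of the two estimators of beta
mse_mean = np.mean((samples.mean(axis=1) - beta) ** 2)          # T1: sample mean
mse_median = np.mean((np.median(samples, axis=1) - beta) ** 2)  # T2: sample median

efficiency = mse_mean / mse_median  # equation (9), close to 2/pi
print(round(efficiency, 2))
```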
M-ESTIMATORS
Introduction
1 First proposed by Huber (1973)
2 But the early ones had weaknesses in terms of one or more of the desired properties
3 From them developed the modern methods
M-ESTIMATORS
Maximum Likelihood Type Estimators
M-estimation is based on the idea that while we still want a maximum likelihood estimator, the errors might be better represented by a different, heavier-tailed distribution.
M-ESTIMATORS
If the probability distribution function of the errors is f(εi), then the maximum likelihood estimator for β is the one that maximizes the likelihood function

∏i f(εi) = ∏i f(yi − xiᵀβ)  (10)
M-ESTIMATORS
This means it also maximizes the log-likelihood function

∑i ln f(εi) = ∑i ln f(yi − xiᵀβ)  (11)

When the errors are normally distributed, it has been shown that this leads to minimizing the sum of squared residuals, which is the ordinary least squares method.
M-ESTIMATORS
Assuming that the errors are differently distributed leads to the maximum likelihood estimator minimizing a different function. Using this idea, an M-estimator β̂ minimizes

∑i ρ(εi) = ∑i ρ(yi − xiᵀβ)  (12)

where ρ(u) is a continuous, symmetric function called the objective function, with a unique minimum at 0.
M-ESTIMATORS
1 Knowing the appropriate ρ(u) to use requires knowledge of how the errors are really distributed.
2 Functions are usually chosen by considering how the resulting estimator down-weights the larger residuals.
3 A robust M-estimator achieves this by minimizing the sum of a less rapidly increasing objective function than the ρ(u) = u² of least squares.
M-ESTIMATORS
Constructing a Scale Equivariant Estimator
The M-estimator is not necessarily scale invariant, i.e. if the errors yi − xiᵀβ were multiplied by a constant, the new solution to the above equation might not be the scaled version of the old one.
M-ESTIMATORS
To obtain a scale invariant version of this estimator we usually solve

∑i ρ(εi/s) = ∑i ρ((yi − xiᵀβ)/s)  (13)

where s is a robust estimate of scale.
M-ESTIMATORS
A popular choice for s is the re-scaled median absolute deviation

s = 1.4826 × MAD  (14)

where MAD is the median absolute deviation

MAD = median|yi − xiᵀβ̂| = median|εi|  (15)
M-ESTIMATORS
s is highly resistant to outlying observations, with a BDP of 50%, as it is based on the median rather than the mean. The estimator rescales the MAD by the factor 1.4826 so that when the sample is large and εi is really distributed as N(0, σ²), s estimates the standard deviation.
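The claim that s = 1.4826 × MAD estimates σ for large normal samples can be verified numerically. This is our own check, with σ = 2 assumed:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 2.0
eps = rng.normal(0.0, sigma, size=100_000)  # "errors" with known sigma

mad = np.median(np.abs(eps))  # MAD as in equation (15); the errors have median 0
s = 1.4826 * mad              # rescaled MAD, equation (14)
print(round(s, 2))            # approximately 2.0
```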
M-ESTIMATORS
With a large sample and εi ∼ N(0, σ²):
P(|εi| < MAD) ≈ 0.5
⇒ P(|εi − 0|/σ < MAD/σ) ≈ 0.5
⇒ P(|Z| < MAD/σ) ≈ 0.5
⇒ MAD/σ ≈ Φ⁻¹(0.75)
M-ESTIMATORS
⇒ MAD/Φ⁻¹(0.75) ≈ σ
⇒ 1.4826 × MAD ≈ σ
Thus the tuning constant 1.4826 makes s an approximately unbiased estimator of σ when n is large and the error distribution is normal.
M-ESTIMATORS
Finding an M-Estimator
To obtain an M-estimate we solve

min_β ∑i ρ(εi/s) = min_β ∑i ρ((yi − xiᵀβ)/s)  (16)

For that we equate the first partial derivatives of ρ with respect to βj (j = 0, 1, ..., k) to zero, yielding a necessary condition for a minimum.
M-ESTIMATORS
This gives a system of p = k + 1 equations

∑i xij ψ((yi − xiᵀβ)/s) = 0,  j = 0, 1, ..., k  (17)

where ψ = ρ′, xij is the i-th observation on the j-th regressor, and xi0 = 1. In general ψ is a nonlinear function and so equation (17) must be solved iteratively. The most widely used method is iteratively reweighted least squares.
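The whole scheme can be sketched in a few lines of NumPy. This is our own minimal implementation with Huber's ψ and the rescaled-MAD scale of equation (14), not the exact program behind the seminar's numbers:

```python
import numpy as np

def huber_weight(u, t=2.0):
    # w(u) = psi(u)/u for Huber's t-function: 1 inside [-t, t], t/|u| outside
    au = np.abs(u)
    return np.where(au <= t, 1.0, t / au)

def irls(X, y, t=2.0, tol=1e-8, max_iter=50):
    """M-estimate by iteratively reweighted least squares with Huber weights."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(max_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r))             # robust scale estimate
        w = huber_weight(r / s, t)
        XtW = X.T * w                                 # X'W with W diagonal
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)  # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# One gross outlier: OLS is dragged off, the M-estimate is not
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=x.size)
y[-1] += 100.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_m = irls(X, y)
print(beta_ols[1], beta_m[1])  # OLS slope far from 2, robust slope near 2
```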
M-ESTIMATORS
To use iteratively reweighted least squares, suppose that an initial estimate β0 is available and that s is an estimate of scale. Then we write the p = k + 1 equations as:

∑i xij ψ((yi − xiᵀβ)/s) = ∑i xij {ψ[(yi − xiᵀβ)/s] / [(yi − xiᵀβ)/s]} (yi − xiᵀβ)/s = 0  (18)
M-ESTIMATORS
or equivalently as

∑i xij Wi⁰ (yi − xiᵀβ) = 0,  j = 0, 1, 2, ..., k  (19)

where

Wi⁰ = ψ[(yi − xiᵀβ0)/s] / [(yi − xiᵀβ0)/s]   if yi ≠ xiᵀβ0
Wi⁰ = 1                                       if yi = xiᵀβ0   (20)
M-ESTIMATORS
We may write the above equation in matrix form as follows:

XᵀW0Xβ = XᵀW0y  (21)

where W0 is an n × n diagonal matrix of weights whose diagonal elements Wi⁰ are given by equation (20).
M-ESTIMATORS
From the matrix form we see that the expression is the same as the usual weighted least squares normal equations. Consequently the one-step estimator is

β1 = (XᵀW0X)⁻¹XᵀW0y  (23)

At the next step we recompute the weights from the equation for W, but using β1 rather than β0.
M-ESTIMATORS
NOTE:
Usually only a few iterations are required to obtain convergence.
It can easily be implemented in a computer program.
M-ESTIMATORS
Re-Descending Estimators
Re-descending M-estimators are those whose influence functions are non-decreasing near the origin but decreasing towards zero far from the origin.
Their ψ can be chosen to redescend smoothly to zero, so that they usually satisfy ψ(x) = 0 for all |x| > r, where r is referred to as the minimum rejection point.
M-ESTIMATORS
Robust Criterion Functions

Criterion            ρ(z)               ψ(z)         w(z)              Range
Least squares        z²/2               z            1.0               |z| < ∞
Huber's t-function   z²/2               z            1.0               |z| ≤ t
(t = 2)              t|z| − t²/2        t·sign(z)    t/|z|             |z| > t
Andrews' wave        a(1 − cos(z/a))    sin(z/a)     sin(z/a)/(z/a)    |z| ≤ aπ
function
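The weight columns of the table translate directly into code. A sketch of the Huber and Andrews weight functions (the names `huber_w` and `andrews_w` are ours):

```python
import numpy as np

def huber_w(z, t=2.0):
    """Huber's t-function weight: 1 for |z| <= t, t/|z| beyond."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= t, 1.0, t / np.abs(z))

def andrews_w(z, a=1.48):
    """Andrews' wave weight: sin(z/a)/(z/a) for |z| <= a*pi, 0 beyond."""
    z = np.asarray(z, dtype=float)
    # np.sinc(u) = sin(pi*u)/(pi*u), so sin(v)/v = np.sinc(v/pi); handles v = 0
    w = np.sinc((z / a) / np.pi)
    return np.where(np.abs(z) <= a * np.pi, w, 0.0)

print(huber_w([0.5, 3.0]))     # small residuals keep weight 1, large ones are shrunk
print(andrews_w([0.0, 10.0]))  # weight 1 at the origin, 0 past the rejection point
```

Note how the Andrews weight drops all the way to zero beyond aπ, the re-descending behaviour discussed above, while the Huber weight only shrinks large residuals.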
DELIVERY TIME PROBLEM
Problem
A soft-drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. Fit a regression model to the data.
DELIVERY TIME PROBLEM
Table of Data

Obs. i   Delivery time y (min)   Number of cases x1   Distance x2 (ft)
1        16.68                   7                    560
2        11.50                   3                    320
3        12.03                   3                    340
4        14.88                   4                    80
5        13.75                   6                    150
6        18.11                   7                    330
7        8.00                    2                    110
8        17.83                   7                    210
9        79.24                   30                   1460
10       21.50                   5                    605
11       40.33                   16                   688
12       21.00                   10                   215
13       13.50                   4                    255
DELIVERY TIME PROBLEM
Obs. i   Delivery time y (min)   Number of cases x1   Distance x2 (ft)
14       19.75                   6                    462
15       24.00                   9                    448
16       29.00                   10                   776
17       15.35                   6                    200
18       19.00                   7                    132
19       9.50                    3                    36
20       35.10                   17                   770
21       17.90                   10                   140
22       52.32                   26                   810
23       18.75                   9                    450
24       19.83                   8                    635
25       10.75                   4                    150
DELIVERY TIME PROBLEM
Least Squares Fit of the Delivery Time Data

Obs.   yi            ŷi            ei             Weight
1      .166800E+02   .217081E+02   -.502808E+01   .100000E+01
2      .115000E+02   .103536E+02   .114639E+01    .100000E+01
3      .120300E+02   .120798E+02   -.497937E-01   .100000E+01
4      .148800E+02   .995565E+01   .492435E+01    .100000E+01
5      .137500E+02   .141944E+02   -.444398E+00   .100000E+01
6      .181100E+02   .183996E+02   -.289574E+00   .100000E+01
7      .800000E+01   .715538E+01   .844624E+00    .100000E+01
8      .178300E+02   .166734E+02   .115660E+01    .100000E+01
9      .792400E+02   .718203E+02   .741971E+01    .100000E+01
10     .215000E+02   .191236E+02   .237641E+01    .100000E+01
11     .403300E+02   .380925E+02   .223749E+01    .100000E+01
12     .210000E+02   .215930E+02   -.593041E+00   .100000E+01
13     .135000E+02   .124730E+02   .102701E+01    .100000E+01
DELIVERY TIME PROBLEM
Obs.   yi            ŷi            ei             Weight
14     .197500E+02   .186825E+02   .106754E+01    .100000E+01
15     .240000E+02   .233288E+02   .671202E+00    .100000E+01
16     .290000E+02   .296629E+02   -.662928E+00   .100000E+01
17     .153500E+02   .149136E+02   .436360E+00    .100000E+01
18     .190000E+02   .155514E+02   .344862E+01    .100000E+01
19     .950000E+01   .770681E+01   .179319E+01    .100000E+01
20     .351000E+02   .408880E+02   -.578797E+01   .100000E+01
21     .179000E+02   .205142E+02   -.261418E+01   .100000E+01
22     .523200E+02   .560065E+02   -.368653E+01   .100000E+01
23     .187500E+02   .233576E+02   -.460757E+01   .100000E+01
24     .198300E+02   .244029E+02   -.457285E+01   .100000E+01
25     .107500E+02   .109626E+02   -.212584E+00   .100000E+01
DELIVERY TIME PROBLEM
Accordingly we have the following values for the parameters:
β0 = 2.3412
β1 = 1.6159
β2 = 0.014385
Thus we have the fitted regression equation:

ŷ = 2.3412 + 1.6159x1 + 0.014385x2  (24)
DELIVERY TIME PROBLEM
Huber's t-Function, t = 2

Obs.   yi            ŷi            ei             Weight
1      .166800E+02   .217651E+02   -.508511E+01   .639744E+00
2      .115000E+02   .109809E+02   .519115E+00    .100000E+01
3      .120300E+02   .126296E+02   -.599594E+00   .100000E+01
4      .148800E+02   .105856E+02   .429439E+01    .757165E+00
5      .137500E+02   .146038E+02   -.853800E+00   .100000E+01
6      .181100E+02   .186051E+02   -.495085E+00   .100000E+01
7      .800000E+01   .794135E+01   .586521E-01    .100000E+01
8      .178300E+02   .169564E+02   .873625E+00    .100000E+01
9      .792400E+02   .692795E+02   .996050E+01    .327017E+00
10     .215000E+02   .193269E+02   .217307E+01    .100000E+01
11     .403300E+02   .372777E+02   .305228E+01    .100000E+01
12     .210000E+02   .216097E+02   -.609734E+00   .100000E+01
13     .135000E+02   .129900E+02   .510021E+00    .100000E+01
DELIVERY TIME PROBLEM
Obs.   yi            ŷi            ei             Weight
14     .197500E+02   .188904E+02   .859556E+00    .100000E+01
15     .240000E+02   .232828E+02   .717244E+00    .100000E+01
16     .290000E+02   .293174E+02   -.317449E+00   .100000E+01
17     .153500E+02   .152908E+02   .592377E-01    .100000E+01
18     .190000E+02   .158847E+02   .311529E+01    .100000E+01
19     .950000E+01   .845286E+01   .104714E+01    .100000E+01
20     .351000E+02   .399326E+02   -.483256E+01   .672828E+00
21     .179000E+02   .205793E+02   -.267929E+01   .100000E+01
22     .523200E+02   .542361E+02   -.191611E+01   .100000E+01
23     .187500E+02   .233102E+02   -.456023E+01   .713481E+00
24     .198300E+02   .243238E+02   -.449377E+01   .723794E+00
25     .107500E+02   .115474E+02   -.797359E+00   .100000E+01
DELIVERY TIME PROBLEM
Accordingly we get the values of the parameters as follows:
β0 = 3.3736
β1 = 1.5282
β2 = 0.013739
Thus we get the fitted regression equation:

ŷ = 3.3736 + 1.5282x1 + 0.013739x2  (25)
DELIVERY TIME PROBLEM
Andrews' Wave Function, a = 1.48

Obs.   yi            ŷi            ei             Weight
1      .166800E+02   .216430E+02   -.496300E+01   .427594E+00
2      .115000E+02   .116923E+02   -.192338E+00   .998944E+00
3      .120300E+02   .131457E+02   -.111570E+01   .964551E+00
4      .148800E+02   .114549E+02   .342506E+01    .694894E+00
5      .137500E+02   .152191E+02   -.146914E+01   .939284E+00
6      .181100E+02   .188574E+02   -.747381E+00   .984039E+00
7      .800000E+01   .890189E+01   -.901888E+00   .976864E+00
8      .178300E+02   .174040E+02   .425984E+00    .994747E+00
9      .792400E+02   .660818E+02   .131582E+02    .000000E+00
10     .215000E+02   .192716E+02   .222839E+01    .863633E+00
11     .403300E+02   .363170E+02   .401296E+01    .597491E+00
12     .210000E+02   .218392E+02   -.839167E+00   .980003E+00
13     .135000E+02   .135744E+02   -.744338E-01   .999843E+00
DELIVERY TIME PROBLEM
Obs.   yi            ŷi            ei             Weight
14     .197500E+02   .189979E+02   .752115E+00    .983877E+00
15     .240000E+02   .232029E+02   .797080E+00    .981854E+00
16     .290000E+02   .286336E+02   .366350E+00    .996228E+00
17     .153500E+02   .158247E+02   -.474704E+00   .993580E+00
18     .190000E+02   .164593E+02   .254067E+01    .824146E+00
19     .950000E+01   .946384E+01   .361558E-01    .999936E+00
20     .351000E+02   .387684E+02   -.366837E+01   .655336E+00
21     .179000E+02   .209308E+02   -.303081E+01   .756603E+00
22     .523200E+02   .523766E+02   -.566063E-01   .999908E+00
23     .187500E+02   .232271E+02   -.447714E+01   .515506E+00
24     .198300E+02   .240095E+02   -.417955E+01   .567792E+00
25     .107500E+02   .123027E+02   -.155274E+01   .932266E+00
DELIVERY TIME PROBLEM
Thus we have the estimates as follows:
β0 = 4.6532
β1 = 1.4582
β2 = 0.012111
Thus we get the fitted regression equation:

ŷ = 4.6532 + 1.4582x1 + 0.012111x2  (26)
ANALYSIS
Computing M-Estimators
Robust regression methods are not an option in most statistical software today.
SAS PROC NLIN, among others, can be used to implement the iteratively reweighted least squares procedure.
There are also robust procedures available in S-Plus.
ANALYSIS
Robust Regression Methods...
Robust regression methods have much to offer a data analyst.
They can be extremely helpful in locating outliers and highly influential observations.
Whenever a least squares analysis is performed, it would be useful to perform a robust fit as well.
ANALYSIS
If the results of both fits are in substantial agreement, then the least squares procedure offers a good estimate of the parameters.
If the results of the two procedures are not in agreement, the reason for the difference should be identified and corrected.
Special attention needs to be given to observations that are down-weighted in the robust fit.
PROPERTIES
Breakdown Point
The finite-sample breakdown point is the smallest fraction of anomalous data that can cause the estimator to be useless. The smallest possible breakdown point is 1/n, i.e. a single observation can distort the estimator so badly that it is of no practical use to the regression model builder. The breakdown point of OLS is 1/n.
PROPERTIES
M-estimators can be affected by x-space outliers in an identical manner to OLS.
Consequently, the breakdown point of the class of M-estimators is 1/n as well.
We would generally want the breakdown point of an estimator to exceed 10%.
This has led to the development of high breakdown point estimators.
PROPERTIES
Efficiency
The M-estimators have high asymptotic efficiency, i.e. they behave well even as the size of the sample increases to ∞.
SURVEY OF OTHER ROBUSTREGRESSION ESTIMATORS
High Break Down Point Estimators Because both the
OLS and M-estimator suffer from a low breakdown point
1n , considerable effort has been devoted to finding
estimators that perform better with respect to this
property. Often a break down point of 50% is desirable.
SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 65 / 69
SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
There are various other estimation procedures, such as:
Least Median of Squares
Least Trimmed Sum of Squares
S Estimators
R and L Estimators
Robust Ridge Regression
MM Estimation, etc.
ABSTRACT & CONCLUSION
Review ⇒ Robustness and Resistance ⇒ Our Approach ⇒ Strengths and Weaknesses ⇒ M-Estimators ⇒ Delivery Time Problem ⇒ Analysis ⇒ Properties ⇒ Survey of Other Robust Regression Estimators
REFERENCE
1 Draper, Norman R. & Smith, Harry. "Applied Regression Analysis", 3rd edn., John Wiley and Sons, New York, 1998.
2 Montgomery, Douglas C., Peck, Elizabeth A. & Vining, G. Geoffrey. "Introduction to Linear Regression Analysis", 3rd edn., Wiley India, 2003.
3 Brook, Richard J. "Applied Regression Analysis and Experimental Design", Chapman & Hall, London, 1985.
4 Rawlings, John O. "Applied Regression Analysis: A Research Tool", Springer, New York, 1989.
5 Pedhazur, Elazar J. "Multiple Regression in Behavioural Research: Explanation and Prediction", Wadsworth, Australia, 1997.
THANK YOU