Robust Regression


  • 1

    Robust Regression

    By Will Garner

  • 2

    1. Motivation

    Under the assumption that the errors in a regression model are normally distributed, the least squares estimate is the most efficient unbiased estimate of the coefficients $\beta$.

    What happens if the errors are not normally distributed?

  • 3

    1. Motivation

    We need regression methods that are not as sensitive to outliers

    This leads to Robust Regression

  • 4

    2. Robust Regression

    There are two popular remedies for this problem

    We can measure the size of a residual in some other way, by replacing the square $e^2$ with some other function $\rho(e)$ that reflects the size of the residual in a less extreme way. To be sensible, we should have that $\rho$ be symmetric [$\rho(e) = \rho(-e)$], $\rho$ be positive [$\rho(e) \ge 0$ for all $e$], and $\rho$ be monotone [$\rho(e_1) \le \rho(e_2)$ if $|e_1| \le |e_2|$].

  • 5

    2. Robust Regression

    An example of this type of regression is M-estimation.

    Suppose that the observed responses $Y_i$ are independent and have density functions

    $$f_i(Y_i) = \frac{1}{\sigma}\, f\!\left(\frac{Y_i - x_i^T \beta}{\sigma}\right)$$

    Note: If $f$ is the standard normal density, then this is just the standard regression model and $\sigma$ is the standard deviation.

  • 6

    2.1. M-Estimation

    The log likelihood is given by

    $$\ell(\beta, \sigma) = -n \log \sigma + \sum_{i=1}^n \log f\!\left(\frac{Y_i - x_i^T \beta}{\sigma}\right)$$

    Setting $\rho = -\log f$, we have

    $$\ell(\beta, \sigma) = -n \log \sigma - \sum_{i=1}^n \rho\!\left(\frac{Y_i - x_i^T \beta}{\sigma}\right)$$

  • 7

    2.1. M-Estimation

    Let $s = \sigma$ and $e_i(b) = Y_i - x_i^T b$. Thus, to estimate $\beta$ and $\sigma$ using maximum likelihood, we must minimize

    $$n \log s + \sum_{i=1}^n \rho\!\left(\frac{e_i(b)}{s}\right)$$

    as a function of $b$ and $s$.

  • 8

    2.1. M-Estimation

    Differentiating gives us

    $$\sum_{i=1}^n \psi\!\left(\frac{e_i(b)}{s}\right) x_i = 0 \qquad \text{and} \qquad \sum_{i=1}^n \psi\!\left(\frac{e_i(b)}{s}\right) \frac{e_i(b)}{s} = n,$$

    where $\psi = \rho'$.

  • 9

    2.1. M-Estimation

    If we drop the requirement that $\rho = -\log f$ for some density $f$, then we can make the estimate robust by choosing a $\rho$ for which $\psi = \rho'$ is bounded. Hence, we can generalize the above to the estimating equations

    $$\sum_{i=1}^n \psi\!\left(\frac{e_i(b)}{s}\right) x_i = 0$$

    and

    $$\sum_{i=1}^n \chi\!\left(\frac{e_i(b)}{s}\right) = 0,$$

    where $\chi$ is also chosen to make the scale estimate robust. These estimates are called M-estimates, since their definition is motivated by the maximum likelihood estimating equations above.
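    These estimating equations are usually solved by iteratively reweighted least squares (IRLS): writing $\psi(u) = w(u)\,u$ turns the first equation into a weighted least squares problem. A minimal sketch in Python, assuming a fixed scale $s$ and the Huber $\psi$ introduced on a later slide (the function names here are mine):

    ```python
    import numpy as np

    def huber_psi(u, k=1.5):
        # Huber's psi: identity on [-k, k], clipped to +/- k outside
        return np.clip(u, -k, k)

    def m_estimate(X, y, s, n_iter=50, tol=1e-8):
        """Solve sum_i psi(e_i(b)/s) x_i = 0 by IRLS with weights
        w_i = psi(u_i)/u_i, starting from the least squares estimate."""
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(n_iter):
            u = (y - X @ b) / s
            w = np.where(np.abs(u) > 1e-10, huber_psi(u) / u, 1.0)
            b_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
            if np.max(np.abs(b_new - b)) < tol:
                return b_new
            b = b_new
        return b
    ```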

  • 10

    2. Robust Regression

    Example: Let $\rho(x) = \tfrac{1}{2}x^2$. Then the first estimating equation reduces to the normal equations, $\sum_{i=1}^n e_i(b)\,x_i = 0$, whose solution is the least squares estimate (LSE). The second gives us the maximum likelihood estimate

    $$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n e_i^2(b).$$

  • 11

    2. Robust Regression

    Example: Let $\rho(x) = |x|$. We have that a value of $b$ that minimizes the log likelihood also minimizes

    $$\sum_{i=1}^n |e_i(b)|.$$

    This is known as the L1 estimate. Note: The L1 estimate is also called the LAD (Least Absolute Deviations) estimate. Note: $b$ need not be unique.
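    For illustration, the LAD fit can be computed by minimizing the L1 objective directly; a sketch using scipy's generic optimizer (in practice LAD is usually solved as a linear program, and because the minimizer need not be unique, different solvers may return different answers):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def lad_fit(X, y):
        # Minimize sum |e_i(b)|, starting from the least squares estimate
        b0 = np.linalg.lstsq(X, y, rcond=None)[0]
        res = minimize(lambda b: np.sum(np.abs(y - X @ b)), b0,
                       method="Nelder-Mead")
        return res.x
    ```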

  • 12

    2. Robust Regression

    Example: Let

    $$\psi(x) = \begin{cases} -k, & x < -k \\ x, & -k \le x \le k \\ k, & x > k. \end{cases}$$

    Setting $k = 1.5$, we have a reasonable compromise between least squares (the greatest efficiency at the normal model) and L1 estimation, which gives more protection from outliers.
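    This is Huber's $\psi$. Its antiderivative $\rho$ (with $\rho(0) = 0$) is quadratic in the middle and linear in the tails, which is exactly the compromise described; a small sketch:

    ```python
    import numpy as np

    def huber_rho(x, k=1.5):
        # Antiderivative of Huber's psi: x^2/2 for |x| <= k, then linear
        # growth k|x| - k^2/2, so outliers count like |x|, not x^2
        a = np.abs(x)
        return np.where(a <= k, 0.5 * x**2, k * a - 0.5 * k**2)
    ```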

  • 13

    2. Robust Regression

    Example: There is also the Median Absolute Deviation (MAD) scale estimate, which is found by setting

    $$\chi(z) = \operatorname{sgn}(|z| - 1/c),$$

    where $c$ solves $\Phi(1/c) = 3/4$; that is, $1/c = \Phi^{-1}(3/4) \approx 0.6745$, so $c \approx 1.4826$.
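    Numerically, the resulting scale estimate is the rescaled median absolute deviation of the residuals; a sketch (the constant $1.4826 \approx 1/0.6745$ makes the estimate consistent for $\sigma$ under normal errors):

    ```python
    import numpy as np

    def mad_scale(e):
        # Median absolute deviation of the residuals, rescaled so the
        # estimate is consistent for sigma when the errors are normal
        return 1.4826 * np.median(np.abs(e - np.median(e)))
    ```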

  • 14

    2. Robust Regression

    Regression coefficients found using M-estimators are close to least squares estimators if the errors are normal, but are much more robust if the error distribution has heavy tails.

    However, M-estimates of regression coefficients are just as vulnerable as least squares estimates to outliers in the explanatory variables.

  • 15

    2. Robust Regression

    Another remedy is that we can replace the sum (or the mean) by a more robust measure of location, such as the median or a trimmed mean.

    Some examples are least median of squares (LMS) and least trimmed squares (LTS).
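    A crude sketch of LTS, which minimizes the sum of the $h$ smallest squared residuals: fit exact solutions through random $p$-point subsets and keep the best. (Practical implementations such as Rousseeuw and Van Driessen's FAST-LTS refine this with concentration steps; the defaults below are my choices.)

    ```python
    import numpy as np

    def lts_fit(X, y, h=None, n_trials=500, seed=0):
        # Least trimmed squares: keep the fit whose h smallest squared
        # residuals have the smallest sum (outliers land in the trimmed part)
        rng = np.random.default_rng(seed)
        n, p = X.shape
        h = h if h is not None else (n + p + 1) // 2
        best_b, best_obj = None, np.inf
        for _ in range(n_trials):
            idx = rng.choice(n, size=p, replace=False)
            try:
                b = np.linalg.solve(X[idx], y[idx])  # exact fit to p points
            except np.linalg.LinAlgError:
                continue
            obj = np.sort((y - X @ b) ** 2)[:h].sum()
            if obj < best_obj:
                best_b, best_obj = b, obj
        return best_b
    ```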

  • 16

    2. Robust Regression

    These estimates are very robust to outliers in both the errors and the explanatory variables, but can be unstable nonetheless.

    Small changes in non-extreme points can make a very large change in the fitted regression. (See Figure 1.)

  • 17

    2. Robust Regression

    [Figure 1: two scatterplots of y versus x, each containing a point labeled B]

    Figure 1: Moving B from being collinear with the three points to being collinear with the other three points causes a drastic change in the regression line.

  • 18

    2. Robust Regression

    Furthermore, the LMS and LTS estimates are very inefficient if the data is actually normally distributed.

  • 19

    2. Robust Regression

    Thus, the more robust the regression estimator we use on normal data, the worse our estimates become. But the further the data is from normality, the better the robust estimators will fit it.

    Before we apply any regression analysis, we should first run a QQ-plot to determine whether the data is normally distributed. Depending on the results of the plot, we choose an appropriate model.
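    For example, with scipy and matplotlib (the heavy-tailed toy data is just for illustration):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Heavy-tailed toy residuals; a normal QQ-plot of these bends away
    # from the reference line in the tails, suggesting a robust fit
    residuals = np.random.default_rng(0).standard_t(df=2, size=200)
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.show()
    ```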

  • 20

    3. Measuring Robustness

    The next logical question to ask is how do we measure robustness?

    There are two common measures. The first is the breakdown point, which measures how much bad data an estimate can tolerate before it fails.

    The second measure is the influence curve, which tells us how much a single outlier affects the estimate.

  • 21

    3. Measuring Robustness

    Definition: The breakdown point of an estimate is the smallest fraction of the data that, when changed by an arbitrarily large amount, can cause an arbitrarily large change in the estimate.

  • 22

    3. Measuring Robustness

    Example: The breakdown point of the sample mean and the least squares estimate is 1/n.

    Example: The breakdown point of the sample median is almost 1/2.

    Example: The breakdown point of the L1 estimator is also 1/n, even though it is based on least absolute deviations. The same is true of M-estimates.

    Example: The LMS and LTS estimates have breakdown points near 1/2.
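    A quick numerical illustration of the first two examples:

    ```python
    import numpy as np

    x = np.arange(10, dtype=float)          # 0, 1, ..., 9
    x_bad = x.copy()
    x_bad[0] = 1e9                          # corrupt one point: fraction 1/n

    print(np.mean(x), np.mean(x_bad))       # 4.5 vs ~1e8: the mean breaks down
    print(np.median(x), np.median(x_bad))   # 4.5 vs 5.5: the median barely moves
    ```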

  • 23

    4. Influence Curves

    Suppose that $F$ is a $k$-dimensional distribution function and $\theta$ is a population parameter that depends on $F$, so $\theta = T(F)$. $T$ is called a statistical functional, since it is a function of a function.

  • 24

    4. Influence Curves

    The influence curve (IC) of a statistical functional $T$ is the derivative with respect to $t$ of $T(F_t)$ evaluated at $t = 0$, where $F_t = (1 - t)F + t\,\delta_{z_0}$ is $F$ contaminated by a small point mass at $z_0$. It is a measure of the rate at which $T$ responds to a small amount of contamination at $z_0$.
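    For the mean functional, $T(F_t) = (1 - t)\,T(F) + t\,z_0$, so the IC is $z_0 - \mu$, which is unbounded in $z_0$. A minimal numeric check (the sample and step size are arbitrary):

    ```python
    import numpy as np

    def ic_mean(sample, z0, t=1e-6):
        # Finite-difference version of d/dt T(F_t) at t = 0 for T = mean,
        # with F_t = (1 - t) F + t * delta_{z0}
        TF = np.mean(sample)
        TFt = (1 - t) * TF + t * z0
        return (TFt - TF) / t

    sample = np.random.default_rng(1).normal(size=1000)
    print(ic_mean(sample, 100.0))  # roughly 100 - mean: grows without bound
    ```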

  • 25

    4. Influence Curves

    The mean is highly nonrobust.

    The least squares estimate is not robust.

    M-estimates are not robust with respect to high-leverage points (outliers in the explanatory variables).

  • 26

    4. Influence Curves

    The robust estimators discussed so far are not entirely satisfactory, since those with high breakdown (such as LMS and LTS) have poor efficiency and the efficient M-estimators are not robust in the explanatory variables and have breakdown points of zero.

  • 27

    4. Influence Curves

    The natural question to ask is whether there are other estimates that have high breakdown points but much greater efficiency than LMS or LTS. It turns out that there are better estimates. We shall discuss two more estimators.

  • 28

    4. Influence Curves

    If we apply a weight function chosen to make the IC bounded, the resulting estimates are called bounded influence estimates or generalized M-estimates (GM-estimates).

  • 29

    4. Influence Curves

    To bound the IC, the weights are chosen in such a way that they reduce the impact of high-leverage points. However, including a high-leverage point that is not an outlier increases the efficiency of the estimate.

  • 30

    4. Influence Curves

    That is, if we fit a regression line to a set of data and then we get another sample point that is far away from our other data points, but is close to the regression line, then the efficiency of the estimate increases. (See Figure 2.)

  • 31

    4. Influence Curves

    Figure 2: A good outlier


  • 32

    4. Influence Curves

    Hence, we include the weight function in the denominator so that the effect of a small residual at a high-leverage point will be magnified.

    The weights can be chosen to minimize the asymptotic variance of the estimates. This leads to weights of the form $w(x) = \|Ax\|^{-1}$, for some matrix $A$.
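    As an illustration, one hypothetical choice (my assumption, not the slide's) is to build $A$ as a whitening matrix from the design, so that $w(x) = \|Ax\|^{-1}$ shrinks for rows far from the bulk of the data:

    ```python
    import numpy as np

    def gm_weights(X):
        # Take A = (X^T X / n)^(-1/2); then ||A x_i|| is a leverage-like
        # distance and w(x_i) = ||A x_i||^(-1) downweights extreme rows
        M = X.T @ X / len(X)
        vals, vecs = np.linalg.eigh(M)
        A = vecs @ np.diag(vals ** -0.5) @ vecs.T
        return 1.0 / np.linalg.norm(X @ A.T, axis=1)
    ```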

  • 33

    4. Influence Curves

    Note: The breakdown point of these estimates is better than an M-estimate, but cannot exceed 1/p, where p is the rank of X.

  • 34

    4. Influence Curves

    The estimating equation is usually solved iteratively by Newton's method or Fisher scoring, using some other estimate as a starting value.

  • 35

    4. Influence Curves

    There are combinations of high breakdown estimates with GM-estimates. These use a high breakdown estimate as a starting value and then take one step of the iterative method; this is called a one-step GM-estimate. Hence, one gets an estimate with a breakdown point of roughly 50% that is also rather efficient.

  • 36

    5. S-Estimators

    We can think of the average size of the residuals as a measure of their dispersion, so we can consider more general regression estimators based on some dispersion or scale estimator $s(e_1, \ldots, e_n)$. This leads to minimizing $D(b) = s[e_1(b), \ldots, e_n(b)]$, where $s$ is an estimator of scale.

  • 37

    5. S-Estimators

    We define an S-estimator to be one in which we use $s = s(e_1, \ldots, e_n)$ defined by

    $$\frac{1}{n} \sum_{i=1}^n \rho\!\left(\frac{e_i}{s}\right) = K,$$

    where $K = E[\rho(Z)]$ for a standard normal $Z$, and $\rho$ is strictly increasing on $[0, c]$ and constant on $(c, \infty)$.

  • 38

    5. S-Estimators

    Note: The breakdown point of such an estimate can be made close to 50% with a suitable choice of $\rho$. The biweight function

    $$\rho(x) = \begin{cases} \dfrac{x^2}{2} - \dfrac{x^4}{2c^2} + \dfrac{x^6}{6c^4}, & |x| \le c \\[1ex] c^2/6, & |x| > c \end{cases}$$

    is a popular choice. For $c = 1.547$, the breakdown point is just under 50% and the efficiency at the normal distribution is roughly 29%.
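    Putting the last two slides together: a sketch that evaluates the biweight $\rho$, computes $K = E[\rho(Z)]$ by numerical integration, and solves the defining equation for $s$ by root-finding (the bracket is my assumption):

    ```python
    import numpy as np
    from scipy import integrate, stats
    from scipy.optimize import brentq

    def biweight_rho(x, c=1.547):
        # Tukey's biweight: polynomial inside [-c, c], constant c^2/6 outside
        inside = x**2 / 2 - x**4 / (2 * c**2) + x**6 / (6 * c**4)
        return np.where(np.abs(x) <= c, inside, c**2 / 6)

    # K = E[rho(Z)] for a standard normal Z
    K, _ = integrate.quad(lambda z: float(biweight_rho(z)) * stats.norm.pdf(z),
                          -np.inf, np.inf)

    def s_scale(e):
        # Solve (1/n) sum rho(e_i/s) = K for s; the left side decreases
        # toward 0 as s grows, so a sign change exists for typical residuals
        f = lambda s: np.mean(biweight_rho(e / s)) - K
        s0 = np.std(e) + 1e-12
        return brentq(f, 1e-6 * s0, 1e6 * s0)
    ```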

  • 39

    5. S-Estimators

    Remark: There is another notable class of estimators, R-estimators.

    There is also a blend of the bounded influence estimators and S-estimators. This leads to a generalized S-estimate, as well as the least quartile difference (LQD) estimate and the least trimmed difference (LTD) estimate.