2013 06 19 Robust Regression

download 2013 06 19 Robust Regression

of 101

Transcript of 2013 06 19 Robust Regression

  • 8/13/2019 2013 06 19 Robust Regression

    1/101

  • 8/13/2019 2013 06 19 Robust Regression

    2/101

  • 8/13/2019 2013 06 19 Robust Regression

    3/101

    Introduction

    Introduction

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 3/45

    I d i

  • 8/13/2019 2013 06 19 Robust Regression

    4/101

    Introduction

    Modern regression

    (OLS Ordinary Least Squares) regression:One of the most widely used statistical methods in socialsciences.

    However, we will see that OLS regression does have limitations.

    Modern regression methods can be seen as improvements/alternatives to the usual regression model.

    Robust regression is one such modern method.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 4/45

    I t d ti

  • 8/13/2019 2013 06 19 Robust Regression

    5/101

    Introduction

    Robust regression

    The termrobusthas multiple interpretations in estimationframeworks:

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 5/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    6/101

    Introduction

    Robust regression

    The termrobusthas multiple interpretations in estimationframeworks:

    Robustness of validity: Resistance of the estimator to unusualobservations.A robustestimator should not suffer big changes when small changes are

    made to the data.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 5/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    7/101

    Introduction

    Robust regression

    The termrobusthas multiple interpretations in estimationframeworks:

    Robustness of validity: Resistance of the estimator to unusualobservations.A robustestimator should not suffer big changes when small changes are

    made to the data.

    Robustness of efficiency: Resistance of the estimator toviolations of underlying distributional assumptions.A robustestimator should keep high precision (i.e., small SEs) when

    distributional assumptions are violated.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 5/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    8/101

    Introduction

    Robust regression

    The termrobusthas multiple interpretations in estimationframeworks:

    Robustness of validity: Resistance of the estimator to unusualobservations.A robustestimator should not suffer big changes when small changes are

    made to the data.

    Robustness of efficiency: Resistance of the estimator toviolations of underlying distributional assumptions.A robustestimator should keep high precision (i.e., small SEs) when

    distributional assumptions are violated.

    We will mostly focus onrobustness of validityin regression.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 5/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    9/101

    Introduction

    Limitations of OLS regression: Example

    Jasso, G. (1985). Marital coital frequency and the passage of time: Estimating

    the separate effects of spouses ages and marital duration, birth and marriagecohorts, and period influences. American Sociological Review, 50(2), 224-41.doi:10.2307/2095411

    Jasso (1985) found that wifes age had apositive effecton the

    monthly coital frequency of married couples, after controlling forperiod and cohort effects.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 6/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    10/101

    Introduction

    Limitations of OLS regression: Example

    Jasso, G. (1985). Marital coital frequency and the passage of time: Estimating

    the separate effects of spouses ages and marital duration, birth and marriagecohorts, and period influences. American Sociological Review, 50(2), 224-41.doi:10.2307/2095411

    Jasso (1985) found that wifes age had apositive effecton the

    monthly coital frequency of married couples, after controlling forperiod and cohort effects.

    Kahn and Udry (1986) questioned her findings:

    They found four miscodes in the dataset. They found four additional outliersusing model diagnostics. They claim Jasso failed to consider a relevant interaction effect

    (length of marriage by wifes age): Model misspecification.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 6/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    11/101

    Limitations of OLS regression: Example

    Jasso, G. (1985). Marital coital frequency and the passage of time: Estimating

    the separate effects of spouses ages and marital duration, birth and marriagecohorts, and period influences. American Sociological Review, 50(2), 224-41.doi:10.2307/2095411

    Jasso (1985) found that wifes age had apositive effecton the

    monthly coital frequency of married couples, after controlling forperiod and cohort effects.

    Kahn and Udry (1986) questioned her findings:

    They found four miscodes in the dataset. They found four additional outliersusing model diagnostics. They claim Jasso failed to consider a relevant interaction effect

    (length of marriage by wifes age): Model misspecification.

    Total sample size: 2062.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 6/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    12/101

    Limitations of OLS regression: Example

    Model 1 Model 2Period 0.72 0.67

    Log Wifes Age 27.61 13.56Log Husbands Age 6.43 7.87Log Marital Duration 1.50 1.56

    Wife Pregnant 3.71

    3.74

    Child Under 6 0.56 0.68

    Wife Employed 0.37 0.23Husband Employed 1.28 1.10

    R2 .0475 .0612

    n 2062 2054

    Note. p< .10, p< .05, p< .01.

    Model 1: Jassos (1985) original model.Model 2: Kahn and Udrys (1986) model excluding 4 miscodes and 4

    outliers.Workshop Modern methods for robust regression, (# 152, Robert Andersen) 7/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    13/101

    Limitations of OLS regression: Example

    Some conclusions:

    A small number of unusual observations can have a large effecton the estimated regression coefficients.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 8/45

  • 8/13/2019 2013 06 19 Robust Regression

    14/101

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    15/101

    Limitations of OLS regression: Example

    Some conclusions:

    A small number of unusual observations can have a large effecton the estimated regression coefficients.

    Large samples arenotimmune to this problem.(Previous example: 8/2062= 0.39% of data.)

    Using diagnostic tools to uncover potential problems is a crucial(and often disregarded) analysis step.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 8/45

    Introduction

  • 8/13/2019 2013 06 19 Robust Regression

    16/101

    Limitations of OLS regression: Example

    Some conclusions:

    A small number of unusual observations can have a large effecton the estimated regression coefficients.

    Large samples arenotimmune to this problem.(Previous example: 8/2062= 0.39% of data.)

    Using diagnostic tools to uncover potential problems is a crucial(and often disregarded) analysis step.

    The decision on what to do when influential observations arefound should be based onsubstantive knowledge(i.e., no one-way-out solution exists).

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 8/45

  • 8/13/2019 2013 06 19 Robust Regression

    17/101

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    18/101

    Properties of estimators

    Assessing whether an estimator is robustrequires checking severalmathematical properties.

    Notation:

    population parameter that we intend to estimate

    T estimator for Y sample (Y = (y1, . . . , yn))

    estimate of (T(Y) =

    )

    n sample size

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 10/45

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    19/101

    Properties of estimators

    1 Bias: Does the estimator give, on average, the desiredparameter?

    bias=E[ ]

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 11/45

  • 8/13/2019 2013 06 19 Robust Regression

    20/101

  • 8/13/2019 2013 06 19 Robust Regression

    21/101

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    22/101

    Properties of estimators

    4 Influence function(IF): Localmeasure of the resistance of anestimator.Ideal: Having aboundedinfluence function, implying that onesingle observation has limited influence on the estimator.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 12/45

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    23/101

    Properties of estimators

    4 Influence function(IF): Localmeasure of the resistance of anestimator.Ideal: Having aboundedinfluence function, implying that onesingle observation has limited influence on the estimator.

    5 Relative efficiency: Ratio ofMSEs of the estimator withsmallest MSE (say TOpt) and estimator T

    Relative efficiency=

    MSE(TOpt)

    MSE(T)

    0 Rel. Effic. 1, the larger the better.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 12/45

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    24/101

    Properties of estimators

    Fact:

    OLS estimators are the most efficient (i.e., with smallestMSE) among all unbiased linear estimators if all theregression model assumption are met.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 13/45

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    25/101

    Properties of estimators

    Fact:

    OLS estimators are the most efficient (i.e., with smallestMSE) among all unbiased linear estimators if all theregression model assumption are met.

    As a consequence:Relative efficiency(more precisely, asymptoticefficiency) ofrobust estimators is assessed in comparison to the OLSestimators when model assumptions are met.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 13/45

  • 8/13/2019 2013 06 19 Robust Regression

    26/101

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    27/101

    Measures of location

    Measure of location: Quantity that characterizes a position in a

    distribution

    Statistic Formula BDP IF In current use in

    robust regression

    Mean y=

    i

    yi

    n 0 Unbounded No

    -trimmed mean yt= y(g+1)++y(ng)

    n2g Bounded Yes

    Median M=Q.50 .50 Bounded Yes

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 14/45

    Important background

  • 8/13/2019 2013 06 19 Robust Regression

    28/101

    Measures of location: Influence function

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 15/45

  • 8/13/2019 2013 06 19 Robust Regression

    29/101

  • 8/13/2019 2013 06 19 Robust Regression

    30/101

    Robustness, resistance, and OLS regression

    Cl ifi i f l b i

  • 8/13/2019 2013 06 19 Robust Regression

    31/101

    Classification of unusual observations

    Observation Definition

    Does it affect

    regression estimates?

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 18/45

    Robustness, resistance, and OLS regression

    Cl ifi i f l b i

  • 8/13/2019 2013 06 19 Robust Regression

    32/101

    Classification of unusual observations

    Observation Definition

    Does it affect

    regression estimates?

    Univariate outlier Outlier ofx ory

    Not necessarily(unconditional on each other)

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 18/45

    Robustness, resistance, and OLS regression

    Cl ifi ti f l b ti

  • 8/13/2019 2013 06 19 Robust Regression

    33/101

    Classification of unusual observations

    Observation Definition

    Does it affect

    regression estimates?

    Univariate outlier Outlier ofx ory

    Not necessarily(unconditional on each other)

    Regressionoutlier Unusual yvalue for a given x Not necessarily (A)

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 18/45

    Robustness, resistance, and OLS regression

    Cl ifi ti f l b ti

  • 8/13/2019 2013 06 19 Robust Regression

    34/101

    Classification of unusual observations

    Observation Definition

    Does it affect

    regression estimates?

    Univariate outlier Outlier ofx ory

    Not necessarily(unconditional on each other)

    Regressionoutlier Unusual yvalue for a given x Not necessarily (A)

    Observation withleverage Unusual x value Not necessarily (B)

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 18/45

    Robustness, resistance, and OLS regression

    Cl ssifi ti f s l bs ti s

  • 8/13/2019 2013 06 19 Robust Regression

    35/101

    Classification of unusual observations

    Observation Definition

    Does it affect

    regression estimates?

    Univariate outlier Outlier ofx ory

    Not necessarily(unconditional on each other)

    Regressionoutlier Unusual yvalue for a given x Not necessarily (A)

    Observation withleverage Unusual x value Not necessarily (B)

    Observation withinfluence Regression estimates change

    Yes (C)greatly with/without them

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 18/45

  • 8/13/2019 2013 06 19 Robust Regression

    36/101

  • 8/13/2019 2013 06 19 Robust Regression

    37/101

    Robustness, resistance, and OLS regression

    Classification of unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    38/101

    Classification of unusual observations

    Conclusion:

    Being x-discrepant (leverage) ory-discrepant (regressionoutlier)is not sufficient to identifyinfluentialobservations.

    However, it is a combination of both x- and y- discrepanciesthat determines the influence of an observation.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 19/45

    Robustness, resistance, and OLS regression

    Classification of unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    39/101

    Classification of unusual observations

    Conclusion:

    Being x-discrepant (leverage) ory-discrepant (regressionoutlier)is not sufficient to identifyinfluentialobservations.

    However, it is a combination of both x- and y- discrepanciesthat determines the influence of an observation.

    Important note:Observations with high leverages that follow the main regressiontrend helpdecreasingthe SE of the estimates! Plot B

    SE(b) = se

    i(xi x)2

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 19/45

    Robustness, resistance, and OLS regression

    Classification of unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    40/101

    Classification of unusual observations

    Conclusion:

    Being x-discrepant (leverage) ory-discrepant (regressionoutlier)is not sufficient to identifyinfluentialobservations.

    However, it is a combination of both x- and y- discrepanciesthat determines the influence of an observation.

    Important note:Observations with high leverages that follow the main regressiontrend helpdecreasingthe SE of the estimates! Plot B

    SE(b) = se

    i(xi x)2

    How do we identifyregressionoutliers, observations withleverage,and observations withinfluence?

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 19/45

    Robustness, resistance, and OLS regression

    Identifying unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    41/101

    Identifying unusual observations

    n = number of observations (i=1, . . . , n)

    k= number of predictors (j=1, . . . , k)

    Detecting. . . Use. . . Rule of thumb?

    Test? Plot?Flag if...

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 20/45

    Robustness, resistance, and OLS regression

    Identifying unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    42/101

    Identifying unusual observations

    n = number of observations (i=1, . . . , n)

    k= number of predictors (j=1, . . . , k)

    Detecting. . . Use. . . Rule of thumb?

    Test? Plot?Flag if...

    Leverage Hat values hi >2(k+ 1)/n a Index plot

    a. For large samples. Use hi >3(k+1)/n for small samples.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 20/45

  • 8/13/2019 2013 06 19 Robust Regression

    43/101

    Robustness, resistance, and OLS regression

    Identifying unusual observations

  • 8/13/2019 2013 06 19 Robust Regression

    44/101

    Identifying unusual observations

    n = number of observations (i=1, . . . , n)

    k= number of predictors (j=1, . . . , k)

    Detecting. . . Use. . . Rule of thumb?

    Test? Plot?Flag if...

    Leverage Hat values hi >2(k+ 1)/n a Index plot

    Regression outliers Studentized residuals | | 2 Yesb QQ-plot

    InfluenceDFBETAs | | 2/n Index plotCooks D

    > .5 c Index plot

    >4/(n k 1)d

    a. For large samples. Use hi >3(k+1)/n for small samples.b. Controlling for capitalization by chance is required.c. According to Cook and Weisberg (1999).d. According to Fox (1997).

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 20/45

    Robustness, resistance, and OLS regression

    Identifying unusual observations: Example

  • 8/13/2019 2013 06 19 Robust Regression

    45/101

    Identifying unusual observations: Example

    Weakliem, D.L., Andersen, R., & Heath, A.F. (2005). By popular demand:

    The effect of public opinion on income quality. Comparative Sociology, 4(3),261-284. doi:10.1163/156913305775010124

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 21/45

    Robustness, resistance, and OLS regression

    Identifying unusual observations: Example

  • 8/13/2019 2013 06 19 Robust Regression

    46/101

    Identifying unusual observations: Example

    Weakliem, D.L., Andersen, R., & Heath, A.F. (2005). By popular demand:

    The effect of public opinion on income quality. Comparative Sociology, 4(3),261-284. doi:10.1163/156913305775010124

    We will focus on a subset of the original data: Sample: n=26 countries with democracies

  • 8/13/2019 2013 06 19 Robust Regression

    47/101

    y g p

    Weakliem, D.L., Andersen, R., & Heath, A.F. (2005). By popular demand:

    The effect of public opinion on income quality. Comparative Sociology, 4(3),261-284. doi:10.1163/156913305775010124

    We will focus on a subset of the original data: Sample: n=26 countries with democracies

  • 8/13/2019 2013 06 19 Robust Regression

    48/101

    y g p

    Weakliem, D.L., Andersen, R., & Heath, A.F. (2005). By popular demand:

    The effect of public opinion on income quality. Comparative Sociology, 4(3),261-284. doi:10.1163/156913305775010124

    We will focus on a subset of the original data: Sample: n=26 countries with democracies

  • 8/13/2019 2013 06 19 Robust Regression

    49/101

  • 8/13/2019 2013 06 19 Robust Regression

    50/101

  • 8/13/2019 2013 06 19 Robust Regression

    51/101

    Robustness, resistance, and OLS regression

    Identifying unusual observations: Example

  • 8/13/2019 2013 06 19 Robust Regression

    52/101

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 23/45

  • 8/13/2019 2013 06 19 Robust Regression

    53/101

    Robustness, resistance, and OLS regression

    Identifying unusual observations: Example

  • 8/13/2019 2013 06 19 Robust Regression

    54/101

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 23/45

  • 8/13/2019 2013 06 19 Robust Regression

    55/101

  • 8/13/2019 2013 06 19 Robust Regression

    56/101

  • 8/13/2019 2013 06 19 Robust Regression

    57/101

    Robustness, resistance, and OLS regression

    Identifying unusual observations: Example

  • 8/13/2019 2013 06 19 Robust Regression

    58/101

    Conclusion:

    Slovakia and Czech Republic seem to have a large influence onthe estimation of the regression coefficients. This can be confirmed:

    OLS OLS(all countries) (omitting cz and sk)

    b SE b SE

    Intercept .028 .128 .107* .058Gini .00074 .0028 .00527*** .0013GDP .0175** .0079 .0063 .0037

    s .138 .0602

    R2

    .175 .4622n 26 24

    Note. p< .10, p< .05, p< .01.

    Lets now look into better alternatives to OLS!

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 24/45

    Robust regression for the linear model

  • 8/13/2019 2013 06 19 Robust Regression

    59/101

    Robust regression for the

    linear model

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 25/45

  • 8/13/2019 2013 06 19 Robust Regression

    60/101

  • 8/13/2019 2013 06 19 Robust Regression

    61/101

  • 8/13/2019 2013 06 19 Robust Regression

    62/101

    Robust regression for the linear model

    Various robust regression models

  • 8/13/2019 2013 06 19 Robust Regression

    63/101

    Type Estimator Breakdown Bounded Asymptotic

    Point Infl. Func. Efficiency

    OLS 0 No 100%

    L-EstimatorsLAV (Least Absolute Values) 0 Yes 64%LMS (Least Median of Squares) .5 Yes 37%

    LTS (Least Trimmed Squares) .5 Yes 8%

    R-Estimators Bounded influence estimator < .2 Yes 90%

    M-EstimatorsM-estimates (Huber, biweight) 0 No 95%GM-estimates (Mal.& Schw.) 1/(p+1) Yes 95%GM-estimates (S1S) .5 Yes 95%

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 26/45

  • 8/13/2019 2013 06 19 Robust Regression

    64/101

    Robust regression for the linear model

    Various robust regression models

  • 8/13/2019 2013 06 19 Robust Regression

    65/101

    Type Estimator Breakdown Bounded Asymptotic

    Point Infl. Func. Efficiency

    OLS 0 No 100%

    L-EstimatorsLAV (Least Absolute Values) 0 Yes 64%LMS (Least Median of Squares) .5 Yes 37%

    LTS (Least Trimmed Squares) .5 Yes 8%

    R-Estimators Bounded influence estimator < .2 Yes 90%

    M-EstimatorsM-estimates (Huber, biweight) 0 No 95%GM-estimates (Mal.& Schw.) 1/(p+1) Yes 95%GM-estimates (S1S) .5 Yes 95%

    S-Estimators S-estimates .5 Yes 33%

    GS-estimates .5 Yes 67%

    MM-Estimators MM-estimates .5 Yes 95%

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 26/45

  • 8/13/2019 2013 06 19 Robust Regression

    66/101

  • 8/13/2019 2013 06 19 Robust Regression

    67/101

    Robust regression for the linear model

    M-Estimators

  • 8/13/2019 2013 06 19 Robust Regression

    68/101

    Objectivefunctions:

    Weightfunctions:

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 29/45

  • 8/13/2019 2013 06 19 Robust Regression

    69/101

    Robust regression for the linear model

    GM-, S-, GS-, MM-estimators

  • 8/13/2019 2013 06 19 Robust Regression

    70/101

    GM-Estimators: Generalized M-estimators that take both vertical

    outliers and leverage into account: Vertical outliers: By using M-estimators.

    Leverage: Using hat values.

    Method to retain: Schweppe one-step estimator (S1S).

    S-Estimators: Minimize a robust M-estimate of the residual scale(instead ofY scale).

    GS-Estimators: Generalized S-estimates that increase efficiency.

    MM-Estimators: Rely on both M- (high efficiency) and S- (highBDP) estimators.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 31/45

    Robust regression for the linear model

    Example: Simulated data

  • 8/13/2019 2013 06 19 Robust Regression

    71/101

  • 8/13/2019 2013 06 19 Robust Regression

    72/101

    Robust regression for the linear model

    Example: Simulated data

  • 8/13/2019 2013 06 19 Robust Regression

    73/101

    Robust regression for the linear model

    Example: Simulated data

  • 8/13/2019 2013 06 19 Robust Regression

    74/101

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 32/45

  • 8/13/2019 2013 06 19 Robust Regression

    75/101

    Robust regression for the linear model

    Using robust regression as regression diagnostics

  • 8/13/2019 2013 06 19 Robust Regression

    76/101

    Common statistics measuring influence of observations (e.g., Cooks

    distance) are not robust against unusual obervations.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 34/45

  • 8/13/2019 2013 06 19 Robust Regression

    77/101

  • 8/13/2019 2013 06 19 Robust Regression

    78/101

  • 8/13/2019 2013 06 19 Robust Regression

    79/101

    Robust regression for the linear model

    Using robust regression as regression diagnostics

    E l I d l f b i i h

  • 8/13/2019 2013 06 19 Robust Regression

    80/101

    Example: Index plots of robust regression weights wi

    Idea: Weights indicate levels of unusualness (y- and/or x-discrepancies) the smaller, the more unusual.

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 35/45

  • 8/13/2019 2013 06 19 Robust Regression

    81/101

    Robust regression for the linear model

    Using robust regression as regression diagnostics

    E l RR l t (T k 1991)

  • 8/13/2019 2013 06 19 Robust Regression

    82/101

    Example: RR-plots (Tukey, 1991)

    Idea: Robust regression residuals are better than OLS residuals fordiagnosing outliers:

    OLS regression tries to produce normal-looking residuals

    even when the data themselves are not normal.(Rousseeuw & van Zomeren, 1990)

    Workshop Modern methods for robust regression, (# 152, Robert Andersen) 36/45 Robust regression for the linear model

    Using robust regression as regression diagnostics

    E a le RR lots (T ke 1991)

  • 8/13/2019 2013 06 19 Robust Regression

    83/101

    Example: RR-plots (Tukey, 1991)

    Idea: Robust regression residuals are better than OLS residuals fordiagnosing outliers:

    OLS regression tries to produce normal-looking residuals

    even when the data themselves are not normal.(Rousseeuw & van Zomeren, 1990)

    RR-plots: Residual-residual scatterplot matrix

    OLS assumptions hold = scatter around y=x line(OLS vsrobust regression residuals)

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 36/45 Robust regression for the linear model

    Using robust regression as regression diagnostics

  • 8/13/2019 2013 06 19 Robust Regression

    84/101

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 37/45

  • 8/13/2019 2013 06 19 Robust Regression

    85/101

    Standard errors for robust regression

    Standard errors for robust regression

    Analytical (asymptotic) SEs are available for only some types of

  • 8/13/2019 2013 06 19 Robust Regression

    86/101

    Analytical (asymptotic) SEs are available for only some types ofrobust regression: M- and GM-estimation. S- and GS-estimation. MM-estimation.

    These estimates are given by

    SE = Diagonalvar-cov matrix.

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 39/45 Standard errors for robust regression

    Standard errors for robust regression

    Analytical (asymptotic) SEs are available for only some types of

  • 8/13/2019 2013 06 19 Robust Regression

    87/101

    Analytical (asymptotic) SEs are available for only some types ofrobust regression: M- and GM-estimation. S- and GS-estimation. MM-estimation.

    These estimates are given by

    SE = Diagonalvar-cov matrix. Some problems with SE: Unreliable for small n (say,

  • 8/13/2019 2013 06 19 Robust Regression

    88/101

    Analytical (asymptotic) SEs are available for only some types ofrobust regression: M- and GM-estimation. S- and GS-estimation. MM-estimation.

    These estimates are given by

    SE = Diagonalvar-cov matrix. Some problems with SE: Unreliable for small n (say,

  • 8/13/2019 2013 06 19 Robust Regression

    89/101

    Bootstrapping is useful to estimate difficult/unknown sampling

    distributions.

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 40/45 Standard errors for robust regression

    Computing standard errors: Bootstrapping

    Bootstrapping is useful to estimate difficult/unknown sampling

  • 8/13/2019 2013 06 19 Robust Regression

    90/101

    Bootstrapping is useful to estimate difficult/unknown sampling

    distributions.Remember: If OLS assumptions are met, then OLS estimates for theSEs are better than bootstrap estimates!

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 40/45

  • 8/13/2019 2013 06 19 Robust Regression

    91/101

  • 8/13/2019 2013 06 19 Robust Regression

    92/101

  • 8/13/2019 2013 06 19 Robust Regression

    93/101

    Standard errors for robust regression

    Computing bootstrapping CIs

    Having estimated regression coefficients plustheir associated

  • 8/13/2019 2013 06 19 Robust Regression

    94/101

    g g p

    (bootstrapped) SEs, it is now possible to compute (1

    )% CI.

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 42/45 Standard errors for robust regression

    Computing bootstrapping CIs

    Having estimated regression coefficients plustheir associated

  • 8/13/2019 2013 06 19 Robust Regression

    95/101

    (bootstrapped) SEs, it is now possible to compute (1 )% CI. There are some possibilities:

    Bootstrap t CI: When bias is small and the bootstrap samplingdistribution is roughly normally distributed.

    CI= tnk1,/2SE()

    Workshop Modern methods for robust regression (# 152 Robert Andersen) 42/45

  • 8/13/2019 2013 06 19 Robust Regression

    96/101

    Standard errors for robust regression

    Computing bootstrapping CIs

    Having estimated regression coefficients plustheir associated

  • 8/13/2019 2013 06 19 Robust Regression

    97/101

    (bootstrapped) SEs, it is now possible to compute (1 )% CI. There are some possibilities:

    Bootstrap t CI: When bias is small and the bootstrap samplingdistribution is roughly normally distributed.

    CI= tnk1,/2SE() Bootstrap percentile CI: When bias is small but the bootstrap

    sampling distribution deviates from the normal distribution.

    CI= (q/2(), q1/2()), =bootstrapped dist.

    Bias-corrected percentile CI: When bias is large (e.g., for smallsample sizes).

    W ksh M d th ds f b st ssi (# 152 R b t A d s ) 42/45

  • 8/13/2019 2013 06 19 Robust Regression

    98/101

    Standard errors for robust regression

    Example: Public opinion about pay inequality

  • 8/13/2019 2013 06 19 Robust Regression

    99/101

    Bootstrap t CIs

    B Boot. SE Lower 95% Upper 95%

    Intercept .0978 .0580Gini .0051 .0012 .0026 .0076GDP .0057 .0033 .0015 .0129

    W k h M d th d f b t i (# 152 R b t A d ) 43/45 Robust regression in R

  • 8/13/2019 2013 06 19 Robust Regression

    100/101

    Robust regression in R

    W k h M d th d f b t i (# 152 R b t A d ) 44/45 Robust regression in R

    Robust regression in R

    Fitting regression models

  • 8/13/2019 2013 06 19 Robust Regression

    101/101

    Fitting regression models

    Reg. Model R package R command

    OLS lm(SECPAY gini + GDP)L-Estimation (LAV) quantreg rq(SECPAY gini + GDP)M-Estimation (Huber) MASS rlm(SECPAY gini + GDP)M-Estimation (Biweight) MASS rlm(SECPAY gini + GDP,psi=psi.bisquare)MM-Estimation MASS rlm(SECPAY

    gini + GDP,method="MM")

    Model diagnostics (stats package, loaded by default)

    Statistic Diagnosing. . . R command

    Hat values Leverage hatvalues(lm(SECPAY

    gini + GDP))

    Studentized residual Reg. outlier rstudent(lm(SECPAY gini + GDP))DFBETA Influence dfbeta(lm(SECPAY gini + GDP))Cooks D Influence cooks.distance(lm(SECPAY gini + GDP))

    W k h M d h d f b i (# 152 R b A d ) 45/45