Endogenous explanatory variables - Nuffield College
  • Endogenous explanatory variables

    Violation of the assumption that Cov(xi, ui) = 0 has serious consequences

    for the OLS estimator

    This is one of the key assumptions needed to establish consistency

    When one or more of the explanatory variables is correlated with the error

    term ui, we have both E(ui|xi) ≠ 0 and E(xiui) ≠ 0, so the OLS estimator

    will be both biased and inconsistent


  • We will consider two situations where this occurs:

    - a linear model with Cov(xi, ui) = 0 is the correct specification, but one

    or more of the explanatory variables is measured with error

    - a linear model with Cov(xi, ui) = 0 is the correct specification, but one or

    more of the explanatory variables is not measured at all, and hence omitted

    from the model we can estimate

    These are simply two examples of cases where we have simultaneity or

    endogeneity, i.e. one or more of the explanatory variables is correlated

    with the error term


  • Measurement error/errors-in-variables

    A common concern in applied econometrics is that relevant explanatory

    variables may be poorly measured

    Examples - survey data on households:

    - recall bias: how much time did you spend unemployed last year?

    - rounding bias: how much money did you spend on food last week?


    Illustrate attenuation bias for the case of a single explanatory variable,

    measured with error

    - the OLS estimator is biased towards zero if the explanatory variable is

    measured with error

    - this bias does not disappear in large samples (OLS is inconsistent)

    Note that measurement error in the dependent variable does not lead to

    the same bias and inconsistency problems, provided the measurement error

    in yi is uncorrelated with (correctly measured) xi


  • Consider the model with a single explanatory variable and no intercept

    yi* = β xi* + ui for i = 1, ..., N

    where yi* and xi* denote the true values of these variables, that we may

    not observe

    To simplify, suppose E(ui) = E(xi*) = E(yi*) = 0 for i = 1, ..., N (original

    variables may be expressed as deviations from their sample means)

    We focus on large sample properties, and assume that E(xi*ui) = 0 for

    i = 1, ..., N, and we have independent observations, so that OLS would be

    a consistent estimator of β if we observed the true values of yi* and xi*

  • First consider additive, mean zero measurement error in the dependent

    variable only

    yi = yi* + vi ⟺ yi* = yi − vi

    yi is the observed value

    yi* is the true value

    vi is the measurement error, with E(vi) = 0 for i = 1, ..., N

    The true values xi* are observed

  • Substituting this expression for yi* in the true model

    (yi − vi) = β xi* + ui

    or yi = β xi* + (ui + vi)

    Consistency requires xi* to be uncorrelated with the error term (ui + vi)

    Given E(xi*ui) = 0, the additional requirement is that E(xi*vi) = 0 for

    i = 1, ..., N

    That is, the measurement error in the dependent variable is uncorrelated

    with the explanatory variable
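    This claim can be checked with a small simulation (a sketch, not from the slides; all parameter values are illustrative): even large measurement error in yi leaves OLS on a correctly measured xi consistent.

```python
# Simulate y*_i = beta x_i + u_i and observe y_i = y*_i + v_i, with v_i
# independent of x_i. The composite error (u_i + v_i) is still
# uncorrelated with x_i, so OLS remains consistent.
import numpy as np

rng = np.random.default_rng(0)
N, beta = 200_000, 2.0

x = rng.normal(0.0, 1.0, N)       # true x, observed without error
u = rng.normal(0.0, 1.0, N)       # model error
v = rng.normal(0.0, 3.0, N)       # large measurement error in y
y_obs = beta * x + u + v          # observed dependent variable

# OLS slope in the no-intercept model: sum(x*y) / sum(x^2)
beta_hat = (x @ y_obs) / (x @ x)
print(beta_hat)                   # close to the true beta = 2.0
```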

  • Now consider additive, mean zero measurement error in the explanatory

    variable (only)

    xi = xi* + ei ⟺ xi* = xi − ei

    Substituting for xi* in the true model

    yi = β(xi − ei) + ui

    or yi = β xi + (ui − βei)

    The OLS estimator of β here is biased and inconsistent

    - for a given value of xi*, observed xi and the measurement error ei are

    positively correlated, which implies non-zero correlation between xi and the

    error term in this model (ui − βei)

  • yi = β xi + (ui − βei)

    For β > 0, this implies a negative correlation between xi and (ui − βei)

    For β < 0, this implies a positive correlation between xi and (ui − βei)

    For β > 0, the OLS estimator of β will be biased downwards

    For β < 0, the OLS estimator of β will be biased upwards

    In either case, the OLS estimator of β will be biased towards zero

    - this is known as attenuation bias
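    A quick simulation sketch of the argument (illustrative values; with equal variances for the true x and the measurement error the attenuation factor is 1/2):

```python
# With classical measurement error in x, the OLS slope is pulled towards
# zero for either sign of beta, and the bias does not vanish at large N.
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

x_true = rng.normal(0.0, 1.0, N)
e = rng.normal(0.0, 1.0, N)            # measurement error in x
x_obs = x_true + e
u = rng.normal(0.0, 1.0, N)

beta_hats = {}
for beta in (2.0, -2.0):
    y = beta * x_true + u
    beta_hats[beta] = (x_obs @ y) / (x_obs @ x_obs)
print(beta_hats)   # roughly {2.0: 1.0, -2.0: -1.0}: halved, towards zero
```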

  • To analyse this further, we invoke the classical errors-in-variables

    assumptions (for i = 1, ..., N)

    E(xi*ei) = 0 Measurement error is uncorrelated with the true value xi*

    E(uiei) = 0 Measurement error is uncorrelated with the true model error ui

    V(ei) = σe² Measurement error is homoskedastic

    V(xi*) = σx² Population variance of the true xi* exists and is finite

    Now β̂OLS = (x′x)⁻¹x′y = (Σi xiyi)/(Σi xi²) = (N⁻¹ Σi xiyi)/(N⁻¹ Σi xi²)

    where the sums run over i = 1, ..., N

    Using xi = xi* + ei and yi = β xi* + ui together with the above assumptions,

    we obtain

  • plim(N→∞) β̂OLS

    = [plim N⁻¹ Σi (xi* + ei)(β xi* + ui)] / [plim N⁻¹ Σi (xi* + ei)²]

    = [β plim N⁻¹ Σi xi*² + plim N⁻¹ Σi xi*ui + β plim N⁻¹ Σi xi*ei + plim N⁻¹ Σi uiei]

      / [plim N⁻¹ Σi xi*² + 2 plim N⁻¹ Σi xi*ei + plim N⁻¹ Σi ei²]

    = [β E(xi*²) + E(xi*ui) + β E(xi*ei) + E(uiei)] / [E(xi*²) + 2E(xi*ei) + E(ei²)]

    = [β E(xi*²) + 0 + 0 + 0] / [E(xi*²) + 0 + E(ei²)]

    = β (σx² / (σx² + σe²)) = β / (1 + σe²/σx²) ≠ β if σe² > 0

  • plim(N→∞) β̂OLS = β / (1 + σe²/σx²) < β for β > 0 and σe² > 0

    plim(N→∞) β̂OLS = β / (1 + σe²/σx²) > β for β < 0 and σe² > 0

    The OLS estimator of β is inconsistent, with a bias towards zero that does

    not diminish as the sample becomes large

    For given σx², the severity of this attenuation bias increases with the

    variance of the measurement error (σe²)

    The magnitude of the inconsistency depends inversely on the

    "signal-to-noise" ratio (σx²/σe²)
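    The probability limit above can be verified numerically (a sketch with illustrative parameter values):

```python
# Check beta_hat against beta / (1 + sigma_e^2 / sigma_x^2) in a large sample.
import numpy as np

rng = np.random.default_rng(2)
N, beta = 500_000, 1.5
sigma_x, sigma_e = 2.0, 1.0          # sd of true x and of measurement error

x_true = rng.normal(0.0, sigma_x, N)
x_obs = x_true + rng.normal(0.0, sigma_e, N)
y = beta * x_true + rng.normal(0.0, 1.0, N)

beta_hat = (x_obs @ y) / (x_obs @ x_obs)
predicted = beta / (1 + sigma_e**2 / sigma_x**2)   # 1.5 / 1.25 = 1.2
print(beta_hat, predicted)
```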

  • Under the classical errors-in-variables assumptions with homoskedastic

    measurement error, the presence of measurement error affects the estimated

    slope parameter, but not the linearity of the relationship between yi and

    observed xi

    With heteroskedastic measurement error, the presence of measurement

    error may also introduce an incorrect indication of non-linearity in the

    relationship

    For example, if β > 0 and V(ei) tends to be larger for individuals with

    higher values of xi*, then estimation of a non-linear relationship between yi

    and observed xi could give an incorrect indication of a concave relationship

    (illustrate)
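    One way to see the heteroskedastic case (a sketch with made-up numbers): let the measurement-error variance rise with xi*, and compare the estimated slope among low-xi* and high-xi* observations. The slope is far more attenuated where the noise is larger, which a flexible fit would read as concavity even though the true relation is linear.

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta = 200_000, 1.0

x_true = rng.uniform(0.0, 4.0, N)
sd_e = 0.1 + 0.5 * x_true                  # error sd rises with true x
x_obs = x_true + rng.normal(0.0, 1.0, N) * sd_e
y = beta * x_true + rng.normal(0.0, 1.0, N)

def slope(x, y):
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

low, high = x_true < 1.0, x_true > 3.0
slope_low = slope(x_obs[low], y[low])      # mild attenuation
slope_high = slope(x_obs[high], y[high])   # severe attenuation
print(slope_low, slope_high)
```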

  • Multiple regression with errors in variables

    yi = xi*β + ui

    xi = xi* + ei

    where xi, xi* and ei are 1×K vectors, and β is a K×1 parameter vector

    As before

    yi = xiβ + (ui − eiβ)

    In general, the OLS estimator of the K×1 vector of parameters β will be

    biased and inconsistent, since E[xi′(ui − eiβ)] ≠ 0

  • If only one of the explanatory variables in xi is measured with error, we

    can show that

    - the OLS estimator of the coefficient on that variable is biased towards

    zero

    - the OLS estimators of the coefficients on the other explanatory variables

    are also biased, in unknown directions

    If several explanatory variables are measured with error, it is very difficult

    to sign the biases for any of the coefficients
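    A simulation sketch of the single-mismeasured-regressor case (illustrative values): the coefficient on the mismeasured x1 is attenuated, and the coefficient on the correctly measured x2 is also biased, because the two regressors are correlated.

```python
import numpy as np

rng = np.random.default_rng(4)
N, b1, b2 = 500_000, 1.0, 1.0

x1 = rng.normal(0.0, 1.0, N)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, N)   # correlated with x1
y = b1 * x1 + b2 * x2 + rng.normal(0.0, 1.0, N)

x1_obs = x1 + rng.normal(0.0, 1.0, N)     # measurement error in x1 only
X = np.column_stack([x1_obs, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # population values here: about [0.44, 1.22], not [1, 1]
```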

  • Omitted variables

    Another common concern in applied econometrics is that relevant

    explanatory variables may be omitted from the model

    Relevant explanatory variables are often unobserved or unobservable

    Example

    - survey data on individuals do not contain data on characteristics like

    ability or motivation

    This may make it difficult to attach causal significance to estimated

    parameters in linear regression-type models

  • Illustrate omitted variable bias for the case of a single included variable

    and a single omitted variable

    - the OLS estimator is biased if the omitted variable is relevant and corre-

    lated with the included regressor

    - this bias does not disappear in large samples (OLS is inconsistent)

    - the direction of the bias depends on the sign of the correlation between

    the included variable and the omitted variable

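    The direction result can be illustrated with a simulation sketch (illustrative values): in the one-included, one-omitted case the OLS slope on x1 converges to β1 + β2·Cov(x1i, x2i)/V(x1i), so the bias takes the sign of β2 times the correlation between the included and omitted variables.

```python
import numpy as np

rng = np.random.default_rng(5)
N, b1, b2 = 500_000, 1.0, 2.0

x1 = rng.normal(0.0, 1.0, N)
b1_hats = {}
for rho in (0.5, -0.5):                    # Cov(x1, x2) = rho here
    x2 = rho * x1 + rng.normal(0.0, 1.0, N)
    y = b1 * x1 + b2 * x2 + rng.normal(0.0, 1.0, N)
    b1_hats[rho] = (x1 @ y) / (x1 @ x1)    # x2 omitted from the regression
print(b1_hats)   # about {0.5: 2.0, -0.5: 0.0} = b1 + b2*rho
```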

  • Consequently, omitted variables - or "unobserved heterogeneity" - present

    a formidable challenge to drawing causal inferences from cross-section

    regressions

    There is a serious danger that observed, included explanatory variables

    may just be proxying for unobserved, omitted factors - rather than exerting

    a direct, causal influence on the outcome of interest

  • Note that this problem is not confined to empirical research in economics

    Beware of medical studies claiming that some activity will help you live

    longer

    These claims are often based on cross-section correlations

    It is difficult to draw causal conclusions unless we are confident that the

    study has controlled for all potentially relevant confounding factors


  • We first consider the model with one included variable (x1i) and one

    omitted variable (x2i)

    The true model is

    yi = x1iβ1 + x2iβ2 + ui for i = 1, 2, ..., N

    satisfying E(ui) = E(x1i) = E(x2i) = 0 and E(x1iui) = E(x2iui) = 0

    However the model we estimate excludes x2i

    yi = x1iβ1 + (ui + x2iβ2) for i = 1, 2, ..., N

    Illustration suggests that the OLS estimator β̂1 in the estimated model

    will be a biased and inconsistent estimator of β1 in the true model, in cases

    where x2i and x1i are correlated, and where β2 ≠ 0

  • Stack across the N observations to obtain

    y = X