Date posted: 11-Aug-2018

### Transcript of Endogenous explanatory variables - Nuffield College

• Endogenous explanatory variables

Violation of the assumption that $\mathrm{Cov}(x_i, u_i) = 0$ has serious consequences for the OLS estimator. This is one of the key assumptions needed to establish consistency.

When one or more of the explanatory variables is correlated with the error term $u_i$, we have both $E(u_i \mid x_i) \neq 0$ and $E(x_i u_i) \neq 0$, so the OLS estimator will be both biased and inconsistent.

• We will consider two situations where this occurs:

- a linear model with $\mathrm{Cov}(x_i, u_i) = 0$ is the correct specification, but one or more of the explanatory variables is measured with error

- a linear model with $\mathrm{Cov}(x_i, u_i) = 0$ is the correct specification, but one or more of the explanatory variables is not measured at all, and is hence omitted from the model we can estimate

These are simply two examples of cases where we have simultaneity or endogeneity, i.e. one or more of the explanatory variables is correlated with the error term.

• Measurement error/errors-in-variables

A common concern in applied econometrics is that relevant explanatory variables may be poorly measured.

Examples - survey data on households:

- recall bias: how much time did you spend unemployed last year?

- rounding bias: how much money did you spend on food last week?

• Illustrate attenuation bias for the case of a single explanatory variable, measured with error

- the OLS estimator is biased towards zero if the explanatory variable is measured with error

- this bias does not disappear in large samples (OLS is inconsistent)

Note that measurement error in the dependent variable does not lead to the same bias and inconsistency problems, provided the measurement error in $y_i$ is uncorrelated with the (correctly measured) $x_i$.

• Consider the model with a single explanatory variable and no intercept

$$y_i^* = x_i^* \beta + u_i \quad \text{for } i = 1, \ldots, N$$

where $y_i^*$ and $x_i^*$ denote the true values of these variables, which we may not observe.

To simplify, suppose $E(u_i) = E(x_i^*) = E(y_i^*) = 0$ for $i = 1, \ldots, N$ (the original variables may be expressed as deviations from their sample means).

We focus on large sample properties, and assume that $E(x_i^* u_i) = 0$ for $i = 1, \ldots, N$, and that we have independent observations, so that OLS would be a consistent estimator of $\beta$ if we observed the true values of $y_i^*$ and $x_i^*$.

• First consider additive, mean zero measurement error in the dependent variable only

$$y_i = y_i^* + v_i \quad \Leftrightarrow \quad y_i^* = y_i - v_i$$

$y_i$ is the observed value

$y_i^*$ is the true value

$v_i$ is the measurement error, with $E(v_i) = 0$ for $i = 1, \ldots, N$

The true values $x_i^*$ are observed

• Substituting this expression for $y_i^*$ in the true model

$$(y_i - v_i) = x_i^* \beta + u_i$$

or

$$y_i = x_i^* \beta + (u_i + v_i)$$

Consistency requires $x_i^*$ to be uncorrelated with the error term $(u_i + v_i)$.

Given $E(x_i^* u_i) = 0$, the additional requirement is that $E(x_i^* v_i) = 0$ for $i = 1, \ldots, N$.

That is, the measurement error in the dependent variable is uncorrelated with the explanatory variable.
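The consistency claim for measurement error in the dependent variable can be checked with a small simulation. This is a minimal sketch and not part of the original slides: the seed, the sample size, the true slope `beta = 2.0`, the normal distributions, and the helper `ols_slope` are all illustrative assumptions. The measurement error `v` is drawn independently of the regressor, so $E(x_i^* v_i) = 0$ holds by construction.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope for a no-intercept model: sum(x*y) / sum(x^2)."""
    return float(np.sum(x * y) / np.sum(x * x))


rng = np.random.default_rng(0)  # arbitrary seed (assumption)
N = 200_000                     # illustrative sample size
beta = 2.0                      # illustrative true slope

x_star = rng.normal(0.0, 1.0, N)  # true regressor, observed without error
u = rng.normal(0.0, 1.0, N)       # model error, independent of x_star
y_star = beta * x_star + u        # true dependent variable

v = rng.normal(0.0, 1.5, N)       # measurement error in y, independent of x_star
y_obs = y_star + v                # observed dependent variable

slope_true_y = ols_slope(x_star, y_star)   # regression using the true y
slope_noisy_y = ols_slope(x_star, y_obs)   # regression using the mismeasured y

print("slope using true y:    ", slope_true_y)
print("slope using observed y:", slope_noisy_y)
```

Both slopes should sit close to `beta`. The measurement error only inflates the variance of the composite error term $(u_i + v_i)$, which makes the estimate noisier but does not shift its probability limit.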

• Now consider additive, mean zero measurement error in the explanatory variable (only)

$$x_i = x_i^* + e_i \quad \Leftrightarrow \quad x_i^* = x_i - e_i$$

Substituting for $x_i^*$ in the true model

$$y_i = (x_i - e_i)\beta + u_i$$

or

$$y_i = x_i \beta + (u_i - \beta e_i)$$

The OLS estimator of $\beta$ here is biased and inconsistent

- for a given value of $x_i^*$, observed $x_i$ and the measurement error $e_i$ are positively correlated, which implies non-zero correlation between $x_i$ and the error term in this model, $(u_i - \beta e_i)$

• $y_i = x_i \beta + (u_i - \beta e_i)$

For $\beta > 0$, this implies a negative correlation between $x_i$ and $(u_i - \beta e_i)$

For $\beta < 0$, this implies a positive correlation between $x_i$ and $(u_i - \beta e_i)$

For $\beta > 0$, the OLS estimator of $\beta$ will be biased downwards

For $\beta < 0$, the OLS estimator of $\beta$ will be biased upwards

In either case, the OLS estimator of $\beta$ will be biased towards zero

- this is known as attenuation bias

• To analyse this further, we invoke the classical errors-in-variables assumptions (for $i = 1, \ldots, N$)

$E(x_i^* e_i) = 0$: measurement error is uncorrelated with the true value of $x_i^*$

$E(u_i e_i) = 0$: measurement error is uncorrelated with the true model error $u_i$

$V(e_i) = \sigma_e^2$: measurement error is homoskedastic

$V(x_i^*) = \sigma_x^2$: population variance of the true $x_i^*$ exists and is finite

Now

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} = \frac{\frac{1}{N}\sum_{i=1}^{N} x_i y_i}{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$$

Using $x_i = x_i^* + e_i$ and $y_i = x_i^* \beta + u_i$ together with the above assumptions, we obtain

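• 
$$
\begin{aligned}
\operatorname{plim}_{N \to \infty} \hat{\beta}_{OLS}
&= \frac{\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)(x_i^* \beta + u_i)}{\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)^2} \\[6pt]
&= \frac{\beta \left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2}\right) + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* u_i + \beta \left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i\right) + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i e_i}{\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2} + 2 \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} e_i^2} \\[6pt]
&= \frac{\beta E(x_i^{*2}) + E(x_i^* u_i) + \beta E(x_i^* e_i) + E(u_i e_i)}{E(x_i^{*2}) + 2E(x_i^* e_i) + E(e_i^2)} \\[6pt]
&= \frac{\beta E(x_i^{*2}) + 0 + 0 + 0}{E(x_i^{*2}) + 0 + E(e_i^2)} \\[6pt]
&= \beta \left(\frac{\sigma_x^2}{\sigma_x^2 + \sigma_e^2}\right) = \frac{\beta}{1 + (\sigma_e^2/\sigma_x^2)} \neq \beta \quad \text{if } \sigma_e^2 > 0
\end{aligned}
$$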

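• 
$$\operatorname{plim}_{N \to \infty} \hat{\beta}_{OLS} = \frac{\beta}{1 + (\sigma_e^2/\sigma_x^2)} < \beta \quad \text{for } \beta > 0 \text{ and } \sigma_e^2 > 0$$

$$\operatorname{plim}_{N \to \infty} \hat{\beta}_{OLS} = \frac{\beta}{1 + (\sigma_e^2/\sigma_x^2)} > \beta \quad \text{for } \beta < 0 \text{ and } \sigma_e^2 > 0$$

The OLS estimator of $\beta$ is inconsistent, with a bias towards zero that does not diminish as the sample becomes large.

For given $\sigma_x^2$, the severity of this attenuation bias increases with the variance of the measurement error ($\sigma_e^2$).

The magnitude of the inconsistency depends inversely on the signal-to-noise ratio ($\sigma_x^2/\sigma_e^2$).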
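The probability limit $\beta / (1 + \sigma_e^2/\sigma_x^2)$ can be illustrated numerically. The following is a minimal sketch under the classical errors-in-variables assumptions, not code from the original slides: the seed, sample size, `beta = 2.0`, the unit standard deviations, the normal distributions, and the helper `ols_slope` are illustrative assumptions. With $\sigma_e^2 = \sigma_x^2$, the formula implies that the estimated slope converges to half of the true slope.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope for a no-intercept model: sum(x*y) / sum(x^2)."""
    return float(np.sum(x * y) / np.sum(x * x))


rng = np.random.default_rng(1)  # arbitrary seed (assumption)
N = 200_000                     # illustrative sample size
beta = 2.0                      # illustrative true slope
sigma_x = 1.0                   # std. dev. of the true regressor x*
sigma_e = 1.0                   # std. dev. of the measurement error e

x_star = rng.normal(0.0, sigma_x, N)  # true regressor (unobserved)
u = rng.normal(0.0, 1.0, N)           # model error, independent of x_star
e = rng.normal(0.0, sigma_e, N)       # measurement error, independent of x_star and u

y = beta * x_star + u                 # dependent variable, measured without error
x_obs = x_star + e                    # observed, mismeasured regressor

slope = ols_slope(x_obs, y)                        # OLS using the observed x
plim = beta / (1.0 + sigma_e**2 / sigma_x**2)      # attenuation formula

print("OLS slope on observed x:", slope)
print("theoretical plim:       ", plim)
```

Raising `N` should not move the estimate back towards `beta`, which is the sense in which the bias does not disappear in large samples. Raising `sigma_e` relative to `sigma_x` lowers the signal-to-noise ratio and pulls the slope further towards zero.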

• Under the classical errors-in-variables assumptions with homoskedastic measurement error, the presence of measurement error affects the estimated slope parameter, but not the linearity of the relationship between $y_i$ and observed $x_i$.

With heteroskedastic measurement error, the presence of measurement error may also introduce an incorrect indication of non-linearity in the relationship.

For example, if $\beta > 0$ and $V(e_i)$ tends to be larger for individuals with higher values of $x_i^*$, then estimation of a non-linear relationship between $y_i$ and observed $x_i$ could give an incorrect indication of a concave relationship (illustrate).
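One way to illustrate the spurious concavity is to simulate a model that is exactly linear in the true regressor and then fit a quadratic in the observed regressor. This is a hedged sketch rather than the illustration used in the lecture: the uniform distribution for $x_i^*$, the variance function $V(e_i \mid x_i^*) = c(1 + x_i^*)$ with `c = 0.2`, the seed, and the helper `quadratic_coefficient` are all assumptions chosen for illustration. Whether a concave or convex pattern appears depends on the design, so the result below is specific to these choices.

```python
import numpy as np


def quadratic_coefficient(x, y):
    """Coefficient on x^2 from a least-squares fit of y on (1, x, x^2)."""
    coeffs = np.polyfit(x, y, 2)  # returns [a2, a1, a0], highest degree first
    return float(coeffs[0])


rng = np.random.default_rng(2)  # arbitrary seed (assumption)
N = 200_000                     # illustrative sample size
beta = 2.0                      # true slope; the true relationship is linear
c = 0.2                         # scale of the measurement error variance

x_star = rng.uniform(-1.0, 1.0, N)  # true regressor, mean zero
u = rng.normal(0.0, 1.0, N)         # model error
y = beta * x_star + u               # linear in the true regressor

# Heteroskedastic error: variance c * (1 + x_star) rises with the true x.
e_het = rng.normal(0.0, 1.0, N) * np.sqrt(c * (1.0 + x_star))
# Homoskedastic error with the same average variance c, for comparison.
e_hom = rng.normal(0.0, np.sqrt(c), N)

quad_hetero = quadratic_coefficient(x_star + e_het, y)
quad_homo = quadratic_coefficient(x_star + e_hom, y)

print("x^2 coefficient, heteroskedastic error:", quad_hetero)
print("x^2 coefficient, homoskedastic error:  ", quad_homo)
```

In this design the quadratic coefficient is clearly negative under heteroskedastic measurement error and close to zero under homoskedastic error, even though the true model contains no squared term.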

• Multiple regression with errors in variables

$$y_i = x_i^* \beta + u_i$$

$$x_i = x_i^* + e_i$$

where $x_i$, $x_i^*$ and $e_i$ are $1 \times K$ vectors

As before

$$y_i = x_i \beta + (u_i - e_i \beta)$$

In general, the OLS estimator of the $K \times 1$ vector of parameters $\beta$ will be biased and inconsistent, since $E[x_i'(u_i - e_i \beta)] \neq 0$

• If only one of the explanatory variables in $x_i$ is measured with error, we can show that

- the OLS estimator of the coefficient on that variable is biased towards zero

- the OLS estimators of the coefficients on the other explanatory variables are also biased, in unknown directions

If several explanatory variables are measured with error, it is very difficult to sign the biases for any of the coefficients.
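The case of a single mismeasured regressor alongside a correctly measured one can be sketched with two explanatory variables. This is an illustrative simulation, not taken from the slides: the correlation `rho = 0.5` between the two true regressors, the coefficients, the unit variances, the seed, and all variable names are assumptions. The code compares the OLS estimates with the probability limit obtained by solving the population normal equations for this particular design.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed (assumption)
N = 200_000                     # illustrative sample size
beta1, beta2 = 1.0, 1.0         # illustrative true coefficients
rho = 0.5                       # correlation between the two true regressors
sigma_e = 1.0                   # std. dev. of the measurement error in x1

# Two mean-zero, unit-variance regressors with correlation rho.
x2 = rng.normal(0.0, 1.0, N)
x1_star = rho * x2 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, N)

u = rng.normal(0.0, 1.0, N)
y = beta1 * x1_star + beta2 * x2 + u

# Only x1 is measured with error; x2 is observed correctly.
x1_obs = x1_star + rng.normal(0.0, sigma_e, N)

# OLS of y on (x1_obs, x2), no intercept since all variables are mean zero.
X = np.column_stack([x1_obs, x2])
b_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Population moments E(x'x) and E(x'y) implied by this design.
M_xx = np.array([[1.0 + sigma_e**2, rho],
                 [rho, 1.0]])
M_xy = np.array([beta1 + beta2 * rho,
                 beta1 * rho + beta2])
b_plim = np.linalg.solve(M_xx, M_xy)

print("OLS estimates:     ", b_hat)
print("probability limits:", b_plim)
```

Under these assumptions the coefficient on the mismeasured variable is pulled towards zero, while the coefficient on the correctly measured variable is pushed away from its true value. The direction of the second bias depends on `rho` and `beta1`, which is why it cannot be signed in general.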

• Omitted variables

Another common concern in applied econometrics is that relevant explanatory variables may be omitted from the model.

Relevant explanatory variables are often unobserved or unobservable.

Example

- survey data on individuals do not contain data on characteristics like ability or motivation

This may make it difficult to attach causal significance to estimated parameters in linear regression-type models.

• Illustrate omitted variable bias for the case of a single included variable and a single omitted variable

- the OLS estimator is biased if the omitted variable is relevant and correlated with the included regressor

- this bias does not disappear in large samples (OLS is inconsistent)

- the direction of the bias depends on the sign of the correlation between the included variable and the omitted variable

• Consequently, omitted variables (or unobserved heterogeneity) present a formidable challenge to drawing causal inferences from cross-section regressions.

There is a serious danger that observed, included explanatory variables may just be proxying for unobserved, omitted factors, rather than exerting a direct, causal influence on the outcome of interest.

• Note that this problem is not confined to empirical research in economics.

Beware of medical studies claiming that some activity will help you live longer.

These claims are often based on cross-section correlations.

It is difficult to draw causal conclusions unless we are confident that the study has controlled for all potentially relevant confounding factors.

• We first consider the model with one included variable ($x_{1i}$) and one omitted variable ($x_{2i}$)

The true model is

$$y_i = x_{1i}\beta_1 + x_{2i}\beta_2 + u_i \quad \text{for } i = 1, 2, \ldots, N$$

satisfying $E(u_i) = E(x_{1i}) = E(x_{2i}) = 0$ and $E(x_{1i}u_i) = E(x_{2i}u_i) = 0$

However the model we estimate excludes $x_{2i}$

$$y_i = x_{1i}\beta_1 + (u_i + x_{2i}\beta_2) \quad \text{for } i = 1, 2, \ldots, N$$

Illustration suggests that the OLS estimator $\hat{\beta}_1$ in the estimated model will be a biased and inconsistent estimator of $\beta_1$ in the true model, in cases where $x_{2i}$ and $x_{1i}$ are correlated, and where $\beta_2 \neq 0$
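The effect of excluding $x_{2i}$ can be previewed with a simulation of the two-variable model. This is a minimal sketch under assumed values, not part of the original slides: the coefficients, the unit variances, the seed, and the helper `short_regression_slope` are hypothetical. The comparison value uses the standard omitted variable result for this setting, $\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \, \mathrm{Cov}(x_{1i}, x_{2i}) / V(x_{1i})$, which reduces to $\beta_1 + \beta_2 \rho$ when both regressors have unit variance and correlation $\rho$.

```python
import numpy as np


def short_regression_slope(rho, beta1, beta2, n, rng):
    """Simulate the true two-variable model, then regress y on x1 only."""
    x1 = rng.normal(0.0, 1.0, n)
    # x2 has unit variance and correlation rho with x1.
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)
    u = rng.normal(0.0, 1.0, n)
    y = beta1 * x1 + beta2 * x2 + u
    # No-intercept OLS of y on x1 alone (x2 is omitted).
    return float(np.sum(x1 * y) / np.sum(x1 * x1))


rng = np.random.default_rng(4)  # arbitrary seed (assumption)
N = 200_000                     # illustrative sample size
beta1, beta2 = 1.0, 2.0         # illustrative true coefficients

slope_correlated = short_regression_slope(0.5, beta1, beta2, N, rng)
slope_uncorrelated = short_regression_slope(0.0, beta1, beta2, N, rng)

print("rho = 0.5:", slope_correlated, "vs plim", beta1 + beta2 * 0.5)
print("rho = 0.0:", slope_uncorrelated, "vs plim", beta1 + beta2 * 0.0)
```

With correlated regressors the estimated coefficient on `x1` absorbs part of the effect of the omitted variable. When the correlation is zero, or when `beta2` is set to zero, the short regression recovers `beta1`.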

• Stack across the N observations to obtain

y = X