Endogenous explanatory variables

Violation of the assumption that Cov(xi, ui) = 0 has serious consequences

for the OLS estimator

This is one of the key assumptions needed to establish consistency

When one or more of the explanatory variables is correlated with the error

term ui, we have both E(ui|xi) ≠ 0 and E(xiui) ≠ 0, so the OLS estimator

will be both biased and inconsistent

We will consider two situations where this occurs:

- a linear model with Cov(xi, ui) = 0 is the correct specification, but one

or more of the explanatory variables is measured with error

- a linear model with Cov(xi, ui) = 0 is the correct specification, but one or

more of the explanatory variables is not measured at all, and hence omitted

from the model we can estimate

These are simply two examples of cases where we have simultaneity or

endogeneity, i.e. one or more of the explanatory variables is correlated

with the error term

Measurement error/errors-in-variables

A common concern in applied econometrics is that relevant explanatory

variables may be poorly measured

Examples - survey data on households:

- recall bias: how much time did you spend unemployed last year?

- rounding bias: how much money did you spend on food last week?

Illustrate ‘attenuation bias’ for the case of a single explanatory variable,

measured with error

- the OLS estimator is biased towards zero if the explanatory variable is

measured with error

- this bias does not disappear in large samples (OLS is inconsistent)

Note that measurement error in the dependent variable does not lead to

the same bias and inconsistency problems, provided the measurement error

in yi is uncorrelated with (correctly measured) xi

Consider the model with a single explanatory variable and no intercept

y∗i = x∗iβ + ui for i = 1, ..., N

where y∗i and x∗i denote the true values of these variables, that we may not

observe

To simplify, suppose E(ui) = E(x∗i ) = E(y∗i ) = 0 for i = 1, ..., N (original

variables may be expressed as deviations from their sample means)

We focus on large sample properties, and assume that E(x∗iui) = 0 for

i = 1, ..., N , and we have independent observations, so that β̂OLS would be

a consistent estimator of β if we observed the true values of y∗i and x∗i

First consider additive, mean zero measurement error in the dependent

variable only

yi = y∗i + vi ↔ y∗i = yi − vi

yi is the observed value

y∗i is the true value

vi is the measurement error, with E(vi) = 0 for i = 1, ..., N

The true values x∗i are observed

Substituting this expression for y∗i in the true model

(yi − vi) = x∗iβ + ui

or yi = x∗iβ + (ui + vi)

Consistency requires x∗i to be uncorrelated with the error term (ui + vi)

Given E(x∗iui) = 0, the additional requirement is that E(x∗ivi) = 0 for

i = 1, ..., N

That is, the measurement error in the dependent variable is uncorrelated

with the explanatory variable
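
This consistency result can be checked with a small simulation. The sketch below is illustrative only: the parameter values, the normal distributions and the variable names are assumptions, not part of the original notes. It adds mean zero measurement error to the dependent variable, drawn independently of the regressor so that E(x∗ivi) = 0, and computes the no-intercept OLS slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 2.0                 # assumed sample size and true slope
x_true = rng.normal(size=n)            # correctly measured regressor x*
u = rng.normal(size=n)                 # model error
v = rng.normal(scale=2.0, size=n)      # measurement error in y, independent of x*
y_obs = beta * x_true + u + v          # observed dependent variable

# OLS without intercept: sum(x*y) / sum(x^2)
slope_y_error = (x_true @ y_obs) / (x_true @ x_true)
print(f"slope with noisy y: {slope_y_error:.3f} (true beta = {beta})")
```

The extra noise makes the estimate less precise, but in a large sample the slope should remain close to the true β.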

Now consider additive, mean zero measurement error in the explanatory

variable (only)

xi = x∗i + ei ↔ x∗i = xi − ei

Substituting for x∗i in the true model

y∗i = (xi − ei)β + ui

or y∗i = xiβ + (ui − eiβ)

The OLS estimator of β here is biased and inconsistent

- for a given value of x∗i, observed xi and the measurement error ei are positively correlated, which implies non-zero correlation between xi and the error term in this model (ui − eiβ)

y∗i = xiβ + (ui − eiβ)

For β > 0, this implies a negative correlation between xi and (ui − eiβ)

For β < 0, this implies a positive correlation between xi and (ui − eiβ)

For β > 0, the OLS estimator of β will be biased downwards

For β < 0, the OLS estimator of β will be biased upwards

In either case, the OLS estimator of β will be biased towards zero

- this is known as ‘attenuation bias’
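
The sign of this correlation can be made explicit with a short calculation. It is a sketch that assumes all variables have mean zero, that E(x∗iui) = 0, and that the measurement error ei is uncorrelated with both x∗i and ui and has variance σ²e (these are the classical errors-in-variables assumptions introduced in the next step of the notes).

```latex
\operatorname{Cov}(x_i,\; u_i - e_i\beta)
  = E\big[(x_i^* + e_i)(u_i - e_i\beta)\big]
  = E(x_i^* u_i) - \beta E(x_i^* e_i) + E(e_i u_i) - \beta E(e_i^2)
  = -\beta\,\sigma_e^2
```

Under those assumptions the covariance has the opposite sign to β, which is why the bias always points towards zero.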

To analyse this further, we invoke the classical errors-in-variables assumptions (for i = 1, ..., N)

$E(x_i^* e_i) = 0$: measurement error is uncorrelated with the true value of $x_i^*$

$E(u_i e_i) = 0$: measurement error is uncorrelated with the true model error $u_i$

$V(e_i) = \sigma_e^2$: measurement error is homoskedastic

$V(x_i^*) = \sigma_{x^*}^2$: population variance of the true $x_i^*$ exists and is finite

Now

$$
\hat\beta_{OLS} = (X'X)^{-1}X'y^*
= \frac{\sum_{i=1}^{N} x_i y_i^*}{\sum_{i=1}^{N} x_i^2}
= \frac{\frac{1}{N}\sum_{i=1}^{N} x_i y_i^*}{\frac{1}{N}\sum_{i=1}^{N} x_i^2}
$$

Using xi = x∗i + ei and y∗i = x∗iβ + ui together with the above assumptions, we obtain

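$$
\begin{aligned}
\operatorname*{plim}_{N\to\infty} \hat\beta_{OLS}
&= \frac{\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)(x_i^*\beta + u_i)}
        {\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)^2} \\[6pt]
&= \frac{\left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2}\right)\beta
         + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* u_i
         + \left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i\right)\beta
         + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i e_i}
        {\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2}
         + 2\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i
         + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} e_i^2} \\[6pt]
&= \frac{E(x_i^{*2})\beta + E(x_i^* u_i) + E(x_i^* e_i)\beta + E(u_i e_i)}
        {E(x_i^{*2}) + 2E(x_i^* e_i) + E(e_i^2)} \\[6pt]
&= \frac{E(x_i^{*2})\beta + 0 + 0 + 0}{E(x_i^{*2}) + 0 + E(e_i^2)} \\[6pt]
&= \left(\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_e^2}\right)\beta
 = \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} \neq \beta \quad \text{if } \sigma_e^2 > 0
\end{aligned}
$$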

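$$
\operatorname*{plim}_{N\to\infty} \hat\beta_{OLS} = \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} < \beta \quad \text{for } \beta > 0 \text{ and } \sigma_e^2 > 0
$$

$$
\operatorname*{plim}_{N\to\infty} \hat\beta_{OLS} = \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} > \beta \quad \text{for } \beta < 0 \text{ and } \sigma_e^2 > 0
$$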

The OLS estimator of β is inconsistent, with a bias towards zero that does

not diminish as the sample becomes large

For given $\sigma_{x^*}^2$, the severity of this ‘attenuation bias’ increases with the variance of the measurement error ($\sigma_e^2$)

The magnitude of the inconsistency depends inversely on the ‘signal-to-noise’ ratio ($\sigma_{x^*}^2/\sigma_e^2$)
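
The attenuation result can be illustrated by simulation. The following is a minimal sketch, not part of the original notes: the function name `attenuation_demo`, the parameter values and the normal distributions are all assumptions chosen for illustration. It generates data satisfying the classical errors-in-variables assumptions and compares the no-intercept OLS slope on the mismeasured regressor with the probability limit β/(1 + σ²e/σ²x∗).

```python
import numpy as np

def attenuation_demo(beta=2.0, sigma_x=1.0, sigma_e=0.5, n=200_000, seed=0):
    """Return (OLS slope using the mismeasured x, theoretical probability limit)."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, sigma_x, n)      # true regressor x*
    u = rng.normal(0.0, 1.0, n)               # model error, independent of x*
    e = rng.normal(0.0, sigma_e, n)           # classical measurement error
    y = beta * x_true + u                     # true model, no intercept
    x_obs = x_true + e                        # observed regressor
    beta_hat = (x_obs @ y) / (x_obs @ x_obs)  # OLS without intercept
    plim = beta / (1.0 + sigma_e**2 / sigma_x**2)
    return beta_hat, plim

beta_hat, plim = attenuation_demo()
print(f"OLS slope: {beta_hat:.3f}, theoretical plim: {plim:.3f}")
```

Raising `sigma_e` relative to `sigma_x` lowers the signal-to-noise ratio and should pull the estimate further towards zero, while increasing `n` does not remove the gap.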

Under the classical errors-in-variables assumptions with homoskedastic measurement error, the presence of measurement error affects the estimated slope

parameter, but not the linearity of the relationship between y∗i and observed

xi

With heteroskedastic measurement error, the presence of measurement error may also introduce an incorrect indication of non-linearity in the relationship

For example, if β > 0 and V (ei) tends to be larger for individuals with

higher values of x∗i , then estimation of a non-linear relationship between y∗i

and observed xi could give an incorrect indication of a concave relationship

(illustrate)
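
One way to illustrate this is a simulation in which the true relationship is exactly linear but the measurement error variance rises with x∗i. The design below is an assumption made for illustration (uniform x∗ on [0, 10], measurement error with standard deviation 0.3·x∗, β = 1); none of these values come from the notes. A quadratic in the observed xi is then fitted by OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 200_000, 1.0
x_true = rng.uniform(0.0, 10.0, n)             # true regressor x*
u = rng.normal(size=n)                         # model error
y = beta * x_true + u                          # exactly linear in x*
e = 0.3 * x_true * rng.normal(size=n)          # error s.d. rises with x*
x_obs = x_true + e                             # observed regressor

# Fit y = a + b*x + c*x^2 using the observed (mismeasured) regressor.
design = np.column_stack([np.ones(n), x_obs, x_obs**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
quad_coef = coef[2]
print(f"coefficient on x_obs^2: {quad_coef:.4f}")
```

In this design the coefficient on the squared term comes out negative, suggesting concavity even though the relationship between y∗i and x∗i is linear by construction.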

Multiple regression with errors in variables

$$y_i^* = x_i^{*\prime}\beta + u_i$$

$$x_i' = x_i^{*\prime} + e_i'$$

where $x_i'$, $x_i^{*\prime}$ and $e_i'$ are $1 \times K$ vectors

As before

y∗i = x′iβ + (ui − e′iβ)

In general, the OLS estimator of the K × 1 vector of parameters β will be

biased and inconsistent, since E[xi(ui − e′iβ)] ≠ 0

If only one of the explanatory variables in xi is measured with error, we

can show that

- the OLS estimator of the coefficient on that variable is biased towards zero

- the OLS estimators of the coefficients on the other explanatory variables are also biased, in unknown directions

If several explanatory variables are measured with error, it is very difficult to sign the biases for any of the coefficients
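
The case of a single mismeasured regressor can be sketched numerically. The data generating process below is hypothetical (two regressors with true coefficients of 1, correlation induced by x1∗ = 0.5·x2 + noise, unit-variance measurement error on x1 only) and is meant only to show the pattern described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x2 = rng.normal(size=n)                    # correctly measured regressor
x1_true = 0.5 * x2 + rng.normal(size=n)    # true x1*, correlated with x2
u = rng.normal(size=n)                     # model error
y = 1.0 * x1_true + 1.0 * x2 + u           # true coefficients are (1, 1)
x1_obs = x1_true + rng.normal(size=n)      # x1 observed with classical error

# Regress y on the mismeasured x1 and the correctly measured x2 (no intercept).
X = np.column_stack([x1_obs, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"coefficient on mismeasured x1: {b[0]:.3f}")
print(f"coefficient on correctly measured x2: {b[1]:.3f}")
```

For these particular assumed values, solving the population normal equations gives probability limits of 0.5 for the mismeasured variable (attenuated) and 1.25 for the correctly measured one (biased away from its true value of 1).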

Omitted variables

Another common concern in applied econometrics is that relevant explanatory variables may be omitted from the model

Relevant explanatory variables are often unobserved or unobservable

Example

- survey data on individuals do not contain data on characteristics like

ability or motivation

This may make it difficult to attach causal significance to estimated parameters in linear regression-type models

Illustrate omitted variable bias for the case of a single included variable

and a single omitted variable

- the OLS estimator is biased if the omitted variable is relevant and correlated with the included regressor

- this bias does not disappear in large samples (OLS is inconsistent)

- the direction of the bias depends on the sign of the correlation between the included variable and the omitted variable, together with the sign of the omitted variable's coefficient

Consequently omitted variables - or ‘unobserved heterogeneity’ - present a formidable challenge to drawing causal inferences from cross-section regressions

There is a serious danger that observed, included explanatory variables

may just be proxying for unobserved, omitted factors - rather than exerting

a direct, causal influence on the outcome of interest

Note that this problem is not confined to empirical research in economics

Beware of medical studies claiming that some activity will help you live

longer

These claims are often based on cross-section correlations

It is difficult to draw causal conclusions unless we are confident that the

study has controlled for all potentially relevant confounding factors

We first consider the model with one included variable (x1i) and one omitted variable (x2i)

The true model is

yi = x1iβ1 + x2iβ2 + ui for i = 1, 2, ..., N

satisfying E(ui) = E(x1i) = E(x2i) = 0 and E(x1iui) = E(x2iui) = 0

However the model we estimate excludes x2i

yi = x1iβ1 + (ui + x2iβ2) for i = 1, 2, ..., N

Illustration suggests that the OLS estimator β̂1 in the estimated model

will be a biased and inconsistent estimator of β1 in the true model, in cases

where x2i and x1i are correlated, and where β2 ≠ 0

Stack across the N observations to obtain

y = X1β1 + (u + X2β2) (all vectors are N × 1)

The OLS estimator of β1 is

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y$$

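Substituting for y = X1β1 + X2β2 + u from the true model

$$
\begin{aligned}
\hat\beta_1 &= (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + u) \\
            &= \beta_1 + \left[(X_1'X_1)^{-1}X_1'X_2\right]\beta_2 + (X_1'X_1)^{-1}X_1'u \\
            &= \beta_1 + \hat\delta\beta_2 + (X_1'X_1)^{-1}X_1'u
\end{aligned}
$$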

where $\hat\delta = (X_1'X_1)^{-1}X_1'X_2$ is the OLS estimator of the coefficient in a regression of the omitted variable x2i on the included variable x1i, i.e.

x2i = x1iδ + ei
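
The decomposition of β̂1 into β1, δ̂β2 and the term in X′1u is an exact algebraic identity in any sample, which can be verified numerically. The sketch below uses made-up values (β1 = 1, β2 = 2, N = 500, and x2 generated as 0.6·x1 plus noise); these are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta1, beta2 = 500, 1.0, 2.0
x1 = rng.normal(size=n)                    # included variable
x2 = 0.6 * x1 + rng.normal(size=n)         # omitted variable, correlated with x1
u = rng.normal(size=n)                     # model error
y = beta1 * x1 + beta2 * x2 + u            # true model

beta1_hat = (x1 @ y) / (x1 @ x1)           # regression of y on x1 only
delta_hat = (x1 @ x2) / (x1 @ x1)          # regression of x2 on x1
noise_term = (x1 @ u) / (x1 @ x1)          # (X1'X1)^{-1} X1'u

# Right-hand side of the decomposition: beta1 + delta_hat*beta2 + noise term
decomposition = beta1 + delta_hat * beta2 + noise_term
print(beta1_hat, decomposition)
```

The two printed numbers should agree up to floating-point rounding; only the last term shrinks as N grows, leaving δ̂β2 as the source of inconsistency.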

Taking probability limits, and using E(x1iui) = 0, we obtain

$$\operatorname*{plim}_{N\to\infty} \hat\beta_1 = \beta_1 + \Big(\operatorname*{plim}_{N\to\infty} \hat\delta\Big)\beta_2$$

The OLS estimator of β1 in the model that omits x2i is inconsistent unless

- either $\operatorname{plim}_{N\to\infty} \hat\delta = 0$ (the omitted variable is orthogonal to the included variable)

- or β2 = 0 (the omitted variable is not a relevant explanatory variable in the true model)

$$\operatorname*{plim}_{N\to\infty} \hat\beta_1 = \beta_1 + \Big(\operatorname*{plim}_{N\to\infty} \hat\delta\Big)\beta_2$$

Thus if we omit a relevant explanatory variable (β2 ≠ 0), the only case in which β̂1 remains a consistent estimator of the true, causal parameter β1 is the case where $\operatorname{plim}_{N\to\infty} \hat\delta = 0$, i.e. where x1i and x2i are uncorrelated

If x1i and x2i are positively correlated, we have $\operatorname{plim}_{N\to\infty} \hat\delta > 0$

If β2 is also positive, we expect an upward bias in the OLS estimator β̂1

Intuitively, the OLS estimator β̂1 picks up an indirect relationship between

x1i and yi, due to the fact that the included x1i proxies for the omitted x2i,

as well as the direct causal effect of x1i on yi at a given level of x2i, measured

by β1 in the true model (cf. illustration)

Conversely if x1i and x2i are negatively correlated ($\operatorname{plim}_{N\to\infty} \hat\delta < 0$) and β2 > 0,

or if x1i and x2i are positively correlated ($\operatorname{plim}_{N\to\infty} \hat\delta > 0$) and β2 < 0,

we expect a downward bias in the OLS estimator β̂1

Thus if regression models omit relevant (but perhaps unmeasured) explanatory factors that are correlated with the included (measured) regressors, we

cannot draw causal inferences from the pattern of partial correlations among

the observed variables
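
The direction of the bias can be illustrated with a simulation. The helper `short_regression_slope` and all parameter values (β1 = 1, β2 = 2, standard normal draws) are hypothetical choices for this sketch. It estimates the regression that omits x2 for a negative, zero and positive value of δ, and compares the estimate with β1 + δβ2.

```python
import numpy as np

def short_regression_slope(delta, beta1=1.0, beta2=2.0, n=200_000, seed=5):
    """OLS slope of y on x1 alone when x2 = delta*x1 + noise is omitted."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                            # included variable
    x2 = delta * x1 + rng.normal(size=n)               # omitted variable
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)   # true model
    return (x1 @ y) / (x1 @ x1)                        # no-intercept OLS

for delta in (-0.5, 0.0, 0.5):
    est = short_regression_slope(delta)
    target = 1.0 + delta * 2.0                         # beta1 + delta*beta2
    print(f"delta = {delta:+.1f}: estimate {est:.3f}, beta1 + delta*beta2 = {target:.3f}")
```

With β2 > 0, a negative δ should give a downward bias, a positive δ an upward bias, and δ = 0 an estimate close to the true β1.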

Some examples

- do individuals with lots of education tend to have high earnings because

education raises their productivity and wages, or because intrinsically high

ability (high productivity) individuals also tend to have lots of education?

- do countries with high investment tend to have high per capita income

levels because investment raises income, or because (for example) countries

with good institutions tend to have both high investment and high incomes?

In the first example, causality may run from ability to both education and

earnings, rather than from education to earnings

In the second example, causality may run from institutions to both investment and incomes, rather than from investment to incomes

Since it is very difficult to control adequately for individual ability or the quality of national institutions, we should be very cautious about drawing any causal inference from statistically significant coefficients reported in such cross-section regression studies

Multiple regression with omitted variables

The analysis proceeds in a similar way

y = X1β1 + (X2β2 + u)

where now X1 is N × K1, β1 is K1 × 1, X2 is N × K2 and β2 is K2 × 1 (i.e.

there are K1 included regressors and K2 omitted variables)

As in the simpler example we can obtain

$$\operatorname*{plim}_{N\to\infty} \hat\beta_1 = \beta_1 + \Big(\operatorname*{plim}_{N\to\infty} (X_1'X_1)^{-1}X_1'X_2\Big)\beta_2$$

Each column of the K1 × K2 matrix $(X_1'X_1)^{-1}X_1'X_2$ is the K1 × 1 vector of OLS estimates of the coefficients in a multiple regression of the corresponding column of X2 on all of the included variables in X1
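
The column-by-column interpretation of this matrix can be checked directly. In the sketch below the dimensions (N = 1000, K1 = 3, K2 = 2) and the randomly generated data are arbitrary assumptions; the point is only that the matrix computed in one step matches separate multiple regressions of each omitted variable on the included ones.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k1, k2 = 1_000, 3, 2
X1 = rng.normal(size=(n, k1))                                   # included regressors
X2 = X1 @ rng.normal(size=(k1, k2)) + rng.normal(size=(n, k2))  # omitted variables

# K1 x K2 matrix (X1'X1)^{-1} X1'X2, computed in one step.
delta_matrix = np.linalg.solve(X1.T @ X1, X1.T @ X2)

# The same coefficients, regressing one omitted variable at a time on X1.
columns = [np.linalg.lstsq(X1, X2[:, j], rcond=None)[0] for j in range(k2)]
delta_by_column = np.column_stack(columns)
print(np.allclose(delta_matrix, delta_by_column))
```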

The general point is that it is harder to be confident about the direction of the biases expected in the estimated β1 coefficients

If there is only one omitted variable (K2 = 1), which is correlated with several of the included explanatory variables, we can show that the bias in each of the estimated coefficients on the included regressors will depend on their partial correlation with the omitted variable, not on the simple correlation between the included regressor and the omitted variable

i.e. the direction of the biases depends on the sign of the coefficients in a multiple regression of the omitted variable on all the included regressors jointly

- not on the sign of the coefficients in a set of simple regressions relating the omitted variable to each of the included regressors individually

Single omitted variable

$$y_i = \beta_1 + \beta_2 x_{2i} + \dots + \beta_{K-1} x_{K-1,i} + (\beta_K x_{Ki} + u_i)$$

where E(xkiui) = 0 for k = 1, ..., K (and x1i = 1 for all i = 1, ..., N)

[Relation to general model: K1 = K − 1, K2 = 1]

Linear projection of omitted xKi on the included variables

$$x_{Ki} = \delta_1 + \delta_2 x_{2i} + \dots + \delta_{K-1} x_{K-1,i} + v_i$$

s.t. E(xkivi) = 0 for k = 1, ..., K − 1 (by definition of linear projection)

Substitute

$$y_i = (\beta_1 + \beta_K\delta_1) + (\beta_2 + \beta_K\delta_2)x_{2i} + \dots + (\beta_{K-1} + \beta_K\delta_{K-1})x_{K-1,i} + (u_i + \beta_K v_i)$$

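Now since E(xki(ui + βKvi)) = 0 for k = 1, ..., K − 1, we have

$$\operatorname*{plim}_{N\to\infty} \hat\beta_k = \beta_k + \beta_K\delta_k \quad \text{for } k = 1, \dots, K-1$$

(Or equivalently $\operatorname{plim}_{N\to\infty} \hat\beta_k = \beta_k + (\operatorname{plim}_{N\to\infty} \hat\delta_k)\beta_K$)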

The inconsistency thus depends on the sign of the partial correlations (reflected in the sign of the δk coefficients)

- not on the sign of the simple correlations between each included xki and

the omitted variable xKi
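
The distinction between partial and simple correlations can be made concrete with a constructed example. Everything in the sketch below is an assumption chosen for illustration: K = 4, all true coefficients equal to 1, and an omitted x4 built as 2·x3 − 0.5·x2 plus noise, so that δ2 = −0.5 even though x2 and x4 are positively correlated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x3 = rng.normal(size=n)                            # included regressor
x2 = x3 + rng.normal(size=n)                       # included, correlated with x3
x4 = 2.0 * x3 - 0.5 * x2 + rng.normal(size=n)      # omitted variable (delta2 = -0.5)
y = 1.0 + 1.0 * x2 + 1.0 * x3 + 1.0 * x4 + rng.normal(size=n)

simple_corr = np.corrcoef(x2, x4)[0, 1]            # simple correlation of x2 with x4
X_short = np.column_stack([np.ones(n), x2, x3])    # regression that omits x4
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(f"simple corr(x2, x4): {simple_corr:.3f}")
print(f"coefficient on x2 when x4 is omitted: {b_short[1]:.3f} (true value 1.0)")
```

Here the simple correlation between x2 and the omitted variable is positive, yet the coefficient on x2 is biased downwards (towards β2 + βKδ2 = 0.5), because the sign of the bias follows the partial coefficient δ2.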

Multiple omitted variables

If there are several omitted variables, it is very difficult to predict the

direction of the biases

But the OLS estimator β̂1 is a biased and inconsistent estimator of β1 in

the true model, except in the special case where all of the omitted variables

are orthogonal to all of the included variables

Simultaneity bias

Note that both omitted variables and measurement error (in the explanatory variable(s)) result in correlation between the included explanatory variable(s) and the error term in the estimated model

These are both examples of the more general phenomenon of ‘simultaneity’

or ‘endogeneity’ - sources of correlation between the explanatory variable(s)

and the error term, such that the OLS estimator is biased and inconsistent

This can also arise naturally in situations where the dependent variable

and at least one of the explanatory variables are chosen jointly as part of

the same decision problem

Example:

- firms choosing inputs and output jointly in models of production, where

the error term includes unobserved (total factor) productivity

- high productivity firms are likely to be larger, using more inputs, as well

as producing more output from given inputs

- expect OLS estimates of coefficients on the inputs to be biased and inconsistent in production functions (Marschak & Andrews, Econometrica, 1944)
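
A stylised simulation can illustrate the production function example. This is only a sketch under strong assumptions that are not in the notes: a single input in logs, a true input coefficient of 0.6, and a hypothetical decision rule in which the firm's input choice equals its unobserved productivity plus independent noise. It is not the Marschak and Andrews model itself.

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 200_000, 0.6
omega = rng.normal(scale=0.5, size=n)     # unobserved log productivity
labour = omega + rng.normal(size=n)       # hypothetical input rule: rises with productivity
output = beta * labour + omega + rng.normal(scale=0.2, size=n)  # log output

# All variables have mean zero by construction, so no intercept is included.
beta_ols = (labour @ output) / (labour @ labour)
print(f"OLS input coefficient: {beta_ols:.3f} (true beta = {beta})")
```

Because productivity enters both the input choice and the error term, the OLS coefficient on the input should overstate the true β in this design, and the gap should not shrink as the sample grows.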