EC403 U3 Random Regressors and Moment Based Estimation (1)
EC403 Applied Econometrics
Unit 3 (Chapter 10 in Text)
Random Regressors and Moment-Based Estimation
Learning Objectives / Chapter Contents
• 10.1 Linear Regression with Random x’s
• 10.2 Cases in Which x and e are Correlated
• 10.3 Estimators Based on the Method of Moments
• 10.4 Specification Tests
10.1 Linear Regression with Random x’s
• One classical assumption:
If E(e|x) = 0, then we can show that it is also true that x and e are uncorrelated, and that cov(x, e) = 0. Explanatory variables that are not correlated with the error term are called exogenous variables.
• Relaxation of the above assumption:
Conversely, if x and e are correlated, then cov(x, e) ≠ 0 and we can show that E(e|x) ≠ 0. Explanatory variables that are correlated with the error term are called endogenous variables.
If assumption A10.3* is not true, and in particular if cov(x, e) ≠ 0 so that x and e are correlated, then the least squares estimators are inconsistent.
– They do not converge to the true parameter values even in very large samples.
10.1.2 Large Sample Properties of the Least Squares Estimators
– None of our usual hypothesis testing or interval estimation procedures are valid.
10.1.3 Why Least Squares Estimation Fails
FIGURE 10.1 (a) Correlated x and e
FIGURE 10.1 (b) Plot of data, true and fitted regression functions
The statistical consequence of correlation between x and e is that the least squares estimator is biased, and this bias will not disappear no matter how large the sample.
– Consequently, the least squares estimator is inconsistent when there is correlation between x and e.
When an explanatory variable and the error term are correlated, the explanatory variable is said to be endogenous.
10.2 Cases in Which x and e are Correlated
– The term "endogenous" comes from simultaneous equations models.
• It means ‘‘determined within the system’’.
– Using this terminology, when an explanatory variable is correlated with the regression error, one is said to have an ‘‘endogeneity problem’’.
Case 1: Measurement Error
10.2.1 Measurement Error
The errors-in-variables problem occurs when an explanatory variable is measured with error.
– If we measure an explanatory variable with error, then it is correlated with the error term, and the least squares estimator is inconsistent.
Let y = annual savings and x* = the permanent annual income of a person.
– A simple regression model is:

y_i = β1 + β2 x*_i + v_i    (Eq. 10.1)

– Current income is a measure of permanent income, but it does not measure permanent income exactly.
• It is sometimes called a proxy variable.
• To capture this feature, specify that:

x_i = x*_i + u_i    (Eq. 10.2)
Substituting Eq. 10.2 into Eq. 10.1:

y_i = β1 + β2 x*_i + v_i
    = β1 + β2 (x_i − u_i) + v_i
    = β1 + β2 x_i + (v_i − β2 u_i)
    = β1 + β2 x_i + e_i    (Eq. 10.3)
In order to estimate Eq. 10.3 by least squares, we must determine whether or not x is uncorrelated with the random disturbance e.
– The covariance between these two random variables, using the fact that E(e) = 0, is:

cov(x, e) = E(xe) = E[(x* + u)(v − β2 u)] = −β2 E(u²) = −β2 σ²_u ≠ 0    (Eq. 10.4)
The least squares estimator b2 is an inconsistent estimator of β2 because of the correlation between the explanatory variable and the error term.
– Consequently, b2 does not converge to β2 in large samples.
– In large or small samples, b2 is not approximately normal with mean β2 and variance var(b2) = σ² / Σ(x_i − x̄)².
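The inconsistency implied by Eq. 10.4 can be seen in a small simulation — a hypothetical sketch using NumPy, with illustrative variance choices. With var(x*) = var(u) = 1, the OLS slope converges to β2 · var(x*)/[var(x*) + var(u)] = β2/2, not to β2:

```python
import numpy as np

# Hypothetical simulation of Eqs. 10.1-10.4: OLS under measurement error.
rng = np.random.default_rng(42)
N = 100_000
beta1, beta2 = 1.0, 0.5

x_star = rng.normal(0.0, 1.0, N)      # true (permanent) regressor, var = 1
v = rng.normal(0.0, 1.0, N)           # regression error
y = beta1 + beta2 * x_star + v        # Eq. 10.1
u = rng.normal(0.0, 1.0, N)           # measurement error, var = 1
x = x_star + u                        # Eq. 10.2: observed (current) regressor

# OLS slope computed from the mismeasured x
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(b2)  # near 0.25 = beta2 * var(x*)/(var(x*) + var(u)), not 0.5
```

Increasing N makes b2 settle ever closer to 0.25, illustrating that the bias does not disappear in large samples.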
Case 2: Simultaneous Equations
10.2.2 Simultaneous Equations Bias
Another situation in which an explanatory variable is correlated with the regression error term arises in simultaneous equations models.
– Suppose we write:

Q = β1 + β2 P + e    (Eq. 10.5)

There is a feedback relationship between P and Q.
– Because price and quantity are jointly, or simultaneously, determined, we can show that cov(P, e) ≠ 0.
– The resulting bias (and inconsistency) is called the simultaneous equations bias.
Case 3: Omitted Variable Correlated with e
10.2.3 Omitted Variables
When an omitted variable is correlated with an included explanatory variable, the regression error will be correlated with the explanatory variable, making it endogenous.
Consider a log-linear regression model explaining observed hourly wage:

ln(WAGE) = β1 + β2 EDUC + β3 EXPER + β4 EXPER² + e    (Eq. 10.6)

– What else affects wages? What have we omitted? If an omitted variable such as ability is correlated with EDUC, we might expect cov(EDUC, e) ≠ 0.
– If this is true, then we can expect that the least squares estimator of the returns to another year of education will be positively biased, E(b2) > β2, and inconsistent.
• The bias will not disappear even in very large samples.
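The direction of this omitted-variable bias can be illustrated with a simulation — a hypothetical sketch in which an unobserved "ability" variable raises both schooling and wages; all coefficients are illustrative, not estimates from any data set:

```python
import numpy as np

# Hypothetical simulation: omitted "ability" raises both EDUC and ln(WAGE),
# so OLS on EDUC alone overstates the return to education.
rng = np.random.default_rng(5)
N = 100_000
ability = rng.normal(0.0, 1.0, N)                      # unobserved
educ = 12.0 + 2.0 * ability + rng.normal(0.0, 2.0, N)  # ability raises schooling
ln_wage = 1.0 + 0.08 * educ + 0.06 * ability + rng.normal(0.0, 0.3, N)

# OLS slope of ln(wage) on educ: ability is absorbed into the error term
b2 = (np.sum((educ - educ.mean()) * (ln_wage - ln_wage.mean()))
      / np.sum((educ - educ.mean()) ** 2))
print(b2)  # above the true 0.08: E(b2) > beta2, a positive bias
```

Here the probability limit of b2 is 0.08 + 0.06 · cov(EDUC, ability)/var(EDUC) = 0.095, so the upward bias survives even this very large sample.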
10.2.4 Least Squares Estimation of a Wage Equation
Estimating our wage equation by least squares, we have:

ln(WAGE)^ = −0.5220 + 0.1075 EDUC + 0.0416 EXPER − 0.0008 EXPER²
(se)         (0.1986)  (0.0141)       (0.0132)       (0.0004)

– We estimate that an additional year of education increases wages by approximately 10.75%, holding everything else constant.
• If ability has a positive effect on wages, then this estimate is overstated, as the contribution of ability is attributed to the education variable.
10.3 Estimators Based on the Method of Moments
A moment is a quantitative measure of the shape of a set of points, e.g. the mean, variance, and skewness.
The method of moments is a method of estimation of population parameters obtained by equating sample moments with the corresponding population moments and then solving those equations for the quantities to be estimated.
10.3.1 Method of Moments Estimation of a Population Mean and Variance
The method of moments estimation procedure equates m population moments to m sample moments to estimate m unknown parameters.
– Example:

var(Y) = σ² = E(Y²) − [E(Y)]² = E(Y²) − μ²    (Eq. 10.9)
The first two population and sample moments of Y are:    (Eq. 10.10)

Population Moments        Sample Moments
E(Y) = μ1 = μ             μ̂1 = Σ y_i / N
E(Y²) = μ2                μ̂2 = Σ y_i² / N
Solve for the unknown mean and variance parameters:

μ̂ = Σ y_i / N = ȳ    (Eq. 10.11)

and

σ̂² = μ̂2 − μ̂1² = Σ y_i²/N − ȳ² = (Σ y_i² − N ȳ²)/N = Σ (y_i − ȳ)²/N    (Eq. 10.12)
10.3.2 Method of Moments Estimation in the Simple Linear Regression Model
In the linear regression model y = β1 + β2 x + e, we usually assume:

E(e_i) = 0  ⇒  E(y_i − β1 − β2 x_i) = 0    (Eq. 10.13)

– If x is fixed, or random but not correlated with e, then:

E(x_i e_i) = 0  ⇒  E[x_i (y_i − β1 − β2 x_i)] = 0    (Eq. 10.14)
We have two equations in two unknowns:

(1/N) Σ (y_i − b1 − b2 x_i) = 0
(1/N) Σ x_i (y_i − b1 − b2 x_i) = 0    (Eq. 10.15)
These are equivalent to the least squares normal equations, and their solution is:

b2 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²
b1 = ȳ − b2 x̄    (Eq. 10.16)

– Under ‘‘nice’’ assumptions, the method of moments principle of estimation leads us to the same estimators for the simple linear regression model as the least squares principle.
– ‘Nice’: when all the usual assumptions of the linear model hold, the method of moments leads to the least squares estimator.
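A short sketch on simulated data (illustrative parameter values) confirming that the solution of the two sample moment conditions satisfies both conditions exactly, as the normal-equation equivalence implies:

```python
import numpy as np

# Sketch (illustrative data): the least squares formulas solve the two
# sample moment conditions of Eq. 10.15 exactly.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 200)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

# Both moment conditions hold (to machine precision) at (b1, b2)
resid = y - b1 - b2 * x
print(abs(resid.mean()) < 1e-10, abs(np.mean(x * resid)) < 1e-10)  # True True
```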
If x is random and correlated with the error term, the method of moments leads to an alternative, called instrumental variables estimation, or two-stage least squares estimation, that will work in large samples.
10.3.3 Instrumental Variables Estimation in the Simple Linear Regression Model
IV or 2SLS Estimation
• Instrumental variable: suppose that there is another variable, z, such that:
1. z does not have a direct effect on y, and thus it does not belong on the right-hand side of the model as an explanatory variable.
2. z is not correlated with the regression error term e.
• Variables with this property are said to be exogenous.
3. z is strongly [or at least not weakly] correlated with x, the endogenous explanatory variable.
A variable z with these properties is called an instrumental variable.
If such a variable z exists, then it can be used to form the moment condition:

E(ze) = 0  ⇒  E[z (y − β1 − β2 x)] = 0    (Eq. 10.16)

– Using Eqs. 10.13 and 10.16, the sample moment conditions are:

(1/N) Σ (y_i − β̂1 − β̂2 x_i) = 0
(1/N) Σ z_i (y_i − β̂1 − β̂2 x_i) = 0    (Eq. 10.17)
Solving these equations leads us to method of moments estimators, which are usually called the instrumental variable (IV) estimators:

β̂2 = (N Σ z_i y_i − Σ z_i Σ y_i) / (N Σ z_i x_i − Σ z_i Σ x_i) = Σ (z_i − z̄)(y_i − ȳ) / Σ (z_i − z̄)(x_i − x̄)
β̂1 = ȳ − β̂2 x̄    (Eq. 10.18)
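Eq. 10.18 can be tried out on simulated data — a hedged sketch in which x is endogenous by construction and z is a valid instrument; all coefficients are illustrative:

```python
import numpy as np

# Hypothetical simulation: x is endogenous (built partly from the error e),
# z is a valid instrument. Compare the OLS slope with the IV slope (Eq. 10.18).
rng = np.random.default_rng(7)
N = 200_000
beta1, beta2 = 1.0, 0.5

z = rng.normal(0.0, 1.0, N)                      # instrument
e = rng.normal(0.0, 1.0, N)                      # regression error
x = 0.8 * z + 0.6 * e + rng.normal(0.0, 1.0, N)  # cov(x, e) = 0.6 != 0
y = beta1 + beta2 * x + e

b2_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b2_iv = (np.sum((z - z.mean()) * (y - y.mean()))
         / np.sum((z - z.mean()) * (x - x.mean())))     # Eq. 10.18

print(b2_ols, b2_iv)  # OLS is pushed above 0.5; IV is close to 0.5
```

Here plim b2_ols = β2 + cov(x, e)/var(x) = 0.8, while the IV estimator stays centered on the true 0.5.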
These new estimators have the following properties:
– They are consistent, if z is exogenous, so that E(ze) = 0.
– In large samples the instrumental variable estimators have approximate normal distributions.
• In the simple regression model:

β̂2 ~ N( β2, σ² / (r²_zx Σ (x_i − x̄)²) )    (Eq. 10.19)

where r²_zx is the squared sample correlation between the instrument z and the endogenous variable x.
– The error variance is estimated using the estimator:

σ̂²_IV = Σ (y_i − β̂1 − β̂2 x_i)² / (N − 2)
Note that we can write the variance of the instrumental variables estimator of β2 as:

var(β̂2) = σ² / (r²_zx Σ (x_i − x̄)²) = var(b2) / r²_zx

– Because r²_zx ≤ 1, the variance of the instrumental variables estimator will always be larger than the variance of the least squares estimator, and thus the IV estimator is said to be less efficient.
10.3.4 Instrumental Variables Estimation in the Multiple Regression Model
To extend our analysis to a more general setting, consider the multiple regression model:

y = β1 + β2 x2 + ⋯ + βK xK + e

– Let xK be an endogenous variable correlated with the error term.
– The first K − 1 variables are exogenous variables that are uncorrelated with the error term e; they are ‘‘included’’ instruments.
We can estimate this equation in two steps, with a least squares estimation in each step.
The first stage regression has the endogenous variable xK on the left-hand side, and all exogenous and instrumental variables on the right-hand side:

xK = γ1 + γ2 x2 + ⋯ + γK−1 xK−1 + θ1 z1 + ⋯ + θL zL + vK    (Eq. 10.20)

– The least squares fitted value is:

x̂K = γ̂1 + γ̂2 x2 + ⋯ + γ̂K−1 xK−1 + θ̂1 z1 + ⋯ + θ̂L zL    (Eq. 10.21)
The second stage regression is based on the original specification, with x̂K replacing xK:

y = β1 + β2 x2 + ⋯ + βK x̂K + e*    (Eq. 10.22)

– The least squares estimators from this equation are the instrumental variables (IV) estimators.
– Because they can be obtained by two least squares regressions, they are also popularly known as the two-stage least squares (2SLS) estimators.
• We will refer to them as IV or 2SLS or IV/2SLS estimators.
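The two stages can be sketched with NumPy's least squares solver — hypothetical simulated data with one endogenous regressor and two instruments; the names and coefficient values are illustrative:

```python
import numpy as np

# A two-stage least squares sketch using np.linalg.lstsq (hypothetical data:
# one endogenous regressor x2, two instruments z1 and z2).
rng = np.random.default_rng(1)
N = 100_000
z1, z2 = rng.normal(0.0, 1.0, N), rng.normal(0.0, 1.0, N)
e = rng.normal(0.0, 1.0, N)
x2 = 0.7 * z1 + 0.7 * z2 + 0.5 * e + rng.normal(0.0, 1.0, N)  # endogenous
y = 1.0 + 0.5 * x2 + e
ones = np.ones(N)

# First stage (Eq. 10.20): regress x2 on an intercept and the instruments
Z = np.column_stack([ones, z1, z2])
gamma_hat, *_ = np.linalg.lstsq(Z, x2, rcond=None)
x2_hat = Z @ gamma_hat                                        # Eq. 10.21

# Second stage (Eq. 10.22): replace x2 by its fitted value
X = np.column_stack([ones, x2_hat])
beta_iv, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_iv)  # close to the true values [1.0, 0.5]
```

One caveat: the standard errors from a naive second-stage regression are incorrect, because they are computed from residuals that use x̂K rather than xK; dedicated IV/2SLS routines apply the correction automatically.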
10.3.4a Using Surplus Instruments in Simple Regression
In the simple regression, if x is endogenous and we have L instruments:

x̂ = γ̂1 + θ̂1 z1 + ⋯ + θ̂L zL

– The two sample moment conditions are:

(1/N) Σ (y_i − β̂1 − β̂2 x_i) = 0
(1/N) Σ x̂_i (y_i − β̂1 − β̂2 x_i) = 0
Solving these, and using the fact that Σ (x̂_i − x̄)(x_i − x̄) = Σ (x̂_i − x̄)², we get:

β̂2 = Σ (x̂_i − x̄)(y_i − ȳ) / Σ (x̂_i − x̄)(x_i − x̄) = Σ (x̂_i − x̄)(y_i − ȳ) / Σ (x̂_i − x̄)²
β̂1 = ȳ − β̂2 x̄
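A sketch on simulated data (two instruments for one endogenous x; all names and values illustrative) verifying the algebraic fact used in the solution and computing the surplus-instrument IV slope with the first-stage fitted value x̂ as the instrument:

```python
import numpy as np

# Check that sum((x_hat - xbar)(x - xbar)) equals sum((x_hat - xbar)^2),
# then compute the surplus-instrument IV estimator.
rng = np.random.default_rng(3)
N = 50_000
z1, z2 = rng.normal(0.0, 1.0, N), rng.normal(0.0, 1.0, N)
e = rng.normal(0.0, 1.0, N)
x = 0.6 * z1 + 0.6 * z2 + 0.5 * e + rng.normal(0.0, 1.0, N)  # endogenous
y = 2.0 + 1.0 * x + e

# First stage: fitted value x_hat from regressing x on (1, z1, z2)
Z = np.column_stack([np.ones(N), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

lhs = np.sum((x_hat - x.mean()) * (x - x.mean()))
rhs = np.sum((x_hat - x.mean()) ** 2)
print(np.isclose(lhs, rhs))  # True: the first-stage residual is orthogonal to x_hat

b2 = np.sum((x_hat - x.mean()) * (y - y.mean())) / lhs
b1 = y.mean() - b2 * x.mean()
print(b1, b2)  # close to the true (2.0, 1.0)
```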
Estimation Issue 1: Validity of Instruments
10.3.5a One Instrumental Variable
The first stage regression is a key tool in assessing whether an instrument is ‘‘strong’’ or ‘‘weak’’ in the multiple regression setting.
Suppose the first stage regression equation is:

xK = γ1 + γ2 x2 + ⋯ + γK−1 xK−1 + θ1 z1 + vK    (Eq. 10.24)

– The key to assessing the strength of the instrumental variable z1 is the strength of its relationship to xK after controlling for the effects of all the other exogenous variables.
With L instruments, the first stage regression equation is:

xK = γ1 + γ2 x2 + ⋯ + γK−1 xK−1 + θ1 z1 + ⋯ + θL zL + vK    (Eq. 10.25)

– We require that at least one of the instruments be strong.
Using FATHEREDUC and MOTHEREDUC as instruments, the first stage equation is:

EDUC = γ1 + γ2 EXPER + γ3 EXPER² + θ1 MOTHEREDUC + θ2 FATHEREDUC + v

Table 10.1 First-Stage Equation
Obtain the predicted values of education from the first stage equation and insert them into the log-linear wage equation to replace EDUC.
– Then estimate the resulting equation by least squares. The IV/2SLS estimates are:

ln(WAGE)^ = 0.0481 + 0.0614 EDUC + 0.0442 EXPER − 0.0009 EXPER²
(se)        (0.4003)  (0.0314)      (0.0134)       (0.0004)
To compare, the IV estimates of the log-linear wage equation using MOTHEREDUC alone as the instrument are:

ln(WAGE)^ = 0.1982 + 0.0493 EDUC + 0.0449 EXPER − 0.0009 EXPER²
(se)        (0.4729)  (0.0374)      (0.0136)       (0.0004)
Estimation Issue 2: Identification
10.3.8 Instrumental Variables Estimation in a General Model
The multiple regression model, including all K variables, is:

y = β1 + β2 x2 + ⋯ + βG xG + βG+1 xG+1 + ⋯ + βK xK + e

where x2, …, xG are the G exogenous variables and xG+1, …, xK are the B endogenous variables.
– Think of G = Good explanatory variables, B = Bad explanatory variables, and L = Lucky instrumental variables.
– It is a necessary condition for IV estimation that L ≥ B.
– If L = B, then there are just enough instrumental variables to carry out IV estimation.
• The model parameters are said to be just identified or exactly identified in this case.
• The term identified is used to indicate that the model parameters can be consistently estimated.
– If L > B, then we have more instruments than are necessary for IV estimation, and the model is said to be overidentified.
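The counting rule above can be made explicit in a few lines (a hypothetical helper, not from the text):

```python
# Order condition for IV: at least as many instruments (L) as
# endogenous right-hand-side variables (B).
def identification_status(L: int, B: int) -> str:
    if L < B:
        return "not identified"    # too few instruments: IV is not possible
    if L == B:
        return "just identified"   # exactly enough instruments
    return "overidentified"        # surplus instruments

print(identification_status(2, 1))  # overidentified
```

For the wage example, B = 1 (EDUC) and L = 2 (MOTHEREDUC, FATHEREDUC), so the model is overidentified.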