0 Econometrics II


  • 8/13/2019 0 Econometrics II


    Econometrics II. Lecture Notes 1

    ESTIMATING LINEAR EQUATIONS BY INSTRUMENTAL VARIABLES

    1. Instrumental Variables Estimates

    2. Two-Stage Least Squares (2SLS) Estimates

    3. Asymptotic properties of 2SLS estimates

    4. Specification Tests

    (a) Endogeneity

    (b) Overidentifying restrictions

    (c) Functional form

    (d) Heteroskedasticity


    Econometrics II-1. VI methods. 2012/13 UC3M. Master in Economic Analysis

    1.1 Instrumental Variables Estimates

    Consider the linear model

    $$y = \beta_1 + \beta_2 x_2 + \cdots + \beta_K x_K + u \qquad (1.1)$$

    $$E[u] = 0, \quad \mathrm{Cov}[x_j, u] = 0, \quad j = 1, 2, \ldots, K-1, \qquad (1.2)$$

    but $x_K$ might be correlated with $u$, so that it is potentially endogenous, while the other regressors are exogenous. This can be due to omitted regressors that are correlated with $x_K$ but not with the other regressors, to measurement error in either the dependent or the independent variables, or to simultaneity.

    If $\mathrm{Cov}[x_K, u] \neq 0$, OLS estimation of all the coefficients in (1.1) typically results in inconsistency. The method of Instrumental Variables (IV) provides a general solution to the problem of an endogenous explanatory variable. To use the IV approach we need a variable $z_1$, not in equation (1.1), that satisfies two conditions.

    1. First, $z_1$ must be exogenous in equation (1.1),

    $$\mathrm{Cov}[z_1, u] = 0. \qquad (1.3)$$

    2. The second condition involves the relationship between $z_1$ and the endogenous variable $x_K$, in particular the linear projection of $x_K$ on all exogenous variables,

    $$x_K = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + r_K \qquad (1.4)$$

    where by definition $E[r_K] = 0$ and $r_K$ is uncorrelated with $1, x_2, \ldots, x_{K-1}$ and $z_1$. Equation (1.4) is called a reduced form (RF) equation for the endogenous variable $x_K$, which always exists as a linear projection. The second assumption is that

    $$\theta_1 \neq 0, \qquad (1.5)$$

    i.e. $z_1$ is partially correlated with $x_K$ (after the other exogenous variables $1, x_2, \ldots, x_{K-1}$ are controlled for).

    If these two conditions are satisfied then $z_1$ is called an IV or just an instrument for $x_K$. Since $1, x_2, \ldots, x_{K-1}$ are already exogenous, the full list of IVs is the same as the list of exogenous variables. These conditions allow identification of the parameters $\beta_j$ in equation (1.1), so that they can be written in terms of population moments of observable variables.
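To make the identification argument concrete, the following is a minimal simulation sketch, not part of the original notes. It assumes the standard just-identified IV estimator, the sample analogue of $E[zx']^{-1}E[zy]$, and hypothetical parameter values (intercept 1, slope 2, first-stage coefficient 0.8). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = np.array([1.0, 2.0])            # hypothetical true (intercept, slope)

z1 = rng.normal(size=n)                # instrument: Cov[z1, u] = 0 by construction
u = rng.normal(size=n)                 # structural error
# Endogenous regressor: depends on z1 (theta_1 = 0.8 != 0) and on u.
xK = 0.8 * z1 + 0.5 * u + rng.normal(size=n)
y = beta[0] + beta[1] * xK + u

X = np.column_stack([np.ones(n), xK])  # regressors (1, xK)
Z = np.column_stack([np.ones(n), z1])  # instruments (1, z1)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)  # inconsistent because Cov[xK, u] != 0
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # sample analogue of E[zx']^{-1} E[zy]

print("OLS slope:", b_ols[1])
print("IV slope: ", b_iv[1])
```

In this design the OLS slope converges to $2 + 0.5/1.89 \approx 2.26$, while the IV slope should be close to the true value 2.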


    1.2 Multiple instruments: Two-Stage Least Squares (2SLS)

    Let $z_1, \ldots, z_L$ be variables such that

    $$\mathrm{Cov}(z_h, u) = 0, \quad h = 1, \ldots, L,$$

    so that each variable is exogenous in (1.1). This implies that there are many potential IV estimates (using each $z_h$ or any linear combination of them). We will see that 2SLS is the most efficient IV estimate: 2SLS chooses the linear combination of the exogenous variables $z = (1, x_2, \ldots, x_{K-1}, z_1, \ldots, z_L)'$ that is most correlated with $x_K$. If $x_K$ is exogenous, then we would choose just $x_K$. Writing the RF for $x_K$ as

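    $$x_K = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \cdots + \theta_L z_L + r_K, \qquad (1.6)$$

    we find that

    $$x_K^* = \delta_1 + \delta_2 x_2 + \cdots + \delta_{K-1} x_{K-1} + \theta_1 z_1 + \cdots + \theta_L z_L$$

    is uncorrelated with $r_K$ and, as any linear combination of $z$, also with $u$ (i.e. $x_K^*$ is the part of $x_K$ uncorrelated with $u$, so that $x_K$ is endogenous because $r_K$ is correlated with $u$).

    The coefficients $\delta_1, \delta_2, \ldots, \delta_{K-1}, \theta_1, \ldots, \theta_L$ can be estimated by OLS to obtain

    $$\hat{x}_K = \hat{\delta}_1 + \hat{\delta}_2 x_2 + \cdots + \hat{\delta}_{K-1} x_{K-1} + \hat{\theta}_1 z_1 + \cdots + \hat{\theta}_L z_L,$$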

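    so that for $\hat{x} = (1, x_2, \ldots, x_{K-1}, \hat{x}_K)'$ we define

    $$\hat{\beta}_n^{2SLS} = E_n[\hat{x} x']^{-1} E_n[\hat{x} y].$$

    In fact the 2SLS is an OLS estimate, noting that

    $$\hat{x} = L_n(x|z) = \hat{\Pi}_n' z, \qquad \hat{\Pi}_n = E_n[zz']^{-1} E_n[zx'],$$

    so that

    $$E_n[\hat{x} x'] = \hat{\Pi}_n' E_n[zx'] = E_n[xz'] E_n[zz']^{-1} E_n[zx']$$

    $$= E_n[xz'] E_n[zz']^{-1} E_n[zz'] E_n[zz']^{-1} E_n[zx']$$

    $$= E_n[\hat{\Pi}_n' z z' \hat{\Pi}_n] = E_n[\hat{x} \hat{x}'],$$

    or in other terms, $x - \hat{x}$ is orthogonal to $\hat{x}$.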
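The identity $E_n[\hat{x} x'] = E_n[\hat{x} \hat{x}']$ can be checked numerically. The sketch below is an illustration on simulated data with hypothetical coefficients, reading $E_n$ as a sample average. It computes the 2SLS estimate both in its IV form and as OLS of $y$ on $\hat{x}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z1, z2, x2 = rng.normal(size=(3, n))
u = rng.normal(size=n)
# xK is endogenous (it loads on u) and related to the extra instruments z1, z2.
xK = 0.5 * x2 + 0.7 * z1 + 0.4 * z2 + 0.6 * u + rng.normal(size=n)
y = 1.0 + 0.5 * x2 - 1.0 * xK + u                 # hypothetical structural equation

X = np.column_stack([np.ones(n), x2, xK])         # rows are x_i' = (1, x2, xK)
Z = np.column_stack([np.ones(n), x2, z1, z2])     # rows are z_i' = (1, x2, z1, z2)

Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)        # En[zz']^{-1} En[zx']
X_hat = Z @ Pi_hat                                # rows are x_hat_i' = z_i' Pi_hat

# IV form: En[x_hat x']^{-1} En[x_hat y]
b_iv_form = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
# OLS form: regress y on x_hat
b_ols_form = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

print(np.allclose(X_hat.T @ X, X_hat.T @ X_hat, atol=1e-6))
print(np.allclose(b_iv_form, b_ols_form, atol=1e-6))
```

The exogenous columns of `X` are reproduced exactly by the first stage because they are included in `Z`, so only the last column of `X_hat` differs from `X`.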


    The name 2SLS comes from these two steps:

    1. Regress $x_K$ on $z = (1, x_2, \ldots, x_{K-1}, z_1, \ldots, z_L)'$ and compute $\hat{x}_K$.

    2. Regress $y$ on $\hat{x} = (1, x_2, \ldots, x_{K-1}, \hat{x}_K)'$.

    However, note that the usual OLS s.e.s of the second step are not correct for the 2SLS estimate, and the first step cannot be replaced by a regression of $x_K$ on $z = (1, z_1, \ldots, z_L)'$ alone.
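The warning about second-step standard errors can be illustrated with a short sketch. It assumes conditional homoskedasticity and hypothetical coefficients. The helper `tsls` is an illustrative name, not a library function. The only difference between the two sets of s.e.s is which residuals estimate $\sigma^2$: $y - x'\hat{\beta}$ (correct) or $y - \hat{x}'\hat{\beta}$ (what a second-step OLS routine would report).

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS by two explicit steps, with correct and naive homoskedastic s.e.s."""
    n, K = X.shape
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # step 1: fitted values of x on z
    A = X_hat.T @ X_hat
    b = np.linalg.solve(A, X_hat.T @ y)             # step 2: OLS of y on x_hat
    A_inv_diag = np.diag(np.linalg.inv(A))
    u_hat = y - X @ b       # 2SLS residuals use the ORIGINAL regressors
    e_hat = y - X_hat @ b   # second-step OLS residuals use the fitted regressors
    se = np.sqrt(A_inv_diag * (u_hat @ u_hat) / (n - K))
    se_naive = np.sqrt(A_inv_diag * (e_hat @ e_hat) / (n - K))
    return b, se, se_naive

rng = np.random.default_rng(2)
n = 20_000
z1, z2, u, e = rng.normal(size=(4, n))
xK = z1 + z2 + 0.8 * u + e            # endogenous regressor
y = 1.0 + 2.0 * xK + u                # hypothetical true coefficients (1, 2)
X = np.column_stack([np.ones(n), xK])
Z = np.column_stack([np.ones(n), z1, z2])

b, se, se_naive = tsls(y, X, Z)
print("2SLS estimate:", b)
print("correct s.e.: ", se)
print("naive s.e.:   ", se_naive)
```

In this design the naive s.e.s are several times too large, because the second-step residual also contains $\beta_K (x_K - \hat{x}_K)$. In other designs the distortion can differ in size.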

    $E[zx']$ has full column rank if at least one of the $\theta_j$ is different from zero in the RF (1.6); otherwise the last element of $x$ could be written as a linear combination of the other ones (plus a term orthogonal to $z$). The values of the $\delta_j$ are not relevant. To test the rank condition,

    $$H_0: \theta_1 = \cdots = \theta_L = 0$$

    against the alternative that at least one $\theta_j$ is different from zero, we need to use an F or a (robust) Wald statistic.
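A minimal sketch of the non-robust F statistic for this null follows. It compares the restricted and unrestricted first-stage sums of squared residuals on simulated data. The function name `first_stage_F` and all coefficient values are hypothetical, and a heteroskedasticity-robust Wald statistic would need a different variance estimate.

```python
import numpy as np

def first_stage_F(xK, X_exog, Z_extra):
    """F statistic for H0: the coefficients on Z_extra are all zero in the RF of xK."""
    n = len(xK)

    def ssr(W):
        coef = np.linalg.lstsq(W, xK, rcond=None)[0]
        resid = xK - W @ coef
        return resid @ resid

    W_ur = np.column_stack([X_exog, Z_extra])      # unrestricted RF regressors
    ssr_r, ssr_ur = ssr(X_exog), ssr(W_ur)
    L = Z_extra.shape[1]                           # number of restrictions
    return ((ssr_r - ssr_ur) / L) / (ssr_ur / (n - W_ur.shape[1]))

rng = np.random.default_rng(3)
n = 2_000
x2, z1, z2, r = rng.normal(size=(4, n))
X_exog = np.column_stack([np.ones(n), x2])
Z_extra = np.column_stack([z1, z2])

xK_strong = 0.3 * x2 + 0.5 * z1 + 0.5 * z2 + r     # theta = (0.5, 0.5): relevant IVs
xK_none = 0.3 * x2 + r                             # theta = (0, 0): rank condition fails

F_strong = first_stage_F(xK_strong, X_exog, Z_extra)
F_none = first_stage_F(xK_none, X_exog, Z_extra)
print("F with relevant instruments:  ", F_strong)
print("F with irrelevant instruments:", F_none)
```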

    The model with a single endogenous variable is said to be overidentified when $L > 1$, and there are $L - 1$ overidentifying restrictions.


    1.3 Asymptotic Properties of 2SLS estimates

    ASS. 1 (Orthogonality) For some $L \times 1$ vector $z$, $E[zu] = 0$ (an $L \times 1$ vector of zeros).

    Unless every element of $x$ is exogenous, $z$ will have to contain variables obtained from outside the model.

    ASS. 2 (Rank Condition)

    (a) $\mathrm{rank}(E[zz']) = L$, where $E[zz']$ is $L \times L$; (b) $\mathrm{rank}(E[zx']) = K$, where $E[zx']$ is $L \times K$.

    A necessary condition for ASS. 2(b) to hold is the order condition $L \geq K$. When $z = x$ (exogenous regressors) and $K = L$, this is equivalent to the rank condition for OLS. For testing part (b) it can be checked that, in the RF of the endogenous variables, at least one element in $z$ not in $x$ is significant (but this is not sufficient).

    ASS. 2 identifies $\beta$ since we can define $x^* = L(x|z) = \Pi' z$, where $\Pi = E[zz']^{-1} E[zx']$, and $x = x^* + r$, where $E[zr'] = 0$ and $E[x^* r'] = 0$. Then the 2SLS estimate is an IV estimate with instruments $x^*$, so that

    $$E[x^* x'] \beta = E[x^* y],$$

    and $\beta$ is identified by $\beta = E[x^* x']^{-1} E[x^* y]$ if $E[x^* x']$ has rank $K$, but

    $$E[x^* x'] = \Pi' E[zx'] = E[xz'] E[zz']^{-1} E[zx'],$$

    which is nonsingular if $E[zx']$ has rank $K$, i.e. ASS. 2(b).

    The 2SLS estimate can also be written as

    $$\hat{\beta}_n^{2SLS} = \left( E_n[xz'] E_n[zz']^{-1} E_n[zx'] \right)^{-1} E_n[xz'] E_n[zz']^{-1} E_n[zy],$$

    from which we obtain the following results.

    THM. 1 (Consistency of 2SLS) Under ASS. 1-2, as $n \to \infty$,

    $$\hat{\beta}_n^{2SLS} \to_p \beta.$$
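The consistency result can be sketched as follows. This is an outline under additional assumptions not stated above, namely random sampling and finite second moments, so that a law of large numbers applies to each sample moment. Substituting $y = x'\beta + u$ into the closed form of the estimate gives

```latex
\hat{\beta}_n^{2SLS} - \beta
  = \left( E_n[xz'] \, E_n[zz']^{-1} \, E_n[zx'] \right)^{-1}
    E_n[xz'] \, E_n[zz']^{-1} \, E_n[zu],
\qquad
E_n[zu] \to_p E[zu] = 0 \quad \text{(ASS. 1)},
\qquad
E_n[zz'] \to_p E[zz'], \quad E_n[zx'] \to_p E[zx'] .
```

By ASS. 2 the limit of the matrix being inverted is nonsingular, so the continuous mapping theorem gives $\hat{\beta}_n^{2SLS} - \beta \to_p 0$.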


    Properties of 2SLS estimation

    This result is not interesting when $L = K$ (why?). When $x$ is exogenous, $z$ contains $x$ and OLS is the efficient linear estimate because $L(x|z) = x$.

    Further, we will always do better by adding more instruments, at least under conditional homoskedasticity (but if $L \gg K$, finite sample properties might be affected).

    If ASS. 3 fails there can be more efficient estimates when $L > K$. Even if $x$ is exogenous and included in $z$, if ASS. 1-2 hold but not ASS. 3, OLS is not necessarily asymptotically efficient.

    Testing can be done as usual, even with $F$ tests based on $SSR_r$ and $SSR_{ur}$ for exclusion and homogeneity restrictions, but these should be computed from the second stage OLS regression, not from the single-step 2SLS procedure.

    Heteroskedasticity Robust Inference. ASS. 3 can be restrictive in many contexts, so a general estimate of $E[u^2 x^* x^{*\prime}]$, such as

    $$E_n[\hat{u}_n^2 \hat{x} \hat{x}'],$$

    should be used instead of

    $$\hat{\sigma}_n^2 E_n[\hat{x} \hat{x}'].$$
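The following sketch plugs the two estimates above into a variance formula and compares the resulting standard errors. The sandwich form $E_n[\hat{x}\hat{x}']^{-1} E_n[\hat{u}_n^2 \hat{x}\hat{x}'] E_n[\hat{x}\hat{x}']^{-1} / n$ is assumed here as the usual robust variance and is not taken from the text above. The helper name `tsls_robust` and the data-generating design are hypothetical.

```python
import numpy as np

def tsls_robust(y, X, Z):
    """2SLS with heteroskedasticity-robust and homoskedastic standard errors."""
    n = len(y)
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    A_inv = np.linalg.inv(X_hat.T @ X_hat / n)           # En[x_hat x_hat']^{-1}
    b = A_inv @ (X_hat.T @ y / n)
    u_hat = y - X @ b                                    # 2SLS residuals
    B = (X_hat * (u_hat ** 2)[:, None]).T @ X_hat / n    # En[u_hat^2 x_hat x_hat']
    V_robust = A_inv @ B @ A_inv / n                     # assumed sandwich form
    V_homo = (u_hat @ u_hat / n) * A_inv / n             # sigma_hat^2 En[x_hat x_hat']^{-1} / n
    return b, np.sqrt(np.diag(V_robust)), np.sqrt(np.diag(V_homo))

rng = np.random.default_rng(4)
n = 50_000
z1, e, eta = rng.normal(size=(3, n))
u = e * np.sqrt(0.5 + z1 ** 2)      # E[u | z1] = 0, but Var[u | z1] depends on z1
xK = z1 + 0.5 * u + eta             # endogenous regressor
y = 1.0 + 2.0 * xK + u              # hypothetical true coefficients (1, 2)
X = np.column_stack([np.ones(n), xK])
Z = np.column_stack([np.ones(n), z1])

b, se_robust, se_homo = tsls_robust(y, X, Z)
print("2SLS estimate:", b)
print("robust s.e.:  ", se_robust)
print("homosk. s.e.: ", se_homo)
```

With this design the error variance is largest where the instrument is most informative, so the homoskedastic formula understates the slope's standard error.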

    Problems of 2SLS estimation

    2SLS estimation is never unbiased when one explanatory variable is endogenous (and has a reduced number of moments in the Gaussian case).

    In large samples the IV estimator can be ill-behaved if the instruments are weak. If there is a small correlation between $u$ and $z$ and the correlation between $x$ and $z$ is weak, then the limit of $\hat{\beta}_n^{2SLS}$ could be arbitrarily large, and 2SLS could have worse properties than OLS.

    The standard errors of 2SLS are typically large compared, e.g., with OLS s.e.s, since from the second step its AVar depends on the variability of $\hat{x}_K$ rather than on that of $x_K$ itself,

    $$AVar\left(\hat{\beta}_{nK}^{2SLS}\right) = \frac{\sigma^2}{\widehat{SSR}_K} = \frac{\sigma^2}{\widehat{SST}_K (1 - \hat{R}_{2S}^2)} = \frac{\sigma^2}{SST_K (1 - \hat{R}_{2S}^2) R_K^2},$$


    with $\widehat{SSR}_K$ from the regression of $\hat{x}_K$ (with total sum of squares $\widehat{SST}_K$) on $1, x_2, \ldots, x_{K-1}$, and $\hat{R}_{2S}^2$ the corresponding $R^2$, which corrects for collinearity among regressors. Since $\hat{x}_K$ is a projection of $x_K$ on the instruments, its variability is equal to $\widehat{SST}_K = SST_K R_K^2$, where $R_K^2$ is obtained from regression (1.6) and $SST_K$ is for the original $x_K$. Then $\widehat{SST}_K$ is small if the first-step regression is not very informative about the variability of $x_K$, i.e. if the instruments are weak.

    2SLS solution to the Omitted Variables and Measurement Error Problems

    Omitted Variables:

    $$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K + \gamma q + v, \quad E[v|x, q] = 0.$$

    The solution would be to put $q$ in the error term, $u = \gamma q + v$, and then find instruments for any element of $x$ that is correlated with $q$. These instruments should satisfy:

    1. Be redundant in the structural model $E[y|x, q]$.

    2. Be uncorrelated with the omitted variable $q$.

    3. Be sufficiently correlated with the endogenous elements of $x$.

    Multiple Indicator Solution.

    An alternative is to use indicators of the unobservable $q$, as when using proxy variables. An indicator $q_1$ can be written as

    $$q_1 = \delta_0 + \delta_1 q + a_1, \quad \mathrm{Cov}(q, a_1) = 0, \quad \mathrm{Cov}(x, a_1) = 0. \qquad (1.7)$$

    The CEV model appears when $q_1$ is the observed measurement, $\delta_0 = 0$ and $\delta_1 = 1$.

    Writing, with $\delta_1 \neq 0$,

    $$q = -(\delta_0/\delta_1) + (1/\delta_1) q_1 - (1/\delta_1) a_1,$$

    the error term is necessarily correlated with $q_1$, so the proxy variable solution is inconsistent under (1.7).


    To use this assumption we need more information, such as a second indicator,

    $$q_2 = \rho_0 + \rho_1 q + a_2,$$

    where $a_2$ satisfies the same assumptions as $a_1$, and

    $$\mathrm{Cov}(a_1, a_2) = 0.$$

    Then we find that

    $$y = \alpha_0 + x'\beta + \gamma_1 q_1 + (v - (\gamma/\delta_1) a_1), \qquad \alpha_0 = \beta_0 - \gamma \delta_0/\delta_1, \quad \gamma_1 = \gamma/\delta_1,$$

    where $q_2$ is uncorrelated with $v$ (because it is redundant in the structural equation) and with $a_1$, by the assumption that the only relationship between $q_1$ and $q_2$ is due to $q$. Thanks to this relation, $q_2$ can be used as an IV for $q_1$.

    Note that this is different from leaving $q$ in the error term, which would require finding instruments for all elements of $x$ that are correlated with $q$.
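The multiple indicator argument can be illustrated by simulation. The sketch below uses hypothetical parameter values ($\beta_0 = \beta = \gamma = 1$, $\delta_0 = 0$, $\delta_1 = 1$). It compares OLS with $q_1$ used as a proxy against IV with $q_2$ as an instrument for $q_1$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
q, ex, v, a1, a2 = rng.normal(size=(5, n))

x = 0.5 * q + ex                     # regressor correlated with the unobservable q
y = 1.0 + 1.0 * x + 1.0 * q + v      # beta_0 = 1, beta = 1, gamma = 1 (hypothetical)
q1 = q + a1                          # first indicator: delta_0 = 0, delta_1 = 1
q2 = 0.5 + 0.8 * q + a2              # second indicator, with Cov(a1, a2) = 0

W = np.column_stack([np.ones(n), x, q1])   # regressors (1, x, q1)
Z = np.column_stack([np.ones(n), x, q2])   # instruments (1, x, q2)

b_ols = np.linalg.solve(W.T @ W, W.T @ y)  # proxy solution: q1 correlated with the error
b_iv = np.linalg.solve(Z.T @ W, Z.T @ y)   # q2 instruments q1; x instruments itself

print("OLS with proxy q1:", b_ols)
print("IV using q2:      ", b_iv)
```

In this design the OLS coefficient on $q_1$ converges to about 0.44 rather than $\gamma/\delta_1 = 1$, while the IV estimates should be close to $(1, 1, 1)$.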


    1.4 Specification Tests

    1.4.1 Endogeneity tests

    The 2SLS estimate is less efficient than the OLSE when the explanatory variables are exogenous. Therefore, endogeneity tests are useful to decide whether 2SLS estimation is necessary.

    In the model

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1 \qquad (1.8)$$

    where $z_1$ contains $L_1$ exogenous variables and both $y_1$ and $y_2$ are endogenous, the set of all exogenous variables is denoted by the $L \times 1$ vector $z$, where $z_1$ is a strict subset of $z$,

    $$E[z u_1] = 0.$$

    We also assume that equation (1.8) is identified, which requires that $L > L_1$ (order condition) and that the extra elements in $z$ with respect to $z_1$ are partially correlated with $y_2$.

    Hausman (1978) suggested comparing the OLS and 2SLS estimates of $\beta_1 = (\delta_1', \alpha_1)'$ to build a formal test of endogeneity of $y_2$. If $y_2$ is uncorrelated with $u_1$ then both estimates should differ only by sampling error. Otherwise the OLSE is inconsistent and both estimates should differ.

    Hausman Test for Endogeneity

    To see if the difference OLS-2SLS is significant, the original form is complicated because it involves a singular matrix (and its generalized inverse), but there is an equivalent version based on a simple regression.

    For that we use the Reduced Form of $y_2$,

    $$y_2 = \pi_2' z + v_2, \quad E[z v_2] = 0,$$

    so that $y_2$ is endogenous iff $v_2$ is correlated with the structural error $u_1$ (because $z$ is uncorrelated with $u_1$ by assumption).

    The linear projection of $u_1$ on $v_2$,

    $$u_1 = \rho_1 v_2 + e_1 \qquad (1.9)$$


    with $\rho_1 = \mathrm{Cov}(v_2, u_1)/\mathrm{Var}(v_2)$, gives $e_1$, uncorrelated with $v_2$ and with zero mean, so that $E[z e_1] = 0$ because $v_2$ and $u_1$ are uncorrelated with $z$.

    Then $y_2$ is exogenous iff $u_1$ and $v_2$ are uncorrelated, i.e. iff $H_0: \rho_1 = 0$.

    Plugging equation (1.9) into the original equation we obtain

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1,$$

    where $E[z_1 e_1] = E[y_2 e_1] = E[v_2 e_1] = 0$ by construction, so that a test of $H_0$ can be done with the usual t-test.

    The problem is that $v_2$ is not observed, but $\pi_2$ can be estimated by OLS, and then residuals $\hat{v}_2$ are obtained, so we run the regression

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 \hat{v}_2 + \text{error}. \qquad (1.10)$$

    Now, although $\hat{v}_2$ is a generated regressor, the t-test remains valid under $H_0$ (provided homoskedasticity holds).

    In fact the estimates of $\beta_1 = (\delta_1', \alpha_1)'$ are identical to the 2SLS estimates (see Problem 1), but s.e.s are not valid unless $\rho_1 = 0$.
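The regression-based test can be sketched on simulated data as follows. It assumes homoskedastic errors and hypothetical coefficients, with $\rho_1 = 0.5$ so that $y_2$ is endogenous. The helper `ols` is an illustrative function, not a library call. The sketch also checks numerically that the coefficients on $(z_1, y_2)$ in regression (1.10) coincide with the 2SLS estimates.

```python
import numpy as np

def ols(y, W):
    """OLS coefficients, homoskedastic standard errors and residuals."""
    WtW_inv = np.linalg.inv(W.T @ W)
    b = WtW_inv @ (W.T @ y)
    resid = y - W @ b
    s2 = resid @ resid / (len(y) - W.shape[1])
    return b, np.sqrt(s2 * np.diag(WtW_inv)), resid

rng = np.random.default_rng(6)
n = 5_000
w, za, zb, v2, e1 = rng.normal(size=(5, n))
u1 = 0.5 * v2 + e1                       # rho_1 = 0.5: y2 is endogenous
y2 = 0.5 * w + za + zb + v2              # reduced form of y2
y1 = 1.0 + 0.5 * w + 2.0 * y2 + u1       # structural equation (1.8)

Z1 = np.column_stack([np.ones(n), w])    # included exogenous variables z1
Z = np.column_stack([Z1, za, zb])        # all exogenous variables z

v2_hat = ols(y2, Z)[2]                                        # RF residuals
b_cf, se_cf, _ = ols(y1, np.column_stack([Z1, y2, v2_hat]))   # regression (1.10)
t_rho = b_cf[3] / se_cf[3]                                    # t statistic for H0: rho_1 = 0

X = np.column_stack([Z1, y2])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y1)

print("t statistic on v2_hat:", t_rho)
print("coefficients match 2SLS:", np.allclose(b_cf[:3], b_2sls, atol=1e-6))
```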

    It can also make sense to compare the estimates of $\alpha_1$. Under the equivalent to ASS. 1-3, it can be shown that

    $$AVar\left(\hat{\alpha}_{1,2SLS} - \hat{\alpha}_{1,OLS}\right) = AVar\left(\hat{\alpha}_{1,2SLS}\right) - AVar\left(\hat{\alpha}_{1,OLS}\right),$$

    so it is easy to build a t-test using the s.e.s under homoskedasticity.

    The extension to multiple endogenous regressors,

    $$y_1 = z_1'\delta_1 + y_2'\alpha_1 + u_1,$$

    is straightforward and is routinely implemented with heteroskedasticity robust Wald tests on the coefficients of $\hat{v}_2$ in

    $$y_1 = z_1'\delta_1 + y_2'\alpha_1 + \hat{v}_2'\rho_1 + \text{error}.$$

    Also LM tests can be implemented using the OLS residuals $\tilde{u}_1$ from regressing $y_1$ on $z_1, y_2$, in the regression

    $$\tilde{u}_1 \text{ on } z_1, y_2, \hat{v}_2,$$

    where $\hat{v}_2$ are the RF residuals of $y_2$.


    1.4.2 Endogeneity and over-identification restrictions tests

    When there are more instruments than needed, it is possible to test whether the additional instruments are valid, i.e. if they are orthogonal to the structural error $u_1$,

    $$y_1 = z_1'\delta_1 + y_2'\alpha_1 + u_1,$$

    where $z_1$ is $L_1 \times 1$ and $y_2$ is $G_1 \times 1$. The vector of all exogenous variables $z$ is $L \times 1$, $z = (z_1', z_2')'$, with $z_2$ of dimension $L_2 \times 1$ and $L = L_1 + L_2$.

    The model is overidentified if $L_2 > G_1$ (more extra IVs than endogenous regressors), i.e. if $L > L_1 + G_1$ (more exogenous variables than regressors).

    A valid LM test for endogeneity of the instruments can be formed as $nR_u^2$ from the OLS regression of

    $$\hat{u}_1 \text{ on } z,$$

    where $\hat{u}_1$ are the 2SLS residuals using all of the instruments $z$ (assuming they contain a constant). Under the null,

    $$H_0: E[z u_1] = 0,$$

    so with ASS. 3

    $$J = nR_u^2 \to_d \chi^2_{Q_1},$$

    where $Q_1 = L_2 - G_1$ is the number of overidentifying restrictions.

    If we reject, then our choice of IVs has to be reconsidered; if not, then we can have some confidence in them, but the test has low power against some endogenous instruments.

    This test was proposed by Sargan (1958) for the 2SLS estimator under conditional homoskedasticity. Hansen (1982) extended this test to general GMM estimates.
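The $nR_u^2$ statistic can be computed in a few lines. The sketch below simulates a design with $L_2 = 3$ extra instruments and $G_1 = 1$ endogenous regressor, so $Q_1 = 2$ and the 5% critical value of $\chi^2_2$ is 5.991. The function names and the `direct` parameter, which lets one instrument enter the structural error, are hypothetical.

```python
import numpy as np

def sargan_J(y1, X, Z):
    """n * R^2 from regressing the 2SLS residuals on all instruments Z."""
    n = len(y1)
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b = np.linalg.solve(X_hat.T @ X, X_hat.T @ y1)           # 2SLS estimate
    u_hat = y1 - X @ b                                       # 2SLS residuals
    fitted = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u_hat)       # OLS of u_hat on z
    r2 = 1.0 - np.sum((u_hat - fitted) ** 2) / np.sum((u_hat - u_hat.mean()) ** 2)
    return n * r2

def simulate(direct, seed):
    """direct != 0 makes the instrument zc correlated with the structural error."""
    rng = np.random.default_rng(seed)
    n = 5_000
    za, zb, zc, v2, e1 = rng.normal(size=(5, n))
    u1 = 0.5 * v2 + e1 + direct * zc
    y2 = za + zb + zc + v2
    y1 = 1.0 + 2.0 * y2 + u1
    X = np.column_stack([np.ones(n), y2])
    Z = np.column_stack([np.ones(n), za, zb, zc])
    return sargan_J(y1, X, Z)

J_valid = simulate(direct=0.0, seed=7)     # H0 true: compare with chi2(2)
J_invalid = simulate(direct=0.3, seed=7)   # H0 false: zc is not a valid instrument
print("J with valid instruments:  ", J_valid)
print("J with an invalid instrument:", J_invalid)
```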

    1.4.3 Tests of Functional Form

    The RESET test of Ramsey (1969) for the specification

    $$y = x'\beta + u, \quad E[u|x] = 0,$$

    i.e. $E[y|x] = x'\beta$, augments the regression with nonlinear functions of $x$.


    1.4.4 Tests for Heteroskedasticity

    Heteroskedasticity does not affect the consistency of OLS and 2SLS estimates, but does affect the asymptotic variances of the estimates, needed for inference.

    We assume first that all regressors are exogenous,

    $$y = x'\beta + u, \quad E[u|x] = 0,$$

    so the model is well specified. The null is

    $$H_0: E[u^2|x] = \sigma^2$$

    while the alternative is that $E[u^2|x]$ depends on $x$. Then it is sensible to study the correlation between $u^2$ and $h(x)$ for some $Q \times 1$ vector function $h$, through regressions

    $$u_i^2 = \delta_0 + \delta_1' h_i + v_i,$$

    where $h_i = h(x_i)$. Under $H_0$, $E[v_i|x_i] = E[v_i|h_i] = 0$, $\delta_1 = 0$ and $\delta_0 = \sigma^2$. Then to test

    $$H_0: \delta_1 = 0$$

    we can use an LM or a Wald test (under the assumption that $E[v_i^2|x_i]$ is constant).

    However, we cannot test $H_0$ directly because $u_i$ is not observed, so residuals $\hat{u}_{ni} = y_i - x_i'\hat{\beta}_n$ have to be used instead, and then regress

    $$\hat{u}_{ni}^2 \text{ on } 1, h_i$$

    and compare $nR_c^2$ to a $\chi^2_Q$ distribution (no adjustment, see Problem 10).

    There are different tests based on the choice of $h_i$: Breusch and Pagan (1979) and Koenker (1981) propose $h_i = x_i$, while White (1980) proposes all nonconstant and unique elements of $x_i$ and $x_i x_i'$. Both have degrees of freedom depending on the dimension of $x_i$, which can be quite large in applications. To avoid problems from this, one can take $h_i = (\hat{y}_{ni}, \hat{y}_{ni}^2)'$, where $\hat{y}_{ni}$ are just the OLS fitted values, which are linear functions of $x_i$, leading to an asymptotic test with two degrees of freedom.
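A minimal sketch of the $nR_c^2$ test with $h_i = x_i$ (one nonconstant regressor, so $Q = 1$ and the 5% critical value of $\chi^2_1$ is 3.841) follows. The function name `nr2_het_test` and the simulated designs are hypothetical. Passing the fitted values and their squares as `H` would give the two-degrees-of-freedom variant instead.

```python
import numpy as np

def nr2_het_test(y, X, H):
    """n * R^2 from regressing squared OLS residuals on a constant and H."""
    n = len(y)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ b) ** 2                          # squared OLS residuals
    W = np.column_stack([np.ones(n), H])           # regressors (1, h_i)
    g = np.linalg.lstsq(W, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - W @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(8)
n = 5_000
x, e = rng.normal(size=(2, n))
X = np.column_stack([np.ones(n), x])

y_hom = 1.0 + 2.0 * x + e                          # E[u^2 | x] constant
y_het = 1.0 + 2.0 * x + e * np.exp(0.5 * x)        # E[u^2 | x] = exp(x)

stat_hom = nr2_het_test(y_hom, X, x)
stat_het = nr2_het_test(y_het, X, x)
print("homoskedastic design:  ", stat_hom)
print("heteroskedastic design:", stat_het)
```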

    If we allow for endogenous regressors, and have exogenous variables $z$, then we have to consider $h_i = h(z_i)$, which must not involve endogenous regressors in any case.

    RECOMMENDED READINGS: Wooldridge (2002, Ch. 5 & 6). Hayashi (2000, Ch. 3). Ruud (2000, Ch. 20-22). Mittelhammer et al. (2000, Ch. 17).


    Problem Set 1

    1. Show the equivalence of the 2SLS estimates of $(\delta_1', \alpha_1)'$ in the system

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1$$

    $$y_2 = z'\pi_2 + v_2,$$

    where $y_2$ is the suspected endogenous regressor, and $z$ is the set of all exogenous variables, with at least one more element than $z_1$, and the OLS estimates of $(\delta_1', \alpha_1)'$ in

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + \rho_1 \hat{v}_2 + \text{error},$$

    where $\hat{v}_2$ are the residuals from the OLS estimation of the reduced form for $y_2$. [Hint: use partitioned regression algebra for OLS estimates and the fact that $z_1$ and $\hat{v}_2$ are orthogonal in the sample.]

    2. Consider the multiple indicator model.

    (a) Show that if $q_2$ is uncorrelated with $x_j$, $j = 1, 2, \ldots, K$, then the reduced form of $q_1$ depends only on $q_2$. [Hint: Use the fact that the reduced form of $q_1$ is the linear projection of $q_1$ onto $(1, x_1, x_2, \ldots, x_K, q_2)$ and find the coefficient vector on $x$ using two-step multiple regression algebra.]

    (b) What happens if $q_2$ and $x$ are correlated? In this setting, is it realistic to assume that $q_2$ and $x$ are uncorrelated?

    3. Consider IV estimation of the simple linear model with a single, possibly endogenous, explanatory variable $x$, and a single instrument $z$:

    $$y = \beta_0 + \beta_1 x + u$$

    $$E(u) = 0; \quad \mathrm{Cov}(z, u) = 0; \quad \mathrm{Cov}(z, x) \neq 0; \quad E[u^2|z] = \sigma^2.$$

    (a) Under the preceding (standard) assumptions, show that $AVar\left(n^{1/2}(\hat{\beta}_1 - \beta_1)\right)$ can be expressed as

    $$\frac{\sigma^2}{\rho_{zx}^2 \sigma_x^2}$$

    where $\sigma_x^2 = \mathrm{Var}(x)$ and $\rho_{zx} = \mathrm{Corr}(z, x)$. Compare this result with the asymptotic variance of the OLS estimate under $E(u|x) = 0$ and $E(u^2|x) = \sigma^2$.


    (b) Comment on how each factor affects the asymptotic variance of the IV estimator. What happens as $\rho_{zx} \to 0$?

    4. A model with a single endogenous explanatory variable can be written as

    $$y_1 = z_1'\delta_1 + \alpha_1 y_2 + u_1, \quad E[z u_1] = 0,$$

    where $z = (z_1', z_2')'$. Consider the following two-step method, intended to mimic 2SLS:

    (a) Regress $y_2$ on $z_2$, and obtain fitted values $\tilde{y}_2$ (that is, $z_1$ is omitted from the first-stage regression).

    (b) Regress $y_1$ on $z_1, \tilde{y}_2$ to obtain $\tilde{\delta}_1$ and $\tilde{\alpha}_1$. Show that $\tilde{\delta}_1$ and $\tilde{\alpha}_1$ are generally inconsistent. When would $\tilde{\delta}_1$ and $\tilde{\alpha}_1$ be consistent?

    5. In the setup of Sections 1.2-1.3 with $x = (x_1, \ldots, x_K)'$ and $z = (x_1, \ldots, x_{K-1}, z_1, \ldots, z_M)'$, with $x_1 = 1$, assume that $E[zz']$ is nonsingular. Prove that $\mathrm{rank}(E[zx']) = K$ iff at least one $\theta_j$ in equation (1.6) is different from zero.

    6. Consider the model of the previous exercise.

    (a) Find $L(y|z)$ in terms of $\beta_j$, $x_1, \ldots, x_{K-1}$ and $x_K^* = L(x_K|z)$.

    (b) Argue that, provided $x_1, \ldots, x_{K-1}, x_K^*$ are not perfectly collinear, an OLS regression of $y$ on $x_1, \ldots, x_{K-1}, x_K^*$ consistently estimates all $\beta_j$.

    (c) State a necessary and sufficient condition for $x_K^*$ not to be a perfect linear combination of $x_1, \ldots, x_{K-1}$. What 2SLS assumption is this identical to?

    7. Consider a structural linear model with unobserved variable $q$,

    $$y = x'\beta + \gamma q + v, \quad E[v|x, q] = 0.$$

    Suppose, in addition, that $E[q|x] = x'\delta$ for some $K \times 1$ vector $\delta$; thus, $q$ and $x$ are possibly correlated.

    (a) Show that $E[y|x]$ is linear in $x$. What consequences does this fact have for tests of functional form to detect the presence of $q$? Does it matter how strongly $q$ and $x$ are correlated?

    (b) Now add the assumptions $\mathrm{Var}[v|x, q] = \sigma_v^2$, $\mathrm{Var}[q|x] = \sigma_q^2$. Show that $\mathrm{Var}[y|x]$ is constant. What does this fact imply about using tests for heteroskedasticity to detect omitted variables?


    (c) Now write the equation as $y = x'\beta + u$, where $E[xu] = 0$ and $\mathrm{Var}[u|x] = \sigma^2$. If $E[u|x] \neq E[u]$, argue that an LM test regressing squares of residuals $\hat{u}_i^2$ on functions of $x_i$ will detect heteroskedasticity in $u$, at least in large samples.

    8. In the linear model $y = x'\beta + u$ assume that ASS. 1-3 hold with $w$ in place of $z$, where $w$ contains all nonredundant elements of $x$ and $z$. Further, assume that the rank conditions hold for OLS and 2SLS. Show that

    $$AVar\left(\hat{\beta}_{2SLS} - \hat{\beta}_{OLS}\right) = AVar\left(\hat{\beta}_{2SLS}\right) - AVar\left(\hat{\beta}_{OLS}\right).$$

    9. Show that the degrees of freedom of the overidentifying restrictions test is actually $Q_1 = L_2 - G_1$ (and not e.g. $L_2$).

    10. Show that the heteroskedasticity test has $Q$ degrees of freedom (the dimension of $h_i$) despite using residuals $\hat{u}_{ni}^2$ to build it.
