Nielsen GMM


Transcript of Nielsen GMM

  • Econometrics 2, Fall 2005

    Generalized Method of Moments

    (GMM) Estimation

    Heino Bohn Nielsen

    1 of 32

    Outline

    (1) Introduction and motivation

    (2) Moment Conditions and Identification

    (3) A Model Class: Instrumental Variables (IV) Estimation

    (4) Method of Moments (MM) Estimation. Examples: Mean, OLS and Linear IV

    (5) Generalized Method of Moments (GMM) Estimation. Properties: Consistency and Asymptotic Distribution

    (6) Efficient GMM. Examples: Two-Stage Least Squares

    (7) Comparison with Maximum Likelihood. Pseudo-ML Estimation

    (8) Empirical Example: C-CAPM Model

    2 of 32

  • Introduction

    Generalized method of moments (GMM) is a general estimation principle. Estimators are derived from so-called moment conditions.

    Three main motivations:

    (1) Many estimators can be seen as special cases of GMM. Unifying framework for comparison.

    (2) Maximum likelihood estimators have the smallest variance in the class of consistent and asymptotically normal estimators. But: we need a full description of the DGP and correct specification. GMM is an alternative based on minimal assumptions.

    (3) GMM estimation is often possible where a likelihood analysis is extremely difficult. We only need a partial specification of the model. Leading example: models with rational expectations.

    3 of 32

    Moment Conditions and Identification

    A moment condition is a statement involving the data and the parameters:

    g(θ_0) = E[f(w_t, z_t, θ_0)] = 0,  (*)

    where θ is a K × 1 vector of parameters; f(·) is an R-dimensional vector of (non-linear) functions; w_t contains model variables; and z_t contains instruments.

    If we knew the expectation, then we could solve the equations in (*) to find θ_0.

    If there is a unique solution, so that

    E[f(w_t, z_t, θ)] = 0 if and only if θ = θ_0,

    then we say that the system is identified.

    Identification is essential for doing econometrics. Two ideas:

    (1) Is the model constructed so that θ_0 is unique (identification)?

    (2) Are the data informative enough to determine θ_0 (empirical identification)?

    4 of 32

  • Instrumental Variables Estimation

    In many applications, the moment condition has the specific form:

    f(w_t, z_t, θ) = u(w_t, θ) · z_t,

    where u(w_t, θ) is (1 × 1) and z_t is (R × 1), i.e. the R instruments in z_t are multiplied by the disturbance term, u(w_t, θ).

    You can think of u(w_t, θ) as the equivalent of an error term. The moment condition becomes

    g(θ_0) = E[u(w_t, θ_0) · z_t] = 0,

    stating that the instruments are uncorrelated with the error term of the model.

    This class of estimators is referred to as instrumental variables estimators. The function u(w_t, θ) may be linear or non-linear in θ.

    5 of 32

    Example: Moment Condition From RE

    Consider a monetary policy rule, where the interest rate depends on expected future inflation:

    r_t = β · E[π_{t+1} | I_t] + ε_t.

    Noting that any variable can be decomposed as

    x_{t+1} = E[x_{t+1} | I_t] + v_t,

    where v_t is the expectation error, we can write the model as

    r_t = β · E[π_{t+1} | I_t] + ε_t = β · π_{t+1} + (ε_t − β · v_t) = β · π_{t+1} + u_t.

    Under rational expectations, the expectation error, v_t, should be orthogonal to the information set, I_t, and for z_t ∈ I_t we have the moment condition

    E[u_t · z_t] = E[(r_t − β · π_{t+1}) · z_t] = 0.

    This is enough to identify β.

    6 of 32

  • Method of Moments (MM) Estimator

    For a given sample, w_t and z_t (t = 1, 2, ..., T), we cannot calculate the expectation. We replace it with sample averages to obtain the analogous sample moments:

    g_T(θ) = (1/T) Σ_{t=1}^{T} f(w_t, z_t, θ).

    We can derive an estimator, θ̂_MM, as the solution to g_T(θ̂_MM) = 0. To find an estimator, we need at least as many equations as we have parameters. The order condition for identification is R ≥ K.

    R = K is called exact identification. The estimator is denoted the method of moments estimator, θ̂_MM.

    R > K is called over-identification. The estimator is denoted the generalized method of moments estimator, θ̂_GMM.

    7 of 32

    Example: MM Estimator of the Mean

    Assume that y_t is a random variable drawn from a population with expectation μ_0. We have a single moment condition:

    g(μ_0) = E[f(y_t, μ_0)] = E[y_t − μ_0] = 0,

    where f(y_t, μ_0) = y_t − μ_0.

    For a sample, y_1, y_2, ..., y_T, we state the corresponding sample moment condition:

    g_T(μ̂) = (1/T) Σ_{t=1}^{T} (y_t − μ̂) = 0.

    The MM estimator of the mean μ_0 is the solution, i.e.

    μ̂_MM = (1/T) Σ_{t=1}^{T} y_t,

    which is the sample average.

    8 of 32
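    The same estimate can be obtained by treating g_T(μ) = 0 as an equation to solve numerically. A minimal Python sketch on simulated data (the series and its true mean are assumptions chosen for illustration):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=500)  # sample with assumed true mean mu_0 = 2

# Sample moment condition: g_T(mu) = (1/T) * sum_t (y_t - mu)
def g_T(mu):
    return np.mean(y - mu)

mu_mm = brentq(g_T, -10.0, 10.0)  # root of the sample moment condition
print(mu_mm, np.mean(y))          # identical: the MM estimator is the sample average
```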

  • Example: OLS as an MM Estimator

    Consider the linear regression model of y_t on x_t (K × 1):

    y_t = x_t'β_0 + ε_t.  (**)

    Assume that (**) represents the conditional expectation:

    E[y_t | x_t] = x_t'β_0, so that E[ε_t | x_t] = 0.

    That implies the K unconditional moment conditions

    g(β_0) = E[x_t · ε_t] = E[x_t (y_t − x_t'β_0)] = 0,

    which we recognize as the minimal assumption for consistency of the OLS estimator.

    9 of 32

    We define the corresponding sample moment conditions as

    g_T(β̂) = (1/T) Σ_{t=1}^{T} x_t (y_t − x_t'β̂) = (1/T) Σ_{t=1}^{T} x_t y_t − [(1/T) Σ_{t=1}^{T} x_t x_t'] β̂ = 0.

    And the MM estimator is derived as the unique solution:

    β̂_MM = (Σ_{t=1}^{T} x_t x_t')^{-1} Σ_{t=1}^{T} x_t y_t,

    provided that Σ_{t=1}^{T} x_t x_t' is non-singular.

    Method of moments is one way to motivate the OLS estimator. Highlights the minimal (or identifying) assumptions for OLS.

    10 of 32
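    A small numpy sketch of OLS derived from the moment conditions, on simulated data (the design and coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # x_t = (1, x_{2t})', K = 2
beta0 = np.array([1.0, 0.5])
y = X @ beta0 + rng.normal(size=T)

# Solving g_T(b) = (1/T) sum x_t (y_t - x_t'b) = 0 gives
# b = (sum x_t x_t')^{-1} sum x_t y_t, i.e. exactly OLS:
beta_mm = np.linalg.solve(X.T @ X, X.T @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_mm, beta_ols)  # identical up to numerical precision
```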

  • Example: Under-Identification

    Consider again a regression model

    y_t = x_t'β_0 + ε_t = x_{1t}'β_{10} + x_{2t}'β_{20} + ε_t.

    Assume that the K_1 variables in x_{1t} are predetermined, while the K_2 = K − K_1 variables in x_{2t} are endogenous. That implies

    E[x_{1t} · ε_t] = 0  (K_1 × 1)  (A)
    E[x_{2t} · ε_t] ≠ 0  (K_2 × 1).  (B)

    We have K parameters in β_0 = (β_{10}', β_{20}')', but only K_1 < K moment conditions (i.e. K_1 equations to determine K unknowns). The parameters are not identified and cannot be estimated consistently.

    11 of 32

    Example: Simple IV Estimator

    Assume K_2 new variables, z_{2t}, that are correlated with x_{2t} but uncorrelated with ε_t:

    E[z_{2t} · ε_t] = 0.  (C)

    The K_2 moment conditions in (C) can replace (B). To simplify notation, we define

    x_t (K × 1) = (x_{1t}', x_{2t}')'  and  z_t (K × 1) = (x_{1t}', z_{2t}')'.

    x_t are model variables, z_{2t} are new instruments, and z_t are instruments. We say that x_{1t} are instruments for themselves.

    Using (A) and (C) we have K moment conditions:

    g(β_0) = (E[x_{1t} · ε_t]', E[z_{2t} · ε_t]')' = E[z_t · ε_t] = E[z_t (y_t − x_t'β_0)] = 0,

    which are sufficient to identify the K parameters in β.

    12 of 32

  • The corresponding sample moment conditions are given by

    g_T(β̂) = (1/T) Σ_{t=1}^{T} z_t (y_t − x_t'β̂) = 0.

    The method of moments estimator is the unique solution:

    β̂_MM = (Σ_{t=1}^{T} z_t x_t')^{-1} Σ_{t=1}^{T} z_t y_t,

    provided that Σ_{t=1}^{T} z_t x_t' is non-singular.

    Note the following:

    (1) We need the instruments to identify the parameters.

    (2) The MM estimator coincides with the simple IV estimator.

    (3) The procedure only works with exactly K_2 instruments (i.e. R = K).

    (4) Non-singularity of Σ_{t=1}^{T} z_t x_t' requires relevant instruments.

    13 of 32
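    A numpy sketch of the simple IV estimator with one endogenous regressor; the data-generating process below is an assumption chosen so that OLS is inconsistent while IV is not:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
z2 = rng.normal(size=T)                       # new instrument for x2
common = rng.normal(size=T)                   # source of endogeneity
eps = 0.8 * common + rng.normal(size=T)
x2 = 0.7 * z2 + common + rng.normal(size=T)   # endogenous: E[x2 * eps] != 0

X = np.column_stack([np.ones(T), x2])         # x_t = (x_1t, x_2t)', x_1t = constant
Z = np.column_stack([np.ones(T), z2])         # z_t = (x_1t, z_2t)', R = K = 2
y = X @ np.array([1.0, 0.5]) + eps

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # (sum z_t x_t')^{-1} sum z_t y_t
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # inconsistent under endogeneity
print(beta_iv)   # close to (1.0, 0.5)
print(beta_ols)  # slope biased upward
```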

    Generalized Method of Moments Estimation

    The case R > K is called over-identification. More equations than parameters and no solution to g_T(θ) = 0 in general.

    Instead we minimize the distance from g_T(θ) to zero. The distance is measured by the quadratic form

    Q_T(θ) = g_T(θ)'W_T g_T(θ),

    where W_T is an R × R symmetric and positive definite weight matrix.

    The GMM estimator depends on the weight matrix:

    θ̂_GMM(W_T) = arg min_θ {g_T(θ)'W_T g_T(θ)}.

    14 of 32

  • Distances and Weight Matrices

    Consider a simple example with 2 moment conditions,

    g_T(θ) = (g_a, g_b)',

    where the dependence on T and θ is suppressed.

    First consider a simple weight matrix, W_T = I_2:

    Q_T(θ) = g_T(θ)'W_T g_T(θ) = (g_a  g_b) [1 0; 0 1] (g_a, g_b)' = g_a² + g_b²,

    which is the square of the simple distance from g_T(θ) to zero. Here the coordinates are equally important.

    Alternatively, look at a different weight matrix:

    Q_T(θ) = g_T(θ)'W_T g_T(θ) = (g_a  g_b) [2 0; 0 1] (g_a, g_b)' = 2·g_a² + g_b²,

    which attaches more weight to the first coordinate in the distance.

    15 of 32
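    The arithmetic of the two quadratic forms, as a tiny numpy check (the moment values g_a = 0.3 and g_b = 0.1 are made up for illustration):

```python
import numpy as np

g = np.array([0.3, 0.1])                   # assumed sample moments (g_a, g_b)'

Q_identity = g @ np.eye(2) @ g             # 0.3**2 + 0.1**2 = 0.10
Q_weighted = g @ np.diag([2.0, 1.0]) @ g   # 2*0.3**2 + 0.1**2 = 0.19
print(Q_identity, Q_weighted)
```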

    Consistency: Why Does it Work?

    Assume that a law of large numbers (LLN) applies to f(w_t, z_t, θ), i.e.

    (1/T) Σ_{t=1}^{T} f(w_t, z_t, θ) → E[f(w_t, z_t, θ)] for T → ∞.

    That requires IID data or stationarity and weak dependence.

    If the moment conditions are correct, g(θ_0) = 0, then GMM is consistent,

    θ̂_GMM(W_T) → θ_0 as T → ∞,

    for any W_T positive definite.

    Intuition: If a LLN applies, then g_T(θ) converges to g(θ). Since θ̂_GMM(W_T) minimizes the distance from g_T(θ) to zero, it will be a consistent estimator of θ_0, the solution to g(θ_0) = 0.

    The weight matrix, W_T, has to be positive definite, so that we put a positive and non-zero weight on all moment conditions.

    16 of 32

  • Asymptotic Distribution

    Assume a central limit theorem applies to f(w_t, z_t, θ), i.e.:

    √T · g_T(θ_0) = (1/√T) Σ_{t=1}^{T} f(w_t, z_t, θ_0) → N(0, S),

    where S is the asymptotic variance.

    Then it holds that for any positive definite weight matrix, W, the asymptotic distribution of the GMM estimator is given by

    √T (θ̂_GMM − θ_0) → N(0, V).

    The asymptotic variance is given by

    V = (D'WD)^{-1} D'WSWD (D'WD)^{-1},

    where

    D = E[∂f(w_t, z_t, θ)/∂θ']

    is the expected value of the R × K matrix of first derivatives of the moments.

    17 of 32

    Efficient GMM Estimation

    The variance of θ̂_GMM depends on the weight matrix, W_T. The efficient GMM estimator has the smallest possible (asymptotic) variance.

    Intuition: a moment with small variance is informative and should have large weight. It can be shown that the optimal weight matrix, W_T^opt, has the property that

    plim W_T^opt = S^{-1}.

    With the optimal weight matrix, W = S^{-1}, the asymptotic variance simplifies to

    V = (D'S^{-1}D)^{-1} D'S^{-1}SS^{-1}D (D'S^{-1}D)^{-1} = (D'S^{-1}D)^{-1}.

    The best moment conditions have small S and large D. A small S means that the sample variation of the moment (the noise) is small. A large D means that the moment condition is much violated if θ ≠ θ_0, so the moment is very informative on the true value, θ_0.

    18 of 32

  • Hypothesis testing can be based on the asymptotic distribution:

    θ̂_GMM is asymptotically N(θ_0, T^{-1}V̂).

    An estimator of the asymptotic variance is given by V̂ = (D_T'S_T^{-1}D_T)^{-1}, where

    D_T (R × K) = ∂g_T(θ)/∂θ' = (1/T) Σ_{t=1}^{T} ∂f(w_t, z_t, θ)/∂θ'

    is the sample average of the first derivatives. And S_T is an estimator of S = T · V[g_T(θ)]. If the observations are independent, a consistent estimator is

    S_T = (1/T) Σ_{t=1}^{T} f(w_t, z_t, θ̂)f(w_t, z_t, θ̂)'.

    Estimation of the weight matrix is typically the most tricky part of GMM.

    19 of 32

    Test of Overidentifying Moment Conditions

    Recall that K moment conditions are sufficient to estimate the K parameters in θ. If R > K, we can test the validity of the R − K overidentifying moment conditions.

    By MM estimation we can set K moment conditions equal to zero. If all R conditions are valid, then the remaining R − K moments should also be close to zero.

    From the CLT we have, asymptotically,

    g_T(θ_0) ≈ N(0, T^{-1}S).

    If we use the optimal weights, W_T^opt → S^{-1}, then

    J = T · g_T(θ̂_GMM)'W_T^opt g_T(θ̂_GMM) = T · Q_T(θ̂_GMM) → χ²(R − K).

    This is the J-test or the Hansen test for overidentifying restrictions. In linear models it is often referred to as the Sargan test. J is not a test of the validity of the model or the underlying economic theory: it considers whether the R − K moments are in line with the K identifying moments.

    20 of 32

  • Computational Issues

    The estimator is defined by minimizing Q_T(θ). Minimization can be done by solving the first order conditions,

    ∂Q_T(θ)/∂θ = ∂(g_T(θ)'W_T g_T(θ))/∂θ = 0  (K × 1),

    sometimes analytically, but often by numerical optimization.

    We need an optimal weight matrix, W_T^opt, but that depends on the parameters!

    Two-step efficient GMM:

    (1) Choose an initial weight matrix, e.g. W_[1] = I_R, and find a consistent but inefficient first-step GMM estimator, θ̂_[1] = arg min_θ g_T(θ)'W_[1] g_T(θ).

    (2) Find the optimal weight matrix, W_[2]^opt, based on θ̂_[1]. Find the efficient estimator, θ̂_[2] = arg min_θ g_T(θ)'W_[2]^opt g_T(θ).

    The estimator is not unique, as it depends on the initial weight matrix W_[1]. A sketch of this recipe in code follows the next slide.

    21 of 32

    Iterated GMM estimator:

    From the estimator θ̂_[2] it is natural to update the weights, W_[3]^opt, and the estimate, θ̂_[3]. We can switch between estimating W_[·]^opt and θ̂_[·] until convergence. Iterated GMM does not depend on the initial weight matrix. The two approaches are asymptotically equivalent.

    Continuously updated GMM estimator:

    A third approach is to recognize from the outset that the weight matrix depends on the parameters, and minimize

    Q_T(θ) = g_T(θ)'W_T(θ) g_T(θ).

    That is never possible to solve analytically.

    22 of 32
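    Two-step and iterated efficient GMM can be sketched compactly for an overidentified linear IV model, where the minimizer has the closed form derived on the 2SLS slides below. A minimal numpy/scipy sketch on simulated data (the design, coefficients, and convergence tolerance are all illustrative assumptions, not the lecture's empirical setup):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
T = 2000

# Simulated overidentified IV model: K = 2 parameters, R = 3 instruments.
z2a, z2b = rng.normal(size=T), rng.normal(size=T)
common = rng.normal(size=T)                                # source of endogeneity
eps = common + rng.normal(size=T)
x2 = 0.6 * z2a + 0.4 * z2b + common + rng.normal(size=T)   # endogenous regressor
X = np.column_stack([np.ones(T), x2])                      # T x K
Z = np.column_stack([np.ones(T), z2a, z2b])                # T x R
y = X @ np.array([1.0, 0.5]) + eps
R, K = Z.shape[1], X.shape[1]

def beta_gmm(W):
    """Closed-form linear GMM: (X'Z W Z'X)^{-1} X'Z W Z'y."""
    return np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)

def S_T(beta):
    """Heteroskedasticity-robust S_T = (1/T) sum u_t^2 z_t z_t'."""
    u = y - X @ beta
    return (Z * (u ** 2)[:, None]).T @ Z / T

# Step 1: consistent but inefficient, W_[1] = I_R.
beta1 = beta_gmm(np.eye(R))
# Step 2: efficient, W_[2] = S_T(beta1)^{-1}.
W = np.linalg.inv(S_T(beta1))
beta2 = beta_gmm(W)

# Iterated GMM: keep updating weights and estimate until convergence.
beta = beta2
for _ in range(100):
    W = np.linalg.inv(S_T(beta))
    beta_new = beta_gmm(W)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        break
    beta = beta_new

# Standard errors from V_hat = (D_T' W D_T)^{-1} / T with D_T = -(1/T) Z'X.
D = -Z.T @ X / T
V = np.linalg.inv(D.T @ W @ D) / T
se = np.sqrt(np.diag(V))

# Hansen J-test of the R - K overidentifying restrictions.
g = Z.T @ (y - X @ beta) / T
J = T * g @ W @ g
print(beta, se, J, stats.chi2.sf(J, R - K))
```

    The continuously updated estimator would instead place W(θ) = S_T(θ)^{-1} inside the objective and hand Q_T to a numerical optimizer, rather than alternating between weights and estimates.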

  • Example: 2SLS

    Consider again a regression model

    y_t = x_t'β_0 + ε_t = x_{1t}'β_{10} + x_{2t}'β_{20} + ε_t,

    where E[x_{1t} · ε_t] = 0 and E[x_{2t} · ε_t] ≠ 0. Assume that you have R > K valid instruments in z_t, so that

    g(β_0) = E[z_t · ε_t] = E[z_t (y_t − x_t'β_0)] = 0.

    The corresponding sample moments are given by

    g_T(β) (R × 1) = (1/T) Σ_{t=1}^{T} z_t (y_t − x_t'β) = (1/T) Z'(Y − Xβ),

    where Y (T × 1), X (T × K), and Z (T × R) are the stacked data matrices.

    In this case we cannot solve g_T(β) = 0 directly; Z'X is R × K and not invertible.

    23 of 32

    Instead, we derive the GMM estimator by minimizing the criterion function

    Q_T(β) = g_T(β)'W_T g_T(β)
           = (T^{-1}Z'(Y − Xβ))' W_T (T^{-1}Z'(Y − Xβ))
           = T^{-2} (Y'ZW_T Z'Y − 2β'X'ZW_T Z'Y + β'X'ZW_T Z'Xβ).

    We take the first derivative, and the GMM estimator is the solution to

    ∂Q_T(β)/∂β = −2T^{-2}X'ZW_T Z'Y + 2T^{-2}X'ZW_T Z'Xβ = 0.

    We find β̂_GMM(W_T) = (X'ZW_T Z'X)^{-1}X'ZW_T Z'Y, depending on W_T.

    To estimate the optimal weight matrix, W_T^opt = S_T^{-1}, we use the estimator

    S_T = (1/T) Σ_{t=1}^{T} f(w_t, z_t, θ̂)f(w_t, z_t, θ̂)' = (1/T) Σ_{t=1}^{T} û_t² z_t z_t',

    which allows for general heteroskedasticity of the disturbance term.

    24 of 32

  • For the asymptotic distribution, we recall that

    β̂_GMM is asymptotically N(β_0, T^{-1}(D'S^{-1}D)^{-1}).

    The derivative is given by

    D_T (R × K) = ∂g_T(β)/∂β' = ∂[T^{-1} Σ_{t=1}^{T} z_t (y_t − x_t'β)]/∂β' = −T^{-1} Σ_{t=1}^{T} z_t x_t',

    so the variance of the estimator becomes

    V[β̂_GMM] = T^{-1} (D_T'W_T^opt D_T)^{-1}
             = T^{-1} [(T^{-1} Σ_{t=1}^{T} x_t z_t')(T^{-1} Σ_{t=1}^{T} û_t² z_t z_t')^{-1}(T^{-1} Σ_{t=1}^{T} z_t x_t')]^{-1}
             = [(Σ_{t=1}^{T} x_t z_t')(Σ_{t=1}^{T} û_t² z_t z_t')^{-1}(Σ_{t=1}^{T} z_t x_t')]^{-1}.

    Note that this is the heteroskedasticity consistent (HC) variance estimator of White. GMM with allowance for heteroskedastic errors automatically produces heteroskedasticity consistent standard errors!

    25 of 32

    If we assume that the error terms are IID, the optimal weight matrix simplifies to

    S_T = σ̂² (1/T) Σ_{t=1}^{T} z_t z_t' = T^{-1}σ̂² Z'Z,

    where σ̂² is a consistent estimator of σ².

    In this case the efficient GMM estimator becomes

    β̂_GMM = (X'Z S_T^{-1} Z'X)^{-1} X'Z S_T^{-1} Z'Y
          = (X'Z (T^{-1}σ̂² Z'Z)^{-1} Z'X)^{-1} X'Z (T^{-1}σ̂² Z'Z)^{-1} Z'Y
          = (X'Z (Z'Z)^{-1} Z'X)^{-1} X'Z (Z'Z)^{-1} Z'Y,

    which is identical to the two stage least squares (2SLS) estimator.

    The variance of the estimator is

    V[β̂_GMM] = T^{-1} (D_T'S_T^{-1}D_T)^{-1} = σ̂² (X'Z (Z'Z)^{-1} Z'X)^{-1},

    which again coincides with the 2SLS variance.

    26 of 32
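    A short numpy check that GMM with W_T = (Z'Z)^{-1} (the σ̂² factor cancels in the estimator) reproduces the classical two-stage procedure, on simulated data (the design is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
z2a, z2b = rng.normal(size=T), rng.normal(size=T)
common = rng.normal(size=T)
eps = common + rng.normal(size=T)
x2 = 0.5 * z2a + 0.5 * z2b + common + rng.normal(size=T)

X = np.column_stack([np.ones(T), x2])        # T x K
Z = np.column_stack([np.ones(T), z2a, z2b])  # T x R, R = 3 > K = 2
y = X @ np.array([1.0, 0.5]) + eps

# Efficient GMM under IID errors: W_T proportional to (Z'Z)^{-1}.
W = np.linalg.inv(Z.T @ Z)
beta_gmm = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)

# Classical 2SLS: regress X on Z, then y on the fitted values X_hat.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print(beta_gmm)   # identical to...
print(beta_2sls)  # ...the 2SLS estimate
```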

  • Pseudo-ML (PML) Estimation

    The first order conditions for ML estimation can be seen as a sample counterpart to a moment condition:

    (1/T) Σ_{t=1}^{T} s_t(θ) = 0 corresponds to E[s_t(θ)] = 0,

    and ML becomes a special case of GMM.

    θ̂_ML is consistent under weaker assumptions than those maintained by ML. The FOC for a normal regression model corresponds to

    E[x_t (y_t − x_t'β)] = 0,

    which is weaker than the assumption that the entire distribution is correctly specified. OLS is consistent even if ε_t is not normal.

    An ML estimation that maximizes a likelihood function different from the true model's likelihood is referred to as a pseudo-ML or a quasi-ML estimator. Note that the variance matrix is no longer the inverse information.

    27 of 32

    (My Unfair) Comparison of ML and GMM

                   Maximum Likelihood                      Generalized Method of Moments
    Assumptions:   Full specification.                     Partial specification/weak assumptions.
                   Know Density(θ_0) apart from θ_0.       Moment conditions: E[f(data; θ_0)] = 0.
                                                           Strong economic assumptions.

    Efficiency:    Cramér-Rao lower bound                  Efficient based on moment conditions.
                   (smallest possible variance).           Larger variance than Cramér-Rao.

    Typical        Statistical description of the data.    Estimate deep parameters of an
    approach:      Misspecification testing.               economic model.
                                                           Restrictions recover economics.

    Robustness:    First order conditions should hold!     Moment conditions should hold!
                   PML is a GMM interpretation of ML.      Weights and variances can
                   Use larger PML variance.                be made robust.

  • Example: The C-CAPM Model

    Consider the consumption based capital asset pricing (C-CAPM) model of Hansen and Singleton (1982).

    A representative agent maximizes the discounted value of lifetime utility subject to a budget constraint:

    max E[Σ_{s=0}^{∞} β^s u(c_{t+s}) | I_t],  subject to  A_{t+1} = (1 + r_{t+1})A_t + y_{t+1} − c_{t+1},

    where A_t is financial wealth, y_t is income, 0 < β < 1 is a discount factor, and I_t is the information set at time t.

    The first order condition is given by the Euler equation:

    u'(c_t) = E[β · u'(c_{t+1}) · R_{t+1} | I_t],

    where u'(·) is the derivative of the utility function, and R_{t+1} = 1 + r_{t+1} is the return factor.

    29 of 32

    Now assume a constant relative risk aversion (CRRA) utility function:

    u(c_t) = c_t^{1−γ}/(1 − γ), γ < 1,

    so that u'(c_t) = c_t^{−γ}. That gives the explicit Euler equation:

    c_t^{−γ} − E[β · c_{t+1}^{−γ} · R_{t+1} | I_t] = 0.

    To ensure stationarity, we reformulate:

    E[β · (c_{t+1}/c_t)^{−γ} · R_{t+1} − 1 | I_t] = 0,

    which is a conditional moment condition.

    That implies the unconditional moment conditions

    E[f(c_{t+1}, c_t, R_{t+1}; z_t; β, γ)] = E[(β · (c_{t+1}/c_t)^{−γ} · R_{t+1} − 1) · z_t] = 0,

    for all variables z_t ∈ I_t included in the information set.

    30 of 32

  • To estimate the parameters, θ = (β, γ)', we need at least R = 2 instruments in z_t. We try with R = 3 instruments:

    z_t = (1, c_t/c_{t−1}, R_t)'.

    That produces the moment conditions

    E[β · (c_{t+1}/c_t)^{−γ} · R_{t+1} − 1] = 0
    E[(β · (c_{t+1}/c_t)^{−γ} · R_{t+1} − 1) · c_t/c_{t−1}] = 0
    E[(β · (c_{t+1}/c_t)^{−γ} · R_{t+1} − 1) · R_t] = 0,

    for t = 1, 2, ..., T.

    The model is formally identified, but the parameters are poorly determined: weak instruments, little variation in the data, or a wrong model!

    31 of 32
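    A numpy sketch of the C-CAPM moment conditions as a GMM objective. The consumption-growth and return series below are simulated placeholders, not the US data used on the next slide, and the coarse grid minimizer merely stands in for a proper numerical optimizer with two-step or iterated weights:

```python
import numpy as np

# Simulated stand-ins for gross consumption growth c_{t+1}/c_t and the
# return factor R_{t+1}; the slide's actual data are monthly US series.
rng = np.random.default_rng(5)
T = 240
cg = 1.002 + 0.005 * rng.normal(size=T + 1)   # gross consumption growth
R = 1.003 + 0.010 * rng.normal(size=T + 1)    # gross return factor

Zt = np.column_stack([np.ones(T), cg[:-1], R[:-1]])  # z_t = (1, c_t/c_{t-1}, R_t)'

def g_T(theta):
    """Sample moments (1/T) sum (beta*(c_{t+1}/c_t)^(-gamma)*R_{t+1} - 1) z_t."""
    beta, gamma = theta
    u = beta * cg[1:] ** (-gamma) * R[1:] - 1.0   # Euler-equation residual
    return Zt.T @ u / T

def Q(theta):
    g = g_T(theta)
    return g @ g                                  # first-step weights W = I_3

# Crude first-step minimizer over a grid of (beta, gamma) values.
grid = [(b, c) for b in np.linspace(0.97, 1.01, 41)
        for c in np.linspace(-2.0, 4.0, 61)]
beta_hat, gamma_hat = min(grid, key=Q)
print(beta_hat, gamma_hat, g_T((beta_hat, gamma_hat)))
```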

    Results for US data, 1959:3-1978:12:

    Method         Lags   β̂ (s.e.)          γ̂ (s.e.)          T     J       DF   p-val
    2-Step HC       1     0.9987 (0.0086)   0.8770 (3.6792)   237   0.434   1    0.510
    Iterated HC     1     0.9982 (0.0044)   1.0249 (1.8614)   237   1.068   1    0.301
    CU HC           1     0.9981 (0.0044)   0.9549 (1.8629)   237   1.067   1    0.302
    2-Step HAC      1     0.9987 (0.0092)   0.8876 (4.0228)   237   0.429   1    0.513
    Iterated HAC    1     0.9980 (0.0045)   0.8472 (1.8757)   237   1.091   1    0.296
    CU HAC          1     0.9977 (0.0045)   0.7093 (1.8815)   237   1.086   1    0.297
    2-Step HC       2     0.9975 (0.0066)   0.0149 (2.6415)   236   1.597   3    0.660
    Iterated HC     2     0.9968 (0.0045)   0.0210 (1.7925)   236   3.579   3    0.311
    CU HC           2     0.9958 (0.0046)   0.5526 (1.8267)   236   3.501   3    0.321
    2-Step HAC     2     0.9970 (0.0068)   0.1872 (2.7476)   236   1.672   3    0.643
    Iterated HAC    2     0.9965 (0.0047)   0.2443 (1.8571)   236   3.685   3    0.298
    CU HAC          2     0.9952 (0.0048)   0.9094 (1.9108)   236   3.591   3    0.309

    Standard errors in parentheses. HC: heteroskedasticity consistent weights; HAC: heteroskedasticity and autocorrelation consistent weights.

    32 of 32