Classical Inference with ML and GMM Estimates
with Various Rates of Convergence
Lung-fei Lee∗
June 2005
Department of Economics, Ohio State University, Columbus, OH 43210
Abstract
This paper considers classical hypothesis testing in the maximum likelihood (ML) and generalized method of moments (GMM) frameworks, where components of unconstrained (and constrained) estimates of a model may have various rates of convergence and their limiting distributions are asymptotically normal. Sufficient conditions are established under which the likelihood ratio, efficient score, C(α), and Wald-type statistics for the testing of general equality constraints can be asymptotically χ² and are asymptotically equivalent under both the null and a sequence of local alternatives. Similarly, results for the analogous difference test, gradient test, C(α)-type gradient test, and Wald test in the GMM estimation framework are established.
1 Introduction
In this paper, we consider classical hypothesis testing in the maximum like-
lihood (ML) and generalized method of moments (GMM) frameworks, where
components of unconstrained (and constrained) estimates of a model may have
various rates of convergence and their limiting distributions, after proper
normalization, are normal.
We consider the classical hypothesis testing of general (linear or nonlinear) equality constraints on the parameters of an econometric model. In the ML framework, the classical testing procedures include the likelihood ratio (LR) test, the Lagrangian multiplier (LM) (efficient score) test, Neyman's C(α) test, the Wald (W) test, and the minimum distance (MD) test. For the GMM approach, the testing procedures corresponding to the LR, LM, and C(α) tests are, respectively, the difference test, the gradient test (Newey and West 1987; Ruud 2000), and the C(α)-type gradient test (Lee 2005).

∗ I appreciate financial support from the NSF under grant no. 0519204 for my research.
It is well known that in both the ML and GMM frameworks, these classi-
cal test statistics are asymptotically equivalent under both the null hypothesis
and an appropriate local alternative hypothesis, when all parameter estimates
in the model have the same (usually √n) rate of convergence and are asymptotically
normally distributed.¹ When various parameter estimates may have
different rates of convergence, the situation becomes complicated. In this paper,
we investigate the asymptotic properties of the various classical test statistics
when the ML estimates (MLE) or GMM estimates (GMME) of components of
the parameter vector may have different rates of convergence and their prop-
erly normalized estimates are asymptotically normally distributed. Under some
circumstances, we show that the familiar asymptotic properties of the classical
test statistics and their asymptotically equivalent results will still be valid.
In Section 2, we shall set up the situation in both the ML and GMM esti-
mation frameworks, where components of the unconstrained MLE and GMME
may have different rates of convergence. We shall focus on the case where their
asymptotic distributions are normal. Section 3 considers general
equality constraints and asymptotic properties of the constrained MLE and
GMME. The subsequent sections consider the various classical hypothesis statis-
tics and their asymptotic properties. Section 4 specifies the local alternative
hypothesis under consideration. The MD test approach provides a framework
which connects the various classical testing statistics in Section 5. Section 6 con-
siders the Wald test. The LR type tests are in Section 7. The score-type tests
are considered in Section 8. Conclusions are drawn in Section 9. All the proofs
of the propositions are collected in Appendix A. Appendix B provides a result
on the overidentification test in the GMM framework. Appendix C provides an
example to illustrate the importance of a critical assumption (Assumption G or
R) in establishing our asymptotic normal theory.

¹ The exceptional case that has not been considered in the literature is the C(α)-type gradient test statistic, which has only recently been formulated in Lee (2005).
2 MLE and GMME with Various Rates of Convergence
2.1 ML Estimation and the Asymptotic Distribution of the MLE
Let Ln(β) be the likelihood function of the parameter vector β in the parameter
space S, which is a convex and compact subset of the p-dimensional Euclidean
space. The β0 denotes the true parameter vector of β, which lies in the interior
of S. The likelihood function is twice continuously differentiable with respect
to β.
The β can be estimated by the maximization of ln Ln(β) on the parameter
space S. Let βn be the unconstrained MLE of β. We shall assume that the
consistency of βn has been established.2 For the purpose of this paper, our
subsequent analysis will concentrate on the issue of asymptotic distributions of
estimators and the associated inference statistics.
Assumption ML-C. The βn is a consistent estimate of β0.
The asymptotic distribution of βn follows, by the mean value theorem,³ from

    βn − β0 = −[ ∂²ln Ln(β*1,n)/∂β1∂β′ ; · · · ; ∂²ln Ln(β*p,n)/∂βp∂β′ ]⁻¹ ∂ln Ln(β0)/∂β,   (1)

where βl is the lth component of β and the β*l,n's lie between βn and β0. To simplify notation, we shall denote B*n = (β*1,n, · · · , β*p,n) and write ∂²ln Ln(B*n)/∂β∂β′ in place of the matrix of second order derivatives in expression (1).

² For this circumstance, the conventional analysis of establishing the uniform convergence of (1/n) ln Ln(β) to a well-defined limiting objective function (see, e.g., Amemiya (1985)) will not be useful, because the limiting objective function will be flat around some components of β. For the spatial econometric model in Lee (2004a), the analysis is applicable to some concentrated likelihood function. For the analysis of nonstationary binary choice models, Park and Phillips (2000) adopted the approach initiated in Wu (1981).

³ The mean value theorem is applied to each component of ∂ln Ln(β)/∂β. The mean value
Assumption ML-D. Suppose there exists a sequence of invertible p × p matrices Γn such that

    1) −Γn′⁻¹ ∂²ln Ln(B*n)/∂β∂β′ Γn⁻¹ →p Ω;

    2) Γn′⁻¹ ∂ln Ln(β0)/∂β →d N(0, Ω),

where Ω is a p × p positive definite matrix, for any consistent estimates β*j,n, j = 1, · · · , p, in B*n of β0.
Proposition 2.1 Under Assumptions ML-C and ML-D, the MLE βn has the asymptotic distribution

    Γn(βn − β0) = Ω⁻¹ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, Σ),   (2)

where Σ = Ω⁻¹.
If Γn is a diagonal matrix, its diagonal elements will represent the rates of convergence for the components of the MLE vector βn. The rates might not be the same for all components. The asymptotic variance of the MLE βn is Γn⁻¹ Σ Γn′⁻¹ = (Γn′ Ω Γn)⁻¹. From 1) of Assumption ML-D, this asymptotic variance can be estimated by the inverse of the estimated information matrix, (−∂²ln Ln(βn)/∂β∂β′)⁻¹.
2.2 GMM Estimation and the Asymptotic Distribution of the GMME
We start with a framework of k moment equations

    E(fn(β0)) = 0,

where fn : S → ℝ^k with k ≥ p are continuously differentiable mappings. The following basic assumptions are considered:
theorem is applicable to scalar-valued functions but not to vector-valued functions. This distinction is less relevant in conventional asymptotic analysis but, for the analysis in this paper, our assumptions need to take this specific feature into account.
Assumption GMM-D1. There exists a sequence of invertible k × k matrices Λn such that

    Λn fn(β0) →d N(0, V ),

where V is a k × k positive definite variance matrix.
In this case, a possible generalized method of moments estimation may be formulated as

    min_{β∈S}  fn′(β) Λn′ Vn⁻¹ Λn fn(β),   (3)

where Vn is a consistent estimate of V. The objective function takes into account the possibly different rates of convergence of the moments fn(β0) at the true parameter vector β0. If all the moments have the same rate of convergence, this reduces to the conventional GMM objective function. We shall assume that the GMME βn is consistent to begin with.⁴
Assumption GMM-C. The GMME βn is a consistent estimate of β0.

In order to derive appropriate rates of convergence of the GMM estimates, the following assumption is useful. The notation ∂fn(B)/∂β′ shall denote the matrix of partial derivatives of fn with its components evaluated at possibly different values of β, with B ∈ S^k.
Assumption GMM-D2. There exists a sequence of invertible p × p matrices Γn such that

    Λn ∂fn(B)/∂β′ = Fn(B) Γn

for some k × p stochastic matrix Fn(B) on S^k, which converges in probability to a nonstochastic matrix F(B) uniformly on S^k, where F(β0, · · · , β0) has full rank p.
Note that the matrix Γn in Assumption GMM-D2 may not be the one in Assumption ML-D for a single model. However, we use the same notation for some unified assumptions in the subsequent Section 3 for both ML and GMM estimation.

⁴ Contrary to the conventional case, the limiting objective function will be a stochastic function instead of a nonstochastic function. Examples of consistency analyses for related situations are in Moon and Schorfheide (2002) and Lee (2004b).
When B = (β, · · · , β), one may simply write Fn(β) and F(β) for Fn(B) and F(B), respectively. Furthermore, F(β0, · · · , β0) will be denoted by F0 for simplicity. Under these assumptions, the asymptotic normal distribution of βn can be derived.
Proposition 2.2 Under Assumptions GMM-C, GMM-D1, and GMM-D2, the GMME βn has the limiting distribution

    Γn(βn − β0) = −(F0′ V⁻¹ F0)⁻¹ F0′ V⁻¹ Λn fn(β0) + op(1) →d N(0, Σ),   (4)

where Σ = (F0′ V⁻¹ F0)⁻¹.
If Γn is a diagonal matrix, then its diagonal elements represent the various rates of convergence for the components of βn. The asymptotic variance of the GMME βn is Γn⁻¹ Σ Γn′⁻¹ = (Γn′ F0′ V⁻¹ F0 Γn)⁻¹, which can be estimated by

    ( (∂fn′(βn)/∂β) Λn′ Vn⁻¹ Λn (∂fn(βn)/∂β′) )⁻¹.
3 Constrained ML and GMM Estimates
Consider general equality constraints of the form

    R(β) = 0,   (5)

where R : ℝ^p → ℝ^{p−q} forms a set of (p − q) functionally independent constraints, with ∂R(β0)/∂β′ having full rank (p − q). Such constraints may equivalently be represented in the alternative form

    β = g(δ),   (6)

where δ ∈ ℝ^q with q ≤ p is the vector of free parameters, and ∂g(δ0)/∂δ′ has full rank q. Explicitly, suppose that R(β) = 0 as in (5). Let β = (β1*′, β2*′)′ be a partition with β1* ∈ ℝ^{p−q} and β2* ∈ ℝ^q. The β2* can be regarded as the free parameters. Given β2*, β1* can be solved from R(β1*, β2*) = 0 as β1* = g1(β2*). Therefore, the constraints R(β) = 0 can be rewritten as β = g(δ), where δ = β2* and g(δ) = (g1′(β2*), β2*′)′. Conversely, suppose that (6) is satisfied. Decompose (6) into β1* = g1(δ) and β2* = g2(δ), where β = (β1*′, β2*′)′ with β1* ∈ ℝ^{p−q} and β2* ∈ ℝ^q, and g2 is invertible. The corresponding constraint in (5) is β1* − g1(g2⁻¹(β2*)) = 0, with R(β) = β1* − g1(g2⁻¹(β2*)).
For the constraints in (5), we shall consider the situation in the following assumption, where ∂R(B)/∂β′ refers to the matrix of partial derivatives of R with respect to β′ with its (p − q) components evaluated at possibly different values of β.

Assumption R. There exists a (p − q) × (p − q) invertible matrix Cn(B) and a (p − q) × p matrix An(B) with B ∈ S^{p−q} such that

    ∂R(B)/∂β′ · Γn⁻¹ = Cn(B) An(B),   (7)

where An(B) converges to a nonstochastic finite matrix A(B) uniformly in B in a neighborhood of (β0, · · · , β0) in S^{p−q}, and A0 = A(β0, · · · , β0) has full row rank (p − q).
For the constraints in (6), we consider the following situation. Similarly, ∂g(∆)/∂δ′ denotes the partial derivative of g with respect to δ′ with its components evaluated at possibly different values of δ.

Assumption G. There exists a p × q matrix Gn(∆) and a q × q invertible matrix Dn(∆) such that

    Γn ∂g(∆)/∂δ′ = Gn(∆) Dn(∆),   (8)

where Gn(∆) converges uniformly in ∆ to G(∆) in a neighborhood of (δ0, · · · , δ0) ∈ (ℝ^q)^p and G0 = G(δ0, · · · , δ0) has full column rank q.
In order to derive tractable asymptotic distributions of the constrained es-
timates, these assumptions are useful. Appendix C provides an illustrative
example on the asymptotic properties of the MD estimator (MDE) when As-
sumption G is not satisfied. That example illustrates that a general asymptotic
theory might not be feasible if Assumption G does not hold.
In these assumptions, we have paid special attention to each component of the vector-valued functions R(β) and g(δ) because the linear expansion based on the mean value theorem is applicable only to scalar-valued functions. These assumptions are essentially related to each other if the arguments in the various components are the same. Because R(g(δ)) = 0 for all δ, it follows that

    ∂R(g(δ))/∂β′ · ∂g(δ)/∂δ′ = ∂R(g(δ))/∂β′ · Γn⁻¹ · Γn · ∂g(δ)/∂δ′ = 0.

Hence the columns of Γn′⁻¹ ∂R′(g(δ))/∂β and Γn ∂g(δ)/∂δ′ are perpendicular, and the columns of [Γn′⁻¹ ∂R′(g(δ))/∂β, Γn ∂g(δ)/∂δ′] span the p-dimensional Euclidean space ℝ^p. Suppose (7) holds for some An(β) such that ∂R(β)/∂β′ · Γn⁻¹ = Cn(β)An(β). Then An′(β) will span the same column subspace as Γn′⁻¹ ∂R′(g(δ))/∂β in ℝ^p. The matrix Gn(δ) can be chosen such that its columns are perpendicular to the columns of An′(β) and span the orthogonal complement of the column space of An′(β). The Gn(δ) can be taken as the orthonormal submatrix corresponding to the eigenvectors of (Ip − An′(β)[An(β)An′(β)]⁻¹An(β)) with the nonzero (unit) eigenvalues. Because Gn(δ) and Γn ∂g(δ)/∂δ′ span the same column space in ℝ^p, there must exist an invertible transformation Dn(δ) such that Γn ∂g(δ)/∂δ′ = Gn(δ)Dn(δ), as in (8) of Assumption G. Similarly, (8) implies (7) when the components are evaluated at the same argument.
For simplicity, when B = (β, · · · , β), An(B) will be denoted by An(β) and Cn(B) by Cn(β). Similarly, when ∆ = (δ, · · · , δ), Gn(∆) and Dn(∆) will be written as Gn(δ) and Dn(δ), respectively. Note that the rows of A0 and the columns of G0 are always perpendicular to each other, as in the following lemma.

Lemma 3.1 Under Assumptions R and G, An(β)Gn(δ) = 0, where β = g(δ), for all δ. This implies, in particular, A0G0 = 0 and the identity

    P^{1/2} A0′ (A0 P A0′)⁻¹ A0 P^{1/2} = Ip − P^{−1/2} G0 (G0′ P⁻¹ G0)⁻¹ G0′ P^{−1/2}

holds for any p × p positive definite matrix P.
The identity is useful for equivalent expressions of some test statistics and their limiting distributions in subsequent sections. As will be shown in subsequent sections, while Γn provides the proper rates for the unconstrained and constrained estimators of β0, if Dn = Dn(∆) does not depend on δ, Dn will provide the rates matrix for the (constrained) estimator of δ0. If Cn = Cn(B) does not depend on β, Cn⁻¹ provides the rates matrix of R(βn) for the unconstrained estimate βn. If Dn(∆) does depend on δ, the following situation may be considered.

Assumption GL. Dn(∆) = D2n(∆)D1n, where D2n(∆) is invertible and D2n⁻¹(∆) converges to a matrix S(∆) uniformly in ∆ in a neighborhood of ∆0 = (δ0, · · · , δ0) at which S(∆) is continuous.

In the event Dn(∆) does not depend on δ to begin with, Assumption GL will be redundant, as D2n(∆) shall be an identity matrix. Note that we have restricted neither Dn(∆) nor D1n to be diagonal matrices. The implications of such cases are illustrated in some examples of constraints in Lee (2004b). Finally, we note that Assumption GL will not be needed for the asymptotic properties of the various test statistics. Assumption GL is relevant only for the asymptotic distribution and the rate of convergence of estimators of δ0. Assumption GL allows the possibility that the resulting (constrained) estimator of δ0 may have a degenerate distribution after proper rates normalization.
3.1 Examples
Here we provide a simple example in a GMM estimation framework, where
Assumptions GMM-D1 and GMM-D2 hold, and also an example in the ML
framework. Assumption R and/or Assumption G will also be valid for some
restrictions of interest. Other relatively complicated examples can be found in
Lee (2004b).
3.1.1 A model of social interactions with rational expectations

The illustrative example is a model of social interactions with rational expectations, as in Manski (1993) and Brock and Durlauf (2001). The social interactions model under consideration is

    yri = λ (1/mr) Σ_{j=1}^{mr} E(yrj | Jr) + xri,1 α1 + (1/mr) Σ_{j=1}^{mr} xrj,2 α2 + ur + εri,   (9)
with i = 1, · · · , mr and r = 1, · · · , R in a group setting, where r refers to the rth group and R is the total number of groups in the sample, while i refers to the ith individual in a group and mr is the total number of members in the rth group. The ur represents the group-specific unobservable variable. The Jr denotes the information set of group r, which includes all exogenous variables xri,1, xri,2 for i = 1, · · · , mr and r = 1, · · · , R. The disturbances εri are i.i.d. (0, σ²) for all r and i. In this model, expected outcomes of the group may influence the outcome of each individual member in the group. The expected outcomes shall be determined as equilibrium outcomes of the equation. The parameter λ captures this possible effect of the expected group outcome on the individual's behavior, which has been termed an endogenous effect in Manski (1993). The variables (1/mr) Σ_{j=1}^{mr} xrj,2 may capture interaction effects on an individual's behavior through observed characteristics of his/her group, which is termed an exogenous interaction effect or contextual effect. For the identification of the parameters, Manski has noted that xri,1 shall contain relevant exogenous variables not included in xri,2.
For the estimation of this model, one may consider the GMM estimation framework. The structural equation (9) implies that

    (1/mr) Σ_{i=1}^{mr} E(yri | Jr) = (1/(1 − λ)) (x̄r,1 α1 + x̄r,2 α2 + ur).

Hence, the structural equation can be rewritten as

    yri = (xri,1 − x̄r,1) α1 + x̄r,1 α1/(1 − λ) + x̄r,2 α2/(1 − λ) + ur + εri,

where x̄r,l = (1/mr) Σ_{i=1}^{mr} xri,l for l = 1, 2. The structural equation can be conveniently decomposed into the within-group and between-group equations:

    yri − ȳr = (xri,1 − x̄r,1) α1 + (εri − ε̄r),   i = 1, · · · , mr; r = 1, · · · , R,   (10)

and

    ȳr = x̄r,1 α1/(1 − λ) + x̄r,2 α2/(1 − λ) + ur + ε̄r,   r = 1, · · · , R,   (11)

where ε̄r = (1/mr) Σ_{i=1}^{mr} εri.
Under the specification of a random component model where ur is uncorrelated with x̄r,1 and x̄r,2, the moment conditions of this model can be E[(xri,1 − x̄r,1)′(εri − ε̄r)] = 0 and E[x̄r′(ur + ε̄r)] = 0, where x̄r consists of all distinct variables in x̄r,1 and x̄r,2. These moment conditions can be used for the GMM estimation. Let β = (α1′, α2′, λ)′. Suppose that xr,1 is of dimension k1 and x̄r is of dimension k. The empirical (k1 + k)-dimensional vector of moment functions is

    fn(β) = ( (1/n) Σ_{r=1}^{R} Σ_{i=1}^{mr} (xri,1 − x̄r,1)′(yri − xri,1 α1) ;
              (1/R) Σ_{r=1}^{R} x̄r′( ȳr − x̄r,1 α1/(1 − λ) − x̄r,2 α2/(1 − λ) ) ).
For this set of moments, take Λn = diag(√n I_{k1}, √R I_k). One can see that, in general,

    Λn fn(β0) = ( (1/√n) Σ_{r=1}^{R} Σ_{i=1}^{mr} (xri,1 − x̄r,1)′ εri ;
                  (1/√R) Σ_{r=1}^{R} x̄r′(ur + ε̄r) ) →d N(0, V ),
which satisfies Assumption GMM-D1. The gradient matrix of fn(β) with respect to β is

    ∂fn(β)/∂β′ = ( ∂fn(β)/∂α1′, ∂fn(β)/∂α2′, ∂fn(β)/∂λ ) = [ An  0  0 ; Bn  Cn  Dn ],

where

    An = −(1/n) Σ_{r=1}^{R} Σ_{i=1}^{mr} (xri,1 − x̄r,1)′(xri,1 − x̄r,1),   Bn = −(1/((1 − λ)R)) Σ_{r=1}^{R} x̄r′ x̄r,1,

and

    Cn = −(1/((1 − λ)R)) Σ_{r=1}^{R} x̄r′ x̄r,2,   Dn = −(1/((1 − λ)²R)) Σ_{r=1}^{R} x̄r′(x̄r,1 α1 + x̄r,2 α2).

It follows that Λn ∂fn(β)/∂β′ = Fn(β) Γn, where

    Fn(β) = [ An  0  0 ; √(R/n) Bn  Cn  Dn ],   Γn = diag(√n I_{k1}, √R I_{k2}, √R).

Under the assumption that R/n converges to a finite constant or to 0 as n → ∞, the limiting matrix F(β) of Fn(β) can have full column rank when x̄r,1 has at least one distinct relevant variable not included in x̄r,2. Thus, Assumption GMM-D2 can be satisfied.
The hypotheses that are of interest may be the tests on whether the inter-
action effects are significant or not. We may consider three cases: 1) λ = 0, 2)
α2 = 0, and 3) both λ = 0 and α2 = 0.
1) H0 : λ = 0. For this case, R(β) = (0, 0, 1)β and, hence, ∂R(β)/∂β′ = (0, 0, 1). It follows that ∂R(β)/∂β′ Γn⁻¹ = Cn An(β) with Cn = 1/√R and An(β) = (0, 0, 1), which has full row rank 1. Thus, Assumption R is satisfied. Alternatively, consider β = g(δ), where δ = (α1′, α2′)′ and g(δ) = (α1′, α2′, 0)′. As

    ∂g(δ)/∂δ′ = [ I_{k1}  0 ; 0  I_{k2} ; 0  0 ],

we have Γn ∂g(δ)/∂δ′ = Gn(δ) Dn with Gn(δ) = [ I_{k1}  0 ; 0  I_{k2} ; 0  0 ] and Dn = diag(√n I_{k1}, √R I_{k2}). Thus, Assumption G is valid.
2) H0 : α2 = 0. This case has R(β) = (0, I_{k2}, 0)β. As ∂R(β)/∂β′ Γn⁻¹ = Cn An(β) with Cn = (1/√R) I_{k2} and An(β) = (0, I_{k2}, 0), which has full row rank, Assumption R holds. Alternatively, g(δ) = (α1′, 0′, λ)′ with δ = (α1′, λ)′. As

    ∂g(δ)/∂δ′ = [ I_{k1}  0 ; 0  0 ; 0  1 ],

we have Γn ∂g(δ)/∂δ′ = Gn(δ) Dn with Gn(δ) = [ I_{k1}  0 ; 0  0 ; 0  1 ], which has full column rank, and Dn = diag(√n I_{k1}, √R). Thus, Assumption G is valid.
3) H0 : λ = 0, α2 = 0. This case corresponds to R(β) = [ 0  I_{k2}  0 ; 0  0  1 ] β. Equivalently, g(δ) = (α1′, 0′, 0)′, where δ = α1. Assumption R is satisfied with Cn = diag( (1/√R) I_{k2}, 1/√R ) and An(β) = [ 0  I_{k2}  0 ; 0  0  1 ]. Assumption G is satisfied with Gn(δ) = [ I_{k1} ; 0 ; 0 ] and Dn = √n I_{k1}.
3.1.2 Mixed Estimation

Theil and Goldberger (1961) considered the pooling of sample information and stochastic restrictions in a mixed estimation framework. The mixed estimation issue may be extended to a general nonlinear restriction framework, where the possibly different degrees of information in the sample and in the prior stochastic restrictions are represented by different rates of convergence.

Let ln Ln1(β1) be the log likelihood function which represents the sample information about β1. Suppose that this log likelihood function satisfies the standard regularity conditions of conventional likelihood theory; in particular, −(1/n) ∂²ln Ln1(B1n)/∂β1∂β1′ →p V1 and (1/√n) ∂ln Ln1(β10)/∂β1 →d N(0, V1), where V1 = −plim_{n→∞} (1/n) ∂²ln Ln1(β10)/∂β1∂β1′. Suppose that the stochastic prior information (restrictions) on β is β̄2 = h(β1) + ε2, where β̄2 is an estimate of β20 = h(β10) such that γn2(β̄2 − β20) →d N(0, V2). The β2 is a (p − q)-dimensional vector and β1 is a q-dimensional vector. The ∂h(β1)/∂β1′ is a (p − q) × q matrix with full rank. Without loss of generality in the asymptotic analysis, ε2 may be assumed to be N(0, γn2⁻² V2). The sample information and the prior information are, as usual, assumed to be independent. Thus, for this case, Assumption ML-D is satisfied for the unrestricted parameter vector β = (β1′, β2′)′ with Γn = diag(γn1 Iq, γn2 I_{p−q}), γn1 = √n, and Ω = diag(V1, V2⁻¹).
For the constrained estimation, one has β = g(δ), where δ = β1 and g(δ) = (δ′, h′(δ))′. The sample and the prior information will be mixed together for the estimation of δ. An interesting question is whether the prior information will be of any importance when one has a large sample. The answer to this question in our setting will depend on the relative ratio of the rates γn1 and γn2. The rate of the constrained (mixed) estimator of δ is also of interest. For this model, it can be shown that Assumptions G and GL will be satisfied. The (mixed) estimate of δ will have the rate of D1n. The resulting rate matrix D1n will depend on the ratio of γn1 and γn2.
(1) γn1/γn2 → ∞. In this case, Γn ∂g(∆)/∂δ′ = Gn(∆) D1n, where D1n = γn1 Iq and Gn(∆) = ( Iq ; (γn2/γn1) ∂h(∆)/∂δ′ ). The limiting matrix G0 = ( Iq ; 0 ) has full rank q. This corresponds to the case where the prior information is relatively much weaker than that of the sample, so the mixed estimate δn has the γn1-rate of convergence, i.e., the √n-rate.
(2) γn1/γn2 → c, where c > 0 is a finite constant. In this case, D1n = γn1 Iq, Gn(∆) = ( Iq ; (γn2/γn1) ∂h(∆)/∂δ′ ), and G0 = ( Iq ; c⁻¹ ∂h(δ0)/∂δ′ ) has full rank q. As the two rates are of the same order, δn has that common rate. Both the prior information and the sample are useful for the constrained estimation.
(3) γn1/γn2 → 0. In this case, the prior information is relatively stronger than the sample information.
(i) Case 1: (p − q) ≥ q. In this case, ∂h(δ)/∂δ′ has rank q. Take D1n = γn2 Iq and Gn(∆) = ( (γn1/γn2) Iq ; ∂h(∆)/∂δ′ ); then G0 = ( 0 ; ∂h(δ0)/∂δ′ ), which has rank q.
(ii) Case 2: (p − q) < q. Let m = q − (p − q). As ∂h(δ)/∂δ′ has only rank (p − q), the G0 of the preceding Case 1 is not relevant. The search for D1n and Gn(∆) is relatively complicated. Take D1n = γn1 Iq. Then Γn (∂g(∆)/∂δ′) D1n⁻¹ = ( Iq ; (γn2/γn1) ∂h(∆)/∂δ′ ), which does not converge. Decompose ∂h(∆)/∂δ′ = (H1(∆), H2(∆)), where H2(∆) is an invertible (p − q) × (p − q) matrix and H1(∆) is a (p − q) × m matrix. Consider the following matrix, which is invertible:

    D2n⁻¹(∆) = [ 0                      Im ;
                 (γn1/γn2) H2⁻¹(∆)   −H2⁻¹(∆) H1(∆) ].

It follows that (γn2/γn1) (∂h(∆)/∂δ′) D2n⁻¹(∆) = ( I_{p−q}  0 ). The relevant Gn matrix can be taken as

    Gn(∆) = ( Iq ; (γn2/γn1) ∂h(∆)/∂δ′ ) D2n⁻¹(∆) = [ 0  Im ; (γn1/γn2) H2⁻¹(∆)  −H2⁻¹(∆) H1(∆) ; I_{p−q}  0 ].

The limiting matrix is G0 = [ 0  Im ; 0  −H2⁻¹(δ0) H1(δ0) ; I_{p−q}  0 ], which has full rank q. Thus, Assumptions G and GL are satisfied. The rate for the constrained estimator δn will be γn1.
3.2 Constrained ML Estimation
Consider the constraints in the form β = g(δ), where β ∈ <p and δ ∈ <q with
q ≤ p. Let Lcn(δ) = Ln(g(δ)) be the likelihood function of the constrained
parameter vector δ. Let δ0 denote the corresponding true parameter vector of
δ.
The δ can be estimated by the maximization of ln Lcn(δ) on the parameter space of δ. Let δn be the MLE of δ. We shall assume that the consistency of δn has been established.⁵

⁵ In a linear time series model, Nagaraj and Fuller (1991) establish the consistency of the constrained estimator via the consistency of the unconstrained estimator. In general, one may expect that arguments establishing the consistency of the unconstrained estimator might carry over to the consistency of the constrained estimator.
Assumption ML-C′. The δn is a consistent estimate of δ0.
The following proposition provides the asymptotic distribution of the con-
strained MLE δn.
Proposition 3.1 Under Assumptions ML-C′, ML-D, G and GL,

    D1n(δn − δ0) = S0 [G0′ Ω G0]⁻¹ G0′ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, S0 (G0′ Ω G0)⁻¹ S0′),

where S0 = S(∆0).
Note that if S0 does not have full rank, some components of D1n(δn − δ0) may be asymptotically linearly dependent and, hence, the limiting distribution of D1n(δn − δ0) can be degenerate.
The following proposition provides the asymptotic distribution of the con-
strained MLE βcn of β0.
Proposition 3.2 Under Assumptions ML-C′, ML-D and G,

    Γn(βcn − β0) = G0 [G0′ Ω G0]⁻¹ G0′ Γn′⁻¹ ∂ln Ln(β0)/∂β + op(1) →d N(0, G0 (G0′ Ω G0)⁻¹ G0′).

The constrained MLE βcn of β0 is asymptotically efficient relative to the unconstrained MLE βn.
The constrained MLE βcn has the same rate matrix Γn as the unconstrained MLE βn but is asymptotically efficient relative to the unconstrained MLE under the null hypothesis. The asymptotic variance of βcn is Γn⁻¹ G0 (G0′ Ω G0)⁻¹ G0′ Γn′⁻¹, which can be estimated by

    (∂g(δn)/∂δ′) [ (∂g′(δn)/∂δ) (−∂²ln Ln(βcn)/∂β∂β′) (∂g(δn)/∂δ′) ]⁻¹ (∂g′(δn)/∂δ),   (12)

which is also familiar in the regular case. The asymptotic variance estimator in (12) provides robust estimates of the asymptotic variances of constrained ML estimates for both the regular case and the irregular case under consideration.
3.3 Constrained GMM Estimation
The constrained GMM estimation of δ0 is

    min_δ  fn′(g(δ)) Λn′ Vn⁻¹ Λn fn(g(δ)).
Let δn be the constrained GMME of δ0. The corresponding constrained GMME
of β0 is βcn = g(δn).
Assumption GMM-C′. The constrained GMME δn is a consistent esti-
mate of δ0.
Proposition 3.3 Under Assumptions GMM-D1, GMM-D2, GMM-C′, and G, the constrained GMME βcn has the limiting distribution

    Γn(βcn − β0) →d N(0, G0 (G0′ F0′ V⁻¹ F0 G0)⁻¹ G0′),

and is asymptotically efficient relative to the unconstrained GMME βn. In addition, suppose that Assumption GL is satisfied; then

    D1n(δn − δ0) = −S0 (G0′ F0′ V⁻¹ F0 G0)⁻¹ G0′ F0′ V⁻¹ Λn fn(β0) + op(1) →d N(0, S0 (G0′ F0′ V⁻¹ F0 G0)⁻¹ S0′).

From the limiting distribution of the constrained GMME δn, the rates of convergence for the components of δn are in D1n. The constrained GMME βcn has the same rates Γn as the unconstrained GMME βn, but can be asymptotically efficient relative to the unconstrained one.
The asymptotic variance of the constrained GMME βcn is

    Γn⁻¹ G0 (G0′ F0′ V⁻¹ F0 G0)⁻¹ G0′ Γn′⁻¹ = (∂g(δ0)/∂δ′) (Dn′ G0′ F0′ V⁻¹ F0 G0 Dn)⁻¹ (∂g′(δ0)/∂δ)

under Assumption G, which can be estimated by

    (∂g(δn)/∂δ′) [ (∂g′(δn)/∂δ) (∂fn′(g(δn))/∂β) Λn′ Vn⁻¹ Λn (∂fn(g(δn))/∂β′) (∂g(δn)/∂δ′) ]⁻¹ (∂g′(δn)/∂δ)

under Assumption GMM-D2.
4 Hypothesis Testing Under the Null and Local Alternative Hypotheses
In the subsequent sections, we shall consider hypothesis tests, which include
the MD, W, LR, and LM tests under the ML framework. For the GMM, the
tests corresponding to the LR and LM tests shall be the difference (D) test and
the gradient (G) test. The null hypothesis is H0 : R(β0) = 0 or, equivalently, β0 = g(δ0). We shall also investigate the asymptotic properties of the various test statistics under the local alternative

    H1 : βn0 = β0 + Γn⁻¹ ∆   (13)

for some constant vector ∆, where Γn is the same rates matrix as in Assumptions ML-D, GMM-D2, R, and G. Under H1, while R(β0) = 0, R(βn0) may not be zero. Corresponding to β0, there exists a unique δ0 such that β0 = g(δ0), but βn0 may not be in the image of g(δ) for any δ.
Under this local alternative, condition 2) in Assumption ML-D shall be modified with the sequence of true parameter vectors βn0:

Assumption ML-D′. Under H1 : βn0 = β0 + Γn⁻¹∆, where ∆ is a constant vector,

    1) −Γn′⁻¹ ∂²ln Ln(B*n)/∂β∂β′ Γn⁻¹ →p Ω;

    2) Γn′⁻¹ ∂ln Ln(βn0)/∂β →d N(0, Ω),

where Ω is a positive definite matrix, for any consistent estimates β*j,n, j = 1, · · · , p, of β0.
Similarly, Assumption GMM-D1 shall be replaced by

Assumption GMM-D1′. Under H1 : βn0 = β0 + Γn⁻¹∆, there exists a sequence of invertible k × k matrices Λn such that

    Λn fn(βn0) →d N(0, V ),

where V is a k × k positive definite variance matrix.
The concept of contiguity of probability measures is useful for establishing convergence in probability of statistics under H1 in (13) when the convergence of such statistics under H0 is known. Suppose that Pn and Qn are two sequences of probability measures. The sequence Qn is said to be contiguous to Pn if, for any sequence of random variables (or events) Tn for which Tn → 0 in Pn-probability, Tn → 0 in Qn-probability (Le Cam, 1960; Hajek and Sidak, 1967). Contiguity can be established by Le Cam's first lemma on the log likelihood ratio of density functions corresponding to Pn and Qn (Le Cam, 1960; Bickel et al., 1993, Appendix A.9). For our purpose, the sequence Pn corresponds to the distribution of the model under H0 with the true parameter β0, and Qn corresponds to the distributions under the sequence of local alternatives in (13).

Proposition 4.1 Let Tn be a statistic. Under Assumption ML-D, if Tn →p 0 under H0, then Tn →p 0 under H1.
The contiguity properties of the model under H0 and H1 are useful for establishing the asymptotic distributions of the unconstrained and constrained estimators under H1.
The following proposition provides the asymptotic distributions of both the unconstrained and constrained MLEs under the sequence of local alternatives.

Proposition 4.2 Suppose Assumptions ML-C′, ML-D′ and G hold. Under the local alternative H1, the unconstrained MLE βn has the asymptotic distribution

    Γn(βn − β0) = Ω⁻¹ Γn′⁻¹ ∂ln Ln(βn0)/∂β + ∆ + op(1) →d N(∆, Ω⁻¹);

and the constrained MLE βcn has the asymptotic distribution

    Γn(βcn − β0) = G0 [G0′ Ω G0]⁻¹ G0′ [Γn′⁻¹ ∂ln Ln(βn0)/∂β + Ω∆] + op(1) →d N(G0 (G0′ Ω G0)⁻¹ G0′ Ω∆, G0 (G0′ Ω G0)⁻¹ G0′).

Furthermore, under the additional Assumption GL,

    D1n(δn − δ0) = S0 [G0′ Ω G0]⁻¹ G0′ [Γn′⁻¹ ∂ln Ln(βn0)/∂β + Ω∆] + op(1) →d N(S0 (G0′ Ω G0)⁻¹ G0′ Ω∆, S0 (G0′ Ω G0)⁻¹ S0′).
For the GMM estimation, in place of features of the underlying likelihood function of a model, we shall assume directly that the contiguity property of H1 with respect to H0 holds.

Assumption CT. The distributions under the sequence of local alternatives H1 : βn0 = β0 + Γn⁻¹∆ are contiguous to the distribution under H0.

The following proposition summarizes the asymptotic distributions of the unconstrained and constrained GMM estimators under the sequence of local alternatives.
Proposition 4.3 Suppose that Assumptions GMM-D1′, GMM-D2, GMM-C′, and G hold. Under the local alternative H1 and Assumption CT, the unconstrained GMM estimate $\beta_n$ has the following asymptotic distribution:

$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_{n0}) + \Delta + o_p(1) \xrightarrow{d} N\big(\Delta,\ (F_0'V^{-1}F_0)^{-1}\big),$$

and the constrained GMM estimate $\beta_{cn}$ has

$$\Gamma_n(\beta_{cn} - \beta_0) \xrightarrow{d} N\big(G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0\Delta,\ G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'\big).$$

Under the additional Assumption GL,

$$D_{1n}(\delta_n - \delta_0) = -S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}[\Lambda_n f_n(\beta_{n0}) - F_0\Delta] + o_P(1) \xrightarrow{d} N\big(S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0\Delta,\ S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}S_0'\big).$$
For the unconstrained MLE, the limiting variance is $\Sigma = \Omega^{-1}$, and the limiting variance of the unconstrained GMME is $\Sigma = (F_0'V^{-1}F_0)^{-1}$, under both the null and local alternative hypotheses. The next section shall first explore the MD test. Subsequently, we can show that the classical test statistics are asymptotically equivalent to the MD test under both the null hypothesis and the local alternative hypothesis. It is interesting to point out that, for all these statistics, $\frac{\partial f_n'(\beta)}{\partial\beta}\Lambda_n'V^{-1}\Lambda_n\frac{\partial f_n(\beta)}{\partial\beta'}$ in the GMM approach plays a role similar to that of $\big(-\frac{\partial^2 \ln L_n(\beta)}{\partial\beta\partial\beta'}\big)$ in the ML approach.
5 The Minimum Distance Test
Suppose that an unconstrained estimator of $\beta_0$ is $\beta_n$ with

$$\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, \Sigma), \tag{14}$$
where the limiting variance matrix Σ is positive definite. If Γn is a diagonal
matrix, its diagonal elements will consist of various rates of convergence for
components of the estimate βn.
With the unconstrained estimate βn, a constrained estimator under the con-
straints R(β) = 0 can be derived by minimizing a weighted distance subject to
the constraints:
$$\min_\beta\ [\Gamma_n(\beta_n - \beta)]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta)] \quad \text{subject to } R(\beta) = 0, \tag{15}$$

where $\Sigma_n$ is a consistent estimate of $\Sigma$. Equivalently, in terms of the constraints in the form $\beta = g(\delta)$, the MD estimation is

$$\min_\delta\ [\Gamma_n(\beta_n - g(\delta))]'\Sigma_n^{-1}[\Gamma_n(\beta_n - g(\delta))]. \tag{16}$$
In the ML estimation under our setting, a consistent estimate of $\Sigma^{-1} = \Omega$ is $\big(-\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\big)$. A version of the MD estimation in (15) can be based on the distance

$$(\beta_n - \beta)'\Big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Big)(\beta_n - \beta).$$
For the GMM estimation under our setting, the unconstrained estimator $\beta_n$ has the limiting distribution $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N\big(0, (F_0'V^{-1}F_0)^{-1}\big)$. The MD approach can be $\min_\beta\,[\Gamma_n(\beta_n - \beta)]'F_n'V_n^{-1}F_n[\Gamma_n(\beta_n - \beta)]$, where $F_n = F_n(\beta_n, \cdots, \beta_n)$ is a consistent estimate of $F_0$. Alternatively, the distance function can be

$$(\beta_n - \beta)'\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)(\beta_n - \beta).$$

These two formulations are identical because, under the situation in Assumption GMM-D2, $\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'} = \Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\Gamma_n$.
Let βcm,n be the constrained MDE of β0 with R(β0) = 0, and δm be the
MDE of δ0. From (15) and (16), βcm,n = g(δm) when the same weighting matrix
Σ−1n is used in both (15) and (16). The consistency of βcm,n can be established
with the arguments in Moon and Schorfheide (2002) and Lee (2004b). We would
like to show that the minimized distance function in (15) can be a useful test
statistic for the constraints.
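To make the computation in (15)-(16) concrete, here is a minimal numerical sketch of the constrained MD estimation. Everything in it is hypothetical: a two-parameter setting with illustrative values for $\beta_n$, $\Gamma_n$, $\Sigma_n$ and a made-up constraint $R(\beta) = \beta_1 - \beta_2^2 = 0$; it is not an estimator taken from this paper.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical ingredients (placeholders, not from the paper):
beta_hat = np.array([1.1, 0.9])                 # unconstrained estimate beta_n
Gamma_n = np.diag([100.0, 10.0])                # rates matrix: components converge at different rates
Sigma_n = np.array([[1.0, 0.3], [0.3, 2.0]])    # consistent estimate of the limiting variance
Sigma_inv = np.linalg.inv(Sigma_n)

def md_distance(beta):
    """Weighted distance [Gamma_n (beta_n - beta)]' Sigma_n^{-1} [Gamma_n (beta_n - beta)]."""
    d = Gamma_n @ (beta_hat - beta)
    return d @ Sigma_inv @ d

# Illustrative constraint R(beta) = beta_1 - beta_2^2 = 0
cons = {"type": "eq", "fun": lambda b: b[0] - b[1] ** 2}
res = minimize(md_distance, x0=beta_hat, constraints=[cons])
beta_cm = res.x         # constrained MDE
md_stat = res.fun       # minimized distance, usable as a test statistic
```

Because the first component carries the faster rate here, the minimizer keeps $\beta_1$ close to its unconstrained value and lets $\beta_2$ adjust to satisfy the constraint.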
Proposition 5.1 Suppose that Assumption R or Assumption G is satisfied. Under the null hypothesis H0, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, \Sigma)$, the MDE $\beta_{cm,n}$ has the asymptotic distribution:

$$\Gamma_n(\beta_{cm,n} - \beta_0) = [I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0]\cdot\Gamma_n(\beta_n - \beta_0) + o_p(1) = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\cdot\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, \Sigma_{cm}),$$

where $\Sigma_{cm} = \Sigma - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'$. In addition, under Assumption GL,

$$D_{1n}(\delta_{cm,n} - \delta_0) = S_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N\big(0,\ S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$

Under the sequence of local alternatives H1 and Assumption CT, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$,

$$\Gamma_n(\beta_{cm,n} - \beta_0) \xrightarrow{d} N(\mu, \Sigma_{cm}),$$

where $\mu = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Delta = (I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0)\Delta$. Furthermore, under Assumption GL,

$$D_{1n}(\delta_{cm,n} - \delta_0) \xrightarrow{d} N\big(S_0\mu,\ S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$
The MDE $\beta_{cm,n}$ is asymptotically efficient relative to the unconstrained estimate $\beta_n$ because $\Sigma_{cm}$ is smaller than $\Sigma$ by the generalized Schwartz inequality. Assumption G plays a crucial role in the asymptotic distribution of $\delta_{cm,n}$ and, hence, in the asymptotic distribution of the constrained estimator $\beta_{cm,n}$. Appendix C provides an illustrative example of the asymptotic properties of the MDE when Assumption G is not satisfied. That example illustrates that the possible rates of convergence can be rather complicated and that the asymptotic distributions need not be normal.
Proposition 5.2 Suppose that Assumption R or Assumption G is satisfied. Then

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_p(1). \tag{17}$$

Under the null hypothesis H0, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, \Sigma)$,

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)},$$

where $\chi^2_{(p-q)}$ is the (central) $\chi^2$ random variable with $(p-q)$ degrees of freedom. Under the local alternative hypothesis H1 and Assumption CT, when $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$,

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}(\eta), \tag{18}$$

which is a noncentral $\chi^2$ random variable with $(p-q)$ degrees of freedom and noncentrality parameter $\eta$, where

$$\eta = \Delta'A_0'(A_0\Sigma A_0')^{-1}A_0\Delta = \Delta'\big(\Sigma^{-1} - \Sigma^{-1}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\big)\Delta. \tag{19}$$
The MD test provides the reference with which all the classical asymptotic
tests in subsequent sections can be compared. The asymptotic equivalency of
those classical asymptotic tests under both H0 and H1 can then be demon-
strated.
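As a numerical check of the limiting behavior in Proposition 5.2, one can simulate the quadratic form on the right-hand side of (17) directly from its normal limit. The $\Sigma$ and $A_0$ below are illustrative placeholders with $p = 3$ and a single constraint, so $p - q = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2                                   # one constraint, so p - q = 1
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])           # illustrative limiting variance of Gamma_n(beta_n - beta_0)
A0 = np.array([[1.0, -1.0, 2.0]])             # illustrative constraint Jacobian (full row rank)

# Weight matrix of the limiting quadratic form in (17):
M = A0.T @ np.linalg.inv(A0 @ Sigma @ A0.T) @ A0

# Simulate z ~ N(0, Sigma), the null limit of Gamma_n(beta_n - beta_0),
# and evaluate z' A0' (A0 Sigma A0')^{-1} A0 z for each draw.
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((20000, p)) @ L.T     # rows are N(0, Sigma) draws
stats = np.einsum("ij,jk,ik->i", z, M, z)

mean_stat = stats.mean()                      # chi-square(p - q) has mean p - q = 1
```

Since $L'M L$ is an idempotent matrix of rank $p - q$, each draw of the quadratic form is exactly $\chi^2_{(p-q)}$ distributed, and the simulated mean sits near $p - q$.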
6 The Wald Test
The Wald test can be constructed with R(βn), where βn is either the MLE or
GMME. By the mean value theorem and Assumption R,
$$R(\beta_n) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \frac{\partial R(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_n - \beta_0) = C_n(B_n^*)A_n(B_n^*)\cdot\Gamma_n(\beta_n - \beta_0).$$

As $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(0, \Sigma)$ under H0, it follows that

$$C_n^{-1}(B_n^*)R(\beta_n) = A_0\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, A_0\Sigma A_0').$$

The matrix $C_n^{-1}(B_n^*)$ represents the rates matrix of $R(\beta_n)$. The Wald test statistic $W_n$ in its general form can be

$$W_n = R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Gamma_n^{-1}\Sigma_n\Gamma_n^{\prime-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n). \tag{20}$$
For the MLE, its limiting variance is $\Sigma = \Omega^{-1}$. Under the setting in Assumption ML-D, $\Gamma_n^{\prime-1}\big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\big)\Gamma_n^{-1}$ estimates $\Omega$. An alternative form of the Wald test with the MLE $\beta_n$ is the following familiar one:

$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_n)}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n).$$
In the GMM framework, the limiting variance of the GMME $\beta_n$ is $\Sigma = (F_0'V^{-1}F_0)^{-1}$. The Wald test statistic can be

$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n).$$

This follows because

$$R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Gamma_n^{-1}\big(F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\big)^{-1}\Gamma_n^{\prime-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n)$$
$$= R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\big(\Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(\beta_n)\Gamma_n\big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n)$$
$$= R'(\beta_n)\Big[\frac{\partial R(\beta_n)}{\partial\beta'}\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_n)}{\partial\beta'}\Big)^{-1}\frac{\partial R'(\beta_n)}{\partial\beta}\Big]^{-1}R(\beta_n)$$

under Assumption GMM-D2.
Proposition 6.1 Suppose that the setting under Assumption R holds. The Wald test statistic for testing the hypothesis $R(\beta) = 0$ has, under the null hypothesis H0,

$$W_n = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_p(1) \xrightarrow{d} \chi^2_{(p-q)}. \tag{21}$$

Under the local alternative hypothesis H1 and Assumption CT, $W_n \xrightarrow{d} \chi^2_{(p-q)}(\eta)$, where the noncentrality parameter $\eta$ is given in (19). Under both H0 and H1, the Wald test statistic is asymptotically equivalent to the MD test statistic.
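The general form (20) can be coded directly once an estimate, its rates matrix, and a variance estimate are in hand. The numbers, rates, and the constraint $R(\beta) = \beta_1\beta_2 - 1 = 0$ below are hypothetical placeholders, used only to show the mechanics:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical ingredients for the Wald statistic in (20):
beta_n = np.array([0.52, 1.97])                     # estimate
Gamma_n = np.diag([np.sqrt(1000.0), 1000.0 ** 0.25])  # different rates per component
Sigma_n = np.array([[1.2, 0.1], [0.1, 0.8]])        # estimate of the limiting variance

def R(b):
    """One nonlinear constraint: b0 * b1 = 1 (illustrative)."""
    return np.array([b[0] * b[1] - 1.0])

def R_jac(b):
    """Jacobian dR/db' (1 x 2)."""
    return np.array([[b[1], b[0]]])

Gi = np.linalg.inv(Gamma_n)
# Variance of R(beta_n): dR/db' Gamma^{-1} Sigma_n Gamma'^{-1} dR'/db
Avar = R_jac(beta_n) @ Gi @ Sigma_n @ Gi.T @ R_jac(beta_n).T
Wn = float(R(beta_n) @ np.linalg.inv(Avar) @ R(beta_n))

p_value = 1.0 - chi2.cdf(Wn, df=1)                  # p - q = 1 constraint here
```

The statistic is invariant to how the rates are split between `Gamma_n` and `Sigma_n`, as long as their product estimates the variance of the unnormalized estimate.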
7 The Likelihood Ratio Type Tests
7.1 The Maximum Likelihood Ratio Test
The following proposition gives the asymptotic distribution of the LR statistic.
Proposition 7.1 Suppose Assumption G; Assumptions ML-C and ML-D
under H0; and Assumptions ML-C′ and ML-D′ under H1 hold.
Then, under H0,

$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] \xrightarrow{d} \chi^2_{(p-q)},$$

and, under the local alternative hypothesis H1,

$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] \xrightarrow{d} \chi^2_{(p-q)}(\eta),$$

where the noncentrality parameter $\eta$ is given in (19).
From (33) in the proof and (17), we can see that the LR statistic is asymp-
totically equivalent to the MD test statistic under both the null hypothesis H0
and the local alternative hypothesis H1.
7.2 Difference Test
In the GMM framework, the difference test is analogous to the LR test in the
likelihood framework. The difference test in the GMM framework is based on
the difference of the minimized objective functions with and without the constraints. It is

$$D = f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n),$$

where $V_n$ is a consistent estimate of $V$.
Proposition 7.2 Suppose that Assumption G holds; Assumptions GMM-C, GMM-D1 and GMM-D2 hold under H0; and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT hold under H1. Then the difference test $D$ is asymptotically equivalent to the MD test under both the null and local alternative hypotheses. Under H0 it is asymptotically $\chi^2_{(p-q)}$, and under H1 it is asymptotically $\chi^2_{(p-q)}(\eta)$.
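A minimal simulation of the difference test, under assumptions not taken from the paper: linear moments $f_n(\beta) = n^{-1}\sum_i x_i(y_i - x_i'\beta)$, identity weighting, $\Lambda_n = \sqrt{n}\,I$, and the constraint $\beta = g(\delta) = (\delta, \delta)'$. Under H0 the statistic should behave like $\chi^2(1)$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal((n, 2))
beta_true = np.array([0.5, 0.5])        # H0: beta_1 = beta_2 holds in this design
y = x @ beta_true + rng.standard_normal(n)

def fbar(beta):
    """Sample moments (1/n) sum_i x_i (y_i - x_i' beta)."""
    return x.T @ (y - x @ beta) / n

Vn_inv = np.eye(2)                      # illustrative weighting (estimate of V^{-1})

def Q(beta):
    """GMM objective f_n' Lambda_n' V_n^{-1} Lambda_n f_n with Lambda_n = sqrt(n) I."""
    f = fbar(beta)
    return n * f @ Vn_inv @ f

b_unc = minimize(Q, x0=np.zeros(2)).x   # unconstrained GMM estimate
d_con = minimize(lambda d: Q(np.array([d[0], d[0]])), x0=np.zeros(1)).x
b_con = np.array([d_con[0], d_con[0]])  # constrained estimate beta = g(delta)

D = Q(b_con) - Q(b_unc)                 # difference test statistic
```

Both objective values are nonnegative and the constrained minimum cannot fall below the unconstrained one, so D is nonnegative up to optimizer tolerance.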
8 The Score Type Tests
8.1 The LM (Efficient Score) Test and Neyman's C(α) Test

The LM statistic is $\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}$. The following proposition provides the asymptotic distribution of the LM statistic under the null hypothesis.
Proposition 8.1 Suppose Assumption G; Assumptions ML-C and ML-D
under H0; and Assumptions ML-C′ and ML-D′ under H1 hold.
The LM statistic

$$\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}$$

is asymptotically equivalent to the LR test statistic under both H0 and the local alternative H1.
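To illustrate the mechanics of the efficient score statistic, here is a hand-computed example in a setting not taken from the paper: $y \sim N(\mu, \sigma^2)$ with $\beta = (\mu, \sigma^2)'$ and H0: $\mu = 0$, where the constrained MLE, the score, and the Hessian are all available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
y = rng.standard_normal(n) * 1.5        # data generated with mu = 0, so H0 holds

# Constrained MLE under H0 (mu = 0): sigma2_hat = mean(y^2)
s2 = np.mean(y ** 2)

# Score of the normal log-likelihood at (mu, sigma2) = (0, s2):
score = np.array([
    y.sum() / s2,
    -n / (2 * s2) + (y ** 2).sum() / (2 * s2 ** 2),  # zero by construction at the constrained MLE
])

# Hessian of the log-likelihood at the constrained MLE:
H = np.array([
    [-n / s2,            -y.sum() / s2 ** 2],
    [-y.sum() / s2 ** 2, -n / (2 * s2 ** 2)],
])

LM = score @ np.linalg.inv(-H) @ score  # efficient score statistic, approx. chi-square(1) under H0
```

Only the first component of the score is nonzero at the constrained MLE, which is the general pattern: the restricted estimate kills the score in the directions left free by the constraint.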
The LM statistic is evaluated at the restricted MLE. Neyman (1959) generalized the efficient score test to a test which is invariant to the choice of restricted consistent estimates, namely, the C(α)-test. The C(α) statistic may have a computational advantage relative to the score test when the restricted MLE is difficult to compute but appropriate consistent estimates are available. Neyman's original C(α)-statistic is formulated for the case where the restrictions fix a subset of the parameters at known values. Smith (1987) and Dagenais and Dufour (1991) discuss the general version in terms of general explicit constraints R(β) = 0. With the identity in Lemma 3.1, one can formulate the C(α)-statistic in terms of the explicit constraints β = g(δ). In its generalized form for testing the explicit constraints, the C(α)-test is
$$C_\alpha = \frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} - \frac{\partial\ln L_n(\beta_{cn})}{\partial\delta'}\Big(-\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\delta},$$

where $\beta_{cn} = g(\delta_n)$ and $\delta_n$ is $D_n$-consistent, with $D_n = D_n(\Delta)$ in Assumption G assumed not to depend on $\Delta$. The following proposition shows that $C_\alpha$ is asymptotically equivalent to the minimum distance statistic under both H0 and H1.
Proposition 8.2 Suppose that Assumptions ML-C and ML-D under H0;
and Assumptions ML-C′ and ML-D′ under H1 hold. Furthermore, Assumption
G holds with Dn = Dn(∆), which does not depend on parameters δ.
Then the $C_\alpha$ test is asymptotically equivalent to the minimum distance test under both the null and local alternative hypotheses. Under H0 it is asymptotically $\chi^2_{(p-q)}$, and under H1 it is asymptotically $\chi^2_{(p-q)}(\eta)$.
8.2 Gradient Test and C(α)-type Gradient Test
The derivative of the GMM objective function in (3) evaluated at the restricted estimator $\beta_{cn}$ is two times $\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn})$. This suggests using the inverse of $\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}$ as the weighting matrix. Thus, the gradient test statistic can be formulated as

$$G = f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}).$$
Proposition 8.3 Suppose that Assumption G holds; Assumptions GMM-C, GMM-D1 and GMM-D2 hold under H0; and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT hold under H1. Then the gradient test $G$ is asymptotically equivalent to the difference test $D$ under both the null and local alternative hypotheses. Under H0 it is asymptotically $\chi^2_{(p-q)}$, and under H1 it is asymptotically $\chi^2_{(p-q)}(\eta)$.
In the likelihood framework, a C(α) statistic is invariant to the choice of consistent restricted estimates, which generalizes the score test statistic. In the GMM framework, a C(α)-type gradient test can also be formulated (Lee 2005). With various rates of convergence, the statistic is

$$C = f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn})$$
$$\quad - f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\delta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\delta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\delta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn}),$$

where $\beta_{cn} = g(\delta_n)$ and $\delta_n$ is $D_n$-consistent, with $D_n = D_n(\Delta)$ in Assumption G assumed not to depend on $\Delta$.
Proposition 8.4 Suppose that Assumptions GMM-C, GMM-D1 and GMM-D2 hold under H0; and Assumptions GMM-C′, GMM-D1′, GMM-D2, and CT hold under H1. Furthermore, Assumption G holds with $D_n = D_n(\Delta)$, which does not depend on the parameters $\delta$. Then the C(α)-type gradient test $C$ is asymptotically equivalent to the minimum distance test under both the null and local alternative hypotheses. Under H0 it is asymptotically $\chi^2_{(p-q)}$, and under H1 it is asymptotically $\chi^2_{(p-q)}(\eta)$.
9 Conclusion
This paper has considered the classical asymptotic test statistics, namely, the
likelihood ratio, efficient score, Neyman’s C(α), and Wald-type statistics for the
testing of general (linear or nonlinear) equality constraints on parameters in a
model, where the MLE’s of various parameters in the model may have different
rates of convergence. We have established a set of general sufficient conditions
such that these test statistics are asymptotically χ2. Indeed, we show that under
these sufficient conditions, these classical test statistics are all asymptotically
equivalent under both the null hypothesis and a sequence of local alternative
hypotheses. These test statistics are also asymptotically equivalent to a properly
defined MD test statistic (Lee 2004b).
In addition to the test statistics in the likelihood framework, we have ex-
tended the analogous difference test, gradient test and Wald test in the GMM
estimation framework (Newey and West 1987), where the GMM estimates of
various parameters in the model may have different rates of convergence. An
additional C(α)-type gradient statistic is also considered. These test statistics are shown to be asymptotically equivalent under both the null hypothesis and a sequence of local alternative hypotheses, under a set of sufficient conditions.
A Appendix: Proofs
Proof of Proposition 2.1
It follows from (1) that

$$\Gamma_n(\beta_n - \beta_0) = -\Big(\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Big)^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) \xrightarrow{d} N(0, \Sigma). \tag{22}$$

Q.E.D.
Proof of Proposition 2.2
The first order condition of the GMM estimation is $\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = 0$. By the mean value theorem,

$$f_n(\beta_n) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}(\beta_n - \beta_0).$$

It follows from the first order condition that

$$\beta_n - \beta_0 = -\Big(\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\Big)^{-1}\frac{\partial f_n'(\beta_n)}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_0)$$
$$= -[\Gamma_n'F_n'(\beta_n)V_n^{-1}F_n(B_n^*)\Gamma_n]^{-1}\Gamma_n'F_n'(\beta_n)V_n^{-1}\Lambda_n f_n(\beta_0)$$
$$= -\Gamma_n^{-1}[F_n'(\beta_n)V_n^{-1}F_n(B_n^*)]^{-1}F_n'(\beta_n)V_n^{-1}\Lambda_n f_n(\beta_0).$$

Therefore, the asymptotic distribution of $\beta_n$ follows from

$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1) \xrightarrow{d} N(0, \Sigma).$$

Q.E.D.
Proof of Lemma 3.1
Because $R(\beta) = 0$ when $\beta = g(\delta)$, $R(g(\delta)) = 0$ for all $\delta$. It follows, by Assumptions R and G, that

$$\frac{\partial R(\beta)}{\partial\beta'}\frac{\partial g(\delta)}{\partial\delta'} = 0 \;\Leftrightarrow\; \frac{\partial R(\beta)}{\partial\beta'}\Gamma_n^{-1}\Gamma_n\frac{\partial g(\delta)}{\partial\delta'} = 0 \;\Leftrightarrow\; C_n(\beta)A_n(\beta)G_n(\delta)D_n(\delta) = 0 \;\Leftrightarrow\; A_n(\beta)G_n(\delta) = 0,$$

for $\beta = g(\delta)$. In the limit, as $n \to \infty$, $A_0G_0 = 0$.

Because the columns of $\Sigma^{1/2}A_0'$ are perpendicular to those of $\Sigma^{-1/2}G_0$, and $A_0$ has full row rank and $G_0$ has full column rank, the columns of $(\Sigma^{1/2}A_0',\ \Sigma^{-1/2}G_0)$ span the full $p$-dimensional Euclidean space $R^p$. Therefore, any $y \in R^p$ can be written as $y = y_1 + y_2$, where $y_1$ lies in the space spanned by the columns of $\Sigma^{1/2}A_0'$ and $y_2$ lies in the space spanned by the columns of $\Sigma^{-1/2}G_0$. As

$$\Sigma^{1/2}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{1/2}y = y_1$$

and

$$\big(I_p - \Sigma^{-1/2}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1/2}\big)y = \big(I_p - \Sigma^{-1/2}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1/2}\big)y_1 = y_1,$$

because $G_0'\Sigma^{-1/2}y_1 = 0$, the two mappings are identical. Q.E.D.
Proof of Proposition 3.1
The linear expansion of $\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta}$ at $\delta_0$ may be done in two steps. By the mean value theorem, in the first step,

$$\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = \frac{\partial\ln L_n(\beta_0)}{\partial\beta} + \frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}(g(\delta_n) - \beta_0),$$

and, in the second step,

$$g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$

Together, one has

$$\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = \frac{\partial\ln L_n(g(\delta_0))}{\partial\beta} + \frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$

Because $\frac{\partial\ln L_{cn}(\delta)}{\partial\delta} = \frac{\partial g'(\delta)}{\partial\delta}\frac{\partial\ln L_n(\beta)}{\partial\beta}$ and $\frac{\partial\ln L_{cn}(\delta_n)}{\partial\delta} = 0$, it follows that $\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial\ln L_n(g(\delta_n))}{\partial\beta} = 0$. Therefore,

$$\delta_n - \delta_0 = -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}$$
$$= -\Big[D_n'(\delta_n)G_n'(\delta_n)\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}G_n(\Delta_n^*)D_n(\Delta_n^*)\Big]^{-1}D_n'(\delta_n)G_n'(\delta_n)\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}$$
$$= -D_n^{-1}(\Delta_n^*)\Big[G_n'(\delta_n)\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}G_n(\Delta_n^*)\Big]^{-1}G_n'(\delta_n)\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}, \tag{23}$$

under the setting in Assumption G. As $\delta_n \xrightarrow{p} \delta_0$ implies $\delta_{j,n}^* \xrightarrow{p} \delta_0$ for $j = 1, \cdots, p$, therefore

$$D_n(\Delta_n^*)\cdot(\delta_n - \delta_0) = \Big[G_0'\Gamma_n^{\prime-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) = (G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1).$$

Under Assumption GL, it follows that

$$D_{1n}(\delta_n - \delta_0) = S_0\Big[G_0'\Gamma_n^{\prime-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1) = S_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1).$$

Q.E.D.
Proof of Proposition 3.2 The constrained estimator of $\beta_0$ is $\beta_{cn} = g(\delta_n)$. By the mean value theorem,

$$\beta_{cn} - \beta_0 = g(\delta_n) - g(\delta_0) = \frac{\partial g(\Delta_n^*)}{\partial\delta'}D_n^{-1}(\Delta_n^*)\cdot D_n(\Delta_n^*)\cdot(\delta_n - \delta_0)$$
$$= \Gamma_n^{-1}G_n(\Delta_n^*)\Big\{\Big[G_0'\Gamma_n^{\prime-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}G_0\Big]^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1)\Big\},$$

because $\Delta_n^*$ is the same mean value point as in the preceding Proposition 3.1. Hence,

$$\Gamma_n(\beta_{cn} - \beta_0) = G_0\Big\{[G_0'\Omega G_0]^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1)\Big\}. \tag{24}$$

As both the constrained and unconstrained MLE's have the same rates matrix $\Gamma_n$, their efficiency can be compared through their limiting variance matrices. The generalized Schwartz inequality implies that $\Omega^{-1} \ge G_0(G_0'\Omega G_0)^{-1}G_0'$. Hence the constrained MLE $\beta_{cn}$ is asymptotically efficient relative to the unconstrained MLE $\beta_n$. Q.E.D.
Proof of Proposition 3.3 The first order condition for the constrained model is

$$\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) = 0. \tag{25}$$

The linearization of $f_n(g(\delta_n))$ at $\delta_0$ is best performed in two steps. First, linearize at $\beta_0$ as

$$f_n(g(\delta_n)) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}(g(\delta_n) - \beta_0).$$

In the second step, linearize $g(\delta_n)$ at $\delta_0$ as $g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0)$. Combining these together, one has

$$f_n(g(\delta_n)) = f_n(\beta_0) + \frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0), \tag{26}$$

because $\beta_0 = g(\delta_0)$. By substituting the expansion (26) into (25), it follows that

$$\delta_n - \delta_0 = -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial f_n'(g(\delta_n))}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_0)$$
$$= -\Big[\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'F_n'(g(\delta_n))V_n^{-1}F_n(B_n^*)\Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'F_n'(g(\delta_n))V_n^{-1}\Lambda_n f_n(\beta_0)$$
$$= -D_n^{-1}(\Delta_n^*)\big[G_n'(\delta_n)F_n'(g(\delta_n))V_n^{-1}F_n(B_n^*)G_n(\Delta_n^*)\big]^{-1}G_n'(\delta_n)F_n'(g(\delta_n))V_n^{-1}\Lambda_n f_n(\beta_0),$$

which implies, in turn, that

$$D_n(\Delta_n^*)\cdot(\delta_n - \delta_0) = -(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1). \tag{27}$$

Under Assumption GL,

$$D_{1n}(\delta_n - \delta_0) = -S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1). \tag{28}$$

Therefore, $D_{1n}(\delta_n - \delta_0) \xrightarrow{d} N\big(0,\ S_0(G_0'F_0'V^{-1}F_0G_0)^{-1}S_0'\big)$.

By the delta method, the limiting distribution of the constrained GMM estimator $\beta_{cn}$ follows from

$$\Gamma_n(\beta_{cn} - \beta_0) = \Gamma_n(g(\delta_n) - g(\delta_0)) = \Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_P(1) \xrightarrow{d} N\big(0,\ G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'\big).$$

By the generalized Schwartz inequality, $(F_0'V^{-1}F_0)^{-1} \ge G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'$, and, hence, $\beta_{cn}$ is efficient relative to $\beta_n$. Q.E.D.
Proof of Proposition 4.1
Under Assumption ML-D, by the mean value theorem,

$$\ln L_n(\beta_{n0}) - \ln L_n(\beta_0) = \frac{\partial\ln L_n(\beta_0)}{\partial\beta'}(\beta_{n0} - \beta_0) + \frac{1}{2}(\beta_{n0} - \beta_0)'\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}(\beta_{n0} - \beta_0)$$
$$= \Big(\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}\Big)'\Gamma_n(\beta_{n0} - \beta_0) + \frac{1}{2}[\Gamma_n(\beta_{n0} - \beta_0)]'\cdot\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_{n0} - \beta_0)$$
$$= \Big(\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}\Big)'\Delta + \frac{1}{2}\Delta'\,\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\,\Delta \xrightarrow{d} N\Big({-\frac{1}{2}\Delta'\Omega\Delta},\ \Delta'\Omega\Delta\Big)$$

under H0, where $B_n^*$ lies between $\beta_{n0}$ and $\beta_0$. The result follows from Le Cam's first lemma. Q.E.D.
Proof of Proposition 4.2 The mean value theorem implies that

$$\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0} - \Gamma_n^{-1}\Delta)}{\partial\beta} = \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} - \Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Delta = \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta + o_p(1) \xrightarrow{d} N(\Omega\Delta, \Omega).$$

From (22) in the proof of Proposition 2.1, the difference of $\Gamma_n(\beta_n - \beta_0)$ and $\Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta}$ converges in probability to zero under H0. By contiguity in Proposition 4.1, this difference also converges to zero under the sequence of local alternatives in (13). Hence, under H1,

$$\Gamma_n(\beta_n - \beta_0) = \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = \Omega^{-1}\Big(\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big) + o_p(1) = \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Delta + o_p(1) \xrightarrow{d} N(\Delta, \Omega^{-1}),$$

under Assumption ML-D′.

Similarly, under H1, from Proposition 3.1 and by contiguity,

$$D_{1n}(\delta_n - \delta_0) = S_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = S_0(G_0'\Omega G_0)^{-1}G_0'\Big[\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big] + o_p(1)$$
$$\xrightarrow{d} N\big(S_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ S_0(G_0'\Omega G_0)^{-1}S_0'\big),$$

and, from Proposition 3.2,

$$\Gamma_n(\beta_{cn} - \beta_0) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = G_0(G_0'\Omega G_0)^{-1}G_0'\Big(\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta\Big) + o_p(1)$$
$$\xrightarrow{d} N\big(G_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta,\ G_0(G_0'\Omega G_0)^{-1}G_0'\big).$$

Q.E.D.
Proof of Proposition 4.3
By the mean value theorem and Assumptions GMM-D1′ and GMM-D2,

$$\Lambda_n f_n(\beta_0) = \Lambda_n f_n(\beta_{n0} - \Gamma_n^{-1}\Delta) = \Lambda_n f_n(\beta_{n0}) - \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\Delta = \Lambda_n f_n(\beta_{n0}) - F_n(B_n^*)\Delta = \Lambda_n f_n(\beta_{n0}) - F_0\Delta + o_p(1).$$

From Proposition 2.2 and by contiguity,

$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\big(\Lambda_n f_n(\beta_{n0}) - F_0\Delta\big) + o_p(1)$$
$$= -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_{n0}) + \Delta + o_p(1) \xrightarrow{d} N\big(\Delta,\ (F_0'V^{-1}F_0)^{-1}\big)$$

under H1. The result for the unconstrained estimate follows.

For the constrained estimators, the asymptotic expansion (27) in the proof of Proposition 3.3 is valid under H1, i.e.,

$$D_n(\Delta_n^*)(\delta_n - \delta_0) = -(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_P(1),$$

and

$$\Gamma_n(\beta_{cn} - \beta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_p(1),$$

under the sequence of local alternatives in (13). Similar arguments provide the results for the constrained estimators under H1. Q.E.D.
Proof of Proposition 5.1 First, we shall derive the asymptotic distribution of the MDE $\beta_{cm,n}$ under H0. The Lagrangian function of (15) is

$$L(\beta, \lambda) = \frac{1}{2}[\Gamma_n(\beta_n - \beta)]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta)] + \lambda'R(\beta),$$

where $\lambda$ is the $(p-q)$-dimensional vector of Lagrangian multipliers. The first order conditions are

$$-\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_{cm,n}) + \frac{\partial R'(\beta_{cm,n})}{\partial\beta}\lambda_{cm,n} = 0, \qquad R(\beta_{cm,n}) = 0. \tag{29}$$

By the mean value theorem, $R(\beta_{cm,n}) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_{cm,n} - \beta_0)$ because $R(\beta_0) = 0$. It follows that the constrained estimates satisfy the following equations:

$$\begin{pmatrix}\Gamma_n'\Sigma_n^{-1}\Gamma_n & \dfrac{\partial R'(\beta_{cm,n})}{\partial\beta} \\[4pt] \dfrac{\partial R(B_n^*)}{\partial\beta'} & 0\end{pmatrix}\begin{pmatrix}\beta_{cm,n} - \beta_0 \\ \lambda_{cm,n}\end{pmatrix} = \begin{pmatrix}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0) \\ 0\end{pmatrix},$$

which implies that

$$(\beta_{cm,n} - \beta_0) = P_n\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0),$$

where

$$P_n = (\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1} - (\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}\frac{\partial R'(\beta_{cm,n})}{\partial\beta}\Big[\frac{\partial R(B_n^*)}{\partial\beta'}(\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}\frac{\partial R'(\beta_{cm,n})}{\partial\beta}\Big]^{-1}\frac{\partial R(B_n^*)}{\partial\beta'}(\Gamma_n'\Sigma_n^{-1}\Gamma_n)^{-1}. \tag{30}$$

Under Assumption R,

$$P_n\Gamma_n' = \Gamma_n^{-1}\big\{\Sigma_n - \Sigma_nA_n'(\beta_{cm,n})[A_n(B_n^*)\Sigma_nA_n'(\beta_{cm,n})]^{-1}A_n(B_n^*)\Sigma_n\big\}.$$

Therefore,

$$\Gamma_n(\beta_{cm,n} - \beta_0) = [I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0]\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0,\ \Sigma - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma\big).$$

Let $\delta_n$ be the MDE of $\delta_0$ derived from

$$\min_\delta\ [\Gamma_n(\beta_n - g(\delta))]'\Sigma_n^{-1}[\Gamma_n(\beta_n - g(\delta))].$$

The corresponding MDE of $\beta_0$ will have $\beta_{cm,n} = g(\delta_n)$. By the mean value theorem, there exists $\Delta_n^*$ such that

$$\beta_{cm,n} = g(\delta_n) = g(\delta_0) + \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0).$$

From the first order condition $\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - g(\delta_n)) = 0$, it follows that

$$\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n\Big[\beta_n - g(\delta_0) - \frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0)\Big] = 0.$$

Under Assumption G, this implies under H0 that

$$\delta_n - \delta_0 = \Big[\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}\Big]^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Gamma_n'\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0)$$
$$= [D_n'(\delta_n)G_n'(\delta_n)\Sigma_n^{-1}G_n(\Delta_n^*)D_n(\Delta_n^*)]^{-1}D_n'(\delta_n)G_n'(\delta_n)\Sigma_n^{-1}\Gamma_n(\beta_n - \beta_0).$$

Therefore,

$$D_n(\Delta_n^*)(\delta_n - \delta_0) = (G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0,\ (G_0'\Sigma^{-1}G_0)^{-1}\big),$$

under H0. Under the situation in Assumption GL, it follows that

$$D_{1n}(\delta_n - \delta_0) = S_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_P(1) \xrightarrow{d} N\big(0,\ S_0(G_0'\Sigma^{-1}G_0)^{-1}S_0'\big).$$

Furthermore,

$$\Gamma_n(\beta_{cm,n} - \beta_0) = \Gamma_n\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = G_0D_n(\Delta_n^*)(\delta_n - \delta_0) + o_p(1).$$

Hence,

$$\Gamma_n(\beta_{cm,n} - \beta_0) = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N\big(0,\ G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\big),$$

under H0. Under H1, the results follow by contiguity and the property $\Gamma_n(\beta_n - \beta_0) \xrightarrow{d} N(\Delta, \Sigma)$.

Note that the identity in Lemma 3.1,

$$\Sigma^{1/2}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{1/2} = I_p - \Sigma^{-1/2}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1/2},$$

implies that

$$I_p - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0 = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}$$

and

$$\Sigma - \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma = G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'.$$

These justify the common values of $\Sigma_{cm}$ and $\mu$. Q.E.D.
Proof of Proposition 5.2
From Proposition 5.1,

$$\Gamma_n(\beta_n - \beta_{cm,n}) = \Sigma A_0'(A_0\Sigma A_0')^{-1}A_0\Gamma_n(\beta_n - \beta_0) + o_p(1)$$

under both H0 and H1. Hence, the minimized distance is

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] = [\Gamma_n(\beta_n - \beta_0)]'A_0'(A_0\Sigma A_0')^{-1}A_0[\Gamma_n(\beta_n - \beta_0)] + o_P(1)$$
$$= u_n'\Sigma^{1/2}A_0'(A_0\Sigma A_0')^{-1}A_0\Sigma^{1/2}u_n + o_P(1),$$

where $u_n = \Sigma^{-1/2}\Gamma_n(\beta_n - \beta_0)$, under both H0 and H1.

Under H0, $u_n \xrightarrow{d} N(0, I_p)$; therefore

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}.$$

On the other hand, under H1, $u_n \xrightarrow{d} N(\Sigma^{-1/2}\Delta, I_p)$ and

$$[\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma_n^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] \xrightarrow{d} \chi^2_{(p-q)}\big(\Delta'A_0'(A_0\Sigma A_0')^{-1}A_0\Delta\big).$$

Q.E.D.
Proof of Proposition 6.1 By the mean value theorem,

$$R(\beta_n) = \frac{\partial R(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \frac{\partial R(B_n^*)}{\partial\beta'}\Gamma_n^{-1}\cdot\Gamma_n(\beta_n - \beta_0) = C_n(B_n^*)A_n(B_n^*)\cdot\Gamma_n(\beta_n - \beta_0), \tag{31}$$

under the constraints $R(\beta_0) = 0$. It follows that

$$C_n^{-1}(B_n^*)R(\beta_n) = A_n(B_n^*)\Gamma_n(\beta_n - \beta_0) = A_0\Gamma_n(\beta_n - \beta_0) + o_P(1).$$

Therefore, the Wald test statistic has

$$W_n = \big(C_n^{-1}(B_n^*)R(\beta_n)\big)'(A_0\Sigma A_0')^{-1}\big(C_n^{-1}(B_n^*)R(\beta_n)\big) = (\Gamma_n(\beta_n - \beta_0))'A_0'(A_0\Sigma A_0')^{-1}A_0\Gamma_n(\beta_n - \beta_0) + o_P(1)$$
$$= [\Gamma_n(\beta_n - \beta_{cm,n})]'\Sigma^{-1}[\Gamma_n(\beta_n - \beta_{cm,n})] + o_P(1)$$

by (17), which is asymptotically equivalent to the MD test under both the null and local alternative hypotheses by contiguity. Q.E.D.
Proof of Proposition 7.1 By the expansion of $\ln L_n(\beta_{cn})$ at $\beta_n$,

$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = -(\beta_n - \beta_{cn})'\frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}(\beta_n - \beta_{cn}) = -[\Gamma_n(\beta_n - \beta_{cn})]'\Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}[\Gamma_n(\beta_n - \beta_{cn})]$$
$$= [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_p(1), \tag{32}$$

where $\beta_n^*$ lies between $\beta_n$ and $\beta_{cn}$. The results (2) and (24) imply that

$$\Gamma_n(\beta_n - \beta_{cn}) = [\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1),$$

and, hence, (32) can be written as

$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = \frac{\partial\ln L_n(\beta_0)}{\partial\beta'}\Gamma_n^{-1}[\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\,\Omega\,[\Omega^{-1} - G_0(G_0'\Omega G_0)^{-1}G_0']\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1)$$
$$= u_n'[I_p - \Omega^{1/2}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{1/2}]u_n + o_P(1) \xrightarrow{d} \chi^2_{(p-q)},$$

where $u_n = \Omega^{-1/2}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Omega^{1/2}\Gamma_n(\beta_n - \beta_0) + o_p(1) \xrightarrow{d} N(0, I_p)$ under H0.

Under H1, Assumptions ML-C′ and ML-D′ imply that

$$\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} = \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}(\beta_0 - \beta_{n0}) = \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} - \Gamma_n^{\prime-1}\frac{\partial^2\ln L_n(B_n^*)}{\partial\beta\partial\beta'}\Gamma_n^{-1}\Delta$$
$$= \Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Omega\Delta + o_p(1).$$

It follows that

$$\Gamma_n(\beta_n - \beta_0) = \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + \Delta + o_p(1) \xrightarrow{d} N(\Delta, \Omega^{-1}).$$

Equation (24) implies that

$$\Gamma_n(\beta_{cn} - \beta_0) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1) = G_0(G_0'\Omega G_0)^{-1}G_0'\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} + G_0(G_0'\Omega G_0)^{-1}G_0'\Omega\Delta + o_p(1).$$

Their difference gives

$$\Omega^{1/2}\Gamma_n(\beta_n - \beta_{cn}) = [I_p - \Omega^{1/2}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{1/2}](u_n + \Omega^{1/2}\Delta) + o_p(1),$$

where $u_n = \Omega^{-1/2}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_{n0})}{\partial\beta} \xrightarrow{d} N(0, I_p)$ by Assumption ML-D′ under H1. Therefore,

$$2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] = [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_p(1) = (u_n + \Omega^{1/2}\Delta)'[I_p - \Omega^{1/2}G_0(G_0'\Omega G_0)^{-1}G_0'\Omega^{1/2}](u_n + \Omega^{1/2}\Delta) + o_p(1)$$
$$\xrightarrow{d} \chi^2_{(p-q)}\big(\Delta'(\Omega - \Omega G_0(G_0'\Omega G_0)^{-1}G_0'\Omega)\Delta\big).$$

Q.E.D.
Proof of Proposition 7.2 From Proposition 2.2, for the unconstrained GMM estimator $\beta_n$,

$$\Gamma_n(\beta_n - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1)$$

under H0. By contiguity, this holds also under H1. By expansion,

$$\Lambda_n f_n(\beta_n) = \Lambda_n f_n(\beta_0) + \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}(\beta_n - \beta_0) = \Lambda_n f_n(\beta_0) + F_0\Gamma_n(\beta_n - \beta_0) + o_P(1) = [I_k - F_0\Sigma F_0'V^{-1}]\Lambda_n f_n(\beta_0) + o_p(1),$$

and

$$V^{-1/2}\Lambda_n f_n(\beta_n) = [I_k - V^{-1/2}F_0\Sigma F_0'V^{-1/2}]u_n + o_P(1),$$

where $\Sigma = (F_0'V^{-1}F_0)^{-1}$ and $u_n = V^{-1/2}\Lambda_n f_n(\beta_0)$. It follows that

$$f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = u_n'[I_k - V^{-1/2}F_0\Sigma F_0'V^{-1/2}]u_n + o_p(1). \tag{33}$$

For the constrained GMM estimate $\delta_n$,

$$\Lambda_n f_n(g(\delta_n)) = \Lambda_n f_n(\beta_0) + \Lambda_n\frac{\partial f_n(B_n^*)}{\partial\beta'}\frac{\partial g(\Delta_n^*)}{\partial\delta'}(\delta_n - \delta_0) = \Lambda_n f_n(\beta_0) + F_n(B_n^*)G_n(\Delta_n^*)D_n(\Delta_n^*)(\delta_n - \delta_0)$$
$$= \Lambda_n f_n(\beta_0) + F_0G_0\cdot D_n(\Delta_n^*)(\delta_n - \delta_0) + o_P(1) = [I_k - F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1}]\Lambda_n f_n(\beta_0) + o_P(1),$$

and

$$V^{-1/2}\Lambda_n f_n(g(\delta_n)) = [I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_P(1). \tag{34}$$

It follows that

$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) = u_n'[I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_P(1). \tag{35}$$

From these asymptotic expansions,

$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n)$$
$$= u_n'[V^{-1/2}F_0\Sigma F_0'V^{-1/2} - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_p(1) \tag{36}$$
$$= [\Gamma_n(\beta_n - \beta_0)]'\big(\Sigma^{-1} - \Sigma^{-1}G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'\Sigma^{-1}\big)[\Gamma_n(\beta_n - \beta_0)] + o_p(1),$$

because $\Gamma_n(\beta_n - \beta_0) = -\Sigma F_0'V^{-1/2}u_n + o_p(1)$ from Proposition 2.2. From this expression and (17) in Proposition 5.2, we conclude that

$$f_n'(g(\delta_n))\Lambda_n'V_n^{-1}\Lambda_n f_n(g(\delta_n)) - f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = [\Gamma_n(\beta_n - \beta_{cm,n})]'F_0'V^{-1}F_0[\Gamma_n(\beta_n - \beta_{cm,n})] + o_p(1),$$

where the latter is the MD test statistic, under both H0 and H1. Q.E.D.
Proof of Proposition 8.1
By the mean value theorem, $\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = \frac{\partial^2\ln L_n(\beta_n^*)}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_n)$. Hence,

$$\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta'}\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = [\Gamma_n(\beta_{cn} - \beta_n)]'\Gamma_n^{\prime-1}\Big(-\frac{\partial^2\ln L_n(\beta_0)}{\partial\beta\partial\beta'}\Big)\Gamma_n^{-1}[\Gamma_n(\beta_{cn} - \beta_n)] + o_P(1)$$
$$= 2[\ln L_n(\beta_n) - \ln L_n(\beta_{cn})] + o_P(1). \tag{37}$$

From (37), the LM statistic is asymptotically equivalent to the LR test statistic in (32) under both H0 and H1. Q.E.D.
Proof of Proposition 8.2 Define the following two-step estimates

$$\beta_n^* = \beta_{cn} - \Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta},$$

and

$$\beta_{cn}^* = \beta_{cn} - \frac{\partial g(\delta_n)}{\partial\delta'}\Big(\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial\ln L_n(\beta_{cn})}{\partial\delta}.$$

First, it shall be shown that $C_\alpha$ can be rewritten in terms of the distance between $\beta_n^*$ and $\beta_{cn}^*$. Because $\frac{\partial\ln L_n(g(\delta))}{\partial\delta'} = \frac{\partial\ln L_n(g(\delta))}{\partial\beta'}\frac{\partial g(\delta)}{\partial\delta'}$, the difference of these two estimates is

$$\beta_n^* - \beta_{cn}^* = \Big[\Big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1} - \frac{\partial g(\delta_n)}{\partial\delta'}\Big(-\frac{\partial g'(\delta_n)}{\partial\delta}\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\frac{\partial g(\delta_n)}{\partial\delta'}\Big)^{-1}\frac{\partial g'(\delta_n)}{\partial\delta}\Big]\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta}.$$

With this expression, it follows that $(\beta_n^* - \beta_{cn}^*)'\big(-\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\big)(\beta_n^* - \beta_{cn}^*) = C_\alpha$.

Second, it shall be shown that $\beta_n^*$ and $\beta_{cn}^*$ are, respectively, asymptotically equivalent to the unconstrained and constrained MLE's $\beta_n$ and $\beta_{cn}$ under both H0 and H1. By the mean value theorem, $\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} = \frac{\partial\ln L_n(\beta_0)}{\partial\beta} + \frac{\partial^2\ln L_n(\bar\beta_{cn})}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_0)$, where $\bar\beta_{cn}$ lies between $\beta_{cn}$ and $\beta_0$. It follows that

$$\Gamma_n(\beta_n^* - \beta_0) = -\Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Big[\frac{\partial\ln L_n(\beta_{cn})}{\partial\beta} - \frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}(\beta_{cn} - \beta_0)\Big]$$
$$= -\Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Gamma_n'\,\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} - \Gamma_n\Big(\frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big)^{-1}\Gamma_n'\,\Gamma_n^{\prime-1}\Big[\frac{\partial^2\ln L_n(\bar\beta_{cn})}{\partial\beta\partial\beta'} - \frac{\partial^2\ln L_n(\beta_{cn})}{\partial\beta\partial\beta'}\Big]\Gamma_n^{-1}\cdot\Gamma_n(\beta_{cn} - \beta_0)$$
$$= \Omega^{-1}\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_P(1),$$

because $\Gamma_n(\beta_{cn} - \beta_0) = O_p(1)$. For the $\beta_{cn}^*$, it has

$$\Gamma_n(\beta_{cn}^* - \beta_0) = G_0(G_0'\Omega G_0)^{-1}G_0'\cdot\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + [I_p - G_0(G_0'\Omega G_0)^{-1}G_0'\Omega]\cdot\Gamma_n(\beta_{cn} - \beta_0) + o_p(1)$$
$$= G_0(G_0'\Omega G_0)^{-1}G_0'\cdot\Gamma_n^{\prime-1}\frac{\partial\ln L_n(\beta_0)}{\partial\beta} + o_p(1),$$

where the second term in the first equality goes to zero in probability because, using the mean value theorem,

$$\Gamma_n(\beta_{cn} - \beta_0) = \Gamma_n\frac{\partial g(\bar\delta_n)}{\partial\delta'}(\delta_n - \delta_0) = G_n(\bar\delta_n)D_n(\bar\delta_n)(\delta_n - \delta_0) = G_0\cdot D_n(\bar\delta_n)(\delta_n - \delta_0) + o_p(1),$$

and $[I_p - G_0(G_0'\Omega G_0)^{-1}G_0'\Omega]G_0 = 0$. From the proofs of Propositions 3.2 and 4.2, one concludes that $\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n(\beta_n - \beta_0) + o_p(1)$ and $\Gamma_n(\beta_{cn}^* - \beta_0) = \Gamma_n(\beta_{cn} - \beta_0) + o_p(1)$ under H0. By contiguity, the asymptotic equivalence holds also under H1.

Therefore, $C_\alpha = [\Gamma_n(\beta_n - \beta_{cn})]'\Omega[\Gamma_n(\beta_n - \beta_{cn})] + o_P(1)$; i.e., $C_\alpha$ is asymptotically equivalent to the minimum distance test statistic under both H0 and H1. Q.E.D.
Proof of Proposition 8.3
By Assumption GMM-D2,

$$\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n' = F_n(\beta_{cn})\Gamma_n[\Gamma_n'F_n'(\beta_{cn})V_n^{-1}F_n(\beta_{cn})\Gamma_n]^{-1}\Gamma_n'F_n'(\beta_{cn}) = F_0\Sigma F_0' + o_p(1),$$

where $\Sigma = (F_0'V^{-1}F_0)^{-1}$. By (34) in the proof of Proposition 7.2,

$$V^{-1/2}\Lambda_n f_n(\beta_{cn}) = [I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_p(1).$$

Therefore,

$$f_n'(\beta_{cn})\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big[\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\Big]^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_{cn})$$
$$= f_n'(\beta_{cn})\Lambda_n'V_n^{-1}F_0\Sigma F_0'V_n^{-1}\Lambda_n f_n(\beta_{cn}) + o_p(1)$$
$$= u_n'[I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]V^{-1/2}F_0\Sigma F_0'V^{-1/2}[I_k - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_p(1)$$
$$= u_n'[V^{-1/2}F_0\Sigma F_0'V^{-1/2} - V^{-1/2}F_0G_0(G_0'\Sigma^{-1}G_0)^{-1}G_0'F_0'V^{-1/2}]u_n + o_p(1),$$

where the latter is the difference test statistic in (36), under both H0 and H1. Q.E.D.
Proof of Proposition 8.4 Define the following two-step estimates
β∗n = βcn − (
∂f ′n(βcn)∂β
Λ′nV −1
n Λn∂fn(βcn)
∂β′ )−1 ∂f ′n(βcn)∂β
Λ′nV −1
n Λnfn(βcn),
and
β∗cn = βcn−
∂g(δn)∂δ′
(∂f ′
n(βcn)∂δ
Λ′nV −1
n Λn∂fn(βcn)
∂δ′)−1∂f ′
n(βcn)∂δ
Λ′nV −1
n Λnfn(βcn).
First, it shall be shown that C can be rewritten in terms of distance of β∗n
and β∗cn. Because ∂fn(g(δ))
∂δ′ = ∂fn(g(δ))∂β′
∂g(δ)∂δ′ , the difference of these two estimates
can be rewritten as
β∗cn − β∗
n = L′−1n MnL−1
n
∂f ′n(βcn)∂β
Λ′nV −1
n Λnfn(βcn),
where Ln is defined by the decomposition (∂f ′n(βcn)∂β Λ′
nV −1n Λn
∂fn(βcn)∂β′ ) = LnL′
n
and Mn = Ip−L′n
∂g(δn)∂δ′ [∂g′(δn)
∂δ LnL′n
∂g(δn)∂δ′ ]−1 ∂g′(δn)
∂δ Ln. The Mn is a symmetric
and idempotent matrix. Therefore,
(β∗n − β∗
cn)′∂f ′
n(βcn)∂β
Λ′nV −1
n Λn∂fn(βcn)
∂β′ (β∗n − β∗
cn)
42
= f ′n(βcn)Λ′
nV −1n Λn
∂fn(βcn)∂β′ L
′−1n MnL−1
n
∂f ′n(βcn)∂β
Λ′nV −1
n Λnfn(βcn) = C.
Second, it shall be shown that $\beta_n^*$ and $\beta_{cn}^*$ are, respectively, asymptotically equivalent to the unconstrained and constrained optimum GMM estimators $\beta_n$ and $\beta_{cn}$ under both $H_0$ and $H_1$. The definition of $\beta_n^*$ implies that
$$\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n\left(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\right)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\left[\frac{\partial f_n(\beta_{cn})}{\partial\beta'}(\beta_{cn} - \beta_0) - f_n(\beta_{cn})\right].$$
By the mean value theorem, $f_n(\beta_{cn}) = f_n(\beta_0) + \frac{\partial f_n(\beta_{cn})}{\partial\beta'}(\beta_{cn} - \beta_0)$, and because
$$\Gamma_n\left(\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n'V_n^{-1}\Lambda_n\frac{\partial f_n(\beta_{cn})}{\partial\beta'}\right)^{-1}\frac{\partial f_n'(\beta_{cn})}{\partial\beta}\Lambda_n' = (F_0'V^{-1}F_0)^{-1}F_0' + o_p(1),$$
it follows that
$$\Gamma_n(\beta_n^* - \beta_0) = -(F_0'V^{-1}F_0)^{-1}F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1).$$
For $\beta_{cn}^*$, one has
$$\begin{aligned}
\Gamma_n(\beta_{cn}^* - \beta_0) &= -G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) \\
&\quad + \left[I_p - G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}F_0\right]\Gamma_n(\beta_{cn} - \beta_0) \\
&= -G_0(G_0'F_0'V^{-1}F_0G_0)^{-1}G_0'F_0'V^{-1}\Lambda_n f_n(\beta_0) + o_p(1),
\end{aligned}$$
where the second term in the first equality goes to zero in probability because $\Gamma_n(\beta_{cn} - \beta_0) = G_0 \cdot D_n(\delta_n - \delta_0) + o_p(1)$ and the bracketed matrix annihilates $G_0$. From the proofs of Propositions 3.3 and 4.3, one concludes that $\Gamma_n(\beta_n^* - \beta_0) = \Gamma_n(\beta_n - \beta_0) + o_p(1)$ and $\Gamma_n(\beta_{cn}^* - \beta_0) = \Gamma_n(\beta_{cn} - \beta_0) + o_p(1)$ under $H_0$. By contiguity, the asymptotic equivalence also holds under $H_1$.
Therefore, $C = [\Gamma_n(\beta_n - \beta_{cn})]'F_0'V^{-1}F_0[\Gamma_n(\beta_n - \beta_{cn})] + o_P(1)$, i.e., $C$ is asymptotically equivalent to the minimum distance test statistic under both $H_0$ and $H_1$. Q.E.D.
B Appendix: GMM Overidentification Test
In this appendix, we demonstrate the possible construction of the overidentification test statistic in our framework. The overidentification test in Hansen (1982) is designed to test the validity of the moment conditions when their number exceeds the number of parameters in the GMM estimation of $\beta_0$.
With the (unconstrained) GMM estimate $\beta_n$, the minimized objective function is $f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n)$, which can be used as the overidentification test statistic. The following proposition provides its asymptotic distribution. The number of unknown parameters in $\beta$ is $p$ and the number of moment equations is $k$, where $p < k$.
Proposition A.1 Suppose that Assumptions GMM-C, GMM-D1, and GMM-D2 hold. Then
$$f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) \overset{d}{\to} \chi^2_{(k-p)}.$$
Proof. From (33) in the proof of Proposition 7.2,
$$f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) = u_n'\left[I_k - V^{-1/2}F_0(F_0'V^{-1}F_0)^{-1}F_0'V^{-1/2}\right]u_n + o_P(1),$$
where $u_n = V^{-1/2}\Lambda_n f_n(\beta_0)$. Because $u_n \overset{d}{\to} N(0, I_k)$ under Assumption GMM-D1, it follows that $f_n'(\beta_n)\Lambda_n'V_n^{-1}\Lambda_n f_n(\beta_n) \overset{d}{\to} \chi^2(k-p)$. Q.E.D.
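Proposition A.1 can be illustrated by simulation. The sketch below takes $V = I_k$ (so the $V^{-1/2}$ factors drop out) and a hypothetical random $k \times p$ limit matrix $F_0$; the quadratic form $u'[I_k - F_0(F_0'F_0)^{-1}F_0']u$ with $u \sim N(0, I_k)$ should then match the first two moments of a $\chi^2(k-p)$ variable:

```python
import numpy as np

rng = np.random.default_rng(42)
k, p, reps = 5, 2, 20000

F0 = rng.standard_normal((k, p))                       # hypothetical k x p Jacobian limit
# Annihilator projection of rank k - p (here V = I_k for simplicity)
M = np.eye(k) - F0 @ np.linalg.inv(F0.T @ F0) @ F0.T

u = rng.standard_normal((reps, k))                     # draws of u_n ~ N(0, I_k)
J = np.einsum('ri,ij,rj->r', u, M, u)                  # overidentification statistic per draw

# A chi-square(k - p) variable has mean k - p = 3 and variance 2(k - p) = 6
print(J.mean(), J.var())
```

Because $M$ is idempotent of rank $k - p$, the exact distribution of each draw is $\chi^2(k-p)$; the sample mean and variance of `J` should settle near $k - p$ and $2(k - p)$.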
C Appendix: An Illustrative Example Where
Assumption G Does Not Hold
Consider the case where $\beta = (\beta_1, \beta_2)'$ and $\beta = g(\delta) = (\delta^2, \delta)'$. Suppose that the unconstrained estimate $\beta_n = (\beta_{n1}, \beta_{n2})'$ has the asymptotic property that $\Gamma_n(\beta_n - \beta_0) \overset{d}{\to} N(0, I_2)$, where $\Gamma_n = \mathrm{diag}(\gamma_{n1}, \gamma_{n2})$ and $\gamma_{n1}$ is a faster rate than $\gamma_{n2}$, i.e., $\frac{\gamma_{n2}}{\gamma_{n1}} \to 0$.
The derivative of $g(\delta)$ with respect to $\delta$ is $\frac{\partial g'(\delta)}{\partial\delta} = (2\delta, 1)$. As
$$\Gamma_n\frac{\partial g(\delta)}{\partial\delta} = \begin{pmatrix} 2\delta \\ \frac{\gamma_{n2}}{\gamma_{n1}} \end{pmatrix}\gamma_{n1},$$
$G_n(\delta) = (2\delta, \frac{\gamma_{n2}}{\gamma_{n1}})'$ and $D_n = \gamma_{n1}$. Assumption G will not be satisfied when the true $\delta_0$ is $0$ in this example, because $G_n(\delta_0) = G_n(0) \to G_0 = (0, 0)'$, which does not have full column rank one.
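The rank failure is easy to see numerically. Taking the hypothetical rates $\gamma_{n1} = n$ and $\gamma_{n2} = \sqrt{n}$ (so $\gamma_{n2}/\gamma_{n1} \to 0$), the normalized Jacobian $G_n(\delta_0)$ collapses to the zero vector as $n$ grows only when $\delta_0 = 0$:

```python
import numpy as np

def G_n(delta0, n):
    """G_n(delta_0) = (2*delta_0, gamma_{n2}/gamma_{n1})' with hypothetical rates n, sqrt(n)."""
    g1, g2 = float(n), float(n) ** 0.5
    return np.array([2.0 * delta0, g2 / g1])

for n in (10**2, 10**4, 10**6):
    # Norm shrinks to 0 only at delta_0 = 0; at delta_0 = 0.3 it stays near 0.6
    print(n, np.linalg.norm(G_n(0.0, n)), np.linalg.norm(G_n(0.3, n)))
```

At any $\delta_0 \neq 0$ the first component $2\delta_0$ keeps $G_0$ away from zero, so Assumption G fails only at the single point $\delta_0 = 0$.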
Consider now the MD estimation of $\delta$. Because $\Sigma_n = I_2$, the MD estimation is $\min_\delta [\Gamma_n(\beta_n - g(\delta))]'[\Gamma_n(\beta_n - g(\delta))] = \min_\delta Q_n(\delta)$, where $Q_n(\delta) = [\gamma_{n1}(\beta_{n1} - \delta^2)]^2 + [\gamma_{n2}(\beta_{n2} - \delta)]^2$. Because $Q_n(\delta)$ is a polynomial function of order four, one has the following exact expansion from the first-order condition of the MDE $\delta_{cm,n}$ at $\delta_0 = 0$:
$$0 = \frac{\partial Q_n(\delta_{cm,n})}{\partial\delta} = \frac{\partial Q_n(0)}{\partial\delta} + \frac{\partial^2 Q_n(0)}{\partial\delta^2}\delta_{cm,n} + \frac{1}{2!}\frac{\partial^3 Q_n(0)}{\partial\delta^3}\delta_{cm,n}^2 + \frac{1}{3!}\frac{\partial^4 Q_n(0)}{\partial\delta^4}\delta_{cm,n}^3,$$
where $\frac{\partial Q_n(0)}{\partial\delta} = -2\gamma_{n2}^2\beta_{n2}$, $\frac{\partial^2 Q_n(0)}{\partial\delta^2} = -4\gamma_{n1}^2\beta_{n1} + 2\gamma_{n2}^2$, $\frac{\partial^3 Q_n(0)}{\partial\delta^3} = 0$, and $\frac{\partial^4 Q_n(0)}{\partial\delta^4} = 24\gamma_{n1}^2$. Together, this implies the following relationship among $\beta_{n1}$, $\beta_{n2}$ and $\delta_{cm,n}$:
$$\gamma_{n2}\beta_{n2} = \left(\gamma_{n2} - 2\frac{\gamma_{n1}}{\gamma_{n2}}\gamma_{n1}\beta_{n1}\right)\delta_{cm,n} + 2\frac{\gamma_{n1}^2}{\gamma_{n2}}\delta_{cm,n}^3. \tag{38}$$
It turns out that the rate of convergence of $\delta_{cm,n}$ to $\delta_0 = 0$ will depend on how the ratio $\frac{\gamma_{n1}}{\gamma_{n2}^2}$ behaves, and its asymptotic distribution may or may not be normal.
Case (1). $\frac{\gamma_{n1}}{\gamma_{n2}^2} \to 0$, i.e., the rate $\gamma_{n1}$ is faster than $\gamma_{n2}$ but slower than $\gamma_{n2}^2$:
In this case, the preceding relation (38) can be rewritten as
$$\gamma_{n2}\beta_{n2} = \left(1 - 2\frac{\gamma_{n1}}{\gamma_{n2}^2}\gamma_{n1}\beta_{n1}\right)\gamma_{n2}\delta_{cm,n} + 2\frac{\gamma_{n1}^2}{\gamma_{n2}^4}(\gamma_{n2}\delta_{cm,n})^3,$$
so one has $\gamma_{n2}\delta_{cm,n} = \gamma_{n2}\beta_{n2} + o_p(1)$. Under this situation, $\delta_{cm,n}$ has the slower rate $\gamma_{n2}$ of convergence. As $\gamma_{n2}\beta_{n2} = \gamma_{n2}(\beta_{n2} - \beta_{02}) \overset{d}{\to} N(0,1)$ under $\beta_{02} = 0$, $\delta_{cm,n}$ is asymptotically normally distributed, as is $\beta_{n2}$. In this case, the information in $\beta_{n1}$ does not play a role even though $\beta_{n1}$ converges in probability to zero at the faster rate $\gamma_{n1}$.
Case (2). $\frac{\gamma_{n1}}{\gamma_{n2}^2} \to c$, where $c \neq 0$ is a finite constant, i.e., $\gamma_{n1}$ is as fast as $\gamma_{n2}^2$:
The limiting distribution $z$ of $\gamma_{n2}\delta_{cm,n}$ might not be normally distributed and will be characterized by the polynomial equation with normal random coefficients, $v_2 - (1 - 2cv_1)z - 2c^2z^3 = 0$, where $v_1$ and $v_2$ are two independent $N(0,1)$ variables, because the limiting distributions of $\gamma_{n1}\beta_{n1}$ and $\gamma_{n2}\beta_{n2}$ are independently distributed $N(0,1)$ variables.
Case (3). $\frac{\gamma_{n1}}{\gamma_{n2}^2} \to \infty$, i.e., the rate $\gamma_{n1}$ is faster than the $\gamma_{n2}^2$-rate:
For this case, (38) can be rewritten as
$$\gamma_{n2}\beta_{n2} = \left(\frac{\gamma_{n2}^2}{\gamma_{n1}} - 2\gamma_{n1}\beta_{n1}\right)\left(\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}\right) + 2\frac{\gamma_{n2}^2}{\gamma_{n1}}\left(\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}\right)^3.$$
As $\frac{\gamma_{n2}^2}{\gamma_{n1}} \to 0$, it follows that $\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n} = -\frac{1}{2}\frac{\gamma_{n2}\beta_{n2}}{\gamma_{n1}\beta_{n1}} + o_p(1)$. Thus, in this case, $\delta_{cm,n}$ has the $\frac{\gamma_{n1}}{\gamma_{n2}}$-rate of convergence, which is faster than the $\gamma_{n2}$-rate but slower than the $\gamma_{n1}$-rate. The limiting distribution of $\frac{\gamma_{n1}}{\gamma_{n2}}\delta_{cm,n}$ is half the ratio of two independently distributed $N(0,1)$ variables.
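Because relation (38) is the exact first-order condition of $Q_n$, the MDE can be computed as a root of a cubic, which makes the rate claims easy to check by simulation. The sketch below uses hypothetical rates $\gamma_{n1} = n^{0.6}$ and $\gamma_{n2} = n^{0.5}$, so $\gamma_{n1}/\gamma_{n2}^2 = n^{-0.4} \to 0$ as in Case (1), and verifies that $\gamma_{n2}\delta_{cm,n}$ tracks $\gamma_{n2}\beta_{n2}$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10**8
g1, g2 = n**0.6, n**0.5          # gamma_{n1}/gamma_{n2}^2 = n**-0.4 -> 0: Case (1)

devs = []
for _ in range(200):
    v1, v2 = rng.standard_normal(2)
    b1, b2 = v1 / g1, v2 / g2    # draws of beta_n centered at beta_0 = (0, 0)'
    # First-order condition of Q_n, equivalently relation (38):
    # 4 g1^2 d^3 + (2 g2^2 - 4 g1^2 b1) d - 2 g2^2 b2 = 0
    roots = np.roots([4 * g1**2, 0.0, 2 * g2**2 - 4 * g1**2 * b1, -2 * g2**2 * b2])
    crit = roots[np.abs(roots.imag) < 1e-6].real
    Q = (g1 * (b1 - crit**2))**2 + (g2 * (b2 - crit))**2
    d_hat = crit[np.argmin(Q)]   # global minimizer of Q_n, i.e., the MDE delta_{cm,n}
    devs.append(abs(g2 * d_hat - g2 * b2))

# In Case (1), gamma_{n2} * delta_{cm,n} = gamma_{n2} * beta_{n2} + o_p(1)
print(max(devs))
```

With these rates the linear coefficient $2\gamma_{n2}^2 - 4\gamma_{n1}^2\beta_{n1}$ stays positive, so the cubic has a single real root and the maximum deviation over the draws is of order $\gamma_{n1}/\gamma_{n2}^2 \approx 10^{-3.2}$, consistent with the $o_p(1)$ remainder in Case (1).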
References
[1] Amemiya, T. (1985), Advanced Econometrics. Harvard University Press,
Cambridge, Massachusetts.
[2] Bickel, P.J., C.A.J. Klaassen, Y. Ritov, and J.A. Wellner (1993), Efficient
and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hop-
kins University Press.
[3] Brock, W.A. and S.N. Durlauf (2001), “Interactions-based models”. In Hand-
book of Econometrics, J.J. Heckman and E.E. Leamer (eds). North-Holland:
Amsterdam, 3297-3380.
[4] Dagenais, M.G. and J-M. Dufour (1991), “Invariance, nonlinear models,
and asymptotic tests”. Econometrica 59, 1601-1615.
[5] Hájek, J. and Z. Šidák (1967), Theory of Rank Tests. New York: Academic
Press.
[6] Hansen, L.P. (1982), “Large sample properties of generalized method of
moments estimators”. Econometrica 50, 1029-1054.
[7] Le Cam, L. (1960), “Locally asymptotically normal families of distribu-
tions”. University of California Publications in Statistics 3, 37-98.
[8] Lee, L.F., (2004a), “Asymptotic distributions of quasi-maximum likelihood
estimators for spatial econometric models”. Econometrica 72, 1899-1926.
[9] Lee, L.F., (2004b), “Pooling estimates with different rates of convergence –
a minimum χ2 approach: with an emphasis on a social interactions model”.
Manuscript, Department of Economics, OSU.
[10] Lee, L.F., (2005), “A C(α)-type gradient test in the GMM approach”.
Manuscript, Department of Economics, OSU.
[11] Manski, C.F., (1993), “Identification of endogenous social effects: the re-
flection problem”. Review of Economic Studies 60: 531-542.
[12] Moon, H.R. and F. Schorfheide (2002), “Minimum distance estimation of
nonstationary time series models”. Econometric Theory 18, 1385-1407.
[13] Nagaraj, N. and W. Fuller (1991), “Estimation of the parameters of linear
time series models subject to nonlinear restrictions”. Annals of Statistics
19, 1143-1154.
[14] Newey, W.K. and K.D. West (1987), “Hypothesis testing with efficient
method of moments estimation”. International Economic Review 28, 777-
787.
[15] Neyman, J. (1959), “Optimal asymptotic tests of composite statistical hy-
potheses”. In Probability and Statistics, the Harald Cramer Volume, ed., U.
Grenander, New York, Wiley.
[16] Park, J. and P.C.B. Phillips (2000), “Nonstationary binary choice”. Econo-
metrica 68, 1249-1280.
[17] Smith, R.J. (1987), “Alternative asymptotically optimal tests and their
application to dynamic specification”. Review of Economic Studies 54, 665-
680.
[18] Ruud, P.A., (2000), Classical Econometric Theory, Oxford University
Press, New York, NY.