Consistent Lagrange Multiplier Type Specification Tests for Semiparametric Models∗
Ivan Korolev†
Job Market Paper
October 12, 2017 (the latest version is available here)
Abstract
This paper considers specification testing in semiparametric econometric models. It develops a consistent series-based specification test for semiparametric conditional mean models against nonparametric alternatives. Consistency is achieved by turning a conditional moment restriction into a growing number of unconditional moment restrictions using series methods. The test is simple to implement because it requires estimating only the restricted semiparametric model and because the asymptotic distribution of the test statistic is pivotal. The use of series methods in estimation of the null semiparametric model allows me to account for the estimation variance and obtain refined asymptotic results. The test remains valid even if other semiparametric methods are used to estimate the null model as long as they achieve suitable convergence rates. This includes popular kernel estimators for single index or partially linear models. The test demonstrates good size and power properties in simulations. I illustrate the use of my test with the Canadian gasoline demand example from Yatchew and No (2001) and find no evidence against the semiparametric specifications used in that paper.
∗I am grateful to my advisors Frank Wolak, Han Hong, and Peter Reiss for their support and guidance,as well as to Svetlana Bryzgalova, Brad Larsen, and Joe Romano for very helpful conversations. I also thankChris Bruegge, Liran Einav, Guido Imbens, Gordon Leslie, Jessie Li, Onder Polat, and Paulo Somaini foruseful comments. I thank Adonis Yatchew for permission to use the Canadian household gasoline consumptiondataset from Yatchew and No (2001). I gratefully acknowledge the financial support from the StanfordGraduate Fellowship Fund as a Koret Fellow and from the Stanford Institute for Economic Policy Researchas a B.F. Haley and E.S. Shaw Fellow. All remaining errors are mine.
†Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA, 94305. E-mail:[email protected]. Website: http://web.stanford.edu/~ikorolev/
1 Introduction
Applied economists often want to achieve two conflicting goals in their work. On the
one hand, they wish to use the most flexible specification possible, so that their results are
not driven by functional form assumptions. On the other hand, they wish to have a model
that is consistent with the restrictions imposed by economic theory and can be used for valid
counterfactual analysis.
While parametric models are often too restrictive and may not capture heterogeneity in
the data well, nonparametric models may violate restrictions imposed by economic theory
and suffer from the curse of dimensionality, i.e. become imprecise if the dimensionality
of regressors is high. Because of this, and because economic theory usually specifies one
portion of the model but leaves the other unrestricted, semiparametric models are especially
attractive for empirical work in economics. For instance, semiparametric models have been
used in estimation of demand functions (Hausman and Newey (1995), Schmalensee and Stoker
(1999), Yatchew and No (2001)), production functions (Olley and Pakes (1996), Levinsohn
and Petrin (2003)), Engel curves (Gong et al. (2005)), the labor force participation equation
(Martins (2001)), and the relationship between land access and poverty (Finan et al. (2005)).
Because many semiparametric models are restricted versions of fully nonparametric mod-
els, it is important to check the validity of implied restrictions. If semiparametric models
are correctly specified, then using them, as opposed to nonparametric models, typically leads
to more efficient estimates and may increase the range of counterfactual questions that can
be answered using the model at hand. On the other hand, if semiparametric models are
misspecified, then the semiparametric estimates are likely to be misleading and may result
in incorrect policy implications.
In this paper I develop a new specification test which determines whether a semiparamet-
ric conditional mean model that the researcher has estimated provides a statistically valid
description of the data as compared to a general nonparametric model. The test statistic
is based on a quadratic form in the semiparametric model residuals. When the errors are
homoskedastic, this quadratic form can be computed as nR2 from the regression of the semi-
parametric residuals on the series approximating functions. Thus, the proposed test is simple
to implement and avoids kernel smoothing in high dimensions. Moreover, the asymptotic dis-
tribution of the test statistic is pivotal, i.e. does not depend on the unknown parameters, so
that calculating asymptotically exact critical values for the test is straightforward and does
not require the use of resampling methods.
The proposed test uses series methods to turn a conditional moment restriction into a
growing number of unconditional moment restrictions. I show that if the series functions
can approximate the nonparametric alternatives that are allowed as the sample size grows,
the test is consistent. My assumptions and proofs make precise what is required of the
approximation and its behavior as the number of series terms and the sample size grow.
These arguments differ from standard parametric arguments, in which the number of regressors
in the model is fixed.
My asymptotic theory allows both the number of parameters under the null as well as
the number of restrictions to grow with the sample size. By doing so, I show that the
parametric Lagrange Multiplier test can be extended to semiparametric models and serve
as a consistent model specification test for these models. Because series methods have a
projection interpretation, using series methods to nest the null model in the alternative and
estimate the restricted model makes it possible to directly account for the estimation variance
and obtain refined asymptotic results. This refinement, which can be viewed as a degrees of
freedom correction, allows me to derive the asymptotic distribution of the test statistic under
fairly weak rate conditions and leads to very good finite sample performance of the test in
simulations.
Though this refinement is unique to series estimation methods, the proposed test, with a
slight modification, remains valid even if other semiparametric methods, such as kernels or
local polynomials, are used to estimate the null model. Thus, the test applies to a wide class
of semiparametric models, including single index models or partially linear models estimated
using the two-step method proposed in Robinson (1988). Because the degrees of freedom
correction is not available in that case, I have to impose more restrictive rate conditions
on the convergence rates of semiparametric estimators, as well as an additional high level
assumption which may be difficult to verify in practice. Moreover, even though the test
statistics for series estimators and for other semiparametric estimators are asymptotically
equivalent, my simulations show that the test based on the latter is undersized and low-
powered in finite samples.
Intuitively, while the test based on series estimation methods uses the projection property
to directly account for the form of the semiparametric residuals, the test based on general
estimation methods only requires the semiparametric residuals to be close to the true errors.
However, in finite samples, there may be substantial difference between the semiparametric
residuals and the true errors, which the latter approach fails to capture. As a result, even
though both approaches are asymptotically valid, the former one yields an accurate approx-
imation of the finite sample distribution of the test statistic, while the latter one does not
work nearly as well.
Specification tests have long played an important role in theoretical econometrics. Several
papers have studied specification testing when the null model contains a nonparametric
component. Early work on specification testing in semiparametric models required certain
ad hoc modifications, such as sample splitting in Yatchew (1992) and Whang and Andrews
(1993) or randomization in Gozalo (1993). Fan and Li (1996) solve this problem and develop
a kernel-based specification test which can be used to test a semiparametric null hypothesis
against a general nonparametric alternative, but their test requires high-dimensional kernel
smoothing and cannot be implemented with standard econometric software. Lavergne and
Vuong (2000) refine the test of Fan and Li (1996), but they only consider significance testing
in nonparametric models. Kernel-based specification tests are also developed in Chen and
Fan (1999), Delgado and Manteiga (2001), Ait-Sahalia et al. (2001), and Bravo (2012). In
all these papers, the asymptotic distribution of the test statistic is quite complicated and
requires either estimating several nuisance parameters or using the bootstrap, which can be
computationally costly.
I circumvent this problem by relying on series methods to construct the test statistic.
Because the number of series terms grows with the sample size, the usual asymptotic results
for the parametric Lagrange Multiplier test no longer apply. However, it is possible to normalize
the test statistic so that the resulting normalized statistic is asymptotically standard normal.
Therefore, the quantiles of the standard normal distribution can be used as asymptotically
exact critical values for the test.
The proposed test is based on a quadratic form in the restricted model residuals. Thus,
it can be viewed as a nonparametric generalization of the conventional Lagrange Multiplier
test, classical treatments of which include Breusch and Pagan (1980), Engle (1982), and En-
gle (1984). This generalization is not novel in itself, as Hall (1990) and McCulloch and Percy
(2013) also consider Lagrange Multiplier specification tests for parametric against nonpara-
metric models. However, they only consider null hypotheses with fully specified parametric
distributions, while I allow for semiparametric conditional mean models. Moreover, their
asymptotic analysis treats the number of series terms in the alternative model as fixed, and
as a result their tests fail to achieve consistency. In contrast, I develop an asymptotic the-
ory for the case when the number of series terms grows with the sample size and obtain a
consistent specification test.
My work is closely related to the literature on series-based specification tests, such as
de Jong and Bierens (1994), Hong and White (1995), Koenker and Machado (1999), Donald
et al. (2003), and Sun and Li (2006). These papers extend the Conditional Moment test
of Newey (1985) by considering a growing number of unconditional restrictions and thus
achieve consistency. However, they only consider parametric null hypotheses, while my test
can handle a broad class of semiparametric models. Moreover, these papers do not explicitly
develop the degrees of freedom correction. In contrast, I develop this correction for the
case when the semiparametric null model is nested in the nonparametric alternative and
estimated using series methods, and I show that it plays a crucial role in the semiparametric
case, because it allows me to weaken the rate conditions and improves the finite sample
performance of the test.
Hong and White (1995), among others, investigate the behavior of the test statistic for
the parametric null hypotheses under the global alternative and under local alternatives. I
repeat this analysis for semiparametric null hypotheses and reach similar conclusions: the
test statistic diverges to infinity at a rate faster than the parametric rate $n^{1/2}$ under the
global alternative, but the test can only detect local alternatives which approach zero slower
than the parametric rate $n^{-1/2}$. Moreover, both rates are asymptotically the same as in the
parametric case.1
To my knowledge, there are two studies that develop series-based specification tests that
allow for semiparametric null hypotheses. Gao et al. (2002) consider only additive models
and do not explicitly develop a consistent test against a general nonparametric alternative.
Their test is based on the estimates from the unrestricted model and can be viewed as a Wald
type test for variable significance in nonparametric additive conditional mean models. In
contrast, my test is based on the residuals from the restricted semiparametric model and is
consistent against a broad class of fully nonparametric alternatives for the conditional mean
function.
Li et al. (2003) use the approach that was first put forth in Bierens (1982) and Bierens
(1990) and develop a series-based specification test based on weighting the moments, rather
than considering an increasing number of series-based unconditional moments. Their test
can detect local alternatives which approach zero at the parametric rate $n^{-1/2}$, but the
asymptotic distribution of their test statistic depends on nuisance parameters, and it is
difficult to obtain appropriate critical values. They propose using a residual-based wild
bootstrap to approximate the critical values. In contrast, my test statistic is asymptotically
standard normal under the null, so calculating the critical values is straightforward.1

1 By saying "asymptotically the same," I mean that the exact rates in the parametric and semiparametric cases are different but the ratio of the two rates goes to 1 as n → ∞.
Another attractive feature of the proposed test is that the alternative model does not
have to be fully nonparametric. Because series methods make it easy to impose restrictions
on nonparametric models, the proposed test can be used to test a more restricted semipara-
metric model against a broader semiparametric alternative instead of a fully nonparametric
alternative. For instance, a researcher may be willing to compare a partially linear model
$Y_i = X_{1i}'\beta + g(X_{2i}) + \varepsilon_i$ with a varying coefficient model $Y_i = X_{1i}'\beta(X_{2i}) + g(X_{2i}) + \varepsilon_i$ or an
additive model $Y_i = h(X_{1i}) + g(X_{2i}) + \varepsilon_i$. The proposed test can be modified to handle such
comparisons by considering only the unconditional moments based on the series terms that
are present under the alternative.
Restricting the class of alternatives will result in the loss of consistency against a general
nonparametric alternative, because the semiparametric class of alternatives will be unable
to detect certain deviations from the null. However, this will also increase the test power if
the true model does lie in the conjectured semiparametric class. It is possible to use my test
with several alternatives simultaneously, including semiparametric alternatives to improve
the power in certain directions but also including a general nonparametric alternative to
achieve consistency. The Bonferroni correction can then be used to control the test size. I
show in simulations that this approach leads to higher power when the null hypothesis is false
but the true model is still semiparametric without disturbing the size of the test or losing
consistency against a general class of alternatives.
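The combined procedure described above can be sketched in a few lines. The following is an illustrative sketch of mine (the function name and the use of normalized statistics as inputs are my own assumptions, not from the paper):

```python
from statistics import NormalDist

def bonferroni_reject(t_stats, alpha=0.05):
    """Run the test against several alternatives at once: reject the null if any
    normalized statistic exceeds the (1 - alpha/m) standard normal quantile,
    which controls the overall size at level alpha by the Bonferroni bound."""
    cv = NormalDist().inv_cdf(1 - alpha / len(t_stats))
    return any(t > cv for t in t_stats)
```

In the setting of the text, one statistic would come from a semiparametric alternative (to improve power in that direction) and one from the general nonparametric alternative (to retain consistency).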
Finally, similar to the overidentifying restrictions test in GMM models or other omnibus
specification tests, the proposed test is silent on how to proceed if the null is rejected. In this
respect, it is clearly a test of a particular model specification but not a comprehensive model
selection procedure. I plan to study model selection methods for semiparametric and non-
parametric models, such as series-based Bayesian Information Criterion or upward/downward
testing procedures based on the proposed test, in future work.
The remainder of the paper is organized as follows. Section 2 presents a motivating
example from industrial organization. Section 3 introduces the model and describes how to
construct the series-based specification test for semiparametric models. Section 4 develops
the asymptotic theory for the proposed test when series methods are used in estimation.
Section 5 extends the asymptotic theory to the case when other semiparametric methods,
such as kernels or local polynomials, are used in estimation. Section 6 studies the behavior
of the proposed test in simulations. Section 7 applies the proposed test to the Canadian
household gasoline consumption example from Yatchew and No (2001) and shows that the
semiparametric specifications used in that paper are not rejected. Section 8 concludes.
Appendix A collects all tables and figures. Appendix B provides an intuitive derivation
of the proposed test as well as a step-by-step description of how to implement the proposed
test. Appendix C contains proofs of technical results.
2 Motivating Example
Suppose that a researcher has cross-section data on total costs TCi, output Qi, firm
characteristics Zi, and factor prices pLi and pKi for firms in a given industry, and wants to
estimate a cost function. Any cost function C(Q, pL, pK) has to satisfy certain properties,
such as monotonicity, concavity, and homogeneity of degree 1 in factor prices. Because
nonparametric estimation of the cost function may lead to violation of these restrictions, the
researcher may choose a theory-based parametric functional form of the cost function.
If the researcher assumes a Cobb-Douglas production function for firm i, $Q_i = A_i L_i^{\alpha} K_i^{\beta}$,
then under certain assumptions on the unobservables (for details, see Reiss and Wolak (2007)) she
will derive the following relationship:

$$\ln TC_i = C_1 + \gamma \ln p_{Ki} + (1 - \gamma) \ln p_{Li} + \delta \ln Q_i + \varepsilon_i,$$

where $\gamma = \beta/(\alpha + \beta)$, $\delta = 1/(\alpha + \beta)$, and $E[\varepsilon_i \mid \ln p_{Ki}, \ln p_{Li}, Q_i, Z_i] = 0$.
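For completeness, the cost-minimization step behind this expression can be sketched as follows (a standard derivation, not spelled out in the paper; here $p_L$ and $p_K$ are the wage and the rental rate, and the log of the unobserved productivity term $A_i$ is absorbed into the constant and the error):

```latex
\min_{L,K}\; p_L L + p_K K \quad \text{s.t.}\quad A L^{\alpha} K^{\beta} = Q
\;\Longrightarrow\;
TC = c(\alpha,\beta)\,\Big(\tfrac{Q}{A}\Big)^{\frac{1}{\alpha+\beta}}
     \, p_L^{\frac{\alpha}{\alpha+\beta}}\, p_K^{\frac{\beta}{\alpha+\beta}},
\qquad
c(\alpha,\beta) = (\alpha+\beta)\,\alpha^{-\frac{\alpha}{\alpha+\beta}}\,\beta^{-\frac{\beta}{\alpha+\beta}}.
```

Taking logs and collecting constants gives $\ln TC_i = C_1 + \gamma \ln p_{Ki} + (1 - \gamma)\ln p_{Li} + \delta \ln Q_i - \delta \ln A_i$ with $\gamma = \beta/(\alpha+\beta)$ and $\delta = 1/(\alpha+\beta)$, matching the coefficients above.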
This specification is restrictive because it does not allow the parameters to vary across
firms, while economic theory usually provides no reason to believe that the parameters should
be the same for all firms. Hence, a better alternative may be a semiparametric varying
coefficient model, which allows the input prices and output elasticities to vary flexibly with
observable firm characteristics Zi: γ = γ(z) and δ = δ(z) for some unknown functions
γ(·) and δ(·). Then the labor and capital shares can be recovered from these functions as
$\alpha(z) = \frac{1 - \gamma(z)}{\delta(z)}$ and $\beta(z) = \frac{\gamma(z)}{\delta(z)}$. As a result, the total cost relationship will be given by:

$$\ln TC_i = C_1(Z_i) + \gamma(Z_i) \ln p_{Ki} + (1 - \gamma(Z_i)) \ln p_{Li} + \delta(Z_i) \ln Q_i + \varepsilon_i.$$

This model may fit the data better and be more realistic than the fully parametric model.
Moreover, it can easily satisfy the monotonicity, concavity, and homogeneity restrictions.
However, because the semiparametric model is more restricted than the nonparametric model,
the researcher may want to guard against possible misspecification of the semiparametric
model and check whether it provides a reasonable description of the data. The proposed
test allows the researcher to test the semiparametric model against a general nonparametric
model:
$$\ln TC_i = g(\ln Q_i, \ln p_{Li}, \ln p_{Ki}, Z_i) + U_i,$$

with $E[U_i \mid \ln Q_i, \ln p_{Li}, \ln p_{Ki}, Z_i] = 0$.
3 Model and Specification Test
This section describes the model, discusses semiparametric series estimators, and intro-
duces the test statistic.
3.1 Model and Null Hypothesis
Let $(Y_i, X_i')' \in \mathbb{R}^{1+d_x}$, $d_x \in \mathbb{N}$, $i = 1, \dots, n$, be independent and identically distributed
random variables with $E[Y_i^2] < \infty$. Then there exists a measurable function g such that
$g(X_i) = E[Y_i | X_i]$ a.s., so that the nonparametric model can be written as

$$Y_i = g(X_i) + \varepsilon_i, \quad E[\varepsilon_i | X_i] = 0.$$
The goal of this paper is to test the null hypothesis that the conditional mean function
is semiparametric. A generic semiparametric null hypothesis is given by
$$H_0^{SP}: P_X\left(g(X_i) = f(X_i, \theta_0, h_0)\right) = 1 \text{ for some } \theta_0 \in \Theta, h_0 \in \mathcal{H}, \quad (3.1)$$
where f : X ×Θ×H → R is a known function, θ ∈ Θ ⊂ Rd is a finite-dimensional parameter,
and h ∈ H = H1 × ...×Hq is a vector of unknown functions.
The global alternative is
$$H_1: P_X\left(g(X_i) \neq f(X_i, \theta, h)\right) > 0 \text{ for all } \theta \in \Theta, h \in \mathcal{H}. \quad (3.2)$$
When the semiparametric null hypothesis is true, the model becomes
$$Y_i = f(X_i, \theta_0, h_0) + \varepsilon_i, \quad E[\varepsilon_i | X_i] = 0.$$
Because the model is semiparametric while the moment condition E[εi|Xi] = 0 is fully
nonparametric, the null model is overidentified and it is possible to test its specification.
In order to do so, I turn the conditional moment restriction E[εi|Xi] = 0 into a sequence
of unconditional moment restrictions using series methods. But first, I introduce series
approximating functions and semiparametric series estimators.
Let $P^{k_n}(x) = (p_1(x), \dots, p_{k_n}(x))'$ be a $k_n$-dimensional vector of approximating functions
of x, where the number of series terms $k_n$ grows with the sample size n. Possible choices of
series functions include:
(a) Power series. For univariate x, they are given by

$$P^{k_n}(x) = (1, x, \dots, x^{k_n - 1})'. \quad (3.3)$$

Multivariate power series can be formed from products of univariate power series (see
Section 5 in Newey (1997)).

(b) Splines. Let s be a positive scalar giving the order of the spline, and let $t_1, \dots, t_{k_n - s - 1}$
denote knots. Then a vector of spline approximating functions for univariate x is given
by (see Section 2 in Donald et al. (2003)):

$$P^{k_n}(x) = (1, x, \dots, x^s, \mathbf{1}\{x > t_1\}(x - t_1)^s, \dots, \mathbf{1}\{x > t_{k_n - s - 1}\}(x - t_{k_n - s - 1})^s)'. \quad (3.4)$$

Multivariate spline approximating functions can be formed from products of univariate
splines.
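As an illustration, the two univariate bases in (3.3) and (3.4) can be generated as follows. This is a sketch of mine (the function names and the NumPy implementation are not from the paper):

```python
import numpy as np

def power_basis(x, k):
    """Power-series basis (1, x, ..., x^(k-1)) for univariate x, as in (3.3)."""
    x = np.asarray(x)
    return np.column_stack([x**j for j in range(k)])

def spline_basis(x, k, s, knots):
    """Truncated power spline basis of order s with knots t_1 < ... < t_{k-s-1},
    as in (3.4): columns 1, x, ..., x^s followed by 1{x > t}(x - t)^s terms."""
    x = np.asarray(x)
    assert len(knots) == k - s - 1
    polys = [x**j for j in range(s + 1)]
    tails = [np.where(x > t, (x - t)**s, 0.0) for t in knots]
    return np.column_stack(polys + tails)
```

Multivariate versions would be built from tensor products of these columns, as the text notes.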
3.2 Series Estimators
In this section I introduce additional notation and define semiparametric estimators for
the case when series methods are used to estimate the restricted model.2 Suppose that the
researcher writes the fully nonparametric model in the series form:

$$Y_i = g(X_i) + \varepsilon_i \equiv P^{k_n}(X_i)'\beta + R_i^{k_n} + \varepsilon_i,$$

where $R_i^{k_n} \equiv g(X_i) - P^{k_n}(X_i)'\beta$ is the approximation error. Rewrite the model as:

$$Y_i = g(X_i) + \varepsilon_i \equiv f(X_i, \theta^*, h^*) + (g(X_i) - f(X_i, \theta^*, h^*)) + \varepsilon_i \equiv f(X_i, \theta^*, h^*) + d(X_i) + \varepsilon_i,$$

2 For more details and an example, refer to Appendix B.1.
where $f(X_i, \theta^*, h^*)$ is the semiparametric null model, $\theta^*$ and $h^*$ are pseudo-true parameter
values3 that coincide with the true values $\theta_0$ and $h_0$ under the null, and $d(X_i) \equiv g(X_i) - f(X_i, \theta^*, h^*)$ is the misspecification error. The null hypothesis corresponds to
$d(X_i) = 0$ w.p. 1.
In many cases of interest, it is possible to explicitly separate $P^{k_n}(x)$ into two groups of
series terms. The first group, $W^{m_n}(X_i)$, is present both under the null and under the alternative and
approximates $f(X_i, \theta, h)$, while the other group, $T^{r_n}(X_i)$, is present only under the alternative
and approximates $d(X_i)$.
Rewrite the model as
$$Y_i = P^{k_n}(X_i)'\beta + R_i^{k_n} + \varepsilon_i \equiv W^{m_n}(X_i)'\beta_1 + T^{r_n}(X_i)'\beta_2 + R_{1i}^{m_n} + R_{2i}^{r_n} + \varepsilon_i,$$

where $E[\varepsilon_i | X_i] = 0$, $P^{k_n}(X_i) = (W^{m_n}(X_i)', T^{r_n}(X_i)')'$, $\beta = (\beta_1', \beta_2')'$, and $R_i^{k_n} = R_{1i}^{m_n} + R_{2i}^{r_n}$. Here
$R_{1i}^{m_n} \equiv f(X_i, \theta^*, h^*) - W^{m_n}(X_i)'\beta_1$ and $R_{2i}^{r_n} \equiv d(X_i) - T^{r_n}(X_i)'\beta_2$ are the approximation
errors for the two parts of the model.
In addition to $k_n$, which will be required to grow with the sample size to achieve consistency,
$m_n$ can also grow with the sample size, because the null model is allowed to be
semiparametric. The difference $r_n = k_n - m_n$ is the number of series terms that capture the deviation
from the semiparametric model. This number has to grow with the sample size if it is to
approximate any nonparametric alternative. Typically the number of terms in the restricted
semiparametric model, $m_n$, is significantly smaller than the number of terms in the unrestricted
nonparametric model, $k_n$, so that $r_n \to \infty$, $m_n/k_n \to 0$, and $r_n/k_n \to 1$.
The null hypothesis corresponds to $d(X_i) = 0$, which implies $\beta_2 = 0$ and $R_{2i}^{r_n} = 0$. The
null model becomes4

$$Y_i = W_i'\beta_1 + R_i + \varepsilon_i.$$

3 By "parameters," I mean both finite-dimensional parameters θ and unknown functions, or infinite-dimensional parameters, h.
4 To simplify notation, I denote $W_i \equiv W^{m_n}(X_i)$ and $R_i \equiv R_{1i}^{m_n}$, and make their dependence on the sample size implicit. Note that $R_i = R_i^{k_n}$ under the null.
The estimate of $\beta_1$ under the null is5

$$\hat\beta_1 = (W'W)^{-1}W'Y,$$

and the restricted residuals are given by

$$\hat\varepsilon = Y - W\hat\beta_1 = Y - W(W'W)^{-1}W'Y = M_W Y = M_W(\varepsilon + R),$$

where $M_W = I - P_W$, $P_W = W(W'W)^{-1}W'$. The residuals satisfy the condition $W'\hat\varepsilon/n = 0$.
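In matrix form, the restricted fit above amounts to a single least-squares projection. A minimal sketch of mine (it assumes W already stacks the null-model series terms $W_i$ row by row):

```python
import numpy as np

def restricted_residuals(Y, W):
    """Residuals from the restricted series regression:
    eps_hat = M_W Y = Y - W (W'W)^{-1} W'Y, computed via least squares
    rather than an explicit matrix inverse for numerical stability."""
    beta1_hat, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ beta1_hat
```

By construction the output is orthogonal to the columns of W, i.e., $W'\hat\varepsilon/n = 0$, which is the property the test exploits.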
3.3 Test Statistic
Instead of the conditional moment restriction implied by the null hypothesis, $E[\varepsilon_i|X_i] =
E[Y_i - f(X_i, \theta_0, h_0)|X_i] = 0$, the test will be based on the unconditional moment restriction
$E[P^{k_n}(X_i)\varepsilon_i] = 0$. I will show that requiring $k_n$ to grow with the sample size and $P^{k_n}(x)$
to approximate any unknown function sufficiently well6 will allow me to obtain a consistent
specification test based on a growing number of unconditional moment restrictions.
The test statistic is based on the sample analog of the population moment condition
$E[P^{k_n}(X_i)\varepsilon_i] = 0$. In the homoskedastic case, the test statistic resembles the LM test statistic
and is given by

$$\hat\xi = \hat\varepsilon' P (\hat\sigma^2 P'P)^{-1} P'\hat\varepsilon, \quad (3.5)$$

where $P = (P^{k_n}(X_1), \dots, P^{k_n}(X_n))'$, $\hat\varepsilon = (\hat\varepsilon_1, \dots, \hat\varepsilon_n)'$, $\hat\varepsilon_i = Y_i - f(X_i, \hat\theta, \hat{h})$ are the semiparametric
residuals, and $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/n$. In the heteroskedastic case, the test statistic is modified
appropriately:

$$\hat\xi_{HC} = \hat\varepsilon' P (P'\hat\Sigma P)^{-1} P'\hat\varepsilon, \quad (3.6)$$

5 For any vector $V_i$, let $V = (V_1, \dots, V_n)'$ be a matrix which stacks all observations together.
6 The notion of sufficiently well is made precise later.
where $\hat\Sigma = \mathrm{diag}(\hat\varepsilon_i^2)$.
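A direct implementation of (3.6) is straightforward. The sketch below (my own naming) avoids forming the n × n matrix $\hat\Sigma$ explicitly:

```python
import numpy as np

def xi_hc(eps_hat, P):
    """Heteroskedasticity-robust statistic (3.6):
    xi_HC = eps' P (P' Sigma_hat P)^{-1} P' eps with Sigma_hat = diag(eps_i^2).
    The middle matrix is built by row-scaling P, which equals P' Sigma_hat P."""
    Pe = P.T @ eps_hat
    mid = (P * (eps_hat ** 2)[:, None]).T @ P
    return float(Pe @ np.linalg.solve(mid, Pe))
```

As a quadratic form with a positive semidefinite middle matrix, the statistic is nonnegative by construction.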
Because the dimensionality $k_n$ of P grows with the sample size, a normalization is needed
to obtain a convergence in distribution result. I show that the test statistic

$$t_{\tau_n} = \frac{\hat\xi - \tau_n}{\sqrt{2\tau_n}} \quad (3.7)$$

for a suitable sequence $\tau_n \to \infty$ as $n \to \infty$ works and is asymptotically pivotal. I will discuss
the choice of $\tau_n$ in the next section.
Remark 1 (Computing the Test Statistic $\hat\xi$). In the homoskedastic case, there are two simple
regression-based ways to compute $\hat\xi$.

1. Step 1. Estimate the restricted semiparametric model, obtain the residuals $\hat\varepsilon_i$.
Step 2. Regress the residuals $\hat\varepsilon_i$ on the series functions $P^{k_n}(X_i)$, compute $R^2$ from this
regression. Then $\hat\xi = nR^2$.

2. Step 1. Estimate the semiparametric model using 2SLS, with $W^{m_n}(X_i)$ as regressors
and $P^{k_n}(X_i) = (W^{m_n}(X_i)', T^{r_n}(X_i)')'$ as instrumental variables.
Step 2. Obtain the 2SLS residuals $\hat\varepsilon_i$, compute $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/n$.
Step 3. Compute the overidentifying restrictions test statistic $J = \hat\varepsilon' P (\hat\sigma^2 P'P)^{-1} P'\hat\varepsilon$.

Because in this case the 2SLS estimates are equal to the OLS estimates, $\hat\xi = J$.
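The first, nR² route of Remark 1 takes only a few lines. Below is an illustrative sketch (function names are my own), together with the normalization (3.7); it assumes P contains a constant, so the restricted residuals have mean zero and R² can be computed without demeaning:

```python
import numpy as np

def lm_stat(eps_hat, P):
    """First route of Remark 1: xi_hat = n * R^2 from regressing the restricted
    residuals on the full set of series functions P. Assumes P contains a
    constant, so the residuals have (numerically) zero mean."""
    n = len(eps_hat)
    fitted = P @ np.linalg.lstsq(P, eps_hat, rcond=None)[0]
    r2 = (fitted @ fitted) / (eps_hat @ eps_hat)
    return n * r2

def normalized_stat(xi, tau):
    """Normalization (3.7): t_tau = (xi - tau) / sqrt(2 * tau)."""
    return (xi - tau) / (2.0 * tau) ** 0.5
```

With series estimation of the null model, the next section takes $\tau_n = r_n$, the number of series terms present only under the alternative.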
4 Asymptotics with Series Estimation Methods
In this section I derive the asymptotic properties of the proposed test. I focus here
on the case when series methods are used to nest the null model in the alternative and
estimate the restricted model. This is because series methods have a projection interpretation,
which makes it possible to directly account for the estimation variance. As a result, the
rate conditions required for the validity of the proposed test are mild. The use of other
semiparametric estimators, such as kernels or local polynomials, does not allow for the degrees
of freedom correction and leads to more restrictive rate conditions. The results on the
asymptotic behavior of the proposed test in a general case are presented in Section 5.
First, I impose basic assumptions on the data generating process.
Assumption 1. $(Y_i, X_i')' \in \mathbb{R}^{1+d_x}$, $d_x \in \mathbb{N}$, $i = 1, \dots, n$ are i.i.d. random draws of the random
variables $(Y, X)$, and the support of X, $\mathcal{X}$, is a compact subset of $\mathbb{R}^{d_x}$.
Next, I define the error term and impose two moment conditions on it.
Assumption 2. Let $\varepsilon_i = Y_i - E[Y_i|X_i]$. The following two conditions hold:

(a) $0 < \sigma^2(x) = E[\varepsilon_i^2|X_i = x] < \infty$.

(b) $E[\varepsilon_i^4|X_i]$ is bounded.
The following assumption deals with the behavior of the approximating series functions.
From now on, let $||A|| = [\mathrm{tr}(A'A)]^{1/2}$ be the Euclidean norm of a matrix A.
Assumption 3. (Donald et al. (2003), Assumption 2)
For each k there are a constant scalar ζ(k) and a matrix B such that $\tilde{P}^k(x) = B P^k(x)$ for
all $x \in \mathcal{X}$, $\sup_{x \in \mathcal{X}} ||\tilde{P}^k(x)|| \leq \zeta(k)$, $E[\tilde{P}^k(X_i)\tilde{P}^k(X_i)']$ has smallest eigenvalue bounded away
from zero uniformly in k, and $\sqrt{k} \leq \zeta(k)$.
Remark 2 (ζ(k) for Common Basis Functions). Explicit expressions for ζ(k) are available for
certain families of basis approximating functions. For instance, it has been shown (see, e.g.,
Section 15.1.1 in Li and Racine (2007)) that under additional assumptions, $\zeta(k) = O(k^{1/2})$
for splines and $\zeta(k) = O(k)$ for power series.
The following lemma shows that a convenient normalization can be used. This normal-
ization is typical in the literature on series methods.
Lemma 1. (Donald et al. (2003), Lemma A.2)
If Assumption 3 is satisfied then it can be assumed without loss of generality that $\tilde{P}^k(x) =
P^k(x)$ and that $E[P^k(X_i)P^k(X_i)'] = I_k$.
Next, I impose an assumption which requires the approximation error for the semipara-
metric model to vanish sufficiently fast under the null.
Assumption 4. Suppose that $H_0$ holds. There exists $\alpha > 0$ such that

$$\sup_{x \in \mathcal{X}} |f(x, \theta_0, h_0) - W^{m_n}(x)'\beta_1| = O(m_n^{-\alpha}).$$
The following lemma provides the rates of convergence for semiparametric series estima-
tors under the null hypothesis.
Lemma 2 (Li and Racine (2007), Theorem 15.1). Suppose that $H_0$ holds. Let $\hat{g}(x) =
f(x, \hat\theta, \hat{h}) = W^{m_n}(x)'\hat\beta_1$, $g_i = g(X_i) = f(X_i, \theta_0, h_0)$, and $\hat{g}_i = \hat{g}(X_i)$. Under Assumptions 1,
3, and 4, the following is true:

(a) $\sup_{x \in \mathcal{X}} |\hat{g}(x) - g(x)| = O_p\left(\zeta(m_n)(\sqrt{m_n/n} + m_n^{-\alpha})\right)$

(b) $\frac{1}{n}\sum_{i=1}^n (\hat{g}_i - g_i)^2 = O_p(m_n/n + m_n^{-2\alpha})$

(c) $\int (\hat{g}(x) - g(x))^2 dF(x) = O_p(m_n/n + m_n^{-2\alpha})$
Remark 3 (Convergence Rates for Semiparametric Series Estimators). The exact rates given
in Lemma 2 are derived in Li and Racine (2007) for series estimators in nonparametric models.
However, as other examples in Chapter 15 in Li and Racine (2007) show, similar rates can be
derived in a wide class of semiparametric models, such as partially linear, varying coefficient,
or additive models (see Theorems 15.5 and 15.7 in Li and Racine (2007)). In each of these
cases, it is possible to replace the rates in Lemma 2 with the rates for the particular case of
interest. With appropriately modified rate conditions and assumptions, the results developed
below will continue to hold.
Given the assumptions and results above, it is now possible to derive the asymptotic
distribution of the test statistic (3.7) under the null. I separately consider cases when the
errors are homoskedastic and heteroskedastic.
4.1 Homoskedastic Case
The limiting distribution of the test statistic under the null is given by the next result:
Theorem 1. Assume that series methods are used to estimate the restricted model. Further,
assume that Assumptions 1, 2, 3, and 4 are satisfied. Moreover, $\sigma^2(x) = \sigma^2$, $0 < \sigma^2 < \infty$,
for all $x \in \mathcal{X}$, and the following rate conditions hold:

$$\zeta(k_n)^2 k_n r_n^{1/2}/n \to 0 \quad (4.1)$$
$$\zeta(r_n) r_n / n^{1/2} \to 0 \quad (4.2)$$
$$\zeta(k_n) m_n^{1/2} k_n^{1/2} / n^{1/2} \to 0 \quad (4.3)$$
$$n m_n^{-2\alpha} / r_n^{1/2} \to 0 \quad (4.4)$$
$$\zeta(r_n)^2 / n^{1/2} \to 0 \quad (4.5)$$

Then

$$t_{r_n} = \frac{\hat\xi - r_n}{\sqrt{2 r_n}} \xrightarrow{d} N(0, 1),$$

where $\hat\xi$ is as in Equation 3.5.
Remark 4 (Rate Conditions under Homoskedasticity). Conditions 4.1–4.5 are sufficient
conditions for the result of the theorem to hold. Conditions 4.1 and 4.2 are used to bound the
error from replacing $\hat\sigma^2 T'M_W T/n$ with $\sigma^2 E[T_i T_i']$. Conditions 4.3 and 4.4 are used to bound the
error from approximating unknown functions with finite series expansions. Condition 4.5 is
used to obtain the convergence in distribution result by applying Lemma A.1 in the Appendix.
Remark 5 (Examples of Permissible Choices of $m_n$, $r_n$, and $k_n$ under Homoskedasticity). As
discussed above, typically $\zeta(k) = O(k^{1/2})$ for splines and $\zeta(k) = O(k)$ for power series. It can
be shown that for splines, the rates $k_n = O(n^{2/7})$, $r_n = O(n^{2/7})$, $m_n = O(n^{1/4})$ are permissible
in the sense of rate conditions 4.1–4.5 if $\alpha \geq 4$. For power series, the rates $k_n = O(n^{2/9})$,
$r_n = O(n^{2/9})$, $m_n = O(n^{1/5})$ are permissible in the sense of rate conditions 4.1–4.5 if $\alpha \geq 5$.
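To make these rates concrete, one might pick series terms as follows. This is an illustrative sketch of mine; the proportionality constants are set to one, which the asymptotic theory does not pin down:

```python
def permissible_terms(n, basis="splines"):
    """Series-term choices suggested by Remark 5: k_n ~ n^{2/7}, m_n ~ n^{1/4}
    for splines; k_n ~ n^{2/9}, m_n ~ n^{1/5} for power series.
    Proportionality constants are set to one for illustration only."""
    if basis == "splines":
        k, m = round(n ** (2 / 7)), round(n ** (1 / 4))
    else:  # power series
        k, m = round(n ** (2 / 9)), round(n ** (1 / 5))
    k = max(k, m + 1)       # keep at least one extra direction, so r_n >= 1
    return m, k - m, k      # (m_n, r_n, k_n)
```

In practice the constants matter in finite samples, so these formulas should be read only as growth rates, not as a data-driven selection rule.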
The following corollary shows that using a χ2 approximation with a growing number of
degrees of freedom directly also results in an asymptotically exact test in the homoskedastic
case.
Corollary 1. If the conditions of Theorem 1 hold, then

$$P\left(\hat\xi \geq \chi^2_{r_n}(1 - \alpha)\right) = P\left(\frac{\hat\xi - r_n}{\sqrt{2 r_n}} \geq \frac{\chi^2_{r_n}(1 - \alpha) - r_n}{\sqrt{2 r_n}}\right) \to \alpha$$
Thus, there are two ways to construct an asymptotically exact specification test. Which
of the two performs better in finite samples is studied in Section 6.
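Numerically, the two constructions converge quickly: the $\chi^2_{r_n}$ critical value, normalized as in Corollary 1, approaches the standard normal critical value as $r_n$ grows. A short sketch using scipy (the $r_n$ values are mine, for illustration only):

```python
from scipy.stats import chi2, norm

level = 0.05
z = norm.ppf(1 - level)  # standard normal critical value, roughly 1.645
for rn in (65, 1_000, 100_000):
    # chi-square critical value, normalized as in Corollary 1
    c = (chi2.ppf(1 - level, df=rn) - rn) / (2 * rn) ** 0.5
    print(rn, round(c, 4))
# the normalized critical value decreases toward z as rn grows, so the
# chi^2_{rn} test and the N(0,1)-based test agree asymptotically
```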
4.2 Heteroskedastic Case
The limiting distribution of the heteroskedasticity robust test statistic under the null is
given by the next result:
Theorem 2. Assume that series methods are used to estimate the restricted model. Further,
assume that Assumptions 1, 2, 3, and 4 are satisfied. Moreover, the following rate conditions
hold:
$$(m_n/n + m_n^{-2\alpha})\,\zeta(r_n)^2 r_n^{1/2} \to 0 \quad (4.6)$$
$$\zeta(r_n) r_n/n^{1/2} \to 0 \quad (4.7)$$
$$\zeta(k_n) m_n^{1/2} k_n^{1/2}/n^{1/2} \to 0 \quad (4.8)$$
$$n m_n^{-2\alpha}/r_n^{1/2} \to 0 \quad (4.9)$$
$$\zeta(r_n)^2/n^{1/2} \to 0 \quad (4.10)$$
Also assume that $\|\hat\Omega - \Omega\| = o_p(r_n^{-1/2})$, where $\Omega = T'\Sigma T/n$ and
$$\hat\Omega = T'\hat\Sigma T/n - (T'\hat\Sigma W/n)(W'\hat\Sigma W/n)^{-1}(W'\hat\Sigma T/n).$$
Then
$$t_{HC,r_n} = \frac{\xi_{HC} - r_n}{\sqrt{2r_n}} \xrightarrow{d} N(0,1),$$
where $\xi_{HC}$ is as in Equation 3.6.
Remark 6 (Rate Conditions under Heteroskedasticity). Conditions 4.6–4.10 are sufficient
conditions for the result of the theorem to hold. Conditions 4.6 and 4.7 are used to bound the
error from replacing $T'\Sigma T/n$ with $E[\varepsilon_i^2 T_iT_i']$.⁷ Conditions 4.8 and 4.9 are used to bound the
error from approximating unknown functions with finite series expansions. Condition 4.10 is
used to obtain the convergence in distribution result by applying Lemma A.1 in the Appendix.
Remark 7 (Examples of Permissible Choices of $m_n$, $r_n$, and $k_n$ under Heteroskedasticity). It
can be shown that the same rates as in Remark 5 ($k_n = O(n^{2/7})$, $r_n = O(n^{2/7})$, $m_n = O(n^{1/4})$
for splines if $\alpha \geq 4$; $k_n = O(n^{2/9})$, $r_n = O(n^{2/9})$, $m_n = O(n^{1/5})$ for power series if $\alpha \geq 5$) are
permissible in the sense of rate conditions 4.6–4.10.
However, it is not clear whether these rates satisfy the high-level assumption $\|\hat\Omega - \Omega\| = o_p(r_n^{-1/2})$. It is possible that this assumption imposes stronger rate restrictions, which are
difficult to verify.
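Whatever rates are required, $\hat\Omega$ itself is easy to compute from the residuals. Below is a numpy sketch on simulated inputs (the dimensions and data are mine, not the paper's); as a Schur complement of a positive semidefinite matrix, $\hat\Omega$ is symmetric and positive semidefinite by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_n, r_n = 500, 4, 10
W = rng.standard_normal((n, m_n))   # restricted-model series terms W^{m_n}(X_i)
T = rng.standard_normal((n, r_n))   # additional series terms T^{r_n}(X_i)
eps_hat = rng.standard_normal(n)    # stand-in for semiparametric residuals
S = np.diag(eps_hat ** 2)           # Sigma_hat = diag(eps_hat_i^2)

# Omega_hat = T'ST/n - (T'SW/n)(W'SW/n)^{-1}(W'ST/n)
Omega = T.T @ S @ T / n - (T.T @ S @ W / n) @ np.linalg.solve(
    W.T @ S @ W / n, W.T @ S @ T / n
)

print(np.allclose(Omega, Omega.T))               # symmetric
print(np.linalg.eigvalsh(Omega).min() >= -1e-8)  # positive semidefinite
```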
The following corollary shows that using a χ2 approximation with a growing number of
degrees of freedom directly also results in an asymptotically exact test in the heteroskedastic
case.
Corollary 2. If the conditions of Theorem 2 hold, then
$$P\left(\xi_{HC} \geq \chi^2_{r_n}(1-\alpha)\right) = P\left(\frac{\xi_{HC} - r_n}{\sqrt{2r_n}} \geq \frac{\chi^2_{r_n}(1-\alpha) - r_n}{\sqrt{2r_n}}\right) \to \alpha$$
⁷Instead of imposing primitive conditions under which $\Omega$ can be replaced with $\hat\Omega$, I directly impose the
requirement that $\|\hat\Omega - \Omega\| = o_p(r_n^{-1/2})$. This is a high-level condition, but deriving sufficient primitive
conditions for it is difficult.
4.3 Behavior of Test Statistic under a Global Alternative
In this subsection, I study the behavior of the test statistic under a global alternative.
To obtain a consistent specification test, I will rely on the following result from Donald
et al. (2003) that shows that the conditional mean restriction E[εi|Xi] = 0 is equivalent to a
sequence of unconditional moment restrictions.
Assumption 5 (Donald et al. (2003), Assumption 1). Assume that $E[P^k(X_i)P^k(X_i)']$ is
finite for all $k$, and for any $a(x)$ with $E[a(X_i)^2] < \infty$ there are $k \times 1$ vectors $\gamma_k$ such that,
as $k \to \infty$,
$$E[(a(X_i) - P^k(X_i)'\gamma_k)^2] \to 0$$
Lemma 3 (Donald et al. (2003), Lemma 2.1). Suppose that Assumption 5 is satisfied and
$E[\varepsilon_i^2]$ is finite. If $E[\varepsilon_i|X_i] = 0$, then $E[P^k(X_i)\varepsilon_i] = 0$ for all $k$. Furthermore, if
$E[\varepsilon_i|X_i] \neq 0$, then $E[P^k(X_i)\varepsilon_i] \neq 0$ for all $k$ large enough.
This is a population result in the sense that it does not involve the sample size $n$. In
order to use this result in practice, I require the number of series terms used to construct the
test statistic, $k_n$, to grow with the sample size. By doing so, I ensure that the unconditional
moment restriction $E[P^{k_n}(X_i)\varepsilon_i] = 0$, on which the test is based, becomes equivalent to the
conditional moment restriction $E[\varepsilon_i|X_i] = 0$. Thus, the test will be consistent against a wide
class of general nonparametric alternatives.
In order to analyze the behavior of the test under a global alternative, I first introduce
some notation. The true model is nonparametric:
$$Y_i = g(X_i) + \varepsilon_i, \quad E[\varepsilon_i|X_i] = 0$$
An alternative way to write this model is
$$Y_i = f(X_i, \theta^*, h^*) + \varepsilon_i^*,$$
where $\theta^*$ and $h^*$ are pseudo-true parameter values and $\varepsilon_i^* \equiv \varepsilon_i + (g(X_i) - f(X_i, \theta^*, h^*)) \equiv \varepsilon_i + d(X_i)$ is a composite error term. The pseudo-true parameter values minimize
$$E[(g(X_i) - f(X_i, \theta, h))^2]$$
over a suitable parameter space, and I assume that the semiparametric estimates under
misspecification are consistent for the pseudo-true values.
Note that the model can be written as
$$Y_i = W^{m_n}(X_i)'\beta_1^* + \varepsilon_i^* + R_i^*,$$
where $R_i^* \equiv f(X_i, \theta^*, h^*) - W^{m_n}(X_i)'\beta_1^*$. The pseudo-true parameter value $\beta_1^*$ solves the
moment condition $E[W^{m_n}(X_i)(Y_i - W^{m_n}(X_i)'\beta_1^*)] = 0$, and the semiparametric estimator $\hat\beta_1$
solves its sample analog $W'(Y - W\hat\beta_1)/n = 0$.
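In other words, under the global alternative $\hat\beta_1$ is simply the coefficient from a least squares regression of $Y$ on the series terms $W$. A minimal numpy sketch of the sample moment condition (the data and dimensions are simulated, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m_n = 1_000, 3
W = rng.standard_normal((n, m_n))                  # series terms W^{m_n}(X_i)
Y = W @ np.array([1.0, -0.5, 2.0]) + rng.standard_normal(n)

# beta_1_hat solves the sample analog W'(Y - W beta)/n = 0, i.e. OLS
beta_hat = np.linalg.solve(W.T @ W, W.T @ Y)

# the sample moment condition holds at beta_hat up to numerical error
print(np.abs(W.T @ (Y - W @ beta_hat) / n).max())
```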
The following theorem provides the divergence rate of the test statistic under the global
alternative.
Theorem 3. Assume that series methods are used in estimation. Let $\Omega^* = E[\varepsilon_i^{*2} T_iT_i']$. In
the homoskedastic case, let $\hat\Omega = \hat\sigma^2 T'M_W T/n$, and in the heteroskedastic case, let
$$\hat\Omega = T'\hat\Sigma T/n - (T'\hat\Sigma W/n)(W'\hat\Sigma W/n)^{-1}(W'\hat\Sigma T/n),$$
where $\hat\Sigma = \mathrm{diag}(\hat\varepsilon_i^2)$.
Suppose that $\sup_{x \in \mathcal{X}} |f(x, \theta^*, h^*) - W^{m_n}(x)'\beta_1^*| \to 0$, $\|\hat\Omega - \Omega^*\| \xrightarrow{p} 0$, the smallest eigenvalue
of $\Omega^*$ is bounded away from zero, $m_n \to \infty$, $r_n \to \infty$, $r_n/n \to 0$, and $E[\varepsilon_i^* T_i']\Omega^{*-1}E[T_i \varepsilon_i^*] \to \Delta$,
where $\Delta$ is a constant. Then under homoskedasticity
$$\frac{\sqrt{r_n}}{n}\cdot\frac{\xi - r_n}{\sqrt{2r_n}} \xrightarrow{p} \Delta/\sqrt{2},$$
and under heteroskedasticity
$$\frac{\sqrt{r_n}}{n}\cdot\frac{\xi_{HC} - r_n}{\sqrt{2r_n}} \xrightarrow{p} \Delta/\sqrt{2}.$$
The divergence rate of the test statistic under the alternative is n/√rn. However, in
most cases, the restricted semiparametric model is of lower dimension than the unrestricted
nonparametric model, so that mn/kn → 0 and rn/kn → 1. Thus, the divergence rate in the
semiparametric case discussed here is the same as in the parametric case in Hong and White
(1995) and Donald et al. (2003), so the fact that the null hypothesis is semiparametric does
not affect the global power of the test.
4.4 Behavior of Test Statistic under Local Alternatives
In this section, I analyze the power of the proposed test against local alternatives of the
form
H1n : gn(Xi) = f(Xi, θ∗, h∗) + (r1/4n /n1/2)d0(Xi),
where d0 is square integrable on X , E[d0(Xi)] = 0, and E[f(Xi, θ∗, h∗)d0(Xi)] = 0. The
null hypothesis corresponds to d0 = 0. I need to slightly modify the assumptions from the
precious sections.
First, I impose the following condition. It is not primitive; however, an analogous result
for ε instead of d is derived in the proof of Theorem 1. Thus, this assumption is likely to
hold under the primitive conditions I impose.
Assumption 6. Assume that $\|n^{-1}T'd\| = O_p(\sqrt{r_n}/n)$ and $\|n^{-1}W'd\| = O_p(\sqrt{r_n}/n)$.
Next, I impose the following assumption, which requires the basis functions $T^{r_n}(X_i)$ to
approximate $d_0(X_i)$ sufficiently well. Note that because $d_0(X_i)$ is orthogonal to the semiparametric
part of the model, $f(X_i, \theta^*, h^*)$, it is also orthogonal to $W^{m_n}(X_i)$, and only $T^{r_n}(X_i)$
is needed to approximate it.
Assumption 7. There exists $\alpha_d > 0$ such that
$$\sup_{x \in \mathcal{X}} |d_0(x) - T^{r_n}(x)'\pi| = O(r_n^{-\alpha_d})$$
Next, I apply the following result to obtain the convergence rates of series estimators
of d0(x). These estimators are infeasible, because in practice d0(x) is unknown, but these
convergence rates will be used in the proofs.
Lemma 4 (Li and Racine (2007), Theorem 15.1). Let $\hat{d}(x) = T^{r_n}(x)'\hat\pi$, $\hat\pi = (T'T)^{-1}T'd$,
$d_i = d_0(X_i)$, and $\hat{d}_i = \hat{d}(X_i)$. Under Assumptions 1, 3, and 7, the following is true:
(a) $\sup_{x \in \mathcal{X}} |\hat{d}(x) - d_0(x)| = O_p\left(\zeta(r_n)(\sqrt{r_n/n} + r_n^{-\alpha_d})\right)$
(b) $\frac{1}{n}\sum_{i=1}^n (\hat{d}_i - d_i)^2 = O_p(r_n/n + r_n^{-2\alpha_d})$
(c) $\int (\hat{d}(x) - d_0(x))^2\,dF(x) = O_p(r_n/n + r_n^{-2\alpha_d})$
Next, I impose an assumption which parallels Assumption 4.
Assumption 8. There exists $\alpha > 0$ such that
$$\sup_{x \in \mathcal{X}} |f(x, \theta^*, h^*) - W^{m_n}(x)'\beta_1^*| = O(m_n^{-\alpha})$$
Now I impose an assumption which mimics the results of Lemma 2 and requires the
semiparametric estimators to converge to the pseudo-true values under the local alternative
at the same rates as to the true values under the null.
Assumption 9. Let $\hat{f}(x) = f(x, \hat\theta, \hat{h}) = W^{m_n}(x)'\hat\beta_1$, $f^*(x) = f(x, \theta^*, h^*)$, $\hat{f}_i = \hat{f}(X_i)$, and
$f_i^* = f^*(X_i)$. For some $\alpha > 0$, the following conditions hold:
(a) $\sup_{x \in \mathcal{X}} |\hat{f}(x) - f^*(x)| = O_p\left(\zeta(m_n)(\sqrt{m_n/n} + m_n^{-\alpha})\right)$
(b) $\frac{1}{n}\sum_{i=1}^n (\hat{f}_i - f_i^*)^2 = O_p(m_n/n + m_n^{-2\alpha})$
(c) $\int (\hat{f}(x) - f^*(x))^2\,dF(x) = O_p(m_n/n + m_n^{-2\alpha})$
The next result gives the behavior of the test under local alternatives. For simplicity, I
only treat the homoskedastic case, but a similar result can be obtained in the heteroskedastic
case at the expense of additional, less transparent, assumptions and tedious derivations.
Theorem 4. Assume that series methods are used to estimate the restricted model. Further,
assume that Assumptions 1, 2, 3, 6, 7, 8, and 9 are satisfied. Moreover, $\sigma^2(x) = \sigma^2$,
$0 < \sigma^2 < \infty$, for all $x \in \mathcal{X}$, and rate conditions 4.1–4.5 hold.
Then
$$t_{r_n} = \frac{\xi - r_n}{\sqrt{2r_n}} \xrightarrow{d} N(\delta, 1),$$
where $\xi$ is as in Equation 3.5 and $\delta = E[d_i^2]/\sigma^2$.
5 Asymptotics with General Semiparametric Methods
One may expect that the same type of specification test, based on the quadratic form
$\xi = \hat\varepsilon'P(\hat\sigma^2 P'P)^{-1}P'\hat\varepsilon$, may remain valid even if other semiparametric methods, such as
kernels, are used to estimate the restricted model, or if the null model is not nested in the
alternative. In this section I develop the asymptotic theory for the proposed test in this
general case.
Because general semiparametric methods, unlike series methods, make it difficult to ex-
plicitly nest the restricted model in the unrestricted one, rn, the number of restrictions
imposed by the null hypothesis on a general nonparametric model, is undefined. Thus, the
only possible normalization in this case is τn = kn, the number of series terms used to con-
struct the test statistic. Moreover, because general semiparametric methods do not have a
projection interpretation, it is problematic to directly account for the estimation variance
and derive a degrees of freedom correction. Thus, I will have to impose stronger assumptions
than in the series estimation case.
First, I impose a generic assumption on the convergence rate of the semiparametric conditional
mean function estimator to the true conditional mean function. In many semiparametric
models, $\hat\theta$ is $\sqrt{n}$-consistent and $\hat{h}$ obeys certain convergence rates, so semiparametric
estimators often satisfy this assumption for appropriately chosen $\eta_n$ and $\psi_n$.
Assumption 10. Let $\hat{g}(x) = f(x, \hat\theta, \hat{h})$, $g_i = g(X_i)$, and $\hat{g}_i = \hat{g}(X_i)$. For some $\eta_n \to 0$ and
$\psi_n \to 0$ as $n \to \infty$, the following conditions hold:
(a) $\sup_{x \in \mathcal{X}} |\hat{g}(x) - g(x)| = O_p(\eta_n)$
(b) $\frac{1}{n}\sum_{i=1}^n (\hat{g}_i - g_i)^2 = O_p(\psi_n)$
(c) $\int (\hat{g}(x) - g(x))^2\,dF(x) = O_p(\psi_n)$
Next, I impose a high-level assumption on how the error term in the model interacts with
the estimation error:
Assumption 11. Assume that
$$\sum_i \varepsilon_i(\hat{g}_i - g_i) = o_p(k_n^{1/2})$$
This assumption is used to bound the difference between the quadratic form in the semiparametric
residuals, $\hat\varepsilon'P(\hat\sigma^2 P'P)^{-1}P'\hat\varepsilon$, and the quadratic form in the true regression errors,
$\varepsilon'P(\sigma^2 P'P)^{-1}P'\varepsilon$. While this is a straightforward task in the parametric case or in the
semiparametric case with series estimation methods, it is much more involved in the general
case. The primitive conditions for this assumption may differ depending on the semiparametric
null model and the particular method used to estimate it. I plan to rigorously derive
the primitive conditions for important special cases in future work. In the current version
of the paper, I justify this assumption by providing an outline of the proof for leave-one-out
kernel estimators in Remark A.2 in the Appendix.
As before, I separately treat the homoskedastic and heteroskedastic cases.
5.1 Homoskedastic Errors
The limiting distribution of the test statistic under the null is given by the next result:
Theorem 5. Assume that Assumptions 1, 2, 3, 10, and 11 are satisfied. Moreover, $\sigma^2(x) = \sigma^2$,
$0 < \sigma^2 < \infty$, for all $x \in \mathcal{X}$, and the following rate conditions hold:
$$\psi_n k_n^{3/2} \to 0 \quad (5.1)$$
$$\zeta(k_n) k_n/n^{1/2} \to 0 \quad (5.2)$$
$$n\psi_n/k_n^{1/2} \to 0 \quad (5.3)$$
$$\zeta(k_n)^2/n^{1/2} \to 0 \quad (5.4)$$
Then
$$t_{k_n} = \frac{\xi - k_n}{\sqrt{2k_n}} \xrightarrow{d} N(0,1),$$
where $\xi$ is as in Equation 3.5.
Remark 8 (Rate Conditions for General Semiparametric Estimators under Homoskedasticity).
Conditions 5.1–5.4 are sufficient conditions for the result of the theorem to hold.
Conditions 5.1 and 5.2 are used to bound the error from replacing $\hat\sigma^2 P'P/n$ with $\sigma^2 E[P_iP_i']$.
Condition 5.3 is used to bound the error from replacing $\hat\varepsilon$ with $\varepsilon$. Condition 5.4 is used to
obtain the convergence in distribution result by applying Lemma A.7 in the Appendix.
Theorem 5 does not explicitly account for the estimation error $(\hat\varepsilon - \varepsilon)$; instead, it imposes
rate conditions that make it asymptotically negligible. As a result, the rate conditions
imposed in Theorem 5 are substantially stronger than those in Theorem 1. Intuitively, because
Theorem 5 does not directly account for the form of the semiparametric estimator, it has
to impose strong restrictions on its convergence rate so that asymptotically the estimation
error is negligible.
Remark 9 (Examples of Permissible Choices of $m_n$ and $k_n$ under Homoskedasticity in the
General Case). Suppose that series methods are used for estimation but are paired with rate
conditions 5.1–5.4 instead of 4.1–4.5. Then the rates given in Remark 5 will not work. It can
be shown that, for splines, $k_n = O(n^{2/7})$ would require $m_n = o(n^{1/7})$ and $\alpha \geq 7$. For example,
$k_n = O(n^{2/7})$ and $m_n = O(n^{2/15})$ are permissible in the sense of rate conditions 5.1–5.4 if
$\alpha \geq 7$. For power series, $k_n = O(n^{2/9})$ would require $m_n = o(n^{1/9})$ and $\alpha \geq 9$. For example,
$k_n = O(n^{2/9})$ and $m_n = O(n^{2/19})$ are permissible in the sense of rate conditions 5.1–5.4 if
$\alpha \geq 9$.
5.2 Heteroskedastic Errors
Now I deal with heteroskedastic errors. The limiting distribution of the heteroskedasticity
robust test statistic under the null is given by the next result:
Theorem 6. Assume that Assumptions 1, 2, 3, 10, and 11 are satisfied. Moreover, the
following rate conditions hold:
$$\psi_n k_n^{1/2}\zeta(k_n)^2 \to 0 \quad (5.5)$$
$$\zeta(k_n) k_n/n^{1/2} \to 0 \quad (5.6)$$
$$n\psi_n/k_n^{1/2} \to 0 \quad (5.7)$$
$$\zeta(k_n)^2/n^{1/2} \to 0 \quad (5.8)$$
Then
$$t_{HC,k_n} = \frac{\xi_{HC} - k_n}{\sqrt{2k_n}} \xrightarrow{d} N(0,1),$$
where $\xi_{HC}$ is as in Equation 3.6.
Remark 10 (Rate Conditions for General Semiparametric Estimators under Heteroskedasticity).
Conditions 5.5–5.8 are sufficient conditions for the result of the theorem to hold.
Conditions 5.5 and 5.6 are used to bound the error from replacing $P'\hat\Sigma P/n$ with $E[\varepsilon_i^2 P_iP_i']$.
Condition 5.7 is used to bound the error from replacing $\hat\varepsilon$ with $\varepsilon$. Condition 5.8 is used to
obtain the convergence in distribution result by applying Lemma A.7 in the Appendix.
5.3 Behavior of Test Statistic under a Global Alternative
In this section I consider the same global alternative as in Section 4. It is given by
$$Y_i = g(X_i) + \varepsilon_i = f(X_i, \theta^*, h^*) + \varepsilon_i^*,$$
where $\theta^*$ and $h^*$ are pseudo-true parameter values and $\varepsilon_i^* \equiv \varepsilon_i + (g(X_i) - f(X_i, \theta^*, h^*)) \equiv \varepsilon_i + d(X_i)$ is a composite error term. As before, let $\hat\theta$ and $\hat{h}$ be the semiparametric estimators.
Then the following result holds:
Theorem 7. Let $\Omega^* = E[\varepsilon_i^{*2} P_iP_i']$. In the homoskedastic case, let $\hat\Omega = \hat\sigma^2 P'P/n$, and in the
heteroskedastic case, let $\hat\Omega = P'\hat\Sigma P/n$, where $\hat\Sigma = \mathrm{diag}(\hat\varepsilon_i^2)$.
Suppose that $\sup_{x \in \mathcal{X}} |f(x, \hat\theta, \hat{h}) - f(x, \theta^*, h^*)| \xrightarrow{p} 0$, $\|\hat\Omega - \Omega^*\| \xrightarrow{p} 0$, the smallest eigenvalue
of $\Omega^*$ is bounded away from zero, $k_n \to \infty$, $k_n/n \to 0$, and $E[\varepsilon_i^* P_i']\Omega^{*-1}E[P_i \varepsilon_i^*] \to \Delta$, where $\Delta$ is
a constant. Then under homoskedasticity
$$\frac{\sqrt{k_n}}{n}\cdot\frac{\xi - k_n}{\sqrt{2k_n}} \xrightarrow{p} \Delta/\sqrt{2},$$
and under heteroskedasticity
$$\frac{\sqrt{k_n}}{n}\cdot\frac{\xi_{HC} - k_n}{\sqrt{2k_n}} \xrightarrow{p} \Delta/\sqrt{2}.$$
The result of Theorem 7 is very similar to the result of Theorem 3. Given that in most
cases of interest $m_n/k_n \to 0$ and $r_n/k_n \to 1$, the divergence rates of the statistics $t_{r_n}$ and $t_{k_n}$,
$n/\sqrt{r_n}$ and $n/\sqrt{k_n}$ respectively, are asymptotically equivalent. However, Theorem 3 achieves
this result under weaker assumptions, as it only requires the series approximation error to
go to zero without explicitly requiring the semiparametric estimator to be consistent. In
contrast, Theorem 7 requires the semiparametric estimator $f(x, \hat\theta, \hat{h})$ to be consistent for the
pseudo-true value $f(x, \theta^*, h^*)$.
5.4 Behavior of Test Statistic under Local Alternatives
In this section I consider a slightly different family of local alternatives than in Section 4.
It is given by
$$H_{1n}: g_n(X_i) = f(X_i, \theta^*, h^*) + (k_n^{1/4}/n^{1/2})\,d_0(X_i),$$
where $d_0$ is square integrable on $\mathcal{X}$, $E[d_0(X_i)] = 0$, and $E[f(X_i, \theta^*, h^*)d_0(X_i)] = 0$. The null
hypothesis corresponds to $d_0 = 0$. I impose the following assumptions.
First, I assume that the unknown function $d_0(x)$ can be approximated by a finite series
expansion sufficiently well.
Assumption 12. There exists $\alpha_d > 0$ such that
$$\sup_{x \in \mathcal{X}} |d_0(x) - P^{k_n}(x)'\pi| = O(k_n^{-\alpha_d})$$
Next, I apply the following result to obtain the convergence rates of series estimators
of d0(x). These estimators are infeasible, because in practice d0(x) is unknown, but these
convergence rates will be used in the proofs.
Lemma 5 (Li and Racine (2007), Theorem 15.1). Let $\hat{d}(x) = P^{k_n}(x)'\hat\pi$, $\hat\pi = (P'P)^{-1}P'd$,
$d_i = d_0(X_i)$, and $\hat{d}_i = \hat{d}(X_i)$. Under Assumptions 1, 3, and 12, the following is true:
(a) $\sup_{x \in \mathcal{X}} |\hat{d}(x) - d_0(x)| = O_p\left(\zeta(k_n)(\sqrt{k_n/n} + k_n^{-\alpha_d})\right)$
(b) $\frac{1}{n}\sum_{i=1}^n (\hat{d}_i - d_i)^2 = O_p(k_n/n + k_n^{-2\alpha_d})$
(c) $\int (\hat{d}(x) - d_0(x))^2\,dF(x) = O_p(k_n/n + k_n^{-2\alpha_d})$
I modify Assumption 10 as follows:
Assumption 13. Let $\hat{f}(x) = f(x, \hat\theta, \hat{h})$, $f^*(x) = f(x, \theta^*, h^*)$, $\hat{f}_i = \hat{f}(X_i)$, and $f_i^* = f^*(X_i)$.
For some $\eta_n \to 0$ and $\psi_n \to 0$ as $n \to \infty$, the following conditions hold:
(a) $\sup_{x \in \mathcal{X}} |\hat{f}(x) - f^*(x)| = O_p(\eta_n)$
(b) $\frac{1}{n}\sum_{i=1}^n (\hat{f}_i - f_i^*)^2 = O_p(\psi_n)$
(c) $\int (\hat{f}(x) - f^*(x))^2\,dF(x) = O_p(\psi_n)$
Assumption 13 requires the semiparametric estimators to converge to the pseudo-true
values fast enough. Next, I impose a high-level assumption on how the error term in the
model interacts with the estimation error. It parallels Assumption 11 and is discussed in
Remark A.2 in the Appendix.
Assumption 14. Assume that
$$\sum_i \varepsilon_i(\hat{f}_i - f_i^*) = o_p(k_n^{1/2})$$
The next result gives the behavior of the test under local alternatives:
Theorem 8. Assume that Assumptions 1, 2, 3, 12, 13, and 14 are satisfied. Moreover,
$\sigma^2(x) = \sigma^2$, $0 < \sigma^2 < \infty$, for all $x \in \mathcal{X}$, and rate conditions 5.1–5.4 hold.
Then
$$t_{k_n} = \frac{\xi - k_n}{\sqrt{2k_n}} \xrightarrow{d} N(\delta, 1),$$
where $\xi$ is as in Equation 3.5 and $\delta = E[d_i^2]/\sigma^2$.
5.5 Asymptotic Theory: Summary
To summarize, there are two versions of the proposed test, and they are asymptotically
equivalent. One relies on series estimation methods to define the number of restrictions imposed
by the semiparametric model on a general nonparametric model and uses the projection
interpretation of series estimators to derive the degrees of freedom correction. The other uses
general estimation methods, without imposing the series structure or defining the number
of restrictions. While the former approach restricts the test to models that can be estimated
by series, it allows me to obtain refined asymptotic results under mild rate conditions. In
contrast, the latter approach is applicable to a broader range of models, but it results in
cruder asymptotic analysis and requires stronger assumptions.
These two approaches differ in how they cope with a key step in the proof: going from
the semiparametric regression residuals $\hat\varepsilon$ to the true errors $\varepsilon$. The series-based approach
relies on the projection property of series estimators to eliminate the estimation variance
and hence only has to deal with the approximation bias. Specifically, it uses the equality
$\hat\varepsilon = M_W\varepsilon + M_W R$, applies a central limit theorem for U-statistics to the quadratic form in
$M_W\varepsilon$, and bounds the remainder terms by requiring the approximation error $R$ to be small.
The general approach does not impose any special structure on the model residuals and
thus has to deal with both the bias and the variance of semiparametric estimators. Specifically,
it uses the equality $\hat\varepsilon = \varepsilon + (g - \hat{g})$, which can be written in series form⁸ as $\hat\varepsilon = \varepsilon + R + W(\beta_1 - \hat\beta_1)$, applies a central limit theorem for U-statistics to the quadratic form in $\varepsilon$, and
bounds the remainder terms by requiring both the bias term $R$ and the variance term $W(\beta_1 - \hat\beta_1)$
to be small. This, in turn, requires the semiparametric estimates of the restricted model to
converge to the true values very fast and leads to restrictive rate conditions.
6 Simulations
There are several variants of the test, depending on whether series methods are used in
estimation, which normalization is used, and which limiting distribution is used. In this section
I study the finite sample performance of different variants of the proposed test in simulations
that mimic the cost estimation example in Section 2. The Monte Carlo studies have several
goals: first, to compare the tests based on the $\chi^2$ and normal asymptotic approximations;
second, to compare the two normalizations of the test statistic; third, to investigate the
behavior of the test with different sample sizes and different numbers of series terms; finally,
to study the use of multiple alternatives as a tool to improve the power of the test in particular
directions.

⁸Even though series methods do not have to be used in estimation, it is convenient to write the model in
the series form to facilitate the comparison between the two approaches.
I assume that the researcher wants to estimate returns to scale (or their inverse, $\delta(Z_i)$) in
the model
$$\underbrace{(\ln TC_i - \ln p_{Li})}_{TC_i} = C_1(Z_i) + \gamma(Z_i)\underbrace{(\ln p_{Ki} - \ln p_{Li})}_{p_i} + \delta(Z_i)\underbrace{\ln Q_i}_{Q_i} + \varepsilon_i$$
From now on, I consider the rearranged model $TC_i = C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i + \varepsilon_i$.
In the subsequent analysis, I will test the specification of the semiparametric conditional
mean model against the nonparametric alternative using the proposed specification test.
My test also applies to parametric null hypotheses, but I omit the results of testing the
parametric model against the nonparametric alternative for brevity. The varying coefficient
null hypothesis is given by
$$H_0^{SP}: P\big(E[TC_i|p_i, Q_i, Z_i] = C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i\big) = 1 \text{ for some } C_1(Z_i), \gamma(Z_i), \delta(Z_i),$$
while the alternative is
$$H_1: P\big(E[TC_i|p_i, Q_i, Z_i] \neq C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i\big) > 0 \text{ for all } C_1(Z_i), \gamma(Z_i), \delta(Z_i).$$
I obtain $(\ln p_{Li}, \ln p_{Ki}, \ln Q_i, Z_i)$ as follows. First, I draw four uniform random variables
$V_{j,i} \sim U[0,2]$, $j = 1, \dots, 4$. Then I set
$$\ln p_{Ki} = 1.25 + 0.75(0.7V_{1,i} + 0.15V_{2,i} + 0.1V_{3,i} + 0.05V_{4,i})$$
$$\ln p_{Li} = 1.4 + 0.6(0.1V_{1,i} + 0.75V_{2,i} + 0.05V_{3,i} + 0.1V_{4,i})$$
$$\ln Q_i = 2 + (0.05V_{1,i} + 0.1V_{2,i} + 0.65V_{3,i} + 0.2V_{4,i})$$
$$Z_i = 1.25 + 0.75(0.025V_{1,i} + 0.025V_{2,i} + 0.15V_{3,i} + 0.8V_{4,i})$$
This way of generating data ensures that all variables are bounded and at the same time
correlated with one another. Table 1 shows the pairwise correlations between the variables.
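This design is easy to simulate directly; a numpy sketch (the variable names are mine). Because each weight vector sums to one and $V_{j,i} \in [0,2]$, every generated variable is bounded:

```python
import numpy as np

def draw_covariates(n, rng):
    # V_{j,i} ~ U[0, 2], j = 1, ..., 4, stacked as columns of V
    V = rng.uniform(0.0, 2.0, size=(n, 4))
    ln_pK = 1.25 + 0.75 * V @ np.array([0.70, 0.15, 0.10, 0.05])
    ln_pL = 1.40 + 0.60 * V @ np.array([0.10, 0.75, 0.05, 0.10])
    ln_Q = 2.00 + V @ np.array([0.05, 0.10, 0.65, 0.20])
    Z = 1.25 + 0.75 * V @ np.array([0.025, 0.025, 0.15, 0.80])
    return ln_pK, ln_pL, ln_Q, Z

ln_pK, ln_pL, ln_Q, Z = draw_covariates(5_000, np.random.default_rng(0))
# implied supports: ln_pK, Z in [1.25, 2.75]; ln_pL in [1.4, 2.6]; ln_Q in [2, 4]
print(ln_pK.min() >= 1.25, ln_pK.max() <= 2.75, Z.max() <= 2.75)
```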
Then I simulate data from two data generating processes:
1. Semiparametric varying coefficient model, which corresponds to the null hypothesis
$H_0^{SP}$:
$$TC_i = C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i + \varepsilon_i, \quad (6.1)$$
where
$$C_1(z) = 20 - 3(z-2) + 7(z-2)^2 - 12(z-2)^3$$
$$\alpha(z) = 0.35 + 0.05\exp(2(z-2)) + 0.05(z-2)^2$$
$$\beta(z) = 0.3 + 0.075\exp(2(z-2)) + 0.025(z-2)^2$$
$$\gamma(z) = \frac{\beta(z)}{\alpha(z) + \beta(z)}, \qquad \delta(z) = \frac{1}{\alpha(z) + \beta(z)}$$
Figure 1 plots the coefficient functions $C_1(z)$, $\gamma(z)$, and $\delta(z)$. This choice of functional
forms is motivated by the fact that it results in reasonable and economically interesting
parameter values. For instance, returns to scale lie roughly between 0.7 and 1.2, with
more than 90% of the firms having decreasing returns to scale and less than 10% having
increasing returns to scale. The firms with increasing returns to scale are those with
the highest R&D spending.
2. Nonparametric model, which corresponds to the alternative hypothesis $H_1$:
$$TC_i = C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i + \lambda_1|p_i|^{1/2} + \lambda_2|Q_i|^{1/3} + \varepsilon_i, \quad (6.2)$$
where $(\lambda_1, \lambda_2) = (0.65, 0.65)$.
Figures 3 and 4 plot the relationship between the total costs and prices or output under
the null and under the alternative to illustrate how the nonparametric model compares
to the semiparametric one.
Though the deviation from the null is nontrivial, it explains only a small fraction of
the variation in the dependent variable. The variance of the dependent variable $TC_i$ is
around 8.25, the variance of the semiparametric part $C_1(Z_i) + \gamma(Z_i)p_i + \delta(Z_i)Q_i$ is around
6.10, and the variance of the deviation from the null, $\lambda_1|p_i|^{1/2} + \lambda_2|Q_i|^{1/3}$, is around 0.03.
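The coefficient functions in DGP 1 can be tabulated directly. The sketch below evaluates them on a grid covering the support of $Z_i$ implied by the design above and checks the cited range of returns to scale, $1/\delta(z) = \alpha(z) + \beta(z)$:

```python
import numpy as np

def alpha_fn(z):
    return 0.35 + 0.05 * np.exp(2 * (z - 2)) + 0.05 * (z - 2) ** 2

def beta_fn(z):
    return 0.30 + 0.075 * np.exp(2 * (z - 2)) + 0.025 * (z - 2) ** 2

def gamma_fn(z):
    return beta_fn(z) / (alpha_fn(z) + beta_fn(z))

def delta_fn(z):
    # inverse of returns to scale
    return 1.0 / (alpha_fn(z) + beta_fn(z))

z = np.linspace(1.25, 2.75, 201)   # support of Z_i implied by the design
rts = 1.0 / delta_fn(z)            # returns to scale alpha(z) + beta(z)
print(round(rts.min(), 3), round(rts.max(), 3))  # roughly the 0.7-1.2 range
```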
Because the parametric and semiparametric models are restricted versions of the nonparametric
model, I use the proposed test to check whether they are correctly specified.
But first, I demonstrate why specification testing is important.
Suppose that the true model is the varying coefficient model above, but the researcher
estimates the parametric model
$$TC_i = C_1 + \gamma p_i + \delta Q_i + \theta Z_i + \varepsilon_i,$$
or the parametric model with interactions:
$$TC_i = C_1 + (\gamma_0 + \gamma_1 Z_i)p_i + (\delta_0 + \delta_1 Z_i)Q_i + \theta Z_i + \varepsilon_i$$
Figure 2 shows the estimates of $\delta(z)$, the inverse of returns to scale, from OLS, OLS with
interactions, and the varying coefficient model, as well as the true function $\delta(z)$. As we can
see from the figure, both ordinary OLS and OLS with interactions yield very misleading
estimates of $\delta(z)$ and thus of returns to scale. The true $\delta(z)$ is greater than 1 and slowly
decreasing over most of its support, meaning that most firms in the sample have decreasing
returns to scale. The OLS results imply that $\delta(z)$ is smaller than 1 and hence that returns to
scale are increasing. The OLS with interactions results imply that $\delta(z)$ is increasing in R&D
spending, so not only is the magnitude of returns to scale incorrect, but so is the pattern of
its dependence on R&D spending. If the researcher wants to evaluate a possible merger
of two firms, OLS or OLS with interactions estimates will likely lead to very misleading
counterfactuals. Thus, using a plausible model is important, and I will show next that the
proposed test helps determine whether a given model is correctly specified.
Before I move on to discuss the behavior of the proposed test in finite samples, I need
to make two choices in order to implement the test: first, the family of basis functions;
second, the number of series terms in the restricted and unrestricted models. I use power
series because of their simplicity, but I do not have any data-driven methods to choose the
tuning parameters.
The choice of tuning parameters presents a big practical challenge in implementing the
proposed test, as well as many other specification tests.9 If one is interested only in estimating
the null or the alternative models, then certain data-driven methods, such as cross-validation,
can be used to select the number of series terms. For details, see, e.g., Section 15.2 in Li
and Racine (2007). However, it is not clear how these data-driven procedures may affect
the proposed test and whether using them would lead to any optimality results in testing.
I leave the choice of tuning parameters for the proposed test for future research, and in my
simulations choose tuning parameters according to the rate conditions imposed in Section 4.
However, the choices I make are still arbitrary, because I can multiply mn or kn by any
constant and still satisfy the rate conditions, which are asymptotic in nature.
The number of terms in the parametric model is $m_n^{OLS} = 4$, the number of regressors
plus one. The number of terms in the series expansions of the unknown coefficient functions
$C_1(z)$, $\gamma(z)$, and $\delta(z)$ in the varying coefficient model is $l_n = \lfloor 2.5n^{0.12} \rfloor$, which leads to
$m_n^{VCM} = 3l_n$. In the nonparametric model, I include $j_n = \lfloor 3n^{0.06} \rfloor$ series terms in $p_i$ and $Q_i$
each, which leads to $k_n = l_n j_n^2$. The number of restrictions is given by $r_n^{OLS} = k_n - m_n^{OLS}$
and $r_n^{VCM} = k_n - m_n^{VCM}$, respectively.

⁹For a discussion of the choice of regularization parameters in the context of kernel-based tests, see the
review by Sperlich (2014).
These choices lead to mn = 15, rn = 65, and kn = 80 when n = 1, 000, and to mn = 18,
rn = 132, and kn = 150 when n = 5, 000. Because the behavior of the test statistic depends
both on the sample size and the number of series terms, I also consider an intermediate
setup with n = 5, 000 but mn = 15, rn = 65, and kn = 80 to separate these two effects.
In other words, I first fix the number of series terms and increase the sample size, and then
fix the sample size and increase the number of series terms. Throughout my analysis, I use
B = 2, 000 simulation draws.
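The tuning-parameter arithmetic above can be reproduced in a few lines (a sketch of the rules stated in the text):

```python
import math

def series_terms(n):
    ln = math.floor(2.5 * n ** 0.12)   # terms per coefficient function
    jn = math.floor(3.0 * n ** 0.06)   # terms in p_i and in Q_i, each
    m_vcm = 3 * ln                     # varying coefficient null model
    kn = ln * jn ** 2                  # nonparametric alternative
    return m_vcm, kn, kn - m_vcm       # (m_n, k_n, r_n^{VCM})

print(series_terms(1_000))  # (15, 80, 65), matching the text
print(series_terms(5_000))  # (18, 150, 132), matching the text
```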
6.1 Homoskedastic Errors
First, I investigate the performance of the test when the errors are homoskedastic. I use
the $\xi$ test statistic directly:
$$\xi = \hat\varepsilon'P(\hat\sigma^2 P'P)^{-1}P'\hat\varepsilon \overset{a}{\sim} \chi^2_{\tau_n},$$
as well as the $t$ statistic:
$$t_{\tau_n} = \frac{\xi - \tau_n}{\sqrt{2\tau_n}} \overset{a}{\sim} N(0,1),$$
with $\tau_n = k_n$ and $\tau_n = r_n$.
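When the true errors and the true $\sigma^2$ are plugged in, $\xi$ is exactly $\chi^2_{k_n}$-distributed for Gaussian errors, because $P(P'P)^{-1}P'$ is a projection of rank $k_n$. The Monte Carlo sketch below illustrates this benchmark (the dimensions are mine, smaller than in the simulations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, kn, sigma2, B = 200, 5, 2.25, 500
P = rng.standard_normal((n, kn))   # series terms P^{k_n}(X_i), held fixed

xis = np.empty(B)
for b in range(B):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
    # xi = eps' P (sigma^2 P'P)^{-1} P' eps ~ chi^2_{kn} for Gaussian eps
    Pe = P.T @ eps
    xis[b] = Pe @ np.linalg.solve(sigma2 * (P.T @ P), Pe)

t_stats = (xis - kn) / np.sqrt(2 * kn)   # normalized t statistic
print(round(xis.mean(), 2), round(t_stats.mean(), 2))  # near kn and near 0
```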
I consider two sample sizes, n = 1, 000 and n = 5, 000. The errors are normally dis-
tributed: εi ∼ i.i.d. N(0, 2.25). With this choice of the distribution of εi, the semiparametric
part of the model explains about 70% of the dependent variable variance, while the errors
account for the remaining 30%. I repeated my analysis with centered exponential errors,
which have an asymmetric distribution, and found no substantial difference from the normal
case. The results for exponential errors are omitted for brevity.
Table 2 shows the size and power of the nominal 5% level test for three combinations of
the sample size and the number of series terms: Setup 1 with (n = 1, 000, kn = 80), Setup 2
with (n = 5, 000, kn = 80), and Setup 3 with (n = 5, 000, kn = 150), and two normalizations
of the test statistic: τn = kn and τn = rn. As we can see from the table, the test with
the former normalization is severely undersized, with the size being below 1% even with
n = 5, 000 observations. In contrast, the size of the test with the latter normalization is
close to the nominal level of 5%. Moreover, in all three settings, the test based on the latter
normalization has better power against the semiparametric varying coefficient null model
when the nonparametric model is true.
As far as the choice between the ξ test statistic and the normalized t test statistic is
concerned, the latter typically leads to slightly higher rejection probabilities, meaning that
the test based on the t statistic has marginally better power but is slightly oversized.
Finally, we can see that increasing the sample size from n = 1,000 to n = 5,000 while
keeping the number of series terms constant at kn = 80 brings the size of the test closer to
the nominal level and greatly increases power, while increasing the number of series terms
from kn = 80 to kn = 150 with the sample size of n = 5,000 has almost no effect on the size
but reduces the power of the test. This observation calls for a data-driven method for choosing
the number of series terms. As my simulations show, including too many series terms when
they are not necessary can worsen the test performance; however, including too few series
terms can make it difficult to detect certain alternatives¹⁰, which also leads to low power.
Thus, having a data-driven method that balances these two considerations and determines
the appropriate number of series terms to include is important, and I plan to study this
question in future work.
Next, I plot the simulated distribution of the test statistic in different Monte Carlo settings.
Figure 5 plots the distribution of the t test statistic under the null for the three setups
I study. As we can see, in all three settings the simulated distribution of the $t_{r_n}$ test statistic
is very close to the standard normal. In contrast, the simulated distribution of the $t_{k_n}$ test
statistic is shifted to the left relative to the standard normal. This illustrates the importance
of the proper normalization, which explicitly accounts for the estimation variance, and helps
explain why the tests based on the normalization $\tau_n = k_n$ are severely undersized.

¹⁰The series-based test cannot detect alternatives that are orthogonal to all series terms used to form the
test statistic. The more series terms are used to construct the test, the fewer such alternatives exist.
6.2 Heteroskedastic Errors
In this subsection I investigate the performance of the test when the errors are het-
eroskedastic. The form of heteroskedasticity is εi ∼ i.n.i.d. N(0, 0.015 exp(Qi+Zi)). Figure 6
illustrates this form of heteroskedasticity. The test statistic is based on
ξHC = ε′P (P ′ΣP )−1P ′ε,
where Σ = diag(ε2i ), and is given by
tHC,τn =ξHC − τn√
2τn,
for τn = rn. I do not report the results for τn = kn, because they are similar to the previous
section, i.e. the test based on tHC,kn is severely undersized and low-powered. Instead of
comparing the test statistics tHC,rn and tHC,kn , I compare the feasible test statistic tHC,rn
with the infeasible test statistic
tHC,rn,inf = (ξHC,inf − rn)/√(2rn),

where

ξHC,inf = ε̂′P (P′ΣP)⁻¹P′ε̂

and Σ = diag(E[εi²|Qi, pi, Zi]) = diag(0.015 exp(Qi + Zi)) is the true conditional variance matrix. I make this comparison in order
to understand whether the behavior of the test under heteroskedastic errors is driven by
heteroskedasticity itself or by using the estimated variance-covariance matrix instead of the
true one. Because of higher computational burden associated with the heteroskedasticity
robust test statistic, I reduce the number of simulation draws from B = 2, 000 to B = 1, 000.
Table 3 shows the size and power of the nominal 5% level test for the three setups discussed
above and two test statistics, tHC,rn and tHC,rn,inf . Figure 7 plots the distribution of the t
test statistic under the null for these three setups.
As we can see, in the heteroskedastic case the feasible test is somewhat oversized, and the
size of the test based on the ξ test statistic is closer to the nominal level than the size of the
test based on the t test statistic. Interestingly, as we go from Setup 2 to Setup 3, i.e. increase
kn and rn while keeping n constant, the size of the test becomes closer to the nominal level,
but the simulated distribution of the t test statistics moves away from standard normal.
As for the comparison between the feasible and infeasible test statistics, even though
their simulated distributions look different, with the infeasible test statistic distribution being
closer to the standard normal, the simulated size of these two tests is very close. Similarly
to the feasible test, the infeasible test is oversized in all three setups. Hence, it appears that
it is heteroskedasticity itself, and not the variance-covariance matrix estimation, that causes
the size distortion. At the same time, the variance-covariance matrix estimation noticeably
affects the finite-sample distribution of the test statistic, but has very little effect on its tail
behavior and, as a result, on the finite-sample rejection probabilities.
6.3 Test Behavior under Local Alternatives
In this section, I study the behavior of the proposed test under local alternatives. In-
stead of fixing the DGP and moving the model, as in Sections 4.4 and 5.4, I fix the model
and move the DGP. As discussed in Hong and White (1995), these two approaches lead to
similar conclusions, and while the former simplifies asymptotic theory, the latter is easier to
implement in simulations.
More specifically, I use the same setup as before, but the true DGP changes with the
sample size n:
TCi = C1(Zi) + γ(Zi)pi + δ(Zi)Qi + (rn^{1/4}/n^{1/2}) d0(pi, Qi) + εi

d0(pi, Qi) = λ1|pi|^{1/2} + λ2|Qi|^{1/3}

(λ1, λ2) = (0.75, 0.75)
In other words, the true model is nonparametric, but it approaches the semiparametric
varying coefficient model at the rate rn^{1/4}/n^{1/2} as the sample size grows. Based on the result
of Theorem 4, the rejection probability should remain the same as the sample size changes.
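A minimal simulation sketch of this drifting DGP (numpy only; the regressor distributions and the stand-ins for C1(z), γ(z), δ(z) are illustrative assumptions, not the exact choices used in the simulations):

```python
import numpy as np

def draw_local_alternative(n, r_n, rng):
    """One draw from a DGP whose nonparametric deviation d0 shrinks at rate r_n^(1/4)/n^(1/2)."""
    Z = rng.uniform(1.0, 3.0, n)
    p = rng.uniform(0.5, 1.5, n)
    Q = rng.uniform(0.5, 2.0, n)
    eps = rng.normal(0.0, 1.5, n)  # i.i.d. N(0, 2.25) errors, as in the other simulations
    lam1 = lam2 = 0.75
    d0 = lam1 * np.abs(p) ** 0.5 + lam2 * np.abs(Q) ** (1.0 / 3.0)
    # Illustrative stand-ins for the coefficient functions C1(z), gamma(z), delta(z)
    C1 = 20.0 - 3.0 * (Z - 2.0)
    gam = 0.35 + 0.05 * Z
    dlt = 0.30 + 0.075 * Z
    TC = C1 + gam * p + dlt * Q + (r_n ** 0.25 / n ** 0.5) * d0 + eps
    return TC, p, Q, Z
```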
I gradually increase the sample size from n = 500 to n = 25, 000 and compute the
simulated rejection probabilities for the tests based on the trn test statistic. Table 4 shows
the size and power of the nominal 5% level test for the semiparametric null hypothesis as
the sample size varies. Figure 8 plots the distribution of the t test statistic under local
alternatives for n = 1, 000 and n = 10, 000.
As we can see, the rejection probabilities for the test based on the trn statistic lie between
21% and 27% for all sample sizes considered, and the simulated distributions of the trn
statistic for n = 1, 000 and n = 10, 000 look very similar. These findings are consistent with
the theoretical results established in Section 4.4.
6.4 Multiple Alternatives and Bonferroni Correction
If a researcher wants to estimate a model that can be nested in an expanding set of
alternatives (e.g. a parametric model can be nested in a semiparametric partially linear
model or in a nonparametric model), it is possible to test it against multiple alternatives
simultaneously while using the Bonferroni correction.
Namely, if the null model is fully parametric:
H0^P : P(E[Yi|Xi] = X′1iβ1 + X′2iβ2) = 1 for some β1 ∈ R^{dx1}, β2 ∈ R^{dx2},
where Xi = (X′1i, X′2i)′, a researcher may consider a semiparametric varying coefficient alternative and a fully nonparametric alternative simultaneously:

H1^VC : E[Yi|Xi] = X̃′1iβ(X2i) for some β(·) : R^{dx2} → R^{dx1+1};

H1^NP : E[Yi|Xi] = g(Xi) for some g(·) : R^{dx} → R,

where X̃1i = (1, X′1i)′.
Intuitively, using the former alternative may improve power of the test if the true model
turns out to be close to a varying coefficient one, while the latter ensures consistency against
a general nonparametric alternative. The test statistic should be modified accordingly, by
including in P kn(Xi) only those power series terms that are present under the alternative.
For example, alternative H1^VC does not allow higher powers of X1i to enter the model, so they should be removed from P kn(Xi) when constructing the test statistic for H0^P against H1^VC.
Because several hypothesis tests are now performed simultaneously, the Bonferroni correction is needed to control size. Namely, the nominal significance level for each individual test should be α/2 (or α/T if there are T tests) instead of α. The resulting overall test rejects the null if at least one individual test rejects the null at the α/2 level.
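The decision rule can be sketched in a few lines (standard library only; `t_stats` collects the one-sided t statistics from the individual tests):

```python
from statistics import NormalDist

def bonferroni_reject(t_stats, alpha=0.05):
    """Reject H0 if any individual one-sided t statistic exceeds the alpha/T critical value."""
    T = len(t_stats)
    crit = NormalDist().inv_cdf(1.0 - alpha / T)  # z_{1 - alpha/T}
    return any(t > crit for t in t_stats)
```

With T = 2 and α = 0.05, each individual test uses the z0.975 ≈ 1.96 critical value instead of z0.95 ≈ 1.645.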
Next, I simulate data from three data generating processes. The first DGP is fully para-
metric, while the latter two resemble the DGPs used before but make it more difficult to
distinguish between the parametric and semiparametric models.
1. Parametric model:
TCi = C1 + γpi + δQi + θZi + εi

(C1, γ, δ, θ) = (26, 0.5, 1.25, −2)
2. Semiparametric varying coefficient model:
TCi = C1(Zi) + γ(Zi)pi + δ(Zi)Qi + εi

C1(z) = 20 − 3(z − 2) + 0.5(z − 2)² − 0.025(z − 2)³

γ(z) = 0.35 + 0.05 exp(2(z − 2)) + 0.05(z − 2)²

δ(z) = 0.3 + 0.075 exp(2(z − 2)) + 0.025(z − 2)²
3. Nonparametric model:
TCi = C1(Zi) + γ(Zi)pi + δ(Zi)Qi + λ1|pi|^{1/2} + λ2|Qi|^{1/3} + εi

(λ1, λ2) = (0.65, 0.65)
Because in all previous simulations the normalization τn = rn performs better than τn =
kn, in this section I only consider the former normalization. Table 5 compares the size and
power of the nominal 5% level test against a general nonparametric alternative and of the
simultaneous Bonferroni-corrected test of the parametric null against two alternatives for two
sample sizes, n = 1, 000 with kn = 80 and n = 5, 000 with kn = 150.
As we can see, the Bonferroni procedure controls the size of the test fairly well and sig-
nificantly improves power when the true model is semiparametric almost without sacrificing
power when the true model is nonparametric. Thus, when several alternative models are
available, it may be beneficial to test the null model against different alternatives simultane-
ously to improve the power against particular alternatives while still controlling the size by
using the Bonferroni adjustment.
A possible direction for future research is to develop a step-up or step-down model selec-
tion procedure that would not only test a given model against one or several alternatives,
but also would allow the researcher to consider a different, more general, null model if the
original null is rejected.
7 Empirical Example
In this section, I apply the proposed test to the Canadian household gasoline consumption
data from Yatchew and No (2001). They estimate gasoline demand as a function of the
gasoline price and various demographics.
Yatchew and No (2001) use several demand models, including semiparametric specifica-
tions. They use differencing (see Yatchew (1997) and Yatchew (1998) for details) to estimate
semiparametric models. The relevance of semiparametric models in gasoline demand esti-
mation was first pointed out by Hausman and Newey (1995) and Schmalensee and Stoker
(1999). Yatchew and No (2001) follow these papers in using semiparametric specifications
and pay special attention to specification testing. However, they only test semiparametric
specifications against parametric ones, while I use series methods to estimate semiparamet-
ric specifications and implement the proposed series-based specification test to assess their
validity as compared to a general nonparametric model. As I will show, the semiparametric
models used by Yatchew and No (2001) are adequate and are not rejected when compared
to a nonparametric model; however, less flexible semiparametric or parametric models are
rejected. Thus, it appears that Yatchew and No (2001) succeed in choosing a flexible enough
yet parsimonious model.
The specifications I estimate parallel those used in Yatchew and No (2001). In particular, I estimate a partially linear model which is nonparametric in both PRICE and AGE (corresponding to Model 3.1 in Yatchew and No (2001)):
y = f(PRICE,AGE) + z′β + ε, (7.1)
where y is the logarithm of the total distance traveled in the month, PRICE is the gasoline
price per liter, AGE is the age of the first driver of the car, and z includes the logarithm of
the household income, the logarithm of the number of drivers in the household, the logarithm
of the household size, monthly dummies, an urban dummy, and a dummy for singles under
35 years old.
Next, I estimate the additive semiparametric model (roughly corresponds to Model 3.3
in Yatchew and No (2001)):
y = f(PRICE) + g(AGE) + z′β + ε (7.2)
I also consider a model which is nonparametric in AGE but parametric in log(PRICE)
(roughly corresponds to Model 3.4 in Yatchew and No (2001)):
y = g(AGE) + γ log(PRICE) + z′β + ε (7.3)
Finally, I consider a fully parametric model:
y = α0 + α1 log(AGE) + γ log(PRICE) + z′β + ε (7.4)
In order to apply my test, I need to choose the series functions P kn(x) or, equivalently,
define a nonparametric alternative that can be approximated by these series functions. Ide-
ally, I would want to use a fully nonparametric alternative y = h(PRICE,AGE, z) + ε.
However, this is impractical in the current setting: the dataset from Yatchew and No (2001)
contains 12 monthly dummies, an urban dummy, and a dummy for singles under 35 years old.
The fully nonparametric alternative would require completely saturating the model with the dummies, i.e. interacting all series terms in continuous regressors with a full set of dummies.
This would be equivalent to dividing the dataset into 12 ·2 ·2 = 48 bins and estimating within
every bin separately. Given that the total number of observations is 6,230, this would leave
me with about 125 observations per bin on average and would make semi- or nonparametric
estimation problematic.
I choose a different approach to deal with the dummies. I assume that the nonparamet-
ric alternative still satisfies separability between the dummies and the remaining variables.
Separating z into z1, which includes the logarithm of the household income, the logarithm
of the number of drivers in the household, the logarithm of the household size, and z2,
which includes the dummies, I rewrite all models with z1 in place of z and I consider the
nonparametric alternative given by
y = h(PRICE,AGE, z1) + z′2λ+ ε (7.5)
Before I move on to specification testing, I present results from specification 7.2 to compare my results with the results in Yatchew and No (2001). I estimate the semiparametric specification 7.2 using power series, with ln = 4 terms for both AGE and PRICE. Instead of using the levels, I take the logarithms of AGE and PRICE and demean them to avoid dealing with exploding values of the series terms. In other words, I use log(AGE)*, (log(AGE)*)², (log(AGE)*)³, (log(AGE)*)⁴ and log(PRICE)*, (log(PRICE)*)², (log(PRICE)*)³, (log(PRICE)*)⁴, where log(AGE)* denotes log(AGE) minus its sample average, and similarly for log(PRICE)*. This, together with the other regressors, yields mn = 25 parameters in the semiparametric model.
Figure 9 plots the estimated age and price effects from specification 7.2 and corresponds to Figure 2 in Yatchew and No (2001). As we can see, the relationship between PRICE
and gasoline demand is close to linear, with nonlinearities present only at the boundaries of
the support of PRICE, while the relationship between AGE and gasoline demand is highly
nonlinear. Thus, we could expect to reject specifications which are parametric and linear in
AGE.
Table 6 shows the estimates from specification 7.2 and compares them to estimates in
Figure 2 in Yatchew and No (2001). As we can see from these figures and tables, my results
closely mirror those in Yatchew and No (2001).
Now I turn to specification testing. To construct the regressors P kn used to evaluate the
test statistic, I use ln = 4 power series terms in AGE and PRICE, the set of dummies
discussed above, and jn = 2 power series terms in the logarithms of INCOME, DRIVERS, and HHSIZE. I then use pairwise interactions (tensor products) of the univariate power series, and add all possible three-, four-, and five-element interactions between AGE, PRICE, INCOME, DRIVERS, and HHSIZE, without using higher powers in these interaction terms to avoid multicollinearity. This gives rise to kn = 120 terms in the nonparametric model.

[11] Here, for any variable X, X̄ = Σi Xi/n is the sample average of X.
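The construction of the series terms just described can be sketched as follows (numpy only; a simplified illustration with hypothetical variable names, not an exact reproduction of the kn = 120 terms):

```python
import numpy as np

def power_terms(x, l):
    """Centered log power series: (log x - mean log x)^j for j = 1, ..., l."""
    lx = np.log(x)
    lx = lx - lx.mean()
    return np.column_stack([lx ** j for j in range(1, l + 1)])

def build_P(age, price, income, drivers, hhsize, dummies):
    """Univariate power series, pairwise interactions of the linear terms, and dummies."""
    blocks = [power_terms(age, 4), power_terms(price, 4),
              power_terms(income, 2), power_terms(drivers, 2), power_terms(hhsize, 2)]
    lin = [b[:, 0] for b in blocks]  # linear term of each variable
    inter = [lin[i] * lin[j] for i in range(5) for j in range(i + 1, 5)]
    return np.column_stack(blocks + [np.column_stack(inter), dummies])
```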
I estimate specifications 7.1–7.4 using series methods and then construct the proposed test
statistic. The resulting t-statistic equals 0.652 for specification 7.1, 0.165 for specification 7.2,
0.050 for specification 7.3, and 3.545 for specification 7.4. Table 7 collects these test results
together. Thus, only the parametric specification is rejected at the 5% significance level, while
all three semiparametric specifications under consideration are not rejected when compared
to a general nonparametric alternative.
This is in line with the results in Yatchew and No (2001): even though they do not test
their semiparametric specifications against a general nonparametric alternative, they find no
evidence against a specification similar to 7.3 when compared to specification 7.1.
In addition to the specifications considered above, I also consider a semiparametric spec-
ification which is nonparametric in price but not age:
y = f(PRICE) + γ log(AGE) + z′β + ε (7.6)
When testing it against specification 7.1, the t-statistic is 3.702, so that specification 7.6
is rejected at the 5% significance level. Thus, it seems that controlling for AGE flexibly is
crucial in this gasoline demand application.
Finally, it is worth noting that if I use the proposed test to compare specification 7.4
against a semiparametric alternative 7.1 (so that P kn includes only series terms in AGE
and PRICE and their interactions), I obtain the t-statistic of 19.505 (as opposed to 3.545
when the alternative is fully nonparametric), which means that the parametric model is
rejected at the 5% significance level. Thus, using a general nonparametric alternative instead
of a semiparametric alternative, which leads to the test being consistent against a general
alternative, deteriorates the power of the test if the true model is close to semiparametric.
For practical purposes, if the null model under consideration is nested in several more
general models, it may make sense to test it against these several alternatives simultaneously
using Bonferroni correction. Using a general nonparametric alternative will result in consis-
tency, while using more restrictive alternatives may result in better power properties if these
alternatives are close to being correct.
8 Conclusion
In this paper, I develop a new specification test for semiparametric conditional mean
models. The proposed test achieves consistency by turning a conditional moment restriction
into a growing number of unconditional moment restrictions using series methods. Because
the number of series terms grows with the sample size, the usual asymptotic theory for the
parametric Lagrange Multiplier test is no longer valid. I explicitly allow the number of terms
to grow and show that a normalized test statistic converges in distribution to the standard
normal. The proposed test has several attractive features compared to the existing tests.
First, the proposed test is simple to implement. The test statistic is based on a quadratic
form in the semiparametric regression residuals, so only estimation of the restricted model
is required to compute the test statistic. In the homoskedastic case, the quadratic form on
which the test is based can be computed as nR2 from the regression of the semiparametric
residuals on the series terms used to form the test. Moreover, the asymptotic distribution of
the test statistic is pivotal, which facilitates the calculation of appropriate critical values.
Second, when the null model is nested in the alternative and is estimated by series meth-
ods, the projection property of series estimators makes it possible to explicitly account for the
estimation variance and obtain refined asymptotic results. This refinement can be thought
of as a degrees of freedom correction. Simulations show that because of this adjustment the
proposed test behaves well in finite samples.
Third, series methods make it easy to restrict the class of alternatives from a fully non-
parametric to certain semiparametric classes, such as additive, varying coefficient, or partially
linear. Doing so will result in the loss of consistency against a general alternative but will
improve the power of the test in certain directions. In order to maintain consistency, one can
combine tests against various alternatives simultaneously, including the fully nonparametric
alternative, and use the Bonferroni correction to control the size of the test.
Finally, the test is not limited to models estimated by series methods. With a slightly
different normalization, it remains valid even when the restricted model is estimated using
other semiparametric methods, such as kernels or local polynomials. However, because with
these semiparametric estimators the degrees of freedom correction is not available, stronger
assumptions are required, and my simulations show that the test based on a generic normal-
ization is typically undersized and low powered.
I apply the proposed test to the Canadian household gasoline consumption data from
Yatchew and No (2001) and find no evidence against the semiparametric specifications used
in their paper. However, I show that my test does reject less flexible semiparametric or
parametric models.
There are several avenues for future research, such as developing a data-driven procedure
to choose tuning parameters (the number of series terms under the null and alternative),
extending the proposed test to semiparametric models with endogeneity in the parametric
part, and designing a model selection procedure (such as step-up or step-down procedures)
based on the proposed test.
Appendices
A Tables and Figures
Table 1: Correlation Between Regressors

          ln pKi   ln pLi   lnQi    Zi
ln pKi    1
ln pLi    0.337    1
lnQi      0.260    0.232    1
Zi        0.113    0.161    0.474   1

The table shows pairwise correlations between the regressors and is based on one realization of the regressor values.
Table 2: Rejection Probabilities, Normal Errors

              ξ test statistic      t test statistic
              SP       NP           SP       NP
τn = kn
Setup 1       0.003    0.022        0.003    0.024
Setup 2       0.004    0.625        0.004    0.649
Setup 3       0.003    0.495        0.004    0.512
τn = rn
Setup 1       0.057    0.176        0.068    0.197
Setup 2       0.053    0.918        0.063    0.927
Setup 3       0.053    0.849        0.058    0.859

Setup 1: n = 1,000, kn = 80, rn = 65. Setup 2: n = 5,000, kn = 80, rn = 65. Setup 3: n = 5,000, kn = 150, rn = 132. SP refers to the semiparametric DGP, NP to the nonparametric DGP. Entries in bold correspond to cases when H0 is true. Results are based on B = 2,000 simulation draws.
Table 3: Rejection Probabilities, Normal Heteroskedastic Errors

              ξ test statistic      t test statistic
              SP       NP           SP       NP
Feasible test statistic tHC,rn
Setup 1       0.103    0.288        0.120    0.328
Setup 2       0.076    0.965        0.083    0.975
Setup 3       0.071    0.934        0.076    0.943
Infeasible test statistic tHC,rn,inf
Setup 1       0.095    0.303        0.106    0.325
Setup 2       0.070    0.969        0.075    0.974
Setup 3       0.071    0.939        0.076    0.954

Setup 1: n = 1,000, kn = 80, rn = 65. Setup 2: n = 5,000, kn = 80, rn = 65. Setup 3: n = 5,000, kn = 150, rn = 132. SP refers to the semiparametric DGP, NP to the nonparametric DGP. Entries in bold correspond to cases when H0 is true. Results are based on B = 1,000 simulation draws.
Table 4: Rejection Probabilities, Local Alternatives

n        kn     rn     P(reject H0)
500      80     65     0.229
1,000    80     65     0.260
2,500    96     78     0.209
5,000    150    132    0.264
10,000   175    154    0.270
25,000   200    176    0.254

Results are based on B = 2,000 simulation draws.
Table 5: Rejection Probabilities, Normal Errors, Bonferroni

              ξ test statistic           t test statistic
              P       SP      NP         P       SP      NP
H0^P vs. H1^NP
Setup 1       0.049   0.101   0.223      0.056   0.117   0.245
Setup 2       0.051   0.497   0.947      0.059   0.519   0.952
H0^P vs. H1^NP and H1^VC
Setup 1       0.048   0.239   0.213      0.071   0.310   0.273
Setup 2       0.052   0.955   0.934      0.073   0.971   0.952

Setup 1: n = 1,000, kn = 80, rn = 65. Setup 2: n = 5,000, kn = 150, rn = 132. P refers to the parametric DGP, SP to the semiparametric DGP, NP to the nonparametric DGP. Entries in bold correspond to cases when H0 is true. Results are based on B = 2,000 simulation draws.
Table 6: Partially Linear Model Estimates

                            Yatchew and No (2001)    My Estimates
log income                  0.2844                   0.2852
                            (0.0211)                 (0.0206)
log drivers                 0.5420                   0.5397
                            (0.0345)                 (0.0338)
log hhsize                  0.1101                   0.1103
                            (0.0284)                 (0.0276)
single, age < 35            0.2054                   0.1870
                            (0.0622)                 (0.0636)
urban dummy                 -0.3339                  -0.3309
                            (0.0202)                 (0.0198)
monthly effects             Yes                      Yes
average price elasticity    -0.890                   -0.937
R²                          0.2689                   0.2648
s²                          0.4967                   0.5014
number of observations      6,230                    6,230

The table compares the estimates from Figure 2 in Yatchew and No (2001) with the semiparametric estimates I obtain using specification 7.2. I use ln = 4 power series terms in both PRICE and AGE, which yields mn = 25 parameters in the semiparametric model. Standard errors are shown in parentheses.
Table 7: Specification Tests for Yatchew and No (2001)

Model    t-statistic    Reject H0?
7.1      0.652          No
7.2      0.165          No
7.3      0.050          No
7.4      3.545          Yes
7.6      3.702          Yes

The table shows the values of the t test statistic for specifications 7.1–7.6. The critical value is based on the N(0, 1) distribution and equals 1.645 at the 5% significance level.
Figure 1: Varying Coefficient Functions
This figure shows the true coefficient functions C1(z), γ(z), and δ(z) for the varying coefficient semiparametric
model from equation 6.1 used in simulations.
Figure 2: Comparison of RTS Estimates
The long dashed line shows the true function δ(z) for equation 6.1. The short dashed line shows its OLS
estimate, the dash-dotted line shows its OLS with interactions estimate, and the solid line shows its semi-
parametric varying coefficient estimate. The figure is based on B = 1, 000 simulation draws with n = 1, 000
observations in each. The regressors are fixed across the simulation draws, only the errors are redrawn as
εi ∼ i.i.d. N(0, 2.25).
Figure 3: H0 in 3D
The left figure shows the dependence of TCi (on the z axis) on pi (on the y axis) and Qi (on the x axis) under H0 in equation 6.1. The right figure shows the dependence of TCi (on the z axis) on pi (on the y axis) and Qi (on the x axis) under H1 in equation 6.2. The figure is based on one realization of the regressor values and errors.
Figure 4: H0 and H1 in 2D
The left figure shows the dependence of TCi on pi conditional on fixed Qi and Zi. The right figure shows the
dependence of TCi on Qi conditional on fixed pi and Zi. The solid lines show the linear relationship which
holds under H0 in equation 6.1. The dashed lines show the nonlinear relationship which holds under H1 in
equation 6.2. The figure is based on one realization of the regressor values and errors.
Figure 5: Distribution of t under H0, Normal Errors
The solid line shows the simulated distribution of the trn test statistic, the dash-dotted line shows the simulated distribution of the tkn test statistic, and the dashed line shows the standard normal distribution.
The results are based on B = 2, 000 simulation draws, εi ∼ i.i.d. N(0, 2.25). In the upper left figure
n = 1, 000, kn = 80, rn = 65; in the upper right figure n = 5, 000, kn = 80, rn = 65; in the bottom figure
n = 5, 000, kn = 150, rn = 132.
Figure 6: Form of Heteroskedastic Errors
The figure illustrates the form of heteroskedasticity εi ∼ i.n.i.d. N(0, 0.015 exp(Qi + Zi)). The coordinates of
the points in the scatter plot are given by (lnQi +Zi, εi). It is based on one realization of the regressor values
and errors.
Figure 7: Distribution of t under H0, Normal Heteroskedastic Errors
The solid line shows the simulated distribution of the feasible tHC,rn test statistic, the dash-dotted line shows
the simulated distribution of the infeasible tHC,rn,inf test statistic, and the dashed line shows the standard
normal distribution. The results are based on B = 1, 000 simulation draws, εi ∼ i.n.i.d. N(0, 0.015 exp(Qi +
Zi)). In the upper left figure n = 1, 000, kn = 80, rn = 65; in the upper right figure n = 5, 000, kn = 80,
rn = 65; in the bottom figure n = 5, 000, kn = 150, rn = 132.
Figure 8: Distribution of t under Local Alternatives
The solid line shows the simulated distribution of the trn test statistic under local alternatives for n = 10, 000,
kn = 175, rn = 154, the dash-dotted line shows the simulated distribution of the trn test statistic under local
alternatives for n = 1, 000, kn = 80, rn = 65, and the dashed line shows the standard normal distribution.
The results are based on B = 2, 000 simulation draws, εi ∼ i.i.d. N(0, 2.25).
Figure 9: Age and Price Effects
The figure shows the estimated nonparametric functions f(PRICE) and g(AGE) for specification 7.2.
B Some Practical Tips
B.1 Series Representation
In order to derive the series-based specification test, I write the restricted and unrestricted
models in a series form. For a variable z, let Qln(z) = (q1(z), ..., qln(z))′ be a sequence of approximating functions of z. Then an unknown function h(z) can be approximated as

h(z) ≈ Σ_{j=1}^{ln} γj qj(z) = Qln(z)′γ.
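As a toy illustration of this approximation (my own example, not from the paper), fitting a low-order power series by least squares already approximates a smooth function closely:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, 500)
h = np.exp(z)                                    # "unknown" function h(z)
Q = np.column_stack([z ** j for j in range(6)])  # q_j(z) = z^j, l_n = 6 terms
gamma, *_ = np.linalg.lstsq(Q, h, rcond=None)    # series coefficients
max_err = np.max(np.abs(Q @ gamma - h))          # approximation error on the sample
```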
Let Wmn(x) be the sequence of functions which is used to estimate the restricted semiparametric model. Namely, for the partially linear model, mn = dx1 + ln and Wmn(x) = (x′1, Qln(x2)′)′. Then the semiparametric partially linear model can be written as

Yi = X′1iα + g(X2i) + εi = X′1iα + Qln(X2i)′γ + Ri + εi = Wmn(Xi)′β1 + ei,

where β1 = (α′, γ′)′, Ri = g(X2i) − Qln(X2i)′γ is the approximation error, and ei = εi + Ri.
Next, let T rn(x) = (t1(x), ..., trn(x))′ be the sequence of approximating functions which is used in addition to Wmn(x) to estimate the fully nonparametric model, so that P kn(x) = (Wmn(x)′, T rn(x)′)′. Namely, for the partially linear model, T rn(x) may include powers of x1 and interactions between x1 and x2. The difference between T rn(x) and Wmn(x) is that the former is present only in the unrestricted nonparametric model, while the latter is present in both the restricted and unrestricted models.

The unrestricted nonparametric model can be written as

Yi = P kn(Xi)′β + Ri + εi = Wmn(Xi)′β1 + T rn(Xi)′β2 + Ri + εi,

where β = (β′1, β′2)′.
The null hypothesis that the conditional mean function is semiparametric corresponds
to rn restrictions β2 = 0. To test this hypothesis, the researcher first needs to estimate the semiparametric model

Yi = Wmn(Xi)′β1 + ei,

obtain the estimates β̂1, compute the residuals ε̂i = Yi − Wmn(Xi)′β̂1, and then use the following statistic as a basis for the test:

ξ = ε̂′P (σ̂2P′P)⁻¹P′ε̂,

where σ̂2 = ε̂′ε̂/n.
As I show below, this test statistic can be derived as a Lagrange Multiplier or Conditional Moment test statistic for the semiparametric model written in the series form if the dependence of the number of terms on the sample size is ignored and the model is treated as parametric. In parametric models with a fixed number of restrictions r, this test statistic converges in distribution to χ2_r under the null. In the present paper, the number of restrictions rn grows with the sample size, so the usual asymptotic result does not hold. I develop an asymptotic theory for the proposed test in Section 4.
B.2 Proposed Test as LM Test
As shown above, the unrestricted nonparametric model is given by
Yi = P kn(Xi)′β + ei = Wmn(Xi)′β1 + T rn(Xi)′β2 + ei,

where β = (β′1, β′2)′. The semiparametric null model imposes the restriction that β2 = 0. If
one ignored the presence of approximation errors and the dependence of the number of series
terms on the sample size, this restriction could be tested using the Lagrange Multiplier test.
The (quasi-)log-likelihood for the nonparametric model is given by
Ln(β, σ2) = −(1/2) log 2π − (1/2) log σ2 − (1/(2σ2n)) (Y − Pβ)′(Y − Pβ).
Then the score equals
Sn(β, σ2) = ( ∂Ln(β, σ2)/∂β , ∂Ln(β, σ2)/∂σ2 )′ = ( (1/(σ2n)) P′(Y − Pβ) , −1/(2σ2) + (1/(2σ4n)) (Y − Pβ)′(Y − Pβ) )′,

which under the null hypothesis becomes

Sn(β, σ2) = ( (1/(σ2n)) P′ε , 0 )′,

where β = (β′1, 0′rn)′.
The information matrix evaluated at the true parameter values is block diagonal:

Fn(β, σ2) = diag( E[(1/(σ2n)) P′P] , 1/(2σ4) ).
Assuming that the information matrix equality holds, the ξ test statistic for the restricted null model is constructed by evaluating the score and the Hessian at the restricted estimates:

ξ = n Sn(β̂, σ̂2)′ Fn(β̂, σ̂2)⁻¹ Sn(β̂, σ̂2) = ε̂′P (σ̂2P′P)⁻¹P′ε̂.
B.3 Proposed Test as CM/J Test
As shown above, the semiparametric model is given by

Yi = Wmn(Xi)′β1 + Ri + εi, E[εi|Xi] = 0.

Ignoring the presence of the approximation error, rewrite it as

Yi = Wmn(Xi)′β1 + ei, E[ei|Xi] = 0.
It has mn parameters but should satisfy kn = mn + rn moment conditions, because ei
should be uncorrelated with any function of Xi, not only with Wmn(Xi):
E[P kn(Xi)ei] = 0 (B.1)
Because the model satisfies more unconditional moment restrictions than there are pa-
rameters to estimate, it is overidentified. Thus, the specification test can be based on testing
the increasing number of moment conditions.
The semiparametric model satisfies the following population moment condition:

E[Wmn(Xi)ei] = 0 (B.2)

It means that the residuals ε̂i solve the sample analog of the population moment condition B.2:

(1/n) Σi Wmn(Xi)ε̂i = (1/n) W′ε̂ = 0.
If the semiparametric conditional mean model is correctly specified, we would expect the sample analog of the moment condition B.1 to be close to zero, i.e.

(1/n) Σi P kn(Xi)ε̂i = (1/n) P′ε̂ ≈ 0.
The Conditional Moment test statistic could be used to evaluate whether these moment conditions are statistically different from zero:

ξ = ε̂′P (σ̂2P′P)⁻¹P′ε̂.
Remark A.1 (Conditional Moment and Overidentifying Restrictions Tests). The proposed test statistic is similar to, though not exactly the same as, the overidentifying restrictions test statistic. Suppose that instead of estimating the parameters by solving the sample analog of the OLS population moment conditions

E[Wmn(Xi)ei] = E[Wmn(Xi)(Yi − Wmn(Xi)′β1)] = 0,

the researcher estimates the parameters using GMM based on the overidentified population moment condition

E[P kn(Xi)ei] = E[P kn(Xi)(Yi − Wmn(Xi)′β1)] = 0.
For a weighting matrix M and its first-step estimate M̂, the GMM estimates β̃1 solve

min_{β1} ((Y − Wβ1)′P/n) M̂ (P′(Y − Wβ1)/n).

For the GMM residuals ε̃i = Yi − Wmn(Xi)′β̃1, the overidentifying restrictions test statistic is

J = n (ε̃′P/n) M̂ (P′ε̃/n).
In general, β̃1 ≠ β̂1, because they solve different minimization problems, and hence ε̃ ≠ ε̂. Under the null, however, β̃1 and β̂1, and thus ε̃ and ε̂, should be close to one another, so it should be possible to use both ξ and J as a basis for the specification test. The expression for J may look more familiar from the GMM literature; however, I prefer to use ξ because it can also be viewed as an LM test, because it requires estimating only the restricted semiparametric model, and because it makes it possible to directly account for the estimation variance.

If the researcher uses 2SLS (i.e. GMM with M = (σ2 E[PiP′i])⁻¹ and M̂ = (σ̂2 P′P/n)⁻¹), then the two statistics coincide as long as the second-step estimate of σ2 is used to form the J
statistic. Namely, 2SLS estimates solve
minβ1
((Y −Wβ1)′P/n) (σ2P ′P/n)−1 (P ′(Y −Wβ1)/n)
The first order condition is
(W ′P/n)(σ2P ′P/n)−1(P ′(Y −Wβ1)/n) = 0,
which yields
β1 = (W ′P (P ′P )−1P ′W )−1W ′P (P ′P )−1P ′Y = (W ′W )−1W ′Y = β1
Thus, the 2SLS and OLS estimates of β1 coincide, hence, ε = ε and σ2 = σ2, where σ2 is
the updated second-step estimate. The J statistic becomes
J = n(ε′P/n)(σ2P ′P/n)−1(P ′ε/n) = ε′P (σ2P ′P )−1P ′ε = ξ
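The 2SLS/OLS equivalence above is easy to verify numerically. The following is a minimal sketch with hypothetical simulated data: when the instrument matrix $P = (W \; T)$ contains the regressors $W$, the first-stage projection of $W$ on $P$ returns $W$ itself, so 2SLS reduces to OLS.

```python
# Numerical check (hypothetical simulated data): 2SLS with instruments
# P = [W, T] that include the regressors W reproduces the OLS estimate.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 500, 3, 4
W = rng.normal(size=(n, m))          # regressors (included in P)
T = rng.normal(size=(n, r))          # extra approximating functions
P = np.hstack([W, T])                # full instrument matrix
y = W @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

beta_ols = np.linalg.solve(W.T @ W, W.T @ y)

# 2SLS: project W on the column space of P, then use the projection
W_hat = P @ np.linalg.solve(P.T @ P, P.T @ W)    # equals W up to rounding
beta_2sls = np.linalg.solve(W_hat.T @ W, W_hat.T @ y)

print(np.allclose(beta_ols, beta_2sls))          # True
```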
B.4 Implementing the Test in Practice
If the researcher is willing to nest the null model in the alternative and use series methods to estimate the null model, then the following steps are needed to implement the test:

1. Pick the sequence of approximating functions of $x$, $W^{m_n}(x) = (w_1(x), ..., w_{m_n}(x))'$, which will be used to estimate the semiparametric model.

2. Estimate the semiparametric model $Y_i = W^{m_n}(X_i)'\beta_1 + \varepsilon_i$ using series methods. Obtain the estimates $\hat{\beta}_1 = (W'W)^{-1}W'Y$ and residuals $\hat{\varepsilon}_i = Y_i - W^{m_n}(X_i)'\hat{\beta}_1$.

3. Pick the sequence of approximating functions of $x$, $T^{r_n}(x) = (t_1(x), ..., t_{r_n}(x))'$, which will complement $W^{m_n}(x)$ to form the vector $P^{k_n}(x)$, $k_n = m_n + r_n$, which corresponds to a general nonparametric model. $P^{k_n}(x)$ should be able to approximate any unknown function sufficiently well. Common choices of basis functions include power series (see Equation 3.3) and splines (see Equation 3.4).

4. Compute the quadratic form $\xi = \hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon}$.

   Note that in the homoskedastic case $\xi$ can be computed as $nR^2$ from the regression of $\hat{\varepsilon}$ on $P$. The residuals from the regression of $\hat{\varepsilon}$ on $P$ are given by $e = \hat{\varepsilon} - P(P'P)^{-1}P'\hat{\varepsilon}$, so that

   $$e'e = \hat{\varepsilon}'\hat{\varepsilon} - \hat{\varepsilon}'P(P'P)^{-1}P'\hat{\varepsilon}$$

   Then

   $$nR^2 = n\left(1 - \frac{e'e}{\hat{\varepsilon}'\hat{\varepsilon}}\right) = n\,\frac{\hat{\varepsilon}'P(P'P)^{-1}P'\hat{\varepsilon}}{\hat{\varepsilon}'\hat{\varepsilon}} = \hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon}$$

   Note also that, as shown in Remark A.1, $\xi$ can be computed as the overidentifying restrictions test statistic from the 2SLS instrumental variables regression of $Y_i$ on $W^{m_n}(X_i)$ with $(W^{m_n}(X_i)', T^{r_n}(X_i)')'$ as instruments.

5. Compute the test statistic, which is asymptotically standard normal under the null:

   $$t = \frac{\xi - r_n}{\sqrt{2r_n}} \stackrel{a}{\sim} N(0, 1)$$

   Reject the null if $t > z_{1-\alpha}$, the $(1-\alpha)$-quantile of the standard normal distribution. Alternatively, use the $\chi^2$ approximation directly: $\xi \stackrel{a}{\sim} \chi^2_{r_n}$; reject the null if $\xi > \chi^2_{r_n}(1-\alpha)$, the $(1-\alpha)$-quantile of the $\chi^2$ distribution with $r_n$ degrees of freedom.
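The steps above can be sketched in a few lines of code. This is a minimal illustration on hypothetical simulated data (a linear null model complemented with polynomial terms), not the paper's empirical application; it also checks the $nR^2$ equivalence noted in Step 4.

```python
# Minimal sketch of the series-based test (hypothetical simulated data).
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # data generated under the null

# Steps 1-2: estimate the null model by series (here a linear specification)
W = np.column_stack([np.ones(n), x])
beta1 = np.linalg.solve(W.T @ W, W.T @ y)
eps = y - W @ beta1

# Step 3: complement W with higher-order terms to form P = [W, T]
T = np.column_stack([x**2, x**3, x**4])
P = np.hstack([W, T])
r_n = T.shape[1]

# Step 4: quadratic form xi = eps' P (sigma2 P'P)^{-1} P' eps
sigma2 = eps @ eps / n
xi = eps @ (P @ np.linalg.solve(sigma2 * (P.T @ P), P.T @ eps))

# Equivalent n R^2 computation from the regression of eps on P
e = eps - P @ np.linalg.solve(P.T @ P, P.T @ eps)
nR2 = n * (1.0 - (e @ e) / (eps @ eps))

# Step 5: normal approximation; reject at the 5% level if t > 1.645
t_stat = (xi - r_n) / np.sqrt(2 * r_n)
print(np.isclose(xi, nR2), t_stat)
```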
If the researcher is willing to use other semiparametric methods, such as kernels, to estimate the null model, then the following steps are needed to implement the test:

1. Estimate the semiparametric model $Y_i = f(X_i, \theta, h) + \varepsilon_i$ using the preferred estimation method. Obtain the estimates $\hat{\theta}$ and $\hat{h}$ and residuals $\hat{\varepsilon}_i = Y_i - f(X_i, \hat{\theta}, \hat{h})$.

2. Pick the sequence of approximating functions of $x$, $P^{k_n}(x) = (p_1(x), ..., p_{k_n}(x))'$, which is implicitly used to approximate a general nonparametric model. As in the previous case, $P^{k_n}(x)$ should be able to approximate any unknown function sufficiently well. Common choices of basis functions include power series (see Equation 3.3) and splines (see Equation 3.4).

3. Compute the quadratic form $\xi = \hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon}$. As before, in the homoskedastic case $\xi$ can be computed as $nR^2$ from the regression of $\hat{\varepsilon}$ on $P$.

4. Compute the test statistic, which is asymptotically standard normal under the null:

   $$t = \frac{\xi - k_n}{\sqrt{2k_n}} \stackrel{a}{\sim} N(0, 1)$$

   Reject the null if $t > z_{1-\alpha}$, the $(1-\alpha)$-quantile of the standard normal distribution. Alternatively, use the $\chi^2$ approximation directly: $\xi \stackrel{a}{\sim} \chi^2_{k_n}$; reject the null if $\xi > \chi^2_{k_n}(1-\alpha)$, the $(1-\alpha)$-quantile of the $\chi^2$ distribution with $k_n$ degrees of freedom.

If the researcher suspects that the errors are heteroskedastic, then $\xi = \hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon}$ is replaced with $\xi_{HC} = \hat{\varepsilon}'P(P'\hat{\Sigma}P)^{-1}P'\hat{\varepsilon}$, where $\hat{\Sigma} = \mathrm{diag}(\hat{\varepsilon}_i^2)$. All other steps remain unchanged.
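The heteroskedasticity-robust replacement above changes only the middle matrix of the quadratic form. A minimal sketch on hypothetical data (note that $P'\hat{\Sigma}P = \sum_i \hat{\varepsilon}_i^2 P_iP_i'$ can be computed without forming the $n \times n$ matrix $\hat{\Sigma}$):

```python
# Heteroskedasticity-robust variant: sigma2 * P'P is replaced with
# P' Sigma P, Sigma = diag(eps_i^2). Data below are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(-1, 1, size=n)
y = 1.0 + 2.0 * x + (0.5 + np.abs(x)) * rng.normal(size=n)   # heteroskedastic errors

W = np.column_stack([np.ones(n), x])
eps = y - W @ np.linalg.solve(W.T @ W, W.T @ y)              # null-model residuals

P = np.column_stack([np.ones(n), x, x**2, x**3, x**4])
r_n = P.shape[1] - W.shape[1]

# P' Sigma P computed row-wise, avoiding an explicit n x n Sigma
PSP = (P * (eps**2)[:, None]).T @ P
xi_hc = eps @ (P @ np.linalg.solve(PSP, P.T @ eps))

t_stat = (xi_hc - r_n) / np.sqrt(2 * r_n)    # compare with z_{1-alpha}
print(xi_hc, t_stat)
```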
C Proofs
C.1 Proof of Theorem 1
Given the projection nature of series estimators, the test statistic becomes

$$\frac{\xi - r_n}{\sqrt{2r_n}} = \frac{\hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon} - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'M_W P(\hat{\sigma}^2 P'P)^{-1}P'M_W(\varepsilon + R) - r_n}{\sqrt{2r_n}}$$

Note that $P = (W \; T)$, so that $M_W P = (0_{n \times m_n} \; M_W T)$. Then by the blockwise matrix inverse formula,

$$M_W P(\hat{\sigma}^2 P'P)^{-1}P'M_W = M_W T(\hat{\sigma}^2 T'M_W T)^{-1}T'M_W$$

Thus,

$$\frac{\xi - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'M_W T(\hat{\sigma}^2 T'M_W T)^{-1}T'M_W(\varepsilon + R) - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'(\varepsilon + R) - r_n}{\sqrt{2r_n}},$$

where $\tilde{T} = M_W T$. The remainder of the proof consists of several steps.
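The blockwise projection identity above is exact in finite samples and can be verified numerically. This is a minimal sketch with hypothetical random matrices; the scalar $\hat{\sigma}^2$ cancels on both sides and is omitted.

```python
# Numerical check of M_W P (P'P)^{-1} P' M_W = M_W T (T' M_W T)^{-1} T' M_W
# for P = [W, T] (hypothetical matrices; the sigma^2 scalar cancels).
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 50, 2, 3
W = rng.normal(size=(n, m))
T = rng.normal(size=(n, r))
P = np.hstack([W, T])

M_W = np.eye(n) - W @ np.linalg.solve(W.T @ W, W.T)   # annihilator of W
lhs = M_W @ P @ np.linalg.solve(P.T @ P, P.T) @ M_W
rhs = M_W @ T @ np.linalg.solve(T.T @ M_W @ T, T.T) @ M_W
print(np.allclose(lhs, rhs))    # True
```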
Step 1. Show that $||\tilde{T}'\tilde{T}/n - T'T/n|| = o_p(1/\sqrt{r_n})$.

Note that

$$\tilde{T}'\tilde{T}/n = T'T/n - (T'W/n)(W'W/n)^{-1}(W'T/n)$$

Thus,

$$||\tilde{T}'\tilde{T}/n - T'T/n|| = ||(T'W/n)(W'W/n)^{-1}(W'T/n)||$$

By Lemma 15.2 in Li and Racine (2007), $E[||P'P/n - I_{k_n}||^2] = O(\zeta(k_n)^2 k_n/n)$ and $||P'P/n - I_{k_n}|| = O_p(\zeta(k_n)\sqrt{k_n/n})$. Note that

$$P'P/n - I_{k_n} = \begin{pmatrix} W'W/n & W'T/n \\ T'W/n & T'T/n \end{pmatrix} - \begin{pmatrix} I_{m_n} & 0_{m_n \times r_n} \\ 0_{r_n \times m_n} & I_{r_n} \end{pmatrix}$$

Hence, $||W'W/n - I_{m_n}|| = O_p(\zeta(k_n)\sqrt{k_n/n})$ and $||W'T/n|| = O_p(\zeta(k_n)\sqrt{k_n/n})$. Thus, the eigenvalues of $W'W/n$ are bounded below and above w.p.a. 1, and

$$||(T'W/n)(W'W/n)^{-1}(W'T/n)|| \le C||(T'W/n)(W'T/n)|| \le C||T'W/n||\,||W'T/n|| = O_p(\zeta(k_n)^2 k_n/n),$$

where the last inequality is due to the fact that $||AB||^2 \le ||A||^2||B||^2$ (see, e.g., Trefethen and Bau III (1997), p. 23).

As long as condition 4.1 holds, $\zeta(k_n)^2 k_n\sqrt{r_n}/n \to 0$, and $||\tilde{T}'\tilde{T}/n - T'T/n|| = o_p(1/\sqrt{r_n})$. This also implies that the smallest and largest eigenvalues of $\tilde{T}'\tilde{T}/n$ converge to one.
Step 2. Decompose the test statistic and bound the remainder terms.

$$(\varepsilon + R)'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'(\varepsilon + R) = \varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + 2R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R$$

By the projection inequality and Assumption 4,

$$R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R \le R'R/\hat{\sigma}^2 = O_p(nm_n^{-2\alpha})$$

Because $\tilde{T}'\tilde{T}/n$ and $\tilde{T}\tilde{T}'/n$ have the same nonzero eigenvalues and all eigenvalues of $\tilde{T}'\tilde{T}/n$ converge to one, $\lambda_{max}(\tilde{T}\tilde{T}'/n)$ converges in probability to 1. Thus,

$$\left|R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon\right| \le \left|C\lambda_{max}(\tilde{T}\tilde{T}'/n)R'\varepsilon\right| \le \left|CR'\varepsilon\right| = \left|C\sum_i R_i\varepsilon_i\right|$$

Note that

$$E\left[\left(\sum_i R_i\varepsilon_i\right)^2\right] = E\left[\sum_i\sum_j \varepsilon_i\varepsilon_j R_iR_j\right] = E\left[\sum_i R_i^2\varepsilon_i^2\right] = nE[R_i^2\varepsilon_i^2] = n\sigma^2 E[R_i^2] \le n\sigma^2\sup_{x \in \mathcal{X}} R(x)^2 = O(nm_n^{-2\alpha})$$

by Assumption 4. Hence,

$$\left|R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon\right| \le \left|C\sum_i R_i\varepsilon_i\right| = O_p(n^{1/2}m_n^{-\alpha})$$

Thus,

$$(\varepsilon + R)'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'(\varepsilon + R) = \varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + O_p(nm_n^{-2\alpha}) + O_p(n^{1/2}m_n^{-\alpha})$$

Next,

$$\varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon = \varepsilon'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon - 2\varepsilon'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon + \varepsilon'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'P_W\varepsilon$$

Note that

$$E[(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)] = E[\varepsilon_i^2 T_i'\Omega^{-1}T_i]/n = E[\mathrm{tr}(\Omega^{-1}\varepsilon_i^2 T_iT_i')]/n = \mathrm{tr}(I_{r_n})/n = r_n/n$$

Thus, by Markov's inequality,

$$||\Omega^{-1}(n^{-1}T'\varepsilon)|| \le C\sqrt{(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)} = O_p(\sqrt{r_n/n})$$

Because the eigenvalues of $\Omega$ are bounded below and above w.p.a. 1, it is also true that $||n^{-1}T'\varepsilon|| = O_p(\sqrt{r_n/n})$ and $||n^{-1}W'\varepsilon|| = O_p(\sqrt{m_n/n})$. Using this result and the inequality $||AB||^2 \le ||A||^2||B||^2$, get

$$\left|\left|\varepsilon'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon\right|\right| = \left|\left|\varepsilon'W(W'W)^{-1}W'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon\right|\right| = \left|\left|n(\varepsilon'W/n)(W'W/n)^{-1}(W'T/n)(\hat{\sigma}^2\tilde{T}'\tilde{T}/n)^{-1}(T'\varepsilon/n)\right|\right|$$
$$\le Cn\left|\left|(\varepsilon'W/n)(W'T/n)(T'\varepsilon/n)\right|\right| \le Cn||\varepsilon'W/n||\,||W'T/n||\,||T'\varepsilon/n|| = nO_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{r_n/n}) = O_p(\zeta(k_n)\sqrt{m_n k_n r_n/n})$$

In turn,

$$\left|\left|\varepsilon'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'P_W\varepsilon\right|\right| = \left|\left|\varepsilon'W(W'W)^{-1}W'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'W(W'W)^{-1}W'\varepsilon\right|\right|$$
$$= \left|\left|n(\varepsilon'W/n)(W'W/n)^{-1}(W'T/n)(\hat{\sigma}^2\tilde{T}'\tilde{T}/n)^{-1}(T'W/n)(W'W/n)^{-1}(W'\varepsilon/n)\right|\right| \le Cn\left|\left|(\varepsilon'W/n)(W'T/n)(T'W/n)(W'\varepsilon/n)\right|\right|$$
$$\le Cn||\varepsilon'W/n||\,||W'T/n||\,||T'W/n||\,||W'\varepsilon/n|| = nO_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{m_n/n}) = O_p(\zeta(k_n)^2 m_n k_n/n)$$

Thus, under conditions 4.3 and 4.4,

$$\hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon} = \varepsilon'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon + O_p(nm_n^{-2\alpha}) + O_p(n^{1/2}m_n^{-\alpha}) + O_p(\zeta(k_n)^2 m_n k_n/n) + O_p(\zeta(k_n)\sqrt{m_n k_n r_n/n}) = \varepsilon'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon + o_p(\sqrt{r_n}) \tag{C.1}$$
Step 3. Deal with the leading term.

Define $\Omega = \sigma^2 E[T_iT_i'] = \sigma^2 I_{r_n}$, where $T_i = T^{r_n}(X_i)$.

Lemma A.1. (Donald et al. (2003), Lemma 6.2) If $E[T_i\varepsilon_i] = 0$, $E[(\varepsilon_i T_i'\Omega^{-1}T_i\varepsilon_i)^2]/(r_n\sqrt{n}) \to 0$, and $r_n \to \infty$, then

$$\frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon) - r_n}{\sqrt{2r_n}} \stackrel{d}{\to} N(0,1) \tag{C.2}$$

All three conditions of this lemma hold. $E[T_i\varepsilon_i] = 0$ and $r_n \to \infty$ hold trivially, while

$$E[(\varepsilon_i T_i'\Omega^{-1}T_i\varepsilon_i)^2] \le CE[\varepsilon_i^4||T_i||^4] \le CE[||T_i||^4] \le C\zeta(r_n)^2 r_n$$

Under condition 4.5, the assumptions of Lemma A.1 hold.

Lemma A.2. Suppose that Assumptions 2, 3, and 4 hold. Moreover, $\sigma^2(x) = \sigma^2$ for all $x \in \mathcal{X}$. Then

$$\frac{n(n^{-1}\varepsilon'T)(\hat{\sigma}^2 n^{-1}\tilde{T}'\tilde{T})^{-1}(n^{-1}T'\varepsilon) - n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{r_n}} \stackrel{p}{\to} 0 \tag{C.3}$$
Proof of Lemma A.2. To prove the lemma, I will need an auxiliary result given below.

Lemma A.3. Let $\tilde{\Omega} = \hat{\sigma}^2\tilde{T}'\tilde{T}/n$, $\hat{\Omega} = \hat{\sigma}^2 T'T/n$, $\bar{\Omega} = \sigma^2 T'T/n$, $\Omega = \sigma^2 E[T_iT_i']$. Suppose that Assumptions 2(ii), 3, and 4 are satisfied. Then

$$||\tilde{\Omega} - \hat{\Omega}|| = O_p(\zeta(k_n)^2 k_n/n)$$
$$||\hat{\Omega} - \bar{\Omega}|| = O_p(r_n/n^{1/2})$$
$$||\bar{\Omega} - \Omega|| = O_p(\zeta(r_n)r_n^{1/2}/n^{1/2})$$

If Assumption 2(i) is also satisfied then $1/C \le \lambda_{min}(\Omega) \le \lambda_{max}(\Omega) \le C$, and if $\zeta(k_n)^2 k_n/n \to 0$, $\zeta(r_n)r_n^{1/2}/n^{1/2} \to 0$, and $r_n(m_n/n + m_n^{-2\alpha})^{1/2} \to 0$, then w.p.a. 1, $1/C \le \lambda_{min}(\tilde{\Omega}) \le \lambda_{max}(\tilde{\Omega}) \le C$ and $1/C \le \lambda_{min}(\hat{\Omega}) \le \lambda_{max}(\hat{\Omega}) \le C$.

Proof of Lemma A.3 is presented after the remainder of the main proof.

Given the result of Lemma A.3,

$$\left|\frac{n(n^{-1}\varepsilon'T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}} - \frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right| = \left|\frac{n(n^{-1}\varepsilon'T)(\tilde{\Omega}^{-1} - \Omega^{-1})(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right|$$
$$= \left|\frac{n(n^{-1}\varepsilon'T)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1}(\tilde{\Omega} - \Omega)\tilde{\Omega}^{-1})(n^{-1}T'\varepsilon)}{\sqrt{2r_n}} - \frac{n(n^{-1}\varepsilon'T)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1})(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right|$$
$$\le \frac{\left|n(n^{-1}\varepsilon'T)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1}(\tilde{\Omega} - \Omega)\tilde{\Omega}^{-1})(n^{-1}T'\varepsilon)\right|}{\sqrt{2r_n}} + \frac{\left|n(n^{-1}\varepsilon'T)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1})(n^{-1}T'\varepsilon)\right|}{\sqrt{2r_n}}$$
$$\le \frac{n||\Omega^{-1}n^{-1}T'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2r_n}}$$

As has been shown above, $||\Omega^{-1}(n^{-1}T'\varepsilon)|| = O_p(\sqrt{r_n/n})$. Hence,

$$\frac{n||\Omega^{-1}n^{-1}T'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2r_n}} = \frac{nO_p(r_n/n)o_p(1/\sqrt{r_n})}{\sqrt{2r_n}} = \frac{o_p(\sqrt{r_n})}{\sqrt{2r_n}} = o_p(1),$$

provided that $||\tilde{\Omega} - \Omega|| = o_p(1/\sqrt{r_n})$, which holds under rate conditions 4.1–4.2.

The result of Theorem 1 now follows from equations C.1, C.2, and C.3.
Proof of Lemma A.3. It has been shown in the proof of Theorem 1 that $||\tilde{T}'\tilde{T}/n - T'T/n|| = O_p(\zeta(k_n)^2 k_n/n)$. As long as $\hat{\sigma}^2 \stackrel{p}{\to} \sigma^2$, this implies $||\tilde{\Omega} - \hat{\Omega}|| = O_p(\zeta(k_n)^2 k_n/n)$.

Next, due to homoskedasticity,

$$||\hat{\Omega} - \bar{\Omega}|| = ||(\hat{\sigma}^2 - \sigma^2)\sum_i T_iT_i'/n|| \le |\hat{\sigma}^2 - \sigma^2|\sum_i ||T_i||^2/n$$
$$= \left|n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) + 2n^{-1}\sum_i \varepsilon_i(g_i - \hat{g}_i) + n^{-1}\sum_i(g_i - \hat{g}_i)^2\right|\sum_i ||T_i||^2/n$$

First, by Chebyshev's inequality, $n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) = O_p(n^{-1/2})$.

Second, by Assumption 2, $n^{-1}\sum_i(g_i - \hat{g}_i)^2 = O_p(m_n/n + m_n^{-2\alpha})$.

Finally, note that

$$n^{-1}\sum_i \varepsilon_i(g_i - \hat{g}_i) = n^{-1}\sum_i \varepsilon_i(W_i'(\beta_1 - \hat{\beta}_1) + R_i) = n^{-1}(\beta_1 - \hat{\beta}_1)'W'\varepsilon + n^{-1}R'\varepsilon$$

Using the result proved above, $n^{-1}R'\varepsilon = O_p(n^{-1/2}m_n^{-\alpha})$. Next,

$$\left|n^{-1}(\beta_1 - \hat{\beta}_1)'W'\varepsilon\right| \le ||\hat{\beta}_1 - \beta_1||\,||n^{-1}W'\varepsilon|| = O_p((m_n/n + m_n^{-2\alpha})^{1/2})O_p(n^{-1/2}m_n^{1/2}) = O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2})$$

Then

$$n^{-1}\sum_i \varepsilon_i(g_i - \hat{g}_i) = O_p(n^{-1/2}m_n^{-\alpha}) + O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2}) = O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2})$$

Combining the results,

$$\hat{\sigma}^2 - \sigma^2 = O_p(n^{-1/2}) + O_p(m_n/n + m_n^{-2\alpha}) + O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2}) = O_p(n^{-1/2}),$$

because $m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2} = o(n^{-1/2})$ and $m_n/n + m_n^{-2\alpha} = o(n^{-1/2})$.

By Lemma 1, $E[||T_i||^2] \le r_n$, which yields $||\hat{\Omega} - \bar{\Omega}|| = O_p(r_n n^{-1/2})$.

Next, $||\bar{\Omega} - \Omega|| = ||\sigma^2(T'T/n - E[T_iT_i'])||$. Thus,

$$E[||\bar{\Omega} - \Omega||^2] = E[||\sigma^2\sum_i(T_iT_i' - E[T_iT_i'])/n||^2] = \sigma^4 E[||T_iT_i' - E[T_iT_i']||^2]/n \le CE[||T_i||^4]/n = C\zeta(r_n)^2 r_n/n,$$

and hence $||\bar{\Omega} - \Omega|| = O_p(\zeta(r_n)\sqrt{r_n/n})$.

The remaining conclusions follow from Lemma A.6 in Donald et al. (2003).
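The conclusion of Theorem 1 can be illustrated with a small Monte Carlo sketch (a hypothetical design, not from the paper): under the null, $\xi$ should have mean close to $r_n$ and variance close to $2r_n$, consistent with the $\chi^2_{r_n}$ approximation.

```python
# Monte Carlo sketch: under the null, xi is approximately chi-square(r_n),
# so its simulated mean and variance should be near r_n and 2 r_n.
import numpy as np

rng = np.random.default_rng(4)
n, n_sim, r_n = 500, 500, 3
xi_draws = []
for _ in range(n_sim):
    x = rng.uniform(-1, 1, size=n)
    y = 1.0 + x + rng.normal(size=n)                 # null: linear model
    W = np.column_stack([np.ones(n), x])
    eps = y - W @ np.linalg.solve(W.T @ W, W.T @ y)  # restricted residuals
    P = np.column_stack([W, x**2, x**3, x**4])       # r_n = 3 extra terms
    sigma2 = eps @ eps / n
    xi = eps @ (P @ np.linalg.solve(sigma2 * (P.T @ P), P.T @ eps))
    xi_draws.append(xi)

print(np.mean(xi_draws), np.var(xi_draws))           # near 3 and 6
```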
C.2 Proof of Theorem 2
Given the projection nature of the series estimators, the test statistic becomes

$$\frac{\xi - r_n}{\sqrt{2r_n}} = \frac{\hat{\varepsilon}'P(P'\hat{\Sigma}P)^{-1}P'\hat{\varepsilon} - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'M_W P(P'\hat{\Sigma}P)^{-1}P'M_W(\varepsilon + R) - r_n}{\sqrt{2r_n}}$$

Next, note that $P = (W \; T)$, so that $M_W P = (0_{n \times m_n} \; M_W T)$. Then by the blockwise matrix inverse formula,

$$M_W P(P'\hat{\Sigma}P)^{-1}P'M_W = M_W T(n\tilde{\Omega})^{-1}T'M_W,$$

where

$$\tilde{\Omega} = T'\hat{\Sigma}T/n - (T'\hat{\Sigma}W/n)(W'\hat{\Sigma}W/n)^{-1}(W'\hat{\Sigma}T/n)$$

Thus,

$$\frac{\xi - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'M_W T(n\tilde{\Omega})^{-1}T'M_W(\varepsilon + R) - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R)'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'(\varepsilon + R) - r_n}{\sqrt{2r_n}},$$

where $\tilde{T} = M_W T$. The remainder of the proof consists of several steps.
Step 1. Assume that $||\tilde{\Omega} - \hat{\Omega}|| = o_p(1/\sqrt{r_n})$, where $\hat{\Omega} = T'\hat{\Sigma}T/n$. This is not a primitive assumption; however, it is difficult to derive more primitive sufficient conditions.

Step 2. Use the following auxiliary result:

Lemma A.4. Let $\hat{\Omega} = \sum_i \hat{\varepsilon}_i^2 T_iT_i'/n$, $\bar{\Omega} = \sum_i \varepsilon_i^2 T_iT_i'/n$, $\breve{\Omega} = \sum_i \sigma_i^2 T_iT_i'/n$, $\Omega = E[\varepsilon_i^2 T_iT_i']$, where $\sigma_i^2 = E[\varepsilon_i^2|X_i]$. Suppose that Assumptions 2(ii), 3, and 4 are satisfied. Then

$$||\hat{\Omega} - \bar{\Omega}|| = O_p(\zeta(r_n)^2(m_n/n + m_n^{-2\alpha}))$$
$$||\bar{\Omega} - \breve{\Omega}|| = O_p(\zeta(r_n)r_n^{1/2}/n^{1/2})$$
$$||\breve{\Omega} - \Omega|| = O_p(\zeta(r_n)r_n^{1/2}/n^{1/2})$$

If Assumption 2(i) is also satisfied then $1/C \le \lambda_{min}(\Omega) \le \lambda_{max}(\Omega) \le C$, and if $\zeta(r_n)r_n^{1/2}/n^{1/2} \to 0$ and $\zeta(r_n)^2(m_n/n + m_n^{-2\alpha})^{1/2} \to 0$, then w.p.a. 1, $1/C \le \lambda_{min}(\hat{\Omega}) \le \lambda_{max}(\hat{\Omega}) \le C$ and $1/C \le \lambda_{min}(\bar{\Omega}) \le \lambda_{max}(\bar{\Omega}) \le C$.

Moreover, if $||\tilde{\Omega} - \hat{\Omega}|| = o_p(1)$, then $1/C \le \lambda_{min}(\tilde{\Omega}) \le \lambda_{max}(\tilde{\Omega}) \le C$.
Step 3. Decompose the test statistic and bound the remainder terms.

$$(\varepsilon + R)'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'(\varepsilon + R) = \varepsilon'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'\varepsilon + 2R'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'\varepsilon + R'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'R$$

Because $\tilde{T}'\tilde{T}/n$ and $\tilde{T}\tilde{T}'/n$ have the same nonzero eigenvalues and all eigenvalues of $\tilde{T}'\tilde{T}/n$ converge to one, $\lambda_{max}(\tilde{T}\tilde{T}'/n)$ converges in probability to 1. Moreover, the eigenvalues of $\tilde{\Omega}$ are bounded below and above. Thus, by Assumption 4,

$$R'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'R \le CR'(n^{-1}\tilde{T}\tilde{T}')R \le CR'R = O_p(nm_n^{-2\alpha})$$

Next,

$$\left|R'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'\varepsilon\right| \le \left|C\lambda_{max}(\tilde{T}\tilde{T}'/n)R'\varepsilon\right| \le |CR'\varepsilon| = O_p(n^{1/2}m_n^{-\alpha})$$

In turn,

$$\varepsilon'\tilde{T}(n\tilde{\Omega})^{-1}\tilde{T}'\varepsilon = \varepsilon'T(n\tilde{\Omega})^{-1}T'\varepsilon - 2\varepsilon'P_W T(n\tilde{\Omega})^{-1}T'\varepsilon + \varepsilon'P_W T(n\tilde{\Omega})^{-1}T'P_W\varepsilon$$

Next,

$$\left|\left|\varepsilon'P_W T(n\tilde{\Omega})^{-1}T'\varepsilon\right|\right| = \left|\left|\varepsilon'W(W'W)^{-1}W'T(n\tilde{\Omega})^{-1}T'\varepsilon\right|\right| = \left|\left|n(\varepsilon'W/n)(W'W/n)^{-1}(W'T/n)\tilde{\Omega}^{-1}(T'\varepsilon/n)\right|\right|$$
$$\le Cn\left|\left|(\varepsilon'W/n)(W'T/n)(T'\varepsilon/n)\right|\right| \le Cn||\varepsilon'W/n||\,||W'T/n||\,||T'\varepsilon/n|| = nO_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{r_n/n}) = O_p(\zeta(k_n)\sqrt{m_n k_n r_n/n})$$

In turn,

$$\left|\left|\varepsilon'P_W T(n\tilde{\Omega})^{-1}T'P_W\varepsilon\right|\right| = \left|\left|\varepsilon'W(W'W)^{-1}W'T(n\tilde{\Omega})^{-1}T'W(W'W)^{-1}W'\varepsilon\right|\right|$$
$$= \left|\left|n(\varepsilon'W/n)(W'W/n)^{-1}(W'T/n)\tilde{\Omega}^{-1}(T'W/n)(W'W/n)^{-1}(W'\varepsilon/n)\right|\right| \le Cn\left|\left|(\varepsilon'W/n)(W'T/n)(T'W/n)(W'\varepsilon/n)\right|\right|$$
$$\le Cn||\varepsilon'W/n||\,||W'T/n||\,||T'W/n||\,||W'\varepsilon/n|| = nO_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{m_n/n}) = O_p(\zeta(k_n)^2 m_n k_n/n)$$

Thus, under conditions 4.7 and 4.8,

$$\hat{\varepsilon}'P(P'\hat{\Sigma}P)^{-1}P'\hat{\varepsilon} = \varepsilon'T(n\tilde{\Omega})^{-1}T'\varepsilon + O_p(nm_n^{-2\alpha}) + O_p(n^{1/2}m_n^{-\alpha}) + O_p(\zeta(k_n)^2 m_n k_n/n) + O_p(\zeta(k_n)\sqrt{m_n k_n r_n/n}) = \varepsilon'T(n\tilde{\Omega})^{-1}T'\varepsilon + o_p(\sqrt{r_n}) \tag{C.4}$$
Step 4. Deal with the leading term.

The conditions of Lemma A.1 do not rely on the homoskedasticity assumption. Thus, the result of the lemma remains valid even under heteroskedasticity, as long as $\zeta(r_n)^2/\sqrt{n} \to 0$:

$$\frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon) - r_n}{\sqrt{2r_n}} \stackrel{d}{\to} N(0,1) \tag{C.5}$$

Next, as in the proof of Theorem 1,

$$\left|\frac{n(n^{-1}\varepsilon'T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}} - \frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right| = \left|\frac{n(n^{-1}\varepsilon'T)(\tilde{\Omega}^{-1} - \Omega^{-1})(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right| \le \frac{n||\Omega^{-1}n^{-1}T'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2r_n}}$$

Similarly to the proof of Theorem 1, $||\Omega^{-1}(n^{-1}T'\varepsilon)|| = O_p(\sqrt{r_n/n})$. Then

$$\frac{n||\Omega^{-1}n^{-1}T'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2r_n}} = \frac{nO_p(r_n/n)o_p(1/\sqrt{r_n})}{\sqrt{2r_n}} = \frac{o_p(\sqrt{r_n})}{\sqrt{2r_n}} = o_p(1),$$

provided that $||\tilde{\Omega} - \Omega|| = o_p(1/\sqrt{r_n})$, which holds under rate conditions 4.6–4.8. Hence,

$$\frac{n(n^{-1}\varepsilon'T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon) - n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{r_n}} \stackrel{p}{\to} 0 \tag{C.6}$$

The result of Theorem 2 now follows from equations C.4, C.5, and C.6.
Proof of Lemma A.4. First,

$$||\hat{\Omega} - \bar{\Omega}|| = ||\sum_i T_iT_i'(\hat{\varepsilon}_i^2 - \varepsilon_i^2)/n|| = ||\sum_i T_iT_i'\left((\varepsilon_i + g_i - \hat{g}_i)^2 - \varepsilon_i^2\right)/n||$$
$$= ||\sum_i T_iT_i'\left((g_i - \hat{g}_i)^2 + 2\varepsilon_i(g_i - \hat{g}_i)\right)/n|| \le \sup_i ||T_i||^2\left|\sum_i\left((g_i - \hat{g}_i)^2 + 2\varepsilon_i(g_i - \hat{g}_i)\right)/n\right|$$
$$= \zeta(r_n)^2\left[O_p(m_n/n + m_n^{-2\alpha}) + O_p(n^{-1/2}m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2})\right] = O_p(\zeta(r_n)^2(m_n/n + m_n^{-2\alpha}))$$

The following two results can be obtained exactly as in Lemma A.6 in Donald et al. (2003):

$$||\bar{\Omega} - \breve{\Omega}|| = ||\sum_i T_iT_i'(\varepsilon_i^2 - \sigma_i^2)/n|| = O_p(\zeta(r_n)\sqrt{r_n/n})$$
$$||\breve{\Omega} - \Omega|| = ||\sum_i \sigma_i^2 T_iT_i'/n - \Omega|| = O_p(\zeta(r_n)\sqrt{r_n/n})$$

Finally, the results about the eigenvalues can also be obtained in the same way as in Lemma A.6 in Donald et al. (2003).
C.3 Proof of Theorem 3
Denote $\Omega_P = \hat{\sigma}^2 n^{-1}P'P$ in the homoskedastic case and $\Omega_P = n^{-1}P'\hat{\Sigma}P$ in the heteroskedastic case. Then under homoskedasticity

$$\xi = n(n^{-1}\hat{\varepsilon}'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat{\varepsilon}) = n(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}),$$

while under heteroskedasticity

$$\xi_{HC} = n(n^{-1}\hat{\varepsilon}'P)(n^{-1}P'\hat{\Sigma}P)^{-1}(n^{-1}P'\hat{\varepsilon}) = n(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon})$$

Note that

$$\frac{\sqrt{r_n}}{n}\,\frac{n(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}) - r_n}{\sqrt{2r_n}} = \frac{1}{\sqrt{2}}(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}) + T_2,$$

where $T_2 = -r_n/(n\sqrt{2}) \to 0$. Hence, it suffices to show that $(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}) \stackrel{p}{\to} \Delta$.

Next, note that due to the projection nature of the series estimators, $\hat{\varepsilon} = M_W Y = M_W(\varepsilon^* + R^*)$. Hence,

$$(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}) = \left(n^{-1}(\varepsilon^* + R^*)'M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W(\varepsilon^* + R^*)\right),$$

where $\tilde{\Omega}$ is defined in the statement of Theorem 3. Next,

$$\left(n^{-1}(\varepsilon^* + R^*)'M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W(\varepsilon^* + R^*)\right) = \left(n^{-1}\varepsilon^{*\prime}M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W\varepsilon^*\right)$$
$$+ 2\left(n^{-1}R^{*\prime}M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W\varepsilon^*\right) + \left(n^{-1}R^{*\prime}M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W R^*\right)$$

Similarly to the proofs of Theorems 1 and 2, but using the fact that $\sup_{x \in \mathcal{X}} R^*(x) = o(1)$ instead of $\sup_{x \in \mathcal{X}} R(x) = O(m_n^{-\alpha})$,

$$\left(n^{-1}R^{*\prime}M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W\varepsilon^*\right) \le CR^{*\prime}\varepsilon^*/(n\sigma^2) = O_p(n^{-1/2})o_p(1) = o_p(1)$$

and

$$\left(n^{-1}R^{*\prime}M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W R^*\right) \le CR^{*\prime}R^*/(n\sigma^2) = o_p(1)$$

Thus,

$$\left(n^{-1}(\varepsilon^* + R^*)'M_W T\right)\tilde{\Omega}^{-1}\left(n^{-1}T'M_W(\varepsilon^* + R^*)\right) = (n^{-1}\varepsilon^{*\prime}M_W T)\tilde{\Omega}^{-1}(n^{-1}T'M_W\varepsilon^*) + o_p(1)$$

Next, given that, as shown in the proof of Theorem 1, $M_W T = T + o_p(1)$ and the eigenvalues of $\tilde{\Omega}$ are bounded above and below w.p.a. 1,

$$(n^{-1}\varepsilon^{*\prime}M_W T)\tilde{\Omega}^{-1}(n^{-1}T'M_W\varepsilon^*) = (n^{-1}\varepsilon^{*\prime}T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon^*) + o_p(1)$$

Next,

$$\left|(n^{-1}\varepsilon^{*\prime}T)(\tilde{\Omega}^{-1} - \Omega^{*-1})(n^{-1}T'\varepsilon^*)\right| \le \left|(n^{-1}\varepsilon^{*\prime}T)(\Omega^{*-1}(\tilde{\Omega} - \Omega^*)\Omega^{*-1}(\tilde{\Omega} - \Omega^*)\tilde{\Omega}^{-1})(n^{-1}T'\varepsilon^*)\right|$$
$$+ \left|(n^{-1}\varepsilon^{*\prime}T)(\Omega^{*-1}(\tilde{\Omega} - \Omega^*)\Omega^{*-1})(n^{-1}T'\varepsilon^*)\right| \le ||\Omega^{*-1}n^{-1}T'\varepsilon^*||^2(||\tilde{\Omega} - \Omega^*|| + C||\tilde{\Omega} - \Omega^*||^2) = o_p(1)$$

Thus, $(n^{-1}\varepsilon^{*\prime}T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon^*) = (n^{-1}\varepsilon^{*\prime}T)\Omega^{*-1}(n^{-1}T'\varepsilon^*) + o_p(1)$.

To complete the proof, note that $Var(T_i\varepsilon_i^*) \le \Omega^*$, because $\Omega^* = E[\varepsilon_i^{*2}T_iT_i']$. Then

$$E[(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])] \le E[(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])'Var(T_i\varepsilon_i^*)^{-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])]$$
$$= E\left[\mathrm{tr}\left(Var(T_i\varepsilon_i^*)^{-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])'\right)\right] = \mathrm{tr}(I_{r_n})/n = r_n/n \to 0$$

Thus,

$$\left|(n^{-1}\varepsilon^{*\prime}T)\Omega^{*-1}(n^{-1}T'\varepsilon^*) - E[\varepsilon_i^*T_i']\Omega^{*-1}E[T_i\varepsilon_i^*]\right|$$
$$\le \left|(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])\right| + 2\left|E[\varepsilon_i^*T_i']\Omega^{*-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])\right|$$
$$\le o_p(1) + 2\sqrt{E[\varepsilon_i^*T_i']\Omega^{*-1}E[T_i\varepsilon_i^*]}\sqrt{(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}T'\varepsilon^* - E[T_i\varepsilon_i^*])} = o_p(1) + 2\sqrt{\Delta}o_p(1) = o_p(1)$$

Combining the results above, $(n^{-1}\hat{\varepsilon}'P)\Omega_P^{-1}(n^{-1}P'\hat{\varepsilon}) \stackrel{p}{\to} \Delta$.
C.4 Proof of Theorem 4
Because series methods are used, it is still the case that $\hat{\varepsilon} = M_W Y$. Under the local alternative,

$$\hat{\varepsilon} = M_W Y = M_W(g_n + \varepsilon) = M_W(f^* + (r_n^{1/4}/n^{1/2})d + \varepsilon) = M_W\left(W\beta_1^* + (f^* - W\beta_1^*) + (r_n^{1/4}/n^{1/2})d + \varepsilon\right) = M_W\left(\varepsilon + R + (r_n^{1/4}/n^{1/2})d\right)$$

Thus, the test statistic becomes

$$\frac{\xi - r_n}{\sqrt{2r_n}} = \frac{\hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon} - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R + (r_n^{1/4}/n^{1/2})d)'M_W P(\hat{\sigma}^2 P'P)^{-1}P'M_W(\varepsilon + R + (r_n^{1/4}/n^{1/2})d) - r_n}{\sqrt{2r_n}}$$
$$= \frac{(\varepsilon + R + (r_n^{1/4}/n^{1/2})d)'M_W T(\hat{\sigma}^2 T'M_W T)^{-1}T'M_W(\varepsilon + R + (r_n^{1/4}/n^{1/2})d) - r_n}{\sqrt{2r_n}} = \frac{(\varepsilon + R + (r_n^{1/4}/n^{1/2})d)'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'(\varepsilon + R + (r_n^{1/4}/n^{1/2})d) - r_n}{\sqrt{2r_n}},$$

where $\tilde{T} = M_W T$. The remainder of the proof consists of several steps.

Step 1. It has been shown in the proof of Theorem 1 that $||\tilde{T}'\tilde{T}/n - T'T/n|| = o_p(1/\sqrt{r_n})$ as long as condition 4.1 holds, i.e., $\zeta(k_n)^2 k_n\sqrt{r_n}/n \to 0$. This also implies that the smallest and largest eigenvalues of $\tilde{T}'\tilde{T}/n$ converge to one.
Step 2. Decompose the test statistic and bound the remainder terms.

$$(\varepsilon + R + (r_n^{1/4}/n^{1/2})d)'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'(\varepsilon + R + (r_n^{1/4}/n^{1/2})d) = \varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + 2R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R$$
$$+ 2(r_n^{1/4}/n^{1/2})d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + 2(r_n^{1/4}/n^{1/2})d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R + (r_n^{1/2}/n)d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'d$$

As in the proof of Theorem 1, $R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R = O_p(nm_n^{-2\alpha})$ and $\left|R'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon\right| = O_p(n^{1/2}m_n^{-\alpha})$.

Recall that, because $\tilde{T}'\tilde{T}/n$ and $\tilde{T}\tilde{T}'/n$ have the same nonzero eigenvalues and all eigenvalues of $\tilde{T}'\tilde{T}/n$ converge to one, $\lambda_{max}(\tilde{T}\tilde{T}'/n)$ converges in probability to 1.

Thus, as for the fourth term,

$$\left|(r_n^{1/4}/n^{1/2})d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon\right| \le (r_n^{1/4}/n^{1/2})\left|C\lambda_{max}(\tilde{T}\tilde{T}'/n)d'\varepsilon\right| \le (r_n^{1/4}/n^{1/2})|Cd'\varepsilon| = (r_n^{1/4}/n^{1/2})O_p(n^{1/2}) = O_p(r_n^{1/4})$$

As for the fifth term,

$$\left|(r_n^{1/4}/n^{1/2})d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'R\right| \le (r_n^{1/4}/n^{1/2})\left|C\lambda_{max}(\tilde{T}\tilde{T}'/n)d'R\right| \le (r_n^{1/4}/n^{1/2})|Cd'R|$$

By the Cauchy-Schwarz inequality,

$$\left|(r_n^{1/4}/n^{1/2})d'R\right| \le (r_n^{1/4}n^{1/2})\sqrt{\left(\sum_i R_i^2/n\right)\left(\sum_i d_i^2/n\right)} = (r_n^{1/4}n^{1/2})\sqrt{O_p(m_n^{-2\alpha})E[d_i^2](1 + o_p(1))} = O_p(r_n^{1/4}m_n^{-\alpha}n^{1/2})$$

As for the last term,

$$(r_n^{1/2}/n)d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'d = (r_n^{1/2}/n)d'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'd + (r_n^{1/2}/n)d'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'P_W d - 2(r_n^{1/2}/n)d'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'd$$

Using Assumption 6,

$$\left|\left|(r_n^{1/2}/n)d'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'd\right|\right| = (r_n^{1/2}/n)\left|\left|d'W(W'W)^{-1}W'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'd\right|\right| = r_n^{1/2}\left|\left|(d'W/n)(W'W/n)^{-1}(W'T/n)(\hat{\sigma}^2\tilde{T}'\tilde{T}/n)^{-1}(T'd/n)\right|\right|$$
$$\le Cr_n^{1/2}\left|\left|(d'W/n)(W'T/n)(T'd/n)\right|\right| \le Cr_n^{1/2}||d'W/n||\,||W'T/n||\,||T'd/n|| = r_n^{1/2}O_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{r_n/n}) = O_p(\zeta(k_n)m_n^{1/2}k_n^{1/2}r_n/n^{3/2})$$

Next,

$$\left|\left|(r_n^{1/2}/n)d'P_W T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'P_W d\right|\right| = (r_n^{1/2}/n)\left|\left|d'W(W'W)^{-1}W'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'W(W'W)^{-1}W'd\right|\right|$$
$$= r_n^{1/2}\left|\left|(d'W/n)(W'W/n)^{-1}(W'T/n)(\hat{\sigma}^2\tilde{T}'\tilde{T}/n)^{-1}(T'W/n)(W'W/n)^{-1}(W'd/n)\right|\right| \le Cr_n^{1/2}\left|\left|(d'W/n)(W'T/n)(T'W/n)(W'd/n)\right|\right|$$
$$\le Cr_n^{1/2}||d'W/n||\,||W'T/n||\,||T'W/n||\,||W'd/n|| = r_n^{1/2}O_p(\sqrt{m_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\zeta(k_n)\sqrt{k_n/n})O_p(\sqrt{m_n/n}) = O_p(\zeta(k_n)^2 r_n^{1/2}m_n k_n/n^2)$$

Thus, under rate condition 4.3,

$$(r_n^{1/2}/n)d'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'d = (r_n^{1/2}/n)d'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'd + o_p(r_n^{1/2})$$

Next, similarly to the proof of Theorem 1,

$$\left|\frac{r_n^{1/2}(n^{-1}d'T)(\hat{\sigma}^2 n^{-1}\tilde{T}'\tilde{T})^{-1}(n^{-1}T'd)}{\sqrt{2r_n}} - \frac{r_n^{1/2}(n^{-1}d'T)(\hat{\sigma}^2 n^{-1}T'T)^{-1}(n^{-1}T'd)}{\sqrt{2r_n}}\right|$$
$$\le \frac{r_n^{1/2}||(\hat{\sigma}^2 n^{-1}T'T)^{-1}n^{-1}T'd||^2\left(||\hat{\sigma}^2 n^{-1}\tilde{T}'\tilde{T} - \hat{\sigma}^2 n^{-1}T'T|| + C||\hat{\sigma}^2 n^{-1}\tilde{T}'\tilde{T} - \hat{\sigma}^2 n^{-1}T'T||^2\right)}{\sqrt{2r_n}} = \frac{r_n^{1/2}O_p(r_n/n)o_p(1/\sqrt{r_n})}{\sqrt{2r_n}} = o_p(r_n^{1/2}/n) = o_p(1)$$

Next,

$$(r_n^{1/2}/n)d'T(T'T)^{-1}T'd/\hat{\sigma}^2 = (r_n^{1/2}/n)(d'T(T'T)^{-1}T')(T(T'T)^{-1}T'd)/\hat{\sigma}^2 = (r_n^{1/2}/n)\hat{d}'\hat{d}/\hat{\sigma}^2,$$

where $\hat{d} = T(T'T)^{-1}T'd$ are the fitted values from the nonparametric series regression of $d$ on $T$.
By Lemma 4, $(d - \hat{d})'(d - \hat{d}) = O_p(n(r_n/n + r_n^{-2\alpha}))$.

By the Cauchy-Schwarz inequality and Lemma 4,

$$\left|d'(d - \hat{d})\right| \le n\sqrt{\left(\sum_i d_i^2/n\right)\left(\sum_i(d_i - \hat{d}_i)^2/n\right)} = n\sqrt{E[d_i^2](1 + o_p(1))O_p(r_n/n + r_n^{-2\alpha})} = O_p(n(r_n/n + r_n^{-2\alpha})^{1/2})$$

Thus,

$$(r_n^{1/2}/n)d'T(\hat{\sigma}^2 T'T)^{-1}T'd = (r_n^{1/2}/n)d'd/\hat{\sigma}^2 + O_p(r_n^{1/2}(r_n/n + r_n^{-2\alpha})^{1/2}) = r_n^{1/2}E[d_i^2](1 + o_p(1))/\sigma^2 + O_p(r_n^{1/2}(r_n/n + r_n^{-2\alpha})^{1/2}) + o_p(r_n^{1/2}),$$

where the last equality is due to the law of large numbers and $\hat{\sigma}^2 = \sigma^2 + o_p(1)$.

Thus,

$$\hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon} = \varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon + r_n^{1/2}E[d_i^2]/\sigma^2 + O_p(nm_n^{-2\alpha}) + O_p(n^{1/2}m_n^{-\alpha})$$
$$+ O_p(r_n^{1/4}) + O_p(r_n^{1/4}m_n^{-\alpha}n^{1/2}) + O_p(r_n^{1/2}(r_n/n + r_n^{-2\alpha})^{1/2}) + o_p(r_n^{1/2})$$

Next, as has been shown in the proof of Theorem 1,

$$\varepsilon'\tilde{T}(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}\tilde{T}'\varepsilon = \varepsilon'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon + O_p(nm_n^{-2\alpha}) + O_p(n^{1/2}m_n^{-\alpha}) + O_p(\zeta(k_n)^2 m_n k_n/n) + O_p(\zeta(k_n)\sqrt{m_n k_n r_n/n})$$

Under rate conditions 4.3 and 4.4,

$$\hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon} = \varepsilon'T(\hat{\sigma}^2\tilde{T}'\tilde{T})^{-1}T'\varepsilon + r_n^{1/2}E[d_i^2]/\sigma^2 + o_p(r_n^{1/2}) \tag{C.7}$$

Step 3. Deal with the leading term.

The result of Lemma A.1 applies, so that

$$\frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon) - r_n}{\sqrt{2r_n}} \stackrel{d}{\to} N(0,1) \tag{C.8}$$
Next, use the following lemma:

Lemma A.5. Suppose that Assumptions 2, 3, and 8 hold. Then

$$\frac{n(n^{-1}\varepsilon'T)(\hat{\sigma}^2 n^{-1}\tilde{T}'\tilde{T})^{-1}(n^{-1}T'\varepsilon) - n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{r_n}} \stackrel{p}{\to} 0 \tag{C.9}$$

Proof of Lemma A.5. To prove the lemma, I will again need an auxiliary result given in the following lemma.

Lemma A.6. Let $\tilde{\Omega} = \hat{\sigma}^2\tilde{T}'\tilde{T}/n$, $\hat{\Omega} = \hat{\sigma}^2 T'T/n$, $\bar{\Omega} = \sigma^2 T'T/n$, $\Omega = \sigma^2 E[T_iT_i']$. Suppose that Assumptions 2(ii), 3, and 8 are satisfied. Then

$$||\tilde{\Omega} - \hat{\Omega}|| = O_p(\zeta(k_n)^2 k_n/n)$$
$$||\hat{\Omega} - \bar{\Omega}|| = O_p(r_n n^{-1/2})$$
$$||\bar{\Omega} - \Omega|| = O_p(\zeta(r_n)r_n^{1/2}/n^{1/2})$$

If Assumption 2(i) is also satisfied then $1/C \le \lambda_{min}(\Omega) \le \lambda_{max}(\Omega) \le C$, and if $\zeta(k_n)^2 k_n/n \to 0$, $\zeta(r_n)r_n^{1/2}/n^{1/2} \to 0$, and $r_n(m_n/n + m_n^{-2\alpha})^{1/2} \to 0$, then w.p.a. 1, $1/C \le \lambda_{min}(\tilde{\Omega}) \le \lambda_{max}(\tilde{\Omega}) \le C$ and $1/C \le \lambda_{min}(\hat{\Omega}) \le \lambda_{max}(\hat{\Omega}) \le C$.

Proof of Lemma A.6 is presented after the remainder of the main proof.

Given the result of Lemma A.6, as in the proof of Theorem 1,

$$\left|\frac{n(n^{-1}\varepsilon'T)\tilde{\Omega}^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}} - \frac{n(n^{-1}\varepsilon'T)\Omega^{-1}(n^{-1}T'\varepsilon)}{\sqrt{2r_n}}\right| \le \frac{n||\Omega^{-1}n^{-1}T'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2r_n}} = \frac{nO_p(r_n/n)o_p(1/\sqrt{r_n})}{\sqrt{2r_n}} = \frac{o_p(\sqrt{r_n})}{\sqrt{2r_n}} = o_p(1),$$

provided that $||\tilde{\Omega} - \Omega|| = o_p(1/\sqrt{r_n})$, which holds under rate conditions 4.1–4.2.

The result of Theorem 4 now follows from equations C.7, C.8, and C.9.

Proof of Lemma A.6. It has been shown in the proof of Theorem 1 that $||\tilde{T}'\tilde{T}/n - T'T/n|| = O_p(\zeta(k_n)^2 k_n/n)$. As long as $\hat{\sigma}^2 \stackrel{p}{\to} \sigma^2$, this implies $||\tilde{\Omega} - \hat{\Omega}|| = O_p(\zeta(k_n)^2 k_n/n)$.
Under the local alternative,

$$\hat{\varepsilon} = \varepsilon + (g_n - \hat{f}) = \varepsilon + (f_n^* - \hat{f}_n) + (r_n^{1/4}/n^{1/2})d$$

Thus,

$$\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon}/n = \varepsilon'\varepsilon/n + (f_n^* - \hat{f}_n)'(f_n^* - \hat{f}_n)/n + (r_n^{1/2}/n)d'd/n + 2(f_n^* - \hat{f}_n)'\varepsilon/n + 2(r_n^{1/4}/n^{1/2})(f_n^* - \hat{f}_n)'d/n + 2(r_n^{1/4}/n^{1/2})d'\varepsilon/n$$

First, by Chebyshev's inequality, $n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) = O_p(n^{-1/2})$.

Second, similarly to the proof of Lemma A.3,

$$n^{-1}\sum_i \varepsilon_i(f_{ni}^* - \hat{f}_{ni}) = O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2}),$$

and

$$n^{-1}\sum_i(f_{ni}^* - \hat{f}_{ni})^2 = O_p(m_n/n + m_n^{-2\alpha})$$

Next, by the law of large numbers,

$$(r_n^{1/2}/n)d'd/n = (r_n^{1/2}/n)E[d(X_i)^2](1 + o_p(1)) = O_p(r_n^{1/2}/n)$$

In turn,

$$\left|(r_n^{1/4}/n^{1/2})(f_n^* - \hat{f}_n)'d/n\right| = O_p(r_n^{1/4}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2})$$

Finally,

$$(r_n^{1/4}/n^{1/2})d'\varepsilon/n = (r_n^{1/4}/n^{1/2})O_p(n^{-1/2}) = O_p(r_n^{1/4}/n)$$

Combining the results,

$$\hat{\sigma}^2 - \sigma^2 = O_p(n^{-1/2}) + O_p(m_n^{1/2}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2}) + O_p(r_n^{1/2}/n) + O_p(r_n^{1/4}(m_n/n + m_n^{-2\alpha})^{1/2}/n^{1/2}) = O_p(n^{-1/2})$$

Moreover, by Lemma 1, $E[||T_i||^2] \le r_n$, which yields $||\hat{\Omega} - \bar{\Omega}|| = O_p(r_n n^{-1/2})$.

The remaining conclusions can be proved as in Lemma A.9.
C.5 Proof of Theorem 5
Recall that

$$\xi = \hat{\varepsilon}'P(\hat{\sigma}^2 P'P)^{-1}P'\hat{\varepsilon}$$

Define $\Omega = \sigma^2 E[P_iP_i'] = \sigma^2 I_{k_n}$, where $P_i = P^{k_n}(X_i)$. The proof of the theorem relies on verifying the conditions of the following lemmas.

Lemma A.7. (Donald et al. (2003), Lemma 6.2) If $E[P_i\varepsilon_i] = 0$, $E[(\varepsilon_i P_i'\Omega^{-1}P_i\varepsilon_i)^2]/(k_n\sqrt{n}) \to 0$, and $k_n \to \infty$, then

$$\frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon) - k_n}{\sqrt{2k_n}} \stackrel{d}{\to} N(0,1) \tag{C.10}$$

All three conditions of this lemma hold. $E[P_i\varepsilon_i] = 0$ and $k_n \to \infty$ hold trivially, while

$$E[(\varepsilon_i P_i'\Omega^{-1}P_i\varepsilon_i)^2] \le CE[\varepsilon_i^4||P_i||^4] \le CE[||P_i||^4] \le C\zeta(k_n)^2 k_n$$

As long as condition 5.4 holds, $\zeta(k_n)^2/\sqrt{n} \to 0$, so that the condition of the lemma holds.

Lemma A.8. Suppose that Assumptions 2, 3, 10, and 11 hold. Then

$$\frac{n(n^{-1}\hat{\varepsilon}'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat{\varepsilon}) - n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \stackrel{p}{\to} 0 \tag{C.11}$$

Proof. Step 1. Show that $\hat{\varepsilon}$ can be replaced with $\varepsilon$.

Because $\hat{\varepsilon} = Y - \hat{g} = \varepsilon + (g - \hat{g})$,

$$n(n^{-1}\hat{\varepsilon}'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat{\varepsilon}) - n(n^{-1}\varepsilon'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon) = (g - \hat{g})'P(P'P)^{-1}P'(g - \hat{g})/\hat{\sigma}^2 + 2(g - \hat{g})'P(P'P)^{-1}P'\varepsilon/\hat{\sigma}^2$$

As for the first term,

$$(g - \hat{g})'P(P'P)^{-1}P'(g - \hat{g})/\hat{\sigma}^2 \le (g - \hat{g})'(g - \hat{g})/\hat{\sigma}^2 = O_p(n\psi_n)$$

by the projection inequality, Assumption 10(b), and $\hat{\sigma}^2 \stackrel{p}{\to} \sigma^2$.
As for the second term,

$$\left|(g - \hat{g})'P(P'P)^{-1}P'\varepsilon/\hat{\sigma}^2\right| \le \left|\lambda_{max}\left((\hat{\sigma}^2 P'P/n)^{-1}\right)n^{-1}(g - \hat{g})'PP'\varepsilon\right| \le \left|C\lambda_{max}(PP'/n)(g - \hat{g})'\varepsilon\right|$$

Because $P'P/n$ and $PP'/n$ have the same nonzero eigenvalues and all eigenvalues of $P'P/n$ converge to one, $\lambda_{max}(PP'/n)$ converges in probability to 1. Thus,

$$\left|(g - \hat{g})'P(P'P)^{-1}P'\varepsilon/\hat{\sigma}^2\right| \le \left|C\lambda_{max}(PP'/n)(g - \hat{g})'\varepsilon\right| \le \left|C(g - \hat{g})'\varepsilon\right| = o_p(k_n^{1/2})$$

by Assumption 11. Then

$$n(n^{-1}\hat{\varepsilon}'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat{\varepsilon}) - n(n^{-1}\varepsilon'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon) = O_p(n\psi_n) + o_p(k_n^{1/2})$$

As long as rate condition 5.3 holds, $n\psi_n/k_n^{1/2} \to 0$, and this implies

$$\frac{n(n^{-1}\hat{\varepsilon}'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat{\varepsilon}) - n(n^{-1}\varepsilon'P)(\hat{\sigma}^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \stackrel{p}{\to} 0$$

Step 2. Show that $\tilde{\Omega} = \hat{\sigma}^2 P'P/n$ can be replaced with $\Omega$.

To prove this, I will use the auxiliary result presented in the following lemma.

Lemma A.9. Let $\tilde{\Omega} = \hat{\sigma}^2 P'P/n$, $\bar{\Omega} = \sigma^2 P'P/n$, $\Omega = \sigma^2 E[P_iP_i']$. Suppose that Assumptions 2(ii), 3, 10, and 11 are satisfied. Then

$$||\tilde{\Omega} - \bar{\Omega}|| = O_p(k_n/n^{1/2}) + O_p(k_n\psi_n) + o_p(k_n^{3/2}/n)$$
$$||\bar{\Omega} - \Omega|| = O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})$$

If Assumption 2(i) is also satisfied then $1/C \le \lambda_{min}(\Omega) \le \lambda_{max}(\Omega) \le C$, and if $\psi_n^{1/2}k_n + \zeta(k_n)k_n^{1/2}/n^{1/2} \to 0$, then w.p.a. 1, $1/C \le \lambda_{min}(\tilde{\Omega}) \le \lambda_{max}(\tilde{\Omega}) \le C$ and $1/C \le \lambda_{min}(\bar{\Omega}) \le \lambda_{max}(\bar{\Omega}) \le C$.

Proof of Lemma A.9 is presented after the remainder of the main proof.
Using the result of Lemma A.9,

$$\left|\frac{n(n^{-1}\varepsilon'P)\tilde{\Omega}^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}} - \frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right| = \left|\frac{n(n^{-1}\varepsilon'P)(\tilde{\Omega}^{-1} - \Omega^{-1})(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right|$$
$$= \left|\frac{n(n^{-1}\varepsilon'P)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1}(\tilde{\Omega} - \Omega)\tilde{\Omega}^{-1})(n^{-1}P'\varepsilon)}{\sqrt{2k_n}} - \frac{n(n^{-1}\varepsilon'P)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1})(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right|$$
$$\le \frac{\left|n(n^{-1}\varepsilon'P)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1}(\tilde{\Omega} - \Omega)\tilde{\Omega}^{-1})(n^{-1}P'\varepsilon)\right|}{\sqrt{2k_n}} + \frac{\left|n(n^{-1}\varepsilon'P)(\Omega^{-1}(\tilde{\Omega} - \Omega)\Omega^{-1})(n^{-1}P'\varepsilon)\right|}{\sqrt{2k_n}}$$
$$\le \frac{n||\Omega^{-1}n^{-1}P'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2k_n}}$$

Similarly to the proof of Theorem 1, $||\Omega^{-1}(n^{-1}P'\varepsilon)|| = O_p(\sqrt{k_n/n})$. Then

$$\frac{n||\Omega^{-1}n^{-1}P'\varepsilon||^2(||\tilde{\Omega} - \Omega|| + C||\tilde{\Omega} - \Omega||^2)}{\sqrt{2k_n}} = \frac{nO_p(k_n/n)o_p(1/\sqrt{k_n})}{\sqrt{2k_n}} = \frac{o_p(\sqrt{k_n})}{\sqrt{2k_n}} = o_p(1),$$

provided that $||\tilde{\Omega} - \Omega|| = o_p(1/\sqrt{k_n})$, which holds under rate conditions 5.1 and 5.2.

The result of Theorem 5 follows directly from combining the results in Equations C.10 and C.11.
Proof of Lemma A.9. Due to homoskedasticity,

$$||\tilde{\Omega} - \bar{\Omega}|| = ||(\hat{\sigma}^2 - \sigma^2)\sum_i P_iP_i'/n|| \le |\hat{\sigma}^2 - \sigma^2|\sum_i ||P_i||^2/n$$
$$= \left|n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) + 2n^{-1}\sum_i \varepsilon_i(g_i - \hat{g}_i) + n^{-1}\sum_i(g_i - \hat{g}_i)^2\right|\sum_i ||P_i||^2/n$$

First, by Chebyshev's inequality, $n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) = O_p(n^{-1/2})$.

Second, by Assumption 10, $n^{-1}\sum_i(g_i - \hat{g}_i)^2 = O_p(\psi_n)$.

Finally, by Assumption 11, $n^{-1}\sum_i \varepsilon_i(g_i - \hat{g}_i) = o_p(k_n^{1/2}/n)$.

Moreover, by Lemma 1, $E[||P_i||^2] \le k_n$, which yields

$$||\tilde{\Omega} - \bar{\Omega}|| = O_p(k_n/n^{1/2}) + O_p(k_n\psi_n) + o_p(k_n^{3/2}/n)$$

Next, $||\bar{\Omega} - \Omega|| = ||\sigma^2(P'P/n - E[P_iP_i'])||$. Thus,

$$E[||\bar{\Omega} - \Omega||^2] = E[||\sigma^2\sum_i(P_iP_i' - E[P_iP_i'])/n||^2] = \sigma^4 E[||P_iP_i' - E[P_iP_i']||^2]/n \le CE[||P_i||^4]/n = C\zeta(k_n)^2 k_n/n,$$

and by Markov's inequality $||\bar{\Omega} - \Omega|| = O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})$.

The remaining conclusions follow from Lemma A.6 in Donald et al. (2003).
Remark A.2 (Discussion of Assumptions 11 and 14). Assumptions 11 and 14 can be justified as follows. Suppose that $\hat{g}_i$ are leave-one-out kernel estimates of $g_i$ (i.e., ones that do not use observation $i$ to estimate $g_i$). Then a typical form of $\hat{g}_i$ is

$$\hat{g}_i = \frac{\sum_{j \ne i} Y_j K_{ij}}{\sum_{j \ne i} K_{ij}} = \frac{\sum_{j \ne i}(g_j + \varepsilon_j)K_{ij}}{\sum_{j \ne i} K_{ij}} = \frac{\sum_{j \ne i} g_j K_{ij}}{\sum_{j \ne i} K_{ij}} + \frac{\sum_{j \ne i} \varepsilon_j K_{ij}}{\sum_{j \ne i} K_{ij}},$$

where $K_{ij} = K\left(\frac{A_i - A_j}{h}\right)$ is a $d$-dimensional kernel function, and $A_i$ is chosen appropriately (e.g., $A_i = X_i$ in usual nonparametric models or $A_i = X_i'\hat{\alpha}$ in single index models).
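The leave-one-out estimator above can be sketched compactly. This is a minimal illustration with a Gaussian kernel and a one-dimensional hypothetical design; the bandwidth value is arbitrary, not a recommendation.

```python
# Minimal sketch of leave-one-out Nadaraya-Watson estimates g_hat_i
# (hypothetical data; Gaussian kernel, arbitrary bandwidth h).
import numpy as np

rng = np.random.default_rng(5)
n, h = 200, 0.2
a = rng.uniform(-1, 1, size=n)
y = np.sin(np.pi * a) + 0.2 * rng.normal(size=n)

# K_ij = K((A_i - A_j)/h); zero the diagonal so observation i is left out
K = np.exp(-0.5 * ((a[:, None] - a[None, :]) / h) ** 2)
np.fill_diagonal(K, 0.0)
g_hat = (K @ y) / K.sum(axis=1)   # sum_{j!=i} Y_j K_ij / sum_{j!=i} K_ij

mse = np.mean((g_hat - np.sin(np.pi * a)) ** 2)
print(mse)
```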
Then

$$\sum_{i=1}^n \varepsilon_i(\hat{g}_i - g_i) = \frac{\frac{1}{nh^d}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i\varepsilon_j K_{ij}}{\frac{1}{nh^d}\sum_{j \ne i}K_{ij}} + \frac{\frac{1}{nh^d}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i(g_j - g_i)K_{ij}}{\frac{1}{nh^d}\sum_{j \ne i}K_{ij}}$$

Under appropriate regularity conditions (possibly including trimming to deal with the small denominator problem), $\frac{1}{nh^d}\sum_{j \ne i}K_{ij} \equiv \hat{p}_i = p_i(1 + o_p(1))$, where $p_i$ is the density at observation $i$. If $p_i$ is bounded away from zero, then the denominator will be asymptotically bounded. Thus, it is enough to deal with the numerator.

Let $I_{n1} = \frac{1}{nh^d}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i\varepsilon_j K_{ij}$ and $I_{n2} = \frac{1}{nh^d}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i(g_j - g_i)K_{ij}$.
First, consider $I_{n1}$.

$$E[I_{n1}^2] = E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\sum_{k=1}^n\sum_{l \ne k}\varepsilon_i\varepsilon_j\varepsilon_k\varepsilon_l K_{ij}K_{kl}\right]$$

This expectation is nonzero only when $i = k$ and $j = l$ or when $i = l$ and $j = k$, so it reduces to

$$E[I_{n1}^2] = 2E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i^2\varepsilon_j^2 K_{ij}^2\right] = 2\frac{1}{h^{2d}}E[\varepsilon_1^2\varepsilon_2^2 K_{12}^2] = 2\frac{1}{h^{2d}}\sigma^4 E[K_{12}^2]$$

Next, for some appropriate $A_i$ (e.g., $A_i = X_i$ in usual nonparametric models or $A_i = X_i'\hat{\alpha}$ in single index models),

$$\frac{1}{h^{2d}}E[K_{12}^2] = \frac{1}{h^{2d}}E\left[K\left(\frac{A_2 - A_1}{h}\right)^2\right] = \frac{1}{h^{2d}}\int\int K\left(\frac{a_2 - a_1}{h}\right)^2 f(a_1)f(a_2)da_1da_2$$
$$= \frac{1}{h^d}\int\int K(u)^2 f(a_1)f(a_1 + hu)du\,da_1 = \frac{1}{h^d}\int\int K(u)^2 f(a_1)^2(1 + o(1))du\,da_1 = \frac{1}{h^d}\left(\int K(u)^2 du\right)E[f(A_i)](1 + o(1)) = O(h^{-d})$$

Hence, $E[I_{n1}^2] = O(h^{-d})$, and consequently $I_{n1} = O_p(h^{-d/2})$.
Next, consider $I_{n2}$.

$$E[I_{n2}^2] = E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\sum_{k=1}^n\sum_{l \ne k}\varepsilon_i\varepsilon_k(g_j - g_i)(g_l - g_k)K_{ij}K_{kl}\right]$$

This expectation is nonzero only when $i = k$, so it reduces to

$$E[I_{n2}^2] = E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\sum_{l \ne i}\varepsilon_i^2(g_j - g_i)(g_l - g_i)K_{ij}K_{il}\right]$$
$$= E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\sum_{l \ne i,j}\varepsilon_i^2(g_j - g_i)(g_l - g_i)K_{ij}K_{il}\right] + E\left[\frac{1}{n^2h^{2d}}\sum_{i=1}^n\sum_{j \ne i}\varepsilon_i^2(g_j - g_i)^2 K_{ij}^2\right]$$
$$= \frac{n}{h^{2d}}E[\varepsilon_1^2(g_2 - g_1)(g_3 - g_1)K_{12}K_{13}] + \frac{1}{h^{2d}}E[\varepsilon_1^2(g_2 - g_1)^2 K_{12}^2]$$
$$= \frac{n}{h^{2d}}\sigma^2 E[(g_2 - g_1)(g_3 - g_1)K_{12}K_{13}] + \frac{1}{h^{2d}}\sigma^2 E[(g_2 - g_1)^2 K_{12}^2]$$
As for the first term,

$$\frac{1}{h^{2d}}E[(g_2 - g_1)(g_3 - g_1)K_{12}K_{13}] = \frac{1}{h^{2d}}E\left[(g(A_2) - g(A_1))(g(A_3) - g(A_1))K\left(\frac{A_2 - A_1}{h}\right)K\left(\frac{A_3 - A_1}{h}\right)\right]$$
$$= \frac{1}{h^{2d}}\int\int\int(g(a_2) - g(a_1))(g(a_3) - g(a_1))K\left(\frac{a_2 - a_1}{h}\right)K\left(\frac{a_3 - a_1}{h}\right)f(a_1)f(a_2)f(a_3)da_1da_2da_3$$
$$= \int\int\int(g(a_1 + hu) - g(a_1))(g(a_1 + hv) - g(a_1))K(u)K(v)f(a_1)f(a_1 + hu)f(a_1 + hv)da_1du\,dv$$
$$= \int\int\int(g'(a_1)hu + g''(a_1)h^2u^2 + o(h^2))(g'(a_1)hv + g''(a_1)h^2v^2 + o(h^2))K(u)K(v)f(a_1)(f(a_1) + f'(a_1)hu + o(h))(f(a_1) + f'(a_1)hv + o(h))da_1du\,dv = O(h^4)$$

if a second-order kernel is used. If a higher-order kernel of order $\nu > 2$ is used, this becomes $O(h^{2\nu})$.

Hence, $\frac{n}{h^{2d}}\sigma^2 E[(g_2 - g_1)(g_3 - g_1)K_{12}K_{13}] = O(nh^4)$.

As for the second term,

$$\frac{1}{h^{2d}}E[(g_2 - g_1)^2 K_{12}^2] = \frac{1}{h^{2d}}E\left[(g(A_2) - g(A_1))^2 K\left(\frac{A_2 - A_1}{h}\right)^2\right] = \frac{1}{h^{2d}}\int\int(g(a_2) - g(a_1))^2 K\left(\frac{a_2 - a_1}{h}\right)^2 f(a_1)f(a_2)da_1da_2$$
$$= \frac{1}{h^d}\int\int(g(a_1 + hu) - g(a_1))^2 K(u)^2 f(a_1)f(a_1 + hu)da_1du = \frac{1}{h^d}\int\int g'(a_1)^2 h^2u^2 K(u)^2 f(a_1)^2(1 + o(1))da_1du$$
$$= \frac{h^2}{h^d}\left(\int u^2K(u)^2 du\right)E[g'(A_i)^2 f(A_i)](1 + o(1)) = O(h^{2-d})$$

Hence, $\frac{1}{h^{2d}}\sigma^2 E[(g_2 - g_1)^2 K_{12}^2] = O(h^{2-d})$.

Thus, $E[I_{n2}^2] = O(nh^4 + h^{2-d})$, and consequently $I_{n2} = O_p(n^{1/2}h^2) + O_p(h^{1-d/2})$.

Combining the results above,

$$\sum_{i=1}^n \varepsilon_i(\hat{g}_i - g_i) = O_p(n^{1/2}h^2) + O_p(h^{-d/2})$$

In order for Assumptions 11 and 14 to hold, the bandwidth $h$ and the number of series terms $k_n$ need to satisfy

$$n^{1/2}h^2/k_n^{1/2} \to 0 \quad \text{and} \quad 1/(k_n^{1/2}h^{d/2}) \to 0$$
C.6 Proof of Theorem 6
The conditions of Lemma A.7 do not rely on the homoskedasticity assumption. Thus, the result of the lemma remains valid even under heteroskedasticity, as long as $\zeta(k_n)^2/\sqrt{n} \to 0$:

$$\frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon) - k_n}{\sqrt{2k_n}} \stackrel{d}{\to} N(0,1), \tag{C.12}$$

where now $\Omega = E[\varepsilon_i^2 P_iP_i']$.

Lemma A.10. Suppose that Assumptions 2, 3, 10, and 11 hold. Then

$$\frac{n(n^{-1}\hat{\varepsilon}'P)(n^{-1}P'\hat{\Sigma}P)^{-1}(n^{-1}P'\hat{\varepsilon}) - n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \stackrel{p}{\to} 0 \tag{C.13}$$

Proof.

Lemma A.11. Let $\tilde{\Omega} = \sum_i \hat{\varepsilon}_i^2 P_iP_i'/n$, $\bar{\Omega} = \sum_i \varepsilon_i^2 P_iP_i'/n$, $\breve{\Omega} = \sum_i \sigma_i^2 P_iP_i'/n$, $\Omega = E[\varepsilon_i^2 P_iP_i']$, where $\sigma_i^2 = E[\varepsilon_i^2|X_i]$. Suppose that Assumptions 2(ii), 3, 10, and 11 are satisfied. Then

$$||\tilde{\Omega} - \bar{\Omega}|| = O_p(\zeta(k_n)^2\psi_n) + o_p(\zeta(k_n)^2 k_n^{1/2}/n)$$
$$||\bar{\Omega} - \breve{\Omega}|| = O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})$$
$$||\breve{\Omega} - \Omega|| = O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})$$

If Assumption 2(i) is also satisfied then $1/C \le \lambda_{min}(\Omega) \le \lambda_{max}(\Omega) \le C$, and if $\zeta(k_n)^2\psi_n^{1/2} \to 0$ and $\zeta(k_n)k_n^{1/2}/n^{1/2} \to 0$, then w.p.a. 1, $1/C \le \lambda_{min}(\tilde{\Omega}) \le \lambda_{max}(\tilde{\Omega}) \le C$ and $1/C \le \lambda_{min}(\bar{\Omega}) \le \lambda_{max}(\bar{\Omega}) \le C$.
Lemma A.10 can now be proven in two steps.
Step 1. Show that $\hat\varepsilon$ can be replaced with $\varepsilon$.
Because $\hat\varepsilon = \varepsilon + (g - \hat g)$,
\[
\Big| n(n^{-1}\hat\varepsilon'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'\varepsilon) \Big|
\]
\[
= \Big| n(n^{-1}(g-\hat g)'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'(g-\hat g)) + 2n(n^{-1}(g-\hat g)'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'\varepsilon) \Big|
\]
\[
\le \Big| Cn(n^{-1}(g-\hat g)'P)(n^{-1}P'(g-\hat g)) \Big| + 2\Big| Cn(n^{-1}(g-\hat g)'P)(n^{-1}P'\varepsilon) \Big|,
\]
because the eigenvalues of $n^{-1}P'\hat\Sigma P$ are bounded below and above by Lemma A.11. As was discussed in the proof of Theorem 5, the eigenvalues of $PP'/n$ are bounded above. Thus,
\[
\Big| Cn(n^{-1}(g-\hat g)'P)(n^{-1}P'(g-\hat g)) \Big| + 2\Big| Cn(n^{-1}(g-\hat g)'P)(n^{-1}P'\varepsilon) \Big|
\]
\[
= \Big| C(g-\hat g)'(PP'/n)(g-\hat g) \Big| + 2\Big| C(g-\hat g)'(PP'/n)\varepsilon \Big|
\le \Big| C(g-\hat g)'(g-\hat g) \Big| + 2\Big| C(g-\hat g)'\varepsilon \Big| = O_p(n\psi_n) + o_p(k_n^{1/2})
\]
As long as $n\psi_n/k_n^{1/2} \to 0$,
\[
\frac{n(n^{-1}\hat\varepsilon'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)(n^{-1}P'\hat\Sigma P)^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \xrightarrow{p} 0
\]
Step 2. Show that $\hat\Omega = P'\hat\Sigma P/n$ can be replaced with $\Omega$.
Similarly to the proof of Theorem 5,
\[
\left|\frac{n(n^{-1}\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}} - \frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right|
= \left|\frac{n(n^{-1}\varepsilon'P)(\hat\Omega^{-1}-\Omega^{-1})(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right|
\le \frac{n\|\Omega^{-1}n^{-1}P'\varepsilon\|^2\big(\|\hat\Omega-\Omega\|+C\|\hat\Omega-\Omega\|^2\big)}{\sqrt{2k_n}}
\]
Similarly to the proof of Theorem 1, $\|\Omega^{-1}(n^{-1}P'\varepsilon)\| = O_p(\sqrt{k_n/n})$. Then
\[
\frac{n\|\Omega^{-1}n^{-1}P'\varepsilon\|^2\big(\|\hat\Omega-\Omega\|+C\|\hat\Omega-\Omega\|^2\big)}{\sqrt{2k_n}}
= \frac{nO_p(k_n/n)\,o_p(1/\sqrt{k_n})}{\sqrt{2k_n}} = \frac{o_p(\sqrt{k_n})}{\sqrt{2k_n}} = o_p(1),
\]
provided that $\|\hat\Omega - \Omega\| = o_p(1/\sqrt{k_n})$, which holds under rate conditions 5.5 and 5.6.
The result of Theorem 6 follows directly from combining the results in Equations C.12
and C.13.
Proof of Lemma A.11. First,
\[
\|\hat\Omega - \tilde\Omega\| = \Big\|\sum_i P_iP_i'(\hat\varepsilon_i^2 - \varepsilon_i^2)/n\Big\|
= \Big\|\sum_i P_iP_i'\big((\varepsilon_i + g_i - \hat g_i)^2 - \varepsilon_i^2\big)/n\Big\|
\]
\[
= \Big\|\sum_i P_iP_i'\big((g_i - \hat g_i)^2 + 2\varepsilon_i(g_i - \hat g_i)\big)/n\Big\|
\le \sup_i \|P_i\|^2\,\Big|\sum_i \big((g_i - \hat g_i)^2 + 2\varepsilon_i(g_i - \hat g_i)\big)/n\Big|
\]
\[
= \zeta(k_n)^2\big(O_p(\psi_n) + o_p(k_n^{1/2}/n)\big) = O_p(\zeta(k_n)^2\psi_n) + o_p(\zeta(k_n)^2 k_n^{1/2}/n)
\]
The following two results can be obtained exactly as in Lemma A.6 in Donald et al. (2003):
\[
\|\tilde\Omega - \bar\Omega\| = \Big\|\sum_i P_iP_i'(\varepsilon_i^2 - \sigma_i^2)/n\Big\| = O_p(\zeta(k_n)\sqrt{k_n/n})
\]
\[
\|\bar\Omega - \Omega\| = \Big\|\sum_i \sigma_i^2 P_iP_i'/n - \Omega\Big\| = O_p(\zeta(k_n)\sqrt{k_n/n})
\]
Finally, the results about the eigenvalues can also be obtained in the same way as in
Lemma A.6 in Donald et al. (2003).
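The middle bound of Lemma A.11, comparing the matrix built from the true errors, $\sum_i \varepsilon_i^2 P_iP_i'/n$, with the one built from conditional variances, $\sum_i \sigma_i^2 P_iP_i'/n$, can be watched at work in a toy heteroskedastic design. The polynomial basis and the conditional variance $\sigma^2(x) = 1 + x^2$ below are purely my illustrative choices, not a design from the paper:

```python
import numpy as np

# Toy check that ||Omega_tilde - Omega_bar|| shrinks with n, where
# Omega_tilde = sum_i eps_i^2 P_i P_i' / n uses realized errors and
# Omega_bar = sum_i sigma_i^2 P_i P_i' / n uses conditional variances.
rng = np.random.default_rng(1)

def omega_gap(n, k_n=4):
    """One draw of ||Omega_tilde - Omega_bar|| in the toy design."""
    x = rng.uniform(-1, 1, n)
    P = np.vander(x, k_n, increasing=True)   # basis 1, x, x^2, x^3
    sigma2 = 1 + x**2                        # heteroskedastic variance
    eps = rng.standard_normal(n) * np.sqrt(sigma2)
    O_tilde = (P * (eps**2)[:, None]).T @ P / n
    O_bar = (P * sigma2[:, None]).T @ P / n
    return np.linalg.norm(O_tilde - O_bar)

gap_small = np.mean([omega_gap(200) for _ in range(20)])
gap_large = np.mean([omega_gap(20000) for _ in range(20)])
print(gap_small, gap_large)  # the average gap shrinks roughly like n^{-1/2}
```

With $k_n$ fixed, the gap is an average of mean-zero terms, so the $n^{-1/2}$ decay is exactly what the lemma's $O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})$ bound predicts.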
C.7 Proof of Theorem 7
Note that
\[
\frac{\sqrt{k_n}}{n}\cdot\frac{n(n^{-1}\hat\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\hat\varepsilon) - k_n}{\sqrt{2k_n}}
= \frac{1}{\sqrt{2}}(n^{-1}\hat\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\hat\varepsilon) + T_1,
\]
where $T_1 = -\frac{k_n}{n\sqrt{2}} \to 0$.
Hence, it suffices to show that $(n^{-1}\hat\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\hat\varepsilon) \xrightarrow{p} \Delta$.
First, note that $Var(P_i\varepsilon_i^*) \le \Omega^*$, because $\Omega^* = E[\varepsilon_i^{*2}P_iP_i']$. Then
\[
E\big[(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])\big]
\le E\big[(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])'\,Var(P_i\varepsilon_i^*)^{-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])\big]
\]
\[
= E\big[\mathrm{tr}\big(Var(P_i\varepsilon_i^*)^{-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])'\big)\big] = \mathrm{tr}(I_{k_n})/n = k_n/n \to 0
\]
Thus,
\[
\big|(n^{-1}\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\varepsilon^*) - E[\varepsilon_i^*P_i']\Omega^{*-1}E[P_i\varepsilon_i^*]\big|
\]
\[
\le \big|(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])\big|
+ 2\big|E[\varepsilon_i^*P_i']\Omega^{*-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])\big|
\]
\[
\le o_p(1) + 2\sqrt{E[\varepsilon_i^*P_i']\Omega^{*-1}E[P_i\varepsilon_i^*]}\sqrt{(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])'\Omega^{*-1}(n^{-1}P'\varepsilon^* - E[P_i\varepsilon_i^*])}
= o_p(1) + 2\sqrt{\Delta}\,o_p(1) = o_p(1)
\]
Hence, $(n^{-1}\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\varepsilon^*) = \Delta + o_p(1)$.
Next,
\[
\big|(n^{-1}\hat\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\hat\varepsilon^*) - (n^{-1}\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\varepsilon^*)\big|
\]
\[
\le \big|(n^{-1}P'\hat\varepsilon^* - n^{-1}P'\varepsilon^*)'\Omega^{*-1}(n^{-1}P'\hat\varepsilon^* - n^{-1}P'\varepsilon^*)\big|
+ 2\big|(n^{-1}\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\hat\varepsilon^* - n^{-1}P'\varepsilon^*)\big|
\]
By the assumption of the theorem, $\sup_{x\in\mathcal X}|f(x,\hat\theta,\hat h) - f(x,\theta^*,h^*)| \to 0$, so that
\[
n^{-1}P'\hat\varepsilon^* - n^{-1}P'\varepsilon^* = n^{-1}\sum_i P_i\big(f(X_i,\hat\theta,\hat h) - f(X_i,\theta^*,h^*)\big) = o_p(1)
\]
Thus,
\[
\big|(n^{-1}\hat\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\hat\varepsilon^*) - (n^{-1}\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\varepsilon^*)\big| = o_p(1),
\]
and $(n^{-1}\hat\varepsilon^{*\prime}P)\Omega^{*-1}(n^{-1}P'\hat\varepsilon^*) = \Delta + o_p(1)$.
Finally,
\[
\big|(n^{-1}\hat\varepsilon^{*\prime}P)(\hat\Omega^{-1} - \Omega^{*-1})(n^{-1}P'\hat\varepsilon^*)\big|
\le \big|(n^{-1}\hat\varepsilon^{*\prime}P)\big(\Omega^{*-1}(\hat\Omega-\Omega^*)\Omega^{*-1}(\hat\Omega-\Omega^*)\Omega^{*-1}\big)(n^{-1}P'\hat\varepsilon^*)\big|
\]
\[
+ \big|(n^{-1}\hat\varepsilon^{*\prime}P)\big(\Omega^{*-1}(\hat\Omega-\Omega^*)\Omega^{*-1}\big)(n^{-1}P'\hat\varepsilon^*)\big|
\le \|\Omega^{*-1}n^{-1}P'\hat\varepsilon^*\|^2\big(\|\hat\Omega-\Omega^*\| + C\|\hat\Omega-\Omega^*\|^2\big) \to 0
\]
It follows from the results above and from the triangle inequality that $(n^{-1}\hat\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\hat\varepsilon) \xrightarrow{p} \Delta$.
C.8 Proof of Theorem 8
The proof of this theorem follows along the same lines as the proof of Theorem 5.
Recall that
\[
\xi = \hat\varepsilon'P(\hat\sigma^2 P'P)^{-1}P'\hat\varepsilon
\]
Define $\Omega = \sigma^2 E[P_iP_i'] = \sigma^2 I_{k_n}$, where $P_i = P^{k_n}(X_i)$. First, the conditions of Lemma A.7 hold, so that
\[
\frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon) - k_n}{\sqrt{2k_n}} \xrightarrow{d} N(0,1) \quad \text{(C.14)}
\]
Next, I prove the following result.
Lemma A.12. Suppose that Assumptions 2, 3, 12, 13, and 14 hold. Then
\[
\frac{n(n^{-1}\hat\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \xrightarrow{p} E[d_i^2]/\sigma^2 \quad \text{(C.15)}
\]
Proof. Step 1. Show that $\hat\varepsilon$ can be replaced with $\varepsilon$.
Under the local alternative,
\[
\hat\varepsilon = \varepsilon + (g_n - \hat f_n) = \varepsilon + (f_n^* - \hat f_n) + (k_n^{1/4}/n^{1/2})d
\]
Thus,
\[
n(n^{-1}\hat\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon)
\]
\[
= (f_n^*-\hat f_n)'P(P'P)^{-1}P'(f_n^*-\hat f_n)/\hat\sigma^2 + (k_n^{1/2}/n)\,d'P(P'P)^{-1}P'd/\hat\sigma^2
+ 2(f_n^*-\hat f_n)'P(P'P)^{-1}P'\varepsilon/\hat\sigma^2
\]
\[
+ 2(k_n^{1/4}/n^{1/2})\,d'P(P'P)^{-1}P'\varepsilon/\hat\sigma^2 + 2(k_n^{1/4}/n^{1/2})(f_n^*-\hat f_n)'P(P'P)^{-1}P'd/\hat\sigma^2
\]
As for the first term,
\[
(f_n^*-\hat f_n)'P(P'P)^{-1}P'(f_n^*-\hat f_n)/\hat\sigma^2 \le (f_n^*-\hat f_n)'(f_n^*-\hat f_n)/\hat\sigma^2 = O_p(n\psi_n)
\]
by the projection inequality, Assumption 10(b), and $\hat\sigma^2 \xrightarrow{p} \sigma^2$.
As for the second term, note that
\[
(k_n^{1/2}/n)\,d'P(P'P)^{-1}P'd/\hat\sigma^2 = (k_n^{1/2}/n)\big(d'P(P'P)^{-1}P'\big)\big(P(P'P)^{-1}P'd\big)/\hat\sigma^2 = (k_n^{1/2}/n)\,\hat d'\hat d/\hat\sigma^2,
\]
where $\hat d = P(P'P)^{-1}P'd$ are the fitted values from the nonparametric series regression of $d$ on $P$. Next,
\[
\big|\hat d'\hat d - d'd\big| = \big|(\hat d - d)'(\hat d - d) + 2d'(\hat d - d)\big| \le \big|(\hat d - d)'(\hat d - d)\big| + 2\big|d'(\hat d - d)\big|
\]
By Lemma 5, $(\hat d - d)'(\hat d - d) = O_p\big(n(k_n/n + k_n^{-2\alpha})\big)$.
By the Cauchy-Schwarz inequality and Lemma 5,
\[
\big|d'(\hat d - d)\big| \le n\sqrt{\Big(\sum_i d_i^2/n\Big)\Big(\sum_i(\hat d_i - d_i)^2/n\Big)}
= n\sqrt{E[d_i^2](1+o_p(1))\,O_p(k_n/n + k_n^{-2\alpha})} = O_p\big(n(k_n/n + k_n^{-2\alpha})^{1/2}\big)
\]
Thus,
\[
(k_n^{1/2}/n)\,d'P(P'P)^{-1}P'd/\hat\sigma^2 = (k_n^{1/2}/n)\,d'd/\hat\sigma^2 + O_p\big(k_n^{1/2}(k_n/n + k_n^{-2\alpha})^{1/2}\big)
\]
\[
= k_n^{1/2}E[d_i^2](1+o_p(1))/\sigma^2 + O_p\big(k_n^{1/2}(k_n/n + k_n^{-2\alpha})^{1/2}\big),
\]
where the last equality is due to the law of large numbers and $\hat\sigma^2 = \sigma^2 + o_p(1)$.
As for the third term,
\[
\big|(f_n^*-\hat f_n)'P(P'P)^{-1}P'\varepsilon/\hat\sigma^2\big|
\le \big|\lambda_{\max}\big((\hat\sigma^2P'P/n)^{-1}\big)\,n^{-1}(f_n^*-\hat f_n)'PP'\varepsilon\big|
\le \big|C\lambda_{\max}(PP'/n)(f_n^*-\hat f_n)'\varepsilon\big|
\]
Because $P'P/n$ and $PP'/n$ have the same nonzero eigenvalues and all eigenvalues of $P'P/n$ converge to one, $\lambda_{\max}(PP'/n)$ converges in probability to 1. Thus,
\[
\big|(f_n^*-\hat f_n)'P(P'P)^{-1}P'\varepsilon/\hat\sigma^2\big| \le \big|C\lambda_{\max}(PP'/n)(f_n^*-\hat f_n)'\varepsilon\big| \le \big|C(f_n^*-\hat f_n)'\varepsilon\big|
\]
By Assumption 14, $\big|\sum_i \varepsilon_i(f_{ni}^* - \hat f_{ni})\big| = o_p(k_n^{1/2})$.
As for the fourth term,
\[
(k_n^{1/4}/n^{1/2})\big|d'P(P'P)^{-1}P'\varepsilon/\hat\sigma^2\big|
\le (k_n^{1/4}/n^{1/2})\big|\lambda_{\max}\big((\hat\sigma^2P'P/n)^{-1}\big)n^{-1}d'PP'\varepsilon\big|
\]
\[
\le (k_n^{1/4}/n^{1/2})\big|C\lambda_{\max}(PP'/n)\,d'\varepsilon\big| \le (k_n^{1/4}/n^{1/2})\big|Cd'\varepsilon\big|
= (k_n^{1/4}/n^{1/2})O_p(n^{1/2}) = O_p(k_n^{1/4})
\]
As for the last term,
\[
\big|(k_n^{1/4}/n^{1/2})(f_n^*-\hat f_n)'P(P'P)^{-1}P'd/\hat\sigma^2\big|
\le (k_n^{1/4}/n^{1/2})\big|\lambda_{\max}\big((\hat\sigma^2P'P/n)^{-1}\big)n^{-1}(f_n^*-\hat f_n)'PP'd\big|
\]
\[
\le (k_n^{1/4}/n^{1/2})\big|C\lambda_{\max}(PP'/n)(f_n^*-\hat f_n)'d\big| \le (k_n^{1/4}/n^{1/2})\big|C(f_n^*-\hat f_n)'d\big|
\]
By the Cauchy-Schwarz inequality,
\[
\big|(k_n^{1/4}/n^{1/2})(f_n^*-\hat f_n)'d\big| \le (k_n^{1/4}n^{1/2})\sqrt{\Big(\sum_i(f_{ni}^*-\hat f_{ni})^2/n\Big)\Big(\sum_i d_i^2/n\Big)}
\]
\[
= (k_n^{1/4}n^{1/2})\sqrt{O_p(\psi_n)E[d_i^2](1+o_p(1))} = O_p(k_n^{1/4}\psi_n^{1/2}n^{1/2})
\]
Then
\[
n(n^{-1}\hat\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon)
\]
\[
= k_n^{1/2}E[d_i^2]/\sigma^2 + O_p\big(k_n^{1/2}(k_n/n + k_n^{-2\alpha})^{1/2}\big) + O_p(k_n^{1/4}) + O_p(k_n^{1/4}\psi_n^{1/2}n^{1/2}) + o_p(k_n^{1/2})
\]
As long as rate conditions 5.3 and 5.4 hold, this implies that
\[
\frac{n(n^{-1}\hat\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\hat\varepsilon) - n(n^{-1}\varepsilon'P)(\hat\sigma^2 n^{-1}P'P)^{-1}(n^{-1}P'\varepsilon)}{\sqrt{k_n}} \xrightarrow{p} E[d_i^2]/\sigma^2
\]
Step 2. Show that $\hat\Omega = \hat\sigma^2 P'P/n$ can be replaced with $\Omega$.
To prove this, I will use the auxiliary result presented in the following lemma.
Lemma A.13. Let $\hat\Omega = \hat\sigma^2 P'P/n$, $\tilde\Omega = \sigma^2 P'P/n$, and $\Omega = \sigma^2 E[P_iP_i']$. Suppose that Assumptions 2(ii), 3, 13, and 14 are satisfied. Then
\[
\|\hat\Omega - \tilde\Omega\| = O_p(k_n n^{-1/2}) + O_p(k_n^{3/2}/n) + O_p(k_n^{5/4}\psi_n^{1/2}/n^{1/2}) + o_p(k_n^{3/2}/n)
\]
\[
\|\tilde\Omega - \Omega\| = O_p(\zeta(k_n)k_n^{1/2}/n^{1/2})
\]
If Assumption 2(i) is also satisfied, then $1/C \le \lambda_{\min}(\Omega) \le \lambda_{\max}(\Omega) \le C$, and if $\psi_n^{1/2}k_n + \zeta(k_n)k_n^{1/2}/n^{1/2} \to 0$, then w.p.a.1, $1/C \le \lambda_{\min}(\hat\Omega) \le \lambda_{\max}(\hat\Omega) \le C$ and $1/C \le \lambda_{\min}(\tilde\Omega) \le \lambda_{\max}(\tilde\Omega) \le C$.
The proof of Lemma A.13 is presented after the remainder of the main proof.
Using the result of Lemma A.13, similarly to the proof of Theorem 5,
\[
\left|\frac{n(n^{-1}\varepsilon'P)\hat\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}} - \frac{n(n^{-1}\varepsilon'P)\Omega^{-1}(n^{-1}P'\varepsilon)}{\sqrt{2k_n}}\right|
\le \frac{n\|\Omega^{-1}n^{-1}P'\varepsilon\|^2\big(\|\hat\Omega-\Omega\|+C\|\hat\Omega-\Omega\|^2\big)}{\sqrt{2k_n}}
\]
\[
= \frac{nO_p(k_n/n)\,o_p(1/\sqrt{k_n})}{\sqrt{2k_n}} = \frac{o_p(\sqrt{k_n})}{\sqrt{2k_n}} = o_p(1),
\]
provided that $\|\hat\Omega - \Omega\| = o_p(1/\sqrt{k_n})$, which holds under rate conditions 5.1 and 5.2.
The result of Theorem 8 follows directly from combining the results in Equations C.14
and C.15.
Proof of Lemma A.13. Under the local alternative,
\[
\hat\varepsilon = \varepsilon + (g - \hat f_n) = \varepsilon + (f_n^* - \hat f_n) + (k_n^{1/4}/n^{1/2})d
\]
Thus,
\[
\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/n = \varepsilon'\varepsilon/n + (f_n^*-\hat f_n)'(f_n^*-\hat f_n)/n + (k_n^{1/2}/n)\,d'd/n
\]
\[
+ 2(f_n^*-\hat f_n)'\varepsilon/n + 2(k_n^{1/4}/n^{1/2})(f_n^*-\hat f_n)'d/n + 2(k_n^{1/4}/n^{1/2})\,d'\varepsilon/n
\]
First, by Chebyshev's inequality, $n^{-1}\sum_i(\varepsilon_i^2 - \sigma^2) = O_p(n^{-1/2})$.
Second, by Assumptions 13 and 14, $n^{-1}\sum_i \varepsilon_i(f_{ni}^* - \hat f_{ni}) = o_p(k_n^{1/2}/n)$ and $n^{-1}\sum_i(f_{ni}^* - \hat f_{ni})^2 = O_p(\psi_n)$.
Next, by the law of large numbers, $(k_n^{1/2}/n)\,d'd/n = (k_n^{1/2}/n)E[d(X_i)^2](1+o_p(1)) = O_p(k_n^{1/2}/n)$.
In turn, $\big|(k_n^{1/4}/n^{1/2})(f_n^* - \hat f_n)'d/n\big| = O_p(k_n^{1/4}\psi_n^{1/2}/n^{1/2})$.
Finally,
\[
(k_n^{1/4}/n^{1/2})\,d'\varepsilon/n = (k_n^{1/4}/n^{1/2})O_p(n^{-1/2}) = O_p(k_n^{1/4}/n)
\]
Combining the results,
\[
\hat\sigma^2 - \sigma^2 = O_p(n^{-1/2}) + O_p(k_n^{1/2}/n) + O_p(k_n^{1/4}\psi_n^{1/2}/n^{1/2}) + o_p(k_n^{1/2}/n)
\]
By Lemma 1, $E[\|P_i\|^2] \le k_n$, which yields
\[
\|\hat\Omega - \tilde\Omega\| = O_p(k_n n^{-1/2}) + O_p(k_n^{3/2}/n) + O_p(k_n^{5/4}\psi_n^{1/2}/n^{1/2}) + o_p(k_n^{3/2}/n)
\]
The remaining conclusions can be proved as in Lemma A.9.
References
Ait-Sahalia, Y., P. J. Bickel, and T. M. Stoker (2001): “Goodness-of-fit tests for ker-
nel regression with an application to option implied volatilities,” Journal of Econometrics,
105, 363 – 412.
Bierens, H. J. (1982): “Consistent model specification tests,” Journal of Econometrics, 20,
105 – 134.
——— (1990): “A Consistent Conditional Moment Test of Functional Form,” Econometrica,
58, 1443–1458.
Bravo, F. (2012): “Generalized empirical likelihood testing in semiparametric conditional
moment restrictions models,” The Econometrics Journal, 15, 1–31.
Breusch, T. S. and A. R. Pagan (1980): “The Lagrange Multiplier Test and its Ap-
plications to Model Specification in Econometrics,” The Review of Economic Studies, 47,
239–253.
Chen, X. and Y. Fan (1999): “Consistent hypothesis testing in semiparametric and non-
parametric models for econometric time series,” Journal of Econometrics, 91, 373 – 401.
de Jong, R. M. and H. J. Bierens (1994): “On the Limit Behavior of a Chi-Square Type
Test If the Number of Conditional Moments Tested Approaches Infinity,” Econometric
Theory, 10, 70–90.
Delgado, M. A. and W. G. Manteiga (2001): “Significance Testing in Nonparametric
Regression Based on the Bootstrap,” The Annals of Statistics, 29, 1469–1507.
Donald, S. G., G. W. Imbens, and W. K. Newey (2003): “Empirical likelihood estima-
tion and consistent tests with conditional moment restrictions,” Journal of Econometrics,
117, 55 – 93.
Engle, R. F. (1982): “A general approach to Lagrange multiplier model diagnostics,” Jour-
nal of Econometrics, 20, 83 – 104.
——— (1984): “Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics,” in Handbook of Econometrics, Elsevier, vol. 2, 775 – 826.
Fan, Y. and Q. Li (1996): “Consistent Model Specification Tests: Omitted Variables and
Semiparametric Functional Forms,” Econometrica, 64, 865–890.
Finan, F., E. Sadoulet, and A. de Janvry (2005): “Measuring the poverty reduction
potential of land in rural Mexico,” Journal of Development Economics, 77, 27 – 51.
Gao, J., H. Tong, and R. Wolff (2002): “Model Specification Tests in Nonparametric
Stochastic Regression Models,” Journal of Multivariate Analysis, 83, 324 – 359.
Gong, X., A. van Soest, and P. Zhang (2005): “The effects of the gender of children
on expenditure patterns in rural China: a semiparametric analysis,” Journal of Applied
Econometrics, 20, 509–527.
Gozalo, P. L. (1993): “A Consistent Model Specification Test for Nonparametric Estima-
tion of Regression Function Models,” Econometric Theory, 9, 451–477.
Hall, A. (1990): “Lagrange Multiplier Tests for Normality against Seminonparametric Al-
ternatives,” Journal of Business & Economic Statistics, 8, 417–426.
Hausman, J. A. and W. K. Newey (1995): “Nonparametric Estimation of Exact Con-
sumers Surplus and Deadweight Loss,” Econometrica, 63, 1445–1476.
Hong, Y. and H. White (1995): “Consistent Specification Testing Via Nonparametric
Series Regression,” Econometrica, 63, 1133–1159.
Koenker, R. and J. A. Machado (1999): “GMM inference when the number of moment
conditions is large,” Journal of Econometrics, 93, 327 – 344.
Lavergne, P. and Q. Vuong (2000): “Nonparametric Significance Testing,” Econometric
Theory, 16, 576–601.
Levinsohn, J. and A. Petrin (2003): “Estimating Production Functions Using Inputs to
Control for Unobservables,” The Review of Economic Studies, 70, 317–341.
Li, Q., C. Hsiao, and J. Zinn (2003): “Consistent specification tests for semiparamet-
ric/nonparametric models based on series estimation methods,” Journal of Econometrics,
112, 295 – 325.
Li, Q. and J. S. Racine (2007): Nonparametric econometrics: theory and practice, Prince-
ton University Press.
Martins, M. F. O. (2001): “Parametric and semiparametric estimation of sample selection
models: an empirical application to the female labour force in Portugal,” Journal of Applied
Econometrics, 16, 23–39.
McCulloch, J. H. and E. R. Percy (2013): “Extended Neyman smooth goodness-of-fit
tests, applied to competing heavy-tailed distributions,” Journal of Econometrics, 172,
275 – 282.
Newey, W. K. (1985): “Maximum Likelihood Specification Testing and Conditional Mo-
ment Tests,” Econometrica, 53, 1047–1070.
——— (1997): “Convergence rates and asymptotic normality for series estimators,” Journal
of Econometrics, 79, 147 – 168.
Olley, G. S. and A. Pakes (1996): “The Dynamics of Productivity in the Telecommuni-
cations Equipment Industry,” Econometrica, 64, 1263–1297.
Reiss, P. C. and F. A. Wolak (2007): “Structural Econometric Modeling: Rationales and
Examples from Industrial Organization,” Handbook of Econometrics, Volume 6A, 4277 –
4415.
Robinson, P. M. (1988): “Root-N-Consistent Semiparametric Regression,” Econometrica,
56, 931–954.
Schmalensee, R. and T. M. Stoker (1999): “Household Gasoline Demand in the United
States,” Econometrica, 67, 645–662.
Sperlich, S. (2014): “On the choice of regularization parameters in specification testing: a
critical discussion,” Empirical Economics, 47, 427–450.
Sun, Y. and Q. Li (2006): “An alternative series based consistent model specification test,”
Economics Letters, 93, 37 – 44.
Trefethen, L. N. and D. Bau III (1997): Numerical Linear Algebra, vol. 50, SIAM.
Whang, Y.-J. and D. W. Andrews (1993): “Tests of specification for parametric and
semiparametric models,” Journal of Econometrics, 57, 277 – 318.
Yatchew, A. (1997): “An elementary estimator of the partial linear model,” Economics
Letters, 57, 135 – 143.
——— (1998): “Nonparametric Regression Techniques in Economics,” Journal of Economic
Literature, 36, 669–721.
Yatchew, A. and J. A. No (2001): “Household Gasoline Demand in Canada,” Economet-
rica, 69, 1697–1709.
Yatchew, A. J. (1992): “Nonparametric Regression Tests Based on Least Squares,” Econo-
metric Theory, 8, 435–451.