Optimal Sup-norm Rate, Adaptive Estimation, and Inference on NPIV
Xiaohong Chen (Yale) and Tim Christensen (NYU)
Cemmap Celebration Conference | Andrew’s Birthday Conference, November 14-16, 2014
Introduction (1)
▶ we consider nonparametric instrumental variables (NPIV) regression:

  Y_i = h_0(X_i) + u_i
  E[u_i | X_i] ≠ 0
  E[u_i | W_i] = 0

▶ endogeneity is an important issue in economics
▶ being nonparametric in h_0 avoids functional form misspecification
▶ h_0 is identified via the conditional moment restriction

  E[Y_i | W_i] = E[h_0(X_i) | W_i]

▶ this "smoothes out" features of h_0, making h_0 difficult to recover
▶ NPIV is an ill-posed inverse problem with unknown operator
Introduction (2)
▶ there is a large and growing literature on NPIV:
  1. identification/consistency: Newey & Powell (03); Carrasco, Florens & Renault (07); Andrews (11); ...
  2. convergence rates in L2 norm: Hall & Horowitz (05); Blundell, Chen & Kristensen (BCK, 07); Chen & Reiß (11); Darolles, Fan, Florens & Renault (11); ...
  3. almost rate-adaptive estimation in L2: Horowitz (14)
  4. almost rate-adaptive estimation of linear functionals: Breunig & Johannes (13)
  5. inference on linear functionals of h_0: Ai & Chen (AC, 03, 07); Carrasco, Florens & Renault (07); Horowitz & Lee (13)
  6. inference on nonlinear functionals of h_0: Chen & Pouzo (14)
  7. testing: Horowitz (12); Canay, Santos & Shaikh (13); Breunig (13); ...
  8. partial identification: Santos (12); Freyberger & Horowitz (13); ...
▶ all the existing published results on NPIV are based on the L2 norm
Contributions of this paper
1. we derive the upper bound on sup-norm convergence rates for general sieve NPIV estimators
2. we derive minimax lower bounds in sup-norm loss over Hölder classes of functions for NPIR (nonparametric indirect regression) and NPIV
3. we show that spline and wavelet sieve NPIV estimators attain the sup-norm minimax lower bounds, and hence attain the optimal sup-norm convergence rates
4. we introduce a data-driven procedure for choosing the dimension of the sieve NPIV that is sup-norm rate-adaptive
5. we provide inference theory for plug-in sieve NPIV estimators of nonlinear functionals of h_0 under mild conditions

▶ an application: inference on exact consumer surplus in nonparametric demand estimation when both price and income are endogenous
Parametric vs nonparametric IV
▶ parametric IV model:

  Y_i = X_i′β_0 + u_i
  E[u_i X_i] ≠ 0
  E[u_i W_i] = 0

▶ identified if rank(E[X_i W_i′]) = dim(β_0)
▶ nonparametric IV model:

  Y_i = h_0(X_i) + u_i
  E[u_i | X_i] ≠ 0
  E[u_i | W_i] = 0

▶ identified if h ↦ E[h(X_i) | W_i = ·] is injective
Parametric vs sieve nonparametric IV
▶ a parametric IV model can be estimated via 2SLS:

  β̂ = [X′W(W′W)⁻¹W′X]⁻¹ X′W(W′W)⁻¹W′Y

▶ NP (03), AC (03), BCK (07): a nonparametric IV model can be estimated via sieve NPIV, i.e., 2SLS on basis functions (sketched in code below):

  ĥ(x) = ψ^J(x)′ĉ
  ĉ = [Ψ′B(B′B)⁻¹B′Ψ]⁻¹ Ψ′B(B′B)⁻¹B′Y
  ψ^J(x) = (ψ_{J1}(x), ..., ψ_{JJ}(x))′,  Ψ = (ψ^J(X_1), ..., ψ^J(X_n))′
  b^K(w) = (b_{K1}(w), ..., b_{KK}(w))′,  B = (b^K(W_1), ..., b^K(W_n))′

▶ K ≥ J, with J = sieve dimension for the endogenous regressors (the key smoothing parameter), K = sieve dimension for the instruments
▶ Horowitz (11): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L2([0, 1]^d)
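To fix ideas, here is a minimal Python sketch of the closed-form sieve NPIV estimator above, for scalar X and W supported on [0, 1]. The uniform-knot cubic B-spline helper `bspline_basis` is our own illustrative choice, not the slides' nested-knot construction; this is a sketch, not the authors' implementation.

```python
# Minimal sketch of sieve NPIV: 2SLS on basis functions (scalar X, W on [0, 1]).
# The uniform-knot clamped B-spline basis is an illustrative choice; requires
# scipy >= 1.8 for BSpline.design_matrix.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, num_funcs, degree=3):
    """n x num_funcs matrix of clamped B-spline basis functions on [0, 1]."""
    interior = np.linspace(0.0, 1.0, num_funcs - degree + 1)  # num_funcs >= degree + 1
    knots = np.r_[np.zeros(degree), interior, np.ones(degree)]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return BSpline.design_matrix(x, knots, degree).toarray()

def sieve_npiv(Y, X, W, J, K):
    """2SLS of Y on psi^J(X) using b^K(W) as instruments (K >= J)."""
    Psi = bspline_basis(X, J)                        # n x J matrix Psi
    B = bspline_basis(W, K)                          # n x K matrix B
    PB = B @ np.linalg.pinv(B.T @ B) @ B.T           # projection onto instrument space
    c_hat = np.linalg.solve(Psi.T @ PB @ Psi, Psi.T @ PB @ Y)
    return lambda x: bspline_basis(x, J) @ c_hat     # h_hat(x) = psi^J(x)' c_hat
```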
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Preliminaries: measuring ill-posedness
▶ let Π_K : L2(W) → B_K denote the orthogonal projection onto the sieve space B_K = clsp{b_{K1}, ..., b_{KK}}
▶ weak norm ‖h‖_{w,2} = ‖Π_K Th‖_{L2(W)}, where Th(W_i) = E[h(X_i) | W_i]
▶ BCK (07): sieve measure of ill-posedness (sample analogue sketched below):

  s⁻¹_{JK} = sup_{h ∈ Ψ_J : ‖h‖_{w,2} ≠ 0} ‖h‖_{L2(X)} / ‖h‖_{w,2} = 1 / s_min(G_ψ^{−1/2} S′ G_b^{−1/2})

  where G_b = G_{b,K} = E[b^K(W_i) b^K(W_i)′], G_ψ = G_{ψ,J} = E[ψ^J(X_i) ψ^J(X_i)′], and S′ = S′_{JK} = E[ψ^J(X_i) b^K(W_i)′]

▶ the NPIV model is said to be
  ▶ mildly ill-posed if s⁻¹_{JK} = O(J^{ς/d}) for some ς > 0
  ▶ severely ill-posed if s⁻¹_{JK} = O(exp(½ J^{ς/d})) for some ς > 0
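The sample analogue ŝ_{JK} (it reappears in the Lepski procedure later) is the smallest singular value of the orthonormalized cross-moment matrix; a short sketch, where the eigendecomposition-based inverse square root is our implementation choice:

```python
# Sketch: sample analogue of the sieve measure of ill-posedness,
# 1 / s_min(G_psi^{-1/2} S' G_b^{-1/2}), from basis matrices Psi (n x J)
# and B (n x K), J <= K.
import numpy as np

def inv_sqrt(G):
    vals, vecs = np.linalg.eigh(G)                 # G symmetric positive definite
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def ill_posedness_hat(Psi, B):
    n = Psi.shape[0]
    G_psi, G_b = Psi.T @ Psi / n, B.T @ B / n
    S_t = Psi.T @ B / n                            # S' = E_hat[psi^J b^K'], J x K
    M = inv_sqrt(G_psi) @ S_t @ inv_sqrt(G_b)
    return 1.0 / np.linalg.svd(M, compute_uv=False).min()  # large => badly ill-posed
```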
Preliminaries: roughness properties of the sieve
▶ following Newey (97), we define ζ(J) = ζ_b(K) ∨ ζ_ψ(J), where

  ζ_b(K) := sup_w ‖G_b^{−1/2} b^K(w)‖_{ℓ2}
  ζ_ψ(J) := sup_x ‖G_ψ^{−1/2} ψ^J(x)‖_{ℓ2}

▶ we also introduce

  ξ_ψ(J) := sup_x ‖ψ^J(x)‖_{ℓ1}

  which is better suited to studying sup-norm rates
▶ the sup-norm variance term depends on ξ_ψ(J), e_J = λ_min(G_{ψ,J}), and s⁻¹_{JK}
Assumptions imposed for sup-norm rate
1. (i) {(X_i, Y_i, W_i)}_{i=1}^n is an i.i.d. sample; (ii) X has compact support 𝒳 ⊂ R^d with nonempty interior; W has support 𝒲 ⊂ R^{d_w}; (iii) sup_x |h_0(x)| < ∞; (iv) h ↦ E[h(X) | W = ·] is injective on L^∞(𝒳)
2. (i) sup_w E[u_i² | W_i = w] ≤ σ̄²; (ii) E[|u_i|^{2+δ}] < ∞ for some δ > 0
3. (i) λ_min(G_{b,K}) > 0; e_J = λ_min(G_{ψ,J}) > 0; J ≤ K;
   (ii) s⁻¹_{JK} ζ(J) √((J log J)/n) = o(1);
   (iii) ζ_b(K)^{(2+δ)/δ} √((log J)/n) = o(1)
4. there exists π_J h_0 ∈ Ψ_J such that: (i) ‖h_0 − π_J h_0‖_∞ ≤ C* J^{−p/d};
   (ii) s⁻¹_{JK} ‖h_0 − π_J h_0‖_{w,2} ≤ C*_2 ‖h_0 − π_J h_0‖_{L2(X)};
   (iii) ‖Q_J(h_0 − π_J h_0)‖_∞ ≤ C*_∞ ‖h_0 − π_J h_0‖_∞,
   with Q_J : L2(X) → Ψ_J the oblique projection Q_J h(x) = ψ^J(x)′ [S′G_b⁻¹S]⁻¹ S′G_b⁻¹ E[b^K(W_i) h(X_i)]
Upper bound (1)
Theorem (Upper bound for NPIV)
Let Assumptions 1–4 hold. Then:

  ‖ĥ − h_0‖_∞ = O_p( J^{−p/d} + s⁻¹_{JK} ξ_ψ(J) √((log J)/(n e_J)) ).

▶ for Cohen-Daubechies-Vial (CDV) wavelets and B-splines, we show that [ξ_ψ(J)]²/e_J = O(J), hence

  ‖ĥ − h_0‖_∞ = O_p( J^{−p/d} + s⁻¹_{JK} √((J log J)/n) ).
Upper bound (2)
Corollary
Let 𝒳 = [0, 1]^d, let the density f of X satisfy 0 < inf_x f(x) ≤ sup_x f(x) < ∞, and let Ψ_J be spanned by a CDV wavelet basis or B-spline basis of sufficient regularity. Then:

Mildly ill-posed case: choosing J ≍ K ≍ (n/log n)^{d/(2(p+ς)+d)} yields

  ‖h_0 − ĥ‖_∞ = O_p( (n/log n)^{−p/(2(p+ς)+d)} ).

Severely ill-posed case: choosing J = c′_0 (log n)^{d/ς} for any c′_0 ∈ (0, 1) and K = c_0 J for some finite c_0 ≥ 1 yields

  ‖h_0 − ĥ‖_∞ = O_p( (log n)^{−p/ς} ).
Optimality (1)
▶ Chen and Reiß (11) showed that the L2(X) rates
  ▶ ‖ĥ − h_0‖_{L2(X)} = O_p(n^{−p/(2(p+ς)+d)}) in the mildly ill-posed case
  ▶ ‖ĥ − h_0‖_{L2(X)} = O_p((log n)^{−p/ς}) in the severely ill-posed case
  are optimal in an L2 minimax sense
▶ sup norm ≥ L2 norm
▶ therefore our sup-norm rate is optimal in the severely ill-posed case
▶ what about the mildly ill-posed case?
▶ we now derive the minimax lower bound in sup-norm loss, i.e. the rate r_n over a parameter space 𝓗 such that

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈𝓗} P_h( ‖h − ĥ_n‖_∞ ≥ c r_n ) ≥ c′ > 0,

  for constants c, c′.
Optimality (2)
▶ trick: rewrite the NPIV model in terms of a nonparametric indirect regression (NPIR) model:

  Y_i = E[h_0(X_i) | W_i] + ε_i
  E[ε_i | W_i] = 0
  ε_i ∼ N(0, σ_0(W_i)²)

  where E[· | W_i] is known and σ_0(·)² ≥ σ²_0 > 0
▶ NPIV:

  Y_i = h_0(X_i) + ( E[h_0(X_i) | W_i] − h_0(X_i) + ε_i ),  with the bracketed term =: u_i,

  where by construction E[u_i | W_i] = 0
▶ NPIR is more informative than NPIV
▶ implication: lower bound for NPIV ≥ lower bound for NPIR
Lower bound for NPIR
Assumption (S)
(i) h_0 ∈ B^p_{∞,∞}([0, 1]^d); (ii) there is a ς > 0 such that

  ‖Th‖_{L2(W)} ≲ ‖h‖_{B^{−ς}_{2,2}}

for all h ∈ B(p, L) := {h ∈ B^p_{∞,∞}([0, 1]^d) : ‖h‖_{B^p_{∞,∞}} ≤ L}.

Theorem (Lower bound for NPIR)
Let Assumption S hold for the NPIR model with a random sample {(Y_i, W_i)}_{i=1}^n. Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈B(p,L)} P_h( ‖h − ĥ_n‖_∞ ≥ c (n/log n)^{−p/(2(p+ς)+d)} ) ≥ c′ > 0,

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c′ depend only on p, L, d, ς, σ_0.
Lower bound for NPIV
Corollary (Lower bound for NPIV)
Let Assumption S hold for the NPIV model with a random sample {(X_i, Y_i, W_i)}_{i=1}^n and inf_w E[u² | W = w] ≥ σ²_0. Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈B(p,L)} P_h( ‖h − ĥ_n‖_∞ ≥ c (n/log n)^{−p/(2(p+ς)+d)} ) ≥ c′ > 0,

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c′ depend only on p, L, d, ς.
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Adaptive estimation for NPIV
▶ must choose J optimally to attain optimal rates
▶ optimal choice depends on the unknown p and s⁻¹_{JK}
▶ want a data-driven method for choosing J optimally
▶ existing methods focus on L2 loss, minimizing a MSE-type criterion
  ▶ Horowitz (14): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L2([0, 1]^d); optimal in L2 up to a log n factor
  ▶ Liu & Tao (14): Mallows C_p model selection of sieve NPIV assuming homoskedastic errors
▶ CV/AIC/BIC/Mallows criteria aren't well suited to sup-norm rates
▶ we introduce a sup-norm adaptive Lepski-type procedure
Lepski-type procedure
▶ set K = K(J) ≍ J deterministically (e.g. K = c_0 J + a)
▶ choose J by the following method. Define the sets:

  𝒥_0 = { j ∈ [J_min, Ĵ_max] : j^{−p/d} ≤ C_0 V_sup(j) }
  𝒥̂ = { j ∈ [J_min, Ĵ_max] : ‖ĥ_j − ĥ_l‖_∞ ≤ √2 σ̄ [V̂_sup(j) + V̂_sup(l)]  ∀ l ∈ (j, Ĵ_max] }

  where

  V_sup(j) = s⁻¹_{jK(j)} ξ_ψ(j) √((log n)/(n e_j))
  V̂_sup(j) = ŝ⁻¹_{jK(j)} ξ_ψ(j) √((log n)/(n ê_j))
  ŝ_{JK(J)} = s_min( (Ψ′Ψ)^{−1/2}(Ψ′B)(B′B)^{−1/2} ),  ê_J = λ_min(Ψ′Ψ/n)

▶ J_0 = min_{j∈𝒥_0} j is optimal but infeasible
▶ Ĵ = min_{j∈𝒥̂} j is our data-driven choice of J (see the code sketch below)
▶ ĥ_Ĵ denotes the sieve NPIV estimator with J = Ĵ, K = K(Ĵ)
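A hedged sketch of the selection rule for Ĵ, assuming the estimates ĥ_j (evaluated on a fine grid) and V̂_sup(j) have already been computed from the formulas above; the container names `h_hat_on_grid` and `V_hat` are ours:

```python
# Lepski-type choice of J: h_hat_on_grid[j] is the fit \hat h_j on a grid,
# V_hat[j] is \hat V_sup(j); both dicts assumed pre-computed. sigma_bar is
# the error-variance bound (sigma-bar in the slides).
import numpy as np

def lepski_J(candidates, h_hat_on_grid, V_hat, sigma_bar):
    """Smallest j whose fit agrees with all larger l within the threshold."""
    admissible = []
    for i, j in enumerate(candidates):                # candidates sorted ascending
        ok = all(
            np.max(np.abs(h_hat_on_grid[j] - h_hat_on_grid[l]))
            <= np.sqrt(2) * sigma_bar * (V_hat[j] + V_hat[l])
            for l in candidates[i + 1:]
        )
        if ok:
            admissible.append(j)
    return min(admissible) if admissible else max(candidates)  # fallback if empty
```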
Heuristic argument
▶ suppose 𝒥_0 ⊆ 𝒥̂. Then:

  Ĵ := min 𝒥̂ ≤ J_0 := min 𝒥_0

  and:

  ‖ĥ_Ĵ − h_0‖_∞ ≤ ‖ĥ_{J_0} − h_0‖_∞ + ‖ĥ_Ĵ − ĥ_{J_0}‖_∞
               ≤ ‖ĥ_{J_0} − h_0‖_∞ + 2√2 σ̄ ŝ⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n ê_{J_0}))
               ≲ ‖ĥ_{J_0} − h_0‖_∞ + s⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n e_{J_0}))  wpa1 (with probability approaching one)

  ⇒ ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{−p/d} + s⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n e_{J_0})) )

▶ implication: Ĵ is rate-adaptive to the oracle J_0
Choosing Jmax
▶ still need to choose Ĵ_max
▶ data-driven estimator of J_max (sketched below):

  Ĵ_max = min{ J > J_min : ŝ⁻¹_{JK(J)} ζ(J) √((J L(J) log n)/n) ≥ 1 }

  where L(J) = a log(log(J)) for some constant a > 0.
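A sketch of the Ĵ_max stopping rule, treating ŝ⁻¹ and ζ(·) as supplied callables (computing ζ(J) requires a sup over a grid of x and w values, which we leave to the caller):

```python
# Data-driven J_max: the first J past J_min at which the variance proxy
# crosses 1. s_hat_inv(J) and zeta(J) are assumed callables built from the
# sample analogues above; candidates sorted ascending and >= 3 so that
# log(log(J)) > 0.
import numpy as np

def J_max_hat(candidates, s_hat_inv, zeta, n, a=0.1):
    for J in candidates:
        L = a * np.log(np.log(J))                     # L(J) = a log(log J)
        if s_hat_inv(J) * zeta(J) * np.sqrt(J * L * np.log(n) / n) >= 1:
            return J
    return candidates[-1]                             # no crossing: take largest
```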
Oracle property
▶ now consider the special case with a CDV wavelet or B-spline sieve, rectangular support, and well-behaved density

Theorem (Adaptivity)
Let Assumptions 1–4 hold and s⁻¹_{J̄_max K(J̄_max)} √((J̄_max² log n)/n) = o(1). Then: J̲_max ≤ Ĵ_max ≤ J̄_max wpa1; and

  𝒥_0 ⊆ 𝒥̂  wpa1

and so:

  ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{−p/d} + s⁻¹_{J_0 K(J_0)} √((J_0 log n)/n) ).

▶ implication: sup-norm rate adaptive in the mildly and severely ill-posed cases; no loss of a log n factor
▶ automatically implies L2(X)-norm rate adaptivity in the severely ill-posed case, and almost adaptivity in the mildly ill-posed case (up to a log n factor)
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
MC design
▶ Newey and Powell (03) design, but with compact support (generator sketched below): generate

  (U_i, V*_i, W*_i)′ ∼ N(0, Σ),  Σ with rows (1, 0.5, 0), (0.5, 1, 0), (0, 0, 1),

  and set X_i = Φ((W*_i + V*_i)/√2) and W_i = Φ(W*_i)
▶ linear design: h_0(x) = 4x − 2
▶ nonlinear design: h_0(x) = log(|6x − 3| + 1) sgn(x − 1/2)
▶ generate 1000 samples of length 1000
▶ implement with cubic/quartic B-splines (with nested knots) and Legendre polynomials
▶ use σ̄ = 1 (the true σ) and σ̄ = 0.1
▶ take L(J) = (1/10) log log J in the definition of Ĵ_max
▶ compare sup-norm and L2-norm errors of the Lepski procedure against the infeasible choice of J which minimizes sup-norm error in each sample
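The design is straightforward to reproduce; a sketch of the data-generating process described above:

```python
# Sketch of the MC design: (U, V*, W*) trivariate normal with corr(U, V*) = 0.5,
# mapped to [0, 1] through the standard normal CDF. X is endogenous because
# U and V* are correlated; W is a valid instrument.
import numpy as np
from scipy.stats import norm

def simulate_design(n, linear=True, seed=0):
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    U, V, Wstar = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    X = norm.cdf((Wstar + V) / np.sqrt(2))
    W = norm.cdf(Wstar)
    h0 = 4 * X - 2 if linear else np.log(np.abs(6 * X - 3) + 1) * np.sign(X - 0.5)
    return h0 + U, X, W                     # returns (Y, X, W)
```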
MC design: linear h_0 (black), nonlinear h_0 (red)

[Figure: the two design functions h_0 plotted over x ∈ [0, 1], with values ranging from −2 to 2.]
MC design: scatter plot of (X_i, W_i)

[Figure: scatter of (X_i, W_i) on [0, 1]²; X on the horizontal axis, W on the vertical axis.]
MC design: scatter plot of (X_i, Y_i) with nonlinear h_0

[Figure: scatter of (X_i, Y_i); X on [0, 1] on the horizontal axis, Y roughly in [−5, 5] on the vertical axis.]
MC results: Lepski procedure, linear design
Table 1: Linear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                     Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK    L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J − rJ + rK:
Mean    4    4    0.4262   0.1547     0.4262   0.1547     0.4141   0.1608
Med.    4    4    0.3828   0.1394     0.3828   0.1394     0.3708   0.1443
Mean    4    5    0.4179   0.1524     0.4209   0.1536     0.3937   0.1540
Med.    4    5    0.3681   0.1368     0.3692   0.1370     0.3476   0.1368
Mean    5    5    0.6633   0.2355     0.6633   0.2355     0.6243   0.2494
Med.    5    5    0.6007   0.2202     0.6007   0.2202     0.5646   0.2311

Results with K(J) = 2(J − rJ) + rK + 1:
Mean    4    4    0.4188   0.1526     0.4188   0.1526     0.3895   0.1552
Med.    4    4    0.3696   0.1375     0.3696   0.1375     0.3470   0.1371
Mean    4    5    0.3918   0.1439     0.3945   0.1449     0.3720   0.1486
Med.    4    5    0.3430   0.1291     0.3430   0.1291     0.3295   0.1311
Mean    5    5    0.6366   0.2277     0.6366   0.2277     0.5816   0.2352
Med.    5    5    0.5800   0.2089     0.5800   0.2089     0.5228   0.2111
MC results: Lepski procedure, linear design
Table 2: Linear design, Legendre polynomial bases

              Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
           L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J:
Mean       0.0882   0.0492     0.2943   0.1185     0.0869   0.0494
Med.       0.0777   0.0452     0.1674   0.0810     0.0764   0.0453

Results with K(J) = 2J:
Mean       0.0878   0.0490     0.2745   0.1119     0.0862   0.0492
Med.       0.0779   0.0453     0.1640   0.0807     0.0766   0.0455
MC results: Lepski procedure, nonlinear design
Table 3: Nonlinear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                     Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK    L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J − rJ + rK:
Mean    4    4    0.4343   0.1621     0.4343   0.1621     0.4233   0.1671
Med.    4    4    0.3855   0.1469     0.3855   0.1469     0.3748   0.1503
Mean    4    5    0.4262   0.1600     0.4271   0.1605     0.4030   0.1615
Med.    4    5    0.3738   0.1444     0.3744   0.1445     0.3514   0.1445
Mean    5    5    0.6726   0.2407     0.6726   0.2407     0.6318   0.2531
Med.    5    5    0.6069   0.2278     0.6069   0.2278     0.5646   0.2345

Results with K(J) = 2(J − rJ) + rK + 1:
Mean    4    4    0.4271   0.1601     0.4286   0.1609     0.3987   0.1623
Med.    4    4    0.3764   0.1445     0.3764   0.1445     0.3518   0.1443
Mean    4    5    0.4002   0.1518     0.4029   0.1528     0.3812   0.1563
Med.    4    5    0.3410   0.1384     0.3414   0.1384     0.3258   0.1402
Mean    5    5    0.6471   0.2330     0.6471   0.2330     0.5895   0.2390
Med.    5    5    0.5797   0.2143     0.5797   0.2143     0.5341   0.2141
MC results: Lepski procedure, nonlinear design
Table 4: Nonlinear design, Legendre polynomial bases

              Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
           L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J:
Mean       0.2494   0.1305     0.4283   0.1719     0.2297   0.1224
Med.       0.2367   0.1266     0.3210   0.1426     0.2218   0.1243

Results with K(J) = 2J:
Mean       0.2475   0.1306     0.4063   0.1644     0.2241   0.1208
Med.       0.2346   0.1267     0.3132   0.1395     0.2178   0.1242
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Pointwise and uniform inference
▶ our sup-norm rates allow mild, low-level conditions for asymptotic normality of plug-in sieve NPIV estimators of possibly nonlinear functionals of h_0 in two cases:

1. "pointwise" inference on f(h_0)
  ▶ e.g.: exact consumer surplus of a price change from p⁰ to p¹ at income i:

    Q_i = h_0(P_i, I_i) + u_i
    f(h_0) = S_i(p⁰), where S_i′(p) = −h_0(p, i − S_i(p)) and S_i(p¹) = 0

  cf. Hausman & Newey (95), Vanhems (10), Blundell et al. (12)

2. "uniform" inference on {f_τ(h_0) : τ ∈ 𝒯} where 𝒯 ⊂ R^{d_𝒯}
  ▶ e.g.: uniform inference on consumer surplus/deadweight loss
Pointwise inference (1)
▶ we focus on slower-than-√n functionals that are bounded w.r.t. the sup norm:

5. (i) there exists a linear functional Df(h_0)[·] and a constant C s.t.

  |f(h) − f(h_0) − Df(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

  for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
  (ii) V_n^{−1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{−1/2})

▶ includes CS/DWL functionals and the quadratic functional
▶ sufficient for a more general condition of Chen and Pouzo (14)
▶ here V_n ↗ ∞ is the sieve variance:

  V_n = Df(h_0)[ψ^J]′ Σ_n Df(h_0)[ψ^J]
  Σ_n = [S′G_b⁻¹S]⁻¹ ( S′G_b⁻¹ Ω G_b⁻¹ S ) [S′G_b⁻¹S]⁻¹,

  where S = E[b^K(W_i) ψ^J(X_i)′], Ω = E[u_i² b^K(W_i) b^K(W_i)′]
Pointwise inference (2)
Theorem (Pointwise asymptotic normality of sieve t-statistics)
Let Assumptions 1–5 (etc.) hold. Then:

  √n ( f(ĥ) − f(h_0) ) / V̂_n^{1/2} →_d N(0, 1).

▶ here V̂_n is the sieve variance estimator:

  V̂_n = Df(ĥ)[ψ^J]′ Σ̂ Df(ĥ)[ψ^J]
  Σ̂ = [Ŝ′Ĝ_b⁻¹Ŝ]⁻¹ ( Ŝ′Ĝ_b⁻¹ Ω̂ Ĝ_b⁻¹ Ŝ ) [Ŝ′Ĝ_b⁻¹Ŝ]⁻¹
  Ŝ = B′Ψ/n,  Ĝ_b = B′B/n,  Ω̂ = n⁻¹ Σ_{i=1}^n û_i² b^K(W_i) b^K(W_i)′

▶ just like the 2SLS variance estimator but using basis functions (sketched below); cf. Chen and Pouzo (14), Newey (13)
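A sketch of the t-statistic computed from these sample analogues, for a functional whose length-J derivative vector Df(ĥ)[ψ^J] is supplied by the caller:

```python
# Sketch: plug-in sieve variance estimator and t-statistic. Df is the
# length-J vector Df(h_hat)[psi^J]; u_hat are the residuals Y - Psi @ c_hat.
import numpy as np

def sieve_t_stat(f_hat, f_null, Df, Psi, B, u_hat):
    n = Psi.shape[0]
    S_hat = B.T @ Psi / n                             # S_hat = B'Psi/n, K x J
    Gb_inv = np.linalg.pinv(B.T @ B / n)
    Omega = (B * u_hat[:, None] ** 2).T @ B / n       # heteroskedasticity-robust
    A = np.linalg.pinv(S_hat.T @ Gb_inv @ S_hat)      # [S' Gb^{-1} S]^{-1}
    Sigma = A @ (S_hat.T @ Gb_inv @ Omega @ Gb_inv @ S_hat) @ A
    V_hat = Df @ Sigma @ Df                           # scalar sieve variance
    return np.sqrt(n) * (f_hat - f_null) / np.sqrt(V_hat)
```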
Uniform inference (1)
▶ now impose a uniform (in τ ∈ 𝒯) version of Assumption 5:

5′. (i) Df_τ(h_0)[·] is a linear functional for each τ ∈ 𝒯; (ii) there exists a constant C s.t.

  sup_{τ∈𝒯} |f_τ(h) − f_τ(h_0) − Df_τ(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

  for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
  (iii) sup_{τ∈𝒯} V_{τ,n}^{−1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{−1/2})

▶ here V_{τ,n} = Df_τ(h_0)[ψ^J]′ Σ_n Df_τ(h_0)[ψ^J]
▶ estimate with V̂_{τ,n} = Df_τ(ĥ)[ψ^J]′ Σ̂ Df_τ(ĥ)[ψ^J]
Uniform inference (2)
Theorem (Uniform asymptotic normality of sieve t-statistics)
Let Assumptions 1–5′ (etc.) hold. Then there exists a sequence of tight Gaussian processes G_n on ℓ^∞(𝒯) with covariance function

  E[G_n(τ_1) G_n(τ_2)] = ( Df_{τ_1}(h_0)[ψ^J]′ Σ_n Df_{τ_2}(h_0)[ψ^J] ) / ( V_{τ_1,n}^{1/2} V_{τ_2,n}^{1/2} )

and random variables Z_n =_d sup_{τ∈𝒯} |G_n(τ)| such that

  sup_{τ∈𝒯} | √n ( f_τ(ĥ) − f_τ(h_0) ) / V̂_{τ,n}^{1/2} | = Z_n + o_p(1)

as n, J, K → ∞.

▶ we follow the Chernozhukov, Chetverikov & Kato (14) construction (see also Chernozhukov, Lee & Rosen (13)) rather than a strong approximation
Example: uniform confidence bands
▶ f_τ(h_0) = h_0(τ) with 𝒯 = 𝒳, and Df_τ(h)[ψ^J] = ψ^J(τ)
▶ by the previous theorem, there exists a sequence of tight Gaussian processes G_n on ℓ^∞(𝒳) with covariance function

  E[G_n(x_1) G_n(x_2)] = ( ψ^J(x_1)′ Σ_n ψ^J(x_2) ) / ( V_{x_1,n}^{1/2} V_{x_2,n}^{1/2} )

  and random variables Z_n =_d sup_{x∈𝒳} |G_n(x)| such that

  sup_{x∈𝒳} | √n ( ĥ(x) − h_0(x) ) / V̂_{x,n}^{1/2} | = Z_n + o_p(1)

  as n, J, K → ∞
▶ invert for a uniform confidence band
Example: uniform inference on exact consumer surplus
▶ f_τ(h_0) = S_i(p) with 𝒯 = [p̲⁰, p̄⁰] × [i̲, ī] (a computational sketch follows this slide)

  Df_τ(h_0)[ψ^J] = −∫_p^{p¹} ψ^J(t, i − S_i(t)) e^{∫_p^t ∂_2 h_0(u, i − S_i(u)) du} dt
  Df_τ(ĥ)[ψ^J] = −∫_p^{p¹} ψ^J(t, i − Ŝ_i(t)) e^{∫_p^t ∂_2 ĥ(u, i − Ŝ_i(u)) du} dt

▶ uniform asymptotic normality of {Ŝ_i(p) : (p, i) ∈ [p̲⁰, p̄⁰] × [i̲, ī]} follows from the previous theorem
▶ could equally consider uniform inference on deadweight loss
▶ our sup-norm rates are critical here to control the bias
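Given any fitted ĥ, the plug-in f(ĥ) = Ŝ_i(p⁰) can be computed by integrating the Hausman ODE backwards from the terminal condition at p¹; a sketch using scipy's generic ODE solver (our choice of solver, not the paper's implementation):

```python
# Sketch: exact consumer surplus by integrating the Hausman ODE
# S'(p) = -h_hat(p, income - S(p)) backwards from S(p1) = 0 to p0,
# plugging in any fitted demand function h_hat(p, y).
import numpy as np
from scipy.integrate import solve_ivp

def consumer_surplus(h_hat, p0, p1, income):
    """Return S(p0), the exact consumer surplus of a price change p0 -> p1."""
    ode = lambda p, S: [-h_hat(p, income - S[0])]
    sol = solve_ivp(ode, t_span=(p1, p0), y0=[0.0])   # integrates from p1 to p0
    return sol.y[0, -1]
```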
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
MC design
▶ same Newey and Powell (03) linear and nonlinear designs
▶ estimate Ĵ_max as before; use J = Ĵ_max, K = K(Ĵ_max) to implement the sieve NPIV estimator
▶ estimate critical values for uniform confidence bands using the sieve score bootstrap (Chen and Pouzo, 14) with the Mammen (93) two-point distribution, using 1000 bootstrap replications for each sample (multiplier sketch below)
▶ computationally simpler than the bootstrap sieve t-statistic in Chen-Pouzo (14) or the bootstrap in Horowitz-Lee (12)
▶ compare MC coverage with nominal coverage probabilities
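A rough sketch of multiplier-bootstrap critical values with Mammen two-point weights. This is a generic multiplier form for the sup of a studentized process on a grid, our own simplification of the sieve score bootstrap; `scores` (an n × G matrix of studentized per-observation influence terms, one column per grid point) is assumed pre-computed:

```python
# Sketch: sup-t critical value via a multiplier bootstrap with Mammen (93)
# two-point weights (mean 0, variance 1). A simplification of the sieve
# score bootstrap, not the authors' exact algorithm.
import numpy as np

def mammen_weights(rng, size):
    s5 = np.sqrt(5.0)
    lo, hi = (1 - s5) / 2, (1 + s5) / 2
    p_lo = (s5 + 1) / (2 * s5)             # P(w = lo) ~ 0.7236
    return np.where(rng.random(size) < p_lo, lo, hi)

def sup_t_critical_value(scores, alpha=0.05, n_boot=1000, seed=0):
    """(1 - alpha) quantile of sup_g | n^{-1/2} sum_i w_i * scores[i, g] |."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    sups = np.empty(n_boot)
    for b in range(n_boot):
        w = mammen_weights(rng, n)
        sups[b] = np.max(np.abs(w @ scores)) / np.sqrt(n)
    return np.quantile(sups, 1 - alpha)
```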
Estimated UCBs (dashed), ĥ (black line), h_0 (red line)

[Figure: estimated uniform confidence bands around ĥ together with h_0, over x ∈ [0, 1]; values roughly in [−2.5, 2].]
MC results: coverage probabilities
Table 5: Linear and nonlinear designs, cubic (r = 4) and quartic (r = 5) B-spline bases

                     K(J) = J − rJ + rK      K(J) = 2(J − rJ) + rK + 1
            rJ  rK   90%    95%    99%       90%    95%    99%
linear       4   4   0.933  0.966  0.996     0.944  0.971  0.994
linear       4   5   0.937  0.975  0.995     0.937  0.963  0.994
linear       5   5   0.961  0.983  0.997     0.959  0.985  0.997
nonlinear    4   4   0.884  0.945  0.987     0.912  0.956  0.989
nonlinear    4   5   0.894  0.946  0.987     0.906  0.951  0.987
nonlinear    5   5   0.956  0.978  0.995     0.951  0.979  0.996
MC results: coverage probabilities
Table 6: Linear and nonlinear designs, Legendre polynomial bases

             K(J) = J               K(J) = 2J
             90%    95%    99%      90%    95%    99%
linear       0.937  0.964  0.997    0.928  0.959  0.989
nonlinear    0.901  0.952  0.988    0.906  0.948  0.989
Conclusions
▶ contributions:
  1. optimal sup-norm rates and attainability by sieve estimators
  2. Lepski procedure for adaptive estimation in sup norm
  3. pointwise and uniform inference on possibly nonlinear functionals
▶ first such results for NPIV (or indeed for any ill-posed inverse problem with unknown operator)
▶ application to inference on consumer surplus in demand estimation and uniform confidence bands