Optimal Sup-norm Rate, Adaptive Estimation, and Inference on NPIV
Xiaohong Chen (Yale) and Tim Christensen (NYU)
Cemmap Celebration Conference | Andrew’s Birthday Conference, November 14-16, 2014
Introduction (1)
▶ we consider nonparametric instrumental variables (NPIV) regression:

  Y_i = h_0(X_i) + u_i
  E[u_i | X_i] ≠ 0
  E[u_i | W_i] = 0

▶ endogeneity is an important issue in economics
▶ being nonparametric in h_0 avoids functional form misspecification
▶ h_0 is identified via the conditional moment restriction

  E[Y_i | W_i] = E[h_0(X_i) | W_i]

▶ this "smoothes out" features of h_0, making h_0 difficult to recover
▶ NPIV is an ill-posed inverse problem with unknown operator
Introduction (2)
▶ there is a large and growing literature on NPIV:
  1. identification/consistency: Newey & Powell (03); Carrasco, Florens & Renault (07); Andrews (11); ...
  2. convergence rates in L2 norm: Hall & Horowitz (05); Blundell, Chen & Kristensen (BCK, 07); Chen & Reiß (11); Darolles, Fan, Florens & Renault (11); ...
  3. almost rate-adaptive estimation in L2: Horowitz (14)
  4. almost rate-adaptive estimation of linear functionals: Breunig & Johannes (13)
  5. inference on linear functionals of h_0: Ai & Chen (AC, 03, 07); Carrasco, Florens & Renault (07); Horowitz & Lee (13)
  6. inference on nonlinear functionals of h_0: Chen & Pouzo (14)
  7. testing: Horowitz (12); Canay, Santos & Shaikh (13); Breunig (13); ...
  8. partial identification: Santos (12); Freyberger & Horowitz (13); ...
▶ all the existing published results on NPIV are based on the L2 norm
Contributions of this paper
1. we derive the upper bound on sup-norm convergence rates for general sieve NPIV estimators
2. we derive minimax lower bounds in sup-norm loss over Hölder classes of functions for NPIR (nonparametric indirect regression) and NPIV
3. we show that spline and wavelet sieve NPIV estimators attain the sup-norm minimax lower bounds, and hence attain the optimal sup-norm convergence rates
4. we introduce a data-driven procedure for choosing the dimension of the sieve NPIV that is sup-norm rate-adaptive
5. we provide inference theory for plug-in sieve NPIV estimators of nonlinear functionals of h_0 under mild conditions

▶ an application: inference on exact consumer surplus in nonparametric demand estimation when both price and income are endogenous
Parametric vs nonparametric IV
▶ parametric IV model:

  Y_i = X_i′β_0 + u_i
  E[u_i X_i] ≠ 0
  E[u_i W_i] = 0

▶ identified if rank(E[X_i W_i′]) = dim(β_0)
▶ nonparametric IV model:

  Y_i = h_0(X_i) + u_i
  E[u_i | X_i] ≠ 0
  E[u_i | W_i] = 0

▶ identified if h ↦ E[h(X_i) | W_i = ·] is injective
Parametric vs sieve nonparametric IV
▶ a parametric IV model can be estimated via 2SLS:

  β̂ = [X′W(W′W)⁻¹W′X]⁻¹ X′W(W′W)⁻¹W′Y

▶ NP (03), AC (03), BCK (07): a nonparametric IV model can be estimated via sieve NPIV, i.e., 2SLS on basis functions (sketched in code below):

  ĥ(x) = ψ^J(x)′ĉ
  ĉ = [Ψ′B(B′B)⁻¹B′Ψ]⁻¹ Ψ′B(B′B)⁻¹B′Y
  ψ^J(x) = (ψ_{J1}(x), ..., ψ_{JJ}(x))′,  Ψ = (ψ^J(X_1), ..., ψ^J(X_n))′
  b^K(w) = (b_{K1}(w), ..., b_{KK}(w))′,  B = (b^K(W_1), ..., b^K(W_n))′

▶ K ≥ J, with J = sieve dimension for the endogenous regressors (the key smoothing parameter), K = sieve dimension for the instruments
▶ Horowitz (11): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L2([0, 1]^d)
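To fix ideas, here is a minimal Python sketch of the closed-form sieve NPIV estimator above, for scalar X and W supported on [0, 1]. The uniform-knot cubic B-spline helper `bspline_basis` is our own illustrative choice, not the slides' nested-knot construction; this is a sketch, not the authors' implementation.

```python
# Minimal sketch of sieve NPIV: 2SLS on basis functions (scalar X, W on [0, 1]).
# The uniform-knot clamped B-spline basis is an illustrative choice; requires
# scipy >= 1.8 for BSpline.design_matrix.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, num_funcs, degree=3):
    """n x num_funcs matrix of clamped B-spline basis functions on [0, 1]."""
    interior = np.linspace(0.0, 1.0, num_funcs - degree + 1)  # num_funcs >= degree + 1
    knots = np.r_[np.zeros(degree), interior, np.ones(degree)]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return BSpline.design_matrix(x, knots, degree).toarray()

def sieve_npiv(Y, X, W, J, K):
    """2SLS of Y on psi^J(X) using b^K(W) as instruments (K >= J)."""
    Psi = bspline_basis(X, J)                        # n x J matrix Psi
    B = bspline_basis(W, K)                          # n x K matrix B
    PB = B @ np.linalg.pinv(B.T @ B) @ B.T           # projection onto instrument space
    c_hat = np.linalg.solve(Psi.T @ PB @ Psi, Psi.T @ PB @ Y)
    return lambda x: bspline_basis(x, J) @ c_hat     # h_hat(x) = psi^J(x)' c_hat
```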
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Preliminaries: measuring ill-posedness
▶ let Π_K : L2(W) → B_K denote the orthogonal projection onto the sieve space B_K = clsp{b_{K1}, ..., b_{KK}}
▶ weak norm ‖h‖_{w,2} = ‖Π_K Th‖_{L2(W)}, where Th(W_i) = E[h(X_i) | W_i]
▶ BCK (07): sieve measure of ill-posedness (sample analogue sketched below):

  s⁻¹_{JK} = sup_{h ∈ Ψ_J : ‖h‖_{w,2} ≠ 0} ‖h‖_{L2(X)} / ‖h‖_{w,2} = 1 / s_min(G_ψ^{−1/2} S′ G_b^{−1/2})

  where G_b = G_{b,K} = E[b^K(W_i) b^K(W_i)′], G_ψ = G_{ψ,J} = E[ψ^J(X_i) ψ^J(X_i)′], and S′ = S′_{JK} = E[ψ^J(X_i) b^K(W_i)′]

▶ the NPIV model is said to be
  ▶ mildly ill-posed if s⁻¹_{JK} = O(J^{ς/d}) for some ς > 0
  ▶ severely ill-posed if s⁻¹_{JK} = O(exp(½ J^{ς/d})) for some ς > 0
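The sample analogue ŝ_{JK} (it reappears in the Lepski procedure later) is the smallest singular value of the orthonormalized cross-moment matrix; a short sketch, where the eigendecomposition-based inverse square root is our implementation choice:

```python
# Sketch: sample analogue of the sieve measure of ill-posedness,
# 1 / s_min(G_psi^{-1/2} S' G_b^{-1/2}), from basis matrices Psi (n x J)
# and B (n x K), J <= K.
import numpy as np

def inv_sqrt(G):
    vals, vecs = np.linalg.eigh(G)                 # G symmetric positive definite
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def ill_posedness_hat(Psi, B):
    n = Psi.shape[0]
    G_psi, G_b = Psi.T @ Psi / n, B.T @ B / n
    S_t = Psi.T @ B / n                            # S' = E_hat[psi^J b^K'], J x K
    M = inv_sqrt(G_psi) @ S_t @ inv_sqrt(G_b)
    return 1.0 / np.linalg.svd(M, compute_uv=False).min()  # large => badly ill-posed
```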
Preliminaries: roughness properties of the sieve
▶ following Newey (97), we define ζ(J) = ζ_b(K) ∨ ζ_ψ(J), where

  ζ_b(K) := sup_w ‖G_b^{−1/2} b^K(w)‖_{ℓ2}
  ζ_ψ(J) := sup_x ‖G_ψ^{−1/2} ψ^J(x)‖_{ℓ2}

▶ we also introduce

  ξ_ψ(J) := sup_x ‖ψ^J(x)‖_{ℓ1}

  which is better suited to studying sup-norm rates
▶ the sup-norm variance term depends on ξ_ψ(J), e_J = λ_min(G_{ψ,J}), and s⁻¹_{JK}
Assumptions imposed for sup-norm rate
1. (i) {(X_i, Y_i, W_i)}_{i=1}^n is an i.i.d. sample; (ii) X has compact support 𝒳 ⊂ R^d with nonempty interior; W has support 𝒲 ⊂ R^{d_w}; (iii) sup_x |h_0(x)| < ∞; (iv) h ↦ E[h(X) | W = ·] is injective on L^∞(𝒳)
2. (i) sup_w E[u_i² | W_i = w] ≤ σ̄²; (ii) E[|u_i|^{2+δ}] < ∞ for some δ > 0
3. (i) λ_min(G_{b,K}) > 0; e_J = λ_min(G_{ψ,J}) > 0; J ≤ K;
   (ii) s⁻¹_{JK} ζ(J) √((J log J)/n) = o(1);
   (iii) ζ_b(K)^{(2+δ)/δ} √((log J)/n) = o(1)
4. there exists π_J h_0 ∈ Ψ_J such that: (i) ‖h_0 − π_J h_0‖_∞ ≤ C* J^{−p/d};
   (ii) s⁻¹_{JK} ‖h_0 − π_J h_0‖_{w,2} ≤ C*_2 ‖h_0 − π_J h_0‖_{L2(X)};
   (iii) ‖Q_J(h_0 − π_J h_0)‖_∞ ≤ C*_∞ ‖h_0 − π_J h_0‖_∞,
   with Q_J : L2(X) → Ψ_J the oblique projection Q_J h(x) = ψ^J(x)′ [S′G_b⁻¹S]⁻¹ S′G_b⁻¹ E[b^K(W_i) h(X_i)]
Upper bound (1)
Theorem (Upper bound for NPIV)
Let Assumptions 1–4 hold. Then:

  ‖ĥ − h_0‖_∞ = O_p( J^{−p/d} + s⁻¹_{JK} ξ_ψ(J) √((log J)/(n e_J)) ).

▶ for Cohen-Daubechies-Vial (CDV) wavelets and B-splines, we show that [ξ_ψ(J)]²/e_J = O(J), hence

  ‖ĥ − h_0‖_∞ = O_p( J^{−p/d} + s⁻¹_{JK} √((J log J)/n) ).
Upper bound (2)
Corollary
Let 𝒳 = [0, 1]^d, let the density f of X satisfy 0 < inf_x f(x) ≤ sup_x f(x) < ∞, and let Ψ_J be spanned by a CDV wavelet basis or B-spline basis of sufficient regularity. Then:

Mildly ill-posed case: choosing J ≍ K ≍ (n/log n)^{d/(2(p+ς)+d)} yields

  ‖h_0 − ĥ‖_∞ = O_p( (n/log n)^{−p/(2(p+ς)+d)} ).

Severely ill-posed case: choosing J = c′_0 (log n)^{d/ς} for any c′_0 ∈ (0, 1) and K = c_0 J for some finite c_0 ≥ 1 yields

  ‖h_0 − ĥ‖_∞ = O_p( (log n)^{−p/ς} ).
Optimality (1)
▶ Chen and Reiß (11) showed that the L2(X) rates
  ▶ ‖ĥ − h_0‖_{L2(X)} = O_p(n^{−p/(2(p+ς)+d)}) in the mildly ill-posed case
  ▶ ‖ĥ − h_0‖_{L2(X)} = O_p((log n)^{−p/ς}) in the severely ill-posed case
  are optimal in an L2 minimax sense
▶ sup norm ≥ L2 norm
▶ therefore our sup-norm rate is optimal in the severely ill-posed case
▶ what about the mildly ill-posed case?
▶ we now derive the minimax lower bound in sup-norm loss, i.e. the rate r_n over a parameter space 𝓗 such that

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈𝓗} P_h( ‖h − ĥ_n‖_∞ ≥ c r_n ) ≥ c′ > 0,

  for constants c, c′.
Optimality (2)
▶ trick: rewrite the NPIV model in terms of a nonparametric indirect regression (NPIR) model:

  Y_i = E[h_0(X_i) | W_i] + ε_i
  E[ε_i | W_i] = 0
  ε_i ∼ N(0, σ_0(W_i)²)

  where E[· | W_i] is known and σ_0(·)² ≥ σ²_0 > 0
▶ NPIV:

  Y_i = h_0(X_i) + ( E[h_0(X_i) | W_i] − h_0(X_i) + ε_i ),  with the bracketed term =: u_i,

  where by construction E[u_i | W_i] = 0
▶ NPIR is more informative than NPIV
▶ implication: lower bound for NPIV ≥ lower bound for NPIR
Lower bound for NPIR
Assumption (S)
(i) h_0 ∈ B^p_{∞,∞}([0, 1]^d); (ii) there is a ς > 0 such that

  ‖Th‖_{L2(W)} ≲ ‖h‖_{B^{−ς}_{2,2}}

for all h ∈ B(p, L) := {h ∈ B^p_{∞,∞}([0, 1]^d) : ‖h‖_{B^p_{∞,∞}} ≤ L}.

Theorem (Lower bound for NPIR)
Let Assumption S hold for the NPIR model with a random sample {(Y_i, W_i)}_{i=1}^n. Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈B(p,L)} P_h( ‖h − ĥ_n‖_∞ ≥ c (n/log n)^{−p/(2(p+ς)+d)} ) ≥ c′ > 0,

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c′ depend only on p, L, d, ς, σ_0.
Lower bound for NPIV
Corollary (Lower bound for NPIV)
Let Assumption S hold for the NPIV model with a random sample {(X_i, Y_i, W_i)}_{i=1}^n and inf_w E[u² | W = w] ≥ σ²_0. Then:

  lim inf_{n→∞} inf_{ĥ_n} sup_{h∈B(p,L)} P_h( ‖h − ĥ_n‖_∞ ≥ c (n/log n)^{−p/(2(p+ς)+d)} ) ≥ c′ > 0,

where inf_{ĥ_n} denotes the infimum over all estimators based on the sample of size n, and the constants c, c′ depend only on p, L, d, ς.
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Adaptive estimation for NPIV
▶ must choose J optimally to attain optimal rates
▶ optimal choice depends on the unknown p and s⁻¹_{JK}
▶ want a data-driven method for choosing J optimally
▶ existing methods focus on L2 loss, minimizing a MSE-type criterion
  ▶ Horowitz (14): modified sieve NPIV: K = J and b^K = ψ^J = orthonormal series of L2([0, 1]^d); optimal in L2 up to a log n factor
  ▶ Liu & Tao (14): Mallows C_p model selection of sieve NPIV assuming homoskedastic errors
▶ CV/AIC/BIC/Mallows criteria aren't well suited to sup-norm rates
▶ we introduce a sup-norm adaptive Lepski-type procedure
Lepski-type procedure
▶ set K = K(J) ≍ J deterministically (e.g. K = c_0 J + a)
▶ choose J by the following method. Define the sets:

  𝒥_0 = { j ∈ [J_min, Ĵ_max] : j^{−p/d} ≤ C_0 V_sup(j) }
  𝒥̂ = { j ∈ [J_min, Ĵ_max] : ‖ĥ_j − ĥ_l‖_∞ ≤ √2 σ̄ [V̂_sup(j) + V̂_sup(l)]  ∀ l ∈ (j, Ĵ_max] }

  where

  V_sup(j) = s⁻¹_{jK(j)} ξ_ψ(j) √((log n)/(n e_j))
  V̂_sup(j) = ŝ⁻¹_{jK(j)} ξ_ψ(j) √((log n)/(n ê_j))
  ŝ_{JK(J)} = s_min( (Ψ′Ψ)^{−1/2}(Ψ′B)(B′B)^{−1/2} ),  ê_J = λ_min(Ψ′Ψ/n)

▶ J_0 = min_{j∈𝒥_0} j is optimal but infeasible
▶ Ĵ = min_{j∈𝒥̂} j is our data-driven choice of J (see the code sketch below)
▶ ĥ_Ĵ denotes the sieve NPIV estimator with J = Ĵ, K = K(Ĵ)
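A hedged sketch of the selection rule for Ĵ, assuming the estimates ĥ_j (evaluated on a fine grid) and V̂_sup(j) have already been computed from the formulas above; the container names `h_hat_on_grid` and `V_hat` are ours:

```python
# Lepski-type choice of J: h_hat_on_grid[j] is the fit \hat h_j on a grid,
# V_hat[j] is \hat V_sup(j); both dicts assumed pre-computed. sigma_bar is
# the error-variance bound (sigma-bar in the slides).
import numpy as np

def lepski_J(candidates, h_hat_on_grid, V_hat, sigma_bar):
    """Smallest j whose fit agrees with all larger l within the threshold."""
    admissible = []
    for i, j in enumerate(candidates):                # candidates sorted ascending
        ok = all(
            np.max(np.abs(h_hat_on_grid[j] - h_hat_on_grid[l]))
            <= np.sqrt(2) * sigma_bar * (V_hat[j] + V_hat[l])
            for l in candidates[i + 1:]
        )
        if ok:
            admissible.append(j)
    return min(admissible) if admissible else max(candidates)  # fallback if empty
```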
Heuristic argument
▶ suppose 𝒥_0 ⊆ 𝒥̂. Then:

  Ĵ := min 𝒥̂ ≤ J_0 := min 𝒥_0

  and:

  ‖ĥ_Ĵ − h_0‖_∞ ≤ ‖ĥ_{J_0} − h_0‖_∞ + ‖ĥ_Ĵ − ĥ_{J_0}‖_∞
               ≤ ‖ĥ_{J_0} − h_0‖_∞ + 2√2 σ̄ ŝ⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n ê_{J_0}))
               ≲ ‖ĥ_{J_0} − h_0‖_∞ + s⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n e_{J_0}))  wpa1 (with probability approaching one)

  ⇒ ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{−p/d} + s⁻¹_{J_0 K(J_0)} ξ_ψ(J_0) √((log n)/(n e_{J_0})) )

▶ implication: Ĵ is rate-adaptive to the oracle J_0
Choosing Jmax
▶ still need to choose Ĵ_max
▶ data-driven estimator of J_max (sketched below):

  Ĵ_max = min{ J > J_min : ŝ⁻¹_{JK(J)} ζ(J) √((J L(J) log n)/n) ≥ 1 }

  where L(J) = a log(log(J)) for some constant a > 0.
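A sketch of the Ĵ_max stopping rule, treating ŝ⁻¹ and ζ(·) as supplied callables (computing ζ(J) requires a sup over a grid of x and w values, which we leave to the caller):

```python
# Data-driven J_max: the first J past J_min at which the variance proxy
# crosses 1. s_hat_inv(J) and zeta(J) are assumed callables built from the
# sample analogues above; candidates sorted ascending and >= 3 so that
# log(log(J)) > 0.
import numpy as np

def J_max_hat(candidates, s_hat_inv, zeta, n, a=0.1):
    for J in candidates:
        L = a * np.log(np.log(J))                     # L(J) = a log(log J)
        if s_hat_inv(J) * zeta(J) * np.sqrt(J * L * np.log(n) / n) >= 1:
            return J
    return candidates[-1]                             # no crossing: take largest
```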
Oracle property
▶ now consider the special case with a CDV wavelet or B-spline sieve, rectangular support, and well-behaved density

Theorem (Adaptivity)
Let Assumptions 1–4 hold and s⁻¹_{J̄_max K(J̄_max)} √((J̄_max² log n)/n) = o(1). Then: J̲_max ≤ Ĵ_max ≤ J̄_max wpa1; and

  𝒥_0 ⊆ 𝒥̂  wpa1

and so:

  ‖ĥ_Ĵ − h_0‖_∞ = O_p( J_0^{−p/d} + s⁻¹_{J_0 K(J_0)} √((J_0 log n)/n) ).

▶ implication: sup-norm rate adaptive in the mildly and severely ill-posed cases; no loss of a log n factor
▶ automatically implies L2(X)-norm rate adaptivity in the severely ill-posed case, and almost adaptivity in the mildly ill-posed case (up to a log n factor)
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
MC design
▶ Newey and Powell (03) design, but with compact support (generator sketched below): generate

  (U_i, V*_i, W*_i)′ ∼ N(0, Σ),  Σ with rows (1, 0.5, 0), (0.5, 1, 0), (0, 0, 1),

  and set X_i = Φ((W*_i + V*_i)/√2) and W_i = Φ(W*_i)
▶ linear design: h_0(x) = 4x − 2
▶ nonlinear design: h_0(x) = log(|6x − 3| + 1) sgn(x − 1/2)
▶ generate 1000 samples of length 1000
▶ implement with cubic/quartic B-splines (with nested knots) and Legendre polynomials
▶ use σ̄ = 1 (the true σ) and σ̄ = 0.1
▶ take L(J) = (1/10) log log J in the definition of Ĵ_max
▶ compare sup-norm and L2-norm errors of the Lepski procedure against the infeasible choice of J which minimizes sup-norm error in each sample
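The design is straightforward to reproduce; a sketch of the data-generating process described above:

```python
# Sketch of the MC design: (U, V*, W*) trivariate normal with corr(U, V*) = 0.5,
# mapped to [0, 1] through the standard normal CDF. X is endogenous because
# U and V* are correlated; W is a valid instrument.
import numpy as np
from scipy.stats import norm

def simulate_design(n, linear=True, seed=0):
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    U, V, Wstar = rng.multivariate_normal(np.zeros(3), cov, size=n).T
    X = norm.cdf((Wstar + V) / np.sqrt(2))
    W = norm.cdf(Wstar)
    h0 = 4 * X - 2 if linear else np.log(np.abs(6 * X - 3) + 1) * np.sign(X - 0.5)
    return h0 + U, X, W                     # returns (Y, X, W)
```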
MC design: linear h_0 (black), nonlinear h_0 (red)

[Figure: the two design functions h_0 plotted over x ∈ [0, 1], with values ranging from −2 to 2.]
MC design: scatter plot of (X_i, W_i)

[Figure: scatter of (X_i, W_i) on [0, 1]²; X on the horizontal axis, W on the vertical axis.]
MC design: scatter plot of (X_i, Y_i) with nonlinear h_0

[Figure: scatter of (X_i, Y_i); X on [0, 1] on the horizontal axis, Y roughly in [−5, 5] on the vertical axis.]
MC results: Lepski procedure, linear design
Table 1: Linear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                     Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK    L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J − rJ + rK:
Mean    4    4    0.4262   0.1547     0.4262   0.1547     0.4141   0.1608
Med.    4    4    0.3828   0.1394     0.3828   0.1394     0.3708   0.1443
Mean    4    5    0.4179   0.1524     0.4209   0.1536     0.3937   0.1540
Med.    4    5    0.3681   0.1368     0.3692   0.1370     0.3476   0.1368
Mean    5    5    0.6633   0.2355     0.6633   0.2355     0.6243   0.2494
Med.    5    5    0.6007   0.2202     0.6007   0.2202     0.5646   0.2311

Results with K(J) = 2(J − rJ) + rK + 1:
Mean    4    4    0.4188   0.1526     0.4188   0.1526     0.3895   0.1552
Med.    4    4    0.3696   0.1375     0.3696   0.1375     0.3470   0.1371
Mean    4    5    0.3918   0.1439     0.3945   0.1449     0.3720   0.1486
Med.    4    5    0.3430   0.1291     0.3430   0.1291     0.3295   0.1311
Mean    5    5    0.6366   0.2277     0.6366   0.2277     0.5816   0.2352
Med.    5    5    0.5800   0.2089     0.5800   0.2089     0.5228   0.2111
MC results: Lepski procedure, linear design
Table 2: Linear design, Legendre polynomial bases

              Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
           L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J:
Mean       0.0882   0.0492     0.2943   0.1185     0.0869   0.0494
Med.       0.0777   0.0452     0.1674   0.0810     0.0764   0.0453

Results with K(J) = 2J:
Mean       0.0878   0.0490     0.2745   0.1119     0.0862   0.0492
Med.       0.0779   0.0453     0.1640   0.0807     0.0766   0.0455
MC results: Lepski procedure, nonlinear design
Table 3: Nonlinear design, cubic (r = 4) and quartic (r = 5) B-spline bases

                     Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
       rJ   rK    L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J − rJ + rK:
Mean    4    4    0.4343   0.1621     0.4343   0.1621     0.4233   0.1671
Med.    4    4    0.3855   0.1469     0.3855   0.1469     0.3748   0.1503
Mean    4    5    0.4262   0.1600     0.4271   0.1605     0.4030   0.1615
Med.    4    5    0.3738   0.1444     0.3744   0.1445     0.3514   0.1445
Mean    5    5    0.6726   0.2407     0.6726   0.2407     0.6318   0.2531
Med.    5    5    0.6069   0.2278     0.6069   0.2278     0.5646   0.2345

Results with K(J) = 2(J − rJ) + rK + 1:
Mean    4    4    0.4271   0.1601     0.4286   0.1609     0.3987   0.1623
Med.    4    4    0.3764   0.1445     0.3764   0.1445     0.3518   0.1443
Mean    4    5    0.4002   0.1518     0.4029   0.1528     0.3812   0.1563
Med.    4    5    0.3410   0.1384     0.3414   0.1384     0.3258   0.1402
Mean    5    5    0.6471   0.2330     0.6471   0.2330     0.5895   0.2390
Med.    5    5    0.5797   0.2143     0.5797   0.2143     0.5341   0.2141
MC results: Lepski procedure, nonlinear design
Table 4: Nonlinear design, Legendre polynomial bases

              Lepski (σ̄ = 1)      Lepski (σ̄ = 0.1)    Infeasible
           L∞ loss  L2 loss    L∞ loss  L2 loss    L∞ loss  L2 loss

Results with K(J) = J:
Mean       0.2494   0.1305     0.4283   0.1719     0.2297   0.1224
Med.       0.2367   0.1266     0.3210   0.1426     0.2218   0.1243

Results with K(J) = 2J:
Mean       0.2475   0.1306     0.4063   0.1644     0.2241   0.1208
Med.       0.2346   0.1267     0.3132   0.1395     0.2178   0.1242
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
Pointwise and uniform inference
▶ our sup-norm rates allow mild, low-level conditions for asymptotic normality of plug-in sieve NPIV estimators of possibly nonlinear functionals of h_0 in two cases:

1. "pointwise" inference on f(h_0)
  ▶ e.g.: exact consumer surplus of a price change from p⁰ to p¹ at income i:

    Q_i = h_0(P_i, I_i) + u_i
    f(h_0) = S_i(p⁰), where S_i′(p) = −h_0(p, i − S_i(p)) and S_i(p¹) = 0

  cf. Hausman & Newey (95), Vanhems (10), Blundell et al. (12)

2. "uniform" inference on {f_τ(h_0) : τ ∈ 𝒯} where 𝒯 ⊂ R^{d_𝒯}
  ▶ e.g.: uniform inference on consumer surplus/deadweight loss
Pointwise inference (1)
▶ we focus on slower-than-√n functionals that are bounded w.r.t. the sup norm:

5. (i) there exists a linear functional Df(h_0)[·] and a constant C s.t.

  |f(h) − f(h_0) − Df(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

  for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
  (ii) V_n^{−1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{−1/2})

▶ includes CS/DWL functionals and the quadratic functional
▶ sufficient for a more general condition of Chen and Pouzo (14)
▶ here V_n ↗ ∞ is the sieve variance:

  V_n = Df(h_0)[ψ^J]′ Σ_n Df(h_0)[ψ^J]
  Σ_n = [S′G_b⁻¹S]⁻¹ ( S′G_b⁻¹ Ω G_b⁻¹ S ) [S′G_b⁻¹S]⁻¹,

  where S = E[b^K(W_i) ψ^J(X_i)′], Ω = E[u_i² b^K(W_i) b^K(W_i)′]
Pointwise inference (2)
Theorem (Pointwise asymptotic normality of sieve t-statistics)
Let Assumptions 1–5 (etc.) hold. Then:

  √n ( f(ĥ) − f(h_0) ) / V̂_n^{1/2} →_d N(0, 1).

▶ here V̂_n is the sieve variance estimator:

  V̂_n = Df(ĥ)[ψ^J]′ Σ̂ Df(ĥ)[ψ^J]
  Σ̂ = [Ŝ′Ĝ_b⁻¹Ŝ]⁻¹ ( Ŝ′Ĝ_b⁻¹ Ω̂ Ĝ_b⁻¹ Ŝ ) [Ŝ′Ĝ_b⁻¹Ŝ]⁻¹
  Ŝ = B′Ψ/n,  Ĝ_b = B′B/n,  Ω̂ = n⁻¹ Σ_{i=1}^n û_i² b^K(W_i) b^K(W_i)′

▶ just like the 2SLS variance estimator but using basis functions (sketched below); cf. Chen and Pouzo (14), Newey (13)
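A sketch of the t-statistic computed from these sample analogues, for a functional whose length-J derivative vector Df(ĥ)[ψ^J] is supplied by the caller:

```python
# Sketch: plug-in sieve variance estimator and t-statistic. Df is the
# length-J vector Df(h_hat)[psi^J]; u_hat are the residuals Y - Psi @ c_hat.
import numpy as np

def sieve_t_stat(f_hat, f_null, Df, Psi, B, u_hat):
    n = Psi.shape[0]
    S_hat = B.T @ Psi / n                             # S_hat = B'Psi/n, K x J
    Gb_inv = np.linalg.pinv(B.T @ B / n)
    Omega = (B * u_hat[:, None] ** 2).T @ B / n       # heteroskedasticity-robust
    A = np.linalg.pinv(S_hat.T @ Gb_inv @ S_hat)      # [S' Gb^{-1} S]^{-1}
    Sigma = A @ (S_hat.T @ Gb_inv @ Omega @ Gb_inv @ S_hat) @ A
    V_hat = Df @ Sigma @ Df                           # scalar sieve variance
    return np.sqrt(n) * (f_hat - f_null) / np.sqrt(V_hat)
```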
Uniform inference (1)
▶ now impose a uniform (in τ ∈ 𝒯) version of Assumption 5:

5′. (i) Df_τ(h_0)[·] is a linear functional for each τ ∈ 𝒯; (ii) there exists a constant C s.t.

  sup_{τ∈𝒯} |f_τ(h) − f_τ(h_0) − Df_τ(h_0)[h − h_0]| ≤ C ‖h − h_0‖²_∞

  for all h ∈ N_n(h_0), where ĥ ∈ N_n wpa1;
  (iii) sup_{τ∈𝒯} V_{τ,n}^{−1/2} ‖ĥ − h_0‖²_∞ = o_p(n^{−1/2})

▶ here V_{τ,n} = Df_τ(h_0)[ψ^J]′ Σ_n Df_τ(h_0)[ψ^J]
▶ estimate with V̂_{τ,n} = Df_τ(ĥ)[ψ^J]′ Σ̂ Df_τ(ĥ)[ψ^J]
Uniform inference (2)
Theorem (Uniform asymptotic normality of sieve t-statistics)
Let Assumptions 1–5′ (etc.) hold. Then there exists a sequence of tight Gaussian processes G_n on ℓ^∞(𝒯) with covariance function

  E[G_n(τ_1) G_n(τ_2)] = ( Df_{τ_1}(h_0)[ψ^J]′ Σ_n Df_{τ_2}(h_0)[ψ^J] ) / ( V_{τ_1,n}^{1/2} V_{τ_2,n}^{1/2} )

and random variables Z_n =_d sup_{τ∈𝒯} |G_n(τ)| such that

  sup_{τ∈𝒯} | √n ( f_τ(ĥ) − f_τ(h_0) ) / V̂_{τ,n}^{1/2} | = Z_n + o_p(1)

as n, J, K → ∞.

▶ we follow the Chernozhukov, Chetverikov & Kato (14) construction (see also Chernozhukov, Lee & Rosen (13)) rather than a strong approximation
Example: uniform confidence bands
▶ f_τ(h_0) = h_0(τ) with 𝒯 = 𝒳, and Df_τ(h)[ψ^J] = ψ^J(τ)
▶ by the previous theorem, there exists a sequence of tight Gaussian processes G_n on ℓ^∞(𝒳) with covariance function

  E[G_n(x_1) G_n(x_2)] = ( ψ^J(x_1)′ Σ_n ψ^J(x_2) ) / ( V_{x_1,n}^{1/2} V_{x_2,n}^{1/2} )

  and random variables Z_n =_d sup_{x∈𝒳} |G_n(x)| such that

  sup_{x∈𝒳} | √n ( ĥ(x) − h_0(x) ) / V̂_{x,n}^{1/2} | = Z_n + o_p(1)

  as n, J, K → ∞
▶ invert for a uniform confidence band
Example: uniform inference on exact consumer surplus
▶ f_τ(h_0) = S_i(p) with 𝒯 = [p̲⁰, p̄⁰] × [i̲, ī] (a computational sketch follows this slide)

  Df_τ(h_0)[ψ^J] = −∫_p^{p¹} ψ^J(t, i − S_i(t)) e^{∫_p^t ∂_2 h_0(u, i − S_i(u)) du} dt
  Df_τ(ĥ)[ψ^J] = −∫_p^{p¹} ψ^J(t, i − Ŝ_i(t)) e^{∫_p^t ∂_2 ĥ(u, i − Ŝ_i(u)) du} dt

▶ uniform asymptotic normality of {Ŝ_i(p) : (p, i) ∈ [p̲⁰, p̄⁰] × [i̲, ī]} follows from the previous theorem
▶ could equally consider uniform inference on deadweight loss
▶ our sup-norm rates are critical here to control the bias
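Given any fitted ĥ, the plug-in f(ĥ) = Ŝ_i(p⁰) can be computed by integrating the Hausman ODE backwards from the terminal condition at p¹; a sketch using scipy's generic ODE solver (our choice of solver, not the paper's implementation):

```python
# Sketch: exact consumer surplus by integrating the Hausman ODE
# S'(p) = -h_hat(p, income - S(p)) backwards from S(p1) = 0 to p0,
# plugging in any fitted demand function h_hat(p, y).
import numpy as np
from scipy.integrate import solve_ivp

def consumer_surplus(h_hat, p0, p1, income):
    """Return S(p0), the exact consumer surplus of a price change p0 -> p1."""
    ode = lambda p, S: [-h_hat(p, income - S[0])]
    sol = solve_ivp(ode, t_span=(p1, p0), y0=[0.0])   # integrates from p1 to p0
    return sol.y[0, -1]
```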
Outline
1. Optimal sup-norm rates
2. Sup-norm rate-adaptive estimation
3. MC study I: Adaptive estimation procedure
4. Application: Asymptotic normality of plug-in NPIV of nonlinear functionals
5. MC study II: Bootstrap uniform confidence sets
MC design
▶ same Newey and Powell (03) linear and nonlinear designs
▶ estimate Ĵ_max as before; use J = Ĵ_max, K = K(Ĵ_max) to implement the sieve NPIV estimator
▶ estimate critical values for uniform confidence bands using the sieve score bootstrap (Chen and Pouzo, 14) with the Mammen (93) two-point distribution, using 1000 bootstrap replications for each sample (multiplier sketch below)
▶ computationally simpler than the bootstrap sieve t-statistic in Chen-Pouzo (14) or the bootstrap in Horowitz-Lee (12)
▶ compare MC coverage with nominal coverage probabilities
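A rough sketch of multiplier-bootstrap critical values with Mammen two-point weights. This is a generic multiplier form for the sup of a studentized process on a grid, our own simplification of the sieve score bootstrap; `scores` (an n × G matrix of studentized per-observation influence terms, one column per grid point) is assumed pre-computed:

```python
# Sketch: sup-t critical value via a multiplier bootstrap with Mammen (93)
# two-point weights (mean 0, variance 1). A simplification of the sieve
# score bootstrap, not the authors' exact algorithm.
import numpy as np

def mammen_weights(rng, size):
    s5 = np.sqrt(5.0)
    lo, hi = (1 - s5) / 2, (1 + s5) / 2
    p_lo = (s5 + 1) / (2 * s5)             # P(w = lo) ~ 0.7236
    return np.where(rng.random(size) < p_lo, lo, hi)

def sup_t_critical_value(scores, alpha=0.05, n_boot=1000, seed=0):
    """(1 - alpha) quantile of sup_g | n^{-1/2} sum_i w_i * scores[i, g] |."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    sups = np.empty(n_boot)
    for b in range(n_boot):
        w = mammen_weights(rng, n)
        sups[b] = np.max(np.abs(w @ scores)) / np.sqrt(n)
    return np.quantile(sups, 1 - alpha)
```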
Estimated UCBs (dashed), ĥ (black line), h_0 (red line)

[Figure: estimated uniform confidence bands around ĥ together with h_0, over x ∈ [0, 1]; values roughly in [−2.5, 2].]
MC results: coverage probabilities
Table 5: Linear and nonlinear designs, cubic (r = 4) and quartic (r = 5) B-spline bases

                     K(J) = J − rJ + rK      K(J) = 2(J − rJ) + rK + 1
            rJ  rK   90%    95%    99%       90%    95%    99%
linear       4   4   0.933  0.966  0.996     0.944  0.971  0.994
linear       4   5   0.937  0.975  0.995     0.937  0.963  0.994
linear       5   5   0.961  0.983  0.997     0.959  0.985  0.997
nonlinear    4   4   0.884  0.945  0.987     0.912  0.956  0.989
nonlinear    4   5   0.894  0.946  0.987     0.906  0.951  0.987
nonlinear    5   5   0.956  0.978  0.995     0.951  0.979  0.996
MC results: coverage probabilities
Table 6: Linear and nonlinear designs, Legendre polynomial bases

             K(J) = J               K(J) = 2J
             90%    95%    99%      90%    95%    99%
linear       0.937  0.964  0.997    0.928  0.959  0.989
nonlinear    0.901  0.952  0.988    0.906  0.948  0.989
Conclusions
▶ contributions:
  1. optimal sup-norm rates and attainability by sieve estimators
  2. Lepski procedure for adaptive estimation in sup norm
  3. pointwise and uniform inference on possibly nonlinear functionals
▶ first such results for NPIV (or indeed for any ill-posed inverse problem with unknown operator)
▶ application to inference on consumer surplus in demand estimation and uniform confidence bands