
doi: 10.1111/j.1467-9469.2007.00565.x © Board of the Foundation of the Scandinavian Journal of Statistics 2007. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA, 2007

Bootstrap Bandwidth Selection Using an h-Dependent Pilot Bandwidth

JOSÉ E. CHACÓN, JESÚS MONTANERO and AGUSTÍN G. NOGALES

Departamento de Matemáticas, Universidad de Extremadura

ABSTRACT. The problem of choosing the bandwidth h for kernel density estimation is considered. All the plug-in-type bandwidth selection methods require the use of a pilot bandwidth g. The usual way to make an h-dependent choice of g is by obtaining their asymptotic expressions separately and solving the two equations. In contrast, we obtain the asymptotically optimal value of g for every fixed h, thus making our selection ‘less asymptotic’. Exact error expressions show that some usually assumed hypotheses have to be discarded in the asymptotic study in this case. Two versions of a new bandwidth selector based on this idea are proposed, and their properties are analysed through theoretical results and a simulation study.

Key words: bootstrap bandwidth choice, h-dependent pilot bandwidth, kernel density estimation, plug-in method

1. Introduction

One of the main problems in kernel density estimation is the choice of an appropriate bandwidth, as it is well known that the performance of the estimator depends heavily on this parameter. A wide variety of methods arose in the 1980s and early 1990s to solve the problem of automatic (i.e. data-dependent) bandwidth selection. We are going to focus here on the $L_2$ context, although there are also some $L_1$-oriented bandwidth selectors (see Devroye, 1997, for a good review). Essentially, there are three possible approaches to automatic bandwidth selection problems: the reference distribution approach, the use of cross-validation techniques and the plug-in ideas.

The reference distribution approach consists of obtaining an asymptotic explicit formula for the optimal bandwidth and replacing there the unknown density with the density of some reference distribution; the most usual of these reference distributions is the Gaussian one with mean zero and estimated variance (Silverman, 1986).

The cross-validation bandwidth selector for kernel density estimation was introduced in Rudemo (1982) and Bowman (1984), and many efforts have been devoted to study its properties. Although it can be viewed as a leave-one-out method, a good motivation for this selector comes from the fact that it minimizes the minimum variance unbiased estimator (MVUE) of a shifted version of the expected $L_2$ error of the kernel estimator (section 2), also known as mean integrated squared error (MISE).

Basically, all plug-in methods try to estimate the MISE of the kernel estimate (or an asymptotic version of it, the AMISE) and set the bandwidth selector as the minimizer of the estimated criterion function. As both MISE and AMISE depend on the unknown density function, another kernel density estimation, using a second ‘pilot’ bandwidth, is needed. To select this pilot bandwidth according to some criterion we will usually need another density estimation. The options are to use a reference distribution here or to continue to the next stage, using another kernel density estimator and so on, ending up this process by using a reference distribution at some stage.

Reviews of all these bandwidth selection methods can be found in Cao et al. (1994), Chiu (1996), Jones et al. (1996) or Devroye (1997).


In many plug-in methods, the pilot bandwidth $g$ is allowed to depend on the original bandwidth $h$. This is done, for instance, in Park & Marron (1990) or Sheather & Jones (1991). To this end, asymptotic expressions for $h$ and $g$ are given separately and then they are combined to get the dependence between $h$ and $g$, usually following a pattern like $g = C n^p h^m$ for constants $C$, $p$ and $m$, where $n$ is the sample size (see also Jones et al., 1991). However, the exact, non-asymptotic calculations in section 3 show that this kind of parametrization may not be adequate in some cases. In contrast, our goal is to obtain an expression for the optimal $g$ as a function of $h$ for every fixed $h$, making the choice ‘less asymptotic’. This is done in section 4. Then, data-dependent pilot bandwidth selectors aiming to estimate this optimal pilot bandwidth function are provided in section 5, and the finite sample performance of the resulting selectors of the original bandwidth is investigated via a simulation study in section 6. Finally, section 7 contains all the proofs.

2. Plug-in bandwidth selection

The kernel density estimator based on a sample $X_1, \ldots, X_n$ with common density $f$ is given by

$$f_{n,K,h}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i),$$

where the kernel $K$ is an integrable symmetric function with $\int K = 1$, the bandwidth $h$ is a positive real number and we have used the notation $K_h(x) = K(x/h)/h$. We will measure the error of this estimator through the MISE, defined by

$$\mathrm{MISE}_n(h) = E \int [f_{n,K,h}(x) - f(x)]^2 \, dx$$

and our goal is to propose a new bandwidth selector; that is, an estimator of the optimal bandwidth $h_{0n} = \operatorname{argmin}_{h>0} \mathrm{MISE}_n(h)$. To this end, we must propose an estimator $\hat M_n(h)$ of the MISE function and then define the selector as the minimizer of this criterion function $\hat M_n(h)$.
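The definition of the kernel estimator can be made concrete with a short Python sketch. It assumes a Gaussian kernel and a simulated standard normal sample; the function names `gaussian_kernel` and `kde` are illustrative choices and do not come from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Evaluate (1/n) * sum_i K_h(x - X_i), with K_h(u) = K(u/h)/h."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (x[:, None] - sample[None, :]) / h   # one row per evaluation point
    return kernel(u).mean(axis=1) / h

# Illustrative data: 200 draws from a standard normal density.
rng = np.random.default_rng(0)
sample = rng.standard_normal(200)
grid = np.linspace(-6.0, 6.0, 2401)
density = kde(grid, sample, h=0.4)
```

Since $K$ integrates to one, so does the estimate, which gives a quick sanity check on any implementation.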

It is easy to show (Wand & Jones, 1995) that the MISE function can be written as

$$\mathrm{MISE}_n(h) = R(f) + \frac{R(K)}{nh} + R_{\bar K,h}(f),$$

where we have used the notation

$$R(L) = \int L^2, \qquad R_{L,h}(f) = \int (L_h * f) f, \qquad \bar K = \frac{n-1}{n}(K * K) - 2K,$$

with $*$ standing for the convolution operator. Therefore, the optimal bandwidth can be defined also as the minimizer of the function

$$M_n(h) = \mathrm{MISE}_n(h) - R(f) = \frac{R(K)}{nh} + R_{\bar K,h}(f),$$

which depends on the unknown density $f$ only through the functional $R_{\bar K,h}(f)$. That is the quantity that we must estimate to provide an estimator of $M_n(h)$. It can be expressed as $R_{\bar K,h}(f) = E[\bar K_h(X_1 - X_2)]$ so that it is a regular statistical functional of order 2. Therefore, for every fixed $h > 0$ its MVUE is given by the U-statistic

$$S_n(h) = \frac{1}{n(n-1)} \sum_{i \neq j} \bar K_h(X_i - X_j),$$

as shown in Lee (1990, chap. 1). This leads to the well-known cross-validation criterion

$$\mathrm{CV}_n(h) = \frac{R(K)}{nh} + S_n(h) \tag{1}$$

and the cross-validation bandwidth selector $H_{CV} = \operatorname{argmin}_{h>0} \mathrm{CV}_n(h)$, which has been extensively studied in papers like Hall (1983), Stone (1984), or Hall & Marron (1987a).
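The cross-validation criterion (1) can be sketched in Python as follows. The sketch assumes a Gaussian kernel, for which $R(K) = 1/(2\sqrt{\pi})$ and $(K * K)_h$ is the $N(0, 2h^2)$ density; the grid search in `cv_bandwidth` is an illustrative choice of minimizer, not the paper's.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def cv_criterion(h, sample):
    """CV_n(h) = R(K)/(nh) + S_n(h) for the Gaussian kernel."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]
    off_diag = ~np.eye(n, dtype=bool)
    # Kbar_h = ((n-1)/n) (K*K)_h - 2 K_h, evaluated at all pairwise differences
    kbar = (n - 1.0) / n * normal_pdf(d, 2.0 * h**2) - 2.0 * normal_pdf(d, h**2)
    s_n = kbar[off_diag].sum() / (n * (n - 1.0))   # U-statistic over i != j
    return 1.0 / (2.0 * np.sqrt(np.pi) * n * h) + s_n

def cv_bandwidth(sample, grid):
    """Grid minimizer of the cross-validation criterion."""
    values = np.array([cv_criterion(h, sample) for h in grid])
    return grid[np.argmin(values)]
```

Written this way, the criterion coincides with the classical least-squares form $\int f_{n,K,h}^2 - 2n^{-1}\sum_i f_{n,K,h}^{(-i)}(X_i)$, where $f^{(-i)}$ denotes the leave-one-out estimate.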

Essentially, all plug-in approaches consist of estimating $M_n(h)$ by smoothing methods and selecting the bandwidth that minimizes the estimated version of $M_n(h)$. Sheather & Jones (1991), Jones et al. (1991) and Hall et al. (1992) first consider an asymptotic approximation of $M_n(h)$, also depending on the unknown $f$, and then estimate this approximation by replacing $f$ with another kernel estimator $f_{n,L,g}$ with possibly different kernel $L$ and pilot bandwidth $g$. Bootstrap methods proceed in the same way but try to estimate the exact $M_n(h)$ instead, that is, they use $M^*_{n,L,g}(h) = R(K)/(nh) + R_{\bar K,h}(f_{n,L,g})$ as an estimate for $M_n(h)$; see Marron (1992), Cao (1993) or Grund & Polzehl (1997) and the references therein. This way, plug-in and bootstrap procedures are closely related; they could be considered as asymptotically equivalent methods (Loader, 1999).
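To see what these methods are trying to estimate, the target function $M_n(h)$ can be evaluated in closed form in one special case. The sketch below assumes that both the kernel and the true density are standard normal; under that assumption the smoothed functional reduces to a sum of two normal densities evaluated at zero. This closed form is our own evaluation for illustration, and the names `m_n_gaussian` and `optimal_bandwidth` are hypothetical.

```python
import numpy as np

def m_n_gaussian(h, n):
    """M_n(h) = R(K)/(nh) + E[Kbar_h(X1 - X2)] when K and f are standard normal."""
    h = np.asarray(h, dtype=float)
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))      # R(K) for the Gaussian kernel
    alphas = {1: -2.0, 2: (n - 1.0) / n}    # Kbar_h = sum_p alpha_p * N(0, p h^2) density
    # X1 - X2 ~ N(0, 2), so each term is an N(0, p h^2 + 2) density at zero
    smooth = sum(a / np.sqrt(2.0 * np.pi * (p * h**2 + 2.0)) for p, a in alphas.items())
    return r_k / (n * h) + smooth

def optimal_bandwidth(n, grid=None):
    """Grid approximation to the minimizer of M_n over h."""
    grid = np.linspace(0.05, 1.5, 14501) if grid is None else grid
    return grid[np.argmin(m_n_gaussian(grid, n))]

h0_100 = optimal_bandwidth(100)
h0_400 = optimal_bandwidth(400)
```

In practice $f$ is unknown, so this function is not available; it only serves as a benchmark against which data-driven criteria can be compared in simulations.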

The plug-in estimator of the functional $R_{\bar K,h}(f)$ can be written as

$$R_{\bar K,h}(f_{n,L,g}) = \frac{1}{n^2} \sum_{i,j=1}^{n} (\bar K_h * \tilde L_g)(X_i - X_j),$$

where $\tilde L = L * L$. It includes the non-random term $(\bar K_h * \tilde L_g)(0)/n = R_{\bar K,h}(L_g)/n$. This suggests replacing the bootstrap estimator $M^*_{n,L,g}$ by its no-diagonal version $M^*_{n,L,g}(h) = R(K)/(nh) + T_{n,L,g}(h)$, where

$$T_{n,L,g}(h) = \frac{1}{n(n-1)} \sum_{i \neq j} (\bar K_h * \tilde L_g)(X_i - X_j). \tag{2}$$

This way, $T_{n,L,g}(h)$ is the MVUE of $E[(\bar K_h * \tilde L_g)(X_1 - X_2)] = R_{\bar K,h}(L_g * f)$, which is in fact a smoothed version of $R_{\bar K,h}(f)$. Then, we select $H^* = \operatorname{argmin}_{h>0} M^*_{n,L,g}(h)$.

The previous elimination of the diagonal terms is the analogue of the one presented in Hall & Marron (1987b) for a similar problem. Jones & Sheather (1991) showed that some improvement could be obtained in that problem by reinstating the diagonal terms. In our case, the asymptotics for the corresponding diagonals-in version also improved; however, some preliminary simulations showed an unacceptable performance of the diagonals-in selector in practice, and a decision was made not to include its asymptotic results here.
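A minimal Python sketch of the no-diagonal bootstrap criterion follows, under the assumption that both $K$ and $L$ are the standard normal density. In that case convolutions of Gaussians simply add variances, so the convolved kernel in (2) is a combination of the $N(0, h^2 + 2g^2)$ and $N(0, 2h^2 + 2g^2)$ densities. The argument `pilot`, a function returning $g$ for each $h$, is a hypothetical interface.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def bootstrap_criterion(h, g, sample):
    """R(K)/(nh) + T_{n,L,g}(h) with K = L = standard normal density."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]
    off_diag = ~np.eye(n, dtype=bool)
    # Convolved kernel: ((n-1)/n) N(0, 2h^2 + 2g^2) - 2 N(0, h^2 + 2g^2)
    conv = ((n - 1.0) / n * normal_pdf(d, 2.0 * h**2 + 2.0 * g**2)
            - 2.0 * normal_pdf(d, h**2 + 2.0 * g**2))
    t_n = conv[off_diag].sum() / (n * (n - 1.0))   # diagonal terms excluded
    return 1.0 / (2.0 * np.sqrt(np.pi) * n * h) + t_n

def bootstrap_bandwidth(sample, grid, pilot):
    """Grid minimizer over h; pilot(h) supplies the h-dependent pilot bandwidth g."""
    values = np.array([bootstrap_criterion(h, pilot(h), sample) for h in grid])
    return grid[np.argmin(values)]
```

Setting `g = 0` in this sketch removes the extra smoothing and gives back the cross-validation criterion, which is a convenient check of the implementation.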

One of the disadvantages of the plug-in approach is that it raises the new problem of choosing the pilot bandwidth $g$. Mimicking the steps for selecting $h$, we must evaluate the mean square error (MSE) function of $M^*_{n,L,g}(h)$ as an estimate of $M_n(h)$, $\mathrm{MSE}_n(h, g) = E[\{M^*_{n,L,g}(h) - M_n(h)\}^2]$, and try to estimate the optimal pilot bandwidth, $g_{0n}(h)$, which minimizes $\mathrm{MSE}_n(h, g)$ as a function of $g$, i.e. $g_{0n}(h) = \operatorname{argmin}_{g>0} \mathrm{MSE}_n(h, g)$. Notice that, using this methodology, no global $L_p$ measure of the performance of $M^*_{n,L,g}$ seems appropriate, since $M_n \notin L_p$ for any $p \ge 1$. Therefore, the pilot bandwidth must be necessarily local for this problem; that is, we must choose one $g$ for every fixed $h > 0$. We will refer to this as an h-dependent bandwidth. As mentioned in the introduction, the fact that the pilot bandwidth should depend on $h$ had been previously introduced in the literature with different motivations; see Park & Marron (1990), Jones et al. (1991) or Sheather & Jones (1991).


3. Exact error formulas

To provide a new method for choosing the pilot bandwidth we must estimate the MSE of the estimator $M^*_{n,L,g}(h)$ in the previous section. As usual, it is possible to give a squared bias–variance decomposition of the MSE, in a way such that $\mathrm{MSE}_n(h, g) = B_n^2(h, g) + V_n(h, g)$. Next we provide exact expressions for each of these terms. Let us denote $\psi \equiv \psi_{h,g} = \bar K_h * \tilde L_g$ and $\delta \equiv \delta_{h,g} = \psi_{h,g} - \psi_{h,0} = \bar K_h * \tilde L_g - \bar K_h$.

Proposition 1
The squared bias and variance functions can be expressed as

$$B_n^2(h, g) = E[\delta(X_1 - X_2)]^2,$$

$$V_n(h, g) = \frac{4(n-2)}{n(n-1)} \xi_1 + \frac{2}{n(n-1)} \xi_2 - \frac{4n-6}{n(n-1)} \xi_0,$$

where $\xi_0 = E[\psi(X_1 - X_2)]^2$, $\xi_1 = E[\psi(X_1 - X_2)\psi(X_2 - X_3)]$ and $\xi_2 = E[\psi(X_1 - X_2)^2]$.

A useful application of proposition 1 is the calculation of explicit exact error formulas in the Gaussian case; that is, when $K = L = f = \phi$, with $\phi$ standing for the density of the standard normal distribution, $\phi(x) = \exp(-x^2/2)/\sqrt{2\pi}$. Notice that we only need to calculate $\xi_i$ for $i = 0, 1, 2$ to give an exact expression for $\mathrm{MSE}_n(h, g)$.

Proposition 2
Let us denote $\alpha_1 = -2$, $\alpha_2 = (n-1)/n$ and $\sigma_{pi}^2 \equiv \sigma_{pi}^2(h, g) = p h^2 + 2g^2 + 2 - i$. Then, for $i = 0, 1, 2$, we have

$$\xi_i = \frac{1}{2\pi} \sum_{p,q=1}^{2} \frac{\alpha_p \alpha_q}{(i\sigma_{pi}^2 + i\sigma_{qi}^2 + \sigma_{pi}^2 \sigma_{qi}^2)^{1/2}}.$$

Once we have an explicit formula for $\mathrm{MSE}_n(h, g)$ we can find $g_{0n}(h)$ numerically in the Gaussian case. This exact optimal pilot bandwidth exhibits some surprising properties:

1. Figure 1 shows $g_{0n}(h)$ as a function of $h$ for $n = 100$. It is clear that $g_{0n}(h) \not\to 0$ as $h \to 0$; in fact, $g_{0,100}(0) \approx 0.226$. Jones et al. (1991), generalizing some approaches such as Park & Marron (1990) or Sheather & Jones (1991), suggest taking a factorized pilot bandwidth $g_n(h) = C n^p h^m$ and obtain a remarkably fast convergence rate by choosing appropriate values for $p$ and $m$. Figure 1 shows that in this case this parametrization may not be adequate, in the sense that it would never fulfill the condition that $g_n(h)$ be a non-constant function converging to a finite non-zero value as $h \to 0$.

[Fig. 1. The function $g_{0n}(h)$ for $n = 100$. Horizontal axis: $h$ from 0.0 to 1.0; vertical axis: $g_{0n}(h)$, with ticks from 0.17 to 0.22.]

2. However, for the study of asymptotic results what we need to know is the asymptotic behaviour of the optimal pilot bandwidth function $g_{0n}(h)$ evaluated at the optimal bandwidth $h_{0n}$. Then we get a sequence that seems to satisfy the classical conditions $g_{0n}(h_{0n}) \to 0$ and $n g_{0n}(h_{0n}) \to \infty$ (Fig. 2).

3. The most important feature that may be extracted from the analysis of the exact calculations for the Gaussian case is the relative behaviour of the two bandwidth sequences, $h_{0n}$ and $g_{0n}(h_{0n})$. The common assumption when choosing a pilot bandwidth sequence $g_n$ is that $h_{0n}/g_n \to c$ for some $0 \le c < \infty$ as $n \to \infty$; see, e.g. Hall et al. (1992) or Marron (1992). However, in view of Fig. 3 we have some evidence that in this case we may expect $h_{0n}/g_{0n}(h_{0n}) \to \infty$.

The previous remarks show that for the asymptotic study of the MSE some usually assumed hypotheses about the pilot bandwidth sequence have to be discarded. For this reason, at the time of giving an asymptotic representation of $g_{0n}(h)$ we will only assume that $g_{0n}(h) \to 0$ pointwise (see theorems 1 and 2).
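The numerical search for the exact optimal pilot bandwidth in the Gaussian case can be sketched in Python. The sketch evaluates the double sums of proposition 2, combines them as in proposition 1, and minimizes over a grid of pilot bandwidths; the grid and the function names (`xi`, `mean_psi`, `exact_mse`, `optimal_pilot`) are illustrative assumptions, and the bias term uses the fact that the expectation of the convolved kernel is a sum of normal densities evaluated at zero.

```python
import numpy as np

def xi(i, h, g, n):
    """Double sum of proposition 2 for i = 0, 1, 2 (K = L = f = standard normal)."""
    alpha = {1: -2.0, 2: (n - 1.0) / n}
    total = 0.0
    for p in (1, 2):
        for q in (1, 2):
            sp = p * h**2 + 2.0 * g**2 + 2.0 - i
            sq = q * h**2 + 2.0 * g**2 + 2.0 - i
            total = total + alpha[p] * alpha[q] / np.sqrt(i * sp + i * sq + sp * sq)
    return total / (2.0 * np.pi)

def mean_psi(h, g, n):
    """Expectation of the convolved kernel at X1 - X2; its square is the i = 0 sum."""
    alpha = {1: -2.0, 2: (n - 1.0) / n}
    return sum(a / np.sqrt(2.0 * np.pi * (p * h**2 + 2.0 * g**2 + 2.0))
               for p, a in alpha.items())

def exact_mse(h, g, n):
    """Squared bias plus variance, following proposition 1."""
    bias = mean_psi(h, g, n) - mean_psi(h, 0.0, n)
    var = (4.0 * (n - 2.0) * xi(1, h, g, n) + 2.0 * xi(2, h, g, n)
           - (4.0 * n - 6.0) * xi(0, h, g, n)) / (n * (n - 1.0))
    return bias**2 + var

def optimal_pilot(h, n, grid=np.linspace(0.01, 1.0, 991)):
    """Grid approximation to the minimizer of the exact MSE over g."""
    return grid[np.argmin(exact_mse(h, grid, n))]

g_at_zero = optimal_pilot(0.0, 100)   # compare with the value reported in item 1
```

A grid search is used here only for transparency; any one-dimensional optimizer could replace it.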

[Fig. 2. The sequences $g_{0n}(h_{0n})$ and $n g_{0n}(h_{0n})$. Two panels; horizontal axis: $\log_{10}(n)$ from 1 to 7.]

[Fig. 3. The sequence $h_{0n}/g_{0n}(h_{0n})$. Horizontal axis: $\log_{10}(n)$ from 1 to 7.]

4. Asymptotics

4.1. Asymptotically optimal pilot bandwidth

In this section we are going to push forward from the exact formulas of the previous section to new, asymptotic ones, for $\mathrm{MSE}_n(h, g)$. This will provide us with an asymptotic formula for $g_{0n}(h)$.

Let us denote $m_j(\eta) = \int x^j \eta(x)\,dx / j!$ and $|m_j|(\eta) = \int |x^j \eta(x)|\,dx / j!$. Then, an integrable function $\eta$ (not necessarily a kernel) is said to be of finite order if the set $M = \{j \in \mathbb{N}, j \ge 1 : |m_j|(\eta) < \infty, m_j(\eta) \ne 0\}$ is non-empty. In this case, $a = \min M$ is called the order of $\eta$. We will consider here only finite-order symmetric kernel functions. In fact, the following conditions will be assumed:

(K1) $L$ is a symmetric kernel of even order $l$ with $|m_{l+2}|(L) < \infty$, $K$ is a symmetric kernel of even order $k$ with $|m_{2k+2}|(K) < \infty$.
(K2) $K$ has $l + 2$ bounded continuous derivatives. For $j = 0, 1, \ldots, l + 1$, the $j$th derivative $K^{(j)}$ is such that $\lim_{|x| \to \infty} K^{(j)}(x) = 0$.
(K3) $L$ has $2k$ integrable continuous derivatives.

We want to remark that, although these assumptions may not be minimal for some of the proofs, we prefer not to specify the minimal ones in order to keep the exposition as clear as possible.

The following result gives an asymptotic expansion of the MSE function with no conditions on the density $f$. The notation $R_{\eta,\omega,h}(f) = \int (\eta_h * f)(\omega_h * f) f$ will be used.

Theorem 1
Assume that $K$ and $L$ satisfy conditions (K1) and (K2). Let $g \equiv g_n(h)$ be any sequence of bandwidth functions such that $g \to 0$ pointwise as $n \to \infty$. Then, for any density $f$,

$$\mathrm{MSE}_n(h, g) = C_{\sigma 1}/n + C_{\sigma 2}\, g^l/n + C_\beta^2\, g^{2l} + O(1/n^2) + O(g^{l+2}/n) + O(g^{2l+2}),$$

where

$$C_{\sigma 1} \equiv C_{\sigma 1}(h) = 4[R_{\bar K,\bar K,h}(f) - R_{\bar K,h}(f)^2],$$

$$C_{\sigma 2} \equiv C_{\sigma 2}(h) = 16\, m_l(L)[R_{\bar K^{(l)},\bar K,h}(f)/h^l - R_{\bar K^{(l)},h}(f) R_{\bar K,h}(f)/h^l],$$

$$C_\beta \equiv C_\beta(h) = 2\, m_l(L) R_{\bar K^{(l)},h}(f)/h^l.$$

Remark 1. The $O$ notation in theorem 1 is meant to be pointwise; that is, two sequences of functions $a_n(h)$ and $b_n(h)$ satisfy $a_n = O(b_n)$ if

$$\limsup_{n \to \infty} \frac{a_n(h)}{b_n(h)} < \infty$$

for every fixed $h$.

Next we provide asymptotic representations for the optimal pilot bandwidth sequence $g_{0n}(h)$. For this result we need the following smoothness assumption on the density:

(D) The density $f$ has $2k + l + 2$ bounded integrable derivatives.

Theorem 2
Assume that $K$, $L$ and $f$ satisfy conditions (K1), (K2) and (D). Then, the optimal pilot bandwidth function $g_{0n}(h)$ satisfies

$$g_{0n}(h) \sim \begin{cases} (-C_{\sigma 2} C_\beta^{-2}/2)^{1/l}\, n^{-1/l} & \text{if } C_{\sigma 2} < 0 \\ 0 & \text{if } C_{\sigma 2} \ge 0. \end{cases}$$

Further, let us denote

$$C_{\sigma 2} = -16\,|m_l(L)| \left[ \int (f^2)^{(l/2)} f^{(l/2)} - R(f^{(l/2)}) R(f) \right],$$

$$C_\beta = 2\,|m_l(L)|\, R(f^{(l/2)}).$$

If $C_{\sigma 2} < 0$ and $h = h_n \to 0$ then

$$g_{0n}(h_n) \sim (-C_{\sigma 2} C_\beta^{-2}/2)^{1/l}\, n^{-1/l}. \tag{3}$$
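The form of the asymptotically optimal pilot bandwidth can be motivated by a short calculation on the leading terms of the expansion in theorem 1. This is a heuristic sketch only, not the proof given in the paper. Here $A$ stands for the $h$-dependent coefficient of $g^l/n$ and $B^2$ for the coefficient of $g^{2l}$ in that expansion; both labels are introduced just for this illustration, and the remainder terms are ignored.

```latex
% Leading g-dependent part of the MSE expansion, for fixed h and n:
\Phi(g) = \frac{A\, g^{l}}{n} + B^{2} g^{2l}, \qquad g > 0.
% Differentiate with respect to g:
\Phi'(g) = l\, g^{l-1} \left( \frac{A}{n} + 2 B^{2} g^{l} \right).
% If A < 0, the bracket vanishes at a positive g:
g^{l} = -\frac{A}{2 B^{2} n}
\quad \Longrightarrow \quad
g = \left( -\frac{A\, B^{-2}}{2} \right)^{1/l} n^{-1/l}.
% If A >= 0, then \Phi'(g) > 0 for all g > 0, so \Phi is increasing
% and its infimum is approached as g -> 0.
```

The two cases match the case distinction in the statement of theorem 2: a negative coefficient of $g^l/n$ gives a pilot bandwidth of order $n^{-1/l}$, while a non-negative one gives the degenerate value zero.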

Remark 2. All the direct plug-in methods try to estimate a functional $R(f^{(k)})$ by $R(f^{(k)}_{n,L,g})$ for one or several $k$'s and the choice $g = 0$ for this estimator is meaningless. For the bootstrap method, as stated in Hall et al. (1992) the use of $g = 0$ in $M^*_{n,L,g}(h)$ yields exactly the same cross-validation criterion $\mathrm{CV}_n(h)$ as defined in (1), so that $g = 0$ is indeed a choice.

Remark 3. To provide the asymptotic behaviour of the sequence $g_{0n}(h_n)$ for $h_n \to 0$ we have needed to impose the condition on the density that $C_{\sigma 2} = C_{\sigma 2}(0) < 0$. Although unfortunately we have not been able to prove it, we believe that in fact this condition holds for any density $f$ for which $C_{\sigma 2}$ is well-defined. It is easy to check that it holds in the Gaussian case $f(x) = \phi_\sigma(x - \mu)$ (section 5.1) and we have not found any example of a density for which $C_{\sigma 2} > 0$.

Remark 4. It is well-known that for a smooth enough density $f$, the optimal bandwidth $h_{0n}$ is of order $n^{-1/(2k+1)}$ if a kernel $K$ of order $k$ is used to estimate $f$ (see Silverman, 1986, or Wand & Jones, 1995). Comparing these rates with that of the optimal pilot bandwidth $g_{0n}(h_{0n})$ we obtain

$$h_{0n}/g_{0n}(h_{0n}) \to \begin{cases} \infty & \text{if } l \le 2k \\ 0 & \text{if } l > 2k. \end{cases} \tag{4}$$

Therefore, for the asymptotic study, the relative behaviour of the two bandwidth sequences is given by the relationship between the orders of the kernels $K$ and $L$.

Henceforth, for a function $\eta$, we will denote $\partial^i \eta/\partial h^i$ by $\eta^{[i]}$. Another important property of the optimal bandwidth sequence is the relationship between $g_{0n}(h_{0n})$ and $g^{[1]}_{0n}(h_{0n})$.

Corollary 1
If $K$ and $f$ satisfy conditions (K1)–(K3) and (D) and $h = h_{0n}$, then

$$\frac{g^{[1]}_{0n}(h)}{g_{0n}(h)} \sim (-1)^{l/2+1}\, d_K \gamma_f\, h^{2k-1}/l, \tag{5}$$

where $\gamma_f$ is a constant depending only on $f$ and $d_K = 2k\, m_k(K)^2$.

4.2. Convergence rates of bootstrap selectors with asymptotically optimal pilot bandwidths

Once we have studied the asymptotically best choice of the pilot bandwidth function $g_{0n}(h)$ it is natural to explore how this choice is related with the original problem; that is, with the problem of choosing the bandwidth $h$ for estimating the density. Indeed, next we give the asymptotic behaviour of the selector $H^*$ when $g_n(h)$ is any sequence of pilot bandwidth functions satisfying conditions (3), (4) and (5).


Theorem 3
Suppose that $K$, $L$ and $f$ satisfy conditions (K1)–(K3) and (D). Let $g_n(h) = C_0(h)\, n^{-1/l}$ be a sequence of positive functions such that, as $n \to \infty$,

$$C_0(h_{0n}) \sim C_0, \qquad C_0^{[1]}(h_{0n})/C_0(h_{0n}) \sim \gamma_0\, h_{0n}^{2k-1},$$

where $C_0$ and $\gamma_0$ are positive constants. Then, the distribution of $n^r (H^* - h_{0n})/h_{0n}$ is asymptotically $N(0, \sigma^2)$, where:

(i) If $2 \le l \le 2k$, then

$$r = \frac{1}{4k+2}, \qquad \sigma^2 = \frac{2 R(\rho) R(f)}{(2k+1)^2} \left( d_K R(f^{(k)}) R(K)^{4k+1} \right)^{-1/(2k+1)},$$

with $\rho(x) = x (K * K)'(x) - 2x K'(x)$.

(ii) If $2k + 2 \le l \le 4k$, then

$$r = \frac{2l - 4k - 1}{2l}, \qquad \sigma^2 = \frac{2 R(L^{(2k)}) R(f)}{(2k+1)^2\, C_0^{4k+1} R(f^{(k)})^2}.$$

(iii) If $l \ge 4k + 2$, then

$$r = \frac{1}{2}, \qquad \sigma^2 = \frac{4}{(2k+1)^2} \left[ \frac{R(f^{(2k)} f^{1/2})}{R(f^{(k)})^2} - 1 \right].$$

Remark 5. If $2 \le l \le 2k$, we get the same rate and asymptotic variance as those of the cross-validation method; see Park & Marron (1990). If $2k + 2 \le l \le 4k$ the rate improves and for $l \ge 4k + 2$, we get the optimal rate $n^{-1/2}$ and also the optimal asymptotic variance; see Hall & Marron (1991) and Fan & Marron (1992).

Remark 6. As was shown in corollary 1, the optimal pilot bandwidth $g_{0n}(h)$ satisfies the conditions of theorem 3. Therefore, the same rates would apply if it were possible to use the theoretically optimal $g_{0n}(h)$ as a pilot bandwidth.
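The three regimes of theorem 3 can be summarized by a small helper that returns the exponent $r$ of the relative rate for given kernel orders. This is an illustrative sketch; the function name `relative_rate_exponent` is hypothetical, and exact fractions are used so that the cases are easy to compare.

```python
from fractions import Fraction

def relative_rate_exponent(k, l):
    """Exponent r of theorem 3 for even kernel orders k (for K) and l (for L)."""
    if k < 2 or l < 2 or k % 2 or l % 2:
        raise ValueError("k and l must be even integers >= 2")
    if l <= 2 * k:                       # case (i): same rate as cross-validation
        return Fraction(1, 4 * k + 2)
    if l <= 4 * k:                       # case (ii): intermediate rates
        return Fraction(2 * l - 4 * k - 1, 2 * l)
    return Fraction(1, 2)                # case (iii): root-n rate

# Exponents for a second-order kernel K (k = 2) and pilot orders l = 2, 6, 8, 10.
rates = [relative_rate_exponent(2, l) for l in (2, 6, 8, 10)]
```

For $k = 2$ the exponent grows from $1/10$ at $l = 2$ to $1/2$ at $l = 10$, which is the pilot order needed for the optimal rate in that case.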

5. Pilot bandwidth selectors

All the results in section 4 are intended for a deterministic pilot bandwidth sequence $g_n(h)$. However, for practical purposes, a data-dependent way of choosing this pilot bandwidth is needed. As was mentioned in section 1, the usual procedure here is to calculate the value of the asymptotically optimal $g_{0n}(h)$ for some parametric reference density (with estimated parameters) and to use it in the estimator. This approach is considered in section 5.1. Nevertheless, in this case all the functionals that appear in the expression of $g_{0n}(h)$ can be estimated empirically; thus, we can provide an estimator of $g_{0n}(h)$ which does not depend on another bandwidth, only on the data. This is addressed in section 5.2.

5.1. Normal reference pilot bandwidth selection

Next we provide a way to calculate the asymptotically optimal pilot bandwidth $g_{0n}(h)$ when the normal reference density $\phi_{\hat\sigma}$ is used, with $\hat\sigma = \min\{s, \mathrm{IQR}/1.34\}$, where $s$ denotes the sample standard deviation and IQR the interquartile range. Notice that $g_{0n}(h)$ depends only on $C_{\sigma 2}(h)$ and $C_\beta(h)$, so that it suffices to give formulas for $R_{\bar K^{(l)},h}(f)/h^l$ and $R_{\bar K^{(l)},\bar K,h}(f)/h^l$ in the Gaussian case. This is done in the next result, whose proof is omitted to save space, since it is based on the same tools used in the proof of proposition 2.


Proposition 3
Let $\alpha_1$, $\alpha_2$ be as in proposition 2. For $K = \phi$ and any even $l$,

$$R_{\bar K^{(l)},h}(\phi_\sigma)/h^l = \frac{(-1)^{l/2}\, OF(l)}{\sqrt{2\pi}} \sum_{p=1}^{2} \frac{\alpha_p}{(p h^2 + 2\sigma^2)^{(l+1)/2}},$$

$$R_{\bar K^{(l)},\bar K,h}(\phi_\sigma)/h^l = \frac{(-1)^{l/2}\, OF(l)}{2\pi} \sum_{p,q=1}^{2} \frac{\alpha_p \alpha_q (p h^2 + 2\sigma^2)^{l/2}}{[(p h^2 + 2\sigma^2)(q h^2 + 2\sigma^2) - \sigma^4]^{(l+1)/2}},$$

where $OF(l)$ denotes the odd factorial; that is, $OF(l) = (l-1)(l-3) \cdots 5 \cdot 3 \cdot 1$.

Using the previous formulas we can compute $g_{NR,n}(h)$, the asymptotically optimal pilot bandwidth in the Gaussian case when $\sigma^2$ is estimated by $\hat\sigma^2$. The bootstrap bandwidth minimizing $M^*_{n,L,g}(h)$ with $g = g_{NR,n}(h)$ will be denoted $H^*_{NR}$.
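The normal reference pilot bandwidth can be sketched in Python by plugging the two sums of proposition 3 into the constants of theorem 1 and then into the formula of theorem 2. The sketch makes several assumptions that should be checked against the paper: the unsmoothed functional is obtained from the first sum with $l = 0$ (the same Gaussian argument), the default moment `m_l = 0.5` corresponds to a Gaussian pilot kernel with $l = 2$, and all function names are hypothetical.

```python
import numpy as np

def odd_factorial(l):
    """OF(l) = (l-1)(l-3)...3*1 for even l, with OF(0) = 1."""
    return float(np.prod(np.arange(l - 1, 0, -2))) if l > 0 else 1.0

def robust_scale(sample):
    """min{s, IQR/1.34}, the scale estimate used for the reference density."""
    x = np.asarray(sample, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return min(x.std(ddof=1), (q75 - q25) / 1.34)

def normal_reference_pilot(h, n, sigma, l=2, m_l=0.5):
    """Pilot bandwidth from theorem 2 with the Gaussian functionals of proposition 3."""
    alpha = {1: -2.0, 2: (n - 1.0) / n}
    s2 = sigma**2
    v = {p: p * h**2 + 2.0 * s2 for p in (1, 2)}
    sign_of = (-1.0) ** (l // 2) * odd_factorial(l)
    # First sum of proposition 3 (derivative functional divided by h^l)
    r_kl = sign_of / np.sqrt(2.0 * np.pi) * sum(alpha[p] / v[p] ** ((l + 1) / 2) for p in v)
    # Same sum with l = 0: the functional without derivatives (assumption stated above)
    r_k = sum(alpha[p] / np.sqrt(2.0 * np.pi * v[p]) for p in v)
    # Second (double) sum of proposition 3
    r_kl_k = sign_of / (2.0 * np.pi) * sum(
        alpha[p] * alpha[q] * v[p] ** (l / 2) / (v[p] * v[q] - s2**2) ** ((l + 1) / 2)
        for p in v for q in v)
    c_sigma2 = 16.0 * m_l * (r_kl_k - r_kl * r_k)
    c_beta = 2.0 * m_l * r_kl
    if c_sigma2 >= 0:
        return 0.0
    return (-c_sigma2 / (2.0 * c_beta**2)) ** (1.0 / l) * n ** (-1.0 / l)
```

A typical call would be `normal_reference_pilot(h, len(sample), robust_scale(sample))` for each `h` on the grid over which the bootstrap criterion is minimized. The resulting bandwidth is scale equivariant: doubling both `h` and `sigma` doubles the returned value.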

5.2. Empirical pilot bandwidth selection

A nice property of the asymptotically optimal pilot bandwidth $g_{0n}(h)$ is that all the unknown quantities needed to define it can be estimated empirically. Specifically, it can be easily checked that

$$R_{\bar K^{(l)},h}(f) = E[(\bar K^{(l)})_h(X_1 - X_2)],$$

$$R_{\bar K^{(l)},\bar K,h}(f) = E[(\bar K^{(l)})_h(X_1 - X_2)\, \bar K_h(X_1 - X_3)].$$

Therefore, reasonable estimators of the previous quantities are given by

$$S_{n,l}(h) = \frac{1}{n(n-1)} \sum_{i \neq j} (\bar K^{(l)})_h(X_i - X_j),$$

$$S_{n,l,0}(h) = \frac{1}{n(n-1)(n-2)} \sum_{i \neq j \neq k \neq i} (\bar K^{(l)})_h(X_i - X_j)\, \bar K_h(X_i - X_k),$$

respectively. Substituting these estimates in the expression of $g_{0n}(h)$ gives $g_{emp,n}(h)$. The bootstrap bandwidth minimizing $M^*_{n,L,g}(h)$ with $g = g_{emp,n}(h)$ will be denoted $H^*_{emp}$.
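The two U-statistics can be computed without explicit triple loops, as the following Python sketch shows. It assumes a Gaussian kernel $K$ and derivative order $l = 2$; the helper names are hypothetical. The key step is that a sum over distinct triples equals a product of row sums minus the terms in which the second and third indices coincide.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def normal_pdf_dd(x, var):
    """Second derivative in x of the N(0, var) density."""
    return (x**2 / var**2 - 1.0 / var) * normal_pdf(x, var)

def kbar_matrices(sample, h):
    """Pairwise values of the rescaled kernel and of its rescaled second derivative."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]
    a1, a2 = -2.0, (n - 1.0) / n
    kbar = a2 * normal_pdf(d, 2.0 * h**2) + a1 * normal_pdf(d, h**2)
    # Rescaling a second derivative picks up a factor h^2 relative to d^2/dx^2
    kbar_dd = h**2 * (a2 * normal_pdf_dd(d, 2.0 * h**2) + a1 * normal_pdf_dd(d, h**2))
    np.fill_diagonal(kbar, 0.0)       # drop i == j terms
    np.fill_diagonal(kbar_dd, 0.0)
    return kbar, kbar_dd

def s_n_l(sample, h):
    """U-statistic over pairs i != j, for l = 2."""
    _, kdd = kbar_matrices(sample, h)
    n = kdd.shape[0]
    return kdd.sum() / (n * (n - 1.0))

def s_n_l0(sample, h):
    """U-statistic over distinct triples (i, j, k), for l = 2."""
    kbar, kdd = kbar_matrices(sample, h)
    n = kbar.shape[0]
    # Row-sum product counts j == k as well; subtract those terms
    total = (kdd.sum(axis=1) * kbar.sum(axis=1)).sum() - (kdd * kbar).sum()
    return total / (n * (n - 1.0) * (n - 2.0))
```

This keeps the cost at order $n^2$ rather than $n^3$, which matters when the statistics are evaluated on a grid of bandwidths inside a simulation loop.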

Unfortunately, the calculation of the variance of the U-statistic $S_{n,l}(h)$ shows that, when $h = h_n$ is of order $n^{-1/(2k+1)}$ (as the optimal $h_{0n}$), then $\mathrm{Var}[S_{n,l}(h_n)] \not\to 0$ if $l \ge 2k + 2$. Therefore, the use of $H^*_{emp}$ is possible only in the case $l \le 2k$, which leads to the slow relative rate of convergence $n^{-1/(4k+2)}$. This fact seems to show some kind of trade-off between attaining good rates of convergence and estimating the optimal bandwidths that lead to those rates: while the pilot kernel $L$ needs to be of order $4k + 2$ to get $\sqrt{n}$-convergence, the optimal bandwidth choice is not empirically estimable in that case.

6. Simulation study

We have carried out a simulation study to explore the finite-sample performance of the two new bandwidth selectors $H^*_{NR}$ and $H^*_{emp}$. The Gaussian density $\phi$ is used as kernel $K$, so that $k = 2$. Therefore, we would have to use a pilot kernel $L$ of order $l = 10$ to ensure the optimal rate of convergence. However, as $g = g_{emp,n}(h)$ does not give consistent estimates in that case, we have preferred to check the performance of the two bootstrap methods $H^*_{NR}$ and $H^*_{emp}$ in the simplest possible case; that is, when $l = 2$, also taking to this end $L = \phi$.

At the risk of not being exhaustive, and with the aim of not obscuring the interpretation of the results, we have restricted the set of competing selection methods to the bandwidth selector $H_{SJ}$ of Sheather & Jones (1991) and the cross-validation selector $H_{CV}$ (see section 2).


The selector $H_{SJ}$ is included because it is the recommended method in the exhaustive simulation studies of Cao et al. (1994) and Jones et al. (1996). However, as Loader (1999) points out, the performance of $H_{SJ}$ depends crucially on the choice of the pilot bandwidth, and the use of normal reference densities may lead in some cases to poor behaviour, compared with $H_{CV}$. In fact, as Devroye (1997) remarks, whenever a reference density needs to be used in the specification of a selector, it is possible to find a bimodal density that makes the performance of that selector nearly catastrophic. That is the reason why $H_{CV}$ may outperform $H_{SJ}$ in some cases. We recognize that the same arguments could be applied to our $H^*_{emp}$, not requiring a reference density, against $H^*_{NR}$.

The set of test densities consists of the 15 normal mixture densities that appear in Marron & Wand (1992). For each of these densities, $B = 500$ samples of sizes $n = 100$ and $n = 400$ have been generated, and for each sample all the bandwidth selectors $H_{CV}$, $H_{SJ}$, $H^*_{NR}$ and $H^*_{emp}$ have been computed. For any selector $H$ the performance of the estimate $f_{n,K,H}(x)$ has been measured using the squared $L_2$ error, $\mathrm{ISE}(H) = \int (f_{n,K,H} - f)^2$.
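One replication of such a study can be sketched in Python as follows. The sketch draws a sample from a normal mixture, forms the Gaussian-kernel estimate for a given bandwidth and approximates the ISE by a Riemann sum. The two-component mixture in the usage lines is purely illustrative and is not claimed to be one of the 15 test densities; the function names and the integration grid are also assumptions.

```python
import numpy as np

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def sample_mixture(rng, n, weights, means, sds):
    """Draw n observations: pick a component, then draw from that normal."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

def ise(sample, h, weights, means, sds, grid):
    """Riemann-sum approximation of the integral of (estimate - density)^2."""
    est = normal_pdf((grid[:, None] - sample[None, :]) / h).mean(axis=1) / h
    err = est - mixture_pdf(grid, weights, means, sds)
    return np.sum(err**2) * (grid[1] - grid[0])

# Illustrative replication with a hypothetical bimodal mixture.
rng = np.random.default_rng(4)
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [0.6, 0.6]
x = sample_mixture(rng, 400, weights, means, sds)
grid = np.linspace(-8.0, 8.0, 3201)
ise_value = ise(x, 0.3, weights, means, sds, grid)
```

Repeating this over many samples, with `h` replaced by the output of each selector, gives the averages and standard deviations of the kind reported in the tables.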

Table 1. Averages and standard deviations for n = 100

                 ISE averages (×10²)               ISE standard devs. (×10²)
Density number   H_CV    H_SJ    H*_NR   H*_emp    H_CV    H_SJ    H*_NR   H*_emp
 1               0.825   0.640   0.673   0.811     0.708   0.447   0.470   0.706
 2               1.125   0.836   0.877   1.096     1.082   0.566   0.597   0.995
 3               5.263   7.994   7.316   5.415     2.508   2.246   2.225   2.556
 4               5.005   6.206   5.755   5.373     2.552   3.148   2.928   2.749
 5               8.277   6.506   6.778   7.950     7.087   4.300   4.578   6.225
 6               1.041   0.800   0.873   1.019     0.752   0.429   0.444   0.676
 7               1.472   1.298   1.300   1.419     0.916   0.669   0.674   0.880
 8               1.291   1.075   1.167   1.290     0.838   0.497   0.518   0.822
 9               1.141   0.946   1.020   1.136     0.710   0.438   0.458   0.706
10               4.990   5.260   5.347   4.875     1.134   0.487   0.477   1.053
11               1.173   0.991   1.064   1.186     0.659   0.455   0.452   0.618
12               2.797   2.748   2.829   2.794     0.928   0.393   0.407   1.053
13               1.555   1.270   1.350   1.535     0.907   0.452   0.474   0.741
14               4.020   5.459   5.049   3.970     1.172   0.750   0.733   1.148
15               4.032   8.230   5.046   4.004     1.100   1.067   0.927   1.064

Table 2. Averages and standard deviations for n = 400

                 ISE averages (×10²)               ISE standard devs. (×10²)
Density number   H_CV    H_SJ    H*_NR   H*_emp    H_CV    H_SJ    H*_NR   H*_emp
 1               0.281   0.229   0.246   0.281     0.236   0.145   0.158   0.235
 2               0.356   0.274   0.302   0.343     0.274   0.162   0.189   0.254
 3               1.817   3.681   2.226   1.853     0.665   0.914   0.740   0.681
 4               1.681   1.966   1.623   1.744     0.759   0.967   0.717   0.815
 5               2.620   2.132   2.306   2.624     1.988   1.293   1.443   2.084
 6               0.333   0.288   0.302   0.329     0.198   0.154   0.160   0.190
 7               0.467   0.422   0.418   0.463     0.252   0.203   0.202   0.252
 8               0.454   0.408   0.411   0.458     0.248   0.196   0.202   0.249
 9               0.428   0.388   0.385   0.426     0.232   0.161   0.167   0.225
10               1.436   4.432   1.770   1.447     0.459   0.250   0.678   0.466
11               0.510   0.450   0.467   0.506     0.238   0.159   0.168   0.226
12               1.219   1.938   1.298   1.210     0.320   0.208   0.293   0.310
13               0.783   0.715   0.721   0.770     0.243   0.159   0.163   0.235
14               1.827   3.136   2.367   1.811     0.372   0.286   0.272   0.359
15               1.790   3.092   2.365   1.766     0.392   0.332   0.247   0.373


The results of the simulation study are summarized in Table 1 for n = 100 and Table 2 for n = 400, including averages and standard deviations of the random variables ISE(H) for every selector in the study and every density in the set of Marron & Wand (1992). The results for n = 100 are similar for all the selectors in terms of averages, and the differences between the selectors are more clearly appreciated for n = 400. Figure 4 shows the box-plots of the distributions of ISE(H) for every selector and density in the study for n = 400.

Obviously, without looking at the tables, it is to be expected that $H^*_{emp}$ be more variable than $H^*_{NR}$, since $H^*_{emp}$ makes quite an effort to try to estimate $g_{0n}(h)$. In fact, the simulations show that the performance of $H^*_{emp}$ is entirely similar to that of $H_{CV}$, so that the same conclusions as in the aforementioned studies of Cao et al. (1994) or Jones et al. (1996) can be applied: the centre points of $H_{CV}$ and $H^*_{emp}$ are extremely good in most cases, but they are usually (although not always) far more variable than $H_{SJ}$. However, we want to point out that we agree with Loader (1999) that, for some densities, especially for those having complicated features, such as #10–#15, this variability may be due to the fact that these two methods, $H_{CV}$ and $H^*_{emp}$, try hard to discover the complicated structure of the unknown density, meanwhile $H_{SJ}$ has less variability, but always selects a wrong, too oversmoothed, bandwidth. See Figs 5 and 6.

Fig. 4. Box-plots of the distributions of ISE(H) for H = $H_{CV}$, $H_{SJ}$, $H^*_{NR}$ and $H^*_{emp}$ (from left to right) and n = 400 for densities #1–#15 of Marron & Wand (1992), from left to right and top to bottom.

Overall, our preferred selector is $H^*_{NR}$. Although it is seldom the least variable selector, it has the nice property that its variability is close to that of the selector having the least. This is easy to verify by looking at Table 2 and densities #1–#4, for instance. Thus, $H^*_{NR}$ is quite trustworthy. Usual multiple comparison procedures seem to confirm the graphical and numerical impressions that $H^*_{NR}$ has a similar performance to $H_{SJ}$ when the latter is well behaved and improves on it when it clearly fails, especially for n = 400. Nevertheless, as $H^*_{NR}$ depends on a normal reference plug-in stage, it shares with $H_{SJ}$ the aforementioned undesirable property of possible catastrophic behaviour towards oversmoothing. Even so, the other reason why we prefer $H^*_{NR}$ over $H_{SJ}$ is that the effect of the normal reference density is much less dramatic than in the case of $H_{SJ}$. This is shown in Figures 5 and 6; the effect is quite clear for densities #3, #10, #12 and #14–#15, and less clear, but still noticeable, for densities #4, #7–#9 and #13. For all these densities, $H^*_{NR}$ appears as a good compromise between $H_{CV}$ and $H_{SJ}$: usually, it is not so variable as $H_{CV}$ and its centre point is better than that of $H_{SJ}$.

7. Proofs

7.1. Proofs of results in section 3

Proof of proposition 1. The bias formula may be obtained as a direct consequence of the fact that $E[T_{n,L,g}(h)] = E[\psi(X_1 - X_2)]$ and $R_{\bar K,h}(f) = E[\bar K_h(X_1 - X_2)]$. The second formula is just the variance of a U-statistic of order 2, which can be found, for instance, in Lee (1990).

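Proof of proposition 2. We will use several facts about the normal density function, all of them contained in Aldershof et al. (1995) or Wand & Jones (1995, app. C). For $K = L = \phi$ we have $\bar K_h = \sum_{p=1}^{2} \omega_p \phi_{h\sqrt{p}}$ and $\bar L_g = \phi_{g\sqrt{2}}$, so that $\psi_{h,g} = \bar K_h * \bar L_g = \sum_{p=1}^{2} \omega_p \phi_{(ph^2+2g^2)^{1/2}}$. Thus,

$$
\begin{aligned}
E_\phi[\psi(X_1 - X_2)] &= \sum_{p=1}^{2} \omega_p \iint \phi_{(ph^2+2g^2)^{1/2}}(x - y)\,\phi(x)\,\phi(y)\,dx\,dy \\
&= \sum_{p=1}^{2} \omega_p \int \big(\phi_{(ph^2+2g^2)^{1/2}} * \phi\big)(x)\,\phi(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}} \sum_{p=1}^{2} \frac{\omega_p}{\sigma_{p0}} \equiv D_n(h, g).
\end{aligned}
$$

Therefore,

$$
\xi_0 = D_n(h, g)^2 = \frac{1}{2\pi} \sum_{p,q=1}^{2} \frac{\omega_p \omega_q}{\sigma_{p0}\,\sigma_{q0}}.
$$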


Fig. 5. Estimated densities of the compared bandwidth selectors (legend: HCV, HSJ, HNR, Hemp) for densities #1–#8 of Marron & Wand (1992), from left to right and top to bottom, and n = 400. The solid vertical line corresponds to the location of $h_{0n}$.


Fig. 6. Estimated densities of the compared bandwidth selectors (legend: HCV, HSJ, HNR, Hemp) for densities #9–#15 of Marron & Wand (1992), from left to right and top to bottom, and n = 400. The solid vertical line corresponds to the location of $h_{0n}$.


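The formulas for the remaining $\xi_1$ and $\xi_2$ can be easily obtained by taking into account that $\xi_1 = \int (\psi * \phi)(x)^2 \phi(x)\,dx$, $\xi_2 = \int (\psi^2 * \phi)(x)\,\phi(x)\,dx$, with

$$
(\psi * \phi)^2 = \frac{1}{\sqrt{2\pi}} \sum_{p,q=1}^{2} \frac{\omega_p \omega_q}{(\sigma_{p1}^2 + \sigma_{q1}^2)^{1/2}}\, \phi_{\sigma_{p1}\sigma_{q1}/(\sigma_{p1}^2 + \sigma_{q1}^2)^{1/2}},
$$

$$
\psi^2 * \phi = \frac{1}{\sqrt{2\pi}} \sum_{p,q=1}^{2} \frac{\omega_p \omega_q}{(\sigma_{p2}^2 + \sigma_{q2}^2)^{1/2}}\, \phi_{[\sigma_{p2}^2 \sigma_{q2}^2/(\sigma_{p2}^2 + \sigma_{q2}^2) + 1]^{1/2}},
$$

and the fact that

$$
\int \phi_{[\sigma_{pi}^2 \sigma_{qi}^2/(\sigma_{pi}^2 + \sigma_{qi}^2) + i - 1]^{1/2}}(x)\,\phi(x)\,dx = \frac{1}{\sqrt{2\pi}} \left( \frac{\sigma_{pi}^2 + \sigma_{qi}^2}{i\sigma_{pi}^2 + i\sigma_{qi}^2 + \sigma_{pi}^2 \sigma_{qi}^2} \right)^{1/2}.
$$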
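The Gaussian facts used in this proof all reduce to the identity $\int \phi_a(x)\phi_b(x)\,dx = \{2\pi(a^2+b^2)\}^{-1/2}$. The following sketch checks that identity numerically and then checks the last display for given scale values. It is only an illustration: the trapezoidal integration, the function names and the arguments `sp`, `sq` (hypothetical stand-ins for the two scale parameters indexed by p and q) are assumptions of this example and not part of the proof.

```python
import math


def phi(x, s=1.0):
    """Density of N(0, s^2) evaluated at x."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def integral_of_product(a, b, half_width=12.0, m=24000):
    """Trapezoidal approximation of the integral of phi_a(x) * phi_b(x)."""
    lim = half_width * max(a, b)
    step = 2.0 * lim / m
    total = 0.0
    for j in range(m + 1):
        x = -lim + j * step
        weight = 0.5 if j in (0, m) else 1.0
        total += weight * phi(x, a) * phi(x, b)
    return total * step


def closed_form(a, b):
    """Value of phi_{sqrt(a^2 + b^2)}(0) = 1 / sqrt(2 * pi * (a^2 + b^2))."""
    return 1.0 / math.sqrt(2.0 * math.pi * (a * a + b * b))


def lhs_last_display(sp, sq, i):
    """Left-hand side of the last display, by numerical integration (i = 1 or 2)."""
    s = math.sqrt(sp ** 2 * sq ** 2 / (sp ** 2 + sq ** 2) + i - 1)
    return integral_of_product(s, 1.0)


def rhs_last_display(sp, sq, i):
    """Right-hand side of the last display, in closed form."""
    num = sp ** 2 + sq ** 2
    den = i * sp ** 2 + i * sq ** 2 + sp ** 2 * sq ** 2
    return math.sqrt(num / den) / math.sqrt(2.0 * math.pi)
```

For instance, `lhs_last_display(1.2, 0.8, 1)` and `rhs_last_display(1.2, 0.8, 1)` should agree up to the numerical integration error.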

7.2. Proofs of results in section 4.1

If K and L satisfy condition (K1) it is easy to check that $\bar K$ and $\bar L$ are functions of order k and l such that $m_0(\bar K) = -(n+1)/n$, $m_k(\bar K) = -2m_k(K)/n$, $m_0(\bar L) = 1$ and $m_l(\bar L) = 2m_l(L)$.

The next two lemmas give asymptotic versions for the squared bias and variance functions from section 3. Using them, theorem 1 is straightforward.

Lemma 1
Assume the conditions and notations of theorem 1. Then,

$$
B_n^2(h, g) = C_\beta^2\, g^{2l} + O(g^{2l+2}).
$$

Proof. The key of the proof is to derive a convenient asymptotic expansion for the convolution $\psi = \bar K_h * \bar L_g$. Using the same tools as in Cao (1993), it is not hard to prove that

$$
(\bar K_h * \bar L_g)(x) = \bar K_h(x) + g^l m_l(\bar L)(\bar K_h)^{(l)}(x) + g^{l+2} I_{K,L,g}(x), \tag{6}
$$

where

$$
I_{K,L,g}(x) = \frac{1}{(l+1)!} \int\!\!\int_0^1 z^{l+2}\, \bar L(z)\,(1 - t)^{l+1}\, (\bar K_h)^{(l+2)}(x - tgz)\,dt\,dz
$$

and $(\bar K_h)^{(l)}$ denotes the lth derivative of $\bar K_h(x)$. Now, conditions (K1) and (K2) imply that $\sup_g \sup_x |I_{K,L,g}(x)| < \infty$, so that we can express the last term in (6) as $O(g^{l+2})$ (uniformly in x). Taking into account that $(\bar K_h)^{(l)} = (\bar K^{(l)})_h / h^l$, this yields

$$
\begin{aligned}
E[T_{n,L,g}(h)] &= \iint (\bar K_h * \bar L_g)(x - y)\, f(x)\, f(y)\,dx\,dy \\
&= R_{\bar K,h}(f) + g^l m_l(\bar L)\, R_{\bar K^{(l)},h}(f)/h^l + O(g^{l+2}).
\end{aligned}
\tag{7}
$$

Using (7), the proof of the asymptotic expansion of $B_n^2(h, g)$ is straightforward.

Lemma 2
Assume the conditions and notations of theorem 1. Then,

$$
V_n(h, g) = C_{\nu 1}/n + C_{\nu 2}\, g^l/n + O(1/n^2) + O(g^{l+2}/n).
$$

Proof. Using the results in section 3, we just have to give the asymptotic expansions for $\xi_0$, $\xi_1$ and $\xi_2$. From the proof of lemma 1 and the fact that $\xi_0 = E[T_{n,L,g}(h)]^2$, we immediately get

$$
\xi_0 = R_{\bar K,h}(f)^2 + 2g^l m_l(\bar L)\, R_{\bar K^{(l)},h}(f)\, R_{\bar K,h}(f)/h^l + O(g^{l+2}).
$$

Using the expansion (6) of $\psi(x) = (\bar K_h * \bar L_g)(x)$ and the same tools as in the proof of lemma 1, it is easy to check that

$$
\xi_1 = \int (\psi * f)(x)^2 f(x)\,dx = R_{\bar K,\bar K,h}(f) + 2g^l m_l(\bar L)\, R_{\bar K^{(l)},\bar K,h}(f)/h^l + O(g^{l+2}).
$$

Analogously, taking into account that $(\bar K_h)^2 = (\bar K^2)_h/h$,

$$
\xi_2 = \int (\psi^2 * f)(x)\, f(x)\,dx = R_{\bar K^2,h}(f)/h + 2g^l m_l(\bar L)\, R_{\bar K^{(l)}\bar K,h}(f)/h^{l+1} + O(g^{l+2}).
$$

As $\xi_2 = O(1)$, to finish the proof it is enough to recall that, by proposition 1, the dominant part of $V_n(h, g)$ is given by that of $4(\xi_1 - \xi_0)/n + 2\xi_2/n^2$.

As $g_{0n}(h)$ corresponds to the value of g that minimizes the asymptotically dominant part of $\mathrm{MSE}_n(h, g)$, the proof of the first part of theorem 2 is very similar to the one that gives the dominant part of $h_{0n}$; see Hall et al. (1991) or Hall & Marron (1991, rem. 4.2). The asymptotic equivalence of $g_{0n}(h_{0n})$ follows easily from the next lemma, which provides asymptotic expansions for the coefficient functions, $C_\beta$ and $C_{\nu 2}$, involved in the asymptotic formula of the optimal pilot bandwidth $g_{0n}(h)$.

Lemma 3
Let K and L satisfy conditions (K1)–(K3) and f be a density such that condition (D) holds. If $h_n \to 0$, then

$$
C_\beta(h_n) \sim C_\beta, \qquad C_{\nu 2}(h_n) \sim C_{\nu 2},
$$

with $C_\beta$ and $C_{\nu 2}$ as in theorem 2.

Proof. Taking into account the definition of the coefficients, given in theorem 1, we just have to find suitable asymptotic representations for the functionals $R_{\bar K^{(l)},h}(f)/h^l$ and $R_{\bar K^{(l)},\bar K,h}(f)/h^l$. If $h = h_n \to 0$, then

$$
R_{\bar K^{(l)},h_n}(f)/h_n^l \to (-1)^{l/2+1} R(f^{(l/2)}), \tag{8}
$$

$$
R_{\bar K^{(l)},\bar K,h_n}(f)/h_n^l \to \int f^{(l)} f^2 = (-1)^{l/2} \int (f^2)^{(l/2)} f^{(l/2)}. \tag{9}
$$

We will only prove (8), as the proof of (9) may be obtained in an entirely similar way. To show (8) it is enough to note that $R_{\bar K^{(l)},h_n}(f)/h_n^l = (-1)^{l/2} R_{\bar K,h}(f^{(l/2)})$. Then the desired representation follows from the fact that $m_0(\bar K) = -1 + O(n^{-1})$ and the lemma included in Chacón et al. (2007, sec. 4).

Finally, the only thing left is to point out that the lth moment of a kernel L of order l satisfies $(-1)^{l/2} m_l(L) < 0$, so that $|m_l(L)| = (-1)^{l/2+1} m_l(L)$. This can be shown by noting that, for a smooth enough density f (it does not matter which one), we can express $R_{L,h}(f) = R(f) + (-1)^{l/2} m_l(L) R(f^{(l/2)}) h^l + o(h^l)$ if $h = h_n \to 0$. As $R_{L,h}(f) \le R(f)$ by the Cauchy–Schwarz inequality, we get the sign of $m_l(L)$ by dividing the previous expansion of $R_{L,h}(f) - R(f) \le 0$ by $h_n^l$ and taking the limit as $n \to \infty$.
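The sign property of the lth moment can be checked numerically on two familiar kernels. The sketch below is an illustration under explicit assumptions: the kernels are the Gaussian kernel (order 2) and the Gaussian-based fourth-order kernel $(3 - x^2)\phi(x)/2$, neither of which is singled out by the paper, and `raw_moment` computes the plain integral $\int x^j L(x)\,dx$; any positive normalizing constant in the definition of $m_l$ would not change the sign.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_order2(x):
    """Gaussian kernel, a kernel of order 2."""
    return phi(x)


def gaussian_order4(x):
    """Gaussian-based kernel of order 4 (illustrative choice)."""
    return 0.5 * (3.0 - x * x) * phi(x)


def raw_moment(kernel, j, lim=12.0, m=24000):
    """Trapezoidal approximation of the integral of x^j * kernel(x) over [-lim, lim]."""
    step = 2.0 * lim / m
    total = 0.0
    for i in range(m + 1):
        x = -lim + i * step
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * x ** j * kernel(x)
    return total * step


def signed_moment(kernel, order):
    """(-1)^(order/2) times the raw moment of that order; expected to be negative."""
    return (-1) ** (order // 2) * raw_moment(kernel, order)
```

Evaluating `signed_moment(gaussian_order2, 2)` and `signed_moment(gaussian_order4, 4)` gives negative numbers in both cases, in agreement with the statement above.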

Remark 7. Jones & Sheather (1991, p. 513) state that ‘all the kernels [of order k] in popular use satisfy the property $(-1)^{k/2+1} m_k(K) = r_k$ for some $r_k > 0$’. In the second part of the proof of the previous lemma we have shown that, in fact, every kernel of order k satisfies such a condition. Thus, for instance, condition (i) at the bottom of p. 67 in Wand & Jones (1995) could be relaxed to $(-1)^{r/2} L^{(r)}(0) > 0$.

Proof of corollary 1. As a first step, it is not difficult to show, using the same tools needed to prove (8) and (9), that

$$
\big(R_{\bar K^{(l)},h}(f)/h^l\big)^{[1]} \sim (-1)^{l/2} d_K R(f^{(k+l/2)})\, h^{2k-1}, \tag{10}
$$

$$
\big(R_{\bar K^{(l)},\bar K,h}(f)/h^l\big)^{[1]} \sim -d_K \Big( \int f^{(2k+l)} f^2 + \int f^{(2k)} f^{(l)} f \Big) h^{2k-1}. \tag{11}
$$

Then, the result follows from theorem 2 and the asymptotics given in (8), (9), (10) and (11).

7.3. Proof of theorem 3

The proof of theorem 3 follows by standard arguments as in Jones et al. (1991) or Grund & Polzehl (1997). To apply their method of proof, it is enough to give suitable asymptotic representations for the expected value and variance of $\Delta_*^{[1]}(h_{0n})$, where $\Delta_*(h) = M^*_{n,L,g}(h) - M_n(h)$, with $g = g_n(h)$ a pilot bandwidth sequence satisfying the conditions of theorem 3. The analysis of the aforementioned expected value and variance will be carried out through a series of lemmas.

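First, it is not hard to show that, for any sequence of functions $g = g_n(h)$,

$$
\begin{aligned}
R_{\bar K,h}(f)^{[1]} &= \frac{1}{h} R_{\kappa,h}(f), \\
T_{n,L,g}(h)^{[1]} &= \frac{1}{n(n-1)} \sum_{i \ne j} \Big[ \frac{1}{h} (\kappa_h * \bar L_g) + \frac{g^{[1]}}{g} (\bar K_h * \lambda_g) \Big] (X_i - X_j),
\end{aligned}
\tag{12}
$$

where $\kappa(x) = -\bar K(x) - x\bar K'(x)$ and $\lambda(x) = -\bar L(x) - x\bar L'(x)$. Note that $\kappa$ and $\lambda$ are symmetric functions (not kernels, though) of the same order as $\bar K$ and $\bar L$, respectively, since they satisfy $m_j(\kappa) = j\,m_j(\bar K)$ and $m_j(\lambda) = j\,m_j(\bar L)$, for every j. Specifically, $m_j(\kappa) = 0$ for $j = 0, 1, \ldots, k-1$; $m_j(\kappa) = O(n^{-1})$ for $j = k, \ldots, 2k-1$; $m_{2k}(\kappa) = d_K + O(n^{-1})$. In the same way, $m_j(\lambda) = 0$ for $j = 0, 1, \ldots, l-1$ and $m_l(\lambda) = 2l\,m_l(L)$.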
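The moment identity behind these statements follows from integration by parts: for a smooth function G with fast enough decay, the jth moment of $-G(x) - xG'(x)$ equals j times the jth moment of G. The sketch below checks this numerically. It is only an illustration: G is taken to be the Gaussian-based fourth-order kernel $(3 - x^2)\phi(x)/2$ rather than the n-dependent functions of the paper, and the function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kernel(x):
    """Illustrative smooth kernel G(x) = (3 - x^2) * phi(x) / 2."""
    return 0.5 * (3.0 - x * x) * phi(x)


def kernel_prime(x):
    """Analytic derivative G'(x) = (x^3 - 5x) * phi(x) / 2."""
    return 0.5 * (x ** 3 - 5.0 * x) * phi(x)


def derived(x):
    """The function -G(x) - x * G'(x) built from the kernel."""
    return -kernel(x) - x * kernel_prime(x)


def raw_moment(func, j, lim=12.0, m=24000):
    """Trapezoidal approximation of the integral of x^j * func(x) over [-lim, lim]."""
    step = 2.0 * lim / m
    total = 0.0
    for i in range(m + 1):
        x = -lim + i * step
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * x ** j * func(x)
    return total * step
```

For each even j, `raw_moment(derived, j)` should match `j * raw_moment(kernel, j)` up to integration error; in particular the zeroth moment of the derived function vanishes, which is why it is not a kernel.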

The expected value of $\Delta_*^{[1]}(h_{0n})$ is given in the next lemma.

Lemma 4
Suppose that K, L and f satisfy conditions (K1)–(K3) and (D). Write $h = h_{0n}$. Then, for $g = g_n(h)$ as in theorem 3,

$$
E[\Delta_*^{[1]}(h)] \sim 2|m_l(L)|\,\gamma_1 h^{2k-1} g^l,
$$

where $\gamma_1 = l R(f^{(l/2)})\gamma_0 - d_K R(f^{(k+l/2)})$.

Proof. Using the same techniques as in the previous results it is tedious, although not difficult, to show that

$$
E[(\kappa_h * \bar L_g)(X_1 - X_2)] = R_{\kappa,h}(f) - 2d_K |m_l(L)| R(f^{(k+l/2)})\, h^{2k} g^l + O(h^{2k} g^{l+2}), \tag{13}
$$

$$
E[(\bar K_h * \lambda_g)(X_1 - X_2)] = 2l\,|m_l(L)| R(f^{(l/2)})\, g^l + O(g^{l+2}). \tag{14}
$$

Using the properties of $g_n(h)$, together with (12), (13) and (14), we immediately get $E[T_{n,L,g}(h)^{[1]}] = R_{\bar K,h}(f)^{[1]} + 2|m_l(L)|\gamma_1 h^{2k-1} g^l + O(h^{2k-1} g^{l+2})$.

In view of (12), to analyze the variance we will need suitable asymptotic expansions of the terms $\eta_0 = E[\chi_{hg}(X_1 - X_2)]^2$, $\eta_1 = E[\chi_{hg}(X_1 - X_2)\chi_{hg}(X_2 - X_3)]$ and $\eta_2 = E[\chi_{hg}(X_1 - X_2)^2]$, where $\chi_{hg} = \frac{1}{h}\kappa_h * \bar L_g + \frac{g^{[1]}}{g}\bar K_h * \lambda_g$. Asymptotic expressions for the variance of $\Delta_*^{[1]}(h_{0n})$ are provided in the following lemma.

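Lemma 5
Assume the notations and hypotheses of lemma 4. Then, for $g = g_n(h)$ as in theorem 3,

(i) If $l \le 2k$ then

$$
\mathrm{Var}[\Delta_*^{[1]}(h)] \sim \frac{2R(\rho)R(f)}{n^2 h^3},
$$

with $\rho(x) = x(K * K)'(x) - 2xK'(x)$.

(ii) If $l > 2k$ then

$$
\mathrm{Var}[\Delta_*^{[1]}(h)] \sim d_K^2 \Big[ \frac{4}{n} \big( R(f^{(2k)} f^{1/2}) - R(f^{(k)})^2 \big) + \frac{2R(\bar L^{(2k)})R(f)}{n^2 g^{4k+1}} \Big] h^{4k-2}.
$$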

Proof. For $g = g_n(h)$ as in theorem 3, using standard arguments, through a series of careful calculations it is possible to write $\eta_0 \sim d_K^2 R(f^{(k)})^2 h^{4k-2}$ and $\eta_1 \sim d_K^2 R(f^{(2k)} f^{1/2}) h^{4k-2}$. However, the asymptotic expansion of $\eta_2$ depends on the asymptotic behaviour of h/g. With the same techniques, it can be shown that

$$
\eta_2 \sim
\begin{cases}
R(\rho)R(f)/h^3 & \text{if } l \le 2k, \\
d_K^2 R(\bar L^{(2k)})R(f)\, h^{4k-2}/g^{4k+1} & \text{if } l > 2k.
\end{cases}
$$

The result follows from the fact that the dominant part of $\mathrm{Var}[\Delta_*^{[1]}(h)] = \mathrm{Var}[T_{n,L,g}(h)^{[1]}]$ is given by that of $4(\eta_1 - \eta_0)/n + 2\eta_2/n^2$.

Proof of theorem 3. Notice that

$$
h_{0n} M_n^{[2]}(h_{0n}) \sim (2k+1)\, d_K R(f^{(k)})\, h_{0n}^{2k-1}, \tag{15}
$$

with

$$
h_{0n} \sim \Big( \frac{R(K)}{d_K R(f^{(k)})\, n} \Big)^{1/(2k+1)}. \tag{16}
$$

Following the arguments given in Jones et al. (1991), we have

$$
\frac{H^* - h_{0n}}{h_{0n}} = \frac{1}{h_{0n} M_n^{[2]}(h_{0n})} \Big\{ E[\Delta_*^{[1]}(h_{0n})] + \mathrm{Var}[\Delta_*^{[1]}(h_{0n})]^{1/2} Z_n \Big\},
$$

where $Z_n$ is a sequence of asymptotically N(0, 1) random variables. The previous representation, together with (15), (16) and the asymptotic expansions provided in lemmas 4 and 5, allows us to finish the proof.
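As a concrete instance of (16), consider a second-order kernel (k = 2). The sketch below evaluates the right-hand side of (16) for the Gaussian kernel and a normal density, under the assumption that $d_K$ reduces in that case to the squared second moment $(\int x^2 K)^2$, which is what the classical AMISE-optimal bandwidth requires; the constants $R(\phi) = 1/(2\sqrt{\pi})$ and $R(\phi'') = 3/(8\sqrt{\pi})$ are standard Gaussian facts, and the function name and its arguments are hypothetical.

```python
import math


def h0n_normal_case(n, sigma=1.0):
    """Right-hand side of (16) for k = 2, Gaussian kernel and N(0, sigma^2) density.

    Assumption: d_K equals the squared second moment of the kernel,
    which is 1 for the Gaussian kernel.
    """
    k = 2
    r_kernel = 1.0 / (2.0 * math.sqrt(math.pi))                  # R(K), Gaussian kernel
    d_kernel = 1.0                                               # assumed value of d_K
    r_f_deriv = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)    # R(f'') for N(0, sigma^2)
    return (r_kernel / (d_kernel * r_f_deriv * n)) ** (1.0 / (2 * k + 1))
```

With these constants the expression simplifies to $\sigma\,(4/(3n))^{1/5}$, the familiar normal reference bandwidth, so `h0n_normal_case(400)` is smaller than `h0n_normal_case(100)` by the factor $4^{-1/5}$.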

Acknowledgements

The authors are grateful for the comments and suggestions of an anonymous referee. This research has been supported by the Spanish Ministerio de Educación y Ciencia under Project MTM2006-06172.


References

Aldershof, B., Marron, J. S., Park, B. U. & Wand, M. P. (1995). Facts about the Gaussian probability density function. Appl. Anal. 59, 289–306.

Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71, 353–360.

Cao, R. (1993). Bootstrapping the mean integrated squared error. J. Multivariate Anal. 45, 137–160.

Cao, R., Cuevas, A. & González-Manteiga, W. (1994). A comparative study of several smoothing methods in density estimation. Comput. Statist. Data Anal. 17, 153–176.

Chacón, J. E., Montanero, J., Nogales, A. G. & Pérez, P. (2007). On the existence and limit behavior of the optimal bandwidth in kernel density estimation. Statist. Sinica 17, 289–300.

Chiu, S.-T. (1996). A comparative review of bandwidth selection for kernel density estimation. Statist. Sinica 6, 129–145.

Devroye, L. (1997). Universal smoothing factor selection in density estimation: theory and practice (with discussion). Test 6, 223–320.

Fan, J. & Marron, J. S. (1992). Best possible constant for bandwidth selection. Ann. Statist. 20, 2057–2070.

Grund, B. & Polzehl, J. (1997). Bias corrected bootstrap bandwidth selection. J. Nonparametr. Stat. 8, 97–126.

Hall, P. (1983). Large sample optimality of least squares cross-validation in density estimation. Ann. Statist. 11, 1156–1174.

Hall, P. & Marron, J. S. (1987a). Extent to which least-squares cross-validation minimises integrated square error in nonparametric density estimation. Probab. Theory Related Fields 74, 567–581.

Hall, P. & Marron, J. S. (1987b). Estimation of integrated squared density derivatives. Statist. Probab. Lett. 6, 109–115.

Hall, P. & Marron, J. S. (1991). Lower bounds for bandwidth selection in density estimation. Probab. Theory Related Fields 90, 149–163.

Hall, P., Marron, J. S. & Park, B. U. (1992). Smoothed cross-validation. Probab. Theory Related Fields 92, 1–20.

Hall, P., Sheather, S. J., Jones, M. C. & Marron, J. S. (1991). On optimal data-based bandwidth selection in kernel density estimation. Biometrika 78, 263–269.

Jones, M. C., Marron, J. S. & Park, B. U. (1991). A simple root n bandwidth selector. Ann. Statist. 19, 1919–1932.

Jones, M. C., Marron, J. S. & Sheather, S. J. (1996). Progress in data-based bandwidth selection for kernel density estimation. Comput. Statist. 11, 337–381.

Jones, M. C. & Sheather, S. J. (1991). Using nonstochastic terms to advantage in kernel-based estimation of integrated squared density derivatives. Statist. Probab. Lett. 11, 511–514.

Lee, A. J. (1990). U-statistics: theory and practice. Marcel Dekker, New York.

Loader, C. R. (1999). Bandwidth selection: classical or plug-in? Ann. Statist. 27, 415–438.

Marron, J. S. (1992). Bootstrap bandwidth selection. In Exploring the Limits of Bootstrap (eds R. LePage & L. Billard), 249–262. Wiley, New York.

Marron, J. S. & Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist. 20, 712–736.

Park, B. U. & Marron, J. S. (1990). Comparison of data-driven bandwidth selectors. J. Amer. Statist. Assoc. 85, 66–72.

Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9, 65–78.

Sheather, S. J. & Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B 53, 683–690.

Silverman, B. W. (1986). Density estimation for statistics and data analysis. Chapman & Hall, London.

Stone, C. J. (1984). An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist. 12, 1285–1297.

Wand, M. P. & Jones, M. C. (1995). Kernel smoothing. Chapman & Hall, London.

Received November 2006, in final form February 2007

A. G. Nogales, Departamento de Matemáticas, Universidad de Extremadura, Avda. de Elvas, s/n, E-06071 Badajoz, Spain. E-mail: [email protected]
