Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of...

60
Geometrizing Rates of Convergence, II David L. Donoho Richard C. Liu Technical Report No. 120 October 1987 First Revision March 1988 Second Revision October 1988 Department of Statistics University of California Berkeley, California

Transcript of Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of...

Page 1: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

Geometrizing Rates of Convergence, II

David L. Donoho

Richard C. Liu

Technical Report No. 120October 1987

First Revision March 1988Second Revision October 1988

Department of StatisticsUniversity of CaliforniaBerkeley, California

Page 2: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

Geometrizing Rates of Convergence, II

David L. Donoho

Richard C. Liu

Department of StatisticsUniversity of California at Berkeley

ABSTRACT

Consider estimating a functional T(F) of an unknown distribution F e F fromdata X1, * * ,X,, i.i.d. F. A companion paper introduced a bound on the rate of con-vergence of estimates Tn of T as a function of n. The bound involved the modulus ofcontinuity b (e) of the functional T over F. The bound says that the estimation errorT^-T cannot converge to zero faster than b(n112) uniformly over F. This rate boundwas shown to be at least as strong as some earlier bounds on rates of convergence.In this paper we show that the "modulus of continuity" bound is attainable, to withinconstants, whenever T is linear and F is convex. In two nonlinear cases -- estimatingthe rate of decay of a density, and estimating the mode -- the bound is also attainableto within constants.We do this by introducing a new bound on the rate of convergence and showing thatthis new bound is always attainable (to within constants). The new bound is based onthe difficulty of testing between the composite, infinite dimensional hypothesesHo:T(F)<t and Hl:T(F)>t+A.The modulus bound and the new bound are comparable -- and hence the modulusbound is attainable to within constants -- whenever the difficulty of the hardest simpletwo-point testing subproblem is comparable to the difficulty of the full infinite-dimensional composite problem. This property holds whenever T is linear and F isconvex, and also in the cases of the tail rate functional and the mode discussed above.

Key words and Phrases: Density Estimation, Estimating the Mode, Estimating the Rate of Tail Decay,Rates of Convergence of estimators, Modulus of Continuity, Hellinger Distance, Minimax Tests, Mono-tone Likelihood Ratio.AMS-MOS Subject Classification: Primary 62G20. Secondary 62G05, 62F35.Abbreviated Title: Geometrizing Rates, IIAcknowledgement: This work was supported by NSF Grant DMS-84-51753 and by donations fromSchlumberger Computer Aided Systems and from Sun Microsystems, Inc. As in the other papers of thisseries, the authors would like to thank Lucien Le Cam for many helpful discussions. Jianqing Fan pro-vided valuable comments on an earlier draft.

Page 3: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 2 -

1. Introduction

Let T (F) be a functional of an unknown distribution F and let X1,... , X,, be i.i.d. F. As in

Donoho and Liu (1987a,b) (hereafter [GR I] and [GR i]), we are interested in estimating T(F). For

example, T (F) might be the linear functional f (0), the density of F at zero, or the nonlinear func-

tional ff 2, the squared L2-norm of the density f . Such functionals arise in nonparametric estimation

and have the general property that they cannot usually be estimated at a root-n rate. In fact if all that is

known is that F e F where F is a given class of smooth densities, it may turn out that no estimator

n= T (X1, X,,) can converge to T (F) at rate faster than nt/2 for some q <1.

In [GR I], this phenomenon was discussed and a new way of establishing it was introduced.

Given the modulus of continuity of T over the class F, with respect to Hellinger distance,

b(e) = sup(lT(Fi)-T(Fo)I:H(F1,FO) :sF e F) (1.1)

it was shown that no estimator can converge to T (F) faster than b (n-112) uniformly over F. This

bound is valid for all functionals, and it was shown in [GR I] that the bound is at least as strong as rate

bounds due to Farrell, Stone, and Hasminskii.

In this paper we discuss the attainability of this bound as regards rate. Since the b (n112) bound

subsumes several existing nonparametric, parametric, and semiparametric bounds, we know, of course,

from the extensive work on nonparametrics (e.g. Farrell (1972), Wahba (1975), Stone (1980), ...) that

the bound is often attainable. We show in this paper that for linear functionals, the rate is attainable in

great generality.

Some terminology. The loss function 1(t) is well behaved if it is a symmetric increasing function of ItI

and if 1( -t) < a l(t) for all t. Thus t2 and It I are well-behaved, with a =9/4 and a=3/2 respectively.

We write f (n ) 4 g (n ) if the ratio of the two terms is bounded away from zero and infinity as n o.

Combining Theorems 2.1, 2.4, and 3.1 below, we get

Corollary. Let T be linear and F be convex. If T is bounded on F, so that

s? IT|(F)I < Co,

and if b (e) is Holderian with exponent q, so that

Page 4: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 3 -

b(e) = Ceq+O(eq),then the optimal rate of convergence is b (n 112):

inf sip EF I (T,, -T) #%J I (b (n -"2)).

for any well-behaved loss function 1.

Thus, for linear functionals -- the density at a point, the derivative of a density at a point, the

density of a convolution factor at a point -- the optimal rate of convergence is r = q /2, where q is the

exponent in the modulus of continuity. In short, the rate of convergence -- a statistical quantity -- is

determined by the modulus of continuity -- a quantity deriving from the geometry of the graph of T

over the regularity class F.

We establish this result by directing attention away from the modulus of continuity, and focusing

instead on (another) new bound on the rate of convergence. In section 2 we derive a new bound from a

measure of the difficulty of testing the composite hypothesis Ho: T(F):St against the composite

hypothesis H1: T(F)2t+A. While in general, this new bound is much more difficult to compute than

the modulus bound, it appears to be the "right thing" to be computed. Indeed, under a certain

hypothesis on the asymptotic behavior of the new bound (see (2.9)), it is always attainable to within

constant factors, whatever be the functional -- linear or nonlinear. This is, to our knowledge, the first

lower bound on estimation of functionals which comes equipped with a (near-) attainability result

In section 3, we show that, in the linear T, convex F case, the modulus bound and the new bound

agree to within constants. The hypothesis (2.9) holds, and so the attainability of the new bound to

within constants implies that of the modulus bound to within constants.

In section 4, we show that the modulus bound and the new bound are equivalent if and only if a

certain minimax identity holds, at least approximately. That is, the testing difficulty of the hardest sim-

ple subproblem HO:Fo versus H1:F1, with T(Fo)<t and T(F1)>t+A, should be roughly the same as

the difficulty of the full composite problem Ho:T(F)<t versus H1:T(F).t+A. Thus, the modulus

bound "works" in the case of T linear, F convex, because the difficulty of the hardest 2-point subprob-

lem is comparable to the difficulty of the full problem.

Page 5: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 4 -

In section 5 we discuss some examples of nonlinear functionals. The first is the rate of tail decay

(Du Mouchel (1983), Hall and Welsh (1983)). For this functional, the minimax identity holds precisely.

Actually, in this case, the minimax test of Ho:T(F).t versus HI:T(F)>t+A can be worked out in

detail; it turns out to have a cerain monotonicity in t which shows that the new bound can be attained

to within a factor 2. In the second example, esimating the mode, the minimax identity does not hold,

but the hardest 2-point subproblem has a difficulty that is again comparable to the full problem, and so

the modulus is again attainable.

One should not always suppose the modulus bound to be attainable in the nonlinear setting. As

one can infer from recent results of Ritov and Bickel (1987) and, in a related problem, of Ibragimov,

Nemirovskii, and Hasminskii (1987), attainability of the modulus bound can fail already for quadtic

functionals. Our calculations, which we plan to present in another paper, give examples where the

modulus bound and the Farrel/Stone/Hasminskii bounds fail to give the optimal rate, but our new

bound can be computed, satisfies the hypothesis (2.9), and so gives the right rate.

An interesting feature of our approach is the use of notation and techniques due to Le Cam (1973,

1975, 1985) and Birg6 (1983). In brief, the idea is that the difficulty of an estimation problem ought to

be determined by the difficulty of a corresponding testing problem. As Le Cam has shown how to

bound the difficulty of certain testing problems in terms of Hellinger affinity, and has developed certain

useful tools for computing Hellinger affinity, his machinery is well suited for this paper, which seeks to

relate the Hellinger modulus to the difficulty of certain tests. In particular, Le Cam's little known result,

given below as Lemma 3.4, is fundamental. Also, a technique of Birg6 (1983) allows us to translate

exponential bounds on testing efrors (such as (2.10)) into bounds on expectations of well-behaved loss

functions (such as (2.12)).

These results should be compared with those of Birg6 (1983). He found that for the problem of

estimating the entire density (and not just a single functional of it), the geometry of the problem,

expressed in terms of certain dimension numbers, determines the optimal rate. In this paper, we show

that for estimating a linear functional, the geometry, expressed in tenns of the modulus of continuity,

determines the optimal rate. We note that the problem of recovering the entire density is like recover-

Page 6: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 5 -

ing a whole collection of linear functionals, and so is in some sense a linear problem. Thus, our work

and Birg6's both say that for linear problems the optimal rate derives from the geometry of the prob

lem.

Page 7: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 6 -

2. An Attainable Bound

As in section 1, let T be a functional of interest, and let F be the regularity class in which F is

known to lie. Let Fs, and F2, +,& denote the subsets of F where T takes values s t and a t + A, respec-

tively. Let F^',) denote the set of product measures of X1, . . . X,, iid F, F c F.,, and similarly for

F(A*)+i. Denote by conv (F(^,) the set of all measures on RI which can be gotten as convex combina-

tions of the product measures in F(.) Such a measure corresponds to the following: a random device is

used to select an element F a FS,, and then n observations are taken from this realized F. In words,

conv (Fr,,)) represents all the joint distributions of data X1, X,, which can be obtained by Baye-

sians under a scheme in which (X1, X,,) and F are random, with X1,*** X. conditionally i.i.d. F,

and where F is a random element taking values in F,,.

Let P and Q be probability distributions on a common space. Then the testing affinity (LeCam

(1973), (1985)) is

JC(P,Q) = iif Ep ¢ + EQ (1 - (2.1)

it is the sum of errors of the best test between P and Q. If P and Q are sets of measures, let i (P, Q)

denote the largest testing affinity tr(P,Q) between any pair (P,Q) with P a P and Q e Q-the

difficulty of the hardest "two-point" testing problem. We note, following Le Cam (1973, 1985) that if

we view P and Q as composite hypotheses, the minimax risk, i.e. the risk of the best test for separating

P and Q is iX (conv (P), conv (Q)). (Unless P and Q are convex, this minimax risk is usually unequal to

the risk t (P, Q) of the hardest 2-point problem). We note that ir(P ,Q) = I - I L I(P,Q), where

L 1(P ,Q) = fjdP -dQ I denotes the LI distance, so computing the minimax risk amounts to finding the

L 1 distance between the convex hulls of P and Q. Note that 0 5 i 5 1.

2.1. The Lower Bound

Our two main definitions are as follows. The upper affinity aA (n, A) of the estimation problem is

aA, (n,A) = sup i (conv (F(")), conv (F(),)&)). (2.2)

This is the minimax risk of the hardest problem of distinguishing Ho: F,, and H1: F, + a at sample

Page 8: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 7 -

size n. Next, we let AA (n,a) be the function inverse to aA:

AA (n,a) = sup (A: aA (n,A) 2 C).In words, AA (n,a) measures the largest A at which, in a sample of size n, one cannot test hypotheses

Ho: Fs, and H1: Fat +A with sum of errors less hn a. As one might expect based on the exposition

in [GR 1], AA places certain limits on how well T can be estimated. Essentially, this is because any

estimator T,, of T gives rise to a test decide Ho if Tx s T + A/2, decide H1 if T. > T + A/2.

Theorem 2.1 (Lower Bound).

inf sr PF (I T, - T (F) I AA(n,a)/2) > a/ 2 (2.3)

Proof. Without loss of generality let the supremum over t in the definition of aA be attained, at to,

and the supremum over A in the definition of AA be attained; otherwise an eF and E2 would have to be

added in several places below, and later picked arbitrarily close to zero.

The minimax risk for testing between Ho: Fsto and H1: F.t,+A is a. It follows that for any test

statistic

a :s SUE PFO (reject Ho) + PF1 (accept Ho)Foe YstO0

F1eF2t

so

a/2 supF0EI max(PFO (reject Ho) .PF1 (accept Ho)).Fie Fato+

This implies that the test mentioned earlier, based on T, has at least the indicated maximum of Type I

and Type II errors. Now

PFO(ITo - T (F) I . A/ 2) > PF (T - T (F) > A/ 2)= PF (reject HO)

and similarly

PF1 (ITx -T (F)i 2 A/2) 2 PF1 (accept Ho).Combining the last 3 displays gives

Page 9: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 8 -

F0eF, Fmax PF (IT -T(F)| a A/2) > a/2 (2.4)0'

F1e F2 to+,

as F0, F1 e F and TX was arbitrary, (2.3) is proved. 0

Corollary. For each a in (0, 1), AA (n a) is a bound on the rate of convergence: for any symmetric

increasing lossfunction I (t),

inf sVp EF l(T, -T (F)) 2 l (AA(n,a)/2) a/2 (2.5)

for all n.

The reader should note that AA is (nearly) the best lower bound derivable by a testing argument...

Indeed, for each t, A, and n, there exists a test between F, and F2,+, which attains the lower bound

(2.4) within a factor 2. Thus the key inequality (2.4) cannot be improved by more than a factor 2.

In the form we have stated it here, the lower bound is original. However, there is some relation with a

bound on the size of confidence sets, due to Meyer (1977). The only examples the authors know of

where an attempt is made to calculate something resembling this bound are Hall and Marron (1987) and

Ritov and Bickel (1987). In both examples, the authors are attempting to lower bound an estimation

error by the Bayes risk in testing between highly composite finite hypotheses. While they don't expli-

citly define any of the quantities we will deal with in this paper, a sympathetic reader may agree that

their efforts are in the same direction.

2.2. An Estimator derived from Minimax Tests

It is reasonable to guess that because the bound (2.3) cannot be substantially improved by a test-

ing argument, it might be nearly attainable. Let us consider, then, constructing an estimator using the

minimax tests which come close to attaining the key inequality (2.4). The minimax test for a given n,

t, and A may be thought of as follows: It defines an acceptance region, a measurable set

A = A (t,n,A) c R , such that if the sample X1,X2, * ,X,, falls in A, we accept H0: F!,; otherwise

we reject Ho. The existence of such tests allows us to "construct" an estimator. This estimator is not

intended to be implemented on a computer; but its finite, concrete character allows us to demonstrate

that the bound AA (n,a) can be (nearly) attained in great generality. In order to guarantee that a

Page 10: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 9 -

minimax test has the indicated form, it is convenient to assume all the elements of F are absolutely

continuous with respect to a fixed measure.

The Binary Search Estimator

The estimator we propose requires that T be bounded on F: M = SUPF I T (F) I < . The estima-

tor has a "tuning constant" A, which will depend in a prescribed way with sample size. At a given

sample size n, A is fixed and we proceed as follows. Let N =N(M,A) be the smalest integer such that

3(-v A>2M. Let IN 32)N /2 ad hN=+( ) A/2. Then the interval [IN,hN] contains [-M M]. At2 2 2

this point we proceed as follows. Given data X1 . . X,,, we perform a minimax test between the

upper third of [lN,h,,] and the lower third, i.e. we test F,-M/3 against F,M/3. We then form a new inter-

val [lN-1,hN-1] by deleting from the current one whichever third - upper or lower - is rejected by the

test. After testing the lower third of the new interval [IN1,hN.1.] against the upper third, we form the

interval [lN1-2,hv-2] by deleting from [lN-1,hN-1I whichever third was rejected. Continuing in this way,

we get a sequence of intervals, each one 2/3 as long as the previous one; we arrive after N stages at an

interval [lo,hol of length A, and we pick as our estimate T. the midpoint of this interval. The key

result about the behavior of this procedure is

Lemma 2.2. Apply the binary search estimator with parameters A, M, and N. Set lo=-2 and

nk = (2)kA for k .1. Set dk 3)Afor k =O,* . Then

N-1,UpPF IT, - T(F )I > 1,n I < l; aA4 (n ,di ).(26Spr i=kt(26

In particular,

A N-1PF {ITm-T(F)I >A) < ZaA(nX 1 ( 3 )kCA)2 k=O '2 2

which makes an interesting comparison with (2.3). Below we will see that under (2.9), aA (n 3(-)kA)

decreases rapidly with k, and this upper bound is comparable with the lower bound (2.3).

Proof. We first give a formal description of the algorithm.

Page 11: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 10 -

Algorithm Estimate ( A N :

k :=N

'N -1 2

hN :=1 3 A

while k > 0 do

ak :=tk+-j(h-Ilk)

bk :=lk+ (hk -lk)

Test Ho:Fgak against Hi:FZb,

if Accept Ho then /* new interval is (l,b) */Ik-I := 1k ; hk-l := bk

if Reject Ho then /* new interval is (a,h) */lk-I:= ak ; hk1:= hk

k :=k-1end while

T, = (10+ho)/2

end Algorithm

Suppose that in place of the Test step in the algorithm, we could substitute an oracle that always

answered correctly. Running such an ideal algorithm would produce sequences ((l*,hZ),k =0,*** N)

and ((akb ), k = 1, N all functionals of F.

Consider now the tests 1* ,t, with tk rninimax for testing

Ho:F,. versus H1:Fbk..The probability that 4k decides incorrectly is

7c(conv(F,),conv(F(r) ) < aA(n,b*-a*) CAA(n,dk-l) (2.7)

Consider now (2.6), and let k >0. If the tests 4i all decide correctly for i =k+1, ,I, then

T, E (I*,h*) and so IT, -T(F)I s h -1k Ilk. Therefore,N

P(IT,,-T(F)I>lkl S P( U ft,decides incorrectly)) (2.8)i=k+1N

< £ P E,i decides incorrectly)i=k+l

Page 12: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 11 -

N-l< Z caA(ndi);

the last step uses (2.7). The argument in the case k =0 is similar. Cl

N-1While the sum £ aA (n ,di) may look difficult to work with, a simple hypothesis on AA (n ,a) affords a

useful bound.

Theorem 2.3. Let asce (0,1) be ftxed. Suppose there exist q >0 and 0 <A0 <AI <o0 so that

A0 Ilg al < AA(n,a) s Al Ilogal] (2.9)for Ilogal/n <e-0. Pick nO so that Ilogal/n0 < e0 and ao aA(nOA0/A eg'2) < 1. Define

f3 ' llog(a (2-a))l and = Ilog(ao (2-ao))I. Then with A C AA(n,a) and di as in4 AO 4n0

Lemma 2.2

N-1 20Z aA(n,di) < +r,, (2.10a)

N-i o2i- a(n,di) < 1_02 + r, (2.1Ob)

for n > 2no, where

0 exp(-C2"q ,B)

r. - 1log(A3M'2) exp(-n y) (2.11)

The proof is given in section 7. In view of (2.6), these bounds imply that for the binary search estima-

tor with parameters A, M, and N, we can have P (IT,,-TI > KAA(n,a)) as near zero as we like, by

choosing C large and K still larger. Thus AA (n,a) is the optimal rate of convergence (Compare (2.5)).

A more precise statement is possible for well-behaved loss functions (recall the definition in the intro-

duction).

Theorem 2.4. Suppose that 1(t) is well-behaved with constant a, and that (2.9) holds. Pick C so large

that 02a < 1. Then for the binary search estimator with parameter A = C AA (n ,a) we have

EF l(Tf-T) FA F(A(n,a)) n >nj (2.12)for every F EF, where

Page 13: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 12 -

A20a (2 10a (2.13)1

1-02aThe proof is in section 7. Combining (2.12) with (2.5) gives

Corollary. Under the assunptions of Theorem 2.4,

inf sup EFI(T -T) X l(AA (n,a))TX,In words, the minimax risk has the same asymptotic behavior as l (AA (n ,a)), to within constants.

This use of minimax tests to construct estimators is inspired by work of Le Cam (1973), (1975),

(1985) and by Birg6 (1983). The Le Cam-Birg6 approach was developed for the problem of estimating

an entire density, not just a single functional of it. It is based on covering the space F by Hellinger

balls and then testing between balls to see in which ball the true F lies. Our approach differs, in that

we are testing between level sets of the functionals in question. As far as the authors can see, testing

between balls could not give the results we are looking for.

Page 14: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 13 -

3. Attainability and Linearity

The reader may suppose, rightly, that AA is not easy to calculate. In the important special case

where T is linear, it may be bounded using the modulus of continuity, as we show in this section.

3.1. The Main Result

Theorem 3.1. Suppose T is linear and F is convex. Fix coe (0,1) and aoce (0,1). Then for a .caO

and log a l/n < F0 there exist universal constants c , C with

b(c 7Ilg l) < AA (n,a) e, b(C Ilo a (3.1)We may take C = 412 and c =1/2, for ao, F0 small enough.

If b is Holderian, (3.1) establishes assumption (2.9). Invoking now the Corollaries of Theorems 2.1 and

2.4, we get the Corollary cited in the introduction.

We should emphasize that an inequality of this sort should not be expected for every functional -- the

modulus bound is simply not attainable in general. The lower bound can always be established; it is the

upper bound that may fail.

3.2. The Best 2-Point Testing Bound

To clarify matters somewhat, let us introduce yet another lower bound on the rate of convergence. The

"two-point testing bound" A2 (n ,a) is defined as follows. Let

a2(n,A) = sup 7 (F('), F (A)+,&) (3.2)

Note the omission of the convex hull operation in comparison with the definition (2.2) of aA. Similarly,

let A2 (n a) be the inverse function of a2. We can also write

A2(n,a) = sup( IT (Fl) - T (F0) :n(Ff`)FP)) >a). (3.3)

This is a lower bound on the rate of convergence. Indeed, as a < aA , we have

A2(n,a) c AA (n,a); (3.4)

as AA has the lower bound property (2.3), it follows that A2 is a lower bouind as well. Thus, (2.3) holds

with A2 in place of AA. One could also argue directly -- compare Theorem 2.1 of [GR I].

Page 15: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 14 -

One can say more; L2 iS (nearly) the best possible two-point testing bound. Thus, for a given n and a,

the largest 8 for which there exists a pair (F0,F1) with T(F1)- T(F0) > 8, and which cannot be dis-

tinguished by the best test with sum of errors better than a, is precisely A2(n,a). No 2-point bound on

the maximum probability of error can exceed a, while this bound guarantees at least ac2.

The two point bound is closely related to the modulus. Indeed we have

Lemma 3.2. Fix co (0,1) and ao e (0,1). There exist constants c and C so that for a<cao< 1 and

log aVn <o0,

b [c lg al] 5 A2(n,a) s5 b c Ito& al] (3.5)

We may take C = F and c2/2 = (I -e*) log (2-ao)caFE0 log ao

Before giving the proof, we need some facts from Le Cam (1973), (1985, Chapter 4). First, recall the

Hellinger Affinity

P(P,Q) = J4IpdAd (3.6)where p and q denote densities with respect to a measure j. which dominates P and Q (e.g.

A =P + Q). We have the inequalities

JC (P,Q ) 5 p (p, Q), p2 <_ 7(2-nt) 37

where x is the testing affinity, and the identity

p(P,Q) = -(2-H2(P,Q)) (3.8)2

where H denotes Hellinger distance. We also have the elementary, but very useful, formula

P(P (")Q(-))= p(P,Q)a (3.9)

where P(a) and Q(a) denote n-fold product measures with marginals P and Q. Arned with these, we

can proceed.

Proof. Define

ho(n,a) = inf (H (Fl,Fo): J(Ff"),FR)) s a)and

Page 16: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 15 -

hI (n,a) = sup (H (F 1,F 0): n (F F(V') . a).Using (3.6)-(3.9), we have the easy inequalities

h 2 (n,a) a 2(1 - (a(2-a))112") (3.10)

h2 (n, a) !r 2(I - al'^ 3.1

Combining these with the definition of b (e), we have

b (ho(n,a)) 5 A2(n,a) s b (h1(n,a)). (3.12)The result then follows by (3.13) and (3.14) below. 0

Lemma 3.3.

(1 - al"A) s ,logal (3.13)

Fix ao < 1, e>O. There exists a finite positive constant c so that for a < aD, I log a lln <co we have

(1-(2a)2n) > c2/2 log (3.14)n

We may take c 2/2 = (1-e 4C) log (2-a0) a0C0 log ao

This result is proved in the appendix, section 7.

In particular, If b (e) is Holderian, then b (n -"2) is equivalent, to within constants, with A2(n ,a). And so

the question of the attainability, as regards rate, of b (n -"2) is equivalent to the attainability of the best

2-point testing bound. Compare also section 6 of [GR I].

The reader will note that (3.4)-(3.5) together establish the lower bound of (3.1) -- without any

hypotheses on T or F.

3.3. Establishing the Upper Bound

Le Cam has established a fact which seems, at first, quite similar to (3.9) but is in fact far deeper.

Lemma 3.4 (LeCam, 1985, Chapter 16, page 477). Let P and Q denote sets of probabilities and P(a),

Q(n) the sets of corresponding product measures. Then

p (conv P(a), conv Q(M)) < p (conv P, cony Q)M. (3.15)

We remark that this is not an obvious consequence of the identity p (P(,Q)) p (P , Q)". Combin-

ing (3.7), (3.15), and the definition of aA, we have

Page 17: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 16 -

Corollary.

aCA (n,A) s sup p(conv (F.S), cony (F.,t (3.16)

Thus the Hellinger Distance between the convex hulls of F,t and F., + A may be used to bound a,A.

The upper bound in (3.1) follows more or less directly from this. To see how, notice that

£ = inffH(Fts,Fzt+b(E))- (3.17)

Combining this with (3.8) we have

p(FS,, I+bc) . 1 - e2/2. (3.18)Now, and this is the key observation, if T is a linear functional, and if F is convex, then F,t and

F,, +A are both convex, for all t and all A. Thus F,, = convyF,, and F?t+b(e)=convF.,+b(e);

combining (3.16) and (3.18),

aA (n,b (e)) s (1-e2/2)" (3.19)and so

AA (n,za) s b (V2(1 -aT'^)). (3.20)

At this point we invoke again Lemma 3.3. Equation (3.13), combined with (3.20), gives the upper

bound in (3.1). This completes the proof of Theorem 3.1.

Page 18: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 17 -

4. Attainability and the Minimax Identity

In general, a relation such as (3.1) between b (n -'2) and AA (n, a) is not to be expected. It

requires essentially that the hardest two-point subproblem of testing Fv versus Ft+,A be roughly as hard

as the full problem. Let us see how.

4.1. The Minimax Identity

The 2-point testing bound and the attainable bound have an interesting connection. As (3.4)

shows, the 2-point bound is always smaller; as (3.1) and (3.5) make plain, when T is linear and F is

convex

AA (n,a) . C A2(n,a) (4.1)for an appropriate constant C, for small a and large n.

It seems natural to ask if the 2-point and the attainable bounds can ever agree, i.e. if we can

have C=1 in (4.1). Chasing a few definitions, this leads in turn to the question of whether we can have

it (conv (F (t)), conv (F ( A)) = it (F (r), F (A)+ A); (4.2)

Indeed, the quantity on the left hand side is the main ingredient in the definition of AA, while that on

the left is the main ingredient in A2. Now if we return to the definition of i as a measure of the

difficulty of testing, we see that the quantity on the left is

inf sup RX (C,(Fo,F ))tests FoE r5tCFslFe I+

where R. (C,(F0,F1)) is the "risk"' EFh) (+EF(M) (I - C) representing the sum of errors of the test 4.

This is the minimax risk for the problem of testing the composite hypotheses F<t versus F2,+A. On the

other hand, the quantity on the right of (4.2) is

sup inf R. (Fo.F1)).0o 5t tatF0EF>t,&

This is the risk of the hardest 2-point testing problem. Consequently, the identity (4.2) is equivalent to

the minimax identity

Page 19: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 18 -

inf suR, (,F(F0,F)) = su infR. (C, (F0,F 1)). (4.3)

This identity says, in words, that the minimax risk in testing between the infinite dimensional composite

hypotheses F5, and F.t + & is precisely the risk of the hardest 2-point testing problem.

We will see below two concrete examples where this minimax identity holds. For clarity, we summar-

ize some implications the identity would have

Lemma 4.1. If (4.2) holds for every t, and n, and all A<AO then AA = A2 for large n, and b (n1/2)

represents the optimal rate of convergence of an estimate T,, to T.

Indeed, the conclusion that AA = A2 foUows from the definition of these quantities, and the conclusion

that b (n-112) is the optimal rate follows from (3.5), and Theorem 3.1.

It does happen that (4.2) holds in interesting examples.

Theorem 4.2. Let T(F) = f(O) and let F be the Sacks-Ylvisaker (1981) class

SY = (f:f (x)=f (0)+xf'(0)+r(x),f(0)sM, Jf =1, f 0

Ir(x)l s x2/2}.

(Here we must have 43 M 2< 1). Then for every t and n and every A small enough, the minimax

identity (4.2) holds, and so AA = A2for large n.

It is known that in general, one cannot expect (4.2) to hold. One case when (4.2) does hold is

when the sets F., and Fz,+Aj are generated by capacities -- see Huber and Strassen (1973), Bednarski

(1982). This is much stronger than simple convexity of the two sets. However, Le Cam's result, as

recorded in Lemma 3.4 above, says that convexity alone is enough to guarantee that a certain approxi-

mate minumax identity holds.

Lemma 4.3. If F,t and F2,+t., are both convex,

p(conv (F(,)), conv (F( .)) = p(F(t). F( 4 (4.4)

This says that, although (4.2) may not hold when just convexity is assumed, its analog, with X replaced

by p, does hold.

Page 20: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 19 -

Proof. We have

P(Fr. ,Fj,+AY' > p( conY (Fr)), conv (F`'t))p( Fet), F,(#&)A

=p( Fst, F>t,+A`,the first line following from Lemma 3.4, and the assumed convexity; the second from the obvious inclu-

sion relation; and the third from the formula (3.9) for affinity of product measures. As the first and last

quantities are the same, it follows that the middle inequality is actually an equality. Hence, (4.4). 0

Because of the inequalities

n < p, p2 < (2)

(4.4) places definite limits on how different the two sides of (4.2) can be, for large n. In fact, we get

for the ratio of logarithms that

1 log ( <) < 2(1+l1/logaA(n A)I). (4.5)Ilog aLA (n A)

Thus, at every n and A for which aLA (n ,A) < co < 1, we can bound the discrepancy between a2 and aCA.

In this sense, Le Cam's Lemma 3.4, which underlies (4.4), is an approximate minimax theorem. And

one could say that Theorem 3.1 holds because (4.2) "almost" holds when T is linear and F is convex.

4.2. A near-equivalence

In the case where T is linear and F convex, we have seen that AA < CA2 and also that

Ilogoc2i < M ilogaA I+D. In generl, whatever be T and F, these two accompany each other, so that if

one holds, so does the other. This gives a clue to the general attainability issue; attainability of b (n 1/2)

really does imply that the two sides of (4.2) are close -- but only in the sense that an inequality on log-

arithms such as (4.5) holds. We state two formal results; they are proved in section 7.

Theorem 4.4. Suppose that b (e) is Holderian with exponent q E (0,1]. Then there are constants

ao E (0,1/2) and Eo E (0,1) with the following property. If there exists a finite positive M such that

log a2(n ,A) I A ( A) O (4.6)tlogaA(n A)

then there exidsts a finite positive C such that

Page 21: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 20 -

AA (n,a) < C A2(n,a) a.ao, Ilogal/n<e0.Theorem 4.5. Suppose that b(e) is Holderian with exponent q e (0,1]. Then there are constants

eoE (0,1/2) and co G (0,1) with the following property. If there exists a finite positive C such that

AA(n ,a) . CA2(n ,a) a(<ao< 1, IlogaVn <e0.

then there exists a finite positive M such that

Ilog a2(n ,A) IIlogA(nAi ~M aA(n,A).: ao. (4.7)llog aA (n A^) I a

In the sequel [GR Ml and in Donoho and Liu (1988c) we give examples where a minimax iden-

tity is key to attainability of the modulus at the level of constants. Compare also Ibragimov and

Hasminskii (1984); this paper, although it does not use the modulus of continuity, shows a connection,

between a minimax theorem and precise evaluation of constants in certain nonparametric estimation

problems.

5. Attainability in two nonlinear cases

In this section we study two nonlinear functionals in order to see how the ideas of the preceding sec-

tions carry over. In the first example, the minimax identity (4.2) holds, and everything flows automati-

cally. In the second example, (4.2) fails, and we must work hard with our bare hands.

5.1. Estimating Tail Rates

While it is most intuitive to consider estimating the rate at which the tail of a density approaches

0 as x -*00 (compare Du Mouchel (1983)), a transfonnation of the problem (to observations Yi= /X5)

leads one to consider estimating the rate at which a density, known to be zero at the origin, approaches

this limit as x -+0k (compare Hall and Welsh (1984)). We adopt this point of view here. Accordingly,

let F be the set of distributions supported on (0,00) with densities f satisfying

f (x) = Cx' (1 +r(x)) O<x <8(5.1)with

0< to < t < tl <co (5.2a)and

Page 22: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 21 -

O<C_.< C .C+<0 (5.2b)

and

Jr (x )i < c 2X'p (5.2c)

For such an F E F, let T(F) = t, where t is the exponent in (5.1). This functional is nonlinear.

Consider now the problem of testing F., against F2t+a. In [GR I] we have shown that the closest

pair in a Hellinger sense has the form

f (x) = C..tx'(1-c2xP) x<a (t,A) (5.3)

f"l(x) = C+x'+(1+c2xP) x <a1(t,A) (5.4)and

fO*(X) f *(al-u-- x > a, (5.5)

fi (x) fl (a,)and

a,*( ) 1- JfO (v)dv

o (a1) _0 ~~~~~~~~~~~(5.6)fl (al) a,1 - fff (v)dv

0

As we will see, this closest Hellinger pair represents the hardest two-point testing problem. F'nm the

properties of this pair, we can show that the minimax identity (4.2) holds in this case.

Theorem 5.1. For the pair (Fo,F;) described above, we have

it (conv (F'), conv (F,,+,)) = it (F, F,,+,) = it((F* )(n),(Ft )(n))and the minimax test between Fs, and F>,+A is the likelihood ratio test between F and F

Proof. The likelihood ratio Lt,A(x) = f 1 (x)/f (x) has, according to (5.3)-(5.5), the forrn

[C A(1+c2xP) O<x<a,C_ (I1-C 2X'P)

Lt,A(x) = 1 C+ (1O+c2a) (5.7)C_ 1-c2a¶)

This is a non-decreasing function of x.

Among all distributions in F<t, Fo is the stochastically largest. Similarly, among all distributions in

F>,+A, F1 is the stochasticaUy smallest. This implies that the distribution of L,tA(X), where X is

Page 23: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 22 -

distributed F, is stochastically largest under the null hypothesis at F =F*, and stochastically smallest

under the altemative hypothesis at F =Ft.

Now let X1, * X, be iid F. Consider the likelihood ratio statistic

A

Ln,,A= .rI LtA(X,. (5.8)

Under HO:F_, this statistic is then stochastically largest at F =Fo, etc. Therefore, if we consider

accepting Ho when L.,t,A5 1 and rejecting when L,,,,a > 1, we have

SUp PF (Reject Ho0 = PF * (Reject Ho)FeF_ 0

SpUPF (Ac(p ) PF (Accept Ho)

F 6 _.t^

It follows that the worst sum of Type I and Type II errors of our test occurs at (Fo ,F ). But the

Likelihood Ratio test is optimal for that pair, and hence it is minimax. 5

As we show in [GR I], the modulus is in this case not Holderian, so that Lemma 4.1 in this case does

not apply. However, we can use the minimax identity to show attainability in a different way. An

extra level of structure in the minimax tests of section 2 may exist which we have not previously con-

sidered: monotonicity in t. We can state this in terms of acceptance regions as

A (t,n,A) ' A (t + h, n,A) Vh >O. (5.9)

The following result is proved in section 7.

Theorem 5.2. For all sufficiently small A, the likelihood ratio Lt,,A(x) is monotone decreasing in t for

each fixed x.

It follows from this theorem that the minimax test for our problem has acceptance region

A (t,n,A) = ((Xi)&,^: 17 L,A,(Xi) s 1)i=1

with the monotonicity property (5.9).

Consider what we call the likelihood ratio estimator

*AAT,, =-2 + sup (t : TL,,A(Xi) . 1, t t[t0,t1-A]). (5.10)

By the monotonicity established in Theorem 5.3, T.,, is always uniquely defined.

Page 24: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 3 -

Theorem 5.3. Suppose that Li,A(x) is monotone decreasing in t for each fied x. Then

s pPF(IT.e - T(F)II > 2}s 2ac (n,A) (5.11)

This is to be compared with the lower bound (2.3); it is parallel in form; but in the lower bound the 2 a

is replaced by cc/ 2.

Proof. By the monotonicity in t of LtA,

TnA - T(F) > A/2

happens if and only if the minimax test between HO: FST(F) and H1: FzT(F>+s would reject Ho. The

probability of this event is smaller than aA (n, A) by definition. Similarly, the probability of the event

T,lA- T (F) < -A/ 2

is also less than aA (n,A). As I T^,,- T (F) I > - is the union of these two events, (5.11) follows. 1

2

Thus, in this case, the lower bound AA (=A2) is achievable ithin a factor 4. We consider it likely that

the factor 2 on the right hand side of (5.11) can be dropped (asymptotically).

5.2. Estimating the Mode

Now let F be the class of distributions with unimodal densities f, that are uniformly bounded:

f(x) < M (5.12)and have quadratic maxima:

f (mode )-c_x2 <f (x) <f (mode)c+x2 Ix -mode I <8. (5.13)

Let T(F) = mode (F).

In [GR I] the modulus was computed for this problem, and so the closest pair in Hellinger distance was

derived; it has the form (for A small enough)

fo (x) = M -c+(x-t)2 x E (t-a2(A),t+a3(A)) (5.14a)f O (x) = M -c +a 3(A)2 x E (t+a3(A),t+A-a3(A)) (5.14b)

f*(x) = M -cC _(x _t)2 x E (t+A-a3(A),t+A+a2(A)). (5.14c)

fA (x) = fo(2(t+A

)-x) (5.15)f 0(x) = f (x) x d (t-a2(A),t+A+a2(A)) (5.16)

Page 25: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 24 -

This closest Hellinger pair probably represents the hardest 2-point testing problem. A proof based on

stochastic minorization, as in Theorems 4.2 and 5.1, will not quite work here, however. On the other

hand, one can show, using the convergence of experiments approach of [GR M], section 7, that this

pair is asymptotically hardest That is, if we set A = cn"5, then for large n, we will have

n((Fo )(*

,(F1 )( )) = F7) F2 (1 +o(1)).

However, the minimax identity (4.2) definitely does not hold in this case. Consider using the likelihood

ratio of this pair to test Fo against F1, where Fo has its mode at t-28 and F1 has its mode at t+28,

and both fo and fI are equal on the interval (t -- tt+A). Then F0e Fr, and F1 E F2,+,, but the likeli-2'

hood ratio statistic based on f l (x )/f o (x) has the same distribution under F0 as under F 1. Conse-

quently, the worst sum of type I and type II errors of this test is 1.

Although the likelihood ratio between Fo and F1 cannot furnish us with a useful test, it does suggest

a useful estimator. Roughly speaking, for small A, the likelihood ratio test decides in favor of Fo if

Z K4(X, -t ) 2 1Kx(Xi -(t +A))and in favor of F if

Y.Kn (Xi -t ) < 1:Ka (Xi -(t +A)),where

K (u) = log [M-C+U] uI<a3(KX() =logM - c+a 3(A)2 |U|<a3^

= 0 else.

This suggests the estimator

TM+., = arg max- IEKm(Xi -t).n

(Note that the maximum need not be unique). This estimator is closely related to estimating a density

using Epanechnikov's Kernel with bandwidth a3(A) and setting T,,, to be (any) maximizer of the

estimated density. Let us analyze the behavior of any such Kernel estimate of the mode. Our rate result

applies to any kernel satisfying

Page 26: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 25

Assumption (K). K is a positive, even function of compact support, bounded, square integrable, and

absolutely continuous, with

IIKI12<oo. IIKII<.*and

IIK 112<00, I1K'II.<oo,where the norms of K' are defined distributionally, and so represent the smallest constants C2 and Coo

for which

IIK()-K(-8)112 < C28

IIK(i)-K( -S)II. C..8are valid.

Theorem 5.4. Let T be the mode, and F be as in (5.12)-(5.13). Then b(e) is Holderian with exponent

2/5, and so no estimator can achieve faster than an n-115 rate of convergence uniformly over F:

lir inf inf sup PF ( IT. -TI > b (n 1/2) > e -12/2A --" T,

Let K satisfy the assumption (K) above and let h,, = cn-115. Let T,("k) be any maximizer of

f(Pt =t I K( h )1h,n i=1 hm

Then T,(k) attains the n115 rate uniformly over F:

lim lim sup stFp PF ( I T,(k)- TI > Cb(n"-12)) 0C PI-40

This is proved in section 7.

It follows from this theorem that the estimator T,A, properly tuned, achieves the optimal rate of

convergence, although its actual performance is not measured by ( This may be understood

as follows. Roughly speaking, ,(F(7)F27+A) is comparable to

sup sup PF{n- Km (Xi - T (F )) < - K (Xi - (T(F )+u))) (5.17)rM>A n i=1 n i=i

whereas ir(conv (F(t)),conv (F is comparable to

SUFP PF(-Y-Kn(Xi-T(F)) < sup-I K,,(Xi-(T(F)+u))} (5.18)Underlyin the prof o n i=i

Underlying the proof of Theorem 5.4 is the idea that although (5.18) is much larger than (5.17), they

Page 27: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 26 -

are still comparable.

Hasminsidi (1979) established a lower bound for estimation of the mode also of the order n1/5,

although his results do not quite cover our class. Hasminskii claims in this article that the n-115 rate is

attainable, and that the results of Venter (1967) show this. However, Venter's work only establishes

individual -- rather than uniforn -- rates, and only almost sure -- rather than in-probability -- rates.

Using the Lemmas 7.1 and 7.2 proved below, and some facts about Itf, - Ef, II.. due to Silverman

(1978), it is possible to show that the almost sure rate suggested by Venter's result -- log n /ni/s-n does

indeed hold uniformly over F. However, to show that n-115 is the optimal rate in probability seems

genuinely harder; here we do this by using Bemstein's inequality and a chaining argument. Thus,

Theorem 5.4 verifies Hasminskii's claim and shows that n115 is the optimal rate for estimation of the

mode over the class F.

6. An interesting example

Consider now the nonlinear functional T(F)= ff2. Let F be the family of distributions supported

in [0,1] with densities bounded by M. Then, it follows from section 5 of [GR I] that b(e) .4Me. This

suggests that the rate n-1/2 might be attinable in estimating this functional.

In a very penetrating analysis, Ritov and Bickel (1987) have shown this guess to be very far from

true. Translating their results into the language of this paper, we have

Theorem (Ritov-Bickel). With T and F as above,

aA (n,A) = 1 for all n and AE (0,(M-1)/2). (6.1)

AA(n,a) > (M-1)/2 >0 for all n.

In short, no rate of any kind is available under these conditions. As b (e) = 0 (e) we thus have an exam-

ple where

A2(n ,a) = 0 (n"2)but

Page 28: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 27 -

AA (n,a) -4 0;

the two lower bounds behave as differently as it is reasonable to expect. In view of this result, there

may be a large and interesting class of cases where the 2-point and composite bounds are not compar-

able.

Page 29: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 28 -

7. Proofs

Proof of Theorem 2.3

Before proving (2.10a-b), we first establish some exponential bounds on a,A (n ,dk). We consider two

cases, depending on k. For k small,

(3 ),,CA (Ilga)q/2 S AOej/2 (Casel1)2 n

In this case, there exists an integer m satisfying

n A0 1 21q

&L)kCLAl1( 2 )cJand I log a I/m < e0. Then a calculation reveals that

AO og I)q12 > (3 CA loga)q/2rn 2 n

and, as I log a I/m . eo, (2.9) implies

AA(m,a) > dkand thus aCA (mrdk) Ca. Le Cam (1973) gives the formula

1tjm :5 (41##. (~2-7mwhere nj.m = i(conv (PU""),conv (Q(jm))) and 7cm = it(conv (P(m),conv (Q(m))) We conclude

aA(ynm,dk) < (4ac(2-a)Y.Putting] = Lnlmj and using monotonicity of aA in n we get

aA4.(n ,dkr) <- (4Ja(2=-a~))Iw-

= exp(--Ilog(a(2-aI))(n_-1))2 m

S exp{ 2 I log(C (2Ca)) [ILL Ao ] ii

= exp[-2pC2/q (3)2kbq + 2 log(a (2- a))I]

The hypothesis CA > 2q/2 impliesAO

Page 30: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 29 -

2P32/q (3)2k)q _ jilog(a(2-a))I > pC2/q(3)2k/q2 2 ~~~~~~~2for k =0,1,2, *-* * and so we have, in Case 1

aA(n,dk) < exp(-PC2'q( 3)/q) (7.1)

In Case 2, k is so large that condition (Case 1) does not hold. It follows that

dk >A-e1/2 (Case 2)A1

using the definition of no and arguing as above,

- -1

aA(nf,dk) < (qcao(2-a-o)) o

= exp(-2^yn + 21log(ao (2- ao))1)

Now as 2(-- 1) >- for n > 2no, we get 2yn - 2 log(aO (2 -cto))I > yn, and sono no

aA (n,dk)d exp(-n y), n > 2n0. (7.2)

Consider now (2. lOa). Let K be the number of dk satisfying (Case 2). Formally,

K =#(k:.AoeJ2<(-) A Al1 I )'2S(_)NA . As (2) AN 3M, K S log(3M)/1og(A oeg2); thus K is2 n 2 2

bounded independently of n and N. Now by (7.1) and (7.2), if n > 2n0

N-i N-K-1 N-1caA (nf,d,) = + Nk=1 ~~k=1 N-K

N-K-1 3< Z exp(-3C2Jq 2_)2kIq) + K exp(-n y)

k=1 2

S £ + r (7.3)k=I 2

If l= 0, then since (2 )2l'q > 2k for k = 1,2,***,2

E exp(-fCt(3)2) . exp(-fC') + : exp(-PC 2k) = 0 + 1o2k=O 2 k=1

Combining this with (7.3) gives (2.10a). If 1 >0, we have

7- exp(-C 'q(2)2*'q) exp(-3C 2k) = 2'

which, with (7.3) gives (2.10b) . 0

Page 31: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 30 -

Proof of Theorem 2.4

N-iEp l(T -T) 7£ P (llk+i > ITm -TI > Tk)(ll(k+1)k=O

N-IS £ P(ITx-TI>T1k)1((k+1)

20 N-i o2 N-i:5I(iOI_02 + t>1Z(IO+J i -

2 + 1;(Itl r..l(¶)~~ +k=1 0(l+)10

Now as 1 is well-behaved, I (k) = (( ) A)aakl (A), so

[ 20 okak+1 aN iiEF1(Tx -T) . I(A)L 02 a +k i 1 F0 a-lJ

(A) 2a + 1 a202 aN ]

Ll_021_02 1-aG02 a I

= l(A)(A2+A3,x),

say. Now as N . log (3M)/log (A) . log (3M)( 2( log(n )- log I log acI) - log(A o)),2

aNr,, = exp(-ny+1og(a)N) . exp(-ny+A410g(n)+A5) 00 ,n -.

Therefore, A3,,, -0. For large enough n, A3,,, < A2, and so l(A)(A2+A3)3 < 2A21(A). Then, as

I (A) =- I (C AA (n ,a)) < a 9"S'ISU1 (AA (n ,a)), we have (2.11) -with A = 2A 2 a [l0gClogl51 Q

Proof of Lemma 3.3

First, we prove (3.13). As (1 - a"") = 1 - exp - -log(a)I], (3.13) is equivalent to 1 - e- < e

As ex is a strictly convex function it lies above its tangent line at x = 0, so

eE1e -£

which is precisely what we need.

Now, we prove (3.14). Note first that if we put

g(e) = (1 - e ) / e

then for all £ < co < 1.5 we have

(1-et > g (co) c . (7.4)

Indeed, g(£) is a monotone decreasing function at least on [0,1.5) as may easily be seen from its power

Page 32: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 29 -

Indeed, g (e) is a monotone decreasing function at least on [0,1.5) as may easily be seen from its power

series

2 3 4~

g(e) = 1- -+ 3 4!_ 5 ..

2! 3 4! 5

absolutely convergent for all e > 0. So g (e) > g (e0) if 0 < e < e0 < 1.5; but this is just (7.4).

Now by monotonicity of a (2 - a) on 0 < a < 1,

inf log(a (2 - a)) _ log(ao (2 - ao))O < a < aO log(a) log(ao)

Consider now (3.14).

I - (a (2 - a))1/2A] = 1- exp[ llog(a (2 - a))J]]

g (co) Ilog(a (2 - a))Jn

2 g(co) log(ao (2 -ao)) Ilog(a)log(a0o)n

. c2/2 jlog(a)ln

if we put

c 2/2 = g (e0) log(a (2 - ao))1og(ao)

as claimed.

Proof of Theorem 4.2

[GR III gives the pair (F* ,F*) with F E Fs and F;E Fat+A

H(F;, F)= min H(F1, FO)F1 F

Foe F,

This closest pair in Hellinger distance has the form

fj(x)=t +A -x2 /2 Ix I.<sS(A)

Page 33: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 30 -

f7o=t+x2/2 IxIsl(A)

where sI(A) = 4'1 (1 + o(1)) as A-+0, and

f l (x) ff (sI) Ix 1> SI(A)

Define the likelihood ratio

-f1(x)

This is a monotone decreasing function of Ix 1. Consider the random variable L,,a(X) where X has dis-

tribution F. By definition of the Sacks Ylvisaker class, under the null hypotliesis Ho: F5t,

min (IX I, s1(A)) is stochastically largest at F = F*. Similarly, under the alternative, min (IX 1, sA(A))

is stochastically smallest at F = FI. Hence LI,A(X) is stochastically largest under the null at F = FO

and stochastically smallest under the alternative at F = F; . From this point on the proof is the same as

the proof of Theorem 5.1.

Proof of Theorem 4.4

As b is Holderian, if we pick co small enough, then by (3.5) there exist constants C_, C+ so that

C-( I lga 1)ql2 :5 A2(n,a) :9 C,(IlgaI)q2 75n n

for a< ao and I log aVn <co. Let m be an integer so that m/2 + 1 2 M. Now we claim that

jlog(aA (m n,A))I > m/2 + 1I (7.6)Ilog(aA (n, A))I (

Let us see why. Let PA (n A) denote a quantity similar to aA (n A), only defined using Hellinger affinity

rather than testing affinity. Then

aA (m n .A) . PA(m n ,A) < PA (n A)m. (2 aA (n )112)

where the second inequality follows from (3.15) and the third from (3.7). Thus

Ilog aA (m n,/) I > m [.S I log aA(nA) I + log 2].Now, for a(A < 1/2, log 21Ilog aA J>1, and so the last display proves (7.6).

Combining (7.6) with hypothesis (4.6) gives

Page 34: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 31 -

I log aA (m n,A)lo> g1Ba2(n,A)1It then follows that, with a= a2(n, A)

AA(m n, a) . A2(n,a)Now by (7.5)

aA2(n, a) 5 C+ [1log ]]

and

[1 logal ]q/2C.. mlog al A2(m n 'a).m n

Combining these,

A2(n,a) . [C+M/2] A2(m n,a) . (7.7)

Hence, for every k =m n,

AA(k, a) < CO A2(k, a) (7.8)

C+ Mq2'2where CO= C_ .

We now consider I which is not divisible by m: I = n m + r for r < m. As AA(n, a) is

monotone in n for fixed a

AA(1, a) SAA(m n,a)

and

A2((n + 1) m, a) < A2(1, a)

Now by a repeat of the reasoning behind (7.7),

A2(nim,a) [ C. ((n+1)/n)qC 2 A2((n + 1) m, a) (7.9)Combining these relations, and noting that (n +1)/n .2,

AA(1, a) s AA(m n, a)

< [ _ A2(m n, a) (by (7.8))

Page 35: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 32 -

< [C-] nq'2 2q12 A2((n+1)m, a) (by (7.9))

. C A2(I, a)

where C = L-C+j] m422'2.

Proof of Theorem 4.5

As in the last proof, (7.4) holds by hypothesis. Pick an integer m so thsa

C+Then

AA(m n, a) . C A2(m n ,a)

CC+( )q/2 < C 'C+ C _( )q2m n M2C_ n

. C g )qI2 <A/2(n, a)-n

Hence, with A = AA (m n, a)

a2(n, A) 2 a

Defining p2(n,A) in a fashion analogous to A2(n A), only using p in place of testing affinity, we have

that

p2(n A) 2 a

and also

p2(m n, A) 2am

Then from X - p22

a2(m n, a)> ! a2'

And so, if a < 1/2,

tloga2(m n, A)I 2m +1 (7.10)IlogaA (m n, A)I

Page 36: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 33 -

It follows that, for aA4 < 1/2, and k of the form m*n, we have (4.7) with M =2m+1. Consider now

=m n + r, O< r S m. By monotonicity of a2and a,A in n,

Ilog(a2 (1, A) < 1log a2 (m (n + 1), A)IIlogczA(a,A)I - Ilog aA (m n,A)I

and

log9a2 (m (n + 1), A)I Ilogca2 (2 m n, A)jI log a2 (m n, ) ilog a2 (m n,A)

Then, from (3.7) again,

a2(2m n A) > .5 p2(2m nA,)2 = .5 p2(m n,A)4 > .5 a2(m nA)4and so, combining the last three displays,

10Clog2 (1 8 < 10 m + 5Ilog aA~ (I, A)

which gives (4.7) in the general case with M = lOm + 5.

Proof of Theorem 5.3

(5.7) shows that Lt,.(x) is constant in t for x < a l(t A), and is a monotone increasing function of

a 1(t,A) for x > a Q(tA). Thus the proof requires showing that a 1(t,A) is monotone decreasing in t.

Now a I(t,A) is the value of x solving

X(x) = r(x,t) (7.11)

where

X(x) = C+ x (1 + C2X )C_ (I1-C2 XP )

and

1 - ff(v)dvr (x,t) =

1 - fo(v)dv

Now X is monotone increasing in x. We claim that for sufficiently small xo, x E (O,xo), 0 < Xo < 1,

r(x,t) is monotone decreasing in t. Then, at any (t4) pair at which the solution a 1(tA) of (7.11) falls

in the interval (O,xo), the solution must be monotone decreasing in t. Finally, a little calculus will show

Page 37: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 34 -

that for a given xo, there is a A0>O so that al(toA)<xo for A<A0, where to is the constant used in

(5.2a) defining the class F. Combining the last two sentences completes the proof.

It remains only to establish the claim, i.e. to show that r (x,t) is monotone decreasing in t. Put

r(x,t)= 1 (t)

where

a(t)=C+ 1+C2xP t+A+1

t) C_t + I

I - C2X'Vt1

Now one can easily verify that

az(t), [(t) are decreasing in t . (7.12)

Then monotonicity of r(x,t) follows from

1 - a(t) a'(t)

In fact, we can show that for xo small enough

-3(t < 2 < (7.13)1a-ct) a'(t)

for all t > O and all x E (O,xo).

Let us first establish the left hand inequality. This can be rewritten as

1 > 2 5(t) - a(t)

and as 2 ,3(t) + a(t) > 2 [(t) - a(t) it is implied by

1 > max 2[(t) + (t()

By (7.12), this reduces to

1 > 2 [(O) + a(O). (7.14)

Now pick xl so that

X ) x4+1 A+p +1 A+ 1 A+p+lI

Page 38: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 35 -

Then forx e (Ox 1)

2 f(O) + a(O)

= 2 C_x U( - CZ X + - O1+ C2 XP ++<=2C_X,(1-+C2 + )+C+ 1 (1 +C2x + )

p + A+1I A+p +1A+1

.2 C.x1(1+CZ )+C. (1 +C2 X'pp + A+IAp

<1

and so (7.14) follows. Thus the left hand side of (7.13) is established for x0 < x1.

We now consider the right hand inequality of (7.13). Now

5'(t) = P(x,t) [B - B2 + B3]a'(t) = '(x,t) [A1 - A2 + A3]

where

*+ 1

a(x,t) = X+1 Ilog(x)I

and

B1=-C... [c2 ( + 1) Xp1 - C ~t + p +1]

C2 (t + 1)xP1B2=C [1- (+) . 1BC. t +p + J (t + 1) jlog(x)I

C2 (t+1 ) C_ 2 x1

AB=C+1+-]x

Lt +p + 1)2 t +p +lJ Ilog(x)l

FlC2 _+C2A(t +A+1)x xA t +1_ 1L t+~~p +A+A+1Ij ( +1) Alog(x)

2=C+F2AL- 21X +

Lt +p (t+p +A+ 1)2 (t +A+ 1)2 110g(x)lThe desired inequality is then equivalent to

B1-B2+B3 < 2 (A1 -A2-A3) (7.15)

for all t and all x < xO.

Note that for x e (0,1), B 1 is increasing in t. Thus

Page 39: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 36 -

Bi< 1im Bi=-C_(1-C2xP)=B4(x);-. m

similarly A1 is decreasing in t and

Aj. lim Ai=-C+x' (1+C2xP)=A4(x).

Pick e > 0. For x2 small enough

B4(x) <2A4(x)-e

arid so by the obvious monotonicities in x,

B4(x) < 2 A4(x) - e

for all x e (O,x2). We will show below that B2, B3, A2, and A3 are negligible, in the sense that

21 + B31 + 2 21 + 2 31 < E

for x < X3. Then we have

B1 -B2+B3<B4+B2-B3< 2A4+B2-B3 -e [by (7.16) ]

= 2 A4 - 2 A2 + 2 A3 + B2 - B3 + 2 A2 - 2 A3 - E

< 2 (Al - A2 + A3) + (B21 + IB31 + 2 2j + 2 31) - E.2 (A1 -A2 +A3)

and so (7.15) follows, for x0 . min ( x2, X3).

Note the inequalities

(7.16)

(7.17)

[ by (7.17) ]

B2 . C. (1 + C2 XP) / 11og(X)l (7.18)1B31 . C. (2 C2 xP) / log(x)I

K12 . C+xA (1 + C2 xP) / log(x)IK31 < C+xA (2 C2 xP) / 1log(x)I

valid for t 2 0, 0 < x < 1. Pick X3 so small that the sum of the upper bounds in (7.18) is less than 2

Then we have

B21 + B31 + 2 21 + 2 31 < e

and this holds for all x < X3 by the monotonicity in x of the upper bounds in (7.18). Hence, (7.17).

Putung xo = min (xI, x2, x3) we see that both sides of (7.13) hold for all t 2 0 and all

X E (O0X0).

Page 40: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 37 -

Proof of Theorem 5.4

The claim about the modulus follows from Theorem 4.2 of [GR I]. The claim about the lower

bound follows from Theorem 2.1 of lGR I]. As b(n-1"2) is asymptotic to A n-115 for an appropriate

constant A, it turns out that to establish the final claim, it is sufficient to show that

T(k) T(F) = Op(n-115) uniformly in F.

Suppose without loss of generality that K is a probability density: f K = 1. Then fx is an

estimated density, and f, (t) = E f ,,(t) is a density.

Let t,, be any maximizer of f^(t). Let te be any maximizer of f^(t). By Lemma 7.1, the

assumption h. = c n-1/5 guarantees that t,, - T(F) = 0 (n115) uniformly in F. Thus the theorem is

proved, if we can show that t:, - t,, = Op(n l"5) also uniformly in F.

Now we have

f n(tn) 2 fM(t M) (7.19)and so

(f (tn) -fnQ(t)) - (f.tr) - fi(tO)) .!fft(tr) -fn(tn)Now by Lemma 7.2, there is a constant y > 0 so that

f (t) _f(t)>y(t _t)2 (7.20)

uniformly in F, if t e (T(F)+r * h. s,T(F)+c), for some r >0 and any c smaller than the constant 8

used in defining the class F.

It follows that if te -T(F) > r h,, s it must satisfy

zn (tn) Za (te) > 7 n21S (tn - t)2 (7.21)

where Z,, is the stochastic process

Zn(t) = n (fA(t) - f (0)Therefore, if A is so large that n-115A > (r+l)h,,s, we must have

PF (tf - tA > n11 AI 5 PF (ZM(t)-ZM(tn) > y n22'5 (t - £M)2 for some t e (t +n-1/5 A,t+c))

Page 41: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 38 -

+ PF( sup f(t) >t > t's+

By unimodality of F and (7.21), we have that

PF( sup f,,(t) > f,(t,)) 5 2PF(SUP If^Q)fA(t)I>-yc2t > *,,+C t 2

Using results of Silvennan (1978) and the fact that f(t) . M for every t and every F e F, we can

show that the last expression tends to zero uniformly in F. Thus,

PF (te- e, > n-115 A). PF (Z'(t) >- n(2 - e,)2 for some t > t )+n-"' A) + o(1).

By Lemma 7.3,

PF(IZM(tA)-YA2) :5 exp(- nll5h. A48P( (t C2 7 c exp(- M IKl2+max(M h,dK1.)n'5n 2

for every F e F. Hence, if n115 h. = constant,

lim lim sup sup PFM (t') - 2 2) = °

It follows that if we can also show that

lim lim sup sup PF (Z (t) > y (t- t)2 for some t > t,, + n 115 A) (7.22)then

lim lim sup supnPF(e, -t,. >n5 A) =0.

By the obvious symmetry in the problem, a similar relation would hold for tA. - t, < -n 1/5A, and so we

would have (t: - e,) = O(n-5) uniformly in F and the proof would be done. Consider then

PF (Za (t) > 2 Y (t _ t,)2 for some t > t, + n-115 A)

Note, as the class F is closed under translation, we can always assume F is such that t, = 0.

Now, for8>0,andfori =0,1.,put=ti n-115 (A+8i)wewillalsorefertoAi =(A+Si)sothat t, = n-115 Ai (and A0 = A).

PFIZ,(t) > n215 - y t2 for some t > n1S5A) (7.23)2

.PF(Zf l(ti)> Aj2 forsome i.>0) +PF( sup Z,Q)>-yt forsome i.>0)4 t i14

Page 42: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 39 -

By Lemma 7.4, with a=-

4

m lim sup supPF (Z. (ti) for some i 2 0) =O (7.24)

By Lemma 7.6, with a = 1

4

limlim supsp PF( sup Z,(t) > 2yA for some i

2=O. (7.25)

- a F t,^t;t4 +1 4

So (7.22) is established and the proof is completed.

Lemma 7.1. Let K be positive and of support [-s,s]. Then for every F in F

Itn-T(F)I < h,,swhere t,, denotes the maximizer of f,, (t).

Proof. We have

h,,

fn(t)-fn(t+v) = f_Kn(u)(f (t+u)-f (t+u+v))du

If t >T(F) + h,,s then, t+u and t+u+v are on the same side of T(F), for every value u in the range

of integration. By the unimodality of F, it follows that f (t+u) >f (t+u+v). Therefore,

f,n(t) -f, (t+v) is an integra. if nonnegative quantities, and so is nonnegative. Thus fM (t) is monotone

decreasing on t > T(F) + his. Similarly, f,, (t) is monotone increasing on (-oo,T(F )-hn s). Thus a max-

imum of fA (t) occurs in [T(F )-h, s ,T(F )+h,, s]. We now argue that no maximum of fn occurs outside

this interval. This is equivalent to saying that t0=T(F)+h,s is a point of strict decrease of f,. Consider

then f (to+u)-f (to+u+v), viewed as a function of u. Unless this is zero a.e., the above display shows

that fn (t) is strictly bigger than f,,(t+v). Now by definition of F, f (to+u) . f (T(F))-c+(h s+u)2

while f (to+u+v) . f (T(F))-c-(h,,s+u+v)2. It follows that the difference of these two quanitities is

bounded below by a quadratic function which is strictly positive at u=-h,,s. Therefore the difference is

not 0 a.e. in the range of the integrand, and t0 is a point of strict decrease of f,.

Lemma 7.2. Let K be positive and of support [-s,s]. Let r be so large that

Page 43: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 40 -

- < (2- (r+1)2 )4C.. 12 (r-l)2 4C_ (r-l)2

C-Puty= -. Then for every F e F,

2x(t)-fx(t) > y(t-t )2 t_tx r (rh,s,c-t.)for c <8.

Proof. Without loss of generality, put T(F)=O. Now, by definition of the class F, and using the previ-

ous lemma,

f,,(t,) 2 f(0)-c+(2h,,s)2fn (t) < f (0)- c -(t-h, S )2

so that

fx (tx )-f,(t) > C_ (t_t- )2 + C_((t-hms)2_(t-_t )2)_C+(2h s)

If t > r h. s then, by the previous lemma

(t-h,,s) > (r-I)h,,s

(t -t,) > (r-l) h,s(t- tx) :5 (r+l) h,s

and so

fn(tn)-fn(t) 2 c_(.-t_,)2(1- (r-1)2-(r+1)2 4C+

C-2 2 (tt)2.

where the inequality defining r has been used. The lemma follows.

Lemma 7.3

F -n115hl/5 /2 1sUp PF(Z(t) > A) 5 exp 112 + MaX(M h, (7.26)V pL~~~~MI 1K12 + a(Mh,11KII..) n2AJ

Proof. This is an application of Bernstein's inequality, as follows.

Z. (t) nn (fM(t) - E fn(t))

= Z0,[; K xh ] I hm] -f (t)]

Page 44: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 41 -

defining Wi =K[ j / hR -fM (t) we have

Z,,(t) = n- 3/5 E WjE W, =0

Var Wi f K2xh j / h,2 f (x) dx _(f-(t))2

S f K2(U) du - f(t)h,,

IlW, II. = max(M,IIKII. / h,).Bernstein's inequality -- Shorack and Weilner (1986, page 855) or Pollard (1984, Appendix B) -- says

p4[n Wi2T1. 5 exPL Var Wi + IlW,II.* T1 / (3 n1/2)and so, putting A= n- 1/10

PnY ;W, 2 :)<exp M II 12 / h ,+ max(M,.l K1. / hn) n - \4]

which, since nothing here depends on F, except for supf (t) . M, a condition which is true of all

F e F, we have (7.26).

Lemma 7.4 Let ti = n - /5 (A0 + i 8). Let M *h, < IK II... Then for a e (0,1),

oe ~~~~~~exp(- Po AO) exp(- P2 nl h,,)SP iYE P (t)>=a O4) < 1 - exp(- AO 8 - 8 n25) (7.27)

where

n15 ha2n92= 22 IIK12 M

n115 h, a2"?2.2 IKI.

a2 y2 IIKII2P2= 1.22 IIK1le

Proof. Fix N = 10. Let m be the smallest integer such that

Page 45: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 42 -

(Ao+m S)>NM IIK .2n5 (7.28)

Then for i 2 m, as M * h. < IlK Ii. by assumption, the denominator in the exponential bound of the last

lemma is bounded by

M IlKI2 + max(M h.,,IKIIL.) (AO + i 8) n -2

:5 (I + N)IIK II (AO + i 8) n 215,N

while for i < m

M IK112 + max(M h,,IIKL1.) (AO + i 8) n-25< (N +1)IKI12M.

Applying now the exponential bound from that lemma, we have

PF(Z~() >ay215 t?)e1e(i) i.<MPF MOOti > a y n25t2 e{2(0 i >< m

where

e I(i) = exp(-fo (AO + i))and

e2(i) = exp(-fl (AO + 8 i) n2/5).

Thus

ae m-i£, PF (Z,,(ti) > a yn25 ti2) < 1£ el(i) +.I: e2(i). (7.29)

Now

ee1(i) < Eeel(i) = i exp(- o0 (Ai2 + 2 AO 8 i + 82 i2))

< exp(-f0 AO) i exp(- 2 oAo8 i)

exp(- fo A (7.30)1 - exp(- 23o o 8)

and

.Z e2(i) = exp(- f1 n2'5 (Ao + S m)) Y exp(- f31 f2'5 S i).&=ui I w0

Now, using (7.28) we see that

Page 46: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 43 -

P1n2'5(Ao+m8)>f3> n4'5NM IK2-~~~3~~~nhK11= 2n h^,

Thus

exp(- f32 n ht

*Z e2(i) 1- ep-3n215 8)(7.31)I=M I - exp(-01 n 6

Note that all results depend on F only through the inequality sup f (t) < M, which is valid for all

F e F. Therefore comparing (7.29) - (7.31) we have (7.27) and the lemma is proved.

Lemma 7.5 Suppose that the kernel K has

IKV() - K( + 8)112 < 82 IIK'II2IIK( ) - K( + 8)1.. < 8 IIKII.

for some constants lK ,'112, ILK'I!... Let 8 > 0, 1 > 0. Suppose that the kernel K is positive, supported in

[- s,s], and that h,,s < t. Then

sup PF (n25 (fn (t + 8 hs) - f 4 (t)) >1 8) S exp [I,K112 M + IIK'1. n- 25 / 3 (3

Proof. Put

Then, as fMn is monotone decreasing on [h,,s, ol] for any F E Fs5 we have

E Wi = ffM(t+8)-fn(t) < 0

and as f (x) < M forall x, forevery F E F,

Var W, < j Jf(K(x +8)-K(x))2dX

<-,K1122 62M IIK'I 2

Also

lW IL. < IIK( + 8) - K()ILI*

Page 47: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 44 -

< IIK'I1. . 1hN

Now we apply Bemlstein's inequality.

P [£-s (Wi- E Wi) > ) exp-- X2/ 2 _1

Var Wi + IlWiL.y-3 4

As E Wi .0

P(-IWi . M.P(9=(Wi -E Wi)}X)

so putting i-8 = n 1/10 X,

P (n3'5 1 Wi > Tl 8)

< e-xn:- -.rI

82 11K112IM + IIKP1..S.. I 8 1l nl/10

1-/l'5 ht r12-~~~~~~~~n

2lK'lI2 M + IIK'IL n-vs.215-3

which establishes the result.

Lemma 7.6 Suppose that K satisfies assumption (K), with support [-s,s]. Suppose that h,.s < n-1/5 o.

Put tj = n 1/5 (Ao + 5 i), A4 = AO + 8 i. Then for Ao large enough (K, Sfixed).

sup £ PF SUP n25f(t) -f (ti))> ay>Aa ) (7.33)F5 0 i=0 ti !5t !~ti+1

<s(Ao) exp(- 37(0)) + 3s(A0) exp(- P1o(Q)1 - exp(- f7(A) 1 - exp(- Plo(A)

where

OS = 1- exp[- 33 (a y)4 y 16 ]

07 = 03 (a Y)2YA 06 - log(2)

Ps = I - exp[- 4 a y hO 8 32n2AS

Pio = 04 a y 9 n2'5 - log(2)

ni15 712 52 1

Page 48: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 45 -

and 3,4346, and 39 are defined below, and do not depend on 8, AO, or on n, provided n "5 h. is con-

stant independent of n. The inequality is valid as soon as I17 > 0 and J10 > 0.

Proof. We use a chaining argument Let t = ti + p 8, p E [0,1]. Now p has a binary expansion

s Itkp Z bi 2-i where each bj = 0 or 1. Letting pk = £ bj 2- we have the telescoping sum

j=1 Jj=1

f^ (t)-f^ (ti ) E(^(ti + Pkc6 -8 (ti + P't 1 )k=1

Note that Pk - Pk-1 = bk 2 k = 0 or 2-k. This formula is certainly rigorously valid if p is a binary

rational, in which case only a finite number of terms are not zero.

For k = 1,2......let Yk,l,j I = 1,....,2k denote the random variable

Yj =n25 (f,(ti + 1 2k)-f,,(ti + (1 - 1) 2k))

Now note that, because the kemel is continuous, f,, is a continuous function of t. Therefore, the

supremum of fA(t)-f,, (ti) on the interval (t, ,ti+1) is the same as the supremum at values of t with p a

binary rational. Hence

PF( SUP n215(f^(t)-Atj))>ayAj) = PF( SUP kbkYk,2#Pk > ay )ti' .ti+

p rati'o,,al P

Now let ek = k 2 6 Thus : ek = 1. Now suppose that, for each k, it were true that

SUp Yk,j <Ek a A2

This would imply that for every binary rational p

£ bk Yk Pk,'<k. SUp Ykl,li

< 1 ek a A;k

=aA?

It follows that

PF (SUP I bACYA,2J > a y A;) 5 I: PF (SUP Yk,,l, > £, a y A;)ZbkYk,2kP~ 2) Pk 2 PF( uYk,I ekayA2)

.Z1:2kPF (Yk,I~.>ekayA)k

Our strategy now is to use the exponential bound furnished by Lemma 7.5 to bound the final sum

Page 49: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 46 -

in this display. Let us first verify that the lemma applies. By hypothesis the kernel K satisfies the

assumptions of that lemma, and h,s 5AOn-115. Therefore that lemma applies to all increments

n2/5(f(t)-fq(t:)) for t e (tj,t +), where i .0. In particular it applies to every Yk,I,.To apply Lemma 7.5, put T 2- k = ek a y A?; then by (7.32) for any k 2 1

PF(Yk,1U >e C A?2) < exp[nIK,IIz II/Kh (2k k ay )A)2 2 (7.34)

Fix N (=10, say). Let m(i) be the least integer such that

K qI122m em ax 2 > 3 N M

Then for k > m, the denominator in (7.34) is smaller than

(1 + - ) IIK'II. n- 2/5 (2k ek a A.2) / 3N

while for k < m, the denominator is smaller than

(N + 1) IIK'II2 M

Hence,

>2F-,t F'e3(k,i) k < m(i)PF(Yk1,>Yk , ayA) e4(k,i) k > m(i)

Here

e 3(k,i) = exp(- 03 (a y)2 A4?22 t2)e4(k,i) = exp(- 4 axyA2 2k ek n215)

and

n115 h2 (N + 1) IIK'II2 M

= n 1/5 hitPJ4 1

2 (1 + -) IIK'II. / 3N

As these inequalities depend on F only through the assumption that f is monotone decreasing on

[hns ,°°] and f (t) < M,

Page 50: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 47 -

00 com-l 0

SUP I PF{ sUP nl (f(t) -fn(ti)) >ayA,2) < .1: ( 2" e3(ki) + z, 2" e4(ki)).

Now

£ 2" e3(ki) < Z £ 2k e3(ki)i=0 k=1 i=O k=1

= 22k e3(k,i)k=1 i=0

And

£ e3(k,i) < exp(- 3 (a y)2 A04 22 E 2) £ exp(- f3 (ay)2 3 A 2Si 2> e2)exp(- 3 (a y)2 Ao 22* e)

1 -exp(- P3 (a 7)2 3 A3 8 22 E2)*

Note that min 22 = Using this in the denominator of the last expression, we getk > 1 T

7- e3(k i) < Ps exp(- P3 (a )2 A4 22 k2)i.=0

and so

cx

£ 2ZF e3(ki) < 5 1; 2k exp(-.33 (ay)2 2 k).k=0 i=0 k=)

Now we note that

m 2a =min& 2 4=] 6 > 0,k. k k.1I k5J

say, so that 22* E2 > 6 k. Then we have

2k exp(- 13~(a y)2 A4 222 e2) < exp(- 7 k)

and so, supposing A0 is large enough that 17 > 0,

Z 2k E e3(ki) < 5 1 - exp(- 57(A))k=1 i4) ep-P7A)

This is the first half of the lemma. Consider now the second term.

j4 k 2k e4(kJ1)< i:2k3 e4(kii=0 k =m(i) k=1 i=0

and by an argument similar to the one for e 3(k ,i)

Page 51: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 48 -

e4(k,i)5 exp(-04ayA6 2k Et n2)'m° Il-exp(>-42ayAO62kekn2/l)

Then, noting that

mi 2* 8 6mm 2 tk =- * -

k a1 9 ,

we have

£ e4(k ,i) P exp(-f4a yAi 2 etk n25).i=0

And, putting

mm 2kemin k * = 9 > °'

we have 2kek 2 39 k and so, if we put

3IO(AO) = 4 a y L2A 9 n215 - log(2)then

2k exp(-P4 ayA42t et n215) < exp(-f I*ok).And, supposing A0 is so large that flo > 0,

: 2k Z e4(k,i) < p exp(- Ps(^o))Ak=i i~ 10I- exp(- f310(AO))

by the same arguments as used for the e3(k,i) sums. This gives the second bound and completes the

proof.

Page 52: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 49 -

8. References

Bednarskd, T. (1982) Binary experiments, minimax tests, and 2-alternating capacities. Ann. Statist. 10,226-232.

Birg6, L. (1983) Approximation dans les espaces m6triques et th6orie de l'estimation. Zeit. Wahr. 65,181-237.

Donoho, D.L., and Liu, R.C. (1988a) Geometrizing Rates of Convergence, I. Technical Report 137,Department of Statistics, University of Califomia, Berkeley.

Donoho, D.L., and Liu, R.C. (1988b) Geometrizing Rates of Convergence, III. Technical Report 138,Department of Statistics, University of Califomia, Berkeley.

Donoho, D. L. and Liu, R. C. (1988c) Hardest One-Dimensional Subfamilies. Technical Report ???,Department of Statistics, University of Califomia, Berkeley.

Donoho, DL., MacGibbon, B., and Liu, R. C. (1988) Minimax risk for hyperrectangles. TechnicalReport 123, Department of Statistics, University of California, Berkeley.

Du Mouchel, W. H. (1983) Estimating the stable index a in order to measure tail thickness. Ann. Sta-tist. 11, 1019-1031.

Farrell, R. H. (1972) "On the best obtainable asymptotic rates of convergence in estimation of a den-sity function at a point." Ann. Math. Stat., 43, #1, 170-180.

Hall, P. and Marron, S. (1987) On the amount of noise inherent in bandwidth selection. Ann. Stat. 4

Hall, P. and Welsh, A. H. (1984) Best attainable rates of convergence for parameters of regular varia-tion. Ann. Stat. 12, 1079-1084.

Ibragimov, I.A., and Hasminskii, R.Z. (1984) Nonparametric estimation of the values of a linear func-tional in Gaussian white noise. Theory of Probability and its applications. 28. _ _

Ibragimov, I.A., Nemirovskii, A.S., and Hasminskii, RZ. (1987) Some problems on nonparametric esti-mation in Gaussian white noise. Theory of Probability and its applications. 31, 391406

Hasminskii, R. Z. (1979) "Lower bound for the risks of nonparametric estimates of the mode." inContribution to Statistics (J. Hajek memorial volume), Academia, Prague 1979, 91-97.

Huber, PJ. and Stassen, V. (1973) Minimax tests and the Neyman-Pearson lemma for capacities. Ann.Statist. 1, 251-263.

Le Cam, L. (1973) "Convergence of estimates under dimensionality restrictions." Ann. Stat., 1, 38-53.

Page 53: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 50 -

Le Cam, L. (1975) "On local and global properties in the theory of asymptotic normality of experi-ments." in Stochastic Processes and Related Topics, Vol. 1.

Le Cam, L. (1985) Asymptotic Methods in Statistical Decision Theory. Springer Verlag.

Liu, R.C. (1987) Geometry in Robustness and Nonparametrics. Thesis, University of California, Berke-ley.

Meyer, T.G. (1977) On fixed or scaled radii confidence sets: the fixed sample size case. Ann. Stat. 565-78.

Pollard, D. (1984). Convergence of Stochastic Processes. Springer.

Ritov, Y. and Bickel, P.J. (1987) Unachievable Information Bounds in Semiparametric and Non-parametric models. Technical Report 116, Department of Statistics, Univ. of Calif., Berkeley.

Sacks, J. and Ylvisaker, N.D. (1981) Asymptotically optimum kernels for density estimation at a pointAnn. Statist. 9, 334-346.

Shorack, G.R. and Wellner, J.A. (1986) Empirical Processes with Applications in Statistics. J. Wileyand Sons.

Stone, C. J. (1980) "Optimal rates of convergence for nonparametric estimators." Ann. Stat., 8,6,1348-1360.

Venter, J. (1967) On estimation of the mode. Ann. Math. Statist. 38, 1446-1455

Wahba, G. (1975) Optimal convergence propertes of variable knot, kernel, and orthogonal seriesmethods for density estimation. Ann. Stat. 3, 15-29.

Page 54: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

TECHNICAL REPORTSStatistics Department

University of California, Berkeley

1. BREIMAN, L. and FREEDMAN, D. (Nov. 1981, revised Feb. 1982). How many variables should be entred in aregression equti? Jour. Amer. Statist. c March 1983, 78, No. 381, 131-136.

2. BRILLINGER, D. R. (JaL 1982). Some contrasting examples of the time and frequency domain approaches to time seriesanalysis. Time Series Methods in Hydrosciences, (A. H. E1-Shaarawi and S. R. Esterby, eds.) Elsevier ScientficPublishing Co., Amsterdam, 1982, pp. 1-15.

3. DOKSUM, K. A. (Jan. 1982). On the performance of estimates in proporional hazard and log-linear models. SurvivalAnalysis, (John Crowley and Richard A. Johnson, eds.) IMS Lectue Notes - Monograph Series, (Shanti S. Gupta, seriesed.) 1982, 74-84.

4. BICKEL, P. J. and BREIMAN, L. (Feb. 1982). Sums of functions of nearest neighbor distances, moment bounds, limittheorems and a goodness of fit test. Ann. P Feb. 1982, 11. No. 1, 185-214.

5. BRILLINGER, D. R. and TUKEY, J. W. (March 1982). Spectrum estimation and system identification relying on aFourier transform. The Collected Works of J. W. Tukey, vol. 2, Wadsworth, 1985, 1001-1141.

6. BERAN, R. (May 1982). Jackknife approximadon to bootstrap estimates. AmL St March 1984, 12 No. 1, 101-118.

7. BICKEL, P. J. and FREEDMAN, D. A. (June 1982). Bootstrapping regression models with m;ny parameten.Lehmarm Festschrift, (P. J. Bickel, K. Doksum and J. L. Hodges, Jr., eds.) Wadsworth Press, Belmont, 1983, 2848.

8. BICKEL, P. J. and COLLINS, J. (March 1982). Minimizing Fisher infonnation over mixtures of distributions. Sankhyll1983, 45, Series A, Pt. 1, 1-19.

9. BREIMAN, L. and FRIEDMAN, J. (July 1982). Estimating optinal transformations for multiple regression and correlation.

10. FREEDMAN, D. A. and PETERS, S. (July 1982, revised Aug. 1983). Bootstrapping a regression equation: someempirical results. JASA, 1984, 79, 97-106.

11. EATON, M. L. and FREEDMAN, D. A. (Sept. 1982). A remark on adjusting for covariates in multiple regression.

12. BICKEL, P. J. (April 1982). Miimax estimation of the mean of a mean of a normal distribution subjt to doing wellat a point. Recent Advances in Statistics, Academic Prss, 1983.

14. FREEDMAN, D. A., ROTHENBERG, T. and SUTCH, R. (Oct. 1982). A review of a residential energy end use model.

15. BRILLINGER, D. and PREISLER, H. (Nov. 1982). Maximum likelihood estimation in a latent variable problem. Studiesin Econometrics, Time Series, and Multivariate Statistics, (eds. S. Karlin, T. Amemiya, L. A. Goodman). AcademicPress, New York, 1983, pp. 31-65.

16. BICKEL, P. J. (Nov. 1982). Robust regression based on infinitesimal neighborhoods. Ann. Statist., Dec. 1984, 12,1349-1368.

17. DRAPER, D. C. (Feb. 1983). Rank-based robust analysis of linear models. I. Exposition and review.

18. DRAPER, D. C. (Feb 1983). Rank-based robust inference in regression models with several observations per cell.

19. FREEDMAN, D. A. and FIENBERG, S. (Feb. 1983, revised April 1983). Statistics and the scientific method, Commentson and reactions to Freedman, A rejoinder to Fienberg's comments. Springer New York 1985 Cohort Analysis in SocialResearch, (W. M. Mason and S. E. Fienberg, eds.).

20. FREEDMAN, D. A. and PiETERS, S. C. (March 1983, revised Jan. 1984). Using the bootstrap to evaluate forecasdngequations. J. of Forecastng. 1985, Vol. 4, 251-262.

21. FREEDMAN, D. A. and PETERS, S. C. (March 1983, revised Aug. 1983). Bootstrapping an econometric model: someempirical results. _BES, 1985, 2, 150-158.

22. FREEDMAN, D. A. (March 1983). Structural-equation models: a case study.

23. DAGGE'IT, R. S. and FREEDMAN, D. (April 1983, revised Sept 1983). Econometrics and the law: a case study in theproof of antitst damages. Proc. of the Berkeley Conferenc, in honor of Jerzy Neyman and Jack Kiefer. Vol I pp.123-172. (L. Le Cam, R. Olshen eds.) Wadsworth, 1985.

Page 55: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 2 -

24. DOKSUM, K. and YANDELL, B. (April 1983). Tests for exponentiality. Handbook of Statistics, (P. R. Krishnaiah andP. K. Sen. eds.) 4, 1984.

25. FREEDMAN, D. A. (May 1983). Comments on a paper by Markus.

26. FREEDMAN, D. (Oct. 1983, revised March 1984). On bootstrapping two-stage least-squares estimates in stationary linearmodels. AJUL S 1984, 12,827-842.

27. DOKSUM, K. A. (Dec. 1983). An extension of partial likelihood metwds for proportional hazard modelS to generaltransformation models. Ann. Statist., 1987, 15, 325-345.

28. BICKEL, P. J., GOETE, F. and VAN ZWET, W. R. (Jan. 1984). A simple analysis of third order efficiency of estimateProc. of the Neyman-Kiefer Conference, (L. Le Camn, ed.) Wadsworth. 1985.

29. BICKEL, P. J. and FREEDMAN, D. A. Asymptotic normality and the bootstrap in stratified sampling. Ann. Statist.12 470482.

30. FREEDMAN, D. A. (Jan. 1984). The mean vs. the median: a case study in 4-R Act litigation. JBES. 1985 Vol 3pp. 1-13.

31. STONE, C. J. (Feb. 1984). An asymptotically optimal window selection rule for kemel density estimates. Ann. Statist.,Dec. 1984, !2, 1285-1297.

32. BREIMAN, L. (May 1984). Nail finders, edifices, and Oz.

33. STONE, C. J. (OCt. 1984). Additive regression and other nonparametric models. Ann. Statist., 1985, 13, 689-705.

34. STONE, C. J. (June 1984). An asymptotically optimal histogram selection rule. Proc. of the Berkeley Conf. in Honor ofJerzy Neyman and Jack Kiefer (L Le Cam and R. A. Olshen, eds.), IL 513-520.

35. FREEDMAN, D. A. and NAVIDI W. C. (Sept. 1984, revised Jan. 1985). Regression models for adjusting the 1980Census. Statistical Science. Feb 1986, Vol. 1, No. 1, 3-39.

36. FREEDMAN, D. A. (Sept 1984, revised Nov. 1984). De Finetti's theorem in continuous tme.

37. DIACONIS, P. and FREEDMAN, D. (Oct. 1984). An elementary proof of Stirling's formula. Amer. Math Monthly. Feb1986, Vol. 93, No. 2, 123-125.

38. LE CAM, L. (Nov. 1984). Sur l'approximation de familles de mesures par des families Gaussiannes. Ann. Inst.Henri Poincar6, 1985, 21, 225-287.

39. DIACONIS, P. and FREEDMAN, D. A. (Nov. 1984). A note on weak star uniformities.

40. BREIMAN, L. and IHAKA, R. (Dec. 1984). Nonlinear discriminant analysis via SCALING and ACE.

41. STONE, C. J. (Jan. 1985). The dimensionality reduction principle for generalized additive models.

42. LE CAM, L. (Jan. 1985). On the nonnal approximation for sums of independent variables.

43. BICKEL, P. J. and YAHAV, J. A. (1985). On estimating the number of unseen species: how many executions werethere?

44. BRILLINGER, D. R. (1985). The natural variability of vital rates and associated statistics. Biometrics, to appear.

45. BRILLINGER, D. R. (1985). Fourier inference: some methods for the analysis of array and nonGaussian series data.Water Resources Bulletin, 1985, 21, 743-756.

46. BREIMAN, L. and STONE, C. J. (1985). Broad spectrum estirnates and confidence intervals for tail quantiles.

47. DABROWSKA, D. M. and DOKSUM, K. A. (1985, revised March 1987). Partial likelihood in transformation modelswith censored data.

48. HAYCOCK, K. A. and BRILLINGER, D. R. (November 1985). LIBDRB: A subroutine library for elementary timeseries analysis.

49. BRILLINGER, D. R. (October 1985). Fitting cosines: some procedures and some physical examples. Joshi Festschrift,1986. D. Reidel.

50. BRILLINGER, D. R. (November 1985). What do seismology and neurophysiology have in common? - Statistics!Comptes Rendus Math. B Acad. Sci. Canada. January, 1986.

51. COX, D. D. and O'SULLIVAN, F. (October 1985). Analysis of penalized likelihood-type estimators with application togeneralized smoothing in Sobolev Spaces.

Page 56: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 3 -

52. O'SULLIVAN, F. (November 1985). A practical perspective on ill-posed invea problems: A review with somenew developments. To aPPear in Journal of Statistical Science.

53. LE CAM, L and YANG. G. L. (November 1985, revised March 1987). On the prervation of local asymptotic normalityunder informadon loss

54. BLACKWELL D. (November 1985). Approximate nonnality of larg products.

55. FREEDMAN, D. A. (June 1987). As others see us: A case study in path analysis. Joumal of EducatonalStatistics. 12, 101-128.

56. LE CAM, L. and YANG, G. L. (January 1986). Replaced by No. 68.

57. LE CAM, L. (February 1986). On the Bemstein - von Mises theorem.

58. O'SULLIVAN, F. (January 1986). Estimation of Densities and Hazards by the Method of Penalized likelihood.

59. ALDOUS, D. and DIACONIS, P. (February 1986). Strong Uniform Times and Finite Random Walks.

60. ALDOUS, D. (March 1986). On the Markov Chain simulation Method for Uniform Combinatorial Distributions andSimulated Annealing.

61. CHENG, C-S. (April 1986). An Optmization Problem with Applications to Optimal Desip Theory.

62. CHENG, C-S, MAJUMDAR, D., STUFKEN, J. & TURE, T. E. (May 1986, revised Jan 1987). Optinal step typedesign for comparing test treatments with a control.

63. CHENG, C-S. (May 1986, revised Jan. 1987). An Application of the Kiefer-Wolfowitz Equivalence Theorem.

64. O'SULLIVAN, F. (May 1986). Nonparametric Estimation in the Cox Proportional Hazads Model.

65. ALDOUS, D. (JUNE 1986). Finite-Time Implications of Relaxation Times for Stochastically Monotone Processes.

66. PITMAN, J. (JULY 1986, revised November 1986). Stationary Excursions.

67. DABROWSKA, D. ant DOKSUM, K. (July 1986, revised November 1986). Estimates and confidence intervals formedian and mean life in the proportional hazard model with censored data.

68. LE CAM, L. and YANG, G.L. (July 1986). Distinguished Statistics, Loss of information and a theorem of Robert B.Davies (Fourth edition).

69. STONE, C.J. (July 1986). Asymptotic properties of logspline density estimation.

71. BICKEL. PJ. and YAHAV, J.A. (July 1986). Richardson Extrapolation and the Bootstrap.

72. LEHMANN, E.L. (July 1986). Statistics - an overview.

73. STONE, C.J. (August 1986). A nonparametric framework for statistical modelling.

74. BIANE, PH. and YOR, M. (August 1986). A relation between Levy's stochastic area formula, Legendre polynomial,and some continued fractions of Gauss.

75. LEHMANN, E.L. (August 1986, revised July 1987). Comparing Location Experiments.

76. O'SULLIVAN, F. (September 1986). Relative risk estimation.

77. O'SULLIVAN, F. (September 1986). Deconvolution of episodic hormone data

78. PITMAN, J. & YOR, M. (September 1987). Further asymptotic laws of planar Brownian motion.

79. FREEDMAN, D.A. & ZEISEL, H. (November 1986). From mouse to man: The quantitative assessment of cancer risks.To appear in Statistical Science.

80. BRILLINGER, D.R. (October 1986). Maximum likelihood analysis of spike trains of interacting nerve cells.

81. DABROWSKA, D.M. (November 1986). Nonparametric regression with censored survival time data

82. DOKSUM, K.J. and LO, A.Y. (November 1986). Consistent and robust Bayes Procedures for Location based onPartial Information.

83. DABROWSKA, D.M., DOKSUM, KA. and MIURA, R. (November 1986). Rank estimates in a class of semiparanetrictno-sample models.

Page 57: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 4 -

84. BRILLINGER. D. (December 1986). Some statistical methods for random process data from seismology andneurophysiology.

85. DIACONIS, P. and FREEDMAN, D. (December 1986). A dozen de Finetti-style results in search of a theory.Anm. Inst. Henr Poincar, 1987, 23, 397-423.

86. DABROWSKA. D.M. (January 1987). Uniform consistency of nearest neighbour and kernel conditional Kaplan- Meier estimates.

87. FREEDMAN, DA., NAVIDI, W. and PETERS, S.C. (February 1987). On the impact of variable selection infitting regression equations.

88. ALDOUS, D. (February 1987, revised April 1987). Hashing with linear probing, under non-uniform probabilities.

89. DABROWSKA, D.M. and DOKSUM, KA. (March 1987, revised January 1988). Estimating and testing in a twosample generalized odds rate model.

90. DABROWSKA, D.M. (March 1987). Rank tests for matched pair experiments with censored data.

91. DIACONIS, P and FREEDMAN, D.A. (April 1988). Conditional limit theorems for exponential families and finiteversions of de Finetti's theorem. To appear in the Joumal of Applied Probability.

92. DABROWSKA, D.M. (April 1987, revised September 1987). Kaplan-Meier estimate on the plane.

92a. ALDOUS, D. (April 1987). The Harmonic mean formula for probabilities of Unions: Applications to sparse randomgraphs.

93. DABROWSKA, D.M. (June 1987, revised Feb 1988). Nonparametric quantile regression with censored data

94. DONOHO, D.L. & STARK, P3. (June 1987). Uncertainty principles and signal recovery.

95. CANCELLED

96. BRILLINGER, D.R. (June 1987). Some examples of the statistical analysis of seisnological daita To appear inProceedings, Centennial Anniversary Symposium, Seismographic Stations, University of California, Berkeley.

97. FREEDMAN, DA. and NAVIDI, W. (June 1987). On the multi-stage model for carcinogenesis. To appear inEnvironmental Health Perspectives.

98. O'SULLIVAN, F. and WONG, T. (June 1987). Determining a function diffusion coefficient in the heat equation.99. O'SULLIVAN, F. (June 1987). Constrained non-linear regularization with application to some system identification

problems.

100. LE CAM, L. (July 1987, revised Nov 1987). On the standard asymptotic confidence ellipsoids of Wald.

101. DONOHO, D.L. and LIU, R.C. (July 1987). Pathologies of some minimum distance estimators. Annals ofStatistics, June, 1988.

102. BRILLINGER, D.R., DOWNING, K.H. and GLAESER, R.M. (July 1987). Some statistical aspects of low-doseelectron imaging of crystals.

103. LE CAM, L. (August 1987). Harald Cramer and sums of independent random variables.

104. DONOHO, A.W., DONOHO, D.L. and GASKO, M. (August 1987). Macspin: Dynamic graphics on a desktopcomputer. IEEE Computer Graphics and applications, June, 1988.

105. DONOHO, D.L. and LIU, R.C. (August 1987). On minimax estimation of linear functionals.

106. DABROWSKA, D.M. (August 1987). Kaplan-Meier estimate on the plane: weak convergence, LIL and the bootstrap.

107. CHENG, C-S. (Aug 1987, revised Oct 1988). Some orthogonal main-effect plans for asymmetrical factorials.

108. CHENG, C-S. and JACROUX, M. (August 1987). On the construction of trend-free run orders of two-level factorialdesigns.

109. KLASS, M.J. (August 1987). Ma.xinizing E max S, / ES': A prophet inequality for sums of I.I.D. mean zero variates.

110. DONOHO, D.L. and LIU, R.C. (August 1987). The "automatic" robustness of minimum distance functionals.Annals of Statistics June, 1988.

111. BICKEL P.J. and GHOSH, J.K. (August 1987, revised June 1988). A decomposition for the likelihood ratio statisticand the Bartlett correction - a Bayesian argument

Page 58: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 5 -

112. BURDZY, K., PITMAN, J.W. and YOR, M. (September 1987). Some asymptotic laws for crossings and excursions.

113. ADHIKAR1! A. and PTMIAN, 3. (September 1987). The shortst planar arc of width 1.

114. RlTOV, Y. (September 1987). Estimation in a linear regression model with censored data.

115. BICKEL, PJ. and RITOV, Y. (Sept. 1987, revised Aug 1988). Large sample theozy of esimation in biased samplingregression models L.

116. RITOV, Y. and BICKEL, P.J. (Sept.1987, revised Aug. 1988). Achieving infonnation bounds in non andsemiparametric models.

117. R1TOV, Y. (October 1987). On the convergence of a maximal correlation algoritun with altemrnating projections.

118. ALDOUS, D.J. (October 1987). Meeting times for independent Markov chains.

119. HESSE, C.H. (October 1987). An asymptotic expansion for the mean of the passage-time distribution of integratedBrownian Motion.

120. DONOHO, D. and LIU, R. (Oct. 1987, revised Mar. 1988, Oct. 1988). Geometrizing rates of converg e, LI.

121. BRILLIJGER, D.R. (October 1987). Estimating the chances of large earthquakes by radiocarbon dating and statistcalmodelling. To appear in Statstics a Guide to the Unknwn.

122. ALDOUS, D., FLANNERY, B. and PALACIOS, J.L. (November 1987). Two applications of um processes: The fringeanalysis of search trees and the simulation of quasi-stationary distributions of Markov chains.

123. DONOHO, D.L., MACGIBBON, B. and LIU, R.C. (Nov.1987, revised July 1988). Minmax risk for hyperrectangles.

124. ALDOUS, D. (Novemnber 1987). Stopping times and tightness I.

125. HESSE, C.H. (November 1987). The present state of a stochastic model for sedimentation.

126. DALANG, R.C. (Decenber 1987, revised June 1988). Optimal stopping of two-parmeter processe onnonstandard probability spaces.

127. Same as No. 133.

128. DONOHO, D. and GASKO, M. (December 1987). Multivariate generalizations of the median and timmed meam E.

129. SMITH, D.L. (December 1987). Exponential bounds in Vapnik-tervonenkis classes of index 1.

130. STONE, C.J. (Nov.1987, revised Sept. 1988). Uniform error bounds involving logspline models.

131. Same as No. 140

132. HESSE, C.H. (Decenber 1987). A Bahadur - Type representation for empirical quantiles of a large class of stationamy,possibly ifinite - variance, linear processes

133. DONOHO, D.L. and GASKO, M. (December 1987). Multivariate generalizations of the median and timmed mean, I.

134. DUBINS, L.E. and SCHWARZ, G. (December 1987). A sharp inequality for martingales and stopping-times.

135. FREEDMAN, DA. and NAVIDI, W. (December 1987). On the risk of lung cancer for ex-smokers.

136. LE CAM, L (January 1988). On some stochastic models of the effects of radiation on cell survival.

137. DIACONIS, P. and FREEDMAN, D.A. (April 1988). On the uniforn consistency of Bayes estimates for multinomialprobabilities.

137a. DONOHO, D.L. and LIU, R.C. (1987). Geometrizing rates of convergence, I.

138. DONOHO, D.L and LIU, R.C. (January 1988). Geometrizing rates of convergence, m.

139. BERAN, R. (January 1988). Refining simultaneous confidence sets.

140. HESSE, C.H. (Decanber 1987). Numerical and statistical aspects of neural networks.

141. BRILLINGER, D.R. (January 1988). Two reports on trend analysis: a) An Elementary Trend Analysis of Rio NegroLevels at Manaus, 1903-1985 b) Consistent Detection of a Monotonic Trend Superposed on a Stationary Time Series

142. DONOHO, D.L. (Jan. 1985, revised Jan. 1988). One-sided inference about functionals of a density.

Page 59: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

- 6 -

143. DALANG, R.C. (February 1988). Randomization in the two-armed bandit problem.144. DABROWSKA. D.M., DOKSUM, KA. and SONG, J.K. (February 1988). Graphical comparisons of cumulative hazards

for two populadons.145. ALDOUS, D.J. (Febrary 1988). Lower bounds for covering times for reversible Markov Chains and random walks on

graphs.146. BICKEL PJ. and RITOV, Y. (Feb.1988, revised August 1988). Estimating integrated squared density derivatives.

147. STARK, P.R. (March 1988). Strict bounds and applications.

148. DONOHO, D.L and STARK, P.B. (March 1988). Rearrangements and smoothing.

149. NOLAN, D. (March 1988). Asymptotics for a multivariate location estimator.

150. SELLIER, F. (March 1988). Sequential probability forecasts and the probability integral transform.

151. NOLAN, D. (March 1988). Limit theorems for a random convex set

152. DIACONIS, P. and FREEDMAN, D.A. (April 1988). On a theorem of Kuchler and Lauritzen.

153. DIACONIS, P. and FREEDMAN, DA. (April 1988). On the problem oftypes

154. DOKSUM, KA. (May 1988). On the correspondence between models in binary regression analysis and survival analysis.

155. LEHMANN, E.L. (May 1988). Jerzy Neyman, 1894-1981.

156. ALDOUS, DJ. (May 1988). Stein's method in a two-dimensional coverage problem

157. FAN, J. (June 1988). On the optimal rates of convergence for nonparametric deconvolution problem.

158. DABROWSKA, D. (June 1988). Signed-rank tests for censored matched pairs.

159. BERAN, R.J. and MILLAR, P.W. (June 1988). Multivariate symmetry models.

160. BERAN, R.J. and MILLAR, P.W. (June 1988). Tests of fit for logistic models.

161. BREIMAN, L. and PETERS, S. (June 1988). Comparing automatic bivariate smoothers (A public service enterprise).

162. FAN, J. (June 1988). Optimal global rates of convergence for nonparametric deconvolution problem.

163. DIACONIS, P. and FREEDMAN, DA. (June 1988). A singular measure which is locally uniform. (Revised byTech Report No. 180).

164. BICKEL, P.J. and KRIEGER, A.M. (July 1988). Confidence bands for a distribution function using the bootstrap.

165. HESSE, C.H. (July 1988). New methods in the analysis of economic time series I.

166. FAN, JIANQING (July 1988). Nonparametric estimation of quadratic functionals in Gaussian white noise.

167. BREIMAN, L., STONE, C.J. and KOOPERBERG, C. (August 1988). Confidence bounds for extreme quantiles.

168. LE CAM, L. (August 1988). Maximum likelihood an introduction.

169. BREIMAN, L. (August 1988). Submodel selection and evaluation in regression-The conditional caseand little bootstrap.

170. LE CAM, L. (September 1988). On the Prokhorov distance between the empirical process and the associated Gaussianbridge.

171. STONE, C.J. (September 1988). Large-sample inference for logspline models.

172. ADLER, R.J. and EPSTEIN, R. (September 1988). Intersection local times for infinite systems of planar brownianmotions and for the brownian density process.

173. MILLAR, P.W. (October 1988). Optimal estimation in the non-parametric multiplicative intensity model.

174. YOR, M. (October 1988). Interwinings of Bessel processes.

175. ROJO, J. (October 1988). On the concept of tail-heaviness.

176. ABRAHAMS, D.M. and RIZZARDI, F. (September 1988). BLSS - The Berkeley interactive statistical system:An overview.

Page 60: Geometrizing Rates of Convergence,-6-2. AnAttainable Bound As in section 1, let T be a functional of interest, and let F be the regularity class in which F is knownto lie. Let Fs,

177. MfLLAR, P.W. (October 1988). Gamma-funnels in the domain of a probability, with statistical implications.

178. DONOHO, D.L. and LIU, R.C. (October 1988). Hardest one-dimensional subfamilies.

179. DONOHO, D.L. and STARK. P.B. (October 1988). Recovery of sparse signals from data missing low frequencies.

180. FREEDMAN, DA. and PITMAN, J.A. (Nov. 1988). A measur which is singular and uniformly locally uniform.(Revision of Tech Report No. 163).

181. DOKSUM, KA and HOYLAND, ARNLJOT (Nov. 1988). A model for step-stress accelerated life testing experimentsbased on Wiener processes and the inverse Gaussian distribution.

182. DALANG, R.C., MORTON, A. and WILLINGER, W. (November 1988). Equivalent martingale measures andno-arbitrage in stochastic securities market models.

Copies of these Reports plus the most recent additions to the Technical Report series are available from the Statistics Depart-ment technical typist in room 379 Evans Hall or may be requested by mail from:

Department of StatisticsUniversity of CalifomiaBerkeley, Califomia 94720

Cost: $1 per copy.