
Statistical Inference for Stochastic Processes 4: 283–306, 2001. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.

Semi-parametric Estimation of the Hölder Exponent of a Stationary Gaussian Process with Minimax Rates

GABRIEL LANG 1 and FRANÇOIS ROUEFF 2

1 École Nationale du Génie Rural et des Eaux et Forêts. e-mail: [email protected]
2 École Nationale Supérieure des Télécommunications. e-mail: [email protected]

Abstract. Let (X(t))_{t∈[0,1]} be a centered Gaussian process with stationary increments such that IE[(X_{u+t} − X_u)^2] = C|t|^s + r(t). Assume that there exists an extra parameter β > 0 and a polynomial P of degree smaller than s + β such that |r(t) − P(t)| is bounded with respect to |t|^{s+β}. We consider the problem of estimating the parameter s ∈ (0, 2) in the asymptotic framework given by n equispaced observations in [0, 1]. Adding possibly stronger regularity conditions on r, we define classes of such processes over which we show that s cannot be estimated at a better rate than n^{min(1/2,β)}. Then, we study increment (or, more generally, discrete variation) estimators. We obtain precise bounds on the bias and the variance, which show that the bias depends mainly on the parameter β while the variance splits into two terms, one depending on the parameter s and one on some regularity properties of r. A central limit theorem is given when the variance term relying on s dominates the bias and the other variance term. Eventually, we exhibit an estimator which achieves the minimax rate over a wide range of classes for which sufficient regularity conditions are assumed on r.

Key words: fractal dimension, discrete variations, Hölder exponent, minimax optimal rate estimation, semi-parametric models, Adler process, fractional Brownian motion.

Abbreviations: a.e. – almost everywhere; w.r.t. – with respect to; MSE – mean square error; a.s. – almost surely; iff – if and only if; Supp(f) – support of f; ⌊x⌋ – sup{n ∈ IN | n ≤ x}; ⌈x⌉ – inf{n ∈ IN | n ≥ x}.

1. Introduction

Let (X(t))_{t∈[0,1]} be a centered Gaussian process with stationary increments and define its variogram function by:

v(t) = (1/2) IE[(X(u + t) − X(u))^2].  (1)

Notice that a second order stationary process of covariance γ(t) has a variogram function v(t) = γ(0) − γ(t). Assume that v is continuous and satisfies, for s ∈ (0, 2), a positive real β and a polynomial P of order smaller than s + β,

v(t) = C|t|^s + P(t) + O(|t|^{s+β}).  (2)
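As a quick numerical illustration (our sketch, with illustrative parameter values not taken from the paper), the stationary covariance γ(t) = exp(−C|t|^s) revisited in Section 2.3 yields a variogram v(t) = 1 − exp(−C|t|^s) satisfying (2) with P = 0 and β = s:

```python
import numpy as np

# Variogram of a second order stationary process: v(t) = gamma(0) - gamma(t).
# Illustrative parameter values (not from the paper): C = 1.0, s = 1.2.
C, s = 1.0, 1.2
gamma = lambda t: np.exp(-C * np.abs(t) ** s)
v = lambda t: gamma(0.0) - gamma(t)

# v(t) = C|t|^s + O(|t|^{2s}), so v(t)/|t|^s -> C as t -> 0.
t = np.logspace(-4, -1, 10)
ratio = v(t) / np.abs(t) ** s
print(ratio)  # approaches C = 1 as t decreases
```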


Because of the symmetry satisfied by v, P is a polynomial in t^2. Assume moreover that X is given by the n observations X_{1/n}, . . . , X_{n/n}. In this contribution, we investigate minimax rate optimal estimation of s for a precise semi-parametric model. An upper bound of this rate is first derived over a wide range of semi-parametric models. Many estimators of s have been studied (e.g., Constantine and Hall [5], Feuerverger et al. [10], Chan et al. [3], Istas and Lang [14] and Kent and Wood [16]). We give general results on the statistical properties of estimators of s based on increments (or discrete variations). We generalize the results of Istas and Lang [14] and Kent and Wood [16] on a discrete variation estimator, denoted by ŝ_0(n), in the same semi-parametric framework. In particular, we prove that in semi-parametric classes defined by (2) and other precise regularity conditions, ŝ_0(n) achieves the minimax rate n^{min(1/2,β)}.

The paper is organized as follows: Section 2 introduces the general assumption (A1) and describes the regularity conditions on v which will be considered in our semi-parametric model. In Section 3, we determine upper bounds of the minimax rate in given classes of processes defined by regularity conditions on v, which is the main result of this contribution. Section 4.1 introduces some notations and preliminary properties of discrete variations. In Section 4, we define quadratic discrete variation estimators parameterized by a scale, a number of locations and a gap between successive locations, and state asymptotic statistical properties of these estimators: first in a general asymptotic setting and with general assumptions (Subsection 4.3); second, in the above asymptotic framework and with precise regularity conditions on v, for the particular estimator ŝ_0(n) (Subsection 4.4). We complete this contribution with some concluding remarks on important issues related to our work. The main results are followed by their proofs. For convenience, the other proofs are either sketched or postponed to the appendix following the conclusion.

2. General Assumptions

2.1. MAIN ASSUMPTION

Consider the following assumption.

(A1) (X(t))_{t∈[0,1]} is a centered Gaussian process with stationary increments. Its variogram function v, defined by Equation (1), is continuous and there exist s ∈ (0, 2), C > 0 and a function r such that:

v(t) = C|t|^s + r(t) and r(t) = o(|t|^s) when t → 0.  (3)

We will denote by s(X) the parameter s of a process X satisfying (A1). Such processes are derived from the multidimensional model defined by Adler [1] (Definition 8.3.1) and called index-α processes, where, in our case, α = s/2. The index α is also the upper bound of the indices for which, with probability one, the sample function of X(t) satisfies a global (or uniform) Hölder regularity of that order (see Adler [1]). We may thus call α the Hölder exponent of X. In the case of


Adler processes, it is also related to well known fractal indices (e.g., Hausdorff dimension, box dimension) of the sample path, as often mentioned in related contributions. For an overview of the applications of fractal index estimation, the reader may refer to the study of Davies and Hall [8], where two-dimensional Adler processes are identified by considering line transects. Semi-parametric models of processes satisfying assumption (A1) are defined using additional regularity conditions on r in the following section. Adler processes (and in particular those satisfying (A1)) share a crucial property with other processes generically called long range dependent (LRD) processes (see e.g., Bardet et al. [2] and Lang and Azaïs [17]): they are asymptotically self-similar. In (3), the self-similarity property lies in the term C|t|^s; for LRD processes, it lies in the power-law behavior of the spectral density at low frequencies. However, a major difference between these two classes of processes must be pointed out here: for Adler processes, the self-similar trend dominates only at small scales (t small in (3)) whereas for LRD processes, the self-similar trend dominates at low frequencies, that is, at large scales. Hence two major differences for estimating the self-similarity parameter: the asymptotic framework is not the same and the considered classes of processes for semi-parametric estimation are completely different. A major disadvantage follows: one cannot directly apply results which hold for LRD processes (for which there exists a large literature), although similar methods can be relevant.

2.2. HÖLDER REGULARITY AT 0 OF THE DERIVATIVES

Estimators based on discrete variations have been proposed in Constantine and Hall [5] and extended in Istas and Lang [14] and Kent and Wood [16]. We wish to investigate whether such estimators are optimal. We thus need to define classes of processes over which the convergence rate of such estimators and upper bounds of the minimax rate can be computed and compared. These classes are defined through some regularity properties of r. Following Constantine and Hall [5], Feuerverger et al. [10], Istas and Lang [14] and Kent and Wood [16], assumption (A1) may be sharpened to take into account some Hölder regularity of the derivatives of r at the origin. In order to generalize this approach, we consider classes of functions relying on two indices (s, j) ∈ IR × IN. More precisely, define:

DEFINITION 1. Let s ∈ IR, j ∈ IN and C_1 > 0. f : (−1, 1) → IR belongs to F_0(s, j, C_1) iff ||f||_∞ ≤ C_1 and there exists a polynomial P of degree d ≤ max(−1, ⌈s − j⌉ − 1) such that, for Lebesgue-a.e. t in (−1, 1),

|f^{(j)}(t) − P(t)| ≤ C_1 |t|^{s−j},

where f^{(j)} denotes the j-th derivative of f. We use the convention that the null polynomial is of order −1. The polynomial P above is unique and coincides with the Taylor expansion of f^{(j)} at 0 of degree d < s − j, which we denote by P := P_{f^{(j)},s−j,0} and d := d(f^{(j)}, s − j).


We clearly have: s′ ≤ s ⇒ F_0(s′, j, C_1) ⊆ F_0(s, j, C_1). On the other hand, one may prove, by integrating the inequality in Definition 1, that for j ≥ 1, if s ≠ j − 1, there exists a constant C′_1 depending on s, j, C_1 such that F_0(s, j, C_1) ⊆ F_0(s, j − 1, C′_1(s, j, C_1)). To simplify this property and to extend it to the case s = j − 1, we will rather use the following class of functions, for s ∈ IR, C_1 > 0 and j ∈ IN:

F̄_0(s, j, C_1) = ⋂_{k=0}^{j} F_0(s, k, C_1),

which now implies:

PROPERTY 1.

• If s′ ≤ s and j′ ≥ j then F̄_0(s′, j′, C_1) ⊆ F̄_0(s, j, C_1).

• If s ∉ {0, 1, . . . , j − 1}, there exists C′_1 depending on C_1, j and s such that F_0(s, j, C_1) ⊆ F̄_0(s, j, C′_1(C_1, j, s)).

We will derive a bound on the discrete variations of a function f of F_0(s, j_0, C_1) (Lemma 9). One may derive a similar result for its wavelet coefficients: let ψ be a compactly supported wavelet of sufficient regularity and vanishing moments (see Daubechies [7]) and ψ_{j,k}(t) = ψ(2^j t − k); there is a constant K_0, depending only on the wavelet, s and j_0, such that, when Supp(ψ_{j,k}) ⊂ (−1, 1),

|⟨f | ψ_{j,k}⟩| ≤ K_0 C_1 2^{−js} (1 + |k|)^{s−j_0}.  (4)

This highlights the fact that these classes of functions are included in more general function spaces, namely the two-microlocal spaces (see, e.g., Jaffard and Meyer [15] for a description of these spaces): F_0(s, j_0, C_1) ⊆ C^{s,s−j_0}_0((−1, 1)). Let us only mention here that these spaces are crucial for analyzing the oscillating properties of a given function. We now derive the following classes of processes using the function classes F_0.

DEFINITION 2. Let β > 0, j ∈ IN, C_0 > 1 and C_1 > 0. We denote by H(β, j, C_0, C_1) the set of processes X which satisfy assumption (A1) with C ∈ (C_0^{−1}, C_0) and r ∈ F_0(s(X) + β, j, C_1).

2.3. EXAMPLES

We now give some examples of processes satisfying (A1) and determine the couples (s′, j) such that r ∈ F_0(s′, j, C_1). A classical example is given by the fractional Brownian motion (fBm) process (defined by Mandelbrot and Van Ness [18]). Denote by (X_{H,σ}(t))_{t∈IR} the fBm process such that X_{H,σ}(0) = 0 a.s. and whose covariance function is given by:

γ_{H,σ}(t, u) := IE(X_{H,σ}(t) X_{H,σ}(u)) = (σ^2/2)(|t|^{2H} + |u|^{2H} − |t − u|^{2H}).  (5)


This parametric model is such that Equation (3) gives r = 0. We will see that, in {X_{H,σ}, H ∈ (0, 1), σ > 0}, the optimal rate for estimating H from X_{H,σ}(i/n), i = 1, . . . , n, is √n.
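The claim r = 0 can be checked directly from (5) (our sketch; the values of H and σ are illustrative, and we read the normalization in (5) as σ^2/2):

```python
import numpy as np

# fBm covariance as in (5), with the normalization sigma^2/2 (our reading):
# gamma(t,u) = (sigma^2/2)(|t|^{2H} + |u|^{2H} - |t-u|^{2H}).
H, sigma = 0.3, 2.0  # illustrative values, not from the paper

def gamma(t, u):
    return 0.5 * sigma**2 * (abs(t)**(2*H) + abs(u)**(2*H) - abs(t - u)**(2*H))

def variogram(u, t):
    # v(t) = (1/2) IE[(X(u+t) - X(u))^2], expanded via the covariance.
    return 0.5 * (gamma(u + t, u + t) + gamma(u, u) - 2 * gamma(u + t, u))

# Stationary increments: v does not depend on u, and (3) holds with
# s = 2H, C = sigma^2/2 and r = 0.
for u in (0.0, 0.25, 0.8):
    for t in (0.01, 0.1, 0.5):
        assert np.isclose(variogram(u, t), 0.5 * sigma**2 * t**(2*H))
print("v(t) = (sigma^2/2)|t|^{2H}: s = 2H, C = sigma^2/2, r = 0")
```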

One can derive another example of processes satisfying (A1) by adding a finite number of independent fBms. In this case, v(t) is a finite sum of terms σ_i|t|^{s_i}, i = 1, . . . , N, and, assuming that s_i is increasing with i, the fractal index to estimate is s_1 and r(t) = Σ_{i>1} σ_i|t|^{s_i}. We easily get r ∈ F_0(s_2, j, C_1(j)) for all j ∈ IN and an appropriate constant C_1(j) depending on j and σ_i, i > 1. Thus the process belongs to H(s_2 − s_1, j, σ_0, C_1(j)) for all j ∈ IN. We will see that over such classes the optimal rate for estimating s_1 from the observations X(i/n), i = 1, . . . , n, is min(n^{s_2−s_1}, √n).

Another well known example of processes is given by the centered stationary Gaussian processes (Y_{C,s}(t)) with covariance functions γ(t) = exp(−C|t|^s), for s ∈ (0, 2) and C > 0. This case can be interpreted as a generalization of the preceding one to an 'infinite' sum with s_i = is, so that s_2 − s_1 = s. This process thus belongs to H(s, j, C, C′(j)) for all j ∈ IN and an appropriate constant C′(j) depending on j and C. On such classes, the optimal rate is min(n^s, √n).

In all these examples, the remainder r of the variogram function is smooth (namely infinitely differentiable) away from 0: ∃s′ > s, ∀j, ∃C_1 > 0, r ∈ F_0(s′, j, C_1). The classical assumption is in fact to have this property for j sufficiently large (see e.g., Constantine and Hall [5], Feuerverger et al. [10], Istas and Lang [14] and Kent and Wood [16]). We will determine classes of processes defined using H(β, j, C, C′(j)) for which j, s, β are adjusted so that the rate min(n^β, √n) is achieved by discrete variation estimators and optimal.

One may also derive processes whose covariance function is nowhere differentiable but satisfies a global Hölder regularity condition. For α > 0, let γ_α be the so called Weierstrass function defined by γ_α(t) := Σ_{j=0}^{∞} 2^{−αj} cos(2^j π t). As a Fourier series with positive coefficients, it is a nonnegative definite function and thus defines a centered stationary Gaussian process (Z_α(t))_{t∈[0,1]} whose covariance is γ_α. It is well known that γ_α has global Hölder regularity α and pointwise Hölder regularity at most α everywhere (see e.g. Falconer [9]) so that, for instance, if α < 1, it is nowhere differentiable. Adding an independent process X_{H,σ} to Z_α with α > H gives a process which satisfies (A1) with s = 2H and r = γ_α.
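That γ_α is indeed a covariance can be verified numerically (our sketch; the truncation level J, the grid size and the value of α are illustrative choices):

```python
import numpy as np

# Truncated Weierstrass covariance gamma_alpha(t) = sum_{j<J} 2^{-alpha j} cos(2^j pi t).
# alpha and J are illustrative choices, not from the paper.
alpha, J = 0.5, 20

def gamma_alpha(t):
    t = np.asarray(t, dtype=float)
    j = np.arange(J)
    return np.sum(2.0 ** (-alpha * j) * np.cos(2.0 ** j * np.pi * t[..., None]), axis=-1)

# Each term 2^{-alpha j} cos(2^j pi (t-u)) is a covariance kernel, so the
# covariance matrix of (Z_alpha(i/n)) must be nonnegative definite
# (up to floating-point error).
n = 64
grid = np.arange(n) / n
Sigma = gamma_alpha(grid[:, None] - grid[None, :])
min_eig = np.linalg.eigvalsh(Sigma).min()
print(min_eig)  # nonnegative up to numerical error
```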

In this case, the regularity in terms of classes F_0 is not appropriate to study discrete variation estimators and the global Hölder regularity should be taken into account instead. More generally, the classes F_0 define a pointwise regularity in the sense that the time origin plays a particular role in their definition. On the contrary, if r is defined through its Fourier transform, one can easily derive an estimate of its global Hölder or Sobolev regularity whereas its regularity in terms of the function classes F_0 may be useless. However, some results can also be derived for these global regularity assumptions. For convenience, this will not be investigated here. Let us refer to Roueff [19], where the rates of convergence for estimating s are investigated for r in classes defined by Hölder or Sobolev regularity.


3. Minimax Upper Bounds for the Rate of Convergence

Let S_n denote the set of estimators based on the n observations at times i/n, i = 1, . . . , n.

3.1. STATEMENT OF THE RESULTS

We claim that the rate √n may not be outperformed by any estimator in a semi-parametric frame containing, even locally, the parametric class defined by the fBm (see Section 2.3).

THEOREM 1. Let Θ be a nonempty open set of (0, 1) × (0,∞). Denote

G(Θ) := {X_{H,σ} | (H, σ) ∈ Θ}.

Then there exists a function B(H, σ), continuous and positive on (0, 1) × (0,∞), such that:

lim inf_{n→∞} inf_{ŝ_n∈S_n} sup_{Y∈G(Θ)} n IE[(ŝ_n − s(Y))^2] ≥ sup_{θ∈Θ} B(θ).

We now state that, in semi-parametric classes defined by r ∈ F_0(s + β, j, C_1), with possibly large values of j, the minimax rate is not larger than n^β.

THEOREM 2. For any β > 0, j ∈ IN, C_0 > 1 and C_1 > 0,

lim inf_{n→∞} inf_{ŝ_n∈S_n} sup_{Y∈H(β,j,C_0,C_1)} n^{2β} IE[(ŝ_n − s(Y))^2] > 0,

where H(β, j, C_0, C_1) is defined in Definition 2.

3.2. PROOF OF THEOREM 1

We first use the self-similarity of the fBm (see e.g. Samorodnitsky and Taqqu [20]) to apply results on estimators based on the observations (X_{H,σ}(i))_{i=1,...,n}. Let g be a function on IR^n. Then

IE[g({X_{H,σ}(i/n)}_{i=1,...,n})] = IE[g({X_{H,n^{−H}σ}(i)}_{i=1,...,n})].

Now, there exist H_0, σ_0 and ε such that {(H, n^{−H}σ) | (H, σ) ∈ Θ} contains the set [H_0 − ε, H_0 + ε] × {σ_0} and either [H_0 − ε, H_0 + ε] ⊂ (0, 1/2) or [H_0 − ε, H_0 + ε] ⊂ (1/2, 1). Now, it is sufficient to bound

lim inf_{n→∞} inf_{ŝ_n} sup_{H∈[H_0−ε,H_0+ε]} n IE[(ŝ_n([X_{H,σ_0}(i)]_{i=1,...,n}) − s(X_{H,σ_0}))^2].

Let ŝ_n be any estimator based on the observations [X_{H,σ_0}(i)]_{i=1,...,n}. We first use the fact that the minimax risk is bounded from below by the Bayesian risk, that is, for any measure dλ defined on the Borel sets of [H_0 − ε, H_0 + ε],

sup_{H∈[H_0−ε,H_0+ε]} IE[(ŝ_n − 2H)^2] ≥ ∫_{H_0−ε}^{H_0+ε} IE[(ŝ_n − 2H)^2] dλ(H).


Using the so called Van Trees inequality in the scalar case (readily applying Equation (4) in Gill and Levit [12]), we obtain, if dλ has a, say C^1, density λ w.r.t. the Lebesgue measure,

sup_{H∈[H_0−ε,H_0+ε]} IE[(ŝ_n − 2H)^2] ≥ 4 (∫_{H_0−ε}^{H_0+ε} I_n(H) λ(H) dH + I(λ))^{−1} ≥ 4 (sup_{H∈[H_0−ε,H_0+ε]} I_n(H) + I(λ))^{−1},

where I_n(H) is the Fisher information of (X_{H,σ_0}(i))_{i=1,...,n} and I(λ) = ∫ (λ′(t))^2/λ(t) dt. One can actually fix H_0 independently of n while σ_0 > 0 and ε both depend on n in such a way that ε behaves like 1/log(n) up to a multiplicative constant. Since the only conditions required on dλ are some regularity and Supp(dλ) ⊆ [H_0 − ε, H_0 + ε], one can choose this measure appropriately so that I(λ) behaves like log^2(n). The rest of the proof will show that sup_{H∈[H_0−ε,H_0+ε]} I_n(H) (does not depend on σ_0 > 0 and) behaves like n up to a multiplicative constant as n, ε respectively tend to ∞, 0. The claimed bound will then follow. Using that X_{H,σ_0}(0) = 0 a.s., I_n(H) is also the Fisher information of the fractional Gaussian noise (fGn) vector [X_{H,σ_0}(i) − X_{H,σ_0}(i−1)]_{i=1,...,n}. The spectral density of the fGn (see, e.g., Samorodnitsky and Taqqu [20]) at frequency u ∈ [−π, π] is

f_H(u) = (σ_0^2 / C_2(H)) |e^{iu} − 1|^2 Σ_{k=−∞}^{+∞} 1/|u + 2kπ|^{2H+1},

where C_2(H) = π/(H Γ(2H) sin(Hπ)). We get

I_n(H) = (1/2) Tr((T_n^{−1}(f_H) T_n(∂_H f_H))^2),

where T_n(h) denotes the Toeplitz matrix of size n × n associated to the spectral function h. Notice that f_H(u) = |u|^{1−2H} L^1_H(u) and ∂_H f_H(u) = −|u|^{1−2H}(log|u| L^2_H(u) + L^1_H(u) ∂_H[log C_2(H)]), where L^1_H and L^2_H are positive and infinitely differentiable over [−π, π]. In Dahlhaus [6], to prove the efficiency of the Whittle estimator for long-range dependent processes, the limit of the normalized Fisher information matrix is computed under assumptions such that the result applies to the fGn in the case 1/2 < H < 1. We check here that this result still holds in the case 0 < H < 1/2 (the process is no longer long-range dependent but its spectral density vanishes at 0). Theorem 5.1 in Dahlhaus [6] relies on two preliminary results (Lemmas 5.2 and 5.3) and on a result of Fox and Taqqu [11] (Theorem 1). The latter is valid in the case of the fGn with H ∈ (0, 1). Lemma 5.2 relies on some properties of the spectral density f which allow one to bound the quantity |f(x)/f(y) − 1|. This bound is applied in a context invariant under permutations of x and y. Thus the same result applies when 1/f satisfies the conditions of this lemma, which is the case for f_H when 0 < H < 1/2. The proof of Lemma 5.3 is also easy to adapt to this case and gives:


LEMMA 1. Let f and g be symmetric functions such that f is positive and g is nonnegative, and assume there exist α and β, 0 < α, β < 1, with 1/f(x) = O(|x|^{−α}) and g(x) = O(|x|^{β}). Then:

||T_N(f)^{−1/2} T_N(g)^{1/2}|| = O(N^{max(0,(α−β)/2)}),

where ||A|| = sup_{Σ|x_i|^2=1} (Σ|(Ax)_i|^2)^{1/2}.

Finally, the result of Theorem 5.1 of Dahlhaus [6] applies for H ∈ (0, 1/2) ∪ (1/2, 1) and gives

lim_{n→∞} (1/(2n)) Tr((T_n^{−1}(f_H) T_n(∂_H f_H))^2) = I(H),

where I(H) := (1/(4π)) ∫_{−π}^{π} (∂_H log f_H(x))^2 dx is positive and continuous over (0, 1). Moreover, this convergence is uniform for H in compact sets included in (0, 1/2) ∪ (1/2, 1) since it relies on bounds on L^1_H and L^2_H (Assumptions (A2) and (A7) of Dahlhaus [6]) and their derivatives, which can be derived independently of H on such sets. It yields, by continuity of I^{−1},

lim inf_{n→∞} inf_{ŝ_n∈S_n} sup_{Y∈G(Θ)} n IE[(ŝ_n − s(Y))^2] ≥ 4 sup_{H∈p_1(Θ)} I^{−1}(H),  (6)

where p_1 denotes the projection on the first component.

3.3. PROOF OF THEOREM 2

It is clear that, without loss of generality, C_0 and C_1 may be set to any positive values, and j may be taken arbitrarily large (if j′ ≥ j, H(β, j′, C_0, C_1) ⊆ H(β, j, C_0, C_1)). Using Theorem 1 and the fact that G((0, 1) × (0, C_0)) ⊆ H(β, j, C_0, C_1) for any j, β and C_1, the result has to be proved only for β ∈ (0, 1/2). Now, it is sufficient to prove

lim inf_{n→∞} inf_{ŝ_n∈S_n} sup_{Y∈H(β,j,C_0,C_1)} IP_Y{n^β |ŝ_n − s(Y)| ≥ C} > 0.

Following Giraitis et al. [13], we define a sequence of processes 'close' to the white noise. For n ∈ IN, let γ_n(t) be the continuous and symmetric function such that:

• ∀u ∈ (0, 1), γ_n(δ_n u) = 1 − exp(s_n(θ_j(u) + log(u))),

• ∀t ≥ δ_n, γ_n(t) = 0,

where δ_n = 1/(2n), s_n = δ_n^β, and

θ_j(u) = Σ_{k=1}^{j} (1 − u)^k / k.  (7)


θ_j has been chosen such that, for |v| < 1,

θ_j(1 − v) + log(1 − v) = −Σ_{k≥j+1} v^k / k,  (8)

which is a series of radius of convergence 1. We claim the following lemmas, whose proofs are postponed to the appendix.
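Identity (8) is immediate from the Taylor series of log(1 − v) and is easy to confirm numerically (our sketch, with illustrative values of j and v):

```python
import math

# theta_j(u) = sum_{k=1}^{j} (1-u)^k / k, as in (7).
def theta(j, u):
    return sum((1 - u) ** k / k for k in range(1, j + 1))

# Check (8): theta_j(1-v) + log(1-v) = -sum_{k >= j+1} v^k / k for |v| < 1.
j, v = 4, 0.3  # illustrative values
lhs = theta(j, 1 - v) + math.log(1 - v)
rhs = -sum(v ** k / k for k in range(j + 1, 200))  # truncated tail of the series
print(lhs - rhs)  # ~ 0 up to truncation and rounding error
```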

LEMMA 2. Let j ≥ 2. For n sufficiently large, γ_n(t) is a covariance function.

In the sequel of the proof, we take j ≥ 2 and denote by (X_n(t))_{t∈[0,1]} the stationary Gaussian process of covariance γ_n.

LEMMA 3. If β ∈ (0, 1), there exist positive numbers C_0 and C_1, and an integer N(j) such that, for n ≥ N(j), X_n ∈ H(β, j, C_0, C_1) and s_n = s(X_n).

Now, we have, for n sufficiently large, for all ŝ_n in S_n and all p ≥ n,

sup_{Y∈H(β,j,C_0,C_1)} IP_Y{n^β |ŝ_n − s(Y)| ≥ C} ≥ (1/2)(IP_{X_n}{n^β |ŝ_n − s(X_n)| ≥ C} + IP_{X_p}{n^β |ŝ_n − s(X_p)| ≥ C}).

Since γ_n(0) = 1 = γ_p(0) and, for all t ≥ 1/n > δ_n ≥ δ_p, γ_n(t) = 0 = γ_p(t), the vectors (X^p_{i/n})_{i=1,...,n} and (X^n_{i/n})_{i=1,...,n} have the same covariance matrices, thus the same law. Hence,

sup_{Y∈H(β,j,C_0,C_1)} IP_Y{n^β |ŝ_n − s(Y)| ≥ C} ≥ (1/2) IP_{X_p}{n^β |ŝ_n − s(X_p)| ≥ C or n^β |ŝ_n − s(X_n)| ≥ C}.

Since n^β |s(X_p) − s(X_n)|/2 tends to 2^{−β−1} when p tends to infinity, for C < 2^{−β−2} and p sufficiently large, {n^β |ŝ_n − s(X_p)| ≥ C or n^β |ŝ_n − s(X_n)| ≥ C} is the certain event. Thus,

sup_{Y∈H(β,j,C_0,C_1)} IP_Y{n^β |ŝ_n − s(Y)| ≥ C} ≥ 1/2,

which concludes the proof of Theorem 2.

4. Estimators by Discrete Variations

4.1. NOTATIONS AND PRELIMINARY RESULTS ON DISCRETE VARIATIONS

We start with some notations which will be helpful to handle computations on discrete variations. Denote by I the set of nonzero, finitely supported sequences of IR^ZZ. For a ∈ I, denote by M(a) the number of zero moments of this sequence:

∀r = 0, . . . , M(a) − 1, Σ_k a_k k^r = 0 and Σ_k a_k k^{M(a)} ≠ 0.  (9)


Denote the discrete a-variation V^{(a)}_{Δ,t} of a function f(u) at scale Δ and position Δt by:

V^{(a)}_{Δ,t}(f) := Σ_k a_k f(Δ(t + k)).

For a given sequence a, we will consider the discrete variations vector [V^{(a)}_{Δ,t}(X)]_{t∈T} of the process X(t) at a given scale Δ, where T is a set of equispaced indices. More precisely, we will use sets defined by two parameters: the number m of locations and a gap ρ. Denote by T(m, ρ) such a set:

T(m, ρ) = {jρ | j = 1, 2, . . . , m}.

For two sequences a and a′ of I, denote by b := a ∗ a′ the sequence defined by b_j = Σ_{k−l=j} a_k a′_l, and denote a^{2∗} = a ∗ a. One proves easily, using the binomial formula:

PROPERTY 2. Assume a, a′ ∈ I. Then M(a ∗ a′) = M(a) + M(a′).
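Property 2 can be checked on small filters (our sketch; the filters and the moment cutoff are illustrative, and we use the fact that M is invariant under shifts of the support):

```python
import numpy as np

# Number of zero moments in the sense of (9); shifts of the support do not
# change this count, so indexing the supports from 0 is harmless.
def moments(a, rmax=10):
    k = np.arange(len(a))
    M = 0
    while M < rmax and abs(np.sum(a * k ** M)) < 1e-9:
        M += 1
    return M

a  = np.array([1.0, -2.0, 1.0])  # second difference, M(a) = 2
ap = np.array([-1.0, 1.0])       # first difference,  M(a') = 1
# b_j = sum_{k-l=j} a_k a'_l, i.e. the cross-correlation of a and a'.
b = np.correlate(a, ap, mode="full")
print(moments(a), moments(ap), moments(b))  # 2 1 3: M(a * a') = M(a) + M(a')
```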

Adapting, for example, Istas and Lang [14] to our notations, we get:

PROPERTY 3. Let (X(t))_{t∈I} be a centered process with stationary increments and let v be its variogram function as defined by (1). Assume a, a′ ∈ I with M(a) ≥ 1 and M(a′) ≥ 1. Then, for Δ > 0 and t, u ∈ IR such that V^{(a)}_{Δ,t}(X) and V^{(a′)}_{Δ,u}(X) are well defined w.r.t. the support I, we have

IE(V^{(a)}_{Δ,t}(X) V^{(a′)}_{Δ,u}(X)) = V^{(−a∗a′)}_{Δ,t−u}(v).  (10)
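Identity (10) can be confirmed numerically for a process with stationary increments and X(0) = 0, for which IE[X(x)X(y)] = v(x) + v(y) − v(x − y) (our sketch; the variogram v(t) = |t|^s/2, the filters, the scale and the positions are illustrative choices):

```python
import numpy as np

# Numeric check of (10): E[V^{(a)}_{Delta,t} V^{(a')}_{Delta,u}] = V^{(-a*a')}_{Delta,t-u}(v).
s = 0.8
v = lambda t: 0.5 * np.abs(t) ** s          # illustrative variogram, C = 1/2
cov = lambda x, y: v(x) + v(y) - v(x - y)   # E[X(x)X(y)] when X(0) = 0

a, ap = np.array([1.0, -2.0, 1.0]), np.array([-1.0, 1.0])  # M(a)=2, M(a')=1
Delta, t, u = 0.1, 3.0, 1.0

lhs = sum(a[k] * ap[l] * cov(Delta * (t + k), Delta * (u + l))
          for k in range(len(a)) for l in range(len(ap)))
b = -np.correlate(a, ap, mode="full")       # coefficients of -(a * a')
offsets = np.arange(-(len(ap) - 1), len(a)) # j-indices matching b
rhs = sum(bj * v(Delta * (t - u + j)) for bj, j in zip(b, offsets))
print(lhs, rhs)  # equal up to rounding
```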

The following lemmas deal with variations of the main term in (3).

LEMMA 4. Assume s ∈ (0, 2). Assume a ∈ I such that M(a) ≥ 1. Denote K(a, s) := V^{(−a^{2∗})}_{1,0}(| · |^s) = −Σ_{k,l} a_k a_l |k − l|^s. Then K(a, s) > 0.

Proof. We refer, for example, to the proof of Lemma 2.10.8 of Samorodnitsky and Taqqu [20]. □

LEMMA 5. Assume b ∈ I and s ∈ (0, M(b) − 1/2). Let (m_n, ρ_n) ∈ (IN × (0,∞))^{IN}. Assume:

(A2) There exists ρ ≥ 0 such that lim_{n→∞} ρ_n = ρ,

(A3) m_n ρ_n → ∞ when n → ∞.

Denote:

• Λ^{(b),2}_n := Σ_{t,u∈T(m_n,ρ_n)} |V^{(b)}_{1,|t−u|}(| · |^s)|^2,

• for h > 0, K_∞(b, s, h) := h Σ_{j=−∞}^{+∞} |V^{(b)}_{1,jh}(| · |^s)|^2,

• K_∞(b, s, 0) := ∫_{−∞}^{+∞} |V^{(b)}_{1,t}(| · |^s)|^2 dt.

Then K_∞(b, s, h) is finite and continuous w.r.t. s ∈ (0, M(b) − 1/2) and h ∈ [0,∞). Moreover: lim_{n→∞} m_n^{−1} ρ_n Λ^{(b),2}_n = K_∞(b, s, ρ).

Proof (sketch). Notice that | · |^s belongs to the class F_0(s, k, C_1) for any k ∈ IN and an appropriate C_1 > 0. For s + 1/2 < k ≤ M(b), Lemma 9 in the appendix gives an appropriate bound of each variation V^{(b)}_{1,jh}(| · |^s) = Δ^{−s} V^{(b)}_{Δ,jh}(| · |^s), where Δ is sufficiently small. The quadratic sum of discrete variations is aggregated appropriately and the bounded convergence theorem gives the claimed limits and continuity properties. □

4.2. DEFINITION OF ŝ(a, Δ, m, ρ)

In this section, we define estimators using quadratic discrete variations and study their asymptotic statistical properties. The set of locations of the discrete variations is more general than in Istas and Lang [14] and Kent and Wood [16]. Moreover, we only assume that the process X satisfies (A1). For the sake of simplicity, the scale log-regression is computed over the two scales Δ and 2Δ. We will explain why such a restriction does not influence the rate of convergence of the estimator. Denote

V̄(a, Δ, m, ρ) := (1/m) Σ_{t∈T(m,ρ)} (V^{(a)}_{Δ,t}(X))^2.  (11)

Denote by ā the interpolated version of a (i.e. ∀i ∈ ZZ, ā_{2i} = a_i and ā_{2i+1} = 0, or equivalently V^{(ā)}_{Δ,t} = V^{(a)}_{2Δ,t/2}). Clearly, M(ā) = M(a). We will see that, when Δ is small, IE[V̄(ā, Δ, m, ρ)] ∼ 2^s IE[V̄(a, Δ, m, ρ)]. Hence the definition of the estimator of s based on the m discrete a-variations at scales Δ and 2Δ, whose locations are equispaced by a gap Δρ:

ŝ(a, Δ, m, ρ) := log_2 V̄(ā, Δ, m, ρ) − log_2 V̄(a, Δ, m, ρ),  (12)

where log_2(x) = log(x)/log(2).

4.3. MAIN RESULTS ON ŝ(a, Δ, m, ρ)

We now give general results on discrete variation estimators for processes satisfying assumption (A1), with general regularity conditions on r. The following proposition is proved in the appendix.

PROPOSITION 1. Assume (A1). Let a ∈ I and let Δ_n, m_n and ρ_n be sequences such that Δ_n, ρ_n > 0 and m_n ∈ IN. Denote:

• ŝ_n := ŝ(a, Δ_n, m_n, ρ_n),

• for b ∈ I, Λ^{(b)}_n(r) := (Σ_{j=−m_n+1}^{m_n−1} |V^{(b)}_{Δ_n,jρ_n}(r)|^2)^{1/2}.

Assume (A2–3) and:

(A4) lim_{n→∞} Δ_n = 0,


(A5) 2M(a) > s + 1/2,

(A6) For b successively equal to −a^{2∗}, −ā^{2∗}, −a ∗ ā:

lim_{n→∞} m_n^{−1/2} Δ_n^{−s} Λ^{(b)}_n(r) = 0.

Then

IE[ŝ_n − s] = (C log(2))^{−1} ( V^{(−ā^{2∗})}_{Δ_n,0}(r)/(K(ā, s)Δ_n^s) − V^{(−a^{2∗})}_{Δ_n,0}(r)/(K(a, s)Δ_n^s) ) +

+ O( (V^{(−ā^{2∗})}_{Δ_n,0}(r)/(K(ā, s)Δ_n^s))^2 ) + O( (V^{(−a^{2∗})}_{Δ_n,0}(r)/(K(a, s)Δ_n^s))^2 ) +

+ O((m_n ρ_n)^{−1}) + O( m_n^{−1} Δ_n^{−2s} max_{b∈{−a^{2∗},−ā^{2∗}}} (Λ^{(b)}_n(r))^2 ).  (13)

And

var[ŝ_n] = (m_n ρ_n)^{−1} (L(a, s, ρ) + o(1)) + O( m_n^{−1} Δ_n^{−2s} max_{b∈{−a^{2∗},−ā^{2∗},−a∗ā}} (Λ^{(b)}_n(r))^2 ),  (14)

where

L(a, s, ρ) := (2/log^2(2)) ( K_∞(ā^{2∗}, s, ρ)/K(ā, s)^2 + K_∞(a^{2∗}, s, ρ)/K(a, s)^2 − 2 K_∞(a ∗ ā, s, ρ)/(K(a, s)K(ā, s)) ).

Using Theorem 1, we derive the following result, whose proof is postponed to theappendix.

COROLLARY 1. Assume a ∈ I and s ∈ (0, 2M(a) − 1/2). Then, for any ρ ≥ 0, L(a, s, ρ) ≥ 4I^{−1}(s/2) > 0.

Assumptions (A2–5) are asymptotic conditions on the estimator parameters while (A6) relies on regularity properties of r. More precisely, (A2) allows us to give an explicit expression of the constant in the variance bound. (A3) means that the ratio between the width m_n ρ_n Δ_n of the observation interval and the scale Δ_n tends to infinity; it implies that the number of observations also tends to infinity. (A4) implies that the frequency of the observations tends to infinity. (A5) comes from the assumptions of Lemma 5: it ensures that the K_∞ constants used in the definition of L(a, s, ρ) are all finite. Otherwise, the term (m_n ρ_n)^{−1} in (14) should be replaced


Figure 1. The parameter s is represented on the horizontal axis. We plot 4I^{−1}(s/2) (solid line), which gives a lower bound of the asymptotic n^{−1}-normalized minimax MSE, see (6); the other plots represent L(a, s, 1), which is the asymptotic n^{−1}-normalized MSE of the estimator based on the discrete variation a in the simpler case, that is when r is smooth (s_0, β and j sufficiently large in Theorem 3). We took the following variations: a = the Db2 filter (dotted line, M(a) = 2), a = (−1, 1) (dashed line, M(a) = 1), a = (1, −2, 1) (dash-dot line, M(a) = 2). Note that for M(a) = 1, L(a, s, 1) is not finite for s ≥ 1.5, which justifies (A5).

by a larger bound. For instance, for m_n = n and ρ = 1, it should be replaced by the bound obtained in Theorem 3 with β = ∞, s_0 = s and j = 2M(a).

The number of scales taken into account in the log-regression, the number of null moments M(a) (provided it is sufficiently large), the choice of a for a fixed M(a) and the gap ρ when it does not vanish do not influence the rates of convergence obtained above. However, the constants depend directly on these parameters. In Coeurjolly [4], a numerical study of similar constants is carried out and suggests that the first order increment (a = (−1, 1)) when s is small and the filter 'Db2' (see Daubechies [7], M(Db2) = 2) when s is close to 2 give small constants compared to other filters. Following this conclusion, we computed our own constants L(a, s, 1) using the same filters and obtained similar results (see Figure 1). We also plot the bound given in Corollary 1. The comparison of the constants with the lower bound indicates that, for extremal values of s (close to 0 or to 2), it is worth looking for more efficient estimators as far as constants are concerned.

When the bias and the term of the variance due to r are negligible w.r.t. the term due to C|t|^s (assumptions (A8) and (A9) below), we complete the preceding result with an asymptotic normality result whose proof is postponed to the appendix.

PROPOSITION 2. Denote

D(a, s, ρ) := K_∞(a ∗ ā, s, ρ)^2 − K_∞(a^{2∗}, s, ρ) K_∞(ā^{2∗}, s, ρ).


Then $D(a, s, \rho) \leq 0$. Assume moreover:

(A7) $a$, $s$ and $\rho$ satisfy: $D(a, s, \rho) < 0$.

Then $L(a, s, \rho) > 0$. With the same notations as in Proposition 1, assume (A1–5), (A7) and:

(A8) For $b$ successively equal to $-a^{2*}$ and $-\bar a^{2*}$:
\[ \lim_{n\to\infty} V^{(b)}_{\delta_n, 0}(r)\, \delta_n^{-s}\, \sqrt{m_n \rho_n} = 0, \]

(A9) For $b$ successively equal to $-a^{2*}$, $-\bar a^{2*}$ and $-a * \bar a$:
\[ \lim_{n\to\infty} \rho_n\, \delta_n^{-2s}\, \bigl(\Lambda^{(b)}_n(r)\bigr)^2 = 0. \]

Then
\[ \sqrt{\rho_n m_n}\, [\hat s_n - s] \xrightarrow{d} \mathcal{N}(0, L(a, s, \rho)). \]

Assumption (A7) is made here for the sake of simplicity; a deeper study would be required to elucidate it. It may be proved or checked numerically in specific examples. We are not able to prove that (A7) is always satisfied, say for $s \in (0, 2M(a) - 1/2)$. However, we mention the following result.

LEMMA 6. Assume $a \in \mathcal{I}$ such that $M(a) \geq 1$. Then

• $\exists s_0 > 0$, $\forall s \in (0, s_0)$, $\forall \rho \in (0, \infty)\setminus\{i/j \mid i \in \mathrm{Supp}(a * \bar a),\ j \in \mathbb{Z}\setminus\{0\}\}$, $D(a, s, \rho) < 0$.

• $\forall s_0 \in (0, \max(2, 2M(a) - 1/2))$, $\exists \rho_0 > 0$, $\forall \rho \geq \rho_0$, $\forall s \in (0, s_0]$, $D(a, s, \rho) < 0$.

Proof (sketch). In the first case, we prove that $D(a, 0, \rho) < 0$ and conclude using continuity arguments. In the second case, we prove that, when $\rho \to \infty$, $\rho^{-1} D(a, s, \rho)$ tends, uniformly w.r.t. $s \in [0, s_0]$, to a limit $D(a, s, \infty)$ which is continuous w.r.t. $s$ and negative for all $s \in [0, s_0]$. □

Let us now apply the definition of the classes F0 to give simple expressions ofthe first term in (13).

LEMMA 7. Assume

(A10) There exist $\beta, C_1 > 0$ such that $r \in F_0(s + \beta, 0, C_1)$.

Then, if $2M(a) > d(r, s + \beta)$, we have:
\[ \mathbb{E}[\hat s_n - s] = O(\delta_n^\beta) + \text{two other terms}, \qquad (15) \]
where the two other terms are the last two terms of Equation (13). If moreover $|r(t) - P_{r, s+\beta, 0}(t)| = D|t|^{s+\beta} + o(t^{s+\beta})$, we may write:
\[ \mathbb{E}[\hat s_n - s] = D\,(C \log 2)^{-1}\, \frac{K(a, s+\beta)}{K(a, s)}\, (2^\beta - 1)\, \delta_n^\beta + o(\delta_n^\beta) + \text{two other terms}. \]


Proof (sketch). It is a direct consequence of Lemma 9 in the appendix, with $j = 0$ and $t = 0$. □

The last equation in Lemma 7 shows that, as soon as a remainder $r$ is allowed in (A1), $\delta_n$ should be taken as small as possible to lower the bias. Besides, at least when the main term dominates in (14), which is the case for instance for the sum of several independent fBms, the variance rate is not improved by increasing $\delta_n$. For estimating the LRD parameter by log-regression methods in the frequency domain (see e.g. Lang and Azaïs [17]), a classical bias/variance compromise consists in taking into account an infinitely growing number of frequencies as the number of observations tends to infinity. Here, the behavior of the bias and of the variance w.r.t. $\delta_n$ argues against using an infinitely growing number of scales in the regression: it would imply using larger scales than for a fixed number of scales, and thus lower rates of convergence.
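To see where the $(2^\beta - 1)$ factor comes from, here is a worked two-scale computation (our own sketch, not from the paper), for a toy estimator regressing on the scales $\delta_n$ and $2\delta_n$ only, with $\bar V(\delta)$ denoting the expected mean square variation at scale $\delta$:

```latex
% Toy two-scale log-regression:
%   \hat s_n = \bigl(\log \bar V(2\delta_n) - \log \bar V(\delta_n)\bigr)/\log 2 .
% Under \bar V(\delta) = C K(a,s)\,\delta^s \bigl(1 + D'\delta^\beta + o(\delta^\beta)\bigr),
% with D' = D\,K(a,s+\beta)/(C\,K(a,s)), the \delta^s parts cancel and
\hat s_n - s
  = \frac{\log\bigl(1 + D'(2\delta_n)^\beta\bigr) - \log\bigl(1 + D'\delta_n^\beta\bigr)}{\log 2}
    + o(\delta_n^\beta)
  = \frac{D'\,(2^\beta - 1)}{\log 2}\,\delta_n^\beta + o(\delta_n^\beta).
```

Substituting $D' = D\,K(a, s+\beta)/(C\,K(a, s))$ recovers exactly the constant $D\,(C\log 2)^{-1}\,K(a, s+\beta)/K(a, s)\,(2^\beta - 1)$ of Lemma 7.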

Remark. As pointed out in Remark 8 of Kent and Wood [16], the polynomial terms of $r$ do not influence the bias. Nor do they influence the parameter $\beta$ (defined by (A10)), which is not the case when $\beta$ is defined by $|r(t)| = O(|t|^{s+\beta})$.

4.4. $\hat s_0(n, a)$: DEFINITION AND RESULTS

Let $\mathcal{S}_n$ denote the set of all the estimators derived from the observations $X_{1/n}, \ldots, X_{n/n}$. We want to study estimators of $s$ with respect to the number $n$ of observations in the asymptotic framework defined by $\mathcal{S}_n$. Thus, in order to make the estimators $\hat s(a, \delta, m, \rho)$ feasible, we must restrict the parameters $(\delta, m, \rho)$ w.r.t. the number of observations $n$ so that the set $T(m, \rho)$ satisfies:
\[ \forall i \in \mathrm{Supp}(a),\ \forall t \in T(m, \rho),\quad \delta(t + i) \in \{j/n \mid j = 1, 2, \ldots, n\}. \]
In this case, $[V^{(a)}_{\delta, t}(X)]_{t \in T}$ is a linear transformation of the Gaussian vector $[X_{i/n}]_{i=1,\ldots,n}$. For instance, assume, without loss of generality, that $\mathrm{Supp}(a) \subset \mathbb{N}$. We will denote:
\[ \hat s_0(n, a) := \hat s(a, 1/n, n - q, 1), \]
where $q$ is the smallest integer such that $\mathrm{Supp}(a) \subseteq \{0, 1, \ldots, q\}$. We have, for $n \geq q$, $\hat s_0(n, a) \in \mathcal{S}_n$. We now investigate the asymptotic statistical properties of the mean square error of the estimator $\hat s_0(n, a)$ under precise regularity conditions on $r$. The following result is proved in the appendix.

THEOREM 3. Assume (A1). Assume that $a \in \mathcal{I}$ with $M(a) \geq 2$. Assume that $C \in (C_0^{-1}, C_0)$ for some constant $C_0 > 1$. Assume (A10) and:

(A11) Hölder regularity at 0: $r \in F_0(s_0, j, C_1)$ for $s_0 > s - 1/2$ and $j > s$.

Then there exists a constant $K_3$ depending on $a$, $s$, $C_0$, $C_1$ such that
\[
\bigl| \mathbb{E}[(\hat s_0(n, a) - s(X))^2] - n^{-1} L(a, s(X), 1)(1 + o(1)) \bigr| \leq
K_3(a, s, C_0, C_1)\, \max\left( n^{-2\beta},\ 
\begin{cases}
n^{-2(s_0 - s) - 1} & \text{if } j > s_0 + 1/2,\\
n^{-2(s_0 - s) - 1} \log(n) & \text{if } j = s_0 + 1/2,\\
n^{-2(j - s)} & \text{if } j < s_0 + 1/2
\end{cases}
\right).
\]

We have already mentioned in Lemma 7 that the bias term is directly related to the Hölder regularity of $r$ at 0. The need for an additional assumption on the Hölder regularity of the $j$th derivative, as expressed by $r \in F_0(s', j, C_1)$, comes from the variance bound in (14): this bound is influenced by $r$ through the quantity $\Lambda^{(b)}_n(r)$, whose rate of convergence decreases when $j$ increases (see Corollary 2 in the appendix).

Theorem 3 may be compared, for example, to the rates of convergence obtained in Kent and Wood [16] with a similar estimator, and in Feuerverger et al. [10] with a different estimator. In Feuerverger et al. [10], 'identifying' the smoothness parameter with $\delta_n^{-1}$, rates similar to those of Theorem 3 are obtained with (A10) and (A11) for $s_0 = s$ and $j = 2$. In Kent and Wood [16] (Theorem 4), similar rates are obtained for a similar estimator, with (A10) and (A11) for $s_0 = s$ and $j = 4$. Thus, this theorem extends their results to more general assumptions. Moreover, it also extends the general conditions under which the $n^{\min(1/2, \beta)}$ rate of convergence is obtained (see Theorem 4 below). For instance, (A10) and (A11) with $s_0 = s$ and $j = 3$ are sufficient.

Remark. In Equation (15), since $\delta_n = n^{-1}$, assumption (A10) gives a bound of the bias w.r.t. $n^{-\beta}$. Since $F_0(s_0, j, C_1) \subset F_0(s_0, 0, C_1)$, when $s_0 > s$, (A11) implies (A10) with $\beta = s_0 - s$. However, (A10) might hold for some $\beta > s_0 - s$ and provide a better rate for the MSE.

Remark. The proof of this theorem relies on lemmas which derive (A6) from the regularity conditions above. (A8) and (A9) can similarly be related to precise regularity conditions. With (A7), Proposition 2 then gives a central limit theorem for $\hat s_0(n, a)$.
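As a rough numerical companion (our own sketch, not the authors' code): the hypothetical helpers below simulate fractional Brownian motion on the grid $i/n$ by circulant embedding of fractional Gaussian noise and apply a simplified, dyadic two-scale version of the discrete-variation estimator with $a = (1, -2, 1)$; the paper's $\hat s_0(n, a)$ log-regression is reduced here to the log-ratio of mean square variations at scales $1/n$ and $2/n$, which has the same $2^s$ scaling in expectation when $r = 0$:

```python
import numpy as np

def fbm(n, s, rng):
    """Fractional Brownian motion X(i/n), i = 0..n, with variogram |t|^s
    (Hurst index H = s/2), via circulant embedding of fGn."""
    H = s / 2.0
    k = np.arange(n + 1)
    g = 0.5 * ((k + 1)**(2*H) - 2.0 * k**(2*H) + np.abs(k - 1)**(2*H))
    c = np.concatenate([g, g[-2:0:-1]])           # first row of the circulant
    lam = np.clip(np.fft.fft(c).real, 0.0, None)  # eigenvalues (>= 0 in theory)
    M = len(c)
    w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    y = np.fft.fft(np.sqrt(lam) * w) / np.sqrt(M)
    incr = y.real[:n] * n**(-H)                   # fGn increments on the 1/n grid
    return np.concatenate([[0.0], np.cumsum(incr)])

def s_hat(x, a=(1.0, -2.0, 1.0)):
    """Two-scale discrete-variation estimate of s from samples X(i/n)."""
    a = np.asarray(a)
    v1 = np.convolve(x, a[::-1], mode="valid")    # filter a at scale 1/n
    a2 = np.zeros(2 * len(a) - 1)
    a2[::2] = a                                   # a dilated by 2 -> scale 2/n
    v2 = np.convolve(x, a2[::-1], mode="valid")
    return np.log2(np.mean(v2**2) / np.mean(v1**2))

rng = np.random.default_rng(0)
s = 1.4
est = np.mean([s_hat(fbm(2**13, s, rng)) for _ in range(20)])
print(round(est, 2))  # close to s = 1.4
```

Since $r = 0$ for exact fBm, the ratio of mean squares is $2^s$ in expectation, so the log-ratio is (asymptotically) unbiased for $s$; the averaging over 20 paths only reduces the Monte Carlo spread.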

Using Theorems 1 and 2, we now derive simple classes of processes over which $\hat s_0(n, a)$ is minimax optimal.

THEOREM 4. Let $C_0 > 1$, $C_1 > 0$, $\beta > 0$ and $j \geq 1$. Denote $\delta = \min(\beta, 1/2)$ and $\bar s = \min(2, j - \delta)$. Assume that $a \in \mathcal{I}$ with $2M(a) > \bar s + 1/2$. Let
\[ \mathcal{L}(\beta, j, C_0, C_1) := \{X \in \mathcal{H}(\beta, j, C_0, C_1) : s(X) \leq \bar s\}. \qquad (16) \]
Then $\hat s_0(n, a)$ is minimax rate optimal in $\mathcal{L}(\beta, j, C_0, C_1)$:
\[ \exists C > 0,\ \forall n \in \mathbb{N},\quad
\frac{\sup_{Y \in \mathcal{L}(\beta, j, C_0, C_1)} \mathbb{E}[(\hat s_0(n, a) - s(Y))^2]}
{\inf_{\hat s_n \in \mathcal{S}_n} \sup_{Y \in \mathcal{L}(\beta, j, C_0, C_1)} \mathbb{E}[(\hat s_n - s(Y))^2]} \leq C. \qquad (17) \]
Moreover, the achieved rate of convergence is $n^\delta$.

Proof. Since $\mathcal{L}(\beta, j, C_0, C_1)$ includes $\mathcal{H}(\beta, j, C_0, C_1)$ and $\mathcal{G}((0, 2) \times (C_0^{-1}, C_0))$, we can apply Theorems 2 and 1. Theorem 3 applies, and the condition imposed on $s(X)$, $j$, $\beta$ in (17) implies that $\hat s_0(n, a)$ achieves the rate $n^\delta$. □

Note that for $j \geq 3$, $\bar s = 2$ for all $\beta$, and thus $\mathcal{L}(\beta, j, C_0, C_1) = \mathcal{H}(\beta, j, C_0, C_1)$. It follows that if $M(a) \geq 2$, $\hat s_0(n, a)$ achieves the minimax rate $n^\delta$ in $\mathcal{H}(\beta, j, C_0, C_1)$ as soon as $j \geq 3$.

5. Concluding Remarks

An important issue suggests a possible direction for future work: what is the optimal rate when weaker regularity conditions are assumed? Let us first note that in any class containing the subclasses $\mathcal{H}(\beta, j, C_0, C_1)$, Theorems 1 and 2 still apply and give an upper bound on the optimal rate of convergence. Using this, one can consider classes larger than the classes $\mathcal{L}(\beta, j, C_0, C_1)$ over which $\hat s_0(n, a)$ achieves the optimal rate $\min(n^{1/2}, n^\beta)$. This is done in Roueff [19] by considering classes based on three types of regularity conditions on $r$:

• $r \in F_0(s + \beta, 0, C_1) \cap F_0(s_0, j, C_1)$ for convenient values of $j$ and $s_0$;
• $r$ in a ball defined by its regularity in some global Hölder space;
• $r$ in a ball defined by its regularity in some Besov space.

The anonymous referees respectively suggested investigating two other cases:

• non-centered processes, or more precisely processes defined by $Y(t) = m(t) + X(t)$, where $m(t)$ is a deterministic function and $X(t)$ is a centered process as assumed in (A1);

• non-stationary processes, for which (3) would be replaced by:

\[ \mathbb{E}[(X(t + h) - X(t))^2] = C(t)|h|^s + r(t, h). \qquad (18) \]

The study of the performance of discrete variation estimators on non-centered processes can easily be carried out, because the deterministic part can be studied separately. One gets the classical result that polynomial trends $m(t)$ of degree less than the order of the considered discrete variations do not influence the estimator. More generally, if $m(t)$ is sufficiently smooth, the deterministic part is negligible and the rate of convergence is the same as in the centered case. Now, concerning the non-stationary case, it seems clear that one can find sufficient conditions on $r$ and $C$ in (18) implying that the variations w.r.t. $t$ are sufficiently smooth, so that the performance of discrete variation estimators is not corrupted by the non-stationary trends. A more challenging issue would consist in considering cases for which new methods or new estimators should be used. One may think, for instance, that if $C$ or $r$ vary too much with $t$, or, in an even more general model, if $s$ also depends on $t$, an adaptive method should be used to select the pertinent observations. In practical situations, this would consist in using discrete variation estimators windowed or weighted adaptively around the position where the variogram $\mathbb{E}[(X(t+h) - X(t))^2]$ is essentially driven by $C|h|^s$. Let us mention that, even in the stationary case, a similar situation is pointed out in Roueff [19]: if we assume (A1) with only $r \in F_0(s + \beta, 0, C_1)$ for some $\beta, C_1 > 0$, the estimates of the variance of discrete variation estimators given in Proposition 1 indicate that $\hat s_0(n, a)$ is outperformed by another discrete variation estimator obtained by dropping a part of the data. This comes from the redundancy of observations that are too strongly correlated. However, it is beyond the scope of this paper to investigate the case of such strongly correlated or non-stationary data.
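The claim about polynomial trends can be checked directly: a variation filter with $M(a)$ null moments annihilates any sampled polynomial of degree less than $M(a)$, so the variations computed from $Y(i/n) = m(i/n) + X(i/n)$ coincide with those of $X$. A minimal check (our own illustration; the path below is an arbitrary sample, not a process from the paper):

```python
import numpy as np

a = np.array([1.0, -2.0, 1.0])   # M(a) = 2 null moments
n = 1024
t = np.arange(n + 1) / n
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(n + 1))  # an arbitrary sample path
trend = 3.0 - 2.0 * t                      # degree-1 polynomial trend m(t)

v_x = np.convolve(x, a[::-1], mode="valid")          # variations of X
v_y = np.convolve(x + trend, a[::-1], mode="valid")  # variations of Y = m + X
print(np.allclose(v_x, v_y))  # True: the filtered trend vanishes
```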

Appendix

Proof of Lemma 2. We have to prove that $\gamma_n$ is a nonnegative definite function. Notice first that $\gamma_n^{(j)}$ is continuous over $(0, \infty)$. We use the fact that it is sufficient to show that $\gamma_n$ is convex over $[0, \infty)$. Set $f(u) = \theta_j(u) + \log(u)$ and $g_n(u) = \exp(s_n f(u))$. We have:
\[ g_n''(u) = [f''(u) + s_n f'(u)^2]\, s_n g_n(u). \]
From the series expansion of Equation (8), it follows, when $v$ tends to zero from above:
\[ f''(1 - v) = -j v^{j-1} + o(v^{j-1}) \quad\text{and}\quad f'(1 - v) = v^j + o(v^j). \]
Hence, there exists $\delta \in (0, 1)$ such that: $\forall s_n \leq 1$, $\forall u \in (\delta, 1)$, $f''(u) + s_n f'(u)^2 \leq 0$.

Using Equation (7) and the definition of $f$, we have, when $u$ tends to zero from above: $f'(u) = 1/u - A + o(1)$ and $f''(u) = -1/u^2 + O(1)$, where $A$ is a positive constant. Hence, there exists $\delta' \in (0, 1)$ such that: $\forall s_n \leq 1$, $\forall u \in (0, \delta')$, $f''(u) + s_n f'(u)^2 \leq 0$.

From Equation (8), $f''$ is increasing over $(0, 1]$, so that $f''(u) \leq f''(\delta) < 0$ over $[\delta', \delta]$. Finally, $f''(u) + s_n f'(u)^2$ is negative over $(0, 1)$ for $n$ sufficiently large. □

Proof of Lemma 3. $r_n$ is derived from $\gamma_n$ by
\[ \gamma_n(0) - \gamma_n(t) = (t/\delta_n)^{s_n} \exp(s_n \theta_j(0)) + r_n(t). \]
$r_n$ is a symmetric function and satisfies:

• $\forall t \in [0, \delta_n]$, $r_n(t) = (t/\delta_n)^{s_n} \exp(s_n \theta_j(t/\delta_n)) - (t/\delta_n)^{s_n} \exp(s_n \theta_j(0))$,
• $\forall t \geq \delta_n$, $r_n(t) = 1 - (t/\delta_n)^{s_n} \exp(s_n \theta_j(0))$.

Moreover, $r_n^{(j)}$ is continuous over $(0, \infty)$ (see in Lemma 2 the same result for $\gamma_n$). We now check the bound on $r_n^{(j)}$. From now on, $n$ is large enough to have $s_n \leq 1$. Using the series expansion of the exponential function, we get, for any real $v$ and any polynomial $P(x) = \sum_i [p]_i x^i$:
\[ \exp(v(P(u) - P(0))) = 1 + v \sum_{k \geq 1} Q_k(v, [p]_i)\, u^k, \]
where $Q_k$ is a multivariate polynomial with nonnegative coefficients and the series has an infinite radius of convergence. We obtain that, over $(0, \delta_n)$, $r_n(t)$ is a series of functions of the form $C_{n,k} t^{s_n + k}$, $k \geq 1$. The series of the $j$th derivatives of these functions converges uniformly on any compact subset of $(0, \delta_n)$, so that, for $t \in (0, \delta_n)$,
\[
|r_n^{(j)}(t)| = \Biggl| s_n \sum_{k \geq 1} Q_k(s_n, [\theta_j]_i)\, \delta_n^{-s_n - k} \Biggl( \prod_{i=0}^{j-1} (s_n + k - i) \Biggr) t^{s_n + k - j} \Biggr|
\leq \bigl( s_n \delta_n^{-1} t^{1 - \beta} \bigr)\, \delta_n^{-s_n} \Biggl[ \sum_{k \in \mathbb{N}} Q_{k+1}(1, |[\theta_j]_i|)\, (1 + k + j)^j (t/\delta_n)^k \Biggr] t^{s_n + \beta - j}.
\]
The term between parentheses is bounded by 1; the following term tends to 1 when $n \to \infty$; the series between brackets has an infinite radius of convergence and is thus bounded over $[0, 1]$. Now, let $t \in (\delta_n, 1)$. Then
\[ |r_n^{(j)}(t)| \leq \exp(s_n \theta_j(0))\, \delta_n^{-s_n}\, s_n\, j!\, t^{s_n - j}. \]
The ratio $\exp(s_n \theta_j(0))\, \delta_n^{-s_n}$ tends to 1 when $n \to \infty$, and $s_n \leq t^\beta$. With the previous bound, this yields, for $n$ sufficiently large, $r_n \in F_0(s_n + \beta, j, C_1)$ for some $C_1 > 0$. Now, from Property 1, it implies, for $s_n$ sufficiently small, say $s_n + \beta < \bar s < 1$ for some $\bar s$, $r_n \in F_0(s_n + \beta, j, C_1')$, where $C_1'$ does not depend on $n$. □

Proof of Proposition 1. This result summarizes our study of consistency properties of discrete variations and relies on two different parts: statistical properties of logarithms of Gaussian quadratic forms (Lemma 8 below) and the result given by Lemma 5. The other computations are classical and will be presented briefly.

Recall the definitions and notations of Subsection 4. Denote $T_n = T(m_n, \rho_n)$ and $\bar V_{b,n} := \bar V(b, \delta_n, m_n, \rho_n)$. From Property 3, we have, for all $t \in T_n$:
\[ \mathbb{E}[\bar V_{b,n}] = \mathbb{E}[V^{(b)}_{\delta_n, t}(X)^2] = V^{(-b^{2*})}_{\delta_n, 0}(v). \]
We may write, using assumption (A1) and Lemma 4, when $M(b) \geq 1$ and for any sufficiently small $\delta > 0$:
\[ V^{(-b^{2*})}_{\delta, 0}(v) = C K(b, s)\, \delta^s \left( 1 + \frac{\delta^{-s} V^{(-b^{2*})}_{\delta, 0}(r)}{C K(b, s)} \right), \qquad (19) \]
where $\delta^{-s} V^{(-b^{2*})}_{\delta, 0}(r)$ tends to 0 when $\delta$ tends to 0. Standard computations on the covariance of the squares of Gaussian random variables and Property 3 give, for $(b, b') = (a, a)$, $(a, \bar a)$ or $(\bar a, \bar a)$:
\[
\mathrm{cov}(\bar V_{b,n}, \bar V_{b',n}) = 2 m_n^{-2} \sum_{t, u \in T_n} \mathrm{cov}^2\bigl( V^{(b)}_{\delta_n, t}(X), V^{(b')}_{\delta_n, u}(X) \bigr)
= 2 m_n^{-2} \sum_{t, u \in T_n} |V^{(-b * b')}_{\delta_n, t - u}(v)|^2. \qquad (20)
\]
We separate the respective contributions of $C|\cdot|^s$ and $r$, and notice that:
\[ \sum_{t, u \in T_n} |V^{(-b * b')}_{\delta_n, t - u}(r)|^2 = \sum_{j = -m_n + 1}^{m_n - 1} (m_n - |j|)\, |V^{(-b * b')}_{\delta_n, j\rho_n}(r)|^2 \leq m_n\, \Lambda^{(b * b')}_n(r)^2. \]
We obtain:
\[
\left| \mathrm{cov}^{1/2}(\bar V_{b,n}, \bar V_{b',n}) - \sqrt{2}\, C\, m_n^{-1} \Bigl( \sum_{t, u \in T_n} |V^{(-b * b')}_{\delta_n, t - u}(|\cdot|^s)|^2 \Bigr)^{1/2} \right| \leq 2\, m_n^{-1/2}\, \Lambda^{(b * b')}_n(r). \qquad (21)
\]
The following lemma and a Taylor expansion of $\log(V^{(b)}_{\delta, 0}(v))$ as $\delta \to 0$, for $b = -a^{2*}$ and $b = -\bar a^{2*}$, yield the bias bound. It also gives, with Lemma 5, the claimed bounds for the variance of $\hat s_n$.

LEMMA 8. Let $(U_{j,n})_{1 \leq j \leq n}$ be a triangular array of zero-mean Gaussian random variables. Denote

• $\varepsilon_n(U) := n^{-1} \bigl( \sum_{j, j' = 1}^{n} \mathrm{corr}^2(U_{j,n}, U_{j',n}) \bigr)^{1/2}$,
• $\Lambda_n(U) := n^{-1} \sum_{j=1}^{n} U_{j,n}^2 / \mathbb{E}[U_{j,n}^2]$.

Assume that $\lim_{n\to\infty} \varepsilon_n(U) = 0$. Then

• $|\mathbb{E}[\log(\Lambda_n(U))]| = O(\varepsilon_n(U)^2)$,
• $|\mathrm{var}[\log(\Lambda_n(U))] - \mathrm{var}[\Lambda_n(U)]| = O(\varepsilon_n(U)^3)$.

Let $(V_{j,n})_{1 \leq j \leq n}$ be another triangular array satisfying the same assumptions as above. Then:
\[ |\mathrm{cov}(\log(\Lambda_n(U)), \log(\Lambda_n(V))) - \mathrm{cov}(\Lambda_n(U), \Lambda_n(V))| = O(\varepsilon_n(U)^3) + O(\varepsilon_n(V)^3). \]

Proof (sketch). This lemma is adapted from Lemma 5 of Lang and Azaïs [17] (by simply observing that dropping the condition on $\mu_n$ in their result only introduces a constant which does not affect the proof). It relies on the fact that the tail distribution of $\Lambda_n(U)$ decreases exponentially at 0 and $\infty$, and it provides a rigorous proof of the bound derived in Kent and Wood [16] on the moments of the logarithm of a quadratic form of Gaussian variables. □

Lemma 5 and (21) give a precise bound on $\varepsilon_n(U)$, which, with Lemma 8, gives the claimed result.
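For intuition, here is our own numerical check of the first bound of Lemma 8 (not from the paper): for i.i.d. standard Gaussians, $\varepsilon_n(U) = n^{-1/2}$ and $\Lambda_n(U)$ is a $\chi^2_n/n$ average, so the bound predicts $|\mathbb{E}[\log \Lambda_n(U)]| = O(\varepsilon_n(U)^2) = O(1/n)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 4000
u = rng.standard_normal((reps, n))  # reps independent rows of (U_{j,n})
lam = np.mean(u**2, axis=1)         # Lambda_n(U), one value per row
bias = np.mean(np.log(lam))         # Monte Carlo estimate of E[log Lambda_n(U)]
print(abs(bias) < 0.01)             # True: of order 1/n, not 1/sqrt(n)
```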

Proof of Corollary 1. Assume $\mathrm{Supp}(a) \subset \{0, \ldots, q\}$. Suppose first that $\rho = x/y$ with $x, y \in \mathbb{N}\setminus\{0\}$. Define $\delta_n := y/n$ and $m_n := \lfloor (\delta_n^{-1} - q)\, y/x \rfloor \approx n$. One checks easily that $\hat s_n := \hat s(a, \delta_n, m_n, \rho) \in \mathcal{S}_n$ for $n$ sufficiently large. In the parametric class $\mathcal{G}((0, 1) \times (0, \infty))$, since $r = 0$, Proposition 1 gives $\mathbb{E}[(\hat s_n - s)^2] = (m_n \rho)^{-1} (L(a, s, \rho) + o(1))$. From Theorem 1 and by continuity of $L(a, s, \rho)$ w.r.t. $s \in (0, 2M(a) - 1/2)$, we get $L(a, s, \rho) \geq B(s/2, 1)$. The continuity of $L(a, s, \rho)$ w.r.t. $\rho$ concludes the proof. □

Proof of Proposition 2. We use the same notations as in the proof of Proposition 1. We first obtain the asymptotic normality of any linear combination of $\bar V_{-a^{2*}, n}$ and $\bar V_{-\bar a^{2*}, n}$. For this, we prove that, for any $(b, b') \in \{a, \bar a\}^2$, the quantity
\[ G_n := \max_{0 \leq i \leq m_n - 1} \sum_{j=0}^{m_n - 1} \left| \mathrm{cov}\bigl( V^{(b)}_{\delta_n, i\rho_n}(X), V^{(b')}_{\delta_n, j\rho_n}(X) \bigr) \right| \]
is negligible w.r.t. $S_n := m_n\, \mathrm{var}^{1/2}(\lambda \bar V_{-a^{2*}, n} + \mu \bar V_{-\bar a^{2*}, n})$, where $(\lambda, \mu)$ is a nonzero couple of reals. Suppose, without loss of generality, $\lambda \neq 0$. Using the previous computations, $S_n^2$ is asymptotically equivalent to $P(a, s, \rho, \mu/\lambda)\, \delta_n^{2s} m_n \rho_n^{-1}$, where, at fixed $a$, $s$ and $\rho$, $P(a, s, \rho, \mu/\lambda)$ is a nonnegative polynomial of degree 2 in $\mu/\lambda$ whose discriminant has the same sign as $D(a, s, \rho)$. Thus $D(a, s, \rho) \leq 0$. Under assumption (A7), $P(a, s, \rho, \mu/\lambda)$ is bounded away from 0, which also implies that $L(a, s, \rho) > 0$ (given by $\mu/\lambda = -1$). Besides, using Property 3, separating the respective contributions of $C|\cdot|^s$ and $r$, and using the Schwarz inequality, we get:
\[ G_n = O\Bigl( \sum_{j=0}^{m_n - 1} |V^{(b * b')}_{\delta_n, j\rho_n}(C|\cdot|^s)| \Bigr) + O\bigl( m_n^{1/2}\, \Lambda^{(b * b')}_n(r) \bigr). \]
The first term is bounded using Lemma 9: for an integer $k$ such that $s + 1/2 < k \leq M(b) + M(b')$, $|V^{(b * b')}_{\delta_n, j\rho_n}(C|\cdot|^s)| = O(\delta_n^s (1 + |j\rho_n|)^{s - k})$. The fact that $s - k < -1/2$ and assumptions (A2–4) give that this term is negligible w.r.t. $\delta_n^s m_n^{1/2} \rho_n^{-1/2}$. Using assumption (A9), we obtain the same result for the second term. Thus $G_n = o(S_n)$. The rest of the proof relies on a Lindeberg argument and Slutsky's theorem, as in Theorem 2 of Bardet et al. [2]. □

Proof of Theorem 3. The proof is given in three parts. First, we establish a result (Corollary 2) relating the regularity conditions on $r$ to assumption (A6). Then, Proposition 1 gives corresponding bounds on the MSE for any estimator satisfying (A2–5). Finally, Theorem 3 is derived. We claim the following results:

LEMMA 9. Let $(b_i)_{i = -q, \ldots, q} \in \mathcal{I}$ and $\delta \in (0, 1)$. Let $f \in F_0(s_0, j, C_1)$. Assume $M(b) > j + d(f^{(j)}, s_0 - j)$. Then, there exists $K$ depending on $s_0$, $j$ and $b$ such that:
\[ \forall t,\quad |t| < 1/\delta - q \ \Rightarrow\ |V^{(b)}_{\delta, t}(f)| \leq K(s_0, j, b)\, C_1\, \delta^{s_0} (1 + |t|)^{s_0 - j}. \]

Let us denote by $I$ the integration operator for sequences $(a_i)_{i \in \mathbb{Z}} \in \ell^1(\mathbb{R}^{\mathbb{Z}})$:
\[ I[(a_i)]_j := \sum_{i = -\infty}^{j} a_i, \]
and denote by $(b^{(-k)}_i)$ the $k$th integrated sequence, that is, the one defined recursively by $(b^{(-k)}_i) := I[(b^{(-k+1)}_i)]$.


LEMMA 10. Let $(b_i)_{i = -q, \ldots, q} \in \mathcal{I}$. Let $k$ be an integer such that $1 \leq k \leq M(b)$, and $f$ a function such that $f^{(k)} \in L^1((\delta(t - q), \delta(t + q)))$. Then:
\[ V^{(b)}_{\delta, t}(f) = (-1)^k \delta^{k-1} \Bigl\langle f^{(k)} \Bigm| \sum_{i = -q}^{q - k} b^{(-k)}_i\, \mathrm{Sp}_k(\cdot/\delta - (t + i)) \Bigr\rangle, \qquad (22) \]
where $\mathrm{Sp}_k$ is the spline function of order $k$, that is, the function $\mathbb{1}_{[0,1]}$ convolved $k$ times with itself.
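The identity (22) can be verified numerically. The sketch below (our own check, under the reconstruction $V^{(b)}_{\delta, t}(f) = \sum_i b_i f(\delta(t + i))$) takes $k = 2$ and $b = (1, -2, 1)$ supported on $\{0, 1, 2\}$, for which the integrated sequence $b^{(-2)}$ reduces to a single coefficient 1 at index 0 and $\mathrm{Sp}_2$ is the triangle function on $[0, 2]$; the inner product is approximated by plain trapezoidal quadrature:

```python
import numpy as np

def sp2(x):
    """Order-2 spline: 1_[0,1] convolved with itself (triangle on [0,2])."""
    return np.where((x >= 0) & (x <= 1), x,
                    np.where((x > 1) & (x <= 2), 2.0 - x, 0.0))

f, d2f = np.sin, lambda u: -np.sin(u)  # test function and its 2nd derivative
b = [1.0, -2.0, 1.0]                   # M(b) = 2 vanishing moments, take k = 2
delta, t = 0.01, 7.0

# left-hand side of (22): the discrete variation of f
lhs = sum(bi * f(delta * (t + i)) for i, bi in enumerate(b))

# right-hand side: (-1)^k delta^(k-1) <f^(k) | Sp_k(./delta - t)>, with k = 2
u = np.linspace(delta * t, delta * (t + 2), 20001)
y = d2f(u) * sp2(u / delta - t)
rhs = delta * np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0  # trapezoid rule
print(abs(lhs - rhs) < 1e-10)  # True
```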

Proof (sketch). Lemma 10 is obtained by successive integrations by parts, taking advantage of the vanishing moments of $b$. Lemma 9 uses such results together with the properties of the derivatives of $f$, considering separately, when necessary, the two cases $|t| \leq q$ and $|t| > q$. □

We now derive the following straightforward corollary of Lemma 9:

COROLLARY 2. Let $(\delta_n, m_n, \rho_n)$ be in $(0, \infty) \times \mathbb{N} \times (0, \infty)$ such that $\delta_n(m_n \rho_n + q) \leq 1$, $\rho_n = O(1)$ and $\lim_{n\to\infty} \rho_n m_n = \infty$. Denote, for $p > 0$:
\[ \Lambda^{(b)}_{n, p}(f) := \Bigl( \sum_{i=0}^{m_n - 1} |V^{(b)}_{\delta_n, i\rho_n}(f)|^p \Bigr)^{1/p}. \]
Then, with the assumptions of Lemma 9 on $f$ and $b$, there exists a constant $K$ only depending on $s_0$, $j$ and $b$ such that:
\[ \Lambda^{(b)}_{n, p}(f) \leq K(s_0, j, b)\, C_1\, \delta_n^{s_0}\, \rho_n^{-1/p} \times
\begin{cases}
1 & \text{if } j > s_0 + 1/p,\\
\log^{1/p}(\rho_n m_n) & \text{if } j = s_0 + 1/p,\\
(\rho_n m_n)^{s_0 - j + 1/p} & \text{if } j < s_0 + 1/p.
\end{cases} \]

This corollary generalizes the results of Kent and Wood [16], obtained for an assumption equivalent to (A1) with $r \in F_0(s, 2p + 2, C_0, C_1)$, $p \in \mathbb{N}$. Now, replacing $\Lambda^{(b)}_n(r)$ by the bounds corresponding to the regularity assumptions, and using Equation (15) to bound the bias term, Proposition 1 is applied whenever assumptions (A2–5) are satisfied. Assume $\beta, C_1 > 0$, $r \in F_0(s + \beta, 0, C_1)$ and $2M(a) > d(r, s + \beta)$. If $r \in F_0(s_0, j, C_1)$ and $2M(a) > j + d(r^{(j)}, s_0 - j)$:
\[
\mathbb{E}[(\hat s(a, \delta, m, \rho) - s(Y))^2] = O(\delta_n^{2\beta}) + O((\rho_n m_n)^{-1}) + O\bigl( \delta_n^{2(s_0 - s)} (m_n \rho_n)^{-\min(1, 2(j - s_0))} (1 + \mathbb{1}_{\{1 = 2(j - s_0)\}} \log(m_n \rho_n)) \bigr),
\]
where $s_0$ and $j$ are such that consistency follows (depending on the behavior of $\delta_n$ w.r.t. $m_n \rho_n$). For $s_0$, $\beta$ and/or $j$ sufficiently large, the main term of the MSE bound is $O((\rho_n m_n)^{-1})$. The above conditions on $M(a)$ depend on these parameters. However, for the asymptotic behaviors of $m_n \rho_n$ and $\delta_n$ given in the assumptions of Theorem 3, the condition $2M(a) > s + 1/2$ is sufficient to use the bounds above. Indeed, the conditions on $M(a)$ w.r.t. the parameters fail to be fulfilled only when the main term of the MSE bound is given by $O((\rho_n m_n)^{-1})$. Thus, we decrease the parameters appropriately so that:

• the conditions on $M(a)$ w.r.t. $s_0$, $\beta$ and/or $j$ are satisfied for all $a$ such that $2M(a) > s + 1/2$;
• the main term of the MSE bound remains $O((\rho_n m_n)^{-1})$.

Notice that the constants in the $O(\cdot)$s above may be given explicitly. Theorem 3 follows. □

Acknowledgements

We are grateful to Eric Moulines and Philippe Soulier for many fruitful discussions and comments. François Roueff was supported by the CNET (France Télécom).

References

1. Adler, R.: The Geometry of Random Fields, Wiley, New York, 1981.
2. Bardet, J., Lang, G., Moulines, E. and Soulier, P.: Wavelet estimator of long-range dependent processes, Statistical Inference for Stochastic Processes (special issue on Limit theorems and long-range dependence) 3(1/2) (2000).
3. Chan, G., Hall, P. and Poskitt, D.: Periodogram-based estimators of fractal properties, Annals of Statistics 23(5) (1995), 1684–1711.
4. Cœurjolly, J.-F.: Identification du mouvement Brownien fractionnaire par variations discrètes, Technical report, IMAG-LMC (UMR 5523), 1999.
5. Constantine, A. and Hall, P.: Characterizing surface smoothness via estimation of effective fractal dimension, J. Royal Statist. Soc. 56 (1994), 97–113.
6. Dahlhaus, R.: Efficient parameter estimation for self-similar processes, Annals of Statistics 17(4) (1989), 1749–1766.
7. Daubechies, I.: Ten Lectures on Wavelets, SIAM, 1992.
8. Davies, S. and Hall, P.: Fractal analysis of surface roughness by using spatial data, J. Royal Statist. Soc. 61(1) (1999), 3–37.
9. Falconer, K.: Fractal Geometry: Mathematical Foundations and Applications, Wiley, New York, 1990.
10. Feuerverger, A., Hall, P. and Wood, A.: Estimation of fractal index and fractal dimension of a Gaussian process by counting the number of level crossings, J. Time Series Anal. 15 (1994), 587–606.
11. Fox, R. and Taqqu, M. S.: Central limit theorems for quadratic forms in random variables having long-range dependence, Probab. Theory Related Fields 74(2) (1987), 213–240.
12. Gill, R. D. and Levit, B. Y.: Applications of the van Trees inequality: a Bayesian Cramér–Rao bound, Bernoulli 1 (1995), 59–79.
13. Giraitis, L., Robinson, P. and Samarov, A.: Rate optimal semiparametric estimation of the memory parameter of the Gaussian time series with long-range dependence, J. Time Series Anal. 18(1) (1997), 49–60.
14. Istas, J. and Lang, G.: Quadratic variations and estimation of the local Hölder index of a Gaussian process, Ann. Inst. Henri Poincaré, Probabilités et Statistiques 33(4) (1997), 407–436.
15. Jaffard, S. and Meyer, Y.: Wavelet methods for pointwise regularity and local oscillations of functions, Memoirs of the American Mathematical Society, Vol. 123, A.M.S.
16. Kent, J. T. and Wood, A. T.: Estimating the fractal dimension of a locally self-similar Gaussian process by using increments, J. Royal Statist. Soc. 59(3) (1997), 679–699.
17. Lang, G. and Azaïs, J.-M.: Non-parametric estimation of the long-range dependence exponent for Gaussian processes, J. Statist. Plann. Inference 80 (1999), 59–80.
18. Mandelbrot, B. and Van Ness, J.: Fractional Brownian motions, fractional noises and applications, SIAM Review 10 (1968), 422–437.
19. Roueff, F.: Dimension de Hausdorff du graphe d'une fonction continue: une étude analytique et statistique, Ph.D. thesis, École Nationale Supérieure des Télécommunications.
20. Samorodnitsky, G. and Taqqu, M. S.: Stable Non-Gaussian Processes: Stochastic Models with Infinite Variance, Chapman and Hall, 1994.