On Wavelet Regression with Long Memory Infinite
Moving Average Errors
Linyuan Li∗
Department of Mathematics and Statistics, University of New Hampshire, USA,
Juan Liu
Department of Mathematics and Statistics, University of New Hampshire, USA,
Yimin Xiao†
Department of Statistics and Probability, Michigan State University, USA
December 21, 2006
Abstract
We consider wavelet-based estimators of the mean regression function with long memory infinite moving average errors and investigate their asymptotic rates of convergence based on thresholding of empirical wavelet coefficients. We show that these estimators achieve nearly optimal minimax convergence rates, within a logarithmic term, over a large class of non-smooth functions with many jump discontinuities, where the number of discontinuities may grow polynomially fast with the sample size. Therefore, in the presence of long memory moving average noise, wavelet estimators still achieve nearly optimal convergence rates and exhibit remarkable local adaptability in handling discontinuities. A key step in our development is a Bernstein-type exponential inequality for infinite weighted sums of i.i.d. random variables under a certain cumulant assumption. This large deviation inequality may be of independent interest.
Short title: Wavelet Estimator with Long Memory Data
2000 Mathematics Subject Classification: Primary: 62G07; Secondary: 62C20
Keywords: Infinite moving average processes; long range dependence data; minimax estimation; nonlinear wavelet-based estimator; rates of convergence
∗Research supported in part by the NSF grant DMS-0604499.
†Research supported in part by the NSF grant DMS-0404729.
1 Introduction
Consider nonparametric regression
Yi = g(xi) + εi, i = 1, 2, · · · , n, (1.1)
where xi = i/n ∈ [0, 1], ε1, · · · , εn are observational errors with mean 0 and g is an unknown
function to be estimated. Common assumptions on ε1, · · · , εn are i.i.d. errors or station-
ary processes with short-range dependence such as classic ARMA processes, see, e.g., Hart
(1991), Tran, et al. (1996) and Truong and Patil (2001). However, in many fields which in-
clude agronomy, astronomy, economics, environmental sciences, geosciences, hydrology and
signal and image processing, it is unrealistic to assume that the observational errors are in-
dependent or short-range dependent. Instead, these observational errors exhibit slow decay
in correlation which is often referred to as long-range dependence or long memory. Suppose
ε1, · · · , εn, · · · is a stationary error process with mean 0 and variance 1. Then {εi, i ≥ 1} is
said to have long-range dependence or long memory, if there exists α ∈ (0, 1) such that
r(j) = E(ε1 ε_{1+j}) ∼ C0 |j|^{−α}, (1.2)
where C0 > 0 is a constant and aj ∼ bj means that aj/bj → 1 when j →∞. The literature
on long-range dependence is very extensive, see, e.g., the monograph of Beran (1994) and the
references cited therein. Estimation for data with long-range dependence is quite different
from that for observations with independence or short-range dependence. For example, Hall
and Hart (1990) showed that the convergence rates of mean regression function estimators
differ from those with independence or short-range dependence.
In this paper we suppose that the errors {εi, i ∈ Z} constitute a strictly stationary
moving average sequence which is defined by
εi = Σ_{j≤i} b_{i−j} ζj, i ∈ Z. (1.3)
Here {ζj, j ∈ Z} is a sequence of i.i.d. random variables with mean zero and variance σ2,
and bi, i ∈ Z+, are nonrandom weights such that Σ_i b_i^2 = σ^{−2} [this implies that E(ε_i^2) = 1 for all i ∈ Z]. Furthermore, we assume that the weights decay slowly, at a hyperbolic rate:

bi ∼ C1 i^{−(1+α)/2}, 0 < α < 1, (1.4)

where C1 is a constant. Equations (1.3) and (1.4) imply that (1.2) holds with C0 = C1^2 σ^2 ∫_0^∞ (u + u^2)^{−(1+α)/2} du. Hence the errors εi in (1.3) have long memory. The family of
long memory processes defined by (1.3) includes the important class of fractional ARIMA
processes. For more information on their applications in economics and other sciences, see
Robinson (1994) and Baillie (1996). For various theoretical results pertaining to the empirical
processes of long memory moving averages, see Ho and Hsing (1996, 1997), Giraitis, et al.
(1996, 1999), Koul and Surgailis (1997, 2001), among others.
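The moving average mechanism (1.3)-(1.4) is easy to simulate. Below is a minimal Python sketch (our own illustration, not part of the paper): the truncation lag L, the weight sequence b_k = (k+1)^{−(1+α)/2} and the Gaussian innovations are illustrative choices, and the check verifies that the implied autocovariance decays at the hyperbolic rate of (1.2).

```python
import numpy as np

alpha, L = 0.4, 100_000
k = np.arange(L, dtype=float)
b = (k + 1.0) ** (-(1.0 + alpha) / 2.0)   # hyperbolically decaying weights, as in (1.4)
sigma2 = 1.0 / np.sum(b ** 2)             # innovation variance chosen so that E(eps_i^2) = 1

def acov(j):
    """Autocovariance r(j) = sigma^2 * sum_k b_k b_{k+j} of the truncated model."""
    return sigma2 * np.dot(b[: L - j], b[j:])

# long memory: r(j) ~ C0 * j^{-alpha}, so r(2j)/r(j) should be close to 2^{-alpha}
print(acov(400) / acov(200), 2.0 ** (-alpha))

# one simulated path of (1.3): eps_t = sum_{k >= 0} b_k * zeta_{t-k}
rng = np.random.default_rng(0)
zeta = rng.normal(scale=np.sqrt(sigma2), size=1024 + L)
eps = np.convolve(zeta, b, mode="valid")  # stationary sample of length 1025
```

The printed ratio sits close to 2^{−α} ≈ 0.758; the residual gap comes from truncating the weights at lag L.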
In this paper, we will consider the nonparametric regression model (1.1) with random
errors {εi} satisfying (1.3) and (1.4). Furthermore, we assume that the random variables
{ζj, j ∈ Z} satisfy the Statulevicius condition (Sγ): there exist constants γ ≥ 0 and ∆ > 0 such that

|Γm(ζj)| ≤ (m!)^{1+γ} / ∆^{m−2} for m = 3, 4, . . . , (1.5)
where Γm(ζj) denotes the cumulant of ζj of order m; see Section 2.2 for its definition and
some basic properties. Amosova (2002) has shown that, when γ = 0, the condition (Sγ) is equivalent to the celebrated Cramér condition; and when γ > 0, it is equivalent to the Linnik condition. Hence, the class of random variables satisfying (Sγ) is very large. For proving our main theorem, we will establish a Bernstein-type exponential inequality for weighted sums of i.i.d. random variables ζj (see Lemma 4.2 below), which may be of independent interest.
For the nonparametric model (1.1), Csörgő and Mielniczuk (1995) and Robinson (1997)
have proposed kernel estimators of mean regression functions and provided central limit
theorems when the errors are long range dependent Gaussian sequences and stationary mar-
tingale difference sequences, respectively. They all assume that the mean regression function
g is a fixed continuously differentiable function.
The objective of the present paper is to study the wavelet-based estimator of the regression function g, where g belongs to a large function class which may have many jump
discontinuities. We investigate the asymptotic convergence rates of the estimators and show
that discontinuities of the unknown curve have a negligible effect on the performance of
nonlinear wavelet curve estimators.
The wavelet method has become a well-known technique in nonparametric curve estimation. For a systematic discussion of wavelets and their applications in statistics, see the monograph by Härdle, et al. (1998). The major advantage of the wavelet method is its adaptability to the varying degrees of smoothness of the underlying unknown curves. These wavelet estimators typically achieve the optimal convergence rates over exceptionally large function spaces. For references, see Donoho, et al. (1995), Donoho and Johnstone (1995, 1998), and
Hall, et al. (1998, 1999). All of the above works are under the assumption that the errors
are independent normal variables. For correlated noise, Wang (1996) and Johnstone and
Silverman (1997) examine the asymptotic properties of wavelet-based estimators of mean
regression function with long memory Gaussian noise. Kovac and Silverman (2000) and
von Sachs and MacGibbon (2000) consider a correlated heteroscedastic and/or nonstationary noise sequence. They show that these estimators achieve minimax rates over a wide range of function spaces. All of the above works assume that the underlying function belongs to a
large smooth function space. Li and Xiao (2006) consider block threshold wavelet estimation
of mean regression function when the errors are long memory Gaussian processes. In this
paper we assume that the mean regression function belongs to a large class of functions with discontinuities and that the observational errors follow long memory moving average processes.
We show that the wavelet-based estimators, based on simple thresholding of the empirical
wavelet coefficients, attain nearly optimal convergence rates over a large space of non-smooth
functions.
The rest of this paper is organized as follows. In the next section, we recall some elements of wavelet transforms, introduce nonlinear wavelet-based mean regression function estimators, and present some large deviation estimates for weighted partial sums of the random variables {εi, i ≥ 1} under the Statulevicius condition (Sγ). The main results are described in Section 3, while their proofs appear in Section 4.
Throughout this paper, we use C to denote positive and finite constants whose value
may change from line to line. Specific constants are denoted by C0, C1, C2, A, B, M and so
on.
2 Preliminaries
This section contains some facts about wavelets and large deviation estimates that will be
used in the sequel.
2.1 Wavelet estimators
Let φ(x) and ψ(x) be father and mother wavelets with the following properties: φ and ψ are bounded and compactly supported, and ∫φ = 1. We call a wavelet ψ r-regular if ψ has r vanishing moments and r continuous derivatives. Let
φj0k(x) = 2j0/2φ(2j0x− k), ψjk(x) = 2j/2ψ(2jx− k), x ∈ R, j0, j ∈ Z,
then the collection {φj0k, ψjk : j ≥ j0, k ∈ Z} forms an orthonormal basis (ONB) of L2(R). Furthermore, let Vj0 and Wj be the linear subspaces of L2(R) spanned by the ONBs {φj0k, k ∈ Z} and {ψjk, k ∈ Z}, respectively. Then we have the decomposition
L2(R) = Vj0 ⊕Wj0 ⊕Wj0+1 ⊕Wj0+2 ⊕ · · · .
Therefore, for all f ∈ L2(R),

f(x) = Σ_{k∈Z} αj0k φj0k(x) + Σ_{j≥j0} Σ_{k∈Z} βjk ψjk(x), (2.1)
where the coefficients are given by
αj0k = ∫ f(x) φj0k(x) dx, βjk = ∫ f(x) ψjk(x) dx,
and the series in (2.1) converges in L2(R).
The orthogonality properties of φ and ψ imply:
∫ φj0k1 φj0k2 = δ_{k1k2}, ∫ ψj1k1 ψj2k2 = δ_{j1j2} δ_{k1k2}, ∫ φj0k1 ψjk2 = 0, ∀ j ≥ j0, (2.2)
where δjk denotes the Kronecker delta, i.e., δjk = 1, if j = k; and δjk = 0, otherwise. For
more information on wavelets see Daubechies (1992).
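As a quick numerical illustration (ours, not from the paper), the orthogonality relations (2.2) can be checked for the Haar pair, the simplest compactly supported wavelet, by midpoint quadrature on [0, 1]:

```python
import numpy as np

N = 2 ** 16
x = (np.arange(N) + 0.5) / N               # midpoint quadrature nodes on [0, 1]

def phi(t):
    # Haar father wavelet: 1 on [0, 1)
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

def psi(t):
    # Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)
    return np.where(t < 0.5, 1.0, -1.0) * phi(t)

def psi_jk(j, k, t):
    return 2 ** (j / 2) * psi(2 ** j * t - k)

def inner(f, g):
    return np.mean(f * g)                   # approximates the integral over [0, 1]

print(inner(psi_jk(1, 0, x), psi_jk(1, 1, x)))   # 0.0  (k1 != k2)
print(inner(psi_jk(1, 0, x), psi_jk(2, 0, x)))   # 0.0  (j1 != j2)
print(inner(psi_jk(1, 0, x), psi_jk(1, 0, x)))   # 1.0  (normalization)
print(inner(phi(x), psi_jk(1, 0, x)))            # 0.0  (phi orthogonal to psi_jk)
```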
In our regression model, the mean function g is supported on a fixed unit interval [0, 1],
thus we can select an index set Λ ⊂ Z² and modify some of the ψjk(x), (j, k) ∈ Z², so that {ψjk(x), (j, k) ∈ Λ} forms a complete orthonormal basis of L2[0, 1]. We refer to Cohen, et al.
(1993) for more details on wavelets on the interval. Hence, without loss of generality, we
may and will assume that φ and ψ are compactly supported on [0, 1]. We also assume that both φ and ψ satisfy a uniform Hölder condition of exponent 1/2, i.e.,

|ψ(x) − ψ(y)| ≤ C |x − y|^{1/2} for all x, y ∈ [0, 1]. (2.3)
Daubechies (1992, Chap.6) provides examples of wavelets satisfying these conditions.
As is common in the wavelet literature, we investigate the wavelet-based estimators' asymptotic rates of convergence over a large range of Besov function classes B^σ_{p,q}, σ > 0, 1 ≤ p, q ≤ ∞, which form a very rich class of function spaces. They include, in particular, the well-known Sobolev and Hölder spaces of smooth functions H^m and C^σ (B^m_{2,2} and B^σ_{∞,∞}, respectively), as well as function classes of significant spatial inhomogeneity such as the Bump Algebra and Bounded Variation classes. For a more detailed study we refer to Triebel (1992).
For a given r-regular mother wavelet ψ with r > σ, the wavelet expansion of g(x) is

g(x) = Σ_{k∈Z} αj0k φj0k(x) + Σ_{j≥j0} Σ_{k∈Z} βjk ψjk(x), x ∈ [0, 1], (2.4)

where

αj0k = ∫ g(x) φj0k(x) dx and βjk = ∫ g(x) ψjk(x) dx.
Let

G^σ_{∞,∞}(M, A) = {g : g ∈ B^σ_{∞,∞}, ‖g‖_{B^σ_{∞,∞}} ≤ M, ‖g‖∞ ≤ A, supp g ⊆ [0, 1]},

and let PdτA be the set of piecewise polynomials of degree d ≤ r − 1, with support contained in [0, 1], such that the number of discontinuities is less than τ and the supremum norm is less than A. The spaces of mean regression functions we consider in this paper are defined by

VdτA{G^σ_{∞,∞}(M, A)} = {g : g = g1 + g2; g1 ∈ G^σ_{∞,∞}(M, A), g2 ∈ PdτA}, (2.5)

i.e., VdτA{G^σ_{∞,∞}(M, A)} is a function space in which each element is a mixture of a regular function g1 from the Besov space B^σ_{∞,∞} and a function g2 that may possess discontinuities.
In the statement below, the notation 2^{j(n)} ' h(n) means that j(n) is chosen to satisfy the inequalities 2^{j(n)} ≤ h(n) < 2^{j(n)+1}.
Our proposed nonlinear wavelet estimator of g(x) is

ĝ(x) = Σ_{k∈Z} α̂j0k φj0k(x) + Σ_{j=j0}^{j1} Σ_{k∈Z} β̂jk I(|β̂jk| > δj) ψjk(x), (2.6)

where

α̂j0k = (1/n) Σ_{i=1}^{n} Yi φj0k(xi), β̂jk = (1/n) Σ_{i=1}^{n} Yi ψjk(xi), (2.7)

and the smoothing parameters j0, j1 are chosen to satisfy 2^{j0} ' log² n and 2^{j1} ' n^{1−π} for some π > 0 (we will choose π < 0.75(2r + 1)^{−1} in our main theorem below; for the sake of simplicity, we always omit the dependence of j0 and j1 on n). The threshold δj is level dependent and satisfies δj² = 2^{3+γ} C2 n^{−α} 2^{−j(1−α)} ln n, where γ is the constant in (1.5), α is the long memory parameter in (1.2) and C2 = C0 ∫∫ |x − y|^{−α} ψ(x)ψ(y) dx dy.
2.2 Large deviation estimates
Let ξ be a random variable with characteristic function fξ(t) = E exp(itξ) and E|ξ|^m < ∞. The cumulant of ξ of order m, denoted by Γm(ξ), is defined by

Γm(ξ) = (1/i^m) (d^m/dt^m) log fξ(t) |_{t=0}, (2.8)

where log denotes the principal value of the logarithm, so that log fξ(0) = 0. Note that, under the above assumptions, all cumulants of order not exceeding m exist and

log fξ(t) = Σ_{j=1}^{m} (Γj(ξ)/j!) (it)^j + o(|t|^m) as t → 0.

Cumulants are in general more tractable than moments. For example, if ξ1, . . . , ξn are independent random variables and Sn = ξ1 + · · · + ξn, then (2.8) implies

Γm(Sn) = Σ_{j=1}^{n} Γm(ξj). (2.9)
Moreover, if η = aξ, where a ∈ R is a constant, then Γm(η) = a^m Γm(ξ). We refer to Petrov
(1975) and Saulis and Statulevicius (2000) for further information on cumulants and their
applications to limit theory.
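The two properties above, additivity (2.9) and the scaling Γm(aξ) = a^m Γm(ξ), can be checked on exact Poisson moments, for which every cumulant equals λ. The moment-to-cumulant formulas below are the standard low-order ones; this sketch is ours, not from the paper.

```python
from math import isclose

def cumulants_234(m1, m2, m3, m4):
    """Cumulants of orders 2-4 from the raw moments m1..m4."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return c2, c3, c4

def poisson_raw_moments(lam):
    """Exact raw moments of a Poisson(lam) random variable."""
    return (lam,
            lam + lam ** 2,
            lam + 3 * lam ** 2 + lam ** 3,
            lam + 7 * lam ** 2 + 6 * lam ** 3 + lam ** 4)

# every cumulant of Poisson(lam) equals lam, so additivity (2.9) is visible:
# an independent sum Poisson(2) + Poisson(3) is Poisson(5), cumulants 2 + 3 = 5
for lam in (2.0, 3.0, 5.0):
    assert all(isclose(c, lam) for c in cumulants_234(*poisson_raw_moments(lam)))

# scaling: Gamma_m(a*xi) = a^m Gamma_m(xi); raw moments of a*xi are a^k m_k
a, m = 0.5, poisson_raw_moments(2.0)
print(cumulants_234(a * m[0], a ** 2 * m[1], a ** 3 * m[2], a ** 4 * m[3]))
# (0.5, 0.25, 0.125) = (a^2 * 2, a^3 * 2, a^4 * 2)
```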
The tail probabilities of ξ can be estimated using information on the cumulants Γm(ξ). We will make use of the following result of Bentkus and Rudzkis (1980)
[see also Lemma 1.7 and Corollary 1.1 in Saulis and Statulevicius (2000)].
Lemma 2.1 Let ξ be a random variable with mean 0. If there exist constants γ ≥ 0, H > 0 and ∆ > 0 such that

|Γm(ξ)| ≤ (m!/2)^{1+γ} H / ∆^{m−2}, m = 2, 3, . . . , (2.10)

then for all x > 0,

P(|ξ| ≥ x) ≤ exp(−x²/(4H)), if 0 ≤ x ≤ (H^{1+γ} ∆)^{1/(1+γ)};
P(|ξ| ≥ x) ≤ exp(−(1/4)(x∆)^{1/(1+γ)}), if x ≥ (H^{1+γ} ∆)^{1/(1+γ)}. (2.11)
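As a numerical illustration of Lemma 2.1 (ours, not from the paper): for a standard normal ξ, all cumulants of order m ≥ 3 vanish, so (2.10) holds with γ = 0, H = 1 and an arbitrary ∆ > 0, and the sub-Gaussian branch of (2.11) can be compared with the exact normal tail.

```python
import math

def tail_bound(x, H, Delta, gamma=0.0):
    """Right-hand side of the two-regime bound (2.11)."""
    x_star = (H ** (1 + gamma) * Delta) ** (1.0 / (1 + gamma))
    if x <= x_star:
        return math.exp(-x * x / (4.0 * H))
    return math.exp(-0.25 * (x * Delta) ** (1.0 / (1 + gamma)))

def normal_tail(x):
    """Exact P(|N(0,1)| >= x) = 2(1 - Phi(x)) = 1 - erf(x / sqrt(2))."""
    return 1.0 - math.erf(x / math.sqrt(2.0))

# the sub-Gaussian branch exp(-x^2/(4H)) dominates the exact normal tail
for x in (0.5, 1.0, 2.0, 3.0):
    assert normal_tail(x) <= tail_bound(x, H=1.0, Delta=10.0)
```

The bound loses a factor of 2 in the exponent compared with the true Gaussian tail exp(−x²/2), which is the price for covering the whole class (2.10).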
Condition (2.10) can be regarded as a generalized Statulevicius condition. It is more general than the celebrated Cramér and Linnik conditions. Recall that a random variable ξ is said to satisfy the Cramér condition if there exists a positive constant a such that

E exp(a|ξ|) < ∞. (2.12)

See Petrov (1975, p. 54) for other equivalent formulations of the Cramér condition and its various applications.

A random variable ξ is said to satisfy the Linnik condition if there exist positive constants δ and Cν such that

E exp(δ |ξ|^{4ν/(2ν+1)}) < Cν for all ν ∈ (0, 1/2). (2.13)
Clearly, the Linnik condition is weaker than the Cramér condition. Amosova (2002) has proved that (i) if γ = 0, then the Statulevicius condition (Sγ) coincides with the Cramér condition; (ii) if γ > 0, then (Sγ) coincides with the Linnik condition. See Amosova (2002)
for the precise relations among the constants γ, ∆, δ and ν in these conditions.
It is also worthwhile to mention the following result of Rudzkis, Saulis and Statulevicius
(1978) [see also Lemma 1.8 in Saulis and Statulevicius (2000)]: Let ξ be a random variable
satisfying the following conditions: E(ξ) = 0, E(ξ²) = σ² and there exist constants γ ≥ 0 and K > 0 such that

|E(ξ^m)| ≤ (m!)^{1+γ} K^{m−2} σ², m = 3, 4, . . . . (2.14)

Then ξ satisfies condition (2.10) with H = 2^{1+γ} σ² and ∆ = [2(K ∨ σ)]^{−1}.
Condition (2.14) is a generalization of the classical Bernstein condition: |E(ξ^m)| ≤ (1/2) m! K^{m−2} σ² for all m = 3, 4, . . ., which has been used by many authors. For examples, see Petrov (1975, p. 55), Johnstone (1999, p. 64), Picard and Tribouley (2000, p. 301), Zhang and Wong (2003, p. 164), among others.
3 Main results and discussions
Recall that we consider the nonparametric regression model (1.1) with random errors {εi} satisfying (1.3), (1.4) and (1.5). The following theorem shows that the wavelet-based estimators defined in (2.6), based on simple thresholding of the empirical wavelet coefficients,
attain nearly optimal convergence rates over a large class of functions with discontinuities,
with a number of discontinuities that diverges polynomially fast with sample size. These
results show that the discontinuities of the unknown curve have a negligible effect on the
performance of nonlinear wavelet curve estimators.
Theorem 3.1 Suppose the wavelet ψ is r-regular and the wavelet estimator ĝ is defined as in (2.6) with π < 0.75(2r + 1)^{−1}. Let τn be any sequence of positive numbers such that for some θ > 0, τn = O(n^{θ+0.25α(2r+1)^{−1}}). Then there exists a constant C such that for all A, M ∈ (0, ∞) and 1/2 ≤ σ < r,

sup_{d<r, τ≤τn} sup_{g∈VdτA{G^σ_{∞,∞}(M, A)}} E ∫ (ĝ − g)² ≤ C n^{−2σα/(2σ+α)} log² n.
Remark 3.1 The wavelet estimators defined in (2.6) do not depend on the unknown parameters σ and d. However, because of the long-range dependence, our thresholds δj (= λσj) must be level-dependent, and our estimators depend on the unknown long memory parameter α. Wang (1996, p. 480) and Johnstone and Silverman (1997, p. 340) provide simple methods to estimate the long memory parameter α. So, in practice, one needs to estimate the long memory parameter before applying the wavelet method; in this paper, we treat it as known. Our thresholds δj = λσj = √(2^{3+γ} ln n) σj (for details, see Lemma 4.2 below) are similar to the standard term-by-term hard threshold δ = √(2 ln n) σ in the Gaussian case. However, because of the long memory non-Gaussian errors here, one needs the larger constant 2^{3+γ} instead of 2.
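For concreteness, the level-dependent thresholds of Section 2.1 can be tabulated as follows (a sketch of ours; the constant C2 depends on the wavelet and on C0 and is replaced here by an illustrative value of 1):

```python
import math

def delta_j(j, n, alpha, gamma=0.0, C2=1.0):
    """delta_j^2 = 2^{3+gamma} C2 n^{-alpha} 2^{-j(1-alpha)} ln n."""
    return math.sqrt(2 ** (3 + gamma) * C2 * n ** (-alpha)
                     * 2 ** (-j * (1 - alpha)) * math.log(n))

n, alpha = 4096, 0.4
thresholds = [delta_j(j, n, alpha) for j in range(3, 9)]
# the thresholds shrink geometrically in j, by the constant factor 2^{-(1-alpha)/2}
ratios = [t2 / t1 for t1, t2 in zip(thresholds, thresholds[1:])]
print(ratios[0], 2 ** (-(1 - alpha) / 2))
```

The geometric decay in j contrasts with the single flat threshold of the i.i.d. Gaussian case.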
Remark 3.2 Minimax theory indicates that the best convergence rate over the function space G^σ_{∞,∞}(M, A) is n^{−2σα/(2σ+α)}. Since G^σ_{∞,∞}(M, A) ⊆ VdτA{G^σ_{∞,∞}(M, A)}, the above estimators achieve the optimal convergence rates up to a logarithmic term, without knowledge of the smoothness parameter. From Wang (1996, p. 470), traditional linear estimators, including kernel estimators, cannot achieve the rates stated in Theorem 3.1. Hence our nonlinear wavelet estimators achieve nearly optimal convergence rates over a large function space.
Remark 3.3 Wang (1996) and Johnstone and Silverman (1997) consider wavelet estimators
of mean regression function in the wavelet domain or based on the so-called “sequence space
model” with Gaussian error. For details, see Johnstone and Silverman (1997). Based on
the asymptotic equivalence between “sequence space model” and “sampled data model”
(1.1), they derive the minimax optimal convergence rates of wavelet estimators in wavelet
domain. However, this implication may not hold when the underlying mean function g is not sufficiently smooth. Therefore, for a function space with infinitely many jump discontinuities, we consider the wavelet estimator in the time domain, i.e., directly based on the "sampled data model" (1.1), as in Hall, et al. (1999). In the latter paper, Hall, et al. consider block-threshold projection estimators of the mean regression function with Gaussian error, assuming the function g belongs to a large class of functions that involve many irregularities of a wide variety of types. Here, since our function space is relatively simple, we consider a standard term-by-term hard-thresholded wavelet estimator and derive nearly optimal convergence rates with infinite long memory moving average non-Gaussian errors. We conjecture that a block thresholded estimator similar to that in Hall,
Gaussian errors. We conjecture that a block thresholded estimator similar to that in Hall,
et al. (1998, 1999) can be constructed so that it attains exact minimax convergence rates
without the logarithmic penalty. The proof would likely follow the arguments of Hall, et al.
(1998, 1999), but it would be too lengthy to discuss the details here.
4 Proofs
The overall proof of Theorem 3.1 follows the arguments of Donoho, et al. (1996) and Hall, et al. (1998, 1999) for the independent data case. But moving from independent data to long-range dependent data, especially with non-Gaussian random errors, involves a significant change in complexity. For the nonparametric regression model with Gaussian random errors, or for density estimation with i.i.d. random variables, one can apply the standard Bernstein inequality to obtain an exponential bound. However, these techniques are not readily applicable to infinite moving average processes with long memory. The key technical ingredient in our proof is a Bernstein-type exponential inequality for infinite weighted sums of i.i.d. random variables. This inequality gives us an exponential bound just as in the regression model with Gaussian errors or density estimation with i.i.d. variables, and it may be of independent interest. We believe our assumptions on the errors for deriving such an exponential bound are minimal.
Proof of Theorem 3.1: The proof of Theorem 3.1 can be broken into several parts. Observe that the orthogonality (2.2) of φ and ψ implies

E ∫ (ĝ − g)² =: I1 + I2 + I3 + I4,

where

I1 = Σ_k E(α̂j0k − αj0k)², I2 = Σ_{j=j0}^{jσ} Σ_k E(θ̂jk − βjk)²,

I3 = Σ_{j=jσ+1}^{j1} Σ_k E(θ̂jk − βjk)², I4 = Σ_{j=j1+1}^{∞} Σ_k β²jk.

Here θ̂jk = β̂jk I(|β̂jk| > δj) and jσ = jσ(n) is such that 2^{jσ} ' (n^{−1} log² n)^{−α/(2σ+α)}. In order to prove Theorem 3.1, it suffices to show that Ii ≤ C n^{−2σα/(2σ+α)} log² n, i = 1, · · · , 4, for all d, τ, σ, A, M; these bounds are established in Lemmas 4.3 to 4.6 below.
For this purpose, we need some preparation. We start by collecting and proving some lemmas. Denote

aj0k := E(α̂j0k) = (1/n) Σ_{i=1}^{n} g(xi) φj0k(xi),

bjk := E(β̂jk) = (1/n) Σ_{i=1}^{n} g(xi) ψjk(xi). (4.1)
Since we consider nonparametric regression with discontinuities under the sampled data model, unlike the density estimation problem in Hall, et al. (1998), one more step of approximation, between the empirical wavelet coefficients and the true wavelet coefficients, is needed. The following lemma, which bounds the discrepancy between them, will be used in proving the other lemmas.
Lemma 4.1 Suppose the mean regression function g is as in (2.5) and the wavelets φ and ψ satisfy the uniform Hölder condition (2.3). Then, for σ ≥ 1/2 and all j0 and j,

sup_k |aj0k − αj0k| = O(n^{−1/2} + τ n^{−1}), (4.2)

sup_k |bjk − βjk| = O(n^{−1/2} + τ n^{−1}). (4.3)
Proof: We only prove (4.2); the proof of (4.3) is similar and is omitted.

Let p = 2^{j0}. We may write

aj0k = (p^{1/2}/n) Σ_{i=1}^{n} g(i/n) φ(pi/n − k). (4.4)
For fixed n, p and k, we note that

0 ≤ pi/n − k ≤ 1 if and only if nk/p ≤ i ≤ n(k + 1)/p.

Let mk = ⌈nk/p⌉, where ⌈x⌉ denotes the smallest integer that is at least x. Since φ has its support in [0, 1], the summation in (4.4) runs from mk to mk+1 − 1. However, for simplicity of notation, we will not distinguish between ⌈x⌉ and x. Thus

aj0k = (p^{1/2}/n) Σ_{i=mk}^{mk+1−1} g(i/n) φ(pi/n − k)    (let i = mk + ℓ)
     = (p^{1/2}/n) Σ_{ℓ=0}^{n/p−1} g(ℓ/n + k/p) φ(pℓ/n)    (let tℓ = pℓ/n)
     = p^{−1/2} Σ_{ℓ=0}^{n/p−1} g((tℓ + k)/p) φ(tℓ) · (p/n). (4.5)
Similarly, by a simple change of variables, we have

αj0k = ∫_0^1 g(x) φj0k(x) dx
     = p^{1/2} ∫_{k/p}^{(k+1)/p} g(x) φ(px − k) dx    (let t = px − k)
     = p^{−1/2} ∫_0^1 g((t + k)/p) φ(t) dt. (4.6)
Combining (4.5) and (4.6), we have

aj0k − αj0k = p^{−1/2} Σ_{ℓ=0}^{n/p−1} ∫_{pℓ/n}^{p(ℓ+1)/n} [g((tℓ + k)/p) φ(tℓ) − g((t + k)/p) φ(t)] dt
            = J1 + J2, (4.7)

where

J1 = p^{−1/2} Σ_{ℓ=0}^{n/p−1} ∫_{pℓ/n}^{p(ℓ+1)/n} [g((tℓ + k)/p) − g((t + k)/p)] φ(tℓ) dt

and

J2 = p^{−1/2} Σ_{ℓ=0}^{n/p−1} ∫_{pℓ/n}^{p(ℓ+1)/n} g((t + k)/p) [φ(tℓ) − φ(t)] dt.
Let us consider the term J1 first. Since g = g1 + g2 with g1 ∈ G^σ_{∞,∞}(M, A) and g2 ∈ PdτA, we can write J1 = J1,1 + J1,2, where

J1,j = p^{−1/2} Σ_{ℓ=0}^{n/p−1} ∫_{pℓ/n}^{p(ℓ+1)/n} [gj((tℓ + k)/p) − gj((t + k)/p)] φ(tℓ) dt, j = 1, 2.
Since g1 ∈ G^σ_{∞,∞}(M, A), σ ≥ 1/2 and φ is bounded on [0, 1], we have

|J1,1| ≤ p^{−1/2} Σ_{ℓ=0}^{n/p−1} ∫_{pℓ/n}^{p(ℓ+1)/n} C (|t − tℓ|/p)^σ dt ≤ p^{−1/2} · C (1/n)^σ ≤ C n^{−1/2}. (4.8)
Since g2 ∈ PdτA, it is piecewise polynomial with at most τ discontinuities. Thus g2 is bounded on [0, 1] and is Lipschitz on every open subinterval of [0, 1] where it is continuous. For simplicity, we will assume that each interval (pℓ/n, p(ℓ+1)/n) contains at most one discontinuity of the function g2((· + k)/p). This reduction, which brings some convenience in presenting our proof, is not essential, and the same argument remains valid if an interval contains more discontinuities.
If (pℓ/n, p(ℓ+1)/n) contains no discontinuity of g2((· + k)/p), then by the Lipschitz condition we have

∫_{pℓ/n}^{p(ℓ+1)/n} |g2((tℓ + k)/p) − g2((t + k)/p)| |φ(tℓ)| dt ≤ C p/n². (4.9)
If (pℓ/n, p(ℓ+1)/n) contains one discontinuity of g2((· + k)/p), say t0, then we split the integral in (4.9) over (pℓ/n, t0) and (t0, p(ℓ+1)/n). Since the values of the integrals remain the same if we modify the values of the function g2((· + k)/p) at the end-points of the intervals, we may assume that g2((· + k)/p) is a polynomial on each of the closed intervals [pℓ/n, t0] and [t0, p(ℓ+1)/n]. Hence
the triangle inequality and the Lipschitz condition imply that the integral in (4.9) is bounded above by a constant multiple of

∫_{pℓ/n}^{t0} |g2((tℓ + k)/p) − g2((t + k)/p)| dt + ∫_{t0}^{p(ℓ+1)/n} |g2((tℓ + k)/p) − g2((t0 + k)/p)| dt
+ ∫_{t0}^{p(ℓ+1)/n} |g2((t0 + k)/p) − g2((t + k)/p)| dt
≤ C n^{−1} ( ∫_{pℓ/n}^{t0} dt + 2 ∫_{t0}^{p(ℓ+1)/n} dt ). (4.10)
Summing up (4.9) and (4.10) over ℓ = 0, 1, . . . , n/p − 1 and recalling that there are τ discontinuities, we obtain

|J1,2| ≤ p^{−1/2} · C (1 + τ) n^{−1} ≤ C (1 + τ) n^{−1}. (4.11)
As to the second term J2, we use the boundedness of g and the uniform 1/2-Hölder condition (2.3) for φ to derive

|J2| ≤ p^{−1/2} · C (p/n)^{1/2} = C n^{−1/2}. (4.12)

It is clear that (4.2) follows from (4.7), (4.8), (4.11) and (4.12).
Remark 4.1 If we write αjk = ∫ g φjk = ∫ g1 φjk + ∫ g2 φjk = αjk,1 + αjk,2, and similarly for ajk,1 and ajk,2, then Lemma 4.1 shows that sup_k |ajk,1 − αjk,1| = O(n^{−1/2}) and sup_k |ajk,2 − αjk,2| = O(n^{−1/2} + τ n^{−1}). Furthermore, if the number of jump discontinuities τ ≤ τn = O(n^{1/2}), then sup_k |ajk,2 − αjk,2| = O(n^{−1/2}) as well. Similar results hold for βjk and bjk.
Lemma 4.2 Under the assumptions of Theorem 3.1, we have

P(|β̂jk − bjk| > δj) ≤ n^{−1}, ∀ j ∈ [j0, j1] and k = 0, 1, · · · , 2^j − 1. (4.13)
Proof: First let us calculate E(β̂jk − bjk)². From (2.7) and (4.1), we have

E(β̂jk − bjk)² = (1/n²) Σ_{i1=1}^{n} Σ_{i2=1}^{n} E(ε_{i1} ε_{i2}) ψjk(x_{i1}) ψjk(x_{i2})
             = (2^j/n²) Σ_{i1=1}^{n} Σ_{i2=1}^{n} r(i1 − i2) ψ(2^j x_{i1} − k) ψ(2^j x_{i2} − k).

For each fixed k = 0, 1, · · · , 2^j − 1, similarly to (4.5), we have

E(β̂jk − bjk)² = (2^j/n²) Σ_{i1=1}^{n2^{−j}−1} Σ_{i2=1}^{n2^{−j}−1} r(i1 − i2) ψ(i1 2^j/n) ψ(i2 2^j/n)
             = 2^{−j} C0 (2^j n^{−1})^α [ ∫∫ |x − y|^{−α} ψ(x)ψ(y) dx dy + o(1) ],
where the last equality follows from (1.2) and a standard limiting argument. Recall that δj² = 2^{3+γ} C2 n^{−α} 2^{−j(1−α)} ln n in (2.6). Let σj² = C2 n^{−α} 2^{−j(1−α)} and λ = 2√(2^{1+γ} ln n); then we have δj² = λ² σj². From the above calculation, we see that E(β̂jk − bjk)² ∼ σj². In view of (2.7), (4.1) and (1.3), we may write β̂jk − bjk as an infinite weighted sum of the independent random variables {ζs, s ∈ Z}:

β̂jk − bjk = n^{−1} Σ_{i=1}^{n} εi ψjk(xi) =: Σ_{s∈Z} d_{n,s} ζs, (4.14)

where

d_{n,s} = n^{−1} Σ_{i=1}^{n} b_{i−s} ψjk(xi), if s ≤ 0;
d_{n,s} = n^{−1} Σ_{i=s}^{n} b_{i−s} ψjk(xi), if 0 < s ≤ n;
d_{n,s} = 0, otherwise.

Hence, we have Σ_{s∈Z} d²_{n,s} = E(β̂jk − bjk)² ∼ σj². Also let

Sn = σj^{−1} Σ_{s∈Z} d_{n,s} ζs and Sn,K = σj^{−1} Σ_{|s|<K} d_{n,s} ζs.
Then, as K → ∞, Sn,K → Sn almost surely for all integers n. We re-write the partial sum Sn,K as

Sn,K = Σ_{|s|<K} σj^{−1} d_{n,s} ζs.

Then E(Sn,K) = 0 and, by (2.9) and (1.5), we have that for all integers m ≥ 3,

|Γm(Sn,K)| = | Σ_{|s|<K} (d_{n,s}/σj)^m Γm(ζs) | ≤ Σ_{|s|<K} |d_{n,s}/σj|^m · (m!)^{1+γ}/∆^{m−2}. (4.15)
By using (1.4), the Cauchy–Schwarz inequality and the fact that n^{−1} Σ_{i=1}^{n} ψ²jk(xi) → 1, we have

sup_{s∈Z} d²_{n,s} ≤ C n^{−1} Σ_{i=1}^{n} i^{−(1+α)} ≤ C n^{−1}

for some finite constant C > 0. This implies

sup_{s∈Z} d²_{n,s}/σj² ≤ C (n^{−1} 2^j)^{1−α}. (4.16)
It follows from (4.16) that

Σ_{|s|<K} |d_{n,s}/σj|^m ≤ sup_{|s|<K} (d²_{n,s}/σj²)^{(m−2)/2} · Σ_{|s|<K} d²_{n,s} σj^{−2} ≤ (C (n^{−1} 2^j)^{(1−α)/2})^{m−2}. (4.17)
Combining (4.15) and (4.17) yields

|Γm(Sn,K)| ≤ (m!/2)^{1+γ} · 2^{1+γ} / [C^{−1} ∆ (n 2^{−j})^{(1−α)/2}]^{m−2}, ∀ m = 3, 4, . . . . (4.18)

That is, Sn,K satisfies condition (2.10) with H = 2^{1+γ} and ∆̄ = C^{−1} ∆ (n 2^{−j})^{(1−α)/2}. Since 2^{j1} ' n^{1−π}, we have ∆̄ ≥ C^{−1} ∆ n^{π(1−α)/2} for all integers j ∈ [j0, j1]. Hence λ = 2√(2^{1+γ} ln n) < (H^{1+γ} ∆̄)^{1/(1+γ)} for all j ∈ [j0, j1] when n is sufficiently large. It follows from Lemma 2.1 that

P(|Sn,K| > λ) ≤ exp(−λ²/(4H)) = n^{−1}. (4.19)
Letting K → ∞ and using Fatou's lemma, we have

P(|β̂jk − bjk| > δj) = P(|Sn| > λ) ≤ lim inf_{K→∞} P(|Sn,K| > λ) ≤ n^{−1}.

This finishes the proof of Lemma 4.2.
Remark 4.2 From the proof of Lemma 4.2, we see that by choosing λ appropriately, the
tail probability estimate (4.13) can be significantly improved.
Lemma 4.3 Under the assumptions of Theorem 3.1,

I1 := Σ_k E(α̂j0k − αj0k)² = o(n^{−2σα/(2σ+α)} log² n).
Proof: Note that

I1 ≤ 2[ Σ_k E(α̂j0k − aj0k)² + Σ_k (aj0k − αj0k)² ] =: 2(I11 + I12).
As to the first term, we may apply a calculation similar to that in Lemma 4.2 to derive

I11 = (1/n²) Σ_k Σ_{i1=1}^{n} Σ_{i2=1}^{n} E(ε_{i1} ε_{i2}) φj0k(x_{i1}) φj0k(x_{i2})
    = Σ_k 2^{−j0} C0 (2^{j0} n^{−1})^α [ ∫∫ |x − y|^{−α} φ(x)φ(y) dx dy + o(1) ]
    = Σ_{k=0}^{2^{j0}−1} 2^{−j0} C0 (2^{j0} n^{−1})^α ∫∫ |x − y|^{−α} φ(x)φ(y) dx dy + o((2^{j0} n^{−1})^α)
    ≤ C (2^{j0} n^{−1})^α = o(n^{−2σα/(2σ+α)} log² n),

where the last equality follows from our choice of j0 with 2^{j0} ' log² n.
As to the second term, since τ ≤ τn = O(n^{θ+0.25α(2r+1)^{−1}}) = O(n^{1/2}), from Lemma 4.1 and Remark 4.1, we have

I12 = O(2^{j0} n^{−1}) = o(n^{−2σα/(2σ+α)} log² n).

Together with the bound for I11, this proves Lemma 4.3.
Lemma 4.4 Under the assumptions of Theorem 3.1,

I2 := Σ_{j=j0}^{jσ} Σ_k E(θ̂jk − βjk)² ≤ C n^{−2σα/(2σ+α)} log² n,

where θ̂jk = β̂jk I(|β̂jk| > δj) and jσ = jσ(n) is such that 2^{jσ} ' (n^{−1} log² n)^{−α/(2σ+α)}.
Proof: Noticing θ̂jk = β̂jk I(|β̂jk| > δj), we have

I2 ≤ 2 Σ_{j=j0}^{jσ} Σ_k E[β²jk I(|β̂jk| ≤ δj)] + 2 Σ_{j=j0}^{jσ} Σ_k E[(β̂jk − βjk)² I(|β̂jk| > δj)]
   =: 2(I21 + I22). (4.20)
Also,

I21 ≤ Σ_{j=j0}^{jσ} Σ_k β²jk I(|βjk| ≤ 2δj) + Σ_{j=j0}^{jσ} Σ_k β²jk P(|β̂jk − βjk| > δj)
    =: I211 + I212. (4.21)
Since there are at most 2^j non-zero βjk's at level j and δj² = 2^{3+γ} C2 n^{−α} 2^{−j(1−α)} ln n, we have

I211 ≤ Σ_{j=j0}^{jσ} Σ_k 4δj² ≤ Σ_{j=j0}^{jσ} Σ_k C n^{−α} 2^{−j(1−α)} ln n
     ≤ C log² n · n^{−α} Σ_{j=j0}^{jσ} 2^{jα} ≤ C n^{−2σα/(2σ+α)} log² n. (4.22)
As to the term I212, from (4.3) in Lemma 4.1 and our choice of τ, it is easy to see that sup_k |bjk − βjk| < δj for all j ∈ [j0, jσ]. Thus, I212 = O( Σ_{j=j0}^{jσ} Σ_k β²jk P(|β̂jk − bjk| > δj) ). Write βjk = ∫ g ψjk = ∫ g1 ψjk + ∫ g2 ψjk =: βjk,1 + βjk,2 as in Remark 4.1. Since g1 ∈ G^σ_{∞,∞}, we have β²jk,1 = O(2^{−j(1+2σ)}). As to βjk,2, since g2 ∈ PdτA and our wavelet ψ has r (r > d) vanishing moments, there are at most τ non-zero βjk,2 terms, with β²jk,2 = O(2^{−j}). Thus, applying Lemma 4.2, we have

I212 ≤ C Σ_{j=j0}^{jσ} 2^j 2^{−j(1+2σ)} n^{−1} + C Σ_{j=j0}^{jσ} τ 2^{−j} n^{−1} = o(n^{−2σα/(2σ+α)} log² n). (4.23)
Now let us consider the second term I22. Applying Lemma 4.1 and E(β̂jk − bjk)² ∼ σj² as in Lemma 4.3, we have

I22 ≤ 2[ Σ_{j=j0}^{jσ} Σ_k E(β̂jk − bjk)² + Σ_{j=j0}^{jσ} Σ_k (βjk − bjk)² ]
    ≤ C Σ_{j=j0}^{jσ} Σ_k n^{−α} 2^{−j(1−α)} + C Σ_{j=j0}^{jσ} 2^j (n^{−1} + τ² n^{−2})
    ≤ C n^{−α} Σ_{j=j0}^{jσ} 2^{jα} + C n^{−1} 2^{jσ} + C τ² 2^{jσ} n^{−2}
    ≤ C n^{−2σα/(2σ+α)} log² n, (4.24)

where the last inequality follows from our choice of τ ≤ τn, σ < r and 1 ≤ r. Combining (4.20), (4.21), (4.22) and (4.23) completes the proof of the lemma.
Lemma 4.5 Under the assumptions of Theorem 3.1,

I3 := Σ_{j=jσ+1}^{j1} Σ_k E(θ̂jk − βjk)² ≤ C n^{−2σα/(2σ+α)} log² n,

where θ̂jk = β̂jk I(|β̂jk| > δj) and jσ = jσ(n) is such that 2^{jσ} ' (n^{−1} log² n)^{−α/(2σ+α)}.
Proof: As in Lemma 4.4, we have

I3 ≤ 2 Σ_{j=jσ+1}^{j1} Σ_k E[β²jk I(|β̂jk| ≤ δj)] + 2 Σ_{j=jσ+1}^{j1} Σ_k E[(β̂jk − βjk)² I(|β̂jk| > δj)]
   =: 2(I31 + I32). (4.25)
Also,

I31 ≤ Σ_{j=jσ+1}^{j1} Σ_k β²jk I(|βjk| ≤ 2δj) + Σ_{j=jσ+1}^{j1} Σ_k β²jk P(|β̂jk − βjk| > δj)
    =: I311 + I312. (4.26)
Let us consider the term I311 first. From Remark 4.1, we only need to prove

I311,l = Σ_{j=jσ+1}^{j1} Σ_k β²jk,l I(|βjk,l| ≤ 2δj) ≤ C n^{−2σα/(2σ+α)} log² n, l = 1, 2. (4.27)

Since β²jk,1 = O(2^{−j(1+2σ)}), we have

I311,1 ≤ C Σ_{j=jσ+1}^{j1} 2^j · 2^{−j(1+2σ)} ≤ C 2^{−2σjσ} ≤ C n^{−2σα/(2σ+α)} log² n.
For the second term I311,2, since g2 ∈ PdτA and our wavelet ψ has r vanishing moments with r > d, there are at most τ non-zero coefficients βjk,2. Because |βjk,2| ≤ 2δj for the terms counted in I311,2, we have

I311,2 ≤ C Σ_{j=jσ+1}^{j1} τ δj² ≤ C τ n^{−α} 2^{−(1−α)jσ} ≤ C n^{−2σα/(2σ+α)} log² n,

where the last inequality follows from τ ≤ τn = O(n^{θ+0.25α(2r+1)^{−1}}). Thus we have proved (4.27).
As to the term I312, for any positive numbers α1 and α2 such that α1 + α2 = 1, we have

I312 ≤ Σ_{j=jσ+1}^{j1} Σ_k β²jk P(|β̂jk − bjk| > α1 δj) + Σ_{j=jσ+1}^{j1} Σ_k β²jk I(|bjk − βjk| > α2 δj).

Since we can choose α1 close enough to 1, from Lemma 4.2 the first term in I312 is bounded by C Σ_{j=jσ+1}^{j1} 2^j 2^{−j} n^{−1} = o(n^{−2σα/(2σ+α)} log² n).
As to the second term in $I_{312}$, Lemma 4.1 shows that $|b_{jk} - \beta_{jk}| < \alpha_2\delta_j$ for all $j \in [j_0, j_1]$ when $n$ is sufficiently large, so this term is negligible. Together with (4.26) and (4.27), this proves the bound for the term $I_{31}$.
As to the term $I_{32}$, for any $\eta \in (0,1)$, we have
\[
\begin{aligned}
I_{32} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(\hat\beta_{jk} - \beta_{jk}\bigr)^2 I(|\beta_{jk}| > \eta\delta_j)\bigr] \\
&\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(\hat\beta_{jk} - \beta_{jk}\bigr)^2 I(|\hat\beta_{jk} - \beta_{jk}| > (1-\eta)\delta_j)\bigr] \\
&=: I_{321} + I_{322}.
\end{aligned} \tag{4.28}
\]
Let us consider $I_{321}$ first. Applying the same argument as for $I_{22}$, using Lemma 4.1 and noting that there are at most $\tau$ terms with $|\beta_{jk}| > \eta\delta_j$, we have
\[
\begin{aligned}
I_{321} &\le C\sum_{j=j_\sigma+1}^{j_1}\sum_k n^{-\alpha} 2^{-j(1-\alpha)}\, I\bigl(|\beta_{jk}| > \eta\delta_j\bigr) + C\sum_{j=j_\sigma+1}^{j_1} \tau\bigl(n^{-1} + \tau^2 n^{-2}\bigr) \\
&=: I_{3211} + I_{3212}.
\end{aligned} \tag{4.29}
\]
For the second term $I_{3212}$, based on the bound $\tau \le \tau_n$ in Theorem 3.1, we have $I_{3212} \le C j_1 \tau n^{-1} + C j_1 \tau^3 n^{-2} = o(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n)$.
As to the first term $I_{3211}$, we consider $I_{3211,1}$ and $I_{3211,2}$ separately. For the term $I_{3211,2}$, since there are only $\tau$ terms with $|\beta_{jk}| > \eta\delta_j$, we have $I_{3211,2} \le C\sum_{j=j_\sigma+1}^{j_1} \tau n^{-\alpha} 2^{-j(1-\alpha)}$, which is bounded in the same way as $I_{311,2}$. As to the term $I_{3211,1}$, since $\beta_{jk,1}^2 > \eta^2\delta_j^2$ on the event appearing in $I_{3211,1}$, we have, for any $t > 0$,
\[
\begin{aligned}
I_{3211,1} &\le C n^{-\alpha}\sum_{j=j_\sigma+1}^{j_1}\sum_k 2^{-j(1-\alpha)}\bigl(\beta_{jk,1}^2\,\eta^{-2}\delta_j^{-2}\bigr)^t \\
&= \frac{C n^{\alpha(t-1)}}{(\log^2 n)^t}\sum_{j=j_\sigma+1}^{j_1}\sum_k \beta_{jk,1}^{2t}\, 2^{-j(1-\alpha)(1-t)} \\
&\le \frac{C n^{\alpha(t-1)}}{(\log^2 n)^t}\sum_{j=j_\sigma+1}^{j_1} 2^{-j(1+2\sigma)t}\, 2^{-j(1-\alpha)(1-t)} \\
&= o\bigl(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\bigr).
\end{aligned}
\]
Together with the bound for $I_{3212}$, this proves the bound for $I_{321}$. In view of (4.28), it remains to bound the last term $I_{322}$.
As before, we may write
\[
\begin{aligned}
I_{322} &\le 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(\hat\beta_{jk} - b_{jk}\bigr)^2 I(|\hat\beta_{jk} - \beta_{jk}| > (1-\eta)\delta_j)\bigr] \\
&\quad + 2\sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(b_{jk} - \beta_{jk}\bigr)^2 I(|\hat\beta_{jk} - \beta_{jk}| > (1-\eta)\delta_j)\bigr] \\
&=: 2(I_{3221} + I_{3222}).
\end{aligned} \tag{4.30}
\]
For any positive numbers $\alpha_1$ and $\alpha_2$ such that $\alpha_1 + \alpha_2 = 1$, we have
\[
\begin{aligned}
I_{3221} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(\hat\beta_{jk} - b_{jk}\bigr)^2 I(|\hat\beta_{jk} - b_{jk}| > \alpha_1(1-\eta)\delta_j)\bigr] \\
&\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k E\bigl[\bigl(\hat\beta_{jk} - b_{jk}\bigr)^2 I(|b_{jk} - \beta_{jk}| > \alpha_2(1-\eta)\delta_j)\bigr].
\end{aligned} \tag{4.31}
\]
As to the first term, by Hölder's inequality, for any positive numbers $a$ and $b$ such that $1/a + 1/b = 1$, it is bounded by
\[
\sum_{j=j_\sigma+1}^{j_1}\sum_k \bigl[E\bigl(\hat\beta_{jk} - b_{jk}\bigr)^{2a}\bigr]^{1/a}\bigl[P\bigl(|\hat\beta_{jk} - b_{jk}| > \alpha_1(1-\eta)\delta_j\bigr)\bigr]^{1/b}.
\]
Choosing $\alpha_1$ close to 1 and $\eta > 0$ small enough, Lemma 4.2 yields $[P(|\hat\beta_{jk} - b_{jk}| > \alpha_1(1-\eta)\delta_j)]^{1/b} = O(n^{-1/b})$. As to the moment factor, from Lemma 4.2, $E(\hat\beta_{jk} - b_{jk})^{2a} = \sigma_j^{2a} E(\sum_{s\in\mathbb{Z}}\sigma_j^{-1} d_{n,s}\zeta_s)^{2a}$. Applying Rosenthal's inequality (Härdle et al., p. 244) and the calculation in Lemma 4.2 to this expectation, we can show that it is finite for all $a$. Now choosing $a$ sufficiently large (so that $b$ is close to 1), the moment factor is bounded by $C n^{-\alpha} 2^{-j(1-\alpha)}$. Therefore the first term in $I_{3221}$ is bounded by $C\sum_{j=j_\sigma+1}^{j_1} 2^j n^{-\alpha} 2^{-j(1-\alpha)} n^{-1} = o(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n)$.
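The order of the last bound can be checked by elementary exponent arithmetic; a sketch, assuming $0 < \alpha \le 1$ and $2^{j_1} \asymp n^{1-\pi}$ as in the proof of Lemma 4.6:

```latex
\begin{align*}
C\sum_{j=j_\sigma+1}^{j_1} 2^{j}\,n^{-\alpha}\,2^{-j(1-\alpha)}\,n^{-1}
  &= C\,n^{-(1+\alpha)}\sum_{j=j_\sigma+1}^{j_1} 2^{j\alpha}
   \le C\,n^{-(1+\alpha)}\,2^{j_1\alpha} \\
  &\le C\,n^{-(1+\alpha)+\alpha(1-\pi)}
   = C\,n^{-1-\alpha\pi},
\end{align*}
% and n^{-1-\alpha\pi} = o(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n),
% since 1 + \alpha\pi > 1 \ge 2\sigma\alpha/(2\sigma+\alpha) when \alpha \le 1.
```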
As to the second term in $I_{3221}$, we apply Lemma 4.1 to see that, when $n$ is sufficiently large, $|b_{jk} - \beta_{jk}| < \alpha_2(1-\eta)\delta_j$ for all $j$ and $k$. Thus the second term in $I_{3221}$ is negligible. Hence we have derived the desired bound for the term $I_{3221}$.
Similarly to $I_{3221}$, we write
\[
\begin{aligned}
I_{3222} &\le \sum_{j=j_\sigma+1}^{j_1}\sum_k \bigl(b_{jk} - \beta_{jk}\bigr)^2 P\bigl(|\hat\beta_{jk} - b_{jk}| > \alpha_1(1-\eta)\delta_j\bigr) \\
&\quad + \sum_{j=j_\sigma+1}^{j_1}\sum_k \bigl(b_{jk} - \beta_{jk}\bigr)^2 I\bigl(|b_{jk} - \beta_{jk}| > \alpha_2(1-\eta)\delta_j\bigr).
\end{aligned} \tag{4.32}
\]
The bound for the first term follows from Lemma 4.1 and Lemma 4.2, while the second term is again negligible. Combining (4.32) with (4.31), we obtain the bound for $I_{322}$, which, together with (4.28) and (4.29), proves the lemma.
Lemma 4.6 Under the assumptions of Theorem 3.1,
\[
I_4 := \sum_{j=j_1+1}^{\infty}\sum_k \beta_{jk}^2 = o\bigl(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\bigr).
\]
Proof: From (2.4), we may write the wavelet coefficients $\beta_{jk}$ as $\beta_{jk} = \int g\psi_{jk} = \int g_1\psi_{jk} + \int g_2\psi_{jk} =: \beta_{jk,1} + \beta_{jk,2}$. In order to prove the lemma, it suffices to show
\[
I_{4,l} := \sum_{j=j_1+1}^{\infty}\sum_k \beta_{jk,l}^2 = o\bigl(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\bigr), \qquad l = 1, 2.
\]
Let us first consider $I_{4,1}$. Because the functions $g$ and $\psi$ have compact support, i.e., $\operatorname{supp} g \subseteq [0,1]$ and $\operatorname{supp}\psi \subseteq [0,1]$, at any level $j$ there are at most $2^j$ non-zero coefficients $\beta_{jk,1}$. Since $g_1 \in G^\sigma_{\infty,\infty}$, we have $\beta_{jk,1}^2 = O(2^{-j(1+2\sigma)})$. Thus
\[
I_{4,1} \le C\sum_{j=j_1+1}^{\infty} 2^{-2\sigma j} = C 2^{-2\sigma j_1} = o\bigl(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n\bigr), \tag{4.33}
\]
where the last equality follows from our choice of $j_1$ with $2^{j_1} \asymp n^{1-\pi}$ and $\pi < 0.75(2r+1)^{-1}$.
As to the second term $I_{4,2}$, since there are at most $\tau$ discontinuities at any level $j$, and $\beta_{jk,2}^2 = O(2^{-j})$ for those at most $\tau$ coefficients, we have
\[
I_{4,2} \le C\sum_{j=j_1+1}^{\infty} 2^{-2\sigma j} + C\sum_{j=j_1+1}^{\infty} \tau 2^{-j}. \tag{4.34}
\]
From the facts that $\tau \le \tau_n = O(n^{\theta+0.25(2r+1)^{-1}})$ and $2^{j_1} \asymp n^{1-\pi}$ with $\pi < 0.75(2r+1)^{-1}$, one can verify that $\sum_{j=j_1+1}^{\infty} \tau 2^{-j} = \tau 2^{-j_1} = o(n^{-2\sigma\alpha/(2\sigma+\alpha)}\log^2 n)$. Combining this with (4.33) and (4.34) completes the proof of the lemma.
REFERENCES
Amosova, N. N. (2002). Necessity of the Cramér, Linnik and Statulevičius conditions for the probabilities of large deviations. J. Math. Sci. (New York) 109, 2031–2036.
Baillie, R. T. (1996). Long memory processes and fractional integration in econometrics.
J. Econometrics 73, 5–59.
Bentkus, R. and Rudzkis, R. (1980). Exponential estimates for the distribution of random variables. (Russian) Litovsk. Mat. Sb. 20, 15–30.
Beran, J. (1994). Statistics for Long Memory Processes. Chapman and Hall, New York.
Cohen, A., Daubechies, I. and Vial, P. (1993). Wavelets on the interval and fast wavelet
transforms. Appl. Comput. Harm. Anal. 1, 54–82.
Csörgő, S. and Mielniczuk, J. (1995). Nonparametric regression under long-range dependent normal errors. Ann. Statist. 23, 1000–1014.
Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.
Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90, 1200–1224.
Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage.
Ann. Statist. 26, 879–921.
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrink-
age: asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B. 57, 301–369.
Giraitis, L., Koul, H. L. and Surgailis, D. (1996). Asymptotic normality of regression
estimators with long memory errors. Statist. Probab. Lett. 29, 317–335.
Giraitis, L. and Surgailis, D. (1999). Central limit theorem for the empirical process of a
linear sequence with long memory. J. Statist. Plann. Inference 80, 81–93.
Hall, P. and Hart, J. D. (1990). Nonparametric regression with long-range dependence.
Stoch. Process. Appl. 36, 339–351.
Hall, P., Kerkyacharian, G. and Picard, D. (1998). Block threshold rules for curve estima-
tion using kernel and wavelet method. Ann. Statist. 26, 922–942.
Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block
thresholded wavelet estimators. Statist. Sinica 9, 33–50.
Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation and Statistical Applications. Lecture Notes in Statistics 129, Springer, New York.
Hart, J. D. (1991). Kernel regression estimation with time series errors. J. Roy. Statist.
Soc. Ser. B. 53, 173–187.
Ho, H. C. and Hsing, T. (1996). On the asymptotic expansion of the empirical process of
long memory moving averages. Ann. Statist. 24, 992–1024.
Ho, H. C. and Hsing, T. (1997). Limit theorems for functionals of moving averages. Ann.
Probab. 25, 1636–1669.
Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adap-
tivity results. Statist. Sinica 9, 51–83.
Johnstone, I. M. and Silverman, B. W. (1997). Wavelet threshold estimators for data with
correlated noise. J. Roy. Statist. Soc. Ser. B. 59, 319–351.
Koul, H. L. and Surgailis, D. (1997). Asymptotic expansion of M-estimators with long
memory errors. Ann. Statist. 25, 818–850.
Koul, H. L. and Surgailis, D. (2001). Asymptotics of the empirical process of long memory moving averages with infinite variance. Stoch. Process. Appl. 91, 309–336.
Kovac, A. and Silverman, B. W. (2000). Extending the scope of wavelet regression methods
by coefficient-dependent thresholding. J. Amer. Statist. Assoc. 95, 172–183.
Li, L. and Xiao, Y. (2006). On the minimax optimality of block thresholded wavelet
estimators with long memory data. J. Statist. Plann. Inference (in press).
Petrov, V. V. (1975). Sums of Independent Random Variables. Springer-Verlag, New York.
Picard, D. and Tribouley, K. (2000). Adaptive confidence interval for pointwise curve
estimation. Ann. Statist. 28, 298–335.
Robinson, P. M. (1994). Semiparametric analysis of long-memory time series. Ann. Statist.
22, 515–539.
Robinson, P. M. (1997). Large-sample inference for nonparametric regression with depen-
dent errors. Ann. Statist. 25, 2054–2083.
Rudzkis, R., Saulis, L. and Statulevičius, V. (1978). A general lemma on large deviation probabilities. Lith. Math. J. 18, 226–238.
Saulis, L. and Statulevičius, V. (2000). Limit theorems on large deviations. In: Limit Theorems of Probability Theory (Prokhorov, Yu. V. and Statulevičius, V., editors), Springer, New York.
Tran, L. T., Roussas, G. G., Yakowitz, S. and Truong Van, B. (1996). Fixed-design regression for linear time series. Ann. Statist. 24, 975–991.
Triebel, H. (1992). Theory of Function Spaces II. Birkhäuser, Basel.
Truong, Y. K. and Patil, P. N. (2001). Asymptotics for wavelet based estimates of piecewise smooth regression for stationary time series. Ann. Inst. Statist. Math. 53, 159–178.
von Sachs, R. and MacGibbon, B. (2000). Non-parametric curve estimation by wavelet thresholding with locally stationary errors. Scandinavian J. Statist. 27, 475–499.
Wang, Y. (1996). Function estimation via wavelet shrinkage for long-memory data. Ann.
Statist. 24, 466–484.
Zhang, S. and Wong, M. (2003). Wavelet threshold estimation for additive regression
models. Ann. Statist. 31, 152–173.