Transcript of Linear functional estimation under multiplicative ...
SERGIO BRENNER MIGUEL ∗
Abstract
We study the non-parametric estimation of the value ϑ(f) of a
linear functional evaluated at an unknown density function f with
support on R+ based on an i.i.d. sample with multiplicative
measurement errors. The proposed estimation procedure combines the
estimation of the Mellin transform of the density f and a
regularisation of the inverse of the Mellin transform by a spectral
cut-off. In order to bound the mean squared error we distinguish
several scenarios characterised through different decays of the
upcoming Mellin transforms and the smoothness of the linear
functional. In fact, we identify scenarios, where a non-trivial
choice of the upcoming tuning parameter is necessary and propose a
data-driven choice based on a Goldenshluger-Lepski method.
Additionally, we show minimax-optimality over Mellin-Sobolev spaces
of the estimator.
Keywords: Linear functional model, multiplicative measurement
errors, Mellin-transform, Mellin-Sobolev space, minimax
theory, inverse problem, adaptation
AMS 2000 subject classifications: Primary 62G05; secondary 62F10,
62C20
1 Introduction
In this paper we are interested in estimating the value ϑ(f) of a
linear functional evaluated at an unknown density f : R+ → R+ of a
positive random variable X , when Y = XU for
∗Institut für Angewandte Mathematik, MΛTHEMΛTIKON, Im Neuenheimer
Feld 205, D-69120 Heidelberg, Germany, e-mail:
{brennermiguel|johannes}@math.uni-heidelberg.de
∗∗CNRS, MAP5 UMR 8145, F-75006 Paris, France, e-mail:
fabienne.comte@parisdescartes.fr
    f_Y(y) = [f \ast g](y) = \int_0^\infty f(x)\, g(y/x)\, x^{-1}\, dx, \quad y \in \mathbb{R}_+,
such that ∗ denotes multiplicative convolution. Therefore, the
estimation of f and hence ϑ(f)
using an i.i.d. sample Y1, . . . , Yn from fY is called a
multiplicative deconvolution problem, which is an inverse problem.
Vardi [1989] and Vardi and Zhang [1992] introduce and study
intensively multiplicative censoring, which corresponds to the
particular multiplicative deconvolution problem with
multiplicative error U uniformly distributed on [0, 1]. Multiplicative
censoring is a common challenge in survival analysis as explained
and motivated in Van Es et al. [2000]. The estimation of the
cumulative distribution function of X is considered in Vardi and
Zhang [1992] and Asgharian and Wolfson [2005]. Series expansion
methods are studied in Andersen and Hansen [2001] treating the
model as an inverse problem. The density estimation in a
multiplicative censoring model is considered in Brunel et al.
[2016] using a kernel estimator and a convolution power kernel
estimator. Assuming a uniform error distribution on an interval
[1−α, 1+α] for α ∈ (0, 1), Comte and Dion [2016] analyze a
projection density
estimator with respect to the Laguerre basis. Belomestny et al.
[2016] study a beta-distributed error U. The multiplicative
measurement error model covers all those three variations of
multiplicative censoring. It was considered by Belomestny and
Goldenshluger [2020] for the point-wise density estimation. The key
to the analysis of multiplicative deconvolution is the
multiplication theorem, which for a density fY = f ∗ g and the Mellin
transforms M[fY], M[f] and M[g] (defined below) states
M[fY] = M[f]M[g]. Exploiting the multiplication theorem, Belomestny and
Goldenshluger [2020] introduce a kernel density estimator of f
allowing more generally X and U to take also negative values.
Moreover, they point out that transforming the data by applying the
logarithm is a special case of their estimation strategy. Note that
by applying the logarithm the model Y = XU writes log(Y ) = log(X)
+ log(U), and hence multiplicative convolution becomes (additive)
convolution for the log-transformed data. As a consequence, first
the density of log(X) is eventually estimated employing usual
strategies for non-parametric (additive) deconvolution problems
(see for example Meister [2009]) and then secondly transformed back
to an estimator of f. Thereby, regularity conditions commonly used
in (additive) deconvolution problems are imposed on the density of
log(X), which however are difficult to interpret as regularity
conditions on the density f. Furthermore, the analysis of a
global risk of an estimator of f using this naive approach is
challenging as Comte and Dion [2016] point out. The global
estimation of the density under multiplicative measurement errors
is considered in
Brenner Miguel et al. [2021] using the Mellin transform and a
spectral cut-off regularization of its inverse to define an
estimator for the unknown density f . Brenner Miguel [2021] studies
the global density estimation under multiplicative measurement
errors for multivariate random variables while the global
estimation of the survival function can be found in Brenner Miguel
and Phandoidaen [2021]. In this paper we estimate the value ϑ(f) of
a known linear functional of the unknown density f plugging in the
estimator of f proposed by Brenner Miguel et al. [2021]. In
additive deconvolution linear functional estimation has been
studied for instance by Butucea and Comte [2009], Mabon [2016] and
Pensky [2017] to mention only a few. In the literature, the most
studied examples for estimating linear functionals are point-wise
estimation of the unknown density f, the survival function,
cumulative distribution function (c.d.f.) or the Laplace transform
of f . These examples are particular cases of our general setting.
More precisely, we show below, that in each of those examples the
quantity of interest can be written as linear functional in the
form
    \vartheta(f) := \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi(-t)\, \mathcal{M}[f](t)\, dt,
where Ψ : R → C is a known function and M[f] denotes the Mellin
transform of f . Exploiting properties of the Mellin transform we
characterize the underlying inverse problem and natural regularity
conditions which borrow ideas from the inverse problems community
(see e.g. Engl et al. [2000]). More precisely, we identify
conditions on the decay of the Mellin transform of f and g and of
the function Ψ to ensure that our estimator is well-defined. We
illustrate those conditions by different scenarios. The proposed
estimator, however, involves a tuning parameter and we specify when
this parameter has to be chosen non-trivially. For that case, we
propose a data-driven choice of the tuning parameter inspired by
the work of Goldenshluger and Lepski [2011] who consider
data-driven bandwidth selection in kernel density estimation. We
establish an oracle inequality for the plug-in spectral cut-off
estimator under fairly mild assumptions on the error density g.
Moreover we show that uniformly over Mellin-Sobolev spaces the
proposed estimator is minimax-optimal. The paper is organized in
the following way: in Section 2 we develop the data-driven plug-in
estimator and introduce our basic assumptions. We state an
oracle-type upper bound for the mean squared error of the plug-in
spectral cut-off estimator with fully data-driven choice of the
tuning parameter. In Section 3 we state an upper bound for the
maximal mean squared error over Mellin-Sobolev spaces of the plug-in
spectral cut-off estimator with optimal tuning parameter realising a
squared-bias-variance trade-off, and lower bounds for the point-wise
estimation of the unknown density f, the survival function and the
c.d.f. The proofs can be found in the appendix.
2 Data-driven estimation
For a weight function ω : Ω → R+ on a set Ω ⊆ R and p ≥ 1, the
weighted L^p-norm of a measurable function h : Ω → C is defined
through
    \|h\|^p_{L^p_\Omega(\omega)} := \int_\Omega |h(x)|^p\, \omega(x)\, dx.
Denote by L^p_Ω(ω) the set of all measurable functions from Ω to C
with finite ‖·‖_{L^p_Ω(ω)}-norm. In the case p = 2 let
    \langle h_1, h_2 \rangle_{L^2_\Omega(\omega)} := \int_\Omega h_1(x)\, \overline{h_2(x)}\, \omega(x)\, dx
for h1, h2 ∈ L^2_Ω(ω) be the corresponding weighted scalar product.
Using a slight abuse of notation, x^a with a ∈ R denotes the weight
function x ↦ x^a, and we write ‖·‖_{x^a} := ‖·‖_{L^2_{R+}(x^a)},
respectively ⟨·,·⟩_{x^a} := ⟨·,·⟩_{L^2_{R+}(x^a)}. Further we use the
abbreviation L^p_R := L^p_R(ω) for the unweighted L^p_R space with
ω(x) = 1 for all x ∈ R. For a measurable function h let us denote by
‖h‖_{∞,ω} the essential supremum of the function x ↦ |h(x)|ω(x).
Mellin transform Let c ∈ R. For two functions h1, h2 ∈ L^1_{R+}(x^{c−1})
and any y ∈ R+ we have ∫_0^∞ |h1(x)h2(y/x)x^{−1}| dx < ∞, which allows
us to define their multiplicative convolution h1 ∗ h2 : R+ → C through
    (h_1 \ast h_2)(y) = \int_0^\infty h_1(y/x)\, h_2(x)\, x^{-1}\, dx, \quad y \in \mathbb{R}_+.    (2.1)
For a proof sketch of h1 ∗ h2 ∈ L^1_{R+}(x^{c−1}) and the following
properties we refer to Brenner Miguel [2021]. If in addition
h1 ∈ L^2_{R+}(x^{2c−1}) (respectively h2 ∈ L^2_{R+}(x^{2c−1})) then
h1 ∗ h2 ∈ L^2_{R+}(x^{2c−1}), too. For h ∈ L^1_{R+}(x^{c−1}) we define
its Mellin transform M_c[h] : R → C at the development point c ∈ R by
    \mathcal{M}_c[h](t) := \int_0^\infty x^{c-1+it}\, h(x)\, dx, \quad t \in \mathbb{R}.    (2.2)
One key property of the Mellin transform, which makes it so
appealing for multiplicative deconvolution problems, is the
multiplication theorem, which for h1, h2 ∈ L^1_{R+}(x^{c−1}) states
    \mathcal{M}_c[h_1 \ast h_2](t) = \mathcal{M}_c[h_1](t)\, \mathcal{M}_c[h_2](t), \quad t \in \mathbb{R}.    (2.3)
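The multiplication theorem can be checked by simulation. The following is a minimal sketch, assuming X is lognormal with parameters (0, 1), U is uniform on (0, 1) and c = 1; the function names, the sample size and the seed are illustrative choices and not part of the paper. It compares the empirical Mellin transform of Y = XU with the product of the two closed-form transforms.

```python
import cmath
import math
import random


def empirical_mellin(sample, c, t):
    """Empirical Mellin transform n^{-1} * sum_j Y_j^{c-1+it}."""
    s = complex(c - 1.0, t)
    return sum(cmath.exp(s * math.log(y)) for y in sample) / len(sample)


def mellin_lognormal(c, t, mu=0.0, lam=1.0):
    """Closed-form Mellin transform of the lognormal(mu, lam) density."""
    s = complex(c - 1.0, t)
    return cmath.exp(mu * s + 0.5 * lam ** 2 * s ** 2)


def mellin_uniform(c, t):
    """Mellin transform of the uniform density on (0, 1): 1 / (c + it), c > 0."""
    return 1.0 / complex(c, t)


random.seed(0)  # illustrative seed
n, c = 50_000, 1.0
# Y = X * U with independent X ~ lognormal(0, 1) and U ~ uniform on (0, 1].
ys = [random.lognormvariate(0.0, 1.0) * (1.0 - random.random()) for _ in range(n)]

# Largest deviation between the empirical transform of Y and the product
# of the transforms of X and U over a few frequencies.
max_gap = max(
    abs(empirical_mellin(ys, c, t) - mellin_lognormal(c, t) * mellin_uniform(c, t))
    for t in (0.0, 0.5, 1.0, 2.0)
)
print(max_gap)
```

For c = 1 the summands have modulus one, so the deviation is of the Monte Carlo order n^{-1/2}.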
Making use of the Fourier transform, the domain of definition of
the Mellin transform can be extended to L^2_{R+}(x^{2c−1}). To this
end, let ϕ : R → R+ with x ↦ exp(−2πx) and denote by ϕ^{−1} : R+ → R
its inverse. Note that the diffeomorphisms ϕ, ϕ^{−1} map Lebesgue null
sets to Lebesgue null sets. Consequently, the map
Φ_c : L^2_{R+}(x^{2c−1}) → L^2_R with h ↦ ϕ^c · (h ∘ ϕ) is a
well-defined isomorphism, and we denote by
Φ_c^{−1} : L^2_R → L^2_{R+}(x^{2c−1}) its inverse. For
h ∈ L^2_{R+}(x^{2c−1}) the Mellin transform M_c[h] : R → C developed
in c ∈ R is defined through
    \mathcal{M}_c[h](t) := (2\pi)\, \mathcal{F}[\Phi_c[h]](t) \quad \text{for any } t \in \mathbb{R}.
Here, F : L^2_R → L^2_R with
H ↦ (t ↦ F[H](t) := lim_{k→∞} ∫_{−k}^{k} exp(−2πitx)H(x)dx) denotes
the Plancherel-Fourier transform, where the limit is understood in
the sense of L^2_R convergence.
Due to this definition several properties of the Mellin transform
can be deduced from the well-known Fourier theory. In particular,
for any h ∈ L^1_{R+}(x^{c−1}) ∩ L^2_{R+}(x^{2c−1}) we have
    \mathcal{M}_c[h](t) = \int_0^\infty x^{c-1+it}\, h(x)\, dx \quad \text{for any } t \in \mathbb{R},    (2.4)
which coincides with the common definition of a Mellin transform
given in Paris and Kaminski [2001].
Example 2.1. Now let us give a few examples of Mellin transforms of
commonly considered distribution families.
(i) Beta Distribution admits a density g_b(x) := 1_{(0,1)}(x) b(1 − x)^{b−1}
for a b ∈ N and x ∈ R+. Then, we have
g_b ∈ L^2_{R+}(x^{2c−1}) ∩ L^1_{R+}(x^{c−1}) for any c > 0 and
    \mathcal{M}_c[g_b](t) = \prod_{j=1}^{b} \frac{j}{c-1+j+it}, \quad t \in \mathbb{R}.
(ii) Scaled Log-Gamma Distribution given by its density
g_{µ,a,λ}(x) = (exp(λµ)/Γ(a)) x^{−λ−1}(log(x) − µ)^{a−1} 1_{(e^µ,∞)}(x)
for a, λ, x ∈ R+ and µ ∈ R. Then, for c < λ + 1 we have
g_{µ,a,λ} ∈ L^2_{R+}(x^{2c−1}) ∩ L^1_{R+}(x^{c−1}) and
    \mathcal{M}_c[g_{\mu,a,\lambda}](t) = \exp(\mu(c-1+it))\,(\lambda-c+1-it)^{-a}, \quad t \in \mathbb{R}.
Note that g_{µ,1,λ} is the density of a Pareto distribution with
parameters e^µ and λ, and g_{0,a,λ} is the density of a Log-Gamma
distribution.
(iii) Gamma Distribution admits a density
g_d(x) = (x^{d−1}/Γ(d)) exp(−x) 1_{R+}(x) for d, x ∈ R+. Then, for
c > −d + 1 we have g_d ∈ L^2_{R+}(x^{2c−1}) ∩ L^1_{R+}(x^{c−1}) and
    \mathcal{M}_c[g_d](t) = \frac{\Gamma(c+d-1+it)}{\Gamma(d)}, \quad t \in \mathbb{R}.
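(iv) Weibull Distribution admits a density
g_m(x) = m x^{m−1} exp(−x^m) 1_{R+}(x) for m, x ∈ R+. For c > −m + 1,
M_c[g_m] is well-defined and
    \mathcal{M}_c[g_m](t) = \frac{c-1+it}{m}\, \Gamma\Big(\frac{c-1+it}{m}\Big), \quad t \in \mathbb{R}.
(v) Lognormal Distribution admits a density
g_{µ,λ}(x) = (√(2π) λ x)^{−1} exp(−(log(x) − µ)^2/(2λ^2)) 1_{R+}(x)
for λ, x ∈ R+ and µ ∈ R. M_c[g_{µ,λ}] is well-defined for any c ∈ R and
it holds
    \mathcal{M}_c[g_{\mu,\lambda}](t) = \exp(\mu(c-1+it)) \exp\Big(\frac{\lambda^2 (c-1+it)^2}{2}\Big), \quad t \in \mathbb{R}.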
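By construction, M_c : L^2_{R+}(x^{2c−1}) → L^2_R is an isomorphism
and we denote by M_c^{−1} : L^2_R → L^2_{R+}(x^{2c−1}) its inverse.
If H ∈ L^1_R ∩ L^2_R then the inverse Mellin transform is explicitly
expressed through
    \mathcal{M}_c^{-1}[H](x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} x^{-c-it}\, H(t)\, dt, \quad x \in \mathbb{R}_+.    (2.5)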
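Furthermore, a Plancherel-type equation holds for the Mellin
transform. Precisely, for all h1, h2 ∈ L^2_{R+}(x^{2c−1}) we have
    \langle h_1, h_2 \rangle_{x^{2c-1}} = (2\pi)^{-1} \langle \mathcal{M}_c[h_1], \mathcal{M}_c[h_2] \rangle_{L^2_{\mathbb{R}}}
    \quad \text{and} \quad
    \|h_1\|^2_{x^{2c-1}} = (2\pi)^{-1} \|\mathcal{M}_c[h_1]\|^2_{L^2_{\mathbb{R}}}.    (2.6)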
Linear functional In the following paragraph we introduce the
linear functional, motivate it through a collection of examples and
determine sufficient conditions to ensure that the considered
objects are well-defined. We then define an estimator based on the
empirical Mellin transform and the multiplication theorem for
Mellin transforms. Let c ∈ R and f ∈ L^2_{R+}(x^{2c−1}). In the sequel
we are interested in estimating the linear functional
    \vartheta(f) := \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi(-t)\, \mathcal{M}_c[f](t)\, dt    (2.7)
for a function Ψ : R → C with \overline{Ψ(t)} = Ψ(−t) for any t ∈ R
and such that ΨM_c[f] ∈ L^1_R. The latter is fulfilled if Ψ ∈ L^2_R.
Nevertheless, a more detailed analysis of the decay of M_c[f] and Ψ
allows us to ensure the integrability in a less restrictive
situation. Before we present an estimator for ϑ(f) let us briefly
illustrate our general approach by typical examples.
Illustration 2.2. We study in the sequel point-wise estimation at a
given point xo ∈ R+ in the following four examples.
(i) Density: Introducing the evaluation f(x_o) of f at the point x_o,
if M_c[f] ∈ L^1_R then we have
f(x_o) = M_c^{−1}[M_c[f]](x_o) = ϑ(f) with Ψ(t) := x_o^{−c+it}, t ∈ R,
satisfying \overline{Ψ(t)} = Ψ(−t) for all t ∈ R.
(ii) Cumulative distribution function: Considering the evaluation
F(x_o) = ∫_0^{x_o} f(x)dx of the c.d.f. F at the point x_o, define for
c < 1 the function ψ(x) := x^{1−2c} 1_{(0,x_o)}(x), x ∈ R+, which
belongs to L^1_{R+}(x^{c−1}) ∩ L^2_{R+}(x^{2c−1}). Setting
    \Psi(t) := \mathcal{M}_c[\psi](t) = \int_0^{x_o} x^{-c+it}\, dx = (1-c+it)^{-1} x_o^{1-c+it}
we get ϑ(f) = F(x_o) by an application of the Plancherel equality.
(iii) Survival function: Introducing the evaluation
S(x_o) = ∫_{x_o}^∞ f(x)dx of the survival function S at the point x_o,
define the function ψ(x) := x^{1−2c} 1_{(x_o,∞)}(x), x ∈ R+, which for
c > 1 belongs to L^1_{R+}(x^{c−1}) ∩ L^2_{R+}(x^{2c−1}). Setting
    \Psi(t) := \mathcal{M}_c[\psi](t) = \int_{x_o}^{\infty} x^{-c+it}\, dx = -(1-c+it)^{-1} x_o^{1-c+it}
we get ϑ(f) = S(x_o) by an application of the Plancherel equality.
(iv) Laplace transform: Given the evaluation
L(x_o) = ∫_0^∞ exp(−x_o x)f(x)dx of the Laplace transform L at the
point x_o, define for c < 1 the function
ψ(x) := x^{1−2c} exp(−x_o x), x ∈ R+, which belongs to
L^1_{R+}(x^{c−1}) ∩ L^2_{R+}(x^{2c−1}). Setting
    \Psi(t) := \mathcal{M}_c[\psi](t) = \int_0^\infty x^{-c+it} \exp(-x_o x)\, dx = x_o^{c-1-it}\, \Gamma(1-c+it)
we get ϑ(f) = L(x_o) by an application of the Plancherel equality.
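The representations in (i) and (ii) can be verified numerically for a density whose Mellin transform is known in closed form. The sketch below assumes a standard lognormal density f, for which M_c[f](t) = exp((c − 1 + it)^2/2), and approximates (2π)^{−1} ∫ Ψ(−t) M_c[f](t) dt by a trapezoidal rule; the helper names, the truncation at |t| ≤ 12 and the evaluation point x_o = 2 are illustrative assumptions.

```python
import cmath
import math


def mellin_lognormal(c, t, mu=0.0, lam=1.0):
    """Closed-form Mellin transform of the lognormal(mu, lam) density."""
    s = complex(c - 1.0, t)
    return cmath.exp(mu * s + 0.5 * lam ** 2 * s ** 2)


def psi_density(t, x0, c):
    """Psi(t) = x0^{-c+it} (point-wise density evaluation)."""
    return cmath.exp(complex(-c, t) * math.log(x0))


def psi_cdf(t, x0, c):
    """Psi(t) = x0^{1-c+it} / (1-c+it), valid for c < 1 (c.d.f. evaluation)."""
    return cmath.exp(complex(1.0 - c, t) * math.log(x0)) / complex(1.0 - c, t)


def theta(psi, mellin_f, t_max=12.0, steps=1200):
    """Trapezoidal approximation of (1/2pi) * int_{-t_max}^{t_max} Psi(-t) M_c[f](t) dt."""
    h = 2.0 * t_max / steps
    total = 0j
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(-t) * mellin_f(t)
    return (total * h / (2.0 * math.pi)).real


x0 = 2.0
true_density = math.exp(-0.5 * math.log(x0) ** 2) / (x0 * math.sqrt(2.0 * math.pi))
true_cdf = 0.5 * (1.0 + math.erf(math.log(x0) / math.sqrt(2.0)))

# The density is recovered for two different development points c.
density_c1 = theta(lambda t: psi_density(t, x0, 1.0), lambda t: mellin_lognormal(1.0, t))
density_c05 = theta(lambda t: psi_density(t, x0, 0.5), lambda t: mellin_lognormal(0.5, t))
# The c.d.f. representation requires c < 1.
cdf_c05 = theta(lambda t: psi_cdf(t, x0, 0.5), lambda t: mellin_lognormal(0.5, t))
print(density_c1, density_c05, cdf_c05)
```

Both choices of c return the same density value, in line with the observation that the quantity of interest does not depend on the development point.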
It is worth stressing that in all four examples introduced in
Illustration 2.2, the quantity of interest is independent of the
choice of the model parameter c ∈ R. However, the conditions on
c ∈ R given in Illustration 2.2 and the assumption
f ∈ L^2_{R+}(x^{2c−1}) ensure that the representation ϑ(f) is
well-defined, and hence are essential for our estimation strategy.
Consequently, we present the upcoming theory for almost arbitrary
choices of c ∈ R.
Remark 2.3. Consider Illustration 2.2. Since S = 1 − F there is an
elementary connection between the estimation of the survival
function and the estimation of the c.d.f. For example, we
deduce from a c.d.f. estimator F̂(x_o) a survival function estimator
Ŝ(x_o) through Ŝ(x_o) := 1 − F̂(x_o) with the same risk, that is
E_{fY}((Ŝ(x_o) − S(x_o))^2) = E_{fY}((F̂(x_o) − F(x_o))^2).
Thus we can define for any c ≠ 1 a survival function (respectively
c.d.f.) estimator using the results of (ii) and (iii) in
Illustration 2.2.
Estimation strategy To define an estimator of the quantity ϑ(f) we
make use of the multiplication theorem (2.3) as is common for
deconvolution problems. To do so, let
f ∈ L^2_{R+}(x^{2c−1}) ∩ L^1_{R+}(x^{c−1}) and g ∈ L^1_{R+}(x^{c−1});
then we deduce M_c[fY](t) = M_c[f](t)M_c[g](t) for all t ∈ R by
application of the multiplication theorem. Under the mild assumption
that M_c[g](t) ≠ 0 for all t ∈ R we conclude that
M_c[f](t) = M_c[fY](t)/M_c[g](t) for all t ∈ R and rewrite (2.7) into
    \vartheta(f) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi(-t)\, \frac{\mathcal{M}_c[f_Y](t)}{\mathcal{M}_c[g](t)}\, dt.    (2.8)
A naive approach is to replace in (2.8) the quantity M_c[fY] by its
empirical counterpart M̂_c(t) := n^{−1} Σ_{j=1}^{n} Y_j^{c−1+it}, t ∈ R.
However, the resulting integral is not well-defined, since
ΨM̂_c/M_c[g] is generally not integrable. We ensure integrability by
introducing an additional spectral cut-off regularisation, which
leads to the following estimator
    \hat\vartheta_k := \frac{1}{2\pi} \int_{-k}^{k} \Psi(-t)\, \frac{\hat{\mathcal{M}}_c(t)}{\mathcal{M}_c[g](t)}\, dt \quad \text{for any } k \in \mathbb{R}_+.    (2.9)
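The spectral cut-off estimator can be sketched in a few lines. The example below is an illustration under assumptions that are not taken from the paper: X is standard lognormal, U is uniform on (0, 1) so that M_c[g](t) = 1/(c + it), the functional is the c.d.f. at x_o = 1 with c = 0.75, and the integral over [−k, k] is approximated by a trapezoidal rule with step h. The sample size, the cut-off k = 4 and the seed are arbitrary.

```python
import cmath
import math
import random


def spectral_cutoff_estimate(sample, psi, mellin_g, c, k, h=0.05):
    """Trapezoidal approximation of
    (1/2pi) * int_{-k}^{k} Psi(-t) * Mhat_c(t) / M_c[g](t) dt,
    where Mhat_c(t) = n^{-1} sum_j Y_j^{c-1+it} is the empirical Mellin transform.
    """
    logs = [math.log(y) for y in sample]
    steps = int(round(2.0 * k / h))
    total = 0j
    for i in range(steps + 1):
        t = -k + i * h
        s = complex(c - 1.0, t)
        m_hat = sum(cmath.exp(s * ly) for ly in logs) / len(logs)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(-t) * m_hat / mellin_g(t)
    return (total * h / (2.0 * math.pi)).real


c, x0 = 0.75, 1.0


def psi_cdf(t):
    """Psi(t) = x0^{1-c+it} / (1-c+it) for the c.d.f. at x0 (requires c < 1)."""
    return cmath.exp(complex(1.0 - c, t) * math.log(x0)) / complex(1.0 - c, t)


def mellin_uniform(t):
    """Mellin transform of the uniform error density on (0, 1)."""
    return 1.0 / complex(c, t)


random.seed(1)  # illustrative seed
ys = [random.lognormvariate(0.0, 1.0) * (1.0 - random.random()) for _ in range(2000)]

estimate = spectral_cutoff_estimate(ys, psi_cdf, mellin_uniform, c, k=4.0)
print(estimate)  # the target is F(1) = 0.5 for the standard lognormal
```

The cut-off k plays the role of the tuning parameter: larger values reduce the bias but inflate the variance through the division by M_c[g](t).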
The following proposition shows that the estimator is consistent
for a suitable choice of the cut-off parameter k ∈ R+. We denote by
E^n_{fY} the expectation corresponding to the distribution P^n_{fY} of
(Y_1, ..., Y_n) and use the abbreviation E_{fY} := E^1_{fY}.
Analogously, we define E^n_f and E_f.
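Proposition 2.4. For c ∈ R assume that f ∈ L^2_{R+}(x^{2c−1}),
ΨM_c[f] ∈ L^1_R and E_{fY}(Y_1^{2(c−1)}) < ∞. Then for any k ∈ R+,
    E^n_{f_Y}((\hat\vartheta_k - \vartheta(f))^2) \le \|1_{[k,\infty)} \Psi \mathcal{M}_c[f]\|^2_{L^1_{\mathbb{R}}} + E_{f_Y}(Y_1^{2(c-1)})\, \|1_{[-k,k]} \Psi / \mathcal{M}_c[g]\|^2_{L^1_{\mathbb{R}}}\, n^{-1}.    (2.10)
If additionally ‖g‖_{∞,x^{2c−1}} < ∞, then
    E^n_{f_Y}((\hat\vartheta_k - \vartheta(f))^2) \le \|1_{[k,\infty)} \Psi \mathcal{M}_c[f]\|^2_{L^1_{\mathbb{R}}} + \|g\|_{\infty,x^{2c-1}}\, \sigma\, \Delta_{\Psi,g}(k)\, n^{-1},    (2.11)
where σ := E_f(X_1^{2(c−1)}) and
    \Delta_{\Psi,g}(k) := \frac{1}{2\pi} \int_{-k}^{k} \Big| \frac{\Psi(t)}{\mathcal{M}_c[g](t)} \Big|^2 dt.
Choosing now a sequence of spectral cut-off parameters (k_n)_{n∈N}
such that k_n → ∞ and ‖1_{[−k_n,k_n]}Ψ/M_c[g]‖^2_{L^1(R)} n^{−1} → 0
(respectively Δ_{Ψ,g}(k_n)n^{−1} → 0) implies that ϑ̂_{k_n} is a
consistent estimator of ϑ(f), that is
E^n_{fY}((ϑ̂_{k_n} − ϑ(f))^2) → 0 for n → ∞. We note that the
additional assumption ‖g‖_{∞,x^{2c−1}} < ∞ is fulfilled by many error
densities and is thus rather weak.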
Remark 2.5. Although the first bound (2.10) only requires a finite
second moment of Y_1^{c−1}, we have in many cases
Δ_{Ψ,g}(k) ‖1_{[−k,k]}Ψ/M_c[g]‖^{−2}_{L^1_R} → 0 for k → ∞, implying
that the bound of the variance term in (2.11) increases more slowly
in k than the bound presented in (2.10). It is worth stressing that
there exist cases where the opposite effect occurs. For instance,
let the error U be lognormally distributed with parameters µ = 0,
λ = 1, see Example 2.1. Then
    \sup_{y \in \mathbb{R}_+} y^{2c-1} g(y) = \sup_{z \in \mathbb{R}} \frac{\exp(2(c-1)z)}{\sqrt{2\pi}\,\lambda} \exp(-(z-\mu)^2/(2\lambda^2)) < \infty.
Thus if E_f(X_1^{2(c−1)}) < ∞ both bounds are finite, and following
the argumentation of Butucea and Tsybakov [2008] one can see that,
in the special case of point-wise density estimation, the
inequality presented in (2.10) is more favourable than the
inequality presented in (2.11).
For the upcoming theory, we will focus on the second bound of
Proposition 2.4. Assuming that ‖g‖_{∞,x^{2c−1}} < ∞ allows us to state
that the growth of the second summand, also referred to as the
variance term, is determined by the growth of Δ_{Ψ,g}(k) as k goes to
infinity.
The parametric case In this paragraph we determine when Proposition
2.4 implies a parametric rate of the estimator. To be precise,
only two scenarios can occur.
(P) If sup_{k∈R+} Δ_{Ψ,g}(k) = ‖ΨM_c[g]^{−1}‖^2_{L^2(R)} < ∞, then the
second summand in (2.11) is uniformly bounded in k and hence of order
n^{−1}. In this case, for all sufficiently large values of k ∈ R+ the
bias term is negligible with respect to the parametric rate n^{−1}.
(NP) If sup_{k∈R+} Δ_{Ψ,g}(k) = ‖ΨM_c[g]^{−1}‖^2_{L^2(R)} = ∞, then the
second summand is unbounded in k, which necessitates an optimal
choice of the parameter k ∈ R+ realising a squared-bias-variance
trade-off.
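The dichotomy between (P) and (NP) can be made tangible with a small computation. The sketch below assumes the idealised decays |Ψ(t)| = (1 + t^2)^{−p/2} and |M_c[g](t)| = (1 + t^2)^{−γ/2} and the normalisation Δ(k) = (2π)^{−1} ∫_{−k}^{k} |Ψ(t)/M_c[g](t)|^2 dt; both are assumptions made for illustration, as are the function name and the parameter values.

```python
import math


def delta(k, p, gamma, steps=20_000):
    """Midpoint-rule approximation of (1/2pi) * int_{-k}^{k} (1 + t^2)^(gamma - p) dt."""
    h = 2.0 * k / steps
    total = sum((1.0 + (-k + (i + 0.5) * h) ** 2) ** (gamma - p) for i in range(steps))
    return total * h / (2.0 * math.pi)


# 2p - 2*gamma = 2 > 1: Delta(k) = arctan(k) / pi stays below 1/2 (scenario (P)).
bounded = [delta(k, p=2.0, gamma=1.0) for k in (10.0, 50.0, 250.0)]
# 2p - 2*gamma = 0 <= 1: Delta(k) = k / pi grows without bound (scenario (NP)).
unbounded = [delta(k, p=1.0, gamma=1.0) for k in (10.0, 50.0, 250.0)]
print(bounded, unbounded)
```

In the first case any sufficiently large cut-off works; in the second the cut-off has to be balanced against the bias.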
Our aim is now to characterise when the case (P) occurs. To do so,
we start by introducing a typical characterisation of the decay of
the error density and the decay of the function Ψ, starting with
the error density. Let us first revisit Example 2.1 to analyse the
decay of the presented densities.
Example 2.6 (Example 2.1 continued).
(i) Beta Distribution: For c > 0 and b ∈ N we have
M_c[g_b](t) = ∏_{j=1}^{b} j/(c − 1 + j + it) and thus
    c_{g,c}(1+t^2)^{-b/2} \le |\mathcal{M}_c[g_b](t)| \le C_{g,c}(1+t^2)^{-b/2}, \quad t \in \mathbb{R},
where c_{g,c}, C_{g,c} > 0 are positive constants only depending on g
and c.
(ii) Scaled Log-Gamma Distribution: For λ, a ∈ R+, µ ∈ R and
c < λ + 1 we have
M_c[g_{µ,a,λ}](t) = exp(µ(c − 1 + it))(λ − c + 1 − it)^{−a} for t ∈ R
and thus
    c_{g,c}(1+t^2)^{-a/2} \le |\mathcal{M}_c[g_{\mu,a,\lambda}](t)| \le C_{g,c}(1+t^2)^{-a/2}, \quad t \in \mathbb{R},
where c_{g,c}, C_{g,c} > 0 are positive constants only depending on g
and c.
(iii) Gamma Distribution: For d ∈ R+ and c > −d + 1 we have
M_c[g_d](t) = Γ(c + d − 1 + it)/Γ(d) for t ∈ R and thus
    c_{g,c}(1+t^2)^{(c+d-1.5)/2} \exp(-|t|\pi/2) \le |\mathcal{M}_c[g_d](t)| \le C_{g,c}(1+t^2)^{(c+d-1.5)/2} \exp(-|t|\pi/2)
for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants only
depending on g and c.
(iv) Weibull Distribution: For m ∈ R+ and c > −m + 1 we have
M_c[g_m](t) = ((c − 1 + it)/m) Γ((c − 1 + it)/m) for t ∈ R and thus
    c_{g,c}(1+t^2)^{\frac{2c-2-m}{2m}} \exp\Big(-\frac{|t|\pi}{2m}\Big) \le |\mathcal{M}_c[g_m](t)| \le C_{g,c}(1+t^2)^{\frac{2c-2-m}{2m}} \exp\Big(-\frac{|t|\pi}{2m}\Big)
for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants only
depending on g and c.
(v) Lognormal Distribution: For λ ∈ R+, µ ∈ R and c ∈ R we have
M_c[g_{µ,λ}](t) = exp(µ(c − 1 + it)) exp(λ^2(c − 1 + it)^2/2) for t ∈ R
and thus
    c_{g,c} \exp(-\lambda^2 t^2/2) \le |\mathcal{M}_c[g_{\mu,\lambda}](t)| \le C_{g,c} \exp(-\lambda^2 t^2/2)
for t ∈ R, where c_{g,c}, C_{g,c} > 0 are positive constants only
depending on g and c.
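The polynomial decay in case (i) can be inspected directly from the product formula. The following sketch evaluates |M_c[g_b](t)| (1 + t^2)^{b/2} on a grid for b = 2 and c = 1, where the ratio equals 2√(1 + t^2)/√(4 + t^2) and therefore stays between 1 and 2; the grid and the function names are illustrative assumptions.

```python
import math


def abs_mellin_beta(b, c, t):
    """|M_c[g_b](t)| = prod_{j=1}^{b} j / |c - 1 + j + it| for b in N, c > 0."""
    value = 1.0
    for j in range(1, b + 1):
        value *= j / math.hypot(c - 1.0 + j, t)
    return value


def normalised_decay(b, c, t):
    """Ratio |M_c[g_b](t)| * (1 + t^2)^(b/2), bounded away from 0 and infinity."""
    return abs_mellin_beta(b, c, t) * (1.0 + t * t) ** (b / 2.0)


# Grid t = 0, 0.5, ..., 200 for b = 2 and c = 1.
ratios = [normalised_decay(2, 1.0, 0.5 * i) for i in range(401)]
print(min(ratios), max(ratios))
```

The smallest and largest ratio on the grid play the role of the constants c_{g,c} and C_{g,c} for this particular density.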
Motivated by Example 2.1 we distinguish between smooth and super
smooth error densities, staying in the terminology of Fan
[1991], Belomestny and Goldenshluger [2020] or Brenner Miguel et
al. [2021]. An error density g is called smooth if there exist
γ, c_{g,c}, C_{g,c} > 0 such that
    c_{g,c}(1+t^2)^{-\gamma/2} \le |\mathcal{M}_c[g](t)| \le C_{g,c}(1+t^2)^{-\gamma/2}, \quad t \in \mathbb{R},    ([G1])
and it is referred to as super smooth if there exist
λ, ρ, c_{g,c}, C_{g,c} > 0 and γ ∈ R such that
    c_{g,c}(1+t^2)^{-\gamma/2} \exp(-\lambda|t|^{\rho}) \le |\mathcal{M}_c[g](t)| \le C_{g,c}(1+t^2)^{-\gamma/2} \exp(-\lambda|t|^{\rho}), \quad t \in \mathbb{R}.    ([G2])
On the other hand, to calculate the growth of Δ_{Ψ,g} we specify the
decay of Ψ. Similar to the error density g we consider the case of
a smooth Ψ, i.e. there exist c_{Ψ,c}, C_{Ψ,c} > 0 and p > 0 such that
    c_{\Psi,c}(1+t^2)^{-p/2} \le |\Psi(t)| \le C_{\Psi,c}(1+t^2)^{-p/2}, \quad t \in \mathbb{R},    ([Ψ1])
and a super smooth Ψ, i.e. there exist µ, R, c_{Ψ,c}, C_{Ψ,c} > 0 and
p ∈ R such that
    c_{\Psi,c}(1+t^2)^{-p/2} \exp(-\mu|t|^{R}) \le |\Psi(t)| \le C_{\Psi,c}(1+t^2)^{-p/2} \exp(-\mu|t|^{R}), \quad t \in \mathbb{R}.    ([Ψ2])
As we see in the following Illustration the examples of Ψ
considered in Illustration 2.2 do fit into these two cases.
Illustration 2.7 (Illustration 2.2 continued).
(i) Point-wise density estimation: We have |Ψ(t)| = x_o^{−c} and
thus p = 0 in the sense of [Ψ1].
(ii) Point-wise cumulative distribution function estimation: We
have |Ψ(t)| = x_o^{1−c}/√((1 − c)^2 + t^2) and thus p = 1 in the sense
of [Ψ1].
(iii) Point-wise survival function estimation: We have
|Ψ(t)| = x_o^{1−c}/√((1 − c)^2 + t^2) and thus p = 1 in the sense of
[Ψ1].
(iv) Laplace transform estimation: We have
|Ψ(t)| = x_o^{c−1}|Γ(1 − c + it)| and thus p = 1 − 2c, µ = π/2 and
R = 1 in the sense of [Ψ2].
After the introduction of the typical terminology for deconvolution
settings we can state when the function Δ_{Ψ,g} is bounded. We
summarize the collection of scenarios in the following
Proposition.
Proposition 2.8. Assume that for a c ∈ R we have
f ∈ L^2_{R+}(x^{2c−1}), ΨM_c[f] ∈ L^1_R, σ = E_f(X_1^{2(c−1)}) < ∞ and
‖g‖_{∞,x^{2c−1}} < ∞. Then for the cases
(i) [Ψ1] and [G1] with 2p − 2γ > 1;
(ii) [Ψ2] and [G1] or
(iii) [Ψ2] and [G2] with (R > ρ), (R = ρ, µ > λ) or
(R = ρ, µ = λ, 2p − 2γ > 1)
we get that sup_{k∈R+} Δ_{Ψ,g}(k) < ∞. Furthermore, for all k ∈ R+
sufficiently large we have
    E^n_{f_Y}((\hat\vartheta_k - \vartheta(f))^2) \le \frac{C(\Psi, g, \sigma)}{n}.
The proof of Proposition 2.8 is a straightforward calculation and
thus omitted. For our four examples of Ψ we get a parametric rate
for the estimation of the survival function and cumulative
distribution function if the error density fulfils [G1] with
γ < 1/2, and a parametric rate for the estimation of the Laplace
transform if the error density fulfils [G1] with γ > 0 or if g
fulfils [G2] with (ρ < 1), (ρ = 1, λ < π/2), (ρ = 1 or λ = π/2,
γ < −c).
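The case distinction of Proposition 2.8 can be written down as a small decision rule. The sketch below is one possible encoding: the class `Decay` and the function `is_parametric` are hypothetical names, a decay is described by a polynomial order together with an optional exponential rate and power, and the combination of a smooth Ψ with a super smooth g, which is not among the listed cases, is treated as non-parametric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decay:
    """Decay (1 + t^2)^(-poly / 2) * exp(-rate * |t|^power); rate = 0 means smooth."""
    poly: float
    rate: float = 0.0
    power: float = 0.0

    @property
    def supersmooth(self):
        return self.rate > 0.0


def is_parametric(psi, g):
    """Return True if (psi, g) falls into one of the cases (i)-(iii) of Proposition 2.8."""
    p, gamma = psi.poly, g.poly
    if not psi.supersmooth:
        # Case (i): [Psi1] and [G1] with 2p - 2*gamma > 1.
        return (not g.supersmooth) and 2 * p - 2 * gamma > 1
    if not g.supersmooth:
        return True  # case (ii): [Psi2] and [G1]
    # Case (iii): compare the exponential parts first, then the polynomial parts.
    if psi.power != g.power:
        return psi.power > g.power
    if psi.rate != g.rate:
        return psi.rate > g.rate
    return 2 * p - 2 * gamma > 1


cdf_psi = Decay(poly=1.0)  # p = 1 for the c.d.f. and the survival function
print(is_parametric(cdf_psi, Decay(poly=0.25)), is_parametric(cdf_psi, Decay(poly=1.0)))
```

The comparison of floating point parameters by equality is adequate for a sketch; an implementation working with estimated decay parameters would need a tolerance.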
The non-parametric case We now focus on the case (NP), that is
sup_{k∈R+} Δ_{Ψ,g}(k) = ∞, which occurs in several situations. In this
scenario the first summand of the bound in Proposition 2.4 is
decreasing in k while the second summand is increasing and
unbounded. A choice of the parameter k ∈ R+ realising an optimal
trade-off is thus non-trivial. We therefore define a data-driven
procedure for the choice of the parameter k ∈ R+ inspired by the
work of Goldenshluger and Lepski [2011]. In fact, let us reduce the
set of possible parameters to
    \mathcal{K}_n := \{ k \in \mathbb{N} : \|g\|_{\infty,x^{2c-1}}\, \Delta_{\Psi,g}(k) \le n,\ k \le n^{1/2}(\log n)^{-2} \}
and denote K_n := max 𝒦_n. We further introduce the variance term up
to a (log n)-term
    V(k) := \chi\, \|g\|_{\infty,x^{2c-1}}\, \sigma\, \Delta_{\Psi,g}(k)\, (\log n)\, n^{-1},
where χ > 0 is a numerical constant which is specified below and
σ := E_f(X_1^{2(c−1)}). Based on a comparison of the estimators
constructed above an estimator of the bias term is given by
    A(k) := \sup_{k' \in ⟧k, K_n⟧} ((\hat\vartheta_{k'} - \hat\vartheta_k)^2 - V(k'))_+,
where ⟧a, b⟧ := (a, b] ∩ N for a, b ∈ R+. Analogously, we define
⟦a, b⟧ := [a, b] ∩ N and ⟦a, b⟦ := [a, b) ∩ N. Since the term σ in
V(k) depends on the unknown density f, and hence it
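is itself unknown, we replace it by the plug-in estimator
σ̂ := n^{−1} Σ_{j=1}^{n} Y_j^{2(c−1)} / E_g(U_1^{2(c−1)}). Hence
we estimate V(k) and A(k) by
    \hat V(k) := 2\chi\, \|g\|_{\infty,x^{2c-1}}\, \hat\sigma\, \Delta_{\Psi,g}(k)\, \log(n)\, n^{-1}
    \quad \text{and} \quad
    \hat A(k) := \sup_{k' \in ⟧k, K_n⟧} ((\hat\vartheta_{k'} - \hat\vartheta_k)^2 - \hat V(k'))_+.
Below we study the fully data-driven estimator ϑ̂_{k̂} of ϑ(f) with
    \hat k := \arg\min_{k \in \mathcal{K}_n} (\hat A(k) + \hat V(k)).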
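The selection rule can be sketched as a function of the candidate estimates and their penalties. The code below assumes that the estimates and the penalty values have already been computed for an increasing list of candidate cut-offs; the function name and the toy numbers are hypothetical and only serve to show the mechanics of the comparison.

```python
def select_cutoff(estimates, penalties):
    """Return the index minimising A_hat(k) + V_hat(k).

    estimates[i] and penalties[i] belong to the i-th candidate cut-off, with
    candidates sorted in increasing order. A_hat compares the i-th estimate
    with all estimates for larger cut-offs, each corrected by its own penalty.
    """
    best_index, best_value = None, float("inf")
    m = len(estimates)
    for i in range(m):
        a_hat = max(
            (max((estimates[j] - estimates[i]) ** 2 - penalties[j], 0.0)
             for j in range(i + 1, m)),
            default=0.0,  # the supremum over an empty set is taken as zero
        )
        criterion = a_hat + penalties[i]
        if criterion < best_value:
            best_index, best_value = i, criterion
    return best_index


# Toy input: the estimates stabilise from the second candidate on,
# while the penalty keeps growing with the cut-off.
toy_estimates = [0.0, 0.8, 1.0, 1.05]
toy_penalties = [0.01, 0.02, 0.05, 0.2]
chosen = select_cutoff(toy_estimates, toy_penalties)
print(chosen)
```

In the toy input the first candidate is rejected because later estimates differ from it by more than their penalties, and the second candidate wins because its own penalty is the smallest among the remaining ones.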
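Theorem 2.9. For c ∈ R assume that f ∈ L^2_{R+}(x^{2c−1}),
ΨM_c[f] ∈ L^1_R, E_{fY}(Y_1^{8(c−1)}) < ∞ and ‖g‖_{∞,x^{2c−1}} < ∞.
Then for χ > 72 we have
    E_{f_Y}((\vartheta(f) - \hat\vartheta_{\hat k})^2) \le C_1 \inf_{k \in \mathcal{K}_n} \big( \|1_{[k,\infty)} \Psi \mathcal{M}_c[f]\|^2_{L^1_{\mathbb{R}}} + V(k) \big) + \frac{C_2}{n},
where C_1 is a positive numerical constant and C_2 is a positive
constant depending on Ψ, g and E_{fY}(Y_1^{8(c−1)}).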
The proof of Theorem 2.9 is postponed to the appendix. Let us
briefly comment on the moment assumptions of Theorem 2.9. For c ∈ R
close to one, the apparently strong moment assumption
E_{fY}(Y_1^{8(c−1)}) < ∞ is rather weak. For the point-wise density
estimation, compare Illustration 2.2, this assumption is always true
if c = 1. For the point-wise survival function estimation
(respectively cumulative distribution function estimation), c = 1
cannot be chosen, but arbitrary values of c ∈ R close to one are
possible. As already mentioned, for the point-wise density
estimation the assumption ΨM_1[f] ∈ L^1_R implies that
M_1[f] ∈ L^1_R. For c = 1, we see that ‖g‖_{∞,x} < ∞ is fulfilled for
many examples of error densities.
3 Minimax theory
In the following section we develop the minimax theory for the
plug-in spectral cut-off estimator under the assumptions [G1] and
[Ψ1]. Over the Mellin-Sobolev spaces we derive an upper bound for
all linear functionals satisfying assumption [Ψ1]. We state a lower
bound for each of the cases (i)-(iii) of Illustration 2.2
separately, that is point-wise estimation of the density f, the
survival function S and the cumulative distribution function F. We
finish this section by motivating the regularity spaces through
their analytical implications.
Upper bound Let us restrict to the scenario where [G1] and [Ψ1]
hold for 2p − 2γ ≤ 1. Here one can state that there exists a
constant C_{Ψ,g} > 0 such that Δ_{Ψ,g}(k) ≤ C_{Ψ,g} k^{2γ−2p+1}.
Now let us consider the bias term. To do so, we introduce the
Mellin-Sobolev spaces at the development point c ∈ R by
    W^s_c(\mathbb{R}_+) := \{ h \in L^2_{\mathbb{R}_+}(x^{2c-1}) : |h|^2_{s,c} := \|(1+t^2)^{s/2} \mathcal{M}_c[h]\|^2_{L^2_{\mathbb{R}}} < \infty \}    (3.1)
with corresponding ellipsoids
W^s_c(L) := {h ∈ W^s_c(R+) : |h|^2_{s,c} < L}. We denote the subset of
densities by
    D^{s,c,L}_{\mathbb{R}_+} := \{ f \in W^s_c(L) : f \text{ is a density},\ E_f(X_1^{2c-2}) \le L \}.    (3.2)
Using this construction we get the following result as a direct
consequence.
Theorem 3.1. Assume that for a c ∈ R [G1] holds for g and [Ψ1] for
Ψ. Additionally, assume that ‖g‖_{∞,x^{2c−1}} < ∞. Setting for any
s > 1/2 − p the cut-off parameter to k_n := n^{1/(2s+2γ)} then implies
    E_{f_Y}((\hat\vartheta_{k_n} - \vartheta(f))^2) \le C_{L,s,\Psi,g,c}\, n^{-(2s+2p-1)/(2s+2\gamma)},
where C_{L,s,Ψ,g,c} > 0 is a constant depending on L, s, p, Ψ, γ, c
and ‖g‖_{∞,x^{2c−1}}.
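Proof of Theorem 3.1. Evaluating the upper bound in Proposition 2.4
under [G1] and [Ψ1] we have for the variance term
    \|g\|_{\infty,x^{2c-1}}\, \sigma\, \Delta_{\Psi,g}(k)\, n^{-1} \le C_{\Psi,g,L}\, k^{2\gamma-2p+1}\, n^{-1}
and for the bias term, by the Cauchy-Schwarz inequality,
    \|1_{[k,\infty)} \Psi \mathcal{M}_c[f]\|^2_{L^1_{\mathbb{R}}} \le C_{L,c} \int_k^\infty |\Psi(t)|^2 (c^2+t^2)^{-s}\, dt \le C_{L,c,s,\Psi}\, k^{-2s-2p+1}.
Now choosing k_n := n^{1/(2s+2γ)} balances both terms, leading to the
rate n^{−(2s+2p−1)/(2s+2γ)}.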
The assumption s > 1/2 − p implies that ΨM_c[f] ∈ L^1_R by a
simple calculation which can be found in the proof of Theorem 3.1 in
the appendix. Before considering the lower bounds let us illustrate
the last Theorem using our examples (i) to (iii) of Illustration 2.2.
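The balancing step behind the choice of k_n can be reproduced with elementary arithmetic. The sketch below takes the orders k^{−2s−2p+1} for the squared bias and k^{2γ−2p+1}/n for the variance, ignores all constants, and returns the cut-off and the resulting rate; the function name and the numerical inputs are illustrative.

```python
def cutoff_and_rate(n, s, p, gamma):
    """Cut-off k_n = n^{1/(2s+2*gamma)} and rate n^{-(2s+2p-1)/(2s+2*gamma)}."""
    k_n = n ** (1.0 / (2.0 * s + 2.0 * gamma))
    exponent = (2.0 * s + 2.0 * p - 1.0) / (2.0 * s + 2.0 * gamma)
    return k_n, n ** (-exponent)


n, s, gamma = 10 ** 4, 1.0, 1.0

# Point-wise density estimation (p = 0): both orders coincide at k_n.
k_n, rate_density = cutoff_and_rate(n, s, 0.0, gamma)
squared_bias_order = k_n ** (-2.0 * s - 2.0 * 0.0 + 1.0)
variance_order = k_n ** (2.0 * gamma - 2.0 * 0.0 + 1.0) / n

# c.d.f. or survival function estimation (p = 1) gives a faster rate.
_, rate_cdf = cutoff_and_rate(n, s, 1.0, gamma)
print(k_n, rate_density, rate_cdf)
```

With s = γ = 1 the exponent is 1/4 for the density and 3/4 for the c.d.f., which shows how the smoothness p of the functional improves the rate.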
Illustration 3.2.
(i) Point-wise density estimation: Since p = 0 we assume that
s > 1/2 = 1/2 − p. In this scenario Theorem 3.1 implies
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E_{f_Y}((\hat f_{k_n}(x_o) - f(x_o))^2) \le C_{L,s,g,c}\, x_o^{-2c}\, n^{-(2s-1)/(2s+2\gamma)}.
(ii) Point-wise cumulative distribution function estimation: We
have p = 1 and hence s > 1/2 − p holds for any s > 0. Recall
that for γ < 1/2 we are in the parametric case where we choose
k ∈ R+ sufficiently large. For γ > 1/2 we deduce from Theorem 3.1
for any c < 1 that
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E_{f_Y}((\hat F_{k_n}(x_o) - F(x_o))^2) \le C_{L,s,g,c}\, x_o^{2-2c}\, n^{-(2s+1)/(2s+2\gamma)}.
(iii) Point-wise survival function estimation: We have p = 1 and
hence s > 1/2 − p holds for any s > 0. Recall that for γ < 1/2 we
are in the parametric case where we choose k ∈ R+ sufficiently
large. For γ > 1/2 we deduce from Theorem 3.1 for any c > 1 that
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E_{f_Y}((\hat S_{k_n}(x_o) - S(x_o))^2) \le C_{L,s,g,c}\, x_o^{2-2c}\, n^{-(2s+1)/(2s+2\gamma)}.
In example (i) the sign of c has a strong impact on the upper
bound. In fact, for c > 0 it appears that the estimation at a
point x_o close to 0 is harder than for bigger values of x_o. The
case c < 0 has the opposite effect. Further, in (ii) and (iii),
i.e. estimating the c.d.f. and the survival function, the
estimator of the c.d.f. seems to have a better behaviour close to 0
than the survival function estimator. We stress that in Remark 2.3
we already mention that one can use an estimator for the survival
function to construct an estimator for the c.d.f. and vice versa.
The results of Illustration 3.2 suggest estimating the survival
function directly or via the c.d.f. estimator, depending on
whether x_o ∈ R+ is close to 0 or not.
Remark 3.3. Belomestny and Goldenshluger [2020] derive for
point-wise density estimation a rate of n^{−2s/(2s+2γ+1)} under
similar assumptions on the error density g. However, they
consider Hölder-type regularity classes rather than Mellin-Sobolev
spaces, which are of a global nature. Even if the rates in
Illustration 3.2 seem to be less sharp compared to Belomestny and
Goldenshluger [2020], they cannot be improved, as shown by the lower
bounds below.
Additionally, if γ > 1 we have that kn := n1/(2s+2γ) 6 k1/2 and
thus k_n ∈ 𝒦_n. We can deduce the following Corollary by applying
arguments similar to those in the proof of Theorem 3.1 to Theorem
2.9. We therefore omit its proof.
Corollary 3.4. Assume that for a c ∈ R [G1] holds for g and
[Ψ1] for Ψ. Further let E_{fY}(Y_1^{8(c−1)}) < ∞,
‖g‖_{∞,x^{2c−1}} < ∞ and f ∈ D^{s,c,L}_{R+} for any s > 1/2 − p. Then
    E_{f_Y}((\hat\vartheta_{\hat k} - \vartheta(f))^2) \le C_{f,g,\Psi}\, \log(n)\, n^{-(2s+2p-1)/(2s+2\gamma)},
where C_{f,g,Ψ} > 0 is a constant depending on L, s, p, Ψ, γ, c,
E_{fY}(Y_1^{8(c−1)}) and ‖g‖_{∞,x^{2c−1}}.
To state that the presented rates of Theorem 3.1 cannot be improved
over the whole Mellin-Sobolev ellipsoids, we give a lower bound
result for the cases (i)-(iii) in the following section.
Lower bound For the following part, we will need an additional
assumption on the error density g. In fact, we will assume that g
has bounded support, that is g(x) = 0 for x > d, d ∈ R+. For the
sake of simplicity we assume that d = 1. Further we assume that
there exist c′_g, C′_g ∈ R+ such that
    c'_g (1+t^2)^{-\gamma/2} \le |\mathcal{M}_{1/2}[g](t)| \le C'_g (1+t^2)^{-\gamma/2} \quad \text{for } |t| \to \infty.    ([G1’])
For technical reasons we will restrict ourselves to the case of c
> 1/2.
Theorem 3.5. Let s, γ ∈ N and assume that [G1] and [G1’] hold. Then
there exist constants C_{g,x_o,i}, L_{s,g,x_o,c,i} > 0, i ∈ ⟦3⟧, such
that
(i) Point-wise density estimation: for all L > L_{s,g,x_o,c,1}, n ∈ N
and for any estimator f̂(x_o) of f(x_o) based on an i.i.d. sample
(Y_j)_{j∈⟦1,n⟧},
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E^n_{f_Y}((\hat f(x_o) - f(x_o))^2) \ge C_{g,x_o,1}\, n^{-(2s-1)/(2s+2\gamma)}.
(ii) Point-wise survival function estimation: for all
L > L_{s,g,x_o,c,2}, n ∈ N and for any estimator Ŝ(x_o) of S(x_o)
based on an i.i.d. sample (Y_j)_{j∈⟦1,n⟧},
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E^n_{f_Y}((\hat S(x_o) - S(x_o))^2) \ge C_{g,x_o,2}\, n^{-(2s+1)/(2s+2\gamma)}.
(iii) Point-wise cumulative distribution function estimation: for
all L > L_{s,g,x_o,c,3}, n ∈ N and for any estimator F̂(x_o) of
F(x_o) based on an i.i.d. sample (Y_j)_{j∈⟦1,n⟧},
    \sup_{f \in D^{s,c,L}_{\mathbb{R}_+}} E^n_{f_Y}((\hat F(x_o) - F(x_o))^2) \ge C_{g,x_o,3}\, n^{-(2s+1)/(2s+2\gamma)}.
We stress that in the multiplicative censoring model, the family
(g_k)_{k∈N} of Beta(1,k) densities fulfils both assumptions [G1] and
[G1’].
Regularity assumptions While in the theory of inverse problems the
definition of the Mellin-Sobolev spaces is quite natural, we want
to stress that elements of these spaces can be characterised
by their analytical properties. In Brenner Miguel et al. [2021] one
can find a characterisation of W^s_1(R+). Since the generalisation
to the spaces W^s_c(R+) is straightforward, we only state the result,
while the proof for the case c = 1 can be found in Brenner Miguel et
al. [2021].
Proposition 3.6. Let s ∈ N. Then f ∈ W^s_c(R+) if and only if f is
(s − 1)-times continuously differentiable, where f^{(s−1)} is locally
absolutely continuous with derivative f^{(s)} and
ω^j f^{(j)} ∈ L^2_{R+}(ω^{2c−1}) for all j ∈ ⟦0, s⟧.
Appendix
A Proofs of section 2
Usefull inequality The next inequality was is state in the
following form in Comte [2017] based on a similar formulation in
Birgé and Massart [1998].
Lemma A.1. (Bernstein inequality) Let X1, . . . , Xn be independent
random variables and $T_n(X) := \sum_{i=1}^n (X_i - \mathbb{E}(X_i))$. Then for η > 0,
$\mathbb{P}(|T_n(X) - \mathbb{E}(T_n(X))| \geq n\eta) \leq 2\exp\Big(-\frac{n\eta^2/2}{v^2 + b\eta}\Big),$
where $\frac{1}{n}\sum_{i=1}^n \mathbb{E}(|X_i^m|) \leq \frac{m!}{2} v^2 b^{m-2}$ for all m ≥ 2. If the Xi are identically distributed,
the
previous condition can be replaced by Var(X1) ≤ v² and |X1| ≤
b.
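As a sanity check on the shape of such a bound, here is a minimal sketch, assuming the classical Bernstein form $2\exp(-n\eta^2/(2(v^2+b\eta)))$ for i.i.d. centred variables with Var(X1) ≤ v² and |X1| ≤ b. The uniform distribution on [−1, 1], the sample sizes and the function names are illustrative assumptions, not part of the paper.

```python
import math
import random


def bernstein_bound(n, eta, v2, b):
    """Classical Bernstein bound 2 * exp(-n * eta^2 / (2 * (v2 + b * eta)))."""
    return 2.0 * math.exp(-n * eta ** 2 / (2.0 * (v2 + b * eta)))


def empirical_tail(n, eta, repetitions=2000, seed=0):
    """Monte Carlo estimate of P(|T_n| > n * eta) for X_i uniform on [-1, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # The X_i are already centred, so T_n is simply their sum.
        t_n = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if abs(t_n) > n * eta:
            hits += 1
    return hits / repetitions


if __name__ == "__main__":
    n, eta = 200, 0.1
    # Uniform on [-1, 1]: variance 1/3 and absolute value bounded by 1.
    print("bound    :", bernstein_bound(n, eta, 1.0 / 3.0, 1.0))
    print("empirical:", empirical_tail(n, eta))
```

The empirical frequency is typically far below the bound, which is expected: the inequality is non-asymptotic and holds for every n and η, at the price of not being sharp.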
Proof of Proposition 2.4. Let us denote for any k ∈ R+ the
expectation $\vartheta_k := \mathbb{E}^n_{f_Y}(\hat\vartheta_k)$, which leads to the usual squared
bias-variance decomposition
$\mathbb{E}^n_{f_Y}((\hat\vartheta_k - \vartheta(f))^2) = (\vartheta_k - \vartheta(f))^2 + \mathrm{Var}^n_{f_Y}(\hat\vartheta_k).$ (A.1)
Consider the first summand in (A.1). An application of the
Fubini-Tonelli theorem implies
$(\vartheta_k - \vartheta(f))^2 \leq \|\mathbb{1}_{[k,\infty)}\Psi\,\mathcal{M}_c[f]\|^2_{L^1_{\mathbb{R}}}.$
Consider now the second term in (A.1). The bound in (2.10)
then follows from the following inequality
VarnfY (ϑk) 6 1
1
VarnfY (ϑk) 6 1
Mc[g](t) dt
2πn
∫ k
−k
Ψ(t)
2 dt. Furthermore we have for any y > 0 that
y2c−1fY (y) =
1 ).
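Proof of Theorem 2.9. Let us set ϑ := ϑ(f). By the definition of $\hat k$
it follows for any $k \in \mathcal{K}_n$ that
$(\vartheta - \hat\vartheta_{\hat k})^2 \leq 2(\vartheta - \hat\vartheta_k)^2 + 2(\hat\vartheta_k - \hat\vartheta_{\hat k})^2$
$\leq 2(\vartheta - \hat\vartheta_k)^2 + 2(\hat\vartheta_k - \hat\vartheta_{k\wedge\hat k})^2 + 2(\hat\vartheta_{k\wedge\hat k} - \hat\vartheta_{\hat k})^2$
$\leq 2(\vartheta - \hat\vartheta_k)^2 + 4(\hat A(k) + \hat V(k)).$
Considering $\hat A(k)$, a straightforward calculation gives $\hat A(k) \leq A(k) + \sup_{k' \in \mathcal{K}_n}(V(k') - \hat V(k'))_+$ and thus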
A(k) 6 sup k′∈Kk,KnK
( 3(ϑk‘ − ϑk′)2 + 3(ϑk − ϑk)2 − V (k′)
) +
(ϑk − ϑk′)2.
By the monotonicity of V we deduce that for k < k′ holds $V(k') \geq \frac{1}{2}V(k) + \frac{1}{2}V(k')$, which
simplifies the term to
( (ϑk‘ − ϑk′)2 − 1
(ϑk − ϑk′)2
while the latter summand can be bounded for any k′ ∈ ⟦k, Kn⟧
by
(ϑk − ϑk′)2 6
|Ψ(t)Mc[f ](t)|dt)2.
Further we have that (ϑ− ϑk)2 6 2(ϑ− ϑk)2 + 2(ϑk − ϑk)2 which
implies
(ϑ− ϑk) 2 6
) + 4 sup
+ 26 sup k′∈Jk,KnK
( (ϑk′ − ϑk′)2 − 1
) +
.
To control the last term we split the centred arithmetic mean $\hat\vartheta_k - \vartheta_k$ into two terms, applying a Bernstein inequality, cf.
Lemma A.1, to one term and standard techniques to the other. For a
positive sequence (cn)n∈N and t ∈ R introduce
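$\hat{\mathcal{M}}_c(t) = n^{-1}\sum_{j=1}^{n}\big(Y_j^{c-1+it}\mathbb{1}_{(0,c_n)}(Y_j^{c-1}) + Y_j^{c-1+it}\mathbb{1}_{(c_n,\infty)}(Y_j^{c-1})\big) =: \hat{\mathcal{M}}_{c,1}(t) + \hat{\mathcal{M}}_{c,2}(t).$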
Split the centred arithmetic mean ϑk−ϑk = νk,1 +νk,2 where νk,i :=
1 2π
∫ k −k
(Mc,i(t)− EfY (Mc,i(t)))dt. Thus we have
(ϑ− ϑk) 2 6
) + 4 sup
+ 52 sup k′∈Kn
ν2 k′,2.
The claim of the theorem follows thus by the following lemma.
Lemma A.2. Under the assumptions of Theorem 2.9 with cn := √
n1/2σg∞,x2c−1 log(n)/42
hold
( ν2 k′,1 −
n ,
ν2 k′,2) 6
(V (k′)− V (k′))+) 6 C(E(X
4(c−1) 1 , σ)
Proof of Lemma A.2. To prove (i). we see that
EfY ( sup k′∈Kn
12 + x)dx.
Now our aim is to apply the Bernstein inequality Lemma A.1. To do
so, defining for y > 0
the function hk(y) := 1 2π
∫ k −k
yitdt leads to
c−1 j )hk(Yj)− EfY (Y c−1
1 1(0,cn)(Y1)hk(Y1))
Ψ(t) Mc[g](t)
dt 6√kΨ,g(k) implying |Y c−1 j 1(0,cn)(Y
c−1 j )hk(Yj)| 6
cn √ kΨ,g(k) =: b. Further,
VarfY (Y c−1 1 1(0,cn)(Y1)hk(Y1)) 6 EfY (Y 2c−2
1 h2 k(Y1)) 6 x2c−1g∞σΨ,g(k) =: v.
Therefore the Bernstein inequality yields, for any x > 0
PfY (|νk,1| > √ V (k)
4v ( V (k)
√ V (k)
12 + √ x))
using the concavity of the square root. We thus have to bound the four
upcoming terms. In fact
n
4v
(V (k) 12
n
8b
8cn √ kΨ,g(k)12
28cn
√ n1/2
k >
3
2 log(n)
by definition of Kn and cn = √ n1/2σg∞,x2c−1 log(n)/42. In analogy
we can show that
n
8b =
> 5 √
( √
√ x log(n)(σg∞,x2c−1)−1). Thus
we conclude
n .
For part (ii) we have $|\nu_{k',2}| \leq (2\pi)^{-1}\int_{-k'}^{k'} |\Psi(t)||\mathcal{M}_c[g](t)|^{-1}|\hat{\mathcal{M}}_{c,2}(t) - \mathbb{E}_{f_Y}(\hat{\mathcal{M}}_{c,2}(t))|dt$, implying with the Cauchy-Schwarz inequality that
EfY ( sup k′∈Kn
1
2π
∫ Kn
−Kn
Ψ(t)
1 1(cn,∞)(Y c−1
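$\leq C_{\Psi,g}\, n^{1/2}\, \mathbb{E}_{f_Y}(Y_1^{(c-1)(2+u)})\, c_n^{-u}$
for any u ∈ R+. Choosing u = 6 leads to $\mathbb{E}_{f_Y}(\sup_{k' \in \llbracket k, K_n \rrbracket} \nu^2_{k',2}) \leq C_{\Psi,g,\sigma}\, \mathbb{E}_{f_Y}(Y_1^{8(c-1)})\, n^{-1}$.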
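To show inequality (iii), we first define the event $\Omega := \{|\hat\sigma - \sigma| < \sigma/2\}$. Then on Ω we have
$\sigma/2 \leq \hat\sigma \leq \frac{3}{2}\sigma$, which implies that $V(k) \leq \hat V(k) \leq 3V(k)$ and
$\mathbb{E}_{f_Y}\big(\sup_{k' \in \mathcal{K}_n}(V(k') - \hat V(k'))_+\big) \leq 2\chi\, \mathbb{E}_{f_Y}(|\sigma - \hat\sigma|\mathbb{1}_{\Omega^c}) \leq 2\chi\, \frac{\mathrm{Var}_{f_Y}(\hat\sigma)}{\sigma}$
by application of the Cauchy-Schwarz and the Markov inequality.
This implies the claim.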
B Proofs of section 3
Proof of Theorem 3.5. First we outline the main steps of the
proof. We will construct two densities fo, f1 in $D^{s,c,L}_{\mathbb{R}_+}$ by a perturbation with a small bump, such that the difference
(ϑ(f1) − ϑ(fo))² and the Kullback-Leibler divergence of their
induced distributions can be bounded from below and above, respectively. The claim then follows
by applying Theorem 2.5 in Tsybakov [2008]. We first present the
construction. We set fo(x) :=
exp(−x) for x ∈ R+. Let $C^\infty_c(\mathbb{R}_+)$ be the set of all infinitely
differentiable functions with compact support in R+ and let $\psi \in C^\infty_c(\mathbb{R})$ be a function with support in [−1, 1],
$\int_{-1}^{1}\psi(x)dx = 0$, $\psi^{(\gamma-1)}(0) \neq 0$, $\psi^{(\gamma)}(0) \neq 0$, and define for j ∈ N0
the finite constant $C_{j,\infty} := \max(\|\psi^{(l)}\|_{\infty,x^0},\ l \in \llbracket 0, j \rrbracket)$. For each xo ∈ R+
and h ∈ (0, xo/2)
(to be selected below) we define the bump-function $\psi_{h,x_o}(x) := \psi\big(\frac{x-x_o}{h}\big)$, x ∈ R. Let us further define the operator $S : C^\infty_c(\mathbb{R}) \to C^\infty_c(\mathbb{R})$
with $S[f](x) = -xf^{(1)}(x)$ for all x ∈ R and define $S^1 := S$ and $S^n := S \circ S^{n-1}$ for n ∈ N, n ≥ 2. Now, for j ∈ N, set
$\psi_{j,h,x_o}(x) := S^j[\psi_{h,x_o}](x) = (-1)^j\sum_{i=1}^{j} c_{i,j}\, x^i h^{-i}\psi^{(i)}\big(\tfrac{x-x_o}{h}\big)$
for x ∈ R+ and $c_{i,j} \geq 1$. For a bump-amplitude δ > 0 and γ ∈
N we define
$f_1(x) = f_o(x) + \delta h^{\gamma+s-1/2}\psi_{\gamma,h,x_o}(x)x^{-1}, \quad x \in \mathbb{R}_+.$ (B.1)
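The coefficients c_{i,j} in the expansion of $S^j[\psi_{h,x_o}]$ can be generated explicitly. A minimal sketch, under the assumption that one applies S[f](x) = −x f′(x) term by term to $x^i h^{-i}\psi^{(i)}((x-x_o)/h)$: this yields the recurrence c_{i,j+1} = i·c_{i,j} + c_{i−1,j} with c_{1,1} = 1, which is the recurrence of the Stirling numbers of the second kind. The function name `bump_coefficients` is hypothetical.

```python
def bump_coefficients(j_max):
    """Return {j: [c_{1,j}, ..., c_{j,j}]} for j = 1, ..., j_max.

    Uses the recurrence c_{i,j+1} = i * c_{i,j} + c_{i-1,j}, obtained by
    applying S[f](x) = -x f'(x) to each term x^i h^(-i) psi^(i)((x - x_o)/h).
    """
    coeffs = {1: [1]}
    for j in range(1, j_max):
        prev = coeffs[j]
        nxt = []
        for i in range(1, j + 2):
            same_order = i * prev[i - 1] if i <= j else 0    # from d/dx hitting x^i
            lower_order = prev[i - 2] if i >= 2 else 0       # from d/dx hitting psi^(i-1)
            nxt.append(same_order + lower_order)
        coeffs[j + 1] = nxt
    return coeffs


if __name__ == "__main__":
    for j, row in bump_coefficients(4).items():
        print(j, row, "sum =", sum(row))
```

In particular every c_{i,j} is a positive integer and c_{j,j} = 1, in line with the condition c_{i,j} ≥ 1, and the row sums are the constants c_γ = Σ_i c_{i,γ} appearing in Lemma B.1.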
The corresponding survival function So of fo is given by So(x) =
exp(−x), for x ∈ R+, while Fo(x) = 1 − exp(−x). The resulting
survival function S1 and cumulative distribution function F1
of f1 are then given by
$S_1(x) = S_o(x) + \delta h^{\gamma+s-1/2}\psi_{\gamma-1,h,x_o}(x), \quad x \in \mathbb{R}_+,$
$F_1(x) = F_o(x) - \delta h^{\gamma+s-1/2}\psi_{\gamma-1,h,x_o}(x), \quad x \in \mathbb{R}_+.$
To ensure that S1, respectively F1, is a survival function,
respectively a cumulative distribution function, it is sufficient
to show that f1 is a density.
Lemma B.1. For any $0 < \delta < \delta_o(\psi, \gamma, x_o) := \exp(-3x_o/2)(3x_o/2)^{-\gamma}(C_{\gamma,\infty}c_\gamma)^{-1}$ the function f1, defined in eq. B.1, is a density, where $c_\gamma = \sum_{i=1}^{\gamma} c_{i,\gamma}$.
Further one can show that these functions all lie inside the
ellipsoids $D^{s,c,L}_{\mathbb{R}_+}$ for L large
enough. This is captured in the following lemma.
Lemma B.2. Let s ∈ N and c > 1/2. Then, for all L ≥
Ls,c,γ,δ,ψ,xo > 0, the functions fo and f1, as in (B.1), belong to $D^{s,c,L}_{\mathbb{R}_+}$.
For the sake of simplicity we denote for a function $\varphi \in L^2_{\mathbb{R}_+}$ the multiplicative convolution
with g by $\tilde\varphi := [\varphi * g]$.
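Lemma B.3. Let h ≤ ho(ψ, γ). Then
1. $(S_1(x_o) - S_o(x_o))^2 = (F_1(x_o) - F_o(x_o))^2 \geq \frac{c^2_{\gamma-1,\gamma-1}}{2}\delta^2\psi^{(\gamma-1)}(0)^2 h^{2s+1}$,
2. $(f_1(x_o) - f_o(x_o))^2 \geq \frac{c^2_{\gamma,\gamma}}{2}\delta^2\psi^{(\gamma)}(0)^2 h^{2s-1}$ and
3. $\mathrm{KL}(\tilde f_1, \tilde f_o) \leq C(g, x_o, f_o)\|\psi\|^2\delta^2 h^{2s+2\gamma}$, where KL is the
Kullback-Leibler divergence.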
Selecting h = n−1/(2s+2γ), it follows
1
M
M
where $C^{(2)}_{g,x_o,\psi,f_o,\delta} < 1/8$ if δ ≤ δ1(g, xo, ψ, fo). Thereby, we can use
Theorem 2.5 of Tsybakov [2008], which in turn implies for any estimator $\hat f$
of f
sup f∈Ds,c,L
) > c > 0;
) > c > 0 and
sup f∈Ds,c,L
2 > C(1) ψ,δ,γ
) > c > 0.
Note that the constant $C^{(1)}_{\psi,\delta,\gamma}$ depends only on ψ, γ and δ;
hence it is independent of the
parameters s, L and n. The claim of Theorem 3.5 follows by using
Markov’s inequality, which completes the proof.
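The exponent bookkeeping behind the choice h = n^(−1/(2s+2γ)) can be checked with exact rational arithmetic. A minimal sketch, assuming only the orders stated in Lemma B.3 (squared distances of order h^(2s−1) and h^(2s+1), Kullback-Leibler divergence per observation of order h^(2s+2γ)); the function name and dictionary keys are illustrative.

```python
from fractions import Fraction


def lower_bound_exponents(s, gamma):
    """Exponents of n obtained by plugging h = n^(-1/(2s + 2*gamma)) into Lemma B.3."""
    h_exp = Fraction(-1, 2 * s + 2 * gamma)  # h = n^h_exp
    return {
        # (f_1(x_o) - f_o(x_o))^2 is of order h^(2s-1)
        "density": (2 * s - 1) * h_exp,
        # (S_1(x_o) - S_o(x_o))^2 = (F_1(x_o) - F_o(x_o))^2 is of order h^(2s+1)
        "survival_cdf": (2 * s + 1) * h_exp,
        # n observations: n * h^(2s + 2*gamma) must stay bounded, i.e. exponent 0
        "n_times_kl": 1 + (2 * s + 2 * gamma) * h_exp,
    }


if __name__ == "__main__":
    print(lower_bound_exponents(s=2, gamma=1))
```

The entries `density` and `survival_cdf` reproduce the rates n^(−(2s−1)/(2s+2γ)) and n^(−(2s+1)/(2s+2γ)) of Theorem 3.5, while `n_times_kl` equal to zero reflects that the Kullback-Leibler divergence of the n-fold product distributions stays bounded for this choice of h.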
Proofs of the lemmata
Proof of Lemma B.1. For any $\varphi \in C^\infty_c(\mathbb{R}_+)$ holds $S[\varphi] \in C^\infty_c(\mathbb{R}_+)$ and
thus $S^j[\varphi] \in C^\infty_c(\mathbb{R}_+)$
for any j ∈ N. Further, for $\varphi \in C^\infty_c(\mathbb{R}_+)$ holds $\int_{-\infty}^{\infty}\varphi^{(1)}(x)dx = 0$, which implies that for any δ > 0 we have
$\int_0^\infty f_1(x)dx = 1$.
By construction, the function ψh,xo has support in
[xo/2, 3xo/2], since h < xo/2. Since
supp(S[φ]) ⊆ supp(φ) for all $\varphi \in C^\infty_c(\mathbb{R}_+)$, the
function ψγ,h,xo has support in [xo/2, 3xo/2]
too. First, for x ∉ [xo/2, 3xo/2] holds f1(x) = exp(−x) ≥ 0.
Further, for x ∈ [xo/2, 3xo/2]
holds
$f_1(x) = f_o(x) + \delta h^{s+\gamma-1/2}x^{-1}\psi_{\gamma,h,x_o}(x) \geq \exp(-3x_o/2) - \delta(3x_o/2)^{\gamma}C_{\gamma,\infty}c_\gamma$
since $\|\psi_{j,h,x_o}\|_\infty \leq (3x_o/2)^j C_{j,\infty}c_j h^{-j}$ for any s ≥ 1 and j ∈ N,
where $c_j := \sum_{i=1}^{j} c_{i,j}$.
Now choosing $\delta \leq \delta_o(\psi, \gamma, x_o) := \exp(-3x_o/2)(3x_o/2)^{-\gamma}(C_{\gamma,\infty}c_\gamma)^{-1}$
ensures f1(x) ≥ 0 for all
x ∈ R+.
Proof of Lemma B.2. Our proof starts with the observation that for
all t ∈ R and c > 0
$|\mathcal{M}_c[f_o](t)| \sim |t|^{c-1/2}\exp(-\pi|t|/2), \quad |t| \geq 2,$
by applying the Stirling formula, compare Belomestny and
Goldenshluger [2020]. Thus for every s ∈ N there exists Ls,c such
that $|f_o|^2_{s,c} \leq L$ for all L ≥ Ls,c. Next we consider $|f_o - f_1|_{s,c}$. We have $|f_o - f_1|^2_{s,c} = \delta^2h^{2s+2\gamma-1}|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c}$, where $|\cdot|_{s,c}$ is defined in (3.1). Now since supp(ψγ,h,xo) ⊂ [xo/2, 3xo/2]
and $\psi_{\gamma,h,x_o} \in C^\infty_c(\mathbb{R}_+)$, its Mellin transform is
well-defined for any c ∈ R. By integration by parts we see that
for any $\varphi \in C^\infty_c(\mathbb{R}_+)$ and t, c ∈ R holds
$\mathcal{M}_c[S[\varphi]](t) = (c + it)\mathcal{M}_c[\varphi](t)$
and thus $|\mathcal{M}_{c-1}[\psi_{\gamma+s,h,x_o}](t)|^2 = ((c-1)^2 + t^2)^s|\mathcal{M}_{c-1}[\psi_{\gamma,h,x_o}](t)|^2$, so that
$|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c} \leq C_c\int_{-\infty}^{\infty}|\mathcal{M}_{c-1}[\psi_{\gamma+s,h,x_o}](t)|^2dt = C_c\int_{x_o/2}^{3x_o/2}x^{2c-3}|\psi_{\gamma+s,h,x_o}(x)|^2dx$
by the Parseval formula, cf. eq. 2.6, which implies that $|\omega^{-1}\psi_{\gamma,h,x_o}|^2_{s,c} \leq C_{c,x_o}\|\psi_{\gamma+s,h,x_o}\|^2$. Now applying the Jensen inequality leads
to
ψγ+s,h,xo2 6 Cγ,s
γ+s∑ j=1
)2dx 6 Cγ,s,xoh −2γ−2s+1C∞,γ+s.
Thus $|f_o - f_1|^2_{s,c} \leq C_{(c,s,\gamma,\delta,\psi,x_o)}$ and $|f_1|^2_{s,c} \leq 2(|f_o - f_1|^2_{s,c} + |f_o|^2_{s,c}) \leq 2(C_{(c,s,\gamma,\delta,\psi,x_o)} + L_{s,c}) =: L_{s,c,\gamma,\delta,\psi,x_o,1}$. Now let us consider the moment condition. First we
see that ∫∞
0 x2(c−1)fo(x) =
δhs+γ−1/2
xo/2
)dx
6 Cs,c,γ,δ,ψ,xo
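Thus we have $\mathbb{E}_{f_o}(X^{2(c-1)}), \mathbb{E}_{f_1}(X^{2(c-1)}) \leq C_c + C_{s,c,\gamma,\delta,\psi,x_o} =: L_{s,c,\gamma,\delta,\psi,x_o,2}$. Choosing now $L_{s,c,\gamma,\delta,\psi,x_o} = \max(L_{s,c,\gamma,\delta,\psi,x_o,1}, L_{s,c,\gamma,\delta,\psi,x_o,2})$ shows the claim.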
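The integration-by-parts identity $\mathcal{M}_c[S[\varphi]](t) = (c+it)\mathcal{M}_c[\varphi](t)$ used in the proof above can be checked numerically. A minimal sketch, assuming the convention $\mathcal{M}_c[\varphi](t) = \int_0^\infty x^{c-1+it}\varphi(x)dx$ and using an illustrative smooth bump supported in [1, 3] (this test function, the values of c and t, and all function names are assumptions made here, not the ψ of the paper).

```python
import math


def bump(x, centre=2.0, radius=1.0):
    """Smooth test function with compact support in (centre - radius, centre + radius)."""
    u = (x - centre) / radius
    if abs(u) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - u * u))


def bump_derivative(x, centre=2.0, radius=1.0):
    """Exact derivative of `bump` with respect to x."""
    u = (x - centre) / radius
    if abs(u) >= 1.0:
        return 0.0
    return bump(x, centre, radius) * (-2.0 * u) / ((1.0 - u * u) ** 2 * radius)


def s_operator_of_bump(x):
    """S[phi](x) = -x * phi'(x), the operator used in the construction."""
    return -x * bump_derivative(x)


def mellin(func, c, t, lower=1.0, upper=3.0, steps=20000):
    """Trapezoidal approximation of int x^(c-1+it) func(x) dx over [lower, upper]."""
    dx = (upper - lower) / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = lower + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (x ** complex(c - 1.0, t)) * func(x)
    return total * dx


if __name__ == "__main__":
    c, t = 1.5, 2.0
    lhs = mellin(s_operator_of_bump, c, t)
    rhs = complex(c, t) * mellin(bump, c, t)
    print("difference:", abs(lhs - rhs))
```

Iterating the identity s times gives the factor ((c − 1)² + t²)^s in the display for $|\mathcal{M}_{c-1}[\psi_{\gamma+s,h,x_o}](t)|^2$.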
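Proof of Lemma B.3. First we see that $(S_o(x_o) - S_1(x_o))^2 = (F_o(x_o) - F_1(x_o))^2 = \delta^2h^{2s+2\gamma-1}(\psi_{\gamma-1,h,x_o}(x_o))^2$. By the definition of ψγ−1,h,xo, the term of leading order in $h^{-1}$ of $(\psi_{\gamma-1,h,x_o}(x_o))^2$ is proportional to $c^2_{\gamma-1,\gamma-1}h^{-2\gamma+2}\psi^{(\gamma-1)}(0)^2$.
For h small enough we thus obtain a lower bound for $(S_o(x_o) - S_1(x_o))^2$ of order $h^{2s+1}$
for h < ho(γ, ψ). In analogy, we can show that
$(f_o(x_o) - f_1(x_o))^2 = \delta^2h^{2\gamma+2s-1}(\psi_{\gamma,h,x_o}(x_o))^2 \geq c_{\psi,\gamma}h^{2s-1}.$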
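For the third part, using $\mathrm{KL}(\tilde f_1, \tilde f_o) \leq \chi^2(\tilde f_1, \tilde f_o) := \int_{\mathbb{R}_+}(\tilde f_1(x) - \tilde f_o(x))^2/\tilde f_o(x)dx$,
it is sufficient to bound the χ-squared divergence. We notice that
$\tilde f_1 - \tilde f_o$ has support in [0, 3xo/2] since f1 − fo has support in
[xo/2, 3xo/2] and g has support in [0, 1]. In fact, for y > 3xo/2
holds $\tilde f_1(y) - \tilde f_o(y) = \int_y^\infty (f_1 - f_o)(x)x^{-1}g(y/x)dx = 0$. Since fo is monotone
decreasing we can deduce that $\tilde f_o$ is monotone decreasing, since for x2
≥ x1 ∈ R+ holds
$\tilde f_o(x_2) = \int_0^1 g(x)x^{-1}f_o(x_2/x)dx \leq \int_0^1 g(x)x^{-1}f_o(x_1/x)dx = \tilde f_o(x_1)$
since the integrand is strictly positive. We conclude therefore
that there exists a constant cfo,xo,g > 0 such that $\tilde f_o(x) \geq c_{f_o,x_o,g} > 0$ for all x ∈ (0, 3xo/2). Thus
$\chi^2(\tilde f_1, \tilde f_o) \leq \tilde f_o(3x_o/2)^{-1}\|\tilde f_1 - \tilde f_o\|^2 = \tilde f_o(3x_o/2)^{-1}\delta^2h^{2s+2\gamma-1}\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\|^2.$
Let us now consider $\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\|^2$. In the first step we see by
application of the Plancherel identity, cf. 2.6, that $\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\mathcal{M}_{1/2}[\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}](t)|^2dt$. Now for t ∈ R, we see by using the multiplication theorem for Mellin transforms
that $\mathcal{M}_{1/2}[\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}](t) = \mathcal{M}_{1/2}[g](t)\cdot\mathcal{M}_{1/2}[\omega^{-1}S^\gamma[\psi_{h,x_o}]](t)$.
Again we have $\mathcal{M}_{1/2}[\omega^{-1}S^\gamma[\psi_{h,x_o}]](t) = (-1/2 + it)^\gamma\mathcal{M}_{-1/2}[\psi_{h,x_o}](t)$.
Together with assumption [G1’] we get
$\|\widetilde{\omega^{-1}\psi_{\gamma,h,x_o}}\|^2 \leq \frac{C_1(g)}{2\pi}\int_{-\infty}^{\infty}|\mathcal{M}_{-1/2}[\psi_{h,x_o}](t)|^2dt = C_1(g)\|\omega^{-1}\psi_{h,x_o}\|^2 \leq C(g, x_o)h\|\psi\|^2.$
We thus have $\mathrm{KL}(\tilde f_1, \tilde f_o) \leq \frac{C(g,x_o)\|\psi\|^2}{\tilde f_o(3x_o/2)}\delta^2h^{2s+2\gamma}$.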
References
K. E. Andersen and M. B. Hansen. Multiplicative censoring: density
estimation by a series expansion approach. Journal of Statistical
Planning and Inference, 98(1-2):137–155, 2001.
M. Asgharian and D. B. Wolfson. Asymptotic behavior of the
unconditional npmle of the length-biased survivor function from
right censored prevalent cohort data. The Annals of Statistics,
33(5):2109–2131, 2005.
D. Belomestny and A. Goldenshluger. Nonparametric density
estimation from observations with multiplicative measurement
errors. In Annales de l’Institut Henri Poincaré, Probabilités et
Statistiques, volume 56, pages 36–67. Institut Henri Poincaré,
2020.
D. Belomestny, F. Comte, and V. Genon-Catalot. Nonparametric
Laguerre estimation in the multiplicative censoring model.
Electronic Journal of Statistics, 10(2):3114–3152, 2016.
L. Birgé and P. Massart. Minimum contrast estimators on sieves:
exponential bounds and rates of convergence. Bernoulli,
4(3):329–375, 1998. ISSN 1350-7265. doi: 10.2307/3318720. URL
https://doi.org/10.2307/3318720.
S. Brenner Miguel. Anisotropic spectral cut-off estimation under
multiplicative measurement errors. Preprint arXiv:2107.02120,
2021.
S. Brenner Miguel and Phandoidaen. Multiplicative deconvolution in
survival analysis under dependency. Preprint arXiv:2107.05267,
2021.
S. Brenner Miguel, F. Comte, and J. Johannes. Spectral cut-off
regularisation for density estimation under multiplicative
measurement errors. Electronic Journal of Statistics, 15(1):
3551–3573, 2021.
E. Brunel, F. Comte, and V. Genon-Catalot. Nonparametric density
and survival function estimation in the multiplicative censoring
model. Test, 25(3):570–590, 2016.
C. Butucea and F. Comte. Adaptive estimation of linear functionals
in the convolution model and applications. Bernoulli, 15(1):69–98,
2009.
C. Butucea and A. B. Tsybakov. Sharp optimality in density
deconvolution with dominating bias. i. Theory of Probability &
Its Applications, 52(1):24–39, 2008.
F. Comte. Nonparametric estimation. Master and Research.
Spartacus-Idh, Paris, 2017.
F. Comte and C. Dion. Nonparametric estimation in a multiplicative
censoring model with symmetric noise. Journal of Nonparametric
Statistics, 28(4):768–801, 2016.
H. W. Engl, M. Hanke-Bourgeois, and A. Neubauer. Regularization of
inverse problems. Kluwer Acad. Publ., 2000.
J. Fan. On the optimal rates of convergence for nonparametric
deconvolution problems. The Annals of Statistics, pages 1257–1272,
1991.
A. Goldenshluger and O. Lepski. Bandwidth selection in kernel
density estimation: Oracle inequalities and adaptive minimax
optimality. The Annals of Statistics, 39:1608–1632, 2011.
G. Mabon. Adaptive deconvolution of linear functionals on the
nonnegative real line. Journal of Statistical Planning and
Inference, 178:1–23, 2016.
A. Meister. Density deconvolution. In Deconvolution Problems in
Nonparametric Statistics, pages 5–105. Springer, 2009.
R. B. Paris and D. Kaminski. Asymptotics and Mellin-Barnes
integrals, volume 85. Cambridge University Press, 2001.
M. Pensky. Minimax theory of estimation of linear functionals of
the deconvolution density with or without sparsity. The Annals of
Statistics, 45(4):1516–1541, 2017.
A. B. Tsybakov. Introduction to nonparametric estimation. Springer
Publishing Company, Incorporated, 2008.
B. Van Es, C. A. Klaassen, and K. Oudshoorn. Survival analysis
under cross-sectional sampling: length bias and multiplicative
censoring. Journal of Statistical Planning and Inference,
91(2):295–312, 2000.
Y. Vardi. Multiplicative censoring, renewal processes,
deconvolution and decreasing density: nonparametric estimation.
Biometrika, 76(4):751–761, 1989.
Y. Vardi and C.-H. Zhang. Large sample study of empirical
distributions in a random- multiplicative censoring model. The
Annals of Statistics, pages 1022–1039, 1992.