Estimation of a delta-contaminated density of a random intensity...

20
Statistics Surveys Vol. 0 (2006) 1–8 ISSN: 1935-7516 Estimation of a delta-contaminated density of a random intensity of Poisson data Daniela De Canditiis and Marianna Pensky I.A.C. “Mauro Picone” National Research Council via dei Taurini, 19 Rome, Italy and Department of Mathematics and Statistics University of Central Florida Orlando FL 32816-1353, USA e-mail: [email protected]; [email protected] Abstract: In the present paper, we constructed an estimator of a delta contaminated mixing density function g(λ) of an intensity λ of the Poisson distribution. The estimator is based on an expansion of the continuous portion g 0 (λ) of the unknown pdf over an overcomplete dictionary with the recovery of the coefficients obtained as the solution of an optimization problem with Lasso penalty. In order to apply Lasso technique in the, so called, prediction setting where it requires virtually no assumptions on the dictionary and, moreover, to ensure fast convergence of Lasso estimator, we use a novel formulation of the optimization problem based on the inversion of the dictionary elements. We formulate conditions on the dictionary and the unknown mixing density that yield a sharp oracle inequality for the norm of the difference between g 0 (λ) and its estimator and, thus, obtain a smaller error than in a minimax setting. Numerical simulations and comparisons with the Laguerre functions based estimator recently constructed by Comte and Genon-Catalot (2015) also show ad- vantages of our procedure. At last, we apply the technique developed in the paper to estimation of a delta contaminated mixing density of the Poisson intensity of the Saturn’s rings data. MSC 2010 subject classifications: Primary 62G07, 62C12; secondary 62P35. Keywords and phrases: Mixing density, Poisson distribution, empirical Bayes, Lasso penalty. Contents 1 Introduction ................................................. 1 2 The Lasso estimator of the mixing density ................................ 3 3 Implementation of the Lasso estimator .................................. 6 4 Convergence and estimation error ..................................... 7 5 Numerical Simulations ........................................... 8 6 Application to evaluation of the density of the Saturn ring ....................... 9 Acknowledgments ............................................... 10 7 Proofs .................................................... 10 References .................................................... 12 1. Introduction Poisson-distributed data appear in many contexts. In the last two decades a large amount of effort was spent on recovering the mean function in the Poisson regression model. In this set up, one observes independent Poisson variables Y 1 , ··· ,Y n where EY i = λ i = f (i/n), i =1, ··· ,n. Here, f is the function of interest which is assumed to exhibit some degree of smoothness. The difficulty in estimating f on the basis of Poisson data stems from the fact that the variances of the Poisson random variables are equal to their means and, hence, do not remain constant as f changes its values. Estimation techniques are either based * The authors would like to thank Dr. Joshua Colwell for helpful discussions and for providing the data. supported by “Italian Flagship Project Epigenomic” (http://www.epigen.it/) supported by National Science Foundation (NSF), grant DMS-1407475 1 imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Transcript of Estimation of a delta-contaminated density of a random intensity...

Page 1: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

Statistics SurveysVol. 0 (2006) 1–8ISSN: 1935-7516

Estimation of a delta-contaminated

density of a random intensity

of Poisson data

Daniela De Canditiis† and Marianna Pensky‡

I.A.C. “Mauro Picone”National Research Council

via dei Taurini, 19 Rome, Italyand

Department of Mathematics and StatisticsUniversity of Central FloridaOrlando FL 32816-1353, USA

e-mail: [email protected]; [email protected]

Abstract: In the present paper, we constructed an estimator of a delta contaminated mixing densityfunction g(λ) of an intensity λ of the Poisson distribution. The estimator is based on an expansion ofthe continuous portion g0(λ) of the unknown pdf over an overcomplete dictionary with the recoveryof the coefficients obtained as the solution of an optimization problem with Lasso penalty. In order toapply Lasso technique in the, so called, prediction setting where it requires virtually no assumptions onthe dictionary and, moreover, to ensure fast convergence of Lasso estimator, we use a novel formulationof the optimization problem based on the inversion of the dictionary elements.

We formulate conditions on the dictionary and the unknown mixing density that yield a sharporacle inequality for the norm of the difference between g0(λ) and its estimator and, thus, obtain asmaller error than in a minimax setting. Numerical simulations and comparisons with the Laguerrefunctions based estimator recently constructed by Comte and Genon-Catalot (2015) also show ad-vantages of our procedure. At last, we apply the technique developed in the paper to estimation of adelta contaminated mixing density of the Poisson intensity of the Saturn’s rings data.

MSC 2010 subject classifications: Primary 62G07, 62C12; secondary 62P35.Keywords and phrases: Mixing density, Poisson distribution, empirical Bayes, Lasso penalty.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 The Lasso estimator of the mixing density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Implementation of the Lasso estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Convergence and estimation error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Numerical Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Application to evaluation of the density of the Saturn ring . . . . . . . . . . . . . . . . . . . . . . . 9Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1. Introduction

Poisson-distributed data appear in many contexts. In the last two decades a large amount of effort was spenton recovering the mean function in the Poisson regression model. In this set up, one observes independentPoisson variables Y1, · · · , Yn where EYi = λi = f(i/n), i = 1, · · · , n. Here, f is the function of interestwhich is assumed to exhibit some degree of smoothness. The difficulty in estimating f on the basis ofPoisson data stems from the fact that the variances of the Poisson random variables are equal to theirmeans and, hence, do not remain constant as f changes its values. Estimation techniques are either based

∗The authors would like to thank Dr. Joshua Colwell for helpful discussions and for providing the data.†supported by “Italian Flagship Project Epigenomic” (http://www.epigen.it/)‡supported by National Science Foundation (NSF), grant DMS-1407475

1

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 2: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 2

on variance stabilizing transforms (Brown et al. (2010), Fryzlewicz and Nason (2004)), wavelets (Antoniadisand Sapatinas (2004), Besbeas et al. (2004), Harmany et al. (2012)), Haar frames (Hirakawa and Wolfe(2012)) or Bayesian methods (Kolaczyk (1999) and Timmermann and Nowak (1999)).

The fact that the variance of a Poisson random variable is equal to its mean serves as a common andreliable test that data in question are indeed Poisson distributed. However, in many practical situations,although each of the data value Yi ∼ Poisson(λi), i = 1, · · · , n, the overall data do not have the Poissondistribution. This is due to the fact that the consecutive values of λi are so different from each other thatf is not really a function. In this case, in order to account for the extra-variance, it is usually reasonable toassume that λ itself is a random variable with an unknown probability density function g which needs tobe estimated.

In particular, below we consider the following problem. Let λi, i = 1, · · · , n, be independent randomvariables that are not observable and have an unknown pdf g(λ). One observes variables Yi|λi ∼ Poisson(λi),i = 1, · · · , n, that, given λi, are independent. Our objective is to estimate g(λ), the so called mixing density,on the basis of observations Y1, · · · , Yn. Here, g can be viewed as the prior density of the parameter λ, sothat the model above reduces to an empirical Bayes model where the prior has to be estimated from data.

Estimation of the prior density of the parameter of the Poisson distribution has been consideredby several authors. For example, Lambert and Tierney (1984) suggested non-parametric maximum like-lihood estimator, Walter (1985) and Walter and Hamedani (1991) studied estimators based on Laguerrepolynomials, Zhang (1995) considered smoothing kernel estimators and Herngartner (1997) investigatedFourier series based estimators of g. All papers listed above provided upper bounds for the mean integratedsquared error (MISE); Zhang (1995) and Herngartner (1997) also presented lower bounds for the MISEover smoothness classes. The common feature of all these estimators is that the convergence rates are verylow. In particular, if n→ ∞, both Zhang (1995) and Herngartner (1997) obtained convergence rates of theform (lnn/ ln lnn)−2ν where ν is the parameter of the smoothness class to which g belongs. The latter seemto imply that there is no hope for accurate estimation of the mixing density g unless the sample sizes areextremely high. On a more positive note, in a recent paper, Comte and Genon-Catalot (2015) consideredan estimator of g based on expansion of g over the orthonormal Laguerre basis. They showed that if theLaguerre coefficients of g decrease exponentially, then the resulting estimator has convergence rates thatare polynomial in n and provided some examples where this happens. Moreover, they proposed a penaltyfor controlling the number of terms in the expansion and provided oracle inequalities for the estimators ofg under various scenarios.

The low convergence rates for the prior density of Poisson parameter are due to the fact that itsrecovery constitutes a particular case of an ill-posed linear inverse problem. Indeed, let L2[0,∞) and ℓ2 bethe Hilbert spaces of, respectively, square integrable functions on [0,∞) and square integrable sequences.Denote the probability that Y = l, l = 0, 1, · · · , by P (l) = P(Y = l). Then, introducing a linear operatorQ : L2[0,∞) → ℓ2, we can present g(λ) as the solution of the following equation

(Qg)(l) =

∫ ∞

0

λl e−λ

l!g(λ) dλ = P (l), l = 0, 1, · · · (1.1)

Since exact values of the probabilities P (l) are unknown, they can be estimated by the relative frequenciesνl, so the problem of recovering g appears as an ill-posed linear inverse problem with the right-hand sidemeasured with error. Solution of equation (1.1) is particularly challenging since g is a function of a realargument while P is an infinite-dimensional vector.

On the other hand, in the last decade a great deal of effort was spent on recovery of an unknownfunction in regression setting from its noisy observations using overcomplete dictionaries. In particular, ifthe dictionary is large enough and the function of interest has a sparse representation in this dictionary,then it can be recovered with a much better precision than when it is expanded over an orthonormal basis.Lasso and its versions (see e.g. Buhlmann and van de Geer (2011) and references therein) allow one toidentify the dictionary elements that guarantee efficient estimation of the unknown regression function. Theadvantage of this approach is that the estimation error is controlled by the, so called, oracle inequalitiesthat provide upper bounds for the risk for the particular function that is estimated rather than convergencerates designed for the “worst case scenario” of the minimax setting. In addition, if the function of interestcan be represented via a linear combination of just a few dictionary elements, then one can prove that itcan be estimated with nearly parametric error rate provided certain assumptions on the dictionary hold.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 3: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 3

In the present paper, we extend this idea to the case of estimating a mixing density g on the basisof Y1, · · · , Yn. However, there is an intrinsic difficulty arising from the fact that the problem above is anill-posed inverse problem. Currently, one can justify convergence of a Lasso estimator only if stringentassumptions on the dictionary, the, so called, compatibility conditions, are satisfied. In regression set up, aslong as compatibility conditions hold, one can prove that Lasso estimator is nearly optimal. Regrettably,while compatibility conditions may be satisfied for the functions in the original dictionary, they usually donot hold for their images due to contraction imposed by the operator Q. In the present paper, we showhow to circumvent this difficulty and apply Lasso methodology to estimation of g. We formulate conditionson the dictionary and the unknown mixing density that yield a sharp oracle inequality for the norm of thedifference between g(λ) and its estimator and, thus, result in a smaller error than in a minimax setting.Numerical simulations and comparisons with the Laguerre functions based estimator recently constructedby Comte and Genon-Catalot (2015) also show advantages of our procedure.

Our study is motivated by the analysis of astronomical data, in particular, the photon counts Yi, i =1, · · · , n that come from sets of observations of the stellar occultations recorded by the Cassini UVIS highspeed photometer at different radial points on the Saturn’s ring plane. It is well known that the Saturn ringis comprised of particles of various sizes, each on its own orbit about the Saturn. With no outside influences,these photon counts should follow the Poisson distribution, however, obstructions imposed by the particlesin the ring cause photon counts distribution to deviate from Poisson. The latter is due to the fact thatalthough, for each i = 1, · · · , n, the photon counts Yi ∼ Poisson(λi), the values of λi, i = 1, · · · , n, areextremely varied and, specifically, are best described as random variables with the unknown underlying pdfg(λ).

In addition, if a ring region contains a significant proportion of large particles, those particles cancompletely block out the light leading to zero photon counts. For this reason, we assume that the unknownpdf g is delta-contaminated, i.e., it is a combination of an unknown mass π0 at zero and a continuous part,so that g(λ) can be written as

g(λ) = π0δ(λ) + f(λ) with f(λ) = (1− π0)g0(λ) (1.2)

where g0(λ) is an unknown pdf and δ(λ) is the Dirac delta function such that, for any integrable function uone has

∫u(x)δ(x)dx = u(0). Models of the type (1.2) also appear in other applied settings (see, e.g., Lord et

al. (2005)). However, to the best of our knowledge, we are the first ones to estimate the delta-contaminateddensity of the intensity parameter of the Poisson distribution. In this setting, we also obtain a sharp oracleinequality for the norm of the difference between g0(λ) and its estimator. We also derive convergence ratesfor the estimator π0 of the mass π0 at zero. The estimator has also been successfully applied to recovery ofdelta-contaminated densities of the intensities λ for various sub-regions of the Saturn’s rings.

Finally, we should remark on several other advantages of the approach presented in the paper. First,although in the numerical studies of the paper we are using the gamma dictionary, all theoretical andmethodological results of the paper are valid for any type of dictionary functions since it is based on thenumerical inversion of dictionary elements. Moreover, the method can be used even if the underlying con-ditional distribution is different from the Poisson. The estimator exhibits no boundary effects and performswell in simulations delivering small errors. Moreover, since we apply the Tikhonov regularization for recov-ering the inverse images of the dictionary elements, our estimator can be viewed as a version of the elasticnet estimator described in Zou and Hastie (2005).

The rest of the paper is organized as follows. Sections 2 and 3 present, respectively, the method andthe algorithm for the estimation of the density function, while Section 4 studies convergence propertiesof the estimator. Section 5 investigates precision of the estimator developed in the paper via numericalsimulations using synthetic data. Section 6 provides application of the technique proposed in the paper tothe occultation data for the Saturn’s rings. Finally, Section 7 contains the proofs of the statements presentedin the paper.

2. The Lasso estimator of the mixing density

In what follows, we assume that g0(λ) in (1.2) can be well approximated by a dictionary

D = ϕk(λ), k = 1, · · · , p .

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 4: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 4

In particular, in our simulations and real-life applications we consider the dictionary that consists of gammapdfs

ϕk(λ) = γ(λ; ak, bk) =λak−1 exp(−λ/bk)

bak

k Γ(ak), k = 1, · · · , p. (2.1)

This is a natural choice since, for a fixed bk = b and ak = 1, 2, 3, · · · , this dictionary contains various linearcombinations of the Laguerre functions and, hence, its span approximates the L2[0,∞) space. Therefore,any square integrable function can be approximated by a linear combination of ϕk with a small error. On theother hand, using a variety of scales bk allows one to accurately represent a function of interest with manyfewer terms. Nevertheless, all theoretical results in Sections 2, 3 and 4 are valid for an arbitrary dictionaryin L2[0,+∞).

If π0 were known, then, using the dictionary, we would estimate g by

g(λ) = π0δ(λ) + f(λ) with f(λ) =

p∑k=1

θkϕk(λ),

where coefficients θk, k = 1, · · · , p, are chosen so to minimize the squared L2-norm

∥g − g∥22 = ∥g − π0δ∥22 +

∥∥∥∥∥p∑

k=1

θkϕk

∥∥∥∥∥2

2

− 2

p∑k=1

θk⟨g − π0δ, ϕk⟩. (2.2)

The first term in formula (2.2) does not depend on coefficients θk while the second term is completelyknown. In order to estimate the last term, note that ⟨g−π0δ, ϕk⟩ = ⟨g, ϕk⟩−π0ϕk(0). Moreover, if we foundfunctions χk ∈ ℓ2 such that

(Q∗χk)(λ) =∞∑i=0

e−λλi

i!χk(i) = ϕk(λ), ∀λ ∈ (0,+∞), (2.3)

then, it is easy to check that

⟨g, ϕk⟩ =

∫ +∞

0

g(λ)∞∑i=0

e−λλi

i!χk(i)dλ =

∞∑i=0

χk(i)

∫ +∞

0

g(λ)e−λλi

i!dλ

=∞∑i=0

χk(i)P (i) = Eχk(Y ). (2.4)

Here, P (l) is the marginal probability function

P (l) = P(Y = l) = π0 I(l = 0) +

p∑k=1

θkUk(l), l = 0, 1, 2, · · · (2.5)

where I(l = 0) is the indicator that l = 0 and

Uk(l) =

∫ +∞

0

e−λλl

l!ϕk(λ)dλ =

Γ(l + ak)

Γ(ak) l!blk(1 + bk)

−(l+ak) (2.6)

Hence, ⟨g, ϕk⟩ can be estimated by

⟨g, ϕk⟩ = n−1n∑

i=1

χk(Yi) =

∞∑l=0

χk(l)νl = ⟨χk, ν⟩, k = 1, . . . , p, (2.7)

where

νl = n−1n∑

i=1

I(Yi = l), l = 0, 1, · · · (2.8)

are the relative frequencies of Y = l and I(A) is the indicator function of a set A.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 5: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 5

There is an obstacle to carrying out estimation above. Indeed, for some k solutions χk(Y ) of equations(2.3) may not have finite variances or variances may be too high. In particular, this is true for ϕk definedby formula (2.1) whenever bk < 1. In order to stabilize the variances we use the Tikhonov regularization.

In particular, we replace solution χk = (Q∗)−1ϕk of equation (2.3) by solution ψk,ζk of equation

(QQ∗ + ζkI)ψk,ζk = Qϕk, ζk > 0, (2.9)

where operators Q and Q∗ are defined in (1.1) and (2.3), respectively, and I is the identity operator, sothat, for any f ,

(QQ∗f)(j) =

∞∑l=0

(j + l

l

)2−(j+l+1)f(l), j = 0, 1, . . .

Observe that Var[ψk,ζk(Y )] is a decreasing function of ζk while the squared bias (Eψk,ζk − ⟨g, ϕk⟩)2 is an

increasing function of ζk. Denote ζk the unique solution of the following equation

1

nVar[ψ

k,ζk(Y )] =

(Eψ

k,ζk− ⟨g, ϕk⟩

)2(2.10)

and replace χk(Y ) in (2.7) by

ψk(Y ) = ψk,ζk

(Y ) with σ2k = Var[ψk(Y )]. (2.11)

After the values of ⟨g, ϕk⟩, k = 1, · · · , p, are estimated, the only obstacle for minimizing the righthand side of formula (2.2) is that we do not know the values of π0⟨ϕk, δ⟩ = π0ϕk(0). Therefore, we choose adictionary such that π0ϕk(0) = 0, k = 1, · · · , p. The latter means that in our numerical studies, unless weknow that π0 = 0, we choose ak > 1 in (2.1).

In order to identify the correct subset of dictionary functions ϕk, we introduce a weighted Lassopenalty. In particular, the vector of coefficients θ with components θk, k = 1, · · · , p, can be recovered as asolution of the following optimization problem

θ = argminθ

∥∥∥∥∥

p∑k=1

θkϕk

∥∥∥∥∥2

2

− 2

p∑k=1

θk⟨ψk, ν⟩+ α

p∑k=1

σk|θk|

. (2.12)

Here,∑p

k=1 σk|θk| is the weighted Lasso penalty and α is the penalty parameter. Note that the right handside of formula (2.12) is independent of π0, so the value of θ can be evaluated.

In order to implement optimization procedure suggested above, consider matrix Φ ∈ Rp×p withelements Φlk = ⟨ϕk, ϕl⟩, l, k = 1, · · · , p, and define vector ξ in Rp with components

ξk = ⟨ψk, ν⟩ =∞∑l=0

ψk(l) νl = n−1n∑

i=1

ψk(Yi). (2.13)

Introduce matrix W such that Φ = WTW and vector

η = (WT )+ξ = W (WTW)−1ξ, (2.14)

where, for any matrix A, matrix A+ is the Moore-Penrose inverse of A. Then, the optimization problem(2.12) appears as

θ = argminθ

∥Wθ − η∥22 + α

p∑k=1

σk|θk|

. (2.15)

Now, consider the problem of estimating the weight π0. Denote by f the projection of f(λ) onto thelinear space spanned by the dictionary D and by θ the coefficients of this projection. Let supp(θ) = J .Consider vector u with components

uk = Uk(0) =

∫ ∞

0

e−λϕk(λ) dλ, k = 1, · · · , p, (2.16)

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 6: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 6

and observe that

P (0) = P(Y = 0) = π0 + uT θ +∆ with ∆ =

∫ ∞

0

e−λ(f(λ)− f(λ)) dλ. (2.17)

Since π0 ≥ 0, we estimate π0 byπ0 = max(0, ν0 − uT θ). (2.18)

3. Implementation of the Lasso estimator

Formulae (2.15) and (2.18) suggest the following procedure.

The direct algorithm

1. Evaluate sample frequencies νl, l = 0, 1, ..., given by formula (2.8).2. Construct functions ψk(Y ), k = 1, · · · , p, satisfying conditions (2.10) and (2.11).3. Define a grid αl, l = 1, · · · , L, of values of α.4. For each value αl, l = 1, · · · , L, evaluate a solution θl of optimization problem (2.15) with α = αl.

5. Select θ = θl which optimizes one of the data driven criteria described in Section 5.6. Estimate π0 by π0 defined in (2.18).7. Obtain the estimator of g as

g(λ) = π0δ(λ) +

p∑k=1

θkϕk(λ). (3.1)

In order to implement Lasso estimator, for any ζk, we need to obtain a solution ψk,ζk of equation(2.9). For his purpose, we introduce a matrix version Q of operator Q in (1.1). The elements of matrix Qare Poisson probabilities Qli = e−xi(xi)

l/(l!), where xi = ih, i = 1, 2..., are the grid points at which we are

going to recover g(λ) and h is the step size. Introduce vectors ϕk and ψk,ζk , k = 1, . . . , p, with elements

ϕk(xi), i = 1, 2..., and ψk(l), l = 0, 1, ..., respectively. Then, for each k = 1, . . . , p, equation (2.9) can bere-written as

ψk = (QQT + ζkI)−1Qϕk, (3.2)

where I is the identity matrix. For the sake of finding ζk satisfying (2.10), we create a grid and chose ζk so

that to minimize an absolute value of Var[ψk,ζk(Y )]−(Eψk,ζk − ⟨g, ϕk⟩

)2where Var[ψk,ζk(Y )] is the sample

variance of ψk,ζk(Y ). After that, we evaluate ψk(Y ) in (2.11) and replace unknown variances σ2k in (2.11)

by their sample counterparts.

Remark 1. (Iterative estimation procedure) In the case when π0ϕk(0) = 0 for some values of k ∈ P,evaluations above lead to an iterative estimation algorithm. Indeed, in this case, for a given value of π0, theestimator θ is the solution of the following optimization procedure

θ = argminθ

θTΦθ − 2θT (ξ − π0z) + α

p∑k=1

σk|θk|

, (3.3)

where zk = ϕk(0). On the other hand, for a given θ, the estimator of π0 is provided by formula (2.18).Combination of (3.3) and (2.18) suggest the estimation procedure described below. However, we emphasizethat, unlike the direct algorithm, the iterative procedure comes without any guarantees for the estimationerrors.

The iterative algorithm

1. Carry out steps 1, 2 and 3 of the direct algorithm.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 7: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 7

2. Choose an initial value π(0)0 = ν0 and obtain θ

(0)using steps 4-5 of the direct algorithm with (2.15)

replaced by (3.3) evaluated with π0 = π(0)0 .

3. For j = 1, 2, ..., set π(j)0 = max[0, ν0− (θ

(j−1))Tu] and obtain θ

(j)by steps 4-5 of the direct algorithm

with (2.15) replaced by (3.3) evaluated with π0 = π(j)0 . Repeat step 3 until one of the following stopping

criteria is met:

(i) π(j)0 = 0; (ii) ∥W θ

(j)−W θ

(j−1)∥22 < tol; (iii) j > Jmax.

Here tol and Jmax are, respectively, the tolerance level and the maximal number of steps defined inadvance.

4. Obtain the estimator

g(λ) = π0δ(λ) +

p∑k=1

θkϕk(λ).

4. Convergence and estimation error

Let g(λ) be given by (3.1). In order to derive oracle inequalities for the error of g(λ), we introduce thefollowing notations. For any vector t ∈ Rp, denote its ℓ2, ℓ1, ℓ0 and ℓ∞ norms by, respectively, ∥t∥2, ∥t∥1,∥t∥0 and ∥t∥∞ and

ft =

p∑j=1

tjϕj . (4.1)

Similarly, for any function f , denote by ∥f∥2, ∥f∥1 and ∥f∥∞ its L2, L1 and L∞ norms. Denote P =1, · · · , p. For any subset of indices J ⊆ P, subset Jc is its complement in P and |J | is its cardinality,so that |P| = p. Let LJ = Span ϕj , j ∈ J. If J ⊂ P and t ∈ Rp, then tJ ∈ R|J| denotes reduction of

vector t to subset of indices J . Recall that f is the projection of f(λ) onto the linear space spanned by thedictionary D and θ are the coefficients of this projection with supp(θ) = J .

It turns out that, as long as the sample size n is large enough, estimator fθ

is close to f with high

probability, with no additional assumptions. Indeed, the following statement holds.

Theorem 1. Let π0ϕk(0) = 0, for k = 1, · · · , p, and let τ be any positive constant. If n ≥ N0 and α ≥ α0,where

N0 =16

9(τ + 1) log p max

1≤k≤p

[∥ψk∥2∞σ2k

]and α0 = (2

√(τ + 1) log p+ 1)n−1/2, (4.2)

then with probability at least 1− 2p−τ , one has

∥fθ− f∥22 ≤ inf

t

∥ft − f∥22 + 4α

p∑j=1

σj |tj |

(4.3)

where θ is the solution of optimization problem (2.15).

Theorem 1 provides the so called “slow” Lasso rates. In order to obtain faster convergence rates and alsoto ensure that π0 is close to π0 with high probability, we impose the so called compatibility condition onthe dictionary ϕk, k = 1, · · · , p. In particular, denoting Υ = diag(σ1, · · · , σp) and considering the set ofp-dimensional vectors

J (µ, J) = d ∈ Rp : ∥(Υd)Jc∥1 ≤ µ∥(Υd)J∥1 , µ > 1, (4.4)

we assume that the following condition holds:

(A) Matrices Φ and Υ are such that for some µ > 1 and any J ⊂ P

κ2(µ, J) = min

d ∈ J (µ, J), ∥d∥2 = 0 :

dTΦd · Tr(Υ2J)

∥(Υd)J∥21

> 0. (4.5)

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 8: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 8

Observe that, in the regression setup, Υ is the identity matrix, and condition A reduces to the compatibilitycondition for general sets formulated in Section 6.2.3 of Buhlmann and van de Geer (2011) . If one has anorthonormal basis instead of an overcomplete dictionary, then matrix Φ is an identity matrix and, due tothe Cauchy inequality, κ2(µ, J) ≥ 1 for any µ and J . On the other hand, for an orthonormal basis, the bias∥ft − f∥2 in (4.3) may be large.

Under Assumption A, one can prove “fast” convergence rates for f as well as obtain the error boundsfor π0.

Theorem 2. Let π0ϕk(0) = 0, for k = 1, · · · , p, τ be any positive constant and Assumption A hold. Letα = ϖα0 where α0 is defined in (4.2) and ϖ ≥ (µ + 1)/(µ − 1). If n ≥ N0 where N0 is defined in (4.2),then with probability at least 1− 2p−τ , one has

∥fθ− f∥22 ≤ inf

J⊆P

∥f − fLJ∥22 +

(1 +ϖ)2(2√

(τ + 1) log p+ 1)2

κ2(µ, J) n

∑j∈J

σ2j

, (4.6)

where fLJ= projLJ

f . Moreover, with probability at least 1− 4p−τ , one has

(π0 − π0)2 ≤ 2τ log p

n+ inf

J⊆P

∥f − fLJ∥22 +

(1 +ϖ)2(2√(τ + 1) log p+ 1)2

κ2(µ, J) n

∑j∈J

σ2j

. (4.7)

5. Numerical Simulations

In order to evaluate the accuracy of the proposed estimator we carried out a simulation study where wetested performance of the proposed estimator under various scenarios. In order to assess precision of theestimator, for each of the scenarios, we evaluate the relative integrated error of g defined as

∆g = ∥g − g∥22/∥g∥22 (5.1)

where the norm is calculated over the grid xi = ih with i = 0, 1, . . ., if π0 = π0 = 0 and i = 1, 2, . . ., otherwise.In addition, we study prediction properties of g. In particular, we define the estimated frequencies

νl =

∫ ∞

0

λl

l!e−λ g(λ)dλ = π0I(l = 0) + (1− π0)

p∑k=1

θkUk(l), for l = 0, 1, 2... (5.2)

where Uk(l) are given in equation (2.6). The, we evaluate

∆ν = ∥ν − ν∥22/∥ν∥22, (5.3)

i.e. ∆ν evaluates the squared relative ℓ2 distance between the vectors of predicted and of observed fre-quencies. For the estimator proposed in this paper, we tested various computational schemes that differby the strategies for selecting the penalty parameter αl in step 5 of the direct algorithm. In particular, weconsidered the following options.

OPT : This estimator is obtained by using the direct algorithm presented in Section 3, where in step5 parameter αl is chosen by minimizing ∥g− g∥22, i.e. the squared ℓ2 distance between the true and theestimated function. Of course, this estimator represents only a benchmark for the proposed proceduresince it is not really applicable in practice because it requires knowledge of g.DDl2 : This estimator is by obtained by using the direct algorithm presented in Section 3, where instep 5 parameter αl is chosen by minimizing ∆ν , given in (5.3), i.e. the squared relative ℓ2 distancebetween the predicted and the observed frequencies.DDlike : This estimator is obtained by using the direct algorithm presented in Section 3, where instep 5 parameter αl is chosen by maximizing the likelihood function as suggested in Chow, Gemanand Wu (1983). In particular, since ν = (ν0, ν1, . . . , ) given in (5.2) and ν = (ν0, ν1, . . .) given in (2.8)are, respectively, the predicted and the observed frequencies, the likelihood function can be writtenas L(ν|ν) =

∏Ml=0 ν

νl

l , where M = maxi Yi.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 9: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 9

For the sake of comparison we also define

NDE : This is the Nonparametric Density Estimator presented in Comte and Genon-Catalot (2015),for which the authors kindly provided the code.

The set of test functions represents different situations inspired by the real data problem described in thenext Section. In particular, we consider the following nine test functions:

1. the gamma density g(λ) = Γ(λ; 3, 1)2. the mixed gamma density g(λ) = 0.3Γ(λ; 3, 0.25) + 0.7Γ(λ; 10, 0.6)3. the exponential density g(λ) = Γ(λ; 1, 2)4. the Weibull density g(λ) = θp−θxθ−1 exp−(x/θ)θ I(x > 0), with p = 3 and θ = 25. the Gaussian density g(λ) = N(λ; 80, 1)6. the mixed gamma density g(λ) = 0.3Γ(λ; 2, 0.3) + 0.7Γ(λ; 40, 1)7. the delta contaminated gamma density g(λ) = 0.3δ(λ) + 0.7Γ(λ; 40, 1)8. the delta contaminated Gaussian density g(λ) = 0.2δ(λ) + 0.8N(λ; 80, 82)9. the delta contaminated Gaussian density g(λ) = 0.2δ(λ) + 0.8N(λ; 20, 42)

The first four test functions have been analyzed in Comte and Genon-Catalot (2015) and representcases where most of the data is concentrated near zero. The fifth test function corresponds to the situationwhere most of the data is concentrated away from zero. The last four test functions represent the mixturesof the two previous scenarios. All nine densities are showed in Figure 1.

Tables 1, 3 and 5 below display the average values of ∆g defined in (5.1) while Tables 2, 4 and 6report ∆ν defined in (5.3) (with the standard deviations in parentheses) over 100 different realizations ofdata Yi ∼ Poisson(λi), i = 1, .., n, where n = 10000 for Tables 1 and 2, n = 5000 for Tables 3 and 4and n = 1000 for Tables 5 and 6. We chose the grid step h = 0.5. The dictionary was constructed as acollection of the gamma pdfs (2.1) where parameters (ak, bk) belong to the Cartesian product of vectorsa = [2, 3, 4, · · · , 150] and b = [0.1, 0.15, · · · , 0.9, 0.95], hence, ϕk(0) = 0 and p = 2682. For this dictionary,max1≤k≤p

[∥ψk∥2∞/σ2

k

]= 146.95, so that condition (4.2) holds with τ = 3.85 and τ = 1.43 for n = 10000

and n = 5000, respectively, and is not valid for n = 1000. However, as simulation results show, our estimatorshows good performance even when assumption (4.2) is violated.

As it is expected, performances of all estimators deteriorate when n decreases, although not very sig-nificantly. For a fixed sample size, estimator OPT is the most precise in terms of ∆g as a direct consequenceof its definition, however, estimator DDlike is always comparable. Estimator DDl2 has similar performanceto DDlike except for cases 2 and 6 where the underlying densities are bimodal and, hence, data can beexplained by a variety of density mixtures. In conclusion, apart from OPT which is not available in thecase of real data, estimator DDlike turns out to be the most accurate in terms of both ∆g and ∆ν . Forcompleteness, Figures 1 and 2 exhibit some of the reconstructions obtained using estimator DDlike in thecase of n = 5000.

Finally, we should mention that NDE is a projection estimator that uses only the first few Laguerrefunctions. For this reason, it fails to adequately represent a density function that corresponds to the situationwhere values λi, i = 1, · · · , n, are concentrated away from zero, as it happens in case 5 (where NDE returnszero as an estimator) and case 6 (where NDE succeeds in reconstructing only the first part of the densitynear zero). Also, note that NDE errors are not displayed for cases 7, 8 and 9 since this estimator is notdefined for delta contaminated densities.

6. Application to evaluation of the density of the Saturn ring

The Saturn’s rings system can be broadly grouped into two categories: dense rings (A, B, C) and tenuousrings (D, E, G) (see the first panel of Figure 3). The Cassini Division is a ring region that separates theA and B rings. The study of structure within Saturn’s rings originated with Campani, who observed in1664 that the inner half of the disk was brighter than the outer half. Furthermore, in 1859, Maxwell provedthat the rings could not be solid or liquid but were instead made up of an indefinite number of particles ofvarious sizes, each on its own orbit about Saturn. Detailed ring structure was revealed for the first time,however, by the 1979 Pioneer and 1980-1981 Voyager encounters with Saturn. Images were taken at closerange, by stellar occultation (observing the flickering of a star as it passes behind the rings), and by radio

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 10: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 10

occultation (measuring the attenuation of the spacecraft’s radio signal as it passes behind the rings as seenfrom Earth) (see, e.g., Esposito et al, 2004). By analyzing the intensity of star light while it is passing throughSaturn’s rings, astronomers can gain insight into properties that telescopes cannot visually determine. Eachsub-region in the rings has its own associated distinct distribution of the density and sizes of the particlesconstituting the sub-region. This distribution uniquely determines the amount of light which is able to passfrom a star (behind the rings) to the photometer.

Our data Yi, i = 1, · · · , n, come from sets of observations of stellar occultations recorded by theCassini UVIS high speed photometer and contains n = 7615 754 photon counts at different radial points,located at 0.01-0.1 kilometer increments, on the Saturn’s rings plane (see the second panel of Figure 3). Withno outside influences, these photon counts should follow the Poisson distribution, however, obstructionsimposed by the particles in the rings cause their distribution to deviate from Poisson. Indeed, if datawere Poisson distributed, then its mean would be approximately equal to its variance for every sub-region.However, as the third panel of Figure 3 shows, observations Yi have significantly higher variances than means.The latter is due to the fact that, although for each i = 1, · · · , n, the photon counts Yi are Poisson(λi),the values of λi, i = 1, · · · , n, are extremely varied and, specifically, cannot be modeled as the values ofa continuous function. In fact, intensities λi, i = 1, · · · , n, are best described as random variables withan unknown underlying pdf g(λ). In addition, if the ring region contains a significant proportion of largeparticle, those particles can completely block of the light leading to zero photon counts. For this reason, weallow g(λ) to possibly contain a non-zero mass at λ = 0, hence, being of the form (1.2). The shape of g(λ)allows one to determine the density and distribution of the sizes of the particles of a respective sub-regionof the Saturn rings. This information, in turn, should shed light on the question of the origin of the ringsas well as how they reached their current configuration.

In order to identify sub-regions of the Saturn rings with different properties, we segmented thedata using the method presented in Bruni, V., De Canditiis, D., Vitulano, D. (2012) which is designedfor partitioning of complicated signals with several non-isolated and oscillating singularities. In particular,we applied the Gabor Continuous Wavelet Transform (see, e.g. Mallat (2009)) to the data and selectedthe highest scale where the number of wavelet modulus maxima takes minimum value. At this scale, wesegmented the signal by the method proposed in Bruni, V., De Canditiis, D., Vitulano, D. (2012). Weobtained a total of 1531 intervals of different sizes. Figures 4 and 5 refer to six distinct sub-regions ofthe rings. The left panels of both figures show raw data. The right panels exhibit the sample and theestimated frequencies obtained by DDlike estimator for six different intervals that are representative ofdifferent portions of the data set.

Note that in Figure 4, for all three data segments, the estimated parameter π0 = 0. This is nottrue for the first and the second panels of Figure 5 where π0 = 0.5059 and π0 = 0.2463, respectively. Thevalues of ∆ν , defined in (5.3), obtained for the six data segments are, respectively, 0.0128, 0.0159, 0.0022,0.0229, 0.003 and 0.0095, and are consistent with the values obtained in simulations. Both, the right panelsin Figures 4 and 5 and the values of ∆ν , confirm the ability of the estimator developed in the paper toaccurately explain the Saturn’s rings data.

Acknowledgments

Marianna Pensky was partially supported by National Science Foundation (NSF), grant DMS-1407475.Daniela De Canditiis was entirely supported by the “Italian Flagship Project Epigenomic” (http://www.epigen.it/).The authors would like to thank Dr. Joshua Colwell for helpful discussions and for providing the data. Theauthors also would like to thank SAMSI for providing support which allowed the author’s participation inthe 2013-14 LDHD program which was instrumental for writing this paper.

7. Proofs

Proofs of Theorems 1 and 2 are based on the following statement which is a trivial modification of Lemma2 of Pensky (2016).

Lemma 1. (Pensky (2015)). Let f be the true function and fθ be its projection onto the linear span ofthe dictionary LP . Let Υ be a diagonal matrix with components σj, j = 1, · · · , p. Consider solution of the

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 11: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 11

weighted Lasso problem

θ = argmint

tTWWT t− 2tT β + α∥Υt∥1

. (7.1)

with Φ = WTW, β = Φθ andβ = β +

√εΥη + h, η,h ∈ Rp, (7.2)

where h is a nonrandom vector, Eη = 0 and components ηj of vector η are random variables such that, forsome K > 0 and any τ > 0, there is a set

Ω =

ω : max

1≤j≤p|ηj | ≤ K

√(τ + 1) log p

with P(Ω) ≥ 1− 2p−τ . (7.3)

Denote

Ch = max1≤j≤p

[|hj |

σj√ε log p

], Cα = K

√τ + 1 + Ch. (7.4)

If α0 = Cα

√ε log p then, for any τ > 0 and any α ≥ α0, with probability at least 1− 2p−τ , one has

∥fθ− f∥22 ≤ inf

t

[∥ft − f∥22 + 4α∥Υt∥1

]. (7.5)

Moreover, if Assumption A holds and α = ϖα0 where ϖ ≥ (µ + 1)/(µ − 1), then for any τ > 0 withprobability at least 1− 2p−τ , one has

∥fˆθ − f∥22 ≤ inft,J⊆P

∥ft − f∥22 + 4α∥(Υt)Jc∥1 +(1 +ϖ)2C2

α

κ2(µ, J)ε log p

∑j∈J

ν2j

. (7.6)

Proof of Theorem 1. Let vectors b and ξ, respectively, have components bk = ⟨ϕk, f⟩ and ξkdefined in (2.13). It is easy to see that

ξk − bk =1

n

n∑i=1

[ψk(Yi)− Eψk(Yi)] +Hk with Hk = Eψk,ζk

− bk (7.7)

Applying Bernstein inequality, for any x > 0, obtain

P

(∣∣∣∣∣n−1n∑

i=1

[ψk(Yi)− Eψk(Yi)]

∣∣∣∣∣ ≥ xσk√n

)≤ 2 exp

(−x2

[2 +

4xσk∥ψk∥∞3√n

]−1).

Using the fact that A/(B+C) ≥ min(A/(2B), A/(2C)) for any A,B,C > 0, under condition n ≥ N0, derive

P

(∣∣∣∣∣n−1n∑

i=1

[ψk(Yi)− Eψk(Yi)]

∣∣∣∣∣ ≥ xn−1/2 σk

)≤ 2 exp−(x2/4). (7.8)

Choosing x = 2√(τ + 1) log p and recalling that, according to (2.10), |Hk| = n−1/2σk, gather that P(|ξk −

bk| > n−1/2 σk[2√(τ + 1) log p+ 1]) ≤ 2 p−(τ+1), so that

Ω1 =

ω : max

1≤k≤p

[|ξk − bk|σk

]≤

2√(τ + 1) log p+ 1√

n

with P(Ω1) > 1− 2p−τ . (7.9)

Then, validity of Theorem 1 follows directly from Lemma 1 with ε = n−1, ηk = ξk/σk and K = 2.

Proof of Theorem 2. Validity of inequality (4.6) follows from (7.9) and Lemma 1 with ε = σ/√n, K = 2,

hj = n−1/2σj and Ch = (log p)−1/2.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 12: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 12

In order to establish upper bounds for (π0 − π0)2, note that due to (2.17) and (2.18) and π0 ≥ 0, one

has|π0 − π0| ≤ |P(Y = 0)− ν0|+ |uT (θ − θ) + ∆|. (7.10)

For any τ > 0, by Hoeffding inequality obtain P|P(Y = 0)− ν0| ≤

√τn−1 log p

≥ 1 − 2p−τ . Let Ω1 be

the set on which |P(Y = 0)− ν0| ≤√τn−1 log p. Then, P(Ω1) ≥ 1− 2p−τ .

Now, let Ω2 be the set on which inequality (4.6) holds and P(Ω2) ≥ 1− 2p−τ . Observe that

uT (θ − θ) =

∫ ∞

0

e−λ

p∑j=1

(θj − θj)ϕjλ) dλ =

∫ ∞

0

e−λ(f(λ)− f(λ))dλ

where f is the projection of f(λ) onto the linear space spanned by the dictionary D. Therefore, by thedefinition of ∆ in (2.17) obtain that

|uT (θ − θ) + ∆| ≤∣∣∣∣∫ ∞

0

e−λ(f(λ)− f(λ)) dλ

∣∣∣∣ ≤ ∥f − f∥2/√2. (7.11)

Hence, on a set Ω = Ω1 ∩ Ω2 with P(Ω) ≥ 1− 4p−τ , using (7.10) and (7.11), obtain (4.7) which completesthe proof.

References

[1] Antoniadis, A., Sapatinas, T. (2004). Wavelet shringkage for natural exponential families withquadratic variance functions, Biometrika, 88, 805-820.

[2] Besbeas, P., De Feis, I., Sapatinas, T. (2004). A Comparative Simulation Study of Wavelet Shrink-age Estimators for Poisson Counts, International Statistical Review, 72, 209-237.

[3] Bickel, P.J., Ritov, Y., Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector,Ann. Statist., 37, 1705-1732.

[4] Brown, L. D., Cai, T. T., Zhou, H. (2010). Nonparametric regression in exponential families, Ann.Statist., 38, 2005-2046.

[5] Bruni, V., De Canditiis, D., Vitulano, D.(2012). Time-scale energy based analysis of contoursof real-world shapes, Mathematics and Computer in Simulation, 82, 2891-2907.

[6] Buhlmann, P., van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory andApplications, Springer.

[7] Chow, Y.-S., Geman, S., Wu, L.-D. (1983). Consistent cross-validated density estimaton, Ann.Statist., 11, 25-38.

[8] Colwell, J.E., Nicholson, P.D., Tiscareno, M.S., Murray, C.D., French, R.G., Marouf,E.A. (2009). The Structure of Saturn’s Rings, in Saturn from Cassini-Huygens, Dougherty, M., Esposito,L., Krimigis, S. Eds., Springer.

[9] Comte, F., Genon-Catalot, V. (2015). Adaptive Laguerre density estimation for mixed Poissonmodels, Electr. Journ. Statist., 9, 1113-1149.

[10] Dalalyan, A.S., Hebiri, M., Lederer, J. (2014). On the prediction performance of the Lasso,arxiv: 1402.1700

[11] Esposito, L. W., Barth, C. A. , Colwell, J. E., Lawrence, G. M., McClintock, W. E.,Stewart, A. I. F., Keller, H. U., Korth, A., Lauche, H., Festou, M. C., Lane, A. L.,Hansen, C. J., Maki, J. N., West, R. A., Jahn, H., Reulke, R., Warlich, K., Shemansky, D.E., Yung, Y. L. (2004). The Cassini Ultraviolet Imaging Spectrograph investigation, Space Sci. Rev.,115, 294-361.

[12] Fryzlewicz, P., Nason G.P. (2004). A Haar-Fisz Algorithm for Poisson Intensity Estimation,Journ. Computat. Graph. Statis., 13, 621-638.

[13] Harmany, Z., Marcia, R., Willett, R. (2012). This is SPIRAL-TAP: Sparse Poisson IntensityReconstruction ALgorithms: Theory and Practice, IEEE Trans. Image Processing, 21, 1084-1096.

[14] Herngartner, N.W. (1997). Adaptive demixing in Poisson mixture models, Ann. Stat., 25, 917-928.[15] Hirakawa, K., Wolfe, P.J. (2012). Skellam Shrinkage: Wavelet-Based Intensity Estimation for

Inhomogeneous Poisson Data, IEEE Trans. Inf. Theory, 58, 1080-1093.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 13: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 13

[16] Jansen, M. (2006). Multiscale Poisson data smoothing, J. R. Statist. Soc., Ser. B, 68, 27-48.[17] Kolaczyk, E.D. (1999). Bayesian Multiscale Models for Poisson Processes, Journ. Amer. Statist.

Assoc., 94, 920-933.[18] Lambert, D., Tierney, L. (1984). Asymptotic properties of maximum likelihood estimates in the

mixed Poisson model, Ann. Statist., 12, 1388-1399.[19] Lord, D., Washington, S.P., Ivan, J.N. (2005). Poisson, Poisson-gamma and zero-inflated regres-

sion models of motor vehicle crashes: balancing statistical fit and theory, Accid. Anal. Prevent., 37,35-46.

[20] Mallat, S. (2009). A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. Elsevier.[21] Pensky, M. (2015). Solution of linear ill-posed problems using overcomplete dictionaries, ArXiv

1408.3386v2.[22] Timmermann, K.E., Nowak, R.D. (1999). Multiscale Modeling and Estimation of Poisson Processes

with Application to Photon-Limited Imaging, IEEE Trans. Inf. Theory, 45, 846-862.[23] Walter, G. (1985). Orthogonal polynomials estimators of the prior distribution of a compound

Poisson distribution, Sankhya, Ser. A, 47, 222-230.[24] Walter, G. and Hamedani, G. (1991). Bayes empirical Bayes estimation for natural exponential

families with quadratic variance function, Ann. Statist., 19, 1191-1224.[25] Zhang, C.-H. (1995). On estimating mixing densities in discrete exponential family models, Ann.

Statist., 23, 929-947.[26] Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net, JRSS, Ser.B,

67, Part 2, 301-320.

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 14: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 14

Table 1Average values of ∆g (with their standard deviations in parentheses) over 100 simulation runs with n = 10000

test case OPT DDl2 DDlike NDEcase1 0.0007 (0.0010) 0.0023 (0.0028) 0.0022 (0.0030) 0.1183 (0.8307)case2 0.0471 (0.0197) 0.2214 (0.0503) 0.0507 (0.0305) 0.0613 (0.0716)case3 0.0142 (0.0398) 0.0191 (0.0127) 0.0138 (0.0087) 0.0190 (0.0399)case4 0.0043 (0.0021) 0.0054 (0.0032) 0.0061 (0.0052) 0.0298 (0.0657)case5 0.0042 (0.0033) 0.0023 (0.0029) 0.0014 (0.0021) 1.0000 (0.0000)case6 0.0793 (0.0247) 0.4318 (0.0554) 0.0839 (0.0241) 0.3383 (0.0085)case7 0.0067 (0.0012) 0.0009 (0.0008) 0.0008 (0.0008) -case8 0.0060 (0.0010) 0.0068 (0.0009) 0.0069 (0.0010) -case9 0.0085 (0.0013) 0.0099 (0.0014) 0.0111 (0.0026) -

Table 2Average values of ∆ν (with their standard deviations in parentheses) over 100 simulation runs with n = 10000

test case OPT DDl2 DDlike NDEcase1 0.0011 (0.0009) 0.0006 (0.0005) 0.0007 (0.0005) 0.0675 (0.5470)case2 0.0013 (0.0007) 0.0088 (0.0024) 0.0014 (0.0010) 0.0228 (0.0431)case3 0.0002 (0.0001) 0.0006 (0.0005) 0.0003 (0.0003) 0.1130 (0.0653)case4 0.0014 (0.0009) 0.0011 (0.0006) 0.0012 (0.0009) 0.0205 (0.0530)case5 0.0045 (0.0010) 0.0043 (0.0010) 0.0044 (0.0009) 1.0000 (0.0000)case6 0.0465 (0.1402) 0.0288 (0.0045) 0.0041 (0.0020) 0.4376 (0.0618)case7 0.0013 (0.0002) 0.0006 (0.0002) 0.0006 (0.0002) -case8 0.0039 (0.0009) 0.0018 (0.0004) 0.0018 (0.0004) -case9 0.0035 (0.0006) 0.0019 (0.0007) 0.0020 (0.0006) -

Table 3Average values of ∆g (with their standard deviations in parentheses) over 100 simulation runs with n = 5000

test case OPT DDl2 DDlike NDEcase1 0.0006 (0.0008) 0.0038 (0.0049) 0.0030 (0.0046) 0.2424 (1.5002)case2 0.0590 (0.0343) 0.2106 (0.0549) 0.0640 (0.0428) 0.2048 (0.2196)case3 0.0148 (0.0097) 0.0251 (0.0310) 0.0178 (0.0119) 0.0309 (0.0650)case4 0.0055 (0.0019) 0.0074 (0.0051) 0.0086 (0.0060) 0.0493 (0.1123)case5 0.0069 (0.0052) 0.0044 (0.0048) 0.0024 (0.0036) 1.0000 (0.0000)case6 0.0830 (0.0277) 0.4068 (0.0856) 0.0879 (0.0266) 0.3456 (0.0110)case7 0.0077 (0.0035) 0.0023 (0.0030) 0.0023 (0.0030) -case8 0.0065 (0.0021) 0.0074 (0.0020) 0.0075 (0.0021) -case9 0.0096 (0.0023) 0.0114 (0.0023) 0.0128 (0.0029) -

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 15: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 15

Table 4Average values of ∆ν (with their standard deviations in parentheses) over 100 simulation runs with n = 5000

test case OPT DDL2 DDlike NDEcase1 0.0017 (0.0016) 0.0010 (0.0007) 0.0012 (0.0007) 0.1393 (1.0084)case2 0.0020 (0.0013) 0.0090 (0.0029) 0.0022 (0.0014) 0.1184 (0.1475)case3 0.0004 (0.0002) 0.0009 (0.0013) 0.0006 (0.0004) 0.1131 (0.0948)case4 0.0022 (0.0014) 0.0016 (0.0008) 0.0018 (0.0013) 0.0346 (0.0859)case5 0.0087 (0.0019) 0.0084 (0.0018) 0.0085 (0.0018) 1.0000 (0.0000)case6 0.0377 (0.1361) 0.0285 (0.0068) 0.0057 (0.0030) 0.4608 (0.0849)case7 0.0020 (0.0003) 0.0013 (0.0003) 0.0013 (0.0003) -case8 0.0052 (0.0011) 0.0032 (0.0008) 0.0032 (0.0008) -case9 0.0044 (0.0010) 0.0031 (0.0010) 0.0031 (0.0009) -

Table 5Average values of ∆g (with their standard deviations in parentheses) over 100 simulation runs with n = 1000

test case OPT DDl2 DDlike NDEcase1 0.0040 (0.0097) 0.0221 (0.0258) 0.0176 (0.0207) 0.3004 (0.9331)case2 0.0992 (0.0760) 0.1973 (0.0718) 0.1335 (0.0983) 0.5370 (0.0960)case3 0.0533 (0.0889) 0.0753 (0.0838) 0.0662 (0.0894) 0.1127 (0.3912)case4 0.0069 (0.0014) 0.0178 (0.0183) 0.0179 (0.0135) 0.1393 (0.3381)case5 0.0170 (0.0108) 0.0217 (0.0253) 0.0152 (0.0223) 1.0000 (0.0000)case6 0.1223 (0.0710) 0.3151 (0.1409) 0.1270 (0.0759) 0.4479 (0.2572)case7 0.0133 (0.0115) 0.0102 (0.0118) 0.0098 (0.0118) -case8 0.0142 (0.0137) 0.0156 (0.0139) 0.0156 (0.0154) -case9 0.0121 (0.0073) 0.0163 (0.0110) 0.0160 (0.0101) -

Table 6Average values of ∆ν (with their standard deviations in parentheses) over 100 simulation runs with n = 1000

test case OPT DDl2 DDlike NDEcase1 0.0063 (0.0042) 0.0043 (0.0026) 0.0047 (0.0027) 0.1458 (0.5377)case2 0.0076 (0.0051) 0.0117 (0.0047) 0.0084 (0.0048) 0.3498 (0.0937)case3 0.0021 (0.0022) 0.0031 (0.0033) 0.0027 (0.0037) 0.2149 (0.3773)case4 0.0075 (0.0068) 0.0046 (0.0029) 0.0048 (0.0032) 0.0967 (0.3578)case5 0.0427 (0.0091) 0.0411 (0.0090) 0.0416 (0.0090) 1.0000 (0.0000)case6 0.0157 (0.0044) 0.0304 (0.0127) 0.0166 (0.0067) 0.5344 (0.1001)case7 0.0072 (0.0018) 0.0067 (0.0018) 0.0067 (0.0018) -case8 0.0154 (0.0038) 0.0143 (0.0038) 0.0142 (0.0037) -case9 0.0120 (0.0034) 0.0111 (0.0032) 0.0111 (0.0032) -

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 16: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 16

Fig 1: The true density (red) and DDlike estimators (blue) obtained in the first 10 simulation runs withsample size n = 5000

0 5 10 15 20 25 300

0.05

0.1

0.15

0.2

0.25

0.3

0.35case1

0 5 10 15 20 25 300

0.05

0.1

0.15

0.2

0.25

0.3

0.35case2

0 5 10 15 20 25 300

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5case3

0 5 10 15 20 25 300

0.05

0.1

0.15

0.2

0.25

0.3

0.35case4

0 50 100 1500

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05case5

0 50 100 150−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35case6

0 50 100 1500

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05case7

0 50 100 1500

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04case8

0 50 100 1500

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08case9

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 17: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 17

Fig 2: Sample frequencies (red) and estimated frequencies (blue) obtained in the first 10 simulation runswith sample size n = 5000

0 2 4 6 8 10 12 14 16 180

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2case1

0 5 10 15 20 250

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18case2

0 5 10 15 20 25 30 350

0.05

0.1

0.15

0.2

0.25

0.3

0.35case3

0 2 4 6 8 10 12 14 16 180

0.05

0.1

0.15

0.2

0.25case4

0 50 100 1500

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04case5

0 10 20 30 40 50 60 70 80 90−0.05

0

0.05

0.1

0.15

0.2case6

0 10 20 30 40 50 60 70 80 900

0.05

0.1

0.15

0.2

0.25

0.3

0.35case7

0 20 40 60 80 100 120 1400

0.05

0.1

0.15

0.2

0.25case8

0 10 20 30 40 500

0.05

0.1

0.15

0.2

0.25case9

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 18: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 18

Fig 3: The first panel: Names of Saturns rings, courtesy science.nasa.gov. The second panel: the meansof the binned total data set (100 observations per bin). The third panel: the means (red) and the variances(blue) of the binned data

×1040 1 2 3 4 5 6 7 8

0

10

20

30

40

50

60

70

80

90

0 1 2 3 4 5 6 7 8

x 104

0

200

400

600

800

1000

1200

1400

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 19: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 19

Fig 4: Left panels: segments of data. Right panels: sample frequencies (blue) and estimated frequencies withthe penalty parameter obtained by DDlike criterion (red). ∆ν = 0.0128 (top panel), ∆ν = 0.0159 (middlepanel), ∆ν = 0.0022, π0 = 0 for all three cases. (bottom panel)

0.5 1 1.5 2 2.5

x 105

0

20

40

60

80

100

120

data 20 : 298192

0 50 1000

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

sampled and estimated frequencies

1000 2000 3000 4000 50000

10

20

30

40

50

60

70

80

90

100

data 1492993 : 1498291

0 50 1000

0.005

0.01

0.015

0.02

0.025

sampled and estimated frequencies

1000 2000 3000 4000 50000

10

20

30

40

50

60

70

80

90

data 2244569 : 2250341

0 20 40 60 800

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

sampled and estimated frequencies

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016

Page 20: Estimation of a delta-contaminated density of a random intensity …danielad/my_papers/densitypaper... · 2016. 12. 6. · D. De Canditiis and M. Pensky./Estimation of a delta-contaminated

D. De Canditiis and M. Pensky./Estimation of a delta-contaminated density of a random intensity of Poisson data 20

Fig 5: Left panels: segments of data. Right panels: sample frequencies (blue) and estimated frequencieswith the penalty parameter obtained by DDlike criterion (red). ∆ν = 0.0229 and π0 = 0.5059 (top panel),∆ν = 0.003 and π0 = 0.2463 (middle panel), ∆ν = 0.0095 and π0 = 0 (bottom panel)

0.5 1 1.5 2 2.5

x 104

0

10

20

30

40

50

60

70

80

data 2284952 : 2310145

0 20 40 60 800

0.1

0.2

0.3

0.4

0.5

0.6

sampled and estimated frequencies

2 4 6

x 105

0

10

20

30

40

50

60

70

80

data 2864192 : 3491737

0 20 40 60 800

0.1

0.2

0.3

0.4

0.5

0.6

sampled and estimated frequencies

1000 2000 30000

10

20

30

40

50

60

70

80

90

100data 4077229 : 4081012

0 50 1000

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

0.05sampled and estimated frequencies

imsart-ss ver. 2014/10/16 file: densitypaper_pub.tex date: February 9, 2016