Proportionestimators,Lebesgue-Stieltjes integralequations ... · the Lebesgue-Stieltjes integral...

Xiongzhi ChenEmail: [email protected]

Department of Mathematics and Statistics

Proportion estimators, Lebesgue-Stieltjesintegral equations, and concentration of

measures

http://math.wsu.edu/faculty/xchen

mailto:[email protected]

Outline

Background on proportion estimation

Brief review on existing methods

The new method

Summary

Large-scale multiple hypotheses testing

Identify among hundreds of drugs those that may have aspecific side effect

Identify among thousands of genes those that might beassociated with a disease

Identify among millions of SNPs those that might beassociated with the yield of a crop

Four basic questions

Signals are entities that provide scientific information or that ascientist is searching for. Given a data set, one asks 4 basicquestions:

1 Are there any signals? (Testing)

2 If there are signals, how many signals? (Testing)

3 If there are signals, where are they? (Testing)

4 If there are signals, what are their magnitudes? (Estimation)

The proportion of false nulls

Convention: “a signal” is identified with “a false null”, and “anoise” identified with “a true null”

Among m null hypotheses, the proportion of false nulls, i.e.,the signal proportion, is defined as

π1,m = Number of false nulls

m

Example: in gene-disease association study, π1,m is the ratio ofthe number of disease-contributing genes to the total numberof genes under investigation

Paradigms of sparsity

Dense regime: liminfm→∞π1,m > 0, i.e., there is always apositive proportion of signals

Moderately sparse regime: π1,m = O(m−s

)for 0 < s < 0.5

Critically sparse regime: π1,m = O(m−0.5

)Very sparse regime: π1,m = O

(m−s

)for 0.5 < s < 1

Formulation of estimation problem

Data: {zi}mi=1 rv’s each with mean or median µi

Hypotheses: Hi0 :µi =µ0 vs Hi1 :µi 6=µ0

Indices of signals: I1,m = {1 ≤ i ≤ m :µi 6=µ0

}Signal proportion: π1,m = m1m−1, where m1 =

∣∣I1,m∣∣

Target: to consistently estimate π1,m

A brief review on existing approaches

Four approaches to proportion estimation:

1 Hard thresholding: Hi0 is false when |zi| ≥ t for some t > 0

2 Deconvolving two-component mixture model: {zi}mi=1

identically distributed as

f (x) = (1−π1,m

)f0 (x)+π1,mf1 (x)

3 Bounding an excess of uniform empirical process

4 Fourier transform, implemented for Gaussian family ormixtures with a Gaussian component

A brief review on existing approaches

Disadvantages of the four approaches:

Only Approaches 3 and 4 are consistent when π1,m tends tozero

Approach 4 is not applicable when CDFs of rv’s do not consistat least one component from a location-shift family

Consistency of Approach 3 requires CDFs to be absolutelycontinuous

Consistency of Approaches 1 and 2 require restrictivemodelling assumptions (e.g., zi’s corresponding to false nullsare identically distributed)

The strategy: set-up

Let z = (z1, . . . ,zm) and µ= (µ1, ...,µm

)Denote by Fµi the CDF of zi such that Fµi ∈F = {

Fµ :µ ∈ U},

where U has a non-empty interior

Each Fµ is uniquely determined by µ

Recall: Hi0 :µi =µ0 vs Hi1 :µi 6=µ0

The strategy: constructing “Phase function”

For each µ ∈ U , approximate 1{µ6=µ0} by ψ(t,µ;µ0

), t ∈R such

that

limt→∞ψ

(t,µ0;µ0

)= 1 and limt→∞ψ

(t,µ;µ0

)= 0 for µ 6=µ0 (1)

Define “phase function”

ϕm(t,µ

)= 1

m

m∑i=1

[1−ψ(

t,µi;µ0)]

(2)

The phase function: limt→∞ϕm(t,µ

)=π1,m for any fixed mand µ, i.e., the “Oracle”

Λm(µ

)= limt→∞ϕm

(t,µ

)(3)

The strategy: constructing “Empirical phase function”

Construct K :R2 →R that is independent of µ 6=µ0 and solvesthe Lebesgue-Stieltjes integral equation

ψ(t,µ;µ0

)= ∫K

(t,x;µ0

)dFµ (x) (4)

Define “empirical phase function”

ϕm (t,z) = 1

m

m∑i=1

[1−K

(t,zi;µ0

)](5)

The empirical phase function: for any fixed m, t and µ

E[ϕm (t,z)

]=ϕm(t,µ

)(6)

The strategy: intuition for consistency

Recall “phase function” ϕm(t,µ

)and limt→∞ϕm

(t,µ

)=π1,m

Recall “empirical phase function” ϕm (t,z) and

E[ϕm (t,z)

]=ϕm(t,µ

)Intuition: ϕm (t,z) will consistently estimate π1,m, i.e.,

Pr

(∣∣∣∣ ϕm (tm,z)

π1,m−1

∣∣∣∣→ 0

)→ 1 as m →∞ (7)

if

em (tm) = ∣∣ϕm (tm,z)−ϕm(tm,µ

)∣∣→ 0 at appropriate speed

The strategy: main difficulties in implementation

Phase functions may not exist, i.e., no solution in ψ to

limt→∞ψ

(t,µ0;µ0

)= 1 and limt→∞ψ

(t,µ;µ0

)= 0 for µ 6=µ0

or in K (independent of µ 6=µ0) to

ψ(t,µ;µ0

)= ∫K

(t,x;µ0

)dFµ (x)

The oscillation


)∣∣may not converge to 0 when K is not a Lipschitz or boundedtransform of {zi}m

i=1

The strategy: summary of results

When {zi}mi=1 are mutually independent, uniformly consistent

estimators ϕm (t,z) are constructed respectively for the estimationproblem in the settings where

Random variables have Riemann-Lebesgue type characteristicfunctions (RL Type CFs)

Random variables have CDFs that are members of naturalexponential families with support N

Random variables have CDFs that are members of naturalexponential families with separable moments

Construction I: families with Riemann-Lebesgue type CFs

Recall F = {Fµ :µ ∈ U

}and let Fµ (t) = ∫

eιtxdFµ (x) be the CF ofFµ where ι=p−1

Let rµ be the modulus of Fµ, so that Fµ = rµeιhµ

Definition (Riemann-Lebesgue type CF)

If{t ∈R : Fµ0 (t) = 0

}=∅, supt∈Rrµ(t)rµ0 (t) <∞ for each µ ∈ U\

{µ0

}, and

limt→∞

1

t

∫[−t,t]

Fµ(y)

Fµ0

(y)dy = 0, (8)

then F = {Fµ :µ ∈ U

}is said to be of “Riemann-Lebesgue type (RL

type)” at µ0 on U .

Construction I: families with Riemann-Lebesgue type CFs

Let ω : [−1,1] →R+ be bounded and Lebesgue-integrates to 1

Theorem (RVs with RL Type CFs)

Let F be of RL type and define K :R2 →R as

K(t,x;µ0

)= ∫[−1,1]

ω (s)cos(tsx−hµ0 (ts)

)rµ0 (ts)

ds. (9)

Then

ψ(t,µ;µ0

)= ∫[−1,1]

ω (s)rµ (ts)

rµ0 (ts)cos

(hµ (ts)−hµ0 (ts)

)ds. (10)

Construction I: application to location-shift families

F = {Fµ :µ ∈ U

}is a location-shift family if and only if z+µ′

has CDF Fµ+µ′ whenever z has CDF Fµ for µ,µ+µ′ ∈ U

Corollary (Construction I for Location-shift families)

If F is a location-shift family for which{t ∈R : Fµ0 (t) = 0

}=∅, then

ψ(t,µ;µ0

)= ∫K

(t,y+ (

µ−µ0)

;µ0)

dFµ0

(y)

=∫

[−1,1]ω (s)cos

(ts

(µ−µ0

))ds.

Construction I: some examples

Gaussian family Normal(µ,σ2

)with mean µ and std dev σ> 0,

for which

K(t,x;µ0

)= ∫[−1,1]

exp(2−1t2s2σ2)ω (s)exp

(ιts

(x−µ0

))ds

Laplace family Laplace(µ,2σ2

)with mean µ and std devp

2σ> 0, for which

K(t,x;µ0

)= ∫[−1,1]

(1+σ2t2s2)ω (s)exp

(ιts

(x−µ0

))ds

Cauchy family Cauchy(µ,σ

)with median µ and scale

parameter σ> 0, for which

K(t,x;µ0

)= ∫[−1,1]

exp(σ |ts|)ω (s)exp(ιts

(x−µ0

))ds

Uniform consistency classes

Definition (Uniform consistency class)

Given a family F , the sequence of sets Qm(µ, t;F

)⊆Rm ×R foreach m ∈N+ is called a “uniform consistency class” for theestimator ϕm (t,z) if

Pr(supµ∈Qm(µ,t;F)

∣∣∣π−11,m supt∈Qm(µ,t;F) ϕm (t,z)−1

∣∣∣→ 0)→ 1.

Define

Bm(ρ)= {

µ ∈Rm : m−1∑m

i=1

∣∣µi −µ0∣∣≤ ρ}

for some ρ > 0

and um = min{∣∣µj −µ0

∣∣ :µj 6=µ0}

Construction I: applied to location-shift families

Theorem (Uniform consistency for Construction I)

If F is a “good” location-shift family, then a uniform consistencyclass is

Qm(µ, t;F

)={Rm

(ρ)= O

(mϑ′)

,τm ≤ γm,um ≥ loglogmγ′′τm

,

t ∈ [0,τm] , limm→∞π

−11,mΥ

(q,τm,γm,rµ0

)= 0

},

where Rm(ρ)∼ ρ, γm = γ′ logm

Υ(q,τm,γm,rµ0

)= 2‖ω‖∞√

2qγmpm

supt∈[0,τm]

∫[0,1]

ds

rµ0 (ts).

Moreover, for all sufficiently large m, with probability at least1−o (1),

supµ∈Bm(ρ)

supt∈[0,τm]

∣∣ϕm (t,z)−ϕm(t,µ

)∣∣≤Υ(q,τm,γm,rµ0

).

Construction I: interpretation of uniform consistency class

For location-shift families, recall rµ0 as the modulus of CF Fµ0 forCDF Fµ0 and

Υ(q,τm,γm,rµ0

)= 2‖ω‖∞√

2qγmpm

supt∈[0,τm]

∫[0,1]

ds

rµ0 (ts)

The speed of convergence and uniform consistency class forthe estimator ϕm (t,z) is mainly determined by rµ0 (not by rµwith µ 6=µ0) and

um = min{∣∣µj −µ0

∣∣ :µj 6=µ0}

Uniform consistency can be achieved when {zi}mi=1 do not have

bounded first-order absolute moment (e.g., Cauchy family)

It is unlikely that ϕm (t,z) can be consistent for π1,m = O(m−0.5

)

A brief review on Natural Exponential Family (NEF)

Let β be a positive non-Dirac Radon measure on R

Let L (θ) = ∫exθβ (dx) for θ ∈R andΘ be the maximal open set

containing θ such that L (θ) <∞AssumeΘ is not empty and let κ (θ) = logL (θ). Then

F = {Gθ : Gθ (dx) = exp(θx−κ (θ))β (dx) ,θ ∈Θ}

forms an NEF with respect to the basis β

Fact: Θ is convex if it has a non-empty interior; L is analytic onthe strip AΘ = {z ∈C : ℜ (z) ∈Θ}

Fact: F can be equivalently characterized as F = {Fµ :µ ∈ U

},

where µ :Θ→ U is the mean function with U =µ (Θ)

Constructions II and III: notational set-up

µ0 is identified with θ0, and µ=µ (θ) for θ ∈ΘReuse ψ but take it as a function of t and θ

Reuse K but take it as a function of t and θ, such that

ψ (t,θ;θ0) =∫

K (t,x;θ0)dGθ (x) for Gθ ∈F .

Let θ = (θ1, . . . ,θm), the phase function

ϕm (t,θ) = 1

m

m∑i=1

[1−ψ (t,θi;θ0)

]and the empirical phase function

ϕm (t,z) = 1

m

m∑i=1

[1−K (t,zi;θ0)]

Constructions II and III: differences from Construction I

Outside families with Riemann-Lebesgue type CFs

Fourier transform used to construct K is no longer applicable

K does not always exist as a solution to

ψ (t,θ;θ0) =∫

K (t,x;θ0)dGθ (x) for Gθ ∈F

Even if K does exist, the transform induced by K on {zi}mi=1 is no

longer Lipschitz or bounded, which causes great difficulty inderiving concentration inequalities for the oscillation


)∣∣Special functions and their asymptotics are needed since NEFsare often related to these functions

Construction II: discrete NEFs with infinite supports

Assume the basis β for F is discrete with supportN, i.e., there existsa positive sequence {ck}k≥0 such that

β=∑∞k=0 ckδk

The power series H (z) =∑∞k=0 ckzk with z ∈C must have a

positive radius of convergence RH

h is the generating function (GF) of β

If β is a probability measure, then (−∞,0] ⊆Θ and RH ≥ 1, andvice versa

Construction II: discrete NEFs with infinite supports

Theorem (Construction for discrete NEF with support N)

Let F be an NEF generated by β with support N. For x ∈N and t ∈Rset

K (t,x;θ0) = H(eθ0

)∫[−1,1]

(ts)x cos(πx2 − tseθ0

)H (x) (0)

ω (s)ds. (11)

Then

ψ (t,θ;θ0) =∫

K (t,x;θ0)dGθ (x)

= H(eθ0

)H

(eθ

) ∫[−1,1]

cos(st

(eθ−eθ0

))ω (s)ds. (12)

Construction II: some examples

Poisson family Poissson(µ)

with mean µ> 0. The basis

β=∞∑

k=0

1

k!δk with L (θ) = exp

(eθ

)for θ ∈R,

µ (θ) = eθ, H (z) = ez with RH =∞ and H (k) (0) = 1 for all k ∈N

Negative Binomial family NegBinomial (θ,n) with θ < 0 andn ∈N+. The basis is

β=∞∑

k=0

c∗kk!δk with c∗k = (k+n−1)!

(n−1)!for k ∈N,

L (θ) = (1−eθ

)−nwith θ < 0, µ (θ) = neθ

(1−eθ

)−1,

H (z) = (1−z)−n with RH = 1, and H (k) (0) = c∗k for all k ∈N

Construction II: uniform consistency classes

Set η= eθ for θ ∈Θ and η= (eθ1 , . . . ,eθm

).

Theorem (Uniform consistency for Construction II)

Let F be an NEF generated by β with support N and ρ a finite,positive constant. If 1

ckk! ≤ Ck! for all k ∈N, then a uniform consistency

class is

QII,1(θ, t,π1,m;γ

)=

‖θ‖∞ ≤ ρ,π1,m ≥ m(γ−1)/2,

t = 2−1∥∥η∥∥−1/2

∞ γ logm,limm→∞ t mini∈I1,m

∣∣η0 −ηi∣∣=∞

for any fixed γ ∈ (0,1].

Construction II: uniform consistency classes

The GF of Poisson family has a constant derivative sequence at0: H (k) (0) = ckk! = 1 for all k ∈N

Theorem (Uniform consistency for Construction II)

For Poisson family, a uniform consistency class is

QII,2(θ, t,π1,m;γ

)=

‖θ‖∞ ≤ ρ,π1,m ≥ m(γ′−1)/2,

t =√∥∥η∥∥−1/2

∞ γ logm,limm→∞ t mini∈I1,m

∣∣η0 −ηi∣∣=∞

for any fixed γ ∈ (0,1) and γ′ > γ.

Construction III: continuous NEFs with separable moments

Assume 0 ∈Θ, so that β is a probability measure with finitemoments of all orders

Let

cn (θ) = 1

L (θ)

∫xneθxβ (dx) =

∫xndGθ (x) for n ∈N (13)

be the moment sequence for Gθ ∈F

Fact: (13) is the Mellin transform of the measure Gθ

Caution: Mellin transform is not Fourier transform eventhough they are connected

Construction III: continuous NEFs with separable moments

Definition (NEF with separable moments)

If there exist two functions ζ,ξ :Θ→R and a sequence {an}n≥0 thatsatisfy the following:

ξ (θ) 6= ξ (θ0) whenever θ 6= θ0, ζ (θ) 6= 0 for all θ ∈ U , and ζ doesnot depend on any n ∈N,

cn (θ) = ξn (θ)ζ (θ) an for each n ∈N and θ ∈Θ,

Ψ (t,θ) =∑∞n=0

tnξn(θ)ann! is absolutely convergent pointwise in

(t,θ) ∈R×Θ,

then the moment sequence {cn (θ)}n≥0 is called “separable” (at θ0).

Construction III: uniform consistency classes

Theorem (Construction for NEFs with Separable Moments)

Assume that the NEF F has a separable moment sequence {cn (θ)}n≥0

at θ0. For t,x ∈R set

K(t,x;µ0

)= 1

ζ (θ0)

∫[−1,1]

∞∑n=0

(−tsx)n cos(π2 n+ tsξ (θ0)

)ann!

ω (s)ds. (14)

Then

ψ(t,µ;µ0

)= ∫K (t,x;θ0)dGθ (x)

= ζ (θ)

ζ (θ0)

∫[−1,1]

cos(ts (ξ (θ0)−ξ (θ)))ω (s)ds. (15)

Construction III: one example

Let Γ be Euler’s Gamma function. Gamma family Gamma (θ,σ):

Its basis dβdν (x) = xσ−1e−x

Γ(σ) 1(0,∞) (x)dx with σ> 0

L (θ) = (1−θ)−σ and µ (θ) = σ1−θ

CDF and PDF:

dGθ

dν(x) = fθ (x) = (1−θ)σ

eθxxσ−1e−x

Γ (σ)1(0,∞) (x) for θ < 1

The Gamma family subsumes Exponential family and centralChi-square family

Construction III: one example

Fact:

cn (θ) = (1−θ)σ∫ ∞

0

e−yyn+σ−1

Γ (σ)

dy

(1−θ)n+σ−1 = Γ (n+σ)

Γ (σ)

1

(1−θ)n ,

which implies ξ (θ) = (1−θ)−1, an = Γ(n+σ)Γ(σ) and ζ≡ 1

Construction III:

K (t,x;θ0) =∫

[−1,1]ω (s)

∞∑n=0

(−tsx)nΓ (σ)cos(π2 n+ ts

1−θ0

)n!Γ (σ+n)

ds,

for which

ψ (t,θ;θ0) =∫

[−1,1]cos

(ts

((1−θ0)−1 − (1−θ)−1))ω (s)ds

Construction III: comparison with Construction I

Recall Construction I applied to location-shift family:

ψ(t,µ;µ0

)= ∫K

(t,x;µ0

)dFµ (x)

=∫

K(t,y+ (

µ−µ0)

;µ0)

dFµ0

(y)

,

where Fourier transform: K(t,x;µ0

) 7→ K(t,y+ (

µ−µ0)

;µ0)

Construction III for Gamma family:

ψ (t,θ;θ0) =∫

K (t,x;θ0)dGθ (x)

=∫

K(t,

y

1−θ ;θ0

)β

(dy

),

where Mellin transform: K (t,x;θ0) 7→ K(t,y (1−θ)−1 ;θ0

)

Construction III: uniform consistency class

Set u3,m = min1≤i≤m {1−θi} and ξ (θ) = (1−θ)−1

Theorem (Uniform Consistency of Construction III)

Consider Gamma family with a fixed σ> 0. Let ρ > 0 be a finiteconstant. If σ> 3/4, then a uniform consistency class is

QIII(θ, t,π1,m;γ

)=

‖θ‖ ≤ ρ, t = 4−1γu3,m logm,limm→∞ u3,m logm =∞,

π1,m ≥ m(γ−1)/2,limm→∞ t mini∈I1,m |ξ (θ0)−ξ (θi)| =∞

for any fixed γ ∈ (0,1].

Construction III: uniform consistency class

Theorem (Uniform Consistency of Construction III)

Consider Gamma family with a fixed σ> 0. Let ρ > 0 be a finiteconstant. If σ≤ 3/4, then

QIII(θ, t,π1,m;γ

)=

‖θ‖ ≤ ρ, t = 4−1γu3,m logm,

π1,m ≥ m(γ′−1)/2,limm→∞ t mini∈I1,m |ξ (θ0)−ξ (θi)| =∞

for any fixed γ ∈ (0,1) and γ′ > γ.

Summary

Solutions to Lebesgue-Stieltjes integral equation can serve asuniversal construction for proportion estimators

Under independence, the induced estimators are oftenuniformly consistent

The uniform consistency class and speed of convergence foreach such estimator are largely determined by

the modulus of the CF corresponding to the null parameter forfamilies with Riemann-Lebesgue type CFs (including manylocation-shift families)the radius of convergence of the generating function of discretenatural exponential familiesthe decaying speed of the moment sequence of continuousnatural exponential families

A few remaining tasks

Construct estimators via solutions to Lebesgue-Stieltjesintegral equations for π1,m for composite null hypotheses suchas

Hi0 :µi ∈ (a,b) vs Hi1 :µi ∉ (a,b)Hi0 :µi ≤ a vs Hi1 :µi > a

Uniform consistency of such estimators under dependenceoutside Gaussian families and their associated mixtures

Classify natural exponential families with separable momentfunctions

A few remaining tasks

Classify distributions whose CFs satisfy{t ∈R : Fµ (t) = 0

}=∅ for µ ∈ U ′ (16)

Completely understand the relationships between familieswith Riemann-Lebesgue type CFs, those whose CFs satisfy(16), and infinitely divisible distributions

Concentration inequalities for sums of non-Lipschitz,unbounded transforms of independent randomize variables

Acknowledgements

Kevin Vixie, Hong-Ming Yin, Charles N. Moore, Sheng-Chi Liufor discussions on solutions to integral and algebraic equations

Gérard Letac for guidance on research involving naturalexponential families

Sergey Lapin and Ovidiu Costin for providing access to twopapers

Jiashun Jin for comments on manuscript for the work

Mark D. Ward for comments on coefficients of exponentialgenerating functions

New Faculty Seed Grant by Washington State University

Key References

Xiongzhi Chen, Estimators of the proportion of false nullhypotheses: I “universal construction via Lebesgue-Stieltjesintegral equations and uniform consistency underindependence”, arxiv preprint

References in the above preprint

Proportionestimators,Lebesgue-Stieltjes integralequations ... · the Lebesgue-Stieltjes integral...

Documents

Transcript of Proportionestimators,Lebesgue-Stieltjes integralequations ... · the Lebesgue-Stieltjes integral...