Xiongzhi Chen
Email: [email protected]
Department of Mathematics and Statistics
Proportion estimators, Lebesgue-Stieltjes integral equations, and concentration of measures
Outline
Background on proportion estimation
Brief review on existing methods
The new method
Summary
Large-scale multiple hypotheses testing
Identify among hundreds of drugs those that may have a specific side effect
Identify among thousands of genes those that might be associated with a disease
Identify among millions of SNPs those that might be associated with the yield of a crop
Four basic questions
Signals are entities that provide scientific information or that a scientist is searching for. Given a data set, one asks 4 basic questions:
1 Are there any signals? (Testing)
2 If there are signals, how many signals? (Testing)
3 If there are signals, where are they? (Testing)
4 If there are signals, what are their magnitudes? (Estimation)
The proportion of false nulls
Convention: “a signal” is identified with “a false null”, and “a noise” is identified with “a true null”
Among $m$ null hypotheses, the proportion of false nulls, i.e., the signal proportion, is defined as
$$\pi_{1,m} = \frac{\text{Number of false nulls}}{m}$$
Example: in a gene-disease association study, $\pi_{1,m}$ is the ratio of the number of disease-contributing genes to the total number of genes under investigation
Paradigms of sparsity
Dense regime: $\liminf_{m\to\infty} \pi_{1,m} > 0$, i.e., there is always a positive proportion of signals
Moderately sparse regime: $\pi_{1,m} = O(m^{-s})$ for $0 < s < 0.5$
Critically sparse regime: $\pi_{1,m} = O(m^{-0.5})$
Very sparse regime: $\pi_{1,m} = O(m^{-s})$ for $0.5 < s < 1$
Formulation of estimation problem
Data: $\{z_i\}_{i=1}^m$, rv’s each with mean or median $\mu_i$
Hypotheses: $H_{i0}: \mu_i = \mu_0$ vs $H_{i1}: \mu_i \neq \mu_0$
Indices of signals: $I_{1,m} = \{1 \le i \le m : \mu_i \neq \mu_0\}$
Signal proportion: $\pi_{1,m} = m_1 m^{-1}$, where $m_1 = |I_{1,m}|$
Target: to consistently estimate $\pi_{1,m}$
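As a small illustration of the estimand just defined, the sketch below computes the signal proportion from a known vector of means. The means are hypothetical values chosen for illustration; in practice the means are unobserved, so this is the target quantity, not an estimator.

```python
def signal_proportion(mus, mu0):
    """Return pi_{1,m} = |I_{1,m}| / m, where I_{1,m} indexes the mu_i that differ from mu0."""
    signal_indices = [i for i, mu in enumerate(mus) if mu != mu0]
    return len(signal_indices) / len(mus)


# Hypothetical means: m = 10 hypotheses, 3 of which are false nulls.
mus = [0.0, 0.0, 1.2, 0.0, -0.7, 0.0, 0.0, 0.0, 2.5, 0.0]
print(signal_proportion(mus, mu0=0.0))
```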
A brief review on existing approaches
Four approaches to proportion estimation:
1 Hard thresholding: $H_{i0}$ is false when $|z_i| \ge t$ for some $t > 0$
2 Deconvolving two-component mixture model: $\{z_i\}_{i=1}^m$ identically distributed as
$$f(x) = (1-\pi_{1,m}) f_0(x) + \pi_{1,m} f_1(x)$$
3 Bounding an excess of uniform empirical process
4 Fourier transform, implemented for Gaussian family or mixtures with a Gaussian component
A brief review on existing approaches
Disadvantages of the four approaches:
Only Approaches 3 and 4 are consistent when $\pi_{1,m}$ tends to zero
Approach 4 is not applicable when the CDFs of the rv’s do not contain at least one component from a location-shift family
Consistency of Approach 3 requires the CDFs to be absolutely continuous
Consistency of Approaches 1 and 2 requires restrictive modelling assumptions (e.g., $z_i$’s corresponding to false nulls are identically distributed)
The strategy: set-up
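Let $\mathbf{z} = (z_1, \ldots, z_m)$ and $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_m)$
Denote by $F_{\mu_i}$ the CDF of $z_i$ such that $F_{\mu_i} \in \mathcal{F} = \{F_\mu : \mu \in U\}$, where $U$ has a non-empty interior
Each $F_\mu$ is uniquely determined by $\mu$
Recall: $H_{i0}: \mu_i = \mu_0$ vs $H_{i1}: \mu_i \neq \mu_0$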
The strategy: constructing “Phase function”
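For each $\mu \in U$, approximate $1_{\{\mu \neq \mu_0\}}$ by $1 - \psi(t,\mu;\mu_0)$, $t \in \mathbb{R}$, where $\psi$ satisfies
$$\lim_{t\to\infty} \psi(t,\mu_0;\mu_0) = 1 \quad\text{and}\quad \lim_{t\to\infty} \psi(t,\mu;\mu_0) = 0 \ \text{ for } \mu \neq \mu_0 \qquad (1)$$
Define “phase function”
$$\varphi_m(t,\boldsymbol{\mu}) = \frac{1}{m} \sum_{i=1}^{m} \left[1 - \psi(t,\mu_i;\mu_0)\right] \qquad (2)$$
The phase function: $\lim_{t\to\infty} \varphi_m(t,\boldsymbol{\mu}) = \pi_{1,m}$ for any fixed $m$ and $\boldsymbol{\mu}$, i.e., the “Oracle”
$$\Lambda_m(\boldsymbol{\mu}) = \lim_{t\to\infty} \varphi_m(t,\boldsymbol{\mu}) \qquad (3)$$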
The strategy: constructing “Empirical phase function”
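Construct $K: \mathbb{R}^2 \to \mathbb{R}$ that does not depend on any $\mu \neq \mu_0$ and solves the Lebesgue-Stieltjes integral equation
$$\psi(t,\mu;\mu_0) = \int K(t,x;\mu_0)\, dF_\mu(x) \qquad (4)$$
Define “empirical phase function”
$$\varphi_m(t,\mathbf{z}) = \frac{1}{m} \sum_{i=1}^{m} \left[1 - K(t,z_i;\mu_0)\right] \qquad (5)$$
The empirical phase function: for any fixed $m$, $t$ and $\boldsymbol{\mu}$
$$E\left[\varphi_m(t,\mathbf{z})\right] = \varphi_m(t,\boldsymbol{\mu}) \qquad (6)$$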
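The identity (6) follows from (4) and (5) by linearity of expectation. A short worked derivation, assuming each $K(t,z_i;\mu_0)$ is integrable with respect to $F_{\mu_i}$ (an assumption not stated on the slide):

```latex
\begin{aligned}
E\left[\varphi_m(t,\mathbf{z})\right]
  &= \frac{1}{m}\sum_{i=1}^{m}\left[1 - E\,K(t,z_i;\mu_0)\right] \\
  &= \frac{1}{m}\sum_{i=1}^{m}\left[1 - \int K(t,x;\mu_0)\,dF_{\mu_i}(x)\right] \\
  &= \frac{1}{m}\sum_{i=1}^{m}\left[1 - \psi(t,\mu_i;\mu_0)\right]
   = \varphi_m(t,\boldsymbol{\mu}).
\end{aligned}
```

Independence of the $z_i$ is not needed for this step; it only enters when controlling the oscillation around the mean.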
The strategy: intuition for consistency
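Recall “phase function” $\varphi_m(t,\boldsymbol{\mu})$ and $\lim_{t\to\infty} \varphi_m(t,\boldsymbol{\mu}) = \pi_{1,m}$
Recall “empirical phase function” $\varphi_m(t,\mathbf{z})$ and $E[\varphi_m(t,\mathbf{z})] = \varphi_m(t,\boldsymbol{\mu})$
Intuition: $\varphi_m(t,\mathbf{z})$ will consistently estimate $\pi_{1,m}$, i.e.,
$$\Pr\left( \left| \frac{\varphi_m(t_m,\mathbf{z})}{\pi_{1,m}} - 1 \right| \to 0 \right) \to 1 \ \text{ as } m \to \infty \qquad (7)$$
if $e_m(t_m) = \left|\varphi_m(t_m,\mathbf{z}) - \varphi_m(t_m,\boldsymbol{\mu})\right| \to 0$ at an appropriate speed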
The strategy: main difficulties in implementation
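Phase functions may not exist, i.e., there may be no solution in $\psi$ to
$$\lim_{t\to\infty} \psi(t,\mu_0;\mu_0) = 1 \quad\text{and}\quad \lim_{t\to\infty} \psi(t,\mu;\mu_0) = 0 \ \text{ for } \mu \neq \mu_0$$
or in $K$ (independent of $\mu \neq \mu_0$) to
$$\psi(t,\mu;\mu_0) = \int K(t,x;\mu_0)\, dF_\mu(x)$$
The oscillation
$$e_m(t_m) = \left|\varphi_m(t_m,\mathbf{z}) - \varphi_m(t_m,\boldsymbol{\mu})\right|$$
may not converge to 0 when $K$ is not a Lipschitz or bounded transform of $\{z_i\}_{i=1}^m$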
The strategy: summary of results
When $\{z_i\}_{i=1}^m$ are mutually independent, uniformly consistent estimators $\varphi_m(t,\mathbf{z})$ are constructed for the estimation problem in each of the following settings:
Random variables have Riemann-Lebesgue type characteristic functions (RL type CFs)
Random variables have CDFs that are members of natural exponential families with support $\mathbb{N}$
Random variables have CDFs that are members of natural exponential families with separable moments
Construction I: families with Riemann-Lebesgue type CFs
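Recall $\mathcal{F} = \{F_\mu : \mu \in U\}$ and let $\hat{F}_\mu(t) = \int e^{\iota t x}\, dF_\mu(x)$ be the CF of $F_\mu$, where $\iota = \sqrt{-1}$
Let $r_\mu$ be the modulus of $\hat{F}_\mu$, so that $\hat{F}_\mu = r_\mu e^{\iota h_\mu}$
Definition (Riemann-Lebesgue type CF)
If $\{t \in \mathbb{R} : \hat{F}_{\mu_0}(t) = 0\} = \varnothing$, $\sup_{t\in\mathbb{R}} \frac{r_\mu(t)}{r_{\mu_0}(t)} < \infty$ for each $\mu \in U \setminus \{\mu_0\}$, and
$$\lim_{t\to\infty} \frac{1}{t} \int_{[-t,t]} \frac{\hat{F}_\mu(y)}{\hat{F}_{\mu_0}(y)}\, dy = 0, \qquad (8)$$
then $\mathcal{F} = \{F_\mu : \mu \in U\}$ is said to be of “Riemann-Lebesgue type (RL type)” at $\mu_0$ on $U$.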
Construction I: families with Riemann-Lebesgue type CFs
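Let $\omega: [-1,1] \to \mathbb{R}_+$ be bounded with Lebesgue integral equal to 1
Theorem (RVs with RL Type CFs)
Let $\mathcal{F}$ be of RL type and define $K: \mathbb{R}^2 \to \mathbb{R}$ as
$$K(t,x;\mu_0) = \int_{[-1,1]} \frac{\omega(s) \cos\left(tsx - h_{\mu_0}(ts)\right)}{r_{\mu_0}(ts)}\, ds. \qquad (9)$$
Then
$$\psi(t,\mu;\mu_0) = \int_{[-1,1]} \omega(s) \frac{r_\mu(ts)}{r_{\mu_0}(ts)} \cos\left(h_\mu(ts) - h_{\mu_0}(ts)\right) ds. \qquad (10)$$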
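A sketch of why (9) leads to (10), writing $\hat{F}_\mu = r_\mu e^{\iota h_\mu}$ for the CF of $F_\mu$ (the hat is used here only to separate the CF from the CDF) and assuming the order of integration in $s$ and $x$ may be exchanged. For fixed $t$ and $s$:

```latex
\begin{aligned}
\int \cos\left(tsx - h_{\mu_0}(ts)\right) dF_\mu(x)
  &= \Re\left\{ e^{-\iota h_{\mu_0}(ts)} \int e^{\iota tsx}\, dF_\mu(x) \right\} \\
  &= \Re\left\{ r_\mu(ts)\, e^{\iota\left(h_\mu(ts) - h_{\mu_0}(ts)\right)} \right\} \\
  &= r_\mu(ts) \cos\left(h_\mu(ts) - h_{\mu_0}(ts)\right).
\end{aligned}
```

Dividing by $r_{\mu_0}(ts)$, multiplying by $\omega(s)$ and integrating over $s \in [-1,1]$ gives the right-hand side of (10).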
Construction I: application to location-shift families
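$\mathcal{F} = \{F_\mu : \mu \in U\}$ is a location-shift family if and only if $z + \mu'$ has CDF $F_{\mu+\mu'}$ whenever $z$ has CDF $F_\mu$ for $\mu, \mu+\mu' \in U$
Corollary (Construction I for Location-shift families)
If $\mathcal{F}$ is a location-shift family whose CF at $\mu_0$ has no real zeros, i.e., $\{t \in \mathbb{R} : \hat{F}_{\mu_0}(t) = 0\} = \varnothing$, then
$$\psi(t,\mu;\mu_0) = \int K\left(t, y + (\mu-\mu_0); \mu_0\right) dF_{\mu_0}(y) = \int_{[-1,1]} \omega(s) \cos\left(ts(\mu-\mu_0)\right) ds.$$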
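The corollary makes the phase function easy to evaluate. The sketch below assumes the uniform weight $\omega(s) = 1/2$ on $[-1,1]$ (one admissible choice, not necessarily the one used in the talk), for which the integral reduces to $\sin(a)/a$ with $a = t(\mu-\mu_0)$. The means are hypothetical, with $\pi_{1,m} = 0.2$.

```python
import math


def psi_uniform(t, mu, mu0):
    """psi(t, mu; mu0) for omega(s) = 1/2: the integral of (1/2) cos(t s (mu - mu0)) over [-1, 1]."""
    a = t * (mu - mu0)
    return 1.0 if a == 0.0 else math.sin(a) / a


def phase_function(t, mus, mu0):
    """Phase function (2): the average of 1 - psi(t, mu_i; mu0) over all hypotheses."""
    return sum(1.0 - psi_uniform(t, mu, mu0) for mu in mus) / len(mus)


# Hypothetical means: 80 true nulls at mu0 = 0 and 20 false nulls.
mus = [0.0] * 80 + [1.5] * 15 + [-2.0] * 5
for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, phase_function(t, mus, 0.0))
```

The printed values approach 0.2 as $t$ grows, which is the “Oracle” limit for this configuration.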
Construction I: some examples
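Gaussian family $\mathrm{Normal}(\mu,\sigma^2)$ with mean $\mu$ and std dev $\sigma > 0$, for which
$$K(t,x;\mu_0) = \int_{[-1,1]} \exp\left(2^{-1} t^2 s^2 \sigma^2\right) \omega(s) \exp\left(\iota t s (x-\mu_0)\right) ds$$
Laplace family $\mathrm{Laplace}(\mu, 2\sigma^2)$ with mean $\mu$ and std dev $\sqrt{2}\sigma > 0$, for which
$$K(t,x;\mu_0) = \int_{[-1,1]} \left(1 + \sigma^2 t^2 s^2\right) \omega(s) \exp\left(\iota t s (x-\mu_0)\right) ds$$
Cauchy family $\mathrm{Cauchy}(\mu,\sigma)$ with median $\mu$ and scale parameter $\sigma > 0$, for which
$$K(t,x;\mu_0) = \int_{[-1,1]} \exp\left(\sigma |ts|\right) \omega(s) \exp\left(\iota t s (x-\mu_0)\right) ds$$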
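A minimal simulation sketch of the empirical phase function for the Gaussian example. It assumes the even weight $\omega(s) = 1/2$ on $[-1,1]$, for which the complex exponential form above reduces to the cosine form of (9), and it approximates the integral with a midpoint rule. The sample size, signal mean, value of $t$ and number of quadrature nodes are illustrative choices, not values from the talk.

```python
import math
import random


def gaussian_kernel_nodes(t, sigma, n_nodes=64):
    """Midpoint nodes (t*s, weight) for K, with omega(s) = 1/2 and 1/r(ts) = exp(t^2 s^2 sigma^2 / 2)."""
    h = 2.0 / n_nodes
    nodes = []
    for j in range(n_nodes):
        s = -1.0 + (j + 0.5) * h
        weight = 0.5 * math.exp(0.5 * (t * s * sigma) ** 2) * h
        nodes.append((t * s, weight))
    return nodes


def gaussian_kernel(x, mu0, nodes):
    """Approximate K(t, x; mu0) in its cosine form."""
    return sum(w * math.cos(ts * (x - mu0)) for ts, w in nodes)


def empirical_phase_function(t, z, mu0, sigma):
    """Empirical phase function (5): the average of 1 - K(t, z_i; mu0)."""
    nodes = gaussian_kernel_nodes(t, sigma)
    return sum(1.0 - gaussian_kernel(zi, mu0, nodes) for zi in z) / len(z)


random.seed(0)
m, m1 = 10000, 2000                      # hypothetical: pi_{1,m} = 0.2
mus = [3.0] * m1 + [0.0] * (m - m1)      # hypothetical signal mean 3, null mean 0
z = [random.gauss(mu, 1.0) for mu in mus]
estimate = empirical_phase_function(2.0, z, 0.0, 1.0)
print(estimate)
```

The factor $\exp(2^{-1}t^2s^2\sigma^2)$ grows quickly with $t$, so a larger $t$ reduces the bias of the phase function but inflates the variance of the estimate.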
Uniform consistency classes
Definition (Uniform consistency class)
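Given a family $\mathcal{F}$, the sequence of sets $Q_m(\boldsymbol{\mu}, t; \mathcal{F}) \subseteq \mathbb{R}^m \times \mathbb{R}$ for each $m \in \mathbb{N}_+$ is called a “uniform consistency class” for the estimator $\varphi_m(t,\mathbf{z})$ if
$$\Pr\left( \sup_{\boldsymbol{\mu} \in Q_m(\boldsymbol{\mu},t;\mathcal{F})} \left| \pi_{1,m}^{-1} \sup_{t \in Q_m(\boldsymbol{\mu},t;\mathcal{F})} \varphi_m(t,\mathbf{z}) - 1 \right| \to 0 \right) \to 1.$$
Define
$$B_m(\rho) = \left\{ \boldsymbol{\mu} \in \mathbb{R}^m : m^{-1} \sum_{i=1}^{m} |\mu_i - \mu_0| \le \rho \right\} \ \text{ for some } \rho > 0$$
and $u_m = \min\{|\mu_j - \mu_0| : \mu_j \neq \mu_0\}$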
Construction I: applied to location-shift families
Theorem (Uniform consistency for Construction I)
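If $\mathcal{F}$ is a “good” location-shift family, then a uniform consistency class is
$$Q_m(\boldsymbol{\mu}, t; \mathcal{F}) = \left\{ R_m(\rho) = O\left(m^{\vartheta'}\right),\ \tau_m \le \gamma_m,\ u_m \ge \frac{\log\log m}{\gamma'' \tau_m},\ t \in [0,\tau_m],\ \lim_{m\to\infty} \pi_{1,m}^{-1} \Upsilon\left(q,\tau_m,\gamma_m,r_{\mu_0}\right) = 0 \right\},$$
where $R_m(\rho) \sim \rho$, $\gamma_m = \gamma' \log m$, and
$$\Upsilon\left(q,\tau_m,\gamma_m,r_{\mu_0}\right) = \frac{2\|\omega\|_\infty \sqrt{2q\gamma_m}}{\sqrt{m}} \sup_{t\in[0,\tau_m]} \int_{[0,1]} \frac{ds}{r_{\mu_0}(ts)}.$$
Moreover, for all sufficiently large $m$, with probability at least $1 - o(1)$,
$$\sup_{\boldsymbol{\mu}\in B_m(\rho)} \sup_{t\in[0,\tau_m]} \left|\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\mu})\right| \le \Upsilon\left(q,\tau_m,\gamma_m,r_{\mu_0}\right).$$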
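To get a feel for the size of the bound $\Upsilon$, the sketch below evaluates it for the Gaussian family, where $1/r_{\mu_0}(ts) = \exp(2^{-1}t^2s^2\sigma^2)$ is increasing in $t$, so the supremum over $[0,\tau_m]$ is attained at $t = \tau_m$. The values of $q$ and $\gamma'$ are hypothetical placeholders (the slide does not specify them), and $\|\omega\|_\infty = 1/2$ corresponds to the uniform weight on $[-1,1]$.

```python
import math


def upsilon_gaussian(m, tau, q=1.0, gamma_prime=1.0, sigma=1.0, omega_sup=0.5, n_nodes=200):
    """Evaluate Upsilon(q, tau, gamma_m, r_mu0) for a Gaussian null, with gamma_m = gamma' * log(m).

    q, gamma_prime and omega_sup are illustrative assumptions, not values from the talk.
    """
    gamma_m = gamma_prime * math.log(m)
    h = 1.0 / n_nodes
    # Midpoint rule for the integral of 1 / r(tau * s) over s in [0, 1]; the sup over t is at t = tau.
    integral = sum(math.exp(0.5 * (tau * (j + 0.5) * h * sigma) ** 2) for j in range(n_nodes)) * h
    return 2.0 * omega_sup * math.sqrt(2.0 * q * gamma_m) / math.sqrt(m) * integral


for m in (10**4, 10**6):
    for tau in (1.0, 2.0, 3.0):
        print(m, tau, upsilon_gaussian(m, tau))
```

The bound shrinks roughly like $\sqrt{\log m / m}$ in $m$ but grows rapidly in $\tau_m$, which is why $\tau_m$ has to be tied to $\log m$.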
Construction I: interpretation of uniform consistency class
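For location-shift families, recall $r_{\mu_0}$ as the modulus of the CF $\hat{F}_{\mu_0}$ of the CDF $F_{\mu_0}$ and
$$\Upsilon\left(q,\tau_m,\gamma_m,r_{\mu_0}\right) = \frac{2\|\omega\|_\infty \sqrt{2q\gamma_m}}{\sqrt{m}} \sup_{t\in[0,\tau_m]} \int_{[0,1]} \frac{ds}{r_{\mu_0}(ts)}$$
The speed of convergence and the uniform consistency class for the estimator $\varphi_m(t,\mathbf{z})$ are mainly determined by $r_{\mu_0}$ (not by $r_\mu$ with $\mu \neq \mu_0$) and
$$u_m = \min\{|\mu_j - \mu_0| : \mu_j \neq \mu_0\}$$
Uniform consistency can be achieved when $\{z_i\}_{i=1}^m$ do not have bounded first-order absolute moments (e.g., Cauchy family)
It is unlikely that $\varphi_m(t,\mathbf{z})$ can be consistent for $\pi_{1,m} = O(m^{-0.5})$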
A brief review on Natural Exponential Family (NEF)
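Let $\beta$ be a positive non-Dirac Radon measure on $\mathbb{R}$
Let $L(\theta) = \int e^{x\theta} \beta(dx)$ for $\theta \in \mathbb{R}$ and $\Theta$ be the maximal open set containing $\theta$ such that $L(\theta) < \infty$
Assume $\Theta$ is not empty and let $\kappa(\theta) = \log L(\theta)$. Then
$$\mathcal{F} = \left\{ G_\theta : G_\theta(dx) = \exp\left(\theta x - \kappa(\theta)\right) \beta(dx),\ \theta \in \Theta \right\}$$
forms an NEF with respect to the basis $\beta$
Fact: $\Theta$ is convex if it has a non-empty interior; $L$ is analytic on the strip $A_\Theta = \{z \in \mathbb{C} : \Re(z) \in \Theta\}$
Fact: $\mathcal{F}$ can be equivalently characterized as $\mathcal{F} = \{F_\mu : \mu \in U\}$, where $\mu: \Theta \to U$ is the mean function with $U = \mu(\Theta)$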
Constructions II and III: notational set-up
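$\mu_0$ is identified with $\theta_0$, and $\mu = \mu(\theta)$ for $\theta \in \Theta$
Reuse $\psi$ but take it as a function of $t$ and $\theta$
Reuse $K$ but take it as a function of $t$ and $\theta$, such that
$$\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\, dG_\theta(x) \ \text{ for } G_\theta \in \mathcal{F}.$$
Let $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_m)$, the phase function
$$\varphi_m(t,\boldsymbol{\theta}) = \frac{1}{m} \sum_{i=1}^{m} \left[1 - \psi(t,\theta_i;\theta_0)\right]$$
and the empirical phase function
$$\varphi_m(t,\mathbf{z}) = \frac{1}{m} \sum_{i=1}^{m} \left[1 - K(t,z_i;\theta_0)\right]$$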
Constructions II and III: differences from Construction I
Outside families with Riemann-Lebesgue type CFs:
The Fourier transform used to construct $K$ is no longer applicable
$K$ does not always exist as a solution to
$$\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\, dG_\theta(x) \ \text{ for } G_\theta \in \mathcal{F}$$
Even if $K$ does exist, the transform induced by $K$ on $\{z_i\}_{i=1}^m$ is no longer Lipschitz or bounded, which causes great difficulty in deriving concentration inequalities for the oscillation
$$e_m(t_m) = \left|\varphi_m(t_m,\mathbf{z}) - \varphi_m(t_m,\boldsymbol{\mu})\right|$$
Special functions and their asymptotics are needed since NEFs are often related to these functions
Construction II: discrete NEFs with infinite supports
Assume the basis $\beta$ for $\mathcal{F}$ is discrete with support $\mathbb{N}$, i.e., there exists a positive sequence $\{c_k\}_{k\ge 0}$ such that
$$\beta = \sum_{k=0}^{\infty} c_k \delta_k$$
The power series $H(z) = \sum_{k=0}^{\infty} c_k z^k$ with $z \in \mathbb{C}$ must have a positive radius of convergence $R_H$
$H$ is the generating function (GF) of $\beta$
If $\beta$ is a probability measure, then $(-\infty,0] \subseteq \Theta$ and $R_H \ge 1$, and vice versa
Construction II: discrete NEFs with infinite supports
Theorem (Construction for discrete NEF with support N)
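Let $\mathcal{F}$ be an NEF generated by $\beta$ with support $\mathbb{N}$. For $x \in \mathbb{N}$ and $t \in \mathbb{R}$ set
$$K(t,x;\theta_0) = H\left(e^{\theta_0}\right) \int_{[-1,1]} \frac{(ts)^x \cos\left(\frac{\pi x}{2} - ts\, e^{\theta_0}\right)}{H^{(x)}(0)}\, \omega(s)\, ds. \qquad (11)$$
Then
$$\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\, dG_\theta(x) = \frac{H\left(e^{\theta_0}\right)}{H\left(e^{\theta}\right)} \int_{[-1,1]} \cos\left(st\left(e^{\theta} - e^{\theta_0}\right)\right) \omega(s)\, ds. \qquad (12)$$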
Construction II: some examples
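Poisson family $\mathrm{Poisson}(\mu)$ with mean $\mu > 0$. The basis is
$$\beta = \sum_{k=0}^{\infty} \frac{1}{k!} \delta_k \ \text{ with } L(\theta) = \exp\left(e^{\theta}\right) \text{ for } \theta \in \mathbb{R},$$
$\mu(\theta) = e^{\theta}$, $H(z) = e^{z}$ with $R_H = \infty$ and $H^{(k)}(0) = 1$ for all $k \in \mathbb{N}$
Negative Binomial family $\mathrm{NegBinomial}(\theta,n)$ with $\theta < 0$ and $n \in \mathbb{N}_+$. The basis is
$$\beta = \sum_{k=0}^{\infty} \frac{c_k^*}{k!} \delta_k \ \text{ with } c_k^* = \frac{(k+n-1)!}{(n-1)!} \text{ for } k \in \mathbb{N},$$
$L(\theta) = \left(1 - e^{\theta}\right)^{-n}$ with $\theta < 0$, $\mu(\theta) = n e^{\theta}\left(1 - e^{\theta}\right)^{-1}$, $H(z) = (1-z)^{-n}$ with $R_H = 1$, and $H^{(k)}(0) = c_k^*$ for all $k \in \mathbb{N}$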
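For the Poisson family the identity (12) can be checked numerically. The sketch below assumes $\omega(s) = 1/2$ on $[-1,1]$ and uses $H(z) = e^z$, $H^{(x)}(0) = 1$ and $\eta = e^{\theta}$; it compares the expectation of the kernel (11) under $\mathrm{Poisson}(\eta)$ with the closed form $e^{\eta_0-\eta}\sin(a)/a$, $a = t(\eta-\eta_0)$. The midpoint rule, the truncation point `x_max` and the parameter values are illustrative choices.

```python
import math


def poisson_kernel(t, x, eta0, n_nodes=400):
    """Midpoint-rule approximation of K(t, x; theta0) in (11) for the Poisson family, omega(s) = 1/2."""
    h = 2.0 / n_nodes
    total = 0.0
    for j in range(n_nodes):
        s = -1.0 + (j + 0.5) * h
        total += 0.5 * (t * s) ** x * math.cos(0.5 * math.pi * x - t * s * eta0)
    return math.exp(eta0) * total * h


def psi_via_kernel(t, eta, eta0, x_max=60):
    """E[K(t, X; theta0)] for X ~ Poisson(eta), with the sum over x truncated at x_max."""
    pmf = math.exp(-eta)          # P(X = 0)
    total = 0.0
    for x in range(x_max + 1):
        total += pmf * poisson_kernel(t, x, eta0)
        pmf *= eta / (x + 1)      # P(X = x + 1) from P(X = x)
    return total


def psi_closed_form(t, eta, eta0):
    """Right-hand side of (12) for the Poisson family with omega(s) = 1/2."""
    a = t * (eta - eta0)
    return math.exp(eta0 - eta) * (1.0 if a == 0.0 else math.sin(a) / a)


print(psi_via_kernel(1.5, 2.0, 1.0), psi_closed_form(1.5, 2.0, 1.0))
```

The factor $(ts)^x$ makes the kernel unbounded in $x$, which is the non-Lipschitz, unbounded behaviour mentioned earlier as the main obstacle for concentration inequalities.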
Construction II: uniform consistency classes
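Set $\eta = e^{\theta}$ for $\theta \in \Theta$ and $\boldsymbol{\eta} = \left(e^{\theta_1}, \ldots, e^{\theta_m}\right)$.
Theorem (Uniform consistency for Construction II)
Let $\mathcal{F}$ be an NEF generated by $\beta$ with support $\mathbb{N}$ and $\rho$ a finite, positive constant. If $\frac{1}{c_k k!} \le \frac{C}{k!}$ for all $k \in \mathbb{N}$, then a uniform consistency class is
$$Q_{II,1}\left(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma\right) = \left\{ \|\boldsymbol{\theta}\|_\infty \le \rho,\ \pi_{1,m} \ge m^{(\gamma-1)/2},\ t = 2^{-1} \|\boldsymbol{\eta}\|_\infty^{-1/2} \gamma \log m,\ \lim_{m\to\infty} t \min_{i\in I_{1,m}} |\eta_0 - \eta_i| = \infty \right\}$$
for any fixed $\gamma \in (0,1]$.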
Construction II: uniform consistency classes
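The GF of the Poisson family has a constant derivative sequence at 0: $H^{(k)}(0) = c_k k! = 1$ for all $k \in \mathbb{N}$
Theorem (Uniform consistency for Construction II)
For the Poisson family, a uniform consistency class is
$$Q_{II,2}\left(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma\right) = \left\{ \|\boldsymbol{\theta}\|_\infty \le \rho,\ \pi_{1,m} \ge m^{(\gamma'-1)/2},\ t = \sqrt{\|\boldsymbol{\eta}\|_\infty^{-1/2} \gamma \log m},\ \lim_{m\to\infty} t \min_{i\in I_{1,m}} |\eta_0 - \eta_i| = \infty \right\}$$
for any fixed $\gamma \in (0,1)$ and $\gamma' > \gamma$.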
Construction III: continuous NEFs with separable moments
Assume $0 \in \Theta$, so that $\beta$ is a probability measure with finite moments of all orders
Let
$$c_n(\theta) = \frac{1}{L(\theta)} \int x^n e^{\theta x} \beta(dx) = \int x^n\, dG_\theta(x) \ \text{ for } n \in \mathbb{N} \qquad (13)$$
be the moment sequence for $G_\theta \in \mathcal{F}$
Fact: (13) is the Mellin transform of the measure $G_\theta$
Caution: the Mellin transform is not the Fourier transform even though they are connected
Construction III: continuous NEFs with separable moments
Definition (NEF with separable moments)
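If there exist two functions $\zeta, \xi: \Theta \to \mathbb{R}$ and a sequence $\{a_n\}_{n\ge 0}$ that satisfy the following:
$\xi(\theta) \neq \xi(\theta_0)$ whenever $\theta \neq \theta_0$, $\zeta(\theta) \neq 0$ for all $\theta \in U$, and $\zeta$ does not depend on any $n \in \mathbb{N}$,
$c_n(\theta) = \xi^n(\theta)\, \zeta(\theta)\, a_n$ for each $n \in \mathbb{N}$ and $\theta \in \Theta$,
$\Psi(t,\theta) = \sum_{n=0}^{\infty} \frac{t^n \xi^n(\theta)}{a_n n!}$ is absolutely convergent pointwise in $(t,\theta) \in \mathbb{R} \times \Theta$,
then the moment sequence $\{c_n(\theta)\}_{n\ge 0}$ is called “separable” (at $\theta_0$).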
Construction III: uniform consistency classes
Theorem (Construction for NEFs with Separable Moments)
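Assume that the NEF $\mathcal{F}$ has a separable moment sequence $\{c_n(\theta)\}_{n\ge 0}$ at $\theta_0$. For $t, x \in \mathbb{R}$ set
$$K(t,x;\theta_0) = \frac{1}{\zeta(\theta_0)} \int_{[-1,1]} \sum_{n=0}^{\infty} \frac{(-tsx)^n \cos\left(\frac{\pi}{2} n + ts\, \xi(\theta_0)\right)}{a_n n!}\, \omega(s)\, ds. \qquad (14)$$
Then
$$\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\, dG_\theta(x) = \frac{\zeta(\theta)}{\zeta(\theta_0)} \int_{[-1,1]} \cos\left(ts\left(\xi(\theta_0) - \xi(\theta)\right)\right) \omega(s)\, ds. \qquad (15)$$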
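A sketch of how the separability $c_n(\theta) = \xi^n(\theta)\zeta(\theta)a_n$ turns (14) into (15), assuming the sum over $n$ and the integrals may be interchanged (plausible under the absolute convergence of $\Psi$, but stated here as an assumption). For fixed $t$ and $s$:

```latex
\begin{aligned}
\int \sum_{n=0}^{\infty} \frac{(-tsx)^n \cos\left(\frac{\pi}{2}n + ts\,\xi(\theta_0)\right)}{a_n n!}\, dG_\theta(x)
  &= \sum_{n=0}^{\infty} \frac{(-ts)^n c_n(\theta) \cos\left(\frac{\pi}{2}n + ts\,\xi(\theta_0)\right)}{a_n n!} \\
  &= \zeta(\theta) \sum_{n=0}^{\infty} \frac{\left(-ts\,\xi(\theta)\right)^n}{n!} \cos\left(\tfrac{\pi}{2}n + ts\,\xi(\theta_0)\right) \\
  &= \zeta(\theta)\, \Re\left\{ e^{\iota ts\,\xi(\theta_0)} \sum_{n=0}^{\infty} \frac{\left(-\iota ts\,\xi(\theta)\right)^n}{n!} \right\} \\
  &= \zeta(\theta) \cos\left(ts\left(\xi(\theta_0) - \xi(\theta)\right)\right).
\end{aligned}
```

The third line uses $\cos\left(\frac{\pi}{2}n + a\right) = \Re\{\iota^n e^{\iota a}\}$. Dividing by $\zeta(\theta_0)$ and integrating against $\omega(s)$ over $[-1,1]$ gives (15).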
Construction III: one example
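Let $\Gamma$ be Euler’s Gamma function. Gamma family $\mathrm{Gamma}(\theta,\sigma)$:
Its basis $\frac{d\beta}{d\nu}(x) = \frac{x^{\sigma-1} e^{-x}}{\Gamma(\sigma)} 1_{(0,\infty)}(x)$ with $\sigma > 0$
$L(\theta) = (1-\theta)^{-\sigma}$ and $\mu(\theta) = \frac{\sigma}{1-\theta}$
CDF $G_\theta$ and PDF $f_\theta$:
$$\frac{dG_\theta}{d\nu}(x) = f_\theta(x) = (1-\theta)^{\sigma}\, \frac{e^{\theta x} x^{\sigma-1} e^{-x}}{\Gamma(\sigma)}\, 1_{(0,\infty)}(x) \ \text{ for } \theta < 1$$
The Gamma family subsumes the Exponential family and the central Chi-square family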
Construction III: one example
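Fact:
$$c_n(\theta) = (1-\theta)^{\sigma} \int_0^{\infty} \frac{e^{-y} y^{n+\sigma-1}}{\Gamma(\sigma)}\, \frac{dy}{(1-\theta)^{n+\sigma}} = \frac{\Gamma(n+\sigma)}{\Gamma(\sigma)}\, \frac{1}{(1-\theta)^n},$$
which implies $\xi(\theta) = (1-\theta)^{-1}$, $a_n = \frac{\Gamma(n+\sigma)}{\Gamma(\sigma)}$ and $\zeta \equiv 1$
Construction III:
$$K(t,x;\theta_0) = \int_{[-1,1]} \omega(s) \sum_{n=0}^{\infty} \frac{(-tsx)^n\, \Gamma(\sigma) \cos\left(\frac{\pi}{2} n + \frac{ts}{1-\theta_0}\right)}{n!\, \Gamma(\sigma+n)}\, ds,$$
for which
$$\psi(t,\theta;\theta_0) = \int_{[-1,1]} \cos\left(ts\left((1-\theta_0)^{-1} - (1-\theta)^{-1}\right)\right) \omega(s)\, ds$$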
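The moment formula in the Fact above can be sanity-checked by simulation. The sketch below relies on $G_\theta$ being a Gamma distribution with shape $\sigma$ and scale $(1-\theta)^{-1}$, which follows from the PDF $f_\theta$; the parameter values, number of draws and seed are illustrative choices.

```python
import math
import random


def gamma_moment_closed_form(n, theta, sigma):
    """c_n(theta) = Gamma(n + sigma) / Gamma(sigma) * (1 - theta)^(-n)."""
    return math.gamma(n + sigma) / math.gamma(sigma) / (1.0 - theta) ** n


def gamma_moment_monte_carlo(n, theta, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the n-th moment of G_theta (shape sigma, scale 1 / (1 - theta))."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - theta)
    return sum(rng.gammavariate(sigma, scale) ** n for _ in range(n_draws)) / n_draws


exact = gamma_moment_closed_form(2, 0.5, 2.0)
approx = gamma_moment_monte_carlo(2, 0.5, 2.0)
print(exact, approx)
```

Here $a_n = \Gamma(n+\sigma)/\Gamma(\sigma)$ carries the dependence on $n$ alone and $\xi^n(\theta) = (1-\theta)^{-n}$ carries the dependence on $\theta$, which is the separation the construction needs.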
Construction III: comparison with Construction I
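Recall Construction I applied to location-shift family:
$$\psi(t,\mu;\mu_0) = \int K(t,x;\mu_0)\, dF_\mu(x) = \int K\left(t, y + (\mu-\mu_0); \mu_0\right) dF_{\mu_0}(y),$$
where Fourier transform: $K(t,x;\mu_0) \mapsto K\left(t, y + (\mu-\mu_0); \mu_0\right)$
Construction III for Gamma family:
$$\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\, dG_\theta(x) = \int K\left(t, \frac{y}{1-\theta}; \theta_0\right) \beta(dy),$$
where Mellin transform: $K(t,x;\theta_0) \mapsto K\left(t, y(1-\theta)^{-1}; \theta_0\right)$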
Construction III: uniform consistency class
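Set $u_{3,m} = \min_{1\le i\le m} \{1-\theta_i\}$ and $\xi(\theta) = (1-\theta)^{-1}$
Theorem (Uniform Consistency of Construction III)
Consider the Gamma family with a fixed $\sigma > 0$. Let $\rho > 0$ be a finite constant. If $\sigma > 3/4$, then a uniform consistency class is
$$Q_{III}\left(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma\right) = \left\{ \|\boldsymbol{\theta}\| \le \rho,\ t = 4^{-1} \gamma\, u_{3,m} \log m,\ \lim_{m\to\infty} u_{3,m} \log m = \infty,\ \pi_{1,m} \ge m^{(\gamma-1)/2},\ \lim_{m\to\infty} t \min_{i\in I_{1,m}} |\xi(\theta_0) - \xi(\theta_i)| = \infty \right\}$$
for any fixed $\gamma \in (0,1]$.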
Construction III: uniform consistency class
Theorem (Uniform Consistency of Construction III)
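Consider the Gamma family with a fixed $\sigma > 0$. Let $\rho > 0$ be a finite constant. If $\sigma \le 3/4$, then a uniform consistency class is
$$Q_{III}\left(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma\right) = \left\{ \|\boldsymbol{\theta}\| \le \rho,\ t = 4^{-1} \gamma\, u_{3,m} \log m,\ \pi_{1,m} \ge m^{(\gamma'-1)/2},\ \lim_{m\to\infty} t \min_{i\in I_{1,m}} |\xi(\theta_0) - \xi(\theta_i)| = \infty \right\}$$
for any fixed $\gamma \in (0,1)$ and $\gamma' > \gamma$.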
Summary
Solutions to Lebesgue-Stieltjes integral equations can serve as a universal construction for proportion estimators
Under independence, the induced estimators are often uniformly consistent
The uniform consistency class and speed of convergence for each such estimator are largely determined by
the modulus of the CF corresponding to the null parameter for families with Riemann-Lebesgue type CFs (including many location-shift families)
the radius of convergence of the generating function of discrete natural exponential families
the decaying speed of the moment sequence of continuous natural exponential families
A few remaining tasks
Construct estimators via solutions to Lebesgue-Stieltjes integral equations for $\pi_{1,m}$ for composite null hypotheses such as
$H_{i0}: \mu_i \in (a,b)$ vs $H_{i1}: \mu_i \notin (a,b)$
$H_{i0}: \mu_i \le a$ vs $H_{i1}: \mu_i > a$
Uniform consistency of such estimators under dependence outside Gaussian families and their associated mixtures
Classify natural exponential families with separable moment functions
A few remaining tasks
Classify distributions whose CFs $\hat{F}_\mu$ satisfy
$$\left\{ t \in \mathbb{R} : \hat{F}_\mu(t) = 0 \right\} = \varnothing \ \text{ for } \mu \in U' \qquad (16)$$
Completely understand the relationships between families with Riemann-Lebesgue type CFs, those whose CFs satisfy (16), and infinitely divisible distributions
Concentration inequalities for sums of non-Lipschitz, unbounded transforms of independent random variables
Acknowledgements
Kevin Vixie, Hong-Ming Yin, Charles N. Moore, Sheng-Chi Liu for discussions on solutions to integral and algebraic equations
Gérard Letac for guidance on research involving natural exponential families
Sergey Lapin and Ovidiu Costin for providing access to two papers
Jiashun Jin for comments on the manuscript for the work
Mark D. Ward for comments on coefficients of exponential generating functions
New Faculty Seed Grant by Washington State University
Key References
Xiongzhi Chen, Estimators of the proportion of false null hypotheses: I “universal construction via Lebesgue-Stieltjes integral equations and uniform consistency under independence”, arXiv preprint
References in the above preprint