Continuous shrinkage prior revisited : a collapsing ...debdeep/pres-GLT.pdf · Continuous shrinkage...
Transcript of Continuous shrinkage prior revisited : a collapsing ...debdeep/pres-GLT.pdf · Continuous shrinkage...
Continuous shrinkage prior revisited: a collapsing behavior and remedy
Se Yoon Lee, Debdeep Pati, Bani K. [email protected] [email protected] [email protected]
Department of Statistics, Texas A & M University
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 1 / 58
High-dimensional Linear Regression
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 2 / 58
High-dimensional linear regression
Consider the high-dimensional setting: predict a vector y ∈ Rn from a setof features X ∈ Rn×p, with p � n
y = Xβ + σε, ε ∼ Nn(0, In), and β is sparse.
β is said to be sparse if substantially many coefficients in β areassumed to be zeros or approximately zeros.
Main objective of high-dimensional linear regression
Main purpose is to estimate β = (β1, · · · , βj , · · · , βp)> ∈ Rp.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 3 / 58
Example: gene-expression data study
y1...yi...yn
=
x11 · · · x1j · · · x1p
......
xi1 · · · xij · · · xip...
...xn1 · · · xnj · · · xnp
β1...βj...βp
+ σ
ε1...εi...εn
yi : a health response from the i-th subject.
xij : the gene-expression level of j-th gene from the i-th subject.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 4 / 58
Sparsity level in general
Let q denote the number of signals in β = (β1, · · · , βp)> ∈ Rp.
We define the following ratio as the sparsity level;
Definition (Sparsity level in general)
s =q
p=
the number of relevant predictors
total number of predictors
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 5 / 58
Sparsity level in cancer study
Definition (Sparsity level in cancer study)
s =q
p=
the number of interesting genes
the number of protein-coding genes
Remark:
The latest estimate for protein-coding gene is p = 21, 306. (Willyard,Nature, 2018).
Geneticists now like to discover more and more interesting genes;growing q movement.
A recent request and movement in gene-expression data study
We need a sparse method which works nicely not only under ultra-sparseregime (s is very small) but also under moderately-sparse regime (s is nottoo small). Our motivation of research is that in Bayesian context, thisissue is closely related with tail of a prior.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 6 / 58
Growing q movement: Beyond BRCA 1 & 2 Movement
List of additionally discovered genes relevant with breast cancer areATM, BARD1, BRIP1, CDH1, CHEK2, MRE11A, MSH6, NBN, PALB2,
PMS2, RAD50, RAD51C, STK11, and TP53.(Visit https://www.breastcancer.org/risk/factors/genetics.)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 7 / 58
Growing q movement: National Institutes of Health (NIH)Research Outline 2019
National Institutes of Health (NIH) Research Outline 2019(Visit https://ghr.nlm.nih.gov/primer.)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 8 / 58
Bayesian approaches to high-dimensional linear regression
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 9 / 58
Bayesian approaches to high-dimensional linear regressionConsider the high-dimensional setting with p � n
y = Xβ + σε, ε ∼ Nn(0, In), and β is sparse.
Bayesian approaches
Sample from the posterior distribution:
π(β|y) ∝ Nn(y|Xβ, σ2In) · π(β),
for some sparsity favoring prior π(β).
(Typically, in Bayesian analysis, prior for the error variance σ2 is given by the Jeffreys
prior: this is not our concern.)
There are two types of priors for π(β):
Spike-and-slab priors
Continuous shrinkage priors (main talk)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 10 / 58
Spike-and-slab type priorsHistorically, spike-and-slab priors have been considered as natural solutions.
Spike-and-slab type priors (George & McCulloch, 1997)
Each component of the β = (β1, · · · , βp)> is assumed to be drawn from
βj |τ, φ ∼ (1− τ) · δ0(βj) + τ · fφ(βj), (j = 1, · · · , p),
where τ = Pr(βj 6= 0) and φ ∼ r(φ). δ0 is the Direc-delta function and fφ is a
density on R with parameter φ.
A drawback: computation is prohibitive.
−4 −2 0 2 4
0.0
0.2
0.4
0.6
0.8
1.0
β
de
nsity
spike
slab
Spike−and−slab
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 11 / 58
Continuous shrinkage priors
Basic idea of continuous shrinkage priors is to mimic the spike-and-slab
prior by a single continuous density.
Global-local-shrinkage priors
βj |λj , τ, σ2 ∼ N1(0, λ2j τ2σ2), σ2 ∼ h(σ2), (j = 1, · · · , p),
λj ∼ f (λj), τ ∼ g(τ), (j = 1, · · · , p),
where h, f , and g are densities supported on (0,∞).
λj : local-scale parameter associated with βj . Detect signal or noisewith magnitude of λj . Provide the heterogeneity for signal detection.
τ : global-scale parameter. Shrinkage β globally towards 0.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 12 / 58
The Horseshoe (Carvalho, Polson, & Scott, 2010, Bka)
Hierarhical formulation of the Horseshoe
βj |λj , τ, σ2 ∼ N1(0, λ2j τ2σ2), σ2 ∼ π(σ2) ∝ 1/σ2, (j = 1, · · · , p),
λj ∼ C+(0, 1), τ ∼ C+(0, 1), (j = 1, · · · , p),
where C+(0, 1) represents the half-Cauchy density:
π(x) = C+(x |0, 1) =1
1 + x2, x > 0.
The Horseshoe has no hyper-parameter.
Posterior computation is very fast: use R package horseshoe.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 13 / 58
How to visualize the Horseshoe ?
Start by fixing τ > 0, and consider the univariate setup:
β|λ ∼ N1(0, τ2λ2) and λ ∼ C+(0, 1).
And then, integrate out λ, to get marinal density
π(β|τ) =
∫ ∞0N1(β|0, τ2λ2) · C+(λ|0, 1)dλ
= K · eZ(β) · E1{Z (β)},
where K = 1/(τ21/2π3/2) and Z (β) = β2/(2τ2). E1(x) =∫∞1 e−xtt−1dt
is the exponential integral function.
A key idea of Horseshoe
Many nice properties of the Horseshoe come from E1{Z (β)}.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 14 / 58
Properties of the HorseshoeThe Horseshoe is characterized by two nice properties:
1 Inifinite spike at origin, i.e., limβ→0 π(β|τ) =∞, τ > 0.2 Heavy-tail of π(β|τ) for any τ > 0.
−4 −2 0 2 4
010
20
30
40
β
density
The Horseshoe
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 15 / 58
Spike-and-slab type prior versus the Horseshoe
−4 −2 0 2 4
0.0
0.2
0.4
0.6
0.8
1.0
β
density
spikeslab
Spike−and−slab
−4 −2 0 2 40
10
20
30
40
β
density
The Horseshoe
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 16 / 58
Other nice properties of the Horseshoe
1 Under the sparsity assumption
s =q
p→ 0 and n, p →∞
The Horseshoe possesses nice theoretical properties. (Bai & Ghosh,2018; Polson, Scott & Scott, 2010; van der Pas, Szabo, & van derVaart, 2017)
2 For e.g., the Horseshoe estimator (posterior mean for β) is robust andattains the minimax-optimal rate up to a constant. (Van Der Pas etal. 2014, 2016)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 17 / 58
How does the tail the Horseshoe behave ?
First, lets define the tail-index typically used in extreme value theory.
Definition (Tail-index = 1/shape parameter)
Suppose random variable X is distributed with cdf F . Then the tail-indexof the density, f = F ′, is the value α > 0, satisfying
F̄ (x) = L(x) · x−α,
where F̄ = 1− F is survival function and L is a slowly varying function.The reciprocal of tail-index, ξ = 1/α, is called the shape parameter.
Key concept in understanding tail-behavior of a density f
tail-heaviness of f ↑ ⇔ tail-index α ↓ 0 ⇔ shape parameter ξ ↑ ∞
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 18 / 58
Drawback: Restricted tail behavior of the Horseshoe
Theorem (Fixed tail-index of the Horseshoe)
Assume β|λ, τ ∼ N1(0, τ2λ2), λ ∼ C+(0, 1), and τ > 0. Then thetail-index of π(β|τ) is fixed with α = 1 for any value of τ > 0.
A drawback of the Horseshoe1 The Horseshoe is only able to control the center part.
2 The Horsehoe does NOT have a tail-index controlling mechanism.
−4 −2 0 2 4
0.0
0.5
1.0
1.5
2.0
β
de
nsity
π(β|τ=1) π(β|τ=0.5) π(β|τ=0.01)
π(β|τ)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 19 / 58
Global-local-tail shrinkage priors
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 20 / 58
Idealistic tail behavior of continuous shrinkage prior
KEY IDEA of Global-local-tail shrinkage priors
Idealistically, as the sparsity-level s = q/p increases, the shape parameter ξ = 1/α of the
marginal density of β should also accordingly increase to accommodate more signals.
Figure: Tail-part of the marginal density of β
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 21 / 58
Idealistic tail behavior of continuous shrinkage prior
KEY IDEA of Global-local-tail shrinkage priors
Idealistically, as the sparsity-level s = q/p increases, the shape parameter ξ = 1/α of the
marginal density of β should also accordingly increase to accommodate more signals.
Figure: Tail-part of the marginal density of β. Tail is lifted to accomodate moresignals.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 22 / 58
Global-local-tail shrinkage prior
We introduce a new framework for continuous shrinakge priors:
Definition (Hierarchy of global-local-tail shrinkage prior)
βj |λj , σ2 ∼ N1(0, λ2j σ2), σ2 ∼ h(σ2), (j = 1, · · · , p),
λj |τ, ξ ∼ f (λj |τ, ξ), (j = 1, · · · , p),
(τ, ξ) ∼ g(τ, ξ),
where f is a density supported on (0,∞) with the scale parameter τ > 0and the shape parameter ξ > 0, and and g is a joint density supported on(0,∞)× (0,∞). h is a density supported on (0,∞)
The local-scale density f plays a key role.
Fully Bayesian inference about ξ is challenging because ξ exists as anexponent part of f .
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 23 / 58
Global-local-tail shrinkage prior
Table: Unit scaled densities for f
f (λ|τ = 1, ξ) Shape ξHalf-α-stable distribution non-closed form ξHalf-Cauchy distribution 2{π(1 + λ2)}−1 1
Half-Levy distribution λ−3/2exp{−1/(2λ)}/√
2π 2Loggamma distribution {(1 + λ)−(1/ξ+1)}/ξ ξ
Generalized extreme value distribution exp{−(1+ξλ)−1/ξ}(1+ξλ)(1/ξ+1) ξ
Generalized Pareto distribution (1 + ξλ)−(1/ξ+1) ξ
If choosing half-Cauchy distribution for f , one can get theHorseshoe: the Horseshoe ∈ the global-local-tail shrinkage priors.
Fully Bayesian inference about ξ is challenging because ξ exists as anexponent part of f .
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 24 / 58
The GLT prior
KEY IDEA of The GLT prior
We use the Generalized Pareto Distribution (GPD) instead of half-Cauchy
distribution.
Definition (Hierarchy of the GLT prior πGLT(β))
βj |λj , σ2 ∼ N1(0, λ2j σ2), σ2 ∼ π(σ2) ∝ 1/σ2, (j = 1, · · · , p),
λj |τ, ξ ∼ GPD(τ, ξ), (j = 1, · · · , p),
τ |ξ ∼ IG(p/ξ + 1, 1),
ξ ∼ log N (µ, ρ2) · I(1/2,∞), µ ∈ R, ρ2 > 0.
(GPD: GPD; IG: inverse-gamma; logN : log-normal)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 25 / 58
The GLT prior
Graphical model representation of
y|β, σ2 ∼ Nn(Xβ, σ2In) and β ∼ πGLT(β).
y
β1
βp
σ2
λ1
λp
τ
ξ µ, ρ2
µ and ρ2: hyper-parameters.
We developed the elliptical slice sampler centered by the Hillestimator which enables
1 to learn the shape parameter ξ according to the sparsity-level,2 and render the posterior computation tuning-free.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 26 / 58
Tail behavior of the GLT priorFor a fixed τ > 0 and ξ > 1/2, consider a univariate setup:
β|λ ∼ N1(0, λ2) and λ ∼ GPD(τ, ξ).
And then, integrate out λ, to get marinal density
π(β|τ, ξ) =
∫ ∞0N1(β|0, λ2) · GPD(λ|τ, ξ)dλ
=∞∑k=0
ak{ψSk (β) + ψR
k (β)},
where ak = (−1)k · K ·(1/ξ+k
k
), K = 1/(τ23/2π1/2), Z (β) = β2ξ2/(2τ 2),
ψSk (β) = Ek/2+1{Z (β)}, and ψR
k (β) = Z (β)−1+1/ξ+k
2 γ{(1 + 1/ξ + k)/2,Z (β)}.Es(x) =
∫∞1
e−xtt−sdt, is the generalized exponential-integral function.
Key massage of this slide
Existence of {ψRk (β)}∞k=0 provides a great flexibility to the shape of the density
π(β|τ, ξ). We call the series of functions tail lifters.
The Horseshoe does NOT have tail lifters.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 27 / 58
The Horseshoe versus The GLT prior
Let’s summarize properties of two priors.
Characteristics of the Horseshoe1 Infinite spike at origin, i.e., limβ→0 π(β|τ) =∞.
2 Heavy-tail of π(β|τ). Tail-index is α = 1 (shape parameter is ξ = 1).
3 [Drawback] Restricted tail behavior of π(β|τ). Fixed tail-index.
Characteristics of the GLT prior
1 Infinite spike at origin, i.e., limβ→0 π(β|τ, ξ) =∞.
2 Heavy-tail of π(β|τ, ξ).
3 Controllable tail behavior of π(β|τ, ξ) through controlling ξ.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 28 / 58
The Horseshoe versus The GLT prior (τ = 1)
Comparison between πHS(β|τ) and πGLT(β|τ, ξ) when τ = 1.
πHS(β|τ): black curve.
πGLT(β|τ, ξ): colored in red (ξ = 1), green (ξ = 1.5), blue (ξ = 2),and violet (ξ = 3), respectively.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 29 / 58
The Horseshoe versus The GLT prior (τ = 0.001)
Comparison between πHS(β|τ) and πGLT(β|τ, ξ) when τ = 0.001.
πHS(β|τ): black curve.
πGLT(β|τ, ξ): colored in red (ξ = 1), green (ξ = 1.5), blue (ξ = 2),and violet (ξ = 3), respectively.
When τ is very small, πHS(β|τ) numerically becomes a Dirac-deltafunction.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 30 / 58
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 31 / 58
Application I: Breast cancer data study
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 32 / 58
Application I: Breast cancer data study
Consider the high-dimensional linear regression;
y = Xβ + σε, ε ∼ Nn(0, In).
We collected data from The Cancer Genome Atlas (TCGA):
(y,X) ∈ Rn × Rn×p: n = 729 breast cancer patients, p = 3250 genes.
yi : logarithm of overall-survival time from the i-th patient.
xij : the gene-expression level of j-th gene from the i-th patient.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 33 / 58
Application I: Breast cancer data studyTo see the behavior of the Horseshoe as the numbe of genes used increase,we constructed four datasets, B1 ⊂ B2 ⊂ B3 ⊂ B4;
B1 = (y,X[·, 1 : 500]) (500 genes),
B2 = (y,X[·, 1 : 1000]) (1,000 genes),
B3 = (y,X[·, 1 : 2000]) (2,000 genes),
and B4 = (y,X[·, 1 : 3250]) (3,250 genes; full data).
Guess on the sparsity level of the four datasets
Sparsity level of the datasets may increase as the number of genesincreases. (Of course, sparsity level is unknown.)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 34 / 58
Top 50 gene ranking plot (the Horseshoe)
−6
−3
0
3
6
gene rank [j]
95
% c
red
ible
in
terv
al o
f β
[j]
−6
−3
0
3
6
gene rank [j]
95
% c
red
ible
in
terv
al o
f β
[j]
−1e−03
−5e−04
0e+00
5e−04
1e−03
gene rank [j]
95
% c
red
ible
in
terv
al o
f β
[j]
−1e−03
−5e−04
0e+00
5e−04
1e−03
gene rank [j]
95
% c
red
ible
in
terv
al o
f β
[j]
Figure: B1 (top-left), B2 (top-right), B3 (bottom-left), and B4 (bottom-right).
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 35 / 58
Top 10 interesting genes (the Horseshoe)
Table: Top 10 interesting genes selected by the Horseshoe,
1 2 3 4 5B1 NGEF(−) PLN(−) C3orf59(+) C21orf63(+) LOC100130331(−)B2 FAM138F(−) SLC39A4(−) PLN(−) NGEF(−) PCGF5(+)B3 NA NA NA NA NAB4 NA NA NA NA NA
6 7 8 9 10B1 FCGR2A(−) HES4(+) BCAP31(−) GSTM1(+) TOB2(−)B2 HES4(+) FCGR2A(−) FCGR2C(−) TOB2(−) BCAP31(−)B3 NA NA NA NA NAB4 NA NA NA NA NA
NOTE: Contents of table is (gene name, direction). Gene with (+) and Gene with (−)may enhance and undermine immune system of breast cancer patients, respectively. Whenthe horseshoe prior is applied to the datasets B3 and B4, genes are unranked because thehorseshoe estimator collapsed.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 36 / 58
Top 50 gene ranking plot (the GLT prior)
−8
−4
0
4
8
gene rank [j]
95%
cre
dib
le inte
rval of
β[j]
−8
−4
0
4
8
gene rank [j]
95%
cre
dib
le inte
rval of
β[j]
−8
−4
0
4
8
gene rank [j]
95
% c
redib
le inte
rval of
β[j]
−8
−4
0
4
8
gene rank [j]
95
% c
redib
le inte
rval of
β[j]
Figure: B1 (top-left), B2 (top-right), B3 (bottom-left), and B4 (bottom-right).
Posterior means of ξ are
2.188(B1) < 2.230(B2) < 2.382(B3) < 2.922(B4).
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 37 / 58
Top 10 interesting genes (the GLT prior)
Table: Top 10 interesting genes selected by the GLT prior
1 2 3 4 5B1 NGEF(−) C21orf63(+) PLN(−) C3orf59(+) FCGR2A(−)B2 FAM138F(−) SLC39A4(−) NGEF(−) PCGF5(+) PLN(−)B3 FAM138F(−) SLC39A4(−) NGEF(−) PLN(−) COL7A1(−)B4 FAM138F(−) NSUN4(−) COL7A1(−) LOC150776(+) NGEF(−)
6 7 8 9 10B1 BCAP31(−) GSTM1(+) LOC100130331 (−) TOB2(−) ABCA17P(+)B2 FCGR2A(−) CRHR1(+) TOB2(−) GSTM1(+) LOC150776(+)B3 CRHR1(+) FCGR2A(−) RPLP1(+) HES4(+) TOB2(−)B4 SMCHD1(+) RPLP1(+) HES4(+) SLC37A2(−) SLC39A4(−)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 38 / 58
Literature coherency with oncology/genetics for theselected genes via the GLT prior for the dataset B4
See our Supplemental materials for more detail.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 39 / 58
Application II: Prostate cancer data study(Bradley Efron, 2008, 2010)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 40 / 58
Prostate cancer data study (Bradley Efron, 2008, 2010)Dataset is based on two-sample test-statistics, {yj}pj=1 (p = 6, 033genes): Data is avaiable in R package sda. See Efron (2008) fordetail.
Histogram of {yj}pj=1.
: cancerous genes (signals)
source: The Future of Indirect Evidence, Bradley Efron, Statistical Science, 2010
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 41 / 58
Prostate cancer data study (Bradley Efron, 2008, 2010)
Consider a sparse normal mean model:y1...yj...yp
=
β1...βj...βp
+ σ
ε1...εj...εp
, εji .i .d .∼ N1(0, 1),
where β = (β1, · · · , βj , · · · , βp)> ∈ Rp is assumed to be sparse.We will use the Horseshoe and GLT priors for β ∈ Rp to compare.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 42 / 58
Prostate cancer data study (Bradley Efron, 2008, 2010)
Theoretically idealistic shrinkage
1 if yj is a noise ⇒ β̂j should shrink. (β̂j ≈ 0)
2 if yj is a signal ⇒ β̂j should NOT shrink. (β̂j ≈ yj)
−4 −2 0 2 4
−4
−2
02
4
y values
estim
ate
s β^
Theoretically Idealistic shrinkage
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 43 / 58
The Horseshoe applied to the prostate cancer dataTo investigate the behavior of the Horseshoe as the number of genesused increases, we constructed four datasets,P1 = {yj}p=50
j=1 , P2 = {yj}p=100j=1 , P3 = {yj}p=200
j=1 , P4 = {yj}p=6033j=1
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 44 / 58
The GLT prior applied to the prostate cancer data
We constructed seven datasets,P1 = {yj}p=50
j=1 , P2 = {yj}p=100j=1 , P3 = {yj}p=200
j=1 , P4 = {yj}p=500j=1 ,
P5 = {yj}p=1000j=1 , P6 = {yj}p=3000
j=1 , and P7 = {yj}p=6033j=1
−6
−3
0
3
6
−6 −3 0 3 6y
β^
p
50100200500
−6
−3
0
3
6
−6 −3 0 3 6y
β^
p
100030006033
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 45 / 58
The GLT prior applied to the prostate cancer data
Posterior means of ξ are
1.620(P1) < 1.662(P2) < 1.789(P3)
< 1.905(P4) < 1.991(P5) < 2.760(P6) < 3.636(P7).
As the number of genes considered increases, the posterior mean of ξalso accordingly increases to accommodate more signals.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 46 / 58
Simulated data study
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 47 / 58
Simulation SettingConsider a simulation environment (n, p, q, SNR):
y = Xβ0 + σ0ε, ε ∼ Nn(0, In)
β0 = (β01, · · · , β0q︸ ︷︷ ︸q signals
, β0q+1, · · · , β0p︸ ︷︷ ︸p-q noises
)>
= (1, · · · , 1︸ ︷︷ ︸q signals
, 0, · · · , 0︸ ︷︷ ︸p-q noises
)>,
Generation of an artificial Data (y,X) ∈ Rn × Rn×p
Step 1. Generate row vectors {xi}ni=1 of design matrix X ∈ Rn×p from
xii .i .d .∼ Np(0, Ip),
and then colume-wisely standardize the matrix so that each vector X[·, j ]has zero mean with unit l2-norm.Step 2. Generate n-dimensional Gaussian errors ε ∼ Nn(0, In).Step 3. Get y = Xβ0 + σ0ε where σ20 = var(Xβ0)/{SNR · var(ε)}.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 48 / 58
Simulated data study
Fix n = 100, p = 500, and SNR=5.
Generated four artificial datasets;1 Generate A1 = (y,X) by choosing q = 2; then, sparsity level is
s = 2500 = 0.004
2 Generate A2 = (y,X) by choosing q = 5; then, sparsity level iss = 5
500 = 0.01
3 Generate A3 = (y,X) by choosing q = 8; then, sparsity level iss = 8
500 = 0.016
4 Generate A4 = (y,X) by choosing q = 13; then, sparsity level iss = 13
500 = 0.026
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 49 / 58
Simulated data study (the Horseshoe)
A1, A2, A3, A4
Figure: green: truth β0 / blue: inference for signals / red: inference for noises
Posterior means of τ are
1.41 · 10−6(A1) < 0.05(A2) < 0.13(A3)� 6.53 · 10−15(A4)
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 50 / 58
Simulated data study (the GLT prior)
A1, A2, A3, A4.
Figure: green: truth β0 / blue: inference for signals / red: inference for noises
Posterior means of ξ are
2.010(A1) < 2.134(A2) < 2.235(A3) < 2.347(A4).
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 51 / 58
Replicated Numerical Studies
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 52 / 58
Simulation setupGenerate 50 replicated datasets under
y = Xβ0 + σ0ε, ε ∼ Nn(0, In)
β0 = (β01, · · · , β0q︸ ︷︷ ︸q signals
, β0q+1, · · · , β0p︸ ︷︷ ︸p-q noises
)>
= (1, · · · , 1︸ ︷︷ ︸q signals
, 0, · · · , 0︸ ︷︷ ︸p-q noises
)>.
Change the sparsity-level q/p from 0.001 to 0.1. (SNR is fixed with 5)
Performance metrics
Report the medians of mean squared errors (MSE) corresponding to signaland noise coefficients measured across the 50 replicates:
MSES =1
q
q∑j=1
(β̂j − 1)2 and MSEN =1
p − q
p∑j=q+1
(β̂j)2.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 53 / 58
Replicated Numerical Studies: varied sparsity level(n, p) = (100, 500) (top) and (n, p) = (200, 1000) (bottom)• and N: GLT / �: Horseshoe
0.25
0.50
0.75
1.00
0.000 0.025 0.050 0.075 0.100
sparsity level
MS
ES
0.001
0.002
0.003
0.004
0.005
0.000 0.025 0.050 0.075 0.100
sparsity level
MS
EN
0.5
1.0
1.5
2.0
2.5
0.000 0.025 0.050 0.075 0.100
sparsity level
τ̂ ,
ξ^
0.25
0.50
0.75
1.00
0.000 0.025 0.050 0.075
sparsity level
MS
ES
0.001
0.002
0.003
0.004
0.000 0.025 0.050 0.075
sparsity level
MS
EN
0.5
1.0
1.5
2.0
2.5
0.000 0.025 0.050 0.075
sparsity level
τ̂ ,
ξ^
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 54 / 58
Summary
Verified an absence of the tail-controlling mechanism of theHorseshoe.
Introduced a new framework for continuous shrinkage priors, calledthe global-local-tail shrinkage priors.
Proposed the GLT prior which is a member of the new framework.
Demonstrated the tail-adaptability of the GLT prior through two geneexpression datasets & numerical studies.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 55 / 58
More detail about the GLT prior
More details are included in our manuscript, for e.g.,
Technical detail of the posterior computation.
Automatic Hyper-parameter Tuning Technique: elliptical sliceSampler centered by the Hill estimator.
Discussion on random shrinkage coefficients.
Theoretical detail of the GLT prior.
More about numerical studies.
Curve fitting study via sparse kernel Gaussian regression.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 56 / 58
Future work (“Big-data Bayesian inference”)
Mean Field Variational Bayes for the Horseshoe and GLT priors inhigh-dimensional setting. Currently, we are making a R package,called the VBhorseGLT.
Application of the GLT prior for large-scale gene-association network.
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 57 / 58
Thank you very much!
Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 58 / 58