Continuous shrinkage prior revisited : a collapsing ...debdeep/pres-GLT.pdf · Continuous shrinkage...

Continuous shrinkage prior revisited: a collapsing behavior and remedy

Se Yoon Lee, Debdeep Pati, Bani K. [email protected] [email protected] [email protected]

Department of Statistics, Texas A & M University

Se Yoon Lee, Debdeep Pati, Bani K. Mallick [email protected] [email protected] [email protected] ( Department of Statistics, Texas A & M University )Continuous shrinkage prior revisited : a collapsing behavior and remedy 1 / 58

High-dimensional Linear Regression


High-dimensional linear regression

Consider the high-dimensional setting: predict a vector y ∈ Rn from a setof features X ∈ Rn×p, with p � n

y = Xβ + σε, ε ∼ Nn(0, In), and β is sparse.

β is said to be sparse if substantially many coefficients in β areassumed to be zeros or approximately zeros.

Main objective of high-dimensional linear regression

Main purpose is to estimate β = (β1, · · · , βj , · · · , βp)> ∈ Rp.


Example: gene-expression data study

y1...yi...yn

=

x11 · · · x1j · · · x1p

......

xi1 · · · xij · · · xip...

...xn1 · · · xnj · · · xnp

β1...βj...βp

+ σ

ε1...εi...εn

yi : a health response from the i-th subject.

xij : the gene-expression level of j-th gene from the i-th subject.


Sparsity level in general

Let q denote the number of signals in β = (β1, · · · , βp)> ∈ Rp.

We define the following ratio as the sparsity level;

Definition (Sparsity level in general)

s =q

p=

the number of relevant predictors

total number of predictors


Sparsity level in cancer study

Definition (Sparsity level in cancer study)

s =q

p=

the number of interesting genes

the number of protein-coding genes

Remark:

The latest estimate for protein-coding gene is p = 21, 306. (Willyard,Nature, 2018).

Geneticists now like to discover more and more interesting genes;growing q movement.

A recent request and movement in gene-expression data study

We need a sparse method which works nicely not only under ultra-sparseregime (s is very small) but also under moderately-sparse regime (s is nottoo small). Our motivation of research is that in Bayesian context, thisissue is closely related with tail of a prior.


Growing q movement: Beyond BRCA 1 & 2 Movement

List of additionally discovered genes relevant with breast cancer areATM, BARD1, BRIP1, CDH1, CHEK2, MRE11A, MSH6, NBN, PALB2,

PMS2, RAD50, RAD51C, STK11, and TP53.(Visit https://www.breastcancer.org/risk/factors/genetics.)


Growing q movement: National Institutes of Health (NIH)Research Outline 2019

National Institutes of Health (NIH) Research Outline 2019(Visit https://ghr.nlm.nih.gov/primer.)


Bayesian approaches to high-dimensional linear regression


Bayesian approaches to high-dimensional linear regressionConsider the high-dimensional setting with p � n

y = Xβ + σε, ε ∼ Nn(0, In), and β is sparse.

Bayesian approaches

Sample from the posterior distribution:

π(β|y) ∝ Nn(y|Xβ, σ2In) · π(β),

for some sparsity favoring prior π(β).

(Typically, in Bayesian analysis, prior for the error variance σ2 is given by the Jeffreys

prior: this is not our concern.)

There are two types of priors for π(β):

Spike-and-slab priors

Continuous shrinkage priors (main talk)


Spike-and-slab type priorsHistorically, spike-and-slab priors have been considered as natural solutions.

Spike-and-slab type priors (George & McCulloch, 1997)

Each component of the β = (β1, · · · , βp)> is assumed to be drawn from

βj |τ, φ ∼ (1− τ) · δ0(βj) + τ · fφ(βj), (j = 1, · · · , p),

where τ = Pr(βj 6= 0) and φ ∼ r(φ). δ0 is the Direc-delta function and fφ is a

density on R with parameter φ.

A drawback: computation is prohibitive.

−4 −2 0 2 4

0.0

0.2

0.4

0.6

0.8

1.0

β

de

nsity

spike

slab

Spike−and−slab


Continuous shrinkage priors

Basic idea of continuous shrinkage priors is to mimic the spike-and-slab

prior by a single continuous density.

Global-local-shrinkage priors

βj |λj , τ, σ2 ∼ N1(0, λ2j τ2σ2), σ2 ∼ h(σ2), (j = 1, · · · , p),

λj ∼ f (λj), τ ∼ g(τ), (j = 1, · · · , p),

where h, f , and g are densities supported on (0,∞).

λj : local-scale parameter associated with βj . Detect signal or noisewith magnitude of λj . Provide the heterogeneity for signal detection.

τ : global-scale parameter. Shrinkage β globally towards 0.


The Horseshoe (Carvalho, Polson, & Scott, 2010, Bka)

Hierarhical formulation of the Horseshoe

βj |λj , τ, σ2 ∼ N1(0, λ2j τ2σ2), σ2 ∼ π(σ2) ∝ 1/σ2, (j = 1, · · · , p),

λj ∼ C+(0, 1), τ ∼ C+(0, 1), (j = 1, · · · , p),

where C+(0, 1) represents the half-Cauchy density:

π(x) = C+(x |0, 1) =1

1 + x2, x > 0.

The Horseshoe has no hyper-parameter.

Posterior computation is very fast: use R package horseshoe.


How to visualize the Horseshoe ?

Start by fixing τ > 0, and consider the univariate setup:

β|λ ∼ N1(0, τ2λ2) and λ ∼ C+(0, 1).

And then, integrate out λ, to get marinal density

π(β|τ) =

∫ ∞0N1(β|0, τ2λ2) · C+(λ|0, 1)dλ

= K · eZ(β) · E1{Z (β)},

where K = 1/(τ21/2π3/2) and Z (β) = β2/(2τ2). E1(x) =∫∞1 e−xtt−1dt

is the exponential integral function.

A key idea of Horseshoe

Many nice properties of the Horseshoe come from E1{Z (β)}.


Properties of the HorseshoeThe Horseshoe is characterized by two nice properties:

1 Inifinite spike at origin, i.e., limβ→0 π(β|τ) =∞, τ > 0.2 Heavy-tail of π(β|τ) for any τ > 0.

−4 −2 0 2 4

010

20

30

40

β

density

The Horseshoe


Spike-and-slab type prior versus the Horseshoe

−4 −2 0 2 4

0.0

0.2

0.4

0.6

0.8

1.0

β

density

spikeslab

Spike−and−slab

−4 −2 0 2 40

10

20

30

40

β

density

The Horseshoe


Other nice properties of the Horseshoe

1 Under the sparsity assumption

s =q

p→ 0 and n, p →∞

The Horseshoe possesses nice theoretical properties. (Bai & Ghosh,2018; Polson, Scott & Scott, 2010; van der Pas, Szabo, & van derVaart, 2017)

2 For e.g., the Horseshoe estimator (posterior mean for β) is robust andattains the minimax-optimal rate up to a constant. (Van Der Pas etal. 2014, 2016)


How does the tail the Horseshoe behave ?

First, lets define the tail-index typically used in extreme value theory.

Definition (Tail-index = 1/shape parameter)

Suppose random variable X is distributed with cdf F . Then the tail-indexof the density, f = F ′, is the value α > 0, satisfying

F̄ (x) = L(x) · x−α,

where F̄ = 1− F is survival function and L is a slowly varying function.The reciprocal of tail-index, ξ = 1/α, is called the shape parameter.

Key concept in understanding tail-behavior of a density f

tail-heaviness of f ↑ ⇔ tail-index α ↓ 0 ⇔ shape parameter ξ ↑ ∞


Drawback: Restricted tail behavior of the Horseshoe

Theorem (Fixed tail-index of the Horseshoe)

Assume β|λ, τ ∼ N1(0, τ2λ2), λ ∼ C+(0, 1), and τ > 0. Then thetail-index of π(β|τ) is fixed with α = 1 for any value of τ > 0.

A drawback of the Horseshoe1 The Horseshoe is only able to control the center part.

2 The Horsehoe does NOT have a tail-index controlling mechanism.

−4 −2 0 2 4

0.0

0.5

1.0

1.5

2.0

β

de

nsity

π(β|τ=1) π(β|τ=0.5) π(β|τ=0.01)

π(β|τ)


Global-local-tail shrinkage priors


Idealistic tail behavior of continuous shrinkage prior

KEY IDEA of Global-local-tail shrinkage priors

Idealistically, as the sparsity-level s = q/p increases, the shape parameter ξ = 1/α of the

marginal density of β should also accordingly increase to accommodate more signals.

Figure: Tail-part of the marginal density of β


Idealistic tail behavior of continuous shrinkage prior

KEY IDEA of Global-local-tail shrinkage priors

Idealistically, as the sparsity-level s = q/p increases, the shape parameter ξ = 1/α of the

marginal density of β should also accordingly increase to accommodate more signals.

Figure: Tail-part of the marginal density of β. Tail is lifted to accomodate moresignals.


Global-local-tail shrinkage prior

We introduce a new framework for continuous shrinakge priors:

Definition (Hierarchy of global-local-tail shrinkage prior)

βj |λj , σ2 ∼ N1(0, λ2j σ2), σ2 ∼ h(σ2), (j = 1, · · · , p),

λj |τ, ξ ∼ f (λj |τ, ξ), (j = 1, · · · , p),

(τ, ξ) ∼ g(τ, ξ),

where f is a density supported on (0,∞) with the scale parameter τ > 0and the shape parameter ξ > 0, and and g is a joint density supported on(0,∞)× (0,∞). h is a density supported on (0,∞)

The local-scale density f plays a key role.

Fully Bayesian inference about ξ is challenging because ξ exists as anexponent part of f .


Global-local-tail shrinkage prior

Table: Unit scaled densities for f

f (λ|τ = 1, ξ) Shape ξHalf-α-stable distribution non-closed form ξHalf-Cauchy distribution 2{π(1 + λ2)}−1 1

Half-Levy distribution λ−3/2exp{−1/(2λ)}/√

2π 2Loggamma distribution {(1 + λ)−(1/ξ+1)}/ξ ξ

Generalized extreme value distribution exp{−(1+ξλ)−1/ξ}(1+ξλ)(1/ξ+1) ξ

Generalized Pareto distribution (1 + ξλ)−(1/ξ+1) ξ

If choosing half-Cauchy distribution for f , one can get theHorseshoe: the Horseshoe ∈ the global-local-tail shrinkage priors.

Fully Bayesian inference about ξ is challenging because ξ exists as anexponent part of f .


The GLT prior

KEY IDEA of The GLT prior

We use the Generalized Pareto Distribution (GPD) instead of half-Cauchy

distribution.

Definition (Hierarchy of the GLT prior πGLT(β))

βj |λj , σ2 ∼ N1(0, λ2j σ2), σ2 ∼ π(σ2) ∝ 1/σ2, (j = 1, · · · , p),

λj |τ, ξ ∼ GPD(τ, ξ), (j = 1, · · · , p),

τ |ξ ∼ IG(p/ξ + 1, 1),

ξ ∼ log N (µ, ρ2) · I(1/2,∞), µ ∈ R, ρ2 > 0.

(GPD: GPD; IG: inverse-gamma; logN : log-normal)


The GLT prior

Graphical model representation of

y|β, σ2 ∼ Nn(Xβ, σ2In) and β ∼ πGLT(β).

y

β1

βp

σ2

λ1

λp

τ

ξ µ, ρ2

µ and ρ2: hyper-parameters.

We developed the elliptical slice sampler centered by the Hillestimator which enables

1 to learn the shape parameter ξ according to the sparsity-level,2 and render the posterior computation tuning-free.


Tail behavior of the GLT priorFor a fixed τ > 0 and ξ > 1/2, consider a univariate setup:

β|λ ∼ N1(0, λ2) and λ ∼ GPD(τ, ξ).

And then, integrate out λ, to get marinal density

π(β|τ, ξ) =

∫ ∞0N1(β|0, λ2) · GPD(λ|τ, ξ)dλ

=∞∑k=0

ak{ψSk (β) + ψR

k (β)},

where ak = (−1)k · K ·(1/ξ+k

k

), K = 1/(τ23/2π1/2), Z (β) = β2ξ2/(2τ 2),

ψSk (β) = Ek/2+1{Z (β)}, and ψR

k (β) = Z (β)−1+1/ξ+k

2 γ{(1 + 1/ξ + k)/2,Z (β)}.Es(x) =

∫∞1

e−xtt−sdt, is the generalized exponential-integral function.

Key massage of this slide

Existence of {ψRk (β)}∞k=0 provides a great flexibility to the shape of the density

π(β|τ, ξ). We call the series of functions tail lifters.

The Horseshoe does NOT have tail lifters.


The Horseshoe versus The GLT prior

Let’s summarize properties of two priors.

Characteristics of the Horseshoe1 Infinite spike at origin, i.e., limβ→0 π(β|τ) =∞.

2 Heavy-tail of π(β|τ). Tail-index is α = 1 (shape parameter is ξ = 1).

3 [Drawback] Restricted tail behavior of π(β|τ). Fixed tail-index.

Characteristics of the GLT prior

1 Infinite spike at origin, i.e., limβ→0 π(β|τ, ξ) =∞.

2 Heavy-tail of π(β|τ, ξ).

3 Controllable tail behavior of π(β|τ, ξ) through controlling ξ.


The Horseshoe versus The GLT prior (τ = 1)

Comparison between πHS(β|τ) and πGLT(β|τ, ξ) when τ = 1.

πHS(β|τ): black curve.

πGLT(β|τ, ξ): colored in red (ξ = 1), green (ξ = 1.5), blue (ξ = 2),and violet (ξ = 3), respectively.


The Horseshoe versus The GLT prior (τ = 0.001)

Comparison between πHS(β|τ) and πGLT(β|τ, ξ) when τ = 0.001.

πHS(β|τ): black curve.

πGLT(β|τ, ξ): colored in red (ξ = 1), green (ξ = 1.5), blue (ξ = 2),and violet (ξ = 3), respectively.

When τ is very small, πHS(β|τ) numerically becomes a Dirac-deltafunction.


Application I: Breast cancer data study


Application I: Breast cancer data study

Consider the high-dimensional linear regression;

y = Xβ + σε, ε ∼ Nn(0, In).

We collected data from The Cancer Genome Atlas (TCGA):

(y,X) ∈ Rn × Rn×p: n = 729 breast cancer patients, p = 3250 genes.

yi : logarithm of overall-survival time from the i-th patient.

xij : the gene-expression level of j-th gene from the i-th patient.


Application I: Breast cancer data studyTo see the behavior of the Horseshoe as the numbe of genes used increase,we constructed four datasets, B1 ⊂ B2 ⊂ B3 ⊂ B4;

B1 = (y,X[·, 1 : 500]) (500 genes),

B2 = (y,X[·, 1 : 1000]) (1,000 genes),

B3 = (y,X[·, 1 : 2000]) (2,000 genes),

and B4 = (y,X[·, 1 : 3250]) (3,250 genes; full data).

Guess on the sparsity level of the four datasets

Sparsity level of the datasets may increase as the number of genesincreases. (Of course, sparsity level is unknown.)


Top 50 gene ranking plot (the Horseshoe)

−6

−3

0

3

6

gene rank [j]

95

% c

red

ible

in

terv

al o

f β

[j]

−6

−3

0

3

6

gene rank [j]

95

% c

red

ible

in

terv

al o

f β

[j]

−1e−03

−5e−04

0e+00

5e−04

1e−03

gene rank [j]

95

% c

red

ible

in

terv

al o

f β

[j]

−1e−03

−5e−04

0e+00

5e−04

1e−03

gene rank [j]

95

% c

red

ible

in

terv

al o

f β

[j]

Figure: B1 (top-left), B2 (top-right), B3 (bottom-left), and B4 (bottom-right).


Top 10 interesting genes (the Horseshoe)

Table: Top 10 interesting genes selected by the Horseshoe,

1 2 3 4 5B1 NGEF(−) PLN(−) C3orf59(+) C21orf63(+) LOC100130331(−)B2 FAM138F(−) SLC39A4(−) PLN(−) NGEF(−) PCGF5(+)B3 NA NA NA NA NAB4 NA NA NA NA NA

6 7 8 9 10B1 FCGR2A(−) HES4(+) BCAP31(−) GSTM1(+) TOB2(−)B2 HES4(+) FCGR2A(−) FCGR2C(−) TOB2(−) BCAP31(−)B3 NA NA NA NA NAB4 NA NA NA NA NA

NOTE: Contents of table is (gene name, direction). Gene with (+) and Gene with (−)may enhance and undermine immune system of breast cancer patients, respectively. Whenthe horseshoe prior is applied to the datasets B3 and B4, genes are unranked because thehorseshoe estimator collapsed.


Top 50 gene ranking plot (the GLT prior)

−8

−4

0

4

8

gene rank [j]

95%

cre

dib

le inte

rval of

β[j]

−8

−4

0

4

8

gene rank [j]

95%

cre

dib

le inte

rval of

β[j]

−8

−4

0

4

8

gene rank [j]

95

% c

redib

le inte

rval of

β[j]

−8

−4

0

4

8

gene rank [j]

95

% c

redib

le inte

rval of

β[j]

Figure: B1 (top-left), B2 (top-right), B3 (bottom-left), and B4 (bottom-right).

Posterior means of ξ are

2.188(B1) < 2.230(B2) < 2.382(B3) < 2.922(B4).


Top 10 interesting genes (the GLT prior)

Table: Top 10 interesting genes selected by the GLT prior

1 2 3 4 5B1 NGEF(−) C21orf63(+) PLN(−) C3orf59(+) FCGR2A(−)B2 FAM138F(−) SLC39A4(−) NGEF(−) PCGF5(+) PLN(−)B3 FAM138F(−) SLC39A4(−) NGEF(−) PLN(−) COL7A1(−)B4 FAM138F(−) NSUN4(−) COL7A1(−) LOC150776(+) NGEF(−)

6 7 8 9 10B1 BCAP31(−) GSTM1(+) LOC100130331 (−) TOB2(−) ABCA17P(+)B2 FCGR2A(−) CRHR1(+) TOB2(−) GSTM1(+) LOC150776(+)B3 CRHR1(+) FCGR2A(−) RPLP1(+) HES4(+) TOB2(−)B4 SMCHD1(+) RPLP1(+) HES4(+) SLC37A2(−) SLC39A4(−)


Literature coherency with oncology/genetics for theselected genes via the GLT prior for the dataset B4

See our Supplemental materials for more detail.


Application II: Prostate cancer data study(Bradley Efron, 2008, 2010)


Prostate cancer data study (Bradley Efron, 2008, 2010)Dataset is based on two-sample test-statistics, {yj}pj=1 (p = 6, 033genes): Data is avaiable in R package sda. See Efron (2008) fordetail.

Histogram of {yj}pj=1.

: cancerous genes (signals)

source: The Future of Indirect Evidence, Bradley Efron, Statistical Science, 2010


Prostate cancer data study (Bradley Efron, 2008, 2010)

Consider a sparse normal mean model:y1...yj...yp

=

β1...βj...βp

+ σ

ε1...εj...εp

, εji .i .d .∼ N1(0, 1),

where β = (β1, · · · , βj , · · · , βp)> ∈ Rp is assumed to be sparse.We will use the Horseshoe and GLT priors for β ∈ Rp to compare.


Prostate cancer data study (Bradley Efron, 2008, 2010)

Theoretically idealistic shrinkage

1 if yj is a noise ⇒ β̂j should shrink. (β̂j ≈ 0)

2 if yj is a signal ⇒ β̂j should NOT shrink. (β̂j ≈ yj)

−4 −2 0 2 4

−4

−2

02

4

y values

estim

ate

s β^

Theoretically Idealistic shrinkage


The Horseshoe applied to the prostate cancer dataTo investigate the behavior of the Horseshoe as the number of genesused increases, we constructed four datasets,P1 = {yj}p=50

j=1 , P2 = {yj}p=100j=1 , P3 = {yj}p=200

j=1 , P4 = {yj}p=6033j=1


The GLT prior applied to the prostate cancer data

We constructed seven datasets,P1 = {yj}p=50

j=1 , P2 = {yj}p=100j=1 , P3 = {yj}p=200

j=1 , P4 = {yj}p=500j=1 ,

P5 = {yj}p=1000j=1 , P6 = {yj}p=3000

j=1 , and P7 = {yj}p=6033j=1

−6

−3

0

3

6

−6 −3 0 3 6y

β^

p

50100200500

−6

−3

0

3

6

−6 −3 0 3 6y

β^

p

100030006033


The GLT prior applied to the prostate cancer data


1.620(P1) < 1.662(P2) < 1.789(P3)

< 1.905(P4) < 1.991(P5) < 2.760(P6) < 3.636(P7).

As the number of genes considered increases, the posterior mean of ξalso accordingly increases to accommodate more signals.


Simulated data study


Simulation SettingConsider a simulation environment (n, p, q, SNR):

y = Xβ0 + σ0ε, ε ∼ Nn(0, In)

β0 = (β01, · · · , β0q︸︷︷︸q signals

, β0q+1, · · · , β0p︸︷︷︸p-q noises

)>

= (1, · · · , 1︸︷︷︸q signals

, 0, · · · , 0︸︷︷︸p-q noises

)>,

Generation of an artificial Data (y,X) ∈ Rn × Rn×p

Step 1. Generate row vectors {xi}ni=1 of design matrix X ∈ Rn×p from

xii .i .d .∼ Np(0, Ip),

and then colume-wisely standardize the matrix so that each vector X[·, j ]has zero mean with unit l2-norm.Step 2. Generate n-dimensional Gaussian errors ε ∼ Nn(0, In).Step 3. Get y = Xβ0 + σ0ε where σ20 = var(Xβ0)/{SNR · var(ε)}.


Simulated data study

Fix n = 100, p = 500, and SNR=5.

Generated four artificial datasets;1 Generate A1 = (y,X) by choosing q = 2; then, sparsity level is

s = 2500 = 0.004

2 Generate A2 = (y,X) by choosing q = 5; then, sparsity level iss = 5

500 = 0.01


500 = 0.016


500 = 0.026


Simulated data study (the Horseshoe)

A1, A2, A3, A4

Figure: green: truth β0 / blue: inference for signals / red: inference for noises

Posterior means of τ are

1.41 · 10−6(A1) < 0.05(A2) < 0.13(A3)� 6.53 · 10−15(A4)


Simulated data study (the GLT prior)

A1, A2, A3, A4.

Figure: green: truth β0 / blue: inference for signals / red: inference for noises


2.010(A1) < 2.134(A2) < 2.235(A3) < 2.347(A4).


Replicated Numerical Studies


Simulation setupGenerate 50 replicated datasets under

y = Xβ0 + σ0ε, ε ∼ Nn(0, In)

β0 = (β01, · · · , β0q︸︷︷︸q signals

, β0q+1, · · · , β0p︸︷︷︸p-q noises

)>

= (1, · · · , 1︸︷︷︸q signals

, 0, · · · , 0︸︷︷︸p-q noises

)>.

Change the sparsity-level q/p from 0.001 to 0.1. (SNR is fixed with 5)

Performance metrics

Report the medians of mean squared errors (MSE) corresponding to signaland noise coefficients measured across the 50 replicates:

MSES =1

q

q∑j=1

(β̂j − 1)2 and MSEN =1

p − q

p∑j=q+1

(β̂j)2.


Replicated Numerical Studies: varied sparsity level(n, p) = (100, 500) (top) and (n, p) = (200, 1000) (bottom)• and N: GLT / �: Horseshoe

0.25

0.50

0.75

1.00

0.000 0.025 0.050 0.075 0.100

sparsity level

MS

ES

0.001

0.002

0.003

0.004

0.005

0.000 0.025 0.050 0.075 0.100

sparsity level

MS

EN

0.5

1.0

1.5

2.0

2.5

0.000 0.025 0.050 0.075 0.100

sparsity level

τ̂ ,

ξ^

0.25

0.50

0.75

1.00

0.000 0.025 0.050 0.075

sparsity level

MS

ES

0.001

0.002

0.003

0.004

0.000 0.025 0.050 0.075

sparsity level

MS

EN

0.5

1.0

1.5

2.0

2.5

0.000 0.025 0.050 0.075

sparsity level

τ̂ ,

ξ^


Summary

Verified an absence of the tail-controlling mechanism of theHorseshoe.

Introduced a new framework for continuous shrinkage priors, calledthe global-local-tail shrinkage priors.

Proposed the GLT prior which is a member of the new framework.

Demonstrated the tail-adaptability of the GLT prior through two geneexpression datasets & numerical studies.


More detail about the GLT prior

More details are included in our manuscript, for e.g.,

Technical detail of the posterior computation.

Automatic Hyper-parameter Tuning Technique: elliptical sliceSampler centered by the Hill estimator.

Discussion on random shrinkage coefficients.

Theoretical detail of the GLT prior.

More about numerical studies.

Curve fitting study via sparse kernel Gaussian regression.


Future work (“Big-data Bayesian inference”)

Mean Field Variational Bayes for the Horseshoe and GLT priors inhigh-dimensional setting. Currently, we are making a R package,called the VBhorseGLT.

Application of the GLT prior for large-scale gene-association network.


Thank you very much!


Continuous shrinkage prior revisited : a collapsing ...debdeep/pres-GLT.pdf · Continuous shrinkage...

Documents

Transcript of Continuous shrinkage prior revisited : a collapsing ...debdeep/pres-GLT.pdf · Continuous shrinkage...