Introductory Bayesian Analysis
Jaya M. Satagopan Memorial Sloan-Kettering Cancer Center Weill Cornell Medical College (Affiliate)
March 14, 2013
Bayesian Analysis
• Fit probability models to observed data
• Unknown parameters – Summarize using probability distribution – For example, P(mutation increases risk by 10% | data) – Posterior distribution
• Prior information – External data – Elicit from available data
This lecture
• Bayes theorem – Prior from external source
• Loss function, Expected loss
• Bayesian analysis with data-adaptive prior – Minimize squared error loss
• Bayesian penalized estimation – Prior to minimize other loss functions
• Software packages – WinBUGS, SAS
Part 1. Bayes Theorem
Bayes Theorem
• Random variables: Y and θ
• Prior distributions: P(Y), P(θ)
• Conditional distributions: P(Y | θ) and P(θ | Y)
• Know: P(θ | Y), P(Y), and P(θ)
• Need: P(Y | θ) [posterior distribution]
P(Y | θ) = P(θ | Y) P(Y) / P(θ)

P(θ) = ∫ P(θ | Y) P(Y) dY
Example
Say, 5% of the population has a certain disease. When
a person is sick, a particular test is used to determine
whether (s)he has this disease. The test gives a
positive result 2% of the times when a person actually
does not have the disease. The test gives a positive
result 95% of the times when the person does indeed
have the disease. Now, one person gets a positive test.
What is the probability the person has this disease?
Example continued
• Y = 1 (disease) or 0 (no disease)
• θ = 1 (positive test) or 0 (negative test)
KNOWN:
• P(Y = 1) = 0.05, P(Y = 0) = 1 − P(Y = 1) = 0.95
• P(θ = 1 | Y = 0) = 0.02, P(θ = 1 | Y = 1) = 0.95
NEED:
• P(Y = 1 | θ = 1)
P(Y = 1 | θ = 1) = P(θ = 1 | Y = 1) P(Y = 1) / P(θ = 1)
= P(θ = 1 | Y = 1) P(Y = 1) / [P(θ = 1 | Y = 1) P(Y = 1) + P(θ = 1 | Y = 0) P(Y = 0)]
= (0.95 × 0.05) / (0.95 × 0.05 + 0.02 × 0.95)
= 0.714
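The arithmetic above can be checked in a few lines of Python (a minimal sketch; the variable names are illustrative):

```python
# Bayes theorem for the diagnostic-test example:
# P(Y=1 | theta=1) = P(theta=1 | Y=1) P(Y=1) / P(theta=1)
p_disease = 0.05            # P(Y = 1)
p_pos_given_disease = 0.95  # P(theta = 1 | Y = 1), test sensitivity
p_pos_given_healthy = 0.02  # P(theta = 1 | Y = 0), false-positive rate

# Denominator: total probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # 0.714
```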
Example – Breast Cancer Risk
• Case-control sampling – Cases (Y = 1) have breast cancer – Controls (Y = 0) do not have breast cancer
• Record BRCA1/2 mutation – Mutation present (θ = 1) or absent (θ = 0)
• Observe P(θ = 1 | Y = 1) and P(θ = 1 | Y = 0) – Mutation frequency in cases and controls
• Need: P(Y = 1 | θ = 1) – Disease risk among mutation carriers
Satagopan et al (2001) CEBP, 10:467-473
Breast cancer risk (continued)
• Use Bayes theorem
• P(θ = 1 | Y = 1) = mutation frequency in cases • P(θ = 1 | Y = 0) = mutation frequency in controls
• P(Y = 1) = 1 – P(Y = 0) = prior information
• Get prior from external source (SEER Registry)
P(Y = 1 | θ = 1) = P(θ = 1 | Y = 1) P(Y = 1) / [P(θ = 1 | Y = 1) P(Y = 1) + P(θ = 1 | Y = 0) P(Y = 0)]
Breast cancer risk (continued)
BRCA Mutation   Case   Control
Present           25        23
Absent           179      1090
• P(θ = 1 | Y = 1) = 25/204
• P(θ = 1 | Y = 0) = 23/1113
• P(Y = 1) = 0.0138 – Disease risk in the 40-49 age group (SEER registry)
• P(Y = 1 | θ = 1) = 7.6%
Data for age group 40-49: http://seer.cancer.gov
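The carrier-risk calculation can be reproduced from the table above (a small sketch; it gives about 7.7%, matching the slide's 7.6% up to rounding):

```python
# Disease risk among BRCA1/2 mutation carriers via Bayes theorem,
# using the case-control counts in the table above
p_mut_case = 25 / 204      # P(theta = 1 | Y = 1), mutation frequency in cases
p_mut_control = 23 / 1113  # P(theta = 1 | Y = 0), mutation frequency in controls
p_disease = 0.0138         # P(Y = 1), SEER prior for ages 40-49

risk = (p_mut_case * p_disease /
        (p_mut_case * p_disease + p_mut_control * (1 - p_disease)))
print(f"{100 * risk:.1f}%")
```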
Part 2. Loss function, Bayes estimate
Loss Function and Expected Loss
• Parameter θ
• Decision (estimate) d(Y) based on data Y
• Loss incurred: L(d(Y), θ) ≥ 0
• Squared error loss: L(d(Y), θ) = [d(Y) − θ]²
• Absolute deviation: L(d(Y), θ) = |d(Y) − θ|
• Expected loss = Risk = R(d,θ) = E{L(d(Y), θ)}
R(d, θ) = ∫ L(d(Y), θ) f(Y | θ) dY
Bayes Estimation
• There is no single d that has small R(d,θ) for all θ. – No uniformly best d
• Bayes approach
• Get d that minimizes the average risk W(d). – W(d) is also known as the Bayes risk
• Bayes estimate dB of d: W(dB) ≤ W(d)
• For squared error loss, dB is the posterior mean of θ – dB(Y) = E(θ | Y)
W(d) = ∫ ∫ L(d(Y), θ) f(Y | θ) dY dG(θ)
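A quick numerical illustration (my addition, not from the slides) that the posterior mean minimizes expected squared error loss: take a Beta(3, 2) "posterior", whose mean is 3/(3 + 2) = 0.6, and search for the decision d with the smallest expected loss on a grid.

```python
# Under squared error loss, the Bayes estimate is the posterior mean.
def beta32_pdf(x):
    # Beta(3, 2) density: x^2 (1 - x) / B(3, 2), with B(3, 2) = 1/12
    return 12.0 * x**2 * (1.0 - x)

grid = [i / 1000 for i in range(1, 1000)]
dx = 0.001

def expected_sq_loss(d):
    # E[(d - theta)^2] under the Beta(3, 2) posterior (Riemann sum)
    return sum((d - x) ** 2 * beta32_pdf(x) * dx for x in grid)

best_d = min(grid, key=expected_sq_loss)
print(best_d)  # close to the posterior mean, 0.6
```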
Part 3. Bayesian analysis with data-adaptive prior parameters
GxE example
Bayesian analysis of GxE interactions
• Case-control study: Y = 1 (case), Y = 0 (control)
• Binary risk factors (say)
– Genetic factor: G = 0, 1
– Environmental exposure: E = 0, 1
• Is there a significant interaction between G and E ?
• Estimate interaction odds ratio and standard error
Test: • Is this odds ratio = 1? Is this log(odds ratio) = 0 ?
Mukherjee and Chatterjee (2008). Biometrics, 64: 685-694
Interaction odds ratio (ORGE)
Y = 0 (Control data)
E = 1 E = 0
G = 1 N011 N010
G = 0 N001 N000
Y = 1 (Case data)
E = 1 E = 0
G = 1 N111 N110
G = 0 N101 N100
OR0 = Odds of E associated with G among controls OR1 = Odds of E associated with G among cases
OR0 = (N011 × N000) / (N001 × N010)

OR1 = (N111 × N100) / (N101 × N110)

ORGE = OR1 / OR0
β̂GE = log(ORGE) = log(OR1) − log(OR0) = β̂case − β̂control

Var(β̂GE) = Var(β̂case) + Var(β̂control)
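The interaction odds ratio above can be computed directly from the two 2×2 tables; here is a sketch with made-up counts (the N's below are hypothetical, for illustration only):

```python
# Interaction odds ratio ORGE from the case and control 2x2 tables
import math

def log_or(n11, n10, n01, n00):
    # log odds ratio of a 2x2 table: log[(n11 * n00) / (n01 * n10)]
    return math.log((n11 * n00) / (n01 * n10))

# Control table (N011, N010, N001, N000) and case table (N111, N110, N101, N100):
beta_control = log_or(30, 20, 25, 40)  # log OR0
beta_case = log_or(60, 15, 20, 35)     # log OR1
beta_ge = beta_case - beta_control     # log ORGE = log OR1 - log OR0
print(math.exp(beta_ge))               # ORGE
```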
Gene-Environment independence in controls
Y = 0 (Control data)
E = 1 E = 0
G = 1 N011 N010
G = 0 N001 N000
OR0 = (N011 × N000) / (N001 × N010) = 1

ORGE = OR1

Var(β̂GE) = Var(β̂case) < Var(β̂case) + Var(β̂control)
• Independence of G and E in controls is unknown. So…
• Test: βcontrol = 0
• If the hypothesis is rejected, estimate the interaction OR using β̂GE = β̂case − β̂control; otherwise, estimate it as β̂GE = β̂case
• Then test whether βGE = 0 for interaction
• Not a good idea!!
Weighted estimate
• Estimate based on a preliminary test T for β0 = 0:
β̂GE,PT = I(T < c) β̂case + I(T > c) β̂GE
• Weighted average of the case-only and case-control estimates; the weights are indicator functions
• Can do better without requiring a preliminary test!!
• Choose w to minimize squared error loss:
β̂GE,w = w β̂case + (1 − w) β̂GE
• Bayes risk: E( E[ (β̂GE,w − βGE)² | data ] )
Bayes estimate
• w is a function of Var(β̂GE) and Var(β̂case)
• Alternative explanation:
– e is the error due to assuming G and E independence in controls
– β̂GE,B = β̂case + ê
• An estimate of e is ê = β̂GE − β̂case, with ê | e ~ N(e, t²), where t² = Var(β̂GE) − Var(β̂case)
• Prior for e: N(0, σ²). The Bayes estimate of e is E(e | ê) = [σ² / (σ² + t²)] ê
• M & C (2008) suggest estimating σ² by ê²
• Empirical Bayes estimate: β̂GE,B = β̂case + Ê(e | ê)
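The empirical Bayes shrinkage estimate above can be sketched in a few lines (the numeric inputs are hypothetical, and σ² is set to ê² as on the slide):

```python
# Empirical Bayes GxE estimate in the style of Mukherjee & Chatterjee (2008)
beta_case = 0.50   # case-only log interaction OR
beta_ge = 0.35     # case-control log interaction OR
var_case = 0.04    # Var(beta_case)
var_ge = 0.09      # Var(beta_ge)

e_hat = beta_ge - beta_case         # estimated error from the independence assumption
t2 = var_ge - var_case              # Var(e_hat)
sigma2 = e_hat ** 2                 # data-adaptive prior variance for e
e_bayes = sigma2 / (sigma2 + t2) * e_hat  # posterior mean of e

beta_eb = beta_case + e_bayes       # shrinks between beta_case and beta_ge
print(beta_eb)
```

When the control data suggest independence (ê near 0), the estimate stays close to the efficient case-only estimate; when ê is large, it moves toward the unbiased case-control estimate.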
Shrinkage estimation
Advanced Colorectal Adenoma Example
• 610 cases and 605 controls • G = NAT2 acetylation (yes, no) • E = Smoking (never, past, current) • Note: lack of G and E independence in controls
– Need the case-control estimate
• EB estimate with credible interval. Is 0 in the interval?
Summary
• Uncertainty about underlying assumption • Two possible estimates
• Bayes estimate: weighted average of the two
• Shrinkage estimation
• Data-adaptive estimation of prior parameters – Minimize squared error loss
Part 4. Bayesian penalized estimation
Prior to minimize various loss functions
Part 4a. Bayesian Ridge Regression
Minimize Squared Error Loss – Normal Prior
GWAS data (Chen and Witte 2007, AJHG, 81: 397-404)
• 57 unrelated individuals of European ancestry (CEU) – HapMap project
• Outcome = Expression of the CHI3L2 gene – Cheung et al 2005, Nature, 437: 1365-1369
• Risk factors = 39,186 SNPs from Chromosome 1 – Illumina 550K array from HapMap
• SNP rs755467 deemed causal for CHI3L2 expression
• Goal: How well are the neighboring SNPs ranked?
Application to GWAS
• Y = continuous (or binary) outcome, length N (subjects) • Xm = m-th SNP, m = 1, 2, …, M (=500K, say)
• For each SNP, model: Y = µm + Xmβm + error
• βm is effect of SNP m MLE, std err, p-value
• Find the significant SNPs
• Find the SNPs having the 500 smallest p-values
Chen and Witte 2007. AJHG, 81: 397-404
Hierarchical modeling
• Incorporate external information about SNPs • Bioinformatics data (Z matrix, user-specified)
– conservation, various functional categories
• β = Zπ + U – β is G×1, Z is G×K, π is K×1 – U is N(0, t²T), T is specified
• Improved estimation via second stage model
• Prior for β is N(Zπ, t²T) – Need {(β − Zπ)′ T⁻¹ (β − Zπ)} / t² to be small: penalization
Posterior inference via MCMC
• Markov chain Monte Carlo approach to get the βs
• Specify priors for β, π, σ²
– π ~ N(0, *), 1/σ² ~ Gamma(**, $$)
• Specify a prior for t², or fix t²
• Generate samples from the full conditional distributions:
β | Y, π, σ², t², …
π | Y, β, σ², t², …
σ² | Y, β, π, t², …
etc.
Iteration   β parameters
1           β1  β2  …  βG
2           β1  β2  …  βG
…
G           β1  β2  …  βG

Posterior Summaries
Avg(β1), Stdev(β1)
Avg(β2), Stdev(β2)
…
Avg(βG), Stdev(βG)
Chen and Witte GWAS Example
• Plot “p-values” of top 500 SNPs
So, what is going on ?
• Y = µm + Xm βm + error
• MLE of the β's: β̂ = (β̂1, β̂2, …, β̂G)
• Variance: V̂
• β = Zπ + U, U ~ N(0, t²T)
• MLE of π: π̂ = (Zᵀ S Z)⁻¹ Zᵀ S β̂, where S = (V̂ + t²T)⁻¹
• Bayes estimate of the β's: β̃ ≈ (I − W) β̂ + W Z π̂, where W = V̂ S
• Large t²: S ≈ 0 ≈ W, and β̃ ≈ β̂
• Small t²: W ≈ I, and β̃ ≈ Z π̂ [shrinkage estimation]
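A one-dimensional sketch of this shrinkage formula (my own illustration, with T = 1 and scalar quantities) makes the two limits easy to see:

```python
# Scalar analogue of beta_tilde = (I - W) beta_hat + W Z pi_hat,
# with W = V S and S = (V + t^2 T)^(-1); all numbers are illustrative.
def shrink(beta_hat, v_hat, z, pi_hat, t2):
    s = 1.0 / (v_hat + t2)  # scalar analogue of S
    w = v_hat * s           # scalar analogue of W
    return (1.0 - w) * beta_hat + w * z * pi_hat

beta_hat, v_hat, z, pi_hat = 2.0, 1.0, 1.0, 0.5

print(shrink(beta_hat, v_hat, z, pi_hat, t2=1e6))   # large t^2: close to beta_hat (2.0)
print(shrink(beta_hat, v_hat, z, pi_hat, t2=1e-6))  # small t^2: close to z * pi_hat (0.5)
```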
Some Remarks
• Sensitivity to choice of prior parameters
• Instead of “p-value”, P(βm > 0), m = 1, …, G
• The Bayes estimate β̃ should ideally not be too sensitive to the choice of Z
• The estimated value of π will depend on Z, but ideally the Bayes estimate should not.
Part 4b. Bayesian LASSO
Minimize Absolute Deviation Loss – Laplace Prior
Diabetes data (Efron et al 2004, The Annals of Statistics, 32: 407-499)
Application to the diabetes study
• Y = continuous (or other type of) outcome (N×1)
• X = N×p matrix of risk factors
• β = p×1 vector of effects (parameters of interest)
• Find the significant risk factors
• Y = Xβ + error
• Many p, potentially correlated risk factors etc
• Estimate β to minimize |β - β0| for some β0 (LASSO)
• β0 = 0 or β0 = Zπ, Z given and π must be estimated
Park and Casella (2008). J Am Stat Assoc, 103: 681-686
Bayesian LASSO
• |β − β0| ≈ 1 − exp{ −|β − β0| }
• The exponential term takes the form of a Laplace density
• Y = Xβ + error error ~ N(0, σ2I)
• Laplace prior for β with mean β0
• Mixture of normal prior for β and an exponential prior for its variance
f(βj) = [λ / (2σ)] exp{ −λ |βj − β0j| / σ }

= ∫₀^∞ [1 / √(2π t²)] exp{ −(βj − β0j)² / (2t²) } × [λ² / (2σ²)] exp{ −λ² t² / (2σ²) } dt²
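The scale-mixture identity above can be verified numerically (my own check, with σ = 1 and a = λ, so the target is (a/2) exp(−a|z|)):

```python
# Numerical check that the Laplace density is a normal scale mixture:
# integrate N(0, t^2) against an Exponential(a^2 / 2) density on t^2.
import math

a, z = 1.3, 0.7
target = (a / 2.0) * math.exp(-a * abs(z))  # Laplace density at z

def integrand(t2):
    normal = math.exp(-z * z / (2.0 * t2)) / math.sqrt(2.0 * math.pi * t2)
    expo = (a * a / 2.0) * math.exp(-a * a * t2 / 2.0)
    return normal * expo

# Midpoint rule on t^2 in (0, 60]; the tail beyond 60 is negligible
h = 0.001
approx = sum(integrand(h * (i + 0.5)) for i in range(60000)) * h
print(round(target, 5), round(approx, 5))
```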
Bayesian LASSO setup
Y | β, σ² ~ N(Xβ, σ² I)

βj | tj², σ² ~ N(0, σ² tj²), j = 1, …, p, independent

tj² ~ Exponential(λ² / 2), j = 1, …, p, independent

σ² ~ Inverse Gamma(a1, a2)
• tj² are latent variables to facilitate the MCMC steps
• a1 and a2 are specified (check for sensitivity)
• λ²: empirical estimation from the data, or specify a prior – generally a Gamma(c1, c2) prior
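The hierarchy above can be simulated directly: draw tj² from the exponential, then βj from the conditional normal, and the draws follow the Laplace prior (a sketch of the prior only, not the full Gibbs sampler; the Laplace variance is 2σ²/λ²):

```python
# Sampling from the Laplace prior via its normal scale-mixture representation
import random

random.seed(7)
lam, sigma = 2.0, 1.0
n = 200_000

draws = []
for _ in range(n):
    t2 = random.expovariate(lam * lam / 2.0)       # t_j^2 ~ Exponential(lambda^2 / 2)
    draws.append(random.gauss(0.0, sigma * t2 ** 0.5))  # beta_j | t_j^2 ~ N(0, sigma^2 t_j^2)

sample_var = sum(b * b for b in draws) / n
print(sample_var)  # should be near 2 * sigma^2 / lambda^2 = 0.5
```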
Parameter Estimation
• Get full conditionals, apply MCMC
• Bayes estimate of β – Posterior median (minimizes absolute deviation loss)
• Original LASSO: quadratic programming methods
Part 4c. Other Bayesian Penalization Methods
Brief survey
Bridge Regression
• Estimate β by minimizing
• γ is pre-specified
• γ = 1 is (Bayesian) LASSO
• γ = 2 is (Bayesian) Ridge
Σj=1,…,p |βj − Zjπ|^γ
Fu 1998, JCGS, 7: 397-416
Bayesian Elasticnet
• Estimate β by minimizing
• Compromise between LASSO and Ridge penalties
• Normal prior constrained within certain bounds
• Hans (2011). J Am Stat Assoc, 106: 1383-1393
λ Σj=1,…,p |βj − Zjπ| + (1 − λ) Σj=1,…,p (βj − Zjπ)²
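The three penalties are straightforward to compute side by side (a small sketch; the residuals rj = βj − Zjπ below are illustrative numbers):

```python
# LASSO (gamma = 1), ridge (gamma = 2), and elastic-net penalties
# for a residual vector r_j = beta_j - Z_j pi
r = [0.5, -1.2, 0.0, 2.0]

lasso = sum(abs(x) for x in r)          # gamma = 1 bridge penalty
ridge = sum(x * x for x in r)           # gamma = 2 bridge penalty
lam = 0.3
enet = lam * lasso + (1 - lam) * ridge  # elastic-net compromise

print(lasso, ridge, enet)
```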
Software Packages
• WinBUGS – Specify model for outcome – Specify priors – Output estimated values of β and other parameters – Uses MCMC methods – Diagnostic plots – http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
• SAS Proc MCMC – http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#mcmc_toc.htm
References: Textbooks
• JS Maritz and T Lwin (1989). Empirical Bayes Methods. Chapman and Hall.
• JM Bernardo and AFM Smith (1993). Bayesian Theory. Wiley.
• BP Carlin and TA Louis (1996). Bayes and empirical Bayes methods for data analysis. Chapman and Hall.
• A Gelman, JB Carlin, HS Stern, DB Rubin (1996). Bayesian data analysis. Chapman and Hall.
• WR Gilks, S Richardson, DJ Spiegelhalter (1996). Markov chain Monte Carlo in practice. Chapman and Hall.
• T Hastie, R Tibshirani, J Friedman (2001). The Elements of Statistical Learning. Springer.
References: Some papers
• R Tibshirani (1996). Regression shrinkage and selection via the Lasso. JRSS – Series B, 58: 267-288.
• J Fu (1998). Penalized regression: The Bridge versus the Lasso. JCGS, 7: 397-416.
• MA Newton and Y Lee (2000). Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics 56: 1088-1097.
• JM Satagopan, K Offit, W Foulkes, ME Robson, S Wacholder, CM Eng, SE Karp, CB Begg (2001). The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiology,Biomarkers and Prevention 10: 467-473.
References: Some papers
• CM Kendziorski, MA Newton, H Lan, MN Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22:3899-3914.
• D Conti, V Cortessis, J Molitor, DC Thomas (2003). Bayesian modeling of complex metabolic pathways. Human Heredity, 56: 83-93.
• B Efron, T Hastie, I Johnstone, R Tibshirani (2004). Least angle regression. The Annals of Statistics, 32: 407-451.
• B Mukherjee, N Chatterjee (2008). Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics, 64: 685-694.
References: Some papers
• GK Chen, JS Witte (2007). Enriching the analysis of genome-wide association studies with hierarchical modeling. AJHG, 81: 397-404.
• T Park, G Casella (2008). The Bayesian Lasso. JASA, 103: 681-686.
• M Park, T Hastie (2008). Penalized logistic regression for detecting gene interactions. Biostatistics, 9: 30-50.
• C Hans (2011). Elastic net regression modeling with the orthant normal prior. JASA, 106: 1383-1393.
Many more: Bioinformatics, Genetic Epidemiology, JASA, JRSS – Series B and C, PLoS One, …