Integration of biological annotations using hierarchical modeling
![Page 1: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/1.jpg)
Using Biological Knowledge To Discover Higher Order Interactions In Genetic Association Studies
Gary K. Chen, Duncan C. Thomas
Department of Preventive Medicine, USC
May 19, 2010
![Page 2: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/2.jpg)
Outline
1. Motivation
2. The algorithm: Incorporating biological priors into an MCMC sampler
3. Simulation 1: Performance of the method
4. Simulation 2: Detecting interactions in a known pathway
5. Application to data from a GWAS
6. Future Extensions
![Page 3: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/3.jpg)
Common diseases have complex etiology
- GWAS have had great success in searching for genetic variants for common diseases
- Recent successes: AMD, BMI/obesity, Type 2 diabetes, breast cancer, prostate cancer
- Marginal effects from single-SNP analyses do not explain all heritability. Can we move beyond the low-hanging fruit (e.g. CNVs, rare variants, epistatic interactions)?
- Ideally we would fit a model for all SNPs (and interactions too)
![Page 4: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/4.jpg)
Analyzing all SNPs simultaneously
- Difficult for GWAS: predictors far exceed observations
- Shrinkage methods: LASSO, ridge regression, elastic net, ...
- LASSO method (Tibshirani, J Royal Stat. Soc. 96)
  - penalizes likelihood based on tuning parameter $\lambda$
  - produces sparse (interpretable) models
- In GWAS settings:
  - Double Exp (Laplace) prior on $\beta$ (Wu and Lange, Bioinf. 2009)
  - Normal Exp Gamma prior on $\beta$ (Hoggart et al., PLoS Genet 2008)
  - Fast! Provides the maximum a posteriori (MAP) estimates
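To make the link between the penalty and sparsity concrete, here is a minimal sketch of soft-thresholding. It covers only the special case of a single standardized predictor (or an orthonormal design), where the LASSO estimate, equivalently the MAP estimate under a Laplace prior, has a closed form. The input estimates and the value of `lam` are made up for illustration and are not from the talk.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Minimizer over b of 0.5 * (z - b)**2 + lam * abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small effects are set exactly to zero


# Hypothetical least-squares estimates for five standardized SNPs.
ols_estimates = [0.50, -0.05, 0.12, -0.40, 0.02]
lam = 0.15  # tuning parameter lambda (assumed value)

map_estimates = [soft_threshold(z, lam) for z in ols_estimates]
selected = [j for j, b in enumerate(map_estimates) if b != 0.0]
```

Larger values of `lam` zero out more coefficients, which is what makes the resulting model sparse.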
![Page 5: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/5.jpg)
Fully Bayesian methods for variable selection
- Bayesian model averaging assesses uncertainty
  - Probabilistically proposes sub-models from a posterior distribution
  - Summarize statistics of parameters averaged across all proposed models
  - Controls for multiple comparisons
- Disadvantage: Computationally expensive
  - $P(\beta)$ has normal distribution for conjugacy
  - "Spike and slab" ensures parsimony
  - Example: Stochastic Search Variable Selection via Gibbs sampling (George and McCulloch, JASA 93)
    - $\beta_j \mid \gamma_j \sim (1-\gamma_j)\,N(0, \tau_j^2) + \gamma_j\,N(0, c_j^2 \tau_j^2)$
    - e.g., $f(\gamma) = \prod_j p_j^{\gamma_j} (1-p_j)^{(1-\gamma_j)}$
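The spike-and-slab mixture and the independent Bernoulli prior on $\gamma$ can be sketched in a few lines. This is an illustrative sketch only: the function names and the values of `tau_j`, `c_j` and `p` are assumptions and are not taken from the talk.

```python
import random


def prior_gamma(gamma, p):
    """f(gamma) = prod_j p_j**gamma_j * (1 - p_j)**(1 - gamma_j)."""
    prob = 1.0
    for g, pj in zip(gamma, p):
        prob *= pj if g == 1 else (1.0 - pj)
    return prob


def draw_beta(gamma_j, tau_j, c_j, rng):
    """Draw beta_j given gamma_j: narrow spike N(0, tau_j^2) when gamma_j = 0,
    wide slab N(0, c_j^2 * tau_j^2) when gamma_j = 1."""
    sd = c_j * tau_j if gamma_j == 1 else tau_j
    return rng.gauss(0.0, sd)


rng = random.Random(0)  # seeded for reproducibility
excluded_draw = draw_beta(0, tau_j=0.01, c_j=100.0, rng=rng)  # close to zero
included_draw = draw_beta(1, tau_j=0.01, c_j=100.0, rng=rng)  # free to be large
```

With a small `tau_j` and a large `c_j`, variables with $\gamma_j = 0$ are effectively shrunk to zero while included variables are only weakly constrained.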
![Page 6: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/6.jpg)
Searching for interactions
- SSVS via Gibbs sampling
  - For 1000 SNPs, length of $\gamma$: $1000 + \frac{(1000)(999)}{2} = 500{,}500$
  - Iterating through each parameter is slow
- Reversible jump MCMC
  - In contrast to SSVS, the "model" is $M = \{j : \gamma_j \neq 0\}$
  - Model size changes at each iteration (similar to stepwise regression)
- Informative priors
  - Incorporating biological information at the level of each variable
  - These priors can be used towards a proposal function in a Metropolis-Hastings algorithm
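The size of the indicator vector grows quadratically once pairwise interactions are included. The helper below (a hypothetical name) simply counts main effects plus unordered pairs, reproducing the figure quoted for 1000 SNPs.

```python
def gamma_length(n_snps: int) -> int:
    """Number of main effects plus all pairwise interactions."""
    return n_snps + n_snps * (n_snps - 1) // 2


length_1000 = gamma_length(1000)  # 1000 main effects + 499,500 pairs
```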
![Page 7: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/7.jpg)
![Page 8: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/8.jpg)
Posterior density as a two-level hierarchical model
- Posterior density:
  - $L(Y \mid \beta, X, M)\,P(\beta \mid \pi, \tau, \sigma, M, Z, A)$
- First level as likelihood: a GLM at the subject level
  - $\operatorname{logit}(P(Y = 1 \mid \beta, X)) \sim \beta_0 + \sum_{k=1}^{K} \beta_k X_k$
  - $X$ can be G, E, GxG, GxE, etc.
- Second level as prior: $\beta_k$ as mixed model
  - $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$
![Page 9: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/9.jpg)
Prior mean on variable in Z
Table: The Z matrix
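| Intercept | Conservation | Missense | eQTL |
|---|---|---|---|
| 1 | 20 | 0 | 5 |
| 1 | 10 | 1 | 0.01 |
| 1 | 5 | 0 | 1 |
| 1 | 10 | 1 | 4.1 |
| 1 | 5 | 0 | 1.4 |

- $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$
- $\hat\pi$: regress $\hat\beta$ on $Z$; $\pi \sim N(\hat\pi, \Sigma_\pi)$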
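As a small illustration of the prior mean term $\pi^T Z_k$, the sketch below applies a vector of second-level coefficients to the rows of the example Z matrix. The values in `pi` are invented for illustration; in the talk $\hat\pi$ comes from regressing $\hat\beta$ on $Z$, which is not reproduced here.

```python
# Rows of the example Z matrix: intercept, conservation, missense, eQTL.
Z = [
    [1, 20, 0, 5.0],
    [1, 10, 1, 0.01],
    [1, 5, 0, 1.0],
    [1, 10, 1, 4.1],
    [1, 5, 0, 1.4],
]

# Hypothetical second-level coefficients (assumed, not estimated).
pi = [0.0, 0.01, 0.10, 0.02]


def prior_mean(z_row, pi):
    """Annotation-driven part of the prior mean for one variable: pi^T Z_k."""
    return sum(p * z for p, z in zip(pi, z_row))


prior_means = [prior_mean(row, pi) for row in Z]
```

Variables whose annotations line up with large coefficients in `pi` receive prior means further from zero.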
![Page 10: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/10.jpg)
Variable connectivity in A matrix
Table: Example A matrix for SNP variables
| Variable | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 |
| 3 | 0 | 1 | 0 |
![Page 11: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/11.jpg)
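One approach for populating the A matrix
Table: The Z matrix

| | Intercept | Conservation | Missense | eQTL |
|---|---|---|---|---|
| → | 1 | 20 | 0 | 5 |
| | 1 | 10 | 1 | 0.01 |
| → | 1 | 5 | 0 | 1 |
| | 1 | 10 | 1 | 4.1 |
| | 1 | 5 | 0 | 1.4 |

- Define entry $A_{1,3}$ as $\mathrm{corr}(Z_{1,-}, Z_{3,-})$, dichotomize $A$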
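The row-correlation construction can be sketched as follows. The cutoff of 0.5 used to dichotomize is an assumption (the talk does not give the threshold), as is the choice to include the intercept column in the correlation.

```python
import math

Z = [
    [1, 20, 0, 5.0],
    [1, 10, 1, 0.01],
    [1, 5, 0, 1.0],
    [1, 10, 1, 4.1],
    [1, 5, 0, 1.4],
]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_A(Z, threshold=0.5):
    """A[j][k] = 1 if rows j and k of Z correlate above the threshold (j != k)."""
    m = len(Z)
    return [
        [1 if j != k and pearson(Z[j], Z[k]) > threshold else 0 for k in range(m)]
        for j in range(m)
    ]


corr_1_3 = pearson(Z[0], Z[2])  # rows marked with arrows in the table
A = build_A(Z)
```

The diagonal is set to zero so that a variable is never counted as its own neighbor.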
![Page 12: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/12.jpg)
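$\phi_k$ as mean across $k$'s neighbors
Table: Example A matrix for SNP variables

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| 1 | 0 | 1 | 0 |
| 2 | 1 | 0 | 1 |
| 3 | 0 | 1 | 0 |

- $\beta_k \sim \pi^T Z_k + \phi_k + \theta_k$
- $\phi_k \sim N\left(\bar\phi_{-k}, \frac{\tau^2}{\nu_k}\right)$
- $\bar\phi_{-k} = \frac{\sum_{j=1}^{m} \phi_j A_{jk}}{\sum_{j=1}^{m} A_{jk}}$, where $\nu_k$ is the number of neighbors of variable $k$
- We set $\phi_j = \hat\beta_j$
- Example: If $\hat\beta = (0.2, 0.5, 0.4)$, the prior mean of $\phi_2$ is $\bar\phi_{-2} = (0.2 + 0.4)/2 = 0.3$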
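The neighbor average can be checked directly against the slide's example. The sketch below uses zero-based indices, so variable 2 on the slide is index 1; returning `None` for a variable without neighbors is an assumption of this sketch.

```python
def neighbor_mean(phi, A, k):
    """Mean of phi over the neighbors of variable k (zero-based index)."""
    m = len(phi)
    n_neighbors = sum(A[j][k] for j in range(m))
    if n_neighbors == 0:
        return None  # no neighbors, so the average is undefined
    return sum(phi[j] * A[j][k] for j in range(m)) / n_neighbors


A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
beta_hat = [0.2, 0.5, 0.4]  # phi_j is set to beta_hat_j

phi_bar_2 = neighbor_mean(beta_hat, A, 1)  # neighbors are variables 1 and 3
```

Here `phi_bar_2` is the average of 0.2 and 0.4, matching the value of 0.3 given in the example (up to floating-point rounding).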
![Page 13: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/13.jpg)
How the parameters fit together
- $L(Y \mid \beta, X, M)\,P(\beta \mid Z, \pi, A, \tau, \sigma, M)$
![Page 14: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/14.jpg)
A reversible jump MCMC algorithm
- Propose a swap, addition, or deletion of a variable
- Perform reversible jump Metropolis-Hastings step comparing posterior probabilities
  - $r = \frac{L(Y \mid \beta', X, M')\,P(\beta' \mid Z, \pi, A, \tau, \sigma, M')\,P(M \to M')}{L(Y \mid \beta, X, M)\,P(\beta \mid Z, \pi, A, \tau, \sigma, M)\,P(M' \to M)}$
- Accept move with probability $\min(1, r)$
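The accept/reject step itself is short. The sketch below works on the log scale for numerical stability and takes the terms of the ratio as already-computed log values; how the likelihood, prior and proposal terms are evaluated is outside its scope, and the function names are hypothetical.

```python
import math
import random


def log_ratio(log_numerator_terms, log_denominator_terms):
    """log r, given the logs of the factors in the numerator and denominator."""
    return sum(log_numerator_terms) - sum(log_denominator_terms)


def accept_move(log_r, rng):
    """Accept with probability min(1, r), where r = exp(log_r)."""
    accept_prob = math.exp(min(0.0, log_r))
    return rng.random() < accept_prob


rng = random.Random(42)  # seeded for reproducibility
# Hypothetical log values for the proposed and current models.
log_r = log_ratio([-10.0, -2.0, -1.0], [-12.0, -1.5, -0.5])
accepted = accept_move(log_r, rng)
```

Capping the exponent at zero avoids overflow when the proposed model is far more probable than the current one.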
![Page 15: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/15.jpg)
Model transition proposal density
- Suppose model $M'$ has 1 newly proposed variable:
  - $P(M \to M') = \Phi^{-1}(z_k)$
  - $z_k \sim N(\mu_k - \mu_{baseline}, 1)$
- The variable-specific tuning parameter $\mu_k$
  - A function of the components of $\beta$'s prior standardized by their residual variances
  - $\mu_k = \frac{|\pi^T Z_k + \bar\phi_{-k}|}{\sigma^2 + \frac{\tau^2}{\nu_k}}$
- Weak empirical support for priors leads to small numerator, large denominator
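The expression for $\mu_k$ can be evaluated directly once its pieces are known. In this sketch all numeric inputs are invented for illustration, and the function name is hypothetical.

```python
def mu_k(prior_mean, phi_bar, sigma2, tau2, nu_k):
    """|pi^T Z_k + phi_bar_{-k}| / (sigma^2 + tau^2 / nu_k)."""
    return abs(prior_mean + phi_bar) / (sigma2 + tau2 / nu_k)


# Strong prior support: large prior mean, small residual variances.
strong = mu_k(prior_mean=0.3, phi_bar=0.3, sigma2=1.0, tau2=1.0, nu_k=2)

# Weak prior support: small prior mean, large residual variances.
weak = mu_k(prior_mean=0.02, phi_bar=0.01, sigma2=4.0, tau2=4.0, nu_k=2)
```

The weakly supported variable gets a much smaller `mu_k`, so it is proposed less often.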
![Page 16: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/16.jpg)
Model transition proposal density
- Suppose model $M'$ has 1 newly proposed variable:
  - $P(M \to M') = \Phi^{-1}(z_k)$
  - $z_k \sim N(\mu_k - \mu_{baseline}, 1)$
- The global penalty tuning parameter
  - Emulate the BIC
  - $\mathrm{BIC}(M') - \mathrm{BIC}(M) = \chi_1(\ln(n))$
  - Probability of accepting $M'$ is $F^{-1}_{\chi}(\ln(n))$
  - $\mu_{baseline} = \Phi(F^{-1}_{\chi}(\ln(n)))$
![Page 17: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/17.jpg)
![Page 18: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/18.jpg)
Using external information to enhance power and specificity
- Disease model: 4 GxG interactions jointly cause disease through 4 endophenotypes
  - Genotypes simulated for 14 independent SNPs
  - $y_{ik} = (1 - b)\,N(s_{ia} \cdot s_{ib}, 1) + b\,U(0, 1)$
  - $b \sim \mathrm{Bernoulli}(p)$, where $p$ is the proportion of noise
  - 24 endophenotypes $y$ used only in the prior
- Disease status determined using a logistic model
  - $\operatorname{logit}(P(Y_i = 1)) = \beta_0 + \beta_1 y_{i01} + \beta_2 y_{i02} + \beta_3 y_{i34} + \beta_4 y_{i35}$
- First 8000 persons reserved as case-control dataset, remaining 2000 for constructing priors
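The endophenotype model mixes a signal driven by the product of two genotypes with pure noise. A minimal sketch of one draw is below; the additive 0/1/2 genotype coding and the allele frequency of 0.3 are assumptions, since the talk does not state how genotypes were coded or simulated.

```python
import random


def simulate_genotype(maf, rng):
    """Count of minor alleles (0, 1 or 2) under an assumed additive coding."""
    return sum(1 for _ in range(2) if rng.random() < maf)


def simulate_endophenotype(s_a, s_b, p, rng):
    """y = (1 - b) * N(s_a * s_b, 1) + b * U(0, 1), with b ~ Bernoulli(p)."""
    b = 1 if rng.random() < p else 0
    if b == 1:
        return rng.uniform(0.0, 1.0)  # pure noise, unrelated to genotype
    return rng.gauss(s_a * s_b, 1.0)  # signal centred on the genotype product


rng = random.Random(2010)  # seeded for reproducibility
s_a = simulate_genotype(0.3, rng)
s_b = simulate_genotype(0.3, rng)
y_causal = simulate_endophenotype(s_a, s_b, p=0.0, rng=rng)  # no noise
y_weak = simulate_endophenotype(s_a, s_b, p=0.8, rng=rng)    # mostly noise
```

Raising `p` weakens the correlation between the endophenotype and the interacting SNP pair, which is how the less informative priors in the simulation are produced.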
![Page 19: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/19.jpg)
Constructing the Z and the A matrices
- Z matrix
  - Measures correlation between a model variable and each endophenotype among 2000 individuals in the prior
  - $Z_{kq} = \mathrm{corr}(g_k, y_q)$
- A matrix
  - Measures similarity between two variables by comparing correlation profiles in Z
  - $A_{jk} = \mathrm{corr}(Z_{jq}, Z_{kq})$
![Page 20: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/20.jpg)
Question 1: How do the priors affect power and specificity?
- The A matrix contains information across all 24 endophenotypes
- Set up 3 variants of the original Z matrix
  - 4 causal endophenotypes only (noise parameter $p = 0$)
  - 4 intermediate endophenotypes only (noise parameter $p = 0.2$)
  - 4 weakly correlated endophenotypes only (noise parameter $p = 0.8$)
- Models tested: both A and Z, no A or Z, A only, Z only (with 3 variants)
![Page 21: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/21.jpg)
Question 1: How do the priors affect power and specificity?
At RR=1.5, all prior models perform very well
![Page 22: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/22.jpg)
Question 1: How do the priors affect power and specificity?
At RR=1.4, prior models with A, Z, or both outperform others
![Page 23: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/23.jpg)
Question 1: How do the priors affect power and specificity?
At RR=1.3, prior models with A, Z, or both have > 5% power
![Page 24: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/24.jpg)
Question 1: How do the priors affect power and specificity?
At RR=1.2, fully informative prior still retains 80% power
![Page 25: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/25.jpg)
Question 1: How do the priors affect power and specificity?
At RR=1.1, all prior models perform poorly (∼ 55% power)
![Page 26: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/26.jpg)
Question 2: How do the priors affect posterior estimates (shrinkage)?
Posterior estimates of β vs MLE
![Page 27: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/27.jpg)
Question 2: How do the priors affect posterior estimates (shrinkage)?
Posterior estimates of SE of β vs MLE
![Page 28: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/28.jpg)
Question 3: How do the priors improve rankings?
6,441 interactions tested. 4 causal.
![Page 29: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/29.jpg)
Question 3: How do the priors improve rankings?
513,591 interactions tested. 4 causal.
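The two counts of tested interactions (6,441 and 513,591) can be reproduced as numbers of unordered pairs. The transcript does not state how many variables were in each scenario, so the values 114 and 1,014 below are an inference from the arithmetic, not something the talk reports.

```python
import math

# Number of unordered pairs among n variables is n choose 2.
pairs_small = math.comb(114, 2)   # assumed 114 variables
pairs_large = math.comb(1014, 2)  # assumed 1,014 variables
```

Both values match the counts quoted on the slides, which would be consistent with the 14 simulated SNPs plus 100 or 1,000 additional null SNPs, though that reading is only a guess.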
![Page 30: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/30.jpg)
Summary of simulation
- Sensitivity analysis
  - All methods perform well at high RRs
  - Informative priors improve power at lower RRs but not at extremely low RRs
- Like LASSO, shrinkage improves interpretability
- Model averaging can improve robustness of rankings
![Page 31: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/31.jpg)
![Page 32: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/32.jpg)
Discovering interactions in a known pathway: Folate
![Page 33: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/33.jpg)
Simulated data set
- 14 genes, 2 environmental variables
- 8000 individuals in case-control data, remaining 2000 for constructing priors
- Used a pathway simulation program to generate steady-state concentrations
  - Reed et al., J Nutr. 2006 Oct;136(10):2653-61
  - Enzyme kinetics parameters (Km, Vmax) genotype specific
- 3 mechanisms believed to be related to disease etiology
  - Homocysteine concentration
  - Pyrimidine synthesis
  - Purine synthesis
![Page 34: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/34.jpg)
Estimates of π
- Construct Z and A in same manner as previous simulation:
  - Z stores genotype-metabolite correlations
  - A stores dichotomized correlations between rows of Z
- True log relative risk: 0.18 (RR=1.2)

Table: Second-level coefficients $\pi$

| Simulated mechanism | homocysteine | pyrimidine | purine |
|---|---|---|---|
| homocysteine | 0.18 (0.13) | -0.09 (0.536) | 0.002 (0.38) |
| pyrimidine | -0.04 (0.22) | 0.22 (0.066) | -0.01 (0.06) |
| purine | -0.01 (0.36) | 0.16 (0.327) | 0.19 (0.07) |
![Page 35: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/35.jpg)
Comparison of BMA results to stepwise regression
Table: Pyrimidine synthesis

| | Interaction | BF | MLE p-value |
|---|---|---|---|
| | FTD*MAT-II | 15 | 0.038 |
| | FTD*MTHFR | 20 | 0.046 |
| | MTCH*MS | 534 | 0.006 |
| | PGT*MS | 14 | 0.018 |
| → | SHMT*CBS | 1254 | 0.133 |
| → | SHMT*Fol | 2324 | 0.036 |
| | TS*MTHFR | 227 | 0.022 |
| → | TS*SHMT | 1091 | N/S |
![Page 36: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/36.jpg)
Pyrimidine synthesis
- SHMT*CBS, SHMT*Fol, SHMT*TS
![Page 37: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/37.jpg)
Comparison of BMA results to stepwise regression
Table: Purine synthesis

| | Interaction | BF | MLE p-value |
|---|---|---|---|
| → | MTCH*MS | 1130 | 0.008 |
| → | MTCH*PGT | 1416 | 0.026 |
| → | PGT*CBS | 1022 | 0.069 |
| → | PGT*MS | 2851 | 0.007 |
| → | SHMT*Fol | 1398 | 0.022 |
| | SHMT*MAT-II | 646 | 0.012 |
| | TS*MTHFR | 57 | 0.024 |
![Page 38: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/38.jpg)
Purine synthesis
- MTCH*MS, MTCH*PGT, PGT*CBS, PGT*MS, SHMT*Fol
![Page 39: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/39.jpg)
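Comparison of BMA results to stepwise regression
Table: Homocysteine

| | Interaction | BF | MLE p-value |
|---|---|---|---|
| | CBS*MAT-II | 77 | 0.045 |
| → | CBS*Met | 1072 | N/S |
| | FTD*MAT-II | 38 | 0.045 |
| | FTD*MTHFR | 213 | 0.015 |
| → | MS*Met | 1129 | N/S |
| | MTCH*MS | 978 | 0.006 |
| | PGT*MS | 75 | 0.044 |
| | TS*MTHFR | 41 | 0.022 |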
![Page 40: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/40.jpg)
Homocysteine levels
- CBS*Met, MS*Met
![Page 41: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/41.jpg)
Summary of folate pathway simulation
- Pathway knowledge can inform model search
- Simulated three plausible disease mechanisms
- Effect of causal metabolite on disease revealed in corresponding element of $\pi$
- Revealed plausible interactions not found through a stepwise regression
![Page 42: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/42.jpg)
![Page 43: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/43.jpg)
Using gene annotations to inform a search for interactions
- Proof of concept: GWAS of breast cancer
- Publicly available data from NCI (https://caintegrator.nci.nih.gov/cgems/)
- 1,145 cases and 1,142 controls of European ancestry
- The 22 Gene Ontology terms from Biological Process used to define priors in A and Z
- Included 6,078 SNPs, where each SNP had GO annotation and had lowest p-value in gene
![Page 44: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/44.jpg)
Top 10 interactions found
| Interaction | Non-inf prior β(SE) | Non-inf prior BF | Inf prior β(SE) | Inf prior BF |
|---|---|---|---|---|
| PARK2*SORCS1 | 0.22(0.06) | 1e4 | 0.27(0.06) | 5e4 |
| AK5*ARHGAP26 | 0.16(0.05) | 427 | 0.17(0.05) | 903 |
| FGFR2*MAML2 | -0.11(0.04) | 1 | -0.16(0.05) | 686 |
| SHC3*KIF13B | N/A | N/A | 0.17(0.05) | 621 |
| PCLO*ME3 | N/A | N/A | 0.18(0.05) | 528 |
| CNGA3*CNN1 | -0.16(0.05) | 41 | -0.17(0.05) | 462 |
| FGFR2*CDT1 | N/A | N/A | -0.16(0.05) | 445 |
| SHC3*CXCL16 | N/A | N/A | -0.18(0.05) | 403 |
| FGFR2*ABCA1 | -0.1(0.05) | 158 | -0.11(0.05) | 268 |
| CYP2J2*SORCS1 | -0.11(0.05) | 74 | -0.14(0.05) | 266 |
| FGFR2*SCG5 | N/A | N/A | 0.21(0.05) | 235 |
![Page 45: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/45.jpg)
Enrichment analysis
- Are the top interactions (BF > 100) enriched for certain GO terms?
- Compute empiric p-value for enrichment
  - For each iteration, permute within bins representative of non-independence in observed interactions
  - Pool bins, compute frequency of a GO term in the pool
  - p-value: number of iterations in which the permuted frequency exceeded the observed frequency, divided by 1 million
- Enriched terms: biological regulation (p=.008), growth (p=1e−6), metabolic process (p=.008), and regulation of biological process (p=.003).
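The final step of the permutation procedure, turning permuted frequencies into an empirical p-value, can be sketched as below. The binning and permutation scheme is not reproduced; the null frequencies here are made-up numbers standing in for its output, and the function name is hypothetical.

```python
def empirical_p_value(observed_freq, permuted_freqs):
    """Fraction of permutation iterations whose frequency exceeded the observed one."""
    n_exceed = sum(1 for f in permuted_freqs if f > observed_freq)
    return n_exceed / len(permuted_freqs)


# Hypothetical GO-term frequencies from four permutation iterations.
permuted = [0.10, 0.60, 0.70, 0.20]
p_value = empirical_p_value(0.50, permuted)
```

With 1 million iterations, the smallest nonzero p-value this estimator can return is 1e-6, which corresponds to a single permutation exceeding the observed frequency.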
![Page 46: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/46.jpg)
![Page 47: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/47.jpg)
Incorporate gene-expression data into GWAS analyses
- Developing priors
  - Should be more informative (e.g. empirical) and granular (e.g. SNP level) than GO
  - Obtain genotype-expression paired data: HapMap?
  - Apply WGCNA to infer pathway modules
  - Genotype-module correlations used in Z matrix
- Incorporate more advanced MCMC techniques
  - Evolutionary Monte Carlo
  - Multiple-try Metropolis
  - Brute-force search for MAP. Use MAP for initial values?
![Page 48: Integration of biological annotations using hierarchical modeling](https://reader034.fdocuments.us/reader034/viewer/2022052621/558b9a67d8b42a3b188b456f/html5/thumbnails/48.jpg)
Acknowledgements
- James Baurley
- David Conti
- Angela Presson (thanks in advance!)
- Funding: R01 ES016813 and R01 ES015090.