Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College...
![Page 1: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/1.jpg)
Model checks for complex hierarchical models
Alex Lewin and Sylvia Richardson
Imperial College
Centre for Biostatistics
![Page 2: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/2.jpg)
Background and Aims
Many complex models are used in bioinformatics
Classification/clustering can be greatly affected by choice of distributions
Our approach: exploit the structure of the model to perform predictive checks
hierarchical models generally involve exchangeability assumptions
mixture models are partially exchangeable
![Page 3: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/3.jpg)
Outline of Talk
Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)
![Page 4: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/4.jpg)
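Hierarchical mixture model for gene expression data
Data: paired log differences between 2 conditions
g = gene, r = replicate, j = mixture component
$\delta_g$: differential effect for gene g
$\sigma_g^2$: variance for each gene
$y_{gr} \mid \delta_g, \sigma_g \sim N(\delta_g, \sigma_g^2)$
$\delta_g \mid \eta \sim \sum_j w_j h_j(\eta_j)$, $\sigma_g^2 \mid \mu,\tau \sim f(\mu,\tau)$
$w \sim \mathrm{Dirichlet}(1,\dots,1)$, various priors for $\delta_g$, $\sigma_g$
[Diagram: graphical model with nodes $\delta_g$, $\bar{y}_g$, $S_g$, $\sigma_g$, $(\mu,\tau)$, $w_j$, $\eta_j$]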
![Page 5: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/5.jpg)
Mixture model for gene expression data
Many mixture models have been proposed for gene expression data
Set-up is similar to variable selection prior: point mass + alternative distribution
Particular choices for alternative:
Normal (Lönnstedt and Speed)
Uniform (Parmigiani et al)
many others …
![Page 6: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/6.jpg)
Mixture model for gene expression data
Allow for asymmetry in over- and under-expressed genes: 3-component mixture model
$\delta_g \mid \eta \sim w_1 h_1(\eta_1) + w_2 h_2(\eta_2) + w_3 h_3(\eta_3)$
6 knock-out and 5 wildtype mice
MAS5.0 processed data
![Page 7: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/7.jpg)
Mixture model for gene expression data
Classify each gene into mixture components using posterior probabilities
![Page 8: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/8.jpg)
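Choice of mixture prior affects classification results

| Mixture prior for $\delta_g$ | Est. $w_2$ (proportion in null) |
| --- | --- |
| $w_1\,\mathrm{Unif}(-\eta_-, 0) + w_2\,\delta(0) + w_3\,\mathrm{Unif}(0, \eta_+)$ | 0.96 |
| $w_1\,\mathrm{Gam}^-(1.5, \eta_-) + w_2\,\delta(0) + w_3\,\mathrm{Gam}^+(1.5, \eta_+)$ | 0.68 |
| $w_1\,\mathrm{Gam}^-(1.5, \eta_-) + w_2\,N(0, \varepsilon) + w_3\,\mathrm{Gam}^+(1.5, \eta_+)$ | 0.99 |
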
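
To make the three-component priors concrete, here is a minimal sketch of drawing differential effects from the point-mass version with gamma tails. The function name `sample_mixture_prior` is hypothetical, and treating η- and η+ as gamma rate parameters is an assumption, since the slides do not state the parameterization.

```python
import numpy as np

def sample_mixture_prior(n_genes, w, eta_minus, eta_plus, shape=1.5, rng=None):
    """Draw delta_g from w1*Gam-(shape, eta_minus) + w2*delta(0) + w3*Gam+(shape, eta_plus).

    Assumption: eta_minus and eta_plus are gamma *rate* parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Component labels: 0 = under-expressed, 1 = null, 2 = over-expressed.
    z = rng.choice(3, size=n_genes, p=w)
    delta = np.zeros(n_genes)  # the null component is a point mass at zero
    under, over = z == 0, z == 2
    # Gam-: a gamma draw reflected onto the negative half-line.
    delta[under] = -rng.gamma(shape, 1.0 / eta_minus, size=int(under.sum()))
    # Gam+: a gamma draw on the positive half-line.
    delta[over] = rng.gamma(shape, 1.0 / eta_plus, size=int(over.sum()))
    return delta, z

delta, z = sample_mixture_prior(1000, [0.1, 0.8, 0.1], 2.0, 2.0,
                                rng=np.random.default_rng(0))
```

Swapping the point mass for a narrow normal, as in the last row of the table, would only change how the null component is drawn.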
![Page 9: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/9.jpg)
Outline of Talk
Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)
![Page 10: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/10.jpg)
Predictive model checks
Predict new data from the model
Use posterior predictive distribution
Condition on hyperparameters (‘mixed predictive’ *): not very conservative
Get Bayesian p-value for each gene/marker/sample
Use all p-values together (100s or 1000s) to assess model fit
* Gelman, Meng and Stern 1995; Marshall and Spiegelhalter 2003
![Page 11: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/11.jpg)
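Checking distribution for gene variances
Bayesian p-value for gene g:
$p_g = \mathrm{Prob}(S^{\mathrm{mpred}} > S_g^{\mathrm{obs}} \mid \text{data})$
All genes are exchangeable
histogram of p-values for all genes together
[Diagram: graphical model with nodes $\delta_g$, $\bar{y}_g$, $S_g^{\mathrm{obs}}$, $\sigma_g$, $(\mu,\tau)$; the posterior predictive $S_g^{\mathrm{ppred}}$ is drawn given $\sigma_g$, and the mixed predictive $S^{\mathrm{mpred}}$ is drawn given $\sigma^{\mathrm{pred}}$, itself drawn given $(\mu,\tau)$]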
![Page 12: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/12.jpg)
‘Mixed’ v. ‘posterior’ predictive
Predictive p-values for data simulated from the model
Histograms should be Uniform
Mixed predictive distribution much less conservative than posterior predictive
[Histogram panels: Using global distribution; Using gene-specific distributions]
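
A minimal sketch of how the p-values could be summarised for this kind of check follows. The helper name `pvalue_histogram` and the chi-square-style distance from uniformity are illustrative assumptions; the slides rely on visual inspection of the histograms.

```python
import numpy as np

def pvalue_histogram(p_values, n_bins=10):
    """Bin predictive p-values on [0, 1] and measure departure from uniformity."""
    p = np.asarray(p_values, dtype=float)
    p = p[~np.isnan(p)]  # drop p-values that could not be computed
    counts, edges = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    expected = p.size / n_bins  # count per bin under a Uniform(0, 1) distribution
    chi2 = float(np.sum((counts - expected) ** 2 / expected))
    return counts, edges, chi2

# Evenly spread p-values give a flat histogram and a statistic of zero.
flat_counts, _, flat_chi2 = pvalue_histogram(np.linspace(0.05, 0.95, 10))
# p-values piled up near 1 give a large statistic.
skew_counts, _, skew_chi2 = pvalue_histogram(np.full(100, 0.99))
```

Histograms that pile up in the middle would suggest a conservative check, while peaks near 0 or 1 would point to a poorly fitting distribution.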
![Page 13: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/13.jpg)
Checking different variance models
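Model differential expression between 3 transgenic and 3 wildtype mice
$\sigma_g^2 \mid \mu,\tau \sim \mathrm{Gam}(\mu,\tau)$, $\mu$ fixed
$\sigma_g^2 \mid \mu,\tau \sim \mathrm{Gam}(\mu,\tau)$
$\sigma_g^2 \mid \mu,\tau \sim \mathrm{logNorm}(\mu,\tau)$
$\sigma_g^2 = \sigma^2$ for all genes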
![Page 14: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/14.jpg)
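Implementation (MCMC)
$p_g = 0$
for t = 1,…,niter {
$\sigma_t^{\mathrm{pred}} \sim f(\mu_t, \tau_t)$
$S_t^{\mathrm{mpred}} \sim \mathrm{Gam}(m, m(\sigma_t^{\mathrm{pred}})^{-2})$
$p_g \leftarrow p_g + I[S_t^{\mathrm{mpred}} > S_g^{\mathrm{obs}}]$
}
$p_g \leftarrow p_g / \text{niter}$
Just two extra parameters predicted at each iteration
niter = no. MCMC iterations
m = (no. replicates - 1)/2
[Diagram: graphical model with nodes $\delta_g$, $\bar{y}_g$, $S_g^{\mathrm{obs}}$, $\sigma_g$, $(\mu,\tau)$, $\sigma^{\mathrm{pred}}$ and the mixed predictive $S^{\mathrm{mpred}}$]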
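
The loop above can be sketched in Python as follows, under stated assumptions: f(μ,τ) is taken to be a gamma distribution on σ² with shape μ and rate τ, S is the sample variance, and the arrays `mu_draws` and `tau_draws` (hypothetical names) hold one posterior draw per MCMC iteration from an already fitted model. In the demo the hyperparameters are simply fixed at their true values instead of being sampled.

```python
import numpy as np

def mixed_predictive_pvalues(s_obs, mu_draws, tau_draws, n_rep, rng=None):
    """Mixed predictive p-value p_g for each gene's observed sample variance."""
    rng = np.random.default_rng() if rng is None else rng
    s_obs = np.asarray(s_obs, dtype=float)
    mu_draws = np.asarray(mu_draws, dtype=float)
    tau_draws = np.asarray(tau_draws, dtype=float)
    m = (n_rep - 1) / 2.0
    # One predicted variance per iteration, drawn from the global distribution
    # (assumed Gam(mu, tau) in shape-rate form).
    sigma2_pred = rng.gamma(shape=mu_draws, scale=1.0 / tau_draws)
    # Predicted sample variance: Gam(m, m / sigma2_pred) in shape-rate form.
    s_pred = rng.gamma(shape=m, scale=sigma2_pred / m)
    # p_g = fraction of iterations in which the prediction exceeds S_g^obs.
    return (s_pred[:, None] > s_obs[None, :]).mean(axis=0)

# Demo on data simulated from the assumed model.
rng = np.random.default_rng(1)
n_genes, n_rep, mu, tau = 500, 6, 2.0, 1.0
sigma2 = rng.gamma(mu, 1.0 / tau, size=n_genes)
y = rng.normal(0.0, np.sqrt(sigma2)[:, None], size=(n_genes, n_rep))
s_obs = y.var(axis=1, ddof=1)
p = mixed_predictive_pvalues(s_obs, np.full(2000, mu), np.full(2000, tau), n_rep, rng)
```

Only `sigma2_pred` and `s_pred` are added per iteration, which matches the "two extra parameters" remark; the same predicted values are compared against every gene.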
![Page 15: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/15.jpg)
Outline of Talk
Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)
![Page 16: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/16.jpg)
Checking mixture prior
$\delta_g \mid \eta \sim w_1 h_1(\eta_1) + w_2 h_2(\eta_2) + w_3 h_3(\eta_3)$
OR
$\delta_g \mid \eta, z_g = j \sim h_j(\eta_j)$, j = 1,…,3
$P(z_g = j) = w_j$
Model checking: focus on separate mixture components
![Page 17: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/17.jpg)
Issues for mixture model checking
$\delta_g \mid \eta, z_g = j \sim h_j(\eta_j)$, j = 1,…,3
Think about MCMC iterations …
Mixture component is estimated from genes currently assigned to that component
Can only define p-value for given gene and mix. component when the gene is assigned to that component (i.e. condition on $z_g$ in p-value)
So check each component using only the genes currently assigned (i.e. condition on $z_g$ in histogram)
![Page 18: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/18.jpg)
Predictive checks for mixture model
Bayesian p-value for gene g and mix. component j:
$p_{gj} = \mathrm{Prob}(\bar{y}_{gj}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}} \mid \text{data}, z_g = j)$
Genes assigned to the same mix. component are exchangeable
histogram of p-values for each mix. component separately
histogram for component j made only from genes with large $P(z_g = j)$
[Diagram: graphical model with nodes $\delta_g$, $\delta_j^{\mathrm{pred}}$, $w_j$, $\bar{y}_g$, $S_g$, $\bar{y}_{gj}^{\mathrm{mpred}}$, $\sigma_g$, $(\mu,\tau)$, $\eta_j$]
![Page 19: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/19.jpg)
Condition on classification to check separate components
Effectively we condition on a best classification
Predictive p-values for data simulated from the model
[Histogram panels: All genes with $P(z_g = j) > 0$; Only genes with $P(z_g = j) > 0.5$]
![Page 20: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/20.jpg)
Checking different mixture distributions
$w_1\,\mathrm{Unif}(-\eta_-, 0) + w_2\,\delta(0) + w_3\,\mathrm{Unif}(0, \eta_+)$
Outer mix. components skewed too much away from zero
Null component too narrow
![Page 21: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/21.jpg)
Checking different mixture distributions
$w_1\,\mathrm{Gam}^-(1.5, \eta_-) + w_2\,\delta(0) + w_3\,\mathrm{Gam}^+(1.5, \eta_+)$
Outer components skewed in the opposite direction
Null still too narrow?
![Page 22: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/22.jpg)
Checking different mixture distributions
$w_1\,\mathrm{Gam}^-(1.5, \eta_-) + w_2\,N(0, \varepsilon) + w_3\,\mathrm{Gam}^+(1.5, \eta_+)$
Better fit for all components
![Page 23: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/23.jpg)
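Implementation
$p_{gj} = 0$
for t = 1,…,niter {
$\delta_{jt}^{\mathrm{pred}} \sim h_j(\eta_{jt})$, j = 1,…,3
$\bar{y}_{gt}^{\mathrm{mpred}} \sim N(\delta_{jt}^{\mathrm{pred}}, \sigma_g^2/n_{\mathrm{rep}})$ for $j = z_{gt}$
$p_{gj} \leftarrow p_{gj} + I[\bar{y}_{gt}^{\mathrm{mpred}} > \bar{y}_g^{\mathrm{obs}}]$ for $j = z_{gt}$
}
$p_{gj} \leftarrow p_{gj} / \text{niter}(z_g = j)$
Need ≈ngenes extra parameters at each iteration
[Diagram: graphical model with nodes $\delta_g$, $\delta_j^{\mathrm{pred}}$, $w_j$, $\bar{y}_g$, $S_g$, $\bar{y}_{gj}^{\mathrm{mpred}}$, $\sigma_g$, $(\mu,\tau)$, $\eta_j$]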
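
A minimal Python sketch of this component-wise loop is given below. It assumes the MCMC output is already available as arrays with hypothetical names: `z_draws` (allocations per iteration and gene), `delta_pred` (one predicted effect per iteration and component, already drawn from the chosen h_j) and `sigma2_draws` (gene variances per iteration). It is an illustration under those assumptions, not the authors' software.

```python
import numpy as np

def component_pvalues(ybar_obs, z_draws, delta_pred, sigma2_draws, n_rep,
                      n_comp=3, rng=None):
    """Predictive p-values p_gj, conditioning on the allocation z_g = j.

    ybar_obs     : (n_genes,) observed mean log differences
    z_draws      : (n_iter, n_genes) integer allocations in 0..n_comp-1
    delta_pred   : (n_iter, n_comp) predicted effects, one per component
    sigma2_draws : (n_iter, n_genes) draws of the gene variances
    """
    rng = np.random.default_rng() if rng is None else rng
    n_iter, n_genes = z_draws.shape
    genes = np.arange(n_genes)
    exceed = np.zeros((n_genes, n_comp))
    counts = np.zeros((n_genes, n_comp))
    for t in range(n_iter):
        j = z_draws[t]  # current component of each gene
        ybar_pred = rng.normal(delta_pred[t, j], np.sqrt(sigma2_draws[t] / n_rep))
        exceed[genes, j] += ybar_pred > ybar_obs
        counts[genes, j] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        p = exceed / counts  # NaN where gene g was never allocated to component j
    return p, counts / n_iter  # p_gj and the estimated P(z_g = j)

# The histogram for component j would then use only confidently allocated genes,
# e.g. p[prob[:, j] > 0.5, j].
```

Each gene contributes to a component's p-value only in the iterations where it is allocated there, which is why the divisor is the number of iterations with z_g = j.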
![Page 24: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/24.jpg)
Summary of model checking procedure
1. Find part of model where individuals are assumed to be exchangeable (so information is shared)
2. Choose test statistic T (e.g. sample mean or variance)
3. Predict Tpred from distribution for exchangeable individuals (whole posterior for Tpred)
4. Compare observed Ti for each individual i to distribution of Tpred
5. For checking mixture components, condition on the best classification
![Page 25: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/25.jpg)
Outline of Talk
Mixture model for gene expression data
Model checks for mixture model
distribution for gene-specific variances
different mixture priors
Future work: model checks for a clustering and variable selection model (Tadesse et al. 2005)
![Page 26: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/26.jpg)
Clustering and variable selection (Tadesse et al. 2005)
$y_i$ = vector of gene expression for each sample i = 1,…,n
Multivariate mixture model for clustering samples:
$y_i \mid z_i = j \sim \mathrm{MVN}(\zeta_j, \Lambda_j)$, j = 1,…,J
$P(z_i = j) = w_j$
No. of mix. components (J) is estimated in the model
Aim to select genes which are informative for clustering the samples
![Page 27: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/27.jpg)
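Clustering and variable selection (Tadesse et al. 2005)
i = sample, g = gene, j = mix. component
γ = vector of indices of selected variables
γ’ = vector of indices of variables not used to cluster samples
$P(\gamma_g = 1) = \phi$
Likelihood conditional on allocation to mixture:
$$\text{Likelihood} \mid z \;\propto\; \prod_{j} \exp\Big(-\tfrac{1}{2} \sum_{i \in C_j} (y_{i(\gamma)} - \mu_{j(\gamma)})^T \Sigma_{j(\gamma)}^{-1} (y_{i(\gamma)} - \mu_{j(\gamma)})\Big) \times \exp\Big(-\tfrac{1}{2} \sum_{i=1}^{n} (y_{i(\gamma')} - \eta_{(\gamma')})^T \Omega_{(\gamma')}^{-1} (y_{i(\gamma')} - \eta_{(\gamma')})\Big)$$
Conjugate priors on multivariate means and covariance matrices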
![Page 28: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/28.jpg)
Clustering and variable selection (Tadesse et al. 2005)
i = sample, g = gene, j = mix. component
Model checking: want to check the distribution for each mixture component separately (conditional on J)
In addition, need to condition on a given variable selection
Clearly impossible computationally
[Diagram: graphical model with nodes $\mu_{j(\gamma)}$, $\Sigma_{j(\gamma)}$, $y_i$, $y_{(\gamma)j}^{\mathrm{pred}}$, $w_j$, $\eta_{(\gamma)}$, $\Omega_{(\gamma)}$, $\phi$, $J$]
![Page 29: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/29.jpg)
Computing predictive p-values
1) Run model with no prediction
2) Find the best configuration:
set of selected variables (γ)
no. mixture components J
allocation of samples to mixture components $z_i$
3) Re-run model, with (γ), J and $z_i$ fixed, and calculate predictive p-values
$p_{ij} = \mathrm{Prob}(T_j^{\mathrm{pred}} > T_i^{\mathrm{obs}} \mid \text{data}, z_i = j, J, (\gamma))$
where $T = |y|^2$ (for example)
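
Step 3 could look roughly like the sketch below, which assumes the re-run with (γ), J and z_i fixed has produced posterior draws of each component's mean and covariance for the selected variables. The array names `mean_draws` and `cov_draws` are hypothetical, and T = |y|² is the example statistic from the slide.

```python
import numpy as np

def cluster_pvalues(y_sel, z, mean_draws, cov_draws, rng=None):
    """p_ij for each sample i in its fixed cluster j = z[i], using T = |y|^2.

    y_sel      : (n, p) observed data restricted to the fixed selected variables
    z          : (n,) fixed allocation of samples to components 0..J-1
    mean_draws : (n_iter, J, p) posterior draws of the component means
    cov_draws  : (n_iter, J, p, p) posterior draws of the component covariances
    """
    rng = np.random.default_rng() if rng is None else rng
    y_sel = np.asarray(y_sel, dtype=float)
    z = np.asarray(z)
    n_iter, n_comp = mean_draws.shape[:2]
    t_obs = np.sum(y_sel ** 2, axis=1)  # observed T_i = |y_i|^2
    exceed = np.zeros(len(z))
    for t in range(n_iter):
        for j in range(n_comp):
            # Predict one new sample from component j at this iteration.
            y_pred = rng.multivariate_normal(mean_draws[t, j], cov_draws[t, j])
            t_pred = float(y_pred @ y_pred)
            members = z == j
            exceed[members] += t_pred > t_obs[members]
    return exceed / n_iter
```

Because the allocation is fixed, every sample is compared with the predictions for its own component in every iteration, so the divisor is simply the number of iterations.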
![Page 30: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/30.jpg)
Conclusions
Choice of model distributions can greatly influence results of clustering and classification
For models where information is shared across individuals, predictive checks can be used as an alternative to cross-validation
Should be possible to do this even for quite complex models (if you can fit the model, you can check it)
![Page 31: Model checks for complex hierarchical models Alex Lewin and Sylvia Richardson Imperial College Centre for Biostatistics.](https://reader036.fdocuments.us/reader036/viewer/2022062404/5515e163550346cf6f8b4d1e/html5/thumbnails/31.jpg)
Acknowledgements
Collaborators on BBSRC Exploiting Genomics Grant
Natalia Bochkina, Clare Marshall
Peter Green
Meeting on model checking in Cambridge
David Spiegelhalter
Shaun Seaman
BBSRC Exploiting Genomics Grant
Paper and software at http://www.bgx.org.uk/