NIPS machine learning in computational biology presentation
![Page 1: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/1.jpg)
Order under uncertainty: probabilistic approaches to pseudotime
NIPS Machine Learning in Computational Biology
Kieran Campbell University of Oxford
![Page 2: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/2.jpg)
Outline
Introduction
A probabilistic model for pseudotime
Applications
Discussion
![Page 3: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/3.jpg)
Pseudotime: artificial measure of a cell’s progression through some process
[Figure: pseudotime ordering of unordered profiles into ordered profiles; panels for Gene A and Gene B]
Cell ordering problem: assign each cell a pseudotime based on expression profile
• Genes differentially expressed across pseudotime
• Clusters of co-expression
![Page 4: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/4.jpg)
Current method: monocle
Proliferating cell
Differentiating myoblast
Interstitial mesenchymal cell
Trapnell, Cole, et al. "The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells." Nature biotechnology (2014).
Independent component analysis
Minimum spanning tree
Ordering
![Page 5: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/5.jpg)
What about uncertainty?
All current methods give point estimates of pseudotime
Easy to say whether one cell precedes another
Could have large impact on downstream analyses
![Page 6: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/6.jpg)
Gaussian Process Latent Variable Models
Gaussian processes - nonparametric prior on functions
GP latent variable models assume input parameter x is unknown (t)
Behaviour defined entirely by covariance matrix between different t
Rasmussen & Williams 2006
Lawrence 2004
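The statement that behaviour is defined entirely by the covariance between different t can be illustrated with a minimal sketch. The squared-exponential kernel, the lengthscale and the function names below are assumptions chosen for illustration; the slides do not say which kernel or settings were used.

```python
import numpy as np

def sq_exp_kernel(t, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance between all pairs of latent times t (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_gp_prior(t, n_draws=3, jitter=1e-6, seed=0):
    """Draw functions f(t) ~ GP(0, K) evaluated at the given pseudotimes."""
    rng = np.random.default_rng(seed)
    K = sq_exp_kernel(t) + jitter * np.eye(len(t))  # jitter keeps the Cholesky stable
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((len(t), n_draws))
    return (L @ z).T  # shape (n_draws, len(t))

t = np.linspace(0.0, 1.0, 50)   # hypothetical pseudotimes on [0, 1]
K = sq_exp_kernel(t)
draws = sample_gp_prior(t)      # each row is one smooth curve over pseudotime
```

In a GPLVM the same covariance is used, but t is itself unknown and is inferred together with the curves.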
![Page 7: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/7.jpg)
Probabilistic approaches to pseudotime
Want to learn pseudotime from reduced dimension representation
Bayesian GPLVM to learn probabilistic pseudotime in reduced space
Gives us posterior uncertainty to propagate through to functional analyses
![Page 8: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/8.jpg)
Prior issues
Bayesian inference requires us to define a prior distribution on our parameters
How do we want our pseudotime to look?
• Pseudotime artificial - equivalent on any interval
• Would ideally like to ‘fill out’ on [0,1]
• Identifiability issues
What’s the best strategy?
Repulsive prior - low probability when adjacent cells are close
Wang, Ye, and David B. Dunson. "Probabilistic Curve Learning: Coulomb Repulsion and the Electrostatic Gaussian Process." arXiv preprint arXiv:1506.03768 (2015).
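A rough sketch of what a repulsive prior on pseudotimes can look like. The pairwise sin-based form, the exponent r and the function name are assumptions for illustration; the slides only state that probability should be low when cells are close, not the exact density used.

```python
import numpy as np

def repulsive_log_prior(t, r=1.0, eps=1e-12):
    """Unnormalised log density: sum over pairs of 2r * log|sin(pi * (t_i - t_j))|.

    Pairs of cells with nearly equal pseudotime contribute a large negative
    term, so configurations that spread out over [0, 1] are favoured.
    Note this form is periodic, so it treats 0 and 1 as the same point.
    """
    t = np.asarray(t, dtype=float)
    i, j = np.triu_indices(len(t), k=1)           # all pairs i < j
    s = np.abs(np.sin(np.pi * (t[i] - t[j])))
    return float(np.sum(2.0 * r * np.log(s + eps)))

spread = np.linspace(0.05, 0.95, 10)    # pseudotimes filling out [0, 1]
clumped = np.linspace(0.45, 0.55, 10)   # pseudotimes bunched together
```

Under this sketch the spread-out configuration scores higher than the clumped one, which is the behaviour the slide asks for.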
![Page 9: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/9.jpg)
Applications to single-cell RNA-seq datasets
Trapnell et al. 2014 (Monocle) - differentiating myoblasts time series data
Shin et al. 2015 (Waterfall) - adult hippocampal neurogenesis
Burns et al. 2015 (Ear) - sensory epithelia in the inner ear
![Page 10: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/10.jpg)
Low dimensional representations
Monocle: Laplacian eigenmaps representation
Ear: Laplacian eigenmaps representation
Waterfall: PCA representation
Uncertainty in posterior mean curve (trajectory)
Diffuseness of predictive data distribution
![Page 11: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/11.jpg)
Posterior uncertainty in pseudotime
Four cells drawn from Monocle dataset (155 cells in total)
95% credible interval typically covers ~1/4 of the pseudotime range
Can tell whether a cell is at the start, middle or end of a process
“This cell has a pseudotime of 0.12 and this one 0.14” doesn’t make sense
![Page 12: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/12.jpg)
Posterior uncertainty in pseudotime
![Page 13: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/13.jpg)
Approximating the false discovery rate
Inference gives us samples from the pseudotime posterior
Refit differential expression model for each gene for each sample
Compute p and q values for each sample for each gene
Compute proportion significant for each gene across all samples
Compare to point estimate: false positive if q < 0.05 but proportion significant < 0.95
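The steps above can be sketched as follows, starting from p-values that have already been computed. Benjamini-Hochberg q-values, the function names, and defining the approximate FDR as false positives divided by genes called significant at the point estimate are assumptions; the slides do not specify the differential expression model or the multiple-testing correction.

```python
import numpy as np

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted p-values for a 1-D array of p-values."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

def approximate_fdr(p_point, p_samples, alpha=0.05, min_prop=0.95):
    """p_point: (genes,) p-values from the point-estimate pseudotime.
    p_samples: (samples, genes) p-values, one row per posterior pseudotime sample.
    Returns (afdr, proportion significant per gene)."""
    q_point = bh_qvalues(p_point)
    q_samples = np.apply_along_axis(bh_qvalues, 1, np.asarray(p_samples, dtype=float))
    prop_sig = (q_samples < alpha).mean(axis=0)
    called = q_point < alpha                      # significant at the point estimate
    false_pos = called & (prop_sig < min_prop)    # not robust across posterior samples
    afdr = false_pos.sum() / max(called.sum(), 1)
    return float(afdr), prop_sig
```

Refitting the model for every gene and every posterior sample is the expensive part; the bookkeeping above is cheap by comparison.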
![Page 14: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/14.jpg)
Approximate false discovery rates
AFDR varies from 4% to 16%
Variable between datasets
Up to around 3x expected, so if you need robust differential expression use a probabilistic approach
Examining genes in pathways still valid
![Page 15: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/15.jpg)
Effect of smoothing parameters
Covariance matrix for each dimension
Corresponds to arc-length
Set a hierarchical prior on λ to penalise longer curves
Need some prior expectation of how the pseudotime will look with respect to marker genes
Small levels of shrinkage lead to unstable fits (lumpy posteriors)
Any unsupervised learning of pseudotimes in single-cell genomics requires these smoothness considerations
![Page 16: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/16.jpg)
Initial dimensionality reduction step
We lose some uncertainty in the initial dimensionality reduction step. But…
• Posterior already highly multi-modal using two (optimised) reduced dimensions
• Informative to visualise and understand representations with respect to clusters and marker genes
• Most methods involve some dimensionality reduction first - important to understand uncertainty
One solution: use (Bayesian) Hierarchical GPLVM
[Figure labels: Dimension D, 3, 2, 1]
![Page 17: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/17.jpg)
Multiple representation learning
Likelihood conditionally independent across latent dimensions
Naturally extend to integrate different reduced dimension representations (multiview learning)
Framework for integrating heterogeneous data sources
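Because the likelihood is conditionally independent across latent dimensions, the joint log-likelihood given the pseudotimes is a sum of per-dimension terms, and adding another representation just adds more terms. The sketch below assumes a squared-exponential kernel with one shared lengthscale and noise level for every dimension, which is a simplification for illustration; all function names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(t, lengthscale, noise):
    """Squared-exponential covariance over pseudotimes plus observation noise (assumed form)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2) + noise * np.eye(len(t))

def gp_log_marginal(y, K):
    """log N(y | 0, K), computed through a Cholesky factorisation."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return float(-0.5 * alpha @ alpha
                 - np.log(np.diag(L)).sum()
                 - 0.5 * len(y) * np.log(2.0 * np.pi))

def joint_log_likelihood(t, representations, lengthscale=0.3, noise=0.1):
    """Sum of GP log marginal likelihoods over every column of every representation.

    representations: list of (cells, dims) arrays, e.g. Laplacian eigenmaps and PCA
    embeddings of the same cells, all sharing the pseudotime vector t.
    """
    K = sq_exp_kernel(t, lengthscale, noise)
    return sum(gp_log_marginal(X[:, d], K)
               for X in representations
               for d in range(X.shape[1]))
```

Calling `joint_log_likelihood(t, [X_a, X_b])` with two embeddings of the same cells gives the same value as adding the two single-representation log-likelihoods, which is what makes the multiview extension natural.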
![Page 18: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/18.jpg)
Multiple representation learning (II)
Pseudotimes fit individually to each representation
Pseudotimes fit jointly for all representations
![Page 19: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/19.jpg)
Take home messages
1. Don’t think of pseudotimes as point estimates
2. Use pseudotime as a rough guide for where a cell is through a biological process
3. Your FDR is probably higher than you think it is
4. If you need robust differential expression, use probabilistic methods
5. All pseudotime methods come with prior expectations about structure and smoothness
6. Don’t get caught up with a particular dimensionality reduction algorithm - they all work
![Page 21: NIPS machine learning in computational biology presentation](https://reader031.fdocuments.us/reader031/viewer/2022022203/58739c131a28ab85438b6a9f/html5/thumbnails/21.jpg)
Acknowledgements
Chris Yau
Caleb Webber
Chris Ponting & groups
Michalis Titsias
@kieranrcampbell kieranrcampbell.github.io