Evaluation and integration of multiple datasets using Bayes theorem
![Page 1: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/1.jpg)
John van Dam
![Page 2: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/2.jpg)
How can we integrate multiple datasets?
• Proteomics data
• Genetic data
• Published data
• Expression data
• Evolutionary data
![Page 4: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/4.jpg)
Thomas Bayes (1701 – 1761)
• Presbyterian minister
• Fellow of the Royal Society
• Published two works:
  • A religious essay
  • An essay defending the work of Sir Isaac Newton
• His work on “Bayes’ theorem” was published by Richard Price in 1763
• Mathematics of probabilities
  • A hot topic in science in the early 18th century
  • Many people at the time were interested in mathematics, statistics and probabilities because of gambling!
![Page 5: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/5.jpg)
Bayes’ theorem
• P(A|B) = Probability of A given observation B
• P(B|A) = Probability of observing B given A
• P(A) = The a priori probability of A
• P(B) = The probability that B is observed
• Bayes’ theorem deals with “inverse probabilities”
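Written out with the four quantities defined above, the theorem reads:

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
```

It is called an “inverse probability” because it turns P(B|A), which is usually easy to measure, into P(A|B), which is usually what we want to know.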
![Page 6: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/6.jpg)
Example:
• A friend tells you he had a nice conversation with someone on the train to Nijmegen
  • What is the chance that this other person is a woman?
• Your friend only tells you that this person has long hair.
  • Does this change the previous probability?
• Say:
  • 75% of women have long hair
  • 15% of men have long hair
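The example can be worked through numerically. The sketch below assumes a 50/50 prior for meeting a woman or a man on the train; that prior is not stated in the example and is only an illustration, as are the variable names.

```python
# Worked version of the train example.
p_woman = 0.5              # ASSUMED prior P(W); not given in the example
p_long_given_woman = 0.75  # from the example: 75% of women have long hair
p_long_given_man = 0.15    # from the example: 15% of men have long hair

# P(L): total probability of meeting someone with long hair.
p_long = p_long_given_woman * p_woman + p_long_given_man * (1 - p_woman)

# Bayes' theorem: P(W|L) = P(L|W) * P(W) / P(L)
p_woman_given_long = p_long_given_woman * p_woman / p_long
print(round(p_woman_given_long, 3))  # 0.833
```

Under this assumed prior, hearing about the long hair raises the probability that the person is a woman from 50% to roughly 83%.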
![Page 7: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/7.jpg)
Bayes’ theorem
• What if your friend told you that this person was also wearing high heels?
  • We can use P(W|L) as the new prior!
• This is called Bayesian updating
  • You adjust your ‘belief’ with each new piece of information!
• Bayesian updating assumes no relationship between L and H other than via W (that is, L and H are conditionally independent given W)!
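A minimal sketch of Bayesian updating for this example follows. The helper name `bayes_update`, the 50/50 prior and the high-heel frequencies (20% of women, 1% of men) are all assumptions made up for illustration; only the long-hair figures come from the example.

```python
def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Return P(H | obs) from the prior P(H) and the two conditional probabilities."""
    evidence = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
    return p_obs_given_h * prior / evidence

p_w = 0.5                                 # ASSUMED prior P(W)
p_w_l = bayes_update(p_w, 0.75, 0.15)     # long hair: figures from the example
p_w_lh = bayes_update(p_w_l, 0.20, 0.01)  # high heels: HYPOTHETICAL figures

print(round(p_w_l, 3), round(p_w_lh, 3))  # 0.833 0.99
```

The posterior after the first observation is passed in as the prior for the second, which is only valid under the conditional-independence assumption stated above.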
![Page 8: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/8.jpg)
Bayesian odds
• For convenience we can rewrite Bayes’ equation in terms of odds: the posterior odds are the prior odds multiplied by a likelihood ratio (the Bayes factor)
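For the train example the odds form can be sketched as follows, with W = woman and L = long hair as before; M (man) is a label introduced here for the complementary hypothesis.

```latex
\[
\frac{P(W \mid L)}{P(M \mid L)}
  = \frac{P(L \mid W)}{P(L \mid M)} \cdot \frac{P(W)}{P(M)}
\]
```

The first factor on the right is the likelihood ratio and the second is the prior odds. P(L) cancels because it appears in both the numerator and the denominator.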
![Page 9: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/9.jpg)
Bayesian odds
• If we now perform Bayesian updating we can simply write
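In odds form, each new observation contributes one extra likelihood ratio to a product. The sketch below illustrates this for the train example; the 50/50 prior and the high-heel likelihood ratio are assumptions, and `odds_to_probability` is a helper name chosen here.

```python
import math

def odds_to_probability(odds):
    """Convert odds O into the probability O / (1 + O)."""
    return odds / (1 + odds)

prior_odds = 0.5 / 0.5       # ASSUMED 50/50 prior, so odds of 1
lr_long_hair = 0.75 / 0.15   # from the example: P(L|W) / P(L|M)
lr_high_heels = 0.20 / 0.01  # HYPOTHETICAL P(H|W) / P(H|M)

# Updating in odds form: multiply the prior odds by each likelihood ratio.
posterior_odds = prior_odds * lr_long_hair * lr_high_heels

# The same update in log space is a sum instead of a product.
log_posterior_odds = (
    math.log(prior_odds) + math.log(lr_long_hair) + math.log(lr_high_heels)
)

posterior_probability = odds_to_probability(posterior_odds)
```

With these numbers the posterior odds come out at about 100 to 1, which matches the result of updating the probability step by step.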
![Page 10: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/10.jpg)
Beware of ‘extreme’ cases (or priors)
• “A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.”
  • http://www2.isye.gatech.edu/~brani/isyebayes/jokes.html
• What did we just “probabilistically” describe if the person was actually a man?
![Page 11: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/11.jpg)
How can we integrate multiple datasets?
• Proteomics data
• Genetic data
• Published data
• Expression data
• Evolutionary data
![Page 12: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/12.jpg)
Ciliary biology; a relatively young field
![Page 13: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/13.jpg)
Ciliated tissues (some examples)
• Inner ear: cilia function in hearing and balance
• Cerebral cavities, bronchi & Fallopian tubes
• Retina: cones and rods
• Sperm cells
![Page 14: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/14.jpg)
Bayesian integration on SysCilia data
• Tandem Affinity Purifications & SILAC
• Yeast two-hybrid screens
• Ciliary evolutionary co-occurrence
  • Gene presence/absence profiles matching ciliary presence/absence
• System co-expression
• Genes with XBOX transcription factor binding sites
• What is the probability that gene X is ciliary given that it is reported by experiments 1, 2, 3, …, and n?
![Page 15: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/15.jpg)
Bayesian integration of multiple observations
• n is the number of datasets considered
• fi = dataset i
• P(fi|T) = probability that a gene is reported by dataset i given it is a known ciliary gene
• We take log odds so that the per-dataset ratios are added rather than multiplied: deviations caused by rounding and measurement errors are then not enlarged with each multiplication
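Given the definitions above, the integration is likely of the following naive Bayes form. The notation is an assumption: T stands for “the gene is truly ciliary”, ¬T for its complement, and the datasets are taken to be conditionally independent given T, as in the Bayesian updating example.

```latex
\[
\log \frac{P(T \mid f_1, \dots, f_n)}{P(\neg T \mid f_1, \dots, f_n)}
  = \log \frac{P(T)}{P(\neg T)}
  + \sum_{i=1}^{n} \log \frac{P(f_i \mid T)}{P(f_i \mid \neg T)}
\]
```

The first term on the right is the prior log odds. Each term in the sum is the log likelihood ratio contributed by dataset i.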
![Page 16: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/16.jpg)
Can we say something about genes that were not reported?
• In case of yes/no experiments, “No” can also have meaning.
• If an experiment yields a numeric value, we can divide the values into categories (bins).
• Each gene falls into one category for each experiment.
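One way to treat yes/no results and binned numeric results uniformly is to compute a log likelihood ratio per category. The sketch below uses toy labels and values, hypothetical bin cut-offs and the natural logarithm; none of these come from the SysCilia data.

```python
import math
from collections import Counter

def category_log_ratios(gs_labels, ns_labels):
    """Map each category to ln(P(category | GS) / P(category | NS)).

    Categories seen in only one of the two sets are skipped here, which
    avoids a zero numerator or denominator; a real analysis needs a
    deliberate choice for that case.
    """
    gs_counts, ns_counts = Counter(gs_labels), Counter(ns_labels)
    ratios = {}
    for category in gs_counts.keys() & ns_counts.keys():
        p_gs = gs_counts[category] / len(gs_labels)
        p_ns = ns_counts[category] / len(ns_labels)
        ratios[category] = math.log(p_gs / p_ns)
    return ratios

def to_bin(value, edges=(0.3, 0.7)):
    """Turn a numeric result into 'low', 'mid' or 'high' (HYPOTHETICAL cut-offs)."""
    if value < edges[0]:
        return "low"
    if value < edges[1]:
        return "mid"
    return "high"

# Toy yes/no experiment: 3 of 4 gold-standard genes and 1 of 8 negatives reported.
yes_no = category_log_ratios(["yes", "yes", "yes", "no"], ["yes"] + ["no"] * 7)

# Toy numeric experiment, binned before counting.
gs_values = [0.9, 0.8, 0.75, 0.5, 0.2]
ns_values = [0.1, 0.2, 0.25, 0.4, 0.6, 0.8]
binned = category_log_ratios(
    [to_bin(v) for v in gs_values], [to_bin(v) for v in ns_values]
)
```

In the toy yes/no experiment, “yes” gets a positive log ratio and “no” a negative one, so a gene that was not reported is actively scored down rather than ignored.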
![Page 17: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/17.jpg)
Evaluating True and False per experiment
• We need a list of known ciliary genes (a Gold Standard, GS)
• We need a list of known non-ciliary genes (a Negative Set, NS)
• The likelihood ratio P(fi|T) / P(fi|not T) for experiment i then simply becomes
  (fraction of GS reported by experiment i) / (fraction of NS reported by experiment i)
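The ratio of fractions above can be sketched with plain sets. The gene identifiers are toy placeholders and the function name `likelihood_ratio` is chosen here for illustration.

```python
def likelihood_ratio(reported, gold_standard, negative_set):
    """Fraction of GS genes reported divided by fraction of NS genes reported."""
    frac_gs = len(reported & gold_standard) / len(gold_standard)
    frac_ns = len(reported & negative_set) / len(negative_set)
    if frac_ns == 0:
        # No negative-set gene reported: the ratio is undefined without smoothing.
        raise ValueError("no negative-set genes reported by this experiment")
    return frac_gs / frac_ns

# Toy placeholder identifiers, not real genes.
gold_standard = {"g1", "g2", "g3", "g4"}
negative_set = {"n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"}
reported = {"g1", "g2", "g3", "n1", "x1"}  # x1 is in neither reference set

lr = likelihood_ratio(reported, gold_standard, negative_set)
print(lr)  # 6.0
```

Here the experiment reports 3 of 4 gold-standard genes and 1 of 8 negative-set genes, so a gene reported by it is 6 times more likely to come from the gold standard than from the negative set.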
![Page 18: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/18.jpg)
Gold Standard & Negative set
![Page 19: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/19.jpg)
![Page 20: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/20.jpg)
System co-expression
![Page 21: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/21.jpg)
Distinguishing between ciliary vs. non-ciliary genes
![Page 22: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/22.jpg)
Ranking based on Bayesian integration
The Bayesian integration enriches for known ciliary genes more strongly than the individual datasets do. We can control the false discovery rate.
Legend: Ciliary, Predicted, Non-ciliary
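A ranking of this kind can be sketched by summing, for every gene, the log likelihood ratio of each dataset according to whether the gene was reported. The dataset names, gene labels and ratio values below are toy assumptions. The prior log odds are left out because they are the same for every gene and do not change the order.

```python
import math

def integrated_score(gene, datasets, log_lr_reported, log_lr_absent):
    """Sum the per-dataset log likelihood ratios for one gene."""
    return sum(
        log_lr_reported[name] if gene in genes else log_lr_absent[name]
        for name, genes in datasets.items()
    )

# Toy datasets: which genes each experiment reported.
datasets = {"proteomics": {"A", "B"}, "coexpression": {"A", "C"}}

# HYPOTHETICAL likelihood ratios for "reported" and "not reported".
log_lr_reported = {"proteomics": math.log(6.0), "coexpression": math.log(3.0)}
log_lr_absent = {"proteomics": math.log(0.5), "coexpression": math.log(0.8)}

genes = ["D", "C", "B", "A"]
scores = {
    g: integrated_score(g, datasets, log_lr_reported, log_lr_absent) for g in genes
}
ranking = sorted(genes, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C', 'D']
```

Gene A is reported by both toy datasets and ranks first. Gene D is reported by neither and ends up with a negative score.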
![Page 23: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/23.jpg)
ROC-curve and performance of individual datasets
AUC: 0.86
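For reference, the area under a ROC curve equals the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one. The sketch below computes it that way on toy scores. It is not the procedure used for the figure, and the numbers are invented.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy integrated scores for gold-standard and negative-set genes.
gold_standard_scores = [5.6, 4.9, 3.9, 0.8]
negative_set_scores = [4.0, 1.0, 0.5, -1.2]

auc = roc_auc(gold_standard_scores, negative_set_scores)
print(auc)  # 0.8125
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two gene sets.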
![Page 24: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/24.jpg)
Application of the Bayesian integration
• Predicting causative genes in ciliopathy disease loci or exome data
• Predicting which genes are likely involved in ciliary function, and which are not
• Example: BBS5 locus (182 genes):
| Ensembl Gene ID | Gene symbol | Rank | Score |
|---|---|---|---|
| ENSG00000123607 | TTC21B | 65 | 5.580545431 |
| ENSG00000163093 | BBS5 | 99 | 4.863543816 |
| ENSG00000154479 | CCDC173 | 157 | 3.916022407 |
| ENSG00000081479 | LRP2 | 503 | 0.756945148 |
![Page 25: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/25.jpg)
Conclusion
• Bayesian integration is a powerful way to predict novel ciliary genes by objective evaluation and integration of experimental datasets
• New datasets can easily be incorporated
• You can use such a Bayesian integration to
  • Predict novel ciliary genes
  • Rank target genes from new experiments
  • Predict causative genes in patient exome data
![Page 26: Evaluation and integration of multiple datasets using Bayes theorem](https://reader035.fdocuments.us/reader035/viewer/2022062719/56813055550346895d960712/html5/thumbnails/26.jpg)
Acknowledgements
• Huynen lab, Radboud UMC
• Roepman lab, Radboud UMC
• Oliver Blacque, UCD Dublin
• Ueffing lab, Tübingen