On utility of gene set signatures in gene expression-based class prediction
Minca Mramor, Marko Toplak, Gregor Leban, Tomaž Curk, Janez Demšar and Blaž Zupan
MLSB, Ljubljana 2009
Class Prediction & Background knowledge
Central to machine learning research
Inclusion of background knowledge:
- increase model stability
- increase predictive accuracy
- increase interpretability
Domain knowledge in systems biology
Sources:
- gene structure & function
- biological pathways
- protein interactions
- literature references
analysis of high-throughput data (DNA microarrays, proteomics data, SNP analysis)
Gene expression microarrays
9698 attributes (genes)
54 data instances (samples)
GDS1059: Analysis of mononuclear cells from 54 chemotherapy treated patients less than 15 years of age with acute myeloid leukemia (AML). Results identify expression patterns associated with complete remission and relapse with resistant disease.
Gene sets as background knowledge
GENE SETS – groups of related genes (gene structure, molecular function, biological pathways)
Explorative analysis:
- functional annotations (gene ontology)
- enrichment analysis
Gains in:
- stability & robustness
- insight into the investigated problem
Goal
Use gene sets in inference of class prediction models – Setsig method
Test the gene-set based models:
- across a larger set of data sets
- across different transformation methods
- in comparison with gene-based models
Gene set transformation
Setsig method
Related work
Unsupervised approaches:
– Mean* and Median* (Guo et al., 2005)
– Principal component analysis* (Liu et al., 2007)
– Singular value decomposition (Tomfohr et al., 2005 and Bild et al., 2006)
Supervised approaches:
– Partial least squares (Liu et al., 2007)
– PCA with relevant gene selection (Chen et al., 2008)
– Activity scores based on condition-responsive genes* (Lee et al., 2009)
– Gene Set Analysis (Efron and Tibshirani, 2007)
– ASSESS* (Edelman et al., 2006)
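As an illustration of the simplest unsupervised approaches listed above (Mean and Median), the sketch below collapses one sample's gene expression values into one score per gene set. It is a minimal sketch, not the implementation used in the cited work: the function name `gene_set_scores`, the dictionary-based data layout, and the toy gene and set names are all assumptions made for this example.

```python
from statistics import mean, median


def gene_set_scores(sample, gene_sets, summarize=mean):
    """Collapse one sample's expression values into one score per gene set.

    sample     -- dict mapping gene name -> expression value
    gene_sets  -- dict mapping gene set name -> iterable of gene names
    summarize  -- function applied to the values of the measured genes
    """
    scores = {}
    for set_name, genes in gene_sets.items():
        # Use only the genes of the set that were actually measured.
        values = [sample[g] for g in genes if g in sample]
        if values:  # skip sets with no measured genes
            scores[set_name] = summarize(values)
    return scores


# Toy data (hypothetical gene and set names, for illustration only).
sample = {"g1": 1.0, "g2": 3.0, "g3": -2.0, "g4": 0.5}
gene_sets = {"set_a": ["g1", "g2", "g3"], "set_b": ["g3", "g4", "g9"]}

mean_scores = gene_set_scores(sample, gene_sets, mean)
median_scores = gene_set_scores(sample, gene_sets, median)
```

Applying the function to every sample turns a samples × genes table into a samples × gene sets table, which can then be passed to any learner.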
Experimental design
Data sets
30 data sets from Gene Expression Omnibus (GEO):
- 2 diagnostic classes
- at least 20 samples
- 20 – 187 samples
- 932 – 34700 genes
preprocessing: μ = 0, σ² = 1
Gene sets
Molecular Signatures Database (Subramanian et al., 2005)
biological knowledge collections:
- C2 - canonical pathways (639)
- C5 - gene ontology (1221)
gene set size: 5 < genes < 200
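The two preparation steps on this slide (standardizing to μ = 0, σ² = 1 and keeping gene sets with 5 < genes < 200) could look roughly as follows. This is a sketch under assumptions: the slide does not say whether standardization runs per gene or per sample, whether the population or sample variance is used, or whether set size is counted over all member genes or only the measured ones. The function names and the toy gene names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a vector of expression values to mean 0 and variance 1.

    Assumption: population standard deviation; the slide does not specify.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant vector: nothing to scale
    return [(v - mu) / sigma for v in values]


def filter_gene_sets(gene_sets, measured_genes, min_size=5, max_size=200):
    """Keep gene sets with more than min_size and fewer than max_size genes.

    Assumption: size is counted over genes present in the data set.
    """
    measured = set(measured_genes)
    kept = {}
    for name, genes in gene_sets.items():
        present = sorted(set(genes) & measured)
        if min_size < len(present) < max_size:
            kept[name] = present
    return kept


# Toy example (hypothetical values and gene names).
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

measured = [f"g{i}" for i in range(300)]
candidate_sets = {
    "too_small": measured[:5],
    "ok_small": measured[:6],
    "ok_large": measured[:199],
    "too_large": measured[:200],
}
kept = filter_gene_sets(candidate_sets, measured)
```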
Experimental design
Predictive models
learners:
• support vector machines
• k-nearest neighbors
• logistic regression
leave-one-out validation, area under ROC curve (AUC)
original data - GENES
transformed data - GENE SETS:
• Setsig
• Mean
• Median
• PCA
• CORGs
• ASSESS
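The evaluation scheme named above (leave-one-out validation scored by AUC) can be sketched with one of the listed learners, k-nearest neighbors, chosen here only because it needs no external library. The value k = 3, the use of the positive-neighbor fraction as the score, Euclidean distance, and the toy data are assumptions for this example and are not taken from the slides.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, x, k=3):
    """Fraction of positive labels among the k nearest training samples."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def leave_one_out_auc(xs, ys, k=3):
    """Score each sample with a model trained on all other samples,
    then compute one AUC over the held-out scores."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(knn_score(train_x, train_y, xs[i], k))
    return auc(ys, scores)


# Toy data: two well-separated classes (hypothetical values).
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
      (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.3, 2.2)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

loo_auc = leave_one_out_auc(xs, ys)
```

The same loop applies unchanged to gene-level and gene-set-level features, which is what makes the two representations directly comparable.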
Results
Critical distance graph (Demšar, 2006)
Support vector machines:
[Figure: critical distance graph of average AUC ranks]
Results
Critical distance graph (Demšar, 2006)
Logistic regression:
[Figure: critical distance graph of average AUC ranks]
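For readers unfamiliar with critical distance graphs, the sketch below shows how average ranks and the Nemenyi critical distance from Demšar (2006) are computed: methods are ranked on each data set by AUC, ranks are averaged over data sets, and two methods differ significantly if their average ranks differ by more than CD = q_α · sqrt(k(k+1) / (6N)). The example call assumes k = 7 compared representations (genes plus the six gene set transformations) over N = 30 data sets at α = 0.05; that all seven appear in the graphs is an assumption, and the function names and toy AUC values are hypothetical.

```python
import math

# Critical values q_0.05 for the two-tailed Nemenyi test, keyed by the
# number of compared methods (as tabulated in Demšar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def ranks_with_ties(aucs):
    """Rank methods on one data set: highest AUC gets rank 1,
    tied methods share the average of their ranks."""
    order = sorted(range(len(aucs)), key=lambda i: -aucs[i])
    ranks = [0.0] * len(aucs)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and aucs[order[end + 1]] == aucs[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based ranks pos+1..end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def average_ranks(auc_table):
    """auc_table: one row per data set, one column per method."""
    per_dataset = [ranks_with_ties(row) for row in auc_table]
    return [sum(col) / len(auc_table) for col in zip(*per_dataset)]


def critical_distance(n_methods, n_datasets):
    """Nemenyi critical distance at alpha = 0.05."""
    q = Q_ALPHA_005[n_methods]
    return q * math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))


# Toy AUC table: 2 data sets x 3 methods (hypothetical values).
avg = average_ranks([[0.9, 0.8, 0.7], [0.6, 0.8, 0.7]])

# Assumed setting: 7 representations compared over 30 data sets.
cd = critical_distance(7, 30)
```

Under these assumptions the critical distance is about 1.64 rank units.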
Surprising? Yes.
1. Gene sets in explorative data analysis – increase stability and robustness of results
2. The results contradict current reports:
- Edelman et al., 2006 (ASSESS, 6 data sets)
- Lee et al., 2009 (CORGs, 7 data sets)
- Efron & Tibshirani, 2007 (GSA, 1 data set)
Why worse performance?
1. Do gene sets include class-informative genes?
[Figure: average AUC rank]
Why worse performance?
2. Gene set signature transformation loses information.
3. Number of samples is too low to estimate gene set scores.
4. Gene sets and pathways are not specific enough to distinguish between different cancer types.
Gene set based class prediction models
• worse/similar performance (Setsig)
• additional insight:
- VizRank (Mramor et al., 2007)
- Naive Bayes nomogram (Možina et al., 2004)
Thanks to...
• Marko Toplak
• Janez Demšar
• Tomaž Curk
• Gregor Leban
• Blaž Zupan
• Gregor Rot
• Lan Umek
• Aleš Erjavec
• Miha Štajdohar
• Lan Žagar
• Črt Gorup
• Ivan Bratko