Learning disjunctions in Geronimo’s regression trees
Felix Sanchez Garcia
supervised by Prof. Dana Pe’er
Motivation
• Glioblastoma: most common primary brain tumour in adults.
• Newly diagnosed patients have an average survival of 1 year.
• Need for better models of the network.
• Data used to create models: microarrays
  – # genes: 8000
  – # candidate regulators: 800
  – # samples: 120
Module networks
• Bayesian model that benefits from the high correlation of groups of variables [2]
• Algorithm similar to EM (but with hard decisions). Loop:
  – Module assignment step: assign variables to modules
  – Structure search step: calculate CPD for each module
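The two-step loop above can be sketched in a few lines. This is only an illustration under a strong simplifying assumption: the per-module CPD is replaced by a mean expression profile across samples (which makes the loop behave like k-means), whereas the slides use regression trees. The function names and the round-robin initialisation are hypothetical.

```python
def fit_module_profiles(data, assignment, n_modules):
    """Stand-in for the structure search step: one mean profile per module.
    (Assumption: the real CPD is a regression tree, not a mean profile.)"""
    n_samples = len(next(iter(data.values())))
    profiles = []
    for m in range(n_modules):
        members = [g for g, a in assignment.items() if a == m]
        if not members:
            profiles.append([0.0] * n_samples)  # empty module: flat profile
            continue
        profiles.append([sum(data[g][s] for g in members) / len(members)
                         for s in range(n_samples)])
    return profiles


def assign_modules(data, profiles):
    """Module assignment step: hard-assign each gene to its best-fitting module."""
    assignment = {}
    for gene, values in data.items():
        errors = [sum((v - p) ** 2 for v, p in zip(values, profile))
                  for profile in profiles]
        assignment[gene] = errors.index(min(errors))
    return assignment


def learn_modules(data, n_modules, max_iter=50):
    """Alternate the two steps until the hard assignment stops changing."""
    assignment = {g: i % n_modules for i, g in enumerate(sorted(data))}
    for _ in range(max_iter):
        profiles = fit_module_profiles(data, assignment, n_modules)
        new_assignment = assign_modules(data, profiles)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


# Toy data (made up): genes a, b co-vary; genes c, d co-vary in the opposite way.
toy = {
    "a": [1.0, 1.0, -1.0, -1.0],
    "b": [1.1, 0.9, -1.0, -1.1],
    "c": [-1.0, -1.0, 1.0, 1.0],
    "d": [-0.9, -1.1, 1.0, 1.2],
}
modules = learn_modules(toy, n_modules=2)
print(modules)
```

Because each step makes a hard decision, the loop stops as soon as no gene changes module, which is the sense in which it resembles EM with hard assignments.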
[Figure: Module 1, Module 2, Module 3, Module 4]
Regression trees as CPD
• Regression trees are used for each module’s CPD
• Internal nodes: condition on a single variable
• Leaf nodes: parameters for normal distribution
• Bayesian score, with two components:
  – pdf of normal-gamma
  – prior on structure (complexity + biological penalties)
• Exhaustively calculates score for each split for each regulator (target gene’s values sorted by regulator)
[Figure: example tree with node conditions x<0.3 and y>-0.2]
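A minimal sketch of the exhaustive split search, assuming the leaf score is the standard log marginal likelihood of a normal model under a normal-gamma prior. The hyperparameter defaults (`mu0`, `kappa0`, `alpha0`, `beta0`) and the constant `split_penalty` standing in for the structure prior are assumptions for illustration, not values from the slides.

```python
import math


def log_marginal_normal_gamma(values, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of `values` under a normal likelihood with a
    normal-gamma prior on (mean, precision). Hyperparameters are assumed."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    sse = sum((v - mean) ** 2 for v in values)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * sse + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


def best_split(regulators, target, split_penalty=0.0):
    """Try every split point of every regulator; return (score, name, threshold).
    `split_penalty` is a hypothetical stand-in for the structure prior."""
    best = None
    for name, reg_values in regulators.items():
        # Sort the target gene's values by the regulator's values.
        order = sorted(range(len(target)), key=lambda i: reg_values[i])
        sorted_target = [target[i] for i in order]
        for k in range(1, len(order)):
            low, high = reg_values[order[k - 1]], reg_values[order[k]]
            if low == high:
                continue  # no threshold separates equal regulator values
            score = (log_marginal_normal_gamma(sorted_target[:k])
                     + log_marginal_normal_gamma(sorted_target[k:])
                     - split_penalty)
            if best is None or score > best[0]:
                best = (score, name, (low + high) / 2.0)
    return best


# Toy data (made up): regulator x separates the target cleanly, y does not.
regulators = {
    "x": [0.1, 0.2, 0.25, 0.4, 0.5, 0.6],
    "y": [0.5, 0.1, 0.6, 0.2, 0.4, 0.3],
}
target = [-2.0, -2.1, -1.9, 2.0, 2.1, 1.9]
score, name, threshold = best_split(regulators, target)
print(name, threshold)
```

Sorting once per regulator means every candidate threshold corresponds to a prefix/suffix cut of the sorted target values, which is what makes the exhaustive search tractable.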
Incorporating pathway information
• Biological pathways: contain sets of genes and represent chains of biochemical reactions that perform some function
• Aberrations in glioblastoma tend to occur as disjunctions within pathways: deregulating one component is usually enough to alter the function of the whole pathway [4]
• Idea: use pathway information to obtain a better model
• Methodology: extend node conditions to disjunctions of conditions on pathway elements
• We will use 15 sets of regulators (20-30 genes per set)
  – 5 sets of regulators of pathways known to be related to cancer
  – 5 sets of regulators of other pathways
  – 5 sets of regulators chosen at random
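To make the extended node condition concrete, here is a small sketch of a tree node whose test is a disjunction of single-gene threshold conditions. The class name `Threshold`, the helper functions, and the gene names `gene_a` and `gene_b` are hypothetical; the thresholds are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Threshold:
    """One single-variable condition, e.g. gene_a < 0.3."""
    gene: str
    op: str        # "<" or ">"
    value: float

    def holds(self, sample: Dict[str, float]) -> bool:
        x = sample[self.gene]
        return x < self.value if self.op == "<" else x > self.value


def disjunction_holds(conditions: Sequence[Threshold],
                      sample: Dict[str, float]) -> bool:
    """The node fires if ANY pathway element satisfies its condition."""
    return any(c.holds(sample) for c in conditions)


def partition(conditions: Sequence[Threshold],
              samples: Sequence[Dict[str, float]]) -> Tuple[List[int], List[int]]:
    """Split sample indices into (condition true, condition false)."""
    true_idx: List[int] = []
    false_idx: List[int] = []
    for i, sample in enumerate(samples):
        (true_idx if disjunction_holds(conditions, sample) else false_idx).append(i)
    return true_idx, false_idx


# Hypothetical node: (gene_a < 0.3) OR (gene_b > -0.2)
node = [Threshold("gene_a", "<", 0.3), Threshold("gene_b", ">", -0.2)]
samples = [
    {"gene_a": 0.1, "gene_b": -1.0},  # first condition holds
    {"gene_a": 0.9, "gene_b": 0.5},   # second condition holds
    {"gene_a": 0.9, "gene_b": -1.0},  # neither holds
]
print(partition(node, samples))  # ([0, 1], [2])
```

A single-variable node is the special case of a disjunction with one condition, so the ordinary tree is still representable.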
Problem setting
• Concept class: disjunctions of threshold functions, each on a single variable
• Loss function: negative Bayesian score (biological penalty?)
• Potential number of hypotheses: 2^{m}
• Related classification problem tackled by Marchand and Shah (2005) [6] and Kestler et al. (2006) [5].
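The slides do not define m. Assuming it denotes the number of candidate single-variable threshold conditions available at a node, each condition is either included in the disjunction or not, which gives 2^m subsets. The short enumeration below illustrates that count on three made-up conditions; it is a sketch of why brute-force search does not scale, not a proposed search procedure.

```python
from itertools import combinations


def enumerate_disjunctions(conditions):
    """Yield every subset of the candidate conditions (including the empty one)."""
    for size in range(len(conditions) + 1):
        for subset in combinations(conditions, size):
            yield subset


# Made-up candidate conditions, written as strings for readability.
candidates = ["x<0.3", "y>-0.2", "z>1.0"]
hypotheses = list(enumerate_disjunctions(candidates))
print(len(hypotheses))  # 8, i.e. 2**3
```

With 20-30 genes per regulator set and one fixed threshold per gene, the same count would already be between 2^20 and 2^30.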
Bibliography
1. Pe'er, D., Bayesian Network Analysis of Signaling Networks: A Primer. Sci. STKE, 2005. 2005(281): p. pl4.
2. Segal, E., et al., Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, 2003. 34(2): p. 166-176.
3. Lee, S.-I., et al., Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proceedings of the National Academy of Sciences, 2006. 103(38): p. 14062-14067.
4. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008. 455(7216): p. 1061-1068.
5. Kestler, H., W. Lindner, and A. Müller, Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles, in Artificial Neural Networks in Pattern Recognition. 2006. p. 286-297.
6. Marchand, M. and M. Shah, PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data, in Advances in Neural Information Processing Systems 17. 2005, MIT Press: Cambridge, MA. p. 881-888.