Learning disjunctions in Geronimo’s regression trees Felix Sanchez Garcia supervised by Prof. Dana Pe’er


Page 2: Motivation
• Glioblastoma: the most common primary brain tumour in adults.
• Newly diagnosed patients have an average survival of 1 year.
• Need for better models of the regulatory network.
• Data used to create models: microarrays
– # genes: 8000
– # candidate regulators: 800
– # samples: 120

Page 3: Module networks
• Bayesian model that benefits from high correlation of groups of variables [2]
• Algorithm similar to EM, but with hard decisions. Loop:
– Module assignment step: assign variables to modules
– Structure search step: calculate the CPD for each module
[Figure: variables grouped into Module 1, Module 2, Module 3 and Module 4]
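The two-step loop above can be sketched in a few lines. This is only a toy illustration under strong assumptions: each module's "CPD" is reduced to a mean expression profile across samples (the actual method learns a regression tree per module), so the loop collapses to a k-means-like hard clustering. The names `fit_module_profiles`, `assign_modules` and `learn_modules`, and the caller-supplied initial assignment, are hypothetical.

```python
def fit_module_profiles(data, assignment, n_modules):
    """Toy stand-in for structure search: one mean profile per module."""
    n_samples = len(next(iter(data.values())))
    profiles = []
    for module in range(n_modules):
        members = [g for g, m in assignment.items() if m == module]
        if not members:
            # Empty module: fall back to a flat profile (an arbitrary choice).
            profiles.append([0.0] * n_samples)
            continue
        profiles.append([
            sum(data[g][s] for g in members) / len(members)
            for s in range(n_samples)
        ])
    return profiles


def assign_modules(data, profiles):
    """Hard decision: each variable goes to the module that fits it best."""
    def squared_error(values, profile):
        return sum((v - p) ** 2 for v, p in zip(values, profile))

    return {
        gene: min(range(len(profiles)),
                  key=lambda m: squared_error(values, profiles[m]))
        for gene, values in data.items()
    }


def learn_modules(data, assignment, n_modules, max_iter=50):
    """Alternate the two steps until the assignment stops changing."""
    for _ in range(max_iter):
        profiles = fit_module_profiles(data, assignment, n_modules)
        new_assignment = assign_modules(data, profiles)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


# Four made-up genes measured on four samples.
data = {
    "g1": [1.0, 1.0, -1.0, -1.0],
    "g2": [0.9, 1.1, -1.0, -0.9],
    "g3": [-1.0, -1.0, 1.0, 1.0],
    "g4": [-1.1, -0.9, 0.9, 1.0],
}
start = {"g1": 0, "g2": 0, "g3": 1, "g4": 0}
modules = learn_modules(data, start, n_modules=2)
```

Because the decisions are hard, the loop can stop in a local optimum, so the result depends on the starting assignment.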

Page 4: Regression trees as CPD
• Regression trees are used for each module’s CPD
• Internal nodes: condition on a single variable
• Leaf nodes: parameters for a normal distribution
• Bayesian score: combines the pdf of a normal-gamma with a prior on structure (complexity + biological penalties)
• Exhaustively calculates the score for each split for each regulator
[Figure: target gene’s values sorted by regulator; example tree with node conditions x<0.3 and y>-0.2]
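The exhaustive split search can be illustrated as follows. This is a minimal sketch, not the scoring code used in the project: the data term is the standard closed-form log marginal likelihood of a normal model under a normal-gamma prior, the hyperparameters `mu0`, `kappa0`, `alpha0` and `beta0` are placeholder values, and the structure prior is collapsed into a single hypothetical `split_penalty` constant.

```python
import math


def log_marginal(values, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of `values` under a normal-gamma prior."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2
    beta_n = (beta0 + 0.5 * sum_sq
              + kappa0 * n * (mean - mu0) ** 2 / (2 * kappa_n))
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2 * math.pi))


def best_split(regulators, target, split_penalty=0.0):
    """Score every threshold on every regulator; return the best one.

    Returns (regulator, threshold, score); the regulator is None when
    leaving the node unsplit scores highest.
    """
    best = (None, None, log_marginal(target))
    for name, levels in regulators.items():
        # Sort the target gene's values by the regulator's expression.
        order = sorted(range(len(target)), key=lambda i: levels[i])
        for k in range(1, len(order)):
            low, high = levels[order[k - 1]], levels[order[k]]
            if low == high:
                continue  # no threshold separates equal regulator values
            left = [target[i] for i in order[:k]]
            right = [target[i] for i in order[k:]]
            score = log_marginal(left) + log_marginal(right) - split_penalty
            if score > best[2]:
                best = (name, (low + high) / 2, score)
    return best


# Made-up example: regulator "x" separates the target values, "y" does not.
regulators = {
    "x": [0.1, 0.2, 0.25, 0.5, 0.6, 0.7],
    "y": [0.3, -0.1, 0.5, 0.0, 0.4, -0.3],
}
target = [-2.0, -2.1, -1.9, 2.0, 2.1, 1.9]
name, threshold, score = best_split(regulators, target)
```

With roughly 800 candidate regulators and 120 samples, this amounts to on the order of 800 × 119 candidate splits per node, which is why an exhaustive search is feasible for single-variable conditions.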

Page 5: Incorporating pathway information
• Biological pathways: contain sets of genes and represent chains of biochemical reactions that perform some function
• Aberrations in glioblastoma tend to occur as disjunctions within pathways: deregulating one component is usually enough to alter the function of the whole pathway [4]
• Idea: use pathway information to obtain a better model
• Methodology: extend node conditions to disjunctions of conditions on pathway elements
• We will use 15 sets of regulators (20-30 genes per set)
– 5 sets of regulators of pathways known to be related to cancer
– 5 sets of regulators of other pathways
– 5 sets of regulators chosen at random
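To make the proposed extension concrete, here is a minimal sketch of a tree node whose condition is a disjunction of single-gene threshold conditions. The `Threshold` class, the helper functions and the gene names are all hypothetical and only illustrate the data structure, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A condition on one gene: expression above or below a cutoff."""
    gene: str
    cutoff: float
    above: bool  # True tests value > cutoff, False tests value < cutoff

    def holds(self, sample):
        value = sample[self.gene]
        return value > self.cutoff if self.above else value < self.cutoff


def disjunction_holds(conditions, sample):
    """The node fires if at least one pathway member is deregulated."""
    return any(condition.holds(sample) for condition in conditions)


def split_samples(conditions, samples):
    """Partition sample indices by whether the disjunction holds."""
    fired = [i for i, s in enumerate(samples)
             if disjunction_holds(conditions, s)]
    rest = [i for i, s in enumerate(samples)
            if not disjunction_holds(conditions, s)]
    return fired, rest


# Made-up pathway with two regulators and three samples.
node = [Threshold("geneA", 0.3, above=False),
        Threshold("geneB", -0.2, above=True)]
samples = [
    {"geneA": 0.1, "geneB": -0.5},   # geneA condition holds
    {"geneA": 0.8, "geneB": 0.4},    # geneB condition holds
    {"geneA": 0.9, "geneB": -0.6},   # neither holds
]
fired, rest = split_samples(node, samples)
```

A single-variable node is the special case of a disjunction with one condition, so a tree built from such nodes generalises the original regression tree.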

Page 6: Problem setting
• Concept class: disjunctions of threshold functions, each on a single variable
• Loss function: −Bayesian score (biological penalty?)
• Potential number of hypotheses: 2^{m}
• Related classification problem tackled by Marchand and Shah (2005) and Kestler et al. (2006).
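One way to read the 2^{m} count, assuming m is the number of candidate threshold conditions and each is either included in the disjunction or left out, is sketched below. The slide does not define m, so this interpretation is an assumption. The brute-force search uses a within-branch sum of squared errors as a stand-in loss rather than the negative Bayesian score, and the names `all_subsets` and `best_disjunction` are hypothetical.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty branch)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def all_subsets(names):
    """Yield every subset of `names`, including the empty one."""
    for size in range(len(names) + 1):
        yield from combinations(names, size)


def best_disjunction(truth, target):
    """Try every subset of conditions; return (subset, loss, n_tried).

    `truth[name][i]` says whether condition `name` holds on sample i.
    """
    best_subset, best_loss, n_tried = None, float("inf"), 0
    for subset in all_subsets(sorted(truth)):
        n_tried += 1
        fires = [any(truth[name][i] for name in subset)
                 for i in range(len(target))]
        loss = (sse([t for t, f in zip(target, fires) if f])
                + sse([t for t, f in zip(target, fires) if not f]))
        if loss < best_loss:
            best_subset, best_loss = subset, loss
    return best_subset, best_loss, n_tried


# Made-up example with m = 3 conditions evaluated on four samples.
truth = {
    "a": [True, False, False, False],
    "b": [False, True, False, False],
    "c": [False, False, True, False],
}
target = [5.0, 5.0, 0.0, 0.0]
subset, loss, n_tried = best_disjunction(truth, target)
```

Enumeration is only practical for small m; with 20-30 genes per set, the count exceeds a million subsets per node before thresholds are even varied, which motivates looking at the related learning algorithms cited above.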

Page 7: Bibliography

1. Pe'er, D., Bayesian Network Analysis of Signaling Networks: A Primer. Sci. STKE, 2005. 2005(281): p. pl4.

2. Segal, E., et al., Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, 2003. 34(2): p. 166-176.

3. Lee, S.-I., et al., Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification. Proceedings of the National Academy of Sciences, 2006. 103(38): p. 14062-14067.

4. The Cancer Genome Atlas Research Network, Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008. 455(7216): p. 1061-1068.

5. Kestler, H., W. Lindner, and A. Müller, Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles, in Artificial Neural Networks in Pattern Recognition. 2006. p. 286-297.

6. Marchand, M. and M. Shah, PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data, in Advances in Neural Information Processing Systems 17. 2005, MIT Press: Cambridge, MA. p. 881-888.