Post on 02-Jan-2016
Objective Bayesian Nets for Integrating Cancer Knowledge
Sylvia Nagl PhD
Cancer Systems Biology & Biomedical Informatics
UCL London
caOBNET: Overview
Knowledge integration by objective Bayesian networks (obNETS)
Maximum entropy method
An integrated clinico-genomic obNET for breast cancer
Conclusions
Bayesian networks
Graphical models
• directed acyclic graph (DAG)
Joint multivariate probability distribution
• with conditional independencies between variables
Given the data, optimal network topology can be estimated
• heuristic search algorithms and scoring criteria
Statistical significance of edge strengths
• Bayesian methods, bootstrapping
Apolipoprotein E gene SNPs and plasma apoE level
Rodin & Boerwinkle 2005
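The factorisation that a Bayesian network encodes can be written out explicitly. This is a sketch in standard textbook notation; the symbols X_i and Pa(X_i) (the parents of X_i in the DAG) are assumed here and do not appear on the slides.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each factor corresponds to one conditional probability table (CPT), and each variable is conditionally independent of its non-descendants given its parents.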
Knowledge integration
Cancer treatment decisions should be based on all available knowledge
Knowledge is complex and varied: Patient's symptoms, expert knowledge, clinical databases relating to past patients, molecular databases, scientific papers, medical informatics systems
Generated by independent studies with diverse protocols
Knowledge integration
Diverse data types: genomic, transcriptomic, proteomic, SNPs, tissue microarray, histopathology, clinical, etc.
New data types, e.g., epigenetic data
All data types capture different characteristics of a dynamic complex system
• At different spatial and temporal scales
• Cell, tumour, patient, and therapeutic system of patient-therapy interactions
How can this disparate data be used for an integrated understanding on which to base our actions?
Objective Bayesianism
Data and knowledge impinge on belief – we try to find a coherent set of beliefs with best fit
• Beliefs based on undefeated items of knowledge
• In case of conflict, try to find compromise beliefs
Objective Bayesianism offers a formalism for determining the beliefs that best fit background knowledge
Applying Bayesian theory, an agent’s degree of belief should be representable by a probability function p
Empirical knowledge imposes quantitative constraints on p
Represented in an obNET (learnt from database)
obNETS for prediction
Standard algorithms can be used to calculate the probability of a specific outcome
A direct link between variables may suggest a causal connection
Bayesian networks
Can BNs be integrated?
Spanning genetic/molecular and clinical levels
obNETS offer a principled path to knowledge integration
Maximum entropy principle
From all probability functions p that satisfy the constraints, adopt the one that is maximally equivocal (i.e. has maximum entropy)
Williamson, J. (2002): Maximising Entropy Efficiently.
Williamson, J. (2005a): Bayesian Nets and Causality.
Williamson, J. (2005b): Objective Bayesian nets.
www.kent.ac.uk/secl/philosophy/jw/
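In symbols, the principle can be sketched with the standard Shannon entropy. The notation is assumed here rather than taken from the slides: Ω is the outcome space and 𝔼 is the set of probability functions that satisfy the empirical constraints.

```latex
H(p) = -\sum_{\omega \in \Omega} p(\omega) \log p(\omega),
\qquad
p^{*} = \arg\max_{p \in \mathbb{E}} H(p)
```

When 𝔼 is non-empty, closed and convex, the maximiser p* is unique because H is strictly concave.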
Example
Two items of empirical knowledge may conflict:
Study 1: Cancer will recur in 50% of patients with a given set of characteristics
• Degree of belief in recurrence in an individual patient = 0.5
Study 2: Frequency of recurrence is 30%
• Degree of belief will be constrained to the closed interval [0.3, 0.5]
In general:
• Belief function will lie within a closed set of probability functions
• There will be a unique function that maximises entropy
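For a single yes/no outcome such as recurrence, the maximum entropy choice within an interval can be computed directly. The following is a minimal sketch, not the author's implementation; the function names are hypothetical, and it relies only on the fact that binary entropy peaks at 0.5.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no outcome with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def max_entropy_belief(low: float, high: float) -> float:
    """Most equivocal degree of belief in the closed interval [low, high].

    Binary entropy rises up to 0.5 and falls after it, so the maximiser
    is the point of the interval closest to 0.5.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("need 0 <= low <= high <= 1")
    return min(max(0.5, low), high)


# The two studies in the example constrain belief in recurrence to [0.3, 0.5].
belief = max_entropy_belief(0.3, 0.5)
print(belief, binary_entropy(belief))
```

For the interval [0.3, 0.5] the sketch selects 0.5, the upper end, because that is the least committal value the constraints allow.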
obNet integration
Original obNETs provide probability distributions
n number of nets
Maximum entropy principle
If CPTs for merged nodes disagree on probabilities, assign a closed interval and take the least committal value in that range
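The merging rule for disagreeing CPTs can be illustrated for the simplest case. This sketch makes several assumptions: the merged node is binary, each net supplies P(node = 1 | parents) for the same parent configurations, and entropy is maximised entry by entry rather than over the whole joint distribution. The function name and the example numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# CPT of a binary node: parent configuration -> P(node = 1 | parents).
CPT = Dict[Tuple[int, ...], float]


def merge_cpts(cpts: List[CPT]) -> CPT:
    """Merge CPTs for the same binary node taken from several nets.

    For each parent configuration, the values reported by the nets span a
    closed interval [low, high]; the merged entry is the point of that
    interval closest to 0.5, i.e. the least committal value.
    """
    merged: CPT = {}
    configs = set().union(*(cpt.keys() for cpt in cpts))
    for config in sorted(configs):
        values = [cpt[config] for cpt in cpts if config in cpt]
        low, high = min(values), max(values)
        merged[config] = min(max(0.5, low), high)
    return merged


# Hypothetical CPTs from two nets (illustrative numbers, not from the slides).
net_a = {(0,): 0.2, (1,): 0.7}
net_b = {(0,): 0.4, (1,): 0.6}
print(merge_cpts([net_a, net_b]))
```

Where the nets agree, the interval collapses to a single point and the shared value is kept unchanged.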
obNET integration: Proof of principle
Two obNETs from breast cancer knowledge domain
Genomic: Comparative genomic hybridisation (CGH) data - progenetix database
• Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)
Clinical: American Surveillance, Epidemiology and End Results (SEER) database
Clinical and genomic nets (Hugin 6.6)
SEER database 4731 cases
progenetix database 28 bands/502 cases
?
obNet integration
obNet learnt from 2nd progenetix dataset - 119 cases with clinical annotation (lymph node status, tumour size, grade)
CPT: lymph node status (LN) given 22q12 state
22q12:    -1      0      1
LN = 0:   0.148   0.5    0.148
LN = 1:   0.852   0.5    0.852
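The CPT above can be used for simple inference by enumeration on the two-node fragment 22q12 → LN. This is a minimal sketch: the CPT values come from the slide, but the prior over the 22q12 state is hypothetical and chosen only for illustration, as are the function names.

```python
# From the slide: P(LN = 1 | 22q12 state) for states -1, 0 and 1.
P_LN1_GIVEN_BAND = {-1: 0.852, 0: 0.5, 1: 0.852}

# Hypothetical prior over the 22q12 state; NOT taken from the slides.
PRIOR_BAND = {-1: 0.2, 0: 0.7, 1: 0.1}


def prob_ln_positive(prior, cpt):
    """Marginal P(LN = 1), summing over the states of the parent node."""
    return sum(prior[state] * cpt[state] for state in prior)


def posterior_band(prior, cpt):
    """P(22q12 state | LN = 1) by Bayes' rule."""
    evidence = prob_ln_positive(prior, cpt)
    return {state: prior[state] * cpt[state] / evidence for state in prior}


print(prob_ln_positive(PRIOR_BAND, P_LN1_GIVEN_BAND))
print(posterior_band(PRIOR_BAND, P_LN1_GIVEN_BAND))
```

In a full obNET the same calculation is carried out by standard propagation algorithms over all nodes rather than by explicit enumeration.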
Additional empirical knowledge
Fridlyand et al. 2006
chr. 22
KREMEN1
MYH9
cadherin11
CD97
BMP7, ELMO2, BCAS1, BCAS4, ZNF217
Metastasis-associated genes
KREMEN1
Howard et al., 2003
Biological knowledge suggests possible causal link
(in context of whole obNET – HR status!)
Molecular profiling of tumours
Cancer clinical data & epidemiology
Translation of clinical data to genomics research
Multi-scale obNETs
Predictive markers
Knowledge integration
Acknowledgements
Jon Williamson (Philosophy, University of Kent)
www.kent.ac.uk/secl/philosophy/jw/
Matt Williams (Cancer Research UK)
Nadjet El-Mehidi (Cancer Systems Biology, UCL)
Vivek Patkar (Cancer Research UK)
Contact: s.nagl@ucl.ac.uk
obNET integration: Proof of principle
Two obNETs
• Non-independent rearrangements at chromosomal locations in breast cancer from comparative genomic hybridisation (CGH) data - progenetix database
• Subset of bands with 3 or more genes implicated in tumour progression and response to cytotoxic therapies (28 bands)
• Probabilistic dependencies between clinical parameters from the American Surveillance, Epidemiology and End Results (SEER) database
HR status link
Genomic systems
Genomes are dynamic molecular systems
Selection acts on unstable cancer genomes as integrated wholes, not just on individual oncogenes or tumour suppressors.
A multitude of ways to ‘solve the problems’ of achieving a survival advantage in cancer cells:
• Irreversible evolutionary processes
• Randomness of mutation
• Modularity and redundancy of complex systems
Genome-wide rearrangements
Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours?
If so, under which conditions may these be interpreted as causal networks?
Can we identify probabilistic dependency networks involving molecular and clinical levels?
Systems Biology and Causation
Profound conceptual challenge regarding physical causation in complex biological systems
Mutual dependence of physical causes
The biological relevance of any factor, and therefore “the information” it conveys, is jointly determined, frequently in a statistically interactive fashion, by that factor and the system state (Susan Oyama, The Ontogeny of Information, 2000)
The influence of a gene, or a genetic mutation, depends on the context, such as availability of other molecular agents and the state of the biological system, including the rest of the genome
Cell networks are dynamically instantiated – genes for components are switched on or off in response to signals and cell state
System state
agents
Cell networks are reconfigured in response to changes in environment or cell’s internal state
System state
Cell computation networks are reconfigured in response to changes in environment or cell’s internal state
System state
Cancer: Genome instability re-programs cell networks
Selection for increased proliferation, resistance, invasiveness etc.
Driven by tumour cell – tissue interactions
Genome-wide rearrangements
Can we identify probabilistic dependency networks in large sample sets of genomic data from individual tumours?
Can we identify probabilistic dependency networks involving molecular and clinical levels?
Proof of principle
Screen the whole genome for chromosomal abnormalities in one experiment
Cytogenetics
Comparative genomic hybridization (CGH)
Fluorescence in situ hybridization (FISH) and multicolour fluorescence in situ hybridization (MFISH)
Detection of allelic instabilities, loss of heterozygosity (LOH)