Affymetrix/BioCarta comparison & Java-based pathway analysis Michael Edmonson 2/26/2003.
Michael Edmonson
2/26/2003
Goals
• Create programmable models of BioCarta pathway gene interaction networks
• Encode “rules” of known gene interactions in software
• Create association between available experimental assays (microarrays) and pathway elements
• Populate model with experimental data and compare with expected states
BioCarta pathway example
Basic uses of model
• Static state diagram
• Dynamic system
Static/state-based modeling
• Load the model with static “snapshot” or state data taken from a microarray experiment
• With data from normal tissues, use the resulting state to validate the model (is the data consistent with the rules of the model?)
• With cancerous data, see if the state of the model can be explained by “broken” logic: detect breakdowns in normal gene function and attempt to backtrace failures to first causes
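As a rough illustration of the validation step, the Java sketch below checks a present/absent snapshot against a list of pairwise rules and reports every rule the data contradicts. The Rule record, the ACTIVATES/BLOCKS rule types and their meaning (if the source is on, the target is expected to be on or off, respectively) are assumptions made for this example, not the model's actual rule format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical rule types; the rule encoding is an assumption for this sketch.
enum RuleType { ACTIVATES, BLOCKS }

// "source ACTIVATES target": if source is on, target is expected on.
// "source BLOCKS target":    if source is on, target is expected off.
record Rule(String source, String target, RuleType type) {}

class StateValidator {
    // Returns one message per rule that the snapshot contradicts.
    static List<String> violations(List<Rule> rules, Map<String, Boolean> snapshot) {
        List<String> out = new ArrayList<>();
        for (Rule r : rules) {
            Boolean src = snapshot.get(r.source());
            Boolean dst = snapshot.get(r.target());
            // Rules only constrain the target when the source is on;
            // genes without a measurement (e.g. no probe) are skipped.
            if (src == null || dst == null || !src) continue;
            boolean expected = r.type() == RuleType.ACTIVATES;
            if (dst != expected) {
                out.add(r.source() + " " + r.type() + " " + r.target()
                        + ", but " + r.target() + " is " + (dst ? "on" : "off"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("A", "B", RuleType.ACTIVATES),
                new Rule("B", "C", RuleType.BLOCKS));
        // Made-up present/absent calls for three hypothetical genes.
        Map<String, Boolean> snapshot = Map.of("A", true, "B", true, "C", true);
        violations(rules, snapshot).forEach(System.out::println);
    }
}
```

With the made-up snapshot in main, only the B/C rule is reported; a list of such violations is the kind of starting point a backtrace to first causes would need.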
Dynamic modeling
• Integration of code with higher-level applications
• Model will be a working system whose state changes over a period of time
• Systematic/programmatic exploration of effects of arbitrary changes in the model’s state
• Explore interconnections between pathways
Source data
Fundamentals
• Functionality of model will be dictated by data used to populate it
• Need to connect BioCarta pathways with Affymetrix assays
– Desirable to automatically maintain mapping as new data becomes available
• Web-based chip/pathway browsing tools
Available BioCarta data
• List of pathway names and genes contained within them
• Graphic-only pathway diagrams (no annotations of relationships between pathway elements)
• Not computable
Available Affymetrix data
Existing database tables (Bob Clifford et al.):
Table                      Contents
rflp.affychip              Probe information by chip
rflp.affy_test             Experimental data (Leslie Derr et al.)
rflp.affy_seq              Probe sequence
rflp.affy2ug               UniGene cluster mapping (static)
clifforr.affy_tissue       Tissue code table
clifforr.affy_histology    Histology code table
clifforr.affy_sample       Sample information
New database tables: BioCarta
Table                Contents
biocarta_pathways    Name, description
biocarta_genes       Name, gene list
biocarta_keyword     Keywords from name, genes
• derived from CGAP flatfile
• RFLP database on LPG server
New tables: BioCarta to Affy
Table                  Contents
Affyacc2gene           Translates affy probe accessions to gene symbols via UniGene
Affy_pathway           For each pathway and chip, count and percentage of genes present
Affy_pathway_gene      Detail of present/absent genes for each pathway and chip
Affy_biocarta_basis    UniGene build used for mapping
• “pathway” bot keeps tables updated with each new UniGene build
• Revisions needed: UniGene clustering issues, ambiguous probes, etc.
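The count and percentage stored per pathway and chip (Affy_pathway) amount to a set intersection once probe accessions have been translated to gene symbols. The Java sketch below assumes that translation is already available as an in-memory map of probe accession to gene symbol; the class and method names are hypothetical, and UniGene clustering issues and ambiguous probes are ignored.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PathwayCoverage {
    // How many of a pathway's genes have at least one probe on a chip.
    record Coverage(int present, int total, Set<String> missing) {
        double percent() { return total == 0 ? 0.0 : 100.0 * present / total; }
    }

    // probeToGene: probe accession -> gene symbol for a single chip
    // (assumed to be the kind of mapping Affyacc2gene provides).
    static Coverage coverage(Set<String> pathwayGenes, Map<String, String> probeToGene) {
        Set<String> genesOnChip = new TreeSet<>(probeToGene.values());
        Set<String> missing = new TreeSet<>(pathwayGenes);
        missing.removeAll(genesOnChip);
        return new Coverage(pathwayGenes.size() - missing.size(), pathwayGenes.size(), missing);
    }

    public static void main(String[] args) {
        // Made-up pathway and probe accessions; two probes map to GENE_A.
        Set<String> pathway = Set.of("GENE_A", "GENE_B", "GENE_C", "GENE_D");
        Map<String, String> chip = Map.of(
                "probe_1", "GENE_A", "probe_2", "GENE_A",
                "probe_3", "GENE_B", "probe_4", "GENE_D");
        Coverage c = coverage(pathway, chip);
        System.out.printf("%d/%d genes present (%.0f%%), missing %s%n",
                c.present(), c.total(), c.percent(), c.missing());
    }
}
```

Counting genes rather than probes keeps a gene with several probes from inflating the percentage; the missing set corresponds to the present/absent detail kept in Affy_pathway_gene.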
Chip/pathway browser: affy2biocarta
affy2biocarta
• http://lpgfs.nci.nih.gov:82/perl/affy2biocarta
• Frontend to database; details how well pathways are covered by individual chips
• Searchable by gene, pathway or chip
• Master report listing, for each pathway, the best chip to use
• Ability to search other chips for probes matching missing genes
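The per-pathway “best chip” report can be thought of as choosing, for each pathway, the chip whose probes cover the most pathway genes. The following Java sketch assumes gene coverage is the only criterion and that ties are broken alphabetically by chip name; both are assumptions, and the chip and gene names are placeholders.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class BestChip {
    // Number of pathway genes represented by at least one probe on the chip.
    static long genesCovered(Set<String> pathwayGenes, Set<String> chipGenes) {
        return pathwayGenes.stream().filter(chipGenes::contains).count();
    }

    // genesByChip: chip name -> gene symbols with probes on that chip (hypothetical input).
    // Returns the chip covering the most pathway genes; ties go to the
    // alphabetically first chip name. Empty if no chips are given.
    static Optional<String> bestChip(Set<String> pathwayGenes, Map<String, Set<String>> genesByChip) {
        String best = null;
        long bestCount = -1;
        for (String chip : new TreeSet<>(genesByChip.keySet())) {
            long n = genesCovered(pathwayGenes, genesByChip.get(chip));
            if (n > bestCount) {
                best = chip;
                bestCount = n;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Set<String> pathway = Set.of("G1", "G2", "G3", "G4");
        Map<String, Set<String>> chips = Map.of(
                "CHIP_X", Set.of("G1", "G2"),
                "CHIP_Y", Set.of("G1", "G2", "G3"));
        System.out.println(bestChip(pathway, chips).orElse("none"));
    }
}
```

Genes not covered by the winning chip are exactly the ones for which a search of other chips would be worthwhile.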
affy2biocarta: top-level
affy2biocarta: pathway selector
affy2biocarta: pathway/chip selector
affy2biocarta: gene detail
• Puzzlements: multiple sequences, missing entry
affy2biocarta: “missing” gene search
• Note: probes were found on an earlier chip!
Omissions in chip revisions
• HG-U133A generally has the most complete pathway coverage
• However, for 45 genes in BioCarta pathways no matching probe accessions could be found
• Of these 45:
– 32 (71%) were found in Hs.127 (which predates the 133 set)
– 36 (80%) were found on other chips
Multiple sequences/probes for same gene
• A single pathway element (gene) may have multiple probes/sequences representing it
• The expression states reported by these probes often disagree
• Relationship between probes and BioCarta elements needs clarification
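One way to make the probe-to-element relationship explicit is to collapse the calls from all probes for a gene into a single state and flag genes whose probes disagree. The Java sketch below uses a simple unanimity rule (all present, all absent, or MIXED); this policy and the names GeneCall and ProbeCollapser are assumptions for illustration, not a resolution the slides propose.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Collapsed call for one pathway element (gene); hypothetical categories.
enum GeneCall { PRESENT, ABSENT, MIXED }

class ProbeCollapser {
    // probeCalls: gene symbol -> present (true) / absent (false) call of each probe for that gene.
    // A gene whose probes disagree is reported as MIXED instead of being silently resolved.
    // A gene with an empty list of calls is reported as ABSENT.
    static Map<String, GeneCall> collapse(Map<String, List<Boolean>> probeCalls) {
        Map<String, GeneCall> out = new TreeMap<>();
        probeCalls.forEach((gene, calls) -> {
            boolean anyPresent = calls.contains(true);
            boolean anyAbsent = calls.contains(false);
            GeneCall call = anyPresent && anyAbsent ? GeneCall.MIXED
                    : anyPresent ? GeneCall.PRESENT
                    : GeneCall.ABSENT;
            out.put(gene, call);
        });
        return out;
    }

    public static void main(String[] args) {
        // Made-up calls: GENE_B has two probes that disagree.
        Map<String, List<Boolean>> calls = Map.of(
                "GENE_A", List.of(true, true),
                "GENE_B", List.of(true, false),
                "GENE_C", List.of(false));
        System.out.println(collapse(calls));
    }
}
```

Keeping MIXED as its own value, rather than taking a majority vote, leaves the disagreement visible to later validation steps.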
Expression data with disagreements
Often not a 1:1 relationship between Affymetrix probes and pathway entries...
Pathway interconnections
• Many genes appear in multiple pathways; a few appear in many
• Concept of “connectome”, a.k.a. “furball”
• Potential for indirect feedback from greater system (no pathway is an island)
• Difficult to explore in detail without a database of connections
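Even without a full database of connections, the overlap between pathways can be estimated by inverting the pathway-to-gene lists (such as those in biocarta_genes) into a gene-to-pathway index. The Java sketch below assumes the lists are already loaded into a map; the class name PathwayOverlap and the minPathways threshold are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PathwayOverlap {
    // Inverts pathway -> genes into gene -> pathways that contain the gene.
    static Map<String, Set<String>> pathwaysByGene(Map<String, Set<String>> genesByPathway) {
        Map<String, Set<String>> index = new TreeMap<>();
        genesByPathway.forEach((pathway, genes) -> {
            for (String gene : genes) {
                index.computeIfAbsent(gene, g -> new TreeSet<>()).add(pathway);
            }
        });
        return index;
    }

    // Keeps only genes that appear in at least minPathways pathways:
    // the shared nodes through which one pathway can influence another.
    static Map<String, Set<String>> sharedGenes(Map<String, Set<String>> genesByPathway, int minPathways) {
        Map<String, Set<String>> index = pathwaysByGene(genesByPathway);
        index.values().removeIf(pathways -> pathways.size() < minPathways);
        return index;
    }

    public static void main(String[] args) {
        // Made-up pathways: GENE_B is shared by all three.
        Map<String, Set<String>> pathways = Map.of(
                "PATHWAY_1", Set.of("GENE_A", "GENE_B"),
                "PATHWAY_2", Set.of("GENE_B", "GENE_C"),
                "PATHWAY_3", Set.of("GENE_B"));
        System.out.println(sharedGenes(pathways, 2));
    }
}
```

This only reveals shared membership, not the direction or type of any interaction, which is why a database of actual connections would still be needed.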
Genes in multiple pathways
Java modeling
Implementation: Java
• OOP
• Pathways are completely encapsulated in objects which can be embedded in higher-level programs
– Programmatic control of node and connection states
• Simple classes representing elements in pathway and connections between them
– Nodes, Connections, Complexes
– Ability to propagate signals around the network
Node
• A discrete component in the network: usually a gene, but can be any event that can affect the system (contact inhibition, etc.)
• Each node has a state, which is currently binary (on or off)
– Binary states resemble “present/absent” expression data, but this highlights the contention/deadlocking problem
• Contains incoming and outgoing connections to other nodes in the network
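A minimal Java sketch of such a node follows. It keeps a name, a binary state and lists of incoming and outgoing links; the Link class is a bare stand-in for the Connection objects, and all field and method names are assumptions rather than the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a connection: just records which two nodes it joins.
class Link {
    final Node source;
    final Node target;
    Link(Node source, Node target) {
        this.source = source;
        this.target = target;
    }
}

class Node {
    final String name;                               // usually a gene symbol, but may be an event
    private boolean on;                              // binary state, cf. present/absent calls
    final List<Link> incoming = new ArrayList<>();
    final List<Link> outgoing = new ArrayList<>();

    Node(String name) { this.name = name; }

    boolean isOn() { return on; }
    void setOn(boolean on) { this.on = on; }

    // Creates a link from this node to target and registers it on both ends.
    Link connectTo(Node target) {
        Link link = new Link(this, target);
        outgoing.add(link);
        target.incoming.add(link);
        return link;
    }

    public static void main(String[] args) {
        Node a = new Node("GENE_A");
        Node b = new Node("GENE_B");
        a.connectTo(b);
        a.setOn(true);
        System.out.println(a.name + " on=" + a.isOn() + ", feeds " + a.outgoing.size() + " node(s)");
    }
}
```

Registering each link on both ends lets a node be traversed downstream (for propagation) or upstream (for backtracing).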
Connection
• Object describing a link between nodes and the relationship between them
• abstract execute_action() method implemented by different connection types
• Examples:
– LogicalConnection: state of source node determines state of destination node
– SimpleActivator, SimpleBlocker
• Connections may be individually disabled to emulate non-functioning of upstream process
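The connection hierarchy might look roughly like the Java sketch below. It keeps the execute_action() name from the slides, but the class layout is an assumption: SimpleActivator and SimpleBlocker are modeled as LogicalConnection subclasses (target follows the source, or its inverse), and the enabled flag and fire() helper are hypothetical ways of emulating a disabled connection.

```java
// Minimal node: a name and a binary state.
class Node {
    final String name;
    boolean on;
    Node(String name) { this.name = name; }
}

// Link between two nodes; subclasses define the relationship in execute_action().
abstract class Connection {
    final Node source;
    final Node target;
    boolean enabled = true;   // set false to emulate a non-functioning upstream process

    Connection(Node source, Node target) {
        this.source = source;
        this.target = target;
    }

    abstract void execute_action();

    // Hypothetical helper: run the action only if the connection is enabled.
    void fire() {
        if (enabled) execute_action();
    }
}

// Assumed: the source's state alone determines the target's state.
abstract class LogicalConnection extends Connection {
    LogicalConnection(Node source, Node target) { super(source, target); }
    abstract boolean targetStateFor(boolean sourceOn);
    @Override void execute_action() { target.on = targetStateFor(source.on); }
}

// Assumed semantics: the target takes on the source's state.
class SimpleActivator extends LogicalConnection {
    SimpleActivator(Node source, Node target) { super(source, target); }
    @Override boolean targetStateFor(boolean sourceOn) { return sourceOn; }
}

// Assumed semantics: the target takes on the opposite of the source's state.
class SimpleBlocker extends LogicalConnection {
    SimpleBlocker(Node source, Node target) { super(source, target); }
    @Override boolean targetStateFor(boolean sourceOn) { return !sourceOn; }
}

class ConnectionDemo {
    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.on = true;
        new SimpleActivator(a, b).fire();   // B follows A, so B turns on
        Connection broken = new SimpleBlocker(a, c);
        broken.enabled = false;             // emulate a failed upstream process
        c.on = true;
        broken.fire();                      // disabled, so C is left unchanged
        System.out.println(b.on + " " + c.on);
    }
}
```

Putting the enabled check in one place means every connection type can be switched off without each subclass having to handle it.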
Complex
• Container for multiple discrete subelements
• Provides higher-order logic based on evaluation of components’ state; e.g. performing some action only when all subcomponents are considered active
• Additional functionality beyond that of its component parts
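The “all subcomponents active” example can be sketched in Java as a container whose own state is derived from its components. The AND rule, the non-empty requirement and the names below are assumptions chosen for illustration; other higher-order rules would replace isActive().

```java
import java.util.List;

// Minimal node: a name and a binary state.
class Node {
    final String name;
    boolean on;
    Node(String name, boolean on) {
        this.name = name;
        this.on = on;
    }
}

// Groups several nodes and adds logic of its own: here (an assumed rule)
// the complex counts as active only when every component is on.
class Complex {
    final String name;
    final List<Node> components;

    Complex(String name, List<Node> components) {
        if (components.isEmpty()) throw new IllegalArgumentException("complex needs components");
        this.name = name;
        this.components = List.copyOf(components);
    }

    boolean isActive() {
        return components.stream().allMatch(n -> n.on);
    }

    public static void main(String[] args) {
        Node a = new Node("SUBUNIT_A", true);
        Node b = new Node("SUBUNIT_B", false);
        Complex ab = new Complex("COMPLEX_AB", List.of(a, b));
        System.out.println(ab.isActive());   // one subunit is off
        b.on = true;
        System.out.println(ab.isActive());   // all subunits are on
    }
}
```

The empty-list check matters because allMatch returns true for an empty stream, which would make an empty complex permanently active.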
State change propagation
• Setting the state of a node propagates the effect of that change on downstream connections
• During propagation a list of initiating nodes is accumulated and passed along; propagation stops if an initiating node is encountered again (prevents infinite loops)
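A Java sketch of this propagation scheme follows. Each call receives the list of nodes that have already initiated changes on the path leading to it and stops when it meets one of them again. The Edge record, the blocks flag and the choice to copy the list per path (rather than share one list across all branches) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical downstream link: the target follows the source, or its inverse if blocks is true.
record Edge(Node target, boolean blocks) {}

class Node {
    final String name;
    boolean on;
    final List<Edge> outgoing = new ArrayList<>();

    Node(String name) { this.name = name; }

    void connect(Node target, boolean blocks) { outgoing.add(new Edge(target, blocks)); }

    // Entry point: change this node's state and push the effect downstream.
    void setState(boolean newState) { setState(newState, List.of()); }

    // initiators: nodes that have already set a state on the path to this one.
    private void setState(boolean newState, List<Node> initiators) {
        if (initiators.contains(this)) return;    // node seen again: stop, so cycles terminate
        on = newState;
        List<Node> chain = new ArrayList<>(initiators);
        chain.add(this);
        for (Edge e : outgoing) {
            e.target().setState(e.blocks() ? !newState : newState, chain);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        a.connect(b, false);   // A activates B
        b.connect(c, true);    // B blocks C
        b.connect(a, false);   // feedback loop back to A
        c.on = true;
        a.setState(true);
        System.out.println(a.on + " " + b.on + " " + c.on);
    }
}
```

Stopping at a repeated initiator guarantees termination, but it also means a feedback edge that would contradict an earlier state is simply ignored, which is one form of the contention problem noted for binary states.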
What’s Next
• State validation/sanity checking
• Diagnosis/backtracing of “broken” logic
• More subtle states and connection types (beyond a binary system)
• Improved probe/gene mappings
• Automated model instantiation from curated database
• Incorporation into higher-level programs