Bio/Spice: Towards a Network Bioinformatics NIH, July 2001 Adam Arkin Howard Hughes Medical...

Bio/Spice: Towards a Network Bioinformatics NIH, July 2001 Adam Arkin Howard Hughes Medical Institute Departments of Bioengineering and Chemistry University of California Physical Biosciences Division Lawrence Berkeley National Laboratory Berkeley, CA 94720 [email protected] http://genomics.lbl.gov


Page 1

Bio/Spice: Towards a Network Bioinformatics

NIH, July 2001

Adam Arkin

Howard Hughes Medical Institute

Departments of Bioengineering and Chemistry

University of California

Physical Biosciences Division

Lawrence Berkeley National Laboratory

Berkeley, CA 94720

[email protected]

http://genomics.lbl.gov

Page 2

Can Molecular Biology Become Cellular Engineering?

Prediction, Control and Design

Funding: ONR, DOE, DARPA, NIH

Page 3

Adult: 1.5 mm long, ~1000 cells

Genome projects are providing parts lists for the genetic and protein components of the cellular circuitry. Bioinformatics analysis of this data provides protein function and sometimes structure by homology, partial identification of regulatory sites on the DNA and functional RNAs. Partial networks can be constructed by homology to known biochemical networks. Genetic defects that lead to disease can also be identified at this level. Evolutionary relationships among organisms can also be calculated from this data.

Structural biology provides experimental data on the 3-dimensional structure of biomolecules and computational approaches to predicting structure from sequence and for predicting biomolecular recognition. Both static and dynamic models of biomolecular interactions are the basis for rational drug design and automated biochemical reaction network prediction. Biochemical studies also provide much of this information as well as quantification of the kinetics and thermodynamics of the interactions.

Biochemical and genetic network analysis integrates data from all the steps above to provide a prediction of cellular system function. Such analyses provide insight into how cells process and act upon complex external and internal signals. These are the fundamental control mechanisms that: 1) lead to partial penetrance of genotype and maintenance of population heterogeneity, 2) determine reliability of cellular function and the propensity for disease given partial failure of a network component, 3) govern adaptation of pathogens to pharmaceutical attack, the stages of facultative infection and dynamical diseases, and 4) may provide the basis for reversal of development defects and early detection of cellular control failure.

Ultimately, integration of genomic data and genome-derived data (such as that from gene chips), structural and molecular dynamic data, and network functional analyses and data will lead to a quantitative understanding of differential developmental processes and finally to a full tracing of the molecular basis of development from fertilized egg to adult organism.

Page 4

Single cells in the wave

Human neutrophil tracking a Staphylococcus.

Drosophila melanogaster embryo developing

Myxococcus xanthus colony undergoing traveling wave self-organization on its way to sporulation.

Complex Behaviors of Cellular Systems

Photos from everyone but me

Page 5

>25 signals

Inhomogeneous environment

Non-simple geometrical space

Site of infection

Primary chemoattractant

Response cytokine

Another cytokine

Page 6

Goals of “Network Biology Approach”

1. From the elementary interactions among the participating models, explain the complex behavior of a cellular function.
   • The Alliance for Cellular Signaling has identified over 600 molecules involved in G-protein coupled signal transduction.

2. By comparing networks from many organisms, deduce the engineering principles by which cells perform particular functions and deal with uncertainty in their environment.

[Figure: signaling network diagram. Labels: Actin; PIPKgPIPK; P10; PIP4,5; PIP3,4,5; Rac; SHiPPlx]

Page 7

These networks become quite large and complex

Page 8

Tucker, Gera, and Uetz (2001)

Page 9

Genetic Engineering and Measurement

Methods for manipulating DNA have become better and better. (Methods for designing proteins, etc., are still not so good.)

Methods for measuring cellular components are exploding! (They still need lots of improvement.)

Page 10

Goals

From Genome Sequence (and other data)

Reverse Engineer Cellular Network

•Predict Cellular Function

•Diagnose Failures (Disease)

•Design Control (Disease Treatments)

Forward Engineer New Function

•Use discovered control laws for biomimetic systems

Page 11

What would success look like?

1. Very rapid deduction of new cellular function from well-controlled experiments

2. Rapid prediction of controllable aspects of cell function and design of control protocols

3. Robust forward design of novel function and systems
   1. Need for a rapid manufacture protocol

4. Identification of novel computational and control algorithms that can be abstracted into machinery.

Page 12

Building a Rational Engineering Tool for Biosystems

SPICE for Cells?

Page 13

Analysis and engineering of cellular circuitry

Courtesy of IBM

From: Wasserman Lab, Loyola

Asynchronous Digital Telephone Switching Circuit

Full knowledge of parts list

Full knowledge of “device physics”

Full knowledge of interactions

No one fully understands how this circuit works!! It's just too complicated.

Designed and prototyped on a computer (SPICE analysis)

Experimental implementation fault tested on computer

Asynchronous Analog Biological Switching Circuit

Partial knowledge of parts list

Partial knowledge of “device physics”

Partial knowledge of interactions

No one fully understands how this circuit works!! It's just too complicated.

We need a SPICE-like analysis for biological systems

Page 14

SPICE: Simulation Program with Integrated Circuit Emphasis

Parts database

From subcircuit database

Integrated circuit database

Automated fault diagnosis

Page 15

Genome Sequence

Genes/Regulatory Sequence

Proteins/RNAs

Other Chemical Species

Biochemical Pathways/Dynamics

Cytomechanical/Spatial Processes

Cell Development/Signaling

Tissue Physiology/Development

Organism Behavior

Tools for “multilevel” analysis

Finding Parts

Physical properties

Cellular networks

Assembled Genomes Polymorphisms

ORF Identification DNA Regulatory ID RNA Gene ID

mRNA Regulation mRNA Splicing RNA 2° Struct

Protein Sequence ID Homology Modeling RNA 3° Struct

Protein 3° Struct Protein Function ID RNA Function ID

Molecular Interaction Prediction

Chromatin Structure

Macromolecular Dynamics

Biochemical and Genetic Network Prediction

Metabolic/Biosynthetic Analysis & Engineering

Signal Transduction Analysis

Gene expression/network Analysis

Cytomechanical Analysis

Morphogenesis & Development

Homeostasis

Cell-Cell Interactions

Tissue Mechanics

Cell Behavior & Engineering

Organismal Behavior

Epidemiological/Ecological Models

Cancer Dynamics

Multi-organism function: e.g. Infectious disease

Page 16

Design Philosophy and Goals

•Weakly-coupled architecture

•Provides application framework for extensibility

•Highly configurable by non-programmers

•Modular, object-oriented simulation and model analysis

•Multiple layers of simulation, analogous to SPICE

•Full database and knowledge environment

•Realms of current development: GUI, middleware/kernels, and database

Page 17
Page 18

System Architecture

Local DB

GUI

Database access layer

Database

Reflection of remote DBs

Remote DBs

GUI component server

Analysis Kernels

Component manager

component 1

component 2

component 3

component n

Page 19

BIO/SPICE: Databasing, simulation and analysis

Bio/Spice, a web-servable, biologist-friendly database, analysis and simulation interface, was developed into a true beta product.

Interfaces to ReactDB, MechDB, and ParamDB.

With Kernel, performs basic: flux-balance analysis, stochastic and deterministic kinetics, scientific visualization of results.

Notebook/Kernel design optimized for distributed computing.
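
To make the distinction between stochastic and deterministic kinetics concrete, here is a minimal sketch, not the Bio/Spice kernel itself. It assumes a hypothetical one-species birth-death network (production at constant rate `k_prod`, first-order degradation at rate `k_deg`); the function names and rate constants are illustrative only. The stochastic run uses Gillespie's direct method, and the deterministic counterpart is the closed-form solution of the matching rate equation.

```python
import math
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Direct-method stochastic simulation of 0 -> X (k_prod) and X -> 0 (k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod          # propensity of the production reaction
        a_deg = k_deg * x        # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:       # nothing can fire any more
            break
        t += rng.expovariate(a_total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which reaction fires, weighted by its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


def deterministic_birth_death(k_prod, k_deg, x0, t):
    """Closed-form solution of dx/dt = k_prod - k_deg * x."""
    x_ss = k_prod / k_deg
    return x_ss + (x0 - x_ss) * math.exp(-k_deg * t)


# Illustrative parameters (assumptions, not values from the talk).
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=50.0)
```

The stochastic trajectory fluctuates around the deterministic curve; the relative size of those fluctuations grows as copy numbers fall, which is the usual reason to keep both kinds of kernel available.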

Page 20

GUI must represent biological models at different levels of abstraction.

Page 21
Page 22
Page 23
Page 24
Page 25
Page 26

Database

Local DB

Remote DBs

Database access layer

•Relational, open source

•Local database: NCBI / BIND schemas + modifications

•Reflections of useful remote databases

•API allows common database use among lab tools

Also tracks:

Data provenance

Data type: hypothetical, computed, measured

Quality measures: Edited/community

Authorities: submission, revision

Reflection of remote DBs
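
The tracked attributes listed on this slide (provenance, data type, quality measure, submission and revision authorities) could be carried on each record roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical and are not taken from the Bio/Spice schema or from the NCBI / BIND schemas.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DataType(Enum):
    HYPOTHETICAL = "hypothetical"
    COMPUTED = "computed"
    MEASURED = "measured"


class Quality(Enum):
    EDITED = "edited"
    COMMUNITY = "community"


@dataclass
class Revision:
    authority: str  # who made the revision
    note: str       # why the value changed


@dataclass
class Record:
    """One stored value plus the bookkeeping the slide says is tracked."""
    name: str
    value: float
    data_type: DataType
    quality: Quality
    provenance: str    # where the value came from (hypothetical free-text field)
    submitted_by: str  # submission authority
    revisions: List[Revision] = field(default_factory=list)

    def revise(self, authority: str, new_value: float, note: str) -> None:
        """Update the value and log who changed it."""
        self.revisions.append(Revision(authority, note))
        self.value = new_value


def measured_only(records: List[Record]) -> List[Record]:
    """Filter out hypothetical and computed entries."""
    return [r for r in records if r.data_type is DataType.MEASURED]


# Illustrative entries; names and numbers are made up for the example.
records = [
    Record("k_cat_example", 12.0, DataType.MEASURED, Quality.EDITED, "lab notebook", "curator_a"),
    Record("k_on_example", 3.5, DataType.HYPOTHETICAL, Quality.COMMUNITY, "model fit", "user_b"),
]
records[0].revise("curator_c", 11.5, "repeat measurement")
```

Keeping the data type on every record lets an analysis decide, for instance, to fit only against measured values while still displaying hypothetical ones.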

Page 27
Page 28
Page 29

Knowledge representation for data classification and analysis

Data Ontology

Analysis Ontology

Mathematical Ontology

Cellular Ontology

Aid to user in decision making.

Allows for data fusion.

Motion, Shape Change, Transport, Transformation

Differential, Algebraic, Stochastic

Page 30

Leaves of the ontologies: Cellular

Gene expression

Transcription Translation

Initiation RBS Binding

Forms a hierarchy for modeling and data

Elongation Termination
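
One way to picture the hierarchy on this slide is as a nested mapping that can be searched for the path from the root to any term. The parent/child arrangement below is an assumption: the transcript lists the terms but not which term sits under which, so the placement of Initiation, Elongation, Termination and RBS Binding is illustrative only.

```python
# Assumed arrangement of the slide's terms; the source does not state the edges.
ONTOLOGY = {
    "Gene expression": {
        "Transcription": {"Initiation": {}, "Elongation": {}, "Termination": {}},
        "Translation": {"RBS Binding": {}},
    }
}


def path_to(term, tree, prefix=()):
    """Return the tuple of terms from the root down to `term`, or None if absent."""
    for name, children in tree.items():
        here = prefix + (name,)
        if name == term:
            return here
        found = path_to(term, children, here)
        if found is not None:
            return found
    return None


def leaves(tree):
    """Collect the terms that have no children, in depth-first order."""
    out = []
    for name, children in tree.items():
        if children:
            out.extend(leaves(children))
        else:
            out.append(name)
    return out
```

A model component or a data set tagged with a leaf term can then be grouped with everything else under the same ancestor, which is what lets one hierarchy serve both modeling and data.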

Page 31

Levels of Abstraction

Physical: Molecular Mechanics, ab initio, Semiempirical (bioinformatic)

Mathematical: Time-scale separation, Ensemble averaging, Large system limits, Global/Local stability

Conceptual: Phenomenal Models, Boolean Approximations, Modularization

Molecular Dynamics

Chemical Master Equation

Langevin Equations

Deterministic Kinetics

Reaction-Diffusion

Discrete Mechanical

Continuum Mechanical

Statistical/Thermodynamic
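
For orientation, three of the levels named on this slide can be written side by side in their standard textbook forms. The notation is an assumption and does not come from the slides: $\mathbf{x}$ is the vector of molecule counts, $a_j$ the propensity of reaction $j$, $\boldsymbol{\nu}_j$ its state-change vector, $M$ the number of reactions, and $W_j$ independent Wiener processes.

```latex
% Chemical master equation: evolution of the probability of each discrete state
\frac{\partial P(\mathbf{x},t)}{\partial t}
  = \sum_{j=1}^{M} \Big[ a_j(\mathbf{x}-\boldsymbol{\nu}_j)\,P(\mathbf{x}-\boldsymbol{\nu}_j,t)
  - a_j(\mathbf{x})\,P(\mathbf{x},t) \Big]

% Chemical Langevin equation: continuous approximation with explicit noise terms
d\mathbf{X}(t)
  = \sum_{j=1}^{M} \boldsymbol{\nu}_j\,a_j(\mathbf{X})\,dt
  + \sum_{j=1}^{M} \boldsymbol{\nu}_j\,\sqrt{a_j(\mathbf{X})}\,dW_j(t)

% Deterministic kinetics: the noise terms dropped (large-system limit)
\frac{d\mathbf{x}}{dt} = \sum_{j=1}^{M} \boldsymbol{\nu}_j\,a_j(\mathbf{x})
```

Read top to bottom, each equation discards detail from the one above it, which is the sense in which time-scale separation, ensemble averaging and large-system limits connect the levels.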

Page 32

Analysis kernel

Component manager

Mathematica dispatcher

MATLAB dispatcher

Bio/Spice simulator

component n

•Configuration XML

•Client/Server registry model
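
The combination of XML configuration and a registry of components can be sketched as below. Everything here is hypothetical: the XML element and attribute names, the `ComponentManager` class, and the stand-in handlers are invented for illustration and do not reproduce the actual Bio/Spice configuration format or dispatcher interfaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: which analysis components are switched on.
CONFIG_XML = """
<components>
  <component name="simulator" enabled="true"/>
  <component name="matlab_dispatcher" enabled="false"/>
</components>
"""


class ComponentManager:
    """Keeps a name -> handler registry and routes requests to handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        if name in self._handlers:
            raise ValueError(f"component {name!r} is already registered")
        self._handlers[name] = handler

    def dispatch(self, name, request):
        try:
            handler = self._handlers[name]
        except KeyError:
            raise LookupError(f"no component registered as {name!r}") from None
        return handler(request)

    def names(self):
        return sorted(self._handlers)


def load_enabled(config_xml):
    """Return the names of components marked enabled="true" in the config."""
    root = ET.fromstring(config_xml)
    return [c.get("name") for c in root.findall("component") if c.get("enabled") == "true"]


def simulate(request):
    """Stand-in for a simulation kernel; just echoes the requested step count."""
    return {"status": "ok", "steps": request.get("steps", 0)}


# Stand-in implementations keyed by the names used in the config.
AVAILABLE = {
    "simulator": simulate,
    "matlab_dispatcher": lambda request: {"status": "stub"},
}

manager = ComponentManager()
for component_name in load_enabled(CONFIG_XML):
    manager.register(component_name, AVAILABLE[component_name])

result = manager.dispatch("simulator", {"steps": 5})
```

Because clients only know component names, a dispatcher to an external tool and a native simulator can be swapped by editing the configuration rather than the client code.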

Page 33

Automated Analysis/Target Hypothesis

Page 34

Data Generation

Raw Data Storage

Data Filtering and Mining

Data Linkage to Knowledge Base

Knowledge Base Population

Gene Expression

Protein Expression

Metabolite Expression

Cellular Physiologic

Imaging

Literature Database

Annotation

Network Construction

Network Deduction

Statistical Data Modeling/QC

“Significant” Effect Detection

Phenotype Catalog

Biological Sub-model Production

Network Analytical Suite

Network Simulation Suite

Bioinformatic Tool Integration

Stage I Stage II Stage III Stage IV Stage V

Perturbation Sequence Design

Experimental Replication

Specific Hypothesis Testing

Page 35

Conclusions

It is time to move cell biology into a true engineering discipline.

To do this we will need to develop a “systems” theory of cell phenomena:

•Physical models of cellular processes

•Precise measurements of many variables in single cells

•Abstractions of processes derived from physical models

•Theories of how subprocesses communicate

•Theories of network decomposition

These circuits are not like electronic (or electrical) circuits, but they achieve pretty amazing engineering feats.

Knowledge representation is perhaps the central challenge.

Open-source/freeware software development is necessary.