Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from...
-
date post
20-Dec-2015 -
![Page 1: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/1.jpg)
Pacific Northwest National Laboratory
U.S. Department of Energy
DOE Data Workshop
View from Information-intensive Applications
H. Steven Wiley
Biomolecular Systems Initiative
Pacific Northwest National Laboratory (www.sysbio.org)
![Page 2: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/2.jpg)
2
Information Intensive Science

Goals of IIS

- Understanding systems versus individual phenomena
- Strengthening/automating links between different types of data from different scales

Examples

- Biology: Cell Signaling
- Biology: BIRN
- Chemistry: CMCS
- Homeland Defense
- Complexity of systems is becoming pervasive

Challenges

- Efficient federation, graph-based queries
- Continuous data correlation
- Managing complex experiments, data provenance using multiple independent data and analysis resources

Priorities

- High-performance federation, data mining, semantic query capabilities (software, hardware architecture)
- Knowledge environments (lightweight, evolvable, powerful, …)
- Organization and visualization of large-scale, complex information
![Page 3: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/3.jpg)
3
Combustion is a Multi-scale Chemical Science Challenge

A systems-science approach to address complex problems

New knowledge is assimilated from different data, tools, and disciplines at each scale

- Real-time bi-directional information flow
- Deep analysis across scales
- Multiple applications for the same information

Challenges

- Data, provenance, annotation publication
- Syntactic and semantic federation
- Standardization versus innovation

Examples:

- IUPAC – update of radical thermochemistry reference values by global expert group
- PrIMe – community-developed optimized reaction mechanisms

Guiding experimental plans across scales, providing community resources for applied research
![Page 4: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/4.jpg)
4
Homeland Security: Pulling insight out of information overload

- Volume of data, orders of magnitude larger and at different levels of abstraction
- Complexity of information spaces into very high dimensions, 200 the norm
- Information often out of context, incomplete, fuzzy
- Deception
- Information in all media types: text, imagery, video, voice, web, sensor data
- Time and temporal dynamics fundamentally change the approach
- Spatial, yet non-spatial abstract data
- Multiple ontologies, languages, cultures
- Privacy issues

[Figure labels: Immigration, Financial, Sensors, Shipping, Communications]

Is there a domestic terrorist plot?

Can we detect and prevent a terrorist attack BEFORE it happens?

For homeland security and science we now turn to data-intensive visual analytics
![Page 5: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/5.jpg)
5
![Page 6: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/6.jpg)
6
Systems Biology of Cells

Molecular parameters: protein levels / states / locations / interactions / activities

Cell function: death, proliferation, differentiation, migration, ...

Ultimate aim: Understanding and prediction of effects of component properties
![Page 7: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/7.jpg)
7
![Page 8: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/8.jpg)
8
![Page 9: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/9.jpg)
9
What, Where, Quantity, Quality?

To successfully model a complex biological system, one must minimally know the following information:

- What parts are being made? (identity)
- How is the regulatory network structured? (interactions)
- Where are the proteins located in the cell? (location)
- What are their levels? (quantity)
- How do they interact with their partners? (activity)
  - As a function of covalent modification
  - Contribution of steric restrictions
  - Forward and reverse rate constants
![Page 10: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/10.jpg)
10
Cells as Input-Output Systems

- Biologists look at their experiments as input-output systems
- We start with a “defined” system to which we apply a stimulus (input: independent variable)
- We then look for a specific response (output: dependent variable)
- The relationship between the input and output provides insight into the workings of the system

[Diagram: Input → System → Output, with unknown context]

So unless we control the experimental context, we cannot interpret our experiments
![Page 11: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/11.jpg)
11
The Two Greatest Challenges of Systems Biology
1. Working with indeterminate systems
2. Understanding context - what it is and how to control and capture it
![Page 12: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/12.jpg)
12
Defining the composition of living systems is driving analytical technologies

- Genomics
- Proteomics
- Metabonomics
- Expression profiling
- Imaging
- Etc.

All of these technologies seek to rigorously define the composition of living systems
![Page 13: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/13.jpg)
13
Global simultaneous quantitative proteome measurements

Proteins identified and quantified using accurate mass and time (AMT) tags

[Figure: 2-D display of detected peptides. Dimension one: separation time (LC elution time, 0 to 126 min). Dimension two: accurate mass (m/z and MW axes).]

Capillary LC-FTICR 2-D display of peptides from a yeast soluble protein digest: >160,000 isotopic distributions corresponding to >100,000 polypeptides detected
![Page 14: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/14.jpg)
14
High Throughput Proteomics

9.4 Tesla High Throughput Mass Spectrometer

- 1 experiment per hour
- 5000 spectra per experiment
- 4 MByte per spectrum

Per instrument:

- 20 Gbytes per hour
- 480 Gbytes per day

These are based on today's technologies.

- Time to analyze offsite: 1 week
- Time to analyze onsite: 48 hours
- Time to analyze onsite with smart storage: 2 hours
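
The per-instrument rates quoted on this slide follow from simple arithmetic, sketched here in Python. The sketch assumes decimal units (1 GB = 1000 MB) and continuous 24-hour operation, neither of which the slide states; the variable names are illustrative only.

```python
# Throughput of one mass spectrometer, using the figures quoted on the slide.
# Assumptions (not stated in the source): decimal units (1 GB = 1000 MB)
# and continuous 24-hour operation.
EXPERIMENTS_PER_HOUR = 1
SPECTRA_PER_EXPERIMENT = 5000
MB_PER_SPECTRUM = 4

mb_per_hour = EXPERIMENTS_PER_HOUR * SPECTRA_PER_EXPERIMENT * MB_PER_SPECTRUM
gb_per_hour = mb_per_hour / 1000
gb_per_day = gb_per_hour * 24

print(f"{gb_per_hour:.0f} GB per hour, {gb_per_day:.0f} GB per day")
```

Under these assumptions the result matches the 20 Gbytes per hour and 480 Gbytes per day given for a single instrument.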
![Page 15: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/15.jpg)
15
Integrated, High-throughput Experiments will Generate Enormous Amounts of Data

Experiment templates for a single microbe

| Class of experiment | Time points | Treatments | Conditions | Genetic variants | Biological replication | Total biological samples | Proteomics data (TB) | Metabolite data (TB) | Transcription data (TB) |
|---|---|---|---|---|---|---|---|---|---|
| simple (scratching the surface) | 10 | 1 | 3 | 1 | 3 | 90 | 1.8 | 1.4 | 0.009 |
| moderate | 25 | 3 | 5 | 1 | 3 | 1125 | 22.5 | 16.9 | 0.1125 |
| upper mid | 50 | 3 | 5 | 5 | 3 | 11250 | 225.0 | 168.8 | 1.125 |
| complex | 20 | 5 | 5 | 20 | 3 | 30000 | 600.0 | 450.0 | 3 |
| real interesting | 20 | 5 | 5 | 50 | 3 | 75000 | 1500.0 | 1125.0 | 7.5 |

Profiling method

- Proteomics: looking at a possible 6000 proteins per microbe, assuming ~20 GB per sample
- Metabolites: looking at a panel of 500-1000 different molecules, assuming ~15 GB per sample
- Transcription: 6000 genes and 2 arrays per sample, ~100 MB

Typically a single significant scientific question takes the multidimensional analysis of at least 1000 biological samples
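
The sample counts and data volumes in the experiment-template table can be reproduced by multiplying the experimental factors and applying the per-sample sizes from the profiling-method notes. The following Python sketch does this; it assumes decimal units (1 TB = 1000 GB, 100 MB = 0.1 GB), and the function and dictionary names are hypothetical.

```python
# Reproduce the sample counts and data volumes in the experiment-template table.
# Per-sample sizes are the slide's stated assumptions; decimal units
# (1 TB = 1000 GB, 100 MB = 0.1 GB) are an assumption of this sketch.
from math import prod

GB_PER_SAMPLE = {"proteomics": 20.0, "metabolites": 15.0, "transcription": 0.1}

# (time points, treatments, conditions, genetic variants, biological replication)
TEMPLATES = {
    "simple": (10, 1, 3, 1, 3),
    "moderate": (25, 3, 5, 1, 3),
    "upper mid": (50, 3, 5, 5, 3),
    "complex": (20, 5, 5, 20, 3),
    "real interesting": (20, 5, 5, 50, 3),
}


def estimate(factors):
    """Return (total samples, {method: volume in TB}) for one template."""
    samples = prod(factors)
    volumes_tb = {m: samples * gb / 1000 for m, gb in GB_PER_SAMPLE.items()}
    return samples, volumes_tb


for name, factors in TEMPLATES.items():
    samples, tb = estimate(factors)
    print(f"{name:>16}: {samples:>6} samples, "
          f"proteomics {tb['proteomics']:.1f} TB, "
          f"metabolites {tb['metabolites']:.1f} TB, "
          f"transcription {tb['transcription']:.4g} TB")
```

Because the total is a straight product of the factors, raising the number of genetic variants from 1 to 50 alone multiplies every data volume by 50, which is why the larger templates reach the petabyte range.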
![Page 16: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/16.jpg)
16
![Page 17: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/17.jpg)
17
Trey Ideker
The Molecular Interaction Scaffold is Huge
![Page 18: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/18.jpg)
18
Cell Imaging

New multispectral, multidimensional imaging techniques can generate enormous amounts of data
![Page 19: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/19.jpg)
19
Cell Imaging Workflow
Complex set of metadata collected here
![Page 20: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/20.jpg)
20
How Much Data From Imaging?

- Currently, a high-quality image of a single cell field is 4 MB per image, obtained at 4 fps (16 MB/s)
- Following a cell through one cell cycle takes 24 h, or approximately 1.4 TB
- New hyperspectral microscopes analyzing only 10 wavelengths would generate 7 TB/day
- Characterizing the dynamics of the most abundant set of genes (4000) would require 5.5 PB
- This is for a single instrument and a single experiment using today’s technology
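
The single-cell and 4000-gene figures on this slide can be checked with a short calculation. The Python sketch below assumes decimal units (1 TB = 1,000,000 MB, 1 PB = 1000 TB), continuous acquisition over the full 24 h, and one 24 h recording per gene. The 7 TB/day hyperspectral figure depends on acquisition settings the slide does not give, so it is not reproduced here.

```python
# Imaging data volumes from the slide's figures.
# Assumptions of this sketch: decimal units (1 TB = 1e6 MB, 1 PB = 1000 TB),
# continuous acquisition, and one 24 h recording per gene.
MB_PER_IMAGE = 4
FRAMES_PER_SECOND = 4
SECONDS_PER_DAY = 24 * 60 * 60
GENES = 4000

mb_per_second = MB_PER_IMAGE * FRAMES_PER_SECOND            # acquisition rate
tb_per_cell_cycle = mb_per_second * SECONDS_PER_DAY / 1e6   # one 24 h cell cycle
pb_for_all_genes = tb_per_cell_cycle * GENES / 1000

print(f"{mb_per_second} MB/s")
print(f"{tb_per_cell_cycle:.2f} TB per 24 h cell cycle")
print(f"{pb_for_all_genes:.2f} PB for {GENES} genes")
```

Under these assumptions the results round to the slide's values of roughly 1.4 TB per cell cycle and 5.5 PB for 4000 genes.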
![Page 21: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/21.jpg)
21
Understanding the influence of cell context is driving experimental and computational biology

- Cell signaling
- Developmental biology
- Cancer and growth control
- Host-pathogen interactions
- Dynamics of microbial communities
- Cellular responses to stress
![Page 22: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/22.jpg)
22
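Computational Modeling Approaches: Diverse Spectrum

[Diagram: modeling approaches arranged along a spectrum from SPECIFIED to ABSTRACTED. Approaches shown: differential equations, Markov chains, Boolean models, Bayesian networks, statistical mining. What the models capture: mechanisms, influences, relationships. Footnote: * (including structure)]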
![Page 23: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/23.jpg)
23
Computer Models Allow Reconstruction of Processes Across Different Scales

MODEL DATABASE

[Schema diagram; the recoverable labels are listed here]

- Levels: Organ (Organ 1 … Organ N), Tissue (… Tissue N), Cell, Species (Species 1 … Species N), Model (Model 1 …), Data Set N
- Model record: Unique ID, Model Name, Model Descr., Default Par., Default Comp., Timestamp, Security
- Solution Par.: Input_par ID, React. Rates, Chemical Par., Concen. Val., --
- Geometric Par.: Input_par ID, Value_par, --
- Equation Docs.: Input_par ID, Symbolic, Source, --
- Compute Par.: Input_par ID, Value_par, --
- Initial Conditions: Input_fld ID, Value_par, --
- Parameter Docs.: Input_par ID, References, Limits, -
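
To make the model-database idea concrete, here is a minimal Python sketch of how one model record and its parameter groups might be represented. Only the field labels (Unique ID, Model Name, Model Descr., Timestamp, Security, Input_par ID, Value_par) come from the slide; the class names, types, nesting, and example values are all assumptions, and the slide's actual schema may differ.

```python
# Hypothetical sketch of one record in the model database, using field labels
# recoverable from the slide. Types, nesting, and values are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Parameter:
    input_par_id: str  # "Input_par ID" (or "Input_fld ID") on the slide
    value: float       # "Value_par" on the slide


@dataclass
class ModelRecord:
    unique_id: str
    model_name: str
    model_descr: str
    scale: str  # assumed to be one of "Organ", "Tissue", "Cell", "Species"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    security: str = "private"  # placeholder value, not from the source
    geometric_par: list[Parameter] = field(default_factory=list)
    compute_par: list[Parameter] = field(default_factory=list)
    initial_conditions: list[Parameter] = field(default_factory=list)

    def parameter_ids(self) -> set[str]:
        """Collect every parameter ID attached to this model."""
        groups = (self.geometric_par, self.compute_par, self.initial_conditions)
        return {p.input_par_id for group in groups for p in group}


record = ModelRecord("m-001", "example cell model", "illustrative only",
                     scale="Cell")
record.geometric_par.append(Parameter("radius", 10.0))
record.initial_conditions.append(Parameter("ligand_conc", 0.0))
print(sorted(record.parameter_ids()))
```

Keeping each parameter group keyed by a shared parameter ID is one way the documentation tables (equations, references, limits) could be joined back to the values they describe.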
![Page 24: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/24.jpg)
24
![Page 25: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/25.jpg)
25
![Page 26: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/26.jpg)
26
![Page 27: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/27.jpg)
27
Obstacles preventing scientists from utilizing available data

- Data is distributed across many repositories with various ontologies and data formats
- Analysis tools do not address integration of heterogeneous data sets
- Few informatics-based analysis tools support a systems biology approach
- Collaboration capabilities are too primitive to support shared knowledge among researchers
![Page 28: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/28.jpg)
28
The Challenge for Data Handling is Two-fold

1. Managing the massive amounts of compositional data necessary to define all of the relevant experimental systems
2. Capturing all of the data on the relationships between context, composition and response

Integration of the analytical and experimental methodologies into a single system is necessary to link all of the data in a useful way
![Page 29: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/29.jpg)
29
END
![Page 30: Pacific Northwest National Laboratory U.S. Department of Energy DOE Data Workshop View from Information-intensive Applications H. Steven Wiley Biomolecular.](https://reader036.fdocuments.us/reader036/viewer/2022062714/56649d425503460f94a1db31/html5/thumbnails/30.jpg)
30
Understanding Living Cells
Cell responses are multiphasic
Different classes of stimulants (information) are processed at characteristic time scales
Processing nodes within cells are spatially segregated
Each cell responds independently depending on its specific context
A response generally induces a reprogramming of the cell machinery
To create cell simulations, we must “abstract” this information to create a reference model which can then be modified