Making sense of large amounts of molecular data
Jason E. McDermott, PhD
Research Scientist
Computational Biology and Bioinformatics Group
Pacific Northwest National Laboratory
Proteins
Nucleic Acids
Macromolecular Complex
How do components of biological systems interact to produce behavior?
Molecular pathways
mTOR pathway
EGFR pathway
http://biocarta.com
A Mammoth Problem
Scientific Method Overview
Hypothesis
Experimental design
Data generation
Analysis/modeling
Predictions
Interpretation
Circumstantial Evidence
Traditional experimental approach
Cigarette butt on street
Neighbor was eyewitness to crime
Missing jewelry from the house
Fingerprints on doorknob
High-throughput experimental approach
Cigarette sales in city
Testimony from everyone on the block
All diamonds sold over last year in 10 mile radius
Fingerprints on every surface in the house
Problem
New methods generating mountains of data
Very complex systems
Traditional methods fail in some cases
Progress will be made through better use of this data
Objectives
Formulate hypotheses for further investigation
Identify gene/protein ‘targets’
Identify pathways that drive disease
Develop systems-level biological understanding
What is a ‘target’?
‘Critical nodes’
Regulators of important processes
Outcome of modeling (a prediction) that can be used to formulate a hypothesis
What are targets used for?
Mechanistic understanding of disease processes
Potential biomarkers of disease
Potential therapeutic treatments: drug development
Examples I’ll be talking about
Bacterial virulence (Salmonella Typhimurium)
Viral pathogenesis (avian flu and SARS)
Ovarian cancer
Approaches I’ll be talking about
Machine learning
Biological networks
Data integration
Salmonella Typhimurium
[Figure: pathogen-host interaction diagram. Labels recovered from the figure: host signaling (LPS, TLR4, MEK, ERK, Egr-1, iNOS, NRAMP); environmental cues (pH, Mg2+, Fe2+, ROS/RNS); regulators (ssrA/B, phoP/Q, ompR/envZ, ydgT); SPI1, SPI2-T3S, SCV; effectors (e.g. SifA, SlrP, SseJ, SspH2). Process labels: bacterial detection, host defense, environmental response, virulence activation, bacterial survival, invasion, environmental modulation, pathogen directed, host directed.]
Karou Geddes
Type-III secretion system secreted effectors
SlrP, SspH2
SseI, SseJ, SifA, SifB, SpvB
SseK-1, SopD-1
InvJ, SipC
+25 other known effectors
+??? other unknown effectors
http://en.wikipedia.org/
Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method
[Figure: SVM-based discrimination of positive and negative examples; axis labels D1 and D2.]
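The slide does not describe the features or kernel that SIEVE uses, so the following is only a minimal sketch of SVM-based discrimination in general: a linear SVM trained by batch sub-gradient descent on the regularized hinge loss. The two-dimensional feature values, the labels, and the names `train_linear_svm` and `decision_value` are all hypothetical and are not taken from the SIEVE method.

```python
def decision_value(w, b, x):
    """Return w.x + b; a positive value means the positive class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by batch sub-gradient descent."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * decision_value(w, b, x) < 1.0:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical feature vectors: +1 = known effector, -1 = non-effector.
POINTS = [(2, 2), (3, 2), (2, 3), (3, 3), (-2, -2), (-3, -2), (-2, -3), (-3, -3)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

WEIGHTS, BIAS = train_linear_svm(POINTS, LABELS)
for candidate in [(2.5, 2.5), (-1.0, -4.0)]:
    score = decision_value(WEIGHTS, BIAS, candidate)
    print(candidate, "positive" if score > 0 else "negative")
```

In a setting like the one on the slide, the decision value would serve as a ranking score for uncharacterized proteins, with higher scores suggesting stronger resemblance to the positive training examples.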
SIEVE Validation Using CyaA Fusions
[Figure: Secretion versus SIEVE score. Axes: CyaA activity (relative to SrfH) and SIEVE Z-score.]
McDermott, et al. 2011. Infection and Immunity. 79(1):23-32
Niemann, et al. 2011. Infection and Immunity. 79(1):33-43
Biological Networks
Types of networks
Regulatory networks
Protein-protein interaction networks
Biochemical reaction networks
Association networks
Network
Node = gene/protein or other component
Edge = inferred relationship between components
McDermott JE, et al. 2010. Disease Markers, 28(4):253-66.
Merging disparate observations of a system to produce a single, more informative view
[Figure: data integration overview. Labels: SNVs, CNVs, mRNA, miRNA, methylation, protein, phosphorylation, metabolome, genome comparison, pathway enrichment, LEAP, network analysis.]
Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?
Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8
Network inference method
[Figure: expression matrix; axes labeled gene and conditions.]
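One simple way to act on the question posed above is to score every pair of genes by the similarity of their expression profiles and keep the strongest pairs as edges. The sketch below uses plain Pearson correlation with a fixed cutoff. This is a deliberate simplification: the method cited from Faith et al. scores pairs with mutual information rather than correlation. The gene names, expression values, threshold, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def infer_network(profiles, threshold=0.9):
    """Return {(gene1, gene2): r} for pairs whose |r| meets the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values; each list holds one gene across 5 conditions.
PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises and falls with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # no clear relationship to the others
}

EDGES = infer_network(PROFILES)
for (g1, g2), r in EDGES.items():
    print(f"{g1} -- {g2}  r = {r:.3f}")
```

An edge produced this way is an inferred association, not evidence of direct regulation or causality, which matches the caveats about networks raised later in the talk.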
What are networks useful for?
Networks can be used for:
Pretty figures
Hypothesis generation
Functional modules and their organization
Topological identification of target critical nodes
Predicting future states of the network
Networks are NOT useful for:
Final mechanistic insight
Fine distinction of types of interactions between components
Causality
Yu H et al. PLoS Comp Biol 2007, 3(4):e59
Hubs
High centrality, highly connected
Exert regulatory influences
Vulnerable
Bottlenecks
High betweenness
Regulate information flow within network
Removal could partition network
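The two topological measures named above can be computed directly from an edge list. The sketch below counts node degree (for hubs) and computes betweenness centrality with Brandes' algorithm for unweighted graphs (for bottlenecks). The toy network, with two densely connected clusters joined through a single node `x`, is hypothetical and chosen so that the bottleneck is not the most connected node.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical network: two fully connected clusters joined through node "x".
EDGE_LIST = (list(combinations("abcd", 2)) + list(combinations("efgh", 2))
             + [("d", "x"), ("x", "e")])
ADJ = build_adjacency(EDGE_LIST)
DEGREE = {v: len(neighbors) for v, neighbors in ADJ.items()}
BETWEENNESS = betweenness(ADJ)

print("highest degree:", max(DEGREE.values()))
print("top bottleneck:", max(BETWEENNESS, key=BETWEENNESS.get))
```

In this toy graph, `x` has only two neighbors yet lies on every shortest path between the clusters, so removing it would partition the network.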
Bottlenecks in Salmonella are essential for virulence
McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180
Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks
Respiratory virus pathogenesis
What are the causes of pathogenesis in respiratory viruses?
Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection
Approach:
Mouse models of infection
Transcriptomics
Network-based approach
Topological network analysis to define targets
Validation studies
Ido1/Tnfrsf1b Module
Kepi Module
SARS-CoV-infected Wild type Mouse Inferred Network
Hypotheses for Validation
KO Mouse Infection
Phenotype: Survival | Death | Negative | Negative
Network: Altered | Altered | Altered | Negative
Predicted targets abrogate influenza pathogenesis
Tnfrsf1b (aka Tnfr2)
Predicted common regulator for influenza and SARS pathogenesis
Tnfa binding
Negatively regulate TNFR1 signaling, which is proinflammatory
Promote endothelial cell activation/migration
Activation and proliferation of immune cells
[Figure: H5N1 infection. Percent starting weight (y axis, 70 to 110) against an x axis running 0 to 7, for B6 and Tnfrsf mice.]
[Figure: SARS infection.]
Biological Drivers in Ovarian Cancer
What genomic characteristics of ovarian cancer are executed at the protein level?
Can protein expression be used to identify the most important genomic changes?
How can we improve the survival of women with ovarian cancer?
Can proteomics provide insight into the biological processes associated with poor survival?
Can we use a pathway-based approach to suggest novel therapeutic targets?
Proteomics
Chemoresistance in ovarian and breast cancer
Tumor samples from The Cancer Genome Atlas
Depth of genomic characterization
Many tumors
Proteomics and phosphoproteomics characterization of these tumors
Pathway/network analysis to reveal patterns and biomarkers
Integrate data into single view of the system
Clustering of Proteins and Phosphoproteins
[Figure: clustering of proteins and phosphoproteins. Annotation labels: iTRAQ batch, proteomic subtypes, transcriptomic subtype. Scale: log2 abundance relative to universal reference pool.]
Linear regression of abundance versus days-to-death suggests possible correlations with patient survival
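As a rough illustration of this kind of analysis, the sketch below fits an ordinary least-squares line to abundance as a function of days-to-death for one protein and reports the slope, intercept, and Pearson correlation. The values are invented for illustration, and the function name `linear_regression` is hypothetical. The slides do not say how the actual analysis treated patients who were still alive at last follow-up, so this sketch ignores censoring.

```python
from math import sqrt


def linear_regression(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data for one protein across five patients.
DAYS_TO_DEATH = [200, 400, 600, 800, 1000]
LOG2_ABUNDANCE = [1.4, 1.1, 0.4, 0.1, -0.5]

SLOPE, INTERCEPT, R = linear_regression(DAYS_TO_DEATH, LOG2_ABUNDANCE)
direction = "short" if SLOPE < 0 else "long"
print(f"slope = {SLOPE:.4f} per day, r = {R:.3f}")
print(f"higher abundance tracks with {direction} survival")
```

Repeating the fit for every protein and phosphopeptide, then ranking by the strength of the correlation, would give a candidate list of the kind the slide describes.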
[Figure panels: protein abundance; phosphorylation (normalized to abundance).]
A Subset of Proteins and Phosphopeptides Correlate with Patient Survival
PDGFRB Pathway
[Figure legend: correlated with short survival; correlated with long survival; weak correlation; not observed. Data layers: mRNA abundance, protein abundance, phosphorylation.]
Module 1 (short survival)
[Figure legend: correlated with short survival; correlated with long survival. Node types: protein, phosphorylated protein, mRNA.]
AP-1 pathway
NFAT TF pathway
Module 2 (long survival)
CD8 T cell receptor downstream pathway
Il12-2 pathway
Il12-STAT4 pathway
Integrated Co-abundance Network for Ovarian Cancer
P-value 0.007: IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7
P-value 0.005: ATF3, DUSP1, FOSB, ZFP36
Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations
[Figure: two panels, each plotting % survival against months survival.]
Survival Analysis from Network Targets
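For readers unfamiliar with Kaplan-Meier plots, the sketch below shows how one such survival curve is estimated: at each time where a death is observed, the running survival probability is multiplied by the fraction of at-risk patients who survive that time, and censored patients simply leave the at-risk set. The follow-up times and outcomes are invented, and the function name `kaplan_meier` is hypothetical. Comparing groups, as the slide does, would mean computing one curve per group.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times: follow-up time per patient (months).
    events: True if death was observed, False if the patient was censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data for five patients.
MONTHS = [6, 12, 12, 20, 30]
DIED = [True, True, False, True, False]

CURVE = kaplan_meier(MONTHS, DIED)
for month, prob in CURVE:
    print(f"month {month}: estimated survival {prob:.2f}")
```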
Conclusions
Several effective ways of big data integration
Machine learning approaches
Biological network representation
Data integration
Understanding of disease requires system-level views
Relatively simple approaches can yield novel insight
Combining different views of system can improve insight
Data analysis and modeling is a starting point, not an end point
Acknowledgements
SysBEP (http://www.sysbep.org)
NIAID/NIH Y1-AI-8401
PI: Josh Adkins, PNNL
Systems Virology (http://www.systemsvirology.org)
NIAID/NIH HHSN272200800060C
PI: Michael Katze, UW
Clinical Proteomics Tumor Analysis Consortium
NCI/NIH 1U24CA160019
PIs: Richard Smith, PNNL; Karin Rodland, PNNL
Many, many people in these and other projects who helped with this work and made it possible
About Me
Email: [email protected]
http://www.jasonya.com/wp/about/
Twitter: @BioDataGanache
Blog: The Mad Scientist’s Confectioner’s Club
http://www.jasonya.com/wp/