Network Biology: from lists to underpinnings of molecular behaviour
Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics
Carleton University
BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Provenance
• This talk was prepared in part with input from the “Interpreting Gene Lists” workshop put forward by the Canadian Bioinformatics Workshops (bioinformatics.ca)
• http://bioinformatics.ca/workshops/2009/course-content
So you did some mass spectrometry?
Protein Identification
Database search vs de novo
[Figure: MS/MS spectrum, S#: 1708, RT: 54.47, Full ms2 638.00 [165.00 - 1925.00]; x-axis: m/z, y-axis: relative abundance; peaks labelled with amino-acid letters]
de novo: AVGELTK
Database Search
Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
My experiment worked and I have dozens, hundreds, or thousands of hits… now what?
[Figure: MS/MS spectrum (x-axis: m/z, y-axis: relative abundance) labelled “Protein Identification”, followed by a question mark]
Use the list to explore Biology
• Determine significant shared attributes
• Explore putative mechanisms of action
• Test hypotheses
[Figure: MS/MS spectrum (x-axis: m/z, y-axis: relative abundance). Labels: Protein Identification; Network Biology; Eureka! Hypothesis on the molecular basis of disease/process]
# in list having attribute
# in list sharing these attributes
Oxidative Metabolism
Detoxification
Enriched in smokers =UP-regulated in smokers
Outline
1. Explore identified proteins
2. Attribute enrichment
3. Networks
4. Pathways
5. Lab
A hypothesis underlies the list of identified proteins
• An initial question was posed, an experiment performed and a list of candidates obtained.
• The question is: what are the roles of these entities in the biological process being investigated?
– Normal vs pathological
– Response to stimulus
– Interactions and complexes
Biological Answers
• Computational systems biology
– Information retrieval and summary
– Interaction network analysis
– Pathway analysis
– Function prediction
Molecular Attributes
• An attribute provides information about the entity in question (e.g. shape, function, process)
• Sequence and structure provide information about
– Motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, conservation/evolution
– Functions, localization, biological / pathological processes
Gene Ontology
• Captures terminology related to three aspects
– biological processes
– molecular functions
– cellular components
• Relationships between terms are largely defined with “is a” and “part of” relations
Cell division
Isomerase activity
GO Structure
[Figure: fragment of the GO graph with the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relations]
Species independent. Some lower-level terms are specific to a group, but higher-level terms are not.
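Walking the is-a and part-of relations upward is how an annotation to a specific term is usually propagated to its more general terms. A minimal sketch follows, assuming a toy graph built from the five terms on the slide. The exact edge types are an assumption, since the figure did not survive in this transcript, and the names PARENTS and ancestors are hypothetical.

```python
# Toy fragment of the GO cellular-component graph.
# ASSUMPTION: the relation on each edge is a plausible reading of the
# slide's figure, not a copy of the real ontology.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` via is_a / part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # A protein annotated to "chloroplast membrane" is implicitly
    # annotated to all of these more general terms as well.
    print(sorted(ancestors("chloroplast membrane")))
```

Because a term can have more than one parent, the structure is a directed acyclic graph rather than a tree, which is why the traversal tracks the terms it has already seen.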
Gene Ontology
• 30,393 terms, 99.2% with definitions
– 18,939 biological processes
– 2,735 cellular components
– 8,719 molecular functions
• GO Slim is an official reduced set of GO terms
– Generic, plant, yeast
– Good for making pie charts
Annotation
• Manual annotation
– Created by scientific curators
  • High quality
  • Small number (time-consuming to create)
• Electronic annotation
– Annotation derived without human validation
  • Computational predictions (accuracy varies)
  • Lower ‘quality’ than manual codes
• Key point: be aware of annotation origin
Evidence Type (provenance of facts)
• ISS: Inferred from Sequence/Structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from Electronic Annotation
Variable Coverage
Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.
GO Software Tools
• GO resources are freely available to anyone without restriction
– Includes the ontologies, gene associations and tools developed by GO
• Other groups have used GO to create tools for many purposes
http://www.geneontology.org/GO.tools
Accessing GO: QuickGO
http://www.ebi.ac.uk/ego/
Explore Ontologies
http://www.ebi.ac.uk/ontology-lookup
Databases of Molecular Annotation
• NCBI
– GenBank / RefSeq
– Entrez Gene
• EBI
– UniProt
– Ensembl BioMart (eukaryotes)
Model Organism Databases
• Berkeley Drosophila Genome Project (BDGP)
• dictyBase (Dictyostelium discoideum)
• FlyBase (Drosophila melanogaster)
• GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
• UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
• Gramene (grains, including rice, Oryza)
• Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
• Rat Genome Database (RGD) (Rattus norvegicus)
• Reactome
• Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
• The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
• The Institute for Genomic Research (TIGR): databases on several bacterial species
• WormBase (Caenorhabditis elegans)
• Zebrafish Information Network (ZFIN) (Danio rerio)
Identifiers
• Identifiers (IDs) are ideally unique, stable names or numbers that help track database records
– E.g. Social Insurance Number, Entrez Gene ID 41232
• Gene and protein information stored in many databases
– Genes have many IDs
• Records for: Gene, DNA, RNA, Protein
– Important to recognize the correct record type
– E.g. Entrez Gene records don’t store sequence. They link to DNA regions, RNA transcripts and proteins.
NCBI Database Links
http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf
NCBI: U.S. National Center for Biotechnology Information
Part of National Library of Medicine (NLM)
Common Identifiers
Species-specific
• HUGO HGNC: BRCA2
• MGI: MGI:109337
• RGD: 2219
• ZFIN: ZDB-GENE-060510-3
• FlyBase: CG9097
• WormBase: WBGene00002299 or ZK1067.1
• SGD: S000002187 or YDL029W
Annotations
• InterPro: IPR015252
• OMIM: 600185
• Pfam: PF09104
• Gene Ontology: GO:0000724
• SNPs: rs28897757
Experimental Platform
• Affymetrix: 208368_3p_s_at
• Agilent: A_23_P99452
• CodeLink: GE60169
• Illumina: GI_4502450-S
Gene
• Ensembl: ENSG00000139618
• Entrez Gene: 675
• Unigene: Hs.34012
RNA transcript
• GenBank: BC026160.1
• RefSeq: NM_000059
• Ensembl: ENST00000380152
Protein
• Ensembl: ENSP00000369497
• RefSeq: NP_000050.2
• UniProt: BRCA2_HUMAN or A1YBP1_HUMAN
• IPI: IPI00412408.1
• EMBL: AF309413
• PDB: 1MIU
Red = Recommended
Identifier Mapping
• So many IDs!
– Mapping (conversion) is a headache
• Four main uses
– Disambiguate similarly named entities
– Reference related information
– Biological and informational provenance
  • E.g. Genes to proteins, Entrez Gene to Affy
– Unification during dataset merging
  • Equivalent entities
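Identifier mapping comes down to a lookup table between two ID namespaces, plus careful handling of IDs that fail to map. The sketch below is only an illustration: it holds one hand-written record with the BRCA2 identifiers from the Common Identifiers slide, and the names RECORDS, build_index and translate are hypothetical. In practice the table would come from a mapping service rather than be typed in.

```python
# One record per gene, keyed by ID namespace (values taken from the
# Common Identifiers slide for BRCA2).
RECORDS = [
    {
        "HGNC symbol": "BRCA2",
        "Entrez Gene": "675",
        "Ensembl gene": "ENSG00000139618",
        "RefSeq transcript": "NM_000059",
        "RefSeq protein": "NP_000050.2",
        "UniProt": "BRCA2_HUMAN",
    },
]


def build_index(records, source, target):
    """Build a source-ID -> target-ID lookup from the records."""
    return {r[source]: r[target] for r in records if source in r and target in r}


def translate(ids, index):
    """Map each ID; report unmapped IDs instead of silently dropping them."""
    mapped = {i: index[i] for i in ids if i in index}
    unmapped = [i for i in ids if i not in index]
    return mapped, unmapped


if __name__ == "__main__":
    index = build_index(RECORDS, "Entrez Gene", "Ensembl gene")
    # "0" is a made-up placeholder standing in for an ID with no mapping.
    print(translate(["675", "0"], index))
```

Returning the unmapped IDs separately keeps the loss visible, which matters later when choosing the background population for enrichment tests.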
ID Mapping Services
• Synergizer
– http://llama.med.harvard.edu/synergizer/translate/
• Ensembl BioMart
– http://www.ensembl.org
• UniProt
– http://www.uniprot.org/
Outline
1. Explore identified proteins
2. Attribute enrichment
3. Networks
4. Pathways
Attribute Enrichment (AE)
Given:
1. list: e.g. RRP6, MRD1, RRP7, RRP43, RRP42
2. attributes: e.g. function, process, localization, interactions
AE Question: Are any of the attributes surprisingly enriched in the list?
• Details:
– How to assess “surprisingly” (statistics)
– How to correct for repeating the tests
What is a P-value?
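• The P-value is the probability, computed assuming the “null hypothesis” is true, of observing a result at least as extreme as the one obtained. It is not the probability that the null hypothesis is true.
• It is calculated by computing a statistic from the data and evaluating the probability of observing that statistic, or one more extreme, in a sample of the same size distributed according to the null hypothesis.
• Intuitively: if the null hypothesis is rejected whenever the P-value falls below a threshold, that threshold bounds the probability of a false positive result (aka “Type I error”).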
How likely are the observed differences between the two distributions due to chance?
[Figure: value distributions of the two groups being compared; x-axis: value]
AE using the T-test
Answer: Two-tailed T-test
Black: N1 = 500, Mean: m1 = 1.1, Std: s1 = 0.9
Red: N2 = 4500, Mean: m2 = 4.9, Std: s2 = 1.0
$$T = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = -88.5$$
Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
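As a quick check of the arithmetic, the T-statistic can be recomputed from the summary values on the slide. This is a minimal sketch using the unequal-variance form of the statistic, which is the form that reproduces the slide's -88.5. The function name t_statistic is hypothetical.

```python
import math


def t_statistic(m1, s1, n1, m2, s2, n2):
    """Two-sample T-statistic from means, standard deviations and sizes."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / standard_error


if __name__ == "__main__":
    # Black: N1 = 500, m1 = 1.1, s1 = 0.9; Red: N2 = 4500, m2 = 4.9, s2 = 1.0
    t = t_statistic(1.1, 0.9, 500, 4.9, 1.0, 4500)
    print(round(t, 1))  # -88.5
```

The sign only reflects which group is subtracted from which. The two-tailed test looks at the magnitude.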
AE using the T-test
$$T = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = -88.5$$
[Figure: T-distribution; x-axis: T-statistic, y-axis: probability density; the tail beyond T = -88.5 is shaded]
P-value = shaded area * 2
Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
T-test limitations
1. Assumes distributions are both approximately Gaussian (i.e. normal)
– Score distribution assumption is often true for:
  • Log ratios from microarrays
– Score distribution assumption is rarely true for:
  • Peptide counts, sequence tags (SAGE or NextGen sequencing), transcription factor binding site hits
2. Tests for significance of a difference in the means of two distributions but does not test for other differences between distributions.
Examples of non-Gaussian score distributions (figures: probability density vs. score):
• Values are positive and have increasing density near zero, e.g. sequence counts
• Distributions with outliers, or “heavy-tailed” distributions
• Bimodal “two-bumped” distributions
Kolmogorov-Smirnov (K-S) test
Question: Are the red and black distributions significantly different?
[Figure: red and black score distributions; x-axis: score, y-axis: probability density]
Calculate cumulative distributions of red and black
[Figure: cumulative distributions; x-axis: score, y-axis: cumulative probability (0 to 1.0); largest gap between the two curves: Length = 0.4]
Formal question: Is the length of the largest difference between the “empirical distribution functions” statistically significant?
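The “length” in the formal question is the K-S statistic: the largest vertical gap between the two empirical distribution functions. A minimal sketch of that calculation is below. The sample values are made up for illustration, the function name ks_statistic is hypothetical, and turning the statistic into a P-value is left to a statistics package.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)  # fraction of a <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)  # fraction of b <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    black = [1, 2, 3, 4]  # illustrative scores only
    red = [3, 4, 5, 6]
    print(ks_statistic(black, red))  # 0.5
```

Unlike the T-test, nothing here assumes a Gaussian shape, so the same calculation applies to counts and to heavy-tailed or bimodal scores.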
What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins?
list: RRP6, MRD1, RRP7, RRP43, RRP42
Background population: 500 X proteins, 5000 proteins
Fisher’s exact test
Background population: 500 X proteins, 5000 proteins
list: RRP6, MRD1, RRP7, RRP43, RRP42
[Figure: null distribution with the P-value tail marked]
Answer = 4.6 x 10^-4
The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the # with features (in list, background), and the background population.
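The quoted definition can be evaluated directly as a hypergeometric tail sum. The sketch below assumes the slide's background means 500 X proteins among 5000 proteins in total, and that 4 of the 5 listed proteins carry feature X, as in the question posed on the preceding slide. The function name enrichment_p_value is hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """P(at least `hits_in_list` hits) when drawing `list_size` items without
    replacement from a background containing `hits_in_background` hits."""
    misses_in_background = background_size - hits_in_background
    max_hits = min(list_size, hits_in_background)
    favourable = sum(
        comb(hits_in_background, k) * comb(misses_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return favourable / comb(background_size, list_size)


if __name__ == "__main__":
    # ASSUMPTION: 500 of 5000 background proteins have X; 4 of 5 in the list do.
    p = enrichment_p_value(4, 5, 500, 5000)
    print(f"{p:.1e}")  # 4.6e-04
```

Under these assumptions the result matches the slide's answer to the precision shown.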
Important details
• To test for under-enrichment of “black”, test for over-enrichment of “red”.
• Need to choose “background population” appropriately, e.g., if only a portion of the total complement is queried (or has annotation), use only that population as background.
• To test for enrichment of more than one independent type of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type.
• The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.
How to win the P-value lottery, part 1
Background population: 500 X, 5000 Y
Random draws
… 7,834 draws later …
Expect a random draw with observed enrichment once every 1 / P-value draws
How to win the P-value lottery, part 2
Keep the list the same, evaluate different annotations
Observed draw: RRP6, MRD1, RRP7, RRP43, RRP42
Different annotations: RRP6, MRD1, RRP7, RRP43, RRP42
Correcting for multiple tests
• The Bonferroni correction controls the probability that at least one of the positive tests is due to random chance, aka the Family-Wise Error Rate (FWER)
– If M = # of annotations tested: Corrected P-value = M x original P-value
• The Benjamini-Hochberg (B-H) correction controls the expected proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR)
– FDR is the expected proportion of the observed enrichments that are due to random chance.
– Less stringent than the Bonferroni correction
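Both corrections can be written as adjusted P-values that are then compared against the usual threshold. The sketch below uses the standard step-up form of the B-H adjustment. The P-values are made up for illustration and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Corrected P-value = M x original P-value, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """B-H adjusted P-values: p * M / rank, made monotone from the largest down."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p_values = [0.01, 0.04, 0.03, 0.005]  # illustrative only
    print(bonferroni(p_values))
    print(benjamini_hochberg(p_values))
```

On these four values every B-H adjusted P-value is at or below its Bonferroni counterpart, which is what “less stringent” means in practice.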
Reducing multiple test correction stringency
• The correction to the P-value threshold α depends on the # of tests that you do, so, no matter what, the more tests you do, the more sensitive each test needs to be to remain significant
• Can control the stringency by reducing the number of tests:
– e.g. use GO slim or restrict testing to the appropriate GO annotations.
AE tools
• Web-based tools
– Funspec:
  • Easy tool for yeast, not maintained, uses GO annotations and some other annotations (e.g. protein complexes)
– YeastFeatures:
  • Similar to Funspec, different datasets and presentation
– GoMiner:
  • Uses GO annotations, covers many organisms, needs a background set of genes
• Cytoscape-based tools
– BINGO:
  • Uses GO annotations, displays enrichment results graphically and visually organizes related categories
Funspec: Simple ORA for yeast
http://funspec.med.utoronto.ca/
Paste list here
Bonferroni correct? YES!
Choose sources of annotation
Caveats:
• yeast only
• last updated 2002
http://software.dumontierlab.com/yeastfeatures
GoMiner, part 1
http://discover.nci.nih.gov/gominer
1. Click “web interface”
2. Upload background
3. Upload list
4. Choose organism
5. Choose evidence code (All or Level 1)
GoMiner, part 2
6. Restrict # of tests via category size
7. Restrict # of tests via GO hierarchy
8. Results emailed to this address, in a few minutes
DAVID, part 1 http://david.abcc.ncifcrf.gov/
Paste list here
Choose ID type
List type: list or background?
DAVID automatically detects organism
DAVID, part 2
http://david.abcc.ncifcrf.gov/
BINGO, an ORA Cytoscape plugin
http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm
Links represent parent-child relationships in GO ontology
Colours represent significance of enrichment
Nodes represent GO categories
Outline
1. Explore identified proteins
2. Attribute enrichment
3. Networks
• Physical networks
• Genetic networks
• Functional networks
4. Pathways
Why Network and Pathway Analysis?
• Intuitive to Biologists
• Provide a biological context for results
• More efficient than searching databases gene-by-gene
• Intuitive display for sharing data
• Computation on Pathway Content
• Visualize multiple data types on a pathway or network
• Find active pathways
• Identify potential regulators
Network
In biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities.
[Figure: example graph illustrating a vertex (node), an edge, a cycle, a directed edge (arc) and weighted edges (weights -5, 7, 10)]
Integration in a Network Context
Expression data mapped to node colours
Mapping Biology to a Network
• A simple mapping: Protein-protein interactions
– one protein/node, one interaction/edge
• Edges can represent other relationships
– Physical e.g. protein-protein interaction
– Regulatory e.g. kinase activates target
– Genetic e.g. epistasis
– Similarity e.g. protein sequence similarity
• Critical: understand the mapping for network analysis
Protein Sequence Similarity Network
http://apropos.icmb.utexas.edu/lgl/
Literature Network
• Computationally extract gene relationships from text, usually PubMed abstracts
• Useful if network is not in a database
– Literature search tool
• BUT not perfect
– Problems recognizing gene names
– Natural language processing is difficult
• Agilent Literature Search Cytoscape plugin
• iHOP (www.ihop-net.org/UniPub/iHOP/)
Agilent Literature Search
![Page 64: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/64.jpg)
Cytoscape Network produced by Literature Search.
Abstract from the scientific literature
Sentences for an edge
![Page 65: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/65.jpg)
Enrichment Map
Overlap between gene-sets A and B:

$$\text{Overlap}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$
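The overlap score on this slide, the size of the intersection of gene-sets A and B divided by the size of the smaller set, can be computed directly. The sketch below is illustrative: the gene-set names, their members, and the 0.5 threshold are hypothetical, not values taken from the Enrichment Map software.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """|A intersect B| / min(|A|, |B|); defined as 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def enrichment_map_edges(gene_sets, threshold=0.5):
    """Connect two gene-sets when their overlap reaches the threshold (assumed value)."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(sorted(gene_sets.items()), 2):
        score = overlap_coefficient(a, b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


if __name__ == "__main__":
    # Hypothetical gene-sets with placeholder gene names.
    gene_sets = {
        "DNA Repair": {"g1", "g2", "g3"},
        "DNA Replication": {"g2", "g3", "g4", "g5"},
        "Translation": {"g9"},
    }
    print(enrichment_map_edges(gene_sets))
```

Dividing by the smaller set means a small gene-set fully contained in a larger one scores 1, so nested categories end up connected in the map.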
![Page 66: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/66.jpg)
Nodes represent gene-sets
![Page 67: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/67.jpg)
[Figure: enrichment map with labelled gene-set clusters: Olfactory Receptor; Muscle Contraction; Ectodermal Dev. & Keratinocyte Diff.; Ubiquitin Processes; DNA Processes; Mitotic Cell Cycle; DNA Repair; DNA Replication; Ras GTPase; Serine Endopeptidase; Chromatin Remodeling; Chromosome; Ubiquitin-dependent Proteolysis; Ubiquitin Ligase; Microtubule Cytoskeleton; Intermediate Filament Cytoskeleton; Ion Channel (Calcium, Potassium, Sodium); Mitochondrial Oxidative Metabolism; Fatty Acid Metabolism; Cytoskeleton; mRNA Transport; RNA Splicing; RNA Processes; Transcription; rRNA Processing; Ribonucleotide Metabolism; Translation]
![Page 68: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/68.jpg)
Physical Networks
• Between two molecular objects
  – DNA, RNA, gene, protein, complex, small molecule, photon
  – Requires a site of interaction / binding
• Biologically relevant:
  – Present/expressed at the same time
  – Share a cellular location
  – Leads to some biologically relevant outcome
[Figure: interaction between A and B]
![Page 69: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/69.jpg)
Molecular Interactions
RAS interacting with RALGDS (PDB: 1LFD)
Synthetic protein interacting with ATP and Zinc (PDB: 2P0X)
![Page 70: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/70.jpg)
Experimental Interaction Discovery
Methods shown: Microarray, Two-Hybrid, Mass Spectrometry, Genetics, X-Ray, NMR
Evidence types: Direct, Physical; Indirect, Physical; Indirect, Genetic
![Page 71: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/71.jpg)
Experimental Considerations
• How do you know if the interaction really exists?
• Each method has its advantages and disadvantages.
  – Be aware of systematic errors.
  – Be aware of contaminants.
• Each method observes interactions under slightly different experimental conditions.
• Support from many different sources is certainly better (necessary) than just one.
![Page 72: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/72.jpg)
Some affinity purification caveats
First and most importantly, this is only a representation of the observation.
You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another.
If there is only one other protein present (B), then it’s likely that A and B are directly interacting.
But what if I told you that two other proteins (B and C) were present along with A...
[Figures: A with B; A with B and C]
![Page 73: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/73.jpg)
Complexes with unknown topology
Which of these models is correct?
The complex described by this experimental result is said to have an Unknown Topology.
[Figure: alternative arrangements of A, B and C]
![Page 74: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/74.jpg)
Complexes with unknown stoichiometry
Here’s another possibility.
The complex described by this experimental result is also said to have Unknown Stoichiometry.
[Figure: A with more than one copy of B]
![Page 75: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/75.jpg)
Interaction Models
Spoke | Matrix | Actual Topology
Simple model, useful for data navigation
More accurate
Theoretical max. number of interactions
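The two interaction models can be sketched as ways of expanding one purification result (a bait plus the proteins that came down with it) into binary edges. In the spoke model the bait is linked to each co-purified protein; in the matrix model every pair in the complex is linked. The function names and the A/B/C example are illustrative assumptions.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: one edge from the bait to every co-purified protein."""
    return {tuple(sorted((bait, p))) for p in preys if p != bait}


def matrix_edges(bait, preys):
    """Matrix model: an edge between every pair of proteins in the purification."""
    members = sorted(set(preys) | {bait})
    return set(combinations(members, 2))


if __name__ == "__main__":
    bait, preys = "A", ["B", "C"]
    print("spoke: ", sorted(spoke_edges(bait, preys)))
    print("matrix:", sorted(matrix_edges(bait, preys)))
```

For a purification with n proteins in total, the spoke model yields n - 1 edges and the matrix model yields n(n - 1)/2, which is the theoretical maximum mentioned on the slide.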
![Page 76: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/76.jpg)
High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI)
Ste12
Ho et al. Nature. 2002 Jan 10;415(6868):180-3
Mike Tyers, SLRI
![Page 77: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/77.jpg)
![Page 78: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/78.jpg)
k-core analysis
• A k-core is a part of a graph in which every node is connected to at least k other nodes of that part (k = 0, 1, 2, 3, ...)
• The highest k-core is the central, most densely connected region of a graph
• Regions of dense connectivity may represent molecular complexes
• Therefore, high k-cores may be molecular complexes
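A k-core can be found by repeatedly removing nodes that have fewer than k remaining neighbours until nothing changes. The sketch below uses plain Python and a hypothetical toy network; it is not the MCODE implementation.

```python
def build_adjacency(edges):
    """Build a symmetric adjacency map from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_core(adjacency, k):
    """Return the node set of the k-core (empty if the graph has none)."""
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if len(adj[node]) < k:
                # Removing a node lowers the degree of each of its neighbours.
                for nbr in adj[node]:
                    adj[nbr].discard(node)
                del adj[node]
                changed = True
    return set(adj)


if __name__ == "__main__":
    # Hypothetical network: a fully connected group A-D with a tail D-E-F.
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
    adj = build_adjacency(edges)
    for k in range(1, 5):
        print(k, sorted(k_core(adj, k)))
```

In this toy network the tail nodes E and F drop out from k = 2 onwards, leaving only the densely connected group, which is the behaviour that makes high k-cores candidate complexes.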
![Page 79: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/79.jpg)
[Figure: k-cores of the Pre MS, Ho, Gavin and Union networks; panel labels: 6-core, 6-core, 6-core, 9-core]
Interaction can define function
MCODE plugin for Cytoscape
![Page 80: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/80.jpg)
http://pathguide.org
![Page 81: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/81.jpg)
Interaction Databases
• Experiment (E)
• Structure detail (S)
• Predicted
  – Physical (P)
  – Functional (F)
• Curated (C)
• Homology modeling (H)
• *IMEx consortium
![Page 82: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/82.jpg)
Network Classification of Disease
• Traditional: Gene association
• Limitations: Too many genes reduces statistical power
• New: Active cell map based approaches combining network and molecular profiles

Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16
Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun;3(6):e96
Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE. 2007 May 9;2(5):e425
![Page 83: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/83.jpg)
Network-Based Breast Cancer Classification
• 57k interactions from Y2H, orthology, co-citation, HPRD, BIND, Reactome
• 2 breast cancer cohorts, different expression platforms

Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16
![Page 84: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/84.jpg)
• Similar network markers across 2 data sets (better than original overlap)
• Increased classification accuracy
• Better coverage of known cancer risk genes (*)
![Page 85: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/85.jpg)
PIPE
• Predicts yeast PPI from sequence
  – Uses interaction databases to find similar interacting proteins
  – Estimates the site of interaction
  – 75% accuracy (61% sensitivity, 89% specificity)
  – Finds new interactions among complexes
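The three figures on this slide are linked: accuracy is the fraction of all test pairs classified correctly, so it depends on sensitivity, specificity, and the mix of positive and negative pairs. The sketch below shows that 61% sensitivity and 89% specificity give 75% accuracy when positives and negatives are equally numerous. The balanced test set is an assumption made here for illustration; the slide does not state the composition of the evaluation set, and the set sizes are hypothetical.

```python
def accuracy(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy from per-class rates and class sizes."""
    true_pos = sensitivity * n_pos   # interacting pairs correctly predicted
    true_neg = specificity * n_neg   # non-interacting pairs correctly rejected
    return (true_pos + true_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Balanced set (assumed): equal numbers of positive and negative pairs.
    print(round(accuracy(0.61, 0.89, 100, 100), 4))
    # Heavily imbalanced set (hypothetical sizes): accuracy approaches specificity.
    print(round(accuracy(0.61, 0.89, 100, 10_000), 4))
```

When negatives vastly outnumber positives, as in an all-against-all screen, accuracy is dominated by specificity, which is why specificity is the figure to watch for proteome-wide prediction.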
![Page 86: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/86.jpg)
![Page 87: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/87.jpg)
![Page 88: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/88.jpg)
PIPE2
• First all-to-all sequence-based computational screen of PPIs in yeast
  – 29,589 high confidence interactions of ~2 × 10^7 possible pairs
  – 16,000x faster than PIPE
  – 99.95% specificity
![Page 89: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/89.jpg)
Synthetic Genetic Interactions
• Synthetic genetic interactions (lethal, slow growth)
• Mate two mutants without phenotypes to get a daughter cell with a phenotype
• Synthetic lethal (SL), slow growth
• Robotic mating using the yeast deletion library
• Genetic interactions provide functional data on protein interactions or redundant genes
• About 23% of known SLs (1295 - YPD+MIPS) were known protein interactions in yeast

Tong et al. Science. 2001 Dec 14;294(5550):2364-8
![Page 90: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/90.jpg)
Synthetic Genetic Interactions in Yeast
[Figure legend: Cell Polarity; Cell Wall Maintenance; Cell Structure; Mitosis; Chromosome Structure; DNA Synthesis; DNA Repair; Unknown; Others]
Tong, Boone
![Page 91: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/91.jpg)
Validation: Protein Localization
A – A3: Y2H
B: physical methods
C: genetic
E: immunological
True positives:
– Localized in the same cellular compartment
– Have common cellular role
Sprinzak, Sattath, Margalit, J Mol Biol, 2003
![Page 92: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/92.jpg)
Comparisons
• All methods except Y2H and the synthetic lethality technique are biased toward abundant proteins.
• PPI data are biased toward certain cellular localizations.
• Evolutionarily conserved proteins have much better coverage in Y2H than proteins restricted to a certain organism.
C. von Mering et al., Nature, 2002
![Page 93: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/93.jpg)
Functional Associations
• Molecular Interactions
• Regulatory Interactions
• Genetic Interactions
• Similarity relationships
  – Co-expression
  – Protein sequence
  – Domain architecture
  – Phylogenetic profiles
  – Gene neighborhood
  – Gene fusion
  – …
![Page 94: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/94.jpg)
http://string.embl.de/
von Mering et al., Nucleic Acids Res., 2005
![Page 95: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/95.jpg)
![Page 96: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/96.jpg)
![Page 97: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/97.jpg)
Gene Function Prediction using a Multiple Association Network Integration Algorithm
Query-specific weights for multifaceted function queries
[Figure: a composite network is formed as a weighted sum (weights w1, w2, w3) of co-expression, genetic (Tong et al. 2001) and co-complexed networks. Example genes: CDC27, APC11, CDC23 (cell cycle); XRS2, RAD54, MRE11 (DNA repair); UNK1, UNK2]
Pavlidis et al, 2002; Lanckriet et al, 2004; Mostafavi et al, 2008
Durrett 2006
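The weighted combination shown on this slide can be sketched as a weighted sum of edge strengths across networks. This is a minimal illustration only: the weights and edge values below are hypothetical and hand-picked, whereas the slide describes query-specific weights; how those weights are obtained is not covered here.

```python
def combine_networks(networks, weights):
    """Weighted sum of edge strengths; pairs are treated as undirected."""
    combined = {}
    for name, edges in networks.items():
        w = weights[name]
        for pair, strength in edges.items():
            key = tuple(sorted(pair))  # (a, b) and (b, a) are the same edge
            combined[key] = combined.get(key, 0.0) + w * strength
    return combined


if __name__ == "__main__":
    # Gene names come from the slide; edge strengths and weights are made up.
    networks = {
        "coexpression": {("CDC27", "CDC23"): 1.0, ("CDC27", "APC11"): 1.0},
        "genetic": {("CDC23", "CDC27"): 1.0, ("XRS2", "MRE11"): 1.0},
        "cocomplexed": {("MRE11", "RAD54"): 1.0},
    }
    weights = {"coexpression": 0.5, "genetic": 0.25, "cocomplexed": 0.25}
    for pair, score in sorted(combine_networks(networks, weights).items()):
        print(pair, score)
```

An edge supported by several networks accumulates weight from each, so the composite favours associations with independent lines of evidence.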
![Page 98: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/98.jpg)
GeneMANIA Cytoscape Plugin
![Page 99: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/99.jpg)
Outline
1. Explore identified proteins
2. Attribute enrichment
3. Networks
4. Pathways
5. Lab
![Page 100: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/100.jpg)
pathway
In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity.
![Page 101: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/101.jpg)
Using Pathway Information
Databases
Literature
Expert knowledge
Experimental Data
Find active processes underlying a phenotype
Pathway Information
Pathway Analysis
![Page 102: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/102.jpg)
http://pathguide.org
Vuk Pavlovic, Sylva Donaldson
>290 Pathway Databases!
• Varied formats, representation, coverage
• Pathway data extremely difficult to combine and use
![Page 103: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/103.jpg)
Aim: Convenient Access to Pathway Information
Facilitate creation and communication of pathway data
Aggregate pathway data in the public domain
Provide easy access for pathway analysis
http://www.pathwaycommons.org
![Page 104: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/104.jpg)
Access From Cytoscape
![Page 105: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/105.jpg)
Fatty Acid Degradation?
Other pathways / processes?
GenMAPP.org
cardiomyopathy: downregulated genes
![Page 106: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/106.jpg)
Fatty Acid Degradation Pathway
![Page 107: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/107.jpg)
Cardiomyopathy Data on Fatty Acid Degradation Pathway
![Page 108: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/108.jpg)
Visualizing Time Course Data on Pathways: Multiple Comparison View
![Page 109: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/109.jpg)
Outline
1. Explore identified proteins
2. Attribute enrichment
3. Networks
4. Pathways
5. Lab
![Page 110: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/110.jpg)
Network Analysis
• Cytoscape
  – Visualize molecular interaction networks and integrate interactions with gene expression profiles and other state data. Data filters & custom plug-in architecture.
  – http://www.cytoscape.org
• Biolayout Express 3D
  – Large networks
  – Gene expression
  – www.sanger.ac.uk/Teams/Team101/biolayout/b3d.html
![Page 111: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/111.jpg)
Network Analysis using Cytoscape
Databases
Literature
Expert knowledge
Experimental Data
Find biological processes underlying a phenotype
Network Information
Network Analysis
![Page 112: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/112.jpg)
Network visualization and analysis
UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto, U Texas
http://cytoscape.org
Pathway comparison
Literature mining
Gene Ontology analysis
Active modules
Complex detection
Network motif search
![Page 113: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/113.jpg)
Manipulate Networks
Filter/Query
Automatic Layout
Interaction Database Search
![Page 114: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/114.jpg)
Focus
Overview
Zoom
PKC Cell Wall Integrity
![Page 115: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/115.jpg)
Active Community
• Help
  – 8 tutorials, >10 case studies
  – Mailing lists for discussion
  – Documentation, data sets
• Annual Conference: Houston Nov 6-9, 2009
• 10,000s users, 2500 downloads/month
• >40 Plugins Extend Functionality
  – Build your own, requires programming
http://www.cytoscape.org
Cline MS et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2(10):2366-82
![Page 116: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/116.jpg)
LAB
Objective
• Create a map of the functional enrichments from the 14 input proteins
Methods
• Use HGNC to obtain the gene symbols from the names
• Submit the gene symbols to a tool that already has datasets loaded.
• Get Attributes and do analysis on network
![Page 117: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/117.jpg)
14 Proteins
• ISOFORM of APOPTOSIS-INDUCING FACTOR 1, MITOCHONDRIAL
• QUINONE OXIDOREDUCTASE.; 26 KDA PROTEIN.; 22 KDA PROTEIN.; 32 KDA PROTEIN.
• 14-3-3 PROTEIN EPSILON.
• ELONGATION FACTOR 1-GAMMA.; 50 KDA PROTEIN.
• AFG3-LIKE PROTEIN 2.
• 3-KETOACYL-COA THIOLASE, MITOCHONDRIAL
• IMPORTIN BETA-1 SUBUNIT.
• FH1/FH2 DOMAIN-CONTAINING PROTEIN
• ANNEXIN VI ISOFORM 2.; ANNEXIN A6.
• 2,4-DIENOYL-COA REDUCTASE, MITOCHONDRIAL
• HYDROXYACYL GLUTATHIONE HYDROLASE ISOFORM 1.; HYDROXYACYLGLUTATHIONE HYDROLASE.
• ISOFORM 1 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA.; ISOFORM 2 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA
• ISOFORM 1 OF LONG-CHAIN-FATTY-ACID--COA LIGASE 1
• PHOSPHOLIPASE C DELTA 4.
![Page 118: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/118.jpg)
Get their gene symbol/identifiers
HGNC - http://www.genenames.org
• Provide a table of mappings
• What challenges did you face when trying to identify the symbols from textual descriptions?
![Page 119: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/119.jpg)
Identify functional enrichments
Discuss and provide a plot for the enrichment of Gene Ontology categories
![Page 120: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/120.jpg)
Build an attribute enrichment network
• Which new proteins are functionally linked?
• What datasets were used in the network construction?
![Page 121: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/121.jpg)
Attribute Enrichment with a custom data set
• Use BioMart to
  – Convert HGNC identifiers to Ensembl identifiers
  – Obtain the Gene Ontology categories for the target proteins and the background proteins.
• Use FUNC to do the enrichment analysis
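One common way to score the enrichment of a single Gene Ontology category in a target list against a background is a one-sided hypergeometric test. The sketch below shows that calculation in plain Python; it is offered as an illustration of the idea and is not claimed to reproduce FUNC's exact procedure. All counts in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(hits, list_size, annotated, background):
    """P(X >= hits) when drawing list_size genes from a background in which
    `annotated` genes carry the category."""
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 4 of 14 target proteins carry a category
    # that annotates 200 of 20,000 background genes.
    print(hypergeom_pvalue(4, 14, 200, 20_000))
```

A small p-value says that seeing at least this many annotated proteins in a list of this size would be unlikely if the list were a random draw from the background, which is why the choice of background matters as much as the target list.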
![Page 122: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/122.jpg)
![Page 123: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/123.jpg)
![Page 124: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/124.jpg)
![Page 125: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/125.jpg)
![Page 126: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/126.jpg)
Collect the Gene Ontology attributes for the list, then for all the human genes
![Page 127: Network Biology: from lists to underpinnings of molecular behaviour](https://reader035.fdocuments.us/reader035/viewer/2022062513/554e86e0b4c90526358b4723/html5/thumbnails/127.jpg)
Next steps are harder…
To use FUNC, you need to convert the BioMart output to the file format above. This is pretty easy to do in Excel for the protein list, but Excel can’t handle the results for all the human proteins. You need to write a small script: take BIOC3008 and become competent in simple data manipulation.
http://func.eva.mpg.de/
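A small script of the kind mentioned on this slide might look like the sketch below. Both file layouts are assumptions: the input is taken to be a tab-separated export with columns named `gene_id` and `go_id` (hypothetical names; a real BioMart export will use its own headers), and the output is taken to be three tab-separated columns (gene, GO term, and 1 for target genes or 0 for background genes). Check the FUNC documentation for the format it actually expects before using anything like this.

```python
import csv
import io


def biomart_to_rows(records, target_genes):
    """Yield (gene, go_id, flag) once per gene/GO pair; flag is 1 for targets."""
    seen = set()
    for rec in records:
        gene = (rec.get("gene_id") or "").strip()
        go_id = (rec.get("go_id") or "").strip()
        # Skip genes without a GO annotation and repeated gene/GO pairs.
        if not gene or not go_id or (gene, go_id) in seen:
            continue
        seen.add((gene, go_id))
        yield gene, go_id, 1 if gene in target_genes else 0


def convert(biomart_tsv, target_genes):
    """Convert tab-separated text (assumed layout) into three-column text."""
    reader = csv.DictReader(io.StringIO(biomart_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in biomart_to_rows(reader, set(target_genes)):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical identifiers standing in for real gene and GO accessions.
    example = "gene_id\tgo_id\nG1\tGO:0001\nG1\tGO:0001\nG2\t\nG3\tGO:0002\n"
    print(convert(example, {"G1"}), end="")
```

Because the records are processed one at a time rather than loaded into a spreadsheet, the same approach scales to the annotations for all human genes.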