Lecture 5 - Blast2GO
Transcript of Lecture 5 - Blast2GO
29.09.2009 – Stefan Götz, Valencia
1
Lecture 5
Functional Analysis with Blast2GO Enriched functions
− FatiGO− Babelomics
FatiScan
Kegg Pathway Analysis Functional Similarities B2G-Far
29.09.2009 – Stefan Götz, Valencia
2
Biosynthesis 60% Biosynthesis 20%
Sporulation 20% Sporulation 20%
One Gene List (A) The other list (B)
Are this two groups of genes
carrying out different
biological roles?
84No biosynthesis
26Biosynthesis
BAGenes in group A have significantly to do with biosynthesis, but not with sporulation.
We can do this for each GO, miRNA, InterPro , ... !!!
Fisher's Exact TestC
on
ti
ng
en
cy
t
ab
le
29.09.2009 – Stefan Götz, Valencia
3
Multiple testing correction
Many tests → Many false positive = we need correction!
FDR control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses.
FWER control: The familywise error rate is the probability of making one or more false discoveries among all the hypotheses when performing multiple pairwise tests. (more conservative)
29.09.2009 – Stefan Götz, Valencia
4
Different types of comparisons
Compare one condition against another
Remove Common Ids Test and Ref-Set are
interchangeable
Set 1 Set 2
Common IDs
Compare a subset against the total
Gossip default setting Test and Ref-Set are
NOT interchangeable
Test-Set
Ref-Set
Common IDs
Test-Set
Ref-Set
Common IDs
29.09.2009 – Stefan Götz, Valencia
5
FatiGO in Blast2GO
Two-Tailed test not only identifies over but also under represented functions.
If no Ref-Set is chosen all annotations are used as reference
29.09.2009 – Stefan Götz, Valencia
6
FatiGO in Blast2GO Result table with link outs to sequence lists
29.09.2009 – Stefan Götz, Valencia
7
FatiGO in Blast2GO
Retains only the lowest, most specific enriched term per GO branch
29.09.2009 – Stefan Götz, Valencia
8
FatiGO in Blast2GO Export enriched terms data as DAG graphs!
reduce
=> To draw all nodes, set filter to 1
29.09.2009 – Stefan Götz, Valencia
9
FatiGO in Blast2GO
=> Filter results
% of sequences in Ref group
% of sequences in Test group
If Test > Ref = over-expressed
If Ref > Test = under-expressed
Export enriched terms as chart!
29.09.2009 – Stefan Götz, Valencia
10
Outline Enriched functions
− Fatigo− Babelomics
e.g: FatiScan
Kegg Pathway Analysis Functional Similarities B2G-Far
Export annotations to other tools
C04018C10 GO:0004707C04018C10 GO:0006468C04018C10 GO:0005524C04018C10 EC:2.7.11.24C04018E10 GO:0005739C04018E10 GO:0009536C04018A12 GO:0009056C04018C12 GO:0004869C04018C12 ...
29.09.2009 – Stefan Götz, Valencia
11
BabelomicsWEB tools suite
A complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments
29.09.2009 – Stefan Götz, Valencia
12
BabelomicsKey features
Functional profiling by functional enrichment method (FatiGO) and gene set method (Fatiscan)
Many functional and regulatory definitions (GO, KEGG, Biocarta, text-mining derived bioentities, Transcription factors, CisRed, miRNAs, InterPro, etc.)
Wide coverage of model organisms (more than 10 species)
Import your own annotations
29.09.2009 – Stefan Götz, Valencia
13
Babelomics Tools
FatiGO: Finds differential distributions of functional terms between two groups of
genes, these terms can be: Gene Ontology , InterPro motifs, SwissProt KW ,
transcription factors (TF), gene expression in tissues, bioentities from
scientific literature, cis-regulatory elements CisRed.
Tissues Mining Tool: compares reference values of gene expression in
tissues to your results.
MARMITE: Finds differential distributions of bioentities extracted from PubMed
between two groups of genes.
FatiScan: detect significant functions with Gene Ontology, InterPro motifs,
Swissprot KW and KEGG pathways in lists of genes ordered according to
differents characteristics.
MarmiteScan: Use chemical and disease-related information to detect related
blocks of genes in a gene list with associated values.
twolists
ranked list
29.09.2009 – Stefan Götz, Valencia
14
Babelomics Databases
Babelomics
Interpro
Gene Ontology
KEGGEnsembl
SwissProt
Transcription Factors
MicroRNA
Cisred
BioentitiesLiterature
Integrated Biological DBof Functional Annotation for more than 10 species
29.09.2009 – Stefan Götz, Valencia
15
BabelomicsFatiscan features
Interpret a ranked list of genes There is not need for choosing a cut-off (all
information is included) One statistical test for each Functional Block of
annotation− Multiple testing context (hundreds of annotation)− Filtering of annotation is convenient (the less tests
the best correction)
29.09.2009 – Stefan Götz, Valencia
16
FatiScan...testing along an ordered list
• Index ranking genes according to some biological aspect under study.
• Database that stores gene class membership information.
• FatiScan searches over the whole ordered list, trying to find runs of functionally related genes.
List of genes
+
-
Annotation label A
Annotation label B
Annotation label CBA C
Block of genes enriched in the annotation A
Annotation C is homogeneously
distributed along the list
Block of genes enriched in the annotation B
29.09.2009 – Stefan Götz, Valencia
17
FatiScan Web Tool
List of genes
C04018C12 2.31C04018C13 2.23C04018C14 1.87C04018C16 1.62C04018E18 0.87C04018E19 0.12C04018A21 -0.01C04018C33 -0.18C04018C65 ....
C04018C13 GO:0004707C04018C14 GO:0006468C04018C15 GO:0005524C04018C16 EC:2.7.11.24C04018E17 GO:0005739C04018E18 GO:0009536C04018A19 GO:0009056C04018C22 GO:0004869C04018C32 ...
List of annotations
29.09.2009 – Stefan Götz, Valencia
18
FatiScangene set enrichment analysis (fold-change case-control)
Ran
ked
gen
e li
st (
e.g
. fol
d-c
hang
es
(
-.....
GO1 GO2 GO10.... GO11
A:Functional labels (GO, KEGG, etc.) over-represented among over-expressed genes
B:Functional labels under-represented among over-expressed genes
C: Functional labels over-represented among under-expressed genes
D:Functional labels under-represented among under-expressed genes
+
A
B
C D
29.09.2009 – Stefan Götz, Valencia
19
FatiScan ResultsSignificantly enriched GO-Terms
Percentages (Test/Ref)
Adjusted p-Values
p-Values
29.09.2009 – Stefan Götz, Valencia
20
Functional Analysis - Outline
Enriched functions− FatiGO− Babelomics
FatiScan
Kegg Pathway Analysis Functional Similarities B2G-Far
29.09.2009 – Stefan Götz, Valencia
21
KEGG: Kyoto Encyclopedia of Genes and Genomes
GO Term
EnzymeCode
KEGGPATHWAY
29.09.2009 – Stefan Götz, Valencia
22
Obtain Enzyme Codes
29.09.2009 – Stefan Götz, Valencia
23
KEGG Pathways
First, choose a folder to save the KEGG maps
Maps are retrieved online Export as text file (tab-sep)
29.09.2009 – Stefan Götz, Valencia
24
KEGG Pathways
--> Each enzyme in a different color --> Pathways are ordered for abundance
Or
de
re
d
Li
st
29.09.2009 – Stefan Götz, Valencia
25
Functional Analysis - Outline
Enriched functions− FatiGO− Babelomics
FatiScan
Kegg Pathway Analysis Functional Similarities B2G-Far
29.09.2009 – Stefan Götz, Valencia
26
Compare annotation-sets by graphs
We have several conditions, tissues, statistical methods → different sets of annotations
Question: How similar are they? Approach:
Superpose its GO-graphs B2G Solution:
Generate a .annot file with thedifferent conditions as Seq-Name
29.09.2009 – Stefan Götz, Valencia
27
Compare annotations by graphs Make a combined graph:
29.09.2009 – Stefan Götz, Valencia
28
How to compare two sets of annotations?
Sequence X - original annotations:GO:0045449GO:0009651GO:0005634GO:0003677
Sequence Y - transferred/new annotations:GO:0031323 -> process -> regulation of cellular meta...GO:0043157 -> process -> response to cation stressGO:0005634 -> component -> nucleusGO:0003723 -> function -> RNA binding
-> process -> regulation of transcription-> process -> response to salt stress-> component -> nucleus-> function -> DNA binding
parent (dist2)child (dist1)exact (dist0)
brother (dist2)
similarity
???????????
dissimilarity
How to quantify the similarity of to sets of annotations
29.09.2009 – Stefan Götz, Valencia
29
Measures (dis)similarity
5 groups of matches− “Identical” if the B2G annotation
is present among the original annotations of the sequence.
− “General” if the B2G annotated GO term is a parent term of one of the original annotations of the sequence.
− “Specific” if the B2G annotated GO term is a children term of one of the original annotations of the sequence
− "Other branch" if no original annotation terms lay in the same DAG branch of the considered B2G annotated GO term
− "Other Type" if the term can not be compared to any other term because the GO category is not present in the original gene product annotation.
29.09.2009 – Stefan Götz, Valencia
30
Use-Case: Compare annotations DBs
Blast2GO function: Search one set of annotations in another
Example: Annotation of 1100 sequences against NR (873 seqs, 3152 annots) and SwissProt (639 seqs, 4547 annots)
Results: Search SwissProt vs. NR annotation:Compared sequences: 610Compared annotations: 4402Swissprot has: Exact GOs:1313 - 30% More specific GOs: 499 - 11% More general GOs: 206 - 5% Other branch GOs: 1660 - 37% Other Type GOs: 724 - 16%
Compare NR vs. SwissProt annotation:Compared sequences: 610Compared annotations: 2488NR has: Exact GOs: 1313 - 53% More epecific GOs: 224 - 9% More general GOs: 387 - 15% Other branch GOs: 405 - 16% Other Type GOs: 159 - 6%
29.09.2009 – Stefan Götz, Valencia
31
Functional Analysis - Outline Enriched functions
− FatiGO− Babelomics
FatiScan
Kegg Pathway Analysis Functional Similarities B2G-Far
29.09.2009 – Stefan Götz, Valencia
32
B2G-FAR
29.09.2009 – Stefan Götz, Valencia
33
Hundreds of species now with Blast2GO protein
annotations
29.09.2009 – Stefan Götz, Valencia
34
Search your sequences in B2G-FAR
Download annotations of your species Load annotations into B2G Deselect all sequences
29.09.2009 – Stefan Götz, Valencia
35
B2G-FAR - GeneChips