EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari
High throughput functional annotation and analysis with the Blast2GO suite
Ana Conesa
Bioinformatics Department
Centro de Investigaciones Príncipe Felipe
EMBRACE Workshop, 7th – 9th November 2007, Bari
Credits
Blast2GO Development:
Bioinformatics Department, CIPF, Valencia: Ana Conesa, Stefan Goetz
Biomedical Informatics, UPV, Valencia: Juan Miguel Gómez, Montserrat Robles
Centro de Genómica, IVIA, Valencia: Javier Terol, Manuel Talón
Blast2GO special thanks to:
ANNEX: Simen Myhre, Henrik Tveit (NTNU)
GOSSIP: Nils Blüthgen (MicroDiscovery GmbH)
ZVTM: Emmanuel Pietriga (INRIA)
goslim.tair.obo: Suparna Mundodi (TAIR)
Bioinformatics Department, CIPF, Valencia
Motivation
Numerous EST/genome projects
Large amounts of NEW sequence data
Functional Genomics Studies
Need for functional annotation
Which kind of tool?
Easy to set up & run
Versatile & universal
High-throughput & interactive
Combine annotation & function analysis
www.blast2go.org
Gene Ontology based annotation
Molecular Function, Biological Process, Cellular Component
GO hierarchy: from more general to more specific terms
IP2GO
GO2EC
Concepts of automatic annotation
Similarity between sequences
Quality of existing annotation
Precision vs. “recall”
Resolution: level in GO hierarchy
Selection of recovered annotation data
Consistency of assigned annotation
B2G Annotation Rule
Blast2GO Annotation Rule
Annotation Rule: lowest term satisfying the requirements
$$\text{lowest node}\,\big[\max(\mathrm{sim} \cdot EC_w) + (\#GO \cdot GO_w)\big] \geq \text{threshold}$$
Similarity requirement:
$$\mathrm{sim} = \frac{\sum_{hsp} \mathrm{positives}_{hsp}}{\sum_{hsp} \mathrm{alignment\ length}_{hsp}}$$
Possibility of abstraction
Quality of source annotation: EC weight per evidence code
Evidence code   EC weight
IC              1
TAS             1
IDA             1
IMP             0.9
IGI             0.9
IPI             0.9
ISS             0.8
IEP             0.8
NAS             0.7
IEA             0.7
ND              0.5
NR              0.5
RCA             0.5
Recall vs. Precision
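The rule on this slide can be sketched in a few lines of Python. This is an illustrative reading, not Blast2GO's own code. The function names, the example hits, and the values go_weight=5 and threshold=55 are assumptions chosen for the example; only the evidence-code weights come from the slide. Similarity is expressed here as a percentage, and n_go is read as the number of candidate GO terms unified at the node. Picking the lowest qualifying node also requires walking the GO graph, which is left out.

```python
# EC weights per GO evidence code, as listed on the slide.
EC_WEIGHTS = {
    "IC": 1.0, "TAS": 1.0, "IDA": 1.0,
    "IMP": 0.9, "IGI": 0.9, "IPI": 0.9,
    "ISS": 0.8, "IEP": 0.8,
    "NAS": 0.7, "IEA": 0.7,
    "ND": 0.5, "NR": 0.5, "RCA": 0.5,
}


def similarity(hsps):
    """Percent similarity of one BLAST hit.

    hsps is a list of (positives, alignment_length) pairs, one per HSP.
    The result is sum(positives) / sum(alignment lengths), times 100.
    """
    total_positives = sum(positives for positives, _ in hsps)
    total_length = sum(length for _, length in hsps)
    return 100.0 * total_positives / total_length


def annotation_score(support, n_go, go_weight):
    """Compute max(sim * ECw) + (#GO * GOw) for one candidate GO term.

    support: list of (similarity_percent, evidence_code) pairs, one per
             hit that carries this GO term.
    n_go:    number of candidate GO terms unified at this node
             (an assumed reading of '#GO' on the slide).
    """
    best = max(sim * EC_WEIGHTS[code] for sim, code in support)
    return best + n_go * go_weight


def satisfies_rule(support, n_go, go_weight, threshold):
    """True if the candidate term reaches the annotation threshold."""
    return annotation_score(support, n_go, go_weight) >= threshold


if __name__ == "__main__":
    # Hypothetical hits: one with two HSPs, one with a single HSP.
    hit_a = similarity([(80, 100), (40, 100)])
    hit_b = similarity([(90, 100)])
    support = [(hit_a, "IEA"), (hit_b, "ISS")]
    print(annotation_score(support, n_go=2, go_weight=5))
    print(satisfies_rule(support, n_go=2, go_weight=5, threshold=55))
```

Taking the maximum over hits means a single well-supported, highly similar hit can carry a term on its own. The second summand is what lets a parent term pass the threshold when several of its child terms are candidates.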
Main functions within Blast2GO
BLAST, MAPPING, ANNOT. RULE, Manual Curation
Annotation (GO, IPR, EC)
Additional Features:
InterProScan
GO-Slim
GO Second Layer
Graph Visualization
Enrichment Statistics
KEGG maps
Validation
local B2G DB
Pipeline Batch Mode
Compare
custom DB
GeneIDs
Blast2GO use
Species:
Citrus, Nicotiana, maize, soybean, tomato, grape…
Streptococcus, Trichoderma, Schistosoma, Cyanobacteria…
European flounder, pig, fiddler crab, rat, honeybee, human…
Metagenome projects…
Where to find Blast2GO
More info:
Bioinformatics 2005 21: 3674-3676
Blast2GO tutorial: http://www.blast2go.org
Web:
http://www.blast2go.org
http://blast2go.bioinfo.cipf.es
http://www.geneontology.org
http://groups.google.com/group/Blast2GO
EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari
Blast2GO Guided Tour
Ana Conesa
Bioinformatics Department
Centro de Investigaciones Príncipe Felipe
Start Blast2GO
www.blast2go.org
Desktop application
Java webstart technology
Internet connection
Exercise 1
• Launch Blast2GO
• Open FASTA file (unzip examples.zip)
• Browse number of sequences and sequence length
• Unselect all sequences
• Select 5 sequences
• Run Blast against NCBI nr (change parameters if desired)
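Browsing the number of sequences and their lengths is done in the Blast2GO interface, but the same numbers can be cross-checked outside it. The sketch below is a minimal, standard-library-only FASTA reader in Python. The function names are illustrative, and the file name in the usage comment is hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(records):
    """Return the sequence count plus min, max and mean sequence length."""
    lengths = [len(sequence) for _, sequence in records]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Usage with a file on disk (hypothetical file name):
#     with open("example.fasta") as handle:
#         print(length_summary(read_fasta(handle)))
```

Reading the file as a generator keeps memory use low for large EST collections, since only the list of lengths is held in memory.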
Exercise 2
• Open blast_example.dat
• Examine Distribution charts
Mapping
Hit ACC/GI
GO-Terms
EC
sim %
GO mapping resources:
• Full Gene Ontology DB
• NCBI Flat Files: gene2accession (4 079 414 entries), gene_info (1 635 614 entries)
• PIR - Non-Redundant Reference Protein Database: including PSD, UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept and PDB
Annotation
GO
DAG Validation
Annex
GOSlim
EC/KEGG
InterPro
Exercise 3
• Select the first 10 sequences
• Run Mapping and Annotation
• Select non-annotated sequences and re-annotate with milder parameters
• Load annotation_example.dat file
• Visualize results on Mapping/Annotation Charts
Modulation of annotation
Change annotation manually: EC-Codes, Seq. Description, GO-Term ACC
Summarize annotation by “GoSlim”: OBO GO-Slim File
Extend annotation by the GO “Second Layer” (Myhre et al, Bioinformatics 2006)
Second Layer relations between Molecular Function, Biological Process and Cellular Component: “acts in”, “is involved in”
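Summarizing annotation with a GO-Slim amounts to replacing each annotated term by its nearest ancestor that belongs to the slim. The Python sketch below shows that idea on a hand-written toy hierarchy. The parent map is a simplified fragment typed in for illustration, not loaded from a real OBO file, and the slim set and function name are hypothetical. It is not Blast2GO's implementation.

```python
# Toy child -> parents map (simplified, hand-written fragment).
PARENTS = {
    "fatty acid metabolic process": ["lipid metabolic process"],
    "lipid metabolic process": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "biological_process": [],
}

# Hypothetical slim: the subset of terms kept in the summary.
SLIM = {"lipid metabolic process", "biological_process"}


def slim_terms(term, parents, slim):
    """Return the nearest slim terms at or above `term`.

    Each path towards the root is followed until the first slim term
    is met; that term is reported and the path is not followed further.
    """
    if term in slim:
        return {term}
    found = set()
    for parent in parents.get(term, []):
        found |= slim_terms(parent, parents, slim)
    return found


if __name__ == "__main__":
    for annotated in ("fatty acid metabolic process", "metabolic process"):
        print(annotated, "->", sorted(slim_terms(annotated, PARENTS, SLIM)))
```

With several paths to the root, this simple version can return a slim term together with one of its own slim ancestors. A fuller implementation would remove such redundant ancestors afterwards.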
Exercise 4
Browse Blast results (select one sequence and use the sequence menu) to:
• Draw annotation graph
• View annotations
• Edit/change annotation
Select a few sequences to run GoSlim
Run Annex
Enzyme annotation and KEGG maps
GO, Enzyme Codes, KEGG maps
Exercise 5
• Select a few sequences to run InterProScan
• Change terms view GO ID/term, IPS/GO
• Merge IPS results with Blast Annotations
• Load annot_interpro_annex_example.dat
• Export GO Annotations
• Export IPS Annotations
• Save Project and Sequence Table
GO Graph Visualization as a tool to explore data
Interactive and “zoomable” graphs
Color graphs highlighting areas of interest
Node Score of Annotation Content
[Figure: example GO graph with node scores; the score combines the number of annotated sequences (#seq) with a distance term (d)]
Exercise 6
• Select some sequences using the select by names function (use test.example.txt)
• Create a GO Combined Graph
• Create Pies at level 4 and Multilevel Pie at score 3
• Play with filters to simplify the graph (set score filter to 3)
• Export GO Graph data as table and visualize
Functional analysis with B2G
Enrichment Analysis (Fisher)
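As a rough illustration of what a Fisher-type enrichment test computes, the Python sketch below gives the one-sided p-value for a GO term being over-represented in a test set compared with a reference set. It uses only the standard library. The function name, argument names and example counts are assumptions made for this sketch, it is not the code Blast2GO runs, and multiple-testing correction is left out.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact p-value for over-representation.

    k: sequences in the test set annotated with the GO term
    n: size of the test set
    K: sequences annotated with the term in test + reference combined
    N: total number of sequences in test + reference combined

    Returns the hypergeometric probability of seeing k or more
    annotated sequences in a test set of size n.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 8 of 20 test sequences carry the term,
    # against 10 of 100 sequences overall.
    print(fisher_enrichment_p(k=8, n=20, K=10, N=100))
```

In a real analysis the test is repeated for every GO term, so the resulting p-values need a multiple-testing adjustment before they are interpreted.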