EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari

High throughput functional annotation and analysis with the Blast2GO suite

Ana Conesa

Bioinformatics Department

Centro de Investigaciones Príncipe Felipe

[email protected]

EMBRACE Workshop, 7th – 9th November 2007, Bari

Credits

Blast2GO Development:

Bioinformatics Department, CIPF, Valencia: Ana Conesa, Stefan Goetz
Biomedical Informatics, UPV, Valencia: Juan Miguel Gómez, Montserrat Robles
Centro de Genómica, IVIA, Valencia: Javier Terol, Manuel Talón

Blast2GO special thanks to:

ANNEX: Simen Myhre, Henrik Tveit (NTNU)
GOSSIP: Nils Blüthgen (MicroDiscovery GmbH)
ZVTM: Emmanuel Pietriga (INRIA)
goslim.tair.obo: Suparna Mundodi (TAIR)
Bioinformatics Department, CIPF, Valencia

Motivation

Numerous EST/genome projects

Large amounts of NEW sequence data

Functional Genomics Studies

Need for Functional Annotation

Which kind of tool?

Easy to set up & run
Versatile & universal
High-throughput & interactive
Combine annotation & function analysis

www.blast2go.org

Gene Ontology based annotation

more general

more specific

Molecular Function
Biological Process
Cellular Component

IP2GO

GO2EC

Concepts of automatic annotation

Similarity between sequences
Quality of existing annotation
Precision vs. “recall”
Resolution
Level in GO hierarchy
Selection of recovered annotation data
B2G Annotation Rule
Consistency of assigned annotation

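Blast2GO Annotation Rule

$$\mathrm{lowest.node}\,\big[\,\max(\mathrm{sim} \cdot EC_w) + (\#GO \cdot GO_w)\,\big] \geq \mathrm{threshold}$$

lowest.node: lowest term satisfying the requirements

sim: similarity requirement

$$\mathrm{sim} = \frac{\sum_{hsp} \mathrm{positives}_{hsp}}{\sum_{hsp} \mathrm{alignment\ length}_{hsp}}$$

EC weight (EC_w): quality of source annotation (Evidence Codes)

Evidence code   EC weight
IC              1
TAS             1
IDA             1
IMP             0.9
IGI             0.9
IPI             0.9
ISS             0.8
IEP             0.8
NAS             0.7
IEA             0.7
ND              0.5
NR              0.5
RCA             0.5

GO weight (GO_w): possibility of abstraction

threshold: recall vs. precision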
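The rule above can be sketched in a few lines of Python. This is only an illustration of the formula as written on the slide, not Blast2GO's actual code: the function names, the percent scaling of sim, the meaning given to `n_go` (number of candidate GO terms unified at the node), and the default values for `go_weight` and `threshold` are all assumptions made for the example.

```python
# Evidence-code weights exactly as listed on the slide.
EC_WEIGHTS = {
    "IC": 1.0, "TAS": 1.0, "IDA": 1.0,
    "IMP": 0.9, "IGI": 0.9, "IPI": 0.9,
    "ISS": 0.8, "IEP": 0.8,
    "NAS": 0.7, "IEA": 0.7,
    "ND": 0.5, "NR": 0.5, "RCA": 0.5,
}


def hsp_similarity(hsps):
    """Similarity of one hit: sum of positives / sum of alignment lengths.

    hsps is a list of (positives, alignment_length) pairs.
    The result is scaled to percent (an assumption of this sketch).
    """
    total_positives = sum(positives for positives, _ in hsps)
    total_length = sum(length for _, length in hsps)
    return 100.0 * total_positives / total_length


def annotation_score(hits, n_go, go_weight):
    """Score of one candidate GO term: max(sim * EC_w) + (#GO * GO_w).

    hits is a list of (similarity_percent, evidence_code) pairs for the
    hits that carry this term; n_go is the assumed count of candidate
    terms unified at the node.
    """
    direct = max(sim * EC_WEIGHTS[code] for sim, code in hits)
    abstraction = n_go * go_weight
    return direct + abstraction


def passes_rule(hits, n_go, go_weight=5.0, threshold=55.0):
    """True if the candidate term reaches the threshold (illustrative defaults)."""
    return annotation_score(hits, n_go, go_weight) >= threshold


hits = [(80.0, "IEA"), (60.0, "IDA")]
print(annotation_score(hits, n_go=2, go_weight=5.0))
print(passes_rule([(70.0, "IEA")], n_go=0))
```

Raising the threshold or lowering the evidence-code weights makes the rule stricter (more precision, less recall); a larger GO weight makes annotation to a shared parent term more likely. The search for the lowest node in the GO hierarchy is not covered here.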

Main functions within Blast2GO

BLAST → MAPPING → ANNOT. RULE → Manual Curation

Annotation (GO, IPR, EC)

InterProScan
GO-Slim
GO Second Layer
Validation
Graph Visualization
Enrichment
Statistics
KEGG maps
Compare

Additional Features:

local B2G DB
custom DB
Pipeline / Batch Mode
GeneIDs

Blast2GO Schema

Blast2GO use

Species

Citrus, Nicotiana, maize, soybean, tomato, grape…
Streptococcus, Trichoderma, Schistosoma, Cyanobacteria…
European flounder, pig, fiddler crab, rat, honeybee, human…
Metagenome projects…

Where to find Blast2GO

More info:

Bioinformatics 2005 21: 3674-3676
Blast2GO tutorial: http://www.blast2go.org

Web:

http://www.blast2go.org
http://blast2go.bioinfo.cipf.es
http://www.geneontology.org
http://groups.google.com/group/Blast2GO

EMBRACE Gene Ontology Workshop, 7th – 9th November 2007 Bari

Blast2GO Guided Tour

Ana Conesa

Bioinformatics Department

Centro de Investigaciones Príncipe Felipe

[email protected]

Start Blast2GO

www.blast2go.org

Desktop application

Java Web Start technology

Internet connection

Load Sequences

Run BLAST search

BLAST results

Blast Distribution Charts

Exercise 1

Launch Blast2GO
Open FASTA file (unzip examples.zip)
Browse number of sequences and sequence length
Unselect all sequences
Select 5 sequences
Run Blast against NCBI nr (change parameters if desired)
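For checking the number of sequences and their lengths outside the Blast2GO interface, a minimal FASTA reader is enough. The sketch below is not part of Blast2GO; the function names and the two example records are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(records):
    """Return count, minimum, maximum and mean sequence length."""
    lengths = [len(sequence) for _, sequence in records]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Two made-up records; for a real file pass open("your_file.fasta") instead.
example = """>seq1 hypothetical example
ATGGCC
TTGA
>seq2
ATG
""".splitlines()

stats = summarize(list(read_fasta(example)))
print(stats)
```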

Exercise 2

Open blast_example.dat
Examine Distribution charts

Mapping

Mapping Resources

Hit ACC/GI
GO-Terms
EC
sim %

GO mapping resources:
• Full Gene Ontology DB
• NCBI Flat Files: gene2accession (4 079 414 entries), gene_info (1 635 614 entries)
• PIR - Non-Redundant Reference Protein Database: including PSD, UniProt, Swiss-Prot, TrEMBL, RefSeq, GenPept and PDB
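Conceptually, mapping is a chain of lookups: from the accession of each BLAST hit to a gene, and from the gene to its GO terms with their evidence codes. The sketch below uses small in-memory dictionaries with invented identifiers to stand in for the resources listed above; it does not reproduce the real file formats or Blast2GO's database schema.

```python
# Hypothetical lookup tables (all identifiers are invented for illustration).
ACCESSION_TO_GENE = {
    "ACC_0001": "gene_A",
    "ACC_0002": "gene_B",
}
GENE_TO_GO = {
    "gene_A": [("GO:term_1", "IDA"), ("GO:term_2", "IEA")],
    "gene_B": [("GO:term_2", "ISS")],
}


def map_hits(hits):
    """Collect candidate GO terms for one query sequence.

    hits is a list of (accession, similarity_percent) pairs.
    Returns a dict: GO term -> list of (similarity_percent, evidence_code).
    """
    candidates = {}
    for accession, sim in hits:
        gene = ACCESSION_TO_GENE.get(accession)
        if gene is None:
            continue  # hit without a mapping is skipped
        for term, code in GENE_TO_GO.get(gene, []):
            candidates.setdefault(term, []).append((sim, code))
    return candidates


result = map_hits([("ACC_0001", 90.0), ("ACC_0002", 75.0), ("ACC_9999", 99.0)])
print(result)
```

Each candidate term keeps the hit similarity and evidence code, which are the inputs the annotation step needs.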

Annotation

GO

DAG Validation

Annex

GOSlim

EC/KEGG

InterPro

Gene Ontology annotation

Annotation Charts

Exercise 3

Select the first 10 sequences
Run Mapping and Annotation
Select non-annotated sequences and re-annotate with milder parameters
Load annotation_example.dat file
Visualize Results on Mapping/Annotation Charts

Sequence menu

Modulation of annotation

Change annotation manually
EC-Codes
Seq. Description
GO-Term ACC

Summarize annotation by “GoSlim”
OBO GO-Slim File

Extend annotation by the GO “Second Layer” (Myhre et al, Bioinformatics 2006)
Molecular Function is involved in Biological Process
Molecular Function acts in Cellular Component
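Summarizing by GO-Slim amounts to replacing each annotated term with its nearest ancestors that belong to the slim subset. The sketch below shows the idea on a toy hierarchy; the term names, the `PARENTS` table and the `SLIM` set are invented for the example and are not read from an OBO file.

```python
from collections import deque

# Toy hierarchy: each term lists its direct parents (invented for illustration).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "lipid metabolism": ["metabolism"],
    "fatty acid synthesis": ["lipid metabolism"],
    "ion transport": ["transport"],
}
# Hypothetical slim subset.
SLIM = {"metabolism", "transport"}


def slim_terms(term, parents=PARENTS, slim=SLIM):
    """Walk up from term and return the nearest slim term on each path."""
    found, seen, queue = set(), {term}, deque([term])
    while queue:
        current = queue.popleft()
        if current in slim:
            found.add(current)
            continue  # stop climbing once a slim term is reached
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


def slim_annotation(terms):
    """Summarize a list of annotated terms as a sorted list of slim terms."""
    summary = set()
    for term in terms:
        summary |= slim_terms(term)
    return sorted(summary)


print(slim_annotation(["fatty acid synthesis", "ion transport", "lipid metabolism"]))
```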

Exercise 4

Browse Blast Results
Select one sequence and use the sequence menu to:
Draw annotation graph
View Annotations
Edit/change annotation
Select a few sequences to run GoSlim
Run Annex

Enzyme annotation and KEGG Maps

GO → Enzyme Codes → KEGG maps

InterProScan

Exercise 5

Select a few sequences to run InterProScan
Change terms view GO ID/term, IPS/GO
Merge IPS results with Blast Annotations
Load annot_interpro_annex_example.dat
Export GO Annotations
Export IPS Annotations
Save Project and Sequence Table

Visualization

GO Graph Visualization as tool to explore data
Interactive and “zoomable” graphs
Color graphs highlighting areas of interest

Node Score of Annotation Content

$$\mathrm{score} = \sum \#\mathrm{seq} \cdot \alpha^{\mathrm{dist}}$$

(Figure: example GO graph with node scores.)
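A node score of this kind can be sketched as follows, assuming the score of a node is the sum, over the node and all terms below it, of the number of sequences annotated to each term multiplied by a factor α raised to the distance from the node. The toy graph, the sequence counts, the use of shortest-path distance, and the value of `alpha` are all assumptions of this example rather than Blast2GO's implementation.

```python
from collections import deque

# Toy graph: each term lists its direct children (invented for illustration).
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
# Number of sequences annotated directly to each term (made-up counts).
DIRECT_SEQS = {"A": 1, "B": 3, "C": 0, "D": 2}


def node_score(node, children, direct_seqs, alpha=0.6):
    """Sum of #seq * alpha**dist over the node and all terms below it.

    dist is the shortest number of edges from node to the term
    (breadth-first search); the node itself has dist 0.
    """
    score, seen, queue = 0.0, {node}, deque([(node, 0)])
    while queue:
        term, dist = queue.popleft()
        score += direct_seqs.get(term, 0) * alpha ** dist
        for child in children.get(term, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, dist + 1))
    return score


for term in CHILDREN:
    print(term, node_score(term, CHILDREN, DIRECT_SEQS, alpha=0.5))
```

With α below 1, annotations far below a node contribute less to its score, so filtering the graph by score keeps the nodes where the annotation content is concentrated.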

Level and Multilevel Charts

Visualization: Pies

Exercise 6

Select some sequences using select by names function (use test.example.txt)
Create a GO Combined Graph
Create Pies at level 4 and Multilevel Pie at score 3
Play with filters to simplify the graph (set score filter to 3)
Export GO Graph data as table and visualize

Functional analysis with B2G

Enrichment Analysis (Fisher)
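For a single GO term, the Fisher test asks whether the term occurs in the test set more often than expected given the reference set. The sketch below computes a one-sided Fisher's exact p-value with the standard library only; it assumes the test and reference sets are disjoint, uses made-up counts, and leaves out the multiple-testing correction that a real enrichment analysis over many terms needs. It is not the code Blast2GO runs.

```python
from math import comb


def fisher_enrichment_p(test_with, test_total, ref_with, ref_total):
    """One-sided Fisher's exact p-value for over-representation in the test set.

    test_with / ref_with: sequences annotated with the term in each set.
    test_total / ref_total: total sequences in each set (sets assumed disjoint).
    """
    with_term = test_with + ref_with
    total = test_total + ref_total
    denominator = comb(total, test_total)
    upper = min(with_term, test_total)
    # Probability of seeing test_with or more annotated sequences in the test set.
    numerator = sum(
        comb(with_term, k) * comb(total - with_term, test_total - k)
        for k in range(test_with, upper + 1)
    )
    return numerator / denominator


# Made-up counts: 8 of 10 test sequences carry the term, 2 of 10 reference sequences.
p_value = fisher_enrichment_p(8, 10, 2, 10)
print(p_value)
```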

Exercise 7

Run Enrichment Analysis using test and reference set files
Create Bar Chart
Create Enriched Graph and modulate number of nodes
Export results