Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.


Page 1: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Introduction to the GO: a user’s guide

Iowa State Workshop

11 June 2009

Page 2: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

All workshop materials are available at AgBase.

Page 3: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Genomic Annotation

Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:

1. identifying functional elements in the genome: “structural annotation”

2. attaching biological information to these elements: “functional annotation”

Biologists often use the term “annotation” when they are referring only to structural annotation.

Page 4: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Structural annotation:

[Figure: CHICK_OLF6; DNA annotation and protein annotation, with labels “TRAF 1, 2 and 3” and “TRAF 1 and 2”. Data from Ensembl Genome browser.]

Page 5: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Functional annotation:

[Figure: catenin]

Page 6: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Structural & Functional Annotation

Structural Annotation:
• Open reading frames (ORFs) predicted during genome assembly
• predicted ORFs require experimental confirmation
• the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation

Functional Annotation:
• annotation of gene products = Gene Ontology (GO) annotation
• initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
• functional literature exists for many genes/proteins prior to genome sequencing
• GO annotation does not rely on a completed genome sequence!

Page 7: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

1. Provides structural annotation for agriculturally important genomes

2. Provides functional annotation (GO)

3. Provides tools for functional modeling

4. Provides bioinformatics & modeling support for research community

Page 8: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Introduction to GO

1. pre-GO: managing large datasets

2. Bio-ontologies

3. the Gene Ontology (GO)
• a GO annotation example
• GO evidence codes
• literature biocuration & computational analysis
• ND vs no GO
• sources of GO

Page 9: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

1. pre-GO: managing large datasets

Page 10: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

AgBase User Support
• Functional modeling training
• Database ID mapping (approx. 75% of requests)
• Providing GO annotation for datasets/arrays
• Assistance with GO modeling tools
• Intermediary between the research community and public databases (NCBI, UniProtKB, GO Consortium)
• Computational assistance

Page 11: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Converting database accessions

1. UniProt database

2. Ensembl BioMart

3. Online analysis tools: DAVID, g:profiler, etc.

4. AgBase database: ArrayIDer tool

More information about these tools is available from the online workshop resources.
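Converting accessions is, at its core, a table lookup. The following is a minimal sketch of that idea, assuming a two-column tab-delimited mapping table of the kind an ID mapping tool might export. The probe IDs and function names are hypothetical and made up for illustration; P52505 and P05147 are UniProt accessions that appear later in these slides.

```python
import csv
import io

# Hypothetical mapping table: source ID <tab> UniProt accession.
# The probe IDs are made up for illustration.
MAPPING_TSV = "probe_0001\tP52505\nprobe_0002\tP05147\n"


def load_mapping(handle):
    """Read a two-column tab-delimited mapping table into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def convert_ids(ids, mapping):
    """Return (converted pairs, IDs that could not be mapped)."""
    converted, unmapped = [], []
    for identifier in ids:
        if identifier in mapping:
            converted.append((identifier, mapping[identifier]))
        else:
            unmapped.append(identifier)
    return converted, unmapped


mapping = load_mapping(io.StringIO(MAPPING_TSV))
converted, unmapped = convert_ids(["probe_0001", "probe_0003"], mapping)
print(converted)
print(unmapped)
```

Keeping the unmapped IDs in a separate list makes it easy to see which accessions still need to be converted with another tool.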

Page 12: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

1. UniProt ID Mapping

Page 13: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

2. Ensembl BioMart

NOTE: Ensembl is scheduled to add plant & microbe species in 2009.

Page 14: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

3. Online analysis tools

g:profiler conversion tool
http://biit.cs.ut.ee/gprofiler/gconvert.cgi

This tool works for all species found in Ensembl.

Page 15: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

3. Online analysis tools

Database for Annotation, Visualization and Integrated Discovery (DAVID)
http://david.abcc.ncifcrf.gov/conversion.jsp

This tool works for a wide range of species.

Page 16: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

4. AgBase: ArrayIDer

Contact AgBase to request additional species.

Page 17: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
Page 18: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

2. Bio-ontologies

Page 19: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Bio-ontologies

Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
• necessary for high-throughput “omics” datasets
• allows data sharing across databases

Objects in an ontology (eg. genes, cell types, tissue types, stages of development) are well defined.

The ontology shows how the objects relate to each other.

Page 20: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Bio-ontologies: http://www.obofoundry.org/

Page 21: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Ontologies

digital identifier (computers)

description (humans)

relationships between terms
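The three parts of an ontology term can be sketched as a small data structure. This is an illustration only: the class and function names are hypothetical, the GO IDs are taken from the GO example later in these slides, and the two relationships are entered by hand as assumptions rather than loaded from an ontology file.

```python
from dataclasses import dataclass


@dataclass
class Term:
    term_id: str          # digital identifier (for computers)
    name: str             # description (for humans)
    parents: tuple = ()   # relationships: (relation type, parent ID) pairs


# Hand-entered illustration, not loaded from a real ontology file.
ONTOLOGY = {
    "GO:0008610": Term("GO:0008610", "lipid biosynthetic process"),
    "GO:0006633": Term("GO:0006633", "fatty acid biosynthetic process",
                       (("is_a", "GO:0008610"),)),
    "GO:0005739": Term("GO:0005739", "mitochondrion"),
    "GO:0005759": Term("GO:0005759", "mitochondrial matrix",
                       (("part_of", "GO:0005739"),)),
}


def ancestors(term_id, ontology):
    """Follow every relationship upward and collect the parent IDs."""
    found = []
    stack = [term_id]
    while stack:
        current = stack.pop()
        for _relation, parent_id in ontology[current].parents:
            if parent_id not in found:
                found.append(parent_id)
                stack.append(parent_id)
    return found


print(ancestors("GO:0006633", ONTOLOGY))
```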

Page 22: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

3. The Gene Ontology

Page 23: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Functional Annotation
• Gene Ontology (GO) is the de facto method for functional annotation
• Widely used for functional genomics (high throughput)
• Many tools available for gene expression analysis using GO
• The GO Consortium homepage: http://www.geneontology.org

Page 24: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

GO Mapping Example

NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
GO:0006633 fatty acid biosynthetic process TAS
GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
GO:0008610 lipid biosynthetic process IEA

Cellular Component (CC or C)
GO:0005759 mitochondrial matrix IDA
GO:0005747 mitochondrial respiratory chain complex I IDA
GO:0005739 mitochondrion IEA

Molecular Function (MF or F)
GO:0005504 fatty acid binding IDA
GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
GO:0016491 oxidoreductase activity TAS
GO:0000036 acyl carrier activity IEA

Page 25: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

GO Mapping Example

Each annotation in the example has four parts:
• aspect or ontology
• GO:ID (unique)
• GO term name
• GO evidence code
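The four parts can be pulled out of an annotation line programmatically. The sketch below assumes the layout used in the NDUFAB1 example (GO:ID, then term name, then evidence code on one line, with the aspect given by the heading above it). The function name and the regular expression are illustrative choices, not part of any GO tool.

```python
import re

# GO:ID, then the term name, then an evidence code of two or three capitals.
ANNOTATION = re.compile(r"^(GO:\d{7})\s+(.+?)\s+([A-Z]{2,3})$")


def parse_annotation(aspect, line):
    """Split one example line into aspect, GO:ID, term name, evidence code."""
    match = ANNOTATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a GO annotation line: {line!r}")
    go_id, term_name, evidence = match.groups()
    return {"aspect": aspect, "go_id": go_id,
            "term": term_name, "evidence": evidence}


record = parse_annotation(
    "P", "GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS")
print(record)
```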

Page 26: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

GO Mapping Example

GO EVIDENCE CODES

Direct Evidence Codes
IDA - inferred from direct assay
IEP - inferred from expression pattern
IGI - inferred from genetic interaction
IMP - inferred from mutant phenotype
IPI - inferred from physical interaction

Indirect Evidence Codes

inferred from literature
IGC - inferred from genomic context
TAS - traceable author statement
NAS - non-traceable author statement
IC - inferred by curator

inferred by sequence analysis
RCA - inferred from reviewed computational analysis
IS* - inferred from sequence*
IEA - inferred from electronic annotation

Other
NR - not recorded (historical)
ND - no biological data available

*Inferred from sequence codes
ISS - inferred from sequence or structural similarity
ISA - inferred from sequence alignment
ISO - inferred from sequence orthology
ISM - inferred from sequence model
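To see how a protein's annotations split between direct and indirect evidence, the grouping on this slide can be written as a lookup table. This is a sketch: the group labels and function name are chosen here for illustration, and the grouping simply follows the slide. The input list is the ten evidence codes from the NDUFAB1 example.

```python
from collections import Counter

# Grouping as laid out on the slide; the label strings are chosen here.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "indirect (literature)": {"IGC", "TAS", "NAS", "IC"},
    "indirect (sequence analysis)": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the group label for an evidence code."""
    for label, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return label
    raise KeyError(f"unknown evidence code: {code}")


# Evidence codes of the NDUFAB1 example, in slide order (BP, CC, MF).
ndufab1_codes = ["TAS", "TAS", "IEA", "IDA", "IDA", "IEA",
                 "IDA", "TAS", "TAS", "IEA"]
counts = Counter(evidence_group(code) for code in ndufab1_codes)
print(counts)
```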

Page 27: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

GO Mapping Example

Biocuration of literature
• detailed function
• “depth”
• slower (manual)

Page 28: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Biocuration of Literature: detailed gene function

Find a paper about the protein.

P05147

PMID: 2976880

Page 29: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Read paper to get experimental evidence of function

Use most specific term possible

experiment assayed kinase activity: use IDA evidence code

Page 30: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

GO Mapping Example

Biocuration of literature
• detailed function
• “depth”
• slower (manual)

Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed

Page 31: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Computational GO annotation (“breadth”)

Ranjit Kumar

ISO PIPELINE
• accessions from your species (species 1)
• public orthology prediction tool(s)
• 1:1 orthologs
• existing GO annotations
• transfer GO annotation to your species (ISO)
• ga file
• accessions with no ISO

IEA PIPELINE
• fasta file of sequences (aa or nt)
• InterPro analysis (domains/motifs)
• domains/motifs in sequence
• GO2InterPro mapping file
• assign GO (IEA)
• no GO: “ND”
• ga file

(integrate output into one ga file)
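The two pipelines can be sketched in a few lines to show how the pieces fit together. This is a rough illustration under stated assumptions, not the actual pipeline code: all accessions, the domain ID and the function names are hypothetical, the ortholog and domain tables are passed in as plain dictionaries, and recording “ND” against the three root GO terms is an assumption made here.

```python
# Root terms for the three ontologies, used here for "ND" rows (assumption).
ROOT_TERMS = ("GO:0008150", "GO:0003674", "GO:0005575")


def iso_pipeline(accessions, orthologs, existing_go):
    """Transfer GO from 1:1 orthologs; return (rows, accessions with no ISO)."""
    rows, no_iso = [], []
    for acc in accessions:
        terms = existing_go.get(orthologs.get(acc), [])
        if terms:
            rows.extend((acc, go_id, "ISO") for go_id in terms)
        else:
            no_iso.append(acc)
    return rows, no_iso


def iea_pipeline(accessions, domains, domain2go):
    """Assign GO from domains/motifs via a mapping table; 'ND' if none."""
    rows = []
    for acc in accessions:
        go_ids = sorted({go_id
                         for domain in domains.get(acc, [])
                         for go_id in domain2go.get(domain, [])})
        if go_ids:
            rows.extend((acc, go_id, "IEA") for go_id in go_ids)
        else:
            rows.extend((acc, root, "ND") for root in ROOT_TERMS)
    return rows


# Hypothetical inputs, made up for illustration.
orthologs = {"SP1_A": "SP2_A"}                    # species 1 -> species 2
existing_go = {"SP2_A": ["GO:0005739"]}           # existing GO annotations
domains = {"SP1_B": ["DOMAIN_X"]}                 # domain analysis output
domain2go = {"DOMAIN_X": ["GO:0016491"]}          # domain-to-GO mapping file

iso_rows, no_iso = iso_pipeline(["SP1_A", "SP1_B", "SP1_C"],
                                orthologs, existing_go)
iea_rows = iea_pipeline(no_iso, domains, domain2go)
ga_rows = iso_rows + iea_rows                     # integrate into one output
for row in ga_rows:
    print("\t".join(row))
```

Only accessions that received no ISO annotation are passed on to the IEA step, which mirrors the order suggested by the flowchart labels.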

Page 32: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Unknown Function vs No GO

ND – no data
• Biocurators have tried to add GO but there is no functional data available
• Previously: “process_unknown”, “function_unknown”, “component_unknown”
• Now: “biological process”, “molecular function”, “cellular component”

No annotations (including no “ND”): biocurators have not annotated
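The distinction matters when summarising a dataset, because “ND” and “no annotation” should be counted separately. A minimal sketch, assuming annotations are held as a dictionary from accession to a list of evidence codes; the function name and the accessions X00001 and X00002 are hypothetical.

```python
def annotation_status(accession, annotations):
    """Classify an accession as annotated, ND only, or not annotated."""
    codes = annotations.get(accession)
    if not codes:
        return "not annotated"      # biocurators have not looked at it yet
    if all(code == "ND" for code in codes):
        return "no data (ND)"       # looked at, but no functional data found
    return "annotated"


annotations = {
    "P52505": ["TAS", "IDA", "IEA"],   # codes seen in the NDUFAB1 example
    "X00001": ["ND", "ND", "ND"],      # hypothetical accession
}

for accession in ("P52505", "X00001", "X00002"):
    print(accession, annotation_status(accession, annotations))
```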

Page 33: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

1. Primary sources of GO: from the GO Consortium (GOC) & GOC members
• most up to date
• most comprehensive

2. Secondary sources: other resources that use GO provided by GOC members
• public databases (eg. NCBI, UniProtKB)
• genome browsers (eg. Ensembl)
• array vendors (eg. Affymetrix)
• GO expression analysis tools

Page 34: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Different tools and databases display the GO annotations differently.

Since GO terms are continually changing and GO annotations are continually added, you need to know when GO annotations were last updated.

Page 35: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

Secondary Sources of GO annotation

EXAMPLES:
• public databases (eg. NCBI, UniProtKB)
• genome browsers (eg. Ensembl)
• array vendors (eg. Affymetrix)

CONSIDERATIONS:
• What is the original source?
• When was it last updated?
• Are evidence codes displayed?
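When the annotations are available as a gene association (ga) file, two of these considerations can be checked directly. The sketch below assumes the tab-delimited gene association layout, with the evidence code in column 7 and the annotation date (YYYYMMDD) in column 14; check the format documentation for the file you actually have. The sample rows, references, dates and the "EXAMPLE" source are made up for illustration.

```python
from datetime import datetime


def ga_line(go_id, evidence, aspect, date):
    """Build one made-up 15-column gene association row for the demo."""
    cols = ["UniProtKB", "P52505", "NDUFAB1", "", go_id, "PMID:0000000",
            evidence, "", aspect, "", "", "protein", "taxon:9913",
            date, "EXAMPLE"]
    return "\t".join(cols)


GA_LINES = [
    "! example comment line",
    ga_line("GO:0005739", "IEA", "C", "20090101"),
    ga_line("GO:0005759", "IDA", "C", "20090501"),
]


def summarize(lines):
    """Return (set of evidence codes, most recent annotation date)."""
    codes, latest = set(), None
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue                      # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        codes.add(cols[6])                # column 7: evidence code
        stamp = datetime.strptime(cols[13], "%Y%m%d").date()  # column 14
        if latest is None or stamp > latest:
            latest = stamp
    return codes, latest


codes, latest = summarize(GA_LINES)
print(sorted(codes), latest)
```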

Page 36: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.
Page 37: Introduction to the GO: a user’s guide Iowa State Workshop 11 June 2009.

For more information about GO

GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml

gene association file information: http://www.geneontology.org/GO.format.annotation.shtml

tools that use the GO: http://www.geneontology.org/GO.tools.shtml

GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page

All websites are available from the workshop website & handout.