Making GO Annotations For Fungal Genomes A brief overview.


Page 1: Making GO Annotations For Fungal Genomes A brief overview.

Making GO Annotations For Fungal Genomes

A brief overview

Page 2: Making GO Annotations For Fungal Genomes A brief overview.

Outline of Topics

• Intro

• Overview of Overall Annotation Pipeline

• Introduction to the Gene Ontologies (GO)

• Making GO Annotations

• Submitting GO Annotations

• GO Tool - AmiGO

Page 3: Making GO Annotations For Fungal Genomes A brief overview.

Intro & Overview of Overall Sequence and Annotation Pipeline

Karen Christie

Saccharomyces Genome Databases

Stanford University

Page 4: Making GO Annotations For Fungal Genomes A brief overview.

[Figure: Total Nucleotides at GenBank/EMBL/DDBJ, including Whole Genome Shotgun. Log-scale axis from 1.00E+05 to 1.00E+12, plotted yearly from Dec-82 to Dec-06. Dec 2006: 1.52E+11. Growth in 2006: 3.50 x 10^10 nucs, 23.2% of total. Marked events: NCBI created by Congress; EBI created at Hinxton; WGS section started. Marked genomes: Haemophilus influenzae, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Homo sapiens.]

Page 5: Making GO Annotations For Fungal Genomes A brief overview.

Fungal Genomes being sequenced at Broad Institute

Page 6: Making GO Annotations For Fungal Genomes A brief overview.

Fungal Genomes being sequenced by JGI

Page 7: Making GO Annotations For Fungal Genomes A brief overview.

Published Literature

PubMed: over 15 million citations

Basic search: secondary metabolism → 109580

Limit search: secondary metabolism (published in the last 1 year) → 5440

Boolean operators: secondary metabolism AND Aspergillus → 479

Numbers as of 3/21/2007

Page 8: Making GO Annotations For Fungal Genomes A brief overview.

Gene Ontology Objectives

• GO represents categories used to classify specific parts of our biological knowledge:
– Biological Process
– Molecular Function
– Cellular Component

• GO develops a common language applicable to any organism

• GO terms can be used to annotate gene products from any species, allowing comparison of information across species

Page 9: Making GO Annotations For Fungal Genomes A brief overview.

My genome is sequenced!

ATGTCTTTTTTAAGTGCATCGATGTCCTGGGGGCTTAGTATAATGCTCCCCGAGCTTCCTAGCGCTTAGTGCATTAGACTAGGGCCAAAATGACTACTGTTCTTAAAGTACTAGTACTTACTACGCCCTGTTTCTTTCTTCTTCTAAAAGACTAACTAAGTGCTAGTCTAGATCTACTATTACTACCCTACCTACTATACTAGACTAATTACCAACCCCTAGGGTACTAAATTTGCCTAGTTTACGTAGCGTTCTTAAAACGTACTAGATTACCGTACTAGGGACGTACTAAGGTACTAG…

What do I do now?

Page 10: Making GO Annotations For Fungal Genomes A brief overview.

Overview of Sequencing/Annotation Pipeline

• Sequence of genes/genome

ATGCTTCCTGATTTTGCCCTGGACTTCGCTTGTATAAATTCATTGCACC…

• Primary Annotation - the location and structure of genes

• Secondary Annotation - the functions of the genes

GO process: terrequinone A biosynthesis

GO function: methyltransferase activity

Enzyme Commission: 2.1.1.-

alcohol dehydrogenase

Page 11: Making GO Annotations For Fungal Genomes A brief overview.

Who will be annotating?

• Just you?

• A single group?

• A consortium of groups?

The number of people and groups participating and the funding will affect some decisions on whether to set up a database or use flatfiles.

Page 12: Making GO Annotations For Fungal Genomes A brief overview.

Do you (or your group) have gene calls for your sequence?
• no → Make automated or manual gene calls (TIGR’s Eukaryotic Annotation course very useful)
• yes → Are the protein predictions submitted to GenBank/DDBJ/EMBL?
– no → Submit gene/protein calls to GenBank/DDBJ/EMBL
– yes → GOA will make GO annotations (IEA) using automated methods (UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL)

Resources to make functional annotations?
• no →
– GOA will collect all GO annotations and submit them to GOC
– GOA will maintain annotation file
• yes →
– Contact GO Consortium for advice, training, help with coordination, etc.
– Set up pipeline for any automated annotations not being done by GOA
– Manual GO annotations from literature, or from sequence similarity methods
– Decide who will collate all GO annotations into one file
– You (or your group) collect all GO annotations and submit them to GOC
– You (or your group) maintain annotation file

Page 13: Making GO Annotations For Fungal Genomes A brief overview.

Automated Eukaryotic Gene Annotation

Genome Sequence
→ Repeat masker
→ Repeat masked sequence
→ Gene finders (Twinscan, GeneZilla, glimmerHMM, Augustus, Fgenesh, etc.; develop a training set) and Database comparisons (AAT_aa, AAT_na, tRNA Scan, GMAP, Sim4, etc.)
→ Gene predictions and Genome alignments
→ Combined consensus prediction
→ EST based refinement, using an EST Database (adjust exons, UTRs, alternative splicing)
→ Automated Gene Annotation

Based on TIGR course

Page 14: Making GO Annotations For Fungal Genomes A brief overview.

Manual Gene Annotation?

1st Question - Is it in the budget?

Manual annotation can be a lot better than automated, but is a lot more expensive and time consuming!

Based on TIGR Eukaryotic Annotation course

Page 15: Making GO Annotations For Fungal Genomes A brief overview.

Manual Gene Annotation Tools

• Viewer only
– Gbrowse

• Editors
– Apollo (requires a database)
– Manatee (requires a database)
– Artemis (runs on flat files)

Based on TIGR Eukaryotic Annotation course

Page 16: Making GO Annotations For Fungal Genomes A brief overview.

Eukaryotic Gene Annotation

At the end of the procedure, you’ll have:
• Gene calls
• Protein predictions
• Unique IDs for your genes

This last is important. Gene IDs are unambiguous. Gene names are frequently ambiguous. You’ll also need IDs in order to submit GO annotations.

Example:

Gene Name: SP1 → 19242 hits in Entrez nucleotide

Gene ID: NM_138473 → 1 hit

Page 17: Making GO Annotations For Fungal Genomes A brief overview.

Ready to make Functional Annotations!

• Questions
– What’s your budget?
– How much literature is available?

• Automated annotations
– Faster, cheaper
– Often less specific

• Manual annotations
– Time consuming & more expensive
– Precise and accurate

Page 18: Making GO Annotations For Fungal Genomes A brief overview.

Do you (or your group) have gene calls for your sequence?
• no → Make automated or manual gene calls (TIGR’s Eukaryotic Annotation course very useful)
• yes → Are the protein predictions submitted to GenBank/DDBJ/EMBL?
– no → Submit gene/protein calls to GenBank/DDBJ/EMBL
– yes → GOA will make GO annotations (IEA) using automated methods (UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL)

Resources to make functional annotations?
• yes →
– Contact GO Consortium for advice, training, help with coordination, etc.
– Set up pipeline for any automated annotations not being done by GOA
– Manual GO annotations from literature, or from sequence similarity methods
– Decide who will collate all GO annotations into one file

Page 19: Making GO Annotations For Fungal Genomes A brief overview.

Introduction to GO

Rama Balakrishnan

Saccharomyces Genome Database

Stanford University, CA

Page 20: Making GO Annotations For Fungal Genomes A brief overview.

The Gene Ontologies

A Common Language for Annotation of Genes from Yeast, Flies and Mice

…and Plants and Worms

…and Humans

…and anything else!

Page 21: Making GO Annotations For Fungal Genomes A brief overview.

http://www.geneontology.org/

Page 22: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

• What is a cell?

Page 23: Making GO Annotations For Fungal Genomes A brief overview.

Cell

Page 24: Making GO Annotations For Fungal Genomes A brief overview.

Cell

Page 25: Making GO Annotations For Fungal Genomes A brief overview.

Cell

Page 26: Making GO Annotations For Fungal Genomes A brief overview.

Cell

Page 27: Making GO Annotations For Fungal Genomes A brief overview.

Cell

Image from http://microscopy.fsu.edu

Page 28: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

• The same name can be used to describe different concepts

Page 29: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

Page 30: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis

• All refer to the process of making glucose from simpler components

Page 31: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

• The same name can be used to describe different concepts

• A concept can be described using different names

Comparison is difficult – in particular across species or across databases

Page 32: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a name?

• Rad54 (S. cerevisiae)
• Okra (D. melanogaster)
• Rhp54 (S. pombe)

What do these gene products have in common?

ATP-dependent helicase involved in DNA recombination, repair

Page 33: Making GO Annotations For Fungal Genomes A brief overview.

What is the Gene Ontology?

A (part of the) solution:
- A controlled vocabulary that can be applied to all organisms
- Used to describe gene products - proteins and RNA - in any organism

Page 34: Making GO Annotations For Fungal Genomes A brief overview.

What is Ontology?

• Dictionary: A branch of metaphysics concerned with the nature and relations of being.

• Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

1606 1700s

Page 35: Making GO Annotations For Fungal Genomes A brief overview.

So what does that mean?

From a practical view, an ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.

Page 36: Making GO Annotations For Fungal Genomes A brief overview.

Ontology

Includes:

1. A vocabulary of terms (names for concepts)

2. Definitions

3. Defined logical relationships to each other

Page 37: Making GO Annotations For Fungal Genomes A brief overview.

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
– Molecular Function

• Why does it perform these activities?
– Process

• Where does it act?
– Location in the cell, cellular component

Page 38: Making GO Annotations For Fungal Genomes A brief overview.

The 3 Gene Ontologies

• Molecular Function = elemental activity/task
– the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective
– broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex
– subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Page 39: Making GO Annotations For Fungal Genomes A brief overview.

Cellular Component

where a gene product acts

Page 40: Making GO Annotations For Fungal Genomes A brief overview.

Molecular Function

activities or “jobs” of a gene product

glucose-6-phosphate isomerase activity

insulin binding

insulin receptor activity

drug transporter activity

Page 41: Making GO Annotations For Fungal Genomes A brief overview.

Molecular Function

• A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.

• Sets of functions make up a biological process.

Page 42: Making GO Annotations For Fungal Genomes A brief overview.

Biological Process

cell division

transcription

limb development

courtship behavior

Page 43: Making GO Annotations For Fungal Genomes A brief overview.

Example: Gene Product = hammer

Function (what)            Process (why)
Drive nail (into wood)     Carpentry
Drive stake (into soil)    Gardening
Smash roach                Pest Control
Clown’s juggling object    Entertainment

Page 44: Making GO Annotations For Fungal Genomes A brief overview.

What’s in a GO term?

term: gluconeogenesis

id: GO:0006094

definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

synonym: glucose biosynthesis
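The four parts of a GO term shown on this slide can be sketched as a small record. This is only an illustration: the `GOTerm` class and the `build_label_index` helper are hypothetical names, not part of any GO software. It shows how a name and its synonyms resolve to the same stable ID, which is the point of the "What's in a name?" slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOTerm:
    """Minimal GO term record (hypothetical class, for illustration only)."""
    go_id: str
    name: str
    definition: str
    synonyms: tuple = ()


# Values copied from the slide.
GLUCONEOGENESIS = GOTerm(
    go_id="GO:0006094",
    name="gluconeogenesis",
    definition=("The formation of glucose from noncarbohydrate precursors, "
                "such as pyruvate, amino acids and glycerol."),
    synonyms=("glucose biosynthesis",),
)


def build_label_index(terms):
    """Map every lowercased term name and synonym to its GO ID."""
    index = {}
    for term in terms:
        for label in (term.name, *term.synonyms):
            index[label.lower()] = term.go_id
    return index


INDEX = build_label_index([GLUCONEOGENESIS])
print(INDEX["glucose biosynthesis"])
```

Looking up either label returns the same identifier, so annotations made with different wording still point at one concept.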

Page 45: Making GO Annotations For Fungal Genomes A brief overview.

No GO Areas

• GO covers ‘normal’ functions and processes
– No pathological processes
– No experimental conditions

• NO evolutionary relationships
• NO gene products
• NOT a system of nomenclature for genes

Page 46: Making GO Annotations For Fungal Genomes A brief overview.

Ontology Structure

• The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG)

• Terms can have more than one parent and zero, one or more children

• Terms are linked by two relationships
– is-a
– part-of

Page 47: Making GO Annotations For Fungal Genomes A brief overview.

Parent-Child Relationships

A child is a subset or an instance of a parent’s elements

Chromosome
– Cytoplasmic chromosome
   – Mitochondrial chromosome
   – Plastid chromosome
– Nuclear chromosome

Page 48: Making GO Annotations For Fungal Genomes A brief overview.

Parent-Child Relationships

One-to-many parental relationship: each child has only one parent

Many-to-many parental relationship: each child may have one or more parents

DAG: Directed Acyclic Graph

Page 49: Making GO Annotations For Fungal Genomes A brief overview.

A Sample DAG

[Figure: DAG of cellular component terms linked by is_a and part_of relationships. Terms shown: cellular_component, cell part, intracellular organelle, nucleus, chromosome, nuclear chromosome, mitochondrial chromosome, [other organelles], [other types of chromosomes].]

Page 50: Making GO Annotations For Fungal Genomes A brief overview.

True Path Rule

• The path from a child term all the way up to its top-level parent(s) must always be true

[Figure: example paths linked by is-a and part-of relationships. Terms shown: cell, cytoplasm, nucleus, chromosome, nuclear chromosome, cytoplasmic chromosome, mitochondrial chromosome.]
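The True Path Rule can be made concrete by walking a term's parent links upward. The sketch below is a hedged illustration: the `PARENTS` edge table is a simplified, hypothetical set of edges built from the term names on these slides, not the real ontology, and `ancestors` is an illustrative helper.

```python
# Hypothetical, simplified edges: child -> [(relationship, parent), ...].
# The real ontology has more terms and may link these ones differently.
PARENTS = {
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "mitochondrial chromosome": [("is_a", "chromosome")],
    "chromosome": [("is_a", "intracellular organelle")],
    "nucleus": [("is_a", "intracellular organelle")],
    "intracellular organelle": [("is_a", "cell part")],
    "cell part": [("is_a", "cellular_component")],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nuclear chromosome", PARENTS)))
print("nucleus" in ancestors("mitochondrial chromosome", PARENTS))
```

With these edges, "nucleus" is an ancestor of "nuclear chromosome" but not of "mitochondrial chromosome". Linking "chromosome" itself under "nucleus" would violate the rule, because that path would be false for mitochondrial chromosomes.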

Page 51: Making GO Annotations For Fungal Genomes A brief overview.

Ensuring Stability in a Dynamic Ontology

• Terms become obsolete when they are removed or redefined

• GO IDs are never deleted from the ontologies

• For every obsoleted term, a comment is added to explain why the term is now obsolete

[Figure: Biological Process, Molecular Function and Cellular Component, each shown with a matching obsolete branch: Obsolete Biological Process, Obsolete Molecular Function, Obsolete Cellular Component.]

Page 52: Making GO Annotations For Fungal Genomes A brief overview.

Why modify the GO?

• GO reflects current knowledge of biology

• Biology drives changes to the ontologies

Page 53: Making GO Annotations For Fungal Genomes A brief overview.

Obsolete terms

term: MAPKKK cascade (mating sensu Saccharomyces)

goid: GO:0007244

definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces.

definition_reference: PMID:9561267

comment: This term was made obsolete because it is a gene product specific term. To update annotations, use the biological process term 'signal transduction during conjugation with cellular fusion ; GO:0000750'.

definition: MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces
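Because an obsolete term keeps its ID and gains an explanatory comment, a script can flag annotations that need updating and surface the suggested replacement. The following is a minimal sketch under two assumptions taken from this slide only: obsolete definitions begin with "OBSOLETE." and the comment mentions replacement IDs in the form GO:nnnnnnn. The function names are hypothetical, and a curator would still review each suggestion.

```python
import re

# Text copied from the slide.
DEFINITION = ("OBSOLETE. MAPKKK cascade involved in transduction of mating "
              "pheromone signal, as described in Saccharomyces.")
COMMENT = ("This term was made obsolete because it is a gene product specific "
           "term. To update annotations, use the biological process term "
           "'signal transduction during conjugation with cellular fusion ; "
           "GO:0000750'.")

GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def is_obsolete(definition):
    """Treat a definition starting with 'OBSOLETE.' as an obsolete term."""
    return definition.startswith("OBSOLETE.")


def suggested_replacements(comment):
    """Return the GO IDs mentioned in the comment, in order of appearance."""
    return GO_ID_PATTERN.findall(comment)


if is_obsolete(DEFINITION):
    print(suggested_replacements(COMMENT))
```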

Page 54: Making GO Annotations For Fungal Genomes A brief overview.

What can scientists do with GO?

• Access gene product functional information

• Do cross species comparison

• Find how much of a proteome is involved in a process/function/component

• Provide a link between biological knowledge and …
– gene expression profiles
– proteomics data

Page 55: Making GO Annotations For Fungal Genomes A brief overview.

Using GO to Aid Microarray Analysis

Whole genome analysis (J. D. Munkvold et al., 2004)

Microarray analysis

Page 56: Making GO Annotations For Fungal Genomes A brief overview.

Beyond GO – Open Biomedical Ontologies

Orthogonal to existing ontologies to facilitate combinatorial approaches
- Share unique identifier space
- Include definitions

• Anatomies

• Cell Types

• Sequence Attributes (SO)

• Temporal Attributes

• Phenotypes

• Diseases

• More….

http://obo.sourceforge.net

Page 57: Making GO Annotations For Fungal Genomes A brief overview.

GO Annotations: What are they and how are they made?

Maria Costanzo

Saccharomyces and Candida Genome Databases

Stanford University

Page 58: Making GO Annotations For Fungal Genomes A brief overview.

Let’s Get Started!

• What is an annotation?
• Annotation approaches
• Strategies for identifying literature to annotate
• Strategies for reading a paper for annotation
• Strategies for annotating a gene and a genome

Page 59: Making GO Annotations For Fungal Genomes A brief overview.

What is a GO annotation?

• An annotation is a piece of information associated with a gene product

• A gene product is usually a protein but can be a functional RNA

• A GO annotation is a Gene Ontology term associated with a gene product

Page 60: Making GO Annotations For Fungal Genomes A brief overview.

Anatomy of a GO annotation

• Gene Product

• GO Term

• Evidence Code: IMP, IGI, IPI, ISS, IDA, IEP, TAS, NAS, ND, RCA, IC, IEA

• Reference

Page 61: Making GO Annotations For Fungal Genomes A brief overview.

Evidence Codes for GO Annotations

IMP  inferred from mutant phenotype
IGI  inferred from genetic interaction
IPI  inferred from physical interaction
ISS  inferred from sequence similarity
IDA  inferred from direct assay
IEP  inferred from expression pattern
IC   inferred by curator
TAS  traceable author statement
NAS  non-traceable author statement
ND   no biological data available
IEA  inferred from electronic annotation

http://www.geneontology.org/doc/GO.evidence.html
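When annotations are recorded in a spreadsheet or flat file, checking evidence codes against this list catches typos early. The sketch below assumes only the eleven codes expanded on this slide. The helper names are hypothetical, and the "reviewed by a human" test relies on the statement elsewhere in this deck that IEA annotations are not reviewed.

```python
# Codes and expansions as listed on the slide.
EVIDENCE_CODES = {
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence similarity",
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "IC": "inferred by curator",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "IEA": "inferred from electronic annotation",
}


def describe_evidence(code):
    """Return the expansion of an evidence code, or raise on an unknown code."""
    try:
        return EVIDENCE_CODES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown evidence code: {code!r}") from None


def is_human_reviewed(code):
    """IEA is the only code here assigned without human review."""
    describe_evidence(code)  # validates the code first
    return code.upper() != "IEA"


print(describe_evidence("IDA"))
print(is_human_reviewed("IEA"))
```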

Page 62: Making GO Annotations For Fungal Genomes A brief overview.

Additional annotation information

• WITH/FROM: supporting info for the evidence code
– IPI, IGI, ISS, IEA, IC
– Contains the interacting or similar gene product

• QUALIFIER: describes the GO term
– NOT
– contributes to (used with Molecular Function terms)
– colocalizes with (used with Cellular Component terms)

Page 63: Making GO Annotations For Fungal Genomes A brief overview.

Approaches for annotation of a genome

1. Automated/Electronic approaches

2. Manual approaches

3. Combinatorial approach

Page 64: Making GO Annotations For Fungal Genomes A brief overview.

Electronic Annotation

• Generate annotations relatively quickly & cheaply

• Annotation derived without human validation
– Sequence similarity, e.g. BLAST search ‘hits’, HMMs, etc.
– Mapping file, e.g. interpro2go, ec2go, etc.

• Useful for:
– genomes that don’t have extensive literature
– groups with limited curatorial resources
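The mapping-file approach can be sketched in a few lines. Everything here is an assumption made for illustration: the line layout (`external ID description > GO:term name ; GO:id`, with `!` starting a comment) is how such mapping files are commonly laid out, the sample line is illustrative, the gene ID `GENE0001` is made up, and the function names are hypothetical. Check the header of the actual mapping file before relying on this layout.

```python
def parse_mapping_line(line):
    """Parse one mapping line into (external_id, go_name, go_id), or None."""
    line = line.strip()
    if not line or line.startswith("!") or " > " not in line:
        return None
    left, right = line.split(" > ", 1)
    external_id = left.split(" ", 1)[0]
    go_name, go_id = right.rsplit(" ; ", 1)
    if go_name.startswith("GO:"):
        go_name = go_name[3:]
    return external_id, go_name, go_id


def electronic_annotations(gene_hits, mapping_lines):
    """Turn per-gene domain hits into (gene, GO ID, 'IEA', with/from) tuples."""
    mapping = {}
    for line in mapping_lines:
        parsed = parse_mapping_line(line)
        if parsed:
            external_id, _go_name, go_id = parsed
            mapping.setdefault(external_id, []).append(go_id)
    annotations = []
    for gene_id, external_ids in gene_hits.items():
        for external_id in external_ids:
            for go_id in mapping.get(external_id, []):
                annotations.append((gene_id, go_id, "IEA", external_id))
    return annotations


# Illustrative input only; GENE0001 is a made-up identifier.
SAMPLE_LINES = [
    "!illustrative mapping line, assumed format",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
]
SAMPLE_HITS = {"GENE0001": ["InterPro:IPR000003"]}

print(electronic_annotations(SAMPLE_HITS, SAMPLE_LINES))
```

Each result carries the IEA evidence code and keeps the matched domain ID as supporting information, mirroring the WITH/FROM usage described for IEA annotations.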

Page 65: Making GO Annotations For Fungal Genomes A brief overview.

Electronic Annotation

• Often based on sequence similarity

• Document the method used in an abstract
– unpublished abstract in your own database
– unpublished abstract submitted to GO references collection

• Annotation is not reviewed by a human

• IEA evidence code

Page 66: Making GO Annotations For Fungal Genomes A brief overview.

Combinatorial Approach, e.g. using sequence similarity

1. Alignments published in literature

2. Analysis using full length protein

3. Analysis using protein domains

Page 67: Making GO Annotations For Fungal Genomes A brief overview.

Example IEA Annotations from dictyBase

Page 68: Making GO Annotations For Fungal Genomes A brief overview.

Example unpublished reference

Page 69: Making GO Annotations For Fungal Genomes A brief overview.

Manual annotation

• Created by scientific curators
• Time intensive
• Utilizes
– published literature
– sequence comparison data

• Aided by curation tools
– Manatee (open source from TIGR)
– Apollo (open source from GMOD)
– Artemis (open source)

Page 70: Making GO Annotations For Fungal Genomes A brief overview.

Literature Source

1. PubMed
- National Library of Medicine, National Institutes of Health
- http://ncbi.nlm.nih.gov

2. Agricola
- United States Department of Agriculture, National Agricultural Library
- http://agricola.nal.usda.gov

3. Embase
- Elsevier
- http://www.embase.com

4. Biosis
- Thomson
- http://www.biosis.org

5. Unpublished (e.g. for internal sequence analysis methods)
- abstract in your own database
- unpublished abstract submitted to GO references collection

Page 71: Making GO Annotations For Fungal Genomes A brief overview.

Example Annotation

Gene Product: nek2

GO Term: centrosome (GO:0005813)

Reference: PMID: 11956323

Evidence Code: IDA (Inferred from Direct Assay)

Page 72: Making GO Annotations For Fungal Genomes A brief overview.

What to Search For in Published Literature?

1. Species name

2. Gene/gene product names: daf-12, spo11, Sonic hedgehog

3. Process AND species: embryonic development AND elegans

4. Function AND species: transcription factor AND mays

5. Cellular component AND species (genus): plasma membrane AND Drosophila

Page 73: Making GO Annotations For Fungal Genomes A brief overview.

GO Annotation: GMOD Tools for Enhancing Information Retrieval

GMOD – Generic Software Components for Model Organism Databases
- http://www.gmod.org/home
- Literature search tools:
  PubSearch – http://www.gmod.org/?q=node/44
  PubFetch – http://www.gmod.org/?q=node/84

Textpresso – http://www.textpresso.org
- full text of articles
- semantic categories

Page 74: Making GO Annotations For Fungal Genomes A brief overview.

GO Annotation: Strategies for Identifying Literature for Curation

1. Primary research literature with new experimental data
- Mutant phenotypes – process
- Activity assays – function
- Localization studies – component

2. Computational analyses
- Phylogenetic analysis – function (ISS)
- Domain analysis

3. Review articles
- Summarize and cite primary literature (TAS)

Page 75: Making GO Annotations For Fungal Genomes A brief overview.

Which parts of the paper are most important?

• Experimental Results
– Results: Figures, Tables, Text
– Materials and Methods

• Introductory information
– Abstract

• Explanatory text (use with caution)
– (Introduction) – mostly TAS information
– (Discussion)

Page 76: Making GO Annotations For Fungal Genomes A brief overview.

Reading papers as a curator, rather than as a bench scientist

• Don’t be swayed by the speculations or theories that may appear in the Discussion.

• Focus on the actual results vs. the possible, but not proven, implications of those results.

• Read for details and contact authors if key identifiers are missing.

Page 77: Making GO Annotations For Fungal Genomes A brief overview.

How to find a GO term to use?

• Web based tools
– AmiGO browser (http://www.godatabase.org)
– QuickGO (http://www.ebi.ac.uk/ego/)

• Downloadable tool (https://sourceforge.net/projects/geneontology/)
– OBO-Edit (must also download the ontology file)

Page 78: Making GO Annotations For Fungal Genomes A brief overview.

Extracting Information from a paper

Sample text from PMID: 12374299

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

Page 79: Making GO Annotations For Fungal Genomes A brief overview.

Example Manual Annotations from SGD

Page 80: Making GO Annotations For Fungal Genomes A brief overview.

Annotation from published literature

1. Focus on known genes

2. Identify literature relevant to that gene
a. using gene names, species name

3. Complete annotation set for a gene
a. annotate available experimental data
b. annotations to root nodes indicate nothing is known

Page 81: Making GO Annotations For Fungal Genomes A brief overview.

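Annotating genes to GO terms and the True Path Rule

[Figure: the sample DAG of cellular component terms (cellular_component, cell part, intracellular organelle, nucleus, chromosome, nuclear chromosome, mitochondrial chromosome, [other organelles], [other types of chromosomes]) linked by is_a and part_of relationships, with the gene products RAD51 and RAD52 shown.]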

Page 82: Making GO Annotations For Fungal Genomes A brief overview.

The True Path Rule Applied to Annotations

Are all paths to the root true for my gene product?

• Yes, great, annotate

• No?
– Is there a term I can use where all paths will be true?
– Does the ontology structure need to be changed?
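To answer "are all paths to the root true for my gene product?", it helps to list those paths explicitly. The sketch below enumerates them for a term. The `EDGES` table is a hypothetical, simplified fragment built from term names used in this deck, and `paths_to_root` is an illustrative helper, not a GO tool. Whether each path is biologically true for the gene product remains the curator's judgment.

```python
# Hypothetical, simplified edges: child -> [(relationship, parent), ...].
EDGES = {
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
    "chromosome": [("is_a", "cellular_component")],
    "nucleus": [("is_a", "cellular_component")],
}


def paths_to_root(term, edges):
    """List every path from term up to a root (a term with no parents)."""
    links = edges.get(term, [])
    if not links:
        return [[term]]
    paths = []
    for _relationship, parent in links:
        for tail in paths_to_root(parent, edges):
            paths.append([term] + tail)
    return paths


for path in paths_to_root("nuclear chromosome", EDGES):
    print(" -> ".join(path))
```

If any printed path would be false for the gene product, the two questions on the slide apply: pick a different term, or ask whether the ontology structure needs to change.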

Page 83: Making GO Annotations For Fungal Genomes A brief overview.

I don’t see terms in the ontology to describe the biology of my species

• Source Forge (SF) tracker for term related issues: https://sourceforge.net/projects/geneontology/
• Send an email to the GO mailing list
• Content meetings
– Organized by the consortium if the ontology related issues can’t be resolved over email/SF
– Look for announcements on the GO website, mailing lists

Page 84: Making GO Annotations For Fungal Genomes A brief overview.

http://pamgo.vbi.vt.edu

1. Develop GO terms for functions, processes and structures used by microbes in their associations with plants and animals
• fungi, oomycetes, bacteria, nematodes
• 472 terms recently added to GO

2. Create reference genomes by manual annotation of selected microbe genomes
• in progress

3. Training workshops
• July 26, 2007. IS-MPMI Workshop, Sorrento, Italy
• August 8-10, 2007. Virginia Bioinformatics Institute
- travel funds available for students and postdocs

Page 85: Making GO Annotations For Fungal Genomes A brief overview.

GO Terms needed: Secondary Metabolism

• The fungal community is going to need to add new terms:
– secondary metabolism pathways
– possibly other areas

• Fungal species so far annotated have not had secondary metabolism pathways, so no terms have been created to represent these areas

• The GO Consortium will be very happy to work with the fungal community to create the needed terms

Page 86: Making GO Annotations For Fungal Genomes A brief overview.

Contributing GO Annotations

Karen Christie

Saccharomyces Genome Databases

Stanford University

Page 87: Making GO Annotations For Fungal Genomes A brief overview.

Do you (or your group) have gene calls for your sequence?
• no → Make automated or manual gene calls (TIGR’s Eukaryotic Annotation course very useful)
• yes → Are the gene/protein predictions submitted to GenBank/DDBJ/EMBL?
– no → Submit gene/protein calls to GenBank/DDBJ/EMBL
– yes → GOA will make GO annotations (IEA) using automated methods (UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL)

Resources to make functional annotations?
• no →
– GOA will collect all GO annotations and submit them to GOC
– GOA will maintain annotation file
• yes →
– Contact GO Consortium for advice, training, help with coordination, etc.
– Set up pipeline for any automated annotations not being done by GOA
– Manual GO annotations from literature, or from sequence similarity methods
– Decide who will collate all GO annotations into one file
– You (or your group) collect all GO annotations and submit them to GOC
– You (or your group) maintain annotation file

Page 88: Making GO Annotations For Fungal Genomes A brief overview.

Do you (or your group) have gene calls for your sequence?
• no → Make automated or manual gene calls
• yes → Are the protein predictions submitted to GenBank/DDBJ/EMBL?
– no → Submit gene/protein calls to GenBank/DDBJ/EMBL
– yes → GOA will make GO annotations (IEA) using automated methods (UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL)

Resources to make functional annotations?
• no →
– GOA will collect all GO annotations and submit them to GOC
– GOA will maintain annotation file

Page 89: Making GO Annotations For Fungal Genomes A brief overview.

Do you (or your group) have gene calls for your sequence?
• no → Make automated or manual gene calls
• yes → Are the protein predictions submitted to GenBank/DDBJ/EMBL?
– no → Submit gene/protein calls to GenBank/DDBJ/EMBL
– yes → GOA will make GO annotations (IEA) using automated methods (UniProtKB contains translations of all coding regions in GenBank/DDBJ/EMBL)

Resources to make functional annotations?
• yes →
– Contact GO Consortium for advice, training, help with coordination, etc.
– Set up pipeline for any automated annotations not being done by GOA
– Manual GO annotations from literature, or from sequence similarity methods
– Decide who will collate all GO annotations into one file
– You (or your group) collect all GO annotations and submit them to GOC
– You (or your group) maintain annotation file

Page 90: Making GO Annotations For Fungal Genomes A brief overview.

I have my annotations, what next?

gene_association file - format info at http://www.geneontology.org/GO.annotation.shtml#file

• DB: source of the ID in column 2. Examples: SGD, MGI, UniProt

• ID for the gene or gene_product. Examples: FBgn0015331, MGI:99240, SPAC9.03c

• Symbol: a name like Brr2 or DDX21_HUMAN that means something to a biologist, not an ID

• Object_Type: gene, transcript, protein, protein_structure, or complex; should match the ID

Page 91: Making GO Annotations For Fungal Genomes A brief overview.

DB source DB Object ID Object Symbol Qualifier GOID DB:reference Ev_code With/From Aspect DB object Name Synonym Object_type Taxon ID Date Assigned bySGD S000004660 AAC1 GO:0005743 SGD_REF:S000050955|PMID:2167309 TAS C ADP/ATP translocatorYMR056C gene taxon:4932 20010118 SGDSGD S000004660 AAC1 GO:0005471 SGD_REF:S000050955|PMID:2167309 IDA F ADP/ATP translocatorYMR056C gene taxon:4932 20010213 SGDSGD S000004660 AAC1 GO:0006839 SGD_REF:S000050955|PMID:2167309 IGI SGD:S000000126 P ADP/ATP translocatorYMR056C gene taxon:4932 20040226 SGDSGD S000004660 AAC1 GO:0009060 SGD_REF:S000050955|PMID:2167309 IGI SGD:S000000126 P ADP/ATP translocatorYMR056C gene taxon:4932 20040226 SGDSGD S000000289 AAC3 GO:0005743 SGD_REF:S000045889|PMID:2165073 ISS SGD:S000000126|SGD:S000004660C ADP/ATP translocatorYBR085W|ANC3 gene taxon:4932 20040226 SGDSGD S000000289 AAC3 GO:0005471 SGD_REF:S000045889|PMID:2165073 ISS SGD:S000000126|SGD:S000004660F ADP/ATP translocatorYBR085W|ANC3 gene taxon:4932 20040226 SGDSGD S000000289 AAC3 GO:0009061 SGD_REF:S000045889|PMID:2165073 IGI SGD:S000000126 P ADP/ATP translocatorYBR085W|ANC3 gene taxon:4932 20040226 SGDSGD S000000289 AAC3 GO:0009061 SGD_REF:S000052497|PMID:1915842 IGI SGD:S000000126|SGD:S000004660P ADP/ATP translocatorYBR085W|ANC3 gene taxon:4932 20040226 SGDSGD S000000289 AAC3 GO:0009061 SGD_REF:S000045889|PMID:2165073 IEP P ADP/ATP translocatorYBR085W|ANC3 gene taxon:4932 20040226 SGDSGD S000003916 AAD10 GO:0008372 SGD_REF:S000069584 ND C aryl-alcohol dehydrogenase (putative)YJR155W gene taxon:4932 20010119 SGDSGD S000003916 AAD10 GO:0018456 SGD_REF:S000042151|PMID:10572264 ISS F aryl-alcohol dehydrogenase (putative)YJR155W gene taxon:4932 20020902 SGDSGD S000003916 AAD10 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P aryl-alcohol dehydrogenase (putative)YJR155W gene taxon:4932 20020902 SGDSGD S000005275 AAD14 GO:0008372 SGD_REF:S000069584 ND C aryl-alcohol dehydrogenase (putative)YNL331C gene taxon:4932 20010119 SGDSGD S000005275 AAD14 
GO:0018456 SGD_REF:S000042151|PMID:10572264 ISS F aryl-alcohol dehydrogenase (putative)YNL331C gene taxon:4932 20020902 SGDSGD S000005275 AAD14 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P aryl-alcohol dehydrogenase (putative)YNL331C gene taxon:4932 20020902 SGDSGD S000005525 AAD15 GO:0008372 SGD_REF:S000069584 ND C aryl-alcohol dehydrogenase (putative)YOL165C gene taxon:4932 20010119 SGDSGD S000005525 AAD15 GO:0018456 SGD_REF:S000042151|PMID:10572264 ISS F aryl-alcohol dehydrogenase (putative)YOL165C gene taxon:4932 20020902 SGDSGD S000005525 AAD15 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P aryl-alcohol dehydrogenase (putative)YOL165C gene taxon:4932 20020902 SGDSGD S000001837 AAD16 GO:0008372 SGD_REF:S000069584 ND C YFL057C gene taxon:4932 20020902 SGDSGD S000001837 AAD16 GO:0018456 SGD_REF:S000042151|PMID:10572264 ISS F YFL057C gene taxon:4932 20020902 SGDSGD S000001837 AAD16 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P YFL057C gene taxon:4932 20020902 SGDSGD S000000704 AAD3 GO:0008372 SGD_REF:S000069584 ND C aryl-alcohol dehydrogenase (putative)YCR107W gene taxon:4932 20010119 SGDSGD S000000704 AAD3 GO:0018456 SGD_REF:S000042151|PMID:10572264 ISS F aryl-alcohol dehydrogenase (putative)YCR107W gene taxon:4932 20020902 SGDSGD S000000704 AAD3 GO:0006081 SGD_REF:S000042151|PMID:10572264 ISS P aryl-alcohol dehydrogenase (putative)YCR107W gene taxon:4932 20020902 SGD

These columns may be empty

Sample gene-associations file
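
As an illustration of how such a file can be read, here is a minimal Python sketch that splits one row of the sample into named fields. It assumes the file is tab-delimited with the 15 columns shown in the sample header, and that "|" separates multiple values within a field; the function name and dictionary keys are hypothetical, chosen only for this example.

```python
# Column keys mirror the header row of the sample file (names are illustrative).
COLUMNS = [
    "db", "db_object_id", "object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "synonym", "object_type", "taxon_id",
    "date", "assigned_by",
]


def parse_association_line(line):
    """Split one tab-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    # Multi-valued fields use "|" as the separator; empty fields become [].
    for key in ("db_reference", "with_from", "synonym"):
        record[key] = record[key].split("|") if record[key] else []
    return record


# One row taken from the sample file (the Qualifier column is empty).
sample = "\t".join([
    "SGD", "S000000289", "AAC3", "", "GO:0005743",
    "SGD_REF:S000045889|PMID:2165073", "ISS",
    "SGD:S000000126|SGD:S000004660", "C", "ADP/ATP translocator",
    "YBR085W|ANC3", "gene", "taxon:4932", "20040226", "SGD",
])
record = parse_association_line(sample)
print(record["go_id"], record["evidence_code"], record["with_from"])
```

Keeping empty columns as empty strings (or empty lists) rather than dropping them preserves the column positions, which matters because several columns may legitimately be blank.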

Page 92: Making GO Annotations For Fungal Genomes A brief overview.

What tools/infrastructure do you need to record annotations?

• Excel spreadsheet (simple, easy, small scale)

OR

• Database
  – FileMaker Pro, Access (simple databases)
  – Oracle, Sybase, or MySQL (relational databases)
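
For the database option, a rough sketch of what recording annotations in a relational table might look like is shown below. It uses Python's built-in sqlite3 module purely as a stand-in for the databases listed above (SQLite is an assumption, not one of the named products), and the table and column names are hypothetical. The two rows are taken from the sample gene-associations file.

```python
import sqlite3

# In-memory database for illustration; a real project would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE annotation (
        db_object_id  TEXT NOT NULL,
        object_symbol TEXT NOT NULL,
        go_id         TEXT NOT NULL,
        reference     TEXT NOT NULL,
        evidence_code TEXT NOT NULL,
        aspect        TEXT NOT NULL CHECK (aspect IN ('P', 'F', 'C')),
        date_assigned TEXT NOT NULL
    )
""")

# Two AAC1 annotations from the sample file.
rows = [
    ("S000004660", "AAC1", "GO:0005743", "PMID:2167309", "TAS", "C", "20010118"),
    ("S000004660", "AAC1", "GO:0005471", "PMID:2167309", "IDA", "F", "20010213"),
]
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Look up the molecular function (aspect F) terms recorded for AAC1.
function_terms = [
    row[0]
    for row in conn.execute(
        "SELECT go_id FROM annotation WHERE object_symbol = ? AND aspect = 'F'",
        ("AAC1",),
    )
]
print(function_terms)
```

A table like this can later be exported column by column into the gene_associations format, which is harder to keep consistent in a free-form spreadsheet.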

Page 93: Making GO Annotations For Fungal Genomes A brief overview.

How do I share my gene_associations file?

• Make your annotations available to the larger community by submitting them to the GO project

• What information should I submit to GO?
  – Gene_association file
  – Short file with info about the submitting group

• Where should I submit the data?
  – Contact the GOC to establish a contact for your group
  – [email protected]

Page 94: Making GO Annotations For Fungal Genomes A brief overview.

Databases contributing annotations include:

– dictyBase (Dictyostelium discoideum)
– FlyBase (Drosophila melanogaster)
– GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
– UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
– Gramene (grains, including rice, Oryza)
– Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
– Rat Genome Database (RGD) (Rattus norvegicus)
– Reactome
– Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
– The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
– The Institute for Genomic Research (TIGR): databases on several bacterial species
– WormBase (Caenorhabditis elegans)
– Zebrafish Information Network (ZFIN) (Danio rerio)

Page 95: Making GO Annotations For Fungal Genomes A brief overview.

Annotation coverage

Annotation Coverage by Genome

Page 96: Making GO Annotations For Fungal Genomes A brief overview.

GO Current Annotations

http://www.geneontology.org/GO.current.annotations.shtml

Page 97: Making GO Annotations For Fungal Genomes A brief overview.

GO Current Annotations: Filtered Files

http://www.geneontology.org/GO.current.annotations.shtml

Page 98: Making GO Annotations For Fungal Genomes A brief overview.

GO Current Annotations: Unfiltered Files

http://www.geneontology.org/GO.current.annotations.shtml

Page 99: Making GO Annotations For Fungal Genomes A brief overview.

GOA Proteome Species Specific Files

http://www.ebi.ac.uk/GOA/proteomes.html

Page 100: Making GO Annotations For Fungal Genomes A brief overview.

Resources offered by the GO project

• Website (http://www.geneontology.org)
  – Lots of documentation
  – Tools, tutorials and software

• Mailing list ([email protected])

• Help email address ([email protected])

• GO project on SourceForge (https://sourceforge.net/projects/geneontology)
  – Submit suggestions, e.g. new ontology terms, etc.
  – Download tools, e.g. OBO-Edit

• AmiGO browser (http://amigo.geneontology.org)

• GO database

Page 101: Making GO Annotations For Fungal Genomes A brief overview.

AmiGO Tutorial

Rama Balakrishnan

Saccharomyces Genome Database Stanford University

Page 102: Making GO Annotations For Fungal Genomes A brief overview.

What is AmiGO?

• Web application that allows you to:

– browse the ontologies

– view annotations from various species

– compare sequences using BLAST (GOst)

Page 103: Making GO Annotations For Fungal Genomes A brief overview.

AmiGO

http://amigo.geneontology.org

Page 104: Making GO Annotations For Fungal Genomes A brief overview.

Basic Search

Page 105: Making GO Annotations For Fungal Genomes A brief overview.

AmiGO Search Results: GO Terms

Page 106: Making GO Annotations For Fungal Genomes A brief overview.

Term Details Page

Page 107: Making GO Annotations For Fungal Genomes A brief overview.
Page 108: Making GO Annotations For Fungal Genomes A brief overview.

Gene Product Details and Annotations

Page 109: Making GO Annotations For Fungal Genomes A brief overview.

Is_a relationship

Part_of relationship

Leaf node or no children

Node has been opened, can be clicked to close

Node has children, can be clicked to view children

Pie chart summary of the numbers of gene products associated to any immediate descendants of this term in the tree.
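
To make the tree view more concrete, the following Python sketch walks upward from a term through both is_a and part_of links to collect its ancestors. The graph here is a deliberately simplified toy, not an extract of the real ontology; the term names, the `PARENTS` mapping, and the `ancestors` function are all hypothetical.

```python
# Toy graph: child term -> list of (relationship, parent term).
# Simplified for illustration; not copied from the actual GO files.
PARENTS = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following is_a and part_of links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nucleolus")))
```

Following both relationship types is what lets a browser count gene products annotated to a descendant term when summarizing a term higher in the tree.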

Page 110: Making GO Annotations For Fungal Genomes A brief overview.

Annotations associated with a term

Annotation data are from the gene_associations file submitted by the annotating groups

Page 111: Making GO Annotations For Fungal Genomes A brief overview.

AmiGO Advanced Search

Page 112: Making GO Annotations For Fungal Genomes A brief overview.

Filters

Page 113: Making GO Annotations For Fungal Genomes A brief overview.

BLAST

• BLAST a protein sequence against all gene products that have a GO annotation

• Can be accessed from the AmiGO Home page (front page)

Page 114: Making GO Annotations For Fungal Genomes A brief overview.

BLAST can also be accessed from the annotations section

Page 115: Making GO Annotations For Fungal Genomes A brief overview.

AmiGO Help

Page 116: Making GO Annotations For Fungal Genomes A brief overview.

Contact us

• We welcome your input

• Please send suggestions and bug reports to us

• [email protected]

Page 118: Making GO Annotations For Fungal Genomes A brief overview.

Acknowledgements

The people of the GO Consortium: