Methods for Creating GO Annotations
Emily Dimmer
European Bioinformatics Institute
Wellcome Trust Genome Campus
Cambridge, UK
The core information needed for a GO annotation
1. Database object (protein), e.g. Q9ARH1
2. GO term ID, e.g. GO:0004674
3. Reference ID, e.g. PubMed ID: 12374299 or GOA:InterPro
4. Evidence code, e.g. TAS
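These four fields map naturally onto a small record type. The sketch below assumes a hypothetical `GOAnnotation` class. Its field names are illustrative and are not a GOA schema. It only checks that the GO term ID has the usual "GO:" plus seven-digit form.

```python
import re
from dataclasses import dataclass


# Hypothetical record type: the four fields mirror the list above,
# but the class and field names are illustrative, not a GOA schema.
@dataclass(frozen=True)
class GOAnnotation:
    db_object_id: str   # database object (protein), e.g. "Q9ARH1"
    go_id: str          # GO term ID, e.g. "GO:0004674"
    reference: str      # reference ID, e.g. "PMID:12374299"
    evidence_code: str  # evidence code, e.g. "TAS"

    def __post_init__(self) -> None:
        # GO term IDs have the form "GO:" followed by seven digits.
        if not re.fullmatch(r"GO:\d{7}", self.go_id):
            raise ValueError(f"malformed GO term ID: {self.go_id!r}")


example = GOAnnotation("Q9ARH1", "GO:0004674", "PMID:12374299", "TAS")
print(example)
```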
GO Evidence Codes
Code Definition
IEA Inferred from Electronic Annotation
IDA Inferred from Direct Assay
IEP Inferred from Expression Pattern
IGI Inferred from Genetic Interaction
IMP Inferred from Mutant Phenotype
IPI Inferred from Physical Interaction
ISS Inferred from Sequence Similarity
TAS Traceable Author Statement
NAS Non-traceable Author Statement
RCA Reviewed Computational Analysis
IC Inferred by Curator
ND No Data
All codes other than IEA are assigned by manual annotation.
• Every GO annotation includes an Evidence Code that gives information about the evidence from which the annotation has been made.
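A lookup table is one simple way to keep evidence codes consistent when writing or reading annotations. This is a minimal sketch: the dictionary and the `describe` helper are hypothetical, and codes are assumed to be matched case-insensitively.

```python
# Evidence codes listed in the table above; IC is spelled out as in the GO documentation.
EVIDENCE_CODES = {
    "IEA": "Inferred from Electronic Annotation",
    "IDA": "Inferred from Direct Assay",
    "IEP": "Inferred from Expression Pattern",
    "IGI": "Inferred from Genetic Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IPI": "Inferred from Physical Interaction",
    "ISS": "Inferred from Sequence Similarity",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "RCA": "Reviewed Computational Analysis",
    "IC": "Inferred by Curator",
    "ND": "No Data",
}


def describe(code: str) -> str:
    """Return the definition of an evidence code; raise KeyError for unknown codes."""
    return EVIDENCE_CODES[code.strip().upper()]


print(describe("tas"))  # Traceable Author Statement
```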
Additional fields can be used to further clarify an annotation:
• Qualifiers (NOT, contributes_to, colocalizes_with)
• ‘With’ data, which give users more information on the method or experiment applied
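Annotations using the ‘NOT’ qualifier
Loyola et al. Mol Cell Biol. 2003 Oct;23(19):6759-68.
hSNF2H ATPase activity GO:0016887 IDA
Rsf-1 NOT ATPase activity GO:0016887 IDA
The NOT qualifier records a negative result. Here the direct assay (IDA) supports ATPase activity for hSNF2H but not for Rsf-1.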
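Software that consumes annotations must honour the NOT qualifier, because a program that ignores it reads a negative result as a positive one. This is a minimal sketch using a hypothetical `Annotation` record. It assumes that several qualifiers on one annotation are stored as a single '|'-separated string, as in gene association files.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    protein: str
    qualifier: str  # empty string when there is no qualifier
    go_id: str
    evidence: str


# The two annotations from the Loyola et al. example above.
annotations = [
    Annotation("hSNF2H", "", "GO:0016887", "IDA"),
    Annotation("Rsf-1", "NOT", "GO:0016887", "IDA"),
]


def proteins_with_term(rows: list[Annotation], go_id: str) -> list[str]:
    """Return proteins positively annotated to go_id, skipping NOT annotations."""
    return [
        row.protein
        for row in rows
        if row.go_id == go_id and "NOT" not in row.qualifier.split("|")
    ]


print(proteins_with_term(annotations, "GO:0016887"))  # ['hSNF2H']
```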
Annotations using the ‘contributes_to’ qualifier
A protein that is part of a complex can be annotated to Molecular Function terms that describe:
1. its individual action, or
2. the action of the whole complex.
To differentiate between these two types of annotation, the contributes_to qualifier is added when the protein does not possess the activity itself.
Cao et al. Mol Cell. 2005 Dec 22;20(6):845-54.
Bmi-1 ubiquitin-protein ligase activity IDA contributes_to
Ring1A ubiquitin-protein ligase activity IDA contributes_to
Pc3 ubiquitin-protein ligase activity IDA contributes_to
Ring1B ubiquitin-protein ligase activity IDA
Only Ring1B is annotated without the qualifier, that is, as having the ligase activity itself.
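The qualifier is enough to separate the subunits that carry an activity from those that only contribute to the complex's activity. This is a minimal sketch over the Cao et al. rows listed for this slide. The helper name is hypothetical, and qualifiers are again assumed to be '|'-separated.

```python
# (protein, qualifier) pairs for ubiquitin-protein ligase activity (IDA), Cao et al. 2005.
ligase_annotations = [
    ("Bmi-1", "contributes_to"),
    ("Ring1A", "contributes_to"),
    ("Pc3", "contributes_to"),
    ("Ring1B", ""),
]


def split_by_contribution(annotations):
    """Separate proteins with the activity itself from those contributing to a complex."""
    direct, contributing = [], []
    for protein, qualifier in annotations:
        if "contributes_to" in qualifier.split("|"):
            contributing.append(protein)
        else:
            direct.append(protein)
    return direct, contributing


direct, contributing = split_by_contribution(ligase_annotations)
print(direct)        # ['Ring1B']
print(contributing)  # ['Bmi-1', 'Ring1A', 'Pc3']
```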
Annotations using the ‘colocalizes_with’ qualifier
• Used with cellular component terms
• To describe proteins that are transiently or peripherally associated with an organelle or complex
Meyer et al. J Cell Biol. 1997 Feb 24;136(4):775-88.
CENP-E condensed chromosome kinetochore IDA colocalizes_with
Annotations using additional identifiers in the ‘with’ column
• Provides further information to support the evidence code used in an annotation
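For protein binding annotations (IPI), the ‘with’ column holds the identifier of the interacting protein:
[Example table: Protein | GO term | Evidence | Reference | With]
When transferring annotations based on sequence similarity (ISS), the ‘with’ column holds the identifier of the similar protein from which the annotation was transferred:
[Example table: Protein | GO term | Evidence | Reference | With]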
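The ‘with’ column is easy to process when its entries follow a "DB:ID" pattern. This is a minimal sketch. It assumes that multiple entries are separated by '|', and that each entry should be split on its first colon only, because some identifiers themselves contain a colon. The `parse_with` helper and the sample values are hypothetical.

```python
def parse_with(field: str) -> list[tuple[str, str]]:
    """Split a 'with' value into (database, identifier) pairs.

    Assumes entries look like "DB:ID" and are separated by '|'.
    """
    pairs = []
    for entry in field.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # an empty 'with' column carries no identifiers
        db, sep, identifier = entry.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected DB:ID, got {entry!r}")
        pairs.append((db, identifier))
    return pairs


# Hypothetical values: a single partner identifier, then two supporting identifiers.
print(parse_with("UniProtKB:Q9ARH1"))
print(parse_with("UniProtKB:Q9ARH1|InterPro:IPR000438"))
```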
There are two main types of GO annotation:
• Electronic Annotation
• Manual Annotation
Both methods have their advantages, and they can easily be distinguished by the evidence code used: electronic annotations carry the IEA code.
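Among the evidence codes listed in this presentation, IEA is the only electronic one. The split between electronic and manual annotations can therefore be computed from the evidence code alone. This is a minimal sketch: the helper name and the list of codes are hypothetical.

```python
from collections import Counter


def annotation_type(evidence_code: str) -> str:
    """Classify an annotation as 'electronic' (IEA) or 'manual' (any other code)."""
    return "electronic" if evidence_code.strip().upper() == "IEA" else "manual"


# Hypothetical evidence codes from a small annotation set.
codes = ["IEA", "IDA", "IEA", "TAS", "ISS"]
counts = Counter(annotation_type(code) for code in codes)
print(dict(counts))  # {'electronic': 2, 'manual': 3}
```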
Electronic Annotation
Fatty acid biosynthesis (Swiss-Prot keyword) → GO:Fatty acid biosynthesis (GO:0006633)
EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)
MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO:DNA repair (GO:0006281)
• Very high quality
• However, these annotations often use high-level GO terms and provide little detail.
Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17
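Electronic annotation of this kind is essentially a table lookup. Each external identifier (keyword, EC number, InterPro entry, HAMAP family) is translated into one or more GO terms. The GO Consortium distributes such translation tables as mapping files (e.g. interpro2go, ec2go). This sketch assumes their "external ID > GO:term name ; GO:ID" line layout. The sample lines are built from the examples on this slide rather than copied from the real files, and the protein `PROT_X` and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative mapping lines (hypothetical, not copied from the distributed files).
MAPPING_LINES = [
    "! comment lines start with '!'",
    "EC:6.4.1.2 > GO:acetyl-CoA carboxylase activity ; GO:0003989",
    "InterPro:IPR000438 Acetyl-CoA carboxylase carboxyl transferase beta subunit"
    " > GO:acetyl-CoA carboxylase activity ; GO:0003989",
    "HAMAP:MF_00527 Putative 3-methyladenine DNA glycosylase > GO:DNA repair ; GO:0006281",
]


def parse_mapping(lines):
    """Build {external ID: [GO IDs]} from mapping lines, skipping comments and blanks."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, _, right = line.partition(" > ")
        external_id = left.split()[0]               # e.g. "EC:6.4.1.2"
        go_id = right.rpartition(" ; ")[2].strip()  # e.g. "GO:0003989"
        mapping[external_id].append(go_id)
    return mapping


def electronic_annotations(protein, external_ids, mapping):
    """Return (protein, GO ID, 'IEA') tuples, one per distinct mapped GO term."""
    results, seen = [], set()
    for external_id in external_ids:
        for go_id in mapping.get(external_id, []):
            if go_id not in seen:  # EC and InterPro can map to the same term
                seen.add(go_id)
                results.append((protein, go_id, "IEA"))
    return results


mapping = parse_mapping(MAPPING_LINES)
# "PROT_X" is a hypothetical protein carrying both an EC number and an InterPro match.
print(electronic_annotations("PROT_X", ["EC:6.4.1.2", "InterPro:IPR000438"], mapping))
# [('PROT_X', 'GO:0003989', 'IEA')]
```

The de-duplication step reflects the slide's example: the EC number and the InterPro entry both lead to acetyl-CoA carboxylase activity, which should be recorded once.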
Manual Annotation
• High-quality, specific annotations made using:
  • peer-reviewed papers
  • a range of evidence codes to categorize the types of evidence found in a paper
• Very time-consuming and requires trained biologists
Finding GO terms for the chicken TaxREB107 protein (Q8UWG7)
Key phrases from the literature: "nucleoli", "cytoplasmic", "increased troponin I reporter gene activity", "positive modulator of skeletal muscle gene expression"
Component: cytoplasm GO:0005737
Component: nucleolus GO:0005730
Process: positive regulation of transcription GO:0045941
Process: positive regulation of skeletal muscle development GO:0048643
Aids for GO manual annotation
Many are listed on the GO Consortium tools page:
http://www.geneontology.org/GO.tools.shtml
GoPubMed
GoPubMed gives an overview of literature abstracts taken from PubMed and categorizes them with Gene Ontology terms:
http://gopubmed.org
Searching for GO terms
QuickGO: http://www.ebi.ac.uk/ego
http://www.godatabase.org
…and more browsers are available on the GO Tools page:
http://www.geneontology.org/GO.tools.html
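The browsers above search the ontology by term name. The same idea can be sketched offline against the ontology file, assuming the file is in OBO format, with stanzas headed `[Term]` that carry `id:` and `name:` tags. The excerpt and helper functions below are hypothetical. A real ontology file is far larger and also contains obsolete terms, which this sketch does not filter out.

```python
# A tiny, hand-written excerpt in OBO format (hypothetical file contents).
OBO_TEXT = """\
format-version: 1.2

[Term]
id: GO:0005730
name: nucleolus
namespace: cellular_component

[Term]
id: GO:0005737
name: cytoplasm
namespace: cellular_component

[Term]
id: GO:0006281
name: DNA repair
namespace: biological_process
"""


def parse_obo_terms(text: str) -> dict[str, str]:
    """Return {GO ID: term name} for every [Term] stanza in OBO text."""
    terms: dict[str, str] = {}
    in_term, current_id = False, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe GO terms.
            in_term, current_id = line == "[Term]", None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and current_id:
            terms[current_id] = line[len("name: "):]
    return terms


def search_terms(terms: dict[str, str], query: str) -> list[tuple[str, str]]:
    """Case-insensitive substring search over term names, sorted by GO ID."""
    q = query.lower()
    return sorted((go_id, name) for go_id, name in terms.items() if q in name.lower())


terms = parse_obo_terms(OBO_TEXT)
print(search_terms(terms, "nucleo"))  # [('GO:0005730', 'nucleolus')]
```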
GO annotation editors
• enhanced spreadsheets (e.g. Excel)
• Protein2GO (GOA)
• The GO Consortium recognizes the need for a lightweight, generic GO annotation tool.
Enhanced Spreadsheets
• quick and cheap to start with
• However, a reasonably sized set of annotations is difficult to maintain and update
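Part of what makes spreadsheets hard to maintain is that nothing checks the values typed into them. One hedge is to export the sheet as tab-delimited text and validate it. This is a minimal sketch: the column names, sample rows and helper are hypothetical. The allowed values come from the evidence codes and qualifiers described in this presentation.

```python
import csv
import io
import re

VALID_EVIDENCE = {"IEA", "IDA", "IEP", "IGI", "IMP", "IPI", "ISS",
                  "TAS", "NAS", "RCA", "IC", "ND"}
VALID_QUALIFIERS = {"", "NOT", "contributes_to", "colocalizes_with"}
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def validate_sheet(tsv_text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for a tab-delimited annotation sheet."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        go_id = row["go_id"] or ""
        if not GO_ID_PATTERN.fullmatch(go_id):
            problems.append((lineno, f"bad GO ID {go_id!r}"))
        evidence = row["evidence"] or ""
        if evidence not in VALID_EVIDENCE:
            problems.append((lineno, f"unknown evidence code {evidence!r}"))
        for qualifier in (row["qualifier"] or "").split("|"):
            if qualifier not in VALID_QUALIFIERS:
                problems.append((lineno, f"unknown qualifier {qualifier!r}"))
    return problems


# Hypothetical sheet: line 3 has a truncated GO ID, line 4 has two typos.
SHEET = (
    "protein\tqualifier\tgo_id\tevidence\n"
    "PROT_A\t\tGO:0004674\tTAS\n"
    "PROT_B\tNOT\tGO:16887\tIDA\n"
    "PROT_C\tcontributes-to\tGO:0016887\tIDX\n"
)

for lineno, message in validate_sheet(SHEET):
    print(lineno, message)
```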
How users can view GO annotations
Download and parse an entire gene association file…
…or look at annotations for a protein using one of the GO browsers, such as QuickGO (http://www.ebi.ac.uk/ego), or a database that integrates GO annotations.
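A gene association file is plain tab-separated text, so "download and parse" can be sketched in a few lines. The sketch assumes the 15-column GAF 1.0 layout with '!' comment lines. Later GAF versions append extra columns, which `zip` simply ignores here. The helper names and the two sample lines are hypothetical; the first line reuses the core example from the start of this presentation. A real download would be read from a file handle instead of the in-memory `SAMPLE`.

```python
import io

# Names for the 15 tab-separated columns of a GAF 1.0 gene association file.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence", "with", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def read_gaf(handle):
    """Yield one dict per annotation line, skipping '!' comment lines and blanks."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < len(GAF_COLUMNS):
            raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
        # zip stops after 15 columns, so any extra trailing columns are ignored.
        yield dict(zip(GAF_COLUMNS, fields))


def positive_annotations(records):
    """(object ID, GO ID, evidence) for every annotation without a NOT qualifier."""
    return [
        (r["db_object_id"], r["go_id"], r["evidence"])
        for r in records
        if "NOT" not in r["qualifier"].split("|")
    ]


# Two hypothetical lines; symbols, names, taxon, date and REF:EXAMPLE are placeholders.
SAMPLE = io.StringIO(
    "! hypothetical example file\n"
    "UniProtKB\tQ9ARH1\tSYMBOL1\t\tGO:0004674\tPMID:12374299\tTAS\t\tF"
    "\tplaceholder name\t\tprotein\ttaxon:0\t20060101\tUniProtKB\n"
    "UniProtKB\tEXAMPLE2\tSYMBOL2\tNOT\tGO:0016887\tREF:EXAMPLE\tIDA\t\tF"
    "\tplaceholder name\t\tprotein\ttaxon:0\t20060101\tUniProtKB\n"
)

records = list(read_gaf(SAMPLE))
print(positive_annotations(records))  # [('Q9ARH1', 'GO:0004674', 'TAS')]
```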
Acknowledgements
Nicky Mulder, Head of InterPro
Evelyn Camon, GOA Coordinator
Daniel Barrell, GOA Programmer
Rachael Huntley, GOA Curator
David Binns & John Maslen, QuickGO and Protein2GO tools
Achuthanunni C. Balakrishnan, Text-2-GO
Jorge Duarte, IPI sets
Midori Harris, GO Editor
Jane Lomax, GO Curator
Amelia Ireland, GO Curator
Jennifer Clarke, GO Curator
Rolf Apweiler, Head of Sequence Database Group
The Gene Ontology Consortium and 1.5 members of GOA are currently supported by a P41 grant from the National Human Genome Research Institute (NHGRI) [grant HG002273]. GOA is also supported by core EMBL funding.