CACAO Training

CACAO Training ASM-JGI 2012


Transcript of CACAO Training

Page 1: CACAO Training

CACAO Training, ASM-JGI 2012

Page 2: CACAO Training

Transferring information to new genomes

Database

Lists of genes

Known functions of homologs or subsets

New knowledge

Page 3: CACAO Training

Curation is rate limiting

Literature

Datasets

Biocurators (rate limiting)

Database

Page 4: CACAO Training

CACAO is growing

[Figure: bar chart of schools, students, and annotations by semester (Spring 2010, Fall 2010, Spring 2011, Fall 2011, Spring 2012); y-axis 0 to 2000]

Page 5: CACAO Training

CACAO biodiversity

[Figure: bar chart of annotations per organism, Spring 2012; y-axis 0 to 250. Organisms: E. coli, Human, Mouse, Pseudomonas, Bacillus, Arabidopsis, Streptococcus, Saccharomyces, Rat, Xanthomonas, Lactobacillus, Clostridium, Vibrio, Drosophila, Borrelia, Corynebacterium, Staphylococcus, Campylobacter, Citrobacter, Leishmania]

Page 6: CACAO Training

CACAO 2

• CACAO changes the job of the professionals from primary curation to assessment

• Growth in CACAO makes assessment rate limiting

• Solution: Promote CACAO veterans to help with assessment

Page 7: CACAO Training


BIOCURATORS

Page 8: CACAO Training

The biocurator training …

Page 9: CACAO Training

What’s in it for you?

We hope you will
• learn how we think about protein function
• gain skills that will help your future career
• enjoy contributing to a resource used by people all over the world
• have fun!

Page 10: CACAO Training

Annotation

Annotation: a note that is made while reading any form of text

For scientists:
1. Nucleotide level: Where the genes are in the genome
2. Protein level: What their functions are

From Wikipedia

Page 12: CACAO Training

Functional Annotation

Annotation: a note that is made while reading any form of text

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

Page 13: CACAO Training

Functional Annotation

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

• Specific format = GO (Gene Ontology) Annotation

Page 14: CACAO Training

GO (Gene Ontology) Annotations

• 3 aspects (ontologies) for describing protein attributes:
1. Biological Process
2. Molecular Function
3. Cellular Component

• Controlled vocabulary
– Everyone uses the same terms
– Terms have 7 digit IDs that computers can understand

• Relationships between terms

GO:0005886
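A minimal sketch of what "7 digit IDs that computers can understand" means in practice: a format check for GO IDs. The function names are hypothetical, and the check only confirms the shape of the ID, not that the term exists in the ontology.

```python
import re

# A GO ID is the prefix "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_valid_go_id(text: str) -> bool:
    """Return True if text has the GO:NNNNNNN shape (format check only)."""
    return GO_ID_PATTERN.fullmatch(text) is not None


def format_go_id(number: int) -> str:
    """Zero-pad a plain number to the seven-digit GO form."""
    if not 0 <= number <= 9_999_999:
        raise ValueError("GO numbers must fit in seven digits")
    return f"GO:{number:07d}"
```

For example, `is_valid_go_id("GO:0005886")` passes, while `"GO:5886"` fails because the leading zeros are part of the ID.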

Page 15: CACAO Training

Molecular Function

• activities or “jobs” of a gene product

GO:0004347 hexokinase activity

From PMID:9341134, rndsystems.com

GO:0016301 kinase activity

Page 16: CACAO Training

Biological Process

• a commonly recognized series of events

GO:0051301 cell division

From ridge.icu.ac.jp, edtech.clas.pdx.edu, scielosp.org

GO:0006351 transcription, DNA-dependent

GO:0009405 pathogenesis

Page 17: CACAO Training

Cellular Component

• where a gene product acts

From visualphotos.com, epmm.group.shef.ac.uk, http://www.cellsignal.com/products/2415.html

GO:0005739 mitochondrion

GO:0009274 peptidoglycan-based cell wall

GO:0005840 ribosome

Page 18: CACAO Training

Where can you search for GO terms?

- GONUTS: http://gowiki.tamu.edu
- QuickGO: http://www.ebi.ac.uk/QuickGO
- AmiGO: http://amigo.geneontology.org

Page 19: CACAO Training
Page 20: CACAO Training
Page 21: CACAO Training
Page 22: CACAO Training
Page 23: CACAO Training

What do you actually need once you have found the correct term?

GO:0004713

Page 24: CACAO Training

Functional Annotation

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

• Specific format = GO (Gene Ontology) Annotation

• Peer-reviewed paper

Page 25: CACAO Training

Finding a scientific paper

• Has to be a scientific paper with experimental data in it. (Anything else is a valid reason to challenge!!)

• No review articles, no books, no textbooks, no Wikipedia articles, no class notes…

• You will need the PMID number

22110029

Page 26: CACAO Training

Functional Annotation

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein

Page 27: CACAO Training

What can you annotate? Proteins.

• Search PubMed for papers on a specific topic or protein or GO term
• Search UniProt for something interesting (e.g. allergen) or a protein of interest (e.g. PcnB)
• Check the references in the paper you are currently reading

No matter what, you will need to find the protein’s accession on UniProt (http://uniprot.org)

Use that accession to make a page for that protein on GONUTS (http://gowiki.tamu.edu)

Add your GO annotations to the protein’s page on GONUTS
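Since each step above depends on having a UniProt accession, a quick format check can catch typos before you try to make a GONUTS page. The sketch below uses the accession pattern published in the UniProt help pages; the function name is hypothetical, and a match only means the text looks like an accession, not that the entry exists.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Format check only: trims whitespace and upper-cases before matching."""
    candidate = text.strip().upper()
    return UNIPROT_ACCESSION.fullmatch(candidate) is not None
```

A gene or protein name such as PcnB does not pass this check, which is a reminder that the name alone is not enough: look up the accession on UniProt first.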

Page 28: CACAO Training

Why do you need an accession from UniProt (http://www.uniprot.org)?

1. UniProt is not editable by the community, but GONUTS is.
2. GONUTS can make a page that has the annotations from UniProt for any protein using its UniProt accession.
3. Correct & complete annotations at the end of the competition will be submitted back to UniProt.

Page 29: CACAO Training

How do you make a new protein page in GONUTS?

• GoPageMaker will:
– Check if the page exists in GONUTS & take you there if it does.
– Make a page if it does not exist in GONUTS already & pull all of the annotations from UniProt into a table that you can edit.

• Make as many protein pages as you would like!
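The check-then-create behavior described for GoPageMaker can be pictured with a small in-memory sketch. This is not GONUTS code: the dictionary standing in for the wiki, the `fetch_annotations` callback standing in for the UniProt import, and all names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Page = Dict[str, object]


def go_page_maker(
    accession: str,
    pages: Dict[str, Page],
    fetch_annotations: Callable[[str], List[dict]],
) -> Tuple[Page, bool]:
    """Return (page, created). Reuse an existing page or build a new one."""
    # Step 1: if the page already exists, go straight to it.
    if accession in pages:
        return pages[accession], False
    # Step 2: otherwise create it and copy in the existing annotations
    # so they appear in an editable table.
    page: Page = {
        "accession": accession,
        "annotations": list(fetch_annotations(accession)),
    }
    pages[accession] = page
    return page, True
```

Calling it twice with the same accession returns the same page the second time, which is why making a page that already exists does no harm.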

Page 30: CACAO Training

Annotations

edit table

Page 31: CACAO Training

Functional Annotation

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein

Page 32: CACAO Training

Annotations

edit table

Page 33: CACAO Training

Form for your annotation (when you edit the table)

Page 34: CACAO Training

4 REQUIRED parts of EVERY GO annotation

1. GO term
2. Evidence code
3. Reference
4. Notes (about evidence)

Page 35: CACAO Training

Summary of Evidence Codes for CACAO

Evidence codes describe the type of work or analysis done by the authors

• IDA: Inferred from Direct Assay
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• ISO: Inferred from Sequence Orthology
• ISA: Inferred from Sequence Alignment
• ISM: Inferred from Sequence Model
• IGC: Inferred from Genomic Context

If it’s not one of these 7, your annotation is incorrect!!!

http://gowiki.tamu.edu/wiki/index.php/evidence_codes
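To make the "one of these 7" rule concrete, here is a hedged sketch of a self-check you could run on an annotation before submitting it. The record layout, the field names, and the `PMID:` reference format are assumptions for illustration; the real GONUTS form may differ.

```python
import re
from dataclasses import dataclass
from typing import List

# The seven evidence codes accepted in CACAO, per the list above.
ALLOWED_EVIDENCE_CODES = {"IDA", "IMP", "IGI", "ISO", "ISA", "ISM", "IGC"}


@dataclass
class GoAnnotation:
    go_id: str          # e.g. "GO:0004713"
    evidence_code: str  # one of ALLOWED_EVIDENCE_CODES
    reference: str      # assumed format: "PMID:" plus digits
    notes: str          # where in the paper the evidence is


def find_problems(annotation: GoAnnotation) -> List[str]:
    """List the reasons an annotation could be challenged on format alone."""
    problems = []
    if not re.fullmatch(r"GO:[0-9]{7}", annotation.go_id):
        problems.append("GO ID is not 'GO:' plus seven digits")
    if annotation.evidence_code not in ALLOWED_EVIDENCE_CODES:
        problems.append("evidence code is not one of the 7 CACAO codes")
    if not re.fullmatch(r"PMID:[0-9]+", annotation.reference):
        problems.append("reference is not a PMID")
    if not annotation.notes.strip():
        problems.append("notes about the evidence are missing")
    return problems
```

An empty list only means the fields are well formed. Whether the GO term and evidence code actually match what the paper shows still has to be judged by a person.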

Page 36: CACAO Training

Functional Annotation

Functional Annotation: a note in a specific format that is made based on evidence in a peer-reviewed paper about the attributes of a protein

• Specific format = GO (Gene Ontology) Annotation
• Peer-reviewed paper
• Protein
• Evidence code

Page 37: CACAO Training

4 REQUIRED parts of EVERY GO annotation

1. GO term
2. Evidence code
3. Reference
4. Notes (about evidence)

Page 38: CACAO Training

2 other parts that may rarely be required…

With/From

Qualifier

Page 39: CACAO Training

How is CACAO scored? Rounds

• Points for a complete AND correct annotation (normally 1 week/round, today = 25 mins)
– 4 necessary parts
– May be additional parts
– NOTE: We will take away points if the annotation is not correct when assessed by an experienced CACAO biocurator

• Challenges are used to steal points for incorrect &/or incomplete annotations (normally 1 week/round, today = 20 mins)
– Identify a problem
– Suggest correct alternative

• Refinements can be entered by any team (during any challenge week)

Page 40: CACAO Training

Scoreboard & Challenges

http://gowiki.tamu.edu/wiki/index.php/Category:ASM_JGI_challenge

Page 41: CACAO Training

Team & Individual Pages

challenge

Page 42: CACAO Training

Challenges

1. Enter the reason for your challenge here. - (i.e. What’s wrong)

2. Provide the fix(es) for it.

Page 43: CACAO Training

Annotation discussion (aka argument)

Page 44: CACAO Training

• UniProt – http://uniprot.org
– Find your protein(s) here (UniProt accession required)

• PubMed – http://pubmed.org
– Find your papers about the protein’s attributes (molecular function, biological process, cellular component)

• GONUTS – http://gowiki.tamu.edu
– Search for GO terms
– Make page for your protein on GONUTS (using UniProt accession)
– Add your annotation to the protein’s Annotation table during first (Annotation) week of any round
– Review and challenge competitors’ annotations during the second (challenge) week of any round

Page 45: CACAO Training

ASM-JGI Competition!

• You now have 25 mins to:
– Use the assigned paper for your group and …
– Find the correct UniProt accession
– Make the page for the protein on GONUTS
– Make at least one annotation

• You will have 20 mins to challenge other teams’ annotations
– What fields are wrong & why?!