Importing Community annotations into VectorBase. Aims Provide the VectorBase community with tools...

Importing Community annotations into VectorBase

Page 1

Importing Community annotations into VectorBase

Page 2

Aims

• Provide the VectorBase community with tools for improving genome annotation.

• Must have low entry requirements, be scalable and (relatively) simple to use

Page 3

Genome annotation

• First-pass genome annotation is almost always based on “automatic” computational approaches

  • ab initio

  • Similarity-based

    • Transcript (ESTs, RNAseq)

    • Protein (nr protein database)

Page 4

Genome annotation - building a pipeline

• Genome assembly

• Map repeats

• Genefinding

  • Protein-coding genes (map transcripts, map peptides)

  • nc-RNAs

• Functional annotation

• Submission to archival databases (release)

Page 5

Current VectorBase annotation pipeline

• MAKER-based automatic annotation

  • Includes SNAP training and ab initio prediction

  • RNAseq-based transcript similarity prediction

  • Taxonomically constrained peptide similarity prediction

  • Two rounds of prediction refinement; the final round includes all peptide similarity

• Community annotation phase

  • Capture gene structure changes

  • Metadata associated with locus (symbol, description, citation)

• Submission to INSDC, propagation to UniProt

• Presentation through VectorBase

[Diagram labels: Start; 1.0 set (automatic); 1.1 set (published)]

Page 6

Processing submissions

• Four phases

  • Capture

  • Moderation

  • Storage

  • Integration

Page 7

Capture: Community annotation decision tree

Page 8

Community annotation decision tree

Page 9

Tool of choice: WebApollo

• Web-based

• Eliminates the main drawback of the deprecated CAP system: GFF3 format validation
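To illustrate the kind of GFF3 format checking that file-based submissions require, here is a minimal sketch in Python. It is not the CAP system's validator; the function name `check_gff3_line` and the example records are hypothetical, and only a few rules from the GFF3 specification are covered (nine tab-separated columns, start/end ordering, strand and phase values). Comment and directive lines are assumed to have been filtered out already.

```python
VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2", "."}


def check_gff3_line(line):
    """Return a list of problems found in one GFF3 feature line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]

    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    problems = []

    # Coordinates are 1-based integers with start <= end.
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be positive integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("start must be >= 1 and <= end")

    if strand not in VALID_STRANDS:
        problems.append(f"invalid strand {strand!r}")
    if phase not in VALID_PHASES:
        problems.append(f"invalid phase {phase!r}")
    # The GFF3 specification requires a phase for every CDS feature.
    if ftype == "CDS" and phase == ".":
        problems.append("CDS features require a phase of 0, 1 or 2")

    return problems


# Hypothetical records, for illustration only.
good = "2L\texample\tgene\t100\t900\t.\t+\t.\tID=gene1"
bad = "2L\texample\texon\t900\t100\t.\tx\t.\tID=exon1"
print(check_gff3_line(good))
print(check_gff3_line(bad))
```

A web-based editor avoids this class of error because users never write the file by hand.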

Page 10

WebApollo example

Page 11

Community annotation decision tree

Page 12

Community annotation decision tree

Page 13

Tool of choice: Web forms

Page 14

Moderation & Storage

• Gene metadata captured through forms to spreadsheets

• Batch submissions use similar spreadsheet format
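A moderation pass over such a spreadsheet could look roughly like the sketch below. The column names (`gene_id`, `symbol`, `description`, `citation`), the tab-delimited layout and the acceptance rules are assumptions made for illustration; the slides only state that symbol, description and citation are captured per locus.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet layout is not given in the slides.
METADATA_COLUMNS = ("symbol", "description", "citation")


def moderate_rows(tsv_text):
    """Split submitted rows into (accepted, rejected) lists.

    A row is accepted if it names a gene and supplies at least one
    metadata value. Rejected rows are paired with a reason.
    """
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for raw in reader:
        # Normalise whitespace; ignore surplus cells beyond the header.
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
        if not row.get("gene_id"):
            rejected.append((row, "missing gene_id"))
        elif not any(row.get(col) for col in METADATA_COLUMNS):
            rejected.append((row, "no symbol, description or citation supplied"))
        else:
            accepted.append(row)
    return accepted, rejected


# Placeholder identifiers, for illustration only.
submission = (
    "gene_id\tsymbol\tdescription\tcitation\n"
    "GENE0001\tabc1\t\t\n"
    "\tabc2\tsome protein\t\n"
    "GENE0003\t\t\t\n"
)
ok, bad = moderate_rows(submission)
print(len(ok), [reason for _, reason in bad])
```

Using the same checks for single form entries and batch submissions is one benefit of keeping both in a similar spreadsheet format.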

Page 15

Integration: Dataflow for ‘patch’ build

[Dataflow diagram labels: CAP GFF3; WebApollo; Reference core; Updated geneset; TXT; Patch; Users; Stable IDs; Reports; Updated core; IDs; CAP; Release core; Google Fusion Table; Google Form; Xrefs; Metadata; Release; Commit]

Page 16

Presentation of community annotation

Page 17

Usage (as of 2015-03-30)

• 31 WebApollo instances (Organisms)

• 3,407 gene models

• Gene metadata (protein-coding loci)

  • 4,987 gene symbols

  • 512 gene synonyms

  • 57,878 gene descriptions

  • 910 loci citations from 208 publications

Page 18

Supplementing annotations

• Community jamborees

  • ‘Standard’ improvement (e.g. sandfly, snail communities)

  • Glossina community (e.g. March 2015, Kenya)

• VectorBase

  • Default Xref run includes symbol/description assignment via UniProt

  • Projection of gene descriptions via orthology from key marker species (e.g. An. gambiae). Due to be deployed for the June (VB-2015-06) release.

  • Supplemental data from genome papers (e.g. 16 Anopheles spp., Musca)
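The idea of projecting gene descriptions via orthology can be sketched as follows. This is an illustration only, not the VectorBase implementation: the function name, the dictionary-based inputs, the restriction to one-to-one orthologs and the rule never to overwrite an existing description are all assumptions.

```python
def project_descriptions(target_genes, orthologs, reference_descriptions):
    """Copy descriptions from reference genes to target genes that lack one.

    target_genes: dict of target gene ID -> existing description ("" if none)
    orthologs: dict of target gene ID -> list of reference gene IDs
    reference_descriptions: dict of reference gene ID -> description
    """
    projected = {}
    for gene_id, description in target_genes.items():
        if description:
            continue  # assumption: never overwrite an existing description
        refs = orthologs.get(gene_id, [])
        if len(refs) != 1:
            continue  # assumption: project only across one-to-one orthologs
        ref_description = reference_descriptions.get(refs[0], "")
        if ref_description:
            projected[gene_id] = ref_description
    return projected


# Placeholder identifiers and descriptions, for illustration only.
targets = {"T1": "", "T2": "kinase", "T3": "", "T4": ""}
ortho = {"T1": ["R1"], "T2": ["R2"], "T3": ["R3", "R4"]}
reference = {"R1": "odorant receptor", "R2": "x", "R3": "y", "R4": "z"}
print(project_descriptions(targets, ortho, reference))
```

Restricting projection to one-to-one relationships is a conservative choice that trades coverage for fewer incorrect descriptions.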

Page 19

Page 20

Deprecated CAP system example