ParaSite Ensembl Genomes and UCSC Assembly Hub WormBase Workshop International Worm Meeting 2015

Post on 14-Aug-2015


WormBase beyond www.wormbase.org

WormBase ParaSite

• New home for parasitic worm genomes in WormBase

UCSC WormBase assembly hub

• View current WormBase data on the UCSC genome browser

WormBase ParaSite

Motivation

• Many (100s of) parasitic nematode genome sequences available or imminent

• Helminth genomes scattered across a number of resources

• Much of the data is “draft” quality

Introducing WormBase ParaSite (parasite.wormbase.org)

• Consistent, integrated access to hundreds of parasitic nematode draft genomes

• Encompass all parasitic worms (i.e. nematodes and flatworms)

WormBase ParaSite genomes (v2)

Nematodes

• 63 species (70 genomes)

• Clade I: 7 species (9 genomes)

• Clade III: 22 species (24 genomes)

• Clade IV: 16 species (16 genomes)

• Clade V: 18 species (21 genomes)

• Largest: Teladorsagia circumcincta (700 Mb)

• Smallest: Parastrongyloides trichosuri (42 Mb)

Platyhelminthes

• 25 species (26 genomes)

• Cestodes: 12 species

• Trematodes: 11 species

• Other: 2 species

• Largest: Spirometra erinaceieuropaei (1250 Mb)

• Smallest: Hydatigera taeniaeformis (100 Mb)

Orthologs and paralogs

• Ensembl “Compara” protein-tree pipeline

• 118 genomes

• 9 additional nematode genomes (free living)

• 13 comparator genomes, including human, mouse, zebrafish

• ~150,000 protein multiple alignments

• ~1000 CPU days
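The 118-genome total appears to be the sum of the genome counts on the preceding slides. The short check below, a sketch that assumes the parenthesised clade numbers are genome counts and that the Compara run included every ParaSite genome, reproduces the stated totals. The variable names are illustrative only.

```python
# (species, genomes) per nematode clade, as listed on the genomes slide
nematode_clades = {"I": (7, 9), "III": (22, 24), "IV": (16, 16), "V": (18, 21)}
nematode_species = sum(s for s, _ in nematode_clades.values())
nematode_genomes = sum(g for _, g in nematode_clades.values())

# Platyhelminth species per group; the slide gives 26 genomes overall
flatworm_species = sum({"Cestodes": 12, "Trematodes": 11, "Other": 2}.values())
flatworm_genomes = 26

# Extra genomes added for the Compara protein-tree run
free_living_nematodes = 9
comparators = 13

# Assumption: every ParaSite genome took part in the Compara run
compara_total = (nematode_genomes + flatworm_genomes
                 + free_living_nematodes + comparators)

print(nematode_species, nematode_genomes, flatworm_species, compara_total)
```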

http://parasite.wormbase.org

ParaSite Downloads

ftp://ftp.wormbase.org/pub/wormbase/parasite

• Consistent file naming and data organisation

• Genome project (NCBI BioProject) disambiguation

• Files for each genome

• Genome fasta(s)

• Protein fasta

• Transcript fasta

• Annotation GFF3
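As an illustration of working with the annotation files, the sketch below reads GFF3 feature lines (nine tab-separated columns, with `key=value` attributes separated by semicolons) and counts features by type. The sample records, scaffold name, and IDs are hypothetical and are not taken from any ParaSite file; the function names are also made up for this example.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3_line(line):
    """Return a dict for one GFF3 feature line, or None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "strand": strand, "attributes": attributes,
    }


def count_feature_types(lines):
    """Count features by the value of the GFF3 'type' column."""
    features = (parse_gff3_line(line) for line in lines)
    return Counter(f["type"] for f in features if f is not None)


# Hypothetical records, for illustration only
SAMPLE = """##gff-version 3
scaffold1\texample\tgene\t100\t900\t.\t+\t.\tID=gene:g1;Name=hypothetical-1
scaffold1\texample\tmRNA\t100\t900\t.\t+\t.\tID=transcript:t1;Parent=gene:g1
scaffold1\texample\texon\t100\t400\t.\t+\t.\tID=exon:e1;Parent=transcript:t1
scaffold1\texample\texon\t600\t900\t.\t+\t.\tID=exon:e2;Parent=transcript:t1
"""

counts = count_feature_types(SAMPLE.splitlines())
print(dict(counts))
```

The sketch does not handle an embedded `##FASTA` section; a real file containing one would need the loop to stop at that directive.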

ParaSite Mart

• Table-based data-mining tool

• Like WormMine, but different interface

• Complementary to WormMine

• Less depth for C. elegans, but…

• Comprehensive species set (all nematode genomes)

• Some additional functionality

ParaSite Mart - orthologs

ParaSite Mart – sequence extraction

The UCSC WormBase genome Hub

Background

● Many researchers like the UCSC genome browser

○ Familiar interface

○ Comparative genomics (alignments / conservation)

● Worm data at UCSC is 5 years out of date

UCSC hubs

● A new mechanism for remote hosting of collections of genome browser tracks

● Emerging standard for cross-browser compatibility

● The WormBase hub

○ View up-to-date WormBase data on UCSC!

○ View some data not viewable anywhere else: genomic alignments
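A hub is attached to the UCSC browser by pointing it at the hub's `hub.txt` file. The sketch below builds such a link using the `hubUrl` query parameter of the UCSC `hgGateway` page. The hub address is the production hub URL given later in these slides; whether that address is still served is an assumption, and the function name is made up for this example.

```python
from urllib.parse import urlencode

# Production hub URL as listed in these slides (may no longer be live)
PRODUCTION_HUB = (
    "http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/"
    "current-production-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt"
)


def ucsc_hub_link(hub_url, gateway="https://genome.ucsc.edu/cgi-bin/hgGateway"):
    """Return a UCSC browser link that attaches the hub described by hub_url."""
    return f"{gateway}?{urlencode({'hubUrl': hub_url})}"


link = ucsc_hub_link(PRODUCTION_HUB)
print(link)
```

Opening the printed link in a web browser should offer the hub's genomes in the UCSC assembly selector, provided the hub file is reachable.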

Nematode genomic alignments

Progressive Cactus (Nguyen et al, 2014)

• New tool (UCSC) for genome multiple alignments (100s of genomes)

• Creates “virtual” ancestor genomes

• Output = HAL file (HDF5 database)

WormBase cactus alignments

• 29 nematode genomes (more in future)

• Viewable on UCSC browser (“SNAKE” tracks)

UCSC

New dropdowns:

● Nematodes

● core genomes

● WormBase assembly identifiers

Search:

● seq. names

● WBGeneIDs

● gene symbols

UCSC

Release tracks:

● transcripts

● current + reference

● pseudogenes

● ncRNAs

● mRNA alignments

● WormBase links

Assembly tracks:

● repeats

● conservation

● comparative hub

EnsEMBL (and friends)

Development Hub

http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-development-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt

Production Hub

http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-production-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt
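Each of these URLs points at a hub description file. In the UCSC track hub format this is a plain-text file of `key value` lines (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`), with `genomesFile` given relative to the hub file itself. The sketch below parses such a file and resolves the genomes file location. The sample file contents are hypothetical and do not reproduce the real `parasite_hub.txt`; the function name is also made up for this example.

```python
from urllib.parse import urljoin

HUB_URL = (
    "http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/"
    "current-production-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt"
)

# Hypothetical hub file contents, for illustration only
SAMPLE_HUB_TXT = """\
hub example_hub
shortLabel Example hub
longLabel A hypothetical hub used only for this sketch
genomesFile genomes.txt
email someone@example.org
"""


def parse_hub_txt(text):
    """Parse 'key value' lines of a hub description file into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


settings = parse_hub_txt(SAMPLE_HUB_TXT)
# genomesFile is relative to the location of the hub file
genomes_url = urljoin(HUB_URL, settings["genomesFile"])
print(settings["shortLabel"], genomes_url)
```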

current: metazoa.ensembl.org

coming soon: parasite.wormbase.org, ensembl.org

configure tracks

… and more

GBrowse

BioDalliance

JBrowse

Summary

WormBase ParaSite

• parasite.wormbase.org

• Poster 952C (Saturday)

UCSC WormBase assembly hub

• ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-production-release/COMPARATIVE_ANALYSIS/hub/hub.txt

• blog.wormbase.org

More information

• help@wormbase.org

• Come and see us for a tutorial!