ParaSite Ensembl Genomes and UCSC Assembly Hub WormBase Workshop International Worm Meeting 2015

WormBase beyond www.wormbase.org

WormBase ParaSite

• New home for parasitic worm genomes in WormBase

UCSC WormBase assembly hub

• View current WormBase data on the UCSC genome browser

WormBase ParaSite

Motivation

• Hundreds of parasitic nematode genome sequences are available or imminent

• Helminth genomes scattered across a number of resources

• Much of the data is “draft” quality

Introducing WormBase ParaSite (parasite.wormbase.org)

• Consistent, integrated access to hundreds of parasitic nematode draft genomes

• Encompass all parasitic worms (i.e. nematodes and flatworms)

WormBase ParaSite genomes (v2)

Nematodes

• 63 species (70 genomes)

○ Clade I – 7 species (9 genomes)

○ Clade III – 22 species (24 genomes)

○ Clade IV – 16 species (16 genomes)

○ Clade V – 18 species (21 genomes)

• Largest: Teladorsagia circumcincta (700 Mb)

• Smallest: Parastrongyloides trichosuri (42 Mb)

Platyhelminthes

• 25 species (26 genomes)

○ Cestodes – 12 species

○ Trematodes – 11 species

○ Other – 2 species

• Largest: Spirometra erinaceieuropaei (1250 Mb)

• Smallest: Hydatigera taeniaeformis (100 Mb)

Orthologs and paralogs

• Ensembl “Compara” protein-tree pipeline

• 118 genomes

○ 9 additional nematode genomes (free-living)

○ 13 comparator genomes, including human, mouse and zebrafish

• ~150,000 protein multiple alignments

• ~1000 CPU days
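
The counts quoted on this slide can be cross-checked with a quick tally. The sketch below uses only the figures given above and assumes that the 118 genomes in the Compara run are the ParaSite genomes plus the free-living and comparator genomes.

```python
# Species and genome counts as quoted on the slide (ParaSite v2).
# Each entry is (species, genomes).
nematode_clades = {
    "Clade I": (7, 9),
    "Clade III": (22, 24),
    "Clade IV": (16, 16),
    "Clade V": (18, 21),
}
nematode_species = sum(species for species, _ in nematode_clades.values())
nematode_genomes = sum(genomes for _, genomes in nematode_clades.values())

# Cestodes + trematodes + other.
platyhelminth_species = 12 + 11 + 2
platyhelminth_genomes = 26

free_living_nematodes = 9
comparators = 13

# Assumption: the Compara total is the sum of all four groups.
compara_total = (
    nematode_genomes + platyhelminth_genomes + free_living_nematodes + comparators
)

print(nematode_species, nematode_genomes)  # 63 70
print(platyhelminth_species)               # 25
print(compara_total)                       # 118
```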

http://parasite.wormbase.org

ParaSite Downloads

ftp://ftp.wormbase.org/pub/wormbase/parasite

• Consistent file naming and data organisation

• Genome project (NCBI BioProject) disambiguation

• Files for each genome

○ Genome FASTA(s)

○ Protein FASTA

○ Transcript FASTA

○ Annotation GFF3
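
As a rough illustration of working with the annotation files, here is a minimal GFF3 line parser. It relies only on the general GFF3 layout (nine tab-separated columns, `key=value` attributes separated by `;`). The example line and its identifiers are invented for illustration and are not taken from a ParaSite file.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines and for comment/directive lines.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 uses URL-style percent escapes in attribute values.
            attributes[key.strip()] = unquote(value.strip())
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# Hypothetical feature line; the names are placeholders.
example = "scaffold1\tWormBase\tgene\t100\t900\t.\t+\t.\tID=gene:g1;Name=example-1"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
```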

ParaSite Mart

• Table-based data-mining tool

• Like WormMine, but with a different interface

• Complementary to WormMine

○ Less depth for C. elegans, but…

○ Comprehensive species set (all nematode genomes)

○ Some additional functionality
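
Mart-style tools are commonly queried programmatically with a small XML document that names a dataset, its filters and the output columns. The sketch below only builds such a document, assuming ParaSite Mart accepts the usual BioMart XML query layout. The dataset, filter and attribute names are hypothetical placeholders; the real names have to be looked up in the Mart interface.

```python
import xml.etree.ElementTree as ET


def build_mart_query(dataset, filters, attributes, schema="default"):
    """Build a BioMart-style XML query string.

    dataset, filters and attributes are supplied by the caller; nothing
    here knows which names a particular Mart actually offers.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": schema,
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


# All names below are made up for illustration.
xml_query = build_mart_query(
    dataset="example_gene_dataset",
    filters={"example_species_filter": "example_species"},
    attributes=["gene_id", "ortholog_gene_id"],
)
print(xml_query)
```

The resulting string would typically be sent as the `query` parameter of a request to the Mart service endpoint.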

ParaSite Mart - orthologs

ParaSite Mart – sequence extraction

The UCSC WormBase genome Hub

Background

● Many researchers like the UCSC genome browser

○ Familiar interface

○ Comparative genomics (alignments / conservation)

● Worm data at UCSC is 5 years out of date

UCSC hubs

● A new mechanism for remote hosting of collections of genome browser tracks

● Emerging standard for cross-browser compatibility

● The WormBase hub

○ View up-to-date WormBase data on UCSC!

○ View some data not viewable anywhere else: genomic alignments
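
A hub is described by a small plain-text file (`hub.txt`) of `key value` lines that points the browser at the remotely hosted genomes and tracks. The sketch below parses such a file and checks for the fields a UCSC hub file normally carries. The example content is invented and is not the WormBase hub file.

```python
REQUIRED_HUB_FIELDS = ("hub", "shortLabel", "longLabel", "genomesFile", "email")


def parse_hub_txt(text):
    """Parse 'key value' lines from a hub.txt into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The key is the first word; the rest of the line is the value.
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


def missing_hub_fields(settings):
    """Return the required fields that are absent from the parsed settings."""
    return [field for field in REQUIRED_HUB_FIELDS if field not in settings]


# Made-up hub description, for illustration only.
example_hub = """\
hub exampleHub
shortLabel Example hub
longLabel A made-up hub description for illustration
genomesFile genomes.txt
email someone@example.org
"""

hub_settings = parse_hub_txt(example_hub)
print(hub_settings["genomesFile"], missing_hub_fields(hub_settings))
```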

Nematode genomic alignments

Progressive Cactus (Nguyen et al., 2014)

• New tool (UCSC) for multiple genome alignments (100s of genomes)

• Creates “virtual” ancestor genomes

• Output = HAL file (HDF5 database)

WormBase cactus alignments

• 29 nematode genomes (more in future)

• Viewable on UCSC browser (“SNAKE” tracks)
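
To illustrate the “virtual ancestor” idea, here is a toy walk over a guide tree in which every internal node becomes a named ancestor that is resolved after its children. This shows only the progressive ordering and is not the Cactus algorithm. The genome names and the tree shape are hypothetical.

```python
def progressive_order(tree):
    """Return alignment steps for a guide tree of nested tuples.

    Leaves are genome names (strings); each internal node becomes a
    virtual ancestor. Steps are listed in post-order, so an ancestor
    appears only after everything beneath it.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return node
        children = [visit(child) for child in node]
        name = f"Anc{len(steps)}"
        steps.append((name, children))
        return name

    visit(tree)
    return steps


# Hypothetical four-genome guide tree.
guide_tree = (("genomeA", "genomeB"), ("genomeC", "genomeD"))
alignment_steps = progressive_order(guide_tree)
for ancestor, children in alignment_steps:
    print(ancestor, "<-", children)
```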

UCSC

New Dropdowns:

● Nematodes

● core genomes

● WormBase assembly identifiers

Search:

● seq. names

● WBGeneIDs

● gene symbols
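
As a small aside on the search terms, a script preparing queries might want to tell gene IDs apart from other terms. The sketch below assumes the usual WBGene identifier shape (`WBGene` followed by eight digits). The helper name and the sample terms are illustrative only and say nothing about how the hub search itself is implemented.

```python
import re

# Assumed identifier shape: "WBGene" plus exactly eight digits.
WBGENE_RE = re.compile(r"^WBGene\d{8}$")


def looks_like_wbgene_id(term):
    """Return True if the term has the shape of a WBGene identifier."""
    return bool(WBGENE_RE.match(term.strip()))


# Sample terms chosen only to exercise the pattern.
for term in ["WBGene00000001", "wbgene00000001", "WBGene123", "unc-22"]:
    print(term, looks_like_wbgene_id(term))
```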

UCSC

Release Tracks:

● transcripts

● current + reference

● pseudogenes

● ncRNAs

● mRNA alignments

● WormBase links

Assembly Tracks:

● repeats

● conservation

● comparative hub

EnsEMBL (and friends)

Development Hub

http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-development-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt

Production Hub

http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-production-release/COMPARATIVE_ANALYSIS/hub/parasite_hub.txt

Current: metazoa.ensembl.org

Coming soon: parasite.wormbase.org, ensembl.org

configure tracks

… and more

GBrowse

Biodalliance

JBrowse

Summary

WormBase ParaSite

• parasite.wormbase.org

• Poster 952C (Saturday)

UCSC WormBase assembly hub

• ftp.ebi.ac.uk/pub/databases/wormbase/releases/current-production-release/COMPARATIVE_ANALYSIS/hub/hub.txt

• blog.wormbase.org
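
One way to share the hub is a browser link that carries the hub file location in the `hubUrl` parameter of the UCSC `hgTracks` page. The sketch below builds such a link. The `http://` scheme in front of the hub path is an assumption (the slide gives the path without one), and the function name is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlparse

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_hub_link(hub_url, genome=None):
    """Build a UCSC browser link that attaches the hub at hub_url."""
    params = {"hubUrl": hub_url}
    if genome:
        # Optional: the assembly name to open, as defined by the hub.
        params["genome"] = genome
    return f"{UCSC_HGTRACKS}?{urlencode(params)}"


# Hub path as given in the summary; the scheme is assumed.
hub = (
    "http://ftp.ebi.ac.uk/pub/databases/wormbase/releases/"
    "current-production-release/COMPARATIVE_ANALYSIS/hub/hub.txt"
)
link = ucsc_hub_link(hub)

# Round-trip check: the hub URL survives percent-encoding intact.
recovered = parse_qs(urlparse(link).query)["hubUrl"][0]
print(link)
print(recovered == hub)
```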

More information

• [email protected]

• Come and see us for a tutorial!