Kipper: Sequence database versioning for Galaxy bioinformatics servers

Page 1: Kipper: Sequence database versioning for Galaxy bioinformatics servers

KIPPER: SEQUENCE DATABASE VERSIONING FOR GALAXY BIOINFORMATICS SERVERS

Damion Dooley

Hsiao Lab, BC Public Health Microbiology & Reference Laboratory and UBC Department of Pathology, Vancouver, Canada

https://github.com/Public-Health-Bioinformatics/kipper

https://github.com/Public-Health-Bioinformatics/versioned_data

Page 2: Kipper: Sequence database versioning for Galaxy bioinformatics servers

How to recreate sequencing analysis?

Retrieve or redo sequencing data

Get right software versions

Get databases as they appeared on a certain date
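As an illustration of the third point, picking the database version that was current on a given date can be sketched as a lookup over a sorted list of publication dates. The function name `version_on`, the `(date, version id)` list layout, and the dates used in the note below are assumptions made for this sketch, not Kipper's actual metadata format.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple


def version_on(versions: List[Tuple[date, int]], when: date) -> Optional[int]:
    """Return the id of the newest version published on or before `when`.

    `versions` is a hypothetical list of (publication date, version id)
    pairs, sorted by date. Returns None if nothing had been published yet.
    """
    dates = [published for published, _ in versions]
    # bisect_right gives the count of versions published on or before `when`.
    pos = bisect_right(dates, when)
    return versions[pos - 1][1] if pos else None
```

With made-up versions published on 2014-01-10, 2014-06-02 and 2015-01-20, an analysis run on 2014-07-01 would map to the second version.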

Page 3: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Nice database vs. juggernaut

Periodically published

Varying ability to download past versions

RDP RNA v10.1 – 11.4 (5.5 GB)

Silva RNA v89 – 119 (2.6 GB)

Uniref (~50 versions, ~35 GB latest)

Pseudo-versioned

Version stated but no way to get past ones?

No client software for insert/delete diff

NCBI nt (58 GB)

NCBI nr (78 GB)

Ancient juggernaut supporting immortal database and crushing unwary sys admins in its path
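The insert/delete diff idea mentioned above can be sketched in a few lines: each record carries the version in which it appeared and the version in which it disappeared, so any past version can be rebuilt by filtering. This is a minimal sketch under assumed names (`Record`, `add_version`, `get_version`) and an assumed record layout. It is not Kipper's actual file format or code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout:
# (created_version, deleted_version or None, key, value)
Record = Tuple[int, Optional[int], str, str]


def add_version(store: List[Record], version: int, snapshot: Dict[str, str]) -> None:
    """Fold a full snapshot (key -> sequence) into the store as inserts and deletes."""
    # Index of every record that is still live (not yet deleted).
    live = {rec[2]: i for i, rec in enumerate(store) if rec[1] is None}
    for key, idx in live.items():
        created, _, _, value = store[idx]
        if snapshot.get(key) != value:
            # Entry vanished or changed: close its lifetime at this version.
            store[idx] = (created, version, key, value)
    for key, value in snapshot.items():
        idx = live.get(key)
        if idx is None or store[idx][1] is not None:
            # New or changed entry: open a new lifetime.
            store.append((version, None, key, value))


def get_version(store: List[Record], version: int) -> Dict[str, str]:
    """Rebuild the snapshot as it looked at `version`."""
    return {
        key: value
        for created, deleted, key, value in store
        if created <= version and (deleted is None or version < deleted)
    }
```

Unchanged entries are stored once no matter how many versions they survive, which is why this kind of layout suits large databases where each release changes only part of the content.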

Page 4: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Kipper – fetch!

What is a poor server admin to do?

Page 5: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Kipper data store

Metadata file

Page 6: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Kipper data store

Volume file(s)

Page 7: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Version listing

• Add new version:

$ Kipper rdp_rna -i download.fasta -o .

• Retrieve a version by id:

$ Kipper rdp_rna -e -n 11

• Kipper is a Python script

$ Kipper rdp_rna

Page 8: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Galaxy - version retrieval

Page 9: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Version retrieval

Page 10: Kipper: Sequence database versioning for Galaxy bioinformatics servers

Acknowledgements

This work was supported by Genome Canada / Genome BC Grant “A Federated Bioinformatics Platform for Public Health Microbial Genomics” to Fiona Brinkman, Gary Van Domselaar and William Hsiao. More information about the IRIDA project (Integrated Rapid Infectious Disease Analysis) can be found at http://www.irida.ca