Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility


Rob Davidson at the G3 (Great GigaScience & Galaxy) Workshop: Open Source - Tools for Reproducibility. University of Melbourne, 19th September 2014

Transcript of Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Page 1: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Tools for:

Open-Source Open-Data

Rob L Davidson
about.me/rob.davidson
www.slideshare.net/RobertDavidson6/g3-talk-rld2

Page 2: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

The problem

reproducibility.cs.arizona.edu

• 515 papers (429 conf, 86 journal)
• <30% reproducible

Page 3: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

The problem

reproducibility.cs.arizona.edu

Page 4: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

The Cause

• Stodden 2010
– 638 registrants at NIPS
• 30% share code
• 20% share data

http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf

Page 5: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Publishers must provide!

• Hosting
• Curating

Citations for everything: data, tools + workflows

Page 6: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Tools for Reproducibility

• Data: GigaDB
• Images: OMERO
• Workflows
– Galaxy
– Executable Docs
– VMs

Page 7: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

GigaDB

github.com/gigascience/gigadb-cogini

Page 8: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Hosting all data

Page 9: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Hosting all research objects

Page 10: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Impact for research objects

• Host
• Curate
• Share
• Cite - DOI

Page 11: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Even more accessible, transparent data?

Hosting image data with OMERO

Page 12: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Re-producing Images

• Image LIMS
• Keeps metadata with image
• Means the image can be found later!
• Image can be understood
• Also some processing options

http://www.openmicroscopy.org/site/products/omero

Page 13: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Accessible, transparent Images

• Embed in web
• Full res
• View without special software
• Adjust contrast etc
• Link all images to pub!
• No cherry picking!

http://www.openmicroscopy.org/site/products/omero

Page 14: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

NO

Cyber-Centipedes! Phenotyping

Page 15: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Accessible Cyber-Centipede images

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

Page 16: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

OMERO: Adding value

Page 17: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

The alternative...

...look but don't touch

Page 18: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Workflows 1. Galaxy

galaxyproject.org

Page 19: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

galaxy.cbiit.cuhk.edu.hk

Page 20: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Implement workflows in a community-accepted format

http://galaxyproject.org

Over 45,000 main Galaxy server users

Over 1,000 papers citing Galaxy use

Over 55 Galaxy servers deployed

Open source

Page 21: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

[Screenshot panels: Tool list, Tool parameterisation, Results panel. Copyright NBAF-B 2013]

Implement workflows in an intuitive format

Page 22: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Visualising Workflows

Page 23: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Birmingham Metabo-Galaxy Workflow

Page 24: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Birmingham Metabo-Galaxy

• Tools wrapped in Python and XML
• User sees web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy update)
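To illustrate the kind of script that could sit behind such a web form, here is a minimal Python sketch of a command-line tool. It is not taken from the Birmingham Metabo-Galaxy code: the task (filtering a tab-separated table), the function names and the option names are all hypothetical. A Galaxy tool definition in XML would then describe how to call a script like this; that XML is not shown here.

```python
import argparse
import csv
import sys


def filter_rows(rows, column, minimum):
    """Keep rows whose numeric value in `column` is at least `minimum`."""
    return [row for row in rows if float(row[column]) >= minimum]


def main(argv=None):
    # Hypothetical options; a wrapper would fill these in from the web form.
    parser = argparse.ArgumentParser(description="Filter a tab-separated table.")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--column", type=int, default=1)
    parser.add_argument("--minimum", type=float, default=0.0)
    args = parser.parse_args(argv)

    with open(args.input, newline="") as src:
        rows = list(csv.reader(src, delimiter="\t"))
    kept = filter_rows(rows, args.column, args.minimum)
    with open(args.output, "w", newline="") as dst:
        csv.writer(dst, delimiter="\t").writerows(kept)
    return 0


# Only act as a command-line tool when arguments are supplied, so the
# functions above can also be imported or tested on their own.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main())
```

Keeping the logic in a plain function (`filter_rows`) and the argument handling in `main` means the same code can be run by hand, by a workflow system, or by a test.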

Page 25: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility
Page 26: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Hosting Workflows

Page 27: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Hosting Workflows

1) Test data
2) Software files
3) Instructions
+ Galaxy implementation
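One way to make hosted test data and software files checkable after download is to ship a checksum manifest with them. The sketch below is an assumption about how that could be done, not a description of what GigaDB does; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its digest."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List files that were added, removed or altered since `manifest`."""
    current = build_manifest(directory)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )


if __name__ == "__main__":
    # Small demonstration on a throwaway directory with a made-up file.
    with tempfile.TemporaryDirectory() as workdir:
        data = Path(workdir) / "test_reads.txt"
        data.write_text("ACGT\n")
        manifest = build_manifest(workdir)
        data.write_text("ACGA\n")
        print(changed_files(workdir, manifest))
```

An empty list from `changed_files` means the local copy matches the manifest exactly.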

Page 28: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Can we reproduce results? SOAPdenovo2 S. aureus pipeline

Page 29: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Galaxy

• Most accessible
• Easy to share (Galaxy Tool Shed)
• Quite a bit of work
• Doesn't include publication explanations

Page 30: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Workflows 2. Executable Docs

Page 31: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Open lab books, dynamic documents

• Facilitate reuse and sharing with tools like: Knitr, Sweave, iPython Notebook
• Working towards executable papers…

Page 32: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

E.g.

Page 33: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

E.g.

Page 34: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Some testimonials for Knitr

Authors (Wolfgang Huber) “I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”

Reviewers (Christophe Pouzat) “It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Page 35: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Executable docs:

• Completely reproduce paper!
• May require some code-skills
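The idea behind an executable document can be sketched in a few lines of Python: prose and code live in one source, and the code's output is spliced into the text when the document is built. This is a toy illustration only, not how Knitr, Sweave or the iPython Notebook are implemented; the `<<code>>` and `<<end>>` markers and the `weave` function are invented for this sketch.

```python
import contextlib
import io

CHUNK_START = "<<code>>"   # hypothetical marker opening a code chunk
CHUNK_END = "<<end>>"      # hypothetical marker closing a code chunk


def weave(source):
    """Run each code chunk and replace it with whatever it printed."""
    namespace = {}   # shared, so later chunks can use earlier variables
    out_lines = []
    chunk = None     # None while reading prose, a list while inside a chunk
    for line in source.splitlines():
        if line.strip() == CHUNK_START:
            chunk = []
        elif line.strip() == CHUNK_END and chunk is not None:
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            out_lines.extend(buffer.getvalue().splitlines())
            chunk = None
        elif chunk is not None:
            chunk.append(line)
        else:
            out_lines.append(line)
    return "\n".join(out_lines)


# Made-up example document: one sentence of prose plus one code chunk.
DOCUMENT = """\
Mean read length across three samples:
<<code>>
lengths = [100, 150, 125]
print(sum(lengths) / len(lengths))
<<end>>
"""

if __name__ == "__main__":
    print(weave(DOCUMENT))
```

Because the number in the built text is computed from the code each time, changing the data changes the document, which is the property that lets a reader rerun and modify an analysis.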

Page 36: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Workflow accessibility: VMs

Page 37: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Why VMs?

• OS settings
• Dependencies
– Versions
– e.g. python!
• Data + Code linked
• Download or run in cloud
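To show why OS settings and dependency versions matter, here is a small Python sketch that records them for the machine it runs on. It is far lighter than a VM and is not part of the talk; the function name and the choice of fields are assumptions made for illustration. It uses only the standard library.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect interpreter, OS and installed-package versions."""
    packages = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions whose metadata is missing or broken
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    print(json.dumps(environment_snapshot(), indent=2))
```

A record like this documents the environment but does not recreate it; a VM goes further by shipping the working system itself.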

Page 38: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

VMs in GigaDB

Page 39: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

VMs:

• Can host Galaxy
• Can hold KnitR code
• Provides 'snapshot' of working system

Page 40: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Summary

Page 41: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

• Share data in GigaDB
• Share all images in GigaDB
– View images via OMERO
• Share code in GigaDB!
• Share pipeline using:
– Executable docs!
– Galaxy!
– VMs!

Page 42: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Give us data, papers & pipelines*

Improve reproducibility!

Contact us:

[email protected] [email protected] [email protected]

* APCs currently generously covered by BGI until 2015

www.gigasciencejournal.com

Page 43: Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

Thanks to:

@gigascience
facebook.com/GigaScience

blogs.biomedcentral.com/gigablog/

Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

www.gigadb.org
galaxy.cbiit.cuhk.edu.hk

www.gigasciencejournal.com

CBIIT

Funding from:

Our collaborators:team: Case study: