Promoting data dissemination and reproducibility. Christopher I. Hunter, Scott C. Edmunds, Peter Li,...

1
Promoting data dissemination and reproducibili ty. Christopher I. Hunter, Scott C. Edmunds, Peter Li, Xiao Si Zhe, Robert L Davidson, Laurie Goodman. Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of: No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy Open access, open data, and highly visible work freely available for distribution Collation of Data Citation Numbers by Thompson-Reuters Indexing of papers in PubMed Central, Google Scholar, etc. GigaScience journal (http://www.gigasciencejournal.com) is an Open Access, Open Data, scientific journal that aims “to revolutionize data dissemination, organization, understanding, and use.” Beyond the traditional publication, GigaScience is linked to GigaDB (http://www.gigadb.org), a new integrated database of ‘big-data’ studies from the life sciences. The initial goals of GigaDB are to assign DOIs to datasets to allow them to tracked and cited, and to provide a user-friendly web interface to provide easy access to selected GigaDB datasets and files. We will be working with authors to make the raw data, computational tools and data processing pipelines described in the GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and maximize their reuse within the community. We also provide a rapid data release mechanism for datasets that are no associated with GigaScience articles that have not previously been published elsewhere by giving each the dataset a DOI, making them citable in a standard (and countable) manner in the reference section of papers that use these data. In future GigaDB will link with GigaDB (presented by OMERO), which is a data viewer for GigaDB image files to aid readers in visualizing the imaging datasets. GigaDB will also integration with the BGI Cloud, and with the Galaxy software tools to enable users to directly upload files to Galaxy for further analysis. To date, GigaDB comprises over 159 datasets (>30TB in size) – all under a CC0 Waiver. We host some very popular “multi-omics” datasets, including 88 of Hepatocellular carcinoma genomes from the Asia Cancer Research Group (http://doi.org/10.5524/100034 ), a number of datasets from the YH Asian reference individual including their diploid genome (http://dx.doi.org/10.5524/ 100038 ), and nanochannel-based genome map (http ://dx.doi.org/10.5524/ 100097 ) as well as the first published microbial genome on a Oxford Nanopore Technologies MinION TM sequencer ( http://dx.doi.org/10.5524/ 100102 ). Huayan Gao (GigaScience), Yong Zhang (BGI; China National Genebank) Shaoguang Liang (BGI- SZ), Alex Wong, Dennis Chan (BGI- HK), Tin-Lap Lee (CUHK), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), and Mark Viant (Birmingham Uni) Thanks to: With financial support from: Anatomy of a Data Publication Anatomy of a traditional Publication Data Idea Study Analysis Answer Metadata Disseminate your code and workflows via GigaGalaxy bioinformatics platform Share your raw and intermediary data even before the analysis is complete via GigaDB Fully describe the research in GigaScience and link the data with the workflows by DOI Briefly describe the aims and data collection, and draw conclusions.

Transcript of Promoting data dissemination and reproducibility. Christopher I. Hunter, Scott C. Edmunds, Peter Li,...

Page 1: Promoting data dissemination and reproducibility. Christopher I. Hunter, Scott C. Edmunds, Peter Li, Xiao Si Zhe, Robert L Davidson, Laurie Goodman. Submit.

Promoting data dissemination and reproducibility.

Christopher I. Hunter, Scott C. Edmunds, Peter Li, Xiao Si Zhe, Robert L Davidson, Laurie Goodman.

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:

• No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy

• Open access, open data, and highly visible work freely available for distribution

• Collation of Data Citation Numbers by Thompson-Reuters • Indexing of papers in PubMed Central, Google Scholar, etc.

GigaScience journal (http://www.gigasciencejournal.com) is an Open Access, Open Data, scientific journal that aims “to revolutionize data dissemination, organization, understanding, and use.” Beyond the traditional publication, GigaScience is linked to GigaDB (http://www.gigadb.org), a new integrated database of ‘big-data’ studies from the life sciences. The initial goals of GigaDB are to assign DOIs to datasets to allow them to tracked and cited, and to provide a user-friendly web interface to provide easy access to selected GigaDB datasets and files.

We will be working with authors to make the raw data, computational tools and data processing pipelines described in the GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and maximize their reuse within the community. We also provide a rapid data release mechanism for datasets that are no associated with GigaScience articles that have not previously been published elsewhere by giving each the dataset a DOI, making them citable in a standard (and countable) manner in the reference section of papers that use these data. In future GigaDB will link with GigaDB (presented by OMERO), which is a data viewer for GigaDB image files to aid readers in visualizing the imaging datasets. GigaDB will also integration with the BGI Cloud, and with the Galaxy software tools to enable users to directly upload files to Galaxy for further analysis.

To date, GigaDB comprises over 159 datasets (>30TB in size) – all under a CC0 Waiver. We host some very popular “multi-omics” datasets, including 88 of Hepatocellular carcinoma genomes from the Asia Cancer Research Group (http://doi.org/10.5524/100034), a number of datasets from the YH Asian reference individual including their diploid genome (http://dx.doi.org/10.5524/100038), and nanochannel-based genome map (http://dx.doi.org/10.5524/100097) as well as the first published microbial genome on a Oxford Nanopore Technologies MinIONTM sequencer (http://dx.doi.org/10.5524/100102).

Huayan Gao (GigaScience), Yong Zhang (BGI; China National Genebank) Shaoguang Liang (BGI-SZ), Alex Wong, Dennis Chan (BGI-HK), Tin-Lap Lee (CUHK), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), and Mark Viant (Birmingham Uni)

Thanks to:

With financial support from:

Anatomy of a Data Publication

Anatomy of a traditional Publication

Data

Idea

Study

Analysis

Answer

Metadata

Disseminate your code and workflows via GigaGalaxy bioinformatics platform

Share your raw and intermediary data even before the analysis is complete via GigaDB

Fully describe the research in GigaScience and link the data with the workflows by DOI

Briefly describe the aims and data collection, and draw conclusions.