005 Andrew Hufton ScientificData BSBD

From Data Repositories to Data Journals: Where, When and How to Submit
Andrew L. Hufton, Managing Editor, Scientific Data, Nature Publishing Group
[email protected]
Publishing better science through better data
Nov 14th, 2014


Manuscript transfer traffic within NPG


What do authors and readers say

Storing Data: The majority of participants say that some or all of their data is stored locally and not published.

What do authors and readers say

Using Data: The majority of participants look for other researchers' datasets, with more than half doing so once a month or more for each data type, and between a fifth and a quarter doing so once a week or more frequently.

~60% in public repositories
~40% in supp info
~50% from colleagues

Find the right repository for your data

What to look for in a data repository

Quality curation

A commitment to long-term preservation

Features that support collaborative analysis

Features that allow you to keep data private until you are ready to publish

Investigate data archiving options at your institution http://www.nature.com/scientificdata/for-authors/data-deposition-policies/#recommended-repositories

Find the right repository for your data

Browse our recommended data repositories online. We currently list more than 60 repositories across the biological, physical and social sciences.

We advise authors on the best place to store their data.

http://www.nature.com/scientificdata/for-authors/data-deposition-policies/#recommended-repositories

Find the right repository for your data

When a specific data repository does not exist for your field, we recommend:

http://www.nature.com/scientificdata/for-authors/data-deposition-policies/#recommended-repositories

Publish your data

The Data Journal concept

Data must be well described before others can use it and benefit from it.

Scientists who share data in a reusable manner deserve credit through citable publications.

Data quality matters

A diversity of new data journals

Data publications per year (columns: 2012, 2013, 2014, In PubMed)
39 Pending
Gigascience (Data Notes) 119 Yes
F1000R (Data Notes) 154 Yes
Biodiversity Data Journal (Data Paper) 19 Yes
Earth System Science Data 520 No
Ubiquity metajournals
Journal of Open Archaeology Data 923 No
Open Health Data 57 No
Journal of Open Psychology Data 16 No

Now Live!

Scientific Data launched in May 2014, introducing a new type of content called the Data Descriptor designed to make data more discoverable, interpretable and reusable. Check out our first publications online.

Our Data Descriptors fall broadly into two categories

First descriptions of datasets: These often describe valuable, unpublished datasets that may be hard to fit into a traditional research article context. See our first publications for clear demonstrations that Scientific Data can help motivate scientists to share valuable datasets that might not otherwise have seen the light of day.

Follow-up articles: These articles provide fuller descriptions and more complete release of datasets analysed in previous publications. In these cases, the value of the underlying datasets is often already well demonstrated, but for groundbreaking studies, where there are no established standards or data repositories, a substantial amount of additional information is often needed before others can actually reuse the data. Data Descriptors at Scientific Data help motivate the authors to release datasets more fully, and the Data Descriptor manuscripts can provide more detailed descriptions of the data collection methods and the data file formats, essential information for others who may wish to reuse the data.

Key features of Scientific Data

Get Credit for Sharing Your Data: Publications will be indexed and citeable.

Open-access: Articles are published by default under a Creative Commons Attribution licence (CC BY). Each publication is supported by CC0 metadata.

Focused on Data Reuse: All the information others need to reuse the data; no interpretative analysis or hypothesis testing.

Peer-reviewed: Rigorous peer review focused on technical data quality and reuse value.

Promoting Community Data Repositories: Not a new data repository; data are stored in community data repositories.

When might you submit a manuscript to a data journal?

Publish your data early

Publish a data paper alongside your research publications

Describe standalone datasets that don't fit in your other publications

Release data used in your previous research articles


Publish early: screening data
Full screen data for RNAi knockdown of 238 genes
Data at figshare & GenomeRNAi
Findings from specific hits published later at PLOS One

Screening data can be published independently, giving credit to the scientists that performed the screen.

In this case, the authors published the full screening data first, and then published findings derived from specific hits later in PLOS One.

Publish alongside: major consortiums

See the Focus on RNA sequencing quality control (SEQC) in the September issue of Nature Biotechnology.

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
SEQC/MAQC-III Consortium | doi:10.1038/nbt.2957

The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance
Wang et al. | doi:10.1038/nbt.3001

Cross-platform ultradeep transcriptomic profiling of human reference RNA samples by RNA-Seq
Xu et al. | doi:10.1038/sdata.2014.20

Transcriptomic profiling of rat liver samples in a comprehensive study design by RNA-Seq
Gong et al. | doi:10.1038/sdata.2014.21

Publish after: Earth sciences
Data at BODC/NERC
Builds on previous article at Nature Geoscience

In this Data Descriptor, Hao et al. describe a series of global drought indicator datasets. The authors recently used these data to produce a dramatic visualization of the current drought conditions in California in a Letter at Science, "Australia's Drought: Lessons for California" (generating Scientific Data's first citation). Using our Data Descriptor, anyone can download the base data, generate similar maps for any region of the globe, past or future, and even recalculate the drought metrics using the authors' own source code.

Publish standalone data
Code in GitHub
New Dataset
Data in OpenfMRI
Source code in GitHub
Big Data

In this Data Descriptor, the authors present a high-resolution brain imaging dataset from participants listening to an audio version of 'Forrest Gump'. This rich dataset is being used in a brain cognition challenge organized by the authors. The Data Descriptor includes rich methodological details, and links to source code stored at GitHub to help maximize the reusability of these data.

Get the most from your data

Preserve it
Encourage reuse
Get credit

Now launched!

Visit nature.com/scientificdata

Email [email protected]

Tweet @ScientificData

Managing Editor, Scientific Data: Andrew L. Hufton [email protected]

Honorary Academic Editor: Susanna-Assunta Sansone

Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators

Thanks!

Supported by

Scientific Data is an open-access, peer-reviewed publication for descriptions of scientifically valuable datasets. Our primary article-type, the Data Descriptor, is designed to make your data more discoverable, interpretable and reusable.

How datasets are typically looked for
Start by using a search engine (e.g. Google): 62%
Go directly to a specific website: 34%
Other: 4%

Storing Data (values: text or numerical data in spreadsheet format / text or numerical data in other file formats / images or video)
Stored locally, not published: 78% / 75% / 74%
Shared with colleagues and collaborators: 58% / 57% / 58%
Put into supplementary information with my journal articles: 16% / 17% / 17%
Deleted them: 15% / 16% / 12%
Stored in a private, paid-for archive: 9% / 12% / 8%
Made available on my own website: 5% / 6% / 8%
Deposited in a public data repository: 6% / 11% / 6%
Other: 3% / 2% / 2%

Using Data (values: text or numerical data in spreadsheet format / text or numerical data in other file formats / images or video)
At least once a week: 20% / 24% / 25%
At least once a month: 34% / 29% / 32%
At least once every three months: 16% / 15% / 17%
At least once every six months: 9% / 6% / 7%
Less than once every six months: 10% / 10% / 9%
Never: 10% / 16% / 11%