Dryad UK discussion meeting Mark Patterson, Director of Publishing April 27, 2010
www.plos.org
Committed to making the world’s scientific and medical literature a public resource
Why share data?
• Complete picture of the work
• Reliability of the conclusions/recommendations
• Developing alternative interpretations
• Reusing the data for new analyses
– Data may be unique/precious
• Human participants deserve it
– Whilst preserving confidentiality
Consequences of not sharing data
• Misunderstanding
• Uncorrected errors
• Misrepresentation
• Duplication of effort
• Limits research impact
…at least 70 structures demonstrated to be falsified…
…the current problems could not have been easily discovered without the availability of the structure-factor files
…the full data must be accessible for scrutiny by the scientific community.
Barriers to (effective) data sharing
• Technical barriers
– Lack of infrastructure (database)
– Lack of standards (formats)
– Too much data
• Administrative and legal barriers
– Lack of clarity of reuse terms
– Lots of files to organize and process
– Publishers don’t make it easy enough
• Cultural barriers
– Sharing is not the norm
– Insufficient incentives
– Maximizing credit via publication encourages hoarding of data
The role of publishers
• Policy requiring data sharing as a condition of publication
• Quality control of data
• Providing incentives to share data
Challenges to policy development
• Discipline-specific differences
– Data sharing tradition/behaviour
– Availability of an established database
– Enforcing the right standards at the right time
– Privacy/confidentiality issues
• Technical issues
– Quantity of data
– CC Zero Waiver
• Policing the policy
– Making sure restrictions are clear before publication
– Appropriate action after publication
Quality control - image manipulation
• Images screened for inappropriate manipulation
• Most frequent problem is that original files cannot be found
• Should all raw data be submitted?
Incentives
• Provide a forum for ‘data papers’
• Indicators for the impact of datasets
– Make sure that datasets are properly cited
PLoS Currents: Influenza Workflow
Google Knol: Author(s) assemble content and control access and editing. Authors submit content to PLoS Currents.
PLoS Currents: Moderators control posting of content, commenting and version control.
PubMed Central: Immediate transfer from PLoS Currents site; stable identifier and permanent archiving.
• PLoS Currents Influenza
• Very fast
• Very cheap
• Moderated by experts
• Citable
• Version control
• Archived at PubMed Central
• Indexed in PubMed
http://www.plosone.org/enhanced/pone.0006975/
(http://tiny.cc/ALM1)
“Article-level metrics” could be applied to datasets in Dryad