Research data and scholarly publications: going from casual acquaintances to something more

Post on 20-May-2015

Presented to ALPSP annual meeting 2011 in Oxfordshire UK during a session entitled "Abort Retry Fail? Data and the scholarly literature"

Research data and scholarly publications:

Going from casual acquaintances to something more

Todd Vision
Dept of Biology, University of North Carolina at Chapel Hill
and the U.S. National Evolutionary Synthesis Center

ALPSP, September 2011
Abort, Retry, Fail? Data and the scholarly literature

Peer-to-peer ‘sharing’ fails

Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.

“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied.

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.

Figure (Michener et al. 1997): information content (y-axis) against time (x-axis). Labels: time of publication; specific details; general details; accident; retirement or career change; death.

Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.

Source: Publishing Research Consortium, http://publishingresearch.net

n=3824

Taxonomy of data archiving benefits

Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

Direct: Verification of published research; Preserving accessibility to data; Allowing reuse and repurposing of data; Discoverability of data

Indirect (costs avoided): Redundant data collection; Inefficient legacy data curation; Burden of sharing-upon-request; Opportunity cost of science not done

Near term: Protection against personnel turnover; Availability for review and validation

Long term: Secure long-term stewardship; Increased impact per publication

Private: Increased citations; New collaborations; New research opportunities; Fulfilling funding mandates

Public: More efficient use of research dollars; Public trust in science; Educational opportunities; Improved methodologies; More informed policy

Joint Data Archiving Policy (JDAP)

Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.

As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.

Authors may elect to embargo access to the data for a period up to a year after publication.

Exceptions may be granted at the discretion of the editor, especially for sensitive information.

Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
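The embargo clause in the policy above can be illustrated with a small date calculation. This is only a sketch: the function name `data_release_date` and the reading of "up to a year" as 365 days are assumptions made for illustration, not part of JDAP or of any repository's software.

```python
from datetime import date, timedelta

# Assumption: "a period up to a year" is read here as at most 365 days.
MAX_EMBARGO_DAYS = 365


def data_release_date(publication: date, embargo_days: int = 0) -> date:
    """Return the date on which embargoed data would become public."""
    if not 0 <= embargo_days <= MAX_EMBARGO_DAYS:
        raise ValueError("embargo must be between 0 and 365 days")
    return publication + timedelta(days=embargo_days)


# Hypothetical article published on 1 September 2011 with a 180-day embargo.
print(data_release_date(date(2011, 9, 1), embargo_days=180))
```

Editor-granted exceptions (the last clause of the policy) are deliberately left out; they are a matter of discretion rather than a rule that can be computed.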

The long tail of orphan data in “small science”

Figure (after B. Heidorn): volume (y-axis) against rank frequency of datatype (x-axis). Labels: specialized repositories (e.g. GenBank, PDB); orphan data.

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray

Smit E (2011) Abelard and Héloise: Why Data and Publications Belong Together. D-Lib Magazine doi:10.1045/january2011-smit

• The End
To make data archiving and reuse a standard part of research and publishing.

• The Means
Enable low-burden data archiving at the time of manuscript submission.
Promote researcher benefits from data archiving.
Promote responsible data reuse.
Empower journals, societies & publishers in shared governance.
Ensure sustainability and long-term preservation.

• The Scope
Data underlying peer-reviewed articles in basic and applied biosciences.

Submission workflows (diagram built up over successive slides)

Integrated
• Submit manuscript
• Manuscript metadata
• Prompt author
• Submit data
• Peer review
• Review passcode
• Acceptance notification
• Curation
• Production
• Data DOI
• Article metadata
• Curation
• Article publication
• Data publication
• Article DOI/final metadata harvested

Non-integrated
• Submit data
• Data DOI
• Author includes data DOI
• Article publication
• DOI/final metadata harvested
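The integrated workflow in the diagram can be read as an ordered sequence of stages. The following is a minimal sketch under that reading: the stage names are taken from the diagram labels, but the strict linear ordering, the `Submission` class, and the manuscript identifier are assumptions for illustration and do not describe Dryad's actual software.

```python
from dataclasses import dataclass, field
from typing import List

# Stage names come from the diagram labels; treating them as a strict
# linear sequence is an interpretation, not a documented Dryad API.
INTEGRATED_STAGES = [
    "submit manuscript",
    "prompt author",
    "submit data",
    "peer review",
    "acceptance notification",
    "curation",
    "article publication",
    "data publication",
]


@dataclass
class Submission:
    manuscript_id: str
    stage_index: int = 0
    history: List[str] = field(default_factory=lambda: [INTEGRATED_STAGES[0]])

    @property
    def stage(self) -> str:
        return INTEGRATED_STAGES[self.stage_index]

    def advance(self) -> str:
        """Move to the next stage; refuse to go past the final one."""
        if self.stage_index + 1 >= len(INTEGRATED_STAGES):
            raise ValueError("submission is already at the final stage")
        self.stage_index += 1
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    s = Submission("MS-0001")  # made-up manuscript identifier
    while s.stage != "data publication":
        s.advance()
    print(" -> ".join(s.history))
```

Keeping the history alongside the current stage makes it easy to check that data submission always precedes peer review, which is the point of the integrated route.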

Dryad relative to Supplementary Online Materials

Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: support for non-PDF file formats | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗

* A few publisher SOM sites are exceptions to the general rule
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit

Article citation
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

Data citation
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384
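The data citation above follows a visible pattern: the article's authors and year, the article title prefixed with "Data from:", the repository name, and the data DOI. A short sketch of that pattern is given here; the helper `data_citation` is hypothetical and simply reproduces the format shown on the slide, not an official citation tool.

```python
def data_citation(authors: str, year: int, article_title: str, data_doi: str,
                  repository: str = "Dryad Digital Repository") -> str:
    """Build a data citation in the style shown on the slide."""
    title = article_title.rstrip(".")  # avoid a doubled full stop
    return f"{authors} ({year}) Data from: {title}. {repository}. doi:{data_doi}"


citation = data_citation(
    authors="Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA",
    year=2011,
    article_title=(
        "Stalking the fourth domain in metagenomic data: searching for, "
        "discovering, and interpreting novel, deep branches in phylogenetic "
        "trees of phylogenetic marker genes"
    ),
    data_doi="10.5061/dryad.8384",
)
print(citation)
```

Because the data citation reuses the article's authors and title, both records stay recognisably linked while each keeps its own DOI.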

Rebbeck CA, Leroi AM, Burt A (2011) Mitochondrial capture by a transmissible cancer. Science 331, 303

Figures: distributions of Dryad data packages. Axis labels: number of data packages; number of files; total data package size (bytes).

20 papers from Delsuc and Douzery going back to 2002

By now, downloaded more than 1,000 times

Fulfilling the role of a journal

Journal Dryad

Registration ✓ ✓

Certification ✓ (peer review) ✓ (curation)

Awareness ✓ ✓ (distribution)

Archiving ✓

Rewarding ✓ ✓

Does sharing imply that it need be altruistic?

Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.

• For a set of 85 cancer microarray clinical trials
48% had publicly available data
These received 85% of the article citations
Independent of journal impact factor, publication date, author nationality

Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1

Data policies among bioscience journals

n=70

IF=3.6

IF=4.5

IF=6.0

The value proposition

• For researchers
Increase the impact of, and citations to, published research.
Preserve and make data available to verify published results, to refine methodologies, and to repurpose.
Free researchers from the burden of data preservation and access.

• For journals, publishers and societies
Free journals from the burden of managing supplemental data.
Increase the discoverability, impact, and integrity of articles.
Increase their value to the community they serve.

• For funders
A cost-effective mechanism to make research more accessible.
Leverage existing investments in order to enable new science.

Sustainability and governance

• Business model
Long-term preservation requires a long-term organization.
In Dryad’s case, a membership-based nonprofit.
Revenue received from a broad array of ‘customers’, including journals, societies, publishers, and researchers.

• Deposit charges
Paid upfront, when the majority of costs are incurred.
Ensure free access to the data in perpetuity.
Allow revenue to naturally scale with costs (i.e. volume of deposits).
Distribute costs fairly among stakeholders.

• Governance
12-member Board of Directors nominated and elected by the Membership.
Membership serves in an advisory capacity, and is a community of practice.

Costs

• Moderate economies of scale are required
At 10K packages/yr, <$50/deposit, depending on curation.

• What are the costs for SOM?
Journal of Clinical Investigation: $300 flat fee
Ecological Archives: $250 <10Mb, more fees beyond that
FASEB: $100 per file

Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010

Proposed payment plans

1. Journal-based
Annual fee based on all research articles published/yr (~$25 per article*); covers any deposits from the journal (even from prior yrs).

2. Voucher-based
Pay in advance for some number of deposits (<$50 per deposit).

3. Pay-as-you-go
Be invoiced retrospectively for deposits (>$50 per deposit).

4. Author-pays
Author pays online at time of deposit.
Journal can still facilitate archiving through submission integration.

* These are rates for Members, which include a 10% discount
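Which plan is cheapest for a journal depends on how many of its articles come with a data deposit. The sketch below compares the first three plans. The slide gives only approximate rates (~$25 per published article, under $50 and over $50 per deposit), so the exact figures of $25, $45 and $55 used here are placeholders, and the function names are hypothetical.

```python
from typing import Dict


def annual_costs(articles_published: int, deposits: int,
                 journal_rate: float = 25.0,   # slide: ~$25 per published article
                 voucher_rate: float = 45.0,   # placeholder for "<$50 per deposit"
                 payg_rate: float = 55.0       # placeholder for ">$50 per deposit"
                 ) -> Dict[str, float]:
    """Estimate a journal's yearly cost under each plan."""
    return {
        "journal-based": articles_published * journal_rate,
        "voucher": deposits * voucher_rate,
        "pay-as-you-go": deposits * payg_rate,
    }


def cheapest(costs: Dict[str, float]) -> str:
    """Return the name of the plan with the lowest cost."""
    return min(costs, key=costs.get)


# Hypothetical journal publishing 200 articles a year.
few_deposits = annual_costs(articles_published=200, deposits=60)
many_deposits = annual_costs(articles_published=200, deposits=150)
print(cheapest(few_deposits), few_deposits)
print(cheapest(many_deposits), many_deposits)
```

With these placeholder rates the journal-based plan only wins once more than about half of published articles have a deposit, which is consistent with it being priced on all articles rather than on deposits.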

What is the return on investment?

• A rigorous framework is lacking
But we can look at comparators.

• Marginal cost of data archiving
$50/article is <2% of publication costs (>$2.5K)
And 0.2% of grant costs/article (~$25K)

• Is the data worth 2% of the research investment?
Using DNA microarray data in GEO as a model:
2,711 submissions in 2007
Data reused by 3rd parties in >1,150 articles

Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330
Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
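The percentages on this slide can be checked directly from the quoted figures. The sketch below uses the slide's round numbers as if they were exact ($50 per deposit, $2.5K publication cost, $25K grant cost per article), although the slide gives them as bounds or approximations; the variable and function names are illustrative only.

```python
def share(cost: float, total: float) -> float:
    """Fraction of `total` represented by `cost`."""
    return cost / total


DEPOSIT_COST = 50.0                 # slide: $50/article
PUBLICATION_COST = 2_500.0          # slide: >$2.5K, so the true share is lower
GRANT_COST_PER_ARTICLE = 25_000.0   # slide: ~$25K

print(f"{share(DEPOSIT_COST, PUBLICATION_COST):.1%} of publication costs")
print(f"{share(DEPOSIT_COST, GRANT_COST_PER_ARTICLE):.1%} of grant costs per article")

# GEO comparator: at least 1,150 reuse articles for 2,711 submissions in 2007.
GEO_SUBMISSIONS_2007 = 2_711
GEO_REUSE_ARTICLES = 1_150
reuse_per_submission = GEO_REUSE_ARTICLES / GEO_SUBMISSIONS_2007
print(f"at least {reuse_per_submission:.2f} reuse articles per submission")
```

Since the publication cost is a lower bound and the reuse count is also a lower bound, both comparisons err on the conservative side.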

• http://datadryad.org
• http://blog.datadryad.org
• http://datadryad.org/wiki
• http://code.google.com/p/dryad
• dryad-users@nescent.org
• @datadryad
• Dryad

A very incomplete list of contributors

JDAP: M. Whitlock
DryadUS: R. Scherle, E. Feinstein, J. Greenberg, H. Piwowar, P. Schaeffer
DryadUK: B. Hole, Max Wilkinson, D. Shotton
Sustainability planning: N. Beagrie, L. Eakin-Richards