Data citations: who cares?

Post on 01-Nov-2014

Who cares how research data is attributed and cited? Lots of people. Presented by Heather Piwowar to DataONE summer internship 2010 group on data citatio

Transcript of Data citations: who cares?

Data citation...Who cares?

Heather Piwowar

DataONE postdoc with Dryad and NESCent

DataONE summer internship meeting

July 7, 2010

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

http://www.flickr.com/photos/jsmjr/62443357/

http://www.flickr.com/photos/camilleharrington/3587294608/

http://www.flickr.com/photos/rkuhnau/3318245976/

http://www.flickr.com/photos/conformpdx/1796399674/

http://www.flickr.com/photos/rkuhnau/3317418699/

http://www.flickr.com/photos/zemlinki/261617721/

http://www.flickr.com/photos/tracenmatt/3020786491/

http://www.flickr.com/photos/the-o/2078239333/

Probably.

In theory.

?

• Genbank

• PDB

http://www.oxfordjournals.org/nar/database/cap/

http://www.flickr.com/photos/archeon/2941655917/

Data citation...

[Diagram: papers and datasets]

• Alas, no unique standard identifier

• URL

• accession number

• DOI

• citation to paper

• citation to database

• reference to supplementary material

• search strategy

Example: full-text phrases containing “... accessed”

“submitted”

“downloaded”
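As a rough illustration of the phrase-based search strategy above, here is a minimal Python sketch that flags sentences by the three cue words from the slide. The mapping of "submitted" to data creation and of "accessed" or "downloaded" to data reuse is an assumption made for this example, as are the names `CUES` and `classify_sentence`; the talk does not specify how the phrases were actually used.

```python
import re

# Cue words come from the slide; the label attached to each one is an
# assumption made for illustration only.
CUES = {
    "accessed": "reuse",
    "downloaded": "reuse",
    "submitted": "creation",
}

# Match any cue word as a whole word, ignoring case.
CUE_PATTERN = re.compile(r"\b(" + "|".join(CUES) + r")\b", re.IGNORECASE)


def classify_sentence(sentence):
    """Return the sorted list of labels whose cue words occur in the sentence."""
    labels = {CUES[match.group(1).lower()] for match in CUE_PATTERN.finditer(sentence)}
    return sorted(labels)
```

A sentence with no cue word yields an empty list, so a caller can treat those as "no evidence either way" and send them to manual review.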

• Citations are indexed and machine-extractable

datasetpaper

paper

paper

paper

paper

paper

dataset

dataset

dataset

dataset

dataset

• understand current practice

• articulate the best best-practices

[Diagram: papers and datasets]

Who cares?

1.  Data creators

• personal reward

• motivation:

• “if it really helped”

• even esoteric datasets are useful

• how prevalent is scooping?

• alert to possible misuses

• grounded requirements

2.  Data reusers

• clear guidelines are helpful

• what has been reused, for what?

• what hasn't?

3.  Repository creators, maintainers

• funding

• how much metadata

• how to format

• what additional tools are useful

• lifecycle of data

4.  Funders

• most, best science for their money

• cost/benefit of mandate

• inform funding decisions:

• what has been extra useful?

• what hasn't?

• what support is needed

5.  Journals

• increasingly called upon to mandate or fund:

• how to decide

• how to rationalize

• another avenue to compete

6.  Information scientists

• extension of citation analysis for studying information behaviour

6.  Me

Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets

Reuse estimate

• 2703 submissions in 2007

• GSE* in PubMed Central

• Exclude author overlap

• Exclude data creation

• automatically, manually

• 139

• 520
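The filtering steps listed above (find GSE* mentions, then exclude author overlap) could be sketched roughly as follows. This is a hedged illustration, not the method used for the estimate: it assumes "GSE*" means accession strings of the form GSE followed by digits, it matches author names by crude lowercase comparison, and the record layout and the function names `find_accessions`, `shares_author` and `third_party_mentions` are all hypothetical.

```python
import re

# Assumption: "GSE*" refers to accession strings of the form GSE<digits>.
GSE_PATTERN = re.compile(r"\bGSE\d+\b")


def find_accessions(text):
    """Return the distinct GSE-style accession strings found in the text."""
    return sorted(set(GSE_PATTERN.findall(text)))


def shares_author(article_authors, dataset_authors):
    """Crude overlap check: exact match on lowercased, stripped names."""
    def normalise(names):
        return {name.strip().lower() for name in names}
    return bool(normalise(article_authors) & normalise(dataset_authors))


def third_party_mentions(articles, dataset_authors):
    """Keep (article id, accession) pairs where no author is shared.

    articles: list of dicts with hypothetical keys 'id', 'authors', 'text'.
    dataset_authors: dict mapping an accession to its submitters' names.
    """
    kept = []
    for article in articles:
        for accession in find_accessions(article["text"]):
            if accession not in dataset_authors:
                continue  # unknown submitter list, cannot judge overlap
            if not shares_author(article["authors"], dataset_authors[accession]):
                kept.append((article["id"], accession))
    return kept
```

Exact name matching will miss initials and spelling variants, which is one reason a manual pass alongside the automatic one, as the slide notes, remains useful.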

7.  You

8.  Your mom

9.  These mice

http://www.flickr.com/photos/ryanr/142455033/

10.  Scientific progress

• trace errors, fraud

• increase transparency

• more efficient and effective

you can not manage what you do not measure

quote: Lord Kelvin

http://www.flickr.com/photos/archeon/2941655917/

science about our science

http://www.flickr.com/photos/druclimb/293046352/

questions?

Thanks to:

NSF, DataONE, NESCent, Dryad

UBC Dept of Zoology

NLM, U of Pittsburgh Dept of Biomedical Informatics

Open science online community and those who release their articles, datasets and photos openly