Post on 26-Jan-2015
#IDCC14 February 2014
Data Publication Etcetera at the CDL
Carly Strasser & John Kratz California Digital Library
@carlystrasser
Zooming out
From Wikimedia Commons
Back in the day…
From ahswhg.wikispaces.com
Da Vinci, Curie, Newton, Darwin
From classicalschool.blogspot.com
Research has changed
Better
From wikimedia
Such Internet!
So many tools!
From Flickr by John Jobby
So much data!
Research has changed
Worse
Digital data
From Flickr by Flickmor
From Flickr by US Army Environmental Command
From Flickr by DW0825
C. Strasser
Courtesy of WHOI
From Flickr by deltaMike
Digital data + complex workflows
“Reproducibility Crisis”
“Digital Dark Age”
“Erosion of Trust”
Can we fix the way we communicate our science?
• All of the science
• Early & often
• Transparently & openly
Zooming out
Zooming in
Data Publication @
John Kratz, CLIR Postdoc
From Flickr by lindyjb
“Data Publication”
What does “data publication” mean?
Props to Sarah Callaghan & colleagues!
Data are:
1. Available
2. Citable
3. Trustworthy*
*peer reviewed? certified?
Available | Citable | Trustworthy
Publish means to “make public”. You should not have to email the author. The data don’t have to be open access.
“Email me!” CC-0 on web
Simple case…
Data citations should be in the reference list.
Five-element citation: author, year, title, publisher, identifier.
Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
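The five-element citation can be sketched in a few lines of Python. This is only an illustration: the `DataCitation` class, its field names, and the punctuation between elements are assumptions made here, not a format prescribed by the slides or by any repository.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Hypothetical container for the five citation elements named on the slide."""
    author: str
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Assumed layout: Author (Year). Title. Publisher. Identifier
        return (
            f"{self.author} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


# Values taken from the Dryad example citation on the slide.
citation = DataCitation(
    author="Boettiger C, Dushoff J, Weitz JS",
    year=2009,
    title="Data from: Fluctuation domains in adaptive evolution",
    publisher="Dryad",
    identifier="doi:10.5061/dryad.j8n0p7vc",
)
print(citation.format())
```

Keeping the identifier as its own field means it can be checked or turned into a link separately from the human-readable text.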
More complicated…
• Deep data citation: what if you want to cite a subset?
• Dynamic data: how to create a reliable citation when a dataset is changing?
Technical vs. Scientific
Sometimes consider impact and/or novelty
Guidelines provided
From Flickr by Percival Lowell
What does a data publication look like?
From Flickr by subsetsum
1. Data as supplemental material
Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability.
2. Data paper: Data + descriptive “data paper”
Most require data be in a trusted repository. All have a component of peer review.
Examples:
• Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
• Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology
3. Standalone data
Data published without a related journal article. Rich metadata (structured or unstructured).
Examples:
• Open Context
• NASA PDS Peer Review Data
• figshare (but no validation)
“Publish”
“Paper”
“Peer review” “Sharing”
“Available”
“Article” “Publication”
From Flickr by Sandia Labs
C. Strasser
From Flickr by World Bank Photo Collection
What do researchers think of data publication?
• Publishing
• Sharing
• Citation
• Peer review
• Trustworthiness
Share with researchers! tinyurl.com/datapubsurvey
Survey Demographics (N=274)
79% US | 21% Not
Academic | Govt | Other
Type of researcher: PI, Postdoc, Grad student, Other
Discipline: Bio, Archaeology, Envi/earth Sci, Math, physics, Info Sci, Other
In the meantime…
UCSF
For all UCs
• Use institutional credentials to log in
• Enter metadata & deposit data
• Get identifier
• Optional PDF download
• Landing page is the publication
data publishing data sharing
Focus on solving simple bits first: easy sharing, citable datasets
Website: carlystrasser.net
Email: carlystrasser@gmail.com
Tweet: @carlystrasser
Slides: slideshare.net/carlystrasser
Survey: tinyurl.com/datapubsurvey
Big thanks to John Kratz, CLIR Postdoc