Making Data Sharing Happen
Making It Happen
March 19, 2013
Anita de Waard
VP Research Data Collaborations, Elsevier
Sustainable Data Preservation and Use
Making It Happen:
“What aspects/tools/capabilities/frameworks are related to this idea?”
• There are many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)
• There are many systems for creating/sharing workflows (Taverna, MyExperiment, Vistrails, Workflow4Ever, etc.)
• There are many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)
• There are scores of projects, committees, standards, bodies, grants, initiatives and conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)
• You can make a living out of this ;-)! (and many of us do…)
…but this is what scientists do:
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of this, and writes a paper. End of story.
Why save research data?
A. Data Preservation:
– Preserve record of scientific process, provenance
– Enable reproducible research
B. Data Use:
– Use results obtained by others
– Do better science!
– Improve interdisciplinary work
C. Sustainable Models:
– Technology transfer; societal/industrial development
– Reward scientists for data creation (credit/attribution)
– Long-term archiving
> 50 M papers
2 M scientists
2 M papers/year
Where The Data Goes Now:
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) is stored in large, generic data repositories:
– Dryad: 7,631 files
– Dataverse: 0.6 M
– DataCite: 1.5 M
• A small portion of data (1-2%?) is stored in small, topic-focused data repositories:
– MiRB: 25 k
– PetDB: 1.5 k
– TAIR: 72.1 k
– PDB: 88.3 k
– SedDB: 0.6 k
Key Needs:
INCREASE DATA PRESERVATION
IMPROVE DATA USE
DEVELOP SUSTAINABLE MODELS
Objections (and rebuttals) to data sharing:
• Objection: “Our lab notebooks are all on paper – it’s how we do things”
  Rebuttal: Graft tools closely on scientists’ daily practice
• Objection: “I need to see a direct benefit of any effort I put in.”
  Rebuttal: Create tools to allow better insight into one’s own and others’ results.
• Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
  Rebuttal: Create social networking context and allow data owner to provide granular access control.
• Objection: “I am afraid other people might scoop my discoveries”
  Rebuttal: Reward system moves from a competition to a ‘shared mission’
From insular ‘CoSI-Factories’…
[Diagram: two separate lab workflows, each with the steps Prepare, Observe, Analyze, Ponder, Communicate]
…to shared experimental repositories:
[Diagram: two lab workflows, each with the steps Prepare, Analyze, Communicate, connected to shared Observations]
• Across labs and experiments: track reagents and how they are used
• Compare the outcome of interactions with these entities
• Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
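One way to picture the shared-repository idea is as a table of observations pooled across labs and grouped by reagent. The sketch below is only an illustration under assumed names: the record layout, the example reagents and targets, and the functions `reagent_profile` and `agreement` are all hypothetical and are not taken from any tool named in these slides.

```python
from collections import defaultdict

# Hypothetical pooled observations: (lab, experiment, reagent, target, outcome).
# All values are invented for illustration only.
observations = [
    ("lab_a", "exp1", "antibody_X", "protein_1", "binds"),
    ("lab_a", "exp2", "antibody_X", "protein_2", "no_binding"),
    ("lab_b", "exp7", "antibody_X", "protein_1", "binds"),
    ("lab_b", "exp8", "antibody_Y", "protein_1", "no_binding"),
]


def reagent_profile(records):
    """Group observations by reagent, then by target, across all labs."""
    profile = defaultdict(lambda: defaultdict(list))
    for lab, experiment, reagent, target, outcome in records:
        profile[reagent][target].append((lab, experiment, outcome))
    return profile


def agreement(profile, reagent, target):
    """Fraction of observations that share the most common outcome.

    Returns None when no lab has reported this reagent/target pair.
    """
    entries = profile.get(reagent, {}).get(target, [])
    outcomes = [outcome for _, _, outcome in entries]
    if not outcomes:
        return None
    most_common = max(set(outcomes), key=outcomes.count)
    return outcomes.count(most_common) / len(outcomes)


profile = reagent_profile(observations)
for reagent, targets in profile.items():
    for target, entries in targets.items():
        labs = sorted({lab for lab, _, _ in entries})
        print(reagent, target, labs, agreement(profile, reagent, target))
```

Grouping by reagent first keeps the "how did this entity behave elsewhere?" question a single lookup; a real repository would also need provenance and access control, which this sketch leaves out.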
Think
Some examples:
• Grafting tools on workflow: create tailored metadata collection tools on mini-tablets in labs to replace paper notebook
• Direct rewards: through ‘PI-Dashboard’: allow immediate access/analysis of shared data: new science!
• Data sharing rewards: Data Rescue Challenge: collect and reward stories/practices of data preservation/use in Earth/Lunar Science
• Improve data use: With NIF/Eagle-I: add antibodies as key ‘entities’ to paper, link to AB repository
How do we make data use happen:
• We are creating repositories of shared experiments: you are part of a greater whole!
• Collect and share stories and practices re. data use and sustainable systems: “What gets to them?”
• Develop system of rewards for data sharing: enable demonstrably better science!
• Work with grant agencies, repositories (generic/specific, institutional, cross-national) to integrate and annotate existing datasets and enable cross-use
• Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges