Making Data Sharing Happen


March 19, 2013

Anita de Waard

VP Research Data Collaborations, Elsevier [email protected]

Sustainable Data Preservation and Use

Making It Happen:


“What aspects/tools/capabilities/frameworks are related to this idea?”

• There are many different research databases, both generic (Dryad, Dataverse, …) and specific (NIF, IEDA, PDB, …)

• There are many systems for creating/sharing workflows (Taverna, myExperiment, VisTrails, Workflow4Ever, etc.)

• There are many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)

• There are scores of projects, committees, standards, bodies, grants, initiatives and conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)

• You can make a living out of this ;-)! (and many of us do…)

http://www.oxfordjournals.org/nar/database/c/

http://www.force11.org/tools%23methods

http://www.nature.com/news/going-paperless-the-digital-lab-1.9881

http://www.force11.org/tools%23metadata

http://force11.org/

…but this is what scientists do:

Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of this, and writes a paper. End of story.

Why save research data?

A. Data Preservation:
– Preserve record of scientific process, provenance
– Enable reproducible research

B. Data Use:
– Use results obtained by others
– Do better science!
– Improve interdisciplinary work

C. Sustainable Models:
– Technology transfer; societal/industrial development
– Reward scientists for data creation (credit/attribution)
– Long-term archiving

> 50 M papers

2 M scientists

2 M papers/year

Where The Data Goes Now:

• Majority of data (90%?) is stored on local hard drives

• Some data (8%?) is stored in large, generic data repositories (Dryad: 7,631 files; Dataverse: 0.6 M; Datacite: 1.5 M)

• A small portion of data (1-2%?) is stored in small, topic-focused data repositories (MiRB: 25 k; PetDB: 1.5 k; TAIR: 72.1 k; PDB: 88.3 k; SedDB: 0.6 k)


Key Needs:


INCREASE DATA PRESERVATION

IMPROVE DATA USE

DEVELOP SUSTAINABLE MODELS

Objections (and rebuttals) to data sharing:

Objection: “Our lab notebooks are all on paper – it’s how we do things”
Rebuttal: Graft tools closely onto scientists’ daily practice.

Objection: “I need to see a direct benefit of any effort I put in.”
Rebuttal: Create tools that give better insight into one’s own and others’ results.

Objection: “I don’t really trust anyone else’s data – and don’t think they’ll trust mine”
Rebuttal: Create a social networking context and allow the data owner to provide granular access control.

Objection: “I am afraid other people might scoop my discoveries”
Rebuttal: Reward system moves from a competition to a ‘shared mission’.

From insular ‘CoSI-Factories’…

[Figure: two separate labs, each with its own stages: Prepare, Observe, Analyze, Ponder, Communicate]

…to shared experimental repositories:

[Figure: two labs, each with stages Prepare, Analyze, Communicate, connected through shared Observations]

Across labs, experiments: track reagents and how they are used


Compare outcome of interactions with these entities



Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
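One way to picture such a comparison is a minimal Python sketch, assuming a shared repository holds observation records from several labs. The `Observation` fields, the lab and reagent names, and the `reagent_profile` helper are all hypothetical, chosen only for illustration; the slides do not specify a data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One shared observation record (hypothetical schema)."""
    lab: str          # which lab ran the experiment
    experiment: str   # experiment identifier within that lab
    reagent: str      # e.g. an antibody
    target: str       # entity the reagent was used against
    outcome: str      # recorded result


def reagent_profile(observations):
    """Group outcomes by reagent, then by target, across all labs."""
    profile = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        profile[obs.reagent][obs.target].append((obs.lab, obs.outcome))
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {reagent: dict(targets) for reagent, targets in profile.items()}


# Illustrative records only; not real data.
shared = [
    Observation("lab_a", "exp1", "antibody_x", "protein_1", "binds"),
    Observation("lab_b", "exp7", "antibody_x", "protein_1", "no binding"),
    Observation("lab_b", "exp8", "antibody_x", "protein_2", "binds"),
]

profile = reagent_profile(shared)
for target, results in profile["antibody_x"].items():
    print(target, results)
```

Grouping by reagent first puts the results different labs recorded for the same reagent and target side by side, which is the kind of cross-experiment comparison the slide describes.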



Some examples:

• Grafting tools on workflow: create tailored metadata collection tools on mini-tablets in labs to replace paper notebook

• Direct rewards: through ‘PI-Dashboard’: allow immediate access/analysis of shared data: new science!

• Data sharing rewards: Data Rescue Challenge: collect and reward stories/practices of data preservation/use in Earth/Lunar Science

• Improve data use: With NIF/Eagle-I: add antibodies as key ‘entities’ to paper, link to AB repository


How do we make data use happen:

• We are creating repositories of shared experiments: you are part of a greater whole!

• Collect and share stories and practices re. data use and sustainable systems: “What gets to them?”

• Develop system of rewards for data sharing: enable demonstrably better science!

• Work with grant agencies, repositories (generic/specific, institutional, cross-national) to integrate and annotate existing datasets and enable cross-use

• Collectively pioneer long-term funding options; support/develop ‘shared mission’ funding challenges
