Sarah Jones DCC, University of Glasgow · PDF fileSarah Jones DCC, University of Glasgow...

Page 1: Sarah Jones DCC, University of Glasgow · PDF fileSarah Jones DCC, University of Glasgow sarah.jones@glasgow.ac.uk ...     . DCC tools catalogue

Managing and sharing data

Sarah Jones

DCC, University of Glasgow sarah.jones@glasgow.ac.uk

Twitter: @sjDCC

ERC Workshop on Research Data Management and Sharing

18-19 September 2014, Brussels

Funded by:

European Research Council policy

Commitment to open science from the start:

"it is the firm intention of the ERC Scientific Council to issue specific guidelines for the mandatory deposit in open access repositories of research results – that is, publications, data and primary materials – obtained thanks to ERC grants, as soon as pertinent repositories become operational."

Statement on Open Access, December 2006

Image CC BY-SA 3.0 by Greg Emmerich www.flickr.com/photos/gemmerich/6365692655

Why make data available?

Sharing leads to breakthroughs

www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0

“It was unbelievable. It’s not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”

Dr John Trojanowski, University of Pennsylvania

... increases the speed of discovery

Returns for institutions

“If an institution spent A$10 million on data, what would be the return? The answer is: more publications; an increased citation count; more grants; greater profile; and more collaboration.”

Dr Ross Wilkinson, ANDS www.ariadne.ac.uk/issue72/oar-2013-rpt

Researchers get a citation boost

“Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression.”

Piwowar, H., Day, R. and Fridsma, D. (2007) Sharing detailed research data is associated with increased citation rate. DOI: 10.1371/journal.pone.0000308

But there are also barriers...

Who owns the data?

• Researchers?

• University?

• Commercial partners?

• Funders?

• …

People are often misinformed about who owns the data. Ownership is particularly hard to determine in international projects or those involving industry.

Restrictions on sharing

• Patentable data

• Commercial sensitivities

• Personal, identifiable data

• Lack of consent

• …

There are legitimate reasons to agree embargo periods, impose conditions, or to share only some of the data.

However, these are often given as reasons not to share data at all.

www.dcc.ac.uk/sites/default/files/documents/events/workshops/IHW-2013/UKDA-barriers-to-data-sharing.pdf

And opportunity costs

By Emilio Bruna http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690

For his most recent paper:

1. Double checking the main dataset and reformatting to submit to Dryad: 5 hours

2. Creating complementary file and preparing metadata: 3 hours

3. Submission of these two files and the metadata to Dryad: 45 minutes

4. Preparing a map of the locations: 1 hour

5. Submission of map to Figshare: 15 minutes

6. Cleaning up and documenting the code, uploading it to GitHub: 25 hours

7. Cost of archiving in Dryad: US$90

8. Page Charges: $600
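
The itemised figures above can be tallied to show where the "35 hours" and "690" in the blog post's URL come from. This is a minimal sketch: the dictionary names are illustrative, and treating the page charges as US dollars is an assumption, since the slide gives only "$600".

```python
# Time spent on each open-science task, in minutes, as listed on the slide.
time_minutes = {
    "check and reformat main dataset for Dryad": 5 * 60,
    "create complementary file and prepare metadata": 3 * 60,
    "submit files and metadata to Dryad": 45,
    "prepare map of locations": 60,
    "submit map to Figshare": 15,
    "clean up, document and upload code to GitHub": 25 * 60,
}

# Direct monetary costs (page charges assumed to be in US dollars).
costs_usd = {
    "Dryad archiving": 90,
    "page charges": 600,
}

total_hours = sum(time_minutes.values()) / 60
total_cost = sum(costs_usd.values())

print(f"Total time: {total_hours:g} hours")
print(f"Total cost: US${total_cost}")
```

Cleaning up and documenting the code accounts for 25 of the 35 hours, which is the largest single item by a wide margin.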

What needs to change?

Conclusions from Emilio Bruna:

• Develop a better system of incentives from the community for archiving data and code

• Teach our students how to do this NOW - it’s much easier if you develop good habits early

• Minimise the actual and opportunity costs

We need to stop telling people “You should” and get better at telling people “Here’s how”

What is involved in data curation

• Data Management Planning

• Data creation

• Annotating / documenting data

• Analysis, use, versioning

• Storage and backup

• Publishing papers and data

• Preparing for deposit

• Archiving and sharing

• Licensing

• Citing…

Lifecycle stages shown on the slide: Plan, Create, Document, Use, Publish, Share

Data Management Plans

Brief plans to determine how data will be created, managed and shared. DMPs usually cover:

1. Description of data to be collected / created

2. Standards and methodologies for data collection & management

3. Any issues or restrictions due to ethics and Intellectual Property

4. Plans for data sharing and access

5. Strategy for long-term preservation

DMPs are often submitted as part of grant applications, but are useful whenever you’re creating data.
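
One way to use the five topics above is as a checklist against a draft plan. The sketch below is illustrative only: the dictionary keys, the `missing_sections` helper and the draft text are all hypothetical, and real funder templates may ask for more than these five sections.

```python
# The five topics a DMP usually covers, as listed above.
# The short keys are hypothetical names chosen for this sketch.
DMP_SECTIONS = {
    "data_description": "Description of data to be collected / created",
    "standards_and_methods": "Standards and methodologies for data collection & management",
    "ethics_and_ip": "Any issues or restrictions due to ethics and Intellectual Property",
    "sharing_and_access": "Plans for data sharing and access",
    "preservation": "Strategy for long-term preservation",
}


def missing_sections(draft):
    """Return the headings of sections that are absent or blank in a draft plan."""
    return [
        heading
        for key, heading in DMP_SECTIONS.items()
        if not draft.get(key, "").strip()
    ]


# A hypothetical, partly written draft.
draft_plan = {
    "data_description": "Field survey counts stored as CSV files.",
    "sharing_and_access": "Deposit in a public repository at publication.",
    "preservation": "   ",  # whitespace only, so it counts as unwritten
}

for heading in missing_sections(draft_plan):
    print("Still to write:", heading)
```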

Help with DMPs

https://dmponline.dcc.ac.uk

A web-based tool to help researchers write data management plans

www.dcc.ac.uk/sites/default/files/documents/resource/DMP_Checklist_2013.pdf

Framework for creating a DMP

A list of common elements explaining why they are important and giving example answers

www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html

Example plans

www.dcc.ac.uk/resources/data-management-plans/guidance-examples

Managing and sharing data: a best practice guide

http://data-archive.ac.uk/media/2894/managingsharing.pdf

Training materials

FOSTER project

• Open science training

• Courses across EU

• Portal to OA materials

• Guidance on Horizon 2020

www.fosteropenscience.eu

MANTRA

• Free online training course

• Aimed at PhD students

• Case studies, quizzes etc

• Data handling tutorials: R, SPSS, ArcGIS, NVivo

http://datalib.edina.ac.uk/mantra

DCC tools catalogue

A catalogue of RDM tools for different audiences. Tools for researchers focus on data handling, managing workflows, citation and impact.

www.dcc.ac.uk/resources/external/tools-services

Tools to help with RDM activities

Tools shown: impactstory.org, owncloud.org, thedata.org, www.datacite.org, dataup.cdlib.org, www.myexperiment.org, www.taverna.org.uk, www.labtrove.org

Categories covered: Documentation & metadata; Workflow management; Storage & collaboration; Citation & impact

Metadata standards catalogue

Use standards wherever possible for interoperability

www.dcc.ac.uk/resources/metadata-standards
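
As an illustration of what using a standard can mean in practice, the sketch below checks a dataset description against the 15 element names of the Dublin Core Metadata Element Set. Dublin Core is chosen here only as a familiar general-purpose example and is not named on the slide; the record, the `non_standard_fields` helper and the field `my_lab_code` are hypothetical, and a discipline-specific standard from the catalogue may be a better fit for real data.

```python
import json

# The 15 element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def non_standard_fields(record):
    """Return, in sorted order, the field names that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# A hypothetical dataset description with one home-grown field.
record = {
    "title": "Example survey dataset",
    "creator": "A. Researcher",
    "date": "2014-09-18",
    "format": "text/csv",
    "rights": "CC BY 4.0",
    "my_lab_code": "X-17",
}

print("Non-standard fields:", non_standard_fields(record))

# Keep only the standard elements when exporting the record for exchange.
portable = {name: value for name, value in record.items() if name in DUBLIN_CORE_ELEMENTS}
print(json.dumps(portable, indent=2))
```

Field names that other systems already recognise are what make the exported record interoperable; the home-grown field would need mapping or documenting before anyone else could use it.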

Data repositories

http://databib.org

http://service.re3data.org/search

1. How do you foster open science?

• Make it feasible to comply – provide tools and infrastructure

• Train people early in their careers

• Incentivise openness

• Listen to researchers and learn from their experience about what doesn’t work

• Follow up on any demands made in policies

2. Who is responsible for providing infrastructure and support?

Funders

Discipline

Institution

Third-party services

National provider

Data centres e.g. via NERC

Institutional support for discipline-specific tools e.g. Monash MeRC partnership on tools like OMERO

National brokerage of deals with third-party providers e.g. Jisc Janet deals with Arkivum

And what about co-ordination?

3. Who should pay?

Funding Research Data Management

“A conversation with the funders”

The DCC held a special event on this topic in the UK, but there’s still a long way to go.

www.dcc.ac.uk/events/research-data-management-forum-rdmf/rdmf-special-event-funding-research-data-management

Thanks – any questions?

DCC guidance, tools and case studies:

www.dcc.ac.uk/resources

Follow us on Twitter:

@digitalcuration and #ukdcc