Research Data Management · 2017. 3. 27. · because good research needs good data Research Data...


Page 1: Research Data Management · 2017. 3. 27. · because good research needs good data Research Data Management Nanyang Technological University 09th March 2017 Kevin Ashley/Jonathan

because good research needs good data

Research Data Management

Nanyang Technological University

09th March 2017

Kevin Ashley/Jonathan Rans

Digital Curation Centre

This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.


Who we are

The Digital Curation Centre (Est. 2004) is:

» A national-level centre of expertise in digital preservation with a particular focus on Research Data Management (RDM) and Open Research

» Working closely with a number of UK institutions to boost RDM capability across the HE sector

» Also involved in a variety of national and international collaborations

2017-03-08 DCC - NTU RDM workshop - CC-BY 2


What is data curation?

“Maintaining, preserving and adding value to research data throughout its lifecycle”

More than preservation:
» Active management – dealing with change

Less than preservation:
» Lifecycle sometimes involves destruction

Sometimes, not always, about publication or citation
Always about sharing in some way


What is research data management?

“an explicit process covering the creation and stewardship of research materials to enable their use for as long as they retain value.”

“the active management and appraisal of data over the lifecycle of scholarly and scientific interest”

[Lifecycle diagram: Plan, Create, Use, Appraise, Deposit and Publish, Discover and Reuse]

Data management is part of good research practice


Why care?

Data is expensive – an investment
Reuse:
» More research
» Teaching & Learning
» Planning
Impact – with or without publication
Accountability
Legal & regulatory requirements

Presenter
Presentation Notes
Not just about openness – think of the seismic and drug industries. Their data is protected, but it needs to be reused in other parts of the company, or many years after creation when the originators have gone. You need to know what you have and how to use it.

Why does this matter?

Research quality
» How close can we get to the truth?

Research speed
» How quickly can we get to the truth?

Research finance
» How much does the truth cost?

Improving one or more of these is of interest to all actors:
Researchers as data creators
Researchers as data reusers
Research institutions
Funders – hence government and society


Presenter
Presentation Notes
For an audience such as this, I shouldn’t have to explain why data reuse is important. But just in case, and to explain why some things have happened the way they have, I’ll describe some of the drivers. Ensuring that all research data is discoverable and reusable increases the quality of the research that we do. It can add to the data we collect ourselves and can improve the statistical rigour of our results. Exposing data to scrutiny makes it more straightforward to validate or challenge the findings of others. Making data available also improves the speed with which we can do research. If someone else has already gathered the data we need (perhaps for a different end use), we can move directly to the analysis stage of our work, saving both time and money. And saving money increases the efficiency of research. We hope that the money saved lets us do more research, but even if it doesn’t, society as a whole will gain. There’s evidence behind this that I’ll come to later, but it is an effective counter to those in some universities who feel that increasing funder requirements for data management simply leads to additional costs with no gain. There is a gain in all these areas, and hence every one of the actors – researchers, their employers, their funders, and society – should be motivated to make this happen.

Centres like these provide a return on investment of between 400% and 1200%

http://www.jisc.ac.uk/whatwedo/programmes/di_directions/strategicdirections/badc.aspx

Presenter
Presentation Notes
And we mustn’t forget that we’ve already got good homes for data in a number of subject areas in the UK. Some, such as the British Atmospheric Data Centre and the British Oceanographic Data Centre, are not just national bodies, designated by NERC as appropriate homes for managed data outputs – they are international bodies with an international role. We’re lucky to have them close to home – but we can’t unilaterally decide how they operate. The Archaeology Data Service is another reminder that not all data of interest to academic research emerges from academic endeavour. Much of the data that lives there is deposited by commercial bodies, usually as a prelude to property development that will destroy archaeological heritage. The 40+ years of UKDA are a reminder that we have a long history of expertise and practical knowledge to draw on for many aspects of data curation.

Integrity – not without data

Cyril Burt
» Twin studies on intelligence.
» Questioned 1976; now discredited
Duke case
» Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
Dutch cases
» Stapel – 55 publications – “fictitious data”
» Poldermans – fabricated data or negligence?

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials
“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256


Why manage research data – The selfish view

• To make research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To comply with the law or regulations
• To share data so others can use and learn from it
• To get credit for producing the data
• Because it’s a condition of research funding

Presenter
Presentation Notes
Data is increasing in significance. It will unquestionably matter to your research careers, more than it does to your supervisors’ generation. Learn good data habits now! You’ll need them later.

Data loss

Digital data are fragile and susceptible to loss for a wide variety of reasons:
Natural disaster
Facilities infrastructure failure
Storage failure
Server hardware/software failure
Application software failure
Format obsolescence
Legal encumbrance
Human error
Malicious attack
Loss of staffing competencies
Loss of institutional commitment
Loss of financial stability
Changes in user expectations

Image CC BY-NC-SA 2.0 by Dave Hill https://www.flickr.com/photos/dmh650/4031607067

Presenter
Presentation Notes
Digital data are fragile. There are lots of ways in which data can be lost. Hardware and software can fail, formats can become obsolete, you can lose the knowledge and skills needed to understand the data, and you can lose the investment needed to keep the data accessible. Despite significant investment, data is not being managed effectively. The current estimated total global spend on research and development is $1.5 trillion, which could be at risk. Much of the data generated is lost – in one study, the odds of sourcing datasets declined by 17% each year. The same study found 80% of datasets over 20 years old not available.
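To give a feel for what a 17% yearly decline in odds means, here is a minimal arithmetic sketch. It assumes the decline compounds year on year, and it assumes a starting point of 10:1 odds of obtaining a brand-new dataset. That starting value is purely hypothetical and is not taken from the study; the function names are illustrative as well.

```python
# Illustrative only: INITIAL_ODDS is a hypothetical value, not a figure from the study.
ANNUAL_ODDS_DECLINE = 0.17   # from the notes: odds fall by 17% each year
INITIAL_ODDS = 10.0          # assumption: 10:1 odds of obtaining a brand-new dataset


def odds_after(years: int, initial_odds: float = INITIAL_ODDS) -> float:
    """Odds of obtaining a dataset after `years` of compounding decline."""
    return initial_odds * (1.0 - ANNUAL_ODDS_DECLINE) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (0, 5, 10, 20):
        p = odds_to_probability(odds_after(years))
        print(f"{years:2d} years: probability of obtaining the data = {p:.2f}")
```

Under that assumed starting point, the 20-year probability comes out at roughly one in five, which is in the same region as the 80% unavailability quoted above. A different starting value would shift the result.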

Definitions of research data

“Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.”

“Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.”


Excuses – and responses

“People will ask questions”
» So use a data centre or repository
“It will be misinterpreted”
» Stuff happens. Also, openness encourages correction
“It’s not interesting”
» Let others be the judge – your noise is my signal
“I might get another paper out of it”
» Up to a point. We might get more research out of it
“I don’t have permission”
» A real problem. But solvable at senior level
“It’s too bad/complicated” – see above
“It’s not a priority”
» Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well


See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/

Presenter
Presentation Notes
Yet some researchers still aren’t convinced by the rhetoric. Carly Strasser at CDL has listed some of the reasons for not sharing data that she’s encountered – and here are some of my one-line responses. I’m not saying that the concerns aren’t sincere or reasonable but they can all be dealt with and some are positively misguided. The purpose of data centres, for instance, is to make data independently reusable (as stated in the OAIS standard) which relieves researchers of the burden of dealing with questions about it, at the same time as increasing the likelihood that their data will be cited.

Data reuse from Hubble


Presenter
Presentation Notes
Many of you may be familiar with this graph from the Hubble Space Telescope data archive. It tells the same story in a different way, and also tells a story about the transformation of astronomy as a discipline. In the days of photographic plates, sharing (analogue) astronomical data was difficult. Digital instruments transformed this, and some time around 2000, more research was being done with old data than with new data. I could be more specific about this if the data behind this graph were made available, incidentally!

AGreenSkills 2014 – Climate Change and Evidence Based Medicine

Serge Planton – view of range of effects of climate change
Huge range of datasets, disciplines involved
Difficult to place a value on this work


Evidence-based Medicine

Phillippe Ravaud described the meta-analysis
Many studies, many (?incompatible) datasets involved


New research with old data

Synthesis allows new analyses
Research that cannot be done with any one of these datasets


Presenter
Presentation Notes
Services like this make it easy when we want to locate two datasets, perhaps from two sub-disciplines, to combine – a common enough requirement.

Presenter
Presentation Notes
But increasingly we want to undertake combinations of hundreds or even thousands of individual datasets and to do so in a relatively automated way. In general, we don’t yet have services that make this straightforward.

My messages to researchers

Sharing is difficult
Reusing is difficult
Both are key to advancing science, and advancing your own career
Your data can live longer than your findings
All this can be easier than you think


Make data citable

Making data available increases citations
Everyone – academic, funder, institution – loves citations
Want evidence?
» Alter, Pienta, Lyle – 240%, social sciences *
» Piwowar, Vision – 9% (microarray data) †
» Henneken, Accomazzi – 20% (astronomy) #


* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

# Henneken E, Accomazzi A. (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618


Institutional support

http://www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services