My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)



My keynote talk for Eurocris2014, Rome. I make the case for reuse of research data, discuss the barriers and look at ways we are trying to overcome them.

Transcript of My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Page 1: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

My Data, Our Data, Your Data: data reuse through data management

Kevin Ashley, Digital Curation Centre

www.dcc.ac.uk

@kevingashley

[email protected]

Reusable with attribution: CC-BY The DCC is supported by Jisc

Page 2: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

A summary

• Why data reuse?
• What stops us?
• How data management helps
• Harmonising the goals of research administration and research
• Barriers again
• The case for reuse - again

2014-05-14 Kevin Ashley – Eurocris2014 - CC-BY

Page 3: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures


Page 4: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What is data curation?

• “Maintaining, preserving and adding value to research data throughout its lifecycle”

• More than preservation:
  – Active management – dealing with change

• Less than preservation:
  – Lifecycle sometimes involves destruction

Page 5: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


DCC guidance


Page 6: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


SWEDEN

DENMARK

CANADA

Page 7: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Data reuse stories

• The palaeontologist who saved years of work with archaeological data


Page 8: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What a palaeontologist looks at

[Figure: timeline spanning now to 100 million years ago, labelled at 1m, 25m, 50m and 75m]

Page 9: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What a palaeontologist looks at

[Figure: two timelines. The first spans now to 100 million years ago, labelled at 1m, 25m, 50m and 75m. The second spans now to 1 million years ago, labelled at 100,000, 500,000 and 750,000]

Page 10: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What an archaeologist looks at

[Figure: two timelines. The first spans now to 1 million years ago, labelled at 100,000, 500,000 and 750,000. The second spans now to 100,000 years ago, labelled at 25,000, 50,000 and 75,000]

Page 11: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

Page 12: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

The Old Weather project

Data for research, not from research

Page 13: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 14: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull


Page 15: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Page 16: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Why care?

• Data is expensive – an investment
• Reuse:
  – More research
  – Teaching & Learning
  – Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements

Page 17: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Why does this matter?

• Research quality
  – How close can we get to the truth?
• Research speed
  – How quickly can we get to the truth?
• Research finance
  – How much does the truth cost?

• Improving one or more of these is of interest to all actors:
  • Researchers as data creators
  • Researchers as data reusers
  • Research institutions
  • Funders – hence government and society

Page 18: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

G8 UK - Endorses OA

Open Data Charter

Policy Paper, 18 June 2013

Page 19: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Funder requirements

• UK
• USA – NSF, NEH, NIH
• Europe
• Most place burden on researcher – some on the institution

http://www.epsrc.ac.uk/about/standards/researchdata/Pages/policyframework.aspx

Page 20: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

RCUK policy - The 1-minute version

• Research data are a public good – make openly available in timely & responsible way
• Have policies & plans. Data with long-term value should be preserved & usable
• Metadata for discovery & reuse. Link publications & data
• Sometimes law, ethics get in the way. We understand.
• Limited embargos OK. Recognition is important – always cite data sources
• OK to use public money to do this. Do it efficiently.

Page 21: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

EPSRC policy points

• Awareness of regulatory environment
• Data access statement
• Policies and processes
• Data storage
• Structured metadata descriptions
• DOIs for data
• Securely preserved for a minimum of 10 years from last use

Compliance expected by 2015
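
The "10 years from last use" rule can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that every recorded access counts as a "use" and restarts the clock, and the function names and the access log are hypothetical, not taken from the EPSRC policy text.

```python
from datetime import date
from typing import Iterable

RETENTION_YEARS = 10  # from the slide: "a minimum of 10 years from last use"


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # d is 29 February and the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def retain_until(access_dates: Iterable[date], years: int = RETENTION_YEARS) -> date:
    """Earliest date on which the minimum retention period has elapsed.

    Assumption: each recorded access is a "use" and restarts the clock,
    so only the most recent access date matters.
    """
    return add_years(max(access_dates), years)


# Hypothetical access log for one dataset
accesses = [date(2014, 5, 14), date(2016, 2, 29), date(2015, 1, 1)]
print(retain_until(accesses))  # 2026-02-28
```

Under this reading, a dataset that keeps being reused is never due for disposal, which is consistent with the idea that reuse is what gives data its value.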

Page 22: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


DCC Policy Summary

http://www.dcc.ac.uk/resources/policy-and-legal

Page 23: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Findable, citable data has value

• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
  – Archived
  – Citable
  – Discoverable

MORAL: build a data registry

Page 24: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What stops data reuse

• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential

Page 25: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

“Departments don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss”

Incremental Project Report, June 2010

http://www.flickr.com/photos/mattimattila/3003324844/

Page 26: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What stops data reuse

• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential

Page 27: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

How people talk about data

• I put my data in figshare and I got a DOI for it
• Not our data; the university’s data; my funder’s data; the data; the people’s data; your data.

Page 28: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Data ownership – it’s messy

• You need ownership to make data free
• Governments may assert this
• Industrial collaborators – understanding role of public funding
• Research admin tracks the rules

Page 29: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


ON METADATA


Page 30: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Disciplines – current state

• Typically specialised
• Focussed on discipline-specific concerns
• Frequently embedded – hence processing required to expose independently
• Historic failure to express generic concepts generically
  – Place
  – Time

Page 31: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 32: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Understanding Data Requirements

http://www.dcc.ac.uk/

Page 33: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 34: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Data centres are good value!

• See Jisc reports on ADS, BADC, UKDA:
• Returns on investment between 400% and 1200%

Page 35: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 36: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Integrity

• Not everyone publishes here

• Almost all fraud connected to unavailable data

• People suffer & die due to research fraud

• When your research is reproducible – it gets cited


Page 37: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Integrity – not without data

• Cyril Burt
  – Twin studies on intelligence.
  – Questioned 1976; now discredited
• Duke case
  – Data hiding leads to wasted treatments, clinical trials, probable death & huge lawsuits
• Dutch cases
  – Stapel – 55 publications – “fictitious data”
  – Poldermans – fabricated data or negligence?

“The case for open data: the Duke Clinical Trials” – blog post, Kevin Ashley, http://www.dcc.ac.uk/news/case-open-data-duke-clinical-trials

“Lies, Damned Lies and Research Data: Can Data Sharing Prevent Data Fraud?” – Doorn, Dillo, van Horik, IJDC 8(1); doi:10.2218/ijdc.v8i1.256

Page 38: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Citability

• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
  – Alter, Pienta, Lyle – 240%, social sciences *
  – Piwowar, Vision – 9% (microarray data) †
  – Henneken, Accomazzi – 20% (astronomy) #

* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Page 39: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

How to cite data

What data to keep

Page 40: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


The Data Deluge is upon us

Sensors’ ability to produce data outstrips IT’s ability to process it

Page 41: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 42: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Roles and Responsibilities

What data to keep


Page 43: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Excuses – and responses

• “People will ask questions”
  – So use a data centre or repository
• “It will be misinterpreted”
  – Stuff happens. Also, openness encourages correction
• “It’s not interesting”
  – Let others be the judge – your noise is my signal
• “I might get another paper out of it”
  – Up to a point. We might get more research out of it
• “I don’t have permission”
  – A real problem. But solvable at senior level
• “It’s too bad/complicated”
  – See above
• “It’s not a priority”
  – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/

Page 44: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Should all data be open?

• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Page 45: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Some conundrums

• Releasing genome data is OK when it’s:
  – An identified human subject
  – An anonymous human subject
  – Your pet dog
  – Another mammal
  – An insect
  – A plant
  – A virus

Page 46: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


It’s amazing what people will share…


Page 47: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Data reuse from Hubble


Page 48: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)


Page 49: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Pimp your data – make it findable & reusable

2014-04-25 Kevin Ashley, DCC – SocSciScot14 - CC-BY

Gking.harvard.edu/data

Page 50: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Data is variable

• Not always textual
• Not always tabular
• Not always fixed – continual change
• Not always clearly authored – think of archival provenance
• Not always associated with publication
• Often with indistinct boundaries
• Multi-dimensional and non-linear

Page 51: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Some messages for you

• Some things we need to know about data:
  – When/where/what is it about?
  – Who owns it
  – What rights apply
  – What it is derived from & how
  – What software may be associated
  – What data management plan applies
  – How do I gain access?
  – Where is it?
  – When was/will it be destroyed?
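
One way to picture these questions is as the fields of a single entry in a data registry. The sketch below is illustrative only: the class name, the field names and the `unanswered` helper are assumptions made for this example, not a CERIF, DataCite or DCC schema, and the sample record is invented.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical registry entry; one field per question on the slide."""
    title: str
    about: str = ""            # what is it about?
    period: str = ""           # when is it about?
    place: str = ""            # where is it about?
    owner: str = ""            # who owns it
    rights: str = ""           # what rights apply
    derived_from: List[str] = field(default_factory=list)  # source datasets
    derivation: str = ""       # how it was derived
    software: List[str] = field(default_factory=list)      # associated software
    management_plan: str = ""  # which data management plan applies
    access: str = ""           # how do I gain access?
    location: str = ""         # where is it?
    destroyed_on: Optional[date] = None  # when was/will it be destroyed?

    def unanswered(self) -> List[str]:
        """Names of fields still empty, i.e. questions this record cannot answer."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", None, [])]


# Invented example record, for illustration only
record = DatasetRecord(
    title="Transcribed ships' logs (example)",
    about="weather observations",
    period="19th century",
    owner="Example Archive",
    rights="CC-BY",
)
print(record.unanswered())
```

A registry built on records like this can publish the existence of a dataset and the route to access even when the data itself cannot be open.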


Page 52: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What about your data?

• If administrative data isn’t freely available, why not?
• Expose it in bulk – not just as a web page
• Gain the value from your overheads!

Page 53: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

What about collaboration?

• Collaborate within the university
• Collaborate with partners
• Collaborate with regional, national services
• Not everything can be done well locally
• Some examples…

Page 54: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

Up-skilling for data

Choice of RDM training materials for librarians

http://dataintelligence.3tu.nl/en/home/

http://www.sheffield.ac.uk/is/research/projects/rdmrose

http://datalib.edina.ac.uk/mantra/libtraining.html

Page 55: My data, your data, our data - increasing data value through reuse (Eurocris2014 keynote)

My message to researchers

• The credit belongs to you
• The data belongs to all of us
• Share, and we all reap the benefits