The State of Open Research Data
![Page 1: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/1.jpg)
The State of Open Research Data
Ross Mounce, Ph.D. (@RMounce)
Postdoc, University of Bath
November 15, 2014
![Page 2: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/2.jpg)
These slides are on Slideshare here: bit.ly/stateofdata
All textual content is
![Page 3: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/3.jpg)
Disclaimer
Summarising the state of open data is HARD
I'd love to have better data
& better evidence for this talk.
![Page 4: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/4.jpg)
Disclaimer #2
Whenever I talk about data in this talk, assume I'm
talking about non-sensitive data e.g.
NOT medical data
NOT bio-weapons research data
et cetera...
![Page 5: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/5.jpg)
Outline
● What is open data?
● The evolution of data availability
● Where are we now?
● Some goals & aspirations for the future
![Page 6: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/6.jpg)
What exactly is open data?
From http://opendefinition.org/, see http://opendefinition.org/od/ for more detail
Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)
![Page 7: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/7.jpg)
Centralised Data Centres
The Cambridge Crystallographic Data Centre, est. 1965
It maintains the Cambridge Structural Database **
** Not open data sensu stricto …but I'll leave that to Peter Murray-Rust to explain
![Page 8: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/8.jpg)
Data Sharing (by snail mail)
e.g. “The full profile listings are on floppy disks which are available upon request”
Fernholz et al (1989) A survey of measurements and measuring techniques in rapidly distorted compressible turbulent boundary layers.
![Page 9: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/9.jpg)
Bilofsky & Burks (1988) Nucleic Acids Research v16 n5
“The author will provide the accession number to the PROCEEDINGS [PNAS] office to be included in a footnote to the published paper.”
1989
![Page 10: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/10.jpg)
Reproducible research
Jon Claerbout,
Jon Buckheit & David Donoho, 1995
![Page 11: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/11.jpg)
Community agreements to share data
the Bermuda Principles for sharing DNA seq. data
● Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
● Immediate publication of finished annotated sequences.
● Aim to make the entire sequence freely available in the public domain
![Page 12: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/12.jpg)
Supplementary Data (Online)
Chen et al (1999) Fluorescence Polarization in Homogeneous Nucleic Acid Analysis. Genome Research
“Numerical values for the data are available as online supplementary material at http://www.genome.org.”
![Page 13: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/13.jpg)
“Each custodian of data on plant traits will retain the right to be informed of any TRY activity that may involve his/her data, and will have the opportunity to negotiate whether his/her data can be used, and whether general guidelines of authorship need to be modified in that particular case
Custodians retain the rights to withdraw their data at any time.”
![Page 14: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/14.jpg)
Your data is NOT 'too big' to share
http://gigadb.org/dataset/100124
39 Gigabytes (GB) of MRI scans
![Page 15: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/15.jpg)
![Page 16: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/16.jpg)
By sharing data we can see further
Data (& code) are the building blocks of science
Shared, re-used data allow us to more rigorously test hypotheses; “to see further”
...and to do it all more quickly and easily.
![Page 17: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/17.jpg)
Real problems of non-open data: GBIF & biodiversity data
Desmet, P. (2013) Showing you this map of aggregated bullfrog occurrences would be illegal http://peterdesmet.com/posts/illegal-bullfrogs.html
![Page 18: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/18.jpg)
Many many options for open'ing data
GenBank, SRA, 1000s more!
http://www.crystallography.net/
![Page 19: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/19.jpg)
...and getting more credit for it with 'Data paper' journals
http://www.mdpi.com/journal/data/about
![Page 20: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/20.jpg)
Intelligent data papers allow databases to automatically pull-in your data
Many publishers (e.g. Pensoft) intelligently markup data papers so that the data can be automatically ingested into appropriate db's on the day of publication!
![Page 21: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/21.jpg)
Data sharing benefits authors & re-users
Piwowar HA, Vision TJ. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175
“...open data citation benefit for this sample to be 9%”
relative to papers providing no public data, for gene expression microarray data
Citation advantage: 10.7717/peerj.175/fig-2
See also previous work by Piwowar: 10.1371/journal.pone.0000308
![Page 22: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/22.jpg)
Those who share data, do better science
Wicherts, J. M., Bakker, M. & Molenaar, D. (2011) Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE 6, e26828. http://dx.doi.org/10.1371/journal.pone.0026828
The authors examined psychology papers for the quality of statistical reporting & asked the authors of those papers for the full data underlying the reported results. Generally, those who shared had more statistically robust, reproducible results.
![Page 23: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/23.jpg)
“Email the author for data” - doesn't work
Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) The poor availability of psychological research data for reanalysis. American Psychologist 61: 726–728
A well-known problem, which I myself have also faced many times!!!
Many legacy journals unfortunately still pretend that “email the author” is acceptable.
![Page 24: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/24.jpg)
Best practice open data is time consuming (but still worth the extra effort!)
Emilio M. Bruna recently provided an estimate of the amount of time it took him to prepare & upload open data related to a publication to figshare & Dryad.
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
11 hours & $90 (for Dryad)
Providing open-source code was the most time consuming part (25.5 hours), and Open Access publication the most expensive ($600).
![Page 25: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/25.jpg)
THIS IS WHERE WE ARE (mostly)
Most research data would get ZERO (not available online)
Or just ONE star
http://5stardata.info/
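As a rough illustration of how the star scale works, here is a minimal sketch in Python. The five criteria are paraphrased from the 5-star open data scheme at 5stardata.info; the `Dataset` class, its field names, and `star_rating` are hypothetical names made up for this example, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset, one flag per star criterion."""
    online_open_licence: bool     # 1 star: on the web under an open licence
    structured: bool              # 2 stars: structured, machine-readable data
    non_proprietary_format: bool  # 3 stars: open format, e.g. CSV not Excel
    uses_uris: bool               # 4 stars: URIs identify the things described
    linked_to_other_data: bool    # 5 stars: links out to other data for context


def star_rating(dataset: Dataset) -> int:
    """Return 0-5 stars; each star requires all of the earlier ones."""
    criteria = [
        dataset.online_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# Data that is "available upon request" is not online: zero stars.
on_request = Dataset(False, True, True, False, False)
# A CSV file on the web under an open licence: three stars.
open_csv = Dataset(True, True, True, False, False)
print(star_rating(on_request), star_rating(open_csv))
```

The point of the cumulative check is that a well-formatted file earns nothing if it is not openly available online in the first place.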
![Page 26: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/26.jpg)
3-star open research data is achievable and desirable
This is what research data publication should be aiming for in the short term. Publishing .csv / non-proprietary open data is NOT actually that hard!
http://5stardata.info/
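To show how little is involved, here is a minimal sketch that writes a small table as CSV using only the Python standard library. The column names and values are invented for illustration, and `rows_to_csv` is a hypothetical helper, not something from the talk.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialise a header and a list of rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example measurements (not real data).
header = ["specimen_id", "length_mm", "mass_g"]
rows = [
    ["A1", 12.5, 3.2],
    ["A2", 13.1, 3.4],
]

csv_text = rows_to_csv(header, rows)
print(csv_text)
```

Saving `csv_text` to a plain `.csv` file and depositing it in a repository under an open licence would meet the 3-star criteria; a README describing the columns and units makes the file much easier to re-use.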
![Page 27: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/27.jpg)
Imagine a world where no-one shared their data (post-publication)
How would we know what was truth & what was lies / fraud / error?
Imagine the waste of time & resources if everyone had to re-generate data de novo every time
How would we make progress?
![Page 28: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/28.jpg)
Predictions for the (near) future
● Research funding bodies will tighten-up their rules to ensure immediate post-publication data sharing. No embargoes, no bullshit.
● If no published data comes from your funded research, it will negatively affect your future chances of funding.
● Research institutions will significantly improve research data management training for ALL staff & students, old and new alike
● Good journals will strictly enforce mandatory data sharing. Journals that don't will get a bad reputation for irreproducible research.
● CC0 for data will become the de facto standard. Everyone will realise that legal protection under copyright is completely the wrong tool for ensuring the ethical use of data & appropriate authorship assignment.
![Page 29: The State of Open Research Data](https://reader034.fdocuments.us/reader034/viewer/2022052601/558e23901a28ab5d048b45b7/html5/thumbnails/29.jpg)
Thank you! Happy to answer all questions
[email protected]
@RMounce
www.righttoresearch.org
www.sparc.arl.org