JISC 2005 OPEN DATA! Peter Murray-Rust University of Cambridge 2005-04-12 Power corrupts;...

JISC 2005 OPEN DATA! Peter Murray-Rust University of Cambridge 2005-04-12 Power corrupts; Powerpoint* corrupts absolutely (Tufte) (*under duress: these slides are machine-unfriendly) [email protected] ; http://wwmm.ch.cam.ac.uk [©Peter Murray-Rust: Reusable under Creative Commons]


Page 1

JISC 2005

OPEN DATA!

Peter Murray-Rust

University of Cambridge

2005-04-12

Power corrupts; Powerpoint* corrupts absolutely (Tufte)

(*under duress: these slides are machine-unfriendly)

[email protected]; http://wwmm.ch.cam.ac.uk

[©Peter Murray-Rust: Reusable under Creative Commons]

Page 2

Need for Open Data

•Machine-understandable data

•Vital for eScience and Semantic Web

•Often micropublished (i.e. many independent articles)

•Open capture highly variable

•Bioinformatics – fairly successful

•Chemistry – completely unsuccessful

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 3

Problems for Open Data

•Additional to Open Access

•Requires explicit effort for Openness

•Not recognised as a serious problem

•Need culture change in funders, authors, editors, publishers (1° and 2°)

•Major problems are apathy and unclear or antagonistic licenses

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 4

Machine understandability

•ALL spectra are originally like this

<spectrumData> <xaxis> <array dataType="xsd:float" units="unit:nm"> 199.1212 199.1212 199.1164 199.1164 199.1107 199.1107 199.1030 199.0977 199.0964 199.0898 199.0895 199.0893 199.0891 199.0858 199.0855 199.0853 199.0851 199.0680 199.0665 198.8244 198.8009 198.7893 198.6686 198.6686 198.6383 198.6079 198.5775 198.5775 198.5734 198.5682 198.5612 198.5612 198.5086 198.3795 198.2234 198.0650 198.0160 197.9815 197.9798 197.9307 197.7450 197.6996 197.6814 197.6768 197.6721 197.5620 197.5489 197.5115 197.5115 197.4962

•… hundreds more lines …

•but humans require pictures …

[©Peter Murray-Rust 2005: Reusable under Creative Commons]
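To illustrate what "machine-understandable" means for the <spectrumData> fragment on this slide, here is a minimal Python sketch that reads such data back into numbers. It is a sketch under stated assumptions, not the author's tooling: the slide's fragment is truncated, so the sample below keeps only its first five values and adds closing tags to make it well-formed, and the function name read_xaxis is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a shortened, well-formed stand-in for the slide's fragment.
# The first five values are copied from the slide; the closing tags are
# added here only so that the sample parses.
SAMPLE = """
<spectrumData>
  <xaxis>
    <array dataType="xsd:float" units="unit:nm">
      199.1212 199.1212 199.1164 199.1164 199.1107
    </array>
  </xaxis>
</spectrumData>
"""


def read_xaxis(xml_text):
    """Return (values, units) for the x-axis array of a spectrum document."""
    root = ET.fromstring(xml_text.strip())
    array = root.find("./xaxis/array")
    if array is None:
        raise ValueError("no <xaxis>/<array> element found")
    # The array body is whitespace-separated text; split() handles
    # spaces and newlines alike.
    values = [float(token) for token in (array.text or "").split()]
    return values, array.get("units")


values, units = read_xaxis(SAMPLE)
print(len(values), "points, units:", units)
print("range:", min(values), "to", max(values))
```

A few lines of standard-library code recover every value together with its declared units. No comparable program can recover the same numbers reliably from a picture of the spectrum.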

Page 5

Destruction by publication

•This was once machine-understandable data; the publisher has destroyed it

[©Peter Murray-Rust 2005: Text reusable under Creative Commons]

Page 6

“Open Access” is not enough

Once machine-understandable but no longer. Now only human-readable

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 7

Photons may be good enough for HUMANS…

but

MACHINES need DATA

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 8

Open Access and Open Data

•Many OA evangelists don’t understand data

•OA only guarantees access to human retinas

•Much current OA says nothing about re-use and redistribution of text or data

•NOT PDF, NOT Word, NOT Powerpoint

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 9

Publishers Destroy Scientific Data

•80-99% of high-quality scientific data never leaves the laboratory

•Few publishers support the publication of eData. They trash it

•Few support robotic extraction

•Many actively forbid it

•Others sell it back to the originators

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 10

Non-OpenData policy (ACS)

What is important to realize is that a subscription to an STM journal is no longer [...] a subscription; in fact, it is an access fee to a database maintained by the publisher

•RUDY M. BAUM, Editor-in-Chief, C&E News, September 20, 2004, Volume 82, Number 38, p. 7

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 11

ACS Copyright on Supplemental Data

Electronic Supporting Information files * are available without a subscription to ACS Web Editions. All files are copyrighted by the American Chemical Society. Files may be downloaded for personal use; users are not permitted to reproduce, republish, redistribute, or resell any Supporting Information, either in whole or in part, in either machine-readable form or any other form. For permission to reproduce this material, contact the ACS Copyright Office …

(*i.e. scientific FACTS provided by the author – PM-R)

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

Page 12

Recommendations for OpenData

•Funders must promote publication

•Authors should use Creative Commons

•Editors to promote publication at source

•Publishers should provide data licenses

•Citation junkies and RAE to credit data

•Institutional repositories to encourage

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

[email protected]; http://wwmm.ch.cam.ac.uk

Page 13

Human readability

•Humans want spectra like this

•but publishers require…

[©Peter Murray-Rust 2005: Reusable under Creative Commons]

<spectrumData> <xaxis> <array dataType="xsd:float" units="unit:nm"> 199.1212 199.1212 199.1164 199.1164 199.1107 199.1107 199.1030 199.0977 199.0964 199.0898 199.0895 199.0893 199.0891 199.0858 199.0855 199.0853 199.0851 199.0680 199.0665 198.8244 198.8009 198.7893 198.6686 198.6686 198.6383 198.6079 198.5775 198.5775 198.5734 198.5682 198.5612 198.5612 198.5086 198.3795 198.2234 198.0650 198.0160 197.9815 197.9798 197.9307 197.7450 197.6996 197.6814 197.6768 197.6721 197.5620 197.5489 197.5115 197.5115 197.4962