What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

ALEX HENDERSON & PETER GARDNER MANCHESTER INSTITUTE OF BIOTECHNOLOGY UNIVERSITY OF MANCHESTER, UK HTTP://GARDNER-LAB.COM & HTTP://CLIRSPEC.ORG SPEC 2014 Shedding New Light on Disease Kraków, Poland. 17-22 August 2014 WHAT’S MINE IS YOURS (AND VICE VERSA): DATA SHARING IN VIBRATIONAL SPECTROSCOPY

description

Presentation given at SPEC 2014, Kraków, Poland, 17-22 August 2014.

In our day-to-day practice we collect data, convert this to information, hopefully extract knowledge, and then pass this on to our peers, thereby advancing the global understanding of our field. This is a very linear process. What if we were to share our data? Have others take our information and combine it with their own? Such a branched process would likely result in more rapid discoveries and, potentially, a greater understanding.

In order to facilitate data sharing we must define at least two interfaces with our peers:

1. A mechanism for them to understand the language of our data
2. A mechanism for passing on the context of our experiment

Of course, both of these must work in reverse; we must understand their data and also their experimental context. These are separate yet related ideas. Our data are meaningless without context, but because we are ‘close to the action’ we do not explicitly document that context.

Recording the nature of our experiments can have benefits closer to home. Too often we find ourselves searching for results that we know we recorded, but have difficulty locating. Then there is the issue of recalling the exact experimental procedure involved in the sample preparation or data reduction. Documentation of these will lead to better laboratory practice all round.

Earlier this year, a network of academic, clinical and industrial groups was constituted in the UK, with some international partners, to consider how best to push forward the use of infrared and Raman spectroscopies in the clinical arena: CLIRSPEC [1]. One of the work packages of the CLIRSPEC network is the development of standard protocols for data sharing. The work package falls, initially, into two parts:

1. How to easily and uniformly transfer our data between research teams and, by association, into an accessible archive.
2. How to record the provenance of our samples, the treatments they undergo, the experiments performed on them and the manner in which the resulting data were manipulated: the metadata.

In this presentation we will outline the current position of the CLIRSPEC work package, both in terms of the performance of various candidate data formats (JCAMP-DX, SPC, netCDF, …), and the options for the recording of the metadata associated with the experimental procedure (controlled vocabularies, XML, RDF, ISA-TAB, …). Included here is the concept of a minimum reporting requirement for IR and Raman, particularly in the clinical arena, that we can all try to meet.

None of this can happen without the buy-in of the community. We seek to engage everyone in a dialogue that will result in more consistent, and hopefully better, practice across all laboratories to further our understanding of clinical vibrational spectroscopy.

[1] http://clirspec.org

Transcript of What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Page 1: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

ALEX HENDERSON & PETER GARDNER MANCHESTER INSTITUTE OF BIOTECHNOLOGY

UNIVERSITY OF MANCHESTER, UK HTTP://GARDNER-LAB.COM & HTTP://CLIRSPEC.ORG

SPEC 2014 Shedding New Light on Disease

Kraków, Poland. 17-22 August 2014

WHAT’S MINE IS YOURS (AND VICE VERSA):

DATA SHARING IN VIBRATIONAL SPECTROSCOPY

Page 2: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Sharing…

Page 3: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Why share?

Technique validation

Round-robins

Standard spectra for unknown identification

Standard operating procedure validation

Test visualisation schemes

Remote location of special samples

Remote location of special equipment

Page 4: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

What to share?

Raw data files

E.g. for testing data processing procedures

Metadata for sample preparation

Sample SOP

Metadata for experimental procedure/protocol

Acquisition SOP

Processed data to save doing it yourself
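One way to keep sample-preparation and acquisition metadata attached to a raw data file is a small machine-readable record saved alongside it. The sketch below is illustrative only: the field names (`sample_sop`, `acquisition_sop` and so on), the list of required fields and the example values are all assumptions, not a CLIRSPEC schema.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical minimum set of fields; not a CLIRSPEC-defined schema.
REQUIRED_FIELDS = ("sample_id", "technique", "sample_sop", "acquisition_sop")


@dataclass
class ExperimentRecord:
    sample_id: str = ""
    technique: str = ""        # e.g. "FTIR" or "Raman"
    sample_sop: str = ""       # reference to the sample preparation SOP
    acquisition_sop: str = ""  # reference to the acquisition SOP
    raw_data_file: str = ""
    processing_steps: list = field(default_factory=list)


def missing_fields(record):
    """Return the names of required fields that were left empty."""
    values = asdict(record)
    return [name for name in REQUIRED_FIELDS if not values[name]]


def to_json(record):
    """Serialise the record so it can travel with the data file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_json(text):
    """Rebuild a record from its JSON form."""
    return ExperimentRecord(**json.loads(text))


# Made-up example values, with the acquisition SOP deliberately omitted.
record = ExperimentRecord(
    sample_id="example-001",
    technique="FTIR",
    sample_sop="sample-sop-v1.txt",
    raw_data_file="example-001.dat",
    processing_steps=["baseline correction", "vector normalisation"],
)

print(missing_fields(record))                # the acquisition SOP is reported as missing
print(from_json(to_json(record)) == record)  # the record survives a round trip
```

A check like `missing_fields` is one possible way to enforce a minimum reporting requirement before data leave the laboratory; the same idea would apply to XML, RDF or ISA-TAB records.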

Page 6: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

How to give?

Pen drive

CD

Email

Dropbox

ftp server

Data repository

One-to-one

One-to-few

One-to-more

One-to-all (best solution)

Page 7: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

How to receive?

Data in different file formats introduces a barrier to the end user

Disconnect between analysis software and file format

Incorrectly/poorly coded formats require additional information

(Hyper)spectral data disconnected from sample treatments or acquisition protocols

Page 8: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Third-party data analysis suites

Package       Author                     Platform
CytoSpec      Peter Lasch                MATLAB
hyperSpec     Claudia Beleites           R
ProSpect      Paul Bassan                MATLAB
SpecToolbox   Matt Baker (and friends)   MATLAB

Not an exhaustive list, email me your package info

Author must write import filter for each version of each vendor’s formats

Page 9: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Writing import filters

Slow

Laborious

Steep learning curve

Potential for error

Incomplete filter without sufficient test data

No access to file format specification/detail

IP issues with proprietary formats (NDA)

Some limited to (32-bit) Windows (e.g. DLL or DDE)
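To make the import-filter burden concrete, the following is a minimal sketch of how an analysis package might dispatch to per-format filters. The registry, the `load` function and the returned dictionary layout are hypothetical and not taken from any of the packages listed earlier; only a trivial CSV filter is shown, since real vendor formats need their own specifications.

```python
import os

# Maps a lower-case file extension to the function that can read it.
READERS = {}


def register(extension):
    """Decorator that registers an import filter for one file extension."""
    def decorator(func):
        READERS[extension.lower()] = func
        return func
    return decorator


@register(".csv")
def read_csv_spectrum(text):
    """Read 'x,y' pairs, one per line. No metadata survives in this format."""
    xs, ys = [], []
    for line in text.splitlines():
        if line.strip():
            x, y = line.split(",")
            xs.append(float(x))
            ys.append(float(y))
    return {"x": xs, "y": ys, "metadata": {}}


def load(filename, text):
    """Pick an import filter from the file extension and apply it."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        reader = READERS[extension]
    except KeyError:
        raise ValueError(f"no import filter for {extension!r}") from None
    return reader(text)


spectrum = load("example.CSV", "1000,0.1\n1002,0.2\n")
print(spectrum["x"], spectrum["y"])
```

Every additional vendor format, and every version of it, means another registered function in every package that wants to read it, which is the cost a single shared transfer format would avoid.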

Page 10: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Objectives 2014 – 2017

Developing

Understanding of interaction of light with clinical samples

Strategies for pre-processing and statistical analysis in clinical spectroscopy

Protocols

Preparation of cells, tissue and biofluids for clinical spectroscopy

Inter-group data sharing

Evidence

Power of spectroscopy for use in the clinical arena

Requirements of instrumentation suitable for use in the clinic

Clinical Infrared and Raman Spectroscopy for Medical Diagnosis

PARTNERS

ACADEMIC

Peter Gardner

Matthew J Baker

Nicholas Stone

Julian Moger

Josep Sulé-Suso

Francis Martin

Sergei G Kazarian

Hugh J Byrne

Roy Goodacre

John M Chalmers

Alex Henderson

Peter Lasch

Ganesh Sockalingum

Bayden Wood

Peter Weightman

Gianfelice Cinque

Peter Rich

CLINICAL

Noel Clarke

Jonathan Shanks

Timothy Dawson

Charles Davis

Pierre Martin-Hirsch

Hugh Barr

Neil Shepherd

John McGrath

Jim Brown

Sam Janes

INDUSTRIAL

Agilent

Bruker

Cobalt Light Systems

Coherent UK

Perkin Elmer

Renishaw

@clirspec http://clirspec.org/

Page 11: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

CLIRSPEC Work Package 6

Assess current spectral and image data attributes from the range of currently employed network instrumentation

Develop a standard data transfer format to allow free and easy dissemination of data between network members enhancing collaboration and efficiency of research funding

Provide a single software target, easing the development of third party software and its uptake within the clinical arena

Investigate the utility of standard spectra for specific diseases

Investigate the technological, cultural, ethical and IP issues in order to enable data sharing and reuse

Page 13: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Data format requirements

Operating system neutral

Scalable to large file sizes (futureproof)

Random access (don’t unzip before reading)

File format description available (NDA open)

Other software available that can read it

Quick to write and, more importantly, quick to read

Able to hold (encrypted) instrumental parameters

Enables round-tripping, no information loss
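The random access requirement can be illustrated with a toy binary layout: a fixed-size header followed by spectra stored one after another, so that a single spectrum is reached by seeking to a computed offset instead of reading the whole file. This is a sketch under stated assumptions; the `DEMO` magic value, the header layout and the function names are invented for illustration and do not correspond to any of the formats discussed here.

```python
import os
import struct
import tempfile

# Hypothetical header: 4-byte magic, then rows, cols, points per spectrum.
HEADER = struct.Struct("<4sIII")
VALUE_SIZE = 4  # each intensity is a little-endian float32


def write_cube(path, cube):
    """Write a rows x cols x points nested list as header + float32 values."""
    rows, cols, points = len(cube), len(cube[0]), len(cube[0][0])
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"DEMO", rows, cols, points))
        for row in cube:
            for spectrum in row:
                f.write(struct.pack(f"<{points}f", *spectrum))


def read_spectrum(path, row, col):
    """Read one pixel's spectrum by seeking directly to its byte offset."""
    with open(path, "rb") as f:
        magic, rows, cols, points = HEADER.unpack(f.read(HEADER.size))
        if magic != b"DEMO":
            raise ValueError("not a demo cube file")
        if not (0 <= row < rows and 0 <= col < cols):
            raise IndexError("pixel outside image")
        offset = HEADER.size + (row * cols + col) * points * VALUE_SIZE
        f.seek(offset)
        return list(struct.unpack(f"<{points}f", f.read(points * VALUE_SIZE)))


# A tiny 2 x 3 pixel image with 4 spectral points; values encode their position.
cube = [[[float(r * 100 + c * 10 + k) for k in range(4)]
         for c in range(3)] for r in range(2)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cube.bin")
    write_cube(path, cube)
    spectrum = read_spectrum(path, 1, 2)

print(spectrum)
```

Only the header and the bytes of the requested spectrum are read, which is the behaviour the "don't unzip before reading" requirement asks for. A whole-file compressed archive cannot offer this without being decompressed first.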

Page 14: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Open data formats – Spectra

JCAMP-DX

Over 4 compression systems

Some code available

Grams SPC

Understands spectroscopy types and units

Some import filters available

CSV/text

Simple to read

Not scalable

Not suitable for images

Loss of metadata
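The metadata loss noted for CSV/text can be contrasted with a format that carries labelled records alongside the data. The sketch below writes and reads a simplified, JCAMP-DX-style text block using `##LABEL=value` lines. It is an assumption-laden illustration, not a conforming JCAMP-DX implementation: it handles only plain x, y pairs, ignores the compression forms entirely, and the function names and example values are made up.

```python
def write_labelled(title, x_units, y_units, xs, ys):
    """Write a spectrum as labelled records followed by plain x, y pairs."""
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        "##DATA TYPE=INFRARED SPECTRUM",
        f"##XUNITS={x_units}",
        f"##YUNITS={y_units}",
        f"##NPOINTS={len(xs)}",
        "##XYPOINTS=(XY..XY)",
    ]
    # repr() keeps enough digits for floats to round-trip exactly.
    lines += [f"{x!r}, {y!r}" for x, y in zip(xs, ys)]
    lines.append("##END=")
    return "\n".join(lines) + "\n"


def read_labelled(text):
    """Parse the simplified block back into metadata and data arrays."""
    metadata, xs, ys = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            metadata[label.strip()] = value.strip()
        else:
            x, y = line.split(",")
            xs.append(float(x))
            ys.append(float(y))
    return metadata, xs, ys


# Made-up example spectrum.
xs = [1000.0, 1002.0, 1004.0]
ys = [0.12, 0.15, 0.11]
text = write_labelled("Example spectrum", "1/CM", "ABSORBANCE", xs, ys)

metadata, xs_back, ys_back = read_labelled(text)
print(metadata["XUNITS"], metadata["YUNITS"])
print(xs_back == xs and ys_back == ys)
```

The same numbers written as bare `x,y` lines would read back correctly, but the units and data type would have to be communicated by some other route.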

Page 15: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Hyperspectral images

Grams SPC

Pixel indexing issues, needs help

ENVI

Manual spectrum-centric or image-centric access

May require IDL library

NetCDF-4

Self-describing, accessed via libraries

Compression and streaming available

Page 16: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

3D confocal and tomographic

NetCDF-4

Unlimited dimensionality

Optimised spectrum-centric or image-centric access through ‘chunking’

Supported
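The effect of chunk shape on spectrum-centric versus image-centric access can be estimated with simple arithmetic. The sketch below does not use the NetCDF library; it only counts how many chunks a read would touch, assuming a hypothetical 256 x 256 pixel image with 1024 spectral points and three made-up chunk shapes.

```python
def chunks_touched(start, count, chunk_shape):
    """Count the chunks overlapped by a rectangular read.

    start and count give the first index and the number of elements read
    along each dimension; chunk_shape gives the chunk size per dimension.
    """
    total = 1
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c
        last = (s + n - 1) // c
        total *= last - first + 1
    return total


# Hypothetical chunk shapes as (rows, cols, spectral points).
layouts = {
    "spectrum-centric": (1, 1, 1024),
    "image-centric": (256, 256, 1),
    "compromise": (16, 16, 64),
}

results = {}
for name, chunk in layouts.items():
    # One full spectrum at pixel (10, 20).
    one_spectrum = chunks_touched((10, 20, 0), (1, 1, 1024), chunk)
    # One full image at spectral index 500.
    one_image = chunks_touched((0, 0, 500), (256, 256, 1), chunk)
    results[name] = (one_spectrum, one_image)
    print(name, one_spectrum, one_image)
```

Under these assumptions the spectrum-centric layout reads one spectrum from a single chunk but needs 65536 chunks for one image, the image-centric layout reverses that (1024 chunks per spectrum, 1 per image), and the compromise shape needs 16 and 256 respectively. Fewer chunks touched generally means less data read and decompressed, which is why the chunk shape is worth choosing to match the expected access pattern.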

Page 17: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Community input required

Data types that need to be supported

Irregularly shaped images

Collections of spectra

Discrete wavelength data (multispectral not hyperspectral)

Time course (multiple dependent variables)

Software

Filters written, format testing etc.

THINKING and PLANNING!!

Page 18: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Registration at http://clirspec.org

Page 19: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Groups at http://clirspec.org

Page 20: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Updates at http://clirspec.org

Page 21: What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy

Remember…