What's mine is yours (and vice versa) Data sharing in vibrational spectroscopy
ALEX HENDERSON & PETER GARDNER MANCHESTER INSTITUTE OF BIOTECHNOLOGY
UNIVERSITY OF MANCHESTER, UK HTTP://GARDNER-LAB.COM & HTTP://CLIRSPEC.ORG
SPEC 2014 Shedding New Light on Disease
Kraków, Poland. 17-22 August 2014
WHAT’S MINE IS YOURS (AND VICE VERSA):
DATA SHARING IN VIBRATIONAL SPECTROSCOPY
Sharing…
Why share?
Technique validation
Round-robins
Standard spectra for unknown identification
Standard operating procedure validation
Test visualisation schemes
Remote location of special samples
Remote location of special equipment
What to share?
Raw data files
E.g. for testing data processing procedures
Metadata for sample preparation
Sample SOP
Metadata for experimental procedure/protocol
Acquisition SOP
Processed data to save doing it yourself
How to give?
Pen drive
CD
Dropbox
ftp server
Data repository
One-to-one
One-to-few
One-to-more
One-to-all (best solution)
How to receive?
Data in different file formats introduces a barrier to end user
Disconnect between analysis software and file format
Incorrectly/poorly coded formats require additional information
(hyper)Spectral data disconnected from sample treatments or acquisition protocols
Third-party data analysis suites
Package       Author                      Platform
CytoSpec      Peter Lasch                 MATLAB
hyperSpec     Claudia Beleites            R
ProSpect      Paul Bassan                 MATLAB
SpecToolbox   Matt Baker (and friends)    MATLAB
…
Not an exhaustive list, email me your package info
Author must write import filter for each version of each vendor’s formats
Writing import filters
Slow
Laborious
Steep learning curve
Potential for error
Incomplete filter without sufficient test data
No access to file format specification/detail
IP issues with proprietary formats (NDA)
Some limited to (32-bit) Windows (e.g. DLL or DDE)
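To picture the burden listed above, it can help to think of analysis software as a registry of reader functions, one per file format. The following is a minimal sketch under stated assumptions: `register_filter`, `read_csv_spectrum` and `load_spectrum` are hypothetical names made up for illustration, not functions from any of the packages in the table, and the two-column CSV layout stands in for a real vendor format.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict, List, TextIO, Tuple

Spectrum = Tuple[List[float], List[float]]  # (x values, intensities)

# Hypothetical registry: one import filter per file extension.
_FILTERS: Dict[str, Callable[[TextIO], Spectrum]] = {}


def register_filter(extension: str):
    """Decorator that registers a reader function for one file extension."""
    def decorator(func):
        _FILTERS[extension.lower()] = func
        return func
    return decorator


@register_filter(".csv")
def read_csv_spectrum(stream: TextIO) -> Spectrum:
    """Read two-column text: x value, intensity (an assumed layout)."""
    xs, ys = [], []
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys


def load_spectrum(filename: str, stream: TextIO) -> Spectrum:
    """Dispatch to the filter for this extension, or fail loudly."""
    extension = Path(filename).suffix.lower()
    try:
        reader = _FILTERS[extension]
    except KeyError:
        raise ValueError(f"no import filter written for {extension!r} files") from None
    return reader(stream)


if __name__ == "__main__":
    demo = io.StringIO("1000.0,0.12\n1002.0,0.15\n")
    print(load_spectrum("demo.csv", demo))
```

Every format (and every version of it) that users bring along needs another entry in the registry, which is the work a single shared target format would remove.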
Objectives 2014 – 2017
Developing
Understanding of interaction of light with clinical samples
Strategies for pre-processing and statistical analysis in clinical spectroscopy
Protocols
Preparation of cells, tissue and biofluids for clinical spectroscopy
Inter-group data sharing
Evidence
Power of spectroscopy for use in the clinical arena
Requirements of instrumentation suitable for use in the clinic
Clinical Infrared and Raman Spectroscopy for Medical Diagnosis
PARTNERS
ACADEMIC
Peter Gardner
Matthew J Baker
Nicholas Stone
Julian Moger
Josep Sulé-Suso
Francis Martin
Sergei G Kazarian
Hugh J Byrne
Roy Goodacre
John M Chalmers
Alex Henderson
Peter Lasch
Ganesh Sockalingum
Bayden Wood
Peter Weightman
Gianfelice Cinque
Peter Rich
CLINICAL
Noel Clarke
Jonathan Shanks
Timothy Dawson
Charles Davis
Pierre Martin-Hirsch
Hugh Barr
Neil Shepherd
John McGrath
Jim Brown
Sam Janes
INDUSTRIAL
Agilent
Bruker
Cobalt Light Systems
Coherent UK
Perkin Elmer
Renishaw
@clirspec http://clirspec.org/
CLIRSPEC Work Package 6
Assess current spectral and image data attributes from the range of currently employed network instrumentation
Develop a standard data transfer format to allow free and easy dissemination of data between network members, enhancing collaboration and efficiency of research funding
Provide a single software target, easing the development of third party software and its uptake within the clinical arena
Investigate the utility of standard spectra for specific diseases
Investigate the technological, cultural, ethical and IP issues in order to enable data sharing and reuse
Data format requirements
Operating system neutral
Scalable to large file sizes (futureproof)
Random access (don’t unzip before reading)
File format description available (NDA open)
Other software available that can read it
Quick to write and, more importantly, quick to read
Able to hold (encrypted) instrumental parameters
Enables round-tripping, no information loss
…
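Three of the requirements above (operating system neutral, random access, round-tripping) can be shown with a toy binary layout. This is only a sketch: the layout (an 8-byte header followed by fixed-size records) and the function names `write_spectra` and `read_spectrum` are assumptions made for illustration, not a proposed standard. Declaring the byte order explicitly keeps the file readable on any platform, fixed-size records let a reader seek straight to one spectrum, and storing 64-bit floats means values come back unchanged.

```python
import io
import struct
from typing import BinaryIO, List, Sequence

# Little-endian header: number of spectra, points per spectrum.
HEADER = struct.Struct("<II")


def write_spectra(stream: BinaryIO, spectra: Sequence[Sequence[float]]) -> None:
    """Write equal-length spectra as fixed-size 64-bit float records."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_points = len(spectra[0])
    stream.write(HEADER.pack(len(spectra), n_points))
    record = struct.Struct(f"<{n_points}d")
    for spectrum in spectra:
        stream.write(record.pack(*spectrum))


def read_spectrum(stream: BinaryIO, index: int) -> List[float]:
    """Read one spectrum by seeking to it, without reading the others."""
    stream.seek(0)
    n_spectra, n_points = HEADER.unpack(stream.read(HEADER.size))
    if not 0 <= index < n_spectra:
        raise IndexError(f"spectrum index {index} out of range")
    record = struct.Struct(f"<{n_points}d")
    stream.seek(HEADER.size + index * record.size)
    return list(record.unpack(stream.read(record.size)))


if __name__ == "__main__":
    original = [[0.1, 0.2, 0.3], [1.5, 2.5, 3.5]]
    buffer = io.BytesIO()
    write_spectra(buffer, original)
    # Round trip: the second spectrum comes back exactly as written.
    print(read_spectrum(buffer, 1) == original[1])
```

A real format would also have to carry the metadata and instrumental parameters listed above, which this toy layout leaves out.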
Open data formats – Spectra
JCAMP-DX
Over 4 compression systems
Some code available
Grams SPC
Understands spectroscopy types and units
Some import filters available
CSV/text
Simple to read
Not scalable
Not suitable for images
Loss of metadata
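The "loss of metadata" point for CSV/text can be seen in a small round trip. This is an illustrative sketch: the record layout and the field names `x_unit` and `y_unit` are hypothetical. Plain CSV has no agreed place for units or acquisition details, so a simple two-column export carries only the numbers.

```python
import csv
import io
from typing import Dict, List


def to_csv(record: Dict) -> str:
    """Export only what a two-column CSV can hold: the numeric pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(zip(record["wavenumber"], record["intensity"]))
    return buffer.getvalue()


def from_csv(text: str) -> Dict[str, List[float]]:
    """Read the pairs back; every value arrives as text and must be converted."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return {
        "wavenumber": [float(row[0]) for row in rows],
        "intensity": [float(row[1]) for row in rows],
    }


if __name__ == "__main__":
    record = {
        "x_unit": "cm^-1",        # hypothetical metadata fields
        "y_unit": "absorbance",
        "wavenumber": [1000.0, 1002.0, 1004.0],
        "intensity": [0.12, 0.15, 0.11],
    }
    restored = from_csv(to_csv(record))
    # The numbers survive, the units do not.
    print(sorted(set(record) - set(restored)))  # prints ['x_unit', 'y_unit']
```

Comment lines or extra header rows can smuggle metadata in, but each such convention is private to whoever wrote the file, which brings back the import filter problem.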
Hyperspectral images
Grams SPC
Pixel indexing issues, needs help
ENVI
Manual spectrum-centric or image-centric access
May require IDL library
NetCDF-4
Self-describing, accessed via libraries
Compression and streaming available
3D confocal and tomographic
NetCDF-4
Unlimited dimensionality
Optimised spectrum-centric or image-centric access through ‘chunking’
Supported
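Why chunking matters for spectrum-centric versus image-centric access can be estimated with simple arithmetic on chunk shapes. The sketch below does not call the NetCDF library; it only counts how many chunks a reader would have to touch, and the function names and the example cube size are assumptions chosen for illustration.

```python
from math import ceil
from typing import Tuple

Shape = Tuple[int, int, int]  # (rows, columns, spectral channels)


def chunks_per_spectrum(shape: Shape, chunk: Shape) -> int:
    """Chunks touched when reading every channel of a single pixel."""
    return ceil(shape[2] / chunk[2])


def chunks_per_image(shape: Shape, chunk: Shape) -> int:
    """Chunks touched when reading one channel across every pixel."""
    return ceil(shape[0] / chunk[0]) * ceil(shape[1] / chunk[1])


if __name__ == "__main__":
    cube = (128, 128, 1600)  # hypothetical hyperspectral image
    layouts = {
        "spectrum-centric": (1, 1, 1600),
        "image-centric": (128, 128, 1),
        "balanced": (16, 16, 100),
    }
    for name, chunk in layouts.items():
        print(name, chunks_per_spectrum(cube, chunk), chunks_per_image(cube, chunk))
    # spectrum-centric 1 16384
    # image-centric 1600 1
    # balanced 16 64
```

A layout tuned for one access pattern is very costly for the other, while a balanced chunk shape keeps both within reach, which is the trade-off the chunking option exposes.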
Community input required
Data types that need to be supported
Irregularly shaped images
Collections of spectra
Discrete wavelength data (multispectral not hyperspectral)
Time course (multiple dependent variables)
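As one possible starting point for discussion, several of the data types above can share a single in-memory shape: a list of spectra on a common axis, with an optional pixel position for each. The class below is a hypothetical sketch, not a proposed CLIRSPEC structure. A plain collection has no positions, an irregularly shaped image has positions that do not fill a rectangle, and multispectral data simply has a short axis of discrete wavelengths. Time courses are not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpectralCollection:
    """Hypothetical container: spectra on a shared axis, optionally placed."""
    axis: List[float]                                   # shared abscissa
    spectra: List[List[float]]                          # one list per spectrum
    positions: Optional[List[Tuple[int, int]]] = None   # (row, column) each

    def __post_init__(self) -> None:
        for spectrum in self.spectra:
            if len(spectrum) != len(self.axis):
                raise ValueError("every spectrum must match the axis length")
        if self.positions is not None and len(self.positions) != len(self.spectra):
            raise ValueError("one position is required per spectrum")

    def is_full_rectangle(self) -> bool:
        """True if the positions cover every row/column combination present."""
        if not self.positions:
            return False
        rows = {row for row, _ in self.positions}
        columns = {column for _, column in self.positions}
        return len(set(self.positions)) == len(rows) * len(columns)


if __name__ == "__main__":
    axis = [1000.0, 1002.0]
    l_shape = SpectralCollection(
        axis=axis,
        spectra=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        positions=[(0, 0), (0, 1), (1, 0)],  # pixel (1, 1) was never measured
    )
    print(l_shape.is_full_rectangle())  # prints False
```

Storing positions explicitly avoids padding irregular images with dummy spectra, at the cost of an extra lookup when a regular grid is wanted.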
Software
Filters written, format testing etc.
THINKING and PLANNING!!
Registration at http://clirspec.org
Groups at http://clirspec.org
Updates at http://clirspec.org
Remember…