LIDER workshop, Munich 13th of July 2015
Semantics for Integrated
Laboratory Analytical Processes
The Allotrope Perspective
Heiner Oberkampf
slide 2
Agenda
Initial Situation
Allotrope Foundation
Approach and IT-Solution
Allotrope Data Format
Domain Taxonomies
Use Cases
slide 3
Laboratory Analytical Processes
sample data analytical process
slide 4
Laboratory Analytical Processes
Application 1
Application 2 Application 3
slide 5
Common Problems
It’s hard to find data based on intuitive starting points (e.g. study, project, analyst, technique).
It’s hard to integrate data from different labs, instruments, or online/offline sources because the file formats differ.
It’s hard to mine a collection of data because the details and context of the experiment are stored somewhere else.
Data can’t be interpreted later because the context is incomplete, inconsistent, and often free text.
Instrument and software interoperability is limited… at best.
slide 6
Allotrope Data Format
slide 7
Allotrope Foundation
Member Companies: AbbVie, Amgen, Baxter, Bayer, Biogen,
Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly,
Genentech/Roche, GlaxoSmithKline, Merck & Co., Pfizer
Secretariat: Drinker Biddle
Project Management
Legal & Logistics Support
Professional Software Firm: OSTHUS
Framework development
Technical leadership
Partner Network: ACD/Labs, Agilent Technologies, BIOVIA,
BSSN Software, Erasmus MC, IDBS, Mestrelab Research, Mettler
Toledo, Sartorius, Shimadzu, Thermo Scientific, University of
Southampton, Waters
slide 8
Allotrope Data Format (ADF)
ADF is based on the Hierarchical Data Format (HDF 5), which is specifically designed to store
and organize large amounts of numerical data.
slide 9
API Stack
The Allotrope Framework provides APIs to read and write data
contained in ADF
Thus, developers do not have to concern themselves with RDF,
SPARQL, semantics or complex graph patterns.
Platform independent file format (HDF 5)
Data Package API, Data Cube API
Data Description API (Apache Jena)
Analytical Data API
Taxonomies
Triple Store API
slide 10
Allotrope Foundation Taxonomies (AFT)
slide 11
Scope and Current Status
13 analytical techniques are
already implemented:
small molecules:
• gas chromatography
• Karl Fischer
• liquid chromatography
• mass spectrometry
• nuclear magnetic resonance spectroscopy
• thermogravimetric analysis
• ultraviolet spectrometry
large molecules:
• capillary electrophoresis
• cell counter
• cell culture analyzer
• blood gas analysis
both:
• balance
• pH
Number of Classes (chart values as extracted; series labels not recoverable): 530, 140, 2220, 270
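The technique list on this slide can be sketched as a small taxonomy. The technique labels are taken from the slide; modelling the grouping as SKOS-style "broader" links, and treating "small molecules" and "large molecules" as parent concepts, are assumptions made for illustration and may not match how AFT is actually structured.

```python
# Technique labels come from the slide. The broader-link modelling and the
# two grouping concepts used as parents are assumptions for illustration.
BROADER = {
    "gas chromatography": ["small molecules"],
    "Karl Fischer": ["small molecules"],
    "liquid chromatography": ["small molecules"],
    "mass spectrometry": ["small molecules"],
    "nuclear magnetic resonance spectroscopy": ["small molecules"],
    "thermogravimetric analysis": ["small molecules"],
    "ultraviolet spectrometry": ["small molecules"],
    "capillary electrophoresis": ["large molecules"],
    "cell counter": ["large molecules"],
    "cell culture analyzer": ["large molecules"],
    "blood gas analysis": ["large molecules"],
    # Listed under "both" on the slide, so they get two parents.
    "balance": ["small molecules", "large molecules"],
    "pH": ["small molecules", "large molecules"],
}


def narrower(concept: str) -> list[str]:
    """Return every technique whose broader concepts include `concept`."""
    return sorted(t for t, parents in BROADER.items() if concept in parents)


technique_count = len(BROADER)
shared = sorted(t for t, parents in BROADER.items() if len(parents) > 1)

print(technique_count)
print(shared)
print(narrower("large molecules"))
```

Allowing more than one parent per concept is what lets the "both" entries appear under each group without being duplicated.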
slide 12
Reused Vocabularies and Ontologies
Directly imported:
Simple Knowledge Organization System (SKOS)
Quantities, Units, Dimensions and Data Types Ontologies (QUDT)
The RDF Data Cube Vocabulary (QB)
Partly reused definitions:
Chemical Methods Ontology (CHMO)
Proteomics Standards Initiative – Mass Spectrometry (PSI-MS)
International Union of Pure and Applied Chemistry (IUPAC)
…
slide 13
Analytical Workflow
slide 14
Analytical Workflow
The basic analytical workflow and data flow are standardized
slide 15
Liquid Chromatography Mass Spectrometry
Data set of rank 2
Additional dimensions:
- sample
- retention time
- device
- …
Only metadata is expressed in RDF,
while the numeric data is natively
represented in HDF 5.
The ADF Data Cube Ontology
provides the mapping between RDF
metadata descriptions and physical
storage in HDF 5.
Axes: mass, ion count
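The split described on this slide, with metadata in a graph and numbers in array storage, can be sketched with standard-library stand-ins. Everything named here is an assumption: the `ex:` predicates, the storage path, the sample values, and the layout of the rank-2 data set as rows of (mass, ion count). A real ADF file would keep the numbers in HDF 5 and the description in RDF.

```python
from array import array

# Stand-in for HDF 5 storage: path -> flat, row-major buffer of doubles.
# Path and values are made up for illustration.
STORAGE = {
    "/data-cubes/cube-1": array("d", [100.0, 12.0, 101.0, 250.0, 102.0, 31.0]),
}

# Stand-in for the RDF description: (subject, predicate) -> object.
# The "ex:" predicates are hypothetical, not terms of the ADF Data Cube Ontology.
METADATA = {
    ("ex:cube-1", "ex:storedAt"): "/data-cubes/cube-1",
    ("ex:cube-1", "ex:shape"): (3, 2),
    ("ex:cube-1", "ex:columns"): ("mass", "ion count"),
}


def read_component(cube: str, component: str) -> list[float]:
    """Use the metadata to locate the buffer and slice out one column."""
    path = METADATA[(cube, "ex:storedAt")]
    rows, cols = METADATA[(cube, "ex:shape")]
    col = METADATA[(cube, "ex:columns")].index(component)
    flat = STORAGE[path]
    return [flat[r * cols + col] for r in range(rows)]


masses = read_component("ex:cube-1", "mass")
counts = read_component("ex:cube-1", "ion count")
# Mass of the most intense signal in this toy data set.
base_peak_mass = masses[counts.index(max(counts))]
print(base_peak_mass)
```

The numeric buffer carries no meaning on its own; the description supplies the location, shape, and column semantics needed to read it.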
slide 16
Imaging Mass Spectrometry
Nature Reviews Cancer 10, 639-646
(September 2010) | doi:10.1038/nrc2917
slide 17
High Performance Liquid Chromatography
Base URL in Registry: http://registry.mycompany.com/systems/hplc/hplc-uv/
Linked Data Platform relative URLs under HPLC-UV:
<HPLCSystem1>
<HPLCSystem1/QuaternarySolventManager>
<HPLCSystem1/SampleManager>
<HPLCSystem1/ColumnManager>
<HPLCSystem1/PDADetector>
Relation shown in the diagram: af-e:has component
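How the relative URLs on this slide resolve against the registry base URL can be sketched with Python's standard `urllib.parse.urljoin`. The base URL, the relative paths, and the `af-e:has component` label are taken from the slide; the direction of the relation (system to component) and the triple layout are assumptions.

```python
from urllib.parse import urljoin

# Base URL and relative paths as shown on the slide.
BASE = "http://registry.mycompany.com/systems/hplc/hplc-uv/"
SYSTEM = "HPLCSystem1"
COMPONENTS = [
    "HPLCSystem1/QuaternarySolventManager",
    "HPLCSystem1/SampleManager",
    "HPLCSystem1/ColumnManager",
    "HPLCSystem1/PDADetector",
]

# Because the base ends with "/", relative paths are appended beneath it.
system_url = urljoin(BASE, SYSTEM)
component_urls = [urljoin(BASE, rel) for rel in COMPONENTS]

# Assumed direction: the system is the subject, each component the object.
triples = [(system_url, "af-e:has component", url) for url in component_urls]

for s, p, o in triples:
    print(s, p, o)
```

Keeping the component identifiers relative means the same description stays valid if the registry is moved to a different base URL.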
slide 18
Conclusion
Initially: Experiments were performed to get approval for
drugs.
Today: Experiments generate data that can be used in many
different contexts.
Why Semantics?
Semantics provides a good framework for standardized data descriptions
and is needed to realize the potential of the available data.
Linked Data makes it possible to relate information stored in ADF to
additional context, e.g. materials, devices, chemicals,
processes, locations.
slide 19
Questions?
Heiner Oberkampf
www.osthus.com
Allotrope Foundation:
www.allotrope.org