Linked Experimental Data in Life Sciences
Alejandra González-Beltrán, PhD
Oxford e-Research Centre, University of Oxford
@alegonbel
UKON 2014, April 24th, 2014, Birmingham, UK
[Figure: Experimental workflow. A data scientist either performs a new experiment or uses existing data, across the stages Planning, Data Collection, Data Management, Analysis, Visualization and Publication. Later builds of the figure annotate the workflow with metadata and data, and highlight Data Interoperability and Data Reusability.]
Formats & Database Fragmentation
The Investigation/Study/Assay (ISA) infrastructure:
• generic format for experimental description and data exchange
• open source software tools
• community engagement
Experimental workflow - graph representation
[Figure: two H. Sapiens sources, H1 and H2, annotated with age (33 and 35 years). H1 yields samples H1.sample1 and H1.sample2, and H2 yields H2.sample1. Labeling produces labeled extracts such as H1.sample1.labeled and H2.sample1.labeled, and Scanning produces the raw data files h1-s1.cel, h1-s2.cel and h2-s1.cel.]
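As a minimal sketch (Python, standard library only), the graph above can be held as a list of (input, process, output) edges and walked in either direction: upstream to recover the provenance of a data file, downstream to find every file derived from a source. The edge list, the use of None for a plain derivation step and the helper names (index_edges, trace_upstream, downstream_leaves) are illustrative assumptions, not part of the ISA tools; H1.sample2.labeled is borrowed from the RDF view later in the deck.

```python
from collections import defaultdict

# Hypothetical edge list (input, process, output) for the workflow in the figure.
# process is None where the figure shows a plain derivation (source -> sample).
# H1.sample2.labeled is not visible in this figure; it comes from the RDF slide.
WORKFLOW = [
    ("H1", None, "H1.sample1"),
    ("H1", None, "H1.sample2"),
    ("H2", None, "H2.sample1"),
    ("H1.sample1", "Labeling", "H1.sample1.labeled"),
    ("H1.sample2", "Labeling", "H1.sample2.labeled"),
    ("H2.sample1", "Labeling", "H2.sample1.labeled"),
    ("H1.sample1.labeled", "Scanning", "h1-s1.cel"),
    ("H1.sample2.labeled", "Scanning", "h1-s2.cel"),
    ("H2.sample1.labeled", "Scanning", "h2-s1.cel"),
]


def index_edges(edges):
    """Build upstream (parents) and downstream (children) adjacency maps."""
    parents, children = defaultdict(list), defaultdict(list)
    for src, _process, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)
    return parents, children


def trace_upstream(node, parents):
    """Follow the first parent at each step until a source node is reached."""
    path = [node]
    while parents.get(path[-1]):
        path.append(parents[path[-1]][0])
    return list(reversed(path))


def downstream_leaves(node, children):
    """Collect every end product (a node without children) derived from node."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        if children.get(current):
            stack.extend(children[current])
        else:
            leaves.append(current)
    return sorted(leaves)


parents, children = index_edges(WORKFLOW)
print(trace_upstream("h2-s1.cel", parents))  # ['H2', 'H2.sample1', 'H2.sample1.labeled', 'h2-s1.cel']
print(downstream_leaves("H1", children))     # ['h1-s1.cel', 'h1-s2.cel']
```

Keeping the process name on each edge costs nothing here but lets the same structure be reused when the graph is later annotated with protocol information.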
Spreadsheets for end-users: a vocabulary for the description of the experimental workflow
[Figure: the same workflow laid out as a spreadsheet, one row per path from source to raw data file:
H. Sapiens | H1 | 35 Years | H1.sample1 | Labeling | H1.sample1.labeled | Scanning | h1-s1.cel
H. Sapiens | H1 | 35 Years | H1.sample2 | ... | Scanning | h1-s2.cel
H. Sapiens | H2 | 33 Years | H2.sample1 | Labeling | H2.sample1.labeled | Scanning | h2-s1.cel]
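The spreadsheet view can be turned back into graph edges by reading each row left to right. The sketch below assumes a tab-separated table whose headers follow ISA-Tab naming conventions ("Source Name", "Sample Name", "Protocol REF", "Labeled Extract Name", "Raw Data File"); the table contents, the NODE_COLUMNS set and the function name rows_to_edges are hypothetical illustrations, not the ISA tools' parser, and H1.sample2.labeled is taken from the RDF slide.

```python
import csv
import io

# Hypothetical ISA-Tab-style table (tab-separated); values follow the slides.
ISA_TAB = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[age]\tUnit\t"
    "Sample Name\tProtocol REF\tLabeled Extract Name\tProtocol REF\tRaw Data File\n"
    "H1\tHomo sapiens\t35\tyear\tH1.sample1\tLabeling\tH1.sample1.labeled\tScanning\th1-s1.cel\n"
    "H1\tHomo sapiens\t35\tyear\tH1.sample2\tLabeling\tH1.sample2.labeled\tScanning\th1-s2.cel\n"
    "H2\tHomo sapiens\t33\tyear\tH2.sample1\tLabeling\tH2.sample1.labeled\tScanning\th2-s1.cel\n"
)

# Columns whose values are nodes of the workflow graph (assumed subset).
NODE_COLUMNS = {"Source Name", "Sample Name", "Extract Name",
                "Labeled Extract Name", "Raw Data File"}


def rows_to_edges(text):
    """Turn each spreadsheet row into (input, protocol, output) edges.

    csv.reader is used instead of csv.DictReader because ISA-Tab repeats
    headers such as 'Protocol REF', which a dict would collapse into one key.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    edges = set()  # an edge shared by several rows is stored only once
    for row in reader:
        previous, protocol = None, None
        for column, value in zip(header, row):
            if column == "Protocol REF":
                protocol = value  # applies to the next node column
            elif column in NODE_COLUMNS:
                if previous is not None:
                    edges.add((previous, protocol, value))
                previous, protocol = value, None
    return edges


edges = rows_to_edges(ISA_TAB)
for edge in sorted(edges, key=lambda e: (e[0], e[2])):
    print(edge)
```

Characteristics and Unit columns are simply skipped here; a fuller parser would attach them to the preceding node as annotations.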
Syntactic interoperability across biological experiments of different types
Support for Ontology Annotation
Semantic interoperability across biological experiments of different types
Machine-readable representation: Graph + Semantics
[Figure: RDF graph for source H1. H1 is annotated as obi:material entity and tax:homo sapiens; H1.sample1 and H1.sample2 (obi:material sample) are linked to H1 by bfo:derives_from. H1.sample1 obi:is_specified_input_of labeling1 (obi:material processing), and H1.sample1.labeled (obi:processed material) obi:is_specified_output_of labeling1. H1.sample1.labeled is in turn the specified input of scanning1 (obi:planned process), whose specified output is h1-s1.cel (isa:raw data file). labeling1 isa:executes the labeling protocol (obi:protocol). H1.sample2 follows the same path via labeling2, H1.sample2.labeled and scanning2 to h1-s2.cel.]
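A minimal sketch of how workflow edges might be rewritten into triples using the relation names shown on this slide. The prefixed names are copied from the figure as plain strings (a real conversion would expand them to full ontology IRIs), and minting one process instance per edge (labeling1, scanning2, ...) is an assumption made for illustration; the function name edges_to_triples is hypothetical.

```python
from collections import Counter

# Hypothetical (input, protocol, output) edges for source H1; protocol is None
# for a plain derivation (sample taken from a source).
EDGES = [
    ("H1", None, "H1.sample1"),
    ("H1", None, "H1.sample2"),
    ("H1.sample1", "labeling", "H1.sample1.labeled"),
    ("H1.sample2", "labeling", "H1.sample2.labeled"),
    ("H1.sample1.labeled", "scanning", "h1-s1.cel"),
    ("H1.sample2.labeled", "scanning", "h1-s2.cel"),
]


def edges_to_triples(edges):
    """Map (input, protocol, output) edges to subject-predicate-object triples."""
    counters = Counter()
    triples = []
    for inp, protocol, out in edges:
        if protocol is None:
            # Plain derivation, e.g. a sample taken from a source.
            triples.append((out, "bfo:derives_from", inp))
            continue
        counters[protocol] += 1
        process = f"{protocol}{counters[protocol]}"  # labeling1, labeling2, ...
        triples.extend([
            (inp, "obi:is_specified_input_of", process),
            (out, "obi:is_specified_output_of", process),
            (process, "isa:executes", f"{protocol} protocol"),
        ])
    return triples


for triple in edges_to_triples(EDGES):
    print(triple)
```

Treating every edge as a separate process instance keeps the sketch short; grouping rows that share a process (for example one scanning run producing several files) would need an explicit process identifier in the input.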
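[Figure: conversion architecture. Inputs: ISA-Tab files (edited with spreadsheet tools such as MS Excel, OpenOffice & more), ISA configuration files and ISA mapping files. Components: ISA model, Graph Analyzer, ISA Mapping Parser, IRI Generator, Ontology Lookup and Conversion Engine, with intermediate outputs ISA graph, IRIs and ISA mappings.]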
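The separation between mapping files and conversion engine can be illustrated with a small sketch: the mapping lives in data, and the code that mints IRIs and types nodes never names an ontology. The CSV mapping format, the base namespace http://example.org/isa/, the use of rdf:type and the functions load_mapping, mint_iri and type_triples are all hypothetical; the actual ISA mapping file format is not shown on the slide.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical mapping file: each ISA concept is paired with an ontology term.
# Swapping this file (e.g. for SIO or PROV-O terms) changes the output
# vocabulary without touching the conversion code below.
MAPPING_CSV = (
    "isa_concept,ontology_term\n"
    "Source Name,obi:material entity\n"
    "Sample Name,obi:material sample\n"
    "Labeled Extract Name,obi:processed material\n"
    "Raw Data File,isa:raw data file\n"
)
BASE_IRI = "http://example.org/isa/"  # hypothetical namespace


def load_mapping(text):
    """Read the mapping file into a {concept: term} dictionary."""
    return {row["isa_concept"]: row["ontology_term"]
            for row in csv.DictReader(io.StringIO(text))}


def mint_iri(concept, label, base=BASE_IRI):
    """Deterministic IRI per (concept, label); the label is percent-encoded."""
    return base + quote(concept.replace(" ", "_")) + "/" + quote(label, safe="")


def type_triples(nodes, mapping):
    """Type each (concept, label) node via the mapping; unmapped concepts are skipped."""
    return [(mint_iri(concept, label), "rdf:type", mapping[concept])
            for concept, label in nodes if concept in mapping]


mapping = load_mapping(MAPPING_CSV)
for triple in type_triples([("Sample Name", "H1.sample1"),
                            ("Raw Data File", "h1-s1.cel")], mapping):
    print(triple)
```

Deriving the IRI from concept and label means the same node always receives the same identifier across runs, which is what allows graphs from separate conversions to be merged.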
• Ontology search and automated tagging on Google Spreadsheets (relying on NCBO BioPortal services, and also on LOV services in the 2nd version)
• Collaborative annotation; support for distributed users
• Version control & history
OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets. Maguire et al. (2013), Bioinformatics
ISA-OBI mapping: Ontology for Biomedical Investigations (OBI)
Also mappings to SIO, PROV-O
[Figure: investigation, studies, assays; measurement technology]
Underlying RDF representation
Bio-GraphIIn web application
http://isa-tools.github.io/soapdenovo2/
http://isa-tools.github.io/stato/
• General-purpose statistics ontology
• Coverage for processes (e.g. statistical tests and their conditions of application) and for information needed for or resulting from statistical methods (e.g. probability distributions, variables, spread and variation metrics)
• STATO also benefits from: (i) extensive documentation, providing textual and formal definitions; (ii) associated R code snippets, attached with the dedicated R-command metadata tag, to facilitate teaching and learning with the popular R language; (iii) documented query examples, showing how reviewers, tutors and students alike can harness the ontology.
Developed in collaboration with Dr Burke, Senior Statistician, Nuffield Department of Population Health, University of Oxford
Linked Experimental Data
• Importance of data interoperability for experimental data
• Fragmentation of formats and databases
• ISA-TAB for syntactic interoperability
• ISA-OWL for semantic interoperability
• Decoupling conversion engine from semantic framework
• Support for data integration and uniform semantic queries across experiments, enabled by a common semantic framework (provided by ISA2OWL)
• Application: Bio-GraphIIn, SOAPdenovo2 use case
• STATO (STATistical Ontology) for annotation of statistical analysis results
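Why a common semantic framework enables uniform queries can be shown with a toy basic-graph-pattern matcher: once two experiments of different types share the same terms, one query finds the raw data files of both. This is a sketch standing in for SPARQL over a triple store, not the query mechanism used by the ISA tools; the sequencing experiment, its file run1.fastq, the use of rdf:type and the function name match are hypothetical.

```python
# Two hypothetical experiments already converted to triples with shared terms.
MICROARRAY = [
    ("h1-s1.cel", "rdf:type", "isa:raw data file"),
    ("h1-s1.cel", "obi:is_specified_output_of", "scanning1"),
]
SEQUENCING = [  # hypothetical second experiment of a different type
    ("run1.fastq", "rdf:type", "isa:raw data file"),
    ("run1.fastq", "obi:is_specified_output_of", "sequencing1"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying every pattern ('?x' marks a variable)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no break: this triple is consistent with the pattern
            yield from match(rest, triples, candidate)


query = [("?file", "rdf:type", "isa:raw data file"),
         ("?file", "obi:is_specified_output_of", "?process")]
for result in match(query, MICROARRAY + SEQUENCING):
    print(result)
# {'?file': 'h1-s1.cel', '?process': 'scanning1'}
# {'?file': 'run1.fastq', '?process': 'sequencing1'}
```

The query never mentions microarrays or sequencing; it relies only on the shared vocabulary, which is the property the common semantic framework is meant to provide.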
funders
Questions?
View our blog http://isatools.wordpress.com
Follow us on Twitter @isatools
View our website http://www.isa-tools.org
View our Git repo & contribute http://github.com/ISA-tools
Thanks for your attention!