The Electronic Notebook Ontology

Post on 15-Apr-2017


The Electronic Notebook Ontology

Stuart J. Chalk
Department of Chemistry, University of North Florida
schalk@unf.edu

VIVO 2015 – August 2015

Motivation
Inspiration
Electronic Scientific Notebooks
The Experiment Markup Language
VIVO-ISF Ontology
HCLS Community Profiles
Analysis
Important Questions
Ontology
Conclusion

Outline

There’s something missing from the big data landscape in science…

VIVO captures data about scientists (faculty)…
…but not about the data they produce

HCLS Community Profile outlines metadata for describing datasets but does not mention laboratory notebooks

Electronic laboratory notebooks are set to become the standard way scientists capture data

How do we link these together?

Motivation

Scientists need to move to digital notebooks…

...and record not just the data but the flow and context

Traditional Laboratory Notebooks

How science is done is important for searching, aggregation, meta-analysis

Developed out of Laboratory Information Management Systems (LIMS)

Content Management System for Scientists
Storage of
    Research data
    Research resources (instruments, samples, scientists)
    The story of the scientific endeavor
Link to external resources
Display chemical structures
Allow aggregation, processing of data
Be compliant with industry standard record keeping

Electronic Laboratory Notebooks


A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)

Many datatypes (will expand…)

Experiment Markup Language (ExptML)

Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition

Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result

Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

ExptML Ontology

VIVO-ISF Ontology

https://wiki.duraspace.org/download/attachments/51052811/PeopleOrgsRolesGrants.2014-03-14.png

The Healthcare and Life Science (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group.

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.

Data Descriptions: HCLS Community Profile

http://www.w3.org/TR/hcls-dataset/

Describes three levels for description of datasets

Summary Level
    Type declaration (rdf:type = dctypes:Dataset)
    Title (dct:title = rdf:langString)
    Description (dct:description = rdf:langString)
    Publisher (dct:publisher = IRI)

Version Level
    Type declaration (rdf:type = dctypes:Dataset)
    Title (dct:title = rdf:langString)
    Description (dct:description = rdf:langString)
    Creator (dct:creator = IRI)
    Publisher (dct:publisher = IRI)
    Version identifier (pav:version = xsd:string)
    Version linking (dct:isVersionOf = IRI)

Distribution Level
    Type declaration (rdf:type = void:Dataset OR dcat:Distribution)
    Title (dct:title = rdf:langString)
    Description (dct:description = rdf:langString)
    Creator (dct:creator = IRI)
    Publisher (dct:publisher = IRI)
    License (dct:license = IRI)

Data Descriptions: HCLS Community Profile

http://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels

Goal: Automated identification of datasets that could be made searchable and/or distributable

When an ELN functions, what does it do?
    Orchestrates access to the system (authentication)
    Supplies GUI to allow information to be
        Displayed
        Entered
        Processed
    Processes files to bring them into the system
    Sends requests to internal/external servers to get data

Analysis

Is this information a dataset?

Does the dataset belong to this author?

Is the dataset available?

Is there appropriate metadata?

At what HCLS levels can this dataset be made available?

What mechanism is used to make the dataset available?

Important Questions
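The question of which HCLS levels a dataset can be made available at could be approached as a simple metadata completeness check. The sketch below assumes the ELN exposes a record's metadata as a dictionary keyed by property name; the `REQUIRED` table, the function names, and the example record are hypothetical. The table mirrors only the properties listed on the earlier slide (with the license recorded as `dct:license`), not the full set of requirement levels in the HCLS note.

```python
# Properties per level, as listed on the slide (an assumption of this sketch,
# not the complete HCLS requirements).
REQUIRED = {
    "summary": {"rdf:type", "dct:title", "dct:description", "dct:publisher"},
    "version": {
        "rdf:type", "dct:title", "dct:description", "dct:creator",
        "dct:publisher", "pav:version", "dct:isVersionOf",
    },
    "distribution": {
        "rdf:type", "dct:title", "dct:description", "dct:creator",
        "dct:publisher", "dct:license",
    },
}


def _present(metadata):
    """Return the set of property names that have a non-empty value."""
    return {name for name, value in metadata.items() if value}


def available_levels(metadata):
    """List the levels whose listed properties are all present.

    Only presence is checked; the rdf:type value itself (for example
    dctypes:Dataset versus dcat:Distribution) is not validated.
    """
    present = _present(metadata)
    return [level for level, needed in REQUIRED.items() if needed <= present]


def missing_properties(metadata, level):
    """List the properties still needed before the given level is satisfied."""
    return sorted(REQUIRED[level] - _present(metadata))


# Hypothetical notebook record; IRIs are placeholders.
record = {
    "rdf:type": "dctypes:Dataset",
    "dct:title": "Titration results",
    "dct:description": "pH readings recorded during an acid-base titration",
    "dct:publisher": "http://example.org/org/chemistry-department",
    "dct:creator": "http://example.org/person/1",
}

print(available_levels(record))               # ['summary']
print(missing_properties(record, "version"))  # ['dct:isVersionOf', 'pav:version']
```

Reporting the missing properties, rather than a yes/no answer alone, would let an ELN prompt the user for the metadata needed to reach the next level.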

Actions that deal with datasets
    Software actions
    User actions

Clues that something is research data (not metadata or someone else’s data)

Collection of metadata for annotation of datasets

Inference that an HCLS dataset has been created

Dataset Identification

Electronic Notebook Ontology (ENO)


Providing a mechanism to link research data to VIVO profiles would
    Add value to VIVO
    Provide faculty with a resource for their data management plans
    Create opportunities for automatic aggregation of research data into institutional repositories

Needs to be implemented in a test ELN…

Take Home

schalk@unf.edu
Phone: 904-620-1938
Skype: stuartchalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013

Questions?