Cedar OnDemand: An intelligent browser extension to generate ontology-based metadata.
Syed Ahmad Chan Bukhari, PhD
Importance of Scientific Metadata
● Scientific data are generated by experiments or observations.
● Datasets must be accompanied by auxiliary information in order to be interpreted and accessed.
Metadata helps
● Make datasets more understandable for humans and processable by machines.
● Scientific data analysis, which often requires multiple datasets to be integrated across multiple repositories.
● Discovery across the large variety of scientific datasets, and support for reproducibility.
What is high-quality metadata?
● Datasets and their metadata should be identifiable globally, described using standardized terminologies, and available in a standardized machine-readable format.
Challenges with the generation of high-quality metadata
● The diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality.
Metadata Diversity in NCBI repositories
Current practices towards data standardization
● Scientific communities have developed templates incorporating detailed checklists of the metadata needed to describe particular types of experimental data sources.
● Minimum information standards such as
○ MIAME: Minimum Information About a Microarray Experiment
○ MIAPE: Minimum Information About a Proteomics Experiment
● What is the minimum amount of information (metadata) needed for reporting results in a reproducible and reusable fashion?
Metadata Standardization and availability
● A large number of public repositories use these community derived templates to collect metadata from users
● FAIRsharing provides a central catalog of existing standards and data formats.
CEDAR Advantages over conventional approaches
● Decrease authoring time
○ Suggest values
○ Pre-filling some of the fields
○ Extract metadata from unstructured sources
● Increase metadata quality (accurate, complete, standardized data)
○ No mistakes and inconsistencies
○ Validation (required values, format, data types)
○ Standardized metadata (ontologies)
○ Accurate, complete, standardized
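The validation step listed above (required values, format, data types) can be illustrated with a minimal sketch. The rule dictionary used here is a hypothetical format invented for illustration; it is not the CEDAR template model, and the field names are assumptions.

```python
import re


def validate(record, rules):
    """Check a metadata record against simple per-field rules.

    `rules` maps a field name to a dict with optional keys
    "required" (bool), "type" (a Python type) and "pattern" (a regex).
    This rule format is hypothetical and only illustrates the idea.
    """
    errors = []
    for name, rule in rules.items():
        value = record.get(name)
        if value is None or value == "":
            # Required values: report missing or empty fields.
            if rule.get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        # Data types: default to string when no type is given.
        expected = rule.get("type", str)
        if not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}")
            continue
        # Format: the whole string value must match the pattern.
        pattern = rule.get("pattern")
        if pattern and isinstance(value, str) and not re.fullmatch(pattern, value):
            errors.append(f"{name}: value does not match the expected format")
    return errors


rules = {
    "organism": {"required": True},
    "age": {"type": int},
    "doi": {"pattern": r"10\.\d{4,9}/\S+"},
}
print(validate({"organism": "", "age": "12", "doi": "10.1000/xyz"}, rules))
```

An empty list means the record passed every rule; each message names the field so a form could highlight it.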
CEDAR can help edit metadata within its environment
● CEDAR template design and metadata authoring are centralized.
● Outside of the CEDAR workbench, there are a number of existing portals providing conventional metadata submission environments.
● CEDAR OnDemand is a browser extension
○ An extension is essentially a small software program that can access the contents of a web page, modify them, and enhance the functionality of a web browser.
Most public data repositories provide web interfaces
● The lack of standardization in the collected metadata limits how broadly the source datasets can be discovered and reused.
● The creation of standardized metadata can be facilitated using standard vocabularies/ontologies.
● CEDAR has developed technologies to facilitate high-quality metadata authoring.
● While CEDAR has been working closely with several data providers to implement such pipelines, there is a communication and implementation overhead.
● The goal is to reach as many public biomedical data repositories as possible and to enable users to generate ontology-linked, standardized metadata within each repository's own environment.
● This approach enables the user to seamlessly enter ontologically-controlled metadata through existing web forms native to individual repositories.
● CEDAR OnDemand helps lower the barrier to incorporating ontologies into standardized metadata entry for public data repositories.
The key advantage of this approach is that it facilitates the creation of ontology-annotated metadata into existing web forms without requiring the individual repositories to change any code.
● CEDAR OnDemand helps users create standardized, machine-readable metadata on web forms accessible through the WWW.
● It can have its own interface to operate or can work seamlessly without providing any graphical interface.
● CEDAR OnDemand uses the CEDAR terminology API server and the NCBO web services to access ontologies available on BioPortal and to predict relevant metadata.
● Upon activation, the CEDAR OnDemand script analyses a web page's contents through the browser's Document Object Model (DOM), which defines the content, structure and style of an HTML document.
● To predict the field-specific ontology pool, the CEDAR OnDemand script takes the text associated with the input fields of a web page as input and invokes the CEDAR ontology server API through RESTful web services.
● To access the biomedical ontologies available on BioPortal through the CEDAR ontology server API, we use AJAX (Asynchronous JavaScript and XML). AJAX communicates with the CEDAR server asynchronously (in the background) through the XMLHttpRequest object to send and retrieve data.
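The two steps described above, collecting the text associated with input fields and turning it into a REST query, can be sketched as follows. The extension itself runs as JavaScript in the browser and reads the live DOM; this Python version only mirrors the logic on an HTML string. The helper names are hypothetical, the query targets the public NCBO search endpoint rather than the CEDAR terminology server (whose exact routes are not given here), and "KEY" is a placeholder API key.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# NCBO REST API search endpoint (assumption: used here in place of the CEDAR server).
SEARCH_ENDPOINT = "http://data.bioontology.org/search"


class FieldCollector(HTMLParser):
    """Collect text inputs and the <label for=...> text that describes them."""

    def __init__(self):
        super().__init__()
        self.labels = {}       # input id -> label text
        self.text_inputs = []  # ids of text inputs, in document order
        self._current_for = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._current_for = attrs.get("for")
            if self._current_for:
                self.labels[self._current_for] = ""
        elif tag == "input":
            # Only plain text inputs are read; password fields and the like are skipped.
            input_type = (attrs.get("type") or "text").lower()
            if input_type == "text" and attrs.get("id"):
                self.text_inputs.append(attrs["id"])

    def handle_endtag(self, tag):
        if tag == "label":
            self._current_for = None

    def handle_data(self, data):
        if self._current_for:
            self.labels[self._current_for] += data


def fields_with_labels(html):
    """Return (input id, associated label text) pairs for text inputs."""
    parser = FieldCollector()
    parser.feed(html)
    parser.close()
    return [(fid, " ".join(parser.labels.get(fid, "").split()))
            for fid in parser.text_inputs]


def build_search_url(query, ontologies, apikey):
    """Build a term-search URL restricted to the given ontology acronyms."""
    params = {"q": query, "ontologies": ",".join(ontologies), "apikey": apikey}
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")


page = ('<form><label for="org">Organism </label><input type="text" id="org">'
        '<label for="pw">Password</label><input type="password" id="pw"></form>')
print(fields_with_labels(page))
print(build_search_url("homo sapiens", ["NCBITAXON", "DOID"], "KEY"))
```

In the browser, the URL would be requested asynchronously (for example through XMLHttpRequest) and the returned classes offered as suggestions for the field.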
NCBO Tools and services in summary
[Figure: NCBO tools and services; URLs shown: http://data.bioontology.org, http://bioportal.bioontology.org]
● Ontology Search: Download, Traverse, Search, Comment
● Widgets: Tree-view, Auto-complete, Graph-view
● Annotator
● Recommender
● Mapping Services: Create, Download, Upload
● Term recognition
● Ontology association
● Class recommendation
● Our algorithm syntactically matches the keywords in the text associated with a field against the ontology descriptions and fetches the relevant ontology URI (Uniform Resource Identifier).
● To find the relevant ontology terms, our algorithm searches the domain ontologies first [NCBITAXON, DOID, GO, OBI, PR, CL].
● Our approach narrows down the scope of the ontology class search, which helps to provide relevant semantic vocabulary at run time.
● While functioning, CEDAR OnDemand displays the most relevant classes at run time while the user authors scientific metadata.
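A minimal sketch of the matching idea above, assuming a plain keyword-overlap score: ontologies whose description shares at least one keyword with the field text are kept, and the domain ontologies listed on the slide are ranked first. The scoring rule, the function names and the sample descriptions are assumptions for illustration, not the extension's actual algorithm.

```python
import re

# Domain ontologies that are searched first, as listed on the slide.
DOMAIN_ONTOLOGIES = ["NCBITAXON", "DOID", "GO", "OBI", "PR", "CL"]


def tokens(text):
    """Lower-case a string and split it into a set of alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_ontologies(field_text, descriptions, preferred=DOMAIN_ONTOLOGIES):
    """Rank ontology acronyms by keyword overlap with the field text.

    `descriptions` maps an ontology acronym to its description.
    Ontologies with no shared keyword are dropped; preferred (domain)
    ontologies come first, then higher overlap, then acronym order.
    """
    words = tokens(field_text)
    scored = []
    for acronym, description in descriptions.items():
        overlap = len(words & tokens(description))
        if overlap:
            scored.append((acronym not in preferred, -overlap, acronym))
    return [acronym for _, _, acronym in sorted(scored)]


# Hypothetical descriptions, for illustration only.
descriptions = {
    "DOID": "Human disease ontology",
    "NCBITAXON": "Organism taxonomy of species",
    "XYZ": "Disease names and disease codes",
}
print(rank_ontologies("Disease name", descriptions))
```

Exact token matching is deliberately naive: it ignores stop words and word forms ("name" does not match "names"), which is one reason missing or terse ontology descriptions make the selection hard.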
Challenges and limitations
● Auto-reading web page contents is a vulnerability: it could be used for browser-based eavesdropping attacks, e.g. on passwords or credit card numbers.
■ Control is given to users through manual activation
● Diversity in input fields, e.g. <input type=text, <div, <inputfield, <text
■ Supports <input type=text, <div, HTML5
■ Limited support for Twitter Bootstrap
● Right ontology selection. Most of the ontologies in BioPortal do not have definitions and descriptions.
■ A string mapping algorithm is currently used to fetch the right ontology ID
● Run-time delay
■ Limited to a set of ontologies
Future Work
● Topic-to-ontology prediction is the area where I plan to focus in future to increase precision.
● More metadata, e.g. definitions, needs to be displayed at run time; in the current setup it takes several minutes to display.
○ Downloading ontologies to a local server could be a possible solution
● An auto-filling feature based on the pre-filled fields would be a great addition
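To illustrate why a local copy of the ontologies could remove the run-time delay, here is a minimal sketch of a prefix index over already-downloaded term labels, so that auto-complete lookups need no network round trip. The class name, the (label, URI) input shape and the example.org URIs are hypothetical.

```python
from bisect import bisect_left


class LocalTermIndex:
    """Sorted in-memory index of (label, uri) pairs for prefix lookups."""

    def __init__(self, terms):
        # Sort by lower-cased label so that matching prefixes are contiguous.
        self._entries = sorted((label.lower(), label, uri) for label, uri in terms)
        self._keys = [entry[0] for entry in self._entries]

    def complete(self, prefix, limit=5):
        """Return up to `limit` (label, uri) pairs whose label starts with `prefix`."""
        prefix = prefix.lower()
        start = bisect_left(self._keys, prefix)  # first key >= prefix
        matches = []
        for key, label, uri in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) == limit:
                break
            matches.append((label, uri))
        return matches


# Placeholder terms; a real index would be filled from downloaded ontology files.
index = LocalTermIndex([
    ("Homo sapiens", "http://example.org/term/1"),
    ("Homo", "http://example.org/term/2"),
    ("Mus musculus", "http://example.org/term/3"),
])
print(index.complete("hom"))
```

Because the keys are sorted, each lookup is a binary search followed by a short scan, which stays fast even for large vocabularies.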
Summary
● CEDAR OnDemand is a Chrome browser extension that helps create standardized, high-quality metadata on web forms available on the web.
● It uses the cutting-edge ontology web services and tools available at the NCBO and in the CEDAR workbench and makes them available outside their working environment.
● CEDAR OnDemand is an application-independent browser extension which can work on mobile platforms as well.
Availability
● CEDAR OnDemand is freely available on the Chrome Web Store. Source code can be accessed on GitHub at http://github.com/ahmadchan/cedarondemand