Cedar OnDemand: An intelligent browser extension to generate ontology-based metadata.


Syed Ahmad Chan Bukhari, PhD

Importance of Scientific Metadata

● Scientific data are generated by experiments or observations.

● Datasets must be accompanied by auxiliary information (metadata) in order to be interpreted and accessed.

Metadata helps

● Make datasets more understandable for humans and processable by machines.

● Scientific data analysis, which often requires multiple datasets to be integrated across multiple repositories.

● Discovery across the large variety of scientific datasets, and support for reproducibility.

What is high-quality metadata?

● Datasets and their metadata should be globally identifiable, described using standardized terminologies, and available in a standardized machine-readable format.

Challenges with the generation of high-quality metadata

● The diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality.

Metadata Diversity in NCBI repositories

Current practices towards data standardization

● Scientific communities have developed templates incorporating detailed checklists of the metadata needed to describe particular types of experimental data sources.

● Minimum information standards such as

○ MIAME: Minimum Information About a Microarray Experiment

○ MIAPE: Minimum Information About a Proteomics Experiment

● These standards address the question: what is the minimum amount of information (metadata) needed to report results in a reproducible and reusable fashion?

Metadata Standardization and availability

● A large number of public repositories use these community-derived templates to collect metadata from users.

FAIRsharing provides a central catalog of existing standards and data formats.


CEDAR helps to generate FAIR metadata

CEDAR Advantages over conventional approaches

● Decrease authoring time

○ Suggest values

○ Pre-fill some of the fields

○ Extract metadata from unstructured sources

● Increase metadata quality (accurate, complete, standardized data)

○ No mistakes and inconsistencies

○ Validation (required values, format, data types)

○ Standardized metadata (ontologies)

CEDAR provides run-time recommendations

CEDAR helps edit metadata within its own environment

● CEDAR template design and metadata authoring are centralized.

● Outside of the CEDAR workbench, there are a number of existing portals providing conventional metadata submission environments.

● CEDAR OnDemand is a browser extension.

○ An extension is essentially a small software program that can access the contents of a web page, modify them, and enhance the functionality of a web browser.

Most public data repositories provide web interfaces

● The lack of standardization in the collected metadata limits how broadly the source datasets can be discovered and reused.

● The creation of standardized metadata can be facilitated using standard vocabularies/ontologies.

● CEDAR has developed technologies to facilitate high-quality metadata authoring.

● While CEDAR has been working closely with several data providers to implement such pipelines, there is a communication and implementation overhead.

● The goal is to reach as many public biomedical data repositories as possible and to enable users to generate ontology-linked, standardized metadata within each repository's own environment.

● This approach enables the user to seamlessly enter ontologically controlled metadata through existing web forms native to individual repositories.

● CEDAR OnDemand helps lower the barrier to incorporating ontologies into standardized metadata entry for public data repositories.

The key advantage of this approach is that it facilitates the creation of ontology-annotated metadata in existing web forms without requiring the individual repositories to change any code.

A manifest file is the entry point for the Chrome extension script to take action

● CEDAR OnDemand helps users create standardized, machine-readable metadata on web forms accessible through the web.

● It can have its own interface to operate, or it can work seamlessly without providing any graphical interface.

● CEDAR OnDemand utilizes the CEDAR terminology API server and the NCBO web services to access the ontologies available on BioPortal and to predict relevant metadata.

● Upon activation, the CEDAR OnDemand script analyses a web page's contents through the browser's Document Object Model (DOM), which defines the content, structure and style of an HTML document.

● To predict the field-specific ontology pool, the CEDAR OnDemand script takes the text associated with the input fields in a web page as input and invokes the CEDAR ontology server API through RESTful web services.

● To access the biomedical ontologies available on BioPortal through the CEDAR ontology server API, we use AJAX (Asynchronous JavaScript and XML). AJAX communicates with the CEDAR server asynchronously (in the background) through the XMLHttpRequest object to send and retrieve data.
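The two steps described above (collecting the text associated with each input field, then querying an ontology service with it) can be sketched offline in Python. This is only an illustration, not the extension's code: the extension itself runs as JavaScript in the browser and calls the CEDAR server. The names `LabelCollector`, `field_labels` and `search_url` are hypothetical, and the `/search` path with `q`, `ontologies` and `apikey` parameters is an assumption about the BioPortal REST service at http://data.bioontology.org that should be checked against the NCBO documentation.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class LabelCollector(HTMLParser):
    """Collect the text of <label for="..."> elements, keyed by field id."""

    def __init__(self):
        super().__init__()
        self.labels = {}       # field id -> label text
        self._current = None   # id targeted by the label currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._current = dict(attrs).get("for")
            if self._current is not None:
                self.labels[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.labels[self._current] += data

    def handle_endtag(self, tag):
        if tag == "label":
            self._current = None


def field_labels(html):
    """Return {field id: whitespace-normalized label text} for a form."""
    parser = LabelCollector()
    parser.feed(html)
    parser.close()
    return {fid: " ".join(text.split()) for fid, text in parser.labels.items()}


def search_url(term, ontologies, api_key, base="http://data.bioontology.org"):
    """Build a term-search URL (endpoint and parameter names are assumed)."""
    query = urlencode(
        {"q": term, "ontologies": ",".join(ontologies), "apikey": api_key}
    )
    return f"{base}/search?{query}"
```

For a form containing `<label for="org">Organism</label>`, `field_labels` would return `{"org": "Organism"}`, and that label text could then be passed to `search_url` together with a restricted list of ontology acronyms.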

NCBO tools and services in summary

[Figure: NCBO tools and services, available at http://data.bioontology.org and http://bioportal.bioontology.org]

● Ontology Search: download, traverse, search, comment

● Widgets: tree-view, auto-complete, graph-view

● Annotator

● Recommender

● Mapping Services: create, download, upload

● Term recognition, ontology association, class recommendation

● Our algorithm syntactically matches the keywords in the text associated with a field against the ontology descriptions and fetches the relevant ontology URI (Uniform Resource Identifier).

● To find the relevant ontology terms, our algorithm searches the domain ontologies first (NCBITAXON, DOID, GO, OBI, PR, CL).

● Our approach narrows down the scope of the ontology class search, which helps to provide relevant semantic vocabulary at run time.

● While functioning, CEDAR OnDemand displays the most relevant classes at run time as the user authors scientific metadata.
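The matching idea above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the extension's actual algorithm: "syntactic matching" is reduced to counting shared keywords, the function returns ontology acronyms rather than URIs, and the names `tokens` and `rank_ontologies` are hypothetical. Only the list of domain ontologies comes from the slide.

```python
import re

# Domain ontologies searched first, as listed on the slide.
DOMAIN_ONTOLOGIES = ["NCBITAXON", "DOID", "GO", "OBI", "PR", "CL"]


def tokens(text):
    """Lower-case a string and split it into a set of alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_ontologies(field_text, descriptions, preferred=DOMAIN_ONTOLOGIES):
    """Rank ontologies by keyword overlap with the field text.

    descriptions maps an ontology acronym to its free-text description.
    Ontologies with no shared keyword are dropped; preferred (domain)
    ontologies come first, then higher overlap, then alphabetical order.
    """
    field = tokens(field_text)
    scored = []
    for acronym, description in descriptions.items():
        overlap = len(field & tokens(description))
        if overlap:
            scored.append((acronym not in preferred, -overlap, acronym))
    return [acronym for _, _, acronym in sorted(scored)]
```

With made-up descriptions such as `{"CL": "Cell Ontology: cell types in animals", "XYZ": "Hypothetical catalogue of cell type and disease names"}`, the field text "Cell type" ranks `CL` ahead of `XYZ` even though `XYZ` shares more keywords, because domain ontologies are consulted first.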

CEDAR OnDemand In action


Other potential usage of CEDAR OnDemand

What could be other application areas?

Challenges and limitations

● Auto-reading web page contents is a vulnerability: it could be used for browser-based eavesdropping attacks, e.g. on passwords or credit card numbers.

■ Control is given to users through manual activation.

● Diversity of input fields, e.g. <input type=text, <div, <inputfield, <text

■ Supports <input type=text, <div, HTML5

■ Limited support for Twitter Bootstrap

● Selecting the right ontology. Most of the ontologies in BioPortal do not have definitions and descriptions.

■ A string-mapping algorithm is currently used to fetch the right ontology ID.

● Run-time delay

■ Limited to a set of ontologies

Future Work

● Topic-to-ontology prediction is the area where I plan to focus in future, in order to increase precision.

● More metadata, e.g. definitions, needs to be displayed at run time. In the current setup this takes several minutes to display.

○ Downloading ontologies to a local server could be a possible solution.

● An auto-filling feature based on the pre-filled fields would be a great addition.

Summary

● CEDAR OnDemand is a Chrome browser extension that helps to create standardized, high-quality metadata in web forms available on the web.

● It utilizes the functionality of cutting-edge ontology web services and tools available at the NCBO and the CEDAR workbench and makes them available outside their own working environment.

● CEDAR OnDemand is an application-independent browser extension which can work on mobile platforms as well.

Availability

● CEDAR OnDemand is freely available on the Chrome Web Store. The source code can be accessed on GitHub at http://github.com/ahmadchan/cedarondemand

Acknowledgement

Kei-Hoi Cheung, Yale University, Dept. of Medical Informatics

Kleinstein Lab, Yale University, Dept. of Pathology