D1.3 GRANATUM Biomedical Semantic Model · German Cancer Research Centre (DKFZ) Clarissa Gerhäuser...

FP7‐ICT‐2009‐6

Project Partners: FIT (DE), NUIG‐DERI (IE), CYBION (IT), CERTH (GR), UCY/CBC (CY), UCY/CS

(CY), DKFZ (DE), UBITECH (GR)

Every effort has been made to ensure that all statements and information contained herein are

accurate, however the Partners accept no liability for any error or omission in the same.

© Copyright in this document remains vested in the Project Partners.

Project Number 270139

D1.3 – GRANATUM Biomedical Semantic Model

Version 0.7

1 February 2012 Draft

EC Distribution

CERTH with input from NUIG-DERI, UCY/CBC, DKFZ and UBITECH

D1.3 – GRANATUM Biomedical Semantic Model

31 January 2012 Version 1.0 Page ii

Confidentiality: EC Distribution

PROJECT PARTNER CONTACT INFORMATION

Centre of Research and Technology

Hellas (CERTH)

Prof. Konstantinos Tarabanis

6th Klm. Charilaou - Thermi Road

P.O. BOX 60361 GR - 570 01

Thermi, Thessaloniki - Greece

Tel.: +30 231 08 91 578

Fax : +30 231 08 91 509

E-mail: [email protected]

Fraunhofer FIT (FIT)

Wolfgang Prinz

Schloss Birlinghoven

53754 Sankt Augustin, Germany

Tel.: +49 2241 142730

Fax : +49 2241 142080

E-mail: wolfgang,[email protected]

University of Cyprus –

Cancer Biology and Chemoprevention

Laboratory, Department of Biological

Sciences (UCY/CBC)

Christiana Neophytou

Panepistimioupolis

1678 Lefkosia – Cyprus

Tel.: + 357 22892725

Fax : + 35 22892881


CYBION

Via della Scrofa, 117

00186 Roma - Italia

Tel.: +39 6 68 65 975

Fax : +39 6 68 80 69 97

E-mail:

GIOUMPITEK - Meleti Schediasmos

Ylopoiisi kai Polisi Ergon Pliroforikis

EPE (UBITECH)

Thanassis Bouras

Mesogeion Avenue 429 & Chalandriou 3,

15343 Agia Paraskevi, Athens, Greece

Tel.: +30 211 7005570

Fax : +30 211 7005571


University of Cyprus - Department of

Computer Sciences (UCY/CS)

Christos Kannas

P.O. Box 20537

1678 Nicosia, CYPRUS

Tel.: + 357 99530608

Fax : + 357 22892701


German Cancer Research Centre

(DKFZ)

Clarissa Gerhäuser

Im Neuenheimer Feld 280

69120 Heidelberg - Germany

Tel.: +49 6221 42 3306

Fax : +49 6221 42 3359

E-mail:

National University of Ireland, Galway -

Digital Enterprise Research Institute

(NUIG-DERI)

Helena F. Deus

IDA business park, Lower Dangan

Galway, Ireland

Tel.: +353 91495270

Fax : +353 91 495541

Email: [email protected]




CONTRIBUTORS

Partner Name | Short Name | Nationality
Fraunhofer Gesellschaft - Institut für Angewandte Informationstechnik | FIT | DE
National University of Ireland, Galway (NUI, Galway) - Digital Enterprise Research Institute (DERI) | NUIG-DERI | IE
CYBION Srl. | CYBION | IT
Centre of Research and Technology Hellas | CERTH | GR
University of Cyprus, Cancer Biology and Chemoprevention Laboratory, Department of Biological Sciences | UCY/CBC | CY
University of Cyprus, Department of Computer Sciences | UCY/CS | CY
German Cancer Research Centre (Deutsches Krebsforschungszentrum) | DKFZ | DE
GIOUMPITEK - Meleti Schediasmos Ylopoiisi kai Polisi Ergon Pliroforikis EPE | UBITECH | GR




DOCUMENT CONTROL

Version | Status | Date
0.1 | Changes to all sections | 1 October 2011
0.2 | Changes to all sections | 25 October 2011
0.3 | Bottom-up and top-down construction of the model | 19 December 2011
0.4 | Model specification | 21 December 2011
0.5 | Model evaluation questionnaire | 12 January 2012
0.6 | Integration of evaluation results | 25 January 2012
0.7 | Integrate comments from trial partners related to the model | 1 February 2012




TABLE OF CONTENTS

EXECUTIVE SUMMARY
1. INTRODUCTION
1.1. DOCUMENT SCOPE
1.2. MOTIVATION
1.3. METHODOLOGY
1.4. DOCUMENT STRUCTURE
2. DESIGN PROCESS
2.1. SPECIFICATION
2.2. CONCEPTUALIZATION
2.2.1. Top-Down Conceptualization
2.2.1.1. Existing Ontologies/models
2.2.1.1.1. Literature representation Ontologies
2.2.1.1.2. Biomedical Ontologies
2.2.1.2. Top-Down Concept identification
2.2.1.2.1. Literature Domain
2.2.1.2.2. Experiment Domain
2.2.1.2.3. Biomedical Domain
2.2.2. Bottom up construction of the GRANATUM model
2.2.2.1. Analysis of Publicly available datasets
2.2.2.1.1. Existing datasets accessed through SPARQL endpoints
2.2.2.1.2. Existing datasets accessed through searching
2.2.2.1.3. Concept/attribute identification from Existing datasets
2.2.2.2. Experimental data analysis
2.2.2.3. Requirements analysis
3. GRANATUM BIOMEDICAL SEMANTIC MODEL FORMALIZATION
3.1. OVERVIEW
3.2. MODEL SPECIFICATION
3.3. MODEL IMPLEMENTATION
4. MODEL EVALUATION
4.1. EVALUATION CRITERIA
4.2. EVALUATION METHODOLOGY
4.3. EVALUATION RESULTS
4.3.1. Usability evaluation
4.3.2. Correctness and completeness evaluation
5. CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIX




TABLE OF FIGURES

Figure 1 GRANATUM Vision for Bridging Biomedical Researchers’ Knowledge and Information Gap
Figure 2 Methodology for building the GRANATUM Biomedical Semantic Model
Figure 3 GRANATUM Biomedical Semantic Model scope
Figure 4 Bottom-up and Top-down conceptualization of the GRANATUM Biomedical Semantic Model
Figure 5 Conceptualization of the first experimental data set
Figure 6 Conceptualization of the second experimental data set
Figure 7 Overview of the GRANATUM Biomedical Semantic Model




TABLE OF TABLES

Table 1 Functional requirements that require the use of Ontology
Table 2 Concepts identified from the analysis of the non-functional requirements
Table 3 Concepts identified in the literature domain
Table 4 Properties identified in the literature domain
Table 5 Concepts identified in the experiment domain
Table 6 Properties identified in the experiment domain
Table 7 Concepts identified in the biomedical domain
Table 8 Properties identified in the biomedical domain
Table 9 Concepts detected into dataset (accessed through SPARQL endpoints or through searching)
Table 10 Concepts derive from experimental data analysis
Table 11 Properties derive from experimental data analysis
Table 12 Concepts derived from the Use Cases and the Questionnaires
Table 13 Ontology evaluation criteria
Table 14 Methodologies for ontology evaluation
Table 15 Evaluation criteria for each evaluation methodology
Table 16 Usability evaluation
Table 17 Changes based on evaluation




EXECUTIVE SUMMARY

The present document is Deliverable 1.3 “GRANATUM Biomedical Semantic Model”

(henceforth referred to as D1.3) of the GRANATUM project. The main objective of this

document is to define, design and document the GRANATUM Biomedical Semantic Model,

which is one of the pillars on which the GRANATUM approach will build. The GRANATUM

Biomedical Semantic Model will comprise a set of core concepts and their relationships. It

will be possible to further specialize these core concepts or introduce new concepts, thus

ensuring the extensibility and adaptability of the ontology to the needs of different cases. The

model will contribute to the realization of the GRANATUM vision by carrying the required

semantics in WPs 2 to 6.

The methodology followed for the creation of the GRANATUM Biomedical Semantic Model is based
on a combination of known methodologies for ontology creation. The following steps have been
carried out for the creation of the ontology: (i) Specification of the ontology’s scope, uses,
end-users and granularity of the concepts that should be taken into account; (ii) Identification
of the concepts and relations of the ontology using a “meet-in-the-middle” approach that
combines a bottom-up and a top-down methodology (the output of this process is a conceptual
model of the ontology); (iii) Formalization of the conceptual model using a standard template;
(iv) Implementation of the ontology using a standardized language such as OWL; (v) Evaluation
of the accuracy and completeness of the produced ontology; and (vi) Continuous maintenance of
the ontology to improve it.

The main step of the methodology was the Conceptualization, in which the concepts and
properties of the ontology were identified. On the one hand, the concepts and properties
emerged in a bottom-up fashion by: (i) analyzing the top and medium priority requirements
identified in D1.1, (ii) analyzing existing data sets (e.g. databases and SPARQL endpoints) and
(iii) analyzing experimental data related to cancer chemoprevention that were provided by the
trial partners. On the other hand, concepts were elicited following a top-down approach by
analyzing existing ontologies (e.g. Gene Ontology, Experimental Factor Ontology, National
Cancer Institute Thesaurus etc.) and identifying concepts/relations related to the GRANATUM
Biomedical Semantic Model. The concepts and properties of both methodologies (bottom-up and
top-down) were merged in order to create the conceptual model of the ontology.

Afterwards, the conceptual model was formally defined using a standard template that specifies

the class hierarchy and the properties used by each class. The OWL implementation uses the

classes and properties as they are represented by the formally defined conceptual model.

The next step was the evaluation of the ontology. The evaluation examined the accuracy and

completeness of the ontology. The trial partners (i.e. DKFZ, IITRI and UCY/CBC) were actively

involved in the evaluation process in order to identify inconsistencies and

weaknesses of the ontology.

Finally, the Maintenance of the ontology is a continuous process that will be carried out

throughout the GRANATUM project: needs that emerge will be detected and the ontology will be

altered accordingly to handle them.




1. INTRODUCTION

The aim of this section is to present the background of the work pursued during Task 1.3. The

scope and the main objectives which have guided this work are introduced in section 1.1. Section

1.2 presents the motivation for creating a semantic model. The methodology followed is

described in section 1.3. Last, section 1.4 presents the organization of the current deliverable.

1.1. DOCUMENT SCOPE

The present document is Deliverable 1.3 “D1.3 – GRANATUM Biomedical Semantic Model”

(henceforth referred to as D1.3) of the GRANATUM project. The main objective of this

document is to define and document the outcome of Task 1.3, providing the specification of the

GRANATUM common reference ontological model for describing, sharing and linking cancer

chemoprevention significant Web resources. In order to capture the semantics of the biomedical

domain, the GRANATUM platform utilizes a lightweight ontological model, called the

GRANATUM Biomedical Semantic Model. The creation of the GRANATUM Biomedical

Semantic Model follows a methodology (Section 1.3) based on a set of existing methodologies

for defining ontologies.

The GRANATUM Biomedical Semantic Model constitutes one of the pillars of the

GRANATUM platform that will fulfil the GRANATUM vision.

The vision of the GRANATUM project is to bridge the information, knowledge and collaboration gap

among biomedical researchers in Europe and beyond, ensuring that the biomedical scientific community

has homogenized access to the globally available information and data resources needed to perform

complex cancer chemoprevention experiments and conduct studies on large scale datasets (Figure 1). In

this way, GRANATUM will facilitate the social sharing and collective analysis of biomedical experts’

knowledge and experience, as well as the joint conceptualization and design of scalable chemoprevention

models and simulators, towards the enablement of collaborative biomedical research activities beyond

geographical barriers, helping researchers in this highly multidisciplinary field to manage the complex

range of tasks involved in carrying out collaborative research.





Figure 1 GRANATUM Vision for Bridging Biomedical Researchers’ Knowledge and Information Gap

1.2. MOTIVATION

In order to capture the semantics of the biomedical domain, the GRANATUM project will

develop a lightweight ontological model, called the GRANATUM Biomedical Semantic Model,

which will be one of the pillars on which the GRANATUM approach will build. The

GRANATUM Biomedical Semantic Model will comprise a set of core concepts and their

relationships. It will be possible to further specialize these core concepts (through sub-

concepting) or introduce new concepts, thus ensuring the extensibility and adaptability of the

ontology to the needs of different cases.
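As a minimal sketch of the extensibility described above, the snippet below keeps core concepts and their specializations in a small registry and prints them as Turtle-style class and `rdfs:subClassOf` statements. The concept names (`Experiment`, `InVitroExperiment`, `CellLineAssay`), the `gr:` prefix and the `ConceptHierarchy` helper are hypothetical placeholders chosen for illustration; they are not taken from the GRANATUM model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConceptHierarchy:
    """Toy registry of concepts; each concept has at most one parent."""
    parents: Dict[str, Optional[str]] = field(default_factory=dict)

    def add_core(self, name: str) -> None:
        # Core concepts sit at the top of the hierarchy (no parent).
        self.parents[name] = None

    def specialize(self, parent: str, child: str) -> None:
        # Sub-concepting: a new concept refines an existing one.
        if parent not in self.parents:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parents[child] = parent

    def ancestors(self, name: str) -> List[str]:
        # Walk up the hierarchy from a concept to its core concept.
        chain = []
        current = self.parents[name]
        while current is not None:
            chain.append(current)
            current = self.parents[current]
        return chain

    def to_turtle(self, prefix: str = "gr") -> str:
        # Emit one class declaration per concept, plus subclass links.
        lines = []
        for name, parent in self.parents.items():
            lines.append(f"{prefix}:{name} a owl:Class .")
            if parent is not None:
                lines.append(f"{prefix}:{name} rdfs:subClassOf {prefix}:{parent} .")
        return "\n".join(lines)


# Hypothetical example names, for illustration only.
model = ConceptHierarchy()
model.add_core("Experiment")
model.specialize("Experiment", "InVitroExperiment")
model.specialize("InVitroExperiment", "CellLineAssay")
print(model.to_turtle())
```

New concepts can be added at any depth without touching the existing entries, which is the property the text refers to as extensibility.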

The need for a Biomedical Semantic Model that is formalized as an ontology derives from the
functional and non-functional requirements presented in the Requirement Analysis in D1.1.
Specifically, Table 1 lists the functional requirements and indicates, for each one, whether
fulfilling it presupposes the use of a model/ontology. Similarly, Table 2 lists the
non-functional requirements and indicates whether fulfilling each one assumes the use of an
ontology. 15 of the 20 functional requirements require an ontology in order to be satisfied,
while only 2 non-functional requirements, both related to interoperability, require the use
of an ontology.




Requirement | Use of ontology
F1. Search and access information (including publications, pathways, epigenomics, genes, proteins, agents, drugs, clinical trials, etc.) derived from biomedical databases/libraries | Yes
F2. Customize search based on advanced criteria e.g. include/exclude a database or an attribute | Yes
F3. Integrate/combine information derived from different biomedical databases/libraries | Yes
F4. Build/edit a hypothesis scenario | Yes
F5. Support in silico discovery | Yes
F6. Support the collaboration with distributed partners/groups and the sharing of data/opinions/expertise | Yes
F7. Manage data and resources | Yes
F8. Manage teams and assign roles | -
F9. Push data into different tools (visualization, statistical, data analysis etc.) | Yes
F10. Support knowledge extraction from scientific publications | Yes
F11. Add feedback and comments on the quality of data | Yes
F12. Advise on conflicting data | Yes
F13. Support the preparation of a grant proposal | -
F14. Support user profiles in the GRANATUM platform | -
F15. Manage the personalized space in the GRANATUM platform | -
F16. Receive and view information from multiple feeds | Yes
F17. Support data query workflows (send the results of one database query into the next database query) | Yes
F18. Present the searching results and the incoming feeds adapted to user’s profile and preference criteria | Yes
F19. Manage IPRs | -
F20. Recommend relevant information based on domain ontology, user profile and current activities | Yes

Table 1 Functional requirements that require the use of Ontology




Requirement Type | Requirement | Use of ontology
Functionality (Security) | NF1. Support data security | -
Functionality (Security) | NF2. Support user authentication, authorization and role-based access control sign-in | -
Functionality (Security) | NF3. Support data privacy, confidentiality | -
Functionality (Security) | NF4. Support security proof | -
Functionality (Interoperability) | NF5. Support interoperability between different formats (compatible with many standards, data transformations) | Yes
Functionality (Interoperability) | NF6. Support commonly used standards, data models, widely available tools, standard syntax, open API, technologies, methodologies, and best practices | Yes
Reliability | NF7. Support Reliability | -
Usability (Operability) | NF8. Provide a presentation interface customized to the user’s profile/role and interests (my projects, my team, my data, my deadlines etc.) | -
Usability (Operability) | NF9. Provide a unified interface with many tools and functionalities bundled together | -
Usability (Understandability) | NF10. Easy-to-use interface (filtering, navigation, etc.) | -
Usability (Understandability) | NF11. Support an understandable and intuitive interface | -
Usability (Understandability) | NF12. Support an intelligent interface | -
Efficiency | NF13. Perform analytic tasks (in-silico discovery) through vast amount of data in acceptable times | -
Efficiency | NF14. Be accessible by multiple users at the same time (multi-tenant) | -
Efficiency | NF15. Support efficiency | -
Availability | NF16. Be accessible and available (at acceptable service levels). | -

Table 2 Concepts identified from the analysis of the non-functional requirements




1.3. METHODOLOGY

The methodology that has been followed for the definition of the GRANATUM Biomedical

Semantic Model is based on a set of methodologies for defining ontologies:

METHONTOLOGY [1] is a methodology, created in the Artificial Intelligence Lab at the
Technical University of Madrid (UPM), for building ontologies either from scratch, by
reusing other ontologies as they are, or through a reengineering process. The ontology
development process identifies which tasks should be performed when building ontologies:
Specification, Conceptualization, Formalization, Integration, Implementation, and
Maintenance. The main phase in the ontology development process using the METHONTOLOGY
approach is the conceptualization phase. METHONTOLOGY has been proposed¹ for ontology
construction by the Foundation for Intelligent Physical Agents (FIPA).

Li et al. [2] propose a method and process to acquire and validate ontologies. The main
contributions include a new, systematic, and structured ontology development method
assisted by a semiautomatic acquisition tool. The development method defines the following
steps for ontology development: Specification, Acquisition, Formalization, Population,
Validation and Maintenance.

Öhgren et al. [3] propose a methodology for ontology development. The core ideas of their
enhanced methodology are (a) reuse of fragments of existing ontologies, (b) instruction-like
detailed definition of all steps of the development process, and (c) extensive use of
guidelines and other aids. The phases of the methodology are Requirements Analysis,
Building, Implementation, and Evaluation & Maintenance.

[Figure: Specification → Conceptualization → Formalization → Implementation → Evaluation → Maintenance]

Figure 2 Methodology for building the GRANATUM Biomedical Semantic Model

An overview of the adopted methodology is shown in Figure 2. More specifically, the steps that

were followed are listed below:

1. Specification. This activity states why the ontology is built, what its intended uses
are and who the end-users are, as well as the level of granularity of the concepts that
should be taken into account.

2. Conceptualization. This activity identifies the concepts and relations of the ontology.
The conceptualization of the GRANATUM Model follows a “meet-in-the-middle” approach: on
the one hand, the concepts emerge in a bottom-up fashion by analyzing the domain; on the
other hand, concepts are identified top-down by analyzing existing ontologies and models.
The result of the conceptualization activity is the ontology conceptual model.

1 http://www.fipa.org/specs/fipa00086/




3. Formalization. This activity transforms the conceptual model into a formal or
semi-computable model. A standard template is used to formally define the concepts of the
GRANATUM Biomedical Semantic Model and their relationships.

4. Implementation. This activity builds computable models in an ontology language. The
ontology language selected for the implementation of the GRANATUM Biomedical Semantic
Model is OWL.

5. Evaluation. This activity validates the accuracy and completeness of the produced
ontology.

6. Maintenance. This activity updates and corrects the ontology if needed. If corrections
are needed, the process returns to step 2 (Conceptualization).
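The control flow of the six steps, including the loop from Maintenance back to Conceptualization, can be sketched as follows. This is only an illustration of the ordering described above: the `Step` enumeration, the `run_lifecycle` function and the `corrections_needed` counter are hypothetical names introduced here, and the sketch does not model what happens inside each step.

```python
from enum import Enum
from typing import List


class Step(Enum):
    SPECIFICATION = 1
    CONCEPTUALIZATION = 2
    FORMALIZATION = 3
    IMPLEMENTATION = 4
    EVALUATION = 5
    MAINTENANCE = 6


def run_lifecycle(corrections_needed: int) -> List[Step]:
    """Return the sequence of steps visited.

    `corrections_needed` is a hypothetical counter: how many times
    maintenance finds a problem and sends the work back to step 2.
    """
    ordered = list(Step)  # Enum members iterate in definition order
    trace = []
    index = 0
    while index < len(ordered):
        step = ordered[index]
        trace.append(step)
        if step is Step.MAINTENANCE and corrections_needed > 0:
            # A correction is required: resume from Conceptualization.
            corrections_needed -= 1
            index = ordered.index(Step.CONCEPTUALIZATION)
        else:
            index += 1
    return trace


# One round of corrections: steps 1 to 6, then steps 2 to 6 again.
for step in run_lifecycle(corrections_needed=1):
    print(step.name)
```

Note that Specification is visited only once; later corrections re-enter the process at Conceptualization, as stated in step 6.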

1.4. DOCUMENT STRUCTURE

The remainder of D1.3 is divided into four sections. Section 2 describes the motivation and
scope of the Biomedical Semantic Model, as well as the design process followed, according to
the proposed methodology, to create the ontology. Section 3 formally documents the Biomedical
Semantic Model by defining the classes and their properties. Section 4 presents the
evaluation of the model. Finally, Section 5 draws conclusions and briefly outlines directions
for the future development of the ontology.




2. DESIGN PROCESS

2.1. SPECIFICATION

The GRANATUM Biomedical Semantic Model is designed to serve as a common reference model for
the semantic annotation, sharing and interconnection of globally available biomedical
resources, including Electronic Health Records, digital libraries and archives, online
communities and discussions. It facilitates the delivery of machine-interpretable information
regarding their structure and content, and supports the on-demand discovery of published data
that are significant for cancer chemoprevention. This biomedical semantic model will be
utilized:

In the semantic annotation, sharing and inter-connection of globally available web

resources (in the Linked Biomedical Data Space);

In the semantic processing of publications and scientific papers (in online libraries and

digital archives), as well as posts on online communities and social networks (in the

Opinion Modelling and Argument Analysis Space);

In the ontology-based mash-up of social networking applications and collaboration tools

(in the Social Collaborative Working Space), and

In the discovery and retrieval of the semantically-linked cancer chemoprevention

significant online data and web resources (in the In Silico Models, Tools and

Experiments Space).

The GRANATUM Biomedical Semantic Model constitutes a cancer chemoprevention

ontological reference model, relying on widely known and adopted biomedical guidelines,

standards and controlled vocabularies. As this lightweight semantic model is going to be utilized

in the creation of a knowledge base of semantically interconnected distributed biomedical data

and resources across the Web, the structure of the GRANATUM Biomedical Semantic Model

will comprise:

A set of elements for the Literature representation and scientific discourse in online

communities at different levels of granularity;

A set of concepts from the biomedical domain that facilitate the representation of cancer

chemoprevention related data and resources;

A set of concepts from the biomedical domain that facilitate the representation of

experimental data, procedures and protocols;

A set of concepts from the biomedical domain that facilitate the representation of data

related to in-silico modeling.




Figure 3 GRANATUM Biomedical Semantic Model scope

2.2. CONCEPTUALIZATION

The identification of the concepts and relations (Conceptualization) of the GRANATUM
Biomedical Semantic Model follows a “meet-in-the-middle” approach. On the one hand, the
model emerges in a bottom-up fashion by analyzing publicly available datasets (i.e.
databases, SPARQL endpoints), experimental data, user requirements collected in deliverable
D1.1 and literature/functionality suggested by the non-technical partners. On the other hand,
the model follows a top-down approach: existing models and ontologies from the biomedical
domain, collected during the requirements analysis step, are modularized in order to retrieve
the concepts and relationships relevant to GRANATUM. Specifically, in order to define the
GRANATUM Biomedical Semantic Model the following steps have been carried out (Figure 4):

i. Identify the concepts/relations using a Top-Down approach (Section 2.2.1);

a. Analyze existing models and ontologies;

ii. Identify the concepts/relations using a Bottom-Up approach (Section 2.2.2);

a. Analyze existing data sets (databases, SPARQL endpoints);

b. Analyze user requirements;

c. Analyze experimental data;

iii. Merge concepts/relations from (i) and (ii) and define the Biomedical Semantic Model.

The concepts/properties may derive from one of the two approaches (bottom-up, top-down) or

from both. In the GRANATUM Biomedical Semantic Model we include all the concepts

detected, regardless of the identification approach.
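The merge in step (iii) amounts to a union of the two concept lists. The sketch below illustrates this under stated assumptions: the concept names are invented for the example, the `merge_concepts` helper is hypothetical, and the simple case and whitespace normalization stands in for whatever matching of equivalent concepts was actually performed by the authors.

```python
from typing import Dict, Iterable, Set


def merge_concepts(top_down: Iterable[str],
                   bottom_up: Iterable[str]) -> Dict[str, Set[str]]:
    """Union of both concept lists, remembering where each concept came from."""
    merged: Dict[str, Set[str]] = {}
    for source, concepts in (("top-down", top_down), ("bottom-up", bottom_up)):
        for concept in concepts:
            # Normalize case and spacing so "Gene" and "gene" coincide.
            key = " ".join(concept.lower().split())
            merged.setdefault(key, set()).add(source)
    return merged


# Hypothetical concept lists, for illustration only.
top_down = ["Publication", "Gene", "Assay"]
bottom_up = ["gene", "Cell  Line", "Assay"]
merged = merge_concepts(top_down, bottom_up)
for concept in sorted(merged):
    print(concept, sorted(merged[concept]))
```

Every concept is kept whether it was found by one approach or by both; the recorded sources only document provenance and do not filter the result.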

2.2.1. Top-Down Conceptualization

Existing models and ontologies from the biomedical domain that are relevant to GRANATUM were
primarily collected during the review of the state of the art in D1.1. These ontologies have
been analyzed and modularized in order to retrieve the concepts and relationships relevant to
the GRANATUM scope (Figure 3), i.e. relevant to in-silico modeling, cancer chemoprevention,
scientific research and experimental data/protocols. The rest of this section presents the
ontologies/models that were analyzed (Section 2.2.1.1) and then the concepts and relations
identified (Section 2.2.1.2).




[Figure: the GRANATUM biomedical model is derived top-down from existing ontologies/models and bottom-up from available data sets, experimental data and user requirements]

Figure 4 Bottom-up and Top-down conceptualization of the GRANATUM Biomedical Semantic Model

2.2.1.1. Existing Ontologies/models

This section presents the models and ontologies relevant to the GRANATUM scope that were

primarily collected during the review of the state of the art in D1.1.

2.2.1.1.1. Literature representation Ontologies

In this section we present ontologies that can represent concepts related to the scientific literature

and discourse, such as bibliographic records, citations, references, authors etc.

BiRO²

The Bibliographic Reference Ontology (BiRO) is an ontology for describing bibliographic

records and references, and their compilation into bibliographic collections and reference lists. It

can be used as a citation ontology, as a document classification ontology, or simply as a way to

describe any kind of document. It has been inspired by many existing document description

metadata formats, and can be used as a common ground for converting other bibliographic data

sources. It forms part of SPAR, a suite of Semantic Publishing and Referencing Ontologies.

CiTO3

The Citation Typing Ontology (CiTO) [4] is an ontology for describing the nature of reference

citations in scientific research articles and other scholarly works, both to other such publications

and also to Web information resources, and for publishing these descriptions on the Semantic

Web. Citations are described in terms of the factual and rhetorical relationships between citing publication and cited publication, the in-text and global citation frequencies of each cited work, and the nature of the cited work itself, including its publication and peer review status. It forms part of SPAR, a suite of Semantic Publishing and Referencing Ontologies.

2 http://purl.org/spar/biro

3 http://purl.org/spar/cito

FaBiO4

The FRBR-aligned Bibliographic Ontology (FaBiO) is an ontology for recording and publishing

on the Semantic Web descriptions of entities that are published or potentially publishable, and

that contain or are referred to by bibliographic references, or entities used to define such

bibliographic references. FaBiO entities are primarily textual publications such as books,

magazines, newspapers and journals, and items of their content such as poems and journal

articles. However, they also include datasets, computer algorithms, experimental protocols,

formal specifications and vocabularies, legal records, governmental papers, technical and

commercial reports and similar publications, and also bibliographies, reference lists, library

catalogues and similar collections.

SIOC5

The Semantically-Interlinked Online Communities (SIOC) [5] Core Ontology provides the main

concepts and properties required to describe information from online communities (e.g., message

boards, wikis, weblogs, etc.) on the Semantic Web. It is an attempt to link online community

sites, to use Semantic Web technologies to describe the information that communities have about

their structure and contents, and to find related information and new connections between

content items and other community objects. Developers can use this ontology to express

information contained within community sites in a simple and extensible way.

SWAN6

SWAN (Semantic Web Applications in Neuromedicine) [6] is an interdisciplinary project to

develop a practical, common, semantically-structured, framework for biomedical discourse

initially applied, but not limited, to significant problems in Alzheimer Disease (AD) research.

The SWAN ontology has been developed in the context of building a series of applications for

biomedical researchers, as well as extensive discussions and collaborations with the larger bio-

ontologies community. The Citations ontology defines a set of entities useful for referencing

scientific publications.

2.2.1.1.2. Biomedical Ontologies

In this section we present ontologies that represent concepts related to the Cancer

Chemoprevention, the Experimental process and the in-silico modelling.

Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)7

The intention of the ACGT MO [7] is to represent the domain of cancer research and

management in a computationally tractable manner. The ACGT MO is shaped as a cross-section of a multitude of sub-domains and aims to provide a terminology for transnational data exchange in oncology, emphasizing the integration of both clinical and molecular data. It is built using the free, open-source Protégé-OWL ontology editor, is written in OWL-DL and is distributed as an .owl file. The ACGT MO reuses the Basic Formal Ontology (BFO) as its upper level and the OBO Relation Ontology (OBO RO) for the relations between classes.

4 http://purl.org/spar/fabio

5 http://sioc-project.org/

6 http://swan.mindinformatics.org/ontology.html

7 http://www.ifomis.org/wiki/ACGT_Master_Ontology_%28MO%29

Biological Pathway Exchange (BioPAX)8

BioPAX [8] aims to provide a common data exchange format that will facilitate the integration

and exchange of data maintained in several biological pathway databases. Indeed, there are more

than 200 biomedical databases storing biological pathway data. Therefore, merging diverse

database schemas to achieve integrated results from more than one database is quite difficult.

BioPAX provides a standard for representing metabolic, biochemical, transcription regulation,

protein synthesis and signal transduction pathways.

BioTop9

BioTop [9] is a top-domain ontology for molecular biology that provides definitions for the

foundational entities of biomedicine as a basic vocabulary to unambiguously describe facts in

this domain. BioTop can furthermore serve as top-level model for creating new ontologies for

more specific domains or as aid for aligning or improving existing ones.

CancerGrid Metamodel

The CancerGrid Metamodel [10] is instantiated with metadata elements to create a model of a

particular clinical trial. Trial and study designs are instances of the CancerGrid metamodel. This

gives a precise, computable definition of a clinical trial which allows generating the specific

runtime services needed, and helping to analyze the resulting data.

Experimental Factor Ontology (EFO)

The Experimental Factor Ontology (EFO) [11] is an application focused ontology modelling

experimental factors. The ontology has been developed to increase the richness of the

annotations, to promote consistent annotation, to facilitate automatic annotation and to integrate

external data. The methodology employed in the development of EFO involves construction of

mappings to multiple existing domain specific ontologies. This is achieved using a combination

of automated and manual curation steps and the use of a phonetic matching algorithm.

Gene Ontology (GO)10

Gene Ontology (GO) [12] is a controlled vocabulary for describing gene and gene product

attributes. Its aim is to address the need for consistent representation of gene product information

in different databases. In particular, different Model Organism Databases (MODs) describe the

same gene product information using different terms. To enable different databases to represent

data in a consistent way, the GO Consortium creates standard sets of terms (hierarchies or “name spaces”) for describing the biological processes, molecular functions and cellular components of gene products. For each gene product, this information describes: i) its functions at the molecular level, ii) the biological processes to which these functions contribute and iii) where it is located in the cell. Terms are related to each other within each hierarchy by is-a and part-of relationships. GO started with terminologies from three genomic databases (FlyBase, the Saccharomyces Genome Database and the Mouse Genome Database) and has grown to include many major genome repositories.

8 http://www.biopax.org/

9 http://www.imbi.uni-freiburg.de/ontology/biotop/

10 http://www.geneontology.org/
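The is-a and part-of relationships described above make each GO hierarchy a directed graph that can be walked upwards from any term. The following is a minimal sketch of that traversal, assuming a toy graph: the term labels and the `TOY_TERMS` structure are illustrative placeholders, not real GO identifiers or an actual GO file format.

```python
from collections import deque

# Toy term graph: child -> list of (relation, parent).
# Labels are illustrative only; they are not real GO identifiers.
TOY_TERMS = {
    "nuclear membrane": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, graph):
    """Return every term reachable from `term` via is_a or part_of links."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("nuclear membrane", TOY_TERMS)))
```

An annotation made with a specific term can in this way also be retrieved through any of its more general ancestor terms.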

Medical Subject Headings (MeSH)11

Medical Subject Headings (MeSH) [13] is a controlled vocabulary created by the US National

Library of Medicine. It consists of sets of terms naming descriptors in a hierarchical structure

that is used for indexing, cataloguing, and searching for biomedical and health-related

information and documents. MeSH descriptors are arranged in both an alphabetic and a

hierarchical structure. There are 26,142 descriptors in 2011 MeSH. The use of MeSH for

providing names for biomedical entities in these applications is analogous in purpose to the use

of GO for providing standard names for biological processes and molecular functions.

Microarray Gene Expression Data Ontology (MGED)

Microarrays are a common experimental method being used to measure molecular-level

biomarkers for a variety of biological states and medical diseases. The Microarray Gene

Expression Data Ontology (MGED) [14] contains concepts, definitions, terms, and resources for the standardized description of microarray experiments and their results. Specifically, it provides a terminology for annotating microarray experiments. It describes the biological sample used in an experiment, the treatment that the sample receives, and the microarray chip technology used. This basic information helps researchers exploring third-party data to validate comparisons between data sets and to confirm interpretations of the data, since it is necessary to know how an experiment was performed in order to interpret its findings and compare interpretations.

National Cancer Institute (NCI) Thesaurus12

The National Cancer Institute (NCI) Thesaurus [15] is a description logic terminology developed and distributed by the US NCI Center for Bioinformatics and the Office of Cancer Communications [16]. It integrates molecular and clinical cancer-related information, enabling researchers to integrate, retrieve and relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes. Today, the NCI Thesaurus contains 100,000 terms and 34,000 concepts, covering chemicals, drugs and other therapies, diseases (more than 8,500 cancers and related diseases), genes and gene products, anatomy, organisms, animal models, techniques, biologic processes, and administrative categories, including definitions and synonyms [16].

Ontology for Biomedical Investigations (OBI)13

The Ontology for Biomedical Investigations (OBI) [17] is a controlled vocabulary that aims at

stimulating the integration of experimental data. The domain of OBI is the representation of

designs, protocols, instrumentation, materials, processes, data and types of analysis in all areas of biological and biomedical investigations. The ontology addresses the need to model all biomedical investigations and as such contains ontology terms for aspects such as:

1. biological material

2. instrument (and parts of an instrument)

3. information content

4. design and execution of an investigation (and individual experiments)

5. data transformation (incorporating aspects such as data normalization and data analysis)

11 http://www.nlm.nih.gov/mesh/

12 http://ncit.nci.nih.gov

13 http://obi-ontology.org/page/Main_Page

RxNorm14

RxNorm [18] is a standardized terminology for clinical drugs that addresses the lack of an

adequate standard for a national terminology for medications. RxNorm has been developed by

the HL7 Vocabulary Technical committee capitalizing on models used by four vendors of drug

knowledge bases (the NLM, the Food and Drug Administration, the Department of Veterans

Affairs (VA) and HL7) [19]. It contains standard names for clinical drugs (active drug

ingredient, dosage strength, physical form) and links from the active ingredient to brand name

and combination names. This principled, non-proprietary approach has led to RxNorm being recommended by the National Committee on Vital and Health Statistics (NCVHS)15 as one of the standard terminologies for the core patient medical record information [20].

Unified Medical Language System (UMLS)16

The Unified Medical Language System (UMLS) [21] was created by the US National Library of

Medicine (NLM) as a meta-terminology, summarizing the contents of other terminologies about

biomedical and health related concepts in order to enable interoperability between computer

systems. UMLS provides access, among others, to the following ontologies: i) MeSH, ii) NCI Thesaurus, iii) Gene Ontology and iv) RxNorm.

2.2.1.2. Top-Down Concept identification

This section summarizes the concepts and properties that derive from the study of the existing ontologies/models listed in Section 2.2.1.1. The concepts and properties are separated into three domains: i) the Literature domain, which contains concepts related to publications and studies, ii) the Experiment domain, which contains concepts related to an experiment and iii) the Cancer Chemoprevention domain, which contains biomedical concepts related to cancer chemoprevention.

2.2.1.2.1. Literature Domain

The Literature Domain contains concepts related to the publications and studies, such as

bibliographic records, citations, references, authors etc. Table 3 contains the concepts identified

in the literature domain while Table 4 contains the properties detected in the literature domain.

14 http://www.nlm.nih.gov/research/umls/rxnorm/

15 www.ncvhs.hhs.gov/050908rpt.htm

16 http://www.nlm.nih.gov/research/umls/about_umls.html




Ontology | Published Work | Research statement | Person | Forum post
SWAN | Book, Journal article, Newspaper article, Newspaper news, Web article | Research statement | agent | -
CiTO | - | - | - |
BiRO | - | - | - |
FaBiO | Work: Report, Specification, Paper, Dataset, Essay, Expression | - | - |
SIOC | - | - | User account |

Table 3 Concepts identified in the literature domain

Name | referenceTo | hasStatement | Author
Domain | Published Work | Published Work | Published Work
Range | Published Work | Research statement | Person
SWAN | - | - | contributorAuthor
CiTO | cites | - | -
BiRO | reference | - | -
FaBiO | - | hasRealization | creator
SIOC | - | - | has_creator

Table 4 Properties identified in the literature domain
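The alignment in Table 4 can be held in a small lookup structure, which makes it easy to ask which source ontology offers a term for a given GRANATUM property. The sketch below is illustrative only: the dictionary layout and the helper names `aligned_terms` and `properties_covered_by` are assumptions, while the property names, domains, ranges and aligned terms are transcribed from Table 4.

```python
# Literature-domain properties and their aligned terms, transcribed from
# Table 4. Cells marked "-" in the table are simply omitted here.
LITERATURE_PROPERTIES = {
    "referenceTo": {
        "domain": "Published Work",
        "range": "Published Work",
        "aligned": {"CiTO": "cites", "BiRO": "reference"},
    },
    "hasStatement": {
        "domain": "Published Work",
        "range": "Research statement",
        "aligned": {"FaBiO": "hasRealization"},
    },
    "Author": {
        "domain": "Published Work",
        "range": "Person",
        "aligned": {
            "SWAN": "contributorAuthor",
            "FaBiO": "creator",
            "SIOC": "has_creator",
        },
    },
}


def aligned_terms(property_name):
    """Return the source-ontology terms aligned with one model property."""
    return LITERATURE_PROPERTIES[property_name]["aligned"]


def properties_covered_by(ontology):
    """List the model properties for which `ontology` provides a term."""
    return sorted(
        name
        for name, spec in LITERATURE_PROPERTIES.items()
        if ontology in spec["aligned"]
    )


print(aligned_terms("referenceTo"))
print(properties_covered_by("FaBiO"))
```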

2.2.1.2.2. Experiment Domain

The Experiment Domain contains concepts related to an experiment and the procedure followed.

Table 5 contains the concepts identified in the experiment domain while Table 6 contains the

properties detected in the experiment domain.




Ontology | Experiment | Protocol | Experimental factor
ACGT | Experiment, Clinical trial | Protocol, ClinicalTrialProtocol | Organism, Substance sample, Technical object, Organ, Chemical substance
BIOTOP | - | - | Organism part, organ
EFO | Assay | Protocol, Experimental process | Experimental factor: Information entity, Material entity, Material property, Process, Site
MGED | BioAssay, Experiment, Test | ExperimentDesign, Protocol | Experimental factor, Organism part
OBI | assay | Protocol, Study design | Anatomical entity, Cell, Organism
CancerGrid | Clinical trial | Trial protocol | -
UMLS (NCI, GO, MeSH) | Clinical trial, Animal experiment, Experiment, Biological assay, Assay, Research activity | Clinical trial protocol, Clinical protocol, Experimental Design | biospecimen, Research device, Tissue, Cell, Organism, Organ

Table 5 Concepts identified in the experiment domain




Name | useFactor | followProtocol
Domain | Protocol | Experiment
Range | Experimental factor | Protocol
ACGT | - | implements
BIOTOP | - | -
EFO | hasInput (Inverse: is_input_of) | Realizes
MGED | has_experimental_factors | has_protocol, has_experimental_design, has_test_protocol
OBI | has_specified_input (Inverse: is_specified_input_of) | -
CancerGrid | - |
UMLS (NCI, GO, MeSH) | receives_input_from, uses_device, uses_substance | has_method, has_associated_procedure, has_measurement_method

Table 6 Properties identified in the experiment domain
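The domain and range given for each property in Table 6 act as a simple type constraint on statements. The sketch below shows one way such a check could look, assuming the reading of Table 6 in which useFactor links a Protocol to an Experimental factor and followProtocol links an Experiment to a Protocol. The function name `check_statement` is hypothetical, and the check uses exact type matching, so it ignores any subclass reasoning an ontology reasoner would apply.

```python
# (domain, range) per property, as read from Table 6.
PROPERTY_SIGNATURES = {
    "useFactor": ("Protocol", "Experimental factor"),
    "followProtocol": ("Experiment", "Protocol"),
}


def check_statement(subject_type, prop, object_type):
    """Return True when a statement fits the declared domain and range."""
    if prop not in PROPERTY_SIGNATURES:
        raise KeyError(f"unknown property: {prop}")
    domain, range_ = PROPERTY_SIGNATURES[prop]
    return subject_type == domain and object_type == range_


# An experiment following a protocol is well-typed; the reverse is not.
print(check_statement("Experiment", "followProtocol", "Protocol"))
print(check_statement("Protocol", "followProtocol", "Experiment"))
```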




2.2.1.2.3. Biomedical Domain

The Biomedical Domain contains biomedical concepts related to cancer chemoprevention. Table 7 contains the concepts identified in the biomedical domain while Table 8 contains the properties detected in the biomedical domain.

Pathway Target Disease Source Molecule

ACGT - Biological Macromolecule:

Protein

Nucleic acid

Disease

Drug

Chemotherapy Drug

Pharmac. Substance

-

BIOPAX Protein, RNA/DNA - Biosource -

BIOTOP - protein molecule

Nucleic acid structure

- - Biological

compound

EFO - Protein, Dna/Rna Disease

Cancer

drug Chemical

compound

MGED - - Disease state

Cancer site

compound (can be a

drug)

compound

OBI - Macromolecule

Nucleic acid

protein

Disease

- -

UMLS

(NCI, GO,

MeSH)

Pathway

Biochemical Pathway

Neural Pathways …

Proteins

Nucleic acid

Disease Pharmac. substance

Clinical drug

Pharmac. Preparations

Source

Natural source

Molecule

compound

Table 7 Concepts identified in the biomedical domain




Name | partOfPathway | hasTarget | hasSource | isPreventedBy
Domain | Pathway | Molecule | Molecule | Chemopreventive agent
Range | Target | Target | Source | Disease
ACGT | - | - | - | (drug hierarchy)
BIOPAX | pathwayComponent, cofactor (at a pathway step), controller (at a pathway step), pathwayStep (each step has participants) | - | Organism (domain: gene, protein, range: biosource) | -
BIOTOP | - | - | locatedIn | DrugRole, BiomedicalMaterialRole
EFO | - | has_role | - | has_role
MGED | - | - | - | -
OBI | - | has_role | locatedIn | has_role
UMLS (NCI, GO, MeSH) | gene_is_element_in_pathway, gene_product_is_element_in_pathway, pathway_has_gene_element | chemical_or_drug_plays_role_in_biological_process | gene_product_has_organism_source, has_specimen_source_identity, is_organism_source_of_gene_product | chemical_or_drug_plays_role_in_biological_process

Table 8 Properties identified in the biomedical domain




2.2.2. Bottom-up construction of the GRANATUM model

The bottom-up construction of the model identifies concepts and properties based on existing

data sets and requirements that are relevant to the GRANATUM scope (Figure 3), i.e. relevant to

in-silico modeling, cancer chemoprevention, scientific research and experimental data/protocols.

Specifically, the following steps are followed:

1. Analysis of publicly available data sets related to cancer chemoprevention (e.g. KEGG, ChEBI, ClinicalTrials etc.). These data sets were identified during the review of the state of the art in D1.1. The analysis is based on the data provided through the SPARQL endpoint of each data set or through the search mechanism it provides.

2. Analysis of the requirements identified during the requirements analysis in D1.1. The functional and non-functional requirements are analyzed in order to detect concepts and properties that can be used by the model. Moreover, concepts and properties are detected based on the Usage Scenarios and the Questionnaire described in D1.1.

3. Analysis of experimental data sets related to cancer chemoprevention that are provided by the trial partners.

2.2.2.1. Analysis of Publicly available datasets

This section presents and analyzes publicly available data sets related to cancer chemoprevention that were identified during the review of the state of the art in D1.1.

2.2.2.1.1. Existing datasets accessed through SPARQL endpoints

This section presents the publicly available datasets that are accessed through SPARQL endpoints in order to detect concepts and properties related to cancer chemoprevention. Each SPARQL endpoint may provide data from more than one data set (for example, http://linkedlifedata.com/ provides access to more than 20 datasets).
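Detecting concepts through a SPARQL endpoint typically starts with a query that lists the classes used in the data. The sketch below shows how such a request could be prepared and how a standard SPARQL JSON result could be read, using only the Python standard library. It is an illustration under assumptions: the endpoint URL `http://example.org/sparql` and the sample response are placeholders, no network call is made, and the query is a generic one rather than a query taken from the GRANATUM analysis.

```python
import json
from urllib.parse import urlencode

# Generic SPARQL query listing the classes that have instances in a dataset.
CLASS_QUERY = "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 100"


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL with a `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_values(results_json, variable):
    """Pull the values bound to `variable` out of a SPARQL JSON result."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Placeholder endpoint and hand-written sample response, for illustration only.
url = build_request_url("http://example.org/sparql", CLASS_QUERY)
sample = (
    '{"head": {"vars": ["class"]}, "results": {"bindings": ['
    '{"class": {"type": "uri", "value": "http://example.org/Drug"}}]}}'
)

print(url)
print(extract_values(sample, "class"))
```

Replacing `?class` with a query over predicates (`?s ?property ?o`) would list the properties of a dataset in the same way.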

Chemical Entities of Biological Interest (ChEBI)17

ChEBI is a database and ontology of small molecular entities. The term molecular entity refers to

any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion,

complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular

entities are either products of nature or synthetic products used to intervene in the processes of

living organisms. Molecules directly encoded by the genome, such as nucleic acids, proteins and peptides derived from proteins by proteolytic cleavage, are not as a rule included in ChEBI.

PubMed18

PubMed is the most widely used source for biomedical literature. PubMed provides access to

citations from the MEDLINE database and additional life science journals including links to

many full-text articles at journal Web sites and other related Web resources. The US NLM

(National Library of Medicine) at the NIH (National Institutes of Health) maintains the database as part of the Entrez information retrieval system. PubMed was first released in January 1996. Today, much of the knowledge available regarding chemoprevention agents is only available in publications. As a result, PubMed is typically the primary source of information for most biomedical researchers.

17 http://www.ebi.ac.uk/chebi/

18 http://www.ncbi.nlm.nih.gov/pubmed

DrugBank19

The DrugBank [22] database is a bioinformatics and cheminformatics resource that combines

detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug

target (i.e. sequence, structure, and pathway) information. The database contains 6826 drug

entries including 1431 FDA-approved (Food and Drug Administration) small molecule drugs,

133 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5211 experimental

drugs. Additionally, 4435 non-redundant protein (i.e. drug target/enzyme/transporter/carrier)

sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data

fields with half of the information being devoted to drug/chemical data and the other half

devoted to drug target or protein data.

Kyoto Encyclopedia of Genes and Genomes (KEGG)20

KEGG is a database resource that integrates genomic, chemical, and systemic functional

information. In particular, gene catalogues are linked to higher-level systemic functions of the

cell, the organism, and the ecosystem. Major efforts have been undertaken to manually create a

knowledge base for such systemic functions by capturing and summarizing experimental

knowledge in computable forms; namely, in the forms of molecular networks called KEGG

pathway maps, BRITE functional hierarchies, and KEGG modules. Continuous efforts have also

been made to improve the annotation procedure for linking genomes to the molecular networks.

As a result, KEGG is widely used for interpretation of large-scale datasets generated by

genome sequencing and other high-throughput experimental technologies. In addition to

maintaining the aspects to support basic research, KEGG is being expanded towards more

practical applications with molecular network-based views of diseases, drugs, and environmental

compounds.

Reactome21

Reactome is an open-source, open access, manually curated and peer-reviewed pathway

database. Pathway annotations are authored by expert biologists, in collaboration with Reactome

editorial staff and cross-referenced to many bioinformatics databases. The rationale behind

Reactome is to convey the rich information in the visual representations of biological pathways

familiar from textbooks and articles in a detailed, computationally accessible format. The core

unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes and

small molecules) participating in reactions form a network of biological interactions and are

grouped into pathways. Examples of biological pathways in Reactome include signalling, innate

and acquired immune function, transcriptional regulation, translation, apoptosis and classical

intermediary metabolism.
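The structure described above, with reactions as the core unit, entities participating in reactions and reactions grouped into pathways, can be illustrated with a few small data classes. This is a toy sketch and not Reactome's actual schema: the class names, fields and the single-letter entity names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Reaction:
    name: str
    participants: list  # entity names taking part in the reaction


@dataclass
class Pathway:
    name: str
    reactions: list = field(default_factory=list)


def interaction_network(pathway):
    """Link every pair of entities that take part in the same reaction."""
    edges = set()
    for reaction in pathway.reactions:
        for a, b in combinations(sorted(set(reaction.participants)), 2):
            edges.add((a, b))
    return edges


# Placeholder entities "A", "B", "C" stand in for proteins or small molecules.
demo = Pathway("demo pathway", [
    Reaction("r1", ["A", "B"]),
    Reaction("r2", ["B", "C", "A"]),
])
print(sorted(interaction_network(demo)))
```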

19 http://www.drugbank.ca/

20 http://www.genome.jp/kegg/

21 http://www.reactome.org




Universal Protein Resource (UniProt)22

UniProt is a comprehensive resource for protein sequence and annotation data. The UniProt

Knowledgebase (UniProtKB) is the central hub for the collection of functional information on

proteins, with accurate, consistent and rich annotation. In addition to capturing the core data

mandatory for each UniProtKB entry, as much annotation information as possible is added. This

includes widely accepted biological ontologies, classifications and cross-references, and clear

indications of the quality of annotation in the form of evidence attribution of experimental and

computational data.

Diseasome23

Diseasome is a disease/disorder relationships explorer and a sample of an innovative map-

oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease

Network dataset [23] and allows intuitive knowledge discovery by mapping its complexity. This

kind of data has a network-like organization, and relations between elements are at least as

important as the elements themselves. More data could be integrated to this prototype and could

eventually bring closer phenotype and genotype.

DailyMed24

DailyMed provides high-quality information about marketed drugs, including FDA labels (package inserts). It provides health information providers and the public with a standard, comprehensive, up-to-date, look-up and download resource of medication content and labelling as found in medication package inserts. Drug labelling and other information in the Structured Product Labeling (SPL) is what has been most recently submitted by drug companies to the Food and Drug Administration (FDA) as drug listing information. The drug labelling has been reformatted to make it easier to read, but its content has not been altered or verified by the FDA or the National Library of Medicine.

Sider25

The Side Effect Resource (SIDER) represents an effort to aggregate dispersed public information

on side effects. SIDER contains information on marketed medicines and their recorded adverse

drug reactions. The information is extracted from public documents and package inserts. The available information includes side effect frequency, drug and side effect classifications, as well as links to further information, for example drug–target relations.

open-BioMed.org.uk26

open-BioMed allows one to search for information about alternative medicines associated with a given disease, in terms of their putative effects, associated genes and related clinical trials, from different databases accessible through SPARQL endpoints.

22 http://www.uniprot.org/

23 http://diseasome.eu/

24 http://dailymed.nlm.nih.gov/dailymed/about.cfm

25 http://sideeffects.embl.de/

26 http://www.open-biomed.org.uk/




BioGRID27

The Biological General Repository for Interaction Datasets (BioGRID) is a public database that

archives and disseminates genetic and protein interaction data from model organisms and

humans. BioGRID currently holds many interactions curated from both high-throughput datasets

and individual focused studies, as derived from publications in the primary literature. Current

curation drives are focused on particular areas of biology to enable insights into conserved

networks and pathways that are relevant to human health. BioGRID provides interaction data to

several model organism databases.

Freebase28

Freebase is a large collaborative knowledge base consisting of metadata composed mainly by

its community members. It is an online collection of structured data harvested from many

sources, including individual 'wiki' contributions. Freebase aims to create a global resource

which allows people (and machines) to access common information more effectively. Freebase provides access to a large amount of biological data related to genes, proteins, organisms etc.

HapMap29

The International HapMap Project is an organization that aims to develop

a haplotype map (HapMap) of the human genome, which will describe the common patterns of

human genetic variation. HapMap is a key resource for researchers to find genetic variants

affecting health, disease and responses to drugs and environmental factors. The information

produced by the project is made freely available to researchers around the world.

Human Protein Reference Database30

The Human Protein Reference Database (HPRD) [24] represents a centralized platform to

visually depict and integrate information pertaining to domain architecture, post-translational

modifications, interaction networks and disease association for each protein in the human

proteome. All the information in HPRD is extracted from the literature by expert biologists who

read, interpret and analyze the published data. It contains information pertaining to the biology

of most human proteins and proteins involved in human diseases.

HumanCyc31

The Encyclopedia of Homo sapiens Genes and Metabolism (HumanCyc) [25] is a bioinformatics database that describes human metabolic pathways and the human genome. By

presenting metabolic pathways as an organizing framework for the human genome, HumanCyc

provides the user with an extended dimension for functional analysis of Homo sapiens at the

genomic level. For example, HumanCyc has tools for analysis of human metabolomics and gene-

expression data.

27 http://thebiogrid.org/

28 http://www.freebase.com/

29 http://hapmap.ncbi.nlm.nih.gov/

30 http://www.hprd.org/

31 http://humancyc.org/




IntAct32

IntAct provides a freely available, open source database system and analysis tools for protein

interaction data. All interactions are derived from literature curation or direct user submissions

and are freely available.

LHGDN33

The Literature-derived Human Gene-Disease Network (LHGDN) [26] is a publicly available gene-disease repository. It is built with the Text2SemRel system, which automatically constructs knowledge bases of facts about entities, expressed as semantic relations, from textual data. LHGDN is part of the Linked Life Data initiative.

LinkedCT34

Linked Clinical Trials (LinkedCT) is a Semantic Web data source for clinical trials data.

The data exposed by LinkedCT is generated by (1) transforming existing data sources of clinical

trials into RDF, and (2) discovering links between the records in the trials data and several other

data sources.
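The first of these two steps, transforming records into RDF, can be pictured as emitting one triple per field of a record. The following sketch serializes a flat record as N-Triples lines with plain string literals. It is a simplified illustration, not LinkedCT's actual pipeline: the `http://example.org/` namespace, the `vocab/` predicate prefix, the function name and the sample record are all placeholders, and field names are assumed to be safe to use inside an IRI.

```python
def record_to_ntriples(record, base_uri, id_field="id"):
    """Serialize a flat record as N-Triples lines with string literals."""
    subject = f"<{base_uri}{record[id_field]}>"
    triples = []
    for field_name, value in record.items():
        if field_name == id_field:
            continue  # the identifier becomes the subject IRI, not a triple
        predicate = f"<{base_uri}vocab/{field_name}>"
        # Escape backslashes and double quotes inside the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Illustrative record; the identifier and values are made up for the example.
trial = {"id": "trial-001", "condition": "colorectal cancer", "phase": "Phase 2"}
for line in record_to_ntriples(trial, "http://example.org/"):
    print(line)
```

The second step, link discovery, would then compare values such as the condition name against the labels of entities in other data sources.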

MetaCyc35

The MetaCyc [27] database is a comprehensive and freely accessible resource for metabolic

pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally

determined, small-molecule metabolic pathways and are curated from the primary scientific

literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic

pathways currently available. Pathway reactions are linked to one or more well-characterized

enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and

literature citations.

MINT36

The Molecular INTeraction database (MINT) [28] aims at storing, in a structured format,

information about molecular interactions by extracting experimental details from work published

in peer-reviewed journals. At present the MINT team focuses the curation work on physical

interactions between proteins. Genetic or computationally inferred interactions are not included

in the database. Over the past few years the number of curated physical interactions has soared to

over 95000.

NeuroCommons37

NeuroCommons is a project that seeks to make all scientific research materials - research

articles, knowledge bases, research data, physical materials - as available and as usable as they

can be. To achieve this, the project uses practices that render information in a form that promotes uniform access by computational agents. It covers general data and knowledge sources used in computational biology as well as sources specific to neuroscience and neuromedicine.

32 http://www.ebi.ac.uk/intact/

33 http://www.dbs.ifi.lmu.de/~bundschu/LHGDN.html

34 http://linkedct.org/

35 http://metacyc.org/

36 http://mint.bio.uniroma2.it

37 http://neurocommons.org

PharmGKB38

The Pharmacogenomics Knowledge Base (PharmGKB) is a repository for genetic, genomic,

molecular and cellular phenotype data and clinical information about people who have

participated in pharmacogenomics research studies. The data includes, but is not limited to,

clinical and basic pharmacokinetic and pharmacogenomic research in the cardiovascular,

pulmonary, cancer, pathways, metabolic and transporter domains.

2.2.2.1.2. Existing datasets accessed through searching

This section presents the existing data sets that are not accessed through a SPARQL endpoint but

through the search mechanism usually provided by their web site.
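As an illustration of search-based access, the sketch below builds a query URL for PubMed. It assumes the NCBI E-utilities `esearch` service is used as the search mechanism, which the text above does not specify. The search term is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed entry point: the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, database="pubmed", max_results=20):
    """Return a search URL for the given term; no network access is performed."""
    params = {
        "db": database,         # database to search
        "term": term,           # free-text query
        "retmax": max_results,  # maximum number of identifiers to return
        "retmode": "json",      # response format
    }
    return ESEARCH + "?" + urlencode(params)


# Hypothetical query, for illustration only.
url = build_search_url("chemoprevention AND curcumin")
```

Other search-only sources in this section would need their own URL builders, since each web site exposes a different query interface.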

PubMed Dietary Supplement Subset39

The PubMed Dietary Supplement Subset is designed to limit search results to citations from a broad

spectrum of dietary supplement literature including vitamin, mineral, phytochemical, ergogenic,

botanical, and herbal supplements in human nutrition and animal models. The subset retrieves

dietary supplement-related citations on topics including, but not limited to: chemical

composition; biochemical role and function - both in vitro and in vivo; clinical trials; health and

adverse effects; fortification; traditional Chinese medicine and other folk/ethnic supplement

practices; cultivation of botanical products used as dietary supplements; as well as surveys of

dietary supplement use.

Dietary Supplements Labels Database40

The Dietary Supplements Labels Database offers information about label ingredients in more

than 5,000 selected brands of dietary supplements. It enables users to compare label ingredients

in different brands. Information is also provided on the “structure/function” claims made by

manufacturers and can therefore be used to narrow down active ingredients in different types of

food which may be applicable as chemoprevention agents. Ingredients of dietary supplements in

this database are linked to other databases such as MedlinePlus and PubMed to allow users to

understand the characteristics of ingredients and view the results of research pertaining to them.

ClinicalTrials41

ClinicalTrials.gov is an up-to-date registry and results database of federally and privately

supported clinical trials conducted in the US and around the world. ClinicalTrials.gov offers

information for locating federally and privately supported clinical trials for a wide range of

diseases and conditions.

38 http://www.pharmgkb.org/

39 http://ods.od.nih.gov/research/PubMed_Dietary_Supplement_Subset.aspx

40 http://dietarysupplements.nlm.nih.gov/dietary/

41 http://clinicaltrials.gov/






TOXicology Data NETwork (TOXNET)42

TOXNET provides access to full-text and bibliographic databases oriented to toxicology,

hazardous chemicals, environmental health and related areas.

Aggregated Computational Toxicology Resource (ACToR)43

ACToR is an online warehouse of all publicly available chemical toxicity data and can be used

to find all publicly available data about potential chemical risks to human health and the

environment. ACToR aggregates data from over 500 public sources on over 500,000

environmental chemicals searchable by chemical name, other identifiers and by chemical

structure. The data warehouse allows users to search and query data from chemical toxicity

databases including:

i. ToxRefDB (animal toxicity studies).

ii. ToxCastDB (data from screening 1,000 chemicals in over 500 high-throughput assays).

iii. ExpoCastDB (consolidates and links human exposure and exposure factor data for

chemical prioritization).

iv. Distributed Structure-Searchable Toxicity (DSSTox), a database that provides

high-quality chemical structures and annotations. Its overall aim is to associate

chemical structure information more closely with existing toxicity data.

PubChem44

PubChem provides information on the biological activities of small molecules including

substance information, compound structures, and BioActivity data in three primary databases.

PubChem is integrated with Entrez, NCBI’s (National Center for Biotechnology Information)

primary search engine, and also provides compound neighbouring, sub/superstructure, similarity

structure, BioActivity data, and other searching features. PubChem contains substance and

BioAssay (Biological Assay) information from a multitude of depositors. The system is

maintained by the NCBI, a component of the NLM, which is part of the US NIH. PubChem can

be accessed for free through a web user interface. PubChem contains substance descriptions and

small molecules with fewer than 1000 atoms and 1000 bonds. More than 80 database vendors

contribute to the growing PubChem database.

REPAIRtoire Database45

REPAIRtoire is a database resource for systems biology of DNA damage and repair. The

database collects and organizes the following types of information: (i) DNA damage linked to

environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes

and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA

repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins.

REPAIRtoire also provides links to publications and external databases. REPAIRtoire can be

42 http://toxnet.nlm.nih.gov/

43 http://actor.epa.gov/actor/faces/ACToRHome.jsp

44 http://pubchem.ncbi.nlm.nih.gov/

45 http://repairtoire.genesilico.pl/





queried by the name of a pathway, protein, enzymatic complex, damage or disease. In addition, a

tool for drawing custom DNA-protein complexes is available online.

Cancer Gene Expression Database (CGED)46

CGED is a database of gene expression profiles and accompanying clinical information. The data

of CGED were obtained through collaborative efforts of Nara Institute of Science and

Technology, Osaka University Medical School, Kyoto University Medical School and Osaka

Medical Center for Cancer and Cardiovascular Diseases to identify genes of clinical importance.

This database offers graphical presentation of expression and clinical data with similarity search

and sorting functions. CGED includes data on breast (prognosis and docetaxel data sets),

colorectal, hepatocellular, esophageal, thyroid, and gastric cancers (updated in March 2007).

ArrayExpress47

The ArrayExpress Archive is a database of functional genomics experiments, including gene

expression, from which users can query and download data collected in compliance with the

Minimum Information About a Microarray Experiment (MIAME) and Minimum Information about a

high-throughput SeQuencing Experiment (MINSEQE) standards. The Gene Expression Atlas contains a subset of

curated and re-annotated archive data which can be queried for individual gene expression under

different biological conditions across experiments.

Gene Expression Omnibus (GEO)48

The GEO is a public repository that archives and freely distributes microarray, next-generation

sequencing, and other forms of high-throughput functional genomic data submitted by the

scientific community. In addition to data storage, a collection of web-based interfaces and

applications is available to help users query and download the studies and gene expression

patterns stored in GEO.

GenBank49

The GenBank sequence database is an open access, annotated collection of all publicly available

nucleotide sequences and their protein translations. This database is produced at NCBI as part of

the International Nucleotide Sequence Database Collaboration (INSDC). GenBank and its

collaborators receive sequences produced in laboratories throughout the world from more than

380,000 distinct organisms. Release 155, produced in August 2006, contained over 65

billion nucleotide bases in more than 61 million sequences. Data enter the database primarily

as direct submissions from the scientific community and individual laboratories, and as bulk

submissions from large-scale sequencing centers on electronic media, with little or no data

being keyboarded from the printed page by the databank staff.

46 http://lifesciencedb.jp/cged/

47 http://www.ebi.ac.uk/arrayexpress/

48 http://www.ncbi.nlm.nih.gov/geo/

49 http://www.ncbi.nlm.nih.gov/genbank/






ChemSpider50

ChemSpider is a free access website for chemists to research structure-based information. It links

together chemical structures and their associated information across the web, providing a single

searchable repository which contains millions of chemical structures.

ChemSpider builds on the collected sources by adding additional properties, related information

and links back to original data sources. It offers text and structure searching to find compounds

of interest and provides unique services to improve this data by curation and annotation and to

integrate it with users’ applications. Moreover, ChemSpider SyntheticPages (CS|SP)51

extends this model to cover reactions, providing quick publication, peer review and semantic

enhancement of repeatable reactions.

Chemical Compounds Database (Chembase)52

Chembase collects and provides information on chemical compounds and their physical and

chemical properties, NMR (Nuclear Magnetic Resonance) spectra, mass spectra, and UV/Vis

(Ultraviolet-Visible) absorption and IR (infrared) data. All data available can be searched by

various parameters or browsed by different topics.

Sigma-Aldrich53

The Sigma-Aldrich product database includes datasheets for commercially available compounds,

including solubility data.

ChemDB54

ChemDB is a public database of small molecules available on the Web. ChemDB is built using

the digital catalogs of over a hundred vendors and other public sources and is annotated with

information derived from these sources as well as from computational methods, such as

predicted solubility and 3D structure. It supports multiple molecular formats and is periodically

updated, automatically whenever possible. The current version of the database contains

approximately 4.1 million commercially available compounds, or 8.2 million when isomers are counted.

The database includes a user-friendly graphical interface, chemical reactions capabilities, as well

as unique search capabilities.

Colon Chemoprevention Agents Database (CCAD)55

The Colon Chemoprevention Agents Database [29] results from a systematic review of the

literature on colon chemoprevention in humans, rats and mice. Target cancers are colorectal

adenoma and adenocarcinoma, aberrant crypt foci (ACF) (a preneoplastic lesion), and Min mouse

polyps (adenomas in Apc+/- mutant mice). The chemopreventive agents are ranked by efficacy

(potency against carcinogenesis).

50 http://www.chemspider.com/

51 http://cssp.chemspider.com/

52 http://www.chembase.com/

53 https://www.sigmaaldrich.com/catalog/

54 http://cdb.ics.uci.edu/

55 http://corpet.free.fr/





WikiPathways56

WikiPathways [30] is an open, collaborative platform dedicated to the curation of biological

pathways. It presents a model for pathway databases that enhances and

complements ongoing efforts, such as KEGG, Reactome and Pathway Commons.

cPath: Pathway Database Software57

cPath is a software platform for collecting/querying biological pathways. It can serve as the core

data handling component in information systems for pathway visualization, analysis and

modelling. Using it, researchers can import interaction and pathway data from multiple sources,

access such data via a standard web interface, and export data to third-party applications via a

standards-based web service. Biomedical researchers can utilize cPath for content aggregation,

query and analysis. More specifically, its main features include: i) aggregation of pathway data from

multiple sources (e.g. BioCyc, KEGG, Reactome), ii) import/export support for the PSI-MI

(Proteomics Standards Initiative Molecular Interaction) and BioPAX formats, iii) data

visualization using Cytoscape and iv) a simple web service.

Protein Data Bank (PDB)58

The PDB is a repository for the 3D structural data of large biological molecules, such as proteins

and nucleic acids. The data, typically obtained by X-ray crystallography or NMR (Nuclear

Magnetic Resonance) spectroscopy and submitted by biologists and biochemists from around the

world, are freely accessible on the Internet. Most major scientific journals, and some funding

agencies, require scientists to submit their structure data to the PDB.

Protein Database59

The Protein database is a collection of sequences from several sources, including translations

from annotated coding regions in GenBank and TPA (Third Party Annotation), as well as

records from SwissProt, Protein Information Resource (PIR), Protein Research Foundation

(PRF), UniProt and PDB. Protein sequences are the fundamental determinants of biological

structure and function.

56 http://www.wikipathways.org

57 http://cbio.mskcc.org/software/cpath/

58 http://www.pdb.org

59 http://www.hprd.org/






2.2.2.1.3. Concept/attribute identification from Existing datasets

This section identifies the concepts found in both types of dataset (i.e. accessed through

SPARQL endpoints or through searching). For each concept, a list of related attributes is

presented. The results are shown in Table 9.

Concept SPARQL

endpoints

Data sets Attributes

Published Work -

Title

Author

Citation

Abstract

Type (e.g. journal)

Research statement - - -

Person Name

Surname

Experiment - Description

Title

Clinical Trial - Description

Location

Date

Title

BioAssay Name

Description

BioActive Compounds

(compounds/substances tested in the

BioAssay)

Protocol Description

Experimental factor - Name

Description

Pathway Name

Description

Component

Order

Target Name

Description

Nucleic acid Gene sequence

Protein Cellular Location

Organism

Protein sequence




Disease Name

Description

possibleDrug/isTreatedBy

Source - Name

Description

Drug Name/ brand name

Description

Type

Formula/Smiles

Dosage

Adverse reaction

Indication

Has target

Natural source -

Name

Description

Molecule Name

Description

Formula

Molecular weight

Chemopreventive

agent

Name

Description

Table 9 Concepts detected in the datasets (accessed through SPARQL endpoints or through searching)

2.2.2.2. Experimental data analysis

This section aims to identify the concepts, properties and attributes detected by analyzing sets of

experimental data relevant to cancer chemoprevention. To this end, two sets of

experimental data were analyzed. The experimental data sets, together with the results and

conclusions related to each data set, are documented in separate published studies:

The first experimental data set is used by [31]. The authors establish and utilize a mouse

mammary gland organ culture model (MMOC) as a bioassay for identifying

chemopreventive agents. More than 200 synthetic and natural product-derived molecules

were evaluated in this model. For each molecule a number of bioassays are conducted

measuring its activity (e.g. the inhibition of carcinogen-induced development of

precancerous lesions in the MMOC and the percentage of the MMOC activity). If the

measured activity meets specific requirements then the molecule is identified as a

chemopreventive agent. Figure 5 shows a snapshot of the experimental data set produced.

The whole experimental data set constitutes a Study that contains many Bioassays each

examining the activity of a molecule that can be identified as a Chemopreventive agent.




Figure 5 Conceptualization of the first experimental data set

The second experimental data set is used by [32]. This study identifies potential cancer

chemopreventive constituents using a battery of cell- and enzyme-based in vitro marker

systems relevant for prevention of carcinogenesis in vivo. A number of known

chemopreventive substances have been tested belonging to several structural classes as

reference compounds for the identification of novel chemopreventive agents or

mechanisms. For each chemopreventive agent a number of bioassays were conducted

measuring its activity. Figure 6 shows a snapshot of the experimental data set produced.

The whole experimental data set constitutes a Study that contains many Bioassays each

examining the activity of a Chemopreventive agent.

Figure 6 Conceptualization of the second experimental data set




The concepts and properties derived from the experimental data analysis are shown in Table 10

and Table 11, respectively.

Concept                  Description

Study                    A study is a paper that describes the experiment, the experimental data used and the results.

Bioassay                 A bioassay includes specific measurements of the activity of the chemopreventive agents.

Chemopreventive agent    A chemopreventive agent is a single compound tested during a bioassay.

Table 10 Concepts derived from experimental data analysis

Property    Description

hasInput    This property is used to declare that a molecule is used as input to the Bioassay-Experiment.

identify    This property is used to declare that a Bioassay-Experiment has identified a chemopreventive agent that meets specific requirements.

Table 11 Properties derived from experimental data analysis
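The concepts and the two properties derived above can be sketched as a small data structure. This is an illustrative reading and not part of the model itself: the `inhibition_percent` field, the 60% threshold and the compound names are hypothetical stand-ins for the "specific requirements" mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Molecule:
    name: str


@dataclass
class Bioassay:
    has_input: Molecule                  # hasInput: the molecule under test
    inhibition_percent: float            # hypothetical measured activity
    identify: Optional[Molecule] = None  # identify: set only if requirements are met


@dataclass
class Study:
    title: str
    bioassays: List[Bioassay] = field(default_factory=list)

    def chemopreventive_agents(self):
        """Names of the molecules identified as chemopreventive agents."""
        return [b.identify.name for b in self.bioassays if b.identify is not None]


def run_bioassay(molecule, inhibition_percent, threshold=60.0):
    """Record a bioassay; the threshold is an assumed acceptance criterion."""
    assay = Bioassay(has_input=molecule, inhibition_percent=inhibition_percent)
    if inhibition_percent >= threshold:
        assay.identify = molecule
    return assay


study = Study("MMOC screening (illustrative)")
study.bioassays.append(run_bioassay(Molecule("compound A"), 75.0))
study.bioassays.append(run_bioassay(Molecule("compound B"), 20.0))
```

Here `study.chemopreventive_agents()` lists only the molecules whose bioassay met the assumed criterion, mirroring the distinction between hasInput (every tested molecule) and identify (only those recognised as agents).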

2.2.2.3. Requirements analysis

This section aims to identify the concepts and properties detected by analyzing the Usage

Scenarios and the Questionnaire described in D1.1. Specifically, Table 12 lists the concepts

derived from the Usage Scenarios and the Questionnaire.

Concept Usage Scenario Questionnaire

Publication US1/US4

Research statement US4 -

Protocol US1 -

Experimental Factor US1 (tissue/cell line) -

Experiment US1

Clinical trial US2

Virtual screening US3 -

BioAssay US1 -

Pathway US1

Molecule US1/US2/US3




Chemopreventive agent US1/US2

Drug US3

Natural Source US1 -

Disease US1 -

Protein US1/ US3

Gene US1

Table 12 Concepts derived from the Usage Scenarios and the Questionnaire




3. GRANATUM BIOMEDICAL SEMANTIC MODEL FORMALIZATION

At this stage of development, the GRANATUM Biomedical Semantic Model has identified the

main entities and their core set of properties, together with main relations between them.

Following a “meet-in-the-middle” approach that combines top-down and bottom-up

conceptualization, a set of fundamental entities has been identified:

A set of elements for the representation of Literature;

A set of concepts from the biomedical domain related to cancer chemoprevention;

A set of concepts from the biomedical domain related to experimental representation;

A set of concepts from the biomedical domain related to in silico modeling.

The following sections describe in depth both the conceptual view and the ontological

representation of the identified concepts.

3.1. OVERVIEW

The objective of the Granatum Biomedical Semantic Model (Figure 7) is to recognize the

fundamental biomedical entities used in cancer chemoprevention and define their relations. The

Granatum Biomedical Semantic Model defines the following entities:

Information resource
  o Unstructured knowledge resource
      Image
  o Semi-structured knowledge resource
      Forum post
      Published Work
Research statement
Person
Protocol
Experimental Factor
Experiment
  o In vivo
      Clinical trial
  o In vitro
      BioAssay
  o In silico
      Virtual screening
Scientific Workflow
Molecule
  o Chemopreventive agent
  o Target
      Protein
      Nucleic acid
      Reactive oxygen species
      Sugar
      Lipid
Toxicity
Source
  o Drug
  o Natural Source
Pathway
Disease

Figure 7 Overview of the GRANATUM Biomedical Semantic Model
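The entity hierarchy listed above can be encoded as a simple child-to-parent map, which makes it easy to check whether one entity is a kind of another. The sketch below is illustrative only. It assumes the nesting implied by the list (for example, that Protein, Nucleic acid, Reactive oxygen species, Sugar and Lipid sit under Target), and the helper names `ancestors` and `is_a` are hypothetical.

```python
# Child -> parent relations, read from the entity list of the overview.
SUBCLASS_OF = {
    "UnstructuredKnowledgeResource": "InformationResource",
    "Image": "UnstructuredKnowledgeResource",
    "SemiStructuredKnowledgeResource": "InformationResource",
    "ForumPost": "SemiStructuredKnowledgeResource",
    "PublishedWork": "SemiStructuredKnowledgeResource",
    "InVivo": "Experiment",
    "ClinicalTrial": "InVivo",
    "InVitro": "Experiment",
    "BioAssay": "InVitro",
    "InSilico": "Experiment",
    "VirtualScreening": "InSilico",
    "ChemopreventiveAgent": "Molecule",
    "Target": "Molecule",
    "Protein": "Target",
    "NucleicAcid": "Target",
    "ReactiveOxygenSpecies": "Target",
    "Sugar": "Target",
    "Lipid": "Target",
    "Drug": "Source",
    "NaturalSource": "Source",
}


def ancestors(entity):
    """Return the chain of super classes, nearest first."""
    chain = []
    while entity in SUBCLASS_OF:
        entity = SUBCLASS_OF[entity]
        chain.append(entity)
    return chain


def is_a(entity, candidate):
    """True if entity equals candidate or is one of its (transitive) sub classes."""
    return entity == candidate or candidate in ancestors(entity)
```

Entities that do not appear as keys (such as Pathway, Disease or Toxicity) are treated as top-level, so `ancestors` returns an empty list for them.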




3.2. MODEL SPECIFICATION

This section defines the data model used by the GRANATUM Biomedical Semantic Model by

defining its classes and properties. For each class, the specification gives the URI, a

definition, the type of the class, its super classes and the properties that use the class as domain

or as range.
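To make the layout of the class tables concrete, the following sketch turns one class description into Turtle statements. It is an illustration under stated assumptions, not the project's serialization: it assumes classes are declared as `owl:Class`, that definitions map to `rdfs:comment`, and that the listed properties map to `rdfs:domain`/`rdfs:range` statements. The `ClassSpec` and `to_turtle` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

GRANATUM = "http://www.granatum.eu#"


@dataclass
class ClassSpec:
    name: str
    definition: str
    sub_class_of: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)  # property -> range


def to_turtle(spec):
    """Render a class table as Turtle (assumed OWL/RDFS mapping)."""
    lines = [
        "@prefix granatum: <" + GRANATUM + "> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        "granatum:" + spec.name + " a owl:Class ;",
    ]
    if spec.sub_class_of:
        lines.append("    rdfs:subClassOf granatum:" + spec.sub_class_of + " ;")
    lines.append('    rdfs:comment "' + spec.definition.replace('"', '\\"') + '" .')
    for prop, rng in spec.properties.items():
        lines.append("")
        lines.append("granatum:" + prop + " rdfs:domain granatum:" + spec.name + " ;")
        lines.append("    rdfs:range " + rng + " .")
    return "\n".join(lines)


forum_post = ClassSpec(
    name="ForumPost",
    definition="A Forum Post is related to a Published Work and contains comments about this specific Publication.",
    sub_class_of="SemiStructuredKnowledgeResource",
    properties={"hasAuthor": "granatum:Person", "Comment": "xsd:string"},
)
turtle = to_turtle(forum_post)
```

One design point to keep in mind: a property such as granatum:hasAuthor is listed for more than one class. Under RDFS semantics, several `rdfs:domain` statements on the same property are read as an intersection, so a real serialization would probably need a union class or per-class restrictions instead of the naive mapping used here.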

granatum:InformationResource

URI http://www.granatum.eu#InformationResource

Definition A resource that provides data, knowledge or narrative information.

sub-class-of -

In-domain-of Property Range

- -

granatum:UnstructuredKnowledgeResource

URI http://www.granatum.eu#UnstructuredKnowledgeResource

Definition A resource that provides access to a collection of data or information that is not easily

queryable without using metadata about the resource.

sub-class-of granatum:InformationResource


- -

granatum:Image

URI http://www.granatum.eu#Image

Definition A resource that provides data in the form of images.

sub-class-of granatum:UnstructuredKnowledgeResource


URI xsd:string

granatum:SemiStructuredKnowledgeResource

URI http://www.granatum.eu#SemiStructuredKnowledgeResource

Definition A resource that provides data that is partially structured.

sub-class-of granatum:InformationResource





- -

granatum:PublishedWork

URI http://www.granatum.eu#PublishedWork

Definition This entity refers to any type of publication that makes content available to the public.

Each publication has at least one Author, supports a number of Research Statements

and is commented in a number of Forum posts. A published work can be: a Book, a

conference Article, a Journal article etc.

sub-class-of granatum:SemiStructuredKnowledgeResource


granatum:hasStatement granatum:ResearchStatement

granatum:hasAuthor granatum:Person

granatum:commentedIn granatum:ForumPost

granatum:referenceTo granatum:PublishedWork

granatum:title xsd:string

granatum:Citation xsd:string

granatum:Abstract xsd:string

granatum:Type xsd:string

granatum:ForumPost

URI http://www.granatum.eu#ForumPost

Definition A Forum Post is related to a Published Work and contains comments about this

specific Publication. It allows the discussion and the exchange of ideas related to the

Published Work.

sub-class-of granatum:SemiStructuredKnowledgeResource


granatum:hasAuthor granatum:Person

granatum:Date xsd:date

granatum:Comment xsd:string

URI xsd:string




granatum:ResearchStatement

URI http://www.granatum.eu#ResearchStatement

Definition A Research Statement is a declarative sentence supported by a specific Publication.

sub-class-of -

Property Range

granatum:Description xsd:string

granatum:Hypothesis xsd:string

granatum:Claim xsd:string

granatum:Person

URI http://www.granatum.eu#Person

Definition This class represents the author that has written or has contributed to the writing of a

Published Work or Forum post. The Author will be a FOAF person.

sub-class-of foaf:Person

Property Range

granatum:Name xsd:string

granatum:Surname xsd:string

granatum:Affiliation xsd:string

granatum:contactDetails xsd:string

granatum:Protocol

URI http://www.granatum.eu#Protocol

Definition A protocol is an information entity which is a set of instructions that describe how an

experiment is done.

sub-class-of -

Property Range

granatum:referredToPublication granatum:PublishedWork

granatum:useFactor granatum:ExperimentalFactor

granatum:Title xsd:string

granatum:Procedure xsd:string





granatum:ExperimentalFactor

URI http://www.granatum.eu#ExperimentalFactor

Definition Experimental factors are the variable aspects of an experimental design which can be used to

describe an experiment, or set of experiments, in an increasingly detailed manner.

sub-class-of -

Property Range



granatum:Experiment

URI http://www.granatum.eu#Experiment

Definition An experiment is a methodical procedure carried out with the goal of verifying,

falsifying, or establishing the validity of a hypothesis. The experiment follows a Protocol

to check the hypothesis and its findings may be published in a Publication.

sub-class-of -

Property Range

granatum:followProtocol granatum:Protocol

granatum:describedInPublication granatum:PublishedWork

granatum:identify granatum:ChemopreventiveAgent

granatum:hasInput granatum:Target

granatum:Molecule



granatum:Date xsd:date

granatum:Outcome xsd:string

granatum:InVivo

URI http://www.granatum.eu#InVivo

Definition In vivo (Latin for “within the living”) refers to experimentation using a whole, living organism as opposed to a partial or dead organism.

sub-class-of granatum:Experiment

Property Range




- -

granatum:Clinical trial

URI http://www.granatum.eu#Clinical_trial

Definition A Clinical trial is a research study that prospectively assigns human participants or

groups of humans to one or more health-related interventions to evaluate the effects on

health outcomes. Interventions include but are not restricted to drugs, cells and other

biological products, surgical procedures, radiologic procedures, devices, behavioral

treatments, process-of-care changes, and preventive care.

sub-class-of granatum:InVivo

Property Range

granatum:ParticipantDetails xsd:string

granatum:InVitro

URI http://www.granatum.eu#InVitro

Definition In vitro (Latin for “within the glass”) refers to the technique of performing a given

procedure in a controlled environment outside of a living organism.


Property Range

- -

granatum:BioAssay

URI http://www.granatum.eu#BioAssay

Definition A Bioassay is a laboratory test or analysis of the biological activity of a substance (e.g.

Chemopreventive agent) performed by studying its effect on an organism or in a test

tube under controlled conditions. A Bioassay is part of an experiment that includes also

other bioassays.

sub-class-of granatum:InVitro

Property Range

- -

granatum:InSilico

URI http://www.granatum.eu#InSilico




Definition In silico refers to an experiment performed on a computer or via computer simulation.


Property Range

granatum:useWorkflow granatum:ScientificWorkflow

granatum:VirtualScreening

URI http://www.granatum.eu#VirtualScreening

Definition Virtual Screening refers to the technique of performing a biomedical experiment

entirely in a computer via computer simulation.

sub-class-of granatum:InSilico

Property Range

- -

granatum:ScientificWorkflow

URI http://www.granatum.eu#ScientificWorkflow

Definition The Scientific workflow is a pipeline of connected components (in-silico tools, models)

to perform an in silico experiment.

sub-class-of -

Property Range



granatum:Molecule

URI http://www.granatum.eu#Molecule

Definition The smallest particle of a substance that has all of the physical and chemical properties

of that substance. Molecules are made up of one or more atoms. If they contain more

than one atom, the atoms can be the same (an oxygen molecule has two oxygen atoms)

or different (a water molecule has two hydrogen atoms and one oxygen atom). Biological

molecules, such as proteins and DNA, can be made up of many thousands of atoms.

sub-class-of -

Property Range



granatum:Formula xsd:string




granatum:SMILES xsd:string

granatum:MolecularWeight xsd:int

granatum:Size xsd:int

granatum:ChemopreventiveAgent

URI http://www.granatum.eu#ChemopreventiveAgent

Definition A Chemopreventive agent is a Molecule that has shown some evidence that it may be able

to prevent or delay the development of a specific Disease (e.g. cancer) by interfering with

a Biological target. For example - cancer chemopreventive agents are used to inhibit,

delay, or reverse carcinogenesis. A Chemopreventive agent can be identified in an

Experiment and can be contained in a Natural source or a Drug. It can also be a synthetic

chemical agent. References to a Chemopreventive agent can be found in a Publication.

sub-class-of granatum:Molecule

Property Range



granatum:induceDifferentiation xsd:boolean

granatum:coopperateWith granatum:ChemopreventiveAgent

granatum:hasToxicity granatum:Toxicity

granatum:referredInToPublication granatum:PublishedWork

granatum:induce/prevent granatum:Target

granatum:affectPathway granatum:Pathway

granatum:hasSource granatum:Source

granatum:Target

URI http://www.granatum.eu#BiologicalTarget

Definition A biological target is a biopolymer such as a protein or nucleic acid whose activity can

be modified by an external stimulus. A Chemopreventive agent has a Biological target in

order to "hit" it and change its behavior in order to prevent a disease.

sub-class-of granatum:Molecule

Property Range






granatum:ReactiveOxygenSpecies

URI http://www.granatum.eu#ReactiveOxygenSpecies

Definition Reactive oxygen species are organic or inorganic chemicals that contain an oxygen atom

with an unpaired electron. This unstable electron configuration causes these chemicals to

be highly reactive with other molecules.

sub-class-of granatum:Target

Property Range

- -

granatum:Sugar

URI http://www.granatum.eu#MicroMolecule

Definition Any member of a class of edible, crystalline carbohydrates (mainly sucrose, lactose and

fructose) characterised by a sweet flavour; a loose term applied to monosaccharides,

disaccharides, trisaccharides and oligosaccharides, in contrast to complex carbohydrates

such as polysaccharides.


Property Range

- -

granatum:Protein

URI http://www.granatum.eu#Protein

Definition A Protein is a group of complex organic macromolecules composed of one or more

chains (linear polymers) of alpha-L-amino acids linked by peptide bonds and ranging in

size from a few thousand to over 1 million Daltons. Proteins are fundamental genetically

encoded components of living cells with specific structures and functions dictated by

amino acid sequence.


Property Range

granatum:ProteinSequence xsd:string

granatum:CellularLocation xsd:string

granatum:NucleicAcid

URI http://www.granatum.eu#NucleicAcid




Definition Nucleic acids are a family of macromolecules composed of various moieties: purines,

pyrimidines, phosphoric acid, and a pentose, either D-ribose or D-deoxyribose. Nucleic

acids, as DNA or RNA, are found in the chromosomes, nucleoli, mitochondria, and

cytoplasm of all cells, and in viruses. Nucleic acids are the major players in controlling

cellular function and heredity.


Property Range

granatum:Sequence xsd:string

granatum:Lipid

URI http://www.granatum.eu#Lipid

Definition A class of hydrocarbon-containing organic compounds. Lipids are insoluble in water but

soluble in nonpolar solvents and play important roles in living organisms: these roles

include functioning as energy storage molecules, serving as structural components of cell

membranes, and constituting important signaling molecules. Lipids can be subdivided

into 2 groups: fatty acids and glycerides.


Property Range

- -

granatum:Toxicity

URI http://www.granatum.eu#Toxicity

Definition Toxicity is a quality of a chemical substance which indicates the capacity to cause injury

to an organism in a dose dependent manner.

sub-class-of -

Property Range

granatum:CellType xsd:string

granatum:level xsd:string

granatum:species xsd:string

granatum:concentration xsd:string

granatum:Source

URI http://www.granatum.eu#Source




Definition A Source is the material in which a Chemopreventive agent is available or from which

it originates.

sub-class-of -

Property Range

granatum:contains granatum:Molecule

granatum:Drug

URI http://www.granatum.eu#Drug

Definition A Drug is any substance which, when absorbed into a living organism, may modify one or more of its functions. The term is generally accepted for a substance taken for a therapeutic purpose, but is also commonly used for abused substances. A Chemopreventive agent may be contained in a Drug. (ChEBI)

sub-class-of granatum:Source

Property Range

granatum:CommonName xsd:string

granatum:AdverseReaction xsd:string

granatum:Type xsd:string

granatum:interact granatum:Target

granatum:NaturalSource

URI http://www.granatum.eu#NaturalSource

Definition A Natural Source is a material found in nature that usually has a pharmacological or biological activity. A Chemopreventive agent may be contained in a Natural Source.

sub-class-of granatum:Source

Property Range

granatum:CommonName xsd:string

granatum:Pathway

URI http://www.granatum.eu#Pathway

Definition A Pathway is a set or series of interactions, often forming a network, which biologists have found useful to group together for organizational, historic, biophysical, or other reasons. The Chemopreventive agent affects a Biological target in order to “break” the series of interactions that leads to a Disease (e.g. cancer).




sub-class-of -

Property Range

granatum:containsTarget granatum:Target

granatum:relatedToDisease granatum:Disease



granatum:PathwayMap xsd:anyURI

granatum:Order xsd:string

granatum:Disease

URI http://www.granatum.eu#Disease

Definition A Disease is any abnormal condition of the body or mind that causes discomfort,

dysfunction, or distress to the person affected or those in contact with the person. The

term is often used broadly to include injuries, disabilities, syndromes, symptoms, deviant

behaviors, and atypical variations of structure and function.

sub-class-of -

Property Range

granatum:isPreventedBy granatum:ChemopreventiveAgent
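The class templates above map fairly directly onto OWL declarations. The following Python sketch is an illustration only, not the project's actual implementation: it renders one template as Turtle text. The helper name `class_to_turtle`, the use of `rdfs:comment` for the definition, and the rule that `xsd:` ranges become datatype properties while `granatum:` ranges become object properties are assumptions made for this example. Only the namespace `http://www.granatum.eu#` and the Pathway properties are taken from the templates.

```python
# Standard prefixes; the granatum namespace is the one used in the class templates.
PREFIXES = "\n".join([
    "@prefix granatum: <http://www.granatum.eu#> .",
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
])


def class_to_turtle(name, definition, properties, parent=None):
    """Render one class template as Turtle (hypothetical helper).

    properties is a list of (property_name, range) pairs, where the range is
    written as in the templates, e.g. 'xsd:string' or 'granatum:Target'.
    """
    lines = [f"granatum:{name} a owl:Class ;"]
    if parent:
        lines.append(f"    rdfs:subClassOf granatum:{parent} ;")
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
    lines.append(f'    rdfs:comment "{escaped}" .')
    for prop, rng in properties:
        # Assumption: literal ranges are datatype properties, class ranges are object properties.
        kind = "owl:DatatypeProperty" if rng.startswith("xsd:") else "owl:ObjectProperty"
        lines.append("")
        lines.append(f"granatum:{prop} a {kind} ;")
        lines.append(f"    rdfs:domain granatum:{name} ;")
        lines.append(f"    rdfs:range {rng} .")
    return "\n".join(lines)


if __name__ == "__main__":
    pathway = class_to_turtle(
        "Pathway",
        "A Pathway is a set or series of interactions, often forming a network.",
        [
            ("containsTarget", "granatum:Target"),
            ("relatedToDisease", "granatum:Disease"),
            ("PathwayMap", "xsd:anyURI"),
            ("Order", "xsd:string"),
        ],
    )
    print(PREFIXES + "\n\n" + pathway)
```

Properties that are shared by several classes (for example granatum:CommonName, used by both Drug and NaturalSource) would need a different treatment, since repeating `rdfs:domain` for each class would be read as an intersection of domains.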



3.3. MODEL IMPLEMENTATION

Several languages can be used to implement an ontology. Generic and flexible languages such as OWL can express complex relationships between the concepts and the roles in the domain; ontologies using such formalisms are called heavyweight ontologies, in contrast to lightweight ontologies, which use simpler formalisms (such as RDF or RDF Schema) with fewer possibilities for expressing complex relationships. The choice of formalism is strictly linked to the computational operations to be performed on top of the model.

The GRANATUM Biomedical Semantic Model aims to solve interoperability issues and to interconnect different existing ontologies. Mapping the same concepts expressed differently by diverse ontologies may not be simple, and resolving the conflicts that arise may require reasoning techniques. For this reason, OWL 2 was chosen as the formalism for the GRANATUM ontology. The lightweight approach was discarded to avoid limiting the expressiveness of the ontology.
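The kind of inference this choice is meant to enable can be illustrated with a toy example. The Python sketch below is not an OWL 2 reasoner; it only follows subclass and equivalence links to show how a term from another ontology, once mapped, inherits the GRANATUM hierarchy. The subclass links are those stated in this document (Drug and NaturalSource under Source, Target under Molecule, Lipid under Target). The external term `ext:Medicine` and its equivalence to granatum:Drug are hypothetical and introduced purely for illustration.

```python
# Subclass links taken from the class templates and from Table 17.
SUBCLASS_OF = {
    "granatum:Drug": "granatum:Source",
    "granatum:NaturalSource": "granatum:Source",
    "granatum:Target": "granatum:Molecule",
    "granatum:Lipid": "granatum:Target",
}

# Hypothetical mapping to a term of an external ontology (assumption).
EQUIVALENT = [("ext:Medicine", "granatum:Drug")]


def superclasses(cls):
    """Return every class reachable from cls via subclass or equivalence links."""
    seen = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        neighbours = []
        parent = SUBCLASS_OF.get(current)
        if parent:
            neighbours.append(parent)
        # Equivalence is symmetric, so follow it in both directions.
        for left, right in EQUIVALENT:
            if left == current:
                neighbours.append(right)
            elif right == current:
                neighbours.append(left)
        for neighbour in neighbours:
            if neighbour != cls and neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return seen


if __name__ == "__main__":
    print(sorted(superclasses("ext:Medicine")))
    print(sorted(superclasses("granatum:Lipid")))
```

With a lightweight formalism the equivalence link could not be stated as an axiom, so the external term would not be recognised as a kind of Source without application-specific code.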




4. MODEL EVALUATION

The GRANATUM Biomedical Semantic Model needs to be evaluated and tested to check that it fulfils the requirements defined in the specification phase (Section 2.1). It should also be evaluated against criteria such as clarity (the ontology and its terms should be clear and unambiguous), consistency (the ontology should be free from contradictions) and reusability (the possibilities for reusing the ontology and the extent of that reuse). The evaluation criteria are further described in Section 4.1. A set of existing methodologies proposed for ontology evaluation is presented in Section 4.2. Finally, Section 4.3 presents the results of the evaluation procedure.

4.1. EVALUATION CRITERIA

The criteria used for the evaluation of the ontology are those proposed in existing methodologies for ontology evaluation [33, 34, 35]. Table 13 presents the evaluation criteria with a brief description.

Object of evaluation | Description
Lexicon and vocabulary | Emphasizes the handling of concepts and instances and the vocabulary used to identify them
Hierarchy, Taxonomy | Emphasizes taxonomic relations (is-a relations)
Semantic relations | Evaluates other relations, which are not taxonomic relations
Context or application | Evaluates ontologies in their context of use and in the context of application of which the ontology itself is part
Syntax | Evaluates ontology conformity to syntactical requirements of formal language in which the ontology was developed
Structure and architecture | Evaluates ontology conformity to predefined structural requirements

Table 13 Ontology evaluation criteria

4.2. EVALUATION METHODOLOGY

Various approaches for the evaluation of ontologies have been considered in the literature, depending on what kind of ontologies are being evaluated and for what purpose. Broadly speaking, most evaluation approaches fall into one of the categories described in Table 14.

Evaluation methodology | Description
Golden standard [36, 37] | Syntactic comparison between an ontology and a standard, which may be another ontology
Application Based [38, 39] | Use of an ontology in an application followed by evaluation of the results.
Data or corpus driven [40, 41] | Comparison with a data source covered by the ontology proper
Human assessment [42, 43] | Evaluation conducted by people who seek to verify the adherence of an ontology to criteria and patterns.

Table 14 Methodologies for ontology evaluation

Each of the evaluation methodologies presented in Table 14 is able to check some of the evaluation criteria presented in Table 13. The evaluation criteria covered by each evaluation methodology are presented in Table 15.




Evaluation criteria | Evaluation methodology: Golden standard | Application Based | Data or corpus driven | Human assessment
Lexicon and vocabulary
Hierarchy, Taxonomy
Semantic relations
Context or application - -
Syntax - -
Structure and architecture - - -

Table 15 Evaluation criteria for each evaluation methodology

The “Golden standard” methodology cannot be used for the evaluation of the GRANATUM Biomedical Semantic Model, since no standard exists that covers the scope of the model. The “Application Based” methodology could be used at the maintenance step, once the GRANATUM platform is complete and evaluation results can be extracted from its use; it cannot be used at this stage, since no application exists yet. The “Data or corpus driven” methodology is incorporated in the conceptualization phase (Section 2.2.2.2), where experimental data sets are analyzed to identify concepts. The most appropriate evaluation method at this stage of the ontology creation is therefore “Human assessment”. The trial partners (i.e. DKFZ, IITRI and UCY/CBC) are actively involved in the evaluation process in order to identify inconsistencies and weaknesses of the ontology. To simplify the evaluation process, a questionnaire is provided.

4.3. EVALUATION RESULTS

The questionnaire (see Appendix) examines the completeness, correctness, usability and simplicity of the GRANATUM Biomedical Semantic Model. It is divided into two parts:

The first part examines the usability and the simplicity of the model. In this part the experts were asked to answer a tailored version of the System Usability Scale (SUS) [44], as proposed by [45], in order to evaluate the understanding and agreement felt by the biomedical experts regarding the GRANATUM model as a whole. It contains 7 Likert-scale questions (stating the degree of agreement or disagreement).

The second part examines the correctness and the completeness of the model. It contains 4 questions on the definitions of the model’s concepts (for cases where no standard definition was found in existing ontologies) and 20 questions for the validation of the relations between the concepts of the GRANATUM Biomedical Semantic Model. It also gives the biomedical experts the opportunity to express any disagreement or to point out any missing concept or property.

The next sections present the results from both parts of the questionnaire.




4.3.1. Usability evaluation

This section presents the results of the usability evaluation. The majority of the biomedical experts (71.42% agreement and 14.29% high agreement) declared that they could contribute to the ontology, while 14.29% were indifferent (Question 1). The understanding of the ontology is examined by Questions 2, 3, 5 and 6. The largest group of experts (42.86%) found the ontology easy to understand (Question 2), but some did not find it easy (28.57%) and others were indifferent (28.57%). Most of the experts are confident that they understand the conceptualization of the ontology (Question 6, 71.42% agreement). Regarding Question 3, the answers vary, but the largest group (14.29% agreement and 28.57% high agreement) would need further theoretical support to be able to understand the ontology. Question 5 points in the same direction: no expert agreed that most biomedical experts would understand the ontology very quickly (14.29% high disagreement, 28.57% disagreement, 57.14% indifferent). Finally, regarding the integration (Question 4) and completeness (Question 7) of the ontology, most of the experts found the concepts of the ontology well integrated (71.42% agreement and 14.29% high agreement), and 42.86% agreed that the ontology covers the needs of the Cancer Chemoprevention domain, while the remaining 57.14% were indifferent. The usability results are presented in Table 16.

N | Question | High disagreement | Disagreement | Indifferent | Agreement | High agreement
1 | I think that I could contribute to this ontology | 0.00% | 0.00% | 14.29% | 71.42% | 14.29%
2 | I find the ontology easy to understand | 0.00% | 28.57% | 28.57% | 42.86% | 0.00%
3 | I think that I would need further theoretical support to be able to understand this ontology | 14.29% | 14.29% | 28.57% | 14.29% | 28.57%
4 | I found the various concepts in this model were well integrated | 0.00% | 0.00% | 14.29% | 71.42% | 14.29%
5 | I would imagine that most biomedical experts would understand this ontology very quickly | 14.29% | 28.57% | 57.14% | 0.00% | 0.00%
6 | I am confident I understand the conceptualization of the ontology | 0.00% | 0.00% | 28.57% | 71.42% | 0.00%
7 | The concepts/properties of the ontology cover the needs of the Cancer Chemoprevention domain. | 0.00% | 0.00% | 57.14% | 42.86% | 0.00%
TOTALS | 4.08% | 10.21% | 32.65% | 44.90% | 8.16%

Table 16 Usability evaluation
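The TOTALS row of Table 16 can be cross-checked from the seven question rows. The Python sketch below assumes that each total is the unweighted mean of the corresponding column; the document does not state how the totals were computed, but this assumption reproduces the reported values to within rounding. The names `ROWS`, `LABELS` and `column_means` are introduced for this example only.

```python
# Percentages per question as reported in Table 16, in the column order
# high disagreement, disagreement, indifferent, agreement, high agreement.
ROWS = {
    1: (0.00, 0.00, 14.29, 71.42, 14.29),
    2: (0.00, 28.57, 28.57, 42.86, 0.00),
    3: (14.29, 14.29, 28.57, 14.29, 28.57),
    4: (0.00, 0.00, 14.29, 71.42, 14.29),
    5: (14.29, 28.57, 57.14, 0.00, 0.00),
    6: (0.00, 0.00, 28.57, 71.42, 0.00),
    7: (0.00, 0.00, 57.14, 42.86, 0.00),
}

LABELS = (
    "High disagreement",
    "Disagreement",
    "Indifferent",
    "Agreement",
    "High agreement",
)


def column_means(rows):
    """Average each answer column over all questions (assumed definition of TOTALS)."""
    count = len(rows)
    width = len(LABELS)
    return tuple(sum(row[i] for row in rows.values()) / count for i in range(width))


if __name__ == "__main__":
    for label, value in zip(LABELS, column_means(ROWS)):
        print(f"{label}: {value:.2f}%")
```

Each row of the table sums to 100%, so the five column means also sum to 100%.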




4.3.2. Correctness and completeness evaluation

This section presents the results of the correctness and completeness evaluation. The biomedical experts generally agreed with the concepts and properties of the model, but they also proposed changes to the definitions of concepts as well as the addition or deletion of concepts, properties and attributes. These changes have been adopted in the GRANATUM Biomedical Semantic Model presented in Section 3 and are described in Table 17.

Change | Description
Add concept “Information Resource” | Add the concept “Information Resource” as an upper class for the concepts that carry any type of information.
Add subclasses of Information Resource: “Semi-structured Knowledge Resource” and “Unstructured Knowledge Resource” | Add the concepts “Semi-structured Knowledge Resource” and “Unstructured Knowledge Resource” as subclasses of Information Resource in order to conceptually separate the remaining concepts. The experts also proposed to make Forum post and Published work subclasses of Semi-structured Knowledge Resource and to add a new concept Image as a subclass of Unstructured Knowledge Resource.
Change Experiment hierarchy | Separate the experiments into 3 types: i) in vivo, ii) in vitro and iii) in silico. The experts also proposed to add BioAssay as a subclass of in vitro Experiment and not as an independent class.
Add concept “Scientific workflow” | Add a new concept “Scientific Workflow” that shows the pipeline of connected components used to perform an in silico experiment.
Add properties to Chemopreventive agent | Add attributes to the Chemopreventive agent: i) cooperateWith other agents and ii) induce differentiation.
Add concept “Toxicity” | Add a concept “Toxicity” that is related to a Chemopreventive agent. Toxicity has properties such as: i) cell type, ii) level of toxicity, iii) species and iv) concentration.
Add Target subclass of Molecule | Make Target a subclass of Molecule, and add a property size to Molecule.
Add subclasses to Target | Add subclasses to the concept Target: i) Reactive Oxygen Species, ii) Lipids and iii) Sugars. The experts also proposed to add new attributes to concepts, e.g. a 3D structure property for Proteins.
Add relation between Chemopreventive agent and Pathway | Add a relation between a Chemopreventive agent and a Pathway, because we may know the Pathway that a Chemopreventive agent affects but not the specific Target.
Add relation between Source and Molecule | Add a relation between the Source of a Chemopreventive agent and a Molecule, since the Source may contain many Molecules.
Add relation between Chemopreventive agent and Disease | Add a relation between the Chemopreventive agent and the Disease in order to show which disease can be prevented by the agent.
Remove relation between Disease and Drug | Remove the relation between the Drug and the Disease that shows the Disease treated by the Drug, because it is out of the scope of the model, which focuses on chemoprevention and not treatment.
Add relation between Drug and Target | Add a relation between a Drug and the Targets it interacts with.

Table 17 Changes based on evaluation




5. CONCLUSIONS AND FUTURE WORK

The present document is Deliverable 1.3 “GRANATUM Biomedical Semantic Model” of the GRANATUM project. It defines, designs and documents the GRANATUM Biomedical Semantic Model, which is one of the pillars on which the GRANATUM approach will build. The model comprises a set of core concepts and their relationships. These core concepts can be further specialized (through sub-concepting) and new concepts can be introduced, thus ensuring the extensibility and adaptability of the ontology to the needs of different cases. The model will contribute to the realization of the GRANATUM vision by carrying the required semantics in WPs 2 to 5. Specifically, the GRANATUM Biomedical Semantic Model will be utilized:

WP2: In the semantic annotation, sharing and inter-connection of globally available web

resources in the Linked Biomedical Data Space.

WP3: In the semantic processing of publications and scientific papers (in online libraries

and digital archives), as well as posts on online communities and social networks in the

Opinion Modelling and Argument Analysis Space.

WP4: In the discovery and retrieval of the semantically-linked cancer chemoprevention

significant online data and web resources in the In Silico Models, Tools and Experiments

Space.

WP5: In the ontology-based mash-up of social networking applications and collaboration

tools in the Social Collaborative Working Space.

The definition of the GRANATUM Biomedical Semantic Model is based on the work carried out in D1.1, where the state of the art was studied and the existing biomedical ontologies and data sets related to cancer chemoprevention were identified. Moreover, a prioritized list of requirements with high and medium priority was selected to be addressed by the GRANATUM Biomedical Semantic Model.

The data sets and ontologies identified in D1.1 were used in the top-down and bottom-up conceptualization of the GRANATUM model. Furthermore, experimental data related to cancer chemoprevention and the high/medium priority requirements identified in D1.1 were analyzed in the conceptualization phase. Afterwards, the conceptual model was formally defined using a standard template that specifies the class hierarchy and the properties used by each class. The OWL implementation uses the classes and properties as they are represented in the formally defined conceptual model.

The next step was the evaluation of the ontology. During the evaluation the accuracy, completeness, usability and simplicity of the model were examined. A questionnaire was circulated to the biomedical experts, who detected problems and inconsistencies and proposed corrective actions. Those proposals were integrated into the GRANATUM Biomedical Semantic Model.

The maintenance of the ontology is a continuous process that will be carried out throughout the GRANATUM project. Whenever needs emerge that are not covered by the existing model, the ontology will be changed accordingly (adding or removing concepts and properties, changing definitions, etc.) to handle them.




REFERENCES

[1] O. Corcho, M. Fernández-López, A. Gómez-Pérez, and A. López, "Building legal ontologies with METHONTOLOGY and WebODE," in Law and the Semantic Web, number 3369 in LNAI: Springer-Verlag, 2005, pp. 142-157.

[2] Z. Li, M. Yang, and K. Ramani, "A methodology for engineering ontology acquisition and validation," Artif. Intell. Eng. Des. Anal. Manuf., vol. 23, pp. 37-51, 2009.

[3] A. Öhgren and K. Sandkuhl, "Towards a methodology for ontology development in small and medium-sized enterprises," in IADIS International Conference on Applied Computing, 2005, pp. 369-376.

[4] D. Shotton, "CiTO, the Citation Typing Ontology," Journal of Biomedical Semantics, vol. 1(Suppl 1):S6, 2010.

[5] U. Bojars, J. G. Breslin, V. Peristeras, G. Tummarello, and S. Decker, "Interlinking the

Social Web with Semantics," IEEE Intelligent Systems, vol. 23, pp. 29-40, 2008.

[6] P. Ciccarese, E. Wu, G. Wong, M. Ocana, J. Kinoshita, A. Ruttenberg, and T. Clark,

"The SWAN biomedical discourse ontology," Journal of Biomedical Informatics, vol. 41,

pp. 739-751, 2008.

[7] M. Brochhausen, A. Spear, C. Cocos, G. Weiler, L. Martín, A. Anguita, H. Stenzhorn, E. Daskalaki, F. Schera, U. Schwarz, S. Sfakianakis, S. Kiefer, M. Dörr, N. Graf, and M. Tsiknakis, "The ACGT Master Ontology and Its Applications - Towards an Ontology-Driven Cancer Research and Management System," Journal of Biomedical Informatics, vol. 44, pp. 8-25, 2011.

[8] G. D. Bader and M. P. Cary, "BioPAX – Biological Pathways Exchange Language," Level 2, Version 1.0 Documentation, http://www.biopax.org/release/biopax-level2-documentation.pdf, 2005.

[9] E. Beißwanger, S. Schulz, H. Stenzhorn, and U. Hahn, "BioTop: An Upper Domain Ontology for the Life Sciences - A Description of its Current Structure, Contents, and Interfaces to OBO Ontologies," Applied Ontology, vol. 3, pp. 205-212, 2008.

[10] C. Crichton, J. Davies, J. Gibbons, S. Harris, A. Tsui, and J. Brenton, "Metadata-Driven

Software for Clinical Trials," Proceedings of the 2009 ICSE Workshop on Software

Engineering in Health Care (SEHC '09), 2009.

[11] J. Malone, E. Holloway, T. Adamusiak, M. Kapushesky, J. Zheng, N. Kolesnikov, A.

Zhukova, A. Brazma, and H. Parkinson, "Modeling Sample Variables with an

Experimental Factor Ontology," Bioinformatics, vol. 26, pp. 1112-1118, 2010.

[12] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, and J. T. Eppig, "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium," Nature Genet., vol. 25, pp. 25-29, 2000.

[13] H. J. Lowe and G. O. Barnett, "Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches," Journal of the American Medical Association (JAMA), vol. 271, pp. 1103-1108, 1994.

[14] C. A. Ball and A. Brazma, "MGED standards: work in progress," OMICS, vol. 10, pp. 138-144, 2006.

[15] N. Sioutos, S. Coronado, M. Haber, F. Hartel, W. Shaiu, and L. Wright, "NCI Thesaurus:

a semantic model integrating cancer-related clinical and molecular information," Journal

of biomedical informatics, vol. 40, pp. 30-43, 2007.

[16] S. de Coronado, M. W. Haber, N. Sioutos, M. S. Tuttle, and L. W. Wright, "NCI

Thesaurus: using science-based terminology to integrate cancer research results,"

Medinfo, vol. 11, pp. 33-37, 2004.





[17] M. Courtot, W. Bug, F. Gibson, A. Lister, J. Malone, D. Schober, R. Brinkman, and A.

Ruttenberg, "The OWL of Biomedical Investigations," in OWLED Workshop on OWL:

Experiences and Directions, collocated with the 7th International Semantic Web

Conference (ISWC-2008) Karlsruhe, Germany, 2008.

[18] S. Liu, M. Wei, R. Moore, V. Ganesan, and S. Nelson, "RxNorm: prescription for

electronic drug information exchange," IT Professional, vol. 7, pp. 17-23, 2005.

[19] J. J. Cimino, T. J. McNamara, T. Meredith, C. A. Broverman, K. C. Eckert, M. Moore, et al., "Evaluation of a proposed method for representing drug terminology," Journal of the American Medical Informatics Association, vol. 6, pp. 47-51, 1999.

[20] S. P. Cohn, "Seventh Annual Report to Congress on the Implementation Of the

Administrative Simplification Provisions of the Health Insurance Portability and

Accountability Act of 1996," 2005.

[21] D. Lindberg, B. Humphreys, and A. McCray, "The Unified Medical Language System," Methods of Information in Medicine, vol. 32, pp. 281-291, 1993.

[22] D. S. Wishart, C. Knox, A. C. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam, and

M. Hassanali, "DrugBank: a knowledgebase for drugs, drug actions and drug targets,"

Nucleic Acids Research, vol. 36, pp. D901-D906, 2008.

[23] K.-I. Goh, M. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási, "The Human

Disease Network," Proc Natl Acad Sci USA, vol. 104, pp. 8685-8690, 2007.

[24] K. Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D.

Telikicherla, R. Raju, B. Shafreen, A. Venugopal, L. Balakrishnan, A. Marimuthu, S.

Banerjee, D. Somanathan, A. Sebastian, S. Rani, S. R. S, K. Harrys, S. Kanth, M.

Ahmed, M. Kashyap, R. Mohmood, Y. Ramachandra, V. Krishna, B. Rahiman, S.

Mohan, P. Ranganathan, S. Ramabadran, R. Chaerkady, and A. Pandey, "Human Protein Reference Database," Nucleic Acids Research, vol. 37, pp. D767-D772, 2009.

[25] P. Romero, J. Wagg, M. L. Green, D. Kaiser, M. Krummenacker, and P. D. Karp,

"Computational prediction of human metabolic pathways from the complete human

genome," Genome Biology, vol. 6, pp. 1-17, 2004.

[26] M. Bundschus, A. Bauer-Mehren, V. Tresp, L. Furlong, and H.-P. Kriegel, "Digging for Knowledge with Information Extraction: A Case Study on Human Gene-Disease Associations," in 19th ACM International Conference on Information and Knowledge Management (CIKM 2010), 2010.

[27] R. Caspi, H. Foerster, C. A. Fulcher, P. Kaipa, M. Krummenacker, M. Latendresse, S. Paley, S. Y. Rhee, A. G. Shearer, C. Tissier, T. C. Walk, P. Zhang, and P. D. Karp, "The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases," Nucleic Acids Res., vol. 36(Database issue), pp. D623-D631, 2008.

[28] A. Ceol, A. A. Chatr, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli, and G. Cesareni, "MINT, the molecular interaction database," Nucleic Acids Res., vol. 38(Database issue), pp. D532-D539, 2010.

[29] D. Corpet and S. Tache, "Most effective colon cancer chemopreventive agents in rats: a

systematic review of aberrant crypt foci and tumor data, ranked by potency," Nutrition

and Cancer, vol. 43, pp. 1-21, 2002.

[30] A. Pico, T. Kelder, M. v. Iersel, K. Hanspers, B. Conklin, and C. Evelo, "WikiPathways:

Pathway Editing for the People," PLoS Biol, doi:10.1371/journal.pbio.0060184, vol. 6,

2008.




[31] R. G. Mehta, R. Naithani, L. Huma, M. Hawthorne, R. M. Moriarty, D. L. McCormick, V. E. Steele, and L. Kopelovich, "Efficacy of Chemopreventive Agents in Mouse Mammary Gland Organ Culture (MMOC) Model: A Comprehensive Review," Current Medicinal Chemistry, vol. 15, pp. 2785-2825, 2008.

[32] C. Gerhäuser, K. Klimo, E. Heiss, I. Neumann, A. Gamal-Eldeen, J. Knauft, G.-Y. Liu, S. Sitthimonchai, and N. Frank, "Mechanism-based in vitro screening of potential cancer chemopreventive agents," Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, vol. 523-524, pp. 163-172, 2003.

[33] J. Brank, M. Grobelnik, and D. Mladenić, "A survey of ontology evaluation techniques,"

in Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005),

2005.

[34] M. B. Almeida, "A proposal to evaluate ontology content," Applied Ontology, vol. 4, pp.

245–265, 2009.

[35] G. Maiga and D. Williams, "A Flexible Approach for User Evaluation of Biomedical

Ontologies," International Journal of Computing and ICT Research, vol. 2, 2008.

[36] A. Maedche and S. Staab, "Measuring similarity between ontologies," in 13th European Conference on Knowledge Acquisition and Management (EKAW 2002), Madrid, Spain, 2002.

[37] J. Brank, M. Grobelnik, and D. Mladenić, "Gold standard based ontology evaluation using instance assignment," in 4th International Workshop on Evaluation of Ontologies for the Web (EON 2006) at the 15th International World Wide Web Conference (WWW 2006), Edinburgh, UK, 2006.

[38] R. Porzel and R. Malaka, "A task-based approach for ontology evaluation," in Workshop on Ontology Learning and Population at the 16th European Conference on Artificial Intelligence (ECAI), Valencia, Spain, 2004.

[39] Y. Kalfoglou and B. Hu, "Issues with evaluating and using publicly available ontologies," in 4th International Workshop on Evaluation of Ontologies for the Web (EON 2006) at the 15th International World Wide Web Conference, Edinburgh, UK, 2006.

[40] C. Patel, K. Supekar, L. Yugyung, and E. K. Park, "OntoKhoj: a semantic web portal for ontology searching, ranking and classification," in Proceedings of the 5th ACM International Workshop on Web Information and Data Management, 2003, pp. 58-61.

[41] C. Brewster, H. Alani, S. Dasmahapatra, and Y. Wilk, "Data driven ontology evaluation," in International Conference on Language Resources and Evaluation, Lisbon, Portugal, 2004.

[42] A. Lozano-Tello and A. Gómez-Pérez, "ONTOMETRIC: A method to choose the

appropriate ontology," Journal of Database Management, vol. 15, pp. 1–18, 2004.

[43] A. Gómez-Pérez, "Ontology evaluation," in Handbook on Ontologies, S. Staab and R.

Studer, Eds. Berlin: Springer-Verlag, 2004, pp. 251–274.

[44] J. Brooke, "SUS: A “quick and dirty” usability scale," in Usability evaluation in industry, P. W. Jordan, B. Thomas, B. A. Weerdmeester, and I. L. McClelland, Eds. London: Taylor & Francis, 1996, pp. 189-194.

[45] C. Nuria, "Ontology Evaluation through Usability Measures," in Proceedings of OTM Workshops'2009, Vilamoura, Portugal, 2009, pp. 594-603.




APPENDIX

QUESTIONNAIRE FOR THE EVALUATION OF THE GRANATUM BIOMEDICAL

SEMANTIC MODEL

Based on the GRANATUM Biomedical Semantic Model depicted in the following image, answer the questions of the questionnaire.




COMPLETENESS / USABILITY OF THE MODEL

N. Rate the following statements (1: high disagreement, 5: high agreement): 1 2 3 4 5

1. I think that I could contribute to this ontology
2. I find the ontology easy to understand
3. I think that I would need further theoretical support to be able to understand this ontology
4. I found the various concepts in this model were well integrated
5. I would imagine that most biomedical experts would understand this ontology very quickly
6. I am confident I understand the conceptualization of the ontology
7. The concepts/properties of the ontology cover the needs of the Cancer Chemoprevention domain.

If the concepts/properties of the ontology do not cover the needs of the Cancer

Chemoprevention domain, describe the missing concepts/properties:

...........................................................................................................................................................

CONCEPT DEFINITION QUESTIONS

N. Do you agree with the definition of: (Yes / No; if no, explain or propose a correction, or enter a reference for a definition)

1. Virtual Screening
Virtual Screening refers to the technique of performing a Biomedical experiment entirely in a computer via computer simulation.

2. Chemopreventive Agent
A Chemopreventive agent is a Molecule that has shown some evidence that it may be able to prevent or delay the development of a specific Disease (e.g. cancer) by interfering with a Biological target. For example, cancer chemopreventive agents are used to inhibit, delay, or reverse carcinogenesis. A Chemopreventive agent can be tested in an Experiment and can be contained in a Natural source or a Drug. References to a Chemopreventive agent can be found in a Publication.

3. Natural Source
A Natural Source is a substance found in nature that usually has a pharmacological or biological activity. A Chemopreventive agent may be contained in a Natural Source.

4. Biological Target
A biological target is a biopolymer such as a protein or nucleic acid whose activity can be modified by an external stimulus. A Chemopreventive agent has a Biological target in order to "hit" it and change its behavior in order to prevent a disease.

QUESTIONS FOR RELATIONS BETWEEN CONCEPTS

Do you agree with the following statements? (Yes / No; if no, explain or propose a correction)

1. An Experiment uses a set of Experimental factors
2. An Experiment follows a Protocol
3. A Protocol can be referenced in a Publication
4. A Clinical trial is an Experiment
5. A Virtual Screening is an Experiment
6. An Experiment can be described in a Publication
7. A BioAssay is part of an Experiment
8. A BioAssay uses a Chemopreventive Agent
9. A Chemopreventive Agent is a Molecule
10. A Chemopreventive Agent can be referenced in a Publication
11. A Chemopreventive Agent can be found in a Source
12. A Source is a Molecule
13. A Drug can be a source of a Chemopreventive Agent
14. A Natural Source can be a source of a Chemopreventive Agent
15. A Chemopreventive Agent has a Biological target
16. A Nucleic Acid can be the Biological target of a Chemopreventive Agent
17. A Protein can be the Biological target of a Chemopreventive Agent
18. A Biological target can be part of a Pathway
19. A Pathway can be related to a Disease
20. A Disease can be treated by a Drug
