D1.3 GRANATUM Biomedical Semantic Model · German Cancer Research Centre (DKFZ) Clarissa Gerhäuser...
FP7‐ICT‐2009‐6
Project Partners: FIT (DE), NUIG‐DERI (IE), CYBION (IT), CERTH (GR), UCY/CBC (CY), UCY/CS
(CY), DKFZ (DE), UBITECH (GR)
Every effort has been made to ensure that all statements and information contained herein are
accurate, however the Partners accept no liability for any error or omission in the same.
© Copyright in this document remains vested in the Project Partners.
Project Number 270139
D1.3 – GRANATUM Biomedical Semantic Model
Version 0.7
1 February 2012 Draft
EC Distribution
CERTH with input from NUIG-DERI, UCY/CBC, DKFZ and UBITECH
D1.3 – GRANATUM Biomedical Semantic Model
31 January 2012 Version 1.0 Page ii
Confidentiality: EC Distribution
PROJECT PARTNER CONTACT INFORMATION
Centre of Research and Technology
Hellas (CERTH)
Prof. Konstantinos Tarabanis
6th Klm. Charilaou - Thermi Road
P.O. BOX 60361 GR - 570 01
Thermi, Thessaloniki - Greece
Tel.: +30 231 08 91 578
Fax : +30 231 08 91 509
E-mail: [email protected]
Fraunhofer FIT (FIT)
Wolfgang Prinz
Schloss Birlinghoven
53754 Sankt Augustin, Germany
Tel.: +49 2241 142730
Fax : +49 2241 142080
E-mail: wolfgang,[email protected]
University of Cyprus –
Cancer Biology and Chemoprevention
Laboratory, Department of Biological
Sciences (UCY/CBC)
Christiana Neophytou
Panepistimioupolis
1678 Lefkosia – Cyprus
Tel.: +357 22892725
Fax : + 35 22892881
E-mail: [email protected]
CYBION
Via della Scrofa, 117
00186 Roma - Italia
Tel.: +39 6 68 65 975
Fax : +39 6 68 80 69 97
E-mail:
GIOUMPITEK - Meleti Schediasmos
Ylopoiisi kai Polisi Ergon Pliroforikis
EPE (UBITECH)
Thanassis Bouras
Mesogeion Avenue 429 & Chalandriou 3,
15343 Agia Paraskevi, Athens, Greece
Tel.: +30 211 7005570
Fax : +30 211 7005571
E-mail: [email protected]
University of Cyprus - Department of
Computer Sciences (UCY/CS)
Christos Kannas
P.O. Box 20537
1678 Nicosia, CYPRUS
Tel.: + 357 99530608
Fax : + 357 22892701
E-mail: [email protected]
German Cancer Research Centre
(DKFZ)
Clarissa Gerhäuser
Im Neuenheimer Feld 280
69120 Heidelberg - Germany
Tel.: +49 6221 42 3306
Fax : +49 6221 42 3359
E-mail:
National University of Ireland, Galway -
Digital Enterprise Research Institute
(NUIG-DERI)
Helena F. Deus
IDA business park, Lower Dangan
Galway, Ireland
Tel.: +353 91495270
Fax : +353 91 495541
Email: [email protected]
CONTRIBUTORS
Partner Name | Short Name | Nationality
Fraunhofer Gesellschaft - Institut für Angewandte Informationstechnik | FIT | DE
National University of Ireland, Galway (NUI, Galway) - Digital Enterprise Research Institute (DERI) | NUIG-DERI | IE
CYBION Srl. | CYBION | IT
Centre of Research and Technology Hellas | CERTH | GR
University of Cyprus - Cancer Biology and Chemoprevention Laboratory, Department of Biological Sciences | UCY/CBC | CY
University of Cyprus - Department of Computer Sciences | UCY/CS | CY
German Cancer Research Centre (Deutsches Krebsforschungszentrum) | DKFZ | DE
GIOUMPITEK - Meleti Schediasmos Ylopoiisi kai Polisi Ergon Pliroforikis EPE | UBITECH | GR
DOCUMENT CONTROL
Version | Status | Date
0.1 | Changes to all sections | 1 October 2011
0.2 | Changes to all sections | 25 October 2011
0.3 | Bottom-up and top-down construction of the model | 19 December 2011
0.4 | Model specification | 21 December 2011
0.5 | Model evaluation questionnaire | 12 January 2012
0.6 | Integration of evaluation results | 25 January 2012
0.7 | Integrate comments from trial partners related to the model | 1 February 2012
TABLE OF CONTENTS
EXECUTIVE SUMMARY
1. INTRODUCTION
1.1. DOCUMENT SCOPE
1.2. MOTIVATION
1.3. METHODOLOGY
1.4. DOCUMENT STRUCTURE
2. DESIGN PROCESS
2.1. SPECIFICATION
2.2. CONCEPTUALIZATION
2.2.1. Top-Down Conceptualization
2.2.1.1. Existing Ontologies/models
2.2.1.1.1. Literature representation Ontologies
2.2.1.1.2. Biomedical Ontologies
2.2.1.2. Top-Down Concept identification
2.2.1.2.1. Literature Domain
2.2.1.2.2. Experiment Domain
2.2.1.2.3. Biomedical Domain
2.2.2. Bottom up construction of the GRANATUM model
2.2.2.1. Analysis of Publicly available datasets
2.2.2.1.1. Existing datasets accessed through SPARQL endpoints
2.2.2.1.2. Existing datasets accessed through searching
2.2.2.1.3. Concept/attribute identification from Existing datasets
2.2.2.2. Experimental data analysis
2.2.2.3. Requirements analysis
3. GRANATUM BIOMEDICAL SEMANTIC MODEL FORMALIZATION
3.1. OVERVIEW
3.2. MODEL SPECIFICATION
3.3. MODEL IMPLEMENTATION
4. MODEL EVALUATION
4.1. EVALUATION CRITERIA
4.2. EVALUATION METHODOLOGY
4.3. EVALUATION RESULTS
4.3.1. Usability evaluation
4.3.2. Correctness and completeness evaluation
5. CONCLUSIONS AND FUTURE WORK
REFERENCES
APPENDIX
TABLE OF FIGURES
Figure 1 GRANATUM Vision for Bridging Biomedical Researchers’ Knowledge and Information Gap
Figure 2 Methodology for building the GRANATUM Biomedical Semantic Model
Figure 3 GRANATUM Biomedical Semantic Model scope
Figure 4 Bottom-up and Top-down conceptualization of the GRANATUM Biomedical Semantic Model
Figure 5 Conceptualization of the first experimental data set
Figure 6 Conceptualization of the second experimental data set
Figure 7 Overview of the GRANATUM Biomedical Semantic Model
TABLE OF TABLES
Table 1 Functional requirements that require the use of Ontology
Table 2 Concepts identified from the analysis of the non-functional requirements
Table 3 Concepts identified in the literature domain
Table 4 Properties identified in the literature domain
Table 5 Concepts identified in the experiment domain
Table 6 Properties identified in the experiment domain
Table 7 Concepts identified in the biomedical domain
Table 8 Properties identified in the biomedical domain
Table 9 Concepts detected into dataset (accessed through SPARQL endpoints or through searching)
Table 10 Concepts derive from experimental data analysis
Table 11 Properties derive from experimental data analysis
Table 12 Concepts derived from the Use Cases and the Questionnaires
Table 13 Ontology evaluation criteria
Table 14 Methodologies for ontology evaluation
Table 15 Evaluation criteria for each evaluation methodology
Table 16 Usability evaluation
Table 17 Changes based on evaluation
EXECUTIVE SUMMARY
The present document is Deliverable 1.3 “GRANATUM Biomedical Semantic Model”
(henceforth referred to as D1.3) of the GRANATUM project. The main objective of this
document is to define, design and document the GRANATUM Biomedical Semantic Model,
which is one of the pillars on which the GRANATUM approach will build. The GRANATUM
Biomedical Semantic Model will comprise a set of core concepts and their relationships. It
will be possible to further specialize these core concepts or introduce new concepts, thus
ensuring the extensibility and adaptability of the ontology to the needs of different cases. The
model will contribute to the realization of the GRANATUM vision by carrying the required
semantics in WPs 2 to 6.
The methodology that has been followed for the creation of GRANATUM Biomedical Semantic
Model is based on a combination of known methodologies for ontology creation. The following
steps have been carried out for the creation of the ontology: (i) Specification of the ontology’s
scope, uses, end-users and the granularity of the concepts that should be taken into account; (ii)
Identification of the concepts and relations of the ontology using a “meet-in-the-middle”
approach that combines a bottom-up and a top-down methodology, the output of which is a
conceptual model of the ontology; (iii) Formalization of the conceptual model using a standard
template; (iv) Implementation of the ontology using a standardized language such as OWL; (v)
Evaluation of the accuracy and completeness of the produced ontology; and (vi) Continuous
maintenance of the ontology to improve it.
The main step of the methodology was the Conceptualization, in which the concepts and
properties of the ontology are identified. On the one hand, the concepts and properties emerged
in a bottom-up fashion by: (i) analyzing the top and medium priority requirements identified in
D1.1, (ii) analyzing existing data sets (e.g. databases and SPARQL endpoints) and (iii) analyzing
experimental data related to cancer chemoprevention provided by the trial partners. On
the other hand, concepts were elicited following a top-down approach by analyzing existing
ontologies (e.g. Gene Ontology, Experimental Factor Ontology, National Cancer Institute
Thesaurus) and identifying the concepts and relations relevant to the GRANATUM Biomedical
Semantic Model. The concepts and properties from both approaches (bottom-up and top-down)
were merged to create the conceptual model of the ontology.
Afterwards, the conceptual model was formally defined using a standard template that specifies
the class hierarchy and the properties used by each class. The OWL implementation uses the
classes and properties as they are represented by the formally defined conceptual model.
The next step was the evaluation of the ontology, which examined its accuracy and
completeness. The trial partners (i.e. DKFZ, IITRI and UCY/CBC) were actively involved in the
evaluation process in order to identify inconsistencies and weaknesses of the ontology.
Finally, the Maintenance of the ontology is a continuous process that will be carried out
throughout the GRANATUM project: needs that emerge will be detected and the ontology will be
altered accordingly to handle them.
1. INTRODUCTION
The aim of this section is to present the background of the work pursued during Task 1.3. The
scope and the main objectives which have guided this work are introduced in section 1.1. Section
1.2 presents the motivation for creating a semantic model. The methodology followed is
described in section 1.3. Last, section 1.4 presents the organization of the current deliverable.
1.1. DOCUMENT SCOPE
The present document is Deliverable 1.3 “D1.3 – GRANATUM Biomedical Semantic Model”
(henceforth referred to as D1.3) of the GRANATUM project. The main objective of this
document is to define and document the outcome of Task 1.3, providing the specification of the
GRANATUM common reference ontological model for describing, sharing and linking Web
resources significant to cancer chemoprevention. In order to capture the semantics of the
biomedical domain, the GRANATUM platform utilizes a lightweight ontological model, called the
GRANATUM Biomedical Semantic Model. The creation of the GRANATUM Biomedical
Semantic Model follows a methodology (Section 1.3) based on a set of existing methodologies
for defining ontologies.
The GRANATUM Biomedical Semantic Model constitutes one of the pillars of the
GRANATUM platform that will fulfil the GRANATUM vision.
The vision of the GRANATUM project is to bridge the information, knowledge and collaboration gap
among biomedical researchers in Europe and beyond, ensuring that the biomedical scientific community
has homogenized access to the globally available information and data resources needed to perform
complex cancer chemoprevention experiments and conduct studies on large scale datasets (Figure 1). In
this way, GRANATUM will facilitate the social sharing and collective analysis of biomedical experts’
knowledge and experience, as well as the joint conceptualization and design of scalable chemoprevention
models and simulators, towards the enablement of collaborative biomedical research activities beyond
geographical barriers, helping researchers in this highly multidisciplinary field to manage the complex
range of tasks involved in carrying out collaborative research.
The GRANATUM consortium
Figure 1 GRANATUM Vision for Bridging Biomedical Researchers’ Knowledge and Information Gap
1.2. MOTIVATION
In order to capture the semantics of the biomedical domain, the GRANATUM project will
develop a lightweight ontological model, called the GRANATUM Biomedical Semantic Model,
which will be one of the pillars on which the GRANATUM approach will build. The
GRANATUM Biomedical Semantic Model will comprise a set of core concepts and their
relationships. It will be possible to further specialize these core concepts (through
sub-concepting) or introduce new concepts, thus ensuring the extensibility and adaptability of the
ontology to the needs of different cases.
The need for a Biomedical Semantic Model formalized as an ontology derives from the
functional and non-functional requirements presented in the Requirement Analysis in D1.1.
Specifically, Table 1 presents the functional requirements identified and indicates, for each
requirement, whether its fulfilment presupposes the use of a model/ontology. Similarly,
Table 2 presents the non-functional requirements and indicates, for each requirement, whether
its fulfilment presupposes the use of an ontology. 15 of the 20 functional
requirements require an ontology in order to be satisfied, while only 2 non-functional
requirements, both related to interoperability, require the use of an ontology.
Requirement | Use of ontology
F1. Search and access information (including publications, pathways, epigenomics, genes, proteins, agents, drugs, clinical trials, etc.) derived from biomedical databases/libraries | Yes
F2. Customize search based on advanced criteria e.g. include/exclude a database or an attribute | Yes
F3. Integrate/combine information derived from different biomedical databases/libraries | Yes
F4. Build/edit a hypothesis scenario | Yes
F5. Support in silico discovery | Yes
F6. Support the collaboration with distributed partners/groups and the sharing of data/opinions/expertise | Yes
F7. Manage data and resources | Yes
F8. Manage teams and assign roles | -
F9. Push data into different tools (visualization, statistical, data analysis etc.) | Yes
F10. Support knowledge extraction from scientific publications | Yes
F11. Add feedback and comments on the quality of data | Yes
F12. Advise on conflicting data | Yes
F13. Support the preparation of a grant proposal | -
F14. Support user profiles in the GRANATUM platform | -
F15. Manage the personalized space in the GRANATUM platform | -
F16. Receive and view information from multiple feeds | Yes
F17. Support data query workflows (send the results of one database query into the next database query) | Yes
F18. Present the searching results and the incoming feeds adapted to user’s profile and preference criteria | Yes
F19. Manage IPRs | -
F20. Recommend relevant information based on domain ontology, user profile and current activities | Yes
Table 1 Functional requirements that require the use of Ontology
Requirement Type | Requirement | Use of ontology
Functionality (Security) | NF1. Support data security | -
Functionality (Security) | NF2. Support user authentication, authorization and role-based access control sign-in | -
Functionality (Security) | NF3. Support data privacy, confidentiality | -
Functionality (Security) | NF4. Support security proof | -
Functionality (Interoperability) | NF5. Support interoperability between different formats (compatible with many standards, data transformations) | Yes
Functionality (Interoperability) | NF6. Support commonly used standards, data models, widely available tools, standard syntax, open API, technologies, methodologies, and best practices | Yes
Reliability | NF7. Support Reliability | -
Usability (Operability) | NF8. Provide a presentation interface customized to the user’s profile/role and interests (my projects, my team, my data, my deadlines etc.) | -
Usability (Operability) | NF9. Provide a unified interface with many tools and functionalities bundled together | -
Usability (Understandability) | NF10. Easy-to-use interface (filtering, navigation, etc.) | -
Usability (Understandability) | NF11. Support an understandable and intuitive interface | -
Usability (Understandability) | NF12. Support an intelligent interface | -
Efficiency | NF13. Perform analytic tasks (in-silico discovery) through vast amount of data in acceptable times | -
Efficiency | NF14. Be accessible by multiple users at the same time (multi-tenant) | -
Efficiency | NF15. Support efficiency | -
Availability | NF16. Be accessible and available (at acceptable service levels). | -
Table 2 Concepts identified from the analysis of the non-functional requirements
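The counts quoted in this section (15 of 20 functional and 2 of 16 non-functional requirements) can be re-derived from the dash marks in Tables 1 and 2. The following Python sketch is only an illustration: the helper names `build_table` and `needing_ontology` are hypothetical, and the assumption is that every requirement without a dash in the tables needs the ontology.

```python
from typing import Dict, List, Set


def build_table(prefix: str, total: int, no_ontology: Set[str]) -> Dict[str, bool]:
    """Map each requirement ID to True when it is assumed to need the ontology."""
    ids = [f"{prefix}{i}" for i in range(1, total + 1)]
    return {req_id: req_id not in no_ontology for req_id in ids}


def needing_ontology(table: Dict[str, bool]) -> List[str]:
    """Return the IDs that need the ontology, ordered by their numeric part."""
    def number(req_id: str) -> int:
        return int("".join(ch for ch in req_id if ch.isdigit()))
    return sorted((r for r, needs in table.items() if needs), key=number)


# Table 1: the rows carrying a dash are F8, F13, F14, F15 and F19.
functional = build_table("F", 20, {"F8", "F13", "F14", "F15", "F19"})

# Table 2: every row carries a dash except NF5 and NF6 (interoperability).
all_nf = {f"NF{i}" for i in range(1, 17)}
non_functional = build_table("NF", 16, all_nf - {"NF5", "NF6"})

if __name__ == "__main__":
    print(len(needing_ontology(functional)), "of", len(functional))
    print(needing_ontology(non_functional))
```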
1.3. METHODOLOGY
The methodology that has been followed for the definition of the GRANATUM Biomedical
Semantic Model is based on a set of methodologies for defining ontologies:
- METHONTOLOGY [1] is a methodology, created in the Artificial Intelligence Lab of
the Technical University of Madrid (UPM), for building ontologies either from scratch,
by reusing other ontologies as they are, or through a reengineering process. The ontology
development process identifies which tasks should be performed when building
ontologies: Specification, Conceptualization, Formalization, Integration, Implementation,
and Maintenance. The main phase in the ontology development process using the
METHONTOLOGY approach is the conceptualization phase. METHONTOLOGY has
been proposed for ontology construction by the Foundation for Intelligent Physical
Agents (FIPA) (footnote 1).
- Li et al. [2] propose a method and process to acquire and validate ontologies. The main
contributions include a new, systematic, and structured ontology development method
assisted by a semiautomatic acquisition tool. The development method defines the
following steps for ontology development: Specification, Acquisition, Formalization,
Population, Validation and Maintenance.
- Öhgren et al. [3] propose a methodology for ontology development. The core ideas of
their enhanced methodology are (a) reuse of fragments of existing ontologies, (b)
instruction-like detailed definition of all steps of the development process, and (c)
extensive use of guidelines and other aids. The phases of the methodology are
Requirements Analysis, Building, Implementation, and Evaluation & Maintenance.
Figure 2 Methodology for building the GRANATUM Biomedical Semantic Model (steps shown: Specification, Conceptualization, Formalization, Implementation, Evaluation, Maintenance)
An overview of the adopted methodology is shown in Figure 2. More specifically, the steps that
were followed are listed below:
1. Specification. This activity states why the ontology is built, what its intended uses are
and who the end-users are, as well as the level of granularity of the concepts that should
be taken into account.
2. Conceptualization. This activity identifies the concepts and relations of the ontology.
The conceptualization of the GRANATUM Model follows a “meet-in-the-middle”
approach: on the one hand, the concepts emerge in a bottom-up fashion by analyzing the
domain; on the other hand, a top-down approach analyzes existing
ontologies and models. The result of the conceptualization activity is the ontology
conceptual model.
Footnote 1: http://www.fipa.org/specs/fipa00086/
3. Formalization. This activity transforms the conceptual model into a formal or
semi-computable model. For the formalization of the GRANATUM Biomedical Semantic
Model, a standard template is used to formally define the concepts of the model and
their relationships.
4. Implementation. This activity builds computable models in an ontology language. The
ontology language selected for the implementation of the GRANATUM Biomedical
Semantic Model is OWL.
5. Evaluation. This activity validates the accuracy and completeness of the produced
ontology.
6. Maintenance. This activity updates and corrects the ontology if needed. If corrections
are needed, step 2 (Conceptualization) is performed again.
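To make the hand-over from Formalization (step 3) to Implementation (step 4) more concrete, the Python sketch below renders a template-style concept description as OWL classes in Turtle syntax. It is an illustration under stated assumptions, not the project's actual tooling: the `ConceptTemplate` fields, the `http://example.org/granatum#` namespace, and the concept and property names (`Agent`, `ChemopreventiveAgent`, `hasName`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical namespace; the real ontology IRI is not given in this section.
BASE = "http://example.org/granatum#"


@dataclass
class ConceptTemplate:
    """One row of an assumed formalization template: a class, its parent, its properties."""
    name: str
    parent: Optional[str] = None
    properties: List[str] = field(default_factory=list)


def to_turtle(concepts: List[ConceptTemplate]) -> str:
    """Render the templates as OWL class and datatype-property declarations."""
    lines = [
        f"@prefix : <{BASE}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for concept in concepts:
        if concept.parent:
            lines.append(f":{concept.name} a owl:Class ;")
            lines.append(f"    rdfs:subClassOf :{concept.parent} .")
        else:
            lines.append(f":{concept.name} a owl:Class .")
        for prop in concept.properties:
            lines.append(f":{prop} a owl:DatatypeProperty ;")
            lines.append(f"    rdfs:domain :{concept.name} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative concepts only; they are not taken from the model specification.
    example = [
        ConceptTemplate("Agent"),
        ConceptTemplate("ChemopreventiveAgent", parent="Agent", properties=["hasName"]),
    ]
    print(to_turtle(example))
```

If the Evaluation or Maintenance step reveals a problem, only the template rows would need to change and the Turtle output could be regenerated, which mirrors the return to step 2 described above.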
1.4. DOCUMENT STRUCTURE
The remainder of D1.3 is divided into four sections. Section 2 describes the motivation and
scope of the Biomedical Semantic Model, as well as the design process followed, according to
the proposed methodology, to create the ontology. Section 3 formally documents the Biomedical
Semantic Model by defining the classes and their properties. Section 4 presents the evaluation
of the model. Finally, Section 5 draws some conclusions and briefly outlines directions for the
future development of the ontology.
2. DESIGN PROCESS
2.1. SPECIFICATION
The GRANATUM Biomedical Semantic Model is designed to serve as a common reference
model for the semantic annotation, sharing and interconnection of globally available biomedical
resources, including Electronic Health Records, digital libraries and archives, online
communities and discussions. It facilitates the delivery of machine-interpretable information
regarding their structure and content, and supports the on-demand discovery of published data
significant to cancer chemoprevention. This biomedical semantic model will be utilized:
- In the semantic annotation, sharing and inter-connection of globally available web
resources (in the Linked Biomedical Data Space);
- In the semantic processing of publications and scientific papers (in online libraries and
digital archives), as well as posts on online communities and social networks (in the
Opinion Modelling and Argument Analysis Space);
- In the ontology-based mash-up of social networking applications and collaboration tools
(in the Social Collaborative Working Space), and
- In the discovery and retrieval of the semantically-linked online data and web resources
significant to cancer chemoprevention (in the In Silico Models, Tools and
Experiments Space).
The GRANATUM Biomedical Semantic Model constitutes a cancer chemoprevention
ontological reference model, relying on widely known and widely adopted biomedical guidelines,
standards and controlled vocabularies. As this lightweight semantic model is going to be utilized
in the creation of a knowledge base of semantically interconnected distributed biomedical data
and resources across the Web, the structure of the GRANATUM Biomedical Semantic Model
will comprise:
- A set of elements for the representation of literature and scientific discourse in online
communities at different levels of granularity;
- A set of concepts from the biomedical domain that facilitate the representation of cancer
chemoprevention related data and resources;
- A set of concepts from the biomedical domain that facilitate the representation of
experimental data, procedures and protocols;
- A set of concepts from the biomedical domain that facilitate the representation of data
related to in-silico modeling.
Figure 3 GRANATUM Biomedical Semantic Model scope
2.2. CONCEPTUALIZATION
The identification of the concepts and relations (Conceptualization) of the GRANATUM
Biomedical Semantic Model follows a “meet-in-the-middle” approach. On the one hand, the
model emerges in a bottom-up fashion by analyzing publicly available datasets (i.e.
databases, SPARQL endpoints), experimental data, user requirements collected in deliverable
D1.1 and literature/functionality suggested by the non-technical partners. On the other hand, the
model follows a top-down approach: existing models and ontologies from the biomedical
domain, collected during the requirements analysis step, are modularized in order to retrieve
the concepts and relationships relevant to GRANATUM. Specifically, in order to define the
GRANATUM Biomedical Semantic Model the following steps have been carried out (Figure 4):
i. Identify the concepts/relations using a Top-Down approach (Section 2.2.1);
a. Analyze existing models and ontologies;
ii. Identify the concepts/relations using a Bottom-Up approach (Section 2.2.2);
a. Analyze existing data sets (databases, SPARQL endpoints);
b. Analyze user requirements;
c. Analyze experimental data;
iii. Merge concepts/relations from (i) and (ii) and define the Biomedical Semantic Model.
The concepts/properties may derive from one of the two approaches (bottom-up, top-down) or
from both. In the GRANATUM Biomedical Semantic Model we include all the concepts
detected, regardless of the identification approach.
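The merge in step (iii) can be pictured as a union of the two concept sets that also records where each concept came from. The Python sketch below is a minimal illustration, not the procedure actually used in the project: the matching rule in `normalize` (case-insensitive, whitespace-collapsed names) is an assumption, and the concept names are illustrative examples only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def normalize(name: str) -> str:
    """Assumed matching rule: compare names case-insensitively, collapsing whitespace."""
    return " ".join(name.lower().split())


def merge_concepts(top_down: Iterable[str], bottom_up: Iterable[str]) -> Dict[str, Set[str]]:
    """Union both concept lists and record which approach(es) produced each concept."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for concept in top_down:
        merged[normalize(concept)].add("top-down")
    for concept in bottom_up:
        merged[normalize(concept)].add("bottom-up")
    return dict(merged)


if __name__ == "__main__":
    # Illustrative inputs; the identified concepts are listed in later tables.
    merged = merge_concepts(
        top_down=["Gene", "Protein", "Pathway"],
        bottom_up=["gene", "Drug", "Clinical Trial"],
    )
    for concept, sources in sorted(merged.items()):
        print(concept, sorted(sources))
```

Every concept is kept regardless of its source, in line with the rule stated above; the provenance set only makes it visible whether a concept was found by one approach or by both.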
2.2.1. Top-Down Conceptualization
Existing models and ontologies from the biomedical domain that are relevant to GRANATUM
were primarily collected during the review of the state of the art in D1.1. These ontologies have
been analyzed and modularized in order to retrieve the concepts and relationships relevant to the
GRANATUM scope (Figure 3), i.e. relevant to in-silico modeling, cancer chemoprevention,
scientific research and experimental data/protocols. The rest of the section first presents the
ontologies/models that were analyzed (Section 2.2.1.1) and then the concepts and
relations identified (Section 2.2.1.2).
Figure 4 Bottom-up and Top-down conceptualization of the GRANATUM Biomedical Semantic Model (top-down input: existing ontologies/models; bottom-up input: available data sets, experimental data, user requirements)
2.2.1.1. Existing Ontologies/models
This section presents the models and ontologies relevant to the GRANATUM scope that were
primarily collected during the review of the state of the art in D1.1.
2.2.1.1.1. Literature representation Ontologies
In this section we present ontologies that can represent concepts related to the scientific literature
and discourse, such as bibliographic records, citations, references, authors etc.
BiRO2
The Bibliographic Reference Ontology (BiRO) is an ontology for describing bibliographic
records and references, and their compilation into bibliographic collections and reference lists. It
can be used as a citation ontology, as a document classification ontology, or simply as a way to
describe any kind of document. It has been inspired by many existing document description
metadata formats, and can be used as a common ground for converting other bibliographic data
sources. It forms part of SPAR, a suite of Semantic Publishing and Referencing Ontologies.
CiTO3
The Citation Typing Ontology (CiTO) [4] is an ontology for describing the nature of reference
citations in scientific research articles and other scholarly works, both to other such publications
and also to Web information resources, and for publishing these descriptions on the Semantic
Web. Citations are described in terms of the factual and rhetorical relationships between citing
publication and cited publication, the in-text and global citation frequencies of each cited work,
2 http://purl.org/spar/biro
3 http://purl.org/spar/cito
and the nature of the cited work itself, including its publication and peer review status. It forms
part of SPAR, a suite of Semantic Publishing and Referencing Ontologies.
FaBiO4
The FRBR-aligned Bibliographic Ontology (FaBiO) is an ontology for recording and publishing
on the Semantic Web descriptions of entities that are published or potentially publishable, and
that contain or are referred to by bibliographic references, or entities used to define such
bibliographic references. FaBiO entities are primarily textual publications such as books,
magazines, newspapers and journals, and items of their content such as poems and journal
articles. However, they also include datasets, computer algorithms, experimental protocols,
formal specifications and vocabularies, legal records, governmental papers, technical and
commercial reports and similar publications, and also bibliographies, reference lists, library
catalogues and similar collections.
SIOC5
The Semantically-Interlinked Online Communities (SIOC) [5] Core Ontology provides the main
concepts and properties required to describe information from online communities (e.g., message
boards, wikis, weblogs, etc.) on the Semantic Web. It is an attempt to link online community
sites, to use Semantic Web technologies to describe the information that communities have about
their structure and contents, and to find related information and new connections between
content items and other community objects. Developers can use this ontology to express
information contained within community sites in a simple and extensible way.
SWAN6
SWAN (Semantic Web Applications in Neuromedicine) [6] is an interdisciplinary project to
develop a practical, common, semantically-structured, framework for biomedical discourse
initially applied, but not limited, to significant problems in Alzheimer Disease (AD) research.
The SWAN ontology has been developed in the context of building a series of applications for
biomedical researchers, as well as extensive discussions and collaborations with the larger bio-
ontologies community. The Citations ontology defines a set of entities useful for referencing
scientific publications.
2.2.1.1.2. Biomedical Ontologies
In this section we present ontologies that represent concepts related to cancer
chemoprevention, the experimental process and in-silico modelling.
Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)7
The intention of the ACGT MO [7] is to represent the domain of cancer research and
management in a computationally tractable manner. The ACGT MO is shaped as a cross-section
4 http://purl.org/spar/fabio
5 http://sioc-project.org/
6 http://swan.mindinformatics.org/ontology.html
7 http://www.ifomis.org/wiki/ACGT_Master_Ontology_%28MO%29
of a multitude of sub-domains and is aimed at constituting a terminology for transnational data
exchange in oncology, emphasizing the integration of both clinical and molecular data. It is built
using the Protégé-OWL free open-source ontology editor. It is written in OWL-DL and
presented as an .owl file. The ACGT MO reuses the Basic Formal Ontology (BFO) as its upper
level and the OBO Relation Ontology (RO) for the relations between the classes.
Biological Pathway Exchange (BioPAX)8
BioPAX [8] aims to provide a common data exchange format that will facilitate the integration
and exchange of data maintained in several biological pathway databases. Indeed, there are more
than 200 biomedical databases storing biological pathway data. Therefore, merging diverse
database schemas to achieve integrated results from more than one database is quite difficult.
BioPAX provides a standard for representing metabolic, biochemical, transcription regulation,
protein synthesis and signal transduction pathways.
Biotop9
BioTop [9] is a top-domain ontology for molecular biology that provides definitions for the
foundational entities of biomedicine as a basic vocabulary to unambiguously describe facts in
this domain. BioTop can furthermore serve as top-level model for creating new ontologies for
more specific domains or as aid for aligning or improving existing ones.
CancerGrid Metamodel
The CancerGrid Metamodel [10] is instantiated with metadata elements to create a model of a
particular clinical trial. Trial and study designs are instances of the CancerGrid metamodel. This
gives a precise, computable definition of a clinical trial which allows generating the specific
runtime services needed, and helping to analyze the resulting data.
Experimental Factor Ontology (EFO)
The Experimental Factor Ontology (EFO) [11] is an application focused ontology modelling
experimental factors. The ontology has been developed to increase the richness of the
annotations, to promote consistent annotation, to facilitate automatic annotation and to integrate
external data. The methodology employed in the development of EFO involves construction of
mappings to multiple existing domain specific ontologies. This is achieved using a combination
of automated and manual curation steps and the use of a phonetic matching algorithm.
Gene Ontology (GO)10
Gene Ontology (GO) [12] is a controlled vocabulary for describing gene and gene product
attributes. Its aim is to address the need for consistent representation of gene product information
in different databases. In particular, different Model Organism Databases (MODs) describe the
same gene product information using different terms. To enable different databases to represent
data in a consistent way, the GO Consortium creates standard sets of terms (hierarchies or “name
spaces”) for describing biological processes, molecular functions and cellular components of
8 http://www.biopax.org/
9 http://www.imbi.uni-freiburg.de/ontology/biotop/
10 http://www.geneontology.org/
gene products. These terms describe, for each gene product: i) its functions at the molecular level,
ii) the biological processes to which these functions contribute and iii) where it is located in the
cell. Terms are related to each other within each hierarchy by is-a and part-of relationships. GO
started with terminologies from three genomic databases (FlyBase, the Saccharomyces Genome
Database and the Mouse Genome Database) and has grown to include many major genome repositories.
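To make the is-a and part-of relationships concrete, the sketch below walks such a hierarchy upward to collect all ancestors of a term. It is a toy illustration: the edge list is simplified and hand-written for this example, not copied from GO, and the function name is hypothetical.

```python
# Toy edges: (child, relation, parent). Simplified for illustration, not copied from GO.
EDGES = [
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular_component"),
]


def ancestors(term, edges):
    """Return every term reachable from `term` by following is_a / part_of edges upward."""
    parents = {}
    for child, _relation, parent in edges:
        parents.setdefault(child, set()).add(parent)

    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nucleolus", EDGES)))
```

Following both relation types is a design choice of this sketch; an application that needs only subclass reasoning would filter the edges to is_a before traversing.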
Medical Subject Headings (MeSH)11
Medical Subject Headings (MeSH) [13] is a controlled vocabulary created by the US National
Library of Medicine. It consists of sets of terms naming descriptors in a hierarchical structure
that is used for indexing, cataloguing, and searching for biomedical and health-related
information and documents. MeSH descriptors are arranged in both an alphabetic and a
hierarchical structure. There are 26,142 descriptors in 2011 MeSH. The use of MeSH for
providing names for biomedical entities in these applications is analogous in purpose to the use
of GO for providing standard names for biological processes and molecular functions.
Microarray Gene Expression Data Ontology (MGED)
Microarrays are a common experimental method being used to measure molecular-level
biomarkers for a variety of biological states and medical diseases. The Microarray Gene
Expression Data Ontology (MGED) [14] contains concepts, definitions, terms, and resources for
standardized description of microarray experiments and results. Specifically, it provides a
terminology for annotating microarray experiments. It describes the biological sample used in
an experiment, the treatment that the sample receives in the experiment, and the micro-array chip
technology used in the experiment. This basic information will aid researchers exploring third
party data to validate comparisons between data and help confirm interpretations of data. It is
necessary to know how an experiment was performed in order to interpret findings and make
comparison between interpretations.
National Cancer Institute (NCI) Thesaurus12
National Cancer Institute (NCI) Thesaurus [15] is a description logic terminology developed and
distributed by the US NCI for Bioinformatics and the Office of Cancer Communications [16]. It
integrates molecular and clinical cancer-related information enabling researchers to integrate,
retrieve and relate relevant concepts to one another in a formal structure, so that computers as
well as humans can use the Thesaurus for a variety of purposes. Today, the NCI Thesaurus contains
100,000 terms and 34,000 concepts, covering chemicals, drugs and other therapies, diseases
(more than 8,500 cancers and related diseases), genes and gene products, anatomy, organisms,
animal models, techniques, biologic processes, and administrative categories, including
definitions and synonyms [16].
Ontology for Biomedical Investigations (OBI)13
The Ontology for Biomedical Investigations (OBI) [17] is a controlled vocabulary that aims at
stimulating the integration of experimental data. The domain of OBI is the representation of
designs, protocols, instrumentation, materials, processes, data and types of analysis in all areas of
11 http://www.nlm.nih.gov/mesh/
12 http://ncit.nci.nih.gov
13 http://obi-ontology.org/page/Main_Page
biological and biomedical investigations. The ontology addresses the need of modelling all
biomedical investigations and as such contains ontology terms for aspects such as:
1. biological material
2. instrument (and parts of an instrument )
3. information content
4. design and execution of an investigation (and individual experiments)
5. data transformation (incorporating aspects such as data normalization and data analysis)
RxNorm14
RxNorm [18] is a standardized terminology for clinical drugs that addresses the lack of an
adequate standard for a national terminology for medications. RxNorm has been developed by
the HL7 Vocabulary Technical committee capitalizing on models used by four vendors of drug
knowledge bases (the NLM, the Food and Drug Administration, the Department of Veterans
Affairs (VA) and HL7) [19]. It contains standard names for clinical drugs (active drug
ingredient, dosage strength, physical form) and links from the active ingredient to brand name
and combination names. The principled, non-proprietary approach has led RxNorm to be
recommended by the National Committee for Vital and Health Statistics (NCVHS)15 as one of the
standard terminologies for the core patient medical record information [20].
Unified Medical Language System (UMLS)16
The Unified Medical Language System (UMLS) [21] was created by the US National Library of
Medicine (NLM) as a meta-terminology, summarizing the contents of other terminologies about
biomedical and health related concepts in order to enable interoperability between computer
systems. UMLS provides access, among others, to the following ontologies: i) MeSH, ii) NCI
Thesaurus, iii) Gene Ontology and iv) RxNorm.
2.2.1.2. Top-Down Concept identification
This section summarizes the concepts and properties derived from the study of the existing
ontologies/models listed in Section 2.2.1.1. The concepts and properties are separated into three
domains: i) the Literature domain, which contains concepts related to publications and studies,
ii) the Experiment domain, which contains concepts related to an experiment and iii) the Cancer
Chemoprevention domain, which contains biomedical concepts related to cancer
chemoprevention.
2.2.1.2.1. Literature Domain
The Literature Domain contains concepts related to the publications and studies, such as
bibliographic records, citations, references, authors etc. Table 3 contains the concepts identified
in the literature domain while Table 4 contains the properties detected in the literature domain.
14 http://www.nlm.nih.gov/research/umls/rxnorm/
15 www.ncvhs.hhs.gov/050908rpt.htm
16 http://www.nlm.nih.gov/research/umls/about_umls.html
Published Work Research statement
Person
Forum post
SWAN Book
Journal article
Newspaper article
Newspaper news
Web article
Research statement
agent
-
CiTO - - -
BiRO - - -
FaBiO Work:
Report
Specification
Paper
Dataset
Essay
Expression
- -
SIOC - - User account
Table 3 Concepts identified in the literature domain
Name   | referenceTo    | hasStatement       | Author
Domain | Published Work | Published Work     | Published Work
Range  | Published Work | Research statement | Person
SWAN   | -              | -                  | contributorAuthor
CiTO   | cites          | -                  | -
BiRO   | reference      | -                  | -
FaBiO  | -              | hasRealization     | creator
SIOC   | -              | -                  | has_creator
Table 4 Properties identified in the literature domain
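The property alignment in Table 4 can be held in a small lookup structure, which makes it easy to translate a source-ontology property into the unified property name. The sketch below is illustrative only: the dictionary reflects one reading of Table 4, and the variable and function names are hypothetical.

```python
# Mapping as read from Table 4: unified property -> {ontology: source property}
LITERATURE_PROPERTIES = {
    "referenceTo": {"CiTO": "cites", "BiRO": "reference"},
    "hasStatement": {"FaBiO": "hasRealization"},
    "Author": {
        "SWAN": "contributorAuthor",
        "FaBiO": "creator",
        "SIOC": "has_creator",
    },
}


def unified_name(ontology, source_property):
    """Return the unified property that a source property maps to, or None if unmapped."""
    for unified, sources in LITERATURE_PROPERTIES.items():
        if sources.get(ontology) == source_property:
            return unified
    return None


print(unified_name("CiTO", "cites"))
print(unified_name("SIOC", "has_creator"))
```

Returning None for an unmapped pair keeps the gaps in the table (the "-" cells) explicit instead of guessing an equivalent.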
2.2.1.2.2. Experiment Domain
Experiment Domain contains concepts related to an experiment and the procedure followed.
Table 5 contains the concepts identified in the experiment domain while Table 6 contains the
properties detected in the experiment domain.
Experiment Protocol Experimental factor
ACGT Experiment
Clinical trial
Protocol
ClinicalTrialProtocol
Organism
Substance sample
Technical object
Organ
Chemical substance
BIOTOP - -
Organism part
organ
EFO Assay
Protocol
Experimental process
Experimental factor:
Information entity
Material entity
Material property
Process
Site
MGED BioAssay
Experiment
Test
ExperimentDesign
Protocol
Experimental factor
Organism part
OBI assay
Protocol
Study design
Anatomical entity
Cell
Organism
CancerGrid Clinical trial
Trial protocol -
UMLS
(NCI, GO, MeSH)
Clinical trial
Animal experiment
Experiment
Biological assay
Assay
Research activity
Clinical trial protocol
Clinical protocol
Experimental Design
biospecimen
Research device
Tissue
Cell
Organism
Organ
Table 5 Concepts identified in the experiment domain
useFactor followProtocol
Domain
Range
Protocol
Experimental factor
Experiment
Protocol
ACGT - implements
BIOTOP - -
EFO hasInput
Inverse: is_input_of
Realizes
MGED has_experimental_factors
has_protocol
has_experimental_design
has_test_protocol
OBI has_specified_input
Inverse: is_specified_input_of
-
CancerGrid -
UMLS
(NCI, GO, MeSH)
receives_input_from
uses_device
uses_substance
has_method
has_associated_procedure
has_measurement_method
Table 6 Properties identified in the experiment domain
2.2.1.2.3. Biomedical Domain
Biomedical Domain contains biomedical concepts related to cancer chemoprevention. Table 7 contains the concepts identified in the
biomedical domain while Table 8 contains the properties detected in the biomedical domain.
Pathway Target Disease Source Molecule
ACGT - Biological Macromolecule:
Protein
Nucleic acid
Disease
Drug
Chemotherapy Drug
Pharmac. Substance
-
BIOPAX Protein, RNA/DNA - Biosource -
BIOTOP - protein molecule
Nucleic acid structure
- - Biological
compound
EFO - Protein, Dna/Rna Disease
Cancer
drug Chemical
compound
MGED - - Disease state
Cancer site
compound (can be a
drug)
compound
OBI - Macromolecule
Nucleic acid
protein
Disease
- -
UMLS
(NCI, GO,
MeSH)
Pathway
Biochemical Pathway
Neural Pathways …
Proteins
Nucleic acid
Disease Pharmac. substance
Clinical drug
Pharmac. Preparations
Source
Natural source
Molecule
compound
Table 7 Concepts identified in the biomedical domain
partOfPathway hasTarget hasSource isPreventedBy
Domain
Range
Pathway
Target
Molecule
Target
Molecule
Source
Chemopreventive agent
Disease
ACGT - - - (drug hierarchy)
BIOPAX pathwayComponent
cofactor (at a pathway step)
controller (at a pathway step)
pathwayStep (each step has
participants)
- Organism
(domain: gene, protein,
range: biosource)
-
BIOTOP - - locatedIn
DrugRole
BiomedicalMaterialRole
EFO - has_role
- has_role
MGED - - - -
OBI - has_role
locatedIn
has_role
UMLS
(NCI, GO,
MeSH)
gene_is_element_in_pathway
gene_product_is_element_in_pathway
pathway_has_gene_element
chemical_or_drug_plays_role_in_biological_process
gene_product_has_organism_source
has_specimen_source_identity
is_organism_source_of_gene_product
chemical_or_drug_plays_role_in_biological_process
Table 8 Properties identified in the biomedical domain
2.2.2. Bottom up construction of the GRANATUM model
The bottom-up construction of the model identifies concepts and properties based on existing
data sets and requirements that are relevant to the GRANATUM scope (Figure 3), i.e. relevant to
in-silico modeling, cancer chemoprevention, scientific research and experimental data/protocols.
Specifically, the following steps are followed:
- Analysis of publicly available data sets related to cancer chemoprevention (e.g.
  KEGG, ChEBI, ClinicalTrials etc.). Those data sets were identified during the review
  of the state of the art in D1.1. The analysis is based on the data provided through the
  SPARQL endpoint of each data set or through the search mechanism it provides.
- Analysis of the requirements identified during the requirements analysis in
  D1.1. The functional and non-functional requirements are analyzed in order to detect
  concepts and properties that can be used by the model. Moreover, concepts and properties
  are detected based on the Usage Scenarios and the Questionnaire described in D1.1.
- Analysis of experimental data sets related to cancer chemoprevention, provided by the
  trial partners.
2.2.2.1. Analysis of Publicly available datasets
This section presents and analyzes publicly available data sets related to cancer
chemoprevention that were identified during the review of the state of the art in D1.1.
2.2.2.1.1. Existing datasets accessed through SPARQL endpoints
This section presents the publicly available datasets that are accessed through SPARQL
endpoints in order to detect concepts and properties related to cancer chemoprevention. Each
SPARQL endpoint may provide data from more than one data set (e.g.
http://linkedlifedata.com/ provides access to more than 20 datasets).
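As an illustration of how concepts can be detected through an endpoint, the sketch below builds a generic SPARQL query that lists the classes used in a dataset together with their instance counts, and encodes it as an HTTP GET request. The endpoint path and the function name are assumptions made for this example, the query uses SPARQL 1.1 aggregates that a given endpoint may not support, and no request is actually sent.

```python
from urllib.parse import urlencode

# Generic query: which classes appear in the data, and how many instances each has
CLASS_QUERY = """
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 50
"""


def build_request_url(endpoint, query):
    """Encode a query as a GET request using the SPARQL protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Assumed endpoint path, for illustration only
url = build_request_url("http://linkedlifedata.com/sparql", CLASS_QUERY)
print(url)
```

A similar query over `?s ?property ?o`, grouped by `?property`, would list the properties in use.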
Chemical Entities of Biological Interest (ChEBI)17
ChEBI is a database and ontology of small molecular entities. The term molecular entity refers to
any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion,
complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular
entities are either products of nature or synthetic products used to intervene in the processes of
living organisms. Molecules directly encoded by the genome, such as nucleic acids, proteins and
peptides derived from proteins by proteolytic cleavage, are not as a rule included in ChEBI.
Pubmed18
PubMed is the most widely used source for biomedical literature. PubMed provides access to
citations from the MEDLINE database and additional life science journals including links to
many full-text articles at journal Web sites and other related Web resources. The US NLM
(National Library of Medicine) at the NIH (National Institutes of Health) maintains the database
17 http://www.ebi.ac.uk/chebi/
18 http://www.ncbi.nlm.nih.gov/pubmed
as part of the Entrez information retrieval system. PubMed was first released in January 1996.
Today, much of the knowledge regarding chemoprevention agents is available only as
publications. As a result, PubMed is typically the primary source of information for most
biomedical researchers.
DrugBank19
The DrugBank [22] database is a bioinformatics and cheminformatics resource that combines
detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug
target (i.e. sequence, structure, and pathway) information. The database contains 6826 drug
entries including 1431 FDA-approved (Food and Drug Administration) small molecule drugs,
133 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5211 experimental
drugs. Additionally, 4435 non-redundant protein (i.e. drug target/enzyme/transporter/carrier)
sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data
fields with half of the information being devoted to drug/chemical data and the other half
devoted to drug target or protein data.
Kyoto Encyclopedia of Genes and Genomes (KEGG)20
KEGG is a database resource that integrates genomic, chemical, and systemic functional
information. In particular, gene catalogues are linked to higher-level systemic functions of the
cell, the organism, and the ecosystem. Major efforts have been undertaken to manually create a
knowledge base for such systemic functions by capturing and summarizing experimental
knowledge in computable forms; namely, in the forms of molecular networks called KEGG
pathway maps, BRITE functional hierarchies, and KEGG modules. Continuous efforts have also
been made to improve the annotation procedure for linking genomes to the molecular networks.
As a result, KEGG is widely used for interpretation of large-scale datasets generated by
genome sequencing and other high-throughput experimental technologies. In addition to
maintaining the aspects to support basic research, KEGG is being expanded towards more
practical applications with molecular network-based views of diseases, drugs, and environmental
compounds.
Reactome21
Reactome is an open-source, open access, manually curated and peer-reviewed pathway
database. Pathway annotations are authored by expert biologists, in collaboration with Reactome
editorial staff and cross-referenced to many bioinformatics databases. The rationale behind
Reactome is to convey the rich information in the visual representations of biological pathways
familiar from textbooks and articles in a detailed, computationally accessible format. The core
unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes and
small molecules) participating in reactions form a network of biological interactions and are
grouped into pathways. Examples of biological pathways in Reactome include signalling, innate
and acquired immune function, transcriptional regulation, translation, apoptosis and classical
intermediary metabolism.
19 http://www.drugbank.ca/
20 http://www.genome.jp/kegg/
21 http://www.reactome.org
Universal Protein Resource (UniProt)22
UniProt is a comprehensive resource for protein sequence and annotation data. The UniProt
Knowledgebase (UniProtKB) is the central hub for the collection of functional information on
proteins, with accurate, consistent and rich annotation. In addition to capturing the core data
mandatory for each UniProtKB entry, as much annotation information as possible is added. This
includes widely accepted biological ontologies, classifications and cross-references, and clear
indications of the quality of annotation in the form of evidence attribution of experimental and
computational data.
Diseasome23
Diseasome is a disease/disorder relationships explorer and a sample of an innovative map-
oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease
Network dataset [23] and allows intuitive knowledge discovery by mapping its complexity. This
kind of data has a network-like organization, and relations between elements are at least as
important as the elements themselves. More data could be integrated to this prototype and could
eventually bring closer phenotype and genotype.
Dailymed24
DailyMed provides high quality information about marketed drugs. This information includes
FDA labels (package inserts). It provides health information providers and the public with a
standard, comprehensive, up-to-date, look-up and download resource of medication content and
labelling as found in medication package inserts. Drug labelling and other information in the
Structured Product Labeling (SPL) is what has been most recently submitted by drug companies to
the Food and Drug Administration (FDA) as drug listing information. The drug labelling has been reformatted to
make it easier to read but its content has not been altered or verified by FDA or National Library
of Medicine.
Sider25
The Side Effect Resource (SIDER) represents an effort to aggregate dispersed public information
on side effects. SIDER contains information on marketed medicines and their recorded adverse
drug reactions. The information is extracted from public documents and package inserts. The
available information includes side effect frequency, drug and side effect classifications as well as
links to further information, for example drug–target relations.
open-BioMed.org.uk26
open-BioMed allows one to search for information about alternative medicines associated with a
given disease, in terms of its putative effects, associated genes and related clinical trials, from
different databases accessible through SPARQL endpoints.
22 http://www.uniprot.org/
23 http://diseasome.eu/
24 http://dailymed.nlm.nih.gov/dailymed/about.cfm
25 http://sideeffects.embl.de/
26 http://www.open-biomed.org.uk/
BioGRID27
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that
archives and disseminates genetic and protein interaction data from model organisms and
humans. BioGRID currently holds many interactions curated from both high-throughput datasets
and individual focused studies, as derived from publications in the primary literature. Current
curation drives are focused on particular areas of biology to enable insights into conserved
networks and pathways that are relevant to human health. BioGRID provides interaction data to
several model organism databases.
Freebase28
Freebase is a large collaborative knowledge base consisting of metadata composed mainly by
its community members. It is an online collection of structured data harvested from many
sources, including individual 'wiki' contributions. Freebase aims to create a global resource
which allows people (and machines) to access common information more effectively. Freebase
provides access to a large amount of biological data related to genes, proteins, organisms, etc.
HapMap29
The International HapMap Project is an organization that aims to develop
a haplotype map (HapMap) of the human genome, which will describe the common patterns of
human genetic variation. HapMap is a key resource for researchers to find genetic variants
affecting health, disease and responses to drugs and environmental factors. The information
produced by the project is made freely available to researchers around the world.
Human Protein Reference Database30
The Human Protein Reference Database (HPRD) [24] represents a centralized platform to
visually depict and integrate information pertaining to domain architecture, post-translational
modifications, interaction networks and disease association for each protein in the human
proteome. All the information in HPRD is extracted from the literature by expert biologists who
read, interpret and analyze the published data. It contains information pertaining to the biology
of most human proteins and proteins involved in human diseases.
HumanCYC31
The Encyclopedia of Homo sapiens Genes and Metabolism (HumanCyc) [25] is a
bioinformatics database that describes human metabolic pathways and the human genome. By
presenting metabolic pathways as an organizing framework for the human genome, HumanCyc
provides the user with an extended dimension for functional analysis of Homo sapiens at the
genomic level. For example, HumanCyc has tools for analysis of human metabolomics and gene-
expression data.
27 http://thebiogrid.org/
28 http://www.freebase.com/
29 http://hapmap.ncbi.nlm.nih.gov/
30 http://www.hprd.org/
31 http://humancyc.org/
IntAct32
IntAct provides a freely available, open source database system and analysis tools for protein
interaction data. All interactions are derived from literature curation or direct user submissions
and are freely available.
LHGDN33
The Literature-derived Human Gene-Disease Network (LHGDN) [26] is a publicly available
gene-disease repository. It uses the Text2SemRel system to automatically construct knowledge
bases, consisting of facts about entities linked by semantic relations, from textual data. LHGDN is
part of the Linked Life Data initiative.
LinkedCT34
The Linked Clinical Trials (LinkedCT) is a Semantic Web data source for clinical trials data.
The data exposed by LinkedCT is generated by (1) transforming existing data sources of clinical
trials into RDF, and (2) discovering links between the records in the trials data and several other
data sources.
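Step (1), transforming records into RDF, can be pictured with the sketch below, which serializes a flat record as N-Triples lines with plain string literals. This is not LinkedCT's actual pipeline: the function name, the example.org URIs and the sample record are all hypothetical and serve only to show the shape of such a transformation.

```python
def to_ntriples(subject_uri, record, vocab):
    """Serialize a flat record as N-Triples, one triple per field, with string literals."""
    lines = []
    for field, value in record.items():
        # Escape the characters that would break an N-Triples string literal
        escaped = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{subject_uri}> <{vocab}{field}> "{escaped}" .')
    return lines


# Hypothetical record and URIs, for illustration only
trial = {"title": "Example chemoprevention trial", "phase": "Phase 2"}
triples = to_ntriples(
    "http://example.org/trial/1", trial, "http://example.org/vocab#"
)
for line in triples:
    print(line)
```

Step (2), link discovery, would then add further triples whose objects are URIs in other data sources instead of literals.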
MetaCyc35
The MetaCyc [27] database is a comprehensive and freely accessible resource for metabolic
pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally
determined, small-molecule metabolic pathways and are curated from the primary scientific
literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic
pathways currently available. Pathway reactions are linked to one or more well-characterized
enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and
literature citations.
MINT36
The Molecular INTeraction database (MINT) [28] aims at storing, in a structured format,
information about molecular interactions by extracting experimental details from work published
in peer-reviewed journals. At present the MINT team focuses the curation work on physical
interactions between proteins. Genetic or computationally inferred interactions are not included
in the database. Over the past few years the number of curated physical interactions has soared to
over 95000.
NeuroCommons37
NeuroCommons is a project that seeks to make all scientific research materials - research
articles, knowledge bases, research data, physical materials - as available and as usable as they
can be. To achieve this, they use practices that render information in a form that promotes
32 http://www.ebi.ac.uk/intact/
33 http://www.dbs.ifi.lmu.de/~bundschu/LHGDN.html
34 http://linkedct.org/
35 http://metacyc.org/
36 http://mint.bio.uniroma2.it
37 http://neurocommons.org
uniform access by computational agents. It covers general data and knowledge sources used in
computational biology as well as sources specific to neuroscience and neuromedicine.
PharmGKB 38
The Pharmacogenomics Knowledge Base (PharmGKB) is a repository for genetic, genomic,
molecular and cellular phenotype data and clinical information about people who have
participated in pharmacogenomics research studies. The data includes, but is not limited to,
clinical and basic pharmacokinetic and pharmacogenomic research in the cardiovascular,
pulmonary, cancer, pathways, metabolic and transporter domains.
2.2.2.1.2. Existing datasets accessed through searching
This section presents the existing data sets that are not accessed through a SPARQL endpoint but
through the search mechanism usually provided by their web sites.
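The difference between the two access modes can be sketched in code. This is a minimal illustration, not GRANATUM's implementation: both addresses are hypothetical placeholders, and the search parameter name `term` is an assumption. The only fixed convention used is that a SPARQL Protocol GET request carries the query text in a parameter named `query`.

```python
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical addresses, used only for illustration.
SPARQL_ENDPOINT = "http://example.org/sparql"
SEARCH_PAGE = "http://example.org/search"


def sparql_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request; the query text goes in `query`."""
    return endpoint + "?" + urlencode({"query": query})


def search_request_url(page: str, keywords: Sequence[str]) -> str:
    """Build a keyword-search request; the parameter name `term` is assumed."""
    return page + "?" + urlencode({"term": " ".join(keywords)})


# A structured query against a SPARQL endpoint (class URI is a placeholder).
query = "SELECT ?trial WHERE { ?trial a <http://example.org/ClinicalTrial> } LIMIT 5"
sparql_url = sparql_request_url(SPARQL_ENDPOINT, query)

# A free-text search against a web site's search form.
search_url = search_request_url(SEARCH_PAGE, ["curcumin", "chemoprevention"])
```

The SPARQL request returns results shaped by the query itself, whereas the search request returns whatever the site's own search mechanism decides to match, which is why the two groups of data sets are treated separately.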
PubMed Dietary Supplement Subset39
PubMed Dietary Supplement Subset is designed to limit search results to citations from a broad
spectrum of dietary supplement literature including vitamin, mineral, phytochemical, ergogenic,
botanical, and herbal supplements in human nutrition and animal models. The subset retrieves
dietary supplement-related citations on topics including, but not limited to: chemical
composition; biochemical role and function - both in vitro and in vivo; clinical trials; health and
adverse effects; fortification; traditional Chinese medicine and other folk/ethnic supplement
practices; cultivation of botanical products used as dietary supplements; as well as, surveys of
dietary supplement use.
Dietary Supplements Labels Database40
The Dietary Supplements Labels Database offers information about label ingredients in more
than 5,000 selected brands of dietary supplements. It enables users to compare label ingredients
in different brands. Information is also provided on the “structure/function” claims made by
manufacturers and can therefore be used to narrow down active ingredients in different types of
food which may be applicable as chemoprevention agents. Ingredients of dietary supplements in
this database are linked to other databases such as MedlinePlus and PubMed to allow users to
understand the characteristics of ingredients and view the results of research pertaining to them.
ClinicalTrials41
ClinicalTrials.gov is an up-to-date registry and results database of federally and privately
supported clinical trials conducted in the US and around the world. ClinicalTrials.gov offers
information for locating federally and privately supported clinical trials for a wide range of
diseases and conditions.
38 http://www.pharmgkb.org/
39 http://ods.od.nih.gov/research/PubMed_Dietary_Supplement_Subset.aspx
40 http://dietarysupplements.nlm.nih.gov/dietary/
41 http://clinicaltrials.gov/
TOXicology Data NETwork (TOXNET)42
TOXNET provides access to full-text and bibliographic databases oriented to toxicology,
hazardous chemicals, environmental health and related areas.
Aggregated Computational Toxicology Resource (ACToR)43
ACToR is an online warehouse of all publicly available chemical toxicity data and can be used
to find all publicly available data about potential chemical risks to human health and the
environment. ACToR aggregates data from over 500 public sources on over 500,000
environmental chemicals searchable by chemical name, other identifiers and by chemical
structure. The data warehouse allows users to search and query data from chemical toxicity
databases including:
i. ToxRefDB (animal toxicity studies).
ii. ToxCastDB (data from screening 1,000 chemicals in over 500 high-throughput assays).
iii. ExpoCastDB (consolidate and link human exposure and exposure factor data for
chemical prioritization).
iv. Distributed Structure-Searchable Toxicity (DSSTox) is a database that provides high
quality chemical structures and annotations. Its overall aim is to effect the closer
association of chemical structure information with existing toxicity data.
PubChem44
PubChem provides information on the biological activities of small molecules including
substance information, compound structures, and BioActivity data in three primary databases.
PubChem is integrated with Entrez, NCBI’s (National Center for Biotechnology Information)
primary search engine, and also provides compound neighbouring, sub/superstructure, similarity
structure, BioActivity data, and other searching features. PubChem contains substance and
BioAssay (Biological Assay) information from a multitude of depositors. The system is
maintained by the NCBI, a component of the NLM, which is part of the US NIH. PubChem can
be accessed for free through a web user interface. PubChem contains substance descriptions and
small molecules with fewer than 1000 atoms and 1000 bonds. More than 80 database vendors
contribute to the growing PubChem database.
REPAIRtoire Database45
REPAIRtoire is a database resource for systems biology of DNA damage and repair. The
database collects and organizes the following types of information: (i) DNA damage linked to
environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes
and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA
repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins.
REPAIRtoire also provides links to publications and external databases. REPAIRtoire can be
42 http://toxnet.nlm.nih.gov/
43 http://actor.epa.gov/actor/faces/ACToRHome.jsp
44 http://pubchem.ncbi.nlm.nih.gov/
45 http://repairtoire.genesilico.pl/
queried by the name of pathway, protein, enzymatic complex, damage and disease. In addition, a
tool for drawing custom DNA-protein complexes is available online.
Cancer Gene Expression Database (CGED)46
CGED is a database of gene expression profiles and accompanying clinical information. The data
of CGED were obtained through collaborative efforts of Nara Institute of Science and
Technology, Osaka University Medical School, Kyoto University Medical School and Osaka
Medical Center for Cancer and Cardiovascular Diseases to identify genes of clinical importance.
This database offers graphical presentation of expression and clinical data with similarity search
and sorting functions. CGED includes data on breast (prognosis and docetaxel data sets),
colorectal, hepatocellular, esophageal, thyroid, and gastric cancers (updated in March 2007).
ArrayExpress47
The ArrayExpress Archive is a database of functional genomics experiments including gene
expression where you can query and download data collected to Minimum Information About a
Microarray Experiment (MIAME) and Minimum Information about a high-throughput
SeQuencing Experiment (MINSEQE) standards. Gene Expression Atlas contains a subset of
curated and re-annotated archive data which can be queried for individual gene expression under
different biological conditions across experiments.
Gene Expression Omnibus (GEO)48
The GEO is a public repository that archives and freely distributes microarray, next-generation
sequencing, and other forms of high-throughput functional genomic data submitted by the
scientific community. In addition to data storage, a collection of web-based interfaces and
applications are available to help users query and download the studies and gene expression
patterns stored in GEO.
GenBank49
The GenBank sequence database is an open access, annotated collection of all publicly available
nucleotide sequences and their protein translations. This database is produced at NCBI as part of
the International Nucleotide Sequence Database Collaboration (INSDC). GenBank and its
collaborators receive sequences produced in laboratories throughout the world from more than
380,000 distinct organisms. Release 155, produced in August 2006, contained over 65
billion nucleotide bases in more than 61 million sequences. The input stream of data coming into
the database is primarily as direct submissions from the scientific community and individual
laboratories as well as from bulk submissions from large-scale sequencing centers, on electronic
media, with little or no data being keyboarded from the printed page by the databank staff.
46 http://lifesciencedb.jp/cged/
47 http://www.ebi.ac.uk/arrayexpress/
48 http://www.ncbi.nlm.nih.gov/geo/
49 http://www.ncbi.nlm.nih.gov/genbank/
ChemSpider50
ChemSpider is a free access website for chemists to research structure-based information. It links
together chemical structures and their associated information across the web, providing a single
searchable repository which contains millions of chemical structures.
ChemSpider builds on the collected sources by adding additional properties, related information
and links back to original data sources. It offers text and structure searching to find compounds
of interest and provides unique services to improve this data by curation and annotation and to
integrate it with users’ applications. Moreover, the ChemSpider SyntheticPages (CS|SP)51
extends this model to cover reactions, providing quick publication, peer review and semantic
enhancement of repeatable reactions.
Chemical Compounds Database (Chembase)52
The Chembase collects and provides information on chemical compounds and their physical and
chemical properties, NMR (Nuclear Magnetic Resonance) spectra, mass spectra, UV/Vis
(Ultraviolet-Visible Spectroscopy) absorption and IR data. All data available can be searched by
various parameters or browsed by different topics.
Sigma-Aldrich53
The Sigma-Aldrich product database includes datasheets for commercially available compounds
including solubility.
ChemDB54
ChemDB is a public database of small molecules available on the Web. ChemDB is built using
the digital catalogs of over a hundred vendors and other public sources and is annotated with
information derived from these sources as well as from computational methods, such as
predicted solubility and 3D structure. It supports multiple molecular formats and is periodically
updated, automatically whenever possible. The current version of the database contains
approximately 4.1 million commercially available compounds and 8.2 million counting isomers.
The database includes a user-friendly graphical interface, chemical reactions capabilities, as well
as unique search capabilities.
Colon Chemoprevention Agents Database (CCAD)55
The Colon chemoprevention agents database [29] results from a systematic review of the
literature of colon chemoprevention in humans, rats and mice. Target cancers are colorectal
adenoma and adenocarcinoma, aberrant crypt foci (ACF) (a preneoplastic lesion), and Min mice
polyp (adenomas in Apc+/- mutant mice). The chemopreventive agents are ranked by efficacy
(potency against carcinogenesis).
50 http://www.chemspider.com/
51 http://cssp.chemspider.com/
52 http://www.chembase.com/
53 https://www.sigmaaldrich.com/catalog/
54 http://cdb.ics.uci.edu/
55 http://corpet.free.fr/
WikiPathways56
WikiPathways [30] is an open, collaborative platform dedicated to the curation of biological
pathways. WikiPathways thus presents a model for pathway databases that enhances and
complements ongoing efforts, such as KEGG, Reactome and Pathway Commons.
cPath: Pathway Database Software57
cPath is a software platform for collecting/querying biological pathways. It can serve as the core
data handling component in information systems for pathway visualization, analysis and
modelling. Using it, researchers can import interaction and pathway data from multiple sources,
access such data via a standard web interface, and export data to third-party applications via a
standards-based web service. Biomedical researchers can utilize cPath for content aggregation,
query and analysis. More specifically, its main features include: i) Aggregate pathway data from
multiple sources (e.g. BioCyc, KEGG, Reactome), ii) Import/Export support with different
formats PSI-MI (Proteomics Standards Initiative Molecular Interaction) and BioPAX, iii) Data
visualization using Cytoscape and iv) Simple web service.
Protein Data Bank (PDB)58
The PDB is a repository for the 3D structural data of large biological molecules, such as proteins
and nucleic acids. The data, typically obtained by X-ray crystallography or NMR (Nuclear
Magnetic Resonance) spectroscopy and submitted by biologists and biochemists from around the
world, are freely accessible on the Internet. Most major scientific journals, and some funding
agencies, require scientists to submit their structure data to the PDB.
Protein Database59
The Protein database is a collection of sequences from several sources, including translations
from annotated coding regions in GenBank, and TPA (Third Party Annotation), as well as
records from SwissProt, Protein Information Resource (PIR), Protein Research Foundation
(PRF), UniProt and PDB. Protein sequences are the fundamental determinants of biological
structure and function.
56 http://www.wikipathways.org
57 http://cbio.mskcc.org/software/cpath/
58 http://www.pdb.org
59 http://www.hprd.org/
2.2.2.1.3. Concept/attribute identification from Existing datasets
This section identifies the concepts detected in both types of dataset (i.e. accessed through
SPARQL endpoints or through searching). For each concept a list of related attributes is
presented. The results are presented in Table 9.
Concept | SPARQL endpoints / Data sets | Attributes
Published Work | - | Title; Author; Citation; Abstract; Type (e.g. journal)
Research statement | - - | -
Person | | Name; Surname
Experiment | - | Description; Title
Clinical Trial | - | Description; Location; Date; Title
BioAssay | | Name; Description; BioActive Compounds (compounds/substances tested in the BioAssay)
Protocol | | Description
Experimental factor | - | Name; Description
Pathway | | Name; Description; Component; Order
Target | | Name; Description
Nucleic acid | | Gene sequence
Protein | | Cellular Location; Organism; Protein sequence
Disease | | Name; Description; possibleDrug/isTreatedBy
Source | - | Name; Description
Drug | | Name/brand name; Description; Type; Formula/Smiles; Dosage; Adverse reaction; Indication; Has target
Natural source | - | Name; Description
Molecule | | Name; Description; Formula; Molecular weight
Chemopreventive agent | | Name; Description
Table 9 Concepts detected in the datasets (accessed through SPARQL endpoints or through searching)
2.2.2.2. Experimental data analysis
This section aims to identify the concepts, properties and attributes detected by analyzing sets of
experimental data relevant to cancer chemoprevention. To achieve this, two sets of
experimental data were analyzed. The data sets, together with the results and conclusions related
to each of them, are documented in separate published studies:
The first experimental data set is used by [31]. The authors establish and utilize a mouse
mammary gland organ culture model (MMOC) as a bioassay for identifying
chemopreventive agents. More than 200 synthetic and natural product-derived molecules
were evaluated in this model. For each molecule a number of bioassays are conducted
measuring its activity (e.g. the inhibition of carcinogen-induced development of
precancerous lesions in the MMOC and the percentage of the MMOC activity). If the
measured activity meets specific requirements then the molecule is identified as a
chemopreventive agent. Figure 5 shows a snapshot of the experimental data set produced.
The whole experimental data set constitutes a Study that contains many Bioassays each
examining the activity of a molecule that can be identified as a Chemopreventive agent.
Figure 5 Conceptualization of the first experimental data set
The second experimental data set is used by [32]. This study identifies potential cancer
chemopreventive constituents using a battery of cell- and enzyme-based in vitro marker
systems relevant for prevention of carcinogenesis in vivo. A number of known
chemopreventive substances have been tested belonging to several structural classes as
reference compounds for the identification of novel chemopreventive agents or
mechanisms. For each chemopreventive agent a number of bioassays were conducted
measuring its activity. Figure 6 shows a snapshot of the experimental data set produced.
The whole experimental data set constitutes a Study that contains many Bioassays each
examining the activity of a Chemopreventive agent.
Figure 6 Conceptualization of the second experimental data set
The concepts and properties derived from the experimental data analysis are shown in Table 10
and Table 11 respectively.
Concept Description
Study A study is a paper that describes the experiment, the experimental data used
and the results.
Bioassay A bioassay includes specific measurements for the activity of the
chemoprevention agents.
Chemopreventive agent A chemopreventive agent is a single compound tested during a bioassay.
Table 10 Concepts derived from experimental data analysis
Property Description
hasInput This property is used to declare that a molecule is used as input to the
Bioassay-Experiment.
identify This property is used to declare that a Bioassay-Experiment has identified a
chemopreventive agent that meets specific requirements.
Table 11 Properties derived from experimental data analysis
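The concepts and properties in Tables 10 and 11 can be sketched as a small data structure. This is an illustration under stated assumptions rather than the project's implementation: the field `percent_inhibition`, the threshold value and the compound names are hypothetical, since the source only says that the measured activity must meet "specific requirements".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Molecule:
    name: str


@dataclass
class Bioassay:
    has_input: Molecule          # mirrors the hasInput property
    percent_inhibition: float    # illustrative measurement (assumption)

    def identify(self, threshold: float) -> Optional[Molecule]:
        """Mirror the identify property: return the input molecule if the
        measured activity meets the (assumed) threshold, otherwise None."""
        if self.percent_inhibition >= threshold:
            return self.has_input
        return None


@dataclass
class Study:
    title: str
    bioassays: List[Bioassay] = field(default_factory=list)

    def chemopreventive_agents(self, threshold: float) -> List[Molecule]:
        """Collect every distinct molecule identified by at least one bioassay."""
        found: List[Molecule] = []
        for assay in self.bioassays:
            agent = assay.identify(threshold)
            if agent is not None and agent not in found:
                found.append(agent)
        return found


# Hypothetical data: one Study containing two Bioassays.
study = Study("Example screening study", [
    Bioassay(Molecule("compound A"), 72.0),
    Bioassay(Molecule("compound B"), 35.0),
])
agents = study.chemopreventive_agents(threshold=60.0)
```

With the assumed threshold of 60, only "compound A" is identified as a chemopreventive agent; "compound B" remains an ordinary input molecule.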
2.2.2.3. Requirements analysis
This section aims to identify the concepts and properties detected by analyzing the Usage
Scenarios and the Questionnaire described in D1.1. Table 12 lists the concepts derived from
each Usage Scenario and from the Questionnaire.
Concept Usage Scenario Questionnaire
Publication US1/US4
Research statement US4 -
Protocol US1 -
Experimental Factor US1 (tissue/cell line) -
Experiment US1
Clinical trial US2
Virtual screening US3 -
BioAssay US1 -
Pathway US1
Molecule US1/US2/US3
Chemopreventive agent US1/US2
Drug US3
Natural Source US1 -
Disease US1 -
Protein US1/ US3
Gene US1
Table 12 Concepts derived from the Usage Scenarios and the Questionnaire
3. GRANATUM BIOMEDICAL SEMANTIC MODEL FORMALIZATION
At this stage of development, the GRANATUM Biomedical Semantic Model has identified the
main entities and their core set of properties, together with main relations between them.
Following a “meet-in-the-middle” approach by combining both top-down and bottom-up
conceptualization, a set of fundamental entities has been found:
A set of elements for the representation of Literature;
A set of concepts from the biomedical domain related to cancer chemoprevention;
A set of concepts from the biomedical domain related to experimental representation;
A set of concepts from the biomedical domain related to in silico modeling.
The following sections describe in depth both the conceptual view, and the ontological
representation of the identified concepts.
3.1. OVERVIEW
The objective of the GRANATUM Biomedical Semantic Model (Figure 7) is to recognize the
fundamental biomedical entities used in cancer chemoprevention and define their relations. The
GRANATUM Biomedical Semantic Model defines the following entities:
Information resource
  o Unstructured knowledge resource
      Image
  o Semi-structured knowledge resource
      Forum post
      Published Work
Research statement
Person
Protocol
Experimental Factor
Experiment
  o In vivo
      Clinical trial
  o In vitro
      BioAssay
  o In silico
      Virtual screening
Scientific Workflow
Molecule
  o Chemopreventive agent
  o Target
      Protein
      Nucleic acid
      Reactive oxygen species
      Sugar
      Lipid
Toxicity
Source
  o Drug
  o Natural Source
Pathway
Disease
Figure 7 Overview of the GRANATUM Biomedical Semantic Model
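The entity hierarchy listed above can be captured as a parent map, which makes questions such as "is a Clinical trial an Experiment?" easy to answer. This is a minimal sketch: the names `PARENT`, `ancestors` and `is_a` are illustrative, and the nesting follows the sub-class-of entries given in the model specification.

```python
from typing import Dict, List, Optional

# Parent of each entity; None marks a top-level entity.
PARENT: Dict[str, Optional[str]] = {
    "Information resource": None,
    "Unstructured knowledge resource": "Information resource",
    "Image": "Unstructured knowledge resource",
    "Semi-structured knowledge resource": "Information resource",
    "Forum post": "Semi-structured knowledge resource",
    "Published Work": "Semi-structured knowledge resource",
    "Research statement": None,
    "Person": None,
    "Protocol": None,
    "Experimental Factor": None,
    "Experiment": None,
    "In vivo": "Experiment",
    "Clinical trial": "In vivo",
    "In vitro": "Experiment",
    "BioAssay": "In vitro",
    "In silico": "Experiment",
    "Virtual screening": "In silico",
    "Scientific Workflow": None,
    "Molecule": None,
    "Chemopreventive agent": "Molecule",
    "Target": "Molecule",
    "Protein": "Target",
    "Nucleic acid": "Target",
    "Reactive oxygen species": "Target",
    "Sugar": "Target",
    "Lipid": "Target",
    "Toxicity": None,
    "Source": None,
    "Drug": "Source",
    "Natural Source": "Source",
    "Pathway": None,
    "Disease": None,
}


def ancestors(entity: str) -> List[str]:
    """Return the chain of parents, nearest first."""
    chain: List[str] = []
    parent = PARENT[entity]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(entity: str, candidate: str) -> bool:
    """True if `entity` is `candidate` or one of its descendants."""
    return entity == candidate or candidate in ancestors(entity)
```

For example, `ancestors("Clinical trial")` walks up through "In vivo" to "Experiment", while `is_a("Drug", "Molecule")` is false because Drug sits under Source in this model.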
3.2. MODEL SPECIFICATION
This section defines the data model used by the GRANATUM Biomedical Semantic Model by
defining its classes and properties. For each class, the specification gives the URI, a
definition, the type of the class, its super classes and the properties that use the class as domain
or as range.
granatum:InformationResource
URI http://www.granatum.eu#InformationResource
Definition A resource that provides data, knowledge or narrative information.
sub-class-of -
In-domain-of Property Range
- -
granatum:UnstructuredKnowledgeResource
URI http://www.granatum.eu#UnstructuredKnowledgeResource
Definition A resource that provides access to collection of data or information that is not easily
queryable without using metadata about the resource.
sub-class-of granatum:InformationResource
In-domain-of Property Range
- -
granatum:Image
URI http://www.granatum.eu#Image
Definition A resource that provides data in the form of images.
sub-class-of granatum:UnstructuredKnowledgeResource
In-domain-of Property Range
URI xsd:string
granatum:SemiStructuredKnowledgeResource
URI http://www.granatum.eu#SemiStructuredKnowledgeResource
Definition A resource that provides data that is partially structured.
sub-class-of granatum:InformationResource
D1.3 – GRANATUM Biomedical Semantic Model
31 January 2012 Version 1.0 Page 38
Confidentiality: EC Distribution
In-domain-of Property Range
- -
granatum:PublishedWork
URI http://www.granatum.eu#PublishedWork
Definition This entity refers to any type of publication that makes content available to public.
Each publication has at least one Author, supports a number of Research Statements
and is commented in a number of Forum posts. A published work can be: a Book, a
conference Article, a Journal article etc.
sub-class-of granatum:SemiStructuredKnowledgeResource
In-domain-of Property Range
granatum:hasStatement granatum:ResearchStatement
granatum:hasAuthor granatum:Person
granatum:commentedIn granatum:ForumPost
granatum:referenceTo granatum:PublishedWork
granatum:title xsd:string
granatum:Citation xsd:string
granatum:Abstract xsd:string
granatum:Type xsd:string
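To make the class table concrete, the following sketch writes a few statements about one Published Work as N-Triples using only the Python standard library. The namespace and the property names `title` and `hasAuthor` come from the table above; the instance addresses under `http://example.org/` and the title text are hypothetical, and the helper functions are illustrative rather than part of any GRANATUM tool.

```python
GRANATUM = "http://www.granatum.eu#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an address in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as a single N-Triples line."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical instances, for illustration only.
paper = iri("http://example.org/paper/1")
author = iri("http://example.org/person/1")

triples = [
    triple(paper, iri(RDF_TYPE), iri(GRANATUM + "PublishedWork")),
    triple(paper, iri(GRANATUM + "title"), literal('A study of "agent X"')),
    triple(paper, iri(GRANATUM + "hasAuthor"), author),
    triple(author, iri(RDF_TYPE), iri(GRANATUM + "Person")),
]
document = "\n".join(triples)
```

Each line of `document` is one subject, property and value, so the object properties (such as `hasAuthor`) point at another resource while the datatype properties (such as `title`) carry a string.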
granatum:ForumPost
URI http://www.granatum.eu#ForumPost
Definition A Forum Post is related to a Published Work and contains comments about this
specific Publication. It allows the discussion and the exchange of ideas related to the
Published Work.
sub-class-of granatum:SemiStructuredKnowledgeResource
In-domain-of Property Range
granatum:hasAuthor granatum:Person
granatum:Date xsd:date
granatum:Comment xsd:string
URI xsd:string
granatum:ResearchStatement
URI http://www.granatum.eu#ResearchStatement
Definition A Research Statement is a declarative sentence supported by a specific Publication.
sub-class-of -
Property Range
granatum:Description xsd:string
granatum:Hypothesis xsd:string
granatum:Claim xsd:string
granatum:Person
URI http://www.granatum.eu#Person
Definition This class represents the author that has written or has contributed to the writing of a
Published Work or Forum post. The Author will be a FOAF person.
sub-class-of foaf:Person
Property Range
granatum:Name xsd:string
granatum:Surname xsd:string
granatum:Affiliation xsd:string
granatum:contactDetails xsd:string
granatum:Protocol
URI http://www.granatum.eu#Protocol
Definition A protocol is an information entity which is a set of instructions that describe how an
experiment is done.
sub-class-of -
Property Range
granatum:referredToPublication granatum:PublishedWork
granatum:useFactor granatum:ExperimentalFactor
granatum:Title xsd:string
granatum:Procedure xsd:string
granatum:Description xsd:string
granatum:ExperimentalFactor
URI http://www.granatum.eu#ExperimentalFactor
Definition Experimental factors are the variable aspect of an experiment design which can be used to
describe an experiment, or set of experiments, in an increasingly detailed manner.
sub-class-of -
Property Range
granatum:Title xsd:string
granatum:Description xsd:string
granatum:Experiment
URI http://www.granatum.eu#Experiment
Definition An experiment is a methodical procedure carried out with the goal of verifying,
falsifying, or establishing the validity of a hypothesis. The experiment follows a Protocol
to check the hypothesis and its findings may be published to a Publication.
sub-class-of -
Property Range
granatum:followProtocol granatum:Protocol
granatum:describedInPublication granatum:PublishedWork
granatum:identify granatum:ChemopreventiveAgent
granatum:hasInput granatum:Target, granatum:Molecule
granatum:Title xsd:string
granatum:Description xsd:string
granatum:Date xsd:date
granatum:Outcome xsd:string
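The Property/Range pairs of granatum:Experiment can be read as constraints on what each property may point to. The sketch below checks a few statements against those ranges, taking sub-classes into account. It is an illustration only: the function names are hypothetical, the reading of hasInput as accepting either Target or Molecule is an assumption based on the table layout, and the sub-class links are the ones stated elsewhere in this section.

```python
from typing import Dict, List, Set, Tuple

# Ranges of the object properties of granatum:Experiment.
RANGES: Dict[str, Set[str]] = {
    "followProtocol": {"Protocol"},
    "describedInPublication": {"PublishedWork"},
    "identify": {"ChemopreventiveAgent"},
    "hasInput": {"Target", "Molecule"},  # assumed: either class is allowed
}

# sub-class-of links used by this example.
SUPERCLASS: Dict[str, str] = {
    "ChemopreventiveAgent": "Molecule",
    "Target": "Molecule",
    "Protein": "Target",
}


def lineage(cls: str) -> List[str]:
    """Return the class followed by all of its super classes."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def violations(statements: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (property, value class) pairs that fall outside the range."""
    bad = []
    for prop, value_class in statements:
        allowed = RANGES.get(prop)
        if allowed is None or allowed.isdisjoint(lineage(value_class)):
            bad.append((prop, value_class))
    return bad


statements = [
    ("followProtocol", "Protocol"),
    ("hasInput", "Protein"),   # Protein is a Target, so it is within range
    ("identify", "Protein"),   # Protein is not a ChemopreventiveAgent
]
problems = violations(statements)
```

Only the third statement is reported, because a Protein is a Target (and therefore a Molecule) but not a Chemopreventive agent.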
granatum:InVivo
URI http://www.granatum.eu#InVivo
Definition In vivo (Latin for “within the living”) refers to experimentation using a whole, living organism as opposed to a partial or dead organism.
sub-class-of granatum:Experiment
Property Range
- -
granatum:Clinical trial
URI http://www.granatum.eu#Clinical_trial
Definition A Clinical trial is a research study that prospectively assigns human participants or
groups of humans to one or more health-related interventions to evaluate the effects on
health outcomes. Interventions include but are not restricted to drugs, cells and other
biological products, surgical procedures, radiologic procedures, devices, behavioral
treatments, process-of-care changes, and preventive care.
sub-class-of granatum:InVivo
Property Range
granatum:ParticipantDetails xsd:string
granatum:InVitro
URI http://www.granatum.eu#InVitro
Definition In vitro (Latin for “within the glass”) refers to the technique of performing a given
procedure in a controlled environment outside of a living organism.
sub-class-of granatum:Experiment
Property Range
- -
granatum:BioAssay
URI http://www.granatum.eu#BioAssay
Definition A Bioassay is a laboratory test or analysis of the biological activity of a substance (e.g.
Chemopreventive agent) performed by studying its effect on an organism or in a test
tube under controlled conditions. A Bioassay is part of an experiment that includes also
other bioassays.
sub-class-of granatum:InVitro
Property Range
- -
granatum:InSilico
URI http://www.granatum.eu#InSilico
Definition An in silico experiment is one performed on a computer or via computer simulation.
sub-class-of granatum:Experiment
Property Range
granatum:useWorkflow granatum:ScientificWorkflow
granatum:VirtualScreening
URI http://www.granatum.eu#VirtualScreening
Definition Virtual Screening refers to the technique of performing a Biomedical experiment
entirely in a computer via computer simulation.
sub-class-of granatum:InSilico
Property Range
- -
granatum:ScientificWorkflow
URI http://www.granatum.eu#ScientificWorkflow
Definition The Scientific workflow is a pipeline of connected components (in-silico tools, models)
to perform an in silico experiment.
sub-class-of -
Property Range
granatum:Title xsd:string
granatum:Description xsd:string
granatum:Molecule
URI http://www.granatum.eu#Molecule
Definition The smallest particle of a substance that has all of the physical and chemical properties
of that substance. Molecules are made up of one or more atoms. If they contain more
than one atom, the atoms can be the same (an oxygen molecule has two oxygen atoms)
or different (a water molecule has two hydrogen atoms and one oxygen atom). Biological
molecules, such as proteins and DNA, can be made up of many thousands of atoms.
sub-class-of -
Property Range
granatum:Title xsd:string
granatum:Description xsd:string
granatum:Formula xsd:string
granatum:SMILES xsd:string
granatum:MolecularWeight xsd:int
granatum:Size xsd:int
granatum:ChemopreventiveAgent
URI http://www.granatum.eu#ChemopreventiveAgent
Definition A Chemopreventive agent is a Molecule that has shown some evidence that it may be able
to prevent or delay the development of a specific Disease (e.g. cancer) by interfering with
a Biological target. For example - cancer chemopreventive agents are used to inhibit,
delay, or reverse carcinogenesis. A Chemopreventive agent can be identified in an
Experiment and can be contained in a Natural source or a Drug. It can also be a synthetic
chemical agent. References to a Chemopreventive agent can be found in a Publication.
sub-class-of granatum:Molecule
Property Range
granatum:Title xsd:string
granatum:Description xsd:string
granatum:induceDifferentiation xsd:boolean
granatum:coopperateWith granatum:ChemopreventiveAgent
granatum:hasToxicity granatum:Toxicity
granatum:referredInToPublication granatum:PublishedWork
granatum:induce/prevent granatum:Target
granatum:affectPathway granatum:Pathway
granatum:hasSource granatum:Source
granatum:Target
URI http://www.granatum.eu#BiologicalTarget
Definition A biological target is a biopolymer such as a protein or nucleic acid whose activity can
be modified by an external stimulus. A Chemopreventive agent has a Biological target in
order to "hit" it and change its behavior in order to prevent a disease.
sub-class-of granatum:Molecule
Property Range
granatum:Title xsd:string
granatum:Description xsd:string
granatum:ReactiveOxygenSpecies
URI http://www.granatum.eu#ReactiveOxygenSpecies
Definition Reactive oxygen species are organic or inorganic chemicals that contain an oxygen atom
with an unpaired electron. This unstable electron configuration causes these chemicals to
be highly reactive with other molecules.
sub-class-of granatum:Target
Property Range
- -
granatum:Sugar
URI http://www.granatum.eu#MicroMolecule
Definition Any member of a class of edible, crystalline carbohydrates (mainly sucrose, lactose and
fructose) characterised by a sweet flavour; a loose term applied to monosaccharides,
disaccharides, trisaccharides and oligosaccharides, in contrast to complex carbohydrates
such as polysaccharides.
sub-class-of granatum:Target
Property Range
- -
granatum:Protein
URI http://www.granatum.eu#Protein
Definition A Protein is a group of complex organic macromolecules composed of one or more
chains (linear polymers) of alpha-L-amino acids linked by peptide bonds and ranging in
size from a few thousand to over 1 million Daltons. Proteins are fundamental genetically
encoded components of living cells with specific structures and functions dictated by
amino acid sequence.
sub-class-of granatum:Target
Property Range
granatum:ProteinSequence xsd:string
granatum:CellularLocation xsd:string
granatum:NucleicAcid
URI http://www.granatum.eu#NucleicAcid
Definition Nucleic acids are a family of macromolecules composed of various moieties: purines,
pyrimidines, phosphoric acid, and a pentose, either d-ribose or d-deoxyribose. Nucleic
acids, in the form of DNA or RNA, are found in the chromosomes, nucleoli, mitochondria, and
cytoplasm of all cells, and in viruses. Nucleic acids are the major players in controlling
cellular function and heredity.
sub-class-of granatum:Target
Property Range
granatum:Sequence xsd:string
granatum:Lipid
URI http://www.granatum.eu#Lipid
Definition A class of hydrocarbon-containing organic compounds. Lipids are insoluble in water but
soluble in nonpolar solvents and play important roles in living organisms: these roles
include functioning as energy storage molecules, serving as structural components of cell
membranes, and constituting important signaling molecules. Lipids can be subdivided
into 2 groups: fatty acids and glycerides.
sub-class-of granatum:Target
Property Range
- -
granatum:Toxicity
URI http://www.granatum.eu#Toxicity
Definition Toxicity is a quality of a chemical substance which indicates the capacity to cause injury
to an organism in a dose-dependent manner.
sub-class-of -
Property Range
granatum:CellType xsd:string
granatum:level xsd:string
granatum:species xsd:string
granatum:concentration xsd:string
granatum:Source
URI http://www.granatum.eu#Source
Definition A Source is a material or product in which a Chemopreventive agent is available or
from which it originates.
sub-class-of -
Property Range
granatum:contains granatum:Molecule
granatum:Drug
URI http://www.granatum.eu#Drug
Definition A Drug is any substance which, when absorbed into a living organism, may modify one or
more of its functions. The term is generally accepted for a substance taken for a
therapeutic purpose, but is also commonly used for abused substances. A
Chemopreventive agent may be contained in a Drug (ChEBI).
sub-class-of granatum:Source
Property Range
granatum:CommonName xsd:string
granatum:AdverseReaction xsd:string
granatum:Type xsd:string
granatum:interact granatum:Target
granatum:NaturalSource
URI http://www.granatum.eu#NaturalSource
Definition A Natural Source is a material found in nature that usually has a pharmacological or
biological activity. A Chemopreventive agent may be contained in a Natural Source.
sub-class-of granatum:Source
Property Range
granatum:CommonName xsd:string
granatum:Pathway
URI http://www.granatum.eu#Pathway
Definition A Pathway is a set or series of interactions, often forming a network, which biologists
have found useful to group together for organizational, historic, biophysical, or other
reasons. The Chemopreventive agent affects a Biological target in order to “break” the
series of interactions that leads to a Disease (e.g. cancer).
sub-class-of -
Property Range
granatum:containsTarget granatum:Target
granatum:relatedToDisease granatum:Disease
granatum:Title xsd:string
granatum:Description xsd:string
granatum:PathwayMap xsd:anyURI
granatum:Order xsd:string
granatum:Disease
URI http://www.granatum.eu#Disease
Definition A Disease is any abnormal condition of the body or mind that causes discomfort,
dysfunction, or distress to the person affected or those in contact with the person. The
term is often used broadly to include injuries, disabilities, syndromes, symptoms, deviant
behaviors, and atypical variations of structure and function.
sub-class-of -
Property Range
granatum:isPreventedBy granatum:ChemopreventiveAgent
granatum:Title xsd:string
granatum:Description xsd:string
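The sub-class-of entries in the templates above can be read as a small class hierarchy. The following is a minimal Python sketch of that reading, not part of the GRANATUM implementation: the dictionary simply transcribes the sub-class-of rows, the class name granatum:ChemopreventiveAgent is taken from the range of granatum:isPreventedBy, and the helper names (SUBCLASS_OF, ancestors, is_subclass_of) are hypothetical.

```python
# Sub-class-of entries transcribed from the class templates.
# None stands for the "-" entries (no stated superclass).
SUBCLASS_OF = {
    "granatum:ChemopreventiveAgent": "granatum:Molecule",
    "granatum:Target": "granatum:Molecule",
    "granatum:ReactiveOxygenSpecies": "granatum:Target",
    "granatum:Sugar": "granatum:Target",
    "granatum:Protein": "granatum:Target",
    "granatum:NucleicAcid": "granatum:Target",
    "granatum:Lipid": "granatum:Target",
    "granatum:Toxicity": None,
    "granatum:Source": None,
    "granatum:Drug": "granatum:Source",
    "granatum:NaturalSource": "granatum:Source",
    "granatum:Pathway": None,
    "granatum:Disease": None,
}


def ancestors(cls):
    """Return the chain of stated superclasses of cls, nearest first."""
    chain = []
    parent = SUBCLASS_OF.get(cls)
    while parent is not None:
        chain.append(parent)
        # Classes without a template here (e.g. granatum:Molecule) end the walk.
        parent = SUBCLASS_OF.get(parent)
    return chain


def is_subclass_of(cls, other):
    """True if other appears among the stated superclasses of cls."""
    return other in ancestors(cls)


print(ancestors("granatum:Protein"))
print(is_subclass_of("granatum:Drug", "granatum:Source"))
```

Under this reading, a Protein is a Target and therefore also a Molecule, while Toxicity, Source, Pathway and Disease have no stated superclass.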
3.3. MODEL IMPLEMENTATION
There are several languages that can be used to implement an ontology. Very generic and
flexible ones (such as OWL) allow complex relationships between the concepts and the
roles in the domain to be expressed; ontologies using such formalisms are called heavyweight
ontologies, in contrast to lightweight ontologies, which use simpler formalisms (such as RDF or
RDF Schema) with fewer possibilities to express complex relationships. The choice of formalism
is closely linked to the computational operations to be performed on top of the model.
The GRANATUM Biomedical Model aims to solve interoperability issues and to interconnect
different existing ontologies. Mapping the same concepts expressed differently by diverse
ontologies may not be simple, and resolving the conflicts that arise may require reasoning
techniques. The formalism chosen for the GRANATUM ontology is therefore the OWL 2
language. The lightweight approach was discarded to avoid limits on the expressiveness of
the ontology.
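To illustrate how a class template might translate into OWL, the sketch below renders two of the templates from Section 3 as Turtle text using only the Python standard library. It is an assumption-laden illustration, not the project's OWL file: the namespace prefix is inferred from the URIs in the templates, attaching each property to its class through rdfs:domain is one possible modelling choice, and the function names are hypothetical.

```python
# Prefixes; the granatum namespace is inferred from the template URIs.
PREFIXES = {
    "granatum": "http://www.granatum.eu#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def prefix_header():
    """Return the @prefix declarations, one per line."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())


def class_to_turtle(name, parent, properties):
    """Render one class template as Turtle.

    properties maps a property name to its range (both prefixed names).
    Ranges in the xsd namespace become datatype properties, all other
    ranges become object properties. Using rdfs:domain is an assumption.
    """
    if parent:
        blocks = [f"{name} a owl:Class ;\n    rdfs:subClassOf {parent} ."]
    else:
        blocks = [f"{name} a owl:Class ."]
    for prop, rng in properties.items():
        kind = "owl:DatatypeProperty" if rng.startswith("xsd:") else "owl:ObjectProperty"
        blocks.append(
            f"{prop} a {kind} ;\n    rdfs:domain {name} ;\n    rdfs:range {rng} ."
        )
    return "\n\n".join(blocks)


toxicity = class_to_turtle(
    "granatum:Toxicity",
    None,
    {
        "granatum:CellType": "xsd:string",
        "granatum:level": "xsd:string",
        "granatum:species": "xsd:string",
        "granatum:concentration": "xsd:string",
    },
)
nucleic_acid = class_to_turtle(
    "granatum:NucleicAcid", "granatum:Target", {"granatum:Sequence": "xsd:string"}
)

document = "\n\n".join([prefix_header(), toxicity, nucleic_acid])
print(document)
```

Properties shared by several classes (such as granatum:Title) and names that are not valid Turtle local names (such as granatum:induce/prevent) would need separate handling and are left out of this sketch.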
4. MODEL EVALUATION
The GRANATUM Biomedical Semantic Model needs to be evaluated and tested to check that it
fulfils the requirements defined in the specification phase (Section 2.1). It should also be
evaluated according to criteria such as clarity (the ontology and its terms should be clear and
unambiguous), consistency (the ontology needs to be free from contradictions) and reusability
(the possibilities for reusing the ontology and the extent of reuse). The evaluation criteria are
further described in Section 4.1. A set of existing methodologies proposed for ontology
evaluation is presented in Section 4.2. Finally, Section 4.3 presents the results of the evaluation
procedure.
4.1. EVALUATION CRITERIA
The criteria used for the evaluation of the ontology are taken from existing methodologies for
ontology evaluation [33, 34, 35]. Table 13 presents the evaluation criteria with a brief
description.
Object of evaluation | Description
Lexicon and vocabulary | Emphasizes the handling of concepts and instances and the vocabulary used to identify them
Hierarchy, Taxonomy | Emphasizes taxonomic relations (is-a relations)
Semantic relations | Evaluates other relations, which are not taxonomic relations
Context or application | Evaluates ontologies in their context of use and in the context of the application of which the ontology itself is part
Syntax | Evaluates ontology conformity to the syntactical requirements of the formal language in which the ontology was developed
Structure and architecture | Evaluates ontology conformity to predefined structural requirements
Table 13 Ontology evaluation criteria
4.2. EVALUATION METHODOLOGY
Various approaches for the evaluation of ontologies have been considered in the literature,
depending on what kind of ontologies are being evaluated and for what purpose. Broadly
speaking, most evaluation approaches fall into one of the categories described in Table 14.
Evaluation methodology | Description
Golden standard [36, 37] | Syntactic comparison between an ontology and a standard, which may be another ontology
Application Based [38, 39] | Use of an ontology in an application followed by evaluation of the results
Data or corpus driven [40, 41] | Comparison with a data source covered by the ontology proper
Human assessment [42, 43] | Evaluation conducted by people who seek to verify the adherence of an ontology to criteria and patterns
Table 14 Methodologies for ontology evaluation
Each of the evaluation methodologies (presented in Table 14) can check some of the
evaluation criteria (presented in Table 13). The evaluation criteria covered by each
methodology are presented in Table 15.
Evaluation criteria | Evaluation methodology: Golden standard | Application Based | Data or corpus driven | Human assessment
Lexicon and vocabulary
Hierarchy, Taxonomy
Semantic relations
Context or application - -
Syntax - -
Structure and architecture - - -
Table 15 Evaluation criteria for each evaluation methodology
The “Golden standard” methodology cannot be used for the evaluation of the GRANATUM
Biomedical Semantic Model, since no standard exists that covers the scope of the GRANATUM
model. The “Application Based” methodology could be used at the maintenance step, once the
GRANATUM platform is complete and evaluation results can be extracted from its use. It cannot be
used as an evaluation method at this stage, since no application exists yet. The “Data or
corpus driven” methodology is incorporated in the conceptualization phase (Section 2.2.2.2),
where experimental data sets are analyzed to identify concepts. The most appropriate evaluation
method at this stage of the ontology creation is “Human assessment”. The trial partners
(i.e. DKFZ, IITRI and UCY/CBC) are actively involved in the evaluation process of the ontology
in order to identify its inconsistencies and weaknesses. To simplify the
evaluation process, a questionnaire is provided.
4.3. EVALUATION RESULTS
The questionnaire (see Appendix) examines the completeness, correctness, usability and
simplicity of the GRANATUM Biomedical Semantic Model. It is separated into two parts:
The first part examines the usability and the simplicity of the model. In this part the
experts were asked to answer a tailored version of the System Usability Scale (SUS) [44],
as proposed in [45], in order to evaluate the understanding and agreement felt by the
biomedical experts regarding the GRANATUM model as a whole. It contains 7 Likert scale
questions (stating the degree of agreement or disagreement).
The second part examines the correctness and the completeness of the model. It contains
4 questions related to the definitions of the model’s concepts (for concepts with no standard
definition in existing ontologies) and 20 questions for the validation of the
relations between the concepts that exist in the GRANATUM Biomedical Semantic
Model. Moreover, it allows the biomedical experts to express any
disagreement and to point out any missing concept or property.
The next sections present the results from both parts of the questionnaire.
4.3.1. Usability evaluation
This section presents the results of the usability evaluation. The majority of the biomedical
experts (71.42% agreement and 14.29% high agreement) declared that they could contribute to
the ontology, while 14.29% were indifferent (Question 1). The understanding of the ontology is
examined by Questions 2, 3, 5 and 6. The largest group of experts (42.86%) found the
ontology easy to understand (Question 2), but some experts did not find it easy
(28.57%) and others were indifferent (28.57%). Moreover, most of the experts understand the
conceptualization of the ontology (Question 6, 71.42% agreement). Regarding Question 3, the
answers vary, but the largest group of experts (14.29% agreement and 28.57% high agreement)
would need further theoretical support to be able to understand the ontology. A similar
conclusion follows from Question 5, where no expert agreed that most biomedical experts would
understand the ontology very quickly (14.29% high disagreement, 28.57% disagreement, 57.14%
indifferent). Finally, regarding the integration (Question 4) and completeness (Question 7) of
the ontology, most of the experts found the concepts of the ontology well integrated (71.42%
agreement and 14.29% high agreement), and 42.86% agreed that the ontology covers the needs of
the Cancer Chemoprevention domain, while the remaining 57.14% were indifferent. The usability
results are presented in Table 16.
N | Question | High disagreement | Disagreement | Indifferent | Agreement | High agreement
1 | I think that I could contribute to this ontology | 0.00% | 0.00% | 14.29% | 71.42% | 14.29%
2 | I find the ontology easy to understand | 0.00% | 28.57% | 28.57% | 42.86% | 0.00%
3 | I think that I would need further theoretical support to be able to understand this ontology | 14.29% | 14.29% | 28.57% | 14.29% | 28.57%
4 | I found the various concepts in this model were well integrated | 0.00% | 0.00% | 14.29% | 71.42% | 14.29%
5 | I would imagine that most biomedical experts would understand this ontology very quickly | 14.29% | 28.57% | 57.14% | 0.00% | 0.00%
6 | I am confident I understand the conceptualization of the ontology | 0.00% | 0.00% | 28.57% | 71.42% | 0.00%
7 | The concepts/properties of the ontology cover the needs of the Cancer Chemoprevention domain. | 0.00% | 0.00% | 57.14% | 42.86% | 0.00%
TOTALS | | 4.08% | 10.21% | 32.65% | 44.90% | 8.16%
Table 16 Usability evaluation
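The TOTALS row of Table 16 appears to be the mean of each answer column over the seven questions. That reading is inferred from the numbers and is not stated in the text. The short Python sketch below recomputes the column means from the percentages in the table so they can be compared with the reported totals; the variable and function names are hypothetical.

```python
COLUMNS = [
    "High disagreement",
    "Disagreement",
    "Indifferent",
    "Agreement",
    "High agreement",
]

# Percentages per question, transcribed from Table 16.
RESPONSES = {
    1: [0.00, 0.00, 14.29, 71.42, 14.29],
    2: [0.00, 28.57, 28.57, 42.86, 0.00],
    3: [14.29, 14.29, 28.57, 14.29, 28.57],
    4: [0.00, 0.00, 14.29, 71.42, 14.29],
    5: [14.29, 28.57, 57.14, 0.00, 0.00],
    6: [0.00, 0.00, 28.57, 71.42, 0.00],
    7: [0.00, 0.00, 57.14, 42.86, 0.00],
}

# TOTALS row as reported in Table 16.
REPORTED_TOTALS = [4.08, 10.21, 32.65, 44.90, 8.16]


def column_means(rows):
    """Average each answer column over all questions."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


means = column_means(list(RESPONSES.values()))
for name, mean, reported in zip(COLUMNS, means, REPORTED_TOTALS):
    print(f"{name:18s} mean={mean:6.2f}  reported={reported:6.2f}")
```

The recomputed means agree with the reported totals to within 0.01 percentage points, which is consistent with rounding of the underlying values.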
4.3.2. Correctness and completeness evaluation
This section presents the results of the correctness and completeness evaluation. The
biomedical experts generally agreed with the concepts and properties of the model, but they also
proposed changes to the definitions of the concepts as well as the addition or deletion of
concepts, properties and attributes. These changes have been adopted in the GRANATUM
Biomedical Semantic Model presented in Section 3 and are described in Table 17.
Change | Description
Add concept “Information Resource” | Add the concept “Information Resource” as an upper class for the concepts that carry any type of information.
Add subclasses of Information Resource: “Semi-structured Knowledge Resource” and “Unstructured Knowledge Resource” | Add the concepts “Semi-structured Knowledge Resource” and “Unstructured Knowledge Resource” as subclasses of Information Resource in order to conceptually separate the remaining concepts. The experts also proposed making Forum post and Published work subclasses of Semi-structured Knowledge Resource and adding a new concept Image as a subclass of Unstructured Knowledge Resource.
Change Experiment hierarchy | Separate the experiments into 3 types: i) in vivo, ii) in vitro and iii) in silico. Moreover, the experts proposed adding BioAssay as a subclass of in vitro Experiment and not as an independent class.
Add concept “Scientific workflow” | Add a new concept “Scientific Workflow” that shows the pipeline of connected components used to perform an in silico experiment.
Add properties to Chemopreventive agent | Add attributes to the Chemopreventive agent: i) cooperateWith other agents and ii) induce differentiation.
Add concept “Toxicity” | Add a concept “Toxicity” that is related to a Chemopreventive agent. Toxicity has properties such as: i) cell type, ii) level of toxicity, iii) species and iv) concentration.
Add Target subclass of Molecule | Make Target a subclass of Molecule, and add a property size to the Molecule.
Add subclasses to Target | Add subclasses to the concept Target: i) Reactive Oxygen Species, ii) Lipids and iii) Sugars. Moreover, the experts proposed adding new attributes to concepts, e.g. adding the property 3D structure to Proteins.
Add relation between Chemopreventive agent and Pathway | Add a relation between a Chemopreventive agent and a Pathway, because we may know the Pathway that a Chemopreventive agent affects but not the specific Target.
Add relation between Source and Molecule | Add a relation between the Source of a Chemopreventive agent and a Molecule, since the Source may contain many Molecules.
Add relation between Chemopreventive agent and Disease | Add a relation between the Chemopreventive agent and the Disease in order to show which disease can be prevented by the agent.
Remove relation between Disease and Drug | Remove the relation between the Drug and the Disease that shows the Disease treated by the Drug, because it is out of the scope of the model, which focuses on chemoprevention and not treatment.
Add relation between Drug and Target | Add a relation between a Drug and the Targets it interacts with.
Table 17 Changes based on evaluation
5. CONCLUSIONS AND FUTURE WORK
The present document is Deliverable 1.3 “GRANATUM Biomedical Semantic Model” of the
GRANATUM project. This document defines, designs and documents the GRANATUM
Biomedical Semantic Model, which is one of the pillars on which the GRANATUM approach
will build. The GRANATUM Biomedical Semantic Model comprises a set of core concepts
and their relationships. It is possible to further specialize these core concepts (through sub-
concepting) or to introduce new concepts, thus ensuring the extensibility and adaptability of the
ontology to the needs of different cases. The model will contribute to the realization of the
GRANATUM vision by carrying the required semantics in WPs 2 to 5. Specifically, the
GRANATUM Biomedical Semantic Model will be utilized as follows:
WP2: In the semantic annotation, sharing and inter-connection of globally available web
resources in the Linked Biomedical Data Space.
WP3: In the semantic processing of publications and scientific papers (in online libraries
and digital archives), as well as posts on online communities and social networks in the
Opinion Modelling and Argument Analysis Space.
WP4: In the discovery and retrieval of the semantically-linked cancer chemoprevention
significant online data and web resources in the In Silico Models, Tools and Experiments
Space.
WP5: In the ontology-based mash-up of social networking applications and collaboration
tools in the Social Collaborative Working Space.
The definition of the GRANATUM Biomedical Semantic Model is based on the work carried
out in D1.1, where the state of the art was studied and the existing biomedical ontologies and data
sets related to cancer chemoprevention were identified. Moreover, a prioritized list of
requirements with high and medium priority was selected to be addressed by the
GRANATUM Biomedical Semantic Model.
The data sets and ontologies identified in D1.1 were used in the top-down and bottom-up
conceptualization of the GRANATUM model. Furthermore, experimental data related to cancer
chemoprevention and the top/medium priority requirements identified in D1.1 were analyzed in
the conceptualization phase. Afterwards, the conceptual model was formally defined using a
standard template that specifies the class hierarchy and the properties used by each class. The
OWL implementation uses the classes and properties as they are represented in the formally
defined conceptual model.
The next step was the evaluation of the ontology, during which the accuracy,
completeness, usability and simplicity of the model were examined. For the evaluation, a
questionnaire was circulated to the biomedical experts, who detected problems and
inconsistencies and proposed corrective actions. Those proposals were integrated into the
GRANATUM Biomedical Semantic Model.
The maintenance of the ontology is a continuous process that will be carried out throughout the
GRANATUM project. Whenever needs emerge that are not covered by the existing model,
the ontology will be changed accordingly (adding/removing concepts/properties, changing
definitions, etc.) to handle them.
REFERENCES
[1] O. Corcho, M. Fernández-López, A. Gómez-Pérez, and A. López, "Building legal
ontologies with METHONTOLOGY and WebODE," in Law and the Semantic Web,
number 3369 in LNAI: Springer-Verlag, 2005, pp. 142-157.
[2] Z. Li, M. Yang, and K. Ramani, "A methodology for engineering ontology acquisition
and validation," Artif. Intell. Eng. Des. Anal. Manuf., vol. 23, pp. 37-51, 2009.
[3] A. Öhgren and K. Sandkuhl, "Towards a methodology for ontology development in small
and medium-sized enterprises," in IADIS International Conference on Applied
Computing, 2005, pp. 369-376.
[4] D. Shotton, "CiTO, the Citation Typing Ontology," Journal of Biomedical Semantics,
vol. 1(Suppl 1):S6, 2010.
[5] U. Bojars, J. G. Breslin, V. Peristeras, G. Tummarello, and S. Decker, "Interlinking the
Social Web with Semantics," IEEE Intelligent Systems, vol. 23, pp. 29-40, 2008.
[6] P. Ciccarese, E. Wu, G. Wong, M. Ocana, J. Kinoshita, A. Ruttenberg, and T. Clark,
"The SWAN biomedical discourse ontology," Journal of Biomedical Informatics, vol. 41,
pp. 739-751, 2008.
[7] M. Brochhausen, A. Spear, C. Cocos, G. Weiler, L. Martín, A. Anguita, H. Stenzhorn, E.
Daskalaki, F. Schera, U. Schwarz, S. Sfakianakis, S. Kiefer, M. Dörr, N. Graf, and
M. Tsiknakis, "The ACGT Master Ontology and Its Applications - Towards an Ontology-
Driven Cancer Research and Management System," Journal of Biomedical Informatics,
vol. 44, pp. 8-25, 2011.
[8] G. D. Bader and M. P. Cary, "BioPAX – Biological Pathways Exchange Language,"
Level 2, Version 1.0 Documentation, http://www.biopax.org/release/biopax-level2-documentation.pdf, 2005.
[9] E. Beißwanger, S. Schulz, H. Stenzhorn, and U. Hahn, "BioTop: An Upper Domain
Ontology for the Life Sciences - A Description of its Current Structure, Contents, and
Interfaces to OBO Ontologies," Applied Ontology, vol. 3, pp. 205-212, 2008.
[10] C. Crichton, J. Davies, J. Gibbons, S. Harris, A. Tsui, and J. Brenton, "Metadata-Driven
Software for Clinical Trials," Proceedings of the 2009 ICSE Workshop on Software
Engineering in Health Care (SEHC '09), 2009.
[11] J. Malone, E. Holloway, T. Adamusiak, M. Kapushesky, J. Zheng, N. Kolesnikov, A.
Zhukova, A. Brazma, and H. Parkinson, "Modeling Sample Variables with an
Experimental Factor Ontology," Bioinformatics, vol. 26, pp. 1112-1118, 2010.
[12] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis,
K. Dolinski, S. S. Dwight, and J. T. Eppig, "Gene ontology: tool for the unification of
biology," The Gene Ontology Consortium. Nature Genet., vol. 25, pp. 25-29, 2000.
[13] H. J. Lowe and G. O. Barnett, "Understanding and using the medical subject headings
(MeSH) vocabulary to perform literature searches," Journal of the American Medical
Association (JAMA), vol. 271, pp. 1103-1108, 1994.
[14] C. A. Ball and A. Brazma, "MGED standards: work in progress," OMICS, vol. 10,
pp. 138-144, 2006.
[15] N. Sioutos, S. Coronado, M. Haber, F. Hartel, W. Shaiu, and L. Wright, "NCI Thesaurus:
a semantic model integrating cancer-related clinical and molecular information," Journal
of biomedical informatics, vol. 40, pp. 30-43, 2007.
[16] S. de Coronado, M. W. Haber, N. Sioutos, M. S. Tuttle, and L. W. Wright, "NCI
Thesaurus: using science-based terminology to integrate cancer research results,"
Medinfo, vol. 11, pp. 33-37, 2004.
[17] M. Courtot, W. Bug, F. Gibson, A. Lister, J. Malone, D. Schober, R. Brinkman, and A.
Ruttenberg, "The OWL of Biomedical Investigations," in OWLED Workshop on OWL:
Experiences and Directions, collocated with the 7th International Semantic Web
Conference (ISWC-2008) Karlsruhe, Germany, 2008.
[18] S. Liu, M. Wei, R. Moore, V. Ganesan, and S. Nelson, "RxNorm: prescription for
electronic drug information exchange," IT Professional, vol. 7, pp. 17-23, 2005.
[19] J. J. Cimino, T. J. McNamara, T. Meredith, C. A. Broverman, K. C. Eckert, M.
Moore, et al., "Evaluation of a proposed method for representing drug terminology," Journal of
the American Medical Informatics Association, vol. 6, pp. 47-51, 1999.
[20] S. P. Cohn, "Seventh Annual Report to Congress on the Implementation Of the
Administrative Simplification Provisions of the Health Insurance Portability and
Accountability Act of 1996," 2005.
[21] D. Lindberg, B. Humphreys, and A. McCray, "The Unified Medical Language System,"
Methods of Information in Medicine, vol. 32, pp. 281-291, 1993.
[22] D. S. Wishart, C. Knox, A. C. Guo, D. Cheng, S. Shrivastava, D. Tzur, B. Gautam, and
M. Hassanali, "DrugBank: a knowledgebase for drugs, drug actions and drug targets,"
Nucleic Acids Research, vol. 36, pp. D901-D906, 2008.
[23] K.-I. Goh, M. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási, "The Human
Disease Network," Proc Natl Acad Sci USA, vol. 104, pp. 8685-8690, 2007.
[24] K. Prasad, R. Goel, K. Kandasamy, S. Keerthikumar, S. Kumar, S. Mathivanan, D.
Telikicherla, R. Raju, B. Shafreen, A. Venugopal, L. Balakrishnan, A. Marimuthu, S.
Banerjee, D. Somanathan, A. Sebastian, S. Rani, S. R. S, K. Harrys, S. Kanth, M.
Ahmed, M. Kashyap, R. Mohmood, Y. Ramachandra, V. Krishna, B. Rahiman, S.
Mohan, P. Ranganathan, S. Ramabadran, R. Chaerkady, and A. Pandey, "Human Protein
Reference Database," Nucleic Acids Research, vol. 37, pp. 767-72, 2009.
[25] P. Romero, J. Wagg, M. L. Green, D. Kaiser, M. Krummenacker, and P. D. Karp,
"Computational prediction of human metabolic pathways from the complete human
genome," Genome Biology, vol. 6, pp. 1-17, 2004.
[26] M. Bundschus, A. Bauer-Mehren, V. Tresp, L. Furlong, and H.-P. Kriegel, "Digging for
Knowledge with Information Extraction: A Case Study on Human Gene-Disease
Associations," in 19th ACM International Conference on Information and Knowledge
Management (CIKM 2010), 2010.
[27] R. Caspi, H. Foerster, C. A. Fulcher, P. Kaipa, M. Krummenacker, M. Latendresse, S.
Paley, S. Y. Rhee, A. G. Shearer, C. Tissier, T. C. Walk, P. Zhang, and P. D. Karp, "The
MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of
Pathway/Genome Databases," Nucleic Acids Res., vol. 36(Database issue), pp. D623–D631, 2008.
[28] A. Ceol, A. A. Chatr, L. Licata, D. Peluso, L. Briganti, L. Perfetto, L. Castagnoli, and G.
Cesareni, "MINT, the molecular interaction database," Nucleic Acids Res., vol.
38(Database issue), pp. 532-539, 2010.
[29] D. Corpet and S. Tache, "Most effective colon cancer chemopreventive agents in rats: a
systematic review of aberrant crypt foci and tumor data, ranked by potency," Nutrition
and Cancer, vol. 43, pp. 1-21, 2002.
[30] A. Pico, T. Kelder, M. v. Iersel, K. Hanspers, B. Conklin, and C. Evelo, "WikiPathways:
Pathway Editing for the People," PLoS Biol, doi:10.1371/journal.pbio.0060184, vol. 6,
2008.
[31] R. G. Mehta, R. Naithani, L. Huma, M. Hawthorne, R. M. Moriarty, D. L. McCormick,
V. E. Steele, and L. Kopelovich, "Efficacy of Chemopreventive Agents in Mouse
Mammary Gland Organ Culture (MMOC) Model: A Comprehensive Review," Current
Medicinal Chemistry, vol. 15, pp. 2785-2825, 2008.
[32] C. Gerhäuser, K. Klimo, E. Heiss, I. Neumann, A. Gamal-Eldeen, J. Knauft, G.-Y. Liu,
S. Sitthimonchai, and N. Frank, "Mechanism-based in vitro screening of potential cancer
chemopreventive agents," Mutation Research/Fundamental and Molecular Mechanisms
of Mutagenesis, vol. 523-524, pp. 163-172, 2003.
[33] J. Brank, M. Grobelnik, and D. Mladenić, "A survey of ontology evaluation techniques,"
in Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2005),
2005.
[34] M. B. Almeida, "A proposal to evaluate ontology content," Applied Ontology, vol. 4, pp.
245–265, 2009.
[35] G. Maiga and D. Williams, "A Flexible Approach for User Evaluation of Biomedical
Ontologies," International Journal of Computing and ICT Research, vol. 2, 2008.
[36] A. Maedche and S. Staab, "Measuring similarity between ontologies," in 13th European
Conference on Knowledge Acquisition and Management (EKAW 2002) Madrid, Spain,
2002.
[37] J. Brank, M. Grobelnik, and D. Mladenić, "Gold standard based ontology evaluation
using instance assignment," in 4th International Workshop on Evaluation of Ontologies
for the Web (EON 2006) at the 15th International World Wide Web Conference (WWW
2006) Edinburgh, UK, 2006.
[38] R. Porzel and R. Malaka, "A task-based approach for ontology evaluation," in
Workshop on Ontology Learning and Population at the 16th European Conference on
Artificial Intelligence ECAI Valencia, Spain, 2004.
[39] Y. Kalfoglou and B. Hu, "Issues with evaluating and using publicly available
ontologies," in 4th International Workshop on Evaluation of Ontologies for the Web
(EON 2006) at the 15th International World Wide Web Conference Edinburgh, UK,
2006.
[40] C. Patel, K. Supekar, L. Yugyung, and E. K. Park, "OntoKhoj: a semantic web portal for
ontology searching, ranking and classification," in Proceedings of the 5th ACM
International Workshop on Web Information and Data Management, 2003, pp. 58–61.
[41] C. Brewster, H. Alani, S. Dasmahapatra, and Y. Wilk, "Data driven ontology
evaluation," in International Conference on Language Resources and Evaluation,
Lisbon, Portugal, 2004.
[42] A. Lozano-Tello and A. Gómez-Pérez, "ONTOMETRIC: A method to choose the
appropriate ontology," Journal of Database Management, vol. 15, pp. 1–18, 2004.
[43] A. Gómez-Pérez, "Ontology evaluation," in Handbook on Ontologies, S. Staab and R.
Studer, Eds. Berlin: Springer-Verlag, 2004, pp. 251–274.
[44] J. Brooke, "SUS: A “quick and dirty” usability scale," in Usability evaluation in industry,
P. W. Jordan, B. Thomas, B. A. Weerdmeester, and I. L. McClelland, Eds. London:
Taylor & Francis, 1996, pp. 189-194.
[45] C. Nuria, "Ontology Evaluation through Usability Measures," in Proceedings of OTM
Workshops'2009, Vilamoura, Portugal, 2009, pp. 594-603.
APPENDIX
QUESTIONNAIRE FOR THE EVALUATION OF THE GRANATUM BIOMEDICAL
SEMANTIC MODEL
Based on the GRANATUM Biomedical Semantic Model depicted in the following image, answer
the questions of the questionnaire.
COMPLETENESS / USABILITY OF THE MODEL
N. Rate the following statements
(1: high disagreement, 5: high agreement)
1 2 3 4 5
1. I think that I could contribute to this ontology
2. I find the ontology easy to understand
3. I think that I would need further theoretical support to be able to
understand this ontology
4. I found the various concepts in this model were well integrated
5. I would imagine that most biomedical experts would understand this
ontology very quickly
6. I am confident I understand the conceptualization of the ontology
7. The concepts/properties of the ontology cover the needs of the Cancer
Chemoprevention domain.
If the concepts/properties of the ontology do not cover the needs of the Cancer
Chemoprevention domain, describe the missing concepts/properties:
...........................................................................................................................................................
CONCEPT DEFINITION QUESTIONS
N. Do you agree with the definition of: Yes No If no, explain/propose a
correction or enter a
reference for a definition
1. Virtual Screening
Virtual Screening refers to the technique of
performing a Biomedical experiment entirely on a
computer by means of simulation.
2. Chemopreventive Agent
A Chemopreventive agent is a Molecule that has
shown some evidence that it may be able to
prevent or delay the development of a specific
Disease (e.g. cancer) by interfering with a
Biological target. For example,
cancer chemopreventive agents are used to
inhibit, delay, or reverse carcinogenesis. A
Chemopreventive agent can be tested in an
Experiment and can be contained in a Natural
source or a Drug. References to a
Chemopreventive agent can be found in a
Publication.
3. Natural Source
A Natural Source is a substance found in nature
that usually has a pharmacological or biological
activity. A Chemopreventive agent may be
contained in a Natural Source.
4. Biological Target
A biological target is a biopolymer such as
a protein or nucleic acid whose activity can be
modified by an external stimulus. A
Chemopreventive agent "hits" its Biological target
and changes its behavior in order to prevent a
disease.
QUESTIONS FOR RELATIONS BETWEEN CONCEPTS
Do you agree with the following statements? Yes No If no, explain/propose a
correction
1. An Experiment uses a set of Experimental factors
2. An Experiment follows a Protocol
3. A Protocol can be referenced in a Publication
4. A Clinical trial is an Experiment
5. A Virtual Screening is an Experiment
6. An Experiment can be described in a Publication
7. A BioAssay is part of an Experiment
8. A BioAssay uses a Chemopreventive Agent
9. A Chemopreventive Agent is a Molecule
10. A Chemopreventive Agent can be referenced in a
Publication
11. A Chemopreventive Agent can be found in a Source
12. A Source is a Molecule
13. A Drug can be a source of a Chemopreventive Agent
14. A Natural Source can be a source of a Chemopreventive
Agent
15. A Chemopreventive Agent has a Biological target
16. A Nucleic Acid can be the Biological target of a
Chemopreventive Agent
17. A Protein can be the Biological target of a
Chemopreventive Agent
18. A Biological target can be part of a Pathway
19. A Pathway can be related to a Disease
20. A Disease can be treated by a Drug
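
The twenty relation statements above can be read as subject-predicate-object triples. The following is a minimal sketch of that reading in Python, not the GRANATUM model itself: the predicate names (`is_a`, `part_of`, `has_target`, and so on) and the CamelCase concept identifiers are illustrative assumptions. Treating the "can be" statements 13, 14, 16 and 17 as subclass links is also an interpretation that the questionnaire does not state.

```python
# Hypothetical encoding of the 20 questionnaire statements as triples.
# Predicate names and concept identifiers are illustrative assumptions,
# not the vocabulary of the GRANATUM Biomedical Semantic Model.
TRIPLES = [
    ("Experiment", "uses", "ExperimentalFactor"),                # 1
    ("Experiment", "follows", "Protocol"),                       # 2
    ("Protocol", "referenced_in", "Publication"),                # 3
    ("ClinicalTrial", "is_a", "Experiment"),                     # 4
    ("VirtualScreening", "is_a", "Experiment"),                  # 5
    ("Experiment", "described_in", "Publication"),               # 6
    ("BioAssay", "part_of", "Experiment"),                       # 7
    ("BioAssay", "uses", "ChemopreventiveAgent"),                # 8
    ("ChemopreventiveAgent", "is_a", "Molecule"),                # 9
    ("ChemopreventiveAgent", "referenced_in", "Publication"),    # 10
    ("ChemopreventiveAgent", "found_in", "Source"),              # 11
    ("Source", "is_a", "Molecule"),                              # 12
    ("Drug", "is_a", "Source"),                # 13 (assumed subclass reading)
    ("NaturalSource", "is_a", "Source"),       # 14 (assumed subclass reading)
    ("ChemopreventiveAgent", "has_target", "BiologicalTarget"),  # 15
    ("NucleicAcid", "is_a", "BiologicalTarget"),  # 16 (assumed subclass reading)
    ("Protein", "is_a", "BiologicalTarget"),      # 17 (assumed subclass reading)
    ("BiologicalTarget", "part_of", "Pathway"),                  # 18
    ("Pathway", "related_to", "Disease"),                        # 19
    ("Disease", "treated_by", "Drug"),                           # 20
]


def related(subject, predicate, triples=TRIPLES):
    """Return the objects directly linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def superclasses(concept, triples=TRIPLES):
    """Return every concept reachable from `concept` by following is_a links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for parent in related(current, "is_a", triples):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(superclasses("Drug")))
    print(related("ChemopreventiveAgent", "has_target"))
```

Under this reading, statements 12 and 13 together imply that every Drug is also a Molecule. A respondent who rejects that consequence may want to answer "No" to one of the two statements.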