LLogics for DData and KKnowledgeRRepresentation
The DERA methodology
Outline Overview: the semantic heterogeneity problem
Ontologies
The DERA methodology
Use cases
Projects
2
The semantic heterogeneity problem
3
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The difficulty of establishing a certain level of connectivity between people, software agents or IT systems [Uschold & Gruninger, 2004] at the purpose of enabling each of the parties to appropriately understand the exchanged information [Pollock, 2002]
Early solutions
4
Physical connectivity relies on the presence of a stable communication channel between the parties, for instance ODBC data gateways and software adapters.
Syntactic connectivity is established by instituting a common vocabulary of terms to be used by the parties or by point-to-point bridges that translate messages written in one vocabulary in messages in the other vocabulary.
This rigidity and lack of explicit meaning causes very high maintenance costs (up to 95% of the overall ownership costs) as well as integration failure (up to 88% of the projects) [Pollock, 2002]
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The semantic interoperability solution
5
The solution in three points:
Semantic mediation: the usage of an ontology, providing a shared vocabulary of terms with explicit meaning.
Semantic mapping: using the ontology, the establishment of a mapping constituted by a set of correspondences between semantically similar data elements independently maintained by the parties.
Context sensitivity: the mapping has contextual validity, i.e. it has to be used by taking into account the conditions and the purposes for which it was generated.
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Ontologies An ontology is an explicit
specification of a shared conceptualization [Gruber, 1993]
Ontologies are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts
By providing a common formal terminology and understanding of a given domain of interest, it allows for automation (logical inference), supports reuse and favor interoperability across applications and people.
They differ according to the purpose and the semantics
6
Animal
Bird HeadMammal
Predator Herbivore
GoatTiger
Chicken
Cat
Is-a
Is-a
Is-aIs-a
Is-a
Eats
Eats
Is-aPart-of
Is-a Is-a
Eats
Body
Part-of
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Definition of ontology in detail An ontology is an explicit specification of a shared conceptualization
[Gruber, 1993]
Conceptualization: an abstract model of how people theorize (part of) the world in terms of basic cognitive units called concepts. Concepts represent the intention, i. e. the set of properties that distinguish the concept from others, and summarize the extension, i.e. the set of objects having such properties.
Explicit specification: the abstract model is made explicit by providing names and definitions for the concepts, i.e. the name and the definition of the concept provide a specification of its meaning in relation with other concepts.
Formal specification: when it is written in a language with formal syntax and formal semantics, i.e. in a logic-based language.
Shared conceptualization: it captures knowledge which is common to a community of people and therefore represents concretely the level of agreement reached in that community.
7
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Kinds of ontologies [Uschold and Gruninger, 2004]
8
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Classification vs. Descriptive Ontologies Classification ontologies
They are used to classify things, such as books, documents, web pages, etc.; the purpose is to provide domain specific terminology and organize individuals accordingly. Such ontologies usually take the form of classifications with (BT\NT\RT) or without explicit relations.
Descriptive ontologies
They are used to describe a piece of world, such as the Gene ontology, Industry ontology, etc.; the purpose is to offer an unambiguous description of the world. Relations are typically explicit (e.g. is-a) and can be of any kind.
See [Giunchiglia et al., 2009]9
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Classification ontologiesOVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Descriptive ontologiesOVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Classification vs. Real World semantics Classification ontologies are in classification semantics
In classification ontologies, the extension of each concept (label of a node) is the set of documents about the entities or individual objects described by the label of the concept. For example, the extension of the concept animal is “the set of documents about animals” of any kind.
Descriptive ontologies are in real world semantics
In descriptive ontologies, concepts represent real world entities. For example, the extension of the concept animal is the set of real world animals, which can be connected via relations of the proper kind.
12
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
13
How to build a robust ontology?We answer to this question with DERA
[Giunchiglia et al., 2013]
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Domains Any area of knowledge or field of study that we are interested in
or that we are communicating about that deals with specific kinds of entities:
conventional fields of study (e.g. physics, mathematics) applications of pure disciplines (e.g. engineering,
agriculture) any aggregate of such fields (e.g. physical sciences, social
sciences) capturing knowledge about our everyday lives (e.g. music,
movie, sport, recipes, tourism)
Domains are the main means by which the diversity of the world is captured, in terms of language, knowledge and personal experience. 14
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
How to build domains? Several methodologies have been developed for the construction
and maintenance of vocabularies in specific domains
Among them, the faceted approach [Ranganathan, 1967] is known to have great benefits in terms of quality and scalability of the developed resources
BUT the faceted approach - and Knowledge Organization (KO) in general - aims at the development of controlled vocabularies built as classification ontologies, while Knowledge Representation (KR) needs descriptive ontologies
15
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The faceted approach as from KO The faceted approach is a well-established methodology used in
library & information science (LIS) for the organization of knowledge in libraries [Ranganathan, 1967]
It is based on the fundamental notions of domain and facets, which allow capturing the different aspects of a domain and, at the same time, allow for an incremental growth of knowledge.
Originally facets were of 5 types (PMEST): Personality, Matter, Energy, Space, Time).
For instance, in the medicine domain the important facets are the body parts, the diseases that affect them and the treatment to cure or prevent them.
16
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
From KO to KR: DERA How to build high quality and scalable descriptive ontologies? DERA is faceted as it is inspired to the principles and canons of
the faceted approach by Ranganathan DERA is a KR approach as it models entities of a domain (D) by
their entity classes (E), relations (R) and attributes (A)
17
R AD
FACET
E
CATEGORY
ARRAY
CONCEPT
Entity Classes Relations AttributesDomain
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Elements of DERAA DERA domain is a triple D = <E, R, A> where: E (for Entity) is a set of facets grouping terms denoting entity classes, whose instances (the entities) have either perceptual or conceptual existence. Terms in these hierarchies are explicitly connected by is-a or part-of relation.R (for Relation) is a set of facets grouping terms denoting relations between entities. Terms in these hierarchies are connected by is-a relation.A (for Attribute) is a set of facets grouping terms denoting qualitative/quantitative or descriptive attributes of the entities. We differentiate between attribute names and attribute values such that each attribute name is associated corresponding values. Attribute names are connected by is-a relation, while attribute values are connected to corresponding attribute names by value-of relations.
18
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Mapping DERA to Description Logic (DL) is-a, part-of and value-of relations form the backbone of facets,
are assumed to be transitive and asymmetric, and hence are said to be hierarchical.
Other relations, whenever defined, not having such properties, are said to be associative and connect terms in different facets.
All together facets constitute the TBox of a descriptive ontology.
The mapping of E/R/A above to DL should be obvious:o Classes correspond to conceptso Relations and Attributes to roleso is-a and part-of correspond to subsumption (⊑) between classes
and between roleso value-of correspond to value restrictions
19
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
An example of DERA domain
20
ENTITY
Location Landform (is-a) Natural elevation (is-a) Continental elevation (is-a) Mountain (is-a) Hill (is-a) Oceanic elevation (is-a) Seamount (is-a) Submarine hill (is-a) Natural depression (is-a)Continental depression (is-a) Valley (is-a) Trough (is-a) Oceanic depression (is-a) Oceanic valley (is-a) Oceanic trough Body of water (is-a) Flowing body of water (is-a) Stream, Watercourse (is-a) River (is-a) Brook (is-a) Still body of water (is-a) Lake (is-a) Pond
RELATION
Direction (is-a) East (is-a) North (is-a) South (is-a) West Relative level (is-a) Above (is-a) Below
Containment (is-a) part-of
ATTRIBUTE
NameLatitude Longitude Altitude AreaPopulation Depth (value-of) deep (value-of) shallow Length (value-of) long (value-of) short
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Steps in the DERA approach Step 1: Identification of the atomic concepts
(E) watercourse, stream: a natural body of running water flowing on or under the earth
Step 2: Analysis a body of water a flowing body of water no fixed boundary confined within a bed and stream banks larger than a brook
21
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Steps in the DERA approach (cont.) Step 3: Synthesis
(E) Body of water
(is-a) Flowing body of water
(is-a) Stream
(is-a) Brook
(is-a) River
(is-a) Still body of water
(is-a) Pond
(is-a) Lake
Step 4: Standardization
(E) stream, watercourse: a natural body of running water flowing on or under the earth22
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Steps in the DERA approach (cont.) Step 5: Ordering
Terms and concepts in the facets are ordered
Step 6: Formalization
Descriptive ontologies are translated into Description Logic formal ontologies, e.g.,:
Lake Body-of-Water⊑
Lake (Garda Lake)
Depth (Garda Lake, deep)
23
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Guiding principles [Dutta et al., 2011]
24
Principle Example
Relevance breed is more realistic to classify the universe of cows instead of by grade
Ascertainability flowing body of water
Permanence spring as a natural flow of ground water
Exhaustiveness to classify the universe of people, we need both male and female
Exclusiveness age and date of birth, both produce the same divisions
Context bank, a bank of a river, OR, a building of a financial institution
Currency metro station vs. subway station
Reticence minority author, black man
Ordering stream preferred to watercourse
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Advantages of DERA DERA facets have explicit semantics and are modeled as
descriptive ontologies
DERA facets inherits all the important properties of the faceted approach, such as robustness and scalability
DERA allows for automated reasoning via the formalization into Description Logics ontologies. In particular, DERA allows for a very expressive search by any entity property
25
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
26
Concrete use-case of the application of DERA principles
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Problems with WordNet
27
The position of nodes is driven by syntax
Glosses exhibit space and time bias
Some concepts are too similar in meaning
Some concepts are actually individuals
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Reorganizing WordNet with DERA
28
Educational Institution <by level of complexity>PreschoolSchool
Primary schoolSecondary schoolPost-secondary school
<by programme orientation>Training schoolVocational schoolTechnical schoolGraduate school
CollegeUniversity
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
Reorganizing WordNet with DERA
29
Preschool (an educational institution for children too young for primary school)
Nursery school, kindergarten (a preschool where children below the age of compulsory education play and learn)
Playschool (a part time preschool where children meeting for half-day session
School (an educational institution designed for the teaching of students (or "pupils") under the direction of teachers)
Primary school (a school for children where they receive the first stage of compulsory education)
Infant school (a primary school for very young children where they learn basic reading and writing skills)
Junior school (a primary school for young children where they learn about some simple basic disciplines like mathematics, geography and history)
Secondary school (a school for students intermediate between primary school and tertiary school)
Junior high school (a secondary school where students learn about complex aspects of basic disciplines)
Senior high school (a secondary school where students can opt for their field of study)…
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
30
Concrete projects with DERA
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The space ontology [Giunchiglia et al., 2012]
Objects Quantity
Entity classes (E) 845
Entities (e) 6,907,417
Relations (R) 70
Attributes (A) 31
Knowledge is extracted from GeoNames and the Getty Thesaurus of Geographic Names
Terms are collected, categorized into classes, entities, relations and attributes, and synsets are generated
Synsets are mapped to and integrated with WordNet
Synsets are analyzed and arranged into facets Terms are standardized and ordered
Landform Natural depression Oceanic depression Oceanic valley Oceanic trough Continental depression Trough Valley Natural elevation Oceanic elevation Seamount Submarine hill Continental elevation Hill Mountain
Body of water Flowing body of water Stream River Brook Stagnant body of water Lake Pond
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
The semantic-geo catalogue [Farazi et al., 2012]
Objects QuantityFacets 5Entity classes (E) 39Entities (e) 20,162part-of relations 20,161Alternative names 7,929
Knowledge is extracted from the geographical dataset of the Province of Trento
The faceted ontology was built in English and Italian Usage of the ontology
The ontology is used in combination with S-Match within the search component of the geo-catalogue to improve search
The evaluation shows that at the price of a drop in precision of 0.16% we double recall
Body of water Lake Group of lakes Stream River Rivulet Spring Waterfall Cascade Canal Natural elevation Highland Hill Mountain Mountain range Peak Chain of peaks GlacierNatural depression Valley Mountain pass
OVERVIEW :: ONTOLOGIES :: DERA METHODOLOGY :: USE-CASES :: PROJECTS
References Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge
Acquisition, 5 (2), 199–220. Maltese V. (2012). Dealing with semantic heterogeneity in classifications. PhD thesis:
http://eprints-phd.biblio.unitn.it/700/ (see chapters 1.1. and 2.1) Uschold, M., Gruninger, M. (2004). Ontologies and semantics for seamless connectivity. SIGMOD
Rec., 33(4), 58–64. Pollock, J. (2002). Integration’s Dirty Little Secret: It’s a Matter of Semantics. Whitepaper, The
Interoperability Company. Giunchiglia, F., Dutta, B., Maltese, V. (2009). Faceted lightweight ontologies. In “Conceptual
Modeling: Foundations and Applications”, LNCS 5600 Springer. Giunchiglia, F., Dutta, B., Maltese, V. (2013). From Knowledge Organization to Knowledge
Representation. ISKO UK conference. Ranganathan (1967). Prolegomena to library classification. Asia Publishing House. Dutta, B., Giunchiglia, F. and Maltese, V. (2011). A Facet-based Methodology for Geo-Spatial
Modeling. GEOS conference, 6631, pp 133–150. Farazi, F., Maltese, V., Dutta, B., Ivanyukovich, A., V. Rizzi (2013). A semantic geo-catalogue for a
local administration. AI Review Journal, 40 (2), 193-212. Giunchiglia, F., Dutta, B., Maltese, V., Farazi, F., 2012. A facet-based methodology for the
construction of a large-scale geospatial ontology. Journal on Data Semantics, 1 (1), pp. 57-73.
33
Thank you!Questions?
Top Related