Post on 02-Jan-2016
What is ontology?
Ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.
Generally it consists of:
  Classes: sets, collections, or types of objects
  Instances: the basic or "ground level" objects
  Relations: ways that objects can be related to one another
It can be used … as a schema for a knowledge management system, … to reason about the objects within that domain, etc.
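The three building blocks above (classes, instances, relations) and the idea of reasoning over them can be illustrated with a minimal sketch. The `Ontology` class, its method names, and the toy data are hypothetical and chosen only for illustration; the only "reasoning" shown is following subclass links to find every class an instance belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: classes, instances, and (subject, relation, object) triples."""
    classes: set = field(default_factory=set)
    instance_of: dict = field(default_factory=dict)  # instance -> its direct class
    relations: set = field(default_factory=set)      # (subject, relation, object)

    def add_subclass(self, child, parent):
        self.classes.update({child, parent})
        self.relations.add((child, "subclass_of", parent))

    def add_instance(self, instance, cls):
        self.classes.add(cls)
        self.instance_of[instance] = cls

    def classes_of(self, instance):
        """Return the direct class of an instance plus all of its ancestors."""
        result = []
        frontier = [self.instance_of[instance]]
        while frontier:
            cls = frontier.pop()
            if cls in result:
                continue
            result.append(cls)
            # Follow subclass_of links upwards.
            frontier.extend(
                obj for subj, rel, obj in self.relations
                if subj == cls and rel == "subclass_of"
            )
        return result


# Invented example data.
onto = Ontology()
onto.add_subclass("Fish", "Animal")
onto.add_subclass("Animal", "LivingThing")
onto.add_instance("nemo", "Fish")
print(onto.classes_of("nemo"))
```

Because "nemo" is an instance of "Fish", the ontology lets us infer that it is also an "Animal" and a "LivingThing" without storing those facts explicitly.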
Examples of Real-world Ontologies
AgroVoc
  Multilingual thesaurus for the fields of agriculture, forestry, fisheries, food security and related domains
  Consists of terms in different languages and thesaurus relationships between terms
    Broader, narrower, related
ASFA
  Thesaurus used for annotating bibliography related to aquatic science literature
EuroVoc
  Multilingual thesaurus used by European institutions
  The Acquis Communautaire corpus is annotated with EuroVoc
Cyc
  Knowledge base, a formalization of fundamental human knowledge
Dmoz – The Open Directory Project
  World's largest directory of the WWW, maintained by volunteer editors
What is Ontolight?
Simple model covering most of the well-known lightweight ontologies
Stores an ontology as a rich graph
Defined as:
  List of languages used for lexical terms (covers multilinguality)
  List of class-types (types of nodes in the graph)
  List of classes (nodes in the graph)
  List of relation types (types of links in the graph)
  List of relations (links in the graph)
  Grounding model
    A function which proposes a set of classes for a given instance
    Corresponds to classification in machine learning
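The definition above maps naturally onto a small graph data structure. The sketch below is one possible reading of it, not Ontolight's actual implementation: the names `LightOntology`, `OntoClass`, `Relation`, and `linked` are assumptions made for illustration, and the toy entries are invented rather than taken from EuroVoc.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class OntoClass:
    """A node in the graph, with one lexical term per language."""
    id: int
    class_type: str
    terms: Dict[str, str]  # language code -> lexical term


@dataclass
class Relation:
    """A typed, directed link between two nodes."""
    source: int
    target: int
    relation_type: str


@dataclass
class LightOntology:
    languages: List[str]
    class_types: List[str]
    relation_types: List[str]
    classes: Dict[int, OntoClass] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)
    # Grounding model: maps an instance (here, a text) to suggested class ids.
    grounding: Optional[Callable[[str], List[int]]] = None

    def add_class(self, cls: OntoClass) -> None:
        if cls.class_type not in self.class_types:
            raise ValueError(f"unknown class type: {cls.class_type}")
        unknown = set(cls.terms) - set(self.languages)
        if unknown:
            raise ValueError(f"unknown languages: {sorted(unknown)}")
        self.classes[cls.id] = cls

    def add_relation(self, rel: Relation) -> None:
        if rel.relation_type not in self.relation_types:
            raise ValueError(f"unknown relation type: {rel.relation_type}")
        if rel.source not in self.classes or rel.target not in self.classes:
            raise ValueError("both endpoints must be existing classes")
        self.relations.append(rel)

    def linked(self, class_id: int, relation_type: str) -> List[int]:
        """Ids of classes reachable from class_id over one link of the given type."""
        return [r.target for r in self.relations
                if r.source == class_id and r.relation_type == relation_type]


# Invented toy thesaurus with two languages and two relation types.
toy = LightOntology(languages=["en", "sl"],
                    class_types=["descriptor"],
                    relation_types=["broader", "related"])
toy.add_class(OntoClass(1, "descriptor", {"en": "fisheries", "sl": "ribištvo"}))
toy.add_class(OntoClass(2, "descriptor", {"en": "fishing industry"}))
toy.add_relation(Relation(2, 1, "broader"))
print(toy.linked(2, "broader"))
```

Keeping the lists of languages, class-types, and relation types explicit lets the structure reject nodes and links that do not fit the declared schema.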
Grounding
Multiclass classification model trained on the instances of the ontology
  In the case of Dmoz: web pages
  In the case of EuroVoc: EU legislation
We used a centroid-based classifier
  Calculates a centroid vector for each class
  Uses knowledge of the hierarchy
  Classification is performed by the kNN algorithm
  Highly scalable: can handle hundreds of thousands of classes
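A centroid-based grounding model of this kind can be sketched in a few lines. The slides do not say how the hierarchy is used or how kNN is applied, so two assumptions are made here: each training document also contributes to the centroids of its class's ancestors, and classification returns the k centroids nearest to the instance by cosine similarity. All function names and the training data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples, parent):
    """examples: list of (text, class); parent: class -> parent class."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, cls in examples:
        vec = bow(text)
        node = cls
        # Assumption: a document also counts towards every ancestor class.
        while node is not None:
            sums[node].update(vec)
            counts[node] += 1
            node = parent.get(node)
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def suggest(text, centroids, k=3):
    """Return up to k classes whose centroids are nearest to the instance."""
    vec = bow(text)
    scored = sorted(((cosine(vec, c), name) for name, c in centroids.items()),
                    reverse=True)
    return [name for score, name in scored[:k] if score > 0]


# Invented training data and hierarchy.
examples = [
    ("fishing boats catch fish at sea", "fisheries"),
    ("fish farming and aquaculture ponds", "fisheries"),
    ("wheat and maize crop harvest", "crop production"),
    ("import tariff and customs duty", "trade"),
]
parent = {"fisheries": "agriculture", "crop production": "agriculture"}
centroids = train_centroids(examples, parent)
print(suggest("fish caught by fishing fleet", centroids, k=2))
# → ['fisheries', 'agriculture']
```

Only one vector per class has to be stored and compared, which is one reason such a classifier can scale to very large class sets.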
Population
Takes an instance as input
Output is a list of suggested classes
Example from EuroVoc
  Instance: “Slovenia and Croatia are having a fishing industry”
  Output:
OntoGen
Ontology construction and learning
Semi-Automatic:
  Text-mining methods provide suggestions and insights into the domain
  The user can interact with the parameters of the text-mining methods
  All the final decisions are taken by the user
Data-Driven:
  Most of the aid provided by the system is based on some underlying data provided by the user
  Instances are described by features extracted from the data (e.g. bag-of-words vectors)
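As a small illustration of the bag-of-words features mentioned above, the sketch below turns documents into count vectors over a shared vocabulary. The function names and the two sample documents are invented; a real system would typically add steps such as stop-word removal and term weighting.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct token to a fixed vector position (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Count vector of a document; position i holds the count of the i-th token."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in vocabulary]


docs = ["fish and fishing", "fishing industry"]
vocab = build_vocabulary(docs)  # and, fish, fishing, industry
vectors = [bag_of_words(d, vocab) for d in docs]
print(vectors)
# → [[1, 1, 1, 0], [0, 0, 1, 1]]
```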
Contextualized ontology generation
Ontolight is integrated with OntoGen
  Helps with new ontology generation by means of existing ontologies
  User loads Ontolight into OntoGen at start
Suggestion methods:
  Concept suggestion
    Offers concepts from the loaded Ontolight as possible sub-concepts
  Name suggestion
    Offers names of concepts from Ontolight as possible concept names
All suggestions are integrated in a semi-automatic manner
Concept suggestion
User selects a concept
User selects an Ontolight
OntoGen classifies each document into the context (the Ontolight ontology)
Concepts with the most documents are provided as suggestions to the user
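The concept suggestion steps above amount to classifying documents and counting how many land in each context concept. The sketch below assumes a hypothetical `classify` callable standing in for the grounding model; the keyword-based stand-in and the sample documents are invented and much simpler than a trained classifier.

```python
from collections import Counter


def suggest_concepts(documents, classify, top_n=3):
    """Rank context concepts by how many documents were classified into them."""
    counts = Counter()
    for doc in documents:
        for concept in classify(doc):
            counts[concept] += 1
    return counts.most_common(top_n)


# Hypothetical stand-in for the grounding model: keyword -> context concept.
KEYWORDS = {"fish": "fisheries", "tariff": "trade"}


def keyword_classify(doc):
    tokens = set(doc.lower().split())
    return [concept for word, concept in KEYWORDS.items() if word in tokens]


# Documents belonging to the concept selected by the user (invented).
selected_concept_docs = ["fish quota", "fish farming", "tariff on fish"]
print(suggest_concepts(selected_concept_docs, keyword_classify, top_n=2))
# → [('fisheries', 3), ('trade', 1)]
```

The ranked list is only a proposal; in the semi-automatic workflow described earlier, the user decides which suggested concepts to accept as sub-concepts.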
Name suggestion
User selects a concept
OntoGen classifies each document into the context (the loaded Ontolight ontologies)
Names of the concepts with the most classified documents are provided as suggestions to the user