Ontology Mining by Exploiting Machine Learning for Semantic Data Management
Claudia d’Amato
Department of Computer Science, University of Bari
BDA 2017, Nancy, France - November 15, 2017
Introduction & Motivation
Semantic data management: a range of methods and techniques
for the manipulation and usage of data based on its meaning
C. d’Amato (UniBa) Machine Learning for Ontology Mining BDA 2017 2 / 59
Introduction & Motivation
Semantic Web Goal: making data on the Web machine-understandable
Data meaning needs to be
explicit
formally defined
Introduction & Motivation
Ontologies ⇒ the basic element for realizing semantic interoperability
on the Web and in other contexts; they act as a shared vocabulary for assigning semantics to data
Examples of existing real ontologies
Schema.org
Gene Ontology
Foundational Model of Anatomy ontology
Financial Industry Business Ontology (by OMG Finance Domain Task Force)
GoodRelations
. . .
Introduction & Motivation
Reasoning on Description Logics Ontologies
OWL adopted ⇒ Description Logics theoretical foundation
Ontologies are equipped with deductive reasoning capabilities ⇒ making explicit knowledge that is implicit within them
Deduction: ”Credit du Nord” and ”Credit Agricole” are also Company
Introduction & Motivation
Reasoning on Description Logics Ontologies
Question: would it be possible to discover new/additional knowledge by exploiting the evidence coming from the assertional data?
Deduction: ”Credit du Nord” and ”Credit Agricole” are also Company
Incompleteness
”UniCredit” is a Bank
Introduction & Motivation
Reasoning on Description Logics Ontologies
Deduction: ”Credit du Nord” and ”Credit Agricole” are also Company
Inconsistency
Mellon cannot be a Person and a Bank
Introduction & Motivation
Reasoning on Description Logics Ontologies
Question: would it be possible to discover new/additional knowledge by exploiting the evidence coming from the assertional data?
Deduction: ”Credit du Nord” and ”Credit Agricole” are also Company
Noise
Person ≡ ¬Bank missing
Idea: exploiting Machine Learning methods for ontology mining tasks [d’Amato et al. @ SWJ’10]
Basics
Definition (Ontology Mining)
All activities that allow for
discovering hidden knowledge from ontological knowledge bases
Special Focus on:
(similarity-based) inductive learning methods
use specific examples to reach general conclusions; known to be very efficient and fault-tolerant
Basics
Induction vs. Deduction
Deduction (Truth preserving)
Given:
a set of general axioms
a proof procedure
Draw:
correct and certainconclusions
Induction (Falsity preserving)
Given:
a set of examples
Determine:
a possible/plausible generalization covering
the given examples/observations and new, not previously observed examples
Ontology Mining Tasks
Instance Retrieval (Instance Level)
Ontology Enrichment (Schema Level)
Concept Drift and Novelty Detection (Ontology Dynamic)
from an inductive perspective
Focus on: similarity-based methods
Instance Retrieval as a Classification Problem
Introducing Instance Retrieval I
Instance Retrieval → Finding the extension of a query concept
Instance Retrieval (Bank) = {”Credit du Nord”, ”Credit Agricole”}
Instance Retrieval as a Classification Problem
Introducing Instance Retrieval I
Problem: Instance Retrieval in incomplete/inconsistent/noisy ontologies
Instance Retrieval as a Classification Problem
Issues & Solutions I
IDEA
Casting the problem as a Machine Learning classification problem
assess the class membership of individuals in a Description Logic (DL) KB w.r.t. the query concept
State of art classification methods cannot be straightforwardly applied
generally applied to feature-vector representations → need to upgrade to expressive DL representations
implicit Closed World Assumption made in ML → need to cope with the Open World Assumption made in DLs
classes considered as disjoint → cannot assume disjointness of all concepts
Instance Retrieval as a Classification Problem
Issues & Solutions II
Adopted Solutions:
Defined new semantic similarity measures for DL representations
to cope with the high expressive power of DLs
to deal with the semantics of the compared objects (concepts, individuals, ontologies)
to convey the underlying semantics of the KB
Formalized a set of criteria that a similarity function has to satisfy inorder to be defined semantic [d’Amato et al. @ EKAW 2008]
Definition of the classification problem taking into account the OWA
Multi-class classification problem decomposed into a set of smaller classification problems
Instance Retrieval as a Classification Problem
Definition (Problem Definition)
Given:
a populated ontological knowledge base K = 〈T, A〉
a query concept Q
a training set with {+1, −1, 0} as target values
Learn a classification function f such that: ∀a ∈ Ind(A) :
f(a) = +1 if a is an instance of Q
f(a) = −1 if a is an instance of ¬Q
f(a) = 0 otherwise (unknown classification because of the OWA)
Dual Problem
given an individual a ∈ Ind(A), determine the concepts C1, . . . , Ck in K it belongs to
the multi-class classification problem is decomposed into a set of ternary classification problems (one per target concept)
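The problem definition above can be sketched in a few lines. This is a minimal, illustrative rendering of the ternary classification function f with targets {+1, −1, 0}; the distance function and the toy "individuals" are assumptions for the example, not part of the original method.

```python
# Minimal sketch of the ternary classification function f above.
# The distance function and the toy "individuals" are illustrative
# assumptions, not part of the original method.
from collections import Counter

def knn_classify(query, training, dist, k=7):
    """Return +1, -1, or 0 for `query` by majority vote among its
    k nearest labelled individuals (0 = unknown, reflecting the OWA)."""
    neighbours = sorted(training, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: individuals as points on a line, labelled w.r.t. a query concept
training = [(1.0, +1), (1.2, +1), (0.9, +1), (5.0, -1), (5.1, -1),
            (9.0, 0), (9.2, 0)]
print(knn_classify(1.1, training, dist=lambda a, b: abs(a - b), k=3))  # prints 1
```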
Instance Retrieval as a Classification Problem
Developed methods
Pioneering the Problem
relational K-NN for DL KBs [d’Amato et al. ESWC’08]
Improving the effectiveness/efficiency
kernel functions for kernel methods to be applied to DL KBs [Fanizzi, d’Amato et al. @ ISMIS’06, JWS 2012; Bloehdorn and Sure @ ISWC’07]
Scaling on large datasets
Statistical Relational Learning methods for large scale and data sparseness [Huang et al. @ ILP’10, Minervini et al. @ ICMLA’15]
Instance Retrieval as a Classification Problem
Example: Nearest Neighbor Classification
query concept: Bank, k = 7
target values standing for the class values: {+1, 0, −1}

(Figure: the query individual xq surrounded by its nearest neighbours, labelled +1, −1, or 0)

class(xq) ← ?
Instance Retrieval as a Classification Problem
Example: Nearest Neighbor Classification
query concept: Bank, k = 7
target values standing for the class values: {+1, 0, −1}

(Figure: the same neighbourhood; the majority label among the k = 7 nearest neighbours of xq is +1)

class(xq) ← +1
Instance Retrieval as a Classification Problem
Lesson Learnt from experiments I
Experiments performed on ontologies publicly available
Results compared with a standard deductive reasoner
Registered mismatches: Induction: {+1, −1} vs. Deduction: no results
These would be evaluated as mistakes if precision and recall were used, while they could turn out to be correct inferences when judged by a human
Need for new metrics → defined to distinguish induced assertions from mistakes
                          Reasoner
                      +1     0    -1
Inductive      +1      M     I     C
Classifier      0      O     M     O
               -1      C     I     M

M: Match Rate    O: Omission Error Rate
C: Commission Error Rate    I: Induction Rate
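The four rates can be computed with a small helper. This sketch (names are assumptions) compares each inductive prediction against the deductive reasoner's answer, following the matrix above.

```python
# Sketch (assumed helper) computing the four evaluation rates by
# comparing inductive predictions against a deductive reasoner.
def evaluate(pairs):
    """pairs: list of (inductive, deductive) labels, each in {+1, 0, -1}."""
    n = len(pairs)
    match = sum(1 for i, d in pairs if i == d) / n            # M: agreement
    omission = sum(1 for i, d in pairs if i == 0 and d != 0) / n   # O
    commission = sum(1 for i, d in pairs if i != 0 and i == -d) / n  # C
    induction = sum(1 for i, d in pairs if i != 0 and d == 0) / n    # I
    return match, commission, omission, induction

pairs = [(+1, +1), (+1, +1), (0, +1), (+1, 0), (-1, +1)]
print(evaluate(pairs))  # (0.4, 0.2, 0.2, 0.2)
```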
Instance Retrieval as a Classification Problem
Lesson Learnt from experiments II
Commission error almost zero on average
Omission error rate very low and only in some cases
Not null for ontologies in which disjointness axioms are missing
Induction Rate not zero
new knowledge (not logically derivable) induced ⇒ can be used for semi-automatizing the ontology population task
            match        commission   omission     induction
SWM         97.5 ± 3.2   0.0 ± 0.0    2.2 ± 3.1    0.3 ± 1.2
LUBM        99.5 ± 0.7   0.0 ± 0.0    0.5 ± 0.7    0.0 ± 0.0
NTN         97.5 ± 1.9   0.6 ± 0.7    1.3 ± 1.4    0.6 ± 1.7
Financial   99.7 ± 0.2   0.0 ± 0.0    0.0 ± 0.0    0.2 ± 0.2
Ontology Enrichment as
a Disjointness Axioms Discovery Problem
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Disjointness axioms are often missing within ontologies. Problems:
introduction of noise
Noise
Person ≡ ¬Bank missing
counterintuitive inferences
K = { JournalPaper ⊑ Paper, ConferencePaper ⊑ Paper, ConferencePaper(a) }
K ⊨ JournalPaper(a)? Answer: Unknown
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Observation: extensions of disjoint concepts do not overlap
Question: would it be possible to automatically capture disjointnessaxioms by analyzing the data configuration/distribution?
Idea: Exploiting (Conceptual) clustering methods for the purpose
Definition (Problem Definition)
Given
an ontological knowledge base K = 〈T, A〉
a set of individuals I ⊆ Ind(A)
Find
n pairwise disjoint clusters {C1, . . . , Cn}
for each i = 1, . . . , n, a concept description Di that describes Ci, such that:
∀a ∈ Ci: K ⊨ Di(a)
∀b ∈ Cj, j ≠ i: K ⊨ ¬Di(b)
Hence ∀Di, Dj, i ≠ j: K ⊨ Dj ⊑ ¬Di
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Basics on Clustering Methods
Clustering methods: unsupervised inductive learning methods that organize a collection of unlabeled resources into meaningful clusters such that
intra-cluster similarity is high
inter-cluster similarity is low
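As a toy illustration of this idea, the following k-medoids sketch (names, distance, and initialisation are illustrative assumptions, not the method used in the cited work) groups resources so that intra-cluster distances stay small:

```python
# Illustrative k-medoids sketch: partition points into k clusters,
# each represented by its medoid (the most central member).
# Naive initialisation and a toy distance are assumptions.
def k_medoids(points, k, dist, iters=10):
    medoids = points[:k]  # naive initialisation for the sketch
    clusters = {}
    for _ in range(iters):
        # assignment step: each point joins its nearest medoid's cluster
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
        # update step: the new medoid minimises total distance in its cluster
        medoids = [min(c, key=lambda cand: sum(dist(cand, q) for q in c))
                   for c in clusters.values() if c]
    return clusters

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]
clusters = k_medoids(points, 2, dist=lambda a, b: abs(a - b))
print(sorted(map(sorted, clusters.values())))  # [[0.9, 1.0, 1.1], [7.9, 8.0, 8.2]]
```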
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Learning Disjointness Axioms: Developed Methods
Statistical-based approach
NAR - exploiting negative association rules [Fleischhacker et al. @OTM’11]
PCC - exploiting Pearson’s correlation coefficient [Völker et al. @ JWS 2015]
do not exploit any background knowledge
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Terminological Cluster Tree
Defined a method for eliciting disjointness axioms [Rizzo et al. @ ESWC’17]
solving a clustering problem via learning Terminological Cluster Trees
providing a concept description for each cluster
Definition (Terminological cluster tree (TCT))
A binary logical tree where
a node stands for a cluster of individuals C
each inner node contains a description D (over the signature of K)
each departing edge corresponds to positive (left) and negative (right)examples of D
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Example of TCT
Given I ⊆ Ind(A), an example of a TCT describing individuals in the Semantic Web research community

(Figure: the example TCT)
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Collecting Disjointness Axioms
Given a TCT T:
Step I:
Traverse T to collect the concept descriptions describing the clusters at the leaves
A set of concepts CS is obtained
Step II:
A set of candidate axioms A is generated from CS: an axiom D ⊑ ¬E (D, E ∈ CS) is generated if
D ≢ E (nor D ⊑ E or vice versa)
E ⊑ ¬D has not already been generated
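Step II can be sketched as follows. Concepts are plain strings and the equivalence/subsumption tests are assumed stubs; a real implementation would call a DL reasoner for these checks.

```python
# Hedged sketch of Step II: generate candidate disjointness axioms
# D ⊑ ¬E from the collected concept set CS. `equivalent` and
# `subsumes` are assumed stubs for reasoner calls.
def candidate_axioms(concepts, equivalent, subsumes):
    axioms, seen = [], set()
    for d in concepts:
        for e in concepts:
            # skip equivalent or subsuming pairs
            if d == e or equivalent(d, e) or subsumes(d, e) or subsumes(e, d):
                continue
            if (e, d) in seen:          # E ⊑ ¬D was already generated
                continue
            axioms.append((d, e))       # the pair stands for D ⊑ ¬E
            seen.add((d, e))
    return axioms

cs = ["Person", "Proceedings", "SWPaper"]
result = candidate_axioms(cs, equivalent=lambda a, b: False,
                          subsumes=lambda a, b: False)
print(result)  # three candidate pairs, one per unordered concept pair
```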
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Collecting Disjointness Axioms: Example
CS = { Person, Person ⊓ ∃hasPublication.⊤, ¬(Person ⊓ ∃hasPublication.⊤), Person ⊓ ∃hasPublication.SWPaper, ¬Proceedings, ¬Person ⊓ Proceedings, · · · }

Axiom 1: Person ⊓ ∃hasPublication.SWPaper ⊑ ¬(¬Proceedings)
Axiom 2: · · ·
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Inducing a TCT
Given the set of individuals I and the ⊤ concept
Divide-and-conquer approach adopted
Base Case: test the stopCondition
  the cohesion of the cluster I exceeds a threshold ν
  (distance between medoids below a threshold ν)
Recursive Step (stopCondition does not hold):
  a set S of refinements of the current (parent) description C is generated
  the bestConcept E∗ ∈ S is selected and installed as the current node
    the one showing the best cluster separation ⇔ with maximum distance between the medoids of its positive (P) and negative (N) individuals
  I is split into:
    Ileft ⊆ I ↔ individuals with the smallest distance w.r.t. the medoid of P
    Iright ⊆ I ↔ individuals with the smallest distance w.r.t. the medoid of N
Note: the number of clusters is not required; it is obtained from the data distribution
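The recursion above can be sketched as control flow only. Refinement generation, cluster cohesion, and the medoid-based split are stubbed out as assumptions; the stubs below are toys, not the measures used in the paper.

```python
# Control-flow sketch of recursive TCT induction. `refine`, `cohesion`,
# and `split` are assumed stubs for the DL-specific parts.
def induce_tct(I, concept, refine, cohesion, split, nu=0.9):
    """Stop when the cluster is cohesive enough; otherwise refine the
    parent concept, pick the refinement with the best medoid
    separation, and recurse on the two halves."""
    if cohesion(I) >= nu:                         # base case: stopCondition
        return {"leaf": I, "concept": concept}
    S = refine(concept)                           # refinements of parent C
    best = max(S, key=lambda E: split(E, I)[2])   # max medoid separation
    I_left, I_right, _ = split(best, I)           # nearest to medoid of P / N
    return {"concept": best,
            "left": induce_tct(I_left, best, refine, cohesion, split, nu),
            "right": induce_tct(I_right, best, refine, cohesion, split, nu)}

# Toy stubs, purely illustrative:
tree = induce_tct([1, 2, 3, 4], "Top",
                  refine=lambda c: [c + "'"],
                  cohesion=lambda xs: 1.0 if len(xs) <= 2 else 0.0,
                  split=lambda c, xs: (xs[:len(xs)//2], xs[len(xs)//2:], 1.0))
print(tree["left"])  # {'leaf': [1, 2], 'concept': "Top'"}
```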
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Lesson Learnt from experiments I
Experiments performed on publicly available ontologies
Goal I: Re-discover a target axiom (existing in K)
Setting:
A copy of each ontology is created, removing a target axiom
Threshold ν = 0.9, 0.8, 0.7
Metrics: # discovered axioms and # cases of inconsistency
Results:
target axioms rediscovered in almost all cases
a significant number of additional disjointness axioms discovered
limited number of inconsistencies found
Ontology      TCT 0.9         TCT 0.8         TCT 0.7
              #inc.  #ax's    #inc.  #ax's    #inc.  #ax's
BioPax        2      53       2      53       3      52
NTN           10     70       9      73       10     75
Financial     0      125      0      126      0      127
GeoSkills     2      345      1      347      4      347
Monetary      0      432      0      432      0      433
DBPedia3.9    45     45       44     44       43     43
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Lesson Learnt from experiments II
Goal II:
Re-discover randomly selected target axioms, added according to the Strong Disjointness Assumption [Schlobach et al. @ ESWC 2005]
two sibling concepts in a subsumption hierarchy are considered disjoint
comparative analysis with statistical-based methods [Völker et al. @ JWS 2015; Fleischhacker et al. @ OTM’11]
PCC - based on Pearson’s correlation coefficient
NAR - exploiting negative association rules
Setting:
A copy of each ontology created, removing 20%, 50%, 70% of the disjointness axioms
The copy used to induce the TCT; ν = 0.9, 0.8, 0.7; # runs: 10
Metrics: rate of rediscovered target axioms, # cases of inconsistency, # additional discovered axioms
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Lesson Learnt from experiments III
Results:
almost all axioms rediscovered
the rate decreases when larger fractions of axioms are removed, as expected
TCT outperforms PCC and NAR w.r.t. additionally discovered axioms, whilst introducing limited inconsistency
TCT can express complex disjointness axioms
PCC and NAR tackle only disjointness between concept names
Exploiting K as well as the data distribution improves disjointness axiom discovery
Ontology Enrichment Ontology Enrichment as a Disjointness Axioms Discovery Problem
Example of axioms
Successfully discovered axioms
ExternalReferenceUtilityClass ⊓ ∃TAXONREF.⊤  disjoint with  xref

Activity  disjoint with  Person ⊓ ∃nationality.United states

Person ⊓ hasSex.Male (≡ Man)  disjoint with  SupernaturalBeing ⊓ God (≡ God)
Ontology Enrichment as
a Concept Learning Problem
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
On Learning Concept Descriptions I
Goal: Learning descriptions for a given concept name / expression
Example: Man ≡ Human ⊓ Male
Question: How to learn concept descriptions automatically, given a set of individuals?
IDEA
Regarding the problem as a supervised concept learning task
Supervised Concept Learning:
Given a training set of positive and negative examples for a concept name,
construct a description that will accurately classify whether future examples are positive or negative.
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
On Learning Concept Descriptions II
Definition (Problem Definition)
Given
the KB K as background knowledge
a subset pos of individuals as positive examples of C
a subset neg of individuals as negative examples of C
Learn
a DL concept description D such that the individuals in pos are instances of D while those in neg are not
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
Developed Methods for Supervised Concept Learning
Separate-and-conquer approach
  YinYang [Iannone et al. @ Appl. Intell. J. 2007]
  DL-FOIL [Fanizzi et al. @ ILP 2008]
  DL-Learner [Lehmann et al. @ MLJ 2010, SWJ 2011]
Divide-and-conquer approach
  TermiTIS [Fanizzi et al. @ ECML 2010, Rizzo et al. @ ESWC 2015]
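The separate-and-conquer scheme common to these learners can be sketched generically: repeatedly learn one partial description covering some positives (and ideally no negatives), then remove the covered positives. `learn_one` and `covers` are assumed stubs for the DL-specific parts; this is not the API of any of the cited systems.

```python
# Generic separate-and-conquer sketch; `learn_one` and `covers`
# are assumed stubs for the DL-specific parts.
def separate_and_conquer(pos, neg, learn_one, covers):
    disjuncts, remaining = [], list(pos)
    while remaining:
        d = learn_one(remaining, neg)                  # "conquer" one part
        covered = [p for p in remaining if covers(d, p)]
        if not covered:                                # no progress: stop
            break
        disjuncts.append(d)
        remaining = [p for p in remaining if p not in covered]  # "separate"
    return disjuncts  # the final concept is the union (Or) of the disjuncts

# Toy run with stubbed learners (illustrative only):
result = separate_and_conquer([1, 2, 3, 4], [10],
                              learn_one=lambda rem, neg: {rem[0], rem[-1]},
                              covers=lambda d, p: p in d)
print(result)  # [{1, 4}, {2, 3}]
```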
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
Separate and Conquer: Example
(Figure: positive (+) and negative (−) examples, first separated by C1 and C′1, then by C2 and C′2)

C1 = MasterStudent    C′1 = MasterStudent ⊓ ∃worksIn.⊤
C2 = BachelorStudent  C′2 = BachelorStudent ⊓ ∃worksIn.⊤
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
On Evaluating the Learnt Concept Descriptions
Publicly available ontologies considered
A number (30) of satisfiable randomly generated concepts considered
Positive and negative examples collected for each concept by using a deductive reasoner
Running concept learning on the collected positive and negative examples
Inductive classification performed on the learnt concept descriptions
            match          commission     omission       induction
ontology    rate           error rate     error rate     rate
BioPax      76.9 ± 15.7    19.7 ± 15.9    7.0 ± 20.0     7.5 ± 23.7
NTN         78.0 ± 19.2    16.1 ± 4.0     6.4 ± 8.1      14.0 ± 10.1
Financial   75.5 ± 20.8    16.1 ± 12.8    4.5 ± 5.1      3.7 ± 7.9
Ontology Enrichment Ontology Enrichment as a Concept Learning Problem
Examples of Learned Concept Descriptions with DL-FOIL
BioPax
induced:  Or( And( physicalEntity protein) dataSource)
original: Or( And( And( dataSource externalReferenceUtilityClass)
              ForAll(ORGANISM ForAll(CONTROLLED physicalInteraction)))
          protein)

NTN
induced:  Or( EvilSupernaturalBeing Not(God))
original: Not(God)

Financial
induced:  Or( Not(Finished) NotPaidFinishedLoan Weekly)
original: Or( LoanPayment Not(NoProblemsFinishedLoan))
Concept Drift and Novelty Detection as a Clustering Problem
Concept Drift and Novelty Detection
Ontologies evolve over time ⇒ new assertions are added.
Concept Drift: change of a concept towards a more general/specific one w.r.t. the evidence provided by new annotated individuals
  almost all Worker instances work for more than 10 hours per day ⇒ HardWorker
Novelty Detection: an isolated cluster in the search space that needs to be defined through new emerging concepts to be added to the KB
subset of Worker employed in a company ⇒ Employee
subset of Worker working for several companies ⇒ Free-lance
Idea: automatically capturing them by analyzing the data configuration/distribution
Research Direction
Exploiting (Conceptual) clustering methods for the purpose
Concept Drift and Novelty Detection as a Clustering Problem
Clustering Individuals of An Ontology: Developed Methods
Purely logic-based
  KLUSTER [Kietz & Morik, ’94]
  CSKA [Fanizzi et al., ’04]
  produce a flat output
  suffer from noise in the data

Similarity-based ⇒ noise tolerant
  Evolutionary Clustering Algorithm around Medoids [Fanizzi et al. @ IJSWIS 2008]
    automatically assesses the best number of clusters
  k-Medoid (hierarchical and fuzzy) clustering algorithm [Fanizzi et al. @ ESWC’08, Fundam. Inform. ’10]
    number of clusters required
Concept Drift and Novelty Detection as a Clustering Problem
Automated Concept Drift and Novelty Detection 1/3
Concept Drift and Novelty Detection as a Clustering Problem
Automated Concept Drift and Novelty Detection 2/3
The new instances are considered to be a candidate cluster
An evaluation of it is performed to assess its nature
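A speculative sketch of this decision step: the candidate cluster (here reduced to its medoid) is compared against the existing clusters' medoids; a nearby match suggests drift of an existing concept, while an isolated cluster suggests novelty. The threshold and distance are illustrative assumptions.

```python
# Speculative sketch: classify a candidate cluster of new instances
# as concept drift or novelty. Threshold and distance are assumptions.
def assess_candidate(candidate_medoid, known_medoids, dist, threshold):
    d, nearest = min((dist(candidate_medoid, m), m) for m in known_medoids)
    if d <= threshold:
        return ("drift", nearest)   # shifts/extends an existing concept
    return ("novelty", None)        # isolated: a new emerging concept

print(assess_candidate(5.0, [1.0, 4.6], dist=lambda a, b: abs(a - b),
                       threshold=0.5))  # ('drift', 4.6)
```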
Concept Drift and Novelty Detection as a Clustering Problem
Automated Concept Drift and Novelty Detection 3/3
Concept Drift and Novelty Detection as a Clustering Problem
Lesson Learnt from Experiments
Intensional descriptions learnt
by using DL concept learning algorithms
Clustering algorithms
applied on publicly available ontologies
evaluated by the use of standard clustering validity indexes (e.g. Generalized Dunn’s index, cohesion index, Silhouette index)
Necessity of a domain expert/gold standard, particularly for validating the concept novelty/drift
Concept Drift and Novelty Detection as a Clustering Problem Conclusions
Conclusions
Machine Learning methods
could be usefully exploited for ontology mining
suitable in case of incoherent/noisy KBs
can be seen as an additional layer on top of deductive reasoning, realizing new/additional forms of approximate reasoning capabilities
Future directions:
Semi-Supervised Learning methods particularly appealing for LOD
Special focus on scalability issues
That’s all!
Thank you
Nicola Fanizzi Giuseppe Rizzo Floriana Esposito
Refinement Operators
Downward refinement operators specializing a concept C
ρ1: C′ = C ⊓ (¬)A
ρ2: C′ = C ⊓ (¬)(∃)R.⊤
ρ3: C′ = C ⊓ (¬)(∀)R.⊤
ρ4: ∃R.C′i ∈ ρ(∃R.Ci) ∧ C′i ∈ ρ(Ci)
ρ5: ∀R.C′i ∈ ρ(∀R.Ci) ∧ C′i ∈ ρ(Ci)
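As a toy illustration, the atomic operators ρ1-ρ3 can be rendered over string-encoded concepts. This is an assumption for illustration only, not a DL reasoner API; it simply enumerates the conjuncts each operator may add.

```python
# Toy rendering of the downward refinement operators ρ1-ρ3 above,
# over concepts encoded as plain strings (illustrative assumption).
def refinements(concept, atomic_concepts, roles):
    out = []
    for a in atomic_concepts:                      # ρ1: conjoin (¬)A
        out += [f"({concept} ⊓ {a})", f"({concept} ⊓ ¬{a})"]
    for r in roles:                                # ρ2, ρ3: conjoin (¬)(∃/∀)R.⊤
        out += [f"({concept} ⊓ ∃{r}.⊤)", f"({concept} ⊓ ¬∃{r}.⊤)",
                f"({concept} ⊓ ∀{r}.⊤)", f"({concept} ⊓ ¬∀{r}.⊤)"]
    return out

r = refinements("Person", ["Bank"], ["hasPublication"])
print(len(r))  # 6
```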