Probabilistic Description Logics: Reasoning and
Learning
Riccardo Zese
Universita di Ferrara
Email: [email protected]
R. Zese Probabilistic Description Logics 1 / 61
Presented Systems
The systems we developed, which I will mention or demonstrate during the talk, are all available at the Machine Learning@unife Web page: http://ml.unife.it/
The content of these slides is taken from the book Probabilistic Semantic Web: Reasoning and Learning, published by IOS Press. http://www.iospress.nl/book/probabilistic-semantic-web/
Semantic Web
[Figure: Semantic Web layer stack. Layers: User interface and applications; Trust; Proof; Unifying logic; Taxonomies: RDFS; Rules: RIF/SWRL; Querying: SPARQL; Data interchange: RDF; Syntax: XML+NS; Identifiers: URI/IRI; Character set: UNICODE. Side labels: Certification, Cryptography.]
Semantic Web
The Semantic Web aims at making information available in a form that is understandable and automatically manageable by machines.
Ontologies, or knowledge bases, are engineering artifacts used for this purpose.
Description Logics are (decidable) subsets of First Order Logic, languages used for modeling (and reasoning upon) ontologies and the Semantic Web.
Big Data
Outline
1 Combining Logic and Probability
2 Inference Algorithms
3 Learning Algorithms
Combining Logic and Probability Description Logics
Description Logics [Baader et al., 1994]
(Decidable) subsets of First Order Logic, basis of OWL
Open World Assumption
Individuals, Class concepts, roles, and concept-forming operators
KB = ⟨ABox, TBox, RBox⟩
tom : Cat
(kevin, tom) : hasAnimal
Cat ⊑ Pet
∃hasAnimal.Pet ⊑ NatureLover
hasPet ⊑ hasAnimal
Description Logics
Some concept-forming operators:
class conjunction C ⊓ D
class disjunction C ⊔ D
existential restriction ∃R.C
...
Some types of axioms:
class inclusion/subsumption C ⊑ D
class equivalence C ≡ D
role chains R1 ∘ ... ∘ Rn ⊆ R
inverse roles R ≡ S⁻
Decidability (and tractability) is a central issue of DLs
Description Logics
DLs differ depending on which operators are admitted; more operators means:
higher expressivity
higher computational costs
the logic might be undecidable
...
http://www.cs.man.ac.uk/~ezolin/dl/, a nice Web application that shows decidability and complexity issues depending on which operators you decide to include in or exclude from your logic, with extensive references
KB_SW: A simple example in ALC
A long time ago in a galaxy far, far away...
The class of Stormtroopers is that of soldiers of the Galactic Empire having at least one blaster equipped, and also only fighting Ewoks
Stormtrooper ≡ ∃hasEquip.Blaster ⊓ ∀hasFought.Ewok
The class of blasters is the union of blaster rifles and blaster pistols
Blaster ≡ Blaster Rifle ⊔ Blaster Pistol
FN-2199 is also equipped with a blaster, but not all the foes he has fought were Ewoks
FN-2199 : ∃hasEquip.Blaster ⊓ ¬(∀hasFought.Ewok)
We can check whether KB_SW ⊨ FN-2199 : Stormtrooper, and it does not hold, but...
Meet FN-2199
Combining Logic and Probability OWL
OWL: Web Ontology Language
Developed by the W3C for representing ontologies
Three different sublanguages:
OWL-Lite, based on SHIF, good for thesauri or hierarchies
OWL DL, based on SHOIN, more expressive, computationally complete, decidable, with practical reasoning algorithms available
OWL Full, highly expressive, also higher-order logic, undecidable
OWL 1.1 and OWL 2 based on SROIQ
A more elaborate example for Stormtrooper in SROIQ
Stormtroopers are soldiers of the Galactic Empire having at least one blaster equipped and that have fought at least 10 Ewoks (a qualified number restriction)
Stormtrooper ≡ ∃hasEquip.Blaster ⊓ ≥ 10 hasFought.Ewok
FN-2199 is equipped with a blaster pistol and a baton, and has fought at least 10 Ewoks (but also humans and other races/species)
FN-2199 : ∃hasEquip.Blaster ⊓ ∃hasEquip.Baton ⊓ ≥ 10 hasFought.Ewok ⊓ ¬(∀hasFought.Ewok)
I am quite happy now, since, in this KB, FN-2199 is a Stormtrooper!
OWL 2: retaining tractability
OWL 2 profiles, identified as maximal OWL 2 sublanguages still implementable in PTime; each is more restrictive than OWL DL:
OWL QL disallows ∃ (fragment of Datalog, tractable, conjunctive queries answered in LogSpace)
OWL EL disallows inverse roles (tractable, used in ontologies with huge DBs such as SNOMED CT)
OWL RL: subclass axioms are understood as rule-like implications with head (superclass) and body (subclass), with restrictions on subclasses and superclasses (e.g., no existentials in the heads; standard reasoning is PTime-complete)
Combining Logic and Probability Probabilistic Description Logics
Probabilistic Description Logics
Semantic Web
Incompleteness and uncertainty are intrinsic to much information on the World Wide Web
Most common approaches: probability theory, Fuzzy Logic
Probabilistic extensions of DL based on Bayesian networks [Koller et al., 1997, Ding and Peng, 2004]
Statistical terminological knowledge in PCL [Jaeger, 1994]
PR-OWL, a framework for building probabilistic ontologies [Carvalho et al., 2010, Laskey and da Costa, 2005]
P-SHIQ(D) [Lukasiewicz, 2008], semantically based on the notion of probabilistic lexicographic entailment
Prob-ALC [Lutz and Schroder, 2010] and crALC [Ochoa-Luna et al., 2011], extending ALC with epistemic and statistical axioms respectively
BEL [Ceylan et al., 2015], exploiting Bayesian networks to extend the EL DL
Uncertainty Representation in LP
Let us look at the Logic Programming (LP) field
Uncertain relationships among entities characterize many complex domains
Most common approach: probability theory → Distribution Semantics [Sato, 1995].
It underlies many languages (ICL, PRISM, ProbLog, LPADs, ...)
They define a probability distribution over logic programs, called worlds
The distribution is extended to a joint distribution over worlds and queries
The probability of a query is obtained from this distribution by summing out the worlds
Inference for PLP under DS
Computing the probability of a query
Knowledge compilation: compile the program to an intermediate representation
Binary Decision Diagrams (ProbLog [De Raedt et al., 2007], cplint (http://cplint.ml.unife.it/) [Riguzzi, 2007, Riguzzi, 2009], PITA [Riguzzi and Swift, 2011])
deterministic, Decomposable Negation Normal Form circuits (d-DNNF) (ProbLog2 (https://dtai.cs.kuleuven.be/problog/editor.html) [Fierens et al., 2015])
Sentential Decision Diagrams (ProbLog2)
and then compute the probability by weighted model counting
Bayesian Network based:
Convert to BN
Use BN inference algorithms (CVE [Meert et al., 2010])
Lifted inference [Poole, 2003]
Knowledge Compilation
Assigns boolean random variables to the probabilistic rules
Given a query Q, compute its explanations: assignments to the random variables that are sufficient for entailing the query
Let K be the set of all possible explanations
Build the DNF Boolean formula:
F(Q) = \bigvee_{\kappa \in K} \left( \bigwedge_{X \in \kappa} X \wedge \bigwedge_{\overline{X} \in \kappa} \overline{X} \right)
Then build a BDD / d-DNNF / SDD representing F(Q), and compute the probability of Q from it by Shannon expansion
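The Shannon expansion step can be illustrated directly on a DNF formula, without building a BDD. The following is only a minimal sketch under stated assumptions: the function name dnf_probability, the dictionary encoding of explanations, and the variable names X1..X3 are illustrative choices, not part of any of the systems cited above, and the variables are assumed independent.

```python
def dnf_probability(explanations, probs):
    """Probability of a DNF formula by Shannon expansion.

    explanations: list of dicts {variable: required truth value}
    probs: dict {variable: probability that the variable is true}
    Variables are assumed to be independent (illustrative assumption).
    """
    # An empty explanation is already satisfied: the formula is true.
    if any(len(e) == 0 for e in explanations):
        return 1.0
    # No explanation left: the formula is false.
    if not explanations:
        return 0.0
    # Pick a variable that still occurs in the formula.
    var = next(iter(explanations[0]))
    p = probs[var]

    def condition(value):
        # Drop explanations contradicted by var=value,
        # and remove var from the remaining ones.
        out = []
        for e in explanations:
            if var not in e:
                out.append(e)
            elif e[var] == value:
                out.append({v: b for v, b in e.items() if v != var})
        return out

    # F = (var AND F|var=1) OR (NOT var AND F|var=0)
    return (p * dnf_probability(condition(True), probs)
            + (1 - p) * dnf_probability(condition(False), probs))


if __name__ == "__main__":
    expl = [{"X1": True, "X3": True}, {"X2": True, "X3": True}]
    print(dnf_probability(expl, {"X1": 0.4, "X2": 0.3, "X3": 0.6}))
```

A BDD can be seen as a cached version of this recursion in which identical sub-formulas are shared, which is what makes the compiled representations efficient in practice.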
DISPONTE: DIstribution Semantics for Probabilistic ONTologiEs
Idea: annotate axioms of an ontology with a probability, under the assumption that the axioms are independent of each other
0.6 :: Cat ⊑ Pet
The probability value specifies a degree of belief in the corresponding axiom.
A probabilistic ontology thus defines a distribution over theories (worlds) obtained by including an axiom in a world with the probability given by the annotation
Example - people+pets ontology
fluffy is a Cat with probability 0.4 and tom is a Cat with probability 0.3; Cats are Pets with probability 0.6. Everyone who has a pet animal (∃hasAnimal.Pet) is a NatureLover; kevin has two animals, fluffy and tom.
0.4 :: fluffy : Cat (1)
0.3 :: tom : Cat (2)
0.6 :: Cat ⊑ Pet (3)
∃hasAnimal.Pet ⊑ NatureLover
(kevin, fluffy) : hasAnimal
(kevin, tom) : hasAnimal
The KB has 8 worlds, and Q = kevin : NatureLover is true in 3 of them:
{ (1), (3) }, { (2), (3) }, { (1), (2), (3) }
P(Q) = 0.4 × 0.6 × (1 − 0.3) + 0.3 × 0.6 × (1 − 0.4) + 0.4 × 0.3 × 0.6 = 0.348
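The world-based computation for the people+pets ontology can be reproduced by brute-force enumeration. This is a sketch for illustration only: the entailment check query_holds is hand-coded for this specific KB (a real system would call a DL reasoner), and the names PROBS, query_holds and query_probability are assumptions of the sketch.

```python
from itertools import product

# Probabilistic axioms (1), (2), (3) of the people+pets ontology.
PROBS = {1: 0.4, 2: 0.3, 3: 0.6}


def query_holds(world):
    # Hand-coded stand-in for a reasoner: kevin : NatureLover needs
    # Cat ⊑ Pet (3) plus at least one of the two Cat assertions (1), (2).
    return 3 in world and (1 in world or 2 in world)


def query_probability():
    total = 0.0
    true_worlds = []
    # Each world includes or excludes every probabilistic axiom.
    for choice in product([False, True], repeat=len(PROBS)):
        world = {ax for ax, keep in zip(PROBS, choice) if keep}
        weight = 1.0
        for ax, p in PROBS.items():
            weight *= p if ax in world else 1 - p
        if query_holds(world):
            total += weight
            true_worlds.append(world)
    return total, true_worlds


if __name__ == "__main__":
    prob, worlds = query_probability()
    print(prob, worlds)
```

Enumeration is exponential in the number of probabilistic axioms, which is why the inference systems discussed in this talk work on explanations and compiled representations instead.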
Inference and Query answering
The probability of a query Q can be computed according to the distribution semantics by first finding the explanations for Q in the KB
Explanation: set of Boolean variables corresponding to a set of axioms of the KB that is sufficient for entailing Q
All the explanations for Q must be found, corresponding to all (minimal) ways of proving Q
Probability of Q → probability of the DNF formula
F(Q) = \bigvee_{\kappa \in K} \bigwedge_{X \in \kappa} X
where K is the set of explanations and each X is a Boolean random variable associated with an axiom in κ
We exploit Binary Decision Diagrams (BDDs) for efficiently computing the probability of a DNF formula
Example - people+pets ontology cont.
Explanations: { (1), (3) }, { (2), (3) } (certain axioms are irrelevant for computing the query probability)
P(Q) = 0.4 × 0.6 × (1 − 0.3) + 0.3 × 0.6 = 0.348
We associate the random variables X1 with (1), X2 with (2) and X3 with (3).
f(X) = (X1 ∧ X3) ∨ (X2 ∧ X3)
[Figure: BDD for f(X), with node n1 testing X1 (probability 0.4), node n2 testing X2 (0.3), node n3 testing X3 (0.6), and terminal nodes 1 and 0.]
The probability is:
P(n3) = 0.6 · 1 + 0.4 · 0 = 0.6
P(n2) = 0.3 · 0.6 + 0.7 · 0 = 0.18
P(n1) = 0.4 · 0.6 + 0.6 · 0.18 = 0.348
So P(Q = kevin : NatureLover) = P(n1) = 0.348.
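The bottom-up computation over the BDD can be written as a short recursive traversal. The sketch below hard-codes the three-node diagram of this example; the dictionary encoding (variable, child when true, child when false) and the function name node_probability are illustrative assumptions, not the data structures used by BUNDLE or JavaBDD.

```python
# BDD for f(X) = (X1 AND X3) OR (X2 AND X3), variable order X1, X2, X3.
# Each node maps to (variable, child if true, child if false);
# "1" and "0" are the terminal nodes.
BDD = {
    "n1": ("X1", "n3", "n2"),
    "n2": ("X2", "n3", "0"),
    "n3": ("X3", "1", "0"),
}
PROBS = {"X1": 0.4, "X2": 0.3, "X3": 0.6}


def node_probability(node, cache=None):
    """P(node) = p * P(true child) + (1 - p) * P(false child)."""
    if cache is None:
        cache = {}
    if node == "1":
        return 1.0
    if node == "0":
        return 0.0
    if node not in cache:
        var, high, low = BDD[node]
        p = PROBS[var]
        # Shared nodes (here n3) are evaluated once thanks to the cache.
        cache[node] = (p * node_probability(high, cache)
                       + (1 - p) * node_probability(low, cache))
    return cache[node]


if __name__ == "__main__":
    for name in ("n3", "n2", "n1"):
        print(name, node_probability(name))
```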
PCL [Jaeger, 1994]
Allows the definition of
P(C|D) = [pl, pu], called probabilistic terminological axioms, defining statistical information
P(C(a)) = p, defining probabilistic assertions
Inference can be done by:
1 naive method for computing optimal bounds using cross-entropy minimization
2 linear optimization problem
P-SHIQ(D) [Lukasiewicz, 2008]
Allows the definition of
(C|D)[l, u] (as in [Jaeger, 1994])
(∃R.{a}|C)[l, u] (an arbitrary instance of a concept C is R-related to the individual a with probability in the interval [l, u])
(C|{a})[l, u] and (∃R.{b}|{a})[l, u], defining probabilistic assertions (these correspond to DISPONTE)
Terminological knowledge is interpreted statistically (as in [Jaeger, 1994])
Assertional knowledge is interpreted in an epistemic way.
Semantics based on Nilsson’s probabilistic logic [Nilsson, 1986], which defines probabilistic interpretations instead of a single probability distribution over theories.
PR-OWL [Carvalho et al., 2010, Laskey and da Costa, 2005]
An upper ontology that provides a framework for building probabilistic ontologies. It allows the use of the first-order probabilistic logic MEBN [Laskey and da Costa, 2005] for representing uncertainty in ontologies.
DISPONTE differs from PR-OWL because it allows the reuse of inference technology from DLs by minimally extending DL.
Prob-ALC [Lutz and Schroder, 2010]
Allows the definition of
P≥n C, indicating the set of individuals belonging to C with probability greater than n
∃P≥n R.C, the set of individuals a connected to at least another individual b of C by role R such that the probability of R(a, b) is greater than n
P≥n C(a) and P≥n R(a, b), defining probabilistic assertions
Considers only epistemic probabilities (as DISPONTE)
Complementary to DISPONTE ALC, as it allows new concept and assertional expressions while DISPONTE allows probabilistic axioms.
crALC [Ochoa-Luna et al., 2011]
Allows the definition of
P(C|D) = α: for any element x in the domain, the probability that x is in C given that it is in D is α
P(R) = β: for each pair of elements x and y in the domain, the probability that x is linked to y by the role R is β
Considers only statistical probabilities
Adopts an interpretation-based semantics based on probability measures over the space of interpretations with a fixed domain.
Inference is performed by a first-order loopy belief propagation algorithm on ground directed acyclic graphs that represent the KBs w.r.t. queries.
BEL [Ceylan et al., 2015]
Extends the EL DL by exploiting a Bayesian network with variables V.
Allows the definition of
E : X = x, where E is a DL axiom and X = x is an annotation with X ⊆ V and x a set of values for these variables.
The Bayesian network assigns a probability to every assignment of V, called a world.
The probability of a query Q = E : X = x is given by the sum of the probabilities of the worlds where X = x is satisfied and where E is a logical consequence of the theory composed of the axioms whose annotation is true in the world.
DISPONTE is a special case of this semantics where every axiom Ei : Xi = xi is such that Xi is a single Boolean variable and the Bayesian network has no edges, i.e., all the variables are independent.
Inference Algorithms Tableau method
Reasoning on a KB : approaches
A variety of reasoning techniques (at least for ALC):
Resolution-based approaches
Automata based approaches
Structural approaches
Tableau based approaches
Most DL reasoners use a tableau algorithm for doing inference, and most of them are implemented in a procedural language (e.g., Pellet, RacerPro, FaCT++)
Tableau-based approaches
They usually solve the KB (in)consistency problem, i.e., determine whether a given KB has a model, reducing reasoning tasks to this problem:
Concept unsatisfiability
KB ⊨ C ≡ ⊥ ⇔ KB ∪ {C(x)} has no model
Subsumption
KB ⊨ C ⊑ D ⇔ KB ∪ {(C ⊓ ¬D)(x)} has no model
Instance checking
KB ⊨ C(a) ⇔ KB ∪ {¬C(a)} has no model
The Tableau Algorithm
Starts from the ABox A, represented as a graph (the tableau), where
Each node represents an individual a and is labeled with the set of concepts it belongs to;
Each edge between two individuals a and b is labeled with the set of roles to which the pair (a, b) belongs.
Checks whether a query axiom is entailed by refutation, e.g.:
To test a class assertion axiom C(a), it adds ¬C to the label of a.
To test the unsatisfiability of a concept C, it adds a new anonymous node x to the tableau and adds C to the label of x.
Applies expansion rules (one for each language construct, possibly non-deterministic), until all constraints are satisfied or a contradiction (clash) is detected
Uses a blocking system (to avoid loops, and ensure termination)
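To make the expansion-and-clash idea concrete, here is a minimal sketch of a tableau-style satisfiability test for ALC concepts in negation normal form, with no TBox (so no blocking is needed). It is a simplified textbook procedure, not the algorithm of any reasoner named in this talk; the tuple encoding of concepts and the function name is_satisfiable are assumptions of the sketch.

```python
# Concepts in negation normal form, encoded as tuples:
#   ("atom", "A")            atomic concept A
#   ("not", ("atom", "A"))   negated atomic concept
#   ("and", C, D), ("or", C, D)
#   ("some", "R", C)         existential restriction
#   ("all", "R", C)          universal restriction


def is_satisfiable(label):
    """Check whether the conjunction of the concepts in label has a model."""
    label = set(label)
    # Deterministic and-rule: add all conjuncts until nothing changes.
    changed = True
    while changed:
        changed = False
        for c in list(label):
            if c[0] == "and":
                for part in c[1:]:
                    if part not in label:
                        label.add(part)
                        changed = True
    # Clash: a concept and its negation in the same node label.
    for c in label:
        if c[0] == "not" and c[1] in label:
            return False
    # Non-deterministic or-rule: try each disjunct (or-branching).
    for c in label:
        if c[0] == "or" and not any(d in label for d in c[1:]):
            return any(is_satisfiable(label | {d}) for d in c[1:])
    # Exists-rule: each existential needs a successor that also
    # satisfies every universal restriction on the same role.
    for c in label:
        if c[0] == "some":
            _, role, filler = c
            successor = {filler} | {
                d[2] for d in label if d[0] == "all" and d[1] == role
            }
            if not is_satisfiable(successor):
                return False
    return True


if __name__ == "__main__":
    ewok, blaster = ("atom", "Ewok"), ("atom", "Blaster")
    stormtrooper = ("and", ("some", "hasEquip", blaster),
                    ("all", "hasFought", ewok))
    # NNF of FN-2199's description in KB_SW: not(all R.C) = some R.(not C)
    fn2199 = ("and", ("some", "hasEquip", blaster),
              ("some", "hasFought", ("not", ewok)))
    print(is_satisfiable({stormtrooper, fn2199}))
```

In the usage example, the ALC definition of Stormtrooper is unfolded by hand; the test shows that it cannot hold together with the description of FN-2199, since the hasFought successor would have to be both an Ewok and not an Ewok.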
The Tableau Algorithm
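tom : Cat
(kevin, tom) : hasAnimal
Cat ⊑ Pet
∃hasAnimal.Pet ⊑ NatureLover
hasPet ⊑ hasAnimal
Q = kevin : NatureLover
[Figure: tableau before and after expansion. Initially, node kevin is labeled ¬NatureLover and is linked by a hasAnimal edge to node tom, labeled Cat. After expansion, kevin is labeled ∃hasAnimal.Pet, NatureLover, ¬NatureLover (a clash), the edge is labeled hasAnimal, hasPet, and tom is labeled Cat, Pet.]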
Expansion Rules
→ unfold: if A ∈ L(a), A atomic and (A ⊑ D) ∈ KB, then
    if D ∉ L(a), then
        L(a) := L(a) ∪ {D}
        τ(D, a) := τ(A, a) ∪ {(A ⊑ D, a)}
→ ⊔: if (C1 ⊔ C2) ∈ L(a) and a is not blocked, then
    if {C1, C2} ∩ L(a) = ∅, then
        Generate graphs Gi := G for each i ∈ {1, 2},
        L(a) := L(a) ∪ {Ci} for each i ∈ {1, 2}
        τ(Ci, a) := τ((C1 ⊔ C2), a)
Output: explanation, as a set of axioms of the KB entailing the query (thanks to the tracing function τ)
Often, we also want to find all the possible explanations for a query (axiom pinpointing)
The Tableau Algorithm: OR-search tree
Inference Algorithms Reasoners
Probabilistic Reasoners
Despite the availability of many DL reasoners, the number of probabilistic reasoners is quite small
BUNDLE
TRILL
TRILLP
TORNADO
PRONTO [Klinov, 2008], which follows the P-SHIQ(D) semantics
BORN [Ceylan et al., 2015], following BEL
BUNDLE
Binary decision diagrams for Uncertain reasoNing on Description Logic thEories
BUNDLE performs inference over DISPONTE OWL knowledge bases
It exploits the underlying ontology reasoner Pellet [Sirin et al., 2007], which is able to return all explanations for a query
Explanations for a query are in the form of a set of sets of axioms
BUNDLE performs a double loop over the set of explanations and over the set of axioms in each explanation, in which it builds a BDD representing the set of explanations
JavaBDD library for the manipulation of BDDs
https://sites.google.com/a/unife.it/ml/bundle
Why not Prolog?
Reasoners based on procedural languages also have to implement a backtracking algorithm to find all the possible explanations
Example: BUNDLE (Java-based) uses a hitting set algorithm that repeatedly removes an axiom from the KB and then computes a new explanation
Reasoners written in Prolog can exploit Prolog’s backtracking facilities for performing the search
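The hitting-set idea mentioned above (remove one axiom of a found explanation, then search again) can be sketched as follows. This is a simplified illustration, not the implementation in BUNDLE or Pellet: entails is a hand-coded, hypothetical stand-in for a reasoner call on the people+pets ontology, and the names one_explanation and all_explanations are assumptions.

```python
def entails(axioms):
    # Hypothetical stand-in for a DL reasoner: kevin : NatureLover follows
    # when Cat ⊑ Pet and at least one Cat assertion are present.
    return "Cat ⊑ Pet" in axioms and bool(
        {"fluffy : Cat", "tom : Cat"} & axioms)


def one_explanation(axioms):
    """Shrink axioms to one minimal subset that still entails the query."""
    if not entails(axioms):
        return None
    expl = set(axioms)
    for ax in sorted(axioms):
        # Drop the axiom if the query is still entailed without it.
        if entails(expl - {ax}):
            expl.discard(ax)
    return frozenset(expl)


def all_explanations(kb):
    """Explore the hitting set tree: after finding an explanation, remove
    each of its axioms in turn from the KB and search again."""
    found, seen = set(), set()
    stack = [frozenset(kb)]
    while stack:
        axioms = stack.pop()
        if axioms in seen:
            continue
        seen.add(axioms)
        expl = one_explanation(axioms)
        if expl is None:
            continue
        found.add(expl)
        for ax in expl:
            stack.append(axioms - {ax})
    return found


if __name__ == "__main__":
    kb = {"fluffy : Cat", "tom : Cat", "Cat ⊑ Pet"}
    for e in all_explanations(kb):
        print(sorted(e))
```

The explicit stack plays the role that backtracking plays for free in a Prolog implementation.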
Reasoning: use of Logic Programming
Some tableau expansion rules are non-deterministic (e.g., t)
Reasoners implement a search strategy in an or-branching space
The algorithm has to explore all the non-deterministic choices
LP-based implementations of the tableau (e.g., [Meissner, 2004], [Herchenroder, 2006], [Beckert and Posegga, 1995])
LP-based inference methods based on resolution (e.g., in the DLog system by [Lukacsy and Szeredi, 2009])
Bottom-up inference methods, e.g., based on Answer Set Programming (e.g., ontoDLP, and its reasoner ontoDLV [Ricca et al., 2009]), or the chase algorithm for Datalog± [Calì et al., 2009]
TRILL - Tableau Reasoner for descrIption Logics in proLog
Implements the tableau algorithm using Prolog
Prolog’s search strategy is exploited for the non-determinism of the reasoning process
Solves the axiom pinpointing problem, since we are interested in the set of explanations that entail a query
Applies all the possible expansion rules, first the non-deterministic ones then the deterministic ones
Computes the probability of the query from the set of explanations
https://sites.google.com/a/unife.it/ml/trill
TRILLP - Tableau Reasoner for descrIption Logics in proLog powered by Pinpointing formula
TRILLP solves the axiom pinpointing problem by computing a pinpointing formula [Baader and Penaloza, 2010]
1 The pinpointing formula is a monotone Boolean formula on the variables associated to axioms that compactly encodes the set of all explanations
2 Like F(Q) when finding all explanations, except it may not be in DNF: ((F1 ∨ F3) ∧ F2) instead of ((F1 ∧ F2) ∨ (F3 ∧ F2))
3 Convert it into a BDD, so we can compute the probability as in BUNDLE
https://sites.google.com/a/unife.it/ml/trill
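The relation between a pinpointing formula and the set of explanations can be shown by expanding a monotone formula into its minimal conjunctions. This is an illustrative sketch only; the nested-tuple encoding and the function name explanations are assumptions, and TRILLP itself does not need to perform this expansion.

```python
def explanations(formula):
    """Expand a monotone Boolean formula into its minimal explanations.

    A formula is a variable name (str) or a tuple ("and", f1, f2, ...)
    or ("or", f1, f2, ...). Returns a set of frozensets of variables.
    """
    if isinstance(formula, str):
        return {frozenset([formula])}
    op, *args = formula
    parts = [explanations(a) for a in args]
    if op == "or":
        # Any explanation of any disjunct explains the disjunction.
        result = set().union(*parts)
    elif op == "and":
        # Combine one explanation from each conjunct.
        result = {frozenset()}
        for part in parts:
            result = {a | b for a in result for b in part}
    else:
        raise ValueError(f"unknown operator: {op}")
    # Keep only the minimal sets.
    return {e for e in result if not any(o < e for o in result)}


if __name__ == "__main__":
    compact = ("and", ("or", "F1", "F3"), "F2")
    dnf = ("or", ("and", "F1", "F2"), ("and", "F3", "F2"))
    print(explanations(compact) == explanations(dnf))
```

The expansion can be exponentially larger than the formula, which is the motivation for keeping the compact form and compiling it to a BDD directly.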
TORNADO - Trill powered by pinpOinting foRmulas and biNAry DecisiOn diagrams
TORNADO solves the axiom pinpointing problem by computing a BDD representing the pinpointing formula
Using BDDs instead of explanations or pinpointing formulas usually makes TORNADO faster and avoids some exponential blow-ups
https://sites.google.com/a/unife.it/ml/trill
TRILL-on-SWISH - A web interface for TRILL, TRILLP and TORNADO
TRILL-on-SWISH is an interface for the reasoners TRILL, TRILLP and TORNADO.
It allows users to write a KB in the RDF/XML format or in Prolog syntax directly in the web page, or to load it from a URL, and to specify queries that are answered by TRILL running on the server
Available at http://trill.ml.unife.it
PRONTO [Klinov, 2008]
A reasoner for P-SHIQ(D).
Based on Pellet, implemented by same authors.
Exploits a linear program solver such as GLPK (https://www.gnu.org/software/glpk/).
Performs probabilistic lexicographic entailment by solving Probabilistic Satisfiability problems and tight logical entailments.
Pellet is used to help generate the linear programs given as input to the linear program solver.
BORN [Ceylan et al., 2015]
Answers probabilistic subsumption queries w.r.t. BEL KBs.
Exploits ProbLog for managing the probabilistic part of the KB.
Can perform inference on KBs of low expressivity (EL DL).
Learning Algorithms
Probabilistic Learning
Problem
Input: Background knowledge as a (probabilistic) KB B, a set of positive and negative examples E+ and E−, and possibly a language bias L.
Output: A probabilistic KB P such that the probability of positive examples according to P ∪ B is maximized and the probability of negative examples is minimized.
Two variants:
1 parameter learning: learning the parameters of a fixed probabilistic logic program B
2 structure learning: learning both the structure and the parameters of a probabilistic logic program B
Learning Algorithms Expectation Maximization
Parameter Learning
Parameter learning for languages following the distribution semantics has been performed by using the Expectation Maximization (EM) algorithm or by gradient descent.
Gradient Descent
Computes the gradient of the target function and iteratively modifies the parameters, moving in the direction of the gradient.
LeProbLog [Gutmann et al., 2008]: use of a dynamic programming algorithm for computing the gradient exploiting BDDs.
Expectation Maximization
EM
Estimates the probability of models containing random variables that are not observed in the data (hidden).
Used in many systems: PRISM [Sato and Kameya, 2001], LFI-ProbLog [Fierens et al., 2015], EMBLEM [Bellodi and Riguzzi, 2013], RIB [Riguzzi and Di Mauro, 2012].
Why EM?
We cannot use relative frequency.
1 For each example we can find more than one explanation with more than one probabilistic axiom.
2 We have to associate a variable to each (probabilistic) axiom.
We know only the value of a Boolean function that uses the variables, thus they are hidden.
Expectation Maximization
Expectation: for each example Q, computes E[c_{i0} | Q] and E[c_{i1} | Q] for all axioms E_i, where c_{ix} is the number of times variable X_i takes value x, for x ∈ {0, 1}:
E[c_{ix} \mid Q] = P(X_i = x \mid Q) = \frac{P(X_i = x, Q)}{P(Q)}
Then it sums up the contributions of the different examples:
E[c_{ix}] = \sum_{Q} E[c_{ix} \mid Q]
Maximization: computes p_i for all axioms E_i:
p_i = \frac{E[c_{i1}]}{E[c_{i0}] + E[c_{i1}]}
The EM algorithm is guaranteed to find a local maximum of the probability of the examples.
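One EM iteration following these formulas can be sketched by brute-force enumeration of worlds. This is a toy illustration under explicit assumptions: it handles positive examples only, each example is given as its list of explanations (sets of variable indices), and probabilities are obtained by enumerating all worlds rather than by traversing BDDs as EDGE and EMBLEM do. The names holds, world_prob and em_step are hypothetical.

```python
from itertools import product


def holds(dnf, world):
    # The example is true in a world if all variables of at least
    # one of its explanations are true in that world.
    return any(all(world[i] for i in expl) for expl in dnf)


def world_prob(world, p):
    prob = 1.0
    for x, pi in zip(world, p):
        prob *= pi if x else 1 - pi
    return prob


def em_step(p, examples):
    """One EM iteration; p[i] is the current probability of axiom i."""
    n = len(p)
    counts = [[0.0, 0.0] for _ in range(n)]  # counts[i][x] = E[c_ix]
    for dnf in examples:
        worlds = [w for w in product([0, 1], repeat=n) if holds(dnf, w)]
        p_q = sum(world_prob(w, p) for w in worlds)  # P(Q)
        for w in worlds:
            weight = world_prob(w, p) / p_q  # P(world | Q)
            for i, x in enumerate(w):
                counts[i][x] += weight  # adds up to P(X_i = x | Q)
    # Maximization: relative frequency of the expected counts.
    return [c1 / (c0 + c1) for c0, c1 in counts]


if __name__ == "__main__":
    # Axioms (1), (2), (3) of the people+pets example, 0-indexed.
    p = [0.4, 0.3, 0.6]
    examples = [[{0, 2}, {1, 2}]]  # kevin : NatureLover observed as true
    print(em_step(p, examples))
```

With a single positive example the update for the third axiom is 1, since Cat ⊑ Pet is true in every world where the query holds; negative examples, which the sketch omits, would pull the parameters in the other direction.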
Learning Algorithms Learning systems
Probabilistic Learning Systems
Some learning systems for DL KBs:
EDGE
LEAP
crALC [Ochoa-Luna et al., 2011], both parameter and structure learning
GoldMiner [Volker and Niepert, 2011, Fleischhacker and Volker, 2011], both parameter and structure learning
EDGE: Parameter Learning in DISPONTE
EDGE (Em over bDds for description loGics paramEter learning)
1 Learns the parameters (weights) of a probabilistic KB by taking as input a DL KB and a number of positive and negative examples
2 EDGE builds the BDD corresponding to the explanations of each example using BUNDLE
3 Then, it learns the probabilities associated with axioms by traversing the BDDs during the execution of the EM algorithm
EDGEMR (EDGE powered by MapReduce) parallelizes EDGE:
The examples are divided into chunks and assigned to different processes, since each example is independent of the others
The learning phase is spread over all the workers as well
EDGE
1 Expectation takes as input a list of BDDs and computes P(Xi = x | Q) using two passes over the BDD.
Expected counts of random variables: E[ci0 | Q] and E[ci1 | Q] are computed by traversing the BDDs twice
2 Maximization computes the parameter values for the next EM iteration by relative frequency.
LEAP: Structure Learning in DISPONTE
LEAP (LEArning Probabilistic description logics)
1 Learns the structure of a probabilistic KB by taking as input a DL KB and a number of positive and negative examples
2 Exploits CELOE [Lehmann et al., 2011], a top-down algorithm which learns (acyclic) concept expressions, given a set of positive and negative examples
3 Descriptions C for one or more Target classes are created; one probabilistic subsumption axiom at a time, of the form p :: C ⊑ Target, is added to the ontology K;
4 EDGE is run on the extended theory to compute the log-likelihood of the data LL and the updated parameters
If LL is better than the current best LL0, the new axiom is kept in the knowledge base, otherwise the new axiom is discarded.
LEAPMR (LEAP powered by MapReduce) parallelizes LEAP by exploiting EDGEMR
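The keep-or-discard loop of steps 3 and 4 has the shape of a greedy search, which can be sketched as below. Everything specific here is hypothetical: the score argument stands in for running EDGE and reading off the log-likelihood, and toy_score with its TOY_GAIN table is an invented placeholder used only to make the sketch runnable.

```python
def greedy_structure_search(candidates, score):
    """Add candidate axioms one at a time, keeping each only if the
    score (a stand-in for the log-likelihood LL) improves."""
    kb = []
    best = score(kb)  # current best LL0
    for axiom in candidates:
        trial = kb + [axiom]
        ll = score(trial)
        if ll > best:
            kb, best = trial, ll  # keep the new axiom
        # otherwise the axiom is discarded
    return kb, best


# Invented toy scores, NOT real log-likelihoods: each axiom shifts
# the score by a fixed amount.
TOY_GAIN = {
    "p1 :: C1 ⊑ Target": 2.0,
    "p2 :: C2 ⊑ Target": -0.5,
    "p3 :: C3 ⊑ Target": 0.7,
}


def toy_score(kb):
    return -10.0 + sum(TOY_GAIN[a] for a in kb)


if __name__ == "__main__":
    print(greedy_structure_search(list(TOY_GAIN), toy_score))
```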
crALC learner [Ochoa-Luna et al., 2011]
A parameter and structure learner for crALC.
Starts from positive and negative examples for a single concept and the general concept ⊤ in the root of the search tree.
The space of possible concept definitions is explored by means of a revision operator in the style of Inductive Logic Programming.
Parameters are learned using an EM algorithm.
If the best score in the tree is above a threshold, a deterministic concept definition is returned; otherwise a probabilistic inclusion Ci is searched on a weighted spanning tree, where the target concept is added as a parent of each vertex and probabilities are learned as P(Ci | Parents(Ci)).
Each definition is scored against the examples
The definition with the highest score is retained and the algorithm enters a new refinement iteration.
Compared with LEAP: both use a top-down procedure for building axioms (CELOE in LEAP), but LEAP exploits BDD structures to compute the expected counts for EM instead of resorting to inference in a graphical model
GoldMiner [Volker and Niepert, 2011, Fleischhacker and Volker, 2011]
A parameter and structure learner.
Extracts information about individuals, named classes and roles using SPARQL queries
Builds two transaction tables storing class and property assertions, with a row for each individual (pair of individuals) and a column for each named concept and ∃R.C concept (role); a cell contains 1 if the individual (pair) belongs to the concept (role).
The APRIORI algorithm [Agrawal and Srikant, 1994] is applied to each table in turn in order to find association rules (ARs).
Each AR A ⇒ B is converted to the axiom A ⊑ B.
The confidence of the AR can be interpreted as the probability of the axiom.
GoldMiner can be used to obtain a probabilistic knowledge base.
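How the confidence of an association rule turns into an axiom probability can be shown on a tiny transaction table. The sketch below computes only the confidence of a given rule and does not implement APRIORI or GoldMiner; the table contents (including the individuals felix and rex) and the function name confidence are invented for illustration.

```python
def confidence(rows, antecedent, consequent):
    """Confidence of the rule antecedent => consequent:
    fraction of rows containing the antecedent that also
    contain the consequent."""
    having_a = [r for r in rows if antecedent in r]
    if not having_a:
        return None  # the rule never fires
    both = sum(1 for r in having_a if consequent in r)
    return both / len(having_a)


# Toy transaction table: one row per individual, listing the
# named concepts it belongs to (invented data).
ROWS = {
    "tom": {"Cat", "Pet"},
    "fluffy": {"Cat", "Pet"},
    "felix": {"Cat"},
    "rex": {"Dog", "Pet"},
}

if __name__ == "__main__":
    conf = confidence(list(ROWS.values()), "Cat", "Pet")
    # Read as a probabilistic axiom: conf :: Cat ⊑ Pet
    print(f"{conf:.3f} :: Cat ⊑ Pet")
```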
Thank you for listening!
Research group: MachineLearning@unife
http://ml.unife.it/
References I
Agrawal, R. and Srikant, R. (1994).
Fast algorithms for mining association rules in large databases. In International Conference on Very Large Data Bases, pages 487–499. Morgan Kaufmann.
Baader, F. and Penaloza, R. (2010).
Automata-based axiom pinpointing. Journal of Automated Reasoning, 45(2):91–129.
Beckert, B. and Posegga, J. (1995).
leanTAP: Lean tableau-based deduction. J. Autom. Reasoning, 15(3):339–358.
Bellodi, E. and Riguzzi, F. (2013).
Expectation Maximization over Binary Decision Diagrams for probabilistic logic programs. Intell. Data Anal., 17(2):343–363.
Calì, A., Gottlob, G., and Lukasiewicz, T. (2009).
A general datalog-based framework for tractable query answering over ontologies. In Paredaens, J. and Su, J., editors, Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 77–86. ACM.
Carvalho, R. N., Laskey, K. B., and Costa, P. C. G. (2010).
PR-OWL 2.0 - bridging the gap to OWL semantics. In Bobillo, F. et al., editors, 6th International Workshop on Uncertainty Reasoning for the Semantic Web, volume 654 of CEUR-WS. Sun SITE Central Europe.
References II
Ceylan, I. I., Mendez, J., and Penaloza, R. (2015).
The Bayesian ontology reasoner is BORN! In Dumontier, M., Glimm, B., Goncalves, R. S., Horridge, M., Jimenez-Ruiz, E., Matentzoglu, N., Parsia, B., Stamou, G. B., and Stoilos, G., editors, Informal Proceedings of the 4th International Workshop on OWL Reasoner Evaluation (ORE-2015) co-located with the 28th International Workshop on Description Logics (DL 2015), volume 1387 of CEUR-WS, pages 8–14. CEUR-WS.org.
De Raedt, L., Kimmig, A., and Toivonen, H. (2007).
ProbLog: A probabilistic Prolog and its application in link discovery. In Veloso, M. M., editor, IJCAI 2007, volume 7, pages 2462–2467, Palo Alto, California USA. AAAI Press.
Ding, Z. and Peng, Y. (2004).
A probabilistic extension to ontology language OWL. In 37th Hawaii International Conference on System Sciences (HICSS-37 2004), CD-ROM / Abstracts Proceedings, 5-8 January 2004, Big Island, HI, USA. IEEE Computer Society.
Fierens, D., den Broeck, G. V., Renkens, J., Shterionov, D. S., Gutmann, B., Thon, I., Janssens, G., and De Raedt, L. (2015).
Inference and learning in probabilistic logic programs using weighted boolean formulas. Theor. Pract. Log. Prog., 15(3):358–401.
Fleischhacker, D. and Volker, J. (2011).
Inductive learning of disjointness axioms. In On the Move to Meaningful Internet Systems: OTM 2011, pages 680–697. Springer.
References III
Gutmann, B., Kimmig, A., Kersting, K., and De Raedt, L. (2008).
Parameter learning in probabilistic databases: A least squares approach. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2008), volume 5211 of LNCS, pages 473–488. Springer-Verlag.
Herchenroder, T. (2006).
Lightweight semantic web oriented reasoning in Prolog: Tableaux inference for description logics. Master’s thesis, School of Informatics, University of Edinburgh.
Jaeger, M. (1994).
Probabilistic reasoning in terminological logics. In Doyle, J., Sandewall, E., and Torasso, P., editors, Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning (KR’94), Bonn, Germany, May 24-27, 1994, pages 305–316. Morgan Kaufmann.
Klinov, P. (2008).
Pronto: A non-monotonic probabilistic description logic reasoner. In Bechhofer, S., Hauswirth, M., Hoffmann, J., and Koubarakis, M., editors, ESWC 2008, volume 5021 of LNCS, pages 822–826. Springer.
Koller, D., Levy, A. Y., and Pfeffer, A. (1997).
P-CLASSIC: A tractable probablistic description logic. In Kuipers, B. and Webber, B. L., editors, Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI 97, IAAI 97, July 27-31, 1997, Providence, Rhode Island, pages 390–397. AAAI Press / The MIT Press.
References IV
Laskey, K. B. and da Costa, P. C. G. (2005).
Of starships and Klingons: Bayesian logic for the 23rd century. In Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh (UAI 2005), Scotland, July 26-29, 2005, pages 346–353. AUAI Press.
Lehmann, J., Auer, S., Buhmann, L., and Tramp, S. (2011).
Class expression learning for ontology engineering. Web Semantics: Science, Services and Agents on the World Wide Web, 9(1):71–81.
Lukacsy, G. and Szeredi, P. (2009).
Efficient description logic reasoning in Prolog: The DLog system. Theor. Pract. Log. Prog., 9(3):343–414.
Lukasiewicz, T. (2008).
Expressive probabilistic description logics. Artif. Intell., 172(6-7):852–883.
Lutz, C. and Schroder, L. (2010).
Probabilistic Description Logics for subjective uncertainty. In Lin, F., Sattler, U., and Truszczynski, M., editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010, Toronto, Ontario, Canada, May 9-13, 2010, pages 393–403, Menlo Park, CA, USA. AAAI Press.
Meert, W., Struyf, J., and Blockeel, H. (2010).
CP-Logic theory inference with contextual variable elimination and comparison to BDD based inference methods. In De Raedt, L., editor, ILP 2009, volume 5989 of LNCS, pages 96–109. Springer.
References V
Meissner, A. (2004).
An automated deduction system for description logic with ALCN language. Studia z Automatyki i Informatyki, 28-29:91–110.
Nilsson, N. J. (1986).
Probabilistic logic. Artif. Intell., 28(1):71–87.
Ochoa-Luna, J. E., Revoredo, K., and Cozman, F. G. (2011).
Learning probabilistic description logics: A framework and algorithms. In Advances in Artificial Intelligence, pages 28–39. Springer.
Poole, D. (2003).
First-order probabilistic inference. In Gottlob, G. and Walsh, T., editors, 18th International Joint Conference on Artificial Intelligence, pages 985–991. Morgan Kaufmann Publishers Inc.
Ricca, F., Gallucci, L., Schindlauer, R., Dell’Armi, T., Grasso, G., and Leone, N. (2009).
OntoDLV: An ASP-based system for enterprise ontologies. J. Logic Comput., 19(4):643–670.
Riguzzi, F. (2007).
A top down interpreter for LPAD and CP-logic. In Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence, volume 4733 of LNAI, pages 109–120. Springer.
References VI
Riguzzi, F. (2009).
Extended semantics and inference for the Independent Choice Logic. Logic Journal of the IGPL, 17(6):589–629.
Riguzzi, F. and Di Mauro, N. (2012).
Applying the information bottleneck to statistical relational learning. Mach. Learn., 86(1):89–114.
Riguzzi, F. and Swift, T. (2011).
The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theor. Pract. Log. Prog., 11(4–5):433–449.
Sato, T. (1995).
A statistical learning method for logic programs with distribution semantics. In Sterling, L., editor, ICLP 1995, pages 715–729, Cambridge, Massachusetts. MIT Press.
Sato, T. and Kameya, Y. (2001).
Parameter learning of logic programs for symbolic-statistical modeling. J. Artif. Intell. Res., 15:391–454.
Sirin, E., Parsia, B., Cuenca-Grau, B., Kalyanpur, A., and Katz, Y. (2007).
Pellet: A practical OWL-DL reasoner. J. Web Semant., 5(2):51–53.
Volker, J. and Niepert, M. (2011).
Statistical schema induction. In The Semantic Web: Research and Applications, pages 124–138. Springer Berlin Heidelberg.