RR2010 Keynote
Data Validation with OWL Integrity Constraints
Evren Sirin, CTO
Clark & Parsia, LLC
Wednesday, September 22, 2010
Who are we?
• Clark & Parsia is a semantic software startup
– HQ in Washington, DC & office in Boston
• Provides software development and integration services
• Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers
http://clarkparsia.com/
Twitter: @candp
Overview
• Data validation with OWL
– Representation and validation of integrity constraints
• Use cases
– Examples, issues, workarounds
• OWL Integrity Constraints
– Syntax, semantics, validation
• Comparison with other approaches
– Epistemic DLs, Epistemic QLs, Rules
• Implementation and performance
Some Applications
• Customer and product data
– Find which customer would be interested in buying a certain product
• System and component descriptions
– Configure components to build a desired system
• Workforce and employee data
– Locate employees with desired expertise
• Patient history and drug data
– Detect and prevent potentially harmful drug interactions
Common Theme
• There is data and lots of it!
• Adding semantics to the data helps a lot
– Sometimes simple taxonomies, but other times, complex ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
– Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc.
Data Validation
• Fundamental data management problem
– Verify data integrity and correctness
– Enforce validity of updates
• Relevant in many scenarios
– Storing data for stand-alone applications
– Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
– Harder to achieve as data semantics increase and/or more expressive integrity conditions are required
Disclaimer
• Data validity is not important for every use case
– Invalid data may be fine for an application
– Invalidity may even be a requirement
• The focus of this talk is cases where data consistency and integrity are crucial
Building Semantic Apps
• Represent data as RDF triples
– First step for accomplishing data integration and analysis
• Enrich data with more semantics (RDFS, OWL)
– Infer implicit information from explicit assertions
• Ensure data validity
– Detect errors in the data
• Do something cool with the data
– Obviously...
Reasoning Example
• Input ontology
    # Schema: every supervisor is an employee
    Supervisor subClassOf Employee
    # Instance data: Person085 is a supervisor
    Person085 type Supervisor
• Output inferences
    # Person085 is an employee
    Person085 type Employee
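The inference step on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Pellet or any OWL reasoner works internally: triples are plain tuples, only subClassOf is handled, and the helper name infer_types is hypothetical.

```python
# Triples are (subject, predicate, object) tuples; the names come from the slide.
schema = {("Supervisor", "subClassOf", "Employee")}
data = {("Person085", "type", "Supervisor")}


def infer_types(schema, data):
    """Return data plus every type triple implied by the subClassOf axioms."""
    inferred = set(data)
    changed = True
    while changed:  # repeat until no new triple is produced (a fixpoint)
        changed = False
        for (s, p, cls) in list(inferred):
            if p != "type":
                continue
            for (sub, q, sup) in schema:
                new_triple = (s, "type", sup)
                if q == "subClassOf" and sub == cls and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


closure = infer_types(schema, data)
print(("Person085", "type", "Employee") in closure)  # True
```

The asserted triple stays in the result, and the inferred one is added next to it, which is the "infer implicit information from explicit assertions" step of the previous slide.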
Validating RDF Data
• Common misunderstanding
– RDFS/OWL is to RDF what XML Schema is to XML
– Describe integrity conditions in RDFS or OWL
  • Typing constraints - RDFS domain/range
  • Participation constraints - OWL some values restrictions
  • Uniqueness constraints - OWL cardinality restriction
– Use a reasoner to find inconsistencies
• Problem: Open World Assumption
Closed vs. Open World
• Two different views on truth:
– CWA (Closed World Assumption): Any statement that is not known to be true is false
– OWA (Open World Assumption): A statement is false only if it is known to be false
• Used in different contexts
– Databases use CWA because (typically) they contain complete information
– Ontologies use OWA because (typically) they don't... that is, they contain incomplete information
• Data validation results are significantly different when using CWA instead of OWA
Typing Constraint
• Only supervisors can supervise employees
• Input ontology
    supervises domain Supervisor
    Person085 supervises Person173

              OWA                                     CWA
Consistent    true                                    false
Reason        Infer that Person085 type Supervisor    Assume that Person085 type not Supervisor
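The two columns of the table can be contrasted in a small Python sketch. It is a simplified illustration, not the talk's algorithm: triples are plain tuples, and the function names owa_domain_inference and cwa_domain_violations are hypothetical.

```python
# Instance data from the slide; the domain axiom is passed in as (prop, cls).
data = {("Person085", "supervises", "Person173")}


def owa_domain_inference(triples, prop, cls):
    """OWA reading: the domain axiom adds the missing type assertions."""
    return triples | {(s, "type", cls) for (s, p, _) in triples if p == prop}


def cwa_domain_violations(triples, prop, cls):
    """CWA reading: a subject of prop that is not known to be a cls is a violation."""
    return sorted({s for (s, p, _) in triples
                   if p == prop and (s, "type", cls) not in triples})


print(owa_domain_inference(data, "supervises", "Supervisor"))
print(cwa_domain_violations(data, "supervises", "Supervisor"))  # ['Person085']
```

Under the open-world reading nothing is ever reported as wrong; the data simply grows. Under the closed-world reading the same axiom acts as a check, and the violation disappears once Person085 type Supervisor is asserted.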
Participation Constraint
• Each supervisor must supervise at least one employee
• Input axioms
    Supervisor subClassOf supervises some Employee
    Person085 type Supervisor

              OWA                                                         CWA
Consistent    true                                                        false
Reason        Infer that Person085 supervises _:b and _:b type Employee   Assume that Person085 supervises _:b does not exist
Uniqueness Constraint
• Employees can have at most one supervisor
• Input axioms
    supervises InverseFunctional
    Person085 supervises Person173
    Person632 supervises Person173

              OWA                                     CWA
Consistent    true                                    false
Reason        Infer that Person085 sameAs Person632   Assume that Person085 sameAs Person632 does not hold
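A minimal Python sketch of the uniqueness example, again with plain tuples and hypothetical helper names (subjects_by_object, owa_same_as, cwa_violations). It only mirrors the two table columns and is not a general treatment of inverse functional properties.

```python
data = {
    ("Person085", "supervises", "Person173"),
    ("Person632", "supervises", "Person173"),
}


def subjects_by_object(triples, prop):
    """Group the subjects of prop by the object they point to."""
    grouped = {}
    for s, p, o in triples:
        if p == prop:
            grouped.setdefault(o, set()).add(s)
    return grouped


def owa_same_as(triples, prop):
    """OWA column: subjects sharing an object are inferred to be sameAs."""
    pairs = set()
    for subjects in subjects_by_object(triples, prop).values():
        ordered = sorted(subjects)
        pairs |= {(a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:]}
    return pairs


def cwa_violations(triples, prop):
    """CWA column: no sameAs is known, so it is assumed not to hold and
    an object with more than one subject is reported as a violation."""
    return {o: sorted(subs)
            for o, subs in subjects_by_object(triples, prop).items()
            if len(subs) > 1}


print(owa_same_as(data, "supervises"))     # {('Person085', 'Person632')}
print(cwa_violations(data, "supervises"))  # {'Person173': ['Person085', 'Person632']}
```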
Workarounds for CW
• Manually close the world
– Declare all individuals different from each other
– Count existing property values and add a max cardinality restriction
– Make all disjointness statements explicit and add negated types to individuals
• Drawbacks
– Can be computationally expensive
– Likely to be error-prone
Problem Summary
• Definitions in an OWL schema may have two purposes
– Infer new statements
– Check if existing statements are valid
• Using OWA for validation is undesirable
– Not always but in many cases
• In a problem domain we may have:
– Complete knowledge about some parts of the domain
– Incomplete knowledge about the other parts
Integrity Constraint Solution
• We defined an alternative semantics for OWL
– Integrity Constraint (IC) semantics use CWA
– Can be combined with regular inference axioms
• Ontology developer chooses which axioms will be interpreted with...
– OWA - regular OWL axiom, or
– CWA - integrity constraint
IC Extension
• Syntax specification
– How do we syntactically say an axiom is an IC and not a regular OWL axiom?
• Semantics specification
– How exactly do we interpret an IC?
• Validation algorithm
– Given the semantics, how do we check for IC violations?
IC Syntax
• Similar approach to using owl:imports
• Define a new annotation property in a new namespace
    Ont1 owl:imports Ont2
    Ont1 ic:imports IC1
• Backward compatible, requires minimal changes in tools
Use Case: SKOS
• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic structure and content of concept schemes
– Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc.
• SKOS data model specification
– Informal (Text): http://www.w3.org/TR/skos-reference/
– Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf
SKOS Example
skos-reference.ttl
    # SKOS reference ontology that contains inference rules
    skos:broaderTransitive Transitive
    skos:broader subPropertyOf skos:broaderTransitive
skos-constraints.ttl
    # NL constraints from SKOS specification expressed as ICs
    skos:related propertyDisjointWith skos:broaderTransitive
skos-invalid.ttl
    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ;
       owl:imports skos-reference.ttl ;
       ic:imports skos-constraints.ttl .
    A skos:broader B ;
      skos:related C .
    B skos:broader C .
• IC validation requires OWL reasoning
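Why this example needs reasoning can be shown with a short Python sketch. It assumes, as in the SKOS data model, that broader is a subproperty of the transitive property broaderTransitive; the function names and the with_reasoning switch are hypothetical and only serve the illustration.

```python
# Instance data from skos-invalid.ttl, with prefixes dropped.
data = {("A", "broader", "B"), ("B", "broader", "C"), ("A", "related", "C")}


def broader_transitive(triples):
    """Pairs in broaderTransitive: every broader link plus its transitive closure."""
    edges = {(s, o) for s, p, o in triples if p == "broader"}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges


def disjointness_violations(triples, with_reasoning=True):
    """Pairs that are both related and broaderTransitive (the IC forbids this)."""
    related = {(s, o) for s, p, o in triples if p == "related"}
    if with_reasoning:
        broader = broader_transitive(triples)
    else:
        broader = {(s, o) for s, p, o in triples if p == "broader"}
    return sorted(related & broader)


print(disjointness_violations(data))                        # [('A', 'C')]
print(disjointness_violations(data, with_reasoning=False))  # []
```

Looking only at asserted triples finds nothing, because A broaderTransitive C is never stated; it only follows from the reference ontology.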
Another SKOS Example
skos-xl.ttl
    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1
skos-invalid.ttl
    # SKOS data that violates the SKOS data model when
    # SKOS ontology is imported as ICs as well
    [] a owl:Ontology ;
       owl:imports skos-xl.ttl ;
       ic:imports skos-xl.ttl .
    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .
• Same ontology can be both a regular OWL import and an IC import
IC Semantics
• OWL semantics based on model theory
– Similar to First Order Logic
– Formal, precise, and unambiguous
• IC semantics specification
– Extends OWL model theory
– Changes a couple of basic definitions, everything else follows
• Details published in technical papers
– We are submitting a W3C member submission soon
IC Interpretations
• A regular OWL interpretation $I = (\Delta_I, \Delta_D, \cdot^{C}, \cdot^{OP}, \cdot^{DP}, \cdot^{I}, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is a 9-tuple
– $\cdot^{C}$ is the class interpretation function that assigns to each class $C \in V_C$ a subset $(C)^{C} \subseteq \Delta_I$
• An OWL IC interpretation $\Gamma = (\Delta_I, \Delta_D, I, U, \cdot^{C}, \cdot^{OP}, \cdot^{DP}, \cdot^{I}, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is an 11-tuple where $I$ and the elements of $U$ are regular OWL interpretations
– $(C)^{C} = \{ x^{I} \mid x \in V_I \text{ and for each } U_j \in U \text{ we have that } x^{I_{U_j}} \in (C)^{C_j} \}$
• More details available in references at the end
Other Approaches
• IC semantics of Motik et al. [WWW2007]
• Epistemic DLs
• Epistemic QLs
• Rules with negation as failure operators
Validation Algorithm
• An automated translation algorithm
• Automatically maps an OWL IC to a SPARQL query
– Query must be evaluated with OWL entailment regime
• ICs can be mapped to RIF rules too
– Use SPARQL and Datalog correspondence
– RIF defines Negation-as-Failure operator
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation
SPARQL Translation
• Integrity constraint
    Supervisor subClassOf supervises some Employee
• SPARQL query
    SELECT * {
      ?x type Supervisor .
      NOT EXISTS {
        ?x supervises ?y .
        ?y type Employee .
      }
    }
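What the translated query computes can be sketched in Python as a negation-as-failure check over a set of triples. This is a simplified stand-in for a SPARQL engine, with a hypothetical function name; it looks only at the triples it is given, whereas the talk requires the query to be evaluated under the OWL entailment regime, that is, over inferred triples as well.

```python
def participation_violations(triples):
    """Bindings for ?x: supervisors with no known 'supervises' link to a known Employee."""
    supervisors = {s for s, p, o in triples if p == "type" and o == "Supervisor"}
    employees = {s for s, p, o in triples if p == "type" and o == "Employee"}
    satisfied = {s for s, p, o in triples if p == "supervises" and o in employees}
    return sorted(supervisors - satisfied)


invalid_data = {("Person085", "type", "Supervisor")}
valid_data = invalid_data | {
    ("Person085", "supervises", "Person173"),
    ("Person173", "type", "Employee"),
}

print(participation_violations(invalid_data))  # ['Person085']
print(participation_violations(valid_data))    # []
```

Each result row is a violating individual, so an empty result means the constraint is satisfied.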
RIF Translation
• Integrity constraint
    Supervisor subClassOf supervises some Employee
• RIF rule
    Forall ?x ?y (
      invalid() :- And (
        ?x[type -> Supervisor]
        Naf And (
          ?x[supervises -> ?y]
          ?y[type -> Employee] )))
Solution Summary
• Separate ICs from regular OWL axioms
– No new syntax
– Import-based mechanism
• Alternative semantics for ICs
– Extends OWL model theory
– Provides the meanings of ICs formally
• Validation algorithm
– Translate ICs to another formalism
– SPARQL or RIF engines can be used
Explanations
• Explanations for positive atoms are well understood
– Smallest subset of the ontology that entails the atom
– Precise & laconic explanation
– Lemma generation
• Explanations for IC violations are tricky
– Need to explain negation (i.e. missing values)
– Lemma generation even more crucial
• Simple solution
– Explanation represented as a tree where each node represents an existing or missing axiom
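One way to picture such an explanation tree is the following Python sketch. The Node class, its fields, and the render helper are hypothetical and not taken from the Pellet IC validator; the labels (VIOLATION, INFERRED, ASSERTED, NOT INFERRED, NOT ASSERTED) are the ones used in the talk's explanation examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str   # e.g. "INFERRED" for an existing axiom, "NOT INFERRED" for a missing one
    axiom: str
    children: list = field(default_factory=list)

    @property
    def is_missing(self):
        """Missing axioms are the nodes whose label starts with NOT."""
        return self.label.startswith("NOT ")


def render(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    lines = ["  " * depth + f"{node.label}: {node.axiom}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = Node("VIOLATION", "A violates Label subClassOf literalForm cardinality 1", [
    Node("INFERRED", "A type Label", [Node("ASSERTED", "A type Label")]),
    Node("NOT INFERRED", "LabelA literalForm",
         [Node("NOT ASSERTED", "LabelA literalForm")]),
])

print("\n".join(render(tree)))
```

Keeping existing and missing axioms in the same tree lets a tool show, for each violation, both what was derived and which expected value is absent.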
Explanation Example (1)
VIOLATION: A violates related propertyDisjointWith broaderTransitive
  INFERRED: A related C
    ASSERTED: A related C
  INFERRED: A broaderTransitive C
    ASSERTED: A broader B
    ASSERTED: B broader C
    ASSERTED: broader subPropertyOf broaderTransitive
    ASSERTED: broaderTransitive Transitive
Explanation Example (2)
VIOLATION: A violates Label subClassOf literalForm cardinality 1
  INFERRED: A type Label
    ASSERTED: A type Label
  INFERRED: A labelRelation LabelA
    ASSERTED: A labelRelation LabelA
  NOT INFERRED: LabelA literalForm
    NOT ASSERTED: LabelA literalForm
Missing values are represented as
Performance
• Using ICs can improve performance!
• Expressive OWL reasoning is not easy
• Profiles of OWL defined for tractable reasoning
– OWL 2 QL, OWL 2 EL, OWL 2 RL
– Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce the expressivity where OWL reasoning is used
Prototype
• Pellet IC validator
– Translates ICs into SPARQL queries automatically
– Executes SPARQL queries with Pellet
– Query results show constraint violations
– Automatically explains constraint violations
• Free download
– http://clarkparsia.com/pellet/icv
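Code Example
    import java.util.Iterator;
    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.Reasoner;
    import org.mindswap.pellet.jena.PelletReasonerFactory;
    // JenaICValidator and ConstraintViolation belong to the Pellet IC validator;
    // the slide does not show their package, so no import is given for them.

    // Create an inferencing model using the Pellet reasoner.
    // The slide does not show how r is created; a Pellet reasoner is assumed here.
    Reasoner r = PelletReasonerFactory.theInstance().create();
    InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());

    // Load the schema and instance data into Pellet
    dataModel.read("file:data.rdf");
    dataModel.read("file:schema.owl");

    // Create the IC validator and associate it with the dataset
    JenaICValidator validator = new JenaICValidator(dataModel);
    // Load the constraints into the IC validator
    validator.getConstraints().read("file:constraints.owl");

    // Get the constraint violations
    Iterator<ConstraintViolation> violations = validator.getViolations();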
Next Steps
• W3C Member submission for IC semantics
• Robust IC validator implementation
– Incremental validation
– Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
– Scalable reasoning + validation
References
• Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008.
• Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. To appear, The 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010.
Other Approaches
• IC semantics of Motik et al. [WWW2007]
• Epistemic DLs
• Epistemic QLs
• Rules
IC Semantics of Motik et al.
• Same motivation and similar approach
– Separate ICs from regular OWL axioms
– Use ICs for validation only
• Semantics based on outer skolemization of first-order formulas and entailment in minimal Herbrand models
• Several features of the semantics made it unsuitable for us
– ICs can be satisfied by existential variables
– Disjunction can cause false positives
Epistemic DLs
• DLs extended with epistemic operator K
• ICs can be represented as epistemic queries over a regular KB
– Aligns with Reiter’s original characterization of ICs
• Example:
– KSupervisor subClassOf Ksupervises some KEmployee
• Only major differences
– We are not using K operator explicitly
– No Unique Name Assumption in OWL ICs