RR2010 Keynote

Data Validation with OWL Integrity Constraints
Evren Sirin, CTO, Clark & Parsia, LLC
Wednesday, September 22, 2010

description

Evren Sirin gave one of the RR2010 keynotes on Integrity Constraint Validation for Linked Data with OWL2 via SPARQL.

Transcript of RR2010 Keynote


Who are we?

• Clark & Parsia is a semantic software startup
  – HQ in Washington, DC & office in Boston

• Provides software development and integration services

• Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers

http://clarkparsia.com/
Twitter: @candp

Overview

• Data validation with OWL
  – Representation and validation of integrity constraints

• Use cases
  – Examples, issues, workarounds

• OWL Integrity Constraints
  – Syntax, semantics, validation

• Comparison with other approaches
  – Epistemic DLs, Epistemic QLs, Rules

• Implementation and performance


Some Applications

• Customer and product data
  – Find which customer would be interested in buying a certain product

• System and component descriptions
  – Configure components to build a desired system

• Workforce and employee data
  – Locate employees with desired expertise

• Patient history and drug data
  – Detect and prevent potentially harmful drug interactions


Common Theme

• There is data and lots of it!

• Adding semantics to the data helps a lot
  – Sometimes simple taxonomies, but other times, complex ontologies

• We have complete knowledge about the domain

• Errors in the data cause problems
  – Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc.


Data Validation

• Fundamental data management problem
  – Verify data integrity and correctness
  – Enforce validity of updates

• Relevant in many scenarios
  – Storing data for stand-alone applications
  – Exchanging data in distributed settings

• Solved (to some degree) in RDBMSs
  – Harder to achieve as data semantics increase and/or more expressive integrity conditions are required


Disclaimer

• Data validity is not important for every use case
  – Invalid data may be fine for an application
  – Invalidity may even be a requirement

• This talk focuses on cases where data consistency and integrity are crucial


Building Semantic Apps

• Represent data as RDF triples
  – First step for accomplishing data integration and analysis

• Enrich data with more semantics (RDFS, OWL)
  – Infer implicit information from explicit assertions

• Ensure data validity
  – Detect errors in the data

• Do something cool with the data
  – Obviously...



Reasoning Example

• Input ontology
  Schema:
    # Every supervisor is an employee
    Supervisor subClassOf Employee
  Instance data:
    # Person085 is a supervisor
    Person085 type Supervisor

• Output inferences
    # Person085 is an employee
    Person085 type Employee
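Simple forward chaining is enough to reproduce the inference above. The following is a minimal Java sketch. It assumes the schema consists only of subClassOf links, stored in a map from each class to its direct superclasses. The class and method names are hypothetical, and this is not how Pellet or any other OWL reasoner works internally:

```java
import java.util.*;

// Minimal sketch: derive rdf:type inferences by following subClassOf links
// transitively from each asserted type. Only the subClassOf inference on the
// slide is modeled; names are plain strings.
public class SubclassInference {

    /** Every class the individual belongs to, asserted or inferred. */
    public static Set<String> types(String individual,
                                    Map<String, Set<String>> assertedTypes,
                                    Map<String, Set<String>> subClassOf) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(assertedTypes.getOrDefault(individual, Set.of()));
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (result.add(c)) {                              // visit each class once
                todo.addAll(subClassOf.getOrDefault(c, Set.of()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> schema = Map.of("Supervisor", Set.of("Employee"));
        Map<String, Set<String>> data = Map.of("Person085", Set.of("Supervisor"));
        System.out.println(types("Person085", data, schema)); // [Supervisor, Employee]
    }
}
```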


Validating RDF Data

• Common misunderstanding
  – RDFS/OWL is to RDF what XML Schema is to XML
  – Describe integrity conditions in RDFS or OWL
    • Typing constraints: RDFS domain/range
    • Participation constraints: OWL some values restrictions
    • Uniqueness constraints: OWL cardinality restrictions
  – Use a reasoner to find inconsistencies

• Problem: Open World Assumption


Closed vs. Open World

• Two different views on truth:
  – CWA: Any statement that is not known to be true is false
  – OWA: A statement is false only if it is known to be false

• Used in different contexts
  – Databases use CWA because (typically) they contain complete information
  – Ontologies use OWA because (typically) they don't... that is, they contain incomplete information

• Data validation results are significantly different when using CWA instead of OWA
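Here is a small Java sketch that answers a single ground statement under each assumption, to make the two readings concrete. The hypothetical `knownFalse` set stands in for whatever a reasoner could prove false. Under CWA that set is not needed at all:

```java
import java.util.*;

// Sketch contrasting CWA and OWA for one ground statement. Under CWA anything
// not asserted is false; under OWA it is false only if known to be false,
// otherwise its truth value is unknown.
public class WorldAssumptions {

    public enum Truth { TRUE, FALSE, UNKNOWN }

    public static Truth closedWorld(String stmt, Set<String> knownTrue) {
        return knownTrue.contains(stmt) ? Truth.TRUE : Truth.FALSE;
    }

    public static Truth openWorld(String stmt, Set<String> knownTrue, Set<String> knownFalse) {
        if (knownTrue.contains(stmt)) return Truth.TRUE;
        if (knownFalse.contains(stmt)) return Truth.FALSE;
        return Truth.UNKNOWN;                                  // no evidence either way
    }

    public static void main(String[] args) {
        Set<String> facts = Set.of("Person085 supervises Person173");
        String q = "Person085 type Supervisor";
        System.out.println(closedWorld(q, facts));             // FALSE
        System.out.println(openWorld(q, facts, Set.of()));     // UNKNOWN
    }
}
```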


Typing Constraint

• Only managers can supervise employees

• Input ontology
  o supervises domain Supervisor
  o Person085 supervises Person173

             | OWA                                  | CWA
  Consistent | true                                 | false
  Reason     | Infer that Person085 type Supervisor | Assume that Person085 type not Supervisor
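A direct check over the data can approximate the CWA column. The following is a minimal Java sketch under these assumptions:

- Triples are plain strings, and `type` abbreviates rdf:type.
- Only asserted types count. A real IC validator would also use the inferences from the regular OWL axioms.
- `DomainCheck` and `violations` are hypothetical names.

```java
import java.util.*;

// Minimal sketch: "supervises domain Supervisor" read as an integrity
// constraint. Every subject of the property must already be typed with the
// domain class; nothing is inferred.
public class DomainCheck {

    public record Triple(String s, String p, String o) {}

    /** Subjects of `property` that are not known to be instances of `domain`. */
    public static Set<String> violations(List<Triple> data, String property, String domain) {
        Set<String> typed = new HashSet<>();
        for (Triple t : data) {
            if (t.p().equals("type") && t.o().equals(domain)) typed.add(t.s());
        }
        Set<String> bad = new LinkedHashSet<>();
        for (Triple t : data) {
            if (t.p().equals(property) && !typed.contains(t.s())) bad.add(t.s());
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(new Triple("Person085", "supervises", "Person173"));
        // No "Person085 type Supervisor" triple, so the constraint is violated
        System.out.println(violations(data, "supervises", "Supervisor")); // [Person085]
    }
}
```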


Participation Constraint

• Each supervisor must supervise at least one employee

• Input axioms
  o Supervisor subClassOf supervises some Employee
  o Person085 type Supervisor

             | OWA                                                        | CWA
  Consistent | true                                                       | false
  Reason     | Infer that Person085 supervises _:b and _:b type Employee | Assume that Person085 supervises _:b does not exist
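Under the IC reading, no blank node `_:b` is invented. The participation constraint holds only if a named witness already exists in the data. The following is a minimal Java sketch under these assumptions:

- Triples are plain strings, and `type` abbreviates rdf:type.
- Only asserted facts are consulted.
- The class and method names are hypothetical.

```java
import java.util.*;

// Minimal sketch: "Supervisor subClassOf supervises some Employee" read as an
// integrity constraint. Each asserted Supervisor needs an asserted supervises
// value that is itself asserted to be an Employee.
public class ParticipationCheck {

    public record Triple(String s, String p, String o) {}

    public static Set<String> violations(List<Triple> data, String cls, String property, String filler) {
        Set<String> fillers = new HashSet<>();
        for (Triple t : data) {
            if (t.p().equals("type") && t.o().equals(filler)) fillers.add(t.s());
        }
        Set<String> bad = new LinkedHashSet<>();
        for (Triple t : data) {
            if (!t.p().equals("type") || !t.o().equals(cls)) continue;
            String x = t.s();
            // look for an existing, named witness; never assume an unnamed one
            boolean witnessed = data.stream().anyMatch(u ->
                    u.s().equals(x) && u.p().equals(property) && fillers.contains(u.o()));
            if (!witnessed) bad.add(x);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Triple> missing = List.of(new Triple("Person085", "type", "Supervisor"));
        System.out.println(violations(missing, "Supervisor", "supervises", "Employee")); // [Person085]

        List<Triple> complete = List.of(
                new Triple("Person085", "type", "Supervisor"),
                new Triple("Person085", "supervises", "Person173"),
                new Triple("Person173", "type", "Employee"));
        System.out.println(violations(complete, "Supervisor", "supervises", "Employee")); // []
    }
}
```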


Uniqueness Constraint

• Employees can have at most one supervisor

• Input axioms
  o supervises InverseFunctional
  o Person085 supervises Person173
  o Person632 supervises Person173

             | OWA                                  | CWA
  Consistent | true                                 | false
  Reason     | Infer that Person085 sameAs Person632 | Assume that Person085 sameAs Person632 does not hold
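For the uniqueness constraint, the IC reading reports both supervisors instead of merging them. The following is a minimal Java sketch under these assumptions:

- Triples are plain strings.
- The data contains no sameAs assertions, so distinct names are treated as distinct individuals.
- The class name is hypothetical.

```java
import java.util.*;

// Minimal sketch: "supervises InverseFunctional" read as an integrity
// constraint: each object may have at most one supervises-subject. Distinct
// names are not inferred to be the same individual.
public class UniquenessCheck {

    public record Triple(String s, String p, String o) {}

    /** Maps each object that has two or more distinct subjects to those subjects. */
    public static Map<String, Set<String>> violations(List<Triple> data, String property) {
        Map<String, Set<String>> subjectsByObject = new TreeMap<>();
        for (Triple t : data) {
            if (t.p().equals(property)) {
                subjectsByObject.computeIfAbsent(t.o(), k -> new TreeSet<>()).add(t.s());
            }
        }
        subjectsByObject.values().removeIf(subjects -> subjects.size() < 2);
        return subjectsByObject;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("Person085", "supervises", "Person173"),
                new Triple("Person632", "supervises", "Person173"));
        System.out.println(violations(data, "supervises")); // {Person173=[Person085, Person632]}
    }
}
```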


Workarounds for CW

• Manually close the world
  – Declare all individuals different from each other
  – Count existing property values and add a max cardinality restriction
  – Make all disjointness statements explicit and add negated types to individuals

• Drawbacks
  – Can be computationally expensive
  – Likely to be error-prone
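The first two workarounds are mechanical, which is why they scale poorly. This Java sketch generates them as strings in the slides' informal syntax. The output format and the class and method names are illustrative assumptions, not an OWL serialization:

```java
import java.util.*;

// Sketch of "manually closing the world": generate pairwise differentFrom
// axioms and a max cardinality restriction equal to the current value count.
public class CloseWorld {

    public record Triple(String s, String p, String o) {}

    /** One differentFrom axiom per unordered pair: n(n-1)/2 axioms for n names. */
    public static List<String> differentFromAxioms(List<String> individuals) {
        List<String> axioms = new ArrayList<>();
        for (int i = 0; i < individuals.size(); i++) {
            for (int j = i + 1; j < individuals.size(); j++) {
                axioms.add(individuals.get(i) + " differentFrom " + individuals.get(j));
            }
        }
        return axioms;
    }

    /** Caps `property` at the number of distinct values currently asserted for `individual`. */
    public static String maxCardinalityAxiom(List<Triple> data, String individual, String property) {
        long count = data.stream()
                .filter(t -> t.s().equals(individual) && t.p().equals(property))
                .map(Triple::o)
                .distinct()
                .count();
        return individual + " type (" + property + " max " + count + ")";
    }

    public static void main(String[] args) {
        List<String> people = List.of("Person085", "Person173", "Person632");
        differentFromAxioms(people).forEach(System.out::println);   // 3 axioms
        List<Triple> data = List.of(new Triple("Person085", "supervises", "Person173"));
        System.out.println(maxCardinalityAxiom(data, "Person085", "supervises"));
    }
}
```

With 1,000 named individuals, the pairwise step alone yields 499,500 axioms. Every generated axiom has to be recomputed after each update, and this is where maintenance errors creep in.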


Problem Summary

• Definitions in an OWL schema may have two purposes
  – Infer new statements
  – Check if existing statements are valid

• Using OWA for validation is undesirable
  – Not always, but in many cases

• In a problem domain we may have:
  – Complete knowledge about some parts of the domain
  – Incomplete knowledge about the other parts


Integrity Constraint Solution

• We defined an alternative semantics for OWL
  – Integrity Constraint (IC) semantics use CWA
  – Can be combined with regular inference axioms

• Ontology developer chooses which axioms will be interpreted with...
  – OWA: regular OWL axiom, or
  – CWA: integrity constraint


IC Extension

• Syntax specification
  – How do we syntactically say an axiom is an IC and not a regular OWL axiom?

• Semantics specification
  – How exactly do we interpret an IC?

• Validation algorithm
  – Given the semantics, how do we check for IC violations?


IC Syntax

• Similar approach to using owl:imports

• Define a new annotation property in a new namespace

    Ont1 owl:imports Ont2
    Ont1 ic:imports IC1

• Backward compatible, requires minimum change in tools
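Because `ic:imports` is just another annotation, a tool only has to look at the ontology header to split the two kinds of imports. The following is a Java sketch under these assumptions:

- The header is given as string triples.
- The slide does not give the namespace IRI behind the `ic:` prefix, so the prefixed name is kept as is.
- `ImportPartition` is a hypothetical name.

```java
import java.util.*;

// Sketch: separate regular imports (owl:imports) from constraint imports
// (ic:imports) in an ontology header.
public class ImportPartition {

    public record Triple(String s, String p, String o) {}

    public static final String OWL_IMPORTS = "owl:imports";
    public static final String IC_IMPORTS = "ic:imports";   // prefix IRI not given on the slide

    /** Returns {"owl" -> regular imports, "ic" -> constraint imports}. */
    public static Map<String, List<String>> partition(List<Triple> header) {
        Map<String, List<String>> imports = new LinkedHashMap<>();
        imports.put("owl", new ArrayList<>());
        imports.put("ic", new ArrayList<>());
        for (Triple t : header) {
            if (t.p().equals(OWL_IMPORTS)) imports.get("owl").add(t.o());
            else if (t.p().equals(IC_IMPORTS)) imports.get("ic").add(t.o());
            // other annotations are skipped, just as an IC-unaware tool skips ic:imports
        }
        return imports;
    }

    public static void main(String[] args) {
        List<Triple> header = List.of(
                new Triple("Ont1", "owl:imports", "Ont2"),
                new Triple("Ont1", "ic:imports", "IC1"));
        System.out.println(partition(header)); // {owl=[Ont2], ic=[IC1]}
    }
}
```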


Use Case: SKOS

• Simple Knowledge Organization System (SKOS)

• SKOS provides a model for expressing the basic structure and content of concept schemes
  – Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc.

• SKOS data model specification
  – Informal (Text): http://www.w3.org/TR/skos-reference/
  – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf


SKOS Example

skos-constraints.ttl
    # Natural-language constraints from the SKOS specification expressed as ICs
    skos:related propertyDisjointWith skos:broaderTransitive

skos-reference.ttl
    # SKOS reference ontology that contains inference rules
    skos:broaderTransitive Transitive
    skos:broader subPropertyOf skos:broaderTransitive

skos-invalid.ttl
    # SKOS data that violates the SKOS data model
    [] a owl:Ontology ;
       owl:imports skos-reference.ttl ;
       ic:imports skos-constraints.ttl .

    A skos:broader B ;
      skos:related C .
    B skos:broader C .

IC validation requires OWL reasoning.
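A hand-coded check confirms that IC validation needs OWL reasoning here. This Java sketch reduces the data to sets of (subject, object) pairs per property. It implements only the disjointness IC and the two SKOS reference axioms that matter: skos:broader is a subproperty of skos:broaderTransitive, and skos:broaderTransitive is transitive. The class and method names are hypothetical:

```java
import java.util.*;

// Minimal sketch: related propertyDisjointWith broaderTransitive only fails
// once broaderTransitive is inferred from the asserted broader pairs.
public class SkosDisjointCheck {

    public record Pair(String from, String to) {}

    /** broader pairs closed under transitivity = the inferred broaderTransitive pairs. */
    public static Set<Pair> broaderTransitive(Set<Pair> broader) {
        Set<Pair> closure = new LinkedHashSet<>(broader);     // subPropertyOf: every broader pair
        boolean changed = true;
        while (changed) {                                     // naive fixpoint for transitivity
            changed = false;
            for (Pair a : List.copyOf(closure)) {
                for (Pair b : List.copyOf(closure)) {
                    if (a.to().equals(b.from()) && closure.add(new Pair(a.from(), b.to()))) {
                        changed = true;
                    }
                }
            }
        }
        return closure;
    }

    /** Pairs that are both related and (inferred) broaderTransitive. */
    public static Set<Pair> violations(Set<Pair> related, Set<Pair> broader) {
        Set<Pair> shared = new LinkedHashSet<>(related);
        shared.retainAll(broaderTransitive(broader));
        return shared;
    }

    public static void main(String[] args) {
        Set<Pair> broader = Set.of(new Pair("A", "B"), new Pair("B", "C"));
        Set<Pair> related = Set.of(new Pair("A", "C"));
        System.out.println(violations(related, broader));     // [Pair[from=A, to=C]]
        // Without reasoning, related and the asserted broader pairs do not overlap
        Set<Pair> asserted = new HashSet<>(related);
        asserted.retainAll(broader);
        System.out.println(asserted.isEmpty());               // true
    }
}
```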


Another SKOS Example

skos-xl.ttl
    # SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1

skos-invalid.ttl
    # SKOS data that violates the SKOS data model when
    # the SKOS ontology is imported as ICs as well
    [] a owl:Ontology ;
       owl:imports skos-xl.ttl ;
       ic:imports skos-xl.ttl .

    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .

The same ontology can be both a regular OWL import and an IC import.
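Read from the `ic:imports` side, the cardinality axiom requires every individual asserted to be a `skosxl:Label` to have exactly one distinct `skosxl:literalForm` value in the data. Under OWA the same axiom would merely imply an unnamed value. The following is a Java sketch under these assumptions:

- Triples are plain strings, and `type` abbreviates rdf:type.
- Only asserted types are checked. The full validator would also use the inferences from the `owl:imports` side.
- The names are hypothetical.

```java
import java.util.*;

// Minimal sketch: an exact cardinality axiom read as an integrity constraint.
// Reports each asserted instance of `cls` whose number of distinct `property`
// values differs from n, together with the actual count.
public class CardinalityCheck {

    public record Triple(String s, String p, String o) {}

    public static Map<String, Long> violations(List<Triple> data, String cls, String property, long n) {
        Map<String, Long> bad = new TreeMap<>();
        for (Triple t : data) {
            if (!t.p().equals("type") || !t.o().equals(cls)) continue;
            String x = t.s();
            long count = data.stream()
                    .filter(u -> u.s().equals(x) && u.p().equals(property))
                    .map(Triple::o)
                    .distinct()
                    .count();
            if (count != n) bad.put(x, count);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("A", "skosxl:labelRelation", "LabelA"),
                new Triple("LabelA", "type", "skosxl:Label"));
        System.out.println(violations(data, "skosxl:Label", "skosxl:literalForm", 1)); // {LabelA=0}
    }
}
```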


IC Semantics

• OWL semantics based on model theory
  – Similar to First Order Logic
  – Formal, precise, and unambiguous

• IC semantics specification
  – Extends OWL model theory
  – Changes a couple of basic definitions; everything else follows

• Details published in technical papers
  – We are submitting a W3C member submission soon


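IC Interpretations

• A regular OWL interpretation $I = (\Delta_I, \Delta_D, \cdot^C, \cdot^{OP}, \cdot^{DP}, \cdot^I, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is a 9-tuple
  – $\cdot^C$ is the class interpretation function that assigns to each class $C \in V_C$ a subset $(C)^C \subseteq \Delta_I$

• An OWL IC interpretation $\Gamma = (\Delta_I, \Delta_D, I, U, \cdot^C, \cdot^{OP}, \cdot^{DP}, \cdot^I, \cdot^{DT}, \cdot^{LT}, \cdot^{FA})$ is an 11-tuple where $I$ and the elements of $U$ are regular OWL interpretations
  – $(C)^C = \{\, (x)^I \mid x \in V_I \text{ and for each } U_j \in U \text{ we have } (x)^{I_j} \in (C)^{C_j} \,\}$, where $\cdot^{I_j}$ and $\cdot^{C_j}$ are the individual and class interpretation functions of $U_j$

• More details available in references at the end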
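Read operationally, the definition keeps a named individual in the IC extension of C only if it is in C in every interpretation of U. The following is a toy Java sketch under these assumptions:

- U is a given, non-empty, finite list of models.
- Each model is a map from class names to member names, so individuals are identified by name. This glosses over how names are interpreted.
- The names are hypothetical.

```java
import java.util.*;

// Toy sketch of the IC class extension: intersect the extension of `cls`
// across all models in U.
public class IcExtension {

    public static Set<String> icExtension(String cls, List<Map<String, Set<String>>> models) {
        if (models.isEmpty()) throw new IllegalArgumentException("U must be non-empty");
        Set<String> result = new TreeSet<>(models.get(0).getOrDefault(cls, Set.of()));
        for (Map<String, Set<String>> model : models.subList(1, models.size())) {
            result.retainAll(model.getOrDefault(cls, Set.of()));  // keep names in C in this model too
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical models: Person173 is an Employee in only one of them
        Map<String, Set<String>> u1 = Map.of("Employee", Set.of("Person085", "Person173"));
        Map<String, Set<String>> u2 = Map.of("Employee", Set.of("Person085"));
        System.out.println(icExtension("Employee", List.of(u1, u2))); // [Person085]
    }
}
```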


Other Approaches

• IC semantics of Motik et al. [WWW2007]

• Epistemic DLs

• Epistemic QLs

• Rules with negation as failure operators


Validation Algorithm

• An automated translation algorithm

• Automatically maps an OWL IC to a SPARQL query
  – Query must be evaluated with the OWL entailment regime

• ICs can be mapped to RIF rules too
  – Use SPARQL and Datalog correspondence
  – RIF defines Negation-as-Failure operator

• Many different implementation possibilities

• Off-the-shelf tools can be used for IC validation
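The following is a minimal Java sketch of the translation step for a single axiom shape, `C subClassOf P some D`. It produces a query whose answers are the violating individuals. The method name is hypothetical, the arguments are assumed to be already-prefixed SPARQL terms, and prefix declarations are omitted. The real translator handles many more axiom types:

```java
// Sketch: translate "C subClassOf (P some D)" into a violation query.
// Any answer ?x is an instance of C without a P-value of type D.
public class IcToSparql {

    public static String someValuesFromIc(String c, String p, String d) {
        return "SELECT ?x WHERE {\n"
                + "  ?x rdf:type " + c + " .\n"
                + "  FILTER NOT EXISTS { ?x " + p + " ?y . ?y rdf:type " + d + " . }\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(someValuesFromIc(":Supervisor", ":supervises", ":Employee"));
    }
}
```

The generated query only gives IC semantics when it is evaluated under the OWL entailment regime, so that inferred types and property values are taken into account.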


SPARQL Translation

IC:
    Supervisor subClassOf supervises some Employee

Query:
    SELECT * WHERE {
      ?x type Supervisor .
      FILTER NOT EXISTS { ?x supervises ?y . ?y type Employee . }
    }


RIF Translation

IC:
    Supervisor subClassOf supervises some Employee

Rule:
    Forall ?x (
      invalid() :- And( ?x[type -> Supervisor]
                        Naf Exists ?y ( And( ?x[supervises -> ?y]
                                             ?y[type -> Employee] ) ) )
    )


Solution Summary

• Separate ICs from regular OWL axioms
  – No new syntax
  – Import-based mechanism

• Alternative semantics for ICs
  – Extends OWL model theory
  – Provides the meanings of ICs formally

• Validation algorithm
  – Translate ICs to another formalism
  – SPARQL or RIF engines can be used


Explanations

• Explanations for positive atoms are well understood
  – Smallest subset of the ontology that entails the atom
  – Precise & laconic explanations
  – Lemma generation

• Explanations for IC violations are tricky
  – Need to explain negation (i.e., missing values)
  – Lemma generation is even more crucial

• Simple solution
  – Explanation represented as a tree where each node represents an existing or missing axiom
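A small recursive data type can capture the tree representation, with each node recording whether its axiom holds (asserted or inferred) or is missing. The following is a Java sketch. `ExplanationTree` and its node kinds mirror the labels used in the examples, but they are hypothetical and not part of the Pellet IC validator API. The example data in `main` is made up for illustration:

```java
import java.util.*;

// Sketch of an explanation tree whose nodes are either existing axioms
// (asserted or inferred) or missing ones (not inferred / not asserted).
public class ExplanationTree {

    public enum Kind { VIOLATION, INFERRED, ASSERTED, NOT_INFERRED, NOT_ASSERTED }

    public record Node(Kind kind, String axiom, List<Node> children) {
        public Node(Kind kind, String axiom, Node... children) {
            this(kind, axiom, List.of(children));
        }

        public boolean missing() {
            return kind == Kind.NOT_INFERRED || kind == Kind.NOT_ASSERTED;
        }
    }

    /** Indented rendering, two spaces per tree level. */
    public static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private static void render(Node n, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth))
          .append(n.kind().name().replace('_', ' '))
          .append(": ").append(n.axiom()).append('\n');
        for (Node child : n.children()) render(child, depth + 1, sb);
    }

    public static void main(String[] args) {
        Node tree = new Node(Kind.VIOLATION, "LabelA violates Label subClassOf literalForm cardinality 1",
                new Node(Kind.INFERRED, "LabelA type Label",
                        new Node(Kind.ASSERTED, "LabelA type Label")),
                new Node(Kind.NOT_INFERRED, "LabelA literalForm",
                        new Node(Kind.NOT_ASSERTED, "LabelA literalForm")));
        System.out.print(render(tree));
    }
}
```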



Explanation Example (1)

VIOLATION: A violates related propertyDisjointWith broaderTransitive
  INFERRED: A related C
    ASSERTED: A related C
  INFERRED: A broaderTransitive C
    ASSERTED: A broader B
    ASSERTED: B broader C
    ASSERTED: broader subPropertyOf broaderTransitive
    ASSERTED: broaderTransitive Transitive



Explanation Example (2)

VIOLATION: A violates Label subClassOf literalForm cardinality 1
  INFERRED: A type Label
    ASSERTED: A type Label
  INFERRED: A labelRelation LabelA
    ASSERTED: A labelRelation LabelA
  NOT INFERRED: LabelA literalForm
    NOT ASSERTED: LabelA literalForm

Missing values are represented as NOT INFERRED / NOT ASSERTED nodes.


Performance

• Using ICs can improve performance

• Expressive OWL reasoning is not easy

• Profiles of OWL defined for tractable reasoning
  – OWL 2 QL, OWL 2 EL, OWL 2 RL
  – Less expressive but more efficient

• Modeling some OWL axioms as ICs may reduce the expressivity where OWL reasoning is used


Prototype

• Pellet IC validator
  – Translates ICs into SPARQL queries automatically
  – Executes SPARQL queries with Pellet
  – Query results show constraint violations
  – Automatically explains constraint violations

• Free download
  – http://clarkparsia.com/pellet/icv


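Code Example

    // Jena 2.x and Pellet imports; JenaICValidator and ConstraintViolation ship
    // with the Pellet ICV download (their package is not shown on the slide)
    import com.hp.hpl.jena.rdf.model.InfModel;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.reasoner.Reasoner;
    import org.mindswap.pellet.jena.PelletReasonerFactory;
    import java.util.Iterator;

    // create an inferencing model using the Pellet reasoner
    Reasoner r = PelletReasonerFactory.theInstance().create();
    InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());

    // load the schema and instance data into Pellet
    dataModel.read("file:data.rdf");
    dataModel.read("file:schema.owl");

    // create the IC validator and associate it with the dataset
    JenaICValidator validator = new JenaICValidator(dataModel);

    // load the constraints into the IC validator
    validator.getConstraints().read("file:constraints.owl");

    // get the constraint violations and print each one
    Iterator<ConstraintViolation> violations = validator.getViolations();
    while (violations.hasNext()) System.out.println(violations.next());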


Next Steps

• W3C Member submission for IC semantics

• Robust IC validator implementation
  – Incremental validation
  – Multi-threaded validation

• Support for IC editing

• Integration with PelletDb
  – Scalable reasoning + validation


References

• Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008.

• Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009.

• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. The 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010. To appear.


Questions


Other Approaches

• IC semantics of Motik et al. [WWW2007]

• Epistemic DLs

• Epistemic QLs

• Rules


IC Semantics of Motik et al.

• Same motivation and similar approach
  – Separate ICs from regular OWL axioms
  – Use ICs for validation only

• Semantics based on outer skolemization of first-order formulas and entailment in minimal Herbrand models

• Several features of the semantics made it unsuitable for us
  – ICs can be satisfied by existential variables
  – Disjunction can cause false positives


Epistemic DLs

• DLs extended with epistemic operator K

• ICs can be represented as epistemic queries over a regular KB
  – Aligns with Reiter's original characterization of ICs

• Example:
  – K Supervisor subClassOf (K supervises) some (K Employee)

• The only major differences
  – We are not using the K operator explicitly
  – No Unique Name Assumption in OWL ICs
