Ontology Quality by Detection of Conflicts in Metadata Budak I. Arpinar Karthikeyan Giriloganathan Boanerges Aleman-Meza LSDIS lab Computer Science University of Georgia, USA EON’2006 Edinburgh, Scotland, May 22, 2006 Co-located with WWW-2006


Ontology Quality by Detection of Conflicts in Metadata

Budak I. Arpinar

Karthikeyan Giriloganathan

Boanerges Aleman-Meza

LSDIS lab Computer Science

University of Georgia, USA

EON’2006

Edinburgh, Scotland, May 22, 2006

Co-located with WWW-2006


Searching and Ranking Documents based on Semantic Relationships, Boanerges Aleman-Meza, ICDE Ph.D. Workshop 2006

Motivation

• Ontologies of over 1 million entities are increasingly appearing
• TAP, SWETO, GlycO, UniProt

• Quality concerns:
– Entity disambiguation
– Which ontologies are available? (i.e., search & ranking)
– Inconsistency checking (i.e., in OWL)
– Conflict detection


… Motivation

• “Representing, identifying, discovering, validating, and exploiting complex relationships are important issues related to realizing the full power of the Semantic Web, and can help close the gap between highly separated information retrieval and decision-making steps” [Sheth, Arpinar & Kashyap 2003]

• “The Web is decentralized, allowing anyone to say anything. As a result, different viewpoints may be contradictory, or even false information may be provided. In order to prevent agents from combining incompatible data or from taking consistent data and evolving it into an inconsistent state, it is important that inconsistencies can be detected automatically” [W3C 2004]

• “… these problems manifest themselves in various ways, including poor recall of available resources and inconsistency of search results. They arise due to errors, omissions and ambiguities in the metadata…” [Currier & Barton 2003]


Our Approach

• Approach: Detection of conflicting relationships
– or conflicts in sequences of relationships

• How? User-defined rules are validated against a populated ontology
– These rules are domain-dependent

• Goal: By detecting conflicting data, a user can take action to improve the quality of the ontology


Example of Conflict Identification

(Diagram. Entities: John, Claura, Bill, Mary. Relationship labels: fatherOf, marriedTo, motherOf, fatherOf, marriedTo, fatherinLawOf. Annotation: CONFLICT.)


Few definitions, ‘simplification’

(Diagram. Entities: Williams, Chris, RepublicanParty. Relationship labels: votedFor, memberOf, supporterOf.)

• An RDF triple is a simplification

• Basically, composing relationships
– Leading to simple relations yet somewhat arbitrary


Statement Simplification

• There could be simplifications of the form:

statement_1 statement_2 … statement_n → statement_t

• In this case statement_t is a simplification
– this is dependent on expert knowledge
– this is not in the traditional reasoning approach
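
The rule form above can be sketched as a small composition step over triples. This is only an illustration, not the authors' implementation: the `simplify` helper is hypothetical, and the reading of the earlier diagram labels (Chris votedFor Williams, Williams memberOf RepublicanParty, hence Chris supporterOf RepublicanParty) is an assumption.

```python
# Hypothetical sketch: compose two relationships into one simplified statement.
# Triples are plain (subject, predicate, object) tuples.

def simplify(triples, first, second, result):
    """Apply the rule: (x first y) and (y second z) -> (x result z)."""
    derived = set()
    for (x, p1, y) in triples:
        if p1 != first:
            continue
        for (y2, p2, z) in triples:
            if p2 == second and y2 == y:
                derived.add((x, result, z))
    return derived

# Assumed reading of the diagram labels; the edge directions are a guess.
triples = {
    ("Chris", "votedFor", "Williams"),
    ("Williams", "memberOf", "RepublicanParty"),
}
simplified = simplify(triples, "votedFor", "memberOf", "supporterOf")
print(simplified)
```

The derived statement depends entirely on the rule an expert chooses to write, which matches the slide's point that simplifications are somewhat arbitrary.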


Statement Simplification

(Diagram. Classes: Immigrant, FinancialOrganization, JudicialOrganization, BusinessOrganization, Person. Relationship labels: multipleDeposits, associated, owner, works, underInvestigation. Simplification labels: MoneyLaundering, suspected.)


Using ‘simplification’ for detection of conflict

T: A set of triples

S: A function denoting the process of simplification

s: The result of simplification (S(T) → s)

U: Constraints expressed in an ontology (e.g., the property ‘biologicalMother’ is unique)

E: Constraints supplied by an expert (e.g., person(x) can never do action(y))

Two sets of triples T1 and T2 are in conflict if their simplifications S(T1) → s1 and S(T2) → s2 are mutually non-agreeable

Two simplifications s1 and s2 are mutually non-agreeable if, taken together, they violate domain constraints
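
The definition above can be read as a pairwise check of simplifications against a list of constraints. The sketch below is a minimal illustration under stated assumptions: every function name is hypothetical, the triple direction (child, biologicalMother, mother) is assumed, the sample data is made up, and `take_single` is a trivial stand-in for S.

```python
# Hypothetical sketch: T1 and T2 conflict when S(T1) -> s1 and S(T2) -> s2,
# taken together, violate at least one domain constraint.

def unique_biological_mother(s1, s2):
    """U-style constraint: one person cannot have two different mothers."""
    (subj1, pred1, obj1), (subj2, pred2, obj2) = s1, s2
    return (pred1 == pred2 == "biologicalMother"
            and subj1 == subj2 and obj1 != obj2)

def mutually_non_agreeable(s1, s2, constraints):
    """True if any constraint rejects the pair (s1, s2)."""
    return any(violates(s1, s2) for violates in constraints)

def in_conflict(t1, t2, simplify, constraints):
    """Simplify both triple sets, then test the two results together."""
    return mutually_non_agreeable(simplify(t1), simplify(t2), constraints)

def take_single(triples):
    """Stand-in for S: each set here holds exactly one triple."""
    (triple,) = triples
    return triple

# Made-up data for illustration only.
t1 = {("Bill", "biologicalMother", "Mary")}
t2 = {("Bill", "biologicalMother", "Claura")}
conflict = in_conflict(t1, t2, take_single, [unique_biological_mother])
print(conflict)
```

Expert constraints (E) would slot into the same `constraints` list as further functions of the pair.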


Defining Rules for Simplification


Types of Conflicts

• Property Assertion

• Class Assertion

• Statement Assertion


Types of Conflicts: Property Assertion

Establish constraints on properties

- based on the semantics of their intended/expected use

- thus, subjective

Examples:

‘asymmetric’ constraint

‘disjoint’ constraint
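
The two example property constraints can be checked directly over a set of triples. The following is a sketch only: the helper names are hypothetical, the choice of `fatherOf` as asymmetric and of `fatherOf`/`motherOf` as disjoint is an assumption, and the data is invented.

```python
# Hypothetical sketch of the 'asymmetric' and 'disjoint' property constraints.

def asymmetric_violations(triples, prop):
    """Pairs (a, b) where both (a prop b) and (b prop a) are asserted."""
    pairs = {(s, o) for (s, p, o) in triples if p == prop}
    # a <= b reports each offending pair once (and catches self-loops).
    return {(a, b) for (a, b) in pairs if (b, a) in pairs and a <= b}

def disjoint_violations(triples, prop_a, prop_b):
    """Pairs (s, o) linked by both of two properties declared disjoint."""
    linked_a = {(s, o) for (s, p, o) in triples if p == prop_a}
    linked_b = {(s, o) for (s, p, o) in triples if p == prop_b}
    return linked_a & linked_b

# Invented data for illustration only.
triples = {
    ("John", "fatherOf", "Bill"),
    ("Bill", "fatherOf", "John"),
    ("Mary", "motherOf", "Bill"),
    ("Mary", "fatherOf", "Bill"),
}
asym = asymmetric_violations(triples, "fatherOf")
disj = disjoint_violations(triples, "fatherOf", "motherOf")
print(asym, disj)
```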


Types of Conflicts: Class Assertion

Establish constraints on classes

- based on the semantics of their intended/expected use

- also, subjective

Examples:

‘disjoint’ classes

(schema or instances)
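
At the instance level, a ‘disjoint’ class constraint amounts to checking that no instance is typed with both classes. A minimal sketch, assuming a hypothetical `disjoint_class_violations` helper, invented instances, and an assumed disjointness between Person and FinancialOrganization:

```python
# Hypothetical sketch: find instances typed with two classes declared disjoint.

def disjoint_class_violations(instance_types, disjoint_pairs):
    """instance_types maps an instance name to the set of its classes."""
    violations = []
    for instance, classes in sorted(instance_types.items()):
        for (a, b) in disjoint_pairs:
            if a in classes and b in classes:
                violations.append((instance, a, b))
    return violations

# Invented data for illustration only.
instance_types = {
    "John": {"Person"},
    "AcmeBank": {"FinancialOrganization", "Person"},
}
disjoint_pairs = [("Person", "FinancialOrganization")]
found = disjoint_class_violations(instance_types, disjoint_pairs)
print(found)
```

A schema-level check would follow the same pattern over subclass declarations instead of instance types.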


Types of Conflicts: Statement Assertion

• Stating that, under certain conditions, one or more statements are conflicting

• Example: a person cannot be both a superior and a friend to “John” at the same time
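
The “superior and friend” example can be sketched as a check for subjects that assert every predicate in a conflicting set toward the same target. The helper name, the predicate names `superiorOf` and `friendOf`, and the data are all hypothetical.

```python
# Hypothetical sketch of a statement-assertion rule: nobody may hold all of
# the listed relationships toward the same target at once.

def statement_conflicts(triples, target, conflicting):
    """Subjects asserting every predicate in `conflicting` toward `target`."""
    by_subject = {}
    for (s, p, o) in triples:
        if o == target and p in conflicting:
            by_subject.setdefault(s, set()).add(p)
    wanted = set(conflicting)
    return sorted(s for s, preds in by_subject.items() if preds == wanted)

# Invented data for illustration only.
triples = {
    ("Mary", "superiorOf", "John"),
    ("Mary", "friendOf", "John"),
    ("Bill", "friendOf", "John"),
}
offenders = statement_conflicts(triples, "John", {"superiorOf", "friendOf"})
print(offenders)
```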


System Architecture

(Diagram. Components: Semantic Metadata, Ontology, JENA API, SERIALIZER, User Interface, Relationship Ontology, RULES (RuleML), SIMPLIFICATION, CONFIDER API, MANDARAX API, Facts, RULES (RuleML), MANDARAX API, CONFLICT ENGINE.)


Performance Evaluation

• Tested with an ontology of 6K entities and 11K relationships
– subset of SWETO ontology
– domain of computer science publications

• Sample conflict detection of:
– the same paper cannot be published in two different publication venues
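
The sample rule can be sketched as a grouping of venues per paper. This is an illustration only: the `publishedIn` property name, the helper, and the data are hypothetical and are not taken from SWETO or from the authors' rule set.

```python
# Hypothetical sketch: report papers asserted as published in more than one venue.
from collections import defaultdict

def papers_in_multiple_venues(triples, prop="publishedIn"):
    """Map each conflicting paper to the sorted list of its venues."""
    venues = defaultdict(set)
    for (paper, p, venue) in triples:
        if p == prop:
            venues[paper].add(venue)
    return {paper: sorted(v) for paper, v in venues.items() if len(v) > 1}

# Invented data for illustration only.
triples = {
    ("paper1", "publishedIn", "VenueA"),
    ("paper1", "publishedIn", "VenueB"),
    ("paper2", "publishedIn", "VenueA"),
}
conflicts = papers_in_multiple_venues(triples)
print(conflicts)
```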


Conflict Identification Results


Statement Provenance


Performance Evaluation: Triples vs Time

(Chart. X-axis: No of Triples, 0 to 1200. Y-axis: Log (Time in milliseconds), 0 to 8. Data labels: 4.651849588, 5.955244091, 6.508925185, 6.730027336, 6.808724663, 6.875508877, 6.89743981, 7.030853263, 7.075019174, 7.049989218.)

with increase in number of conflicts (500 triples)


Conclusions and Discussion

• Defined types of conflicts
• Described a rule-based approach to identify the conflicts

Findings:
• Scalability limited by other tools (Mandarax)
• Applicable to refining extraction-based approaches for populating ontologies
• Very domain-dependent and subjective method


Comments, Questions, …