Ontology Quality by Detection of Conflicts in Metadata
Budak I. Arpinar, Karthikeyan Giriloganathan, Boanerges Aleman-Meza
LSDIS Lab, Computer Science
University of Georgia, USA
EON’2006, Edinburgh, Scotland, May 22, 2006
Co-located with WWW-2006
Searching and Ranking Documents based on Semantic Relationships, Boanerges Aleman-Meza, ICDE Ph.D. Workshop 2006
Motivation
• Ontologies with over 1 million entities are increasingly appearing
• TAP, SWETO, GlycO, UniProt
• Quality concerns:
– Entity disambiguation
– Which ontologies are available? (i.e., search & ranking)
– Inconsistency checking (i.e., in OWL)
– Conflict detection
… Motivation
• “Representing, identifying, discovering, validating, and exploiting complex relationships are important issues related to realizing the full power of the Semantic Web, and can help close the gap between highly separated information retrieval and decision-making steps” [Sheth, Arpinar & Kashyap 2003]
• “The Web is decentralized, allowing anyone to say anything. As a result, different viewpoints may be contradictory, or even false information may be provided. In order to prevent agents from combining incompatible data or from taking consistent data and evolving it into an inconsistent state, it is important that inconsistencies can be detected automatically” [W3C 2004]
• “… these problems manifest themselves in various ways, including poor recall of available resources and inconsistency of search results. They arise due to errors, omissions and ambiguities in the metadata…” [Currier & Barton 2003]
Our Approach
• Approach: Detection of conflicting relationships
– or conflicts in sequences of relationships
• How? User-defined rules are validated against a populated ontology
– These rules are domain-dependent
• Goal: By detecting conflicting data, a user can take action to improve the quality of the ontology
Example of Conflict Identification
[Figure: graph over the people John, Claura, Bill, and Mary, with edges labeled fatherOf, marriedTo, motherOf, fatherOf, marriedTo, and fatherinLawOf; the diagram is marked CONFLICT]
Few definitions, ‘simplification’
[Figure: diagram with the labels WilliamsChris, RepublicanParty, votedFor, memberOf, and supporterOf]
• An RDF triple is a simplification
• Basically, composing relationships
– Leading to simple relations, yet somewhat arbitrary
Statement Simplification
• There could be simplifications of the form:
statement_1 statement_2 … statement_n → statement_t
• In this case statement_t is a simplification
– this is dependent on expert knowledge
– this is not in the traditional reasoning approach
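The rule form above can be sketched in Python. This is a minimal illustration, not the authors' implementation (the talk builds on Jena, RuleML, and Mandarax). The `Rule` type, the `simplify` function, and the sample data are hypothetical, and the sketch assumes the statements on the left-hand side form a chain in which the object of one statement is the subject of the next.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Rule(NamedTuple):
    """Hypothetical rule: a chain of predicates simplifies to one predicate."""
    chain: tuple[str, ...]
    result: str


def simplify(triples: set[Triple], rule: Rule) -> set[Triple]:
    """Return the simplified statements found by matching rule.chain as a path."""
    # Every (start, end) pair connected by the first predicate of the chain.
    paths = {(t.subject, t.obj) for t in triples if t.predicate == rule.chain[0]}
    # Extend each partial path with the next predicate in the chain.
    for predicate in rule.chain[1:]:
        paths = {
            (start, t.obj)
            for (start, end) in paths
            for t in triples
            if t.predicate == predicate and t.subject == end
        }
    return {Triple(start, rule.result, end) for (start, end) in paths}


# Illustrative data only: the entities and the rule are assumptions,
# chosen by a (hypothetical) domain expert rather than taken from the slides.
triples = {
    Triple("alice", "memberOf", "PartyX"),
    Triple("PartyX", "endorses", "candidateY"),
}
rule = Rule(chain=("memberOf", "endorses"), result="supporterOf")
simplified = simplify(triples, rule)
```

Because the rule encodes expert knowledge rather than a logical entailment, the derived statement is only as trustworthy as the rule itself, which matches the "somewhat arbitrary" caveat on the slides.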
Statement Simplification
[Figure: example graph with the classes Immigrant, FinancialOrganization, JudicialOrganization, BusinessOrganization, and Person; the relationships multipleDeposits, associated, owner, works, and underInvestigation; and the label MoneyLaunderingsuspected]
Using ‘simplification’ for detection of conflict
T: A set of triples
S: A function denoting the process of simplification
s: The result of simplification (S(T) → s)
U: Constraints expressed in an ontology (e.g., the property ‘biologicalMother’ is unique)
E: Constraints supplied by an expert (e.g., person(x) can never do action(y))
Two sets of triples T1 and T2 are in conflict if their simplifications S(T1) → s1 and S(T2) → s2 are mutually non-agreeable.
Two simplifications s1 and s2 are mutually non-agreeable if, taken together, they violate the domain constraints.
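One way to read these definitions in code is sketched below. It is an assumption-laden illustration, not the system described in the talk: `violates_unique` is a hypothetical check for a single kind of ontology constraint U (a property declared unique, such as ‘biologicalMother’), and `in_conflict` applies a caller-supplied simplification function S to two triple sets and tests whether the results, taken together, violate that constraint. Expert constraints E are not modeled here.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def violates_unique(statements: Iterable[Triple], unique_properties: set[str]) -> bool:
    """True if some subject has two different values for a property declared unique."""
    seen: dict[tuple[str, str], str] = {}
    for subject, predicate, obj in statements:
        if predicate not in unique_properties:
            continue
        key = (subject, predicate)
        if key in seen and seen[key] != obj:
            return True
        seen.setdefault(key, obj)
    return False


def in_conflict(
    t1: set[Triple],
    t2: set[Triple],
    simplify: Callable[[set[Triple]], set[Triple]],
    unique_properties: set[str],
) -> bool:
    """T1 and T2 conflict if S(T1) and S(T2), taken together, violate the constraints."""
    s1, s2 = simplify(t1), simplify(t2)
    return violates_unique(s1 | s2, unique_properties)


# Illustrative use: the identity function stands in for S, and the data is made up.
identity = lambda triples: set(triples)
U = {"biologicalMother"}
T1 = {("sam", "biologicalMother", "ann")}
T2 = {("sam", "biologicalMother", "eve")}
T3 = {("sam", "biologicalMother", "ann"), ("sam", "friendOf", "eve")}
```

Here `in_conflict(T1, T2, identity, U)` reports a conflict because the two sets give the same subject different values for a unique property, while `T1` and `T3` agree.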
Defining Rules for Simplification
Types of Conflicts
• Property Assertion
• Class Assertion
• Statement Assertion
Types of Conflicts: Property Assertion
Establish constraints on properties
- based on the semantics of their intended/expected use
- thus, subjective
Examples:
‘asymmetric’ constraint
‘disjoint’ constraint
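The two example constraints could be checked roughly as follows. This is a hedged sketch over plain tuples rather than the RuleML rules the system uses; both function names and the sample facts are hypothetical, and which properties count as asymmetric or disjoint is assumed to come from the user.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def asymmetric_violations(triples: set[Triple], prop: str) -> set[frozenset[str]]:
    """Pairs {a, b} where both (a, prop, b) and (b, prop, a) are asserted."""
    return {
        frozenset((s, o))
        for (s, p, o) in triples
        if p == prop and (o, prop, s) in triples
    }


def disjoint_property_violations(
    triples: set[Triple], prop_a: str, prop_b: str
) -> set[tuple[str, str]]:
    """(subject, object) pairs linked by both properties, which are declared disjoint."""
    linked_a = {(s, o) for (s, p, o) in triples if p == prop_a}
    linked_b = {(s, o) for (s, p, o) in triples if p == prop_b}
    return linked_a & linked_b


# Made-up facts for illustration; they are not the data from the talk.
data = {
    ("john", "fatherOf", "bill"),
    ("bill", "fatherOf", "john"),   # contradicts fatherOf being asymmetric
    ("mary", "motherOf", "bill"),
    ("mary", "sisterOf", "bill"),   # contradicts motherOf and sisterOf being disjoint
}
```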
Types of Conflicts: Class Assertion
Establish constraints on classes
- based on the semantics of their intended/expected use
- also, subjective
Examples:
‘disjoint’ classes (schema or instances)
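A disjoint-class check at the instance level might look like the sketch below. The function name, the instance identifiers, and the choice of which classes are disjoint are all assumptions made for illustration; the schema-level case mentioned above is not covered.

```python
def disjoint_class_violations(
    instance_types: dict[str, set[str]],
    disjoint_pairs: set[frozenset[str]],
) -> dict[str, set[frozenset[str]]]:
    """Map each instance to the disjoint class pairs it belongs to simultaneously."""
    violations = {}
    for instance, classes in instance_types.items():
        # A pair is broken when the instance is typed with both of its classes.
        broken = {pair for pair in disjoint_pairs if pair <= classes}
        if broken:
            violations[instance] = broken
    return violations


# Hypothetical instances; the disjointness declaration is a user-supplied assumption.
types = {
    "acme": {"BusinessOrganization"},
    "x17": {"Person", "BusinessOrganization"},
}
disjoint = {frozenset({"Person", "BusinessOrganization"})}
```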
Types of Conflicts: Statement Assertion
• Stating that under certain conditions, one or more statements are conflicting
• Example: a person cannot be a superior and a friend to “John” at the same time
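The “John” example can be sketched as a check for subjects that assert two given relationships toward the same target. The predicate names `superiorOf` and `friendOf`, the function name, and the facts are hypothetical, and the sketch ignores the temporal aspect ("at the same time") by treating all statements as simultaneous.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def conflicting_statements(
    triples: set[Triple], target: str, predicates: tuple[str, str]
) -> set[str]:
    """Subjects that assert both predicates toward the same target."""
    first, second = predicates
    with_first = {s for (s, p, o) in triples if p == first and o == target}
    with_second = {s for (s, p, o) in triples if p == second and o == target}
    return with_first & with_second


# Made-up facts for illustration.
facts = {
    ("ann", "superiorOf", "John"),
    ("ann", "friendOf", "John"),
    ("raj", "friendOf", "John"),
    ("raj", "superiorOf", "Lee"),
}
```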
System Architecture
[Figure: architecture diagram with the components Semantic Metadata, Ontology, JENA API, Serializer, User Interface, Relationship Ontology, Rules (RuleML), Simplification, CONFIDER API, MANDARAX API, Facts, Rules (RuleML), MANDARAX API, and Conflict Engine]
Performance Evaluation
• Tested with an ontology of 6K entities and 11K relationships
– subset of SWETO ontology
– domain of computer science publications
• Sample conflict detection of:
– the same paper must not be published in two different publication venues
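The sample rule can be illustrated with a short grouping pass. This is only a sketch of the constraint, not of the Mandarax-based engine used in the evaluation; the function name and the records are hypothetical, and it assumes paper identifiers have already been disambiguated so that equal identifiers mean the same paper.

```python
from collections import defaultdict


def papers_with_multiple_venues(
    published_in: list[tuple[str, str]],
) -> dict[str, set[str]]:
    """Return each paper that is recorded with more than one distinct venue."""
    venues: dict[str, set[str]] = defaultdict(set)
    for paper, venue in published_in:
        venues[paper].add(venue)
    return {paper: found for paper, found in venues.items() if len(found) > 1}


# Made-up (paper, venue) records; repeated identical records are not conflicts.
records = [
    ("paper-1", "VenueA"),
    ("paper-2", "VenueB"),
    ("paper-1", "VenueC"),
    ("paper-2", "VenueB"),
]
```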
Conflict Identification Results
Statement Provenance
Performance Evaluation
Triples vs Time, with increase in number of conflicts (500 triples)
[Figure: plot of log(time in milliseconds), y-axis 0 to 8, against number of triples, x-axis 0 to 1200. Data labels in extracted order: 4.651849588, 5.955244091, 6.508925185, 6.730027336, 6.808724663, 6.875508877, 6.89743981, 7.030853263, 7.075019174, 7.049989218]
Conclusions and Discussion
• Defined types of conflicts
• Described a rule-based approach to identify the conflicts
Findings:
• Scalability limited by other tools (Mandarax)
• Applicable to refining extraction-based approaches for populating ontologies
• Very domain-dependent and subjective method
Comments, Questions, …