CoopIS’2001, Trento, Italy
The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
Motivating example
Preliminaries
Definition: An ontology is an explicit representation of a conceptualization. (Gruber 1993)
Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
Observation: Applications in a given domain use different ontology representations.
Conjecture II: Given an application A that utilizes an ontology representation O_A, and an ontology O, there exists an invertible mapping f_A such that
f_A(O_A) = O
Problem description
Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that
f_BA(O_B) = O_A
In a perfect world:
– O is known.
– f_A is known.
– f_B is known.
Then O_A = f_A⁻¹(f_B(O_B)).
Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– f_A and f_B are unknown: lack of documentation, the mental state of a designer, etc.
Proposed solution
Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that:
– f_BA depends on the ontology representation.
– A matching is associated with a degree of confidence:
  • 0 identifies non-matching terms.
  • 1 identifies a crisp matching.
f_BA: O_B × O_A → [0, 1]
Ontology representation
Dynamic information seeking: HTML forms
• Labels
• Input fields
• Scripts
Assumptions:
• Labels represent terms in an ontology (e.g., Pick-up Date).
• Input fields provide constraints on the value domains (e.g., Day: {1, …, 31}).
• Scripts, among other things, suggest a precedence relationship (e.g., a Pick-up Location is required before selecting a Car Type).
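The assumptions above can be illustrated with a minimal sketch: extracting label terms and select-option value domains from an HTML form using Python's `html.parser`. The class, the sample form, and all names here are invented for illustration and are not the authors' tool.

```python
from html.parser import HTMLParser

# Minimal sketch: collect candidate ontology terms (labels) and
# value domains (select options) from an HTML form.
class FormOntologyParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = []       # label texts -> candidate ontology terms
        self.domains = {}     # term -> list of option values (value domain)
        self._in_label = False
        self._in_select = False

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._in_label = True
        elif tag == "select":
            self._in_select = True
            # attach the domain to the most recently seen label
            self.domains[self.terms[-1] if self.terms else "?"] = []
        elif tag == "option" and self._in_select:
            value = dict(attrs).get("value")
            if value is not None:
                key = self.terms[-1] if self.terms else "?"
                self.domains[key].append(value)

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "select":
            self._in_select = False

    def handle_data(self, data):
        if self._in_label and data.strip():
            self.terms.append(data.strip())

# Invented sample form in the style of a car-rental site:
html = """
<form>
  <label>Pick-up Date</label>
  <select name="day">
    <option value="1">1</option>
    <option value="2">2</option>
  </select>
</form>
"""
parser = FormOntologyParser()
parser.feed(html)
print(parser.terms)    # ['Pick-up Date']
print(parser.domains)  # {'Pick-up Date': ['1', '2']}
```

A real extractor must also handle labels given as bare text next to a field, tables used for layout, and script-driven dependencies; this sketch covers only the simplest label/select pairing.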
Ontology representation
Conceptual modeling approach, based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
Ontology extraction and matching
[Architecture diagram] Starting from a URL (e.g., http://www.avis.com), extraction and matching proceed in four phases:
– Phase 1 (Parsing): the HTML page is parsed into a DOM tree; its FORM elements are extracted and rendered.
– Phase 2 (Labeling): labels are identified for the HTML form elements, guided by labeling rules.
– Phase 3 (Ontology): a candidate ontology is created, with the help of a thesaurus.
– Phase 4 (Merging): matching algorithms merge the target and candidate ontologies; the refined ontology is submitted to the knowledge base (KB).
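The data flow through the four phases can be sketched as a chain of functions. Every name and stub body below is invented purely to illustrate the flow, not the authors' implementation.

```python
# Stubbed sketch of the four-phase data flow; all names are invented.
def parse(url: str) -> dict:
    """Phase 1: fetch the page, build a DOM tree, extract FORM elements."""
    return {"url": url, "fields": ["pickup_date"]}

def label_fields(dom: dict) -> dict:
    """Phase 2: attach human-readable labels using labeling rules."""
    dom["labels"] = {"pickup_date": "Pick-up Date"}
    return dom

def create_ontology(dom: dict) -> set:
    """Phase 3: turn labeled fields into candidate ontology terms."""
    return set(dom["labels"].values())

def merge(candidate: set, target: set) -> set:
    """Phase 4: merge the candidate into the target ontology (KB)."""
    return candidate | target

target = {"Return Date"}
refined = merge(create_ontology(label_fields(parse("http://www.avis.com"))), target)
print(sorted(refined))  # ['Pick-up Date', 'Return Date']
```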
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date → date; Pickup → pickup
– Ignorable-characters removal: *Country → country
– De-hyphenation: Pick-up → Pickup; Pickup → Pick up
– Stop-term removal: Date of Return → Return Date (stop terms: a, to, do, does, the, in, or, and, this, those, that, … etc.)
– Substring matching: Pickup Location Code ↔ Pick-up Location (66%)
– Content matching: Dropoff Day (1,…,31) ↔ Return Day (1,…,31) (100%), hence Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
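A few of these heuristics can be sketched in code. The stop-term list (extended with "of", since the slide's list ends in "etc."), the regular expression, and the scoring rule are simplified guesses, not the authors' implementation.

```python
import re

# Simplified stop-term list from the slide; "of" added so that
# "Date of Return" reduces to "Return Date" as in the example.
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}

def normalize(label):
    """Textual matching (case-folding), ignorable-character removal,
    de-hyphenation, and stop-term removal, applied in sequence."""
    label = re.sub(r"[^A-Za-z0-9\s-]", "", label)  # drop ignorable chars, e.g. '*'
    label = label.replace("-", "")                 # de-hyphenation: Pick-up -> Pickup
    return [w for w in label.lower().split() if w not in STOP_TERMS]

def substring_confidence(a, b):
    """Crude substring-matching confidence in [0, 1]: shared
    normalized words over the longer term's word count."""
    wa, wb = normalize(a), normalize(b)
    if not wa or not wb:
        return 0.0
    shared = len(set(wa) & set(wb))
    return shared / max(len(wa), len(wb))

print(substring_confidence("Pickup Location Code", "Pick-up Location"))  # ~0.66
print(substring_confidence("Date of Return", "Return Date"))             # 1.0
```

The first call reproduces the 66% confidence quoted for Pickup Location Code ↔ Pick-up Location; the second shows stop-term removal turning Date of Return and Return Date into a crisp match.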
Phase 4: Merging
Preliminary Results
Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992): Recall (completeness) and Precision (soundness).
Parameters:
– t_r: number of terms retrieved
– t_m: number of terms matched
– t_e: number of terms effectively matched
Recall: R = t_m / t_r
Precision: P = t_e / t_m
Preliminary Results
Example:
– # of terms in Ontology1: 20
– # of matches identified: 15 → Recall = 15/20 = 75%
– # of effective matches: 10 → Precision = 10/15 ≈ 66%
A third metric is used to compare recall and precision. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):
E = 1 − (1 + b²)PR / (b²P + R)
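The worked example can be checked directly. With b = 0.5 (the value used in the charts), R = 0.75 and P ≈ 0.67 give E ≈ 0.318, matching the "Substring Names" row of the results table. The function names are illustrative.

```python
def recall(t_m, t_r):
    """R = t_m / t_r (terms matched over terms retrieved)."""
    return t_m / t_r

def precision(t_e, t_m):
    """P = t_e / t_m (effective matches over terms matched)."""
    return t_e / t_m

def e_metric(p, r, b=0.5):
    """Combined metric E = 1 - (1 + b^2)PR / (b^2 P + R); lower is better."""
    return 1.0 - (1.0 + b * b) * p * r / (b * b * p + r)

R = recall(15, 20)      # 0.75
P = precision(10, 15)   # 0.666...
E = e_metric(P, R)      # 0.318181...
print(R, P, E)
```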
Preliminary Results
[Chart: Precision vs. Recall (Avis & Hertz) — recall and precision per matching strategy]

Strategy          Recall  Precision  E
Textual           0.30    0.33       0.673913
Ign. Chars.       0.30    0.33       0.673913
De-hyph.          0.60    0.33       0.634146
StopTerms         0.60    0.33       0.634146
Substring         0.75    0.40       0.558824
Substring Names   0.75    0.67       0.318182
Content           0.65    0.92       0.148472
Thesaurus         0.65    0.92       0.148472
Preliminary Results
[Chart: E metric for Hertz vs. Alamo (b = 0.5), one series per site, across the strategies Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, and Thesaurus]
Preliminary Results
[Chart: Learning from Thesaurus — E (b = 0.5) with no thesaurus vs. an improved thesaurus; bar values 0.389534884 and 0.479166667]
Summary and Future Work
We have introduced:
– Automatic ontology creation
– An automatic matching process
– Preliminary results
Future work is oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies