CoopIS’2001, Trento, Italy
The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
Motivating example
Preliminaries
Definition: An ontology is an explicit representation of a conceptualization. (Gruber 1993)
Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
Observation: Applications in a given domain use different ontology representations.
Conjecture II: Given an application A that utilizes an ontology representation O_A, and an ontology O, there exists an invertible mapping f_A such that
f_A(O_A) = O
Problem description
Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that
f_BA(O_B) = O_A
In a perfect world:
– O is known.
– f_A is known.
– f_B is known.
Then O_A = f_A⁻¹(f_B(O_B)).
Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– f_A and f_B are unknown: lack of documentation, the mental state of a designer, etc.
Proposed solution
Given two applications A and B, such that A utilizes an ontology representation O_A and B utilizes an ontology representation O_B, introduce a mapping f_BA such that:
– f_BA depends on the ontology representation.
– A matching is associated with a degree of confidence:
  • 0 identifies non-matching terms.
  • 1 identifies a crisp matching.
f_BA: O_B × O_A → [0, 1]
Ontology representation
Dynamic information seeking: HTML forms
• Labels
• Input fields
• Scripts
Assumptions:
• Labels represent terms in an ontology (e.g., Pick-up Date).
• Input fields provide constraints on the value domains (e.g., Day: {1, …, 31}).
• Scripts, among other things, suggest a precedence relationship (e.g., a Pick-up Location is required before selecting a Car Type).
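The assumptions above can be illustrated with a minimal sketch: extracting label terms and select-option value domains from an HTML form using Python's `html.parser`. The class, the sample form, and all names here are invented for illustration and are not the authors' tool.

```python
from html.parser import HTMLParser

# Minimal sketch: collect candidate ontology terms (labels) and
# value domains (select options) from an HTML form.
class FormOntologyParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = []       # label texts -> candidate ontology terms
        self.domains = {}     # term -> list of option values (value domain)
        self._in_label = False
        self._in_select = False

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self._in_label = True
        elif tag == "select":
            self._in_select = True
            # attach the domain to the most recently seen label
            self.domains[self.terms[-1] if self.terms else "?"] = []
        elif tag == "option" and self._in_select:
            value = dict(attrs).get("value")
            if value is not None:
                key = self.terms[-1] if self.terms else "?"
                self.domains[key].append(value)

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
        elif tag == "select":
            self._in_select = False

    def handle_data(self, data):
        if self._in_label and data.strip():
            self.terms.append(data.strip())

# Invented sample form in the style of a car-rental site:
html = """
<form>
  <label>Pick-up Date</label>
  <select name="day">
    <option value="1">1</option>
    <option value="2">2</option>
  </select>
</form>
"""
parser = FormOntologyParser()
parser.feed(html)
print(parser.terms)    # ['Pick-up Date']
print(parser.domains)  # {'Pick-up Date': ['1', '2']}
```

A real extractor must also handle labels given as bare text next to a field, tables used for layout, and script-driven dependencies; this sketch covers only the simplest label/select pairing.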
Ontology representation
Conceptual modeling approach, based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
Ontology extraction and matching
[Architecture diagram] Starting from a URL (e.g., http://www.avis.com), extraction and matching proceed in four phases:
– Phase 1 (Parsing): the HTML page is parsed into a DOM tree; its FORM elements are extracted and rendered.
– Phase 2 (Labeling): labels are identified for the HTML form elements, guided by labeling rules.
– Phase 3 (Ontology): a candidate ontology is created, with the help of a thesaurus.
– Phase 4 (Merging): matching algorithms merge the target and candidate ontologies; the refined ontology is submitted to the knowledge base (KB).
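The data flow through the four phases can be sketched as a chain of functions. Every name and stub body below is invented purely to illustrate the flow, not the authors' implementation.

```python
# Stubbed sketch of the four-phase data flow; all names are invented.
def parse(url: str) -> dict:
    """Phase 1: fetch the page, build a DOM tree, extract FORM elements."""
    return {"url": url, "fields": ["pickup_date"]}

def label_fields(dom: dict) -> dict:
    """Phase 2: attach human-readable labels using labeling rules."""
    dom["labels"] = {"pickup_date": "Pick-up Date"}
    return dom

def create_ontology(dom: dict) -> set:
    """Phase 3: turn labeled fields into candidate ontology terms."""
    return set(dom["labels"].values())

def merge(candidate: set, target: set) -> set:
    """Phase 4: merge the candidate into the target ontology (KB)."""
    return candidate | target

target = {"Return Date"}
refined = merge(create_ontology(label_fields(parse("http://www.avis.com"))), target)
print(sorted(refined))  # ['Pick-up Date', 'Return Date']
```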
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date → date; Pickup → pickup
– Ignorable-characters removal: *Country → country
– De-hyphenation: Pick-up → Pickup; Pickup → Pick up
– Stop-term removal: Date of Return → Return Date (stop terms: a, to, do, does, the, in, or, and, this, those, that, … etc.)
– Substring matching: Pickup Location Code ↔ Pick-up Location (66%)
– Content matching: Dropoff Day (1,…,31) ↔ Return Day (1,…,31) (100%), hence Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
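A few of these heuristics can be sketched in code. The stop-term list (extended with "of", since the slide's list ends in "etc."), the regular expression, and the scoring rule are simplified guesses, not the authors' implementation.

```python
import re

# Simplified stop-term list from the slide; "of" added so that
# "Date of Return" reduces to "Return Date" as in the example.
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or", "and",
              "this", "those", "that", "of"}

def normalize(label):
    """Textual matching (case-folding), ignorable-character removal,
    de-hyphenation, and stop-term removal, applied in sequence."""
    label = re.sub(r"[^A-Za-z0-9\s-]", "", label)  # drop ignorable chars, e.g. '*'
    label = label.replace("-", "")                 # de-hyphenation: Pick-up -> Pickup
    return [w for w in label.lower().split() if w not in STOP_TERMS]

def substring_confidence(a, b):
    """Crude substring-matching confidence in [0, 1]: shared
    normalized words over the longer term's word count."""
    wa, wb = normalize(a), normalize(b)
    if not wa or not wb:
        return 0.0
    shared = len(set(wa) & set(wb))
    return shared / max(len(wa), len(wb))

print(substring_confidence("Pickup Location Code", "Pick-up Location"))  # ~0.66
print(substring_confidence("Date of Return", "Return Date"))             # 1.0
```

The first call reproduces the 66% confidence quoted for Pickup Location Code ↔ Pick-up Location; the second shows stop-term removal turning Date of Return and Return Date into a crisp match.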
Phase 4: Merging
Preliminary Results
Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992): Recall (completeness) and Precision (soundness).
Parameters:
– t_r: number of terms retrieved
– t_m: number of terms matched
– t_e: number of terms effectively matched
Recall: R = t_m / t_r
Precision: P = t_e / t_m
Preliminary Results
Example:
– # of terms in Ontology1: 20
– # of matches identified: 15 → Recall = 15/20 = 75%
– # of effective matches: 10 → Precision = 10/15 ≈ 66%
A third metric is used to compare recall and precision. For a precision value P, a recall value R and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):
E = 1 − (1 + b²)PR / (b²P + R)
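The worked example can be checked directly. With b = 0.5 (the value used in the charts), R = 0.75 and P ≈ 0.67 give E ≈ 0.318, matching the "Substring Names" row of the results table. The function names are illustrative.

```python
def recall(t_m, t_r):
    """R = t_m / t_r (terms matched over terms retrieved)."""
    return t_m / t_r

def precision(t_e, t_m):
    """P = t_e / t_m (effective matches over terms matched)."""
    return t_e / t_m

def e_metric(p, r, b=0.5):
    """Combined metric E = 1 - (1 + b^2)PR / (b^2 P + R); lower is better."""
    return 1.0 - (1.0 + b * b) * p * r / (b * b * p + r)

R = recall(15, 20)      # 0.75
P = precision(10, 15)   # 0.666...
E = e_metric(P, R)      # 0.318181...
print(R, P, E)
```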
Preliminary Results
[Chart: Precision vs. Recall (Avis & Hertz) — recall and precision per matching strategy]

Strategy          Recall  Precision  E
Textual           0.30    0.33       0.673913
Ign. Chars.       0.30    0.33       0.673913
De-hyph.          0.60    0.33       0.634146
StopTerms         0.60    0.33       0.634146
Substring         0.75    0.40       0.558824
Substring Names   0.75    0.67       0.318182
Content           0.65    0.92       0.148472
Thesaurus         0.65    0.92       0.148472
Preliminary Results
[Chart: E metric for Hertz vs. Alamo (b = 0.5), one series per site, across the strategies Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, and Thesaurus]
Preliminary Results
[Chart: Learning from Thesaurus — E (b = 0.5) with no thesaurus vs. an improved thesaurus; bar values 0.389534884 and 0.479166667]
Summary and Future Work
We have introduced:
– Automatic ontology creation
– An automatic matching process
– Preliminary results
Future work is oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies