Semantic Matching
Fausto Giunchiglia
work in collaboration with Pavel Shvaiko
The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003
Outline
Matching
Syntactic Matching
Semantic Matching
On Implementing Semantic Matching
Conclusions
MATCHING
Application Domains
Generic Model Management
Schema integration
Data warehouses
E-commerce
Semantic query processing
Data Coordination in P2P systems
Matching Problems
1. RDB Schemas
2. OODB Schemas
3. XML Schemas
4. Concept Hierarchies
5. Ontologies
Example of Matching
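[Figure: two classification hierarchies. The www.google.com hierarchy has nodes Arts, Organizations, Art History, Music, Baroque and History; the www.yahoo.com hierarchy has nodes Arts&Humanities, Organizations, Art History, Design Art, Baroque, Architecture and History. Links between the two are annotated with a similarity coefficient (Sc = 1.0) and a similarity relation Sr, whose symbols were lost in extraction.]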
Matching
Match is an operator that takes two graph-like structures (e.g., database schemas or ontologies) and produces a mapping between elements of the two graphs that correspond semantically to each other
Matching
The problem of matching can be decomposed into two steps:
Extract graphs from the data and conceptual models
Match the resulting graphs (generic matching)
Matching
A mapping element is a 4-tuple < mID, $N_1^i$, $N_2^j$, R >, $i = 1, \dots, h$; $j = 1, \dots, k$, where
mID is a unique identifier of the given mapping element;
$N_1^i$ is the i-th node of the first graph, and h is the number of nodes in the first graph;
$N_2^j$ is the j-th node of the second graph, and k is the number of nodes in the second graph;
R specifies a similarity relation between the given nodes
Mapping is a set of mapping elements
Matching is the process of discovering mappings between two graphs through the application of a matching algorithm
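The definitions above map naturally onto a small data structure. The following is a minimal Python sketch, not the authors' implementation; the names MappingElement, m_id, source, target and relation are illustrative assumptions, since the slides define the 4-tuple but no concrete API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# A relation is either a coefficient (syntactic matching) or a relation
# symbol (semantic matching). Both encodings are assumptions of this sketch.
Relation = Union[float, str]


@dataclass(frozen=True)
class MappingElement:
    m_id: str           # unique identifier of the mapping element
    source: str         # a node of the first graph
    target: str         # a node of the second graph
    relation: Relation  # similarity relation R between the two nodes


# A mapping is a set of mapping elements.
Mapping = FrozenSet[MappingElement]

mapping: Mapping = frozenset({
    MappingElement("m1", "telephone", "phone", 0.7),  # coefficient-valued R
    MappingElement("m2", "telephone", "phone", "="),  # relation-valued R
})
```

Making the element immutable lets it live in a set, which matches the definition of a mapping as a set of mapping elements.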
Matching: Syntactic AND Semantic
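Matching divides into Syntactic Matching and Semantic Matching
Syntactic Matching
• R is computed between labels at nodes
• R is a coefficient in [0,1]
Semantic Matching
• R is computed between concepts at nodes
• R is one of a set of set-theoretic relations, e.g., $=$, $\cap$, $\perp$, $\subseteq$, $\supseteq$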
SYNTACTIC MATCHING
Syntactic Matching
A mapping element is a 4-tuple < mID, $L_1^i$, $L_2^j$, R >, where
$L_1^i$ is the label at the i-th node of the first graph;
$L_2^j$ is the label at the j-th node of the second graph;
R specifies a similarity relation in the form of a coefficient, which measures the similarity between the labels of the given nodes
Example: R is a similarity coefficient in [0,1], as in the mapping element
< m21, telephone, phone, 0.7 >
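To make the idea of a label-level coefficient concrete, here is a minimal Python sketch. It uses the standard library's difflib ratio as a stand-in string measure; this is an assumption for illustration only and is not the measure used by Cupid or any other matcher named in these slides. The function name label_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def label_similarity(label1: str, label2: str) -> float:
    """Return a similarity coefficient in [0, 1] for two node labels.

    difflib's ratio is 2*M/T, where M is the number of matched characters
    and T is the total length of both strings.
    """
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


coefficient = label_similarity("telephone", "phone")

# A syntactic mapping element: identifier, two labels, coefficient.
element = ("m21", "telephone", "phone", round(coefficient, 2))
```

A purely string-based measure like this knows nothing about meaning: it scores labels by shared characters only, which is the limitation that semantic matching addresses.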
Example: Cupid (tentative links)
[Figure: the www.google.com and www.yahoo.com hierarchies from the earlier matching example, with the links proposed by Cupid annotated with similarity coefficients Sc of 0.7, 0.9 and 1.0. A second version of the figure is labelled "(final result)". The pairing of coefficients with links is not recoverable from the transcript.]
The State of the Art
Cupid is a hybrid matching prototype. It exploits linguistic and structural schema matching heuristics and computes similarity coefficients between nodes of the trees.
Similarity Flooding is a hybrid matching prototype. It uses a fixpoint computation to determine correspondences between nodes of the graphs.
COMA is a composite matching prototype. It provides an extensible library of matchers that operate on DAGs and supports various ways of combining their results.
As far as we know, only syntactic matching has been done so far.
SEMANTIC MATCHING
Semantic Matching
A mapping element is a 4-tuple < mID, $C_1^i$, $C_2^j$, R >, where
$C_1^i$ is the concept of the i-th node of the first graph;
$C_2^j$ is the concept of the j-th node of the second graph;
R specifies a similarity relation in the form of a semantic relation between the extensions of the concepts at the given nodes
Possible R's: equality {$=$}, overlapping {$\cap$}, mismatch {$\perp$}, more general/specific {$\supseteq$, $\subseteq$}
Example: < m21, telephone, phone, {=} >
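Because R is defined over the extensions of concepts, the five relations can be illustrated directly with sets. The sketch below is a hedged illustration, not the authors' method: the function name semantic_relation, the relation names it returns, and the toy extensions are all assumptions.

```python
def semantic_relation(ext1: set, ext2: set) -> str:
    """Classify two concept extensions with the relations named in the slide."""
    if ext1 == ext2:
        return "equal"          # equality
    if ext1 <= ext2:
        return "more specific"  # first extension is contained in the second
    if ext1 >= ext2:
        return "more general"   # first extension contains the second
    if ext1 & ext2:
        return "overlapping"    # some, but not all, instances are shared
    return "mismatch"           # disjoint extensions


# Toy extensions: the instance names are made up for illustration.
telephone = {"desk set", "mobile"}
phone = {"desk set", "mobile"}
relation = semantic_relation(telephone, phone)
```

The order of the tests matters: equality is checked before containment, and containment before overlap, so that each pair gets the strongest relation that holds.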
Examples: Analysis of Siblings
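[Figure: two is-a hierarchies over the labels A, B, C, D and E, each rooted at A, with nodes numbered 1 to 5. The children of A appear in opposite order in the two graphs. The label C sits directly below A at node 5 of the first graph and at node 2 of the second.]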
Suppose that we want to match nodes $5_1$ and $2_2$ (node 5 of the first graph and node 2 of the second)
Cupid: R = 0.8. This is because $A_1 = A_2$, $C_1 = C_2$, and we have the same structures on both sides (the order of links does not matter)
A semantic matching approach compares the concept $A_1 \wedge C_1$ with $A_2 \wedge C_2$ and produces $C_{5_1} = C_{2_2}$
Examples: Analysis of Ancestors. Case 1
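Suppose that we want to match nodes $5_1$ and $1_2$ (node 5 of the first graph and node 1 of the second)
[Figure: two is-a hierarchies over the labels A, B, C, D and E with differently shaped trees. In the first graph, node 5 carries the label C below the root A; in the second graph, node 1 carries the label C.]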
Cupid does not find a similarity coefficient between the nodes under consideration, due to the significant differences in structure of the given graphs
Semantic matching: the concept denoted by the label at node $5_1$ is $C_1$, while the concept at node $5_1$ is $C_{5_1} = A_1 \wedge C_1$. The concept at node $1_2$ is $C_{1_2} = C_2$. Thus, $C_{5_1} \subseteq C_{1_2}$
Examples: Analysis of Ancestors. Case 2
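Suppose that we want to match nodes $5_1$ and $5_2$ (node 5 in each graph)
[Figure: two is-a hierarchies. In the first graph, node 5 carries the label C directly below the root A. In the second graph, node 5 carries the label C below the root A, with unspecified intermediate material, marked * and …, in between.]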
Cupid: R = 0.86. This is because of the identity of labels: $A_1 = A_2$, $C_1 = C_2$
Semantic matching: the concept at node $5_1$ is $C_{5_1} = A_1 \wedge C_1$, while the concept at node $5_2$ is $C_{5_2} = A_2 \wedge * \wedge C_2$.
Since $A_1 = A_2$ and $C_1 = C_2$, we have $C_{5_2} \subseteq C_{5_1}$
Examples: Enriched Analysis of Siblings
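Suppose that we want to match nodes $2_1$ and $2_2$ (node 2 in each graph)
[Figure: the first graph has root World (node 1) with three is-a children, Benelux (node 2), Netherlands and Luxembourg; the second graph has root World (node 1) with one is-a child, Belgium (node 2).]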
Cupid: R = 0.68. This is mainly because of the thesaurus entry specifying Belgium as a part of Benelux, and because the nodes with labels $\mathit{Benelux}_1$ and $\mathit{Belgium}_2$ are leaves
Semantic matching: we treat $C_{2_1}$ as $\mathit{Benelux}_1 \wedge \neg \mathit{Netherlands}_1 \wedge \neg \mathit{Luxembourg}_1 = \mathit{Belgium}$.
Thus, $C_{2_1} = C_{2_2}$
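One way to read the treatment of the Benelux node is as a set difference over extensions: Benelux with the parts already covered by its siblings removed. The Python sketch below illustrates that reading only; the extensions are toy sets chosen for the example, and the variable names are assumptions.

```python
# Toy extensions: each concept is modelled as the set of countries it covers.
benelux = {"Belgium", "Netherlands", "Luxembourg"}
netherlands = {"Netherlands"}
luxembourg = {"Luxembourg"}
belgium = {"Belgium"}

# Concept at node 2 of the first graph: Benelux, excluding what its
# sibling nodes Netherlands and Luxembourg already cover.
concept_2_1 = benelux - netherlands - luxembourg

# Concept at node 2 of the second graph.
concept_2_2 = belgium

same_concept = concept_2_1 == concept_2_2
```

Under this reading the two nodes denote the same concept, even though their labels, Benelux and Belgium, differ.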
ON
IMPLEMENTING
SEMANTIC MATCHING
On Implementation
Semantic Matching
Structure-level
Element-level
    Weak Semantics Techniques
    Strong Semantics Techniques
Element-level Semantic Matching
Weak Semantics Techniques
Analysis of strings {=}: <phone, telephone, {=}>
Analysis of data types {$=$, $\cap$, $\perp$, $\subseteq$, $\supseteq$}: <string, integer, {$\perp$}>, <integer, real, {$\subseteq$}>
Analysis of soundex {=}: <Fausto, Phausto, {=}>
Strong Semantics Techniques
Precompiled thesaurus, syn key: <Discount, Rebate, {=}>
WordNet: <Art_#1, Humanities_#1, {[symbol lost]}>, where #1 denotes sense number 1 of the word Art according to WordNet
Element-level Semantic Matching (cont.)
Semantic Relations via WordNet
Equality: one concept is equal to another if there is at least one sense of the first concept that is a synonym of the second
Overlapping: one concept overlaps with another if they have some senses in common
Mismatch: two concepts are mismatched if they have no sense in common
More general: one concept is more general than the other iff there exists at least one sense of the first concept that has a sense of the other as a hyponym or meronym
Less general: one concept is less general than the other iff there exists at least one sense of the first concept that has a sense of the other concept as a hypernym or holonym
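The rules above can be sketched over a toy lexicon. The Python example below does not call WordNet: the dictionaries SENSES and NARROWER, the sense identifiers, and the function wordnet_relation are all hypothetical stand-ins. It also simplifies in two ways that should be read as assumptions: a shared sense is reported as equality (the overlapping case is not modelled separately), and only direct hyponym or meronym links are followed.

```python
# Hypothetical mini-lexicon: word -> set of made-up sense identifiers.
SENSES = {
    "discount": {"price_cut"},
    "rebate": {"price_cut"},
    "vehicle": {"vehicle_1"},
    "car": {"car_1"},
    "wheel": {"wheel_1"},
    "music": {"music_1"},
}

# Direct "narrower" links of a sense: its hyponyms and meronyms.
NARROWER = {
    "vehicle_1": {"car_1"},  # car is a hyponym of vehicle
    "car_1": {"wheel_1"},    # wheel is a meronym of car
}


def wordnet_relation(word1: str, word2: str) -> str:
    """Apply the slide's rules to two words of the toy lexicon."""
    senses1, senses2 = SENSES[word1], SENSES[word2]
    if senses1 & senses2:
        return "equal"  # the two words share a sense, i.e. are synonyms
    if any(NARROWER.get(s, set()) & senses2 for s in senses1):
        return "more general"  # a sense of word2 is narrower than one of word1
    if any(NARROWER.get(s, set()) & senses1 for s in senses2):
        return "less general"  # a sense of word1 is narrower than one of word2
    return "mismatch"


relations = {
    ("discount", "rebate"): wordnet_relation("discount", "rebate"),
    ("vehicle", "car"): wordnet_relation("vehicle", "car"),
    ("car", "vehicle"): wordnet_relation("car", "vehicle"),
    ("car", "music"): wordnet_relation("car", "music"),
}
```

A real implementation would replace the two dictionaries with lookups in a lexical database and would need a policy for words with several senses that support different relations.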
Structure-level Semantic Matching
We translate the matching problem, namely the two graphs (in particular, the pair of nodes submitted to matching) into a propositional formula and then check for its validity
We check validity using SAT: a formula is valid if its negation is unsatisfiable
Semantic Matching Algorithm
1. Extract the two graphs
2. Compute element-level semantic matching
3. Compute concepts at nodes
4. Construct the propositional formula
5. Run SAT
6. Perform iterations
Semantic Matching Algorithm: Example – (1)
Extract the two graphs
[Figure: the two is-a hierarchies over the labels A, B, C, D and E used in the analysis of ancestors, case 1, with nodes numbered 1 to 5 in each graph.]
• In the case of RDB, XML and OODB schemas, it is necessary to extract useful semantic information, for instance in the form of ontologies
Semantic Matching Algorithm: Example – (2)
Element-level semantic matching. For each node, compute the semantic relations holding among all the concepts denoted by the labels at the nodes under consideration
[Figure: the two is-a hierarchies over the labels A, B, C, D and E from step (1).]
$A_1 = A_2$
$B_1 = B_2$
$C_1 = C_2$
$D_1 = D_2$
$E_1 = E_2$
Semantic Matching Algorithm: Example – (3)
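Compute concepts at nodes. Suppose we want to find the semantic relation between nodes $5_1$ and $1_2$
[Figure: the two is-a hierarchies from step (1), with a question mark on the link between node $5_1$ and node $1_2$.]
$C_{1_1} = A_1$
$C_{5_1} = A_1 \wedge C_1$
$C_{1_2} = C_2$
$C_{5_1} \subseteq C_{1_2}$ (to be verified)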
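The concept at a node is built from the labels on the path from the root down to that node. The sketch below shows that construction in Python. The graph fragments are assumptions: only the facts that node 5 of the first graph carries C below a root A, and that node 1 of the second graph carries C, are taken from the example. The function name concept_at_node and the " & " spelling of conjunction are also illustrative.

```python
from typing import Dict, Optional, Tuple

# node id -> (label, parent id or None for the root)
Graph = Dict[int, Tuple[str, Optional[int]]]

# Hypothetical fragments of the two graphs; the rest of each tree is omitted.
GRAPH_1: Graph = {1: ("A", None), 5: ("C", 1)}
GRAPH_2: Graph = {1: ("C", None)}


def concept_at_node(graph: Graph, node: int, graph_index: int) -> str:
    """Return the conjunction of the labels from the root down to the node."""
    labels = []
    current: Optional[int] = node
    while current is not None:
        label, parent = graph[current]
        labels.append(f"{label}{graph_index}")
        current = parent
    return " & ".join(reversed(labels))


c_5_1 = concept_at_node(GRAPH_1, 5, 1)
c_1_1 = concept_at_node(GRAPH_1, 1, 1)
c_1_2 = concept_at_node(GRAPH_2, 1, 2)
```

Walking parent links upward and reversing keeps the function independent of how children are stored, which suits trees whose sibling order carries no meaning.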
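Semantic Matching Algorithm: Example – (4)
Construct the propositional formula. We translate all the semantic relations computed in step 2 into propositional formulas under the following rules:
$A_1 \supseteq A_2$ becomes $A_2 \rightarrow A_1$
$A_1 \subseteq A_2$ becomes $A_1 \rightarrow A_2$
$A_1 = A_2$ becomes $A_1 \leftrightarrow A_2$
$A_1 \perp A_2$ becomes $\neg(A_1 \wedge A_2)$
[Figure: the two is-a hierarchies from step (1), with a question mark on the link between node $5_1$ and node $1_2$.]
From step 2 we have: $C_1 = C_2$, that is, $C_1 \leftrightarrow C_2$
We want to prove that $C_{5_1} \subseteq C_{1_2}$ (we guess the relation between the nodes at this stage)
$(A_1 \wedge C_1) \rightarrow C_2$
$(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$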
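The translation step can be sketched as a small lookup from relation to formula template. The Python below is an illustration under assumptions: it reads the four rules as superset, subset, equality and mismatch becoming implication (reversed), implication, biconditional and negated conjunction, and it writes formulas as plain strings with the ASCII connectives ->, <->, & and ~. The function name translate and the relation names are hypothetical.

```python
def translate(left: str, relation: str, right: str) -> str:
    """Translate a semantic relation between two concepts into a formula string."""
    rules = {
        "superset": f"({right} -> {left})",  # left is more general
        "subset": f"({left} -> {right})",    # left is more specific
        "equal": f"({left} <-> {right})",    # same extension
        "mismatch": f"~({left} & {right})",  # disjoint extensions
    }
    return rules[relation]


# Background knowledge from element-level matching: C1 = C2.
axiom = translate("C1", "equal", "C2")

# Guessed relation between the concepts at the two nodes: (A1 & C1) within C2.
goal = translate("(A1 & C1)", "subset", "C2")

# The formula whose validity is to be checked: axiom implies goal.
formula = f"({axiom} -> {goal})"
```

Keeping the background relations and the guessed relation separate makes it easy to retry with a different guess while reusing the same axioms.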
Semantic Matching Algorithm: Example – (5)
Run SAT
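In order to prove that $(C_1 \leftrightarrow C_2) \rightarrow ((A_1 \wedge C_1) \rightarrow C_2)$ is valid, we prove that its negation is unsatisfiable:
$(C_1 \leftrightarrow C_2) \wedge \neg((A_1 \wedge C_1) \rightarrow C_2)$
SAT returns FALSE
Thus, $C_{5_1} \subseteq C_{1_2}$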
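With only three propositional variables, the unsatisfiability check can be reproduced by enumerating all truth assignments. The Python sketch below does exactly that; it is a brute-force stand-in for a SAT solver, not the solver used by the authors, and the function names are assumptions.

```python
from itertools import product
from typing import Callable


def implies(p: bool, q: bool) -> bool:
    """Material implication."""
    return (not p) or q


def negated_formula(a1: bool, c1: bool, c2: bool) -> bool:
    """(C1 <-> C2) and not ((A1 and C1) -> C2)."""
    return (c1 == c2) and not implies(a1 and c1, c2)


def satisfiable(formula: Callable[..., bool], n_vars: int) -> bool:
    """Return True if some truth assignment makes the formula true."""
    return any(formula(*values) for values in product([False, True], repeat=n_vars))


# The negation has no satisfying assignment, so the original formula is valid.
negation_is_satisfiable = satisfiable(negated_formula, 3)
```

Enumeration costs 2^n evaluations for n variables, which is fine for a slide-sized example; larger matching problems are where a real SAT solver earns its place.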
Semantic Matching Algorithm: Example – (6.1)
Iterations. Iterations are performed by re-running SAT
[Figure: several small is-a hierarchies with roots labelled A, F and G and children among B, C and D, nodes numbered 1 to 4; the layout is not recoverable from the transcript.]
Suppose that $C_{2_1}$ [symbol lost] $C_{2_2}$
…an oracle tells us that $A_1 = F_2$ [symbol lost] $G_2$
After this additional analysis we can infer $C_{2_1} = C_{2_2}$
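Semantic Matching Algorithm: Example – (6.2)
Iterations. …to use the result of a previous match
[Figure: two is-a hierarchies rooted at A, with nodes numbered 1 to 5. In the first graph the children of A are C and F; in the second they are C and B. Both graphs also contain nodes D and E.]
Suppose that $F_1$ [symbol lost] $B_2$
Having found that $C_{4_1}$ [symbol lost] $C_{4_2}$,
we can automatically infer that $C_{5_1}$ [symbol lost] $C_{5_2}$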
Example: Cupid vs. Semantic Matching
[Figure: the www.google.com and www.yahoo.com hierarchies from the earlier matching example, with six links annotated by semantic relations instead of Cupid's coefficients; the relation symbols were lost in extraction.]
Conclusions
We have made a rational reconstruction of the major matching problems and articulated them in terms of the more generic problem of matching graphs
We have identified semantic matching as a new approach for performing generic matching
We have proposed an implementation of semantic matching using SAT
Future Work
Extend to a full graph matcher
Study how to extract semantics from schemas
Study how to take into account attributes and instances
Develop an efficient implementation of the system
Test the system thoroughly
References
Project website: http://www.dit.unitn.it/~p2p/
F. Giunchiglia, P. Shvaiko. “Semantic Matching”. Technical Report #DIT-03-013, Trento, 2003. Also to appear in Proc. of ODS at IJCAI-03.
F. Giunchiglia, I. Zaihrayeu. “Making peer databases interact – a vision for an architecture supporting data coordination”. In Proc. of the Conference of Information Agents (CIA 2002), Madrid, 2002.