Serena Sorrentino Label Normalization and Lexical Annotation for Schema and Ontology Matching
Label Normalization and Lexical Annotation for Schema and Ontology Matching
International Doctorate School in Information and Communication Technologies
Università degli Studi di Modena e Reggio Emilia
Serena Sorrentino
XXIII Cycle
Computer Engineering and Science
Advisor: Prof. Sonia Bergamaschi
Co-Advisor: Prof. Sanda Harabagiu
Outline
• Overview
  • Schema Matching
  • Lexical Annotation
  • The MOMIS Data Integration System
  • Open Problems and Contributions
• Semi-Automatic Lexical Annotation
• Schema Label Normalization
• Uncertainty in Automatic Annotation
• Conclusion & Future Work
Schema Matching - Definition
Schema matching is the task of finding the semantic correspondences (mappings) between elements of two schemata.
Input:
• Schema Information: element names, data types, constraints, …
• Instance Information: used to characterize the content and semantics of schema elements
• Auxiliary Information: dictionaries, thesauri, user input, …
Output:
• Match Result: a set of mapping elements, each of which specifies that certain elements of S1 are mapped to certain elements of S2
Lexical Annotation for Schema Matching
Lexical Annotation of schema labels is the explicit assignment of meanings w.r.t. a reference lexical thesaurus (WordNet in our case).
Lexical relationships (inter-schema knowledge):
• SYN (Synonym-of): between two synonym terms
• BT (Broader Term): between two terms where the first generalizes the second (the opposite is NT, Narrower Term)
• RT (Related Term): between two terms that are generally used together in the same context
[S. Bergamaschi, S. Castano, M. Vincini, D. Beneventano. Semantic integration of heterogeneous information sources. DKE Journal, 2001]
Schema derived relationships (intra-schema knowledge):
• BT/NT: from ISA relationships, and from a Foreign Key (FK) in relational sources when it is a Primary Key in both the original and the referenced relation
• RT: from nested elements in XML files and from FKs in relational sources
DBGroup Approach: starting from the “hidden” meanings associated with schema labels (i.e. class and attribute names, also called terms), it is possible to discover lexical relationships among schema elements.
Lexical Annotation - Example
Schema labels: Customer, Client
Candidate meanings (synsets in WordNet):
• someone who pays for goods or services
• a person who seeks the advice of a lawyer
• any computer that is hooked up to a computer network
Lexical annotation: Customer#1 and Client#1 belong to the same synset (Client#2 and Client#3 are the other senses of Client), hence Customer SYN Client.
Lexical relationship discovery follows the WordNet relationships (hypernym, hyponym, meronym, holonym, …):
• SYN: synonym in WordNet
• BT/NT: hypernym/hyponym relationship in WordNet
• RT: meronym relationship (part of) or sibling in WordNet
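The relationship rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus is a tiny hand-written dictionary (every synset id and link is hypothetical, not read from WordNet), hypernymy is followed transitively, and the function name `lexical_relationship` is invented for this example.

```python
# Toy thesaurus: every synset id and link below is hypothetical illustration
# data, not content read from WordNet.
HYPERNYM = {                      # synset -> its direct hypernym
    "customer#1": "consumer#1",
    "consumer#1": "person#1",
    "lawyer_client#1": "person#1",
}
PART_OF = {"wheel#1": "car#1"}    # meronym -> holonym


def ancestors(synset):
    """All hypernyms reachable from synset (following them transitively is an assumption)."""
    found = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        found.append(synset)
    return found


def lexical_relationship(a, b):
    """Relationship between two annotated synsets, or None if no rule applies."""
    if a == b:
        return "SYN"                               # same synset
    if a in ancestors(b):
        return "BT"                                # a generalizes b
    if b in ancestors(a):
        return "NT"                                # a specializes b
    if PART_OF.get(a) == b or PART_OF.get(b) == a:
        return "RT"                                # part-of link
    parent_a, parent_b = HYPERNYM.get(a), HYPERNYM.get(b)
    if parent_a is not None and parent_a == parent_b:
        return "RT"                                # siblings
    return None


# Both labels are annotated with the same (hypothetical) synset id.
annotations = {"Customer": "customer#1", "Client": "customer#1"}
print(lexical_relationship(annotations["Customer"], annotations["Client"]))
```

With these toy annotations the two labels share a synset, so the function reports a SYN relationship between Customer and Client.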
The MOMIS Data Integration System
The MOMIS System (Mediator EnvirOnment for Multiple Information Sources) is an I3 framework designed for the integration of structured and semi-structured data sources.
[Figure: MOMIS architecture. Components shown: wrapping of the sources (RDB, XML, data) into local schemata 1…N; manual and automatic lexical annotation with WordNet synsets; Common Thesaurus generation from lexical, schema derived, inferred and user supplied relationships; clusters generation; global schema generation, producing global classes and mapping tables.]
Open Problems and Contributions: Automatic Lexical Annotation
[Figure: two example schemata, S1 and S2, whose labels include CLIENT, CLIENT_ID, NAME, ADDRESS, COUNTRY, CITY, STREET_ADDRESS, PURCHASE_ORDER, PO_ID, PRODUCT_CODE, QTY, TSP_INFO, INVOCE_NR, PRICE, …]
• Manual Annotation is a tedious and non-scalable task: we need a method to perform Automatic or Semi-automatic Annotation
• Non-Dictionary Words, i.e. Compound Nouns (CNs), abbreviations and acronyms: schema labels need to be normalized
• Fully Automatic Annotation (i.e. “on-the-fly”) is intrinsically uncertain: uncertain annotations need to be dealt with
Word Sense Disambiguation for Semi-Automatic Lexical Annotation
WSD (Word Sense Disambiguation) is the ability to identify the meanings of words in context by a computational technique [R. Navigli, Word sense disambiguation: A survey. ACM Comput. Surv., 2009]
The semi-automatic CWSD (Combined Word Sense Disambiguation) method:
• associates one or more WordNet meanings with each label
• combines two WSD algorithms:
  • SD (Structural Disambiguation) exploits the schema derived relationships
  • WND (WordNet Domains Disambiguation) exploits WordNet Domains [B. Magnini, et al., The role of domain information in Word Sense Disambiguation, Journal of Natural Language Engineering, 2002]
The CWSD method
[Figure: the CWSD workflow. Components shown: automatic wrapping of the sources, which extracts the class and attribute names and the schema derived relationships; the integration designer, who selects the relevant domains; the SD and WND algorithms combined in CWSD; the resulting annotated schemata and lexical relationships; the Common Thesaurus.]
Experimental Evaluation
We evaluated CWSD on a real data set: three levels of a subtree of the Yahoo and Google directories (“society and culture” and “society”, respectively).
WSD Algorithm   Recall   Precision   F-Measure
SD              0.08     0.97        0.15
WND             0.67     0.70        0.68
CWSD            0.74     0.74        0.74
Publications related to CWSD:
• S. Bergamaschi, L. Po, S. Sorrentino. Automatic Annotation in Data Integration Systems. OTM Workshops 2007
• S. Bergamaschi, L. Po, A. Sala, S. Sorrentino. Data source annotation in data integration systems. DBISP2P 2007
Schema Label Normalization
Schema label normalization is the reduction of each label to some standardized form that can be easily recognized.
In our case: the process of abbreviation expansion and CN (Compound Noun) annotation.
[Figure: (a) relationships discovered without schema normalization; (b) relationships discovered with schema normalization. SYN relationships between the two example schemata, involving labels such as PO and PurchaseOrder, are marked in the legend as right, false negative or false positive relationships.]
The Schema Label Normalization method
We propose a semi-automatic schema label normalization method composed of three phases:
1. Selecting the labels to be normalized, tokenizing labels into separate words, and identifying abbreviations and CNs among the tokenized words
2. Abbreviation expansion (covered in Maciej Gawinecki’s presentation)
3. Interpreting CNs and creating new WordNet entries and meanings for the CNs
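As a rough illustration of the tokenization and abbreviation steps, the following Python sketch splits labels on separators and camelCase boundaries and expands tokens found in an abbreviation table. The `DICTIONARY` and `ABBREVIATIONS` contents, and the function names, are hypothetical stand-ins chosen for this example; they do not reproduce the method's actual dictionary lookup or expansion algorithm.

```python
import re

# Hypothetical stand-ins for a dictionary and an abbreviation table.
DICTIONARY = {"purchase", "order", "client", "street", "address",
              "product", "code", "price"}
ABBREVIATIONS = {"po": "purchase order", "qty": "quantity",
                 "nr": "number", "id": "identifier"}


def tokenize(label):
    """Split a label on separators and camelCase boundaries, lowercased."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        # Upper-case runs (acronyms), capitalized or lower-case words, digits.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]


def normalize(label):
    """Expand known abbreviations; unknown non-dictionary tokens are kept as-is."""
    words = []
    for token in tokenize(label):
        if token in DICTIONARY:
            words.append(token)
        else:
            words.extend(ABBREVIATIONS.get(token, token).split())
    return words


print(tokenize("PO_ID"), normalize("PO_ID"))
print(tokenize("PurchaseOrder"), normalize("TSP_INFO"))
```

A label whose normalized form contains more than one word (here, `PO_ID` and `PurchaseOrder`) would then be a candidate CN for the annotation phase.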
CN Annotation
Compound Noun (CN): a term composed of two or more words, called constituents.
Endocentric CNs consist of a head (i.e. the part that contains the basic meaning of the CN) and modifiers, which restrict this meaning. E.g. “delivery company”
Our method can be summed up in four main steps.
CN constituent disambiguation & pruning
1. CN constituent disambiguation
• head and modifiers disambiguation: by applying CWSD
2. Redundant constituent identification and pruning
• Redundant words: words that do not contribute new information, i.e. information that can already be derived from the schema or from the lexical thesaurus
• E.g. for the attribute “company address” of the class “company”, the constituent “company” is not considered, as the relationship holding between a class and its attributes is implicit in the schema
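The schema-derived case of pruning can be sketched as follows. This is a minimal Python illustration, assuming labels are already tokenized; the function name `prune_redundant` and the fallback that keeps the original tokens when everything would be pruned are assumptions of this example, and redundancy derived from the lexical thesaurus is not covered.

```python
def prune_redundant(attribute_tokens, class_tokens):
    """Drop constituents of an attribute label that repeat its class name."""
    class_words = set(class_tokens)
    pruned = [t for t in attribute_tokens if t not in class_words]
    # Assumption: if every constituent is redundant, keep the label unchanged.
    return pruned or list(attribute_tokens)


# The attribute "company address" of the class "company".
print(prune_redundant(["company", "address"], ["company"]))
```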
CN interpretation via semantic relationships
3. CN interpretation: selecting, among a set of predefined semantic relationships, the one that best captures the relationship between the head and the modifier. In our case the set consists of the nine Levi’s relationships (CAUSE, HAVE, MAKE, IN, FOR, ABOUT, USE, BE, FROM) [Levi, J. N., The Syntax and Semantics of Complex Nominals. Academic Press, 1978].
Intuition: the semantic relationship between head and modifier is the same as the one holding between their unique beginners (i.e. the 25 top concepts in the WordNet noun hierarchy), so we manually select the correct Levi’s relationship only for each pair of unique beginners.
[Figure: Company#1 is a hyponym (through Institution#1, …) of the unique beginner Group#1; Act#2 has Transport#1 and Delivery#1 among its hyponyms; the relationship MAKE chosen for the pair of unique beginners is also assigned to Company#1 and Delivery#1.]
Why Levi’s relationships?
• they are suitable for interpreting pairs of unique beginners
• a detailed, fine-grained interpretation is not required in our context
• they can be used during the CN gloss definition
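The intuition about unique beginners amounts to a table lookup, which the Python sketch below illustrates. The hypernym chains and the single table entry are hypothetical toy data modeled on the “delivery company” example, and the function names are invented here; in the method itself the table is filled in manually for pairs of unique beginners.

```python
# Hypothetical hypernym chains (toy data, not read from WordNet).
HYPERNYM = {
    "company#1": "institution#1",
    "institution#1": "group#1",
    "delivery#1": "act#2",
}

# Manually chosen relationship per (head beginner, modifier beginner) pair.
# Only one illustrative entry is given; the full table is not reproduced here.
LEVI_BY_BEGINNERS = {("group#1", "act#2"): "MAKE"}


def unique_beginner(synset):
    """Climb the hypernym chain up to its top concept."""
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
    return synset


def interpret_cn(head, modifier):
    """Levi's relationship for a CN, looked up through the unique beginners."""
    key = (unique_beginner(head), unique_beginner(modifier))
    return LEVI_BY_BEGINNERS.get(key)


# "delivery company": head = company#1, modifier = delivery#1
print(interpret_cn("company#1", "delivery#1"))
```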
Creation of a new WN meaning for a CN
4.a Gloss definition
• Company#1 (head) gloss: an institution created to conduct business
• Delivery#1 (modifier) gloss: the act of delivering or distributing something
• Delivery_Company gloss, obtained by joining the two glosses through the selected relationship (MAKE): an institution created to conduct business make the act of delivering or distributing something
4.b Inclusion of the new CN meaning in WN
[Figure: the new synset Delivery_Company#1 is connected to the synset of Company#1 by a Hypernym/Hyponym relationship and to the synset of Delivery#1 by a Related Term relationship.]
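The gloss construction and the links of the new entry can be sketched as below. This is a hedged Python illustration: the record layout and the function names `compose_gloss` and `new_cn_entry` are hypothetical, and the sketch assumes that the head synset becomes the hypernym of the new meaning while the modifier synset becomes a related term.

```python
def compose_gloss(head_gloss, relation, modifier_gloss):
    """Join the head and modifier glosses through the selected relationship."""
    return f"{head_gloss} {relation.lower()} {modifier_gloss}"


def new_cn_entry(cn, head_synset, head_gloss, modifier_synset, modifier_gloss, relation):
    """Hypothetical record for a new CN meaning to be added to the thesaurus."""
    return {
        "synset": f"{cn}#1",
        "gloss": compose_gloss(head_gloss, relation, modifier_gloss),
        "hypernym": head_synset,          # assumption: the CN specializes its head
        "related_term": modifier_synset,  # assumption: RT link to the modifier
    }


entry = new_cn_entry(
    "Delivery_Company",
    "Company#1", "an institution created to conduct business",
    "Delivery#1", "the act of delivering or distributing something",
    "MAKE",
)
print(entry["gloss"])
```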
Experimental Evaluation
Evaluation over five different data sets (including relational and XML schemata).
Evaluating the lexical annotation process:
                                           Precision   Recall   F-Measure
Lexical Annotation without Normalization   0.78        0.36     0.49
Lexical Annotation with Normalization      0.71        0.66     0.68
Evaluating the discovered lexical relationships:
                                                 Precision   Recall   F-Measure
Relationships discovered without Normalization   0.52        0.47     0.49
Relationships discovered with Normalization      0.79        0.75     0.77
Publications related to Schema Label Normalization:
• S. Sorrentino, S. Bergamaschi, M. Gawinecki, L. Po, Schema Label Normalization for Improving Schema Matching, DKE Journal, 2010.
• S. Sorrentino, S. Bergamaschi, M. Gawinecki, L. Po, Schema Label Normalization for Improving Schema Matching, ER 2009
Uncertainty in Automatic Annotation
In Automatic Lexical Annotation, uncertainty is assessed in terms of probability.
The PWSD (Probabilistic Word Sense Disambiguation) algorithm:
• automatically associates one or more WordNet meanings with schema labels
• automatically assigns to each annotation a probability value that indicates the reliability of the annotation itself
• is based on a probabilistic combination of different WSD algorithms
• uses the Dempster-Shafer theory [Shafer, G., A Mathematical Theory of Evidence, Princeton 1976] to combine the results of the different WSD algorithms
Example
[Figure: PWSD example. Each WSD algorithm (WSD 1, WSD 2, …, WSD N) annotates the schema labels and has its own confidence (WSD Algorithm 1: 70%, WSD Algorithm 2: 60%, WSD Algorithm 3: 50%, …); for each label, a table records which meanings (label#1, label#2, label#3, …) each algorithm selected; the results are combined through the Dempster-Shafer theory.]
Schema Element         Annotation   Prob. Value
Source1.Book           book#1       0.65
Source1.Book           book#3       0.17
Source2.Brochure       brochure#1   0.60
Source2.Book Heading   heading#2    0.48
…
Probabilistic Lexical Relationships
Starting from the probabilistic annotations, PWSD derives a set of probabilistic lexical relationships between schema elements.
[Figure: relationships discovered with WordNet First Sense compared with the probabilistic relationships discovered with PWSD; the probabilities shown range from 0.23 to 0.78.]
Experimental Results
Evaluation on 2 relational schemata of the Amalgam integration benchmark and 3 ontologies from the OAEI’06 benchmark.
Evaluating the lexical annotation process:
WSD method            Precision   Recall   F-Measure
WordNet First Sense   0.75        0.54     0.63
PWSD*                 0.63        0.73     0.68
* Threshold = 0.2
Evaluating the discovered lexical relationships:
WSD method            Precision   Recall   F-Measure
WordNet First Sense   0.80        0.65     0.72
PWSD*                 0.80        0.71     0.75
* Threshold = 0.15
Publications related to PWSD:
• L. Po, S. Sorrentino, Automatic generation of probabilistic relationships for improving schema matching, Information Systems Journal, 2011
• L. Po, S. Sorrentino, S. Bergamaschi, D. Beneventano, Lexical knowledge extraction: an effective approach to schema and ontology matching, ECKM 2009
NORMS and ALA
The Schema Label Normalization functionalities have been implemented in a tool called NORMS (NORMalizer of Schemata), which allows the designer to refine the normalized labels by correcting potential errors [S. Sorrentino, S. Bergamaschi, M. Gawinecki, NORMS: an automatic tool to perform schema label normalization, ICDE 2011]
CWSD and PWSD have been implemented in a tool called ALA (Automatic Lexical Annotator), which has been integrated within the MOMIS System [S. Bergamaschi, L. Po, S. Sorrentino, A. Corni, Dealing with Uncertainty in Lexical Annotation, ERPD 2009]
Conclusion
Automatic and Semi-Automatic methods to perform Label Normalization and Lexical Annotation have been presented:
• CWSD
• Schema Label Normalization
• PWSD
Automatic methods to extract (probabilistic) lexical relationships have been proposed, and their effectiveness in improving schema matching has been shown.
All the methods have been implemented in the context of the MOMIS Data Integration System. However, they can be applied in the general contexts of schema and ontology matching.
Future Work
New research lines:
• inclusion and integration of other knowledge resources for automatic lexical annotation:
  • Domain-specific resources such as domain ontologies, domain thesauri, etc., to address the problem of domain-specific terms in schemata (e.g. the biomedical term “aromatase”, an enzyme involved in the production of estrogen)
  • Generic resources: Wikipedia, dictionaries, etc.
• inclusion of instance-information extraction techniques to improve the automatic annotation and relationship discovery processes and to solve the problem of non-informative schema labels
  • The use of RELEVANT [S. Bergamaschi, C. Sartori, F. Guerra, M. Orsini, Extracting Relevant Attribute Values for Improved Search. IEEE Internet Computing 2007], a tool to extract (and add to the schema) metadata about the relevant instance values of an attribute, is a promising direction
Publications
Journals:
• Po, L. and Sorrentino, S. (2011). Automatic generation of probabilistic relationships for improving schema matching. Information Systems Journal, Special Issue on Semantic Integration of Data, Multimedia, and Services, 36(2):192-208.
• Sorrentino, S., Bergamaschi, S., Gawinecki, M., and Po, L. (2010). Schema label normalization for improving schema matching. DKE Journal, 69(12):1254-1273.
International Conferences and Workshops:
• Sorrentino, S., Bergamaschi, S., and Gawinecki, M. (2011). NORMS: an automatic tool to perform schema label normalization. In Press, Accepted Manuscript (Demo Paper), IEEE International Conference on Data Engineering, ICDE 2011, April 11-16, Hannover.
• Sorrentino, S., Bergamaschi, S., Gawinecki, M., and Po, L. (2009). Schema normalization for improving schema matching. In Proceedings of the 28th International Conference on Conceptual Modeling, ER 2009, Gramado, Brazil, 9-12 November, pages 280-293.
• Beneventano, D., Bergamaschi, S., and Sorrentino, S. (2009). Extending WordNet with compound nouns for semi-automatic annotation in data integration systems. In Proceedings of the IEEE NLP-KE Conference, Dalian, China, 24-27 September 2009.
• Bergamaschi, S., Po, L., Sorrentino, S., and Corni, A. (2009). Dealing with Uncertainty in Lexical Annotation. Revista de Informática Teórica e Aplicada, RITA, ER 2009 Poster and Demonstrations Session, 16(2):93-96.
• Beneventano, D., Orsini, M., Po, L., Antonio, S., and Sorrentino, S. (2009). An ontology-based data integration system for data and multimedia sources. In Proceedings of the Third International Conference on Semantic Computing, IEEE ICSC 2009, Berkeley, CA, USA, September 14-16, pages 606-611. IEEE Computer Society.
• Beneventano, D., Orsini, M., Po, L., and Sorrentino, S. (2009). The MOMIS-STASIS approach for Ontology-Based Data Integration. In Proceedings of the 1st International Workshop on Interoperability through Semantic Data and Service Integration, ISDSI 2009, Camogli (GE), Italy, June 25.
• Po, L., Sorrentino, S., Bergamaschi, S., and Beneventano, D. (2009). Lexical knowledge extraction: an effective approach to schema and ontology matching. Proceedings of the European Conference on Knowledge Management, ECKM 2009, 3-4 September, Vicenza.
• Bergamaschi, S., Po, L., Sala, A., and Sorrentino, S. (2007). Data source annotation in data integration systems. In Proceedings of the Fifth International Workshop on Databases, Information Systems and Peer-to-Peer Computing, DBISP2P, at the 33rd International Conference on Very Large Data Bases (VLDB 2007), University of Vienna, Austria, September 24.
• Bergamaschi, S., Po, L., and Sorrentino, S. (2007). Automatic Annotation in Data Integration Systems. In Proceedings of the OTM Workshops, Portugal, November 27-28.
National Conferences:
• S. Bergamaschi, L. Po, S. Sorrentino, A. Corni, "Uncertainty in data integration systems: automatic generation of probabilistic relationships", VI Conference of the Italian Chapter of AIS, ITAIS 2009, Costa Smeralda, Italy, October 2-3 2009.
• S. Bergamaschi, S. Sorrentino, "Semi-automatic compound nouns annotation for data integration systems", Proceedings of the 17th Italian Symposium on Advanced Database Systems, SEBD 2009, Camogli (Genova), Italy, 21-24 June 2009.
• S. Bergamaschi, L. Po, and S. Sorrentino, "Automatic annotation for mapping discovery in data integration systems", Proceedings of the Sixteenth Italian Symposium on Advanced Database Systems, SEBD 2008, Mondello (Palermo), Italy, 22-25 June 2008 (pp 334-341).
Book Chapters:
• Bergamaschi, S., Beneventano, D., Po, L., Sorrentino, S. (2011). Automatic Schema Mapping through Normalization and Annotation. In Press, in Second Search Computing Workshop: Challenges and Directions, 2010, LNCS State-of-the-Art Survey.
• Bergamaschi S., Po L., Sorrentino S., Corni A., “Uncertainty in data integration systems: automatic generation of probabilistic relationships”, to appear in Management of the Interconnected World (A. D’Atri, M. De Marco, A. Maria Braccini, F. Cariddu eds.), Springer, ISBN/ISSN: 978-3-7908-2403-2, 2010.
Projects
• NeP4B - Networked Peers for Business, MIUR funded research project, FIRB 2005 (2006-2009) (http://www.dbgroup.unimo.it/nep4b)
• STASIS - SofTware for Ambient Semantic Interoperable Services, Project FP6-2005-IST-5-034980 (2006-2008) (http://www.dbgroup.unimo.it/stasis/)
• “Searching for a needle in mountains of data!”, project funded by the Fondazione Cassa di Risparmio di Modena within the Bando di Ricerca Internazionale (2008-2010) (http://www.dbgroup.unimo.it/keymantic)
Thanks for your attention!
Evaluation Measures
TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative
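The formulas on this slide did not survive in the transcript, so the Python sketch below uses the standard definitions of precision, recall and F-measure (the harmonic mean of the two), which is an assumption about what the slide showed. As a sanity check, it recomputes the F-Measure column of the CWSD evaluation table from its precision and recall columns.

```python
def precision(tp, fp):
    """Fraction of the returned items that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of the correct items that are returned."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# (precision, recall) pairs reported for SD, WND and CWSD.
reported = {"SD": (0.97, 0.08), "WND": (0.70, 0.67), "CWSD": (0.74, 0.74)}
for name, (p, r) in reported.items():
    print(name, round(f_measure(p, r), 2))
```

The rounded values agree with the reported F-Measures (0.15, 0.68 and 0.74).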
Unique beginners
• The top-level concepts of the WordNet hierarchy are the 25 unique beginners (e.g. act, animal, artifact, etc.) for WordNet English nouns defined in [Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K., WordNet: An on-line lexical database. International Journal of Lexicography, 1990]
Levi’s relationships set
[Table: Levi’s relationships set, where M = Modifier and H = Head]
[Levi, J. N., The Syntax and Semantics of Complex Nominals. Academic Press, 1978]
Dempster-Shafer theory
The Dempster-Shafer theory is a mathematical theory of evidence that allows evidence from different sources to be combined. Using this theory, for each WSD algorithm we assign a probability mass function m(·) to the set of all possible meanings of the term under consideration.
• The mass functions of the WSD algorithms are combined by using Dempster’s rule of combination
• In the end, to obtain the probability assigned to each meaning, the belief mass assigned to a set of meanings is split among the meanings in that set
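A minimal Python sketch of the two steps above follows. It rests on assumptions not stated on the slide: each algorithm puts its confidence on the set of meanings it selected and the remainder on the whole set of candidate meanings, and a set's mass is split evenly among its meanings. The function names and the toy numbers are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of meanings to a mass; masses sum to 1.
    """
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass falling on the empty set
    if conflict > 1.0 - 1e-12:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def split_evenly(masses):
    """Assumption: the mass of a set is shared equally by its meanings."""
    probabilities = {}
    for meanings, weight in masses.items():
        for meaning in meanings:
            probabilities[meaning] = probabilities.get(meaning, 0.0) + weight / len(meanings)
    return probabilities


frame = frozenset({"book#1", "book#2", "book#3"})            # all candidate meanings
m1 = {frozenset({"book#1"}): 0.7, frame: 0.3}                # algorithm 1, 70% confidence
m2 = {frozenset({"book#1", "book#3"}): 0.6, frame: 0.4}      # algorithm 2, 60% confidence
probabilities = split_evenly(combine(m1, m2))
print(sorted(probabilities.items()))
```

With these toy inputs book#1 receives roughly 0.83, book#3 roughly 0.13 and book#2 roughly 0.04, so the three probabilities sum to 1.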