Automatic Schema Matching Nicole Oldham CSCI 8350 (Semantic Web Course @ Univ of Georgia) Topic...
Automatic Schema Matching
Nicole Oldham, CSCI 8350
(Semantic Web Course @ Univ of Georgia) Topic Presentation
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Schema Matching
• Match: takes two schemas as input and produces a mapping between the elements that correspond to each other semantically
• It is usually performed manually, which is:
- Tedious
- Time consuming
- Error prone
- Expensive
We must automate this process.
Example
• GTE Telecommunications needed to integrate 40 databases with a total of 27,000 elements
• Project planners estimated that manual matching would take 12 person-years
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Various Levels of Heterogeneity
ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
How to deal with Semantic Heterogeneity
1. Standardize: agree on a common representation
2. Translate: create mappings between different schemas
- requires human input and machine reasoning
- mappings can be difficult and expensive
3. Annotate: create relationships between agreed-upon conceptualizations
- requires human input and machine reasoning
- annotation can be difficult and expensive
ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Challenges
• The actual semantics of the involved elements are typically available only from the creators or the documentation, so we must use clues in the schema and data instead
• These clues are often misleading
- E.g., 'Area' can refer to different entities
- E.g., the same entities can have very different names
• Clues are often ambiguous
- E.g., 'Contact-agent': agent name or phone number?
• The matching process can be very costly
- Each element of the schema must be examined to ensure discovery of the best match
• Matching is often subjective, depending on the application
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Where is Schema Matching used?
• Database Application Domains
- Data Integration
- Data Warehousing
- E-Business
- Query Processing
• Semantic Web
- XML/HTML to an Ontology
- Semantic Web Services
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Integration
Problem: Construct a global view from a set of independently constructed schemas (e.g., ontologies)
- Different structures and terminologies
Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Data Warehouses
Problem: Integrating data sources into a data warehouse
- Different formats between the source and the warehouse
Solution: Use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
E-Commerce
Problem: Message translation
- Each trading partner uses its own message format
Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Query Processing
Problem: The terms used in the user's query may be different from those in the database
Solution: Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web Services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of S2
• Mapping Expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
S1 Elements    S2 Elements
Cust           Customer
  C#             CustID
  CName          Company
  FirstName      Contact
  LastName       Phone
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
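A minimal sketch of how two of the mapping elements above could be represented and applied. The `MappingElement` record, the helper functions, and the sample row are hypothetical names and data chosen for illustration; they do not come from the survey.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class MappingElement:
    """One mapping element: some S1 elements related to one S2 element."""
    s1_elements: Tuple[str, ...]
    s2_element: str
    expression: Callable[..., str]  # the mapping expression


def concatenate(first: str, last: str) -> str:
    return f"{first} {last}"


def identity(value: str) -> str:
    return value


mapping = [
    MappingElement(("Cust.FirstName", "Cust.LastName"), "Customer.Contact", concatenate),
    MappingElement(("Cust.CName",), "Customer.Company", identity),
]


def translate(row: Dict[str, str]) -> Dict[str, str]:
    """Apply every mapping expression to an S1 row to build an S2 row."""
    return {
        m.s2_element: m.expression(*(row[name] for name in m.s1_elements))
        for m in mapping
    }


# Illustrative S1 row (made-up values).
s1_row = {
    "Cust.FirstName": "Ada",
    "Cust.LastName": "Lovelace",
    "Cust.CName": "Analytical Engines",
}
s2_row = translate(s1_row)
```

Keeping the expression as a plain function lets the same structure hold both simple 1:1 copies and n:1 combinations.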
Example
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
- Individual Matchers
  - Schema-only
    - Element Level: Linguistic, Constraint, ...
    - Structure Level: Constraint, ...
  - Instance/Contents
    - Element Level: Linguistic, Constraint, ...
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers: Manual Composition, Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, ...
Sample Approaches:
• Name Similarity
• Description Similarity
• Global Namespaces
• Word Frequency
• Group Matching
• Type Similarity
• Key Properties
• Value Pattern and Ranges
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match Cardinality
3. Linguistic Approaches: Name or Description Matching
4. Constraint-Based Approaches
5. Reusing Schema and Matching Information
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
S1 Elements    S2 Elements
Address        CustAddress
  Street         Street
  City           City
  State          USState
  Zip            PostalCode
• Partial Structure Match
S1 Elements    S2 Elements
AccountOwner   Customer
  Name           Cname
  Address        CAddress
  Birthdate      CPhone
  TaxExempt
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
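A minimal sketch of telling a full structure match from a partial one, using the two tables from this slide. The scoring rule (fraction of S1 sub-elements that find some partner in S2), the substring test, and the one-entry synonym table are assumptions made for illustration, not the method described in the survey.

```python
from typing import List

# Assumed auxiliary dictionary: pairs of names treated as synonyms.
SYNONYMS = {frozenset({"zip", "postalcode"})}


def names_match(a: str, b: str) -> bool:
    """Names match if one contains the other or they are listed synonyms."""
    a, b = a.lower(), b.lower()
    return a in b or b in a or frozenset({a, b}) in SYNONYMS


def structure_score(s1_children: List[str], s2_children: List[str]) -> float:
    """Fraction of S1 sub-elements that have some matching S2 sub-element."""
    matched = sum(
        any(names_match(c1, c2) for c2 in s2_children) for c1 in s1_children
    )
    return matched / len(s1_children)


# Address vs. CustAddress: every sub-element finds a partner.
full = structure_score(
    ["Street", "City", "State", "Zip"],
    ["Street", "City", "USState", "PostalCode"],
)

# AccountOwner vs. Customer: only Name and Address find partners.
partial = structure_score(
    ["Name", "Address", "Birthdate", "TaxExempt"],
    ["Cname", "CAddress", "CPhone"],
)
```

A score of 1.0 corresponds to a full structure match; anything between 0 and 1 is a partial match that a real matcher would compare against a threshold.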
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinalities | S1 Element(s) | S2 Element(s) | Matching Expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price*(1+Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
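A short sketch of the n:1 and 1:n matching expressions from the table, written as functions. The function names are hypothetical, and splitting a name on the first space is an assumed convention; the slide does not say how Name is divided.

```python
from typing import Tuple


def cost(price: float, tax_percent: float) -> float:
    """n:1 match: Cost = Price * (1 + Tax / 100)."""
    return price * (1 + tax_percent / 100)


def split_name(name: str) -> Tuple[str, str]:
    """1:n match: derive FirstName and LastName from Name.

    Assumes the value has the form 'First Last'.
    """
    first, _, last = name.partition(" ")
    return first, last


total = cost(100.0, 25.0)
first_name, last_name = split_name("Ada Lovelace")
```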
Complex Matches
• The number of candidate 1:1 matches is bounded by the sizes of the schemas, but the number of functions for combining attributes in a schema is unbounded
• Only a few works on complex matching exist
- Some hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
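The similarity criteria above can be sketched as a cascade that tries the strictest test first. Everything specific here is an assumption for illustration: the tiny synonym and hypernym tables, the crude suffix stripper standing in for a real stemmer, and the score assigned to each tier. The common-substring tier uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

SYNONYMS = {("car", "automobile")}   # assumed, hand-written
HYPERNYMS = {("suv", "car")}         # (specific, general), assumed
SUFFIXES = ("ing", "es", "s")        # crude stand-in for a real stemmer


def stem(word: str) -> str:
    """Strip one common suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; the tier values are arbitrary choices."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if stem(a) == stem(b):
        return 0.9
    if (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 0.8
    if (a, b) in HYPERNYMS or (b, a) in HYPERNYMS:
        return 0.7
    # Fall back to common-substring similarity, scaled below the other tiers.
    return 0.6 * SequenceMatcher(None, a, b).ratio()
```

For example, `name_similarity("Customers", "Customer")` lands in the stemming tier, while `name_similarity("ShipTo", "Ship2")` falls through to the substring tier and gets a lower, non-zero score.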
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn    // employee name
S2: name    // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
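A minimal sketch of the simple end of that spectrum: keyword extraction followed by set overlap. The stop-word list and the use of the Jaccard coefficient are assumptions made for illustration; the slide names no particular measure.

```python
from typing import Set

STOPWORDS = {"of", "the", "a", "an"}  # assumed, deliberately tiny


def keywords(description: str) -> Set[str]:
    """Lower-case the comment and drop stop words."""
    return {w for w in description.lower().split() if w not in STOPWORDS}


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of the keyword sets of two descriptions."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The element names empn and name differ, but their comments agree.
score = description_similarity("employee name", "name of employee")
```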
Constraint Based
• Schemas often contain constraints that define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
- E.g., in E-Commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (figure with three schemas labeled Schema S1, Schema S2, and Schema S):
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
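One way previously determined mappings might be reused is by chaining them through a shared intermediate schema. The sketch below is an assumption about how that could look, not the procedure from the survey; the `compose` helper and all element correspondences in the sample dictionaries are hypothetical.

```python
from typing import Dict


def compose(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Chain A->B and B->C correspondences into candidate A->C matches."""
    return {a: second[b] for a, b in first.items() if b in second}


# Hypothetical, previously confirmed mappings via an intermediate schema.
known_a_to_b = {"Payee": "BillTo", "Recipient": "ShipTo", "Article": "Product"}
known_b_to_c = {"BillTo": "InvoiceParty", "ShipTo": "DeliveryParty"}

# Candidates only: per problem 2, they still need checking in the new domain.
candidates = compose(known_a_to_b, known_b_to_c)
```

Elements with no partner in the second mapping (here `Article`) are simply dropped and would have to be matched by other means.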
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., value ranges
3. Auxiliary information
4. Both rule-based and learner-based techniques are also used
• Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
- element names, data types, structures, and subelements
- E.g., two elements match if they have the same name and the same number of subelements
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
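The example rule on this slide translates almost directly into code. The `Element` class below is a hypothetical minimal schema-element type introduced only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A schema element with a name and a list of subelements."""
    name: str
    children: List["Element"] = field(default_factory=list)


def rule_match(e1: Element, e2: Element) -> bool:
    """Hand-crafted rule: same name and same number of subelements."""
    return e1.name == e2.name and len(e1.children) == len(e2.children)


addr_a = Element("Address", [Element("Street"), Element("City")])
addr_b = Element("Address", [Element("Road"), Element("Town")])
addr_c = Element("Address", [Element("Street")])
```

Note that `addr_a` and `addr_b` satisfy the rule even though their subelements have different names, which shows how coarse a single hand-written rule can be.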
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Require a lot of training data, but can exploit the data itself
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria; better performance
2. Composite: combines the results of independently executed matchers; more flexible; can be done automatically or manually
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
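A minimal sketch of the composite idea: each matcher runs independently and the results are merged afterwards. The two toy matchers, the dictionary representation of an element, and the weighted-average combination are all assumptions made for illustration; the slide does not prescribe how results are combined.

```python
from typing import Callable, Dict, List, Tuple

Element = Dict[str, str]
Matcher = Callable[[Element, Element], float]


def name_matcher(a: Element, b: Element) -> float:
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a: Element, b: Element) -> float:
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a: Element, b: Element,
                    matchers: List[Tuple[Matcher, float]]) -> float:
    """Weighted average of the scores of independently executed matchers."""
    total_weight = sum(weight for _, weight in matchers)
    return sum(weight * m(a, b) for m, weight in matchers) / total_weight


# Weights are arbitrary; swapping matchers in or out needs no other change.
matchers = [(name_matcher, 0.75), (type_matcher, 0.25)]
price = {"name": "Price", "type": "decimal"}
amount = {"name": "Amount", "type": "decimal"}
score = composite_score(price, amount, matchers)
```

Because the matchers share only a call signature, the list can be edited by hand or chosen automatically, which is the flexibility the slide attributes to composite matchers.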
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
- The user must provide application-specific match/mismatch relations
- The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
- categorizes elements based on name, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs a bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
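The last two phases can be sketched as a weighted mean followed by a thresholded choice. This is an illustration under stated assumptions rather than Cupid's actual procedure: the weight `w_struct`, the threshold, the "keep the best partner per S1 element" rule, and the similarity numbers in the sample data are all invented for the example.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def choose_mapping(pairs: Dict[Pair, Tuple[float, float]],
                   threshold: float = 0.5) -> Dict[str, str]:
    """Keep, for each S1 element, its best S2 partner above the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for (e1, e2), (lsim, ssim) in pairs.items():
        wsim = weighted_similarity(lsim, ssim)
        if wsim >= threshold and wsim > best.get(e1, ("", 0.0))[1]:
            best[e1] = (e2, wsim)
    return {e1: e2 for e1, (e2, _) in best.items()}


# Made-up (lsim, ssim) values for three candidate pairs.
pairs = {
    ("Zip", "PostalCode"): (0.5, 1.0),
    ("Zip", "City"): (0.25, 0.5),
    ("Birthdate", "CPhone"): (0.0, 0.5),
}
mapping = choose_mapping(pairs)
```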
Clio (IBM Almaden and Univ of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: used to identify matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
- A tool for semi-automatically marking up web service descriptions with ontologies
- It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithm
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
• It then applies a matching algorithm to find the mappings between them
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
    <xsd:complexType name="Direction">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
        <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
      </xsd:sequence>
    </xsd:complexType>
Figure: SchemaNode representation of XML schema (nodes: Direction, compass, degrees, DirectionCompass; edge label: hasElement)
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
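A rough sketch of what one such conversion could look like, using Python's standard `xml.etree.ElementTree`. The rule applied here (every `complexType` becomes a node, and each element inside it becomes a node linked by a `hasElement` edge) is an assumption inferred from the figure, not the full MWSAF rule set, and `to_schema_graph` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The complexType from the slide, wrapped in a schema element so it parses.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

Edge = Tuple[str, str, str]  # (source node, edge label, target node)


def to_schema_graph(xsd_text: str) -> List[Edge]:
    """Emit one hasElement edge per element nested in a complexType."""
    root = ET.fromstring(xsd_text)
    edges: List[Edge] = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


edges = to_schema_graph(SCHEMA)
```

Representing the graph as labeled edge triples keeps the XML side and the ontology side in the same shape, so a single matcher can run over both.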
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="#Speed"/>
    </daml:Property>
Figure: SchemaGraph representation of part of ontology (nodes: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty)
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
- Element Level Match: linguistic similarity of two concepts based on names. Uses WordNet to check for synonyms; abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter Stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
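A minimal sketch of a q-gram comparison that returns a value between 0 and 1. The slide only says that NGram counts shared q-grams; the choice of q = 3, the use of character sets, and the Dice coefficient for normalization are assumptions made here, and the function names are hypothetical.

```python
from typing import Set


def qgrams(name: str, q: int = 3) -> Set[str]:
    """Set of all length-q character substrings of a lower-cased name."""
    s = name.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over shared q-grams, in the range [0, 1]."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# windSpeed has 7 trigrams, speed has 3, and all 3 of those are shared.
score = ngram_similarity("windSpeed", "speed")
```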
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, then what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching:
- Schema vs. Instance-level
- Element vs. Structure-level
- Language and Constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Schema MatchingSchema Matching
bull Match Takes two schemas as input and produces a mapping between the elements that correspond to each other semantically
bull It is usually performed manually- Tedious- Time Consuming- Error Prone- Expensive
We must automate this process
ExampleExample
bull GTE telecommunications needed to integrate 40 databases with a total of 27000 elements
bull Project planners estimated that manual matching would take 12 person years to integrate
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Various Levels of HeterogenityVarious Levels of Heterogenity
ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
How to deal with Semantic How to deal with Semantic HeterogenityHeterogenity
1 Standardize agree on a common representation
2 Translate create mappings between different schemas1048766 -requires human input and machine reasoning1048766 -mappings can be difficult and expensive
3 Annotate create relationships between agreed upon conceptualizations
1048766 -requires human input and machine reasoning1048766 -annotation can be difficult and expensive1048766
ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
ChallengesChallengesbull Actual semantics of the involved elements are typically only from the
creators or documentation ndash so we must use clues in the schema and data instead
bull These clues are often misleading bull Ie lsquoArearsquo can refer to different entitiesbull Ie The same entities can have very different names
bull Clues are often ambiguousbull Ie lsquoContact-agentrsquo Agent name or phone number
bull Matching process can be very costlybull Each element of the schema must be examined to ensure discovery of
the best match
bull Matching is often subjective depending on the application
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Where is Schema Matching Where is Schema Matching usedused
bull Database Application Domains- Data Integration- Data Warehousing- E-Business- Query Processing
bull Semantic Web- XMLHTML to an Ontology- Semantic Web Services
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema IntegrationSchema Integration
Problem Construct a global view from a set of independently constructed schemas
(ie ontologies)
- Different structure and terminologies
Solution Schema Matching is performed to find relationships between concepts in each schema Then the matching elements can be unified
Bernstein P Rahm E A survey of approaches to automatic schema matching
Data WarehousesData Warehouses
Problem Integrating data sources into a data warehouse
- Different formats between the source and warehouse
Solution Use matching to find the elements of the source that are also present in the warehouse Then the details of the semantics can be examined to integrate the two
Bernstein P Rahm E A survey of approaches to automatic schema matching
E-CommerceE-Commerce
Problem Message translation
-Each trading partner uses its own message format
Solution A match operation would reduce the amount of manual work to specify how the formats are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Query ProcessingQuery Processing
Problem The terms used in the userrsquos query may be different from those in the database
Solution Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Need for Data Integration on the Need for Data Integration on the Semantic WebSemantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web ServicesSemantic Web Services
bull Problem Web Services are currently searched for using keywords
bull We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
bull WSDLs are in XML Ontologies in OWL
bull Solution Use schema matching approaches to map between the two different schemas
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on information besides the input schemas, such as dictionaries, user input, or global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
[Figure: taxonomy of schema matching approaches]
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level: Linguistic, Constraint
      Structure Level: Constraint
    Instance/Contents
      Element Level: Linguistic, Constraint
  Combining Matchers
    Hybrid Matchers
    Composite Matchers: Manual Composition, Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, ...
Sample Approaches: Name Similarity, Description Similarity, Global Namespaces, Word Frequency, Group Matching, Type Similarity, Key Properties, Value Pattern and Ranges
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple match candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match Cardinality
3. Linguistic Approaches: Name or Description Matching
4. Constraint-Based Approaches
5. Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library

Full structure match:
S1 Elements      S2 Elements
Address          CustAddress
  Street           Street
  City             City
  State            USState
  Zip              PostalCode

Partial structure match:
S1 Elements      S2 Elements
AccountOwner     Customer
  Name             Cname
  Address          CAddress
  Birthdate        CPhone
  TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches

Examples of the four local cardinality cases for individual mapping elements:

Local Match Cardinality    S1 Element(s)      S2 Element(s)          Matching Expression
1:1, element level         Price              Amount                 Amount = Price
n:1, element level         Price, Tax         Cost                   Cost = Price*(1+Tax/100)
1:n, element level         Name               FirstName, LastName    FirstName, LastName = Name
n:m, element level         B.Title, B.PuNo,   A.Book, A.Publisher    A.Book, A.Publisher = Select B.Title, P.Name
(also n:1, structure       P.PuNo, P.Name                            From B, P Where B.PuNo = P.PuNo
level)
Bernstein P Rahm E A survey of approaches to automatic schema matching
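Two of the matching expressions in the table can be written out as executable code to show what a non-1:1 mapping element has to compute. This is only a sketch: the function names and the list-of-dictionaries layout for the B and P relations are assumptions made for illustration, and the sample rows are invented.

```python
def cost(price, tax_percent):
    """n:1 element-level match: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax_percent / 100)


def book_publisher(books, publishers):
    """n:m match written as a join: B.PuNo = P.PuNo yields (A.Book, A.Publisher)."""
    name_by_puno = {p["PuNo"]: p["Name"] for p in publishers}
    return [
        (b["Title"], name_by_puno[b["PuNo"]])
        for b in books
        if b["PuNo"] in name_by_puno
    ]


# Made-up sample data for the B and P relations.
B = [{"Title": "Schema Matching", "PuNo": 1}, {"Title": "Orphan", "PuNo": 9}]
P = [{"PuNo": 1, "Name": "Example Press"}]

print(cost(100, 25))
print(book_publisher(B, P))
```

Books whose PuNo has no counterpart in P drop out of the result, mirroring the inner join in the SQL expression.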
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Little work has been done on complex matching
  - Some approaches hard-code complex matches into rules
  - Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes/suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, Soundex, pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
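A few of the similarity criteria listed above can be combined into one scoring function. The sketch below is an assumption-laden illustration, not a matcher from the survey: it covers only name equality, a tiny hand-written synonym table (hypothetical entries), and common-substring similarity via Python's standard `difflib.SequenceMatcher`. Stemming, hypernyms, and Soundex are left out, and the 0.9 synonym score is an arbitrary choice.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {
    frozenset({"zip", "postalcode"}),
    frozenset({"customer", "client"}),
}


def name_similarity(a, b):
    """Return a similarity in [0, 1] for two schema element names."""
    a, b = a.lower(), b.lower()
    if a == b:                            # criterion 1: equality of names
        return 1.0
    if frozenset({a, b}) in SYNONYMS:     # criterion 3: equality of synonyms
        return 0.9                        # arbitrary score for synonyms
    # criterion 5: similarity based on common substrings
    return SequenceMatcher(None, a, b).ratio()


for pair in [("Zip", "ZIP"), ("Zip", "PostalCode"), ("ShipTo", "Ship2")]:
    print(pair, round(name_similarity(*pair), 2))
```

The ordering of the checks encodes a design choice: exact and synonym matches short-circuit the more permissive substring comparison.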
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
  S1: empn (comment: employee name)
  S2: name (comment: name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (figure with three schemas labeled S, S1 and S2; their element lists are):
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  - POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., value ranges
  3. Auxiliary information
  4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
  - e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
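The example rule on this slide (same name and same number of subelements) is simple enough to write down directly. A minimal sketch, assuming schema elements are represented as dictionaries with hypothetical `name` and `subelements` keys; the sample elements are invented.

```python
def rule_same_name_and_arity(elem1, elem2):
    """Hand-crafted rule: match if names are equal and subelement counts agree."""
    return (
        elem1["name"] == elem2["name"]
        and len(elem1["subelements"]) == len(elem2["subelements"])
    )


# Illustrative elements; the dictionary layout is an assumption.
address_s1 = {"name": "Address", "subelements": ["Street", "City", "State", "Zip"]}
address_s2 = {"name": "Address", "subelements": ["Street", "City", "USState", "PostalCode"]}
short_s2 = {"name": "Address", "subelements": ["Street", "City"]}

print(rule_same_name_and_arity(address_s1, address_s2))  # names and counts agree
print(rule_same_name_and_arity(address_s1, short_s2))    # counts differ
```

The rule never inspects the subelement names themselves, which is why such rules are cheap but easily fooled.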
Learner Based Solutions
• Learner-based: exploits both schema and data
• Requires a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria. Better performance
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
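A composite matcher runs several matchers independently and then merges their scores. The sketch below assumes a weighted average as the combination step and uses two toy matchers; the matcher functions, the weights, and the element layout are all hypothetical and only illustrate the idea.

```python
def name_matcher(a, b):
    """1.0 when the names are equal ignoring case, otherwise 0.0."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a, b):
    """1.0 when the declared data types are identical, otherwise 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_match(a, b, weighted_matchers):
    """Combine independently computed scores with a weighted average."""
    total_weight = sum(weight for _, weight in weighted_matchers)
    return sum(weight * matcher(a, b) for matcher, weight in weighted_matchers) / total_weight


s1_elem = {"name": "Zip", "type": "string"}
s2_elem = {"name": "PostalCode", "type": "string"}
score = composite_match(s1_elem, s2_elem, [(name_matcher, 3), (type_matcher, 1)])
print(score)
```

Because each matcher is a plain function, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite method.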
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
• Phase 1: linguistic element-level matching
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
• Phase 2: structural matching
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
• Phase 3: uses the weighted mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
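The last two phases can be illustrated with a weighted mean followed by a threshold test. This is a simplified sketch and not Cupid's actual computation: the weight `w_struct`, the threshold, the input layout, and both function names are assumptions, and the linguistic and structural similarities are taken as given inputs.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def decide_mapping(candidates, threshold=0.6, w_struct=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    candidates maps (s1_element, s2_element) to (lsim, ssim); layout is hypothetical.
    """
    return sorted(
        pair
        for pair, (lsim, ssim) in candidates.items()
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    )


# Made-up similarity values for two candidate pairs.
candidates = {
    ("Zip", "PostalCode"): (0.5, 1.0),
    ("Zip", "Street"): (0.0, 0.5),
}
print(decide_mapping(candidates))
```

Raising `w_struct` makes the decision lean on where an element sits in the schema tree rather than on what it is called.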
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: identifies matching parts of the schemas or databases
  - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
• PROBLEM: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
• It then applies a matching algorithm to find the mappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules

<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>

[Figure: SchemaGraph representation of the XML schema; labels in the figure: Direction, degrees, DirectionCompass, hasElement, compass]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
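To make the conversion idea concrete, the following sketch turns the `Direction` complex type into a small node-and-edge structure using Python's standard `xml.etree.ElementTree`. It is a deliberate simplification and not the MWSAF conversion rules: every child element becomes a node linked to its complex type by a `hasElement` edge, the typed edge shown for `compass` in the figure is not modeled, and the wrapping `xsd:schema` element, the `urn:example:weather` namespace, and the `to_schema_graph` name are assumptions added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The complexType from the slide, wrapped in a schema element (assumed)
# so that the namespace prefixes are declared.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def to_schema_graph(xsd_text):
    """Return (nodes, edges); edges are (source, label, target) triples."""
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            child = elem.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = to_schema_graph(SCHEMA)
print(sorted(nodes))
print(edges)
```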
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules

<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>

[Figure: SchemaGraph representation of part of the ontology; labels in the figure: WindEvent, windDirection, Speed, hasProperty, windSpeed]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
  - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked
  - Schema Match: structural similarity (sub-concept similarities)
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
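The slides do not show how getBestMapping chooses its map set. One plausible reading, sketched below purely as an assumption, is to keep for each WSDL element the ontology concept with the highest Match Score, provided the score reaches a threshold. The function name `get_best_mapping`, the score dictionary layout, the threshold value, and the sample scores are all hypothetical and do not reproduce the MWSAF implementation.

```python
def get_best_mapping(match_scores, threshold=0.5):
    """Pick the best-scoring concept per WSDL element.

    match_scores maps (wsdl_element, ontology_concept) to a score in [0, 1].
    Returns {wsdl_element: (ontology_concept, score)}.
    """
    best = {}
    for (element, concept), score in match_scores.items():
        current_best = best.get(element, (None, 0.0))[1]
        if score >= threshold and score > current_best:
            best[element] = (concept, score)
    return best


# Made-up scores for illustration.
scores = {
    ("windSpeed", "WindSpeed"): 0.9,
    ("windSpeed", "WindDirection"): 0.4,
    ("degrees", "Speed"): 0.2,
}
print(get_best_mapping(scores))
```

Elements whose best score stays below the threshold are left unmapped rather than forced onto a weak candidate.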
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
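The NGram measure can be illustrated with a small q-gram overlap function that returns a value between 0 and 1. The slides do not give the exact formula, so the sketch assumes q = 3, case-insensitive comparison, and the Dice coefficient over the two q-gram sets; the function names are hypothetical.

```python
def qgrams(name, q=3):
    """Set of all substrings of length q in the lower-cased name."""
    s = name.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_score(a, b, q=3):
    """Dice coefficient over q-gram sets: 2 * |common| / (|A| + |B|)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0  # names shorter than q share no q-grams
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


for pair in [("windSpeed", "WindSpeed"), ("windSpeed", "windDirection"), ("wind", "speed")]:
    print(pair, round(ngram_score(*pair), 2))
```

Names that differ only in case score 1.0, while names that merely share a prefix such as "wind" get a low but non-zero score.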
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
  - Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
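The slides name the two measures without giving formulas. The sketch below assumes one common reading: Average Concept Match divides the sum of the Match Scores by the number of mapped concepts, while Average Service Match divides the same sum by the total number of concepts in the WSDL, so unmapped concepts lower it. Both definitions and both function names are assumptions for illustration.

```python
def average_concept_match(match_scores):
    """Mean Match Score over the concepts that were actually mapped (assumed definition)."""
    return sum(match_scores) / len(match_scores) if match_scores else 0.0


def average_service_match(match_scores, total_wsdl_concepts):
    """Sum of Match Scores over all WSDL concepts, mapped or not (assumed definition)."""
    return sum(match_scores) / total_wsdl_concepts if total_wsdl_concepts else 0.0


# Two of four WSDL concepts were mapped, with made-up scores 0.5 and 1.0.
mapped_scores = [0.5, 1.0]
print(average_concept_match(mapped_scores))
print(average_service_match(mapped_scores, total_wsdl_concepts=4))
```

Under these assumptions, a service is categorized into the domain of the ontology with the highest Average Service Match, since that value rewards both good and numerous matches.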
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema-level vs. instance-level
  - Element-level vs. structure-level
  - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P, Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Need for Data Integration on the Need for Data Integration on the Semantic WebSemantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web ServicesSemantic Web Services
bull Problem Web Services are currently searched for using keywords
bull We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
bull WSDLs are in XML Ontologies in OWL
bull Solution Use schema matching approaches to map between the two different schemas
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (figure with three schemas, labeled S1, S2, and S):
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  - POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Instance-Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., value ranges
  3. Auxiliary information
  4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Rule-Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
  - e.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
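The example rule above (same name and same number of subelements) is easy to write down directly. The `Element` class and `rule_match` function below are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelement names."""
    name: str
    subelements: list = field(default_factory=list)


def rule_match(e1: Element, e2: Element) -> bool:
    """Hand-crafted rule: same name and same number of subelements."""
    return e1.name == e2.name and len(e1.subelements) == len(e2.subelements)


if __name__ == "__main__":
    a = Element("Address", ["Street", "City", "Zip"])
    b = Element("Address", ["Street", "City", "PostalCode"])
    print(rule_match(a, b))  # the two elements satisfy the rule
```

Such a rule is cheap and needs no training data, but it only sees schema information, which is the limitation the learner-based approaches try to address.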
Learner-Based Solutions
• Learner-based: exploit both schema and data
• Require a lot of training data, but can exploit data instances
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria. Better performance
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
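A composite matcher can be sketched as a weighted average over independently executed matchers. The two toy matchers (`exact`, `common_prefix`) and the equal weights are assumptions chosen for illustration; the survey leaves the combination function open.

```python
def exact(a, b):
    """Toy matcher 1: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def common_prefix(a, b):
    """Toy matcher 2: length of the shared prefix over the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b)) if a or b else 0.0


def composite_score(a, b, matchers, weights):
    """Combine independent matcher results with a weighted average."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


if __name__ == "__main__":
    print(composite_score("ShipTo", "Ship2", [exact, common_prefix], [1, 1]))
```

Because each matcher runs on its own, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to composite combination.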
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, from which it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the weighted mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
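The weighted mean of Phase 2 and the decision of Phase 3 can be sketched as follows. The weight `w_struct`, the acceptance threshold, and the function names are assumptions for illustration and are not taken from the Cupid implementation.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def decide_mapping(pairs, threshold=0.5, w_struct=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    `pairs` maps (s1_element, s2_element) to a (lsim, ssim) tuple.
    """
    return [
        pair
        for pair, (lsim, ssim) in pairs.items()
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    ]


if __name__ == "__main__":
    candidates = {
        ("Price", "Amount"): (0.8, 0.4),
        ("Price", "Name"): (0.1, 0.2),
    }
    print(decide_mapping(candidates))
```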
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: identifies matching parts of the schemas or databases
  - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain ontologies
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
  1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
  2. Parser Library: consists of the parsers used to generate the SchemaGraphs
  3. Matcher Library: provides the schema matching algorithms
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness of XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
• It then applies a matching algorithm to find the mappings between them
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaGraph with nodes Direction, compass, degrees, and DirectionCompass, connected by hasElement edges]
SchemaGraph representation of the XML schema
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
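A conversion of this kind can be sketched with the Python standard library. The rule used here (every xsd:element inside a complex type becomes a node linked to the type by a hasElement edge) is inferred from the figure labels and is an assumption; the actual MWSAF conversion functions cover more cases. The function name `complex_type_to_graph` is hypothetical, and a namespace declaration is added to the sample so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, with an xmlns declaration added.
SAMPLE = """
<xsd:complexType xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                 nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
"""


def complex_type_to_graph(xml_text):
    """Return (nodes, edges) for one complex type.

    Edges are (parent, "hasElement", child) triples in document order.
    """
    root = ET.fromstring(xml_text.strip())
    parent = root.get("name")
    nodes, edges = {parent}, []
    for elem in root.iter("{%s}element" % XSD_NS):
        child = elem.get("name")
        nodes.add(child)
        edges.append((parent, "hasElement", child))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = complex_type_to_graph(SAMPLE)
    print(sorted(nodes))
    print(edges)
```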
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph with nodes WindEvent, windDirection, windSpeed, and Speed, connected by hasProperty edges]
SchemaGraph representation of part of the ontology
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
  - Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
  - Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a mapping set
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
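An NGram-style matcher can be sketched as below. The slide only says that shared q-grams are counted; normalizing with the Dice coefficient, the default q = 3, and the function names are assumptions made so that the result falls between 0 and 1.

```python
def qgrams(name, q=3):
    """Return the set of lower-cased q-grams of a name."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over the q-gram sets of two names, in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    print(ngram_similarity("windSpeed", "windDirection"))
    print(ngram_similarity("windSpeed", "WindSpeed"))
```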
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
  - Average concept match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
  - Average service match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
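One plausible reading of the two measures is sketched below. The exact formulas are not given on the slide, so the definitions here are assumptions: the average concept match divides the summed match scores by the number of mapped concepts, the average service match divides the same sum by the total number of WSDL concepts, and the service is assigned to the domain whose ontology yields the highest average service match.

```python
def average_concept_match(scores):
    """Mean match score over the concepts that were actually mapped."""
    return sum(scores) / len(scores) if scores else 0.0


def average_service_match(scores, total_wsdl_concepts):
    """Summed match scores divided by all concepts in the WSDL."""
    if not total_wsdl_concepts:
        return 0.0
    return sum(scores) / total_wsdl_concepts


def best_domain(mappings, total_wsdl_concepts):
    """Pick the domain whose ontology gives the highest service match.

    `mappings` maps a domain name to the list of match scores obtained
    against that domain's ontology.
    """
    return max(
        mappings,
        key=lambda d: average_service_match(mappings[d], total_wsdl_concepts),
    )


if __name__ == "__main__":
    # Hypothetical scores for a WSDL with four concepts.
    mappings = {"weather": [0.9, 0.8, 0.7], "geography": [0.95]}
    print(best_domain(mappings, 4))
```

Under these assumed definitions, an ontology that matches one concept very well but leaves the rest unmapped scores high on concept match and low on service match, which is why the second measure is the one used for categorization.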
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching:
  - Schema-level vs. instance-level
  - Element-level vs. structure-level
  - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Vassilis, C. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps describe services semantically and aids efficient web service discovery and composition

MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
  1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
  2. Parser Library: consists of the parsers used to generate the SchemaGraphs
  3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework

MWSAF Schema Graphs
• Problem: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges, created using conversion functions
• MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules

  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1"
                   name="compass" nillable="true"
                   type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1"
                   name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>

[Figure: SchemaGraph representation of the XML schema. Nodes: Direction, degrees, DirectionCompass, compass. Edge label: hasElement]
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
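As a rough sketch of what such a conversion rule could look like, the Python below turns each named complexType into a node and links it to its child elements with hasElement edges. It is an illustration only, not MWSAF's parser: the function name, the tuple-based graph, the wrapping xsd:schema element, the 2001 XML Schema namespace and the xsd1 namespace URI are assumptions.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"  # assumed namespace for this example


def xsd_to_schema_graph(xsd_text):
    """Return (nodes, edges); edges are (parent, "hasElement", child) triples."""
    root = ET.fromstring(xsd_text.strip())
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        if parent is None:  # anonymous types are skipped in this sketch
            continue
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            child = elem.get("name")
            if child is None:  # e.g. elements declared via ref=
                continue
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

nodes, edges = xsd_to_schema_graph(SAMPLE)
```

Element types such as DirectionCompass are ignored here; a fuller converter would add them as nodes as well.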
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules

  <daml:Class rdf:ID="WindEvent">
    <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
    <rdfs:label>Wind event</rdfs:label>
    <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
  </daml:Class>
  <daml:Property rdf:ID="windDirection">
    <rdfs:label>Wind direction</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:label>Wind speed</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>

[Figure: SchemaGraph representation of part of the ontology. Nodes: WindEvent, windDirection, windSpeed, Speed. Edge label: hasProperty]
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
Mapping
• Measures that make up the Match Score:
  - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
  - Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses Porter stemming, tokenization and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
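To make the NGram measure concrete, here is a small sketch of a q-gram similarity that returns a value between 0 and 1. The slide does not give the exact formula MWSAF uses, so the Dice coefficient, the padding character and the default q=3 are assumptions.

```python
from collections import Counter


def qgrams(name, q=3):
    """Split a name into overlapping q-grams, padding both ends with '#'."""
    padded = "#" * (q - 1) + name.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def ngram_similarity(a, b, q=3):
    """Dice coefficient over the q-gram multisets of two names (0.0 to 1.0)."""
    grams_a, grams_b = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * shared / total if total else 0.0


score = ngram_similarity("windSpeed", "wind_speed")
```

Names that differ only by a separator or a short suffix still share most of their q-grams, which is why this measure tolerates small spelling variations.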
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
  - Average Concept Match: tells the user about the degree of similarity between the matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
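The slide does not state how the two measures are computed. One plausible reading, used here purely as an assumption, is that Average Concept Match averages the match scores over the mapped concepts only, while Average Service Match divides the same sum by the total number of concepts in the WSDL. The function names and sample scores are hypothetical.

```python
def average_concept_match(match_scores):
    """Mean match score over the concepts that were actually mapped (assumed definition)."""
    return sum(match_scores) / len(match_scores) if match_scores else 0.0


def average_service_match(match_scores, total_wsdl_concepts):
    """Sum of match scores divided by all WSDL concepts (assumed definition)."""
    return sum(match_scores) / total_wsdl_concepts if total_wsdl_concepts else 0.0


def best_domain(mappings, total_wsdl_concepts):
    """Pick the ontology (domain) with the highest average service match."""
    return max(
        mappings,
        key=lambda domain: average_service_match(mappings[domain], total_wsdl_concepts),
    )


# Hypothetical match scores of one 4-concept WSDL against two ontologies.
mappings = {"weather": [0.9, 0.8, 0.7], "finance": [0.95]}
chosen = best_domain(mappings, total_wsdl_concepts=4)
```

Under this reading, an ontology that matches one concept very well but covers little of the WSDL scores high on concept match and low on service match, which is why the second measure is the one suited to categorization.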
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one of them changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey

More Issues
• If we require user acceptance of our matches, what happens when our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema-level vs instance-level
  - Element-level vs structure-level
  - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E., A survey of approaches to automatic schema matching, wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey, httpanhaicsuiucedupublicdb-review14pdf
• Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework, POSV-WWW2004pdf
• Vassilis C., Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM), Dagstuhl Seminar, ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf

Questions
Challenges
• The actual semantics of the involved elements are typically known only to the creators or found in documentation, so we must use clues in the schema and data instead
• These clues are often misleading
  - e.g. 'Area' can refer to different entities
  - e.g. the same entity can have very different names
• Clues are often ambiguous
  - e.g. 'Contact-agent': agent name or phone number?
• The matching process can be very costly
  - Each element of the schema must be examined to ensure discovery of the best match
• Matching is often subjective, depending on the application
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Where is Schema Matching used?
• Database application domains
  - Data integration
  - Data warehousing
  - E-business
  - Query processing
• Semantic Web
  - XML/HTML to an ontology
  - Semantic Web Services
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Integration
Problem: construct a global view from a set of independently constructed schemas (e.g. ontologies)
  - Different structures and terminologies
Solution: schema matching is performed to find relationships between the concepts in each schema. The matching elements can then be unified
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Data Warehouses
Problem: integrating data sources into a data warehouse
  - Different formats between the source and the warehouse
Solution: use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
E-Commerce
Problem: message translation
  - Each trading partner uses its own message format
Solution: a match operation reduces the amount of manual work needed to specify how the formats are related
Bernstein P., Rahm E., A survey of approaches to automatic schema matching

Query Processing
Problem: the terms used in the user's query may be different from those in the database
Solution: matching is used to map the user-specified concepts in the query to schema elements
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any other form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: use schema matching to map between elements represented in OWL and the different schemas of web documents

Semantic Web Services
• Problem: Web Services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: use schema matching approaches to map between the two different schemas
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of schema S2
• Mapping expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these mapping elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company

S1 elements      S2 elements
Cust             Customer
  C#               CustID
  CName            Company
  FirstName        Contact
  LastName         Phone
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
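The mapping above can be written down as data plus expressions. The sketch below is one way to do that, assuming a MappingElement record with source fields, a target field and a mapping expression; the class name, the translate helper, the space used when concatenating and the sample record are all illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class MappingElement:
    sources: Tuple[str, ...]          # S1 (Cust) element names
    target: str                       # S2 (Customer) element name
    expression: Callable[..., object]  # how the sources produce the target


MAPPING = [
    MappingElement(("C#",), "CustID", lambda c: c),
    MappingElement(("FirstName", "LastName"), "Contact",
                   lambda first, last: f"{first} {last}"),  # Concatenate(...)
    MappingElement(("CName",), "Company", lambda name: name),
]


def translate(record, mapping):
    """Apply every mapping expression to an S1 record to build an S2 record."""
    return {
        m.target: m.expression(*(record[s] for s in m.sources))
        for m in mapping
    }


cust = {"C#": 7, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
customer = translate(cust, MAPPING)
```

Note that S2's Phone element has no mapping element, so it is simply absent from the translated record.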
Example
[Figure not recovered]
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Classification of Schema Matching Approaches
• Instance vs schema: matching approaches can consider instance data or schema-level information
• Element vs structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary information: the matcher relies on information besides the input schemas, such as dictionaries, user input and global schemas
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
[Figure: taxonomy of schema matching approaches]
Schema Matching Approaches
  - Individual Matchers
      - Schema-only
          - Element Level: Linguistic, Constraint
          - Structure Level: Constraint
      - Instance/Contents
          - Element Level: Linguistic, Constraint
  - Combining Matchers
      - Hybrid Matchers
      - Composite Matchers: Manual Composition, Automatic Composition
Further criteria: match cardinality, auxiliary information used, ...
Sample approaches: Name Similarity, Description Similarity, Global Namespaces, Word Frequency, Graph Matching, Type Similarity, Key Properties, Value Pattern and Ranges
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple match candidates and estimate a degree of similarity for each
  1. Granularity of match (element level vs structure level)
  2. Match cardinality
  3. Linguistic approaches: name or description matching
  4. Constraint-based approaches
  5. Reusing schema and matching information
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Element-Level
• Element-level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Structure-Level
• Structure-level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full structure match:

  S1 elements      S2 elements
  Address          CustAddress
    Street           Street
    City             City
    State            USState
    Zip              PostalCode

• Partial structure match (only some of the components correspond):

  S1 elements      S2 elements
  AccountOwner     Customer
    Name             Cname
    Address          CAddress
    Birthdate        CPhone
    TaxExempt

• Equivalence patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches

Examples of the four local cardinality cases for individual mapping elements:

  Local match cardinality       S1 element(s)                      S2 element(s)          Matching expression
  1:1, element level            Price                              Amount                 Amount = Price
  n:1, element level            Price, Tax                         Cost                   Cost = Price*(1+Tax/100)
  1:n, element level            Name                               FirstName, LastName    FirstName, LastName = Name
  n:m, element level            B.Title, B.PuNo, P.PuNo, P.Name    A.Book, A.Publisher    A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
  (also n:1, structure level)
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching exist
  - Some hard-code complex matches into rules
  - Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e. words or sentences) to find semantically similar schema elements
• Name matching: match elements with similar names
• Description matching: match comments in the schemas
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes and suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, soundex or pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
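Several of the criteria listed above can be chained into a single name matcher. The sketch below is a rough illustration under stated assumptions: the score levels (1.0, 0.9, 0.8 and a 0.7 cap) are arbitrary, the synonym and user-match tables are hypothetical, the stemmer is a crude stand-in for a real one, and hypernyms and soundex are left out.

```python
import re
from difflib import SequenceMatcher

# Hypothetical auxiliary information.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"make", "brand"})}
USER_MATCHES = {("shipto", "ship2")}


def normalize(name):
    """Lowercase and drop separators so that Ship_To and ShipTo compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def stem(word):
    """Very crude suffix stripping; a real system would use a proper stemmer."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def name_similarity(a, b):
    a, b = normalize(a), normalize(b)
    if a == b:                                   # 1. equality of names
        return 1.0
    if (a, b) in USER_MATCHES or (b, a) in USER_MATCHES:  # 6. user-provided
        return 1.0
    if stem(a) == stem(b):                       # 2. equality after stemming
        return 0.9
    if frozenset({a, b}) in SYNONYMS:            # 3. equality of synonyms
        return 0.8
    # 5. common substrings, capped below the exact criteria
    return 0.7 * SequenceMatcher(None, a, b).ratio()
```

For example, name_similarity("Orders", "order") is decided by the stemming rule, while unrelated names fall through to the substring score.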
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
    S1: empn  (employee name)
    S2: name  (name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
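The simple end of that spectrum, keyword extraction followed by comparison, might look like the sketch below. The stopword list, the Jaccard overlap measure and the function names are assumptions chosen for brevity; synonym handling is omitted.

```python
import re

STOPWORDS = {"of", "the", "a", "an", "for", "and", "to", "in"}  # illustrative list


def keywords(description):
    """Extract the set of lowercase keywords from a natural-language comment."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def description_similarity(desc_a, desc_b):
    """Jaccard overlap of the two keyword sets (0.0 to 1.0)."""
    keys_a, keys_b = keywords(desc_a), keywords(desc_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


# The slide's example: empn and name carry equivalent descriptions.
same = description_similarity("employee name", "name of employee")
different = description_similarity("salary in dollars", "name of employee")
```

Here the element names empn and name look quite different, yet their descriptions yield identical keyword sets.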
Constraint Based
• Schemas often contain constraints that define data types, value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  - e.g. in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example: [Figure: three purchase-order schemas, labeled Schema S1, Schema S2 and Schema S]
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  - POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is itself a match problem
  2. Similarity values may depend on the domain, e.g. salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g. value ranges
  3. Auxiliary information
  4. Both rule-based and learner-based techniques are also used
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures and subelements
  - e.g. two elements match if they have the same name and the same number of subelements
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
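The example rule on this slide is simple enough to state directly in code. The Element class and the case-insensitive name comparison below are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)


def rule_match(e1, e2):
    """Hand-crafted rule: same name and the same number of subelements."""
    return (e1.name.lower() == e2.name.lower()
            and len(e1.children) == len(e2.children))


address_1 = Element("Address", [Element("Street"), Element("City")])
address_2 = Element("address", [Element("Road"), Element("Town")])
address_3 = Element("Address", [Element("Street")])
```

Rules like this are cheap to write and need no training data, but they only see what the schema exposes.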
Learner Based Solutions
• Learner-based: exploit both schema information and data
• Require a lot of training data, but can exploit data instances
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria in one matcher. Better performance
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
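A composite matcher can be sketched as a function that runs independent matchers and then aggregates their scores. The two toy matchers, the dictionary-based element format and the choice of max or average as aggregation are assumptions for illustration.

```python
def name_matcher(a, b):
    """Independent matcher 1: exact, case-insensitive name equality."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a, b):
    """Independent matcher 2: data type equality."""
    return 1.0 if a["type"] == b["type"] else 0.0


def average(scores):
    return sum(scores) / len(scores)


def composite(matchers, a, b, aggregate=max):
    """Run each matcher on its own, then combine the individual results."""
    scores = [matcher(a, b) for matcher in matchers]
    return aggregate(scores)


zip_code = {"name": "Zip", "type": "string"}
postal_code = {"name": "PostalCode", "type": "string"}
matchers = [name_matcher, type_matcher]

optimistic = composite(matchers, zip_code, postal_code)            # best single score
balanced = composite(matchers, zip_code, postal_code, average)    # mean of all scores
```

Because the matchers do not depend on each other, new ones can be added or the aggregation swapped without touching the rest, which is the flexibility the slide attributes to composite matchers.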
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g. to provide a new rule)
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms and inclusion properties
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matching
Phase 1: linguistic element-level matching
  - categorizes elements based on names, data types and domains
  - calculates a linguistic similarity coefficient
Phase 2: structure matching
  - transforms the original schema into a tree, then performs a bottom-up structure matching
  - calculates a similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3
  - uses the weighted mean from Phase 2 to decide on a mapping
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
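Phases 2 and 3 can be illustrated with a weighted mean followed by a threshold decision. This is only a sketch: the weight w_struct, the 0.5 threshold, the "best target per source element" policy and the sample similarity values are assumptions, not Cupid's actual parameters.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def choose_mapping(pairs, threshold=0.5, w_struct=0.5):
    """pairs: dict (s1_element, s2_element) -> (lsim, ssim).

    For each S1 element, keep the S2 element with the highest weighted
    similarity, provided that it reaches the threshold.
    """
    best = {}
    for (e1, e2), (lsim, ssim) in pairs.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold and wsim > best.get(e1, (None, 0.0))[1]:
            best[e1] = (e2, wsim)
    return {e1: e2 for e1, (e2, _) in best.items()}


# Hypothetical similarity values from Phases 1 and 2.
pairs = {
    ("City", "City"): (1.0, 0.8),
    ("City", "Street"): (0.2, 0.8),
    ("Zip", "PostalCode"): (0.3, 0.9),
    ("Phone", "Street"): (0.1, 0.2),
}
mapping = choose_mapping(pairs)
```

In this example Zip and PostalCode are paired despite a low linguistic score, because their structural similarity lifts the weighted mean above the threshold.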
Challenges
• The actual semantics of the involved elements are typically known only to their creators or captured in documentation, so we must rely on clues in the schema and data instead
• These clues are often misleading
- e.g., 'Area' can refer to different entities
- e.g., the same entity can have very different names
• Clues are often ambiguous
- e.g., 'Contact-agent': agent name or phone number?
• The matching process can be very costly
- Each element of the schema must be examined to ensure discovery of the best match
• Matching is often subjective, depending on the application
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Where is Schema Matching used?
• Database Application Domains
- Data Integration
- Data Warehousing
- E-Business
- Query Processing
• Semantic Web
- XML/HTML to an Ontology
- Semantic Web Services
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Integration
Problem: Construct a global view from a set of independently constructed schemas (e.g., ontologies)
- Different structures and terminologies
Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Data Warehouses
Problem: Integrating data sources into a data warehouse
- Different formats between the source and the warehouse
Solution: Use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
E-Commerce
Problem: Message translation
- Each trading partner uses its own message format
Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Query Processing
Problem: The terms used in the user's query may be different from those in the database
Solution: Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any other form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of S2
• Mapping Expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
S1 Elements      S2 Elements
Cust             Customer
  C                CustID
  CName            Company
  FirstName        Contact
  LastName         Phone
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
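The three mapping elements above can be read as small functions from an S1 record to an S2 field. A minimal sketch, assuming records are plain dictionaries and that Concatenate joins the two names with a space (the slide does not say which separator is used); `MAPPING` and `apply_mapping` are hypothetical names introduced only for illustration:

```python
# Hypothetical encoding of the slide's mapping elements:
# S2 target element -> expression over an S1 "Cust" record.
MAPPING = {
    "Customer.CustID": lambda cust: cust["C"],
    # Assumption: Concatenate joins first and last name with a single space.
    "Customer.Contact": lambda cust: f'{cust["FirstName"]} {cust["LastName"]}',
    "Customer.Company": lambda cust: cust["CName"],
}


def apply_mapping(cust):
    """Translate one S1 Cust record into the mapped S2 Customer fields."""
    return {target: expr(cust) for target, expr in MAPPING.items()}


record = {"C": 17, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
translated = apply_mapping(record)
print(translated)
```

Customer.Phone has no mapping element in the example, so it is simply absent from the translated record.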
Example
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
- Individual Matchers
  - Schema-only
    - Element Level
      - Linguistic (sample approaches: Name Similarity, Description Similarity, Global Namespaces)
      - Constraint (sample approaches: Type Similarity, Key Properties)
    - Structure Level
      - Constraint (sample approach: Group Matching)
  - Instance/Contents
    - Element Level
      - Linguistic (sample approach: Word Frequency)
      - Constraint (sample approach: Value Pattern and Ranges)
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers
    - Manual Composition
    - Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, ...
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Full structure match:
S1 Elements      S2 Elements
Address          CustAddress
  Street           Street
  City             City
  State            USState
  Zip              PostalCode
Partial structure match:
S1 Elements      S2 Elements
AccountOwner     Customer
  Name             Cname
  Address          CAddress
  Birthdate        CPhone
  TaxExempt
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinality | S1 Element(s) | S2 Element(s) | Matching Expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price * (1 + Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
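The n:1 and 1:n rows of the table can be written out as executable matching expressions. This is only a sketch: the function names are hypothetical, and splitting a name at the first space is an assumption, since the slide does not say how Name is decomposed.

```python
def cost_from_price_and_tax(price, tax_percent):
    """n:1 case: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n case: derive (FirstName, LastName) from Name.

    Assumption: the first space separates first and last name.
    """
    first, _, last = name.partition(" ")
    return first, last


cost = cost_from_price_and_tax(200, 25)
first_name, last_name = split_name("Ada Lovelace")
print(cost, first_name, last_name)
```

Even this tiny case shows why complex matches are hard to discover automatically: the matcher has to find both the participating elements and the function that combines them.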
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining the attributes of a schema
• Little work has been done on complex matching
- Some approaches hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, or pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
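Some of the similarity criteria listed above can be chained into one scoring function. The sketch below covers only name equality (after a simple normalization), synonym lookup, and common-substring similarity; stemming, hypernyms, and soundex are left out. The synonym table, the 0.9 synonym score, and the function names are illustrative assumptions, not part of any surveyed system.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-made synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"cost", "price"})}


def normalize(name):
    """Lower-case the name and drop separators such as '_' or '-'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a, b):
    """Return a score in [0, 1] for two element names."""
    na, nb = normalize(a), normalize(b)
    if na == nb:                          # equality of (normalized) names
        return 1.0
    if frozenset({na, nb}) in SYNONYMS:   # equality of synonyms
        return 0.9                        # arbitrary score below exact equality
    # Fallback: similarity based on common substrings.
    return SequenceMatcher(None, na, nb).ratio()


print(name_similarity("Cust_ID", "custid"))
print(name_similarity("Car", "automobile"))
print(name_similarity("ShipTo", "Ship2"))
```

The ShipTo/Ship2 pair from the slide scores well on shared substrings, while a pronunciation-based matcher would treat the two names as equal.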
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn (employee name)
S2: name (name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
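The simple end of that spectrum, keyword extraction followed by a set comparison, might look like the sketch below. The stop-word list and the use of Jaccard overlap are assumptions chosen for brevity; synonym matching is omitted.

```python
# Hypothetical stop-word list; real systems use much larger ones.
STOPWORDS = {"of", "the", "a", "an", "for", "and"}


def keywords(description):
    """Extract the set of non-stop-word tokens from a description."""
    return {word for word in description.lower().split() if word not in STOPWORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the keyword sets, in [0, 1]."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The slide's example: S1.empn vs. S2.name
print(description_similarity("employee name", "name of employee"))
```

For the slide's example both descriptions reduce to the same two keywords, so the element names `empn` and `name` are matched even though they share little as strings.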
Constraint-Based
• Schemas often contain constraints that define data types, value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
- e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (figure with three schemas labeled Schema S1, Schema S2, and Schema S):
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem in itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., ranges
3. Auxiliary information
4. Both rule-based and learner-based techniques are also used
• Main Problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
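One way to read "constraint-based characterization" is to summarize each column by its value range and compare the summaries. This is a deliberately naive sketch; the overlap formula and function names are assumptions, not a method taken from the survey.

```python
def value_range(values):
    """Characterize a numeric column by its (min, max) range."""
    return min(values), max(values)


def range_overlap(r1, r2):
    """Shared length of two ranges divided by their combined span, in [0, 1].

    Ranges that are disjoint, or that only touch at a point, score 0.0.
    """
    low = max(r1[0], r2[0])
    high = min(r1[1], r2[1])
    if high <= low:
        return 0.0
    return (high - low) / (max(r1[1], r2[1]) - min(r1[0], r2[0]))


ages = value_range([23, 35, 61, 18])
years = value_range([1998, 2001, 2004])
other_ages = value_range([18, 40, 61])
print(range_overlap(ages, years), range_overlap(ages, other_ages))
```

A score like this says nothing about meaning on its own (ages and quantities can share a range), which is why instance-level evidence is usually combined with other matchers.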
Rule-Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
- element names, data types, structures, and subelements
- e.g., two elements match if they have the same name and the same number of subelements
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
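The example rule on this slide translates almost directly into code. A minimal sketch, assuming a hypothetical `Element` type and a case-insensitive name comparison (the slide does not specify how names are compared):

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical schema element: a name plus its subelements."""
    name: str
    children: list = field(default_factory=list)


def rule_match(e1, e2):
    """Rule: same name (ignoring case) and same number of subelements."""
    return (e1.name.lower() == e2.name.lower()
            and len(e1.children) == len(e2.children))


address_1 = Element("Address", [Element("Street"), Element("City")])
address_2 = Element("address", [Element("Road"), Element("Town")])
address_3 = Element("Address", [Element("Street")])
print(rule_match(address_1, address_2), rule_match(address_1, address_3))
```

Rules like this are cheap and need no training data, but each one has to be written and tuned by hand.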
Learner-Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods:
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
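A composite matcher can be sketched as a weighted average over the scores of independently executed matchers. The two toy matchers, the element representation, and the weights below are all assumptions made for illustration.

```python
def exact_name_matcher(a, b):
    """Toy matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a, b):
    """Toy matcher: 1.0 if the declared data types are equal, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a, b, weighted_matchers):
    """Combine independent matcher results as a weighted average.

    weighted_matchers: list of (weight, matcher) pairs; each matcher
    returns a score in [0, 1].
    """
    total_weight = sum(weight for weight, _ in weighted_matchers)
    return sum(weight * matcher(a, b) for weight, matcher in weighted_matchers) / total_weight


zip_1 = {"name": "Zip", "type": "string"}
zip_2 = {"name": "PostalCode", "type": "string"}
score = composite_score(zip_1, zip_2, [(0.5, exact_name_matcher), (0.5, type_matcher)])
print(score)
```

Because the matchers run independently, one can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite approach.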
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
- The user must provide application-specific match/mismatch relations
- The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic, element-level
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a structural similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
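Phases 2 and 3 boil down to a weighted mean followed by a decision. The sketch below shows only that arithmetic; the weight of 0.5 and the acceptance threshold of 0.6 are illustrative assumptions, not Cupid's actual parameters, and the similarity values are made up.

```python
def weighted_similarity(linguistic_sim, structural_sim, structural_weight=0.5):
    """Weighted mean of linguistic and structural similarity for one element pair."""
    return structural_weight * structural_sim + (1 - structural_weight) * linguistic_sim


def accept_mapping(linguistic_sim, structural_sim, threshold=0.6):
    """Phase-3-style decision: keep the pair if the weighted mean reaches the threshold."""
    return weighted_similarity(linguistic_sim, structural_sim) >= threshold


wsim = weighted_similarity(linguistic_sim=0.8, structural_sim=0.6)
print(wsim, accept_mapping(0.8, 0.6), accept_mapping(0.3, 0.5))
```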
Clio (IBM Almaden and Univ. of Toronto)
• Aims at the semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions that map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The metadata about an attribute is grouped into a text string, which is treated as a document. The user is then presented with other 'documents' that have matching attributes and can choose from those.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• A system for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
- A tool for semi-automatically marking up web service descriptions with ontologies
- It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
Problem: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called a SchemaGraph
• A SchemaGraph is a set of nodes connected by edges, created using conversion functions
• A matching algorithm is then applied to the two SchemaGraphs to find the mappings between them
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaGraph representation of the XML schema, with nodes Direction, compass, degrees, and DirectionCompass joined by hasElement edges]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
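To make the conversion idea concrete, the sketch below turns a complex type like the one above into (node, edge label, node) triples using Python's standard XML parser. It illustrates a single assumed rule, "each element inside a complexType becomes a hasElement edge"; it is not MWSAF's actual set of conversion functions, and the embedded schema is a simplified copy of the slide's fragment.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simplified version of the slide's fragment, wrapped in a schema root so it parses.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def schema_graph_edges(xsd_text):
    """Return (parent, 'hasElement', child) triples for every complex type."""
    root = ET.fromstring(xsd_text)
    edges = []
    for complex_type in root.iter(f"{{{XSD_NS}}}complexType"):
        for element in complex_type.iter(f"{{{XSD_NS}}}element"):
            edges.append((complex_type.get("name"), "hasElement", element.get("name")))
    return edges


edges = schema_graph_edges(SCHEMA)
print(edges)
```

Once the ontology is expressed in the same triple form, the matcher can compare nodes without caring whether they came from XML Schema or from DAML.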
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: SchemaGraph representation of part of the ontology, with nodes WindEvent, windDirection, windSpeed, and Speed joined by hasProperty edges]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
- Element-Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked.
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a mapping set
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
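The slide names a getBestMapping function but does not describe how it works. The sketch below is therefore an assumption: it keeps, for each WSDL element, the ontology concept with the highest match score, provided the score reaches a threshold. The function name `get_best_mapping`, the threshold, and the sample scores are hypothetical.

```python
def get_best_mapping(scores, threshold=0.5):
    """Pick the best-scoring ontology concept for each WSDL element.

    scores: {(wsdl_element, ontology_concept): match score in [0, 1]}
    Returns {wsdl_element: (ontology_concept, score)}.
    """
    best = {}
    for (element, concept), score in scores.items():
        current_best = best.get(element, (None, 0.0))[1]
        if score >= threshold and score > current_best:
            best[element] = (concept, score)
    return best


# Made-up match scores for illustration only.
scores = {
    ("windSpeed", "WindSpeed"): 0.9,
    ("windSpeed", "Speed"): 0.6,
    ("zip", "WindEvent"): 0.1,
}
mapping = get_best_mapping(scores)
print(mapping)
```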
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
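A q-gram comparison of the kind NGram performs can be sketched in a few lines. The choice of q = 3, the lower-casing, and the Dice coefficient used for normalization are assumptions; the slide only says that shared q-grams are counted and that the result lies between 0 and 1.

```python
def qgrams(name, q=3):
    """Set of all substrings of length q in the lower-cased name."""
    name = name.lower()
    return {name[i:i + q] for i in range(len(name) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over the two q-gram sets, in [0, 1]."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(ngram_similarity("windSpeed", "WindSpeed"))
print(ngram_similarity("windSpeed", "windDirection"))
print(ngram_similarity("wind", "rain"))
```

Using sets ignores repeated q-grams within a name, which keeps the sketch short at the cost of some precision on long, repetitive identifiers.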
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
- Average Concept Match: tells the user about the degree of similarity between the matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
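The slide does not give formulas for the two measures. One plausible reading, stated here purely as an assumption, is that both average the match scores of the mapped concepts, with Average Concept Match dividing by the number of mapped concepts and Average Service Match dividing by the total number of concepts in the WSDL:

```python
def match_measures(mapped_scores, total_wsdl_concepts):
    """Return (average concept match, average service match).

    mapped_scores: {wsdl_concept: match score of its mapped ontology concept}
    total_wsdl_concepts: number of concepts in the WSDL, mapped or not.
    Assumed definitions; the source slide does not state the formulas.
    """
    if not mapped_scores or total_wsdl_concepts == 0:
        return 0.0, 0.0
    total = sum(mapped_scores.values())
    avg_concept_match = total / len(mapped_scores)
    avg_service_match = total / total_wsdl_concepts
    return avg_concept_match, avg_service_match


# Two of four WSDL concepts were mapped (made-up scores).
avg_concept, avg_service = match_measures({"windSpeed": 1.0, "windDirection": 0.5}, 4)
print(avg_concept, avg_service)
```

Under this reading, a WSDL that maps only a few concepts, however well, gets a low service match for that ontology, which fits the slide's point that the second measure is the one used for categorization.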
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
ChallengesChallengesbull Actual semantics of the involved elements are typically only from the
creators or documentation ndash so we must use clues in the schema and data instead
bull These clues are often misleading bull Ie lsquoArearsquo can refer to different entitiesbull Ie The same entities can have very different names
bull Clues are often ambiguousbull Ie lsquoContact-agentrsquo Agent name or phone number
bull Matching process can be very costlybull Each element of the schema must be examined to ensure discovery of
the best match
bull Matching is often subjective depending on the application
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Where is Schema Matching Where is Schema Matching usedused
bull Database Application Domains- Data Integration- Data Warehousing- E-Business- Query Processing
bull Semantic Web- XMLHTML to an Ontology- Semantic Web Services
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema IntegrationSchema Integration
Problem Construct a global view from a set of independently constructed schemas
(ie ontologies)
- Different structure and terminologies
Solution Schema Matching is performed to find relationships between concepts in each schema Then the matching elements can be unified
Bernstein P Rahm E A survey of approaches to automatic schema matching
Data WarehousesData Warehouses
Problem Integrating data sources into a data warehouse
- Different formats between the source and warehouse
Solution Use matching to find the elements of the source that are also present in the warehouse Then the details of the semantics can be examined to integrate the two
Bernstein P Rahm E A survey of approaches to automatic schema matching
E-CommerceE-Commerce
Problem Message translation
-Each trading partner uses its own message format
Solution A match operation would reduce the amount of manual work to specify how the formats are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Query ProcessingQuery Processing
Problem The terms used in the userrsquos query may be different from those in the database
Solution Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Need for Data Integration on the Need for Data Integration on the Semantic WebSemantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web ServicesSemantic Web Services
bull Problem Web Services are currently searched for using keywords
bull We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
bull WSDLs are in XML Ontologies in OWL
bull Solution Use schema matching approaches to map between the two different schemas
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches

Examples of the four local cardinality cases for individual mapping elements:

Local match cardinality        S1 element(s)                      S2 element(s)         Matching expression
1:1, element level             Price                              Amount                Amount = Price
n:1, element level             Price, Tax                         Cost                  Cost = Price * (1 + Tax/100)
1:n, element level             Name                               FirstName, LastName   FirstName, LastName = Name
n:m, element level             B.Title, B.PuNo, P.PuNo, P.Name    A.Book, A.Publisher   A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
(also n:1, structure level)
Bernstein P Rahm E A survey of approaches to automatic schema matching
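Two of the matching expressions in the table can be written as small Python functions. This is a sketch only: the function names are hypothetical, and the assumption that a name has the form "First Last" is mine, not the survey's.

```python
def cost_from_price_and_tax(price: float, tax_percent: float) -> float:
    """n:1 match: Cost = Price * (1 + Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name: str):
    """1:n match: derive (FirstName, LastName) from Name.

    Assumes the name is written as 'First Last'; real data needs more care.
    """
    first, _, last = name.partition(" ")
    return first, last


cost = cost_from_price_and_tax(200.0, 50.0)
first_name, last_name = split_name("Ada Lovelace")
```

The point of the example is that an n:1 or 1:n match is not just a pair of element names; it carries an expression that says how the values are combined or split.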
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Little work has been done on complex matching
  - Some systems hard-code complex matches into rules
  - Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name matching: match elements with similar names
• Description matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes/suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, Soundex, pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
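A rough Python sketch of a few of these similarity definitions, tried in order: equality, synonym lookup, then a common-substring score. The synonym table and the 0.9 synonym score are hypothetical; `difflib.SequenceMatcher` from the standard library stands in for the substring comparison and is not what any particular system in the survey uses.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two element names."""
    x, y = a.lower(), b.lower()
    if x == y:
        return 1.0  # equality of names
    if frozenset({x, y}) in SYNONYMS:
        return 0.9  # equality of synonyms (score is an assumption)
    # similarity based on common substrings
    return SequenceMatcher(None, x, y).ratio()


ship_score = name_similarity("ShipTo", "Ship2")
```

For ShipTo and Ship2 the shared prefix "ship" gives a score well above that of unrelated names, which is the behaviour the ShipTo = Ship2 example relies on.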
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
  S1: empn (employee name)
  S2: name (name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
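The simple end of that spectrum, keyword extraction followed by comparison, might look like the sketch below. The stopword list and the use of Jaccard overlap are assumptions chosen for brevity.

```python
# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"of", "the", "a", "an"}


def keywords(description: str) -> set:
    """Extract lowercase keywords from a schema comment, dropping stopwords."""
    return {w for w in description.lower().split() if w not in STOPWORDS}


def description_similarity(d1: str, d2: str) -> float:
    """Jaccard overlap of the keyword sets of two descriptions."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 and not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


score = description_similarity("employee name", "name of employee")
```

On the slide's example the element names empn and name differ, but their descriptions reduce to the same keyword set, so the description matcher finds the correspondence that a name matcher would miss.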
Constraint Based
• Schemas often contain constraints that define data types, value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P Rahm E A survey of approaches to automatic schema matching
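As an illustration of how such constraints can feed a matcher, the sketch below compares three constraint properties of two elements and returns the fraction that agree. The chosen properties and the equal weighting are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """A few constraint properties of a schema element (illustrative selection)."""
    data_type: str
    optional: bool
    is_key: bool


def constraint_similarity(a: Constraints, b: Constraints) -> float:
    """Fraction of the compared constraint properties on which two elements agree."""
    checks = [a.data_type == b.data_type, a.optional == b.optional, a.is_key == b.is_key]
    return sum(checks) / len(checks)


same = constraint_similarity(Constraints("int", False, True), Constraints("int", False, True))
partial = constraint_similarity(Constraints("int", False, True), Constraints("string", False, False))
```

Constraint agreement alone rarely identifies a match (many elements are non-key integers), so a score like this is typically used to narrow down candidates found by other matchers.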
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (three schemas, labeled S1, S2, and S on the slide):
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  - Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  - POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is itself a match problem
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
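One way to picture mapping reuse is composition: if a mapping from S1 to an intermediate schema S and a mapping from S to S2 are already known, a candidate mapping from S1 to S2 follows by chaining them. The sketch below assumes simple 1:1 mappings stored as dictionaries; the specific correspondences are hypothetical and only borrow element names from the slide.

```python
def compose(m1: dict, m2: dict) -> dict:
    """Compose mappings A->B and B->C into A->C, keeping only elements mapped in both."""
    return {a: m2[b] for a, b in m1.items() if b in m2}


# Hypothetical, previously determined 1:1 mappings.
s1_to_s = {"BillTo": "BillTo", "ShipTo": "ShipTo", "Contact": "ContactPhone"}
s_to_s2 = {"BillTo": "Payee", "ShipTo": "Recipient"}

s1_to_s2 = compose(s1_to_s, s_to_s2)
```

Elements without a counterpart in the second mapping (Contact here) drop out, and, as the second problem above notes, a composed match is only a candidate because similarity may not carry over between domains.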
Instance Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., ranges
  3. Auxiliary information
  4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there are likely to be a great many possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
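A small sketch of constraint-based characterization by ranges: each column of instance values is summarized by its minimum and maximum, and two columns are compared by how much their ranges overlap. The overlap formula is an assumption chosen for simplicity.

```python
def value_range(values):
    """Characterize a column of numeric instance values by its (min, max) range."""
    return min(values), max(values)


def range_overlap(a, b) -> float:
    """Length of the intersection of two ranges divided by the length of their combined span."""
    (lo1, hi1), (lo2, hi2) = value_range(a), value_range(b)
    union = max(hi1, hi2) - min(lo1, lo2)
    if union == 0:
        return 1.0  # both columns hold the same single value
    inter = min(hi1, hi2) - max(lo1, lo2)
    return max(inter, 0) / union


overlap = range_overlap([10, 20], [15, 25])
```

Columns with disjoint ranges score 0 and can be discarded early, which helps with the main problem above of pruning irrelevant match combinations.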
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
  - e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
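The example rule can be written down directly. In this sketch, schema elements are represented as dictionaries with `name` and `children` keys; that representation is an assumption made for the example.

```python
def rule_match(e1: dict, e2: dict) -> bool:
    """Rule: two elements match if they have the same name and the same number of subelements."""
    return e1["name"] == e2["name"] and len(e1["children"]) == len(e2["children"])


address_a = {"name": "Address", "children": ["Street", "City", "State", "Zip"]}
address_b = {"name": "Address", "children": ["Street", "City", "USState", "PostalCode"]}
address_c = {"name": "Address", "children": ["Line1", "Line2"]}

matches = rule_match(address_a, address_b)
```

Rules like this are cheap to evaluate and need no training data, but each one has to be written and maintained by hand.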
Learner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria. Better performance.
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P Rahm E A survey of approaches to automatic schema matching
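A composite matcher can be sketched as a weighted average over independently executed matchers. The two toy matchers and the weights below are assumptions; any function returning a score in [0, 1] could be plugged in.

```python
def equal_names(a: str, b: str) -> float:
    """Toy matcher 1: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a: str, b: str) -> float:
    """Toy matcher 2: length of the common prefix divided by the longer name's length."""
    x, y = a.lower(), b.lower()
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return n / max(len(x), len(y), 1)


def composite_score(a: str, b: str, matchers, weights) -> float:
    """Combine the results of independently executed matchers by weighted average."""
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / sum(weights)


score = composite_score("Zip", "ZipCode", [equal_names, shared_prefix], [1, 1])
```

Because each matcher runs on its own, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite approach.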
LSD (Univ. of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic element-level matching
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
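The weighted mean of phase 2 and the decision of phase 3 can be illustrated as follows. This is not Cupid's implementation: the weight `w_struct`, the acceptance threshold, and the sample similarity values are all assumptions.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def decide_mapping(pairs: dict, threshold: float = 0.5, w_struct: float = 0.5) -> list:
    """Keep the element pairs whose weighted similarity reaches the threshold."""
    return [
        pair
        for pair, (lsim, ssim) in pairs.items()
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    ]


# Hypothetical (linguistic, structural) similarities for two candidate pairs.
pairs = {("Zip", "PostalCode"): (0.5, 1.0), ("Zip", "City"): (0.0, 0.5)}
mapping = decide_mapping(pairs)
```

Here Zip and PostalCode are only moderately similar linguistically, but strong structural agreement lifts the pair over the threshold, which shows why a hybrid matcher combines both signals before deciding.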
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: identifies matching parts of the schemas or databases
  - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  A tool for semi-automatically marking up web service descriptions with ontologies
  It helps describe services semantically and aids efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges, created using conversion functions
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules

  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
    </xsd:sequence>
  </xsd:complexType>

[Figure: SchemaGraph representation of the XML schema. Nodes: Direction, degrees, DirectionCompass. Edge labels: hasElement, compass]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
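To give a feel for this conversion, the Python sketch below parses the Direction type and emits nodes and edges. It applies one simplified rule of my own (every element of a complex type becomes a child node linked by a hasElement edge); the actual MWSAF conversion rules treat element types in more detail, and the wrapping `xsd:schema` element is added only so the fragment parses.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, wrapped in a schema element so it parses.
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def to_schema_graph(xsd_text: str):
    """Return (nodes, edges); edges are (parent, label, child) triples.

    Simplified assumption: each element under a complexType becomes a node
    connected to the type's node by a 'hasElement' edge.
    """
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD}}}element"):
            child = elem.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = to_schema_graph(SCHEMA)
```

Once both the XML schema and the ontology are in a node-and-edge form like this, the same matching code can be run on either side.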
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules

  <daml:Class rdf:ID="WindEvent">
    <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
    <rdfs:label>Wind event</rdfs:label>
    <rdfs:subClassOf rdf:resource="#WeatherEvent" />
  </daml:Class>
  <daml:Property rdf:ID="windDirection">
    <rdfs:label>Wind direction</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent" />
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:label>Wind speed</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent" />
    <rdfs:range rdf:resource="#Speed" />
  </daml:Property>

[Figure: SchemaGraph representation of part of the ontology. Nodes: WindEvent, windDirection, Speed. Edge labels: hasProperty, windSpeed]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
  - Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
  - Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a mapping set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
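A q-gram comparison in the spirit of the NGram algorithm can be sketched as below. The choice of q = 3 and the Dice coefficient used to normalize the count into [0, 1] are assumptions; the slides do not give MWSAF's exact formula.

```python
def qgrams(name: str, q: int = 3) -> set:
    """Set of lowercase q-grams of a name; names shorter than q yield the name itself."""
    s = name.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the q-gram sets of two names, a value between 0 and 1."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


partial = ngram_similarity("windSpeed", "windDirection")
```

Names that share a stem such as "wind" get a score strictly between 0 and 1, so they surface as candidates without being treated as exact matches.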
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
  - Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
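The two measures might be computed along the following lines. The formulas are an assumption on my part (average over mapped concepts for the concept match, average over all WSDL concepts for the service match), as are the function names and sample scores; the slides do not spell them out.

```python
def average_concept_match(scores: list) -> float:
    """Mean match score over the concepts that were actually mapped (assumed formula)."""
    return sum(scores) / len(scores) if scores else 0.0


def average_service_match(scores: list, total_wsdl_concepts: int) -> float:
    """Sum of match scores divided by the number of all WSDL concepts (assumed formula)."""
    return sum(scores) / total_wsdl_concepts if total_wsdl_concepts else 0.0


def categorize(service_match_by_ontology: dict) -> str:
    """Pick the ontology (domain) with the highest average service match."""
    return max(service_match_by_ontology, key=service_match_by_ontology.get)


# Hypothetical scores: two of four WSDL concepts were mapped to a weather ontology.
scores = [1.0, 0.5]
concept_match = average_concept_match(scores)
service_match = average_service_match(scores, total_wsdl_concepts=4)
domain = categorize({"weather": service_match, "geography": 0.125})
```

Under these assumed formulas the concept match ignores unmapped concepts while the service match penalizes them, which is why the latter is the better signal for deciding which domain a service belongs to.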
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues

• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?

• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema- vs. instance-level
  - Element- vs. structure-level
  - Language- and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Vassilis C. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Where Is Schema Matching Used?
• Database application domains
  - Data integration
  - Data warehousing
  - E-business
  - Query processing
• Semantic Web
  - XML/HTML to an ontology
  - Semantic Web services
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Integration

Problem: Construct a global view from a set of independently constructed schemas (e.g., ontologies)
  - Different structures and terminologies

Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified
Bernstein P Rahm E A survey of approaches to automatic schema matching
Data Warehouses
Problem Integrating data sources into a data warehouse
- Different formats between the source and warehouse
Solution Use matching to find the elements of the source that are also present in the warehouse Then the details of the semantics can be examined to integrate the two
Bernstein P Rahm E A survey of approaches to automatic schema matching
E-Commerce

Problem: Message translation
  - Each trading partner uses its own message format
Solution A match operation would reduce the amount of manual work to specify how the formats are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Query Processing

Problem: The terms used in the user's query may be different from those in the database
Solution Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services

• Problem: Web services are currently searched for using keywords

• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently

• WSDLs are in XML; ontologies are in OWL
bull Solution Use schema matching approaches to map between the two different schemas
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements in S2
• Mapping expression: tells how the S1 and S2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Example

A mapping between S1 and S2 might contain these elements:
• Cust.C = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company

S1 Elements       S2 Elements
Cust              Customer
  C                 CustID
  CName             Company
  FirstName         Contact
  LastName          Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
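The three mapping elements above can be read as a small data translation. The Python sketch below applies them to one Cust record; the dictionary representation and the single space used by the concatenation are assumptions.

```python
def apply_mapping(cust: dict) -> dict:
    """Translate a Cust record (S1) into a Customer record (S2) using the three mapping elements."""
    return {
        "CustID": cust["C"],  # Cust.C = Customer.CustID
        # Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact (space separator assumed)
        "Contact": cust["FirstName"] + " " + cust["LastName"],
        "Company": cust["CName"],  # Cust.CName = Customer.Company
    }


customer = apply_mapping(
    {"C": 42, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
)
```

Customer.Phone receives no value because no mapping element covers it, which illustrates that a mapping need not cover every element of either schema.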
Example
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Approaches

• Instance vs. schema: matching approaches can consider instance data or schema-level information

• Element vs. structure matching: a match can be performed for individual schema elements or for combinations of elements

• Language vs. constraint: linguistic (names) or constraint-based (keys and relationships)

• Matching cardinality: the match result may relate one or more elements of one schema to one or more elements of another

• Auxiliary information: the matcher relies on other information besides the input schemas, such as dictionaries, user input, or global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches

Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level
        Linguistic (sample approaches: Name Similarity, Description Similarity, Global Namespaces)
        Constraint (sample approaches: Type Similarity, Key Properties)
      Structure Level
        Constraint (sample approach: Group Matching)
    Instance/Contents
      Element Level
        Linguistic (sample approach: Word Frequency)
        Constraint (sample approach: Value Pattern and Ranges)
  Combining Matchers
    Hybrid Matchers
    Composite Matchers
      Manual Composition
      Automatic Composition

Further criteria: match cardinality, auxiliary information used, ...
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic, element-level
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from Phase 2 to decide on a mapping
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
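The weighted mean of Phase 2 and the decision of Phase 3 can be sketched as follows. This is a minimal illustration, not Cupid's actual code: the function names, the weight `w_struct`, and the acceptance `threshold` are assumptions chosen for the example.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must be in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def decide_mapping(pairs, threshold=0.6, w_struct=0.5):
    """Keep element pairs whose weighted similarity reaches the threshold.

    pairs maps (s1_element, s2_element) -> (lsim, ssim).
    Returns accepted triples sorted by descending weighted similarity.
    """
    accepted = []
    for (s1_elem, s2_elem), (lsim, ssim) in pairs.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            accepted.append((s1_elem, s2_elem, wsim))
    return sorted(accepted, key=lambda triple: -triple[2])
```

Raising `w_struct` makes the decision lean on structure; lowering it favors the name-based evidence from Phase 1.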
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
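A rough sketch of the idea behind similarity flooding: start from name-based scores and repeatedly let each node pair absorb similarity from neighboring pairs that are connected by equally labeled edges. This is a simplification under stated assumptions (every propagation link has weight 1, a fixed number of iterations, hypothetical function and variable names); the published algorithm uses weighted propagation and a convergence test.

```python
def similarity_flooding(edges_a, edges_b, initial, iterations=10):
    """Propagate similarity between node pairs of two labeled graphs.

    edges_a, edges_b: lists of (source, label, target) triples.
    initial: {(node_a, node_b): score} from an element-level name matcher.
    Returns {(node_a, node_b): score} normalized so the best pair scores 1.0.
    """
    # Two pairs are linked when both graphs have an edge with the same label
    # between the corresponding nodes.
    neighbors = {}
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbors.setdefault((a1, b1), []).append((a2, b2))
                neighbors.setdefault((a2, b2), []).append((a1, b1))

    pairs = set(initial) | set(neighbors)
    if not pairs:
        return {}

    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        nxt = {
            p: initial.get(p, 0.0) + sigma[p] + sum(sigma[n] for n in neighbors.get(p, []))
            for p in pairs
        }
        top = max(nxt.values()) or 1.0  # avoid division by zero
        sigma = {p: value / top for p, value in nxt.items()}
    return sigma
```

In this sketch a pair such as (CName, Company) gains score when its parents (Cust, Customer) are already similar, which is how structure reinforces the initial name match.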
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaGraph representation of the XML schema. Labels: Direction, compass, degrees, DirectionCompass, hasElement]
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
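The XML-to-SchemaGraph step can be illustrated with a small sketch that turns each complex type into a parent node with one `hasElement` edge per child element. It is not the MWSAF parser: the function name, the edge tuple layout, the use of the 2001 XML Schema namespace, and the `http://example.org/weather` namespace are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"  # assumed namespace for the xsd prefix

XSD_TEXT = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="http://example.org/weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def complex_types_to_edges(xsd_text):
    """Return (parent, 'hasElement', child) edges for every complex type."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


if __name__ == "__main__":
    for edge in complex_types_to_edges(XSD_TEXT):
        print(edge)
```

An edge list like this is one simple way to hold a graph; a fuller conversion would also record each element's type as a node or attribute.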
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of the ontology. Labels: WindEvent, windDirection, windSpeed, Speed, hasProperty]
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
- Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
- Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
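The slide does not say how getBestMapping selects the map set. One plausible reading, shown here purely as an assumption, is a greedy one-to-one selection over the match scores. The name `get_best_mapping`, the `threshold` parameter, and the one-to-one restriction are hypothetical and not taken from MWSAF.

```python
def get_best_mapping(scores, threshold=0.5):
    """Greedy one-to-one selection of (WSDL element -> ontology concept).

    scores maps (wsdl_element, concept) -> match score in [0, 1].
    Pairs are visited from the highest score down; a pair is kept only if
    neither its element nor its concept has been used already.
    """
    mapping = {}
    used_elements, used_concepts = set(), set()
    for (element, concept), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # all remaining pairs score even lower
        if element in used_elements or concept in used_concepts:
            continue
        mapping[element] = (concept, score)
        used_elements.add(element)
        used_concepts.add(concept)
    return mapping
```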
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
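As an illustration of the NGram idea, the sketch below scores two names by the q-grams they share and returns a value between 0 and 1. The normalization (a Dice coefficient), the default `q=3`, and the function names are assumptions; the slide does not give the formula MWSAF actually uses.

```python
def qgrams(name, q=3):
    """Set of lowercase character q-grams of a name."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over shared q-grams, in [0, 1]."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

Identical names score 1.0, names with no q-gram in common score 0.0, and near-variants such as `windSpeed` and `wind_speed` land in between.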
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
Two measures are then derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching:
- Schema-level vs. instance-level
- Element-level vs. structure-level
- Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E., A survey of approaches to automatic schema matching, www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey, http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web service Annotation Framework, POSV-WWW2004.pdf
• Christophides V., Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM), Dagstuhl Seminar, ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Where is Schema Matching used?
• Database application domains
- Data integration
- Data warehousing
- E-business
- Query processing
• Semantic Web
- XML/HTML to an ontology
- Semantic Web services
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Integration
Problem: Construct a global view from a set of independently constructed schemas (e.g., ontologies)
- Different structures and terminologies
Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Data Warehouses
Problem: Integrating data sources into a data warehouse
- Different formats between the source and the warehouse
Solution: Use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
E-Commerce
Problem: Message translation
- Each trading partner uses its own message format
Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Query Processing
Problem: The terms used in the user's query may be different from those in the database
Solution: Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements in S2
• Mapping expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
S1 Elements    S2 Elements
Cust           Customer
C              CustID
CName          Company
FirstName      Contact
LastName       Phone
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
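To make the idea of mapping elements with mapping expressions concrete, the sketch below encodes the three elements of the example and applies them to one S1 record. The data layout, the `translate` function, and the single-space separator used for Concatenate are assumptions for illustration.

```python
# Each mapping element: (S1 source fields, S2 target field, mapping expression).
MAPPING = [
    (("C",), "CustID", lambda c: c),
    (("FirstName", "LastName"), "Contact", lambda first, last: f"{first} {last}"),
    (("CName",), "Company", lambda name: name),
]


def translate(s1_row, mapping=MAPPING):
    """Build a Customer (S2) record from a Cust (S1) record."""
    return {
        target: expression(*(s1_row[source] for source in sources))
        for sources, target, expression in mapping
    }


if __name__ == "__main__":
    cust = {"C": 7, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
    print(translate(cust))
```

Note that `Phone` in S2 has no counterpart in S1, so the translated record simply lacks that field.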
Example
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. schema: matching approaches can consider instance data or schema-level information
• Element vs. structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary information: the matcher relies on information besides the input schemas, such as dictionaries, user input, or global schemas
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
- Individual Matchers
  - Schema-only
    - Element Level
      - Linguistic (sample approaches: Name Similarity, Description Similarity, Global Namespaces)
      - Constraint (sample approaches: Type Similarity, Key Properties)
    - Structure Level
      - Constraint (sample approach: Graph Matching)
  - Instance/Contents
    - Element Level
      - Linguistic (sample approach: Word Frequency)
      - Constraint (sample approach: Value Pattern and Ranges)
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers
    - Manual Composition
    - Automatic Composition
Further criteria: Match Cardinality, Auxiliary Information Used, ...
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Element-Level
• Element-level matching: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Structure-Level
• Structure-level matching: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full structure match:
S1 Elements    S2 Elements
Address        CustAddress
Street         Street
City           City
State          USState
Zip            PostalCode
• Partial structure match:
S1 Elements    S2 Elements
AccountOwner   Customer
Name           Cname
Address        CAddress
Birthdate      CPhone
TaxExempt
• Equivalence patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinality       S1 Element(s)                     S2 Element(s)          Matching Expression
1:1, element level            Price                             Amount                 Amount = Price
n:1, element level            Price, Tax                        Cost                   Cost = Price*(1+Tax/100)
1:n, element level            Name                              FirstName, LastName    FirstName, LastName = Name
n:m, element level            B.Title, B.PuNo, P.PuNo, P.Name   A.Book, A.Publisher    A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
(also n:1, structure level)
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
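The matching expressions in the table can be written out as small functions, one per cardinality case. This is only a sketch: the function names, the dictionary-based rows, and the rule that a name splits at its first space are assumptions made for the example.

```python
def cost(price, tax_percent):
    """n:1 case: Cost = Price * (1 + Tax / 100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n case: derive FirstName and LastName from Name (split at first space)."""
    first, _, last = name.partition(" ")
    return first, last


def book_publisher(books, publishers):
    """n:m case: join B and P on PuNo, yielding (A.Book, A.Publisher) pairs."""
    name_by_number = {p["PuNo"]: p["Name"] for p in publishers}
    return [
        (b["Title"], name_by_number[b["PuNo"]])
        for b in books
        if b["PuNo"] in name_by_number
    ]
```

The join mirrors the Select ... From B, P Where B.PuNo = P.PuNo expression: a book without a matching publisher number produces no output row.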
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but the number of functions for combining attributes in a schema is unbounded
• Only a few works on complex matching have been done
• Some hard-code complex matches into rules
• Some rely on a domain-specific ontology
• We need domain knowledge to accurately perform complex matching
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name matching: match elements with similar names
• Description matching: match comments in the schemas
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, Soundex, or pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
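Several of the similarity criteria above can be chained into one name matcher, as in the sketch below. Everything specific here is an assumption for illustration: the toy suffix-stripping `stem`, the hand-made `SYNONYMS` set (a real matcher would consult a thesaurus such as WordNet), and the fixed scores 0.9, 0.8, and 0.7 that rank the criteria.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; stands in for a thesaurus lookup.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"salary", "income"})}


def stem(word):
    """Very rough suffix stripping; not a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def name_similarity(a, b):
    """Score two element names in [0, 1], checking the strongest criterion first."""
    a, b = a.lower(), b.lower()
    if a == b:                             # 1. equality of names
        return 1.0
    if stem(a) == stem(b):                 # 2. equality after stemming
        return 0.9
    if frozenset({a, b}) in SYNONYMS:      # 3. equality of synonyms
        return 0.8
    # 5. similarity based on common substrings, scaled below the exact criteria
    return 0.7 * SequenceMatcher(None, a, b).ratio()
```

With this ordering, `ShipTo` and `Ship2` fall through to the substring criterion and receive a partial score rather than a full match.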
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn  // employee name
S2: name  // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
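The simple end of that spectrum, keyword extraction, can be sketched as follows. The stop-word list and the Jaccard overlap used as the score are assumptions chosen for the example, not a method prescribed by the survey.

```python
STOPWORDS = {"of", "the", "a", "an", "for", "to"}  # assumed minimal stop-word list


def keywords(description):
    """Lowercase keywords of a comment, with stop words removed."""
    return {word for word in description.lower().split() if word not in STOPWORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the keyword sets, in [0, 1]."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)
```

For the example above, "employee name" and "name of employee" reduce to the same keyword set, so `empn` and `name` are matched even though their names differ.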
Constraint Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
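One simple way to use such constraints in a matcher, shown here as an assumption rather than a method from the survey, is to count how many of the constraints declared on both elements agree. The constraint keys (`type`, `is_key`, `nullable`) and the function name are hypothetical.

```python
def constraint_similarity(c1, c2, keys=("type", "is_key", "nullable")):
    """Fraction of constraints present on both elements that agree, in [0, 1]."""
    shared = [k for k in keys if k in c1 and k in c2]
    if not shared:
        return 0.0
    return sum(c1[k] == c2[k] for k in shared) / len(shared)


if __name__ == "__main__":
    cust_id = {"type": "int", "is_key": True, "nullable": False}
    phone = {"type": "string", "is_key": False, "nullable": True}
    customer_no = {"type": "int", "is_key": True, "nullable": False}
    print(constraint_similarity(cust_id, customer_no))
    print(constraint_similarity(cust_id, phone))
```

Constraint evidence alone tends to be weak, since many unrelated elements share a data type, so it is usually combined with other matchers.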
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and the schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example:
[Figure: three schemas labeled Schema S1, Schema S2, and Schema S]
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., value ranges
3. Auxiliary information
4. Both rule-based and learner-based techniques are also used
• Main problem: when comparing data at the instance level, there are likely to be very many possible match combinations, many of which are irrelevant
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
• Element names, data types, structures, and subelements
• E.g., two elements match if they have the same name and the same number of subelements
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
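The example rule above translates almost directly into code. In this sketch the `Element` class and the case-insensitive name comparison are assumptions; the rule itself (same name, same number of subelements) is the one stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelement names."""
    name: str
    subelements: list = field(default_factory=list)


def rule_match(e1, e2):
    """Hand-crafted rule: same name and same number of subelements."""
    same_name = e1.name.lower() == e2.name.lower()
    same_arity = len(e1.subelements) == len(e2.subelements)
    return same_name and same_arity
```

Rules like this are cheap and need no training data, but each one has to be written and maintained by hand.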
Learner Based Solutions
• Learner-based: exploit both schema and data
• Require a lot of training data but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria. Better performance
2. Composite: combines the results of independently executed matchers. More flexible; can be done automatically or manually
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
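A composite matcher can be sketched as a function that runs several independent matchers and merges their scores. The weighted average used to merge them, the two toy matchers, and all names below are assumptions for illustration; other combination strategies (maximum, voting, learned weights) are equally possible.

```python
def exact_name(a, b):
    """Toy matcher: 1.0 for case-insensitive equality, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a, b):
    """Toy matcher: length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count / max(len(a), len(b), 1)


def composite_match(a, b, matchers, weights=None):
    """Weighted average of the scores of independently executed matchers."""
    weights = weights or [1.0] * len(matchers)
    if len(weights) != len(matchers) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per matcher")
    total = sum(w * matcher(a, b) for matcher, w in zip(matchers, weights))
    return total / sum(weights)
```

Because each matcher is a plain function, adding or removing one does not require changing the others, which is the flexibility the composite approach is credited with.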
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, from which it learns patterns and matching rules
• Mostly instance-oriented but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E., A survey of approaches to automatic schema matching
Schema Integration
Problem: Construct a global view from a set of independently constructed schemas (i.e., ontologies)
- Different structures and terminologies
Solution: Schema matching is performed to find relationships between concepts in each schema. The matching elements can then be unified.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Data Warehouses
Problem: Integrating data sources into a data warehouse
- Different formats between the source and the warehouse
Solution: Use matching to find the elements of the source that are also present in the warehouse. The details of the semantics can then be examined to integrate the two.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
E-Commerce
Problem: Message translation
- Each trading partner uses its own message format
Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Query Processing
Problem: The terms used in the user's query may be different from those in the database.
Solution: Matching is used to map the user-specified concepts in the query to schema elements.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of S2
• Mapping Expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
S1 Elements | S2 Elements
Cust | Customer
C | CustID
CName | Company
FirstName | Contact
LastName | Phone
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
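The three mapping expressions in the example can be read as a small record transformation. The following is a minimal sketch, not taken from the cited survey: the record layout, the sample values, and the choice to join the two name parts with a single space are all assumptions made for illustration.

```python
def map_cust_to_customer(cust):
    """Apply the example's mapping expressions to one S1 (Cust) record."""
    return {
        # Cust.C = Customer.CustID
        "CustID": cust["C"],
        # Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
        # (joining with a space is an assumption)
        "Contact": f'{cust["FirstName"]} {cust["LastName"]}',
        # Cust.CName = Customer.Company
        "Company": cust["CName"],
    }


# Hypothetical S1 record used only to exercise the mapping.
example = map_cust_to_customer(
    {"C": 17, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
)
```

Note that S2's Phone element has no counterpart in S1, so the mapping leaves it unset.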
Example
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Classification of Schema Matching Approaches
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level: Linguistic, Constraint, ...
      Structure Level: Constraint, ...
    Instance/Contents
      Element Level: Linguistic, Constraint, ...
  Combining Matchers
    Hybrid Matchers
    Composite Matchers: Manual Composition, Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, ...
Sample Approaches: Name Similarity, Description Similarity, Global Namespaces, Word Frequency, Group Matching, Type Similarity, Key Properties, Value Pattern and Ranges
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
S1 Elements | S2 Elements
Address | CustAddress
Street | Street
City | City
State | USState
Zip | PostalCode
S1 Elements | S2 Elements
AccountOwner | Customer
Name | Cname
Address | CAddress
Birthdate | CPhone
TaxExempt |
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinalities | S1 Element(s) | S2 Element(s) | Matching Expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price*(1+Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
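The n:1 and 1:n rows above can be made concrete with two small functions. This is only an illustrative sketch: the function names are hypothetical, and splitting a name at the first space is an assumption, since the slide does not say how Name is decomposed.

```python
def cost_from_price_and_tax(price, tax_percent):
    """n:1 element-level match: Cost = Price*(1+Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n element-level match: derive (FirstName, LastName) from Name.

    Splitting at the first space is an assumption for illustration.
    """
    first, _, last = name.partition(" ")
    return first, last


cost = cost_from_price_and_tax(80, 25)
first_name, last_name = split_name("Ada Lovelace")
```

The point of the sketch is that anything beyond 1:1 needs a function over the source elements, which is why complex matches are much harder to discover automatically.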
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching have been done
  - Some hard-code complex matches into rules
  - Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes/suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
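A few of the similarity criteria listed above can be sketched in one function. This is a simplified illustration under stated assumptions: the synonym table is a hypothetical stand-in for a real thesaurus, and Python's standard `difflib.SequenceMatcher` is used as a generic common-substring measure rather than any specific measure from the survey.

```python
from difflib import SequenceMatcher

# Hypothetical user-provided synonym pairs (stand-in for a thesaurus).
SYNONYMS = {("car", "automobile")}


def name_similarity(a, b, synonyms=SYNONYMS):
    """Return a similarity in [0, 1] for two element names."""
    a, b = a.lower(), b.lower()
    if a == b:  # equality of names (case-insensitive by assumption)
        return 1.0
    if (a, b) in synonyms or (b, a) in synonyms:  # equality of synonyms
        return 1.0
    # Fall back to a common-substring based ratio.
    return SequenceMatcher(None, a, b).ratio()


ship_score = name_similarity("ShipTo", "Ship2")
```

With this fallback, ShipTo and Ship2 get a partial score because they share the substring "ship"; stemming, hypernyms, and soundex would need additional resources that are not shown here.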
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
  S1: empn // employee name
  S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
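The simple end of that spectrum, keyword extraction followed by comparison, might look like the sketch below. The stop-word list and the use of Jaccard overlap as the score are assumptions chosen for illustration; the slide does not prescribe either.

```python
# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "the", "a", "an", "for"}


def keywords(description):
    """Extract lower-cased keywords from a natural-language comment."""
    return {w for w in description.lower().split() if w not in STOP_WORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the two keyword sets (0.0 if either is empty)."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


score = description_similarity("employee name", "name of employee")
```

On the slide's example the two comments reduce to the same keyword set, so empn and name would be proposed as a match even though the element names themselves differ.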
Constraint Based
• Schemas often contain constraints that define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Schema Mapping Reuse
• Example: three schemas (S1, S2, S) with similar substructures
  Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem in itself
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Instance Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., ranges
  3. Auxiliary information
  4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
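As one illustration of constraint-based characterization by ranges, each column's instance values can be summarized as a (min, max) pair and two columns compared by how much their ranges overlap. The overlap formula below is an assumption chosen for simplicity, not a measure defined in the survey.

```python
def value_range(values):
    """Characterize a column by the range of its instance values."""
    return (min(values), max(values))


def range_overlap(r1, r2):
    """Shared part of two ranges divided by their combined span, in [0, 1]."""
    shared = min(r1[1], r2[1]) - max(r1[0], r2[0])
    span = max(r1[1], r2[1]) - min(r1[0], r2[0])
    if span == 0:
        return 1.0  # both columns hold the same single value
    return max(0.0, shared) / span


# Hypothetical instance data for two numeric columns.
overlap = range_overlap(value_range([0, 4, 10]), value_range([5, 20]))
```

A score like this only prunes candidates; columns with similar ranges (two unrelated percentages, say) still need other evidence before they are matched.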
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
  - Element names, data types, structures, and subelements
  - E.g., two elements match if they have the same name and the same number of subelements
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
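The example rule above (same name and same number of subelements) is simple enough to write down directly. In this sketch the `Element` class is a hypothetical stand-in for a schema element, and comparing names case-insensitively is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Hypothetical schema element with a name and its subelements."""
    name: str
    children: List["Element"] = field(default_factory=list)


def rule_match(e1, e2):
    """Hand-crafted rule: same name and same number of subelements."""
    same_name = e1.name.lower() == e2.name.lower()
    return same_name and len(e1.children) == len(e2.children)


address_1 = Element("Address", [Element("Street"), Element("City")])
address_2 = Element("address", [Element("Road"), Element("Town")])
address_3 = Element("Address", [Element("Street")])
```

Here `address_1` and `address_2` match while `address_1` and `address_3` do not, which shows both the appeal of such rules (cheap, no training data) and their brittleness.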
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Require a lot of training data, but can exploit instance data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods:
  1. Hybrid: integrates multiple matching criteria. Better performance.
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
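A composite matcher can be sketched as a weighted average over matchers that each run independently. The two toy matchers, the dictionary representation of an element, and the weights below are all hypothetical; real systems may combine results differently (for example by taking the maximum or by learning the weights).

```python
def same_name(a, b):
    """Toy linguistic matcher: 1.0 if the names are equal, else 0.0."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def same_type(a, b):
    """Toy constraint matcher: 1.0 if the data types are equal, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a, b, matchers, weights):
    """Combine independently computed matcher scores by weighted average."""
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / sum(weights)


price = {"name": "Price", "type": "decimal"}
amount = {"name": "Amount", "type": "decimal"}
score = composite_score(price, amount, [same_name, same_type], [1, 3])
```

Because the matchers do not know about each other, adding or removing one only changes the lists passed to `composite_score`, which is the flexibility the slide attributes to the composite approach.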
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic, element-level
  - categorizes elements based on name, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs a bottom-up structure matching
  - calculates a similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the mean from phase 2 to decide on a mapping
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: used to identify matching parts of the schemas or databases
  - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Tess (Univ. of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithm
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly.
MWSAF converts both models to a common representation format called SchemaGraph.
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions.
It then applies a matching algorithm to find the mappings between them.
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
Figure (nodes and edge labels only): Direction, compass, degrees, DirectionCompass; edge label hasElement
SchemaNode representation of XML schema
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
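To give a feel for what such a conversion might produce, the sketch below turns the Direction complex type into a list of (node, edge label, node) triples. It is not MWSAF's parser and does not implement its actual conversion rules, which the slide does not spell out: the single rule used here (each child element becomes a hasElement edge), the enclosing `xsd:schema` wrapper, and the `urn:example:types` namespace are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The slide's complex type, wrapped in a schema element so it parses alone.
XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:types">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def schema_graph_edges(xsd_text):
    """Return (complexType, 'hasElement', element) triples in document order."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((ctype.get("name"), "hasElement", elem.get("name")))
    return edges


edges = schema_graph_edges(XSD)
```

Once both the WSDL schema and the ontology are reduced to triples of this kind, the same matching code can be applied to either side.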
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
Figure (nodes and edge labels only): WindEvent, windDirection, windSpeed, Speed; edge label hasProperty
SchemaGraph representation of part of ontology
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
Mapping
• Measures of the Match Score:
  - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked.
  - Schema Match: structural similarity (sub-concept similarities)
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
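A q-gram comparison that returns a value between 0 and 1 can be sketched as follows. The slide only says that NGram counts shared q-grams; normalizing with the Dice coefficient, lower-casing the names, and defaulting to q = 3 are assumptions made here, not details of MWSAF's implementation.

```python
def qgrams(s, q=3):
    """Set of all substrings of length q in the lower-cased string."""
    s = s.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over the two q-gram sets, in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0  # at least one name is shorter than q
    return 2 * len(ga & gb) / (len(ga) + len(gb))


score = ngram_similarity("night", "nacht", q=2)
```

For "night" and "nacht" with q = 2, only the q-gram "ht" is shared out of four on each side, so the score is 2 * 1 / 8.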
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from the mapping:
  - Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
More Issues
• If we require user acceptance for our matches, then what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching:
  - Schema vs. Instance-level
  - Element vs. Structure-level
  - Language and Constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Vassilis C. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Data WarehousesData Warehouses
Problem Integrating data sources into a data warehouse
- Different formats between the source and warehouse
Solution Use matching to find the elements of the source that are also present in the warehouse Then the details of the semantics can be examined to integrate the two
Bernstein P Rahm E A survey of approaches to automatic schema matching
E-CommerceE-Commerce
Problem Message translation
-Each trading partner uses its own message format
Solution A match operation would reduce the amount of manual work to specify how the formats are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Query ProcessingQuery Processing
Problem The terms used in the userrsquos query may be different from those in the database
Solution Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Need for Data Integration on the Need for Data Integration on the Semantic WebSemantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web ServicesSemantic Web Services
bull Problem Web Services are currently searched for using keywords
bull We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
bull WSDLs are in XML Ontologies in OWL
bull Solution Use schema matching approaches to map between the two different schemas
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  e.g., in E-Commerce, substructures (address fields, name fields) often repeat within different message formats
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example: three schemas (S1, S2, S)
  Purchase-order: Product; BillTo (Name, Address); ShipTo (Name, Address); ContactPhone
  Purchase-order: Product; BillTo (Name, Address); ShipTo (Name, Address); Contact (Name, Address)
  POrder: Article; Payee; BillAddress; Recipient; ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
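One way to reuse earlier results is to compose two previously determined mappings. The sketch below assumes mappings are simple 1:1 dictionaries; the specific correspondences are invented for illustration and are not taken from the survey.

```python
def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Compose S1->S and S->S2 into candidate S1->S2 correspondences."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


# Assumed, previously determined mappings (illustrative only).
s1_to_s = {
    "Product": "Product",
    "BillTo.Address": "BillTo.Address",
    "Contact.Name": "Contact.Name",
}
s_to_s2 = {
    "Product": "Article",
    "BillTo.Address": "BillAddress",
    "ShipTo.Address": "ShipAddress",
}

candidates = compose(s1_to_s, s_to_s2)
```

Elements without a counterpart in the intermediate mapping (here Contact.Name) are dropped and still have to be matched directly, and the composed candidates are only as reliable as the domain assumption in problem 2 allows.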
Instance-Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., value ranges
  3. Auxiliary information
  4. Both rule-based and learner-based techniques are used
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule-Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  • element names, data types, structures, and subelements
  • e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
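The example rule can be written down directly. The Element class and the case-insensitive name comparison are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list["Element"] = field(default_factory=list)


def same_name_and_arity(a: Element, b: Element) -> bool:
    """Hand-crafted rule: same name and same number of subelements."""
    return a.name.lower() == b.name.lower() and len(a.children) == len(b.children)
```

Rules like this are cheap and need no training, but each one only captures the pattern its author anticipated.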
Learner-Based Solutions
• Learner-based: exploit both schema and data
• Require a lot of training data, but can exploit instance data as well as schema information
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria; better performance
  2. Composite: combines the results of independently executed matchers; more flexible; can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
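A composite matcher can be sketched as a weighted average over independently executed matchers. The two toy matchers and the weights below are assumptions chosen only to keep the example small.

```python
from typing import Callable

Matcher = Callable[[str, str], float]


def exact(a: str, b: str) -> float:
    """1.0 when the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a: str, b: str) -> float:
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)


def composite(weighted: list[tuple[Matcher, float]]) -> Matcher:
    """Combine independent matchers into one by weighted average."""
    total = sum(weight for _, weight in weighted)

    def combined(a: str, b: str) -> float:
        return sum(m(a, b) * weight for m, weight in weighted) / total

    return combined


match = composite([(exact, 1.0), (shared_prefix, 1.0)])
```

Because each matcher runs independently, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the composite approach is credited with above.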
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations, and the user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the weighted mean from Phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
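The weighted mean in Phase 2 and the decision in Phase 3 can be sketched as below. The weight of 0.5 and the acceptance threshold of 0.5 are assumed values, not Cupid's actual settings.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def accept(lsim: float, ssim: float, w_struct: float = 0.5,
           threshold: float = 0.5) -> bool:
    """Keep a candidate pair when its weighted similarity reaches the threshold."""
    return weighted_similarity(lsim, ssim, w_struct) >= threshold
```

Raising w_struct makes the decision depend more on how the elements' neighborhoods match than on their names.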
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: identifies matching parts of the schemas or databases
  - Mapping Generator: generates view definitions that map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  A tool for semi-automatically marking up web service descriptions with ontologies
  It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find the mappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation Framework: XML to SchemaGraph conversion rules
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
[Figure: graph with nodes Direction, compass, degrees, and DirectionCompass; edge label hasElement]
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
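A minimal sketch of one such conversion rule, assuming each complex type becomes a node and each contained element is attached to it by a hasElement edge, as the figure suggests. The wrapping xsd:schema element, the trimmed attributes, and the to_edges function are assumptions; this is not the MWSAF parser.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Wrapped in xsd:schema so the fragment parses on its own (assumption).
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="xsd1:DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def to_edges(xsd_text: str) -> list[tuple[str, str, str]]:
    """Return (complexType, 'hasElement', element) triples for a schema."""
    root = ET.fromstring(xsd_text.strip())
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((ctype.get("name"), "hasElement", elem.get("name")))
    return edges


edges = to_edges(SCHEMA)
```

The resulting triples give the XML side the same node-and-edge shape as the ontology side, which is what lets a single matching algorithm run over both.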
MWSAF Meteor-S Web Service Annotation Framework: Ontology to SchemaGraph conversion rules
  <daml:Class rdf:ID="WindEvent">
    <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
    <rdfs:label>Wind event</rdfs:label>
    <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
  </daml:Class>
  <daml:Property rdf:ID="windDirection">
    <rdfs:label>Wind direction</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:label>Wind speed</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>
[Figure: graph with nodes WindEvent, windDirection, windSpeed, and Speed; edge label hasProperty]
SchemaGraph representation of part of ontology
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
  - Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
  - Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
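A minimal sketch of a q-gram matcher that returns a value between 0 and 1. The choice of q = 3 and the Dice coefficient as the normalization are assumptions; the exact formula used by MWSAF is not given here.

```python
def qgrams(name: str, q: int = 3) -> set[str]:
    """Set of overlapping substrings of length q (the whole name if shorter)."""
    text = name.lower()
    if len(text) < q:
        return {text}
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the q-gram sets: shared q-grams relative to all."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

For example, windSpeed and speed share the q-grams spe, pee, and eed, so they score well above zero even though neither name is a synonym or abbreviation of the other.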
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from the mapping:
  - Average Concept Match: tells the user the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema-level vs. instance-level
  - Element-level vs. structure-level
  - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
E-Commerce
Problem: Message translation
  - Each trading partner uses its own message format
Solution: A match operation would reduce the amount of manual work needed to specify how the formats are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Query Processing
Problem: The terms used in the user's query may be different from those in the database
Solution: Matching is used to map the user-specified concepts in the query to schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web Services are currently searched for using keywords
• We need to annotate the WSDL files with semantic metadata so that the services can be discovered efficiently
• WSDL files are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of S2
• Mapping expression: specifies how the S1 and S2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company

S1 Elements    S2 Elements
Cust           Customer
  C#             CustID
  CName          Company
  FirstName      Contact
  LastName       Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
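Mapping expressions like these can be executed to translate instance data. The sketch below covers the Company and Contact expressions only; the sample row, the space used as the concatenation separator, and the translate function are assumptions.

```python
from typing import Callable

Row = dict[str, str]

# Each target element of Customer is computed by an expression over a Cust row.
MAPPING: dict[str, Callable[[Row], str]] = {
    "Company": lambda r: r["CName"],
    # Concatenate(FirstName, LastName); the separator is an assumption.
    "Contact": lambda r: f'{r["FirstName"]} {r["LastName"]}',
}


def translate(row: Row) -> Row:
    """Apply every mapping expression to one source row."""
    return {target: expr(row) for target, expr in MAPPING.items()}


customer = translate({"CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"})
```

The Contact expression is an n:1 mapping element (two source elements feed one target element), while Company is a plain 1:1 correspondence.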
Example
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. schema: matching approaches can consider instance data or schema-level information
• Element vs. structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary information: the matcher relies on information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level
        Linguistic (sample approaches: Name Similarity, Description Similarity, Global Namespaces)
        Constraint (sample approaches: Type Similarity, Key Properties)
      Structure Level
        Constraint (sample approach: Graph Matching)
    Instance/Contents
      Element Level
        Linguistic (sample approach: Word Frequency)
        Constraint (sample approach: Value Pattern and Ranges)
  Combining Matchers
    Hybrid Matchers
    Composite Matchers
      Manual Composition
      Automatic Composition
Further criteria: Match Cardinality, Auxiliary Information Used, ...
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema-Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple match candidates and estimate a degree of similarity for each
  1. Granularity of match (element level vs. structure level)
  2. Match cardinality
  3. Linguistic approaches: name or description matching
  4. Constraint-based approaches
  5. Reusing schema and matching information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-Level
• Element-level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level
• Structure-level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full structure match:
    S1 Elements    S2 Elements
    Address        CustAddress
      Street         Street
      City           City
      State          USState
      Zip            PostalCode
• Partial structure match:
    S1 Elements    S2 Elements
    AccountOwner   Customer
      Name           Cname
      Address        CAddress
      Birthdate      CPhone
      TaxExempt
• Equivalence patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
  1:1, element level
    S1 element(s): Price
    S2 element(s): Amount
    Matching expression: Amount = Price
  n:1, element level
    S1 element(s): Price, Tax
    S2 element(s): Cost
    Matching expression: Cost = Price*(1+Tax/100)
  1:n, element level
    S1 element(s): Name
    S2 element(s): FirstName, LastName
    Matching expression: FirstName, LastName = Name
  n:m, element level (also n:1, structure level)
    S1 element(s): B.Title, B.PuNo, P.PuNo, P.Name
    S2 element(s): A.Book, A.Publisher
    Matching expression: A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex Matches
• The number of possible 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching exist
  • Some hard-code complex matches into rules
  • Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules

  <daml:Class rdf:ID="WindEvent">
    <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
    <rdfs:label>Wind event</rdfs:label>
    <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
  </daml:Class>
  <daml:Property rdf:ID="windDirection">
    <rdfs:label>Wind direction</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:label>Wind speed</rdfs:label>
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>

[Figure: SchemaGraph representation of part of the ontology. Nodes: WindEvent, windDirection, windSpeed, Speed. Edge label: hasProperty.]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
Mapping
• Measures of the match score:
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a mapping set
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
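One way to picture the selection step is a greedy pass over the match scores. The sketch below is not the getBestMapping function from MWSAF. The function name best_mapping, the threshold value, and the greedy rule (keep the highest-scoring concept per WSDL element) are all assumptions made for illustration.

```python
def best_mapping(scores, threshold=0.5):
    """Pick, for each WSDL element, the ontology concept with the highest score.

    scores maps (wsdl_element, ontology_concept) to a match score in [0, 1].
    Pairs scoring below the (assumed) threshold are ignored.
    """
    best = {}
    for (element, concept), score in scores.items():
        if score < threshold:
            continue
        if element not in best or score > best[element][1]:
            best[element] = (concept, score)
    return best


example_scores = {
    ("windSpeed", "WindSpeed"): 0.9,
    ("windSpeed", "Speed"): 0.6,
    ("zip", "WindEvent"): 0.1,
}
print(best_mapping(example_scores))
```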
MWSAF Matching Techniques
ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
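The q-gram idea can be sketched in a few lines. The slide does not say how MWSAF normalizes the count of shared q-grams, so this example assumes the Dice coefficient over sets of 3-grams, which returns a value between 0 and 1.

```python
def qgrams(name, q=3):
    """Return the set of lowercase q-grams in a name."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over q-gram sets (an assumed normalization)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(ngram_similarity("windSpeed", "wind_speed"))
```

Names that share most of their characters score close to 1, while unrelated names score 0.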
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
- Average Concept Match: tells the user the degree of similarity between the matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework.
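The two measures can be sketched as simple averages. The slide does not give the formulas, so the normalizations here are assumptions: the concept average divides by the number of mapped concepts, and the service average divides by the total number of concepts in the WSDL. The helper names are hypothetical.

```python
def average_concept_match(mapping):
    """Mean score over the concepts that were actually mapped."""
    if not mapping:
        return 0.0
    return sum(mapping.values()) / len(mapping)


def average_service_match(mapping, total_wsdl_concepts):
    """Total score spread over every concept in the WSDL, mapped or not."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(mapping.values()) / total_wsdl_concepts


def categorize(mappings_by_ontology, total_wsdl_concepts):
    """Pick the ontology (domain) with the highest average service match."""
    return max(
        mappings_by_ontology,
        key=lambda name: average_service_match(
            mappings_by_ontology[name], total_wsdl_concepts
        ),
    )


weather = {"windSpeed": 1.0, "windDirection": 0.5}
finance = {"windSpeed": 0.25}
print(categorize({"weather": weather, "finance": finance}, 4))
```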
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and the descriptions of the existing approaches for matching:
- Schema vs. instance-level
- Element vs. structure-level
- Language and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004pdf
• Vassilis C. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Query Processing
Problem: The terms used in the user's query may be different from those in the database.
Solution: Matching is used to map the user-specified concepts in the query to schema elements.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Need for Data Integration on the Semantic Web
• Problem: Web documents are not in RDF or any form suitable for the Semantic Web
• We must annotate them with concepts from ontologies
• Solution: Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web Services
• Problem: Web services are currently searched for using keywords
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
• WSDLs are in XML; ontologies are in OWL
• Solution: Use schema matching approaches to map between the two different schemas
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements in S2
• Mapping Expression: tells how the S1 and S2 elements are related
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company

  S1 Elements      S2 Elements
  Cust             Customer
    C#               CustID
    CName            Company
    FirstName        Contact
    LastName         Phone

Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
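To make the mapping expressions concrete, the sketch below applies them to a single record. It assumes S1 records are plain dictionaries, that the S1 key attribute is named C# (an assumption about the original notation), and that Concatenate joins the two names with a space. The function name is hypothetical.

```python
def map_cust_to_customer(cust):
    """Translate one S1 Cust record into an S2 Customer record."""
    return {
        "CustID": cust["C#"],                                   # Cust.C# = Customer.CustID
        "Contact": f'{cust["FirstName"]} {cust["LastName"]}',   # Concatenate(...)
        "Company": cust["CName"],                               # Cust.CName = Customer.Company
    }


record = {"C#": 17, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
print(map_cust_to_customer(record))
```

Customer.Phone has no counterpart in S1, so the mapping leaves it unset.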
Example
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: a match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on other information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Classification of Schema Matching Approaches
Schema Matching Approaches
- Individual Matchers
  - Schema-only
    - Element Level: Linguistic, Constraint
    - Structure Level: Constraint
  - Instance/Contents
    - Element Level: Linguistic, Constraint
- Combining Matchers
  - Hybrid Matchers
  - Composite Matchers: Manual Composition, Automatic Composition
Further criteria: match cardinality, auxiliary information used, ...
Sample approaches: name similarity, description similarity, global namespaces, word frequency, group matching, type similarity, key properties, value pattern and ranges
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match

  S1 Elements      S2 Elements
  Address          CustAddress
    Street           Street
    City             City
    State            USState
    Zip              PostalCode

• Partial Structure Match

  S1 Elements      S2 Elements
  AccountOwner     Customer
    Name             Cname
    Address          CAddress
    Birthdate        CPhone
    TaxExempt

• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:

  Local match cardinality        S1 element(s)       S2 element(s)         Matching expression
  1:1, element level             Price               Amount                Amount = Price
  n:1, element level             Price, Tax          Cost                  Cost = Price*(1+Tax/100)
  1:n, element level             Name                FirstName, LastName   FirstName, LastName = Name
  n:m, element level             B.Title, B.PuNo,    A.Book, A.Publisher   A.Book, A.Publisher = Select B.Title, P.Name
  (also n:1, structure level)    P.PuNo, P.Name                            From B, P Where B.PuNo = P.PuNo

Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
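Two of the matching expressions in the cardinality examples are easy to write out as code. The n:1 case follows the expression on the slide. The 1:n case needs a way to split a name, which the slide does not specify, so splitting at the first space is an assumption. Both function names are hypothetical.

```python
def cost_from_price_and_tax(price, tax_percent):
    """n:1 match: Cost = Price * (1 + Tax / 100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n match: derive (FirstName, LastName) from Name.

    Splitting at the first space is an assumption made for this sketch.
    """
    first, _, last = name.partition(" ")
    return first, last


print(cost_from_price_and_tax(100, 25))
print(split_name("Ada Lovelace"))
```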
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching have been done
- Some hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Linguistic Approaches
• Language-based matchers use names and text (i.e. words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Linguistic Approaches
Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, or pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
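Several of these similarity criteria can be chained into one scoring function. The sketch below is illustrative only: the synonym and user-provided tables are tiny hypothetical stand-ins for a thesaurus, stemming and hypernyms are left out, and common-substring similarity is approximated with difflib.SequenceMatcher from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"cost", "price"}), frozenset({"car", "automobile"})}
USER_MATCHES = {frozenset({"shipto", "ship2"})}


def normalize(name):
    """Lowercase a name and drop separators such as '_' or '-'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(a, b):
    """Return a score in [0, 1] using equality, lookups, then substrings."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return 1.0                       # equality of (normalized) names
    pair = frozenset({na, nb})
    if pair in USER_MATCHES or pair in SYNONYMS:
        return 1.0                       # user-provided match or synonym
    return SequenceMatcher(None, na, nb).ratio()   # shared-substring ratio


print(name_similarity("ShipTo", "Ship2"))
print(name_similarity("Zip", "PostalCode"))
```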
Linguistic Approaches
Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
  S1: empn (employee name)
  S2: name (name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
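The simple end of that range, keyword extraction, can be sketched as follows. The stop-word list and the use of Jaccard overlap are assumptions chosen for this example, not something the survey prescribes.

```python
STOPWORDS = {"of", "the", "a", "an", "for", "to"}  # assumed minimal list


def keywords(description):
    """Extract lowercase keywords, dropping stop words."""
    return {w for w in description.lower().split() if w not in STOPWORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the two keyword sets, in [0, 1]."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


print(description_similarity("employee name", "name of employee"))
```

With the slide's example, both comments reduce to the keywords employee and name, so empn and name are proposed as a match even though the element names differ.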
Constraint Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  e.g. in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Schema Mapping Reuse
• Example:
  Schema S1: Purchase-order (Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone)
  Schema S2: Purchase-order (Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address))
  Schema S: POrder (Article, Payee, BillAddress, Recipient, ShipAddress)
• Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g. Salary and Income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g. ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
- element names, data types, structures, and subelements
- e.g. two elements match if they have the same name and the same number of subelements
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
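The example rule on this slide translates almost directly into code. The Element class below is a hypothetical minimal schema-element type introduced only for this sketch, and comparing names case-insensitively is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelements."""
    name: str
    children: list = field(default_factory=list)


def rule_match(e1, e2):
    """Rule: same name (ignoring case) and same number of subelements."""
    return (
        e1.name.lower() == e2.name.lower()
        and len(e1.children) == len(e2.children)
    )


address1 = Element("Address", [Element("Street"), Element("City")])
address2 = Element("address", [Element("Street"), Element("Town")])
address3 = Element("Address", [Element("Street")])
print(rule_match(address1, address2), rule_match(address1, address3))
```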
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
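A composite matcher can be pictured as a function that runs several independent matchers and merges their scores. The sketch below assumes a weighted average as the combination rule, which is only one of many possibilities, and uses two toy matchers invented for the example.

```python
import os


def exact_name(a, b):
    """Toy matcher 1: score 1.0 only when names are equal ignoring case."""
    return 1.0 if a.lower() == b.lower() else 0.0


def prefix_overlap(a, b):
    """Toy matcher 2: length of the shared prefix over the longer name."""
    a, b = a.lower(), b.lower()
    common = len(os.path.commonprefix([a, b]))
    return common / max(len(a), len(b), 1)


def composite_score(a, b, matchers):
    """Combine independently computed scores with a weighted average.

    matchers is a list of (matcher_function, weight) pairs.
    """
    total_weight = sum(weight for _, weight in matchers)
    if total_weight == 0:
        return 0.0
    return sum(weight * m(a, b) for m, weight in matchers) / total_weight


MATCHERS = [(exact_name, 3), (prefix_overlap, 1)]
print(composite_score("City", "city", MATCHERS))
print(composite_score("State", "USState", MATCHERS))
```

Because each matcher runs on its own, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to composite matchers.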
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-input domain constraints on the global schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations, and the user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e. to provide a new rule)
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic element-level matching
- categorizes elements based on name, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
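The weighted mean in phase 2 and the decision in phase 3 can be written as a small sketch. This is not Cupid's code: the weight, the threshold, and the rule "accept every pair whose weighted similarity reaches the threshold" are assumptions made for illustration, and both function names are hypothetical.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def select_mappings(candidates, w_struct=0.5, threshold=0.6):
    """Keep candidate pairs whose weighted similarity reaches the threshold.

    candidates maps (s1_element, s2_element) to a (lsim, ssim) tuple.
    """
    selected = {}
    for pair, (lsim, ssim) in candidates.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            selected[pair] = wsim
    return selected


candidates = {("City", "City"): (1.0, 0.5), ("Zip", "Name"): (0.0, 0.25)}
print(select_mappings(candidates))
```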
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching.
Need for Data Integration on the Need for Data Integration on the Semantic WebSemantic Web
bull Problem Web documents are not in RDF or any form suitable for the SW
bull We must annotate them with concepts from ontologies
bull Solution Use schema matching to map between elements represented in OWL and the different schemas of web documents
Semantic Web ServicesSemantic Web Services
bull Problem Web Services are currently searched for using keywords
bull We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently
bull WSDLs are in XML Ontologies in OWL
bull Solution Use schema matching approaches to map between the two different schemas
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but the number of functions for combining attributes in a schema is unbounded.
• Only a few works have addressed complex matching:
  - Some hard-code complex matches into rules.
  - Some rely on a domain-specific ontology.
• Domain knowledge is needed to perform complex matching accurately.
• The best match isn't always the top match returned by the matcher, so human involvement is still needed.
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements.
• Name matching: match elements with similar names.
• Description matching: match comments in the schemas.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names.
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes/suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, Soundex, or pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element-level or structure-level.
• Cardinality is not limited to 1:1.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
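A rough Python sketch of a name matcher that combines three of the criteria above (equality, synonyms, common substrings). The tiny SYNONYMS table and the use of difflib's ratio as the substring measure are assumptions made for illustration; a real matcher would use a thesaurus such as WordNet and could add stemming or Soundex.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table; each entry is an unordered pair of normalized names.
SYNONYMS = {frozenset({"zip", "postalcode"})}


def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def name_similarity(name1, name2):
    """Return a similarity in [0, 1] for two element names."""
    a, b = normalize(name1), normalize(name2)
    if a == b:                            # criterion 1: equality of names
        return 1.0
    if frozenset({a, b}) in SYNONYMS:     # criterion 3: equality of synonyms
        return 1.0
    # criterion 5 (approximation): similarity based on common substrings
    return SequenceMatcher(None, a, b).ratio()


ship_score = name_similarity("ShipTo", "Ship2")
```

Here ShipTo and Ship2 share the substring "ship", so they score well above unrelated names without being treated as equal.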
Linguistic Approaches: Description Matching
• Schemas can contain natural-language comments that express the intended semantics of the schema elements.
• Example:
    S1: empn (description: employee name)
    S2: name (description: name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
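The simple end of that spectrum, keyword extraction followed by keyword overlap, might look like the Python sketch below. The stop-word list and the choice of Jaccard overlap as the score are assumptions.

```python
# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "the", "a", "an", "for", "in"}


def keywords(description):
    """Extract the set of lowercase keywords from a description."""
    return {w for w in description.lower().split() if w not in STOP_WORDS}


def description_similarity(desc1, desc2):
    """Jaccard overlap of the two keyword sets, in [0, 1]."""
    k1, k2 = keywords(desc1), keywords(desc2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


score = description_similarity("employee name", "name of employee")
```

For the empn/name example, both descriptions reduce to the same two keywords, so the elements match even though their names differ.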
Constraint-Based
• Schemas often contain constraints that define data types, value ranges, optionality, relationship types, cardinalities, etc.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
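One way such constraints could feed a matcher is to score how many of them two elements share. The Python sketch below is an assumption-laden illustration: the Column fields and the equal weighting of the three checks are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    nullable: bool
    is_key: bool


def constraint_similarity(a, b):
    """Fraction of the compared constraints on which the two columns agree."""
    checks = [
        a.data_type == b.data_type,  # same data type
        a.nullable == b.nullable,    # same optionality
        a.is_key == b.is_key,        # both keys, or both non-keys
    ]
    return sum(checks) / len(checks)


same = constraint_similarity(
    Column("CustID", "int", False, True), Column("C", "int", False, True)
)
differ = constraint_similarity(
    Column("CustID", "int", False, True), Column("Phone", "string", True, False)
)
```

Constraint agreement alone rarely identifies a match (many columns are non-key strings), so a score like this is usually combined with a linguistic one.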
Reusing Schema and Mapping Information
• Matching can be made more effective by reusing common schema components and previously determined mappings.
• Many schemas are very similar to each other and to previously matched schemas.
  e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Schema Mapping Reuse
• Example (figure with three schemas, labeled S1, S2, and S on the slide):
  - Purchase-order: Product; BillTo (Name, Address); ShipTo (Name, Address); ContactPhone
  - Purchase-order: Product; BillTo (Name, Address); ShipTo (Name, Address); Contact (Name, Address)
  - POrder: Article; Payee; BillAddress; Recipient; ShipAddress
• Problems:
  1. Determining which part of a new schema is similar to some part of a previously matched one is itself a match problem.
  2. Similarity values may depend on the domain, e.g., Salary and Income may be identical in a payroll application but not in a tax reporting application.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
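Reuse can be sketched as composing two known mappings: if a new schema has been matched to an old one, and the old one was previously matched to a third schema, candidate matches for the new pair follow transitively. The correspondences below are assumptions made up for illustration, using element names from the example schemas.

```python
def compose(first, second):
    """Compose correspondences A->B and B->C into candidate correspondences A->C."""
    return {a: second[b] for a, b in first.items() if b in second}


# Assumed: new purchase-order schema matched to the previously seen one.
new_to_old = {
    "Product": "Product",
    "BillTo.Address": "BillTo.Address",
    "ShipTo.Address": "ShipTo.Address",
    "ContactPhone": "Contact.Phone",  # hypothetical element with no stored match
}
# Assumed: stored mapping from the previously seen schema to POrder.
old_to_porder = {
    "Product": "Article",
    "BillTo.Address": "BillAddress",
    "ShipTo.Address": "ShipAddress",
}

reused = compose(new_to_old, old_to_porder)
```

Elements without a stored counterpart (ContactPhone here) drop out and still have to be matched directly, and problem 2 above means the reused candidates should be reviewed for the new domain.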
Instance-Level Approaches
• Why?
  1. Little or no schema information is available.
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements.
  3. To match instance-level data.
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization (e.g., ranges)
  3. Auxiliary information
  4. Both rule-based and learner-based techniques are also used.
• Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
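A minimal Python sketch of constraint-based characterization of instance data: each column is summarized by the character patterns of its values, and two columns are compared by how much their pattern distributions overlap. The pattern alphabet (9 for digits, A for letters) and the overlap measure are assumptions.

```python
import re
from collections import Counter


def value_pattern(value):
    """Abstract a value into a pattern: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", pattern)


def column_profile(values):
    """Count how often each value pattern occurs in a column."""
    return Counter(value_pattern(v) for v in values)


def profile_similarity(p, q):
    """Overlap of two pattern distributions, in [0, 1]."""
    total_p, total_q = sum(p.values()), sum(q.values())
    return sum(min(p[k] / total_p, q[k] / total_q) for k in p.keys() & q.keys())


zip_a = column_profile(["30602", "30605"])
zip_b = column_profile(["10001", "94105", "60614"])
phone = column_profile(["706-555-0100"])
```

The two zip-code columns share the single pattern 99999 and overlap fully, while the phone column shares no pattern with them.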
Rule-Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
  - e.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
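The example rule translates almost directly into code. In this Python sketch the Element class and the function name are hypothetical; only the rule itself comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    subelements: list = field(default_factory=list)


def same_name_and_fanout(a, b):
    """Rule: two elements match if they have the same name and the same number of subelements."""
    return a.name == b.name and len(a.subelements) == len(b.subelements)


address1 = Element("Address", ["Street", "City"])
address2 = Element("Address", ["Road", "Town"])
address3 = Element("Address", ["Street"])
```

Such rules are cheap and need no training data, but they see only the schema, not the stored values.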
Learner-Based Solutions
• Learner-based: exploit both schema and data
• Requires a lot of training data, but can exploit data instances as well as schema information
• Combining rule-based and learner-based techniques provides an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
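As a toy illustration of a learner-based matcher, the Python sketch below trains a small Naive Bayes classifier on example values labeled with global-schema elements, then predicts the element for values from a new source. The class name, the training data, and the choice of Naive Bayes with add-one smoothing are assumptions, not a description of any particular system.

```python
import math
from collections import Counter, defaultdict


class TokenNaiveBayes:
    """Toy Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter()             # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


learner = TokenNaiveBayes()
learner.train([
    ("Athens GA", "address"), ("Seattle WA", "address"), ("Miami FL", "address"),
    ("great view", "description"), ("fantastic house", "description"),
    ("spacious great yard", "description"),
])
```

With this made-up training set, learner.predict("Atlanta GA") picks address because of the shared token "ga", which is the kind of data-level evidence a purely rule-based schema matcher cannot use.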
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy.
• More match candidates will be produced if the previous approaches are combined.
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria. Better performance.
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
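A composite matcher can be sketched as a weighted average over independently executed matchers. In the Python sketch below, the two component matchers, the dictionary layout of an element, and the weights are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """Independent matcher 1: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a, b):
    """Independent matcher 2: 1.0 if the data types are equal, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a, b, weighted_matchers):
    """Combine the results of independently executed matchers by weighted average."""
    total_weight = sum(weight for _, weight in weighted_matchers)
    return sum(weight * matcher(a, b) for matcher, weight in weighted_matchers) / total_weight


MATCHERS = [(name_matcher, 2), (type_matcher, 1)]  # hypothetical weights
same = composite_score({"name": "Price", "type": "decimal"},
                       {"name": "price", "type": "decimal"}, MATCHERS)
other = composite_score({"name": "Price", "type": "decimal"},
                        {"name": "Phone", "type": "string"}, MATCHERS)
```

Because each matcher runs on its own, matchers can be added, dropped, or reweighted without touching the others, which is the flexibility the slide attributes to the composite approach.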
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema.
• Uses a name matcher and several instance-level matchers.
• The system is trained with sample user inputs, from which it learns patterns and matching rules.
• Mostly instance-oriented, but can use schema information too.
• Also supports user-supplied domain constraints on the global schema.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies.
• User input is required:
  - The user must provide application-specific match/mismatch relations.
  - The user must approve or reject matches.
• SKAT matching is used within the ONION architecture for ontology integration.
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances.
• Schemas are transformed into labeled graphs.
• Matching is performed node by node (element-level, 1:1), starting at the top.
• Requires user intervention if no match is found (i.e., to provide a new rule).
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships in which they are involved.
• Each pair is given a match score between 0 and 1.
• The user must specify synonyms, homonyms, and inclusion properties.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the weighted mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
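The weighted mean and the final decision can be written out in a few lines. In this Python sketch the weight w_struct and the acceptance threshold are assumed values, and the function names are hypothetical; the slide gives only the idea of a weighted mean followed by a mapping decision.

```python
def weighted_similarity(linguistic, structural, w_struct=0.5):
    """Weighted mean of linguistic and structural similarity (both in [0, 1])."""
    return w_struct * structural + (1 - w_struct) * linguistic


def decide_mapping(pair_scores, threshold=0.6):
    """Keep the element pairs whose weighted similarity reaches the threshold."""
    return {pair for pair, (lsim, ssim) in pair_scores.items()
            if weighted_similarity(lsim, ssim) >= threshold}


# Made-up (linguistic, structural) similarities for two candidate pairs.
scores = {
    ("Address", "CustAddress"): (0.8, 0.9),
    ("Address", "CPhone"): (0.2, 0.1),
}
accepted = decide_mapping(scores)
```

Raising w_struct makes the decision depend more on how well the subtrees agree and less on the names.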
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema.
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation.
  - Correspondence Engine: identifies matching parts of the schemas or databases.
  - Mapping Generator: generates view definitions that map data in the source schema to data in the target schema.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs.
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
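The two steps (initial name match, then structural propagation over the graphs) can be sketched in Python as follows. This is a deliberately simplified propagation written for illustration, not the published algorithm's exact formula: the function names, the fixed iteration count, and the unweighted neighbor sum are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import product


def name_sim(a, b):
    """Initial element-level similarity from a simple name matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flood(edges1, edges2, iterations=10):
    """Propagate similarity between node pairs linked by equally labeled edges.

    Each graph is a list of (source, label, target) triples.
    """
    nodes1 = {n for s, _, t in edges1 for n in (s, t)}
    nodes2 = {n for s, _, t in edges2 for n in (s, t)}
    initial = {(a, b): name_sim(a, b) for a, b in product(nodes1, nodes2)}
    neighbors = {pair: set() for pair in initial}
    for (s1, l1, t1), (s2, l2, t2) in product(edges1, edges2):
        if l1 == l2:  # the pairs (s1, s2) and (t1, t2) reinforce each other
            neighbors[(s1, s2)].add((t1, t2))
            neighbors[(t1, t2)].add((s1, s2))
    sim = dict(initial)
    for _ in range(iterations):
        raw = {p: initial[p] + sum(sim[n] for n in neighbors[p]) for p in sim}
        top = max(raw.values()) or 1.0
        sim = {p: v / top for p, v in raw.items()}  # normalize to [0, 1]
    return sim


g1 = [("Address", "hasElement", "Zip"), ("Address", "hasElement", "City")]
g2 = [("CustAddress", "hasElement", "PostalCode"), ("CustAddress", "hasElement", "City")]
result = flood(g1, g2)
```

In this small example the pair (Address, CustAddress) gains similarity from its children, whereas (Address, PostalCode) has no structurally linked pairs and keeps only its weak name similarity.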
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches.
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Tess (Univ. of Massachusetts Amherst)
• A system that helps cope with schema evolution.
• Takes a definition of the old schema and produces a program that transforms data conforming to the old schema into data conforming to the new schema.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies.
  - It helps describe services semantically and aids efficient web service discovery and composition.
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain.
  2. The WSDL is classified into a domain.
  3. The matches are given to the user to accept or reject.
  4. Upon the user's acceptance, the annotations are written to the WSDL.
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
  1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
  2. Parser Library: consists of the parsers used to generate the SchemaGraphs.
  3. Matcher Library: provides the schema matching algorithm.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
MWSAF: SchemaGraphs
Problem: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly.
MWSAF converts both models to a common representation format called SchemaGraph.
A SchemaGraph is a set of nodes connected by edges; it is created using conversion functions.
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
    <xsd:complexType name="Direction">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="1"
                     name="compass" nillable="true"
                     type="xsd1:DirectionCompass" />
        <xsd:element maxOccurs="1" minOccurs="1"
                     name="degrees" type="xsd:int" />
      </xsd:sequence>
    </xsd:complexType>
[Figure: SchemaNode representation of XML schema. Nodes: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
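A conversion of this kind can be approximated with the Python standard library. The sketch below assumes one simple rule (each complexType becomes a node, and each element it contains becomes a node attached by a hasElement edge); the actual MWSAF conversion rules are richer. The function name and the xsd1 namespace URI are placeholders.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, wrapped in a schema element so it parses on
# its own. The xsd1 namespace URI is a placeholder.
XSD_SNIPPET = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsd1="urn:example:types">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true"
                   type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_edges(xsd_text):
    """Return (complexType name, 'hasElement', element name) triples."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((ctype.get("name"), "hasElement", elem.get("name")))
    return edges


edges = xsd_to_edges(XSD_SNIPPET)
```

The resulting triples can be fed to any graph-based matcher that works on labeled edges.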
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent" />
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="#Speed" />
    </daml:Property>
[Figure: SchemaGraph representation of part of ontology. Nodes: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
Mapping
• Measures of the match score:
  - Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
  - Schema match: structural similarity (sub-concept similarities).
• The getBestMapping function then looks at the match scores and determines a set of mappings.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
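One plausible reading of that last step is a greedy selection: for each WSDL element, keep the ontology concept with the highest match score, provided it clears a threshold. The Python sketch below is a guess at the idea, not the MWSAF implementation; get_best_mapping, the threshold, and the scores are all hypothetical.

```python
def get_best_mapping(scores, threshold=0.5):
    """Pick the best-scoring concept per element.

    scores maps (wsdl_element, ontology_concept) to a match score in [0, 1].
    Returns {wsdl_element: (ontology_concept, score)}.
    """
    best = {}
    for (element, concept), score in scores.items():
        current = best.get(element, (None, 0.0))[1]
        if score >= threshold and score > current:
            best[element] = (concept, score)
    return best


# Made-up match scores between WSDL elements and ontology concepts.
scores = {
    ("speed", "windSpeed"): 0.9,
    ("speed", "Speed"): 0.6,
    ("degrees", "windDirection"): 0.3,
}
mapping = get_best_mapping(scores)
```

Elements whose best score stays below the threshold (degrees here) are left unmapped for the user to resolve.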
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
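A q-gram matcher of the NGram kind can be sketched in Python as follows. The choice of q = 3 and of the Dice coefficient to turn the shared q-gram count into a value between 0 and 1 are assumptions; the slide does not give the exact formula.

```python
def qgrams(name, q=3):
    """Return the set of q-grams of a lowercased name (the whole name if shorter than q)."""
    s = name.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(name1, name2, q=3):
    """Dice coefficient over the two q-gram sets, in [0, 1]."""
    g1, g2 = qgrams(name1, q), qgrams(name2, q)
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


partial = ngram_similarity("wind", "windSpeed")
```

Here "wind" contributes the trigrams win and ind, both of which occur in "windSpeed", so the names get a partial score rather than 0 or 1.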
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology.
• Two measures are then derived from the mapping:
  - Average Concept Match: tells the user about the degree of similarity between the matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback.
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
Conclusion
• It is necessary to automate the matching process.
• Schema matching is very difficult and expensive.
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema-level vs. instance-level
  - Element-level vs. structure-level
  - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques.
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Christophides, V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Semantic Web Services
• Problem: web services are currently searched for using keywords.
• We need to annotate the WSDLs with semantic metadata so that they can be discovered efficiently.
• WSDLs are in XML; ontologies are in OWL.
• Solution: use schema matching approaches to map between the two different schemas.
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements of schema S2
• Mapping expression: tells how the S1 and S2 elements are related
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
    S1 Elements      S2 Elements
    Cust             Customer
      C#               CustID
      CName            Company
      FirstName        Contact
      LastName         Phone
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
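A mapping of this kind can be represented as a table of mapping expressions, one per target element. In the Python sketch below the record values, the helper apply_mapping, and the single-space separator used for Concatenate are assumptions; the customer-number field of Cust is keyed as "C#".

```python
# One Cust record from S1 (made-up values).
cust = {"C#": 42, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}

# Mapping: each S2 (Customer) element is paired with a mapping expression over S1.
mapping = {
    "CustID": lambda c: c["C#"],                                 # Cust.C# = Customer.CustID
    "Contact": lambda c: f'{c["FirstName"]} {c["LastName"]}',    # Concatenate(...)
    "Company": lambda c: c["CName"],                             # Cust.CName = Customer.Company
}


def apply_mapping(mapping, record):
    """Evaluate every mapping expression against one source record."""
    return {target: expression(record) for target, expression in mapping.items()}


customer = apply_mapping(mapping, cust)
```

Customer.Phone has no mapping element here, so it simply does not appear in the translated record.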
Example
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
Classification of Schema Matching Approaches
• Instance vs. schema: matching approaches can consider instance data or schema-level information.
• Element vs. structure matching: a match can be performed for individual schema elements or for combinations of elements.
• Language vs. constraint: linguistic (names) or constraint-based (keys and relationships).
• Matching cardinality: a match result may relate one or more elements of one schema to one or more elements of another.
• Auxiliary information: the matcher relies on information besides the input schemas, such as dictionaries, user input, and global schemas.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Classification of Schema Matching Approaches
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level
        Linguistic (sample approaches: name similarity, description similarity, global namespaces)
        Constraint (sample approaches: type similarity, key properties)
      Structure Level
        Constraint (sample approach: graph matching)
    Instance/Contents
      Element Level
        Linguistic (sample approach: word frequency)
        Constraint (sample approach: value patterns and ranges)
  Combining Matchers
    Hybrid Matchers
    Composite Matchers
      Manual Composition
      Automatic Composition
Further criteria: match cardinality, auxiliary information used, ...
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Schema-Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure.
• Often produce multiple candidates and estimate a degree of similarity for each.
1. Granularity of match (element level vs. structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Schema Graphs
PROBLEM: The difference in expressiveness of XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find the mappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema. Node labels: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
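A conversion of the kind shown on the slide above might look like the following minimal sketch. It is not MWSAF's actual parser: the function name, the choice of the 2001 XML Schema namespace, and the placeholder `urn:example:weather` namespace are assumptions for illustration. Each named complex type becomes a node, and each element nested inside it becomes a child node linked by a hasElement edge.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"  # assumed schema namespace

def xsd_to_schema_graph(xsd_text):
    """Return (nodes, edges); edges are (parent, 'hasElement', child) triples."""
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(XSD + "complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        # Every element declared inside the complex type becomes a child node.
        for elem in ctype.iter(XSD + "element"):
            child = elem.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges

SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

nodes, edges = xsd_to_schema_graph(SAMPLE)
```

The sketch ignores element types and anonymous complex types; a fuller conversion would also record the type of each element as a node or edge attribute.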
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: SchemaGraph representation of part of ontology. Node labels: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on names. Uses WordNet to check for synonyms. Abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
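The slide above names getBestMapping but does not show how it selects the map set. One plausible reading, sketched here under that assumption, is a greedy choice: for each WSDL concept, keep the ontology concept with the highest match score, provided the score clears a threshold. The Python name `get_best_mapping`, the threshold value, and the sample scores are hypothetical.

```python
def get_best_mapping(scores, threshold=0.5):
    """scores: {(wsdl_concept, ontology_concept): match score in [0, 1]}.

    Returns {wsdl_concept: (best ontology_concept, score)}.
    """
    best = {}
    for (wsdl_c, onto_c), ms in scores.items():
        current_best = best.get(wsdl_c, (None, 0.0))[1]
        if ms >= threshold and ms > current_best:
            best[wsdl_c] = (onto_c, ms)
    return best

sample_scores = {
    ("Direction", "windDirection"): 0.8,
    ("Direction", "windSpeed"): 0.3,
    ("degrees", "Speed"): 0.2,
}
best_mapping = get_best_mapping(sample_scores)
```

Concepts whose best score stays below the threshold are left unmapped, which is why the user still reviews the proposed matches.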
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter Stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
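Two of the ElemMatch ingredients listed above can be illustrated with a short sketch. The q-gram score below uses the Dice coefficient, the abbreviation dictionary is a made-up example, and taking the maximum of the two scores stands in for the combining equation, which the slide does not give. None of this is MWSAF's actual code.

```python
def qgrams(name, q=3):
    """Set of q-character substrings of a lowercased name."""
    s = name.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)} or {s}

def ngram_score(a, b, q=3):
    """Dice coefficient over shared q-grams, in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))

ABBREVIATIONS = {"temp": "temperature", "dir": "direction"}  # hypothetical entries

def abbreviation_score(a, b):
    """1.0 if one name is a known abbreviation of the other, else 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if ABBREVIATIONS.get(a) == b or ABBREVIATIONS.get(b) == a else 0.0

def elem_match(a, b):
    # Assumed combination rule: take the strongest evidence.
    return max(ngram_score(a, b), abbreviation_score(a, b))
```

A synonym check against WordNet and a stemming token matcher would slot in as further functions with the same 0-to-1 contract.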
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
Two measures are then derived from each mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
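The two measures above can be sketched as simple averages. The definitions used here are an assumption, since the slide does not spell them out: the average concept match divides the summed match scores by the number of mapped concepts, while the average service match divides the same sum by the number of all concepts in the WSDL. The function names and the sample data are hypothetical.

```python
def average_concept_match(mapping):
    """Mean match score over the WSDL concepts that were actually mapped."""
    if not mapping:
        return 0.0
    return sum(ms for _, ms in mapping.values()) / len(mapping)

def average_service_match(mapping, wsdl_concepts):
    """Summed match scores divided by the number of all WSDL concepts."""
    if not wsdl_concepts:
        return 0.0
    return sum(ms for _, ms in mapping.values()) / len(wsdl_concepts)

def categorize(service_match_by_domain):
    """Pick the domain whose ontology gave the highest average service match."""
    return max(service_match_by_domain, key=service_match_by_domain.get)

mapping = {"Direction": ("windDirection", 0.8), "speed": ("windSpeed", 0.6)}
concepts = ["Direction", "speed", "degrees", "station"]
```

Under these definitions a service that maps only a few of its concepts, even with high scores, gets a low service match for that domain.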
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues
• If we require user acceptance for our matches, then what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and the descriptions of the existing approaches for matching
- Schema vs. Instance-level
- Element vs. Structure-level
- Language and Constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Vassilis, C. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Term Definitions
• Schema: a set of elements connected by some structure
• Mapping: a set of mapping elements, each of which indicates that certain elements of schema S1 are mapped to certain elements in S2
• Mapping Expression: tells how the S1 and S2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
Example
A mapping between S1 and S2 might contain these elements:
• Cust.C# = Customer.CustID
• Concatenate(Cust.FirstName, Cust.LastName) = Customer.Contact
• Cust.CName = Customer.Company
S1 Elements      S2 Elements
Cust             Customer
  C#               CustID
  CName            Company
  FirstName        Contact
  LastName         Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
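The three mapping elements in the example above can be written down as data plus small expressions. This is only an illustrative sketch: the key "C#" is an assumed spelling of the customer-number attribute, the space used by the concatenation is an assumption, and the names `MAPPING` and `apply_mapping` are hypothetical.

```python
# Each mapping element pairs an S2 attribute with an expression over an S1 record.
MAPPING = [
    ("CustID",  lambda c: c["C#"]),
    ("Contact", lambda c: f'{c["FirstName"]} {c["LastName"]}'),  # Concatenate(...)
    ("Company", lambda c: c["CName"]),
]

def apply_mapping(mapping, record):
    """Translate one S1 record (a dict) into the S2 attributes the mapping covers."""
    return {target: expression(record) for target, expression in mapping}

cust = {"C#": 7, "CName": "Acme", "FirstName": "Ada", "LastName": "Lovelace"}
customer = apply_mapping(MAPPING, cust)
```

Note that S2's Phone attribute has no mapping element, so it does not appear in the translated record.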
Example
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: match can be performed for individual schema elements or combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: matcher relies on other information besides the input schemas, such as dictionaries, user input, global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level
        Linguistic: Name Similarity, Description Similarity, Global Namespaces
        Constraint: Type Similarity, Key Properties
      Structure Level
        Constraint: Graph Matching
    Instance/Contents
      Element Level
        Linguistic: Word Frequency
        Constraint: Value Pattern and Ranges
  Combining Matchers
    Hybrid Matchers
    Composite Matchers
      Manual Composition
      Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, …
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match Cardinality
3. Linguistic Approaches: Name or Description Matching
4. Constraint-Based Approaches
5. Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Equivalence Patterns: can enhance structure matching by considering known equivalence patterns stored in a library
• Full Structure Match
S1 Elements      S2 Elements
Address          CustAddress
  Street           Street
  City             City
  State            USState
  Zip              PostalCode
• Partial Structure Match
S1 Elements      S2 Elements
AccountOwner     Customer
  Name             Cname
  Address          CAddress
  Birthdate        CPhone
  TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinality | S1 Element(s) | S2 Element(s) | Matching Expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price*(1+Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
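The matching expressions in the cardinality table above can be read as small functions. The sketch below is illustrative only: the function names are made up, records are plain dicts, and splitting a name at the first space is an assumed way to derive FirstName and LastName.

```python
def cost(price, tax_percent):
    """n:1 element level: Price, Tax -> Cost."""
    return price * (1 + tax_percent / 100)

def split_name(name):
    """1:n element level: Name -> FirstName, LastName (split at the first space)."""
    first, _, last = name.partition(" ")
    return first, last

def book_publisher(books, publishers):
    """n:m element level: join B and P on PuNo, yielding (Book, Publisher) pairs."""
    names = {p["PuNo"]: p["Name"] for p in publishers}
    return [(b["Title"], names[b["PuNo"]]) for b in books if b["PuNo"] in names]

B = [{"Title": "Schema Matching", "PuNo": 1}, {"Title": "Orphan", "PuNo": 9}]
P = [{"PuNo": 1, "Name": "Example Press"}]
A = book_publisher(B, P)
```

The last function mirrors the Select ... From B, P Where B.PuNo = P.PuNo expression: books without a matching publisher number drop out of the result.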
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Little work has been done on complex matching
- Some approaches hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to accurately perform complex matching
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (SUV is a type of car)
5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
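A few of the similarity definitions listed above can be combined into one tiered name matcher. This is a rough sketch under stated assumptions: the synonym and hypernym tables are tiny hand-made stand-ins for a thesaurus, the 0.8 score for hypernyms is an arbitrary choice, and the common-substring tier uses `difflib.SequenceMatcher` from the Python standard library. Stemming, soundex, and pronunciation are left out.

```python
from difflib import SequenceMatcher

SYNONYMS = {frozenset({"car", "automobile"})}  # hypothetical thesaurus entries
HYPERNYMS = {"suv": "car"}                     # hypothetical: SUV is a type of car

def name_similarity(a, b):
    """Score two element names in [0, 1], trying the cheapest rules first."""
    a, b = a.lower(), b.lower()
    if a == b:                                  # 1. equality of names
        return 1.0
    if frozenset({a, b}) in SYNONYMS:           # 3. equality of synonyms
        return 1.0
    if HYPERNYMS.get(a) == b or HYPERNYMS.get(b) == a:
        return 0.8                              # 4. hypernym, assumed weight
    return SequenceMatcher(None, a, b).ratio()  # 5. common substrings
```

With this sketch, ShipTo and Ship2 score well above unrelated names because they share the substring "ship", although a pronunciation-aware matcher would be needed to treat them as equal.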
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn // employee name
S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint-Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example: schemas S1, S, and S2, as shown on the slide
Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
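One way previously determined mappings can be reused is by composing them: if S1 has already been matched to S, and S to S2, a candidate mapping from S1 to S2 follows transitively. The sketch below illustrates that idea only; the function name, the dictionary layout, the sample similarity values, and the choice to multiply similarities are all assumptions.

```python
def compose(match_ab, match_bc):
    """Compose two mappings of the form {element: (matched element, similarity)}."""
    composed = {}
    for a, (b, sim_ab) in match_ab.items():
        if b in match_bc:
            c, sim_bc = match_bc[b]
            # Assumed rule: confidence shrinks along the chain.
            composed[a] = (c, sim_ab * sim_bc)
    return composed

s1_to_s = {"BillTo": ("BillTo", 1.0), "Product": ("Product", 1.0), "Contact": ("ContactPhone", 0.5)}
s_to_s2 = {"BillTo": ("Payee", 0.5), "Product": ("Article", 0.5)}
s1_to_s2 = compose(s1_to_s, s_to_s2)
```

As the second problem on the slide warns, a composed match is only a candidate: a similarity that held in one domain need not carry over to another.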
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be a great many possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
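A constraint-based characterization of instance data, as mentioned above, can be as simple as reducing each value to a character-class pattern and comparing the pattern sets of two columns. The following is a minimal sketch of that idea; the pattern alphabet ("A" for a run of letters, "9" for a run of digits) and the use of Jaccard overlap are assumptions chosen for the example.

```python
import re

def value_pattern(value):
    """Abstract a value into a pattern, e.g. '706-542-1234' -> '9-9-9'."""
    pattern = re.sub(r"[A-Za-z]+", "A", str(value))
    return re.sub(r"[0-9]+", "9", pattern)

def characterize(values):
    """Set of patterns observed in a column of instance values."""
    return {value_pattern(v) for v in values}

def pattern_overlap(column1, column2):
    """Jaccard overlap of the two columns' pattern sets, in [0, 1]."""
    p1, p2 = characterize(column1), characterize(column2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0
```

Two phone-number columns share the pattern 9-9-9 and score 1.0, while a phone column and a city column share nothing, which helps prune irrelevant match combinations.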
Rule-Based Solutions
• Rule-Based: hand-crafted rules to exploit schema information
- element names, data types, structures, and subelements
- e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
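The example rule on the slide above translates almost directly into code. In this sketch an element is a dict with a name and a list of subelements; that representation, the function name, and the case-insensitive name comparison are assumptions.

```python
def same_name_and_arity(element1, element2):
    """Hand-crafted rule: same name and the same number of subelements."""
    return (element1["name"].lower() == element2["name"].lower()
            and len(element1["subelements"]) == len(element2["subelements"]))

address1 = {"name": "Address", "subelements": ["Street", "City", "State", "Zip"]}
address2 = {"name": "address", "subelements": ["Street", "City", "USState", "PostalCode"]}
contact = {"name": "Address", "subelements": ["Email"]}
```

Rules like this are cheap and need no training data, but each one encodes a single heuristic and has to be written by hand.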
Learner-Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data but can exploit data
• Rule- and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria. Better performance
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
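A composite matcher of the kind described above can be sketched as a weighted average over independently executed matchers. The weighting scheme and the two toy matchers below are assumptions for illustration; real systems may combine results in other ways, for example by taking the maximum or by learning the weights.

```python
def composite(matchers, weights):
    """Build a matcher that averages independent matchers by weight."""
    total = sum(weights)

    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

    return combined

def exact(a, b):
    """Toy matcher: 1.0 only for identical names."""
    return 1.0 if a == b else 0.0

def same_prefix(a, b):
    """Toy matcher: 1.0 if the first three characters agree."""
    return 1.0 if a[:3] == b[:3] else 0.0

match = composite([exact, same_prefix], [1, 1])
```

Because each matcher runs on its own, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite method.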
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• System is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented but can use schema information too
• Also supports user-input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations and must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• User must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic element-level matching
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs a bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
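The weighted mean from phase 2 and the decision in phase 3 can be sketched in a few lines. The weight, the acceptance threshold, the function names, and the sample similarity values are assumptions for illustration, not Cupid's actual parameters.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim

def decide(pairs, threshold=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    pairs: iterable of (s1_element, s2_element, lsim, ssim).
    """
    return [(a, b) for a, b, lsim, ssim in pairs
            if weighted_similarity(lsim, ssim) >= threshold]

candidates = [("Zip", "PostalCode", 0.5, 1.0), ("Zip", "City", 0.0, 0.5)]
accepted = decide(candidates)
```

Here Zip and PostalCode are accepted despite a modest linguistic score because their structural context agrees, which is the benefit of a hybrid matcher.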
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: used to identify matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Term DefinitionsTerm Definitionsbull Schema a set of elements connected by some
structure
bull Mapping a set of mapping elements each of which indicates that certain elements of schema s1 are mapped to certain elements in s2
bull Mapping Expression Tells how s1 and s2 elements are related
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
A mapping between s1 and s2 might contain these elementsbull CustC=CustomerCustIDbull Concatenate(CustFirstName CustLastName) = Customercontactbull CustCName = CustomerCompany
S1 Elements S2 Elements
Cust Customer
C CustID
CName Company
FirstName Contact
LastName Phone
Bernstein P Rahm E A survey of approaches to automatic schema matching
ExampleExample
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching have been done
• Some hard-code complex matches into rules
• Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
 1. Equality of names
 2. Equality of names after stemming (deals with prefixes/suffixes)
 3. Equality of synonyms
 4. Equality of hypernyms (an SUV is a type of car)
 5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
 6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
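A name matcher that walks through some of these criteria in order might look like the following sketch. Everything here is an assumption made for illustration: the score values, the tiny synonym and hypernym tables, and the crude suffix stripper that stands in for a real stemmer.

```python
# Hypothetical auxiliary information; a real system might use a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}
HYPERNYMS = {"suv": "car"}  # child -> parent ("an SUV is a type of car")


def naive_stem(word):
    # Crude stand-in for a real stemmer: strip one common suffix.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def name_similarity(a, b):
    """Return an illustrative similarity score in [0, 1]."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0                      # 1. equality of names
    if naive_stem(a) == naive_stem(b):
        return 0.9                      # 2. equality after stemming
    if frozenset({a, b}) in SYNONYMS:
        return 0.8                      # 3. synonyms
    if HYPERNYMS.get(a) == b or HYPERNYMS.get(b) == a:
        return 0.7                      # 4. hypernyms
    if a in b or b in a:
        return 0.5                      # 5. common substring
    return 0.0


print(name_similarity("Address", "CustomerAddress"))
print(name_similarity("Orders", "order"))
print(name_similarity("SUV", "Car"))
```

The ordering of the checks mirrors the list above, so the strongest evidence found determines the score.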
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
 S1: empn // employee name
 S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
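The simple end of that spectrum, keyword extraction followed by an overlap measure, can be sketched as below. The stop-word list and the choice of Jaccard overlap are assumptions made for this example; the survey does not prescribe a particular measure.

```python
STOP_WORDS = {"of", "the", "a", "an", "for"}  # illustrative list only


def keywords(description):
    # Keyword extraction: lowercase, split on whitespace, drop stop words.
    return {w for w in description.lower().split() if w not in STOP_WORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the keyword sets of two element comments."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The two comments from the example: empn and name describe the same thing.
print(description_similarity("employee name", "name of employee"))
```

Although the element names `empn` and `name` share little, their comments reduce to the same keyword set, which is the point of description matching.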
Constraint Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas, e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example:
 [Figure: three schemas labeled S1, S2 and S. The schema trees shown are: Purchase-order (Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone); Purchase-order (Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)); POrder (Article, Payee, BillAddress, Recipient, ShipAddress)]
• Problems:
 1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
 2. Similarity values may depend on the domain, e.g., Salary and Income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
 1. Little or no schema information is available
 2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
 3. To match instance-level data
• How?
 1. Preferred method: linguistic characterization
 2. Constraint-based characterization, e.g., ranges
 3. Auxiliary information
 4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be very many possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-Based: hand-crafted rules to exploit schema information
• Element names, data types, structures, and subelements
• E.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
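The example rule on this slide is simple enough to write down directly. In the sketch below, schema elements are represented as plain dictionaries with `name` and `children` keys; that representation, and the case-insensitive name comparison, are assumptions for illustration.

```python
def rule_same_name_and_arity(e1, e2):
    """Hand-crafted rule: same name and same number of subelements."""
    same_name = e1["name"].lower() == e2["name"].lower()
    same_arity = len(e1["children"]) == len(e2["children"])
    return same_name and same_arity


s1_address = {"name": "Address", "children": ["Street", "City", "State", "Zip"]}
s2_address = {"name": "address", "children": ["Street", "City", "USState", "PostalCode"]}
s2_short = {"name": "Address", "children": ["Street", "City"]}

print(rule_same_name_and_arity(s1_address, s2_address))
print(rule_same_name_and_arity(s1_address, s2_short))
```

Rules like this are cheap and need no training data, but each one has to be written and tuned by hand.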
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data, but can exploit data instances as well as schema information
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods:
 1. Hybrid: integrates multiple matching criteria. Better performance
 2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
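A composite matcher can be sketched as a weighted average over matchers that each run independently. The two toy matchers, the dictionary representation of elements, and the weights below are all hypothetical; the survey leaves the combination strategy open.

```python
def exact_name_matcher(e1, e2):
    # Linguistic evidence: 1.0 if the names are equal ignoring case.
    return 1.0 if e1["name"].lower() == e2["name"].lower() else 0.0


def type_matcher(e1, e2):
    # Constraint-based evidence: 1.0 if the declared data types agree.
    return 1.0 if e1["type"] == e2["type"] else 0.0


def composite_score(e1, e2, weighted_matchers):
    """Combine independently executed matchers by weighted average."""
    total_weight = sum(weight for _, weight in weighted_matchers)
    return sum(weight * matcher(e1, e2)
               for matcher, weight in weighted_matchers) / total_weight


price = {"name": "Price", "type": "decimal"}
amount = {"name": "Amount", "type": "decimal"}
matchers = [(exact_name_matcher, 1), (type_matcher, 3)]
print(composite_score(price, amount, matchers))
```

Because each matcher is a separate function, matchers can be added, removed or reweighted without touching the others, which is the flexibility the composite approach is credited with.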
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations, and the user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
 - categorizes elements based on name, data types, and domains
 - calculates a linguistic similarity coefficient
Phase 2:
 - transforms the original schema into a tree, then performs bottom-up structure matching
 - calculates a structural similarity value
 - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
 - uses the mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
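The weighted mean in Phase 2 and the decision in Phase 3 can be sketched as follows. The weight of 0.5 and the acceptance threshold of 0.6 are placeholder values chosen for this example, and the function names are hypothetical; they are not Cupid's actual parameters or API.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def decide_mapping(pair_scores, threshold=0.6):
    """Phase 3 sketch: keep the pairs whose weighted similarity is high enough.

    pair_scores maps (s1_element, s2_element) -> (lsim, ssim).
    """
    return sorted(pair for pair, (lsim, ssim) in pair_scores.items()
                  if weighted_similarity(lsim, ssim) >= threshold)


scores = {
    ("Address", "CustAddress"): (1.0, 0.5),   # weighted mean 0.75
    ("Address", "Customer"): (0.25, 0.25),    # weighted mean 0.25
}
print(decide_mapping(scores))
```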
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components:
 - Schema Readers: read a schema and translate it into an internal representation
 - Correspondence Engine: used to identify matching parts of the schemas or databases
 - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
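The idea of propagating an initial name-based match through the graph structure can be illustrated with a heavily simplified sketch. It assumes uniform propagation weights of 1 and a fixed number of iterations, whereas the published algorithm uses propagation coefficients and a convergence test; the function name and the sample graphs are hypothetical.

```python
def similarity_flooding(edges1, edges2, initial, iterations=10):
    """Simplified similarity propagation over two labeled graphs.

    edges1, edges2: lists of (source, label, target) triples.
    initial: dict mapping (node1, node2) -> initial similarity,
             e.g. produced by a name matcher.
    """
    # Node pairs are neighbors when they are joined by equally labeled edges.
    neighbors = {}
    for a, label1, a_next in edges1:
        for b, label2, b_next in edges2:
            if label1 == label2:
                neighbors.setdefault((a, b), set()).add((a_next, b_next))
                neighbors.setdefault((a_next, b_next), set()).add((a, b))

    sigma = {pair: initial.get(pair, 0.0) for pair in neighbors}
    for _ in range(iterations):
        updated = {
            pair: initial.get(pair, 0.0) + sigma[pair]
            + sum(sigma[n] for n in neighbors[pair])
            for pair in neighbors
        }
        top = max(updated.values()) or 1.0   # normalize into [0, 1]
        sigma = {pair: value / top for pair, value in updated.items()}
    return sigma


g1 = [("Order", "has", "Item"), ("Order", "has", "Customer")]
g2 = [("POrder", "has", "Article"), ("POrder", "has", "Buyer")]
result = similarity_flooding(g1, g2, {("Item", "Article"): 1.0})
print(sorted(result.items(), key=lambda kv: -kv[1]))
```

In this toy run only Item/Article has an initial similarity, yet Order/POrder ends up with a high score because similarity flows to it from the matched children.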
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
 - A tool for semi-automatically marking up web service descriptions with ontologies
 - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
 1. Individual elements of the WSDL are matched to concepts in the domain
 2. The WSDL is classified into a domain
 3. The matches are given to the user to accept or reject
 4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System
 1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
 2. Parser Library: consists of the parsers used to generate the SchemaGraphs
 3. Matcher Library: provides the schema matching algorithms
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
• It then applies a matching algorithm to find the mappings between them
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules

<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>

[Figure: SchemaNode representation of the XML schema. Node labels: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
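A conversion of this kind can be sketched with Python's standard XML parser. The sketch below covers only one assumed rule, namely that each element declared inside a complexType becomes a node attached to the type's node by a hasElement edge; the function name, the triple representation, and the wrapping xsd:schema element with its namespace declaration are assumptions added so that the fragment parses, not MWSAF's actual parser.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, wrapped in a schema element so that
# the xsd prefix is declared and the fragment is well-formed.
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees"
                   type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def complex_types_to_edges(xsd_text):
    """Return (type_name, 'hasElement', element_name) triples."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((ctype.get("name"), "hasElement", elem.get("name")))
    return edges


edges = complex_types_to_edges(SAMPLE_XSD)
print(edges)
```

Once both the XML schema and the ontology are expressed as such triples, the same graph-based matcher can be applied to either side.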
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules

<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>

[Figure: SchemaGraph representation of part of the ontology. Node labels: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
 - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked
 - Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
 - NGram: considers the number of q-grams that the names have in common
 - CheckSynonym: uses WordNet to find synonyms
 - CheckAbbreviations: uses an abbreviation dictionary
 - TokenMatcher: uses Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
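A q-gram comparison of two names can be sketched as follows. The slide does not give the formula MWSAF uses, so this example assumes trigrams (q = 3), case-insensitive comparison, and the Dice coefficient over the two q-gram sets; the function names are hypothetical.

```python
def qgrams(name, q=3):
    # Set of all substrings of length q; short names yield themselves.
    s = name.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient of the q-gram sets: a value between 0 and 1."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(ngram_similarity("windSpeed", "windSpeed"))
print(ngram_similarity("windSpeed", "WindSpeedValue"))
print(ngram_similarity("wind", "rain"))
```

Like the other ElemMatch algorithms, the function returns a value between 0 and 1, so its output could feed into a combined match score.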
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
 - Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
 - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
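One plausible reading of the two measures is sketched below. It assumes that Average Concept Match averages the match scores over the WSDL concepts that were actually mapped, while Average Service Match divides the same sum by the total number of concepts in the WSDL; these definitions and the function names are assumptions, since the slide does not state the formulas.

```python
def average_concept_match(mapping_scores):
    """Mean match score over the concepts that were mapped (assumed definition)."""
    if not mapping_scores:
        return 0.0
    return sum(mapping_scores.values()) / len(mapping_scores)


def average_service_match(mapping_scores, total_wsdl_concepts):
    """Sum of match scores over all WSDL concepts (assumed definition)."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(mapping_scores.values()) / total_wsdl_concepts


# Hypothetical mapping of a WSDL with 4 concepts against one ontology:
# only two of the four concepts found a counterpart.
scores = {"windSpeed": 1.0, "windDirection": 0.5}
print(average_concept_match(scores))
print(average_service_match(scores, total_wsdl_concepts=4))
```

Under this reading, a service whose few mapped concepts match well but whose remaining concepts are unmapped gets a high concept match and a low service match, which is why the second measure is the more useful one for picking a domain.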
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching:
 - Schema-level vs. instance-level
 - Element-level vs. structure-level
 - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework. POSV-WWW2004pdf
• Vassilis, C. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema. Nodes: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework
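The conversion above can be sketched in a few lines of Python. This is only an illustration, not the MWSAF parser: the function name xsd_to_edges, the triple format, the enclosing xsd:schema element, the XML Schema namespace URI, and the xsd1 namespace URI are all assumptions added for the example. It emits one hasElement edge from each complex type to each of its child elements.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the slide does not show the schema's declarations.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, wrapped in a hypothetical schema element.
SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_edges(xsd_text):
    """Return (parent, 'hasElement', child) triples for each complex type."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


edges = xsd_to_edges(SAMPLE)
for edge in edges:
    print(edge)
```

A fuller converter would also add nodes for the element types (for example DirectionCompass), which this sketch leaves out.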
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of ontology. Nodes: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
- Schema Match: structural similarity, based on sub-concept similarities.
• The getBestMapping function then looks at the Match Scores and determines a mapping set.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and String Matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter Stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework
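As a rough illustration of the NGram idea, the sketch below scores two names by the q-grams they share. The slides do not give MWSAF's exact formula, so the Dice coefficient, the default q=3, and the function names are assumptions made for this example.

```python
def qgrams(name, q=3):
    """Return the set of q-grams of a lower-cased name."""
    s = name.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over shared q-grams; always between 0 and 1."""
    if not a or not b:
        return 0.0
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(ngram_similarity("windSpeed", "wind_speed"))
print(ngram_similarity("windSpeed", "humidity"))
```

Like the ElemMatch algorithms, the function returns a value between 0 and 1, so it could be fed into a combined match score.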
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology.
Then two measures are derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching
- Schema vs. Instance-level
- Element vs. Structure-level
- Language and Constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Vassilis, C. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions?
Classification of Schema Matching Approaches
• Instance vs. Schema: matching approaches can consider instance data or schema-level information
• Element vs. Structure matching: a match can be performed for individual schema elements or for combinations of elements
• Language vs. Constraint: linguistic (names) or constraint-based (keys and relationships)
• Matching Cardinality: the match result may relate one or more elements of one schema to one or more elements of another
• Auxiliary Information: the matcher relies on other information besides the input schemas, such as dictionaries, user input, and global schemas
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Classification of Schema Matching Approaches
Schema Matching Approaches
  Individual Matchers
    Schema-only
      Element Level
        Linguistic (sample approaches: Name Similarity, Description Similarity, Global Namespaces)
        Constraint (sample approaches: Type Similarity, Key Properties)
      Structure Level
        Constraint (sample approach: Group Matching)
    Instance/Contents
      Element Level
        Linguistic (sample approach: Word Frequency)
        Constraint (sample approach: Value Pattern and Ranges)
  Combining Matchers
    Hybrid Matchers
    Composite Matchers
      Manual Composition
      Automatic Composition
Further Criteria: Match Cardinality, Auxiliary information used, …
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs. structure level)
2. Match Cardinality
3. Linguistic Approaches: Name or Description Matching
4. Constraint-Based Approaches
5. Reusing Schema and Matching Information
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library

Full structure match:
S1 Elements     | S2 Elements
Address         | CustAddress
  Street        |   Street
  City          |   City
  State         |   USState
  Zip           |   PostalCode

Partial structure match:
S1 Elements     | S2 Elements
AccountOwner    | Customer
  Name          |   Cname
  Address       |   CAddress
  Birthdate     |   CPhone
  TaxExempt     |

Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
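The difference between a full and a partial structure match can be sketched as follows. This is only an illustration under stated assumptions: the function structure_match and the leaf_matches dictionary (the element-level matches, taken here as given) are hypothetical, and which components correspond in the second example is assumed from the names.

```python
def structure_match(s1_children, s2_children, leaf_matches):
    """Classify a structure match from the element-level matches of its parts.

    leaf_matches maps an S1 component name to its matched S2 component name.
    """
    matched = [c for c in s1_children if leaf_matches.get(c) in s2_children]
    if len(matched) == len(s1_children):
        return "full"      # every S1 component has a partner in S2
    return "partial" if matched else "none"


# Address vs. CustAddress: all four components are matched.
address = structure_match(
    ["Street", "City", "State", "Zip"],
    ["Street", "City", "USState", "PostalCode"],
    {"Street": "Street", "City": "City", "State": "USState", "Zip": "PostalCode"},
)

# AccountOwner vs. Customer: only Name and Address have partners (assumed).
owner = structure_match(
    ["Name", "Address", "Birthdate", "TaxExempt"],
    ["Cname", "CAddress", "CPhone"],
    {"Name": "Cname", "Address": "CAddress"},
)

print(address, owner)
```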
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches

Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinality | S1 Element(s) | S2 Element(s) | Matching Expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price*(1+Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo

Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
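Two of the matching expressions in the table can be written as small functions. This is a minimal sketch: the function names are hypothetical, and splitting a name at the first space is an assumption, since the slide does not say how Name is divided into FirstName and LastName.

```python
def cost_from(price, tax_percent):
    """n:1 match: Cost = Price*(1+Tax/100)."""
    return price * (1 + tax_percent / 100)


def split_name(name):
    """1:n match: derive (FirstName, LastName) from Name.

    Splitting at the first space is an assumption made for this example.
    """
    first, _, last = name.partition(" ")
    return first, last


print(cost_from(200.0, 10))
print(split_name("Ada Lovelace"))
```

Expressions like these are what make complex matches harder than 1:1 matches: the matcher has to find the function as well as the participating elements.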
Complex Matches
• The number of possible 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching have been done
• Some hard-code complex matches into rules
• Some rely on a domain-specific ontology
• We need domain knowledge to accurately perform complex matching
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
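A few of the similarity criteria above can be combined in a short sketch. It is illustrative only: the SYNONYMS table stands in for a real thesaurus such as WordNet, the score of 0.9 for synonyms is an arbitrary assumption, and difflib's SequenceMatcher is used here as a convenient common-substring measure rather than as the method of any surveyed system.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {
    frozenset({"zip", "postalcode"}),
    frozenset({"price", "amount"}),
}


def name_similarity(a, b):
    """Score two element names between 0 and 1."""
    x, y = a.lower(), b.lower()
    if x == y:
        return 1.0                      # criterion 1: equality of names
    if frozenset({x, y}) in SYNONYMS:
        return 0.9                      # criterion 3: synonyms (assumed score)
    # criterion 5: similarity based on common substrings
    return SequenceMatcher(None, x, y).ratio()


print(name_similarity("City", "city"))
print(name_similarity("Zip", "PostalCode"))
print(name_similarity("ShipTo", "Ship2"))
```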
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn // employee name
S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
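The simple end of that spectrum, keyword extraction, can be sketched as below. The stop-word list, the Jaccard overlap measure, and the function names are assumptions chosen for this example; the survey does not prescribe a particular measure.

```python
import re

# Small illustrative stop-word list (an assumption, not from the source).
STOPWORDS = {"of", "the", "a", "an", "for"}


def keywords(description):
    """Extract lower-cased keywords from a natural-language comment."""
    words = re.findall(r"[a-z]+", description.lower())
    return {w for w in words if w not in STOPWORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the two keyword sets, between 0 and 1."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The slide's example: empn and name carry equivalent descriptions.
print(description_similarity("employee name", "name of employee"))
print(description_similarity("employee name", "department name"))
```

Here the element names empn and name look quite different, but their descriptions reduce to the same keyword set, so a description matcher can find the correspondence that a name matcher would likely miss.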
Constraint-Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example
[Figure: three schemas, labeled S1, S2, and S, with the following element lists]
Purchase-order: Product, BillTo, Name, Address, ShipTo, Name, Address, ContactPhone
Purchase-order: Product, BillTo, Name, Address, ShipTo, Name, Address, Contact, Name, Address
POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Rule-Based Solutions
• Rule-Based: hand-crafted rules to exploit schema information
• element names, data types, structures, and subelements
• e.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
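The example rule above is simple enough to state directly in code. The Element class and the rule_match function are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelements."""
    name: str
    children: list = field(default_factory=list)


def rule_match(a, b):
    """Hand-crafted rule: same name and same number of subelements."""
    return a.name == b.name and len(a.children) == len(b.children)


s1_address = Element("Address", [Element("Street"), Element("City")])
s2_address = Element("Address", [Element("Road"), Element("Town")])
s2_short = Element("Address", [Element("Line1")])

print(rule_match(s1_address, s2_address))
print(rule_match(s1_address, s2_short))
```

The rule is cheap and needs no training data, but it only uses schema information, which is the limitation the learner-based techniques address.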
Learner-Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods:
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
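A composite matcher can be sketched as a function that runs independent matchers and merges their scores. Everything specific here is an assumption for illustration: the two example matchers, the dictionary representation of an element, and the weighted average used to combine results (other combination strategies are possible).

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """Independent matcher 1: string similarity of the element names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a, b):
    """Independent matcher 2: 1.0 if the data types agree, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_score(a, b, matchers):
    """Combine independently executed matchers by weighted average.

    matchers is a list of (matcher_function, weight) pairs.
    """
    total = sum(weight for _, weight in matchers)
    return sum(weight * m(a, b) for m, weight in matchers) / total


MATCHERS = [(name_matcher, 0.5), (type_matcher, 0.5)]
price = {"name": "Price", "type": "decimal"}
same = {"name": "price", "type": "decimal"}
other = {"name": "Price", "type": "string"}

print(composite_score(price, same, MATCHERS))
print(composite_score(price, other, MATCHERS))
```

Because each matcher is a separate function, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite approach.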
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required: the user must provide application-specific match/mismatch relations, and must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: linguistic, element-level
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a structural similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the weighted mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
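The last two phases can be illustrated with a small sketch. The weight w_struct, the threshold, the sample scores, and the function names are all assumptions made for this example; the slides do not give Cupid's actual parameter values or its selection procedure.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def select_mapping(pairs, threshold=0.5, w_struct=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    pairs maps (s1_element, s2_element) to a (lsim, ssim) tuple.
    """
    return sorted(
        pair
        for pair, (lsim, ssim) in pairs.items()
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    )


# Made-up similarity values, for illustration only.
scores = {
    ("Zip", "PostalCode"): (0.8, 0.6),
    ("Zip", "City"): (0.1, 0.4),
}
mapping = select_mapping(scores)
print(mapping)
```

Raising w_struct makes the decision depend more on where an element sits in the schema tree than on its name.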
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph Matching Algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
bull Instance vs Schema matching approaches can consider instance data or schema-level information
bull Element vs Structure matching match can be performed for individual schema elements or combinations of elements
bull Language vs Constraint linguistic (names) or constraint-based (keys and relationships)
bull Matching Cardinality match result may relate one or more elements of one schema to one or more elements of another
bull Auxiliary Information matcher relies on other information besides the input schemas such as dictionaries user input global schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Classification of Schema Matching Classification of Schema Matching ApproachesApproaches
Schema Matching Approaches
Individual Matchers Combining Matchers
Schema-only
Structure LevelElement Level
InstanceContents
ConstraintLinguistic Constraint
hellip hellip hellip
Element Level
ConstraintLinguistic
hellip hellip
Hybrid Matchers Composite Matchers
Manual Composition Automatic Composition
Further Criteria -Match Cardinality -Auxiliary information usedhellip
bullName SimilaritybullDescription SimilaritybullGlobal Namespaces
bullWord Frequency
bullGroup Matching
bullType SimilaritybullKey Properties
bullValue Pattern and Ranges
Sample Approaches
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level MatchersSchema Level Matchers
bull Consider schema information instead of instance data Name Description Data Type Relationship Types Constraints Structure
bull Often produces multiple candidates and estimates a degree of similarity for each
1 Granularity of match (element level vs structure level)2 Match Cardinality3 Linguistic Approaches Name or Description Matching4 Constraint-Based Approaches5 Reusing Schema and Matching Information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-LevelElement-Level
bull Element-Level Identifies all elements of S1 that are the same or similar to elements of S2
bull The match comparison can be based on name description or data type of the element
bull Example of name-based element-level matching Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level Structure-Level bull Structure-Level Matches combinations of elements that appear together in S1
with combinations of elements that appear together in S2bull Full Structure Match
bull Partial Structure Match
bull Equivalence Patterns Can enhance structure matching by considering known equivalence patterns stored in a library
S1 Elements S2 Elements
Address CustAddress
Street Street
City City
State USState
Zip PostalCode
S1 Elements S2 Elements
AccountOwner Customer
Name Cname
Address CAddress
Birthdate CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
    1. Little or no schema information is available
    2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
    3. To match instance-level data
• How?
    1. Preferred method: linguistic characterization
    2. Constraint-based characterization, e.g. ranges
    3. Auxiliary information
    4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
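
A constraint-based characterization by value ranges could look like the sketch below. The overlap measure (shared interval length divided by total span) and the sample values are assumptions chosen for illustration.

```python
def value_range(values):
    """Characterize a column of numeric instances by its (min, max) range."""
    return (min(values), max(values))

def range_overlap(r1, r2) -> float:
    """Length of the shared interval divided by the total span of both ranges."""
    overlap = min(r1[1], r2[1]) - max(r1[0], r2[0])
    span = max(r1[1], r2[1]) - min(r1[0], r2[0])
    if span == 0:
        return 1.0  # both ranges are the same single value
    return max(0.0, overlap) / span

# Hypothetical instance data for three columns.
age = value_range([23, 35, 61])
emp_age = value_range([19, 40, 58])
salary = value_range([30000, 52000])

likely = range_overlap(age, emp_age)
unlikely = range_overlap(age, salary)
```

Range overlap is cheap to compute, which helps prune the many irrelevant match combinations before more expensive comparisons are run.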
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
    • element names, data types, structures and subelements
    • e.g. two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
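
The example rule from the slide (same name and same number of subelements) can be written directly. The `Element` class is a hypothetical minimal representation, not a structure from the cited survey.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical schema element: a name plus its child elements."""
    name: str
    subelements: list["Element"] = field(default_factory=list)

def same_name_and_arity(e1: Element, e2: Element) -> bool:
    """Rule: two elements match if their names are equal and they have
    the same number of subelements."""
    return e1.name == e2.name and len(e1.subelements) == len(e2.subelements)

a = Element("Address", [Element("Street"), Element("City")])
b = Element("Address", [Element("Street"), Element("Zip")])
c = Element("Address", [Element("Street")])
```

Here `a` and `b` satisfy the rule although their subelements differ, which shows how coarse a single hand-crafted rule can be.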
Learner Based Solutions
• Learner-based: exploit both schema and data
• Requires a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
    1. Hybrid: integrates multiple matching criteria. Better performance.
    2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P Rahm E A survey of approaches to automatic schema matching
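
A composite matcher can be sketched as a weighted average over independently executed matchers. The two component matchers and the weights below are hypothetical stand-ins; the survey does not prescribe a particular combination function.

```python
from typing import Callable

Matcher = Callable[[str, str], float]

def exact_name(a: str, b: str) -> float:
    """1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0

def shared_prefix(a: str, b: str) -> float:
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)

def composite(matchers: list[tuple[Matcher, float]]) -> Matcher:
    """Combine independent matchers into one by weighted average."""
    total = sum(weight for _, weight in matchers)

    def combined(a: str, b: str) -> float:
        return sum(weight * m(a, b) for m, weight in matchers) / total

    return combined

match = composite([(exact_name, 0.5), (shared_prefix, 0.5)])
```

Because each matcher is only a function, matchers can be added, removed or reweighted without touching the others, which is the flexibility the slide attributes to composite matchers.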
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
    • The user must provide application-specific match/mismatch relations
    • The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g. to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: Linguistic element-level matching
    - categorizes elements based on name, data types and domains
    - calculates a linguistic similarity coefficient
Phase 2:
    - transforms the original schema into a tree, then performs bottom-up structure matching
    - calculates a similarity value
    - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
    - uses the mean from Phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
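
Phases 2 and 3 can be illustrated with a small sketch. The weight, the acceptance threshold and the sample scores below are assumptions; Cupid's actual parameters and selection logic are not given on the slide.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim

def decide_mapping(pairs, threshold: float = 0.6, w_struct: float = 0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    Each entry of `pairs` is (s1_element, s2_element, lsim, ssim).
    """
    return [
        (a, b)
        for a, b, lsim, ssim in pairs
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    ]

# Hypothetical similarity values for two candidate pairs.
pairs = [("Zip", "PostalCode", 0.9, 0.7), ("State", "Street", 0.3, 0.2)]
mapping = decide_mapping(pairs)
```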
Clio (IBM Almaden and Univ of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
    • Schema Readers: read a schema and translate it into an internal representation
    • Correspondence Engine: identifies matching parts of the schemas or databases
    • Mapping Generator: generates view definitions that map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The metadata about an attribute is grouped into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
    A tool for semi-automatically marking up web service descriptions with ontologies
    It helps describe services semantically and aids efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
    1. Individual elements of the WSDL are matched to concepts in the domain
    2. The WSDL is classified into a domain
    3. The matches are given to the user to accept or reject
    4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges, created using conversion functions
MWSAF then applies a matching algorithm to find the mappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
    <xsd:complexType name="Direction">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
        <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
      </xsd:sequence>
    </xsd:complexType>
[Figure: graph with a Direction node and hasElement edges; node labels: compass, DirectionCompass, degrees]
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
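
The conversion idea can be sketched with Python's standard XML parser. This is a simplified illustration, not the MWSAF conversion rules themselves: it assumes only one rule (each element inside a complexType becomes a node joined to the type by a hasElement edge), and the function name and edge-tuple format are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="xsd1:DirectionCompass"
                   minOccurs="1" maxOccurs="1" nillable="true"/>
      <xsd:element name="degrees" type="xsd:int" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

def to_schema_graph(xsd_text: str) -> list[tuple[str, str, str]]:
    """Return (parent, 'hasElement', child) edges for every complexType."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges

edges = to_schema_graph(SCHEMA)
```

An ontology parser producing edges in the same format would let a single matching algorithm work on both inputs.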
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent" />
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="#Speed" />
    </daml:Property>
[Figure: graph with a WindEvent node and hasProperty edges; node labels: windDirection, windSpeed, Speed]
SchemaGraph representation of part of ontology
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
    - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked
    - Schema Match: structural similarity (sub-concept similarities)
• The getBestMapping function then looks at the match scores and determines a mapping set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
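
The slide does not say how getBestMapping chooses among scores. One plausible reading, shown here purely as an assumption, is to keep for each WSDL element the ontology concept with the highest match score, provided it clears a threshold. The scores and the threshold are invented for the example.

```python
def get_best_mapping(
    scores: dict[tuple[str, str], float], threshold: float = 0.5
) -> dict[str, str]:
    """For each WSDL element keep the concept with the highest score,
    ignoring scores below the threshold (hypothetical selection rule)."""
    best: dict[str, tuple[str, float]] = {}
    for (element, concept), score in scores.items():
        if score < threshold:
            continue
        if element not in best or score > best[element][1]:
            best[element] = (concept, score)
    return {element: concept for element, (concept, _) in best.items()}

# Hypothetical match scores between WSDL elements and ontology concepts.
scores = {
    ("compass", "windDirection"): 0.7,
    ("compass", "windSpeed"): 0.3,
    ("degrees", "windDirection"): 0.2,
    ("degrees", "windSpeed"): 0.4,
}
mapping = get_best_mapping(scores)
```

Elements such as `degrees` that have no score above the threshold are left unmapped for the user to resolve.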
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
    - NGram: considers the number of q-grams that the names have in common
    - CheckSynonym: uses WordNet to find synonyms
    - CheckAbbreviations: uses an abbreviation dictionary
    - TokenMatcher: uses Porter stemming, tokenization and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
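
An NGram-style measure could be sketched as below. The choice of q = 3 and of the Dice coefficient to normalize the count into the range 0 to 1 are assumptions; the slide only states that common q-grams are counted.

```python
def qgrams(name: str, q: int = 3) -> set[str]:
    """Set of overlapping substrings of length q (case-insensitive)."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the q-gram sets of two names, between 0 and 1."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

score = ngram_similarity("Address", "CustAddress")
```

"Address" has 5 trigrams, all of which occur among the 9 trigrams of "CustAddress", so the score is 10/14.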
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
  Two measures are then derived from each mapping:
    - Average Concept Match: tells the user the degree of similarity between matched concepts of the WSDL and the ontology
    - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
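
The two measures can be sketched as simple averages. The exact formulas are not given on the slide; this sketch assumes that Average Concept Match divides the summed match scores by the number of mapped concepts, and that Average Service Match divides the same sum by the total number of concepts in the WSDL.

```python
def average_concept_match(match_scores: list[float]) -> float:
    """Mean score over the concepts that were actually mapped (assumed formula)."""
    return sum(match_scores) / len(match_scores) if match_scores else 0.0

def average_service_match(match_scores: list[float], total_wsdl_concepts: int) -> float:
    """Summed score divided by all WSDL concepts, mapped or not (assumed formula)."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(match_scores) / total_wsdl_concepts

# Hypothetical mapping: 2 of the 4 WSDL concepts were mapped to one ontology.
mapped_scores = [0.9, 0.7]
concept_match = average_concept_match(mapped_scores)
service_match = average_service_match(mapped_scores, total_wsdl_concepts=4)
```

Under these assumptions, the ontology with the highest Average Service Match would be the natural choice of domain for the service.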
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
    - Schema-level vs. instance-level
    - Element-level vs. structure-level
    - Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E.: A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A.: Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Vassilis C.: Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Classification of Schema Matching Approaches
Schema Matching Approaches
    Individual Matchers
        Schema-only
            Element Level
                Linguistic: • Name Similarity • Description Similarity • Global Namespaces
                Constraint: • Type Similarity • Key Properties
            Structure Level
                Constraint: • Graph Matching
        Instance/Contents
            Element Level
                Linguistic: • Word Frequency
                Constraint: • Value Pattern and Ranges
    Combining Matchers
        Hybrid Matchers
        Composite Matchers
            Manual Composition
            Automatic Composition
Further criteria: match cardinality, auxiliary information used, ...
(Entries marked • are sample approaches.)
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
    1. Granularity of match (element level vs. structure level)
    2. Match cardinality
    3. Linguistic approaches: name or description matching
    4. Constraint-based approaches
    5. Reusing schema and matching information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-Level
• Element-level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level
• Structure-level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full structure match:
    S1 Elements     S2 Elements
    Address         CustAddress
      Street          Street
      City            City
      State           USState
      Zip             PostalCode
• Partial structure match:
    S1 Elements     S2 Elements
    AccountOwner    Customer
      Name            Cname
      Address         CAddress
      Birthdate       CPhone
      TaxExempt
• Equivalence patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Bernstein P Rahm E A survey of approaches to automatic schema matching
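
The difference between a full and a partial structure match can be sketched by counting how many components of the two structures are paired by element-level matches. The scoring rule and the function name are assumptions; the element names are those of the two tables on the slide.

```python
def structure_match(s1_children, s2_children, element_matches):
    """Return (score, kind) for two structures.

    element_matches maps an S1 component to its matching S2 component.
    score is the share of components paired, relative to the larger structure.
    """
    matched = sum(1 for c in s1_children if element_matches.get(c) in s2_children)
    total = max(len(s1_children), len(s2_children))
    score = matched / total if total else 0.0
    if total and matched == len(s1_children) == len(s2_children):
        kind = "full"
    elif matched > 0:
        kind = "partial"
    else:
        kind = "none"
    return score, kind

address = structure_match(
    ["Street", "City", "State", "Zip"],
    ["Street", "City", "USState", "PostalCode"],
    {"Street": "Street", "City": "City", "State": "USState", "Zip": "PostalCode"},
)
account = structure_match(
    ["Name", "Address", "Birthdate", "TaxExempt"],
    ["Cname", "CAddress", "CPhone"],
    {"Name": "Cname", "Address": "CAddress"},
)
```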
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
    Local match cardinality      S1 element(s)                     S2 element(s)         Matching expression
    1:1, element level           Price                             Amount                Amount = Price
    n:1, element level           Price, Tax                        Cost                  Cost = Price*(1+Tax/100)
    1:n, element level           Name                              FirstName, LastName   FirstName, LastName = Name
    n:m, element level           B.Title, B.PuNo, P.PuNo, P.Name   A.Book, A.Publisher   A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
    (also n:1, structure level)
Bernstein P Rahm E A survey of approaches to automatic schema matching
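
The n:1 and 1:n matching expressions from the table can be written as small functions. The split-on-first-space rule for names is a simplifying assumption, and the function names are made up for the example.

```python
def cost(price: float, tax_percent: float) -> float:
    """n:1 match: Cost = Price * (1 + Tax / 100)."""
    return price * (1 + tax_percent / 100)

def split_name(name: str) -> tuple[str, str]:
    """1:n match: derive FirstName and LastName from Name,
    assuming the first space separates the two parts."""
    first, _, last = name.partition(" ")
    return first, last

total = cost(200.0, 10.0)
first_name, last_name = split_name("Ada Lovelace")
```

Even these two cases need a function chosen from an unbounded space of possible combinations, which is why complex matches are much harder to discover than 1:1 matches.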
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a few works on complex matching have been done
    • Some hard-code complex matches into rules
    • Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e. words or sentences) to find semantically similar schema elements
• Name matching: match elements with similar names
• Description matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
    1. Equality of names
    2. Equality of names after stemming (deals with prefixes/suffixes)
    3. Equality of synonyms
    4. Equality of hypernyms (an SUV is a type of car)
    5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
    6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
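
Three of the similarity criteria above (equality, synonyms, common substrings) can be chained in one function. The synonym table and the 0.9 synonym score are assumptions; the substring step uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a dedicated string metric.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real matcher might consult a thesaurus instead.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"zip", "postalcode"})}

def name_similarity(a: str, b: str) -> float:
    """Score two element names: equality, then synonyms, then common substrings."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    if frozenset({a, b}) in SYNONYMS:
        return 0.9  # assumed score for a synonym hit
    return SequenceMatcher(None, a, b).ratio()

equal = name_similarity("Name", "name")
synonym = name_similarity("Zip", "PostalCode")
substring = name_similarity("ShipTo", "Ship2")
```

For "ShipTo" and "Ship2" only the common substring "ship" is found, so a soundex or pronunciation check would be needed to recognize them as fully equivalent.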
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation FrameworkMWSAF Meteor-S Web Service Annotation FrameworkOntology to SchemaGraph conversion rulesOntology to SchemaGraph conversion rules
ltdamlClass rdfID=WindEventgt ltrdfscommentgtSuperclass for all events dealing with windltrdfscommentgt ltrdfslabelgtWind eventltrdfslabelgt ltrdfssubClassOf rdfresource=WeatherEvent gt ltdamlClassgtltdamlProperty rdfID=windDirectiongt ltrdfslabelgtWind directionltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource = httpwwww3org200010XMLSchemastring gt ltdamlPropertygtltdamlProperty rdfID=windSpeedgt ltrdfslabelgtWind speedltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource=Speed gt ltdamlPropertygt
WindEvent
windDirection Speed
hasProperty windSpeed
SchemaGraph representation of part of ontologyPatil A Oundhakar S Sheth A Verma K METEOR-S Web service
Annotation Framework
MappingMapping
bull Measures of the Match Score
-Element Level Match linguistic similarity of two concepts based on names Uses WordNet to check for synonyms Abbreviations are even checked
-Schema Match structural similarity sub-concept similarities
bull The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching TechniquesMWSAF Matching TechniquesElemMatchElemMatch
bull Name and String Matching algorithms
-NGram considers the number of qgrams that the names have in common
-CheckSynonym uses Wordnet to find synonyms -CheckAbbreviations uses an abbreviation dictionary -TokenMatcher uses Porter Stemmer tonkenization and
substring matching techniques bull Each algorithm returns a value between 0 and 1 These
values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MatchingMatching
bull Once Each WSDL is compared against all of the ontologies in the store and a mapping has been created for each ontology
Then two measures are derived from the mapping
-Average Concept Match tells the user about the degree of similarity between matched concepts of the WSDL and ontology
-Average Service Match helps to categorize the service
We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Current and Future IssuesCurrent and Future Issuesbull User Interaction minimize user input but maximize impact of the
feedback
bull Real World Analysis can the current matching techniques be used in real world situations
bull P2P data management
bull Mapping Maintenance what happens when you map between two schemas and then one changes
bull Developing global schemas (or ontologies) for domains
bull Dealing with inconsistent data values for a schema elementDoan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More IssuesMore Issues
bull If we require user acceptance for our matches then what happens if our matcher returns thousands or hundreds of matches
bull Is it unrealistic to think that we will eventually perfect our matchers
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
ConclusionConclusionbull It is necessary to automate the matching process
bull Schema matching is very difficult and expensive
bull We have looked at a taxonomy and the descriptions of the existing approaches for matching
-Schema vs Instance-level
-Element vs Structure-level
-Language and Constraint based matchers
bull We also discussed several implementations of the matching techniques
References
• Bernstein P, Rahm E. A survey of approaches to automatic schema matching. wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004pdf
• Christophides V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Schema Level Matchers
• Consider schema information instead of instance data: name, description, data type, relationship types, constraints, structure
• Often produce multiple candidates and estimate a degree of similarity for each
1. Granularity of match (element level vs structure level)
2. Match cardinality
3. Linguistic approaches: name or description matching
4. Constraint-based approaches
5. Reusing schema and matching information
Bernstein P Rahm E A survey of approaches to automatic schema matching
Element-Level
• Element-Level: identifies all elements of S1 that are the same as or similar to elements of S2
• The match comparison can be based on the name, description, or data type of the element
• Example of name-based element-level matching: Address = CustomerAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Example (full structure match):
S1 Elements    S2 Elements
Address        CustAddress
Street         Street
City           City
State          USState
Zip            PostalCode
Example (partial structure match):
S1 Elements    S2 Elements
AccountOwner   Customer
Name           Cname
Address        CAddress
Birthdate      CPhone
TaxExempt
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local Match Cardinalities      S1 Element(s)                      S2 Element(s)         Matching Expression
1:1, element level             Price                              Amount                Amount = Price
n:1, element level             Price, Tax                         Cost                  Cost = Price*(1+Tax/100)
1:n, element level             Name                               FirstName, LastName   FirstName, LastName = Name
n:m, element level             B.Title, B.PuNo, P.PuNo, P.Name    A.Book, A.Publisher   A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
(also n:1, structure level)
Bernstein P Rahm E A survey of approaches to automatic schema matching
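The four matching expressions in the table can be read as small transformation functions. The sketch below is illustrative only: the dictionary-based rows and the function names are assumptions, and splitting Name at the first space is a simplification chosen for the example.

```python
def match_1_to_1(s1_row):
    """1:1, element level: Amount = Price."""
    return {"Amount": s1_row["Price"]}


def match_n_to_1(s1_row):
    """n:1, element level: Cost = Price*(1+Tax/100)."""
    return {"Cost": s1_row["Price"] * (1 + s1_row["Tax"] / 100)}


def match_1_to_n(s1_row):
    """1:n, element level: FirstName, LastName derived from Name (split at first space)."""
    first, _, last = s1_row["Name"].partition(" ")
    return {"FirstName": first, "LastName": last}


def match_n_to_m(books, publishers):
    """n:m: join B and P on PuNo, like the Select ... Where B.PuNo = P.PuNo expression."""
    names = {p["PuNo"]: p["Name"] for p in publishers}
    return [
        {"Book": b["Title"], "Publisher": names[b["PuNo"]]}
        for b in books
        if b["PuNo"] in names
    ]


# Hypothetical sample rows.
joined = match_n_to_m(
    [{"Title": "Schema Matching", "PuNo": 7}, {"Title": "Orphan", "PuNo": 9}],
    [{"PuNo": 7, "Name": "Example Press"}],
)
```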
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Little work has been done on complex matching:
- Some systems hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element-level or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
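Several of the similarity criteria above can be chained into one name matcher. This is a minimal sketch, not a method from the survey: the synonym table is hypothetical, the suffix stripping is a crude stand-in for a real stemmer, the scores 0.9 and 0.8 are arbitrary, and difflib.SequenceMatcher from the Python standard library is swapped in for the common-substring criterion.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"zip", "postalcode"}), frozenset({"cost", "price"})}


def crude_stem(name):
    """Strip a few common suffixes (a stand-in for real stemming)."""
    for suffix in ("ing", "es", "s"):
        if name.endswith(suffix) and len(name) > len(suffix) + 2:
            return name[: -len(suffix)]
    return name


def name_similarity(a, b):
    """Return a score in [0, 1], trying the cheapest criterion first."""
    a, b = a.lower(), b.lower()
    if a == b:                                # 1. equality of names
        return 1.0
    if crude_stem(a) == crude_stem(b):        # 2. equality after stemming
        return 0.9
    if frozenset({a, b}) in SYNONYMS:         # 3. equality of synonyms
        return 0.8
    return SequenceMatcher(None, a, b).ratio()  # 5. common substrings
```

With this ordering, ShipTo and Ship2 fall through to the substring comparison and receive a score below 1 that still reflects the shared prefix.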
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn    employee name
S2: name    name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
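The simple end of that range, keyword extraction, can be sketched as a set overlap between the two comments. The stopword list and the use of the Jaccard coefficient are assumptions made for this illustration.

```python
# Hypothetical stopword list; only words that carry no schema meaning.
STOPWORDS = {"of", "the", "a", "an", "for", "to"}


def keywords(description):
    """Lower-case the comment and drop stopwords."""
    return {word for word in description.lower().split() if word not in STOPWORDS}


def description_similarity(d1, d2):
    """Jaccard overlap of the keyword sets, in [0, 1]."""
    k1, k2 = keywords(d1), keywords(d2)
    if not k1 or not k2:
        return 0.0
    return len(k1 & k2) / len(k1 | k2)


# The slide's example: empn and name carry the same keywords.
score = description_similarity("employee name", "name of employee")
```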
Constraint Based
• Schemas often contain constraints that define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
- e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example (figure with three schemas, labeled S1, S2, and S on the slide). Element trees shown in the figure:
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems:
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be very many possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
- element names, data types, structures, and subelements
- e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
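The example rule on this slide is simple enough to write down directly. In this sketch the Element class and the function names are hypothetical, and comparing names case-insensitively is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelement names."""
    name: str
    children: list = field(default_factory=list)


def rule_same_name_and_arity(e1, e2):
    """Hand-crafted rule: same name and same number of subelements."""
    return e1.name.lower() == e2.name.lower() and len(e1.children) == len(e2.children)


def rule_based_matches(s1, s2, rule=rule_same_name_and_arity):
    """Apply one rule to every pair of elements from the two schemas."""
    return [(a.name, b.name) for a in s1 for b in s2 if rule(a, b)]


# Hypothetical schemas: Address agrees on name and arity, Contact only on name.
s1 = [Element("Address", ["Street", "City"]), Element("Contact", ["Phone"])]
s2 = [Element("address", ["Road", "Town"]), Element("Contact", ["Phone", "Email"])]
matches = rule_based_matches(s1, s2)
```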
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Require a lot of training data, but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods:
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein P Rahm E A survey of approaches to automatic schema matching
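A composite matcher can be sketched as a function that runs each matcher independently and then merges the scores. The weighted average and the acceptance threshold used here are assumptions for illustration; the survey leaves the combination strategy open, and the two toy matchers are hypothetical.

```python
def composite_match(pairs, matchers, weights=None, threshold=0.5):
    """Combine independently executed matchers by a weighted average of their scores."""
    weights = weights or [1.0] * len(matchers)
    total = sum(weights)
    results = {}
    for a, b in pairs:
        score = sum(w * m(a, b) for m, w in zip(matchers, weights)) / total
        if score >= threshold:
            results[(a, b)] = score
    return results


def exact_name(a, b):
    """Toy matcher 1: exact, case-insensitive name equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


def shared_prefix(a, b):
    """Toy matcher 2: same first four characters."""
    return 1.0 if a.lower()[:4] == b.lower()[:4] else 0.0


candidates = [("ShipTo", "Ship2"), ("ShipTo", "BillTo"), ("City", "city")]
accepted = composite_match(candidates, [exact_name, shared_prefix])
```

Because the matchers never see each other's output, one can be added, removed, or reweighted without touching the others, which is the flexibility the slide attributes to the composite approach.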
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations, and must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element and structure-level matches
Phase 1: Linguistic, element-level
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
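The weighted mean from phase 2 and the decision in phase 3 can be sketched as below. The weight w_struct, the acceptance threshold, and the rule of keeping only the best S2 candidate per S1 element are assumptions made for this example, not details stated on the slide.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def choose_mapping(candidates, w_struct=0.5, threshold=0.5):
    """For each S1 element keep the best S2 candidate whose weighted score passes the threshold.

    candidates maps (s1_element, s2_element) -> (lsim, ssim).
    """
    best = {}
    for (e1, e2), (lsim, ssim) in candidates.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold and wsim > best.get(e1, (None, 0.0))[1]:
            best[e1] = (e2, wsim)
    return best


# Hypothetical similarity values for two candidate pairs.
mapping = choose_mapping({
    ("Zip", "PostalCode"): (0.6, 0.8),
    ("Zip", "City"): (0.2, 0.4),
})
```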
Clio (IBM Almaden and Univ of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
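The idea of passing an initial name-based match to a structural matcher can be illustrated with a heavily simplified propagation loop. This is a sketch of the general idea only, not the published algorithm: it assumes that two node pairs are neighbours when they are joined by edges with the same label in both graphs, propagates with a weight of 1 instead of computed propagation coefficients, and runs a fixed number of iterations.

```python
def similarity_flooding(edges1, edges2, initial, iterations=10):
    """Propagate similarity between node pairs of two labeled graphs.

    edges1, edges2: lists of (source, label, target) triples.
    initial: {(node1, node2): score} from an element-level name matcher.
    """
    neighbours = {}
    for a, label1, a2 in edges1:
        for b, label2, b2 in edges2:
            if label1 == label2:
                neighbours.setdefault((a, b), set()).add((a2, b2))
                neighbours.setdefault((a2, b2), set()).add((a, b))
    if not neighbours:
        return {}
    sim = {pair: initial.get(pair, 0.0) for pair in neighbours}
    for _ in range(iterations):
        updated = {
            p: initial.get(p, 0.0) + sim[p] + sum(sim[q] for q in neighbours[p])
            for p in sim
        }
        top = max(updated.values()) or 1.0
        sim = {p: value / top for p, value in updated.items()}  # normalize to [0, 1]
    return sim


# Hypothetical graphs built from the Address / CustAddress example.
g1 = [("Address", "child", "Zip"), ("Address", "child", "City")]
g2 = [("CustAddress", "child", "PostalCode"), ("CustAddress", "child", "City")]
scores = similarity_flooding(g1, g2, {("City", "City"): 1.0, ("Address", "CustAddress"): 0.5})
```

In this toy run the initial evidence for City = City raises the score of the parent pair Address = CustAddress, which in turn lifts all child pairs; structure alone cannot separate Zip = PostalCode from Zip = City, so the name matcher's input still matters.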
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
- A tool for semi-automatically marking up web service descriptions with ontologies
- It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
MWSAF then applies a matching algorithm to find the mappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema, showing the nodes Direction, compass, degrees, and DirectionCompass and the edge label hasElement]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
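A conversion of this kind can be sketched with the Python standard library. The single rule used here, that every xsd:element inside a complexType becomes a node joined to the type by a hasElement edge, is an assumption based on the figure; the full MWSAF rule set is not reproduced. The enclosing xsd:schema element and its namespace URI are also assumptions, added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"  # assumed namespace for the xsd prefix

SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="xsd1:DirectionCompass" nillable="true"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def complex_type_to_edges(xsd_text):
    """Return (type, 'hasElement', element) edges for each complexType.

    Nested complexTypes are not treated specially in this sketch.
    """
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        for elem in ctype.iter(f"{{{XSD}}}element"):
            edges.append((ctype.get("name"), "hasElement", elem.get("name")))
    return edges


edges = complex_type_to_edges(SAMPLE)
```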
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of ontology, showing the nodes WindEvent, windDirection, windSpeed, and Speed and the edge label hasProperty]
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
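The NGram measure can be sketched as follows. The slide only says that it counts shared q-grams, so the padding character, the default q = 3, and the Dice-style normalization that keeps the result between 0 and 1 are assumptions.

```python
def qgrams(name, q=3):
    """Set of q-grams of a name, padded with '#' so short names still yield grams."""
    padded = "#" * (q - 1) + name.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Share of q-grams the two names have in common (Dice coefficient, in [0, 1])."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Names taken from the ontology example; they share the 'wind' prefix.
partial = ngram_similarity("windSpeed", "windDirection")
```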
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly.
MWSAF converts both models to a common representation format called SchemaGraph.
A SchemaGraph is a set of nodes connected by edges, created using conversion functions.
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs.
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="compass" nillable="true"
                 type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: graph with nodes Direction, compass, degrees, and DirectionCompass, connected by edges labeled hasElement]
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
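As a rough illustration of the XML-to-SchemaGraph idea, the sketch below parses a complex type and emits one hasElement edge per child element. It is not the MWSAF conversion code: the function name `schema_graph_edges`, the triple format, and the `urn:example:weather` namespace bound to `xsd1` are all assumptions made for the example, and it uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The Direction type from the slide, wrapped in a schema element so that
# the namespace prefixes resolve. The xsd1 namespace URI is hypothetical.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def schema_graph_edges(xsd_text: str) -> list[tuple[str, str, str]]:
    """Return (parent, 'hasElement', child) triples for named complex types."""
    root = ET.fromstring(xsd_text.strip())
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        if parent is None:
            continue  # anonymous types are skipped in this sketch
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


for edge in schema_graph_edges(SCHEMA):
    print(edge)
```

A real converter would also need rules for simple types, attributes, and nested anonymous types; this sketch covers only the single case shown on the slide.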
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: graph with nodes WindEvent, windDirection, windSpeed, and Speed, connected by edges labeled hasProperty]
SchemaGraph representation of part of ontology
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a set of mappings
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
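The slides do not say how getBestMapping chooses among candidate pairs, so the following is only one plausible strategy: a greedy one-to-one selection that takes pairs in descending score order and drops anything below a threshold. The function name `greedy_best_mapping`, the threshold value, and the sample scores are assumptions for illustration.

```python
def greedy_best_mapping(
    scores: dict[tuple[str, str], float], threshold: float = 0.5
) -> dict[str, str]:
    """Pick a one-to-one mapping greedily from (source, target) match scores."""
    mapping: dict[str, str] = {}
    used_targets: set[str] = set()
    # Visit candidate pairs from the highest score to the lowest.
    for (src, tgt), score in sorted(
        scores.items(), key=lambda kv: kv[1], reverse=True
    ):
        if score < threshold:
            break  # every remaining pair scores lower still
        if src in mapping or tgt in used_targets:
            continue  # each element may take part in only one match
        mapping[src] = tgt
        used_targets.add(tgt)
    return mapping


# Hypothetical match scores between WSDL elements and ontology concepts.
example_scores = {
    ("compass", "windDirection"): 0.7,
    ("degrees", "windDirection"): 0.6,
    ("degrees", "windSpeed"): 0.3,
}
print(greedy_best_mapping(example_scores))
```

Greedy selection is simple but not optimal; an assignment algorithm could be swapped in where the globally best one-to-one mapping matters.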
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
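To make the NGram measure concrete, here is a minimal sketch of a q-gram similarity that returns a value between 0 and 1. The slides do not give MWSAF's exact formula, so the choice of character trigrams and the Dice coefficient is an assumption, as are the function names.

```python
def qgrams(name: str, q: int = 3) -> set[str]:
    """Return the set of character q-grams of a lowercased name."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the two q-gram sets, a score in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(ngram_similarity("windSpeed", "wind_speed"))
print(ngram_similarity("windSpeed", "Direction"))
```

Names that share most of their trigrams score close to 1, while names with no trigram in common score 0.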
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
Two measures are then derived from each mapping:
- Average Concept Match: tells the user the degree of similarity between the matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
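The slides name the two measures without defining them. The sketch below shows one plausible reading, stated as an assumption: the average concept match divides the summed match scores by the number of mapped concepts, while the average service match divides the same sum by the total number of concepts in the WSDL, so a service with many unmapped elements scores low. The function names and the sample numbers are hypothetical.

```python
def average_concept_match(match_scores: list[float]) -> float:
    """Mean score over the concepts that were actually mapped."""
    return sum(match_scores) / len(match_scores) if match_scores else 0.0


def average_service_match(match_scores: list[float], total_concepts: int) -> float:
    """Summed scores normalized by all concepts in the WSDL (assumed definition)."""
    return sum(match_scores) / total_concepts if total_concepts else 0.0


def categorize(service_match_by_ontology: dict[str, float]) -> str:
    """Pick the domain ontology with the highest average service match."""
    return max(service_match_by_ontology, key=service_match_by_ontology.get)


# Hypothetical case: 2 of 4 WSDL concepts were mapped to a weather ontology.
scores = [0.8, 0.6]
print(average_concept_match(scores))
print(average_service_match(scores, total_concepts=4))
print(categorize({"weather": 0.35, "geography": 0.10}))
```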
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens when the matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching
- Schema- vs instance-level
- Element- vs structure-level
- Language- and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P, Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Vassilis C. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Structure-Level
• Structure-Level: matches combinations of elements that appear together in S1 with combinations of elements that appear together in S2
• Full Structure Match
• Partial Structure Match
• Equivalence Patterns: structure matching can be enhanced by considering known equivalence patterns stored in a library
Full structure match example:
S1 Elements | S2 Elements
Address | CustAddress
  Street | Street
  City | City
  State | USState
  Zip | PostalCode
Partial structure match example:
S1 Elements | S2 Elements
AccountOwner | Customer
  Name | Cname
  Address | CAddress
  Birthdate | CPhone
  TaxExempt |
Bernstein P Rahm E A survey of approaches to automatic schema matching
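The difference between a full and a partial structure match can be sketched as a coverage check: given element-level matches between the children of two structures, the match is full when every child on both sides is covered and partial when only some are. The function name and the element-level pairs assumed for the AccountOwner/Customer case (Name with Cname, Address with CAddress) are illustrative assumptions, not taken from the survey.

```python
def structure_match(
    s1_children: list[str],
    s2_children: list[str],
    element_matches: set[tuple[str, str]],
) -> str:
    """Classify a structure-level match as 'full', 'partial', or 'none'."""
    # Keep only element-level matches that connect the two structures.
    pairs = {
        (a, b)
        for a, b in element_matches
        if a in s1_children and b in s2_children
    }
    if not pairs:
        return "none"
    covered1 = {a for a, _ in pairs}
    covered2 = {b for _, b in pairs}
    if covered1 == set(s1_children) and covered2 == set(s2_children):
        return "full"
    return "partial"


address = structure_match(
    ["Street", "City", "State", "Zip"],
    ["Street", "City", "USState", "PostalCode"],
    {("Street", "Street"), ("City", "City"),
     ("State", "USState"), ("Zip", "PostalCode")},
)
owner = structure_match(
    ["Name", "Address", "Birthdate", "TaxExempt"],
    ["Cname", "CAddress", "CPhone"],
    {("Name", "Cname"), ("Address", "CAddress")},
)
print(address, owner)
```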
Match Cardinality
• One or more S1 elements can match one or more S2 elements
• Complex matches
Examples of the four local cardinality cases for individual mapping elements:
Local match cardinality | S1 element(s) | S2 element(s) | Matching expression
1:1, element level | Price | Amount | Amount = Price
n:1, element level | Price, Tax | Cost | Cost = Price*(1+Tax/100)
1:n, element level | Name | FirstName, LastName | FirstName, LastName = Name
n:m, element level (also n:1, structure level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = Select B.Title, P.Name From B, P Where B.PuNo = P.PuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
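The element-level rows of the table can be read as small transformation functions from an S1 record to S2 fields. The sketch below applies the 1:1, n:1, and 1:n expressions to one record. Splitting Name at the first space is an assumption, since the table only states that FirstName and LastName are derived from Name, and the function name and sample record are hypothetical.

```python
def apply_matches(s1_row: dict) -> dict:
    """Derive S2 fields from an S1 record using the table's expressions."""
    # 1:n match: one S1 element feeds two S2 elements (assumed split rule).
    first, _, last = s1_row["Name"].partition(" ")
    return {
        "Amount": s1_row["Price"],                            # 1:1 match
        "Cost": s1_row["Price"] * (1 + s1_row["Tax"] / 100),  # n:1 match
        "FirstName": first,
        "LastName": last,
    }


row = {"Price": 100.0, "Tax": 8.0, "Name": "Ada Lovelace"}
print(apply_matches(row))
```

The n:m case in the last row needs a join across two source structures, which is why the table expresses it as a query rather than a per-record formula.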
Complex Matches
• The number of 1:1 matches is bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Little work has been done on complex matching
- Some systems hard-code complex matches into rules
- Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name Matching: match elements with similar names
• Description Matching: match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (SUV is a type of car)
5. Similarity of names based on common substrings, soundex, or pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein P Rahm E A survey of approaches to automatic schema matching
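The similarity criteria listed for name matching can be sketched as a cascade that returns the first criterion that holds. Everything here is illustrative: the tiny synonym and hypernym tables stand in for a thesaurus such as WordNet, `crude_stem` is a deliberately naive stand-in for a real stemmer, and criterion 5 (substring or soundex similarity) is left out to keep the sketch short.

```python
# Hypothetical lookup tables standing in for a real thesaurus.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"zip", "postalcode"})}
HYPERNYMS = {"suv": "car"}  # child term -> parent term


def crude_stem(word: str) -> str:
    """Strip a few common suffixes; a naive stand-in for a real stemmer."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def name_match(a: str, b: str, user_matches=frozenset()):
    """Return the first criterion under which two names match, else None."""
    x, y = a.lower(), b.lower()
    if x == y:
        return "equal"
    if crude_stem(x) == crude_stem(y):
        return "equal after stemming"
    if frozenset({x, y}) in SYNONYMS:
        return "synonym"
    if HYPERNYMS.get(x) == y or HYPERNYMS.get(y) == x:
        return "hypernym"
    if frozenset({x, y}) in user_matches:
        return "user-provided"
    return None


print(name_match("Orders", "Order"))
print(name_match("Zip", "PostalCode"))
print(name_match("SUV", "Car"))
```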
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn // employee name
S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
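At the simple end of that range, description matching can be approximated by extracting keywords from each comment and measuring their overlap. The sketch below uses a small stopword list and Jaccard similarity; both choices, and the function names, are assumptions made for illustration.

```python
STOPWORDS = {"of", "the", "a", "an", "for", "and"}


def keywords(text: str) -> set[str]:
    """Lowercase a comment and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def description_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the keyword sets, a score in [0, 1]."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# The comments for empn (S1) and name (S2) from the example above.
print(description_similarity("employee name", "name of employee"))
```

Although the element names empn and name differ, their comments share the same keywords, so the description matcher scores the pair highly.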
Constraint Based
• Schemas often contain constraints that define data types, value ranges, optionality, relationship types, cardinalities, etc.
• A constraint-based matcher compares these constraints to estimate the similarity of schema elements
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example
[Figure: three example schemas, labeled Schema S1, Schema S2, and Schema S on the slide:
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress]
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization (e.g., value ranges)
3. Auxiliary information
4. Both rule-based and learner-based techniques are also used
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
- element names, data types, structures, and subelements
- e.g., two elements match if they have the same name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
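The example rule, that two elements match if they have the same name and the same number of subelements, translates directly into code. The `Element` class and the case-insensitive name comparison are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelements."""
    name: str
    children: list["Element"] = field(default_factory=list)


def rule_match(a: Element, b: Element) -> bool:
    """Hand-crafted rule: same name and same number of subelements."""
    return a.name.lower() == b.name.lower() and len(a.children) == len(b.children)


s1_address = Element("Address", [Element(n) for n in ("Street", "City", "State", "Zip")])
s2_address = Element("address", [Element(n) for n in ("Street", "City", "USState", "PostalCode")])
s2_short = Element("Address", [Element(n) for n in ("Street", "City")])

print(rule_match(s1_address, s2_address))
print(rule_match(s1_address, s2_short))
```

Rules like this are cheap to evaluate and need no training data, but they look only at the schema and ignore the data instances.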
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Require a lot of training data, but can exploit data instances
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria in a single matcher. Better performance
2. Composite: combines the results of independently executed matchers. More flexible. The combination can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
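A composite matcher can be sketched as a function that runs several independent matchers and merges their scores. The weighted average used here is only one possible combination rule, and the two component matchers, their names, and the weights are assumptions chosen for illustration.

```python
from typing import Callable

Matcher = Callable[[str, str], float]


def exact_name(a: str, b: str) -> float:
    """1.0 when the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def common_prefix(a: str, b: str) -> float:
    """Length of the shared prefix relative to the longer name."""
    x, y = a.lower(), b.lower()
    n = 0
    for ca, cb in zip(x, y):
        if ca != cb:
            break
        n += 1
    longest = max(len(x), len(y))
    return n / longest if longest else 0.0


def composite_score(a: str, b: str, weighted: list[tuple[Matcher, float]]) -> float:
    """Weighted average of independently executed matchers."""
    total = sum(w for _, w in weighted)
    if total <= 0:
        return 0.0
    return sum(w * m(a, b) for m, w in weighted) / total


matchers = [(exact_name, 1.0), (common_prefix, 1.0)]
print(composite_score("ShipTo", "Ship2", matchers))
```

Because each matcher runs independently, matchers can be added, removed, or reweighted without touching the others, which is the flexibility the composite approach is credited with.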
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, from which it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required: the user must provide application-specific match/mismatch relations and must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (e.g., to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic element-level matching
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a structural similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3
- uses the weighted mean from Phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
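The weighted mean in Phase 2 can be written out as a short worked equation. The symbols are assumptions for illustration: lsim and ssim denote the linguistic and structural similarity of an element pair, and w_struct in [0, 1] is the weight given to the structural part. The numeric values are made up.

```latex
\[
  \mathit{wsim}(e_1, e_2)
    = w_{\mathit{struct}} \cdot \mathit{ssim}(e_1, e_2)
    + (1 - w_{\mathit{struct}}) \cdot \mathit{lsim}(e_1, e_2)
\]
% Worked example with assumed values:
% w_struct = 0.5, ssim = 0.6, lsim = 0.8
\[
  \mathit{wsim} = 0.5 \cdot 0.6 + (1 - 0.5) \cdot 0.8 = 0.3 + 0.4 = 0.7
\]
```

In Phase 3, a pair would then be accepted as a mapping candidate when its weighted similarity exceeds some chosen threshold.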
Clio (IBM Almaden and Univ of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions that map data in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Match CardinalityMatch Cardinalitybull One or more S1 elements can match one or
more S2 elementsbull Complex matches
Examples of the four local cardinality cases for individual mapping elements
Local Match Cardinalities
S1 Element(s) S2 Element(s) Matching Expression
11 element level Price Amount Amount = Price
n1 element level Price Tax Cost Cost = Price(1+Tax100)
1n element level Name FirstName
LastName
FirstName LastName = Name
nm element level
also
n1 structure level
BTitle
BPuNo
PPuNo
PName
ABook
APublisher
ABook APublisher = Select BTitle PName From B P
Where BPuNo = PPuNo
Bernstein P Rahm E A survey of approaches to automatic schema matching
Complex MatchesComplex Matches
bull 11 matches are bounded by the sizes of the schemas but there are an unbounded number of functions for combining attributes in a schema
bull Only a few works on complex matching have been donebull Some hard code complex matches into rulesbull Some rely on a domain specific ontology
bull We need domain knowledge to accurately perform complex matching
bull The best match isnrsquot always the top match returned by the matcher ndash so human involvement is still needed
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Linguistic ApproachesLinguistic Approaches
bull Language based matchers use names and text (ie words or sentences) to find semantically similar schema elements
bull Name Matching match elements with similar namesbull Description Matching match comments in the schemas
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesName MatchingName Matching
bull Matches schema elements with equal or similar namesbull How similarity is defined 1 Equality of names 2 Equality of names after stemming deals with prefixessuffixes 3 Equality of synonyms 4 Equality of hypernyms (suv is a type of car) 5 Similarity of names based on common substrings soundex pronunciation
(ShipTo = Ship2) 6 User provided name matches
bull Can be element or structure-levelbull Cardinality is not limited to 11
Bernstein P Rahm E A survey of approaches to automatic schema matching
Linguistic ApproachesLinguistic ApproachesDescription MatchingDescription Matching
bull Schemas can contain comments in natural language that express the intended semantics of the schema elements
bull Example
S1 empn employee name
S2 name name of employee
bull Can be as simple as keyword extraction and synonym matching or as complex as using natural language understanding technology
Bernstein P Rahm E A survey of approaches to automatic schema matching
Constraint BasedConstraint Based
bull Schemas often contain constraints to define data types and value ranges optionality relationship types cardinalities etc
Bernstein P Rahm E A survey of approaches to automatic schema matching
Reusing Schema and Mapping Reusing Schema and Mapping InformationInformation
bull The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
bull Many schemas are often very similar to each other and previously matched schemas
ie In E-Commerce substructures often repeat within different message formats (address fields name fields)
bull A schema library should be created and the schema editors should access the library to use predefined terms and definitions
Bernstein P Rahm E A survey of approaches to automatic schema matching
Schema Mapping ReuseSchema Mapping Reuse
bull Example
bull Problems
1 Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2 Similarity values may depend on the domain ie Salary and income may be identical in payroll application but not in a tax reporting application
Schema S1 Schema S2Schema S Purchase-order Product BillTo Name Address ShipTo Name Address ContactPhone
Purchase-order Product BillTo Name Address ShipTo Name Address Contact Name Address
POrder Article Payee BillAddress Recipient ShipAddress
Bernstein P Rahm E A survey of approaches to automatic schema matching
Instance Level ApproachesInstance Level Approachesbull Why 1 Little or no schema information available 2 Enhancement of schema-level matchers Instance data gives insight to
the contents and meaning of schema elements 3 To match instance-level data
bull How 1 Preferred Method Linguistic Characterization 2 Constraint-based Characterization ie Ranges 3 Auxiliary Information 4 Also uses both rule-based and learner-based techniques
bull Main Problem When comparing data at the instance-level it is likely that there will be a ton of possible match combinations a lot of which are irrelevant
Bernstein P Rahm E A survey of approaches to automatic schema matching
Rule Based SolutionsRule Based Solutions
bull Rule-Based hand crafted rules to exploit schema informationbull element names data types structures and
subelementsbull Ie two elements match if they have the same
name and the same number of subelements
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Learner Based SolutionsLearner Based Solutions
bull Learner-Based exploit both schema and data
bull Requires a lot of training data but can exploit data
bull Rule and learner based techniques combined provide an effective matching solution
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic, element-level
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2:
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the weighted mean from phase 2 to decide on a mapping
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
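The weighted mean in phases 2 and 3 can be illustrated with a few lines of Python. The weight `w_struct`, the acceptance `threshold`, and the sample similarity values are assumptions chosen for the example; they are not Cupid's published parameters.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def decide_mapping(pairs, w_struct=0.5, threshold=0.6):
    """Keep element pairs whose weighted similarity reaches the threshold."""
    accepted = {}
    for (src, tgt), (lsim, ssim) in pairs.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            accepted[(src, tgt)] = round(wsim, 3)
    return accepted


# Hypothetical (lsim, ssim) values for two candidate pairs.
pairs = {
    ("POrder.Article", "PurchaseOrder.Product"): (0.6, 0.9),
    ("POrder.Payee", "PurchaseOrder.ShipTo"): (0.3, 0.5),
}
mapping = decide_mapping(pairs)
```

With equal weights, the first pair scores 0.75 and is kept, while the second scores 0.4 and is dropped.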
Clio (IBM Almaden and Univ of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers read a schema and translate it into an internal representation
  - Correspondence Engine identifies matching parts of the schemas or databases
  - Mapping Generator generates view definitions that map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
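A much-simplified sketch of the structural step: similarity between two nodes is propagated to neighboring node pairs and renormalized over a fixed number of rounds. It assumes uniform propagation weights and a fixed iteration count, which differs from the published algorithm's propagation coefficients and convergence test. The function name and sample graphs are hypothetical.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, initial, iterations=10):
    """Simplified fixpoint: similarity of a node pair flows to neighboring pairs."""
    # Two pairs are neighbors if both graphs have an edge with the same label
    # between their members: (a1, b1) <-> (a2, b2) when a1 -l-> a2 and b1 -l-> b2.
    neighbors = defaultdict(set)
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbors[(a1, b1)].add((a2, b2))
                neighbors[(a2, b2)].add((a1, b1))
    pairs = set(neighbors) | set(initial)
    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        nxt = {
            p: initial.get(p, 0.0) + sigma[p] + sum(sigma[n] for n in neighbors[p])
            for p in pairs
        }
        top = max(nxt.values()) or 1.0  # normalize so the best pair scores 1.0
        sigma = {p: value / top for p, value in nxt.items()}
    return sigma


edges_a = [("Order", "has", "Customer"), ("Order", "has", "Total")]
edges_b = [("POrder", "has", "Client"), ("POrder", "has", "Amount")]
# Initial scores as a name matcher might produce them (made-up values).
initial = {("Order", "POrder"): 0.8, ("Customer", "Client"): 0.5, ("Total", "Amount"): 0.5}
sigma = similarity_flooding(edges_a, edges_b, initial)
```

Pairs that start with no name similarity, such as `("Customer", "Amount")`, still receive some score from their matched parents but stay below the pairs the name matcher favored.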
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
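One way to picture the "attribute as document" idea is a small ranked-retrieval sketch. It assumes a plain bag-of-words cosine similarity, which may differ from the retrieval technique Delta actually uses. The attribute dictionaries and function names are hypothetical.

```python
import math
import re
from collections import Counter


def as_document(attribute):
    """Flatten an attribute's metadata into one bag of lowercase words."""
    text = " ".join(str(value) for value in attribute.values())
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Return candidate names ordered by similarity to the query attribute."""
    q = as_document(query)
    scored = [(cosine(q, as_document(c)), c["name"]) for c in candidates]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


query = {"name": "emp_name", "type": "string", "description": "full name of the employee"}
candidates = [
    {"name": "unit_price", "type": "decimal", "description": "price per unit"},
    {"name": "staff_name", "type": "string", "description": "name of employee"},
]
ranked = rank_candidates(query, candidates)
```

The ranked list is what would be shown to the user, who makes the final choice.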
Tess (Univ of Massachusetts Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
  1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
  2. Parser Library: consists of the parsers used to generate the SchemaGraphs
  3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF
Schema Graphs
• PROBLEM: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges, created using conversion functions
• MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="compass" nillable="true"
                 type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema; node labels: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
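A rough sketch of such a conversion, using Python's standard `xml.etree.ElementTree`. It assumes only one rule (each element declared inside a complex type becomes a node linked to the type by a `hasElement` edge) and does not reproduce MWSAF's full set of conversion rules. The function name and the trimmed-down sample schema are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_schema_graph(xsd_text):
    """Return (nodes, edges); edges are (source, label, target) triples."""
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        # Every element declared inside the complex type becomes a child node.
        for element in ctype.iter(f"{{{XSD}}}element"):
            child = element.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = xsd_to_schema_graph(SCHEMA)
```

The resulting triples use the same (source, label, target) shape that a matcher could later compare against a graph built from an ontology.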
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of ontology; node labels: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score:
  - Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
  - Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
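The slide does not say how getBestMapping chooses among scored pairs. One plausible strategy is a greedy 1:1 selection that takes the highest scores first. The function `get_best_mapping`, the `threshold`, and the sample scores below are assumptions for illustration, not the MWSAF implementation.

```python
def get_best_mapping(scores, threshold=0.5):
    """Greedy 1:1 selection: take the highest-scoring pairs first."""
    mapping, used_concepts = {}, set()
    for (element, concept), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # everything after this point scores even lower
        if element in mapping or concept in used_concepts:
            continue  # one side of the pair is already mapped
        mapping[element] = (concept, score)
        used_concepts.add(concept)
    return mapping


# Hypothetical Match Scores between WSDL elements and ontology concepts.
scores = {
    ("windDir", "windDirection"): 0.9,
    ("windDir", "windSpeed"): 0.6,
    ("speed", "windSpeed"): 0.7,
    ("station", "windSpeed"): 0.4,
}
best = get_best_mapping(scores)
```

Greedy selection is simple but not guaranteed to maximize the total score; an assignment algorithm would be the alternative if that matters.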
MWSAF Matching Techniques
ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
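A sketch of two of these matchers and one way to combine them. The slide does not give the combining equation, so taking the maximum is an assumption, as are the Dice coefficient for the q-gram score, the default q = 3, and the tiny abbreviation dictionary. Synonym and token matching are left out.

```python
def qgrams(name, q=3):
    """Set of overlapping q-character substrings of a lowercased name."""
    s = name.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)} or {s}


def ngram_score(a, b, q=3):
    """Dice coefficient over shared q-grams: a value between 0 and 1."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def abbreviation_score(a, b, abbreviations):
    """1.0 if the names are equal after expanding known abbreviations."""
    def expand(name):
        return abbreviations.get(name.lower(), name.lower())
    return 1.0 if expand(a) == expand(b) else 0.0


def elem_match(a, b, abbreviations=None):
    """Combine the individual scores; here simply the best one (an assumption)."""
    return max(ngram_score(a, b), abbreviation_score(a, b, abbreviations or {}))


score_names = ngram_score("windDirection", "direction")
score_abbrev = elem_match("temp", "temperature", {"temp": "temperature"})
```

Because "direction" contributes 7 trigrams, all of which also occur among the 11 trigrams of "winddirection", the q-gram score is 14/18.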
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping:
  - Average Concept Match: tells the user the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework
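The two measures might be computed roughly as below. The exact formulas are not on the slide, so this sketch assumes that the concept match averages over mapped concepts only, while the service match averages over all WSDL concepts, with unmapped ones counting as 0. Function names and sample data are hypothetical.

```python
def average_concept_match(mapping):
    """Mean match score over the WSDL concepts that were actually mapped."""
    return sum(mapping.values()) / len(mapping) if mapping else 0.0


def average_service_match(mapping, total_wsdl_concepts):
    """Mean match score over all WSDL concepts; unmapped ones count as 0."""
    return sum(mapping.values()) / total_wsdl_concepts if total_wsdl_concepts else 0.0


def classify(mappings_by_ontology, total_wsdl_concepts):
    """Pick the ontology (domain) with the highest average service match."""
    return max(
        mappings_by_ontology,
        key=lambda name: average_service_match(mappings_by_ontology[name], total_wsdl_concepts),
    )


# Hypothetical mappings of one 4-concept WSDL against two ontologies.
mappings = {
    "weather": {"windSpeed": 0.9, "windDirection": 0.8},
    "geography": {"windDirection": 0.9},
}
domain = classify(mappings, total_wsdl_concepts=4)
```

Under these assumptions, "geography" has the higher concept match (0.9 versus 0.85) but covers fewer concepts, so the service match favors "weather".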
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
  - Schema- vs instance-level
  - Element- vs structure-level
  - Language- and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Complex Matches
• 1:1 matches are bounded by the sizes of the schemas, but there is an unbounded number of functions for combining attributes in a schema
• Only a little work has been done on complex matching
  - Some approaches hard-code complex matches into rules
  - Some rely on a domain-specific ontology
• We need domain knowledge to perform complex matching accurately
• The best match isn't always the top match returned by the matcher, so human involvement is still needed
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Linguistic Approaches
• Language-based matchers use names and text (i.e., words or sentences) to find semantically similar schema elements
• Name matching: match elements with similar names
• Description matching: match comments in the schemas
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Linguistic Approaches
Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
  1. Equality of names
  2. Equality of names after stemming (deals with prefixes/suffixes)
  3. Equality of synonyms
  4. Equality of hypernyms (an SUV is a type of car)
  5. Similarity of names based on common substrings, soundex, pronunciation (ShipTo = Ship2)
  6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
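The criteria above can be tried in order, as in this sketch. The numeric scores attached to each criterion, the crude suffix stripper, and the hand-supplied synonym and hypernym tables are assumptions for illustration; a real matcher would typically use a proper stemmer and a thesaurus. Soundex is omitted, and only the common-substring part of criterion 5 is shown.

```python
from difflib import SequenceMatcher


def strip_affixes(name, suffixes=("ing", "es", "s")):
    """Very crude stemmer: lowercase and drop one common suffix."""
    word = name.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def name_similarity(a, b, synonyms=frozenset(), hypernyms=frozenset(), user_matches=frozenset()):
    """Check the criteria in order and return (score, reason)."""
    x, y = a.lower(), b.lower()
    pair = frozenset((x, y))
    if pair in user_matches:
        return 1.0, "user-provided"
    if x == y:
        return 1.0, "equal"
    if strip_affixes(x) == strip_affixes(y):
        return 0.9, "equal after stemming"
    if pair in synonyms:
        return 0.8, "synonym"
    if (x, y) in hypernyms or (y, x) in hypernyms:
        return 0.6, "hypernym"
    # Fall back to the longest common substring, relative to the longer name.
    common = SequenceMatcher(None, x, y).find_longest_match(0, len(x), 0, len(y)).size
    return common / max(len(x), len(y), 1), "common substring"


stemmed = name_similarity("Orders", "order")
synonym = name_similarity("Car", "Automobile", synonyms={frozenset(("car", "automobile"))})
hypernym = name_similarity("SUV", "car", hypernyms={("suv", "car")})
substring = name_similarity("ShipTo", "Ship2")
```

For ShipTo and Ship2 the shared substring "ship" covers 4 of 6 characters, so the fallback score is about 0.67.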
Linguistic Approaches
Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
  S1: empn // employee name
  S2: name // name of employee
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
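The simple end of that spectrum, keyword extraction followed by an overlap measure, might look like this. The stop-word list and the choice of Jaccard overlap are assumptions, and synonym handling is left out.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "for", "to", "in"}


def keywords(comment):
    """Lowercase words of a comment with common stop words removed."""
    return {w for w in re.findall(r"[a-z]+", comment.lower()) if w not in STOP_WORDS}


def description_similarity(comment_a, comment_b):
    """Jaccard overlap of the two keyword sets: a value between 0 and 1."""
    ka, kb = keywords(comment_a), keywords(comment_b)
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0


# The two comments from the example: S1.empn and S2.name.
score = description_similarity("employee name", "name of employee")
```

Both comments reduce to the keyword set {employee, name}, so the elements `empn` and `name` match on their descriptions even though their names differ.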
Constraint Based
• Schemas often contain constraints that define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
  - e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example
  [Figure: three example purchase-order schemas, labeled Schema S1, Schema S2, and Schema S on the slide]
  Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
  Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
  POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems
  1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
  2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
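Reuse can be pictured as composing a stored mapping with a new, usually easier, one. In the sketch below a new schema is first matched against a previously matched schema, and the stored mapping then carries those matches over to the target. Multiplying similarities, the similarity values themselves, and which element pairs the stored mapping contains are all assumptions made for the example.

```python
def compose(first, second):
    """Compose two 1:1 mappings; a derived pair's similarity is the product."""
    derived = {}
    for source, (middle, sim1) in first.items():
        if middle in second:
            target, sim2 = second[middle]
            derived[source] = (target, round(sim1 * sim2, 3))
    return derived


# New schema -> previously matched schema (near-identical purchase orders).
new_to_known = {
    "BillTo": ("BillTo", 1.0),
    "ShipTo": ("ShipTo", 1.0),
    "ContactPhone": ("Contact", 0.6),
}
# Stored mapping: previously matched schema -> target schema (hypothetical).
known_to_target = {
    "BillTo": ("Payee", 0.8),
    "ShipTo": ("Recipient", 0.8),
}
derived = compose(new_to_known, known_to_target)
```

`ContactPhone` gets no derived match because the stored mapping has nothing for `Contact`, so it would still need a regular matcher or the user.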
Instance Level Approaches
• Why?
  1. Little or no schema information is available
  2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
  3. To match instance-level data
• How?
  1. Preferred method: linguistic characterization
  2. Constraint-based characterization, e.g., value ranges
  3. Auxiliary information
  4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
  - e.g., two elements match if they have the same name and the same number of subelements
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
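The example rule can be written down directly. The `Element` class and the sample schemas are hypothetical, and name comparison is assumed to ignore case.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    subelements: list = field(default_factory=list)


def same_name_and_arity(a, b):
    """Rule: same name (ignoring case) and same number of subelements."""
    return a.name.lower() == b.name.lower() and len(a.subelements) == len(b.subelements)


def rule_based_matches(schema_a, schema_b, rules):
    """Return the name pairs for which every rule holds."""
    return [
        (a.name, b.name)
        for a in schema_a
        for b in schema_b
        if all(rule(a, b) for rule in rules)
    ]


schema_1 = [Element("Address", ["Street", "City"]), Element("Name")]
schema_2 = [Element("address", ["Line1", "Town"]), Element("name", ["First", "Last"])]
found = rule_based_matches(schema_1, schema_2, [same_name_and_arity])
```

`Name` and `name` are rejected here because one is atomic and the other has two subelements, which shows how brittle a single hand-crafted rule can be.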
Learner Based Solutions
• Learner-based: exploit both schema and data
• Requires a lot of training data, but can exploit data instances as well as schema information
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria. Better performance
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
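A composite matcher can be sketched as a weighted average over matchers that each run on their own. The two toy matchers, their weights, and the element dictionaries are assumptions for illustration; other combination schemes, such as taking the maximum or letting a user pick, fit the same structure.

```python
def name_matcher(a, b):
    """Toy matcher: 1.0 if names are equal ignoring case."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0


def type_matcher(a, b):
    """Toy matcher: 1.0 if the declared data types are identical."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_match(matchers, a, b):
    """Weighted average of independently computed matcher scores."""
    total_weight = sum(weight for _, weight in matchers)
    return sum(weight * matcher(a, b) for matcher, weight in matchers) / total_weight


MATCHERS = [(name_matcher, 2.0), (type_matcher, 1.0)]
ship_to = {"name": "ShipTo", "type": "string"}
score_same = composite_match(MATCHERS, ship_to, {"name": "shipto", "type": "string"})
score_diff = composite_match(MATCHERS, ship_to, {"name": "Recipient", "type": "string"})
```

Adding or dropping a matcher only changes the `MATCHERS` list, which is the flexibility the composite approach is credited with.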
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, from which it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic, element-level
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a structural similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the weighted mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
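The weighted mean in Phases 2 and 3 can be sketched as follows. This is a minimal illustration, not Cupid's implementation: the weight `w_struct`, the acceptance `threshold`, and the function names are assumptions chosen for the example.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def select_mapping(scores, w_struct=0.5, threshold=0.5):
    """Keep element pairs whose weighted similarity reaches the threshold.

    scores maps (source_element, target_element) -> (lsim, ssim).
    The threshold value is an assumption for this sketch.
    """
    mapping = {}
    for pair, (lsim, ssim) in scores.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            mapping[pair] = wsim
    return mapping


scores = {
    ("empn", "name"): (0.8, 0.6),
    ("empn", "salary"): (0.1, 0.3),
}
mapping = select_mapping(scores)
```

With these sample scores only the pair (empn, name) clears the threshold.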
Clio (IBM Almaden and Univ. of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method groups the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other ‘documents’ with matching attributes and can choose from those
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
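The "attribute as document" idea can be sketched with a plain bag-of-words cosine similarity. This is only an illustration: the retrieval technique Delta actually uses is not described on the slide, and the metadata fields and attribute names below are hypothetical.

```python
import math
from collections import Counter


def to_document(attribute):
    """Concatenate all metadata values of an attribute into one text string."""
    return " ".join(str(value) for value in attribute.values())


def cosine(doc_a, doc_b):
    """Cosine similarity between two documents using word counts."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    query_doc = to_document(query)
    return sorted(
        candidates,
        key=lambda name: cosine(query_doc, to_document(candidates[name])),
        reverse=True,
    )


# Hypothetical attribute metadata for illustration only.
query = {"name": "empn", "description": "employee name", "type": "string"}
candidates = {
    "salary": {"name": "salary", "description": "yearly pay", "type": "decimal"},
    "name": {"name": "name", "description": "name of employee", "type": "string"},
}
ranking = rank_candidates(query, candidates)
```

The ranked list is what a user would be shown to choose from.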
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user’s acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find the mappings between them
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="compass" nillable="true"
                 type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema. Nodes: Direction, compass, degrees, DirectionCompass; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
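As a rough illustration of such a conversion rule, the sketch below turns each named complexType into a node and links it to its child elements with hasElement edges. It covers only this one rule; the actual MWSAF conversion functions are richer. The wrapping xsd:schema element and the xsd1 namespace URI are assumptions added so that the sample parses.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The complexType from the slide, wrapped in a schema element.
# The xsd1 namespace URI is a placeholder assumption.
SAMPLE = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="http://example.org/weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees"
                   type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_schema_graph(xsd_text):
    """Return (nodes, edges); edges are (parent, label, child) triples."""
    root = ET.fromstring(xsd_text.strip())
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        if parent is None:
            continue  # anonymous types are ignored in this sketch
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            child = elem.get("name")
            if child is None:
                continue  # element references are ignored in this sketch
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = xsd_to_schema_graph(SAMPLE)
```

For the sample this yields the node Direction with hasElement edges to compass and degrees.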
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: SchemaGraph representation of part of ontology. Nodes: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score
- Element-Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
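A q-gram matcher of the kind listed above can be sketched as follows. The choice of q = 3 and of the Dice coefficient for normalizing into [0, 1] are assumptions; the slide does not say how MWSAF's NGram matcher normalizes its score.

```python
def qgrams(name, q=3):
    """Set of all substrings of length q in the lower-cased name."""
    s = name.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over shared q-grams; result lies between 0 and 1."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        # Names shorter than q have no q-grams; fall back to exact equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = len(grams_a & grams_b)
    return 2.0 * shared / (len(grams_a) + len(grams_b))


score = ngram_similarity("windSpeed", "wind_speed")
```

Here "windspeed" has 7 trigrams, "wind_speed" has 8, and 5 are shared, so the score is 10/15.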
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
Then two measures are derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
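The two measures might be computed along the following lines. This sketch assumes that the concept average is taken over the mapped concepts only, while the service average divides by the total number of concepts in the WSDL; the slide does not give the exact formulas, and all names and scores below are hypothetical.

```python
def average_concept_match(mapping):
    """Mean match score over the concepts that were actually mapped."""
    scores = list(mapping.values())
    return sum(scores) / len(scores) if scores else 0.0


def average_service_match(mapping, total_wsdl_concepts):
    """Sum of match scores divided by all concepts in the WSDL (assumed)."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(mapping.values()) / total_wsdl_concepts


def categorize(mappings_by_ontology, total_wsdl_concepts):
    """Pick the ontology (domain) with the highest average service match."""
    return max(
        mappings_by_ontology,
        key=lambda name: average_service_match(
            mappings_by_ontology[name], total_wsdl_concepts),
    )


# Hypothetical match scores for a WSDL with four concepts.
mappings = {
    "weather": {"windSpeed": 0.75, "windDirection": 0.25},
    "geography": {"windDirection": 0.5},
}
domain = categorize(mappings, total_wsdl_concepts=4)
```

Under these assumptions the WSDL would be placed in the "weather" domain.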
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches for matching
- Schema- vs. instance-level
- Element- vs. structure-level
- Linguistic and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework. POSV-WWW2004pdf
• Christophides, V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Linguistic Approaches: Name Matching
• Matches schema elements with equal or similar names
• How similarity is defined:
1. Equality of names
2. Equality of names after stemming (deals with prefixes/suffixes)
3. Equality of synonyms
4. Equality of hypernyms (an SUV is a type of car)
5. Similarity of names based on common substrings, Soundex, or pronunciation (ShipTo = Ship2)
6. User-provided name matches
• Can be element- or structure-level
• Cardinality is not limited to 1:1
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
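A few of these similarity criteria can be combined in a small name matcher, sketched below. It covers only name equality after normalization, a synonym lookup, and common-substring similarity via the standard library's difflib; stemming, hypernyms, and Soundex are left out. The synonym table is a hypothetical stand-in for a thesaurus such as WordNet.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs; a real matcher would consult a thesaurus.
SYNONYMS = {frozenset({"car", "automobile"})}


def normalize(name):
    """Lower-case the name and drop separators such as '_' or '-'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a, b):
    """Return a similarity in [0, 1] for two element names."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return 1.0  # criterion 1: equality of names
    if frozenset({na, nb}) in SYNONYMS:
        return 1.0  # criterion 3: equality of synonyms
    # criterion 5: similarity based on common substrings
    return SequenceMatcher(None, na, nb).ratio()


ship_score = name_similarity("ShipTo", "Ship2")
```

ShipTo and Ship2 share the substring "ship", so they score well above unrelated names without being treated as equal.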
Linguistic Approaches: Description Matching
• Schemas can contain comments in natural language that express the intended semantics of the schema elements
• Example:
S1: empn (comment: employee name)
S2: name (comment: name of employee)
• Can be as simple as keyword extraction and synonym matching, or as complex as using natural language understanding technology
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Constraint-Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved with the reuse of common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example
[Figure: example schemas S1, S2, and S]
Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
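One simple way to reuse a stored mapping is to compose it with a new one: if a new schema has been matched to a schema that was itself matched earlier, the two mappings can be chained. The sketch below assumes plain 1:1 mappings held in dictionaries; the specific correspondences are illustrative assumptions built from the element names in the example, not mappings given in the source.

```python
def compose(map_ab, map_bc):
    """Chain A->B and B->C correspondences into A->C.

    Elements of A whose B counterpart has no stored match are dropped.
    """
    return {a: map_bc[b] for a, b in map_ab.items() if b in map_bc}


# Previously determined mapping (assumed for illustration).
stored = {
    "Product": "Article",
    "BillTo.Address": "BillAddress",
    "ShipTo.Address": "ShipAddress",
}

# New schema matched against the schema that was already in the library.
new_to_library = {
    "Product": "Product",
    "BillTo.Address": "BillTo.Address",
    "ShipTo.Address": "ShipTo.Address",
    "ShipTo.Name": "ShipTo.Name",
}

reused = compose(new_to_library, stored)
```

ShipTo.Name drops out because the stored mapping has no counterpart for it, which is the kind of gap a user or another matcher would still need to fill.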
Instance-Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., value ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Rule-Based Solutions
• Rule-based: hand-crafted rules exploit schema information
- element names, data types, structures, and subelements
- e.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
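The example rule above is easy to state in code. The Element class is a hypothetical minimal schema-element type introduced for this sketch, and comparing names case-insensitively is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Minimal schema element: a name plus a list of subelements."""
    name: str
    subelements: list = field(default_factory=list)


def same_name_and_arity(a, b):
    """Rule: match if names are equal and subelement counts are equal."""
    return (a.name.lower() == b.name.lower()
            and len(a.subelements) == len(b.subelements))


address_1 = Element("Address", [Element("Street"), Element("City")])
address_2 = Element("address", [Element("Line1"), Element("Town")])
address_3 = Element("Address", [Element("Street")])
```

Under this rule address_1 matches address_2 but not address_3, which shows how brittle a single hand-crafted rule can be.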
Learner-Based Solutions
• Learner-based: exploit both schema and data
• Requires a lot of training data, but can exploit data instances
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria. Better performance
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
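A composite matcher can be sketched as a weighted average over matchers that are run independently. The two toy matchers and the weights are assumptions for illustration; a real system would plug in name, constraint, or instance-level matchers and might combine results differently, for example by taking the maximum.

```python
def exact_name(a, b):
    """1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def same_initial(a, b):
    """1.0 if both names start with the same letter, else 0.0."""
    return 1.0 if a[:1].lower() == b[:1].lower() else 0.0


def composite(matchers, weights, a, b):
    """Weighted average of the scores of independently executed matchers."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total


matchers = [exact_name, same_initial]
weights = [3, 1]
score_equal = composite(matchers, weights, "Name", "name")
score_partial = composite(matchers, weights, "Name", "Nom")
```

Because each matcher is independent, adding or dropping one only means editing the two lists, which is the flexibility the composite approach is credited with.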
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The metadata about an attribute is grouped into a text string, which is treated as a document. The user is then presented with other "documents" with matching attributes and can choose from those
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
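The idea of treating attribute metadata as a document can be sketched as follows. Cosine similarity over word counts is an assumption standing in for whatever retrieval technique Delta actually uses, and all attribute names and descriptions below are invented.

```python
import math
import re
from collections import Counter


def to_document(metadata: dict) -> str:
    """Group all metadata values of an attribute into one text string."""
    return " ".join(str(value) for value in metadata.values())


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two documents."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(query: dict, candidates: dict) -> list:
    """Return (attribute, score) pairs, best first, for the user to choose from."""
    query_doc = to_document(query)
    scored = [
        (name, cosine(query_doc, to_document(meta)))
        for name, meta in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical attributes from two schemas.
query = {"name": "cust_phone", "description": "customer telephone number"}
candidates = {
    "phone_no": {"name": "phone", "description": "telephone number of the customer"},
    "salary": {"name": "salary", "description": "annual pay of the employee"},
}
ranked = rank_candidates(query, candidates)
```

The ranked list is only a suggestion; as on the slide, the final choice stays with the user.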
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
MWSAF: Meteor-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
- A tool for semi-automatically marking up web service descriptions with ontologies
- It helps describe services semantically and aids efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithm
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
• MWSAF converts both models to a common representation format called SchemaGraph
• A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
• It then applies a matching algorithm to find the mappings between them
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
MWSAF: Meteor-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
    <xsd:complexType name="Direction">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass" />
        <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int" />
      </xsd:sequence>
    </xsd:complexType>
[Figure: SchemaNode representation of XML schema. Node Direction with hasElement edges to compass (DirectionCompass) and degrees]
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
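A minimal sketch of one such conversion rule: each element declared inside a complex type becomes a node connected to the type by a hasElement edge. The function name, the tuple representation of edges, and both namespace URIs are assumptions made so that the snippet parses; they are not taken from MWSAF.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xsd prefix; the slide does not show it.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The complexType from the slide, wrapped in a schema element with
# hypothetical namespace declarations so that it is well-formed XML.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="http://example.org/weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def complex_types_to_edges(schema_text: str) -> list:
    """Return (parent, 'hasElement', child) edges for every complex type."""
    root = ET.fromstring(schema_text.strip())
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        # Every element declared below the type becomes a child node.
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


edges = complex_types_to_edges(SCHEMA)
```

A fuller converter would also record each element's type (for example DirectionCompass) so that the matcher can use it.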
MWSAF: Meteor-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent" />
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent" />
      <rdfs:range rdf:resource="#Speed" />
    </daml:Property>
[Figure: SchemaGraph representation of part of ontology. Node WindEvent with hasProperty edges to windDirection and windSpeed (Speed)]
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms. Abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
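How a map set could be derived from match scores is sketched below. The slide names getBestMapping but does not describe its logic, so the greedy one-to-one selection, the threshold, and the Python name get_best_mapping are assumptions for illustration, not the MWSAF implementation.

```python
def get_best_mapping(scores: dict, threshold: float = 0.5) -> dict:
    """Greedy 1:1 selection: take the highest-scoring pairs first.

    scores maps (wsdl_element, ontology_concept) to a match score in [0, 1].
    """
    mapping = {}
    used_concepts = set()
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (element, concept), score in ordered:
        if score < threshold:
            break  # all remaining pairs score lower still
        if element in mapping or concept in used_concepts:
            continue  # each element and each concept is mapped at most once
        mapping[element] = (concept, score)
        used_concepts.add(concept)
    return mapping


# Made-up match scores between WSDL elements and ontology concepts.
scores = {
    ("Direction", "WindEvent"): 0.4,
    ("degrees", "windDirection"): 0.7,
    ("compass", "windDirection"): 0.8,
    ("compass", "windSpeed"): 0.3,
}
best = get_best_mapping(scores)
```

Greedy selection is simple but not optimal in general; an assignment algorithm would maximize the total score instead.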
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
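The NGram measure can be sketched as below. The slide only says that shared q-grams are counted; normalizing with the Dice coefficient, lowercasing the names, and using q = 3 are assumptions chosen so that the result falls between 0 and 1.

```python
def qgrams(name: str, q: int = 3) -> set:
    """Set of character q-grams of a lowercased name."""
    text = name.lower()
    if len(text) < q:
        # Names shorter than q are treated as a single gram.
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over shared q-grams, a value between 0 and 1."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


score = ngram_similarity("windSpeed", "wind_speed")
```

Because the comparison works on character sequences, it tolerates small spelling differences but cannot detect synonyms, which is why the slide lists CheckSynonym separately.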
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A., Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A., Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching
- Schema- vs. instance-level
- Element- vs. structure-level
- Language- and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching, www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A., Semantic Integration Research in the Database Community: A Brief Survey, http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K., METEOR-S Web service Annotation Framework, POSV-WWW2004.pdf
• Christophides, V., Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM), Dagstuhl Seminar, ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Constraint Based
• Schemas often contain constraints to define data types and value ranges, optionality, relationship types, cardinalities, etc.
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
Reusing Schema and Mapping Information
• The effectiveness of matching can be improved by reusing common schema components and previously determined mappings
• Many schemas are very similar to each other and to previously matched schemas
- e.g., in e-commerce, substructures often repeat within different message formats (address fields, name fields)
• A schema library should be created, and schema editors should access the library to use predefined terms and definitions
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
Schema Mapping Reuse
• Example
[Figure: three schemas, labeled S, S1 and S2 on the slide, with the following element hierarchies]
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), ContactPhone
- Purchase-order: Product, BillTo (Name, Address), ShipTo (Name, Address), Contact (Name, Address)
- POrder: Article, Payee, BillAddress, Recipient, ShipAddress
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g., salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g., ranges
3. Auxiliary information
4. Also uses both rule-based and learner-based techniques
• Main problem: when comparing data at the instance level, there is likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
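Constraint-based characterization by ranges can be sketched as follows: each numeric column is summarized by its observed value range, and two columns are compared by how much their ranges overlap. The overlap ratio used here is an assumption for illustration, not a measure taken from the survey, and the sample values are invented.

```python
def value_range(values: list) -> tuple:
    """Characterize a column of numeric instances by its (min, max) range."""
    return (min(values), max(values))


def range_overlap(a: tuple, b: tuple) -> float:
    """Length of the shared interval divided by the length of the combined interval."""
    span = max(a[1], b[1]) - min(a[0], b[0])
    if span == 0:
        return 1.0  # both ranges are the same single point
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return max(0.0, shared) / span


# Hypothetical instance data from two sources.
ages = value_range([18, 25, 40, 67])
years = value_range([21, 30, 55])
prices = value_range([199.0, 1500.0, 12000.0])
likely = range_overlap(ages, years)
unlikely = range_overlap(ages, prices)
```

Overlapping ranges only suggest a candidate match; unrelated columns can share a range, which is one reason instance-level matchers are combined with other evidence.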
Rule Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
• Element names, data types, structures, and subelements
• e.g., two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A., Semantic Integration Research in the Database Community: A Brief Survey
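The example rule on the slide (same name and same number of subelements) can be written directly as a predicate. The Element class and the case-insensitive name comparison are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and a list of subelements."""
    name: str
    children: list = field(default_factory=list)


def same_name_and_arity(a: Element, b: Element) -> bool:
    """Hand-crafted rule: same name (ignoring case) and same number of subelements."""
    return a.name.lower() == b.name.lower() and len(a.children) == len(b.children)


def rule_based_matches(schema_a: list, schema_b: list, rule=same_name_and_arity) -> list:
    """Apply the rule to every pair of elements and collect the matching names."""
    return [(a.name, b.name) for a in schema_a for b in schema_b if rule(a, b)]


schema_a = [Element("Address", [Element("Street"), Element("City")]), Element("Name")]
schema_b = [
    Element("address", [Element("Road"), Element("Town")]),
    Element("Name", [Element("First")]),
]
matches = rule_based_matches(schema_a, schema_b)
```

The two Name elements are not matched because one has a subelement and the other has none, which shows how brittle a single hand-crafted rule can be.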
Learner Based Solutions
• Learner-based: exploit both schema and data
• Require a lot of training data, but can exploit data instances in addition to schema information
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A., Semantic Integration Research in the Database Community: A Brief Survey
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
1. Hybrid: integrates multiple matching criteria. Better performance
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
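A composite matcher can be sketched as a function that merges the score tables produced by independently executed matchers. Combining by weighted average is one possible choice and is an assumption here; the weights and the sample scores are invented.

```python
def combine(results: list, weights: list = None) -> dict:
    """Weighted average of the scores returned by independent matchers.

    Each entry of results maps (source, target) to a score in [0, 1].
    A pair that a matcher did not report counts as 0 for that matcher.
    """
    if not results:
        return {}
    if weights is None:
        weights = [1.0] * len(results)
    total = sum(weights)
    pairs = set().union(*results)  # every pair reported by any matcher
    return {
        pair: sum(w * r.get(pair, 0.0) for w, r in zip(weights, results)) / total
        for pair in pairs
    }


# Made-up outputs of a name matcher and an instance-level matcher.
name_scores = {("a", "x"): 1.0, ("b", "y"): 0.2}
instance_scores = {("a", "x"): 0.5}
combined = combine([name_scores, instance_scores])
```

Because the matchers run independently, one can be added, removed, or re-weighted without touching the others, which is the flexibility the slide attributes to the composite approach.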
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
- The user must provide application-specific match/mismatch relations
- The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein, P., Rahm, E., A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaGraph representation of the XML schema. Labels: Direction, degrees, DirectionCompass, compass; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
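The conversion on this slide can be sketched in a few lines of Python. This is only an illustration, not the MWSAF parser: the triple representation `(source, label, target)`, the function name `complex_types_to_graph` and the `urn:example:types` namespace are assumptions made for the example; only the `hasElement` edge label comes from the slide.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Same complexType as on the slide; the xsd1 namespace URI is a placeholder.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:types">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="xsd1:DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""

def complex_types_to_graph(xml_text):
    """Return (nodes, edges); each edge is a (source, label, target) triple."""
    root = ET.fromstring(xml_text)
    nodes, edges = set(), set()
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        # Every element declared under the complex type becomes a node
        # joined to the complex type node by a hasElement edge.
        for elem in ctype.iter(f"{{{XSD}}}element"):
            child = elem.get("name")
            nodes.add(child)
            edges.add((parent, "hasElement", child))
    return nodes, edges

nodes, edges = complex_types_to_graph(SCHEMA)
```

Keeping the graph as plain sets of triples makes it easy to compare with a graph built from an ontology in the same shape.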
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of the ontology. Labels: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
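A comparable sketch for the ontology side, again only as an illustration. It assumes that each class becomes a node and that each property becomes a node joined to its domain class by a `hasProperty` edge; the slide shows the edge label but not the full rule set, so the function `ontology_to_graph` and its handling of domains are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

# Cut-down version of the ontology fragment on the slide.
ONTOLOGY = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:daml="{DAML}">
  <daml:Class rdf:ID="WindEvent"/>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>
</rdf:RDF>
"""

def ontology_to_graph(xml_text):
    """Return (nodes, edges); each edge is a (source, label, target) triple."""
    root = ET.fromstring(xml_text)
    nodes, edges = set(), set()
    for cls in root.iter(f"{{{DAML}}}Class"):
        nodes.add(cls.get(f"{{{RDF}}}ID"))
    for prop in root.iter(f"{{{DAML}}}Property"):
        name = prop.get(f"{{{RDF}}}ID")
        nodes.add(name)
        domain = prop.find(f"{{{RDFS}}}domain")
        if domain is not None:
            # "#WindEvent" refers to the class with rdf:ID="WindEvent".
            owner = domain.get(f"{{{RDF}}}resource").lstrip("#")
            edges.add((owner, "hasProperty", name))
    return nodes, edges

onto_nodes, onto_edges = ontology_to_graph(ONTOLOGY)
```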
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a mapping set
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
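The slide does not say how getBestMapping selects the mapping set. One simple possibility, shown here purely as an assumption, is to keep for each WSDL element the highest-scoring concept above a cut-off. The function `get_best_mapping`, the `threshold` value and the sample scores are all hypothetical.

```python
def get_best_mapping(scores, threshold=0.5):
    """scores maps (wsdl_element, concept) -> match score in [0, 1].

    Returns {wsdl_element: (concept, score)} with the best concept per
    element, ignoring candidates below the threshold.
    """
    best = {}
    for (element, concept), score in scores.items():
        if score < threshold:
            continue
        if element not in best or score > best[element][1]:
            best[element] = (concept, score)
    return best

# Hypothetical match scores for two WSDL elements.
scores = {
    ("windSpeed", "WindSpeed"): 0.9,
    ("windSpeed", "Speed"): 0.6,
    ("degrees", "Speed"): 0.2,
}
mapping = get_best_mapping(scores)
```

Elements with no candidate above the threshold are simply left unmapped, which matches the idea that the user reviews the proposed matches afterwards.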
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
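As an illustration of the NGram idea, the sketch below counts shared q-grams and scales the result to the range 0 to 1. The slide does not give the formula MWSAF uses; the Dice coefficient and the default `q=3` are assumptions chosen for the example.

```python
def qgrams(name, q=3):
    """Set of lower-cased substrings of length q."""
    s = name.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def ngram_similarity(a, b, q=3):
    """Dice coefficient over q-gram sets: 0.0 (nothing shared) to 1.0."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

same = ngram_similarity("windSpeed", "windSpeed")
close = ngram_similarity("windSpeed", "wind_speed")
unrelated = ngram_similarity("abc", "xyz")
```

Here "windSpeed" and "wind_speed" share five of their trigrams, so the score is 10/15, while names with no trigram in common score 0.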
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
Two measures are then derived from each mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework.
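The two measures are not defined on the slide. A plausible reading, stated here as an assumption, is that Average Concept Match divides the summed match scores by the number of mapped concepts, while Average Service Match divides the same sum by the total number of concepts in the WSDL, so that a service with few mapped concepts scores low for that ontology. The function names and sample numbers below are hypothetical.

```python
def average_concept_match(match_scores):
    """Mean score over the concepts that were actually mapped."""
    if not match_scores:
        return 0.0
    return sum(match_scores) / len(match_scores)

def average_service_match(match_scores, total_wsdl_concepts):
    """Summed scores spread over every concept in the WSDL."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(match_scores) / total_wsdl_concepts

# Hypothetical results of matching one 4-concept WSDL against two ontologies.
per_ontology = {
    "weather": ([0.5, 1.0], 4),
    "travel": ([0.5], 4),
}
best_domain = max(
    per_ontology,
    key=lambda domain: average_service_match(*per_ontology[domain]),
)
```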
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching
- Schema- vs instance-level
- Element- vs structure-level
- Language- and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Christophides, V. Integrating XML Data Sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
Schema Mapping Reuse
• Example
Schema S1: Purchase-order (Product; BillTo: Name, Address; ShipTo: Name, Address, ContactPhone)
Schema S2: Purchase-order (Product; BillTo: Name, Address; ShipTo: Name, Address; Contact: Name, Address)
Schema S: POrder (Article, Payee, BillAddress, Recipient, ShipAddress)
• Problems
1. Determining which part of a new schema is similar to some part of a previously matched one is a match problem itself
2. Similarity values may depend on the domain, e.g. salary and income may be identical in a payroll application but not in a tax reporting application
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Instance Level Approaches
• Why?
1. Little or no schema information is available
2. Enhancement of schema-level matchers: instance data gives insight into the contents and meaning of schema elements
3. To match instance-level data
• How?
1. Preferred method: linguistic characterization
2. Constraint-based characterization, e.g. ranges
3. Auxiliary information
4. Can also use both rule-based and learner-based techniques
• Main Problem: when comparing data at the instance level, there are likely to be a very large number of possible match combinations, many of which are irrelevant
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
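A constraint-based characterization by ranges can be illustrated with a small sketch: summarize each column by its minimum and maximum, then compare columns by how much those ranges overlap. The overlap ratio used here and the sample values are assumptions made for the example, not a method taken from the survey.

```python
def value_range(values):
    """Characterize a numeric column by its (min, max) range."""
    return (min(values), max(values))

def range_overlap(a, b):
    """Overlap of two ranges as a fraction of their combined span."""
    (lo_a, hi_a), (lo_b, hi_b) = a, b
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    if span == 0:
        return 1.0  # both ranges are the same single value
    return max(overlap, 0) / span

# Hypothetical instance data from two sources.
salary = value_range([30000, 50000, 80000])
income = value_range([40000, 90000])
age = value_range([20, 65])

salary_vs_income = range_overlap(salary, income)
salary_vs_age = range_overlap(salary, age)
```

Disjoint ranges such as salary and age score 0, which is one cheap way to prune the irrelevant match combinations mentioned above.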
Rule Based Solutions
• Rule-Based: hand-crafted rules that exploit schema information
• element names, data types, structures and subelements
• E.g. two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
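The example rule on this slide is small enough to write out directly. The `Element` class and `rule_match` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str
    subelements: list = field(default_factory=list)

def rule_match(a, b):
    """Hand-crafted rule: same name and same number of subelements."""
    return a.name == b.name and len(a.subelements) == len(b.subelements)

bill_to_1 = Element("BillTo", [Element("Name"), Element("Address")])
bill_to_2 = Element("BillTo", [Element("Name"), Element("Address")])
ship_to = Element("ShipTo", [Element("Name"), Element("Address")])
bill_to_short = Element("BillTo", [Element("Name")])
```

Rules like this are cheap and need no training data, but they are brittle: renaming an element or adding one subelement is enough to break the match.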
Learner Based Solutions
• Learner-Based: exploit both schema and data
• Requires a lot of training data but can exploit data
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey.
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two Combination Methods
1. Hybrid: integrates multiple matching criteria. Better performance.
2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
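A composite matcher can be sketched as a weighted average over matchers that run independently of each other. The two toy matchers, the weights and the function names below are assumptions for illustration; real systems combine much richer matchers and may choose the combination manually.

```python
def exact_name(a, b):
    """1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0

def shared_prefix(a, b):
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b), 1)

def composite_match(a, b, matchers, weights):
    """Weighted average of the scores of independently run matchers."""
    total = sum(weights)
    return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

matchers = [exact_name, shared_prefix]
weights = [1, 3]
score = composite_match("ShipTo", "ShipAddress", matchers, weights)
identical = composite_match("ShipTo", "shipto", matchers, weights)
```

Because the matchers do not know about each other, one can be added, dropped or re-weighted without touching the rest, which is the flexibility the slide attributes to the composite approach.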
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
LSD (Univ of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs and learns patterns and matching rules
• Mostly instance-oriented but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required: the user must provide application-specific match/mismatch relations, and the user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e. to provide a new rule)
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: Linguistic element-level matching
- categorizes elements based on names, data types and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
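The last two phases can be illustrated as follows: combine the linguistic and structural similarity of each candidate pair into a weighted mean, then keep the pairs whose mean clears a threshold. The weight `w_struct`, the `threshold` and the sample similarity values are assumptions for this sketch and are not taken from Cupid itself.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim

def select_mapping(candidates, threshold=0.6, w_struct=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold."""
    mapping = {}
    for pair, (lsim, ssim) in candidates.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            mapping[pair] = wsim
    return mapping

# Hypothetical (linguistic, structural) similarities for two candidate pairs.
candidates = {
    ("BillTo", "Payee"): (0.5, 1.0),
    ("Product", "ShipAddress"): (0.25, 0.25),
}
mapping = select_mapping(candidates)
```

In this example the pair (BillTo, Payee) is kept even though the names are only moderately similar, because the structural evidence raises the weighted mean.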
Clio (IBM Almaden and Univ of Toronto)
• Aims at the semi-automatic creation of mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions that map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Similarity Flooding (Stanford Univ and Univ of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
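The propagation idea can be sketched in a heavily simplified form. Two node pairs are treated as neighbours when both graphs contain an edge with the same label between them; in each round a pair's score grows by its neighbours' scores, and all scores are then normalized. This sketch uses unit propagation weights, which is a simplification assumed for the example, and the function name and sample graphs are hypothetical.

```python
from collections import defaultdict

def similarity_flooding(edges_a, edges_b, initial, iterations=10):
    """edges_*: sets of (source, label, target); initial: {(a, b): score}."""
    neighbours = defaultdict(set)
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbours[(a1, b1)].add((a2, b2))
                neighbours[(a2, b2)].add((a1, b1))
    pairs = set(neighbours) | set(initial)
    if not pairs:
        return {}
    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        # Each pair keeps its score and receives its neighbours' scores.
        nxt = {p: sigma[p] + sum(sigma[n] for n in neighbours[p]) for p in pairs}
        top = max(nxt.values()) or 1.0
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma

edges_a = {("Order", "has", "Item")}
edges_b = {("POrder", "has", "Article")}
# Initial element-level match from a (hypothetical) name matcher.
initial = {("Order", "POrder"): 1.0, ("Order", "Article"): 0.0}
sigma = similarity_flooding(edges_a, edges_b, initial)
```

The pair (Item, Article) starts with no name similarity at all, yet ends up with a high score because it sits next to the well-matched pair (Order, POrder) in both graphs.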
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Tess (Univ of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that transforms data conforming to the old schema into data conforming to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching.
Rule-Based Solutions
• Rule-based: hand-crafted rules that exploit schema information
  - element names, data types, structures, and subelements
• For example, two elements match if they have the same name and the same number of subelements
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
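The example rule on this slide can be sketched in a few lines. This is only an illustration: the `Element` class, the function names, and the case-insensitive name comparison are assumptions made for the sketch, not part of any system described in the survey.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A schema element with a name and child elements (hypothetical model)."""
    name: str
    children: list["Element"] = field(default_factory=list)


def rule_match(a: Element, b: Element) -> bool:
    """Rule from the slide: same name and same number of subelements.

    Comparing names case-insensitively is an assumption of this sketch.
    """
    same_name = a.name.lower() == b.name.lower()
    return same_name and len(a.children) == len(b.children)


def match_schemas(source: list[Element], target: list[Element]) -> list[tuple[str, str]]:
    """Return every (source, target) name pair that satisfies the rule."""
    return [(s.name, t.name) for s in source for t in target if rule_match(s, t)]


address_a = Element("Address", [Element("street"), Element("city")])
address_b = Element("address", [Element("line1"), Element("town")])
address_c = Element("Address", [Element("street")])

pairs = match_schemas([address_a], [address_b, address_c])
print(pairs)
```

The rule accepts `address_b` (same name, two subelements) and rejects `address_c` (same name, one subelement). It also shows the weakness of hand-crafted rules: the subelement names themselves are never compared.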
Learner-Based Solutions
• Learner-based: exploit both schema information and data instances
• Require a lot of training data, but can exploit the data instances that rule-based techniques ignore
• Rule-based and learner-based techniques combined provide an effective matching solution
Doan, A., Halevy, A. Semantic Integration Research in the Database Community: A Brief Survey
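As a rough illustration of what a learner-based matcher does, the sketch below trains a small Naive Bayes classifier on example data values for each element of a global schema and then predicts which element a new column of values belongs to. The class name, the whitespace tokenization, and the sample data are all assumptions; none of the surveyed systems is claimed to work exactly this way.

```python
import math
from collections import Counter, defaultdict


def tokens(value: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption of this sketch)."""
    return value.lower().split()


class InstanceLearner:
    """Naive Bayes over instance tokens; labels are global-schema elements."""

    def __init__(self) -> None:
        self.token_counts: dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocabulary: set[str] = set()

    def train(self, label: str, values: list[str]) -> None:
        """Record the tokens seen in example values of one schema element."""
        for value in values:
            self.label_counts[label] += 1
            for tok in tokens(value):
                self.token_counts[label][tok] += 1
                self.vocabulary.add(tok)

    def predict(self, values: list[str]) -> str:
        """Return the label with the highest log-probability (add-one smoothing)."""
        total = sum(self.label_counts.values())
        vocab = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + vocab
            for value in values:
                for tok in tokens(value):
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


learner = InstanceLearner()
learner.train("address", ["12 main street", "4 oak avenue", "9 elm street"])
learner.train("name", ["alice smith", "bob jones", "carol smith"])

predicted = learner.predict(["77 pine street", "3 lake avenue"])
print(predicted)
```

The new column is assigned to `address` because its values share the tokens `street` and `avenue` with the training data. A rule over element names alone could not use that evidence.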
Combining Different Matchers
• The ideal matching system must exploit many different types of information and techniques for maximum accuracy
• More match candidates will be produced if the previous approaches are combined
• Two combination methods:
  1. Hybrid: integrates multiple matching criteria in one matcher. Better performance.
  2. Composite: combines the results of independently executed matchers. More flexible. Can be done automatically or manually.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
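A composite combination can be sketched as a weighted average over matchers that each run independently and return a score between 0 and 1. The two matchers, the weights, and the sample element names below are hypothetical; the survey does not prescribe a particular combination function.

```python
from difflib import SequenceMatcher
from typing import Callable

Matcher = Callable[[str, str], float]


def name_matcher(a: str, b: str) -> float:
    """String similarity of two element names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def make_type_matcher(types: dict[str, str]) -> Matcher:
    """Build a matcher that returns 1.0 when the declared data types agree."""
    def type_matcher(a: str, b: str) -> float:
        return 1.0 if types.get(a) == types.get(b) else 0.0
    return type_matcher


def composite(matchers: list[tuple[Matcher, float]], a: str, b: str) -> float:
    """Weighted average of the results of independently executed matchers."""
    total_weight = sum(weight for _, weight in matchers)
    return sum(matcher(a, b) * weight for matcher, weight in matchers) / total_weight


# Hypothetical element names and data types.
types = {"zip": "string", "zipcode": "string", "price": "decimal"}
matchers = [(name_matcher, 0.7), (make_type_matcher(types), 0.3)]

score_zip = composite(matchers, "zip", "zipcode")
score_price = composite(matchers, "zip", "price")
print(score_zip, score_price)
```

Because each matcher is a separate function, matchers can be added, removed, or re-weighted without touching the others. That is the flexibility the slide attributes to the composite approach. A hybrid matcher would instead fold the name and type criteria into one scoring routine.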
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, and it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input required:
  - The user must provide application-specific match/mismatch relations
  - The user must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element- and structure-level matches
Phase 1: linguistic, element-level
  - categorizes elements based on names, data types, and domains
  - calculates a linguistic similarity coefficient
Phase 2: structure-level
  - transforms the original schema into a tree, then performs bottom-up structure matching
  - calculates a structural similarity value
  - calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
  - uses the weighted mean from Phase 2 to decide on a mapping
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
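The last two phases can be illustrated with a short sketch: a weighted mean of a linguistic and a structural similarity, followed by a threshold decision. The weight of 0.5, the threshold of 0.6, and the sample similarity values are assumptions chosen for the example, not values taken from Cupid.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1.0 - w_struct) * lsim


def select_mapping(
    candidates: dict[tuple[str, str], tuple[float, float]],
    threshold: float = 0.6,
) -> dict[str, str]:
    """For each source element keep the best target whose weighted mean passes the threshold."""
    best: dict[str, tuple[str, float]] = {}
    for (source, target), (lsim, ssim) in candidates.items():
        wsim = weighted_similarity(lsim, ssim)
        if wsim >= threshold and (source not in best or wsim > best[source][1]):
            best[source] = (target, wsim)
    return {source: target for source, (target, _) in best.items()}


# Hypothetical (linguistic, structural) similarities for candidate pairs.
candidates = {
    ("PO", "PurchaseOrder"): (0.8, 0.9),
    ("PO", "Customer"): (0.1, 0.4),
    ("Item", "Line"): (0.5, 0.8),
}

mapping = select_mapping(candidates)
print(mapping)
```

`Item` and `Line` have only moderate linguistic similarity, but the structural score lifts the pair above the threshold. This is the point of combining both kinds of evidence in a hybrid matcher.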
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
  - Schema Readers: read a schema and translate it into an internal representation
  - Correspondence Engine: identifies matching parts of the schemas or databases
  - Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
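The idea of passing an initial element-level match to a structural matcher can be sketched as an iterative propagation over pairs of nodes. The code below is a deliberately simplified illustration, not the published algorithm: it uses unit propagation weights and a fixed number of iterations, and the initial similarities are hard-coded stand-ins for the output of a name matcher. The graphs and all names are hypothetical.

```python
from itertools import product

Edge = tuple[str, str, str]  # (source node, edge label, target node)
Pair = tuple[str, str]       # (node of schema A, node of schema B)


def neighbor_pairs(edges_a: list[Edge], edges_b: list[Edge]) -> dict[Pair, set[Pair]]:
    """Two node pairs are neighbors when edges with the same label connect them."""
    neighbors: dict[Pair, set[Pair]] = {}
    for (a1, label_a, a2), (b1, label_b, b2) in product(edges_a, edges_b):
        if label_a == label_b:
            neighbors.setdefault((a1, b1), set()).add((a2, b2))
            neighbors.setdefault((a2, b2), set()).add((a1, b1))
    return neighbors


def flood(initial: dict[Pair, float], neighbors: dict[Pair, set[Pair]],
          iterations: int = 10) -> dict[Pair, float]:
    """Repeatedly add neighbor similarities to each pair, then normalize by the maximum."""
    sigma = dict(initial)
    for _ in range(iterations):
        updated = {}
        for pair in sigma:
            incoming = sum(sigma.get(q, 0.0) for q in neighbors.get(pair, ()))
            updated[pair] = initial[pair] + sigma[pair] + incoming
        top = max(updated.values()) or 1.0
        sigma = {pair: value / top for pair, value in updated.items()}
    return sigma


edges_a = [("Order", "customer", "Buyer"), ("Order", "amount", "Total")]
edges_b = [("PurchaseOrder", "customer", "Client"), ("PurchaseOrder", "amount", "Sum")]

# Stand-in for a name matcher: only Order/PurchaseOrder look alike by name.
nodes_a = ["Order", "Buyer", "Total"]
nodes_b = ["PurchaseOrder", "Client", "Sum"]
initial = {pair: 0.1 for pair in product(nodes_a, nodes_b)}
initial[("Order", "PurchaseOrder")] = 0.6

sigma = flood(initial, neighbor_pairs(edges_a, edges_b))
print(sigma[("Buyer", "Client")], sigma[("Buyer", "Sum")])
```

`Buyer` and `Client` share no characters in their names, yet the pair ends up well above `Buyer`/`Sum` because both nodes hang off the matched `Order`/`PurchaseOrder` pair through an edge with the same label.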
Delta (MITRE)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other "documents" with matching attributes and can choose from those.
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein, P., Rahm, E. A survey of approaches to automatic schema matching
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
  - A tool for semi-automatically marking up web service descriptions with ontologies
  - It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
  1. Individual elements of the WSDL are matched to concepts in the domain
  2. The WSDL is classified into a domain
  3. The matches are given to the user to accept or reject
  4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main components of the system:
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain.
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
Problem: the difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly.
MWSAF converts both models to a common representation format called SchemaGraph.
A SchemaGraph is a set of nodes connected by edges, created using conversion functions.
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
MWSAF: XML to SchemaGraph conversion rules

    <xsd:complexType name="Direction">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="1"
                     name="compass" nillable="true"
                     type="xsd1:DirectionCompass"/>
        <xsd:element maxOccurs="1" minOccurs="1"
                     name="degrees" type="xsd:int"/>
      </xsd:sequence>
    </xsd:complexType>

[Figure: SchemaNode representation of the XML schema. Nodes: Direction, compass (DirectionCompass), degrees; edge label: hasElement]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
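The conversion shown on this slide can be approximated as follows: each complex type becomes a node, and each element declared inside it becomes a child node linked by a `hasElement` edge. This is a minimal sketch of that one rule only. The wrapper `xsd:schema` element, both namespace URIs, and the function name are assumptions added so the fragment parses on its own; MWSAF's actual conversion functions are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xsd prefix; the slide does not show it.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="http://example.org/weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def schema_graph(xsd_text: str) -> tuple[set[str], list[tuple[str, str, str]]]:
    """Return (nodes, edges); edges are (parent, "hasElement", child) triples."""
    root = ET.fromstring(xsd_text)
    nodes: set[str] = set()
    edges: list[tuple[str, str, str]] = []
    for complex_type in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = complex_type.get("name")
        nodes.add(parent)
        for element in complex_type.iter(f"{{{XSD_NS}}}element"):
            child = element.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
    return nodes, edges


nodes, edges = schema_graph(SCHEMA)
print(sorted(nodes))
print(edges)
```

For the `Direction` type this yields three nodes and two `hasElement` edges, which corresponds to the graph in the figure. Anonymous complex types and element types are ignored in this sketch.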
MWSAF: Ontology to SchemaGraph conversion rules

    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="#Speed"/>
    </daml:Property>

[Figure: SchemaGraph representation of part of the ontology. Nodes: WindEvent, windDirection, windSpeed, Speed; edge label: hasProperty]
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
Mapping
• Measures of the match score:
  - Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked.
  - Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a map set
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
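One plausible reading of this selection step is: for each WSDL element, keep the ontology concept with the highest match score, provided the score clears a threshold. The sketch below implements that reading. The function name `get_best_mapping` only echoes the slide; its signature, the threshold, and the scores are assumptions, not MWSAF's actual implementation.

```python
def get_best_mapping(
    match_scores: dict[tuple[str, str], float],
    threshold: float = 0.5,
) -> dict[str, tuple[str, float]]:
    """Map each WSDL element to its best-scoring ontology concept.

    match_scores maps (wsdl_element, ontology_concept) to a score in [0, 1].
    Pairs scoring below the (hypothetical) threshold are discarded.
    """
    best: dict[str, tuple[str, float]] = {}
    for (element, concept), score in match_scores.items():
        if score < threshold:
            continue
        if element not in best or score > best[element][1]:
            best[element] = (concept, score)
    return best


# Hypothetical match scores between WSDL elements and ontology concepts.
match_scores = {
    ("Direction", "windDirection"): 0.8,
    ("Direction", "WindEvent"): 0.4,
    ("compass", "windDirection"): 0.6,
    ("degrees", "Speed"): 0.3,
}

map_set = get_best_mapping(match_scores)
print(map_set)
```

`degrees` stays unmapped because its only candidate scores below the threshold. Leaving an element unmapped is usually preferable to presenting the user with a weak suggestion to reject.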
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
  - NGram: considers the number of q-grams that the names have in common
  - CheckSynonym: uses WordNet to find synonyms
  - CheckAbbreviations: uses an abbreviation dictionary
  - TokenMatcher: uses Porter stemming, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
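A q-gram matcher of the kind listed above can be sketched as follows. The slide says only that the number of shared q-grams is considered; normalizing with the Dice coefficient, using q = 3, and lowercasing the names are assumptions of this sketch.

```python
def qgrams(name: str, q: int = 3) -> set[str]:
    """Set of character q-grams of a lowercased name."""
    text = name.lower()
    if len(text) < q:
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over shared q-grams; always between 0 and 1."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    common = len(grams_a & grams_b)
    return 2 * common / (len(grams_a) + len(grams_b))


same = ngram_similarity("windSpeed", "windspeed")
related = ngram_similarity("windSpeed", "windDirection")
unrelated = ngram_similarity("windSpeed", "degrees")
print(same, related, unrelated)
```

Names that differ only in case score 1.0. `windSpeed` and `windDirection` share the trigrams `win` and `ind` and score a little over 0.2. `windSpeed` and `degrees` share nothing and score 0.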
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from the mapping:
  - Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
  - Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil, A., Oundhakar, S., Sheth, A., Verma, K. METEOR-S Web service Annotation Framework
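The slide does not give formulas for the two measures. The sketch below assumes one natural definition: the average concept match divides the summed match scores by the number of mapped elements, while the average service match divides the same sum by the total number of elements in the WSDL. The domain whose ontology yields the highest average service match is then chosen. The definitions, function names, and data are assumptions.

```python
def average_concept_match(mapping: dict[str, float]) -> float:
    """Mean score over the WSDL elements that were actually mapped (assumed definition)."""
    if not mapping:
        return 0.0
    return sum(mapping.values()) / len(mapping)


def average_service_match(mapping: dict[str, float], total_elements: int) -> float:
    """Summed scores divided by all WSDL elements, mapped or not (assumed definition)."""
    if total_elements == 0:
        return 0.0
    return sum(mapping.values()) / total_elements


def classify(mappings: dict[str, dict[str, float]], total_elements: int) -> str:
    """Pick the ontology (domain) with the highest average service match."""
    return max(
        mappings,
        key=lambda name: average_service_match(mappings[name], total_elements),
    )


# Hypothetical mappings of a four-element WSDL against two ontologies.
mappings = {
    "weather": {"Direction": 0.9, "compass": 0.7, "windSpeed": 0.8},
    "finance": {"Direction": 0.95},
}

domain = classify(mappings, total_elements=4)
print(domain)
```

Under these definitions the finance ontology has the higher average concept match (0.95 against 0.8) but covers only one of the four elements, so its average service match is far lower. That would explain why the service match, not the concept match, is the measure used for categorization.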
Combining Different MatchersCombining Different Matchersbull The ideal matching system must exploit many different types of
information and technique for maximum accuracy
bull More match candidates will be produced if the previous approaches are combined
bull Two Combination Methods 1 Hybrid integrates multiple matching criteria Better performance 2 Composite combine the results of independently executed matchers More flexible Can be done automatically or manually
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
LSD (Univ of Washington)LSD (Univ of Washington)
bull Learning Source Descriptions
bull Uses machine learning techniques to match a new data source against a previously determined global schema
bull Uses a name matcher and several instance-level matchers
bull System is trained with sample user inputs and it learns patterns and matching rules
bull Mostly instance-oriented but can use schema information too
bull Also supports user input domain constraints on the global schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
    <daml:Class rdf:ID="WindEvent">
      <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
      <rdfs:label>Wind event</rdfs:label>
      <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
    </daml:Class>
    <daml:Property rdf:ID="windDirection">
      <rdfs:label>Wind direction</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
    </daml:Property>
    <daml:Property rdf:ID="windSpeed">
      <rdfs:label>Wind speed</rdfs:label>
      <rdfs:domain rdf:resource="#WindEvent"/>
      <rdfs:range rdf:resource="#Speed"/>
    </daml:Property>
[Figure: SchemaGraph representation of part of ontology. Nodes: WindEvent, windDirection, windSpeed, Speed. Edge label: hasProperty.]
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework.
Mapping
• Measures of the match score:
- Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are checked as well.
- Schema match: structural similarity, based on the similarity of sub-concepts.
• The getBestMapping function then looks at the match scores and determines a set of mappings.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework.
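The slide does not say how getBestMapping chooses among candidate pairs. One simple possibility is a greedy one-to-one selection over the match scores, sketched below. The threshold, the greedy strategy, and the sample scores are all assumptions made for illustration; they are not taken from MWSAF.

```python
def get_best_mapping(scores, threshold=0.5):
    """Greedy one-to-one selection over {(source, target): score}.

    Pairs are visited from highest to lowest score; a pair is kept only if
    neither of its concepts has already been mapped.
    """
    mapping = {}
    used_targets = set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (src, tgt), score in ranked:
        if score < threshold:
            break  # everything after this point scores lower still
        if src in mapping or tgt in used_targets:
            continue
        mapping[src] = (tgt, score)
        used_targets.add(tgt)
    return mapping


# Made-up match scores between WSDL concepts and ontology concepts.
scores = {
    ("degrees", "windSpeed"): 0.2,
    ("compass", "windDirection"): 0.7,
    ("Direction", "windDirection"): 0.8,
    ("Direction", "WindEvent"): 0.6,
}
print(get_best_mapping(scores))
```

Greedy selection is easy to follow but not optimal: here `compass` ends up unmapped because `windDirection` was claimed by a higher-scoring pair. An assignment algorithm would trade simplicity for a better overall mapping.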
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework.
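As an illustration of the NGram matcher, the sketch below scores two names by the q-grams they share. The slides do not give the exact formula, so the Dice coefficient used here is an assumption; it was chosen only because it yields a value between 0 and 1, as the slide requires.

```python
def qgrams(name, q=3):
    """Return the set of lowercase q-grams of a name."""
    s = name.lower()
    if len(s) < q:
        # Short names contribute themselves as a single gram.
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over the two q-gram sets, always in [0, 1]."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


print(ngram_similarity("windSpeed", "wind_speed"))
print(ngram_similarity("windSpeed", "degrees"))
```

Because the grams are compared as sets, repeated q-grams within one name count only once; a multiset variant would weight repetitions.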
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology.
Two measures are then derived from each mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and the ontology
- Average Service Match: helps to categorize the service
We have a machine learning alternative for categorization.
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework.
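The slide names the two measures but does not define them. A plausible reading, assumed here, is that the Average Concept Match averages the match scores over the concepts that were actually mapped, while the Average Service Match divides the same sum by the total number of concepts in the WSDL, so that an ontology covering more of the service ranks higher. The function names and sample numbers are hypothetical.

```python
def average_concept_match(mapping):
    """Mean score over the mapped concepts only."""
    if not mapping:
        return 0.0
    return sum(mapping.values()) / len(mapping)


def average_service_match(mapping, total_wsdl_concepts):
    """Sum of scores divided by all WSDL concepts, mapped or not."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(mapping.values()) / total_wsdl_concepts


def categorize(mappings_by_ontology, total_wsdl_concepts):
    """Pick the ontology (domain) with the highest Average Service Match."""
    return max(
        mappings_by_ontology,
        key=lambda name: average_service_match(
            mappings_by_ontology[name], total_wsdl_concepts
        ),
    )


# Made-up mappings of one WSDL (4 concepts) against two ontologies.
mappings = {
    "weather": {"Direction": 0.8, "degrees": 0.6},
    "geography": {"Direction": 0.9},
}
print(categorize(mappings, total_wsdl_concepts=4))
```

In this example the geography ontology has the better Average Concept Match (its single match is strong), but the weather ontology covers more of the service and therefore wins the categorization.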
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
More Issues
• If we require user acceptance for our matches, what happens when our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey.
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching:
- Schema-level vs. instance-level
- Element-level vs. structure-level
- Language-based and constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
LSD (Univ. of Washington)
• Learning Source Descriptions
• Uses machine learning techniques to match a new data source against a previously determined global schema
• Uses a name matcher and several instance-level matchers
• The system is trained with sample user inputs, from which it learns patterns and matching rules
• Mostly instance-oriented, but can use schema information too
• Also supports user-supplied domain constraints on the global schema
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
SKAT (Stanford University)
• Semantic Knowledge Articulation Tool
• Follows a rule-based approach to semi-automatically determine matches between two ontologies
• User input is required: the user must provide application-specific match/mismatch relations and must approve or reject matches
• SKAT matching is used within the ONION architecture for ontology integration
• In ONION, an "articulation ontology" is constructed from the rules. Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
DIKE (Univ. of Reggio Calabria, Univ. of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• The user must specify synonyms, homonyms, and inclusion properties
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
Cupid (Microsoft Research)
• Hybrid matcher
• Element-level and structure-level matches
Phase 1: linguistic element-level matching
- categorizes elements based on names, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of the linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
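The weighted mean in phase 2 and the decision in phase 3 can be illustrated as follows. The weight `w_struct`, the acceptance threshold, and the sample similarity values are assumptions for the sketch; the slide states only that a weighted mean of the two similarities is used to decide on a mapping.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


def decide_mapping(pairs, w_struct=0.5, threshold=0.6):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    Each entry of pairs is (element_a, element_b, lsim, ssim).
    """
    return [
        (a, b)
        for (a, b, lsim, ssim) in pairs
        if weighted_similarity(lsim, ssim, w_struct) >= threshold
    ]


# Made-up similarity values for two candidate pairs.
candidates = [
    ("Address", "Addr", 0.9, 0.7),
    ("Address", "Phone", 0.1, 0.3),
]
print(decide_mapping(candidates))
```

Raising `w_struct` makes the decision depend more on where an element sits in the schema tree and less on its name.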
Clio (IBM Almaden and Univ. of Toronto)
• Aims at the semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
Similarity Flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match, which is then given to the structural matcher
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
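The idea of handing an initial name-based match to a structural matcher can be sketched as a small fixpoint iteration. This is a deliberately simplified variant, not the published algorithm: all propagation weights are 1, the number of iterations is fixed, and the graphs and initial scores are made up.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, initial, iterations=10):
    """Propagate similarity between node pairs of two labeled graphs.

    edges_a, edges_b: lists of (source, label, target) edges.
    initial: {(node_a, node_b): score} from an element-level name matcher.
    """
    # Two node pairs are neighbours when edges with the same label
    # connect them in both graphs.
    neighbours = defaultdict(set)
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbours[(a1, b1)].add((a2, b2))
                neighbours[(a2, b2)].add((a1, b1))

    pairs = set(neighbours) | set(initial)
    if not pairs:
        return {}
    sigma = {p: initial.get(p, 0.0) for p in pairs}
    for _ in range(iterations):
        # Each pair keeps its own score and receives its neighbours' scores.
        nxt = {
            p: initial.get(p, 0.0) + sigma[p] + sum(sigma[n] for n in neighbours[p])
            for p in pairs
        }
        top = max(nxt.values()) or 1.0
        sigma = {p: value / top for p, value in nxt.items()}  # normalize to [0, 1]
    return sigma


graph_a = [("Order", "has", "item")]
graph_b = [("Purchase", "has", "product")]
seed = {("item", "product"): 1.0}  # assumed output of a name matcher
result = similarity_flooding(graph_a, graph_b, seed)
print(result)
```

Although the name matcher said nothing about `Order` and `Purchase`, the pair gains similarity because both nodes point through a `has` edge to nodes that were already matched.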
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is treated as a document. The user is then presented with other "documents" with matching attributes and can choose from those
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P, Rahm E. A Survey of Approaches to Automatic Schema Matching.
SKAT (Stanford University)SKAT (Stanford University)
bull Semantic Knowledge Articulation Toolbull Follows a rule-based approach to semi-automatically determine
matches between two ontologies
bull User input required The user must provide application specific matchmismatch relations The user must approve or reject matches
bull SKAT matching is used within the ONION architecture for ontology integration
bull In ONION an ldquoarticulation ontologyrdquo is constructed from the rules Matching is based on is-a relationships between the articulation ontology and the source ontology
Bernstein P Rahm E A survey of approaches to automatic schema matching
TransScm (Tel Aviv University)TransScm (Tel Aviv University)
bull Uses schema matching to derive an automatic data translation between schema instances
bull Schemas are transformed into labeled graphs
bull Matching is performed node by node (element-level 11) starting at the top
bull Requires user intervention if no match is found (ie to provide a new rule)
Bernstein P Rahm E A survey of approaches to automatic schema matching
DIKE (Univ of Reggio DIKE (Univ of Reggio Calabria Univ of Calabria)Calabria Univ of Calabria)
bull Compares pairs of objects by their attributes and the is-a relationships that they are involved in
bull These pairs are given a match score between 0 and 1
bull User must specify synonyms homonyms and inclusion properties
Bernstein P Rahm E A survey of approaches to automatic schema matching
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: SchemaGraphs
PROBLEM: The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find the mappings between them
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="compass" nillable="true"
                 type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema, with labels Direction, compass, degrees, DirectionCompass, and hasElement]
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
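The conversion idea on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the MWSAF parser: the function name `xsd_to_schema_graph`, the `hasType` edge label, the wrapping `xsd:schema` element, and the `xsd1` namespace URI are all hypothetical. Only the `Direction` type and the `hasElement` label come from the slide.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The slide's fragment, wrapped in a schema element so that it parses.
# The xsd1 namespace URI is a placeholder chosen for this sketch.
XSD_SNIPPET = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="http://example.org/weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1"
                   name="compass" nillable="true"
                   type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1"
                   name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_schema_graph(xsd_text):
    """Return (nodes, edges) for the complex types in an XSD document.

    Assumed rule: each complexType and each of its elements becomes a node,
    joined by a 'hasElement' edge. Element types outside the 'xsd' prefix
    become nodes too, joined by a 'hasType' edge (a label invented here).
    """
    root = ET.fromstring(xsd_text)
    nodes, edges = set(), []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        nodes.add(parent)
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            child = elem.get("name")
            nodes.add(child)
            edges.append((parent, "hasElement", child))
            # Simplification: built-in types are recognised by the literal
            # prefix 'xsd' rather than by resolving the namespace.
            prefix, _, local = elem.get("type", "").rpartition(":")
            if local and prefix != "xsd":
                nodes.add(local)
                edges.append((child, "hasType", local))
    return nodes, edges


nodes, edges = xsd_to_schema_graph(XSD_SNIPPET)
```

Keeping the graph as a plain set of nodes plus a list of labeled edges makes it easy to compare with a graph built from an ontology, which is the point of the common representation.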
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: SchemaGraph representation of part of ontology, with labels WindEvent, windDirection, windSpeed, Speed, and hasProperty]
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
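A comparable sketch for the ontology side, again in Python and again only an assumption about how such a conversion might look. The function names and the `hasRange` edge label are hypothetical; the `hasProperty` label and the `WindEvent` example come from the slide. The labels and comments of the slide's fragment are omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

ONTOLOGY_SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:daml="{DAML}">
  <daml:Class rdf:ID="WindEvent">
    <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
  </daml:Class>
  <daml:Property rdf:ID="windDirection">
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>
</rdf:RDF>
"""


def local_name(uri):
    """Return the fragment after '#', or the whole string if there is none."""
    return uri.rpartition("#")[2]


def ontology_to_schema_graph(rdf_text):
    """Return (nodes, edges) for the classes and properties in the ontology.

    Assumed rule: classes become nodes, each property becomes a node attached
    to its domain class by 'hasProperty', and a range that refers to a local
    class is attached by 'hasRange' (a label invented for this sketch).
    """
    root = ET.fromstring(rdf_text)
    nodes, edges = set(), []
    for cls in root.iter(f"{{{DAML}}}Class"):
        nodes.add(cls.get(f"{{{RDF}}}ID"))
    for prop in root.iter(f"{{{DAML}}}Property"):
        name = prop.get(f"{{{RDF}}}ID")
        domain = prop.find(f"{{{RDFS}}}domain")
        if domain is None:
            continue  # no owning class, so nothing to attach the property to
        owner = local_name(domain.get(f"{{{RDF}}}resource", ""))
        nodes.update({owner, name})
        edges.append((owner, "hasProperty", name))
        range_ = prop.find(f"{{{RDFS}}}range")
        target = range_.get(f"{{{RDF}}}resource", "") if range_ is not None else ""
        if target.startswith("#"):  # local class, not an XML Schema datatype
            nodes.add(local_name(target))
            edges.append((name, "hasRange", local_name(target)))
    return nodes, edges


onto_nodes, onto_edges = ontology_to_schema_graph(ONTOLOGY_SNIPPET)
```

The output uses the same (nodes, edges) shape as a graph built from XML Schema would, so one matcher can work on both.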
Mapping
• Measures of the Match Score
- Element Level Match: linguistic similarity of two concepts based on names. Uses WordNet to check for synonyms. Abbreviations are also checked
- Schema Match: structural similarity (sub-concept similarities)
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
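How a map set might be derived from match scores can be sketched as follows. The slide confirms only that getBestMapping looks at the match scores. The weighted-mean combination, the weights, the threshold, and the "best concept per element" rule below are assumptions made for illustration.

```python
def match_score(elem_match, schema_match, w_elem=0.5, w_schema=0.5):
    """Weighted mean of the two measures (the weights are assumed values)."""
    return (w_elem * elem_match + w_schema * schema_match) / (w_elem + w_schema)


def get_best_mapping(scores, threshold=0.5):
    """Pick, for each WSDL element, the ontology concept with the highest score.

    scores: {wsdl_element: {ontology_concept: match_score}}
    Elements whose best score falls below `threshold` stay unmapped.
    """
    mapping = {}
    for element, candidates in scores.items():
        if not candidates:
            continue
        concept, score = max(candidates.items(), key=lambda kv: kv[1])
        if score >= threshold:
            mapping[element] = (concept, score)
    return mapping


# Hypothetical scores for two WSDL elements against a weather ontology.
scores = {
    "windSpeed": {
        "WindSpeed": match_score(1.0, 0.5),
        "WindEvent": match_score(0.5, 0.0),
    },
    "zipCode": {"WindEvent": match_score(0.0, 0.0)},
}
best = get_best_mapping(scores)
```

Here `windSpeed` maps to `WindSpeed`, while `zipCode` stays unmapped because no candidate reaches the threshold.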
MWSAF Matching Techniques: ElemMatch
• Name and String Matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter Stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
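Two of these matchers are simple enough to sketch. The slide does not give the q-gram formula or the combining equation, so the Dice coefficient over 3-grams, the abbreviation entries, and the use of `max` to combine the scores are all assumptions made for this example.

```python
def qgrams(name, q=3):
    """Return the set of lowercase substrings of length q in `name`."""
    s = name.lower()
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over shared q-grams, a value between 0 and 1."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Hypothetical abbreviation dictionary; a real one would be much larger.
ABBREVIATIONS = {"temp": "temperature", "dir": "direction"}


def check_abbreviation(a, b):
    """Return 1.0 if one name is a known abbreviation of the other, else 0.0."""
    a, b = a.lower(), b.lower()
    known = ABBREVIATIONS.get(a) == b or ABBREVIATIONS.get(b) == a
    return 1.0 if known else 0.0


def elem_match(a, b):
    """Combine the matchers by taking the best score (an assumed rule)."""
    return max(ngram_similarity(a, b), check_abbreviation(a, b))
```

For example, `ngram_similarity("windSpeed", "speed")` shares three of the ten 3-grams counted across both names, and `elem_match("dir", "direction")` is lifted to 1.0 by the abbreviation lookup even though the q-gram overlap is small.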
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology
• Two measures are then derived from each mapping
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework
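The two measures can be sketched as follows. The slide does not state the formulas, so this assumes that the concept measure averages over mapped concepts only, while the service measure divides by all WSDL concepts, mapped or not. The domain names and scores are made up.

```python
def average_concept_match(mapping):
    """Mean match score over the concepts that were actually mapped."""
    if not mapping:
        return 0.0
    return sum(mapping.values()) / len(mapping)


def average_service_match(mapping, total_wsdl_concepts):
    """Sum of mapped scores divided by the number of all WSDL concepts."""
    if total_wsdl_concepts == 0:
        return 0.0
    return sum(mapping.values()) / total_wsdl_concepts


def categorize(mappings_by_domain, total_wsdl_concepts):
    """Return the domain whose ontology gives the best average service match."""
    return max(
        mappings_by_domain,
        key=lambda domain: average_service_match(
            mappings_by_domain[domain], total_wsdl_concepts
        ),
    )


# Hypothetical mappings of one WSDL with four concepts against two ontologies.
mappings = {
    "weather": {"windSpeed": 1.0, "windDirection": 0.5},
    "finance": {"rate": 0.5},
}
domain = categorize(mappings, total_wsdl_concepts=4)
```

Under these assumptions a service with a few strong matches but many unmapped concepts scores high on the concept measure and low on the service measure, which is why only the latter is used for categorization here.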
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize impact of the feedback
• Real World Analysis: can the current matching techniques be used in real world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, then what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and the descriptions of the existing approaches for matching
- Schema vs Instance-level
- Element vs Structure-level
- Language and Constraint based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P, Rahm E. A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A, Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A, Oundhakar S, Sheth A, Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Christophides V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
TransScm (Tel Aviv University)
• Uses schema matching to derive an automatic data translation between schema instances
• Schemas are transformed into labeled graphs
• Matching is performed node by node (element-level, 1:1), starting at the top
• Requires user intervention if no match is found (i.e., to provide a new rule)
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
DIKE (Univ of Reggio Calabria, Univ of Calabria)
• Compares pairs of objects by their attributes and the is-a relationships that they are involved in
• These pairs are given a match score between 0 and 1
• User must specify synonyms, homonyms, and inclusion properties
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
Cupid (Microsoft Research)
• Hybrid matcher
• Element and Structural-Level matches
Phase 1: Linguistic Element-Level
- categorizes elements based on name, data types, and domains
- calculates a linguistic similarity coefficient
Phase 2:
- transforms the original schema into a tree, then performs bottom-up structure matching
- calculates a similarity value
- calculates a weighted mean of linguistic and structural similarity of pairs of elements
Phase 3:
- uses the mean from phase 2 to decide on a mapping
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
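Phases 2 and 3 can be illustrated with a small sketch. It is not Cupid's implementation: the weight, the threshold, and the example similarity values are assumptions. It shows only the weighted mean and a threshold-based decision.

```python
def weighted_similarity(lsim, ssim, w_struct=0.5):
    """Weighted mean of linguistic (lsim) and structural (ssim) similarity."""
    return w_struct * ssim + (1 - w_struct) * lsim


def decide_mapping(pairs, w_struct=0.5, threshold=0.5):
    """Keep the element pairs whose weighted similarity reaches the threshold.

    pairs: {(source_element, target_element): (lsim, ssim)}
    """
    accepted = {}
    for pair, (lsim, ssim) in pairs.items():
        wsim = weighted_similarity(lsim, ssim, w_struct)
        if wsim >= threshold:
            accepted[pair] = wsim
    return accepted


# Hypothetical similarity values for two candidate pairs.
candidates = {
    ("PO.ShipTo", "Order.DeliverTo"): (0.75, 0.25),
    ("PO.ShipTo", "Order.Invoice"): (0.25, 0.25),
}
mapping = decide_mapping(candidates)
```

Raising `w_struct` makes the decision lean on structure, which helps when names differ but the surrounding elements agree.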
Clio (IBM Almaden and Univ of Toronto)
• Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three Components
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: is used to identify matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
Similarity flooding (Stanford Univ and Univ of Leipzig)
• Graph Matching Algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P, Rahm E. A survey of approaches to automatic schema matching
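The "metadata as document" idea can be sketched with a simple token-overlap ranking. This is a stand-in rather than Delta's retrieval method: the Jaccard measure, the function names, and the sample attributes are assumptions chosen to keep the example small.

```python
def to_document(metadata):
    """Flatten an attribute's metadata dict into one lowercase text string."""
    return " ".join(str(value) for value in metadata.values()).lower()


def jaccard(doc_a, doc_b):
    """Share of distinct tokens that the two documents have in common."""
    tokens_a, tokens_b = set(doc_a.split()), set(doc_b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def rank_candidates(query, candidates):
    """Return candidate attribute names ordered from most to least similar."""
    query_doc = to_document(query)
    scored = [
        (jaccard(query_doc, to_document(meta)), name)
        for name, meta in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical attribute metadata from two schemas.
query = {"name": "cust_phone", "description": "customer telephone number", "type": "char"}
candidates = {
    "order_total": {"name": "order_total", "description": "total order amount", "type": "decimal"},
    "phone_no": {"name": "phone_no", "description": "telephone number of customer", "type": "char"},
}
ranked = rank_candidates(query, candidates)
```

The ranked list is what a user would choose from, so the final decision stays with the user, as on the slide.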
Cupid (Microsoft Research)Cupid (Microsoft Research)bull Hybrid matcherbull Element and Structural-Level matches
Phase 1 Linguistic Element-Level - categorizes elements based on name data types and domains - calculates a linguistic similarity coefficient Phase 2 - transform the original schema into a tree then perform a bottom-up structure
matching - calculates a similarity value - calculates a weighted mean of linguistic and structural similarity of pairs of
elements
Phase 3 - uses the mean from phase 2 to decide on a mapping
Bernstein P Rahm E A survey of approaches to automatic schema matching
Clio (IBM Almaden and Univ Clio (IBM Almaden and Univ of Toronto)of Toronto)
bull Aims at a semi-automatic creation of match mappings between a given target schema and a new data source schema
bull Three Components Schema Readers read schema and translate it into an
internal representation Correspondence Engine is used to identify matching parts
of the schemas or databases Mapping Generator generates view definitions to map data
in the source schema to data in the target schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
Similarity flooding (Stanford Similarity flooding (Stanford Univ and Univ of Leipzig)Univ and Univ of Leipzig)
bull Graph Matching Algorithm
bull Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
bull Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P Rahm E A survey of approaches to automatic schema matching
Delta (Mitre)Delta (Mitre)
bull Uses attribute descriptions to determine attribute matches
bull The method is to group the metadata about an attribute into a text string which is presented as a document The user is then presented with other lsquodocumentsrsquo with matching attributes and can chose from those
Bernstein P Rahm E A survey of approaches to automatic schema matching
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation FrameworkMWSAF Meteor-S Web Service Annotation FrameworkOntology to SchemaGraph conversion rulesOntology to SchemaGraph conversion rules
ltdamlClass rdfID=WindEventgt ltrdfscommentgtSuperclass for all events dealing with windltrdfscommentgt ltrdfslabelgtWind eventltrdfslabelgt ltrdfssubClassOf rdfresource=WeatherEvent gt ltdamlClassgtltdamlProperty rdfID=windDirectiongt ltrdfslabelgtWind directionltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource = httpwwww3org200010XMLSchemastring gt ltdamlPropertygtltdamlProperty rdfID=windSpeedgt ltrdfslabelgtWind speedltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource=Speed gt ltdamlPropertygt
WindEvent
windDirection Speed
hasProperty windSpeed
SchemaGraph representation of part of ontologyPatil A Oundhakar S Sheth A Verma K METEOR-S Web service
Annotation Framework
MappingMapping
bull Measures of the Match Score
-Element Level Match linguistic similarity of two concepts based on names Uses WordNet to check for synonyms Abbreviations are even checked
-Schema Match structural similarity sub-concept similarities
bull The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching TechniquesMWSAF Matching TechniquesElemMatchElemMatch
bull Name and String Matching algorithms
-NGram considers the number of qgrams that the names have in common
-CheckSynonym uses Wordnet to find synonyms -CheckAbbreviations uses an abbreviation dictionary -TokenMatcher uses Porter Stemmer tonkenization and
substring matching techniques bull Each algorithm returns a value between 0 and 1 These
values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MatchingMatching
bull Once Each WSDL is compared against all of the ontologies in the store and a mapping has been created for each ontology
Then two measures are derived from the mapping
-Average Concept Match tells the user about the degree of similarity between matched concepts of the WSDL and ontology
-Average Service Match helps to categorize the service
We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Current and Future Issues
• User Interaction: minimize user input but maximize the impact of the feedback
• Real World Analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping Maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
More Issues
• If we require user acceptance for our matches, what happens if our matcher returns hundreds or thousands of matches?
• Is it unrealistic to think that we will eventually perfect our matchers?
Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey
Conclusion
• It is necessary to automate the matching process
• Schema matching is very difficult and expensive
• We have looked at a taxonomy and descriptions of the existing approaches to matching
- Schema vs. Instance-level
- Element vs. Structure-level
- Language- and Constraint-based matchers
• We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E. A survey of approaches to automatic schema matching. wwwresearchmicrosoftcom~philbeVLDBJ-Dec2001pdf
• Doan A., Halevy A. Semantic Integration Research in the Database Community: A Brief Survey. httpanhaicsuiucedupublicdb-review14pdf
• Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework. POSV-WWW2004pdf
• Christophides V. Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftpftpdagstuhldepubProceedings040439104391ChristophidesVassilisSlidespdf
Questions
Clio (IBM Almaden and Univ. of Toronto)
• Aims at semi-automatic creation of match mappings between a given target schema and a new data source schema
• Three components:
- Schema Readers: read a schema and translate it into an internal representation
- Correspondence Engine: identifies matching parts of the schemas or databases
- Mapping Generator: generates view definitions to map data in the source schema to data in the target schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
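To make the Mapping Generator step concrete, the sketch below turns a list of column correspondences into a SQL view definition. The function name generate_view and the table and column names are hypothetical, and a single-table SELECT is a deliberate simplification; it is not a description of how Clio itself builds its views.

```python
from typing import Iterable, Tuple


def generate_view(view_name: str, source_table: str,
                  correspondences: Iterable[Tuple[str, str]]) -> str:
    """Build a view that exposes source columns under their target-schema names."""
    columns = ", ".join(f"{src} AS {tgt}" for src, tgt in correspondences)
    return f"CREATE VIEW {view_name} AS SELECT {columns} FROM {source_table}"
```

Each (source, target) pair would come from the correspondence step; the view then lets queries written against the target schema read data stored in the source schema.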
Similarity flooding (Stanford Univ. and Univ. of Leipzig)
• Graph matching algorithm
• Converts schemas into directed labeled graphs and determines the matches between corresponding nodes of the graphs
• Uses a name matcher to get an initial element-level match that is then given to the structural matcher
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
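The idea of handing an initial name-based match to a structural matcher can be sketched as a fixpoint iteration: similarity of a node pair flows to neighboring pairs that are connected by edges with the same label in both graphs. This is a simplified version under stated assumptions (unweighted propagation in both directions, normalization by the maximum, a fixed number of iterations); the published algorithm uses weighted propagation coefficients and a final filtering step that are not shown.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)
Pair = Tuple[str, str]       # (node from graph A, node from graph B)


def similarity_flooding(edges_a: List[Edge], edges_b: List[Edge],
                        initial: Dict[Pair, float],
                        iterations: int = 10) -> Dict[Pair, float]:
    """Propagate initial similarities between structurally connected node pairs."""
    nodes_a = {n for s, _, t in edges_a for n in (s, t)}
    nodes_b = {n for s, _, t in edges_b for n in (s, t)}
    pairs = list(product(nodes_a, nodes_b))
    if not pairs:
        return {}
    sigma0 = {p: initial.get(p, 0.0) for p in pairs}

    # Two pairs are neighbors when both graphs have an equally labeled edge between them.
    neighbors: Dict[Pair, List[Pair]] = {p: [] for p in pairs}
    for (s1, l1, t1), (s2, l2, t2) in product(edges_a, edges_b):
        if l1 == l2:
            neighbors[(s1, s2)].append((t1, t2))
            neighbors[(t1, t2)].append((s1, s2))

    sigma = dict(sigma0)
    for _ in range(iterations):
        nxt = {p: sigma0[p] + sigma[p] + sum(sigma[q] for q in neighbors[p])
               for p in pairs}
        top = max(nxt.values()) or 1.0  # avoid dividing by zero when nothing matches
        sigma = {p: v / top for p, v in nxt.items()}
    return sigma
```

In the result, a pair of parent nodes gains similarity because their children were similar by name, even if the parents' own names had nothing in common.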
Delta (Mitre)
• Uses attribute descriptions to determine attribute matches
• The method is to group the metadata about an attribute into a text string, which is presented as a document. The user is then presented with other 'documents' with matching attributes and can choose from those
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
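A minimal sketch of the document-style approach: each attribute's metadata is flattened into one text string, and candidate attributes are ranked by how similar their strings are to the query attribute's string. Cosine similarity over raw word counts is an assumption chosen for brevity, and the attribute names and descriptions are made up; the slides do not say which retrieval technique Delta uses.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def to_document(metadata: Dict[str, str]) -> str:
    """Group all metadata values of an attribute into one text string."""
    return " ".join(str(value) for value in metadata.values())


def cosine(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two documents."""
    va = Counter(re.findall(r"\w+", doc_a.lower()))
    vb = Counter(re.findall(r"\w+", doc_b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank_candidates(query: Dict[str, str],
                    candidates: Dict[str, Dict[str, str]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k candidate attributes, best match first, for the user to choose from."""
    query_doc = to_document(query)
    scored = [(name, cosine(query_doc, to_document(meta)))
              for name, meta in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```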
Tess (Univ. of Massachusetts, Amherst)
• System for helping to cope with schema evolution
• Takes definitions of the old and new schemas and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P., Rahm E. A survey of approaches to automatic schema matching
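The notion of producing a transformation program from two schema definitions can be illustrated with a function that returns another function. In this sketch the rename hints and default values are supplied by hand, which is an assumption made to keep the example short; a system like Tess would have to work out the correspondences between the old and new definitions itself.

```python
from typing import Any, Callable, Dict, List, Optional


def make_transformer(old_fields: List[str], new_fields: List[str],
                     renames: Optional[Dict[str, str]] = None,
                     defaults: Optional[Dict[str, Any]] = None
                     ) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Return a function that converts old-schema records into new-schema records."""
    renames = renames or {}    # old field name -> new field name
    defaults = defaults or {}  # values for fields that only exist in the new schema

    def transform(record: Dict[str, Any]) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for field in new_fields:
            # Find the old field that feeds this new field (itself unless renamed).
            source = next((old for old, new in renames.items() if new == field), field)
            if source in old_fields and source in record:
                out[field] = record[source]
            else:
                out[field] = defaults.get(field)
        return out

    return transform
```

Fields that exist only in the old schema are dropped, renamed fields are carried over, and new fields receive their default.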
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directories
• Conclusion
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
- A tool for semi-automatically marking up web service descriptions with ontologies
- It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework
MWSAF: Schema Graphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
MWSAF then applies a matching algorithm to the two SchemaGraphs to find the mappings between them
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="compass" nillable="true"
                 type="xsd1:DirectionCompass" />
    <xsd:element maxOccurs="1" minOccurs="1"
                 name="degrees" type="xsd:int" />
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema, showing the nodes Direction, compass, degrees and DirectionCompass and the edge label hasElement]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework
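A minimal sketch of such a conversion, assuming a SchemaGraph can be represented as a list of (parent, edge label, child) tuples: each named complexType becomes a node, and each element inside it becomes a child node reached by a hasElement edge. The tuple representation, the function name, and the urn:example:weather namespace are hypothetical; the full set of conversion rules is in the cited paper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XSD = "http://www.w3.org/2001/XMLSchema"

# Example schema; the xsd1 namespace URI is a placeholder for illustration.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element name="compass" type="xsd1:DirectionCompass"/>
      <xsd:element name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def xsd_to_schema_graph(xml_text: str) -> List[Tuple[str, str, str]]:
    """Turn each complexType and its elements into hasElement edges."""
    root = ET.fromstring(xml_text.strip())
    edges = []
    for ctype in root.iter(f"{{{XSD}}}complexType"):
        parent = ctype.get("name")
        if parent is None:  # anonymous types are skipped in this sketch
            continue
        for elem in ctype.iter(f"{{{XSD}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges
```

Calling xsd_to_schema_graph(SCHEMA) yields one hasElement edge from Direction to compass and one from Direction to degrees.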
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent" />
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string" />
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent" />
  <rdfs:range rdf:resource="#Speed" />
</daml:Property>
[Figure: SchemaGraph representation of part of ontology, showing the nodes WindEvent, windDirection, windSpeed and Speed and the edge label hasProperty]
Patil A., Oundhakar S., Sheth A., Verma K. METEOR-S Web Service Annotation Framework
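The ontology side can be sketched the same way, again assuming a SchemaGraph is a list of (parent, edge label, child) tuples: every property becomes a child of the class named in its rdfs:domain, connected by a hasProperty edge. The function name and the trimmed-down ontology are illustrative only, and range and subclass handling are left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

ONTOLOGY = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:daml="{DAML}">
  <daml:Class rdf:ID="WindEvent"/>
  <daml:Property rdf:ID="windDirection">
    <rdfs:domain rdf:resource="#WindEvent"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>
</rdf:RDF>
"""


def ontology_to_schema_graph(rdf_text: str) -> List[Tuple[str, str, str]]:
    """Link each property to its domain class with a hasProperty edge."""
    root = ET.fromstring(rdf_text.strip())
    edges = []
    for prop in root.iter(f"{{{DAML}}}Property"):
        name = prop.get(f"{{{RDF}}}ID")
        domain = prop.find(f"{{{RDFS}}}domain")
        if name is None or domain is None:  # properties without a domain are skipped
            continue
        cls = domain.get(f"{{{RDF}}}resource", "").lstrip("#")
        edges.append((cls, "hasProperty", name))
    return edges
```

Because both conversions produce the same kind of edge list, a single matcher can compare a WSDL schema and an ontology without knowing which format each graph came from.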
Tess (Univ of Massachusetts Tess (Univ of Massachusetts Amherst)Amherst)
bull System for helping to cope with schema evolution
bull Takes a definition of the old schema and produces a program that will transform data that conforms to the old schema into data that conforms to the new schema
Bernstein P Rahm E A survey of approaches to automatic schema matching
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
LSDIS Lab UGALSDIS Lab UGAbull What is it
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation ToolMWSAF Annotation Tool
bull Input WSDL File
1 Individual elements of the WSDL are matched to concepts in the domain
2 The WSDL is classified into a domain3 The Matches are given to the user to accept or reject4 Upon the userrsquos acceptance the annotations are written
to the WSDL
bull Output WSDL File with semantic annotations
MWSAF ArchitectureMWSAF Architecture
Main Components of the System
1 Ontology Store stores the DAML and RDF ontologies that will be used to annotate the WSDL files Ontologies are categorized by domain
2 Parser Library consists of the parsers used to generate the SchemaGraphs
3 Matcher Library provides schema matching algorithm
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAFMWSAFSchema GraphsSchema Graphs
PROBLEM The difference in expressiveness of XML Schema and ontology makes it very difficult to match these two models directly
MWSAF converts both models to a commonrepresentation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges that are created using conversion functions
Then it applies a matching algorithm to find themappings between them
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Meteor-S Web Service Annotation FrameworkMWSAF Meteor-S Web Service Annotation FrameworkOntology to SchemaGraph conversion rulesOntology to SchemaGraph conversion rules
ltdamlClass rdfID=WindEventgt ltrdfscommentgtSuperclass for all events dealing with windltrdfscommentgt ltrdfslabelgtWind eventltrdfslabelgt ltrdfssubClassOf rdfresource=WeatherEvent gt ltdamlClassgtltdamlProperty rdfID=windDirectiongt ltrdfslabelgtWind directionltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource = httpwwww3org200010XMLSchemastring gt ltdamlPropertygtltdamlProperty rdfID=windSpeedgt ltrdfslabelgtWind speedltrdfslabelgt ltrdfsdomain rdfresource=WindEvent gt ltrdfsrange rdfresource=Speed gt ltdamlPropertygt
WindEvent
windDirection Speed
hasProperty windSpeed
SchemaGraph representation of part of ontologyPatil A Oundhakar S Sheth A Verma K METEOR-S Web service
Annotation Framework
MappingMapping
bull Measures of the Match Score
-Element Level Match linguistic similarity of two concepts based on names Uses WordNet to check for synonyms Abbreviations are even checked
-Schema Match structural similarity sub-concept similarities
bull The getBestMapping function then looks at the Match Scores and determines a map set
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF Matching TechniquesMWSAF Matching TechniquesElemMatchElemMatch
bull Name and String Matching algorithms
-NGram considers the number of qgrams that the names have in common
-CheckSynonym uses Wordnet to find synonyms -CheckAbbreviations uses an abbreviation dictionary -TokenMatcher uses Porter Stemmer tonkenization and
substring matching techniques bull Each algorithm returns a value between 0 and 1 These
values are used in an equation for the final match score
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MatchingMatching
bull Once Each WSDL is compared against all of the ontologies in the store and a mapping has been created for each ontology
Then two measures are derived from the mapping
-Average Concept Match tells the user about the degree of similarity between matched concepts of the WSDL and ontology
-Average Service Match helps to categorize the service
We have a machine learning alternative for categorization
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
OutlineOutline
bull Introductionbull Application Domainsbull Classification of Schema Matching Approachesbull Current Workbull MWSAF Matchingbull Open Research Directoriesbull Conclusion
Current and Future IssuesCurrent and Future Issuesbull User Interaction minimize user input but maximize impact of the
feedback
bull Real World Analysis can the current matching techniques be used in real world situations
bull P2P data management
bull Mapping Maintenance what happens when you map between two schemas and then one changes
bull Developing global schemas (or ontologies) for domains
bull Dealing with inconsistent data values for a schema elementDoan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
More IssuesMore Issues
bull If we require user acceptance for our matches then what happens if our matcher returns thousands or hundreds of matches
bull Is it unrealistic to think that we will eventually perfect our matchers
Doan A Halevy A Semantic Integration Research in the Database Community A Brief Survey
ConclusionConclusionbull It is necessary to automate the matching process
bull Schema matching is very difficult and expensive
bull We have looked at a taxonomy and the descriptions of the existing approaches for matching
-Schema vs Instance-level
-Element vs Structure-level
-Language and Constraint based matchers
bull We also discussed several implementations of the matching techniques
References
• Bernstein P., Rahm E.: A survey of approaches to automatic schema matching. www.research.microsoft.com/~philbe/VLDBJ-Dec2001.pdf
• Doan A., Halevy A.: Semantic Integration Research in the Database Community: A Brief Survey. http://anhai.cs.uiuc.edu/public/db-review14.pdf
• Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework. POSV-WWW2004.pdf
• Vassilis C.: Integrating XML Data Sources using RDFS Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM). Dagstuhl Seminar. ftp://ftp.dagstuhl.de/pub/Proceedings/04/04391/04391.ChristophidesVassilis.Slides.pdf
Questions
MWSAF: METEOR-S Web Service Annotation Framework
LSDIS Lab, UGA
• What is it?
A tool for semi-automatically marking up web service descriptions with ontologies
It helps in describing services semantically and aids in efficient web service discovery and composition
MWSAF Annotation Tool
• Input: WSDL file
1. Individual elements of the WSDL are matched to concepts in the domain
2. The WSDL is classified into a domain
3. The matches are given to the user to accept or reject
4. Upon the user's acceptance, the annotations are written to the WSDL
• Output: WSDL file with semantic annotations
MWSAF Architecture
Main Components of the System
1. Ontology Store: stores the DAML and RDF ontologies that will be used to annotate the WSDL files. Ontologies are categorized by domain
2. Parser Library: consists of the parsers used to generate the SchemaGraphs
3. Matcher Library: provides the schema matching algorithms
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
MWSAF: SchemaGraphs
PROBLEM: The difference in expressiveness between XML Schema and ontologies makes it very difficult to match these two models directly
MWSAF converts both models to a common representation format called SchemaGraph
A SchemaGraph is a set of nodes connected by edges, created using conversion functions
MWSAF then applies a matching algorithm to find the mappings between the two SchemaGraphs
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
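A minimal sketch of what such a common representation could look like, assuming a SchemaGraph is simply a set of named nodes plus labeled, directed edges. The class layout and method names are hypothetical. The edge labels `hasElement` and `hasProperty` and the node names are taken from the figures on the conversion-rule slides.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class SchemaGraph:
    """Set of nodes connected by labeled edges (hypothetical layout)."""
    nodes: Set[str] = field(default_factory=set)
    # Each edge is (source node, edge label, target node).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_edge(self, source: str, label: str, target: str) -> None:
        """Register both endpoints as nodes and record the labeled edge."""
        self.nodes.update((source, target))
        self.edges.append((source, label, target))

    def children(self, node: str) -> List[str]:
        """Targets of all edges leaving `node`, in insertion order."""
        return [target for source, _, target in self.edges if source == node]


# XML Schema side: a complex type and its elements.
xml_graph = SchemaGraph()
xml_graph.add_edge("Direction", "hasElement", "compass")
xml_graph.add_edge("Direction", "hasElement", "degrees")

# Ontology side: a class and its properties.
onto_graph = SchemaGraph()
onto_graph.add_edge("WindEvent", "hasProperty", "windDirection")
onto_graph.add_edge("WindEvent", "hasProperty", "windSpeed")
```

Once both models are in this one shape, a matcher only has to compare node names and neighborhoods and no longer needs to know whether a node came from XML Schema or from an ontology.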
MWSAF (METEOR-S Web Service Annotation Framework): XML to SchemaGraph conversion rules
<xsd:complexType name="Direction">
  <xsd:sequence>
    <xsd:element maxOccurs="1" minOccurs="1" name="compass" nillable="true" type="xsd1:DirectionCompass"/>
    <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
[Figure: SchemaNode representation of XML schema. Labels: Direction, hasElement, compass, degrees, DirectionCompass]
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
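The conversion rule suggested by the figure can be sketched as follows, assuming that each named complex type becomes a node and each element declared inside it becomes a child node joined by a `hasElement` edge. This is an illustration built on Python's standard `xml.etree.ElementTree`, not the MWSAF parser. The function name is hypothetical, and the enclosing `xsd:schema` element and the `xsd1` namespace URI are placeholders added so that the fragment parses.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The slide's fragment, wrapped in a schema element so that it is well-formed.
# The xsd1 namespace URI is a placeholder, not taken from the source.
SAMPLE_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsd1="urn:example:weather">
  <xsd:complexType name="Direction">
    <xsd:sequence>
      <xsd:element maxOccurs="1" minOccurs="1" name="compass"
                   nillable="true" type="xsd1:DirectionCompass"/>
      <xsd:element maxOccurs="1" minOccurs="1" name="degrees" type="xsd:int"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def complex_types_to_edges(xsd_text):
    """Return (type, 'hasElement', element) edges for every named complex type."""
    root = ET.fromstring(xsd_text)
    edges = []
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        parent = ctype.get("name")
        if parent is None:
            continue  # anonymous types are skipped in this sketch
        for elem in ctype.iter(f"{{{XSD_NS}}}element"):
            edges.append((parent, "hasElement", elem.get("name")))
    return edges


xsd_edges = complex_types_to_edges(SAMPLE_XSD)
```

The sketch ignores element types such as `xsd1:DirectionCompass`, which the figure appears to show as a further node.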
MWSAF (METEOR-S Web Service Annotation Framework): Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: SchemaGraph representation of part of the ontology. Labels: WindEvent, hasProperty, windDirection, windSpeed, Speed]
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
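The ontology-side rule can be sketched in the same way, assuming that each property becomes a node attached by a `hasProperty` edge to the class named in its `rdfs:domain`. The function name is hypothetical, and the `rdf:RDF` wrapper is added only so that the fragment parses. Ranges such as `Speed` and the subclass link to `WeatherEvent` are left out of this illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

# Reduced version of the slide's fragment, wrapped in rdf:RDF to be well-formed.
SAMPLE_DAML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:daml="{DAML}">
  <daml:Class rdf:ID="WindEvent"/>
  <daml:Property rdf:ID="windDirection">
    <rdfs:domain rdf:resource="#WindEvent"/>
  </daml:Property>
  <daml:Property rdf:ID="windSpeed">
    <rdfs:domain rdf:resource="#WindEvent"/>
    <rdfs:range rdf:resource="#Speed"/>
  </daml:Property>
</rdf:RDF>
"""


def properties_to_edges(rdf_text):
    """Return (class, 'hasProperty', property) edges from rdfs:domain declarations."""
    root = ET.fromstring(rdf_text)
    edges = []
    for prop in root.iter(f"{{{DAML}}}Property"):
        name = prop.get(f"{{{RDF}}}ID")
        domain = prop.find(f"{{{RDFS}}}domain")
        if name is None or domain is None:
            continue  # properties without an ID or a domain are skipped here
        owner = (domain.get(f"{{{RDF}}}resource") or "").lstrip("#")
        edges.append((owner, "hasProperty", name))
    return edges


daml_edges = properties_to_edges(SAMPLE_DAML)
```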
Mapping
• Measures of the Match Score:
- Element Level Match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
- Schema Match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the Match Scores and determines a map set
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
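The slide does not say how getBestMapping chooses the map set. One simple possibility is sketched below: for each WSDL element, keep the ontology concept with the highest match score, provided that score clears a threshold. The signature, the threshold value, and the sample scores are assumptions made for illustration.

```python
from typing import Dict, Tuple


def get_best_mapping(
    scores: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, Tuple[str, float]]:
    """Map each WSDL element to its best-scoring ontology concept.

    `scores[wsdl_element][concept]` holds a match score between 0 and 1.
    Elements whose best score is below `threshold` are left unmapped.
    """
    mapping = {}
    for wsdl_element, candidates in scores.items():
        if not candidates:
            continue
        concept, score = max(candidates.items(), key=lambda item: item[1])
        if score >= threshold:
            mapping[wsdl_element] = (concept, score)
    return mapping


# Hypothetical match scores between WSDL elements and ontology concepts.
sample_scores = {
    "degrees": {"windDirection": 0.75, "windSpeed": 0.25},
    "compass": {"windDirection": 0.5, "windSpeed": 0.125},
    "zipCode": {"windDirection": 0.125, "windSpeed": 0.0},
}
map_set = get_best_mapping(sample_scores)
```

In this sketch two WSDL elements may map to the same concept; a one-to-one assignment would need an extra step.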
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms:
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses Porter Stemmer, tokenization, and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K.: METEOR-S Web Service Annotation Framework
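To make the NGram idea concrete, here is a small sketch that scores two names by the q-grams they share. It assumes lowercase comparison, q = 3, and a Dice coefficient for normalizing the count into the 0 to 1 range. The slide does not specify any of these choices, and the equation that combines the individual matchers into the final match score is not given, so it is not reproduced here.

```python
def qgrams(name: str, q: int = 3) -> set:
    """Set of overlapping substrings of length q, compared case-insensitively."""
    text = name.lower()
    if len(text) < q:
        # Names shorter than q are treated as a single gram.
        return {text} if text else set()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def ngram_similarity(a: str, b: str, q: int = 3) -> float:
    """Dice coefficient over the two q-gram sets: 2 * |common| / (|A| + |B|)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


same = ngram_similarity("windSpeed", "WindSpeed")
close = ngram_similarity("windSpeed", "wind_speed")
unrelated = ngram_similarity("degrees", "compass")
```

Identical names (ignoring case) score 1, names with no q-gram in common score 0, and near-variants such as `windSpeed` and `wind_speed` fall in between.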
MWSAF Meteor-S Web Service Annotation MWSAF Meteor-S Web Service Annotation FrameworkFramework
XML to SchemaGraph conversion rulesXML to SchemaGraph conversion rules
ltxsdcomplexType name=Directiongt
ltxsdsequencegt
ltxsdelement maxOccurs=1 minOccurs=1
name=compass nillable=true
type=xsd1DirectionCompass gt
ltxsdelement maxOccurs=1 minOccurs=1
name=degrees type=xsdint gt
ltxsdsequencegt
ltxsdcomplexTypegt Direction
degreesDirectionCompass
hasElementcompass
SchemaNode representation of XML schema
Patil A Oundhakar S Sheth A Verma K METEOR-S Web service Annotation Framework
MWSAF: METEOR-S Web Service Annotation Framework
Ontology to SchemaGraph conversion rules
<daml:Class rdf:ID="WindEvent">
  <rdfs:comment>Superclass for all events dealing with wind</rdfs:comment>
  <rdfs:label>Wind event</rdfs:label>
  <rdfs:subClassOf rdf:resource="#WeatherEvent"/>
</daml:Class>
<daml:Property rdf:ID="windDirection">
  <rdfs:label>Wind direction</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#string"/>
</daml:Property>
<daml:Property rdf:ID="windSpeed">
  <rdfs:label>Wind speed</rdfs:label>
  <rdfs:domain rdf:resource="#WindEvent"/>
  <rdfs:range rdf:resource="#Speed"/>
</daml:Property>
[Figure: graph with node labels WindEvent, windDirection, windSpeed and Speed, and edge label hasProperty]
SchemaGraph representation of part of ontology
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web Service Annotation Framework
Mapping
• Measures of the match score
- Element-level match: linguistic similarity of two concepts based on their names. Uses WordNet to check for synonyms; abbreviations are also checked
- Schema match: structural similarity, based on sub-concept similarities
• The getBestMapping function then looks at the match scores and determines a set of mappings
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web Service Annotation Framework
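The slide does not say how getBestMapping selects mappings, so the following is only a sketch of one simple policy: for each source concept, keep the highest-scoring target above a cut-off. The function get_best_mapping, the threshold value, and the sample scores are all hypothetical and are not taken from MWSAF.

```python
def get_best_mapping(scores, threshold=0.5):
    """Pick, for each source concept, the target concept with the highest score.

    scores: dict mapping (source, target) pairs to a match score in [0, 1].
    threshold: hypothetical cut-off below which a pair is ignored.
    """
    best = {}
    for (source, target), score in scores.items():
        if score < threshold:
            continue
        if source not in best or score > best[source][1]:
            best[source] = (target, score)
    return best


# Hypothetical match scores between WSDL concepts and ontology concepts.
scores = {
    ("windSpeed", "WindSpeed"): 0.9,
    ("windSpeed", "WindDirection"): 0.4,
    ("degrees", "WindDirection"): 0.3,
}
mapping = get_best_mapping(scores)
print(mapping)
```

Concepts with no candidate above the cut-off are left unmapped, which is where user review would come in.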
MWSAF Matching Techniques: ElemMatch
• Name and string matching algorithms
- NGram: considers the number of q-grams that the names have in common
- CheckSynonym: uses WordNet to find synonyms
- CheckAbbreviations: uses an abbreviation dictionary
- TokenMatcher: uses the Porter stemmer, tokenization and substring matching techniques
• Each algorithm returns a value between 0 and 1. These values are used in an equation for the final match score
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web Service Annotation Framework
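As an illustration of the NGram matcher, here is a minimal sketch of a q-gram similarity that returns a value between 0 and 1. It assumes character trigrams and the Dice coefficient; the slide states neither the value of q nor the normalization, so both are assumptions, and the function names are hypothetical.

```python
def qgrams(name, q=3):
    """Return the set of lowercase character q-grams of a name."""
    s = name.lower()
    if len(s) < q:
        # Names shorter than q are treated as a single gram.
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def ngram_similarity(a, b, q=3):
    """Dice coefficient over shared q-grams: 0.0 (nothing shared) to 1.0 (identical sets)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(ngram_similarity("windSpeed", "windSpeed"))
print(ngram_similarity("windSpeed", "windDirection"))
```

The equation that combines this value with the other matchers' scores is not given on the slide and is not reproduced here.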
Matching
• Each WSDL is compared against all of the ontologies in the store, and a mapping is created for each ontology. Two measures are then derived from the mapping:
- Average Concept Match: tells the user about the degree of similarity between matched concepts of the WSDL and ontology
- Average Service Match: helps to categorize the service
• We have a machine learning alternative for categorization
Patil A., Oundhakar S., Sheth A., Verma K., METEOR-S Web Service Annotation Framework
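The slide names the two measures without defining them. The sketch below assumes one plausible reading: Average Concept Match averages the match scores over the mapped concepts only, while Average Service Match divides the same sum by the total number of concepts in the WSDL. These definitions, the function names, and the sample numbers are assumptions and should be checked against the METEOR-S paper.

```python
def average_concept_match(mapped_scores):
    """Mean match score over the concepts that were actually mapped (assumed definition)."""
    if not mapped_scores:
        return 0.0
    return sum(mapped_scores) / len(mapped_scores)


def average_service_match(mapped_scores, total_concepts):
    """Sum of match scores divided by all concepts in the WSDL (assumed definition)."""
    if total_concepts == 0:
        return 0.0
    return sum(mapped_scores) / total_concepts


# Hypothetical example: 2 of 4 WSDL concepts were mapped to one ontology.
mapped_scores = [1.0, 0.5]
concept_match = average_concept_match(mapped_scores)
service_match = average_service_match(mapped_scores, total_concepts=4)
print(concept_match, service_match)
```

Under this reading, a service with a few strong matches but many unmapped concepts scores high on concept match and low on service match, which is why the second measure suits categorization across ontologies.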
Outline
• Introduction
• Application Domains
• Classification of Schema Matching Approaches
• Current Work
• MWSAF Matching
• Open Research Directions
• Conclusion
Current and Future Issues
• User interaction: minimize user input but maximize the impact of the feedback
• Real-world analysis: can the current matching techniques be used in real-world situations?
• P2P data management
• Mapping maintenance: what happens when you map between two schemas and then one changes?
• Developing global schemas (or ontologies) for domains
• Dealing with inconsistent data values for a schema element
Doan A., Halevy A., Semantic Integration Research in the Database Community: A Brief Survey