Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability...

38
Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg Pike Vienna, VA [email protected]

Transcript of Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability...

Page 1: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

Semantic Interoperability: Automatically Resolving Vocabularies

4th Semantic Interoperability Conference February 10, 2006

Chuck Mosher8500 Leesburg Pike

Vienna, [email protected]

Page 2: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

2

Interoperable Information Backbone

• Enterprise-wide data abstraction layer for applications• Integrated views of data from multiple sources

– Relational databases, applications, files

• Re-useable Data Services for data consistency• Metadata-driven data management and integration• Complements other data integration tools (ETL, EAI, quality, etc.)

MetaMatrix

Enterprise Data Service LayerApplications

Data Sources

Page 3: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

3

Data Services

• A type of Web Service• Does all of the work to transform any data in

any format to a W3C compliant service– Implements all of the logic to effect the

transformation– Provides access to data sources, regardless of

source API, technology

• Does not implement application logic• Decouples the data from the application

while making the data discoverable and accessible

Page 4: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

4

Custom Apps

Web Services,Business Processes

Packaged Apps

Reporting, Analytics

EAI, Data warehouses

xml

databases

warehouses

spreadsheets

services

<sale/> <value/></ sale >

geo-spatial

rich media

Enterprise Enterprise Information Information

Sources (EIS)Sources (EIS)

Information Information ConsumersConsumers

Reusable Integrated Reusable Integrated Business ObjectsBusiness Objects

OD

BC

JDB

CS

OA

P

Exposed Exposed Information Information

ServicesServices

<WSDL><WSDL>(contract)

<WSDL><WSDL>(contract)

<WSDL><WSDL>(contract)

Model-Based Approach Maximizes Re-useData Abstraction Without Coding

Page 5: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

5

Data

Model

Meta-model

Meta Object Facility (MOF)

Page 6: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

6

MetaMatrix MetaBase Modeler• Model disparate

information sources– Relational DBs– Content Management

Systems– Files– Services– Applications

• Uses and retains domain-specific modeling terminology– Relational models

have “Tables”, “Foreign Keys”, “Columns”, etc.

– UML models have “Packages”, “Classes”, “Attributes”, etc.

Page 7: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

7

MetaMatrix MetaBase Modeler

• Define reusable data services/ business objects

• Transformations defined with:– Selects– Joins– Criteria– Unions– Functions– User defined

• Perform schema and semantic matching, data type conversion

Page 8: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

8

T

Data Sources - Authoritative- Redundant

- Overlapping

Multiple Internal/External Information Sources

Aggregate Data Services:• Relational or XML• Application-specific• Access via ODBC,

JDBC, or SOAP APIs

T T

Virtual XML Document<a>

</a>

<b>

</b>…

TTT

ODBC/JDBC JDBC SOAP

WebServices

WebServices

Portal Applications

Portal Applications

BusinessIntelligence

Applications

BusinessIntelligence

Applications

Enterprise-wide or COI-driven Data Model

• Rationalization and Semantic mediation Layer• Harmonization• Data Catalog/Dictionary

Logical Data Model

Semantic Mediation: The Problem

bldg_id SITENUM Facility_ID

Location_ID

bldg_type Depot_Number

Location_Type

Page 9: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

9

J-8 Force Structure

J-7 Operational Plans

J-6 C4CS

TData Sources- Authoritative- Redundant

- Overlapping

Multiple Internal/External Information Sources

T T

ODBC/JDBC JDBC SOAP

WebServices

WebServices

Portal Applications

Portal Applications

BusinessIntelligence

Applications

BusinessIntelligence

Applications

Enterprise-wide or COI-driven Data Models

• Rationalization• Harmonization• Data Catalogs

Building Enterprise Semantic Model(s)

J-5 Plans & Policy

J-4 Logistics (GCSS)

J-3 Operations

J-2 Intelligence

J-1 Manpower / Personnel

Page 10: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

10

Biggest Challenge in Creating Data Services?

• Semantics!!!

• Structural differences are straightforward

• Differing definitions among data sources

• Differing vocabularies among COI’s

• Established, emerging, and evolving data standards– C2IEDM, JC3IEDM, GJXDM, NIEM, GFM,

many more

• Not addressed by ETL, EAI, SOA

Page 11: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

11

A Previously Intractable Problem

• TWPDES has 1000+ core entities

• NIEM has 100,000+!

• Even a limited program with a dozen data sources could yield 10’s of 1000’s of potential mappings

• Humans cannot address this without help

• Indeed, it has stopped many data integration/reconciliation programs in their tracks.

Page 12: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

Automated Semantic Matching

Page 13: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

13

DISCLAIMER

• Semantic matching can't really be done automatically yet!

• Requires intelligence to understand the context and semantics.

• So use computers to do most of the work but then have the user confirm or check the result.

Page 14: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

14

• Given two symbols, calculate a measure of the relationship between them:

Doesn’t seem so hard…

amount quantity

The Matching Problem

Page 15: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

15

ftuqky aqfkyeyr

The Matching Problem

• Given two symbols, calculate a measure of the relationship between them:

This is what a computer “sees.”

Page 16: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

16

The Matching Problem

• Even after extracting likely symbols, matching is a difficult problem.

• Symbols alone are not enough to generate good matches: – “ID” -> “SocialSecurityNumber” or “NY”

• The solution relies on context:– “NJ”,”MA”,”CA”,”ID”– “Ego”, “SuperEgo”, “ID”

• MatchIt provides that context

Page 17: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

17

MatchIT 1.0

• Integrated component of the MetaMatrix Semantic Data Services product

• Based on ontology-driven semantic knowledge base– Word relationships, dictionaries, lexicons, thesauri

• Plug-in architecture• Standards-compliant:

– OWL– RDF– Inference engines– OSGI– Eclipse– JDBC

Page 18: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

18

FBI CBP NYC NY NJ

Data Source Services

Matched (Confidence of 90%)

Gender ID

Person Sex Code

Ontology

“Sex” semantically related to “Gender”

(Semi-)Automated Semantic Mediation

*An extensible semantic knowledge base provides a dictionary and thesaurus like information on “words”, their “meanings”, and their relationships to other words.

*A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations.

Page 19: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

19

Matching Techniques

• MatchIT uses two types of matching techniques:– String Matching

• Attempts to determine string similarity based on the lexical distance between them.

– Semantic Matching• Attempts to determine string similarity based on the

ontological distance between them within a semantic ontology.

• Generate Match Sets• Can be run individually or in combinations• Pluggable architecture allows for algorithmic

extendibility

Page 20: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

20

String Matching

• What is the lexical distance between two symbols?– “PUZZLE”, “PUZZ”– “ID”,”IDENTIFIER”– “STRONG”,”SONG”

Page 21: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

21

Semantic Matching

• How semantically similar are two concepts?

car

motor vehicle

self-propelled vehicle

wheeled vehicle

vehicle

craft

aircraft

heavier-than-air craft

airplanetruck

is a

is a

is a

is a is a

is a

is a

is a

is a

car and truck are very similar

Car and airplane are less similar

Page 22: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

22

Semantic Matching Objectives

• Find and rank the potential matches, but let the user review and decide for sure.

• I.e., eliminate 99+% of the things that don't match, and let the user review the <1%.

• Many times, a user can visually scan a small list of the top 1% and very quickly agree or disagree with the results.

• Favor false positives over false negatives.

Page 23: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

23

Semantic Matching in MetaMatrix

Ontologies[OWL/RDF]

Relational

XML

XML

XML

XMLDomain[UML/ER]

MetaBase Modeler

Custom

AnySource

XML

File System

JDBC

RDBMS

Instance-levelMatch

Instance-levelMatch

Schema-levelMatch

Schema-levelMatch

MatchIt Ontology

Semantic Knowledge Base

MetaMatrix Connector Framework

MetaMatrix Importer Framework

Models & Files[versioned]

Models & Files[versioned]

Search Index

Search Index

Web Reporting

Web Reporting

MetaBase Repository

Data Harmonization Complete

MetadataAccess

Data/ContentAccess

Ontological Semantics Access

Lexicons

Fact

Repository

Onomasticons

Find Matches

•Analyze

•Visualize

•Collaborate

•Transform

Import Export

Conceptual/Logical/Physical Data ModelsEnterprise Information Sources

Representations

Page 24: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

Example

Page 25: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

25

Overall process

• Import two nontrivial vocabularies– ERwin model of large data warehouse– TWPDES XML schema

• Extract symbols– Schema-specific tokenization algorithms

• Assign semantics to each– Symbols are keys into dictionaries

• Perform semantic matching between them

• Analyze results

Page 26: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

26

ERwin Data Warehouse Model

Page 27: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

27

TWPDES XML Schema

Mapping Classes for each XML frag

in hierarchy

Page 28: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

28

Generated Symbol Dictionary (TWPDES)

Page 29: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

29

Generated Symbol Dictionary (ERwin model)

Page 30: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

30

Editing the Dictionary

Modify Definition

Page 31: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

31

Editing the Semantics

Control Senses

Page 32: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

32

Target Model

Match Results

Page 33: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

33

Examine Details

Page 34: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

34

Match Details

Page 35: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

35

Matches Used to Build Mappings

Page 36: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

36

From Pat Cassidy & COSMO

Obligation Duty

GenericObligation

SameAs

SameAs

The Integrating Function of the Common Semantic Model –via Domain-level Mapping

Page 37: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

37

MatchIt Semantic Matching Tool

• A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology.

• Map connections between ontologies that are being built and artifacts currently in use:– RDBMs schemas– XML and XSD files– Spreadsheet data– More coming, including ontologies!

• Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure

Page 38: Semantic Interoperability: Automatically Resolving Vocabularies 4 th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg.

Thank you