W3C RDF/RDB Workshop Oct 25, 2007 © 2006 IBM Corporation

Semantic Web Technologies and Data Management

Li Ma, Jing Mei, Yue Pan, Krishna Kulkarni, Achille Fokoue, Anand Ranganathan

IBM T.J. Watson Research Center

© 2006 IBM Corporation

Why bring together Relational Databases and the Semantic Web?

RDF and OWL ontologies are good at capturing data semantics

– Can be used to define a “semantic” model of the underlying relational data that can be tailored to different domains or applications, and that hides the actual layout of data across different tables

Allow use of additional domain knowledge in OWL ontologies while answering queries to the relational DB

Allow use of DL reasoning while answering queries to the relational DB to improve recall

Allow Semantic Web applications (that use an RDF/OWL data model) to have access to relational data, without having to deal with a different data model

The main motivations are capturing data semantics, achieving data integration, and enabling reasoning

Semantic query: Find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies.

ID | Name | Location | Business1 | Business2
1 | BAR | Bei Jing | Memory | Wireless software
2 | FOO | Paris | Optical comm. | Wireless comm.
3 | ROL | New York | Banking Solut. | NULL
4 | EDOX | New York | Memory | Main Board
5 | GUC | Vancouver | NULL | NULL

ID | Name | Shareholders1 | Shareholders2
1 | BAR | FOO | TIT
2 | FOO | GUC | NULL
3 | ROL | BAR | TIT
4 | EDOX | ROL | NULL

[Figure: company information ontology, in three parts.
 Business: top-level branches Finance (Banking, …) and IT; other node labels are Telecom, PC, Hardware, Software, Optical, Wireless, Wireless Software, Main board, Memory, and Solution.
 Region: Asia > East Asia > China > BeiJing; Euro. > France > Paris; Amer. > North Amer. > USA > NY and Canada > Vancouver.
 Shareholding.]

Ontology based Semantic Query

FOO is retrieved using transitive closure and subsumption inference.

BAR is retrieved using classification and subsumption inference
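The way FOO is reached can be sketched in a few lines of Python. The two tables are copied from the slide; the `PARENT` links are an assumption, a simplified reading of the ontology figure, and the function names are illustrative only. The sketch combines a transitive closure over the shareholder table with subsumption checks on location and business.

```python
# Shareholder table from the slide: company -> direct shareholders.
SHAREHOLDERS = {
    "BAR": ["FOO", "TIT"],
    "FOO": ["GUC"],
    "ROL": ["BAR", "TIT"],
    "EDOX": ["ROL"],
}

# Company table from the slide: name -> (location, businesses).
COMPANIES = {
    "BAR": ("Bei Jing", ["Memory", "Wireless software"]),
    "FOO": ("Paris", ["Optical comm.", "Wireless comm."]),
    "ROL": ("New York", ["Banking Solut."]),
    "EDOX": ("New York", ["Memory", "Main Board"]),
    "GUC": ("Vancouver", []),
}

# ASSUMED child -> parent links, a simplified reading of the ontology figure.
PARENT = {
    "Bei Jing": "China", "China": "East Asia", "East Asia": "Asia",
    "Paris": "France", "France": "Europe",
    "New York": "USA", "USA": "North America",
    "Vancouver": "Canada", "Canada": "North America",
    "North America": "America",
    "Optical comm.": "Telecom", "Wireless comm.": "Telecom", "Telecom": "IT",
    "Memory": "Hardware", "Main Board": "Hardware", "Hardware": "IT",
    "Wireless software": "Software", "Software": "IT",
    "Banking Solut.": "Banking", "Banking": "Finance",
}


def subsumed_by(term, ancestor):
    """Subsumption check: walk parent links from term up to ancestor."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def all_shareholders(company):
    """Transitive closure of the shareholder relation for one company."""
    seen, stack = set(), list(SHAREHOLDERS.get(company, []))
    while stack:
        holder = stack.pop()
        if holder not in seen:
            seen.add(holder)
            stack.extend(SHAREHOLDERS.get(holder, []))
    return seen


def european_it_shareholders(company):
    result = []
    for holder in sorted(all_shareholders(company)):
        if holder not in COMPANIES:
            continue  # e.g. TIT has no row in the company table
        location, businesses = COMPANIES[holder]
        in_europe = subsumed_by(location, "Europe")
        is_it = any(subsumed_by(b, "IT") for b in businesses)
        if in_europe and is_it:
            result.append(holder)
    return result


print(european_it_shareholders("EDOX"))
```

Under these assumed hierarchy links, FOO qualifies because it is an indirect shareholder (EDOX, ROL, BAR, FOO), Paris falls under Europe, and Optical comm. falls under IT.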

Overview of talk

Survey of Two Basic Approaches for RDF Access to Relational Databases

Use Cases

Relevant Technologies from IBM

IBM’s basic positions

Two Basic Approaches for RDF Access to Relational Data

1. Extending existing query languages for RDF Access

Extend SQL or XQuery with RDF-specific extensions

2. Using RDF-specific languages (like SPARQL) to allow publishing and accessing legacy data as RDF

Define an RDF interface over relational (or XML) databases and use query rewriting methods for accessing data

Extending existing query languages to access new types of data

Host Language | Extended Feature | Language Examples | Storage | Technical Requirements | Implementation Examples
SQL | XML query | SQL/XML | XML in relational databases | Rewriting queries from XQuery to SQL | XML extension in commercial databases
SQL | XML query | SQL/XML | Native XML databases | Join results of XQuery & SQL | Commercial native XML stores
SQL | RDF query | SQL table function | RDF in relational databases | Rewriting queries from SPARQL to SQL | Commercial RDF stores
SQL | RDF query | SQL table function | Native RDF databases | Join results of SPARQL & SQL | N/A
XQuery | RDF query | XQuery with Functional Accessors | RDF in XML serialization | Normalized XML representation of RDF; SPARQL implementation using XQuery | TreeHugger
XQuery | RDF query | XQuery with Functional Accessors | Native RDF databases | Join results of SPARQL & XQuery | N/A

Publishing and accessing legacy data using new data models and query languages.

Language | Data Sources | Technical Requirements | Implementation Examples
XQuery | Relational databases | Publish relational data as XML; rewrite XQuery to SQL | Commercial databases
SPARQL | Relational databases | Publish relational data as RDF; rewrite SPARQL to SQL | D2RQ mapping and D2R Server; Virtuoso; SPASQL
SPARQL | XML databases | Publish XML data as RDF; rewrite SPARQL to XQuery | N/A
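As a rough illustration of "publish relational data as RDF", the sketch below turns table rows into subject-predicate-object triples. The naming scheme (`Company/5`, `Company#Name`) and the function `rows_to_triples` are assumptions made for this example; they are not taken from D2RQ, Virtuoso, or SPASQL.

```python
def rows_to_triples(table, key_column, rows):
    """Turn each row into one rdf:type triple plus one triple per non-NULL column."""
    triples = []
    for row in rows:
        subject = f"{table}/{row[key_column]}"  # hypothetical instance identifier
        triples.append((subject, "rdf:type", table))
        for column, value in row.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{table}#{column}", value))
    return triples


# One row of the company table used earlier in the talk (NULL becomes None).
rows = [
    {"ID": 5, "Name": "GUC", "Location": "Vancouver",
     "Business1": None, "Business2": None},
]
for triple in rows_to_triples("Company", "ID", rows):
    print(triple)
```

NULL columns simply produce no triple, which is one common way to handle missing values when a relational row is viewed as RDF.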

IBM’s position

Because of the semantic differences between these query languages, we believe attempts to extend SQL and XQuery to support SPARQL would involve considerable complexity.

Hence, we advocate focusing research efforts on publishing and accessing relational (and XML) data as RDF data and exploiting SPARQL for semantic query and integration.

Case Study: RDF Representation and Access to Master Data

Master data is the reference data that is shared by several disparate IT systems and groups.

– May include lists or hierarchies of customers, suppliers, accounts, products, or organizational units

Effective Master Data Management is required to enable consistent computing across diverse system architectures and business functions.

The challenge is to build a common master model that is flexible enough to deal with business changes and expressive enough to represent the semantics of master data.

Why RDF/OWL for Master Data Management?

Use of URIs to enable identification of common entities across different organizations

Integration of external industry-specific ontologies

Annotation of various relations based on Description Logics (e.g. symmetric, functional, inverse functional, or transitive properties)

Ability to define new classes of entities in a flexible and dynamic manner

– Using intersection, union and complement operators in OWL

– Defining classes based on restrictions on properties
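The class constructors listed above can be pictured with ordinary set operations. This is only a closed-world approximation over hypothetical product identifiers; OWL itself uses open-world semantics, so a DL reasoner would draw fewer conclusions from missing facts than this sketch does.

```python
# Hypothetical product instances and their asserted classes.
PRODUCTS = {"p1", "p2", "p3", "p4"}
PDA = {"p1", "p2"}
PHONE = {"p2", "p3"}

# Intersection: instances that belong to both classes.
pda_phone = PDA & PHONE

# Union: instances that belong to at least one of the classes.
handheld = PDA | PHONE

# Complement, taken relative to the known set of products.
not_phone = PRODUCTS - PHONE

print(sorted(pda_phone), sorted(handheld), sorted(not_phone))
```

New classes such as `pda_phone` are derived from existing ones without changing any stored data, which is the flexibility the slide refers to.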

Semantics Technologies | IBM China Research Lab

Ontology’s Values for Master Data Management

Electronic City example from IBM WebSphere Product Center

– It gathers typical customer models in electronic product sales.

– Offer the Product Information Management basic semantics (category, products, catalogs…)

– Extended with ontology expressiveness

Computable Relationships

– Extend the definition of relationships thanks to ontology expressiveness: transitivity, reflexivity…

– Automatic definition of inverse relations: Replaced by <-> Replaces (Replaced by is the inverse of Replaces)

[Figure: product model with the relations Cross sell item, Upsell item, Replacement Item, and Composed of, plus Product categories; two of the relations are marked transitive.]
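A minimal sketch of what "computable relationships" can mean in practice: given a few asserted facts, the inverse relation and the transitive consequences are derived rather than stored. The item names are hypothetical, and treating Replaces as transitive is an assumption made for this example (the slide marks some relations as transitive without saying which).

```python
# Hypothetical asserted facts: (newer item, older item) pairs for "replaces".
REPLACES = {("phone_v3", "phone_v2"), ("phone_v2", "phone_v1")}


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def inverse(pairs):
    """An inverse property swaps subject and object."""
    return {(b, a) for a, b in pairs}


replaces = transitive_closure(REPLACES)
replaced_by = inverse(replaces)

print(sorted(replaces))
print(sorted(replaced_by))
```

Only two facts are asserted, yet a query for everything that `phone_v1` is replaced by returns both newer models.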

Computable category definition

– Categories can be defined using disjointness, intersection, and union, as well as various restrictions on parent categories.

– Items are automatically categorized according to the defined categories.

[Figure: the product model above, extended with the category PDA Phone defined by intersection, and with disjoint classes.]

New entities and relations

– Allows new types of objects and relationships to be defined (locally or by URI reference)

– New entities: Manufacturer, Material

– New relations: Made by, Contains

Dynamic categories

– Outdated Items: cardinality restriction, has 1 or more Replacement item

– Promoted Items: cardinality restriction, has 1 or more promotion

– Metallic products: «allValuesFrom», Contains comes from Metal

– Products containing batteries: «someValuesFrom», Composed of comes from Battery

– Aluminum products: Contains «hasValue» aluminum
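The property restrictions behind these dynamic categories can be sketched as filters over property values. All item names and facts below are hypothetical, and the evaluation is closed-world (only the listed facts count), which is simpler than the open-world semantics an OWL reasoner applies.

```python
# Hypothetical facts: item -> list of property values.
CONTAINS = {
    "can": ["aluminum"],
    "frame": ["aluminum", "steel"],
    "toy": ["plastic", "steel"],
}
COMPOSED_OF = {"toy": ["battery_aa", "shell"], "frame": ["tube"]}
REPLACEMENT_ITEM = {"toy": ["toy_v2"]}

ITEMS = {"can", "frame", "toy"}
METAL = {"aluminum", "steel"}
BATTERY = {"battery_aa"}


def some_values_from(prop, cls):
    """Items with at least one value of prop in cls."""
    return {i for i in ITEMS if any(v in cls for v in prop.get(i, []))}


def all_values_from(prop, cls):
    """Items whose every value of prop is in cls (true for items with no values)."""
    return {i for i in ITEMS if all(v in cls for v in prop.get(i, []))}


def has_value(prop, value):
    """Items that have this specific value for prop."""
    return {i for i in ITEMS if value in prop.get(i, [])}


def min_cardinality(prop, n):
    """Items with at least n distinct values for prop."""
    return {i for i in ITEMS if len(set(prop.get(i, []))) >= n}


metallic = all_values_from(CONTAINS, METAL)
with_batteries = some_values_from(COMPOSED_OF, BATTERY)
aluminum = has_value(CONTAINS, "aluminum")
outdated = min_cardinality(REPLACEMENT_ITEM, 1)

print(sorted(metallic), sorted(with_batteries), sorted(aluminum), sorted(outdated))
```

Because membership is computed from the facts, an item moves in or out of a category as soon as its properties change, with no manual recategorization.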

China Research Lab

© 2003 IBM Corporation | IBM Confidential

Architecture

[Figure: architecture of the ontology-based semantic engine for MDM.
 Component labels: Ontology based Semantic Engine; SPARQL Query Parser; Ontology Classification; Datalog Evaluation; SQL Generator & Executor; Ontology Views; Ontology to RDB Mapping; Ontology and rule Repository (User-defined Rules, OWL Class and ontology); MDM Hub; Operational Data stores; Pub/Sub; Data.
 Actor and input labels: Business analysts; Ontology and rule queries; Scenario; New MDM services created by developers; SPARQL Queries.]

Example Query

– Find all Contracts related to those which are assembled by ContractComponents that VIP Contacts own

– SPARQL

    SELECT ?w
    WHERE {
      ?x :RelatedContract ?w ; :assembledby ?y .
      ?y rdf:type :ContractComponent .
      ?z rdf:type :VIP ; :playRole ?u .
      ?u rdf:type :ContractRole ; :contractRoleType "own" ; :playRoleIn ?y
    }

Elements

– WCC business entities and their related properties
    • ContractRelationship
    • Contract
    • ContractComponent
        – assemble (object property, range: Contract)
    • ContractRole
        – ContractRoleType (datatype property) = own
        – playRoleIn (object property, range: ContractComponent)
    • Contact
        – playRole (object property, range: ContractRole)

– User-defined classifier
    • VIP contact
        – A contact whose client_importance property is high

– User-defined datalog rule
    • RelatedContract(x, y) :- RelatedContract(x, z), RelatedContract(z, y);
    • RelatedContract(x, y) :- Contract(y), ContractRelationship(z), Related_From(z,y), Related_To(z,x), Contract_Relationship_type(z,'supplemental');

VIP contact, as a classifier, is defined in a hierarchy tree outside the MDM system. Users can manually add new individuals under the classifier, or individuals can be populated automatically as its instances using a class expression.

Ontology reasoning capability

1. Subsumption inference

2. HasValue restriction

3. Rule reasoning

4. Transitive closure
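The two user-defined RelatedContract rules can be evaluated with a naive bottom-up fixpoint, which is one simple way a datalog engine could handle them (not necessarily how the system described here does it). The slide gives no instance data, so all facts below are hypothetical.

```python
# Hypothetical base facts.
CONTRACT = {"c1", "c2", "c3"}
CONTRACT_RELATIONSHIP = {"r1", "r2", "r3"}
RELATED_FROM = {("r1", "c2"), ("r2", "c3"), ("r3", "c1")}  # (relationship, y)
RELATED_TO = {("r1", "c1"), ("r2", "c2"), ("r3", "c3")}    # (relationship, x)
REL_TYPE = {("r1", "supplemental"), ("r2", "supplemental"), ("r3", "other")}


def related_contracts():
    # Non-recursive rule:
    # RelatedContract(x, y) :- Contract(y), ContractRelationship(z),
    #     Related_From(z, y), Related_To(z, x),
    #     Contract_Relationship_type(z, 'supplemental')
    related = {
        (x, y)
        for z in CONTRACT_RELATIONSHIP
        if (z, "supplemental") in REL_TYPE
        for (z_from, y) in RELATED_FROM
        if z_from == z and y in CONTRACT
        for (z_to, x) in RELATED_TO
        if z_to == z
    }
    # Recursive rule, applied until no new fact is derived:
    # RelatedContract(x, y) :- RelatedContract(x, z), RelatedContract(z, y)
    while True:
        derived = {(x, y) for (x, z) in related for (z2, y) in related if z == z2}
        if derived <= related:
            return related
        related |= derived


print(sorted(related_contracts()))
```

With these facts, `r3` is ignored because its type is not 'supplemental', and the pair (c1, c3) is derived only through the recursive rule.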

IBM China Research Laboratory

© 2005 IBM Corporation

IBM SOR - Scalable Ontology Repository

Efficient management for large-scale OWL ontologies (millions of statements)

DBMSs Supported
– IBM DB2 (Powerful Persistent Storage)
– Derby (http://incubator.apache.org/derby/, Embedded Storage)
– SQL Server (Powerful Persistent Storage)
– Oracle (Powerful Persistent Storage)

Query Language
– W3C SPARQL Query Language

Inference Engines
– Pellet
– Structural TBox Engine (IBM CRL)

Memory Model
– EODM (EMF based Ontology Definition Metamodel, OMG’s recommendation) (http://www.eclipse.org/emft/projects/eodm/)

SOR Architecture

[Figure: SOR architecture, organized into four stages: Import, Storage, Reasoning, and Query Answering.
 Component labels: OWL documents; OWL Parser; DB Translator; Persistent Store; TBox Translator; DL Reasoner; Rule Inference Engine; Simplified Datalog Engine; Enhanced Datalog Engine; SHER (DL Reasoner, ABox Summarizer, ABox Filters); Query Adaptor; SPARQL Processor; Users.
 Flow labels: Generate EODM models from documents; Load and traverse EODM TBox; Load and traverse EODM ABox; Insert TBox; Insert ABox assertions into DB; Retrieve TBox; Retrieve subsumption; Insert data for reasoning; Generate reasoning task; Membership and relationship query; Retrieve data for query answering & reasoning; Returned results by SQLs; Generate SPARQL memory model; SPARQL2SQL translation; SPARQL queries and results; Return results.]

IBM SHER – Scalable Highly Expressive Reasoner

SHER supports SHIN (a subset of OWL-DL) ontologies and is better than state-of-the-art reasoners in terms of scalability and performance.

Design Principles

Three key steps to exposing relational data as RDF data:

– creating an RDF Representation (ontology) of the relational data,

– building a mapping between the relational database and ontology, and

– rewriting SPARQL queries to retrieve the relational data.
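The three steps above can be sketched end to end for the simplest possible case, a single triple pattern with a variable subject and object. The mapping table, the property names, and the `rewrite` function are assumptions made for this illustration; a real rewriter must also handle joins, filters, and constants, and this is not the algorithm used by SOR or any other named system.

```python
import sqlite3

# Steps 1 and 2: ontology properties (hypothetical names) mapped to
# (table, key column, value column) in the relational schema.
MAPPING = {
    ":name": ("company", "id", "name"),
    ":location": ("company", "id", "location"),
}


def rewrite(pattern):
    """Step 3: rewrite one triple pattern (?s, property, ?o) into SQL."""
    subject, prop, obj = pattern
    table, key_col, value_col = MAPPING[prop]
    return (
        f"SELECT {key_col} AS {subject.lstrip('?')}, "
        f"{value_col} AS {obj.lstrip('?')} FROM {table} "
        f"WHERE {value_col} IS NOT NULL"
    )


# A small in-memory database standing in for the legacy relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [(4, "EDOX", "New York"), (5, "GUC", "Vancouver")],
)

# Corresponds to the SPARQL pattern: ?c :location ?loc
sql = rewrite(("?c", ":location", "?loc"))
rows = conn.execute(sql + " ORDER BY c").fetchall()
print(sql)
print(rows)
```

The data never leaves the relational store; only the query is translated, which is the main appeal of the rewriting approach over bulk conversion to RDF.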

Design Questions

URI Generation for Relational Data

– For classes, properties and instances

N-ary Relationship Representation

Representation of RDB Schema Constraints in Mapping

Effective Query Rewriting and Optimization

Reasoning

Performance, Security
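For the first design question, one possible URI scheme is sketched below: tables become classes, columns become properties, and rows become instances identified by their primary-key values. The base namespace and the URI layout are assumptions chosen for illustration, not a scheme proposed in the talk.

```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical namespace


def class_uri(table):
    """One class per table."""
    return f"{BASE}{quote(table)}"


def property_uri(table, column):
    """One property per column, scoped by its table."""
    return f"{BASE}{quote(table)}#{quote(column)}"


def instance_uri(table, key_values):
    """One instance per row; key_values maps primary-key columns to values."""
    key = ";".join(
        f"{quote(col)}={quote(str(val), safe='')}"
        for col, val in sorted(key_values.items())
    )
    return f"{BASE}{quote(table)}/{key}"


print(class_uri("company"))
print(property_uri("company", "name"))
print(instance_uri("company", {"id": 4}))
```

Deriving instance URIs from primary keys keeps them stable across queries, so the same row is always identified by the same URI.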

Backup

Master Data

“Master data is data that is shared across systems (such as lists or hierarchies of customers, suppliers, accounts, or organizational units) and is used to classify and define transactional data.” [IDC]

Example: Sell Product A to Customer X on 1/1/06 for $100.

With Master Data, we should be able to answer questions such as:

What is a “customer”?
– It is a subclass of People with the specific attributes A, B, C …

How to add a new customer?
– Defines the workflow

How to know that 2 customers refer to the same identity?
– Defines some business rules

What Is Master Data Management?

Decouples master information from individual applications

Becomes a central, application independent resource

Simplifies ongoing integration tasks and new app development

Ensures consistent master information across transactional and analytical systems

Addresses key issues such as data quality and consistency proactively rather than “after the fact” in the data warehouse

[Figure labels: Historical / Analytical Systems; Existing Applications (each with its own Master Data); Master Data Management System; New Applications.]

Source: IBM EMDS team