W3C RDF/RDB Workshop Oct 25, 2007 © 2006 IBM Corporation
Semantic Web Technologies and Data Management
Li Ma, Jing Mei, Yue Pan, Krishna Kulkarni,
Achille Fokoue, Anand Ranganathan
IBM T.J. Watson Research Center
© 2006 IBM Corporation
Why bring together Relational Databases and the Semantic Web?
RDF and OWL ontologies are good at capturing data semantics
– Can be used to define a “semantic” model of the underlying relational data that can be tailored to different domains or applications, and that hides the actual layout of data across different tables
Allow use of additional domain knowledge in OWL ontologies while answering queries to the relational DB
Allow use of DL reasoning while answering queries to the relational DB to improve recall
Allow Semantic Web applications (that use an RDF/OWL data model) to have access to relational data, without having to deal with a different data model
The main motivations are capturing data semantics, achieving data integration, and reasoning
Semantic query: Find all direct and indirect shareholders of company EDOX that are from Europe and are IT companies.
Company info.
ID | Name | Location | Business1 | Business2
1 | BAR | Bei Jing | Memory | Wireless software
2 | FOO | Paris | Optical comm. | Wireless comm.
3 | ROL | New York | Banking Solut. | NULL
4 | EDOX | New York | Memory | Main Board
5 | GUC | Vancouver | NULL | NULL
Shareholding
ID | Name | Shareholders1 | Shareholders2
1 | BAR | FOO | TIT
2 | FOO | GUC | Null
3 | ROL | BAR | TIT
4 | EDOX | ROL | Null
Ontology
– Business: Finance (…, Banking…), IT (Telecom, PC, Hardware…, Software); further concepts shown: Optical, Wireless, Wireless Software, Main board, Memory, Solution, …
– Region: Asia (East Asia > China > BeiJing, …), Euro. (France > Paris, …), Amer. (North Amer. > USA > NY; Canada > Vancouver)
Ontology based Semantic Query
FOO is retrieved using transitive closure and subsumption inference.
BAR is retrieved using classification and subsumption inference
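The FOO case can be traced with a small sketch: follow shareholding links transitively, then check each shareholder's location and business against the hierarchies. The table data comes from the slide; the child-to-parent edges in PARENT are assumptions made for illustration, because the exact shape of the Business hierarchy cannot be recovered from the figure.

```python
# Shareholding table from the slide: company -> direct shareholders.
SHAREHOLDERS = {
    "BAR": ["FOO", "TIT"],
    "FOO": ["GUC"],
    "ROL": ["BAR", "TIT"],
    "EDOX": ["ROL"],
}

# Company info table from the slide: name -> (location, businesses).
COMPANY_INFO = {
    "BAR": ("Bei Jing", ["Memory", "Wireless software"]),
    "FOO": ("Paris", ["Optical comm.", "Wireless comm."]),
    "ROL": ("New York", ["Banking Solut."]),
    "EDOX": ("New York", ["Memory", "Main Board"]),
    "GUC": ("Vancouver", []),
}

# ASSUMED child -> parent edges, loosely following the ontology figure.
PARENT = {
    "Paris": "France", "France": "Europe",
    "Bei Jing": "China", "China": "East Asia", "East Asia": "Asia",
    "New York": "USA", "USA": "North America",
    "Vancouver": "Canada", "Canada": "North America",
    "North America": "America",
    "Optical comm.": "Telecom", "Wireless comm.": "Telecom", "Telecom": "IT",
    "Memory": "Hardware", "Main Board": "Hardware", "Hardware": "IT",
    "Wireless software": "Software", "Software": "IT",
    "Banking Solut.": "Banking", "Banking": "Finance",
}


def is_subsumed_by(term, ancestor):
    """Subsumption check: walk up the hierarchy from term to ancestor."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def all_shareholders(company):
    """Transitive closure over the shareholding relation."""
    seen = set()
    stack = list(SHAREHOLDERS.get(company, []))
    while stack:
        holder = stack.pop()
        if holder not in seen:
            seen.add(holder)
            stack.extend(SHAREHOLDERS.get(holder, []))
    return seen


def european_it_shareholders(company):
    """Direct and indirect shareholders located in Europe with an IT business."""
    result = set()
    for holder in all_shareholders(company):
        if holder not in COMPANY_INFO:
            continue  # e.g. TIT has no row in the company info table
        location, businesses = COMPANY_INFO[holder]
        in_europe = is_subsumed_by(location, "Europe")
        in_it = any(is_subsumed_by(b, "IT") for b in businesses)
        if in_europe and in_it:
            result.add(holder)
    return result
```

Under these assumed edges, `european_it_shareholders("EDOX")` reaches FOO through the chain EDOX, ROL, BAR, FOO. A plain SQL query over the two tables would need both a recursive join and knowledge of the hierarchies to find it.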
Overview of talk
Survey of Two Basic Approaches for RDF Access to Relational Databases
Use Cases
Relevant Technologies from IBM
IBM’s basic positions
Two Basic Approaches for RDF Access to Relational Data
1. Extending existing query languages for RDF Access
Extend SQL or XQuery with RDF-specific extensions
2. Using RDF-specific languages (like SPARQL) to allow publishing and accessing legacy data as RDF
Define an RDF interface over relational (or XML) databases and use query rewriting methods for accessing data
Extending existing query languages to access new types of data
Host Language | Extended Feature | Language Examples | Storage | Technical Requirements | Implementation Examples
SQL | XML query | SQL/XML | XML in relational databases | Rewriting queries from XQuery to SQL | XML extension in commercial databases
SQL | XML query | SQL/XML | Native XML databases | Join results of XQuery & SQL | Commercial native XML stores
SQL | RDF query | SQL table function | RDF in relational databases | Rewriting queries from SPARQL to SQL | Commercial RDF stores
SQL | RDF query | SQL table function | Native RDF databases | Join results of SPARQL & SQL | N/A
XQuery | RDF query | XQuery with Functional Accessors | RDF in XML serialization | Normalized XML representation of RDF; SPARQL implementation using XQuery | TreeHugger
XQuery | RDF query | XQuery with Functional Accessors | Native RDF databases | Join results of SPARQL & XQuery | N/A
Publishing and accessing legacy data using new data models and query languages.
Language | Data Sources | Technical Requirements | Implementation Examples
XQuery | Relational databases | Publish relational data as XML; Rewrite XQuery to SQL | Commercial databases
SPARQL | Relational databases | Publish relational data as RDF; Rewrite SPARQL to SQL | D2RQ mapping and D2R Server; Virtuoso; SPASQL
SPARQL | XML databases | Publish XML data as RDF; Rewrite SPARQL to XQuery | N/A
IBM’s position
Because of the semantic differences between query languages, we believe attempts to extend SQL and XQuery to support SPARQL would involve considerable complexity.
Hence, we advocate focusing research efforts on publishing and accessing relational (and XML) data as RDF data and exploiting SPARQL for semantic query and integration.
Case Study: RDF Representation and Access to Master Data
Master data is the reference data that is shared by several disparate IT systems and groups.
– May include lists or hierarchies of customers, suppliers, accounts, products, or organizational units
Effective Master Data Management is required to enable consistent computing between diverse system architectures and business functions
The challenge is to build a common master model that is flexible enough to deal with business changes and expressive enough to represent the semantics of master data.
Why RDF/OWL for Master Data Management?
Use of URIs to enable identification of common entities across different organizations
Integration of external industry-specific ontologies
Annotation of various relations based on Description Logics (e.g. symmetric, functional, inverse functional, or transitive properties)
Ability to define new classes of entities in a flexible and dynamic manner
– Using intersection, union and complement operators in OWL
– Defining classes based on restrictions on properties
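The class operators above can be illustrated with plain set operations over instance identifiers. This is only a sketch: the item identifiers and class extents are hypothetical, and it assumes a closed world (every item is known), whereas OWL itself uses open-world semantics, so the complement here is a simplification.

```python
# Hypothetical universe of items and two hypothetical class extents.
ALL_ITEMS = {"p1", "p2", "p3", "p4"}
PDA = {"p1", "p2"}
PHONE = {"p2", "p3"}


def intersection_of(*classes):
    """Members that belong to every given class."""
    return set.intersection(*classes)


def union_of(*classes):
    """Members that belong to at least one given class."""
    return set.union(*classes)


def complement_of(cls, universe):
    """Members of the universe that are not in the class (closed-world sketch)."""
    return universe - cls


# A new class defined from existing ones, without touching the stored data.
PDA_PHONE = intersection_of(PDA, PHONE)
HANDHELD = union_of(PDA, PHONE)
NOT_PHONE = complement_of(PHONE, ALL_ITEMS)
```

Because the new classes are computed from existing ones, changing a definition does not require changing the underlying records.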
Semantics Technologies | IBM China Research Lab
Ontology’s Values for Master Data Management
Electronic City example from IBM WebSphere Product Center
– It gathers typical customer models in electronic product sales.
– Extended with ontology expressiveness
Offers the basic Product Information Management semantics (categories, products, catalogs…)
Computable Relationships
– Relationships shown: Cross-sell item, Up-sell item, Replacement item, Composed of (with a “transitive” annotation)
– Automatic definition of inverse relations: Replaced by <-> Replaces
– Extends the definition of relationships through ontology expressiveness: transitivity, reflexivity…, inverse property (Replaced by is the inverse of Replaces)
Product categories
Computable category definition
– Categories can be defined using disjointness, intersection, and union, as well as various restrictions on parent categories.
– Items are automatically categorized according to the defined categories.
– Labels shown in the figure: PDA Phone; Intersection; Disjoint classes
Allows defining new types of objects and relationships (locally or using a URI reference)
New entities: Manufacturer, Material
New relations: Made by, Contains
Dynamic categories
– Metallic products: «allValuesFrom» restriction on Contains (comes from Metal)
– Products containing batteries: «someValuesFrom» restriction on Composed of (comes from Battery)
– Aluminum products: Contains «hasValue» aluminum
– Outdated Items: cardinality restriction (has 1 or more Replacement item)
– Promoted Items: cardinality restriction (has 1 or more promotion)
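The restriction types named above can be sketched as predicates over product records. The product data, property names, and type assignments below are hypothetical, and the sketch evaluates restrictions over known values only (a closed-world simplification of OWL semantics).

```python
# Hypothetical product records: product -> property -> list of values.
PRODUCTS = {
    "laptop": {
        "contains": ["aluminum", "steel"],
        "composed_of": ["battery", "screen"],
        "replacement_item": [],
    },
    "cable": {
        "contains": ["copper", "plastic"],
        "composed_of": ["wire"],
        "replacement_item": ["cable_v2"],
    },
}

# Hypothetical class of each value.
TYPES = {
    "aluminum": "Metal", "steel": "Metal", "copper": "Metal",
    "plastic": "Polymer", "battery": "Battery",
    "screen": "Display", "wire": "Wire",
}


def values(product, prop):
    return PRODUCTS[product].get(prop, [])


def all_values_from(product, prop, cls):
    """allValuesFrom: every known value of prop is an instance of cls."""
    return all(TYPES.get(v) == cls for v in values(product, prop))


def some_values_from(product, prop, cls):
    """someValuesFrom: at least one value of prop is an instance of cls."""
    return any(TYPES.get(v) == cls for v in values(product, prop))


def has_value(product, prop, value):
    """hasValue: prop takes this specific value."""
    return value in values(product, prop)


def min_cardinality(product, prop, n):
    """Cardinality restriction: prop has at least n values."""
    return len(values(product, prop)) >= n
```

With this data, the laptop would fall under metallic products, products containing batteries, and aluminum products, while the cable would count as an outdated item because it has a replacement item.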
China Research Lab
© 2003 IBM CorporationIBM Confidential
Architecture
[Figure: architecture diagram. Labels shown: Business analysts; Scenario; Ontology and rule queries; New MDM services created by developers; SPARQL Queries; Ontology based Semantic Engine; SPARQL Query Parser; Ontology Classification; Datalog Evaluation; SQL Generator & Executor; Ontology to RDB Mapping; Ontology and rule Repository; User-defined Rules; OWL Class and ontology; Ontology Views; Data; MDM Hub; Operational Data stores; Pub/Sub]
Example Query
– Find all Contracts related to those which are assembled by ContractComponents that VIP Contacts own
– SPARQL
    Select ?w
    Where {
      ?x RelatedContract ?w; :assembledby ?y.
      ?y rdf:type :ContractComponent.
      ?z rdf:type VIP; :playRole ?u.
      ?u typeOf :ContractRole; :contractRoleType own; :playRoleIn ?y
    }
Elements
– WCC business entities and their related properties
  • ContractRelationship
  • Contract
  • ContractComponent
    – assemble (object property, range: Contract)
  • ContractRole
    – ContractRoleType (datatype property) = own
    – playRoleIn (object property, range: ContractComponent)
  • Contact
    – playRole (object property, range: ContractRole)
– User-defined classifier
  • VIP contact
    – A contact whose client_importance property is high
– User-defined datalog rule
  • RelatedContract(x, y) :- RelatedContract(x, z), RelatedContract(z, y);
  • RelatedContract(x, y) :- Contract(y), ContractRelationship(z), Related_From(z,y), Related_To(z,x), Contract_Relationship_type(z,'supplemental');
VIP contact as a classifier is defined in a hierarchy tree outside the MDM system. Users can manually add new individuals under the classifier or automatically populate individuals as its instances using class expression.
Ontology reasoning capability
1. Subsumption inference
2. HasValue restriction
3. Rule reasoning
4. Transitive closure
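The two RelatedContract rules can be evaluated with a naive fixpoint loop, which is one simple way to compute the transitive closure. This is a sketch, not the engine described in the talk; the tuple layout for relationships and the sample identifiers are assumptions.

```python
def related_contracts(contracts, relationships):
    """Evaluate the two RelatedContract rules to a fixpoint.

    contracts: set of contract identifiers (the Contract predicate).
    relationships: iterable of (rel_id, related_from, related_to, rel_type)
        tuples (assumed layout for ContractRelationship rows).
    Returns the set of (x, y) pairs for which RelatedContract(x, y) holds.
    """
    # Base rule: RelatedContract(x, y) :- Contract(y), ContractRelationship(z),
    #   Related_From(z, y), Related_To(z, x),
    #   Contract_Relationship_type(z, 'supplemental')
    facts = {
        (related_to, related_from)
        for (_rel_id, related_from, related_to, rel_type) in relationships
        if rel_type == "supplemental" and related_from in contracts
    }

    # Recursive rule: RelatedContract(x, y) :- RelatedContract(x, z),
    #   RelatedContract(z, y). Repeat until no new facts appear.
    changed = True
    while changed:
        changed = False
        for (x, z) in list(facts):
            for (z2, y) in list(facts):
                if z == z2 and (x, y) not in facts:
                    facts.add((x, y))
                    changed = True
    return facts


# Hypothetical sample data.
SAMPLE_CONTRACTS = {"c1", "c2", "c3"}
SAMPLE_RELATIONSHIPS = [
    ("r1", "c2", "c1", "supplemental"),
    ("r2", "c3", "c2", "supplemental"),
    ("r3", "c1", "c3", "other"),
]
```

Here `related_contracts(SAMPLE_CONTRACTS, SAMPLE_RELATIONSHIPS)` derives the pair (c1, c3) from (c1, c2) and (c2, c3), and ignores r3 because its type is not 'supplemental'. A production engine would more likely use semi-naive evaluation or push the recursion into SQL.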
IBM China Research Laboratory
© 2005 IBM Corporation
IBM SOR - Scalable Ontology Repository
Efficient management for large-scale OWL ontologies (millions of statements)
DBMSs Supported
– IBM DB2 (Powerful Persistent Storage)
– Derby (http://incubator.apache.org/derby/, Embedded Storage)
– SQL Server (Powerful Persistent Storage)
– Oracle (Powerful Persistent Storage)
Query Language
– W3C SPARQL Query Language
Inference Engines
– Pellet
– Structural TBox Engine (IBM CRL)
Memory Model
– EODM (EMF based Ontology Definition Metamodel, OMG’s recommendation) (http://www.eclipse.org/emft/projects/eodm/)
SOR Architecture
[Figure: SOR architecture diagram. Recoverable labels:]
– Stages: Import, Storage, Reasoning, Query Answering
– Components: OWL documents, OWL Parser, DB Translator, Persistent Store, TBox Translator, Query Adaptor, DL Reasoner, Rule Inference Engine, Simplified Datalog Engine, Enhanced Datalog Engine, SPARQL Processor, Users
– SHER: DL Reasoner, ABox Summarizer, ABox Filters (membership and relationship query, …)
– Flow labels: Generate EODM models from documents; Load and traverse EODM ABox; Insert ABox assertions into DB; Load and traverse EODM TBox; Insert TBox; Retrieve TBox; Retrieve subsumption; Generate reasoning task; Insert data for reasoning; Retrieve data for query answering & reasoning; Generate SPARQL memory model; SPARQL queries and results; SPARQL2SQL translation; Results returned by SQL; Return results
IBM SHER – Scalable Highly Expressive Reasoner
SHER supports SHIN (a subset of OWL-DL) ontologies. SHER is better than state-of-the-art reasoners in terms of scalability and performance.
Design Principles
Three key steps to exposing relational data as RDF data:
– creating an RDF Representation (ontology) of the relational data,
– building a mapping between the relational database and ontology, and
– rewriting SPARQL queries to retrieve the relational data.
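The three steps can be sketched end to end for a single, very restricted query shape. Everything named here is an assumption for illustration: the `company` table layout, the mapping dictionaries, the `http://example.org/` namespace, and the choice to handle only the pattern `?s rdf:type :Class ; :property ?o`. A real rewriter such as those listed earlier would parse full SPARQL.

```python
import sqlite3

BASE_URI = "http://example.org/"  # assumed namespace

# Steps 1 and 2 (assumed): ontology terms and their mapping to the database.
# class -> (table, key column); property -> (table, column)
CLASS_TO_TABLE = {"Company": ("company", "id")}
PROPERTY_TO_COLUMN = {
    "name": ("company", "name"),
    "business1": ("company", "business1"),
}


def rewrite_pattern(class_name, property_name):
    """Step 3: rewrite { ?s rdf:type :Class ; :property ?o } into SQL."""
    table, key = CLASS_TO_TABLE[class_name]
    prop_table, column = PROPERTY_TO_COLUMN[property_name]
    if prop_table != table:
        raise ValueError("sketch only handles properties stored on the class table")
    # NULL cells produce no triple, so they are filtered out.
    return (
        f"SELECT {key}, {column} FROM {table} "
        f"WHERE {column} IS NOT NULL ORDER BY {key}"
    )


def query_as_triples(conn, class_name, property_name):
    """Run the rewritten SQL and present each row as an RDF-style triple."""
    table, _key = CLASS_TO_TABLE[class_name]
    rows = conn.execute(rewrite_pattern(class_name, property_name))
    return [
        (f"{BASE_URI}{table}/{row_id}", f"{BASE_URI}{property_name}", value)
        for row_id, value in rows
    ]


def demo_connection():
    """In-memory database holding a few rows of the company info table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, business1 TEXT)"
    )
    conn.executemany(
        "INSERT INTO company VALUES (?, ?, ?)",
        [(1, "BAR", "Memory"), (2, "FOO", "Optical comm."), (5, "GUC", None)],
    )
    return conn
```

Calling `query_as_triples(demo_connection(), "Company", "business1")` yields one triple each for BAR and FOO and none for GUC, whose Business1 cell is NULL. The data stays in the relational store; only the query and the result shape change.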
Design Questions
URI Generation for Relational Data
– For classes, properties and instances
N-ary Relationship Representation
Representation of RDB Schema Constraints in Mapping
Effective Query Rewriting and Optimization
Reasoning
Performance, Security
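For the first question, one possible URI scheme derives class URIs from table names, property URIs from column names, and instance URIs from primary-key values. The scheme and the base namespace below are assumptions chosen for illustration, not a convention proposed in the talk.

```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # assumed base namespace


def class_uri(table):
    """One class per table."""
    return f"{BASE}{quote(table, safe='')}"


def property_uri(table, column):
    """One property per column, scoped by its table."""
    return f"{BASE}{quote(table, safe='')}#{quote(column, safe='')}"


def instance_uri(table, key_values):
    """One instance per row, identified by its primary-key values.

    key_values: dict of primary-key column -> value; columns are sorted so
    that composite keys always produce the same URI.
    """
    key_part = ";".join(
        f"{quote(str(col), safe='')}={quote(str(val), safe='')}"
        for col, val in sorted(key_values.items())
    )
    return f"{BASE}{quote(table, safe='')}/{key_part}"
```

For example, `instance_uri("company", {"id": 4})` gives `http://example.org/db/company/id=4`. Percent-encoding the values keeps the URIs valid when keys contain spaces or reserved characters.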
Master Data
“Master data is data that is shared across systems (such as lists or hierarchies of customers, suppliers, accounts, or organizational units) and is used to classify and define transactional data.” [IDC]
Example: Sell Product A to Customer X on 1/1/06 for $100.
With Master Data, we should be able to answer questions such as:
What is a “customer”?
– It is a subclass of People with the specific attributes A, B, C …
How do we add a new customer?
– Defines the workflow
How do we know that two customers refer to the same identity?
– Defines some business rules
What Is Master Data Management?
– Decouples master information from individual applications
– Becomes a central, application-independent resource
– Simplifies ongoing integration tasks and new application development
– Ensures consistent master information across transactional and analytical systems
– Addresses key issues such as data quality and consistency proactively rather than “after the fact” in the data warehouse
[Figure labels: Master Data Management System; Master Data; Existing Applications; New Applications; Historical/Analytical Systems]
Source: IBM EMDS team