Unity Demonstration

Dr. Ramon Lawrence
University of Iowa
ramon-lawrence@uiowa.edu

Outline
- Motivation and Background
- Two basic integration approaches:
  - global as view (GAV)
  - local as view (LAV)
- What is the open problem?
- How Unity is different
- Using Unity example
- Benefits and Contributions
- Future Work


Motivation
There are many integration environments:
- Operational systems within an organization
- System integration during company merger
- Data warehouses, Intranets, and the WWW

Users require information from many data sources which often do not work together.


What is Integration?
Two levels of integration:
- Schema integration: the description of the data
- Data integration: the individual data instances

Integration handles the different mechanisms for storing data (structural conflicts), for referencing data (naming conflicts), and for attributing meaning to the data (semantic conflicts).


Two Current Approaches
The current state-of-the-art integration systems can all be reduced to a logical basis.

For this demo, assume the data is physically stored in the relational model and queried using Datalog.

There are two basic "database" approaches to integration:
- global as view approach: the extraction and integration of data is defined simultaneously with the global view definition
  - Example: TSIMMIS using the Object Exchange Model (OEM)
- local as view approach: pre-defines the global view and then defines what portion of the global view each local source provides
  - Example: Information Manifold using description logic


BodyWorks Systems
[Figure: BodyWorks systems. Components shown: Web Server, Custom Accounting Package, Shipment Tracking Software, Customer, Order Database, Invoice Database, Shipment Database.]


Question: Who has a complete picture of a customer's order, or the entire customer relationship?


Answer: No one, but management wants to know...


Data Warehouse Approach
[Figure: the Order, Invoice, and Shipment databases each feed stages labeled Gather, Refine, Aggregate, and Store, which load a central Warehouse.]

Features:
- static, materialized view
- performs data cleansing and aggregation
- historical more than operational


Query-Driven Dynamic Approach
[Figure: a mediator sits above three wrappers, one for each database.]

OrderDatabase:
  Cust(id,name,addr,city,state,cty)
  Order(oid,cid,odate)
  OrdProd(oid,pid,amt,pr)
  Prod(id,name,pr,desc)

InvoiceDatabase:
  Cust(id,name,addr,city,state,cty)
  Invoice(invId,custId,shipId,iDate)
  InvProd(invId,prodId,amt,pr)
  Prod(id,name,pr,desc)

ShipmentDatabase:
  Cust(id,name,addr,city,state,cty)
  Shipment(shipid,oid,cid,shipdate)
  ShipProd(shipid,prodid,amt)
  Prod(id,name,pr,desc,inv)

Features:
- view dynamically built
- data is extracted at query-time
- still typically read-only
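The wrapper/mediator arrangement on this slide can be sketched as follows. This is only an illustration under stated assumptions: the class names (`Wrapper`, `ListWrapper`, `Mediator`), the `scan` method, and the sample rows are hypothetical and not part of Unity or any system named in the slides. The point it shows is that nothing is materialized ahead of time; the sources are read when the query arrives.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Wrapper(Protocol):
    """Uniform read-only interface that every source-specific wrapper exposes."""

    def scan(self, relation: str) -> Iterable[Row]: ...


class ListWrapper:
    """Toy wrapper over in-memory rows; a real wrapper would talk to a database."""

    def __init__(self, name: str, relations: Dict[str, List[Row]]):
        self.name = name
        self._relations = relations

    def scan(self, relation: str) -> Iterable[Row]:
        # Tag each row with its source so the caller can see where it came from.
        for row in self._relations.get(relation, []):
            yield {**row, "source": self.name}


class Mediator:
    """Builds the answer dynamically by asking every wrapper at query time."""

    def __init__(self, wrappers: Iterable[Wrapper]):
        self._wrappers = list(wrappers)

    def query(self, relation: str) -> List[Row]:
        rows: List[Row] = []
        for wrapper in self._wrappers:
            rows.extend(wrapper.scan(relation))
        return rows


# Hypothetical sample data using the Cust relation from the slide.
order_db = ListWrapper("OrderDatabase", {"Cust": [{"id": 1, "name": "Ann"}]})
shipment_db = ListWrapper(
    "ShipmentDatabase",
    {"Cust": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]},
)
mediator = Mediator([order_db, shipment_db])
cust_rows = mediator.query("Cust")
```

Because `query` re-reads the sources on every call, a row added to a source after the mediator was created still appears in the next answer, which is the contrast with the static warehouse approach.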


Global as View Approach
Define global objects by specifying how to extract their information from the local sources.

Requires that the administrator defining the global view understand the semantics of every local data source.

Further, if the local views or global views must be changed for whatever reason (such as adding a new data source), the global view must be re-compiled.


Global as View Example
TSIMMIS MSL example extracting customer info:

<f(I) customer {<id I> <name N> <addr A>}>@med :-
    <customer {<id I> <name N> <addr A>}>@invoiceDB

<f(I) customer {<id I> <name N> <addr A>}>@med :-
    <customer {<id I> <name N> <addr A>}>@orderDB

<f(I) customer {<id I> <name N> <addr A>}>@med :-
    <customer {<id I> <name N> <addr A>}>@shipmentDB

Equivalent SQL:
Union the results of the following 3 queries (matching ids if possible):
  orderDB:    SELECT * FROM customer
  invoiceDB:  SELECT * FROM customer
  shipmentDB: SELECT * FROM customer
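The "union, matching ids if possible" step can be sketched in a few lines. This is a minimal illustration, not TSIMMIS code: it assumes each source is a SQLite database with a `customer(id, name, addr)` table, and the helper names (`make_source`, `global_customers`) and sample rows are hypothetical. The dictionary keyed by customer id plays roughly the role of `f(I)` in the rules above, giving one global object per id.

```python
import sqlite3


def make_source(rows):
    """Create a throwaway in-memory source with a customer table (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER, name TEXT, addr TEXT)")
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    return con


# Hypothetical contents of the three local sources.
sources = {
    "orderDB": make_source([(1, "Ann", None)]),
    "invoiceDB": make_source([(1, "Ann", None), (2, "Bob", None)]),
    "shipmentDB": make_source([(2, "Bob", "12 Elm St")]),
}


def global_customers(sources):
    """Union the customer rows of all sources, merging rows that share an id."""
    merged = {}
    for con in sources.values():
        for cid, name, addr in con.execute("SELECT * FROM customer"):
            rec = merged.setdefault(cid, {"id": cid, "name": None, "addr": None})
            # Keep the first non-empty value seen for each attribute.
            rec["name"] = rec["name"] or name
            rec["addr"] = rec["addr"] or addr
    return [merged[cid] for cid in sorted(merged)]


customers = global_customers(sources)
```

Note that the extraction logic names every source explicitly, which is why adding a source under GAV means revisiting the global view definition.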


Global as View Example (2)
Extract all orders with invoices and shipments:

<shipInvOrd {<shipment S> <invoice I> <order O>}>@med :-
    <shipment {<shipid S> <oid O>}>@shipmentDB AND
    <order {<oid O>}>@orderDB AND
    <invoice {<invId I> <shipId S>}>@invoiceDB

Equivalent SQL (if possible to query multiple databases):

SELECT shipment.shipid, invoice.invId, "order".oid
FROM shipment, invoice, "order"
WHERE shipment.shipid = invoice.shipId AND
      shipment.oid = "order".oid
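One setting where a single SQL statement really can span several databases is SQLite's `ATTACH DATABASE`. The sketch below uses it purely as a stand-in for the three BodyWorks sources; the sample rows are hypothetical, and `order` is quoted because it is a reserved word in SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Each ATTACH of ':memory:' creates a separate in-memory database.
for name in ("orderDB", "invoiceDB", "shipmentDB"):
    con.execute(f"ATTACH DATABASE ':memory:' AS {name}")

con.execute('CREATE TABLE orderDB."order" (oid INTEGER, cid INTEGER)')
con.execute("CREATE TABLE invoiceDB.invoice (invId INTEGER, shipId INTEGER)")
con.execute("CREATE TABLE shipmentDB.shipment (shipid INTEGER, oid INTEGER)")

# Hypothetical data: order 1 was shipped and invoiced, order 2 was not.
con.executemany('INSERT INTO orderDB."order" VALUES (?, ?)', [(1, 10), (2, 11)])
con.executemany("INSERT INTO shipmentDB.shipment VALUES (?, ?)", [(100, 1)])
con.executemany("INSERT INTO invoiceDB.invoice VALUES (?, ?)", [(500, 100)])

rows = con.execute(
    """
    SELECT s.shipid, i.invId, o.oid
    FROM shipmentDB.shipment AS s, invoiceDB.invoice AS i, orderDB."order" AS o
    WHERE s.shipid = i.shipId AND s.oid = o.oid
    """
).fetchall()
```

When the sources live in different database systems, no such cross-database statement is available and the join has to be performed by the mediator instead.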


Local as View Approach
Pre-define an integrated global view (GV) that encompasses the information present in all sources.

For each local source, specify the local view as a subset of the information available in the GV.

Building the GV is typically not discussed. However, the LAV approach makes it easier to add or remove sources because the GV does not have to be updated.

Query processing using the LAV approach is more difficult than with the GAV approach because the query processor has to determine what information can be extracted from the views.


Local as View Example
Consider this global customer relation in the GV:

customer(id, name, addr)

Assume that the order, invoice, and shipment databases each contain a customer record only if the customer had an order, invoice, or shipment, respectively. Further, assume that only shipmentDB contains a customer address.

Local views of each source:

orderView(C,N) :- customer(C,N,A)
invoiceView(C,N) :- customer(C,N,A)
shipView(C,N,A) :- customer(C,N,A)


Local as View Example (2)
Let the user pose the following query:

q(N) :- customer(I, N, A)

The query asks for all customer names. The query processor must determine which views are relevant (in this case all of them).

Local queries on each source:

q(N) :- orderView(C,N)
q(N) :- invoiceView(C,N)
q(N) :- shipView(C,N,A)
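For this single-relation example, deciding which views are relevant reduces to checking which views export the attributes the query needs. The sketch below shows only that simplified case; the `VIEWS` table and function names are hypothetical, and rewriting general LAV queries (with joins and selections) is considerably harder than this.

```python
# Global relation from the example: customer(id, name, addr).
# Each local view is described by the global attributes it exports.
VIEWS = {
    "orderView": ("id", "name"),
    "invoiceView": ("id", "name"),
    "shipView": ("id", "name", "addr"),
}

# Datalog variable used for each global attribute, as on the slide.
VARIABLES = {"id": "C", "name": "N", "addr": "A"}


def relevant_views(needed, views=VIEWS):
    """Return the views that export every attribute the query needs."""
    needed = set(needed)
    return [name for name, attrs in views.items() if needed <= set(attrs)]


def local_queries(head_attr, views=VIEWS):
    """Render one Datalog-style local query per relevant view."""
    queries = []
    for name in relevant_views([head_attr], views):
        args = ",".join(VARIABLES[attr] for attr in views[name])
        queries.append(f"q({VARIABLES[head_attr]}) :- {name}({args})")
    return queries


name_queries = local_queries("name")
addr_views = relevant_views(["addr"])
```

Asking for names selects all three views, while asking for addresses selects only `shipView`, matching the assumption that only shipmentDB stores an address.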


What is the open problem?
The two approaches are both viable methods for solving data integration.

However, the open problem is that neither approach performs schema integration - the construction of the global view itself.
- GAV: the GV is constructed (schema integration performed) by the global designer when specifying extraction rules
- LAV: the GV is pre-defined using some previous integration process (most likely manual in nature)

Both methods rely on the concept of a global user to create the global schema.


How Unity is Different
Our integration architecture, called Unity, is different because it approaches the integration problem from a different perspective:

How can we automate, or semi-automate, the construction of the global view by extracting information from the local data sources?

Thus, the integration problem is tackled from a different set of starting assumptions:
- Do not assume a pre-existing or manually created GV.
- However, assume we have a dictionary and a language for describing schema and data element semantics.
- Attempt to automatically build a GV from source descriptions of each data source.


The Unity Approach
Given a set of data sources and a dictionary and language to describe data semantics:

1) Semi-automatically extract and represent data source semantics in the language using the dictionary.
2) Automatically match concepts across data sources by using the dictionary to determine related concepts.
   - This process effectively builds the global level relations or objects initially assumed or created in other approaches.
   - However, since there is no manual intervention, the precision of global view construction is affected by inconsistencies in the descriptions of the data sources and matching concepts.
3) Automatically generate queries specified by the user using dictionary terms (not structures) and map the user's query to appropriate data elements in the local sources.


Unity Overview
Unity is a software package that implements the integration architecture with a GUI.
- Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC).

Unity allows the user to:
- Construct and modify standard dictionaries
- Build X-Specs to describe data sources
- Integrate X-Specs into an integrated view
- Transparently query integrated systems using ODBC and automatically generate SQL transactions


Unity Example: Step #1 - Standard Dictionary
A standard dictionary (SD) provides standardized terms to capture data semantics.
- Hierarchy of terms related by IS-A or HAS-A links
- Contains a base set of common database concepts, but new concepts can be added

An SD term is a single, unambiguous semantic definition.

Several SD entries for a single English word are required if the word has multiple definitions.

The top-level dictionary terms are those proposed by Sowa.
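A standard dictionary of this kind can be pictured as a tree of terms, each linked to its parent by IS-A or HAS-A. The sketch below is an assumption-laden illustration: the class names, the root term `Entity`, and the sample terms and definitions are hypothetical and are not taken from Unity's actual dictionary or from Sowa's top-level terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Term:
    """One unambiguous sense; a word with two senses needs two Term entries."""

    name: str
    definition: str
    parent: Optional["Term"] = None
    link: Optional[str] = None  # "IS-A" or "HAS-A" link to the parent
    children: List["Term"] = field(default_factory=list)


class StandardDictionary:
    def __init__(self, root_name: str = "Entity"):
        # Hypothetical root; the slides only say the top level follows Sowa.
        self.root = Term(root_name, "hypothetical top-level term")
        self._terms: Dict[str, Term] = {root_name: self.root}

    def add(self, name: str, definition: str, parent: str, link: str) -> Term:
        if name in self._terms:
            raise ValueError(f"term {name!r} is already defined")
        if link not in ("IS-A", "HAS-A"):
            raise ValueError("link must be 'IS-A' or 'HAS-A'")
        parent_term = self._terms[parent]
        term = Term(name, definition, parent_term, link)
        parent_term.children.append(term)
        self._terms[name] = term
        return term

    def path(self, name: str) -> List[str]:
        """Names from the root down to the given term."""
        term: Optional[Term] = self._terms[name]
        names: List[str] = []
        while term is not None:
            names.append(term.name)
            term = term.parent
        return names[::-1]


sd = StandardDictionary()
sd.add("Customer", "a party that buys goods", "Entity", "IS-A")
sd.add("Address", "a postal location", "Customer", "HAS-A")
# Two senses of the English word "order" get two separate entries.
sd.add("Order (purchase)", "a request to buy goods", "Entity", "IS-A")
sd.add("Order (sequence)", "an arrangement of items", "Entity", "IS-A")
address_path = sd.path("Address")
```

Rejecting duplicate names in `add` is one simple way to keep each entry tied to a single definition.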


Unity Example: Step #2 - Data Extraction
For each data source, an X-Spec document is constructed that consists of:
- field, table, key, and join information extracted from the ODBC source
- assignment of semantic names for each field and table

Semantic names combine dictionary terms to describe the semantics of schema elements.

semantic name := [CT_Type] | [CT_Type] PN
CT_Type := CT | CT {; CT} | CT {, CT}
CT := context term, PN := property name
each CT and PN is a single term from the dictionary
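The grammar above is small enough to parse with a regular expression. The sketch below assumes the square brackets are literal characters in a semantic name (for example `[Order;Product] Price`) and that a name uses either `;` or `,` as its separator but not both; the example terms and the `SemanticName` type are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class SemanticName(NamedTuple):
    contexts: Tuple[str, ...]  # one or more context terms (CT)
    separator: Optional[str]   # ";" or "," when there are several CTs
    prop: Optional[str]        # optional property name (PN)


# "[" CT_Type "]" followed by an optional single-word property name.
_PATTERN = re.compile(r"^\[([^\[\]]+)\]\s*(\w+)?$")


def parse_semantic_name(text: str) -> SemanticName:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    body, prop = match.group(1), match.group(2)
    has_semicolon, has_comma = ";" in body, "," in body
    if has_semicolon and has_comma:
        raise ValueError("a semantic name may not mix ';' and ','")
    separator = ";" if has_semicolon else "," if has_comma else None
    parts = body.split(separator) if separator else [body]
    contexts = tuple(part.strip() for part in parts)
    if any(not ct for ct in contexts):
        raise ValueError(f"empty context term in {text!r}")
    return SemanticName(contexts, separator, prop)


price = parse_semantic_name("[Order;Product] Price")
customer = parse_semantic_name("[Customer]")
```

A fuller implementation would also check that every CT and PN is actually a term in the standard dictionary, which this sketch leaves out.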


Unity Example: Step #2 - Data Extraction (2)
Semantic names are initially assigned using an automatic algorithm which attempts to find the best matches.

The integrator can then refine initial semantic name assignments.

Semantic names have two major purposes:
- used as a means for describing, documenting, and comparing concepts across systems
- allow information in the database (and later in the integrated view) to be organized by semantic concept instead of using structures or relations
  - This simplifies querying the database and integrated view because the information is not divided into normalized relations.
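The slides do not describe how the automatic assignment finds its best matches. As a stand-in, the sketch below uses plain string similarity from Python's `difflib` to suggest dictionary terms for the pieces of a field name; this is an assumed technique chosen for illustration, not Unity's algorithm, and the function name and term list are hypothetical.

```python
import difflib
import re
from typing import List, Sequence


def suggest_terms(
    field_name: str, dictionary_terms: Sequence[str], cutoff: float = 0.6
) -> List[str]:
    """Suggest one dictionary term per token of a database field name."""
    by_lower = {term.lower(): term for term in dictionary_terms}
    suggestions: List[str] = []
    # Split names such as "cust_name" on underscores and other punctuation.
    for token in re.split(r"[_\W]+", field_name.lower()):
        if not token:
            continue
        close = difflib.get_close_matches(token, list(by_lower), n=1, cutoff=cutoff)
        if close:
            suggestions.append(by_lower[close[0]])
    return suggestions


terms = ["Customer", "Name", "Address", "Order"]
suggested = suggest_terms("cust_name", terms)
```

Suggestions like these would only be a starting point; as the slide notes, the integrator refines the initial assignments.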


Unity Example: Step #3 - Schema Building
Unlike previous approaches, the global view (or schema) is constructed automatically by combining source specifications (X-Specs).

This is possible because semantic naming of concepts allows matching across systems:
- The same semantic name in two databases is assumed to represent the same concept.
- The hierarchical nature of semantic names (consisting of multiple terms) allows a schema to be built up from pieces of relations or objects from each data source.

Effectively, the global view is synthesized by the union of concepts in the underlying systems.
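The "union of concepts" idea can be sketched as merging per-source name tables into one tree keyed by context terms. The data layout here is an assumption: each X-Spec is reduced to a mapping from semantic name to `(table, field)`, only the `;` separator is handled, and all names and sample entries are hypothetical.

```python
def new_node():
    """A concept in the integrated view: nested contexts plus its properties."""
    return {"contexts": {}, "properties": {}}


def split_name(semantic_name):
    """Split '[Order;Product] Price' into (['Order', 'Product'], 'Price')."""
    context_part, _, prop = semantic_name.partition("]")
    contexts = [ct.strip() for ct in context_part.lstrip("[").split(";")]
    return contexts, (prop.strip() or None)


def integrate(xspecs):
    """Merge {database: {semantic_name: (table, field)}} into one view."""
    root = new_node()
    for database, names in xspecs.items():
        for semantic_name, (table, column) in names.items():
            contexts, prop = split_name(semantic_name)
            node = root
            for ct in contexts:
                # Identical context terms from different sources share a node.
                node = node["contexts"].setdefault(ct, new_node())
            if prop is not None:
                node["properties"].setdefault(prop, []).append(
                    (database, table, column)
                )
    return root


# Hypothetical X-Spec contents for two of the BodyWorks databases.
xspecs = {
    "orderDB": {
        "[Customer] Name": ("Cust", "name"),
        "[Order] Date": ("Order", "odate"),
    },
    "shipmentDB": {
        "[Customer] Name": ("Cust", "name"),
        "[Customer] Address": ("Cust", "addr"),
    },
}
view = integrate(xspecs)
```

Each property keeps the list of physical locations that supply it, which is the information a query processor would need later to map a semantic name back to tables and fields.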


Unity Example: Step #4 - Query Processing
The query processor:
- Allows the user to formulate queries on the view.
- Translates from semantic names in the context view to structural queries (SQL) on databases.
  - Involves determining correct field and table mappings and discovery of join conditions and join paths
- Retrieves query results and formats them for display to the user.

Client-side query processing:
- Perform joins between databases using common keys.
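Translating semantic names into SQL involves two lookups: which table and field each name maps to, and which join conditions connect those tables. The sketch below illustrates this for a single database using a breadth-first search over recorded joins; it is not Unity's query processor. The `MAPPINGS` and `JOINS` tables are hypothetical, and the order table is called `Orders` here to avoid the reserved word.

```python
from collections import deque

# Hypothetical mappings from semantic names to (table, field).
MAPPINGS = {
    "[Customer] Name": ("Cust", "name"),
    "[Order] Date": ("Orders", "odate"),
    "[Order;Product] Amount": ("OrdProd", "amt"),
}

# Hypothetical join information of the kind an X-Spec records.
JOINS = {
    ("Cust", "Orders"): "Cust.id = Orders.cid",
    ("Orders", "OrdProd"): "Orders.oid = OrdProd.oid",
}


def neighbors(table):
    """Yield (other_table, condition) for every join touching the table."""
    for (left, right), condition in JOINS.items():
        if left == table:
            yield right, condition
        elif right == table:
            yield left, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for other, condition in neighbors(table):
            if other not in seen:
                seen.add(other)
                queue.append((other, steps + [(other, condition)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_sql(semantic_names):
    """Generate a SELECT statement for the requested semantic names."""
    columns = [MAPPINGS[name] for name in semantic_names]
    first_table = columns[0][0]
    tables, conditions = [first_table], []
    for table, _ in columns:
        for step_table, condition in join_path(first_table, table):
            if step_table not in tables:
                tables.append(step_table)
            if condition not in conditions:
                conditions.append(condition)
    select_list = ", ".join(f"{table}.{column}" for table, column in columns)
    sql = f"SELECT {select_list} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


sql = build_sql(["[Customer] Name", "[Order;Product] Amount"])
```

The user names only concepts; the intermediate `Orders` table and both join conditions are discovered from the recorded joins rather than written by hand.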


Benefits and Contributions
The architecture automatically integrates relational schemas into a global view for querying.

Unique contributions:
- Synthesizing a global view from the bottom-up instead of top-down. This should improve integration scalability.
- Organizing the global view as a hierarchy of concepts instead of relations or predicates simplifies querying, similar to the Universal Relation, as the user does not have to specify specific predicates/relations or join conditions.
- Query processing is achieved by dynamically discovering extraction rules. The discovered rules are similar to extraction rules of GAV systems.


Future Work
Unity performs schema integration by extracting data source information and performing global joins.

However, the global query processor needs to be extended to handle more diverse queries involving:
- aggregation and grouping, recursive queries, queries with selection conditions that span data sources
- support for typical data integration problems of scaling, data type conversions, and translation of units

Synthesizing the global view by combining concepts can be improved by exploiting dictionary knowledge:
- Use IS-A relationships in the dictionary to improve matching.
- Determine when to create new global level attributes and contexts that are discovered based on interschema relationships.


References
Publications:
- Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, Jan. 2000.
- Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000.
- Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, pages 225-230, March 2001.
- Querying Relational Databases without Explicit Joins, DASWIS 2001 - International Workshop on Data Semantics in Web Information Systems (with ER'2001), Nov. 2001.

Further Information: http://www.cs.uiowa.edu/~rlawrenc/


Extra Slides



Data Warehouse Approach
[Figure: each database feeds stages labeled Gather, Refine, Aggregate, and Store, which load a central Warehouse.]

OrderDatabase:
  Cust(id,name,addr,city,state,cty)
  Order(oid,cid,odate)
  OrdProd(oid,pid,amt,pr)
  Prod(id,name,pr,desc)

InvoiceDatabase:
  Cust(id,name,addr,city,state,cty)
  Invoice(invId,custId,invDate)
  InvProd(invId,prodId,amt,pr)
  Prod(id,name,pr,desc)

ShipmentDatabase:
  Cust(id,name,addr,city,state,cty)
  Shipment(shipid,oid,cid,shipdate)
  ShipProd(shipid,prodid,amt)
  Prod(id,name,pr,desc,inv)


Integration Architecture
Architecture Components:
1) Integrated Context View
   - user’s view of integration
2) X-Spec Editor
   - stores schema & metadata
   - uses XML
3) Standard Dictionary
   - terms to express semantics
4) Integration Algorithm
   - combines X-Specs into integrated context view
5) Query Processor
   - accepts query on view
   - determines data source mappings and joins
   - executes queries and formats results

[Figure: architecture diagram. Labels shown: Client, Multidatabase Layer, Integrated Context View, Query Processor and ODBC Manager, Integration Algorithm, Standard Dictionary, X-Spec Editor, X-Spec, Subtransactions, Local Transactions, Database.]


Architecture Components
The architecture consists of four components:
- A standard dictionary (SD) to capture data semantics
  - SD terms are used to build semantic names describing semantics of schema elements.
- X-Specs for storing data semantics
  - Database metadata and semantic names stored using XML
- Integration Algorithm
  - Matches concepts in different databases by semantic names.
  - Produces an integrated view of all database concepts.
- Query Processor
  - Allows the user to formulate queries on the view.
  - Translates from semantic names in integrated view to SQL queries and integrates and formats results.
  - Involves determining correct field and table mappings and discovery of join conditions and join paths
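Since X-Specs store database metadata and semantic names as XML, a document of that kind might be built as shown below. The slides do not give the X-Spec schema, so every element and attribute name here (`xspec`, `table`, `field`, `semanticName`, `key`) is a hypothetical choice, as are the sample table and semantic names.

```python
import xml.etree.ElementTree as ET


def build_xspec(database, tables):
    """Build an XML tree describing tables, fields, keys, and semantic names."""
    root = ET.Element("xspec", {"database": database})
    for table in tables:
        table_el = ET.SubElement(
            root,
            "table",
            {"name": table["name"], "semanticName": table["semantic_name"]},
        )
        for column in table["fields"]:
            ET.SubElement(
                table_el,
                "field",
                {
                    "name": column["name"],
                    "semanticName": column["semantic_name"],
                    "key": "true" if column.get("key") else "false",
                },
            )
    return root


# Hypothetical description of the Cust table in the order database.
xspec = build_xspec(
    "orderDB",
    [
        {
            "name": "Cust",
            "semantic_name": "[Customer]",
            "fields": [
                {"name": "id", "semantic_name": "[Customer] Id", "key": True},
                {"name": "name", "semantic_name": "[Customer] Name"},
            ],
        }
    ],
)
xml_text = ET.tostring(xspec, encoding="unicode")
```

Keeping the physical name and the semantic name side by side on each element is what lets the later steps match concepts by meaning while still knowing which table and field to query.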


Integration Processes
The integration architecture consists of three separate processes:
- Capture process: independently extracts database schema information and metadata into an XML document called an X-Spec.
- Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view.
- Query process: allows the user to formulate queries on the integrated view; the query processor maps them to structural queries (SQL), and the results are integrated and formatted.


Architecture Components: Dictionary vs. Knowledge Base
The standard dictionary differs from a knowledge base such as Cyc because:
- Not intended to be a general English dictionary or contain knowledge facts about the world
  - Dictionary is evolved as new terms are required
  - Not all English words are used
- Dictionary provides the system with no “knowledge”
  - Since no facts are stored, the system cannot deduce new facts
  - Dictionary terms are just semantic placeholders; integrators determine the semantics of the database, not the system
- Simplified organization
  - Dictionary is organized as a tree for efficiency and simplicity in determining related concepts
- Re-use of terms
  - Terms are re-used in semantic names
