Page 1: Generic Model Management A Database Infrastructure for  Schema Manipulation

© 2001 Microsoft Corp. 1

Generic Model Management

A Database Infrastructure for Schema Manipulation

Philip A. Bernstein

Microsoft Corporation

September 6, 2001

Page 2

The Problem

There are 30 years of DB research on meta data, but we don’t have great infrastructure to offer.
– Most design tools and web services store meta data in files, not DBs
– OODBMSs are not a huge success
– Most meta data driven tools use their own infrastructure

Goal: generic meta data manipulation infrastructure
– Reduce the amount of programming required to build meta data driven applications

Proposal: Model Management
– Define an algebra to manipulate meta data in large chunks, called models and mappings

Page 3

Outline

• Overview of Model Management

• Solutions to classical meta data problems

• Recent technical results

Page 4

Models and Mappings

• Model – a complex information structure
– XML schema, SQL schema, OO interface, UML model, web site map, make script, …
• Mapping – a transformation from one model into another
– Map between two XML schemas
– Map a SQL schema to an XML schema
– Map data sources to a data warehouse
– Map an ER diagram to a SQL schema
– Map a process defn to a workflow script

Page 5

Representation

A model is a directed graph with one root.

A mapping is a model each of whose nodes connects nodes of two other models.

[Figure: a relational schema Emp (E#, Dept#, Name) and an XSD Emp (E#, Dept#, Name, First, Last), connected by the mapping map1.]
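A minimal Python sketch of this representation, assuming a model can be stored as a root plus parent-to-children edges. The class name `Model`, the `links` dictionary, the node ids m1 to m3, and the nesting of First and Last under Name are illustrative assumptions, not something the slide specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph: a root name plus parent -> children edges."""
    root: str
    edges: dict = field(default_factory=dict)

    def nodes(self):
        """Return every node reachable from the root, depth-first."""
        seen, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.append(node)
                stack.extend(reversed(self.edges.get(node, [])))
        return seen


rel = Model("Emp", {"Emp": ["E#", "Dept#", "Name"]})
xsd = Model("Emp", {"Emp": ["E#", "Dept#", "Name"], "Name": ["First", "Last"]})

# The mapping is itself a model; each of its non-root nodes connects
# one node of `rel` to one node of `xsd` (hypothetical node ids m1..m3).
map1 = Model("map1", {"map1": ["m1", "m2", "m3"]})
links = {"m1": ("E#", "E#"), "m2": ("Dept#", "Dept#"), "m3": ("Name", "Name")}
```

Keeping the mapping as an ordinary model means the same graph operations apply to models and mappings alike.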

Page 6

Model Management Algebra

• Match

• Merge

• Compose

• Select

• Diff

• Enumerate

• ApplyFunction

• Copy

• Update operations

Page 7

map = Match(M1, M2, )

• Match(M1, M2, ) returns the best mapping between M1 and M2 w.r.t. its third argument

[Figure: M1 = Emp (E#, Dept#, Name, Addr); M2 = Emp (E#, Dept#, Name, First, Last, Phone); map1 links the elements of M1 and M2 that are judged equal (=).]
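To make the idea concrete, here is a deliberately naive Match sketch in Python. It assumes models are flat lists of element names and that "best" simply means case-insensitive name equality; the `same` parameter is a hypothetical stand-in for the third argument of Match, whose symbol is missing above. Real matchers are far more elaborate.

```python
def match(m1, m2, same=lambda a, b: a.lower() == b.lower()):
    """Return correspondences (a, b) for elements that `same` judges equal."""
    return [(a, b) for a in m1 for b in m2 if same(a, b)]


M1 = ["Emp", "E#", "Dept#", "Name", "Addr"]
M2 = ["Emp", "E#", "Dept#", "Name", "First", "Last", "Phone"]

map1 = match(M1, M2)
```

Addr, First, Last, and Phone stay unmatched, which is what later operations such as Merge and Diff rely on.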

Page 8

M3 = Merge(M1, M2, map)

• Return the union of models M1 and M2
– Use map to guide the Merge
– If elements x = y in map, then collapse them into one element

[Figure: Emp (Addr, Name) and Emp (Name, Phone), with a mapping that equates (=) their matching elements, merge into Emp (Name, Phone, Addr).]
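A small sketch of the collapse rule, assuming flat lists of element names and a mapping given as (left, right) pairs. It ignores graph structure and conflict handling, so it illustrates the idea rather than the Merge operator itself.

```python
def merge(m1, m2, mapping):
    """Union of m1 and m2; elements equated by `mapping` appear only once."""
    matched_right = {right for _, right in mapping}
    return list(m1) + [e for e in m2 if e not in matched_right]


M1 = ["Emp", "Addr", "Name"]
M2 = ["Emp", "Name", "Phone"]
emp_map = [("Emp", "Emp"), ("Name", "Name")]

M3 = merge(M1, M2, emp_map)
```

With an empty mapping nothing is collapsed, so both Emp elements would survive side by side.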

Page 9

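Left Composition ( f • )

mapC = mapA f• mapB

[Figure: mapA (a1, a2, a3) connects M1 to M2, mapB (b1, b2, b3) connects M2 to M3, and the resulting mapC (c1, c2, c3) connects M1 to M3. The three models are variants of an Emp schema with address elements (Addr, Street, City, St, Town) and Name.]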
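As a rough illustration, composition can be sketched as a relational join over correspondence pairs. This is an assumption-laden simplification: the left composition operator on the slide may treat unmatched elements differently, and the element pairs below are hypothetical, since the figure is not fully recoverable.

```python
def compose(map_a, map_b):
    """Relate x to z whenever map_a relates x to y and map_b relates y to z."""
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


map_a = [("Emp", "Emp"), ("Street", "Street"), ("City", "City")]  # M1 -> M2
map_b = [("Emp", "Emp"), ("Street", "St"), ("City", "Town")]      # M2 -> M3

map_c = compose(map_a, map_b)                                     # M1 -> M3
```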

Page 10

Model Management Algebra

• map = Match (M1, M2, )

• M3 = Merge (M1, M2, map)

• map3 = Compose(map1, map2)

• M2 = Select(M1, pred)

• M2 = Diff(M1, map)

• list = Enumerate(M)

• ApplyFunction(M, f )

• M2 = Copy(M1)

• Update operations

They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story

Page 11

Example

• Given
– map1 from SQL schema rdb1 to xsd1
– xsd2, which is similar to xsd1
• Produce
– a map between xsd2 and a relational schema

1. map2 = Match(xsd1, xsd2)
2. map3 = map1 • map2
3. <map4, rdb2> = Copy(map3)
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2

[Figure: map1 connects rdb1 to xsd1, map2 connects xsd1 to xsd2, map3 connects rdb1 to xsd2, and map4 connects the copied schema rdb2 to xsd2.]
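The four steps can be traced end to end with toy operators. This is a sketch under strong assumptions: models are flat name lists, Match is name equality, Copy duplicates the relational side of the mapping, and the per-element function simply upper-cases the XSD name into a new column. All element names are hypothetical.

```python
def match(m1, m2):
    return [(a, b) for a in m1 for b in m2 if a == b]


def compose(map_a, map_b):
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


def diff(model, mapping):
    """Elements of `model` that the right side of `mapping` does not cover."""
    covered = {right for _, right in mapping}
    return [e for e in model if e not in covered]


xsd1 = ["Emp", "E#", "Name"]
xsd2 = ["Emp", "E#", "Name", "Phone"]
map1 = [("EMP", "Emp"), ("ENO", "E#"), ("ENAME", "Name")]  # rdb1 -> xsd1

map2 = match(xsd1, xsd2)                 # step 1
map3 = compose(map1, map2)               # step 2: rdb1 -> xsd2
rdb2 = [rel for rel, _ in map3]          # step 3: copy the relational side
map4 = list(map3)                        #         and the mapping itself
for x in diff(xsd2, map4):               # step 4: handle unmapped elements
    column = x.upper()
    rdb2.append(column)
    map4.append((column, x))
```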

Page 12

Theme

• Classic meta data problems can be solved using Model Management operations
– Schema integration
– Schema evolution
– Data migration
– Reverse engineering
– Data reintegration (3-way merging)

• Published solutions to these problems help us produce generic implementations of model mgmt operations

Page 13

Outline

• Overview of Model Management

• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration

• Recent technical results

Page 14

Schema Integration

• Given
– two view schemas, V1 and V2
• Produce
– an integrated schema, S

1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. ApplyFunction(S′) // to resolve conflicts in S′, producing S

[Figure: V1 and V2 connected by map, merged into S′ and then resolved into S.]

Page 15

[Figure: worked schema integration example. V1 and V2 are two Emp schemas that share E# and Dept#; the remaining elements shown are Addr, Phone, Name, FirstName and LastName. map equates (=) the matching elements, and the merged schema holds E#, Dept#, Addr, Phone, Name, FirstName and LastName under Emp.]

1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. Use ApplyFunction(S′) to resolve conflicts, producing S

Page 16

Merging Knowledge Bases (Ontologies)

• Same as schema integration, but applied to ontologies

• The literature on merging ontologies focuses mostly on Match.

Page 17

Schema Evolution

• Given
– mapSV from schema S to view V
– a modified version S′ of S
• Produce
– a mapping mapS′V from S′ to V (i.e. a view defn for V over S′)

1. mapS′S = Match(S′, S)
2. mapS′V = mapS′S • mapSV
3. Use ApplyFunction(V) to delete elements not derivable from S′

[Figure: S and V connected by mapSV; the modified schema S′ connected to S by mapS′S and to V by mapS′V.]
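A toy walk-through of these three steps, assuming flat name lists and pair-based mappings. The schema contents and the name `S_new` for the modified schema are hypothetical; the point is only that view elements whose source disappeared are dropped.

```python
def compose(map_a, map_b):
    return [(x, z) for x, y in map_a for y2, z in map_b if y == y2]


S = ["Emp", "E#", "Name", "Addr"]
V = ["EmpView", "Id", "FullName", "Address"]
map_SV = [("Emp", "EmpView"), ("E#", "Id"),
          ("Name", "FullName"), ("Addr", "Address")]
S_new = ["Emp", "E#", "Name", "Phone"]       # Addr dropped, Phone added

# Step 1: match the modified schema against the original (name equality).
map_SnewS = [(a, b) for a in S_new for b in S if a == b]
# Step 2: compose to get a mapping from the modified schema to the view.
map_SnewV = compose(map_SnewS, map_SV)
# Step 3: delete view elements that are no longer derivable.
derivable = {v for _, v in map_SnewV}
V_new = [v for v in V if v in derivable]
```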

Page 18

Outline

• Overview of Model Management

• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration

• Recent technical results

Page 19

Reverse Engineering

• Given
– Model M (e.g., an ER model)
– Model G (e.g., SQL) generated via mapMG from M
– A modified version G′ of G
• Produce
– A modified version M′ of M that generates G′

1. mapGG′ = Match(G, G′)
2. mapMG′ = mapMG • mapGG′
3. <M′, mapM′G′> = Copy(mapMG′)
4. Use ApplyFunction(mapM′G′) to reverse engineer each g in Diff(G′, mapM′G′) into M′

[Figure: M and G connected by mapMG; G connected to G′ by mapGG′; M connected to G′ by mapMG′; the copy M′ connected to G′ by mapM′G′.]

Page 20

3-Way Merge (aka Reintegration)

• Given
– a source schema S0
– two derived schemas S1 and S2
• Produce
– a schema S3 that merges the changes of S1 and S2

1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA = ApplyFunction(MapOA) such that if e ∈ MapOA and domain(e) = range(e), then delete e (i.e. keep only things changed in A)
4. MapOB = ApplyFunction(MapOB) such that if e ∈ MapOB and domain(e) = range(e), then delete e (i.e. keep only things changed in B)
5. ChangedA = range(MapOA)
6. ChangedB = range(MapOB)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B = Diff(ChangedB, ChangedA, MapChBChA)
11. MapAB = Match(A, B) (by OIDs)
12. G = Merge(A, B, MapAB)
13. MapGA = Match(G, A)
14. GA = Merge(G, A, MapGA) with preference for A
15. MapGAB = Match(GA, B)
16. GAB = Merge(GA’, B’, MapGA’B’) with preference for B
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGABSDA = Match(GAB, ShouldDeleteA)
24. GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
25. MapGABSDASDB = Match(GABSDA, ShouldDeleteB)
26. Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)

[Figure: the source schema S0, its derived schemas S1 and S2, and the merged result S3.]
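The script above is long, so the following is not a transcription of it. It is a compact approximation of the same goal, assuming each model is a dictionary from OID to element value, and that O, A, and B play the roles of S0, S1, and S2. Two policies are borrowed loosely from the script and should be read as assumptions: B wins when both sides changed the same element, and a deletion on one side is honoured only if the other side left the element unchanged.

```python
def three_way_merge(o, a, b):
    """Merge two derived models a and b against their common source o."""
    result = {}
    for oid in o.keys() | a.keys() | b.keys():
        in_o, in_a, in_b = oid in o, oid in a, oid in b
        changed_a = in_a and (not in_o or a[oid] != o[oid])
        changed_b = in_b and (not in_o or b[oid] != o[oid])
        if changed_b:              # B's change wins any conflict
            result[oid] = b[oid]
        elif changed_a:
            result[oid] = a[oid]
        elif in_a and in_b:        # untouched on both sides
            result[oid] = o[oid]
        # otherwise it was deleted on one side and not changed on the other
    return result


O = {1: "Emp", 2: "Name", 3: "Addr", 4: "Phone"}
A = {1: "Emp", 2: "FullName", 3: "Addr"}                          # renamed 2, deleted 4
B = {1: "Emp", 2: "Name", 3: "Address", 4: "Phone", 5: "Dept#"}   # changed 3, added 5

merged = three_way_merge(O, A, B)
```

Matching by OID is what makes this tractable; without stable identifiers, the Match steps would need a real schema matcher.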

Page 21

Data Migration

• Given
– a schema S and its database D
– an evolved schema S′
• Produce
– a procedure for mapping D into an S′ database D′

1. mapSS′ = Match(S, S′)
2. Use Enum(S) to generate a data migration script

[Figure: mapSS′ connects S to S′; Enum generates a migration script, which is run on D to produce D′.]
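One way to picture step 2 is a generator that enumerates the old schema's columns and emits SQL for those that have a counterpart in the evolved schema. This is only a sketch: the function name, the `_new` table suffix, and the column names are hypothetical, and a real script would also handle type changes and new columns.

```python
def migration_script(table, s_old, mapping):
    """Emit one INSERT ... SELECT copying mapped columns into the new table."""
    target = dict(mapping)                        # old column -> new column
    kept = [col for col in s_old if col in target]  # enumerate S, keep mapped
    new_cols = ", ".join(target[col] for col in kept)
    old_cols = ", ".join(kept)
    return f"INSERT INTO {table}_new ({new_cols}) SELECT {old_cols} FROM {table};"


s_old = ["eno", "name", "addr"]
map_old_new = [("eno", "eno"), ("name", "full_name")]  # addr has no counterpart

script = migration_script("emp", s_old, map_old_new)
```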

Page 22

Data Translation

• Like data migration, except S and S′ are expressed in different data models.

Page 23

Outline

• Overview of Model Management

• Solutions to classical meta data problems

• Recent technical results

Page 24

Status Report

• Vision
– [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
• Data Warehouse Examples
– [Bernstein & Rahm, ER ’00]
• Match Operation
– Survey: [Rahm & Bernstein, MSR Tech Report]
– Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
• Merge Operation
– coming soon …
• Theory
– [Alagić & Bernstein, DBPL ’01]

Page 25

Schema Matching Approaches

• About a dozen published algorithms.
• Many good ideas, but none are robust.

[Figure: taxonomy of schema matching approaches]
– Individual matchers
  – Schema-based
    – Per-Element: Linguistic (names, descriptions); Constraint-based (types, keys)
    – Structural: Constraint-based (graph matching)
  – Content-based
    – Per-Element: Linguistic (IR: word frequencies, key terms); Constraint-based (value pattern and ranges)
– Combined matchers
  – Hybrid
  – Composite: manual composition, automatic composition

Page 26

The CUPID Algorithm

• Computes linguistic similarity of element pairs
• Computes structural similarity of element pairs
• Generates a mapping

[Figure: two purchase-order schemas being matched. PO has POShipTo and POBillTo, each with City and Street. PurchaseOrder has DeliverTo and InvoiceTo, each with an Address containing City and Street. The figure carries the annotation ssim++.]
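The interplay of the two similarity measures can be sketched as follows. This is not the Cupid implementation: string similarity from Python's `difflib` stands in for linguistic matching, structural similarity is approximated by the share of leaves with a close counterpart, and the weighted sum, the 0.8 threshold, and the weight `w_struct` are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def lsim(a, b):
    """Linguistic similarity stand-in: string similarity of lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ssim(leaves_a, leaves_b, threshold=0.8):
    """Structural similarity stand-in: share of leaves with a close counterpart."""
    if not leaves_a or not leaves_b:
        return 0.0
    hits_a = sum(any(lsim(x, y) >= threshold for y in leaves_b) for x in leaves_a)
    hits_b = sum(any(lsim(x, y) >= threshold for x in leaves_a) for y in leaves_b)
    return (hits_a + hits_b) / (len(leaves_a) + len(leaves_b))


def wsim(name_a, leaves_a, name_b, leaves_b, w_struct=0.5):
    """Weighted combination of structural and linguistic similarity."""
    return w_struct * ssim(leaves_a, leaves_b) + (1 - w_struct) * lsim(name_a, name_b)


score = wsim("POShipTo", ["City", "Street"], "DeliverTo", ["City", "Street"])
```

Here the names POShipTo and DeliverTo look quite different, yet their identical leaves pull the combined score up, which is the intuition behind using structure alongside names.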

Page 27

M3 = Merge(M1, map, M2)

• [Buneman, Davidson, Kosky, EDBT 92]
– Meta-model has aggregation & generalization only
– Do a union and collapse objects having the same name
– Fix-up step for inconsistencies created by merging

[Figure: merge example over objects X, Y, Z and W with edge label a.]

– Successive fixups lead to different results
– Batch them at the end, to produce a unique minimal result

• Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)

Page 28

A Formal Semantics for Model Mgt

• Use category theory for a data-model-independent characterization of models and mappings

• Models and their DBs are categories
• Model and data transformations are morphisms
• Mappings between models & data are functors
• Utility
– Define formal semantics for Match and Merge
– Explain when Match & Merge preserve constraints
– Check that implementation satisfies the semantics

Page 29

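Categories

• Goal – a mathematical semantics of MM algebra

[Figure: category-theory diagram with schemas Schm, Sch1, Sch2 and Sch12, morphisms f, g, p and q, the Match and Merge steps, and the functor Db yielding Db(Sch1), Db(Sch2) and Db(Sch12).]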

Page 30

Implementation Vision

[Figure: architecture diagram. Components shown: SQL DBMS, OR Mapper, Object-Oriented Repository, Model Manager, MM Meta-Model, the operations Match, Merge, Compose, Copy, Apply, …, Operation Specializations, Inferencing Engine, and a Model-Driven UI Generator. Generic Tools: Browser, Import/export, Scripting, Editors, Catalogs. Sample models include a process with steps such as Order Entry, Authorize Credit, Schedule Delivery and Bill Customer, an object model (Customer, Order, Scheduled Delivery, Product, Salesperson), and relational tables (cust, emp, dept), among others.]

Page 31

Related Work

• There’s a lot of it. Apply it to model management!

• Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)

• Inferencing on mappings – AQUV, description logic

• Transitive closure and recursive QP

• Differencing – text, trees, graphs

• Data translation – algebras, schema evolution

• Data integration – schema match, view generation

Page 32

Summary

• Raise the level of abstraction of meta-data programming by using:
– models and mappings as objects
– an algebra that manipulates models and mappings on a generic meta-model
• Classical meta data problems can be expressed using this algebra
• Implementations of classic problems offer guidance on implementing the algebra

Page 33

References

• http://www.research.microsoft.com/~philbe

• P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference

• P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models”, SIGMOD Record, Dec. 2000

• E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report

• J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001

• S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001