Model-independent Schema and Data Translation
Paolo Atzeni
14/10/2009
References
• P. Atzeni, G. Gianforme, and P. Cappellari. A Universal Metamodel and Its Dictionary. Transactions on Large-Scale Data-and Knowledge-Centered Systems I, 2009
• P. Atzeni, P. Cappellari, R. Torlone, P. A. Bernstein, and G. Gianforme. Model-independent schema translation. VLDB Journal, 2008
• P. Atzeni, P. Cappellari, and P. A. Bernstein. Model independent schema and data translation. EDBT 2006.
ITD 14/10/2009 2
ITD 14/10/2009 3
Schema and data translation
• Schema translation:– given schema S1 in model M1 and model M2– find a schema S2 in M2 that “corresponds” to S1
• Schema and data translation:– given also a database D1 for S1– find also a database D2 for S2 that “contains the same data”
as D1
ITD 14/10/2009 4
A wider perspective
• (Generic) Model Management:– A proposal by Bernstein et al (2000 +)– Includes a set of operators on
• schemas and mappings between schemas– The main operators
• Match• Merge• Diff• ModelGen (=schema translation)
ITD 14/10/2009 5
A long standing issue
• Translations from a model to another have been studied since the 1970’s
• Whenever a new model is defined, techniques and tools to generate translations are studied
• However, proposals and solutions are usually model specific: – Given an ER schema, find the suitable relational schema that
“implements” it • the original paper (Chen 1976) contains the basics• further discussions by many (e.g. Markowitz and Shoshani
1989)• illustrated in every textbook
– Similarly with • any other conceptual model and any other logical one• XML and relational (or object)
ITD 14/10/2009 6
A simple example
• An object relational database, to be translated in a relational one
• Source: the OR-model
• Target: the relational model
System managed ids
used as references
ITD 14/10/2009 7
Example, 2
• How do we traslate?
– Two possibilities. Does the OR model allow for keys?
– If YES (EmpNo and Name are keys)
ITD 14/10/2009 8
Example, 3
• How do we traslate?
– Two possibilities. Does the OR model allow for keys?
– If NOT
ITD 14/10/2009 9
Many different models (plus variants …)
OR
Relational
OO
…
ER
XSD
ITD 14/10/2009 10
Heterogeneity
• We need to handle artifacts and data in various models– Data are defined wrt to schemas– Schemas wrt to models– How can models be defined? We need metamodels
Data
Models
Schemas
= metaschemas
= metadata
ITD 14/10/2009 11
A metamodel approach
• The constructs in the various models are rather similar:
– can be classified into a few categories (Hull & King 1986):
• Abstract (entity, class, …)
• Lexical: set of printable values (domain)
• Aggregation: a construction based on (subsets of) cartesian products (relationship, table)
• Function (attribute, property)
• Hierarchies
• …
• We can fix a set of metaconstructs (each with variants):
– abstract, lexical, aggregation, function, ...
– the set can be extended if needed, but this will not be frequent
• A model is defined in terms of the metaconstructs it uses
ITD 14/10/2009 12
The metamodel approach, example
• The ER model:– Abstract (called Entity)– Function from Abstract to Lexical (Attribute)– Aggregation of abstracts (Relationship) – …
• The OR model:– Abstract (Table with ID)– Function from Abstract to Lexical (value-based Attribute)– Function from Abstract to Abstract (reference Attribute)– Aggregation of lexicals (value-based Table)– Component of Aggregation of Lexicals (Column)– …
ITD 14/10/2009 13
The supermodel
• A model that includes all the meta-constructs (in their most general forms)– Each model is subsumed by the supermodel (modulo
construct renaming) – Each schema for any model is also a schema for the
supermodel (modulo construct renaming)
• In the example, a model that generalizes OR and relational
• Each translation from the supermodel SM to a target model M is also a translation from any other model to M:– given n models, we need n translations, not n2
ITD 14/10/2009 14
Generic translations
1. Copy
2. Translation
3. Copy
Supermodel
Source model
Target model
Translation:composition 1,2 & 3
ITD 14/10/2009 15
Translations within the supermodel
• We still have too many models:– we have few constructs, but each has several independent
features which give rise to variants• for example, within simple OR model versions,
– Key may be specifiable or not– Generalizations may be allowed or not– Foreign keys may be used or not– Nesting may be used or not
– Combining all these, we get hundreds of models!– The management of a specific translation for each model
would be hopeless
ITD 14/10/2009 16
The metamodel approach, translations
• As we saw, the constructs in the various models are similar: – can be classified according to the metaconstructs– translations can be defined on metaconstructs,
• there are standard, known ways to deal with translations of constructs (or variants theoreof)
• Elementary translation steps can be defined in this way– Each translation step handles a supermodel construct (or a
feature thereof) "to be eliminated" or "transformed"• Then, elementary translation steps to be combined• A translation is the concatenation of elementary translation
steps
ITD 14/10/2009 17
Many different models
OR
Relational
OO
…
ER
XSD
ITD 14/10/2009 18
Many different models (and variants …)
OR w/ PK, gen, ref, FK
Relational…
OR w/ PK, gen, ref
OR w/ PK, gen, FK
OR
w/ PK, ref, FK
OR w/ gen, ref
OR w/ PK, FK
OR w/ PK, ref
OR w/ ref
ITD 14/10/2009 19
A complex translation, example(0,N) (0,N)
Target: simple object model• Eliminate N-ary relationships • Eliminate attributes from relationships • Eliminate many-to-many relationships• Replace relationships with references• Eliminate generalizations
ITD 14/10/2009 20
Complex translations
Binary ER w/o gen
N-ary ER w/ gen
Binary ERw/ gen
Relational
Bin ER w/ gen w/o attr on rel
OO w/ gen
N-ary ER w/o gen
Bin ER w/o gen w/o attr on rel
Bin ER w/o gen w/o M:N rel
Elim. N-ary relationships Elim. Relationship attr.sElim. M:N relationshipsReplace relationships with
referencesElim OO generalizations Elim ER generalizations
Bin ER w/ gen w/o M:N rel
OO w/o gen…
ITD 14/10/2009 21
A more complex example
• An object relational database, to be translated in a relational one
– Source: an OR-model
– Target: the relational model
ITD 14/10/2009 22
A more complex example, 2
ENG
EMP DEPT
Last Name
School
Dept
Name
Address
ID ID
ID
Dept_ID
Emp_ID
Target: relational model
Eliminate generalizations Add keysReplace refs with FKs
A more complex example, 3
ITD 14/10/2009 23
EMP
Last Name
ID
Dept_ID
DEPT
Name
Address
ID
ENG
School
ID
Emp_ID
Target: relational model
Eliminate generalizations Add keysReplace refs with FKsReplace objects with tables
ITD 14/10/2009 24
Many different models (and variants …)
OR w/ PK, gen, ref, FK
Relational
OR w/ PK, gen, ref
OR w/ PK, gen, FK
OR
w/ PK, ref, FK
OR w/ gen, ref
OR w/ PK, FK
OR w/ PK, ref
OR w/ ref Eliminate generalizations
Add keysReplace refs with FKsReplace objects with tables
Source
Target
ITD 14/10/2009 25
Translations in MIDST (our tool)
• Basic translations are written in a variant of Datalog, with OID invention– We specify them at the schema level– The tool "translates them down" to the data level
(both in off-line and run-time manners, see later)– Some completion or tuning may be needed
ITD 14/10/2009 26
A Multi-Level Dictionary
• Handles models, schemas and data• Has both a model specific and a model independent component• Relational implementation, so Datalog rules can be easily
specified
ITD 14/10/2009 27
A common dictionary (for an ER design tool)
Entity
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
AttributeOfEntity
OID Schema Name isIdent isNullable Type Entity
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201
… … … … … … …
Relationship
OID Schema Name Entity1 Entity2
401 1 Memb 301 302 … …
… … … … … … …
Employees DepartmentsMemb
Name
AddressName
EmpNo
ITD 14/10/2009 28
Multi-Level Dictionary
model
schema
modelgeneric
modelspecific
description
modelindependence
Supermodel description
(mSM)
Model descriptions
(mM)
Supermodel schemas
(SM)
Model specific schemas
(M)
Supermodel instances
(i-SM)
Model specific instances
(i-M)data
ITD 14/10/2009 29
The supermodel description
MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MSM-Property
OID Name Constr. Type
11 Name 1 String
12 Name 2 String
13 IsKey 2 Boolean
14 IsNullable 2 Boolean
15 Type 2 String
16 Name 3 String
17 Name 4 String
18 IsIdentifier 4 Boolean
19 IsNullable 4 Boolean
20 Type 4 String
21 IsFunct1 5 Boolean
22 IsOptional1 5 Boolean
23 Role1 5 String
24 IsFunct2 5 Boolean
25 IsOptional2 5 Boolean
26 Role2 5 String
MSM-Reference
OID Name Construct Target
30 Aggregation 2 1
31 Abstract 4 3
32 Abstract1 5 3
33 Abstract2 5 3
mSMmMSMMi-SMi-M
ITD 14/10/2009 30
Model descriptions
MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MM-Construct
OID Model MSM-Constr Name
1 2 3 ER_Entity
2 2 4 ER_Attribute
3 2 5 ER_Relationship
4 1 1 Rel_Table
5 1 2 Rel_Column
6 3 3 OO_Class
MM-Model
OID Name
1 Relational
2 Entity-Relationship
3 Object
MSM-Property
OID Name Construct Type
… … … …
MSM-Reference
OID Name Construct …
… … … …
MM-Property
… … … …
MM-Reference
… … … …
mSMmMSMMi-SMi-M
ITD 14/10/2009 31
Schemas in a model
ER-Entity
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
ER-AttributeOfEntity
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201
… … … … … … …
SM-Construct
OID Name IsLex
… … …
3 ER-Entity F
4 ER-AttributeOfEntity T
… … …
ER schemas
mSMmMSMMi-SMi-M
Employees
Departments
EmpNo
Name
Name
Address
ITD 14/10/2009 32
Schemas in the supermodel
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201
… … … … … … …
MSM-Construct
OID Name IsLex
… … …
3 Abstract F
4 AttributeOfAbstract T
… … …
Supermodel schemas
mSMmMSMMi-SMi-M
Employees
Departments
EmpNo
Name
Name
Address
ITD 14/10/2009 33
Multi-Level Repository, generation and use
model
schema
modelgeneric
modelspecific
description
modelindependence
Supermodel description
(mSM)
Model descriptions
(mM)
Supermodel schemas
(SM)
Model specific schemas
(M)
Supermodel instances
(i-SM)
Model specific instances
(i-M)data
Structure fixed, content provided by tool
designers
Structure generated by the tool from the content of mSM
Structure generated by the tool from the content of mSM
Structure fixed, content provided by model
designers out of mSM
Structure generated by the tool from the content of mM
ITD 14/10/2009 34
continued next week
Top Related