Data integration and transformation 3. Data Exchange
Data integration and transformation
3. Data Exchange
Paolo Atzeni Dipartimento di Informatica e Automazione
Università Roma Tre
28/10-4/11/2009
References
• Ronald Fagin, Laura M. Haas, Mauricio Hernandez, Renee J. Miller, Lucian Popa, and Yannis Velegrakis "Clio: Schema Mapping Creation and Data Exchange" A.T. Borgida et al. (Eds.): Mylopoulos Festschrift, LNCS 5600, Springer-Verlag Berlin Heidelberg, 2009, pp. 198–236.
and other papers cited in it
P. Atzeni ITD - 3 - 28/10-4/11/2009 2
Data exchange
• Given a source and a target schema, find a transformation from the former to the latter
Data exchange, a typical approach (the Clio project)
[Figure: the source schema and the target schema are the inputs of a three-step pipeline: schema match, then mapping generation, then query generation]
Simple example
Source: Employee(Id, Name, DeptId), Dept(Id, DeptName)
(with FK from DeptId to Dept.Id)
Target: Emp(Code, EmpName, Dept)
Assume we know that
– Employee.Id corresponds to Code
– Name corresponds to EmpName
– DeptName corresponds to Dept
We would like to obtain a query that populates Emp:
SELECT Employee.Id AS Code, Name AS EmpName, DeptName AS Dept
FROM Employee JOIN Dept ON DeptId = Dept.Id
Better visualization
Employee(Id, Name, DeptId)
Dept(Id, DeptName)
Emp(Code, EmpName, Dept)
We want to obtain
SELECT Employee.Id AS Code, Name AS EmpName, DeptName AS Dept
FROM Employee JOIN Dept ON DeptId = Dept.Id
and not
SELECT Id AS Code, Name AS EmpName, NULL AS Dept FROM Employee
UNION
SELECT NULL AS Code, NULL AS EmpName, DeptName AS Dept FROM Dept
nor
SELECT Id AS Code, NULL AS EmpName, NULL AS Dept FROM Employee
UNION
…
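The difference between the two candidate queries can be tried out on a toy instance. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows (Ann, Bob, Sales, R&D) and the function names are made up for illustration and are not part of the slides.

```python
import sqlite3

def populate_emp_join(conn):
    """Desired query: one Emp row per employee, with the department name filled in."""
    return conn.execute(
        "SELECT Employee.Id AS Code, Name AS EmpName, DeptName AS Dept "
        "FROM Employee JOIN Dept ON DeptId = Dept.Id "
        "ORDER BY Code"
    ).fetchall()

def populate_emp_union(conn):
    """Undesired query: employees and departments end up in unrelated rows."""
    return conn.execute(
        "SELECT Id AS Code, Name AS EmpName, NULL AS Dept FROM Employee "
        "UNION "
        "SELECT NULL AS Code, NULL AS EmpName, DeptName AS Dept FROM Dept"
    ).fetchall()

def demo():
    # Hypothetical source instance, only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Dept (Id INTEGER PRIMARY KEY, DeptName TEXT);
        CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT,
                               DeptId INTEGER REFERENCES Dept(Id));
        INSERT INTO Dept VALUES (10, 'Sales'), (20, 'R&D');
        INSERT INTO Employee VALUES (1, 'Ann', 10), (2, 'Bob', 20);
    """)
    return populate_emp_join(conn), populate_emp_union(conn)
```

With the join every Emp row is complete; with the union every row has a NULL either in Dept or in Code and EmpName, so the link between an employee and a department is lost.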
The main issue
• How do we discover we should use a join and not one or two unions?
• Attributes that appear together in a relation
– Id, Name in the source and Code, EmpName in the target
• The foreign key
Data exchange, another example
Source:
PayRate(Rank, HrRate)
Professor(Id, Name, Sal)
Student(Name, GPA, Yr)
WorksOn(Name, Proj, Hrs, ProjRank)
Address(Id, Addr)
Target:
Personnel(Id, Name, Sal, Addr)
• Foreign keys
– between the two Id attributes (Professor and Address)
– between ProjRank and Rank
– between the two Name attributes (Student and WorksOn)
Data exchange, example
• Assume we are given correspondences, which involve functions:
– Usually identity
– PayRate(HrRate) * WorksOn(Hrs) → Personnel(Sal)
Data exchange, example
• How do we combine HrRate and Hrs?
– Via a join suggested by foreign keys
• The foreign key between ProjRank and Rank suggests a join
• Foreign keys over Name and between Yr and Rank suggest another
Heuristic
• We have many correspondences
• Group correspondences in such a way that each set contains at most one correspondence for each attribute in the target
• We are interested in sets where the source attributes are either in the same relation or in relations whose join is meaningful
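A rough way to experiment with this heuristic is sketched below in Python. It is a simplification, not the method of the slides: it treats "relations whose join is meaningful" as "relations connected through foreign keys" and groups correspondences by connected component. The function name and the list of correspondences are assumptions (only the Sal correspondence is stated explicitly on the slides; the others are read off the query the slides arrive at).

```python
def partition_correspondences(correspondences, foreign_keys):
    """Group correspondences whose source relations are linked by foreign keys.

    correspondences: list of (source_attrs, target_attr), where source_attrs is
    a list of (relation, attribute) pairs.
    foreign_keys: list of (relation, relation) pairs.
    """
    parent = {}

    def find(rel):
        # Union-find lookup with path halving.
        parent.setdefault(rel, rel)
        while parent[rel] != rel:
            parent[rel] = parent[parent[rel]]
            rel = parent[rel]
        return rel

    for left, right in foreign_keys:
        parent[find(left)] = find(right)

    groups = {}
    for source_attrs, target_attr in correspondences:
        roots = {find(rel) for rel, _attr in source_attrs}
        if len(roots) != 1:
            raise ValueError(f"no meaningful join for {source_attrs}")
        group = groups.setdefault(roots.pop(), {})
        if target_attr in group:
            # The heuristic allows at most one correspondence per target attribute.
            raise ValueError(f"two correspondences for {target_attr} in one group")
        group[target_attr] = source_attrs
    return list(groups.values())

# Foreign keys of the example; the direction is ignored here.
FOREIGN_KEYS = [("Address", "Professor"), ("WorksOn", "PayRate"), ("WorksOn", "Student")]

# Assumed correspondences towards Personnel(Id, Name, Sal, Addr).
CORRESPONDENCES = [
    ([("Professor", "Id")], "Id"),
    ([("Professor", "Name")], "Name"),
    ([("Professor", "Sal")], "Sal"),
    ([("Address", "Addr")], "Addr"),
    ([("Student", "Name")], "Name"),
    ([("PayRate", "HrRate"), ("WorksOn", "Hrs")], "Sal"),
]
```

On this input the sketch yields two groups, one over Professor and Address and one over Student, WorksOn and PayRate, which is the shape of a query with two branches combined by a union.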
Partition the correspondences
• … and for each partition the joins are meaningful
The process, example
SELECT P.Id, P.Name, P.Sal, A.Addr
FROM Professor P, Address A
WHERE A.Id = P.Id
UNION ALL
SELECT NULL AS Id, S.Name, P.HrRate * W.Hrs, NULL AS Addr
FROM PayRate P, Student S, WorksOn W
WHERE W.Name = S.Name AND S.Yr = P.Rank
More complex example (with nesting)
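Source schema:
Companies(Name, Address, Year)
Grants(Gid, Recipient, Amount, Supervisor, Manager)
Contacts(Cid, Email, Phone)
Target schema:
Organizations(Code, Year, Fundings), where Fundings is a nested set with attributes (FId, FinId)
Finances(FinId, Budget, Phone)
Foreign keys: f1, f2, f3 (in the source, from Grants), f4 (in the target, from Fundings.FinId to Finances.FinId)
Nested relation (example instance of Organizations; Year and FinId values not shown):
Code | Fundings (FId)
HAL | 301, 302
SM |
PH | 303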
Correspondences (given by a "schema matcher")
[Figure: the source and target schemas, with foreign keys f1–f4 and four correspondences drawn between attributes: v1 (Companies.Name to Organizations.Code), v2 (Grants.Gid to Fundings.FId), v3 (Grants.Amount to Finances.Budget), v4 (Contacts.Phone to Finances.Phone)]
Let us formalize correspondences
v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F)
v2: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃c,y,F,f Organizations(c,y,F), F(g,f)
v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p)
v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p)
Correspondences alone are not enough
v1: ∀n,d,y Companies(n,d,y) → ∃y',F Organizations(n,y',F)
v2: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃c,y,F,f Organizations(c,y,F), F(g,f)
v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p)
v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p)
Companies
Name | Address | Year
HAL | NY | 1920
SM | Seattle | 1984
PH | SF | 1957
Grants
GId | Rec.t | Amt
301 | HAL | 30
302 | HAL | 40
303 | PH | 30
Organizations (target instance; Year and FinId values not shown)
Code: HAL, SM, PH
Fundings (FId): 301, 302
More complex mappings are needed, representing associations
∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f Organizations(n,y',F), F(g,f)
v3: ∀g,r,a,s,m Grants(g,r,a,s,m) → ∃f,p Finances(f,a,p)
v4: ∀c,e,p Contacts(c,e,p) → ∃f,b Finances(f,b,p)
Organizations
Code | Year | Fundings (FId, FinId)
HAL | | (301, ), (302, )
SM | |
PH | | (303, )
Note: The "association" between companies and grants in the source is suggested by f1 (a foreign key)
Yet more complex
∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)
Notes:
• Three tuples are generated for each pair of related companies and grants
• The mapping specifies that there exists an f, appearing in two places, without saying what its value should be
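To make the two notes concrete, here is a small Python sketch that applies the mapping to the instance used on the slides. It is one possible reading, under stated assumptions: Grants is cut down to (GId, Recipient, Amount) as in the instance, the unknown values y' and p are left as None, and the existential f is filled with a fresh placeholder (N1, N2, …) for every company–grant pair. The function name and the placeholder scheme are illustrative choices.

```python
from itertools import count

def apply_mapping(companies, grants):
    """For each related (company, grant) pair emit an Organizations tuple,
    a nested Fundings tuple and a Finances tuple sharing the same f."""
    fresh = count(1)
    organizations = {}  # code -> list of (fId, finId); Year stays unknown
    finances = []       # (finId, budget, phone); phone stays unknown
    for name, _address, _year in companies:
        for gid, recipient, amount in grants:
            if recipient != name:
                continue
            fin_id = f"N{next(fresh)}"  # placeholder for the existential f
            organizations.setdefault(name, []).append((gid, fin_id))
            finances.append((fin_id, amount, None))
    return organizations, finances

COMPANIES = [("HAL", "NY", 1920), ("SM", "Seattle", 1984), ("PH", "SF", 1957)]
GRANTS = [(301, "HAL", 30), (302, "HAL", 40), (303, "PH", 30)]
```

The same placeholder appears in the funding and in the finance tuple, which is all the mapping requires; SM has no grants, so this mapping alone produces nothing for it.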
A final issue
• How do we obtain the phone to be put in finances?
• Is it the supervisor's or the manager's?
• FKs suggest either (or even both)
• Human intervention is needed to choose
Various solutions in nested cases, with possibly undesirable features
Organizations
Code | Year | Fundings (FId, FinId)
HAL | | (301, k1), (302, k1)
SM | |
PH | | (303, k1)
Finances
FinId | Budget | Phone
k1 | 30 |
k1 | 40 |
k1 | 30 |
A better solution
Organizations
Code | Year | Fundings (FId, FinId)
HAL | | (301, k1), (302, k2)
SM | |
PH | | (303, k3)
Finances
FinId | Budget | Phone
k1 | 30 |
k2 | 40 |
k3 | 30 |
A more verbose notation for mappings
∀n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)
Query on the source:
foreach c in companies, g in grants
where c.name = g.recipient
Query on the target:
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
Correspondences:
with o.code = c.name and f.fId = g.gId and i.budget = g.amount
The mapping as a source-to-target constraint
foreach c in companies, g in grants
where c.name = g.recipient
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with o.code = c.name and f.fId = g.gId and i.budget = g.amount
QS ⊆ QT, where QS is the foreach part (the query on the source) and QT is the exists part (the query on the target):
"the result of QT (over the target, projected as in the with-clause) must contain the result of QS (over the source, projected as in the with-clause)"
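The containment reading of the mapping can be checked mechanically on small instances. The Python sketch below assumes the tuple layouts Companies(name, address, year), Grants(gId, recipient, amount), Organizations(code, year, fundings) with fundings a list of (fId, finId), and Finances(finId, budget, phone); the function names are made up for illustration.

```python
def qs(companies, grants):
    """QS projected as in the with-clause: (c.name, g.gId, g.amount)."""
    return {(c[0], g[0], g[2]) for c in companies for g in grants if c[0] == g[1]}

def qt(organizations, finances):
    """QT projected as in the with-clause: (o.code, f.fId, i.budget)."""
    return {
        (code, f_id, budget)
        for code, _year, fundings in organizations
        for f_id, fin_id in fundings
        for i_fin_id, budget, _phone in finances
        if fin_id == i_fin_id
    }

def satisfies_mapping(companies, grants, organizations, finances):
    """The constraint holds when the result of QS is contained in that of QT."""
    return qs(companies, grants) <= qt(organizations, finances)

COMPANIES = [("HAL", "NY", 1920), ("SM", "Seattle", 1984), ("PH", "SF", 1957)]
GRANTS = [(301, "HAL", 30), (302, "HAL", 40), (303, "PH", 30)]

# Target with a distinct FinId per grant.
ORGS_DISTINCT = [("HAL", None, [(301, "k1"), (302, "k2")]), ("SM", None, []),
                 ("PH", None, [(303, "k3")])]
FIN_DISTINCT = [("k1", 30, None), ("k2", 40, None), ("k3", 30, None)]

# Target where every FinId is k1.
ORGS_SHARED = [("HAL", None, [(301, "k1"), (302, "k1")]), ("SM", None, []),
               ("PH", None, [(303, "k1")])]
FIN_SHARED = [("k1", 30, None), ("k1", 40, None), ("k1", 30, None)]
```

Both targets satisfy the constraint; the one with a shared k1 does so while also pairing grants with budgets that were never theirs, which is why it is the less desirable solution.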
Syntax and restrictions
foreach x1 in g1, …, xn in gn
where B1
exists y1 in g'1, …, ym in g'm
where B2
with e1 = e'1 and … and ek = e'k
Example:
foreach c in companies, g in grants
where c.name = g.recipient
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with o.code = c.name and f.fId = g.gId and i.budget = g.amount
• xi in gi (generator)
– xi variable
– gi set (either the root or a set nested within it)
• B1 conjunction of equalities over the xi variables
• yi in g'i and B2: similar
• e1 = e'1 … equalities between a source expression and a target expression
Restrictions: See paper, page 210, lines 5+: "The mapping is well formed …"
Schema constraints
• Referential integrity is essential in this approach as the basis for the discovery of "associations"
• Given the nested model, they need a rather complex definition
• So, two steps
– Paths (primary paths and relative paths)
– Nested referential integrity (NRI) constraints
Primary paths
• Primary path (given a schema root R, that is, a first-level element in the schema):
– x1 in g1, x2 in g2, …, xn in gn
• where g1 is an expression on R (just R?) and gi (for i ≥ 2) is an expression on xi-1
• Examples
– c in companies
– o in organizations
– o in organizations, f in o.fundings
Relative paths
• Primary path (given a schema root R, that is, a first-level element in the schema):
– x1 in g1, x2 in g2, …, xn in gn
• where g1 is an expression on R (just R?) and gi (for i ≥ 2) is an expression on xi-1
• Relative path with respect to a variable x
– x1 in g1, x2 in g2, …, xn in gn
• where g1 is an expression on x (just x?) and gi (for i ≥ 2) is an expression on xi-1
• Example
– f in o.fundings
Nested referential integrity (NRI) constraints
• foreach P1 exists P2 where B
– P1 is a primary path
– P2 is either a primary path or a relative path with respect to a variable in P1
– B is a conjunction of equalities between an expression on a variable of P1 and an expression on a variable of P2
• Example
foreach o in organizations, f in o.fundings
exists i in finances
where f.finId = i.finId
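The example constraint can be verified on a nested instance with a few lines of Python. This is only a sketch for this one NRI, assuming Organizations rows of the form (code, year, fundings), with fundings a list of (fId, finId), and Finances rows of the form (finId, budget, phone); the function name is hypothetical.

```python
def nri_holds(organizations, finances):
    """Check: foreach o in organizations, f in o.fundings
    exists i in finances where f.finId = i.finId."""
    known_fin_ids = {fin_id for fin_id, _budget, _phone in finances}
    return all(
        fin_id in known_fin_ids
        for _code, _year, fundings in organizations
        for _f_id, fin_id in fundings
    )
```

An organization with no fundings, such as SM in the running example, satisfies the constraint trivially because the primary path produces no bindings for it.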
The context
[Figure: the source schema (Companies, Grants, Contacts) and the target schema (Organizations with nested Fundings, Finances), with correspondences v1–v4 and foreign keys f1–f4]
Associations
from x1 in g1, x2 in g2, …, xn in gn [where B]
– xi in gi generator (each expression may include variables defined in a previous generator)
– B a conjunction of equalities (with variables and constants)
• Examples
– from c in contacts
– from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
Associations
• In the (flat) relational model, an association is a join (possibly with a selection)
– from c in contacts
– from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
Dominance and union
• A2 dominates A1 (A1 ≤ A2) if
– the from and where clauses of A1 are subsets of those of A2 (after suitable renaming and with other technicalities)
• Example
– A2: from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
– A1: from g in grants, c in companies
where g.recipient = c.name
• Union of associations:
– Union of from and of where (with renamings if needed)
Useful associations
• Structural association:
– from P, with P a primary path
– Example: from o in organizations, f in o.fundings
• User association
– Any association (specified by the user)
• Logical association
– An association obtained by "chasing" constraints (starting with a structural or a user association)
– Example: from o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
Logical associations
• from o in organizations, f in o.fundings: NO
• from o in organizations, f in o.fundings, i in finances
where f.finId = i.finId: YES
• from c in companies: YES
• from g in grants, c in companies
where g.recipient = c.name: NO
• from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid: YES
The chase
• Given an association, repeatedly apply a chase rule to the "current" association (initialized as the input one)
– If there is an NRI constraint
foreach X exists Y where B
such that (this is a bit informal but intuitive) the "current" association contains X and does not contain a Y that satisfies B,
then add Y to the generators and B to the where clause
• Example. If we start with
from g in grants
then we have to add various components and obtain
from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
• If the NRIs are acyclic, then the chase terminates and the result does not depend on the order of application
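The chase rule above can be sketched in Python for the flat (non-nested) case. This is a simplified illustration, not Clio's implementation: each constraint is reduced to a single-attribute foreign key (source relation, source attribute, target relation, target attribute), an association is a list of (variable, relation) generators plus a list of equalities written as (left, right) pairs, and new variables are named x1, x2, … (an arbitrary choice that assumes the input does not already use such names).

```python
def chase(generators, conditions, constraints):
    """Extend an association until every foreign-key constraint is satisfied."""
    generators = list(generators)
    conditions = list(conditions)
    changed = True
    while changed:
        changed = False
        for var, rel in list(generators):
            for src_rel, src_attr, tgt_rel, tgt_attr in constraints:
                if src_rel != rel:
                    continue
                left = f"{var}.{src_attr}"
                # Is there already a variable over tgt_rel joined on this key?
                satisfied = any(
                    (left, f"{other}.{tgt_attr}") in conditions
                    for other, other_rel in generators
                    if other_rel == tgt_rel
                )
                if not satisfied:
                    new_var = f"x{len(generators)}"
                    generators.append((new_var, tgt_rel))
                    conditions.append((left, f"{new_var}.{tgt_attr}"))
                    changed = True
    return generators, conditions

# The three source foreign keys of the running example, in simplified form.
SOURCE_CONSTRAINTS = [
    ("grants", "recipient", "companies", "name"),
    ("grants", "supervisor", "contacts", "cid"),
    ("grants", "manager", "contacts", "cid"),
]
```

Starting from "from g in grants" the sketch adds one companies variable and two contacts variables, which matches the association shown above up to variable names; starting from "from c in companies" nothing is added. With cyclic constraints this loop would not terminate, in line with the acyclicity condition stated above.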
Mapping generation
• Logical associations are meaningful combinations of correspondences
• A set of correspondences can be interpreted together if there are two logical associations (one in the source and one in the target) that cover them
• The algorithm for generating schema mappings
– Finds maximal sets of correspondences that can be interpreted together
– Compares pairs of logical associations (one in the source and the other in the target)
– Selects a suitable set of pairs
Correspondences
• <P; e> schema element (an attribute somewhere)
– P primary path
– e expression on the last variable of P
• Examples
– <c in companies; c.name>
– <o in organizations, f in o.fundings; f.fId>
• Correspondence:
for each PS exists PT with eS = eT
• with <PS; eS> and <PT; eT> schema elements
• Example (v1)
– for each c in companies exists o in organizations
with c.name = o.code
Correspondences, examples
v1: for each c in companies exists o in organizations with c.name = o.code
v2: for each g in grants exists o in organizations, f in o.fundings with g.gId = f.fId
v3: for each g in grants exists i in finances with g.amount = i.budget
v4: for each c in contacts exists i in finances with c.phone = i.phone
Correspondences and associations
• A correspondence v: for each PS exists PT with eS = eT
is covered by a pair of associations <AS, AT> (on source and target, resp.) if PS ≤ AS and PT ≤ AT with some renamings h, h' (on source and target, resp.)
• We say that
– there is a coverage of v by <AS, AT> via <h, h'>
– the result of the coverage is h(eS) = h'(eT)
Clio mapping
• Given – S, T source and target schemas– C set of correspndences
• A Clio mapping:for each AS exists AT with E– AS AT logical associations (on source and target, resp.)– E a conjunction of equalities:
• for each correspondence v in C covered by <AS , AT> , E includes the equality h(eS )=h(eT ) which is the result of the coverage, for one of the coverages
Clio mapping, example
from g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
from o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
v1, v2, v3 are covered
Clio mapping, example
for each g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with c.name = o.code and g.gId = f.fId and g.amount = i.budget
Clio mappings, more
v4: for each c in contacts
exists i in finances
with c.phone = i.phone
for each g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with c.name = o.code and g.gId = f.fId and g.amount = i.budget and m.phone = i.phone
for each g in grants, c in companies, s in contacts, m in contacts
…
and s.phone = i.phone
Mapping generation algorithm
Data Exchange
<statisticsDB> {
  FOR $x0 IN /expenseDB/grant, $x1 IN /expenseDB/project, $x2 IN /expenseDB/company
  WHERE $x2/cid/text() = $x0/cid/text()
        $x0/project/text() = $x1/name/text()
  RETURN
  <cityStatistics> {
    FOR $x0L1 IN /expenseDB/grant, $x1L1 IN /expenseDB/project, $x2L1 IN /expenseDB/company
    WHERE $x2L1/cid/text() = $x0L1/cid/text()
          $x0L1/project/text() = $x1L1/name/text()
          $x2/city/text() = $x2L1/city/text()
    RETURN
    <organization>
      <cid> { $x0L1/cid/text() } </cid>
      <cname> { $x2L1/name/text() } </cname>
      {
        FOR $x0L2 IN /expenseDB/grant, $x1L2 IN /expenseDB/project, $x2L2 IN /expenseDB/company
        WHERE $x2L2/cid/text() = $x0L2/cid/text()
              $x0L2/project/text() = $x1L2/name/text()
              $x2L1/name/text() = $x2L2/name/text()
              $x2L1/city/text() = $x2L2/city/text()
              $x0L1/cid/text() = $x0L2/cid/text()
        RETURN
        <funding>
        ………………………………..
Query Generation
• Correspondences map only into some of the atomic attributes
• We use Skolem functions to control the creation of the other elements
– sets (this controls how we group elements in the target)
– atomic values (this enforces the integrity of the target)
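A Skolem function can be pictured as a memoized generator of fresh values: the same arguments always give the same value, and different arguments give different values. The Python sketch below illustrates only that idea; the class name, the naming scheme for the generated values and the sample arguments are assumptions, not what Clio actually emits.

```python
class Skolem:
    """A named Skolem function that returns one fresh value per argument tuple."""

    def __init__(self, name):
        self.name = name
        self._values = {}

    def __call__(self, *args):
        # Same arguments: same value. New arguments: a new, distinct value.
        if args not in self._values:
            self._values[args] = f"{self.name}_{len(self._values) + 1}"
        return self._values[args]

# Hypothetical use: the same term fills an atomic value in two places,
# so the reference between the two generated records stays consistent.
sk_aid = Skolem("aid")
funding_aid = sk_aid("HAL", 301, 30)
financial_aid = sk_aid("HAL", 301, 30)
other_aid = sk_aid("HAL", 302, 40)
```

Used for set-valued elements, the choice of arguments decides the grouping: a Skolem term that depends only on a name places all records with that name in the same nested set.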
Source schema:
expenseDB: Rcd
  companies: Set of Rcd
    company: Rcd (cid, name, city)
  grants: Set of Rcd
    grant: Rcd (cid, gid, amount, sponsor, project)
Target schema:
statDB: Set of Rcd
  cityStat: Rcd
    city
    orgs: Set of Rcd
      org: Rcd (cid, name)
        fundings: Set of Rcd
          funding: Rcd (gid, proj, aid)
    financials: Set of Rcd
      financial: Rcd (aid, date, amount)
Skolem terms annotating target elements in mapping M2: Sk1[name], Sk2[], Sk3[name], Sk4[name,gid,amt] (used twice)