DDAM Lecture 6 Integration 1


  • 8/6/2019 DDAM Lecture 6 Integration 1


    Distributed Data Architecture & Management

    Dr Simon Scola


    Acknowledgement

    These LN series slides are based on slides adapted from the authors of the textbook:

    1) Distributed DBMS, M. Tamer Özsu & Patrick Valduriez, Prentice Hall, 1999

    2) OU course notes M877

    With grateful acknowledgement


    Lectures / ground rules


    Concepts & Topics for this LN

    - Interoperability
    - Integration
    - Schema: global conceptual schema, schema translation, schema integration, ...
    - Homogenization


    Motivation

    Why interoperate, integrate? Kinds of interoperability

    The general interoperability problem
    - Business push: mergers; globalisation
    - Technology pull: new distributed applications; web, etc.

    Many sources of data; huge volumes
    Heterogeneous data & systems
    Legacy systems still used to handle lots of data


    Motivation... thus

    Much renewed interest in the broader question
    - Information system interoperability

    But what does that mean? How do we achieve it?

    Many issues here, but our focus in this lecture:
    - Data/base interoperability
    - Heterogeneity & integration issues


    Distributed Data (seen this before!)

    [Figure: sites Boston, Montreal, Paris, New York and Tokyo connected by a communication network. Each site stores fragments of the employees, projects and assignments data, such as Paris projects, Boston employees, Montreal assignments, and New York projects with budget > 200000.]


    Integration problem (about heterogeneity)

    [Figure: sites Boston, Montreal, Paris, New York and Tokyo connected by a communication network.]


    Transparent system: User View

    Distributed Database


    Recap: Multi-Distributed Database systems


    Recap: Multi-Distributed Database systems

    Features
    - A collection of databases in which a global logical schema exists to enable distributed data access and management, but in which each database can be accessed independently of the global system for local use
    - Local sites can operate independently and apply a full set of DBMS operations locally
    - Local external schemas are available to local users
    - Global schema represents shared information over the MDDB


    Recap: Multi-Distributed Database systems

    Features
    - Global schema requires translation of heterogeneous local schemas
    - MDDB requires both local and global management and processing


    Data/base integration

    The process of conceptually integrating many data sources (databases or otherwise) to form a single, cohesive database

    What does this mean in the context of designing distributed data?

    It means it is a process of designing the global conceptual schema, bottom-up


    Global Logical schema

    1. Fragmentation: f1, f2, ..., f5
    2. Replication: f3 & f4 replicated
    3. Partitioning: 2 partitions, p1 & p2
    4. Allocation: 2 sites

    [Figure: fragments f1 to f5 and partitions p1 and p2, with the direction labelled "Bottom-up".]


    Data/base integration /2

    It means it is a process of designing the global conceptual schema, bottom-up

    And thus NOT applicable to all 4 kinds of architectures that we looked at

    Applicable in cases where a global conceptual schema is part of the architecture

    Bottom-up: means the individual data sources already exist


    Thus ...

    Designing the global conceptual schema involves integrating the component local conceptual schemas
    - E.g. CW: integration of schemas

    How do we achieve such integration?
    What are the problems/issues?
    How do we solve them?


    Data integration

    Kinds of problems to resolve include:
    - Schema integration issues vs data/instance integration issues
    - Semantic vs syntactic issues


    General Db Integration Process

    Two-step process, in general:
    - Translation
    - Integration

    [Figure: Data source i -> Translator i -> InS i, for i = 1 ... N; the Integrator combines InS 1 ... InS N into the GCS.]
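
    The two-step process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a schema is reduced to a mapping from entity name to attribute list, and the names `translate_relational`, `integrate` and `build_gcs` are hypothetical, not taken from the lecture or the textbook. A real integrator would resolve conflicts rather than simply take a union.

```python
from typing import Callable, Dict, List

# Assumed canonical (intermediate) form: entity name -> list of attribute names.
Schema = Dict[str, List[str]]


def translate_relational(source: Dict[str, List[str]]) -> Schema:
    """Hypothetical translator: each relation becomes an upper-cased entity."""
    return {name.upper(): list(attrs) for name, attrs in source.items()}


def integrate(intermediate: List[Schema]) -> Schema:
    """Hypothetical integrator: union the attributes of same-named entities."""
    gcs: Schema = {}
    for schema in intermediate:
        for entity, attrs in schema.items():
            merged = gcs.setdefault(entity, [])
            for attr in attrs:
                if attr not in merged:
                    merged.append(attr)
    return gcs


def build_gcs(sources: List[dict], translators: List[Callable[[dict], Schema]]) -> Schema:
    # Step 1: translation, one translator per data source, giving InS 1 ... InS N.
    intermediate = [translate(src) for src, translate in zip(sources, translators)]
    # Step 2: integration of all intermediate schemas into the GCS.
    return integrate(intermediate)
```

    For example, `build_gcs([{"emp": ["ENO", "ENAME"]}, {"Emp": ["ENO", "TITLE"]}], [translate_relational, translate_relational])` yields a single `EMP` entity carrying all three attributes.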


    General Db Integration Process/2

    Translation step
    - Into a canonical model
    - Necessary if data sources are heterogeneous
      So what does heterogeneous mean here?
    - Aim to reduce translation to a minimum
    - Canonical model: sufficiently expressive to subsume / include diverse concepts from many sources / databases
    - More expressive?
      Colour printer or black and white printer
      English alphabet (26) or Portuguese alphabet (23 + 13 accents)


    Expressiveness & heterogeneity

    The Chinese language itself is remarkably concrete. There is no word for "size," for example. If you want to fit someone for shoes, you ask them for the "big-small" of their feet. There is no suffix equivalent to "ness" in Chinese. So there is no "whiteness" -- only the white of the swan and the white of the snow.

    The Chinese are disinclined to use precisely defined terms or categories in any arena, but instead use expressive, metaphoric language.


    General Db Integration Process/3

    Translation step
    - Q1: Is this step needed if all data are held in relational databases?
    - Q2: Are Oracle & MS SQL Server homogeneous?


    General Db Integration Process/4

    Integration step

    Each InS x is then integrated into a GCS

    We assumed conceptual schemas from local to global;


    General Db Integration Process

    Two-step process, in general:
    - Translation
    - Integration

    [Figure: Data source i -> Translator i -> InS i, for i = 1 ... N; the Integrator combines InS 1 ... InS N into the GCS.]


    Illustrative Example

    Assume two databases to be integrated:

    1. Relational
    - our running example of EMP-PROJ-ASG-SAL
    - slightly modified, tables or relations

    2. An ER model
    - Similar SCHEMA & data,
    - BUT NOT IDENTICAL; using ER concepts

    Which is more expressive?


    Illustrative Example/2

    db1: Relational, modified as follows

    EMP (ENO, ENAME, TITLE)
    PROJ (PNO, PNAME, BUDGET, LOC, CNAME)
    ASG (ENO, PNO, RESP, DUR)
    PAY (TITLE, SAL)

    This is one version taken from the textbook, modified as shown above.
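
    To make db1 concrete, the four relations can be created in an in-memory SQLite database. This is a sketch only: the slide gives relation and attribute names but no keys or types, so the primary keys, foreign keys and column types below are assumptions, and `create_db1` and `table_columns` are hypothetical helper names.

```python
import sqlite3

# Keys, foreign keys and column types are assumed; the slide lists names only.
DDL = """
CREATE TABLE PAY  (TITLE TEXT PRIMARY KEY, SAL REAL);
CREATE TABLE EMP  (ENO TEXT PRIMARY KEY, ENAME TEXT,
                   TITLE TEXT REFERENCES PAY(TITLE));
CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET REAL,
                   LOC TEXT, CNAME TEXT);
CREATE TABLE ASG  (ENO TEXT REFERENCES EMP(ENO),
                   PNO TEXT REFERENCES PROJ(PNO),
                   RESP TEXT, DUR INTEGER,
                   PRIMARY KEY (ENO, PNO));
"""


def create_db1() -> sqlite3.Connection:
    """Create the db1 relations in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def table_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return column names in declaration order (field 1 of PRAGMA table_info)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

    Reading the catalogue back with `table_columns(create_db1(), "ASG")` returns the four ASG attributes. A translator would work from this kind of metadata.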


    Illustrative Example/3

    Db2: ER model/concepts
    Similar, including one significant difference: it keeps data about clients who contracted the projects

    [Figure: ER diagram with entities ENGINEER, PROJECT and CLIENT and relationships WORKS-IN and CONTRACTED-BY, with cardinality labels n, n, 1, 1.]

    Different graphical notations! Note: no attributes shown; relationship drawn as a diamond; no PK shown here!!


    Illustrative Example/4

    Db2: ER model/concepts

    [Figure: ER diagram with entities ENGINEER, PROJECT and CLIENT and relationships WORKS-IN and CONTRACTED-BY, with cardinality labels n, n, 1, 1.]

    ENGINEER (ENO, ENG-NAME, TITLE, SAL)
    PROJECT (PROJNO, PROJNAME, LOC, BUDGET)
    CLIENT (CNAME, ADDRESS, ...)

    & the relationship attributes are:

    WORKS-IN (RESPONSIBILITY, DURATION)
    CONTRACTED-BY (C-DATE)


    Translation RM -> ER

    And... what degree is the relationship?
    - 1:many or many:many?
    - An EMP is assigned to many projects and a project has many EMPs assigned to it: m:n
    - PAY relation is difficult to handle, why?
      Is it an entity? Is it an attribute?
    - If an entity, how is it related to other entities?
    - How do we know what it is?
    - Can we create a 1:m relationship from PAY to EMP?


    Translation

    So the RM is translated to an ER model thus:

    [Figure: EMP (n) -- ASG -- (m) PROJECT]

    How do we know the degree of the relationships?

    And where is the PAY entity?
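
    One common rule of thumb for spotting the m:n relationship can be written down directly: a relation whose primary key is made up entirely of foreign keys to other relations is a candidate relationship, and everything else is a candidate entity. The sketch below assumes keys and foreign keys that the slides do not state, and `classify` is a hypothetical name. Note that the rule leaves PAY as an entity by default, so it does not settle the entity-or-attribute question.

```python
from typing import Dict, List, Tuple

# relation -> (primary-key attributes, {foreign-key attribute: referenced relation})
# These keys and references are assumptions made for illustration.
RELATIONS: Dict[str, Tuple[List[str], Dict[str, str]]] = {
    "EMP":  (["ENO"], {"TITLE": "PAY"}),
    "PROJ": (["PNO"], {}),
    "ASG":  (["ENO", "PNO"], {"ENO": "EMP", "PNO": "PROJ"}),
    "PAY":  (["TITLE"], {}),
}


def classify(relations):
    """Split relations into candidate entities and candidate m:n relationships."""
    entities, relationships = [], []
    for name, (pk, fks) in relations.items():
        # Relations referenced by the primary-key attributes, in key order.
        targets = [fks[attr] for attr in pk if attr in fks]
        if len(pk) >= 2 and len(targets) == len(pk):
            # Whole key is borrowed from other relations: treat as m:n relationship.
            relationships.append((name, targets))
        else:
            entities.append(name)
    return entities, relationships
```

    With the assumed keys, ASG comes out as a relationship between EMP and PROJ, matching the n:m diagram.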


    Translation issues: Model PAY as attributes

    [Figure: EMP (n) -- ASG -- (m) PROJECT; attributes shown: Eng. No., Eng Name, Title, Salary]


    Translation issues: Model PAY as Entity

    [Figure: EMP (n) -- ASG -- (m) PROJECT; EMP (n) -- PAYMENT -- (1) PAY; attributes shown: Eng. No., Eng Name, Title, Salary]


    So, which solution & why?

    Need to understand the difference(s) between the 2 solutions
    - Sol-1: PAY as an entity
    - Sol-2: PAY as attribute
    - Differences:
      1) Which is the neater/simpler of the two?
      2) Which is more expressive? Why?
      3) Which would make the better canonical model, ER or RM? Why?
      4) Which is better? Why?
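
    One way to see the difference between the two solutions is to write both as data structures. This is a sketch under assumptions: the class names and the sample values in the usage note are invented for illustration. With PAY as an entity, the salary for a title is stored once and shared; with PAY as attributes, every employee record carries its own copy.

```python
from dataclasses import dataclass


# Sol-1: PAY as an entity, related n:1 to EMP through PAYMENT.
@dataclass
class Pay:
    title: str
    salary: float


@dataclass
class Emp:
    eno: str
    ename: str
    pay: Pay  # many employees may share one Pay instance


# Sol-2: PAY as attributes of EMP; salary is repeated per employee.
@dataclass
class EmpWithSalary:
    eno: str
    ename: str
    title: str
    salary: float


def raise_salary(pay: Pay, new_salary: float) -> None:
    """Under Sol-1 a single update reaches every employee with that title."""
    pay.salary = new_salary
```

    For instance, if two `Emp` objects share one `Pay("Engineer", 40000.0)`, calling `raise_salary` once changes the salary seen through both. Under Sol-2 each `EmpWithSalary` would need its own update, which is simpler to draw but easier to get inconsistent.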


    Schema Integration (SI)

    Canonical model being more expressive; hence metadata

    ER is more expressive than RM; why?
    - It means that ER can capture more concepts, more semantics / more meanings, etc. from the real world & thus results in a much more faithful model
    - What about OO vs ER (from the expressiveness perspective)?


    Illustrative Example/5 Schema Translation (ST)

    ST = mapping from one schema to another (see next slide)

    Requires us to specify what the target global schema data model is... OO, ER, other kind

    Not 100% essential if achievable during the integration step

    The integrator, then, has all information about the entire global data sets at one & the same time

    Which target model to use is thus chosen by the integrator

    Can decide which model to use (OO, ER, etc.)


    General Db Integration Process

    Two-step process, in general:
    - Translation
    - Integration

    [Figure: Data source i -> Translator i -> InS i, for i = 1 ... N; the Integrator combines InS 1 ... InS N into the GCS.]


    Illustrative Example/6 Schema Translation (ST)

    Integrator can decide which target model to use (OO, ER, etc.)
    - Can make trade-offs between local schemas,
    - to choose an appropriate representation,
    - in case of conflicts between the local models, as we will illustrate in this example
    - Thus, the integrator must have knowledge of all possible trade-offs between the many different schemas (which may be heterogeneous) being integrated


    Schema Integration (SI)

    Follows the translation/mapping step, by integrating the intermediate schemas

    SI is a process:
    - Identify components (in the two or more intermediate models) related to each other
    - Select the best representation for the GCS
    - And then, finally, integrate the related components


    Now, onto Schema Integration (SI)/2

    What does related mean?
    Two components can be related:
    - 1. as equivalent components
    - 2. one component contains the other
    - 3. disjoint
    - 4. any other?
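
    The cases above can be sketched as set comparisons. This assumes, purely for illustration, that a component is represented by the set of its attribute names after naming differences have been homogenized; real relatedness is about meaning, not just names. The function `relate` is hypothetical, and "overlap" is offered as one possible answer to the open fourth case, not as the lecture's answer.

```python
def relate(a: set, b: set) -> str:
    """Classify how two components relate, comparing their attribute sets."""
    if a == b:
        return "equivalent"
    if a < b or b < a:
        # Proper subset in either direction: one component contains the other.
        return "containment"
    if a.isdisjoint(b):
        return "disjoint"
    # Some attributes shared, but neither component contains the other.
    return "overlap"
```

    For example, `relate({"ENO"}, {"ENO", "ENAME"})` reports containment, while `relate({"ENO", "TITLE"}, {"ENO", "SAL"})` reports overlap.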


    Schema Integration Methods (SI)/3

    Taxonomy / classification:

    Integration process
    - Binary: ladder, balanced
    - N-ary: one-shot, iterative


    Binary Integration Methods /4

    a) Ladder (step-wise)
    b) Pure binary


    N-ary Integration Methods /5

    More than 2 schemas integrated at each step

    a) One pass
    One-pass operation = all schemas are integrated, producing the GCS in one iteration step
    Pros:
    - All info about local schemas available
    - Trade-offs between all schemas, not just between a few
    Cons:
    - Increased complexity
    - Difficult to automate

    b) Iterative n-ary integration
    Pros:
    - More flexibility
    - More general
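
    The integration orders in the taxonomy can be contrasted with a toy merge. The sketch assumes a schema is a mapping from entity name to a set of attributes and that merging is a plain union; `merge`, `ladder`, `balanced` and `one_shot` are hypothetical names, and each expects at least one schema. Because a union is order-independent, all three give the same GCS here. In real integration the merge involves trade-offs, which is why the choice of method matters.

```python
from functools import reduce
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]


def merge(a: Schema, b: Schema) -> Schema:
    """Binary step: union two schemas entity by entity."""
    out = {entity: set(attrs) for entity, attrs in a.items()}
    for entity, attrs in b.items():
        out.setdefault(entity, set()).update(attrs)
    return out


def ladder(schemas: List[Schema]) -> Schema:
    """Binary ladder: fold each new schema into the running result."""
    return reduce(merge, schemas)


def balanced(schemas: List[Schema]) -> Schema:
    """Binary balanced: merge in pairs, then merge the pair results."""
    level = list(schemas)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd schema out waits for the next round
        level = nxt
    return level[0]


def one_shot(schemas: List[Schema]) -> Schema:
    """N-ary one pass: all schemas are considered in a single step."""
    out: Schema = {}
    for schema in schemas:
        for entity, attrs in schema.items():
            out.setdefault(entity, set()).update(attrs)
    return out
```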


    Summary

    - Introduction to integration
    - Schema integration
      - Canonical model
      - Mapping
    - Schema integration approaches