8 drived horizontal fragmentation

38
Distributed Database Systems 22-11-2012

description

Distribute Data Base

Transcript of 8 drived horizontal fragmentation

Page 1: 8  drived horizontal fragmentation

Distributed Database Systems

22-11-2012

Page 2: 8  drived horizontal fragmentation

You must remember!

Page 3: 8  drived horizontal fragmentation

You must also remember!

• Relation data languages are based on relational algebra

• Relational algebra consist of a set of operators on relations, which include:– Selection– Projection– Union– Cartesian product

Page 4: 8  drived horizontal fragmentation

Cartesian Product

• The Cartesian product of two relations R of degree k1 and S of degree k2 is the set of (k1+k2)-tuples, where each result tuple is a concatenation of one tuple of R with one tuple of S, for all tuples of R and S (R X S)

• Consider the relation EMP and PAY, EMPXPAY is:

Page 5: 8  drived horizontal fragmentation

Cartesian Product (EMPXPAY)

Page 6: 8  drived horizontal fragmentation

Joins

• Join is a derivative of Cartesian Product• There are various forms of joins– Join

• Inner join – Theta join– Equi-join

• Outer join– Left join– Right join– Full join

– Semi join

Page 7: 8  drived horizontal fragmentation

Theta Join

• Consider the relation EMP, the theta-join of relation EMP and ASG over the join predicate EMP.ENO=ASG.ENO

Page 8: 8  drived horizontal fragmentation

Equi-Join

• This example demonstrate a special case of theta-join called equi-join

Page 9: 8  drived horizontal fragmentation

Semi-Join

• The semi-join of relation R, defined over the set of attributes A, by relation S, defined over the set of attributes B, is the subset of the tuples of R that participate in the join of R with S

• The advantage of semi-join is that it decreases the number of tuples that need to be handled to form the join

Page 10: 8  drived horizontal fragmentation

Semi-Join

• In centralized database systems, this is important because it usually results in a decreased number of secondary storage accesses by making better use of the memory.

• It is even more important in distributed databases since it usually reduces the amount of data that needs to be transmitted between sites in order to evaluate a query.

Page 11: 8  drived horizontal fragmentation

Semi-Join

• To demonstrate the difference between join and semi-join, lets consider the semi-join of EMP with PAY over the predicate EMP.TITLE = PAY.TITLE that is

Page 12: 8  drived horizontal fragmentation

Semi-Join

Page 13: 8  drived horizontal fragmentation

Derived Horizontal Fragmentation

• A derived horizontal fragmentation is defined on a member relation of a link according to a selection operation specified on its owner

• It is important to remember two points– First, the link between the owner and the member

relations is defined as an equi-join– Second, an equi-join can be implemented by

means of semi-join

Page 14: 8  drived horizontal fragmentation

Derived Horizontal Fragmentation

• Accordingly, given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as:

• Where w is the maximum number of fragments that will be defined on R, and

where Fi is the formula according to which the primary horizontal fragment Si is defined

S

Page 15: 8  drived horizontal fragmentation

Derived Horizontal Fragmentation

• To carry out a derived horizontal fragmentation, three inputs are needed:– The set of partitions of the owner relation (PAY1,

PAY2)– The member relation– The set of semi join predicates between the owner

and member (EMP.TITLE=PAY.TITLE)

Page 16: 8  drived horizontal fragmentation

Example

Page 17: 8  drived horizontal fragmentation

Example

• Consider L1, where owner(L1) = PAY and member (L1) = EMP

• We can group engineers into two groups according to their salary: those making less then or equal to $30,000, and those making more then $30,000

• The two fragments EMP1 and EMP2 are defined as:

Page 18: 8  drived horizontal fragmentation

Example

• The result of this fragmentation is depicted as:

Page 19: 8  drived horizontal fragmentation

Derived Horizontal Fragmentation

• One potential complication that need attention

• In a database schema if there are two link into a relation R, there could be more than one possible derived horizontal fragmentation of R

• The choice of candidate fragmentation is based on two criteria– The fragmentation with better join characteristics– The fragmentation used in more applications

Page 20: 8  drived horizontal fragmentation

The fragmentation used in more Applications

• It is quite straight forward if we take into consideration the frequency with which application access some data

• The access of the heavy users can minimize the total impact on system performance

Page 21: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• Consider the last example, the effect of this fragmentation is that the join of the EMP and PAY relations to answer the query is assisted– By performing it on smaller relations– By potentially performing joins in parallel

Page 22: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• The first point is obvious, the fragments of EMP are smaller than EMP itself

• Therefore, it will be faster to join any fragment of PAY with any fragment of EMP than to work with the relations themselves

• The second point is however, more important and is at the heart of distributed databases

• If, besides executing a number of queries at different sites, we can parallelize execution of one join query, the response time or throughput of the system can be expected to improve

Page 23: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• In the case of joins, this is possible under certain circumstances

• Consider the join graph between the fragments of EMP and PAY, there is only one link coming in or going out of a fragment

• Such a join graph is called a simple graph • The advantage of a design where the join relationship

between fragments is simple is that the member and owner link can be allocated to one site and the joins between different pairs of fragments can proceed independently and in parallel

Page 24: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

Page 25: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• Unfortunately, obtaining simple join graphs may not always be possible

• In that case the next desirable alternative is to have a design that results in a partitioned join graph

• A partitioned graph consist of two or more sub-graphs with no links between them

• Fragments so obtained may not be distributed for parallel execution as easily as those obtained via simple join graphs, but the allocation is still possible

Page 26: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• Let us continue with the distribution design of the database we started before

• We already decided on the fragmentation of relation EMP according to the fragmentation of PAY

• Lets now consider ASG, assume that there are two applications – The first application finds the names of engineers who work at certain

places, it turns on all three sites and accesses the information about the engineer who work on local projects with higher probability than those of projects at other locations

– At each administrative sites where employee records are managed, users would like to access the responsibilities on the projects that these employee work on and learn how they will work on those projects

Page 27: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• The first application results in a fragmentation of ASG according to the fragments PROJ1, PROJ3, PROJ4 and PROJ6 of PROJ obtained before

Page 28: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• Therefore, the derived fragmentation of ASG according to {PROJ1, PROJ3, PROJ4, PROJ6} is defined as:

• The fragment instances are:

Page 29: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• The second query can be specified in SQL as:

• Where i=1 or i=2, depending on the site where the query is issued

• The derived fragmentation of ASG according to the fragmentation of EMP is defined as:

Page 30: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

Page 31: 8  drived horizontal fragmentation

The Fragmentation with better join characteristics

• The example demonstrate two things:– Derived fragmentation may follow a chain where

one relation is fragmented as a result of another one’s design and it, in turn, causes the fragmentation of another relation

(PAY->EMP->ASG)– Typically, there will be more than one candidate

fragmentation for a relation (ASG), the final choice of the fragmentation scheme may be a decision problem addressed during allocation

Page 32: 8  drived horizontal fragmentation

Checking of Correctness

• We should now check the fragmentation algorithms discussed so far with respect to three correctness criteria– Completeness– Reconstruction– Disjointness

Page 33: 8  drived horizontal fragmentation

Completeness

• The completeness of a primary horizontal fragmentation is based on the selection predicate used

• As long as the selection predicates are complete, the resulting fragmentation is guaranteed to be complete as well

Page 34: 8  drived horizontal fragmentation

Completeness

• The completeness of a derived horizontal fragmentation is somewhat more difficult to define

• For example, there should be no ASG tuple which has a project number that is not also contained in PROJ, this rule is know as referential integrity

Page 35: 8  drived horizontal fragmentation

Reconstruction

• Reconstruction of a global relation from its fragments is performed by the union operator in both the primary and the derived horizontal fragmentation

• Thus for a relation R with fragmentation

Page 36: 8  drived horizontal fragmentation

Disjointness

• It is easier to establish Disjointness of fragmentation for primary than for derived horizontal fragmentation

• In PHF Disjointness is guaranteed as long as the minterm predicates determining the fragmentation are mutually exclusive

Page 37: 8  drived horizontal fragmentation

Example

• In derived fragmentation, however, there is a semi join involved that adds considerable complexity

• Disjointness can be guaranteed if the join graph is simple, otherwise it is necessary to investigate actual tuple values

• In general we do not want a tuple of a member relation to join with two or more tuples of the owner relation when these tuples are in different fragments of the owner

Page 38: 8  drived horizontal fragmentation

Example

• In fragmenting relation PAY, the minterm predicates M = {m1, m2} where

m1: SAL<=30000m2: SAL>30000

• Since m1 and m2 are mutually exclusive, the fragmentation of PAY is disjoint

• For relation EMP, however we require that – Each engineer has a single title – Each title have a single salary value associated with it

• Since these two rules follow from the semantics of the database, the fragmentation of EMP with respect to PAY is also disjoint