Response Db 4



Database Solution 4 RWTH


Page 1: Response Db 4

Implementation of Database

Exercise 4

Tanmaya Mahapatra

Matriculation Number : 340959

[email protected]

Bharath Rangaraj

Matriculation Number : 340909

[email protected]

Manasi Jayapal

Matriculation Number : 340892

[email protected]

December 8, 2013

1 Exercise 4.1 [Query Optimization] :

Consider the join of three relations R, S, U. Their attributes and the estimated sizes of the value sets for the attributes in each relation are summarized below, with V(Rel, attr) denoting the number of distinct values for the attribute attr of relation Rel, and T(Rel) denoting the number of tuples within relation Rel:

R(a,b) S(b,c) U(c,a)

T(R) = 800 T(S) = 600 T(U) = 400

V(R,a) = 100 V(S,b) = 120 V(U,c) = 100

V(R,b) = 100 V(S,c) = 200 V(U,a) = 40

Moreover, we make the following assumptions:

• Cost of a sub-plan is the sum of the sizes of all intermediate relations, excluding the base relations and the join result of the sub-plan under consideration.

• Size of a relation is simplified to be its cardinality, i.e., number of tuples.

• All attributes are mutually independent.


1.1 Estimate the sizes of R ⋈ S, S ⋈ U, R ⋈ U and R ⋈ S ⋈ U

Solution

R ⋈ S

Number of tuples in R = 800
Number of tuples in S = 600
b is the common attribute/column.
Larger number of distinct values of b across R and S = max(100, 120) = 120

\[
T(R \bowtie S) = \frac{T(R) \times T(S)}{\max(V(R,b),\, V(S,b))} = \frac{800 \times 600}{\max(100,\, 120)} = \frac{800 \times 600}{120} = 4000
\]

S ⋈ U

Number of tuples in S = 600
Number of tuples in U = 400
c is the common attribute/column.
Larger number of distinct values of c across S and U = max(200, 100) = 200

\[
T(S \bowtie U) = \frac{T(S) \times T(U)}{\max(V(S,c),\, V(U,c))} = \frac{600 \times 400}{\max(200,\, 100)} = \frac{600 \times 400}{200} = 1200
\]

R ⋈ U

Number of tuples in R = 800
Number of tuples in U = 400
a is the common attribute/column.
Larger number of distinct values of a across R and U = max(100, 40) = 100

\[
T(R \bowtie U) = \frac{T(R) \times T(U)}{\max(V(R,a),\, V(U,a))} = \frac{800 \times 400}{\max(100,\, 40)} = \frac{800 \times 400}{100} = 3200
\]

R ⋈ S ⋈ U

For this join we first consider the join of R and S.
Number of tuples in R = 800
Number of tuples in S = 600
b is the common attribute/column.
Larger number of distinct values of b across R and S = max(100, 120) = 120

\[
T(R \bowtie S) = \frac{T(R) \times T(S)}{\max(V(R,b),\, V(S,b))} = \frac{800 \times 600}{\max(100,\, 120)} = \frac{800 \times 600}{120} = 4000
\]

Let X denote the joined relation R ⋈ S. Now consider the join of X with U.
Number of tuples in X = 4000
Number of tuples in U = 400
a and c are the common attributes/columns.
X inherits a from R and c from S, so V(X,a) = V(R,a) = 100 and V(X,c) = V(S,c) = 200.

\[
T(X \bowtie U) = \frac{T(X) \times T(U)}{\max(V(X,a),\, V(U,a)) \times \max(V(X,c),\, V(U,c))} = \frac{4000 \times 400}{\max(100,\, 40) \times \max(200,\, 100)} = \frac{4000 \times 400}{100 \times 200} = 80
\]
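The four estimates above can be checked with a small Python sketch. The function name `join_estimate` and the dictionary layout are illustrative choices, not part of the exercise. The rule used for the distinct-value counts of a join result (a shared attribute keeps the smaller count, every other attribute keeps the count of its source relation) is an assumption; it agrees with the values V(X,a) = 100 and V(X,c) = 200 used in the solution.

```python
from math import prod

# Statistics copied from the exercise statement.
T = {"R": 800, "S": 600, "U": 400}
V = {
    "R": {"a": 100, "b": 100},
    "S": {"b": 120, "c": 200},
    "U": {"c": 100, "a": 40},
}


def join_estimate(t1, v1, t2, v2):
    """Return (estimated size, distinct-value counts) of a natural join."""
    shared = set(v1) & set(v2)
    # Divide by the larger distinct-value count of every shared attribute.
    size = t1 * t2 / prod(max(v1[x], v2[x]) for x in shared)
    # Assumption: a shared attribute keeps the smaller value count,
    # every other attribute keeps the count of its source relation.
    v_out = {**v1, **v2}
    for x in shared:
        v_out[x] = min(v1[x], v2[x])
    return size, v_out


size_rs, v_rs = join_estimate(T["R"], V["R"], T["S"], V["S"])
size_su, v_su = join_estimate(T["S"], V["S"], T["U"], V["U"])
size_ru, v_ru = join_estimate(T["R"], V["R"], T["U"], V["U"])
size_rsu, _ = join_estimate(size_rs, v_rs, T["U"], V["U"])

print(size_rs, size_su, size_ru, size_rsu)
```

Under these assumptions the three-way estimate does not depend on which pair is joined first, which is why all plans in the next part share the same final size.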

1.2 Please use dynamic programming to find the cheapest left-deep join plan for the natural join of the three relations. Be specific about the estimation of the cost of each candidate sub-plan. For each subplan, specify the estimated cost, and the best join plan.

Solution

• In the first step of dynamic programming we consider the singleton sets of relations; their sizes and costs are given. Please refer to Table 1.

1. For the singleton sets, the sizes are as given.

2. The cost is zero since no intermediate relations are needed.

• In the second step there are two possible plans for each pair, since either of the two relations can be the left argument. Since we consider only left-deep plans and the relations are of unequal sizes, we keep the smaller relation on the left-hand side. Please refer to Table 2.

1. The sizes are computed using the standard formula, as in the previous question.

2. The cost of each pair is zero, since a join of two relations still has no intermediate relations.

• In the third step the join of all three relations is considered. Please refer to Table 3.

1. The sizes are computed using the standard formula, as in the previous question.

2. The cost estimate for each triple is the size of its one intermediate relation, that is, the join of the first two relations chosen. Since we want this cost to be as small as possible, we consider each pair of two out of the three relations and take the pair with the smallest size.

The join plan (U ⋈ S) ⋈ R is the cheapest, with cost 1200, as evident from Table 3; the corresponding join tree is given below.

             {R}    {S}    {U}
Size         800    600    400
Cost           0      0      0
Best Plan      R      S      U

Table 1: The table for singleton sets.

             {S,R}    {U,R}    {U,S}
Size         4000     3200     1200
Cost            0        0        0
Best Plan    S ⋈ R    U ⋈ R    U ⋈ S

Table 2: The table for pairs of relations.

             {S,R,U}        {U,R,S}        {U,S,R}
Size         80             80             80
Cost         4000           3200           1200
Best Plan    (S ⋈ R) ⋈ U    (U ⋈ R) ⋈ S    (U ⋈ S) ⋈ R

Table 3: The table for triples of relations.

        ⋈
       / \
      ⋈   R
     / \
    U   S
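The three dynamic-programming steps can be sketched in Python as follows. This is a minimal illustration under the exercise's cost model, not a general optimizer: the names `join_estimate` and `left_deep_dp`, the table layout, and the rule for distinct-value counts of a join result (smaller count for shared attributes) are assumptions made for the sketch. Ties between plans of equal cost are broken by putting the smaller input on the left, as in Table 2.

```python
from itertools import combinations
from math import prod

# Statistics copied from the exercise statement.
T = {"R": 800, "S": 600, "U": 400}
V = {
    "R": {"a": 100, "b": 100},
    "S": {"b": 120, "c": 200},
    "U": {"c": 100, "a": 40},
}


def join_estimate(t1, v1, t2, v2):
    """Return (estimated size, distinct-value counts) of a natural join."""
    shared = set(v1) & set(v2)
    size = t1 * t2 / prod(max(v1[x], v2[x]) for x in shared)
    # Assumption: a shared attribute keeps the smaller value count,
    # every other attribute keeps the count of its source relation.
    v_out = {**v1, **v2}
    for x in shared:
        v_out[x] = min(v1[x], v2[x])
    return size, v_out


def left_deep_dp(T, V):
    """Fill the DP table: set of relations -> best left-deep sub-plan."""
    table = {
        frozenset([r]): {"size": T[r], "V": V[r], "cost": 0, "plan": r}
        for r in T
    }
    for k in range(2, len(T) + 1):
        for subset in combinations(sorted(T), k):
            key, best, best_rank = frozenset(subset), None, None
            for right in subset:  # the base relation joined last
                left = table[key - {right}]
                size, v = join_estimate(left["size"], left["V"],
                                        T[right], V[right])
                # Only intermediate results count: a base relation on the
                # left adds nothing, a joined left input adds its size.
                cost = left["cost"] + (left["size"] if k > 2 else 0)
                rank = (cost, left["size"])  # ties: smaller left input wins
                if best is None or rank < best_rank:
                    best_rank = rank
                    best = {"size": size, "V": v, "cost": cost,
                            "plan": f"({left['plan']} ⋈ {right})"}
            table[key] = best
    return table


table = left_deep_dp(T, V)
best = table[frozenset(T)]
print(best["plan"], "size:", best["size"], "cost:", best["cost"])
```

The table keeps only the cheapest plan per set of relations, so Table 3's three candidate columns correspond to the three values of `right` tried for the full set.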



3. Cost of plan P1: {σ_HasMustache(picture)(HumanCriminal)} ⋈ DogCriminal

The cost of the plan = Cost(Selection) + Cost(Join)

= Cost(σ) + Cost(⋈)

Cost(σ) = T(HumanCriminal) * Cost(HasMustache() predicate)

= 10000 * N

Cost(⋈) (here, we assume that each block is held in 1 page)

= B_outer + B_inner * (B_outer / Buffer) (for block nested loop join)

where,

B_outer = number of blocks in the outer relation = {number of blocks in HumanCriminal} * 1% = 100,

B_inner = number of blocks in the inner relation (DogCriminal) = 5000,

Buffer = 100 pages.

B_outer and B_inner are block counts; N remains the cost of one evaluation of the HasMustache() predicate.

Therefore, Cost(⋈) = 100 + 5000 * (100 / 100) = 5100

Cost of plan P1

= Cost(σ) + Cost(⋈)

= (10000N + 5100)

Cost of plan P2: σ_HasMustache(picture) {HumanCriminal ⋈ DogCriminal}

The cost of the plan = Cost(Join) + Cost(Selection) = Cost(⋈) + Cost(σ).

Cost(⋈) (we assume that each block is held in 1 page)

= B_outer + B_inner * (B_outer / Buffer) (for block nested loop join)

where B_outer = number of blocks in the outer relation (HumanCriminal) = 10000,

B_inner = number of blocks in the inner relation (DogCriminal) = 5000,

Buffer = 100 pages.

Therefore, Cost(⋈)

= 10000 + 5000 * (10000 / 100) = 510000

Cost(σ)

= [number of tuples in the join result {HumanCriminal ⋈ DogCriminal}] * Cost(HasMustache() predicate)

= [{T(HumanCriminal) * T(DogCriminal)} * P] * Cost(HasMustache() predicate) (where P = probability 10^-6)

= 500 * N

Therefore, cost of plan P2

= Cost(⋈) + Cost(σ)

= (510000 + 500N)

4. For plan P1 to be cheaper than plan P2:

(10000N + 5100) <= (510000 + 500N)

9500N <= 504900

Therefore, N <= 504900 / 9500 ≈ 53.15, i.e. N <= 53 for integer N.

So, for all values of N <= 53, plan P1 is cheaper than plan P2.
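The break-even point can be reproduced with a short Python sketch. The constants are the figures used in the solution above; the function names and the choice to pass the predicate cost as `n` are illustrative assumptions.

```python
import math

# Figures taken from the solution text above.
BUFFER_PAGES = 100
HUMAN_TUPLES = 10_000
HUMAN_BLOCKS = 10_000
SELECTED_HUMAN_BLOCKS = 100   # 1% of HumanCriminal passes HasMustache
DOG_BLOCKS = 5_000
JOIN_RESULT_TUPLES = 500      # T(HumanCriminal) * T(DogCriminal) * 10^-6


def bnl_join_cost(outer_blocks, inner_blocks, buffer_pages=BUFFER_PAGES):
    """Block nested loop join cost formula as used in the solution."""
    return outer_blocks + inner_blocks * (outer_blocks / buffer_pages)


def cost_p1(n):
    """Select first (predicate cost n per tuple), then join."""
    return HUMAN_TUPLES * n + bnl_join_cost(SELECTED_HUMAN_BLOCKS, DOG_BLOCKS)


def cost_p2(n):
    """Join first, then apply the predicate to every result tuple."""
    return bnl_join_cost(HUMAN_BLOCKS, DOG_BLOCKS) + JOIN_RESULT_TUPLES * n


# Solve cost_p1(n) = cost_p2(n) for n; both costs are linear in n.
break_even = (cost_p2(0) - cost_p1(0)) / (HUMAN_TUPLES - JOIN_RESULT_TUPLES)
largest_n = math.floor(break_even)
print(break_even, largest_n)
```

Evaluating `cost_p1` and `cost_p2` at `largest_n` and `largest_n + 1` shows on which side of the break-even point each plan wins.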


2. Decide whether the following two Datalog programs are stratified or not and explain why.

Datalog program 1:

q(X) :- ¬p(X), t(X).

p(X) :- s(X,X), ¬r(X).

s(X,Y) :- s(Y,X), t(Y).

r(X) :- t(X), ¬s(X,X).

No cycle in the dependency graph contains a negative edge: the only recursion is the positive self-loop of s. A valid stratification is {s, t}, {r}, {p}, {q}.

The given Datalog program is stratified.


Datalog program 2:

p(X) :- q(X,Y), ¬t(Y).

t(Y) :- q(Y,Z), ¬t(Z).

t(Y) :- s(Y).

The rule t(Y) :- q(Y,Z), ¬t(Z) makes t depend negatively on itself, so the dependency graph has a negative self-loop. A predicate in a stratum cannot depend negatively on a predicate in the same stratum, so no stratification exists.

The given Datalog program is not stratified.
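The check applied to both programs can be sketched in Python: a program is stratified exactly when no cycle of its dependency graph goes through a negated body atom. The encoding of each program as a dictionary from head predicate to `(body predicate, negated)` pairs and the function names are assumptions made for this sketch.

```python
def depends_on(graph, start):
    """Predicates reachable from `start` by following dependency edges."""
    seen, stack = set(), [start]
    while stack:
        pred = stack.pop()
        for body, _negated in graph.get(pred, []):
            if body not in seen:
                seen.add(body)
                stack.append(body)
    return seen


def is_stratified(graph):
    """True iff no negative edge head -> body lies on a cycle."""
    for head, deps in graph.items():
        for body, negated in deps:
            # A negative edge is on a cycle when body depends back on head.
            if negated and head in depends_on(graph, body):
                return False
    return True


# head -> [(body predicate, is_negated), ...], transcribed from the rules.
program_1 = {
    "q": [("p", True), ("t", False)],
    "p": [("s", False), ("r", True)],
    "s": [("s", False), ("t", False)],
    "r": [("t", False), ("s", True)],
}
program_2 = {
    "p": [("q", False), ("t", True)],
    "t": [("q", False), ("t", True), ("s", False)],
}

print(is_stratified(program_1), is_stratified(program_2))
```

Predicates that never appear as a rule head, such as the base relations, simply have no outgoing edges in this encoding.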


3)

Ans:

Minimal model 1:

q(x) :- b(x), ¬p(x)

p(x) :- a(x), ¬q(x)

r(x) :- p(x), ¬b(x)

Minimal model 2:

p(x) :- a(x), ¬b(x)

q(x) :- b(x)

r(x) :- p(x)