ICS 214B: Transaction Processing and Distributed Data Management
Lecture 9: Fragmentation and Distributed Query Processing
Professor Chen Li
ICS214B Notes 09 2
Which simple predicates should we use in Pr?
Desired properties of Pr:
- minimality
- uniformity
Return to example: E(#, NM, LOC, SAL, …)
Common queries:
Qa: select * from E where LOC=Sa and …
Qb: select * from E where LOC=Sb and ...
Three choices:
(1) Pr = { }
    F1 = { E }
(2) Pr = {LOC=Sa, LOC=Sb}
    F2 = { σ_{loc=Sa}(E), σ_{loc=Sb}(E) }
(3) Pr = {LOC=Sa, LOC=Sb, SAL<10}
    F3 = { σ_{loc=Sa ∧ sal<10}(E), σ_{loc=Sa ∧ sal≥10}(E), σ_{loc=Sb ∧ sal<10}(E), σ_{loc=Sb ∧ sal≥10}(E) }
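In other words, E splits into four groups of tuples:
  Loc=Sa ∧ sal < 10
  Loc=Sa ∧ sal ≥ 10
  Loc=Sb ∧ sal < 10
  Loc=Sb ∧ sal ≥ 10
F1 keeps all four groups in one fragment, F2 splits them by Loc, and F3 makes each group its own fragment.
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
F2 is good… (not F1, F3)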
Informal definition
Set of predicates Pr is uniform if:
for every Fi ∈ F[Pr], every t ∈ Fi has equal probability of access by every major application.
Note: F[Pr] is the fragmentation defined by the minterm predicates generated by Pr.
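The fragmentation F[Pr] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the lecture: the sample tuples, the dictionary row layout (with `no` standing for the # column), and the helper name `minterm_fragments` are all made up for the example. Each minterm is one truth assignment over the simple predicates in Pr; minterms that match no tuple produce no fragment.

```python
# Hypothetical sample of E(#, NM, LOC, SAL); the values are made up.
E = [
    {"no": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
    {"no": 8, "nm": "Tom", "loc": "Sa", "sal": 5},
    {"no": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
    {"no": 12, "nm": "Fred", "loc": "Sb", "sal": 8},
]

def minterm_fragments(relation, predicates):
    """Group tuples by the truth values of the simple predicates.

    `predicates` maps a label to a boolean function of a tuple. Each distinct
    truth assignment is one minterm; only non-empty fragments are returned.
    """
    labels = list(predicates)
    fragments = {}
    for row in relation:
        signature = tuple(predicates[p](row) for p in labels)
        fragments.setdefault(signature, []).append(row)
    return fragments

pr2 = {"LOC=Sa": lambda t: t["loc"] == "Sa", "LOC=Sb": lambda t: t["loc"] == "Sb"}
pr3 = {**pr2, "SAL<10": lambda t: t["sal"] < 10}

f1 = minterm_fragments(E, {})    # Pr = { }: a single fragment, all of E
f2 = minterm_fragments(E, pr2)   # one fragment per location
f3 = minterm_fragments(E, pr3)   # one fragment per location and salary range
print(len(f1), len(f2), len(f3))  # 1 2 4
```

With this sample data the three choices of Pr yield one, two, and four fragments respectively.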
Back to example, F1 (one fragment holding all four groups):
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
Within F1, some tuples have a higher probability of access and others a lower probability of access,
so F1 is not “good”...

Back to example, F2 (one fragment per Loc value):
Qa: Select … loc = Sa ...
Qb: Select … loc = Sb ...
Within each fragment of F2, tuples have the same probability of access,
so F2 is “good”...
so is F3 ...
Informal definition
Set of predicates Pr is minimal if no Pr’ ⊂ Pr is uniform
Back to example: is Pr uniform and minimal?
(1) Pr = { }: N (not uniform)
(2) Pr = {LOC=Sa, LOC=Sb}: Y
(3) Pr = {LOC=Sa, LOC=Sb, Sal<10}: N (uniform, but not minimal)
Pr(2) is a uniform subset of Pr(3), so Pr(3) is not minimal...
Is Pr uniform and minimal a good thing?
Not necessarily! But it does simplify allocation problem...
Derived horizontal fragmentation
E(ENO, NAME, SAL, LOC)   (Owner)
J(ENO, DESCRIPTION, …)   (Member)
Common query: Given an employee name, list the projects (s)he works in
E is fragmented as F = {E1, E2} by LOC
E1 (at Sa)
#   NM     Loc  Sal
5   Joe    Sa   10
8   Tom    Sa   15
…

E2 (at Sb)
#   NM     Loc  Sal
7   Sally  Sb   25
12  Fred   Sb   15
…

J
#   Description
5   work on 347 hw
7   go to moon
5   build table
12  rest
…

J1 = J ⋉ E1
#   Des
5   work on 347 hw
5   build table
…

J2 = J ⋉ E2
#   Des
7   go to moon
12  rest
…
Derived horizontal fragmentation
R, F = {F1, F2, ..., Fn}
S, D = {D1, D2, …, Dn} where Di = S ⋉ Fi
Convention: R is owner, S is member
F could be primary or derived
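The definition Di = S ⋉ Fi can be tried out on the E/J example with a few lines of Python. This is only a sketch: the list-of-dicts representation, the column name `no` for #, and the helper name `semijoin` are assumptions for illustration, and the rows elided by "…" in the tables are simply left out.

```python
def semijoin(member, owner_fragment, attr):
    """Return the member tuples whose `attr` value appears in owner_fragment."""
    keys = {row[attr] for row in owner_fragment}
    return [row for row in member if row[attr] in keys]

# Owner fragments E1 (at Sa) and E2 (at Sb); "no" stands for the # column.
E1 = [{"no": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
      {"no": 8, "nm": "Tom", "loc": "Sa", "sal": 15}]
E2 = [{"no": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
      {"no": 12, "nm": "Fred", "loc": "Sb", "sal": 15}]

# Member relation J.
J = [{"no": 5, "des": "work on 347 hw"},
     {"no": 7, "des": "go to moon"},
     {"no": 5, "des": "build table"},
     {"no": 12, "des": "rest"}]

J1 = semijoin(J, E1, "no")   # J ⋉ E1
J2 = semijoin(J, E2, "no")   # J ⋉ E2
print([t["des"] for t in J1])  # ['work on 347 hw', 'build table']
print([t["des"] for t in J2])  # ['go to moon', 'rest']
```

Each J tuple follows the employee it refers to, so a query that joins an employee with his or her projects can run at a single site.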
Checking completeness and disjointness of derived fragmentation
Example: Say J is:
#   Des
…
33  build chair
…
But no # = 33 in E1 nor in E2!
This J tuple will not be in J1 nor J2
Fragmentation not complete
To get completeness: Need to enforce referential integrity constraint:
every join attr(#) value of the member relation must also appear as a join attr(#) value of the owner relation
Example:
E1
#   NM    Loc  Sal
5   Joe   Sa   10
…

E2
#   NM    Loc  Sal
5   Fred  Sb   20
…

J
#   Description
5   day off
…

J1
#   Description
5   day off
…

J2
#   Description
5   day off
…

Fragmentation is not disjoint!
To get disjointness: Join attribute(#) should be key of owner relation
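Both checks can be written down directly. The sketch below is an illustration under assumptions, not part of the lecture: relations are lists of dicts, `no` stands for the # column, and `check_derived` is a hypothetical helper that counts, for every member tuple, how many derived fragments it lands in.

```python
def semijoin(member, owner_fragment, attr):
    """Member tuples whose `attr` value appears in the owner fragment."""
    keys = {row[attr] for row in owner_fragment}
    return [row for row in member if row[attr] in keys]

def check_derived(member, owner_fragments, attr):
    """Return (complete, disjoint) for the derived fragments member ⋉ Fi."""
    derived = [semijoin(member, f, attr) for f in owner_fragments]
    hits = [sum(row in d for d in derived) for row in member]
    complete = all(n >= 1 for n in hits)   # every member tuple lands somewhere
    disjoint = all(n <= 1 for n in hits)   # no member tuple lands twice
    return complete, disjoint

# Case 1: J holds # = 33, which appears in no fragment of E.
E1 = [{"no": 5}, {"no": 8}]
E2 = [{"no": 7}, {"no": 12}]
J = [{"no": 5, "des": "build table"}, {"no": 33, "des": "build chair"}]
case1 = check_derived(J, [E1, E2], "no")
print(case1)   # (False, True)

# Case 2: # = 5 appears in both E1 and E2, so # is not a key of E.
E1b = [{"no": 5, "nm": "Joe", "loc": "Sa"}]
E2b = [{"no": 5, "nm": "Fred", "loc": "Sb"}]
Jb = [{"no": 5, "des": "day off"}]
case2 = check_derived(Jb, [E1b, E2b], "no")
print(case2)   # (True, False)
```

Case 1 violates referential integrity and loses completeness; case 2 violates the key requirement on the owner and loses disjointness.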
Summary: horizontal fragmentation
• Type: primary, derived
• Properties: completeness, disjointness
• Predicates: minimal, uniform
Vertical fragmentation
Example:
E
#   NM     Loc  Sal
5   Joe    Sa   10
7   Sally  Sb   25
8   Fred   Sa   15
…

E1
#   NM     Loc
5   Joe    Sa
7   Sally  Sb
8   Fred   Sa
…

E2
#   Sal
5   10
7   25
8   15
…
R[T] ⇒ R1[T1], ..., Rn[Tn], with Ti ⊆ T
Just like normalization of relations
Properties: R[T] ⇒ Ri[Ti]
(1) Completeness
    ∪_{all i} Ti = T
(2) Disjointness
    Ti ∩ Tj = Ø for all i, j with i ≠ j
    Example: E(#, LOC, SAL) ⇒ E1(#, LOC), E2(SAL)
    Not a desirable property!! (could not reconstruct R!)
(3) Reconstruction: Lossless join
    ⋈_{all i} Ri = R
    One way to achieve lossless join: Repeat key in all fragments, i.e.,
    Key ⊆ Ti for all i
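The difference between repeating the key and using disjoint attribute sets can be seen on the E example. The following Python sketch is illustrative only: the helper names `project`, `natural_join`, and `same`, the list-of-dicts layout, and the column name `no` for # are assumptions, and set semantics (no duplicate tuples) is assumed throughout.

```python
def project(relation, attrs):
    """Projection with duplicate elimination (set semantics)."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Natural join on shared attribute names (a cross product if none are shared)."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

def same(a, b):
    """Compare two relations as sets of tuples."""
    canon = lambda rel: {tuple(sorted(t.items())) for t in rel}
    return canon(a) == canon(b)

E = [{"no": 5, "nm": "Joe", "loc": "Sa", "sal": 10},
     {"no": 7, "nm": "Sally", "loc": "Sb", "sal": 25},
     {"no": 8, "nm": "Fred", "loc": "Sa", "sal": 15}]

# Key "no" repeated in both fragments: the join reconstructs E.
E1, E2 = project(E, ["no", "nm", "loc"]), project(E, ["no", "sal"])
lossless = same(natural_join(E1, E2), E)
print(lossless)        # True

# Disjoint attribute sets (no shared key): the join degenerates to a cross product.
D1, D2 = project(E, ["no", "loc"]), project(E, ["sal"])
crossed = natural_join(D1, D2)
print(len(crossed))    # 9
```

With the key repeated, the three original tuples come back exactly; with disjoint fragments the join returns nine tuples, six of which were never in E.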
Hybrid Fragmentation
R is fragmented horizontally into R1 and R2;
each of these is then fragmented vertically: R1 into R11, R12 and R2 into R21, R22.
Hybrid Fragmentation -- Reconstruction
R = (R11 ⋈ R12) ∪ (R21 ⋈ R22)
(join the vertical fragments, then union the horizontal fragments)
Allocation
Example: E(#, NM, LOC, SAL)
F1 = σ_{loc=Sa}(E); F2 = σ_{loc=Sb}(E)
Qa: select … where loc=Sa...
Qb: select … where loc=Sb…
Sites: Site a, Site b
Where do F1, F2 go?
Issues
• Where do queries originate?
• What is communication cost? and size of answers, relations,…
• What is storage capacity, cost at sites? and size of fragments?
• What is processing power at sites?
• What is query processing strategy?
  – How are joins done?
  – Where are answers collected?
Do we replicate fragments?
• Cost of updating copies?
• Writes and concurrency control?
• ...
Optimization problem:
• What is best placement of fragments and/or best number of copies to:
  – minimize query response time
  – maximize throughput
  – minimize “some cost”
  – ...
• Subject to constraints?
  – Available storage
  – Available bandwidth, power,…
  – Keep 90% of response time below X
  – ...
• Often, can use common sense
  – Place fragments where they are most heavily accessed
This is an incredibly hard problem
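The common-sense rule of placing each fragment where it is most heavily accessed can be sketched as a greedy heuristic. This is not a solution to the optimization problem and not an algorithm from the lecture: the access counts are made up, the function name `place_fragments` is hypothetical, and storage limits, update costs, and replication are all ignored.

```python
def place_fragments(access_counts):
    """Map each fragment to the site that accesses it most often.

    `access_counts[fragment][site]` is a hypothetical number of accesses per
    unit time. Ties go to the site name that sorts first.
    """
    placement = {}
    for fragment, by_site in access_counts.items():
        placement[fragment] = min(by_site, key=lambda s: (-by_site[s], s))
    return placement

# Made-up workload: Qa runs mostly at site a, Qb mostly at site b.
counts = {
    "F1": {"a": 90, "b": 10},   # F1 holds the tuples with loc=Sa
    "F2": {"a": 5, "b": 70},    # F2 holds the tuples with loc=Sb
}
placement = place_fragments(counts)
print(placement)   # {'F1': 'a', 'F2': 'b'}
```

A real allocator would also weigh the issues listed earlier, such as communication cost, storage capacity, and the cost of keeping copies up to date.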
Summary
• Horizontal and vertical fragmentation
• Designing good fragmentations and allocation
Next:
• Query processing in distributed databases
Query
  ↓ (1) Decomposition
Algebraic query tree on relations
  ↓ (2) Localization
Algebraic query tree on relation fragments
  ↓ (3) Optimization
Query Plan
Decomposition
• Same as in centralized system
  – Normalization
  – Eliminating redundancy
  – Algebraic rewriting
Normalization
• Convert from query language to relational algebra
Example
SELECT R.A, S.D
FROM R, S
WHERE (R.B=1 and S.C=2) and (R.A = S.A)

π_{R.A, S.D}( σ_{R.B=1 ∧ S.C=2}( R ⋈_{R.A=S.A} S ) )
Eliminate redundancy
E.g.: in conditions:
(S.A=1) ∧ (S.A>5) ⇒ False
(S.A<10) ∧ (S.A<5) ⇒ S.A<5
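E.g.: Common sub-expressions
[Figure: a query tree over R, S, and T in which σ_cond(R) appears twice under a union is rewritten into a tree in which σ_cond(R) appears only once.]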
Algebraic rewriting
E.g.: Push conditions down
σ_cond( R ⋈ S ) ⇒ σ_cond3( σ_cond1(R) ⋈ σ_cond2(S) )
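That pushing conditions down preserves the result can be checked on a small instance. The sketch below reuses the conditions of the earlier SQL example (R.B=1, S.C=2, R.A=S.A); the sample tuples and the helpers `select` and `join` are assumptions made for illustration. In this instance the whole condition splits into a part on R and a part on S, so nothing remains to apply after the join.

```python
def select(relation, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in relation if pred(t)]

def join(r, s, attr):
    """Equi-join of r and s on `attr`; other attribute names are assumed distinct."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]

# Made-up instances of R(A, B) and S(A, C, D).
R = [{"A": 1, "B": 1}, {"A": 2, "B": 1}, {"A": 3, "B": 0}]
S = [{"A": 1, "C": 2, "D": "x"}, {"A": 2, "C": 9, "D": "y"}, {"A": 3, "C": 2, "D": "z"}]

# Plan 1: join first, then apply the whole condition.
late = select(join(R, S, "A"), lambda t: t["B"] == 1 and t["C"] == 2)

# Plan 2: push B=1 down to R and C=2 down to S, then join the smaller inputs.
early = join(select(R, lambda t: t["B"] == 1),
             select(S, lambda t: t["C"] == 2), "A")

print(late == early)   # True
print(early)           # [{'A': 1, 'B': 1, 'C': 2, 'D': 'x'}]
```

Plan 2 joins two tuples with two tuples instead of three with three; in a distributed setting the smaller inputs also mean less data shipped between sites.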
Query
  ↓ (1) Decomposition
Algebraic query tree on relations
  ↓ (2) Localization
Algebraic query tree on relation fragments
Localization steps
(1) Start with query tree
(2) Replace relations by fragments
(3) Push ∪ up; push π, σ down
(4) Simplify – eliminate unnecessary operations
Notation for fragment
[R: cond]
R is the fragment; cond is the conditions its tuples satisfy
Example A
(1) σ_{E=3}(R)
(2) σ_{E=3}( [R1: E < 10] ∪ [R2: E ≥ 10] )
(3) σ_{E=3}[R1: E < 10] ∪ σ_{E=3}[R2: E ≥ 10]
    The second term is Ø
(4) σ_{E=3}[R1: E < 10]
Rule 1
(A) σ_{c1}[R: c2] ⇒ σ_{c1}[R: c1 ∧ c2]
(B) [R: False] ⇒ Ø
In example A:
σ_{E=3}[R2: E ≥ 10] ⇒ σ_{E=3}[R2: E=3 ∧ E ≥ 10]
                    ⇒ σ_{E=3}[R2: False]
                    ⇒ Ø
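Rule 1 amounts to a satisfiability test on the combined condition. A minimal sketch for the special case used in Example A follows; it assumes, for illustration only, that every fragment condition is a half-open range on attribute E and that the selection is an equality. The function name `relevant_fragments` is hypothetical.

```python
import math

# Fragment conditions on attribute E as half-open ranges [low, high).
fragments = {
    "R1": (-math.inf, 10),   # [R1: E < 10]
    "R2": (10, math.inf),    # [R2: E >= 10]
}

def relevant_fragments(fragments, value):
    """Fragments whose condition can hold together with E = value.

    A fragment is dropped when (E = value) AND (its condition) is False,
    which is Rule 1 applied to an equality selection.
    """
    return [name for name, (low, high) in fragments.items() if low <= value < high]

print(relevant_fragments(fragments, 3))    # ['R1']
print(relevant_fragments(fragments, 10))   # ['R2']
```

For E=3 only R1 survives, matching step (4) of Example A.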
Example B
(1) R ⋈_A S, where A = common attribute
(2) ( [R1: A<5] ∪ [R2: 5 ≤ A ≤ 10] ∪ [R3: A>10] ) ⋈_A ( [S1: A<5] ∪ [S2: A ≥ 5] )
(3) ( [R1: A<5] ⋈_A [S1: A<5] ) ∪ ( [R1: A<5] ⋈_A [S2: A≥5] )
    ∪ ( [R2: 5≤A≤10] ⋈_A [S1: A<5] ) ∪ ( [R2: 5≤A≤10] ⋈_A [S2: A≥5] )
    ∪ ( [R3: A>10] ⋈_A [S1: A<5] ) ∪ ( [R3: A>10] ⋈_A [S2: A≥5] )
(4) ( [R1: A<5] ⋈_A [S1: A<5] ) ∪ ( [R2: 5≤A≤10] ⋈_A [S2: A≥5] ) ∪ ( [R3: A>10] ⋈_A [S2: A≥5] )
Rule 2
[R: C1] ⋈_A [S: C2] ⇒ [R ⋈_A S: C1 ∧ C2 ∧ R.A = S.A]
In step 4 of Example B:
[R1: A<5] ⋈_A [S2: A ≥ 5]
⇒ [R1 ⋈_A S2: R1.A < 5 ∧ S2.A ≥ 5 ∧ R1.A = S2.A]
⇒ [R1 ⋈_A S2: False]
⇒ Ø
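The pruning of Example B can be reproduced by testing which pairs of fragment conditions can hold for the same value of A. The sketch below is an illustration, not the lecture's procedure: it assumes A is integer-valued so that every condition can be written as a closed range, and the function name `surviving_joins` is hypothetical.

```python
import math

# Closed ranges [low, high] on the join attribute A, assuming A is an integer
# so that A < 5 can be written as A <= 4 and A > 10 as A >= 11.
R = {"R1": (-math.inf, 4), "R2": (5, 10), "R3": (11, math.inf)}
S = {"S1": (-math.inf, 4), "S2": (5, math.inf)}

def surviving_joins(r_frags, s_frags):
    """Fragment pairs whose conditions can both hold with R.A = S.A."""
    pairs = []
    for r_name, (r_lo, r_hi) in r_frags.items():
        for s_name, (s_lo, s_hi) in s_frags.items():
            if max(r_lo, s_lo) <= min(r_hi, s_hi):   # the two ranges overlap
                pairs.append((r_name, s_name))
    return pairs

pairs = surviving_joins(R, S)
print(pairs)   # [('R1', 'S1'), ('R2', 'S2'), ('R3', 'S2')]
```

Three of the six fragment joins have contradictory conditions and are dropped, which leaves the three joins of step (4).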
Localization with derived fragmentation
Example C
(2) ( [R1: A<10] ∪ [R2: A≥10] ) ⋈_K ( [S1: K=R.K ∧ R.A<10] ∪ [S2: K=R.K ∧ R.A≥10] )
(3) ( [R1] ⋈_K [S1] ) ∪ ( [R1] ⋈_K [S2] ) ∪ ( [R2] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )
(4) ( [R1] ⋈_K [S1] ) ∪ ( [R2] ⋈_K [S2] )
In step 3 of Example C:
[R1: A<10] ⋈_K [S2: K=R.K ∧ R.A≥10]
⇒ [R1 ⋈_K S2: R1.A<10 ∧ S2.K=R.K ∧ R.A≥10 ∧ R1.K=S2.K]
⇒ [R1 ⋈_K S2: False] (K is key of R, R1)
⇒ Ø
Localization with vertical fragmentation
Example D
(1) π_A(R), with R fragmented into R1(K, A, B) and R2(K, C, D)
(2) π_A( R1(K, A, B) ⋈_K R2(K, C, D) )
(3) π_A( π_{K,A}(R1(K, A, B)) ⋈_K π_{K,A}(R2(K, C, D)) )
    (the R2 side is not really needed)
(4) π_A( R1(K, A, B) )
Rule 3
• Given vertical fragmentation of R:
  Ri = π_{Ai}(R), Ai ⊆ A
• Then for any B ⊆ A:
  π_B(R) = π_B[ ⋈_i Ri | B ∩ Ai ≠ Ø ]
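Rule 3 picks the vertical fragments to join by intersecting attribute sets. A minimal sketch on the schema of Example D follows; the dictionary representation of the fragment schemas and the function name `fragments_for_projection` are assumptions for illustration.

```python
def fragments_for_projection(fragment_attrs, wanted):
    """Names of the vertical fragments Ri whose attributes Ai intersect `wanted`."""
    wanted = set(wanted)
    return [name for name, attrs in fragment_attrs.items() if wanted & set(attrs)]

# Example D: R1(K, A, B), R2(K, C, D)
schema = {"R1": ["K", "A", "B"], "R2": ["K", "C", "D"]}
print(fragments_for_projection(schema, ["A"]))        # ['R1']
print(fragments_for_projection(schema, ["A", "D"]))   # ['R1', 'R2']
```

Because the key K is repeated in every fragment, a projection that includes K selects all fragments under this test. That is still correct but not minimal, so an optimizer may prune further.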
Localization with hybrid fragmentation
Example E
R1 = σ_{k<5}[ π_{k,A}(R) ]
R2 = σ_{k≥5}[ π_{k,A}(R) ]
R3 = π_{k,B}(R)
Query: π_A( σ_{k=3}(R) )
Reduced query: π_A( σ_{k=3}(R1) )
Distributed Query Processing
• Decomposition
• Localization
• Optimization (next)