Lecture 28: Size/Cost Estimation, Recovery
-
Upload
paul-fleming -
Category
Documents
-
view
21 -
download
0
description
Transcript of Lecture 28: Size/Cost Estimation, Recovery
![Page 1: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/1.jpg)
1
Lecture 28:Size/Cost Estimation, Recovery
Monday, December 6, 2004
![Page 2: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/2.jpg)
2
Outline
• Cost estimation: 16.4
• Recovery using undo logging 17.2
![Page 3: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/3.jpg)
3
Size Estimation
The problem: Given an expression E, compute T(E) and V(E, A)
• This is hard without computing E
• Will ‘estimate’ them instead
![Page 4: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/4.jpg)
4
Size Estimation
Estimating the size of a projection
• Easy: T(L(R)) = T(R)
• This is because a projection doesn’t eliminate duplicates
![Page 5: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/5.jpg)
5
Size Estimation
Estimating the size of a selection• S = A=c(R)
– T(S) san be anything from 0 to T(R) – V(R,A) + 1– Estimate: T(S) = T(R)/V(R,A)– When V(R,A) is not available, estimate T(S) = T(R)/10
• S = A<c(R)– T(S) can be anything from 0 to T(R)– Estimate: T(S) = (c - Low(R, A))/(High(R,A) - Low(R,A))– When Low, High unavailable, estimate T(S) = T(R)/3
![Page 6: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/6.jpg)
6
Size Estimation
Estimating the size of a natural join, R ||A S
• When the set of A values are disjoint, then T(R ||A S) = 0
• When A is a key in S and a foreign key in R, then T(R ||A S) = T(R)
• When A has a unique value, the same in R and S, then T(R ||A S) = T(R) T(S)
![Page 7: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/7.jpg)
7
Size Estimation
Assumptions:
• Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S– Note: this indeed holds when A is a foreign key in R, and a key in
S
• Preservation of values: for any other attribute B, V(R || A S, B) = V(R, B) (or V(S, B))
![Page 8: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/8.jpg)
8
Size Estimation
Assume V(R,A) <= V(S,A)
• Then each tuple t in R joins some tuple(s) in S– How many ?
– On average T(S)/V(S,A)
– t will contribute T(S)/V(S,A) tuples in R ||A S
• Hence T(R ||A S) = T(R) T(S) / V(S,A)
In general: T(R ||A S) = T(R) T(S) / max(V(R,A),V(S,A))
![Page 9: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/9.jpg)
9
Size Estimation
Example:
• T(R) = 10000, T(S) = 20000
• V(R,A) = 100, V(S,A) = 200
• How large is R || A S ?
Answer: T(R ||A S) = 10000 20000/200 = 1M
![Page 10: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/10.jpg)
10
Size Estimation
Joins on more than one attribute:
• T(R ||A,B S) =
T(R) T(S)/(max(V(R,A),V(S,A))*max(V(R,B),V(S,B)))
![Page 11: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/11.jpg)
11
Histograms
• Statistics on data maintained by the RDBMS
• Makes size estimation much more accurate (hence, cost estimations are more accurate)
![Page 12: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/12.jpg)
12
Histograms
Employee(ssn, name, salary, phone)• Maintain a histogram on salary:
• T(Employee) = 25000, but now we know the distribution
Salary: 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k
Tuples 200 800 5000 12000 6500 500
![Page 13: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/13.jpg)
13
Histograms
Ranks(rankName, salary)
• Estimate the size of Employee || Salary Ranks
Employee 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k
200 800 5000 12000 6500 500
Ranks 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k
8 20 40 80 100 2
![Page 14: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/14.jpg)
14
Histograms
• Eqwidth
• Eqdepth
0..20 20..40 40..60 60..80 80..100
2 104 9739 152 3
0..44 44..48 48..50 50..56 55..100
2000 2000 2000 2000 2000
![Page 15: Lecture 28: Size/Cost Estimation, Recovery](https://reader036.fdocuments.us/reader036/viewer/2022083004/56813500550346895d9c4b1e/html5/thumbnails/15.jpg)
15
Histograms
• Assume:– V(Employee, Salary) = 200– V(Ranks, Salary) = 250
• Main property:
Employee ⋈Salary Ranks=Employee1 ⋈Salary Ranks1’ ... Emplyee6 ⋈Salary Ranks6’
• A tuple t in Employee1 joins with how many tuples in Ranks1’ ?– Answer: with T(Rank1)/T(Rank) * T(Rank)/250 = T1 ‘/ 250
• Then T(Employee ⋈Salary Ranks) == i=1,6 Ti Ti’ / 250= (200x8 + 800x20 + 5000x40 + 12000x80 + 6500x100 + 500x2)/250= ….