Lecture 28: Size/Cost Estimation, Recovery


Monday, December 6, 2004


Outline

• Cost estimation: 16.4

• Recovery using undo logging: 17.2


Size Estimation

The problem: Given an expression E, compute T(E), the number of tuples in E, and V(E, A), the number of distinct values of attribute A in E

• This is hard without computing E

• We will estimate them instead


Size Estimation

Estimating the size of a projection

• Easy: T(Π_L(R)) = T(R)

• This is because a projection doesn’t eliminate duplicates


Size Estimation

Estimating the size of a selection

• S = σ_{A=c}(R)

– T(S) can be anything from 0 to T(R) - V(R,A) + 1

– Estimate: T(S) = T(R)/V(R,A)

– When V(R,A) is not available, estimate T(S) = T(R)/10

• S = σ_{A<c}(R)

– T(S) can be anything from 0 to T(R)

– Estimate: T(S) = T(R) · (c - Low(R,A))/(High(R,A) - Low(R,A))

– When Low, High are unavailable, estimate T(S) = T(R)/3
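The selection estimates above can be sketched in a few lines of Python. This is only an illustration: the function names and the clamping of the range fraction to [0, 1] are assumptions added here, and the range estimate is taken as T(R) scaled by the fraction of the interval [Low, High] that lies below c.

```python
from typing import Optional


def estimate_eq_selection(t_r: int, v_r_a: Optional[int] = None) -> float:
    """Estimate T(S) for S = sigma_{A=c}(R)."""
    if v_r_a:
        # V(R,A) is known: assume tuples are spread evenly over the distinct values.
        return t_r / v_r_a
    # V(R,A) unavailable: default guess T(R)/10.
    return t_r / 10


def estimate_lt_selection(t_r: int, c: float,
                          low: Optional[float] = None,
                          high: Optional[float] = None) -> float:
    """Estimate T(S) for S = sigma_{A<c}(R)."""
    if low is None or high is None or high == low:
        # Low/High unavailable (or degenerate): default guess T(R)/3.
        return t_r / 3
    fraction = (c - low) / (high - low)
    # Clamping is an assumption of this sketch, so that c outside [low, high]
    # still yields an estimate between 0 and T(R).
    fraction = min(1.0, max(0.0, fraction))
    return t_r * fraction
```

For instance, with T(R) = 10000 and V(R,A) = 100 the equality estimate is 100 tuples, while with no statistics at all it falls back to 1000.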


Size Estimation

Estimating the size of a natural join, R ⋈_A S

• When the sets of A values in R and S are disjoint, then T(R ⋈_A S) = 0

• When A is a key in S and a foreign key in R, then T(R ⋈_A S) = T(R)

• When A has a single value, the same in R and S, then T(R ⋈_A S) = T(R) · T(S)


Size Estimation

Assumptions:

• Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S

– Note: this indeed holds when A is a foreign key in R and a key in S

• Preservation of values: for any other attribute B, V(R ⋈_A S, B) = V(R, B) (or V(S, B))


Size Estimation

Assume V(R,A) <= V(S,A)

• Then each tuple t in R joins with some tuple(s) in S

– How many?

– On average T(S)/V(S,A)

– So t contributes T(S)/V(S,A) tuples to R ⋈_A S

• Hence T(R ⋈_A S) = T(R) · T(S) / V(S,A)

In general: T(R ⋈_A S) = T(R) · T(S) / max(V(R,A), V(S,A))


Size Estimation

Example:

• T(R) = 10000, T(S) = 20000

• V(R,A) = 100, V(S,A) = 200

• How large is R ⋈_A S?

Answer: T(R ⋈_A S) = 10000 · 20000 / 200 = 1M
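The example can be checked with a small Python sketch of the general formula, T(R) · T(S) / max(V(R,A), V(S,A)). The function name is a hypothetical choice made for this illustration.

```python
def estimate_join_size(t_r: int, t_s: int, v_r_a: int, v_s_a: int) -> float:
    """Estimate the size of a natural join of R and S on a single attribute A.

    Relies on the containment-of-values assumption: every A value of the
    relation with fewer distinct values also appears in the other relation.
    """
    return t_r * t_s / max(v_r_a, v_s_a)


# Numbers from the example: T(R) = 10000, T(S) = 20000, V(R,A) = 100, V(S,A) = 200.
example = estimate_join_size(10000, 20000, 100, 200)
print(example)  # 1000000.0
```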


Size Estimation

Joins on more than one attribute:

• T(R ⋈_{A,B} S) = T(R) · T(S) / (max(V(R,A), V(S,A)) · max(V(R,B), V(S,B)))
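The two-attribute formula extends naturally to any number of join attributes: divide T(R) · T(S) by one max(...) factor per attribute. The sketch below assumes the distinct-value counts are passed as (V(R,X), V(S,X)) pairs; the function name and the sample values for attribute B are hypothetical.

```python
from math import prod
from typing import Iterable, Tuple


def estimate_multi_join_size(t_r: int, t_s: int,
                             distinct_counts: Iterable[Tuple[int, int]]) -> float:
    """Estimate the size of a join of R and S on several attributes.

    distinct_counts holds one (V(R,X), V(S,X)) pair per join attribute X.
    Treats the attributes as independent, as the formula on the slide does.
    """
    denominator = prod(max(v_r, v_s) for v_r, v_s in distinct_counts)
    return t_r * t_s / denominator


# A: V(R,A) = 100, V(S,A) = 200 (from the earlier example).
# B: V(R,B) = 50,  V(S,B) = 20  (made-up values for illustration only).
estimate = estimate_multi_join_size(10000, 20000, [(100, 200), (50, 20)])
print(estimate)  # 20000.0
```

With a single pair the function reduces to the one-attribute estimate.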


Histograms

• Statistics on data maintained by the RDBMS

• Make size estimation much more accurate (hence cost estimates are more accurate too)


Histograms

Employee(ssn, name, salary, phone)

• Maintain a histogram on salary:

• T(Employee) = 25000, but now we know the distribution

Salary:  0..20k  20k..40k  40k..60k  60k..80k  80k..100k  > 100k
Tuples:  200     800       5000      12000     6500       500
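One way such a histogram might be used is to estimate a range selection such as salary < c. The sketch below is not from the lecture: it assumes tuples are spread uniformly inside each bucket, ignores the open-ended "> 100k" bucket because it has no upper bound to interpolate against, and uses hypothetical names throughout.

```python
# Buckets from the slide as (low, high, tuple count), salaries in thousands.
# The last bucket is open-ended, so its upper bound is None.
EMPLOYEE_SALARY_HIST = [
    (0, 20, 200), (20, 40, 800), (40, 60, 5000),
    (60, 80, 12000), (80, 100, 6500), (100, None, 500),
]


def estimate_salary_below(hist, c):
    """Estimate the number of tuples with salary < c (c in thousands)."""
    total = 0.0
    for low, high, count in hist:
        if high is None:
            # Open-ended bucket: no upper bound, so this sketch skips it.
            continue
        if c >= high:
            total += count  # the whole bucket qualifies
        elif c > low:
            # Partial bucket: linear interpolation under the uniformity assumption.
            total += count * (c - low) / (high - low)
    return total


print(estimate_salary_below(EMPLOYEE_SALARY_HIST, 50))  # 3500.0
```

For comparison, the default guess without statistics, T(R)/3, would be about 8333 tuples for the same predicate.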


Histograms

Ranks(rankName, salary)

• Estimate the size of Employee ⋈_Salary Ranks

Salary:    0..20k  20k..40k  40k..60k  60k..80k  80k..100k  > 100k
Employee:  200     800       5000      12000     6500       500
Ranks:     8       20        40        80        100        2


Histograms

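• Eqwidth: all buckets span the same range of values

Range:   0..20  20..40  40..60  60..80  80..100
Tuples:  2      104     9739    152     3

• Eqdepth: all buckets hold the same number of tuples

Range:   0..44  44..48  48..50  50..56  55..100
Tuples:  2000   2000    2000    2000    2000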
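The difference between the two kinds of histogram can be illustrated with a minimal sketch that builds both from a list of values. The function names are hypothetical, and a real RDBMS would typically build these from samples rather than a full sort. Here the eq-depth buckets are reported as (smallest value, largest value, count), which differs slightly from the contiguous boundaries shown above.

```python
def eq_width_histogram(values, low, high, n_buckets):
    """Split [low, high] into n_buckets equal ranges and count values in each."""
    width = (high - low) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # A value equal to `high` is placed in the last bucket.
        i = min(int((v - low) / width), n_buckets - 1)
        counts[i] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i])
            for i in range(n_buckets)]


def eq_depth_histogram(values, n_buckets):
    """Split the sorted values into n_buckets groups of (roughly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for i in range(n_buckets):
        chunk = ordered[i * n // n_buckets:(i + 1) * n // n_buckets]
        if chunk:
            buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


# Skewed sample data (made up): most values cluster around 50.
sample = [1, 2, 48, 49, 50, 50, 51, 52, 53, 99]
print([count for _, _, count in eq_width_histogram(sample, 0, 100, 5)])  # [2, 0, 7, 0, 1]
print(eq_depth_histogram(sample, 5))
```

On skewed data the eq-width counts are very uneven, while the eq-depth boundaries tighten around the dense region, which mirrors the two tables above.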


Histograms

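• Assume:

– V(Employee, Salary) = 200

– V(Ranks, Salary) = 250

• Main property: the join decomposes bucket by bucket, where Employee_i and Ranks_i' are the tuples of each relation that fall in bucket i:

Employee ⋈_Salary Ranks = (Employee_1 ⋈_Salary Ranks_1') ∪ ... ∪ (Employee_6 ⋈_Salary Ranks_6')

• A tuple t in Employee_1 joins with how many tuples in Ranks_1'?

– Answer: with T(Ranks_1')/T(Ranks) · T(Ranks)/250 = T_1'/250, where T_i = T(Employee_i) and T_i' = T(Ranks_i')

• Then T(Employee ⋈_Salary Ranks) = Σ_{i=1..6} T_i · T_i' / 250

= (200·8 + 800·20 + 5000·40 + 12000·80 + 6500·100 + 500·2)/250 = …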
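The sum can be evaluated with a short Python sketch. The bucket counts are the ones from the Employee and Ranks histograms; the function name is hypothetical, and the divisor 250 is max(V(Employee, Salary), V(Ranks, Salary)) under the assumptions stated above.

```python
EMPLOYEE_COUNTS = [200, 800, 5000, 12000, 6500, 500]  # T_i, one per salary bucket
RANKS_COUNTS = [8, 20, 40, 80, 100, 2]                # T_i', same buckets


def histogram_join_estimate(left_counts, right_counts, max_distinct):
    """Sum the per-bucket products T_i * T_i' and divide by the distinct-value count."""
    return sum(t * t_prime for t, t_prime in zip(left_counts, right_counts)) / max_distinct


with_histogram = histogram_join_estimate(EMPLOYEE_COUNTS, RANKS_COUNTS, 250)

# For comparison: the estimate without histograms, T(R) * T(S) / max(V(R,A), V(S,A)).
without_histogram = sum(EMPLOYEE_COUNTS) * sum(RANKS_COUNTS) / max(200, 250)

print(with_histogram)     # 7314.4
print(without_histogram)  # 25000.0
```

Under these numbers the histogram-based estimate is roughly 7314 tuples, against 25000 from the formula that ignores the distribution.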