Optimal aggregation algorithms for middleware Ronald Fagin, Amnon Lotem, and Moni Naor


Page 1: Optimal aggregation algorithms for middleware

Ronald Fagin, Amnon Lotem, and Moni Naor

Presented by Suresh Barukula (2011csz8090)

Page 2: What is top-k query processing?

Top-k query processing means finding the k objects that have the highest overall grades.

• A query in a multimedia database combines different graded attributes through an aggregation function.
• The overall grade of each object is calculated using the aggregation function, and the top k objects are returned.

Page 3: Why is it important?

• In general, multimedia databases contain fuzzy data.
• For example, suppose we want to retrieve all red objects. What can we say about the object below? Is it red or not?
• We cannot say whether it is red or not, but we can grade it by its amount of redness.
• Attribute values are typically graded in [0,1].

Page 4: What are the ways?

• FA: Fagin’s Algorithm
• TA: Threshold Algorithm
• TAZ Algorithm
• NRA: No Random Access
• CA: Combined Algorithm

Page 5: The database model

• N: number of objects
• m: number of attributes
• xi ∈ [0,1]
• The database consists of m sorted lists L1…Lm, each of length N. We may refer to Li as list i.
• Each entry of Li is of the form (R, xi), where xi is the ith field of object R.
• Each list Li is sorted in descending order by the xi value.
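The model above can be sketched in a few lines of Python. This is only an illustration: the function name `build_sorted_lists` and the dictionary layout are assumptions, and the sample grades are the ones used in the example slides of this deck.

```python
def build_sorted_lists(grades):
    """Turn {object_id: (x1, ..., xm)} into m lists of (object_id, xi),
    each sorted in descending order of xi."""
    m = len(next(iter(grades.values())))  # number of attributes
    return [
        sorted(((obj, g[i]) for obj, g in grades.items()),
               key=lambda entry: entry[1], reverse=True)
        for i in range(m)
    ]

# N = 4 objects, m = 2 attributes (grades from the deck's running example).
grades = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}
lists = build_sorted_lists(grades)
```

Here `lists[0]` plays the role of L1 and `lists[1]` the role of L2.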

Page 6: Modes of accessing the database

• Sorted access
• Random access

The middleware cost is s·CS + r·CR, where s is the number of sorted accesses, r is the number of random accesses, CS is the cost of one sorted access, and CR is the cost of one random access.
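As a small worked illustration of the cost formula, the following Python sketch evaluates s·CS + r·CR. The function name `middleware_cost` and the numbers in the example call are hypothetical, chosen only to show the arithmetic.

```python
def middleware_cost(s, r, c_s, c_r):
    """Middleware cost: s sorted accesses at cost c_s each
    plus r random accesses at cost c_r each."""
    return s * c_s + r * c_r

# Hypothetical run: 6 sorted accesses, 4 random accesses,
# with a random access assumed to be 5 times as expensive as a sorted one.
cost = middleware_cost(s=6, r=4, c_s=1, c_r=5)  # 6*1 + 4*5 = 26
```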

Page 7: Example – Simple Database model

ObjectID   Attribute 1   Attribute 2
a          0.9           0.85
b          0.8           0.7
c          0.72          0.2
d          0.6           0.9
...        ...           ...

(N rows, M attribute columns)

Sorted L1     Sorted L2
(a, 0.9)      (d, 0.9)
(b, 0.8)      (a, 0.85)
(c, 0.72)     (b, 0.7)
(d, 0.6)      (c, 0.2)
...           ...

Page 8: Example – Simple Query

Find the top 2 (k = 2) objects for the following ‘query’ executed on the middleware:

A1 & A2 (e.g., color = red & shape = round)

Submitting A1 & A2 as a ‘query’ to the middleware results in combining the grades of A1 and A2 by min(A1,A2).
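For reference, the query can be answered naively by grading every object and keeping the two best. The Python sketch below does exactly that on the example grades; it is a baseline that reads the whole database, not one of the algorithms from the paper, and the name `naive_top_k` is an assumption.

```python
import heapq

# Grades (A1, A2) of the example objects.
grades = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}

def naive_top_k(grades, k, aggregate=min):
    """Compute the overall grade of every object, then keep the k highest."""
    overall = {obj: aggregate(g) for obj, g in grades.items()}
    return heapq.nlargest(k, overall.items(), key=lambda item: item[1])

top2 = naive_top_k(grades, k=2)  # [("a", 0.85), ("b", 0.7)]
```

The algorithms on the following slides aim to reach the same answer while reading only part of each list.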

Page 9: Example – Fagin’s Algorithm

STEP 1: Read attributes from every sorted list
• Stop when k objects have been seen in common from all lists

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   -
d    -      0.9    -
b    0.8    0.7    -
c    0.72   -      -

Page 10: Example – Fagin’s Algorithm

STEP 2: Random access to find missing grades

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   -
d    0.6    0.9    -
b    0.8    0.7    -
c    0.72   0.2    -

Page 11: Example – Fagin’s Algorithm

STEP 3
• Compute the grades of the seen objects.
• Return the k highest graded objects.

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
c    0.72   0.2    0.2

The k = 2 highest graded objects are a (0.85) and b (0.7).
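The three steps of the example can be sketched in Python as follows. This is a minimal illustration, assuming in-memory lists of equal, non-zero length; the name `fagin_algorithm` and the dictionary lookups standing in for random access are assumptions.

```python
def fagin_algorithm(lists, k, aggregate=min):
    """Sketch of FA on in-memory lists of (object_id, grade) pairs."""
    m = len(lists)
    index = [dict(lst) for lst in lists]  # stands in for random access
    seen = {}                             # object -> set of lists it was seen in
    depth = 0
    # Step 1: sorted access in parallel until k objects were seen in all lists.
    while depth < len(lists[0]):
        for i, lst in enumerate(lists):
            obj, _ = lst[depth]
            seen.setdefault(obj, set()).add(i)
        depth += 1
        if sum(1 for where in seen.values() if len(where) == m) >= k:
            break
    # Step 2: random access fills in every missing grade of each seen object.
    # Step 3: compute overall grades and return the k highest.
    overall = {obj: aggregate(index[i][obj] for i in range(m)) for obj in seen}
    ranked = sorted(overall.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], depth

lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)],   # L1
    [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)],   # L2
]
result, depth = fagin_algorithm(lists, k=2)
```

On the example lists the sketch stops at depth 3, where a and b have both been seen in L1 and L2, and returns a and b.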

Page 12: Threshold Algorithm (TA)

• Read all grades of an object as soon as it is seen under sorted access.
• There is no need to wait until the lists give k common objects.
• Do sorted access (and the corresponding random accesses) until you have seen the top k answers.
• How do we know that the grades of seen objects are higher than the grades of unseen objects?
• Predict the maximum possible grade of unseen objects.

Seen under sorted access:
L1: a: 0.9, b: 0.8, c: 0.72
L2: d: 0.9, a: 0.85, b: 0.7

Possibly unseen (further down the lists): d: 0.6 in L1, c: 0.2 in L2, and an object f with grades 0.65 and 0.6

Threshold value: T = min(0.72, 0.7) = 0.7
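The threshold is the aggregation function applied to the last grade seen under sorted access in each list. A small Python sketch, assuming in-memory lists and a 1-based depth (the name `threshold` is hypothetical):

```python
def threshold(lists, depth, aggregate=min):
    """Aggregate the grades at position `depth` (1-based) of every list.
    No unseen object can have an overall grade above this value
    when the aggregation function is monotone."""
    return aggregate(lst[depth - 1][1] for lst in lists)

lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)],   # L1
    [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)],   # L2
]
t3 = threshold(lists, depth=3)  # min(0.72, 0.7) = 0.7
```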

Page 13: Example – Threshold Algorithm

Step 1:
- parallel sorted access to each list

For each object seen:
- get all its grades by random access
- determine Min(A1,A2)
- if it is among the 2 highest seen, keep it in the buffer

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6

Page 14: Example – Threshold Algorithm

Step 2:
- Determine the threshold value from the last grades seen under sorted access in each list: T = min(L1, L2)
- If 2 objects have an overall grade ≥ the threshold value, stop; otherwise go to the next entry position in the sorted lists and repeat step 1

L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6

T = min(0.9, 0.9) = 0.9

Page 15: Example – Threshold Algorithm

Step 1 (Again):
- parallel sorted access to each list

For each object seen:
- get all its grades by random access
- determine Min(A1,A2)
- if it is among the 2 highest seen, keep it in the buffer

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7

Page 16: Example – Threshold Algorithm

Step 2 (Again):
- Determine the threshold value from the last grades seen under sorted access in each list: T = min(L1, L2)
- If 2 objects have an overall grade ≥ the threshold value, stop; otherwise go to the next entry position in the sorted lists and repeat step 1

L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7

T = min(0.8, 0.85) = 0.8

Page 17: Example – Threshold Algorithm

Situation at stopping condition

L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...

ID   A1     A2     Min(A1,A2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7

T = min(0.72, 0.7) = 0.7
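The walkthrough on the preceding slides can be reproduced with the Python sketch below. It is an illustration under stated assumptions rather than the paper's formulation: the lists are in memory, non-empty and of equal length, dictionary lookups stand in for random access, and the name `threshold_algorithm` is hypothetical.

```python
def threshold_algorithm(lists, k, aggregate=min):
    """Sketch of TA on in-memory lists of (object_id, grade) pairs."""
    m = len(lists)
    index = [dict(lst) for lst in lists]  # stands in for random access
    top = {}                              # bounded buffer: object -> overall grade
    for depth in range(len(lists[0])):
        # Step 1: sorted access to each list, then random access for all grades.
        for lst in lists:
            obj, _ = lst[depth]
            if obj not in top:
                top[obj] = aggregate(index[i][obj] for i in range(m))
                if len(top) > k:          # keep only the k highest seen so far
                    del top[min(top, key=top.get)]
        # Step 2: threshold from the last grades seen under sorted access.
        t = aggregate(lst[depth][1] for lst in lists)
        if len(top) >= k and all(grade >= t for grade in top.values()):
            break
    ranked = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranked, depth + 1

lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)],   # L1
    [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)],   # L2
]
result, depth = threshold_algorithm(lists, k=2)
```

On the example lists the thresholds are 0.9, 0.8 and 0.7 at depths 1, 2 and 3, and the sketch stops at depth 3 with a (0.85) and b (0.7) in the buffer. The buffer never holds more than k objects after an update, which reflects the bounded-buffer property of TA.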

Page 18: TA vs FA

• The middleware cost of FA is the same no matter what the aggregation function is.
• TA stops at least as early as FA.
• TA may perform more random accesses than FA.
• TA requires only bounded buffers.
• TA can be stopped early (θ-approximation).

Page 19: Concept of instance optimality

• 𝐀 = class of algorithms; A ∈ 𝐀 represents an algorithm
• 𝐃 = class of legal inputs to algorithms (databases); D ∈ 𝐃 represents a database
• Cost(A, D) = middleware cost when running algorithm A over database D

Algorithm B is instance optimal over 𝐀 and 𝐃 if B ∈ 𝐀 and

Cost(B, D) = O(Cost(A, D)) for all A ∈ 𝐀, D ∈ 𝐃

Which means that there are constants c and c′ such that

Cost(B, D) ≤ c · Cost(A, D) + c′ for all A ∈ 𝐀, D ∈ 𝐃

The constant c is called the optimality ratio.

Page 20: Some facts about TA

Theorem: If the aggregation function t is monotone, TA correctly finds the top k answers.

Theorem: TA is instance optimal for every monotone aggregation function, over every database (note: provided we exclude algorithms that make wild guesses).

Page 21: TAZ Algorithm (No Sorted Access)

L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...
L3 (no sorted access): (b, 0.6), (a, 0.83), (d, 0.61), (c, 0.9), ...

Since L3 cannot be read in sorted order, its contribution to the threshold is taken as 1:

T = min(0.72, 0.7, 1) = 0.7

Page 22: NRA (No Random Access)

• Can we determine the rank of an object without seeing all of its grades?
• The essence of this algorithm is estimating the rank using the best and worst possible values of each object’s overall grade.

Example lists (grades only):
L1: 1, 1/3, 1/3, 1/3, ...
L2: 1/3, 1/3, 1/3, 1/3, ...
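The best and worst possible values can be sketched as follows: a grade not yet seen is taken as 0 for the worst case and as the last grade read from that list for the best case. The Python below is an illustration only; the function name `nra_bounds`, the choice of the average as aggregation function, and the reading of the example (one object seen with grade 1 in L1, every other grade read so far equal to 1/3) are assumptions.

```python
from fractions import Fraction

def average(xs):
    return sum(xs) / len(xs)

def nra_bounds(known, bottoms, aggregate=average):
    """known: {list_index: grade} seen so far for one object.
    bottoms: last grade read under sorted access in each list.
    Returns (worst, best) possible overall grade of that object."""
    m = len(bottoms)
    worst = aggregate([known.get(i, Fraction(0)) for i in range(m)])
    best = aggregate([known.get(i, bottoms[i]) for i in range(m)])
    return worst, best

third = Fraction(1, 3)
bottoms = [third, third]  # both lists have been read down to grade 1/3

# Object seen only in L1, with grade 1; its L2 grade is still unknown.
worst_top, best_top = nra_bounds({0: Fraction(1)}, bottoms)
# Any object not seen at all can have at most the bottom grade in every list.
_, best_unseen = nra_bounds({}, bottoms)
```

Under these assumptions the first object has an overall grade of at least 1/2, while an unseen object can reach at most 1/3, so the first object can be ranked on top without ever reading its grade in L2.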

Page 23: CA (Combined Algorithm)

• CA is a merge of TA and NRA.
• The idea of CA is to run NRA, but to perform a random access step after every h steps.
• Both NRA and CA are instance optimal over all databases when the aggregation function is monotone (NRA among algorithms that make no random accesses; CA under additional assumptions given in the paper).

Page 24: Conclusion

• In this paper the authors study a simple and elegant algorithm called TA.
• They also study variants of TA for the cases where there is no sorted access, no random access, and so on.
• They emphasize instance optimality and prove that their algorithms are instance optimal over all algorithms for all databases under normal assumptions.
• However, they do not consider the computational costs or the data structures required to implement the algorithms.

Page 25: Questions?