Answering Top-k Queries Using Views

Answering Top-k Queries Using Views By Gautam Das Dimitrios Gunopulos Nick Koudas Dimitris Tsirogiannis Presented By Raju Buchi Poornima Ancha


Transcript of Answering Top-k Queries Using Views

Page 1: Answering Top-k Queries  Using Views

Answering Top-k Queries Using Views

By Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis

Presented by Raju Buchi, Poornima Ancha

Page 2: Answering Top-k Queries  Using Views

Agenda

• Introduction

• Views

• Related Work

• Preliminaries

• Problems Discussed

• Algorithm LPTA

• View Selection Problem

• Experimental Results

Page 3: Answering Top-k Queries  Using Views

Introduction

Answering Top-k Queries

• Active research topic

• Quickly retrieve the k highest-ranking tuples under a monotone ranking function defined on attributes of the underlying relations

Algorithms

• Threshold Algorithm (TA) by Fagin et al.

• Proposed independently by Guntzer et al.

• and by Nepal et al.

Page 4: Answering Top-k Queries  Using Views

Views

Materialized Views

• A database table that stores the result of a previously issued query; it is actually constructed and stored.

Problem Discussed

Find efficient methods of answering a query using a set of previously defined materialized views over the database.

Why Views?

• Relevance to a variety of data management problems.

• Promise of increased performance.

• Views are materialized (incurring a space overhead) in the hope of gaining performance on some queries.

Page 5: Answering Top-k Queries  Using Views

Views

• Views do not specify any selection conditions on the attributes they aim to rank.

• Example (top-k):

Relation R

tid   X1   X2   X3
1     82    1   59
2     53   19   83
3     29    1    2
4     80   22   90
5     28    8   87
6     12   55   82
7     16   99   42
8     18   42   67
9     42    1   23
10    23   21   88

View1 (V1): top-5 query with f1 = 2x1 + 5x2

tid   Score
7     527
6     299
4     270
8     246
2     201

View2 (V2): top-3 query with f2 = x2 + 2x3

tid   Score
6     219
4     202
10    197
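The two views in this example can be reproduced from R directly. The following is a minimal sketch, assuming additive scoring functions given as weight tuples; the helper name `materialize_view` is illustrative and does not come from the slides.

```python
# Relation R from the slide: tid -> (X1, X2, X3).
R = {
    1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 1, 2), 4: (80, 22, 90),
    5: (28, 8, 87), 6: (12, 55, 82), 7: (16, 99, 42), 8: (18, 42, 67),
    9: (42, 1, 23), 10: (23, 21, 88),
}


def materialize_view(relation, weights, k):
    """Return the top-k (tid, score) pairs for an additive scoring function."""
    scored = [(tid, sum(w * x for w, x in zip(weights, t)))
              for tid, t in relation.items()]
    # A ranking view is stored in order of decreasing score.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


V1 = materialize_view(R, (2, 5, 0), 5)  # f1 = 2*x1 + 5*x2
V2 = materialize_view(R, (0, 1, 2), 3)  # f2 = x2 + 2*x3
print(V1)
print(V2)
```

The printed lists match the (tid, Score) pairs shown for V1 and V2.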

Page 6: Answering Top-k Queries  Using Views

Views – Example Contd…

• Given a top-2 query defined by the function f3 = 3x1 + 10x2 + 5x3, we can apply a standard top-k algorithm (e.g., TA) to the data in R and obtain the answer to the query.

• Using views?

• Feasibility

• Guarantee of an answer

• Speed of using R directly vs. using the views

Page 7: Answering Top-k Queries  Using Views

Related Work

• Multimedia context: uses ordered lists

• Threshold Algorithm (TA):

• Requires the scoring function to be monotonic, i.e., for tuples t and u, if t[i] ≤ u[i] for all 1 ≤ i ≤ m, then ScoreQ(t) ≤ ScoreQ(u).

• Requires that each attribute has an index mechanism that allows all tids to be accessed in sorted order.

• A single random access is required to resolve all attributes of a tid.

• The paper focuses on additive scoring functions (monotonic), where ScoreQ(t) = w1·t[1] + w2·t[2] + … + wm·t[m]
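The TA requirements listed above can be made concrete with a small sketch. It assumes non-negative weights (so the additive score is monotonic), uses the relation R and the function f3 = 3x1 + 10x2 + 5x3 from the running example, and simulates sorted and random access with in-memory structures; the function name and return shape are illustrative choices, not taken from the slides.

```python
import heapq

# Relation R from the running example: tid -> (X1, X2, X3).
R = {
    1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 1, 2), 4: (80, 22, 90),
    5: (28, 8, 87), 6: (12, 55, 82), 7: (16, 99, 42), 8: (18, 42, 67),
    9: (42, 1, 23), 10: (23, 21, 88),
}


def threshold_algorithm(relation, weights, k):
    """Return (top-k list, number of lock-step rounds) for an additive score."""
    m = len(weights)

    def score(t):
        return sum(w * x for w, x in zip(weights, t))

    # One list of tids per attribute, ordered by decreasing attribute value.
    lists = [sorted(relation, key=lambda tid, i=i: relation[tid][i], reverse=True)
             for i in range(m)]
    seen = {}
    for depth in range(len(relation)):
        last = []
        for i in range(m):
            tid = lists[i][depth]               # sorted access
            last.append(relation[tid][i])
            seen[tid] = score(relation[tid])    # random access resolves all attributes
        threshold = score(last)                 # best score any unseen tuple can have
        top = heapq.nlargest(k, seen.items(), key=lambda p: p[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda p: p[1]), len(relation)


answer, rounds = threshold_algorithm(R, (3, 10, 5), 2)
print(answer, rounds)
```

The threshold is the score of a hypothetical tuple made of the last value read from each list; monotonicity is what guarantees no unseen tuple can exceed it.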


Page 8: Answering Top-k Queries  Using Views

Related Work

• Variants:

• TA-Sorted: lists are always accessed sequentially and no random accesses are performed.

• PREFER [Hristidis et al.]:

• Stores multiple copies of R.

• Answers a new query using only one copy of the relation, the one closest to the new query.

Page 9: Answering Top-k Queries  Using Views

Ranking Queries

• Consider a relation R with m numeric attributes (X1, X2, …, Xm)

• Dom_i = [lb_i, ub_i] is the domain of the i-th attribute.

• A tuple t is viewed as a numeric vector t = (t[1], t[2], …, t[m])

• Top-k ranking queries in SQL-like syntax:

SELECT TOP [k] FROM R WHERE RangeQ ORDER BY ScoreQ

• Expressed as a triple Q = (ScoreQ, k, RangeQ)

• ScoreQ: function that assigns a numeric score to any tuple t.

• RangeQ: Boolean function that defines a selection condition on the tuples of R.

• The semantics require that the system retrieve the k tuples with the highest scores among those satisfying the selection condition.
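The triple Q = (ScoreQ, k, RangeQ) maps naturally onto a small data structure. The sketch below is one possible encoding, assuming an additive ScoreQ given as weights and a RangeQ given as a Python predicate; the class and field names are hypothetical. It evaluates the query by scanning R, which is the baseline the views are meant to beat.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Relation R from the running example: tid -> (X1, X2, X3).
R = {
    1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 1, 2), 4: (80, 22, 90),
    5: (28, 8, 87), 6: (12, 55, 82), 7: (16, 99, 42), 8: (18, 42, 67),
    9: (42, 1, 23), 10: (23, 21, 88),
}


@dataclass
class RankingQuery:
    weights: Sequence[float]                                       # additive ScoreQ
    k: int                                                         # number of tuples wanted
    in_range: Callable[[Sequence[float]], bool] = lambda t: True   # RangeQ, "*" by default

    def score(self, t):
        return sum(w * x for w, x in zip(self.weights, t))

    def answer(self, relation):
        """Full scan: keep tuples passing RangeQ, order by ScoreQ, return the top k."""
        scored = [(tid, self.score(t)) for tid, t in relation.items()
                  if self.in_range(t)]
        scored.sort(key=lambda p: p[1], reverse=True)
        return scored[: self.k]


top2 = RankingQuery((3, 10, 5), 2).answer(R)
top2_ranged = RankingQuery((3, 10, 5), 2, lambda t: t[0] >= 50).answer(R)
print(top2, top2_ranged)
```

The second query adds an illustrative range condition X1 ≥ 50 to show how RangeQ restricts the candidates before ranking.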


Page 10: Answering Top-k Queries  Using Views

Ranking Views

• Materialized ranking view (V):

• The materialized result of a previously executed top-k query, with its tuples ordered according to that query's scoring function.

• For a previously executed query Q' = (ScoreQ', k', RangeQ'), the corresponding materialized ranking view V' is a set of k' (tid, ScoreQ'(tid)) pairs, ordered by decreasing value of ScoreQ'(tid).

Page 11: Answering Top-k Queries  Using Views

Problems Discussed

• Problem 1: Top-k query answering using views

• Given a set U of views and a query Q, obtain an answer to Q by combining all the information conveyed by the views in U.

• Solution: an algorithm named LPTA.

• Problem 2: View selection

• Given a collection of views V = {V1, V2, …, Vr} that includes the base views (thus r ≥ m) and a query Q, determine the most efficient subset U ⊆ V on which to execute Q.

• The subset U is provided as input to LPTA.

• The selected views should be able to answer the query and, if possible, answer it faster than running TA on the base views.

Page 12: Answering Top-k Queries  Using Views

LPTA: Linear Programming Adaptation of the Threshold Algorithm

• An adaptation of the TA algorithm in the sense that it answers top-k queries using multiple ranking views

• Requires the scoring functions of the query and the views to be linear and additive

• Sorted access on pairs (tid, scoreQ(tid))

• Views and queries are of the form V' = (ScoreV', n, *) and Q = (ScoreQ, k, *) respectively.

• Pseudo code

• Example

• General approach

Page 13: Answering Top-k Queries  Using Views

• Pseudo code

• Initialize the top-k buffer to empty.

• Retrieve tids from the views V1 and V2 in lock-step, in order of decreasing score.

• For each tid retrieved, fetch the corresponding tuple by random access on R.

• Compute its score according to f3 and update the top-k buffer so that it holds the k largest scores.

• Check the stopping condition.

• Once the stopping condition is satisfied, the top-k buffer holds the result.
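The steps above can be sketched as a short loop. This is an illustrative reading of the pseudo code, not the authors' implementation: the stopping test is passed in as a function `unseen_max`, and for this example it uses a hand-derived bound instead of a linear-programming solver. The bound relies on the identity f3 = 1.5·f1 + 2.5·f2, which holds for these particular functions, so any unseen tuple with f1 ≤ s1 and f2 ≤ s2 scores at most 1.5·s1 + 2.5·s2 under f3.

```python
# Relation R and views V1, V2 from the running example.
R = {
    1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 1, 2), 4: (80, 22, 90),
    5: (28, 8, 87), 6: (12, 55, 82), 7: (16, 99, 42), 8: (18, 42, 67),
    9: (42, 1, 23), 10: (23, 21, 88),
}
V1 = [(7, 527), (6, 299), (4, 270), (8, 246), (2, 201)]  # f1 = 2*x1 + 5*x2
V2 = [(6, 219), (4, 202), (10, 197)]                     # f2 = x2 + 2*x3


def f3(t):
    return 3 * t[0] + 10 * t[1] + 5 * t[2]


def example_bound(s1, s2):
    # Assumption for this example only: f3 = 1.5*f1 + 2.5*f2.
    return 1.5 * s1 + 2.5 * s2


def lpta(relation, views, score, k, unseen_max):
    """Read the views in lock-step until no unseen tuple can enter the top-k buffer."""
    buffer = {}   # tid -> query score
    trace = []    # (iteration, top-k buffer, bound) for inspection
    for d, row in enumerate(zip(*views), start=1):
        for tid, _ in row:
            buffer[tid] = score(relation[tid])   # random access on R
        top = sorted(buffer.items(), key=lambda p: p[1], reverse=True)[:k]
        buffer = dict(top)
        bound = unseen_max(*(s for _, s in row))
        trace.append((d, top, bound))
        if len(top) == k and bound <= top[-1][1]:
            return top, trace
    return None, trace   # views exhausted before the stopping condition held


answer, trace = lpta(R, [V1, V2], f3, 2, example_bound)
print(answer)
print([bound for _, _, bound in trace])
```

On this data the loop stops after two iterations: the bound is 1338 after the first iteration and 953.5 after the second, while the smallest score in the top-2 buffer is 996.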

Page 14: Answering Top-k Queries  Using Views

• Stopping condition:

• After the d-th iteration, let the tuples read from V1 and V2 be (tid1^d, s1^d) and (tid2^d, s2^d), and let the minimum score in the top-k buffer be top-kmin.

• At this point the unseen tuples have to satisfy the following inequalities (domain of each attribute of R = [1, 100]):

0 ≤ x1, x2, x3 ≤ 100
2x1 + 5x2 ≤ s1^d
x2 + 2x3 ≤ s2^d

• These inequalities define a convex region in 3-d space.

• unseenmax is the solution to the linear program that maximizes the function f3 = 3x1 + 10x2 + 5x3 over this region.
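For a problem this small, the linear program can be solved without a solver library by enumerating vertices of the feasible region. The sketch below assumes three variables and the bounds 0 ≤ x ≤ 100 from the inequalities above; it intersects every triple of constraint planes, keeps the feasible intersection points, and returns the best objective value. This is a brute-force stand-in chosen for self-containment, not the method used in the paper, and the function names are hypothetical.

```python
from itertools import combinations


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a.x = b by Cramer's rule; None if the planes are dependent."""
    d = det3(a)
    if abs(d) < 1e-12:
        return None
    xs = []
    for j in range(3):
        mj = [[b[i] if c == j else a[i][c] for c in range(3)] for i in range(3)]
        xs.append(det3(mj) / d)
    return xs


def unseen_max(objective, view_constraints, lo=0.0, hi=100.0):
    """Maximize objective.x subject to coeffs.x <= rhs for each view and lo <= x <= hi."""
    cons = [(list(c), r) for c, r in view_constraints]
    for i in range(3):
        unit = [1.0 if j == i else 0.0 for j in range(3)]
        cons.append((unit, hi))                      # x_i <= hi
        cons.append(([-v for v in unit], -lo))       # -x_i <= -lo
    best = None
    for trio in combinations(cons, 3):
        x = solve3([c for c, _ in trio], [r for _, r in trio])
        if x is None:
            continue
        feasible = all(sum(ci * xi for ci, xi in zip(c, x)) <= r + 1e-9
                       for c, r in cons)
        if feasible:
            value = sum(o * xi for o, xi in zip(objective, x))
            best = value if best is None else max(best, value)
    return best


first = unseen_max((3, 10, 5), [((2, 5, 0), 527), ((0, 1, 2), 219)])
second = unseen_max((3, 10, 5), [((2, 5, 0), 299), ((0, 1, 2), 202)])
print(first, second)
```

Vertex enumeration is valid here because a bounded, feasible linear program attains its optimum at a vertex; it scales poorly, so a real system would use a proper LP solver.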

Page 15: Answering Top-k Queries  Using Views

• Example (top-k query answering using views): iteration 1

Query = (f3, k, *) with f3 = 3x1 + 10x2 + 5x3 and k = 2

Relation R

tid   X1   X2   X3
1     82    1   59
2     53   19   83
3     29    1    2
4     80   22   90
5     28    8   87
6     12   55   82
7     16   99   42
8     18   42   67
9     42    1   23
10    23   21   88

View1 (V1): top-5 query with f1 = 2x1 + 5x2

tid   Score
7     527
6     299
4     270
8     246
2     201

View2 (V2): top-3 query with f2 = x2 + 2x3

tid   Score
6     219
4     202
10    197

• Iteration d = 1 reads (7, 527) from V1 and (6, 219) from V2.

• Random access on R fetches tuple 7 = (16, 99, 42) and tuple 6 = (12, 55, 82); their f3 scores are 1248 and 996.

• top-2 buffer: (7, 1248), (6, 996)

• The linear programming solution with s1^1 = 527 and s2^1 = 219 gives unseenmax = 1338 (since f3 = 1.5·f1 + 2.5·f2, this is 1.5·527 + 2.5·219).

• unseenmax > top-kmin = 996, so the algorithm continues.

Page 16: Answering Top-k Queries  Using Views

• Example (top-k query answering using views): iteration 2

Query = (f3, k, *) with f3 = 3x1 + 10x2 + 5x3 and k = 2; R, V1 (f1 = 2x1 + 5x2) and V2 (f2 = x2 + 2x3) are unchanged from iteration 1.

• top-2 buffer before this iteration: (7, 1248), (6, 996)

• Iteration d = 2 reads (6, 299) from V1 and (4, 202) from V2.

• Random access on R fetches tuple 6 = (12, 55, 82) and tuple 4 = (80, 22, 90); their f3 scores are 996 and 910.

• top-2 buffer: (7, 1248), (6, 996), so top-kmin = 996

• The linear programming solution with s1^2 = 299 and s2^2 = 202 gives unseenmax = 953.5.

• unseenmax ≤ top-kmin, so the stopping condition holds and the top-2 buffer is the answer.

Page 17: Answering Top-k Queries  Using Views

[Figure: LPTA general approach for a relation R(X1, X2) and a top-1 query Q. Views V1 and V2 are listed as entries (tid1^1, s1^1) … (tid1^5, s1^5) and (tid2^1, s2^1) … (tid2^5, s2^5), with the first entry of each view highlighted. The plot has axes X1 and X2 over the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1) and shows V1, V2, Q, the tuples tid1^1 and tid2^1, and the stopping condition.]

Page 18: Answering Top-k Queries  Using Views

• General approach, iteration d:

View1 (V1): fV1 = 2x1 + 5x2, entry read: (tid1^d, s1^d)

View2 (V2): fV2 = x2 + 2x3, entry read: (tid2^d, s2^d)

Q: fQ = 3x1 + 10x2 + 5x3

Constraints on the unseen tuples:

0 ≤ x1, x2, x3 ≤ 100
2x1 + 5x2 ≤ s1^d
x2 + 2x3 ≤ s2^d

Stopping condition: unseenmax ≤ top-kmin

Page 19: Answering Top-k Queries  Using Views

[Figure: the same plot for the next step. In the listings of V1 and V2 the second entries, (tid1^2, s1^2) and (tid2^2, s2^2), are highlighted. The plot over the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1) shows V1, V2, Q, the tuples tid1^1, tid1^2, tid2^1 and tid2^2, and the stopping condition for the top-1 query.]

Page 20: Answering Top-k Queries  Using Views

TA Vs. LPTA

• LPTA essentially becomes TA when the set of views U is equal to the set of base views.

• In terms of execution cost, both perform sequential as well as random accesses.

• Execution efficiency: I/O operations play a significant role; they overshadow the cost of CPU operations such as updating the top-k buffer and testing the stopping condition.

• The two kinds of access are highly correlated: every sequential access incurs a random access.

• Determining factor: if d is the number of lock-step iterations and r is the number of views, the running cost is O(dr).

Page 21: Answering Top-k Queries  Using Views

View Selection: Conceptual Discussion

Given a collection of views Ѵ = {V1, V2, …, Vr} that includes the base views, determine the most efficient subset U ⊆ Ѵ on which to execute the query Q.

Conceptual Discussion

• View selection in two dimensions

• View selection in higher dimensions

Page 22: Answering Top-k Queries  Using Views

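View Selection in Two Dimensions

[Figure: axes X and Y over the unit square O=(0,0), P=(1,0), R=(1,1), T=(0,1), showing the views V1 and V2, the query Q, the label "Min top-k tuple", and the labelled points M, A, A1, A'1, A'2, B, B2, B'1, B'2.]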

Page 23: Answering Top-k Queries  Using Views

View Selection in Higher Dimensions

For a set of views Ѵ = {V1, V2, …, Vr} over an m-dimensional dataset and a query Q, the optimal execution of LPTA requires the use of a subset of views U ⊆ Ѵ such that |U| ≤ m.

Page 24: Answering Top-k Queries  Using Views

View Selection Problem: Cost Estimation

• Compute histograms representing the distribution of scores along each view in U.

• Estimate top-kmin from Hq by determining the bucket that contains the k-th highest tuple.

• "Walk down" these histograms until the stopping condition is reached.

• Check the stopping condition by linear programming.

• When unseenmax < top-kmin, perform a logarithmic search within the last bucket.

• Number of sorted accesses: ((d−1)n/b + n')r'.

• Running time of the algorithm: O((d−1) + log n').
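The walk-down idea can be illustrated with a deliberately simplified sketch. It assumes equi-depth histograms with b buckets over n tuples, takes top-kmin as a given number instead of estimating it from Hq, and skips the logarithmic search inside the last bucket, so it returns the coarser estimate d·(n/b)·r instead of the formula above. The bound function reuses the example-specific identity f3 = 1.5·f1 + 2.5·f2 in place of a linear program. All names are hypothetical.

```python
def equi_depth_boundaries(scores, b):
    """Lowest score in each of b equal-size buckets (scores taken in descending order)."""
    ordered = sorted(scores, reverse=True)
    size = len(ordered) // b
    return [ordered[(j + 1) * size - 1] for j in range(b)]


def estimate_sorted_accesses(histograms, n, b, topk_min, unseen_max):
    """Walk all histograms down one bucket at a time until the stopping condition holds.

    Returns (buckets walked, estimated sorted accesses), or None if it never holds.
    """
    r = len(histograms)
    for d in range(1, b + 1):
        frontier = [h[d - 1] for h in histograms]   # last score seen in each view
        if unseen_max(*frontier) <= topk_min:
            return d, d * (n // b) * r
    return None


# f1 and f2 scores of tuples 1..10 of the running example.
f1_scores = [169, 201, 63, 270, 96, 299, 527, 246, 89, 151]
f2_scores = [119, 185, 5, 202, 182, 219, 183, 176, 47, 197]

h1 = equi_depth_boundaries(f1_scores, 5)
h2 = equi_depth_boundaries(f2_scores, 5)
estimate = estimate_sorted_accesses(
    [h1, h2], n=10, b=5, topk_min=996,
    unseen_max=lambda s1, s2: 1.5 * s1 + 2.5 * s2,
)
print(h1, h2, estimate)
```

With two tuples per bucket the walk stops after the first bucket, giving an estimate of four sorted accesses, which agrees with the two lock-step iterations over two views in the worked example.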

Page 25: Answering Top-k Queries  Using Views

Select Views(Q, Ѵ)

• Initialize MinCost = ∞ and U = { }; in each round set MinCurCost = ∞ and consider every view V ∈ Ѵ − U.

• Compare the cost estimate for V with MinCurCost; if EstimateCost < MinCurCost, record V as MinV.

• MinCurCost then becomes the EstimateCost of V.

• These steps are followed for every V.

• When MinCurCost < MinCost, MinV is added to U.

• This is repeated m times, once for each of the m attributes.
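A compact sketch of this greedy loop follows. It is one reading of the steps above, in which the cost of a candidate V is estimated for the set U ∪ {V}; the cost estimator is passed in as a function, and the toy cost table used to exercise it is invented purely for illustration.

```python
import math


def select_views(views, m, estimate_cost):
    """Greedy selection: add, at most m times, the view that lowers the estimated cost most."""
    selected = []
    min_cost = math.inf
    for _ in range(m):
        min_cur_cost, min_v = math.inf, None
        for v in views:
            if v in selected:
                continue
            cost = estimate_cost(selected + [v])
            if cost < min_cur_cost:
                min_cur_cost, min_v = cost, v
        if min_v is not None and min_cur_cost < min_cost:
            selected.append(min_v)
            min_cost = min_cur_cost
        else:
            break   # no remaining view improves the estimate
    return selected, min_cost


# Hypothetical cost table; any subset not listed costs 20.
TOY_COSTS = {
    frozenset({"V1"}): 9,
    frozenset({"V2"}): 12,
    frozenset({"V1", "V2"}): 4,
}


def toy_cost(subset):
    return TOY_COSTS.get(frozenset(subset), 20)


chosen, cost = select_views(["X1", "X2", "X3", "V1", "V2"], 3, toy_cost)
print(chosen, cost)
```

With the toy table the loop picks V1 first, then V2, and stops in the third round because no further view lowers the estimate.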

Page 26: Answering Top-k Queries  Using Views

View Selection Algorithms

Select Views(Q, V) / Exhaustive: estimates the cost of all possible (r choose p) subsets of V and selects the one with minimum cost.

Simple greedy heuristic: iterates over the set of views and selects the one that reduces the total cost by the greatest amount.
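The exhaustive strategy can be sketched with `itertools.combinations`. The sketch assumes subsets of every size p from 1 to m are considered, which the slide does not state explicitly, and it uses an invented cost table in which the best pair is not reachable by a greedy first pick.

```python
from itertools import combinations


def select_views_exhaustive(views, m, estimate_cost):
    """Evaluate every subset of 1..m views and return the cheapest one with its cost."""
    best, best_cost = None, float("inf")
    for p in range(1, m + 1):
        for subset in combinations(views, p):
            cost = estimate_cost(subset)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


# Hypothetical cost table; any subset not listed costs 20.
TOY_COSTS = {
    frozenset({"V1"}): 9,
    frozenset({"V2"}): 12,
    frozenset({"V1", "V2"}): 4,
    frozenset({"X2", "V2"}): 3,
}


def toy_cost(subset):
    return TOY_COSTS.get(frozenset(subset), 20)


best, best_cost = select_views_exhaustive(["X1", "X2", "X3", "V1", "V2"], 3, toy_cost)
print(best, best_cost)
```

Here the exhaustive search finds the pair (X2, V2), which a greedy pass starting from the cheapest single view V1 would miss; the price is a number of cost estimates that grows combinatorially with r.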


Page 27: Answering Top-k Queries  Using Views

View Selection Algorithms

Select Views Spherical(Q, V): has to solve the linear program just once and is very effective for highly restrictive data sets.

Select Views By Angle: sorts the view vectors by increasing angle with the query vector and returns the top m views.
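Selection by angle is simple enough to sketch directly. The code below assumes each view is represented by the weight vector of its scoring function, with base views as unit vectors; the dictionary layout and names are illustrative, and the vectors are those of the running example.

```python
import math


def select_views_by_angle(view_vectors, query_vector, m):
    """Return the names of the m views whose weight vectors are closest in angle to the query."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(v):
        dot = sum(a * b for a, b in zip(v, query_vector))
        cosine = dot / (norm(v) * norm(query_vector))
        return math.acos(max(-1.0, min(1.0, cosine)))   # clamp against rounding error

    ranked = sorted(view_vectors, key=lambda name: angle(view_vectors[name]))
    return ranked[:m]


views = {
    "X1": (1, 0, 0), "X2": (0, 1, 0), "X3": (0, 0, 1),   # base views
    "V1": (2, 5, 0),                                     # f1 = 2*x1 + 5*x2
    "V2": (0, 1, 2),                                     # f2 = x2 + 2*x3
}
chosen = select_views_by_angle(views, (3, 10, 5), 3)     # f3 = 3*x1 + 10*x2 + 5*x3
print(chosen)
```

No cost estimate or linear program is needed for this heuristic, which is why it is the cheapest of the selection strategies listed to evaluate.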


Page 28: Answering Top-k Queries  Using Views

More General Queries & Views

Views that Only Materialize their Top-k Tuples

• Truncate the histograms

Accommodating Range Conditions

• Select the views that cover the range conditions.

• Truncate each attribute's histogram

Page 29: Answering Top-k Queries  Using Views

Performance Evaluation

Real data: performance comparison of PREFER, LPTA and TA

[Figure: two panels, (2d) and (3d)]

Page 30: Answering Top-k Queries  Using Views

References

• Answering Top-k Queries Using Views: Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis

• aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt

Page 31: Answering Top-k Queries  Using Views

Thank you

Questions?