Supporting Top- k join Queries in Relational Databases

35
Supporting Top-k join Queries in Relational Databases Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid Presented by: Richa Varshney

description

Supporting Top- k join Queries in Relational Databases. Ihab F. Ilyas , Walid G. Aref , Ahmed K. Elmagarmid. Presented by: Richa Varshney. Introduction. O rdered set of join results according to some provided function. Often searches are done on multiple features. - PowerPoint PPT Presentation

Transcript of Supporting Top- k join Queries in Relational Databases

Page 1: Supporting Top- k join  Queries in Relational Databases

Supporting Top-k join Queries in Relational

DatabasesIhab F. Ilyas, Walid G. Aref, Ahmed

K. Elmagarmid

Presented by: Richa Varshney

Page 2: Supporting Top- k join  Queries in Relational Databases

IntroductionO Ordered set of join results according to

some provided function.

O Often searches are done on multiple features.

O Each feature produces a different ranking for the query.

O Joining the individual feature rankings to produce a global ranking.

2

Page 3: Supporting Top- k join  Queries in Relational Databases

Example 1: Ranking in Multimedia Retrieval

Color Histogram

Edge Histogram

Texture

Query

Color Histogram

Edge Histogram

Texture

VideoDatabase

3

Page 4: Supporting Top- k join  Queries in Relational Databases

Example 2

SELECT h.id , s.nameFROM houses h , schools sWHERE h.location = s.locationORDER BY h.price+10 x s.tuitionSTOP AFTER 4

44

Page 5: Supporting Top- k join  Queries in Relational Databases

Example 2 (Cont’d)

ID

Location Price

123456

LafayetteW.LafayetteIndianapolisKokomoLafayetteKokomo

……

90,000110,000111,000118,000125,000154,000

ID

Location Tuition

12345678

IndianapolisW.LafayetteLafayetteLafayetteIndianapolisIndianapolisKokomoKokomo

30003500600062007000790082008200

SchoolsHouses

1 3 1500001 4 1520002 2 1450003 1 141000

5

Page 6: Supporting Top- k join  Queries in Relational Databases

MotivationSELECT A.1,B.2FROM A,B,CWHERE A.1 = B.1 and B.2 = C.2ORDER BY (0.3*A.1+0.7*B.2)STOP AFTER 5;

Problems:-• Sorting is an expensive

operation.• Sorting is a blocking

operator.

66

Page 7: Supporting Top- k join  Queries in Relational Databases

ContributionO Propose a new Rank-Join algorithmO Analyze the I/O cost of the algorithmO Implement the algorithm O Propose a score-guided and adaptive

join strategyO Evaluate performance

77

Page 8: Supporting Top- k join  Queries in Relational Databases

Ripple JoinCartesian product L x R

(L1(1,1,5) R1(1,3,5))

88

(L2,R2) {(2,2,4),(2,1,4)}

(L2,R1) {2,2,4), (1,3,5)}

(L1,R2) {(1,1,5), (2,1,4)}

L

L

R

R

Page 9: Supporting Top- k join  Queries in Relational Databases

Variation Of Ripple Join

Rectangle

Block

Hash Ripple Join: where all the sampled tuples are kept in hash tables in memory

99

Page 10: Supporting Top- k join  Queries in Relational Databases

Query Model: Top-k JoinOm Relations R1, ….., Rm | Ri has:

O n attributesO score attribute, si (can be an expression over

other attributes)

OA global score for a join result is computed as

F(s1,…., sm)

O A top-k join query is an ordered set of join results according to some provided function that combines the orders on each input.

O An example template:

SELECT some_attributesFROM R1,…..,Rm

WHERE join_conditionORDER BY F(s1,…..,sm) STOP AFTER k 10

10

Page 11: Supporting Top- k join  Queries in Relational Databases

1) Generate new valid join combinations2) Compute score for each combination3) For each incoming input, calculate the

threshold score:a) The last seen feature value and the

top ranked feature value for all other features in the query.

b) Store the maximum of these as T (threshold)

4) Store top k(maximum combined score) results in priority queue.

5) Halt when lowest value of queue ≥ T11

Rank-Join Algorithm

11

Page 12: Supporting Top- k join  Queries in Relational Databases

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Compute a Threshold (T) by

Max {(Last L).B + (First R.B), (First L).B + (Last R).B}

(1). Get a valid combination using any certain algorithm

Ripple Select (L1, R1) => No Result1212

Example

Page 13: Supporting Top- k join  Queries in Relational Databases

Example--Cont.

(1) Get a valid combination using any certain algorithm

Select (L2, R2)

(L2, R2), (L2, R1), (L1, R2) => (L1, R2)

(2) Compute the score (J) for the result

J1(L1, R2) => L.B + R.B = 5 + 4 = 91313

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Page 14: Supporting Top- k join  Queries in Relational Databases

O (3) Compute a Threshold (T) score by Max {(Last L).B + (First R.B), (First L).B + (Last R).B}

Selection (L1, R1) , (L2, R2) => T = Max (L2.B + R1.B, L1.B + R2.B) =Max (4+5, 5+4) = 9

O (4) J1= 9 ,T = 9,J1 >= T,Report J1

Since we need top 3 (k=3), continue until k=3 and Min(J1, J2, …Jk) > T

14

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Example--Cont.

14

Page 15: Supporting Top- k join  Queries in Relational Databases

(1) Select (L3, R3)

(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)

=> (L3, R3), (L2, R3)

(2) J2(L2, R3) = 4 + 3 = 7

J3(L3, R3) = 3 + 3= 6

Example--Cont.

1515

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Page 16: Supporting Top- k join  Queries in Relational Databases

O (3) Calculate T= Max { (Last L).B + (First R).B,(First L).B+ (Last R).B} = Max {L3.B + R1.B , L1.B + R3.B}= Max(3 + 5, 5 + 3) = 8

O (4) J1(L1,R2) = 9(reported),J2( L2, R3) = 7 ,J3(L3, R3) = 6 (Note, J’s are in descending order)

Min (J) = 6 < T Continue16

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Example--Cont.

16

Page 17: Supporting Top- k join  Queries in Relational Databases

(1)Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)

(2) J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5

(3) T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7

(4) J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7,J3(L3,R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5Min(J1, J2) = 7 >= T (k = 3)

Example--Cont.

1717

Select * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 3

Page 18: Supporting Top- k join  Queries in Relational Databases

Hash Rank Join (HRJN) Operator

O Built on idea of hash ripple join

O Initialized by specifying four parameters:O Two inputs(Can be HRJN operator)O Join condition(general equality

condition/computes valid join)O Combining function(monotone/computes

global scores)

O Maintains highest (first) and lowest (last selected) objects from each relation.

O Results are added to a priority queue

1818

Page 19: Supporting Top- k join  Queries in Relational Databases

Hash Rank Join (HRJN) Operator: Problems

O Buffer ProblemO Cannot predict how many partial joins will

result

O Local Ranking Problem

1919

Page 20: Supporting Top- k join  Queries in Relational Databases

HRJN Solutions

O Use Block Ripple Join to solve Local Ranking Problem. (e.g. block size = 2)

2020

Page 21: Supporting Top- k join  Queries in Relational Databases

HRJN Solutions—Cont.

OHRJN* score-guided join strategyO How to select next (block) tuple

T1 = f(Ltop ,Rbottom ) and T2 = f(Lbottom ,Rtop ), where f is the ranking functionCase 1: T1 >T2 , more inputs should beretrieved from R

Case 2: T1 <T2 , more inputs should beretrieved from L 21

21

Page 22: Supporting Top- k join  Queries in Relational Databases

An adaptive join strategyO Use input availability as a guide instead of

the aforementioned score-guided strategy

O If both inputs are available, choose the next input to process.

O Otherwise, the available input is processed.

O e.g., a mediator over Web-accessible sources and distributed multimedia repositories

2222

Page 23: Supporting Top- k join  Queries in Relational Databases

Join Order

O When more than two tables join, the join order matters. (A and C have high similarity)

2323

Page 24: Supporting Top- k join  Queries in Relational Databases

O Rank-Join order heuristic- Get a ranked sample, top S ranked list from L and R- Calculate the similarity using Footrule

24

Join Order Algorithm

where (i, j ) is a valid join result that joins object i from L with object j from R

Page 25: Supporting Top- k join  Queries in Relational Databases

2525

Rank Join Order Heuristic

Page 26: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation

Changing the number of required answers: Selectivity = 0.2 % and m= 4

2626

Page 27: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Changing the number of required answers: Selectivity = 0.2 % and m= 4

2727

Page 28: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Changing the number of required answers: Selectivity = 0.2 % and m= 4

2828

Page 29: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Changing the join selectivity: m =4 and K =502929

Page 30: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Changing the join selectivity: m =4 and K =503030

Page 31: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Changing the join selectivity: m =4 and K =503131

Page 32: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Effect of pipelining: selectivity = 0 . 2% and K =50

3232

Page 33: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Effect of pipelining: selectivity = 0 . 2% and K =50

3333

Page 34: Supporting Top- k join  Queries in Relational Databases

Performance Evaluation--Cont.

Effect of pipelining: selectivity = 0 . 2% and K =50

3434

Page 35: Supporting Top- k join  Queries in Relational Databases

Thank You

3535