6/15/2015 Top-k algorithms: Finding k objects that have the highest overall grades
Date posted: 20-Dec-2015
04/18/23 1
Top-k algorithms
Finding k objects that have the highest overall grades
Top-k query
Given
– a relation R (id, x1, x2, x3) and
– a query Q: sum(x1, x2, x3)
Find k tuples with highest grades according to Q.

R
id   x1    x2    x3    sum
a    0.3   0.6   0.7   1.6
b    0.2   0.3   0.4   0.9
c    0.4   0.5   0.9   1.8
d    0.7   0.6   0.1   1.4

Top-2 tuples: c (1.8) and a (1.6)
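As a baseline for the algorithms that follow, the query above can be answered by scoring every tuple and keeping the k best. This is only a minimal sketch: the function name `top_k_naive` and the dictionary layout of R are assumptions made for illustration, not part of the slides.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3)
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def top_k_naive(relation, k, query=sum):
    """Score every tuple with `query` and keep the k best (a full scan of R)."""
    scored = ((query(grades), obj_id) for obj_id, grades in relation.items())
    # Scores are rounded to two decimals only to hide floating-point noise.
    return [(obj_id, round(score, 2)) for score, obj_id in heapq.nlargest(k, scored)]

print(top_k_naive(R, 2))  # [('c', 1.8), ('a', 1.6)]
```

The full scan reads all of R; the algorithms on the following slides try to stop earlier by exploiting sorted lists and the monotonicity of Q.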
Problem formulation 1
• Given
– A relational table R (id, x1, x2, …, xm)
– A query Q (monotone function)
• Find top-k tuples according to Q
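Both formulations rely on Q being monotone. For reference, the usual definition can be written as follows (the primed symbols are notation introduced here, not taken from the slides):

```latex
\bigl(\forall i \in \{1,\dots,m\}:\ x_i \le x_i'\bigr)
\;\Longrightarrow\;
Q(x_1,\dots,x_m) \le Q(x_1',\dots,x_m')
```

The functions sum and min used in the examples satisfy this condition, as does any weighted sum with non-negative weights.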
Problem formulation 2
• Given
– A relational table R (id, x1, x2, …, xm)
– A materialized view V (id, scorev) over R
– A query Q (monotone function)
• Find top-k tuples according to Q
Topics of Discussion
• Fagin’s algorithm (FA)
• Threshold algorithm (TA)
– No Random Accesses algorithm (NRA)
• PREFER
Finding top-k with FA
• Do sorted access (in parallel) to each of the lists Xi until at least k objects have been seen in every list
• For each object t seen, do random accesses to the rest of the lists to fetch its missing grades
• Compute Q(t) for each object seen. Return Y, the set of the k seen objects with the highest grades
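The three phases above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name `fagin_top_k` is hypothetical, random access is simulated with one dictionary per list, and every object is assumed to appear in every list. The sample lists contain only the four entries that are visible in the example on the next slides.

```python
import heapq

def fagin_top_k(lists, k, query):
    """lists: one list per attribute, holding (id, grade) pairs sorted by grade, descending."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # simulated random access (assumption)
    seen_in = {}                           # id -> indexes of the lists where it was seen
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    # Phase 1: parallel sorted access until k objects have been seen in all lists.
    while depth < max_depth:
        for i, lst in enumerate(lists):
            if depth < len(lst):
                seen_in.setdefault(lst[depth][0], set()).add(i)
        depth += 1
        if sum(len(s) == m for s in seen_in.values()) >= k:
            break
    # Phases 2 and 3: random access for the missing grades, then rank by Q.
    scored = ((query([lookup[i][obj] for i in range(m)]), obj) for obj in seen_in)
    return [(obj, score) for score, obj in heapq.nlargest(k, scored)]

X1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
X2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(fagin_top_k([X1, X2], 2, min))  # [('a', 0.85), ('b', 0.7)]
```

With these lists the sorted-access phase stops after three entries per list, because a and b have then been seen in both.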
FA example
• Find top-2 with Q: min(x1, x2)

R
ID   X1     X2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
...
d    0.6    0.9

Sorted X1     Sorted X2
(a, 0.9)      (d, 0.9)
(b, 0.8)      (a, 0.85)
(c, 0.72)     (b, 0.7)
(d, 0.6)      (c, 0.2)
...           ...
FA example
• STEP 1
– Read attributes from every sorted list
– Stop when k objects have been seen in common from all lists

X1            X2
(a, 0.9)      (d, 0.9)
(b, 0.8)      (a, 0.85)
(c, 0.72)     (b, 0.7)
(d, 0.6)      (c, 0.2)
...           ...

After three entries from each list, a and b have been seen in both lists:

ID   X1     X2     min(x1,x2)
a    0.9    0.85
d    -      0.9
b    0.8    0.7
c    0.72   -
FA example
• STEP 2
– Random access to find missing grades

ID   X1     X2     min(x1,x2)
a    0.9    0.85
d    0.6    0.9
b    0.8    0.7
c    0.72   0.2
FA example
• STEP 3
– Compute the grades of the seen objects.
– Return the k highest graded objects.

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
c    0.72   0.2    0.2

Top-2: a (0.85) and b (0.7)
Finding top-k with TA
• Do sorted access (in parallel) to each of the lists Xi. For every object t seen, do random accesses to the other lists and compute Q(t). Remember the k objects with the highest grades.
• For each list Xi, let xi be the last grade seen under sorted access. Compute the threshold value τ = Q(x1, …, xm). Halt when at least k objects have grade ≥ τ.
• Return Y, the set of the k seen objects with the highest grades.
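The loop described above can be sketched as follows. It is a minimal sketch under stated assumptions: the name `threshold_top_k` is hypothetical, random access is simulated with dictionaries, all lists are assumed to be non-empty and of equal length, and for brevity all computed grades are kept instead of a k-sized buffer.

```python
import heapq

def threshold_top_k(lists, k, query):
    """lists: (id, grade) pairs per attribute, sorted by grade in descending order."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # simulated random access (assumption)
    scores = {}                            # id -> overall grade Q(t)
    for depth in range(len(lists[0])):
        last = []                          # last grade seen in each list
        for lst in lists:
            obj, grade = lst[depth]        # sorted access
            last.append(grade)
            if obj not in scores:          # random access to the other lists
                scores[obj] = query([lookup[i][obj] for i in range(m)])
        tau = query(last)                  # threshold value
        best = heapq.nlargest(k, ((s, o) for o, s in scores.items()))
        if len(best) >= k and best[-1][0] >= tau:
            break                          # k objects have grade >= tau
    return [(o, s) for s, o in best], tau

X1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
X2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
print(threshold_top_k([X1, X2], 2, min))  # ([('a', 0.85), ('b', 0.7)], 0.7)
```

On this data the threshold takes the values 0.9, 0.8 and 0.7, and the loop stops at the third position, where both buffered objects reach the threshold.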
TA example
• Find top-2 with Q: min(x1, x2)

R
ID   X1     X2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
...
d    0.6    0.9

Sorted X1     Sorted X2
(a, 0.9)      (d, 0.9)
(b, 0.8)      (a, 0.85)
(c, 0.72)     (b, 0.7)
(d, 0.6)      (c, 0.2)
...           ...
TA example
Step 1:
- parallel sorted access to each list
For each object seen:
- get all grades by random access
- determine min(x1,x2)
- if it is amongst the 2 highest seen, keep it in the buffer

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
TA example
Step 2:
- Determine the threshold value based on the objects currently seen under sorted access: τ = min(x1, x2)
- If 2 objects have overall grade ≥ threshold value, stop; else go to the next entry position in the sorted lists and repeat step 1

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6

τ = min(0.9, 0.9) = 0.9
Neither a nor d reaches 0.9, so continue.
TA example
Step 1 (Again):
- parallel sorted access to each list
For each object seen:
- get all grades by random access
- determine min(x1,x2)
- if it is amongst the 2 highest seen, keep it in the buffer

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
TA example
Step 2 (Again):
- Determine the threshold value based on the objects currently seen: τ = min(x1, x2)
- If 2 objects have overall grade ≥ threshold value, stop; else go to the next entry position in the sorted lists and repeat step 1

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7

τ = min(0.8, 0.85) = 0.8
Only a reaches 0.8, so continue.
TA example
Situation at stopping condition

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7

τ = min(0.72, 0.7) = 0.7
Both a and b have grade ≥ 0.7, so TA halts and returns a and b.
Finding top-k with NRA
• Do sorted access (in parallel) to each of the lists
– Maintain the last grade seen xi for each list
– For every object t seen, compute the lower bound Wt and the upper bound Bt
– Topk = {k objects with the highest W} and Mk = kth highest W
– An object t in R is viable when Bt > Mk
• Halt when Bt ≤ Mk for all objects not in Topk
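Define W and B
• Lower bound W = Q(x1, x2, …, xl, 0, …, 0): fields not yet seen count as 0
• Upper bound B = Q(x1, x2, …, xl, xl+1, …, xm): fields not yet seen take the last grade seen in their list
• E.g. f(x1, x2, x3) = x1 + x2 + x3, after one sorted access to each list:

x1        x2        x3
a:0.7     a:0.8     d:0.9
...       ...       ...

• Wa = f(0.7, 0.8, 0) = 1.5
• Ba = f(0.7, 0.8, 0.9) = 2.4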
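The two bounds can be computed with a small helper. This is a sketch only: the name `bounds` and the representation of an object's known grades as a dictionary keyed by list index are assumptions, and 0 is taken to be the lowest possible grade.

```python
def bounds(seen, last_seen, query=sum):
    """seen: {list index: grade} known for one object;
    last_seen: last grade read under sorted access from each list."""
    m = len(last_seen)
    w = query([seen.get(i, 0.0) for i in range(m)])           # unknown fields -> 0
    b = query([seen.get(i, last_seen[i]) for i in range(m)])  # unknown fields -> last grade seen
    # Rounded to two decimals only to hide floating-point noise.
    return round(w, 2), round(b, 2)

# Object a from the slide: x1 = 0.7 and x2 = 0.8 are known, x3 is not.
print(bounds({0: 0.7, 1: 0.8}, [0.7, 0.8, 0.9]))  # (1.5, 2.4)
```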
NRA example
• Find top-2 with Q: sum(x1, x2, x3)

Lists sorted by score:
X1        X2        X3
f:0.6     d:0.6     q:0.9
n:0.5     g:0.6     d:0.7
q:0.4     c:0.6     j:0.3
d:0.3     a:0.6     p:0.2
e:0.2     q:0.5     m:0.1
r:0.1     e:0.3     b:0.1
h:0.1
NRA example
After reading row 1 of each list:

ID   W     B
q    0.9   2.1   (Topk)
d    0.6   2.1   (Topk)
f    0.6   2.1

Nk = 2.1 ≤ Mk = 0.6 does not hold, so continue (Nk is the highest B among the objects not in Topk).
NRA example
After reading row 2 of each list:

ID   W     B
d    1.3   1.8   (Topk)
q    0.9   2.0   (Topk)
f    0.6   1.9
g    0.6   1.8
n    0.5   1.8

Nk = 1.9 ≤ Mk = 0.9 does not hold, so continue.
NRA example
After reading row 3 of each list:

ID   W     B
q    1.3   1.9   (Topk)
d    1.3   1.7   (Topk)
f    0.6   1.5
g    0.6   1.3
c    0.6   1.3
n    0.5   1.4
j    0.3   1.3

Nk = 1.5 ≤ Mk = 1.3 does not hold, so continue.
NRA example
After reading row 4 of each list:

ID   W     B
d    1.6   1.6   (Topk)
q    1.3   1.9   (Topk)
f    0.6   1.4
a    0.6   1.1
g    0.6   1.1
c    0.6   1.1
n    0.5   1.3
j    0.3   1.2
p    0.2   1.1

Nk = 1.4 ≤ Mk = 1.3 does not hold, so continue.
NRA example
After reading row 5 of each list:

ID   W     B
q    1.8   1.8   (Topk)
d    1.6   1.6   (Topk)
f    0.6   1.2
a    0.6   0.9
g    0.6   0.9
c    0.6   0.9
n    0.5   1.1
j    0.3   1.0
p    0.2   0.9
e    0.2   0.8
m    0.1   0.8

Nk = 1.2 ≤ Mk = 1.6 holds, so NRA halts and returns q and d.
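The rounds of this example can be replayed with a short sketch of NRA. It makes several assumptions that are not in the slides: the function name `nra_top_k` is hypothetical, all lists are taken to have the same length (the lone entry h:0.1 is left out because its list is unclear), grades are non-negative, and objects that have not been seen at all are bounded by Q applied to the last grades seen.

```python
def nra_top_k(lists, k, query=sum):
    """lists: (id, grade) pairs per attribute, sorted by grade in descending order."""
    m = len(lists)
    known = {}  # id -> {list index: grade seen under sorted access}
    for depth in range(len(lists[0])):
        last = [lst[depth][1] for lst in lists]
        for i, lst in enumerate(lists):
            obj, grade = lst[depth]
            known.setdefault(obj, {})[i] = grade
        # (W, B) per object: unknown fields count as 0 for W, as the last grade seen for B.
        wb = {
            o: (query([f.get(i, 0.0) for i in range(m)]),
                query([f.get(i, last[i]) for i in range(m)]))
            for o, f in known.items()
        }
        ranked = sorted(wb, key=lambda o: wb[o], reverse=True)  # by W, ties broken by B
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        m_k = wb[top[-1]][0]                                # kth highest W
        n_k = max([wb[o][1] for o in rest] + [query(last)]) # best B outside Topk, incl. unseen objects
        if n_k <= m_k:
            break
    return [(o, round(wb[o][0], 2), round(wb[o][1], 2)) for o in top], depth + 1

X1 = [("f", 0.6), ("n", 0.5), ("q", 0.4), ("d", 0.3), ("e", 0.2), ("r", 0.1)]
X2 = [("d", 0.6), ("g", 0.6), ("c", 0.6), ("a", 0.6), ("q", 0.5), ("e", 0.3)]
X3 = [("q", 0.9), ("d", 0.7), ("j", 0.3), ("p", 0.2), ("m", 0.1), ("b", 0.1)]
print(nra_top_k([X1, X2, X3], 2))  # ([('q', 1.8, 1.8), ('d', 1.6, 1.6)], 5)
```

The second component of the result is the number of rows read from each list before the halting test succeeded.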
Finding top-k with PREFER
• Step 1: View selection algorithm
– Materializes a number of ranked views V of the relation R and uses them to efficiently answer preference queries Q.
• Step 2: Pipelined algorithm
– Define 1st watermark
– Output first tuples according to 1st watermark
– Define 2nd watermark
– Output second tuples according to 2nd watermark
– …
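Finding top-k with PREFER
• Determine the watermark $T^1_{v,q}$
– How deep in V we must go to output the top result tuple $t_q^1$
• $T^1_{v,q}$ is such that $\forall t \in R:\ f_v(t) < T^1_{v,q} \Rightarrow f_q(t) < f_q(t_v^1)$
– If t in V is below $T^1_{v,q}$, then t can’t be $t_q^1$, since $t_v^1$ has a higher score over Q
[Figure: view V with columns t and fv(t); the watermark $T^1_{v,q}$ is marked as a cut-off line in V.]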
Finding top-k with PREFER
• Determine $t_q^1$ according to $T^1_{v,q}$
– Scan V from the top and retrieve the prefix $[t_v^1, t_v^2, \ldots, t_v^w)$, where $t_v^w$ is the first tuple in V with score less than $T^1_{v,q}$
– Order the prefix according to Q: $[t_q^1, \ldots, t_q^{w-1}]$. Let $t_q^s$ be the position of $t_v^1$ according to Q.

V
t    fv(t)
a    0.9    ($t_v^1$)
b    0.8    ($t_v^2$)
c    0.7    ($t_v^3$)
     Watermark $T^1_{v,q}$ = 0.65
d    0.5

[Figure: the prefix a, b, c is reordered according to Q into $t_q^1$, $t_q^2$, $t_q^3$; the position of $t_v^1$ in that order is $t_q^s$.]
PREFER example
Find top-4 with:
fv(t) = 0.2*X1 + 0.4*X2 + 0.4*X3
fq(t) = 0.1*X1 + 0.6*X2 + 0.3*X3

ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1
d    15   10    8   10.2     9.9
e     5   10   12    9.8    10.1
f    15   10    5    9       9
g    12    5    5    6.4     5.7

t1 = a (the first tuple of the view)
1. Calculate the watermark for t1, which is 14.26
2. Find the prefix of the view with fv greater than the watermark value and sort its tuples by fq
3. Output tuples up to t1

Output so far: b, a
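The fv and fq columns of the example table can be checked by evaluating the two preference functions on each tuple. The sketch below assumes the column reading X1, X2, X3 used to decode the table; the names `f_v`, `f_q`, `view_order` and `query_order` are introduced only for illustration.

```python
# id -> (X1, X2, X3), as read from the example table
R = {
    "a": (10, 17, 20), "b": (20, 20, 11), "c": (17, 18, 12), "d": (15, 10, 8),
    "e": (5, 10, 12),  "f": (15, 10, 5),  "g": (12, 5, 5),
}

def f_v(x1, x2, x3):
    """Preference function of the materialized view V."""
    return 0.2 * x1 + 0.4 * x2 + 0.4 * x3

def f_q(x1, x2, x3):
    """Preference function of the query Q."""
    return 0.1 * x1 + 0.6 * x2 + 0.3 * x3

view_order = sorted(R, key=lambda t: f_v(*R[t]), reverse=True)   # ranking stored in V
query_order = sorted(R, key=lambda t: f_q(*R[t]), reverse=True)  # ranking the query asks for

print(view_order)   # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(query_order)  # ['b', 'a', 'c', 'e', 'd', 'f', 'g']
```

The two rankings differ only locally (a/b and d/e swap places), which is why a short prefix of V is enough to produce the first query results.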
PREFER example
t1 = c (the first unprocessed tuple of the view)
1. Calculate the watermark for t1, which is 13.1
2. Find the prefix of the view with fv greater than the watermark value and sort its tuples by fq
3. Output tuples up to t1
4. Repeat using the first unprocessed tuple as t1

Output so far: b, a
PREFER example
t1 = c, watermark 13.1: the prefix a, b, c sorted by fq is b, a, c

Output so far: b, a, c
PREFER example
t1 = d (the first unprocessed tuple of the view)
1. Calculate the watermark for t1, which is 8.3
2. Find the prefix of the view with fv greater than the watermark value and sort its tuples by fq
3. Output tuples up to t1
4. Repeat using the first unprocessed tuple as t1

Output so far: b, a, c
PREFER example
t1 = d, watermark 8.3: the prefix a, b, c, d, e, f sorted by fq is b, a, c, e, d, f

Output so far: b, a, c, e, d (the top-4 according to fq are b, a, c, e)
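The four numbered steps of this example can be replayed with a short loop. This sketch does not compute watermarks: the three values are copied from the slides and supplied as input, and the names `SCORES`, `WATERMARKS` and `prefer_pipeline` are hypothetical.

```python
# id -> (fv, fq), copied from the example table
SCORES = {
    "a": (16.8, 17.2), "b": (16.4, 17.3), "c": (15.4, 16.1), "d": (10.2, 9.9),
    "e": (9.8, 10.1),  "f": (9.0, 9.0),   "g": (6.4, 5.7),
}
# Watermark quoted on the slides for each t1 (assumed as given, not derived here)
WATERMARKS = {"a": 14.26, "c": 13.1, "d": 8.3}

def prefer_pipeline(scores, watermarks, k):
    view = sorted(scores, key=lambda t: scores[t][0], reverse=True)  # V ranked by fv
    output = []
    while len(output) < k:
        # Step 4: the first unprocessed tuple of the view becomes t1.
        t1 = next((t for t in view if t not in output), None)
        if t1 is None:
            break
        # Step 2: prefix of V above the watermark, ordered by fq.
        prefix = [t for t in view if scores[t][0] > watermarks[t1]]
        prefix.sort(key=lambda t: scores[t][1], reverse=True)
        # Step 3: output tuples up to (and including) t1.
        upto = prefix.index(t1) + 1
        output += [t for t in prefix[:upto] if t not in output]
    return output

print(prefer_pipeline(SCORES, WATERMARKS, 4))  # ['b', 'a', 'c', 'e', 'd']
```

The loop emits tuples in batches, so it can return slightly more than k results; the first four entries are the top-4 for fq.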
Citations
• Ronald Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), pp. 614-656, 2003.
• Ronald Fagin. Combining fuzzy information from multiple systems. In Proc. of the 15th ACM Symposium on Principles of Database Systems, pp. 216-226, Montreal, Canada, 1996.
• Ronald Fagin. Fuzzy queries in multimedia database systems. In Proc. of the 17th ACM Symposium on Principles of Database Systems, pp. 1-10, Seattle, USA, 1998.
• Ulrich Güntzer, Wolf-Tilo Balke, Werner Kießling. Optimizing Multi-Feature Queries for Image Databases. In Proc. of the 26th VLDB Conference, pp. 419-428, Cairo, Egypt, 2000.
• Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proc. of the ACM Special Interest Group on Management of Data Conference (SIGMOD), pp. 259-270, Santa Barbara, USA, 2001.
• Vagelis Hristidis, Yannis Papakonstantinou. Algorithms and applications for answering ranked queries using ranked views. VLDB Journal, 13(1), pp. 49-70, 2004.
• Surya Nepal, M. V. Ramakrishna. Query Processing Issues in Image (Multimedia) Databases. In Proc. of the 15th International Conference on Data Engineering (ICDE), pp. 22-29, Sydney, Australia, March 1999.
Questions