EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.
![Page 1: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/1.jpg)
EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS
Presenting: Karina Koifman Course : DB Seminar
![Page 2: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/2.jpg)
Example
![Page 3: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/3.jpg)
Example
Yahoo! Autos
![Page 4: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/4.jpg)
Maybe a better retrieval
![Page 5: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/5.jpg)
Introduction
The article talks about the problem of
efficiently computing diverse query results
in online shopping applications.
![Page 6: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/6.jpg)
The Goal
The goal of diverse query answering
is to return a representative set of
top-k answers from all the tuples
that satisfy the user's selection
condition.
![Page 7: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/7.jpg)
The Problem
A user issues a query for a
product.
Only the most relevant answers are
shown.
Many duplicates.
![Page 8: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/8.jpg)
Agenda
Existing Solutions
Definition of diversity
Impossibility results of diversity.
Query processing technique.
![Page 9: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/9.jpg)
Existing Solutions
Existing solutions are inefficient or
do not work in all situations.
Example:
Obtaining all the query results and
then picking a diverse subset from
them doesn't scale to
large data sets.
![Page 10: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/10.jpg)
Existing Solutions
Web search engines:
first retrieve c × k results and then pick a diverse subset from
these.
This is more efficient than the previous method,
but product listings contain many duplicates, so it is still inefficient and
doesn't guarantee diversity.
![Page 11: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/11.jpg)
Existing Solutions
Issuing multiple queries to obtain diverse results:
![Page 12: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/12.jpg)
Pros and Cons
The good:
Diversity
The bad:
Hurts performance
Empty results
*There are no Honda
Accord convertibles.
![Page 13: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/13.jpg)
Agenda
Existing Solutions
Definition of diversity
Impossibility results of diversity.
Query processing technique.
![Page 14: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/14.jpg)
Diversity Ordering
A diversity ordering of a relation R with
attributes A, denoted by $\prec_R$, is a total
ordering of the attributes in A.
Example: Make ≺ Model ≺ Color ≺ Year ≺
Description ≺ Id
![Page 15: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/15.jpg)
The DB example
| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 1 | Honda | Civic | Green | 2007 | Low miles |
| 2 | Honda | Civic | Blue | 2007 | Low miles |
| 3 | Honda | Civic | Red | 2007 | Low miles |
| 4 | Honda | Civic | Black | 2007 | Low miles |
| 5 | Honda | Civic | Black | 2006 | Low miles |
| 6 | Honda | Accord | Blue | 2007 | Best Price |
| 7 | Honda | Accord | Red | 2006 | Good miles |
| 8 | Honda | Odyssey | Green | 2007 | Rare |
| 9 | Honda | Odyssey | Green | 2006 | Good miles |
| 10 | Honda | CRV | Red | 2007 | Fun Car |
| 11 | Honda | CRV | Orange | 2006 | Good miles |
| 12 | Toyota | Prius | Tan | 2007 | Low miles |
| 13 | Toyota | Corolla | Black | 2007 | Low miles |
| 14 | Toyota | Tercel | Blue | 2007 | Low miles |
| 15 | Toyota | Camry | Blue | 2007 | Low miles |
![Page 16: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/16.jpg)
Similarity – SIM(x, y)
1 Honda Civic Green 2007 Low miles
2 Honda Civic Blue 2007 Low miles
$\mathrm{SIM}(x, y) = 1$
12 Toyota Prius Tan 2007 Low miles
1 Honda Civic Green 2007 Low miles
$\mathrm{SIM}(x, y) = 0$
Find a result set S that
minimizes
$\sum_{x, y \in S} \mathrm{SIM}(x, y)$
![Page 17: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/17.jpg)
Example - Similarity
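| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 1 | Honda | Civic | Green | 2007 | Low miles |
| 6 | Honda | Accord | Blue | 2007 | Best Price |
| 8 | Honda | Odyssey | Green | 2007 | Rare |

| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 1 | Honda | Civic | Green | 2007 | Low miles |
| 2 | Honda | Civic | Blue | 2007 | Low miles |
| 12 | Toyota | Prius | Tan | 2007 | Low miles |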
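
The two result sets above can be compared with a small sketch. It assumes a simplified SIM that looks only at Make, the first attribute in the diversity ordering; the function names `sim` and `total_similarity` are illustrative and not from the article.

```python
from itertools import combinations

# (Id, Make, Model) for the two candidate result sets on this slide.
ALL_HONDA = [(1, "Honda", "Civic"), (6, "Honda", "Accord"), (8, "Honda", "Odyssey")]
MIXED = [(1, "Honda", "Civic"), (2, "Honda", "Civic"), (12, "Toyota", "Prius")]


def sim(x, y):
    """Assumed simplification: 1 if both tuples share the same Make, else 0."""
    return 1 if x[1] == y[1] else 0


def total_similarity(result_set):
    """Sum SIM over every unordered pair of tuples in the result set."""
    return sum(sim(x, y) for x, y in combinations(result_set, 2))


honda_total = total_similarity(ALL_HONDA)
mixed_total = total_similarity(MIXED)
```

Under this assumption the all-Honda set has a larger pairwise similarity than the set that includes a Toyota, so the second set is the more diverse one at the Make level.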
![Page 18: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/18.jpg)
Prefix
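| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 1 | Honda | Civic | Green | 2007 | Low miles |

| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 2 | Honda | Civic | Blue | 2007 | Low miles |

| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 8 | Honda | Odyssey | Green | 2007 | Rare |
| 9 | Honda | Odyssey | Green | 2006 | Good miles |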
![Page 19: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/19.jpg)
Few more definitions
Given relation R and query Q, let
$\mathrm{maxval} = \max_{T \subseteq RES(R,Q),\ |T| = k} \mathrm{Score}(T)$,
where T ranges over the subsets of RES(R,Q) of size k and
Score(T) is the sum of the scores of the tuples in T.
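
A minimal sketch of this definition: because Score(T) is a plain sum, the maximum over all size-k subsets is reached by the k highest-scoring tuples. The scores below are made-up values for illustration only.

```python
import heapq


def maxval(scores, k):
    """Largest possible Score(T) over subsets T of size k: sum of the k top scores."""
    return sum(heapq.nlargest(k, scores))


# Hypothetical integer scores for the tuples in RES(R, Q).
example_scores = [9, 4, 7, 7, 1]
best = maxval(example_scores, 3)
```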
![Page 20: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/20.jpg)
Agenda
Existing Solutions
Definition of diversity
Impossibility results of diversity.
Query processing technique.
![Page 21: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/21.jpg)
Impossibility Results
Intuition: IR score of an item depends
only on the item and possibly statistics
from the entire corpus, but diversity
depends on the other items in the
query result set.
![Page 22: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/22.jpg)
Inverted Lists
Honda cars
Honda
Car
Merged Inverted List:
![Page 23: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/23.jpg)
Impossibility Results
Each item in an inverted list has a score, which can be either a global
score (e.g., PageRank) or a value/keyword-dependent score (e.g.,
TF-IDF).
The items in each list are usually ordered by their score, so that
top-k queries can be handled.
If we assume a monotonic scoring function f(),
which is a normal assumption for traditional IR systems, then the
article proves that the result is either not diverse or too inefficient/infeasible to compute.
![Page 24: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/24.jpg)
Agenda
Existing Solutions
Definition of diversity
Impossibility results of diversity.
Query processing technique.
![Page 25: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/25.jpg)
The DB example
| Id | Make | Model | Color | Year | Description |
|----|------|-------|-------|------|-------------|
| 1 | Honda | Civic | Green | 2007 | Low miles |
| 2 | Honda | Civic | Blue | 2007 | Low miles |
| 3 | Honda | Civic | Red | 2007 | Low miles |
| 4 | Honda | Civic | Black | 2007 | Low miles |
| 5 | Honda | Civic | Black | 2006 | Low miles |
| 6 | Honda | Accord | Blue | 2007 | Best Price |
| 7 | Honda | Accord | Red | 2006 | Good miles |
| 8 | Honda | Odyssey | Green | 2007 | Rare |
| 9 | Honda | Odyssey | Green | 2006 | Good miles |
| 10 | Honda | CRV | Red | 2007 | Fun Car |
| 11 | Honda | CRV | Orange | 2006 | Good miles |
| 12 | Toyota | Prius | Tan | 2007 | Low miles |
| 13 | Toyota | Corolla | Black | 2007 | Low miles |
| 14 | Toyota | Tercel | Blue | 2007 | Low miles |
| 15 | Toyota | Camry | Blue | 2007 | Low miles |
![Page 26: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/26.jpg)
The car indexing example
![Page 27: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/27.jpg)
One-pass Algorithm
Let's say Q looks for descriptions containing 'Low', with k = 3.
Honda.Civic.Green.2007.'Low miles'
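
A path such as Honda.Civic.Green.2007.'Low miles' can be turned into a numeric Dewey-style identifier by numbering the children at each level of the diversity ordering. The sketch below is one possible encoding under that assumption (children numbered in order of first appearance, Id as the last level); the helper name `dewey_ids` is hypothetical, and only a few rows of the example table are used.

```python
# Rows follow the layout (Id, Make, Model, Color, Year, Description).
ROWS = [
    (1, "Honda", "Civic", "Green", 2007, "Low miles"),
    (2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    (4, "Honda", "Civic", "Black", 2007, "Low miles"),
    (5, "Honda", "Civic", "Black", 2006, "Low miles"),
    (6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    (12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
]


def dewey_ids(rows):
    """Map each row Id to a tuple of child positions, one per level of the ordering."""
    children_of = {}  # attribute-value prefix -> {child value: position}
    ids = {}
    for row in rows:
        # Diversity ordering: Make, Model, Color, Year, Description, then Id.
        path_values = row[1:] + (row[0],)
        path = []
        for depth, value in enumerate(path_values):
            children = children_of.setdefault(path_values[:depth], {})
            # A new child gets the next free position under its parent.
            position = children.setdefault(value, len(children))
            path.append(position)
        ids[row[0]] = tuple(path)
    return ids


DEWEY = dewey_ids(ROWS)
```

Tuples that share a prefix of the diversity ordering share a prefix of the identifier, which is what lets an algorithm skip whole subtrees by jumping to the next identifier.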
![Page 28: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/28.jpg)
One-pass Algorithm
We start with two Civics; we then know that we need only
one more, so we pick the next Civic.
![Page 29: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/29.jpg)
One-pass Algorithm
Then we look for another at the next level (Accord): there is none,
because no Accord has 'Low' in its description (and neither does any other model at that level).
![Page 30: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/30.jpg)
One-pass Algorithm
Then we look for another at the next level (Make) and prune.
The result is maximally diverse, so we stop here.
![Page 31: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/31.jpg)
One-pass Algorithm
If we had a Ford, we would continue.
[Figure: an additional branch of the index tree, Ford → Focus (0) → Black (0) → 07 (0) → Low miles (0)]
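
To make the target of the walkthrough concrete, here is a brute-force reference for what a diverse top-k answer looks like on the example table. It is not the one-pass algorithm itself: it materializes all matching tuples and, at each level of the diversity ordering, spreads the k picks over the groups as evenly as possible. The helper names `allocate` and `diverse_subset` are illustrative assumptions.

```python
# Rows follow the layout (Id, Make, Model, Color, Year, Description).
CARS = [
    (1, "Honda", "Civic", "Green", 2007, "Low miles"),
    (2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    (3, "Honda", "Civic", "Red", 2007, "Low miles"),
    (4, "Honda", "Civic", "Black", 2007, "Low miles"),
    (5, "Honda", "Civic", "Black", 2006, "Low miles"),
    (6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    (7, "Honda", "Accord", "Red", 2006, "Good miles"),
    (8, "Honda", "Odyssey", "Green", 2007, "Rare"),
    (9, "Honda", "Odyssey", "Green", 2006, "Good miles"),
    (10, "Honda", "CRV", "Red", 2007, "Fun Car"),
    (11, "Honda", "CRV", "Orange", 2006, "Good miles"),
    (12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
    (13, "Toyota", "Corolla", "Black", 2007, "Low miles"),
    (14, "Toyota", "Tercel", "Blue", 2007, "Low miles"),
    (15, "Toyota", "Camry", "Blue", 2007, "Low miles"),
]
LAST_LEVEL = 5  # column index of Description, the last attribute before Id


def allocate(sizes, k):
    """Hand out k picks round-robin, never exceeding a group's size."""
    quota = [0] * len(sizes)
    remaining = min(k, sum(sizes))
    while remaining > 0:
        for i, size in enumerate(sizes):
            if remaining > 0 and quota[i] < size:
                quota[i] += 1
                remaining -= 1
    return quota


def diverse_subset(rows, k, level=1):
    """Pick k rows, balancing the picks across the values at each level."""
    if k <= 0 or not rows:
        return []
    if level > LAST_LEVEL:
        return rows[:k]  # rows agree on every attribute; any k of them will do
    groups = {}
    for row in rows:
        groups.setdefault(row[level], []).append(row)
    members = list(groups.values())
    quota = allocate([len(m) for m in members], k)
    picked = []
    for group, share in zip(members, quota):
        picked.extend(diverse_subset(group, share, level + 1))
    return picked


matches = [row for row in CARS if "Low" in row[5]]
result_ids = [row[0] for row in diverse_subset(matches, 3)]
```

With k = 3 this picks two Civics of different colors and one Toyota. The point of the one-pass and probing algorithms is to reach an answer of this kind without first collecting every matching tuple.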
![Page 32: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/32.jpg)
Scored One-pass Algorithm
Give each car a score. The query then takes a
parameter, minScore: the smallest score in the
result set.
Choose the next ID as follows:
the smallest ID such that score(id) >= root.minScore.
The algorithm then proceeds as before.
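
The next-ID rule can be sketched as a scan over a posting list kept sorted by identifier. The list contents, the tuple-shaped identifiers, and the function name `next_id` are assumptions made for illustration; a real index would skip ahead rather than scan linearly.

```python
def next_id(postings, start, min_score):
    """Return the smallest ID >= start whose score is at least min_score.

    postings: list of (dewey_id, score) pairs sorted by dewey_id.
    Returns None when no remaining ID qualifies.
    """
    for dewey_id, score in postings:
        if dewey_id >= start and score >= min_score:
            return dewey_id
    return None


# Hypothetical posting list: identifiers are tuples, scores are integers.
POSTINGS = [
    ((0, 0, 0), 3),
    ((0, 0, 1), 8),
    ((0, 1, 0), 5),
    ((1, 0, 0), 9),
]
```

For example, starting from (0, 0, 0) with a minimum score of 5, the first entry is skipped because its score is too low and (0, 0, 1) is returned.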
![Page 33: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/33.jpg)
Probing Algorithm
Main idea: go over all the cars as if they were laid out on an axis.
[Figure: probing steps for K = 1, K = 2, K = 3]
![Page 34: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/34.jpg)
Advantage of bidirectional exploring
“Honda” has only one child to explore (Civic),
so we find it quickly without exploring
every option.
Each time we add a node to the
diverse solution, we do not have to
prune it, unlike in the OnePass
algorithm.
![Page 35: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/35.jpg)
WAND algorithm
WAND is an efficient method of obtaining top-K
lists of scored results, without explicitly merging
the full inverted lists.
AND(X1,X2,...Xk)≡ WAND(X1,1,X2,1, ...Xk,1,k),
OR(X1,X2,...Xk) ≡ WAND(X1,1,X2,1, ...Xk,1,1).
To obtain the k best results, the operator uses
upper bounds on each term's maximum contribution and
a temporary threshold θ.
WAND(X1,UB1,X2,UB2,...,Xk ,UBk, θ)
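
The WAND predicate is true when the upper bounds of the terms that are present sum to at least θ. The sketch below shows only this predicate and checks the two equivalences above for k = 3; it does not cover iterating over inverted lists, and the function name `wand` is illustrative.

```python
from itertools import product


def wand(flags, upper_bounds, theta):
    """Weighted AND: true when the upper bounds of the present terms reach theta."""
    return sum(ub for present, ub in zip(flags, upper_bounds) if present) >= theta


# With all upper bounds equal to 1, theta = k behaves like AND and theta = 1 like OR.
k = 3
unit_bounds = [1] * k
and_matches = all(
    wand(flags, unit_bounds, k) == all(flags)
    for flags in product([False, True], repeat=k)
)
or_matches = all(
    wand(flags, unit_bounds, 1) == any(flags)
    for flags in product([False, True], repeat=k)
)
```

Raising θ as better results are found lets the operator skip documents whose upper bound cannot beat the current k-th best score.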
![Page 36: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/36.jpg)
Scored Probing Algorithm
We use the WAND algorithm to obtain the top-k list.
The next step is marking all nodes that could be added as
MIDDLE.
We also maintain a heap for the node with the minimum
child.
At each step we move nodes from tentative to useful.
![Page 37: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/37.jpg)
Experiments
MultQ – rewriting the query as multiple
queries and merging their results.
Naïve – retrieving all the results of a query.
Basic – just the first k answers, without
diversity.
OnePass, Probe – our algorithms.
U = unscored
S = scored
![Page 38: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/38.jpg)
Experiments
![Page 39: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/39.jpg)
Experiments
![Page 40: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/40.jpg)
Conclusions
Formalized diversity in structured
search and proposed inverted-list
algorithms.
The experiments showed that the
algorithms are scalable and
efficient.
In particular, diversity can be
implemented with little additional
overhead when compared to
traditional approaches
![Page 41: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/41.jpg)
Extension of the algorithm
Assign higher weights to
Hondas and Toyotas when
compared to Teslas, so that
the diverse results have
more Hondas and Toyotas.
![Page 42: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649d775503460f94a5a046/html5/thumbnails/42.jpg)
Questions?
Thank You!