Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Finding skyline on the fly
HKU CS DB Seminar, 21 July 2004
Speaker: Eric Lo

Skyline
  A new operator (like “ORDER BY”) in database systems
  A set of data points that are not dominated by any other data point

Example
  Find some good places for us to hold the next DB Seminar
    Good: closer to HKU (Min)
    Good: larger area (Max)
  Return those homes for which no other home is better in ALL DIMENSIONS
  Dataset (Table Homes):

  Home      Distance from HKU   Area (m2)
  K.K Loo   1 km                10
  Ben       9 km                100
  Ivy       5 km                2
  Nikos     8 km                250
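
The example can be checked with a few lines of Python. This is a minimal sketch, assuming the usual dominance definition from the skyline literature (at least as good in every dimension, strictly better in at least one), with distance minimized and area maximized as stated on the slide. The names `Home`, `dominates`, and `skyline` are illustrative and do not come from the talk.

```python
from typing import NamedTuple


class Home(NamedTuple):
    name: str
    distance_km: float  # smaller is better (Min)
    area_m2: float      # larger is better (Max)


def dominates(a: Home, b: Home) -> bool:
    """True if a is at least as good as b in both dimensions and strictly better in one."""
    no_worse = a.distance_km <= b.distance_km and a.area_m2 >= b.area_m2
    better = a.distance_km < b.distance_km or a.area_m2 > b.area_m2
    return no_worse and better


def skyline(homes: list[Home]) -> list[Home]:
    """Naive O(n^2) skyline: keep every home that no other home dominates."""
    return [h for h in homes if not any(dominates(o, h) for o in homes)]


HOMES = [
    Home("K.K Loo", 1, 10),
    Home("Ben", 9, 100),
    Home("Ivy", 5, 2),
    Home("Nikos", 8, 250),
]

result = [h.name for h in skyline(HOMES)]
print(result)
```

On the Homes table this keeps K.K Loo and Nikos: Ben is dominated by Nikos (closer and larger), and Ivy is dominated by K.K Loo.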

Outline
  Introduction to skyline queries
  Non-progressive skylining on the Web
    Basic Distributed Skyline Algorithm (BDS)
  Progressive skylining on the Web
  Experimental results
  Conclusion and future directions

Skylining on the Web
  One distributed site holds one attribute
    Attribute “Distance from HKU” stored at HKU
    Attribute “Area (m2)” stored at Purdue

  HKU:
  Home      Distance from HKU
  K.K Loo   1 km
  Ben       9 km
  Ivy       5 km
  Nikos     8 km

  Purdue:
  Home      Area (m2)
  K.K Loo   10
  Ben       100
  Ivy       2
  Nikos     250

  [Figure: the HKU and Purdue sites, connected through the Internet]

Accessing interfaces
  Interfaces of Web-accessible sites:
  1. Sorted Access (SA):
     HKU.getNext() returns the 1st-ranked data tuple, “K.K Loo”; the next HKU.getNext() returns the 2nd, “Ivy”; the next returns the 3rd, “Nikos”; ...
  2. Random Access (RA):
     Purdue.getScore(“K.K Loo”) returns 10 m2
     HKU.getScore(“Nikos”) returns 8 km
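
The two interfaces can be mimicked locally to make the access pattern concrete. This is a sketch under stated assumptions: the class `Source` and its methods `get_next` and `get_score` are hypothetical in-memory stand-ins for the remote getNext() and getScore() calls, and no network access is involved.

```python
from typing import Optional


class Source:
    """One site holding a single attribute (hypothetical in-memory stand-in)."""

    def __init__(self, scores: dict[str, float], larger_is_better: bool = False):
        self._scores = dict(scores)
        # Rank the objects from best to worst for this attribute.
        self._ranked = sorted(scores, key=scores.get, reverse=larger_is_better)
        self._cursor = 0

    def get_next(self) -> Optional[tuple[str, float]]:
        """Sorted access: return (name, score) of the next-best object, or None when exhausted."""
        if self._cursor >= len(self._ranked):
            return None
        name = self._ranked[self._cursor]
        self._cursor += 1
        return name, self._scores[name]

    def get_score(self, name: str) -> float:
        """Random access: return the score of a named object."""
        return self._scores[name]


# Distance in km, to be minimized.
hku = Source({"K.K Loo": 1, "Ben": 9, "Ivy": 5, "Nikos": 8})
# Area in m2, to be maximized.
purdue = Source({"K.K Loo": 10, "Ben": 100, "Ivy": 2, "Nikos": 250},
                larger_is_better=True)

first_three = [hku.get_next()[0] for _ in range(3)]
print(first_three)
print(purdue.get_score("K.K Loo"), hku.get_score("Nikos"))
```

Three sorted accesses on the HKU stand-in return K.K Loo, Ivy, and Nikos in that order, matching the slide; the two random accesses return 10 and 8.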

Basic distributed skyline algorithm (EDBT 04)
  Phase 1: find all possible skyline objects
    Perform sorted access on each source one by one:
      S1.getNext(), S2.getNext(), S3.getNext(), S1.getNext(), S2.getNext(), ...
    Stop once there is an object whose attribute values are all known
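
Phase 1 can be sketched as a round-robin loop over the sources. This is an illustration, not the published algorithm verbatim: each source is modelled as a plain list already sorted best first, the stop condition is checked after every single access (an assumption; the slide does not say whether it is checked per access or per round), and the function name `bds_phase1` is hypothetical. The data reuses the Homes example rather than the figure from the talk.

```python
from itertools import cycle


def bds_phase1(sorted_lists: list[list[str]]):
    """Round-robin sorted access until one object has been returned by every source.

    sorted_lists[i] is the ranking of source i, best first (stand-in for getNext()).
    Returns (terminating_object, seen, accesses); seen maps each object to the
    set of source indexes that have returned it so far.
    """
    n = len(sorted_lists)
    cursors = [0] * n
    seen: dict[str, set[int]] = {}
    accesses = 0
    for i in cycle(range(n)):
        if all(cursors[k] >= len(sorted_lists[k]) for k in range(n)):
            break                              # every source is exhausted
        if cursors[i] >= len(sorted_lists[i]):
            continue                           # source i is exhausted, try the next one
        obj = sorted_lists[i][cursors[i]]      # one sorted access on source i
        cursors[i] += 1
        accesses += 1
        seen.setdefault(obj, set()).add(i)
        if len(seen[obj]) == n:                # all attribute values of obj are known
            return obj, seen, accesses
    return None, seen, accesses


hku_order = ["K.K Loo", "Ivy", "Nikos", "Ben"]      # by distance, ascending
purdue_order = ["Nikos", "Ben", "K.K Loo", "Ivy"]   # by area, descending

terminating, seen, accesses = bds_phase1([hku_order, purdue_order])
print(terminating, accesses, sorted(seen))
```

With these two rankings the loop stops after 5 sorted accesses, when Nikos has been returned by both sources. The keys of `seen` are the candidates handed to phase 2.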

Phase 1
  f is the terminating object

Phase 1 (15 sorted accesses)

Implication
  f is the terminating object
  Objects that have not appeared in any source must be dominated by f

Phase 2
  Find the skyline from the candidates of phase 1
  During sequential scanning of the sources, data structures K1, K2, K3, ..., Kn are created
    n is the number of dimensions
  If source i's getNext() returns a data object d:
    1. create an entry for d in Ki
    2. update the lower_bound of source i

Phase 2: find skyline from candidates Ki
  A lemma shows that “objects can only be dominated by objects in the same set Ki”

Motivations
  BDS returns skyline results in a batch
  In practice, it would be useful to return skyline results progressively, so that users can adjust their decisions right away
  Consider the “next DB seminar” skyline example: minimize “Distance from HKU”, maximize “Area”
    <Nikos: 8 km, 250 m2> is returned first
    Getting from HKU to Nikos’s home takes a $50 bus ride!
    The user adds a “travel-expense” attribute to the skyline query

Progressive Distributed Skylining (PDS)
  Goal: evaluate skyline queries progressively with minimal overhead
  Overhead:
    Network/data source accesses
    Computational time

Enable progressiveness
  To identify whether a data point belongs to the final skyline, we rely on the following lemma (assuming the data values are distinct):
    If a data source Di returns data objects in a strictly monotonic order, an object O retrieved from Di can only be dominated by objects that are retrieved from Di before O

  If an object O is retrieved from a data source by sorted access, we only need to test whether O is dominated by any object that appears before O in the same source
  2 usages:
    1. We don’t need to consider objects that appear in other data sources
    2. After the test, we can output O as a skyline object immediately: O must be a skyline object, and we do not need to worry that objects appearing later would dominate O
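
The lemma translates directly into a progressive scan over one sorted source. The sketch below is illustrative only: it assumes distinct values, treats smaller as better in every dimension (area is negated to fit), and simulates sorted and random access with an in-memory dictionary. The name `progressive_skyline` is hypothetical.

```python
from typing import Iterator

Point = tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Smaller is better in every dimension (an assumption of this sketch)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def progressive_skyline(objects: dict[str, Point], dim: int) -> Iterator[str]:
    """Yield skyline objects one by one while scanning a single source sorted on `dim`."""
    earlier: list[Point] = []   # objects already retrieved from this source
    # Simulated sorted access on dimension `dim`, best (smallest) value first.
    for name in sorted(objects, key=lambda o: objects[o][dim]):
        point = objects[name]   # the other attributes would come from random access
        # By the lemma, only objects retrieved earlier can dominate this one.
        if not any(dominates(p, point) for p in earlier):
            yield name          # safe to output immediately
        earlier.append(point)


# Homes example as (distance in km, negated area in m2), so both are minimized.
HOMES = {
    "K.K Loo": (1, -10),
    "Ben": (9, -100),
    "Ivy": (5, -2),
    "Nikos": (8, -250),
}

by_distance = list(progressive_skyline(HOMES, 0))
by_area = list(progressive_skyline(HOMES, 1))
print(by_distance, by_area)
```

Scanning the distance source yields K.K Loo and then Nikos; scanning the area source yields Nikos first and then K.K Loo. Either way each skyline object is emitted as soon as it is retrieved, without waiting for the rest of the scan.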

An R-tree approach
  Build an R-tree Ri for each attribute/data source i involved in the skyline query
  For each object O retrieved from source i, we check whether any object in Ri dominates O
    If no such object exists, O is a skyline object (output it immediately)
    If some object in Ri dominates O, O is not a skyline object (O is discarded immediately)

D3.getNext() the 1st time
  SA on D3 returns e<1>
  e is a skyline object (no object is better than e on D3)
  e(7,4) is projected into R-tree R3
  [Figure: R3 over axes D1 and D2, containing e(7,4)]

D3.getNext() the 2nd time
  SA on D3 returns c<2>
  Construct a query Q(origin, c) on R3
  Q returns no answer, so c is a skyline object; insert c into R3
  [Figure: R3 containing e(7,4) and c(2,5)]

D3.getNext() the 3rd time
  SA on D3 returns j<3>
  Construct a query Q(origin, j) on R3
  Q returns c as an answer, so j is dominated by c; discard j
  [Figure: R3 containing e(7,4) and c(2,5), with the query point j(6,10)]

D3.getNext() the 4th time
  SA on D3 returns f<4>; construct a query Q(origin, f) on R3
  Q returns no answer, so f is a skyline object
  Delete e after the insertion of f to make the R-tree more compact and efficient
  [Figure: R3 showing e(7,4) and c(2,5)]

The R-tree approach
  The R-tree is very small in size since it stores only the skyline objects with the highest pruning power
  The containment query operation is very efficient

A linear regression based heuristic
  The R-tree approach enables progressiveness with better efficiency
  We use a linear regression based heuristic to minimize the number of source accesses during the evaluation process

A rank based approach
  1. We use linear regression to estimate the rank of objects along the process
  2. Assume the object with the lowest rank is the real terminating object and probe the sources accordingly (rather than round-robin)

Extensions
  Evaluation of top-K skyline queries
  Progress indicator (based on the estimated ranks)

Experimental results – Number of source accesses

Experimental results – Number of source accesses
  [Figure panels: Random Distribution; Denormalized Domain]

Experimental results – progressive behavior

Experimental results – progress indicator

Conclusion and future directions
  Skyline queries on the Web
  Return skyline points on-the-fly
  Future work:
    Improve the usability of PDS by allowing users to trade off progressiveness against efficiency
    Compute the skyline from real-time stream data
    Handle the case where only 1 data source supports sorted access and the rest support random access only

References
  S. Börzsönyi, D. Kossmann, K. Stocker, The Skyline Operator, in ICDE 2001.
  D. Kossmann, F. Ramsak, S. Rost, Shooting Stars in the Sky: An Online Algorithm for Skyline Queries, in VLDB 2002.
  W.-T. Balke, U. Güntzer, J. X. Zheng, Efficient Distributed Skylining for Web Information Systems, in EDBT 2004.