Efficient Computation of the Skyline Cube
Yidong Yuan
School of Computer Science & Engineering
The University of New South Wales & NICTA
Sydney, Australia
Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)
VLDB 2005 Yidong Yuan @DBG.UNSW 2
Outline
Introduction
Skycube Computation Techniques
Experiments
Summary
Skyline Query
A real estate example
skyline returns data points not dominated by others

Properties and Values
      price (100K)   dist   age   …
P1    3              3      5     …
P2    5              1      1     …
P3    1              4      4     …
P4    4              5      2     …
P5    2              2      3     …

[Figures: Skyline on price & dist; Skyline on price & age (points P1 to P5)]
(x1, x2, …, xd) dominates (y1, y2, …, yd) iff ∀i, xi ≤ yi and ∃k, xk < yk
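The dominance test above can be written down directly. This is a minimal sketch, assuming smaller values are better on every dimension; the function name `dominates` is an illustrative choice, not taken from the slides.

```python
from typing import Sequence

def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x dominates y: x <= y on every dimension and x < y on at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

# (price, dist) values of the real estate example; smaller is better.
P1, P2, P5 = (3, 3), (5, 1), (2, 2)

print(dominates(P5, P1))  # True: P5 is smaller than P1 on both price and dist
print(dominates(P2, P5))  # False: P2 has a smaller dist but a larger price
print(dominates(P5, P5))  # False: a point never dominates itself
```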
Skyline Cube
Skycube: Skyline on price & dist & age, Skyline on price & dist, Skyline on price & age, ……
A union of the skyline results of all the non-empty subsets of the d-dimensional set (2^d − 1 subsets)

Skycube Example

Dataset
      A   B   C
P1    3   3   5
P2    5   1   1
P3    1   4   4
P4    4   5   2
P5    2   2   3

Skycube
Subspace   skyline
ABC        {P2, P3, P4, P5}
AB         {P2, P3, P5}
AC         {P2, P3, P4, P5}
BC         {P2}
A          {P3}
B          {P2}
C          {P2}

Lattice Structure of a Skycube
ABC
AB AC BC
A B C
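The example can be reproduced by brute force. The sketch below computes one skyline per non-empty subspace with a plain nested loop and no sharing, so it is the naive baseline rather than either algorithm from the talk; the names `DATA`, `skyline` and `naive_skycube` are illustrative.

```python
from itertools import combinations

DATA = {"P1": (3, 3, 5), "P2": (5, 1, 1), "P3": (1, 4, 4), "P4": (4, 5, 2), "P5": (2, 2, 3)}
DIMS = "ABC"

def dominates(x, y):
    """x dominates y: no larger anywhere, strictly smaller somewhere."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(data, idx):
    """Names of the points not dominated on the dimension indices in idx."""
    proj = {name: tuple(p[i] for i in idx) for name, p in data.items()}
    return {n for n, p in proj.items() if not any(dominates(q, p) for q in proj.values())}

def naive_skycube(data, dims):
    """One independent skyline computation for each of the 2^d - 1 subspaces."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            cube["".join(dims[i] for i in idx)] = skyline(data, idx)
    return cube

cube = naive_skycube(DATA, DIMS)
for name, sky in cube.items():
    print(name, sorted(sky))
```

With d = 3 this runs 7 separate skyline computations; the count doubles with every added dimension.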
Motivation
How to compute Skycube efficiently?
existing skyline techniques are applicable
no shared computation: Not efficient!
Motivation (cont.)
nested-loop-based alg.
  BNL [ICDE 01]: redundant comparison. Not efficient!
  SFS [ICDE 03]: presort the dataset, keep the candidate list minimum; repeated sorting. Not efficient!

[Figure: points P1 to P5 plotted on A and B]

Point   Candidate List   Comparison of Skyline on A   Comparison of Skyline on A and B
P1                       --                           --
P2      P1               P1(A) vs. P2(A)              P1(A) vs. P2(A), P1(B) vs. P2(B)
……
Motivation (cont.)
divide-and-conquer-based alg. (DC [ICDE 01])
repeat same divide/merge steps. Not efficient!
[Figures: Divide Step of Skyline on A and B; Divide Step of Skyline on A, B, and C. Both panels show P1 to P5 divided on A, with split values labelled mA, m'A, m''A]
Outline
Introduction
Skycube Computation Techniques
  Bottom-Up Skycube Algorithm (BUS)
  Top-Down Skycube Algorithm (TDS)
Experiments
Summary
Property of Skycube
Distinct Value Condition: no two data points have the same value on the same dimension
  SKY_U(S): skyline of S on sub-dimension set U
  SKY_U(S) ⊆ SKY_V(S) if U ⊆ V
General Case
  Keep track of the “bad guys”
Basic Idea
compute the Skycube in a level-wise and bottom-up manner
each skyline is computed by a nested-loop-based algorithm
ABC
AB AC BC
A B C
Sharing Strategies
share-results: SKY_U(S) ⊆ SKY_V(S) if U ⊆ V
  reduce the size of input
  reduce the # of dominance tests
share-sorting: sort the dataset on each dimension
  keep the candidate list minimum
  reduce the # of sortings from 2^d − 1 to d
AB
A B
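The share-results idea can be sketched as follows. Under the distinct value condition, every point in a child skyline is already known to be in the parent skyline, so it seeds the candidate list and is never tested itself. This is a simplified nested-loop illustration and not the BUS implementation from the talk: share-sorting is omitted, and `skyline_with_seeds` and `bottom_up_skycube` are hypothetical names.

```python
from itertools import combinations

DATA = {"P1": (3, 3, 5), "P2": (5, 1, 1), "P3": (1, 4, 4), "P4": (4, 5, 2), "P5": (2, 2, 3)}
DIMS = "ABC"

def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline_with_seeds(data, idx, seeds):
    """Nested-loop skyline on idx; `seeds` are points already known to be skyline points."""
    def proj(name):
        return tuple(data[name][i] for i in idx)
    window = list(seeds)                    # seeds need no dominance test of their own
    for name in data:
        if name in seeds:
            continue                        # smaller input: seeds are skipped
        p = proj(name)
        if any(dominates(proj(w), p) for w in window):
            continue                        # dominated, discard
        # keep the newcomer and evict earlier non-seed candidates it dominates
        window = [w for w in window if not dominates(p, proj(w))]
        window.append(name)
    return set(window)

def bottom_up_skycube(data, dims):
    """Level-wise, bottom-up: each skyline is seeded with the skylines one level below."""
    cube = {}
    for k in range(1, len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            seeds = set()
            if k > 1:
                for child in combinations(idx, k - 1):
                    seeds |= cube[child]
            cube[idx] = skyline_with_seeds(data, idx, seeds)
    return {"".join(dims[i] for i in idx): sky for idx, sky in cube.items()}

cube = bottom_up_skycube(DATA, DIMS)
print(sorted(cube["AB"]))   # ['P2', 'P3', 'P5']
```

The seeding is only valid when the distinct value condition holds, as it does for this dataset.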
Filtering Effective Dominance Test
filter function: f(p) = sum of p’s coordinates
no false negative: f(p) ≤ f(q) ⇒ q does not dominate p
maintain the candidate list in non-decreasing order of filter values (e.g. in an AVL tree)

Skyline on A and B, dataset sorted on B
            P2   P5   P1   P3   P4
f_AB(p)     6    4    6    5    9

Point   Candidate List   Comparison (without filter)          Comparison (with filter)
P5      P2               P2(A) vs. P5(A), P2(B) vs. P5(B)     f_AB(P2) vs. f_AB(P5)

[Figure: points P1 to P5 plotted on A and B]
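A small sketch of the filter in use, assuming the filter value is the plain coordinate sum. A sorted Python list maintained with `bisect` stands in for the AVL tree mentioned on the slide; `f` and `filtered_skyline` are illustrative names.

```python
import bisect

def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def f(p):
    """Filter value: the sum of p's coordinates."""
    return sum(p)

def filtered_skyline(points):
    cand = []  # (filter value, point) pairs, kept in non-decreasing filter order
    for p in points:
        fp = f(p)
        cut = bisect.bisect_left(cand, (fp,))   # candidates before `cut` have f < fp
        # only candidates with a strictly smaller filter value can dominate p
        if any(dominates(c, p) for _, c in cand[:cut]):
            continue
        # only candidates with a strictly larger filter value can be dominated by p
        cand = [(fc, c) for fc, c in cand if not (fc > fp and dominates(p, c))]
        bisect.insort(cand, (fp, p))
    return [c for _, c in cand]

# (A, B) values in the slide's order after sorting on B: P2, P5, P1, P3, P4
AB_SORTED_ON_B = [(5, 1), (2, 2), (3, 3), (1, 4), (4, 5)]
print(filtered_skyline(AB_SORTED_ON_B))  # [(2, 2), (1, 4), (5, 1)], i.e. P5, P3, P2
```

When P5 arrives, the only candidate is P2 with f = 6 > 4, so no coordinate-wise comparison is needed to conclude that P2 does not dominate P5.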
DC Algorithm
[Figure: Divide Step: points P1 to P5 on A and B are split at mA into S1 and S2, then at mB into S11, S12, S21, S22. Merge Step.]
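The divide and merge steps can be sketched as below. This is a simplified version that splits on the first dimension at the median position and merges with a nested loop; the DC algorithm in the figure additionally partitions on mB during the merge, which this sketch does not do. `dc_skyline` is an illustrative name.

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def dc_skyline(points):
    """Divide-and-conquer skyline: split on the first dimension, solve halves, merge."""
    if len(points) <= 1:
        return list(points)
    pts = sorted(points)            # order by the first dimension (ties by the rest)
    mid = len(pts) // 2             # pts[mid][0] plays the role of the split value mA
    s1 = dc_skyline(pts[:mid])      # skyline of the half with smaller A
    s2 = dc_skyline(pts[mid:])      # skyline of the half with larger A
    # Merge: a point of S2 survives only if nothing in S1 dominates it.
    # Points of S1 precede S2 in sorted order, so S2 can never dominate them.
    return s1 + [q for q in s2 if not any(dominates(p, q) for p in s1)]

AB = [(3, 3), (5, 1), (1, 4), (4, 5), (2, 2)]                  # P1..P5 on A and B
ABC = [(3, 3, 5), (5, 1, 1), (1, 4, 4), (4, 5, 2), (2, 2, 3)]  # P1..P5 on A, B, C
print(sorted(dc_skyline(AB)))    # [(1, 4), (2, 2), (5, 1)]
print(sorted(dc_skyline(ABC)))   # [(1, 4, 4), (2, 2, 3), (4, 5, 2), (5, 1, 1)]
```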
Sharing Opportunities
share-partitioning
[Figures: skyline on A and B; skyline on A, B, and C. In both panels P1 to P5 are split at mA into S1 and S2, and each panel is labelled with split values … mi … mj …]
Sharing Opportunities (cont.)
share-merging
decompose merge step
skyline on A and B: merge {P3, P5} with {P1, P2} on B
skyline on A, B, and C: merge {P3, P5} with {P1, P2, P4} on BC
  = merge {P3, P5} with {{P1, P2}, {P4}} on BC
  {P3, P5} with {P1, P2} on BC: {P3, P5} with {P1, P2} on B, then {P3, P5} with the above result on C
  {P3, P5} with {P4} on BC
[Figures: P1 to P5 split at mA into S1 and S2, plotted on A and B and on A and BC]
TDS Algorithm
Basic Idea
  compute skylines on a path simultaneously
  find a minimal set of paths
  share-parent: using parent’s skyline result as the input
ABC
AB AC BC
A B C
[Figure: the lattice decomposed into paths (e.g. ABC, AB, A), with inputs S and SKY_ABC(S)]
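The share-parent idea can be illustrated on one path of the lattice. The sketch walks ABC, AB, A and feeds each skyline in as the input of the next subspace, which is valid under the distinct value condition. It computes the skylines one after another with a nested loop, whereas TDS computes a path simultaneously; the path and the helper `skyline` are illustrative assumptions.

```python
DATA = {"P1": (3, 3, 5), "P2": (5, 1, 1), "P3": (1, 4, 4), "P4": (4, 5, 2), "P5": (2, 2, 3)}

def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(data, idx, names):
    """Skyline on the dimension indices idx, restricted to the points in `names`."""
    proj = {n: tuple(data[n][i] for i in idx) for n in names}
    return {n for n, p in proj.items() if not any(dominates(q, p) for q in proj.values())}

path = [("ABC", (0, 1, 2)), ("AB", (0, 1)), ("A", (0,))]
result = {}
current_input = set(DATA)                 # the top of the path starts from the full dataset S
for name, idx in path:
    result[name] = skyline(DATA, idx, current_input)
    current_input = result[name]          # share-parent: the child only sees the parent's skyline

for name, sky in result.items():
    print(name, sorted(sky))
```

Here the skyline on AB is computed from 4 input points instead of 5, and the skyline on A from 3.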
Outline
Introduction
Skycube Computation Techniques
Experiments
Summary
Experiment Setting
Algorithms (* our sharing strategies applied)
  BNLS: BNL-Skycube algorithm *
  SFSS: SFS-Skycube algorithm *
  DCS: DC-Skycube algorithm *
  BUS: Bottom-Up Skycube algorithm
  TDS: Top-Down Skycube algorithm
Dataset: correlated, independent, anti-correlated
Dimensionality: d ∈ [4, 10]
Cardinality: n ∈ [100k, 500k]
Effect of Dimensionality (cont.)
[Figures: correlated and anti-correlated datasets; x-axis: Dimensionality (n = 500k)]
Outline
Introduction
Skycube Computation Techniques
Experiments
Summary
Summary
A novel concept: Skycube
Skycube computation techniques
  Bottom-Up Skycube algorithm: share-results, share-sorting
  Top-Down Skycube algorithm: share-partition-and-merging, share-parent
Future Work
  I/O based techniques
  multiple skyline queries
Preliminaries
Existing Skyline Computation Algorithms
nested-loop-based
  Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
  Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]
divide-and-conquer-based
  Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]
index-based
  Bitmap, Index-Method [TEO, VLDB 01]
  R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]
Preliminaries: BNL and SFS Algorithms
BNL algorithm

Current   Cand. List     Results
P1                       P1
P2        P1             P1, P2
P3        P1, P2         P1, P2, P3
P4        P1, P2, P3     P1, P2, P3
P5        P1, P2, P3     P2, P3, P5

SFS algorithm
  entropy value (indicator of the dominance power)
  pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})

[Figure: points P1 to P5 plotted on A and B]
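A minimal sketch of the SFS idea: presort by a monotone score so that no point can be dominated by one that comes after it, then make a single pass in which an accepted candidate is never evicted. The score used here, the sum of ln(v + 1), is an assumed entropy-style function on raw values, so the resulting sort order need not match the example order on the slide.

```python
import math

def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def entropy(p):
    """Assumed monotone score: if q dominates p, then entropy(q) < entropy(p)."""
    return sum(math.log(v + 1) for v in p)

def sfs_skyline(points):
    result = []                                   # candidate list; entries are final
    for p in sorted(points, key=entropy):         # presort once
        if not any(dominates(c, p) for c in result):
            result.append(p)                      # nothing seen later can dominate p
    return result

AB = [(3, 3), (5, 1), (1, 4), (4, 5), (2, 2)]     # P1..P5 on A and B
print(sorted(sfs_skyline(AB)))                    # [(1, 4), (2, 2), (5, 1)], i.e. P3, P5, P2
```

Unlike the BNL trace in the table, where P1 enters the candidate list and is later replaced by P5, the presort ensures P1 is rejected on arrival.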
Preliminaries: DC Algorithm
[Figure: Divide Step: points P1 to P5 on A and B are split at mA into S1 and S2, then at mB into S11, S12, S21, S22. Merge Step.]
General Case
Issue: SKY_U(S) ⊆ SKY_V(S) does not necessarily hold
Solution
  share-results: re-examine SKY_U(S) on V
SKY_B(S) = {P3, P4, P5}
SKY_AB(S) = {P3}
[Figure: points P1 to P5 plotted on A and B]
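The issue can be reproduced with a small dataset in which P3, P4 and P5 tie on B. The coordinates below are hypothetical, chosen only to match the two skylines on the slide. The re-examination shown is one simple way to do it, not necessarily the authors' procedure: if a point of SKY_B(S) is dominated on AB, its dominator must have the same B value and therefore also lies in SKY_B(S), so testing the members of SKY_B(S) against each other is enough.

```python
# Hypothetical (A, B) coordinates: P3, P4 and P5 share the smallest B value.
DATA = {"P1": (2, 3), "P2": (3, 3), "P3": (1, 1), "P4": (3, 1), "P5": (2, 1)}

def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(data, idx, names=None):
    """Skyline on dimension indices idx, over all points or only those in `names`."""
    names = data.keys() if names is None else names
    proj = {n: tuple(data[n][i] for i in idx) for n in names}
    return {n for n, p in proj.items() if not any(dominates(q, p) for q in proj.values())}

sky_b = skyline(DATA, (1,))
sky_ab = skyline(DATA, (0, 1))
print(sorted(sky_b))       # ['P3', 'P4', 'P5']
print(sorted(sky_ab))      # ['P3']
print(sky_b <= sky_ab)     # False: containment fails when values are shared

# Re-examine SKY_B(S) on AB: keep only the members not dominated by another member.
confirmed = skyline(DATA, (0, 1), names=sky_b)
print(sorted(confirmed))   # ['P3']
```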