Efficient Computation of the Skyline Cube

Yidong Yuan
School of Computer Science & Engineering
The University of New South Wales & NICTA
Sydney, Australia

Joint Work: Xuemin Lin (UNSW), Qing Liu (UNSW), Wei Wang (UNSW), Jeffrey Xu Yu (CUHK), Qing Zhang (UNSW & CSIRO)

VLDB 2005 Yidong Yuan @DBG.UNSW 2

Outline

Introduction
Skycube Computation Techniques
Experiments
Summary

Skyline Query

A real estate example

A skyline query returns the data points that are not dominated by any other point.

Properties and Values

      price (100K)   dist   age   …
P1    3              3      5     …
P2    5              1      1     …
P3    1              4      4     …
P4    4              5      2     …
P5    2              2      3     …

[Figure: P1–P5 plotted on axes price and dist; skyline on price & dist]

[Figure: P1–P5 plotted on axes price and age; skyline on price & age]

$(x_1, x_2, \dots, x_d)$ dominates $(y_1, y_2, \dots, y_d)$ if $\forall i,\ x_i \le y_i$ and $\exists k,\ x_k < y_k$.
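The dominance definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the names `HOUSES`, `dominates`, and `skyline` are hypothetical, the coordinates are the (price, dist, age) values from the example table, and smaller values are assumed to be better on every dimension.

```python
# Hypothetical names; coordinates are (price, dist, age) from the example table.
HOUSES = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
PRICE, DIST, AGE = 0, 1, 2


def dominates(x, y, dims):
    """x dominates y on dims: x is no worse everywhere and strictly better somewhere."""
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    """Names of the points that no other point dominates on dims."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    }


print(sorted(skyline(HOUSES, (PRICE, DIST))))  # ['P2', 'P3', 'P5']
print(sorted(skyline(HOUSES, (PRICE, AGE))))   # ['P2', 'P3', 'P4', 'P5']
```

A point never dominates itself (the strict inequality fails), so comparing each point against the whole dataset is safe.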

Skyline Cube

Skycube: the union of the skyline results on all $2^d - 1$ non-empty subsets of the $d$ dimensions, for example:

Skyline on price & dist & age
Skyline on price & dist
Skyline on price & age
……

Dataset

      A   B   C
P1    3   3   5
P2    5   1   1
P3    1   4   4
P4    4   5   2
P5    2   2   3

Skycube Example

       skyline
ABC    {P2, P3, P4, P5}
AB     {P2, P3, P5}
AC     {P2, P3, P4, P5}
BC     {P2}
A      {P3}
B      {P2}
C      {P2}

Lattice Structure of a Skycube

        ABC
   AB   AC   BC
   A    B    C
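As a baseline for the example above, the skycube can be computed naively by running an independent skyline query on each of the $2^d - 1$ subspaces. The sketch below does exactly that with no sharing; `POINTS`, `naive_skycube`, and the helper names are hypothetical and only the dataset values come from the slide.

```python
from itertools import combinations

# Dataset from the slide; dimension indices 0, 1, 2 stand for A, B, C.
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}
DIM_NAMES = "ABC"


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    }


def naive_skycube(points, d):
    """One independent skyline per non-empty subset of the d dimensions."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = skyline(points, dims)
    return cube


cube = naive_skycube(POINTS, 3)
for dims, sky in cube.items():
    print("".join(DIM_NAMES[i] for i in dims), sorted(sky))
```

Every cuboid here rescans the full dataset, which is the redundancy the sharing strategies in the rest of the talk aim to remove.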

Motivation

How to compute the Skycube efficiently?

Existing skyline techniques are applicable, but they share no computation across the skylines. Not efficient!

Motivation (cont.)

nested-loop-based algorithms

BNL [ICDE 01]: redundant comparisons. Not efficient!

SFS [ICDE 03]: presorts the dataset to keep the candidate list minimal, but requires repeated sorting. Not efficient!

[Figure: P1–P5 plotted on axes A and B]

      Candidate List   Comparison of Skyline on A   Comparison of Skyline on A and B
P1                     --                           --
P2    P1               P1(A) vs. P2(A)              P1(A) vs. P2(A), P1(B) vs. P2(B)
……

Motivation (cont.)

divide-and-conquer-based algorithm (DC [ICDE 01]): repeats the same divide/merge steps. Not efficient!

[Figure: divide step of the skyline on A and B; P1–P5 on axes A and B, split values mA, m'A, m''A]

[Figure: divide step of the skyline on A, B, and C; P1–P5 on axes A and BC, split values mA, m'A, m''A]

Outline

Introduction
Skycube Computation Techniques
    Bottom-Up Skycube Algorithm (BUS)
    Top-Down Skycube Algorithm (TDS)
Experiments
Summary

Property of Skycube

Distinct Value Condition: no two data points have the same value on the same dimension.

$SKY_U(S)$: the skyline of S on the sub-dimension set U.

Under the distinct value condition, $SKY_U(S) \subseteq SKY_V(S)$ if $U \subseteq V$.

General Case: keep track of the “bad guys”.

Basic Idea

compute the Skycube in a level-wise and bottom-up manner

each skyline is computed by a nested-loop-based algorithm

        ABC
   AB   AC   BC
   A    B    C

Sharing Strategies

share-results: $SKY_U(S) \subseteq SKY_V(S)$ for $U \subseteq V$
    reduces the size of the input
    reduces the number of dominance tests

share-sorting: sort the dataset on each dimension
    keeps the candidate list minimal
    reduces the number of sorts from $2^d - 1$ to $d$

[Figure: lattice fragment with AB above A and B]
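The share-results idea can be sketched as a level-wise, bottom-up loop in which each cuboid starts from the union of its children's skylines. This is a simplified illustration under the distinct value condition, not the BUS algorithm as presented: it omits share-sorting and the filter, and the names `skyline_seeded` and `bottom_up_skycube` are hypothetical.

```python
from itertools import combinations

# Example dataset from the talk; it satisfies the distinct value condition.
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_seeded(points, dims, seed):
    """Nested-loop skyline on dims; every name in `seed` is a known member."""
    window = list(seed)
    for name, p in points.items():
        if name in seed:
            continue  # already known to be in the skyline: no test needed
        if any(dominates(points[w], p, dims) for w in window):
            continue  # p is dominated by a current candidate
        # Seed members can never be dominated, so only non-seed candidates are re-tested.
        window = [w for w in window if w in seed or not dominates(p, points[w], dims)]
        window.append(name)
    return set(window)


def bottom_up_skycube(points, d):
    """Level-wise computation: children's skylines seed each parent cuboid."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            seed = set()
            for child in combinations(dims, k - 1):
                seed |= cube.get(child, set())
            cube[dims] = skyline_seeded(points, dims, seed)
    return cube


cube = bottom_up_skycube(POINTS, 3)
print(sorted(cube[(0, 1)]))     # ['P2', 'P3', 'P5']
print(sorted(cube[(0, 1, 2)]))  # ['P2', 'P3', 'P4', 'P5']
```

For ABC, the seed already contains P2, P3, P4, and P5, so only P1 needs a dominance test. With duplicate values the seeding step would be unsound as written, since child skyline members would have to be re-examined.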

Filtering Effective Dominance Test

filter function: $f(p)$ = sum of p’s coordinates
no false negatives: $f(p) \le f(q) \Rightarrow$ q does not dominate p

maintain the candidate list in non-decreasing order of filter values (e.g. in an AVL tree)

[Figure: P1–P5 plotted on axes A and B]

Sort on B:                  P2   P5   P1   P3   P4
Filter value on AB, f(p):   6    4    6    5    9

Skyline on A and B

      Candidate List   Comparison (without filter)        Comparison (with filter)
P5    P2               P2(A) vs. P5(A), P2(B) vs. P5(B)   f(P2) vs. f(P5)
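The filter can be sketched as follows. If q dominates p, then the coordinate sum of q is strictly smaller than that of p, so only candidates with a smaller filter value need the full per-dimension test. In this sketch a sorted Python list with `bisect` stands in for the AVL tree mentioned on the slide, and the names `filter_value` and `dominated_with_filter` are hypothetical.

```python
import bisect


def filter_value(p, dims):
    """Sum of p's coordinates on dims."""
    return sum(p[i] for i in dims)


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def dominated_with_filter(p, dims, cand_keys, cand_points):
    """Test p against candidates kept in non-decreasing filter order.

    cand_keys holds the sorted filter values and cand_points the matching points.
    Returns (is_dominated, number_of_full_dominance_tests).
    """
    fp = filter_value(p, dims)
    # Candidates at or beyond this index have f(q) >= f(p) and cannot dominate p.
    cutoff = bisect.bisect_left(cand_keys, fp)
    full_tests = 0
    for q in cand_points[:cutoff]:
        full_tests += 1
        if dominates(q, p, dims):
            return True, full_tests
    return False, full_tests


AB = (0, 1)
P1, P2, P5 = (3, 3), (5, 1), (2, 2)

# P5 against candidate list [P2]: f(P2) = 6 >= f(P5) = 4, so no full test is run.
print(dominated_with_filter(P5, AB, [6], [P2]))        # (False, 0)
# P1 against candidates [P5, P2] with filter values [4, 6]: only P5 is tested.
print(dominated_with_filter(P1, AB, [4, 6], [P5, P2]))  # (True, 1)
```

The first call mirrors the slide's row for P5: a single comparison of filter values replaces the two coordinate comparisons.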

DC Algorithm

[Figure: DC on P1–P5 over axes A and B. Divide step: split at mA into S1 and S2, then at mB into S11, S12, S21, and S22. Merge step.]
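A divide-and-conquer skyline in the spirit of the figure can be sketched as below. This is a simplified variant, not the DC algorithm of [ICDE 01]: it splits at the median of one dimension and merges with a plain pairwise check between the two partial skylines, whereas the original merge step recurses on the remaining dimensions. The name `dc_skyline` is hypothetical.

```python
def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def dc_skyline(points, split_dim=0):
    """Skyline of a list of coordinate tuples by divide and conquer."""
    if len(points) <= 1:
        return list(points)
    # Divide step: split at the median of split_dim into S1 and S2.
    ordered = sorted(points, key=lambda p: p[split_dim])
    mid = len(ordered) // 2
    s1 = dc_skyline(ordered[:mid], split_dim)
    s2 = dc_skyline(ordered[mid:], split_dim)
    # Merge step (simplified): drop points dominated by the other half's skyline.
    keep1 = [p for p in s1 if not any(dominates(q, p) for q in s2)]
    keep2 = [p for p in s2 if not any(dominates(q, p) for q in s1)]
    return keep1 + keep2


# (A, B) values of P1..P5 from the example dataset.
AB_POINTS = [(3, 3), (5, 1), (1, 4), (4, 5), (2, 2)]
print(sorted(dc_skyline(AB_POINTS)))  # [(1, 4), (2, 2), (5, 1)]
```

The result corresponds to {P3, P5, P2}, matching the AB entry of the skycube example. The merge checks both directions so that ties on the split dimension are handled correctly.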

Sharing Opportunities

share-partitioning

[Figure: P1–P5 split at mA into S1 and S2, once for the skyline on A and B (axes A, B) and once for the skyline on A, B, and C (axes A, BC); each panel lists its split values mi, mj, …]

Sharing Opportunities (cont.)

share-merging

skyline on A and B: merge {P3, P5} with {P1, P2} on B
skyline on A, B, and C: merge {P3, P5} with {P1, P2, P4} on BC

decompose merge step
    {P3, P5} with {{P1, P2}, {P4}} on BC
    {P3, P5} with {P1, P2} on BC: merge {P3, P5} with {P1, P2} on B, then merge {P3, P5} with the above result on C
    {P3, P5} with {P4} on BC

[Figure: P1–P5 split at mA into S1 and S2, shown on axes A, B and on axes A, BC]

TDS Algorithm

Basic Idea
    compute the skylines on a path simultaneously
    find a minimal set of paths
    share-parent: use the parent’s skyline result as the input

[Figure: skycube lattice (ABC; AB, AC, BC; A, B, C) and its paths, including ABC, AB, A; inputs labelled S and $SKY_{ABC}(S)$]
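The share-parent idea can be sketched as a top-down pass in which each cuboid is computed from one parent's skyline instead of the full dataset. This is a simplified illustration under the distinct value condition, not the TDS algorithm as presented: it does not compute the skylines on a path simultaneously or share partitioning and merging, and the names `skyline_of` and `top_down_skycube` are hypothetical.

```python
from itertools import combinations

# Example dataset from the talk; it satisfies the distinct value condition.
POINTS = {
    "P1": (3, 3, 5),
    "P2": (5, 1, 1),
    "P3": (1, 4, 4),
    "P4": (4, 5, 2),
    "P5": (2, 2, 3),
}


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline_of(names, points, dims):
    """Skyline on dims, restricted to the given input names."""
    return {
        n
        for n in names
        if not any(dominates(points[m], points[n], dims) for m in names)
    }


def top_down_skycube(points, d):
    """Compute the full-space skyline first, then each cuboid from one parent."""
    full = tuple(range(d))
    cube = {full: skyline_of(set(points), points, full)}
    for k in range(d - 1, 0, -1):
        for dims in combinations(range(d), k):
            # Pick one parent: dims plus the smallest missing dimension.
            extra = next(i for i in range(d) if i not in dims)
            parent = tuple(sorted(dims + (extra,)))
            cube[dims] = skyline_of(cube[parent], points, dims)
    return cube


cube = top_down_skycube(POINTS, 3)
print(sorted(cube[(0, 1)]))  # ['P2', 'P3', 'P5']
print(sorted(cube[(0,)]))    # ['P3']
```

Using the parent's result is sound here because, with distinct values, every member of a child skyline is also a member of its parent's skyline, so no skyline point is lost by shrinking the input.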

Outline

Introduction
Skycube Computation Techniques
Experiments
Summary

Experiment Setting

Algorithms (* our sharing strategies applied)

BNLS: BNL-Skycube algorithm *

SFSS: SFS-Skycube algorithm *

DCS: DC-Skycube algorithm *

BUS: Bottom-Up Skycube algorithm

TDS: Top-Down Skycube algorithm

Dataset: correlated, independent, anti-correlated

Dimensionality: $d \in [4, 10]$

Cardinality: $n \in [100k, 500k]$

Effect of Dimensionality

[Figure: independent data; x-axis: dimensionality (n = 500k)]

Effect of Dimensionality (cont.)

[Figure: correlated data; x-axis: dimensionality (n = 500k)]

[Figure: anti-correlated data; x-axis: dimensionality (n = 500k)]

Effect of Cardinality

[Figure: anti-correlated data; x-axis: cardinality (×100K), d = 8]

Effect of Duplicate Values

[Figure: independent data (d = 8)]

Outline

Introduction
Skycube Computation Techniques
Experiments
Summary

Summary

A novel concept –– Skycube

Skycube computation techniques
    Bottom-Up Skycube algorithm: share-results, share-sorting
    Top-Down Skycube algorithm: share-partition-and-merging, share-parent

Future Work
    I/O based techniques
    multiple skyline queries

Q&A

Thank you.

Preliminaries

Existing Skyline Computation Algorithms

nested-loop-based
    Block-Nested-Loop (BNL) algorithm [BKS, ICDE 01]
    Sort-Filter-Skyline (SFS) algorithm [CGG+, ICDE 03]

divide-and-conquer-based
    Divide-and-Conquer (DC) algorithm [BKS, ICDE 01]

index-based
    Bitmap, Index-Method [TEO, VLDB 01]
    R-tree Index Based [KRR, VLDB 02; PTF+, SIGMOD 03]

Preliminaries –– BNL and SFS Algorithms

BNL algorithm

SFS algorithm
    entropy value (indicator of the dominance power)
    pre-sort the dataset (e.g., {P5, P2, P3, P1, P4})

[Figure: P1–P5 plotted on axes A and B]

Current   Cand. List    Results
P1                      P1
P2        P1            P1, P2
P3        P1, P2        P1, P2, P3
P4        P1, P2, P3    P1, P2, P3
P5        P1, P2, P3    P2, P3, P5
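The trace in the table can be reproduced with a small in-memory sketch of BNL, followed by an SFS-style variant that presorts by a monotone score. Both are simplified assumptions: the real algorithms manage a bounded window and disk passes, the function names are hypothetical, and the score $\sum_i \ln(v_i + 1)$ is used here only as one monotone choice, so the resulting order is not meant to reproduce the slide's example order.

```python
import math

# (A, B) values of P1..P5 from the example dataset.
POINTS = {"P1": (3, 3), "P2": (5, 1), "P3": (1, 4), "P4": (4, 5), "P5": (2, 2)}


def dominates(x, y):
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def bnl(points, order):
    """Block-nested-loop style scan; returns the candidate list after each point."""
    window, trace = [], []
    for name in order:
        p = points[name]
        if not any(dominates(points[w], p) for w in window):
            # p survives: evict candidates it dominates, then keep p.
            window = [w for w in window if not dominates(p, points[w])]
            window.append(name)
        trace.append(list(window))
    return trace


def sfs(points):
    """Presort by a monotone score so that candidates are never evicted."""
    def score(name):
        return sum(math.log(v + 1) for v in points[name])

    result = []
    for name in sorted(points, key=score):
        if not any(dominates(points[r], points[name]) for r in result):
            result.append(name)
    return result


for step in bnl(POINTS, ["P1", "P2", "P3", "P4", "P5"]):
    print(step)
print(sorted(sfs(POINTS)))  # ['P2', 'P3', 'P5']
```

In the BNL trace P1 enters the candidate list and is evicted only when P5 arrives, which is the redundant work that presorting avoids: after sorting by a monotone score, a later point can never dominate an earlier one.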

Preliminaries –– DC Algorithm

[Figure: DC on P1–P5 over axes A and B. Divide step: split at mA into S1 and S2, then at mB into S11, S12, S21, and S22. Merge step.]

General Case

Issue: $SKY_U(S) \subseteq SKY_V(S)$ does not necessarily hold

Solution
    share-results: re-examine $SKY_U(S)$ on V

Example: $SKY_B(S)$ = {P3, P4, P5}, $SKY_{AB}(S)$ = {P3}

[Figure: P1–P5 plotted on axes A and B]
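The example can be reproduced with made-up coordinates. The slide does not give the values behind the figure, so the (A, B) pairs below are hypothetical, chosen only so that P3, P4, and P5 tie on B; the helper names are also hypothetical.

```python
# Hypothetical (A, B) coordinates: P3, P4 and P5 share the smallest B value.
S = {"P1": (2, 3), "P2": (4, 2), "P3": (1, 1), "P4": (3, 1), "P5": (2, 1)}
A, B = 0, 1


def dominates(x, y, dims):
    return all(x[i] <= y[i] for i in dims) and any(x[i] < y[i] for i in dims)


def skyline(points, dims):
    return {
        n
        for n, p in points.items()
        if not any(dominates(q, p, dims) for q in points.values())
    }


sky_b = skyline(S, (B,))
sky_ab = skyline(S, (A, B))
print(sorted(sky_b))    # ['P3', 'P4', 'P5']
print(sorted(sky_ab))   # ['P3']
print(sky_b <= sky_ab)  # False: the containment fails with duplicate values

# share-results in the general case: re-examine SKY_B(S) on AB before reusing it.
# Any point that dominates a member of SKY_B(S) on AB must tie with it on B,
# so it is itself in SKY_B(S) and the re-check can stay within that set.
reusable = {
    n
    for n in sky_b
    if not any(dominates(S[m], S[n], (A, B)) for m in sky_b)
}
print(sorted(reusable))  # ['P3']
```

Only the members that survive the re-examination can be carried up to the parent cuboid; P4 and P5 are discarded because P3 ties with them on B and is strictly better on A.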

Motivation (cont.)

other techniques
    Index method [VLDB 01]
    R-tree based index [VLDB 02; SIGMOD 03]

pre-computation (e.g. index) is not reusable, so the pre-computation is repeated. Not efficient!

Goal: maximize shared computation!