An optimal and progressive algorithm for skyline queries slide

INI Lab. An Optimal and Progressive Algorithm for Skyline Queries Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger ACM SIGMOD’ 2003 Presenters KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM,

Transcript of An optimal and progressive algorithm for skyline queries slide

Page 1: An optimal and progressive algorithm for skyline queries slide

INI Lab.

An Optimal and Progressive Algorithm for Skyline Queries

Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger

ACM SIGMOD’ 2003

Presenters: KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM

Page 2: An optimal and progressive algorithm for skyline queries slide

Abstract

An Optimal

and Progressive Algorithm

for Skyline Queries

Using R-Tree

Page 3: An optimal and progressive algorithm for skyline queries slide

Contents

1. Introduction

2. Related Work

2.1 Block Nested Loop (BNL)

2.5 Nearest Neighbor (NN)

3. Branch and Bound Skyline Algorithm

With I/O analysis

5. Experimental Evaluation

Page 4: An optimal and progressive algorithm for skyline queries slide

Skyline

Problem definition

Page 5: An optimal and progressive algorithm for skyline queries slide

Which one do you prefer?

http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html

http://drmoontv.blogspot.kr/2013/03/blog-post_17.html

http://emperia.egloos.com/m/2516211

5,000 Won

40,000 Won

4,500 Won

http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting

Good value >> poor value

Page 6: An optimal and progressive algorithm for skyline queries slide

Preliminaries

Formal definition of Dominates ($\ll$)

Given a set $T$ of $d$-dimensional points,

we say that a point $t_1 \in T$ DOMINATES another point $t_2 \in T$

if and only if

$\forall i \in \{1, 2, 3, \ldots, d\}:\ t_1[i] \ge t_2[i]$, and

$\exists j \in \{1, 2, 3, \ldots, d\}:\ t_1[j] > t_2[j]$.

This is denoted by $t_2 \ll t_1$

(simply put, $t_1$ is the better deal).

Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf

Note that the meaning of ‘dominates’ may differ

according to the type of application.
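
The definition above translates almost directly into code. The following is a minimal sketch, assuming points are plain tuples of numbers and that larger values are better in every dimension, as in the definition; the function name `dominates` is only illustrative.

```python
from typing import Sequence


def dominates(t1: Sequence[float], t2: Sequence[float]) -> bool:
    """Return True if t1 dominates t2 (assumption: larger values are better)."""
    if len(t1) != len(t2):
        raise ValueError("points must have the same dimensionality")
    # t1 must be at least as good as t2 in every dimension ...
    no_worse = all(a >= b for a, b in zip(t1, t2))
    # ... and strictly better in at least one dimension.
    strictly_better = any(a > b for a, b in zip(t1, t2))
    return no_worse and strictly_better
```

Note that a point never dominates itself under this definition, because the strict inequality fails in every dimension.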

Page 7: An optimal and progressive algorithm for skyline queries slide

Which one do you prefer?

http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html

http://drmoontv.blogspot.kr/2013/03/blog-post_17.html

http://emperia.egloos.com/m/2516211

5,000 Won

40,000 Won

4,500 Won

4,500 Won

http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting

Still: good value >> poor value

Page 8: An optimal and progressive algorithm for skyline queries slide

Hotel(attraction, 1/price, 1/distance)

Two hotels

A : `80`, `1/15,000`, `1/500m`

B : `30`, `1/20,000`, `1/1500m`

𝐵 ≪ 𝐴

Why?

30<80

1/20,000 < 1/15,000

1/1,500m < 1/500m

[Figure: for example, hotels A and B plotted on axes labeled attraction and 1/price; A dominates B.]
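
The hotel comparison can be checked mechanically. This is a small sketch using the numbers from the slide; the helper `to_point` and the idea of storing price and distance as reciprocals (so that larger is better in all three dimensions) are assumptions made for illustration.

```python
def to_point(attraction, price, distance_m):
    """Map a hotel to (attraction, 1/price, 1/distance) so larger is always better."""
    return (attraction, 1.0 / price, 1.0 / distance_m)


def dominates(t1, t2):
    """True if t1 is no worse everywhere and strictly better somewhere."""
    return (all(a >= b for a, b in zip(t1, t2))
            and any(a > b for a, b in zip(t1, t2)))


hotel_a = to_point(80, 15_000, 500)    # hotel A from the slide
hotel_b = to_point(30, 20_000, 1_500)  # hotel B from the slide

a_dominates_b = dominates(hotel_a, hotel_b)
```

Taking reciprocals is only one way to turn "smaller is better" attributes into "larger is better" ones; negating the values would work as well.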

Page 9: An optimal and progressive algorithm for skyline queries slide

Very important

Problem Definition (mathematical)

The Skyline operator

Input: a set of objects $P = \{p_1, p_2, \ldots, p_N\}$

Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p_i \ll p^*\}$

[Figure: points A, B, C, D, E, F, G on x and y axes, with Dominating Area(B) marked.]

Common misconceptions

“$B \in \text{Output}$ since $B \gg C, D, F$”: wrong

“$B \in \text{Output}$ since no other point $p \in P$ satisfies $p \gg B$”: correct

Page 10: An optimal and progressive algorithm for skyline queries slide

Naïve approach

for processing skyline queries

Page 11: An optimal and progressive algorithm for skyline queries slide

Exhaustive Test

Suppose there are n objects in the given set

$D = \{o_1, o_2, \ldots, o_n\}$

Algorithm - Naïve 1

    S = ∅
    for each object o_x ∈ D
        boolean isDominated = false
        for each object o_y ∈ D
            if ¬(o_x = o_y) AND (o_x ≪ o_y) then
                isDominated = true;
                break;
        if !isDominated then S = S ∪ {o_x}

[Figure: points A, B, C, D, E, F, G.]
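
The exhaustive test can be written as a short runnable function. This is a sketch under the assumption that objects are tuples and larger values are better; the names `naive_skyline` and `dominates` are illustrative, not taken from the paper.

```python
def dominates(t1, t2):
    """True if t1 dominates t2 (assumption: larger values are better)."""
    return (all(a >= b for a, b in zip(t1, t2))
            and any(a > b for a, b in zip(t1, t2)))


def naive_skyline(data):
    """Compare every object against every other object: O(n^2) dominance tests."""
    skyline = []
    for o_x in data:
        is_dominated = False
        for o_y in data:
            # An object never dominates itself, so no explicit o_x != o_y test is needed.
            if dominates(o_y, o_x):
                is_dominated = True
                break
        if not is_dominated:
            skyline.append(o_x)
    return skyline


sample = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]  # hypothetical data
sample_skyline = naive_skyline(sample)
```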

Page 12: An optimal and progressive algorithm for skyline queries slide


Exhaustive Test

Nested Loop Structure

Modification: (Algorithm - Naïve 2)

Idea 1. Use Nested Loop Structure

Idea 2. Take advantage of ‘Block-transfer’

towards better re-usability!

[Figure: points A, B, C, D, E, F, G grouped into Block A and Block B.]

The Inherited Limitation of these approaches

1. They need a full scan over the data

2. Yet the query result contains only a small fraction of the dataset

3. That is, these approaches are wasteful
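
One way to read the two ideas of Naïve 2 is a block-wise nested loop, where each block that is brought into memory is reused for all objects of the current outer block. The sketch below is an interpretation, not the presenters' exact algorithm: in-memory list slices stand in for disk blocks, larger values are assumed to be better, and `block_size` is a hypothetical parameter.

```python
def dominates(t1, t2):
    """True if t1 dominates t2 (assumption: larger values are better)."""
    return (all(a >= b for a, b in zip(t1, t2))
            and any(a > b for a, b in zip(t1, t2)))


def blocks(data, block_size):
    """Yield consecutive slices of the data; each slice models one block transfer."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def block_nested_loop_skyline(data, block_size):
    skyline = []
    for outer in blocks(data, block_size):
        alive = [True] * len(outer)
        for inner in blocks(data, block_size):
            # The loaded inner block is reused for every object of the outer block.
            for k, o_x in enumerate(outer):
                if alive[k] and any(dominates(o_y, o_x) for o_y in inner):
                    alive[k] = False
        skyline.extend(o for o, ok in zip(outer, alive) if ok)
    return skyline
```

Even with block reuse, every object is still read at least once, which is the full-scan limitation listed above.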

Page 13: An optimal and progressive algorithm for skyline queries slide

R-Tree Index Approach

for processing skyline queries

Page 14: An optimal and progressive algorithm for skyline queries slide

Preliminaries

R-Tree

Nearest Neighbor Query

Page 15: An optimal and progressive algorithm for skyline queries slide

Preliminaries

R-Tree: a balanced tree for indexing multi-dimensional objects

Supports dynamic operations (insert, update, delete)


Page 16: An optimal and progressive algorithm for skyline queries slide

R-Tree vs. B-Tree

B+-Tree

Balanced: requires that all leaves be at the same depth

Leaf nodes contain one-dimensional values

R-Tree

Similar to B+-Tree

Leaf nodes contain d-dimensional values

http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html


Page 17: An optimal and progressive algorithm for skyline queries slide

Spatial objects (or d-dimensional objects or geometric objects)

d-dimensional object?

R-Tree: used for the organization of

a set of d-dimensional objects

How? Main Idea

Minimum Bounding Rectangles (MBRs)

http://caversham.otago.ac.nz/research/geog.php

<Objects in 2-dimensional space>
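
A minimum bounding rectangle is simple to compute for a set of points. The sketch below assumes points are tuples and represents an axis-parallel MBR by its lower and upper corner; the function name `mbr` is illustrative.

```python
def mbr(points):
    """Return (lower_corner, upper_corner) of the axis-parallel bounding box."""
    if not points:
        raise ValueError("cannot bound an empty set of points")
    # zip(*points) groups the coordinates dimension by dimension.
    lower = tuple(min(axis) for axis in zip(*points))
    upper = tuple(max(axis) for axis in zip(*points))
    return lower, upper


box = mbr([(6, 4), (8, 7), (7, 5)])  # hypothetical 2-dimensional points
```

The same function works unchanged for d-dimensional points, which is why an R-Tree node can summarize all of its children with two corners.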

Page 18: An optimal and progressive algorithm for skyline queries slide

Quiz

What is the minimum number of points for representing

a rectangle?

Assumption: each rectangle is parallel to the coordinate axes

[Figure: an axis-parallel rectangle on x and y axes with origin 0; tick labels 6, 8, 4, 7.]


Page 19: An optimal and progressive algorithm for skyline queries slide

Demonstration

R-Tree Simulator

Page 20: An optimal and progressive algorithm for skyline queries slide

Nearest Neighbor (NN) Query Processing using R-Tree

Nearest Neighbor Query

Input

A set of objects $P = \{p_1, p_2, \ldots, p_N\}$

Query point: $q$

Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } L_p(p_i, q) > L_p(p^*, q)\}$

[Figure: x and y axes with origin 0.]

See how it works in appendix
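
Before bringing in the R-Tree, the NN query definition can be made concrete with a brute-force scan. This sketch assumes points are tuples; `lp_distance` and `nearest_neighbor` are illustrative names, and the default `p=2.0` (Euclidean) is an arbitrary choice.

```python
def lp_distance(a, b, p=2.0):
    """L_p distance between two points of equal dimensionality."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbor(points, q, p=2.0):
    """Return a point of `points` with the smallest L_p distance to q."""
    return min(points, key=lambda pt: lp_distance(pt, q, p))


nn = nearest_neighbor([(1, 1), (5, 5), (2, 3)], q=(2, 2))
```

The scan touches every point; an R-Tree answers the same query while skipping whole subtrees.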


Page 21: An optimal and progressive algorithm for skyline queries slide

[Figure: x and y axes with origin 0; a root node with entries 0 and 1, annotated with MINDIST(X, 0), MINDIST(X, 1), MINMAXDIST(X, 0), and MINMAXDIST(X, 1).]

Key IDEA

!Pruning!

http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/

http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html
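
Pruning relies on MINDIST, a lower bound on the distance from the query point to anything stored inside an MBR. The following sketch assumes the Euclidean metric and an MBR given by its lower and upper corner; MINMAXDIST is omitted, and the function name `mindist` is illustrative.

```python
import math


def mindist(q, lower, upper):
    """Smallest possible Euclidean distance from q to the box [lower, upper]."""
    total = 0.0
    for qi, lo, hi in zip(q, lower, upper):
        # Per-dimension gap: zero if qi lies within [lo, hi], else distance to the nearer face.
        gap = max(lo - qi, 0.0, qi - hi)
        total += gap * gap
    return math.sqrt(total)
```

If the MINDIST of an entry is already larger than the distance to the best candidate found so far, nothing inside that entry can be closer, so the whole subtree can be skipped.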

Page 22: An optimal and progressive algorithm for skyline queries slide

http://www.davey.com/

Page 23: An optimal and progressive algorithm for skyline queries slide

Back to the original question

Skyline with R-Tree

Page 24: An optimal and progressive algorithm for skyline queries slide

R-Tree Index Approach

Let’s process skyline objects using R-Tree

Strategy 1 – Use a traditional technique (i.e., NN Query)

Strategy 2 – This paper

Strategy 1

Partition the data using NN Query recursively

Distance metric: $L_1$ norm

First NN Query: start from the ideal point (i.e., the zero point)

Page 25: An optimal and progressive algorithm for skyline queries slide

Strategy 1

Recursive NN Query

Page 26: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: points a, b, c, d, e, f, g, i, k, m, n on x and y axes, with the IDEAL point, point i highlighted, and Dominating Area(i) marked.]

Page 27: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: points a, b, i, k on x and y axes, with the IDEAL point, Dominating Area(i), To-do Area 1, and To-do Area 2 marked.]

Page 28: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: points a, b, i, k on x and y axes, with the IDEAL point, Dominating Area(i), Dominating Area(k), To-do Area 1, and To-do Area 2 marked.]

Next, test these areas (only to find nothing)

Page 29: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: points a, b, i, k on x and y axes, with the IDEAL point, Dominating Area(i), Dominating Area(k), Dominating Area(a), and To-do Area 1 marked.]

Page 30: An optimal and progressive algorithm for skyline queries slide

Result

[Figure: skyline points i, k, a on x and y axes, with the IDEAL point, Dominating Area(i), Dominating Area(k), and Dominating Area(a) marked.]
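
The walk-through of Strategy 1 can be sketched for the 2-dimensional case as follows. This is a simplified illustration, not the original NN algorithm's implementation: a linear scan stands in for the R-Tree NN query, smaller coordinates are assumed to be better (the ideal point is the origin), and each to-do area is stored as a pair of exclusive upper bounds.

```python
def nn_skyline_2d(points):
    """Recursive-NN skyline for 2-d points, ideal point at the origin."""
    skyline = []
    # Each to-do area is (x_bound, y_bound); start with the whole space.
    todo = [(float("inf"), float("inf"))]
    while todo:
        x_bound, y_bound = todo.pop()
        inside = [p for p in points if p[0] < x_bound and p[1] < y_bound]
        if not inside:
            continue  # empty query: nothing to find in this area
        # NN of the ideal point under the L1 norm is a skyline point.
        nn = min(inside, key=lambda p: p[0] + p[1])
        skyline.append(nn)
        # Two new to-do areas: left of nn, and below nn.
        todo.append((nn[0], y_bound))
        todo.append((x_bound, nn[1]))
    return skyline


sample = [(1, 9), (3, 2), (9, 1), (4, 3), (5, 6), (2, 10)]  # hypothetical data
found = nn_skyline_2d(sample)
```

In two dimensions the two to-do areas only overlap in a region that is guaranteed to be empty, so no point is reported twice; in higher dimensions that guarantee is lost.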

Page 31: An optimal and progressive algorithm for skyline queries slide

Limitation of Strategy 1

Generally speaking,

in a d-dimensional space,

each skyline object discovered causes d recursive partitioning phases

[Figure: a region labeled Dominated.]

Page 32: An optimal and progressive algorithm for skyline queries slide

Limitation of Strategy 1

Generally speaking,

in a d-dimensional space,

each skyline object discovered causes d recursive partitioning phases

[Figure: Area 1, Area 2, and Area 3, each shown with a region labeled Dominated.]

Page 33: An optimal and progressive algorithm for skyline queries slide

What if?

In general, for d>2

The overlapping of the partitions

Necessitates DUPLICATE ELIMINATION

[Figure: overlapping partitions Area 1, Area 2, and Area 3, each with a region labeled Dominated.]

Page 34: An optimal and progressive algorithm for skyline queries slide

Disadvantage!

Strategy 1 needs an additional phase

for removing redundant outputs

4 elimination methods

Laisser-faire

Propagate

Merge

Fine-grained Partitioning

They work

Problem: sub-optimal

Page 35: An optimal and progressive algorithm for skyline queries slide

Strategy 2

Branch & Bound Skyline Algorithm

Page 36: An optimal and progressive algorithm for skyline queries slide

Idea!

Similar to previous NN Query

Branch & Bound Skyline (BBS)

http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/

Page 37: An optimal and progressive algorithm for skyline queries slide

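Example

[Figure: points a, b, c, d, e, f, g, h, i, k, l, m, n on x and y axes, with the IDEAL point and the MBRs L1E1, L1E2, L2E1, L2E2, L2E3, L2E4.]

R-Tree nodes

Root: Ptr 1, Ptr 2, Ptr 3, Ptr 4

L1E1: Ptr 1, Ptr 2, Ptr 3, Ptr 4

L1E2: Ptr 1, Ptr 2, Ptr 3, Ptr 4

L2E1: a, b, c, null

L2E2: c, h, i, null

L2E3: d, g, m, null

L2E4: f, k, l, n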

Page 38: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide.]

Entries of Root: L1E1, L1E2

Queue: (L1E2, 4), (L1E1, 10)

Result: (empty)

Page 39: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Expanding: (L1E2, 4)

Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)

Result: (empty)

Page 40: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)

Entries of L2E2: (i, 5), (h, 7), (c, 12)

Result: (empty)

Page 41: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Queue: (L1E1, 10), (L2E4, 8), (c, 12), (h, 7), (i, 5)

Result: (empty)

Other label on the slide: (L2E3, 7)

Page 42: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Queue: (L1E1, 10), (L2E4, 8), (c, 12), (h, 7)

Result: (i, 5)

Other label on the slide: (L2E3, 7)

Page 43: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Queue: (L1E1, 10), (L2E4, 8), (c, 12)

Result: (i, 5)

Other labels on the slide: (k, 10), f, n, i

Page 44: An optimal and progressive algorithm for skyline queries slide

Example

[Figure: the same points, MBRs, and R-Tree nodes as on the first example slide; axis tick labels 3, 5, 7, 2, 1, 9, 1.]

Queue: (empty)

Result: (i, 5), (a, 10), (k, 10)
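
The queue-driven traversal shown in the example can be sketched with a binary heap. This is a simplified illustration rather than the paper's exact BBS pseudocode: the `Node` class is a hypothetical toy stand-in for an R-Tree node, coordinates are assumed non-negative with smaller values better (ideal point at the origin), the key of an entry is the L1 distance of its MBR's lower-left corner, and dominance is checked only when an entry is removed from the heap.

```python
import heapq
from itertools import count


def dominates_min(p, q):
    """True if p dominates q when smaller coordinates are better."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


class Node:
    """Toy R-Tree node (hypothetical): holds child nodes or data points."""

    def __init__(self, children=(), points=()):
        self.children = list(children)
        self.points = list(points)
        corners = self.points + [c.lower for c in self.children]
        # Lower-left corner of this node's MBR.
        self.lower = tuple(min(axis) for axis in zip(*corners))


def bbs(root):
    skyline = []
    tie = count()  # tie-breaker so the heap never compares Node objects
    heap = []

    def push(corner, item):
        heapq.heappush(heap, (sum(corner), next(tie), corner, item))

    push(root.lower, root)
    while heap:
        _, _, corner, item = heapq.heappop(heap)
        # Prune: if the corner is dominated, everything in the entry is dominated.
        if any(dominates_min(s, corner) for s in skyline):
            continue
        if isinstance(item, Node):
            for child in item.children:
                push(child.lower, child)
            for point in item.points:
                push(point, point)
        else:
            skyline.append(item)  # a data point that survived pruning
    return skyline


leaves = [Node(points=[(1, 9), (2, 10)]),
          Node(points=[(3, 2), (4, 3), (5, 6)]),
          Node(points=[(9, 1)])]
result = bbs(Node(children=leaves))  # hypothetical data
```

Because entries leave the heap in ascending order of their L1 key, a point is appended only after every point that could dominate it has already been handled, which is what makes the output progressive.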

Page 45: An optimal and progressive algorithm for skyline queries slide

Analysis

Strategy 1

Page 46: An optimal and progressive algorithm for skyline queries slide

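Analysis of Strategy 1

Notation

Variable    Description
s           Number of skyline objects
e           Number of empty queries
ne          Number of non-empty queries
r           Number of redundant queries
d           Dimensionality
h           Height of the given R-Tree

Recursion Tree

[Figure: recursion tree in which each non-empty query spawns d new recursive NN queries.]

$ne = s + r$

$e = ne \cdot (d - 1) + 1$, since $ne + e = ne \cdot d + 1$ (the $+1$ is the root)

$e = (s + r)(d - 1) + 1$

$NA_{NN} \ge (e + s + r) \cdot h = [(s + r)(d - 1) + 1 + s + r] \cdot h > s \cdot h \cdot d$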

Page 47: An optimal and progressive algorithm for skyline queries slide

Analysis

Strategy 2

Page 48: An optimal and progressive algorithm for skyline queries slide

Analysis of Strategy 2 (brief version)

Notation

Variable    Description
s           Number of skyline objects
h           Height of the given R-Tree

$s \cdot h \ge NA_{BBS}$

$NA_{NN} > s \cdot h \cdot d > NA_{BBS}$
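
To get a feel for the gap between the two bounds, they can be evaluated for concrete parameters. The functions below simply restate the formulas from the two analysis slides; the function names and the sample values of s, r, d, and h are hypothetical.

```python
def nn_lower_bound(s, r, d, h):
    """Lower bound on node accesses of Strategy 1: (e + s + r) * h."""
    ne = s + r                 # non-empty queries
    e = ne * (d - 1) + 1       # empty queries
    return (e + ne) * h


def bbs_upper_bound(s, h):
    """Upper bound on node accesses of BBS: s * h."""
    return s * h


# Hypothetical setting: 10 skyline objects, no redundant queries, d = 3, h = 4.
nn_bound = nn_lower_bound(s=10, r=0, d=3, h=4)
bbs_bound = bbs_upper_bound(s=10, h=4)
```

With these values the Strategy 1 bound is 124 node accesses, just above s·h·d = 120, while the BBS bound is 40.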

Page 49: An optimal and progressive algorithm for skyline queries slide

Is it the optimal solution?

BBS Algorithm

Page 50: An optimal and progressive algorithm for skyline queries slide

Proof 1. Termination & Correctness

Lemma 1. BBS visits entries in ascending order

of their distance to the ‘ideal point’.

Lemma 2. Any data point added into Result_Set

is guaranteed to be a final skyline point.

Proof.

Suppose not: some $p_j$ was added into Result_Set but is not a final skyline point.

Then $\exists\, p^* \in DB$ s.t. $p^* \gg p_j$, which means $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, p_j)$.

However, observe that $p^*$ must be visited before $p_j$ by Lemma 1.

Contradiction: $p_j$ should have been pruned, which contradicts the assumption.

Lemma 3. Every data point will be examined, unless one of its ancestor

nodes has been pruned.
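
The proof of Lemma 2 uses the step from dominance to a strictly smaller $L_1$ distance without spelling it out. A short derivation, assuming the ideal point is the origin, all coordinates are non-negative, and smaller coordinates are better (so $p^* \gg p_j$ means $p^*[i] \le p_j[i]$ for every $i$, with strict inequality for at least one $i$):

```latex
% Assumption: ideal = (0, ..., 0), non-negative coordinates, smaller is better.
\[
L_1(\text{ideal}, p^*) \;=\; \sum_{i=1}^{d} p^*[i]
\;<\; \sum_{i=1}^{d} p_j[i] \;=\; L_1(\text{ideal}, p_j)
\]
% Every term on the left is at most the matching term on the right,
% and at least one term is strictly smaller, so the sums differ strictly.
```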

Page 51: An optimal and progressive algorithm for skyline queries slide

Lemmas for the theorem

Lemma 4. Any skyline algorithm

based on R-Tree must access all the

nodes whose MBRs intersect the SSR

Lemma 5. If an entry e doesn’t

intersect the SSR,

then $\exists\, p^*$ s.t. $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, e.\text{leftdown})$

Theorem: The # of node accesses

performed by BBS is OPTIMAL

[Figure: points A, B, C, D, E, F, G on x and y axes, with Dominating Area(B) and the SSR marked.]

Page 52: An optimal and progressive algorithm for skyline queries slide

Proof of the theorem

Proof 1. BBS only accesses nodes that

may contain skyline points.

That is, BBS only accesses nodes

whose MBRs intersect the SSR

Suppose not: BBS accesses a

node e that doesn’t intersect the SSR

Then $\exists\, p^*$ by Lemma 5

Contradiction, by Lemma 1

Proof 2. BBS visits nodes at most

once. (trivial)

[Figure: points A, B, C, D, E, F, G on x and y axes, with Dominating Area(B) and the SSR marked.]

Page 53: An optimal and progressive algorithm for skyline queries slide

To quantify the actual cost

Skip the details

[Figure: points A, B, C, D, E, F, G on x and y axes, with Dominating Area(B) and the SSR marked.]

Page 54: An optimal and progressive algorithm for skyline queries slide

Experimental Evaluation

Page 55: An optimal and progressive algorithm for skyline queries slide

Experimental Evaluation

Page 56: An optimal and progressive algorithm for skyline queries slide

Dimensionality

Page 57: An optimal and progressive algorithm for skyline queries slide

Cardinality

3d dataset

Page 58: An optimal and progressive algorithm for skyline queries slide

Progressive behavior

N=1M, d=3

Page 59: An optimal and progressive algorithm for skyline queries slide

Constrained skyline queries

N=1M, d=3

[Figure: points a, b, c, d, e, f, g, h, i, k, l, m, n on x and y axes, with the IDEAL point and the MBRs L1E1, L1E2, L2E1, L2E2, L2E3, L2E4.]

Constrain