An optimal and progressive algorithm for skyline queries slide

INI Lab.

An Optimal and Progressive Algorithm for Skyline Queries

Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger

ACM SIGMOD 2003

Presenters: KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM

An Optimal and Progressive Algorithm for Skyline Queries Using R-Tree

1. Introduction

2. Related Work

2.1 Block Nested Loop (BNL)

2.5 Nearest Neighbor (NN)

3. Branch and Bound Skyline Algorithm

With I/O analysis

5. Experimental Evaluation

Skyline

Problem definition

[Figure: items priced 5,000 Won, 40,000 Won, and 4,500 Won. Image sources: http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html, http://drmoontv.blogspot.kr/2013/03/blog-post_17.html, http://emperia.egloos.com/m/2516211, http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting]

Good value for money (혜자) >> poor value for money (창렬)

Formal definition of Dominates (≪)

Given a set of d-dimensional points $T$, we say that a point $t_1 \in T$ DOMINATES another point $t_2 \in T$ if and only if

$\forall i \in \{1, 2, \dots, d\}: t_1[i] \ge t_2[i]$, and

$\exists j \in \{1, 2, \dots, d\}: t_1[j] > t_2[j]$.

This is denoted by $t_2 \ll t_1$.

(Simply put, t1 is the better deal.)

Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf

Note that the meaning of ‘dominates’ may differ according to the type of application (e.g., whether larger or smaller values are better).


Still, good value for money (혜자) >> poor value for money (창렬)

For example, consider Hotel(attraction, 1/price, 1/distance), where a larger value is better in every dimension.

Two hotels:

A: (80, 1/15,000, 1/500m)

B: (30, 1/20,000, 1/1,500m)

$B \ll A$, i.e., A dominates B:

attraction: 30 < 80

1/price: 1/20,000 < 1/15,000

1/distance: 1/1,500m < 1/500m
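A minimal sketch of the dominance test from the formal definition, assuming (as in the hotel example) that larger values are better in every dimension. The helper name `dominates` is hypothetical, and exact fractions are used only so that 1/price and 1/distance compare without rounding.

```python
from fractions import Fraction


def dominates(t1, t2):
    """True if t1 dominates t2 (t2 << t1): t1[i] >= t2[i] for every i
    and t1[j] > t2[j] for at least one j (larger is better)."""
    if len(t1) != len(t2):
        raise ValueError("points must have the same dimensionality")
    pairs = list(zip(t1, t2))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Hotel(attraction, 1/price, 1/distance), with the two hotels from the slide.
A = (80, Fraction(1, 15000), Fraction(1, 500))
B = (30, Fraction(1, 20000), Fraction(1, 1500))

print(dominates(A, B))  # True: B << A
print(dominates(B, A))  # False
print(dominates(A, A))  # False: a point never dominates itself
```

The strict condition on at least one dimension is what keeps a point from dominating an identical copy of itself.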

Very important

The Skyline operator

Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$

Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p_i \ll p^*\}$

[Figure: Dominating Area(B) in the x-y plane]

Common misconception: “B ∈ Output since B ≫ C, D, F” is wrong.

Correct: “B ∈ Output since no other point p ∈ P satisfies p ≫ B”.

Naïve approach for processing skyline queries

Suppose there are n objects in the given set D = {o1, o2, …, on}.

Algorithm - Naïve 1

S ← ∅
for each object ox ∈ D
    boolean isDominated = false
    for each object oy ∈ D
        if ox ≠ oy AND ox ≪ oy then
            isDominated = true;
            break;
    if not isDominated then S ← S ∪ {ox}
return S
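A runnable sketch of Naïve 1, assuming the larger-is-better dominance from the formal definition. Function names are hypothetical. The inner loop compares positions rather than values, so an object is never tested against itself while two equal objects still both survive (neither strictly dominates the other).

```python
def dominates(t1, t2):
    """True if t1 dominates t2, assuming larger is better in every dimension."""
    pairs = list(zip(t1, t2))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def naive_skyline(D):
    """Naive 1: test every object against every other object (O(n^2) comparisons)."""
    S = []
    for x, ox in enumerate(D):
        is_dominated = False
        for y, oy in enumerate(D):
            # Skip ox itself; stop as soon as another object dominates it (ox << oy).
            if x != y and dominates(oy, ox):
                is_dominated = True
                break
        if not is_dominated:
            S.append(ox)
    return S


print(naive_skyline([(1, 5), (3, 3), (2, 2), (5, 1), (4, 4)]))
# [(1, 5), (5, 1), (4, 4)]
```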


Modification (Algorithm - Naïve 2)

Idea 1. Use a nested-loop structure.

Idea 2. Take advantage of ‘block transfer’, towards better reusability!

[Figure: Block A compared against Block B]
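Naïve 2 corresponds to the block-nested-loop (BNL) entry in the outline. Below is a simplified in-memory sketch, assuming larger is better and that the window of candidates never overflows. BNL as published writes overflowing candidates to a temporary file and needs extra passes, which this sketch omits. The names `bnl_skyline` and `blocks` are hypothetical.

```python
def dominates(t1, t2):
    """True if t1 dominates t2, assuming larger is better in every dimension."""
    pairs = list(zip(t1, t2))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def bnl_skyline(blocks):
    """Block-nested-loop sketch. `blocks` is an iterable of lists of points,
    standing in for disk blocks read one transfer at a time. The window holds
    mutually non-dominating candidates and is assumed to fit in memory."""
    window = []
    for block in blocks:
        for p in block:
            if any(dominates(w, p) for w in window):
                continue  # p is dominated by a candidate: discard it
            # p survives: evict candidates that p dominates, then keep p
            window = [w for w in window if not dominates(p, w)]
            window.append(p)
    return window


blocks = [[(1, 5), (3, 3)], [(2, 2), (5, 1), (4, 4)]]
print(bnl_skyline(blocks))  # [(1, 5), (5, 1), (4, 4)]
```

Each point is compared only with the current window instead of the whole dataset, but every block is still read, which is the limitation discussed next.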

The inherent limitation of these approaches

1. They need a full scan over the data,

2. even though the query result contains only a small fraction of the dataset.

3. That is, these approaches are wasteful.

R-Tree Index Approach

for processing skyline queries

R-Tree

Nearest Neighbor Query

R-Tree: a balanced tree for indexing multi-dimensional objects

Supports dynamic operations (insert, update, delete)

R-Tree vs. B-Tree / B+-Tree

B-Tree, B+-Tree: balanced, requiring that all leaves be at the same depth; leaf nodes contain one-dimensional values.

R-Tree: similar to the B+-Tree; leaf nodes contain d-dimensional objects.

(Source: http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html)

The R-Tree is used for the organization of a set of d-dimensional objects.

d-dimensional object? Spatial objects (also called d-dimensional objects or geometric objects).

How? Main Idea

Minimum Bounding Rectangles (MBRs)

http://caversham.otago.ac.nz/research/geog.php

What is the minimum number of points needed to represent a rectangle?

Assumption: each rectangle is parallel to the coordinate axes.

Answer: two, the lower-left and the upper-right corners.
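A small sketch of an axis-parallel MBR stored as just its two corners. The class and method names are hypothetical, not taken from any particular R-Tree library. `union` shows how a parent entry's MBR encloses its children.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class MBR:
    """Axis-parallel minimum bounding rectangle stored as just two corner points."""
    low: Point   # lower-left corner: minimum coordinate in every dimension
    high: Point  # upper-right corner: maximum coordinate in every dimension

    @staticmethod
    def of_points(points: Sequence[Point]) -> "MBR":
        """Smallest axis-parallel rectangle enclosing all the given points."""
        dims = range(len(points[0]))
        return MBR(tuple(min(p[i] for p in points) for i in dims),
                   tuple(max(p[i] for p in points) for i in dims))

    def union(self, other: "MBR") -> "MBR":
        """MBR of a parent entry that encloses two child MBRs."""
        return MBR(tuple(map(min, self.low, other.low)),
                   tuple(map(max, self.high, other.high)))

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.low, p, self.high))


leaf = MBR.of_points([(1, 4), (3, 2), (2, 5)])
print(leaf)                              # MBR(low=(1, 2), high=(3, 5))
print(leaf.union(MBR((0, 3), (2, 6))))  # MBR(low=(0, 2), high=(3, 6))
```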

Demonstration

R-Tree Simulator

Nearest Neighbor Query

Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$ and a query point $q$

Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } L_p(p_i, q) > L_p(p^*, q)\}$

See the appendix for how it works.
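A direct evaluation of the output definition by brute force, before any index is involved. This is a sketch with hypothetical function names. It returns one nearest neighbor, so when several objects tie, only the first one found is reported.

```python
def lp_distance(a, b, p=2):
    """Minkowski L_p distance between two points (p >= 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbor(P, q, p=2):
    """Brute-force NN: a p_i in P such that no p* in P is strictly closer to q.
    On ties, min() returns the first such object encountered."""
    return min(P, key=lambda pt: lp_distance(pt, q, p))


P = [(1, 5), (3, 3), (2, 2), (5, 1)]
print(nearest_neighbor(P, (0, 0), p=1))  # (2, 2): L1 distance 4
print(nearest_neighbor(P, (5, 0)))       # (5, 1): L2 distance 1
```

An R-Tree NN query returns the same answer while visiting only the nodes that can still contain a closer object.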

[Figure: root node with child entries 0 and 1, annotated with MINDIST(X, 0), MINDIST(X, 1), MINMAXDIST(X, 0), and MINMAXDIST(X, 1) for query point X]

Pruning!
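The two metrics in the figure are, as we understand them, the ones from the classic branch-and-bound R-Tree NN search of Roussopoulos et al. MINDIST is the smallest possible distance from X to an MBR. MINMAXDIST is an upper bound on the distance to the nearest object inside it, because every face of an MBR touches at least one object. A branch whose MINDIST exceeds another branch's MINMAXDIST cannot hold the nearest neighbor and can be pruned. This sketch uses squared Euclidean distances with the MBR given by its two corners; the function names are hypothetical.

```python
def mindist(q, low, high):
    """Squared Euclidean distance from q to the closest point of the MBR [low, high]."""
    # Clamp each coordinate of q onto the rectangle's extent in that dimension.
    closest = [min(max(x, lo), hi) for x, lo, hi in zip(q, low, high)]
    return sum((x - c) ** 2 for x, c in zip(q, closest))


def minmaxdist(q, low, high):
    """Squared MINMAXDIST: over each dimension k, take the nearer face in k and the
    farthest corner of that face; return the smallest such distance."""
    # rM_i: the farther edge of the rectangle in dimension i
    far = [lo if x >= (lo + hi) / 2 else hi for x, lo, hi in zip(q, low, high)]
    # rm_k: the nearer edge of the rectangle in dimension k
    near = [lo if x <= (lo + hi) / 2 else hi for x, lo, hi in zip(q, low, high)]
    far_total = sum((x - f) ** 2 for x, f in zip(q, far))
    return min(far_total - (q[k] - far[k]) ** 2 + (q[k] - near[k]) ** 2
               for k in range(len(q)))


print(mindist((0, 0), (1, 1), (3, 2)))     # 2
print(minmaxdist((0, 0), (1, 1), (3, 2)))  # 5: farthest point (1, 2) of the face x = 1
```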

http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/

http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html

http://www.davey.com/

Back to the original question

Skyline with R-Tree

Let’s process skyline queries using an R-Tree.

Strategy 1 – Use a traditional technique (i.e., NN queries)

Strategy 2 – This paper

Strategy 1

Partition the data recursively using NN queries.

Distance metric: $L_1$ norm

The first NN query starts from the ideal point (i.e., the origin).

Strategy 1

Recursive NN Query

[Figure (x-y plane): the first NN, i, is a skyline point; Dominating Area(i) is pruned and the rest of the space is split into To-do Area 1 and To-do Area 2. Processing To-do Area 2 finds k and prunes Dominating Area(k).]

Next, test the remaining areas (only to find nothing).

[Figure (x-y plane): processing To-do Area 1 finds a and prunes Dominating Area(a). Result: Dominating Area(i), Dominating Area(k), Dominating Area(a).]

Generally speaking, in a d-dimensional space, each skyline object discovered causes d recursive partitions.

[Figure (3-D): Area 1, Area 2, and Area 3 around a skyline object, with its dominated region]

What if d > 2?

In general, for d > 2, the partitions overlap, which necessitates DUPLICATE ELIMINATION.

[Figure: overlapping Area 1 and Area 3, with the dominated area]

Strategy 1 needs an additional phase for removing redundant outputs.

4 elimination methods:

Laisser-faire

Propagate

Merge

Fine-grained Partitioning

They work.

Problem: they are sub-optimal.
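A compact sketch of Strategy 1 for any d, assuming non-negative coordinates where smaller is better, so the L1 distance to the ideal point (the origin) is just the coordinate sum. A brute-force scan stands in for the R-Tree NN query. Each to-do region is represented, hypothetically, as a tuple of exclusive upper bounds. Duplicates are removed in the laisser-faire style: reported points go into a hash set. Redundant queries still spawn d children, so the counts follow the recursion-tree relation $e = ne \cdot (d - 1) + 1$ from the analysis.

```python
import math


def nn_skyline(points):
    """Strategy 1: recursive NN queries over to-do regions (smaller is better).

    Returns (skyline, non_empty_queries, empty_queries)."""
    d = len(points[0])
    todo = [(math.inf,) * d]  # one region = exclusive upper bound per dimension
    skyline, seen = [], set()
    non_empty = empty = 0
    while todo:
        bound = todo.pop()
        # Brute-force stand-in for an R-tree NN query restricted to the region.
        inside = [p for p in points if all(x < b for x, b in zip(p, bound))]
        if not inside:
            empty += 1
            continue
        non_empty += 1
        nn = min(inside, key=sum)  # L1 distance to the origin (non-negative data)
        if nn not in seen:         # laisser-faire: drop duplicates via a hash set
            seen.add(nn)
            skyline.append(nn)
        # d new to-do regions: points strictly better than nn in dimension i
        for i in range(d):
            todo.append(bound[:i] + (nn[i],) + bound[i + 1:])
    return skyline, non_empty, empty


print(nn_skyline([(1, 5), (3, 3), (2, 2), (5, 1), (4, 4)]))
# ([(2, 2), (5, 1), (1, 5)], 3, 4)
print(nn_skyline([(2, 2, 2), (1, 1, 5)]))
# ([(2, 2, 2), (1, 1, 5)], 3, 7): (1, 1, 5) is found twice, so r = 1
```

In the 3-D run, (1, 1, 5) lies in both the "x < 2" and the "y < 2" regions of (2, 2, 2). That overlap is exactly why d > 2 needs duplicate elimination.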

Strategy 2

Branch & Bound Skyline Algorithm

Similar to the previous (branch-and-bound) NN query

Branch & Bound Skyline (BBS)

http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/

Example

[Figure omitted: the data points and MBRs in the x-y plane]

R-Tree: Root → L1E1, L1E2

Leaf nodes: L2E1 = {a, b, c}; L2E2 = {c, h, i}; L2E3 = {d, g, m}; L2E4 = {f, k, l, n}

Heap entries shown on successive slides, as (entry, mindist):

Step 1: (L1E2, 4), (L1E1, 10)

Step 2: expanding L1E2 gives (L2E2, 5), (L2E3, 7), (L2E4, 8); (L1E1, 10) remains

Step 3: expanding L2E2 gives (i, 5), (h, 7), (c, 12)

Step 4: (L1E1, 10), (L2E4, 8), (c, 12), (h, 7), (i, 5); (L2E3, 7) appears beside Result

Step 5: (L1E1, 10), (L2E4, 8), (c, 12), (h, 7); (L2E3, 7) appears beside Result

Step 6: (L1E1, 10), (L2E4, 8), (c, 12); (k, 10), f, n, i appear beside Result

Step 7: (a, 10), (k, 10) appear beside Result
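A minimal BBS sketch under stated assumptions: smaller is better with non-negative coordinates, so mindist from the ideal point is the coordinate sum of an entry's lower-left corner. The R-Tree is a hypothetical toy written as nested Python lists (a node is a list of entries, a data point is a tuple), not a disk-based index, and it does not reproduce the slide's example. Because entries leave the heap in ascending mindist, an undominated data point is reported immediately (Lemma 2), which is what makes BBS progressive.

```python
import heapq
import itertools


def dominates(a, b):
    """a dominates b when smaller is better: a <= b everywhere, a < b somewhere."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def lower_left(entry):
    """Lower-left MBR corner: a data point (tuple) is its own corner; for a node
    (list of entries) take the per-dimension minimum over its children."""
    if isinstance(entry, tuple):
        return entry
    corners = [lower_left(child) for child in entry]
    return tuple(min(c[i] for c in corners) for i in range(len(corners[0])))


def bbs(root):
    """Branch-and-bound skyline over a toy R-tree given as nested lists."""
    tie = itertools.count()            # tie-breaker: the heap never compares entries
    heap = [(0, next(tie), root)]
    skyline = []
    while heap:
        _, _, e = heapq.heappop(heap)
        if any(dominates(s, lower_left(e)) for s in skyline):
            continue                   # e and its whole subtree are dominated: prune
        if isinstance(e, tuple):
            skyline.append(e)          # undominated data point: a skyline point
            continue
        for child in e:                # expand the node, keeping undominated children
            corner = lower_left(child)
            if not any(dominates(s, corner) for s in skyline):
                # mindist = L1 distance from the origin to the corner (non-negative data)
                heapq.heappush(heap, (sum(corner), next(tie), child))
    return skyline


root = [
    [[(1, 9), (2, 10)], [(4, 8), (6, 7)]],
    [[(9, 1), (7, 5)], [(5, 5), (8, 8)]],
]
print(bbs(root))  # [(9, 1), (5, 5), (1, 9), (4, 8)]
```

The leaf [(4, 8), (6, 7)] is opened, but (6, 7) never enters the heap because (5, 5) is already in the skyline. The counter in each heap tuple keeps `heapq` from ever comparing a list with a tuple.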

Analysis

Strategy 1

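Notation

Variable: Description

s: number of skyline objects

e: number of empty queries

ne: number of non-empty queries

r: number of redundant queries

d: dimensionality

h: height of the given R-Tree

Recursion tree: each non-empty NN query spawns d new recursive NN queries.

$ne = s + r$

$e = ne \cdot (d - 1) + 1$, since $ne + e = ne \cdot d + 1$ (the extra 1 is the root query)

$e = (s + r)(d - 1) + 1$

$N_{A\text{-}NN} \ge (e + s + r) \cdot h = \big((s + r)(d - 1) + 1 + s + r\big) \cdot h > s \cdot h \cdot d$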

Analysis

Strategy 2

Notation

Variable: Description

s: number of skyline objects

h: height of the given R-Tree

$N_{A\text{-}BBS} \le s \cdot h$

$N_{A\text{-}NN} > s \cdot h \cdot d > N_{A\text{-}BBS}$
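Plugging illustrative numbers into the two bounds makes the gap concrete. The values s = 10, d = 3, h = 3 are hypothetical, not measurements from the paper, and r = 0 (no redundant queries) is the most favorable case for Strategy 1. As in the slide's bound, every query is charged h node accesses.

```python
def nn_node_accesses_lower_bound(s, r, d, h):
    """Strategy 1 bound from the recursion tree: ne = s + r non-empty queries,
    e = ne*(d - 1) + 1 empty ones, each charged h node accesses."""
    ne = s + r
    e = ne * (d - 1) + 1
    return (e + ne) * h


def bbs_node_accesses_upper_bound(s, h):
    """BBS bound from the slide: at most s*h node accesses."""
    return s * h


s, r, d, h = 10, 0, 3, 3
print(nn_node_accesses_lower_bound(s, r, d, h))  # 93
print(s * h * d)                                 # 90
print(bbs_node_accesses_upper_bound(s, h))       # 30
```

Even with no redundant queries, the 21 empty queries alone push Strategy 1 past $s \cdot h \cdot d$, while BBS stays within $s \cdot h$.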

Is it the optimal solution?

BBS Algorithm

Proof 1. Termination & Correctness

Lemma 1. BBS visits entries in ascending order of their distance to the ‘ideal point’.

Lemma 2. Any data point added into Result_Set is guaranteed to be a final skyline point.

Proof. Suppose not: then some $p_j$ was added into Result_Set but is not a final skyline point.

Then $\exists\, p^* \in DB$ s.t. $p^* \gg p_j$, which means $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, p_j)$.

However, by Lemma 1, $p^*$ must be visited before $p_j$, so when $p_j$ is examined, Result_Set already contains $p^*$ or a point that dominates $p^*$ (and hence $p_j$).

Contradiction: $p_j$ should have been pruned, which contradicts the assumption.

Lemma 3. Every data point will be examined, unless one of its ancestor nodes has been pruned.

Lemma 4. Any skyline algorithm based on an R-Tree must access all the nodes whose MBRs intersect the SSR (skyline search region: the part of the data space not dominated by any skyline point).

Lemma 5. If an entry e does not intersect the SSR, then $\exists\, p^*$ s.t. $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, e.\text{leftdown})$.
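A sketch of the test behind Lemmas 4 and 5, under a smaller-is-better convention with non-negative coordinates. Given the skyline, an entry's MBR lies entirely outside the SSR exactly when some skyline point dominates its lower-left corner. As far as we understand, this mirrors how BBS decides to prune an entry. The function name `ssr_witness` is hypothetical.

```python
def dominates(a, b):
    """a dominates b when smaller is better."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def ssr_witness(leftdown, skyline):
    """Return a skyline point p* that dominates the entry's lower-left corner, or None.
    If p* exists, p* dominates every point of the entry's MBR, so the entry cannot
    intersect the SSR; and since dominance implies a smaller coordinate sum,
    L1(ideal, p*) < L1(ideal, e.leftdown) for non-negative data (Lemma 5)."""
    return next((p for p in skyline if dominates(p, leftdown)), None)


skyline = [(1, 9), (4, 8), (5, 5), (9, 1)]
print(ssr_witness((6, 7), skyline))  # (5, 5): sum 10 < 13, entry can be skipped
print(ssr_witness((2, 2), skyline))  # None: the entry may intersect the SSR
```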

Theorem: The number of node accesses performed by BBS is OPTIMAL.

Proof 1. BBS only accesses nodes that may contain skyline points; that is, BBS only accesses nodes whose MBRs intersect the SSR.

Suppose not: BBS accesses a node e that does not intersect the SSR.

Then $\exists\, p^*$ by Lemma 5.

By Lemma 1, $p^*$ is visited before e, so e is pruned instead of accessed: a contradiction.

Proof 2. BBS visits each node at most once (trivial).

Skip the details.

[Figure: Dominating Area(B)]

Experimental Evaluation

[Figure: 3d dataset; two panels, each with N = 1M, d = 3]

Constrain
