V. Megalooikonomou Spatial Access Methods (SAMs) I

128
CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases V. Megalooikonomou Spatial Access Methods (SAMs) I (some slides are based on notes by C. Faloutsos)

description

CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases. V. Megalooikonomou Spatial Access Methods (SAMs) I (some slides are based on notes by C. Faloutsos). General Overview. Multimedia Indexing Spatial Access Methods (SAMs) k-d trees - PowerPoint PPT Presentation

Transcript of V. Megalooikonomou Spatial Access Methods (SAMs) I

Page 1: V. Megalooikonomou Spatial Access Methods (SAMs) I

CIS750 – Seminar in Advanced Topics in Computer Science

Advanced topics in databases – Multimedia Databases

V. MegalooikonomouSpatial Access Methods (SAMs) I

(some slides are based on notes by C. Faloutsos)

Page 2: V. Megalooikonomou Spatial Access Methods (SAMs) I

General Overview

Multimedia Indexing Spatial Access Methods (SAMs)

k-d trees Point Quadtrees MX-Quadtree z-ordering R-trees

Page 3: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - Detailed outline

spatial access methods problem dfn k-d trees point quadtrees MX-quadtrees z-ordering R-trees

Page 4: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer spatial queries (like??)

Page 5: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer point queries range queries k-nn queries spatial joins (‘all pairs’ queries)

Page 6: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer point queries range queries k-nn queries spatial joins (‘all pairs’ queries)

Page 7: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer point queries range queries k-nn queries spatial joins (‘all pairs’ queries)

Page 8: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer point queries range queries k-nn queries spatial joins (‘all pairs’ queries)

Page 9: V. Megalooikonomou Spatial Access Methods (SAMs) I

Spatial Access Methods - problem

Given a collection of geometric objects (points, lines, polygons, ...)

organize them on disk, to answer point queries range queries k-nn queries spatial joins (‘all pairs’ within ε)

Page 10: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - motivation

Q: applications?

Page 11: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - motivation

salary

age

traditional DB GIS

Page 12: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - motivation

salary

age

traditional DB GIS

Page 13: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - motivation

CAD/CAMfind elementstoo closeto each other

Page 14: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - motivation

CAD/CAM

Page 15: V. Megalooikonomou Spatial Access Methods (SAMs) I

day1 365

day1 365

S1

Sn

F(S1)

F(Sn)

SAMs - motivation

eg, avg

eg,. std

Page 16: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs: solutions

K-d trees point quadtrees MX-quadtrees z-ordering R-trees (grid files)

Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

Page 17: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - Detailed outline

spatial access methods problem dfn k-d trees point quadtrees MX-quadtrees z-ordering R-trees

Page 18: V. Megalooikonomou Spatial Access Methods (SAMs) I

k-d trees

Used to store k dimensional point data

It is not used to store region data A 2-d tree (i.e., for k=2) stores 2-

dimensional point data while a 3-d tree stores 3-dimensional point data, etc.

Page 19: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees – node structure

Binary trees Info: information field Xval,Yval: coordinates of a point associated with the

node Llink, Rlink: pointers to children Properties (N: node):

If level N even -> for all nodes M in the subtree rooted at N.Llink: M.Xval < N.Xval for all nodes P in the subtree rooted at N.Rlink: P.Xval >= N.Xval

If level N odd -> Similarly use Yvals

Page 20: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees – Example

Page 21: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees: Insertion/Search

To insert a node N into the tree pointed by T If N and T agree on Xval, Yval then

overwrite T Else, branch left if N.Xval < T.xval, right

otherwise (even levels) Similarly for odd levels (branching on

Yvals)

Page 22: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees – Example of Insertion

City (Xval, Yval)

Banja Luka (19, 45)

Derventa (40, 50)

Toslic (38, 38)

Tuzla (54, 35)

Sinj (4, 4)

Splitting of region by Banja LukaSplitting of region by Derventa

Splitting of region by Toslic Splitting of region by Sinj

Page 23: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees: Deletion

Deletion of point (x,y) from T If N is a leaf node easy Otherwise either Tl (left subtree) or Tr

(right subtree) is non-empty Find a “candidate replacement” node R in Tl or

Tr

Replace all of N’s non-link fields by those of R Recursively delete R from Ti

Recursion guaranteed to terminate - Why?

Page 24: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees: Deletion

Finding candidate replacement nodes for deletion Replacement node R must bear same

spatial relation to all nodes in Tl and Tr as node N

Page 25: V. Megalooikonomou Spatial Access Methods (SAMs) I

2-d trees: Range Queries

Q: Given a point (xc, yc) and a distance r find all points in the 2-d tree that lie within the circle

A: Each node N in a 2-d tree implicitly represents a region RN – If the circle (specified by the query) has no intersection with RN then there is no point in searching the subtree rooted at node N

Page 26: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - Detailed outline

spatial access methods problem dfn k-d trees point quadtrees z-ordering R-trees

Page 27: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point Quadtrees

Represent point data Always split regions into 4 parts 2-d tree: a node N splits a region into two

by drawing one line through the point (N.xval, N.yval)

Point quadtree: a node N splits a region by drawing a horizontal and a vertical line through the point (N.xval, N.yval)

Four parts: NW, SW, NE, and SE quadrants Q: Quadtree nodes have 4 children?

Page 28: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point Quadtrees

Nodes in point quadtrees represent regions

Page 29: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point quadtrees - Insertion

City (Xval, Yval)

Banja Luka (19, 45)

Derventa (40, 50)

Toslic (38, 38)

Tuzla (54, 35)

Sinj (4, 4)Splitting of region by Banja LukaSplitting of region by Derventa

Splitting of region by Toslic Splitting of region by SinjSplitting of region by Tuzla

Page 30: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point Quadtrees - Insertion

Page 31: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point quadtrees: Deletion Deletion of point (x,y) from T

If N is a leaf node easy Otherwise a subtree (N.NW, N.SW, N.NE. N.SE) is

non-empty Find a “candidate replacement” node R in one of the

subtrees such that: Every other node R1 in N.NW is to the NW of R Every other node R2 in N.SW is to the SW of R etc…

Replace all of N’s non-link fields by those of R Recursively delete R from Ti

In general, it may not always be possible to find such as replacement node

Q: What happens in the worst case?

Page 32: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point quadtrees: Deletion Deletion of point (x,y) from T

If N is a leaf node easy Otherwise a subtree (N.NW, N.SW, N.NE. N.SE) is non-

empty Find a “candidate replacement” node R in one of the

subtrees such that: Every other node R1 in N.NW is to the NW of R Every other node R2 in N.SW is to the SW of R etc…

Replace all of N’s non-link fields by those of R Recursively delete R from Ti

In general, it may not always be possible to find such as replacement node

Q: What happens in the worst case? May require all nodes to be reinserted

Page 33: V. Megalooikonomou Spatial Access Methods (SAMs) I

Point quadtrees: Range Searches

Each node in a point quadtree represents a region

Do not search regions that do not intersect the circle defined by the query

Page 34: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - Detailed outline

spatial access methods problem dfn k-d trees point quadtrees MX-quadtrees z-ordering R-trees

Page 35: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees

Drawbacks of 2-d trees, point quadtrees: shape of tree depends upon the order in

which objects are inserted into the tree splits may be uneven depending upon where

the point (N.xval, N.yval) is located inside the region (represented by N)

MX-quadtrees: shape (and height) of tree independent of number of nodes and order of insertion

Page 36: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees

Assumption: the map is represented as a grid of size (2k x 2k) for some k

When a region gets “split” it splits down the middle

Page 37: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees - Insertion

After insertion of A, B, C, and D respectively

Page 38: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees - Insertion

After insertion of A, B, C, and D respectively

Page 39: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees - Deletion

Fairly easy – why? All point are represented at the

leaf level Total time for deletion: O(k)

Page 40: V. Megalooikonomou Spatial Access Methods (SAMs) I

MX-Quadtrees –Range Queries

Same as in point quadtrees One difference:

Checking to see if a point is in the circle defined by the range query needs to be performed at the leaf level (points are stored at the leaf level)

Page 41: V. Megalooikonomou Spatial Access Methods (SAMs) I

SAMs - Detailed outline

spatial access methods problem dfn k-d trees point quadtrees MX-quadtrees z-ordering R-trees

Page 42: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

Hint: reduce the problem to 1-d points(!!)

Q1: why?A:Q2: how?

Page 43: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: how would you organize, e.g., n-dim points, on disk? (C points per disk page)

Hint: reduce the problem to 1-d points (!!)

Q1: why?A: B-trees!Q2: how?

Page 44: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q2: how?A: assume finite granularity; z-

ordering = bit-shuffling = N-trees = Morton keys = geo-coding = ...

Page 45: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q2: how?A: assume finite granularity (e.g.,

232x232 ; 4x4 here)Q2.1: how to map n-d cells to 1-d

cells?

Page 46: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q2.1: how to map n-d cells to 1-d cells?

Page 47: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q2.1: how to map n-d cells to 1-d cells?

A: row-wiseQ: is it good?

Page 48: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: is it good?A: great for ‘x’ axis; bad for ‘y’ axis

Page 49: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: How about the ‘snake’ curve?

Page 50: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: How about the ‘snake’ curve?A: still problems:

2^32

2^32

Page 51: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: Why are those curves ‘bad’?A: no distance preservation (~

clustering)Q: solution?

2^32

2^32

Page 52: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: solution? (w/ good clustering, and easy to compute, for 2-d and n-d?)

Page 53: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Q: solution? (w/ good clustering, and easy to compute, for 2-d and n-d?)

A: z-ordering/bit-shuffling/linear-quadtrees

‘looks’ better:• few long jumps;• scoops out the whole quadrant before leaving it• a.k.a. space filling curves

Page 54: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

z-ordering/bit-shuffling/linear-quadtrees

Q: How to generate this curve (z = f(x,y) )?

A: 3 (equivalent) answers!

Page 55: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

z-ordering/bit-shuffling/linear-quadtrees

Q: How to generate this curve (z = f(x,y))?

A1: ‘z’ (or ‘N’) shapes, RECURSIVELY

order-1 order-2... order (n+1)

Page 56: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Notice: self similar (we’ll see about

fractals, soon) method is hard to use: z =? f(x,y)

order-1 order-2... order (n+1)

Page 57: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

z-ordering/bit-shuffling/linear-quadtrees

Q: How to generate this curve (z = f(x,y) )?

A: 3 (equivalent) answers!

Method #2?

Page 58: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

bit-shuffling

0001

1011

11100100

x

y

x0 0

y1 1

z =( 0 1 0 1 )2 = 5

Page 59: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

bit-shuffling

0001

1011

11100100

x

y

x0 0

y1 1

z =( 0 1 0 1 )2 = 5

How about the reverse:

(x,y) = g(z) ?

Page 60: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

bit-shuffling

0001

1011

11100100

x

y

x0 0

y1 1

z =( 0 1 0 1 )2 = 5

How about n-d spaces?

Page 61: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

z-ordering/bit-shuffling/linear-quadtrees

Q: How to generate this curve (z = f(x,y) )?

A: 3 (equivalent) answers!

Method #3?

Page 62: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

linear-quadtrees : assign N->1, S->0 e.t.c.

0 1

0

1

00...

01...

10...

11...

W E

N

S

Page 63: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

... and repeat recursively. Eg.: zgray-cell

= WN;WN = (0101)2 = 5

0 1

0

1

00...

01...

10...

11...

W E

N

S

00 11

Page 64: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Drill: z-value of grey cell, with the three methods?

0 1

0

1

W E

N

S

Page 65: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Drill: z-value of grey cell, with the three methods?

0 1

0

1

W E

N

S

method#1: 14method#2: shuffle(11;10)= (1110)2 = 14

Page 66: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering

Drill: z-value of grey cell, with the three methods?

0 1

0

1

W E

N

S

method#1: 14method#2: shuffle(11;10)= (1110)2 = 14method#3: EN;ES = ... = 14

Page 67: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - Detailed outline

spatial access methods z-ordering

main idea - 3 methods use w/ B-trees; algorithms (range, knn

queries ...) non-point (eg., region) data analysis; variations

R-trees

Page 68: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q1: How to store on disk?A:Q2: How to answer range queries etc

Page 69: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q1: How to store on disk?A: treat z-value as primary key; feed to

B-tree

SF

PGHz cname etc

5 SF12 PGH

Page 70: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

MAJOR ADVANTAGES w/ B-tree: already inside commercial systems (no

coding /debugging!) concurrency & recovery is ready

SF

PGHz cname etc

5 SF12 PGH

Page 71: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2: queries? (eg.: find city at (0,3) )?

SF

PGHz cname etc

5 SF12 PGH

Page 72: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2: queries? (eg.: find city at (0,3) )?A: find z-value; search B-tree

SF

PGHz cname etc

5 SF12 PGH

Page 73: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2: range queries?

SF

PGHz cname etc

5 SF12 PGH

Page 74: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2: range queries?A: compute ranges of z-values; use

B-tree

SF

PGHz cname etc

5 SF12 PGH

9,11-15

Page 75: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2’: range queries - how to reduce # of qualifying ranges?

SF

PGHz cname etc

5 SF12 PGH

9,11-15

Page 76: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2’: range queries - how to reduce # of qualifying ranges?

A: Augment the query!

SF

PGHz cname etc

5 SF12 PGH

9,11-15 -> 8-15

Page 77: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2’’: range queries - how to break a query into ranges?

9,11-15

Page 78: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2’’: range queries - how to break a query into ranges?

A: recursively, quadtree-style; decompose only non-full quadrants

9,11-1512-15

Page 79: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q2’’: range queries - how to break a query into ranges?

A: recursively, quadtree-style; decompose only non-full quadrants

9,11-1512-15

9, 11

Page 80: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - Detailed outline

spatial access methods z-ordering

main idea - 3 methods use w/ B-trees; algorithms (range, knn

queries ...) non-point (eg., region) data analysis; variations

R-trees

Page 81: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q3: k-nn queries? (say, 1-nn)?

SF

PGHz cname etc

5 SF12 PGH

Page 82: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q3: k-nn queries? (say, 1-nn)?A: traverse B-tree; find nn wrt z-

values and ...

SF

PGHz cname etc

5 SF12 PGH

Page 83: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

... ask a range query.

SF

PGH

53 12

nn wrt z-value

Page 84: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

... ask a range query.

SF

PGH

53 12

nn wrt z-value

Page 85: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - usage & algo’s

Q4: all-pairs queries? ( all pairs of cities within 10 miles from each other? )

SF

PGH

(we’ll see ‘spatial joins’ later: find

all PA counties that intersect a lake)

Page 86: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - Detailed outline

spatial access methods z-ordering

main idea - 3 methods use w/ B-trees; algorithms (range, knn

queries ...) non-point (eg., region) data analysis; variations

R-trees ...

Page 87: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: z-value for a region?

zB = ??

zC = ??A B

C

Page 88: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: z-value for a region?A: 1 or more z-values; by quadtree

decomposition

zB = ??

zC = ??

A B

C

Page 89: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: z-value for a region?

0 1

0

1

00...

01...

10...

11...

W E

N

S

00 11A B

C

zB = 11**

zC = ??

“don’t care”

Page 90: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: z-value for a region?

0 1

0

1

00...

01...

10...

11...

W E

N

S

00 11A B

C

zB = 11**

zC = {0010; 1000}

“don’t care”

Page 91: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: How to store in B-tree?Q: How to search (range etc queries)

A B

C

Page 92: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regionsQ: How to store in B-tree? A: sort (*<0<1)Q: How to search (range etc queries)

A B

C

z obj-id etc

0010 C0101 A1000 C

11** B

Page 93: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: How to search (range etc queries) –

eg ‘red’ range query

A B

C

z obj-id etc

0010 C0101 A1000 C

11** B

Page 94: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Q: How to search (range etc queries) –

eg ‘red’ range queryA: break query in z-values; check B-

treeA B

C

z obj-id etc

0010 C0101 A1000 C

11** B

Page 95: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Almost identical to range queries for point data, except for the “don’t cares” - i.e.,

A B

C

z obj-id etc

0010 C0101 A1000 C

11** B

1100 ?? 11**

Page 96: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Almost identical to range queries for point data, except for the “don’t cares” - i.e.,

z1= 1100 ?? 11** = z2Specifically: does z1

contain/avoid/intersect z2?Q: what is the criterion to decide?

Page 97: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

z1= 1100 ?? 11** = z2Specifically: does z1

contain/avoid/intersect z2?Q: what is the criterion to decide?A: Prefix property: let r1, r2 be the

corresponding regions, and let r1 be the smallest (=> z1 has fewest ‘*’s). Then:

Page 98: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

r2 will either contain completely, or avoid completely r1.

it will contain r1, if z2 is the prefix of z1

A B

C

1100 ?? 11**

region of z1:completely contained in

region of z2

Page 99: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Drill (True/False). Given: z1= 011001** z2= 01****** z3= 0100****T/F r2 contains r1T/F r3 contains r1T/F r3 contains r2

Page 100: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Drill (True/False). Given: z1= 011001** z2= 01****** z3= 0100****T/F r2 contains r1 - TRUE (prefix property)T/F r3 contains r1 - FALSE (disjoint)T/F r3 contains r2 - FALSE (r2 contains r3)

Page 101: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Drill (True/False). Given: z1= 011001** z2= 01****** z3= 0100****

z2

Page 102: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Drill (True/False). Given: z1= 011001** z2= 01****** z3= 0100**** z3

z2

T/F r2 contains r1 - TRUE (prefix property)T/F r3 contains r1 - FALSE (disjoint)T/F r3 contains r2 - FALSE (r2 contains r3)

Page 103: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Spatial joins: find (quickly) all counties intersecting

lakes

Page 104: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Spatial joins: find (quickly) all counties intersecting

lakes

Naive algorithm: O( N * M)Something faster?

Page 105: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Spatial joins: find (quickly) all counties intersecting

lakes z obj-id etc

0011 Erie0101 Erie…

10** Ont.

z obj-id etc

0010 ALG… …1000 WAS

11** ALG

Page 106: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - regions

Spatial joins: find (quickly) all counties intersecting lakes

Solution: merge the lists of (sorted) z-values, looking for the prefix property

footnote#1: ‘*’ needs careful treatmentfootnote#2: need dup. elimination

Page 107: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - Detailed outline

spatial access methods z-ordering

main idea - 3 methods use w/ B-trees; algorithms (range, knn

queries ...) non-point (eg., region) data analysis; variations

R-trees

Page 108: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

Q: is z-ordering the best we can do?

Page 109: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

Q: is z-ordering the best we can do?A: probably not - occasional long

‘jumps’Q: then?

Page 110: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

Q: is z-ordering the best we can do?A: probably not - occasional long

‘jumps’Q: then? A1: Gray codes

Page 111: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

A2: Hilbert curve! (a.k.a. Hilbert-Peano curve)

Page 112: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

‘Looks’ better (never long jumps). How to derive it?

Page 113: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

‘Looks’ better (never long jumps). How to derive it?

order-1 order-2 ... order (n+1)

Page 114: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

Q: function for the Hilbert curve ( h = f(x,y) )?

A: bit-shuffling, followed by post-processing, to account for rotations. Linear on # bits. See textbook, for pointers to

code/algorithms (eg., [Jagadish, 90])

Page 115: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - variations

Q: how about Hilbert curve in 3-d? n-d?

A: Exists (and is not unique!). Eg., 3-d, order-1 Hilbert curves (Hamiltonian paths on cube)#1 #2

Page 116: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - Detailed outline

spatial access methods z-ordering

main idea - 3 methods use w/ B-trees; algorithms (range, knn

queries ...) non-point (eg., region) data analysis; variations

R-trees ...

Page 117: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: How many pieces (‘quad-tree blocks’) per region?

A: proportional to perimeter (surface etc)

Page 118: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

(How long is the coastline, say, of England?

Paradox: The answer changes with the yard-stick -> fractals ...)

Page 119: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: Should we decompose a region to full detail (and store in B-tree)?

Page 120: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: Should we decompose a region to full detail (and store in B-tree)?

A: NO! approximation with 1-3 pieces/z-values is best [Orenstein90]

Page 121: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: how to measure the ‘goodness’ of a curve?

Page 122: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: how to measure the ‘goodness’ of a curve?

A: e.g., avg. # of runs, for range queries

4 runs 3 runs(#runs ~ #disk accesses on B-tree)

Page 123: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - analysis

Q: So, is Hilbert really better?A: 27% fewer runs, for 2-d (similar for

3-d)

Q: are there formulas for #runs, #of quadtree blocks etc?

A: Yes ([Jagadish; Moon+ etc] see textbook)

Page 124: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - fun observations

Hilbert and z-ordering curves: “space filling curves”: eventually, they visit every point

in n-d space - therefore:

order-1 order-2 ... order (n+1)

Page 125: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - fun observations

... they show that the plane has as many points as a line (-> headaches for 1900’s mathematics/topology). (fractals, again!)

order-1 order-2 ... order (n+1)

Page 126: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - fun observations

Observation #2: Hilbert (like) curve for video encoding [Y. Matias+, CRYPTO ‘87]:

Given a frame, visit its pixels in randomized

hilbert order; compress; and transmit

Page 127: V. Megalooikonomou Spatial Access Methods (SAMs) I

z-ordering - fun observations

In general, Hilbert curve is great for preserving distances, clustering, vector quantization etc

Page 128: V. Megalooikonomou Spatial Access Methods (SAMs) I

Conclusions

z-ordering is a great idea (n-d points -> 1-d points; feed to B-trees)

used by TIGER system and (most probably) by other GIS products

works great with low-dim points