Spatial Issues in DBGlobe Dieter Pfoser. Location Parameter in Services Entering the harbor (x,y...


Spatial Issues in DBGlobe

Dieter Pfoser


Location Parameter in Services

Entering the harbor (x,y position)…

…triggers information request


Spatial Data in DBGlobe

Spatial information might be the predominant type of data to structure information content

PMOs contain spatially (+temporally) referenced data

These data are distributed over a set of devices

How can we relate all these data to one spatial location?

– “What have we stored for this location?”

This introduces space as the organizing criterion for data, i.e., a distinguished context


Spatial Data… (cont’d)

Each PMO contains a set of positions that reference content

The job of DBGlobe is now to find this content based on a given positional reference

Position → {PMO (id)} → content

BUT!

– Content is referenced by position as the only argument!

– The question is how to introduce further filters that retrieve only relevant (interesting) content based on additional parameters


Distributed Indexes

Using tree-based structures, a global index needs to be constructed and some portion of the index replicated in the CAS

Given the set of locations for each PMO, one could compute a signature that the PMO communicates to a CAS (where it is aggregated further)

This signature is used to potentially scan all PMOs for relevant spatial information


Bloom Filters: High Level Idea

Everyone thinks they need to know exactly what everyone else has.

– Give me a list of what you have.

Lists are long and unwieldy.

Using Bloom filters, you can get small, approximate lists.

– Give me information so I can figure out what you have.


Bloom Filter Example

A Bloom filter: to check an object’s name against a Bloom filter summary, the name is hashed with n different hash functions (here, n=3) and the bits corresponding to the results are checked.

[Figure labels: Hash Functions, Bit Vector]


Bloom Filter

Multiple hash functions are used to map values onto a bit vector

Example: Web proxy cache sharing

– URLs are hashed using the MD5 algorithm, a cryptographic message digest algorithm that hashes arbitrary-length strings to 128 bits

– Hash functions are built by first calculating the MD5 signature of a URL (128 bits), then dividing the 128 bits into four 32-bit words, and finally taking the modulus of each 32-bit word by the table size
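The MD5-based scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the proxy-cache implementation itself: the names `md5_hash_positions` and `BloomFilter` are hypothetical, the big-endian byte order of the 32-bit words is an assumption (the slide does not specify it), and one byte is spent per bit to keep the code readable.

```python
import hashlib


def md5_hash_positions(key, table_size):
    """Derive four bit positions from one MD5 digest of `key`."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes = 128 bits
    # Split the digest into four 32-bit words (byte order is an assumption).
    words = [int.from_bytes(digest[i:i + 4], "big") for i in range(0, 16, 4)]
    # Reduce each word modulo the table size to get a position in the bit vector.
    return [word % table_size for word in words]


class BloomFilter:
    """Bit vector with four MD5-derived hash functions."""

    def __init__(self, table_size):
        self.table_size = table_size
        self.bits = bytearray(table_size)  # one byte per bit, for clarity

    def add(self, key):
        for position in md5_hash_positions(key, self.table_size):
            self.bits[position] = 1

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p] for p in md5_hash_positions(key, self.table_size))


summary = BloomFilter(table_size=1024)
summary.add("http://example.org/index.html")
print(summary.might_contain("http://example.org/index.html"))
```

A `False` answer is always correct, while a `True` answer may be a false positive, which is the price paid for the compact summary.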


Spatial Hashing

Alphanumeric hashing: string → hash value

Spatial coordinates as string?

– (Long/Lat) 23.123 deg. East, 38.01 deg. North

– Equal to

– 23.12 deg. East, 38.02 deg. North ???

If the two coordinate pairs are hashed as strings, their hash values will not match (they will be totally different, given a good hash function such as MD5)

Spatial data is different from alphanumeric data since its semantics have to be seen in the context of a reference system

In the context of matching hash values, tolerance is needed to test for equality
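A small sketch makes the mismatch concrete. It hashes the two coordinate pairs from the slide as strings, then snaps them to a regular grid before comparing. The helper names and the grid resolution of 10 cells per degree are assumptions chosen for illustration only.

```python
import hashlib
import math


def string_hash(lon, lat):
    """Hash a coordinate pair as a plain string (no spatial semantics)."""
    return hashlib.md5(f"{lon},{lat}".encode("utf-8")).hexdigest()


def grid_cell(lon, lat, cells_per_degree=10):
    """Snap a position to a regular grid cell; the cell size sets the tolerance."""
    return (math.floor(lon * cells_per_degree), math.floor(lat * cells_per_degree))


a = (23.123, 38.01)
b = (23.12, 38.02)

# As strings, the two nearby positions produce unrelated hash values.
print(string_hash(*a) == string_hash(*b))

# Snapped to a 0.1-degree grid, both positions fall into the same cell.
print(grid_cell(*a) == grid_cell(*b))
```

The grid gives tolerance only within a cell: two positions that are close together but on opposite sides of a cell border still map to different cells.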


Spatial Subdivisions

Regular subdivisions

Occupation-based, e.g., adaptive k-d-tree


Spatial Subdivision

[Figure panels: Earthquakes 1964-83; Earthquakes 1964-92]

Computing spatial subdivisions of space based on existing data
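One way to compute such an occupation-based subdivision is an adaptive k-d split: cells are cut at the median of the data, alternating between the x and y axes, until no cell holds more than a given number of points. The sketch below is an assumption about how this could look, not the method used for the earthquake figures; `kd_subdivide`, `locate`, and the `capacity` parameter are hypothetical names.

```python
def kd_subdivide(points, bounds, capacity=2, depth=0):
    """Return leaf cells (xmin, ymin, xmax, ymax) of an adaptive k-d subdivision."""
    if len(points) <= capacity:
        return [bounds]
    axis = depth % 2  # alternate between x (0) and y (1)
    ordered = sorted(points, key=lambda p: p[axis])
    split = ordered[len(ordered) // 2][axis]  # median coordinate
    lower = [p for p in ordered if p[axis] < split]
    upper = [p for p in ordered if p[axis] >= split]
    if not lower or not upper:
        return [bounds]  # points cannot be separated on this axis; stop here
    xmin, ymin, xmax, ymax = bounds
    if axis == 0:
        lower_bounds, upper_bounds = (xmin, ymin, split, ymax), (split, ymin, xmax, ymax)
    else:
        lower_bounds, upper_bounds = (xmin, ymin, xmax, split), (xmin, split, xmax, ymax)
    return (kd_subdivide(lower, lower_bounds, capacity, depth + 1)
            + kd_subdivide(upper, upper_bounds, capacity, depth + 1))


def locate(cells, point):
    """Return the index of the cell containing `point` (cells are half-open)."""
    x, y = point
    for index, (xmin, ymin, xmax, ymax) in enumerate(cells):
        if xmin <= x < xmax and ymin <= y < ymax:
            return index
    return None


events = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9)]
cells = kd_subdivide(events, bounds=(0, 0, 10, 10), capacity=2)
print(cells)
print(locate(cells, (9, 9)))
```

Dense regions end up with small cells and sparse regions with large ones, so the cell index of a position reflects how the existing data is distributed.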


Spatial Hashing

Linearize the spatial subdivisions using space-filling curves

Space-filling curves as hash functions
– Z-ordering (Peano curves)
– Hilbert curves
– …

Example: hashing positions using the above space-filling curves
– Determine the spatial subdivision the position falls into
– Compute the respective linearization values for each of the space-filling curves (hash functions)
– Take the modulus of each value by the size of the bit vector
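The three steps of the example can be sketched as follows, assuming a regular grid of 2^order by 2^order cells over the whole longitude/latitude range as the spatial subdivision. All function names, the grid layout, and the parameters `order` and `m` are illustrative assumptions; the Hilbert index uses the standard iterative cell-to-distance conversion.

```python
def grid_cell(lon, lat, order):
    """Step 1: find the cell of a 2^order x 2^order world grid containing the position."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


def z_order_index(order, x, y):
    """Z-ordering: interleave the bits of x and y."""
    z = 0
    for i in range(order):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def hilbert_index(order, x, y):
    """Hilbert curve: distance of cell (x, y) along the curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def spatial_hash_positions(lon, lat, order, m):
    """Steps 2 and 3: linearize the cell with each curve, then reduce modulo m."""
    x, y = grid_cell(lon, lat, order)
    return [z_order_index(order, x, y) % m, hilbert_index(order, x, y) % m]


print(spatial_hash_positions(23.123, 38.01, order=10, m=4096))
print(spatial_hash_positions(23.12, 38.02, order=10, m=4096))
```

Because both positions fall into the same grid cell at this resolution, they produce the same bit positions, which is the tolerance that string hashing lacks.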


Overall Scenario

PMOs containing spatial data communicate signatures to the CAS

The CAS “ORs” the signatures and keeps track of associations
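A minimal sketch of this scenario, assuming signatures are bit vectors held as Python integers: the `CAS` class, its method names, and the PMO identifiers are hypothetical and only illustrate the OR-aggregation and the bookkeeping of which PMO sent which signature.

```python
def make_signature(bit_positions):
    """Build a bit-vector signature (as an int) from a list of bit positions."""
    signature = 0
    for position in bit_positions:
        signature |= 1 << position
    return signature


class CAS:
    """Aggregates PMO signatures and remembers which PMO sent which one."""

    def __init__(self):
        self.signatures = {}  # association: PMO id -> signature
        self.aggregate = 0    # OR of all signatures received so far

    def register(self, pmo_id, signature):
        self.signatures[pmo_id] = self.signatures.get(pmo_id, 0) | signature
        self.aggregate |= signature

    def candidates(self, query_bits):
        """Return the PMOs whose signature has all query bits set."""
        if self.aggregate & query_bits != query_bits:
            return []  # no PMO known to this CAS can match
        return sorted(pmo for pmo, sig in self.signatures.items()
                      if sig & query_bits == query_bits)


cas = CAS()
cas.register("pmo-a", make_signature([1, 5]))
cas.register("pmo-b", make_signature([2, 5, 9]))
print(cas.candidates(make_signature([5])))
print(cas.candidates(make_signature([2, 9])))
```

The aggregated signature can rule out a CAS quickly, but bits contributed by different PMOs may combine into a match that no single PMO satisfies, so the per-PMO associations are still needed to narrow the result.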


Questions

Types of queries, e.g., range queries vs. “point” queries

Spatial hash functions by using grids and space filling curves

Distinct type of data that deserves special treatment?

Can stand as a single query parameter? Needs more context?


END


Lookup Problem

Given a set S = {x1, x2, x3, …, xn} on a universe U, want to answer queries of the form: Is y ∈ S?

Example: a set of URLs from the universe of all possible URL strings.

Bloom filter provides an answer in
– “Constant” time (time to hash).
– Small amount of space.
– But with some probability of being wrong.


Optimal Choice of Parameters

Given m bits for the filter and n elements, choose the number k of hash functions

Find the optimum at k = (ln 2) m/n by calculus

[Figure: false positive rate (0 to 0.1) versus number of hash functions (0 to 10) for m/n = 8; optimal k = 8 ln 2 ≈ 5.55]
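The trade-off can be checked numerically. The sketch below uses the usual approximation for the false positive rate of a Bloom filter, (1 − e^(−kn/m))^k, which the slide does not state explicitly; the function names are hypothetical. Since k must be an integer in practice, it also compares the two integers around the real-valued optimum.

```python
import math


def false_positive_rate(m, n, k):
    """Approximate false positive rate for m bits, n elements, k hash functions."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Real-valued optimum k = (ln 2) * m / n."""
    return math.log(2) * m / n


def best_integer_k(m, n):
    """Pick the better of the two integers next to the real-valued optimum."""
    k = optimal_k(m, n)
    low, high = max(1, math.floor(k)), max(1, math.ceil(k))
    return min((low, high), key=lambda c: false_positive_rate(m, n, c))


m, n = 8, 1  # m/n = 8 bits per element, as in the plot
print(optimal_k(m, n))
print(best_integer_k(m, n))
for k in range(1, 11):
    print(k, round(false_positive_rate(m, n, k), 4))
```

For m/n = 8 the curve is quite flat around the optimum, so choosing a slightly smaller k to save hashing work costs little in accuracy.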


Spatial Subdivision