Spatial Issues in DBGlobe
Dieter Pfoser
Location Parameter in Services
Entering the harbor (x,y position)…
…triggers information request
Spatial Data in DBGlobe
Spatial information might be the predominant type of data for structuring information content
PMOs contain spatially (+temporally) referenced data
These data are distributed over a set of devices. How can we relate all these data to one spatial location?
“What have we stored for this location?”
This introduces space as the organizing criterion for data, i.e., a distinguished context
Spatial Data… (cont’d)
Each PMO contains a set of positions that reference content
The job of DBGlobe is now to find this content based on a given positional reference
Position → {PMO (id)} → content
BUT!
– Content is referenced by position as the only argument!
– The question is how to introduce further filters that retrieve only relevant (interesting) content based on additional parameters
Distributed Indexes
Using tree-based structures, a global index needs to be constructed and some portion of the index replicated in the CAS
Given the set of locations for each PMO, one could compute a signature that the PMO communicates to a CAS (and further aggregated there)
This signature is used to scan potentially all PMOs for relevant spatial information
Bloom Filters: High Level Idea
Everyone thinks they need to know exactly what everyone else has. Give me a list of what you have.
Lists are long and unwieldy. Using Bloom filters, you can get small, approximate lists. Give me information so I can figure out what you have.
A Bloom Filter: To check an object’s name against a Bloom filter summary, the name is hashed with k different hash functions (here, k = 3) and the bits corresponding to the results are checked.
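A minimal Python sketch of this check, assuming the bit vector is kept as a Python integer and that the k hash functions are derived by salting SHA-256 with the function index (an illustrative choice, not the scheme from the figure). The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit vector (kept in a Python int) and k hashes."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m      # number of bits in the vector
        self.k = k      # number of hash functions
        self.bits = 0   # all bits start at 0

    def _positions(self, name: str) -> list[int]:
        # Assumption: hash function i is SHA-256 of "i:name", reduced mod m.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, name: str) -> None:
        for p in self._positions(name):
            self.bits |= 1 << p          # set the bit for each hash value

    def might_contain(self, name: str) -> bool:
        # All k bits set: a member, or a false positive.
        # Any bit clear: definitely not a member.
        return all((self.bits >> p) & 1 for p in self._positions(name))


summary = BloomFilter(m=1024, k=3)
summary.add("harbor-photo-17")
print(summary.might_contain("harbor-photo-17"))   # True
```

A "no" answer is always correct; a "yes" answer may be a false positive, which is the price paid for the small summary.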
Bloom Filter Example
[Figure: bit vector and hash functions]
Bloom Filter
Multiple hash functions are used to map values onto the bit vector
Example: Web proxy cache sharing
– Hashing URLs using the MD5 algorithm, a cryptographic message digest algorithm that hashes arbitrary-length strings to 128 bits
– Hash functions are built by first calculating the 128-bit MD5 signature of a URL, then dividing the 128 bits into four 32-bit words, and finally taking the modulus of each 32-bit word by the table size
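The MD5-based construction described above can be sketched in a few lines of Python. The function name `md5_hash_positions` is hypothetical, and reading each 32-bit word in big-endian byte order is an assumption (the description does not fix a byte order).

```python
import hashlib


def md5_hash_positions(url: str, table_size: int) -> list[int]:
    """Four hash values for a URL: MD5 -> four 32-bit words -> each mod table size."""
    digest = hashlib.md5(url.encode("utf-8")).digest()   # 16 bytes = 128 bits
    # Split the digest into four 32-bit words (big-endian is an assumption).
    words = [int.from_bytes(digest[i:i + 4], "big") for i in range(0, 16, 4)]
    return [w % table_size for w in words]               # one bit index per word


print(md5_hash_positions("http://www.example.com/", 8192))
```

One MD5 computation thus yields four independent-looking hash functions, which is cheaper than computing four separate digests.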
Spatial Hashing
Alphanumeric hashing yields a string hash value. Spatial coordinates as strings?
– Is (Long/Lat) 23.123 deg. East, 38.01 deg. North equal to 23.12 deg. East, 38.02 deg. North ???
If the two coordinate pairs are hashed as strings, their hash values will not match (they will be totally different, given a good hash function such as MD5)
Spatial data is different from alphanumeric data since its semantics have to be seen in the context of a reference system
When matching hash values, a tolerance is needed to test for equality
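A small Python sketch of the problem and of one way to add tolerance. The string format, the use of MD5 for the naive hash, the 0.1-degree cell size, and the function names are assumptions for illustration; snapping to a regular grid cell is one option in the spirit of grid-based spatial hashing.

```python
import hashlib
import math


def string_hash(lon: float, lat: float) -> str:
    """Naive approach: MD5 of the coordinates written as a string."""
    return hashlib.md5(f"{lon},{lat}".encode("utf-8")).hexdigest()


def grid_key(lon: float, lat: float, cell_size: float = 0.1) -> tuple[int, int]:
    """Tolerant approach: snap the position to a regular grid cell first."""
    return math.floor(lon / cell_size), math.floor(lat / cell_size)


a = (23.123, 38.01)   # (Long, Lat) from the slide
b = (23.12, 38.02)
print(string_hash(*a) == string_hash(*b))   # False: completely different digests
print(grid_key(*a) == grid_key(*b))         # True: same 0.1-degree cell (231, 380)
```

Grid snapping is only a partial answer: two positions a tiny distance apart on opposite sides of a cell boundary still map to different cells, so tolerant lookups may need to probe neighboring cells as well.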
Spatial Subdivisions
– Regular subdivisions
– Occupation-based, e.g., adaptive k-d-tree
Spatial Subdivision
[Figure panels: Earthquakes 1964-83; Earthquakes 1964-92]
Computing subdivisions of space based on existing data
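One way to compute an occupation-based subdivision from existing data is an adaptive k-d split, sketched below under assumptions not stated in the slides: a cell is split at the median point coordinate, alternating between x and y, until each cell holds at most `capacity` points. `adaptive_kd` and `Cell` are hypothetical names.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]
Bounds = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Cell:
    """One leaf cell of the subdivision and the points it contains."""
    bounds: Bounds
    points: list[Point] = field(default_factory=list)


def adaptive_kd(points: list[Point], bounds: Bounds,
                capacity: int = 4, depth: int = 0) -> list[Cell]:
    """Split at the median coordinate, alternating x (even depth) and y (odd
    depth), until every leaf holds at most `capacity` points."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [Cell(bounds, list(points))]
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = pts[mid][axis]                 # median coordinate on this axis
    xmin, ymin, xmax, ymax = bounds
    if axis == 0:                          # vertical split line x = split
        low, high = (xmin, ymin, split, ymax), (split, ymin, xmax, ymax)
    else:                                  # horizontal split line y = split
        low, high = (xmin, ymin, xmax, split), (xmin, split, xmax, ymax)
    return (adaptive_kd(pts[:mid], low, capacity, depth + 1)
            + adaptive_kd(pts[mid:], high, capacity, depth + 1))


sample = [(1, 1), (2, 8), (3, 3), (8, 2), (9, 9), (7, 7), (1.5, 1.2), (1.2, 1.8)]
cells = adaptive_kd(sample, (0, 0, 10, 10), capacity=2)
print(len(cells))   # 4
```

Dense regions end up covered by many small cells and sparse regions by few large ones, which is the point of an occupation-based subdivision compared with a regular grid.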
Spatial Hashing
Linearize the spatial subdivisions using space-filling curves
Space-filling curves as hash functions
– Z-ordering (Peano curves)
– Hilbert curves
– …
Example:
– Hash positions using the above space-filling curves
– Determine the spatial subdivision the position falls into
– Compute the respective linearization values for each of the space-filling curves (hash functions)
– Take the modulus of each value by the size of the bit vector
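The steps above can be sketched in Python. Assumptions: a regular 2^order × 2^order grid over the longitude/latitude bounds stands in for the spatial subdivision (an adaptive subdivision would need its own cell numbering), `z_order` interleaves the bits of the cell coordinates, and `hilbert` uses the common iterative xy-to-d conversion. All function names are hypothetical.

```python
def z_order(x: int, y: int, order: int) -> int:
    """Z-order (Morton) value of grid cell (x, y): interleave their bits."""
    z = 0
    for i in range(order):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def hilbert(x: int, y: int, order: int) -> int:
    """Hilbert-curve index of cell (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the remaining sub-square has the base orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def spatial_hashes(lon: float, lat: float, order: int, m: int,
                   bounds=(-180.0, -90.0, 180.0, 90.0)) -> list[int]:
    """Bit-vector positions for a position: one per space-filling curve."""
    xmin, ymin, xmax, ymax = bounds
    n = 1 << order
    # 1. Determine the grid cell the position falls into (clamped to the grid).
    cx = min(int((lon - xmin) / (xmax - xmin) * n), n - 1)
    cy = min(int((lat - ymin) / (ymax - ymin) * n), n - 1)
    # 2. Linearize with each curve, 3. reduce modulo the bit-vector size m.
    return [z_order(cx, cy, order) % m, hilbert(cx, cy, order) % m]


print(spatial_hashes(23.123, 38.01, order=10, m=1024)
      == spatial_hashes(23.12, 38.02, order=10, m=1024))   # True
```

With order 10, both nearby positions from the Spatial Hashing slide fall into grid cell (577, 728), so they produce identical hash values and the tolerance comes from the cell size rather than from the hash function.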
PMOs containing spatial data communicate signatures to the CAS
CAS “ORs” signatures and keeps track of associations
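A minimal sketch of the CAS side, assuming each PMO signature is an m-bit vector (here a Python int) built with the spatial hash functions, and that a query position has already been hashed to bit positions in the same way. `CAS`, `register`, and `candidates` are hypothetical names. Because Bloom-filter-style signatures admit false positives, the result is a list of candidate PMOs that would still have to be contacted to confirm.

```python
class CAS:
    """Hypothetical CAS index: per-PMO signatures plus their bitwise OR."""

    def __init__(self, m: int) -> None:
        self.m = m                            # signature length in bits
        self.aggregate = 0                    # OR of all registered signatures
        self.signatures: dict[str, int] = {}  # PMO id -> m-bit signature

    def register(self, pmo_id: str, signature: int) -> None:
        """Store (or replace) a PMO's signature and rebuild the OR aggregate."""
        self.signatures[pmo_id] = signature
        self.aggregate = 0
        for sig in self.signatures.values():
            self.aggregate |= sig

    def candidates(self, bit_positions: list[int]) -> list[str]:
        """PMOs whose signature has all queried bits set.

        May include false positives; never misses a PMO whose signature
        has all of those bits set."""
        mask = 0
        for p in bit_positions:
            if not 0 <= p < self.m:
                raise ValueError(f"bit position {p} outside 0..{self.m - 1}")
            mask |= 1 << p
        if (self.aggregate & mask) != mask:
            return []   # cheap rejection using only the aggregated signature
        return [pid for pid, sig in self.signatures.items()
                if (sig & mask) == mask]


cas = CAS(m=16)
cas.register("pmo-a", (1 << 3) | (1 << 7))
cas.register("pmo-b", (1 << 7) | (1 << 9))
print(cas.candidates([3, 7]))   # ['pmo-a']
```

The aggregated OR is what a CAS could pass further up for aggregation; the per-PMO signatures are the "associations" needed to route a query to specific PMOs.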
Overall Scenario
Questions
Types of queries, e.g., range queries vs. “point” queries
Spatial hash functions using grids and space-filling curves
Distinct type of data that deserves special treatment?
Can stand as a single query parameter? Needs more context?
END
Lookup Problem
Given a set S = {x1, x2, x3, …, xn} on a universe U, we want to answer queries of the form: Is y ∈ S?
Example: a set of URLs from the universe of all possible URL strings.
A Bloom filter provides an answer in:
– “Constant” time (time to hash)
– Small amount of space
– But with some probability of being wrong
Optimal Choice of Parameters
Given m bits for the filter and n elements, choose the number k of hash functions
Find the optimum at k = (ln 2) m/n by calculus
[Figure: false positive rate (0 to 0.1) vs. number of hash functions (0 to 10) for m/n = 8; optimal k = 8 ln 2 ≈ 5.55]
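The calculus step can be spelled out as follows; this is the standard derivation, assuming the hash functions are independent and uniformly distributed over the m bits.

```latex
\begin{align*}
p &= \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}
  && \text{probability that a given bit is still 0}\\
f &= (1 - p)^k \approx \left(1 - e^{-kn/m}\right)^k
  && \text{false positive rate}\\
\ln f &= k \ln(1 - p) = -\tfrac{m}{n}\,\ln p \,\ln(1 - p)
  && \text{using } k = -\tfrac{m}{n}\ln p\\
p &= \tfrac{1}{2} \;\Rightarrow\; k = \tfrac{m}{n}\ln 2
  && \text{minimizes } f
\end{align*}
```

The product ln p · ln(1 − p) is symmetric about p = 1/2 and largest there, so f is smallest when half of the bits are set. For m/n = 8 this gives k = 8 ln 2 ≈ 5.55 and f ≈ 2^(−5.55) ≈ 0.021; in practice k is rounded to a nearby integer.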