1 SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux.
1
SD-Rtree: A Scalable Distributed Rtree
Witold Litwin & Cédric du Mouza & Philippe Rigaux
2
Plan
• Introduction
• SDDS
• R-tree
• SD-Rtree
• Evolution
• Balancing
• Spatial Rotations
• Overlapping
• Redundant Coverage
• Queries
• Performance
• Conclusion
3
SDDS Principles (1993)
• Data are at server nodes
  • Servers communicate through point-to-point messaging
  • Overloaded servers split over new servers
• Queries go to client nodes, which use local images of the SDDS
• No central addressing component
• A node can be both client and server (peer)
4
SDDS Principles (1993)
• An outdated image may send a query to an incorrect server
• Servers forward such a query to the correct server
• The image gets adjusted
  • An Image Adjustment Message (IAM) comes back
  • The client does not repeat the same error twice
• Data are basically in the RAM of the servers
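The forwarding and image-adjustment cycle above can be sketched with a toy example. This is only an illustration under strong assumptions: a hypothetical one-dimensional SDDS with two servers that own key intervals, a global `owner` function standing in for server-to-server forwarding, and an IAM that simply carries the true intervals of the servers involved. None of these names come from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical 1-D SDDS: each server owns a half-open key interval [low, high).
servers: Dict[str, Tuple[int, int]] = {"s0": (0, 50), "s1": (50, 100)}

def owner(key: int) -> str:
    """Server that really holds the key (stands in for server-side forwarding)."""
    return next(s for s, (lo, hi) in servers.items() if lo <= key < hi)

class Client:
    def __init__(self) -> None:
        # Outdated image: the client still believes s0 owns every key.
        self.image: Dict[str, Tuple[int, int]] = {"s0": (0, 100)}

    def lookup(self, key: int) -> List[str]:
        """Return the list of servers visited by the query."""
        target = next(s for s, (lo, hi) in self.image.items() if lo <= key < hi)
        path = [target]
        correct = owner(key)
        if correct != target:
            path.append(correct)  # the incorrect server forwards the query
            # IAM (assumed content): true intervals of the two servers involved.
            self.image[target] = servers[target]
            self.image[correct] = servers[correct]
        return path

client = Client()
first = client.lookup(70)   # addressing error: s0 forwards to s1
second = client.lookup(70)  # image adjusted: s1 is reached directly
```

After the first lookup the client image matches the real partitioning, so the same addressing error is not repeated.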
5
SD-Rtree: a Spatial SDDS
Distributed Spatial Data
6
SD-Rtree: a Spatial SDDS
• Distributed Index
• No central component
7
SD-Rtree: a Spatial SDDS
• Point & Window Queries
• kNN queries (future)
8
SD-Rtree: Generalizes R-tree
R-tree:
• Nodes are minimal bounding boxes
  • Leaf nodes point to data
  • Internal nodes bound subtrees
• Boxes may overlap
• Nodes split on overflow
• Splits generate a balanced m-ary tree
9
SD-Rtree: Generalizes R-tree
R-tree:
• An insert may go through multiple paths
• It ends up in the smallest bounding box that contains it, if there is any
• Otherwise, one of the boxes gets enlarged
• The box may split
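The choice of the receiving box can be sketched as follows. The slide only says that one of the boxes gets enlarged; picking the box with the least area enlargement is an assumption here (it is the classical R-tree heuristic), and `Rect` and `choose_box` are hypothetical names used for illustration.

```python
from typing import NamedTuple, Sequence

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

def choose_box(boxes: Sequence[Rect], obj: Rect) -> int:
    """Return the index of the box that should receive obj."""
    containing = [i for i, b in enumerate(boxes) if b.contains(obj)]
    if containing:
        # Several boxes may contain the object: take the smallest one.
        return min(containing, key=lambda i: boxes[i].area())
    # No box contains it: enlarge the box that grows the least (assumption).
    return min(range(len(boxes)),
               key=lambda i: boxes[i].union(obj).area() - boxes[i].area())

boxes = [Rect(0, 0, 10, 10), Rect(2, 2, 5, 5), Rect(20, 20, 30, 30)]
inside = choose_box(boxes, Rect(3, 3, 4, 4))       # contained in boxes 0 and 1
outside = choose_box(boxes, Rect(31, 31, 32, 32))  # contained in no box
```

Here `inside` is 1 (the smaller of the two containing boxes) and `outside` is 2 (the box that needs the least enlargement).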
10
SD-Rtree: Generalizes R-tree
R-tree:
• A search may go through multiple paths
• All paths may bring relevant objects
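A small sketch of why a search follows several paths: when sibling boxes overlap, a window can intersect both, and each branch may hold matching objects. The `Node` class and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)
    objects: List[Tuple[str, Box]] = field(default_factory=list)  # leaves only

def search(node: Node, window: Box, visited: List[Box]) -> List[str]:
    """Collect the names of all objects intersecting the window."""
    if not intersects(node.box, window):
        return []
    visited.append(node.box)
    hits = [name for name, box in node.objects if intersects(box, window)]
    for child in node.children:  # every intersecting child must be explored
        hits += search(child, window, visited)
    return hits

# Two overlapping leaves: the window falls in their common region.
left = Node((0, 0, 6, 6), objects=[("a", (1, 1, 2, 2)), ("b", (5, 5, 6, 6))])
right = Node((4, 4, 10, 10), objects=[("c", (4, 4, 5, 5)), ("d", (9, 9, 10, 10))])
root = Node((0, 0, 10, 10), children=[left, right])

visited: List[Box] = []
result = search(root, (4.5, 4.5, 5.5, 5.5), visited)
```

Both leaves are visited, and each contributes one object (`b` from the left leaf, `c` from the right one).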
13
SD-Rtree: a Balanced Binary Tree
The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
• Each internal node (or routing node) has exactly two sons
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the two subtrees differ by at most one
• Each server stores one data node and one routing node
14
SD-Rtree: Binary Tree Structure
• di = data node (leaf)
• ri = routing node (internal node)
15
SD-Rtree: Tree Distribution
17
SD-Rtree Balancing
• The binary tree should be height-balanced
  • The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
• The tree height is then logarithmic in the number of leaves
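The logarithmic height can be checked numerically. The sketch below assumes that a leaf has height 0 and that every internal node has exactly two sons; the sparsest balanced tree of height h then has one subtree of height h − 1 and one of height h − 2, which gives a Fibonacci-like recurrence. The function name `min_leaves` and the constants in the final check are illustrative choices, not values from the slides.

```python
import math

def min_leaves(height: int) -> int:
    """Fewest leaves in a height-balanced binary tree of the given height,
    where every internal node has exactly two sons and a leaf has height 0."""
    a, b = 1, 2  # min_leaves(0), min_leaves(1)
    for _ in range(height):
        # min_leaves(h) = min_leaves(h - 1) + min_leaves(h - 2)
        a, b = b, a + b
    return a

table = [(h, min_leaves(h)) for h in range(6)]
# table == [(0, 1), (1, 2), (2, 3), (3, 5), (4, 8), (5, 13)]

# The number of leaves grows exponentially with the height, so the height
# stays within a constant factor of log2(number of leaves).
bounded = all(h <= 1.45 * math.log2(min_leaves(h)) + 1 for h in range(60))
```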
18
SD-Rtree Balancing
• SD-Rtree balancing occurs during splits
  • Messages are sent bottom-up to adjust the height of the ancestor nodes
  • A rotation occurs if an ancestor is imbalanced
• SD-Rtree rotations are spatial
  • They change the rectangles of internal nodes
  • The best rotation minimizes rectangle overlap
  • Tie-breaking minimizes the "dead space"
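The selection rule (least overlap first, least dead space as a tie-breaker) can be sketched as a lexicographic comparison. This is a simplified illustration: each candidate rotation is reduced to the pair of sibling rectangles it would produce, and dead space is taken as the area of their bounding box not covered by either rectangle. Both simplifications, and all names below, are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def area(b: Box) -> float:
    # An empty intersection has a negative side; clamp it to zero.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a: Box, b: Box) -> Box:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def mbb(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def cost(candidate: Tuple[Box, Box]) -> Tuple[float, float]:
    """(overlap, dead space) of the two sibling rectangles of a candidate."""
    left, right = candidate
    overlap = area(intersection(left, right))
    dead_space = area(mbb(left, right)) - area(left) - area(right) + overlap
    return (overlap, dead_space)

def best_rotation(candidates: List[Tuple[Box, Box]]) -> int:
    # Tuples compare lexicographically: overlap first, then dead space.
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))

candidates = [
    ((0, 0, 4, 4), (2, 2, 6, 6)),  # overlap 4
    ((0, 0, 2, 2), (4, 4, 6, 6)),  # no overlap, dead space 28
    ((0, 0, 3, 6), (3, 0, 6, 6)),  # no overlap, no dead space
]
best = best_rotation(candidates)
```

The last two candidates tie on overlap, so the dead-space criterion selects the third one (`best` is 2).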
20
Rotation Pattern
Properties
• The sons of a node are not ordered => more freedom for reorganizing the tree
• Any imbalanced node matches a rotation pattern
A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
• h(c) = h(d) = h(f) = n − 1 (n > 0)
• h(g) = max(0, n − 2)
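The heights in the pattern can be checked with a few lines of code. The sketch below computes the imbalance at a, then regroups the four subtrees f, g, d and c two by two under a new root and verifies that each regrouping is balanced. The regrouping step is only one way to see why unordered sons leave several valid choices; it is an illustration, not the rotation procedure of the slides, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

def pattern_heights(n: int) -> Dict[str, int]:
    """Heights of the four subtrees in the pattern a(b(e(f,g),d),c)."""
    assert n > 0
    return {"c": n - 1, "d": n - 1, "f": n - 1, "g": max(0, n - 2)}

def parent_height(left: int, right: int) -> int:
    return max(left, right) + 1

def imbalance_before(n: int) -> int:
    """Height difference between the two sons of a, namely b and c."""
    h = pattern_heights(n)
    e = parent_height(h["f"], h["g"])
    b = parent_height(e, h["d"])
    return abs(b - h["c"])

def regroupings(n: int) -> List[Tuple[str, int, int, int]]:
    """Pair g with f, d or c; the two remaining subtrees form the other pair.
    Returns (partner of g, largest height difference, left height, right height)."""
    h = pattern_heights(n)
    results = []
    for partner in ("f", "d", "c"):
        a1, a2 = [x for x in ("f", "d", "c") if x != partner]
        left = parent_height(h["g"], h[partner])
        right = parent_height(h[a1], h[a2])
        worst = max(abs(h["g"] - h[partner]), abs(h[a1] - h[a2]), abs(left - right))
        results.append((partner, worst, left, right))
    return results

always_imbalanced = all(imbalance_before(n) == 2 for n in range(1, 20))
always_fixable = all(worst <= 1
                     for n in range(1, 20)
                     for _, worst, _, _ in regroupings(n))
```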
21
SD-Rtree: Spatial Rotation
22
Rotation Cost
• Constant number of messages (3 or 6, depending on the choice)
• Few rotations in practice
  • In particular when the dataset is uniformly distributed
  • See our experiments
23
SD-Rtree: Images
• Each image defines the addressing structure
• It resides as a cache on a client or on a peer
• It starts with the address of the contact server
  • IAMs make it a subtree
• Splits make images outdated
  • IAMs adjust them incrementally
24
Image Adjustment
• The client contacts a server with a query
• Each incorrect server initiates a traversal of the tree
• During the traversal, the description of the nodes is collected
• The correct server sends the up-to-date tree structure
• The client updates its image
26
Out-of-range situation
27
Insertion of objects
28
Overlapping management
• The directory rectangles in an Rtree may overlap
  • The local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
• SD-Rtree servers maintain data on node overlapping: Redundant Coverage
  • It avoids systematically accessing the root node
29
Redundant Coverage Example
• The region common to A and B is stored on both nodes
• If a point query sent to A falls in the region shared with B: A sends a point query message to B
• For D: we must keep the intersection with C or B: here empty.
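The example can be sketched with three data nodes. The rectangles chosen for A, B and D, the `DataNode` class and the helper names are assumptions made for illustration; only the principle (the shared region is stored on both sides and triggers a forward) comes from the slide.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]

def intersection(a: Box, b: Box) -> Optional[Box]:
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None

def contains(box: Box, p: Point) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

class DataNode:
    def __init__(self, name: str, box: Box) -> None:
        self.name = name
        self.box = box
        self.shared: Dict[str, Box] = {}  # other node -> common region

def record_overlap(a: DataNode, b: DataNode) -> None:
    """Store the common region on both nodes (nothing if it is empty)."""
    region = intersection(a.box, b.box)
    if region is not None:
        a.shared[b.name] = region
        b.shared[a.name] = region

def nodes_to_query(node: DataNode, p: Point) -> List[str]:
    """The node itself, plus every node sharing a region that holds the point."""
    return [node.name] + [other for other, region in node.shared.items()
                          if contains(region, p)]

A = DataNode("A", (0, 0, 6, 6))
B = DataNode("B", (4, 4, 10, 10))
D = DataNode("D", (12, 0, 14, 2))
record_overlap(A, B)
record_overlap(B, D)  # empty intersection: nothing is stored

in_shared = nodes_to_query(A, (5, 5))   # ["A", "B"]
in_a_only = nodes_to_query(A, (1, 1))   # ["A"]
```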
30
Queries
• Point queries and window queries. The technique is similar to the insertion algorithm:
  • Search the client image for a server whose mbb contains the point or intersects the window
  • Send the query to this server
  • If the server actually covers the point or the window, it answers the client; otherwise it sends the query to its parent node
• A server uses the overlapping information to transmit the query
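The routing steps above can be sketched for a point query. The sketch assumes a tiny tree with one routing node and two data nodes, a client image that predates a split, and a plain top-down descent once an ancestor covers the point; the overlapping information is left out. `TreeNode`, `pick_server` and `route` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]

def contains(box: Box, p: Point) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

@dataclass(eq=False)
class TreeNode:
    name: str
    box: Box
    parent: Optional["TreeNode"] = None
    sons: List["TreeNode"] = field(default_factory=list)

def pick_server(image: Dict[str, Box], p: Point) -> str:
    """First server of the client image whose mbb contains the point."""
    return next(name for name, box in image.items() if contains(box, p))

def route(start: TreeNode, p: Point) -> List[str]:
    """Names of the nodes visited by a point query entering the tree at start."""
    path = [start.name]
    node = start
    # Bottom-up phase: climb while the current node does not cover the point.
    while not contains(node.box, p) and node.parent is not None:
        node = node.parent
        path.append(node.name)
    # Top-down phase (assumption): descend into every son covering the point.
    stack = [s for s in node.sons if contains(s.box, p)]
    while stack:
        current = stack.pop()
        path.append(current.name)
        stack.extend(s for s in current.sons if contains(s.box, p))
    return path

r1 = TreeNode("r1", (0, 0, 10, 10))
d1 = TreeNode("d1", (0, 0, 4, 10), parent=r1)
d2 = TreeNode("d2", (6, 0, 10, 10), parent=r1)
r1.sons = [d1, d2]
nodes = {n.name: n for n in (r1, d1, d2)}

# Outdated image: the client still thinks d1 covers the whole space.
image: Dict[str, Box] = {"d1": (0, 0, 10, 10)}

miss = route(nodes[pick_server(image, (7, 3))], (7, 3))  # ["d1", "r1", "d2"]
hit = route(nodes[pick_server(image, (2, 2))], (2, 2))   # ["d1"]
```

A query that the contacted server really covers is answered in place; the other one climbs to the parent and comes back down to the correct data node.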
31
Experiments
• Synthetic data (points and rectangles) generated with GSTD
  • 50,000 to 500,000 objects
  • 0 to 3,000 queries
  • Server capacity: 3,000 objects
• Comparison of three SD-Rtree variants:
  • BASIC: no image; every query is processed top-down from the root
  • IMSERVER: no IAMs among the servers
  • IMCLIENT: client images
33
Per Insert Cost
34
Cost of balancing
35
Image convergence
36
Distribution of messages
37
Cost per Query
38
Conclusion
• SD-Rtree is an efficient scalable distributed Rtree
  • For very large spatial data collections
  • Can be processed in distributed RAM
    • Access time much faster than to disk data
  • Load balancing
    • Spatial rotations
  • Overlapping management
    • Redundant coverage
  • O(log n) worst insert cost
• Future work
  • kNN queries
  • Objects distribution balancing on servers