SD-Rtree: A Scalable Distributed Rtree. Witold Litwin & Cédric du Mouza & Philippe Rigaux.

Post on 28-Dec-2015


1

SD-Rtree: A Scalable Distributed Rtree

Witold Litwin & Cédric du Mouza & Philippe Rigaux

2

Plan

• Introduction
• SDDS
• R-tree
• SD-Rtree
• Evolution
• Balancing
• Spatial Rotations
• Overlapping
• Redundant Coverage
• Queries
• Performance
• Conclusion

3

SDDS Principles (1993)

• Data are at server nodes
• Nodes communicate through point-to-point messaging
• Overloaded servers split over new servers
• Queries go to client nodes, which use local images of the SDDS
• No central addressing component
• A node can be both client and server (peer)

4

SDDS Principles (1993)

• An outdated image may send a query to an incorrect server
• Servers forward such a query to the correct server
• The image gets adjusted: an Image Adjustment Message (IAM) comes back
• The client does not repeat the same error twice
• Data are basically in the RAM of the servers
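
The forwarding and IAM mechanism can be sketched as follows. This is a minimal illustration, assuming a toy one-dimensional key space split into ranges; `Server`, `Client`, `send`, and the range layout are hypothetical names, not part of any SDDS implementation. The lookup of the correct server by scanning all servers is only a stand-in for real server-to-server forwarding.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    lo: int   # inclusive lower bound of the key range held here
    hi: int   # exclusive upper bound

    def covers(self, key):
        return self.lo <= key < self.hi

@dataclass
class Client:
    # The image: (lo, hi, server name) ranges the client currently believes in.
    image: list = field(default_factory=list)

    def guess(self, key):
        for lo, hi, name in self.image:
            if lo <= key < hi:
                return name
        return self.image[0][2]  # fall back to the first known server

def send(client, servers, key):
    """Route a query; return (name of the correct server, number of forwards)."""
    wrong = servers[client.guess(key)]
    if wrong.covers(key):
        return wrong.name, 0
    # Stand-in for forwarding: find the server that really holds the key.
    target = next(s for s in servers.values() if s.covers(key))
    # IAM: the client learns the true ranges of both servers involved.
    client.image = [r for r in client.image if r[2] not in (wrong.name, target.name)]
    client.image += [(wrong.lo, wrong.hi, wrong.name), (target.lo, target.hi, target.name)]
    return target.name, 1

# s0 originally held [0, 100); it split, and s1 now holds [50, 100).
servers = {"s0": Server("s0", 0, 50), "s1": Server("s1", 50, 100)}
client = Client(image=[(0, 100, "s0")])   # outdated image
first_try = send(client, servers, 70)     # forwarded once, image adjusted
second_try = send(client, servers, 70)    # addressed directly
```

The first call costs one forward; the second reaches `s1` directly because the image was adjusted.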

5

SD-Rtree : a Spatial SDDS

Distributed Spatial Data

6

SD-Rtree : a Spatial SDDS

• Distributed Index
• No central component

7

SD-Rtree : a Spatial SDDS

• Point & Window Queries
• kNN queries (future)

8

SD-Rtree : Generalizes R-tree

R-tree:
• Nodes are minimal bounding boxes
• Leaf nodes point to data
• Internal nodes bound subtrees
• May overlap
• Split when overflow
• Generate balanced m-ary tree
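
A minimal sketch of the bounding-box notions above, assuming 2-D boxes stored as `(xmin, ymin, xmax, ymax)` tuples; the helper names `mbb`, `union`, and `overlaps` are chosen for illustration only.

```python
def mbb(points):
    """Minimal bounding box (xmin, ymin, xmax, ymax) of a set of 2-D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def union(a, b):
    """Smallest box enclosing boxes a and b: what an internal node stores."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlaps(a, b):
    """True when the two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

leaf1 = mbb([(1, 1), (3, 4), (2, 2)])   # (1, 1, 3, 4)
leaf2 = mbb([(2, 3), (6, 5)])           # (2, 3, 6, 5)
parent = union(leaf1, leaf2)            # (1, 1, 6, 5)
sibling_overlap = overlaps(leaf1, leaf2)
```

Here the two leaf boxes overlap, which is allowed in an R-tree.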

9

SD-Rtree : Generalizes R-tree

R-tree:
• An insert may go through multiple paths
• Ends up in the smallest bounding box containing the object, if there is any
• Otherwise, one of the boxes gets enlarged
• Box may split
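
The choice of a box at insertion time can be sketched as below. The slide does not say which box gets enlarged; the sketch assumes the classic least-area-enlargement heuristic, and `choose_box` is a hypothetical helper.

```python
def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def contains(b, p):
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

def enlarge(b, p):
    """Box b grown just enough to include point p."""
    return (min(b[0], p[0]), min(b[1], p[1]), max(b[2], p[0]), max(b[3], p[1]))

def choose_box(boxes, p):
    """Return (index of the chosen box, that box after inserting point p)."""
    inside = [i for i, b in enumerate(boxes) if contains(b, p)]
    if inside:
        # Some boxes already contain p: take the smallest one.
        i = min(inside, key=lambda i: area(boxes[i]))
        return i, boxes[i]
    # Assumption: otherwise enlarge the box whose area grows the least.
    i = min(range(len(boxes)),
            key=lambda i: area(enlarge(boxes[i], p)) - area(boxes[i]))
    return i, enlarge(boxes[i], p)

boxes = [(0, 0, 10, 10), (2, 2, 4, 4), (20, 20, 30, 30)]
covered = choose_box(boxes, (3, 3))     # (1, (2, 2, 4, 4))
outside = choose_box(boxes, (12, 5))    # (0, (0, 0, 12, 10))
```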

10

SD-Rtree : Generalizes R-tree

R-tree:
• Search may go through multiple paths
• All paths may bring relevant objects
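
A small sketch of a window search that follows several paths, assuming a toy tree stored as nested dictionaries; the node layout and the `search` function are illustrative assumptions, not the authors' code.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, window, visited=None):
    """Collect objects whose box intersects the window; record visited nodes."""
    if visited is None:
        visited = []
    visited.append(node["name"])
    found = []
    if "objects" in node:   # leaf node
        found = [name for name, box in node["objects"] if intersects(box, window)]
    else:                   # internal node: follow every intersecting child
        for child in node["children"]:
            if intersects(child["box"], window):
                found += search(child, window, visited)[0]
    return found, visited

leaf_a = {"name": "A", "box": (0, 0, 5, 5),
          "objects": [("a1", (1, 1, 2, 2)), ("a2", (4, 4, 5, 5))]}
leaf_b = {"name": "B", "box": (4, 3, 9, 8),
          "objects": [("b1", (4, 3, 6, 5)), ("b2", (8, 7, 9, 8))]}
leaf_c = {"name": "C", "box": (10, 0, 12, 2),
          "objects": [("c1", (10, 0, 12, 2))]}
root = {"name": "root", "box": (0, 0, 12, 8), "children": [leaf_a, leaf_b, leaf_c]}

found, visited = search(root, (5, 4, 6, 5))
```

Because the boxes of A and B overlap, the search visits both leaves and each one contributes an object.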

13

SD-Rtree: a Balanced Binary Tree

The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
• Each internal node (or routing node) has exactly two sons
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the subtrees differ by at most one
• Each server stores one data node and one routing node

14

SD-Rtree: Binary Tree Structure

• di = data node (leaf)
• ri = routing node (internal node)

15

SD-Rtree: Tree Distribution

17

SD-Rtree Balancing

• The binary tree should be height-balanced
• The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
• The tree height is then logarithmic in the number of leaves
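
The height-balance condition can be checked with a short sketch, assuming a binary tree written as nested pairs whose leaves are data-node names; this representation is an assumption made for the example.

```python
def height(t):
    """Height of a binary tree given as nested pairs; a leaf has height 0."""
    if not isinstance(t, tuple):
        return 0
    return 1 + max(height(t[0]), height(t[1]))

def balanced(t):
    """True when, at every node, the two subtree heights differ by at most 1."""
    if not isinstance(t, tuple):
        return True
    left, right = t
    return (abs(height(left) - height(right)) <= 1
            and balanced(left) and balanced(right))

ok = ((("d0", "d1"), "d2"), ("d3", "d4"))      # 5 leaves, height 3
bad = (((("d0", "d1"), "d2"), "d3"), "d4")     # 5 leaves, height 4 (a chain)
ok_is_balanced = balanced(ok)
bad_is_balanced = balanced(bad)
```

The chain-shaped tree holds the same five leaves but fails the check at its root, where the subtree heights are 3 and 0.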

18

SD-Rtree Balancing

• SD-Rtree balancing occurs during splits
• Messages are sent bottom-up to adjust the heights of the ancestor nodes
• A rotation occurs if an ancestor is imbalanced
• SD-Rtree rotations are spatial: they change the rectangles of internal nodes
• The best rotation minimizes rectangle overlapping
• Tie-breaking minimizes the « dead space »

20

Rotation Pattern

Properties
• The sons of a node are not ordered => more freedom for reorganizing the tree
• Any imbalanced node matches a rotation pattern

A rotation pattern is a subtree a(b(e(f,g),d),c) such that, with h(x) the height of the subtree rooted at x:

h(c) = h(d) = h(f) = n − 1 (n > 0)

h(g) = max(0, n − 2)
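
A sketch of how a rotation could be chosen for this pattern. It assumes that a rotation produces b(e(f,x), a(y,z)), where x is one of g, d, c and y, z are the other two; this reading of the pattern is an assumption, not something the slides state. Under it, every candidate is height-balanced: e and a both get height n, since f has height n − 1 and at least one of y, z is c or d. The overlap and dead-space measures, and the name `best_rotation`, are likewise illustrative.

```python
def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def dead_space(x, y):
    """Area of the box enclosing x and y that neither x nor y covers."""
    return area(union(x, y)) - area(x) - area(y) + overlap_area(x, y)

def best_rotation(f, g, d, c):
    """f, g, d, c: boxes of the four subtrees of the pattern.
    Returns (overlap, dead space, name of the subtree paired with f)."""
    subtrees = {"g": g, "d": d, "c": c}
    candidates = []
    for with_f, x in subtrees.items():
        y, z = [subtrees[k] for k in subtrees if k != with_f]
        e_box, a_box = union(f, x), union(y, z)
        overlap = overlap_area(e_box, a_box)        # first criterion
        dead = dead_space(f, x) + dead_space(y, z)  # tie-breaker
        candidates.append((overlap, dead, with_f))
    return min(candidates)

f, g = (0, 0, 2, 2), (8, 8, 10, 10)
d, c = (1, 1, 3, 3), (8, 6, 10, 8)
choice = best_rotation(f, g, d, c)
```

In this example, pairing f with d gives two disjoint sibling rectangles, while the other two groupings produce a large overlap.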

21

SD-Rtree : Spatial Rotation

22

Rotation Cost

• Constant number of messages (3 or 6, depending on the choice)
• Few rotations in practice, in particular when the dataset is uniformly distributed
• See our experiments

23

SD-Rtree : Images

• Each image defines the addressing structure
• Resides as cache on a client or on a peer
• Starts with the address of the contact server
• IAMs make it a subtree
• Splits make images outdated
• IAMs adjust it incrementally

24

Image Adjustment

• Client contacts a server with a query
• Each incorrect server initiates a traversal of the tree
• During the traversal, the description of the nodes is collected
• The correct server sends the up-to-date tree structure
• The client updates its image
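
The steps above can be sketched on a small in-memory tree. This is an illustration under several assumptions: the `Node` class, the climb-then-descend traversal in `route`, and an image kept as a dictionary from node names to boxes are all hypothetical, and the queried point is assumed to be covered by the root.

```python
def contains(b, p):
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

class Node:
    def __init__(self, name, box, children=()):
        self.name, self.box = name, box
        self.children, self.parent = list(children), None
        for child in self.children:
            child.parent = self

def route(start, p):
    """From a wrongly addressed leaf, climb until a box contains p, then
    descend. Returns (leaf holding p, node descriptions collected on the way)."""
    seen = {start.name: start.box}
    node = start
    while not contains(node.box, p) and node.parent is not None:
        node = node.parent
        seen[node.name] = node.box
    while node.children:
        # With overlapping boxes several children may match; take the first.
        node = next(c for c in node.children if contains(c.box, p))
        seen[node.name] = node.box
    return node, seen

d0 = Node("d0", (0, 0, 4, 4))
d1 = Node("d1", (5, 0, 9, 4))
d2 = Node("d2", (0, 5, 9, 9))
r1 = Node("r1", (0, 0, 9, 4), [d0, d1])
r0 = Node("r0", (0, 0, 9, 9), [r1, d2])

image = {"d0": (0, 0, 9, 9)}      # outdated: d0 is believed to cover everything
leaf, iam = route(d0, (6, 7))     # the query was sent to d0
image.update(iam)                 # the client adjusts its image
```

After the update, the image holds the true box of `d0` and knows `d2`, so a later query for the same point can be addressed to `d2` directly.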

26

Out-of-range situation

27

Insertion of objects

28

Overlapping management

• The directory rectangles in an Rtree may overlap
• The local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
• SD-Rtree servers maintain data on node overlapping: Redundant Coverage
• It avoids systematically accessing the root node

29

Redundant Coverage Example

• The region common to A and B is stored on both nodes
• If a point query sent to A falls in the region shared with B, A sends a point query message to B
• For D, we must keep the intersection with C or B; here it is empty
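
A minimal sketch of this idea, assuming that each node simply stores its intersection with every other node; the node boxes, the `coverage` table, and `point_query` are hypothetical, and the actual structure maintained by SD-Rtree servers may be organised differently.

```python
def intersection(a, b):
    """Common region of two boxes, or None when they are disjoint."""
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None

def contains(b, p):
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

boxes = {"A": (0, 0, 6, 6), "B": (4, 4, 10, 10), "D": (12, 0, 15, 3)}

# Each node keeps the region it shares with every other node (None when empty).
coverage = {n: {m: intersection(boxes[n], boxes[m]) for m in boxes if m != n}
            for n in boxes}

def point_query(node, p):
    """Nodes that must examine point p when the query first arrives at `node`."""
    targets = [node] if contains(boxes[node], p) else []
    for other, shared in coverage[node].items():
        if shared is not None and contains(shared, p):
            targets.append(other)   # stands for a point query message to `other`
    return targets

in_shared_region = point_query("A", (5, 5))   # A answers and forwards to B
only_in_a = point_query("A", (1, 1))          # no forwarding needed
```

Node D overlaps nothing here, so its stored coverage is empty and it never forwards a query.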

30

Queries

Point queries and window queries. The technique is similar to the insertion algorithm:
• Search in the client image for a server whose mbb contains the point or intersects the window
• Send the query to this server
• If the server actually covers the point or the window, it answers the client; else it sends the query to its parent node
• A server uses the overlapping information to transmit the query
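
The client-side step can be sketched as follows, assuming the image is a dictionary from server names to mbbs and that a point is treated as a degenerate window. Falling back to the contact server when nothing in the image matches is also an assumption; `pick_server` is a hypothetical helper.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def pick_server(image, contact, query):
    """image: {server: mbb}. query: a point (x, y) or a window
    (xmin, ymin, xmax, ymax). Returns the server to address first."""
    if len(query) == 2:
        window = (query[0], query[1], query[0], query[1])  # point as a window
    else:
        window = query
    for server, box in image.items():
        if intersects(box, window):
            return server
    return contact   # nothing in the image matches: use the contact server

image = {"s1": (0, 0, 5, 5), "s2": (6, 0, 9, 5)}
for_point = pick_server(image, "s0", (7, 1))          # "s2"
for_window = pick_server(image, "s0", (4, 4, 7, 6))   # "s1"
unknown = pick_server(image, "s0", (20, 20))          # "s0"
```

The server that receives the query then either answers or passes it on, as described in the list above.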

31

Experiments

• Synthetic data (points and rectangles) generated with GSTD
• 50,000 to 500,000 objects
• 0 to 3,000 queries
• Server capacity: 3,000 objects

Comparison of three SD-Rtree variants:
• BASIC: no image; every query is processed top-down from the root
• IMSERVER: no IAMs among the servers
• IMCLIENT: client images

33

Per Insert Cost

34

Cost of balancing

35

Image convergence

36

Distribution of messages

37

Cost per Query

38

Conclusion

• SD-Rtree is an efficient scalable distributed Rtree
• For very large spatial data collections
• Can be processed in distributed RAM
• Access time much faster than to disk data
• Load balancing: spatial rotations
• Overlapping management: redundant coverage
• O(log n) worst insert cost
• Future work: kNN queries, balancing of the object distribution over servers

39

SD-Rtree

Thank You for Your Attention

Questions: First.Last@dauphine.fr