R-Tree: Spatial Representation on a Dynamic-Index Structure

Advanced Algorithms & Data Structures

Lecture Theme 03 – Part I

K. A. Mohamed

Summer Semester 2006

Overview

• Representing and handling spatial data

• The R-Tree indexing approach, style and structure

• Properties and notions

• Searching and inserting index Entry-records

• Deleting and updating

• Performance analyses

• Node splitting algorithms

• Derivatives of the R-Trees

• Applications

Spatial Database (Ia)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

Spatial Database (Ib)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick topological search.

Spatial object: Contour (outline) of the area around the building(s).

Minimum bounding region (MBR) of the object.

Spatial Database (Ic)

• Consider: Given a city map, ‘index’ all university buildings in an efficient structure for quick relational-topological search.

MBR of the city neighbourhoods.

MBR of the city defining the overall search region.

Spatial Database (II)

Notion: To retrieve data items quickly and efficiently according to their spatial locations.

• Involves 2D regions.

• Need to support 2D range queries.

• Multiple return values desired: Answering a query region by reporting all spatial objects that are fully-contained-in or overlapping the query region (Spatial-Access Method – SAM).

In general:

• Spatial data objects often cover areas in multidimensional spaces.

• Spatial data objects are not well-represented by point-location.

• An ‘index’ based on an object’s spatial location is desirable.

The Indexing Approach

• A B-Tree (Rosenberg & Snyder, 1981) is an ordered, dynamic, multi-way structure of order m (i.e. each node has at most m children).

• The keys and the subtrees are arranged in the fashion of a search tree.

• Each node may contain a large number of keys, and the number of subtrees in each node, then, may also be large.

• The B-Tree is designed (among other objectives):

– to branch out in a large number of directions, and

– to contain a lot of keys in each node so that the height of the tree is relatively short.

[Figure: an example B-Tree over letter keys, rooted at M, with several keys per node.]

The R-Tree Index Structure

• An R-Tree is a height-balanced tree, similar to a B-Tree.

• Index records in the leaf nodes contain pointers to the actual spatial-objects they represent.

• Leaves in the structure all appear on the same level.

• Spatial searching requires visiting only a small number of nodes.

• The index is completely dynamic: inserts and deletes can be intermixed with searches.

• No periodic reorganisation is required.

The R-Tree Index Structure

• A spatial database consists of a collection of tuples representing spatial objects, known as Entries.

• Each Entry has a unique identifier that points to one spatial object, and its MBR; i.e. Entry = (MBR, pointer).

R-Tree Index Structure – Leaf Entries

• An entry E in a leaf node is defined as (Guttman, 1984):

E = (I, tuple-identifier)

• Where I is the smallest bounding n-dimensional region (MBR) that encompasses the spatial data pointed to by its tuple-identifier.

• I is a series of closed intervals, one for each dimension of the bounding region.

Example. In 2D, I = (Ix, Iy), where Ix = [xa, xb], and Iy = [ya, yb].

R-Tree Index Structure – Leaf Entries

• In general, I = (I0, I1, …, In-1) for n dimensions, where Ik = [ka, kb].

• If either ka or kb (or both) is equal to ±∞, the spatial object extends outward indefinitely along that dimension.
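The interval notation above can be sketched in Python as follows. This is only an illustrative sketch: the names `MBR`, `overlaps`, `union`, `area` and `mbr_of_points` are assumptions made here and do not come from the lecture or from Guttman (1984).

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [ka, kb] for one dimension


@dataclass(frozen=True)
class MBR:
    """I = (I0, I1, ..., In-1): one closed interval per dimension."""
    intervals: Tuple[Interval, ...]

    def overlaps(self, other: "MBR") -> bool:
        # Two closed intervals overlap unless one ends before the other starts;
        # the regions overlap only if this holds in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi)
                   in zip(self.intervals, other.intervals))

    def union(self, other: "MBR") -> "MBR":
        # Smallest region enclosing both operands.
        return MBR(tuple((min(a_lo, b_lo), max(a_hi, b_hi))
                         for (a_lo, a_hi), (b_lo, b_hi)
                         in zip(self.intervals, other.intervals)))

    def area(self) -> float:
        # Product of the interval lengths (area in 2D, volume in 3D, ...).
        result = 1.0
        for lo, hi in self.intervals:
            result *= hi - lo
        return result


def mbr_of_points(points: Iterable[Sequence[float]]) -> MBR:
    """Tightest MBR around the vertices of an object's outline."""
    return MBR(tuple((min(dim), max(dim)) for dim in zip(*points)))


building = mbr_of_points([(1, 1), (4, 2), (3, 5)])
print(building.intervals)  # ((1, 4), (1, 5)), i.e. Ix = [1, 4], Iy = [1, 5]
```

An object that extends indefinitely along one dimension could be written with `float("inf")` as an endpoint, for example `MBR(((0, float("inf")), (0, 1)))`.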

R-Tree Index Structure – Non-Leaf Entries

• An entry E in a non-leaf node is defined as:

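E = (I, child-pointer)

• Where the child-pointer points to a child of this node, and I is the MBR that encompasses all the regions in the child node's entries.

[Figure: a non-leaf node with entries I(A), I(B), …, I(M); the entry for B points to a child node with entries I(a), I(b), I(c), I(d), whose regions a, b, c, d lie inside B.]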

Properties

Let M be the maximum number of entries that will fit in one node.

Let m ≤ M/2 be a parameter specifying the minimum number of entries in one node.

An R-Tree must then satisfy the following properties:

1. Every leaf node contains between m and M index records, unless it is the root.

2. For each index-record Entry (I, tuple-identifier) in a leaf node, I is the MBR that spatially contains the n-dimensional data object represented by the tuple-identifier.

3. Every non-leaf node has between m and M children, unless it is the root.

4. For each Entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the regions in the child node.

5. The root has at least two children, unless it is a leaf.

6. All leaves appear on the same level.

Node Overflow and Underflow

• A Node-Overflow happens when a new Entry is added to a fully packed node, causing the resulting number of entries in the node to exceed the upper-bound M.

• The ‘overflow’ node must be split, and all its current entries, as well as the new one, consolidated for local optimum arrangement.

• A Node-Underflow happens when one or more Entries are removed from a node, causing the remaining number of entries in that node to fall below the lower-bound m.

• The underflow node must be condensed, and its entries dispersed for global optimum arrangement.

Search

Typical query: Find and report all university building sites that are within 5km of the city centre.

Approach:

i. Build the R-Tree using rectangular regions a, b, …, i.

ii. Formulate the query range Q.

iii. Query the R-Tree and report all regions overlapping Q.

Key:

a: FAW
b: Uni-Klinikum
c: Psychologie
d: Biotechnologie
e: Institutsviertel
f: Rektorat
g: Uni-zentrum
h: Biology / Botanischer
i: Sportszentrum

Search Strategy

Let Q be the query region.

Let T be the root of the R-Tree.

Search all entry-records whose regions overlap Q.

Search sub-trees:

• If T is not a leaf, then apply Search on every child-node entry E whose E.I overlaps Q.

Search leaf nodes:

• If T is leaf, then check each entry E in the leaf and return E if E.I overlaps Q.
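The two search cases can be sketched as one recursive function. This is a minimal 2D sketch, not the lecture's own code: the `Node` class, the `(xa, ya, xb, yb)` tuple layout and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xa, ya, xb, yb)


def overlaps(a: Rect, b: Rect) -> bool:
    # Closed rectangles overlap when their x-intervals and y-intervals both overlap.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries: (I, tuple-identifier); non-leaf entries: (I, child Node).
    entries: list = field(default_factory=list)


def search(t: Node, q: Rect) -> List[object]:
    """Report the tuple-identifiers of all leaf entries whose MBR overlaps q."""
    results = []
    for rect, ref in t.entries:
        if not overlaps(rect, q):
            continue  # this entry (and its whole subtree) cannot contain a hit
        if t.is_leaf:
            results.append(ref)              # ref is a tuple-identifier
        else:
            results.extend(search(ref, q))   # ref is a child node
    return results


leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = Node(True, [((10, 10, 11, 11), "c")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((10, 10, 11, 11), leaf2)])

print(search(root, (0.5, 0.5, 2.5, 2.5)))  # ['a', 'b']
print(search(root, (5, 5, 6, 6)))          # []
```

In the first query only `leaf1` is visited, because the MBR stored for `leaf2` in the root does not overlap the query region.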

Search

Typical query: Find and report all university buildings/sites that are within 5km of the city centre.

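[Figure: the regions a to i grouped into R-Tree nodes r0 to r8 (root @ r6; internal nodes @ r2, @ r5, @ r8; leaf nodes @ r0, @ r1, @ r3, @ r4, @ r7). R-Tree settings: the values of M and m are not preserved in the source.]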

Search

• The search algorithm descends the tree from the root in a manner similar to a B-Tree.

• More than one subtree under a node visited may need to be searched.

• Cannot guarantee good worst-case performance.

• Countered by the algorithms during insertion, deletion, and update that maintain the tree in a form that allows the search algorithm to eliminate irrelevant regions of the indexed space.

• So that only data near the search area need to be examined.

• Emphasis is on the optimal placement of spatial objects with respect to the spatial location of other objects in the structure.

Insertion

• New index entry-records are added to the leaves.

• Nodes that overflow are split, and splits propagate up the tree.

• A split-propagation may cause the tree to grow in height.

The main Insert routine

• Let E = (I, tuple-identifier) be the new entry to be inserted.

• Let T be the root of the R-Tree.

[Ins_1] Locate a leaf L starting from T to insert E.

[Ins_2] Add E to L. If L is already full (overflow), split L into L and L’.

[Ins_3] Propagate MBR changes (enlarged or reduced) upwards.

[Ins_4] Grow tree taller if node split propagation causes T to split.

Insertion – Choose Leaf

[Ins_1] Locate a leaf L starting from T to insert E = (I, tuple-identifier).

• Notion (i): Select the path that would require the least enlargement to include E.I.

• Notion (ii): Resolve ties by choosing the child-node with the smallest MBR.

• Invoke: L = ChooseLeaf (E, T).

[Figure: node @rN with entries A, B, C, and the new region E.I to be placed among them.]

Insertion – Choose Leaf

Algorithm: ChooseLeaf (E, N)

Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.

Output: The leaf L where E should be inserted.

• If N is leaf Then Return N

• Let FS be the set of current entries in the node N

• Let F = (I, child-pointer) ∈ FS, so that F.I satisfies the Insertion-Notions

• Return ChooseLeaf (E, F.child-pointer)
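ChooseLeaf can be sketched in a few lines, under the same assumptions as before: 2D rectangles stored as `(xa, ya, xb, yb)` tuples and a hypothetical `Node` class. The two insertion notions become a two-part sort key: least enlargement first, smallest area as the tie-breaker.

```python
from dataclasses import dataclass, field
from typing import Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xa, ya, xb, yb)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, new: Rect) -> float:
    # Extra area r would need in order to also cover `new`.
    return area(union(r, new)) - area(r)


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (I, child Node) in non-leaf nodes


def choose_leaf(e_rect: Rect, n: Node) -> Node:
    """Descend from n to the leaf where an entry with region e_rect should go."""
    if n.is_leaf:
        return n
    # Notion (i): least enlargement; notion (ii): ties go to the smaller MBR.
    _, child = min(n.entries,
                   key=lambda f: (enlargement(f[0], e_rect), area(f[0])))
    return choose_leaf(e_rect, child)


leaf_a, leaf_b = Node(True), Node(True)
root = Node(False, [((0, 0, 4, 4), leaf_a), ((10, 10, 12, 12), leaf_b)])
print(choose_leaf((5, 5, 6, 6), root) is leaf_a)  # True: enlargement 20 vs 45
```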

Insertion – Add and Adjust

[Ins_2] Add E to L.

• Notion (i): If L has room for another entry, install E.

• Notion (ii): Otherwise split L to obtain L and L’, which between them, will contain all previous entries in L and the new E (consolidated for local optima).

[Ins_3] Propagate MBR changes upwards by invoking AdjustTree (L, L’).

• Notion (i): Ascend from leaf L to the root T while adjusting the covering rectangles MBR.

• Notion (ii): If L’ exists, propagate node splits as necessary; i.e. attempt to install a new entry in the parent of L to point to L’.

Insertion – Propagating Node Splits

Example. Found L = @Y to insert new E = e. R-Tree settings: M = 3, m = 1.

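[Figure: node @G holds entry K; node @K holds entries X, Y, Z; leaf @Y holds entries a, b, c. Both @K and @Y are at capacity (M = 3).]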

Insertion – Adjust Tree (I)

Algorithm: AdjustTree (N, N’)

Inputs: (i) A node N that has had its contents modified, (ii) The resultant split node N’, if not NULL, that accompanies N.

Outputs: (i) N as above, (ii) N’ as above.

• If N is the root Then Return {(i) N, (ii) N’}

• Let PN be the parent node of N.

• Let EN = (I_N, child-pointer_N) in PN, where child-pointer_N points to N.

• Adjust I_N so that it tightly encloses all entry regions in N.

Insertion – Adjust Tree (II)

• If N’ is Not NULL Then
    • Create a new Entry EN’ = (I_N’, child-pointer_N’), where child-pointer_N’ points to N’ and I_N’ tightly encloses all entry regions in N’.
    If number of entries in PN < M Then
        • Install EN’ in PN
        • Return AdjustTree (PN, NULL)
    Else
        • Set {PN, PN’} = SplitNode (PN, EN’)
        • Return AdjustTree (PN, PN’)
    End If

• Else Return AdjustTree (PN, NULL)

• End If

Insertion – Grow Tree Taller

[Ins_4] Grow Tree taller.

• Notion: If the recursive node split propagation causes the root to split, then create a new root whose children are the two resulting nodes.

[Figure: root @T with entries A, B, C; node @C (entries E, F) and its split sibling @C’ (entries G, H).]
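Steps [Ins_1] to [Ins_4] can be put together in one sketch. Everything here is an illustrative assumption rather than the lecture's code: the `Node` class with a parent pointer, the tuple layout `(xa, ya, xb, yb)`, and in particular `split_node`, which is only a placeholder that halves the entries by their lower x-bound and is not one of the node splitting algorithms. The sketch also adds the new entry first and splits on overflow, which has the same effect as checking for room beforehand.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional

M = 3  # assumed maximum number of entries per node


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def mbr(entries):
    """Tight bounding rectangle of a non-empty list of (rect, ref) entries."""
    return reduce(union, (rect for rect, _ in entries))


@dataclass(eq=False)
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (rect, tuple-id or child Node)
    parent: Optional["Node"] = None


def choose_leaf(rect, n):
    # [Ins_1] least enlargement, ties resolved by smallest MBR.
    while not n.is_leaf:
        _, n = min(n.entries,
                   key=lambda f: (area(union(f[0], rect)) - area(f[0]), area(f[0])))
    return n


def split_node(n):
    # PLACEHOLDER split (assumption): sort by lower x-bound and halve.
    ordered = sorted(n.entries, key=lambda e: e[0][0])
    half = len(ordered) // 2
    n.entries = ordered[:half]
    sibling = Node(n.is_leaf, ordered[half:])
    if not sibling.is_leaf:
        for _, child in sibling.entries:
            child.parent = sibling  # moved children now belong to the sibling
    return sibling


def adjust_tree(n, n2):
    # [Ins_3] ascend to the root, tightening MBRs and propagating splits.
    while n.parent is not None:
        p = n.parent
        p.entries = [(mbr(n.entries), c) if c is n else (r, c) for r, c in p.entries]
        p2 = None
        if n2 is not None:
            p.entries.append((mbr(n2.entries), n2))
            n2.parent = p
            if len(p.entries) > M:
                p2 = split_node(p)
        n, n2 = p, p2
    return n, n2  # the root, and its split sibling if the root itself split


def insert(root, rect, tuple_id):
    leaf = choose_leaf(rect, root)
    leaf.entries.append((rect, tuple_id))                       # [Ins_2]
    leaf2 = split_node(leaf) if len(leaf.entries) > M else None
    root, root2 = adjust_tree(leaf, leaf2)
    if root2 is not None:                                       # [Ins_4]
        new_root = Node(False, [(mbr(root.entries), root), (mbr(root2.entries), root2)])
        root.parent = root2.parent = new_root
        root = new_root
    return root


root = Node(is_leaf=True)
for i in range(10):
    root = insert(root, (i, i, i + 1, i + 1), f"obj{i}")
```

Because a split only ever creates a sibling on the same level, and the tree grows only at the root, all leaves stay on the same level after every insertion.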

Insertion – Summary

• The height of an R-Tree containing n entry-records is at most ⌈log_m n⌉ – 1, because the branching factor of each node is at least m.

• The maximum number of nodes is: ⌈n/m⌉ + ⌈n/m²⌉ + … + 1.

• Worst-case space utilisation for all nodes except the root is: m/M.

• Nodes will tend to have more than m entries, and this will decrease the tree height and improve space utilisation.
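As a worked example, the bounds given in Guttman (1984), a height of at most ⌈log_m n⌉ - 1 and at most ⌈n/m⌉ + ⌈n/m²⌉ + … + 1 nodes, can be evaluated for concrete values. The function names below are assumptions, and integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
def max_height(n: int, m: int) -> int:
    """ceil(log_m n) - 1, for n > 1 records and minimum fill m >= 2."""
    h, capacity = 0, 1
    while capacity < n:   # find the smallest h with m**h >= n
        capacity *= m
        h += 1
    return h - 1


def max_nodes(n: int, m: int) -> int:
    """ceil(n/m) + ceil(n/m^2) + ... + 1, for n >= 1 and m >= 2."""
    total, level = 0, n
    while True:
        level = -(-level // m)  # ceiling division: nodes needed one level up
        total += level
        if level == 1:          # reached the root
            return total


for m in (2, 50):
    print(m, max_height(1000, m), max_nodes(1000, m))
# 2 9 1001
# 50 1 21
```

For 1000 records, raising the minimum fill from m = 2 to m = 50 lowers the worst-case height from 9 to 1 and the worst-case node count from 1001 to 21.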

Deletion

• Current index entry-records are removed from the leaves.

• Nodes that underflow are condensed, and their contents redistributed appropriately throughout the tree.

• A condense propagation may cause the tree to shorten in height.

The main Delete routine

• Let E = (I, tuple-identifier) be a current entry to be removed.

• Let T be the root of the R-Tree.

[Del_1] Find the leaf L starting from T that contains E.

[Del_2] Remove E from L, and condense ‘underflow’ nodes.

[Del_3] Propagate MBR changes upwards.

[Del_4] Shorten tree if T contains only 1 entry after condense propagation.

Deletion – Find Leaf

[Del_1] Find the leaf L starting from T that contains E.

Algorithm: FindLeaf (E, N)

Inputs: (i) Entry E = (I, tuple-identifier), (ii) A valid R-Tree node N.

Output: The leaf L containing E.

• If N is leaf Then
    If N contains E Then Return N Else Return NULL

• Else
    Let FS be the set of current entries in N.
    For each F = (I, child-pointer) ∈ FS where F.I overlaps E.I Do
        • Set L = FindLeaf (E, F.child-pointer)
        • If L is not NULL Then Return L
    Next F
    Return NULL

• End If
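FindLeaf translates almost line for line into Python. As in the other sketches, the `Node` class and the `(xa, ya, xb, yb)` rectangle layout are assumptions made for illustration. Unlike ChooseLeaf, several subtrees may have to be tried, because more than one entry region can overlap E.I.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xa, ya, xb, yb)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (I, tuple-id) or (I, child Node)


def find_leaf(e, n: Node) -> Optional[Node]:
    """Return the leaf under n that holds entry e = (I, tuple-identifier), or None."""
    e_rect, _ = e
    if n.is_leaf:
        return n if e in n.entries else None
    for f_rect, child in n.entries:
        if overlaps(f_rect, e_rect):       # only these subtrees can contain e
            leaf = find_leaf(e, child)
            if leaf is not None:
                return leaf
    return None


leaf1 = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = Node(True, [((10, 10, 11, 11), "c")])
root = Node(False, [((0, 0, 3, 3), leaf1), ((10, 10, 11, 11), leaf2)])

print(find_leaf(((2, 2, 3, 3), "b"), root) is leaf1)  # True
print(find_leaf(((2, 2, 3, 3), "z"), root))           # None
```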

Deletion – Remove and Adjust

[Del_2] Remove E from L, and condense ‘underflow’ nodes.

[Del_3] Propagate MBR changes upwards.

• Notion (i): Ascend from leaf L to root T while adjusting covering rectangles MBR.

• Notion (ii): If removing the entry E leaves L with fewer than m entries, then the node L has to be eliminated and its remaining contents relocated.

Deletion – Remove and Adjust

• Propagate these notions upwards by invoking CondenseTree (N, QS), where N is an R-Tree node whose entries have been modified, and QS is the set of eliminated nodes.

• Start the propagation by setting N = L, and QS = ∅.

• Re-insert the entries from the eliminated nodes in QS back into the tree.

• Entries from eliminated leaf nodes are re-inserted as new entries using the Insert routine discussed earlier.

• Entries from higher-level nodes must be placed higher in the tree so that leaves of their dependent subtrees will be on the same level as the leaves on the main tree.

Deletion – Propagating Node Condensation

• Example: Delete the index entry-record b. R-Tree settings: M = 4, m = 2.

• Spatial constraint: a.I will form smallest MBR with r4.

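[Figure: example R-Tree. Root @ r7 = {r2, r6}; @ r2 = {r0, r1}; @ r6 = {r3, r4, r5}; leaves @ r0 = {a, b}, @ r1 = {c, d, e}, @ r3 = {f, g, h}, @ r4 = {i, j}, @ r5 = {k, l, m}. The placement of entry n is not recoverable from the source.]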

Deletion – Condense Tree (I)

Algorithm: CondenseTree (N, QS)

Inputs: (i) A node N whose entries have been modified, (ii) A set of eliminated nodes QS.

• If N is NOT the root Then
    Let PN be the parent node of N.
    Let EN = (I_N, child-pointer_N) in PN.
    If N.entries < m Then
        • Delete EN from PN
        • Add N to QS
    Else
        • Adjust I_N so that it tightly encloses all entry regions in N.
    End If
    CondenseTree (PN, QS)

Deletion – Condense Tree (II)

• Else If N is root AND QS is NOT ∅ Then
    For each Q ∈ QS Do
        • For each E ∈ Q Do
            If Q is leaf Then Insert (E)
            Else Insert (E) as a node entry at the same node level as Q
            End If
        • Next E
    Next Q

• End If
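The ascent phase of CondenseTree (part I) can be sketched as a loop that collects eliminated nodes in QS. This sketch deliberately leaves out the re-insertion phase (part II). The `Node` class, the helper functions, the entry coordinates and the tree layout are all assumptions made for illustration: the layout is a hypothetical tree with M = 4 and m = 2 in which leaf r0 holds the entries a and b.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional

M_MIN = 2  # the parameter m (assumed value)


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def mbr(entries):
    return reduce(union, (rect for rect, _ in entries))


@dataclass(eq=False)
class Node:
    name: str
    is_leaf: bool
    entries: list = field(default_factory=list)
    parent: Optional["Node"] = None


def condense_tree(n, qs):
    """Ascend from n to the root: move underflowing nodes into qs and
    tighten the covering MBR of every other node on the path. Returns the root."""
    while n.parent is not None:
        p = n.parent
        if len(n.entries) < M_MIN:
            p.entries = [(r, c) for r, c in p.entries if c is not n]  # delete EN
            qs.append(n)                                              # add N to QS
        else:
            p.entries = [(mbr(n.entries), c) if c is n else (r, c)
                         for r, c in p.entries]                       # adjust I_N
        n = p
    return n


def leaf(name, letters):
    # Hypothetical geometry: entry 'a' covers x in [0, 1], 'b' covers [1, 2], ...
    return Node(name, True, [((ord(c) - 97, 0, ord(c) - 96, 1), c) for c in letters])


def inner(name, children):
    node = Node(name, False, [(mbr(c.entries), c) for c in children])
    for c in children:
        c.parent = node
    return node


r0, r1 = leaf("r0", "ab"), leaf("r1", "cde")
r3, r4, r5 = leaf("r3", "fgh"), leaf("r4", "ij"), leaf("r5", "klm")
r2, r6 = inner("r2", [r0, r1]), inner("r6", [r3, r4, r5])
r7 = inner("r7", [r2, r6])

r0.entries = [e for e in r0.entries if e[1] != "b"]  # remove entry b from its leaf
qs = []
root = condense_tree(r0, qs)
print([q.name for q in qs])   # ['r0', 'r2']
print(len(root.entries))      # 1
```

Removing b leaves r0 with one entry, so r0 is eliminated; that in turn leaves r2 with only r1, so r2 is eliminated as well. The root is left with a single child, which is the condition under which the tree is shortened. The entries of the nodes in `qs` would then be re-inserted at their original levels.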

Deletion – Summary

Why ‘re-insert’ orphaned entries?

• Alternatively, as in the delete routine of the B-Tree (Rosenberg & Snyder, 1981), an ‘underflow’ node can be merged with whichever adjacent sibling will have its area increased the least, or its entries can be re-distributed among sibling nodes.

• Both methods can cause the nodes to split.

• Eventually all changes need to be propagated upwards, anyway.

Re-insertion accomplishes the same thing, and:

• It is simpler to implement (and at comparable efficiency).

• It incrementally refines the spatial structure of the tree.

• It prevents the gradual deterioration that would occur if each entry were located permanently under the same parent node.

Performance with respect to Parameter m

• A high value of m, nearer to M, is useful when the underlying database represented by the R-Tree is mostly used for search inquiries with very few updates.

• The height of the tree will be kept to a minimum.

• High search performance is maintained.

• However, the risk of overflow and underflow is also high.

• A small value of m is good when frequent updates and modifications of the underlying database are required.

• The nodes are less dense.

• Maintenance is less costly.