Project Proposals
Simonas Šaltenis
Aalborg University
Nykredit Center for Database Research
Department of Computer Science, Aalborg University
WIM workshop, Gl. Vrå Slot, December 6-8, 2001
Outline
• An overview of the R-tree and the TPR-tree
• Project proposals:
  - Update-Efficient TPR-tree
  - Time-parameterized SS-tree
Spatial Indexing With the R-Tree
• Example
[Figure: an example R-tree over data points p1 to p13, with a query rectangle. Bounding rectangles R1 and R2 appear at the root and R3 to R7 at the level below; the leaf entries are pointers to data tuples.]
R-tree Properties
• Leaf entry = <n-dimensional point, rid>
• Non-leaf entry = <n-dimensional MBR, ptr to a child node>
  - MBR: the Minimum Bounding Rectangle of all points in the subtree pointed to by ptr
• The R-tree is a balanced tree: all leaves are at the same depth from the root
• Through the insertion and deletion algorithms, nodes are kept at least m% full (except the root)
  - m is the minimum fill factor and is usually chosen to be 40%. Depending on the workload, the average fill factor is usually about 70%.
Grow-Post Trees
[Figure: a tree with internal nodes above leaf nodes; each internal node holds bounding predicates BP1, BP2, ..., BPn.]
Bounding predicate (BP) = something that describes the entries in a subtree
Building blocks of algorithms:
• Consistent(BP, Q): returns true if results of query Q can be under BP (in the R-tree, true if the MBR intersects Q)
• PickSplit(node): splits a page of entries into two groups
• Union(node): computes a BP of a collection of entries (in the R-tree, computes an MBR, i.e., the minimum and maximum in all dimensions)
• Penalty(BP, E): returns an estimate of how much “worse” BP becomes if E is inserted under it
The R-tree is a Grow-Post tree
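The R-tree versions of three of these building blocks can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `MBR`, the helper `point_mbr`, and the choice of plain area enlargement as the penalty are ours, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MBR:
    lo: Tuple[float, ...]  # minimum coordinate in each dimension
    hi: Tuple[float, ...]  # maximum coordinate in each dimension

    def area(self) -> float:
        result = 1.0
        for a, b in zip(self.lo, self.hi):
            result *= b - a
        return result


def point_mbr(p: Sequence[float]) -> MBR:
    """A point is treated as a degenerate rectangle (hypothetical helper)."""
    return MBR(tuple(p), tuple(p))


def union(boxes: Sequence[MBR]) -> MBR:
    """Union: minimum and maximum over all boxes in every dimension."""
    dims = range(len(boxes[0].lo))
    return MBR(tuple(min(b.lo[i] for b in boxes) for i in dims),
               tuple(max(b.hi[i] for b in boxes) for i in dims))


def consistent(bp: MBR, q: MBR) -> bool:
    """Consistent: true if the MBR intersects the query rectangle."""
    return all(bl <= qh and ql <= bh
               for bl, bh, ql, qh in zip(bp.lo, bp.hi, q.lo, q.hi))


def penalty(bp: MBR, e: MBR) -> float:
    """Penalty: area enlargement needed for bp to cover e (one possible choice)."""
    return union([bp, e]).area() - bp.area()
```

For example, the union of the points (1, 1) and (3, 4) is the rectangle from (1, 1) to (3, 4) with area 6, and covering the extra point (5, 4) would enlarge it by another 6.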
Range Query in R-trees
• Answering range query Q in R-trees
  1. Start at the root
  2. If the current node is a non-leaf, for each entry <MBR, ptr>: if Consistent(MBR, Q), search the subtree identified by ptr
  3. If the current node is a leaf, for each entry <E, rid>: if E overlaps Q, rid identifies a point that overlaps Q
• Note: We may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)
  - Worst-case performance is O(n)! But in practice, R-trees exhibit good query performance for various data sets
• What about insertion and deletion?
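The three search steps above translate almost directly into a recursive function. The following is a minimal sketch; the `Node` layout and the names `intersects`, `contains`, and `range_search` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


def intersects(a, b):
    """True if rectangles a and b, each given as (lo, hi), overlap in every dimension."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))


def contains(rect, point):
    """True if the point lies inside the rectangle (borders included)."""
    return all(lo <= c <= hi for lo, c, hi in zip(rect[0], point, rect[1]))


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (point, rid); non-leaf entries are (mbr, child node).
    entries: list = field(default_factory=list)


def range_search(node, q, out=None):
    """Collect the rids of all points that fall inside query rectangle q."""
    if out is None:
        out = []
    if node.is_leaf:
        for point, rid in node.entries:
            if contains(q, point):
                out.append(rid)
    else:
        for mbr, child in node.entries:
            if intersects(mbr, q):  # Consistent(MBR, Q)
                range_search(child, q, out)  # several subtrees may qualify
    return out
```

Note that the loop over non-leaf entries does not stop at the first match, which is exactly why more than one path down the tree may be followed.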
Insert Entry E <point, ptr>
• Insertion algorithm
  1. cn = root
  2. If cn is a leaf, go to step 5.
  3. From all entries in cn, choose the entry e with the smallest Penalty(e.BP, E). (In R-trees, choose the entry whose MBR needs the least enlargement to cover E; resolve ties by going to the child with the smallest area.)
  4. cn = e.ptr; go to step 2.
  5. Insert E into cn. Call PropogateUp(cn).
• PropogateUp(cn)
  1. If cn is overfull, call PickSplit(cn) to produce cn1 and cn2, replace cn’s old entry in its parent by e1 = Union(cn1) and e2 = Union(cn2), and call PropogateUp on cn’s parent.
  2. Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.
• Create a new root with two entries whenever the root is split.
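The insertion procedure can be sketched as follows. Several parts are assumptions made for illustration only: the page capacity `MAX_ENTRIES`, the parent pointers stored in each node, and in particular `pick_split`, which simply halves the entries along the first axis and is not the split heuristic of any published R-tree variant.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_ENTRIES = 4  # hypothetical page capacity, chosen only for illustration


def union(rects):
    """MBR of a collection of rectangles given as (lo, hi) coordinate tuples."""
    dims = range(len(rects[0][0]))
    return (tuple(min(r[0][i] for r in rects) for i in dims),
            tuple(max(r[1][i] for r in rects) for i in dims))


def area(rect):
    result = 1.0
    for lo, hi in zip(*rect):
        result *= hi - lo
    return result


def penalty(bp, e):
    """Area enlargement needed for bp to cover e."""
    return area(union([bp, e])) - area(bp)


@dataclass(eq=False)
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (rect, rid) or (rect, child)
    parent: Optional["Node"] = None


def node_bp(node):
    return union([rect for rect, _ in node.entries])


def pick_split(node):
    """Placeholder split (an assumption): halve the entries along the first axis."""
    node.entries.sort(key=lambda entry: entry[0][0][0])
    half = len(node.entries) // 2
    sibling = Node(node.is_leaf, node.entries[half:], node.parent)
    node.entries = node.entries[:half]
    if not sibling.is_leaf:
        for _, child in sibling.entries:
            child.parent = sibling
    return node, sibling


def propagate_up(cn, root):
    """Repair bounding predicates above cn; returns the (possibly new) root."""
    parent = cn.parent
    if len(cn.entries) > MAX_ENTRIES:
        cn1, cn2 = pick_split(cn)
        if parent is None:  # the root was split: create a new root
            parent = root = Node(is_leaf=False)
            cn1.parent = cn2.parent = parent
        else:
            parent.entries = [e for e in parent.entries if e[1] is not cn]
        parent.entries += [(node_bp(cn1), cn1), (node_bp(cn2), cn2)]
        return propagate_up(parent, root)
    if parent is not None:
        for i, (old_bp, child) in enumerate(parent.entries):
            if child is cn and old_bp != node_bp(cn):
                parent.entries[i] = (node_bp(cn), cn)
                return propagate_up(parent, root)
    return root


def insert(root, point, rid):
    e = (tuple(point), tuple(point))  # a point is a degenerate rectangle
    cn = root
    while not cn.is_leaf:
        # Least enlargement first; ties go to the smallest-area child.
        _, cn = min(cn.entries, key=lambda x: (penalty(x[0], e), area(x[0])))
    cn.entries.append((e, rid))
    return propagate_up(cn, root)
```

Usage: start with `root = Node(is_leaf=True)` and call `root = insert(root, point, rid)` for each point, since a root split replaces the root.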
Heuristics for Penalty
• Heuristics of least area enlargement and smallest area are used in the R-tree’s Penalty.
[Figure: the example R-tree with bounding rectangles R1 to R7 and points p1 to p13, with a new point p14 being inserted.]
Deletion in R-trees
• Delete entry E
  1. Using the search procedure, find the leaf cn where entry E is located
  2. Remove E from cn. Call PropogateUp(cn).
• PropogateUp(cn)
  1. If cn is underfull, deallocate the node cn, remove cn’s entry in its parent, call PropogateUp on cn’s parent, and reinsert all of cn’s entries or merge them into some other node
  2. Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.
• No additional heuristics are involved in Delete; underfull nodes are handled using Insert as a subroutine.
Modeling Continuous Movement
• In conventional databases, data is assumed constant unless explicitly modified.
• With continuous movement, this is problematic:
  - Too frequent updates
  - Outdated, inaccurate data
• Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions.
  - We use linear functions to capture the present and future positions.
  - Updates are necessary only when the parameters of the functions change.
  - For example, given $t_0$, the current and anticipated future position of a two-dimensional point can be described by four parameters: $x(t_0), y(t_0), v_x, v_y$.
$$x(t) = x(t_0) + v\,(t - t_0), \quad \text{where } t \ge now$$
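As a small illustration of the linear model, the sketch below stores the four parameters together with the reference time and evaluates the position at a later time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MovingPoint:
    x0: float  # x(t0)
    y0: float  # y(t0)
    vx: float  # velocity in x
    vy: float  # velocity in y
    t0: float  # reference time of the last update

    def position(self, t: float):
        """Predicted position at time t, using x(t) = x(t0) + v (t - t0)."""
        dt = t - self.t0
        return (self.x0 + self.vx * dt, self.y0 + self.vy * dt)
```

An object last reported at (10, 20) at time 5 with velocity (2, -1) is predicted at (16, 17) at time 8; no update is needed as long as the velocity stays the same.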
Queries
• Type 1: objects that intersect a given rectangle at time $t$
• Type 2: objects that intersect a given rectangle sometime from $t_1$ to $t_2$
• Type 3: objects that intersect a given moving rectangle sometime between $t_1$ and $t_2$
[Figure: objects o1 to o4 plotted as position x against time t.]
• We can expect that most queries will be concentrated in the sliding window [CT, CT+W], i.e., $CT \le t, t_1, t_2 \le CT + W$
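For a single linearly moving point, Type 1 and Type 2 queries against a static rectangle can be answered by intersecting, dimension by dimension, the time intervals during which the point is inside the rectangle. The sketch below assumes this approach; the function names are ours. A Type 3 query is not covered here.

```python
def crossing_interval(x0, v, t0, lo, hi, t1, t2):
    """Part of [t1, t2] during which lo <= x0 + v*(t - t0) <= hi, or None."""
    if v == 0:
        return (t1, t2) if lo <= x0 <= hi else None
    ta = t0 + (lo - x0) / v  # time at which the coordinate equals lo
    tb = t0 + (hi - x0) / v  # time at which the coordinate equals hi
    start = max(min(ta, tb), t1)
    end = min(max(ta, tb), t2)
    return (start, end) if start <= end else None


def intersects_type2(pos0, vel, t0, rect_lo, rect_hi, t1, t2):
    """Type 2: is the point inside the rectangle sometime from t1 to t2?"""
    start, end = t1, t2
    for x0, v, lo, hi in zip(pos0, vel, rect_lo, rect_hi):
        interval = crossing_interval(x0, v, t0, lo, hi, start, end)
        if interval is None:
            return False
        start, end = interval  # keep narrowing the common time interval
    return True


def intersects_type1(pos0, vel, t0, rect_lo, rect_hi, t):
    """Type 1 is the special case of Type 2 with t1 = t2 = t."""
    return intersects_type2(pos0, vel, t0, rect_lo, rect_hi, t, t)
```

A point starting at the origin with velocity (1, 1) enters the rectangle from (2, 2) to (4, 4) at time 2, so it matches a Type 2 query over [1, 3] but not one over [0, 1].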
Time-Parameterized Rectangles
• The TPR-tree is based on the R-tree.
• Moving points are bounded with time-parameterized rectangles.
  - The rectangles are bounding from now on.
  - The R-tree allows overlap.
• The tree employs conservative bounding rectangles.
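Union(node):
$$
\begin{aligned}
x_i^{\min}(t_c) &= \min\{\, o.x_i(t_c) \mid o \in node \,\} \\
x_i^{\max}(t_c) &= \max\{\, o.x_i(t_c) \mid o \in node \,\} \\
v_i^{\min} &= \min\{\, o.v_i \mid o \in node \,\} \\
v_i^{\max} &= \max\{\, o.v_i \mid o \in node \,\}
\end{aligned}
$$
$$
\begin{aligned}
x_i^{\min}(t) &= x_i^{\min}(t_c) + v_i^{\min}\,(t - t_c) \\
x_i^{\max}(t) &= x_i^{\max}(t_c) + v_i^{\max}\,(t - t_c)
\end{aligned}
$$
• At any $t > t_c$ we can get a valid R-tree: TPBR-tree(t) = R-tree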
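A conservative time-parameterized bounding rectangle for a set of moving points can be computed as sketched below. The sketch assumes that every point's position is already expressed at the common reference time t_c; the class names `MovingPoint` and `TPBR` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class MovingPoint:
    pos: Tuple[float, ...]  # position at the reference time t_c
    vel: Tuple[float, ...]  # velocity in each dimension


@dataclass
class TPBR:
    x_min: Tuple[float, ...]
    x_max: Tuple[float, ...]
    v_min: Tuple[float, ...]
    v_max: Tuple[float, ...]
    t_c: float

    def at(self, t: float):
        """The ordinary rectangle (lo, hi) that this TPBR describes at time t >= t_c."""
        dt = t - self.t_c
        lo = tuple(x + v * dt for x, v in zip(self.x_min, self.v_min))
        hi = tuple(x + v * dt for x, v in zip(self.x_max, self.v_max))
        return lo, hi


def union(points: Sequence[MovingPoint], t_c: float) -> TPBR:
    """Conservative bound: extreme positions at t_c and extreme velocities."""
    dims = range(len(points[0].pos))
    return TPBR(
        x_min=tuple(min(p.pos[i] for p in points) for i in dims),
        x_max=tuple(max(p.pos[i] for p in points) for i in dims),
        v_min=tuple(min(p.vel[i] for p in points) for i in dims),
        v_max=tuple(max(p.vel[i] for p in points) for i in dims),
        t_c=t_c,
    )
```

Because the lower border moves with the smallest velocity and the upper border with the largest, every point stays inside the rectangle for all later times, although the rectangle is generally not minimal after t_c.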
Insertion: Grouping Points
• How to group moving points (Penalty and PickSplit)?
  - The R-tree’s algorithms minimize characteristics of MBRs such as area, overlap, and margin.
  - How does that work for moving points?
[Figure: panels showing seven moving points, numbered 1 to 7.]
Insertion in the TPR-Tree
• The bounding rectangle characteristics (area, overlap, and margin) are functions of time.
• The goal is to minimize these for all time points from now to now+H.
  - Minimizing the characteristics for time now + H/2 does not work (e.g., the area of a conservative bounding rectangle is not linear).
$$\int_{now}^{now+H} A(t)\,dt, \quad \text{where } A(t) \text{ is, e.g., the area of an MBR}$$
• We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals.
• What H to use?
  - H depends on the update rate and on how far queries may reach into the future (W).
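For a two-dimensional conservative bounding rectangle, both side lengths grow linearly, so the area is quadratic in time and its integral has a closed form. The sketch below is an illustration under that assumption; the parameter names are ours, with `w0`, `h0` the side lengths at time now and `dw`, `dh` their growth rates (the difference between the maximum and minimum velocity in each dimension).

```python
def area_at(w0, dw, h0, dh, tau):
    """Area of the rectangle tau time units after now."""
    return (w0 + dw * tau) * (h0 + dh * tau)


def integrated_area(w0, dw, h0, dh, H):
    """Closed form of the integral of area_at over tau in [0, H]."""
    return (w0 * h0 * H
            + (w0 * dh + h0 * dw) * H ** 2 / 2
            + dw * dh * H ** 3 / 3)


def midpoint_estimate(w0, dw, h0, dh, H):
    """H times the area at now + H/2; exact only if the area were linear in time."""
    return H * area_at(w0, dw, h0, dh, H / 2)
```

With unit side lengths, unit growth rates and H = 2, the integral is 26/3 (about 8.67) while the midpoint estimate gives 8, which shows concretely why evaluating the area at now + H/2 is not equivalent to integrating it.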
Update-Efficient TPR-tree
• Handling hyper-dynamic data
  - 500,000 objects; on average, each object updates its positional information three times per hour => ~400 updates per second
• Update = deletion followed by an insertion
• Observations:
  - Usually an object’s positional information does not change too drastically in between updates
  - Most of the update cost is due to the search phase of a deletion (several paths down the tree may be followed)
  - We assume that the object reports its previous positional information, so that we know what to delete.
  - We need to spend I/Os on making bounding predicates as “tight” as possible, although we may be willing to sacrifice query performance
In-place Updates
• Lazy Update R-tree (LUR-tree):
  - A hash table (on object ids) is used to access leaf pages directly (without the search phase of deletion).
  - An update is one operation:
    1. Go to the hash table with the object’s id and get the pointer to the leaf page
    2. Update the object’s information in this page or, if the object’s information changed too “drastically”, insert it from the top of the tree using the normal insertion procedure
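The two update steps can be sketched with an in-memory stand-in for the index. Everything here is an assumption made for illustration: leaves are kept in a flat list instead of a tree, “too drastically” is interpreted as “the new position falls outside the leaf's current bounding rectangle”, and `top_down_insert` is a simplified stand-in for the normal insertion procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Point = Tuple[float, ...]


@dataclass(eq=False)
class Leaf:
    lo: Point
    hi: Point
    objects: Dict[int, Point] = field(default_factory=dict)  # object id -> position

    def covers(self, p: Point) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def box_area(lo: Point, hi: Point) -> float:
    result = 1.0
    for l, h in zip(lo, hi):
        result *= h - l
    return result


class LazyIndex:
    def __init__(self, leaves):
        self.leaves = list(leaves)
        # Hash table on object ids, pointing directly at the leaf page.
        self.leaf_of = {oid: leaf for leaf in self.leaves for oid in leaf.objects}

    def top_down_insert(self, oid: int, p: Point) -> None:
        """Stand-in for normal insertion: pick the leaf needing least enlargement."""
        def enlargement(leaf):
            lo = tuple(min(l, c) for l, c in zip(leaf.lo, p))
            hi = tuple(max(h, c) for h, c in zip(leaf.hi, p))
            return box_area(lo, hi) - box_area(leaf.lo, leaf.hi)

        leaf = min(self.leaves, key=enlargement)
        leaf.lo = tuple(min(l, c) for l, c in zip(leaf.lo, p))
        leaf.hi = tuple(max(h, c) for h, c in zip(leaf.hi, p))
        leaf.objects[oid] = p
        self.leaf_of[oid] = leaf

    def update(self, oid: int, new_pos: Point) -> str:
        leaf = self.leaf_of[oid]         # step 1: hash lookup, no tree search
        if leaf.covers(new_pos):         # step 2a: small change, update in place
            leaf.objects[oid] = new_pos
            return "in-place"
        del leaf.objects[oid]            # step 2b: drastic change, reinsert
        self.top_down_insert(oid, new_pos)
        return "reinserted"
```

The in-place branch touches only one leaf page and leaves all ancestor bounding rectangles valid, which is where the savings over delete-then-insert would come from.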
Problems to Solve
• Problems (that you have to try to solve, refining and applying these ideas to the TPR-tree):
  - How do we update bounding rectangles in ancestor nodes?
    Possible solution: a hash table storing the full path from the root to the leaf
  - When do we do a real insertion and when an update in place?
  - What do we do when nodes are split/merged? (Can we spend so many I/Os maintaining our hash table?)
    Possible solution: lazy updating of the hash table and use of pointers to split-off nodes, as in R-link trees.
Time-Parameterized SS-trees
• SS-tree: a Grow-Post tree where the bounding predicates are spheres:
  - Good for Nearest Neighbor queries
  - Compact description of a bounding predicate (independent of dimensionality)
• Project: explore time-parameterized SS-trees. Issues to be addressed:
  - Writing the Consistent method
  - Writing the Penalty method
  - Experimentally comparing with the TPR-tree for range queries and NN queries
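As a possible starting point for the Consistent method, the sketch below tests whether a static bounding sphere intersects a rectangular range query. It deliberately ignores time: how the center and radius should evolve in a time-parameterized SS-tree is the open question of the project, and the function name is ours.

```python
def sphere_consistent(center, radius, q_lo, q_hi):
    """True if the sphere (center, radius) intersects the rectangle [q_lo, q_hi]."""
    squared_distance = 0.0
    for c, lo, hi in zip(center, q_lo, q_hi):
        nearest = min(max(c, lo), hi)  # clamp the center onto the rectangle
        squared_distance += (c - nearest) ** 2
    return squared_distance <= radius ** 2
```

The test compares the sphere's radius with the distance from its center to the closest point of the query rectangle, so it works unchanged in any number of dimensions.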