Indexing for Multidimensional Data
An Introduction
Jaruloj Chongstitvatana
Advanced Data Structures
Applications of Multidimensional Databases
• Databases with multiple-attribute key
• Spatial databases
• Geographic information system (GIS)
• Computer-aided design (CAD)
• Multimedia databases
• Medical applications
Characteristics of Good Index Structures
• Dynamic
• Operations
  – Queries
    • Point queries
    • Range queries
    • Spatial queries
  – Insert
  – Delete
• Simplicity
• Performance
  – Disk accesses
  – Running time
  – Storage utilization
    • Low % of wasted space
    • Memory
    • Disk
• Scalability
  – Data size
  – Data dimension
Why Hierarchical Structures
ADVANTAGES
• Allow the search to focus on the interesting subset of the data
• Eliminate useless search
• Clean and simple implementation
DISADVANTAGES
• Parallelism
Types of Data
• Multi-dimension point data
  – Database with multiple-attribute key
  – Point in 2D or 3D
• Interval data
• Multi-dimension region data
• High-dimensional point data
  – Data mining
Comparison
Binary search tree
• Binary tree
• Unbalanced
• Organizes data
• Memory-based index
  – Performance measured by running time
• Practical memory size
B+ tree
• N-ary tree
• Height-balanced
• Organizes data space
• Disk-based index
  – Performance measured by the number of disk accesses
• Disk page size
Binary search tree
[Figure: an example binary tree with keys 10, 4, 9, 20, 6, 7.]
B+ tree
[Figure: a small example B+ tree. Key labels as extracted from the slide: 6 11 14 48 19 22 and 16 31.]
B+ tree
• N-ary tree
• Increase the breadth of the tree to decrease its height
• Used for indexing large amounts of data (stored on disk)
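To make the breadth-versus-height trade-off concrete, the sketch below estimates the minimum number of levels needed to reach a given number of items for a given fanout. The function name `min_height` and the sample sizes are illustrative assumptions, not part of the slides.

```python
def min_height(num_items: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_items."""
    if num_items < 1 or fanout < 2:
        raise ValueError("need num_items >= 1 and fanout >= 2")
    levels = 0
    reachable = 1  # number of leaves reachable with `levels` levels
    while reachable < num_items:
        reachable *= fanout
        levels += 1
    return levels


# One million items: a binary tree versus a 100-ary tree.
print(min_height(1_000_000, 2))    # → 20
print(min_height(1_000_000, 100))  # → 3
```

If each level costs one disk access, the wider tree reaches any item in far fewer accesses, which is why a large fanout suits a disk-based index.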
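Example
[Figure: a three-level B+ tree, transcribed from the slide.]
Root: 12 52 78
Internal nodes: 4 8 | 19 26 37 46 | 60 69 | 83 91
Leaves: 0 1 2 | 5 6 7 | 8 9 11 12 | 13 14 17 19 | 20 21 22 26 | 27 28 31 35 | 38 44 45 | 49 50 | 54 56 57 59 60 | 61 62 66 67 | 70 71 76 77 | 79 80 81 82 83 | 85 86 90 | 93 95 97 98 99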
Properties of B+ trees
For an M-ary B+ tree:
• The root has up to M children.
• Non-leaf nodes store up to M-1 keys, and have between M/2 and M children, except the root.
• All data items are stored at leaves.
• All leaves have the same depth, and store between L/2 and L data items, where L is the capacity of a leaf.
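The properties above can be written as a small checker. This is a minimal sketch under stated assumptions: internal nodes are represented as `(keys, children)` tuples and leaves as plain lists (a hypothetical representation chosen for brevity), and M/2 and L/2 are rounded up, which is the common textbook convention but is not specified on the slide.

```python
import math


def check_bplus(node, M, L, is_root=True):
    """Return the height of `node` if it satisfies the listed properties.

    Raises ValueError otherwise. Internal nodes are (keys, children) tuples;
    leaves are plain lists of data items.
    """
    if isinstance(node, list):  # leaf
        low = 0 if is_root else math.ceil(L / 2)  # assumption: round up
        if not low <= len(node) <= L:
            raise ValueError(f"leaf {node} holds {len(node)} items")
        return 0
    keys, children = node
    if len(keys) > M - 1 or len(children) != len(keys) + 1:
        raise ValueError(f"node {keys} has a wrong number of keys or children")
    if not is_root and len(children) < math.ceil(M / 2):
        raise ValueError(f"node {keys} has too few children")
    # All leaves must be at the same depth, so all subtrees share one height.
    heights = {check_bplus(child, M, L, is_root=False) for child in children}
    if len(heights) != 1:
        raise ValueError("leaves are not all at the same depth")
    return heights.pop() + 1


# A tiny valid tree with M = 3 and L = 3: one root key, two leaves.
print(check_bplus(([10], [[1, 5, 10], [12, 20]]), M=3, L=3))  # → 1
```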
Search
[Figure: the same B+ tree as on the Example slide.]
Search for 66
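The search for 66 can be traced with a short sketch. The tree below is transcribed from the Example figure, with internal nodes as `(keys, children)` tuples and leaves as lists (a hypothetical representation). Two assumptions are made: keys less than or equal to a separator go to the left child, which is what the figure suggests, and key 8 is placed in the second leaf because the leaf boundary around 8 is ambiguous in the extracted figure.

```python
from bisect import bisect_left

# Example tree: internal node = (keys, children), leaf = list of items.
TREE = ([12, 52, 78], [
    ([4, 8], [[0, 1, 2], [5, 6, 7, 8], [9, 11, 12]]),  # placement of 8 assumed
    ([19, 26, 37, 46], [[13, 14, 17, 19], [20, 21, 22, 26], [27, 28, 31, 35],
                        [38, 44, 45], [49, 50]]),
    ([60, 69], [[54, 56, 57, 59, 60], [61, 62, 66, 67], [70, 71, 76, 77]]),
    ([83, 91], [[79, 80, 81, 82, 83], [85, 86, 90], [93, 95, 97, 98, 99]]),
])


def search(node, key):
    """Return (found, path), where path lists the key sets visited."""
    path = []
    while isinstance(node, tuple):
        keys, children = node
        path.append(keys)
        # bisect_left sends keys <= separator to the left child.
        node = children[bisect_left(keys, key)]
    path.append(node)
    return key in node, path


print(search(TREE, 66))
# → (True, [[12, 52, 78], [60, 69], [61, 62, 66, 67]])
```

The search touches one node per level, so the number of node reads equals the height of the tree plus one.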
Insert
[Figure: the same B+ tree as on the Example slide.]
Insert 55: split leaf
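A leaf split can be sketched as follows. The function name `insert_into_leaf`, the leaf capacity of 5 (inferred from the largest leaves in the figure), and the choice of split point are assumptions for illustration; the separator is the largest key of the left half, matching a convention where keys less than or equal to a separator go left.

```python
from bisect import insort


def insert_into_leaf(leaf, key, capacity):
    """Insert `key` into a sorted leaf.

    Returns (leaf, None, None) when the key fits, or
    (left, separator, right) when the leaf overflows and is split.
    """
    items = list(leaf)
    insort(items, key)
    if len(items) <= capacity:
        return items, None, None
    mid = (len(items) + 1) // 2          # left half gets the extra item
    left, right = items[:mid], items[mid:]
    return left, left[-1], right         # separator goes up to the parent


# Inserting 55 into the full leaf 54 56 57 59 60 forces a split.
print(insert_into_leaf([54, 56, 57, 59, 60], 55, capacity=5))
# → ([54, 55, 56], 56, [57, 59, 60])
```

Under these assumptions the parent node 60 69 would receive the separator 56 and become 56 60 69, which still fits, so the split does not propagate further.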
Insert
[Figure: the same B+ tree as on the Example slide, except that the leaf 27 28 31 35 now also holds 36.]
Insert 32: split leaf
Insert key 31 into the parent: split node
Insert key 31 into the root
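The propagation of a split into a full internal node can be sketched like this. It assumes that a node holds at most 4 keys (M = 5, inferred from the figure), that the leaf 27 28 31 35 36 has already been split into 27 28 31 and 32 35 36 with separator 31, and that the middle key of an overflowing node moves up. The function name `insert_into_internal` is hypothetical.

```python
from bisect import bisect_left


def insert_into_internal(keys, children, sep, right_child, max_keys):
    """Insert separator `sep` with the new child to its right.

    Returns ((keys, children), None, None) when it fits, or
    (left_node, pushed_up_key, right_node) when the node is split.
    """
    i = bisect_left(keys, sep)
    keys = keys[:i] + [sep] + keys[i:]
    children = children[:i + 1] + [right_child] + children[i + 1:]
    if len(keys) <= max_keys:
        return (keys, children), None, None
    mid = len(keys) // 2                  # the middle key moves up
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right


# Parent 19 26 37 46 after its third leaf was split around separator 31.
parent_keys = [19, 26, 37, 46]
parent_children = [[13, 14, 17, 19], [20, 21, 22, 26], [27, 28, 31],
                   [38, 44, 45], [49, 50]]
left, up, right = insert_into_internal(
    parent_keys, parent_children, 31, [32, 35, 36], max_keys=4)
print(left[0], up, right[0])  # → [19, 26] 31 [37, 46]
```

The pushed-up key 31 is then inserted into the root 12 52 78, which has room and becomes 12 31 52 78.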
Handling multiple attributes
• Separate index structure for each attribute
  – All index structures must be updated on each record update.
  – Data are scattered across many disk pages.
[Figure: indexes on attributes a1, a2, a3 and a4 over data on disk.]
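The cost of keeping one index per attribute can be seen in a small sketch. Plain dictionaries stand in for the per-attribute index structures here (an assumption made for brevity; a real system would use a disk-based structure such as a B+ tree), and the class name `MultiIndexTable` is hypothetical.

```python
from collections import defaultdict


class MultiIndexTable:
    """Records with one separate index per attribute."""

    def __init__(self, attributes):
        self.records = {}  # record id -> record (dict of attribute values)
        self.indexes = {a: defaultdict(set) for a in attributes}

    def insert(self, rid, record):
        self.records[rid] = record
        for attr, index in self.indexes.items():   # every index is touched
            index[record[attr]].add(rid)

    def update(self, rid, new_record):
        old = self.records[rid]
        for attr, index in self.indexes.items():   # every index is touched
            index[old[attr]].discard(rid)
            if not index[old[attr]]:
                del index[old[attr]]
            index[new_record[attr]].add(rid)
        self.records[rid] = new_record

    def lookup(self, **conditions):
        """Ids of records matching all attribute=value conditions."""
        id_sets = [self.indexes[a].get(v, set()) for a, v in conditions.items()]
        return set.intersection(*id_sets) if id_sets else set(self.records)


table = MultiIndexTable(["a1", "a2"])
table.insert(1, {"a1": 5, "a2": "x"})
table.insert(2, {"a1": 5, "a2": "y"})
print(table.lookup(a1=5, a2="y"))  # → {2}
```

A query on several attributes has to consult each index and intersect the results, and a single record update writes to every index.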
Handling multiple attributes
• Bit interleaving
• Attribute interleaving
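Bit interleaving can be sketched for two integer attributes as below. This is one common form, often called a Z-order or Morton code; the function name `interleave_bits` and the 16-bit width are assumptions for illustration. The resulting single key can then be stored in an ordinary one-dimensional index.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Combine two non-negative integers into one key by alternating bits.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


print(interleave_bits(5, 3))  # x = 101, y = 011 → 011011 in binary → 27

# Sorting points by the interleaved key gives a single linear order.
points = [(3, 3), (0, 0), (2, 1), (1, 2)]
print(sorted(points, key=lambda p: interleave_bits(*p)))
# → [(0, 0), (2, 1), (1, 2), (3, 3)]
```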
Multiple-attribute indexing
• Quad-tree
• k-d tree
• k-d-B tree
• Grid file
• hB-tree
Issues
• Non-linear relationship
• Distance measure
• k-nearest-neighbor queries
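As an illustration of one structure from the list, the following is a minimal in-memory k-d tree with a range query. It is a sketch only: nodes are plain dictionaries, the tree is built once from a static point list by median splitting, and the names `build_kdtree` and `range_query` are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def range_query(node, low, high):
    """Return all points p with low[i] <= p[i] <= high[i] on every axis."""
    if node is None:
        return []
    point, axis = node["point"], node["axis"]
    found = []
    if all(lo <= c <= hi for lo, c, hi in zip(low, point, high)):
        found.append(point)
    # Visit a subtree only if the query box can overlap its half-space.
    if low[axis] <= point[axis]:
        found += range_query(node["left"], low, high)
    if point[axis] <= high[axis]:
        found += range_query(node["right"], low, high)
    return found


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(sorted(range_query(tree, low=(3, 1), high=(8, 5))))
# → [(5, 4), (7, 2), (8, 1)]
```

The pruning tests are what let the search focus on the relevant subset of the data instead of scanning every point.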
Spatial Indexing
• R-tree
• R*-tree
• SKD-tree
Issues
• Non-linear ordering
• Spatial queries
• High cost of determining spatial relationships
High-dimensional Indexing
• SS-tree
• TV-tree
Issues: Curse of dimensionality
• Volume grows exponentially with dimension
• Partitioning in higher dimensions is coarser
• Distance measurement in higher dimensions is not practical
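Two of these effects can be shown with simple arithmetic on a unit hypercube. The sketch below assumes uniformly spread data and uses hypothetical helper names; it is an illustration of the issues listed above, not a measurement of any particular index.

```python
def edge_for_fraction(fraction: float, dim: int) -> float:
    """Edge length of a sub-cube holding `fraction` of the unit cube's volume."""
    return fraction ** (1.0 / dim)


def cells_after_binary_split(dim: int) -> int:
    """Number of cells when every dimension is split once in the middle."""
    return 2 ** dim


for d in (1, 2, 10, 100):
    print(d, round(edge_for_fraction(0.1, d), 3), cells_after_binary_split(d))
```

To cover 10% of the volume, a query box must span about 0.79 of each axis in 10 dimensions and about 0.98 in 100 dimensions. Splitting each of 20 dimensions just once already yields 1,048,576 cells, more than the number of points in a million-point data set, so any practical partition has to be coarse.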