Indexing for Multidimensional Data: An Introduction

Post on 31-Dec-2015


Advanced Data Structures

Jaruloj Chongstitvatana

Applications of Multidimensional Databases

• Databases with multiple-attribute key

• Spatial databases

• Geographic information system (GIS)

• Computer-aided design (CAD)

• Multimedia databases

• Medical applications


Characteristics of Good Index Structures

• Dynamic

• Operations
  – Queries
    • Point queries
    • Range queries
    • Spatial queries
  – Insert
  – Delete

• Simplicity

• Performance
  – Disk accesses
  – Running time
  – Storage utilization
    • Low % of wasted space
    • Memory
    • Disk

• Scalability
  – Data size
  – Data dimension
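To make the query types concrete, here is a minimal sketch of a point query and a range query answered by a plain linear scan. This is the baseline that an index structure tries to beat, not an index itself; the names `point_query` and `range_query` are illustrative and do not come from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def point_query(points: Sequence[Point], target: Point) -> bool:
    """Point query: is this exact point stored?"""
    return target in points


def range_query(points: Sequence[Point], low: Point, high: Point) -> List[Point]:
    """Range query: all points p with low[i] <= p[i] <= high[i] in every dimension."""
    return [
        p for p in points
        if all(lo <= x <= hi for lo, x, hi in zip(low, p, high))
    ]


data = [(1.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
print(point_query(data, (3.0, 4.0)))
print(range_query(data, (0.0, 0.0), (4.0, 4.0)))
```

Both functions examine every point, so their cost grows linearly with the data size; the structures in the rest of the deck aim to examine only a small part of the data.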


Why Hierarchical Structures

ADVANTAGES

• Allow the search to focus on the interesting subset of the data

• Eliminate useless search

• Clean and simple implementation

DISADVANTAGES

• Harder to parallelize


Types of Data

• Multi-dimensional point data
  – Database with multiple-attribute key
  – Point in 2D or 3D

• Interval data

• Multi-dimensional region data

• High-dimensional point data
  – Data mining


Comparison

Binary search tree

• Binary tree

• Unbalanced

• Organizes data

• Memory-based index
  – Measured by running time

• Practical memory size

B+ tree

• N-ary tree

• Height-balanced

• Organizes data space

• Disk-based index
  – Measured by the number of disk accesses

• Disk page size


Binary search tree

[Figure: binary tree with nodes 10, 4, 9, 20, 6, 7]


B+ tree

[Figure: example B+ tree; labels recovered from the slide: 6, 11, 14, 48, 19, 22, 16, 31]


B+ tree


• N-ary tree

• Increase the breadth of the tree to decrease its height

• Used for indexing large amounts of data (stored on disk)
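The effect of breadth on height can be checked with a few lines of arithmetic. The sketch below counts how many levels a perfectly filled tree needs to address a given number of items; the fan-out of 100 is an assumed value chosen only for illustration, and `min_height` is a hypothetical helper name.

```python
def min_height(num_items: int, fanout: int) -> int:
    """Smallest height h such that fanout**h >= num_items.

    Models a perfectly filled tree in which every node has `fanout` children.
    """
    height = 0
    capacity = 1  # number of items reachable at the current height
    while capacity < num_items:
        capacity *= fanout
        height += 1
    return height


print(min_height(1_000_000, 2))    # binary tree
print(min_height(1_000_000, 100))  # wide tree, assumed fan-out of 100
```

If each level costs one disk access, the wide tree reaches any of a million items in 3 accesses where the binary tree needs 20, which is why a disk-based index prefers many children per node.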


Example


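[Figure: example B+ tree]

Root: 12 52 78

Internal nodes: 4 8 | 19 26 37 46 | 60 69 | 83 91

Leaves under 4 8: 0 1 2 | 5 6 7 | 8 9 11 12

Leaves under 19 26 37 46: 13 14 17 19 | 20 21 22 26 | 27 28 31 35 | 38 44 45 | 49 50

Leaves under 60 69: 54 56 57 59 60 | 61 62 66 67 | 70 71 76 77

Leaves under 83 91: 79 80 81 82 83 | 85 86 90 | 93 95 97 98 99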


Properties of B+ trees

For an M-ary B+ tree:

• The root has up to M children.

• Non-leaf nodes store up to M-1 keys and, except for the root, have between M/2 and M children.

• All data items are stored at the leaves.

• All leaves have the same depth and store between L/2 and L data items.


Search


[Figure: the example B+ tree with root 12 52 78, as on the Example slide]

Search for 66
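The search for 66 can be traced with a short sketch. It assumes the convention the figure appears to use (a key less than or equal to a separator goes to the child on its left) and transcribes only three of the root's four subtrees to stay short, so the root here holds just 52 and 78. The class and function names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    items: List[int]


@dataclass
class Internal:
    keys: List[int]
    children: List[Union["Internal", Leaf]]


def search(node: Union[Internal, Leaf], key: int) -> Tuple[bool, List[List[int]]]:
    """Descend from the root to a leaf; return (found, keys of each visited node)."""
    path = []
    while isinstance(node, Internal):
        path.append(node.keys)
        # Assumed convention: key <= separator goes to the child left of it.
        node = node.children[bisect_left(node.keys, key)]
    path.append(node.items)
    return key in node.items, path


# Fragment of the example tree (left-most subtree omitted).
tree = Internal([52, 78], [
    Internal([19, 26, 37, 46], [
        Leaf([13, 14, 17, 19]), Leaf([20, 21, 22, 26]), Leaf([27, 28, 31, 35]),
        Leaf([38, 44, 45]), Leaf([49, 50]),
    ]),
    Internal([60, 69], [
        Leaf([54, 56, 57, 59, 60]), Leaf([61, 62, 66, 67]), Leaf([70, 71, 76, 77]),
    ]),
    Internal([83, 91], [
        Leaf([79, 80, 81, 82, 83]), Leaf([85, 86, 90]), Leaf([93, 95, 97, 98, 99]),
    ]),
])

found, path = search(tree, 66)
print(found, path)
```

The search visits one node per level: the root, the node 60 69, and the leaf 61 62 66 67. On disk that is one page read per level.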


Insert


[Figure: the example B+ tree with root 12 52 78, as on the Example slide]

Insert 55

Split leaf
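A leaf split can be sketched as follows. The leaf capacity of 5 is an assumption read off the largest leaves in the figure, and the choices of split point (the left half keeps the extra item) and of separator (the largest item in the left half) are illustrative; the slide does not show the resulting nodes.

```python
from bisect import insort
from typing import List, Optional, Tuple

LEAF_CAPACITY = 5  # assumption: L = 5, the largest leaf size in the figure


def insert_into_leaf(
    leaf: List[int], item: int, capacity: int = LEAF_CAPACITY
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert item into a sorted leaf.

    Returns (left, right, separator). If the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    items = list(leaf)
    insort(items, item)
    if len(items) <= capacity:
        return items, None, None
    mid = (len(items) + 1) // 2      # left half keeps the extra item
    left, right = items[:mid], items[mid:]
    return left, right, left[-1]     # separator: largest item on the left


print(insert_into_leaf([54, 56, 57, 59, 60], 55))
```

Under these assumptions the full leaf 54 56 57 59 60 becomes 54 55 56 and 57 59 60, and 56 is handed to the parent as the new separator.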


Insert


[Figure: the example B+ tree with root 12 52 78; the leaf 27 28 31 35 now also holds 36]

Insert 32: split leaf

Insert key 31 (into the parent node): split node

Insert key 31 (into the next level up)
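The cascade can be followed with a sketch that handles only the key lists of non-leaf nodes. It assumes M = 5 (at most 4 keys per non-leaf node, matching the widest node in the figure) and that the middle key of an overflowing node moves up to the parent; child pointers are left out, and `insert_key` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

MAX_KEYS = 4  # assumption: M = 5, so a non-leaf node holds at most M - 1 keys


def insert_key(
    keys: List[int], key: int, max_keys: int = MAX_KEYS
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert a separator key into a non-leaf node's sorted key list.

    Returns (left_keys, right_keys, pushed_up). Without overflow,
    right_keys and pushed_up are None and left_keys is the updated list.
    """
    keys = sorted(keys + [key])
    if len(keys) <= max_keys:
        return keys, None, None
    mid = len(keys) // 2
    # The middle key moves up; it is kept in neither half.
    return keys[:mid], keys[mid + 1:], keys[mid]


# Splitting the leaf 27 28 31 35 36 after inserting 32 yields separator 31.
left, right, up = insert_key([19, 26, 37, 46], 31)
print(left, right, up)
print(insert_key([12, 52, 78], up))
```

With these assumptions the node 19 26 37 46 overflows and splits into 19 26 and 37 46, key 31 moves up, and the root has room for it, so the split stops there.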


Handling multiple attributes

• Separate index structure for each attribute
  – All index structures must be updated for each record update.
  – Data are scattered across many disk pages.

[Figure: indexes on attributes a1, a2, a3, a4 and the data on disk]
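The maintenance cost of one index per attribute can be illustrated with an in-memory sketch in which each index is a dictionary from attribute value to record ids. The class `MultiIndexTable` and its methods are hypothetical; a real system would keep each index in a disk-based structure such as a B+ tree.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


class MultiIndexTable:
    """Records plus one separate index per attribute (illustrative only)."""

    def __init__(self, attributes: Iterable[str]) -> None:
        self.records: Dict[int, Dict[str, Any]] = {}
        self.indexes = {a: defaultdict(set) for a in attributes}

    def insert(self, rid: int, record: Dict[str, Any]) -> None:
        self.records[rid] = record
        for attr, index in self.indexes.items():  # every index is touched
            index[record[attr]].add(rid)

    def update(self, rid: int, new_record: Dict[str, Any]) -> int:
        """Replace a record; return how many indexes had to change."""
        old = self.records[rid]
        changed = 0
        for attr, index in self.indexes.items():
            if old[attr] != new_record[attr]:
                index[old[attr]].discard(rid)
                index[new_record[attr]].add(rid)
                changed += 1
        self.records[rid] = new_record
        return changed

    def lookup(self, attr: str, value: Any) -> Set[int]:
        return set(self.indexes[attr].get(value, set()))


table = MultiIndexTable(["a1", "a2"])
table.insert(1, {"a1": "x", "a2": 10})
table.insert(2, {"a1": "x", "a2": 20})
# A query on both attributes intersects the results of two index lookups.
print(table.lookup("a1", "x") & table.lookup("a2", 20))
```

A query on several attributes has to consult each index and intersect the results, and the matching records may sit on unrelated disk pages, which is the scattering problem the slide mentions.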


Handling multiple attributes

• Bit interleaving

• Attribute interleaving
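Bit interleaving turns several attribute values into one key by alternating their bits, so that a single one-dimensional index (for example a B+ tree) can store multi-attribute data. The sketch below shows the two-attribute case for non-negative integers; the function names and the 16-bit width are assumptions made for illustration.

```python
from typing import Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def deinterleave_bits(code: int, bits: int = 16) -> Tuple[int, int]:
    """Inverse of interleave_bits."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


key = interleave_bits(3, 5)
print(key, deinterleave_bits(key))
```

Points that are close in both attributes tend to receive nearby keys, although not always, which is one reason purpose-built multidimensional structures exist.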


Multiple-attribute indexing

• Quad-tree

• k-d tree

• k-d-B tree

• Grid file

• hB-tree

Issues

• Non-linear relationship

• Distance measure

• k-nearest-neighbor queries
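As one illustration of the structures listed, here is a minimal in-memory k-d tree with a range query. It is a sketch of the basic idea only (split on one attribute per level, cycling through the attributes); the names `KDNode`, `build` and `kd_range_query` are hypothetical, and the disk-oriented variants such as the k-d-B tree are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting at the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(
        ordered[mid], axis,
        build(ordered[:mid], depth + 1),
        build(ordered[mid + 1:], depth + 1),
    )


def kd_range_query(node: Optional[KDNode], low: Point, high: Point,
                   out: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points p with low[i] <= p[i] <= high[i] for every i."""
    if out is None:
        out = []
    if node is None:
        return out
    if all(lo <= x <= hi for lo, x, hi in zip(low, node.point, high)):
        out.append(node.point)
    split = node.point[node.axis]
    if low[node.axis] <= split:    # the left subtree may hold values >= low
        kd_range_query(node.left, low, high, out)
    if split <= high[node.axis]:   # the right subtree may hold values <= high
        kd_range_query(node.right, low, high, out)
    return out


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build(pts)
print(sorted(kd_range_query(root, (3, 1), (8, 5))))
```

The query skips a subtree whenever the splitting value shows that it cannot overlap the query box, which is the "eliminate useless search" advantage of hierarchical structures.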


Spatial Indexing

• R-tree

• R*-tree

• SKD-tree

Issues

• Non-linear ordering

• Spatial queries

• High cost of determining spatial relationship
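R-tree-style indexes reduce the cost of spatial tests by storing a minimum bounding rectangle (MBR) for each object and each node, so that a cheap rectangle test can filter candidates before any exact geometry test runs. The sketch below shows only those rectangle operations in 2D; the class `Rect` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle in 2D."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        """True if the rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.xmin <= other.xmin and other.xmax <= self.xmax
                and self.ymin <= other.ymin and other.ymax <= self.ymax)

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle covering both, as used for a parent node's MBR."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


a = Rect(0, 0, 2, 2)
b = Rect(1, 1, 3, 3)
c = Rect(5, 5, 6, 6)
print(a.intersects(b), a.intersects(c), a.union(b))
```

If two MBRs do not intersect, the objects inside them cannot intersect either, so the expensive exact test is needed only for the pairs that pass this filter.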


High-dimensional Indexing

• SS-tree

• TV-tree

Issues: Curse of dimensionality

• Volume grows exponentially with dimension

• Partitioning in higher dimensions is coarser

• Distance measurement in higher dimensions is not practical
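Two small calculations give a feel for these issues. The first counts the cells produced by splitting every dimension just once, which doubles with each added dimension. The second computes the share of a unit hypercube's volume that lies inside its inscribed hypersphere, using the standard formula for the volume of a d-dimensional ball. The function names are illustrative.

```python
import math


def cells_after_one_split_per_axis(d: int) -> int:
    """Number of cells when each of d dimensions is split in two."""
    return 2 ** d


def ball_to_cube_ratio(d: int) -> float:
    """Volume of the ball of radius 0.5 divided by the unit cube's volume (1)."""
    radius = 0.5
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d


for d in (2, 3, 10, 20):
    print(d, cells_after_one_split_per_axis(d), ball_to_cube_ratio(d))
```

With 20 dimensions a single split per axis already gives more than a million cells, so a data set of realistic size leaves most cells empty and an index can afford to split only a few dimensions. The shrinking ball-to-cube ratio shows that almost all of the volume lies in the corners, far from the centre, which weakens distance-based pruning.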
