1 More Specialized Data Structures String data structures Spatial data structures.

14
1 More Specialized Data Structures String data structures Spatial data structures

Transcript of 1 More Specialized Data Structures String data structures Spatial data structures.

Page 1: 1 More Specialized Data Structures String data structures Spatial data structures.

1

More Specialized Data Structures

String data structures

Spatial data structures

Page 2: 1 More Specialized Data Structures String data structures Spatial data structures.

2

String Data Structures

Page 3: 1 More Specialized Data Structures String data structures Spatial data structures.

3

String Operations String indexing Pattern matching

Find pattern P in text T Find common substrings among a set of

a strings Application Domains

Bioinformatics Google search!

Page 4: 1 More Specialized Data Structures String data structures Spatial data structures.

4

A simplified hash table for strings

0. Build a lookup table of size |Σ|w for all w-length words in D

AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT

S1: C A G T C C TS2: C G T T C G C

1 2 3 4 5 6 7

S1,1S1,2 S1,3 S1,4S1,5 S1,6

S2,1 S2,2 S2,3S2,4

S2,5

S2,6

Σ={A,C,G,T}w = 2 42 (=16) entries in lookup table

Lookup table:

Page 5: 1 More Specialized Data Structures String data structures Spatial data structures.

6

PATRICIA trees “Practical Algorithm to Retrieve

Information Coded in Alphanumeric” Compacted trie of a set of strings Dictionary searches made easy

Page 6: 1 More Specialized Data Structures String data structures Spatial data structures.

7

Suffix Tree Compacted trie

of all suffixes of a string

1 2 3 4 5 6B A N A N A

Find Pattern: “ANAN”

Think how to implement Google Search?

Page 7: 1 More Specialized Data Structures String data structures Spatial data structures.

8

Generalized Suffix Tree (GST)$

O

ND

W

I$OGD

$OG

I

OW

$

$OG

ND

$OG

I OW

$

$OG

I OW

$ $W

$

IND

OW

$

$

(2, 3) (1, 4)

(2, 5)

(2, 4)

(2, 1) (1, 2)

(2, 2) (1, 3) (1, 5) (2, 6)

(1, 6) (1, 1)

(1, 7)(2, 7)

WINDOW$ INDIGO$ 1234567 1234567

Page 8: 1 More Specialized Data Structures String data structures Spatial data structures.

10

Spatial Data Structures

Page 9: 1 More Specialized Data Structures String data structures Spatial data structures.

11

Spatial Data Structures

Operation Type Data Structures

Spatial queries on high-dimensional data: - range queries - nearest neighbor search

Quad-trees oct-trees k-d trees range trees R-trees

Points in 2-D Bounding rectangle

Page 10: 1 More Specialized Data Structures String data structures Spatial data structures.

12

Recursive Bisection Technique for spatial domain decomposition

Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005

c

F D E G …

….

root

Quad trees(4-way trees)

Page 11: 1 More Specialized Data Structures String data structures Spatial data structures.

13

Compacted Quad-trees (for 2D data)

Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005

• For 3D data, the corresponding tree is called an oct-tree

N

• Each node has exactly 4 children (for 4 quadrants)

Compact path into single edge

2D space with data Quad-tree decomposition

E

Page 12: 1 More Specialized Data Structures String data structures Spatial data structures.

Range Queries on Quad-trees

(0,0)

RangeQueryResult

(a1,b1)

(a2,b2)

Page 13: 1 More Specialized Data Structures String data structures Spatial data structures.

15

Oct-Trees (for 3D data)Issue:What happens if the data is unevenly(ie., non-uniformly)distributed ?

Most of the levels in the tree will be empty

Solution:“Compacted Oct-trees”

Page 14: 1 More Specialized Data Structures String data structures Spatial data structures.

16

k-d trees (for k dimensions) Maintain a combined

binary search tree for all dimensions

Recursively bisect each dimension, alternating dimensions at each level of the tree