More Specialized Data Structures
String data structures
Spatial data structures
String Data Structures
String Operations
• String indexing
• Pattern matching
  - Find pattern P in text T
  - Find common substrings among a set of strings
Application Domains
• Bioinformatics
• Google search!
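A simplified hash table for strings
0. Build a lookup table of size |Σ|^w for all w-length words in the string database D (here D = {S1, S2}). Each entry lists the (string, start position) pairs where that word occurs.
Example: Σ = {A, C, G, T} and w = 2, so the lookup table has 4^2 (= 16) entries:
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
S1: C A G T C C T
S2: C G T T C G C
Positions: 1 2 3 4 5 6 7
Lookup table (nonempty entries; Si,j means the word starts at position j of Si):
AG: S1,2
CA: S1,1
CC: S1,5
CG: S2,1 S2,5
CT: S1,6
GC: S2,6
GT: S1,3 S2,2
TC: S1,4 S2,4
TT: S2,3
All other entries are empty.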
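As a rough illustration of step 0, the Python sketch below builds the w-length word table for the two example strings, using a plain dict as the hash table. It assumes 1-based positions (to match Si,j above) and that every string uses only symbols from Σ; the function names are hypothetical.

```python
from itertools import product


def build_lookup_table(strings, alphabet, w):
    """Map every w-length word over `alphabet` to the list of (string id, start) pairs
    where it occurs. String ids and start positions are 1-based, as in S1,1 ... S2,6."""
    # Pre-create all |alphabet|**w entries so that absent words map to an empty list.
    table = {"".join(word): [] for word in product(alphabet, repeat=w)}
    for sid, s in enumerate(strings, start=1):
        # Slide a window of width w over the string; a KeyError means a symbol outside Σ.
        for i in range(len(s) - w + 1):
            table[s[i:i + w]].append((sid, i + 1))
    return table


def lookup(table, word):
    """Occurrences of one w-length word (empty list if it never occurs)."""
    return table.get(word, [])


D = ["CAGTCCT", "CGTTCGC"]          # S1 and S2 from the example
table = build_lookup_table(D, "ACGT", 2)
print(len(table))                   # 16
print(lookup(table, "GT"))          # [(1, 3), (2, 2)]
print(lookup(table, "CG"))          # [(2, 1), (2, 5)]
```

Pre-allocating all |Σ|^w entries mirrors the fixed-size table on the slide. It is only practical for small w, since the table grows exponentially with the word length.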
PATRICIA trees (“Practical Algorithm to Retrieve Information Coded in Alphanumeric”)
• Compacted trie of a set of strings: every chain of single-child nodes is merged into one edge labeled with a substring
• Dictionary searches made easy
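Here is a minimal compacted-trie (radix tree) sketch in the spirit of this slide. Edges carry substrings, and a node is created only where two stored strings diverge. It is an illustration under assumptions, not the original PATRICIA algorithm, which works on the bit representation of keys and stores skip counts rather than edge labels. The class and method names are hypothetical.

```python
class Node:
    def __init__(self):
        # first character of an outgoing edge label -> (edge label, child node)
        self.children = {}
        self.is_word = False  # True if a stored string ends exactly at this node


class CompactedTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, word):
        node = self.root
        while word:
            entry = node.children.get(word[0])
            if entry is None:
                # No edge starts with this character: hang the whole remainder on one edge.
                leaf = Node()
                leaf.is_word = True
                node.children[word[0]] = (word, leaf)
                return
            label, child = entry
            k = 0  # length of the common prefix of the edge label and the word
            while k < len(label) and k < len(word) and label[k] == word[k]:
                k += 1
            if k < len(label):
                # The word diverges inside the edge: split the edge at position k.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[word[0]] = (label[:k], mid)
                child = mid
            node, word = child, word[k:]
        node.is_word = True

    def contains(self, word):
        node = self.root
        while word:
            entry = node.children.get(word[0])
            if entry is None or not word.startswith(entry[0]):
                return False
            node, word = entry[1], word[len(entry[0]):]
        return node.is_word


trie = CompactedTrie()
for w in ["romane", "romanus", "romulus", "rubens", "ruber", "rubicon", "rubicundus"]:
    trie.insert(w)
print(trie.contains("rubicon"), trie.contains("rub"))       # True False
print([label for label, _ in trie.root.children.values()])  # ['r']
```

A dictionary search follows one edge per branching point instead of one node per character, which is where the compaction pays off.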
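Suffix Tree
• Compacted trie of all suffixes of a string
Example string: B A N A N A (positions 1 2 3 4 5 6)
Find pattern: “ANAN”. A pattern P occurs in text T exactly when P is a prefix of some suffix of T. Walking down from the root while spelling ANAN succeeds and leads to the suffix starting at position 2 (ANANA).
Think: how would you implement Google Search?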
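To make the "prefix of a suffix" idea concrete, the sketch below builds an uncompacted suffix trie for BANANA, with one node per character of every suffix and therefore O(n^2) nodes. Compacting single-child chains, as in a PATRICIA trie, is what turns it into an O(n)-node suffix tree. That step is omitted here for brevity. The function names are hypothetical.

```python
def build_suffix_trie(text):
    """Uncompacted suffix trie: each node is {"children": {...}, "starts": [...]},
    where "starts" lists the 1-based start positions of the suffixes passing through it."""
    root = {"children": {}, "starts": []}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:                      # insert suffix text[i:] one character at a time
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(i + 1)
    return root


def find(root, pattern):
    """Start positions of `pattern`: walk down spelling the pattern, since every
    occurrence of P is a prefix of some suffix of the text."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []                            # fell off the trie: P does not occur
    return node["starts"]


trie = build_suffix_trie("BANANA")
print(find(trie, "ANAN"))   # [2]
print(find(trie, "ANA"))    # [2, 4]
print(find(trie, "NAB"))    # []
```

Once the trie is built, the search costs O(|P|) steps regardless of the text length.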
Generalized Suffix Tree (GST)
Figure: generalized suffix tree of the strings WINDOW$ and INDIGO$ (positions 1 to 7 in each string). Each leaf is labeled (i, j), meaning the suffix of string i that starts at position j.
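The sketch below is a rough uncompacted analogue of the GST in the figure. It inserts every suffix of both strings into one trie and records (i, j) occurrence pairs at each node. It then finds the longest common substring as the deepest path whose nodes are shared by all strings. Assumptions: the strings are passed without the $ terminator, because each node stores its occurrences directly; the trie is not compacted; and all names are hypothetical.

```python
def build_gst(strings):
    """Uncompacted generalized suffix trie. Every node stores the (string id, start)
    pairs, 1-based as in the figure, of the suffixes whose path passes through it."""
    root = {"children": {}, "occ": set()}
    for sid, s in enumerate(strings, start=1):
        for j in range(len(s)):
            node = root
            for ch in s[j:]:
                node = node["children"].setdefault(ch, {"children": {}, "occ": set()})
                node["occ"].add((sid, j + 1))
    return root


def occurrences(root, pattern):
    """All (string id, start) pairs where `pattern` occurs."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return set()
    return node["occ"]


def longest_common_substring(root, k):
    """Deepest path whose nodes all occur in every one of the k strings."""
    best, stack = "", [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node["children"].items():
            # A descendant never occurs in more strings than its parent, so prune here.
            if len({sid for sid, _ in child["occ"]}) == k:
                stack.append((child, path + ch))
    return best


gst = build_gst(["WINDOW", "INDIGO"])
print(sorted(occurrences(gst, "IND")))    # [(1, 2), (2, 1)]
print(longest_common_substring(gst, 2))   # IND
```

Finding common substrings among a set of strings (one of the string operations listed earlier) reduces to this kind of search over the shared tree.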
Spatial Data Structures
Operation type: spatial queries on high-dimensional data
- range queries
- nearest neighbor search
Data structures: quad-trees, oct-trees, k-d trees, range trees, R-trees
Figure: points in 2-D and their bounding rectangle
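For reference, both query types can be answered by a linear scan over all n points. The hypothetical baseline below fixes what each query returns and computes the bounding rectangle shown in the figure. The structures listed above aim to answer the same queries without examining every point. The names and the choice of Euclidean distance are assumptions.

```python
import math


def range_query(points, lo, hi):
    """All points p with lo[d] <= p[d] <= hi[d] in every dimension d (axis-aligned box)."""
    return [p for p in points if all(l <= x <= h for x, l, h in zip(p, lo, hi))]


def nearest_neighbor(points, q):
    """Point closest to q in Euclidean distance (ties broken by list order)."""
    return min(points, key=lambda p: math.dist(p, q))


def bounding_rectangle(points):
    """Smallest axis-aligned rectangle ((xmin, ymin), (xmax, ymax)) enclosing 2-D points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))


pts = [(1, 1), (2, 5), (4, 3), (7, 2)]
print(range_query(pts, (0, 0), (4, 4)))   # [(1, 1), (4, 3)]
print(nearest_neighbor(pts, (5, 3)))      # (4, 3)
print(bounding_rectangle(pts))            # ((1, 1), (7, 5))
```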
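Quad trees (4-way trees)
Recursive bisection technique for spatial domain decomposition
Figure: a 2D region recursively bisected into quadrants, and the corresponding quad tree from the root down to its leaves (F, D, E, G, …).
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005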
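A minimal point-region quad tree built by recursive bisection might look like the sketch below. A leaf that holds more than `capacity` points is split by halving its square cell along both x and y at once. The leaf capacity of 1, the square root cell, the depth limit and the class name are all assumptions made for illustration.

```python
class QuadNode:
    """Point-region quad tree node covering the square [x0, x0+size) x [y0, y0+size)."""

    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.points = []       # points stored here while the node is a leaf
        self.children = None   # list of 4 QuadNodes (SW, SE, NW, NE) after bisection

    def _child_for(self, p):
        half = self.size / 2
        dx = 1 if p[0] >= self.x0 + half else 0
        dy = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * dy + dx]

    def _split(self, capacity, max_depth):
        half = self.size / 2
        # Bisect both axes at once, giving four equal quadrants.
        self.children = [QuadNode(self.x0 + dx * half, self.y0 + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]
        pts, self.points = self.points, []
        for q in pts:
            self._child_for(q).insert(q, capacity, max_depth - 1)

    def insert(self, p, capacity=1, max_depth=32):
        if self.children is not None:
            self._child_for(p).insert(p, capacity, max_depth - 1)
            return
        self.points.append(p)
        # Bisect once the leaf overflows; max_depth guards against duplicate points.
        if len(self.points) > capacity and max_depth > 0:
            self._split(capacity, max_depth)

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(c.depth() for c in self.children)

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


qt = QuadNode(0.0, 0.0, 1.0)          # unit square
for p in [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]:
    qt.insert(p)
print(qt.depth())                     # 1: one bisection separates the four points
qt.insert((0.2, 0.2))
print(qt.depth())                     # 3: the SW quadrant is bisected twice more
```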
Compacted Quad-trees (for 2D data)
• Each internal node has exactly 4 children (one per quadrant)
• Compaction: a path of nodes that each have only one nonempty child is compacted into a single edge
• For 3D data, the corresponding tree is called an oct-tree
Figure: 2D space with data points and its quad-tree decomposition.
Source: Handbook of Data Structures & Applications, Chapman & Hall/CRC Press, 2005
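The sketch below illustrates compaction. It builds an ordinary point-region quad tree for two nearby points, then skips every node that has only one nonempty child, so each such path collapses into a single edge. Assumptions: leaf capacity 1, and empty children are dropped from the compacted tree (some formulations keep them). All names are hypothetical.

```python
def build(points, x0, y0, size, depth=0, max_depth=32):
    """PR quad tree by recursive bisection; a cell holding at most one point is a leaf."""
    if len(points) <= 1 or depth == max_depth:
        return {"cell": (x0, y0, size), "points": points, "children": None}
    half = size / 2
    quads = [[], [], [], []]                     # SW, SE, NW, NE
    for p in points:
        dx = 1 if p[0] >= x0 + half else 0
        dy = 1 if p[1] >= y0 + half else 0
        quads[2 * dy + dx].append(p)
    children = [build(quads[2 * dy + dx], x0 + dx * half, y0 + dy * half, half,
                      depth + 1, max_depth)
                for dy in (0, 1) for dx in (0, 1)]
    return {"cell": (x0, y0, size), "points": [], "children": children}


def is_empty(node):
    if node["children"] is None:
        return not node["points"]
    return all(is_empty(c) for c in node["children"])


def compact(node):
    """Replace every chain of nodes with a single nonempty child by one edge."""
    if node["children"] is None:
        return node
    nonempty = [c for c in node["children"] if not is_empty(c)]
    if len(nonempty) == 1:
        return compact(nonempty[0])              # skip this node: it separates nothing
    return {"cell": node["cell"], "points": [], "children": [compact(c) for c in nonempty]}


def count_nodes(node):
    if node["children"] is None:
        return 1
    return 1 + sum(count_nodes(c) for c in node["children"])


pts = [(0.25, 0.25), (0.25 + 2 ** -8, 0.25)]    # two nearby points in the unit square
full = build(pts, 0.0, 0.0, 1.0)
small = compact(full)
print(count_nodes(full), count_nodes(small))    # 33 3
print(small["cell"])                            # (0.25, 0.25, 0.0078125)
```

The uncompacted tree needs eight levels of bisection before the two points fall into different quadrants. The compacted tree keeps only the cell that actually separates them.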
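Range Queries on Quad-trees
Figure: a query rectangle with corners (a1,b1) and (a2,b2) in a space whose origin is (0,0); the points inside the rectangle form the RangeQueryResult.
A range query starts at the root and descends only into quadrants that intersect the query rectangle, reporting the points in the leaves it reaches that lie inside the rectangle.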
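A hedged sketch of this pruned traversal over a point-region quad tree follows. It reuses a recursive-bisection build with leaf capacity 1 (an assumption). Any cell that does not overlap [a1, a2] × [b1, b2] is skipped together with its whole subtree. The function names are hypothetical.

```python
def build(points, x0, y0, size, depth=0, max_depth=32):
    """PR quad tree by recursive bisection; a cell holding at most one point is a leaf."""
    if len(points) <= 1 or depth == max_depth:
        return {"cell": (x0, y0, size), "points": points, "children": None}
    half = size / 2
    quads = [[], [], [], []]                     # SW, SE, NW, NE
    for p in points:
        dx = 1 if p[0] >= x0 + half else 0
        dy = 1 if p[1] >= y0 + half else 0
        quads[2 * dy + dx].append(p)
    children = [build(quads[2 * dy + dx], x0 + dx * half, y0 + dy * half, half,
                      depth + 1, max_depth)
                for dy in (0, 1) for dx in (0, 1)]
    return {"cell": (x0, y0, size), "points": [], "children": children}


def range_query(node, a1, b1, a2, b2, out=None):
    """Collect the points with a1 <= x <= a2 and b1 <= y <= b2."""
    if out is None:
        out = []
    x0, y0, size = node["cell"]
    # Prune: this cell lies entirely outside the query rectangle.
    if x0 > a2 or x0 + size < a1 or y0 > b2 or y0 + size < b1:
        return out
    if node["children"] is None:
        out.extend(p for p in node["points"] if a1 <= p[0] <= a2 and b1 <= p[1] <= b2)
    else:
        for child in node["children"]:
            range_query(child, a1, b1, a2, b2, out)
    return out


pts = [(1, 1), (2, 5), (4, 3), (7, 2), (6, 6)]
root = build(pts, 0, 0, 8)                       # square [0, 8) x [0, 8) with origin (0, 0)
print(sorted(range_query(root, 1, 1, 4, 4)))     # [(1, 1), (4, 3)]
```

The overlap test is deliberately conservative, since a cell that only touches the rectangle's border is still visited. The exact comparison at the leaves keeps the result correct.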
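Oct-Trees (for 3D data)
Issue: What happens if the data is unevenly (i.e., non-uniformly) distributed?
Most of the levels in the tree will be empty: a tight cluster of points forces many successive subdivisions in which all but one child cell contain no data.
Solution: “Compacted Oct-trees”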
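The effect of uneven data can be measured with a small sketch. The function below computes the depth and node count of an uncompacted point-region octree, where each split halves the cube along x, y and z and creates 8 children. It assumes leaf capacity 1 and a unit cube, and the function name is hypothetical. Eight evenly spread points need a single split, while two points 2^-10 apart force a chain of ten splits.

```python
from itertools import product


def octree_stats(points, origin=(0.0, 0.0, 0.0), size=1.0, max_depth=64):
    """Return (depth, node count) of a point-region octree with leaf capacity 1,
    built by bisecting the cube [origin, origin + size)^3 along x, y and z at once."""
    if len(points) <= 1 or max_depth == 0:
        return 0, 1                                   # leaf (possibly empty)
    half = size / 2
    buckets = {}
    for p in points:
        key = tuple(1 if p[d] >= origin[d] + half else 0 for d in range(3))
        buckets.setdefault(key, []).append(p)
    depth, nodes = 0, 1
    for octant in product((0, 1), repeat=3):          # all 8 children, including empty ones
        child_origin = tuple(origin[d] + octant[d] * half for d in range(3))
        sub_depth, sub_nodes = octree_stats(buckets.get(octant, []), child_origin,
                                            half, max_depth - 1)
        depth, nodes = max(depth, sub_depth + 1), nodes + sub_nodes
    return depth, nodes


uniform = [(x, y, z) for x in (0.25, 0.75) for y in (0.25, 0.75) for z in (0.25, 0.75)]
clustered = [(0.5, 0.5, 0.5), (0.5 + 2 ** -10, 0.5, 0.5)]
print(octree_stats(uniform))    # (1, 9)
print(octree_stats(clustered))  # (10, 81)
```

In the clustered case only 2 of the 81 nodes hold data, and at each of the ten levels 7 of the 8 children are empty leaves.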
k-d trees (for k dimensions)
Maintain a single combined binary search tree for all dimensions
Recursively bisect each dimension, cycling through the dimensions at successive levels of the tree (for k = 2: x, y, x, y, …)
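A sketch of a k-d tree that splits at the median and uses axis = depth mod k at each level is shown below, together with a nearest-neighbor search that uses the usual pruning rule. Median-of-sorted splitting and the names are assumptions; the slide only specifies that the dimensions alternate by level.

```python
import math


class KDNode:
    def __init__(self, point, axis, left, right):
        self.point, self.axis = point, axis    # splitting point and the dimension it splits
        self.left, self.right = left, right    # subtrees with smaller / larger axis values


def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through the k dimensions level by level
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                        # median along this axis becomes the split point
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def nearest(node, q, best=None):
    """Nearest neighbor of q (Euclidean distance)."""
    if node is None:
        return best
    if best is None or math.dist(q, node.point) < math.dist(q, best):
        best = node.point
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Search the far side only if the splitting plane is closer than the current best.
    if abs(diff) < math.dist(q, best):
        best = nearest(far, q, best)
    return best


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree.point)               # (7, 2): median in x at the root
print(nearest(tree, (9, 2)))    # (8, 1)
```

The far subtree can be skipped because every point in it is at least |diff| away from q along the splitting axis.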