Data Storage & Indexesweb.stanford.edu/class/cs245/slides/05-Storage-Formats-p2.pdf · Data Storage...
Transcript of Data Storage & Indexesweb.stanford.edu/class/cs245/slides/05-Storage-Formats-p2.pdf · Data Storage...
Outline
Co-designing storage and compute (paper)
Indexes
CS 245 2
Outline
Co-designing storage and compute (paper)
Indexes
CS 245 3
C-Store Storage
The storage construct was a “projection”; what does that mean?
CS 245 4
C-Store Compression
Five types of compression:» Null suppression» Dictionary encoding» Run-length encoding» Bit-vector encoding» Lempel-Ziv
Tradeoff: size vs ease of computation
CS 245 5
API for Compressed Blocks
CS 245 6
Using the Block API
CS 245 7
Data Size with Each Scheme
CS 245 8
(a) Runs of length 50 (b) Runs of length 1000
Performance with Each Scheme
CS 245 9
(a) Runs of length 50 (b) Runs of length 1000
How would the results change on SSDs?
Outline
Co-designing storage and compute (paper)
Indexes
CS 245 10
Key Operations on an Index
Find all records with a given value for a key» Key can be one field or a tuple of fields
(e.g. country=“US” AND state=“CA”)» In some cases, only one matching record
Find all records with key in a given range
Find nearest neighbor to a data point?
CS 245 11
Tradeoffs in Indexing
Improved queryperformance
Size ofindexes
Cost to updateindexes
CS 245 12
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 13
Many standard data structures, but adapted to work well on disk
Sequential File
2010
4030
6050
8070
10090
CS 245 14
Sequential File
2010
4030
6050
8070
10090
Dense Index
10203040
50607080
90100110120
CS 245 15
Sequential File
2010
4030
6050
8070
10090
Sparse Index
10305070
90110130150
170190210230
CS 245 16
Sequential File
2010
4030
6050
8070
10090
2-level sparse index
10305070
90110130150
170190210230
1090
170250
330410490570
CS 245 17File and 2nd level index blocks need not be contiguous on disk
Sparse: Less space usage, can keep moreof index in memory
Dense: Can tell whether any record existswithout accessing file
(Later: sparse better for insertions, dense needed for secondary indexes)
Sparse vs Dense Tradeoff
CS 245 18
Terms
Search key of an indexPrimary index (on primary key of ordered files)Secondary indexDense index (contains all search key values)Sparse indexMulti-level index
CS 245 19
Handling Duplicate Keys
For a primary index, can point to first instance of each item (assuming blocks are linked)
For a secondary index, need to point to a list of records since they can be anywhere
CS 245 20
2010
4030
6050
8070
10305070
90110130150
Deletion: Sparse Index
CS 245 21
Deletion: Sparse Index
2010
4030
6050
8070
10305070
90110130150
– delete record 40
CS 245 22
2010
4030
6050
8070
10305070
90110130150
Deletion: Sparse Index– delete record 40
CS 245 23
2010
4030
6050
8070
10305070
90110130150
Deletion: Sparse Index– delete record 30
CS 245 24
2010
4030
6050
8070
10305070
90110130150
4040
Deletion: Sparse Index– delete record 30
CS 245 25
2010
4030
6050
8070
10305070
90110130150
Deletion: Sparse Index– delete records 30 & 40
CS 245 26
2010
4030
6050
8070
10305070
90110130150
Deletion: Sparse Index– delete records 30 & 40
CS 245 27
2010
4030
6050
8070
10305070
90110130150
5070
Deletion: Sparse Index– delete records 30 & 40
CS 245 28
2010
4030
6050
8070
10203040
50607080
Deletion: Dense Index
CS 245 29
2010
4030
6050
8070
10203040
50607080
Deletion: Dense Index– delete record 30
CS 245 30
2010
4030
6050
8070
10203040
50607080
40
Deletion: Dense Index– delete record 30
CS 245 31
2010
4030
6050
8070
10203040
50607080
4040
Deletion: Dense Index– delete record 30
CS 245 32
2010
30
5040
60
10304060
Insertion: Sparse Index– insert record 34
CS 245 33
2010
30
5040
60
10304060
Insertion: Sparse Index– insert record 34
CS 245 34
2010
30
5040
60
10304060 34
• our lucky day!we have free spacewhere we need it!
Insertion: Sparse Index– insert record 34
CS 245 35
2010
30
5040
60
10304060
Insertion: Sparse Index– insert record 15
CS 245 36
2010
30
5040
60
10304060
15
2030
20
Insertion: Sparse Index– insert record 15
CS 245 37
2010
30
5040
60
10304060
15
2030
20
• Illustrated: Immediate reorganization• Variation:
– insert new block (chained file)– update index
Insertion: Sparse Index– insert record 15
CS 245 38
2010
30
5040
60
10304060
Insertion: Sparse Index– insert record 25
CS 245 39
2010
30
5040
60
10304060
25
overflow blocks(reorganize later...)
Insertion: Sparse Index– insert record 25
CS 245 40
Orderingfield
5030
7020
4080
10100
6090
Secondary Indexes
CS 245 41
Orderingfield
5030
7020
4080
10100
6090
Sparse index: 302080
100
90...
Secondary Indexes
CS 245 42
Sparse index:
Secondary Indexes Orderingfield
5030
7020
4080
10100
6090
302080
100
90...
does not make sense!
CS 245 43
Orderingfield
5030
7020
4080
10100
6090
10203040
506070...
Secondary IndexesDense index:
CS 245 44
Orderingfield
5030
7020
4080
10100
6090
10203040
506070...
105090...
Sparsehigherlevel
Dense index:
Secondary Indexes
CS 245 45
Lowest level is dense
Other levels are sparse
Pointers are record pointers (not block)
With Secondary Indexes
CS 245 46
1020
4020
4010
4010
4030
10203040
5060...
buckets
Duplicate Values in Secondary Indexes
CS 245 47
Can compute complex queries through Boolean operations on record pointer lists
Consider an employee table with foreign keys for department and floor:
Another Benefit of Buckets
EmpID Name DeptID FloorID
1 Alice 2 1
2 Bob 2 2
FloorID …
1
2
DeptID …
1
2
CS 245 48
Query: Get Employees in (Toy Dept) AND (2nd floor)
Dept. index Employee Floor index
Toy 2nd
CS 245 49
Intersect “Toy” bucket and “2nd floor” buckets to get list of matching employees
This Idea is Used in Text Information Retrieval
Documents
...the cat is fat ...
...was rainingcats and dogs...
...Fido the dog ...
CS 245 50
This Idea is Used in Text Information Retrieval
Documents
...the cat is fat ...
...was rainingcats and dogs...
...Fido the dog ...
Inverted lists
cat
dog
CS 245 51
cat Title 5
Title 100
Author 10Abstract 57
Title 12
d3d2
d1
dog
typepositio
nlocation
Common Technique: More Info in Index Entries
Answer queries like “cat within 5 words of dog”CS 245 52
Pros:- Simple- Index is sequential file (good for scans)
Cons:- Inserts expensive, and/or- Lose sequentiality & balance
Conventional Indexes
CS 245 53
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 56
B-Trees
Another type of index» Give up on sequentiality of index» Try to get “balance”
Note: the exact data structure we’ll look at is a B+ tree, but plain old “B-trees” are similar
CS 245 57
B+ Tree ExampleRoot
100
120
150
180
30
3 5 11 30 35 100
101
110
120
130
150
156
179
180
200
(n = 3)
CS 245 58
to keys to keys to keys to keys< 57 57£ k<81 81£k<95 ³95
57 81 95
Sample Non-Leaf
CS 245 59
From non-leaf node
to next leafin sequence
57 81 95
To re
cord
w
ith k
ey 5
7To
reco
rd
with
key
81
To re
cord
w
ith k
ey 9
5
Sample Leaf Node
CS 245 60
Size of Nodes on Disk
n + 1 pointersn keys
(Fixed size nodes)
CS 245 61
Use at least
Non-leaf: é(n+1)/2ù pointers
Leaf: ë(n+1)/2û pointers to data
Don’t Want Nodes to be Too Empty
CS 245 62
Example: n = 3Full node min. node
Non-leaf
Leaf
120
150
180
30
3 5 11 30 35
CS 245 63
1. All leaves are at same lowest level (balanced tree)
2. Pointers in leaves point to records, except for “sequence pointer”
B+ Tree Rules (tree of order n)
CS 245 64
(3) Number of pointers/keys for B+ tree:
* When there is only one record in the B+ tree, min pointersin the root is 1 (the other pointers are null)
Non-leaf(non-root) n+1 n é(n+1)/2ù é(n+1)/2ù-1
Leaf(non-root) n+1 n
Root n+1 n 2* 1
Max Max Min Min ptrs keys ptrs®data keys
ë(n+1)/2û ë(n+1)/2û
B+ Tree Rules (tree of order n)
CS 245 65
Insert Into B+ Tree
(a) simple case» space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
CS 245 66
(a) Insert key = 32 n=33 5 11 30 31
30
100
CS 245 67
(a) Insert key = 32 n=33 5 11 30 31
30
100
32
CS 245 68
(a) Insert key = 7 n=3
3 5 11 30 31
30
100
CS 245 69
(a) Insert key = 7 n=3
3 5 11 30 31
30
100
3 5
7
CS 245 70
(a) Insert key = 7 n=3
3 5 11 30 31
30
100
3 5
7
7
CS 245 71
(c) Insert key = 160 n=3
100
120
150
180
150
156
179
180
200
CS 245 72
(c) Insert key = 160 n=3
100
120
150
180
150
156
179
180
200
160
179
CS 245 73
(c) Insert key = 160 n=3
100
120
150
180
150
156
179
180
200
180
160
179
CS 245 74
(c) Insert key = 160 n=3
100
120
150
180
150
156
179
180
200
160
180
160
179
CS 245 75
(d) New root, insert 45 n=3
10 20 30
1 2 3 10 12 20 25 30 32 40CS 245 76
(d) New root, insert 45 n=3
10 20 30
1 2 3 10 12 20 25 30 32 40 40 45
CS 245 77
(d) New root, insert 45 n=3
10 20 30
1 2 3 10 12 20 25 30 32 40 40 45
40CS 245 78
(d) New root, insert 45 n=3
10 20 30
1 2 3 10 12 20 25 30 32 40 40 45
40
30new root
CS 245 79
Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
CS 245 80
(b) Coalesce with sibling» Delete 50
10 40 100
10 20 30 40 50
n=4
CS 245 81
(b) Coalesce with sibling» Delete 50
10 40 100
10 20 30 40 50
n=4
40
CS 245 82
(c) Redistribute keys» Delete 50
10 40 100
10 20 30 35 40 50
n=4
CS 245 83
(c) Redistribute keys» Delete 50
10 40 100
10 20 30 35 40 50
n=4
35
35
CS 245 84
40 4530 3725 2620 2210 141 3
10 20 30 40
(d) Non-leaf coalesce– Delete 37
n=4
25
CS 245 85
40 4530 3725 2620 2210 141 3
10 20 30 40
(d) Non-leaf coalesce– Delete 37
n=4
30
25
CS 245 86
40 4530 3725 2620 2210 141 3
10 20 30 40
(d) Non-leaf coalesce– Delete 37
n=4
40
30
25
CS 245 87
40 4530 3725 2620 2210 141 3
10 20 30 40
(d) Non-leaf coalesce– Delete 37
n=4
40
30
25
25
new root
CS 245 88
B+ Tree Deletion in Practice
Often, coalescing is not implemented» Too hard and not worth it! (Most datasets just
grow in size over time.)
CS 245 89
Interesting Problem:
For B+ tree, how large should n be?
…
n is number of keys / node
CS 245 90
Sample Assumptions:
(1) Time to read node from disk is(S + Tn) msec.
CS 245 91
Sample Assumptions:
(1) Time to read node from disk is(S + Tn) msec.
(2) Once block in memory, use binarysearch to locate key:
(a + b log2 n) msec.For some constants a, b; Assume a << S
CS 245 92
Sample Assumptions:
(3) Assume B+tree is full, i.e., # nodes toexamine is logn N where N = # records
(1) Time to read node from disk is(S + Tn) msec.
(2) Once block in memory, use binarysearch to locate key:
(a + b log2 n) msec.For some constants a, b; Assume a << S
CS 245 93
Can Get:f(n) = time to find a record
f(n)
nopt n
CS 245 94
Find nopt by setting f’(n) = 0
Answer is nopt = “a few hundred” in practice
CS 245 95
Exercise
f(n) = logn N * (S + T n + a + b log2 n)
S = 14000 μsT = 0.2 μsb = 0.002 μsa = 0 μsN = 10,000,000
CS 245 96
N = 10 Million RecordsS= 14000T= 0.2b= 0.002a= 0N= 10,000,000
times in microseconds
n
CS 245 97
N = 100 Million RecordsS= 14000T= 0.2b= 0.002a= 0N= 100,000,000
times in microseconds
n
CS 245 98
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 100
Hash Indexes
key h(key)
record / ptr
...
Buckets(block sized)
Buckets can contain records or pointers to file
overflowbucket
CS 245 101
Chaining is used to handle bucket overflow
Hash vs Tree Indexes
+ O(1) instead of O(log N) disk accesses
– Can’t efficiently do range queries
CS 245 102
Challenge: Resizing
Hash tables try to keep occupancy in a fixed range (50-80%) and slow down beyond that» Too much chaining
How to resize the table when this happens?» In memory: just move everything, amortized
cost is pretty low» On disk: moving everything is expensive!
CS 245 103
Extendible Hashing
Tree-like design for hash tables that allows cheap resizing while requiring 2 IOs / access
CS 245 104
Extendible Hashing: 2 Ideas
(a) Use i of b bits output by hash function
b
h(K) ®
i
i will grow over time; the first i bits of each key’s hash are used to map it to a bucket
00110101
CS 245 105
(b) Use a directory with pointers to buckets
h(K)[0..i] to bucket...
...
Extendible Hashing: 2 Ideas
CS 245 106
Example: 4-bit h(K), 2 keys/bucket
i = 11
1
0001
10011100
CS 245 107
local depthglobal depth
0010
Insert 0010
Example: 4-bit h(K), 2 keys/bucket
i = 11
1
0001
10011100
Insert 1010
CS 245 108
local depthglobal depth
i = 11
1
0001
10011100
Insert 101011100
1010
Example: 4-bit h(K), 2 keys/bucket
CS 245 109
i = 11
1
0001
10011100
Insert 101011100
1010
New directory
200
01
10
11
i =
2
2
Example: 4-bit h(K), 2 keys/bucket
CS 245 110
10001
210011010
21100
Insert:
0111
0000
00
01
10
11
2i =
Example
CS 245 111
10001
210011010
21100
Insert:
0111
0000
00
01
10
11
2i =
0111
0000
0111
0001
Example
CS 245 112
10001
210011010
21100
Insert:
0111
0000
00
01
10
11
2i =
0111
0000
0111
0001
2
2Example
CS 245 113
00
01
10
11
2i =
210011010
21100
20111
200000001
Example
Note: still need chaining if values
of h(K) repeat and fill a bucket
CS 245 114
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 115
Motivation
Example: find records where
DEPT = “Toy” AND SALARY > 50k
CS 245 116
Strategy I:
Use one index, say Dept.
Get all Dept = “Toy” recordsand check their salary
I1
CS 245 117
Strategy II:
Use 2 indexes; manipulate pointers
Toy Sal> 50k
CS 245 118
Strategy III:
Multi-key index
One idea:
I1
I2
I3
CS 245 119
Example
ExampleRecord
DeptIndex
SalaryIndex
Name=JoeDEPT=SalesSALARY=15k
ArtSalesToy
10k15k17k21k
12k15k15k19k
CS 245 120
h
nb
i a
co
de
g
f
m
l
kj
k-d Tree
CS 245 121
Splits dimensions in any order to hold k-dimensional data
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj
CS 245 122
k-d Tree
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
CS 245 123
k-d Tree
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
5
15 15
CS 245 124
k-d Tree
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
5
15 15
h i a bcd efg
n omlj k
Efficient range queries in both
dimensionsCS 245 125
k-d Tree
Summary
Wide range of indexes for different data types and queries (e.g. range vs exact)
Key concerns: query time, cost to update, and size of index
Next: given all these storage data structures, how do we run our queries?
CS 245 126