Database Systems
15-445/15-645
Fall 2018
Andy PavloComputer Science Carnegie Mellon Univ.AP
Lecture #08
Trees Indexes(Part II)
https://db.cs.cmu.edu/http://www.cs.cmu.edu/~pavlo/
CMU 15-445/645 (Fall 2018)
ADMINISTRIVIA
Project #1 is due Wednesday Sept 26th @ 11:59pm
Homework #2 is due Friday Sept 28th @ 11:59pm
Project #2 will be released on Wednesday Sept 26th. First checkpoint is due Monday Oct 8th.
2
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TODAY'S AGENDA
Additional Index Usage
Skip Lists
Radix Trees
Inverted Indexes
3
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
IMPLICIT INDEXES
Most DBMSs automatically create an index to enforce integrity constraints.→ Primary Keys→ Unique Constraints→ Foreign Keys (?)
4
CREATE TABLE foo (id SERIAL PRIMARY KEY,val1 INT NOT NULL,val2 VARCHAR(32) UNIQUE
);
CREATE UNIQUE INDEX foo_pkeyON foo (id);
CREATE UNIQUE INDEX foo_val2_keyON foo (val2);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
IMPLICIT INDEXES
Most DBMSs automatically create an index to enforce integrity constraints.→ Primary Keys→ Unique Constraints→ Foreign Keys (?)
4
CREATE TABLE foo (id SERIAL PRIMARY KEY,val1 INT NOT NULL,val2 VARCHAR(32) UNIQUE
);
CREATE TABLE bar (id INT REFERENCES foo (val1),val VARCHAR(32)
);
CREATE INDEX foo_val1_keyON foo (val1);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
IMPLICIT INDEXES
Most DBMSs automatically create an index to enforce integrity constraints.→ Primary Keys→ Unique Constraints→ Foreign Keys (?)
4
CREATE TABLE foo (id SERIAL PRIMARY KEY,val1 INT NOT NULL,val2 VARCHAR(32) UNIQUE
);
CREATE TABLE bar (id INT REFERENCES foo (val1),val VARCHAR(32)
);
CREATE INDEX foo_val1_keyON foo (val1);
X
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
IMPLICIT INDEXES
Most DBMSs automatically create an index to enforce integrity constraints.→ Primary Keys→ Unique Constraints→ Foreign Keys (?)
4
CREATE TABLE foo (id SERIAL PRIMARY KEY,val1 INT NOT NULL UNIQUE,val2 VARCHAR(32) UNIQUE
);
CREATE TABLE bar (id INT REFERENCES foo (val1),val VARCHAR(32)
);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
PARTIAL INDEXES
Create an index on a subset of the entire table. This potentially reduces its size and the amount of overheadto maintain it.
One common use case is to partition indexes by date ranges.→ Create a separate index per month, year.
5
CREATE INDEX idx_fooON foo (a, b)
WHERE c = 'WuTang';
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
PARTIAL INDEXES
Create an index on a subset of the entire table. This potentially reduces its size and the amount of overheadto maintain it.
One common use case is to partition indexes by date ranges.→ Create a separate index per month, year.
5
CREATE INDEX idx_fooON foo (a, b)
WHERE c = 'WuTang';
SELECT b FROM fooWHERE a = 123AND c = 'WuTang';
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
COVERING INDEXES
If all of the fields needed to process the query are available in an index, then the DBMS does not need to retrieve the tuple.
This reduces contention on the DBMS's buffer pool resources.
6
SELECT b FROM fooWHERE a = 123;
CREATE INDEX idx_fooON foo (a, b);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
INDEX INCLUDE COLUMNS
Embed additional columns in indexes to support index-only queries.
Not part of the search key.
7
CREATE INDEX idx_fooON foo (a, b)
INCLUDE (c);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
INDEX INCLUDE COLUMNS
Embed additional columns in indexes to support index-only queries.
Not part of the search key.
7
SELECT b FROM fooWHERE a = 123AND c = 'WuTang';
CREATE INDEX idx_fooON foo (a, b)
INCLUDE (c);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
INDEX INCLUDE COLUMNS
Embed additional columns in indexes to support index-only queries.
Not part of the search key.
7
SELECT b FROM fooWHERE a = 123AND c = 'WuTang';
CREATE INDEX idx_fooON foo (a, b)
INCLUDE (c);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
CREATE INDEX idx_user_loginON users (login);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
You can use expressions when declaring an index.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
CREATE INDEX idx_user_loginON users (login);X
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
You can use expressions when declaring an index.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
CREATE INDEX idx_user_loginON users (login);
CREATE INDEX idx_user_loginON users (EXTRACT(dow FROM login));
X
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
You can use expressions when declaring an index.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
CREATE INDEX idx_user_loginON users (login);
CREATE INDEX idx_user_loginON users (EXTRACT(dow FROM login));
X
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
FUNCTIONAL/EXPRESSION INDEXES
The index does not need to store keys in the same way that they appear in their base table.
You can use expressions when declaring an index.
8
SELECT * FROM usersWHERE EXTRACT(dow
⮱FROM login) = 2;
CREATE INDEX idx_user_loginON users (login);
CREATE INDEX idx_user_loginON users (EXTRACT(dow FROM login));
X
CREATE INDEX idx_user_loginON foo (login)
WHERE EXTRACT(dow FROM login) = 2;
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
OBSERVATION
The easiest way to implement a dynamic order-preserving index is to use a sorted linked list.
All operations have to linear search.→ Average Cost: O(N)
9
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
K1 K2 K3 K4 K6K5 K7
OBSERVATION
The easiest way to implement a dynamic order-preserving index is to use a sorted linked list.
All operations have to linear search.→ Average Cost: O(N)
9
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
K1 K2 K3 K4 K6K5 K7
OBSERVATION
The easiest way to implement a dynamic order-preserving index is to use a sorted linked list.
All operations have to linear search.→ Average Cost: O(N)
9
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
SKIP LISTS
Multiple levels of linked lists with extra pointers that skip over intermediate nodes.
Maintains keys in sorted order without requiring global rebalancing.
10
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/http://dl.acm.org/citation.cfm?id=78977
CMU 15-445/645 (Fall 2018)
SKIP LISTS
A collection of lists at different levels→ Lowest level is a sorted, singly linked list of all keys→ 2nd level links every other key→ 3rd level links every fourth key→ In general, a level has half the keys of one below it
To insert a new key, flip a coin to decide how many levels to add the new key into.Provides approximate O(log n) search times.
11
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: EXAMPLE
12
∞
∞
∞
P=N
P=N/2
P=N/4
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
K5
K5
K5
V5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
K5
K5
K5
V5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
K5
K5
K5
V5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
K5
K5
K5
V5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
EndLevels
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
SKIP LISTS: INSERT
13
∞
∞
∞
P=N
P=N/2
P=N/4
Insert K5
K5
K5
K5
V5
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: SEARCH
14
∞
∞
∞
P=N
P=N/2
P=N/4
Find K3
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: SEARCH
14
∞
∞
∞
P=N
P=N/2
P=N/4
Find K3
K3
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: SEARCH
14
∞
∞
∞
P=N
P=N/2
P=N/4
Find K3
K3K2
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: SEARCH
14
∞
∞
∞
P=N
P=N/2
P=N/4
Find K3
K3K2 K3
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: SEARCH
14
∞
∞
∞
P=N
P=N/2
P=N/4
Find K3
K3K2 K3
CMU 15-445/645 (Fall 2018)
SKIP LISTS: DELETE
First logically remove a key from the index by setting a flag to tell threads to ignore.
Then physically remove the key once we know that no other thread is holding the reference.
15
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Deltrue
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Deltrue
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Deltrue
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K5
V5
K5
K5
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
Deltrue
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
End
K1
V1
K2
K2
V2
K3
V3
K4
V4
K4
K6
V6
Levels
SKIP LISTS: DELETE
16
∞
∞
∞
P=N
P=N/2
P=N/4
Delete K5
Delfalse
Delfalse
Delfalse
Delfalse
Delfalse
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
SKIP LISTS
Advantages:→ Uses less memory than a typical B+Tree if you don’t
include reverse pointers.→ Insertions and deletions do not require rebalancing.
Disadvantages:→ Not disk/cache friendly because they do not optimize
locality of references.→ Reverse search is non-trivial.
17
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE
Represent keys as individual digits. This allows threads to examine prefixes one-by-one instead of comparing entire key.→ The height of the tree depends on the length of keys.→ Does not require rebalancing→ The path to a leaf node represents the key of the leaf→ Keys are stored implicitly and can be reconstructed from
paths.
18
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TRIE VS. RADIX TREE
19
Keys: HELLO, HAT, HAVE
Trie
EH
L
¤
L
O
A
¤T
¤
V
E
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TRIE VS. RADIX TREE
19
Keys: HELLO, HAT, HAVE
Trie
EH
L
¤
L
O
A
¤T
¤
V
E
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TRIE VS. RADIX TREE
19
Keys: HELLO, HAT, HAVE
Trie
EH
L
¤
L
O
A
¤T
¤
V
E
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TRIE VS. RADIX TREE
19
Keys: HELLO, HAT, HAVE
Trie
EH
L
¤
L
O
A
¤T
¤
V
E
Radix Tree
ELLOH
¤
A
¤T
¤
VE
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
TRIE VS. RADIX TREE
19
Keys: HELLO, HAT, HAVE
Trie
EH
L
¤
L
O
A
¤T
¤
V
E
Radix Tree
ELLOH
¤
A
¤T
¤
VE
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
¤¤
TVE
H
A
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
¤¤
TVE
H
A
Insert HAIR
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
¤¤
TVE
H
A
¤
IR
Insert HAIR
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
¤¤
TVE
H
A
¤
IR
Insert HAIR
Delete HAT, HAVE
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
H
A
¤
IR
Insert HAIR
Delete HAT, HAVE
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
H
A
¤
IR
Insert HAIR
Delete HAT, HAVE
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: MODIFICATIONS
20
¤
ELLO
H
A
Insert HAIR
Delete HAT, HAVEAIR
¤
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: BINARY COMPARABLE KEYS
Not all attribute types can be decomposed into binary comparable digits for a radix tree.→ Unsigned Integers: Byte order must be flipped for little
endian machines.→ Signed Integers: Flip two’s-complement so that negative
numbers are smaller than positive.→ Floats: Classify into group (neg vs. pos, normalized vs.
denormalized), then store as unsigned integer.→ Compound: Transform each attribute separately.
21
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: BINARY COMPARABLE KEYS
22
Hex Key: 0A 0B 0C 0D
Int Key: 168496141
0A
0B
0C
0D
Big Endian
0D
0C
0B
0A
Little Endian
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: BINARY COMPARABLE KEYS
22
Hex Key: 0A 0B 0C 0D
Int Key: 168496141
0A
0B
0C
0D
Big Endian
0D
0C
0B
0A
Little Endian
0F0F0F0A
¤
0B
¤
0B0F¤
OF0F
¤
¤¤
0C
0F 0D
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
RADIX TREE: BINARY COMPARABLE KEYS
22
Hex Key: 0A 0B 0C 0D
Int Key: 168496141
0A
0B
0C
0D
Big Endian
0D
0C
0B
0A
Little Endian
0F0F0F0A
¤
0B
¤
0B0F¤
OF0F
¤
¤¤
0C
0F 0D
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
IN-MEMORY TABLE INDEXES
23
9.9415.5 13.3
8.09
2925.1
2.51 2.78 1.51
44.951.5
42.9
0
20
40
60
Insert-Only Read-Only Read/Update
Op
era
tio
ns/
sec
(M)
Open Bw-Tree B+Tree Skip List Radix
Processor: 1 socket, 10 cores w/ 2×HTWorkload: 50m Random Integer Keys (64-bit)
Source: Ziqi Wang
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/https://github.com/wangziqi2016/index-microbench
CMU 15-445/645 (Fall 2018)
OBSERVATION
The tree indexes that we've discussed so far are useful for "point" and "range" queries:→ Find all customers in the 15217 zip code.→ Find all orders between June 2018 and September 2018.
They are not good at keyword searches:→ Find all Wikipedia articles that contain the word "Pavlo"
24
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
WIKIPEDIA EXAMPLE
25
CREATE TABLE revisions (revID INT PRIMARY KEY,userID INT REFERENCES useracct (userID), pageID INT REFERENCES pages (pageID),content TEXT,updated DATETIME
);
CREATE TABLE pages (pageID INT PRIMARY KEY,title VARCHAR UNIQUE,latest INT⮱REFERENCES revisions (revID),
);
CREATE TABLE useracct (userID INT PRIMARY KEY,userName VARCHAR UNIQUE,⋮
);
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
WIKIPEDIA EXAMPLE
If we create an index on the content attribute, what does that actually do?
This doesn't help our query.
Our SQL is also not correct...
26
CREATE INDEX idx_rev_cntntON revisions (content);
SELECT pageID FROM revisionsWHERE content LIKE '%Pavlo%';
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
INVERTED INDEX
An inverted index stores a mapping of words to records that contain those words in the target attribute.→ Sometimes called a full-text search index.→ Also called a concordance in old (like really old) times.
The major DBMSs support these natively.There are also specialized DBMSs.
27
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
QUERY T YPES
Phrase Searches→ Find records that contain a list of words in the given
order.
Proximity Searches→ Find records where two words occur within n words of
each other.
Wildcard Searches→ Find records that contain words that match some pattern
(e.g., regular expression).
28
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
DESIGN DECISIONS
Decision #1: What To Store→ The index needs to store at least the words contained in
each record (separated by punctuation characters).→ Can also store frequency, position, and other meta-data.
Decision #2: When To Update→ Maintain auxiliary data structures to "stage" updates and
then update the index in batches.
29
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/
CMU 15-445/645 (Fall 2018)
CONCLUSION
B+Trees are still the way to go for tree indexes.
Inverted indexes are covered in CMU 11-442.
We did not discuss geo-spatial tree indexes:→ Examples: R-Tree, Quad-Tree, KD-Tree→ This is covered in CMU 15-826.
31
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/https://boston.lti.cs.cmu.edu/classes/11-642/http://www.cs.cmu.edu/~christos/courses/826.S17/
CMU 15-445/645 (Fall 2018)
NEXT CL ASS
How to make indexes thread-safe!
32
https://db.cs.cmu.edu/https://15445.courses.cs.cmu.edu/Top Related