1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

50
1 CPS216: Advanced CPS216: Advanced Database Systems Database Systems Notes 04: Operators for Notes 04: Operators for Data Access Data Access Shivnath Babu Shivnath Babu

Transcript of 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Page 1: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

1

CPS216: Advanced CPS216: Advanced Database SystemsDatabase Systems

Notes 04: Operators for Data Notes 04: Operators for Data AccessAccess

Shivnath BabuShivnath Babu

Page 2: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

ProblemProblem

Relation: Employee (ID, Name, Dept, Relation: Employee (ID, Name, Dept, …)…)

10 M tuples10 M tuples (Filter) Query:(Filter) Query:

SELECT *SELECT * FROM Employee FROM Employee WHERE Name = “Bob” WHERE Name = “Bob”

Page 3: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

3

Solution #1: Full Table Solution #1: Full Table ScanScan

Storage:Storage: Employee relation stored in Employee relation stored in contiguouscontiguous

blocksblocks Query plan:Query plan:

Scan the entire relation, output tuples with Scan the entire relation, output tuples with Name = “Bob”Name = “Bob”

Cost:Cost: Size of each record = 100 bytesSize of each record = 100 bytes Size of relation = 10 M x 100 = 1 GBSize of relation = 10 M x 100 = 1 GB Time @ 20 MB/s ≈ 1 Minute Time @ 20 MB/s ≈ 1 Minute

Page 4: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

4

Solution #2Solution #2

Storage:Storage: Employee relation Employee relation sortedsorted on Name on Name

attributeattribute Query plan:Query plan:

Binary searchBinary search

Page 5: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

5

Solution #2Solution #2

Cost:Cost: Size of a block: 1024 bytesSize of a block: 1024 bytes Number of records per block: 1024 / Number of records per block: 1024 /

100 = 10100 = 10 Total number of blocks: 10 M / 10 = 1 Total number of blocks: 10 M / 10 = 1

MM Blocks accessed by binary search: 20Blocks accessed by binary search: 20 Total time: 20 ms x 20 = 400 msTotal time: 20 ms x 20 = 400 ms

Page 6: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

6

Solution #2: IssuesSolution #2: Issues

Filters on different attributes:Filters on different attributes:

SELECT * SELECT * FROM EmployeeFROM EmployeeWHERE Dept = “Sales”WHERE Dept = “Sales”

Inserts and DeletesInserts and Deletes

Page 7: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

7

IndexesIndexes

Data structures that efficiently evaluate Data structures that efficiently evaluate a class of filter predicates over a relationa class of filter predicates over a relation

Class of filter predicates:Class of filter predicates: Single or multi-attributes (Single or multi-attributes (index-key index-key

attributesattributes)) Range and/or equality predicatesRange and/or equality predicates

(Usually) independent of physical (Usually) independent of physical storage of relation:storage of relation: Multiple indexes per relationMultiple indexes per relation

Page 8: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

8

IndexesIndexes

Disk residentDisk resident Large to fit in memoryLarge to fit in memory PersistentPersistent

Updated when indexed relation Updated when indexed relation updatedupdated Relation updates costlierRelation updates costlier Query cheaperQuery cheaper

Page 9: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

ProblemProblem

Relation: Employee (ID, Name, Dept, Relation: Employee (ID, Name, Dept, …)…)

(Filter) Query:(Filter) Query: SELECT *SELECT * FROM Employee FROM Employee WHERE Name = “Bob” WHERE Name = “Bob”

Single-Attribute Index on NameName that supports equality predicates

Page 10: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

10

RoadmapRoadmap

MotivationMotivation Single-Attribute Indexes: OverviewSingle-Attribute Indexes: Overview Order-based IndexesOrder-based Indexes

B-TreesB-Trees Hash-based IndexesHash-based Indexes

Extensible HashingExtensible Hashing Linear HashingLinear Hashing

Multi-Attribute Indexes (Chapter 14 Multi-Attribute Indexes (Chapter 14 GMUW, May cover in future)GMUW, May cover in future)

Page 11: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Single Attribute Index: General Single Attribute Index: General ConstructionConstruction

b1

2b

ib

nb

a1

2a

ia

na

A B

Page 12: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Single Attribute Index: General Single Attribute Index: General ConstructionConstruction

b1

2b

ib

nb

a1

2a

ia

na

a1

2a

ia

na

A B

A = val

A > lowA < high

Page 13: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

13

ExceptionsExceptions

Sparse IndexesSparse Indexes Require specific physical layout of Require specific physical layout of

relationrelation Example: Relation sorted on indexed Example: Relation sorted on indexed

attributeattribute More efficientMore efficient

Page 14: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

14

Single Attribute Index: General Single Attribute Index: General ConstructionConstruction

b1

2b

ib

nb

a1

2a

ia

na

a1

2a

ia

na

A B

A = val

A > lowA < high

Textbook: Dense Index

Page 15: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Single Attribute Index: General Single Attribute Index: General ConstructionConstruction

a1

2a

ia

na

A = val

A > lowA < high

How do we organize(attribute, pointer) pairs?

Idea: Use dictionary data structures

Issue: Disk resident?

Page 16: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

16

RoadmapRoadmap

MotivationMotivation Single-Attribute Indexes: OverviewSingle-Attribute Indexes: Overview Order-based IndexesOrder-based Indexes

B-TreesB-Trees Hash-based IndexesHash-based Indexes

Extensible HashingExtensible Hashing Linear HashingLinear Hashing

Multi-Attribute Indexes (Next class)Multi-Attribute Indexes (Next class)

Page 17: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

17

B-TreesB-Trees

Adaptation of search tree data Adaptation of search tree data structurestructure 2-3 trees2-3 trees

Supports range predicates (and Supports range predicates (and equality)equality)

Page 18: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Use Binary Search Tree Use Binary Search Tree Directly?Directly?

16 32 54 71 74 83 92

16 74

71

54

32

92

83

Page 19: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

19

Use Binary Search Tree Use Binary Search Tree Directly?Directly?

Store records of type Store records of type <key, left-ptr, right-ptr, data-ptr><key, left-ptr, right-ptr, data-ptr>

Remember position of rootRemember position of root Question: will this work?Question: will this work?

YesYes But we can do better!But we can do better!

Page 20: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

20

Use Binary Search Tree Use Binary Search Tree Directly?Directly?

Number of keys: 1 MNumber of keys: 1 M Number of levels: log (2^20) = 20Number of levels: log (2^20) = 20 Total cost index lookup: 20 random Total cost index lookup: 20 random

disk I/Odisk I/O 20 x 20 ms = 400 ms20 x 20 ms = 400 ms

B-Tree: less than 3 random disk I/OB-Tree: less than 3 random disk I/O

Page 21: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

21

B-Tree vs. Binary Search B-Tree vs. Binary Search TreeTree

k k1 k2 k3 k40

1 Random I/O prunes tree by half

1 Random I/O prunes tree by 40

Page 22: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

22

B-Tree ExampleB-Tree Example

15 36 57 63 76 87 92 100

Page 23: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

23

B-Tree ExampleB-Tree Example

null

63

36

15 36 57

84

63 76 87

91

92 100

Page 24: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

24

Meaning of Internal Meaning of Internal NodeNode

84 91

key < 84 84 ≤ key < 91 91 ≤ key

Page 25: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

25

B-Tree ExampleB-Tree Example

null

63

36

15 36 57

84

63 76 87

91

92 100

Page 26: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

26

Meaning of Leaf NodesMeaning of Leaf Nodes

63 76

pointer to record 63pointer to record 76

Next leaf

Page 27: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

27

Equality PredicatesEquality Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

key = 87

Page 28: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

28

Equality PredicatesEquality Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

key = 87

Page 29: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

29

Equality PredicatesEquality Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

key = 87

Page 30: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

30

Equality PredicatesEquality Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

key = 87

Page 31: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

31

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 32: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

32

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 33: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

33

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 34: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

34

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 35: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

35

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 36: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

36

Range PredicatesRange Predicates

null

63

36

15 36 57

84

63 76 87

91

92 100

57 ≤ key < 95

Page 37: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

37

General B-TreesGeneral B-Trees

Fixed parameter: nFixed parameter: n Number of keys: nNumber of keys: n Number of pointers: n + 1Number of pointers: n + 1

Page 38: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

38

B-Tree ExampleB-Tree Example

null

63

36

15 36 57

84

63 76 87

91

92 100

n = 2

Page 39: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

39

General B-TreesGeneral B-Trees

Fixed parameter: nFixed parameter: n Number of keys: nNumber of keys: n Number of pointers: n + 1Number of pointers: n + 1 All leaves at same depthAll leaves at same depth All (key, record pointer) in leavesAll (key, record pointer) in leaves

Page 40: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

40

B-Tree ExampleB-Tree Example

null

63

36

15 36 57

84

63 76 87

91

92 100

n = 2

Page 41: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

41

General B-Trees: General B-Trees: Space related constraintsSpace related constraints

Use at leastUse at least

Root: 2 pointersRoot: 2 pointers

Internal:Internal: (n+1)/2(n+1)/2pointerspointers

Leaf:Leaf: (n+1)/2(n+1)/2 pointers to pointers to datadata

Page 42: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

42

n=3

5 15 21 15

31 42 56 31 42

Internal

Leaf

Max Min

Page 43: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

43

Leaf NodesLeaf Nodes

n key slots

(n+1) pointer slots

Page 44: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

44

Leaf NodesLeaf Nodes

n key slots

(n+1) pointer slots

k kk1 2 mk3 … …

unused

record of k1record of k

2

… …

record of k m

…nextleaf

Page 45: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

45

Leaf NodesLeaf Nodes

n key slots

(n+1) pointer slots

k kk1 2 mk3 … …

unused

record of k1record of k

2

… …

record of k m

m ≥ (n+1)

2

nextleaf

Page 46: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

46

Internal NodesInternal Nodes

n key slots

(n+1) pointer slots

Page 47: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

47

Internal NodesInternal Nodes

n key slots

(n+1) pointer slots

k kk1 2 mk3

key < k 1k ≤ key < k1 2 k ≤ keym

unused

Page 48: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

48

Internal NodesInternal Nodes

n key slots

(n+1) pointer slots

k kk1 2 mk3

key < k 1k ≤ key < k1 2 k ≤ keym

unused(m+1) ≥

(n+1)

2

Page 49: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

49

Root NodeRoot Node

n key slots

(n+1) pointer slots

k kk1 2 mk3

key < k 1k ≤ key < k1 2 k ≤ keym

unused(m+1) ≥2

Page 50: 1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

50

LimitsLimits

Why the specific limitsWhy the specific limits (n+1)/2(n+1)/2 and and (n+1)/2(n+1)/2 ? ?

Why different limits for leaf and Why different limits for leaf and internal nodes?internal nodes?

Can we reduce each limit?Can we reduce each limit? Can we increase each limit?Can we increase each limit? What are the implications?What are the implications?