CPS216: Advanced Database Systems

Notes 05: Operators for Data Access (contd.)
Shivnath Babu


Page 1: CPS216: Advanced Database Systems

CPS216: Advanced Database Systems

Notes 05: Operators for Data Access (contd.)
Shivnath Babu

Page 2: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36. Insert: 62]

Page 3: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36; 62. Insert: 62]

Page 4: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36 62. Insert: 50]

Page 5: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36 50; 62; 62. Insert: 50]

Page 6: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36 50; 62; 62. Insert: 75]

Page 7: CPS216: Advanced Database Systems

Insertion in a B-Tree

[Figure: B-tree with n = 2. Node labels: 49; 49; 15 36 50; 62; 62; 75. Insert: 75]

Page 8: CPS216: Advanced Database Systems

Insertion

Page 9: CPS216: Advanced Database Systems

Insertion

Page 10: CPS216: Advanced Database Systems

Insertion

Page 11: CPS216: Advanced Database Systems

Insertion

Page 12: CPS216: Advanced Database Systems

Insertion

Page 13: CPS216: Advanced Database Systems

Insertion

Page 14: CPS216: Advanced Database Systems

Insertion

Page 15: CPS216: Advanced Database Systems

Insertion

Page 16: CPS216: Advanced Database Systems

Insertion

Page 17: CPS216: Advanced Database Systems

Insertion

Page 18: CPS216: Advanced Database Systems

Insertion

Page 19: CPS216: Advanced Database Systems

Insertion: Primitives

- Inserting into a leaf node
- Splitting a leaf node
- Splitting an internal node
- Splitting root node

Page 20: CPS216: Advanced Database Systems

Inserting into a Leaf Node

[Figure: leaf node with keys 54 57 60 62; key to insert: 58]

Page 21: CPS216: Advanced Database Systems

Inserting into a Leaf Node

[Figure: leaf node with keys 54 57 60 62; key to insert: 58]

Page 22: CPS216: Advanced Database Systems

Inserting into a Leaf Node

[Figure: leaf node with keys 54 57 60 62; key 58 shown being placed]

Page 23: CPS216: Advanced Database Systems

Splitting a Leaf Node

[Figure: node labels 61; 54 57 60 62; 58; 54 66]

Page 24: CPS216: Advanced Database Systems

Splitting a Leaf Node

[Figure: node labels 61; 54 57 60 62; 58; 54 66]

Page 25: CPS216: Advanced Database Systems

Splitting a Leaf Node

[Figure: node labels 61; 54 57 61 62; 58; 54 66; 60]

Page 26: CPS216: Advanced Database Systems

Splitting a Leaf Node

[Figure: node labels 61; 54 57 61 62; 58; 54 66; 60; 59]

Page 27: CPS216: Advanced Database Systems

Splitting a Leaf Node

[Figure: node labels 61; 54 57 61 62; 58; 54 66; 60; 59]
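
The two leaf-level primitives (insert into a leaf, split a leaf on overflow) can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference code: the name `insert_into_leaf`, the `capacity` parameter, and the choice to leave the larger half in the left leaf are all hypothetical.

```python
import bisect


def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf and split the leaf if it overflows.

    Returns (left, right, separator). When no split is needed, right and
    separator are both None and left is the updated leaf.
    """
    keys = list(leaf)            # work on a copy, leave the input untouched
    bisect.insort(keys, key)     # keep the leaf sorted
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2   # assumption: left leaf keeps the larger half
    left, right = keys[:mid], keys[mid:]
    # The smallest key of the new right leaf is copied up to the parent.
    return left, right, right[0]


# Leaf from the slides, assuming it can hold at most 4 keys.
print(insert_into_leaf([54, 57, 60, 62], 58, capacity=4))
```

If the returned separator is not None, the caller still has to insert it into the parent, which may in turn overflow and split.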

Page 28: CPS216: Advanced Database Systems

Splitting an Internal Node

[Figure: node labels 59; 54 66; 40; 74 84; 99; 21; …; key ranges [54, 59), [59, 66), [66, 74)]

Page 29: CPS216: Advanced Database Systems

Splitting an Internal Node

[Figure: node labels 59; 54 66; 40; 74 84; 99; 21; …; key ranges [54, 59), [59, 66), [66, 74)]

Page 30: CPS216: Advanced Database Systems

Splitting an Internal Node

[Figure: node labels 54 59; 66; 40; 74 84; 99; 21; …; key ranges [21, 66), [66, 99), [54, 59), [59, 66), [66, 74)]

Page 31: CPS216: Advanced Database Systems

Splitting the Root

[Figure: node labels 54 66; 40; 74 84; 59; key ranges [54, 59), [59, 66), [66, 74)]

Page 32: CPS216: Advanced Database Systems

Splitting the Root

[Figure: node labels 54 66; 40; 74 84; 59; key ranges [54, 59), [59, 66), [66, 74)]

Page 33: CPS216: Advanced Database Systems

Splitting the Root

[Figure: node labels 54; 66; 40; 74 84; 59; key ranges [54, 59), [59, 66), [66, 74)]
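
A rough sketch of the internal-node and root splits follows. Unlike a leaf split, the middle key moves up to the parent rather than being copied. The function names, the tuple and dict representations, and the letter placeholders for child pointers are assumptions made for illustration.

```python
def split_internal(keys, children):
    """Split an overfull internal node.

    keys: sorted separator keys; children: len(keys) + 1 child references.
    Returns (left, up_key, right), where left and right are (keys, children)
    pairs and up_key moves to the parent instead of staying in either half.
    """
    if len(children) != len(keys) + 1:
        raise ValueError("an internal node needs one more child than keys")
    mid = len(keys) // 2
    up_key = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, up_key, right


def split_root(keys, children):
    """Split the root: the pushed-up key becomes the only key of a new root."""
    left, up_key, right = split_internal(keys, children)
    return {"keys": [up_key], "children": [left, right]}


# Keys 54 66 74 84 with 59 added, as on the slides; child names are made up.
left, up_key, right = split_internal([54, 59, 66, 74, 84], list("abcdef"))
print(left, up_key, right)
```

Splitting the root is the only operation that increases the height of the tree, since it adds a new level above the two halves.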

Page 34: CPS216: Advanced Database Systems

Deletion

Page 35: CPS216: Advanced Database Systems

Deletion

redistribute

Page 36: CPS216: Advanced Database Systems

Deletion

Page 37: CPS216: Advanced Database Systems

Deletion - II

Page 38: CPS216: Advanced Database Systems

Deletion - II

merge

Page 39: CPS216: Advanced Database Systems

Deletion - II

Page 40: CPS216: Advanced Database Systems

Deletion - II

Page 41: CPS216: Advanced Database Systems

Deletion - II

Page 42: CPS216: Advanced Database Systems

Deletion - II

merge

Not needed

Page 43: CPS216: Advanced Database Systems

Deletion - II

Page 44: CPS216: Advanced Database Systems

Deletion: Primitives

- Delete key from a leaf
- Redistribute keys between sibling leaves
- Merge a leaf into its sibling
- Redistribute keys between two sibling internal nodes
- Merge an internal node into its sibling

Page 45: CPS216: Advanced Database Systems

Merge Leaf into Sibling

[Figure: leaf keys 54 58 64 68 72 75; parent labels 67, 85, …; key 72 marked]

Page 46: CPS216: Advanced Database Systems

Merge Leaf into Sibling

[Figure: leaf keys 54 58 64 68 75; parent labels 67, …, 72, 85]

Page 47: CPS216: Advanced Database Systems

Merge Leaf into Sibling

[Figure: leaf keys 54 58 64 68 75; parent labels 67, …, 72, 85]

Page 48: CPS216: Advanced Database Systems

Merge Leaf into Sibling

[Figure: leaf keys 54 58 64 68 75; parent labels …, 72, 85]
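
Merging a leaf into its sibling can be sketched as below. This is an illustrative assumption about the data layout (plain Python lists for leaves and parent separators), not code from the course, and the third leaf in the usage example is made up so that the parent keeps a key after the merge.

```python
def merge_leaf_into_left_sibling(parent_keys, leaves, idx):
    """Merge leaves[idx] into leaves[idx - 1].

    parent_keys[j] separates leaves[j] and leaves[j + 1], so the separator
    parent_keys[idx - 1] is dropped together with the emptied leaf.
    Returns new (parent_keys, leaves) lists; the inputs are not modified.
    """
    if not 1 <= idx < len(leaves):
        raise IndexError("leaf needs a left sibling under the same parent")
    merged = leaves[idx - 1] + leaves[idx]
    new_leaves = leaves[:idx - 1] + [merged] + leaves[idx + 1:]
    new_keys = parent_keys[:idx - 1] + parent_keys[idx:]
    return new_keys, new_leaves


# Leaf [68, 75] is what remains after deleting 72; [85, 90] is hypothetical.
keys, leaves = merge_leaf_into_left_sibling(
    [67, 85], [[54, 58, 64], [68, 75], [85, 90]], idx=1
)
print(keys, leaves)
```

Because the parent loses a key, it can itself become underfull, which is where the internal-node redistribute and merge primitives come in.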

Page 49: CPS216: Advanced Database Systems

Merge Internal Node into Sibling

[Figure: node labels 41 48 52 63 74; 59; …; key ranges [52, 59), [59, 63)]

Page 50: CPS216: Advanced Database Systems

Merge Internal Node into Sibling

[Figure: node labels 41 48 52 63; 59; 59; …; key ranges [52, 59), [59, 63)]

Page 51: CPS216: Advanced Database Systems

B-Tree Roadmap

- B-Tree
  - Recap
  - Insertion (recap)
  - Deletion
  - Construction
  - Efficiency
- B-Tree variants
- Hash-based Indexes

Page 52: CPS216: Advanced Database Systems

Question

How does insertion-based construction perform?

Page 53: CPS216: Advanced Database Systems

B-Tree Construction

[Figure: step "Sort"; keys 11 13 15 21 34 41 48 57 62 75 81 97]

Page 54: CPS216: Advanced Database Systems

B-Tree Construction

[Figure: step "Scan"; keys shown as 75 97 21 41 57 15 11 13 48 34 62 81 and, in sorted order, 11 13 15 21 34 41 48 57 62 75 81 97]

Page 55: CPS216: Advanced Database Systems

B-Tree Construction

[Figure: step "Scan"; upper-level keys 21 48 75 above sorted keys 11 13 15 21 34 41 48 57 62 75 81 97]

Page 56: CPS216: Advanced Database Systems

B-Tree Construction

Why is sort-based construction better than insertion-based one?
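
Sort-based construction (sort the keys, then scan them once to pack leaves and build the upper levels) can be sketched as follows. The function name `bulk_load`, the `leaf_size` and `fanout` parameters, and the nested-list output are assumptions for illustration; minimum-occupancy rules for the last node on each level are ignored.

```python
def bulk_load(keys, leaf_size, fanout):
    """Build B-tree levels bottom-up from a set of keys.

    Returns (leaves, levels). levels[0] is the root level, and each level is
    a list of nodes, each node being its list of separator keys.
    """
    ordered = sorted(keys)                        # step 1: sort
    leaves = [ordered[i:i + leaf_size]            # step 2: scan and pack leaves
              for i in range(0, len(ordered), leaf_size)]
    mins = [leaf[0] for leaf in leaves]           # smallest key under each node
    levels = []
    while len(mins) > 1:                          # step 3: build upper levels
        groups = [mins[i:i + fanout] for i in range(0, len(mins), fanout)]
        # A node with k children stores k - 1 separators: the smallest key
        # under every child except the first.
        levels.append([group[1:] for group in groups])
        mins = [group[0] for group in groups]
    levels.reverse()
    return leaves, levels


slide_keys = [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81]
leaves, levels = bulk_load(slide_keys, leaf_size=3, fanout=4)
print(leaves)
print(levels)
```

Each leaf is written once and in order, whereas inserting the keys one at a time traverses the tree for every key and can touch leaves in random order.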

Page 57: CPS216: Advanced Database Systems

Cost of B-Tree Operations

- Height of B-Tree: H
- Assume no duplicates
- Question: what is the random I/O cost of:
  - Insertion:
  - Deletion:
  - Equality search:
  - Range Search:

Page 58: CPS216: Advanced Database Systems

Height of B-Tree

- Number of keys: N
- B-Tree parameter: n

Height ≈ log_n N = (log N) / (log n)

In practice: 2-3 levels
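
As a quick numeric check of the height estimate, the sketch below counts how many levels of fan-out n are needed to reach N keys. The function name and the sample values of N and n are illustrative assumptions; integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
def approx_height(num_keys, n):
    """Smallest h such that n ** h >= num_keys, i.e. roughly log_n(num_keys)."""
    height, reach = 0, 1
    while reach < num_keys:
        reach *= n       # one more level multiplies the reachable keys by n
        height += 1
    return height


print(approx_height(10**6, 100))    # a million keys, fan-out 100
print(approx_height(10**6, 1000))   # a million keys, fan-out 1000
```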

Page 59: CPS216: Advanced Database Systems

Question: How do you pick parameter n?

1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates

Page 60: CPS216: Advanced Database Systems

Roadmap

- B-Tree
- B-Tree variants
  - Sparse Index
  - Duplicate Keys
- Hash-based Indexes

Page 61: CPS216: Advanced Database Systems

Roadmap

- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table

Page 62: CPS216: Advanced Database Systems

Hash-Based Indexes

- Adaptations of main memory hash tables
- Support equality searches
- No range searches

Page 63: CPS216: Advanced Database Systems

Indexing Problem (recap)

[Figure: index keys a_1, a_2, …, a_i, …, a_n with record pointers; lookup condition A = val]

Page 64: CPS216: Advanced Database Systems

Main Memory Hash Table

[Figure: key -> h(key) selects one of the buckets 0 to 7; stored keys 32, 10, 48, 27, 75, 21, 55; chain ends marked (null)]

h(key) = key % 8
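
The main-memory table on this slide can be sketched as below, using the slide's hash function h(key) = key % 8 and the keys that appear in the figure. Representing each bucket's chain as a Python list, and the function names, are assumptions for illustration.

```python
NUM_BUCKETS = 8


def h(key):
    """Hash function from the slide."""
    return key % NUM_BUCKETS


def build_table(keys):
    """Place each key in the chain of bucket h(key)."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key in keys:
        buckets[h(key)].append(key)
    return buckets


def lookup(buckets, key):
    """Equality search: only the chain of bucket h(key) is examined."""
    return key in buckets[h(key)]


table = build_table([32, 10, 48, 27, 75, 21, 55])
for bucket_id, chain in enumerate(table):
    print(bucket_id, chain)
```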

Page 65: CPS216: Advanced Database Systems

Adapting to Disk

- 1 Hash Bucket = 1 Block
  - All keys that hash to bucket stored in the block
  - Intuition: keys in a bucket usually accessed together
  - No need for linked lists of keys …

Page 66: CPS216: Advanced Database Systems

Adapting to Disk

How do we handle this?

Page 67: CPS216: Advanced Database Systems

Adapting to Disk

- 1 Hash Bucket = 1 Block
  - All keys that hash to bucket stored in the block
  - Intuition: keys in a bucket usually accessed together
  - No need for linked lists of keys …
  - … but need linked list of blocks (overflow blocks)
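
A minimal sketch of a bucket with overflow blocks follows. `BLOCK_CAPACITY`, the `Bucket` class, and the convention of counting one I/O per block read are assumptions for illustration; real blocks would live on disk rather than in Python lists.

```python
BLOCK_CAPACITY = 2   # keys per block; a tiny, hypothetical value


class Bucket:
    """One hash bucket: a primary block followed by a chain of overflow blocks."""

    def __init__(self):
        self.blocks = [[]]

    def insert(self, key):
        for block in self.blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        self.blocks.append([key])      # every block is full: add an overflow block

    def search(self, key):
        """Return (found, blocks_read), treating each block read as one I/O."""
        for reads, block in enumerate(self.blocks, start=1):
            if key in block:
                return True, reads
        return False, len(self.blocks)


bucket = Bucket()
for key in (8, 16, 24):
    bucket.insert(key)
print(bucket.blocks, bucket.search(24))
```

A search may have to follow the whole chain, so the number of I/Os for a lookup grows with the number of overflow blocks in the bucket.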

Page 68: CPS216: Advanced Database Systems

Adapting to Disk

Page 69: CPS216: Advanced Database Systems

Adapting to Disk

[Figure: buckets 0, 1, 2]

Is there any other issue?

Map 'bucket id' to disk location

Page 70: CPS216: Advanced Database Systems

Adapting to Disk

- 1 Hash Bucket = 1 Block
- Bucket Id -> Disk Address mapping
  - Contiguous blocks
  - Store mapping in main memory
    - Too large?
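
With contiguous blocks, the bucket-id-to-address mapping reduces to arithmetic, so no table has to be kept in memory. The sketch below assumes a hypothetical 4096-byte block size and byte offsets into a single file; neither comes from the slides.

```python
BLOCK_SIZE = 4096   # bytes per block; an assumed value


def bucket_address(base_offset, bucket_id, blocks_per_bucket=1):
    """Byte offset of a bucket's first block when buckets are stored contiguously."""
    return base_offset + bucket_id * blocks_per_bucket * BLOCK_SIZE


print(bucket_address(0, 3))
print(bucket_address(8192, 2, blocks_per_bucket=2))
```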

Page 71: CPS216: Advanced Database Systems

Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Tree!!

Page 72: CPS216: Advanced Database Systems

Adapting to Disk

- 1 Hash Bucket = 1 Block (or several contiguous blocks)
- Bucket Id -> Disk Address mapping
- Number of buckets
  - ≈ Number of keys (main memory version)
  - ≈ Number of blocks (disk version)
- Textbook: Static Hash Table

Page 73: CPS216: Advanced Database Systems

Assigned Reading

Insertion and Deletion on Static Hash Table
Section 13.4

Page 74: CPS216: Advanced Database Systems

Roadmap

- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table

Page 75: CPS216: Advanced Database Systems

Dynamic Hash Indexes

- Static Hash Table:
  - Fixed number of buckets
  - Waste space / inefficient
- Dynamic Hash Tables:
  - Number of buckets can increase / decrease dynamically

Page 76: CPS216: Advanced Database Systems

Extensible Hash Table: Main Ideas (Abstract)

- Hash Function: {Keys} -> {Large space of hash values}
- Buckets dynamically partition space of hash values
- Insertions: partitioning grows finer, i.e., more buckets
- Deletions: partitioning grows coarser, i.e., fewer buckets

Page 77: CPS216: Advanced Database Systems

Extensible Hash Table: Main Ideas (concrete)

- Hash Function: {Keys} -> bit string of length b
  - Example: 0 1 1 1 0 1 0 0
- Bucket: prefix of bit string
  - All (keys with) hash values having that prefix fall into that bucket

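Page 78: CPS216: Advanced Database Systems

[Figure: bucket prefixes 0, 10, 11; hash values 01011010, 01100110, 10110001, 10011010, 11011110; which bucket does each hash value fall into?]

Page 79: CPS216: Advanced Database Systems

[Figure: bucket prefixes 0, 10, 11; hash values 01011010, 01100110, 10110001, 10011010, 11011110; directory entries 00, 01, 10, 11]

i = 2

i = max length of prefix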
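
The prefix rule can be checked with a few lines of code. The sketch below is an illustration only: hash values are written as bit strings, `bucket_for` is a hypothetical helper, and the prefixes and hash values are the ones shown in the figure.

```python
def bucket_for(hash_bits, prefixes):
    """Return the bucket prefix that hash_bits starts with.

    Assumes the prefixes partition the hash space, so exactly one matches.
    """
    matches = [p for p in prefixes if hash_bits.startswith(p)]
    if len(matches) != 1:
        raise ValueError("prefixes must partition the space of hash values")
    return matches[0]


prefixes = ["0", "10", "11"]
for value in ["01011010", "01100110", "10110001", "10011010", "11011110"]:
    print(value, "->", bucket_for(value, prefixes))
```

With i = 2, a directory lookup uses the first two bits of the hash value, so directory entries 00 and 01 both lead to the bucket with prefix 0.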

Page 80: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; single directory entry "."; no hash values stored]

Page 81: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; directory entry "."; incoming hash value 10110001]

Page 82: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; directory entry "."; hash value 10110001 placed in the bucket]

Page 83: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; directory entry "."; hash values 10110001, 00110101]

Page 84: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; directory entry "."; hash values 10110001, 00110101; incoming 11010010]

Page 85: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; labels 0, 1; hash values 10110001, 00110101, 11010010]

Page 86: CPS216: Advanced Database Systems

Insertion

[Figure: i = 0; labels 0, 1; hash values 10110001, 00110101, 11010010]

Page 87: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 1, 0, 1; hash values 10110001, 00110101, 11010010]

Page 88: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 1, 0, 1; hash values 10110001, 00110101, 11010010; 11010010 shown being placed]

Page 89: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 1, 0, 1; hash values 10110001, 00110101, 11010010; incoming 11001101]

Page 90: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 1, 0, 1; hash values 10110001, 00110101, 11010010; incoming 11001101]

Page 91: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 10, 11, 0, 1; hash values 10110001, 00110101, 11010010, 11001101]

Page 92: CPS216: Advanced Database Systems

Insertion

[Figure: i = 1; labels 0, 10, 11, 0, 1; hash values 10110001, 00110101, 11010010, 11001101]

Page 93: CPS216: Advanced Database Systems

Insertion

[Figure: i = 2; labels 0, 10, 11; directory entries 00, 01, 10, 11; hash values 10110001, 00110101, 11010010, 11001101]

Page 94: CPS216: Advanced Database Systems

Insertion

[Figure: i = 2; labels 0, 10, 11; directory entries 00, 01, 10, 11; hash values 10110001, 00110101, 11010010, 11001101; 11001101 shown being placed]
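
The insertion walkthrough can be reproduced with a small in-memory sketch. It is an illustration under stated assumptions, not the textbook's code: hash values are 8-bit integers (`B = 8`), each bucket holds at most two values (`CAPACITY = 2`), and the class and attribute names are hypothetical.

```python
B = 8          # length of the hash bit string (the slides use 8-bit values)
CAPACITY = 2   # values per bucket; assumed


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of prefix bits this bucket is responsible for
        self.values = []


class ExtensibleHash:
    def __init__(self):
        self.i = 0                       # prefix bits used by the directory
        self.directory = [Bucket(0)]     # 2 ** i entries

    def _slot(self, value):
        return value >> (B - self.i)     # first i bits of the hash value

    def insert(self, value):
        while True:
            bucket = self.directory[self._slot(value)]
            if len(bucket.values) < CAPACITY:
                bucket.values.append(value)
                return
            if bucket.depth == B:
                raise ValueError("bucket cannot be split any further")
            if bucket.depth == self.i:
                # Double the directory: every old entry now appears twice.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        shift = B - bucket.depth         # position of the newly examined bit
        old = bucket.values
        bucket.values = [v for v in old if not (v >> shift) & 1]
        sibling.values = [v for v in old if (v >> shift) & 1]
        for slot, entry in enumerate(self.directory):
            # Entries whose newly examined bit is 1 move to the sibling.
            if entry is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling


table = ExtensibleHash()
for value in (0b10110001, 0b00110101, 0b11010010, 0b11001101):
    table.insert(value)
print(table.i, [[format(v, "08b") for v in b.values] for b in table.directory])
```

Under these assumptions the four insertions end with i = 2, directory entries 00 and 01 sharing the bucket with prefix 0, and separate buckets for prefixes 10 and 11.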

Page 95: CPS216: Advanced Database Systems

Deletion

Inverse of insertion: work out details

Page 96: CPS216: Advanced Database Systems

Textbook Notation

[Figure: i = 2; directory entries 00, 01, 10, 11; bucket labels 1 and 0, annotated "Number of bits in prefix"]

Page 97: CPS216: Advanced Database Systems

Extensible Hash Table

One Issue: Directory doubles in size during some inserts

Page 98: CPS216: Advanced Database Systems

Roadmap

- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table

Page 99: CPS216: Advanced Database Systems

Linear Hash Table

- Differences from Extensible Hash Table:
  - Bucket: suffix of the hash value
  - Grows linearly (avoids doubling of directory)

Page 100: CPS216: Advanced Database Systems

Linear Hash Table

[Figure: bucket suffixes 10, 00, 1; hash values 01011000, 01100100, 10110001, 10011001, 11011110]

Page 101: CPS216: Advanced Database Systems

Linear Growth

[Figure: buckets 0, 1]

Page 102: CPS216: Advanced Database Systems

Linear Growth

[Figure: buckets 00, 1, 10; step "redistribute"]

Page 103: CPS216: Advanced Database Systems

Linear Growth

[Figure: buckets 00, 01, 10, 11; step "redistribute"]

Page 104: CPS216: Advanced Database Systems

What does linear growth buy?

[Figure: buckets 000, 01, 10, 11, 100; i = 3; directory entries 000, 001, 010, 011, 100, 101, 110, 111]

Redundant if we know # buckets = 5

Page 105: CPS216: Advanced Database Systems

What does linear growth buy?

[Figure: buckets 000, 01, 10, 11, 100; directory entries 000, 001, 010, 011, 100]

i = 3

n = 3
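
One common way to locate a bucket in a linear hash table without a full directory is sketched below. It follows the usual textbook rule (take the last i bits of the hash value and clear the leading bit if the result names a bucket that does not exist yet); the function name and the `num_buckets` parameter are assumptions, and the slides may intend a different convention.

```python
def linear_hash_bucket(hash_value, num_buckets):
    """Bucket number for hash_value when buckets 0 .. num_buckets - 1 exist."""
    # i = number of bits needed to write the largest bucket number.
    i = max(1, (num_buckets - 1).bit_length())
    m = hash_value & ((1 << i) - 1)     # suffix: the last i bits
    if m >= num_buckets:
        m -= 1 << (i - 1)               # bucket not created yet: clear leading bit
    return m


# Five buckets, as in the "# buckets = 5" figure, so i = 3.
for value in (0b01011000, 0b01100100, 0b10110101, 0b11011110):
    print(format(value, "08b"), "->", format(linear_hash_bucket(value, 5), "03b"))
```

Knowing only the number of buckets is enough to compute the address, which is why directory entries 101, 110, and 111 are redundant in the five-bucket example.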