CPS216: Advanced Database Systems
Notes 05: Operators for Data Access (contd.)
Shivnath Babu

Posted on 18-Jan-2016


Insertion in a B-Tree

[Figure slides, n = 2: the keys 62, 50 and 75 are inserted in turn into a tree whose root holds 49 and whose leaves hold 15, 36 and 49.]

Insertion: Primitives

- Inserting into a leaf node
- Splitting a leaf node
- Splitting an internal node
- Splitting the root node

Inserting into a Leaf Node

[Figure: the key 58 is inserted into the leaf holding 54, 57, 60, 62.]

Splitting a Leaf Node

[Figure slides: the key 61 is inserted into the full leaf holding 54, 57, 58, 60, 62, whose parent has keys 54 and 66. The leaf splits, and a new separator key (59 in the figure) is added to the parent.]
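The two leaf primitives can be sketched in a few lines of Python. This is a minimal illustration, not code from the notes: insert_into_leaf is a hypothetical helper, a leaf is modeled as a sorted Python list holding at most n keys, and the capacity n = 5 in the example is an assumption chosen so the figure's keys fit. On overflow the sketch promotes the smallest key of the new right leaf; any value greater than every key on the left and no greater than the smallest key on the right would separate the halves equally well.

```python
import bisect


def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator). When the leaf still fits,
    right and separator are None. On overflow the n + 1 keys are split
    in half and the smallest key of the new right leaf is returned as
    the separator to insert into the parent.
    """
    keys = list(leaf)
    bisect.insort(keys, key)          # keep the leaf sorted
    if len(keys) <= n:                # room left: no split
        return keys, None, None
    mid = (len(keys) + 1) // 2        # left leaf keeps the extra key
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


# Hypothetical capacity n = 5, keys taken from the figures.
print(insert_into_leaf([54, 57, 60, 62], 58, 5))
# ([54, 57, 58, 60, 62], None, None)
print(insert_into_leaf([54, 57, 58, 60, 62], 61, 5))
# ([54, 57, 58], [60, 61, 62], 60)
```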

Splitting an Internal Node

[Figure slides: adding the separator 59 (for the leaves covering [54, 59) and [59, 66)) overflows the internal node 40 54 66 74 84, whose parent has keys 21 and 99. The node splits into 40 54 59 and 74 84, and 66 moves up to the parent, whose children now cover [21, 66) and [66, 99).]

Splitting the Root

[Figure slides: the same split when the overflowing node is the root: 66 becomes the only key of a new root, whose children are 40 54 59 and 74 84.]
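An internal-node split differs from a leaf split in one respect: the middle key moves up to the parent and is kept in neither half. The sketch below assumes a node holds at most n keys and n + 1 child pointers; insert_into_internal and new_root are hypothetical names, children are placeholder strings, and n = 5 is an assumed capacity that makes the figure's node overflow. A root split is the same operation followed by creating a new root whose only key is the promoted one.

```python
import bisect


def insert_into_internal(keys, children, sep, new_child, n):
    """Add separator sep, with new_child as its right-hand child, to an
    internal node holding at most n keys and n + 1 children.

    Returns (left, right, up); left and right are (keys, children)
    pairs. Without overflow, right and up are None. On overflow the
    middle key moves up to the parent and is kept in neither half.
    """
    keys, children = list(keys), list(children)
    pos = bisect.bisect_right(keys, sep)
    keys.insert(pos, sep)
    children.insert(pos + 1, new_child)   # new child sits right of sep
    if len(keys) <= n:
        return (keys, children), None, None
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, keys[mid]


def new_root(left, right, up):
    """Splitting the root: the promoted key becomes the only key of a
    new root, so the tree grows by one level."""
    return ([up], [left, right])


kids = ["c0", "c1", "c2", "c3", "c4", "c5"]
left, right, up = insert_into_internal([40, 54, 66, 74, 84], kids, 59, "new", 5)
print(left)   # ([40, 54, 59], ['c0', 'c1', 'c2', 'new'])
print(right)  # ([74, 84], ['c3', 'c4', 'c5'])
print(up)     # 66
```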

Deletion

[Figure slides: a deletion handled by redistributing keys between sibling nodes.]

Deletion - II

[Figure slides: a deletion handled by merging nodes; one further merge step is marked "Not needed".]

Deletion: Primitives

- Delete key from a leaf
- Redistribute keys between sibling leaves
- Merge a leaf into its sibling
- Redistribute keys between two sibling internal nodes
- Merge an internal node into its sibling
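The first three deletion primitives can be combined into one routine for leaves. This is a hedged sketch: the helper name delete_from_leaf, the use of the right sibling, and the minimum occupancy of (n + 1) // 2 keys per leaf are assumptions for illustration, which the notes do not specify. The returned action tells the caller how to fix the parent: after a redistribution the separator becomes the sibling's new smallest key; after a merge the separator and the pointer to the sibling are removed, which may in turn leave the parent underfull.

```python
def delete_from_leaf(leaf, right_sibling, key, n):
    """Delete key from a sorted leaf, then repair underflow using the
    leaf's right sibling.

    Assumes a minimum occupancy of (n + 1) // 2 keys per leaf.
    Returns (action, leaf, sibling) where action is one of:
      "ok":           the leaf still has enough keys
      "redistribute": the sibling's smallest key was moved into the leaf;
                      the parent's separator must become sibling[0]
      "merge":        the sibling's keys were moved into the leaf; the parent
                      must drop the separator and its pointer to the sibling
    """
    leaf = [k for k in leaf if k != key]
    sibling = list(right_sibling)
    min_keys = (n + 1) // 2
    if len(leaf) >= min_keys:
        return "ok", leaf, sibling
    if len(sibling) > min_keys:              # sibling can spare one key
        leaf.append(sibling.pop(0))          # still sorted: sibling keys are larger
        return "redistribute", leaf, sibling
    return "merge", leaf + sibling, []       # both small: combined keys fit in one leaf


# Hypothetical n = 3, so every leaf must keep at least 2 keys.
print(delete_from_leaf([54, 58, 64], [68, 72], 64, 3))  # ('ok', [54, 58], [68, 72])
print(delete_from_leaf([54, 58], [64, 68, 72], 58, 3))  # ('redistribute', [54, 64], [68, 72])
print(delete_from_leaf([64, 68], [72, 75], 68, 3))      # ('merge', [64, 72, 75], [])
```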

Merge Leaf into Sibling

[Figure slides: leaves holding 54, 58, 64, 68, 72, 75 under parent keys 67, 72, 85. After 72 is deleted, an underfull leaf is merged into its sibling and the separator 67 is removed from the parent.]

Merge Internal Node into Sibling

[Figure slides: sibling internal nodes 41 48 52 and 63 74, separated by the parent key 59 (child ranges [52, 59) and [59, 63)). After the right node loses 74, the two nodes are merged and the parent key 59 moves down into the merged node.]

B-Tree Roadmap

- B-Tree
  - Recap
  - Insertion (recap)
  - Deletion
  - Construction
  - Efficiency
- B-Tree variants
- Hash-based Indexes

Question

How does insertion-based construction perform?

B-Tree Construction

1. Sort the keys: 11 13 15 21 34 41 48 57 62 75 81 97
2. Scan the sorted keys, filling the leaves in order: [11 13 15] [21 34 41] [48 57 62] [75 81 97]
3. Scan the leaves to build the level above: a single internal node with keys 21 48 75
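The sort-then-scan construction can be sketched as follows. Assumptions: leaves are packed with three keys each as in the figure (n = 3), each internal node gets up to n + 1 children and uses the smallest key under each child after the first as its separators, and bulk_load is a hypothetical name. The sketch does not rebalance a sparsely filled last node, which a real loader would have to handle.

```python
def bulk_load(keys, n):
    """Sort-based B-Tree construction.

    Returns the levels bottom-up: levels[0] is the list of leaves and
    levels[-1] holds the root. Each node is a list of keys.
    """
    keys = sorted(keys)                                   # 1. sort
    # 2. scan the sorted keys, packing n keys per leaf;
    #    remember the smallest key under each node
    level = [(keys[i:i + n], keys[i]) for i in range(0, len(keys), n)]
    levels = [[node for node, _ in level]]
    # 3. scan each level to build the next one, n + 1 children per node
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), n + 1):
            group = level[i:i + n + 1]
            separators = [low for _, low in group[1:]]
            parents.append((separators, group[0][1]))
        level = parents
        levels.append([node for node, _ in level])
    return levels


keys = [75, 97, 21, 41, 57, 15, 11, 13, 48, 34, 62, 81]
for lvl, nodes in enumerate(bulk_load(keys, 3)):
    print(lvl, nodes)
# 0 [[11, 13, 15], [21, 34, 41], [48, 57, 62], [75, 81, 97]]
# 1 [[21, 48, 75]]
```

Every node is written exactly once, in key order, after the sort.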

B-Tree Construction

Why is sort-based construction better than insertion-based construction?

Cost of B-Tree Operations

- Height of B-Tree: H
- Assume no duplicates
- Question: what is the random I/O cost of:
  - Insertion:
  - Deletion:
  - Equality search:
  - Range search:

Height of B-Tree

- Number of keys: N
- B-Tree parameter: n
- Height ≈ $\log_n N = \frac{\log N}{\log n}$
- In practice: 2-3 levels
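The height estimate can be checked with integer arithmetic, which avoids floating-point rounding in log N / log n. The sketch treats every node as having fanout n, the same simplification the formula makes; btree_height is a hypothetical name and the values of N and n below are arbitrary illustrations.

```python
def btree_height(num_keys, n):
    """Smallest h with n ** h >= num_keys, i.e. ceil(log_n N)."""
    h, reach = 0, 1
    while reach < num_keys:
        reach *= n          # one more level multiplies the reach by n
        h += 1
    return h


print(btree_height(10**6, 100))   # 3
print(btree_height(10**7, 340))   # 3
```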

Question: How do you pick parameter n?

1. Ignore inserts and deletes
2. Optimize for equality searches
3. Assume no duplicates
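Under these three assumptions, one common rule of thumb is to choose the largest n for which one node fits in one disk block, so that each level of an equality search costs one block read. The sketch assumes a node stores n keys and n + 1 pointers; the 4096-byte block, 4-byte key and 8-byte pointer sizes are illustrative assumptions, and max_keys_per_node is a hypothetical name.

```python
def max_keys_per_node(block_bytes, key_bytes, pointer_bytes):
    """Largest n with n * key_bytes + (n + 1) * pointer_bytes <= block_bytes."""
    return (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)


n = max_keys_per_node(4096, 4, 8)
print(n)                      # 340
print(n * 4 + (n + 1) * 8)    # 4088 bytes used, fits in 4096
```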

Roadmap

- B-Tree
- B-Tree variants
  - Sparse Index
  - Duplicate Keys
- Hash-based Indexes

Roadmap

- B-Tree
- B-Tree variants
- Hash-based Indexes
  - Static Hash Table
  - Extensible Hash Table
  - Linear Hash Table

Hash-Based Indexes

- Adaptations of main memory hash tables
- Support equality searches
- No range searches

Indexing Problem (recap)

[Figure: for a query A = val, the index maps index keys a1, a2, ..., ai, ..., an to record pointers.]

Main Memory Hash Table

[Figure: an array of 8 buckets, numbered 0-7, with h(key) = key % 8; each bucket points to a linked list of keys or is null. The keys shown are 32, 10, 48, 27, 75, 21 and 55.]
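The main-memory table in the figure can be reproduced with chaining, using the figure's hash function h(key) = key % 8 and its keys. Python lists stand in for the linked lists, and build_table and lookup are illustrative names. An equality search only has to look at one bucket; a range search gets no such help, because neighboring key values are scattered across buckets.

```python
def build_table(keys, num_buckets=8):
    """Hash table with chaining: bucket b holds every key with h(key) == b."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[key % num_buckets].append(key)   # h(key) = key % 8
    return buckets


def lookup(buckets, key):
    """Equality search examines only bucket h(key)."""
    return key in buckets[key % len(buckets)]


table = build_table([32, 10, 48, 27, 75, 21, 55])
print(table)  # [[32, 48], [], [10], [27, 75], [], [21], [], [55]]
print(lookup(table, 75), lookup(table, 74))  # True False
```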

Adapting to disk

- 1 hash bucket = 1 block
- All keys that hash to a bucket are stored in that bucket's block
- Intuition: keys in a bucket are usually accessed together
- No need for linked lists of keys …

Adapting to Disk

How do we handle a bucket whose keys no longer fit in one block?

Adapting to disk (contd.)

- No need for linked lists of keys …
- … but we do need a linked list of blocks (overflow blocks)
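A disk-style version replaces the per-key linked lists with fixed-capacity blocks chained through overflow pointers. In this sketch a Python object stands in for a disk block, integer keys with h(key) = key % num_buckets are assumed, the capacity of 2 keys per block is arbitrary, and blocks_read simply counts blocks visited as a proxy for I/Os. It also shows why a hash lookup is not always one I/O: a bucket with overflow blocks can need several reads.

```python
class Block:
    """A disk block with room for capacity keys and a pointer to the
    next overflow block of the same bucket (None if there is none)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None


class StaticHashTable:
    """1 hash bucket = 1 primary block plus a chain of overflow blocks.
    Integer keys are assumed; h(key) = key % num_buckets."""

    def __init__(self, num_buckets, block_capacity):
        self.block_capacity = block_capacity
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % len(self.buckets)]
        while len(block.keys) == block.capacity:     # this block is full
            if block.overflow is None:               # chain a new overflow block
                block.overflow = Block(self.block_capacity)
            block = block.overflow
        block.keys.append(key)

    def search(self, key):
        """Return (found, blocks_read)."""
        block = self.buckets[key % len(self.buckets)]
        blocks_read = 0
        while block is not None:
            blocks_read += 1
            if key in block.keys:
                return True, blocks_read
            block = block.overflow
        return False, blocks_read


table = StaticHashTable(num_buckets=4, block_capacity=2)
for key in [0, 4, 8, 12, 1]:
    table.insert(key)
print(table.search(1))    # (True, 1)
print(table.search(12))   # (True, 2): 12 sits in bucket 0's overflow block
print(table.search(16))   # (False, 2)
```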

Adapting to Disk

[Figure: hash buckets 0, 1, 2 stored as disk blocks.]

Is there any other issue? Map ‘bucket id’ to disk location.

Adapting to disk

- 1 hash bucket = 1 block
- Bucket id → disk address mapping
  - Contiguous blocks
  - Store mapping in main memory
    - Too large?
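With contiguous blocks, the bucket id → disk address mapping reduces to arithmetic, so no per-bucket table has to be kept in main memory. The layout assumed here (buckets stored back to back from a starting block, each occupying blocks_per_bucket blocks) and the function and parameter names are illustrative assumptions.

```python
def bucket_address(bucket_id, first_block, block_size, blocks_per_bucket=1):
    """Byte offset of a bucket's first block when all buckets are laid out
    contiguously starting at block number first_block."""
    return (first_block + bucket_id * blocks_per_bucket) * block_size


print(bucket_address(3, first_block=100, block_size=4096))   # 421888
print(bucket_address(3, 100, 4096, blocks_per_bucket=2))     # 434176
```

The alternative, an in-memory array from bucket id to address, allows buckets to live anywhere on disk, but it grows with the number of buckets.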

Beware of claims that assume 1 I/O for hash tables and 3 I/Os for B-Trees!

Adapting to disk

- 1 hash bucket = 1 block (or several contiguous blocks)
- Bucket id → disk address mapping
- Number of buckets
  - ≈ number of keys (main memory version)
  - ≈ number of blocks (disk version)

Textbook: Static Hash Table

Assigned Reading

Insertion and deletion on a static hash table: Section 13.4


Dynamic Hash Indexes

- Static hash table: fixed number of buckets
  - Wastes space / inefficient
- Dynamic hash tables: number of buckets can increase / decrease dynamically

Extensible Hash Table: Main Ideas (Abstract)

- Hash function: {Keys} → {large space of hash values}
- Buckets dynamically partition the space of hash values
- Insertions: partitioning grows finer, i.e., more buckets
- Deletions: partitioning grows coarser, i.e., fewer buckets

Extensible Hash Table: Main Ideas (concrete)

- Hash function: {Keys} → bit string of length b
  - Example: 01110100
- Bucket: prefix of the bit string
  - All (keys with) hash values having that prefix fall into that bucket

[Figure: buckets with prefixes 0, 10 and 11. Which bucket does each hash value (01011010, 01100110, 10110001, 10011010, 11011110) fall into?]
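The figure's question can be answered mechanically: a hash value belongs to the one bucket whose prefix it starts with. The sketch represents hash values as bit strings, as the slides do; find_bucket is a hypothetical helper, and it raises an error if the prefixes do not partition the hash space.

```python
def find_bucket(hash_bits, prefixes):
    """Return the unique prefix in prefixes that hash_bits starts with."""
    matches = [p for p in prefixes if hash_bits.startswith(p)]
    if len(matches) != 1:
        raise ValueError("prefixes do not partition the hash space")
    return matches[0]


prefixes = ["0", "10", "11"]
for h in ["01011010", "01100110", "10110001", "10011010", "11011110"]:
    print(h, "->", find_bucket(h, prefixes))
# 01011010 -> 0
# 01100110 -> 0
# 10110001 -> 10
# 10011010 -> 10
# 11011110 -> 11
```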

[Figure: the same hash values reached through a directory with entries 00, 01, 10, 11 and i = 2, where i = max length of prefix.]

Insertion

[Figure slides: starting from i = 0 with a single bucket, the hash values 10110001, 00110101, 11010010 and 11001101 are inserted. Overflowing buckets split on a longer prefix (0 and 1, then 10 and 11), and the directory grows from i = 0 to i = 1 and then to i = 2 (entries 00, 01, 10, 11).]
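The insertion steps can be sketched as below. Assumptions: hash values are fixed-length bit strings as on the slides, the directory is indexed by the first i bits, each bucket records how many prefix bits it is defined by (its depth), and the capacity of 2 hash values per bucket is hypothetical. A full bucket whose depth is below i splits without changing the directory size; a full bucket whose depth equals i first forces the directory to double. The sketch does not handle more than capacity identical hash values.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of prefix bits this bucket is defined by
        self.items = []      # hash values (bit strings) stored here


class ExtensibleHashTable:
    def __init__(self, capacity):
        self.capacity = capacity          # hash values per bucket (block)
        self.i = 0                        # directory uses the first i bits
        self.directory = [Bucket(0)]      # 2 ** i entries

    def _index(self, bits):
        return int(bits[:self.i], 2) if self.i else 0

    def insert(self, bits):
        while True:
            bucket = self.directory[self._index(bits)]
            if len(bucket.items) < self.capacity:
                bucket.items.append(bits)
                return
            if bucket.depth == self.i:
                # Double the directory: new entry k (i + 1 bits) starts out
                # pointing to the bucket of old entry k >> 1 (its first i bits).
                self.directory = [self.directory[k >> 1]
                                  for k in range(2 * len(self.directory))]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        j = bucket.depth
        zero, one = Bucket(j + 1), Bucket(j + 1)
        for bits in bucket.items:          # redistribute on bit j of the hash
            (one if bits[j] == "1" else zero).items.append(bits)
        for k, entry in enumerate(self.directory):
            if entry is bucket:            # bit j of the directory index decides
                bit = (k >> (self.i - j - 1)) & 1
                self.directory[k] = one if bit else zero


table = ExtensibleHashTable(capacity=2)
for h in ["10110001", "00110101", "11010010", "11001101"]:
    table.insert(h)
for k, b in enumerate(table.directory):
    print(f"{k:0{table.i}b}", b.depth, b.items)
# 00 1 ['00110101']
# 01 1 ['00110101']
# 10 2 ['10110001']
# 11 2 ['11010010', '11001101']
```

The printed depth plays the role of the per-bucket "number of bits in prefix" label.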

Deletion

Inverse of insertion: work out the details.

Textbook Notation

[Figure: a directory with i = 2 (entries 00, 01, 10, 11); each bucket is labeled with the number of bits in its prefix.]

Extensible Hash Table

One issue: the directory doubles in size during some inserts.


Linear Hash Table

Differences from the Extensible Hash Table:
- Bucket: suffix of the hash value
- Grows linearly (avoids doubling of the directory)

[Figure: buckets with suffixes 00, 10 and 1 holding the hash values 01011000, 01100100, 10110001, 10011001, 11011110.]

Linear Growth

[Figure slides: starting from buckets 0 and 1, bucket 0 is redistributed into 00 and 10 (buckets 00, 1, 10); next, bucket 1 is redistributed into 01 and 11 (buckets 00, 01, 10, 11).]

What does linear growth buy?

[Figure: five buckets, 000, 01, 10, 11 and 100, with i = 3. A directory with all eight 3-bit entries (000 through 111) would be redundant if we know # buckets = 5.]
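Because buckets are added one at a time, knowing the number of buckets n is enough to locate a key without any directory. The sketch follows the suffix-based scheme of the slides: i is the number of bits needed to write n - 1, a key goes to bucket m (its last i bits read as a number) if m < n, and otherwise to m with its top bit cleared, since that bucket has not been split yet. The class name, the starting state of two buckets, and the explicit grow() call (instead of, for example, a trigger based on average bucket occupancy) are assumptions for illustration.

```python
class LinearHashTable:
    def __init__(self):
        self.buckets = [[], []]   # buckets 0 and 1 (suffixes 0 and 1)
        self.i = 1                # number of suffix bits in use

    def bucket_of(self, bits):
        m = int(bits[-self.i:], 2)         # last i bits as a number
        if m >= len(self.buckets):         # bucket m does not exist yet:
            m -= 1 << (self.i - 1)         # use the bucket with top bit cleared
        return m

    def insert(self, bits):
        self.buckets[self.bucket_of(bits)].append(bits)

    def grow(self):
        """Add one bucket, redistributing exactly one existing bucket."""
        n = len(self.buckets)
        if n == 1 << self.i:               # every i-bit bucket id is in use
            self.i += 1
        self.buckets.append([])
        old = n - (1 << (self.i - 1))      # same suffix, minus the new top bit
        moving, self.buckets[old] = self.buckets[old], []
        for bits in moving:                # each key stays in old or moves to n
            self.buckets[self.bucket_of(bits)].append(bits)


table = LinearHashTable()
for h in ["01011000", "01100100", "10110001", "10011001", "11011110"]:
    table.insert(h)
table.grow()               # buckets 00, 1, 10
print(table.buckets)
# [['01011000', '01100100'], ['10110001', '10011001'], ['11011110']]
table.grow()
table.grow()               # buckets 000, 01, 10, 11, 100 (5 buckets, i = 3)
print(table.i, len(table.buckets), table.bucket_of("00000101"))
# 3 5 1
```

Each grow() touches a single existing bucket, which is the linear growth the slides contrast with directory doubling.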