Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Indexing...

Temple University – CIS Dept. CIS331 – Principles of Database Systems. V. Megalooikonomou. Indexing and Hashing I (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)



Temple University – CIS Dept. CIS331 – Principles of Database Systems

V. Megalooikonomou

Indexing and Hashing I

(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)


General Overview - rel. model

• Relational model - SQL
  • Formal & commercial query languages
• Functional Dependencies
• Normalization
• Physical Design
• Indexing


Indexing - overview

• primary / secondary indices
• index-sequential (ISAM)
• B-trees, B+-trees
• hashing
  • static hashing
  • dynamic hashing


Basic Concepts

• Indexing mechanisms speed up access to desired data (e.g., the author catalog in a library)
• Search key: an attribute or set of attributes used to look up records in a file
• An index file consists of records (called index entries) of the form (search-key, pointer)
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
  • Ordered indices: search keys are stored in sorted order
  • Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”


Indexing

Once the records are stored in a file, how do you search efficiently? (e.g., ssn=123?)

STUDENT
Ssn   Name     Address
123   smith    main str
234   jones    forbes ave
125   tomson   main str


Indexing

Once the records are stored in a file, how do you search efficiently?

• brute force: retrieve all records, report the qualifying ones
• better: use indices (pointers) to locate the records directly
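The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the list `students`, the dictionary `ssn_index`, and the functions `scan` and `lookup` are hypothetical names, and an in-memory dictionary stands in for the index file.

```python
# Records of the STUDENT file: (ssn, name, address).
students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (125, "tomson", "main str"),
]

def scan(records, ssn):
    """Brute force: look at every record and report the qualifying ones."""
    return [r for r in records if r[0] == ssn]

# Index: search key -> position (pointer) of the record in the file.
ssn_index = {r[0]: pos for pos, r in enumerate(students)}

def lookup(records, index, ssn):
    """Follow the pointer stored in the index, touching one record only."""
    pos = index.get(ssn)
    return [] if pos is None else [records[pos]]
```

Both functions return the same answer; the difference is that `scan` reads every record, while `lookup` reads only the one the index points to.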


Indexing – main idea:

Index on Ssn (sorted search keys, each pointing to its record): 123, 125, 234

STUDENT
Ssn   Name     Address
123   smith    main str
234   jones    forbes ave
125   tomson   main str


Measuring ‘goodness’

• retrieval time?
• insertion / deletion?
• space overhead?
• reorganization?
• range queries?


Main concepts

• search keys are sorted in the index file and point to the actual records
• primary vs. secondary indices
• clustering (sparse) vs. non-clustering (dense) indices


Indexing

Primary key index: on primary key (no duplicates)

Index on Ssn: 123, 234, 345, 456, 567

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
678   tomson    main str
456   stevens   forbes ave
345   smith     forbes ave


Indexing

Secondary key index: duplicates may exist

Address-index: forbes ave, main str

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave


Indexing

Secondary key index: typically, with ‘postings lists’

Address-index: forbes ave, main str (each entry points to a postings list of the matching records)

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave
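A secondary index with postings lists could look roughly like the sketch below. It assumes the five STUDENT records shown on the slide held in a Python list; `address_index` and `find_by_address` are illustrative names, not part of the original notes.

```python
from collections import defaultdict

students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (345, "tomson", "main str"),
    (456, "stevens", "forbes ave"),
    (567, "smith", "forbes ave"),
]

# One index entry per distinct address; its postings list holds the
# positions of all records that carry this address.
address_index = defaultdict(list)
for pos, (_ssn, _name, address) in enumerate(students):
    address_index[address].append(pos)

def find_by_address(address):
    """Return every record whose address matches, via the postings list."""
    return [students[pos] for pos in address_index.get(address, [])]
```

Because the key is not unique, each index entry leads to a list of record pointers rather than to a single record.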


Main concepts – cont’d

• Clustering (= sparse) index: records are physically sorted on that key (and not all key values are needed in the index)
• Non-clustering (= dense) index: the opposite

E.g.:


Indexing - Sparse index

Clustering/sparse index on ssn: entries 123 (covers keys >= 123) and 456 (covers keys >= 456)

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave


Sparse Index Files

• Sparse index: contains index records for only some search-key values
  • Applicable when records are sequentially ordered on the search key
• To locate a record with search-key value K we:
  • Find the index record with the largest search-key value <= K
  • Search the file sequentially, starting at the record to which the index record points
• Less space and less maintenance overhead for insertions and deletions
• Generally slower than a dense index for locating records
• Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block
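The lookup procedure for a sparse index with one entry per block can be sketched as follows. This is an illustration under stated assumptions: blocks are modelled as Python lists, the block size of 3 is arbitrary, and `blocks`, `sparse_keys`, and `sparse_lookup` are hypothetical names.

```python
from bisect import bisect_right

# File blocks, sorted on ssn; a block size of 3 is an arbitrary choice.
blocks = [
    [(123, "smith"), (234, "jones"), (345, "tomson")],
    [(456, "stevens"), (567, "smith")],
]

# Sparse index: one entry per block, holding the least key in that block.
sparse_keys = [block[0][0] for block in blocks]   # [123, 456]

def sparse_lookup(k):
    # Index entry with the largest search-key value <= k.
    i = bisect_right(sparse_keys, k) - 1
    if i < 0:
        return None                      # k is smaller than every key
    # Sequential search starting where the index entry points.
    for ssn, name in blocks[i]:
        if ssn == k:
            return (ssn, name)
    return None
```

The index here has two entries for five records; the price is the short sequential scan inside the selected block.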


Indexing – Dense Index

Non-clustering / dense index on Ssn: 123, 234, 345, 456, 567 (one entry per record)

Ssn   Name      Address
345   tomson    main str
234   jones     forbes ave
567   smith     forbes ave
456   stevens   forbes ave
123   smith     main str


Summary

            Dense    Sparse
Primary              usual
Secondary   usual    rare

• All combinations are possible…

• at most one sparse/clustering index

• as many as desired dense indices

• usually: one primary-key index (maybe clustering) and a few secondary-key indices (non-clustering)


ISAM

What if the index is too large to search sequentially?

Use a multilevel index…


ISAM

Second-level index (index on the index): 123, 3,423
First-level index (one entry per block): 123 (>= 123), 456 (>= 456)

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave


ISAM - observations

• if the index is too large, store it on disk and keep an index-on-the-index
• usually two levels of indices, one first-level entry per disk block (why?)


ISAM - Multilevel Index


ISAM - observations

What about insertions/deletions? E.g., insert the record (124; peterson; fifth ave.):

Second-level index: 123, 3,423
First-level index: 123 (>= 123), 456 (>= 456)

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

• the new record does not fit in its block, so it goes to an overflow block (overflows)
• Problems? Overflow chains may become very long - thus:
  • shut-down & reorganize
  • start with ~80% utilization
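The overflow behaviour can be illustrated with a small sketch. It assumes a static index built once over fixed-capacity blocks, with one overflow chain per block; the class `IsamFile` and its methods are hypothetical names chosen for this example, not an actual ISAM implementation.

```python
from bisect import bisect_right

class IsamFile:
    """Static index over fixed-capacity blocks, with overflow chains."""

    def __init__(self, sorted_records, capacity):
        self.capacity = capacity
        # Primary blocks are built once; the index is never updated afterwards.
        self.blocks = [sorted_records[i:i + capacity]
                       for i in range(0, len(sorted_records), capacity)]
        self.index = [b[0][0] for b in self.blocks]   # least key per block
        self.overflow = [[] for _ in self.blocks]     # one chain per block

    def _home(self, key):
        # Block whose index entry is the largest one <= key (block 0 otherwise).
        return max(bisect_right(self.index, key) - 1, 0)

    def insert(self, record):
        b = self._home(record[0])
        if len(self.blocks[b]) < self.capacity:
            self.blocks[b].append(record)
            self.blocks[b].sort()
        else:
            self.overflow[b].append(record)           # the chain keeps growing

    def search(self, key):
        b = self._home(key)
        for rec in self.blocks[b] + self.overflow[b]:
            if rec[0] == key:
                return rec
        return None
```

With the five STUDENT records and a capacity of 3, inserting (124, "peterson", "fifth ave.") lands in the overflow chain of the first block, and every later search in that block has to walk the chain as well.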


So far …

• indices (like ISAM) suffer in the presence of frequent updates
• a sequential scan using the primary index is efficient, but a sequential scan using a secondary index is expensive
  • each record access may fetch a new block from disk
• alternative indexing structure: B-trees


Overview

• primary / secondary indices
• multilevel (ISAM)
• B-trees, B+-trees
• hashing
  • static hashing
  • dynamic hashing


B-trees

• the most successful family of index schemes (B-trees, B+-trees, B*-trees)
• can be used for primary/secondary, clustering/non-clustering index
• they are balanced “n-way” search trees


B-trees

• Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created. Periodic reorganization of the entire file is required
• Advantage of B+-tree index files: automatic self-reorganization with small, local changes, in the face of insertions and deletions. Reorganization of the entire file is not required
• Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead
• Advantages of B+-trees outweigh disadvantages, and they are used extensively


B-trees

E.g., B-tree of order 3 (i.e., at most 3 pointers from each node):

root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)


B-tree properties

Each node, in a B-tree of order n:

• key order
• at most n pointers
• at least n/2 pointers (except root)
• all leaves at the same level
• if the number of pointers is k, then the node has exactly k-1 keys

Node layout: keys v1, v2, …, vn-1 with pointers p1, …, pn


Properties

• “block aware” nodes: each node -> disk page
• O(log (N)) for everything! (ins/del/search)
• typically, if the fan-out of a node is 50 - 100, then 2 - 3 levels
• utilization >= 50%, guaranteed; on average 69%
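A rough back-of-the-envelope check of the "2 - 3 levels" figure, reading the 50 - 100 as the fan-out of a node (an assumption) and ignoring the difference between leaf and non-leaf capacity. The function name `levels_needed` is hypothetical.

```python
def levels_needed(num_keys, fanout):
    """Smallest h such that fanout**h >= num_keys (integer arithmetic)."""
    h, reach = 1, fanout
    while reach < num_keys:
        reach *= fanout
        h += 1
    return h

# With a fan-out of 100: 10,000 keys fit in 2 levels, 1,000,000 keys in 3.
# With a fan-out of 50: 100,000 keys fit in 3 levels.
```

Under this simplified model the height grows with the logarithm of the number of keys, base the fan-out, which is why a few disk accesses suffice even for large files.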


Queries

Algorithm for exact match query? (e.g., ssn=8?)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

Follow one pointer per level, from the root down to a leaf: H steps (= disk accesses)
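An exact-match search on the example tree can be sketched as below. The `Node` class and the `search` function are illustrative names, keys are assumed distinct, and nodes live in memory, so "disk accesses" are simply counted as visited nodes.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys
        self.children = children or []    # empty list for a leaf

# Tree T0 from the slides: root (6, 9) over leaves (1, 3), (7), (13).
T0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])

def search(node, key):
    """Return (found, nodes_visited); each visited node is one disk access."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        # Descend into the subtree whose key range contains the search key.
        node = node.children[i] if node.children else None
    return False, visited
```

For ssn=8 the search visits the root and the leaf (7) and reports that the key is absent, after two node visits.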


Queries

• what about range queries? (e.g., 5<salary<8)
• proximity / nearest neighbor searches? (e.g., salary ~ 8)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)


B-trees: Insertion

• Insert in leaf; on overflow, push middle up (recursively)
• split: preserves B-tree properties


B-trees

Easy case: Tree T0; insert ‘8’

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

8 goes into the leaf [7], which has room: [7 8]


B-trees

Hardest case: Tree T0; insert ‘2’

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

• Step 1: 2 belongs in the leaf [1 3], which overflows as [1 2 3]; push the middle key (2) up
• Step 2: the root becomes [2 | 6 | 9] and overflows in turn, over leaves [1], [3], [7], [13]; push the middle key (6) up

Final state: root [6]; second level [2] and [9]; leaves [1], [3], [7], [13]


B-trees - insertion

Q: What if there are two middles? (e.g., order 4)
A: either one is fine


B-trees: Insertion

• Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’)
• split: preserves all B-tree properties (!!)
• notice how it grows: height increases when root overflows & splits
• Automatic, incremental re-organization (contrast with ISAM!)


Pseudo-code

INSERTION OF KEY ’K’
find the correct leaf node ’L’;
if ( ’L’ overflows ) {
    split ’L’, by pushing the middle key upstairs to parent node ’P’;
    if ( ’P’ overflows ) {
        repeat the split recursively;
    }
} else {
    add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */
}
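One way to turn the pseudo-code into running code is sketched below, for a B-tree of order 3 as in the slides. It is an illustration, not the course's reference implementation: `Node`, `insert`, `_insert`, `make_t0`, and the constant `ORDER` are names chosen here, keys are assumed distinct, and the key is first added to the leaf and the leaf is split afterwards if it holds too many keys.

```python
from bisect import bisect_left, insort

ORDER = 3  # at most ORDER pointers, hence at most ORDER - 1 keys per node

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list for a leaf

def make_t0():
    """Tree T0 from the slides."""
    return Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    middle, right = split                 # root overflowed: tree grows one level
    return Node([middle], [root, right])

def _insert(node, key):
    if not node.children:                 # leaf: add the key in order
        insort(node.keys, key)
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key)
        if split is not None:             # child split: absorb the pushed-up key
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) < ORDER:            # no overflow
        return None
    mid = len(node.keys) // 2             # split and push the middle key up
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right
```

Calling `insert(make_t0(), 8)` only changes the leaf (7) into (7, 8), while `insert(make_t0(), 2)` splits the leaf and then the root, producing a new root (6) over (2) and (9), as in the hardest case of the slides.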


Overview

• primary / secondary indices
• multilevel (ISAM)
• B-trees
  • Dfn, Search, insertion, deletion
• B+-trees
• hashing


Deletion

Rough outline of algorithm: Delete key; on underflow, may need to merge

In practice, some implementors just allow underflows to happen…


B-trees – Deletion

Easiest case: Tree T0; delete ‘3’

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

After the deletion: root [6 | 9]; leaves [1] (<6), [7] (>6, <9), [13] (>9)


B-trees – Deletion

• Case1: delete a key at a leaf – no underflow
• Case2: delete non-leaf key – no underflow
• Case3: delete leaf-key; underflow, and ‘rich sibling’
• Case4: delete leaf-key; underflow, and ‘poor sibling’


B-trees – Deletion

Case1: delete a key at a leaf – no underflow (e.g., delete 3 from T0)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)


B-trees – Deletion

Case2: delete a key at a non-leaf – no underflow (e.g., delete 6 from T0)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

Delete & promote, i.e.:
• delete 6 from the root, which leaves a hole
• promote 3 from the leaf [1 3] into the hole

FINAL TREE: root [3 | 9]; leaves [1] (<3), [7] (>3, <9), [13] (>9)


B-trees – Deletion

Case2: delete a key at a non-leaf – no underflow (e.g., delete 6 from T0)

Q: How to promote?
A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree)

Observation: every deletion eventually becomes a deletion of a leaf key


B-trees – Deletion

Case3: underflow & ‘rich sibling’ (e.g., delete 7 from T0)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

Delete & borrow, i.e.:
• after deleting 7, its leaf is empty (underflow); the sibling [1 3] is rich
• ‘rich’ = can give a key, without underflowing
• ‘borrowing’ a key: always THROUGH the PARENT!
• NO!! Do not move 3 directly from [1 3] into the empty leaf
• instead, 6 comes down from the parent into the empty leaf, and 3 goes up from the rich sibling to take its place

FINAL TREE: root [3 | 9]; leaves [1] (<3), [6] (>3, <9), [13] (>9)
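Borrowing through the parent amounts to a small rotation. The sketch below shows it on the example, under simplifying assumptions: the parent is a list of separator keys, each leaf is a plain list of keys, and `borrow_from_left` is a hypothetical helper name.

```python
def borrow_from_left(parent_keys, children, i):
    """children[i] underflowed; its left sibling children[i - 1] is rich.

    The separator key in the parent moves down into children[i], and the
    largest key of the left sibling moves up to replace it.
    """
    left, node = children[i - 1], children[i]
    node.insert(0, parent_keys[i - 1])     # parent key comes down
    parent_keys[i - 1] = left.pop()        # sibling's largest key goes up

# Tree T0 after deleting 7: the middle leaf is empty.
parent_keys = [6, 9]
children = [[1, 3], [], [13]]
borrow_from_left(parent_keys, children, 1)
# parent_keys is now [3, 9]; children is [[1], [6], [13]]
```

Moving 3 directly into the empty leaf would break the key order around the separator 6, which is why the key has to travel through the parent.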


B-trees – Deletion

Case4: underflow & ‘poor sibling’ (e.g., delete 13 from T0)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)

• after deleting 13, its leaf is empty (underflow); the sibling [7] is poor
• A: merge w/ ‘poor’ sibling
• merge, by pulling a key from the parent
• exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’

I.e., 9 comes down from the parent and merges with 7.

FINAL TREE: root [6]; leaves [1 3] (<6), [7 9] (>6)


B-trees – Deletion

Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’

Q: What if the parent underflows?
A: repeat recursively
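The "pull key from parent, and merge" step can be sketched in the same simplified representation: the parent is a list of separator keys and each leaf is a list of keys. The helper name `merge_with_left` is hypothetical, and the recursive handling of an underflowing parent is left out.

```python
def merge_with_left(parent_keys, children, i):
    """children[i] underflowed and its left sibling children[i - 1] is poor.

    Pull the separator key down from the parent and merge it with the
    keys of both siblings into a single node.
    """
    separator = parent_keys.pop(i - 1)
    merged = children[i - 1] + [separator] + children[i]
    children[i - 1:i + 1] = [merged]

# Tree T0 after deleting 13: the rightmost leaf is empty.
parent_keys = [6, 9]
children = [[1, 3], [7], []]
merge_with_left(parent_keys, children, 2)
# parent_keys is now [6]; children is [[1, 3], [7, 9]]
```

The parent loses one key and one pointer, so it may underflow itself; in that case the same borrow-or-merge decision is repeated one level up.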


B-tree deletion - pseudocode

DELETION OF KEY ’K’
locate key ’K’, in node ’N’
if ( ’N’ is a non-leaf node ) {
    delete ’K’ from ’N’;
    find the immediately larger key ’K1’;
    /* which is guaranteed to be on a leaf node ’L’ */
    copy ’K1’ in the old position of ’K’;
    invoke this DELETION routine on ’K1’ from the leaf node ’L’;
} else {
    /* ’N’ is a leaf node */
    delete ’K’ from ’N’;
    if ( ’N’ underflows ) {
        let ’N1’ be the sibling of ’N’;
        if ( ’N1’ is "rich" ) { /* i.e., N1 can lend us a key */
            borrow a key from ’N1’ THROUGH the parent node;
        } else { /* N1 is 1 key away from underflowing */
            MERGE: pull the key from the parent ’P’, and merge it with the keys of ’N’ and ’N1’ into a new node;
            if ( ’P’ underflows ) { repeat recursively }
        }
    }
}


B-trees in practice

In practice: no empty leaves; pointers to records

Theory: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)
Practice: the same tree, with each key also carrying a pointer to its record in the file (Ssn, …)


B-trees in practice

In practice, the formats are:
• leaf nodes: (v1, rp1, v2, rp2, … vn, rpn)
• non-leaf nodes: (p1, v1, rp1, p2, v2, rp2, …)

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)


Overview

• primary / secondary indices
• multilevel (ISAM)
• B-trees
• B+-trees
• hashing


B+ trees - Motivation

B-tree – print keys in sorted order: B-tree needs back-tracking – how to avoid it?

T0: root [6 | 9]; leaves [1 3] (<6), [7] (>6, <9), [13] (>9)


Solution: B+-trees

• Facilitate sequential ops
• They string all leaf nodes together
• AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level !!


B+ trees

root [6 | 9]; leaves [1 3] (<6) -> [6 7] (>=6, <9) -> [9 13] (>=9), linked left to right


B+-Trees (Cont.)

A B+-tree is a rooted tree satisfying the following properties:

• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children
• A leaf node has between ⌈(n–1)/2⌉ and n–1 values
• Special cases:
  • If the root is not a leaf, it has at least 2 children
  • If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values


B+-Tree Node Structure

Typical node: P1, K1, P2, K2, …, Pn–1, Kn–1, Pn

• Ki are the search-key values
• Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes)
• The search-keys in a node are ordered: K1 < K2 < K3 < . . . < Kn–1


Leaf Nodes in B+-Trees - Properties

For i = 1, 2, . . ., n–1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. Only need bucket structure if search-key does not form a primary key.

If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than Lj’s search-key values

Pn points to next leaf node in search-key order
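The benefit of the next-leaf pointer can be seen in a range scan. The sketch below models only the leaf level of the small B+ tree used in these slides; the class `Leaf` and the function `range_scan` are illustrative names, and the scan starts from the first leaf instead of descending from the root, which is a simplification.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # pointer to the next leaf in key order

# Leaf level of the example B+ tree: (1, 3) -> (6, 7) -> (9, 13).
a, b, c = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
a.next, b.next = b, c

def range_scan(start_leaf, low, high):
    """Collect keys with low <= key <= high by walking the leaf chain."""
    out, leaf = [], start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out        # keys are sorted: nothing further qualifies
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out
```

Because every key appears at the leaf level and the leaves are chained, the scan never has to go back up to a parent node.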


Non-Leaf Nodes in B+-Trees - Properties

Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:

All the search-keys in the subtree to which P1 points are less than K1

For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki

All the search-keys in the subtree to which Pm points are greater than or equal to Km–1
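These ordering properties are what make a root-to-leaf search work: at each non-leaf node, follow the pointer whose key range contains the search key. A minimal sketch, assuming nodes are plain dictionaries with a "keys" list and, for non-leaf nodes only, a "children" list (this representation is an assumption made for the example):

```python
from bisect import bisect_right


def find_leaf(node, key):
    """Descend from node to the leaf whose key range contains key."""
    while "children" in node:
        # bisect_right counts the keys <= key, which is the 0-based index
        # of the child covering [K_i, K_{i+1}); keys below K1 go to child 0.
        i = bisect_right(node["keys"], key)
        node = node["children"][i]
    return node


# Example tree: root (6, 9); leaves (1, 3), (6, 7), (9, 13)
tree = {
    "keys": [6, 9],
    "children": [{"keys": [1, 3]}, {"keys": [6, 7]}, {"keys": [9, 13]}],
}
```

Note that a key equal to a separator (for example 6) goes to the right of it, matching the ">= 6" label on the middle pointer of the example tree.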

Page 94

B-Tree vs B+-Tree

B-tree (above) and B+-tree (below) on same data

Page 95

B+ tree insertion

INSERTION OF KEY ’K’
    find the leaf node ’L’ where ’K’ belongs;
    insert search-key value to ’L’ such that the keys are in order;
    if (’L’ overflows) {
        split ’L’;
        insert (i.e., COPY) smallest search-key value of new node to parent node ’P’;
        if (’P’ overflows) {
            repeat the B-tree split procedure recursively;
            /* Notice: the B-TREE split; NOT the B+-tree */
        }
    }

Page 96

B+-tree insertion – cont’d

/* ATTENTION:

a split at the LEAF level is handled by COPYING the middle key upstairs;

A split at a higher level is handled by PUSHING the middle key upstairs

*/

Page 97

B+ trees - insertion

[Figure: the example tree before insertion. Root: 6, 9. Leaves: (1, 3), (6, 7), (9, 13). Pointers labeled <6, >=6 and <9, >=9.]

E.g., insert ‘8’

Page 98

B+ trees - insertion

[Figure: the example tree (root 6, 9; leaves (1, 3), (6, 7), (9, 13)) with the new key 8 arriving at leaf (6, 7).]

E.g., insert ‘8’

Page 99

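B+ trees - insertion

[Figure: the example tree (root 6, 9; leaves (1, 3), (6, 7), (9, 13)) with the new key 8 arriving at leaf (6, 7), which overflows.]

E.g., insert ‘8’

COPY middle upstairs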

Page 100

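B+ trees - insertion

[Figure: leaf (6, 7) has split into (6) and (7, 8); the middle key 7 is copied up into the root, next to 6 and 9. Leaves: (1, 3), (6), (7, 8), (9, 13).]

E.g., insert ‘8’

COPY middle upstairs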

Page 101

B+ trees - insertion

[Figure: after the leaf split the root would hold 6, 7, 9 and overflows. Leaves: (1, 3), (6), (7, 8), (9, 13).]

E.g., insert ‘8’

COPY middle upstairs

Non-leaf overflow – just PUSH the middle

Page 102

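B+ trees - insertion

E.g., insert ‘8’

[Figure: final tree. Root: 7, with pointers labeled <7 and >=7. Left child: 6, with leaves (1, 3) and (6). Right child: 9, with leaves (7, 8) and (9, 13).]

FINAL TREE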
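The walkthrough above (insert 8, COPY the middle key at the leaf, PUSH the middle key at the non-leaf) can be sketched in Python. This is a minimal illustration, not the course's implementation: the Node class, MAX_KEYS = 2, and the use of position len(keys) // 2 as the split point are assumptions chosen so that the code reproduces the slide example, and the leaf next-pointers are omitted.

```python
from bisect import bisect_right, insort

MAX_KEYS = 2  # assumption: at most 2 keys per node, as in the slide example


class Node:
    """Hypothetical node: a leaf has children=None, a non-leaf has len(keys)+1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])   # the old root was split: grow the tree


def _insert(node, key):
    """Insert into the subtree; return (separator, new_right_node) if node split."""
    if node.children is None:                  # leaf
        insort(node.keys, key)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right            # COPY: the key also stays in the leaf

    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, new_child = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, right                           # PUSH: the key leaves this level


root = Node([6, 9], [Node([1, 3]), Node([6, 7]), Node([9, 13])])
root = insert(root, 8)
```

After the call, the root holds 7, its children hold 6 and 9, and the leaves are (1, 3), (6), (7, 8), (9, 13), as in the final tree on the slide. The difference between the two return statements is the whole point: the leaf split keeps the separator in the right leaf, while the non-leaf split removes it from both halves.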

Page 103

B-Trees vs B+-Trees

Advantages of B-Tree indices:
    May use fewer tree nodes than a corresponding B+-Tree.
    Sometimes possible to find search-key value before reaching leaf node.

Disadvantages of B-Tree indices:
    Only a small fraction of all search-key values are found early
    Non-leaf nodes are larger, so fan-out is reduced. Thus B-Trees typically have greater depth than the corresponding B+-Tree
    Insertion and deletion are more complicated than in B+-Trees
    Implementation is harder than for B+-Trees.

Typically, the advantages of B-Trees do not outweigh the disadvantages

Page 104

B*-tree

In B-trees, worst-case utilization = 50%, if we have just split all the pages

How to increase the utilization of B-trees?

… with B*-trees!

Page 105

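B-trees and B*-trees

E.g., Tree T0; insert ‘2’

[Figure: B-tree T0. Root: 6, 9. Leaves: (1, 3), (7), (13). Pointers labeled <6, >6 and <9, >9. The new key 2 is headed for leaf (1, 3).]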

Page 106

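B*-trees: deferred split!

Instead of splitting, LEND keys to sibling! (through PARENT, of course!)

[Figure: tree T0 with key 2 arriving at leaf (1, 3). Root: 6, 9. Leaves: (1, 3), (7), (13). Pointers labeled <6, >6 and <9, >9.]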

Page 107

B*-trees: deferred split!

Instead of splitting, LEND keys to sibling! (through PARENT, of course!)

[Figure: final tree. Root: 3, 9. Leaves: (1, 2), (6, 7), (13). Pointers labeled <3, >3 and <9, >9.]

FINAL TREE
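The lending step in this example can be sketched as a rotation through the parent: the separator moves down into the sibling, and the largest key of the overflowing leaf moves up to replace it. The code below is a simplified illustration on plain key lists; the function name insert_with_lending, MAX_KEYS = 2, and the restriction to the right sibling are assumptions made for the example.

```python
from bisect import insort

MAX_KEYS = 2  # assumption: at most 2 keys per leaf, as in the slide example


def insert_with_lending(parent_keys, leaves, i, key):
    """Insert key into leaves[i] of a one-level B-tree (parent plus leaves).

    On overflow, try to lend one key to the right sibling through the parent.
    Returns True if the insert fit (directly or by lending). Returns False if
    the right sibling is missing or full; leaves[i] is then left over-full and
    the caller would have to split.
    """
    leaf = leaves[i]
    insort(leaf, key)
    if len(leaf) <= MAX_KEYS:
        return True
    if i + 1 < len(leaves) and len(leaves[i + 1]) < MAX_KEYS:
        sibling = leaves[i + 1]
        sibling.insert(0, parent_keys[i])   # separator moves down to the sibling
        parent_keys[i] = leaf.pop()         # largest key moves up as new separator
        return True
    return False


# Tree T0 from the slides: root (6, 9); leaves (1, 3), (7), (13)
parent = [6, 9]
leaves = [[1, 3], [7], [13]]
ok = insert_with_lending(parent, leaves, 0, 2)
```

After the call, parent is [3, 9] and the leaves are (1, 2), (6, 7), (13), which is the final tree on the slide. No new node was allocated, which is why utilization goes up.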

Page 108

B*-trees: deferred split!

Notice: shorter, more packed, faster tree

It’s a rare case, where space utilization and speed improve together

BUT: What if the sibling has no room for our ‘lending’?

Page 109

B*-trees: deferred split!

BUT: What if the sibling has no room for our ‘lending’?

A: 2-to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3.

Details: too messy (and even worse for deletion)
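Leaving the messy details aside, the core of the 2-to-3 split can still be sketched on flat key lists: pool the keys of both nodes with the separator from the parent, then cut the pool into three nodes and two new separators. This is a simplified sketch under stated assumptions: the function name two_to_three_split is made up, child pointers are ignored, the inputs are assumed sorted, and the pool is assumed to hold at least five keys.

```python
def two_to_three_split(left, separator, right):
    """Pool two sibling key lists with their parent separator and split in 3.

    left already contains the newly inserted key (it is over-full).
    Returns (node1, sep1, node2, sep2, node3); sep1 and sep2 go to the parent.
    """
    pool = left + [separator] + right       # still sorted, by the B-tree ordering
    remaining = len(pool) - 2               # two keys become separators
    size1 = remaining // 3 + (1 if remaining % 3 > 0 else 0)
    size2 = remaining // 3 + (1 if remaining % 3 > 1 else 0)
    node1 = pool[:size1]
    sep1 = pool[size1]
    node2 = pool[size1 + 1:size1 + 1 + size2]
    sep2 = pool[size1 + 1 + size2]
    node3 = pool[size1 + 2 + size2:]
    return node1, sep1, node2, sep2, node3


# Over-full leaf (1, 2, 3), separator 6, full right sibling (7, 8)
result = two_to_three_split([1, 2, 3], 6, [7, 8])
```

Here the result is nodes (1, 2), (6), (8) with separators 3 and 7 for the parent: two full nodes became three nodes, instead of one full node becoming two half-empty ones.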

Page 110

Conclusions

All B-tree variants can be used for any type of index: primary/secondary, sparse (clustering), or dense (non-clustering)

All have excellent, O(log N) worst-case performance for ins/del/search

It’s the prevailing indexing method
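To get a feel for the O(log N) bound: since every non-root inner node has at least ⌈n/2⌉ children, a root-to-leaf path in a B+-tree over K search-key values has length at most ⌈log_⌈n/2⌉(K)⌉. The sketch below computes that bound with integer arithmetic; the function name max_path_length is hypothetical.

```python
def max_path_length(num_keys, n):
    """Upper bound ceil(log_{ceil(n/2)}(num_keys)) on root-to-leaf path length."""
    if n < 3:
        raise ValueError("n must be at least 3 so that the minimum fan-out is 2")
    fanout = (n + 1) // 2          # ceil(n/2), the minimum fan-out
    height, reach = 0, 1
    while reach < num_keys:        # smallest height with fanout**height >= num_keys
        reach *= fanout
        height += 1
    return height


levels = max_path_length(1_000_000, 100)
```

With n = 100 and one million search-key values, at most 4 nodes are visited per lookup, compared with about 20 levels for a balanced binary tree over the same keys.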

Page 111

Overview

ordered indices
    primary / secondary indices
    index-sequential multilevel (ISAM)
    B-trees, B+-trees

hashing
    static hashing
    dynamic hashing