March 2007 R McFadyen ACS - 3902 1
File Organizations
•In SQL Server 2000
•Tree terms
•root, internal, leaf, subtree
•parent, child, sibling
•balanced, unbalanced
•b+-tree
- split on overflow; merge on underflow
- in practice it is usually 3 or 4 levels deep
•search, insert, delete algorithms
Using Query Analyzer to create a table / index
Create an index on the employeeID column of the emp_pay table that enforces uniqueness. This index physically orders the data on disk because the CLUSTERED clause is specified.
CREATE TABLE emp_pay (
    employeeID int NOT NULL,
    base_pay money NOT NULL,
    commission decimal(2, 2) NOT NULL
)
INSERT emp_pay VALUES (1, 500, .10)
INSERT emp_pay VALUES (2, 1000, .05)
…
CREATE UNIQUE CLUSTERED INDEX employeeID_ind ON emp_pay (employeeID)
GO
Using Query Analyzer to create a table / index
Create an index on the orderID and employeeID columns of the order_emp table.
CREATE TABLE order_emp (
    orderID int IDENTITY(1000, 1),
    employeeID int NOT NULL,
    orderdate datetime NOT NULL DEFAULT GETDATE(),
    orderamount money NOT NULL
)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (5, '4/12/98', 315.19)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (5, '5/30/98', 1929.04)
INSERT order_emp (employeeID, orderdate, orderamount) VALUES (1, '1/03/98', 2039.82)
…
CREATE INDEX emp_order_ind ON order_emp (orderID, employeeID)
Using Enterprise Manager to create a table / index
PRIMARY KEY: a constraint that enforces entity integrity for a given column or columns through a unique index. Only one PRIMARY KEY constraint can be created per table.
UNIQUE: a constraint that provides entity integrity for a given column or columns through a unique index. A table can have multiple UNIQUE constraints.
Using Enterprise Manager to create a table / index
CLUSTERED: creates an object where the physical order of rows is the same as the indexed order of the rows, and the bottom (leaf) level of the clustered index contains the actual data rows. (Note: this is a variation from the text’s discussion.) A table or view is allowed one clustered index at a time. (Another variation … indexing a view.) A view with a clustered index is called an indexed view.
FILLFACTOR: specifies how full SQL Server should make each index page used to store the index data. User-specified fillfactor values can be from 1 through 100, with a default of 0. A lower fill factor creates the index with more space available for new index entries without having to allocate new space.
Motivation (finding one record given its key)
•Scanning a file is time consuming
•B+-tree provides a short access path
[Figure: a file of records (page1, page2, page3) alongside a B+-tree]
Motivation
•A B+-tree is a tree, in which each node is a page.
•A B+-tree for a file is stored in a separate file.
•A file could have many B+-trees
[Figure: a file of records (page1, page2, page3) alongside a B+-tree]
b+-tree
•based on b-tree (Bayer, balanced, Boeing)
•dynamic
[Figure: a tree with a root, internal nodes, and leaf nodes]
Node structure for b+-tree of order p
non-leaf node (internal node or a root)
•$\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$ with $q \le p$
•$K_1 < K_2 < \cdots < K_{q-1}$ (i.e. it’s an ordered set)
•for any key value, $X$, in the subtree pointed to by $P_i$
•$K_{i-1} < X \le K_i$ for $1 < i < q$
•$X \le K_1$ for $i = 1$
•$K_{q-1} < X$ for $i = q$
•each internal node has at most $p$ pointers
•each node except the root must have at least $\lceil p/2 \rceil$ pointers
•the root, if it has some children, must have at least 2 pointers
One more pointer than there are keys
All but the root must be at least half full
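The key-range rule above decides which pointer a search follows inside one internal node. A minimal Python sketch of that choice follows; the function name `child_index` and the 0-based indexing are illustrative assumptions, not part of the slides.

```python
from bisect import bisect_left

def child_index(keys, x):
    """Return the 0-based index of the pointer to follow for search key x.

    keys is the sorted list K1..K(q-1) of one internal node, which has
    len(keys) + 1 pointers. Pointer i leads to keys X with
    keys[i-1] < X <= keys[i], which is the rule stated on the slide.
    """
    return bisect_left(keys, x)

keys = ["Cory", "Ramon"]           # hypothetical internal node with 3 pointers (p = 3)
print(child_index(keys, "Amy"))    # 0: Amy <= Cory
print(child_index(keys, "Cory"))   # 0: a key equal to K1 lives in the left subtree
print(child_index(keys, "Diane"))  # 1: Cory < Diane <= Ramon
print(child_index(keys, "Zena"))   # 2: Ramon < Zena
```

`bisect_left` is used because a key equal to K_i belongs to the subtree on its left.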
Node structure for b+-tree of order p
leaf node (terminal node)
•$\langle (K_1, Pr_1), (K_2, Pr_2), \ldots, (K_{q-1}, Pr_{q-1}), P_{next} \rangle$
•$K_1 < K_2 < \cdots < K_{q-1}$
•$Pr_i$ points to a record with key value $K_i$, or
•$Pr_i$ points to a block containing a record with key value $K_i$
•each leaf has at least $\lceil p/2 \rceil$ keys
•maximum of $p$ keys
•all leaves are at the same level (balanced)
•$P_{next}$ points to the next leaf for key sequencing
Pairs of key/pointer plus one next pointer
Dense or non-dense index
Example
•insert records with key values Diane, Cory, Ramon, Amy, Miranda, Ahmed, Marshall, Zena, Rhonda, Vincent, Hok into a b+-tree with p=3.
an internal node will have a minimum of 2 pointers and a maximum of 3 pointers - inserting a fourth will cause a split
a leaf will have a minimum of 2 key/pointer pairs and a maximum of 3 key/pointer pairs - inserting a fourth will cause a split
Typically p is very large … 100, 200, …
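The bounds for p=3, and the reason a large p keeps the tree shallow, can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes every node is completely full and that a leaf holds at most p key/pointer pairs, as stated on these slides.

```python
from math import ceil

def pointer_bounds(p):
    """Minimum and maximum number of pointers in a non-root internal node of order p."""
    return ceil(p / 2), p

def max_leaf_entries(p, levels):
    """Upper bound on key/pointer pairs reachable in a tree with `levels` levels.

    Assumes every node is full: p ** (levels - 1) leaves, each holding p pairs.
    """
    return p ** (levels - 1) * p

print(pointer_bounds(3))         # (2, 3)
print(max_leaf_entries(100, 3))  # 1000000
print(max_leaf_entries(100, 4))  # 100000000
```

Under these assumptions, an order of 100 already reaches a million entries at three levels, which fits the earlier remark that trees are usually 3 or 4 levels deep in practice.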
insert Diane
[ Diane ] (the entry holds a pointer to data; the leaf holds a pointer to the next leaf in ascending key sequence)
insert Cory
[ Cory , Diane ]
Example
insert Ramon
[ Cory , Diane , Ramon ]
inserting Amy will cause the node to overflow:
[ Amy , Cory , Diane , Ramon ] (this leaf must split)
Continuing with the insertion of Amy: split the node and promote a key value upwards (this must be Cory because it’s the highest key value in the left subtree)
before: [ Amy , Cory , Diane , Ramon ]
after: root [ Cory ]; leaves [ Amy , Cory ] [ Diane , Ramon ]
Tree has grown one level, from the bottom up
Splitting Nodes
Any value being promoted upwards will come from the node that is splitting.
•When a leaf splits, a ‘copy’ of a key value is promoted.
•When an internal node splits, the middle key value ‘moves’ from a child to a parent node.
There are three situations to be concerned with:
•a leaf splits,
•an internal node splits,
•a new root is generated.
Leaf splitting
When a leaf splits, a new leaf is allocated
•the original leaf is the left sibling, the new one is the right sibling
•key and pointer pairs of the overflowing node are redistributed: the left sibling will have smaller keys than the right sibling
•a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent
Two situations arise: the parent exists or not. If the parent exists, then a copy of the key value and the pointer to the right sibling are promoted upwards. Otherwise, the b+-tree is just beginning to grow ...
Example: insert 31
before: parent [ 33 ]; leaves [ 12 , 22 , 33 ] [ 44 , 48 , 55 ]
after: parent [ 22 , 33 ]; leaves [ 12 , 22 ] [ 31 , 33 ] [ 44 , 48 , 55 ]
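The leaf-split rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the course's implementation: the function name `insert_into_leaf` is hypothetical, only keys are tracked (record pointers are omitted), and the left sibling is assumed to keep the larger half when the count is odd.

```python
from bisect import insort
from math import ceil

def insert_into_leaf(keys, key, max_keys):
    """Insert key into a sorted leaf; split the leaf if it overflows.

    Returns (left, right, promoted). right and promoted are None when
    the leaf did not overflow.
    """
    keys = list(keys)      # work on a copy, leave the caller's list alone
    insort(keys, key)      # keep the leaf in ascending key order
    if len(keys) <= max_keys:
        return keys, None, None
    mid = ceil(len(keys) / 2)
    left, right = keys[:mid], keys[mid:]
    # A copy of the largest key in the left sibling goes up to the parent.
    return left, right, left[-1]

print(insert_into_leaf([12, 22, 33], 31, max_keys=3))
# ([12, 22], [31, 33], 22)
print(insert_into_leaf(["Cory", "Diane", "Ramon"], "Amy", max_keys=3))
# (['Amy', 'Cory'], ['Diane', 'Ramon'], 'Cory')
```

Both calls reproduce splits shown on the slides: inserting 31 promotes 22, and inserting Amy promotes Cory.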
Internal node splitting
If an internal node splits and it is not the root,
•insert the key and pointer and then determine the middle key
•a new 'right' sibling is allocated
•everything to its left stays in the left sibling
•everything to its right goes into the right sibling
•the middle key value along with the pointer to the new right sibling is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)
Example: insert 26
before: parent [ 55 ]; internal node [ 22 , 33 ]
after: parent [ 26 , 55 ]; left sibling [ 22 ]; right sibling [ 33 ]
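An internal split differs from a leaf split in that the middle key moves up instead of being copied. The Python sketch below illustrates this under assumptions: `split_internal` is a hypothetical helper, the child pointers are stand-in strings, and the node is assumed to already contain the newly inserted key and pointer.

```python
def split_internal(keys, children):
    """Split an overflowing internal node around its middle key.

    keys: sorted keys, already including the newly inserted key.
    children: the len(keys) + 1 child pointers, in order.
    Returns (left_keys, left_children, middle, right_keys, right_children).
    The middle key appears in neither sibling: it moves to the parent.
    """
    assert len(children) == len(keys) + 1
    mid = len(keys) // 2
    return (keys[:mid], children[:mid + 1],
            keys[mid],
            keys[mid + 1:], children[mid + 1:])

# Node [22, 33] after 26 is inserted (p = 3, so three keys overflow).
print(split_internal([22, 26, 33], ["c0", "c1", "c2", "c3"]))
# ([22], ['c0', 'c1'], 26, [33], ['c2', 'c3'])
```

The same helper covers the new-root case: when the node that splits is the root, the returned middle key and the two siblings become the contents of the new root.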
Internal node splitting
When a new root is formed, a key value and two pointers must be placed into it.
Example: insert 56
before: root [ 26 , 55 ]
after: new root [ 55 ]; children [ 26 ] [ 56 ]
A sample trace
Insert Diane, Cory, Ramon, Amy, Miranda, Marshall, Zena, Rhonda, Vincent, Simon, Mary into a b+-tree with p=3.
root: [ Cory ]
leaves: [ Amy , Cory ] [ Diane , Ramon ]
next insert: Miranda
root: [ Cory ]
leaves: [ Amy , Cory ] [ Diane , Miranda , Ramon ]
next insert: Marshall

root: [ Cory , Marshall ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ]
next insert: Zena
root: [ Cory , Marshall ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon , Zena ]
next insert: Rhonda

root: [ Cory , Marshall , Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Zena ]
root: [ Marshall ]
internal: [ Cory ] [ Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Zena ]
next insert: Vincent
root: [ Marshall ]
internal: [ Cory ] [ Ramon ]
leaves: [ Amy , Cory ] [ Diane , Marshall ] [ Miranda , Ramon ] [ Rhonda , Vincent , Zena ]
next insert: Simon
root: [ Marshall ]
internal: [ Ramon , Simon ]
leaves: [ Miranda , Ramon ] [ Rhonda , Simon ] [ Vincent , Zena ]
(only this part of the tree is shown)
next insert: Mary
A sample b+-tree
root: [ 5 ]
internal: [ 3 ] [ 7 , 8 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 , 7 ] [ 8 ] [ 9 , 12 ] (leaf entries point to records)
p = 3, pleaf = 2.
Searching a b+-tree
- search a record with key = 8:
root: [ 5 ]
internal: [ 3 ] [ 7 , 8 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 , 7 ] [ 8 ] [ 9 , 12 ]
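A search walks from the root to a single leaf, picking one pointer per level. The Python sketch below traces that walk on the sample tree. It rests on assumptions: the tuple-based node layout is invented for illustration, and the leaf contents are a reading of the flattened slide figure.

```python
from bisect import bisect_left

# Hypothetical node layout: ("internal", keys, children) or ("leaf", keys).
leaves = [("leaf", [1, 3]), ("leaf", [5]), ("leaf", [6, 7]),
          ("leaf", [8]), ("leaf", [9, 12])]
left = ("internal", [3], leaves[0:2])
right = ("internal", [7, 8], leaves[2:5])
root = ("internal", [5], [left, right])

def search(node, key):
    """Return (found, path); path lists the keys of every node visited."""
    path = []
    while node[0] == "internal":
        _, keys, children = node
        path.append(keys)
        # Follow pointer i, where keys[i-1] < key <= keys[i].
        node = children[bisect_left(keys, key)]
    path.append(node[1])
    return key in node[1], path

print(search(root, 8))  # (True, [[5], [7, 8], [8]])
print(search(root, 4))  # (False, [[5], [3], [5]])
```

Both searches visit exactly three nodes, one per level, whether or not the key is present.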
Entry deletion
- deletion sequence: 8, 12, 9, 7
root: [ 5 ]
internal: [ 3 ] [ 7 , 9 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 , 7 ] [ 9 ] [ 12 ]
Deleting 8 results in underflow and key redistribution.
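The redistribution step for deleting 8 can be sketched as follows. This is a simplified illustration: `delete_with_borrow` is a hypothetical helper that only borrows from the right sibling, and the minimum of one key per leaf assumes pleaf = 2. Borrowing from the left sibling and merging are not shown.

```python
def delete_with_borrow(parent_keys, leaves, i, key, min_keys):
    """Delete key from leaves[i]; on underflow, borrow from the right sibling.

    parent_keys[i] is the separator between leaves[i] and leaves[i + 1].
    Both lists are modified in place. Returns True if leaves[i] satisfies
    the minimum afterwards, False if another remedy would be needed.
    """
    leaves[i].remove(key)
    if len(leaves[i]) >= min_keys:
        return True
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:
        # Move the right sibling's smallest key over, then refresh the
        # separator so it equals the largest key now in the left leaf.
        leaves[i].append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i][-1]
        return True
    return False

parent_keys = [7, 8]
leaves = [[6, 7], [8], [9, 12]]
print(delete_with_borrow(parent_keys, leaves, 1, 8, min_keys=1))  # True
print(parent_keys)  # [7, 9]
print(leaves)       # [[6, 7], [9], [12]]
```

The result matches the slide: the internal node becomes [ 7 , 9 ] and the leaves become [ 6 , 7 ] [ 9 ] [ 12 ].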
Entry deletion
- deletion sequence: 8, 12, 9, 7
root: [ 5 ]
internal: [ 3 ] [ 7 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 , 7 ] [ 9 ]
12 is removed.
Entry deletion
- deletion sequence: 8, 12, 9, 7
root: [ 5 ]
internal: [ 3 ] [ 6 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 ] [ 7 ]
9 is removed.
Entry deletion
- deletion sequence: 8, 12, 9, 7
root: [ 5 ]
internal: [ 3 ] [ 6 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 ]
Deleting 7 makes the pointer to its leaf useless. Therefore, a merge occurs at the level above the leaf level.
Entry deletion
- deletion sequence: 8, 12, 9, 7
For this merge, 5 will be taken as a key value in A, since any key value in B is less than or equal to 5 but any key value in C is larger than 5.
[Figure: leaves [ 1 , 3 ] [ 5 ] [ 6 ]; nodes labelled A, B and C; the merged node A holds keys 3 and 5]
This pointer becomes useless. The corresponding node should also be removed.
Entry deletion
- deletion sequence: 8, 12, 9, 7
node: [ 3 , 5 ]
leaves: [ 1 , 3 ] [ 5 ] [ 6 ]
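The merge at the level above the leaves pulls the separating key down from the parent into the merged node. The Python sketch below illustrates that step. It is an assumption-laden illustration: `merge_internal` is a hypothetical helper, leaves are represented as plain lists, and the right node is assumed to be left with no keys and one child once 7 is deleted.

```python
def merge_internal(left_keys, left_children, separator, right_keys, right_children):
    """Merge two sibling internal nodes into one.

    The separator key is taken from the parent and placed between the two
    key lists: everything under the left node is <= separator and
    everything under the right node is larger.
    """
    return left_keys + [separator] + right_keys, left_children + right_children

# B = keys [3] over leaves [1, 3] and [5]; C = no keys, over leaf [6].
keys, children = merge_internal([3], [[1, 3], [5]], 5, [], [[6]])
print(keys)      # [3, 5]
print(children)  # [[1, 3], [5], [6]]
```

Once the merged node has absorbed the parent's only key, the old root has nothing left to discriminate, which is why the tree can shrink by one level.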
b+-tree operations
•search - always the same search length - tree height
•retrieval - sequential access is facilitated - how?
•insert - may cause overflow - tree may grow
•delete - may cause underflow - tree may shrink
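On the sequential-access bullet: the Pnext pointers chain the leaves in key order, so a range of keys can be read by walking along the leaf level without returning to the upper levels. A minimal Python sketch follows, assuming a hypothetical `Leaf` class and the leaf contents of the earlier sample tree; a real scan would first search down the tree to find the starting leaf.

```python
class Leaf:
    """A leaf holding sorted keys and a pointer to the next leaf (Pnext)."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None

def chain(key_lists):
    """Build leaves from lists of keys, link them left to right, return the first."""
    leaves = [Leaf(keys) for keys in key_lists]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, low, high):
    """Collect every key in [low, high] by following next pointers."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are ascending, so nothing later qualifies
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result

first = chain([[1, 3], [5], [6, 7], [8], [9, 12]])
print(range_scan(first, 5, 9))  # [5, 6, 7, 8, 9]
```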
What do you expect for storage utilization?