4. 2-3-B-TRIE

Post on 30-Jan-2016


Data Structures: 2-3 Trees, B Trees, TRIE Trees

2-3 Trees

• Introduced by J. E. Hopcroft in 1970
• An improvement on existing height-balanced binary search trees
• Later generalized to B-trees by Bayer and McCreight
• A first generalization of AVL trees: a node may hold two keys
  – so a node may have three children instead of two

2-3 Trees

Sample 2-3 tree

2-3 Trees

• Invariants
  • Data elements within a node are ordered from left (minimum) to right (maximum).
  • The tree is perfectly balanced: all leaves are on the same level.
  • Every node has at most two keys.
  • For any internal node, the number of children is greater by one than the number of keys.
  • If a node has two keys (k1 and k2, with k1 < k2), then:
    – the left child (and its subtree) contains keys smaller than k1;
    – the middle child (and its subtree) contains keys greater than k1 and smaller than k2;
    – the right child (and its subtree) contains keys greater than k2.

2-3 Trees

typedef struct Node23 {
    int k1, k2;
    struct Node23 *left, *middle, *right;
} Node23;

2-3 Trees

The operations that may be performed on a 2-3 tree are:
• Searching: traverse the tree, descending only into the subtree that can contain the target element.
• Insertion: based on two basic rules:
  – insertion is performed only in leaf nodes, and
  – when a node is full, it must be split.
  Key fact: the tree is always perfectly balanced.
• Deletion: performed much like deletion from a BST.
  – The key is first searched for, and then a replacement key is found.
  – The replacement key is the successor (or predecessor) key.

2-3 Trees

procedure Search (Node23 node, int key){
    if ( node is leaf and does not contain key ){
        return nil;
    }else{
        if ( key is in node ){
            return node;
        }else{
            if ( key < node->k1 ){
                return Search(node->left, key)
            }else{
                if ( node has two keys and key < node->k2 ){
                    return Search(node->middle, key)
                }else{
                    return Search(node->right, key)
                }
            }
        }
    }
}
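The search procedure can be sketched in C as follows. This is a minimal sketch, not the slides' own code: the field nkeys is an assumption added here so that a node can tell whether k2 is in use, and a one-key node is assumed to use only its left and right pointers.

```c
#include <stddef.h>

/* Hypothetical node layout: nkeys is an added field (not in the slide's
   struct) that records whether k2 is in use. */
typedef struct Node23 {
    int nkeys;                 /* 1 or 2 */
    int k1, k2;                /* k2 is meaningful only when nkeys == 2 */
    struct Node23 *left, *middle, *right;
} Node23;

/* Returns the node holding key, or NULL when the key is absent.
   Leaves have NULL children, so the loop ends below the last level. */
Node23 *search23(Node23 *node, int key) {
    while (node != NULL) {
        if (key == node->k1 || (node->nkeys == 2 && key == node->k2))
            return node;                    /* key found in this node */
        if (key < node->k1)
            node = node->left;              /* keys smaller than k1 */
        else if (node->nkeys == 2 && key < node->k2)
            node = node->middle;            /* keys between k1 and k2 */
        else
            node = node->right;             /* keys greater than the last key */
    }
    return NULL;
}
```

The loop replaces the recursion of the pseudocode; each iteration descends one level, so the cost is proportional to the height of the tree.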

2-3 Trees

Searching a key in a 2-3 Tree

2-3 Trees

procedure insert(Node23 node, int key){
    found ← false; continue ← true;
    if (node = NULL){
        # create a new node and place key in it
    }else{
        while (continue){ // search for the key
            // check whether the key is already in the 2-3 tree
            if (key is in node){
                found ← true;
                continue ← false;
            }
            else if (node.left = NULL) continue ← false; // leaf reached: insert here
            else if (key < node.k1) node ← node.left; // take the left subtree
            else if (node has two keys and key < node.k2) node ← node.middle; // take the middle subtree
            else node ← node.right; // take the right subtree
        }//end while
    }//end if
    if (found){
        # the key is already in the tree!
    }else
        if (node has one key)
            # place the key in its proper position
        else{
            min ← minimum(node.k1, node.k2, key);
            mid ← middleFrom(node.k1, node.k2, key);
            max ← maximum(node.k1, node.k2, key);
            leftChild ← new Node23(); leftChild.k1 ← min;
            rightChild ← new Node23(); rightChild.k1 ← max;
            parent ← node.parent;
            if (parent = null) {
                parent ← new Node23();
                parent.k1 ← mid;
                parent.left ← leftChild;
                parent.right ← rightChild;
            }else{
                insert(parent, mid); // mid moves up; leftChild and rightChild take the place of node
            }
        }
}
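The split step of the insertion (choosing min, mid and max from the two stored keys and the new key) can be illustrated in isolation. The sketch below is an assumption-laden illustration: Split23 and split_leaf23 are hypothetical names, and only the key arithmetic is shown, not the relinking of nodes.

```c
/* Result of splitting a full leaf: two one-key leaves and the key
   that is promoted to the parent. Hypothetical helper type. */
typedef struct Split23 {
    int left_key;   /* smallest of the three keys, stays in the left leaf  */
    int promoted;   /* middle key, moves up into the parent                */
    int right_key;  /* largest of the three keys, goes to the right leaf   */
} Split23;

/* Orders k1, k2 and key with three compare-and-swap steps. */
Split23 split_leaf23(int k1, int k2, int key) {
    int a = k1, b = k2, c = key, t;
    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
    Split23 s = { a, b, c };
    return s;
}
```

For example, inserting 15 into a leaf holding 10 and 20 leaves 10 and 20 in the two new leaves and promotes 15.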

2-3 Trees

Splitting a leaf

2-3 Trees

procedure deleteFromLeaf (key x){
    # Locate the leaf L containing x and let v be the parent of L
    if (L has two keys){
        # delete key x
    }else{
        if (L has just key x){
            if (v is root){ // the number of levels is reduced by 1
                # delete v;
                # the lonely child becomes the new root;
            }else{
                if (next sibling of L has two keys){
                    # borrow a key from the rich sibling into the node containing x
                    # delete key x
                }else{
                    # merge x, the poor sibling and a key from v
                    # delete key x
                }
            }
        }
    }
}

2-3 Trees

procedure deleteFromInternalNode (key x){
    # Locate the internal node I containing x
    p ← predecessor key of x;
    x ← p; // key x in node I is replaced by value p
    deleteFromLeaf (p);
}

How to delete key 4?

B Trees

Definition. A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:

• Every node has at most m children.
• Every node (except the root and the leaves) has at least ⌈m/2⌉ children.
• The root has at least two children if it is not a leaf node.
• All leaves appear on the same level, and carry information.
• A non-leaf node with k children contains k–1 keys.
• The keys are stored in non-decreasing order.
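The bounds implied by the definition can be written down as small C helpers. This is only a sketch; the function names are hypothetical, and the minimum key count applies to non-root internal nodes.

```c
/* Bounds for a B-tree of order m (m = maximum number of children). */
int max_keys(int m)     { return m - 1; }             /* k children hold k - 1 keys */
int min_children(int m) { return (m + 1) / 2; }       /* ceil(m / 2) for m >= 1 */
int min_keys(int m)     { return min_children(m) - 1; }
```

With m = 5, a non-root internal node has between 3 and 5 children and therefore between 2 and 4 keys.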

B Trees

Sample B Tree of order 2

B Trees

typedef struct NodeB {
    int nr;                          /* number of keys stored in the node */
    int key[m+1];                    /* m is the order, a compile-time constant */
    struct NodeB *pchildren[m+1];
} pnode;

B Trees

PROCEDURE B-TREE-SEARCH (x, k)
begin
    i = 1
    while ( i <= n[x] and k > key_i[x] )
        do ( i <- i + 1 );
    if ( i <= n[x] and k = key_i[x] )
        then return (x, i)
    if ( leaf[x] )
        then return NIL;
    else
        Disk-Read(c_i[x])
        return B-Tree-Search(c_i[x], k);
end;
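An in-memory version of this search might look like the following C sketch. The names BNode, M and btree_search are assumptions made for the example, indices are 0-based instead of the 1-based pseudocode, and the Disk-Read step is omitted because all nodes are assumed to be in memory.

```c
#include <stdbool.h>
#include <stddef.h>

#define M 4  /* assumed order: at most M children and M - 1 keys per node */

typedef struct BNode {
    int nr;                    /* number of keys currently stored */
    int key[M - 1];            /* keys in non-decreasing order */
    struct BNode *child[M];    /* child[i] leads to keys below key[i]; unused in leaves */
    bool leaf;
} BNode;

/* Returns the node holding k and stores the key's index in *pos,
   or returns NULL when k is not in the tree. */
BNode *btree_search(BNode *x, int k, int *pos) {
    while (x != NULL) {
        int i = 0;
        while (i < x->nr && k > x->key[i])   /* find the first key >= k */
            i++;
        if (i < x->nr && k == x->key[i]) {
            *pos = i;
            return x;
        }
        if (x->leaf)
            return NULL;                     /* nowhere left to descend */
        x = x->child[i];                     /* descend into the i-th subtree */
    }
    return NULL;
}
```

The inner scan is linear; a binary search over the keys of a node would also work, since they are kept sorted.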

Searching key 38 in a B-Tree

The B-Tree search algorithm

B Trees

To insert a new element, find the leaf node where it belongs and insert it into that node with the following steps:
• If the node contains fewer than the maximum legal number of elements, then there is room for the new element. Insert the new element in the node, keeping the node's elements ordered.
• Otherwise, the node is full, so evenly split it into two nodes.
  • A single median key is chosen from among the leaf's elements and the new element.
  • Values less than the median are put in the new left node and values greater than the median are put in the new right node, with the median acting as a separation value.
  • Insert the separation value in the node's parent, which may cause it to be split, and so on. If the node has no parent (i.e., the node was the root), create a new root above this node (increasing the height of the tree).
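The split of a full node can be sketched on plain key arrays. This is an illustration under stated assumptions: MAX_KEYS, SplitResult and split_full_node are hypothetical names, the node capacity is fixed at 4 keys, and child pointers and the insertion of the median into the parent are left out.

```c
#include <string.h>

#define MAX_KEYS 4   /* assumed capacity of a node in this sketch */

typedef struct SplitResult {
    int left[MAX_KEYS];  int nleft;    /* keys smaller than the median */
    int median;                        /* separation value for the parent */
    int right[MAX_KEYS]; int nright;   /* keys greater than the median */
} SplitResult;

/* keys[] holds the MAX_KEYS sorted keys of a full node;
   newkey is the key being inserted. */
SplitResult split_full_node(const int keys[MAX_KEYS], int newkey) {
    int all[MAX_KEYS + 1];
    int i = 0, j;

    /* Merge newkey into a temporary sorted array of MAX_KEYS + 1 keys. */
    while (i < MAX_KEYS && keys[i] < newkey) { all[i] = keys[i]; i++; }
    all[i] = newkey;
    for (j = i; j < MAX_KEYS; j++) all[j + 1] = keys[j];

    SplitResult r;
    int mid = (MAX_KEYS + 1) / 2;      /* index of the median key */
    r.nleft = mid;
    r.nright = MAX_KEYS - mid;
    memcpy(r.left, all, (size_t)r.nleft * sizeof(int));
    r.median = all[mid];
    memcpy(r.right, all + mid + 1, (size_t)r.nright * sizeof(int));
    return r;
}
```

Splitting a node holding 1, 2, 3, 4 on insertion of 5 yields the left node 1, 2, the separation value 3 and the right node 4, 5.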

B Trees

B-tree after the insertion of key 5

B-tree after the insertion of keys 6 and 7

B-tree after the insertion of key 8

Task: Continue inserting keys 9, 10, 11, 12, 13, 14, 15, 16, 17

B Trees

PROCEDURE B-TREE-INSERT(T, k)
begin
    r = root[T]
    if ( n[r] == 2t – 1 ) {   // the root is full (t is the minimum degree of the tree)
        s <- Allocate-Node()
        root[T] = s
        leaf[s] = FALSE
        n[s] = 0
        c1[s] = r
        B-Tree-Split-Child(s, 1, r)
        B-Tree-Insert-Nonfull(s, k)
    } else
        B-Tree-Insert-Nonfull(r, k)
end;

B Trees

Detailed example of split when inserting key 16

B Trees

Deleting a Key. There are two popular strategies for deletion from a B-Tree.

• locate and delete the item, then restructure the tree to regain its invariants

• do a single pass down the tree, but before entering (visiting) a node, restructure the tree so that once the key to be deleted is encountered, it can be deleted without triggering the need for any further restructuring

B Trees

The necessary steps for deleting from a leaf node are:
• Step 1. Search for the value to delete.
• Step 2. If the value is in a leaf node, it can simply be deleted from the node.
• Step 3. If underflow happens, check the siblings to either transfer a key or fuse the siblings together.
• Step 4. If the deletion happened in the right child, retrieve the maximum value of the left child, provided this causes no underflow in the left child. In the opposite situation, retrieve the minimum element of the right child.

B Trees

Deletion of key 3 from a B-Tree of order 2.

B Trees

Deletion of key 19 (from the root of the tree)

B Trees

If a key is deleted from a node that holds only the minimum number of keys (n/2), the rebalancing steps are:
• If the right sibling has more than the minimum number of elements
  – Get a key from it (rotated through the separator in the parent).
• Otherwise, if the left sibling has more than the minimum number of elements
  – Get a key from it in the same way.
• If both immediate siblings have only the minimum number of elements
  – Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes. [this is a merge operation]
  – Remove the separator from the parent, and replace the two children it separated with the combined node.
  – If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient.
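The first rebalancing case, taking a key from a richer right sibling, can be sketched for leaf nodes as below. The names Leaf, CAP and borrow_from_right are hypothetical, and the sketch assumes the usual rotation in which the parent's separator moves down and the sibling's smallest key moves up.

```c
#define CAP 4   /* assumed key capacity of a leaf in this sketch */

typedef struct Leaf {
    int n;          /* number of keys in use */
    int key[CAP];   /* keys in non-decreasing order */
} Leaf;

/* Rotates one key from the right sibling into the deficient leaf through
   the separator *sep stored in the parent. Returns 1 on success, or 0 when
   the sibling holds only min_keys keys and cannot spare one. */
int borrow_from_right(Leaf *deficient, int *sep, Leaf *right, int min_keys) {
    if (right->n <= min_keys || deficient->n >= CAP)
        return 0;
    deficient->key[deficient->n++] = *sep;   /* separator moves down */
    *sep = right->key[0];                    /* sibling's smallest key moves up */
    for (int i = 1; i < right->n; i++)       /* close the gap in the sibling */
        right->key[i - 1] = right->key[i];
    right->n--;
    return 1;
}
```

When the function returns 0 for both siblings, the merge case of the list applies instead.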

B Trees

Delete key 17 from previous slide (#24)

Task: Delete 13, 12, etc.

TRIE Trees

• TRIE, (or prefix tree), is an ordered multi-way tree data structure that is used to store strings over an alphabet.

• The term TRIE comes from "retrieval."

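#include <stdbool.h>

#define NR 27 // the English alphabet (26 letters) plus blank

typedef struct TrieNode {
    bool NotLeaf;                    // true for internal nodes, false for leaves that store a word
    struct TrieNode *pChildren[NR];  // pChildren[26] is the blank (end-of-word) branch
    char word[20];
} TrieNode;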

TRIE Trees

The structure of a TRIE node

TRIE Trees

Sample TRIE tree

TRIE Trees

The search algorithm involves the following steps:
1. For each character in the string, see if there is a child node with that character as the content.
2. If that character does not exist, return false.
3. If that character exists, repeat step 1.
4. Do the above steps until the end of the string is reached.
5. When the end of the string is reached, return true if the marker (NotLeaf) of the current node is set to false; otherwise return false.
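The steps above can be sketched in C on a simplified trie. This is not the node layout used in these slides: SimpleTrie stores one letter per level and uses a hypothetical is_word flag as the end-of-word marker instead of NotLeaf and the stored word. The helper trie_add exists only to build a trie for the example, and only lowercase letters 'a' to 'z' are assumed.

```c
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26   /* lowercase 'a'..'z' only, an assumption of this sketch */

typedef struct SimpleTrie {
    bool is_word;                         /* hypothetical end-of-word marker */
    struct SimpleTrie *child[ALPHABET];
} SimpleTrie;

/* Allocates an empty node; returns NULL if allocation fails. */
SimpleTrie *trie_new(void) {
    SimpleTrie *t = malloc(sizeof *t);
    if (t != NULL) {
        t->is_word = false;
        for (int i = 0; i < ALPHABET; i++)
            t->child[i] = NULL;
    }
    return t;
}

/* Adds s letter by letter. Returns false on an invalid character
   or on allocation failure. */
bool trie_add(SimpleTrie *root, const char *s) {
    for (; *s; s++) {
        if (*s < 'a' || *s > 'z')
            return false;
        int c = *s - 'a';
        if (root->child[c] == NULL && (root->child[c] = trie_new()) == NULL)
            return false;
        root = root->child[c];
    }
    root->is_word = true;
    return true;
}

/* Steps 1 to 5: follow one child per character, fail on a missing child,
   and check the marker once the end of the string is reached. */
bool trie_find(const SimpleTrie *root, const char *s) {
    for (; root != NULL && *s; s++) {
        if (*s < 'a' || *s > 'z')
            return false;
        root = root->child[*s - 'a'];
    }
    return root != NULL && root->is_word;
}
```

After adding "tree", "trie" and "tr", a lookup of "tr" succeeds because its last node carries the marker, while "tri" fails even though the path exists.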

TRIE Trees

procedure FIND(trie, keyWord)
begin
    if ( trie == NULL ) then
        return FALSE
    else
        next = index = trie
        count = 0
        while ( index->NotLeaf and count < length(keyWord) and
                index->pChildren[keyWord[count]-'a'] <> NULL )
        do {
            next = index->pChildren[keyWord[count]-'a']
            index = next
            count = count + 1
        } //end while
        if ( next == NULL ) then
            return FALSE
        else {
            data <- next
            if ( data->word == keyWord ) then
                return TRUE
            else {
                // the word may be stored under the blank branch
                if ( data->pChildren[26] <> NULL and data->pChildren[26]->word == keyWord ) then
                    return TRUE
                else
                    return FALSE
            }
        }
end

TRIE Trees

Sample search in a TRIE tree

TRIE Trees

Insertion steps:
• Find the place of the item by following the letters of the word.
• If there is nothing there, just insert the item as a leaf node.
• If there is already something in the leaf node, it becomes a new internal node. Build a new subtree or subtrees under that internal node, depending on how the item to be inserted and the item that was in the leaf node differ.
• Create new leaf nodes to store the item that was to be inserted and the item that was originally in the leaf node.

TRIE Trees

procedure Insert(trie, keyWord)
begin
    lenght = length(keyWord)
    next = trie;
    if ( trie == NULL ) then // if empty TRIE
        trie = create empty internal node
        new_leaf = create leaf with keyWord
        trie->pChildren[keyWord[0]-'a'] = new_leaf // add the leaf into the trie
        exit
    else // non-empty trie: start searching
        index = next
        inWordIndex = 0

TRIE Trees

… procedure Insert(trie, keyWord) … continued …

// move down in the trie while the end of the word isn't reached and the pChildren branch
// doesn't lead to NULL
while ( inWordIndex < lenght and index->NotLeaf == true and
        index->pChildren[keyWord[inWordIndex]-'a'] <> NULL ) do {
    // go down one level
    parent = next; // set the current node as parent
    // the current node goes down one level in the trie, following the pChildren field
    // corresponding to the current letter of keyWord
    next = index->pChildren[keyWord[inWordIndex]-'a'];
    index <- next;
    inWordIndex = inWordIndex + 1 // move right in the word by one letter
} //end while

TRIE Trees

… procedure Insert(trie, keyWord) … continued …

// if the pChildren branch points to NULL (end of prefix is reached) and no word is already
// inserted there, simply insert the word
if ( inWordIndex < lenght and index->pChildren[keyWord[inWordIndex]-'a'] = NULL and
     index->NotLeaf == true ) then
    new_index = NewLeaf(keyWord)
    index->pChildren[keyWord[inWordIndex]-'a'] = new_index
    exit
else
    data = next
if ( data->word == keyWord ) then
    print "Word already exists in trie !!!"
else {
    // store in oldChildren the subtree that derives from the same prefix as keyWord
    oldChildren = parent->pChildren[keyWord[inWordIndex-1]-'a']
    newWord = NewLeaf(keyWord)
    prefixLenght = length(keyWord)
    if ( data->word[0] <> '\0' ) then
        if ( length(data->word) < prefixLenght ) then // determine the minimum length of the words
            prefixLenght = length(data->word)
} //end if

TRIE Trees

… procedure Insert(trie, keyWord) … continued …

createIntern = false
// Build a new subtree while the word to be inserted and the item that was in the leaf node
// have the same letters, or until the end of one of them is reached.
while ( inWordIndex <= prefixLenght and
        ( ( data->word[0] <> '\0' and data->word[inWordIndex-1] = keyWord[inWordIndex-1] ) or
          data->word[0] == '\0' ) ) do {
    intern = NewIntern()
    // insert this node in the corresponding field of parent->pChildren
    // (with respect to the letter's index in the array)
    parent->pChildren[keyWord[inWordIndex-1]-'a'] = intern
    parent->NotLeaf = true
    parent = intern; // move down one level in the tree
    inWordIndex = inWordIndex + 1 // move right in the word by one letter
    createIntern = true
} //end while

TRIE Trees

… procedure Insert(trie, keyWord) … continued …

if ( createIntern ) then
    inWordIndex <- inWordIndex - 1
// if the items have a common prefix
if ( inWordIndex <> prefixLenght or
     (inWordIndex = prefixLenght and length(keyWord) = length(data->word)) ) then
    // store in leaves the item that was to be inserted and the item that was originally
    // in the leaf node
    parent->pChildren[data->word[inWordIndex]-'a'] = oldChildren
    parent->pChildren[keyWord[inWordIndex]-'a'] = newWord

TRIE Trees

else // one word (keyWord or an item that was originally in the leaf node) is a prefix
     // of the other item(s)
    if ( data->word[0] <> '\0' ) then {
        // just one word that has keyWord as a prefix, or vice versa:
        // insert the items as information nodes under the pChildren fields for the letter
        // at position prefixLenght and for the blank character
        if ( length(data->word) <= prefixLenght ) then
            parent->pChildren[26] = oldChildren
            parent->pChildren[keyWord[prefixLenght]-'a'] = newWord
        else
            parent->pChildren[26] = newWord
            parent->pChildren[data->word[prefixLenght]-'a'] = oldChildren
        //end if
    }
    else {
        // two or more words that have the same prefix
        for (int count = 0; count < 27; count++) // copy the subtree
            parent->pChildren[count] = oldChildren->pChildren[count]
        // newWord is the prefix (saved in the blank pointer)
        parent->pChildren[26] = newWord
    } //end if
//end if
exit
end

TRIE Trees

Task: Insert “AEROSMITH”

TRIE Trees