Lists, Hash Tables, Trees
CS 1302
Fall 1999
Contents of Lecture
• Linked lists
– Linked lists in general
– What classes are needed to implement?
– Code example
– Linked lists versus arrays: when to use which?
• Hash table basics
• Binary search trees
Linked Lists in Java
We will now show the basics for a Java linked list holding data elements of some specific class.
Our example:
• The linked list itself is of class RingList. It’s a class, so we may create any number of instances of RingList.
• Every object of class RingList contains nodes of class RingNode, each of which manages data about a single ring.
• Note that the list itself is one class, the items in that list are of another class, and the program using them is a third class.
[Figure: 1 instance of RingList whose head references a chain of 2 instances of RingNode, each holding a width.]
The RingNode Class
class RingNode {
    private int width;
    private RingNode next;

    public RingNode (int width) {
        this.width = width;
        next = null; // reserved word for nonexistent object
    }

    public RingNode getNext () { return next; }

    public void setNext (RingNode next) { this.next = next; }

    // prints this ring's width
    public void print () { System.out.println (width); }
}//class RingNode
The List Class
class RingList {
    private RingNode head;

    RingList () { head = null; }

    // appends newRing at the end of the list
    public void add (RingNode newRing) {
        RingNode currRing;
        if (head == null)
            head = newRing;
        else {
            currRing = head;
            while (currRing.getNext() != null)
                currRing = currRing.getNext();
            currRing.setNext (newRing);
        }
    }//add

    public void print () { printHelper (head); }//print

    // recursively prints each node from probe to the end of the list
    private void printHelper (RingNode probe) {
        if (probe != null) {
            probe.print ();
            printHelper (probe.getNext());
        }
    }//printHelper
}//class RingList
The class that uses the list
public class ListUser {
    public static void main (String args[ ]) {
        RingList rings = new RingList ();
        /* the RingList.add method can do the "new RingNode"
           for you. Which is better, do you think? */
        rings.add (new RingNode (6));
        rings.add (new RingNode (5));
        rings.add (new RingNode (4));
        rings.add (new RingNode (3));
        rings.print ();
    }//main
}//class ListUser
[Figure: rings, whose head references the node holding 6, followed by the node holding 5.]
Using Linked Lists in Java
Three crucial things to remember:
1. No pointers in Java. Manipulate references
• main difference in code from 1301/1501: no need for de-referencing operator ( ^ )
2. No direct access to data:
• assignment of references only
• data access via accessor/modifier methods
3. Multiple classes required:
• A type of node is defined in one class.
• A list that uses nodes is defined in another class.
• The program that uses the list is in yet another class.
Q: Why bother with different Collection types?
Designing is largely choosing among alternatives based on design goals
Arrays vs. Linked Lists
Same Cost/Benefit Trade-offs as in Pseudocode:
Arrays are statically sized, so you have to commit to a length in advance.
Linked lists are dynamically sized, so you can hedge your bets.

Arrays require O(N) work to insert at the front (need to shift the rest).
Linked lists require O(1) (constant) work to insert at the front (need to “splice” in place).

Arrays provide random access to each data element.
Linked lists require that you traverse the list to reach the node you are searching for.

Arrays require space for data objects.
Linked lists also require space for “next” references.
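The front-insertion trade-off above can be sketched as follows. This is only an illustration: the names FrontInsertDemo, IntNode, insertFrontArray and insertFrontList are hypothetical and do not come from the lecture's code, and the array version assumes a free slot is still available.

```java
// Hypothetical demo: inserting at the front of an array vs. a linked list.
class FrontInsertDemo {

    // Minimal singly linked node (illustrative; not the lecture's RingNode).
    static class IntNode {
        int value;
        IntNode next;
        IntNode(int value, IntNode next) { this.value = value; this.next = next; }
    }

    // Array: every existing element shifts right by one slot, so O(N) work.
    // Assumes count < data.length, i.e. the array still has a free slot.
    static int insertFrontArray(int[] data, int count, int value) {
        for (int i = count; i > 0; i--) {
            data[i] = data[i - 1];
        }
        data[0] = value;
        return count + 1; // new number of used slots
    }

    // Linked list: splice a new node in front of the old head, so O(1) work.
    static IntNode insertFrontList(IntNode head, int value) {
        return new IntNode(value, head);
    }

    public static void main(String[] args) {
        int[] data = new int[4];   // length committed in advance
        int count = 0;
        IntNode head = null;       // grows one node at a time
        for (int v = 1; v <= 3; v++) {
            count = insertFrontArray(data, count, v);
            head = insertFrontList(head, v);
        }
        System.out.println(java.util.Arrays.toString(data)); // [3, 2, 1, 0]
        for (IntNode n = head; n != null; n = n.next) {
            System.out.print(n.value + " ");                 // 3 2 1
        }
        System.out.println();
    }
}
```

Note that the array carries an unused fourth slot, while the list pays for one extra reference per node, which matches the space trade-off in the last pair above.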
Linked Structures & Dynamic Memory Allocation: Summary Points
Problems with static memory allocation
• you run out of memory if you underestimate
• you waste memory if you overestimate
• you have to manage the memory manually (e.g., closing gaps in the data structure)
Dynamic memory allocation
• all data (objects) is allocated in the heap; you work with references to those objects
• you request new memory when you need it (using "new")
• Java collects memory that you don't need any more and returns it to the heap (garbage collection)
Examples
• linked lists, ordered lists, queues, stacks, doubly-linked lists, etc.
• trees, binary trees, binary search trees, height-balanced binary search trees, graphs (to come)
Lists: Summary
• Linked lists
– Class that uses the list (e.g. driver)
– Class representing the list itself (NodeList)
– Class representing nodes (Node)
    • One or more data fields
    • One field is “pointer” to next node: type is Node
– null is “zero”/non-existent object
    • Never reference a null object!!!
• Lists vs. arrays: Design as choosing among options
– Early commitment to length in arrays
– Insertion cost constant for lists
– Space overhead for next field in list
• Hashtable basics
• Binary search trees
Contents of Lecture
• Linked lists
• Hash table basics
– Purpose
– Hash functions and collisions
– Collision resolution strategies
• Binary search trees
Hashing
Problem: How to quickly access items in a list?
Possible Solution: Hashing
Idea: Shrink the address space to fit the population size.
For example: Use some function to reduce the address space of a billion possible SocSecNums to the population size of 100 students.
Hash Function: The function by which you shrink the address space, e.g.,
index = SocSecNum % 100
Hash Functions
The Perfect Hash Function:
• would be very fast (used for all data access)
• would return a unique result for each key, i.e., would result in zero collisions
• in the general case, a perfect hash doesn’t exist (we can create one for a specific population, but as soon as that population changes... )
• provides an ideal point of reference
Common Hash Functions:
• Digit selection: e.g., last 4 of phone num
• Division: modulo
• Character keys: use ASCII num values for chars (e.g., ‘R’ is 82)
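The three common hash functions listed above might look like this in Java. This is a rough sketch: the class and method names are hypothetical, keys are assumed to be non-negative, and summing the character codes is just one simple way to use them (the lecture does not say how they are combined).

```java
// Hypothetical helpers illustrating digit selection, division, and character keys.
class HashFunctionDemo {

    // Digit selection: keep only the last 4 digits of a phone number.
    static int lastFourDigits(long phoneNumber) {
        return (int) (phoneNumber % 10000);
    }

    // Division: reduce a non-negative key to a table index with modulo.
    static int divisionHash(long key, int tableSize) {
        return (int) (key % tableSize);
    }

    // Character keys: add up the numeric character codes, then reduce with modulo.
    static int characterHash(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(lastFourDigits(4045551234L));   // 1234
        System.out.println(divisionHash(123456789L, 100)); // 89
        System.out.println(divisionHash(987654389L, 100)); // 89 as well: a collision
        System.out.println((int) 'R');                     // 82
        System.out.println(characterHash("R", 100));       // 82
    }
}
```

The two keys that both map to 89 show why a collision strategy is needed as soon as the key space is larger than the table.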
Cost of Hash
Two costs of hashing:
1. Loss of natural order
• side effect of desired random shrinking
• lose any ordering of original indices
2. Collisions will occur
• no perfect hash function
• when (not “if”) a collision occurs, how to handle it?
Collision Resolution strategies:
• Multiple record buckets: small for each index, but . . .
• Open address methods: look for next open address, but . . .
• Coalesced chaining: use cellar for overflow (~34..40% of size)
• External chaining: linked list at each location
Collision Resolution
Technique: Multiple element buckets
• Idea: have extra spaces there for overflow
• if population of 8, and if hash function of mod 8, then:
[Figure: table with slots 0 to 7, each with three columns: 1st hash, 1st collision, 2nd collision.]
Problems: using 3N space; “what if 3rd collision at any one locale?”

Technique: Open address methods
• Idea: upon collision, look for an empty spot
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

Slot  Item  Hashes to  Note
0     D     2          D belongs at 2, but C already there
1     W     1
2     C     1          W already at 1, so C to next available slot
3     X     3
4     Y     4
5     Z     3          X already at 3, so Z to next available slot
6     A     6
7     B     5          B belongs at 5, but Z already there

Problem: Deteriorates to an unsorted list (i.e., O(N) search)
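The open address example above can be replayed with a short sketch. It assumes the simplest probing rule (try the next slot, wrapping around at the end of the table), which is consistent with the slot assignments shown; the class and method names are hypothetical, and the hash values are simply the ones given on the slide.

```java
// Sketch of open addressing with linear probing, replaying the slide's example.
class LinearProbingDemo {

    // Place each key at its hash index, or at the next free slot (wrapping around).
    // Assumes keys.length <= tableSize, so a free slot always exists.
    static String[] fill(String[] keys, int[] hashes, int tableSize) {
        String[] table = new String[tableSize];
        for (int k = 0; k < keys.length; k++) {
            int index = hashes[k] % tableSize;
            while (table[index] != null) {       // collision: probe the next slot
                index = (index + 1) % tableSize;
            }
            table[index] = keys[k];
        }
        return table;
    }

    public static void main(String[] args) {
        String[] keys = {"W", "X", "Y", "Z", "A", "B", "C", "D"}; // arrival order
        int[] hashes  = { 1,   3,   4,   3,   6,   5,   1,   2 };  // from the slide
        String[] table = fill(keys, hashes, 8);
        for (int i = 0; i < table.length; i++) {
            System.out.println(i + " " + table[i]);
        }
        // Slots 0..7 end up holding D, W, C, X, Y, Z, A, B.
    }
}
```

D is the clearest case of the deterioration: it hashes to 2 but is stored in slot 0 after probing six occupied slots, so finding it again costs a near-linear scan.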
Collision Resolution
Technique: Coalesced chaining:
• Idea: have small extra “cellar” to handle collision
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

Slot  Item  Hashes to  Link
0
1     W     1          9
2     D     2
3     X     3          10
4     Y     4
5     B     5
6     A     6
7
8                                (cellar)
9     C     1                    (cellar)
10    Z     3                    (cellar)

Cellar bottom is now 8.
Works well with cellar of 35..40% of N if good hash function; cellar can overflow if need be
Collision Resolution
Technique: External chaining:
• Idea: have pointers to all items at given hash, handle collision as normal event.
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

Slot  Chain
0
1     W (hashes to 1) -> C (hashes to 1)
2     D (hashes to 2)
3     X (hashes to 3) -> Z (hashes to 3)
4     Y (hashes to 4)
5     B (hashes to 5)
6     A (hashes to 6)
7
Hashing with Chaining: Example
public class HashChain {
    private Node[] bucket;
    private int TableSize;

    public HashChain(int TableSize) {
        this.TableSize = TableSize;
        bucket = new Node[TableSize];
        // each bucket starts with an empty header node
        for (int i = 0; i < TableSize; i++)
            bucket[i] = new Node();
    } // HashChain

    private int getHashKey(int newElement) {
        return newElement % TableSize;
    } // getHashKey

    public void addElement (int newElement) {
        int index = getHashKey(newElement);
        bucket[index].insertNode(newElement);
    } // addElement

    public Node getElement(int iData) {
        int index = getHashKey(iData);
        Node item = bucket[index].locateNode(iData);
        return item;
    } // getElement
} // HashChain

public class Node {
    int iData;
    Node nextNode;

    // creates the empty header node that starts each bucket's chain
    public Node() { }

    public Node(int iData) { this.iData = iData; }

    // appends a new node holding iData to the end of this chain
    public void insertNode(int iData) { insertNode (iData, this); }

    public void insertNode(int iData, Node current) {
        if (current.getNextNode() == null)
            current.setNextNode (new Node(iData));
        else
            insertNode (iData, current.getNextNode());
    }

    // searches the chain after this header node; returns null if iData is absent
    // (starting at nextNode keeps the header's default iData of 0 from matching)
    public Node locateNode(int iData) {
        if (nextNode == null)
            return null;
        return locateNode(iData, nextNode);
    }

    public Node locateNode (int iData, Node current) {
        if (iData == current.getData())
            return current;
        else if (current.getNextNode() == null)
            return null;
        else
            return locateNode (iData, current.getNextNode());
    }

    public int getData() { return iData; }

    public Node getNextNode() { return nextNode; }

    public void setNextNode(Node nextNode) { this.nextNode = nextNode; }

    public String toString() { return "Node: " + iData; }
}
class Driver {
    public static void main(String arg[]) {
        HashChain hash = new HashChain(10);

        for (int i = 0; i < 100; i++) {
            hash.addElement(i);
        } // for

        for (int i = 0; i < 100; i++) {
            System.out.println (hash.getElement(i));
        } // for
    } // main
} // Driver
Summary of Hash Tables
• Purpose: Fast searching of lists by reducing address space to approx. population size.
• Hash function: the reduction function
• Collision: hash(a) == hash(b), but a != b
• Collision resolution strategies
– Multiple element buckets still risk collisions
– Open addressing quickly deteriorates to unordered list
– Chaining is most general solution
Contents of Lecture
• Linked lists
• Hash table basics
• Binary search trees
– BSTs in Java
– Balancing a BST
– Search strategies
• and there’s a bit more coming up, so don’t pack up yet
Trees--Definitions
Recall our previous discussion of Trees.
Defined: A tree is a multiply-linked data structure
[Figure: tree with root "fox", children "dog" and "hen", and leaves "cat", "elf", "hat", "hog"; labels mark a leaf (terminal), an internal node (nonterminal), a branch, the height, and a path.]

Binary Trees
Binary Trees: same basic idea as from Pseudocode.
[Figure: binary tree with nodes A to F.]
A few neat properties of binary trees:
* There exists exactly one path between any two nodes.
* A tree with N nodes has N-1 edges.
* A full binary tree with N internal nodes has N+1 external nodes.
* The height of a full binary tree with N nodes is about log2 N.
Binary Trees: Part of a Class for the Tree’s Nodes
class TreeNode {
    private SomeClass dataObject; // reference to data
    private TreeNode left;        // reference to left subtree
    private TreeNode right;       // reference to right subtree

    // constructor for leaf node with null references
    public TreeNode(SomeClass newObject) {
        this( newObject, null, null );
    }

    // general constructor called by this(...) above; assumed, not shown on the slide
    public TreeNode(SomeClass newObject, TreeNode left, TreeNode right) {
        dataObject = newObject;
        this.left = left;
        this.right = right;
    }

    // accessor: returns reference to current data object
    public SomeClass getObject( ) {
        return dataObject;
    }

    // accessor: returns reference to left subtree
    public TreeNode getLeft( ) {
        return left;
    }

    // accessor: returns reference to right subtree
    public TreeNode getRight( ) {
        return right;
    }
} // class TreeNode
[Figure: a TreeNode, holding dataObject plus left and right references.]
Insertion into a Binary Search Tree
In Class TreeNode:
public void insert (SomeClass newObject) {
    if ( newObject.lessThan( dataObject ) ) {
        if ( left == null )
            left = new TreeNode(newObject);
        else
            left.insert( newObject );
    }
    else // treating duplicates as if they’re greaterThan
    {
        if ( right == null )
            right = new TreeNode (newObject);
        else
            right.insert( newObject );
    }
} // insert
More Insertion, plus Traversal
In Class Tree:
public class Tree {
    private TreeNode root;

    public Tree( ) {
        root = null;
    }

    public void insertInBST (SomeClass newObject) {
        if (root == null)
            root = new TreeNode (newObject);
        else
            root.insert(newObject );
    } // insertInBST

    public void inorderTraversal( ) {
        inorderHelper (root);
    }

    // TreeNode's fields are private, so go through its accessors
    public void inorderHelper (TreeNode node) {
        if (node != null) {
            inorderHelper (node.getLeft());
            System.out.println (node.getObject());
            inorderHelper (node.getRight());
        } // if
    } // inorderHelper
} // class Tree
[Figure: 2 possible Trees: one whose root is null, and one whose root references a TreeNode holding a dataObject.]
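Because SomeClass and its lessThan method are never defined in the lecture, the insertion and traversal code above cannot be run as shown. The following sketch mirrors the same logic with plain int keys; IntBstDemo and IntTreeNode are hypothetical names, and returning the traversal as a list (instead of printing inside the helper) is a choice made here to keep the example easy to check.

```java
// Self-contained sketch of BST insertion and inorder traversal using int keys.
import java.util.ArrayList;
import java.util.List;

class IntBstDemo {

    static class IntTreeNode {
        int key;
        IntTreeNode left, right;
        IntTreeNode(int key) { this.key = key; }

        // Same shape as the lecture's insert: smaller keys go left,
        // equal or larger keys go right.
        void insert(int newKey) {
            if (newKey < key) {
                if (left == null) left = new IntTreeNode(newKey);
                else left.insert(newKey);
            } else {
                if (right == null) right = new IntTreeNode(newKey);
                else right.insert(newKey);
            }
        }
    }

    private IntTreeNode root;

    void insertInBST(int key) {
        if (root == null) root = new IntTreeNode(key);
        else root.insert(key);
    }

    // Inorder traversal: left subtree, then the node, then right subtree.
    List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        inorderHelper(root, out);
        return out;
    }

    private void inorderHelper(IntTreeNode node, List<Integer> out) {
        if (node != null) {
            inorderHelper(node.left, out);
            out.add(node.key);
            inorderHelper(node.right, out);
        }
    }

    public static void main(String[] args) {
        IntBstDemo tree = new IntBstDemo();
        for (int k : new int[] {44, 12, 74, 9, 25, 64, 25}) {
            tree.insertInBST(k);
        }
        System.out.println(tree.inorder()); // [9, 12, 25, 25, 44, 64, 74]
    }
}
```

An inorder traversal of a binary search tree always visits the keys in ascending order, whatever order they were inserted in.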
Binary Tree Balancing
If unbalanced, a binary tree can become a linked list:
[Figure: nodes A to F linked in a single chain. A tree shaped like a linked list creates worst-case search time.]
In the best case, tree search time is O(log N).
Problem: As the tree grows out of balance, search time deteriorates: in the worst case, search time is O(N).
Solved by keeping the tree in balance!
Binary Tree Balancing
Two general methods of tree balancing
1) Rebalance from sorted list.
E.g., perform inorder traversal followed by tree reconstruction of sorted array.
Cost: O(N) time to reconstruct global tree
2) Create ‘almost balanced’ binary tree and use local tree balancing (AVL trees, covered in algorithms courses)
Cost: O(log N) insertion time.
Binary Tree Balancing
Reconstruction Technique:
Place nodes in array (inorder traversal).
Sort the array
Take midpoint as new tree head.
Take midpoint of each remaining half as the left and right child; repeat
Combine into a recursive call
[Figure: sorted array 9 12 14 22 25 31 44 47 64 74 84, with the midpoint labeled "New Head" and the midpoints of the two halves labeled "Left & Right Children".]
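The reconstruction technique above can be sketched as a single recursive method. The names RebuildDemo, IntTreeNode, build and height are hypothetical, int keys stand in for the lecture's data objects, and the sample values are the ones that appear in the slide's figure (read here as one 11-element sorted array).

```java
// Sketch of the reconstruction technique: build a balanced tree from a sorted array.
class RebuildDemo {

    static class IntTreeNode {
        int key;
        IntTreeNode left, right;
        IntTreeNode(int key) { this.key = key; }
    }

    // The midpoint of sorted[lo..hi] becomes the subtree's head; each remaining
    // half is rebuilt the same way by the two recursive calls.
    static IntTreeNode build(int[] sorted, int lo, int hi) {
        if (lo > hi) return null;               // empty range: no subtree
        int mid = lo + (hi - lo) / 2;
        IntTreeNode node = new IntTreeNode(sorted[mid]);
        node.left = build(sorted, lo, mid - 1);
        node.right = build(sorted, mid + 1, hi);
        return node;
    }

    // Height counted in nodes along the longest path from this node down.
    static int height(IntTreeNode node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        int[] sorted = {9, 12, 14, 22, 25, 31, 44, 47, 64, 74, 84};
        IntTreeNode root = build(sorted, 0, sorted.length - 1);
        System.out.println(root.key);       // 31 (new head)
        System.out.println(root.left.key);  // 14 (left child)
        System.out.println(root.right.key); // 64 (right child)
        System.out.println(height(root));   // 4
    }
}
```

Each array element is visited once, which is where the O(N) reconstruction cost comes from.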
DFS: Simple Example
A DFS of a binary tree is similar to a recursive pre-order traversal, except that:
1. Pre-order visits the entire tree, while DFS stops at a goal node.
2. The right subtree gets visited only if the left fails to hold the goal node.
[Figure: tree with root A; A's children are B and C; B's children are D and E; F is a child of C.]
To find C, traverse: A-B-D-E-C (but not F)
To find E, traverse: A-B-D-E, etc.
Tree Searching: Recursive
public boolean searchTree(Node item, Node current) {
    boolean found = false;
    if (current == null)            // terminates recursive call
        found = false;
    else if (current.equals(item))
        found = true;
    else if (searchTree(item, current.getLeft()))
        found = true;
    else if (searchTree(item, current.getRight()))
        found = true;
    else
        found = false;
    return found;
} // searchTree
Note: In a binary search tree, one achieves improved search performance by comparing the item to the current node and limiting the search to the left or right subtree.
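The note about binary search trees can be illustrated with a short sketch. It uses int keys and hypothetical names (BstSearchDemo, IntTreeNode, contains); the point is only that each comparison discards one subtree, so the search follows a single path instead of exploring both sides.

```java
// Sketch of searching a binary search tree: compare, then descend one side only.
class BstSearchDemo {

    static class IntTreeNode {
        int key;
        IntTreeNode left, right;
        IntTreeNode(int key, IntTreeNode left, IntTreeNode right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    static boolean contains(IntTreeNode current, int target) {
        if (current == null) return false;           // ran off the tree: not present
        if (target == current.key) return true;      // found it
        if (target < current.key) {
            return contains(current.left, target);   // can only be on the left
        }
        return contains(current.right, target);      // can only be on the right
    }

    public static void main(String[] args) {
        // Hand-built search tree:      31
        //                            /    \
        //                          14      64
        //                         /  \    /  \
        //                        9   25  44   84
        IntTreeNode root = new IntTreeNode(31,
            new IntTreeNode(14, new IntTreeNode(9, null, null), new IntTreeNode(25, null, null)),
            new IntTreeNode(64, new IntTreeNode(44, null, null), new IntTreeNode(84, null, null)));
        System.out.println(contains(root, 44)); // true
        System.out.println(contains(root, 50)); // false
    }
}
```

This only works when the tree respects the search-tree ordering; for an arbitrary binary tree, the general search above has to try both subtrees.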
Contents of Lecture
• Linked lists
• Hash table basics
• Binary search trees
– Terms, revisited: branch, leaf, etc.
– Java example for Node class (cf. List Node)
– Search cost degenerates to that for list unless tree is balanced
– Depth-first search is like preorder traversal, but terminates when target node is found
• and there really is a bit more coming up, so don’t pack up just yet
Designing with (== choosing among) Data Structures
Example: Maintaining Very Large Distributed Databases
Issues
How fast can we search?
How fast can we insert and delete entries?
How large a data structure do we need?
How large a data structure do we need in main memory to work with?
Very Large Databases
• How large is an English dictionary?
/usr/dict/words: 25,000 words
ARTFL Project, Webster’s Revised Unabridged Dictionary, 1913 Edition: 110,000 words (http://humanities.uchicago.edu/forms_unrest/webster.form.html)
• How large is the web?
• How fast can we search databases this size?
Searching and Maintaining Very Large Databases
How fast are these techniques?

Data representation            Search Algorithm
Sorted Array                   Binary Search
Ordered Linked List            Linear Search
Binary Search Tree             Tree Search
Balanced Binary Search Tree    Tree Search

O(log n), but how long do they really take on real machines?
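One rough way to compare the first two rows of the table is to count how many elements each search examines. The sketch below is an assumption-laden illustration, not a benchmark: the names are hypothetical, an int array stands in for both the sorted array and the ordered linked list, and the size 110,000 is borrowed from the dictionary figure quoted earlier.

```java
// Sketch comparing probe counts: binary search on a sorted array vs. linear search.
class SearchCostDemo {

    // Number of elements binary search examines before finding target
    // (or before giving up). Assumes sorted is in ascending order.
    static int binarySearchProbes(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;
            if (sorted[mid] == target) return probes;
            if (sorted[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return probes;
    }

    // Number of elements a front-to-back scan examines, as a linear search
    // of an ordered linked list would.
    static int linearSearchProbes(int[] sorted, int target) {
        int probes = 0;
        for (int value : sorted) {
            probes++;
            if (value >= target) break; // found it, or passed where it would be
        }
        return probes;
    }

    public static void main(String[] args) {
        int n = 110_000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = 2 * i;  // ascending even numbers
        int last = sorted[n - 1];
        System.out.println("binary: " + binarySearchProbes(sorted, last));
        System.out.println("linear: " + linearSearchProbes(sorted, last));
    }
}
```

For the last element, the linear scan examines all 110,000 entries while binary search needs at most 17, since 2^17 exceeds 110,000. Probe counts ignore memory and disk effects, which is exactly the gap the question about real machines points at.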