Lists, Hash Tables, Trees CS 1302 Fall 1999. Contents of Lecture Linked listsLinked lists –Linked...


Lists, Hash Tables, Trees

CS 1302

Fall 1999

Contents of Lecture

• Linked lists
  – Linked lists in general
  – What classes are needed to implement?
  – Code example
  – Linked lists versus arrays: when to use which?
• Hash table basics
• Binary search trees

Linked Lists in Java

We will now show the basics for a Java linked list holding data elements of some specific class.

Our example:

• A linked list of class RingList. It’s a class, so we may create any number of instances of RingList.

• Every object of class RingList contains nodes of class RingNode, each of which manages data about a single ring.

• Note that the list itself is one class, the items in that list are of another class, and the program using them is a third class.

[Figure: 1 instance of RingList, whose head references a chain of 2 instances of RingNode, each holding a width]

The RingNode Class

class RingNode {

    private int width;
    private RingNode next;

    public RingNode (int width) {
        this.width = width;
        next = null;   // reserved word for nonexistent object
    }

    public RingNode getNext () {
        return next;
    }

    public void setNext (RingNode next) {
        this.next = next;
    }

    public void print () {
        System.out.println (width);   // the only data a RingNode holds is its width
    }

}//class RingNode

The List Class

class RingList {

    private RingNode head;

    RingList () {
        head = null;
    }

    public void add (RingNode newRing) {
        RingNode currRing;
        if (head == null)
            head = newRing;
        else {
            currRing = head;
            while (currRing.getNext() != null)
                currRing = currRing.getNext();
            currRing.setNext (newRing);
        }
    }//add

    public void print () {
        printHelper (head);
    }//print

    private void printHelper (RingNode probe) {
        if (probe != null) {
            probe.print ();
            printHelper (probe.getNext());
        }
    }//printHelper

}//class RingList

The class that uses the list

public class ListUser {

    public static void main (String args[ ]) {
        RingList rings = new RingList ();
        /* the RingList.add method can do the "new RingNode" for you.
           Which is better, do you think? */
        rings.add (new RingNode (6));
        rings.add (new RingNode (5));
        rings.add (new RingNode (4));
        rings.add (new RingNode (3));
        rings.print ();
    }//main

}//class ListUser

[Figure: rings references a RingList whose head references the node of width 6, which is followed by the node of width 5, and so on]

Using Linked Lists in Java

Three crucial things to remember:

1. No pointers in Java. Manipulate references.
   • main difference in code from 1301/1501: no need for de-referencing operator ( ^ )

2. No direct access to data:
   • assignment of references only
   • data access via accessor/modifier methods

3. Multiple classes required:
   • A type of node is defined in one class.
   • A list that uses nodes is defined in another class.
   • The program that uses the list is in yet another class.

Q: Why bother with different Collection types?

Designing is largely choosing among alternatives based on design goals.

Arrays vs. Linked Lists

Same Cost/Benefit Trade-offs as in Pseudocode:

• Arrays are statically sized, so you have to commit to a length in advance.
  Linked lists are dynamically sized, so you can hedge your bets.

• Arrays require O(N) work to insert at front (need to shift the rest).
  Linked lists require O(1) (constant) work (need to “splice” in place).

• Arrays provide random access to each data element.
  Linked lists require that you traverse the list to reach the node you are searching for.

• Arrays require space for data objects.
  Linked lists also require space for “next” references.
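The front-insertion trade-off can be sketched in a few lines. This is only an illustration, not code from the lecture: the class FrontInsertDemo, its nested IntNode class, and both helper methods are hypothetical names introduced here.

```java
// Hypothetical sketch: inserting at the front of an array versus a linked list.
class FrontInsertDemo {

    // Minimal singly linked node (an assumption made for this example).
    static class IntNode {
        int value;
        IntNode next;
        IntNode(int value, IntNode next) { this.value = value; this.next = next; }
    }

    // Array: O(N) work, because every element already stored shifts one slot right.
    // 'count' is the number of slots in use; returns the new count.
    static int insertFrontArray(int[] data, int count, int value) {
        if (count >= data.length)
            throw new IllegalStateException("array is full: its length was fixed in advance");
        for (int i = count; i > 0; i--)
            data[i] = data[i - 1];
        data[0] = value;
        return count + 1;
    }

    // Linked list: O(1) work, just splice a new node in front of the old head.
    static IntNode insertFrontList(IntNode head, int value) {
        return new IntNode(value, head);
    }

    public static void main(String[] args) {
        int[] data = new int[4];
        int count = 0;
        IntNode head = null;
        for (int v = 1; v <= 3; v++) {
            count = insertFrontArray(data, count, v);
            head = insertFrontList(head, v);
        }
        System.out.println(java.util.Arrays.toString(data));   // [3, 2, 1, 0]
        for (IntNode n = head; n != null; n = n.next)
            System.out.print(n.value + " ");                   // 3 2 1
        System.out.println();
    }
}
```

Note that the array version also has to refuse the insert once its fixed length is used up, which is the "commit in advance" cost listed above.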

Linked Structures & Dynamic Memory Allocation: Summary Points

Problems with static memory allocation
• you run out of memory if you underestimate
• you waste memory if you overestimate
• you have to manage the memory manually (e.g., closing gaps in the data structure)

Dynamic memory allocation
• all data (objects) is allocated in the heap; you work with references to those objects
• you request new memory when you need it (using "new")
• Java collects memory that you don't need any more and returns it to the heap (garbage collection)

Examples
• linked lists, ordered lists, queues, stacks, doubly-linked lists, etc.
• trees, binary trees, binary search trees, height-balanced binary search trees, graphs (to come)

Lists: Summary

• Linked lists
  – Class that uses the list (e.g. driver)
  – Class representing the list itself (NodeList)
  – Class representing nodes (Node)
    • One or more data fields
    • One field is “pointer” to next node: type is Node
  – null is “zero”/non-existent object
    • Never reference a null object!!!
• Lists -v- arrays: Design as choosing among options
  – Early commitment to length in arrays
  – Insertion cost constant for lists (when inserting at the front)
  – Space overhead for next field in list
• Hashtable basics
• Binary search trees

Contents of Lecture

• Linked lists
• Hash table basics
  – Purpose
  – Hash functions and collisions
  – Collision resolution strategies
• Binary search trees

Hashing

Problem: How to quickly access items in a list?

Possible Solution: Hashing

Idea: Shrink the address space to fit the population size.

For example: Use some function to reduce the address space of a billion possible SocSecNums to the population size of 100 students.

Hash Function: The function by which you shrink the address space, e.g.,

    index = SocSecNum % 100
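As a small worked example of index = SocSecNum % 100, the sketch below reduces two nine-digit numbers to an index between 0 and 99. The class name SsnHashDemo and both sample numbers are made up for illustration.

```java
// Hypothetical sketch of the hash function index = SocSecNum % 100.
class SsnHashDemo {

    static final int TABLE_SIZE = 100;   // population size from the example

    // Maps a non-negative nine-digit number to an index in 0..99.
    static int hash(long socSecNum) {
        return (int) (socSecNum % TABLE_SIZE);
    }

    public static void main(String[] args) {
        // Both numbers are invented sample values.
        System.out.println(hash(123456789L));   // 89
        System.out.println(hash(987654389L));   // 89 again: two keys, one index
    }
}
```

Because a billion possible keys share only 100 indices, two different keys landing on the same index is unavoidable in general.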

Hash Functions

The Perfect Hash Function:
• would be very fast (used for all data access)
• would return a unique result for each key, i.e., would result in zero collisions
• in the general case, a perfect hash doesn’t exist (we can create one for a specific population, but as soon as that population changes... )
• provides an ideal point of reference

Common Hash Functions:
• Digit selection: e.g., last 4 digits of phone number
• Division: modulo
• Character keys: use ASCII numeric values for chars (e.g., ‘R’ is 82)
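The three common hash functions could look roughly like this in Java. All names here (HashFunctionsDemo and its methods) are hypothetical, and summing the character codes in characterKey is just one simple way to combine them, assumed for this sketch.

```java
// Hypothetical sketches of digit selection, division, and character-key hashing.
class HashFunctionsDemo {

    // Digit selection: keep the last 4 digits of a (non-negative) phone number.
    static int lastFourDigits(long phoneNumber) {
        return (int) (phoneNumber % 10000);
    }

    // Division: reduce a non-negative key modulo the table size.
    static int division(int key, int tableSize) {
        return key % tableSize;
    }

    // Character keys: add up the numeric code of each character,
    // then reduce the sum modulo the table size (an assumed combination rule).
    static int characterKey(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++)
            sum += key.charAt(i);
        return sum % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(lastFourDigits(4045551234L));   // 1234
        System.out.println(division(1999, 8));             // 7
        System.out.println((int) 'R');                     // 82
        System.out.println(characterKey("R", 8));          // 2
    }
}
```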

Cost of Hash

Two costs of hashing:

1. loss of natural order
   • side effect of desired random shrinking
   • lose any ordering of original indices

2. collisions will occur
   • no perfect hash function
   • when (not “if”) a collision occurs, how to handle it?

Collision Resolution strategies:

• Multiple record buckets: small for each index, but . . .

• Open address methods: look for next open address, but . . .

• Coalesced chaining: use cellar for overflow (~34..40% of size)

• External chaining: linked list at each location

Collision Resolution

Technique: Multiple element buckets

• Idea: have extra spaces there for overflow
• if population of 8, and if hash function of mod 8, then:

[Figure: a table with rows 0 through 7, each row having three slots labeled hash, 1st collision, and 2nd collision]

Problems: uses 3N space; “what if there is a 3rd collision at any one location?”
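A minimal sketch of multiple element buckets, assuming integer keys, a mod-8 hash, and three slots per index as in the figure. The class BucketTableDemo and its methods are hypothetical names, not part of the lecture code.

```java
// Hypothetical sketch: a hash table whose every index has room for 3 keys.
class BucketTableDemo {

    static final int SLOTS_PER_BUCKET = 3;   // hash slot + 1st + 2nd collision

    private final int[][] buckets;   // 3N space for a table of N indices
    private final int[] used;        // how many slots of each bucket are filled

    BucketTableDemo(int tableSize) {
        buckets = new int[tableSize][SLOTS_PER_BUCKET];
        used = new int[tableSize];
    }

    // Returns false when the bucket is already full (the "3rd collision" problem).
    // Keys are assumed to be non-negative.
    boolean add(int key) {
        int index = key % buckets.length;
        if (used[index] == SLOTS_PER_BUCKET)
            return false;
        buckets[index][used[index]++] = key;
        return true;
    }

    boolean contains(int key) {
        int index = key % buckets.length;
        for (int i = 0; i < used[index]; i++)
            if (buckets[index][i] == key)
                return true;
        return false;
    }

    public static void main(String[] args) {
        BucketTableDemo table = new BucketTableDemo(8);
        System.out.println(table.add(1));    // true  (hash slot)
        System.out.println(table.add(9));    // true  (1st collision)
        System.out.println(table.add(17));   // true  (2nd collision)
        System.out.println(table.add(25));   // false (3rd collision, no room left)
    }
}
```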

Collision Resolution

Technique: Open address methods

• Idea: upon collision, look for an empty spot
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

    slot   contents
     0     D hashes to 2   (D belongs at 2, but C already there)
     1     W hashes to 1
     2     C hashes to 1   (W already at 1, so C to next available slot)
     3     X hashes to 3
     4     Y hashes to 4
     5     Z hashes to 3   (X already at 3, so Z to next available slot)
     6     A hashes to 6
     7     B hashes to 5   (B belongs at 5, but Z already there)

Problem: Deteriorates to an unsorted list (i.e., O(N) search)
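The open addressing table for W, X, Y, Z, A, B, C, D can be reproduced with a short linear-probing sketch. The class LinearProbingDemo is a hypothetical name, and the hash value of each letter is passed in explicitly (taken from the "hashes to" labels) rather than computed, since the lecture does not give the hash function for the letters.

```java
// Hypothetical sketch: open addressing with linear probing.
class LinearProbingDemo {

    // Places each item at its home slot, or at the next free slot after it,
    // wrapping around to slot 0 at the end of the table.
    // homeSlots[i] is the assumed hash value of items[i].
    static String[] fill(String[] items, int[] homeSlots, int tableSize) {
        if (items.length > tableSize)
            throw new IllegalArgumentException("more items than slots");
        String[] table = new String[tableSize];
        for (int i = 0; i < items.length; i++) {
            int slot = homeSlots[i] % tableSize;
            while (table[slot] != null)           // collision: try the next slot
                slot = (slot + 1) % tableSize;
            table[slot] = items[i];
        }
        return table;
    }

    public static void main(String[] args) {
        String[] items = {"W", "X", "Y", "Z", "A", "B", "C", "D"};
        int[] homeSlots = { 1,   3,   4,   3,   6,   5,   1,   2 };
        String[] table = fill(items, homeSlots, 8);
        System.out.println(java.util.Arrays.toString(table));
        // prints [D, W, C, X, Y, Z, A, B]
    }
}
```

D ends up in slot 0 only after probing slots 2 through 7, which is the deterioration toward a linear scan noted above.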

Collision Resolution

Technique: Coalesced chaining

• Idea: have small extra “cellar” to handle collision
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

    slot   contents          link
     0
     1     W hashes to 1      9
     2     D hashes to 2
     3     X hashes to 3     10
     4     Y hashes to 4
     5     B hashes to 5
     6     A hashes to 6
     7
    ------ cellar ------
     8
     9     C hashes to 1
    10     Z hashes to 3

Cellar bottom is now 8.

Works well with a cellar of 35..40% of N if the hash function is good; the cellar can overflow if need be.
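The coalesced chaining table can be sketched as below. This is one possible arrangement under stated assumptions, not the lecture's implementation: the class CoalescedChainingDemo is hypothetical, hash values are passed in explicitly, colliding items take the highest-numbered free slot (so the cellar fills from the top down), and once the cellar is used up the same scan simply continues into the main table.

```java
// Hypothetical sketch: coalesced chaining with a cellar after the main table.
class CoalescedChainingDemo {

    private final String[] keys;
    private final int[] link;   // index of the next item in the chain, or -1
    private int free;           // candidate free slot, scanned from the end downward

    CoalescedChainingDemo(int primarySize, int cellarSize) {
        int total = primarySize + cellarSize;
        keys = new String[total];
        link = new int[total];
        java.util.Arrays.fill(link, -1);
        free = total - 1;
    }

    // homeSlot is the assumed hash value of key, in 0..primarySize-1.
    void add(String key, int homeSlot) {
        if (keys[homeSlot] == null) {        // no collision
            keys[homeSlot] = key;
            return;
        }
        int last = homeSlot;                 // walk to the end of the chain
        while (link[last] != -1)
            last = link[last];
        while (free >= 0 && keys[free] != null)
            free--;                          // find the highest free slot
        if (free < 0)
            throw new IllegalStateException("table is full");
        keys[free] = key;
        link[last] = free;
    }

    boolean contains(String key, int homeSlot) {
        for (int i = homeSlot; i != -1; i = link[i])
            if (key.equals(keys[i]))
                return true;
        return false;
    }

    String keyAt(int slot) { return keys[slot]; }
    int linkAt(int slot) { return link[slot]; }

    public static void main(String[] args) {
        CoalescedChainingDemo table = new CoalescedChainingDemo(8, 3);
        String[] items = {"W", "X", "Y", "Z", "A", "B", "C", "D"};
        int[] homeSlots = { 1,   3,   4,   3,   6,   5,   1,   2 };
        for (int i = 0; i < items.length; i++)
            table.add(items[i], homeSlots[i]);
        for (int slot = 0; slot < 11; slot++)
            System.out.println(slot + " " + table.keyAt(slot) + " link=" + table.linkAt(slot));
    }
}
```

With the arrival order above, Z lands in slot 10 and C in slot 9, and slots 1 and 3 link to 9 and 10 respectively, matching the table.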

Collision Resolution

Technique: External chaining

• Idea: have pointers to all items at given hash, handle collision as normal event.
• if population of 8, and if hash function of mod 8
• Assume data items arrived in the order: W, X, Y, Z, A, B, C, D

    0
    1 -> W hashes to 1 -> C hashes to 1
    2 -> D hashes to 2
    3 -> X hashes to 3 -> Z hashes to 3
    4 -> Y hashes to 4
    5 -> B hashes to 5
    6 -> A hashes to 6
    7

Hashing with Chaining: Example

public class HashChain {

    private Node[] bucket;
    private int TableSize;

    public HashChain(int TableSize) {
        this.TableSize = TableSize;
        bucket = new Node[TableSize];
        for (int i = 0; i < TableSize; i++)
            bucket[i] = new Node();
    } // HashChain

    private int getHashKey(int newElement) {
        return newElement % TableSize;
    } // getHashKey

    public void addElement(int newElement) {
        int index = getHashKey(newElement);
        bucket[index].insertNode(newElement);
    } // addElement

    public Node getElement(int iData) {
        int index = getHashKey(iData);
        Node item = bucket[index].locateNode(iData);
        return item;
    } // getElement

} // HashChain

public class Node {

    int iData;
    Node nextNode;

    // Creates an empty header node (its iData is never meaningful).
    public Node() { }

    public Node(int iData) {
        this.iData = iData;
    }

    // Appends a new node at the end of the chain that starts at this node.
    public void insertNode(int iData) {
        insertNode(iData, this);
    }

    public void insertNode(int iData, Node current) {
        if (current.getNextNode() == null)
            current.setNextNode(new Node(iData));
        else
            insertNode(iData, current.getNextNode());
    }

    // Searches the chain after this node. This node acts as the chain's header,
    // so it is skipped; otherwise a search for 0 would match the empty header.
    public Node locateNode(int iData) {
        if (nextNode == null)
            return null;
        return locateNode(iData, nextNode);
    }

    public Node locateNode(int iData, Node current) {
        if (iData == current.getData())
            return current;
        else if (current.getNextNode() == null)
            return null;
        else
            return locateNode(iData, current.getNextNode());
    }

    public int getData() { return iData; }

    public Node getNextNode() { return nextNode; }

    public void setNextNode(Node nextNode) { this.nextNode = nextNode; }

    public String toString() { return "Node: " + iData; }
}

class Driver {

    public static void main(String arg[]) {
        HashChain hash = new HashChain(10);

        for (int i = 0; i < 100; i++) {
            hash.addElement(i);
        } // for

        for (int i = 0; i < 100; i++) {
            System.out.println(hash.getElement(i));
        } // for

    } // main

} // Driver

Summary of Hash Tables

• Purpose: Fast searching of lists by reducing address space to approx. population size.
• Hash function: the reduction function
• Collision: hash(a) == hash(b), but a != b
• Collision resolution strategies
  – Multiple element buckets still risk collisions
  – Open addressing quickly deteriorates to unordered list
  – Chaining is most general solution

Contents of Lecture

• Linked lists
• Hash table basics
• Binary search trees
  – BSTs in Java
  – Balancing a BST
  – Search strategies
• and there’s a bit more coming up, so don’t pack up yet

Trees--Definitions

Recall our previous discussion of Trees.

Defined: A tree is a multiply-linked data structure

             fox
           /     \
        dog       hen
       /   \     /   \
     cat   elf hat   hog

[Figure labels: leaf (terminal), internal node (nonterminal), branch, height, path]

Binary Trees

[Figure: a binary tree with nodes A, B, C, D, E, F]

Binary Trees: same basic idea as from Pseudocode. A few neat properties of binary trees:

* There exists exactly one path between any two nodes.

* A tree with N nodes has N-1 edges.

* A full binary tree with N internal nodes has N+1 external nodes.

* The height of a full binary tree with N nodes is about log2 N.

Binary Trees: Part of a Class for the Tree’s Nodes

class TreeNode {

    private SomeClass dataObject;   // reference to data
    private TreeNode left;          // reference to left subtree
    private TreeNode right;         // reference to right subtree

    // constructor for leaf node with null references
    // (delegates to a three-argument constructor not shown in this part of the class)
    public TreeNode(SomeClass newObject) {
        this( newObject, null, null );
    }

    // accessor: returns reference to current data object
    public SomeClass getObject( ) {
        return dataObject;
    }

    // accessor: returns reference to left subtree
    public TreeNode getLeft( ) {
        return left;
    }

    // accessor: returns reference to right subtree
    public TreeNode getRight( ) {
        return right;
    }

} // class TreeNode

[Figure: a TreeNode holding a dataObject reference plus left and right references]

Insertion into a Binary Search Tree

In Class TreeNode:

public void insert (SomeClass newObject) {
    if ( newObject.lessThan( dataObject ) ) {
        if ( left == null )
            left = new TreeNode(newObject);
        else
            left.insert( newObject );
    }
    else // treating duplicates as if they’re greaterThan
    {
        if ( right == null )
            right = new TreeNode(newObject);
        else
            right.insert( newObject );
    }
} // insert

More Insertion, plus Traversal

In Class Tree:

public class Tree {

    private TreeNode root;

    public Tree( ) {
        root = null;
    }

    public void insertInBST (SomeClass newObject) {
        if (root == null)
            root = new TreeNode(newObject);
        else
            root.insert(newObject);
    } // insertInBST

    public void inorderTraversal( ) {
        inorderHelper(root);
    }

    // Uses TreeNode's accessors, because its fields are private to TreeNode.
    public void inorderHelper (TreeNode node) {
        if (node != null) {
            inorderHelper(node.getLeft());
            System.out.println(node.getObject());
            inorderHelper(node.getRight());
        } // if
    } // inorderHelper

} // class Tree

[Figure: 2 possible Trees: one whose root is null, and one whose root references a TreeNode holding a dataObject]

Binary Tree Balancing

If unbalanced, binary tree can become linked list:

[Figure: the nodes A through F linked in a single chain; a tree shaped like a linked list creates worst-case search time]

In the best case, tree search time is O(log N).

Problem: As the tree grows out of balance, search time deteriorates; in the worst case, the search time is O(N).

Solved by keeping the tree in balance!

Binary Tree Balancing

Two general methods of tree balancing

1) Rebalance from sorted list.

E.g., perform inorder traversal followed by tree reconstruction of sorted array.

Cost: O(N) time to reconstruct global tree

2) Create ‘almost balanced’ binary tree and use local tree balancing (AVL trees, covered in algorithms courses)

Cost: O(log N) insertion time.

Binary Tree Balancing

Reconstruction Technique:

• Place nodes in an array (inorder traversal).
• Sort the array (an inorder traversal of a binary search tree already yields sorted order).
• Take midpoint as new tree head.
• Take midpoint of each remaining half as the left and right child; repeat.

Combine into a recursive call.

[Figure: the sorted array 9 12 14 22 25 31 44 47 64 74 84, with arrows marking the new head and its left and right children]
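The midpoint steps combine naturally into one recursive method. The sketch below is an illustration under assumptions: the class RebuildDemo and its nested BstNode are hypothetical, the tree stores plain int values, and the midpoint is taken as (lo + hi) / 2.

```java
// Hypothetical sketch: rebuilding a balanced BST from a sorted array.
class RebuildDemo {

    // Minimal node type assumed for this example.
    static class BstNode {
        int value;
        BstNode left, right;
        BstNode(int value) { this.value = value; }
    }

    // Builds a balanced tree from sorted[lo..hi] (both ends inclusive).
    static BstNode build(int[] sorted, int lo, int hi) {
        if (lo > hi)
            return null;                              // empty range: no subtree
        int mid = (lo + hi) / 2;
        BstNode node = new BstNode(sorted[mid]);      // midpoint becomes the head
        node.left = build(sorted, lo, mid - 1);       // left half, same rule
        node.right = build(sorted, mid + 1, hi);      // right half, same rule
        return node;
    }

    // Number of nodes on the longest path from this node down to a leaf.
    static int height(BstNode node) {
        if (node == null)
            return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    public static void main(String[] args) {
        int[] sorted = {9, 12, 14, 22, 25, 31, 44, 47, 64, 74, 84};
        BstNode root = build(sorted, 0, sorted.length - 1);
        System.out.println(root.value);          // 31
        System.out.println(root.left.value);     // 14
        System.out.println(root.right.value);    // 64
        System.out.println(height(root));        // 4
    }
}
```

Each array element is visited once, which is where the O(N) reconstruction cost comes from.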

DFS: Simple Example

A DFS of a binary tree is similar to a recursive pre-order traversal, except that:

1. Pre-order visits the entire tree, while DFS stops at a goal node.

2. The right subtree gets visited only if the left fails to hold the goal node.

[Figure: a tree with root A; A's children are B and C; B's children are D and E; C's child is F]

To find C, traverse: A-B-D-E-C (but not F)
To find E, traverse: A-B-D-E, etc.
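The two traversals can be checked with a small sketch that records every node it visits. The class DfsDemo and its members are hypothetical names; the tree shape follows the traversal orders given above, and placing F as the right child of C is an assumption, since either side gives the same visit orders.

```java
// Hypothetical sketch: depth-first search that records its visit order.
class DfsDemo {

    static class Node {
        char label;
        Node left, right;
        Node(char label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    // Visits current, then the left subtree, then the right subtree,
    // and stops as soon as the goal is found.
    static boolean dfs(Node current, char goal, StringBuilder visited) {
        if (current == null)
            return false;
        visited.append(current.label);
        if (current.label == goal)
            return true;
        // || short-circuits: the right subtree is searched only if the left fails
        return dfs(current.left, goal, visited) || dfs(current.right, goal, visited);
    }

    static Node sampleTree() {
        Node d = new Node('D', null, null);
        Node e = new Node('E', null, null);
        Node f = new Node('F', null, null);
        Node b = new Node('B', d, e);
        Node c = new Node('C', null, f);   // F as right child is an assumption
        return new Node('A', b, c);
    }

    static String visitOrder(char goal) {
        StringBuilder visited = new StringBuilder();
        dfs(sampleTree(), goal, visited);
        return visited.toString();
    }

    public static void main(String[] args) {
        System.out.println(visitOrder('C'));   // ABDEC
        System.out.println(visitOrder('E'));   // ABDE
    }
}
```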

Tree Searching: Recursive

public boolean searchTree(TreeNode item, TreeNode current) {
    boolean found = false;
    if (current == null)                  // terminates recursive call
        found = false;
    else if (current.equals(item))
        found = true;
    else if (searchTree(item, current.getLeft()))
        found = true;
    else if (searchTree(item, current.getRight()))
        found = true;
    else
        found = false;
    return found;
} // searchTree

Note: If the tree is a binary search tree, one achieves improved search performance by comparing the current node to item, limiting the search to the left or right subtree.
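The note about binary search trees can be sketched as follows: one comparison per node decides which single subtree to continue in. The class BstSearchDemo, its nested Node, and the use of int keys are assumptions made to keep the example self-contained.

```java
// Hypothetical sketch: searching a binary search tree one subtree at a time.
class BstSearchDemo {

    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard BST insertion; duplicates go to the right.
    static Node insert(Node root, int key) {
        if (root == null)
            return new Node(key);
        if (key < root.key)
            root.left = insert(root.left, key);
        else
            root.right = insert(root.right, key);
        return root;
    }

    // Unlike the general tree search, at most one subtree is searched per step.
    static boolean search(Node current, int item) {
        if (current == null)
            return false;                         // ran off the tree: not present
        if (item == current.key)
            return true;
        if (item < current.key)
            return search(current.left, item);    // item can only be on the left
        return search(current.right, item);       // item can only be on the right
    }

    public static void main(String[] args) {
        Node root = null;
        for (int key : new int[] {31, 14, 64, 9, 22})
            root = insert(root, key);
        System.out.println(search(root, 22));   // true
        System.out.println(search(root, 50));   // false
    }
}
```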

Contents of Lecture

• Linked lists
• Hash table basics
• Binary search trees
  – Terms, revisited: branch, leaf, etc.
  – Java example for Node class (cf. List Node)
  – Search cost degenerates to that for list unless tree is balanced
  – Depth-first search is like preorder traversal, but terminates when target node is found
• and there really is a bit more coming up, so don’t pack up just yet

Designing with (== choosing among) Data Structures

Example: Maintaining Very Large Distributed Databases

Issues

How fast can we search?

How fast can we insert and delete entries?

How large a data structure do we need?

How large a data structure do we need in main memory to work with?

Very Large Databases

• How large is an English dictionary?

/usr/dict/words: 25,000 words

ARTFL Project, Webster’s Revised Unabridged Dictionary, 1913 Edition: 110,000 words (http://humanities.uchicago.edu/forms_unrest/webster.form.html)

• How large is the web?

• How fast can we search databases this size?

Searching and Maintaining Very Large Databases

How fast are these techniques?

    Data representation            Search Algorithm
    Sorted Array                   Binary Search
    Ordered Linked List            Linear Search
    Binary Search Tree             Tree Search
    Balanced Binary Search Tree    Tree Search

O(log n), but how long do they really take on real machines?
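One rough way to get a feel for the difference is to count how many elements each search examines. The sketch below is an illustration only: the class SearchCostDemo is hypothetical, it uses int keys in place of dictionary words, and it counts probes rather than measuring wall-clock time, which depends on the machine.

```java
// Hypothetical sketch: counting probes for binary search versus linear search.
class SearchCostDemo {

    // Number of array elements examined by binary search on a sorted array.
    static int binaryProbes(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            probes++;
            if (sorted[mid] == target)
                return probes;
            if (sorted[mid] < target)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return probes;
    }

    // Number of elements examined by a linear scan of the same ordered data,
    // stopping at the first element that is not smaller than the target.
    static int linearProbes(int[] sorted, int target) {
        int probes = 0;
        for (int value : sorted) {
            probes++;
            if (value >= target)
                break;
        }
        return probes;
    }

    public static void main(String[] args) {
        int[] sorted = new int[25000];      // same size as the 25,000-word list
        for (int i = 0; i < sorted.length; i++)
            sorted[i] = i;
        int target = sorted.length - 1;     // worst case for the linear scan
        System.out.println("linear: " + linearProbes(sorted, target));   // 25000
        System.out.println("binary: " + binaryProbes(sorted, target));   // at most 15
    }
}
```

For 25,000 entries binary search never needs more than 15 probes, since 2^15 is larger than 25,000.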