Chapter 11 Sets © 2006 Pearson Education Inc., Upper Saddle River, NJ. All rights reserved.


Overview

● Set
  – A collection of elements in which the same element does not appear more than once.
  – Three implementations of sets: ordered lists (11.2), binary search trees (11.3), and hash tables (11.4).

● 11.1: Set interface

● 11.5: Java collections framework, discussing Java's own Set interface.


11.1 The Set Interface


● A good list of words can be found in the file /usr/share/dict/words on most Unix systems, including Mac OS X.

● An efficient Set implementation is important.


● Given a large words.txt file with tens or hundreds of thousands of words, the program as written takes at most a few seconds to start up.

● If we change
  – Words = new OrderedList<String>();

● to
  – Words = new BinarySearchTree<String>();

● the program takes an unacceptable amount of time to start up.


11.2 Ordered Lists

● Since a Set changes size as items are added and removed, a linked structure seems in order.

● An OrderedList is like a LinkedList, except that:
  – The elements of an OrderedList must implement the Comparable interface.
  – The elements of an OrderedList are kept in order.
  – The OrderedList class implements the Set interface. It provides the methods add(), contains(), remove(), and size().
  – No duplicate elements are allowed.


● Extending LinkedList
  – The problem is that the LinkedList class implements the List interface, which conflicts with the Set interface.
  – The add() method from the List interface should add the argument target to the end of the list, even if it is already present.
  – The add() method from the Set interface, by contrast, may add target at any position, but must not add it if it is already present.


● We should extend a class only when an instance of the subclass works in place of an instance of the superclass.


● The contains() method for the OrderedList class is a linear search (sequential search).


● To add something to a Set, we must first find where it belongs, then (if it is not present) put it there.


● To remove something, we must first find where it belongs, then (if it is present) remove it.


● Since the final “put it there” and “remove it” operations take constant time, all three methods (search, insertion and deletion) have the same order of running time Θ(n) for a given implementation.

● The OrderedList data structure is easy to implement, but it requires linear time for search, insertion, and deletion.
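The three operations can be sketched as follows. This is a minimal illustration rather than the book's listing: the nested Node class, the field names front and size, and the boolean return values of add() and remove() are assumptions made for this example.

```java
// Hypothetical sketch of an ordered linked list used as a set.
class OrderedList<E extends Comparable<E>> {
    // Singly linked node holding one element.
    private static class Node<E> {
        E item;
        Node<E> next;
        Node(E item, Node<E> next) { this.item = item; this.next = next; }
    }

    private Node<E> front;   // smallest element, or null if the set is empty
    private int size;

    /** Linear search that stops as soon as we pass where target would be. */
    public boolean contains(E target) {
        for (Node<E> n = front; n != null; n = n.next) {
            int c = target.compareTo(n.item);
            if (c == 0) return true;
            if (c < 0) return false;   // every later element is larger still
        }
        return false;
    }

    /** Finds where target belongs, then links it in unless already present. */
    public boolean add(E target) {
        Node<E> prev = null;
        Node<E> node = front;
        while (node != null && target.compareTo(node.item) > 0) {
            prev = node;
            node = node.next;
        }
        if (node != null && target.compareTo(node.item) == 0) return false;
        Node<E> fresh = new Node<>(target, node);   // constant-time "put it there"
        if (prev == null) front = fresh; else prev.next = fresh;
        size++;
        return true;
    }

    /** Finds where target belongs, then unlinks it if it is present. */
    public boolean remove(E target) {
        Node<E> prev = null;
        Node<E> node = front;
        while (node != null && target.compareTo(node.item) > 0) {
            prev = node;
            node = node.next;
        }
        if (node == null || target.compareTo(node.item) != 0) return false;
        if (prev == null) front = node.next; else prev.next = node.next;
        size--;
        return true;
    }

    public int size() { return size; }
}
```

In each method the loop that walks the list dominates the cost, which is where the Θ(n) running time comes from.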


11.3 Binary Search Trees

● Binary search tree
  – More efficient under some circumstances.


● Search
  – Successful search for node 16.
  – Shaded nodes are never examined.


● Searching a binary search tree.
  – In a perfect tree, time complexity is Θ(log n).
  – In the Anagrams program, when the word file is in alphabetical order, it produces a worst case: every new node is a right child, so the time complexity is Θ(n).
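A minimal sketch of the search loop and of the sorted-input worst case. It assumes int elements and a hypothetical SimpleBst class with plain left and right fields; the height() helper exists only to make the shape of the tree visible.

```java
// Hypothetical, simplified binary search tree over ints.
class SimpleBst {
    private static class Node {
        int item;
        Node left, right;
        Node(int item) { this.item = item; }
    }

    private Node root;

    /** Follows one path from the root; subtrees off the path are never examined. */
    boolean contains(int target) {
        Node node = root;
        while (node != null) {
            if (target < node.item) node = node.left;
            else if (target > node.item) node = node.right;
            else return true;
        }
        return false;
    }

    /** Adds target as a new leaf unless it is already present. */
    void add(int target) {
        if (root == null) { root = new Node(target); return; }
        Node node = root;
        while (true) {
            if (target < node.item) {
                if (node.left == null) { node.left = new Node(target); return; }
                node = node.left;
            } else if (target > node.item) {
                if (node.right == null) { node.right = new Node(target); return; }
                node = node.right;
            } else {
                return;   // duplicate: a set stores each element once
            }
        }
    }

    /** Height in edges; an empty tree has height -1. */
    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Adding 1 through 7 in ascending order produces a chain of right children with height 6, while adding the same values in the order 4, 2, 6, 1, 3, 5, 7 produces a perfect tree of height 2.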


● There are two complications to the code:
  – Once we reach a null node, we have forgotten how we got there. Since we need to modify either the left or right field in the parent (e.g. node 16) of the new leaf, we'll need this information.
  – We need to deal with the situation in which the binary search tree is empty.

● Insertion (node 15)

● The search fails when 16 has no child, so we add a new leaf there.


Fig. 11-20 (p300) Both the BinaryNode and BinarySearchTree classes must be modified to implement the Parent interface.


Fig. 11-21 (p300) The getChild() and setChild() methods for the BinaryNode class.


Fig. 11-22 (p301) The getChild() and setChild() methods for the BinarySearchTree class. Since there is only one child (the root), the argument direction is ignored.
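One way the Parent idea from these figures could fit together with insertion is sketched below. The figures themselves are not reproduced here, so the method signatures, the convention that a negative direction means left, and the body of add() are assumptions; only the names Parent, BinaryNode, BinarySearchTree, getChild() and setChild() come from the captions.

```java
// Hypothetical: a parent is anything that holds a link to a child node.
interface Parent<E> {
    BinaryNode<E> getChild(int direction);
    void setChild(int direction, BinaryNode<E> child);
}

class BinaryNode<E> implements Parent<E> {
    private final E item;
    private BinaryNode<E> left, right;

    BinaryNode(E item) { this.item = item; }

    E getItem() { return item; }

    // Assumed convention: negative direction means left, otherwise right.
    public BinaryNode<E> getChild(int direction) {
        return direction < 0 ? left : right;
    }

    public void setChild(int direction, BinaryNode<E> child) {
        if (direction < 0) left = child; else right = child;
    }
}

class BinarySearchTree<E extends Comparable<E>> implements Parent<E> {
    private BinaryNode<E> root;

    // The tree's only child is the root, so direction is ignored.
    public BinaryNode<E> getChild(int direction) { return root; }
    public void setChild(int direction, BinaryNode<E> child) { root = child; }

    /** Searches for target, remembering the parent, then hangs a new leaf there. */
    public void add(E target) {
        Parent<E> parent = this;        // handles the empty tree uniformly
        BinaryNode<E> node = root;
        int comparison = 0;
        while (node != null) {
            comparison = target.compareTo(node.getItem());
            if (comparison == 0) return;   // already present
            parent = node;
            node = node.getChild(comparison);
        }
        parent.setChild(comparison, new BinaryNode<>(target));
    }
}
```

Treating the tree object itself as the parent of the root is what removes the special case for an empty tree: the same setChild() call works whether the new leaf goes under a node or becomes the root.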


● Deletion
  – The challenge is to make sure the tree is still a binary search tree when we're done with the deletion.
  – Deleting a leaf is easy.
  – If the node has only one child, we just splice it out much as we would a node in a linked list.


● Deletion (to delete node 2): The deleted node’s child 8 becomes a child of the deleted node’s parent 13.


● When the node we want to delete has two children (see Figure 11-25 on page 302):
  – We must be very careful about which node we choose to delete so that the tree will still be a binary search tree.
  – It is always safe to choose the inorder successor of the node we originally wanted to delete.
  – Find a node's inorder successor by going to the right child, then going left until we hit a node with no left child.
  – The inorder successor can have a right child.


● It is safe to replace the node we want to delete with its inorder successor.
  – The inorder successor is the smallest element in the right subtree. It is therefore larger than anything in the left subtree and smaller than anything else in the right subtree.


● We don't need special code for the case where node is a leaf, because in this situation
  – parent.setChild(direction, node.getRight());

● is equivalent to:
  – parent.setChild(direction, null);
  – because node.getRight() returns null.
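The deletion cases can be sketched as below. This is an illustration under assumptions, not the book's code: the Parent scaffolding is repeated so the block compiles on its own, a negative direction is taken to mean left, and the helper name spliceOut() is hypothetical.

```java
interface Parent<E> {
    BinaryNode<E> getChild(int direction);
    void setChild(int direction, BinaryNode<E> child);
}

class BinaryNode<E> implements Parent<E> {
    E item;
    BinaryNode<E> left, right;

    BinaryNode(E item) { this.item = item; }

    BinaryNode<E> getLeft() { return left; }
    BinaryNode<E> getRight() { return right; }

    // Assumed convention: negative direction means left, otherwise right.
    public BinaryNode<E> getChild(int direction) { return direction < 0 ? left : right; }
    public void setChild(int direction, BinaryNode<E> child) {
        if (direction < 0) left = child; else right = child;
    }
}

class BinarySearchTree<E extends Comparable<E>> implements Parent<E> {
    private BinaryNode<E> root;

    // The tree's only child is the root, so direction is ignored.
    public BinaryNode<E> getChild(int direction) { return root; }
    public void setChild(int direction, BinaryNode<E> child) { root = child; }

    public void add(E target) {
        Parent<E> parent = this;
        BinaryNode<E> node = root;
        int comparison = 0;
        while (node != null) {
            comparison = target.compareTo(node.item);
            if (comparison == 0) return;
            parent = node;
            node = node.getChild(comparison);
        }
        parent.setChild(comparison, new BinaryNode<>(target));
    }

    public boolean contains(E target) {
        BinaryNode<E> node = root;
        while (node != null) {
            int comparison = target.compareTo(node.item);
            if (comparison == 0) return true;
            node = node.getChild(comparison);
        }
        return false;
    }

    /** Finds target while remembering its parent, then splices it out. */
    public void remove(E target) {
        Parent<E> parent = this;
        BinaryNode<E> node = root;
        int direction = 0;
        while (node != null) {
            int comparison = target.compareTo(node.item);
            if (comparison == 0) { spliceOut(node, parent, direction); return; }
            parent = node;
            direction = comparison;
            node = node.getChild(comparison);
        }
    }

    private void spliceOut(BinaryNode<E> node, Parent<E> parent, int direction) {
        if (node.getLeft() == null) {
            // No left child; this also covers a leaf, where getRight() is null.
            parent.setChild(direction, node.getRight());
        } else if (node.getRight() == null) {
            parent.setChild(direction, node.getLeft());
        } else {
            // Two children: go right once, then left as far as possible.
            Parent<E> successorParent = node;
            int successorDirection = 1;
            BinaryNode<E> successor = node.getRight();
            while (successor.getLeft() != null) {
                successorParent = successor;
                successorDirection = -1;
                successor = successor.getLeft();
            }
            node.item = successor.item;   // copy the successor's item up
            successorParent.setChild(successorDirection, successor.getRight());
        }
    }
}
```

Note that the two-child case never unlinks the node that held the target. It overwrites that node's item and removes the successor node instead, which has at most one child.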


● BinarySearchTrees should not be used in the plain form explained here.
  – The worst-case running time is linear.
  – Worst cases are not uncommon.


11.4 Hash Tables

● LetterCollection
  – A collection of letters.
  – Since there are only 26 different letters, a data structure as complicated as an ordered list or binary search tree is not required.

● It is also handy to keep track of the total number of letters in the collection.


● LetterCollection containing 1 'a', 3 'd's, and 1 'y'.
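A minimal sketch of such a collection, assuming lowercase letters 'a' through 'z' only. The field and method names are hypothetical; the source names only the class.

```java
// Hypothetical sketch: one counter per letter plus a running total.
class LetterCollection {
    private final int[] counts = new int[26];   // counts[0] is 'a', counts[25] is 'z'
    private int total;                          // total number of letters stored

    /** Adds one occurrence of a lowercase letter. */
    void add(char letter) {
        counts[letter - 'a']++;
        total++;
    }

    /** Number of occurrences of the given lowercase letter. */
    int getCount(char letter) {
        return counts[letter - 'a'];
    }

    int size() {
        return total;
    }
}
```

After adding 'a', 'd', 'd', 'd' and 'y', getCount('d') is 3 and size() is 5.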


● A set can be represented by an array of booleans.
  – This approach is called direct addressing.

● When we want to look up some element, we go directly to the appropriate place in the array.

● Direct addressing is incredibly fast: each lookup takes constant time.
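Direct addressing might look like this for a set of small non-negative ints. The class name DirectAddressSet and the universeSize parameter are assumptions made for the example.

```java
// Hypothetical sketch: element x is in the set exactly when present[x] is true.
class DirectAddressSet {
    private final boolean[] present;

    /** universeSize is the number of possible elements, 0 to universeSize - 1. */
    DirectAddressSet(int universeSize) {
        present = new boolean[universeSize];
    }

    void add(int x) { present[x] = true; }

    boolean contains(int x) { return present[x]; }

    void remove(int x) { present[x] = false; }
}
```

Every operation is a single array access, but the array needs one slot for every possible element, whether or not it is ever stored.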


● Direct addressing cannot be used for every set.
  – The set of possible elements might be vastly larger than the set of elements actually stored.
  – A direct addressing table would be a huge waste of space.

● Hash function
  – Takes an int as input and returns an array index.
  – f(x) = x mod 10, where x is the integer input and f(x) is the array index.


● f(x) = x mod 10

Fig. 11-31 (p308): In a hash table, a hash function maps each potential element to an array index. The shaded positions do not contain set elements; in practice, some invalid value such as -1 or null is stored there.


● Hash table
  – Has all the advantages of direct addressing, and it works even when the number of possible elements is much larger than the number of elements actually stored.
  – Most of the built-in classes have a method hashCode() which takes no arguments and returns an int which can be passed to a hash function.

● This int is called the hash code.


● The hashCode() method must return the same value for two objects which are equals().
  – That is, if a.equals(b), then a.hashCode() == b.hashCode().
  – But if a.hashCode() == b.hashCode(), it does not follow that a.equals(b).
  – Sometimes two different, non-identical objects have the same hash code.
  – When storing our own classes in a hash table, we must define both equals() and hashCode().

● Hash table set methods use hashCode() to find the right position in the table, then use equals() to verify that the item at the position is the one we're looking for.
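As an illustration of defining both methods together, here is a hypothetical Point class (not from the source) whose equals() and hashCode() are computed from the same two fields, so equal points always share a hash code.

```java
import java.util.Objects;

// Hypothetical example class; any class stored in a hash table needs this pairing.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(x, y);
    }
}
```

The converse does not hold, even for built-in classes: the strings "Aa" and "BB" both have hash code 2112, yet they are not equals().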


● The hash function might occasionally map two different hash codes to the same index.

● For example, 37 mod 10 = 87 mod 10. This is called a collision.
  – We try to choose a hash function which makes collisions rare.
  – No matter how good our hash function is, collisions cannot be completely avoided.
  – Since there are more potential elements than positions in the table, some elements must hash to the same location.


Hash Table (collision resolution)

1. Chaining
  – Keep a sequence of ListNodes (effectively an ordered list) at each position in the table.


● To search, insert, or delete, we simply use the hash function to find the right list and then perform the appropriate ordered list operation there.

● If we know in advance how many elements the Set contains (e.g. n), we can choose the array length to be proportional to this.
  – This limits the average number of elements per list to a constant.

● Average time for search, insertion, and deletion is Θ(1) (constant time).
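A rough sketch of chaining. For brevity it swaps in unordered java.util.LinkedList chains where the text describes chains of ListNodes kept in order, and the class name ChainedHashSet and its fixed capacity are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of a hash table that resolves collisions by chaining.
class ChainedHashSet<E> {
    private final List<List<E>> table = new ArrayList<>();
    private int size;

    ChainedHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            table.add(new LinkedList<>());   // one (initially empty) chain per position
        }
    }

    /** Hash function: maps the hash code to a position, even if it is negative. */
    private List<E> chainFor(E item) {
        return table.get(Math.floorMod(item.hashCode(), table.size()));
    }

    boolean contains(E item) {
        return chainFor(item).contains(item);   // equals() checks within the chain
    }

    boolean add(E item) {
        List<E> chain = chainFor(item);
        if (chain.contains(item)) return false;
        chain.add(item);
        size++;
        return true;
    }

    boolean remove(E item) {
        if (!chainFor(item).remove(item)) return false;
        size--;
        return true;
    }

    int size() { return size; }
}
```

With a capacity of 10, the Integers 37 and 87 land in the same chain at position 7, and both remain retrievable.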


2. Open addressing
  – Collision resolution techniques that avoid chains and store all of the elements directly in the array.
  – 2.1 Linear probing
    ● If there is a collision during insertion, we simply move on to the next unoccupied position.


● Three problems with linear probing:
  – Tables can fill up: linear probing fails catastrophically when there's no more room.
    ● Solve this problem by rehashing into a new, larger table when the table gets too full.
  – We can't simply remove an item to delete it.
    ● This affects future searches.
    ● Solve this problem by replacing a deleted item with a special value.
      – The special value is neither null nor equals() to any target, so searches continue past it.
    ● The table may become full of deleted values.
      – Solve this problem by occasionally rehashing.
  – Clusters tend to occur and grow.
    ● Insertion into any position in a cluster expands the cluster.
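The first two remedies can be sketched together. This is an illustration under assumptions: the class name LinearProbingSet, the initial capacity of 8, the DELETED sentinel, and the rule of rehashing once half the slots are used are all choices made for this example.

```java
// Hypothetical sketch of open addressing with linear probing.
class LinearProbingSet {
    // Special value: not null, and not equals() to any real target.
    private static final Object DELETED = new Object();

    private Object[] table = new Object[8];
    private int used;   // slots holding an element or a DELETED marker
    private int size;   // live elements only

    private int start(Object item) {
        return Math.floorMod(item.hashCode(), table.length);
    }

    boolean contains(Object target) {
        // Probe until an empty slot; DELETED markers do not stop the search.
        for (int i = start(target); table[i] != null; i = (i + 1) % table.length) {
            if (table[i].equals(target)) return true;
        }
        return false;
    }

    boolean add(Object target) {
        if (contains(target)) return false;
        if (2 * used >= table.length) rehash();   // keep the table under half full
        int i = start(target);
        while (table[i] != null) i = (i + 1) % table.length;
        table[i] = target;
        used++;
        size++;
        return true;
    }

    boolean remove(Object target) {
        for (int i = start(target); table[i] != null; i = (i + 1) % table.length) {
            if (table[i].equals(target)) {
                table[i] = DELETED;   // leave a marker so later searches continue
                size--;
                return true;
            }
        }
        return false;
    }

    int size() { return size; }

    /** Copies the live elements into a table twice as large, dropping markers. */
    private void rehash() {
        Object[] old = table;
        table = new Object[old.length * 2];
        used = 0;
        size = 0;
        for (Object item : old) {
            if (item != null && item != DELETED) add(item);
        }
    }
}
```

With the initial capacity of 8, the Integers 3, 11 and 19 all start at position 3 and form a cluster in positions 3 to 5. Removing 11 leaves a marker in position 4, so a later search for 19 still probes past it.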


● 2.2 Double hashing
  – We use two hash functions.
    ● The first function tells us where to look first, while the second one tells us how many positions to skip each time if we find an occupied one.
    ● Two elements which originally hash to the same position are unlikely to follow the same sequence of positions through the table.
  – See Fig. 11-35 (p311): Double hashing
    ● f(x) = x mod 10
    ● g(x) = floor(x / 100) //floor function
    ● Examples: to insert 256 and 386 into the hash table.


● Double hashing reduces the risk of clustering.
  – Linear probing is a special case of double hashing where the second hash function always returns 1.
  – The size of the table and the number returned by the second hash function should be relatively prime.
    ● That is, they have no factor in common other than 1.
    ● The easiest way to ensure relative primality is to make the table size a power of two and require that the second hash function always returns an odd number.
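To make the probe sequences concrete, the sketch below lists the positions that 256 and 386 would visit under f(x) = x mod 10 and g(x) = floor(x / 100) in a table of size 10. Which of those positions are actually occupied depends on the figure, so the code only enumerates the sequence; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerates double-hashing probe positions.
class DoubleHashingDemo {
    static final int TABLE_SIZE = 10;

    /** First hash function: where to look first. */
    static int f(int x) { return x % 10; }

    /** Second hash function: step size (integer division floors for x >= 0). */
    static int g(int x) { return x / 100; }

    /** The first count positions probed for a non-negative x. */
    static List<Integer> probes(int x, int count) {
        List<Integer> positions = new ArrayList<>();
        int position = f(x) % TABLE_SIZE;
        for (int i = 0; i < count; i++) {
            positions.add(position);
            position = (position + g(x)) % TABLE_SIZE;
        }
        return positions;
    }
}
```

Both values start at position 6 and then diverge: 256 steps by 2 and visits 6, 8, 0, 2, 4, while 386 steps by 3 and visits 6, 9, 2, 5, 8, 1, 4, 7, 0, 3. The step 2 shares a factor with the table size 10, so 256 can only ever reach five positions, whereas the step 3 is relatively prime to 10 and reaches all ten.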


● If this position is occupied by something other than target, we use hash2() to decide how many positions to skip ahead.


● If the table is at least half full (including deleteds), we rehash into a larger table before inserting.


● There are only two drawbacks to hash tables:
  – Traversal is not efficient.
    ● Visiting all of the elements requires a search through the entire table, which is presumably mostly empty.
      – This takes time proportional to the capacity of the table, not to the number of elements.
  – With open addressing, deletion is a bit awkward.
    ● We have to leave behind a special value deleted and occasionally rehash.
      – A hash table may not be the best Set implementation to use in an application where many deletions are expected.


11.5 Java Collections Framework Again

● Java collections framework
  – In the java.util package, the interface Collection is extended by an interface Set, which is similar to the one we defined in Section 11.1.
  – The same element may appear more than once in a collection, but not in a set.
  – The Set interface is extended by SortedSet.


● HashSet and TreeSet
  – HashSet
    ● Is very much like the HashTable we defined in Section 11.4.
  – TreeSet
    ● Is similar to the BinarySearchTree, but it uses some advanced techniques (a balanced red-black tree) to avoid the linear worst case.
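A short usage sketch of the two classes from java.util. The wrapper class SetDemo and its method names are hypothetical; HashSet, TreeSet, Set and SortedSet are the real library types.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper class showing the two standard Set implementations.
class SetDemo {
    /** Collects the words into a TreeSet, which iterates in sorted order. */
    static SortedSet<String> sortedWords(String... words) {
        SortedSet<String> result = new TreeSet<>();
        for (String word : words) {
            result.add(word);   // a duplicate is silently ignored
        }
        return result;
    }

    /** Collects the words into a HashSet, which promises no particular order. */
    static Set<String> hashedWords(String... words) {
        Set<String> result = new HashSet<>();
        for (String word : words) {
            result.add(word);
        }
        return result;
    }
}
```

Given "pear", "apple", "pear", "fig", both sets end up with three elements; the TreeSet iterates as apple, fig, pear.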


● Maps
  – Each of a set of keys is associated with a value.
  – Each key may appear only once in a Map, although it is possible for several keys to be associated with the same value.
  – The Map interface requires the methods put() and get().


● Map<String, Integer> numbers = new TreeMap<String, Integer>();

● numbers.put("five", 5);

● numbers.put("twelve", 12);

● numbers.put("a dozen", 12);

● numbers.get("twelve") returns the Integer 12.
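The same example as a compilable unit. The wrapper class NumbersDemo and the method build() are hypothetical; the map contents come from the slide.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper around the slide's example.
class NumbersDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> numbers = new TreeMap<>();
        numbers.put("five", 5);
        numbers.put("twelve", 12);
        numbers.put("a dozen", 12);   // two keys may share the same value
        return numbers;
    }
}
```

Calling put() again with an existing key replaces that key's value instead of adding a second entry, and get() returns null for a key that is not in the map.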


● Data structures for Maps are the same as those used for Sets, with a small amount of extra work needed to keep track of the values in addition to the keys.


Summary

● Set interface
  – OrderedList
    ● An ordered list is a linked list in which the elements are in order.
      – It should be used only for very small sets.
  – BinarySearchTree
    ● A binary tree where all of the elements in the left subtree are less than the root and all the elements in the right subtree are greater than the root.
      – Average performance is good, but it performs very poorly when the data are already sorted, a common situation.


  – HashTable
    ● Stores elements in an array, using a hash function to determine where in the array to place each element.
      – Since there are more potential elements than positions in the array, collisions can occur.
      – Three approaches to collision resolution are:
        ● Chaining
        ● Linear probing
        ● Double hashing
      – Hash tables have extremely good performance on average.


● The Java collections framework defines a Set interface, implemented by TreeSet and HashSet.

● It also defines a Map interface for associating keys with values, implemented by TreeMap and HashMap.


Chapter 11 Self-Study Homework

● Pages: 296-318

● Exercises: 11.5, 11.7, 11.8, 11.9, 11.17, 11.22