Building Java Programs Chapter 18 Advanced Data Structures: Hashing and Heaps Copyright (c) Pearson...
Building Java Programs
Chapter 18
Advanced Data Structures: Hashing and Heaps
Copyright (c) Pearson 2013. All rights reserved.
Hashing
Reading: 18.1
3
Recall: ADTs
• abstract data type (ADT): A specification of a collection of data and the operations that can be performed on it.
– Describes what a collection does, not how it does it.
• Java's collection framework describes ADTs with interfaces:
– Collection, Deque, List, Map, Queue, Set, SortedMap
• An ADT can be implemented in multiple ways by classes:
– ArrayList and LinkedList implement List
– HashSet and TreeSet implement Set
– LinkedList, ArrayDeque, etc. implement Queue
4
SearchTree as a set
• We implemented a class SearchTree to store a BST of ints:
• Our BST is essentially a set of integers. Operations we support:
– add
– contains
– remove
...
• But there are other ways to implement a set...
[Figure: BST referenced by overallRoot, with root 55, children 29 and 87, and leaves -3, 42, 60, 91]
5
Sets
• set: A collection of unique values (no duplicates allowed) that can perform the following operations efficiently:
– add, remove, search (contains)
– The client doesn't think of a set as having indexes; we just add things to the set in general and don't worry about order.
[Figure: a set containing "the", "of", "from", "to", "she", "you", "him", "why", "in", "down", "by", "if"]
set.contains("to")   // true
set.contains("be")   // false
6
Int Set ADT interface
• Let's think about how to write our own implementation of a set.
– To simplify the problem, we only store ints in our set for now.
– As is (usually) done in the Java Collection Framework, we will define sets as an ADT by creating a Set interface.
– Core operations are: add, contains, remove.
public interface IntSet {
    void add(int value);
    boolean contains(int value);
    void clear();
    boolean isEmpty();
    void remove(int value);
    int size();
}
7
Unfilled array set
• Consider storing a set in an unfilled array.
– It doesn't really matter what order the elements appear in a set, so long as they can be added and searched quickly.
– What would make a good ordering for the elements?
• If we store them in the next available index, as in a list, ...
set.add(9);
set.add(23);
set.add(8);
set.add(-3);
set.add(49);
set.add(12);
– How efficient is add? contains? remove?
    • O(1), O(N), O(N)
    • (contains must loop over the array; remove must shift elements.)
index  0  1  2  3  4  5  6  7  8  9
value  9 23  8 -3 49 12  0  0  0  0
size   6
8
Sorted array set
• Suppose we store the elements in an unfilled array, but in sorted order rather than order of insertion.
set.add(9);
set.add(23);
set.add(8);
set.add(-3);
set.add(49);
set.add(12);
– How efficient is add? contains? remove?
    • O(N), O(log N), O(N)
    • (You can do an O(log N) binary search to find elements in contains, and to find the proper index in add/remove; but add/remove still need to shift elements right/left to make room, which is O(N) on average.)
index  0  1  2  3  4  5  6  7  8  9
value -3  8  9 12 23 49  0  0  0  0
size   6
9
A strange idea
• Silly idea: When the client adds value i, store it at index i in the array.
– Would this work?
– Problems / drawbacks of this approach? How to work around them?
set.add(7);
set.add(1);
set.add(9);
index  0  1  2  3  4  5  6  7  8  9
value  0  1  0  0  0  0  0  7  0  9
size   3
...
set.add(18);
set.add(12);
index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
value  0  1  0  0  0  0  0  7  0  9  0  0 12  0  0  0  0  0 18  0
size   5
10
Hashing
• hash: To map a large domain of values to a smaller fixed domain.
– Typically, mapping a set of elements to integer indexes in an array.
– Idea: Store any given element value in a particular predictable index.
    • That way, adding / removing / looking for it are constant-time (O(1)).
– hash table: An array that stores elements via hashing.
• hash function: An algorithm that maps values to indexes.
– hash code: The output of a hash function for a given value.
– In the previous slide, our "hash function" was: hash(i) → i
    • Potentially requires a large array (a.length > i).
    • Doesn't work for negative numbers.
    • Array could be very sparse, mostly empty (memory waste).
11
Improved hash function
• To deal with negative numbers: hash(i) → abs(i)
• To deal with large numbers: hash(i) → abs(i) % length
set.add(37);   // abs(37) % 10 == 7
set.add(-2);   // abs(-2) % 10 == 2
set.add(49);   // abs(49) % 10 == 9
// inside HashIntSet class
private int hash(int i) {
    return Math.abs(i) % elements.length;
}
index  0  1  2  3  4  5  6  7  8  9
value  0  0 -2  0  0  0  0 37  0 49
size   3
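One edge case is worth a quick experiment: Math.abs(Integer.MIN_VALUE) overflows and stays negative, so abs(i) % length can yield a negative index for that single input. The sketch below compares the slide's formula with Math.floorMod (available since Java 8), which is a swapped-in alternative and not the slides' method. The class and method names (HashIndex, absMod, floorModIndex) are hypothetical.

```java
// Hypothetical helper class for illustration; not part of the slides' HashIntSet.
class HashIndex {
    // The slides' formula: absolute value first, then remainder.
    static int absMod(int value, int length) {
        return Math.abs(value) % length;
    }

    // Alternative (an assumption, not from the slides): floorMod always
    // returns a result in 0..length-1 when length is positive.
    static int floorModIndex(int value, int length) {
        return Math.floorMod(value, length);
    }

    public static void main(String[] args) {
        System.out.println(absMod(37, 10));                       // 7
        System.out.println(absMod(-2, 10));                       // 2
        System.out.println(absMod(Integer.MIN_VALUE, 10));        // negative
        System.out.println(floorModIndex(Integer.MIN_VALUE, 10)); // 2
    }
}
```

Note that floorMod places negative values in different buckets than abs does (for example, -2 lands at index 8 rather than 2); either choice works as long as the same function is used for every operation.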
12
Sketch of implementation
public class HashIntSet implements IntSet {
    private int[] elements;
    ...
    public void add(int value) {
        elements[hash(value)] = value;
    }

    public boolean contains(int value) {
        return elements[hash(value)] == value;
    }

    public void remove(int value) {
        elements[hash(value)] = 0;
    }
}
– Runtime of add, contains, and remove: O(1) !!
    • Are there any problems with this approach?
13
Collisions
• collision: When a hash function maps 2 values to the same index.
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24!
• collision resolution: An algorithm for fixing collisions.
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 54  0  0 37  0 49
size   5
14
Probing
• probing: Resolving a collision by moving to another index.
– linear probing: Moves to the next available index (wraps if needed).
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24; must probe
– variation: quadratic probing moves increasingly far away: +1, +4, +9, ...
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54  0 37  0 49
size   5
15
Implementing HashIntSet
• Let's implement an int set using a hash table with linear probing.
– For simplicity, assume that the set cannot store 0s for now.
public class HashIntSet implements IntSet {
    private int[] elements;
    private int size;

    // constructs new empty set
    public HashIntSet() {
        elements = new int[10];
        size = 0;
    }

    // hash function maps values to indexes
    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }
    ...
16
The add operation
• How do we add an element to the hash table?
– Use the hash function to find the proper bucket index.
– If we see a 0, put it there.
– If not, move forward until we find an empty (0) index to store it.
– If we see that the value is already in the table, don't re-add it.
set.add(54);   // client code
set.add(14);
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54 14 37  0 49
size   6
17
Implementing add
• How do we add an element to the hash table?
public void add(int value) {
    int h = hash(value);
    // linear probing for an empty slot
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] != value) {   // avoid duplicates
        elements[h] = value;
        size++;
    }
}
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54  0 37  0 49
size   5
18
The contains operation
• How do we search for an element in the hash table?
– Use the hash function to find the proper bucket index.
– Loop forward until we find either the value or an empty index (0).
– If we find the value, it is contained (true). If we find 0, it is not (false).
set.contains(24)   // true
set.contains(14)   // true
set.contains(35)   // false
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54 14 37  0 49
size   6
19
Implementing contains
public boolean contains(int value) {
    int h = hash(value);
    // linear probing to search
    while (elements[h] != 0) {
        if (elements[h] == value) {
            return true;
        }
        h = (h + 1) % elements.length;
    }
    return false;   // not found
}
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54  0 37  0 49
size   5
20
The remove operation
• We cannot remove by simply zeroing out an element:
set.remove(54);     // set index 5 to 0
set.contains(14)    // false??? oops
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24  0 14 34  0 49
size   5
• Instead, we replace it by a special "removed" placeholder value
– (can be re-used on add, but keep searching on contains)
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 XX 14 34  0 49
size   5
21
Implementing remove
public void remove(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] == value) {
        elements[h] = -999;   // "removed" flag value
        size--;
    }
}
set.remove(54);   // client code
set.remove(11);
set.remove(34);
index  0  1  2  3  4    5  6  7  8  9
value  0 11  0  0 24 -999 14 34  0 49
size   5
22
Patching add, contains
private static final int REMOVED = -999;

public void add(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value && elements[h] != REMOVED) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] != value) {
        elements[h] = value;
        size++;
    }
}

// contains does not need patching;
// it should keep going on a -999, which it already does
public boolean contains(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    return elements[h] == value;
}
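One subtlety in a patched add that stops at the first "removed" slot: if the value already sits further along the probe sequence, stopping early could store it a second time. The sketch below is one possible way around that, not the slides' code: it remembers the first reusable slot but keeps scanning until it reaches an empty (0) slot. The class name ProbingIntSet and the firstFree variable are hypothetical, and the sketch assumes a fixed capacity with at least one empty slot always remaining (no rehashing).

```java
// A sketch only: fixed capacity, cannot store 0 or -999, and assumes the
// table always keeps at least one empty (0) slot so that probing terminates.
class ProbingIntSet {
    private static final int REMOVED = -999;   // "removed" placeholder
    private final int[] elements = new int[10];
    private int size = 0;

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    private boolean storable(int value) {
        return value != 0 && value != REMOVED;
    }

    public void add(int value) {
        if (!storable(value)) {
            throw new IllegalArgumentException("cannot store " + value);
        }
        int h = hash(value);
        int firstFree = -1;                    // first REMOVED slot seen, if any
        while (elements[h] != 0) {
            if (elements[h] == value) {
                return;                        // already present: no duplicate
            }
            if (elements[h] == REMOVED && firstFree < 0) {
                firstFree = h;
            }
            h = (h + 1) % elements.length;
        }
        elements[firstFree >= 0 ? firstFree : h] = value;
        size++;
    }

    public boolean contains(int value) {
        if (!storable(value)) {
            return false;
        }
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;     // keeps going past REMOVED
        }
        return elements[h] == value;
    }

    public void remove(int value) {
        if (!storable(value)) {
            return;
        }
        int h = hash(value);
        while (elements[h] != 0 && elements[h] != value) {
            h = (h + 1) % elements.length;
        }
        if (elements[h] == value) {
            elements[h] = REMOVED;
            size--;
        }
    }

    public int size() {
        return size;
    }
}
```

With 24, 54, and 14 added (indexes 4, 5, 6) and 54 then removed, calling add(14) again leaves the size at 2, because the scan continues past the placeholder at index 5 and finds 14 at index 6.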
23
Problem: full array
• clustering: Clumps of elements at neighboring indexes.
– Slows down the hash table lookup; you must loop through them.
set.add(11);
set.add(49);
set.add(24);
set.add(37);
set.add(54);   // collides with 24
set.add(14);   // collides with 24, then 54
set.add(86);   // collides with 14, then 37
    • Where does each value go in the array?
    • How many indexes must be examined to answer contains(94)?
    • What will happen if the array completely fills?
index  0  1  2  3  4  5  6  7  8  9
value  0  0  0  0  0  0  0  0  0  0
size   0
24
Rehashing
• rehash: Growing to a larger array when the table is too full.
– Cannot simply copy the old array to a new one. (Why not?)
• load factor: ratio of (# of elements) / (hash table length)
– many collections rehash when load factor ≅ .75
Table of length 20:
index  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
value  0  0  0  0 24  0 66  0 48  0  0 11  0  0 54 95 14 37  0  0
size   8
Table of length 10:
index  0  1  2  3  4  5  6  7  8  9
value 95 11  0  0 24 54 14 37 66 48
size   8
25
Implementing rehash
// Grows hash table to twice its original size.
private void rehash() {
    int[] old = elements;
    elements = new int[2 * old.length];
    size = 0;
    for (int value : old) {
        if (value != 0 && value != REMOVED) {
            add(value);
        }
    }
}

public void add(int value) {
    if ((double) size / elements.length >= 0.75) {
        rehash();
    }
    ...
}
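To see why the old array cannot simply be copied, it may help to print where a few values land before and after the table doubles. This is a small illustrative sketch; the class name RehashIndexDemo is hypothetical and its hash method mirrors the slides' abs(value) % length with the length passed in explicitly.

```java
// Hypothetical demo class; hash mirrors the slides' abs(value) % length.
class RehashIndexDemo {
    static int hash(int value, int length) {
        return Math.abs(value) % length;
    }

    public static void main(String[] args) {
        int[] values = {24, 54, 14, 95};
        for (int v : values) {
            // The preferred index depends on the table length, so every
            // element has to be re-added after the array grows.
            System.out.println(v + ": index " + hash(v, 10)
                    + " at length 10, index " + hash(v, 20) + " at length 20");
        }
    }
}
```

For instance, 54 prefers index 4 in a table of length 10 but index 14 in a table of length 20, so a straight copy would leave it where contains would never look.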
26
Hash table sizes
• Can use prime numbers as hash table sizes to reduce collisions.
• Also improves spread / reduces clustering on rehash.
set.add(11);    // 11 % 13 == 11
set.add(39);    // 39 % 13 == 0
set.add(21);    // 21 % 13 == 8
set.add(29);    // 29 % 13 == 3
set.add(71);    // 71 % 13 == 6
set.add(41);    // 41 % 13 == 2
set.add(101);   // 101 % 13 == 10
index  0  1  2  3  4  5  6  7  8  9  10 11 12
value 39  0 41 29  0  0 71  0 21  0 101 11  0
size   7
27
Other details
• How would we implement toString on our HashIntSet?
System.out.println(set);
// [11, 24, 54, 37, 49]
index  0  1  2  3  4  5  6  7  8  9
value  0 11  0  0 24 54  0 37  0 49
size   5
28
Separate chaining
• separate chaining: Solving collisions by storing a list at each index.
– add/contains/remove must traverse lists, but the lists are short
– impossible to "run out" of indexes, unlike with probing
private class Node {
    public int data;
    public Node next;
    ...
}
[Figure: hash table with indexes 0 to 9; index 1 holds 11, index 4 holds a list containing 24, 14, and 54, index 7 holds 7, index 9 holds 49]
29
Implementing HashIntSet
• Let's implement a hash set of ints using separate chaining.
public class HashIntSet implements IntSet {
    // array of linked lists;
    // elements[i] = front of list #i (null if empty)
    private Node[] elements;
    private int size;

    // constructs new empty set
    public HashIntSet() {
        elements = new Node[10];
        size = 0;
    }

    // hash function maps values to indexes
    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }
    ...
30
The add operation
• How do we add an element to the hash table?
– When you want to modify a linked list, you must either change the list's front reference, or the next field of a node in the list.
– Where in the list should we add the new element?
– Must make sure to avoid duplicates.
set.add(24);
[Figure: hash table with indexes 0 to 9; a new node for 24 is linked at the front of the list at index 4, ahead of 14 and 54; index 1 holds 11, index 7 holds 7, index 9 holds 49]
31
Implementing add
public void add(int value) {
    if (!contains(value)) {
        // add to front of list #h
        int h = hash(value);
        Node newNode = new Node(value);
        newNode.next = elements[h];
        elements[h] = newNode;
        size++;
    }
}
32
The contains operation
• How do we search for an element in the hash table?
– Must loop through the linked list for the appropriate hash index, looking for the desired value.
– Looping through a linked list requires a "current" node reference.
set.contains(14)   // true
set.contains(84)   // false
set.contains(53)   // false
[Figure: the same chained hash table, with a "current" reference walking the list at index 4]
33
Implementing contains
public boolean contains(int value) {
    Node current = elements[hash(value)];
    while (current != null) {
        if (current.data == value) {
            return true;
        }
        current = current.next;
    }
    return false;
}
34
The remove operation
• How do we remove an element from the hash table?
– Cases to consider: front (24), non-front (14), not found (94), null (32)
– To remove a node from a linked list, you must either change the list's front reference, or the next field of the previous node in the list.
set.remove(54);
[Figure: the same chained hash table, with a "current" reference walking the list at index 4]
35
Implementing remove
public void remove(int value) {
    int h = hash(value);
    if (elements[h] != null && elements[h].data == value) {
        elements[h] = elements[h].next;   // front case
        size--;
    } else {
        Node current = elements[h];       // non-front case
        while (current != null && current.next != null) {
            if (current.next.data == value) {
                current.next = current.next.next;
                size--;
                return;
            }
            current = current.next;
        }
    }
}
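For experimenting with the four removal cases (front, non-front, not found, empty bucket), the add, contains, and remove pieces can be assembled into one small class. The following is a sketch under stated assumptions rather than the book's final class: ChainedIntSet is a hypothetical name, the table has a fixed length of 10, and there is no rehashing.

```java
// Sketch assembling add/contains/remove with separate chaining; no rehashing.
class ChainedIntSet {
    private static class Node {
        int data;
        Node next;

        Node(int data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    private final Node[] elements = new Node[10];   // elements[i] = front of list #i
    private int size = 0;

    private int hash(int value) {
        return Math.abs(value) % elements.length;
    }

    public boolean contains(int value) {
        for (Node cur = elements[hash(value)]; cur != null; cur = cur.next) {
            if (cur.data == value) {
                return true;
            }
        }
        return false;
    }

    public void add(int value) {
        if (!contains(value)) {
            int h = hash(value);
            elements[h] = new Node(value, elements[h]);   // add to front
            size++;
        }
    }

    public void remove(int value) {
        int h = hash(value);
        Node cur = elements[h];
        if (cur == null) {
            return;                                       // empty bucket
        }
        if (cur.data == value) {                          // front case
            elements[h] = cur.next;
            size--;
            return;
        }
        while (cur.next != null) {                        // non-front case
            if (cur.next.data == value) {
                cur.next = cur.next.next;
                size--;
                return;
            }
            cur = cur.next;
        }
    }

    public int size() {
        return size;
    }
}
```

Adding 14, 54, and 24 puts all three in bucket 4; removing 24 then exercises the front case, removing 14 the non-front case, removing 94 the not-found case, and removing 32 the empty-bucket case.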
36
Rehashing w/ chaining
• Separate chaining handles rehashing similarly to linear probing.
– Loop over the list in each hash bucket; re-add each element.
– An optimal implementation re-uses node objects, but this is optional.
[Figure: the elements 11, 24, 54, 14, 7, 49 stored in a chained table with indexes 0 to 9, and the same elements after rehashing into a table with indexes 0 to 19]
37
Hash set of objects
public class HashSet<E> implements Set<E> {
    ...
    private class Node {
        public E data;
        public Node next;
    }
}
• It is easy to hash an integer i (use index abs(i) % length).
– How can we hash other types of values (such as objects)?
38
The hashCode method
• All Java objects contain the following method:
public int hashCode()
Returns an integer hash code for this object.
– We can call hashCode on any object to find its preferred index.
– HashSet, HashMap, and the other built-in "hash" collections call hashCode internally on their elements to store the data.
• We can modify our set's hash function to be the following:
private int hash(E e) {
    return Math.abs(e.hashCode()) % elements.length;
}
39
Issues with generics
• You must make an unusual cast on your array of generic nodes:
public class HashSet<E> implements Set<E> {
    private Node[] elements;
    ...
    public HashSet() {
        elements = (Node[]) new HashSet.Node[10];
    }
    ...
}
• Perform all element comparisons using equals:
public boolean contains(E value) {
    ...
    // if (current.data == value) {
    if (current.data.equals(value)) {
        return true;
    }
    ...
40
Implementing hashCode
• You can write your own hashCode methods in classes you write.
– All classes come with a default version based on memory address.
– Your overridden version should somehow "add up" the object's state.
    • Often you scale/multiply parts of the result to distribute the results.
public class Point {
    private int x;
    private int y;
    ...
    public int hashCode() {
        // better than just returning (x + y);
        // spreads out numbers, fewer collisions
        return 137 * x + 23 * y;
    }
}
41
Good hashCode behavior
• A well-written hashCode method has:
– Consistency with itself (must produce the same result on each call):
o.hashCode() == o.hashCode(), if o's state doesn't change
– Consistency with equality:
a.equals(b) must imply that a.hashCode() == b.hashCode(),
!a.equals(b) does NOT necessarily imply that a.hashCode() != b.hashCode() (why not?)
    • When your class has an equals or hashCode, it should have both.
– Good distribution of hash codes:
    • For a large set of objects with distinct states, they will generally return unique hash codes rather than all colliding into the same hash bucket.
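The "consistency with equality" rule can be checked with a short experiment: when equals and hashCode are both derived from the same fields, a java.util.HashSet can find an equal-but-distinct object. The sketch below uses a hypothetical GridPoint class (modeled loosely on the Point example, with the same 137 and 23 multipliers) so that it is self-contained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class for illustration; equals and hashCode use the same fields.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof GridPoint)) {
            return false;
        }
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 137 * x + 23 * y;   // equal points always produce equal codes
    }
}

class GridPointDemo {
    public static void main(String[] args) {
        Set<GridPoint> points = new HashSet<>();
        points.add(new GridPoint(3, 4));
        // A different object with the same state is still found.
        System.out.println(points.contains(new GridPoint(3, 4)));   // true
    }
}
```

If hashCode were left at its default while equals was overridden, the two equal points would usually land in different buckets and the lookup would typically fail.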
42
Example: String hashCode
• The hashCode function inside a String object looks like this:
public int hashCode() {
    int hash = 0;
    for (int i = 0; i < this.length(); i++) {
        hash = 31 * hash + this.charAt(i);
    }
    return hash;
}
– As with any general hashing function, collisions are possible.
    • Example: "Ea" and "FB" have the same hash value.
– Early versions of Java examined only the first 16 characters. For some common data this led to poor hash table performance.
43
hashCode tricks
• If one of your object's fields is an object, call its hashCode:
public int hashCode() {   // Student
    return 531 * firstName.hashCode() + ...;
}
• To incorporate a double or boolean, use the hashCode method from the Double or Boolean wrapper classes:
public int hashCode() {   // BankAccount
    return 37 * Double.valueOf(balance).hashCode()
            + Boolean.valueOf(isCheckingAccount).hashCode();
}
• Guava includes an Objects.hashCode(...) method that takes any number of values and combines them into one hash code.
public int hashCode() {   // BankAccount
    return Objects.hashCode(name, id, balance);
}
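For projects that do not use Guava, the standard library offers a similar helper: java.util.Objects.hash(...), available since Java 7. The sketch below is an illustration rather than the slides' code; the BankAccount fields (name, id, balance) are assumptions taken from the names used above.

```java
import java.util.Objects;

// Hypothetical class; the field names mirror those used on the slide.
final class BankAccount {
    private final String name;
    private final int id;
    private final double balance;

    BankAccount(String name, int id, double balance) {
        this.name = name;
        this.id = id;
        this.balance = balance;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof BankAccount)) {
            return false;
        }
        BankAccount other = (BankAccount) o;
        return id == other.id
                && Double.compare(balance, other.balance) == 0
                && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Combines the same fields that equals compares.
        return Objects.hash(name, id, balance);
    }
}
```

Objects.hash boxes its arguments into an array, so it trades a little speed for brevity; a hand-written combination such as the ones above avoids that cost.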
44
Implementing a hash map
• A hash map is like a set where the nodes store key/value pairs:
public class HashMap<K, V> implements Map<K, V> {
    ...
}
//      key      value
map.put("Marty", 14);
map.put("Jeff", 21);
map.put("Kasey", 20);
map.put("Stef", 35);
– Must modify your Node class to store a key and a value
[Figure: hash table with indexes 0 to 9 whose nodes hold the pairs "Jeff" 21, "Marty" 14, "Kasey" 20, and "Stef" 35]
45
Map ADT interface
• Let's think about how to write our own implementation of a map.
– As is (usually) done in the Java Collection Framework, we will define map as an ADT by creating a Map interface.
– Core operations: put (add), get, contains key, remove
public interface Map<K, V> {
    void clear();
    boolean containsKey(K key);
    V get(K key);
    boolean isEmpty();
    void put(K key, V value);
    void remove(K key);
    int size();
}
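As a rough illustration of how put, get, and containsKey could sit on top of separate chaining, here is a minimal sketch. It is not the book's HashMap: the name SimpleHashMap is hypothetical, the table length is fixed at 10 with no rehashing, null keys are not supported, and Math.floorMod is used in place of abs(...) % length to keep the index non-negative.

```java
// Minimal chained hash map sketch: fixed length, no rehashing, no null keys.
class SimpleHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[10];
    private int size = 0;

    // Hashing is always done on the key, never on the value.
    private int hash(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    private Node<K, V> find(K key) {
        for (Node<K, V> cur = buckets[hash(key)]; cur != null; cur = cur.next) {
            if (cur.key.equals(key)) {
                return cur;
            }
        }
        return null;
    }

    public void put(K key, V value) {
        Node<K, V> node = find(key);
        if (node != null) {
            node.value = value;                  // key exists: replace old value
        } else {
            int h = hash(key);
            buckets[h] = new Node<>(key, value, buckets[h]);
            size++;
        }
    }

    public V get(K key) {
        Node<K, V> node = find(key);
        return node == null ? null : node.value;
    }

    public boolean containsKey(K key) {
        return find(key) != null;
    }

    public int size() {
        return size;
    }
}
```

Calling put("Bill", 49) and then put("Bill", 66) on such a map leaves one entry whose value is 66, since the second call finds the existing node and overwrites its value.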
46
Hash map vs. hash set
– The hashing is always done on the keys, not the values.
– The contains method is now containsKey; there and in remove, you search for a node whose key matches a given key.
– The add method is now put; if the given key is already there, you must replace its old value with the new one.
    • map.put("Bill", 66);   // replace 49 with 66
[Figure: hash table with indexes 0 to 9 whose nodes hold the pairs "Jeff" 21, "Marty" 14, "Kasey" 20, "Stef" 35, "Abby" 57, and "Bill" 49, with Bill's value being replaced by 66]
Priority Queues and Heaps
Reading: 18.2
48
Prioritization problems
• print jobs: CSE lab printers constantly accept and complete jobs from all over the building. We want to print faculty jobs before staff before student jobs, and grad students before undergrad, etc.
• ER scheduling: Scheduling patients for treatment in the ER. A gunshot victim should be treated sooner than a guy with a cold, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive?
• key operations we want:
– add an element (print job, patient, etc.)
– get/remove the most "important" or "urgent" element
49
Priority Queue ADT
• priority queue: A collection of ordered elements that provides fast access to the minimum (or maximum) element.
– add: adds in order
– peek: returns minimum or "highest priority" value
– remove: removes/returns minimum value
– isEmpty, clear, size, iterator: O(1)
pq.add("if");
pq.add("from");
...
[Figure: a priority queue containing "the", "of", "from", "to", "she", "you", "him", "why", "in", "down", "by", "if"]
pq.remove()   // "by"
50
Unfilled array?
• Consider using an unfilled array to implement a priority queue.
– add: Store it in the next available index, as in a list.
– peek: Loop over elements to find minimum element.
– remove: Loop over elements to find min. Shift to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
– How efficient is add? peek? remove?
    • O(1), O(N), O(N)
    • (peek must loop over the array; remove must shift elements)
index  0  1  2  3  4  5  6  7  8  9
value  9 23  8 -3 49 12  0  0  0  0
size   6
51
Sorted array?
• Consider using a sorted array to implement a priority queue.
– add: Store it in the proper index to maintain sorted order.
– peek: Minimum element is in index [0].
– remove: Shift elements to remove min from index [0].
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
– How efficient is add? peek? remove?
    • O(N), O(1), O(N)
    • (add and remove must shift elements)
index  0  1  2  3  4  5  6  7  8  9
value -3  8  9 12 23 49  0  0  0  0
size   6
52
Linked list?
• Consider using a doubly linked list to implement a priority queue.
– add: Store it at the end of the linked list.
– peek: Loop over elements to find minimum element.
– remove: Loop over elements to find min. Unlink to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
– How efficient is add? peek? remove?
    • O(1), O(N), O(N)
    • (peek and remove must loop over the linked list)
[Figure: linked list from front to back: 9, 23, 8, -3, 49, 12]
53
Sorted linked list?
• Consider using a sorted linked list to implement a priority queue.
– add: Store it in the proper place to maintain sorted order.
– peek: Minimum element is at the front.
– remove: Unlink front element to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
– How efficient is add? peek? remove?
    • O(N), O(1), O(1)
    • (add must loop over the linked list to find the proper insertion point)
[Figure: linked list from front to back: -3, 8, 9, 12, 23, 49]
54
Binary search tree?
• Consider using a binary search tree to implement a PQ.
– add: Store it in the proper BST L/R-ordered spot.
– peek: Minimum element is at the far left edge of the tree.
– remove: Unlink far left element to remove.
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
– How efficient is add? peek? remove?
    • O(log N), O(log N), O(log N)...?
    • (good in theory, but the tree tends to become unbalanced to the right)
[Figure: the resulting BST, rooted at 9 and containing -3, 8, 12, 23, 49]
55
Unbalanced binary tree
queue.add(9);
queue.add(23);
queue.add(8);
queue.add(-3);
queue.add(49);
queue.add(12);
queue.remove();
queue.add(16);
queue.add(34);
queue.remove();
queue.remove();
queue.add(42);
queue.add(45);
queue.remove();
– Simulate these operations. What is the tree's shape?
– A tree that is unbalanced has a height close to N rather than log N, which breaks the expected runtime of many operations.
[Figure: an unbalanced tree containing 49, 23, 12, 16, 34, 42, 45]
56
Heaps
• heap: A complete binary tree with vertical ordering.
– complete tree: Every level is full except possibly the lowest level, which must be filled from left to right
    • (i.e., a node may not have any children until all possible siblings exist)
57
Heap ordering
• heap ordering: P ≤ X for every element X with parent P.
– Parents' values are never larger than those of their children.
– Implies that the minimum element is always the root (a "min-heap").
    • variation: a "max-heap" stores the largest element at the root and reverses the ordering
– Is a heap a BST? How are they related?
58
Which are min-heaps?
[Figure: several candidate binary trees built from values such as 10, 20, 80, 40, 60, 99, 50, 700, 85; the trees that are not min-heaps are marked "no"]
59
Which are max-heaps?
[Figure: several candidate binary trees; the trees that are not max-heaps are marked "no"]
60
Heap height and runtime
• The height of a complete tree with N nodes is always ⌊log₂ N⌋, which is O(log N).
– How do we know this for sure?
• Because of this, if we implement a priority queue using a heap, we can provide the following runtime guarantees:
– add: O(log N)
– peek: O(1)
– remove: O(log N)
For an n-node complete tree of height h: $2^h \le n \le 2^{h+1} - 1$, so $h = \lfloor \log_2 n \rfloor$.
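The bound on the height can be justified by counting nodes level by level. The derivation below is a sketch of that argument, with $h$ the height (the root is at level 0) and $n$ the number of nodes.

```latex
% Levels 0 through h-1 of a complete tree are full:
\sum_{k=0}^{h-1} 2^k = 2^h - 1 \text{ nodes.}

% Level h holds at least 1 and at most 2^h nodes, so
(2^h - 1) + 1 \;\le\; n \;\le\; (2^h - 1) + 2^h
\quad\Longrightarrow\quad
2^h \;\le\; n \;\le\; 2^{h+1} - 1 .

% Taking base-2 logarithms of 2^h \le n < 2^{h+1}:
h \;\le\; \log_2 n \;<\; h + 1
\quad\Longrightarrow\quad
h = \lfloor \log_2 n \rfloor .
```

Since add and remove each move an element along a single root-to-leaf path, they perform at most $h$ swaps, which is where the O(log N) guarantees come from.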
61
The add operation
• When an element is added to a heap, where should it go?
– Must insert a new node while maintaining heap properties.
queue.add(15);
[Figure: min-heap in level order: 10 | 20 80 | 40 60 85 99 | 50 700 65, with a new node 15 waiting to be added]
62
The add operation
• When an element is added to a heap, it should be initially placed as the rightmost leaf (to maintain the completeness property).
– But the heap ordering property becomes broken!
[Figure: before, in level order: 10 | 20 80 | 40 60 85 99 | 50 700 65; after placing 15 as the rightmost leaf: 10 | 20 80 | 40 60 85 99 | 50 700 65 15]
63
"Bubbling up" a node
• bubble up: To restore heap ordering, the newly added element is shifted ("bubbled") up the tree until it reaches its proper place.
– Weiss: "percolate up" by swapping with its parent
– How many bubble-ups are necessary, at most?
[Figure: before bubbling up, in level order: 10 | 20 80 | 40 60 85 99 | 50 700 65 15; after: 10 | 15 80 | 40 20 85 99 | 50 700 65 60]
64
Bubble-up exercise
• Draw the tree state of a min-heap after adding these elements:
– 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2
[Figure: resulting min-heap in level order: 2 | 19 6 | 25 42 11 104 | 76 50 55 88 20]
65
The peek operation
• A peek on a min-heap is trivial to perform.
– because of heap properties, the minimum element is always the root
– O(1) runtime
• Peek on a max-heap would be O(1) as well (return max, not min)
[Figure: min-heap with 10 at the root]
66
The remove operation
• When an element is removed from a heap, what should we do?
– The root is the node to remove. How do we alter the tree?
queue.remove();
[Figure: min-heap with root 10, children 20 and 80, and rightmost leaf 65]
67
The remove operation
• When the root is removed from a heap, it should be initially replaced by the rightmost leaf (to maintain completeness).
– But the heap ordering property becomes broken!
[Figure: the rightmost leaf, 65, moves up to replace the removed root 10, leaving 65 above its children 20 and 80]
68
"Bubbling down" a node
• bubble down: To restore heap ordering, the new improper root is shifted ("bubbled") down the tree until it reaches its proper place.
– Weiss: "percolate down" by swapping with its smaller child (why?)
– How many bubble-downs are necessary, at most?
[Figure: before bubbling down, in level order: 65 | 20 80 | 40 60 85 99 | 74 50; after: 20 | 40 80 | 50 60 85 99 | 74 65]
69
Bubble-down exercise
• Suppose we have the min-heap shown below.
• Show the state of the heap tree after remove has been called 3 times, and which elements are returned by the removal.
[Figure: min-heap in level order: 2 | 19 6 | 25 42 11 104 | 76 50 55 88 20]
70
Array heap implementation
• Though a heap is conceptually a binary tree, since it is a complete tree, when implementing it we actually can "cheat" and just use an array!
– index of root = 1 (leave 0 empty to simplify the math)
– for any node n at index i:
    • index of n.left = 2i
    • index of n.right = 2i + 1
    • parent index of n?
– This array representation is elegant and efficient (O(1)) for common tree operations.
71
Implementing HeapPQ
• Let's implement an int priority queue using a min-heap array.
public class HeapIntPriorityQueue implements IntPriorityQueue {
    private int[] elements;
    private int size;

    // constructs a new empty priority queue
    public HeapIntPriorityQueue() {
        elements = new int[10];
        size = 0;
    }

    ...
}
72
Helper methods
• Since we will treat the array as a complete tree/heap, and walk up/down between parents/children, these methods are helpful:
// helpers for navigating indexes up/down the tree
private int parent(int index)           { return index / 2; }
private int leftChild(int index)        { return index * 2; }
private int rightChild(int index)       { return index * 2 + 1; }
private boolean hasParent(int index)    { return index > 1; }
private boolean hasLeftChild(int index) {
    return leftChild(index) <= size;
}
private boolean hasRightChild(int index) {
    return rightChild(index) <= size;
}
private void swap(int[] a, int index1, int index2) {
    int temp = a[index1];
    a[index1] = a[index2];
    a[index2] = temp;
}
73
Implementing add
• Let's write the code to add an element to the heap:
public void add(int value) {
    ...
}
[Figure: before bubbling up, in level order: 10 | 20 80 | 40 60 85 99 | 50 700 65 15; after: 10 | 15 80 | 40 20 85 99 | 50 700 65 60]
74
Implementing add
// Adds the given value to this priority queue in order.
public void add(int value) {
    elements[size + 1] = value;   // add as rightmost leaf

    // "bubble up" as necessary to fix ordering
    int index = size + 1;
    boolean found = false;
    while (!found && hasParent(index)) {
        int parent = parent(index);
        if (elements[index] < elements[parent]) {
            swap(elements, index, parent(index));
            index = parent(index);
        } else {
            found = true;   // found proper location; stop
        }
    }

    size++;
}
75
Resizing a heap
• What if our array heap runs out of space?
– We must enlarge it.
– When enlarging hash sets, we needed to carefully rehash the data.
– What must we do here?
– (We can simply copy the data into a larger array.)
76
Modified add code
// Adds the given value to this priority queue in order.
public void add(int value) {
    // resize to enlarge the heap if necessary
    if (size == elements.length - 1) {
        elements = Arrays.copyOf(elements, 2 * elements.length);
    }
    ...
}
77
Implementing peek
• Let's write code to retrieve the minimum element in the heap:
public int peek() {
    ...
}
[Figure: min-heap in level order: 10 | 15 80 | 40 20 85 99 | 50 700 65 60]
78
Implementing peek
// Returns the minimum element in this priority queue.
// precondition: queue is not empty
public int peek() {
    return elements[1];
}
79
Implementing remove
• Let's write code to remove the minimum element in the heap:
public int remove() {
    ...
}
[Figure: the rightmost leaf, 65, moves up to replace the removed root 10, leaving 65 above its children 20 and 80]
80
Implementing remove
public int remove() {   // precondition: queue is not empty
    int result = elements[1];   // last leaf -> root
    elements[1] = elements[size];
    size--;

    int index = 1;   // "bubble down" to fix ordering
    boolean found = false;
    while (!found && hasLeftChild(index)) {
        int left = leftChild(index);
        int right = rightChild(index);
        int child = left;
        if (hasRightChild(index) && elements[right] < elements[left]) {
            child = right;
        }
        if (elements[index] > elements[child]) {
            swap(elements, index, child);
            index = child;
        } else {
            found = true;   // found proper location; stop
        }
    }
    return result;
}
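To try the heap operations end to end, add, peek, and remove can be combined into one compact class. The sketch below follows the same array layout (root at index 1, children at 2i and 2i + 1, parent at i / 2) but inlines the index helpers; the class name MinHeapIntPQ is hypothetical, and throwing NoSuchElementException on an empty queue is an assumption, since the slides state emptiness only as a precondition.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Compact min-heap sketch; index 0 of the array is left unused.
class MinHeapIntPQ {
    private int[] elements = new int[10];
    private int size = 0;

    public void add(int value) {
        if (size == elements.length - 1) {           // full: copy to a larger array
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        int index = ++size;
        elements[index] = value;                     // place as rightmost leaf
        while (index > 1 && elements[index] < elements[index / 2]) {
            swap(index, index / 2);                  // bubble up past larger parent
            index /= 2;
        }
    }

    public int peek() {
        if (size == 0) {
            throw new NoSuchElementException("queue is empty");
        }
        return elements[1];
    }

    public int remove() {
        int result = peek();
        elements[1] = elements[size--];              // last leaf replaces root
        int index = 1;
        while (2 * index <= size) {                  // while a left child exists
            int child = 2 * index;
            if (child + 1 <= size && elements[child + 1] < elements[child]) {
                child++;                             // right child is smaller
            }
            if (elements[index] <= elements[child]) {
                break;                               // proper location found
            }
            swap(index, child);                      // bubble down
            index = child;
        }
        return result;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    private void swap(int i, int j) {
        int temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }
}
```

Adding the exercise values 6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2 and then calling remove repeatedly returns them in ascending order, starting with 2, 6, 11.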
81
Int PQ ADT interface
• Let's write our own implementation of a priority queue.
– To simplify the problem, we only store ints in our priority queue for now.
– As is (usually) done in the Java Collection Framework, we will define priority queues as an ADT by creating an IntPriorityQueue interface.
– Core operations are: add, peek (at min), remove (min).
public interface IntPriorityQueue {
    void add(int value);
    void clear();
    boolean isEmpty();
    int peek();     // return min element
    int remove();   // remove/return min element
    int size();
}
82
Generic PQ ADT
• Let's modify our priority queue so it can store any type of data.
– As with past collections, we will use Java generics (a type parameter).
public interface PriorityQueue<E> {
    void add(E value);
    void clear();
    boolean isEmpty();
    E peek();     // return min element
    E remove();   // remove/return min element
    int size();
}
83
Generic HeapPQ class
• We can modify our heap priority class to use generics as usual...
public class HeapPriorityQueue<E> implements PriorityQueue<E> {
    private E[] elements;
    private int size;

    // constructs a new empty priority queue
    public HeapPriorityQueue() {
        elements = (E[]) new Object[10];
        size = 0;
    }

    ...
}
84
Problem: ordering elements
// Adds the given value to this priority queue in order.
public void add(E value) {
    ...
    int index = size + 1;
    boolean found = false;
    while (!found && hasParent(index)) {
        int parent = parent(index);
        if (elements[index] < elements[parent]) {   // error
            swap(elements, index, parent(index));
            index = parent(index);
        } else {
            found = true;   // found proper location; stop
        }
    }
}
– Even changing the < to a compareTo call does not work.
    • Java cannot be sure that type E has a compareTo method.
85
Comparing objects
• Heaps rely on being able to order their elements.
• Operators like < and > do not work with objects in Java.
– But we do think of some types as having an ordering (e.g. Dates).
– (In other languages, we can enable <, > with operator overloading.)
• natural ordering: Rules governing the relative placement of all values of a given type.
– Implies a notion of equality (like equals) but also < and >.
– total ordering: All elements can be arranged in A ≤ B ≤ C ≤ ... order.
– The Comparable interface provides a natural ordering.
86
The Comparable interface
• The standard way for a Java class to define a comparison function for its objects is to implement the Comparable interface.
public interface Comparable<T> {
public int compareTo(T other);
}
• A call of A.compareTo(B) should return:
a value < 0 if A comes "before" B in the ordering,
a value > 0 if A comes "after" B in the ordering,
or exactly 0 if A and B are considered "equal" in the ordering.
• Effective Java Tip #12: Consider implementing Comparable.
87
Bounded type parameters
<Type extends SuperType>
– An upper bound; accepts the given supertype or any of its subtypes.
– Works for multiple superclass/interfaces with & :
    <Type extends ClassA & InterfaceB & InterfaceC & ...>

<? super SuperType>
– A lower bound; accepts the given supertype or any of its supertypes.
– Java allows super only on wildcards (?), not on a declared type parameter.

• Example:
// can be instantiated with any animal type
public class Nest<T extends Animal> {
    ...
}
...
Nest<Bluebird> nest = new Nest<Bluebird>();
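An upper bound is what lets generic code call methods of the bound type. The sketch below uses a hypothetical helper, largest, that is not part of the slides; it compiles only because the bound guarantees that every T has a compareTo method.

```java
import java.util.Arrays;
import java.util.List;

class BoundedTypeDemo {
    // The bound T extends Comparable<T> guarantees that v.compareTo(best) exists.
    // Assumes the list is not empty.
    static <T extends Comparable<T>> T largest(List<T> values) {
        T best = values.get(0);
        for (T v : values) {
            if (v.compareTo(best) > 0) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largest(Arrays.asList(3, 42, 7)));          // 42
        System.out.println(largest(Arrays.asList("if", "to", "by")));  // to
    }
}
```

Without the bound, the compiler would reject the compareTo call for the same reason it rejects it inside an unbounded heap class.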
Corrected HeapPQ class
public class HeapPriorityQueue<E extends Comparable<E>> implements PriorityQueue<E> {
    private E[] elements;
    private int size;

    // constructs a new empty priority queue
    public HeapPriorityQueue() {
        // E now erases to Comparable, so the array must be a Comparable[];
        // casting a new Object[10] to E[] would throw ClassCastException here
        elements = (E[]) new Comparable[10];
        size = 0;
    }
    ...
    public void add(E value) {
        ...
        while (...) {
            if (elements[index].compareTo(elements[parent]) < 0) {
                swap(...);
            }
        }
    }
}
Ordering and Comparators
What's the "natural" order?
public class Rectangle implements Comparable<Rectangle> {
    private int x, y, width, height;

    public int compareTo(Rectangle other) {
        // ...?
    }
}

• What is the "natural ordering" of rectangles?
– By x, breaking ties by y?
– By width, breaking ties by height?
– By area? By perimeter?
• Do rectangles have any "natural" ordering?
– Might we want to arrange rectangles into some order anyway?
Comparator interface
public interface Comparator<T> {
    public int compare(T first, T second);
}

• A Comparator is an external object that specifies a comparison function over some other type of objects.
– Allows you to define multiple orderings for the same type.
– Allows you to define a specific ordering for a type even if there is no obvious "natural" ordering for that type.
– Allows you to externally define an ordering for a class that, for whatever reason, you are not able to modify to make it Comparable:
    • a class that is part of the Java class libraries
    • a class that is final and can't be extended
    • a class from another library or author, that you don't control
    • ...
Comparator examples
public class RectangleAreaComparator implements Comparator<Rectangle> {
    // compare in ascending order by area (WxH)
    public int compare(Rectangle r1, Rectangle r2) {
        return r1.getArea() - r2.getArea();
    }
}

public class RectangleXYComparator implements Comparator<Rectangle> {
    // compare by ascending x, break ties by y
    public int compare(Rectangle r1, Rectangle r2) {
        if (r1.getX() != r2.getX()) {
            return r1.getX() - r2.getX();
        } else {
            return r1.getY() - r2.getY();
        }
    }
}
Using Comparators
• TreeSet, TreeMap, PriorityQueue can use Comparator:
Comparator<Rectangle> comp = new RectangleAreaComparator();
Set<Rectangle> set = new TreeSet<Rectangle>(comp);
Queue<Rectangle> pq = new PriorityQueue<Rectangle>(10, comp);

• Searching and sorting methods can accept Comparators.
Arrays.binarySearch(array, value, comparator)
Arrays.sort(array, comparator)
Collections.binarySearch(list, value, comparator)
Collections.max(collection, comparator)
Collections.min(collection, comparator)
Collections.sort(list, comparator)

• Methods are provided to reverse a Comparator's ordering:
public static Comparator Collections.reverseOrder()
public static Comparator Collections.reverseOrder(comparator)
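The library calls listed above can be tried on plain strings. The following is a rough sketch: it uses String.CASE_INSENSITIVE_ORDER (a Comparator supplied by the Java library) in place of the Rectangle comparators, and the helper name sorted is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparatorUsageDemo {
    // Hypothetical helper: returns a sorted copy so the original list is untouched.
    static List<String> sorted(List<String> words, Comparator<? super String> comp) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, comp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "By", "if", "The");

        // ignore case when ordering
        System.out.println(sorted(words, String.CASE_INSENSITIVE_ORDER));
        // [By, if, The, to]

        // the same ordering, reversed
        System.out.println(sorted(words,
                Collections.reverseOrder(String.CASE_INSENSITIVE_ORDER)));
        // [to, The, if, By]

        // reverse of the natural ordering (uppercase letters sort before lowercase)
        System.out.println(sorted(words, Collections.<String>reverseOrder()));
        // [to, if, The, By]

        // a TreeSet keeps its elements in the comparator's order
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        System.out.println(set);   // [By, if, The, to]
    }
}
```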
PQ and Comparator
• Our heap priority queue currently relies on the Comparable natural ordering of its elements:
public class HeapPriorityQueue<E extends Comparable<E>>
        implements PriorityQueue<E> {
    ...
    public HeapPriorityQueue() {...}
}

• To allow other orderings, we can add a constructor that accepts a Comparator so clients can arrange elements in any order:
    ...
    public HeapPriorityQueue(Comparator<E> comp) {...}
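One possible way to support both constructors is to store a Comparator in a field and route every comparison through it. The sketch below is not the textbook's implementation: the class name ComparatorHeapSketch, the comp field, and the resizing policy are assumptions, and Comparator.naturalOrder() requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;

// Sketch only: a min-heap whose ordering is decided by a Comparator.
class ComparatorHeapSketch<E extends Comparable<E>> {
    private E[] elements;               // index 0 is unused, as in the slides
    private int size;
    private final Comparator<E> comp;   // hypothetical field name

    // Natural ordering: wrap compareTo in a Comparator.
    ComparatorHeapSketch() {
        this(Comparator.<E>naturalOrder());
    }

    @SuppressWarnings("unchecked")
    ComparatorHeapSketch(Comparator<E> comp) {
        this.comp = comp;
        this.elements = (E[]) new Comparable[10];
        this.size = 0;
    }

    boolean isEmpty() {
        return size == 0;
    }

    void add(E value) {
        if (size + 1 == elements.length) {
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        int index = ++size;
        elements[index] = value;
        // bubble up while the new value orders before its parent
        while (index > 1 && comp.compare(elements[index], elements[index / 2]) < 0) {
            swap(index, index / 2);
            index /= 2;
        }
    }

    // Removes and returns the first element in the ordering; assumes non-empty.
    E remove() {
        E result = elements[1];
        elements[1] = elements[size];
        elements[size--] = null;
        int index = 1;
        while (2 * index <= size) {
            int child = 2 * index;
            if (child < size && comp.compare(elements[child + 1], elements[child]) < 0) {
                child++;   // right child orders first
            }
            if (comp.compare(elements[index], elements[child]) <= 0) {
                break;     // heap order restored
            }
            swap(index, child);
            index = child;
        }
        return result;
    }

    private void swap(int i, int j) {
        E temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }

    public static void main(String[] args) {
        ComparatorHeapSketch<String> pq =
                new ComparatorHeapSketch<String>(Collections.<String>reverseOrder());
        pq.add("sir");
        pq.add("you");
        pq.add("meet");
        while (!pq.isEmpty()) {
            System.out.print(pq.remove() + " ");   // you sir meet
        }
        System.out.println();
    }
}
```

Keeping a single comparison path means add and remove do not need separate code for the natural-ordering case.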
PQ Comparator exercise
• Write code that stores strings in a priority queue and reads them back out in ascending order by length.
– If two strings are the same length, break the tie by ABC order.

Queue<String> pq = new PriorityQueue<String>(...);
pq.add("you");
pq.add("meet");
pq.add("madam");
pq.add("sir");
pq.add("hello");
pq.add("goodbye");
while (!pq.isEmpty()) {
    System.out.print(pq.remove() + " ");
}
// sir you meet hello madam goodbye
PQ Comparator answer
• Use the following comparator class to organize the strings:
public class LengthComparator implements Comparator<String> {
    public int compare(String s1, String s2) {
        if (s1.length() != s2.length()) {
            // if lengths are unequal, compare by length
            return s1.length() - s2.length();
        } else {
            // break ties by ABC order
            return s1.compareTo(s2);
        }
    }
}
...
Queue<String> pq = new PriorityQueue<String>(100, new LengthComparator());
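On Java 8 or later, the same ordering can also be assembled from library factory methods instead of a hand-written class. This is an alternative sketch rather than the slides' answer; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class LengthOrderDemo {
    // Shorter strings first; ties broken by the natural (ABC) order.
    static Comparator<String> byLengthThenAbc() {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        return byLength.thenComparing(Comparator.<String>naturalOrder());
    }

    // Adds the words to a priority queue, then removes them all in queue order.
    static List<String> inQueueOrder(String... words) {
        Queue<String> pq = new PriorityQueue<String>(10, byLengthThenAbc());
        Collections.addAll(pq, words);
        List<String> result = new ArrayList<String>();
        while (!pq.isEmpty()) {
            result.add(pq.remove());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
                inQueueOrder("you", "meet", "madam", "sir", "hello", "goodbye"));
        // [sir, you, meet, hello, madam, goodbye]
    }
}
```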
Heap sort
• heap sort: An algorithm to sort an array of N elements by turning the array into a heap, then calling remove N times.
– The elements will come out in sorted order.
– We can put them into a new sorted array.
– What is the runtime?
Heap sort implementation
public static void heapSort(int[] a) {
    PriorityQueue<Integer> pq = new HeapPriorityQueue<Integer>();
    for (int n : a) {
        pq.add(n);   // add each element, not the array itself
    }
    for (int i = 0; i < a.length; i++) {
        a[i] = pq.remove();
    }
}
– This code is correct and runs in O(N log N) time but wastes memory.
– It makes an entire copy of the array a into the internal heap of the priority queue.
– Can we perform a heap sort without making a copy of a?
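For readers who want to run the copying version, here is a sketch that substitutes java.util.PriorityQueue for the slides' HeapPriorityQueue class (an assumption made so the example is self-contained). The comments mark where the O(N log N) cost and the extra copy come from.

```java
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.Queue;

class CopyingHeapSort {
    static void heapSort(int[] a) {
        // the queue's internal heap holds a second copy of all N values
        Queue<Integer> pq = new PriorityQueue<Integer>();
        for (int n : a) {
            pq.add(n);            // N adds, O(log N) each
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = pq.remove();   // N removes, O(log N) each
        }
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```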
Improving the code
• Idea: Treat a itself as a max-heap, whose data starts at 0 (not 1).
– a is not actually in heap order.
– But if you repeatedly "bubble down" each non-leaf node, starting from the last one, you will eventually have a proper heap.
• Now that a is a valid max-heap:
– Call remove repeatedly until the heap is empty.
– But make it so that when an element is "removed", it is moved to the end of the array instead of completely evicted from the array.
– When you are done, voila! The array is sorted.
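The two phases described above can be sketched as follows. The method names heapSort, bubbleDown, and swap are assumptions; with the heap rooted at index 0, the children of index i are taken to be at 2i + 1 and 2i + 2.

```java
import java.util.Arrays;

class InPlaceHeapSort {
    // Sorts a into ascending order without allocating a second array.
    static void heapSort(int[] a) {
        // Phase 1: bubble down every non-leaf node, starting from the last one.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
        // Phase 2: "remove" the max by swapping it to the end of the heap
        // region, shrink the region by one, and repair the root.
        for (int end = a.length - 1; end > 0; end--) {
            swap(a, 0, end);
            bubbleDown(a, 0, end);
        }
    }

    // Restores max-heap order below index, treating a[0..size-1] as the heap.
    static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;   // pick the larger child
            }
            if (a[index] >= a[child]) {
                return;    // already in heap order
            }
            swap(a, index, child);
            index = child;
        }
    }

    static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```

A max-heap is used so that each removed maximum lands in its final position at the end of the array, which leaves the result in ascending order.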
Step 1: Build heap in-place
• "Bubble" down non-leaf nodes until the array is a max-heap:
– int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
– Swap each node with its larger child as needed.

(Figure: a drawn as a binary tree, level by level)
21
66 40
10 70 81 30
22 45 95 88 38

index   0   1   2   3   4   5   6   7   8   9  10  11
value  21  66  40  10  70  81  30  22  45  95  88  38
size   12
Build heap in-place answer
– 30: nothing to do
– 81: nothing to do
– 70: swap with 95
– 10: swap with 45
– 40: swap with 81
– 66: swap with 95, then 88
– 21: swap with 95, then 88, then 70

(Figure: the resulting max-heap, level by level)
95
88 81
45 70 40 30
22 10 66 21 38

index   0   1   2   3   4   5   6   7   8   9  10  11
value  95  88  81  45  70  40  30  22  10  66  21  38
size   12
Remove to sort
• Now that we have a max-heap, remove elements repeatedly until we have a sorted array.
– Move each removed element to the end, rather than tossing it.

(Figure: the max-heap before any removes, level by level)
95
88 81
45 70 40 30
22 10 66 21 38

index   0   1   2   3   4   5   6   7   8   9  10  11
value  95  88  81  45  70  40  30  22  10  66  21  38
size   12
Remove to sort answer
– 95: move 38 up, swap with 88, 70, 66
– 88: move 21 up, swap with 81, 40
– 81: move 38 up, swap with 70, 66
– 70: move 10 up, swap with 66, 45, 22
– ...
– (Notice that after 4 removes, the last 4 elements in the array are sorted. If we remove every element, the entire array will be sorted.)

(Figure: the heap after 4 removes, level by level; 70, 81, 88, 95 are no longer part of the heap)
66
45 40
22 38 21 30
10

index   0   1   2   3   4   5   6   7   8   9  10  11
value  66  45  40  22  38  21  30  10  70  81  88  95
(indexes 0-7 still form the heap; indexes 8-11 hold the removed values in sorted order)
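The four removes traced above can be replayed in code. This is a sketch under the same assumptions as the slides' trace (root at index 0, swap with the larger child); the names removeMax and bubbleDown are hypothetical.

```java
import java.util.Arrays;

class RemoveToSortDemo {
    // "Removes" the max of the heap a[0..size-1] by swapping it into
    // a[size-1], then repairs the smaller heap. Returns the new heap size.
    static int removeMax(int[] a, int size) {
        int last = size - 1;
        int max = a[0];
        a[0] = a[last];     // move the last leaf up to the root
        a[last] = max;      // park the removed max at the end
        bubbleDown(a, 0, last);
        return last;
    }

    // Swaps the value at index with its larger child until heap order holds.
    static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;
            }
            if (a[index] >= a[child]) {
                return;
            }
            int temp = a[index];
            a[index] = a[child];
            a[child] = temp;
            index = child;
        }
    }

    public static void main(String[] args) {
        // the max-heap from the slides, before any removes
        int[] a = {95, 88, 81, 45, 70, 40, 30, 22, 10, 66, 21, 38};
        int size = a.length;
        for (int i = 0; i < 4; i++) {
            size = removeMax(a, size);
        }
        System.out.println(Arrays.toString(a));
        // [66, 45, 40, 22, 38, 21, 30, 10, 70, 81, 88, 95]
        System.out.println("heap size: " + size);   // heap size: 8
    }
}
```

Continuing the loop until the heap size reaches 1 leaves the whole array in ascending order.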