Lecture 17 - University of California, San...
CSE 100, UCSD: LEC 17
Lecture 17
✔ Separate chaining
✔ Dictionary data types
✔ Hashtables vs. balanced search trees
✔ A hashtable implementation: java.util.Hashtable
✔ Object serialization in Java
Reading: Weiss, Ch 5; and JDK source code
Final exam
✔ Final exam time: Tue Mar 17 11:30am-2:00pm
✔ Location: CSB 002
✔ Closed book, closed notes, no calculators...
✔ Bring something to write with, and picture ID
✔ Practice final is online
✔ Final exam discussion topic on Webboard
✔ Exam review sessions will be:
✗ last lecture
✔ Coverage: All lectures, all assignments, corresponding sections of the textbook, and handouts
Open addressing vs. separate chaining
✔ Linear probing, double and random hashing are appropriate if the keys are kept as entries in the hashtable itself...
✗ doing that is called "open addressing"
✗ it is also called "closed hashing"
✔ Another idea: Entries in the hashtable are just pointers to the head of a linked list (“chain”); elements of the linked list contain the keys...
✗ this is called "separate chaining"
✗ it is also called "open hashing"
✔ Collision resolution becomes easy with separate chaining: no need to probe other table locations; just insert a key in its linked list if it is not already there.
✔ (It is possible to use fancier data structures than linked lists for this; but linked lists work very well in the average case, as we will see)
Separate chaining: basic algorithms
✔ When inserting a key K in a table with hash function H(K)
1. Set indx = H(K)
2. Insert key in linked list headed at indx. (Search the list first to avoid duplicates.)
✔ When searching for a key K in a table with hash function H(K)
1. Set indx = H(K)
2. Search for key in linked list headed at indx, using linear search.
✔ When deleting a key K in a table with hash function H(K)
1. Set indx = H(K)
2. Delete key in linked list headed at indx
✔ Advantages: average case performance stays good as number of entries approaches and even exceeds M; delete is easier to implement than with open addressing
✔ Disadvantages: requires dynamic data, requires storage for pointers in addition to data, can have poor locality which causes poor caching performance
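The three algorithms above can be sketched in Java roughly as follows. This is a minimal illustration, not the lecture's code: the class name ChainedIntSet, its method names, and the choice of int keys with H(K) = K mod M are all assumptions made for the example.

```java
import java.util.LinkedList;

// Illustrative only: a set of int keys stored with separate chaining.
// ChainedIntSet and its method names are hypothetical.
class ChainedIntSet {
    private final LinkedList<Integer>[] table; // table[i] is the chain headed at index i
    private int count = 0;                     // N, the number of keys stored

    @SuppressWarnings("unchecked")
    ChainedIntSet(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // H(K) = K mod M, kept non-negative so negative keys also work.
    private int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    // Insert: search the chain first to avoid duplicates.
    boolean insert(int key) {
        LinkedList<Integer> chain = table[hash(key)];
        if (chain.contains(key)) {
            return false;
        }
        chain.addFirst(key);
        count++;
        return true;
    }

    // Find: linear search of the one chain the key can be in.
    boolean find(int key) {
        return table[hash(key)].contains(key);
    }

    // Delete: remove the key from its chain, if present.
    boolean delete(int key) {
        boolean removed = table[hash(key)].remove(Integer.valueOf(key));
        if (removed) {
            count--;
        }
        return removed;
    }

    int size() {
        return count;
    }
}
```

Note that no probing of other table locations is needed: each operation touches exactly one chain.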
Separate chaining, an example
M = 7, H(K) = K mod M
Insert these keys 701, 145, 217, 19, 13, 749 in this table, using separate chaining:
index: 0 1 2 3 4 5 6
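One way to check the exercise is to let a few lines of Java compute the chains. This is a sketch under assumptions: the class name ChainingExample is hypothetical, and new keys are appended at the tail of each chain, which the slide does not specify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the M = 7 exercise.
class ChainingExample {
    // Returns m chains; chain i holds the keys with K mod m == i,
    // in insertion order (tail insertion is an assumption).
    static List<List<Integer>> build(int m, int[] keys) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            table.add(new ArrayList<>());
        }
        for (int k : keys) {
            List<Integer> chain = table.get(k % m); // keys here are non-negative
            if (!chain.contains(k)) {
                chain.add(k);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {701, 145, 217, 19, 13, 749};
        List<List<Integer>> table = build(7, keys);
        for (int i = 0; i < table.size(); i++) {
            System.out.println(i + ": " + table.get(i));
        }
    }
}
```

With these keys, 217 and 749 land in chain 0, 701 in chain 1, 145 and 19 in chain 5, and 13 in chain 6; chains 2, 3 and 4 stay empty.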
Analysis of separate-chaining hashing
✔ Keep in mind the load factor measure of how full the table is:
α = N/M
where M is the size of the table, and N is the number of keys that have been inserted in the table
✔ With separate chaining, it is possible to have α > 1
✔ Given a load factor α, we would like to know the time costs, in the best, average, and worst case of
✗ new-key insert and unsuccessful find (these are the same)
✗ successful find
✔ The best case is O(1) and worst case is O(N) for all of these... let’s analyze the average case
Average case costs with separate chaining
✔ Assume a table with load factor α = N/M
✔ There are N items total distributed over M linked lists (some of which may be empty), so the average number of items per linked list is: N/M = α
✔ In any unsuccessful find/insert, the hash table entry for the key is accessed; then the linked list headed there is exhaustively searched
✔ Therefore, assuming all table entries are equally likely to be hit by the hash function, the average number of steps for insert or unsuccessful find with separate chaining is
$$U_\alpha = 1 + \alpha$$
✔ In successful find, the hash table entry for the key is accessed; then the linked list headed there is linearly searched. Therefore, (with the same probabilistic assumption) the average number of steps for successful find with separate chaining is
$$S_\alpha = 1 + \frac{\alpha}{2}$$
✔ These are less than 2 and 1.5 respectively, when α < 1
✔ And these remain O(1), independent of M, even when α exceeds 1.
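To get a feel for the two formulas, here they are evaluated at two load factors. The particular values of α are chosen only for illustration; 0.75 happens to be the default load factor of java.util.Hashtable.

```latex
\begin{align*}
\alpha = 0.75 &: \quad U_\alpha = 1 + 0.75 = 1.75, \qquad S_\alpha = 1 + \tfrac{0.75}{2} = 1.375 \\
\alpha = 2    &: \quad U_\alpha = 1 + 2 = 3,       \qquad S_\alpha = 1 + \tfrac{2}{2} = 2
\end{align*}
```

Even with twice as many keys as table slots, an average find takes only a handful of steps.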
Dictionary data types
✔ A data structure is intended to hold data
✗ An insert operation inserts a data item into the structure; a find operation says whether a data item is in the structure; delete removes a data item; etc.
✔ A Dictionary is a specialized kind of data structure:
✗ A Dictionary structure is intended to hold pairs: each pair consists of a key, together with some related data
✗ An insert operation inserts a key-data pair in the table; a find operation takes a key and returns the data in the key-data pair with that key; delete takes a key and removes the key-data pair with that key; etc.
✔ Dictionaries are sometimes called "Table" or "Map" abstract data types, or "associative memories"
Dictionary as ADT
✔ Domain:
✗ a collection of pairs; each pair consists of a key, and some additional data
✔ Operations (typical):
✗ Create a table (initially empty)
✗ Insert a new key-data pair in the table; if a key-data pair with the same key is already there, update the data part of the pair
✗ Find the key-data pair in the table corresponding to a given key; return the data
✗ Delete the key-data pair corresponding to a given key
✗ Enumerate (traverse) all key-data pairs in the table
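The typical operations listed above map directly onto the java.util.Map interface. The following sketch uses java.util.HashMap as one possible implementation; the class name DictionaryOps and the sample keys are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the Map methods used are standard java.util API.
class DictionaryOps {
    static Map<String, Integer> demo() {
        Map<String, Integer> ages = new HashMap<>(); // create (initially empty)
        ages.put("ada", 36);                         // insert a new key-data pair
        ages.put("ada", 37);                         // same key again: data is updated
        ages.put("alan", 41);
        Integer found = ages.get("ada");             // find: returns the data, 37
        Integer missing = ages.get("grace");         // find on an absent key: null
        ages.remove("alan");                         // delete by key
        for (Map.Entry<String, Integer> e : ages.entrySet()) { // enumerate
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        System.out.println(found + " " + missing);
        return ages;
    }

    public static void main(String[] args) {
        demo();
    }
}
```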
Implementing the Dictionary ADT
✔ A Dictionary can be implemented in various ways:
✗ using a list, binary search tree, hashtable, etc., etc.
✔ In each case:
✗ the implementing data structure has to be able to hold key-data pairs
✗ the implementing data structure has to be able to do insert, find, and delete operations paying attention to the key
✔ This could be done in a generic data structure, where the user can specify the comparison function to be used by the insert, find, and delete functions
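As one concrete instance of a user-specified comparison function, java.util.TreeMap accepts a Comparator at construction time. The sketch below is only an illustration (the class name ComparatorDemo is hypothetical); it uses the standard String.CASE_INSENSITIVE_ORDER comparator so that keys differing only in case count as the same key.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical demo class.
class ComparatorDemo {
    static TreeMap<String, Integer> caseInsensitiveCounts() {
        Comparator<String> ignoreCase = String.CASE_INSENSITIVE_ORDER;
        TreeMap<String, Integer> counts = new TreeMap<>(ignoreCase);
        counts.put("Hash", 1);
        counts.put("HASH", 2); // equal to "Hash" under this comparator: data is updated
        counts.put("tree", 3);
        return counts;
    }
}
```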
The Dictionary ADT and search engine indexes
✔ The Dictionary ADT is useful in any situation where you want to store, retrieve, and manipulate data based on associated keys
✔ One important application is a document search engine index
✔ An index associates words (keys) with information (data) such as what documents a word occurs in, how many times it occurs, what its position is within the document, etc.
✗ When a word is read for the first time, an "insert" operation is done in the index to associate that word with the document in which it occurs (and possibly other information)
✗ When a word is encountered again, an "insert" or "update" operation is done to add or modify associations with that word (additional document in which it occurs, increment the number of times it occurs, etc.)
✗ If a document is no longer available, words contained in it have their associations changed, and the "delete" operation may be necessary
✗ By doing a “find” operation in the index using a word as key, a user can find the documents that contain that word
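A toy version of such an index can be built from nested maps. Everything in this sketch is an assumption made for illustration: the class name TinyIndex, the method names, the choice to store only occurrence counts, and the simple split on non-word characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index: word -> (document name -> number of occurrences).
class TinyIndex {
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    // "insert"/"update": first sighting creates the entry, later ones bump the count.
    void addDocument(String docName, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(word, w -> new HashMap<>())
                 .merge(docName, 1, Integer::sum);
        }
    }

    // "delete": drop the document's associations, then any word left with none.
    void removeDocument(String docName) {
        for (Map<String, Integer> postings : index.values()) {
            postings.remove(docName);
        }
        index.values().removeIf(Map::isEmpty);
    }

    // "find": the documents that contain the word.
    Set<String> documentsContaining(String word) {
        Map<String, Integer> postings = index.get(word.toLowerCase());
        return postings == null ? Collections.<String>emptySet() : postings.keySet();
    }

    int count(String word, String docName) {
        Map<String, Integer> postings = index.get(word.toLowerCase());
        return postings == null ? 0 : postings.getOrDefault(docName, 0);
    }
}
```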
Hashtables vs. balanced search trees
✔ Hashtables and balanced search trees can both be used in applications that need fast insert and find
✔ What are advantages and disadvantages of each?
✗ Balanced search trees guarantee worst-case performance O(log N), which is quite good
✗ A well-designed hash table has typical performance O(1), which is excellent; but worst-case is O(N), which is bad
✗ Search trees require that keys be well-ordered: For any keys K1, K2, either K1<K2, K1==K2, or K1>K2
✗ Hashtables only require that keys be testable for equality, and that you can compute a hash function for them
Hashtables vs. balanced search trees, cont’d
✗ A search tree can easily be used to return keys close in value to a given key, or to return the smallest key in the tree, or to output the keys in sorted order
✗ A hashtable does not normally deal with ordering information efficiently
✗ In a balanced search tree, delete is as efficient as insert
✗ In a hashtable that uses open addressing, delete can be inefficient, and somewhat tricky to implement (easy with separate chaining though)
✗ Overall, balanced search trees are rather difficult to implement correctly
✗ Hash tables are relatively easy to implement
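The ordering operations mentioned above are available in Java through java.util.TreeMap, which is backed by a balanced search tree. A small sketch follows; the class name OrderingDemo and the sample keys are only illustrative. java.util.HashMap and java.util.Hashtable offer no comparable methods, and their iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical demo class.
class OrderingDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int k : new int[] {701, 145, 217, 19, 13, 749}) {
            tree.put(k, "data" + k);
        }
        return tree;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> tree = sample();
        System.out.println(tree.firstKey());       // smallest key: 13
        System.out.println(tree.ceilingKey(200));  // closest key >= 200: 217
        System.out.println(tree.floorKey(200));    // closest key <= 200: 145
        List<Integer> sorted = new ArrayList<>(tree.keySet());
        System.out.println(sorted);                // keys in sorted order
    }
}
```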
A look at Java’s Hashtable
✔ The java.util.Hashtable class has existed in the Java standard library since JDK1.0
✔ In JDK 1.2, Hashtable was incorporated into the “Collections Framework”, and declared to implement Map
✔ java.util.Hashtable is similar to java.util.HashMap
✗ They both implement Map, so they have the same public interface, but the implementation is slightly different
✗ One difference is Hashtable has synchronized methods (this makes them slightly slower; if you don’t need synchronization for multithreaded programming, use HashMap)
✔ In JDK 1.5, Hashtable and HashMap were made generic, with type parameters for keys and values
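Because both classes implement Map, the same client code can run against either. The sketch below shows this, along with one further behavioural difference that is documented in the Hashtable javadoc: Hashtable rejects null keys, while HashMap accepts them. The class name NullKeyDemo and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class.
class NullKeyDemo {
    // Returns true if the map accepts a null key,
    // false if put() throws NullPointerException.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```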
Hashtable.java
package java.util;
import java.io.*;
/**
 * This class implements a hashtable, which maps keys to values.
 * Any non-null object can be used as a key or as a value.
 * <p>
 * To successfully store and retrieve objects from a hashtable, the
 * objects used as keys must implement the <code>hashCode</code>
 * method and the <code>equals</code> method.
 */
public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable {
Dictionary abstract class
✔ Dictionary is an abstract class that specifies some abstract methods. It acts like an interface specification, and probably should have been an interface instead of a class. It is very similar to the interface java.util.Map. Methods shown here, without comments:
public abstract class Dictionary<K,V> {
    abstract public int size();
    abstract public boolean isEmpty();
    abstract public Enumeration<K> keys();
    abstract public Enumeration<V> elements();
    abstract public V get(Object key);
    abstract public V put(K key, V value);
    abstract public V remove(Object key);
}
Instance variables
✔ Here are the instance variables declared in the Hashtable class:
    /**
     * The hash table data.
     */
    private transient Entry table[];
    /**
     * The total number of entries in the hash table.
     */
    private transient int count;
    /**
     * Rehashes the table when count exceeds this threshold.
     */
    private int threshold;
✔ What is the type of elements of the array implementing the hashtable?
Entry
✔ The Hashtable.java file also defines this static nested class:
private static class Entry<K,V> implements Map.Entry<K,V> {
    int hash;
    K key;
    V value;
    Entry<K,V> next;
}
✔ Entries in a Hashtable object’s table[] array are pointers to objects of this class.
✔ From these declarations so far, can you tell what collision resolution strategy is used?
Hashtable methods
✔ We will look at these instance methods in the Hashtable class:
✗ constructors
✗ get()
✗ put()
✗ keySet()
Hashtable constructors
/**
 * Constructs a new, empty hashtable with the specified initial
 * capacity and the specified load factor.
 *
 * @param initialCapacity the initial capacity of the table
 * @param loadFactor a number between 0.0 and 1.0.
 * @exception IllegalArgumentException if the initial capacity is
 *            less than zero, or if the load factor
 *            is less than or equal to zero.
 * @since JDK1.0
 */
public Hashtable(int initialCapacity, float loadFactor) {
    if ((initialCapacity < 0) || (loadFactor <= 0.0)) {
        throw new IllegalArgumentException();
    }
    this.loadFactor = loadFactor;
    table = new Entry[initialCapacity];
    threshold = (int) (initialCapacity * loadFactor);
}
Hashtable default constructor
/**
 * Constructs a new, empty hashtable with a default capacity and
 * load factor.
 *
 * @since JDK1.0
 */
public Hashtable() {
    this(11, 0.75f);
}
✔ How do the default values for size and load factor compare to the hash table design principles we talked about?...
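To make the defaults concrete, the threshold computation from the two-argument constructor can be replayed on its own. The class ThresholdDemo is a hypothetical helper; only the cast-and-multiply expression is taken from the code shown in the lecture.

```java
// Hypothetical helper that repeats the constructor's threshold arithmetic.
class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor); // same expression as in the constructor
    }

    public static void main(String[] args) {
        // Default table: capacity 11 (a prime), load factor 0.75.
        System.out.println(threshold(11, 0.75f)); // 8, since 8.25 is truncated
    }
}
```

Since put() rehashes when count >= threshold before adding a new entry, a default table is enlarged when the ninth distinct key is inserted.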
get()
/**
 * Returns the value to which the specified key is mapped in this
 * hashtable.
 *
 * @param key a key in the hashtable.
 * @return the value to which the key is mapped in this hashtable;
 *         null if the key is not mapped to any value in
 *         this hashtable.
 */
public synchronized V get(Object key) {
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % table.length;
    for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
        if ( e.hash == hash && e.key.equals(key) ) {
            return e.value;
        }
    }
    return null;
}
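The expression (hash & 0x7FFFFFFF) % table.length deserves a second look: hashCode() may return a negative int, and Java's % operator keeps the sign of its left operand, so the sign bit is masked off before taking the remainder. The sketch below isolates that step; the class name IndexDemo and the sample values are illustrative.

```java
// Hypothetical demo of the index computation used in get() and put().
class IndexDemo {
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength; // clear sign bit, then reduce
    }

    public static void main(String[] args) {
        System.out.println(-7 % 11);                         // -7: not a valid array index
        System.out.println(indexFor(-7, 11));                // always in 0..10
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0); // true: abs() would not be safe
        System.out.println(indexFor(Integer.MIN_VALUE, 11)); // 0
    }
}
```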
put()
✔ Here are the javadoc comments:
/**
 * Maps the specified <code>key</code> to the specified
 * <code>value</code> in this hashtable. Neither the key nor the
 * value can be <code>null</code>.
 * <p>
 * The value can be retrieved by calling the <code>get</code>
 * method with a key that is equal to the original key.
 *
 * @param key the hashtable key.
 * @param value the value.
 * @return the previous value of the specified key in this
 *         hashtable, or <code>null</code> if it did not have one.
 * @exception NullPointerException if the key or value is
 *            <code>null</code>.
 * @since JDK1.0
 */
✔ ... and the code follows.
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }
    // If the key is already in the hashtable, update its value
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % table.length;
    for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
        if ( e.hash == hash && e.key.equals(key) ) {
            V old = e.value;
            e.value = value;
            return old;
        }
    }
    if (count >= threshold) {
        // Rehash the table if the threshold is exceeded
        rehash(); // this enlarges the capacity of the table
        index = (hash & 0x7FFFFFFF) % table.length;
    }
    // Create and add the new entry.
    Entry<K,V> e = new Entry<K,V>();
    e.hash = hash;
    e.key = key;
    e.value = value;
    e.next = table[index];
    table[index] = e;
    count++;
    return null;
}
Rehashing
/** Increases the capacity of and internally reorganizes this
 * hashtable, in order to accommodate and access its entries more
 * efficiently.
 */
protected void rehash() {
    int oldCapacity = table.length;
    Entry oldMap[] = table;
    int newCapacity = oldCapacity * 2 + 1;
    Entry newMap[] = new Entry[newCapacity];
    threshold = (int)(newCapacity * loadFactor);
    table = newMap;
    for (int i = oldCapacity ; i-- > 0 ;) {
        for (Entry<K,V> old = oldMap[i] ; old != null ; ) {
            Entry<K,V> e = old;
            old = old.next;
            int index = (e.hash & 0x7FFFFFFF) % newCapacity;
            e.next = newMap[index];
            newMap[index] = e;
        }
    }
}
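The growth rule newCapacity = oldCapacity * 2 + 1 can be followed for a few steps to see which table sizes actually occur. GrowthDemo and its methods are hypothetical helpers written for this illustration.

```java
// Hypothetical helper that traces the capacities produced by repeated rehash() calls.
class GrowthDemo {
    static int[] capacities(int initial, int steps) {
        int[] caps = new int[steps];
        int c = initial;
        for (int i = 0; i < steps; i++) {
            caps[i] = c;
            c = c * 2 + 1; // same rule as in rehash()
        }
        return caps;
    }

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int c : capacities(11, 5)) {
            System.out.println(c + (isPrime(c) ? " (prime)" : " (not prime)"));
        }
    }
}
```

Starting from the default 11, the capacities are 11, 23, 47, 95, 191. The rule keeps the size odd but does not guarantee a prime: 95 = 5 × 19.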
keySet()
✔ For any key value, you can find out if that key is in the table or not: just use get()
✔ But how can you get a listing of all the keys in the table? There are many possible keys, and only a few of them will be in the table; it’s not feasible to check them all with get()
✔ The keySet() method returns a Set object that contains only the keys in the table:
/* Returns a Set view of the keys contained in this Hashtable.
 * The Set supports element removal (which removes the
 * corresponding entry from the Hashtable), but not element
 * addition.
 * @return a Set view of the keys contained in this Map.
 * @since 1.2
 */
public Set<K> keySet() {
    //...
}
✔ An Iterator for the Set can then be used to iterate efficiently over the keys in the table
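A short usage sketch of keySet() with an Iterator follows; it also shows the "view" behaviour described in the javadoc, where removing a key through the Set removes the entry from the Hashtable. The class name KeySetDemo and the sample entries are made up.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Set;

// Hypothetical demo class.
class KeySetDemo {
    static Hashtable<String, Integer> demo() {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("one", 1);
        table.put("two", 2);
        table.put("three", 3);

        Set<String> keys = table.keySet(); // a view, not a copy
        for (Iterator<String> it = keys.iterator(); it.hasNext(); ) {
            String key = it.next();
            if (key.startsWith("t")) {
                it.remove(); // also removes the entry from the Hashtable
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // only the "one" entry remains
    }
}
```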
Serializable objects
✔ Since JDK1.1, Java has had the ability to “serialize” objects
✔ Serialization is the process of converting an existing object to a sequence of bytes, in order to be sent over a stream (e.g. saved to a file, or transmitted over a network connection, etc.)
✗ serializing an object is also sometimes called ‘persisting’ or ‘pickling’ the object
✔ This is done in such a way that the object can be deserialized, i.e. reconstituted, later (e.g. by reading from the file, or when the serialized object is received at the other end of the network connection, etc.)
✔ In order for an object to be serialized, its class must be declared to implement the java.io.Serializable interface
✔ This interface does not specify any methods: a class that declares itself to implement it is just indicating that instances of it can be serialized
✔ Many Java library classes are serializable; user-defined classes can also be serializable
Serializing a serializable class
✔ If a class is Serializable, objects that are instances of that class or a subclass can be serialized
✔ To serialize an object, pass it to the writeObject() method of an appropriately created java.io.ObjectOutputStream object
✔ The object can be deserialized by creating a corresponding java.io.ObjectInputStream object and calling its readObject() method (you will want to downcast the returned Object reference to be of the appropriate type)
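The writeObject()/readObject() pair can be tried out without touching the file system by streaming to a byte array. This is a minimal sketch: the class RoundTrip and its methods are hypothetical, and wrapping the checked exceptions in IllegalStateException is a simplification chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: serialize to bytes in memory, then deserialize again.
class RoundTrip {
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    // The downcast mentioned on the slide happens here.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        return (T) deserialize(serialize(obj));
    }
}
```

For example, passing a java.util.Hashtable to roundTrip() should yield a distinct Hashtable object that equals() the original.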
Designing a serializable class
✔ If all the instance variables of a user-defined class are of primitive types or Serializable class types, then the class can be declared to implement the Serializable interface and instances of the class can be serialized
✔ If an instance variable is not of a Serializable class type, or you do not want it to be part of the serialized representation, the instance variable must be marked transient
✔ transient instance variables are not written out; after deserialization they hold their default values (null for class types, “zero” for primitive types)
✗ to change this you can write your own serialization and deserialization methods, which can call the default methods; see online documentation for how to do this
✔ Classes themselves are not serialized, only objects! So, to get everything to work, the same class definition must be available in both serialization and deserialization contexts
✗ As a corollary, static variables are never serialized: they are created and initialized when the class is loaded into the Java virtual machine, not when an instance of the class is deserialized
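The effect of transient and static can be observed with a small class. Session, its fields, and the in-memory roundTrip() helper are all hypothetical names invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only to illustrate transient and static fields.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    static int constructed = 0;   // static: never part of the serialized form
    String user;                  // serialized normally
    transient String password;    // not written; null after deserialization
    transient int loginAttempts;  // not written; 0 after deserialization

    Session(String user, String password) {
        this.user = user;
        this.password = password;
        constructed++; // this constructor is not run when a Session is deserialized
    }

    // Serialize to an in-memory byte array and read the object back.
    static Session roundTrip(Session s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```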
Next time
✔ Self-organizing data structures
✔ Self-organizing lists
✔ Splay trees
✔ Spatial data structures
✔ K-D trees
✔ The C++ Standard Template Library