Lecture 17 - University of California, San Diego

cseweb.ucsd.edu/~kube/cls/100/Lectures/lec17.hashing2/lec17.pdf

CSE 100, UCSD: LEC 17

Lecture 17

✔ Separate chaining

✔ Dictionary data types

✔ Hashtables vs. balanced search trees

✔ A hashtable implementation: java.util.Hashtable

✔ Object serialization in Java

Reading: Weiss, Ch 5; and JDK source code

Final exam

✔ Final exam time: Tue Mar 17 11:30am-2:00pm

✔ Location: CSB 002

✔ Closed book, closed notes, no calculators...

✔ Bring something to write with, and picture ID

✔ Practice final is on line

✔ Final exam discussion topic on Webboard

✔ Exam review sessions will be:

✗ last lecture

✔ Coverage: All lectures, all assignments, corresponding sections of the textbook, and handouts

Open addressing vs. separate chaining

✔ Linear probing, double and random hashing are appropriate if the keys are kept as entries in the hashtable itself...

✗ doing that is called "open addressing"

✗ it is also called "closed hashing"

✔ Another idea: Entries in the hashtable are just pointers to the head of a linked list (“chain”); elements of the linked list contain the keys...

✗ this is called "separate chaining"

✗ it is also called "open hashing"

✔ Collision resolution becomes easy with separate chaining: no need to probe other table locations; just insert a key in its linked list if it is not already there.

✔ (It is possible to use fancier data structures than linked lists for this; but linked lists work very well in the average case, as we will see)

Separate chaining: basic algorithms

✔ When inserting a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Insert key in linked list headed at indx. (Search the list first to avoid duplicates.)

✔ When searching for a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Search for key in linked list headed at indx, using linear search.

✔ When deleting a key K in a table with hash function H(K)

1. Set indx = H(K)

2. Delete key in linked list headed at indx

✔ Advantages: average case performance stays good as number of entries approaches and even exceeds M; delete is easier to implement than with open addressing

✔ Disadvantages: requires dynamic data, requires storage for pointers in addition to data, can have poor locality which causes poor caching performance
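The three algorithms above can be sketched in Java as follows. This is a minimal illustration, not code from the lecture: the class name `ChainedIntSet` is hypothetical, keys are plain `int` values, and H(K) is assumed to be K mod M (made non-negative so that it is always a valid index).

```java
import java.util.LinkedList;

// Hypothetical teaching class: a separate-chaining set of int keys.
class ChainedIntSet {
    private final LinkedList<Integer>[] table;   // table[i] is the chain headed at index i

    @SuppressWarnings("unchecked")
    ChainedIntSet(int m) {
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // Assumed hash function: H(K) = K mod M, kept in the range 0..M-1.
    private int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    // Insert: search the chain first to avoid duplicates, then add the key.
    boolean insert(int key) {
        LinkedList<Integer> chain = table[hash(key)];
        if (chain.contains(key)) {
            return false;
        }
        chain.addFirst(key);
        return true;
    }

    // Find: linear search of the chain headed at H(K).
    boolean find(int key) {
        return table[hash(key)].contains(key);
    }

    // Delete: remove the key from its chain, if it is there.
    boolean delete(int key) {
        return table[hash(key)].remove(Integer.valueOf(key));
    }
}
```

Note that delete needs no special marker entries: removing a node from one chain cannot affect the search for any other key.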

Separate chaining, an example

M = 7, H(K) = K mod M

Insert these keys: 701, 145, 217, 19, 13, 749 in this table, using separate chaining:

index: 0 1 2 3 4 5 6
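One way to check a hand-worked answer to this exercise is to let a short program place the keys. The sketch below is an illustration only; the class name `ChainingExample` is hypothetical, and it appends each new key to the end of its chain (a version that inserts at the head of the chain would hold the same keys per index in the opposite order).

```java
import java.util.ArrayList;
import java.util.List;

class ChainingExample {
    // Place each key in the chain at index K mod m, appending to that chain.
    static List<List<Integer>> build(int m, int[] keys) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            table.add(new ArrayList<>());
        }
        for (int k : keys) {
            table.get(k % m).add(k);   // keys here are non-negative, so k % m is a valid index
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<Integer>> t = build(7, new int[] {701, 145, 217, 19, 13, 749});
        for (int i = 0; i < t.size(); i++) {
            System.out.println(i + ": " + t.get(i));
        }
        // prints:
        // 0: [217, 749]
        // 1: [701]
        // 2: []
        // 3: []
        // 4: []
        // 5: [145, 19]
        // 6: [13]
    }
}
```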

Analysis of separate-chaining hashing

✔ Keep in mind the load factor measure of how full the table is:

α = N/M

where M is the size of the table, and N is the number of keys that have been inserted in the table

✔ With separate chaining, it is possible to have α > 1

✔ Given a load factor α, we would like to know the time costs, in the best, average, and worst case of

✗ new-key insert and unsuccessful find (these are the same)

✗ successful find

✔ The best case is O(1) and worst case is O(N) for all of these... let’s analyze the average case

Average case costs with separate chaining

✔ Assume a table with load factor α = N/M

✔ There are N items total distributed over M linked lists (some of which may be empty), so the average number of items per linked list is: N/M = α

✔ In any unsuccessful find/insert, the hash table entry for the key is accessed; then the linked list headed there is exhaustively searched

✔ Therefore, assuming all table entries are equally likely to be hit by the hash function, the average number of steps for insert or unsuccessful find with separate chaining is

$U_\alpha = 1 + \alpha$

✔ In successful find, the hash table entry for the key is accessed; then the linked list headed there is linearly searched. Therefore, (with the same probabilistic assumption) the average number of steps for successful find with separate chaining is

$S_\alpha = 1 + \frac{\alpha}{2}$

✔ These are less than 2 and 1.5 respectively, when α < 1

✔ And these remain O(1), independent of M, even when α exceeds 1.
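To get a feel for the numbers, the two average-case formulas (1 + α for insert or unsuccessful find, 1 + α/2 for successful find) can be evaluated for a few load factors. The class and method names below are hypothetical; the code only does the arithmetic and is not a measurement of a real table.

```java
class ChainingCost {
    // Average steps for insert / unsuccessful find: 1 + alpha.
    static double unsuccessful(double alpha) {
        return 1.0 + alpha;
    }

    // Average steps for successful find: 1 + alpha / 2.
    static double successful(double alpha) {
        return 1.0 + alpha / 2.0;
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.75, 1.0, 2.0}) {
            System.out.printf("alpha=%.2f  U=%.3f  S=%.3f%n",
                    alpha, unsuccessful(alpha), successful(alpha));
        }
    }
}
```

For example, at α = 0.75 the formulas give 1.75 and 1.375 steps, and even at α = 2 (twice as many keys as table slots) they give only 3 and 2 steps.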

Dictionary data types

✔ A data structure is intended to hold data

✗ An insert operation inserts a data item into the structure; a find operation says whether a data item is in the structure; delete removes a data item; etc.

✔ A Dictionary is a specialized kind of data structure:

✗ A Dictionary structure is intended to hold pairs: each pair consists of a key, together with some related data

✗ An insert operation inserts a key-data pair in the table; a find operation takes a key and returns the data in the key-data pair with that key; delete takes a key and removes the key-data pair with that key; etc.

✔ Dictionaries are sometimes called “Table” or “Map” abstract data types, or "associative memories"

Dictionary as ADT

✔ Domain:

✗ a collection of pairs; each pair consists of a key, and some additional data

✔ Operations (typical):

✗ Create a table (initially empty)

✗ Insert a new key-data pair in the table; if a key-data pair with the same key is already there, update the data part of the pair

✗ Find the key-data pair in the table corresponding to a given key; return the data

✗ Delete the key-data pair corresponding to a given key

✗ Enumerate (traverse) all key-data pairs in the table
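As a rough illustration, the operations listed above map onto a small Java interface. Both names below (`SimpleDictionary`, `ListDictionary`) are hypothetical and are not part of the JDK or of the lecture's code; the list-backed implementation is deliberately the simplest possible one, with every operation a linear scan. "Create" corresponds to the constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the typical Dictionary ADT operations.
interface SimpleDictionary<K, V> {
    void insert(K key, V data);   // insert a pair, or update the data if the key is present
    V find(K key);                // data for the key, or null if the key is absent
    boolean delete(K key);        // true if a pair was removed
    List<K> keys();               // enumerate the keys currently stored
}

// Unsorted-list implementation: keys and data are kept in parallel lists.
class ListDictionary<K, V> implements SimpleDictionary<K, V> {
    private final List<K> ks = new ArrayList<>();
    private final List<V> vs = new ArrayList<>();

    public void insert(K key, V data) {
        int i = ks.indexOf(key);          // uses equals() on keys
        if (i >= 0) {
            vs.set(i, data);              // same key already there: update the data
        } else {
            ks.add(key);
            vs.add(data);
        }
    }

    public V find(K key) {
        int i = ks.indexOf(key);
        return i >= 0 ? vs.get(i) : null;
    }

    public boolean delete(K key) {
        int i = ks.indexOf(key);
        if (i < 0) {
            return false;
        }
        ks.remove(i);                     // remove by position in both lists
        vs.remove(i);
        return true;
    }

    public List<K> keys() {
        return new ArrayList<>(ks);       // a copy, so callers cannot disturb the table
    }
}
```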

Implementing the Dictionary ADT

✔ A Dictionary can be implemented in various ways:

✗ using a list, binary search tree, hashtable, etc., etc.

✔ In each case:

✗ the implementing data structure has to be able to hold key-data pairs

✗ the implementing data structure has to be able to do insert, find, and delete operations paying attention to the key

✔ This could be done in a generic data structure, where the user can specify the comparison function to be used by the insert, find, and delete functions
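One existing example of a generic structure with a user-specified comparison function is `java.util.TreeMap`, whose constructor accepts a `Comparator`. The sketch below (class and method names are hypothetical) uses the JDK's `String.CASE_INSENSITIVE_ORDER` so that keys differing only in case are treated as the same key.

```java
import java.util.Comparator;
import java.util.TreeMap;

class ComparatorDictionaryDemo {
    static TreeMap<String, Integer> caseInsensitiveCounts() {
        // The comparator decides when two keys count as "the same key".
        Comparator<String> cmp = String.CASE_INSENSITIVE_ORDER;
        TreeMap<String, Integer> counts = new TreeMap<>(cmp);
        counts.put("Hash", 1);
        counts.put("HASH", 2);   // same key under cmp: the data is updated, no new pair is added
        counts.put("tree", 3);
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = caseInsensitiveCounts();
        System.out.println(counts.size());       // 2
        System.out.println(counts.get("hash"));  // 2
    }
}
```

Insert, find, and delete all go through the same comparator, so the structure's notion of key equality is entirely under the user's control.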

The Dictionary ADT and search engine indexes

✔ The Dictionary ADT is useful in any situation where you want to store, retrieve, and manipulate data based on associated keys

✔ One important application is a document search engine index

✔ An index associates words (keys) with information (data) such as what documents a word occurs in, how many times it occurs, what its position is within the document, etc.

✗ When a word is read for the first time, an "insert" operation is done in the index to associate that word with the document in which it occurs (and possibly other information)

✗ When a word is encountered again, "insert" or "update" operation is done to add or modify associations with that word (additional document in which it occurs, increment the number of times it occurs, etc.)

✗ If a document is no longer available, words contained in it have their associations changed, and the "delete" operation may be necessary

✗ By doing a “find” operation in the index using a word as key, a user can find the documents that contain that word
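A very small index along these lines can be sketched with nested `java.util.HashMap` objects. Everything here is an assumption made for illustration: the class name `TinyIndex`, the choice to store only per-document occurrence counts, and the crude word splitting on non-word characters.

```java
import java.util.HashMap;
import java.util.Map;

class TinyIndex {
    // word -> (document name -> number of occurrences in that document)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    // "insert" on first sight of a word, "update" on later occurrences.
    void addDocument(String doc, String text) {
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(w, k -> new HashMap<>())
                 .merge(doc, 1, Integer::sum);
        }
    }

    // Document no longer available: change associations, "delete" words left with none.
    void removeDocument(String doc) {
        for (Map<String, Integer> postings : index.values()) {
            postings.remove(doc);
        }
        index.values().removeIf(Map::isEmpty);
    }

    // "find": which documents contain the word, and how often.
    Map<String, Integer> find(String word) {
        return index.getOrDefault(word.toLowerCase(), new HashMap<>());
    }
}
```

A lookup such as `find("hash")` then returns the documents containing that word together with their counts, or an empty map if the word is not indexed.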

Hashtables vs. balanced search trees

✔ Hashtables and balanced search trees can both be used in applications that need fast insert and find

✔ What are advantages and disadvantages of each?

✗ Balanced search trees guarantee worst-case performance O(log N), which is quite good

✗ A well-designed hash table has typical performance O(1), which is excellent; but worst-case is O(N), which is bad

✗ Search trees require that keys be well-ordered: For any keys K1, K2, either K1<K2, K1==K2, or K1>K2

✗ Hashtables only require that keys be testable for equality, and that you can compute a hash function for them

Hashtables vs. balanced search trees, cont’d

✗ A search tree can easily be used to return keys close in value to a given key, or to return the smallest key in the tree, or to output the keys in sorted order

✗ A hashtable does not normally deal with ordering information efficiently

✗ In a balanced search tree, delete is as efficient as insert

✗ In a hashtable that uses open addressing, delete can be inefficient, and somewhat tricky to implement (easy with separate chaining though)

✗ Overall, balanced search trees are rather difficult to implement correctly

✗ Hash tables are relatively easy to implement
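The ordering difference shows up directly in the Java library: `java.util.TreeMap` (a balanced search tree) offers order-based queries that `java.util.HashMap` does not have. The following sketch uses only standard `TreeMap` methods; the class name `OrderingDemo` and the sample keys are chosen just for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderingDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> t = new TreeMap<>();
        for (int k : new int[] {701, 145, 217, 19, 13, 749}) {
            t.put(k, "v" + k);
        }
        return t;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> t = sample();
        System.out.println(t.firstKey());        // smallest key: 13
        System.out.println(t.ceilingKey(200));   // least key >= 200: 217
        System.out.println(t.floorKey(200));     // greatest key <= 200: 145
        System.out.println(t.keySet());          // keys in sorted order

        // A hash-based map answers equality lookups, but has no notion of "nearby" keys.
        Map<Integer, String> h = new HashMap<>(t);
        System.out.println(h.containsKey(217));  // true
    }
}
```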

A look at Java’s Hashtable

✔ The java.util.Hashtable class has existed in the Java standard library since JDK1.0

✔ In JDK 1.2, Hashtable was incorporated into the “Collections Framework”, and declared to implement Map

✔ java.util.Hashtable is similar to java.util.HashMap

✗ They both implement Map, so they have the same public interface, but the implementation is slightly different

✗ One difference is Hashtable has synchronized methods (this makes them slightly slower; if you don’t need synchronization for multithreaded programming, use HashMap)

✔ In JDK 1.5, Hashtable and HashMap were made generic, with type parameters for keys and values
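Because both classes implement `Map<K,V>`, code written against the `Map` interface works with either one. The sketch below (the class name `MapFlavors` is hypothetical) also shows one further documented difference besides synchronization: `Hashtable` rejects `null` keys and values with a `NullPointerException`, while `HashMap` permits them.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class MapFlavors {
    // Works for any Map implementation; reports whether a null value is accepted.
    static boolean acceptsNullValue(Map<String, Integer> m) {
        try {
            m.put("k", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new Hashtable<>();   // generic since JDK 1.5
        Map<String, Integer> map = new HashMap<>();
        System.out.println(acceptsNullValue(table));      // false
        System.out.println(acceptsNullValue(map));        // true
    }
}
```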

Hashtable.java

    package java.util;
    import java.io.*;

    /**
     * This class implements a hashtable, which maps keys to values.
     * Any non-null object can be used as a key or as a value.
     * <p>
     * To successfully store and retrieve objects from a hashtable, the
     * objects used as keys must implement the <code>hashCode</code>
     * method and the <code>equals</code> method.
     */
    public class Hashtable<K,V>
        extends Dictionary<K,V>
        implements Map<K,V>, Cloneable, java.io.Serializable {

Dictionary abstract class

✔ Dictionary is an abstract class, that specifies some abstract methods. It acts like an interface specification, and probably should have been an interface instead of a class. Very similar to the interface java.util.Map. Methods shown here, without comments:

    public abstract class Dictionary<K,V> {
        abstract public int size();
        abstract public boolean isEmpty();
        abstract public Enumeration<K> keys();
        abstract public Enumeration<V> elements();
        abstract public V get(Object key);
        abstract public V put(K key, V value);
        abstract public V remove(Object key);
    }

Instance variables

✔ Here are the instance variables declared in the Hashtable class:

    /**
     * The hash table data.
     */
    private transient Entry table[];

    /**
     * The total number of entries in the hash table.
     */
    private transient int count;

    /**
     * Rehashes the table when count exceeds this threshold.
     */
    private int threshold;

✔ What is the type of elements of the array implementing the hashtable?

Entry

✔ The Hashtable.java file also defines this nested class:

    private static class Entry<K,V> implements Map.Entry<K,V> {
        int hash;
        K key;
        V value;
        Entry<K,V> next;
    }

✔ Entries in a Hashtable object’s table[] array are pointers to objects of this class.

✔ From these declarations so far, can you tell what collision resolution strategy is used?

Hashtable methods

✔ We will look at these instance methods in the Hashtable class:

✗ constructors

✗ get()

✗ put()

✗ keySet()

Hashtable constructors

    /**
     * Constructs a new, empty hashtable with the specified initial
     * capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the table
     * @param      loadFactor        a number between 0.0 and 1.0.
     * @exception  IllegalArgumentException  if the initial capacity is
     *               less than zero, or if the load factor
     *               is less than or equal to zero.
     * @since      JDK1.0
     */
    public Hashtable(int initialCapacity, float loadFactor) {
        if ((initialCapacity < 0) || (loadFactor <= 0.0)) {
            throw new IllegalArgumentException();
        }
        this.loadFactor = loadFactor;
        table = new Entry[initialCapacity];
        threshold = (int) (initialCapacity * loadFactor);
    }

Hashtable default constructor

    /**
     * Constructs a new, empty hashtable with a default capacity and
     * load factor.
     *
     * @since   JDK1.0
     */
    public Hashtable() {
        this(11, 0.75f);
    }

✔ How do the default values for size and load factor compare to the hash table design principles we talked about?...
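As a small worked example of the constructor arithmetic, the threshold for the default settings can be computed with the same expression the two-argument constructor uses. The class name `ThresholdDemo` is hypothetical.

```java
class ThresholdDemo {
    // Same expression as in the constructor: (int) (initialCapacity * loadFactor).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default table: 11 chains, load factor 0.75, so 11 * 0.75 = 8.25, truncated to 8.
        System.out.println(threshold(11, 0.75f));   // 8
    }
}
```

So with the defaults, the table is enlarged when a new key arrives and 8 entries are already stored, which keeps the load factor α at or below roughly 0.75. The default capacity 11 is also a prime number.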

get()

    /**
     * Returns the value to which the specified key is mapped in this
     * hashtable.
     *
     * @param   key   a key in the hashtable.
     * @return  the value to which the key is mapped in this hashtable;
     *          null if the key is not mapped to any value in
     *          this hashtable.
     */
    public synchronized V get(Object key) {
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
            if ( e.hash == hash && e.key.equals(key) ) {
                return e.value;
            }
        }
        return null;
    }
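The expression `(hash & 0x7FFFFFFF) % table.length` deserves a closer look. `hashCode()` may return a negative `int`, and in Java the `%` operator keeps the sign of its left operand, so a plain `hash % table.length` could produce a negative, invalid array index. Masking with `0x7FFFFFFF` clears the sign bit first. The sketch below (hypothetical class name `IndexDemo`) demonstrates this with a table length of 11.

```java
class IndexDemo {
    // Same index computation as in get() and put().
    static int index(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int hash = Integer.valueOf(-7).hashCode();   // an Integer's hash code is its value: -7
        System.out.println(hash % 11);               // -7: not usable as an array index
        System.out.println(index(hash, 11));         // 6: always in the range 0..10

        // Math.abs is not a safe alternative: Math.abs(Integer.MIN_VALUE) is still negative.
        System.out.println(Math.abs(Integer.MIN_VALUE) < 0);   // true
        System.out.println(index(Integer.MIN_VALUE, 11));      // 0
    }
}
```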

put()

✔ Here are the javadoc comments:

    /**
     * Maps the specified <code>key</code> to the specified
     * <code>value</code> in this hashtable. Neither the key nor the
     * value can be <code>null</code>.
     * <p>
     * The value can be retrieved by calling the <code>get</code>
     * method with a key that is equal to the original key.
     *
     * @param      key     the hashtable key.
     * @param      value   the value.
     * @return     the previous value of the specified key in this
     *             hashtable, or <code>null</code> if it did not have one.
     * @exception  NullPointerException  if the key or value is
     *               <code>null</code>.
     * @since      JDK1.0
     */

✔ ... and the code follows.

    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // If the key is already in the hashtable, update its value
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % table.length;
        for (Entry<K,V> e = table[index] ; e != null ; e = e.next) {
            if ( e.hash == hash && e.key.equals(key) ) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded
            rehash();   // this enlarges the capacity of the table
            index = (hash & 0x7FFFFFFF) % table.length;
        }

        // Create and add the new entry.
        Entry<K,V> e = new Entry<K,V>();
        e.hash = hash;
        e.key = key;
        e.value = value;
        e.next = table[index];
        table[index] = e;
        count++;
        return null;
    }
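From the caller's side, the behavior of put() described above looks like the following. The sketch uses only the public `java.util.Hashtable` API; the class name `PutDemo` and its helper methods are hypothetical.

```java
import java.util.Hashtable;

class PutDemo {
    // Returns the results of two successive put() calls on the same key.
    static Integer[] putTwice() {
        Hashtable<String, Integer> t = new Hashtable<>();
        Integer first = t.put("a", 1);    // null: the key had no previous value
        Integer second = t.put("a", 2);   // 1: the previous value; the entry is updated in place
        return new Integer[] {first, second, t.get("a"), t.size()};
    }

    // Returns true if put() rejects a null value, as the javadoc states.
    static boolean rejectsNullValue() {
        try {
            new Hashtable<String, Integer>().put("a", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Integer[] r = putTwice();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]);   // null 1 2 1
        System.out.println(rejectsNullValue());                            // true
    }
}
```

A null key is rejected as well: `key.hashCode()` throws `NullPointerException` before any entry is created.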

Rehashing

    /** Increases the capacity of and internally reorganizes this
     * hashtable, in order to accommodate and access its entries more
     * efficiently.
     */
    protected void rehash() {
        int oldCapacity = table.length;
        Entry oldMap[] = table;

        int newCapacity = oldCapacity * 2 + 1;
        Entry newMap[] = new Entry[newCapacity];

        threshold = (int)(newCapacity * loadFactor);
        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry<K,V> old = oldMap[i] ; old != null ; ) {
                Entry<K,V> e = old;
                old = old.next;

                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = newMap[index];
                newMap[index] = e;
            }
        }
    }
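Two consequences of rehash() can be checked with a little arithmetic: the capacity grows as 2 * old + 1 on every rehash, and an entry's index generally changes because it is recomputed modulo the new capacity (which is why every entry is relinked rather than the array simply being copied). The class name `RehashDemo` is hypothetical, and the starting capacity 11 is the default from the no-argument constructor.

```java
class RehashDemo {
    // Successive capacities, starting from 'initial', under newCapacity = oldCapacity * 2 + 1.
    static int[] capacities(int initial, int steps) {
        int[] caps = new int[steps];
        int c = initial;
        for (int i = 0; i < steps; i++) {
            caps[i] = c;
            c = c * 2 + 1;
        }
        return caps;
    }

    // Same index computation as in put(), get(), and rehash().
    static int index(int hash, int capacity) {
        return (hash & 0x7FFFFFFF) % capacity;
    }

    public static void main(String[] args) {
        for (int c : capacities(11, 4)) {
            System.out.println(c);            // 11, 23, 47, 95
        }
        System.out.println(index(30, 11));    // 8: chain index before the first rehash
        System.out.println(index(30, 23));    // 7: chain index after it
    }
}
```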

keySet()

✔ For any key value, you can find out if that key is in the table or not: just use get()

✔ But how can you get a listing of all the keys in the table? There are many possible keys, and only a few of them will be in the table; it’s not feasible to check them all with get()

✔ The keySet() method returns a Set object that contains only the keys in the table:

    /**
     * Returns a Set view of the keys contained in this Hashtable.
     * The Set supports element removal (which removes the
     * corresponding entry from the Hashtable), but not element
     * addition.
     * @return a Set view of the keys contained in this Map.
     * @since 1.2
     */
    public Set<K> keySet() {
        //...
    }

✔ An Iterator for the Set can then be used to iterate efficiently over the keys in the table
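The following sketch shows the key set being used as a view: iterating over it visits exactly the keys in the table, and removing a key through the iterator removes the corresponding entry from the `Hashtable` itself. The class name `KeySetDemo` and the "drop short keys" rule are made up for the example.

```java
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Set;

class KeySetDemo {
    // Removes every entry whose key has fewer than 3 characters, via the key-set view.
    static Hashtable<String, Integer> dropShortKeys(Hashtable<String, Integer> t) {
        Set<String> keys = t.keySet();                 // a view of the table, not a copy
        for (Iterator<String> it = keys.iterator(); it.hasNext(); ) {
            if (it.next().length() < 3) {
                it.remove();                           // removes the entry from t as well
            }
        }
        return t;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> t = new Hashtable<>();
        t.put("ab", 1);
        t.put("abc", 2);
        t.put("abcd", 3);
        dropShortKeys(t);
        System.out.println(t.size());                  // 2
        System.out.println(t.containsKey("ab"));       // false
    }
}
```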

Serializable objects

✔ Since JDK1.1, Java has had the ability to “serialize” objects

✔ Serialization is the process of converting an existing object to a sequence of bytes, in order to be sent over a stream (e.g. saved to a file, or transmitted over a network connection, etc.)

✗ serializing an object is also sometimes called ‘persisting’ or ‘pickling’ the object

✔ This is done in such a way that the object can be deserialized, i.e. reconstituted, later (e.g. by reading from the file, or when the serialized object is received at the other end of the network connection, etc.)

✔ In order for an object to be serialized, its class must be declared to implement the java.io.Serializable interface

✔ This interface does not specify any methods: a class that declares itself to implement it is just indicating that instances of it can be serialized

✔ Many Java library classes are serializable; user-defined classes can also be serializable

Serializing a serializable class

✔ If a class is Serializable, objects that are instances of that class or a subclass can be serialized

✔ To serialize an object, pass it to the writeObject() method of an appropriately created java.io.ObjectOutputStream object

✔ The object can be deserialized by creating a corresponding java.io.ObjectInputStream object and calling its readObject() method (you will want to downcast the returned Object reference to be of the appropriate type)
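A round trip through writeObject() and readObject() can be sketched as follows. To keep the example self-contained it serializes into an in-memory byte array instead of a file or network connection; the class name `SerializationDemo` and the method `copy` are hypothetical, and checked exceptions are wrapped only to keep the calling code short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Hashtable;

class SerializationDemo {
    // Serializes obj to bytes, then deserializes those bytes into a new object.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);                       // serialize
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();                 // deserialize, then downcast
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> t = new Hashtable<>();
        t.put("a", 1);
        Hashtable<String, Integer> restored = copy(t);
        System.out.println(restored.equals(t));   // true: same mappings
        System.out.println(restored == t);        // false: a distinct object
    }
}
```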

Designing a serializable class

✔ If all the instance variables of a user-defined class are of primitive types or Serializable class types, then the class can be declared to implement the Serializable interface and instances of the class can be serialized

✔ If an instance variable is not of a Serializable class type, or you do not want it to be part of the serialized representation, the instance variable must be marked transient

✔ transient instance variables are not written out; after deserialization they hold their default values (null for class types, “zero” for primitive types)

✗ to change this you can write your own serialization and deserialization methods, which can call the default methods; see online documentation for how to do this

✔ Classes themselves are not serialized, only objects! So, to get everything to work, the same class definition must be available in both serialization and deserialization contexts

✗ As a corollary, static variables are never serialized: they are created and initialized when the class is loaded into the Java virtual machine, not when an instance of the class is deserialized
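The effect of `transient` can be seen in a small round trip. The class `Session` and its fields are invented for this illustration, and the in-memory byte array stands in for a file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    static int sessionsCreated = 0;   // static: belongs to the class, never written with an instance
    String user;                      // ordinary instance variable: serialized
    transient String password;        // transient: skipped, null after deserialization
    transient int attempts;           // transient: skipped, 0 after deserialization

    Session(String user, String password, int attempts) {
        this.user = user;
        this.password = password;
        this.attempts = attempts;
        sessionsCreated++;
    }

    // Serializes s to bytes and reads it back as a new Session object.
    static Session roundTrip(Session s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("ann", "secret", 3));
        System.out.println(copy.user);       // ann
        System.out.println(copy.password);   // null
        System.out.println(copy.attempts);   // 0
    }
}
```

This is the same mechanism Hashtable relies on: its `table` and `count` instance variables are declared `transient`, so the class has to supply its own serialization methods to write out and rebuild the entries.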

Next time

✔ Self-organizing data structures

✔ Self-organizing lists

✔ Splay trees

✔ Spatial data structures

✔ K-D trees

✔ The C++ Standard Template Library