Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson...

24
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13- 140909-3 1 Hash Tables, Maps and Sets Chapter 15 Wherein we throw all the data into random array slots and somehow obtain O(1) retrieval time

Transcript of Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson...

Page 1: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

1

Hash Tables, Maps and SetsChapter 15

Wherein we throw all the data into random array slots and somehow obtain

O(1) retrieval time

Page 2: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

2

Hash Tables

• Recall order of magnitude of searches– Linear search O(n)

– Binary search O(log2n)

– Balanced binary tree search O(log2n)

– Unbalanced binary tree can degrade to O(n)

Page 3: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

3

Hash Tables

• In some situations faster search is needed– Solution is to use a hash function– Value of key field given to hash function– Location in a hash table is calculated

Page 4: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

4

Hash Functions

• Simple function could be to mod the value of the key by some arbitrary integerint h(int i){ return i % someInt; }

• Note the max number of locations in the table will be same as someInt

• Note that we have traded speed for wasted space– Table must be considerably larger than number

of items anticipated

Page 5: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

5

Hash Functions

• Observe the problem with same value returned by h(i) for different values of i– Called collisions

• A simple solution is linear probing– Linear search begins at

collision location– Continues until empty

slot found for insertion

Page 6: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

6

Hash Functions

• When retrieving a valuelinear probe until found– If empty slot encountered

then value is not in table

• If deletions permitted – Slot can be marked so

it will not be empty and cause an invalid linear probe

Page 7: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

7

Hash Functions

• Strategies for improved performance– Increase table capacity (less collisions)– Use different collision resolution technique– Devise different hash function

• Hash table capacity– Size of table must be 1.5 to 2 times the size of

the number of items to be stored– Otherwise probability of collisions is too high

Page 8: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

8

Collision Strategies

• Linear probing can result in primary clustering

• Consider quadratic probing– Probe sequence from location i is

i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …– Secondary clusters can still form

• Double hashing – Use a second hash function to determine probe

sequence

Page 9: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

9

Collision Strategies

• Chaining– Table is a list or vector of head nodes to linked

lists– When item hashes to location, it is added to that

linked list

Page 10: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

10

Improving the Hash Function

• Ideal hash function– Simple to evaluate– Scatters items uniformly throughout table

• Modulo arithmetic not so good for strings– Possible to manipulate numeric (ASCII) value of

first and last characters of a name

Page 11: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

11

Implementations

    Hash TableResizable

ArrayBalanced

TreeLinked

List

Hash Table + Linked

List

Interfaces

Set HashSet   TreeSet   LinkedHashSet

List   ArrayList   LinkedList  

Map HashMap   TreeMap   LinkedHashMap

Page 12: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

• HashMap -- an associative data structure

• HashSet -- a container for holding values

• TreeMap – just like HashMap only items are ordered by key (internal red-black tree)

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

12

Page 13: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Hash Tables 13

the Map Interface

• Map ADT methods:– get(k): if the map M has an entry with key k, return

its associated value; else, return null – put(k, v): insert entry (k, v) into the map M; if key k

is not already in M, then return null; else, return old value associated with k

– remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null

– size(), isEmpty()– keys(): return an iterator of the keys in M– values(): return an iterator of the values in M

Page 14: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Hash Tables 14

the Set Interface

• Set ADT methods:– add(o): Adds the specified element to this set if it

is not already present. Returns true if the set does not already contain element (new item added)

– contains(o): returnts true if the set contains the specified element

– remove(o): Removes the specified element from this set if it is present. Returns true if the set contained the specified element

– size(), isEmpty()

Page 15: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

15

Page 16: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

16

Page 17: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

17

Page 18: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

18

Page 19: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

19

Page 20: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

20

Page 21: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

21

Page 22: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

22

Page 23: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Using “Chaining” to resolve collisions

• A linked list in every cell of array

• No rehashing required

• Unlimited collisions can be handled

• Lousy hash codes will still reduce performanceNyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson

Education, Inc. All rights reserved. 0-13-140909-3 23

Page 24: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,

Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

24