18. Dictionaries, Hash Tables and Sets

Hash Tables and Sets: Dictionaries, Hash Tables, Collision Resolution, Sets. Svetlin Nakov, Telerik Corporation, www.telerik.com


- The Dictionary Abstract Data Structure
- Hashing and Hash Tables: Dictionary Class
- Red-Black Tree Based Dictionary Implementation: SortedDictionary Class
- Sets and Implementations: HashSet and SortedSet Classes
- Implementing TreeSet by Wrapping SortedDictionary
- Exercises: Working with Dictionaries and Sets


Table of Contents
1. Dictionaries
2. Hash Tables
3. Dictionary<TKey, TValue> Class
4. Sets

Dictionaries: Data Structures that Map Keys to Values

The Dictionary (Map) ADT
- The abstract data type (ADT) "dictionary" maps keys to values
  - Also known as "map" or "associative array"
- Contains a set of (key, value) pairs
- Dictionary ADT operations:
  - Add(key, value)
  - FindByKey(key) → value
  - Delete(key)
- Can be implemented in several ways
  - List, array, hash table, balanced tree, ...
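To make the three ADT operations concrete, here is a minimal C# sketch of the simplest implementation from the list above: a dictionary backed by a plain list. The class name SimpleListDictionary is hypothetical (it is not a .NET class). Every operation scans the list, so each one takes linear time, which is exactly the cost that hash tables and balanced trees avoid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration class: the ADT "dictionary" backed by a plain list.
// Every operation scans the list, so Add / FindByKey / Delete are O(n).
public class SimpleListDictionary<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>> pairs =
        new List<KeyValuePair<TKey, TValue>>();

    public int Count => pairs.Count;

    // Linear search for the position of the key; -1 if it is absent
    private int IndexOf(TKey key)
    {
        var comparer = EqualityComparer<TKey>.Default;
        for (int i = 0; i < pairs.Count; i++)
        {
            if (comparer.Equals(pairs[i].Key, key))
            {
                return i;
            }
        }
        return -1;
    }

    // Add(key, value): inserts a new pair or replaces the value of an existing key
    public void Add(TKey key, TValue value)
    {
        int index = IndexOf(key);
        var pair = new KeyValuePair<TKey, TValue>(key, value);
        if (index >= 0) pairs[index] = pair;
        else pairs.Add(pair);
    }

    // FindByKey(key) -> value; throws when the key is missing
    public TValue FindByKey(TKey key)
    {
        int index = IndexOf(key);
        if (index < 0) throw new KeyNotFoundException("Key not found: " + key);
        return pairs[index].Value;
    }

    // Delete(key): returns true if a pair was removed
    public bool Delete(TKey key)
    {
        int index = IndexOf(key);
        if (index < 0) return false;
        pairs.RemoveAt(index);
        return true;
    }
}
```

Here Add replaces the value of an existing key. That is a design choice of this sketch; Dictionary<TKey,TValue>.Add throws an exception for a duplicate key instead.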

ADT Dictionary – Example
Example dictionary:

    Key        Value
    C#         Modern object-oriented programming language for the Microsoft .NET platform
    CLR        Common Language Runtime – execution engine for .NET assemblies, integral part of .NET Framework
    compiler   Software that transforms a computer program into executable machine code
    …          …

Hash Tables: What Is a Hash Table? How Does It Work?

Hash Table
- A hash table is an array that holds a set of (key, value) pairs
- The process of mapping a key to a position in a table is called hashing

[Figure: hash table T of size m with slots 0, 1, 2, …, m-1; the hash function $h: k \to \{0, \dots, m-1\}$ maps a key k to the slot h(k)]

Hash Functions and Hashing
- A hash table has m slots, indexed from 0 to m-1
- A hash function h(k) maps keys to positions:
  - $h: k \to \{0, \dots, m-1\}$
- For every key k in the key range, the hash function h gives $h(k) = p$ with $0 \le p < m$

[Figure: table T with slots 0 … m-1; the key k is placed in slot h(k)]
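A hash function for string keys can be sketched as follows. The polynomial formula (multiply by 31, add the next character) and the class name HashFunctions are assumptions for illustration, not the hash function that .NET uses. The only properties the slide requires are that the result p satisfies 0 ≤ p < m and that equal keys always give the same p.

```csharp
using System;

public static class HashFunctions
{
    // h(k): maps a string key to a slot index in [0, m-1].
    // Illustrative polynomial hash (an assumption, not the .NET string hash):
    // h = h * 31 + c for each character, in wrapping unsigned 32-bit arithmetic.
    public static int Hash(string key, int m)
    {
        if (m <= 0) throw new ArgumentOutOfRangeException(nameof(m));
        uint h = 0;
        foreach (char c in key)
        {
            h = unchecked(h * 31 + c);
        }
        return (int)(h % (uint)m); // always 0 <= result < m
    }
}
```

For example, Hash("ab", 16) computes 97 * 31 + 98 = 3105 and returns 3105 mod 16 = 1.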

Hashing Functions
- Perfect hashing function (PHF)
  - h(k): one-to-one mapping of each key k to an integer in the range $[0, m-1]$
  - The PHF maps each key to a distinct integer within some manageable range
- Finding a perfect hashing function is impractical in most cases (it is feasible mainly when the whole set of keys is known in advance)
- More realistically:
  - A hash function h(k) maps most of the keys onto unique integers, but not all

Collisions in a Hash Table
- A collision is the situation when different keys have the same hash value:
  $h(k_1) = h(k_2)$ for $k_1 \neq k_2$
- When the number of collisions is sufficiently small, hash tables work quite well (fast)
- Several collision resolution strategies exist:
  - Chaining in a list
  - Using the neighboring slots (linear probing)
  - Re-hashing
  - ...

Collision Resolution: Chaining
h("Pesho") = 4, h("Kiro") = 2, h("Mimi") = 1, h("Ivan") = 2, h("Lili") = m-1

    T[0]    null
    T[1]    Mimi -> null
    T[2]    Kiro -> Ivan -> null    <- collision: elements are chained in a list
    T[3]    null
    T[4]    Pesho -> null
    …
    T[m-1]  Lili -> null
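Chaining can be sketched with an array of linked lists, where slot T[i] holds every (key, value) pair whose key hashes to i. The class ChainedHashTable below is a hypothetical minimal example with a fixed number of slots and no resizing (Exercise 4 below asks for the full version). It reduces the key's hash code to a slot index by clearing the sign bit and taking the remainder modulo m.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal hash table with chaining: slot i holds a linked list of
// the (key, value) pairs whose keys hash to i. Fixed capacity, no resizing.
public class ChainedHashTable<TKey, TValue>
{
    private readonly LinkedList<KeyValuePair<TKey, TValue>>[] slots;

    public ChainedHashTable(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        slots = new LinkedList<KeyValuePair<TKey, TValue>>[capacity];
    }

    public int Count { get; private set; }

    // h(k) in [0, m-1]: clearing the sign bit keeps the index non-negative
    private int SlotOf(TKey key)
    {
        int hash = EqualityComparer<TKey>.Default.GetHashCode(key);
        return (hash & 0x7FFFFFFF) % slots.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int index = SlotOf(key);
        if (slots[index] == null)
        {
            slots[index] = new LinkedList<KeyValuePair<TKey, TValue>>();
        }
        foreach (var pair in slots[index])
        {
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
            {
                throw new ArgumentException("Duplicate key: " + key);
            }
        }
        // Collision or not, the new pair is appended to the chain of its slot
        slots[index].AddLast(new KeyValuePair<TKey, TValue>(key, value));
        Count++;
    }

    public bool TryFind(TKey key, out TValue value)
    {
        var chain = slots[SlotOf(key)];
        if (chain != null)
        {
            // Walk the chain and compare the keys with Equals()
            foreach (var pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

With only 2 slots, the five names from the figure are forced into shared chains, yet every lookup still finds the right pair because TryFind compares the keys with Equals() while walking the chain.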

Hash Tables and Efficiency
- Hash tables are usually the most efficient implementation of the ADT "dictionary"
- Add / Find / Delete take just a few primitive operations on average
  - Speed does not depend on the size of the hash table (constant time), as long as collisions stay rare
- Example: finding an element in a hash table with 1 000 000 elements takes just a few steps
  - Finding an element in an unsorted array of 1 000 000 elements takes on average 500 000 steps (linear search)
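The "constant time" claim holds on average and assumes that the number of keys per slot stays bounded. For chaining, the standard textbook statement uses the load factor α (n keys stored in m slots) and assumes simple uniform hashing, i.e. each key is equally likely to land in any slot:

```latex
\alpha = \frac{n}{m}, \qquad T_{\text{find}} = \Theta(1 + \alpha)

\text{linear search, key present: } \frac{1 + 2 + \dots + n}{n} = \frac{n + 1}{2},
\qquad n = 1\,000\,000 \;\Rightarrow\; \frac{n + 1}{2} = 500\,000.5 \approx 500\,000
```

Keeping α below a fixed bound, for example by doubling the table once it is 75% full as Exercise 4 asks, keeps Θ(1 + α) = Θ(1). The second line is the arithmetic behind the 500 000 steps quoted for the array.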

Dictionaries – Interfaces and Implementations

Hash Tables in C#: The Dictionary<TKey,TValue> Class

Dictionary<TKey,TValue>
- Implements the ADT dictionary as a hash table
  - Size is dynamically increased as needed
  - Contains a collection of key-value pairs
  - Collisions are resolved by chaining
  - Elements have no guaranteed order
    - The enumeration order is an implementation detail and should not be relied upon
- Dictionary<TKey,TValue> relies on
  - Object.Equals() – for comparing the keys
  - Object.GetHashCode() – for calculating the hash codes of the keys

Dictionary<TKey,TValue> (2)
- Major operations:
  - Add(TKey, TValue) – adds an element with the specified key and value (throws ArgumentException if the key already exists)
  - Remove(TKey) – removes the element by key
  - this[] – gets, adds or replaces an element by key (reading a missing key throws KeyNotFoundException)
  - Clear() – removes all elements
  - Count – returns the number of elements
  - Keys – returns a collection of the keys
  - Values – returns a collection of the values

Dictionary<TKey,TValue> (3)
- Major operations:
  - ContainsKey(TKey) – checks whether the dictionary contains a given key
  - ContainsValue(TValue) – checks whether the dictionary contains a given value
    - Warning: slow operation! It scans all elements (linear time)
  - TryGetValue(TKey, out TValue)
    - If the key is found, returns true and stores the associated value in the out parameter
    - Otherwise returns false (the out parameter receives the default value of TValue)
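The difference between the indexer, Add() and TryGetValue() can be shown with a small sketch. The class name DictionaryLookupDemo and the -1 "no mark" value are hypothetical choices for this example; the exception types are the ones Dictionary<TKey,TValue> throws.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryLookupDemo
{
    // One hash lookup: TryGetValue returns false instead of throwing
    // KeyNotFoundException as the indexer getter would for a missing key.
    // The -1 "no mark" value is a hypothetical convention for this example.
    public static int GetMarkOrDefault(Dictionary<string, int> marks, string name)
    {
        int mark;
        return marks.TryGetValue(name, out mark) ? mark : -1;
    }

    // Returns true when Add() rejects a duplicate key with ArgumentException,
    // while the indexer setter silently replaces the existing value.
    public static bool AddRejectsDuplicates()
    {
        var marks = new Dictionary<string, int>();
        marks.Add("Peter", 6);
        marks["Peter"] = 5; // replaces the value, no exception
        try
        {
            marks.Add("Peter", 4);
            return false; // not reached: Add() throws for an existing key
        }
        catch (ArgumentException)
        {
            return marks["Peter"] == 5;
        }
    }

    // Returns true when reading a missing key through the indexer throws
    public static bool IndexerThrowsOnMissingKey()
    {
        var marks = new Dictionary<string, int>();
        try
        {
            int unused = marks["Maria"];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }
}
```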

Dictionary<TKey,TValue> – Example

    Dictionary<string, int> studentsMarks = new Dictionary<string, int>();
    studentsMarks.Add("Ivan", 4);
    studentsMarks.Add("Peter", 6);
    studentsMarks.Add("Maria", 6);
    studentsMarks.Add("George", 5);

    int peterMark = studentsMarks["Peter"];
    Console.WriteLine("Peter's mark: {0}", peterMark);
    Console.WriteLine("Is Peter in the hash table: {0}",
        studentsMarks.ContainsKey("Peter"));

    Console.WriteLine("Students and grades:");
    foreach (var pair in studentsMarks)
    {
        Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
    }

Dictionary<TKey,TValue>
Live Demo

Counting the Words in a Text

    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount = new Dictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings that Split() produces between "," and " "
    string[] words = text.Split(new char[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }
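The counting loop above performs two lookups for a word that is already present (ContainsKey(), then the indexer). A sketch of the same idea wrapped in a testable method, with a single TryGetValue() call per word; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Counts word occurrences; RemoveEmptyEntries drops the empty strings
    // that Split() produces between a comma and the following space.
    public static IDictionary<string, int> CountWords(string text)
    {
        IDictionary<string, int> wordsCount = new Dictionary<string, int>();
        string[] words = text.Split(new char[] { ' ', ',', '.' },
            StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            int count;
            wordsCount.TryGetValue(word, out count); // count stays 0 for a new word
            wordsCount[word] = count + 1;
        }
        return wordsCount;
    }
}
```

For the sample text this yields a -> 1, text -> 3, some -> 2, just -> 1.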

Balanced Tree Dictionaries: The SortedDictionary<TKey,TValue> Class

SortedDictionary<TKey,TValue>
- SortedDictionary<TKey,TValue> implements the ADT "dictionary" as a self-balancing search tree (a red-black tree)
  - Elements are arranged in the tree ordered by key
  - Traversing the tree returns the elements in increasing order of their keys
  - Add / Find / Delete perform about $\log_2 n$ operations, i.e. $O(\log n)$
- Use SortedDictionary<TKey,TValue> when you need the elements sorted
  - Otherwise use Dictionary<TKey,TValue> – it has better performance
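SortedDictionary<TKey,TValue> also accepts an IComparer<TKey> in its constructor, and the comparer then decides both the key order and which keys count as equal. A sketch using the built-in StringComparer instances; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SortedWordCounter
{
    // Counts words in a SortedDictionary whose key order and key equality
    // both come from the given comparer
    public static SortedDictionary<string, int> CountWords(
        IEnumerable<string> words, IComparer<string> comparer)
    {
        var counts = new SortedDictionary<string, int>(comparer);
        foreach (string word in words)
        {
            int count;
            counts.TryGetValue(word, out count); // 0 for a new word
            counts[word] = count + 1;
        }
        return counts;
    }
}
```

With StringComparer.OrdinalIgnoreCase, "Text", "text" and "TEXT" collapse into a single key, which is the behavior Exercise 3 below asks for.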

Counting Words (Again)

    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount = new SortedDictionary<string, int>();

    // RemoveEmptyEntries skips the empty strings that Split() produces between "," and " "
    string[] words = text.Split(new char[] { ' ', ',', '.' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
            count = wordsCount[word] + 1;
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }

Comparing Dictionary Keys
Using custom key classes in Dictionary<TKey, TValue> and SortedDictionary<TKey,TValue>

IComparable<T>
- Dictionary<TKey,TValue> relies on
  - Object.Equals() – for comparing the keys
  - Object.GetHashCode() – for calculating the hash codes of the keys
- SortedDictionary<TKey,TValue> relies on IComparable<T> for ordering the keys
- Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
  - Other types used as dictionary keys should provide custom implementations
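Why custom key types need these methods can be seen from a class that does not override them: it inherits reference equality from Object, so two instances holding the same data are different keys. RawPoint and KeyEqualityDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key class that does NOT override Equals() / GetHashCode(),
// so it inherits reference equality from Object
public class RawPoint
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class KeyEqualityDemo
{
    // Returns whether a second, equal-looking instance is found as a key
    public static bool FindsEqualButDistinctKey()
    {
        var names = new Dictionary<RawPoint, string>();
        names.Add(new RawPoint { X = 1, Y = 2 }, "A");
        // A different object with the same coordinates is a different key
        return names.ContainsKey(new RawPoint { X = 1, Y = 2 });
    }
}
```

The lookup fails (returns false) even though the coordinates match, because Object.Equals() compares references.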

Implementing Equals() and GetHashCode()

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }

        public override bool Equals(Object obj)
        {
            // "obj is Point" is false for null, so no separate null check is needed
            if (!(obj is Point))
                return false;
            Point p = (Point)obj;
            return (X == p.X) && (Y == p.Y);
        }

        public override int GetHashCode()
        {
            // Combine shifted copies of X with Y; equal points always give equal hash codes
            return (X << 16 | X >> 16) ^ Y;
        }
    }

Implementing IComparable<T>

    public struct Point : IComparable<Point>
    {
        public int X { get; set; }
        public int Y { get; set; }

        public int CompareTo(Point otherPoint)
        {
            // Order by X first; compare Y only when the X coordinates are equal
            if (X != otherPoint.X)
            {
                return this.X.CompareTo(otherPoint.X);
            }
            else
            {
                return this.Y.CompareTo(otherPoint.Y);
            }
        }
    }
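Putting the two slides together, a single Point struct can provide Equals(), GetHashCode() and CompareTo(), and then works as a key in both Dictionary<TKey,TValue> and SortedDictionary<TKey,TValue>. This is a sketch that merges the code above; PointKeyDemo is a hypothetical helper for the demonstration.

```csharp
using System;
using System.Collections.Generic;

// Merges the two Point variants from the slides into one struct
public struct Point : IComparable<Point>
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj)
    {
        if (!(obj is Point)) return false;
        Point p = (Point)obj;
        return X == p.X && Y == p.Y;
    }

    public override int GetHashCode()
    {
        return (X << 16 | X >> 16) ^ Y;
    }

    public int CompareTo(Point other)
    {
        return X != other.X ? X.CompareTo(other.X) : Y.CompareTo(other.Y);
    }
}

public static class PointKeyDemo
{
    public static string Describe()
    {
        var hashed = new Dictionary<Point, string>();
        hashed[new Point { X = 1, Y = 2 }] = "A";
        // Found: an equal Point has the same hash code and Equals() returns true
        bool found = hashed.ContainsKey(new Point { X = 1, Y = 2 });

        var sorted = new SortedDictionary<Point, string>();
        sorted[new Point { X = 2, Y = 0 }] = "C";
        sorted[new Point { X = 1, Y = 5 }] = "B";
        sorted[new Point { X = 1, Y = 2 }] = "A";
        // Enumerated by X, then by Y, as defined by CompareTo()
        return found + ":" + string.Join("", sorted.Values);
    }
}
```

Describe() returns "True:ABC": the hash-based lookup succeeds, and the tree enumerates (1,2), (1,5), (2,0) in that order.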

Sets: Sets of Elements

Set and Bag ADTs
- The abstract data type (ADT) "set" keeps a set of elements with no duplicates
- Sets with duplicates are also known as ADT "bag"
- Set operations:
  - Add(element)
  - Contains(element) → true / false
  - Delete(element)
  - Union(set) / Intersect(set)
- Sets can be implemented in several ways
  - List, array, hash table, balanced tree, ...
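The "bag" ADT can be sketched on top of a dictionary by mapping each distinct element to its number of occurrences. Bag<T> is a hypothetical class (System.Collections.Generic has no built-in bag type), and Occurrences() is an extra query added for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Bag<T>: a set with duplicates, stored as element -> count
public class Bag<T>
{
    private readonly Dictionary<T, int> counts = new Dictionary<T, int>();

    // Total number of elements, duplicates included
    public int Count { get; private set; }

    public void Add(T element)
    {
        int n;
        counts.TryGetValue(element, out n); // 0 for a new element
        counts[element] = n + 1;
        Count++;
    }

    public int Occurrences(T element)
    {
        int n;
        return counts.TryGetValue(element, out n) ? n : 0;
    }

    // Removes one occurrence; returns false if the element is absent
    public bool Delete(T element)
    {
        int n;
        if (!counts.TryGetValue(element, out n)) return false;
        if (n == 1) counts.Remove(element);
        else counts[element] = n - 1;
        Count--;
        return true;
    }
}
```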

Sets – Interfaces and Implementations

HashSet<T>
- HashSet<T> implements ADT set by a hash table
  - Elements are in no particular order
- All major operations are fast:
  - Add(element) – adds an element to the set
    - Does nothing (and returns false) if the element already exists
  - Remove(element) – removes a given element
  - Count – returns the number of elements
  - UnionWith(set) / IntersectWith(set) – performs union / intersection with another set
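A short sketch of the HashSet<T> operations listed above; HashSetDemo is a hypothetical name. Note that Add() reports through its bool result whether the element was actually added.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    public static string Run()
    {
        var languages = new HashSet<string> { "SQL", "Java", "C#" };
        bool addedAgain = languages.Add("SQL"); // false: "SQL" is already present
        bool addedNew = languages.Add("PHP");   // true
        bool hasJava = languages.Contains("Java");
        languages.Remove("Java");
        return addedAgain + " " + addedNew + " " + hasJava + " " + languages.Count;
    }
}
```

Run() returns "False True True 3".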

HashSet<T> – Example

    ISet<string> firstSet = new HashSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new HashSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });

    ISet<string> union = new HashSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // SQL Java C# PHP Oracle MySQL (HashSet<T> does not guarantee this order)

    private static void PrintSet<T>(ISet<T> set)
    {
        foreach (var element in set)
        {
            Console.Write("{0} ", element);
        }
        Console.WriteLine();
    }
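The slide on HashSet<T> lists IntersectWith() but the example shows only the union. A sketch of the intersection of the same two sets; the helper name is hypothetical, and it copies the first set because IntersectWith() modifies the set it is called on.

```csharp
using System;
using System.Collections.Generic;

public static class SetIntersectionDemo
{
    public static ISet<string> Intersect(IEnumerable<string> first, IEnumerable<string> second)
    {
        // Copy the first sequence so the caller's set is not modified
        ISet<string> result = new HashSet<string>(first);
        result.IntersectWith(second); // keeps only the elements present in both
        return result;
    }
}
```

For { "SQL", "Java", "C#", "PHP" } and { "Oracle", "SQL", "MySQL" } the result contains only "SQL".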

SortedSet<T>
- SortedSet<T> implements ADT set by a balanced search tree
  - Elements are sorted in increasing order
- Example:

    ISet<string> firstSet = new SortedSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new SortedSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });
    // The union must also be a SortedSet<T>; a HashSet<T> would not keep it sorted
    ISet<string> union = new SortedSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // C# Java MySQL Oracle PHP SQL
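Because SortedSet<T> keeps its elements in a balanced tree, it can also answer order-based queries that a hash set cannot. The sketch below uses the Min and Max properties and GetViewBetween(), which returns the elements within inclusive bounds; SortedSetDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSetDemo
{
    // Returns "min max | elements in [lower, upper]"
    public static string Describe(IEnumerable<int> numbers, int lower, int upper)
    {
        var set = new SortedSet<int>(numbers);                   // duplicates are dropped
        SortedSet<int> range = set.GetViewBetween(lower, upper); // inclusive bounds
        return set.Min + " " + set.Max + " | " + string.Join(" ", range);
    }
}
```

Describe(new[] { 5, 1, 9, 3, 7, 3 }, 3, 7) returns "1 9 | 3 5 7".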

HashSet<T> and SortedSet<T>
Live Demo

Summary
- Dictionaries map keys to values
  - Can be implemented as a hash table or a balanced search tree
- Hash tables map keys to values
  - Rely on hash functions to distribute the keys in the table
  - Collisions need a resolution algorithm (e.g. chaining)
  - Very fast add / find / delete
- Sets hold a group of elements
  - Hash-table or balanced-tree implementations

Hash Tables and Sets
Questions?

http://academy.telerik.com

Exercises
1. Write a program that counts, in a given array of integers, the number of occurrences of each integer. Use Dictionary<TKey,TValue>.
   Example: array = {3, 4, 4, 2, 3, 3, 4, 3, 2}
   2 → 2 times
   3 → 4 times
   4 → 3 times
2. Write a program that extracts from a given sequence of strings all elements that are present in it an odd number of times. Example:
   {C#, SQL, PHP, PHP, SQL, SQL} → {C#, SQL}

Exercises (2)
3. Write a program that counts how many times each word from a given text file words.txt appears in it. Character casing differences should be ignored. The resulting words should be ordered by their number of occurrences in the text. Example:
   This is the TEXT. Text, text, text – THIS TEXT! Is this the text?
   is → 2
   the → 2
   this → 3
   text → 6

Exercises (3)
4. Implement the data structure "hash table" in a class HashTable<K,T>. Keep the data in an array of lists of key-value pairs (LinkedList<KeyValuePair<K,T>>[]) with an initial capacity of 16. When the hash table load runs over 75%, resize it to a 2 times larger capacity. Implement the following methods and properties: Add(key, value), Find(key) → value, Remove(key), Count, Clear(), this[], Keys. Try to make the hash table support iterating over its elements with foreach.
5. Implement the data structure "set" in a class HashedSet<T> using your class HashTable<T,T> to hold the elements. Implement all standard set operations like Add(T), Find(T), Remove(T), Count, Clear(), union and intersect.