(5) collections algorithms

description

This presentation comes with many additional notes (pdf): http://de.slideshare.net/nicolayludwig/5-collections-algorithms-38613850
- Implementation Strategies of Map/Dictionary-like associative Collections
-- BST-based Implementation
-- Hashtable-based Implementation
- A 20K Miles Perspective on Trees in Computer Science
- Equality
- Multimaps
- Set-like associative Collections

Transcript of (5) collections algorithms

  • 1. Nico Ludwig (@ersatzteilchen) Collections Part V

  • 2. Collections Part V
      - Implementation Strategies of Map/Dictionary-like associative Collections
      -- BST-based Implementation
      -- Hashtable-based Implementation
      - A 20K Miles Perspective on Trees in Computer Science
      - Equality
      - Multimaps
      - Set-like associative Collections
      [Title: TOC]
  • 3. In the last lecture we learned about associative collections. Associative collections allow associating keys with values. Example: in the white pages, names are associated with phone numbers. Associative collections allow looking up a value for a certain key, e.g. getting the phone number of a specific person. Associative collections could be implemented with two arrays as shown before, but this would be very inefficient for common cases. In this lecture we're going to understand how associative collections are implemented. We'll start with SortedDictionary, which was introduced in the last lecture. We already know about SortedDictionary that it allows only one value per key, that it organizes its keys by equivalence, and that a .Net type provides equivalence by implementing IComparable or by delegating equivalence to objects implementing IComparer. OK, the organization of keys is based on equivalence, but when a key is in a SortedDictionary, where does it have to "be"? I.e. where and how will it be stored? We're going to understand this in this lecture!
      [Title: Implementation of Associative Collections]
  • 4. In contrast to indexed/sequential collections, items in associative collections are got and set by analyzing their relationships. The idea of SortedDictionary is to keep the contained items sorted by key whilst adding items! So, its way of key organization is the analysis of the sort order of the items' keys. To make this happen, SortedDictionary is implemented with a tree-like storage organization for the keys. When we analyzed algorithms we learned that the most efficient sorting algorithms, i.e. those implementing "divide and conquer" like mergesort and quicksort, can be visualized and analyzed with trees.
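The "sorted by key whilst adding" behavior can be observed from the outside before looking at the implementation. The following is only a minimal sketch in Java, using TreeMap as the Java counterpart of SortedDictionary; the class name SortedPhoneBook and the sample entries are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SortedPhoneBook {
    // Adds entries in arbitrary order and returns the keys in iteration order.
    static List<String> keysInIterationOrder() {
        Map<String, Integer> telephoneDictionary = new TreeMap<>();
        telephoneDictionary.put("Gil", 7764);
        telephoneDictionary.put("Will", 4689);
        telephoneDictionary.put("Bud", 3821);
        telephoneDictionary.put("Walt", 1797);
        // A TreeMap iterates its keys in their natural (here: lexicographic) order.
        return new ArrayList<>(telephoneDictionary.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInIterationOrder()); // [Bud, Gil, Walt, Will]
    }
}
```

The insertion order plays no role for the iteration order; only the order of the keys does.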
Before we discuss SortedDictionary's tree-based implementation, we're going to talk about alternative implementations. Sidebar: Java's implementation of a sorted associative collection is called TreeMap. It implements the interface Map.
      [Title: Associative Collections - What's in a SortedDictionary?]
  • 5. SortedDictionary could be implemented with a sorted list.
      - Finding would be fast (O(log n)), but inserting would be costly (O(n), because the following items must be shifted and the list may have to be resized).
      - It is simple to code.
      SortedDictionary could be implemented with a sorted linked list.
      - Inserting would be fast if the location is known (O(1)), but finding would be costly (O(n), because finding involves iterating the linked list).
      - More memory per item is required (one item plus a few references), whereas (sorted) lists store just the item, due to contiguous memory.
      Problems of these approaches:
      - For lists, contiguous memory is the problem. It makes the data structure rigid; resizing is required.
      - For linked lists, finding items is a problem due to the loose structure ("linked iteration").
      Let's finally discuss how we come from sorted lists and sorted linked lists to the real implementation of SortedDictionary.
      [Figure: a sorted list and a sorted linked list, each holding "Clark", "Ed", "Gil", "Helen", "James"]
      [Title: SortedDictionary - two alternative Implementations]
  • 6. When we examine a sorted linked list we'll find an interesting way of thinking about finding items. If we had a reference to the middle item, we could determine the "direction", or half, in which to insert a new item to keep the linked list sorted. Therefore it would be nice to allow moving in both directions, so we should use a sorted doubly linked list (3rd alternative)! The idea of referencing only the middle item of an already sorted doubly linked list can be recursively applied to the sorted halves. The resulting structure is a tree, upon which the real implementation of SortedDictionary is built!
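The "reference the middle item, then recurse into both halves" idea can be sketched in Java as follows. This is only an illustration under assumptions: the names MiddleTree, Node and build are made up here, and the resulting tree shape need not match the one drawn on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MiddleTree {
    static final class Node {
        final String value;
        Node left;
        Node right;
        Node(String value) { this.value = value; }
    }

    // Builds a tree from sortedNames[from, to): the middle item becomes the
    // root, and the same idea is applied recursively to both sorted halves.
    static Node build(List<String> sortedNames, int from, int to) {
        if (from >= to) {
            return null;
        }
        int middle = from + (to - from) / 2;
        Node node = new Node(sortedNames.get(middle));
        node.left = build(sortedNames, from, middle);
        node.right = build(sortedNames, middle + 1, to);
        return node;
    }

    // Collects the values left-to-right, which yields the sorted order again.
    static void inorder(Node node, List<String> out) {
        if (node != null) {
            inorder(node.left, out);
            out.add(node.value);
            inorder(node.right, out);
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Clark", "Ed", "Gil", "Helen", "James");
        Node root = build(names, 0, names.size());
        System.out.println(root.value); // Gil
        List<String> visited = new ArrayList<>();
        inorder(root, visited);
        System.out.println(visited); // [Clark, Ed, Gil, Helen, James]
    }
}
```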
"Clark" "Ed" "Helen" "James""Gil" "Clark" "Ed" "Helen" "James""Gil" "Bud" "Will" null "Will" null "Bud" null null"Clark" "Ed" null "Helen" "James"null null "Gil" "Will" nullnullnull "Bud" null "Kyle" "Kyle" "Kyle" "Kyle" SortedDictionary Coming to the real Implementation 7. 7 Trees represent one of the many data structures available in cs (others are graphs, relational, heap, stack etc.) Similar to collections data structures can be categorized in different ways, e.g. a tree is a special form of a graph. Basic features of trees: Trees are recursive branching structures consisting of linked nodes. Trees have only one root node. In cs the root of a tree is written on the top at graphical representations. A node can be reached by exactly one path from the root. This is only true for linear paths, in xpath there exist various expressions to select any node from the root. Trees represent data that has a hierarchy. This is very common in computer business. In the last example the hierarchy was the order: left => less than root, right => greater than root. Directory and file structures. A function call hierarchy: a call stack. "Has a" and "is a" associations in object-oriented systems. The structure of an XML file. Trees in Computer Science (cs) 8. 8 The final solution we found to handle sorted collections is a special form of a tree, a binary search tree (BST). The implementation of SortedDictionary is based on a BST (BST-based). The core strategy of BSTs is to maintaining pointers/references to the middle of subtrees. Here some general terminology on trees (seen from "James"): "James" is a node or cell. The arrows can be called edges. "James" has two children. Below "James" there is a subtree of two children. "James"' parent is "Gil". "Gil" is the root of the tree. In computer science trees are evolving upside down, having the root at the top. "Will" is a leaf, i.e. it has no children at all. A node having only one child is sometimes called half-leaf. 
"Clark" and "James" are siblings. The BST is called "binary", because each node has at most two children. A binary search can be issued on a BST in a very simple manner. null"Clark" "Ed" null "Helen" "James"null null "Gil" "Will" nullnullnull "Bud" null Trees Terminology 9. 9 Basically all operations on trees involve recursion! The iteration of a tree is called traversal. It means to visit every node in the tree from a certain starting node. (1) Do something with the current node. (2) If the left (or right node) is not null make it the current node and continue with (1). Recurse! There are three classical ways to traverse (binary) trees: Inorder: visit the left node, then the root and finally the right node. Postorder: visit the left node, then the right node and finally the root. Preorder/depth-first: visit the root, then the left node and finally the right node. The most important traversal for BSTs is inorder, because it visits the nodes in sorted order. Notice, how the way of traversal could be encapsulated behind the iterator design pattern! public class Node { // Mind the recursive definition! public string Value { get; set;} public Node Left { get; set;} public Node Right { get; set;} } public static void PrintTreePostorder(Node root) { if (null != root) { PrintTreePostorder(root.Left); // Recurse! PrintTreePostorder(root.Right);// Recurse! Console.WriteLine(root.Value); } } public static void PrintTreeInorder(Node root) { if (null != root) { PrintTreeInorder(root.Left); // Recurse! Console.WriteLine(root.Value); PrintTreeInorder(root.Right); // Recurse! } } public static void PrintTreePreorder(Node root) { if (null != root) { Console.WriteLine(root.Value); PrintTreePreorder(root.Left); // Recurse! PrintTreePreorder(root.Right); // Recurse! } } Traversal of Trees 10. 10 So SortedDictionary is implemented with a BST. The string that was used in each node in former examples, plays the role of SortedDictionary's search key. 
The nodes of SortedDictionary have one more field: the value that is associated with the search key. Using a BST in the implementation allows fast implementations to find items. E.g. the methods Add() and ContainsKey() apply a binary search on SortedDictionary's BST to locate items. The involved search algorithms are implemented recursively.
      Complexity: The cost of finding one item depends only on the height of the tree. We've already discussed that the height of a tree with n items is about log2 n. This is valid for perfectly balanced trees. Other costs, like swapping pointers, will not be taken into consideration, as always. This makes the complexity for finding (and related operations) O(log n) for BSTs: max. ten comparisons to find an item among 1000 items (log2 1000 ≈ 10).
      [Figure: the BST from before with "Kyle" shown alongside; each node is a (Node) object with the fields key (e.g. "Gil"), value (e.g. 8758), left and right]
      [Title: SortedDictionary - final Words]
  • 11. Associative Collections: A completely different Strategy. Up to now we discussed associative collections using equivalence to manage keys. We used a BST and got logarithmic cost for inserting and finding elements. This is pretty fast! An interesting point is that the organization of BSTs is concretely imaginable/understandable/drawable. Now we're going to discuss associative collections using equality to manage keys, which is really cool! Instead of keeping the associative collection sorted, we manage a set of collections that collect items having something in common. These individual collections are then called buckets. E.g. an associative collection telephoneDictionary: for each bucket the contained strings will have the same first character. (This is exactly how "real" dictionaries work, where there are tabs collecting words starting with the same character. The tabs play the role of the buckets.) This type of associative collection is called hashtable. A so-called hash-function maps a key to a hash-code.
In this case the hash-function maps a string (= the key) to its first letter (= the hash-code). The hash-function determines in which bucket to find or insert a key (or key-value-pair). When the bucket is determined, a specific operation only takes place on the items in that bucket.
      [Figure: telephoneDictionary with three buckets: "B" holding "Bud" (3821) and "Ben" (9427), "G" holding "Gil" (7764), "W" holding "Will" (4689) and "Walt" (1797)]
      public class Bucket {
          public string HashCode { get; set; }
          public IList<KeyValuePair<string, int>> Items { get; set; }
      }
  • 12. Associative Collections: Hashtable-based Implementations - Keys. OK, but how will an implementation using a hashtable help? To understand it, let's dissect the hashtable starting with the keys. The idea of a hashtable is to have a hash-function that calculates a position in a table from the input item. How can we have a function that is able to do that for an item? The idea is to have a method of an item-object, more specifically of the key, that calculates this position. In the frameworks Java and .Net each UDT inherits a base type (Object in both frameworks), forming a cosmic hierarchy. In Java a UDT can override Object's method hashCode() in order to calculate the hash-code. In .Net a UDT can override Object's method GetHashCode() in order to calculate the hash-code. Now we have hash-functions/methods to "hash" a name string to the int value of the first letter (incl. some error handling). We can use instances of NameKey as keys for hashtables!
      // C#
      public class NameKey {
          public string Name { get; set; }
          public override int GetHashCode() {
              // Return int-value of first letter.
              return string.IsNullOrEmpty(Name) ? 0 : Name[0];
          }
      }
      // Java
      public class NameKey { // (members hidden)
          public String name; // (getter/setter hidden)
          @Override
          public int hashCode() {
              // Return int-value of first letter.
              return (null == getName() || getName().isEmpty()) ? 0 : getName().charAt(0);
          }
      }
  • 13.
      Associative Collections: Hashtable-based Implementations - .Net's Dictionary. In .Net we can now use NameKey as the type for the key in an IDictionary that is implemented in terms of a hashtable. The hashtable-based implementation of IDictionary is simply called Dictionary! It can (of course) be used like any other IDictionary:
      IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int>();
      When GetHashCode() is overridden in the key's type, the hashtable will automatically work:
      telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66.
      telephoneDictionary[new NameKey{ Name = "Gil" }] = 7764; // GetHashCode() returns 71.
      When those items are added to the telephoneDictionary, the following will happen:
      (1) For the key NameKey{ Name = "Bud" } the hash-code is retrieved. GetHashCode() returns 66.
      (2) In the hashtable-array (here an int[]) the entry at index 66 is fetched: if there is an empty entry at 66, it will be set to the value 3821, otherwise the value of the existing entry at 66 will be overwritten with the value 3821.
      Another key (like NameKey{ Name = "Gil" }) yields another hash-code and another index in the hashtable. The operations in (2) are O(1) operations, because accessing indexed collections (like arrays) by index only costs O(1). Oversimplification! There's more and it's getting more contrived! And we have to understand it to get the whole picture!
      [Figure: adding (NameKey) "Bud" -> 3821 and (NameKey) "Gil" -> 7764 to telephoneDictionary; hashTable : int[] holds 3821 at index 66 and 7764 at index 71]
  • 14. Associative Collections: Hashtable-based Implementations - Hash Collisions and Buckets. But there is a problem: what if we try adding keys having the same hash-code, e.g. names with the same first letter? What happens here is called a hash collision. To handle hash collisions in a hashtable we have to review the idea of Buckets. Actually a hashtable is an array of Bucket objects (not just an int[]!).
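Why a plain int[] is not enough can be seen in a small Java sketch of the oversimplified table from slide 13. The names NaiveTable, firstLetterHash and fill are assumptions for illustration, and the array size of 128 assumes names that start with an ASCII character.

```java
final class NaiveTable {
    // The first-letter hash from the slides: the int value of the first character.
    static int firstLetterHash(String name) {
        return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
    }

    // Fills the oversimplified table, in which the hash-code is used directly as index.
    static int[] fill() {
        int[] hashTable = new int[128]; // assumption: one slot per ASCII hash-code
        hashTable[firstLetterHash("Bud")] = 3821; // index 66
        hashTable[firstLetterHash("Gil")] = 7764; // index 71
        hashTable[firstLetterHash("Ben")] = 9427; // index 66 again: Bud's number is lost
        return hashTable;
    }

    public static void main(String[] args) {
        int[] hashTable = fill();
        System.out.println(hashTable[66]); // 9427
        System.out.println(hashTable[71]); // 7764
    }
}
```

The second write to index 66 silently replaces the first one, which is exactly the hash collision that Buckets are meant to handle.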
For each hash-code there exists one Bucket. The hash-code is the index in the hashtable at which that hash-code's Bucket resides. Within a Bucket all inserted items with a key having the same hash-code are collected as key-value-pairs. Hm... wait! If a hash collision leads to putting items into the same Bucket, how can the associated value be retrieved from the hashtable in a definite way? In other words: the hashtable story is even more complex!
      telephoneDictionary[new NameKey{ Name = "Bud" }] = 3821; // GetHashCode() returns 66.
      telephoneDictionary[new NameKey{ Name = "Ben" }] = 9427; // GetHashCode() also returns 66!
      [Figure: adding (NameKey) "Bud" -> 3821 and (NameKey) "Ben" -> 9427; hashTable : Bucket[] has a Bucket at index 66 (hashCode 66) holding the pairs "Bud" -> 3821 and "Ben" -> 9427, and a Bucket at index 71 (hashCode 71) holding "Gil" -> 7764]
  • 15. Associative Collections: Hashtable-based Implementations - Equality. As a matter of fact hash collisions cannot be avoided, because any objects could have the same hash-code! E.g. two names could have the same first letter. This fact is not under our control! The answer is that all the key-value-pairs of a hash-code-matching Bucket need to be searched for a certain key. The keys of a hash-code-matching Bucket are linearly searched and compared for equality to find the exactly matching key. But what does "exactly matching key" mean, as opposed to a (just) hash-code-matching key? We have to prepare our key-type to provide another method to check for the exact equality of two keys! Similar to objects providing hash-codes, Java and .Net want us to override the method equals()/Equals() inherited from Object. Let's discuss how this works!
      // C#
      public class NameKey { // (members hidden)
          public override int GetHashCode() { /* pass */ }
          public override bool Equals(object other) {
              // Simple and naive implementation of Equals():
              NameKey otherNameKey = (NameKey)other;
              return Name == otherNameKey.Name;
          }
      }
      // Java
      public class NameKey { // (members hidden)
          @Override
          public int hashCode() { /* pass */ }
          @Override
          public boolean equals(Object other) {
              // Simple and naive implementation of equals():
              NameKey otherNameKey = (NameKey) other;
              return getName().equals(otherNameKey.getName());
          }
      }
  • 16. Associative Collections: Hashtable Implementations - Implementing Equality. .Net: Mechanics of comparing two NameKey objects with the method Equals():
      - Equals() has a parameter that is of the very base type Object (static type).
      - We assume the dynamic type behind the passed argument is NameKey (only NameKeys can be equality-compared to each other).
      - We can cast the parameter other to NameKey blindly. (And this can be a problem, which we'll discuss shortly.)
      - When we have the NameKey behind the parameter, we equality-compare the argument's Name property against this' Name property.
      - Mind that we used the operator== to compare the Name properties (well, we could have used Equals() as well); those are of type string.
      Equals() makes a deeper (more expensive) comparison than GetHashCode(); the latter only deals with Name's first letter:
      public class NameKey { // (members hidden)
          public override bool Equals(object other) {
              NameKey otherNameKey = (NameKey)other;
              return this.Name == otherNameKey.Name;
          }
      }
      // Both hash-codes evaluate to 66, but these objects are not equal!
      NameKey key1 = new NameKey{ Name = "Bud" };
      NameKey key2 = new NameKey{ Name = "Ben" };
      bool hashCodesAreTheSame = key1.GetHashCode() == key2.GetHashCode(); // evaluates to true
      bool areEqual = key1.Equals(key2); // evaluates to false
      public class NameKey { // (members hidden)
          public override int GetHashCode() {
              return string.IsNullOrEmpty(this.Name) ? 0 : this.Name[0];
          }
      }
  • 17. Associative Collections: Hashtable-based Implementations - Rules of Equality. We'll not discuss all facets of the implementation of GetHashCode() and Equals(), but here are the most important rules:
      - Two objects having the same hash-code need not return true when Equals() is called on them!
      - But if two objects having the same dynamic type return true when Equals() is called, then they need to have the same hash-code!
      - GetHashCode() and Equals() have to return the same results for the "structurally" same objects unless one of them is modified.
      - GetHashCode() and Equals() should be implemented to work very fast.
      - Neither GetHashCode() nor Equals() is allowed to throw exceptions!
      - If null is passed to Equals(), the result has to be false.
      Finally, GetHashCode() and Equals() for NameKey could be implemented like so to fulfill these rules:
      public class NameKey { // (members hidden)
          public override int GetHashCode() {
              return string.IsNullOrEmpty(Name) ? 0 : Name[0];
          }
          public override bool Equals(object other) {
              // Stable implementation of Equals()
              if (this == other) {
                  return true;
              }
              if (null != other && GetType() == other.GetType()) {
                  return Name == ((NameKey)other).Name;
              }
              return false;
          }
      }
      This implementation of Equals():
      - checks for identity,
      - checks for nullity (to avoid exceptions),
      - checks the dynamic type of this and the other object, so that the cast is type safe,
      - equality-compares the Name properties of both objects.
  • 18.
      Associative Collections: Hashtable-based Implementations - The Lookup Algorithm. All right! Now we have the methods GetHashCode() and Equals() in place. But how do hashtables use these tools to work? Let's assume the following content in the hashtable-based Dictionary telephoneDictionary:
      IDictionary<NameKey, int> telephoneDictionary = new Dictionary<NameKey, int> {
          { new NameKey { Name = "Bud" }, 3821 },
          { new NameKey { Name = "Ben" }, 9427 }
      };
      Then we'll look up the phone number of "Bud":
      int no = telephoneDictionary[new NameKey { Name = "Bud" }];
      The lookup will basically initiate the following algorithm:
      (1) Dictionary will call GetHashCode() on the indexer's (i.e. operator[]) argument 'new NameKey{ Name = "Bud" }' and the result is 66.
      (2) Dictionary will get the Bucket at the hashtable's index 66. This Bucket has two entries!
      (3) Dictionary will call 'Equals(new NameKey{ Name = "Bud" })' against each key of the key-value-pairs in the just returned Bucket.
      (4) The value of the key-value-pair whose key was equal to 'new NameKey{ Name = "Bud" }' will be returned.
      Read these steps multiple times and make sure you have understood them! Now we have to discuss some details... The algorithms to insert or update a key-value-pair work the same way as for lookups!
      [Figure: searching the key (NameKey) "Bud"; hashTable : Bucket[] has a Bucket at index 66 (hashCode 66) holding the pairs "Bud" -> 3821 and "Ben" -> 9427]
  • 19. Associative Collections: Hashtable-based Implementation - best Case Complexity. Although the algorithm to look up keys is complex, the analysis of a hashtable's complexity is really simple! The best case yields a complexity of O(1)! - Wow! If every item can be associated with exactly one distinct Bucket, we have one item in every Bucket: a 1 : 1 association between items and Buckets. This means that every item has a distinct hash-code and thus a distinct index in the hashtable.
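The lookup steps of slide 18 can be sketched as a toy hashtable in Java. This is a minimal sketch under stated assumptions, not how .Net's Dictionary is really coded: the names ToyHashtable, Entry, bucketFor, put and get are made up, and the hash-code is reduced modulo the number of buckets instead of being used directly as index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class ToyHashtable {
    static final class NameKey {
        final String name;
        NameKey(String name) { this.name = name; }
        @Override public int hashCode() { // first-letter hash as in the slides
            return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
        }
        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return Objects.equals(name, ((NameKey) other).name);
        }
    }

    private static final class Entry {
        final NameKey key;
        int value;
        Entry(NameKey key, int value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();

    ToyHashtable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Steps (1) and (2): ask the key for its hash-code and pick the Bucket.
    private List<Entry> bucketFor(NameKey key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Inserting or updating works like a lookup followed by a write.
    void put(NameKey key, int value) {
        List<Entry> bucket = bucketFor(key);
        for (Entry entry : bucket) {
            if (entry.key.equals(key)) { entry.value = value; return; }
        }
        bucket.add(new Entry(key, value));
    }

    // Steps (3) and (4): search the Bucket linearly with equals(); null if absent.
    Integer get(NameKey key) {
        for (Entry entry : bucketFor(key)) {
            if (entry.key.equals(key)) return entry.value;
        }
        return null;
    }

    public static void main(String[] args) {
        ToyHashtable telephoneDictionary = new ToyHashtable(16);
        telephoneDictionary.put(new NameKey("Bud"), 3821);
        telephoneDictionary.put(new NameKey("Ben"), 9427); // same Bucket as "Bud"
        System.out.println(telephoneDictionary.get(new NameKey("Bud"))); // 3821
        System.out.println(telephoneDictionary.get(new NameKey("Ben"))); // 9427
    }
}
```

Both keys land in the same Bucket, and only the equals() call tells them apart.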
As the hashtable is an array and arrays can access their items by index with O(1) complexity, we have the best case for a hashtable! In the best case, accessing a hashtable means accessing an array by index with constant complexity. This is better than O(log n) for tree-based associative collections! This is the best we can get for collections! But there is also a worst case!
      [Figure: hashTable : Bucket[] with one entry per Bucket: index 66 holds "Bud" -> 3821, index 71 holds "Gil" -> 7764, index 87 holds "Will" -> 4689]
  • 20. Associative Collections: Hashtable-based Implementations - worst Case Complexity. The worst case yields a complexity of O(n)! - Oh no! If every item is associated with the same Bucket, we have all items in one single Bucket: an n : 1 association between items and Buckets. This means that every item has the same hash-code and thus the same index in the hashtable. As this one Bucket needs to be searched linearly to find the equal key, the worst case boils down to the linear complexity O(n)! In the worst case the workload is moved to a single Bucket that must be searched in a linear manner.
      [Figure: hashTable : Bucket[] with a single Bucket at index 66 (hashCode 66) holding "Bud" -> 3821, "Ben" -> 9427 and "Betty" -> 8585]
  • 21. Associative Collections: Hashtable-based Implementations - Controlling Performance Part I. An interesting point when working with .Net's Dictionary is how we can influence the performance as developers: we can override GetHashCode() and Equals() for our own UDTs being used as keys, and this is what we are going to discuss now. Concretely, we have a problem with the UDT NameKey: we'll get the same hash-code for keys having the same first letter!
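How strongly the hash-function decides between the best and the worst case can be made visible by counting the items in the fullest Bucket. The following Java sketch is only an illustration; the names BucketSpread and largestBucket, the bucket count of 16 and the sample names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class BucketSpread {
    // The first-letter hash from the slides.
    static int firstLetterHash(String name) {
        return (name == null || name.isEmpty()) ? 0 : name.charAt(0);
    }

    // Distributes the names over bucketCount Buckets with the given hash-function
    // and returns the number of items in the fullest Bucket.
    static int largestBucket(List<String> names, ToIntFunction<String> hash, int bucketCount) {
        int[] sizes = new int[bucketCount];
        int largest = 0;
        for (String name : names) {
            int index = Math.floorMod(hash.applyAsInt(name), bucketCount);
            sizes[index]++;
            largest = Math.max(largest, sizes[index]);
        }
        return largest;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bud", "Ben", "Betty", "Bill", "Bob");
        // Worst case: all names share the first letter, so one Bucket holds all n items.
        System.out.println(largestBucket(names, BucketSpread::firstLetterHash, 16)); // 5
        // A hash over the whole string usually spreads the names over more Buckets.
        System.out.println(largestBucket(names, String::hashCode, 16));
    }
}
```

The larger the fullest Bucket, the longer the linear search inside it; a value of 1 would correspond to the best case described above.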
However, we can override GetHashCode(), so that a "deeper" or, in other words, "more distinct" hash-code will be produced. .Net's string is able to produce its own hash-code that is calculated from the whole string-value and not only from the first letter:
      // Before: hash-code of the first letter only.
      public class NameKey {
          public string Name { get; set; }
          public override int GetHashCode() {
              // Return int-value of first letter.
              return string.IsNullOrEmpty(Name) ? 0 : Name[0];
          }
      }
      // After: hash-code of the whole string.
      public class NameKey {
          public string Name { get; set; }
          public override int GetHashCode() {
              // Return the hash-code of the string Name.
              return string.IsNullOrEmpty(Name) ? 0 : Name.GetHashCode();
          }
      }
      We already know that each .Net type inherits Object and can override GetHashCode() and Equals(). A type implementing equality in the .Net framework needs to override GetHashCode() and Equals()! Think: GetHashCode() => level one equality, Equals() => level two equality. Many types of the .Net framework provide useful overrides of GetHashCode() and Equals() to implement equality. (GetHashCode() is not only used with Dictionary but also in other places of the .Net framework.) (A type implementing equivalence in the .Net framework needs to implement IComparable, or another type needs to implement IComparer for it.)
  • 22. NO! Stop it! We're not going to discuss it here! As a matter of fact, implementing equality correctly in .Net and Java is not simple, and it is often done downright wrong and potentially dangerously! I have one simple tip: don't be too clever and follow the rules! OK, we'll discuss it in depth in a future lecture. ***Under Construction Equality Under Construction*** An interesting point is that each object in Java/.Net implements equals()/Equals() and hashCode()/GetHashCode(). But overrides of those methods need to follow certain rules. Let's discuss those.
  • 23.
      Associative Collections: Hashtable-based Implementations - Controlling Performance Part II. The last implementation of NameKey basically delegates equality fully to its property Name, which is of type string. Hm! - To drive this point home: we can get rid of the type NameKey and use string as key-type instead! Hey! As a matter of fact, it is often not required to program extra UDTs as key types! The present .Net types are often sufficient to be used as keys for hashtables. Most often int and string are used as key types. Nevertheless, .Net (and Java) developers have to know how equality needs to be implemented correctly! In principle the implementation pattern and even the idiom for equality work the same way in Java!
      // A dictionary that associates strings (names) to ints (phone numbers):
      IDictionary<string, int> telephoneDictionary = new Dictionary<string, int>();
      // Adding two string-int key-value-pairs:
      telephoneDictionary["Bud"] = 3821;
      telephoneDictionary["Gil"] = 7764;
      // Looking up the phone number of "Bud":
      int no = telephoneDictionary["Bud"];
  • 24. Without any further explanation it should be clear that maintaining a collection having only unique items is very painful. E.g. collecting only the surnames from a deck of invitations. It could be done by filling a list without duplicates, as in the first part of the code below. There is a variant of associative collections that just avoids the presence of duplicates. Those are often called sets. Sets are associative collections like dictionaries, in which the keys are the values! In Java/.Net we can use BST-based (TreeSet/SortedSet) and hashtable-based (HashSet/HashSet) implementations of sets. In C++ we can use either the BST-based std::set (<set>) or the hashtable-based std::unordered_set (C++11: <unordered_set>).
      // Java
      List<String> allSurnames = Arrays.asList("Taylor", "Miller", "Taylor", "Miller");
      List<String> uniqueSurnames = new ArrayList<String>();
      for (String surName : allSurnames) {
          if (!uniqueSurnames.contains(surName)) { // Filters unique items out.
              uniqueSurnames.add(surName);
          }
      }
      for (String surName : uniqueSurnames) {
          System.out.println(surName);
      }
      // > Taylor
      // > Miller
      // Implementation using a Set:
      Set<String> uniqueSurnames2 = new HashSet<String>();
      for (String surName : allSurnames) {
          uniqueSurnames2.add(surName);
      }
      for (String surName : uniqueSurnames2) {
          System.out.println(surName);
      }
      [Title: Associative Collections - Sets]
  • 25. Associative Collections: Operations on Sets Part I. Set collections can really represent mathematical sets; this also includes operations on sets (e.g. via .Net's ISet interface).
      Subsets: $B \subseteq A$ with $A = \{1,2,3,4,5,6,7,8,9\}$, $B = \{3,6,8\}$
      // C#/.Net
      ISet<int> A = new HashSet<int>{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
      ISet<int> B = new HashSet<int>{ 3, 6, 8 };
      bool BSubSetOfA = B.IsSubsetOf(A); // >true
      bool BProperSubSetOfA = B.IsProperSubsetOf(A); // >true
      // Java
      Set<Integer> A = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));
      Set<Integer> B = new HashSet<Integer>(Arrays.asList(3, 6, 8));
      boolean BSubSetOfA = A.containsAll(B); // >true
      // Java doesn't provide a test for _proper_ subsets.
      Union: $A \cup B$ with $A = \{1,2,3,4,5,6\}$, $B = \{4,5,6,7,8,9\}$: $\{1,2,3,4,5,6\} \cup \{4,5,6,7,8,9\} = \{1,2,3,4,5,6,7,8,9\}$
      // C#/.Net
      ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      A.UnionWith(B); // The set A will be _modified_!
      // >{1, 2, 3, 4, 5, 6, 7, 8, 9}
      // LINQ's Union() extension method will create a _new_ sequence:
      ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      IEnumerable<int> A2UnionB2 = A2.Union(B2);
      // >{1, 2, 3, 4, 5, 6, 7, 8, 9}
      // Java
      Set<Integer> A = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
      Set<Integer> B = new TreeSet<Integer>(Arrays.asList(4, 5, 6, 7, 8, 9));
      boolean AWasModified = A.addAll(B); // The set A will be _modified_!
      // >true
      System.out.println(A);
      // >[1, 2, 3, 4, 5, 6, 7, 8, 9]
      [Figure: Venn diagrams of the sets A and B]
  • 26. Difference and symmetric difference.
      // C#/.Net
      ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      A.ExceptWith(B); // The set A will be _modified_!
      // >{1, 2, 3}
      // LINQ's Except() method will create a _new_ sequence:
      ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      IEnumerable<int> A2ExceptB2 = A2.Except(B2);
      // >{1, 2, 3}
      [Title: Associative Collections - Operations on Sets Part II]
      // Java
      Set<Integer> A = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
      Set<Integer> B = new TreeSet<Integer>(Arrays.asList(4, 5, 6, 7, 8, 9));
      A.removeAll(B); // The set A will be _modified_!
      System.out.println(A);
      // >[1, 2, 3]
      Difference: $A \setminus B$ with $A = \{1,2,3,4,5,6\}$, $B = \{4,5,6,7,8,9\}$: $\{1,2,3,4,5,6\} \setminus \{4,5,6,7,8,9\} = \{1,2,3\}$
      Symmetric difference: $A \,\triangle\, B := (A \setminus B) \cup (B \setminus A)$ with $A = \{1,2,3,4,5,6\}$, $B = \{4,5,6,7,8,9\}$: $\{1,2,3,4,5,6\} \,\triangle\, \{4,5,6,7,8,9\} = \{1,2,3,7,8,9\}$
      // C#/.Net
      ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      A.SymmetricExceptWith(B); // The set A will be _modified_!
      // >{1, 2, 3, 7, 8, 9}
      // Using LINQ we can create a _new_ sequence:
      ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      IEnumerable<int> A2SymmetricExceptB2 = A2.Except(B2).Union(B2.Except(A2));
      // >{1, 2, 3, 7, 8, 9}
      // Java
      Set<Integer> A = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
      Set<Integer> B = new TreeSet<Integer>(Arrays.asList(4, 5, 6, 7, 8, 9));
      Set<Integer> A2 = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
      A.removeAll(B); // The sets A and B will be _modified_!
      B.removeAll(A2);
      A.addAll(B);
      System.out.println(A);
      // >[1, 2, 3, 7, 8, 9]
  • 27. Associative Collections: Operations on Sets Part III.
      Intersection: $A \cap B$ with $A = \{1,2,3,4,5,6\}$, $B = \{4,5,6,7,8,9\}$: $\{1,2,3,4,5,6\} \cap \{4,5,6,7,8,9\} = \{4,5,6\}$
      // C#/.Net
      ISet<int> A = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      A.IntersectWith(B); // The set A will be _modified_!
      // >{4, 5, 6}
      // LINQ's Intersect() method will create a _new_ sequence:
      ISet<int> A2 = new SortedSet<int>{ 1, 2, 3, 4, 5, 6 };
      ISet<int> B2 = new SortedSet<int>{ 4, 5, 6, 7, 8, 9 };
      IEnumerable<int> A2IntersectionWithB2 = A2.Intersect(B2);
      // >{4, 5, 6}
      // Java
      Set<Integer> A = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4, 5, 6));
      Set<Integer> B = new TreeSet<Integer>(Arrays.asList(4, 5, 6, 7, 8, 9));
      A.retainAll(B); // The set A will be _modified_!
      System.out.println(A);
      // >[4, 5, 6]
  • 28. Inserting values for keys that are already present in an associative collection overwrites or updates the present values. Sometimes this is not desired. E.g. think of a telephoneDictionary in which a name can have more than one phone number! Some collection frameworks provide associative collections that can handle multiplicity. In C++ we can use std::multimap (<map>) and std::multiset (<set>). In other frameworks (Java, .Net etc.) multi-associative collections need to be implemented explicitly or taken from a 3rd party. 3rd party sources: Apache Commons (Java)
      // C++11
      std::multimap<std::string, int> telephoneDictionary {
          { "Ben", 9427 }, // Mind: two values with the same key are added here.
          { "Ben", 4367 },
          { "Jody", 1781 }, // Mind: three values with the same key are added here.
          { "Jody", 9032 },
          { "Jody", 8038 }
      };
      // It is required to use STL iterators, because std::multimap provides no subscript operator:
      for (auto item = telephoneDictionary.begin(); item != telephoneDictionary.end(); ++item) {
          std::cout