(3) collections algorithms

24
Nico Ludwig (@ersatzteilchen) Collections – Part III

description

- Sequential Collections – Linked Lists -- Pattern Matching in functional Programming Languages - Ordered and unordered Collections - Iterators - An Interface-based Approach - The Dependency Inversion Principle (DIP)

Transcript of (3) collections algorithms

Page 1: (3) collections algorithms

Nico Ludwig (@ersatzteilchen)

Collections – Part III

Page 2: (3) collections algorithms

2

● Collections – Part III– Sequential Collections – Linked Lists

● Pattern Matching in functional Programming Languages

– Ordered and unordered Collections

– Iterators

– An Interface-based Approach

– The Dependency Inversion Principle (DIP)

TOC

Page 3: (3) collections algorithms

3

● Often we need to append elements at the end or prepend elements to the front of an indexed collection (e.g. a list):

● Appending elements to a list works like so:– (Internally citiesToVisit is backed by an array.)

– (1) Find the free slot after the last element.

– (2) If there is no free slot, resize the backing array to create space.

– (3) Insert the new value "Oslo" into the free slot.

● Of course this works, but it is somewhat inefficient:– In the worst case (in step (2)) the backing array's capacity is too small to hold another element, e.g. if we add a tenth element!

● Then a completely new array with a larger capacity needs to be created!

– This "enlarging" problem is present, when elements are appended, prepended or otherwise inserted.

● Therefor the initial specification of the capacity of an indexed collection is possible.

– But there is another way to address the resizing problem with another type of collections: sequential collections, e.g. linked lists.

// Java// The specified cities need to be visited in the given order:ArrayList citiesToVisit = new ArrayList(Arrays.asList("London", "Paris", "Frankfurt"));// But then we've been informed that we need to visit "Oslo" after we visited "Frankfurt". So let's append "Oslo" to the list:citiesToVisit.add("Oslo");

citiesToVisit

"London" "Paris" "Frankfurt"

1 20 3

null null null null null null null

citiesToVisit

"London" "Paris" "Frankfurt"

1 20

"Oslo" null null null null null null

3

When indexed Collections reach their Limits

Page 4: (3) collections algorithms

4

citiesToVisit (simplified java.util.LinkedList)

● The internal representation of linked lists (here we'll discuss Java's java.util.LinkedList) is no longer array-based!– Its elements are not stored in a continuos block of memory, but as a bunch of linked nodes.

● Appending of a new element works like so:– (1) Get the last node.

– (2) Create a new node with the value "Oslo".

– (3) Link the new node with the last node.

– (4) Designate the new node as last node.

● Some facts on LinkedList:– LinkedList does directly refer the first and the last node.

– Each node refers its successor.

– A node referring to null as its successor is the last node.

– That's basically it!

// Java// The specified cities need to be visited in the given order:LinkedList citiesToVisit = new LinkedList(Arrays.asList("London", "Paris", "Frankfurt"));

(Node)

"London" element

(Node) next

(Node)

"Paris" element

(Node) next

(Node)

"Frankfurt" element

null next

(Node) first

(Node)

"Oslo" element

null next

(Node) last// Append "Oslo":citiesToVisit.add("Oslo");

(Node)

Sequential Collections – Linked Lists

Page 5: (3) collections algorithms

5

● For further discussions, we'll use Lisp's box notation of the linked list we've just created:

● Linked lists facts:– They have very good performance, when elements are prepended or appended.

● This is because no continuous block of memory needs administration in form of reallocations, only references (i.e. shortcuts) need to be moved around.

● The performance of prepending and appending elements is independent of the count of the list's elements.

● => Prepending and appending are constant time operations on a linked list, because the first and last node are directly accessible.

– Linked lists that have nodes referring to successor and predecessor nodes are called doubly linked lists.

● (Java's and .Net's LinkedLists are in fact doubly linked lists.)

citiesToVisit"London" "Paris" "Frankfurt" "Oslo"

nullfirst last

Lisp's box notation

citiesToVisit"London" "Paris" "Frankfurt" "Oslo""Rome" "Dublin"

null lastfirstappended element

(the new last element)prepended element

(the new first element)

citiesToVisit "London" "Paris" "Frankfurt" "Oslo"

nullnull lastfirst

Linked List – Appending and Prepending Elements

Page 6: (3) collections algorithms

6

● More facts on linked lists:– Virtually linked lists don't have a notion of physical indexes, because no "indexable" memory builds up its base.

– However, it is possible to map logical indexes to nodes beginning from the first node.

– Accessing an arbitrary element by an index k is a relatively expensive operation, because all k nodes need to be visited sequentially!

● The performance of index-accessing elements is dependent of the index that is to be accessed!

● Index-accessing elements are O(n) operations on a linked list, because individual elements are not directly accessible. Loops are required here!

● Or: linked list don't allow random access to elements. - Well, they are no indexed collections and we only have sequential access to elements!

Node currentNode = citiesToVisit.first;int k = 2;int currentIndex = 0;while (null != currentNode && currentIndex != k) {

currentNode = currentNode.next;++currentIndex;

}System.out.printf("element at index %d: '%s'.", k, null != currentNode? currentNode.element : "");

citiesToVisit

"London" "Paris" "Frankfurt" "Oslo"

nullfirst last

logical indexes1 20 3

Linked List – Accessing Elements

Page 7: (3) collections algorithms

7

● Esp. in functional programming (fp) languages, lists do often play an important role; therefor I'll lose some words about it.– As a matter of fact immutable (linked) lists are often first class citizens in fp languages, therefor they have direct syntactical support.

● (In opposite to imperative languages, where (mutable) arrays are often first class citizens with direct syntactical support.)

● Before we discuss this syntactical support, we've to clarify how linked lists can be understood as a chain of cascaded headand tail elements:

● So a single list element consists of two parts: the head and the tail:

– Cascading allows the tail of the element to be yet another element having a head and a tail.

Linked List – Functional Programming – Part I

citiesToVisit "London" "Paris" "Frankfurt" "Oslo"

null

"London" "Paris" "Frankfurt" "Oslo" null

"Oslo" null

tailhead

"Frankfurt" "Oslo" null

tailhead

Page 8: (3) collections algorithms

8

● The idea of the mentioned syntactical support is to break lists into head and tail with a special operator:

– In each snippet the passed list is matched against a given pattern:

● If the list is not empty, it'll be broken into its head and tail. The head will be added to the result of a recursive call with the tail.

● If the list is empty the result is 0 and the recursion ends.

– This idiom of the languages F#, Haskell and Scala is called pattern matching. Pattern matching is a kind of control structure.

● In Lisp we don't have syntactical support like pattern matching, but there are functions to get the head and the tail of a list:– car (Contents of the Address part of Register number) and

cdr (Contents of the Decrement part of Register number).; Lisp(defun sum-list(l) (if (not (null l))

(+ (car l)(sum-list (cdr l)))0))

(sum-list '(1 2 3 4 5 6 7 8 9))

Linked List – Functional Programming – Part II

-- Haskellimport System.Environment

sumList (x:xs) = x + sumList xssumList [] = 0 main = print(sumList [1, 2, 3, 4, 5, 6, 7, 8, 9])

// F#let rec sumList l =

match l with | head :: tail -> head + sumList tail | [] -> 0

printfn "%A" (sumList [1; 2; 3; 4; 5; 6; 7; 8; 9])

// Scaladef sumList(list: List[Int]) : Int =

list match { case head :: tail => head + sumList(tail) case Nil => 0 }

sumList(List(1, 2, 3, 4, 5, 6, 7, 8, 9))

Page 9: (3) collections algorithms

9

● Up to now we discussed so called ordered collections.– (The opposite are unordered collections.)

● Basically in ordered collections the order, in which elements are added is the same order, in which they'll be iterated.

● Notice that "being ordered" has nothing to do with being sorted!

● Arrays, lists and linked lists are ordered collections.

● Now its time to understand what "iteration of collections" means.

Ordered and unordered Collections

Page 10: (3) collections algorithms

10

● Each collection allows accessing its elements iteratively. This can be done with loops, e.g. with for loops:

– Not all collections have indexes, on which iteraton can be done. But nevertheless generalized iteration is possible in most languages.

● Such languages introduce the concept of an iterator that abstracts iteration from an underlying collection.

● If collections provide an iterator, iterative access to these collections can be done with the same pattern/syntax. "Iterator" defines a basic design pattern.

● Iterative access: elements of collections can be retrieved in a loop in a defined manner and usually in a stable order.– Defined manner: the algorithm, with which the elements are retrieved is strictly defined. Usually it is a forward-iteration.

– Stable order: the order of iterated elements won't change in between full iterations as long as the underlying collection wasn't modified.

– In solid words iterability allows programming concise forward loops with an unknown end condition.

● The implicit end condition is "no more elements to iterate".

● Now let's understand how iterability as a feature of collections can help us.

// C#'s foreach loop is iterator driven. Mind that no index is required!ArrayList numbers = new ArrayList{1, 2, 3, 4, 5};foreach (int element in numbers) {

Console.WriteLine(element);}

// C#ArrayList numbers = new ArrayList{1, 2, 3, 4, 5};for (int i = 0; i < numbers.length; ++i) {

Console.WriteLine(numbers[i]);}

Iterability – The least common Denominator of Collections

Page 11: (3) collections algorithms

11

● "Iterator" is one of the easiest (behavioral) design patterns. The concept works like this:– Iterator provides a way to support iteration w/o unleashing the real structure of a collection.

● E.g. no indexed collection is required and no length of the collection must be specified or known (unknown end condition).

– The aim of iterator is to separate the collection from the element access.

– Usually the elements visited during iteration are called items. Hence we'll use both terms (elements and items) interchangeably.

– Not all items of a collection are fetched, but only the current item.

● How does this concept help us?– One benefit is that we have a guarantee that all collections can be iterated, e.g. via foreach loops (e.g. in .Net).

– The concept is like a contract: all collections support iterability.

● Good! But how do collections support iterability? How do collections "publish" the feature of "being iterable"?– We have to define the concept of being "iterable", in other words we've to define iterable behavior...

● Now we'll talk about the way Java's and .Net's modern oo-technologies handle this "abstraction of behaviors".– => Let's discuss Java and C#/.Net's interfaces.

Iterability – The Iterator Concept

Page 12: (3) collections algorithms

12

● Iterability can be abstracted: collections providing iterability implement a certain interface.– An interface is a type that only declares a set of unimplemented methods, furthermore it doesn't provide any data members (fields).

– An interface only defines a behavior that implementors of that interface have to adopt.

● In .Net iterable collections implement the interface IEnumerable:

– And IEnumerable.GetEnumerator() returns another object implementing the interface IEnumerator:

– And to drive iterability home the type ArrayList does implement IEnumerable, because it is an iterable collection:

● Now its time to see IEnumerable in action...

public interface IEnumerable {IEnumerator GetEnumerator();

}

public interface IEnumerator {object Current { get; }bool MoveNext();void Reset();

}

// (simplified)public class ArrayList : IEnumerable {

public IEnumerator GetEnumerator() { /* pass */ }}

Interface-based Iterability Support in .Net – Part I

Page 13: (3) collections algorithms

13

● Following C# foreach loop ...

● … will really be transformed into code working like this:

● To drive this point home:– In .Net, iterable collections, which are basically all collections of the .Net framework, implement the interface IEnumerable.

● Java collections implement the interface Iterable, Cocoa's collections implement the protocol NSFastEnumeration.

– Modern collection APIs express/publish their behavioral aspects via interfaces.

– We'll revisit this aspect when we discuss further collection-relevant topics.

● IEnumerable is the basis for .Net's LINQ.

// Iterate over an array with foreach:foreach (int item in numbers) {

Console.WriteLine(item);}

// Virtually following iteration is executed for foreach via the iterator (simplified):IEnumerator items = numbers.GetEnumerator();while (items.MoveNext()) {

Console.WriteLine(items.Current);}

Interface-based Iterability Support in .Net – Part II

// We have following list of ints:ArrayList numbers = new ArrayList{1, 2, 3, 4, 5};

Page 14: (3) collections algorithms

14

● The iterability of linked list is a special topic:– Due to the linked structure, forward iteration is a very efficient operation. (And also backward iteration for doubly linked lists.)

– We can of course use an iterator to hide us from the details of the linked structure.

● Java supports bidirectional iterators that can move to next as well as previous items with the ListIterator:

– Inserting/removing items from a linked list are expensive operations in general (e.g. from an index), but they're cheap during iteration:

● Mind that we could have used pattern matching in an fp language for iteration (which is implemented as recursion).

Node currentNode = citiesToVisit.first;while (null != node) {

// do something with currentNode.item currentNode = currentNode.next;}

ListIterator listIterator = citiesToVisit.listIterator(); // Get a bidirectional iterator from the ArrayList.Object value1 = listIterator.next(); // value1 = "London"Object value2 = listIterator.next(); // value2 = "Paris"Object value3 = listIterator.previous(); // value3 = "Paris"; a switch of the direction will return the last item as previous/next item.Object value4 = listIterator.previous(); // value4 = "London"

for (Object item : citiesToVisit) {// do something with currentNode.item

}

// Clear the whole LinkedList via a ListIterator:while (listIterator.hasNext()) {

listIterator.next(); // advances the iteratorlistIterator.remove(); // removes the item

}

Iterability – With linked Lists

citiesToVisit "London" "Paris" "Frankfurt" "Oslo"

nullnull lastfirst

Page 15: (3) collections algorithms

15

● General benefits of iterators:– We can use them, if we don't know the length of the collection we're iterating.

– Iterators are good to avoid off-by-one errors, because there is no index.

● .Net/C# and Java's implementation of iterator are similar:– E.g.: Java: Iterable<T> needs to be implemented by collections to support Java's for each loop. (This is different for Java's arrays).

– During iteration, the collection as well as the current item are generally mutation guarded:

● Java and C++ support bidirectional iterators that can move to next as well as previous positions (e.g. Java's ListIterator).– The ListIterator is special in Java, as it can also modify the collection being iterated.

– Let's understand why Java's ListIterator is useful when used together with LinkedLists...

// C#: Fail fast.ArrayList numbers = new ArrayList{1, 2, 3, 4, 5, 6, 7, 8, 9};foreach (int item in numbers) {// Will throw InvalidOperationException, because numbers was modified// concurrently during the iteration.

numbers.Add(10);}

// Java: Fail fast.ArrayList numbers = new ArrayList(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9));for (Object item : numbers) {// Will throw ConcurrentModificationException, because numbers was modified// concurrently during the iteration.

numbers.add(10);}

Iterability – The Iterator Concept in other Frameworks – Part I

Page 16: (3) collections algorithms

16

● In C/C++ iteration of arrays is possible via pointer arithmetics.– Arrays have a strict feature in C/C++: elements reside on a contigouos block of memory. Thererfor pointer arithmetics work nicely.

– We can iterate from a pointer to the first element of the array to the array's past-the-end-pointer.

● C++ adopts the operators that have been used for pointer arithmetics to implement iterability over C++ collections.– In C++ the collection APIs are part of the standard template library (STL). The STL has another name for collections: containers.

– The most important indexed collection in the STL is the std::vector (<vector>).

– std::vector is a generic collection (in opposite to an object-based collection). That means we have to specify the element type.

– The STL introduces the UDT std::iterator that abstracts iterability of STL collections.

– std::iterator overloads several operators to use instances like pointers to implement pointer arithmetics (++, */->, ==/!=).

● Some STL types support other iterators with special features: forward, input, output, bi-directional, random access and const iterators.

int numbers[] = {1, 2, 3, 4, 5}; // six elementsfor (int* iter = numbers; iter != numbers + 5; ++iter) {

std::cout<<*iter<<std::endl;}

std::vector<int> numbers; // numbers is a vector containing int-elementsnumbers.push_back(1); numbers.push_back(2); numbers.push_back(3); // Add five ints.numbers.push_back(4); numbers.push_back(5);for (std::vector<int>::iterator iter = numbers.begin(); iter != numbers.end(); ++iter) {

std::cout<<*iter<<std::endl;}

3 4 5 ? numbers21

numbers numbers + 5

C++11// In C++11 it is much simpler to iterate containers:// Initialize numbers with an initializer list:std::vector<int> numbers{1, 2, 3, 4, 5};// Iterate over numbers with a range-based for loop:for (int item : numbers) {

std::cout<<item<<std::endl;}

Iterability – The Iterator Concept in other Frameworks – Part II

Page 17: (3) collections algorithms

17

● JavaScript is a special topic: It supports a loop looking like an iterator loop (foreach), but isn't one! There are two problems:– (1) The current item in a loop is the index of that item in the iterated list, not the item itself:

– (2) The order of items yielded by the iteration is undefined! (The iterator pattern doesn't mandate order, but most coders assume it.)

● => Avoid using JavaScript's "foreach" loop! Better use good old for loops!– for loops are the most general way to iterate in JavaScript.

● Only for loops can be used in many cases, e.g. when iterating the result of document.querySelectorAll().

– (JavaScript's "foreach" allows to enumerate the values of all the properties of an object, or to get the count of an object's properties.)

var names = ["Mona", "Lisa"];for (var item in names) {

console.log("item: "+item);}//> item: 0 // Often unexpected!//> item: 1

for (var i in names) { // Use the "item" as index: console.log("item: "+names[i]);

}//> item: Mona // Aha!//> item: Lisa

Iterability – The Iterator Concept in other Frameworks – Part III

Page 18: (3) collections algorithms

18

● Apart from iterability we can abstract more aspects, e.g. that a C# type is a collection per se.– All collections seem to have the property Count in common, e.g. in ArrayLists:

– From this consideration we can derive .Net's common interface that all collections have to implement: ICollection.

● Apart from iterability we can also abstract indexability of indexed collections.– E.g. for ArrayLists the indexer can be extracted (among other methods) as an important feature of indexed collections:

– From this consideration we can derive .Net's common interface that all indexable collections have to implement: IList.

ArrayList- elementData : Object[]+ Count : int

<<interface>>

ICollection+ Count : int

ArrayList- elementData : Object[]+ Count : int

Other members hidden.

<<interface>>

IList+ this[int index] : Object

ArrayList- elementData : Object[]+ this[int index] : Object+ Count : int

Other members hidden.

<<interface>>

ICollection

ArrayList- elementData : Object[]+ this[int index] : Object

Other members hidden.

<<interface>>

ListArrayList

LinkedList

Java – the List interface

Interface-based Design of modern Collection APIs – Part I

Other members hidden.

Page 19: (3) collections algorithms

19

● That's all well and good, but what have we gained having interfaces?

● The answer is that interfaces allow abstracting usage from implementation. Let's discuss following perspectives:– Collection types (e.g. classes) that share the same behavior implement the same interface, but have different implementations.

– A collection can expose different behaviors at the the same time, i.e. it can implement multiple interfaces.

– Here an example:

● This example shows...

– … that ArrayList and int[] are ICollections

ICollection aCollection; // Create two collections: an ArrayList and an int-array:ArrayList listOfNumbers = new ArrayList{20, 13, 45, 60};int[] arrayOfNumbers = new []{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};// ArrayList implements ICollection, implicit conversion is allowed:aCollection = listOfNumbers;Console.WriteLine(aCollection.Count);// Arrays also implement ICollection, implicit conversion is allowed:aCollection = arrayOfNumbers; // Implicit conversion! Console.WriteLine(aCollection.Count);

public static void OutputLength(ICollection anyCollection) {Console.WriteLine(anyCollection.Count);

}// Both calls work, because ArrayList and int-array implement ICollection.OutputLength(listOfNumbers); // Implicit conversion!OutputLength(arrayOfNumbers);

Interface-based Design of modern Collection APIs – Part II

Page 20: (3) collections algorithms

20

● The method OutputLength() unleashes following views:– Callers: "I can pass any type that implements ICollection to OutputLength()!"

– Callee (OutputLength()): "Just give me an ICollection object, the concrete type doesn't matter to me!"

– The conclusion to be drawn from both views: interfaces separate interfaces of types from their implementations.

● A class diagram shows the situation more clearly:

– OutputLength() only depends on the interface ICollection, but not on any of its concrete implementations (ArrayList or int[])!

– This fact makes OutputLength() a very valuable method, because it can handle any present and future ICollection types.

– When concrete types only depend on abstracter types, we call this the dependency inversion principle (DIP).

public static void OutputLength(ICollection anyCollection) {Console.WriteLine(anyCollection.Count);

}OutputLength(listOfNumbers);OutputLength(arrayOfNumbers);

<<interface>>

ICollection+ Count : int

MyAlgorithms+ OutputLength(anyCollection : ICollection)

ArrayList+ Count : int

int[] (Array)+ Count : int

Interface-based Design of modern Collection APIs – Part III

Page 21: (3) collections algorithms

21

● (1) Decoupling: The DIP yields the benefit of separation of interface from implementation.– And the separation of interface from implementation provides so called decoupling. E.g. for ICollections or USB-interfaces:

● (2) Static and dynamic type: For example collections can implement the same interface, but have different implementations.– An interface defines a concept and a class implements a concept.

– This idea is another exploitation of the static vs. dynamic type of objects.

– This idea is also another incarnation of the substitution principle of objects.

<<interface>>

ICollectionArrayList

MyAlgorithmsint[] (Array)

MyAlgorithms "knows" nothing about ArrayList or int[]!The interface ICollection perfectly decouples MyAlgorithmsfrom these types!

<<interface>>

USBMouse

MacBook

Printer MacBook "knows" nothing about Mouse or Printer!The USB-interface perfectly decouples MacBookfrom these types!

public static void OutputLength(ICollection anyCollection) {// The static type of anyCollection is ICollection, the// dynamic type could be int[] or ArrayList or any other// implementation of ICollection.

}// Call with static type ArrayList:OutputLength(new ArrayList{20, 13, 45, 60});// Call with static type int[]:OutputLength(new []{0, 1, 2, 3, 4, 5, 6, 7, 8, 9});

The Dependency Inversion Principle (DIP) – Part I

Page 22: (3) collections algorithms

22

● DIP isn't only applicable for collections, in fact it is a basis of many design patterns to create maintainable oo architectures.– The reusability of code increases, because all objects of more concrete types can be used (int[], ArrayList etc.).

– We only need to know the methods of the abstract type's interface basically (only those of ICollection).

● The application of DIP can be reduced to three rules:– Only expose abstract types in public interfaces.

– Only use abstract types when objects are declared (supertypes or interfaces).

– Concrete types are only used to create instances with the new operator and using a ctor.

public class WithDependencyInversion {private ICollection _elements = new ArrayList();

public static void OutputLength(ICollection anyCollection) {Console.WriteLine(anyCollection.Count);

}

public ICollection GetElements() {return _elements;

}}

public class NoDependencyInversion {private ArrayList _elements = new ArrayList();

public static void OutputLength(ArrayList anyCollection) {Console.WriteLine(anyCollection.Count);

}

public ArrayList GetElements() {return _elements;

}}

The Dependency Inversion Principle (DIP) – Part II

Page 23: (3) collections algorithms

23

● Let's catch a glimpse of why DIP is important for, e.g., .Net's collection framework: we have downright very good reusability.– E.g. ArrayList provides a ctor and the method AddRange() accepting ICollections.

– And were an ICollection is accepted any type implementing ICollection can be used, so for these methods:

● We're going to encounter DIP in many more places on our route through modern collection frameworks.

● In future lectures and examples we'll strive to using DIP wherever applicable.– This goes hand in hand with learning the collection frameworks along with its abstraction hierarchies.

int[] numbers = {1, 2, 3, 4, 5};

// Construct an ArrayList with another ICollection (int[]):ArrayList mergedNumbers = new ArrayList(numbers);

ArrayList moreNumbers = new ArrayList{6, 7, 8, 9, 10};

// Add yet another ICollection (ArrayList) to mergedNumbers:mergedNumbers.AddRange(moreNumbers);

public class ArrayList { // (members hidden)

public ArrayList(ICollection c) {/* pass */

}

public virtual void AddRange(ICollection c) {/* pass */

}}

The Dependency Inversion Principle (DIP) – Part III

Page 24: (3) collections algorithms

24

Thank you!