Sorting. RHS – SOC 2 Sorting Searching in sorted data is much faster (O(log(n)), than searching in...


RHS – SOC 2

Sorting

• Searching in sorted data is much faster (O(log(n))) than searching in unsorted data (O(n)).

• Being able to sort data efficiently is therefore an important ability

• But how fast can we sort data…?


Selection sort

• A very simple algorithm for sorting an array of n integers works like this:
  – Search the array from element 0 to element (n-1), to find the smallest element
  – If the smallest element is element i, then swap element 0 and element i
  – Now repeat the process from element 1 to element (n-1)
  – …and so on…
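The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code from the course project; the class name SelectionSortSketch and the sample data are assumptions made for the example.

```java
import java.util.Arrays;

public class SelectionSortSketch {

    // Sorts the array in place, in ascending order.
    public static void selectionSort(int[] a) {
        for (int start = 0; start < a.length - 1; start++) {
            // Search a[start..n-1] for the smallest element.
            int smallest = start;
            for (int i = start + 1; i < a.length; i++) {
                if (a[i] < a[smallest]) {
                    smallest = i;
                }
            }
            // Swap the smallest element into position 'start'.
            int tmp = a[start];
            a[start] = a[smallest];
            a[smallest] = tmp;
        }
    }

    public static void main(String[] args) {
        // Illustrative data only (an assumption for this sketch).
        int[] data = {10, 56, 26, 4, 82, 76, 34, 18, 60, 40};
        selectionSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

After pass number k, the first k elements are in their final positions, which is why each new scan can start one element further to the right.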

[Figure: step-by-step animation of selection sort on an array of ten integers containing 4, 10, 18, 26, 34, 40, 56, 60, 76 and 82]


Selection sort

• How fast is selection sort?

• We scan for the smallest element n times
  – In scan 1, we examine n elements
  – In scan 2, we examine (n-1) elements
  – …and so on

• A total of n + (n-1) + (n-2) + … + 2 + 1 examinations

• The sum is n(n + 1)/2
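The count can be checked empirically. The sketch below instruments a selection sort so that every element looked at during a scan is counted, and compares the total with n(n + 1)/2. The class and method names are hypothetical, and counting the first element of each scan as an examination is an assumption chosen to match the tally above.

```java
public class ExaminationCount {

    // Sorts a in place with selection sort and returns the number of
    // element examinations: scan k looks at every remaining element once.
    public static long sortAndCount(int[] a) {
        long examined = 0;
        for (int start = 0; start < a.length; start++) {
            int smallest = start;
            examined++;                          // a[start] is examined
            for (int i = start + 1; i < a.length; i++) {
                examined++;                      // a[i] is examined
                if (a[i] < a[smallest]) {
                    smallest = i;
                }
            }
            int tmp = a[start];
            a[start] = a[smallest];
            a[smallest] = tmp;
        }
        return examined;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 5, 20, 50, 200}) {
            int[] data = new int[n];
            for (int i = 0; i < n; i++) {
                data[i] = n - i;                 // descending input
            }
            long counted = sortAndCount(data);
            long formula = (long) n * (n + 1) / 2;
            System.out.println(n + ": counted " + counted + ", formula " + formula);
        }
    }
}
```

The count depends only on n, not on the contents of the array, since every scan always runs to the end.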


Selection sort

• The total number of examinations is equal to n(n + 1)/2 = (n^2 + n)/2

• The run-time complexity of selection sort is therefore O(n^2)

• O(n^2) grows fairly fast…


Selection sort

n      n^2
2      4
5      25
20     400
50     2500
200    40000


Exercise 1

• Download the project selectionSortInJava from the PSL website

• Examine the code – see how selection sort is implemented in Java

• The project contains two helper classes: ArrayUtil (generates a random array of integers) and StopWatch (can measure the time needed to execute some code). Using these classes, the program measures how long it takes to sort an array using selection sort

• Try to run the program with various array sizes. For each run, write down the array size and the elapsed time. Make sure to try some array sizes that take several seconds to complete

• Enter the data into an Excel spreadsheet, plot a curve from the data, and see how the run time behaves when the array size increases
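A measurement of this kind could look roughly like the sketch below. The project's ArrayUtil and StopWatch classes are not reproduced here, so java.util.Random and System.nanoTime stand in for them; the class name, method names, seed and array sizes are all assumptions made for illustration.

```java
import java.util.Random;

public class SelectionSortTimer {

    public static void selectionSort(int[] a) {
        for (int start = 0; start < a.length - 1; start++) {
            int smallest = start;
            for (int i = start + 1; i < a.length; i++) {
                if (a[i] < a[smallest]) {
                    smallest = i;
                }
            }
            int tmp = a[start];
            a[start] = a[smallest];
            a[smallest] = tmp;
        }
    }

    // Stand-in for ArrayUtil: an array of n pseudo-random integers.
    public static int[] randomArray(int n, long seed) {
        Random rng = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rng.nextInt(1_000_000);
        }
        return a;
    }

    // Stand-in for StopWatch: elapsed time in milliseconds for one sort.
    public static long timeSortMillis(int n) {
        int[] a = randomArray(n, 42L);
        long start = System.nanoTime();
        selectionSort(a);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Doubling n each time makes the growth pattern easy to see.
        for (int n = 10_000; n <= 80_000; n *= 2) {
            System.out.println(n + "\t" + timeSortMillis(n) + " ms");
        }
    }
}
```

The printed pairs of array size and elapsed time can be pasted directly into a spreadsheet. Actual timings vary from machine to machine and from run to run.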


Merge sort

• Selection sort is conceptually very simple, but not very efficient…

• A different algorithm for sorting is merge sort

• Merge sort is an example of a divide-and-conquer algorithm

• It is also a recursive algorithm


Merge sort

• The principle in merge sort is to merge two already sorted arrays:

[Figure: two sorted arrays being merged, element by element, into one sorted array]


Merge sort

• Merging two sorted arrays is pretty simple, but how did the arrays get sorted…?

• Recursion to the rescue!

• Sort the two arrays simply by applying merge sort to them…

• If the array has length 1 (or 0), it is already sorted


Merge sort

public void sort() // Sort the array a
{
    if (a.length <= 1) return;                        // Base case
    int[] a1 = new int[a.length / 2];                 // Create two new
    int[] a2 = new int[a.length - a1.length];         // arrays to sort
    System.arraycopy(a, 0, a1, 0, a1.length);         // Copy data to
    System.arraycopy(a, a1.length, a2, 0, a2.length); // the new arrays
    MergeSorter ms1 = new MergeSorter(a1);            // Create two new
    MergeSorter ms2 = new MergeSorter(a2);            // sorter objects
    ms1.sort();                                       // Sort the two
    ms2.sort();                                       // new arrays
    merge(a1, a2);                                    // Merge the sorted halves back into a
}


Merge sort

• All that is left is the method for merging two arrays

• A little tedious, but straightforward…

• The time needed to merge two arrays is proportional to the total length of the arrays, i.e. to n

• We can now analyse the run-time complexity of merge sort
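One possible merge method is sketched below, embedded in a small MergeSorter class so that it can be run. The field a, the constructor and the use of Arrays.copyOfRange are assumptions made for this sketch; the version in the course project may differ in detail.

```java
import java.util.Arrays;

public class MergeSorter {
    private final int[] a;   // the array to sort (assumed field)

    public MergeSorter(int[] anArray) {
        a = anArray;
    }

    // Sorts the array a in place.
    public void sort() {
        if (a.length <= 1) return;                        // base case
        int[] a1 = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] a2 = Arrays.copyOfRange(a, a.length / 2, a.length);
        new MergeSorter(a1).sort();
        new MergeSorter(a2).sort();
        merge(a1, a2);
    }

    // Merges two sorted arrays into a; requires a.length == a1.length + a2.length.
    private void merge(int[] a1, int[] a2) {
        int i1 = 0;   // next unread element of a1
        int i2 = 0;   // next unread element of a2
        int j = 0;    // next free position in a
        // Repeatedly move the smaller of the two front elements.
        while (i1 < a1.length && i2 < a2.length) {
            if (a1[i1] <= a2[i2]) {
                a[j++] = a1[i1++];
            } else {
                a[j++] = a2[i2++];
            }
        }
        // One array is exhausted; copy what is left of the other.
        while (i1 < a1.length) a[j++] = a1[i1++];
        while (i2 < a2.length) a[j++] = a2[i2++];
    }

    public static void main(String[] args) {
        int[] data = {10, 56, 26, 4, 82, 76, 34, 18, 60, 40};  // illustrative data
        new MergeSorter(data).sort();
        System.out.println(Arrays.toString(data));
    }
}
```

Every element of a1 and a2 is copied into a exactly once, which is why the merge step takes time proportional to n.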


Merge sort

• Merge sort of an array of length n requires
  – Two merge sorts of arrays of length n/2
  – Merging two arrays of length n/2

• The running time T(n) then becomes:

T(n) = 2×T(n/2) + n


Merge sort

• If we re-insert the expression for T(n) into itself m times, we get

T(n) = 2^m × T(n/2^m) + mn

• If we choose m such that n = 2^m, i.e. m = log2(n), and take T(1) = 1, we get

T(n) = n×T(1) + mn = n + n×log2(n)
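The closed form can be sanity-checked numerically for powers of two. The sketch below evaluates the recurrence directly, taking T(1) = 1 as an assumption, and compares it with n + n×log2(n). The class and method names are hypothetical.

```java
public class MergeSortRecurrence {

    // Evaluates T(n) = 2*T(n/2) + n with T(1) = 1; n must be a power of two.
    public static long t(long n) {
        if (n == 1) {
            return 1;
        }
        return 2 * t(n / 2) + n;
    }

    // Closed form n + n*log2(n); n must be a power of two.
    public static long closedForm(long n) {
        long m = Long.numberOfTrailingZeros(n);   // m = log2(n) for n = 2^m
        return n + n * m;
    }

    public static void main(String[] args) {
        for (int m = 0; m <= 20; m++) {
            long n = 1L << m;
            System.out.println("n=" + n + "  recurrence=" + t(n)
                    + "  closed form=" + closedForm(n));
        }
    }
}
```

For example, T(8) = 2×T(4) + 8 = 2×12 + 8 = 32, and the closed form gives 8 + 8×3 = 32.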


Merge sort

• The run-time complexity of merge sort is therefore O(n log(n))

• Many other sorting algorithms have this run-time complexity

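• For sorting based on comparing elements, this is the fastest we can sort; only under very special circumstances (e.g. keys drawn from a small, known range) can we do better

• Much better than O(n^2)…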


Merge sort

n      n log(n)    n^2
2      2           4
5      12          25
20     86          400
50     282         2500
200    1529        40000


Exercise 2

• Download the project mergeSortInJava from the PSL website

• Examine the code – see how merge sort is implemented in Java (the project contains the same helper classes as the selectionSortInJava project – ArrayUtil and StopWatch)

• Try to run the program with various array sizes. For each run, write down the array size and the elapsed time. Make sure to try some array sizes that take several seconds to complete

• Enter the data into an Excel spreadsheet, plot a curve from the data, and see how the run time behaves when the array size increases

• Compare the results with the results obtained for selection sort – when do the curves for run time cross each other (if at all)?


Sorting in practice

• It does matter which sorting algorithm you use…

• …but do I have to code sorting algorithms myself?

• No! You can – and should – use sorting algorithms found in the Java library


Sorting in practice

• Sorting an array:

Car[] cars = new Car[n];
// ...fill the array with Car objects before sorting...
Arrays.sort(cars);
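As a runnable illustration that does not depend on the Car class, the sketch below applies Arrays.sort to an int array and a String array. The class name, the helper method and the sample values are assumptions made for the example.

```java
import java.util.Arrays;

public class ArraySortExample {

    // Returns a sorted copy, leaving the caller's array untouched.
    public static String[] sortedCopy(String[] input) {
        String[] copy = input.clone();
        Arrays.sort(copy);              // Strings sort in their natural (lexicographic) order
        return copy;
    }

    public static void main(String[] args) {
        int[] numbers = {10, 56, 26, 4, 82};
        Arrays.sort(numbers);           // sorts in place, ascending
        System.out.println(Arrays.toString(numbers));   // [4, 10, 26, 56, 82]

        String[] names = {"Volvo", "Audi", "Saab"};
        System.out.println(Arrays.toString(sortedCopy(names)));   // [Audi, Saab, Volvo]
    }
}
```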


Sorting in practice

• Sorting an arraylist:

ArrayList<Car> cars = new ArrayList<Car>();
// ...add Car objects to the list before sorting...
Collections.sort(cars);
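A self-contained variant of the same call, using Integer elements instead of Car objects, might look like this. The class name, the helper method and the sample values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListSortExample {

    // Returns a new sorted list; the input list is not modified.
    public static List<Integer> sorted(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy);         // sorts the copy in place, ascending
        return copy;
    }

    public static void main(String[] args) {
        ArrayList<Integer> numbers = new ArrayList<>();
        numbers.add(26);
        numbers.add(4);
        numbers.add(82);
        numbers.add(10);
        System.out.println(sorted(numbers));   // [4, 10, 26, 82]
    }
}
```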


Sorting in practice

• Why not code my own sorting algorithms?

• Sorting algorithms in the Java library are better than anything you can produce…
  – Carefully debugged
  – Highly optimised
  – Used by thousands

• You cannot beat them


Sorting in practice

• In order to sort an array of data, we need to be able to compare the elements

• "Larger than" should make sense for the elements in the array

• Easy for numeric types (>)

• What about types we define ourselves…?


Sorting in practice

• If a class T implements the Comparable interface, objects of type T can be compared:

public interface Comparable<T>
{
    int compareTo(T other);
}


Sorting in practice

• In the interface definition, T is a type parameter

• It is used in the same way as the type parameter of an ArrayList

• ArrayList<Car> : an arraylist holding elements of type Car


Sorting in practice

• In order for the sorting algorithms to work properly, an implementation of the interface must obey these rules:

• The call a.compareTo(b) must return:
  – A negative number if a < b
  – Zero if a = b
  – A positive number if a > b


Sorting in practice

• The implementation of compareTo must define a so-called total ordering:
  – Antisymmetric: If a.compareTo(b) ≤ 0, then b.compareTo(a) ≥ 0
  – Reflexive: a.compareTo(a) = 0
  – Transitive: If a.compareTo(b) ≤ 0 and b.compareTo(c) ≤ 0, then a.compareTo(c) ≤ 0
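The three rules can be spot-checked on a sample of objects. The sketch below tries every combination of elements in a list and reports whether any rule is violated. The class OrderingCheck, the method obeysRules and the deliberately broken class AlwaysSmaller are hypothetical names introduced for this illustration. Passing the check on a sample does not prove that compareTo is correct for all values.

```java
import java.util.Arrays;
import java.util.List;

public class OrderingCheck {

    // Returns true if compareTo obeys the three rules for every
    // combination of elements in the sample.
    public static <T extends Comparable<T>> boolean obeysRules(List<T> sample) {
        for (T a : sample) {
            if (a.compareTo(a) != 0) {
                return false;                           // not reflexive
            }
            for (T b : sample) {
                boolean aLeB = a.compareTo(b) <= 0;
                if (aLeB && b.compareTo(a) < 0) {
                    return false;                       // not antisymmetric
                }
                for (T c : sample) {
                    if (aLeB && b.compareTo(c) <= 0 && a.compareTo(c) > 0) {
                        return false;                   // not transitive
                    }
                }
            }
        }
        return true;
    }

    // Deliberately broken: claims to be smaller than everything, itself included.
    public static class AlwaysSmaller implements Comparable<AlwaysSmaller> {
        @Override
        public int compareTo(AlwaysSmaller other) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(obeysRules(Arrays.asList(3, 1, 2)));          // true
        System.out.println(obeysRules(Arrays.asList("b", "a", "c")));    // true
        System.out.println(obeysRules(
                Arrays.asList(new AlwaysSmaller(), new AlwaysSmaller()))); // false
    }
}
```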


Sorting in practice

public class Car implements Comparable<Car>
{
    ...
    // Here using weight as ordering criterion
    public int compareTo(Car other)
    {
        if (getWeight() < other.getWeight()) return -1;
        if (getWeight() == other.getWeight()) return 0;
        return 1;
    }
    ...
}
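Filled out into something runnable, the class might look like the sketch below. The fields name and weight, the constructor, the getName method and the sample cars are assumptions made for illustration; only compareTo and the use of getWeight come from the snippet above.

```java
import java.util.Arrays;

public class Car implements Comparable<Car> {
    private final String name;     // assumed field, used only for printing
    private final double weight;   // assumed to be a double

    public Car(String name, double weight) {
        this.name = name;
        this.weight = weight;
    }

    public String getName() {
        return name;
    }

    public double getWeight() {
        return weight;
    }

    // Orders cars by weight, lightest first.
    @Override
    public int compareTo(Car other) {
        if (getWeight() < other.getWeight()) return -1;
        if (getWeight() == other.getWeight()) return 0;
        return 1;
    }

    public static void main(String[] args) {
        Car[] cars = {
            new Car("Van", 2100),
            new Car("Mini", 1150),
            new Car("Sedan", 1500)
        };
        Arrays.sort(cars);   // uses Car.compareTo to order the elements
        for (Car c : cars) {
            System.out.println(c.getName() + " " + c.getWeight());
        }
    }
}
```

Because Car implements Comparable<Car>, Arrays.sort and Collections.sort can order Car objects without any further help.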


Exercises

• Programming P14.12, P14.14

• For P14.14, read about the Comparator interface in Advanced Topic 14.5, pages 657-658