CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with...

41
CS 137 Part 8 Merge Sort, Quick Sort, Binary Search

Transcript of CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with...

Page 1: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

CS 137 Part 8Merge Sort, Quick Sort, Binary Search

Page 2: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

This Week

• We’re going to see two more complicated sorting algorithmsthat will be our first introduction to O(n log n) sortingalgorithms.

• The first of which is Merge Sort.

• Basic Idea:

1. Divide array in half2. Sort each half recursively3. Merge the results

Page 4: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Merge Sort

void sort(int a[], int n) {

int *t = malloc(n*sizeof(a[0]));

assert(t);

merge_sort(a, t, n);

free(t);

}

int main (void){

int a[] = {-10,2,14,-7,11,38};

int n = sizeof(a)/ sizeof(a[0]);

sort(a,n);

for (int i = 0; i < n; i++) {

printf("%d, ", a[i]);

}

printf("\n");

return 0;

}

Page 5: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

void merge_sort (int a[], int t[], int n) {

if (n <= 1) return;

int middle = n / 2;

int *lower = a;

int *upper = a + middle;

merge_sort(lower , t, middle );

merge_sort(upper , t, n - middle );

int i = 0; // lower index

int j = middle; // upper index

int k = 0; // temp index

while (i < middle && j < n) {

if (a[i] <= a[j]) t[k++] = a[i++];

else t[k++] = a[j++];

}

while (i < middle) t[k++] = a[i++];

while (j < n) t[k++] = a[j++];

for (i = 0; i < n; i++) a[i] = t[i];

}

Page 6: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Runtime of Merge Sort

Page 7: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Analysis

How much work is done by each instance?

• Two function calls of O(1)

• Copy the left and right into t[] is O(k) where k is the size ofthe current instance

• Copy t[] into a[] which is also O(k).

• Therefore, each instance of a merge sort is O(k).

Page 8: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Continuing

So each instance is O(k) but how many instances are there at eachlevel?

• The number of bubbles per row is n/k . This follows since afterk = n/2` halves, we’ve created 2` bubbles and this is n/k .

• Thus, for each level, the total amount of work done isO(n/k · k) = O(n).

Page 9: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Wrapping Up

• Finally, how many levels are there?

• To answer this, we are looking for a number m such thatn2m = 1.

• Solving gives m = log2(n).

• Hence, the total time is O(n log n).

• This analysis applies for the best, worst and average cases!

Page 10: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Quick Sort

• Created by Tony Hoare in 1959 (or 1960)

• Basic Idea:

1. Pick a pivot element p in the array2. Partition the array into

< p p ≥ p

3. Recursively sort the two partitions.

• Key benefit: No temporary array!

Page 11: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Tricky Point

• How do we pick the pivot and partition?

• We will discuss two such ways, the Lomuto Partition andchoosing a median of medians.

• The Lomuto Partition is usually easier to implement butdoesn’t preform as well in the worst case.

• The median of medians enhancement vastly improves theworst case runtime

• Note that Hoare’s original partitioning scheme has similarruntimes to Lomuto’s but is slightly harder to implement sowe won’t do so.

Page 12: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Lomuto Partition

• Swaps left to right.

• Pivot is first element (Could by last element withmodifications below if desired).

• Have two index variables:

1. m which represents the last index in the partition < p2. i which represents the first index of the unpartitioned part.3. In other words

So elements with indices between 1 and m inclusive are < pand elements with indices

Page 13: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Lomuto Partition Continued

There are two cases:

• If x < p, then increment m and swap a[m] with a[i] thenincrement i

• If x ≥ p, then just increment i .

• End case: When i = n, then we swap a[0] and a[m] so thatthe partition is in the middle.

Let’s see this in action

Page 14: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Quick Sort Example

Page 15: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Quick Sort Main

#include <stdio.h>

#include <stdlib.h>

#include <assert.h>

// Code here is on next slides

int main(void){

int a[] = {-10,2,14,-7,11,38};

const int n = sizeof(a)/ sizeof(a[0]);

quick_sort(a,n);

for (int i = 0; i < n; i++) {

printf("%d, ", a[i]);

}

printf("\n");

return 0;

}

Page 16: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Swapping

void swap(int *a, int *b) {

int t = *a;

*a = *b;

*b = t;

}

Page 17: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Quick Sort

void quick_sort(int a[], int n) {

if (n <= 1) return;

int m = 0;

for (int i = 1; i < n; i++) {

if (a[i] < a[0]) {

m++;

swap(&a[m],&a[i]);

}

}

// put pivot between partitions

// i.e, swap a[0] and a[m]

swap(&a[0],&a[m]);

quick_sort(a, m);

quick_sort(a + m + 1, n - m - 1);

}

Page 18: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Time Complexity Analysis

Best Case:

• Ideally, each partition is split into (roughly) equal halves.

• Going down the recursion tree, we notice that the analysishere is almost identical to merge sort.

• There are log n levels and at each level we have O(n) work.

• Therefore, in the best case, the runtime is O(n log n).

Page 19: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Time Complexity Analysis

Average Case:

• Again this isn’t quite easy to quantify but we’ll suppose ateach level, the pivot is between the 25th and 75th percentiles.

• Then, in this case, the worst partition is 3:1.

• Now, there are log4/3 n levels (solve (3/4)mn = 1) and ateach level we still have O(n) work.

• Therefore, in the average case, the runtime is O(n log n).

Page 20: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Time Complexity Analysis

Worst Case:

• Worst case turns out to be very poor.

• What if the array were sorted (or reverse sorted)?

• Then the array is always partitioned into an array of size 1and an array of size len-1.

• At each level we still have O(n) work.

• Unfortunately, there are now n levels for a total runtime ofO(n2).

• To be slightly more accurate, note that at each level, we havea constant times

(n − 1) + (n − 2) + ... + 2 + 1 =n(n − 1)

2= O(n2)

amount of work.

Page 21: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Well if that’s the case...

• ... then why is quick sort one of more frequently used sortingalgorithms?

• One of the reasons why is that we have other schemes thatcan help ensure that even in the worst case, we get O(n log n)as our runtime.

• The key idea is picking our pivot intelligently.

• Turns out while this does theoretically reduces our worst caseruntime, often in practice this isn’t done because it increasesour overhead of choosing a pivot.

Page 22: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Idea

• What we’ll do is group the array into groups of five (a total ofn/5 such groups)

• Then we’ll take the median of each of those groups

• Now of these n/5 medians, repeat until you eventually findthe median of these medians. Call this m.

• Thus, with this m, we know that m is bigger than n/10medians and each of those medians was at least the size of 3other numbers (including itself) and so in total, we know thatm is bigger than 3n/10 of the numbers and similarly that it issmaller than 3n/10 of the numbers.

• In the worst case then, we split the array into an array with3n/10 elements and one with 7n/10 elements. The analysis issimilar to the average case from before.

• For a more formal proof take CS 341 (Algorithms)!

Page 23: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Picture for 25 elements

a b c d ef g h i jk l m n op q r s tu v w x y

Page 24: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Space Requirements for Quick Sort

• Partition uses only a constant number of variables (m, i andpossibly swapping)

• However recursive calls add to the stack.

• In the best case, we have 50/50 splits and in this case, thestack only has size log n

• In the worst case, we have 1 and n − 1 splits (where nchanges with the length of the array we are considering) andso we have a stack with size n.

Page 25: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Picture of the splits

Best Case

quicksort(a,1)...

...quicksort(a,n/4)

quicksort(a,n/2)

quicksort(a,n)

stack

Worst Case

quicksort(a,1)...

...quicksort(a,n-5)

quicksort(a,n-4)

quicksort(a,n-3)

quicksort(a,n-2)

quicksort(a,n-1)

quicksort(a,n)

stack

Page 26: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Optimizations

Tail Recursion

This is when the recursive call is the last action of the function

For example,

void quicksort(int a[], int n){

// Commands here ... no recursion

quicksort(a+m+1,n-m-1);

}

Page 27: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Tail Call Elimination

• When the recursive call returns, its return is simply passed on.

• Therefore, the activation record of the caller can be reused bythe callee and thus the stack doesn’t grow.

• This is guaranteed by some languages like Scheme

• It can be enabled in gcc by using the -O2 optimization flag.

• With this and sorting the smallest partition first, the stackdepth in the worst case space complexity of quick sort isO(log n).

Page 28: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Built in Sorting

• In practice, people almost never create their own sortingalgorithms as there is usually one built into the languagealready.

• For C, <stdlib.h> contains a library function called qsort

(Note: This isn’t necessarily quick sort!)

void qsort(void *base , size_t n, size_t ,size ,

int (* compare )( const void *a, const void *b));

Page 29: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Description of Parameters

• base is the beginning of the array

• n is the length of the array

• size is the size of a byte in the array

• *compare is a function pointer to a comparison criteria.

Page 30: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Compare Function

#include <stdlib.h>

int compare(const void *a, const void *b) {

int p = *(int *)a;

int q = *(int *)b;

if (p < q) return -1;

else if (p == q) return 0;

else return 1;

}

void sort(int a[], int n) {

qsort(a,n,sizeof(int),compare );

}

Page 31: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Alternatively

int compare(const void *a, const void *b) {

int p = *(int *)a;

int q = *(int *)b;

return (p < q) ? -1 : ((q > p) ? 1 : 0);

}

Page 32: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Binary Search

• Recall we started with linear searching which we said had aO(n) runtime.

• If we already have a sorted array, linear searching seems likewe’re not effectively using our information.

• We’ll discuss binary searching which will allow us to searchthrough a sorted array.

Page 33: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Binary Searching

Basic Idea

• Check the middle element a[m]

• If we match with value, return this element

• If a[m] > value then search the lower half

• If a[m] < value then search the upper half

• Stop when lo > hi where lo is the lower index and hi is theupper index (start and end in the next graphic)

Page 34: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Binary SearchLet’s find 6 in the following sorted array

https://puzzle.ics.hut.fi/ICS-A1120/2016/notes/

round-efficiency--binarysearch.html

Page 35: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Binary Search

#include <stdio.h>

int search(int a[], int n, int value) {

int lo = 0, hi = n-1;

while (hi >= lo) {

//Note int m = (hi +lo)/2 is equivalent

//but may overflow.

//Same with (hi+lo)>>1;

int m = lo+(hi -lo)/2;

if (a[m] == value) return m;

if (a[m] < value) lo = m+1;

if (a[m] > value) hi = m-1;

}

return -1;

}

Page 36: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Binary Searching

int main() {

int a[] = {-10,-7,0,2,11,14,38,42};

const int n = sizeof(a)/ sizeof(a[0]);

printf("%d\n", search(a,n ,10));

printf("%d\n", search(a,n ,11));

printf("%d\n", search(a,n, -100));

return 0;

}

Page 37: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Tracing

Let’s trace the code with value = 10, 11 and -100

Page 38: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Time Complexity of Binary Search

Worst Case (and Average Case) Analysis

• If we don’t find the element, then at each step we search onlyin half as many elements as before and searching takes onlyO(1) work.

• We can cut the array in half a total of O(log n) times.

• Thus, the total runtime is O(log n).

Best Case Analysis

• We find the element immediately and return taking only O(1)time.

Note this is extremely fast - even with 1 billion integers, we onlyneed at worst 30 probes.

Page 39: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Sorting Summary

Selection sort

• Find the smallest and swap with the first

• Runtime for best, average and worst case: O(n2)

Insertion sort

• Find where element i goes and shift the rest to make room forelement i .

• Runtime for best case is O(n) and for average and worst case:O(n2)

Page 40: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Sorting Summary

Merge sort

• Divide and conquer - Split in halves, recurse then merge.

• Runtime for best, average and worst case: O(n log n) but hasO(n) extra space.

Quick Sort

• Pick pivot, split elements into smaller and large than pivot,repeat.

• Runtime for best, average and worst case: O(n log n)

Page 41: CS 137 Part 8cbruni/CS137Resources/lectures/CS137Part… · Binary Search Recall we started with linear searching which we said had a O(n) runtime. If we already have a sorted array,

Searching Summary

Linear Search

• Scan one by one until found

• Runtime for best case is O(1) and for, average and worst case:O(n)

Binary Search

• Probe middle and repeat (requires sorted array)

• Runtime for best case is O(1) and for, average and worst case:O(log n)