Sorting


Sorting

15-211 Fundamental Data Structures and Algorithms

Klaus Sutner

February 17, 2004

Announcements

Homework 5 is out

Reading:

Chapter 8 in MAW

Quiz 1 available on Thursday

Introduction to Sorting

Boring …

Sorting is admittedly not very sexy; everybody already knows some algorithms, …

But: Good sorting algorithms are needed absolutely everywhere.

Sorting is fairly well understood theoretically.

Provides a good way to introduce some important ideas.

The Problem

We are given a sequence of items

a1 a2 a3 … an-1 an

We want to rearrange them so that they are in non-decreasing order.

More precisely, we need a permutation f such that

af(1) ≤ af(2) ≤ af(3) ≤ … ≤ af(n-1) ≤ af(n).

A Constraint

Comparison Based Sorting

While we are rearranging the items, we will only use queries of the form

ai ≤ aj

or variants thereof (<, > and so forth).

Say What?

The important point here is that the algorithm can only make comparisons such as

if( a[i] < a[j] ) …

We are not allowed to look at pieces of the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare the most significant digits.

An Easy Upper Bound

Here is a simple idea to sort an array: a flip is a position in the array where two adjacent elements are out of order.

a[i] > a[i+1]

Let’s look for a flip and correct it by swapping the two elements.

A Prototype Algorithm

// FlipSort
while( there is a flip )
    pick one, fix it

Is this algorithm guaranteed to terminate?

If so, what can we say about its running time?

Is it correct, i.e., is the array sorted?

Termination

while( there is a flip )
    pick one, fix it

It’s tempting to do induction on the number of flips, but beware:

10 15 5 10   →   10 5 15 10

Fixing the flip (15, 5) increases the number of flips from one to two.

We need to talk about inversions instead.

Flips and Inversions

24 47 13 99 105 222

Here (47, 13) is a flip: adjacent and out of order. (24, 13) is also an inversion, but not a flip, since the two elements are not adjacent.

Running Time

The total number of inversions is at most quadratic (no more than n(n-1)/2), and fixing a flip decreases the number of inversions by exactly one.

So we can sort in quadratic time if we can manage to find and fix a flip in constant time.

We need to organize the search somehow.

Probably should try to avoid recomputation.

Naïve sorting algorithms

Bubble Sort

Selection Sort

Insertion Sort (this one is actually important)

All are quadratic in the worst case and on average.

Bubble Sort

Scan through the array, fix flips as you go along. Repeat until array is sorted.

for( i = 2; i <= n; i++ )
    for( j = n; j >= i; j-- )
        if( A[j-1] > A[j] )
            swap A[j-1] and A[j];

Selection Sort

For k = n, n-1, … find the smallest element in the last k elements of the array and swap it to the front.

for( i = 1; i <= n-1; i++ )
    find A[j] minimal in A[i..n]
    swap with A[i]

Insertion Sort

Place the i-th element into its proper place in the already sorted list of the first i-1 elements.

for i = 2 to n do

order-insert a[i] in a[1:i-1]

Can be implemented nicely.

Insertion Sort

Using a sentinel.

for( i = 2; i <= n; i++ )
    x = A[i];
    A[0] = x;
    for( j = i; x < A[j-1]; j-- )
        A[j] = A[j-1];
    A[j] = x;

Insertion sort

105 47 13 99 30 222

47 105 13 99 30 222

13 47 105 99 30 222

13 47 99 105 30 222

13 30 47 99 105 222

At each step, the sorted sublist at the front of the array grows by one element.

How fast is insertion sort?

Takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with.

3 2 1 6 5 4 9 8 7 …

How long does it take to sort? (Each block of three contributes only three inversions, so there are O(n) inversions in total.)

Can we do better than O(n²)?

In the worst case?

In the average case?

Sorting in O(n log n)

O(n log n) turns out to be a Magic Wall: it is hard to reach, and exceedingly hard to break through.

In fact, it’s impossible in a sense to do better than O(n log n).

We already know that Heapsort will give us this bound:

- build the heap in linear time,
- destroy it in O(n log n).

Heapsort in practice

The average-case analysis for heapsort is somewhat complex.

In practice, heapsort consistently tends to use nearly n log n comparisons.

So, while the worst case is better than n², other algorithms sometimes work better.

Shellsort

Shellsort, like insertion sort, is based on swapping inverted pairs.

It achieves O(n^(4/3)) running time (with a suitable gap sequence).

[See your book for details.]

Shellsort

Example with sequence 3, 1.

105 47 13 99 30 222

99 47 13 105 30 222

99 30 13 105 47 222

99 30 13 105 47 222

30 99 13 105 47 222

30 13 99 105 47 222

...

Several inverted pairs fixed in one exchange.

Recursive Sorting

Recursive sorting

Intuitively, divide the problem into pieces and then recombine the results.

If array is length 1, then done.

If array is length N>1, then split in half and sort each half.

Then combine the results.

An example of a divide-and-conquer algorithm.

Divide-and-conquer

Divide-and-conquer

Why divide-and-conquer works

Suppose the amount of work required to divide and recombine is linear, that is, O(n).

Suppose also that solving a problem outright requires more than linear work.

Then each dividing step reduces the total work by more than a linear amount, while requiring only linear work to do so.

Divide-and-conquer is big

We will see several examples of divide-and-conquer in this course.

Recursive Sorting

If array is length 1, then done.

Otherwise, split into two smaller pieces.

Sort each piece.

Combine the sorted pieces.

Two Major Approaches

1. Make the split trivial, but perform some work when the pieces are combined: Merge Sort.

2. Work during the split, but then do nothing in the combination step: Quick Sort.

In either case, the overhead should be linear with small constants.

Analysis

The analysis is relatively easy if the two pieces have (approximately) the same size.

This is the case for Merge Sort, but not for Quick Sort.

Let’s ignore the second case for the time being.

Recurrence Equations

We need to deal with equations of the form

T(1) = 1
T(n) = 2 T(n/2) + f(n)

Here f(n) is the non-recursive overhead.

There are two recursive calls, each to a sub-instance of the same size n/2.

Of course, there are other cases to consider.

Recurrence Equations

A slight generalization is

T(1) = 1
T(n) = a T(n/b) + f(n)

Here f(n) is again the non-recursive overhead.

There are a recursive calls, each on a sub-instance of size n/b.

Recurrence Equations

Of course, we’re cheating:

T(1) = 1
T(n) = a T(n/b) + f(n)

Makes no sense unless b divides n.

Let’s just ignore this. In reality there are ceilings and floors and continuity arguments everywhere.

Mergesort

The Algorithm

Merging the two sorted parts here is responsible for the overhead.

merge( nil, B ) = B;
merge( A, nil ) = A;
merge( a A, b B ) =
    if( a <= b )
        prepend( merge( A, b B ), a )
    else
        prepend( merge( a A, B ), b )

The Algorithm

The main function.

List MergeSort( List L )
{
    if( length(L) <= 1 ) return L;
    A = first half of L;
    B = second half of L;
    return merge( MergeSort(A), MergeSort(B) );
}

Harsh Reality

In reality, the items are always given in an array.

The first and second half can be found by index arithmetic.


But Note …

We cannot perform the merge operation in place.

Rather, we need to have another array as scratch space.

The total space requirement for Merge Sort is

2n + O(log n)

Assuming the recursive implementation.

Running Time

Solving the recurrence equation for Merge Sort one can see that the running time is

O(n log n)

Since Merge Sort reads the data strictly sequentially it is sometimes useful when data reside on slow external media.

But overall it is no match for Quick Sort.

Quicksort

Quicksort

Quicksort was invented in 1960 by Tony Hoare.

Although it has O(N²) worst-case performance, on average it is O(N log N).

More importantly, it is the fastest known comparison-based sorting algorithm in practice.

Quicksort idea

Choose a pivot.

Quicksort idea

Choose a pivot.

Rearrange so that pivot is in the “right” spot.

Quicksort idea

Choose a pivot.

Rearrange so that pivot is in the “right” spot.

Recurse on each half and conquer!

Quicksort algorithm

If array A has 1 (or 0) elements, then done.

Choose a pivot element x from A.

Divide A-{x} into two arrays:

B = { y ∈ A | y ≤ x }

C = { y ∈ A | y ≥ x }

Quicksort arrays B and C.

Result is B+{x}+C.

Quicksort algorithm

105 47 13 17 30 222 5 19: pivot 19 yields 5 17 13 and 47 30 222 105.

5 17 13: pivot 13 yields 5 and 17.

47 30 222 105: pivot 47 yields 30 and 222 105.

222 105: sorts to 105 222.

Quicksort algorithm


In practice, insertion sort is used once the arrays get “small enough”.


Doing quicksort in place

85 24 63 50 17 31 96 45

Pick 50 as the pivot and swap it to the end:

85 24 63 45 17 31 96 50

Now scan: L moves right and stops at 85 (≥ pivot), R moves left and stops at 31 (≤ pivot). Swap the elements under L and R:

31 24 63 45 17 85 96 50

Doing quicksort in place

31 24 63 45 17 85 96 50

L stops at 63, R stops at 17. Swap again:

31 24 17 45 63 85 96 50

L stops at 63, R stops at 45: the pointers have crossed. Swap the pivot with the element under L:

31 24 17 45 50 85 96 63

The pivot 50 is now in its final position.

Quicksort is fast but hard to do

Quicksort, in the early 1960’s, was famous for being incorrectly implemented many times.

More about invariants next time.

Quicksort is very fast in practice.

Faster than mergesort because Quicksort can be done “in place”.

Informal analysis

If there are duplicate elements, then the algorithm does not specify which subarray, B or C, should get them. Ideally, they split down the middle.

Also, it is not specified how to choose the pivot. Ideally it would be the median value of the array, but this would be expensive to compute.

As a result, it is possible that Quicksort will show O(N²) behavior.

Worst-case behavior

105 47 13 17 30 222 5 19      pivot 5

47 13 17 30 222 19 105        pivot 13

47 105 17 30 222 19           pivot 17

47 105 19 30 222              pivot 19

…

Each partition strips off only the pivot, so the recursion is N levels deep.

Analysis of quicksort

Assume random pivot.

T(0) = 1

T(1) = 1

T(N) = T(i) + T(N-i-1) + cN, for N > 1

where i is the size of the left subarray.

Worst-case analysis

If the pivot is always the smallest element, then:

T(0) = 1
T(1) = 1
T(N) = T(0) + T(N-1) + cN ≈ T(N-1) + cN, for N > 1

which solves to O(N²).

See the book for details on this solution.

Best-case analysis

In the best case, the pivot is always the median element.

In that case, the splits are always “down the middle”.

Hence, same behavior as mergesort.

That is, O(N log N).

Average-case analysis

Consider the quicksort tree:

105 47 13 17 30 222 5 19: pivot 19 yields 5 17 13 and 47 30 222 105.

5 17 13: pivot 13 yields 5 and 17.

47 30 222 105: pivot 47 yields 30 and 222 105, which sorts to 105 222.

Average-case analysis

The time spent at each level of the tree is O(N).

So, on average, how many levels? That is, what is the expected height of the tree?

If on average there are O(log N) levels, then quicksort is O(N log N) on average.

Average-case analysis

We’ll answer this question next time…

Summary of quicksort

A fast sorting algorithm in practice.

Can be implemented in-place.

But it is O(N²) in the worst case.

Average-case performance?