Analysis of Insertion and Merge Sort


  • 8/8/2019 Analysis of Insertion and Merge Sort


    Sorting Algorithms

    In computer science and mathematics, a sorting algorithm is an algorithm that puts the elements of a list in a certain order.

    The most-used orders are numerical order and lexicographical order.

    In-Place Sorting Algorithms

    In computer science, an in-place algorithm is an algorithm that transforms a data structure using a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes. An algorithm that is not in-place is sometimes called not-in-place or out-of-place. A sorting algorithm is said to be in-place if it requires very little additional space besides the initial array holding the elements to be sorted.

    Normally, "very little" is taken to mean that sorting n elements requires O(log n) extra space.

    There are a number of sorting algorithms that can rearrange arrays into sorted order in-place, including:

    Bubble sort

    Selection sort

    Insertion sort

    Heapsort

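    Mergesort is not an in-place sorting algorithm: the standard array version needs Θ(n) auxiliary space for merging. Quicksort partitions the array in place but also uses recursion-stack space (O(log n) on average, O(n) in the worst case), so it counts as in-place only under the O(log n) definition above and only when the stack depth is kept logarithmic.

    Stable Sorting Algorithms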

    Stable sorting algorithms maintain the relative order of records with equal keys (i.e., sort key values).

    That is, a sorting algorithm is stable if, whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.

    Assume that the following pairs of numbers are to be sorted by their first component:

    (4, 1) (3, 7) (3, 1) (5, 6)

    In this case, two different results are possible, one which maintains the relative order of records with equal keys, and one which does not:

    (3, 7) (3, 1) (4, 1) (5, 6) (order maintained)

    (3, 1) (3, 7) (4, 1) (5, 6) (order changed)
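    The example above can be reproduced with a small sketch. It assumes a hypothetical record type `struct pair` and uses insertion sort, one of the stable algorithms, comparing on the first component only; the names are illustrative and not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/* A record with a sort key and a payload (illustrative names). */
struct pair {
    int first;   /* sort key */
    int second;  /* carried along, never compared */
};

/* Stable insertion sort on the first component only.
 * The strict '>' test never moves a record past another record
 * with an equal key, so equal keys keep their original order. */
void sort_by_first(struct pair *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t j = 1; j < n; j++) {
        struct pair key = a[j];
        size_t i = j;
        while (i > 0 && a[i - 1].first > key.first) {
            a[i] = a[i - 1];   /* shift larger record one slot right */
            i--;
        }
        a[i] = key;
    }
}
```

    Changing the comparison to `>=` would still sort by key but could swap (3, 7) and (3, 1), which is exactly the "order changed" outcome.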

    Sorting Algorithm    Stable

    Bubble Sort          Yes
    Selection Sort       No
    Insertion Sort       Yes
    Heapsort             No
    Quicksort            No
    Mergesort            Yes (when ties are taken from the left subarray first)

    Internal Sort

    The data to be sorted is all stored in the computer's main memory.

    External Sort

    Some of the data to be sorted might be stored in some external, slower device.

    In-Memory Sorting: used when main memory is larger than the file to be sorted.


    External Sorting: used when the file to be sorted is larger than the available memory.

    Loop Invariants

    Loop invariants provide us with a way to reason about loops, and can be used to verify that the loops we design are correct.

    Simply stated, a loop invariant is a relationship among variables in a program that is true when control enters a loop, remains true each time the body of the loop is executed, and is still true when the loop is exited.

    The General Form of a Loop

    It is important to understand a bit about the anatomy of a loop, and to discuss some loop terminology. There are three essential components to any loop.

    1. Initialization for the loop.

    2. The reason for looping, or the termination condition for the loop.

    3. The loop body, made up of the statements to be executed each time the program goes through the loop. The loop body must guarantee that the loop terminates in some finite number of steps.

    Example

    Consider the problem of computing the value of n!, where n is a positive integer. We know from math that

    (1) n! = 1 * 2 * 3 * ... * (n-1) * n

    The strategy that we want to use to solve this problem is to come up with a loop that creates each new integer value in turn, and then multiplies it by the product of the previously computed integers.

    To do this, define the program variables j and product. The variable j will hold each new integer value as we compute it, and product will hold the product of j and the previously computed integer values.

    Clearly, when we are finished, j = n and product = n!

    The loop body will compute

    product = product * j;

    and we will want the loop to run as long as

    j < n.

    The loop can therefore be constructed as follows. Every time that we loop back and start to execute the loop body again, we know that j will be less than n. This is the reason for looping. Inside the body of the loop we need to generate one new integer value with the statement

    j = j + 1;

    and use that new value to calculate a new product.

    The loop body therefore has two statements:

    j = j + 1;

    product = product * j;


    Since we know that every loop needs some initialization, we need to provide that here. Our equation (1) for factorial starts with the integer 1. Furthermore, we know that 1! = 1. So, let's write

    j = 1;

    product = 1;

    The finished loop now looks like

    j = 1;
    product = 1;
    // At this point we know that product = j! and j <= n
    while ( j < n )
    {
        j = j + 1;
        product = product * j;
        // At this point we also know that product = j! and j <= n
    }
    // At this point, we know that product = j! and j = n

    We see from the above that there is one condition that is true before we start the loop, is true each time we complete the body of the loop, and is true when we exit the loop. This condition, product = j!, is called the loop invariant.

    Using Loop Invariants to Construct Loops

    The steps that we just went through can be used in any similar situation to construct a loop.

    In summary, these steps are:

    1. Come up with a loop strategy that solves the problem.

    2. Determine the set of variables required in the loop.

    3. Express the result desired when the loop exits, in terms of the loop variables.

    4. Write down the reason for leaving the loop and the loop invariant. These can usually be written in terms of the desired result when the loop exits.

    5. Construct the initialization and the loop body.

    Loop Invariants

    A loop invariant is a condition that, if true when you enter a loop, is still true when you leave each iteration of the loop.

    It is used to determine whether you have written your loop correctly.

    int fact(int n) {
        int prod = 1;
        int k = 0;
        while (k < n) {
            k = k + 1;
            prod = prod * k;
        }
        return prod;
    }

    Loop invariant: prod = k!

    When the loop terminates, k = n, hence prod = n!
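    One way to make the invariant concrete is to check it at run time. The sketch below is a variant of the function above with `assert` calls at the three points where the invariant must hold. The helper `factorial_ref` and the name `fact_checked` are hypothetical, and the unsigned types are an assumption made here so that the arithmetic is well defined for small n (the result overflows beyond roughly n = 12 where `unsigned long` is 32 bits, and n = 20 where it is 64 bits).

```c
#include <assert.h>

/* Reference value of k!, used only to check the invariant. */
static unsigned long factorial_ref(unsigned k)
{
    return k <= 1 ? 1UL : k * factorial_ref(k - 1);
}

unsigned long fact_checked(unsigned n)
{
    unsigned long prod = 1;
    unsigned k = 0;

    assert(prod == factorial_ref(k));      /* initialization: prod = 0! */
    while (k < n) {
        k = k + 1;
        prod = prod * k;
        assert(prod == factorial_ref(k));  /* maintenance: prod = k! */
    }
    assert(k == n);                        /* termination: so prod = n! */
    return prod;
}
```

    The checks are for illustration only; recomputing k! on every iteration makes this version much slower than the plain loop.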

    Insertion Sort

    Design approach: Incremental

    Sorts in place: Yes

    Best case: Θ(n)

    Worst case: Θ(n²)

    Idea: like sorting a hand of playing cards

    Start with an empty left hand and the cards face down on the table.

    Remove one card at a time from the table, and insert it into the correct position in the left hand.

    To find the correct position, compare it with each of the cards already in the hand, from right to left.

    The cards held in the left hand are always sorted; these cards were originally the top cards of the pile on the table.

    [Figure: a sorted hand 6, 10, 24, 36 with the card 12 being inserted.]

    To insert 12, we need to make room for it by moving first 36 and then 24.

    Input array: 5 2 4 6 1 3

    At each iteration, the array is divided into two sub-arrays: the left sub-array is sorted and the right sub-array is unsorted.

    Loop Invariants and the Correctness of Insertion Sort

    Proving loop invariants works like induction.

    Initialization (base case): The invariant is true prior to the first iteration of the loop.

    Maintenance (inductive step): If it is true before an iteration of the loop, it remains true before the next iteration ([i-1] true => [i] true).

    Termination: When the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct. The induction stops when the loop terminates.

    [Figure: array elements a1 ... a8 at positions 1 to 8, with the current key marked.]

    When the first two properties hold, the loop invariant is true prior to every iteration of the loop.

    Loop invariant: At the start of each iteration of the for loop, the subarray A[1 . . j-1] consists of the elements originally in A[1 . . j-1], but in sorted order.

    The invariant helps us understand why the algorithm is correct.

    Initialization: When j = 2, the subarray A[1 . . j-1] = A[1] consists of just the single element A[1], which is the original element in A[1] and is trivially sorted. The loop invariant therefore holds prior to the first iteration of the loop.

    Maintenance: The inner while loop moves A[j-1], A[j-2], A[j-3], and so on, one position to the right until the proper position for key (which has the value that started out in A[j]) is found. At that point, the value of key is placed into this position, so A[1 . . j] consists of the elements originally in A[1 . . j] in sorted order, and the invariant holds for the next iteration of the outer loop.

    Termination: The outer for loop ends when j = n + 1, that is, when j - 1 = n. Replacing j - 1 with n in the loop invariant: the subarray A[1 . . n] consists of the elements originally in A[1 . . n], but in sorted order.

    The entire array is sorted!
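    The three proof steps can also be exercised at run time. This is a minimal sketch, assuming 0-based C arrays (so the sorted prefix is a[0..j-1]) and hypothetical names `is_sorted_prefix` and `insertion_sort_checked`. It checks only the "in sorted order" half of the invariant; it does not check that the prefix is a permutation of the original elements.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a[0..len-1] is in non-decreasing order, else 0. */
static int is_sorted_prefix(const int *a, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

void insertion_sort_checked(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t j = 1; j < n; j++) {
        /* Initialization (j == 1) and maintenance (j > 1):
         * the prefix a[0..j-1] is sorted at the start of each iteration. */
        assert(is_sorted_prefix(a, j));

        int key = a[j];
        size_t i = j;
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;
    }
    /* Termination: the prefix now covers the whole array. */
    assert(is_sorted_prefix(a, n));
}
```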

    Analysis of Insertion Sort

    Alg.: INSERTION-SORT(A)                                        Cost   Times

    for j ← 2 to n                                                 c1     n
        do key ← A[j]                                              c2     n-1
           // Insert A[j] into the sorted sequence A[1 . . j-1]    c3     n-1   (a comment, so c3 = 0)
           i ← j - 1                                               c4     n-1
           while i > 0 and A[i] > key                              c5     Σ_{j=2}^{n} t_j
               do A[i + 1] ← A[i]                                  c6     Σ_{j=2}^{n} (t_j - 1)
                  i ← i - 1                                        c7     Σ_{j=2}^{n} (t_j - 1)
           A[i + 1] ← key                                          c8     n-1

    t_j: number of times the while loop test is executed at iteration j
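    For reference, the pseudocode translates to C roughly as follows. This is a sketch under the assumption of 0-based indexing, so the index `i` here is one more than the pseudocode's i shifted to array position; the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts a[0..n-1] into non-decreasing order, in place. */
void insertion_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t j = 1; j < n; j++) {        /* pseudocode: for j = 2 to n */
        int key = a[j];
        size_t i = j;                       /* a[0..j-1] is already sorted */
        while (i > 0 && a[i - 1] > key) {   /* pseudocode: i > 0 and A[i] > key */
            a[i] = a[i - 1];                /* shift one position to the right */
            i--;
        }
        a[i] = key;                         /* drop key into the gap */
    }
}
```

    Using `size_t` and testing `i > 0` before reading `a[i - 1]` avoids indexing below the start of the array.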

    Best Case Analysis

    The array is already sorted.

    A[i] ≤ key the first time the while loop test is run (when i = j - 1), so t_j = 1.

    T(n) = c1 n + c2 (n-1) + c4 (n-1) + c5 (n-1) + c8 (n-1)
         = (c1 + c2 + c4 + c5 + c8) n - (c2 + c4 + c5 + c8)
         = an + b = Θ(n)

    Worst Case Analysis

    The array is in reverse sorted order.

    The test A[i] > key always succeeds in the while loop, so key has to be compared with all j - 1 elements to the left of the j-th position, and the loop test runs t_j = j times.

    The general running time is

    \[ T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1) \]

    Using

    \[ \sum_{j=1}^{n} j = \frac{n(n+1)}{2} \;\Rightarrow\; \sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1 \;\Rightarrow\; \sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2} \]

    we have

    \[ T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \left( \frac{n(n+1)}{2} - 1 \right) + c_6 \frac{n(n-1)}{2} + c_7 \frac{n(n-1)}{2} + c_8 (n-1) = a n^2 + b n + c \]

    a quadratic function of n.

    T(n) = Θ(n²): the order of growth is n².
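    The best-case and worst-case values of t_j can be checked empirically. The sketch below is an instrumented insertion sort (hypothetical name `insertion_sort_count_tests`, 0-based indexing assumed) that returns the total number of times the while test is evaluated, that is, the sum of t_j over all iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts a[0..n-1] and returns the sum of t_j: the number of times
 * the inner loop test is evaluated, counting the final failing test. */
unsigned long insertion_sort_count_tests(int *a, size_t n)
{
    unsigned long tests = 0;
    assert(a != NULL || n == 0);
    for (size_t j = 1; j < n; j++) {
        int key = a[j];
        size_t i = j;
        for (;;) {
            tests++;                            /* one evaluation of the while test */
            if (!(i > 0 && a[i - 1] > key))
                break;
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;
    }
    return tests;
}
```

    For n = 5, an already sorted array gives n - 1 = 4 tests (every t_j = 1), and a reverse-sorted array gives n(n+1)/2 - 1 = 14 tests (t_j = j), matching the two sums used in the analysis.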


    Average-case Analysis

    We want to determine the average number of comparisons taken over all possible inputs.

    First determine the average number of comparisons for a key A[j].

    A[j] can belong to any of the j locations, 1..j, with equal probability.

    The number of key comparisons for A[j] is j-k+1 if A[j] belongs to location k, 1 < k ≤ j, and is j-1 if it belongs to location 1.

    The average number of comparisons for inserting key A[j] is:

    \[ \frac{1}{j} \left( \sum_{k=2}^{j} (j-k+1) + (j-1) \right) = \frac{1}{j} \sum_{k=1}^{j-1} k + \frac{j-1}{j} = \frac{j-1}{2} + 1 - \frac{1}{j} = \frac{j+1}{2} - \frac{1}{j} \]

    Summing over the number of comparisons for all keys,

    \[ C_{avg}(n) = \sum_{j=2}^{n} \left( \frac{j+1}{2} - \frac{1}{j} \right) = \frac{n^2}{4} + \frac{3n}{4} - 1 - \sum_{j=2}^{n} \frac{1}{j} = \Theta(n^2) \]

    since the harmonic sum \sum_{j=2}^{n} 1/j grows only as \Theta(\ln n).

    Therefore, T_avg(n) = Θ(n²).

    Mergesort

    Design approach: divide and conquer

    Sorts in place: No

    Running time: Θ(n log n)

    Recursive in structure:

    Divide the problem into sub-problems that are similar to the original but smaller in size.

    Conquer the sub-problems by solving them recursively. If they are small enough, just solve them in a straightforward manner.

    Combine the solutions to create a solution to the original problem.

    Sorting Problem: Sort a sequence of n elements into non-decreasing order.

    To sort A[p . . r]:

    Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.

    Conquer: Sort the two subsequences recursively using merge sort.

    Combine: Merge the two sorted subsequences to produce the sorted answer.


    Merge Sort

    Alg.: MERGE-SORT(A, p, r)

    if p < r                              // Check for base case
        then q ← ⌊(p + r)/2⌋              // Divide
             MERGE-SORT(A, p, q)          // Conquer
             MERGE-SORT(A, q + 1, r)      // Conquer
             MERGE(A, p, q, r)            // Combine

    Initial call: MERGE-SORT(A, 1, n)
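    A possible C rendering of MERGE-SORT is sketched below. It assumes 0-based inclusive bounds p and r, and, to stay self-contained, it uses a simple merge through one shared scratch buffer rather than the sentinel-based MERGE of the slides. The names `merge_sort_rec` and `merge_sort` and the error-code convention are choices made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted a[p..q] and a[q+1..r] via tmp, then copy back. */
static void merge(int *a, int *tmp, size_t p, size_t q, size_t r)
{
    size_t i = p, j = q + 1, k = p;
    while (i <= q && j <= r)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* '<=' keeps it stable */
    while (i <= q) tmp[k++] = a[i++];
    while (j <= r) tmp[k++] = a[j++];
    memcpy(a + p, tmp + p, (r - p + 1) * sizeof *a);
}

static void merge_sort_rec(int *a, int *tmp, size_t p, size_t r)
{
    if (p < r) {                              /* base case: one element */
        size_t q = p + (r - p) / 2;           /* divide, floor of the midpoint */
        merge_sort_rec(a, tmp, p, q);         /* conquer left half */
        merge_sort_rec(a, tmp, q + 1, r);     /* conquer right half */
        merge(a, tmp, p, q, r);               /* combine */
    }
}

/* Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_rec(a, tmp, 0, n - 1);
    free(tmp);
    return 0;
}
```

    Allocating the scratch buffer once in the wrapper, instead of inside every merge, keeps the extra space at n elements, which is why the algorithm is listed as not sorting in place.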

    [Figure: the array 5 2 4 7 1 3 2 6 at indices 1..8, with p, q and r marked.]

    Example: n a Power of 2 (Divide)

    indices 1..8:   5 2 4 7 1 3 2 6                   (q = 4)
    halves:         5 2 4 7 | 1 3 2 6
    quarters:       5 2 | 4 7 | 1 3 | 2 6
    single items:   5 | 2 | 4 | 7 | 1 | 3 | 2 | 6

    Example: n a Power of 2 (Conquer and Merge)

    single items:   5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
    merged pairs:   2 5 | 4 7 | 1 3 | 2 6
    merged halves:  2 4 5 7 | 1 2 3 6
    sorted array:   1 2 2 3 4 5 6 7

    Example: n Not a Power of 2 (Divide)

    indices 1..11:  4 7 2 6 1 4 7 3 5 2 6                        (q = 6)
    level 1:        4 7 2 6 1 4 | 7 3 5 2 6                      (q = 3 and q = 9)
    level 2:        4 7 2 | 6 1 4 | 7 3 5 | 2 6
    level 3:        4 7 | 2 | 6 1 | 4 | 7 3 | 5 | 2 | 6
    level 4:        4 | 7 | 2 | 6 | 1 | 4 | 7 | 3 | 5 | 2 | 6

    Merging

    Input: Array A and indices p, q, r such that p ≤ q < r

    Subarrays A[p . . q] and A[q + 1 . . r] are sorted

    Output: One single sorted subarray A[p . . r]

    Idea for merging:

    Two piles of sorted cards

    Choose the smaller of the two top cards

    Remove it and place it in the output pile

    Repeat the process until one pile is empty

    Take the remaining input pile and place it face-down onto the output pile

    Example: n Not a Power of 2 (Conquer and Merge)

    level 4:        4 | 7 | 2 | 6 | 1 | 4 | 7 | 3 | 5 | 2 | 6
    level 3:        4 7 | 2 | 1 6 | 4 | 3 7 | 5 | 2 | 6
    level 2:        2 4 7 | 1 4 6 | 3 5 7 | 2 6
    level 1:        1 2 4 4 6 7 | 2 3 5 6 7
    sorted array:   1 2 2 3 4 4 5 6 6 7 7

    [Figure: the array 2 4 5 7 1 2 3 6 at indices 1..8, with p, q and r marked; A[p . . q] and A[q + 1 . . r] are each sorted.]

    Merge - Pseudocode

    Alg.: MERGE(A, p, q, r)

    1. Compute n1 and n2:
           n1 ← q - p + 1
           n2 ← r - q
    2. Copy the first n1 elements into L[1 . . n1 + 1] and the next n2 elements into R[1 . . n2 + 1]:
           for i ← 1 to n1
               do L[i] ← A[p + i - 1]
           for j ← 1 to n2
               do R[j] ← A[q + j]
    3. L[n1 + 1] ← ∞; R[n2 + 1] ← ∞
    4. i ← 1; j ← 1
    5. for k ← p to r
           do if L[i] ≤ R[j]
                  then A[k] ← L[i]
                       i ← i + 1
                  else A[k] ← R[j]
                       j ← j + 1

    The ∞ entries are sentinels, to avoid having to check whether either subarray has been fully copied at each step.

    [Figure: A[p . . q] = 2 4 5 7 is copied into L and A[q + 1 . . r] = 1 2 3 6 into R.]
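    The sentinel technique can be sketched in C as follows. Two assumptions are made that the pseudocode does not need: indices are 0-based, and `INT_MAX` stands in for ∞, so every real element must be strictly smaller than `INT_MAX`. The function name `merge_sentinel` and its return convention are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Merge sorted a[p..q] and a[q+1..r] (0-based, inclusive bounds).
 * Assumes all elements are < INT_MAX, which is used as the sentinel.
 * Returns 0 on success, -1 on allocation failure. */
int merge_sentinel(int *a, size_t p, size_t q, size_t r)
{
    assert(a != NULL && p <= q && q < r);
    size_t n1 = q - p + 1;
    size_t n2 = r - q;
    int *left = malloc((n1 + 1) * sizeof *left);
    int *right = malloc((n2 + 1) * sizeof *right);
    if (left == NULL || right == NULL) {
        free(left);
        free(right);
        return -1;
    }
    for (size_t i = 0; i < n1; i++) left[i] = a[p + i];
    for (size_t j = 0; j < n2; j++) right[j] = a[q + 1 + j];
    left[n1] = INT_MAX;    /* sentinel */
    right[n2] = INT_MAX;   /* sentinel */

    size_t i = 0, j = 0;
    for (size_t k = p; k <= r; k++) {
        if (left[i] <= right[j])
            a[k] = left[i++];
        else
            a[k] = right[j++];
    }
    free(left);
    free(right);
    return 0;
}
```

    Because the loop runs exactly r - p + 1 times and each sentinel compares greater than every real element, neither index runs past its sentinel, so no explicit "is this pile empty?" test is needed.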

    Example: MERGE(A, 9, 12, 16)

    [Figures: step-by-step trace of MERGE(A, 9, 12, 16).]

    Correctness of Merge Sort

    Loop invariant (at the start of each iteration of the for loop): A[p . . k-1] contains the k - p smallest elements of L[1 . . n1 + 1] and R[1 . . n2 + 1], in sorted order.

    L[i] and R[j] are the smallest elements of their arrays not yet copied back to A.


    Proof of the Loop Invariant

    Initialization

    Prior to the first iteration, k = p, so the subarray A[p . . k-1] is empty.

    A[p . . k-1] contains the k - p = 0 smallest elements of L and R.

    L and R are sorted arrays and i = j = 1, so L[1] and R[1] are the smallest elements in L and R.

    Maintenance

    Assume L[i] ≤ R[j]. Then L[i] is the smallest element not yet copied to A.

    After copying L[i] into A[k], A[p . . k] contains the k - p + 1 smallest elements of L and R.

    Incrementing k (in the for loop) and i reestablishes the loop invariant. The case L[i] > R[j] is symmetric, with R[j] copied and j incremented.

    Termination

    At termination, k = r + 1.

    By the loop invariant, A[p . . k-1] = A[p . . r] contains the k - p = r - p + 1 smallest elements of L and R, in sorted order.

    This is exactly the number of elements to be sorted, so MERGE(A, p, q, r) is correct.

    Running Time of Merge

    Initialization (copying into temporary arrays): Θ(n1 + n2) = Θ(n)

    Adding the elements to the final array (the last for loop): n iterations, each taking constant time, so Θ(n)

    Total time for Merge: Θ(n)


    Analyzing Divide-and-Conquer Algorithms

    The recurrence is based on the three steps of the paradigm. Let T(n) be the running time on a problem of size n.

    Divide the problem into a subproblems, each of size n/b: takes D(n)

    Conquer (solve) the subproblems: takes aT(n/b)

    Combine the solutions: takes C(n)

    \[ T(n) = \begin{cases} \Theta(1) & \text{if } n \le c \\ a\,T(n/b) + D(n) + C(n) & \text{otherwise} \end{cases} \]
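    As a worked instance, the template can be filled in for merge sort. This is a sketch assuming n is a power of 2: there are a = 2 subproblems of size n/b = n/2, computing the midpoint costs D(n) = Θ(1), and MERGE costs C(n) = Θ(n).

```latex
T(n) =
\begin{cases}
\Theta(1) & \text{if } n = 1, \\
2\,T(n/2) + \Theta(n) & \text{if } n > 1,
\end{cases}
\qquad\Longrightarrow\qquad
T(n) = \Theta(n \log n).
```

    One way to see the solution: the recursion tree has log2(n) + 1 levels, and the merges on each level together touch all n elements, so every level costs Θ(n). This agrees with the Θ(n log n) running time stated for Mergesort earlier.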