Transcript of Lecture 11 Sorting
Lecture 11 Sorting
Parallel Computing, Fall 2008
Sorting Algorithm
Sorting: rearranging a list of numbers into increasing (strictly speaking, non-decreasing) order.
Potential Speedup
O(n log n) is optimal for any sequential sorting algorithm that does not use special properties of the numbers.
The best we can expect from a parallel algorithm based on such a sequential algorithm, using n processors, is:
Optimal parallel time complexity = O(n log n)/n = O(log n)
This has been achieved, but the constant hidden in the order notation is extremely large.
Compare-and-Exchange Sorting Algorithms
Compare-and-exchange operations form the basis of several, if not most, classical sequential sorting algorithms.
Two numbers, say A and B, are compared; if A > B, A and B are exchanged:

```c
if (A > B) {
    temp = A;
    A = B;
    B = temp;
}
```
Message-Passing Method
P1 sends A to P2, and P2 sends B to P1; then both processes perform the compare operation. P1 keeps the smaller of A and B, and P2 keeps the larger:
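The outcome of this exchange step can be sketched sequentially (a Python sketch; in a real message-passing program each process would compute its min or max independently after an MPI-style send/receive):

```python
def compare_exchange(a, b):
    """Simulate the message-passing compare-exchange step.

    P1 holds a and P2 holds b; each sends its value to the other,
    both compare, and P1 keeps the smaller while P2 keeps the larger.
    Returns (value kept by P1, value kept by P2).
    """
    return min(a, b), max(a, b)
```

For example, if P1 holds 5 and P2 holds 3, `compare_exchange(5, 3)` returns `(3, 5)`.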
Merging Two Sublists
Bubble Sort
First, the largest number is moved to the end of the list by a series of compares and exchanges, starting at the opposite end.
These actions are repeated with the remaining numbers, stopping just before the previously positioned number.
In this way, the larger numbers move ("bubble") toward one end.
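The passes just described can be written out directly (a minimal sequential sketch in Python; the lecture itself gives no particular implementation):

```python
def bubble_sort(xs):
    """Sort a list in increasing order by repeated compare-and-exchange.

    Pass i bubbles the largest remaining number to position n-1-i,
    stopping just before the previously positioned numbers.
    """
    xs = list(xs)                     # work on a copy
    n = len(xs)
    for i in range(n - 1):
        for j in range(n - 1 - i):    # stop before the sorted tail
            if xs[j] > xs[j + 1]:     # compare and exchange
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs
```

Each pass needs one fewer comparison than the last, since the tail of the list is already in place.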
Bubble Sort
Time Complexity
Number of compare-and-exchange operations:
(n - 1) + (n - 2) + ... + 1 = n(n - 1)/2
This indicates a time complexity of O(n^2), given that a single compare-and-exchange operation has constant complexity, O(1).
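The operation count is easy to confirm numerically (a quick Python check, not part of the lecture):

```python
def bubble_compares(n):
    """Compare-and-exchange operations performed by bubble sort on n
    elements: (n - 1) + (n - 2) + ... + 1."""
    return sum(range(1, n))

# The count matches the closed form n(n - 1)/2 for every size checked,
# so the number of operations grows as O(n^2).
assert all(bubble_compares(n) == n * (n - 1) // 2 for n in range(100))
```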
Parallel Bubble Sort
An iteration can start before the previous iteration has finished, as long as it does not overtake the previous bubbling action:
Odd-Even (Transposition) Sort
A variation of bubble sort that operates in two alternating phases, an even phase and an odd phase.
Even phase: even-numbered processes exchange numbers with their right neighbors.
Odd phase: odd-numbered processes exchange numbers with their right neighbors.
Sequential Odd-Even Transposition Sort
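The two alternating phases can be sketched sequentially in Python (an illustrative version, not the slide's own code):

```python
def odd_even_sort(xs):
    """Sequential odd-even transposition sort.

    Alternates even phases (compare-exchange on pairs starting at even
    indices) and odd phases (pairs starting at odd indices); n phases
    suffice to sort n elements.
    """
    xs = list(xs)
    n = len(xs)
    for phase in range(n):
        start = 0 if phase % 2 == 0 else 1
        for i in range(start, n - 1, 2):
            if xs[i] > xs[i + 1]:      # compare and exchange
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return xs
```

Within one phase the pairs are disjoint, which is what allows them all to be done in parallel.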
Parallel Odd-Even Transposition Sort
Sorting eight numbers
Parallel Odd-Even Transposition Sort
Consider the one-item-per-processor case. There are n iterations; in each iteration, each processor does one compare-exchange, and these can all be done in parallel.
The parallel run time of this formulation is Θ(n).
This is cost-optimal with respect to the base serial algorithm (the Θ(n^2) transposition sort) but not with respect to the optimal serial algorithm.
Parallel Odd-Even Transposition Sort
Parallel Odd-Even Transposition Sort
Consider a block of n/p elements per processor. The first step is a local sort. In each subsequent step, the compare-exchange operation is replaced by the compare-split operation.
There are p phases, with each phase performing Θ(n/p) comparisons and Θ(n/p) communication.
The parallel run time of this formulation is
T_P = Θ((n/p) log(n/p)) + Θ(n) + Θ(n)
(local sort, then comparisons and communication summed over the p phases).
The parallel formulation is cost-optimal for p = O(log n).
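The compare-split operation can be sketched as follows: the two processors exchange blocks, each merges the two sorted blocks, and the lower-ranked processor keeps the smaller half while the higher-ranked one keeps the larger half (a Python sketch assuming both blocks arrive already sorted):

```python
def compare_split(low_block, high_block):
    """Compare-split between two processors holding sorted blocks.

    Each processor sends its block of n/p elements to the other; both
    merge the combined 2*(n/p) elements, then the lower-ranked
    processor keeps the smaller half and the higher-ranked processor
    keeps the larger half.
    """
    merged = sorted(low_block + high_block)  # stand-in for a linear merge
    k = len(low_block)
    return merged[:k], merged[k:]
```

For example, `compare_split([1, 5, 9], [2, 3, 8])` yields `([1, 2, 3], [5, 8, 9])`; in a real implementation the merge would be a linear-time merge of the two sorted blocks, not a full sort.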
Quicksort
A very popular sequential sorting algorithm that performs well, with an average sequential time complexity of O(n log n).
First, the list is divided into two sublists, such that all numbers in one sublist are smaller than all numbers in the other.
This is achieved by first selecting one number, called a pivot, against which every other number is compared. If a number is less than the pivot, it is placed in one sublist; otherwise, it is placed in the other.
The pivot could be any number in the list, but often the first number is chosen. The pivot itself could be placed in one sublist, or it could be separated and placed in its final position.
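The description above corresponds to the following sketch (Python, taking the first element as pivot and separating the pivot into its final position):

```python
def quicksort(xs):
    """Recursive quicksort: pick a pivot (here the first element),
    partition the rest into smaller / not-smaller sublists, and
    recurse on each. The pivot itself is separated and placed in
    its final position between the two sorted sublists."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)
```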
Quicksort
Quicksort
Example of the quicksort algorithm sorting a sequence of size n = 8.
Parallel Quicksort
Let's start with recursive decomposition: the list is partitioned by a single process, and then each of the subproblems is handled by a different processor.
The time for this algorithm is lower-bounded by Ω(n), since the first partitioning step alone processes all n elements on a single processor.
It is not cost-optimal, as the process-time product is Ω(n^2).
Can we parallelize the partitioning step? In particular, if we can use n processors to partition a list of length n around a pivot in O(1) time, we have a winner.
This is difficult to do on real machines, though.
Parallel Quicksort
Using tree allocation of processes
Parallel Quicksort
With the pivot being withheld in processes:
Analysis
A fundamental problem with all tree constructions: the initial division is done by a single processor, which seriously limits the speedup.
The tree in quicksort will not, in general, be perfectly balanced.
Pivot selection is therefore very important to making quicksort operate fast.
Parallelizing Quicksort: PRAM Formulation
We assume a CRCW (concurrent read, concurrent write) PRAM in which, among concurrent writes, an arbitrary one succeeds.
The formulation works by creating pools of processors. Initially, every processor is assigned to the same pool and holds one element.
Each processor attempts to write its element to a common location for the pool; the value that succeeds becomes the pool's pivot.
Each processor then reads back that location. If the value read back is greater than the processor's own value, it assigns itself to the 'left' pool; otherwise, it assigns itself to the 'right' pool.
Each pool performs this operation recursively.
Note that the algorithm generates a tree of pivots. The depth of this tree determines the expected parallel run time; its average value is O(log n).
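The pool mechanism can be simulated sequentially (a Python sketch; the arbitrary successful CRCW write is modeled by picking an arbitrary processor's element as the pivot, and keys are assumed distinct):

```python
import random

def pram_quicksort(pool):
    """Simulate the CRCW PRAM quicksort formulation on distinct keys.

    All processors in a pool try to write their element to a common
    location; an arbitrary write succeeds and becomes the pivot.
    Each processor reads it back and joins the left pool (the value
    read back is greater than its own) or the right pool, and the
    pools recurse.
    """
    if len(pool) <= 1:
        return list(pool)
    pivot = random.choice(pool)             # arbitrary write succeeds
    left = [x for x in pool if x < pivot]   # value read back is greater
    right = [x for x in pool if x > pivot]  # value read back is smaller
    return pram_quicksort(left) + [pivot] + pram_quicksort(right)
```

Whichever write wins, the result is the sorted pool; only the shape of the pivot tree, and hence the run time, depends on the arbitrary choices.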
Parallelizing Quicksort: PRAM Formulation
A binary tree generated by the execution of the quicksort algorithm. Each level of the tree represents a different array-partitioning iteration. If pivot selection is optimal, then the height of the tree is Θ(log n), which is also the number of iterations.
Parallelizing Quicksort: PRAM Formulation
The execution of the PRAM algorithm on the array shown in (a).
End
Thank you!