Sorting Algorithms
CS 524 – High-Performance Computing
CS 524 (Au 2004/05)- Asim Karim @ LUMS 2
Sorting
Sorting is the task of arranging an unordered collection (sequence) of elements into monotonically increasing (or decreasing) order
Sorting transforms an unordered sequence of elements S = {a1, a2, a3, …, an} into the sequence S’ = {a’1, a’2, a’3, …, a’n}, where a’i ≤ a’j for 1 ≤ i ≤ j ≤ n and S’ is a permutation of S
Sorting algorithms can be categorized into internal (S fits in main memory) and external (S does not fit in main memory). We study internal algorithms only.
Sorting algorithms can also be categorized as comparison-based or noncomparison-based
Data Storage on Parallel Computers
Storage of input and output sequences
  Where? On one processor or distributed among processors?
  How? What is the order of data distribution with respect to the order of the processors?
Compare-Exchange on Parallel Computers
One element per processor: ai on Pi and aj on Pj
Compare-exchange between two processors Pi and Pj requires a communication and a comparison operation
A parallel system with as many processors as number of elements would deliver poor performance. Why?
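As a rough illustration of the compare-exchange step, the sketch below simulates two processors in a single Python function. It assumes i < j, so that Pi keeps the smaller element and Pj the larger; the function name `compare_exchange` is made up for this example, and no real message passing takes place.

```python
def compare_exchange(a_i, a_j):
    """Simulate a compare-exchange between P_i and P_j, assuming i < j.

    Each processor sends its element to the other (the communication step),
    then P_i keeps the smaller value and P_j the larger (the comparison step).
    """
    received_by_i, received_by_j = a_j, a_i   # simulated exchange of elements
    new_a_i = min(a_i, received_by_i)         # P_i retains the minimum
    new_a_j = max(a_j, received_by_j)         # P_j retains the maximum
    return new_a_i, new_a_j
```

For example, `compare_exchange(7, 3)` leaves 3 on Pi and 7 on Pj. Each call moves one element in each direction for a single comparison, so communication accounts for most of the work.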
Compare-Split on Parallel Computers (1)
Compare-Split on Parallel Computers (2)
Each processor has n/p elements of the sequence. Initially, processor Pi has block Ai
After sorting, the blocks of elements are ordered such that A’i ≤ A’j for i ≤ j and union of Ai = union of A’i
Compare-split
  Each processor sends its block to the other (each block is sorted locally)
  Each processor merges the two blocks of elements
  Each processor splits the merged elements and retains the appropriate half
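The compare-split operation can be sketched as follows. This is a sequential simulation under stated assumptions: both blocks are already sorted, the first argument belongs to the lower-numbered processor, and the name `compare_split` is chosen for illustration only.

```python
import heapq

def compare_split(block_lo, block_hi):
    """Simulate compare-split between two processors holding sorted blocks.

    Both processors see the merged sequence; the lower-numbered one keeps
    the smaller half and the higher-numbered one keeps the larger half.
    """
    merged = list(heapq.merge(block_lo, block_hi))  # linear-time merge of sorted blocks
    k = len(block_lo)                               # each side keeps its original block size
    return merged[:k], merged[k:]
```

For example, `compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])` returns `([1, 2, 6, 7, 8], [9, 10, 11, 12, 13])`.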
Sorting Network (1)
Sorting network is a specialized interconnection network that can perform many comparisons simultaneously thus improving sorting performance significantly
Key component of the sorting network: comparator
  Increasing comparator
  Decreasing comparator
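To make the comparator idea concrete, here is a small simulation of a sorting network in Python. The 4-input network used (`NETWORK_4`) is a standard five-comparator example picked for this sketch, not one taken from the slides; the wire numbering and function names are likewise assumptions.

```python
def comparator(x, y, increasing=True):
    """Increasing comparator outputs (min, max); decreasing outputs (max, min)."""
    lo, hi = min(x, y), max(x, y)
    return (lo, hi) if increasing else (hi, lo)

# Each inner list is one stage. Comparators within a stage touch disjoint
# wires, so in hardware they could all operate simultaneously.
NETWORK_4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

def run_network(values, network=NETWORK_4):
    """Push the values through the network, stage by stage."""
    wires = list(values)
    for stage in network:
        for i, j in stage:
            wires[i], wires[j] = comparator(wires[i], wires[j])
    return wires
```

With this network, `run_network([3, 1, 4, 2])` yields `[1, 2, 3, 4]` after three stages, although five comparisons are performed in total.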
Sorting Network (2)
Bubble Sort
Complexity: O(n^2)
Bubble sort is difficult to parallelize. Why?
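For reference, a minimal sequential bubble sort might look like the sketch below (the function name is an assumption). Note that each compare-exchange in the inner loop operates on the result of the previous one, which is worth keeping in mind when thinking about the question above.

```python
def bubble_sort(seq):
    """Return a sorted copy of seq using bubble sort (O(n^2) comparisons)."""
    a = list(seq)
    n = len(a)
    for end in range(n - 1, 0, -1):      # after each pass, a[end] holds its final value
        for i in range(end):             # compare-exchange adjacent pairs left to right
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```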
Odd-Even Transposition Sort (1)
Odd-Even Transposition Sort (2)
Parallel Implementation: p = n
Data partitioning: Each processor Pi has one element ai
Computation and Communication: During each phase, the odd or even numbered processors perform a compare-exchange with their right processors
Performance
  On a linear array
  On a crossbar
  On a bus
Not cost optimal
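The phases described above can be simulated sequentially as a sketch. It assumes 0-based processor indices (so the "even" phase pairs positions 0-1, 2-3, and so on) and runs n phases; on a real machine all compare-exchanges inside one phase would happen at the same time.

```python
def odd_even_transposition_sort(seq):
    """Sequential simulation of odd-even transposition sort with p = n."""
    a = list(seq)
    n = len(a)
    for phase in range(n):                 # n phases are enough to sort n elements
        start = phase % 2                  # alternate between the two pairings
        for i in range(start, n - 1, 2):   # these pairs are independent of each other
            if a[i] > a[i + 1]:            # compare-exchange with the right neighbor
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

The inner loop is the part that parallelizes: with one element per processor, each phase costs a constant number of steps, so the parallel time is proportional to the n phases.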
Parallel Implementation: p < n
Data partitioning: Each processor Pi has n/p elements in the block Ai
Computation and Communication: Sort Ai locally (using merge sort or quicksort). Then, execute p phases (p/2 odd and p/2 even), performing compare-split operations with the right neighboring processor.
Performance
  On a linear array
  On a crossbar
  On a bus
Cost optimal on linear array and crossbar when p = O(log n). Not cost optimal on bus
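A sequential sketch of the block-based variant follows. It assumes that p divides n evenly and uses Python's built-in `sorted` in place of both the local sort and the merge step; the function name `block_odd_even_sort` is hypothetical.

```python
def block_odd_even_sort(seq, p):
    """Simulate odd-even transposition sort with n/p elements per processor.

    Assumes p divides len(seq). Returns the list of p blocks in processor order.
    """
    size = len(seq) // p
    # Local sort of each block A_i.
    blocks = [sorted(seq[i * size:(i + 1) * size]) for i in range(p)]
    for phase in range(p):                        # p phases in total
        for i in range(phase % 2, p - 1, 2):      # pairs are independent within a phase
            merged = sorted(blocks[i] + blocks[i + 1])   # stands in for a linear merge
            blocks[i], blocks[i + 1] = merged[:size], merged[size:]  # compare-split
    return blocks
```

Concatenating the returned blocks in processor order gives the sorted sequence.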
Shellsort (1)
Odd-even transposition sort moves elements one position at a time
  If a sequence has only a few unordered elements and they are far from their correct positions, OE sort will take a long time to sort the sequence
Shellsort can move elements longer distances. It has two phases:
  In the first phase, blocks that are far apart are compare-split
  In the second phase, an odd-even transposition sort is conducted. This is continued as long as blocks are changing positions
Shellsort (2)
Shellsort (3)
Initially, each processor sorts its block of elements locally
First phase
1. Compare-split Pi (i < p/2) with Pp-i-1 (reverse-order compare-split)
2. The processors are partitioned into two groups: one group has the first p/2 processors and the other the next p/2 processors. Compare-split (in reverse order) within each group.
3. Go to 1. Repeat log p times.
Second phase
  Perform OE sort until no changes occur
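The two phases can be sketched as a sequential simulation. The assumptions here are that p is a power of two and divides n, that processors are indexed from 0, and that "no changes occur" means one odd and one even phase in a row leave every block untouched; the function and variable names are invented for this example.

```python
def parallel_shellsort(seq, p):
    """Simulate the two-phase parallel shellsort on p blocks of n/p elements."""
    size = len(seq) // p
    blocks = [sorted(seq[i * size:(i + 1) * size]) for i in range(p)]  # local sort

    def compare_split(i, j):
        """Compare-split blocks i < j; report whether either block changed."""
        merged = sorted(blocks[i] + blocks[j])
        new_i, new_j = merged[:size], merged[size:]
        changed = new_i != blocks[i] or new_j != blocks[j]
        blocks[i], blocks[j] = new_i, new_j
        return changed

    # First phase: log p steps, pairing distant processors in reverse order,
    # then halving the groups.
    group = p
    while group > 1:
        for base in range(0, p, group):
            for k in range(group // 2):
                compare_split(base + k, base + group - 1 - k)
        group //= 2

    # Second phase: odd-even phases until two consecutive phases change nothing.
    phase, quiet = 0, 0
    while quiet < 2:
        changed = False
        for i in range(phase % 2, p - 1, 2):
            if compare_split(i, i + 1):
                changed = True
        quiet = 0 if changed else quiet + 1
        phase += 1
    return blocks
```

If the first phase has already moved most elements close to their final blocks, the second phase finishes after only a few odd-even phases instead of the full p.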
Shellsort (4)
Performance
  On a linear array
  On a crossbar
  On a bus
Quicksort (1)
Quicksort (2)
Recursive divide-and-conquer algorithm that has an average complexity of O(n log n)
Quicksort (3)
The partitioning of a sequence of length n has a complexity of O(n)
The selection of the pivot significantly affects the overall complexity of quicksort
  In the worst case, where an n-length sequence is partitioned into subsequences of length 1 and n-1, the overall complexity becomes O(n^2)
  On average, the complexity is O(n log n)
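A minimal in-place quicksort is sketched below to make the O(n) partitioning step visible. Using the first element of each subsequence as the pivot is an assumption made for this example; with that choice, an already sorted input triggers the worst case described above.

```python
def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indices lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]                      # assumed pivot choice: first element
    s = lo                             # a[lo+1..s] holds elements <= pivot
    for i in range(lo + 1, hi + 1):    # single O(n) pass over the subsequence
        if a[i] <= pivot:
            s += 1
            a[s], a[i] = a[i], a[s]
    a[lo], a[s] = a[s], a[lo]          # place the pivot between the two parts
    quicksort(a, lo, s - 1)            # recurse on the smaller-or-equal part
    quicksort(a, s + 1, hi)            # recurse on the larger part
```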
Parallelizing Quicksort
A naïve formulation
  Start off with one process, which does the initial partitioning. Then, assign one of the subproblems (the recursion) to another process. Repeat for each subsequence until no further partitioning is possible.
Not cost-optimal (Why?)
Analysis
Message-Passing Parallel Formulation
Data partitioning: Each processor Pi has Ai of n/p elements
Computation and communication
  Select a pivot
  Broadcast the pivot to all processors
  Locally rearrange the block Ai into sub-blocks Si and Li
  Combine Si and Li from all processors as S and L
  Partition S to one group of processors and L to the other
  Recursively perform these operations until a sub-block is assigned to one processor only. Then, the processors sort the set locally
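The steps above can be simulated sequentially as a rough sketch. Several details are assumptions made for this example rather than taken from the slides: Si holds elements less than or equal to the pivot and Li the rest, the pivot is the first element of the first non-empty block, processors are split between S and L in proportion to their sizes, and the helper `deal` spreads a list round-robin over a group. No real message passing is involved.

```python
def deal(items, k):
    """Distribute items round-robin over k processors (hypothetical helper)."""
    return [items[i::k] for i in range(k)]

def parallel_quicksort(blocks):
    """Simulate message-passing quicksort; blocks[i] is the data held by P_i.

    Returns the blocks in processor order; concatenated, they are sorted.
    """
    p = len(blocks)
    if p == 1:
        return [sorted(blocks[0])]               # one processor left: sort locally
    total = sum(len(b) for b in blocks)
    if total == 0:
        return [[] for _ in range(p)]
    pivot = next(b[0] for b in blocks if b)      # select a pivot and "broadcast" it
    small = [[x for x in b if x <= pivot] for b in blocks]   # local sub-blocks S_i
    large = [[x for x in b if x > pivot] for b in blocks]    # local sub-blocks L_i
    S = [x for part in small for x in part]      # combine S_i from all processors
    L = [x for part in large for x in part]      # combine L_i from all processors
    p_small = round(p * len(S) / total)          # processors assigned to S
    p_small = max(1, min(p - 1, p_small))        # each group gets at least one
    return (parallel_quicksort(deal(S, p_small)) +
            parallel_quicksort(deal(L, p - p_small)))
```

Because each group always receives fewer than p processors, the recursion stops after at most p - 1 splits regardless of pivot quality, though a poor pivot leaves the final blocks unevenly loaded.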