CSCI 4440 / 8446 Parallel Computing Three Sorting Algorithms.
CSCI 4440 / 8446
Parallel Computing
Three Sorting Algorithms
Outline
Sorting problem
Sequential quicksort
Parallel quicksort
Hyperquicksort
Parallel sorting by regular sampling
Sorting Problem
Permute: unordered sequence → ordered sequence
Typically the key (the value being sorted) is part of a record with additional values (satellite data)
Most parallel sorts are designed for theoretical parallel models: not practical
Our focus: internal sorts based on comparison of keys
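The key/satellite-data distinction can be illustrated with a minimal Python sketch. The records below are hypothetical and exist only for illustration; the point is that the comparison looks at the key alone while the rest of the record travels with it.

```python
from operator import itemgetter

# Hypothetical records of the form (key, satellite data).
records = [(17, "q"), (4, "a"), (65, "z"), (11, "k")]

# sorted() compares only what the key function returns, so the
# satellite data is carried along without ever being inspected.
by_key = sorted(records, key=itemgetter(0))
print(by_key)
```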
Sequential Quicksort
17 14 65 4 22 63 11
Unordered list of values
Sequential Quicksort
17 14 65 4 22 63 11
Choose pivot value
Sequential Quicksort
Pivot: 17
Low list (≤ 17): 14, 4, 11
High list (> 17): 65, 22, 63
Sequential Quicksort
Pivot: 17
Low list (sorted): 4, 11, 14
High list: 65, 22, 63
Recursively apply quicksort to low list
Sequential Quicksort
Pivot: 17
Low list (sorted): 4, 11, 14
High list (sorted): 22, 63, 65
Recursively apply quicksort to high list
Sequential Quicksort
4 11 14 17 22 63 65
Sorted list of values
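The walkthrough above can be sketched in a few lines of Python. This is a minimal, non-in-place version that assumes the first value is used as the pivot, as in the example with 17; it is meant to mirror the slides, not to be an efficient implementation.

```python
def quicksort(values):
    """Return a sorted copy of values, using the first value as the pivot."""
    if len(values) <= 1:
        return list(values)
    pivot, rest = values[0], values[1:]
    low = [v for v in rest if v <= pivot]   # low list (<= pivot)
    high = [v for v in rest if v > pivot]   # high list (> pivot)
    # Recursively sort both lists and place the pivot between them.
    return quicksort(low) + [pivot] + quicksort(high)


print(quicksort([17, 14, 65, 4, 22, 63, 11]))
```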
Attributes of Sequential Quicksort
Average-case time complexity: $\Theta(n \log n)$
Worst-case time complexity: $\Theta(n^2)$
Occurs when low, high lists maximally unbalanced at every partitioning step
Can make worst-case less probable by using sampling to choose pivot value
Example: “Median of 3” technique
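One common reading of "median of 3" is to take the median of the first, middle, and last values as the pivot. The sketch below assumes that reading; the function names are illustrative, and the three-way split into low/equal/high is a simplification chosen to keep the example short.

```python
def median_of_three(values):
    """Pick the median of the first, middle, and last values."""
    first, middle, last = values[0], values[len(values) // 2], values[-1]
    return sorted((first, middle, last))[1]


def quicksort_m3(values):
    """Quicksort that samples three values to choose each pivot."""
    if len(values) <= 1:
        return list(values)
    pivot = median_of_three(values)
    low = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    high = [v for v in values if v > pivot]
    return quicksort_m3(low) + equal + quicksort_m3(high)
```

On already-sorted input, where a first-element pivot produces maximally unbalanced lists, this choice picks the middle value and keeps the recursion shallow.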
Quicksort Good Starting Point for Parallel Algorithm
Speed
Generally recognized as the fastest sort in the average case
Preferable to base a parallel algorithm on the fastest sequential algorithm
Natural concurrency
Recursive sorts of low, high lists can be done in parallel
Definitions of “Sorted”
Definition 1: Sorted list held in memory of a single processor
Definition 2:
Portion of list in every processor’s memory is sorted
Value of last element on $P_i$’s list is less than or equal to value of first element on $P_{i+1}$’s list
We adopt Definition 2: Allows problem size to scale with number of processors
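Definition 2 can be written as a small predicate. This is a sketch in which each process's portion is modeled as a Python list; skipping empty portions when comparing neighbors is an assumption, since the slides do not say how empty lists are treated.

```python
def is_sorted_across(lists):
    """Check Definition 2 for a list of per-process lists (P0, P1, ...)."""
    # Condition 1: every process's own portion is sorted.
    locally = all(a <= b for block in lists for a, b in zip(block, block[1:]))
    # Condition 2: last value on P_i <= first value on P_{i+1}.
    # Assumption: empty portions are skipped when comparing neighbors.
    nonempty = [block for block in lists if block]
    across = all(left[-1] <= right[0]
                 for left, right in zip(nonempty, nonempty[1:]))
    return locally and across


print(is_sorted_across([[0, 8, 12, 15], [19, 20, 21], [61, 64], [83, 85]]))
```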
Parallel Quicksort
75, 91, 15, 64, 21, 8, 88, 54
50, 12, 47, 72, 65, 54, 66, 22
83, 66, 67, 0, 70, 98, 99, 82
20, 40, 89, 47, 19, 61, 86, 85
P0
P1
P2
P3
Parallel Quicksort
75, 91, 15, 64, 21, 8, 88, 54
50, 12, 47, 72, 65, 54, 66, 22
83, 66, 67, 0, 70, 98, 99, 82
20, 40, 89, 47, 19, 61, 86, 85
P0
P1
P2
P3
Process P0 chooses and broadcasts a randomly chosen pivot value
Parallel Quicksort
75, 91, 15, 64, 21, 8, 88, 54
50, 12, 47, 72, 65, 54, 66, 22
83, 66, 67, 0, 70, 98, 99, 82
20, 40, 89, 47, 19, 61, 86, 85
P0
P1
P2
P3
Exchange “lower half” and “upper half” values
Parallel Quicksort
75, 15, 64, 21, 8, 54, 66, 67, 0, 70
50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
83, 98, 99, 82, 91, 88
89, 86, 85
P0
P1
P2
P3
After exchange step
Lower “half”: P0, P1
Upper “half”: P2, P3
Parallel Quicksort
75, 15, 64, 21, 8, 54, 66, 67, 0, 70
50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
83, 98, 99, 82, 91, 88
89, 86, 85
P0
P1
P2
P3
Processes P0 and P2 choose and broadcast randomly chosen pivots
Lower “half”: P0, P1
Upper “half”: P2, P3
Parallel Quicksort
75, 15, 64, 21, 8, 54, 66, 67, 0, 70
50, 12, 47, 72, 65, 54, 66, 22, 20, 40, 47, 19, 61
83, 98, 99, 82, 91, 88
89, 86, 85
P0
P1
P2
P3
Exchange values
Lower “half”: P0, P1
Upper “half”: P2, P3
Parallel Quicksort
15, 21, 8, 0, 12, 20, 19
50, 47, 72, 65, 54, 66, 22, 40, 47, 61, 75, 64, 54, 66, 67, 70
83, 82, 91, 88, 89, 86, 85
98, 99
P0
P1
P2
P3
Exchange values
P0: lower “half” of lower “half”
P1: upper “half” of lower “half”
P2: lower “half” of upper “half”
P3: upper “half” of upper “half”
Parallel Quicksort
0, 8, 12, 15, 19, 20, 21
22, 40, 47, 47, 50, 54, 54, 61, 64, 65, 66, 66, 67, 70, 72, 75
82, 83, 85, 86, 88, 89, 91
98, 99
P0
P1
P2
P3
Each processor sorts values it controls
P0: lower “half” of lower “half”
P1: upper “half” of lower “half”
P2: lower “half” of upper “half”
P3: upper “half” of upper “half”
Analysis of Parallel Quicksort
Execution time dictated by when last process completes
Algorithm likely to do a poor job balancing number of elements sorted by each process
Cannot expect pivot value to be true median
Can choose a better pivot value
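The exchange pattern and the resulting imbalance can be explored with a sequential simulation. This is only a sketch: processes are modeled as Python lists rather than real message-passing ranks, the number of processes is assumed to be a power of 2, and falling back to the first non-empty list in a group when the lowest-ranked process has no values is an assumption not covered by the slides.

```python
import random


def parallel_quicksort(lists, rng=None):
    """Simulate parallel quicksort on p = len(lists) processes (p a power of 2)."""
    rng = rng or random.Random(0)
    lists = [list(block) for block in lists]
    p = len(lists)
    group = p
    while group > 1:
        half = group // 2
        for start in range(0, p, group):
            members = range(start, start + group)
            # Lowest-ranked process with values picks a random pivot
            # and "broadcasts" it to its group.
            source = next((lists[i] for i in members if lists[i]), None)
            if source is None:
                continue
            pivot = rng.choice(source)
            for i in range(start, start + half):
                j = i + half  # partner in the upper half of the group
                kept_low, kept_high = lists[i] + lists[j], lists[j] + lists[i]
                lists[i] = [v for v in kept_low if v <= pivot]
                lists[j] = [v for v in kept_high if v > pivot]
        group = half
    # Each process finally sorts the values it controls.
    return [sorted(block) for block in lists]


data = [[75, 91, 15, 64, 21, 8, 88, 54], [50, 12, 47, 72, 65, 54, 66, 22],
        [83, 66, 67, 0, 70, 98, 99, 82], [20, 40, 89, 47, 19, 61, 86, 85]]
result = parallel_quicksort(data)
print([len(block) for block in result])  # list sizes show the imbalance
```

Running it with different seeds gives different list sizes, which is the load-balancing problem the analysis describes.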
Hyperquicksort
Start where parallel quicksort ends: each process sorts its sublist
First “sortedness” condition is met
To meet second, processes must still exchange values
Process can use median of its sorted list as the pivot value
This is much more likely to be close to the true median
Hyperquicksort
75, 91, 15, 64, 21, 8, 88, 54
50, 12, 47, 72, 65, 54, 66, 22
83, 66, 67, 0, 70, 98, 99, 82
20, 40, 89, 47, 19, 61, 86, 85
P0
P1
P2
P3
Number of processors is a power of 2
Hyperquicksort
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
19, 20, 40, 47, 61, 85, 86, 89
P0
P1
P2
P3
Each process sorts values it controls
Hyperquicksort
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
19, 20, 40, 47, 61, 85, 86, 89
P0
P1
P2
P3
Process P0 broadcasts its median value
Hyperquicksort
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
19, 20, 40, 47, 61, 85, 86, 89
P0
P1
P2
P3
Processes will exchange “low”, “high” lists
Hyperquicksort
0, 8, 15, 21, 54
12, 19, 20, 22, 40, 47, 47, 50, 54
64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
61, 65, 66, 72, 85, 86, 89
P0
P1
P2
P3
Processes merge kept and received values.
Hyperquicksort
0, 8, 15, 21, 54
12, 19, 20, 22, 40, 47, 47, 50, 54
64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
61, 65, 66, 72, 85, 86, 89
P0
P1
P2
P3
Processes P0 and P2 broadcast median values.
Hyperquicksort
0, 8, 15, 21, 54
12, 19, 20, 22, 40, 47, 47, 50, 54
64, 66, 67, 70, 75, 82, 83, 88, 91, 98, 99
61, 65, 66, 72, 85, 86, 89
P0
P1
P2
P3
Communication pattern for second exchange
Hyperquicksort
0, 8, 12, 15
19, 20, 21, 22, 40, 47, 47, 50, 54, 54
61, 64, 65, 66, 66, 67, 70, 72, 75, 82
83, 85, 86, 88, 89, 91, 98, 99
P0
P1
P2
P3
After exchange-and-merge step
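The hyperquicksort steps shown on the preceding slides can be sketched as a sequential simulation. The assumptions here are chosen to match the example: processes are modeled as Python lists, the pivot is the lower median of the broadcasting process's sorted list, and values equal to the pivot go to the low list. Falling back to the first non-empty list in a group is an additional assumption for the case where the broadcaster holds no values.

```python
from bisect import bisect_right
from heapq import merge


def hyperquicksort(lists):
    """Simulate hyperquicksort on p = len(lists) processes (p a power of 2)."""
    lists = [sorted(block) for block in lists]  # each process sorts first
    p = len(lists)
    group = p
    while group > 1:
        half = group // 2
        for start in range(0, p, group):
            members = range(start, start + group)
            source = next((lists[i] for i in members if lists[i]), None)
            if source is None:
                continue
            pivot = source[(len(source) - 1) // 2]  # median of a sorted list
            for i in range(start, start + half):
                j = i + half  # partner process
                cut_i = bisect_right(lists[i], pivot)
                cut_j = bisect_right(lists[j], pivot)
                # Exchange low/high lists, then merge kept and received values.
                low = list(merge(lists[i][:cut_i], lists[j][:cut_j]))
                high = list(merge(lists[i][cut_i:], lists[j][cut_j:]))
                lists[i], lists[j] = low, high
        group = half
    return lists


data = [[75, 91, 15, 64, 21, 8, 88, 54], [50, 12, 47, 72, 65, 54, 66, 22],
        [83, 66, 67, 0, 70, 98, 99, 82], [20, 40, 89, 47, 19, 61, 86, 85]]
for rank, block in enumerate(hyperquicksort(data)):
    print(f"P{rank}: {block}")
```

Because every list stays sorted throughout, each exchange is followed by a linear-time merge rather than a fresh sort.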
Complexity Analysis Assumptions
Average-case analysis
Lists stay reasonably balanced
Communication time dominated by message transmission time, rather than message latency
Complexity Analysis
Initial quicksort step has time complexity $\Theta((n/p) \log (n/p))$
Total comparisons needed for $\log p$ merge steps: $\Theta((n/p) \log p)$
Total communication time for $\log p$ exchange steps: $\Theta((n/p) \log p)$
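As a worked step, the three terms above can be combined into a single bound. The sketch below assumes balanced lists and $p \le n$; the symbol $T(n,p)$ for total parallel time is introduced here for illustration.

```latex
\begin{align*}
T(n,p) &= \Theta\!\left(\tfrac{n}{p}\log\tfrac{n}{p}\right)
        + \Theta\!\left(\tfrac{n}{p}\log p\right)
        + \Theta\!\left(\tfrac{n}{p}\log p\right) \\
       &= \Theta\!\left(\tfrac{n}{p}(\log n - \log p) + 2\,\tfrac{n}{p}\log p\right) \\
       &= \Theta\!\left(\tfrac{n}{p}(\log n + \log p)\right)
        = \Theta\!\left(\tfrac{n}{p}\log n\right) \qquad (p \le n).
\end{align*}
```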
Isoefficiency Analysis
Sequential time complexity: $\Theta(n \log n)$
Parallel overhead: $\Theta(n \log p)$
Isoefficiency relation: $n \log n \geq C\,n \log p \Rightarrow \log n \geq C \log p \Rightarrow n \geq p^C$
Scalability function: $M(p^C)/p = p^C/p = p^{C-1}$
The value of C determines the scalability. Scalability depends on ratio of communication speed to computation speed.
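To see how strongly the constant C matters, the scalability function $M(p^C)/p = p^{C-1}$ can be tabulated for a few values. This is a small sketch that assumes $M(n) = n$, i.e. memory proportional to the number of keys; the function name is illustrative.

```python
def memory_per_process(p, c):
    """Scalability function M(p^C)/p = p^(C-1), assuming M(n) = n."""
    return p ** (c - 1)


for c in (1, 2, 3):
    row = [memory_per_process(p, c) for p in (4, 16, 64)]
    print(f"C = {c}: {row}")
```

With C = 1 the memory needed per process stays constant as p grows; with C = 2 it grows linearly in p, and with C = 3 quadratically.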
Another Scalability Concern
Our analysis assumes lists remain balanced
As p increases, each processor’s share of list decreases
Hence as p increases, likelihood of lists becoming unbalanced increases
Unbalanced lists lower efficiency
Would be better to get sample values from all processes before choosing median
Parallel Sorting by Regular Sampling (PSRS Algorithm)
Each process sorts its share of elements
Each process selects regular sample of sorted list
One process gathers and sorts samples, chooses pivot values from sorted sample list, and broadcasts these pivot values
Each process partitions its list into p pieces, using pivot values
Each process sends partitions to other processes
Each process merges its partitions
PSRS Algorithm
75, 91, 15, 64, 21, 8, 88, 54
50, 12, 47, 72, 65, 54, 66, 22
83, 66, 67, 0, 70, 98, 99, 82
P0
P1
P2
Number of processors does not have to be a power of 2.
PSRS Algorithm
Each process sorts its list using quicksort.
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
Each process chooses p regular samples.
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
One process collects $p^2$ regular samples.
15, 54, 75, 22, 50, 65, 66, 70, 83
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
One process sorts $p^2$ regular samples.
15, 22, 50, 54, 65, 66, 70, 75, 83
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
One process chooses p-1 pivot values.
15, 22, 50, 54, 65, 66, 70, 75, 83
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
One process broadcasts p-1 pivot values.
15, 22, 50, 54, 65, 66, 70, 75, 83
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
Each process divides list, based on pivot values.
8, 15, 21, 54, 64, 75, 88, 91
12, 22, 47, 50, 54, 65, 66, 72
0, 66, 67, 70, 82, 83, 98, 99
P0
P1
P2
PSRS Algorithm
Each process sends partitions to correct destination process.
P0 receives: 8, 15, 21 | 12, 22, 47, 50 | 0
P1 receives: 54, 64 | 54, 65, 66 | 66
P2 receives: 75, 88, 91 | 72 | 67, 70, 82, 83, 98, 99
PSRS Algorithm
Each process merges p partitions.
0, 8, 12, 15, 21, 22, 47, 50
54, 54, 64, 65, 66, 66
67, 70, 72, 75, 82, 83, 88, 91, 98, 99
P0
P1
P2
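The PSRS steps walked through above can be sketched as a sequential simulation. Several details are assumptions chosen so that the sketch reproduces the example: processes are modeled as Python lists, each list holds at least p values, local samples are taken at indices step//2, step//2 + step, ... with step = len // p, pivots are the sorted samples at indices p-1, 2p-1, ..., and values equal to a pivot go to the lower partition. Other sample and pivot positions are used in other descriptions of the algorithm.

```python
from bisect import bisect_right
from heapq import merge


def psrs(lists):
    """Simulate parallel sorting by regular sampling on p = len(lists) processes."""
    p = len(lists)
    local = [sorted(block) for block in lists]      # step 1: local sorts
    samples = []
    for block in local:                             # step 2: p regular samples each
        step = max(len(block) // p, 1)
        samples.extend(block[step // 2::step][:p])
    samples.sort()                                  # step 3: sort p*p samples
    pivots = [samples[k * p - 1] for k in range(1, p)]  # choose p-1 pivots
    parts = []
    for block in local:                             # step 4: split into p pieces
        cuts = [0] + [bisect_right(block, v) for v in pivots] + [len(block)]
        parts.append([block[cuts[d]:cuts[d + 1]] for d in range(p)])
    # Steps 5 and 6: piece d of every process goes to process d, which merges.
    return [list(merge(*(parts[src][dst] for src in range(p))))
            for dst in range(p)]


data = [[75, 91, 15, 64, 21, 8, 88, 54], [50, 12, 47, 72, 65, 54, 66, 22],
        [83, 66, 67, 0, 70, 98, 99, 82]]
for rank, block in enumerate(psrs(data)):
    print(f"P{rank}: {block}")
```

Unlike the two quicksort variants, nothing here requires p to be a power of 2, and each value is sent to its final process in a single all-to-all exchange.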
Assumptions
Each process ends up merging close to n/p elements
Experimental results show this is a valid assumption
Processor interconnection network supports p simultaneous message transmissions at full speed
4-ary hypertree is an example of such a network
Time Complexity Analysis
Computations
Initial quicksort: $\Theta((n/p) \log (n/p))$
Sorting regular samples: $\Theta(p^2 \log p)$
Merging sorted sublists: $\Theta((n/p) \log p)$
Overall: $\Theta((n/p)(\log n + \log p) + p^2 \log p)$
Communications
Gather samples, broadcast pivots: $\Theta(\log p)$
All-to-all exchange: $\Theta(n/p)$
Overall: $\Theta(n/p)$
Isoefficiency Analysis
Sequential time complexity: $\Theta(n \log n)$
Parallel overhead: $\Theta(n \log p)$
Isoefficiency relation: $n \log n \geq C\,n \log p \Rightarrow \log n \geq C \log p$
Scalability function same as for hyperquicksort
Scalability depends on ratio of communication to computation speeds
Summary
Three parallel algorithms based on quicksort
Keeping list sizes balanced:
Parallel quicksort: poor
Hyperquicksort: better
PSRS algorithm: excellent
Average number of times each key moved:
Parallel quicksort and hyperquicksort: $(\log p)/2$
PSRS algorithm: $(p-1)/p$
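The two key-movement figures can be compared numerically. This sketch assumes the logarithm is base 2, which is consistent with the log p exchange steps of the quicksort-based algorithms; the function names are illustrative.

```python
import math


def avg_moves_quicksort(p):
    """Parallel quicksort and hyperquicksort: (log p) / 2, log base 2 assumed."""
    return math.log2(p) / 2


def avg_moves_psrs(p):
    """PSRS: (p - 1) / p, always less than one move per key."""
    return (p - 1) / p


for p in (4, 16, 64):
    print(p, avg_moves_quicksort(p), avg_moves_psrs(p))
```

The first figure grows with the number of processes, while the PSRS figure stays below 1 for every p.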