Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort...
Transcript of Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort...
![Page 1: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/1.jpg)
Introduction to Parallel Computing
George KarypisSorting
![Page 2: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/2.jpg)
OutlineBackgroundSorting NetworksQuicksortBucket-Sort & Sample-Sort
![Page 3: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/3.jpg)
BackgroundInput Specification
Each processor has n/p elementsA ordering of the processors
Output SpecificationEach processor will get n/p consecutive elements of the final sorted array.The “chunk” is determined by the processor ordering.
VariationsUnequal number of elements on output.
In general, this is not a good idea and it may require a shift to obtain the equal size distribution.
![Page 4: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/4.jpg)
Basic Operation: Compare-Split Operation
Single element per processor
Multiple elements per processor
![Page 5: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/5.jpg)
Sorting NetworksSorting is one of the fundamental problems in Computer ScienceFor a long time researchers have focused on the problem of “how fast can we sort n elements”?
Serialnlog(n) lower-bound for comparison-based sorting
ParallelO(1), O(log(n)), O(???)
Sorting networksCustom-made hardware for sorting!
Hardware & algorithmMostly of theoretical interest but fun to study!
![Page 6: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/6.jpg)
Elements of Sorting NetworksKey Idea:
Perform many comparisons in parallel.
Key Elements:Comparators:
Consist of two-input, two-output wiresTake two elements on the input wires and outputs them in sorted order in the output wires.
Network architecture:The arrangement of the comparators into interconnected comparator columns
similar to multi-stage networks
Many sorting networks have been developed.
Bitonic sorting networkΘ(log2(n)) columns of comparators.
![Page 7: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/7.jpg)
Bitonic Sequence
Bitonic sequences are graphically represented by lines as follows:
12
7
0
46
![Page 8: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/8.jpg)
Why Bitonic Sequences?A bitonic sequence can be “easily” sorted in increasing/decreasing order.
s s1 s2
• Every element of s1 will be less than or equal to every element of s2• Both s1 and s2 are bitonic sequences.• So how can a bitonic sequence be sorted?
BitonicSplit
![Page 9: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/9.jpg)
An example
![Page 10: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/10.jpg)
Bitonic Merging Network
A comparator network that takes as input a bitonicsequence and performs a sequence of bitonic splits to sort it.
+BM[n]A bitonic merging network for sorting in increasing order an n-element bitonicsequence.
-BM[n]Similar sort in decreasing order.
![Page 11: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/11.jpg)
Are we done?Given a set of elements, how do we re-arrange them into a bitonic sequence?Key Idea:
Use successively larger bitonic networks to transform the set into a bitonic sequence.
![Page 12: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/12.jpg)
An example
![Page 13: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/13.jpg)
ComplexityHow many columns of comparators are required to sort n=2l elements?
i.e., depth d(n) of the network?
![Page 14: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/14.jpg)
Bitonic Sort on a HypercubeOne-element-per-processor case
How do we map the algorithm onto a hypercube?What is the comparator?How do the wires get mapped?
What can you say about thepairs of wires that are inputsto the various comparators?
![Page 15: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/15.jpg)
Illustration
![Page 16: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/16.jpg)
Communication Pattern
![Page 17: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/17.jpg)
Algorithm
Complexity?
![Page 18: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/18.jpg)
Bitonic Sort on a MeshOne-element-per-processor case
How do the wires get mapped?
Which one is better?Why?
![Page 19: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/19.jpg)
Row-Major Shuffled Mapping
Complexity?
Can we do better?What is the lowest bound of sorting on a mesh?
communication performed by each process
![Page 20: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/20.jpg)
More than one element per processor
Hypercube
Mesh
![Page 21: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/21.jpg)
Bitonic Sort Summary
![Page 22: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/22.jpg)
Quicksort
![Page 23: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/23.jpg)
Parallel FormulationHow about recursive decomposition?
Is it a good idea?We need to do the partitioning of the array around a pivot element in parallel.
What is the lower bound of parallel quicksort?
What will it take to achieve this lower bound?
![Page 24: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/24.jpg)
Optimal for CRCW PRAMOne element per processorArbitrary resolution of the concurrent writes.Views the sorting as a two-step process:
(i) Constructing a binary tree of pivot elements(ii) Obtaining the sorted sequence by performing an inordertraversal of this binary tree.
![Page 25: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/25.jpg)
Building the Binary Tree
Complexity?
![Page 26: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/26.jpg)
Practical QuicksortShared-memory
Data resides on a shared array.During a partitioning each processor is responsible for a certain portion.
Array Partitioning:Select & Broadcast pivot.Local re-arrangement.
Is this required?Global re-arrangement.
![Page 27: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/27.jpg)
Efficient Global Rearrangement
![Page 28: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/28.jpg)
Practical QuicksortComplexity
Complexity for message-passing is similar assuming that the all-to-allpersonalized communication is not cross-bisection bandwidth limited.
![Page 29: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/29.jpg)
A word on Pivot SelectionSelecting pivots that lead to balanced partitions is importance
height of the treeeffective utilization of processors
![Page 30: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/30.jpg)
Sample SortGeneralization of bucket sort with data-driven sampling
n/p elements per-processor.Each processor sorts is local elements. Each processor selects p-1 equally spaced elements from its own list.The combined p(p-1) set of elements are sorted and p-1 equally spaced elements are selected from that list.Each processor splits its own list according to these splitters into p buckets.Each processor sends its ith bucket to the ith processor.Each processor merges the elements that it receives.Done.
![Page 31: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/31.jpg)
Sample Sort Illustration
![Page 32: Introduction to Parallel Computingkarypis/parbook/Lectures/GK... · 2005-01-08 · Sample Sort Generalization of bucket sort with data-driven sampling n/p elements per-processor.](https://reader030.fdocuments.us/reader030/viewer/2022040806/5e46ef4cb8f95e1ca149731c/html5/thumbnails/32.jpg)
Sample Sort Complexity
Assumes a serial sort