Introduction to Parallel Computing
George Karypis
Principles of Parallel Algorithm Design
Outline
- Overview of some Serial Algorithms
- Parallel Algorithm vs Parallel Formulation
- Elements of a Parallel Algorithm/Formulation
- Common Decomposition Methods (concurrency extractor!)
- Common Mapping Methods (parallel overhead reducer!)
Some Serial Algorithms: Working Examples
- Dense Matrix-Matrix & Matrix-Vector Multiplication
- Sparse Matrix-Vector Multiplication
- Gaussian Elimination
- Floyd's All-Pairs Shortest Path
- Quicksort
- Minimum/Maximum Finding
- Heuristic Search: the 15-puzzle problem
Dense Matrix-Vector Multiplication
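The slide's figure is not reproduced here; as a concrete reference point, a minimal serial sketch of the computation (names and row-major layout are illustrative choices, not from the slides):

```c
/* Serial dense matrix-vector multiply: y = A*x.
 * A is n-by-n, stored row-major. Each y[i] is an independent
 * dot product, which is what later decompositions will exploit. */
void matvec(int n, const double *A, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```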
Dense Matrix-Matrix Multiplication
Sparse Matrix-Vector Multiplication
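A common serial formulation uses the compressed sparse row (CSR) format; the sketch below assumes one standard CSR convention (rowptr/colidx/vals names are illustrative):

```c
/* Serial sparse matrix-vector multiply y = A*x, with A in CSR format:
 * rowptr[i]..rowptr[i+1]-1 index the nonzeros of row i. */
void spmv_csr(int n, const int *rowptr, const int *colidx,
              const double *vals, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += vals[k] * x[colidx[k]];
        /* rows have varying nonzero counts: tasks derived from rows
           will be non-uniform, which matters for mapping later on */
        y[i] = sum;
    }
}
```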
Gaussian Elimination
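For reference, a minimal sketch of the classic forward-elimination loop (simplified: no partial pivoting, which a robust version would need):

```c
/* Forward elimination on an n-by-n system; A is row-major, b the RHS.
 * Assumes nonzero pivots for brevity. */
void gaussian_eliminate(int n, double *A, double *b)
{
    for (int k = 0; k < n; k++) {          /* current pivot row */
        for (int i = k + 1; i < n; i++) {  /* only rows below k remain
                                              "active": the working set
                                              shrinks as k grows */
            double m = A[i * n + k] / A[k * n + k];
            for (int j = k; j < n; j++)
                A[i * n + j] -= m * A[k * n + j];
            b[i] -= m * b[k];
        }
    }
}
```

The shrinking active region is exactly what makes the block-cyclic distribution discussed later a good fit for this algorithm.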
Floyd’s All-Pairs Shortest Path
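The serial algorithm is the standard triple loop over intermediate vertices; a minimal sketch (row-major distance matrix assumed):

```c
/* Floyd's all-pairs shortest path on an n-by-n distance matrix d:
 * d[i*n+j] starts as the edge weight (or a large value if no edge)
 * and ends as the shortest-path distance from i to j. */
void floyd_apsp(int n, double *d)
{
    for (int k = 0; k < n; k++)        /* allow vertex k as intermediate */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (d[i * n + k] + d[k * n + j] < d[i * n + j])
                    d[i * n + j] = d[i * n + k] + d[k * n + j];
}
```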
Quicksort
Minimum Finding
15-Puzzle Problem
Parallel Algorithm vs Parallel Formulation
- Parallel Formulation: refers to a parallelization of a serial algorithm.
- Parallel Algorithm: may represent an entirely different algorithm than the one used serially.
- We primarily focus on "parallel formulations": our goal is to discuss how to develop them. Of course, there will always be examples of "parallel algorithms" that were not derived from serial algorithms.
Elements of a Parallel Algorithm/Formulation
- Pieces of work that can be done concurrently (tasks)
- Mapping of the tasks onto multiple processors (processes vs processors)
- Distribution of input/output & intermediate data across the different processors
- Management of access to shared data (either input or intermediate)
- Synchronization of the processors at various points of the parallel execution

Holy Grail: maximize concurrency and reduce the overheads due to parallelization! Maximize the potential speedup!
Finding Concurrent Pieces of Work
- Decomposition: the process of dividing the computation into smaller pieces of work, i.e., tasks.
- Tasks are programmer-defined and are considered to be indivisible.
Example: Dense Matrix-Vector Multiplication
Tasks can be of different sizes, i.e., the granularity of a task (see the sketch below).
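A hedged OpenMP sketch of the row-wise decomposition (reusing the matvec layout above): one row per task is the finest-grained choice; grouping rows into chunks coarsens the granularity.

```c
#include <omp.h>

/* Row-wise decomposition of y = A*x: computing one row of y is one
 * task. `chunk` is the granularity knob: chunk=1 gives n fine-grained
 * tasks; larger chunks give fewer, coarser tasks. */
void matvec_rowtasks(int n, const double *A, const double *x,
                     double *y, int chunk)
{
    #pragma omp parallel for schedule(static, chunk)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```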
Example: Query Processing
Example: Query Processing (cont): finding concurrent tasks…
Task-Dependency Graph
- In most cases, there are dependencies between the different tasks: certain task(s) can only start once some other task(s) have finished (e.g., producer-consumer relationships).
- These dependencies are represented using a DAG called the task-dependency graph.
Task-Dependency Graph (cont)
Key concepts derived from the task-dependency graph:
- Degree of concurrency: the number of tasks that can be executed concurrently; we usually care about the average degree of concurrency.
- Critical path: the longest vertex-weighted path in the graph, where the weights represent task sizes.
- Task granularity affects both of the above characteristics. (A small worked example follows.)
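Not on the slide, but a small worked example with assumed numbers makes the two concepts concrete: if the total work across all tasks is W and the critical-path length (the sum of task weights along the longest path) is L, then

```latex
\[
\text{average degree of concurrency} \;=\; \frac{W}{L},
\qquad \text{e.g., } W = 63,\; L = 27
\;\Rightarrow\; \frac{63}{27} \approx 2.33,
\]
```

so on average only about 2.33 tasks can run at once, no matter how many processors are available.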
Task-Interaction Graph
- Captures the pattern of interaction between tasks.
- This graph usually contains the task-dependency graph as a subgraph: there may be interactions between tasks even if there are no dependencies between them; these interactions usually occur due to accesses to shared data.
Task Dependency/Interaction Graphs
- These graphs are important for developing effective mappings of the tasks onto the different processors.
- Goal: maximize concurrency and minimize overheads.
Common Decomposition Methods
Task decomposition methods:
- Data Decomposition
- Recursive Decomposition
- Exploratory Decomposition
- Speculative Decomposition
- Hybrid Decomposition
Recursive Decomposition
- Suitable for problems that can be solved using the divide-and-conquer paradigm.
- Each of the subproblems generated by the divide step becomes a task.
Example: Quicksort
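A hedged sketch of the recursive decomposition: each recursive call on a partition becomes a task (OpenMP tasks are one way to express this; the Lomuto partition below is an illustrative choice).

```c
#include <omp.h>

static void swap(double *a, double *b) { double t = *a; *a = *b; *b = t; }

/* Quicksort where each divide step spawns the two subproblems as tasks. */
static void quicksort_tasks(double *a, int lo, int hi)
{
    if (lo >= hi) return;
    double pivot = a[hi];                 /* simple last-element pivot */
    int i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);                  /* pivot now in final position */

    /* The two partitions are independent: sibling tasks in the
       task-dependency graph. */
    #pragma omp task shared(a)
    quicksort_tasks(a, lo, i - 1);
    #pragma omp task shared(a)
    quicksort_tasks(a, i + 1, hi);
    #pragma omp taskwait
}

void quicksort(double *a, int n)
{
    #pragma omp parallel
    #pragma omp single
    quicksort_tasks(a, 0, n - 1);
}
```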
Example: Finding the Minimum
Note that we can obtain divide-and-conquer algorithms for problems that are traditionally solved using non-divide-and-conquer approaches.
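A minimal sketch of that idea: the usual serial scan becomes a recursion whose two halves are independent tasks.

```c
/* Divide-and-conquer minimum: the two halves are independent tasks,
 * combined by a single comparison, giving a reduction tree of depth
 * log2(n) instead of a length-n serial chain. */
double recursive_min(const double *a, int lo, int hi)
{
    if (lo == hi) return a[lo];
    int mid = (lo + hi) / 2;
    double left  = recursive_min(a, lo, mid);      /* task 1 */
    double right = recursive_min(a, mid + 1, hi);  /* task 2 */
    return left < right ? left : right;            /* combine step */
}
```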
Recursive Decomposition (cont)
- How good are the decompositions that it produces? (average concurrency? critical path?)
- How do the quicksort and min-finding decompositions measure up?
Data Decomposition
- Used to derive concurrency for problems that operate on large amounts of data.
- The idea is to derive the tasks by focusing on the multiplicity of data.
- Data decomposition is often performed in two steps:
  - Step 1: Partition the data.
  - Step 2: Induce a computational partitioning from the data partitioning.
- Which data should we partition? Input/output/intermediate? Well… all of the above, leading to different data decomposition methods.
- How do we induce a computational partitioning? Via the owner-computes rule: the task that owns a piece of data performs the computations associated with it.
Example: Matrix-Matrix Multiplication
Partitioning the output data
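A hedged sketch of the induced computational partitioning: split C into blocks, and by the owner-computes rule the task that owns a block computes it (block size bs is illustrative; n % bs == 0 assumed for brevity).

```c
/* Output-data decomposition of C = A*B (all n-by-n, row-major):
 * C is split into bs-by-bs blocks; computing one block is one task.
 * Under the owner-computes rule, the owner of block (bi,bj) runs this. */
void compute_C_block(int n, int bs, int bi, int bj,
                     const double *A, const double *B, double *C)
{
    for (int i = bi * bs; i < (bi + 1) * bs; i++)
        for (int j = bj * bs; j < (bj + 1) * bs; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)    /* reads a row band of A and
                                              a column band of B */
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```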
Example: Matrix-Matrix Multiplication
Partitioning the intermediate data
Data Decomposition (cont)
- It is the most widely used decomposition technique: after all, parallel processing is often applied to problems that have a lot of data, and splitting the work based on this data is the natural way to extract a high degree of concurrency.
- It is used by itself or in conjunction with other decomposition methods (hybrid decomposition).
Exploratory Decomposition
Used to decompose computations that correspond to a search of a space of solutions.
Example: 15-puzzle Problem
Exploratory Decomposition (cont)
- It is not as general purpose.
- It can result in speedup anomalies: engineered slow-down or superlinear speedup.
Speculative Decomposition
- Used to extract concurrency in problems in which the next step is one of many possible actions that can only be determined when the current task finishes.
- This decomposition assumes a certain outcome of the currently executing task and executes some of the next steps, just like speculative execution at the microprocessor level.
Example: Discrete Event Simulation
Speculative Execution
If predictions are wrong…
- work is wasted
- work may need to be undone
  - state-restoring overhead (memory/computations)
However, it may be the only way to extract concurrency!
Mapping the Tasks
- Why do we care about task mapping? Can we just randomly assign tasks to the available processors?
- Proper mapping is critical, as it needs to minimize the parallel processing overheads.
- If Tp is the parallel runtime on p processors and Ts is the serial runtime, then the total overhead is To = p*Tp - Ts: the work done by the parallel system beyond that required by the serial system. (A small numeric example follows this list.)
- Overhead sources:
  - load imbalance
  - inter-process communication (coordination/synchronization/data-sharing)
- Remember the holy grail… these two sources can be at odds with each other.
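A small numeric example with assumed values: for Ts = 100, p = 4, and Tp = 30,

```latex
\[
T_o = p\,T_p - T_s = 4 \cdot 30 - 100 = 20,
\qquad S = \frac{T_s}{T_p} = \frac{100}{30} \approx 3.33,
\]
```

i.e., 20 time units of pure overhead keep the speedup below the ideal value of 4.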
Why Can Mapping Be Complicated?
Proper mapping needs to take into account the task-dependency and task-interaction graphs.
- Task-dependency graph:
  - Are the tasks available a priori? (static vs dynamic task generation)
  - How about their computational requirements? Are they uniform or non-uniform? Do we know them a priori?
  - How much data is associated with each task?
- Task-interaction graph:
  - How about the interaction patterns between the tasks? Are they static or dynamic? Do we know them a priori? Are they data-instance dependent? Are they regular or irregular? Are they read-only or read-write?
- Depending on the above characteristics, mapping techniques of different complexity and cost are required.
Example: Simple & Complex Task Interaction
Mapping Techniques for Load Balancing
Be aware… the assignment of tasks whose aggregate computational requirements are the same does not automatically ensure load balance. In the slide's figure, each processor is assigned three tasks, but mapping (a) is better than mapping (b)!
Load Balancing Techniques
- Static
  - The tasks are distributed among the processors prior to the execution.
  - Applicable for tasks that are generated statically and have known and/or uniform computational requirements.
- Dynamic
  - The tasks are distributed among the processors during the execution of the algorithm, i.e., tasks & data are migrated.
  - Applicable for tasks that are generated dynamically or have unknown computational requirements.
Static Mapping: Array Distribution
- Suitable for algorithms that use data decomposition and whose underlying input/output/intermediate data are in the form of arrays.
- Common schemes (in 1D/2D/3D): block distribution, cyclic distribution, block-cyclic distribution, and randomized block distributions. (A sketch of the 1D index-to-processor maps follows.)
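A minimal sketch of the 1D index-to-processor maps, with n array elements, p processors, and block size b for block-cyclic (n % p == 0 assumed for brevity; function names are illustrative):

```c
/* 1D array distributions: which processor owns element i? */
int owner_block(int i, int n, int p)        { return i / (n / p); }
int owner_cyclic(int i, int n, int p)       { (void)n; return i % p; }
int owner_block_cyclic(int i, int b, int p) { return (i / b) % p; }
```

Block keeps contiguous ranges together (good locality); cyclic and block-cyclic interleave ownership, which spreads an unevenly distributed workload across processors.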
Examples: Block Distributions
Example: Block-Cyclic Distributions
Gaussian elimination: the active portion of the array shrinks as the computations progress, so a plain block distribution becomes load-imbalanced; a block-cyclic distribution keeps all processors working on the active region.
Random Block Distributions
Sometimes the computations are performed only on certain portions of an array (e.g., sparse matrix-matrix multiplication).
Random Block Distributions (cont)
Better load balance can be achieved via a random block distribution.
Graph Partitioning
- A mapping can be achieved by directly partitioning the task-interaction graph.
- E.g., finite element mesh-based computations.
The mapping is obtained by directly partitioning this (mesh-induced) graph.
Example: Sparse Matrix-Vector Multiplication
Another instance of graph partitioning.
Dynamic Load Balancing Schemes
There is a huge body of research on these.
- Centralized schemes
  - A certain processor is responsible for giving out work (master-slave paradigm).
  - Issue: task granularity. (A shared-memory sketch follows this list.)
- Distributed schemes
  - Work can be transferred between any pair of processors.
  - Issues: How do the processors get paired? Who initiates the work transfer (push vs pull)? How much work is transferred?
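A hedged sketch of a centralized scheme in shared memory: a shared counter plays the master's role as a work queue, and each processor claims a chunk of task indices at a time. The chunk size is exactly the task-granularity knob mentioned above; run_task() is a hypothetical placeholder for the per-task computation.

```c
#include <omp.h>

extern void run_task(int id);   /* hypothetical per-task computation */

/* Centralized dynamic load balancing: `next` is the shared work queue.
 * Larger chunks mean less contention on the counter but coarser
 * granularity, so slower threads may end up holding stragglers. */
void dynamic_schedule(int ntasks, int chunk)
{
    int next = 0;
    #pragma omp parallel
    {
        for (;;) {
            int start;
            #pragma omp atomic capture
            { start = next; next += chunk; }   /* claim a chunk */
            if (start >= ntasks) break;
            int end = start + chunk < ntasks ? start + chunk : ntasks;
            for (int t = start; t < end; t++)
                run_task(t);
        }
    }
}
```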
Mapping to Minimize Interaction Overheads
- Maximize data locality
- Minimize volume of data exchange
- Minimize frequency of interactions
- Minimize contention and hot spots
- Overlap computation with interactions
- Selective data and computation replication

Achieving the above is usually an interplay of decomposition and mapping, and is usually done iteratively.