1 SSE3054: Multicore Systems | Spring 2014 | Jinkyu Jeong ([email protected])
Distributed Memory Programming With MPI (4)
2014 Spring
Jinkyu Jeong ([email protected])
Roadmap
Hello World program in MPI.
Basic APIs of MPI.
Example program: the Trapezoidal Rule in MPI.
Collective communication.
MPI derived datatypes.
Performance evaluation of MPI programs.
Parallel sorting.
Safety in MPI programs.
A PARALLEL SORTING ALGORITHM
Sorting
Parallelizing sorting
• n keys and p = comm_sz processes.
• n/p keys assigned to each process.
• No restrictions on which keys are assigned to which processes.
• When the algorithm terminates:
– The keys assigned to each process should be sorted in (say) increasing order.
– If 0 ≤ q < r < p, then each key assigned to process q should be less than or equal to every key assigned to process r.
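The two termination conditions can be checked mechanically. The sketch below is only an illustration: it assumes the blocks of all p processes are laid out back to back in one array (a layout chosen here for testing, not something the slides prescribe), and the function name blocks_are_sorted is hypothetical.

```c
#include <stdbool.h>

/* Assumed layout (for illustration only): process q owns
 * keys[q * local_n] .. keys[(q + 1) * local_n - 1], with local_n = n / p. */
bool blocks_are_sorted(const int keys[], int p, int local_n) {
    for (int q = 0; q < p; q++) {
        const int *block = keys + q * local_n;

        /* Condition 1: the keys of process q are in increasing order. */
        for (int i = 1; i < local_n; i++)
            if (block[i - 1] > block[i])
                return false;

        /* Condition 2: the largest key of q is <= the smallest key of q + 1.
         * Because every block is sorted, checking adjacent blocks is enough
         * to cover all pairs q < r. block[local_n] is the first key of the
         * next block. */
        if (q + 1 < p && local_n > 0 && block[local_n - 1] > block[local_n])
            return false;
    }
    return true;
}
```

With this layout, the two conditions together say that the concatenation of all blocks in rank order is sorted.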
Serial bubble sort
Bubble sort cannot be efficiently parallelized
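• Inner-loop parallelization
– P0 compares and swaps a[0] and a[1]
– P1 compares and swaps a[1] and a[2]
– Both touch a[1], so the two compare-swaps cannot safely run at the same time
• Outer-loop parallelization
– Each pass of the outer loop changes the data stored in the array, so the next pass depends on its result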
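For reference, a minimal serial bubble sort in C might look like the sketch below. The function name bubble_sort and the variable names are chosen here for illustration; the code on the original slide is not available in this transcript.

```c
/* Serial bubble sort. After each pass of the outer loop the largest
 * remaining key has moved to position list_length - 1. */
void bubble_sort(int a[], int n) {
    for (int list_length = n; list_length >= 2; list_length--) {
        /* Consecutive iterations of this loop share an element:
         * iteration i writes a[i + 1], iteration i + 1 reads it. */
        for (int i = 0; i < list_length - 1; i++) {
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
}
```

The shared element between consecutive inner-loop iterations is the dependency that makes the inner loop hard to parallelize.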
Odd-even transposition sort
Even phases, compare-swaps: (a[0], a[1]), (a[2], a[3]), (a[4], a[5]), …
Odd phases, compare-swaps: (a[1], a[2]), (a[3], a[4]), (a[5], a[6]), …
This odd-even transposition sort can be parallelized
• P0 compares and swaps a[0] and a[1]
• P1 compares and swaps a[2] and a[3]
Example: odd-even transposition sort
Start: 5, 9, 4, 3
Even phase: compare-swap (5,9) and (4,3) getting the list 5, 9, 3, 4
Odd phase: compare-swap (9,3) getting the list 5, 3, 9, 4
Even phase: compare-swap (5,3) and (9,4) getting the list 3, 5, 4, 9
Odd phase: compare-swap (5,4) getting the list 3, 4, 5, 9
Serial odd-even transposition sort
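A serial version can be sketched in C as follows. This is an illustrative sketch, not the slide's own listing: the name odd_even_sort is chosen here, and the code relies on the standard result that n phases are enough to sort n keys.

```c
/* Serial odd-even transposition sort on a[0..n-1]. */
void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {
        /* Even phases start at pair (a[0], a[1]);
         * odd phases start at pair (a[1], a[2]). */
        int first = (phase % 2 == 0) ? 0 : 1;
        for (int i = first; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
}
```

Within one phase the compared pairs do not overlap, so the iterations of the inner loop are independent of each other.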
Parallelizing Odd-even transposition sort
In each phase, the processes compare and swap their pairs of numbers in parallel
Even phases, compare-swaps: (a[0], a[1]) by P0, (a[2], a[3]) by P1, …
Odd phases, compare-swaps: (a[1], a[2]) by P0, (a[3], a[4]) by P1, …
Parallelizing odd-even transposition sort (cont'd)
Start: 5, 9, 4, 3
Even phase: compare-swap (5,9) and (4,3) getting the list 5, 9, 3, 4
Odd phase: compare-swap (9,3) getting the list 5, 3, 9, 4
Even phase: compare-swap (5,3) and (9,4) getting the list 3, 5, 4, 9
Odd phase: compare-swap (5,4) getting the list 3, 4, 5, 9
In the even phases, P0 and P1 each perform one compare-swap.
Communications among tasks in odd-even sort
In each phase, a process has to know whether the two values it is about to compare were changed in the previous phase.
So a process has to consult its neighboring process before comparing the two values.
Parallel odd-even transposition sort
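Assumption
• Each process first sorts its own n/p keys locally
• Then, in each compare-swap phase, the keys held by two partner processes are redistributed according to the sort order: the lower-ranked process keeps the smaller half and the higher-ranked process keeps the larger half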
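Which process is the partner in a given phase follows from the even/odd pairing. The helper below is a sketch under assumptions: the name compute_partner and the sentinel NO_PARTNER are introduced here for illustration, and in a real MPI program the sentinel would typically be MPI_PROC_NULL so that sends and receives to it become no-ops.

```c
/* Stand-in for MPI_PROC_NULL so this sketch compiles without MPI. */
#define NO_PARTNER (-1)

/* Returns the rank this process exchanges keys with in the given phase,
 * or NO_PARTNER if it sits idle in that phase. */
int compute_partner(int phase, int my_rank, int comm_sz) {
    int partner;
    if (phase % 2 == 0) {
        /* Even phase: pairs are (0,1), (2,3), ... */
        partner = (my_rank % 2 == 0) ? my_rank + 1 : my_rank - 1;
    } else {
        /* Odd phase: pairs are (1,2), (3,4), ... */
        partner = (my_rank % 2 == 0) ? my_rank - 1 : my_rank + 1;
    }
    if (partner < 0 || partner >= comm_sz)
        partner = NO_PARTNER;
    return partner;
}
```

With four processes, ranks 0 and 3 have no partner in the odd phases and simply wait for the next even phase.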
Safety in MPI programs
The MPI standard allows MPI_Send to behave in two different ways:
• it can simply copy the message into an MPI-managed buffer and return,
• or it can block until the matching call to MPI_Recv starts.
Many implementations of MPI set a threshold at which the system switches from buffering to blocking.
Relatively small messages will be buffered by MPI_Send.
Larger messages will cause it to block.
Safety in MPI programs
If every process calls MPI_Send first and each of those calls blocks, no process will be able to start executing a call to MPI_Recv, and the program will hang or deadlock.
Each process is blocked waiting for an event that will never happen.
(see pseudo-code)
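The pattern can be reproduced with a small ring exchange in which every process sends before it receives. The program below is an illustrative sketch, not the slide's pseudo-code: the ring topology, the buffer names, and the command-line message length (default 16 ints) are all assumptions made here.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Message length in ints; the default of 16 is an arbitrary choice. */
    int count = (argc > 1) ? atoi(argv[1]) : 16;
    if (count < 1) count = 16;
    int *send_buf = calloc(count, sizeof(int));
    int *recv_buf = calloc(count, sizeof(int));

    int dest = (my_rank + 1) % comm_sz;
    int source = (my_rank + comm_sz - 1) % comm_sz;

    /* UNSAFE: every process sends first. This only completes if MPI_Send
     * buffers the message; if it blocks, no process reaches MPI_Recv. */
    MPI_Send(send_buf, count, MPI_INT, dest, 0, MPI_COMM_WORLD);
    MPI_Recv(recv_buf, count, MPI_INT, source, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    printf("Process %d received %d ints from process %d\n",
           my_rank, count, source);

    free(send_buf);
    free(recv_buf);
    MPI_Finalize();
    return 0;
}
```

Whether this hangs depends on the implementation and the message size: a small count may run to completion, while a large enough count may never finish.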
Safety in MPI programs
A program that relies on MPI-provided buffering is said to be unsafe.
Such a program may run without problems for various sets of input, but it may hang or crash with other sets.
MPI_Ssend
An alternative to MPI_Send defined by the MPI standard.
The extra “s” stands for synchronous and MPI_Ssend is guaranteed to block until the matching receive starts.
Restructuring communication
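One common way to restructure a ring exchange is to let even-ranked processes send first while odd-ranked processes receive first. The sketch below illustrates that idea under assumptions made here: a ring of at least two processes, a single int per message, and variable names chosen for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    int dest = (my_rank + 1) % comm_sz;
    int source = (my_rank + comm_sz - 1) % comm_sz;
    int send_val = my_rank, recv_val = -1;

    /* Assumes comm_sz >= 2: a single process sending to itself would
     * still depend on buffering. */
    if (my_rank % 2 == 0) {
        /* Even ranks: send first, then receive. */
        MPI_Send(&send_val, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        MPI_Recv(&recv_val, 1, MPI_INT, source, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        /* Odd ranks: receive first, then send. */
        MPI_Recv(&recv_val, 1, MPI_INT, source, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&send_val, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    }

    printf("Process %d received %d\n", my_rank, recv_val);
    MPI_Finalize();
    return 0;
}
```

Because at least one process is already in MPI_Recv while its neighbor sends, the exchange no longer relies on MPI_Send buffering the message.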
MPI_Sendrecv
An alternative to scheduling the communications ourselves.
Carries out a blocking send and a receive in a single call.
The dest and the source can be the same or different.
Especially useful because MPI schedules the communications so that the program won’t hang or crash.
Before the call:
Rank 0: send_buf = A
Rank 1: send_buf = B
After MPI_Sendrecv():
Rank 0: send_buf = A, recv_buf = B
Rank 1: send_buf = B, recv_buf = A
Restructuring communication using MPI_Sendrecv
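The same ring exchange can be written with a single MPI_Sendrecv call per process. This is a minimal sketch; the ring topology, tag 0, and the variable names are assumptions made here for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    int dest = (my_rank + 1) % comm_sz;
    int source = (my_rank + comm_sz - 1) % comm_sz;
    int send_val = my_rank, recv_val = -1;

    /* One call performs both the send to dest and the receive from
     * source; the send and receive buffers must not overlap. */
    MPI_Sendrecv(&send_val, 1, MPI_INT, dest, 0,
                 &recv_val, 1, MPI_INT, source, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("Process %d received %d\n", my_rank, recv_val);
    MPI_Finalize();
    return 0;
}
```

With a typical installation this would be compiled with the mpicc wrapper and launched with mpiexec, for example mpiexec -n 4 ./a.out. No rank-dependent ordering of sends and receives is needed.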
Merging Numbers during Compare & Swap in a Process
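After two partners have exchanged their sorted blocks, each one keeps half of the combined keys. The sketch below shows one way to do this with a partial merge; the function names merge_low and merge_high, the scratch array temp_keys, and the assumption that both blocks hold exactly local_n sorted keys are choices made here for illustration.

```c
#include <string.h>

/* Keep the local_n smallest keys of my_keys and recv_keys in my_keys.
 * Both inputs must be sorted in increasing order and hold local_n keys. */
void merge_low(int my_keys[], const int recv_keys[], int temp_keys[],
               int local_n) {
    int m_i = 0, r_i = 0, t_i = 0;
    /* m_i + r_i == t_i < local_n, so neither index runs off its array. */
    while (t_i < local_n) {
        if (my_keys[m_i] <= recv_keys[r_i])
            temp_keys[t_i++] = my_keys[m_i++];
        else
            temp_keys[t_i++] = recv_keys[r_i++];
    }
    memcpy(my_keys, temp_keys, local_n * sizeof(int));
}

/* Keep the local_n largest keys, merging from the high end downward. */
void merge_high(int my_keys[], const int recv_keys[], int temp_keys[],
                int local_n) {
    int m_i = local_n - 1, r_i = local_n - 1, t_i = local_n - 1;
    while (t_i >= 0) {
        if (my_keys[m_i] >= recv_keys[r_i])
            temp_keys[t_i--] = my_keys[m_i--];
        else
            temp_keys[t_i--] = recv_keys[r_i--];
    }
    memcpy(my_keys, temp_keys, local_n * sizeof(int));
}
```

In a phase, the lower-ranked partner would call merge_low and the higher-ranked partner merge_high, so each does only local_n steps of merging rather than sorting the full 2 × local_n keys.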
Run-times of parallel odd-even sort
(times are in milliseconds)
Concluding Remarks
MPI or the Message-Passing Interface
• An interface for parallel programming on distributed-memory systems
• Supports C, C++, and Fortran
• Many MPI implementations
– e.g., MPICH2
SPMD program
Message passing
• Communicator
• Point-to-point communication
• Collective communication
• Safe use of communication is important
– e.g., MPI_Sendrecv()