Parallel Programming with MPI
Basic and Advanced Features of the Message Passing Interface
Introduction
• Aim of this session
– An overview of the most important MPI feature sets
• Focusing on 20% of the 300+ MPI functions
• Hopefully covering 80% of your use cases
– Knowing where to start and where to find documentation
Outline
1. Overview
2. MPI Basics
3. P2P Communication
4. RMA
5. Datatypes
6. Collective Communication
7. Communicators
8. Sparse Collectives
9. Shared Memory
10. Parallel I/O
11. Conclusion
Overview
• MPI, the Message Passing Interface,
– is a standardized, comprehensive library interface specification
– targeted primarily at High Performance Computing (HPC).
• In addition to the classical message passing model, various extensions have been added to the MPI standard, e.g.
– Dynamic Process Creation
– Remote Memory Access (RMA)
– Parallel I/O (MPI I/O)
Overview
• The MPI standard is defined through an open process by a community of
– Parallel computing vendors
– Computer scientists and
– Application developers
• The main advantages the MPI standard brings are portability and ease of use.
Overview
• Multiple implementations of MPI exist.
– The most predominant implementations are
• Open MPI (https://www.open-mpi.org/)
• MPICH (http://www.mpich.org/)
– Vendor-specific implementations are often based on one of these two.
History of MPI
• 1994: MPI 1.0 was defined, covering
– Point-to-point communications
– Collective operations
– Process topologies (Cartesian and general graph)
• 1995: MPI 1.1 brings minor changes.
• 1997:
– MPI 1.2 brings additional clarifications and corrections.
– MPI 2.0 defines
• Interoperability with threads
• Dynamic process creation
• Remote Memory Access (RMA)
• 2008:
– MPI 1.3 (now MPI-1) brings further minor changes.
– MPI 2.1 combines MPI 2.0 and MPI 1.3 in one single document.
• 2009: MPI 2.2 (now MPI-2) adds
– In-place option for collective routines
– Deprecation of some functions
– Deprecation of the C++ language binding
• 2012: MPI 3.0 is a major update and introduces
– Non-blocking collective communications
– Neighborhood collective communications
– Fortran 2008 bindings
– Removal of some deprecated functions
– Removal of the C++ language binding
• 2015: MPI 3.1 (now MPI-3) is a minor update, mostly corrections and clarifications.
MPI Basics
Semantic Terms and Basic Usage
MPI Execution
• MPI execution
– Several processes communicating with each other
– Typical: all processes execute the same program but operate on different data (SPMD), e.g.
> mpirun -n 5 ./myprogram
– Also possible: processes executing different programs (MPMD), e.g.
> mpirun -n 1 ./master : -n 4 ./worker
(Intra-)Communicator
• Communicator (also Intra-Communicator)
– A subset of processes
• Processes within a communicator can communicate with each other
– A context for communication
• Messages within different communicators will never be mixed up
– Predefined communicators
• MPI_COMM_WORLD (containing all processes)
• MPI_COMM_SELF (containing only the current process)
Inter-Communicator
• Inter-Communicator
– A pair of disjoint subsets of processes forming two groups, say A and B.
– Processes of group A can communicate with processes of group B and vice versa.
• But not for communication within group A or within group B.
Process Rank
• Rank
– The unique numerical id of a process within a communicator
– Strictly speaking, "rank" is always with respect to a given communicator; in practice, however, "rank" is often used synonymously with "rank within MPI_COMM_WORLD".
Using MPI in programs
• C/C++:
#include <mpi.h>
• Fortran 2008 + TS 29113 and later (recommended):
USE mpi_f08
• Earlier Fortran versions
USE mpi
Compiling MPI Programs
• Most MPI implementations provide wrapper compilers for convenience, e.g.
– C: mpicc, mpiicc
– C++: mpicxx, mpiCC, mpic++, mpiicpc
– Fortran: mpifc, mpif77, mpif90, mpiifort
MPI Language Bindings
• C binding
– All MPI functions return an error code (int)
• MPI_SUCCESS on success, otherwise implementation-defined.
– Function naming convention: MPI_Xxx_xxx
• Fortran binding
– All MPI operations are procedures
– The last argument is always the error code:
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
In the following, we will stick to the C binding and C examples.
MPI Initialization
• Single-threaded execution:
int MPI_Init(int *argc, char ***argv)
• Multi-threaded execution (hybrid MPI programs, e.g. MPI+OpenMP):
int MPI_Init_thread(int *argc, char ***argv, int required, int *provided)
required may be one of:
– MPI_THREAD_SINGLE: Only one thread will execute.
– MPI_THREAD_FUNNELED: Only the main thread makes MPI calls.
– MPI_THREAD_SERIALIZED: Only one thread at a time makes MPI calls.
– MPI_THREAD_MULTIPLE: No restrictions.
The actually supported threading mode is returned as *provided.
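A minimal sketch of the recommended usage (the fallback handling is an assumption, not from the slides): the MPI_THREAD_* constants are monotonically ordered, so a simple comparison shows whether the request was satisfied.

int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    /* implementation offers only a weaker level; adapt or abort */
    fprintf(stderr, "Warning: got threading level %d only\n", provided);
}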
MPI Basics
Hello World Code Example
Hello World Example (1/2)
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d!\n", rank, size);
    MPI_Finalize();
    return 0;
}
Hello World Example (2/2)
# Compilation
> mpicc -o hello hello.c
# Invocation
> mpirun -n 5 ./hello
# Possible Output
Hello from rank 1 of 5!
Hello from rank 2 of 5!
Hello from rank 4 of 5!
Hello from rank 0 of 5!
Hello from rank 3 of 5!
MPI Basics
Semantic Terms (continued)
Blocking / Non-blocking operations
• Blocking
– Function return indicates completion of the associated operation.
– E.g. MPI_Send, MPI_Recv
• Non-blocking
– Function may return before the associated operation has completed.
– E.g. MPI_Isend, MPI_Irecv
• Note: the "I" stands for "immediate", i.e. the function returns "immediately".
– Resources (e.g. message buffers) passed to the function must not be reused until the operation has completed.
– Non-blocking functions return an MPI_Request handle for querying the completion status of the operation.
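A minimal sketch of this pattern (dest is a hypothetical destination rank): the MPI_Request returned by MPI_Isend is later completed with MPI_Wait, and only then may the send buffer be reused.

int payload = 42;
MPI_Request req;
MPI_Isend(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD, &req);
/* ... overlap independent computation here; 'payload' must not be touched ... */
MPI_Wait(&req, MPI_STATUS_IGNORE);  /* operation complete, buffer reusable */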
Local / Non-local operations
• Local
– The completion of the operation depends only on the local executing process.
– E.g. MPI_Comm_size, MPI_Comm_rank, MPI_Bsend
• Non-local
– The completion of the operation may require the execution of some MPI operation on another process (may involve communication).
– E.g. MPI_Send
Collective / Non-collective operations
• Collective
– An operation is collective on a communicator C when all processes in C are required to take part in the operation (all processes need to call the same MPI function on C).
– Multiple collective operations on C must be executed in the same order by all members of C.
– E.g. MPI_Bcast, MPI_Gather, MPI_Scatter, etc.
• Non-collective
– E.g. MPI_Send, MPI_Recv, etc.
Point-to-Point Communication
Communication Protocols and Send Modes
Communication Protocols
• Eager protocol:
– Message is sent instantaneously.
– Message may have to be buffered on the receiver side (if the matching receive has not yet been posted).
– Typically used for small messages.
• Rendezvous protocol:
– 3-way handshake:
• Sender sends a "Request to send" control message.
• Receiver sends "Ready to send" (when the matching receive is posted).
• Sender sends the actual message to the receiver.
– Avoids message buffering on the receiver side.
– Typically used for large messages.
MPI Send Operations

                   Blocking Variant   Non-Blocking Variant   Protocol     Locality
Standard Send:     MPI_Send           MPI_Isend              –            Non-local
Synchronous Send:  MPI_Ssend          MPI_Issend             Rendezvous   Non-local
Ready Send:        MPI_Rsend          MPI_Irsend             Eager *      Non-local
Buffered Send:     MPI_Bsend          MPI_Ibsend             –            Local

* Not explicitly stated by the standard, but typical for most MPI implementations.
• Standard Send (MPI_Send / MPI_Isend) – Completion Semantics:
– Send buffer can be reused.
– Message may have been sent or buffered locally (most MPI implementations use buffering for small messages).
• Synchronous Send (MPI_Ssend / MPI_Issend) – Completion Semantics:
– A matching receive operation on the remote process has been posted.
– Message transmission has started (and may have finished).
– Send buffer can be reused.
• Ready Send (MPI_Rsend / MPI_Irsend) – Prerequisite:
– The receiver has to be "ready" (a matching receive has been posted).
– No buffering will occur on the receiver side.
• Ready Send – Completion Semantics:
– Same as Standard Send.
– Send buffer can be reused.
– Message has been sent (typical case) or buffered.
• Buffered Send (MPI_Bsend / MPI_Ibsend) – Completion Semantics:
– Send buffer can be reused.
– Message may have been sent (if a matching receive operation has been posted) or buffered.
• Advice:
– Standard Send is the most appropriate in most cases.
• Most freedom for choosing the best protocol and whether to use buffering or not.
– Synchronous Send can be helpful for debugging (deadlock detection).
MPI Receive Operations
• Much simpler than Send:
– Blocking: MPI_Recv
– Non-blocking: MPI_Irecv
• Any pairing of the different send and receive operations is legal.
Point-to-Point Communication
Message Envelope and Message Matching
Message Envelope
• Message Envelope
– C: Communicator
– src: Source rank of the sending process w.r.t. C
– dst: Destination rank of the receiving process w.r.t. C
– tag: Message tag (arbitrary integer between 0 and at least 2^15 - 1 = 32767)
– type: Type signature (sequence of basic data types occurring in the message)
– len: Length of the message
• For send operations:
– (C, dst, tag, type, len) are specified explicitly via arguments
– src is given implicitly (rank of the local process w.r.t. C)
• For receive operations:
– (C, src, tag, type, len) are specified explicitly (wildcards MPI_ANY_SOURCE and MPI_ANY_TAG allowed)
– dst is given implicitly (rank of the local process w.r.t. C)
Message Matching
• Send operation S with envelope E1, carrying message M
• Receive operation R with envelope E2
• The operation R will receive the message M if and only if:
– E1.C = E2.C (communicators match)
– E1.dst = E2.dst (message goes to the specified destination)
– E1.src = E2.src or E2.src = MPI_ANY_SOURCE (message comes from the specified source)
– E1.tag = E2.tag or E2.tag = MPI_ANY_TAG (message has the correct tag)
• Required for correctness (runtime checks can typically be turned on/off):
– E1.type ≈ E2.type (sequence of basic types in the message is as expected, more on this later)
– E1.len ≤ E2.len (message may be shorter than specified, but not longer)
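A short sketch of matching with wildcards (the buffer size of 16 is arbitrary): the receive matches any source and any tag, and the resulting status object reveals the actual envelope.

MPI_Status status;
int buf[16];
MPI_Recv(buf, 16, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
printf("Received a message from rank %d with tag %d\n",
       status.MPI_SOURCE, status.MPI_TAG);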
Point-to-Point Communication
A simple example: 1D Halo exchange
1D Halo-Exchange Example Variant 1 (1/2)

int rank, size;
int data[3] = {-1, -1, -1};

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    data[1] = rank;
    exchange_left();
    exchange_right();
    printf("Data of rank %d: {%d %d %d}\n", rank, data[0], data[1], data[2]);
    MPI_Finalize();
    return 0;
}
1D Halo-Exchange Example Variant 1 (2/2)

void exchange_left()
{
    int left = rank-1 >= 0 ? rank-1 : MPI_PROC_NULL;
    MPI_Send(&data[1], 1, MPI_INT, left, 0, MPI_COMM_WORLD);
    MPI_Recv(&data[0], 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void exchange_right()
{
    int right = rank+1 < size ? rank+1 : MPI_PROC_NULL;
    MPI_Send(&data[1], 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    MPI_Recv(&data[2], 1, MPI_INT, right, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
What is the (biggest) problem here?
1D Halo-Exchange Example Variant 1 - Problems
• Deadlock
– When the rendezvous protocol is used for sending, all processes hang in the first MPI_Send operation (no matching MPI_Recv has been posted at this stage).
– Debugging tip: replacing MPI_Send with MPI_Ssend ("synchronous send") can help to detect deadlocks (forces the rendezvous protocol).
• Serialization
– When the eager protocol is used for sending, process N cannot proceed until process N-1 sends data (exchange_left is always done first).
1D Halo-Exchange Example Fixing the Deadlock
• Solution 1:
– Using MPI_Sendrecv
• Combines MPI_Send and MPI_Recv; the user does not need to worry about the correct ordering.
• Solution 2:
– Using non-blocking communication
• Discussed later (Variant 2)
1D Halo-Exchange Example Fixing the Deadlock

void exchange_left()
{
    int left = rank-1 >= 0 ? rank-1 : MPI_PROC_NULL;
    MPI_Sendrecv(&data[1], 1, MPI_INT, left, 0,
                 &data[0], 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void exchange_right()
{
    int right = rank+1 < size ? rank+1 : MPI_PROC_NULL;
    MPI_Sendrecv(&data[1], 1, MPI_INT, right, 0,
                 &data[2], 1, MPI_INT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
1D Halo-Exchange Example Fixing Serialization
• Solution 1
– Processes with even rank communicate first with (1) the right neighbor, then with (2) the left neighbor.
– Processes with odd rank do it the other way round.
1D Halo-Exchange Example Fixing Serialization – Solution 1

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    data[1] = rank;
    if (rank % 2 == 0) {
        exchange_right();
        exchange_left();
    } else {
        exchange_left();
        exchange_right();
    }
    printf("Data of rank %d: {%d %d %d}\n", rank, data[0], data[1], data[2]);
    ...
1D Halo-Exchange Example Fixing Serialization – Solution 2
• Solution 2 (more elegant)
– Let data "flow" in one direction (e.g. right),
– and then let data "flow" in the other direction.
1D Halo-Exchange Example Fixing Serialization – Solution 2

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    data[1] = rank;
    shift_left();
    shift_right();
    printf("Data of rank %d: {%d %d %d}\n", rank, data[0], data[1], data[2]);
    MPI_Finalize();
    return 0;
}
1D Halo-Exchange Example Fixing Serialization – Solution 2

void shift_left()
{
    int left = ..., right = ...;
    MPI_Sendrecv(&data[1], 1, MPI_INT, left, 0,
                 &data[2], 1, MPI_INT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

void shift_right()
{
    ...
    MPI_Sendrecv(&data[1], 1, MPI_INT, right, 0,
                 &data[0], 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
1D Halo-Exchange Example Variant 2 (1/2)

int rank, size;
int data[3] = {-1, -1, -1};

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    data[1] = rank;
    exchange();
    printf("Data of rank %d: {%d %d %d}\n", rank, data[0], data[1], data[2]);
    MPI_Finalize();
    return 0;
}
1D Halo-Exchange Example Variant 2 (2/2)

void exchange()
{
    int left  = rank-1 >= 0   ? rank-1 : MPI_PROC_NULL;
    int right = rank+1 < size ? rank+1 : MPI_PROC_NULL;
    MPI_Request requests[4];
    MPI_Isend(&data[1], 1, MPI_INT, left,  0, MPI_COMM_WORLD, &requests[0]);
    MPI_Isend(&data[1], 1, MPI_INT, right, 0, MPI_COMM_WORLD, &requests[1]);
    MPI_Irecv(&data[0], 1, MPI_INT, left,  0, MPI_COMM_WORLD, &requests[2]);
    MPI_Irecv(&data[2], 1, MPI_INT, right, 0, MPI_COMM_WORLD, &requests[3]);
    MPI_Waitall(4, requests, MPI_STATUSES_IGNORE);
}
Point-to-Point Communication
Receiving Messages of unknown length
Receiving messages of known maximum length

#define MAXLEN 15
int recvbuf[MAXLEN];
int actual_length;

void receive()
{
    MPI_Status status;
    MPI_Recv(recvbuf, MAXLEN, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    MPI_Get_count(&status, MPI_INT, &actual_length);
    printf("Received %d integers from process 0.\n", actual_length);
}
Receiving messages of arbitrary length

int *recvbuf;
int length;

void receive()
{
    MPI_Status status;
    MPI_Message message;
    // Blocking check for a message, do not receive yet
    MPI_Mprobe(0, 0, MPI_COMM_WORLD, &message, &status);
    // Get the message length, allocate a buffer and receive
    MPI_Get_count(&status, MPI_INT, &length);
    recvbuf = malloc(length * sizeof(int));
    MPI_Mrecv(recvbuf, length, MPI_INT, &message, &status);
    printf("Received %d integers from process 0.\n", length);
}
Point-to-Point Communication
Summary and Advice
Summary
• Standard Send and Receive (typical case):
– Use MPI_Send / MPI_Isend for sending
– Use MPI_Recv / MPI_Irecv for receiving
– Alternatively also MPI_Sendrecv (there is no MPI_Isendrecv).
• There is also MPI_Sendrecv_replace (in case send and receive buffer are the same).
• Other Send Modes
– Synchronous Send forces the rendezvous protocol (3-way handshake).
– Ready Send avoids the handshake; the receiver has to be ready.
– Buffered Send forces sender-side buffering (local operation).
Summary
• Request Completion (non-blocking operations)
– Single request:
• MPI_Wait / MPI_Test: Wait is blocking; Test is non-blocking
– Array of requests (see the sketch below):
• MPI_Waitall / MPI_Testall: check for completion of all requests
• MPI_Waitany / MPI_Testany: check for the next completed request (exactly one)
• MPI_Waitsome / MPI_Testsome: check for the next completed requests (maybe more than one at once)
• Messages of unknown length
– MPI_Mprobe / MPI_Improbe:
• Check for a pending message without receiving it
• Returns a message handle and a status object
– MPI_Get_count: get the message length from the status object
– MPI_Mrecv / MPI_Imrecv: receive the message (by message handle)
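A small sketch of draining an array of requests with MPI_Waitany (n, requests and handle_completion are hypothetical names used for illustration):

for (int done = 0; done < n; done++) {
    int idx;
    MPI_Waitany(n, requests, &idx, MPI_STATUS_IGNORE);
    /* requests[idx] has completed and is set to MPI_REQUEST_NULL,
       so later MPI_Waitany calls skip it */
    handle_completion(idx);  /* hypothetical application hook */
}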
Advice
• Standard Send is sufficient in most cases
– i.e. MPI_Send or MPI_Isend (paired with MPI_Recv or MPI_Irecv)
– Gives MPI the most freedom in choosing the protocol and adequate buffering
• Synchronous Send can be helpful for debugging (deadlock detection).
• Prefer higher-level communication constructs to tedious point-to-point communication whenever possible, e.g.
– Collective operations
• on MPI_COMM_WORLD or tailored intra- or inter-communicators
– MPI-3 sparse collective operations
• on Cartesian communicators or MPI-3 distributed graph communicators.
Remote Memory Access
a.k.a. One-sided communications
RMA – Basic Workflow
• Create a memory window object for remote access
– Makes a given memory region (buffer) usable for RMA
• Use MPI_Win_fence for synchronization
• Use MPI_Put / MPI_Get / MPI_Accumulate for accessing remote memory (a minimal sketch follows).
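A hedged sketch of this workflow with fence synchronization, assuming rank and size were obtained via MPI_Comm_rank / MPI_Comm_size as in the earlier examples: every rank writes its rank number into the window of its right neighbor.

int local = -1;  /* window buffer exposed for remote access */
MPI_Win win;
MPI_Win_create(&local, sizeof(int), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

MPI_Win_fence(0, win);  /* start access/exposure epoch */
if (rank + 1 < size)    /* write own rank into right neighbor's window */
    MPI_Put(&rank, 1, MPI_INT, rank + 1, 0, 1, MPI_INT, win);
MPI_Win_fence(0, win);  /* end epoch: all puts are now complete */

printf("Rank %d received %d\n", rank, local);
MPI_Win_free(&win);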
RMA – Memory Window Creation
• MPI_Win_create:
– Create a memory window from a given (user-allocated) buffer.
• MPI_Win_allocate:
– Create a memory window with memory allocated by MPI.
• MPI_Win_free:
– Release a memory window.
– When the window was created with MPI_Win_allocate, the associated buffer is also released.
RMA - Synchronization
• Active target synchronization (collective)
– Use MPI_Win_fence to separate distinct "access epochs"
– Access epochs:
• Local reads and writes (from/to the local buffer)
• Local and remote reads (from the local buffer)
• Remote writes or updates (to the local buffer) – "exposure epoch"
– Also, MPI_Put and MPI_Accumulate to the same remote memory region cannot be intermixed within one access epoch.
RMA - Synchronization
• Active target synchronization (advanced)
– Reduce synchronization overhead by explicitly specifying the participating processes (non-collective synchronization); see the sketch below.
– MPI_Win_post / MPI_Win_wait
• Start / end exposure epoch (target process)
– MPI_Win_start / MPI_Win_complete
• Start / end remote access epoch (origin process)
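A hedged sketch of post/start/complete/wait with rank 0 as origin and rank 1 as target; win (a window over an integer buffer) and value (the integer to transfer) are assumptions, not from the slides.

MPI_Group world_group, peer;
MPI_Comm_group(MPI_COMM_WORLD, &world_group);

if (rank == 1) {                        /* target: exposes its window */
    int origin = 0;
    MPI_Group_incl(world_group, 1, &origin, &peer);
    MPI_Win_post(peer, 0, win);         /* begin exposure epoch */
    MPI_Win_wait(win);                  /* returns once the origin completed */
    MPI_Group_free(&peer);
} else if (rank == 0) {                 /* origin: accesses rank 1 */
    int target = 1;
    MPI_Group_incl(world_group, 1, &target, &peer);
    MPI_Win_start(peer, 0, win);        /* begin access epoch */
    MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_complete(win);              /* end access epoch */
    MPI_Group_free(&peer);
}
MPI_Group_free(&world_group);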
RMA - Synchronization
• Passive target synchronization
– MPI_Win_lock / MPI_Win_lock_all
• Acquire a shared lock (for reads) or an exclusive lock (for writes) for accessing remote memory at a specified rank (MPI_Win_lock) or at all participating ranks (MPI_Win_lock_all)
– MPI_Win_unlock / MPI_Win_unlock_all
• Release the shared or exclusive lock after accessing remote memory.
RMA Operations
• Supported operations:
– Get: read from remote memory.
– Put: write to remote memory.
– Accumulate: update remote memory with a reduction operation.
– Get-and-accumulate: get the old value and update.
– Compare-and-swap: get the old value and overwrite only on successful comparison.
• Non-blocking ("request-based") variants available
– Only usable with passive target synchronization.
RMA Operations
• MPI_Get / MPI_Rget
– Blocking / non-blocking ("request-based") read from remote memory.
• MPI_Put / MPI_Rput
– Blocking / non-blocking write to remote memory.
• MPI_Accumulate / MPI_Raccumulate
– Blocking / non-blocking update of remote memory.
– All predefined reduction operations can be used:
• MPI_MAX, MPI_MIN, MPI_SUM, MPI_LAND, etc.
• See also the MPI 3.1 Standard, Section 5.9.2
– Additionally, MPI_REPLACE can be used.
RMA Operations
• MPI_Get_accumulate / MPI_Rget_accumulate
– Blocking / non-blocking update of remote memory, returning the original value.
• MPI_Fetch_and_op
– Similar to MPI_Get_accumulate
• Less generic (fewer arguments), potentially faster (hardware support)
• MPI_Compare_and_swap
– Atomic compare-and-swap of remote memory
• If the current value in the target buffer equals the given value, it is overwritten.
• The original value in the target buffer is returned.
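A hedged sketch of MPI_Fetch_and_op under passive target synchronization: a global counter hosted on rank 0, atomically incremented by every process (all variable names are local to this example, not from the slides).

int counter = 0;  /* the actual counter lives in rank 0's window */
MPI_Win win;
MPI_Win_create(&counter, sizeof(int), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

int one = 1, old;
MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);  /* shared lock on rank 0 */
MPI_Fetch_and_op(&one, &old, MPI_INT, 0, 0, MPI_SUM, win);
MPI_Win_unlock(0, win);                    /* increment is now visible */
printf("Got ticket number %d\n", old);
MPI_Win_free(&win);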
Remote Memory Access
Summary and Advice
Summary
• Remote Memory Access (RMA) is versatile and powerful (but hard to get right).
• RMA can bring performance advantages in comparison to point-to-point communication.
• Three synchronization modes are provided:
– MPI_Win_fence
• Active target synchronization (collective)
– MPI_Win_[post|wait|start|complete]
• Active target synchronization (fine-grained)
– MPI_Win_[un]lock[_all]
• Passive target synchronization (shared or exclusive locks)
Advice
• Using collective active target synchronization, i.e. using MPI_Win_fence to separate distinct access epochs, is the easiest variant to implement.
– Still more than enough opportunities to get things wrong (e.g. race conditions)
• Performance
– Pass an MPI_Info dictionary when creating the memory window
• E.g. when not using passive target synchronization, pass "no_locks": "true"
• See also the MPI 3.1 Standard, Section 11.2.1
– Use assertions (additional hints for MPI) when using synchronization calls
• See also the MPI 3.1 Standard, Section 11.5.5
• Good luck!
MPI Datatypes
Motivation and Definition
MPI Datatypes
• MPI Datatypes
– A way to describe the exact data layout of send and receive buffers.
• Data layout
– Location and type of the primitive data elements in a buffer
• Primitive data elements
– E.g. integers, single/double precision numbers, characters, etc.
MPI Datatypes
• Main advantage
– MPI operations can directly operate on application data structures.
• No manual buffer packing and unpacking by the user
• Further advantages
– Runtime type checking (to some degree)
• Error when the data elements in a received message are not as expected.
– Data conversion
• E.g. conversion from little to big endian
• This aspect is in fact more relevant in the context of MPI-IO
MPI Datatypes
• Type Signature
– Data layout of a message
– Given by a sequence of primitive datatypes
• MPI Datatype
– Data layout of a memory buffer
– Given by
• the sequence of types of the primitive data elements in the buffer (= Type Signature)
• the sequence of displacements of the data elements w.r.t. the buffer start address
MPI Datatypes – Type Matching
• Let S = (s1, s2, s3, …) be the type signature of a message M,
– E.g. S = (int, int, double)
• Let T = (t1, t2, t3, …) be the type signature of the MPI datatype used in a receive operation R,
– E.g. T = (int, int, double, int)
• For successful receipt of M by R, S must be a prefix of T.
• In other words:
– The primitive data types have to match one-to-one.
– The actual message may be shorter than the receive buffer.
MPI Datatypes
Predefined and Derived Types
Predefined (primitive) Datatypes
MPI Datatype Corresponding C Datatype
MPI_CHAR char
MPI_INT int
MPI_FLOAT float
MPI_DOUBLE double
MPI_BYTE -
…
For a full list, see http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node48.htm
Creating Derived Datatypes
1) Use one or more "type constructors":
MPI_Type_contiguous, MPI_Type_vector, MPI_Type_create_struct, etc.
2) Commit the final type (the one used in communications or I/O):
MPI_Type_commit
3) Free all types (a freed type may still be in use by pending send and receive operations or within other derived types):
MPI_Type_free
A short sketch of this three-step pattern follows.
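A hedged sketch of the constructor/commit/free cycle using MPI_Type_vector (dest is a hypothetical destination rank):

/* Describe every second int out of 8: 4 blocks of 1 int, stride 2 */
MPI_Datatype stride_t;
MPI_Type_vector(4, 1, 2, MPI_INT, &stride_t);
MPI_Type_commit(&stride_t);              /* required before first use */

int buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
MPI_Send(buf, 1, stride_t, dest, 0, MPI_COMM_WORLD);  /* sends 0, 2, 4, 6 */

MPI_Type_free(&stride_t);   /* safe even while a send is still pending */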
MPI Datatypes
Type Constructors
(A selection)
Contiguous Type (Array)

int MPI_Type_contiguous(
    int count,
    MPI_Datatype oldtype,
    MPI_Datatype *newtype)

[Diagram: newtype consists of count consecutive copies of oldtype.]
Vector Type

int MPI_Type_vector(
    int count,
    int blocklength,
    int stride,
    MPI_Datatype oldtype,
    MPI_Datatype *newtype)

[Diagram: newtype consists of count blocks of blocklength oldtype elements each; the starts of consecutive blocks are stride elements apart.]
Indexed Type (1)

int MPI_Type_indexed(
    int count,
    const int blocklengths[],
    const int displacements[],
    MPI_Datatype oldtype,
    MPI_Datatype *newtype)

[Diagram example: count = 3, blocklengths = {1, 2, 1}, displacements = {0, 2, 6} selects the oldtype elements at offsets 0, 2, 3 and 6.]
Indexed Type (2)
int MPI_Type_create_indexed_block( int count, int blocklength, const int displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype )

(Figure example: count = 4, blocklength = 1, displacements = {0, 2, 3, 6} — like MPI_Type_indexed, but with a single block length for all blocks.)
Structured Type
int MPI_Type_create_struct( int count, const int blocklengths[], const MPI_Aint displacements[], const MPI_Datatype types[], MPI_Datatype *newtype )

(Figure example: count = 3, blocklengths = {2, 1, 3}, displacements = {0, 8, 16} (in bytes), types = {type1, type2, type3} — two elements of type1 at offset 0, one of type2 at byte 8, three of type3 at byte 16.)
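A sketch for sending a C struct as one unit; particle_t is an illustrative type, and the MPI_Type_create_resized step (not on the slide, but a common companion) makes the type's extent match sizeof(particle_t) so that arrays of structs work too:

```c
#include <mpi.h>
#include <stddef.h>   /* offsetof */

/* Illustrative C struct to be sent as one unit. */
typedef struct { int id; double pos[3]; } particle_t;

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int          blocklens[2] = { 1, 3 };
    MPI_Aint     displs[2]    = { offsetof(particle_t, id),
                                  offsetof(particle_t, pos) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
    MPI_Datatype tmp_t, particle_mpi_t;

    MPI_Type_create_struct(2, blocklens, displs, types, &tmp_t);
    /* Resize to the true C extent so arrays of particle_t work, too. */
    MPI_Type_create_resized(tmp_t, 0, sizeof(particle_t), &particle_mpi_t);
    MPI_Type_commit(&particle_mpi_t);

    particle_t p = { 0, { 0.0, 0.0, 0.0 } };
    MPI_Bcast(&p, 1, particle_mpi_t, 0, MPI_COMM_WORLD);

    MPI_Type_free(&tmp_t);
    MPI_Type_free(&particle_mpi_t);
    MPI_Finalize();
    return 0;
}
```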
Subarray Type
int MPI_Type_create_subarray( int ndims, const int sizes[], const int subsizes[], const int starts[], int order, MPI_Datatype oldtype, MPI_Datatype *newtype )

(Figure example: ndims = 2, sizes = {5, 4}, subsizes = {2, 3}, starts = {2, 1}, order = MPI_ORDER_FORTRAN; in C this corresponds to a declaration oldtype buffer[4][5].)
Subarray Type
(Figure example: the same call with order = MPI_ORDER_C; sizes = {5, 4} then corresponds to a declaration oldtype buffer[5][4], and the 2×3 block starting at {2, 1} is selected.)
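A sketch of the C-order case above; the buffer contents are left unspecified, since only the type construction matters here:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double buffer[5][4] = { { 0 } };   /* matches the C-order figure */
    int sizes[2]    = { 5, 4 };
    int subsizes[2] = { 2, 3 };
    int starts[2]   = { 2, 1 };

    MPI_Datatype sub_t;
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &sub_t);
    MPI_Type_commit(&sub_t);

    /* Transfers only the 2x3 block whose upper-left corner is buffer[2][1]. */
    MPI_Bcast(buffer, 1, sub_t, 0, MPI_COMM_WORLD);

    MPI_Type_free(&sub_t);
    MPI_Finalize();
    return 0;
}
```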
Collective Communication
Motivation and Overview
Motivation

• Thinking in terms of global communication patterns among groups of processes is in general easier than thinking in terms of point-to-point messages (which message has to go where?).
• Advantages of collective communication (compared to point-to-point):
  – Performance
    • MPI can choose the best of multiple communication algorithms.
  – Maintainability
    • Code is often easier to read and understand.
Overview
• MPI defines the following types of collective operations:
  – Barrier
  – Broadcast
  – Gather
  – All-Gather
  – Scatter
  – All-to-all
  – Reduce
  – All-Reduce
  – Reduce-Scatter
  – Scan
• New in MPI-3:
  – Non-blocking variants of all collective operations.
Collective Communication
Collective Operation Semantics
Barrier
int MPI_Barrier(MPI_Comm comm)

• Synchronizes all processes within the communicator comm.
• Completion semantics: the call returns only after all processes in comm have entered the barrier.
• MPI makes no guarantee about how long it will take the other processes to leave the barrier.
  – MPI barriers are therefore not appropriate for highly accurate time measurements.
Broadcast
int MPI_Bcast( void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm )

(Figure example: count = 1, root = 2. Before the call, only Proc. 2 holds A; afterwards, Proc. 0–3 all hold A.)
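A sketch matching the figure (run with at least 3 processes, since root = 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, value = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 2) value = 42;          /* only the root's value matters */

    /* count = 1, root = 2, as in the figure above */
    MPI_Bcast(&value, 1, MPI_INT, 2, MPI_COMM_WORLD);

    printf("rank %d: value = %d\n", rank, value);  /* every rank prints 42 */

    MPI_Finalize();
    return 0;
}
```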
Gather
int MPI_Gather( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm )

(Figure example: sendcount = recvcount = 1, root = 2. Before: Proc. 0–3 hold A, B, C, D respectively. After: Proc. 2 additionally holds A B C D in rank order; the send buffers are unchanged.)
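A sketch matching the figure, again with root = 2; only the root allocates a receive buffer:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int contribution = rank * rank;      /* one element per process */
    int *all = NULL;
    if (rank == 2)                       /* only the root needs a recvbuf */
        all = malloc(size * sizeof(int));

    MPI_Gather(&contribution, 1, MPI_INT, all, 1, MPI_INT, 2, MPI_COMM_WORLD);

    if (rank == 2) {
        /* all[i] now holds rank i's contribution, in rank order */
        free(all);
    }
    MPI_Finalize();
    return 0;
}
```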
All-Gather
int MPI_Allgather( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm )

(Figure example: as for MPI_Gather, but every process ends up with the full result A B C D; there is no root argument.)
Scatter
int MPI_Scatter( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm )

(Figure example: sendcount = recvcount = 1, root = 2. Before: Proc. 2 holds A B C D. After: Proc. 0–3 hold A, B, C, D respectively, while the root's send buffer is unchanged — the inverse of MPI_Gather.)
All-to-all
int MPI_Alltoall( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm )

(Figure example with 3 processes: Proc. 0–2 start with the rows A B C, D E F, G H I; afterwards they hold the columns A D G, B E H, C F I — a transpose of the data across processes.)
Reduce
int MPI_Reduce( const void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm )

(Figure example: count = 2, op = MPI_SUM, root = 2. Proc. 0–2 contribute {A, B}, {C, D}, {E, F}; Proc. 2 receives {A+C+E, B+D+F} — the reduction is applied element-wise.)
Reduce
• Supported reduction operations:
  – Predefined
    • MPI_MIN, MPI_MAX, MPI_SUM, MPI_LAND, etc.
    • For the full list, see MPI 3.1 Standard, Section 5.9.2
    • http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node112.htm
  – User-defined
    • Register a callback function with MPI_Op_create and release it with MPI_Op_free (see the sketch below).
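A sketch of a user-defined reduction, assuming MPI_DOUBLE operands; absmax is an illustrative callback computing the element-wise maximum of absolute values:

```c
#include <mpi.h>

/* Callback with the signature required by MPI_Op_create:
   combine invec into inoutvec element-wise. */
static void absmax(void *invec, void *inoutvec, int *len, MPI_Datatype *dt) {
    double *in = invec, *io = inoutvec;
    (void)dt;  /* this sketch assumes MPI_DOUBLE */
    for (int i = 0; i < *len; i++) {
        double a = in[i] < 0 ? -in[i] : in[i];
        double b = io[i] < 0 ? -io[i] : io[i];
        io[i] = a > b ? a : b;
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(absmax, /*commute=*/1, &op);

    double x = (rank % 2 ? -1.0 : 1.0) * rank, result;
    MPI_Reduce(&x, &result, 1, MPI_DOUBLE, op, 0, MPI_COMM_WORLD);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}
```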
All-Reduce
int MPI_Allreduce( const void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm )

(Figure example: count = 2, op = MPI_SUM. As for MPI_Reduce, but every process receives the result {A+C+E, B+D+F}; there is no root argument.)
Reduce-Scatter
int MPI_Reduce_scatter_block( const void* sendbuf, void* recvbuf, int recvcount, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm )

(Figure example with 3 processes: recvcount = 2, op = MPI_SUM; each process contributes recvcount * nprocs = 6 elements. Proc. 0–2 start with {A…F}, {G…L}, {M…R}; afterwards Proc. 0 holds {A+G+M, B+H+N}, Proc. 1 holds {C+I+O, D+J+P}, Proc. 2 holds {E+K+Q, F+L+R} — a reduce followed by a scatter.)
Scan (inclusive)
int MPI_Scan( const void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm )

(Figure example: count = 2, op = MPI_SUM. Proc. 0–2 contribute {A, B}, {C, D}, {E, F}; they receive the inclusive prefix sums {A, B}, {A+C, B+D}, {A+C+E, B+D+F}.)
Scan (exclusive)
int MPI_Exscan( const void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm )

(Figure example: count = 2, op = MPI_SUM. As above, but the prefix sums are exclusive: Proc. 0's receive buffer is left undefined, Proc. 1 receives {A, B}, Proc. 2 receives {A+C, B+D}.)
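A classic use of the exclusive scan is computing each process's offset into a global array from per-process element counts. A sketch (the local count is illustrative):

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int nlocal = rank + 1;   /* illustrative per-process element count */
    int offset = 0;          /* MPI_Exscan leaves rank 0's recvbuf undefined,
                                so pre-set the identity explicitly */
    MPI_Exscan(&nlocal, &offset, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* offset is now the number of elements owned by lower ranks:
       rank 0 -> 0, rank 1 -> 1, rank 2 -> 3, ... */

    MPI_Finalize();
    return 0;
}
```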
Collective Communication
Variants
Variants

• For all operations where it makes sense, variants with varying send/receive counts (varying block lengths) are defined.
  – For all-to-all, there is even a variant with varying send/receive types.
• For many operations, the predefined constant MPI_IN_PLACE can be used as the send buffer argument.
  – The input data is then read from the receive buffer and overwritten "in place".
• For all operations, non-blocking variants are available (see the sketch after this list).
  – They follow the familiar naming scheme: prefix "I" for "immediate" return, e.g. MPI_Ibcast.
  – Checking for completion: MPI_Request as output argument; same *Wait* and *Test* routines as for non-blocking P2P.
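A sketch of both variants: MPI_IN_PLACE with MPI_Allreduce, and a non-blocking MPI_Ibcast completed with MPI_Wait:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* MPI_IN_PLACE: input is read from (and the result written to) data. */
    double data[2] = { 1.0 * rank, 2.0 * rank };
    MPI_Allreduce(MPI_IN_PLACE, data, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Non-blocking variant: overlap the broadcast with local work. */
    int flag = (rank == 0);
    MPI_Request req;
    MPI_Ibcast(&flag, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
    /* ... independent computation could run here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```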
Variants
Fixed Block-length            Varying Block-lengths
MPI_Gather                    MPI_Gatherv
MPI_Allgather                 MPI_Allgatherv
MPI_Scatter                   MPI_Scatterv
MPI_Alltoall                  MPI_Alltoallv
MPI_Reduce_scatter_block      MPI_Reduce_scatter

Fixed Block-length & Type     Varying Block-lengths and Types
MPI_Alltoall                  MPI_Alltoallw
Collective Communication
Correctness Issues
Correctness Issues (1/2)
• The amount of data sent must exactly match the amount of data specified by the receiver.
  – This is stricter than in P2P communication, where the actual message may be shorter than the receive buffer.
• Collective operations have to be issued in the same order by all participating processes.
• Except for MPI_Barrier, users should not rely on any synchronization effects other than those given by data dependencies.
Correctness Issues (2/2)
• Multiple active non-blocking collective operations are allowed.
  – E.g. MPI_Ibcast – MPI_Ibcast – MPI_Waitall is correct.
• Blocking and non-blocking collective operations do not match each other.
  – E.g. an MPI_Ibcast on process A will not match an MPI_Bcast on process B.
• Blocking and non-blocking collective operations can be interleaved.
  – E.g. MPI_Ibarrier – MPI_Bcast – MPI_Wait is correct.
Communicator Creation
Basic Communicator Constructors
Communicator Creation
• MPI_Comm_dup
  – Creates a new communicator containing the same processes with unchanged rank ordering, but provides a different context for message transfer.
• MPI_Comm_split
  – Partitions a given communicator into subgroups of processes.
  – Useful for collective operations on subgroups of processes (see the sketch below).
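A sketch matching the color expression from the figures further below; the trailing Allreduce merely demonstrates that collectives on the new communicator involve only one subgroup:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Group the processes as in the figures: ranks < 2 vs. the rest. */
    int color = rank < 2 ? 0 : 1;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, /*key=*/0, &subcomm);

    /* Collectives on subcomm now involve only the processes of one group. */
    int sum, one = 1;
    MPI_Allreduce(&one, &sum, 1, MPI_INT, MPI_SUM, subcomm);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```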
Duplication
int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)

(Figure: ranks 0–3 of comm map unchanged to ranks 0–3 of new_comm; only the communication context differs.)
Partition
int MPI_Comm_split( MPI_Comm comm, int color, int key, MPI_Comm *newcomm )

(Figure example: color = rank < 2 ? 0 : 1, key = 0. Ranks 0–1 of comm become ranks 0–1 of new_comm (1); ranks 2–4 become ranks 0–2 of new_comm (2). With equal keys, the ordering in the new communicators follows the old ranks.)
Partition
(Figure example: same colors, but key = -rank, which reverses the order: ranks 0–1 of comm become ranks 1 and 0 of new_comm (1); ranks 2–4 become ranks 2, 1 and 0 of new_comm (2).)
Inter-Communicator
int MPI_Intercomm_create( MPI_Comm local_comm, int local_leader, MPI_Comm peer_comm, int remote_leader, int tag, MPI_Comm *intercomm )

(Figure example: a 6-process peer_comm is divided into local_comm (1) and local_comm (2); with local_leader = 0 and remote_leader = rank < 2 ? 0 : 2, the two groups are connected into a single inter-communicator.)
Inter-Communicator
• For exact semantics of collective operations on Inter-Communicators, see
– MPI 3.1 Standard, Section 6.6
– or http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node166.htm
Communicator Creation
Process Topologies
Process Topologies
• Equips communicators with an additional "virtual topology" (neighborhood relations between processes).
• Advantages
  – Certain communication patterns can be expressed "naturally" in terms of neighborhood collective operations.
  – MPI can select a good embedding of the virtual topology into the physical machine (if the user allows reordering of ranks).
Cartesian Communicator
int MPI_Cart_create(MPI_Comm comm, int ndims, const int dims[], const int periods[], int reorder, MPI_Comm *newcomm )

(Figure example: ndims = 2, dims = {3, 2}, reorder = 0 (false). Six processes are arranged on a 3×2 grid, each keeping its rank from comm.)
Cartesian Communicator
(Figure example: the same call with reorder = 1 (true). MPI may assign different ranks in newcomm — shown as "?" in the figure — to embed the grid well into the physical machine.)
Convenience functions

• MPI_Dims_create:
  – Helps select a balanced distribution of processes per coordinate direction.
  – Inputs: nnodes, ndims, constraints (optional).
• MPI_Cart_rank:
  – Get a rank from coordinates.
• MPI_Cart_coords:
  – Get coordinates from a rank.
• MPI_Cart_shift:
  – Computes source and destination ranks for MPI_Sendrecv (see the sketch below).
  – Inputs: index of dimension, shift offset.
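A sketch combining these helpers into the common "shift" pattern: MPI_Dims_create picks the grid, MPI_Cart_shift yields the communication partners, and MPI_Sendrecv moves the data (the payload is illustrative):

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int size, dims[2] = { 0, 0 }, periods[2] = { 1, 1 };
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 2, dims);        /* balanced 2-D factorization */

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, /*reorder=*/1, &cart);

    /* Neighbors one step along dimension 0 (periodic, so never NULL). */
    int src, dst;
    MPI_Cart_shift(cart, /*dim=*/0, /*disp=*/1, &src, &dst);

    double send = 1.0, recv;
    MPI_Sendrecv(&send, 1, MPI_DOUBLE, dst, 0,
                 &recv, 1, MPI_DOUBLE, src, 0, cart, MPI_STATUS_IGNORE);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```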
Graph Communicator
int MPI_Dist_graph_create_adjacent(MPI_Comm comm, int indegree, const int srcs[], const int srcweights[], int outdegree, const int dsts[], const int dstweights[], MPI_Info info, int reorder, MPI_Comm *newcomm )

(Figure example for the calling process: indegree = 3 with srcs = {0, 2, 4}, outdegree = 2 with dsts = {1, 4}; both directions unweighted, i.e. srcweights = dstweights = MPI_UNWEIGHTED.)
Graph Communicator
(Figure example: the same call with explicit edge weights, srcweights = {2, 1, 1} and dstweights = {1, 2}.)
Graph Communicator
• More functions for Creating Graph Topologies
– See MPI 3.1 Standard, Section 7.5.3
– http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node195.htm
Neighborhood Collectives
Gather and All-to-all
Neighborhood Gather
int MPI_Neighbor_allgather( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm )

(Figure: each process sends the same item A to all of its outgoing neighbors and gathers one item from each incoming neighbor.)

Typical case: sendcount = recvcount and sendtype = recvtype.
Notes: length of sendbuf = sendcount; length of recvbuf = recvcount * indegree.
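A sketch assuming a simple directed ring as the virtual topology (each process receives from its left neighbor and sends to its right one); the communicator name ring is illustrative:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Directed ring: one incoming edge (left), one outgoing edge (right). */
    int src = (rank + size - 1) % size;
    int dst = (rank + 1) % size;
    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   1, &src, MPI_UNWEIGHTED,
                                   1, &dst, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, /*reorder=*/0, &ring);

    /* Each process sends its rank to its out-neighbor and receives
       one value per in-neighbor (recvbuf length = recvcount * indegree). */
    int recv;
    MPI_Neighbor_allgather(&rank, 1, MPI_INT, &recv, 1, MPI_INT, ring);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}
```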
Neighborhood All-to-all
int MPI_Neighbor_alltoall( const void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm )

(Figure: each process sends a distinct block (e.g. B, C, D) to each of its outgoing neighbors and receives one block (e.g. A) from each incoming neighbor.)

Typical case: sendcount = recvcount and sendtype = recvtype.
Notes: length of sendbuf = sendcount * outdegree; length of recvbuf = recvcount * indegree.
Variants
Fixed Block-length          Varying Block-lengths
MPI_Neighbor_allgather      MPI_Neighbor_allgatherv
MPI_Neighbor_alltoall       MPI_Neighbor_alltoallv

Fixed Block-length & Type   Varying Block-lengths and Types
MPI_Neighbor_alltoall       MPI_Neighbor_alltoallw
Variants
• Non-blocking variants are also provided: MPI_Ineighbor_*
• Completion checking and correctness rules are the same as for "normal" collective communication operations.
Collective Communication
Summary and Advice
Summary
• MPI provides powerful high-level abstractions for a large variety of communication patterns.
  – MPI may use heuristics to dynamically choose a good algorithm for a given operation.
• Process topologies allow a good mapping of the virtual topology of a given problem to the physical machine.
  – Currently, most MPI implementations do not yet exploit this optimization potential.
  – This is expected to change in the future.
Advice

• Prefer collective operations (on topology, intra- or inter-communicators) to point-to-point communication when possible.
  – In many cases you will get better (or at least similar) performance.
  – In some cases, however, P2P and RMA are the right choices:
    • e.g. dynamic sparse data exchange.
• Experiment with different collective operations.
  – Consult the current MPI Standard documents for descriptions of the operations.
Shared Memory Windows
Creation and Access
Shared Memory Window Creation
• MPI_Comm_split_type
  – Creates a communicator C containing all processes "living" on the same node.
  – Pass split_type = MPI_COMM_TYPE_SHARED.
  – Other values for split_type may exist (e.g. processes "living" on one socket), but they are implementation dependent.
• MPI_Win_allocate_shared
  – Creates a shared memory window on the "intra-node" communicator C.
  – The window can also be used for RMA calls.
Shared Memory Window Creation
int MPI_Win_allocate_shared( MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, void *baseptr, MPI_Win *win )

(Figure: each rank passes its own size and receives its own baseptr; by default, the per-rank segments form one contiguous region.)
Shared Memory Window Creation
(Figure: with info = { "alloc_shared_noncontig" = "true", ... }, the per-rank segments may be padded and are not necessarily contiguous.)
Shared Memory Window Access
• Three possibilities:
  – Direct address arithmetic based on baseptr.
    • Works only if the window memory is contiguous.
  – Use MPI_Win_shared_query to obtain base pointers to the memory chunks of other ranks (see the sketch below).
  – Use MPI_Put / MPI_Get.
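A sketch of the whole sequence: split off a per-node communicator, allocate a shared window of 100 doubles per process (an illustrative size), and access rank 0's segment directly via MPI_Win_shared_query. MPI_Win_fence is used here as a simple synchronization; other synchronization schemes are possible:

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Communicator of all processes on the same node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank;
    MPI_Comm_rank(node, &nrank);

    /* Each process contributes 100 doubles to the shared window. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(100 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    /* Query rank 0's segment to read it through an ordinary pointer. */
    MPI_Aint qsize; int qdisp; double *base0;
    MPI_Win_shared_query(win, 0, &qsize, &qdisp, &base0);

    mine[0] = (double)nrank;
    MPI_Win_fence(0, win);            /* simple synchronization */
    double first = base0[0];          /* rank 0's value, via load/store */
    (void)first;

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}
```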
Shared Memory Windows
Summary and Advice
Summary
• Shared memory windows can be useful in many scenarios.
  – E.g. large replicated read-only data structures can be stored only once per node.
    • Saves memory; an alternative to threading.
Advice
• Allowing a non-contiguous layout of the memory window may improve performance.
  – Especially on non-uniform memory access (NUMA) machines: local memory chunks can be allocated close to the "owning" process.
• Therefore, consider specifying the entry "alloc_shared_noncontig" = "true" in the MPI_Info object.
MPI I/O
Just one Remark
MPI I/O
• For the full reference, see
  – MPI 3.1 Standard, Chapter 13
  – http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node305.htm
• Consider also using higher-level I/O libraries (often built on top of MPI I/O),
  – e.g. Parallel HDF5, Parallel NetCDF, etc.
Conclusion
Overall Summary and General Advice
Conclusion

• MPI is much more than message passing.
  – Remote and shared memory access
  – Collective operations on topology, intra- or inter-communicators
  – Scalable I/O
  – and more…
• Topics not covered in this talk (with references to the MPI 3.1 Standard):
  – Persistent communication requests (Section 3.9)
  – Process groups (Chapter 6)
  – MPI environmental management, in particular error handling (Chapter 8)
  – (Dynamic) process creation and management (Chapter 9)
  – MPI I/O (Chapter 13)
Alternatives to MPI
• Partitioned Global Address Space (PGAS) languages and libraries, e.g.
  – Chapel (Cray)
  – Co-Array Fortran
  – UPC (Unified Parallel C)
  – …
• Promising approaches that might be worth a try
  – (but MPI is still predominant in HPC, and it looks like this won't change in the near future).
General Advice

• Before rushing into programming with MPI,
  – consider using higher-level libraries (built on top of MPI) which could potentially serve your use case.
  – Examples:
    • PETSc (http://www.mcs.anl.gov/petsc/)
    • libMesh (http://libmesh.github.io/)
    • and many, many more…
      – e.g. for linear algebra: http://www.netlib.org/utk/people/JackDongarra/la-sw.html
• Documentation:
  – Use the MPI Standard documents as a complementary (if not primary) source.
  – Good books are out there (e.g. see the references below).
References
• MPI Standard documents: http://mpi-forum.org/docs/docs.html
• In particular the MPI-3.1 Standard: http://mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
• William Gropp, Ewing Lusk, Anthony Skjellum: "Using MPI: Portable Parallel Programming with the Message-Passing Interface", Volume 1, MIT Press, 1999
• William Gropp, Ewing Lusk, Rajeev Thakur: "Using MPI-2: Advanced Features of the Message-Passing Interface", MIT Press, 1999
• William Gropp, Torsten Hoefler, Rajeev Thakur, Ewing Lusk: "Using Advanced MPI: Modern Features of the Message-Passing Interface", MIT Press, 2014
Advertisement
• Have a look at the PRACE Code Vault:
– http://www.prace-ri.eu/prace-codevault/
– Feedback and even code contributions are warmly welcome!