Lecture 6: Message Passing Interface (MPI)
Parallel Programming Models
Message Passing Model
• Used on distributed-memory MIMD architectures
• Multiple processes execute in parallel, asynchronously
• Process creation may be static or dynamic
• Processes communicate by using send and receive primitives
Parallel Programming Models
Example: Pi calculation
∫₀¹ f(x) dx = ∫₀¹ 4/(1+x²) dx = π ≈ w ∑ f(xᵢ)

f(x) = 4/(1+x²),  n = 10,  w = 1/n,  xᵢ = w(i − 0.5)

[Figure: midpoint-rule approximation of f(x) = 4/(1+x²) on [0, 1]; the x-axis is marked 0, 0.1, 0.2, ..., xᵢ, ..., 1]
Parallel Programming Models
Sequential Code

#include <stdio.h>
#define f(x) (4.0/(1.0+(x)*(x)))

int main(void)
{
    int n, i;
    float w, x, sum, pi;

    printf("n?\n");
    scanf("%d", &n);
    w = 1.0/n;
    sum = 0.0;
    for (i = 1; i <= n; i++) {
        x = w*(i-0.5);
        sum += f(x);
    }
    pi = w*sum;
    printf("%f\n", pi);
    return 0;
}
SPMD Parallel MPI Code

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define f(x) (4.0/(1.0+(x)*(x)))

int main(int argc, char *argv[])
{
    int myid, nproc, root, err;
    int n, i, start, end;
    float w, x, sum, pi;
    FILE *f1;

    err = MPI_Init(&argc, &argv);
    if (err != MPI_SUCCESS) {
        fprintf(stderr, "initialization error\n");
        exit(1);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    root = 0;
    if (myid == root) {
        f1 = fopen("indata", "r");
        fscanf(f1, "%d", &n);
        fclose(f1);
    }
    MPI_Bcast(&n, 1, MPI_INT, root, MPI_COMM_WORLD);
    w = 1.0/n;
    sum = 0.0;
    start = myid*(n/nproc);      /* assumes n is divisible by nproc */
    end = (myid+1)*(n/nproc);
    for (i = start; i < end; i++) {
        x = w*(i+0.5);           /* i runs from 0 here, so the midpoint is i+0.5 */
        sum += f(x);
    }
    MPI_Reduce(&sum, &pi, 1, MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);
    if (myid == root) {
        f1 = fopen("outdata", "w");
        fprintf(f1, "pi=%f\n", pi);
        fclose(f1);
    }
    MPI_Finalize();
    return 0;
}
Message-Passing Interface (MPI)
MPI_INIT (int *argc, char ***argv): Initiate an MPI computation.
MPI_FINALIZE (): Terminate a computation.
MPI_COMM_SIZE (comm, size): Determine number of processes.
MPI_COMM_RANK (comm, pid): Determine my process identifier.
MPI_SEND (buf, count, datatype, dest, tag, comm): Send a message.
MPI_RECV (buf, count, datatype, source, tag, comm, status): Receive a message.
• tag: message tag or MPI_ANY_TAG
• source: process id of source process or MPI_ANY_SOURCE
Message-Passing Interface (MPI)
Deadlock: MPI_SEND and MPI_RECV are blocking.
Consider the program where the two processes exchange data:

...
if (rank .eq. 0) then
    call mpi_send( abuf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr )
    call mpi_recv( buf, n, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, status, ierr )
else if (rank .eq. 1) then
    call mpi_send( abuf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, ierr )
    call mpi_recv( buf, n, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr )
endif

Both processes send first: if neither blocking send can complete before the matching receive is posted, the two processes wait on each other forever.
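One standard fix is MPI_Sendrecv, which posts the send and the receive as a single call so the library can pair them and no ordering deadlock is possible. A minimal C sketch (not from the lecture; it assumes exactly two processes):

```c
#include <mpi.h>

/* Deadlock-free exchange between ranks 0 and 1 using MPI_Sendrecv.
 * A sketch under the assumption of exactly two processes. */
int main(int argc, char *argv[])
{
    int rank, abuf[4] = {0}, buf[4];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int partner = 1 - rank;   /* rank 0 talks to 1, rank 1 talks to 0 */
    MPI_Sendrecv(abuf, 4, MPI_INT, partner, 0,   /* send half */
                 buf,  4, MPI_INT, partner, 0,   /* receive half */
                 MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}
```

An alternative fix with plain MPI_SEND/MPI_RECV is to reverse the call order in one of the two processes.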
Message-Passing Interface (MPI)
Communicators
If two processes use different contexts for communication, there can be no danger of their communication being confused.
Each MPI communicator contains a separate communication context; this defines a separate virtual communication space.
Communicator Handle: identifies the process group and context with respect to which the operation is to be performed
MPI_COMM_WORLD: contains all the processes in a parallel computation
Message-Passing Interface (MPI)
Collective Operations
These operations are executed collectively: every process in the process group must call the same communication routine.

Barrier: Synchronize all processes.
Broadcast: Send data from one process to all processes.
Gather: Gather data from all processes to one process.
Scatter: Scatter data from one process to all processes.
Reduction operations: addition, multiplication, etc. of distributed data.
Message-Passing Interface (MPI)
Collective Operations
MPI_BCAST (inbuf, incnt, intype, root, comm): 1-to-all
Ex: MPI_BCAST(A, 5, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: before the call, only root P0 holds A0 A1 A2 A3 A4; after MPI_BCAST, each of P0 P1 P2 P3 holds A0 A1 A2 A3 A4]
Message-Passing Interface (MPI)
Collective Operations
MPI_SCATTER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): 1-to-all
Ex: int A[100], B[25];MPI_SCATTER(A, 25, MPI_INT, B, 25, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: root P0's array A is divided into four 25-element blocks A0 A1 A2 A3; after MPI_SCATTER, process Pi holds block Ai in its array B]
Message-Passing Interface (MPI)
Collective Operations
MPI_GATHER (inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm): all-to-1
Ex: int A[100], B[25];MPI_GATHER(B, 25, MPI_INT, A, 25, MPI_INT, 0, MPI_COMM_WORLD);
[Figure: each process Pi holds a 25-element block Bi in its array B; after MPI_GATHER, root P0's array A holds B0 B1 B2 B3 in rank order]
Message-Passing Interface (MPI)
Collective Operations
Reduction operations: Combine the values in the input buffer of each process using an operator
Operations:
• MPI_MAX, MPI_MIN
• MPI_SUM, MPI_PROD
• MPI_LAND, MPI_LOR, MPI_LXOR (logical)
• MPI_BAND, MPI_BOR, MPI_BXOR (bitwise)
Message-Passing Interface (MPI)
Collective Operations
MPI_REDUCE (inbuf, outbuf, count, type, op, root, comm)
Returns the combined value to the output buffer of a single root process
Ex: int A[2], B[2];MPI_REDUCE(A, B, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
[Figure: the input buffers A on P0 P1 P2 P3 hold {5, 7}, {2, 4}, {0, 3}, {6, 2}; after MPI_REDUCE with MPI_MIN, root P0's output buffer B holds {0, 2}, the elementwise minimum]
Message-Passing Interface (MPI)
Collective Operations
MPI_ALLREDUCE (inbuf, outbuf, count, type, op, comm)

Returns the combined value to the output buffers of all processes (note: unlike MPI_REDUCE, there is no root argument)

Ex: int A[2], B[2];
MPI_ALLREDUCE (A, B, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
[Figure: the input buffers A on P0 P1 P2 P3 hold {5, 7}, {2, 4}, {0, 3}, {6, 2}; after MPI_ALLREDUCE with MPI_MIN, the output buffer B on every process holds {0, 2}]
Message-Passing Interface (MPI)
Asynchronous Communication
Data is distributed among processes which must then poll periodically for pending read and write requests
Local computation may interleave with the processing of incoming messages
Non-blocking send/receive

MPI_ISEND (buf, count, datatype, dest, tag, comm, request): Start a non-blocking send.
MPI_IRECV (buf, count, datatype, source, tag, comm, request): Start a non-blocking receive.
MPI_WAIT (MPI_Request *request, MPI_Status *status): Complete a non-blocking operation.
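A minimal sketch of a non-blocking exchange that overlaps local computation with the transfer (not from the lecture; it assumes exactly two processes):

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, out = 42, in = 0;
    MPI_Request sreq, rreq;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int partner = 1 - rank;

    /* Start both transfers; neither call blocks. */
    MPI_Isend(&out, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &sreq);
    MPI_Irecv(&in,  1, MPI_INT, partner, 0, MPI_COMM_WORLD, &rreq);

    /* ... local computation can proceed here, overlapped with the transfer ... */

    /* Complete both operations before reusing or reading the buffers. */
    MPI_Wait(&sreq, &status);
    MPI_Wait(&rreq, &status);

    MPI_Finalize();
    return 0;
}
```

The buffers passed to MPI_Isend/MPI_Irecv must not be touched until the corresponding MPI_Wait returns.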
Message-Passing Interface (MPI)
Asynchronous Communication
MPI_IPROBE (source, tag, comm, flag, status): Polls for a pending message without receiving it, and sets a flag. The message can then be received by using MPI_RECV.
MPI_PROBE (source, tag, comm, status): Blocks until the message is available.
MPI_GET_COUNT (status, datatype, count): Determines size of the message.
status (must be set by a previous probe):
• status.MPI_SOURCE
• status.MPI_TAG
Message-Passing Interface (MPI)
Asynchronous Communication
Ex:
int count, *buf, source;
MPI_Status status;

MPI_PROBE (MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
source = status.MPI_SOURCE;
MPI_GET_COUNT (&status, MPI_INT, &count);
buf = malloc(count*sizeof(int));
MPI_RECV (buf, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);
Message-Passing Interface (MPI)
Communicators
Communicator Handle: identifies the process group and context with respect to which the operation is to be performed
MPI_COMM_WORLD: contains all the processes in a parallel computation (default)
New communicators are formed by either including or excluding processes from an existing communicator.
MPI_COMM_SIZE() : Determine number of processes. MPI_COMM_RANK() : Determine my process identifier.
Message-Passing Interface (MPI)
Communicators
MPI_COMM_DUP (comm, newcomm): creates a new handle for the same process group
MPI_COMM_SPLIT (comm, color, key, newcomm): creates a new handle for a subset of a given process group
MPI_INTERCOMM_CREATE (comm, leader, peer, rleader, tag, inter): links processes in two groups
MPI_COMM_FREE (comm): destroys a handle
Message-Passing Interface (MPI)
Communicators
Ex: Two processes communicating with a new handle
MPI_Comm newcomm;
MPI_Status status;

MPI_COMM_DUP (MPI_COMM_WORLD, &newcomm);
if (myid == 0)
    MPI_SEND (A, 100, MPI_INT, 1, 0, newcomm);
else
    MPI_RECV (A, 100, MPI_INT, 0, 0, newcomm, &status);
MPI_COMM_FREE (&newcomm);
Message-Passing Interface (MPI)
Communicators
Ex: Creating a new group with 4 members
MPI_Comm comm, newcomm;
int myid, color;
...
MPI_COMM_RANK (comm, &myid);
if (myid < 4) color = 1;
else color = MPI_UNDEFINED;
MPI_COMM_SPLIT (comm, color, myid, &newcomm);
MPI_SCATTER (A, 10, MPI_INT, B, 10, MPI_INT, 0, newcomm);

Processes:         P0 P1 P2 P3 P4 P5 P6 P7
Ranks in comm:      0  1  2  3  4  5  6  7
Color:              1  1  1  1  ?  ?  ?  ?
Ranks in newcomm:   0  1  2  3

Processes that pass MPI_UNDEFINED (the "?" entries) receive MPI_COMM_NULL and are not members of newcomm.
Message-Passing Interface (MPI)
Communicators

Ex: Splitting processes into 3 independent groups

MPI_Comm comm, newcomm;
int myid, color;
...
MPI_COMM_RANK (comm, &myid);
color = myid % 3;
MPI_COMM_SPLIT (comm, color, myid, &newcomm);

Processes:         P0 P1 P2 P3 P4 P5 P6 P7
Ranks in comm:      0  1  2  3  4  5  6  7
Color:              0  1  2  0  1  2  0  1
Ranks in newcomm:   0  0  0  1  1  1  2  2

Within each color group, new ranks follow key order: e.g. the color-0 group {P0, P3, P6} gets newcomm ranks 0, 1, 2.
Message-Passing Interface (MPI)
Communicators
MPI_INTERCOMM_CREATE (comm, local_leader, peer_comm, remote_leader, tag, intercomm): links processes in two groups
comm: intracommunicator (within the group)
local_leader: leader within the group
peer_comm: parent communicator
remote_leader: the other group's leader within the parent communicator
Message-Passing Interface (MPI)
Communicators

Ex: Communication of processes in two different groups

MPI_Comm newcomm, intercomm;
MPI_Status status;
int myid, newid, color, count;
...
MPI_COMM_SIZE (MPI_COMM_WORLD, &count);
if (count % 2 == 0) {
    MPI_COMM_RANK (MPI_COMM_WORLD, &myid);
    color = myid % 2;
    MPI_COMM_SPLIT (MPI_COMM_WORLD, color, myid, &newcomm);
    MPI_COMM_RANK (newcomm, &newid);
    if (color == 0) {   // group 0 (even world ranks); remote leader is world rank 1
        MPI_INTERCOMM_CREATE (newcomm, 0, MPI_COMM_WORLD, 1, 99, &intercomm);
        MPI_SEND (msg, 1, type, newid, 0, intercomm);
    } else {            // group 1 (odd world ranks); remote leader is world rank 0
        MPI_INTERCOMM_CREATE (newcomm, 0, MPI_COMM_WORLD, 0, 99, &intercomm);
        MPI_RECV (msg, 1, type, newid, 0, intercomm, &status);
    }
    MPI_COMM_FREE (&intercomm);
    MPI_COMM_FREE (&newcomm);
}

[Figure: P0-P7 split by parity into two groups; each group's local_leader is its rank-0 process, the remote_leader is the other group's leader identified by its rank in MPI_COMM_WORLD, and each sender in group 0 sends to the process with the same newcomm rank in group 1]
Message-Passing Interface (MPI)
Communicators

Ex: Communication of processes in two different groups

Processes:                 P0 P1 P2 P3 P4 P5 P6 P7
Rank in MPI_COMM_WORLD:     0  1  2  3  4  5  6  7

After the split (color = myid % 2), group 0 holds the even world ranks and group 1 the odd ones:

Processes:                 P0 P2 P4 P6 | P1 P3 P5 P7
Rank in MPI_COMM_WORLD:     0  2  4  6 |  1  3  5  7
Rank in newcomm:            0  1  2  3 |  0  1  2  3

The local_leader of each group is its newcomm rank-0 process (P0 and P1); each group's remote_leader is the other group's leader, identified by its rank in MPI_COMM_WORLD.
Message-Passing Interface (MPI)
Derived Types
Allow noncontiguous data elements to be grouped together in a message.
Constructor functions:
MPI_TYPE_CONTIGUOUS (): constructs a data type from contiguous elements
MPI_TYPE_VECTOR (): constructs a data type from equal-sized blocks separated by a stride
MPI_TYPE_INDEXED (): constructs a data type with variable indices and sizes
MPI_TYPE_COMMIT (): commits a data type so that it can be used in communication
MPI_TYPE_FREE (): used to reclaim storage
Message-Passing Interface (MPI)
Derived Types
MPI_TYPE_CONTIGUOUS (count, oldtype, newtype): constructs data type from contiguous elements
Ex: MPI_TYPE_CONTIGUOUS (10, MPI_REAL, &newtype);
MPI_TYPE_VECTOR (count, blocklength, stride, oldtype, newtype): constructs data type from blocks separated by stride
Ex: MPI_TYPE_VECTOR (5, 1, 4, MPI_FLOAT, &floattype);

[Figure: memory layout of array A; floattype covers 5 single-element blocks at stride 4, i.e. A[0], A[4], A[8], A[12], A[16]]
Message-Passing Interface (MPI)
Derived Types
MPI_TYPE_INDEXED (count, blocklengths, indices, oldtype, newtype): constructs data type with variable indices and sizes
Ex: MPI_TYPE_INDEXED (3, Blengths, Indices, MPI_INT, &newtype);

Blengths:  2  3  1
Indices:   1  5  10
Data:      0  1  2  3  4  5  6  7  8  9  10
Blocks:    Block 0 = elements 1-2, Block 1 = elements 5-7, Block 2 = element 10
Message-Passing Interface (MPI)
Derived Types
MPI_TYPE_COMMIT (type): commit data type so that it can be used in communication
MPI_TYPE_FREE (type): used to reclaim storage