2.1
Collective Communication
Involves a set of processes, defined by an intra-communicator. Message tags are not present. Principal collective operations:
• MPI_BCAST() - Broadcast from root to all other processes
• MPI_GATHER() - Gather values from a group of processes
• MPI_SCATTER() - Scatter a buffer in parts to a group of processes
• MPI_REDUCE() - Combine values on all processes to a single value
2.2
Broadcast
Sending the same message to all processes concerned with the problem.
Multicast - sending the same message to a defined group of processes.
[Figure: broadcast illustrated. Processes 0 through p − 1 each call bcast(); the data in the root's buffer (buf) is copied into every process's buffer. The rows of the diagram show the Action, the Code, and the MPI form.]
2.3
Broadcast Illustrated
2.4
Broadcast (MPI_BCAST)
One-to-all communication: the same data is sent from the root process to all others in the communicator.
MPI_BCAST(data, size, MPI_Datatype, root, MPI_COMM)
• All processes must specify the same root (rank) and communicator
2.5
Reduction (MPI_REDUCE)
The reduction operation allows you to:
• Collect data from each process
• Reduce the data to a single value
• Store the result on the root process, or
• Store the result on all processes
• The reduction function works with arrays
• Operations: sum, product, min, max, and, …
COMPE472 Parallel Computing 2.6
Reduction Operation (SUM)
2.7
MPI_REDUCE
• MPI_REDUCE(snd_buf, rcv_buf, count, type, op, root, comm, ierr)
– snd_buf: input array of type type containing local values
– rcv_buf: output array of type type containing global results
– count (INTEGER): number of elements of snd_buf and rcv_buf
– type (INTEGER): MPI type of snd_buf and rcv_buf
– op (INTEGER): parallel operation to be performed
– root (INTEGER): MPI id of the process storing the result
– comm (INTEGER): communicator of the processes involved in the operation
– ierr (INTEGER): output, error code (ierr = 0 means no error occurred)
• MPI_ALLREDUCE(snd_buf, rcv_buf, count, type, op, comm, ierr)
– The root argument is missing; the result is stored on all processes.
2.8
Predefined Reduction Operations
2.9
Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <mpi.h>

#define MAXSIZE 100

int main(int argc, char **argv)
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
2.10
Example cont.

    if (myid == 0) {
        /* open input file and initialize data */
        strcpy(fn, getenv("PWD"));
        strcat(fn, "/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) {
            fscanf(fp, "%d", &data[i]);
        }
    }
2.11
Example cont.

    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* add portion of data */
    x = MAXSIZE / numprocs;   /* must be an integer */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++) {
        myresult += data[i];
    }
    printf("I got %d from %d\n", myresult, myid);

    /* compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) {
        printf("The sum is %d.\n", result);
    }

    MPI_Finalize();
    return 0;
}
2.12
MPI_Scatter
• One-to-all communication: different data is sent from the root process to all others in the communicator
MPI_SCATTER(sndbuf, sndcount, sndtype, rcvbuf, rcvcount, rcvtype, root, comm, ierr)
– Argument definitions are like those of other MPI subroutines
– sndcount is the number of elements sent to each process, not the size of sndbuf, which should be sndcount times the number of processes in the communicator
– The sender arguments are significant only at the root
MPI_Scatter Example
• Suppose there are four processes including the root (process 0). A 16-element array on the root should be distributed among the processes.
• Every process should include the following line:
2.13
2.14
MPI_Gather
• Different data is collected by the root process from all other processes in the communicator.
• It is the opposite of Scatter.
MPI_GATHER(sndbuf, sndcount, sndtype, rcvbuf, rcvcount, rcvtype, root, comm, ierr)
– Argument definitions are like those of other MPI subroutines
– rcvcount is the number of elements collected from each process, not the size of rcvbuf, which should be rcvcount times the number of processes in the communicator
– The receiver arguments are significant only at the root
2.15
Scatter/Gather
2.16
Scatter/Gather Example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid, j, data[100], tosum[25], sums[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        for (j = 0; j < 100; j++)
            data[j] = j + 1;
        printf("The data to sum : ");
        for (j = 0; j < 100; j++)
            printf(" %d", data[j]);
        printf("\n");
    }
2.17
Scatter/Gather Example

    MPI_Scatter(data, 25, MPI_INT, tosum, 25, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Node %d has numbers to sum :", myid);
    for (j = 0; j < 25; j++)
        printf(" %d", tosum[j]);
    printf("\n");

    sums[myid] = 0;
    for (j = 0; j < 25; j++)
        sums[myid] += tosum[j];
    printf("Node %d computes the sum %d\n", myid, sums[myid]);
2.18
Scatter/Gather Example

    MPI_Gather(&sums[myid], 1, MPI_INT, sums, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (myid == 0) {   /* after the gather, sums contains the four sums */
        printf("The four sums : ");
        printf("%d", sums[0]);
        for (j = 1; j < 4; j++)
            printf(" + %d", sums[j]);
        for (j = 1; j < 4; j++)
            sums[0] += sums[j];
        printf(" = %d, which should be 5050.\n", sums[0]);
    }
    MPI_Finalize();
    return 0;
}
2.19
MPI_Barrier()
• Stops processes until all processes within a communicator reach the barrier
• Almost never required in a parallel program
• Occasionally useful in measuring performance and load balancing
• C:
– int MPI_Barrier(MPI_Comm comm)
2.20
Barrier
2.21
Barrier routine
• A means of synchronizing processes by stopping each one until they all have reached a specific “barrier” call.
2.22
Barrier example
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <unistd.h>

/* some time-consuming functionality */
int function(int t)
{
    sleep(3 * t + 1);
    return 0;
}
2.23
Barrier example cont.

int main(int argc, char **argv)
{
    int MyRank;
    double s_time, l_time, g_time;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

    s_time = MPI_Wtime();
    function(MyRank);
    l_time = MPI_Wtime();

    /* wait for all to come together */
    MPI_Barrier(MPI_COMM_WORLD);
    g_time = MPI_Wtime();

    printf("Processor %d: LocalTime = %lf, GlobalTime = %lf\n",
           MyRank, l_time - s_time, g_time - s_time);

    MPI_Finalize();
    return 0;
}
2.24
Evaluating Parallel Programs
2.25
Sequential execution time, ts: Estimate by counting computational steps of best sequential algorithm.
Parallel execution time, tp: In addition to number of computational steps, tcomp, need to estimate communication overhead, tcomm:
tp = tcomp + tcomm
Elapsed parallel time
• MPI_Wtime() returns the number of seconds that have elapsed since some time in the past.
2.27
Communication Time
Many factors, including network structure and network contention. As a first approximation, use
tcomm = tstartup + n · tdata
tstartup is startup time, essentially time to send a message with no data. Assumed to be constant. tdata is transmission time to send one data word, also assumed constant, and there are n data words.
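As a worked illustration with hypothetical figures (the numbers below are not from the text, just a plug-in of the formula above):

```latex
% Assume t_{startup} = 10\,\mu s, \; t_{data} = 0.1\,\mu s \text{ per word}, \; n = 1000 \text{ words}
t_{comm} = t_{startup} + n\,t_{data}
         = 10\,\mu s + 1000 \times 0.1\,\mu s
         = 110\,\mu s
```

With these figures the startup term still contributes about 9% of the total, which is why short messages are dominated by tstartup.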
2.28
Idealized Communication Time

[Figure: tcomm plotted against the number of data items (n); a straight line whose intercept is the startup time tstartup and whose slope is tdata.]
2.29
Benchmark Factors
With ts, tcomp, and tcomm, can establish speedup factor and computation/communication ratio for a particular algorithm/implementation:
Both functions of number of processors, p, and number of data elements, n.
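The formulas themselves did not survive in this transcript; the conventional definitions, consistent with tp = tcomp + tcomm above, are:

```latex
S(p) = \frac{t_s}{t_p} = \frac{t_s}{t_{comp} + t_{comm}}
\qquad
\text{computation/communication ratio} = \frac{t_{comp}}{t_{comm}}
```

A larger computation/communication ratio means the communication overhead matters less, so the speedup S(p) comes closer to its ideal value.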
2.30
Factors give indication of scalability of parallel solution with increasing number of processors and problem size.
Computation/communication ratio will highlight effect of communication with increasing problem size and system size.