MPI Parallel Programming


Description

MPI Parallel Programming. Information Technology Services, The University of Hong Kong. By HPC/Grid Team, Email: [email protected].

Transcript of MPI Parallel Programming

Page 1: MPI Parallel Programming

Information Technology Services, HKU

MPI Parallel Programming
Information Technology Services, The University of Hong Kong

By HPC/Grid Team, Email: [email protected]

Page 2: MPI Parallel Programming


Must Know Before Programming…

• HKU HPC Facilities

– hpcpower2: 64-bit Linux cluster consisting of 24 nodes
  • each node has TWO 64-bit quad-core Intel Xeon CPUs running at 3 GHz

– hpcpower: 32-bit Linux cluster consisting of 178 nodes
  • 128 nodes of dual 2.8 GHz Xeon processors, and
  • 50 nodes of dual 3.06 GHz Xeon processors

Page 3: MPI Parallel Programming


How to get an account

• Download the form at http://www.hku.hk/cc/home/services/forms.htm 

CF-137e : High Performance Computing Cluster (hpcpower) Account Application (for Non-TOSI Staff/Students)

• Return the application form to CC office.

Page 4: MPI Parallel Programming


How to login HPCPOWER

• From any PC within the HKU campus network: ssh hpcpower (ssh is bundled with Linux). On Windows, download PuTTY to use SSH: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

• From a PC outside the HKU campus network: you must use HKUVPN to get a HKU campus network IP first.

Page 5: MPI Parallel Programming


File Transfer

• From/to any PC within the HKU campus network: scp/sftp hpcpower (bundled with Linux). On Windows, download WinSCP to use SSH: http://winscp.net/eng/download.php

• From a PC outside the HKU campus network: you must use HKUVPN to get a HKU campus network IP first.

Page 6: MPI Parallel Programming


Program Editing

• You can use the command vi, emacs or pico to edit programs. Please refer to the UNIX user's guide for details: http://www.hku.hk/cc/handbook/unix/unix_toc.html

• Microsoft Windows users: do not use a standard Microsoft Windows editor such as Notepad/Wordpad to edit files that will be used on the Linux system.

• To convert a Windows file to a UNIX file, enter:

dos2unix file.txt

Page 7: MPI Parallel Programming


Resource Management System

• After logging in to the cluster, the user is on the master node. When a program is run, it is also immediately run on the master. This is the "interactive mode", which is convenient for running simple commands like ls, vi, etc. or for editing/compiling a program.

• Any interactive program running on the master node will be killed without further notice.

Page 8: MPI Parallel Programming


PBS (Portable Batch System)

• Long computing jobs should be submitted through the batch system.

• The submitted job will be in a queue waiting for its turn, then will be sent to one or more compute node(s), which the job will have dedicated access to until it finishes.

• With PBS, the job will run faster and the cluster will be more efficiently utilized.

Page 9: MPI Parallel Programming


Basic PBS Commands

Command        Description
qsub           Submit a job to the queuing system
qdel           Delete a job that has been submitted to the queuing system
qstat / showq  List all information about queues and jobs

Page 10: MPI Parallel Programming


Sample PBS job script

• Sample PBS job script (pbs.cmd):

#!/bin/sh
#PBS -N test
#PBS -m ea
#PBS -q qdev
#PBS -l walltime=02:00:00
#PBS -l nodes=1:ppn=2
### Define number of processors used
NP=`wc -l < $PBS_NODEFILE`
cd $PBS_O_WORKDIR
mpirun -np $NP ./a.out

Page 11: MPI Parallel Programming


Sample PBS job script

• A line beginning with # is a comment.
• A line beginning with #PBS is a PBS directive.
• This submit script pbs.cmd specifies the name of the job (test), which queue to use (qdev), that it needs both processors on a single node (nodes=1:ppn=2), that it will run for at most 2 hours (walltime), and that TORQUE should email the user when the job exits or aborts (ea). Additionally, the user specifies where and what to execute (./a.out).

Page 12: MPI Parallel Programming


Submit a PBS job

$ qsub pbs.cmd

2234.hpcpower

• PBS returns a job identifier of the form jobid.hpcpower where jobid (2234) is an integer number assigned by PBS.

• The job identifier is needed for any actions involving the job, such as checking job status or deleting the job.

Page 13: MPI Parallel Programming


qsub options

• Resources requested on the command line take precedence over the directive lines in the script file.

• e.g. submitting the job with the command

qsub -l nodes=2:ppn=8 pbs.cmd

the job will run on 2 compute nodes with 8 processors each, instead of what is stated in the script file pbs.cmd.

Option            Action
qsub -l list      Set job resource list
qsub -N jobname   Set job name to jobname
qsub -q dest      Submit to queue dest

Page 14: MPI Parallel Programming


List all submitted jobs' status

$ qstat -a   (or $ qa)

hpcpower:
Job ID         Username  Queue   Jobname  SessID  NDS  TSK  Req'd Memory  Req'd Time  S  Elap Time
-------------  --------  ------  -------  ------  ---  ---  ------------  ----------  -  ---------
2230.hpcpower  ycleung   qdev    Test     5597    8    --   --            02:00       R  01:02
2231.hpcpower  ycleung   qprod   Job_1    11048   16   --   --            10:00       R  09:17
2233.hpcpower  chliu     oneday  huge     --      16   --   --            15:00       Q  --

Job information provided:
• Username: Job owner
• NDS: Number of nodes requested
• Req'd Time: Requested amount of wallclock time
• Elap Time: Elapsed time in the current job state
• S: Job state (E-Exit; R-Running; Q-Queuing)

Page 15: MPI Parallel Programming


qstat options

Option             Action
qstat -a           List all jobs' status
qstat -q           List all queues on the system
qstat -n           List all jobs' allocated node(s)
qstat -u username  List all jobs owned by user username
qstat -r           List all running jobs
qstat -f jobid     List all information known about the specified job (jobid)

Page 16: MPI Parallel Programming


Delete a Job

$ qdel 2234

where 2234 is the job id.

Page 17: MPI Parallel Programming

Information Technology Services, HKU

MPI (Message Passing Interface)

Page 18: MPI Parallel Programming


MPI (Message Passing Interface)

• Is a communication protocol for parallel programming
• Is a library of functions, not a language
• MPI software is available for free
• MPICH and Open MPI are the most common free versions
• Many vendors have their own optimized versions of MPI - IBM, Sun, SGI, Myrinet…

Page 19: MPI Parallel Programming


What is MPI

[Diagram: processes P0-P3 connected through a communication network via the Message Passing Interface]

Process: A process is a set of executable instructions (a program) which runs on a processor. For maximum performance, each CPU (or core in a multicore machine) is assigned just a single process.

Message Passing: The method by which data from one processor's memory is copied to the memory of another processor.

Page 20: MPI Parallel Programming


What is MPI

• Same program running on many processors at the same time.
• Single-Program-Multiple-Data (SPMD) style:

if (my_rank == 0) then
    call Master(....)
else
    call Worker(....)
endif

Page 21: MPI Parallel Programming


What is MPI

• MPI codes can run on parallel machines with distributed memory and shared memory architectures.
• High portability.
• All variables are private to each process.
• Processes communicate via specific communication calls.

Page 22: MPI Parallel Programming


General MPI Program Structure

#include <mpi.h>                          /* MPI include file */
void main (int argc, char *argv[])
{
    int np, rank, ierr;                   /* variable declarations */

    ierr = MPI_Init(&argc, &argv);        /* initialize MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    /* do work and make message passing calls */

    ierr = MPI_Finalize();                /* terminate MPI environment */
}

Page 23: MPI Parallel Programming


MPI Header File

• C : #include "mpi.h"
• FORTRAN : include 'mpif.h'

Page 24: MPI Parallel Programming


MPI Function Format

• All MPI functions have names that begin with the prefix MPI_ to avoid name collisions.
• C function names are case sensitive but Fortran function names are not.

C :       ierror = MPI_Xxxx(parameter, ...);
FORTRAN : call MPI_Xxxx(argument, ..., ierror)

Page 25: MPI Parallel Programming


Initialising MPI environment

• C :
int MPI_Init(int *argc, char ***argv);

• FORTRAN :
MPI_Init(ierror)
INTEGER ierror

Page 26: MPI Parallel Programming


Exit MPI environment

• C :
int MPI_Finalize()

• FORTRAN :
MPI_Finalize(ierror)
INTEGER ierror

Page 27: MPI Parallel Programming


Communicator

• A communicator is a collection of processes that can send messages to each other.
• MPI_COMM_WORLD is predefined and consists of all the processes.

[Diagram: six processes, ranks 0-5, inside the communicator MPI_COMM_WORLD]

Page 28: MPI Parallel Programming


Size

• How many processes are contained within a communicator?

• C :
int MPI_Comm_size(MPI_Comm comm, int *size)

• FORTRAN :
MPI_Comm_size(comm, size, ierror)
INTEGER comm, size, ierror

Page 29: MPI Parallel Programming


Rank

• How do we identify different processes?

• C :
int MPI_Comm_rank(MPI_Comm comm, int *rank);

• FORTRAN :
MPI_Comm_rank(comm, rank, ierror)
INTEGER comm, rank, ierror

Rank values range from 0 to N-1, where N is the number of processes in the communicator.

Page 30: MPI Parallel Programming


A First MPI Program : Helloworld

• FORTRAN

PROGRAM Helloworld
INCLUDE 'mpif.h'
INTEGER nproc, myrank, ierr
CALL MPI_Init(ierr)
CALL MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
CALL MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
PRINT *, "Hello World! I'm process ", myrank, " of ", nproc
CALL MPI_Finalize(ierr)
END PROGRAM Helloworld

• C

#include <stdio.h>
#include "mpi.h"
void main(int argc, char **argv)
{
    int nproc, myrank, ierr;
    ierr = MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello World! I'm process %d of %d\n", myrank, nproc);
    ierr = MPI_Finalize();
}

Page 31: MPI Parallel Programming


Compile MPI in HPCPOWER

• C :
pgcc -Mmpi program.c

• FORTRAN (FIXED FORM) :
pgf90 -Mmpi program.f

• FORTRAN (FREE FORM) :
pgf90 -Mmpi -Mfree program.f90

Page 32: MPI Parallel Programming


Execute MPI in HPCPOWER

• Running 4 processes on the interactive node:

mpirun -np 4 ./a.out

Hello World! I'm process 0 of 4
Hello World! I'm process 1 of 4
Hello World! I'm process 2 of 4
Hello World! I'm process 3 of 4

Page 33: MPI Parallel Programming


What is in a message

• An MPI message is an array of elements of a particular MPI datatype.

Page 34: MPI Parallel Programming


MPI Datatype

MPI Fortran Datatype   Fortran Datatype   | MPI C Datatype      C Datatype
MPI_CHARACTER          character(1)       | MPI_CHAR            signed char
                                          | MPI_UNSIGNED_CHAR   unsigned char
MPI_INTEGER            integer            | MPI_INT             signed int
                                          | MPI_SHORT           signed short int
                                          | MPI_LONG            signed long int
                                          | MPI_UNSIGNED_SHORT  unsigned short int
                                          | MPI_UNSIGNED        unsigned int
                                          | MPI_UNSIGNED_LONG   unsigned long int
MPI_REAL               real               | MPI_FLOAT           float
MPI_DOUBLE_PRECISION   double precision   | MPI_DOUBLE          double
                                          | MPI_LONG_DOUBLE     long double
MPI_COMPLEX            complex            |
MPI_LOGICAL            logical            |

Page 35: MPI Parallel Programming

Information Technology Services, HKU

Point-to-point communication

Page 36: MPI Parallel Programming


Point-to-point communication

A point-to-point communication always involves exactly two processes. One process acts as the sender and the other acts as the receiver.

[Diagram: six processes, ranks 0-5, in a communicator; a message travels from the source process to the dest process]

Page 37: MPI Parallel Programming


Point-to-point communication

• Communication between 2 processes.
• The source process sends a message to the destination process.
• Communication takes place within a communicator.
• The destination process is identified by its rank in the communicator.

Page 38: MPI Parallel Programming


Sending Messages

• C :
int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

• FORTRAN :
MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
<type> BUF(*)
INTEGER count, datatype, dest, tag, comm
INTEGER ierror

Page 39: MPI Parallel Programming


Receiving Messages

• C :
int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

• FORTRAN :
MPI_Recv(buf, count, datatype, source, tag, comm, status, ierror)
<type> BUF(*)
INTEGER count, datatype, source, tag, comm
INTEGER status(MPI_STATUS_SIZE), ierror

Page 40: MPI Parallel Programming


Messages

• Messages = envelope + message body
• Envelope
  1. Source – the sending process
  2. Destination – the receiving process
  3. Communicator – specifies the group of processes to which both source and destination belong
  4. Tag – used to classify messages
• Message Body
  1. Buffer – the message data
  2. Datatype – type of the message data
  3. Count – number of items of type datatype in the buffer

Page 41: MPI Parallel Programming


Tag

• The tag is a number specified by the programmer for each message. It is attached to the message being sent.
• When the destination process receives the message, it can check the tag and act accordingly.

Page 42: MPI Parallel Programming


Message Matching Rules

• Sender must specify a valid destination RANK.
• Receiver must specify a valid source RANK.
• Sender and receiver must use the same COMMUNICATOR.
• TAG must match.
• DATATYPE must match.
• Receiver's buffer must be large enough.

Page 43: MPI Parallel Programming


Example : Greetings (Fortran)

PROGRAM greetings
include 'mpif.h'
integer my_rank
integer p
integer source
integer dest
integer tag
character*100 message
character*10 digit_string
integer size
integer status(MPI_STATUS_SIZE)
integer ierr

call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)

Page 44: MPI Parallel Programming


Example : Greetings (Fortran)

if (my_rank /= 0) then
    write(digit_string, FMT="(I3)") my_rank
    message = 'Greetings from process ' // trim(digit_string) // '!'
    dest = 0
    tag = 0
    call MPI_Send(message, len_trim(message), &
         MPI_CHARACTER, dest, tag, MPI_COMM_WORLD, ierr)
else ! my_rank == 0
    do source = 1, p-1
        tag = 0
        call MPI_Recv(message, len(message), &
             MPI_CHARACTER, source, tag, &
             MPI_COMM_WORLD, status, ierr)
        write(6, FMT="(A)") message
    enddo
endif
call MPI_Finalize(ierr)
END PROGRAM greetings

Page 45: MPI Parallel Programming


Example : Greetings (C)

#include <stdio.h>
#include <string.h>
#include "mpi.h"
main(int argc, char* argv[]) {
    int my_rank;          /* rank of process       */
    int p;                /* number of processes   */
    int source;           /* rank of sender        */
    int dest;             /* rank of receiver      */
    int tag = 0;          /* tag for messages      */
    char message[100];    /* storage for message   */
    MPI_Status status;    /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);
    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

Page 46: MPI Parallel Programming


Example : Greetings (C)

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else { /* my_rank == 0 */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    /* Shut down MPI */
    MPI_Finalize();
} /* main */

Page 47: MPI Parallel Programming


Example : Greetings

% mpirun -np 4 ./a.out

Greetings from process 1!
Greetings from process 2!
Greetings from process 3!

Page 48: MPI Parallel Programming


MPI_Sendrecv

MPI_Sendrecv(*sendbuf, sendcount, sendtype, dest, sendtag,
             *recvbuf, recvcount, recvtype, source, recvtag,
             comm, *status)
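As an illustration (not part of the original slides), a minimal C sketch of a two-process exchange with MPI_Sendrecv; the variable names are assumptions, and rank is assumed to come from MPI_Comm_rank:

/* Each rank sends its own rank number to the partner and receives
   the partner's in a single call. MPI pairs the send and receive
   internally, so this cannot deadlock the way back-to-back
   MPI_Send / MPI_Recv calls can. Assumes exactly 2 processes. */
int partner = (rank == 0) ? 1 : 0;
int sendmsg = rank, recvmsg;
MPI_Status status;
MPI_Sendrecv(&sendmsg, 1, MPI_INT, partner, 0,
             &recvmsg, 1, MPI_INT, partner, 0,
             MPI_COMM_WORLD, &status);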

Page 49: MPI Parallel Programming


Wildcard

• MPI_Recv can use MPI_ANY_SOURCE and MPI_ANY_TAG as arguments.
• There is no wildcard for the communicator.

call MPI_Recv(message, 1, MPI_INTEGER, &
     MPI_ANY_SOURCE, MPI_ANY_TAG, &
     MPI_COMM_WORLD, status, ierror)

• MPI_Send cannot use wildcards.

Page 50: MPI Parallel Programming


Communication Envelope

Page 51: MPI Parallel Programming


status

• Envelope information is returned in the status argument of MPI_Recv.
• In Fortran, status has 3 accessible elements:
  – status(MPI_SOURCE)
  – status(MPI_TAG)
  – status(MPI_ERROR)
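In C, status is an MPI_Status structure with fields status.MPI_SOURCE, status.MPI_TAG and status.MPI_ERROR. A hedged C sketch (variable names are assumptions) showing how a wildcard receive recovers the actual envelope:

int value;
MPI_Status status;
/* Accept a message from any sender with any tag ... */
MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);
/* ... then inspect the envelope to see who actually sent it. */
printf("got %d from rank %d, tag %d\n",
       value, status.MPI_SOURCE, status.MPI_TAG);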

Page 52: MPI Parallel Programming


Received Message Count

• status also returns information on the size of the message (i.e. number of elements) received. However, this is not directly accessible.
• To determine the size of the message received, we can call:

call MPI_Get_count(status, datatype, count, ierror)
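For reference, a small C sketch of the same idea (buffer size and peer rank are assumptions): the receiver posts a buffer larger than needed, then asks how many elements actually arrived:

double buf[100];
MPI_Status status;
int count;
/* Receive up to 100 doubles from rank 0, tag 0. */
MPI_Recv(buf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
/* count is set to the number of doubles actually received (<= 100). */
MPI_Get_count(&status, MPI_DOUBLE, &count);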

Page 53: MPI Parallel Programming


Message Order Preservation

• Messages do not overtake each other: two messages sent from the same source to the same destination are received in the order they were sent.

[Diagram: six processes, ranks 0-5, in a communicator; two messages from source to dest keep their order]

Page 54: MPI Parallel Programming


System Buffer

[Diagram: a message travels from the sender's send buffer, possibly via a system buffer, across the network into the receiver's receive buffer]

Page 55: MPI Parallel Programming


Blocking Communication

• MPI_Recv blocks until the message has been received into the buffer specified by the buffer argument.
• If the message is not available, the process remains blocked until a message becomes available.

Page 56: MPI Parallel Programming


Blocking Communication

• MPI_Send blocks until
  – the whole message has arrived at the receiver, OR
  – the whole message has been copied into a system buffer.
• After MPI_Send returns, the send buffer is safe for reuse.

Page 57: MPI Parallel Programming


Blocking Communication

• Correct version - succeeds even if no system buffer is available.

IF (rank == 0) THEN
    call MPI_SEND(sendbuf, .... to P(1))
    call MPI_RECV(recvbuf, .... from P(1))
ELSE ! rank == 1
    call MPI_RECV(recvbuf, .... from P(0))
    call MPI_SEND(sendbuf, .... to P(0))
ENDIF

Page 58: MPI Parallel Programming


Blocking Communication

• Common bug - always deadlocks: both processes wait in MPI_RECV and neither reaches its send.

IF (rank == 0) THEN
    call MPI_RECV(recvbuf, .... from P(1))
    call MPI_SEND(sendbuf, .... to P(1))
ELSE ! rank == 1
    call MPI_RECV(recvbuf, .... from P(0))
    call MPI_SEND(sendbuf, .... to P(0))
ENDIF

Page 59: MPI Parallel Programming


Blocking Communication

• Unsafe - at least one of the two messages sent must be buffered.
• Deadlocks if the system buffer is not big enough.

IF (rank == 0) THEN
    call MPI_SEND(sendbuf, .... to P(1))
    call MPI_RECV(recvbuf, .... from P(1))
ELSE ! rank == 1
    call MPI_SEND(sendbuf, .... to P(0))
    call MPI_RECV(recvbuf, .... from P(0))
ENDIF

Page 60: MPI Parallel Programming


Non-Blocking Communication

• One can improve performance by overlapping communication and computation.

Page 61: MPI Parallel Programming


Non-Blocking Send

• A non-blocking send start call initiates the send operation, but does not complete it.
• The send start call returns before the message is copied out of the send buffer.
• Communication may proceed concurrently with computation.
• A separate send complete operation is needed to complete the communication, i.e. to verify that the data has been copied out of the send buffer.

Page 62: MPI Parallel Programming


Non-Blocking Send Example

call MPI_Isend(x, 1, MPI_INTEGER, &
     1, 0, MPI_COMM_WORLD, request1, ierror)
.....
! Do sth here while message being sent.
call MPI_Wait(request1, status, ierror)

Page 63: MPI Parallel Programming


MPI_Isend Syntax

MPI_Isend(buf, count, datatype, dest, &
     tag, comm, request, ierror)
<type> buf(*)
INTEGER, intent(in) :: count, datatype, dest, tag, comm
INTEGER, intent(out) :: request, ierror

Page 64: MPI Parallel Programming


MPI_Isend

• The sender should not access any part of the send buffer after MPI_Isend is called, until the send completes.

Page 65: MPI Parallel Programming


Request

• request is a handle, i.e. the object referenced by request is system defined and cannot be directly accessed by the user.
• In FORTRAN, every request should be declared as an integer.

Page 66: MPI Parallel Programming


Request

• Its purpose is to identify the operation started by the non-blocking call.
• It matches the operation that initiates the communication with the operation that terminates it.
• It stores information about the status of the pending communication operation.

Page 67: MPI Parallel Programming


Communication Completion

• Completion of a send indicates that the whole message has been copied out and the sender is now free to update the locations in the send buffer.
• It doesn't indicate that the message has been received; rather, the whole message may only have been put in a system buffer.

Page 68: MPI Parallel Programming


MPI_Wait

• MPI_Wait blocks until the operation identified by request completes.
• When MPI_Wait returns, request is set to MPI_REQUEST_NULL. This means there is no pending operation associated with request. The opaque object associated with it is also deallocated.

Page 69: MPI Parallel Programming


MPI_Wait Syntax

MPI_Wait(request, status, ierror)
INTEGER, intent(inout) :: request, ierror
INTEGER, intent(out) :: status(MPI_STATUS_SIZE)

Page 70: MPI Parallel Programming


MPI_Test

• MPI_Test is the non-blocking version of MPI_Wait.

MPI_Test(request, flag, status, ierror)
LOGICAL, intent(out) :: flag
INTEGER, intent(inout) :: request, ierror
INTEGER, intent(out) :: status(MPI_STATUS_SIZE)

Page 71: MPI Parallel Programming


MPI_Test

• MPI_Test returns immediately. It does not wait for the completion of the communication operation.
• It sets flag == .TRUE. if the operation identified by request is complete. It also sets request to MPI_REQUEST_NULL.
• It sets flag == .FALSE. if the operation identified by request is not complete.

Page 72: MPI Parallel Programming


Nonblocking Receive Example

• Method 1: MPI_Wait

MPI_Irecv(buf, …, req);
… do work not using buf
MPI_Wait(req, status);
… do work using buf

• Method 2: MPI_Test

MPI_Irecv(buf, …, req);
MPI_Test(req, flag, status);
while (flag == 0) {      /* loop while the receive is still incomplete */
    … do work not using buf
    MPI_Test(req, flag, status);
}
… do work using buf

Page 73: MPI Parallel Programming


MPI_Irecv

• The receiver should not access any part of the receive buffer after MPI_Irecv is called, until the receive completes.

Page 74: MPI Parallel Programming


Multiple Completion

• MPI provides routines which test multiple communications at once:
  – MPI_WAITANY, MPI_TESTANY
  – MPI_WAITALL, MPI_TESTALL
  – MPI_WAITSOME, MPI_TESTSOME

Page 75: MPI Parallel Programming


Probe

• The MPI_Probe and MPI_Iprobe operations allow incoming messages to be checked for, without actually receiving them.
• The user can then decide how to receive them, based on the information returned by the probe.
• e.g. allocate memory for the receive buffer according to the length of the probed message.
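A hedged C sketch of that pattern (variable names are assumptions; assumes <stdlib.h> is included for malloc): probe first, size the buffer from the probed message, then receive:

MPI_Status status;
int count;
/* Block until some message is available, without receiving it. */
MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
/* Ask how many doubles the pending message contains. */
MPI_Get_count(&status, MPI_DOUBLE, &count);
double *buf = malloc(count * sizeof(double));
/* Now receive exactly that message into the right-sized buffer. */
MPI_Recv(buf, count, MPI_DOUBLE, status.MPI_SOURCE, status.MPI_TAG,
         MPI_COMM_WORLD, &status);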

Page 76: MPI Parallel Programming


Cancel

• The MPI_Cancel operation allows pending communications to be cancelled.

Page 77: MPI Parallel Programming

Information Technology Services, HKU

Collective communication

Page 78: MPI Parallel Programming


Collective communication

A collective communication involves communication among all processes in a process group (communicator).

[Diagram: six processes, ranks 0-5, in a communicator; the source process sends to all the others, e.g. MPI_Bcast]

Page 79: MPI Parallel Programming


Collective communication

• All collective communication events are blocking.
• The call to the collective routine typically replaces several point-to-point calls.
• The source code is much more readable.
• Optimized forms of the collective routines are often faster than the equivalent operation expressed in terms of point-to-point routines.

Page 80: MPI Parallel Programming


Collective communication

[Diagram: broadcast, scatter/gather, and reduction patterns; e.g. a reduction sums the values 1, 3, 5 and 7 held by four processes into 16]

Page 81: MPI Parallel Programming


MPI_Barrier

• MPI_Barrier(comm)
• Creates a barrier synchronization in a group (communicator): each process blocks until all processes in the group have reached the barrier.

Page 82: MPI Parallel Programming


MPI_Bcast

Page 83: MPI Parallel Programming


MPI_Bcast

• Correct - call MPI_Bcast on all nodes:

MPI_Bcast (message, … , 0, MPI_COMM_WORLD)

• Common mistake:

if (I am master) then
    MPI_Bcast (message, … , 0, MPI_COMM_WORLD)
else
    MPI_Recv(message, … , 0, MPI_COMM_WORLD)
endif
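A minimal C sketch of the correct pattern (the variable n and its value are assumptions): every rank, root included, makes the identical call:

int n;
if (rank == 0) n = 100;     /* only the root has the value initially */
/* Collective: ALL ranks call MPI_Bcast with root 0. */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* After the call, every process has n == 100. */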

Page 84: MPI Parallel Programming


MPI_Scatter
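The original slide here is a diagram. As a substitute illustration, a hedged C sketch (chunk size and buffer names are assumptions) of MPI_Scatter handing one slice of a root-owned array to each process; MPI_Gather reverses the operation with the same argument order:

#define CHUNK 4
double sendbuf[64];    /* significant on the root only; np*CHUNK elements */
double recvbuf[CHUNK];
/* Rank 0 sends CHUNK doubles to each rank (itself included);
   rank k receives elements k*CHUNK .. k*CHUNK+CHUNK-1. */
MPI_Scatter(sendbuf, CHUNK, MPI_DOUBLE,
            recvbuf, CHUNK, MPI_DOUBLE,
            0, MPI_COMM_WORLD);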

Page 85: MPI Parallel Programming


MPI_Gather

Page 86: MPI Parallel Programming


MPI_Allgather

Page 87: MPI Parallel Programming


MPI_Reduce
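The original slide here is a diagram. As a substitute illustration, a hedged C sketch (variable names are assumptions): each rank contributes a local value and rank 0 receives the combined result; MPI_Allreduce would instead leave the result on every rank:

double local = rank + 1.0;   /* each rank's contribution       */
double global;               /* significant on rank 0 only     */
MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
/* On rank 0, global == 1 + 2 + ... + np. */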

Page 88: MPI Parallel Programming


MPI_Allreduce

Page 89: MPI Parallel Programming


MPI_Scan

Page 90: MPI Parallel Programming


MPI Reduce and Scan predefined operations

MPI OP       Operation               C types
MPI_MAX      Maximum                 integer, float
MPI_MIN      Minimum                 integer, float
MPI_SUM      Sum                     integer, float
MPI_PROD     Product                 integer, float
MPI_LAND     Logical AND             integer
MPI_BAND     Bitwise AND             integer, MPI_BYTE
MPI_LOR      Logical OR              integer
MPI_BOR      Bitwise OR              integer, MPI_BYTE
MPI_LXOR     Logical exclusive OR    integer
MPI_BXOR     Bitwise exclusive OR    integer, MPI_BYTE
MPI_MAXLOC   Max value and location  float, double, long double
MPI_MINLOC   Min value and location  float, double, long double

Page 91: MPI Parallel Programming


Example : Matrix-vector Multiplication

Schematic of parallel decomposition for matrix-vector multiplication, A = B*C. The vector A is depicted in yellow. The matrix B and vector C are depicted in multiple colors representing the portions (columns and elements, respectively) assigned to each processor.

Page 92: MPI Parallel Programming


Example : Matrix-vector Multiplication

A = B*C

[Diagram: a 4x4 example. Each processor Pk (k = 0..3) holds column k of B and element c(k) of C, and forms the partial products b(i,k)*c(k) for i = 0..3. A reduction (SUM) across P0-P3 then yields a(i) = b(i,0)*c(0) + b(i,1)*c(1) + b(i,2)*c(2) + b(i,3)*c(3).]
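A hedged C sketch of this decomposition (assumes np == 4 and that each rank's column of B and element of C are already filled in; names are assumptions):

#define N 4
double bcol[N];     /* this process's column of B: b(0,k) .. b(N-1,k) */
double c_k;         /* this process's element of C                    */
double partial[N], a[N];
int i;
/* Local products: partial(i) = b(i,k) * c(k). */
for (i = 0; i < N; i++)
    partial[i] = bcol[i] * c_k;
/* Element-wise sum across all 4 processes; rank 0 receives A. */
MPI_Reduce(partial, a, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);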

Page 93: MPI Parallel Programming

Information Technology Services, HKU

Further Reading

Page 94: MPI Parallel Programming

Information Technology Services, HKU

Group and Topology

Page 95: MPI Parallel Programming


Group and Communicator Management Routines

• Purpose
  – Allows you to organize tasks, based upon function, into task groups.
  – Enables collective communications operations across a subset of related tasks.
  – Provides the basis for implementing virtual topologies.

Page 96: MPI Parallel Programming


Group and Communicator Management Routines

• MPI_Group_difference
• MPI_Group_excl
• MPI_Group_incl
• MPI_Group_intersection
• MPI_Group_union
• MPI_Group_compare
• MPI_Group_rank
• MPI_Group_size
• MPI_Group_free
• MPI_Comm_group
• MPI_Comm_create
• MPI_Comm_dup
• MPI_Comm_compare
• MPI_Comm_free
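A hedged C sketch of how several of these routines combine (the even-rank split is an invented example): derive a group from MPI_COMM_WORLD, keep only the even ranks, and build a new communicator from it:

MPI_Group world_group, even_group;
MPI_Comm even_comm;
int np, nEven, ranks[64];             /* assumes np <= 128 */
MPI_Comm_size(MPI_COMM_WORLD, &np);
MPI_Comm_group(MPI_COMM_WORLD, &world_group);
for (nEven = 0; 2 * nEven < np; nEven++)
    ranks[nEven] = 2 * nEven;         /* 0, 2, 4, ... */
MPI_Group_incl(world_group, nEven, ranks, &even_group);
/* Collective over MPI_COMM_WORLD; odd ranks get MPI_COMM_NULL. */
MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);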

Page 97: MPI Parallel Programming


Virtual Topologies

• Cartesian, tree or graph
• Cartesian example:

[Diagram: processes arranged on a 2-D Cartesian grid]

Page 98: MPI Parallel Programming


Example

call MPI_Sendrecv( &
     sendmsg, 1, MPI_INTEGER, northprocess, sendtag, &
     recvmsg, 1, MPI_INTEGER, southprocess, recvtag, &
     comm, status, ierror)
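As a hedged C sketch (grid size, periodicity, and the north/south direction convention are assumptions), the neighbour ranks used above would typically come from a Cartesian topology via MPI_Cart_create and MPI_Cart_shift:

MPI_Comm cart;
int dims[2] = {4, 4}, periods[2] = {0, 0};   /* 4x4 grid, non-periodic */
int northprocess, southprocess;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
/* Shift by 1 along dimension 0; off-grid neighbours come back as
   MPI_PROC_NULL, and communication with MPI_PROC_NULL completes
   immediately with no effect. */
MPI_Cart_shift(cart, 0, 1, &northprocess, &southprocess);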

Page 99: MPI Parallel Programming


Virtual Topologies

• MPI_Cart_coords
• MPI_Cart_create
• MPI_Cart_get
• MPI_Cart_map
• MPI_Cart_rank
• MPI_Cart_shift
• MPI_Cart_sub
• MPI_Cartdim_get
• MPI_Dims_create
• MPI_Graph_create
• MPI_Graph_get
• MPI_Graph_map
• MPI_Graph_neighbors
• MPI_Graph_neighbors_count
• MPI_Graphdims_get
• MPI_Topo_test

Page 100: MPI Parallel Programming

Information Technology Services, HKU

Derived Datatypes

Page 101: MPI Parallel Programming


The count argument

• MPI_Send and MPI_Recv have a count argument. E.g.,

real, dimension(100) :: vector

if ( my_rank == 0 ) then
    call MPI_Send(vector, 50, MPI_REAL, 1, 0, &
         MPI_COMM_WORLD, ierror)
else
    call MPI_Recv(vector, 50, MPI_REAL, 0, 0, &
         MPI_COMM_WORLD, status, ierror)
endif

Page 102: MPI Parallel Programming


The count argument

• Many data items can be sent together. However, they must be:
  – contiguous, and
  – of the same FORTRAN datatype.

Page 103: MPI Parallel Programming


Why Derived Datatypes

• Consider

double precision results(IMAX, JMAX)

where we want to send results(5,1), results(5,2), ..., results(5,JMAX). The data to be sent does not lie in one contiguous area of memory.

Page 104: MPI Parallel Programming


Why Derived Datatypes

• One solution is to use a FORTRAN 90 array section. However:
  – This method is not efficient, because it involves a large amount of memory copying.
  – Since a FORTRAN 90 array section is implemented by memory copying, it cannot be used in non-blocking communication.

Page 105: MPI Parallel Programming


Why Derived Datatypes

• Consider

INTEGER nResult, n, m
DOUBLE PRECISION result(RMAX)

where it is required to send nResult followed by result. The data is of mixed type.

Page 106: MPI Parallel Programming


Discussion of examples

• The programmer can make consecutive MPI calls to send and receive each data element in turn, which is slow and clumsy.
• The programmer may copy the data to a buffer before sending it, but this wastes memory and is long-winded. A derived-datatype sketch follows.
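Derived datatypes avoid both problems. A hedged C sketch for the mixed nResult/result case above (assumes the MPI-2 routine MPI_Type_create_struct and a #define RMAX 100; the MPI-1 equivalent is MPI_Type_struct, and dest/tag are as in the earlier examples):

#define RMAX 100
int nResult;
double result[RMAX];
MPI_Datatype pair_type;
int          blocklens[2] = {1, RMAX};
MPI_Aint     displs[2];
MPI_Datatype types[2]     = {MPI_INT, MPI_DOUBLE};
/* Displacements measured relative to &nResult. */
MPI_Get_address(&nResult, &displs[0]);
MPI_Get_address(result,  &displs[1]);
displs[1] -= displs[0];
displs[0]  = 0;
MPI_Type_create_struct(2, blocklens, displs, types, &pair_type);
MPI_Type_commit(&pair_type);
/* One message carries the integer and the whole array together. */
MPI_Send(&nResult, 1, pair_type, dest, tag, MPI_COMM_WORLD);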

Page 107: MPI Parallel Programming


MPI_TYPE_VECTOR

[Diagram: a vector type with the parameters below, selecting two blocks of three elements whose starts are five element positions apart]

• count = 2
• blocklength = 3
• stride = 5

Page 108: MPI Parallel Programming


MPI_TYPE_VECTOR Syntax

MPI_TYPE_VECTOR(count, blocklength, stride, oldtype, newtype, ierror)
INTEGER, intent(in) :: count, blocklength, stride, oldtype
INTEGER, intent(out) :: newtype, ierror

Page 109: MPI Parallel Programming


Example : Sending Sub-matrix

[Diagram: a 10 x 8 array with numbered rows and columns; the elements results(5,1) ... results(5,8) of row 5 are highlighted - one element per column, 10 apart in memory]

Page 110: MPI Parallel Programming


Example : Sending Sub-matrix

double precision :: results(10,8)
INTEGER :: newtype
....
call MPI_TYPE_VECTOR(8, 1, 10, MPI_DOUBLE_PRECISION, newtype, ierror)
call MPI_TYPE_COMMIT(newtype, ierror)
call MPI_SEND(results(5,1), 1, newtype, dest, tag, comm, ierror)

Page 111: MPI Parallel Programming


MPI_TYPE_COMMIT

• The new datatype is "committed" with a call to MPI_TYPE_COMMIT. It can then be used in any number of communications. The form of MPI_TYPE_COMMIT is:

call MPI_TYPE_COMMIT(newtype, ierror)

Page 112: MPI Parallel Programming


MPI_TYPE_FREE

• MPI_TYPE_FREE deallocates the datatype:

call MPI_TYPE_FREE(datatype, ierror)

• datatype is returned as MPI_DATATYPE_NULL.
• After calling MPI_TYPE_FREE, the datatype can be redefined.

Page 113: MPI Parallel Programming


Communicator

• Parallel libraries also send messages. A library message may accidentally use the same tag as the user's own messages, so we need to distinguish messages sent by library routines from messages sent by our own code.
• The solution adopted by MPI is to add a further piece of information: a communicator.
• Two processes using distinct communicators cannot receive messages from each other.