Parallel Computing—Higher-level concepts of MPI


1

Parallel Computing—Higher-level concepts of MPI

2

MPI—Presentation Outline

• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies

3

Communicators, groups, and contexts

• MPI provides higher-level abstractions for building parallel libraries:
  • Safe communication space
  • Group scope for collective operations
  • Process naming

• Communicators + groups provide:
  • Process naming (instead of IP address + port)
  • Group scope for collective operations

• Contexts provide:
  • Safe communication

4

What are communicators?

• A data structure that contains groups (and thus processes)
• Why is it useful?
  • Process naming: ranks serve as names for application programmers
    • Easier than IP address + port
  • Group communication as well as point-to-point communication
• There are two types of communicators:
  • Intracommunicators:
    • Communication within a group
  • Intercommunicators:
    • Communication between two groups (the groups must be disjoint)

5

What are contexts?

• A unique integer:
  • Acts as an additional tag on the messages
• Each communicator has a distinct context that provides a safe communication universe:
  • The context is agreed upon by all processes when a communicator is built
• Intracommunicators have two contexts:
  • One for point-to-point communications
  • One for collective communications
• Intercommunicators also have two contexts:
  • Explained in the coming slides

6

Intracommunicators

• Contain one group
• Allow point-to-point and collective communications between processes within this group
• Communicators can only be built from existing communicators:
  • MPI.COMM_WORLD is the first intracommunicator to start with
• Creation of an intracommunicator is a collective operation:
  • All processes in the existing communicator must call it for the creation to execute successfully
• Intracommunicators can have process topologies:
  • Cartesian
  • Graph

7

Creating new Intracommunicators

[Figure: COMM_WORLD contains ranks 0–3; a new communicator, newComm, is created with processes 0 and 3 only, which become ranks 0 and 1 in newComm.]

MPI.Init(args);
int[] incl1 = {0, 3};                              // world ranks to include
Group grp1 = MPI.COMM_WORLD.Group();               // group of COMM_WORLD
Group grp2 = grp1.Incl(incl1);                     // subgroup {0, 3}
Intracomm newComm = MPI.COMM_WORLD.Create(grp2);   // collective over COMM_WORLD
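A minimal usage sketch of the communicator built above, assuming the usual import mpi.*; and an initialized MPI environment. Whether Create returns a null reference to processes outside grp2 is an assumption about the binding, so the guard below is illustrative:

// Hedged sketch: only members of grp2 obtain a usable newComm;
// other processes are assumed to receive a null communicator from Create.
if (newComm != null) {
    int newRank = newComm.Rank();   // 0 or 1 inside newComm
    System.out.println("World rank " + MPI.COMM_WORLD.Rank()
                       + " has rank " + newRank + " in newComm");
}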

8

How do processes agree on the context for a new intracommunicator?

• Each process has a static context variable which is incremented whenever an Intracomm is created
• Each process increments this variable and sends it to all the other processes
• The maximum integer is agreed upon as the context
• An existing communicator's context is used for sending the "context agreement" messages:
  • What about MPI.COMM_WORLD?
    • It is safe anyway, because it is the first intracommunicator and there is no chance of conflicts
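The agreement step can be pictured as a max-reduction over each process's proposed value. A minimal sketch, expressed here as an Allreduce over the existing communicator (the slides describe the exchange more generally); contextCounter is an assumed name standing in for the static context variable mentioned above:

// Hedged sketch of context agreement: every process proposes a value and
// the maximum is adopted by all processes.
int[] proposed = new int[] { ++contextCounter };   // contextCounter: assumed local static field
int[] agreed   = new int[1];
MPI.COMM_WORLD.Allreduce(proposed, 0, agreed, 0, 1, MPI.INT, MPI.MAX);
int newContext = agreed[0];                        // same value on every process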

9

Intercommunicators

• Contain two groups:
  • Local group (the local process is in this group)
  • Remote group
  • Both groups must be disjoint
• Only allow point-to-point communications
• Intercommunicators cannot have process topologies
• Next slide: how to create intercommunicators

10

MPI.Init(args);
int[] incl2 = {0, 2, 4, 6};                        // even world ranks
int[] incl3 = {1, 3, 5, 7};                        // odd world ranks
Group grp1 = MPI.COMM_WORLD.Group();
int rank = MPI.COMM_WORLD.Rank();
Group grp2 = grp1.Incl(incl2);
Group grp3 = grp1.Incl(incl3);
// Build the two disjoint intracommunicators shown in the figure (Comm1 and Comm2)
Intracomm comm1 = MPI.COMM_WORLD.Create(grp2);
Intracomm comm2 = MPI.COMM_WORLD.Create(grp3);
Intercomm icomm = null;

if (rank == 0 || rank == 2 || rank == 4 || rank == 6) {
    // peer comm = COMM_WORLD, local comm = comm1,
    // local leader = rank 0 in comm1, remote leader = world rank 1, tag = 56
    icomm = MPI.COMM_WORLD.Create_intercomm(comm1, 0, 1, 56);
} else {
    // local comm = comm2, local leader = rank 0 in comm2, remote leader = world rank 0
    icomm = MPI.COMM_WORLD.Create_intercomm(comm2, 0, 0, 56);
}

[Figure: COMM_WORLD ranks 0–7 are split into Comm1 = {0(a), 1(b), 2(c), 3(d)} (world ranks 0, 2, 4, 6) and Comm2 = {0(e), 1(f), 2(g), 3(h)} (world ranks 1, 3, 5, 7); in the resulting intercommunicator newComm each side sees its own group as the local group and the other as the remote group.]

Creating intercommunicators

11

Creating intercomms …

• What are the arguments to the Create_intercomm method?
  • Local communicator (which contains the current process)
  • local_leader (rank)
  • remote_leader (rank)
  • Tag for the messages sent during context selection
• But the groups are disjoint, so how can they communicate?
  • That is where a peer communicator is required
  • At least the local_leader and remote_leader are part of this peer communicator
  • In the last figure, MPI.COMM_WORLD is the peer communicator, and processes 0 and 1 (ranks relative to MPI.COMM_WORLD) are the leaders of their respective groups
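A minimal point-to-point sketch over the intercommunicator built above; the buffer, tag value, and choice of ranks are illustrative assumptions. The key point is that in an intercommunicator the destination and source ranks are interpreted relative to the remote group:

// Hedged sketch: the leader of the even group sends one int to the leader of
// the odd group through icomm; ranks are relative to the *remote* group.
int[] msg = new int[1];
if (rank == 0) {                       // world rank 0, rank 0 in its local group
    msg[0] = 42;
    icomm.Send(msg, 0, 1, MPI.INT, 0 /* rank in remote group */, 99);
} else if (rank == 1) {                // world rank 1, rank 0 in its local group
    icomm.Recv(msg, 0, 1, MPI.INT, 0 /* rank in remote group */, 99);
}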

12

Selecting contexts for intercomms

• An intercommunicator has two contexts:
  • send_context (used for sending messages)
  • recv_context (used for receiving messages)
• In an intercommunicator, processes in the local group can only send messages to the remote group
• How is the context agreed upon?
  • Each group decides its own context
  • The leaders (local and remote) exchange the contexts their groups agreed upon
  • The greater of the two is selected as the context

13

[Figure: the eight processes of COMM_WORLD (ranks 0–7) partitioned into Group1 and Group2, with each process also holding a rank within its own group.]

14

MPI—Presentation Outline

• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies

15

Collective communications

• Provided as a convenience for application developers:
  • Save significant development time
  • Efficient algorithms may be used
  • Stable (tested)
• Built on top of point-to-point communications
• These operations include:
  • Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Scan, Allgather
  • Versions that allow displacements between the data

16

[Image from the MPI standard document]

Broadcast, scatter, gather, allgather, alltoall

17

Reduce collective operations

[Figure: five processes hold the values 1–5; Reduce with MPI.SUM leaves the result 15 only at the root process, while Allreduce leaves 15 at every process. The predefined reduction operations are MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX, MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR, MPI.MINLOC, and MPI.MAXLOC.]
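A minimal sketch of the operation pictured above, using the same Java binding as the earlier slides; the process count, buffer names, and use of MPI.SUM are illustrative assumptions:

// Hedged sketch: each process contributes (rank + 1); with five processes the
// Reduce result at root 0 is 15, and Allreduce delivers 15 to every process.
int[] mine = { MPI.COMM_WORLD.Rank() + 1 };
int[] total = new int[1];

MPI.COMM_WORLD.Reduce(mine, 0, total, 0, 1, MPI.INT, MPI.SUM, 0);   // only root 0 gets 15
MPI.COMM_WORLD.Allreduce(mine, 0, total, 0, 1, MPI.INT, MPI.SUM);   // every process gets 15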

18

[Figure: time-line of the message exchanges among eight processes (ranks 0–7) in a barrier]

• Eight processes, thus forming only one group
• Each process exchanges an integer 4 times
• Overlaps communications well

A Typical Barrier() Implementation

19

Intracomm.Bcast( … )

• Sends data from one process to all the other processes
• Code from adlib:
  • A communication library for HPJava
• The current implementation is based on an n-ary tree:
  • Limitation: broadcasts only from rank=0
  • Generated dynamically
  • Cost: O(log2(N))
• MPICH 1.2.5 uses a linear algorithm:
  • Cost: O(N)
• MPICH2 has much improved algorithms
• LAM/MPI uses n-ary trees:
  • Limitation: broadcasts only from rank=0
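A minimal Bcast sketch in the same Java binding; the array size, contents, and root rank are illustrative assumptions:

// Hedged sketch: rank 0 fills a small buffer, then every process calls Bcast;
// after the call all processes hold the same three integers.
int[] data = new int[3];
if (MPI.COMM_WORLD.Rank() == 0) {
    data[0] = 10; data[1] = 20; data[2] = 30;
}
MPI.COMM_WORLD.Bcast(data, 0, 3, MPI.INT, 0);   // root = 0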

20

[Figure: a broadcast tree rooted at rank 0, with the data fanning out to the remaining ranks over successive steps.]

A Typical Broadcast Implementation

21

MPI—Presentation Outline

• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies

22

MPI Datatypes

• What kind (type) of data can be sent using MPI messaging?
• Basically two types:
  • Basic (primitive) datatypes
  • Derived datatypes

23

MPI Basic Datatypes

• MPI_CHAR
• MPI_SHORT
• MPI_INT
• MPI_LONG
• MPI_UNSIGNED_CHAR
• MPI_UNSIGNED_SHORT
• MPI_UNSIGNED_LONG
• MPI_UNSIGNED
• MPI_FLOAT
• MPI_DOUBLE
• MPI_LONG_DOUBLE
• MPI_BYTE

24

• Besides basic datatypes, it is possible to communicate heterogeneous and non-contiguous data using derived datatypes (see the sketch below):
  • Contiguous
  • Indexed
  • Vector
  • Struct

Derived Datatypes
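A minimal sketch of building and using a derived datatype in the same Java binding; the choice of a Contiguous type, the buffer size, and the tag are illustrative assumptions:

// Hedged sketch: describe three consecutive ints as one derived datatype,
// commit it, and send one element of that type from rank 0 to rank 1.
Datatype triple = Datatype.Contiguous(3, MPI.INT);
triple.Commit();

int[] buf = new int[3];
int rank = MPI.COMM_WORLD.Rank();
if (rank == 0) {
    buf[0] = 1; buf[1] = 2; buf[2] = 3;
    MPI.COMM_WORLD.Send(buf, 0, 1, triple, 1, 77);   // count = 1 element of 'triple'
} else if (rank == 1) {
    MPI.COMM_WORLD.Recv(buf, 0, 1, triple, 0, 77);
}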

25

MPI—Presentation Outline

• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies

26

Virtual topologies

• Used to arrange the processes in a geometric shape
• Virtual topologies have no necessary connection with the physical layout of machines:
  • It is possible, however, to make use of the underlying machine architecture
• These virtual topologies can be assigned to the processes of an intracommunicator
• MPI provides:
  • Cartesian topology
  • Graph topology

27

Cartesian topology: Mapping four processes onto 2x2 topology

• Each process is assigned a coordinate:
  • Rank 0: (0,0)
  • Rank 1: (1,0)
  • Rank 2: (0,1)
  • Rank 3: (1,1)
• Uses:
  • Calculate the rank from a grid position
  • Calculate grid positions from ranks
  • Easier to locate the ranks of neighbours
  • Applications may have communication patterns with lots of messaging between immediate neighbours

[Figure: intracommunicator Comm1 with ranks 0–3 mapped onto a 2x2 grid with x- and y-axes: 0 at (0,0), 1 at (1,0), 2 at (0,1), 3 at (1,1).]

28

Periods in Cartesian topology

• Axis 1 (the y-axis) is periodic:
  • Processes in the top and bottom rows have valid neighbours towards the top and bottom respectively (the grid wraps around)
• Axis 0 (the x-axis) is non-periodic:
  • Processes in the right and left columns have undefined neighbours towards the right and left respectively

[Figure: the 2x2 grid with its x- and y-axes, showing wrap-around along the y-axis only]

boolean[] periodicity = new boolean[2];
periodicity[0] = false;   // axis 0 (x-axis): non-periodic
periodicity[1] = true;    // axis 1 (y-axis): periodic
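A minimal sketch of creating this 2x2 Cartesian topology and querying neighbours in the same Java binding; the use of COMM_WORLD, the reorder flag, and the Shift call along axis 1 are illustrative assumptions:

// Hedged sketch: map four processes onto a 2x2 grid with the y-axis periodic,
// then look up the calling process's coordinates and its neighbours.
int[] dims = {2, 2};
boolean[] periods = {false, true};                 // matches periodicity[] above
Cartcomm cart = MPI.COMM_WORLD.Create_cart(dims, periods, false);

int myRank = cart.Rank();
int[] myCoords = cart.Coords(myRank);              // e.g. rank 3 -> (1,1)
ShiftParms up = cart.Shift(1, 1);                  // neighbours along the periodic axis
// up.rank_source / up.rank_dest give the ranks to receive from / send to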

29

Graph topology

nnodes = 4
index = {2, 3, 4, 6}
edges = {1, 3, 0, 3, 0, 2}

[Figure: the four-node graph (nodes 0–3) described by these arrays: node 0 is connected to 1 and 3, node 1 to 0, node 2 to 3, and node 3 to 0 and 2.]
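A minimal sketch of building this graph topology in the same Java binding; the reorder flag and the neighbour query are illustrative assumptions:

// Hedged sketch: create a graph topology over four processes using the
// index/edges arrays above, then ask for the current process's neighbours.
int[] index = {2, 3, 4, 6};
int[] edges = {1, 3, 0, 3, 0, 2};
Graphcomm graph = MPI.COMM_WORLD.Create_graph(index, edges, false);

int[] myNeighbours = graph.Neighbours(graph.Rank());   // e.g. rank 0 -> {1, 3}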

30

• Just to give you an idea of how MPI-based applications are designed …

Doing Matrix Multiplication using MPI

31

    | 1 0 2 |   | 0 1 0 |   | 2 3 2 |
    | 2 1 0 | x | 0 0 1 | = | 0 2 1 |
    | 0 2 2 |   | 1 1 1 |   | 2 2 4 |

Basically how it works!

32

Matrix Multiplication MxN ..

int rank = MPI.COMM_WORLD.Rank();
int size = MPI.COMM_WORLD.Size();

if (rank == 0) {               // master MPI process
    // initialize matrices M and N

    for (int i = 1; i < size; i++) {
        // send rows of matrix M to process `i'
    }
    // broadcast matrix N to all processes

    for (int i = 1; i < size; i++) {
        // receive rows of the resultant matrix from process `i'
    }
    // .. print results ..
} else {
    // receive rows of matrix M
    // call broadcast to receive matrix N
    // compute the matrix multiplication for the sub-matrix (done in parallel)
    // send the resultant rows back to the master process
}

..
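As a hedged illustration of what "send rows of matrix M" might look like in the same Java binding, the flattened row buffer, matrix dimension, tag, and the choice of worker rank 1 are assumptions, not taken from the slide:

// Hedged sketch: the master sends one row of M (length `cols`) to worker 1,
// and the worker receives it; a real program would loop over all workers,
// distribute N with Bcast, and return the computed rows the same way.
int cols = 3;                                       // assumed matrix dimension
double[] row = new double[cols];
if (rank == 0) {
    MPI.COMM_WORLD.Send(row, 0, cols, MPI.DOUBLE, 1, 10);   // to worker 1
} else if (rank == 1) {
    MPI.COMM_WORLD.Recv(row, 0, cols, MPI.DOUBLE, 0, 10);   // from the master
}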