Programming using MPI - ccds.iitkgp.ac.in

Transcript of Programming using MPI - ccds.iitkgp.ac.in

Page 1: Programming using MPI - ccds.iitkgp.ac.in

Programming using MPI

Preeti Malakar, Indian Institute of Technology Kanpur

National Supercomputing Mission

Centre for Development of Advanced Computing

SKA India

IIT Kharagpur

High Performance Computing for Astronomy and Astrophysics

Page 2: Programming using MPI - ccds.iitkgp.ac.in

FLOPS

2

Page 3: Programming using MPI - ccds.iitkgp.ac.in

Programming Supercomputers

PARAM Shakti, IIT Kharagpur

Page 4: Programming using MPI - ccds.iitkgp.ac.in

Top 500 Supercomputers

4

Page 5: Programming using MPI - ccds.iitkgp.ac.in

Parallel Programming Models

Shared memory programming – OpenMP, Pthreads

(Diagram: shared memory; threads run on cores and share data through a cache)

Page 6: Programming using MPI - ccds.iitkgp.ac.in

Message Passing Interface

• Standard for message passing in a distributed memory environment (most widely used programming model in supercomputers)

• Efforts began in 1991 by Jack Dongarra, Tony Hey, and David W. Walker

• MPI Forum formed in 1993

• Version 1.0: 1994

• Version 2.0: 1997

• Version 3.0: 2012

6

Page 7: Programming using MPI - ccds.iitkgp.ac.in

Parallel Programming Models

Distributed memory programming – MPI (Message Passing Interface)

(Diagram: distributed memory; each process runs on a core and has its own memory, so each process has its own address space)

7

Page 8: Programming using MPI - ccds.iitkgp.ac.in

Adapted from Neha Karanjkar’s slides

Page 9: Programming using MPI - ccds.iitkgp.ac.in

What is MPI?

Enables parallel programming using message passing

9

(Diagram: Process 0 and Process 1 exchanging messages)

Page 10: Programming using MPI - ccds.iitkgp.ac.in

Distinct Process Address Space

Process 0: x = 1, y = 2;  ...  x += 2;  ...  print x, y   → prints 3, 2
Process 1: x = 1, y = 2;  ...  y++;     ...  print x, y   → prints 1, 3

Process 0: x = 1, y = 2;    ...  x += 1;  ...  print x, y   → prints 2, 2
Process 1: x = 10, y = 20;  ...  x++;     ...  print x, y   → prints 11, 20

(Each process follows its own program order and operates on its own copies of x and y.)

10

Page 11: Programming using MPI - ccds.iitkgp.ac.in

Parallel Execution – Single Node

11

(Diagram: a single node with a shared cache; parallel.x runs as one process per core, and each process executes its own instruction stream)

Page 12: Programming using MPI - ccds.iitkgp.ac.in

Parallel Execution – Multiple Nodes

12

(Diagram: multiple nodes, each with its own memory; parallel.x runs as one process per core across the nodes, and each process executes its own instruction stream)

Page 13: Programming using MPI - ccds.iitkgp.ac.in

Message Passing

13

(Diagram: Processes 0–3 exchanging messages)

Page 14: Programming using MPI - ccds.iitkgp.ac.in

Getting Started

14

Initialization (MPI_Init)

• gather information about the parallel job

• set up internal library state

• prepare for communication

Finalization (MPI_Finalize)
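A minimal sketch of the skeleton these two calls frame (the printf is only a placeholder):

#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    /* Initialization: gathers job information, sets up library state,
       prepares the process for communication */
    MPI_Init (&argc, &argv);

    printf ("Hello from an MPI process\n");

    /* Finalization: cleans up MPI state; no MPI calls are allowed after this */
    MPI_Finalize ();
    return 0;
}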

Page 15: Programming using MPI - ccds.iitkgp.ac.in

Process Identification

15

Total number of processes

Rank of a process

• MPI_Comm_size: get the total number of processes

• MPI_Comm_rank: get my rank
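For example, inside the skeleton sketched under Getting Started, the two calls could be used like this (variable names are arbitrary):

int size, rank;
MPI_Comm_size (MPI_COMM_WORLD, &size);   /* total number of processes */
MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* my rank, from 0 to size-1 */
printf ("Process %d of %d\n", rank, size);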

Page 16: Programming using MPI - ccds.iitkgp.ac.in

MPI Code Execution Steps (On cluster)

• Compile
  • mpicc -o program.x program.c

• Execute
  • mpirun -np 1 ./program.x (or mpiexec in place of mpirun)
    • Runs 1 process on the launch node
  • mpirun -np 6 ./program.x
    • Runs 6 processes on the launch node
  • mpirun -np 6 -f hostfile ./program.x
    • Runs 6 processes on the nodes specified in the hostfile

16

<hostfile>
Host1:3
Host2:2
Host3:1
…

Page 17: Programming using MPI - ccds.iitkgp.ac.in

Multiple Processes on Multiple Cores and Nodes

mpiexec -np 8 -f hostfile ./program.x

17

(Diagram: ranks 0–7 placed two per node across Node 1–Node 4, one process per core; each node has its own memory)

Hostfile:
Node1:2
Node2:2
Node3:2
Node4:2

How?

Page 18: Programming using MPI - ccds.iitkgp.ac.in

Multiple Processes on Multiple Cores and Nodes

mpiexec -np 8 -f hostfile ./program.x

(Diagram: the launch node starts a proxy on each of Node 1–Node 4; each proxy spawns two local processes, giving ranks 0–7, one process per core)

Hostfile:
Node1:2
Node2:2
Node3:2
Node4:2

Page 19: Programming using MPI - ccds.iitkgp.ac.in

19

MPI Process Spawning

The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. The entire virtual address space of the parent is replicated in the child.

https://man7.org/linux/man-pages/man2/fork.2.html

https://linux.die.net/man/3/execvp

The exec() family of functions replaces the current process image with a new process image.
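As a rough illustration of the fork()+exec() mechanism described in these man pages (a hedged sketch of how a launcher could spawn ./program.x, not the actual mpiexec source):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void)
{
    pid_t pid = fork ();                   /* duplicate the current process */
    if (pid == 0) {
        /* child: replace its image with the program to be run */
        char *args[] = { "./program.x", NULL };
        execvp (args[0], args);
        perror ("execvp");                 /* reached only if exec fails */
        exit (1);
    }
    waitpid (pid, NULL, 0);                /* parent: wait for the child */
    return 0;
}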

Page 20: Programming using MPI - ccds.iitkgp.ac.in

SIMD Parallelism (Recap)

(Diagram: parallel.x runs as one process per core; each process has its own memory and executes its own instruction stream)

20

NO centralized server/master (typically)

Page 21: Programming using MPI - ccds.iitkgp.ac.in

Sum of Squares of N numbers

21

Serial:

for i = 1 to N
    sum += a[i] * a[i]

Parallel (P processes, each computing a partial sum over N/P elements):

for i = 1 to N/P
    sum += a[i] * a[i]

(Diagram: ranks 0–3, one process per core, each running the N/P-iteration loop on its own portion of the array)

Per-rank loop bounds:

for i = N/P * rank ; i < N/P * (rank+1) ; i++
    localsum += a[i] * a[i]

Collect localsum, add up at one of the ranks
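Putting the pieces together, a sketch of the full computation in MPI C, assuming N is divisible by P and using MPI_Reduce (covered later in these slides) to add up the partial sums at rank 0:

#include <mpi.h>
#include <stdio.h>

#define N 1024                           /* assume N is divisible by P */

int main (int argc, char *argv[])
{
    int rank, P;
    double a[N], localsum = 0.0, sum = 0.0;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &P);

    for (int i = 0; i < N; i++)          /* every rank fills the array here; */
        a[i] = i;                        /* a real code would distribute the data */

    /* each rank squares and sums its own chunk of N/P elements */
    for (int i = N/P * rank; i < N/P * (rank + 1); i++)
        localsum += a[i] * a[i];

    /* collect the partial sums and add them up at rank 0 */
    MPI_Reduce (&localsum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf ("sum of squares = %f\n", sum);

    MPI_Finalize ();
    return 0;
}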

Page 22: Programming using MPI - ccds.iitkgp.ac.in

MPI Installation

22

1. sudo apt-get install mpich

OR

1. Download source code from https://www.mpich.org/downloads/
2. ./configure --prefix=<INSTALL_PATH>
3. make
4. make install

Page 23: Programming using MPI - ccds.iitkgp.ac.in

MPI Code

23

(Code listing with MPI_COMM_WORLD, the global communicator, highlighted)

Page 24: Programming using MPI - ccds.iitkgp.ac.in

MPI_COMM_WORLD

Required in every MPI communication

24

Page 25: Programming using MPI - ccds.iitkgp.ac.in

Communication Scope

25

Communicator (communication handle)

• Defines the scope

• Specifies communication context

Process

• Belongs to a group

• Identified by a rank within a group

Identification

• MPI_Comm_size – total number of processes in communicator

• MPI_Comm_rank – rank of the calling process within the communicator

Page 26: Programming using MPI - ccds.iitkgp.ac.in

MPI_COMM_WORLD

(Diagram: five processes in MPI_COMM_WORLD, with ranks 0–4)

26

Process identified by rank/id

Page 27: Programming using MPI - ccds.iitkgp.ac.in

MPI Message

• Data and header/envelope

• Typically, MPI communications send/receive messages

27

Message Envelope:

• Source: origin of the message

• Destination: receiver of the message

• Tag (0 to MPI_TAG_UB)

• Communicator

Page 28: Programming using MPI - ccds.iitkgp.ac.in

MPI Communication Types

• Point-to-point

• Collective

Page 29: Programming using MPI - ccds.iitkgp.ac.in

Point-to-point Communication

• MPI_Send

• MPI_Recv

SENDER

RECEIVER

int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

Tags should match

Blocking send and receive

Page 30: Programming using MPI - ccds.iitkgp.ac.in

MPI_Datatype

• MPI_BYTE

• MPI_CHAR

• MPI_INT

• MPI_FLOAT

• MPI_DOUBLE

30

Page 31: Programming using MPI - ccds.iitkgp.ac.in

Example 1

31

MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {   /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) {   /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

(Message tag: 99, matching in the send and the receive.)

Page 32: Programming using MPI - ccds.iitkgp.ac.in

MPI_Status

• Source rank

• Message tag

• Message length (via MPI_Get_count)

32
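A sketch of how a receiver can inspect the status after a receive; MPI_SOURCE, MPI_TAG, and MPI_Get_count are standard MPI, and the wildcards are covered on the next slide:

MPI_Status status;
int count;
char message[20];

MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
          MPI_COMM_WORLD, &status);

MPI_Get_count (&status, MPI_CHAR, &count);          /* actual message length */
printf ("got %d chars from rank %d with tag %d\n",
        count, status.MPI_SOURCE, status.MPI_TAG);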

Page 33: Programming using MPI - ccds.iitkgp.ac.in

MPI_ANY_*

• MPI_ANY_SOURCE
  • Receiver may specify a wildcard value for source

• MPI_ANY_TAG
  • Receiver may specify a wildcard value for tag

33

Page 34: Programming using MPI - ccds.iitkgp.ac.in

Example 2

34

MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {   /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) {   /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

Page 35: Programming using MPI - ccds.iitkgp.ac.in

MPI_Send (Blocking)

• Does not return until the buffer can be reused

• Message buffering – implementation-dependent

• Standard communication mode

SENDER

RECEIVER

35

Page 36: Programming using MPI - ccds.iitkgp.ac.in

Safety

36

          Process 0    Process 1
Safe:     MPI_Send     MPI_Recv
          MPI_Send     MPI_Recv

Unsafe:   MPI_Send     MPI_Send
          MPI_Recv     MPI_Recv

Safe:     MPI_Send     MPI_Recv
          MPI_Recv     MPI_Send

Unsafe:   MPI_Recv     MPI_Recv
          MPI_Send     MPI_Send
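A sketch of the third (safe) ordering as code, for a pairwise exchange between ranks 0 and 1; sendbuf, recvbuf, rank, and status are assumed to be declared as in the earlier examples:

/* Rank 0 sends before receiving; rank 1 receives before sending,
   so a receive is always posted for the other side's send. */
if (rank == 0) {
    MPI_Send (sendbuf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    MPI_Recv (recvbuf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
    MPI_Recv (recvbuf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    MPI_Send (sendbuf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
/* Unsafe variant: if both ranks call MPI_Send first, each send may block
   (depending on buffering) before either receive is posted. */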

Page 37: Programming using MPI - ccds.iitkgp.ac.in

Other Send Modes

• MPI_Bsend (Buffered)
  • May complete before a matching receive is posted

• MPI_Ssend (Synchronous)
  • Completes only if a matching receive is posted

• MPI_Rsend (Ready)
  • May be started only if the matching receive has already been posted

37

Page 38: Programming using MPI - ccds.iitkgp.ac.in

Non-blocking Point-to-Point

• MPI_Isend (buf, count, datatype, dest, tag, comm, request)

• MPI_Irecv (buf, count, datatype, source, tag, comm, request)

• MPI_Wait

38

          Process 0    Process 1
Safe:     MPI_Isend    MPI_Isend
          MPI_Recv     MPI_Recv
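A sketch from the sender's side; sendbuf is assumed to be declared elsewhere, and do_some_computation() is a hypothetical placeholder for useful work done while the message is in flight (the subject of the next slide):

MPI_Request request;
MPI_Status  status;

/* post the send and return immediately */
MPI_Isend (sendbuf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);

do_some_computation ();   /* hypothetical work, overlapped with the send */

/* block until the send has completed and sendbuf may be reused */
MPI_Wait (&request, &status);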

Page 39: Programming using MPI - ccds.iitkgp.ac.in

Computation Communication Overlap

39

(Timeline: Process 0 posts MPI_Isend, keeps computing, and calls MPI_Wait later; Process 1 computes and then calls MPI_Recv. Communication overlaps with computation.)

Page 40: Programming using MPI - ccds.iitkgp.ac.in

Collective Communications

• Must be called by all processes that are part of the communicator

Types

• Synchronization (MPI_Barrier)

• Global communication (MPI_Bcast, MPI_Gather, …)

• Global reduction (MPI_Reduce, MPI_Allreduce, …)

40

Page 41: Programming using MPI - ccds.iitkgp.ac.in

Barrier

• Synchronization across all group members

• Collective call

• Blocks until all processes have entered the call

• MPI_Barrier (comm)

41
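A sketch of a common use, timing a region consistently across ranks; MPI_Wtime is standard MPI, rank is assumed to come from MPI_Comm_rank, and do_work() is a hypothetical placeholder:

MPI_Barrier (MPI_COMM_WORLD);        /* all ranks start the region together */
double t0 = MPI_Wtime ();

do_work ();                          /* hypothetical work being timed */

MPI_Barrier (MPI_COMM_WORLD);        /* all ranks have finished the region */
double t1 = MPI_Wtime ();
if (rank == 0)
    printf ("elapsed: %f s\n", t1 - t0);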

Page 42: Programming using MPI - ccds.iitkgp.ac.in

Broadcast

• Root process sends message to all processes

• Any process can be the root, but every process must specify the same root

• int MPI_Bcast (buffer, count, datatype, root, comm)

• Number of elements in buffer – count

• buffer – input at the root, output at all other processes

42

(Diagram: before MPI_Bcast only the root holds X; afterwards every process holds X)

Page 43: Programming using MPI - ccds.iitkgp.ac.in

Example 3

43

int rank, size, color;
MPI_Status status;

MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;
int oldcolor = color;
MPI_Bcast (&color, 1, MPI_INT, 0, MPI_COMM_WORLD);

printf ("%d: %d color changed to %d\n", rank, oldcolor, color);

Page 44: Programming using MPI - ccds.iitkgp.ac.in

Gather

• Gathers values from all processes to a root process

• int MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments recv* not relevant on non-root processes

44

(Diagram: process i contributes Ai; after the gather, the root holds A0, A1, A2)

Page 45: Programming using MPI - ccds.iitkgp.ac.in

Example 4

45

MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;

int colors[size];
MPI_Gather (&color, 1, MPI_INT, colors, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    for (i = 0; i < size; i++)
        printf ("color from %d = %d\n", i, colors[i]);

Page 46: Programming using MPI - ccds.iitkgp.ac.in

Scatter

• Scatters values to all processes from a root process

• int MPI_Scatter (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments send* not relevant on non-root processes

• Output parameter – recvbuf

46

(Diagram: the root holds A0, A1, A2; after the scatter, process i receives Ai)
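There is no scatter example in these slides; a sketch in the style of Example 4, with the root handing one int to each process (rank and size are assumed to come from MPI_Comm_rank and MPI_Comm_size):

int sendbuf[size];                   /* significant only at the root */
int recvval;

if (rank == 0)
    for (int i = 0; i < size; i++)
        sendbuf[i] = i + 2;          /* value destined for rank i */

MPI_Scatter (sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf ("%d received %d\n", rank, recvval);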

Page 47: Programming using MPI - ccds.iitkgp.ac.in

Allgather

• All processes gather values from all processes

• int MPI_Allgather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)

47

(Diagram: process i contributes Ai; after the allgather, every process holds A0, A1, A2)
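A sketch in the style of Example 4, except that every rank receives the full array and there is no root argument:

int color = rank + 2;
int colors[size];                    /* filled on every rank */

MPI_Allgather (&color, 1, MPI_INT, colors, 1, MPI_INT, MPI_COMM_WORLD);

for (i = 0; i < size; i++)           /* every rank can print the whole array */
    printf ("%d sees color from %d = %d\n", rank, i, colors[i]);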

Page 48: Programming using MPI - ccds.iitkgp.ac.in

Reduce

• MPI_Reduce (inbuf, outbuf, count, datatype, op, root, comm)

• Combines the elements in the inbuf of each process (element-wise)

• Combined value in outbuf of root

• op: MIN, MAX, SUM, PROD, …

48

Example (op = MAX, count = 7):
  rank 0 inbuf:    0 1 2 3 4 5 6
  rank 1 inbuf:    2 1 2 3 2 5 2
  rank 2 inbuf:    0 1 1 0 1 1 0
  outbuf at root:  2 1 2 3 4 5 6

Page 49: Programming using MPI - ccds.iitkgp.ac.in

Allreduce

• MPI_Allreduce (inbuf, outbuf, count, datatype, op, comm)

• op: MIN, MAX, SUM, PROD, …

• Combines the elements in the inbuf of each process (element-wise)

• Combined value in outbuf of each process

49

Example (op = MAX, count = 7):
  rank 0 inbuf:          0 1 2 3 4 5 6
  rank 1 inbuf:          2 1 2 3 2 5 2
  rank 2 inbuf:          0 1 1 0 1 1 0
  outbuf at every rank:  2 1 2 3 4 5 6
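A sketch matching the MAX example above, assuming each rank has filled inbuf with 7 ints:

int inbuf[7], outbuf[7];
/* ... each rank fills inbuf ... */

/* element-wise maximum across all ranks; the result lands on every rank */
MPI_Allreduce (inbuf, outbuf, 7, MPI_INT, MPI_MAX, MPI_COMM_WORLD);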

Page 50: Programming using MPI - ccds.iitkgp.ac.in

DEMO

50