Introduction to Programming by MPI for Parallel FEM
Report S1 & S2 in C

Kengo Nakajima
Programming for Parallel Computing (616-2057)
Seminar on Advanced Computing (616-4009)

Source: http://nkl.cc.u-tokyo.ac.jp/14e/03-MPI/MPIprog-C.pdf

Page 2: Motivation for Parallel Computing (and this class)

• Large-scale parallel computers enable fast computing in large-scale scientific simulations with detailed models. Computational science develops new frontiers of science and engineering.
• Why parallel computing?
  – faster & larger
  – "larger" is more important from the viewpoint of "new frontiers of science & engineering", but "faster" is also important.
  – + more complicated
  – Ideal: Scalable
    • Solving an Nx-scale problem using Nx computational resources during the same computation time.

Page 3: Overview

• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication

Page 4: What is MPI? (1/2)

• Message Passing Interface
• "Specification" of a message passing API for distributed memory environments
  – Not a program, not a library
  • http://phase.hpcc.jp/phase/mpi-j/ml/mpi-j-html/contents.html
• History
  – 1992 MPI Forum
  – 1994 MPI-1
  – 1997 MPI-2: MPI I/O
  – 2012 MPI-3: Fault Resilience, Asynchronous Collective
• Implementation
  – mpich: ANL (Argonne National Laboratory); OpenMPI, MVAPICH
  – H/W vendors
  – C/C++, FORTRAN, Java; Unix, Linux, Windows, Mac OS

Page 5: What is MPI? (2/2)

• "mpich" (free) is widely used
  – supports the MPI-2 spec. (partially)
  – MPICH2 after Nov. 2005
  – http://www-unix.mcs.anl.gov/mpi/
• Why is MPI widely used as the de facto standard?
  – Uniform interface through the MPI Forum
    • Portable, can work on any type of computer
    • Can be called from Fortran, C, etc.
  – mpich
    • free, supports every architecture
• PVM (Parallel Virtual Machine) was also proposed in the early 90's, but is not so widely used as MPI.

Page 6: References

• W. Gropp et al., Using MPI, second edition, MIT Press, 1999.
• M.J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2003.
• W. Gropp et al., MPI: The Complete Reference Vol. I, II, MIT Press, 1998.
• http://www-unix.mcs.anl.gov/mpi/www/
  – API (Application Interface) of MPI

Page 7: How to learn MPI (1/2)

• Grammar
  – 10-20 functions of MPI-1 will be taught in the class
    • although there are many convenient capabilities in MPI-2
  – If you need further information, you can find it on the web, in books, and from MPI experts.
• Practice is important
  – Programming
  – "Running the codes" is the most important
• Be familiar with, or "grab", the idea of SPMD/SIMD op's
  – Single Program/Instruction Multiple Data
  – Each process does the same operation for different data
    • Large-scale data is decomposed, and each part is computed by each process
  – Global/Local Data, Global/Local Numbering

Page 8: SPMD

[Figure] mpirun -np M <Program>
  PE#0: Program / Data #0
  PE#1: Program / Data #1
  PE#2: Program / Data #2
  ...
  PE#M-1: Program / Data #M-1

You understand 90% of MPI if you understand this figure.

PE: Processing Element (Processor, Domain, Process)
Each process does the same operation for different data. Large-scale data is decomposed, and each part is computed by each process. It is ideal that a parallel program is not different from a serial one except for communication.

Page 9: Some Technical Terms

• Processor, Core
  – Processing unit (H/W); Processor = Core for single-core proc's
• Process
  – Unit of MPI computation, nearly equal to "core"
  – Each core (or processor) can host multiple processes (but not efficiently)
• PE (Processing Element)
  – PE originally means "processor", but it is sometimes used as "process" in this class. Moreover, it means "domain" (next).
    • In multicore proc's, PE generally means "core".
• Domain
  – domain = process (= PE), each of the "MD" in "SPMD", each data set
• Process ID of MPI (ID of PE, ID of domain) starts from "0"
  – if you have 8 processes (PE's, domains), the IDs are 0~7

Page 10: SPMD (same figure and notes as Page 8)

Page 11: How to learn MPI (2/2)

• NOT so difficult.
• Therefore, 2-3 lectures are enough for just learning the grammar of MPI.
• Grab the idea of SPMD!

Page 12: Schedule

• MPI
  – Basic Functions
  – Collective Communication
  – Point-to-Point (or Peer-to-Peer) Communication
• 90 min. x 4-5 lectures
  – Collective Communication
    • Report S1
  – Point-to-Point/Peer-to-Peer Communication
    • Report S2: Parallelization of 1D code
  – At this point, you are almost an expert of MPI programming.

Page 13: Overview

• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication

Page 14: Login to Oakleaf-FX

ssh t71**@oakleaf-fx.cc.u-tokyo.ac.jp

Create a directory:
>$ cd
>$ mkdir 2014summer    (your favorite name)
>$ cd 2014summer

In this class this top directory is called <$O-TOP>. Files are copied to this directory.
Under this directory, S1, S2, S1-ref are created:
  <$O-S1> = <$O-TOP>/mpi/S1
  <$O-S2> = <$O-TOP>/mpi/S2

Oakleaf-FX, ECCS2012

Page 15: Copying files on Oakleaf-FX

Fortran:
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/F/s1-f.tar .
>$ tar xvf s1-f.tar

C:
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/C/s1-c.tar .
>$ tar xvf s1-c.tar

Confirmation:
>$ ls
mpi
>$ cd mpi/S1

This directory is called <$O-S1>.
  <$O-S1> = <$O-TOP>/mpi/S1

Page 16: First Example

hello.f:
  implicit REAL*8 (A-H,O-Z)
  include 'mpif.h'
  integer :: PETOT, my_rank, ierr

  call MPI_INIT (ierr)
  call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
  call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)
  write (*,'(a,2i8)') 'Hello World FORTRAN', my_rank, PETOT
  call MPI_FINALIZE (ierr)
  stop
  end

hello.c:
  #include "mpi.h"
  #include <stdio.h>
  int main(int argc, char **argv)
  {
      int n, myid, numprocs, i;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      printf("Hello World %d\n", myid);
      MPI_Finalize();
  }

Page 17: Compiling hello.f/c

FORTRAN
  $> mpifrtpx -Kfast hello.f
  "mpifrtpx": the required compiler & libraries for FORTRAN90+MPI are included.
C
  $> mpifccpx -Kfast hello.c
  "mpifccpx": the required compiler & libraries for C+MPI are included.

>$ cd <$O-S1>
>$ mpifrtpx -Kfast hello.f
>$ mpifccpx -Kfast hello.c

Page 18: Running Jobs

• Batch Jobs
  – Only batch jobs are allowed.
  – Interactive execution of jobs is not allowed.
• How to run
  – writing a job script
  – submitting the job
  – checking job status
  – checking results
• Utilization of computational resources
  – 1 node (16 cores) is occupied by each job.
  – Your node is not shared by other jobs.

Page 19: Job Script

• <$O-S1>/hello.sh
• Scheduling + Shell Script

  #!/bin/sh
  #PJM -L "node=1"            Number of Nodes
  #PJM -L "elapse=00:10:00"   Computation Time
  #PJM -L "rscgrp=lecture2"   Name of "QUEUE"
  #PJM -g "gt82"              Group Name (Wallet)
  #PJM -j
  #PJM -o "hello.lst"         Standard Output
  #PJM --mpi "proc=4"         MPI Process #
  mpiexec ./a.out             Execs

• 8 proc's:    "node=1",  "proc=8"
• 16 proc's:   "node=1",  "proc=16"
• 32 proc's:   "node=2",  "proc=32"
• 64 proc's:   "node=4",  "proc=64"
• 192 proc's:  "node=12", "proc=192"

Page 20: Submitting Jobs

>$ cd <$O-S1>
>$ pjsub hello.sh
>$ cat hello.lst
Hello World 0
Hello World 3
Hello World 2
Hello World 1

Page 21: Available QUEUEs

• The following 2 queues are available.
• 1 Tofu unit (12 nodes) can be used.
  – lecture
    • 12 nodes (192 cores), 15 min., valid until the end of October, 2014
    • Shared by all "educational" users
  – lecture2
    • 12 nodes (192 cores), 15 min., active during class time
    • More jobs (compared to lecture) can be processed upon availability.

Page 22: Tofu Interconnect

• Node Group
  – 12 nodes
  – A-/C-axis: 4 nodes in a system board, B-axis: 3 boards
• 6D: (X,Y,Z,A,B,C)
  – ABC 3D Mesh: in each node group: 2×2×3
  – XYZ 3D Mesh: connection of node groups: 10×5×8
• Job submission according to network topology is possible:
  – Information about the used "XYZ" is available after execution.

Page 23: Submitting & Checking Jobs

• Submitting jobs:             pjsub SCRIPT_NAME
• Checking status of jobs:     pjstat
• Deleting/aborting:           pjdel JOB_ID
• Checking status of queues:   pjstat --rsc
• Detailed info. of queues:    pjstat --rsc -x
• Number of running jobs:      pjstat --rsc -b
• Limitation of submission:    pjstat --limit

  [z30088@oakleaf-fx-6 S2-ref]$ pjstat
  Oakleaf-FX scheduled stop time: 2012/09/28(Fri) 09:00:00 (Remain: 31days 20:01:46)
  JOB_ID  JOB_NAME  STATUS   PROJECT  RSCGROUP  START_DATE      ELAPSE    TOKEN  NODE:COORD
  334730  go.sh     RUNNING  gt61     lecture   08/27 12:58:08  00:00:05  0.0    1

Page 24: Basic/Essential Functions

(hello.f and hello.c as on Page 16)

• 'mpif.h', "mpi.h":  essential include file ("use mpi" is possible in F90)
• MPI_Init:           Initialization
• MPI_Comm_size:      Number of MPI processes (mpirun -np XX <prog>)
• MPI_Comm_rank:      Process ID, starting from 0
• MPI_Finalize:       Termination of MPI processes

Page 25: Difference between FORTRAN/C

• (Basically) the same interface
  – In C, UPPER/lower case letters are treated as different
    • e.g.: MPI_Comm_size
      – "MPI" is in UPPER case
      – The first character of the function name after "MPI_" is in UPPER case
      – Other characters are in lower case
• In Fortran, the return value ierr has to be added at the end of the argument list.
• C needs special types for variables:
  – MPI_Comm, MPI_Datatype, MPI_Op, etc.
• MPI_INIT is different:
  – call MPI_INIT (ierr)
  – MPI_Init (int *argc, char ***argv)
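As a minimal sketch of these C-side differences (assuming only the standard MPI C API; the variable names and the reduction at the end are illustrative, not part of the lecture's sample codes), the special MPI types can be held in ordinary C variables and passed by value, and errors come back as return codes instead of an ierr argument:

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Comm     comm = MPI_COMM_WORLD;   /* special C type for communicators */
      MPI_Datatype type = MPI_DOUBLE;       /* special C type for data types    */
      MPI_Op       op   = MPI_SUM;          /* special C type for operations    */
      int petot, my_rank;

      MPI_Init(&argc, &argv);               /* no ierr argument in C            */
      MPI_Comm_size(comm, &petot);
      MPI_Comm_rank(comm, &my_rank);

      double local = 1.0, global = 0.0;
      MPI_Reduce(&local, &global, 1, type, op, 0, comm);  /* handles passed as values */
      if (my_rank == 0) printf("sum over %d processes = %f\n", petot, global);

      MPI_Finalize();
      return 0;
  }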

Page 26: What's going on?

• mpiexec starts up 4 MPI processes ("proc=4")
  – A single program runs on four processes.
  – Each process writes the value of myid.
• Four processes do the same operations, but the values of myid are different.
• The output of each process is different.
• That is SPMD!

(Job script and hello.c as on Pages 19 and 16; in this example the script uses "rscgrp=lecture" and group "gt64".)

Page 27: mpi.h, mpif.h

(hello.f and hello.c as on Page 16)

• Various types of parameters and variables for MPI and their initial values.
• The name of each variable starts with "MPI_".
• Values of these parameters and variables cannot be changed by users.
• Users do not declare variables starting with "MPI_" in their own programs.

Page 28: MPI_Init

• Initializes the MPI execution environment (required)
• It is recommended to put this BEFORE all statements in the program.

• MPI_Init (argc, argv)

(hello.c as on Page 16)

Page 29: MPI_Finalize

• Terminates the MPI execution environment (required)
• It is recommended to put this AFTER all statements in the program.
• Please do not forget this.

• MPI_Finalize ()

(hello.c as on Page 16)

Page 30: MPI_Comm_size

• Determines the size of the group associated with a communicator
• Not required, but a very convenient function

• MPI_Comm_size (comm, size)
  – comm   MPI_Comm   I   communicator
  – size   int        O   number of processes in the group of the communicator

(hello.c as on Page 16)

Page 31: What is a Communicator?

• Group of processes for communication
• A communicator must be specified in an MPI program as the unit of communication.
• All processes belong to a group named "MPI_COMM_WORLD" (default).
• Multiple communicators can be created, and complicated operations are possible.
  – Computation, Visualization
• Only "MPI_COMM_WORLD" is needed in this class.

  MPI_Comm_Size (MPI_COMM_WORLD, PETOT)
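The slide above notes that multiple communicators can be created (e.g. separate groups for computation and visualization). A hedged sketch of one standard way to do this: MPI_Comm_split is part of the MPI standard, but this particular split of MPI_COMM_WORLD into two halves is only an invented illustration.

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int world_rank, world_size, sub_rank;
      MPI_Comm sub_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
      MPI_Comm_size(MPI_COMM_WORLD, &world_size);

      /* color 0: first half of the processes, color 1: second half          */
      int color = (world_rank < world_size / 2) ? 0 : 1;

      /* each process obtains a new communicator containing only its "color" */
      MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);
      MPI_Comm_rank(sub_comm, &sub_rank);

      printf("world rank %d -> group %d, rank %d in that group\n",
             world_rank, color, sub_rank);

      MPI_Comm_free(&sub_comm);
      MPI_Finalize();
      return 0;
  }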

Page 32: MPI_COMM_WORLD

• Communicator in MPI
• One process can belong to multiple communicators

[Figure] Example communicators: COMM_MANTLE, COMM_CRUST, COMM_VIS

Page 33: Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"

Page 34: (figure only)

Page 35: Target Application

• Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"
  – "One-way" coupling from "Ground Motion" to "Tanks".
  – Displacement of the ground surface is given as forced displacement of the bottom surface of the tanks.
  – 1 Tank = 1 PE (serial)
• Deformation of the ground surface will be given as boundary conditions at the bottom of the tanks.

Page 36: 2003 Tokachi Earthquake (M8.0)

Fire accident of oil tanks due to long-period ground motion (surface waves) developed in the basin of Tomakomai.

Page 37: Seismic Wave Propagation, Underground Structure

Page 38: Simulation Codes

• Ground Motion (Ichimura): Fortran
  – Parallel FEM, 3D Elastic/Dynamic
    • Explicit forward Euler scheme
  – Each element: 2m×2m×2m cube
  – 240m×240m×100m region
• Sloshing of Tanks (Nagashima): C
  – Serial FEM (Embarrassingly Parallel)
    • Implicit backward Euler, Skyline method
    • Shell elements + Inviscid potential flow
  – D: 42.7m, H: 24.9m, T: 20mm
  – Frequency: 7.6 sec.
  – 80 elements in circumference, 0.6m mesh in height
  – Tank-to-Tank: 60m, 4×4
• Total number of unknowns: 2,918,169

Page 39: Three Communicators

[Figure]
  meshGLOBAL%MPI_COMM:  all processes (basement #0-#3 and tank #0-#8)
  meshBASE%MPI_COMM:    basement #0-#3
  meshTANK%MPI_COMM:    tank #0-#8

  Basement processes: meshGLOBAL%my_rank = 0~3,  meshBASE%my_rank = 0~3,  meshTANK%my_rank = -1
  Tank processes:     meshGLOBAL%my_rank = 4~12, meshTANK%my_rank = 0~8,  meshBASE%my_rank = -1

Page 40: MPI_Comm_rank

• Determines the rank of the calling process in the communicator
  – "ID of MPI process" is sometimes called "rank"

• MPI_Comm_rank (comm, rank)
  – comm   MPI_Comm   I   communicator
  – rank   int        O   rank of the calling process in the group of comm, starting from "0"

(hello.c as on Page 16)

Page 41: MPI_Abort

• Aborts the MPI execution environment

• MPI_Abort (comm, errcode)
  – comm      MPI_Comm   communicator
  – errcode   int        error code
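A minimal usage sketch (the file name "input.dat" and the error code 1 are invented for illustration): a common pattern is to abort every process when one process fails during setup.

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (my_rank == 0) {
          FILE *fp = fopen("input.dat", "r");    /* illustrative input file */
          if (fp == NULL) {
              fprintf(stderr, "cannot open input.dat\n");
              MPI_Abort(MPI_COMM_WORLD, 1);      /* terminates all processes in the communicator */
          }
          fclose(fp);
      }

      MPI_Finalize();
      return 0;
  }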

Page 42: MPI_Wtime

• Returns an elapsed time on the calling processor

• time = MPI_Wtime ()
  – time   double   O   time in seconds since an arbitrary time in the past

  … double Stime, Etime;
  Stime = MPI_Wtime();
  (…)
  Etime = MPI_Wtime();

Page 43: Example of MPI_Wtime

$> cd <$O-S1>
$> mpifccpx -O1 time.c
$> mpifrtpx -O1 time.f
(modify go4.sh, 4 processes)
$> pjsub go4.sh

(Process ID, Time)
0 1.113281E+00
3 1.113281E+00
2 1.117188E+00
1 1.117188E+00

Page 44: MPI_Wtick

• Returns the resolution of MPI_Wtime
• depends on hardware and compiler

• time = MPI_Wtick ()
  – time   double   O   time in seconds of the resolution of MPI_Wtime

Fortran:
  implicit REAL*8 (A-H,O-Z)
  include 'mpif.h'
  …
  TM= MPI_WTICK ()
  write (*,*) TM

C:
  double Time;
  …
  Time = MPI_Wtick();
  printf("%5d%16.6E\n", MyRank, Time);

Page 45: Example of MPI_Wtick

$> cd <$O-S1>
$> mpifccpx -O1 wtick.c
$> mpifrtpx -O1 wtick.f
(modify go1.sh, 1 process)
$> pjsub go1.sh

Page 46: MPI_Barrier

• Blocks until all processes in the communicator have reached this routine.
• Mainly for debugging; huge overhead, not recommended for real code.

• MPI_Barrier (comm)
  – comm   MPI_Comm   I   communicator
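One place where a barrier is commonly combined with MPI_Wtime (Page 42) is timing a parallel section, so that all processes start the clock together; this is only a sketch under that assumption, and work() stands for any computation.

  #include "mpi.h"
  #include <stdio.h>

  static void work(void) { /* placeholder for the computation being timed */ }

  int main(int argc, char **argv)
  {
      int my_rank;
      double stime, etime, elapsed, maxtime;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      MPI_Barrier(MPI_COMM_WORLD);   /* align all processes before starting the clock */
      stime = MPI_Wtime();

      work();

      etime = MPI_Wtime();
      elapsed = etime - stime;

      /* the slowest process determines the parallel elapsed time */
      MPI_Reduce(&elapsed, &maxtime, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
      if (my_rank == 0) printf("elapsed time = %e sec.\n", maxtime);

      MPI_Finalize();
      return 0;
  }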

Page 47: Overview

• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication

Page 48: Data Structures & Algorithms

• A computer program consists of data structures and algorithms.
• They are closely related. In order to implement an algorithm, we need to specify an appropriate data structure for it.
  – We can even say that "Data Structures = Algorithms".
  – Some people may not agree with this, but I (KN) think it is true for scientific computations, from my experience.
• Appropriate data structures for parallel computing must be specified before starting parallel computing.

Page 49: SPMD: Single Program Multiple Data

• There are various types of "parallel computing", and there are many algorithms.
• The common issue is SPMD (Single Program Multiple Data).
• It is ideal that parallel computing is done in the same way as serial computing (except for communications).
  – It is required to distinguish processes with communications from those without communications.

Page 50: What is a data structure which is appropriate for SPMD?

[Figure] PE#0: Program / Data #0,  PE#1: Program / Data #1,  PE#2: Program / Data #2,  PE#3: Program / Data #3

Page 51: Data Structure for SPMD (1/2)

• SPMD: Large data is decomposed into small pieces, and each piece is processed by each processor/process.
• Consider the following simple computation for a vector Vg with length Ng (=20):

  int main(){
      int i, Ng;
      double Vg[20];
      Ng = 20;
      for(i=0; i<Ng; i++){
          Vg[i] = 2.0*Vg[i];
      }
      return 0;
  }

• If you compute this using four processors, each processor stores and processes 5 (=20/4) components of Vg.

Page 52: Data Structure for SPMD (2/2)

• i.e.:

  int main(){
      int i, Nl;
      double Vl[5];
      Nl = 5;
      for(i=0; i<Nl; i++){
          Vl[i] = 2.0*Vl[i];
      }
      return 0;
  }

• Thus, a "single program" can execute parallel processing.
  – In each process, the components of "Vl" are different: Multiple Data
  – Computation using only "Vl" (as long as possible) leads to efficient parallel computation.
  – The program is not different from that for a serial CPU (on the previous page).

Page 53: Global & Local Data

• Vg – Entire Domain
  – "Global Data" with "Global ID" from 1 to 20
• Vl – for Each Process (PE, Processor, Domain)
  – "Local Data" with "Local ID" from 1 to 5
  – Efficient utilization of local data leads to excellent parallel efficiency.

Page 54: Idea of Local Data in C

Vg: Global Data
• 0th-4th components on PE#0
• 5th-9th components on PE#1
• 10th-14th components on PE#2
• 15th-19th components on PE#3

Each of these four sets corresponds to the 0th-4th components of Vl (local data), where the local IDs are 0-4.

[Figure] Vg[0]-Vg[4] -> Vl[0]-Vl[4] on PE#0,  Vg[5]-Vg[9] -> Vl[0]-Vl[4] on PE#1,
         Vg[10]-Vg[14] -> Vl[0]-Vl[4] on PE#2,  Vg[15]-Vg[19] -> Vl[0]-Vl[4] on PE#3
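A small C sketch of this global-to-local mapping, assuming (as on this page) that the global length divides evenly by the number of processes; the initialization is illustrative only.

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      const int Ng = 20;                /* global length, as in the Vg example      */
      int my_rank, petot, i;
      double Vl[20];                    /* generously sized; only Nl entries used   */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &petot);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      int Nl    = Ng / petot;           /* local length (5 when petot = 4)          */
      int start = my_rank * Nl;         /* global ID of local component 0           */

      for (i = 0; i < Nl; i++) {
          int global_id = start + i;    /* local ID i <-> global ID start + i       */
          Vl[i] = (double)global_id;    /* illustrative initialization              */
      }

      printf("PE#%d handles Vg[%d]..Vg[%d] as Vl[0]..Vl[%d]\n",
             my_rank, start, start + Nl - 1, Nl - 1);

      MPI_Finalize();
      return 0;
  }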

Page 55: Global & Local Data

• Vg – Entire Domain
  – "Global Data" with "Global ID" from 1 to 20
• Vl – for Each Process (PE, Processor, Domain)
  – "Local Data" with "Local ID" from 1 to 5
• Please pay attention to the following:
  – How to generate Vl (local data) from Vg (global data)
  – How to map components, from Vg to Vl, and from Vl to Vg
  – What to do if Vl cannot be calculated on each process in an independent manner
  – Processing as localized as possible leads to excellent parallel efficiency:
    • Data structures & algorithms for that purpose

Page 56: Overview

• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication

Page 57: What is Collective Communication?
(集団通信 / グループ通信: collective or group communication)

• Collective communication is the process of exchanging information between multiple MPI processes in the communicator: one-to-all or all-to-all communications.
• Examples
  – Broadcasting control data
  – Max, Min
  – Summation
  – Dot products of vectors
  – Transformation of dense matrices

Page 58: Example of Collective Communications (1/4)

[Figure]
• Broadcast: the contents of P#0 (A0, B0, C0, D0) are copied to all processes P#0-P#3.
• Scatter:   A0, B0, C0, D0 on P#0 are distributed: A0 -> P#0, B0 -> P#1, C0 -> P#2, D0 -> P#3.
• Gather:    the reverse of Scatter; the blocks on P#0-P#3 are collected on P#0.

Page 59: Example of Collective Communications (2/4)

[Figure]
• Allgather:  each process contributes one block (A0, B0, C0, D0); afterwards every process holds all blocks A0, B0, C0, D0.
• All-to-All: P#0 holds A0-A3, P#1 holds B0-B3, P#2 holds C0-C3, P#3 holds D0-D3; afterwards P#0 holds A0, B0, C0, D0; P#1 holds A1, B1, C1, D1; P#2 holds A2, B2, C2, D2; P#3 holds A3, B3, C3, D3.

Page 60: Example of Collective Communications (3/4)

[Figure]
• Reduce:    P#0-P#3 each hold (Ai, Bi, Ci, Di); the results op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3 are placed on P#0 only.
• Allreduce: the same reduction, but the results are placed on all processes.

Page 61: Example of Collective Communications (4/4)

[Figure]
• Reduce scatter: the reduction results are distributed: op.A0-A3 to P#0, op.B0-B3 to P#1, op.C0-C3 to P#2, op.D0-D3 to P#3.
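A hedged C sketch of the reduce-scatter pattern drawn above, using the standard MPI_Reduce_scatter call; the 4-process layout and the array contents are illustrative only.

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank, petot, i;
      double send[4], recv;
      int recvcounts[4] = {1, 1, 1, 1};       /* each process receives one element */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &petot);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (petot != 4) {                       /* this sketch assumes exactly 4 processes */
          if (my_rank == 0) printf("run with 4 processes\n");
          MPI_Finalize();
          return 0;
      }

      /* process p contributes (p, p+1, p+2, p+3); column i is summed onto P#i */
      for (i = 0; i < 4; i++) send[i] = (double)(my_rank + i);

      MPI_Reduce_scatter(send, &recv, recvcounts, MPI_DOUBLE, MPI_SUM,
                         MPI_COMM_WORLD);

      printf("P#%d: op.(column %d) = %f\n", my_rank, my_rank, recv);

      MPI_Finalize();
      return 0;
  }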

Page 62: Examples by Collective Comm.

• Dot Products of Vectors
• Scatter/Gather
• Reading Distributed Files
• MPI_Allgatherv

Page 63: Global/Local Data

• Data structure of parallel computing based on SPMD, where large-scale "global data" is decomposed into small pieces of "local data".

Page 64: Domain Decomposition/Partitioning

[Figure] Large-scale Data -> Domain Decomposition -> local data (+ communication)

• A PC with 1GB RAM can execute an FEM application with up to 10^6 meshes.
  – 10^3 km × 10^3 km × 10^2 km (SW Japan): 10^8 meshes by 1 km cubes
• Large-scale Data: domain decomposition, parallel & local operations
• Global Computation: communication among domains is needed

Page 65: Local Data Structure

• It is important to define a proper local data structure for the target computation (and its algorithm).
  – Algorithms = Data Structures
• Main objective of this class!

Page 66: Global/Local Data

• Data structure of parallel computing based on SPMD, where large-scale "global data" is decomposed into small pieces of "local data".
• Consider the dot product of the following VECp and VECs with length=20, by parallel computation using 4 processors:

  VECp[ 0]= 2      VECs[ 0]= 3
      [ 1]= 2          [ 1]= 3
      [ 2]= 2          [ 2]= 3
       …                …
      [17]= 2          [17]= 3
      [18]= 2          [18]= 3
      [19]= 2          [19]= 3

Page 67: <$O-S1>/dot.f, dot.c

dot.f:
  implicit REAL*8 (A-H,O-Z)
  real(kind=8),dimension(20):: &
       VECp, VECs

  do i= 1, 20
    VECp(i)= 2.0d0
    VECs(i)= 3.0d0
  enddo

  sum= 0.d0
  do ii= 1, 20
    sum= sum + VECp(ii)*VECs(ii)
  enddo
  stop
  end

dot.c:
  #include <stdio.h>
  int main(){
      int i;
      double VECp[20], VECs[20];
      double sum;

      for(i=0;i<20;i++){
          VECp[i]= 2.0;
          VECs[i]= 3.0;
      }
      sum = 0.0;
      for(i=0;i<20;i++){
          sum += VECp[i] * VECs[i];
      }
      return 0;
  }

Page 68: <$O-S1>/dot.f, dot.c (do it on ECCS2012)

>$ cd <$O-S1>
>$ cc -O3 dot.c
>$ f90 -O3 dot.f
>$ ./a.out
 1 2. 3.
 2 2. 3.
 3 2. 3.
 …
18 2. 3.
19 2. 3.
20 2. 3.
dot product 120.

Page 69: MPI_Reduce

[Figure] Reduce: P#0-P#3 each hold (Ai, Bi, Ci, Di); the results op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3 are placed on P#0.

• Reduces values on all processes to a single value
  – Summation, Product, Max, Min, etc.

• MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)
  – sendbuf    choice         I   starting address of send buffer
  – recvbuf    choice         O   starting address of receive buffer; type is defined by "datatype"
  – count      int            I   number of elements in send/receive buffer
  – datatype   MPI_Datatype   I   data type of elements of send/receive buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER, etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR, etc.
  – op         MPI_Op         I   reduce operation
      MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, etc.
      Users can define operations by MPI_OP_CREATE
  – root       int            I   rank of root process
  – comm       MPI_Comm       I   communicator

Page 70: Send/Receive Buffer (Sending/Receiving)

• Arrays for the "send (sending) buffer" and "receive (receiving) buffer" often appear in MPI.
• Addresses of the "send (sending) buffer" and "receive (receiving) buffer" must be different.
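If one really does want to reuse a single array, the MPI standard provides MPI_IN_PLACE as the send-buffer argument (on the root for MPI_Reduce, on all processes for MPI_Allreduce). This is a hedged side note, not something used in this lecture's sample codes; a minimal sketch:

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank;
      double x = 1.0;                  /* each process contributes 1.0 */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      /* MPI_IN_PLACE: x serves as both input and output on every process, */
      /* so no separate receive buffer is needed for this MPI_Allreduce.   */
      MPI_Allreduce(MPI_IN_PLACE, &x, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

      printf("P#%d: sum = %f\n", my_rank, x);

      MPI_Finalize();
      return 0;
  }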

Page 71: Example of MPI_Reduce (1/2)

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

  double X0, X1;
  MPI_Reduce(&X0, &X1, 1, MPI_DOUBLE, MPI_MAX, 0, <comm>);

  double X0[4], XMAX[4];
  MPI_Reduce(X0, XMAX, 4, MPI_DOUBLE, MPI_MAX, 0, <comm>);

Global max values of X0[i] go to XMAX[i] on process #0 (i=0~3).

Page 72: Example of MPI_Reduce (2/2)

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

  double X0, XSUM;
  MPI_Reduce(&X0, &XSUM, 1, MPI_DOUBLE, MPI_SUM, 0, <comm>);

Global summation of X0 goes to XSUM on process #0.

  double X0[4];
  MPI_Reduce(&X0[0], &X0[2], 2, MPI_DOUBLE, MPI_SUM, 0, <comm>);

• Global summation of X0[0] goes to X0[2] on process #0.
• Global summation of X0[1] goes to X0[3] on process #0.

Page 73: MPI_Bcast

[Figure] Broadcast: A0, B0, C0, D0 on P#0 are copied to all processes P#0-P#3.

• Broadcasts a message from the process with rank "root" to all other processes of the communicator.

• MPI_Bcast (buffer, count, datatype, root, comm)
  – buffer     choice         I/O   starting address of buffer; type is defined by "datatype"
  – count      int            I     number of elements in send/recv buffer
  – datatype   MPI_Datatype   I     data type of elements of send/recv buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER, etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR, etc.
  – root       int            I     rank of root process
  – comm       MPI_Comm       I     communicator
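A minimal sketch of broadcasting control data (one of the examples listed on Page 57); the parameter names and values here are invented for illustration.

  #include "mpi.h"
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int my_rank;
      int    nsteps = 0;     /* control data: number of time steps (illustrative) */
      double alpha  = 0.0;   /* control data: a coefficient (illustrative)        */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      if (my_rank == 0) {    /* only the root knows the values, e.g. read from a file */
          nsteps = 100;
          alpha  = 0.5;
      }

      /* every process receives the root's values into its own variables */
      MPI_Bcast(&nsteps, 1, MPI_INT,    0, MPI_COMM_WORLD);
      MPI_Bcast(&alpha,  1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

      printf("P#%d: nsteps=%d alpha=%f\n", my_rank, nsteps, alpha);

      MPI_Finalize();
      return 0;
  }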

Page 74: MPI_Allreduce

[Figure] Allreduce: the reduction results op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3 are placed on all processes.

• MPI_Reduce + MPI_Bcast
• Summations (of dot products) and MAX/MIN values are likely to be utilized in each process.

• call MPI_Allreduce (sendbuf, recvbuf, count, datatype, op, comm)
  – sendbuf    choice         I   starting address of send buffer
  – recvbuf    choice         O   starting address of receive buffer; type is defined by "datatype"
  – count      int            I   number of elements in send/recv buffer
  – datatype   MPI_Datatype   I   data type of elements of send/recv buffer
  – op         MPI_Op         I   reduce operation
  – comm       MPI_Comm       I   communicator

Page 75: "op" of MPI_Reduce/Allreduce

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

• MPI_MAX, MPI_MIN      Max, Min
• MPI_SUM, MPI_PROD     Summation, Product
• MPI_LAND              Logical AND

Page 76: Local Data (1/2)

• Decompose the vector with length=20 into 4 domains (processes).
• Each process handles a vector with length=5.

  VECp[ 0]= 2      VECs[ 0]= 3
      [ 1]= 2          [ 1]= 3
      [ 2]= 2          [ 2]= 3
       …                …
      [17]= 2          [17]= 3
      [18]= 2          [18]= 3
      [19]= 2          [19]= 3

Page 77: Local Data (2/2)

• The 1st-5th components of the original global vector go to the 1st-5th components of PE#0, the 6th-10th to PE#1, the 11th-15th to PE#2, and the 16th-20th to PE#3.

[Figure] PE#0: VECp[0]~[4]/VECs[0]~[4] of the global vector, PE#1: VECp[5]~[9]/VECs[5]~[9],
         PE#2: VECp[10]~[14]/VECs[10]~[14], PE#3: VECp[15]~[19]/VECs[15]~[19];
         on each PE the local arrays are VECp[0..4]= 2 and VECs[0..4]= 3.

Page 78: But ...

• It is too easy!! Just decomposing and renumbering from 1 (or 0).
• Of course, this is not enough. Further examples will be shown in the latter part.

[Figure] Vg[0]-Vg[19] decomposed into Vl[0]-Vl[4] on PE#0-PE#3 (as on Page 54)

Page 79: Example: Dot Product (1/3)   <$O-S1>/allreduce.c

  #include <stdio.h>
  #include <stdlib.h>
  #include "mpi.h"

  int main(int argc, char **argv){
      int i, N;
      int PeTot, MyRank;
      double VECp[5], VECs[5];
      double sumA, sumR, sum0;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      sumA= 0.0;
      sumR= 0.0;
      N=5;
      for(i=0;i<N;i++){
          VECp[i] = 2.0;
          VECs[i] = 3.0;
      }
      sum0 = 0.0;
      for(i=0;i<N;i++){
          sum0 += VECp[i] * VECs[i];
      }

The local vector is generated at each local process.

Page 80: Example: Dot Product (2/3)   <$O-S1>/allreduce.c

      MPI_Reduce(&sum0, &sumR, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      MPI_Allreduce(&sum0, &sumA, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      printf("before BCAST %5d %15.0F %15.0F\n", MyRank, sumA, sumR);

      MPI_Bcast(&sumR, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
      printf("after  BCAST %5d %15.0F %15.0F\n", MyRank, sumA, sumR);

      MPI_Finalize();
      return 0;
  }

Page 81: Example: Dot Product (3/3)   <$O-S1>/allreduce.c

  MPI_Reduce(&sum0, &sumR, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  MPI_Allreduce(&sum0, &sumA, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  MPI_Bcast(&sumR, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

• Dot product: summation of the results of each process (sum0).
• "sumR" has a value only on PE#0 after MPI_Reduce.
• "sumA" has a value on all processes by MPI_Allreduce.
• "sumR" has a value on PE#1-#3 after MPI_Bcast.

Page 82: Execute <$O-S1>/allreduce.f/c

$> mpifccpx -Kfast allreduce.c
$> mpifrtpx -Kfast allreduce.f
(modify go4.sh, 4 processes)
$> pjsub go4.sh

(my_rank, sumALLREDUCE, sumREDUCE)
before BCAST 0 1.200000E+02 1.200000E+02
after  BCAST 0 1.200000E+02 1.200000E+02
before BCAST 1 1.200000E+02 0.000000E+00
after  BCAST 1 1.200000E+02 1.200000E+02
before BCAST 3 1.200000E+02 0.000000E+00
after  BCAST 3 1.200000E+02 1.200000E+02
before BCAST 2 1.200000E+02 0.000000E+00
after  BCAST 2 1.200000E+02 1.200000E+02

Page 83: Examples by Collective Comm.

• Dot Products of Vectors
• Scatter/Gather
• Reading Distributed Files
• MPI_Allgatherv

Page 84: Global/Local Data (1/3)

• Parallelization of an easy process where a real number ALPHA is added to each component of the real vector VECg:

Fortran:
  do i= 1, NG
    VECg(i)= VECg(i) + ALPHA
  enddo

C:
  for (i=0; i<NG; i++){
    VECg[i]= VECg[i] + ALPHA;
  }

Page 85: Global/Local Data (2/3)

• Configuration
  – NG= 32 (length of the vector)
  – ALPHA= 1000.
  – Process # of MPI= 4
• Vector VECg has the following 32 components (<$T-S1>/a1x.all):

  (101.0, 103.0, 105.0, 106.0, 109.0, 111.0, 121.0, 151.0,
   201.0, 203.0, 205.0, 206.0, 209.0, 211.0, 221.0, 251.0,
   301.0, 303.0, 305.0, 306.0, 309.0, 311.0, 321.0, 351.0,
   401.0, 403.0, 405.0, 406.0, 409.0, 411.0, 421.0, 451.0)

Page 86: Global/Local Data (3/3)

• Procedure
  ① Reading vector VECg with length=32 from one process (e.g. 0th process)
     – Global Data
  ② Distributing vector components to 4 MPI processes equally (i.e. length=8 for each process)
     – Local Data, Local ID/Numbering
  ③ Adding ALPHA to each component of the local vector (with length=8) on each process
  ④ Merging the results to the global vector with length=32

• Actually, we do not need parallel computers for such a kind of small computation.

Page 87: Operations of Scatter/Gather (1/8)

Reading VECg (length=32) from a process (e.g. #0)

• Reading global data from the #0 process

Fortran:
  include 'mpif.h'
  integer, parameter :: NG= 32
  real(kind=8), dimension(NG):: VECg

  call MPI_INIT (ierr)
  call MPI_COMM_SIZE (<comm>, PETOT , ierr)
  call MPI_COMM_RANK (<comm>, my_rank, ierr)

  if (my_rank.eq.0) then
    open (21, file= 'a1x.all', status= 'unknown')
    do i= 1, NG
      read (21,*) VECg(i)
    enddo
    close (21)
  endif

C:
  #include <mpi.h>
  #include <stdio.h>
  #include <math.h>
  #include <assert.h>

  int main(int argc, char **argv){
      int i, NG=32;
      int PeTot, MyRank;
      double VECg[32];
      char filename[80];
      FILE *fp;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(<comm>, &PeTot);
      MPI_Comm_rank(<comm>, &MyRank);

      fp = fopen("a1x.all", "r");
      if(!MyRank) for(i=0;i<NG;i++){
          fscanf(fp, "%lf", &VECg[i]);
      }

Page 88: Operations of Scatter/Gather (2/8)

Distributing global data to 4 processes equally (i.e. length=8 for each process)

• MPI_Scatter

Page 89: MPI_Scatter

• Sends data from one process to all other processes in a communicator
  – scount-size messages are sent to each process

• MPI_Scatter (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
  – sendbuf   choice        I  starting address of sending buffer; type is defined by "datatype"
  – scount    int           I  number of elements sent to each process
  – sendtype  MPI_Datatype  I  datatype of elements of sending buffer
        FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
        C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
  – recvbuf   choice        O  starting address of receiving buffer
  – rcount    int           I  number of elements received from the root process
  – recvtype  MPI_Datatype  I  datatype of elements of receiving buffer
  – root      int           I  rank of root process
  – comm      MPI_Comm      I  communicator

[Figure] Scatter: A0 B0 C0 D0 held on P#0 are distributed as A0 to P#0, B0 to P#1, C0 to P#2, D0 to P#3. Gather is the inverse operation.

Page 90: MPI_Scatter (cont.)

• MPI_Scatter (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
  (arguments as on the previous page)

• Usually,
  – scount = rcount
  – sendtype = recvtype

• This function sends scount components starting from sendbuf (sending buffer) of process #root to each process in comm. Each process receives rcount components starting from recvbuf (receiving buffer).

[Figure] Scatter/Gather diagram, as on the previous page.

Page 91: Operations of Scatter/Gather (3/8)

Distributing global data to 4 processes equally

• Allocating the receiving buffer VEC (length=8) at each process.
• 8 components sent from the sending buffer VECg of process #0 are received at each process #0-#3 as the 1st-8th components of the receiving buffer VEC.

Fortran:
  call MPI_SCATTER (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm, ierr)

  integer, parameter :: N = 8
  real(kind=8), dimension(N) :: VEC
  ...
  call MPI_Scatter                       &
       (VECg, N, MPI_DOUBLE_PRECISION,   &
        VEC , N, MPI_DOUBLE_PRECISION,   &
        0, <comm>, ierr)

C:
  int N=8;
  double VEC[8];
  ...
  MPI_Scatter
      (VECg, N, MPI_DOUBLE,
       VEC , N, MPI_DOUBLE, 0, <comm>);

Page 92: Operations of Scatter/Gather (4/8)

Distributing global data to 4 processes equally

• 8 components are scattered to each process from root (#0).
• The 1st-8th components of VECg are stored as the 1st-8th ones of VEC at #0, the 9th-16th components of VECg are stored as the 1st-8th ones of VEC at #1, etc.
  – VECg: Global Data, VEC: Local Data

[Figure] VECg (sendbuf, global data: 8+8+8+8 components on root PE#0) is scattered into VEC (recvbuf, local data: length 8 on each of PE#0~PE#3).

Page 93: Operations of Scatter/Gather (5/8)

Distributing global data to 4 processes equally

• Global Data: 1st-32nd components of VECg at #0
• Local Data: 1st-8th components of VEC at each process
• Each component of VEC can be written from each process in the following way:

Fortran:
  do i= 1, N
    write (*,'(a, 2i8,f10.0)') 'before', my_rank, i, VEC(i)
  enddo

C:
  for(i=0;i<N;i++){
    printf("before %5d %5d %10.0F\n", MyRank, i+1, VEC[i]);
  }

Page 94: Operations of Scatter/Gather (5/8)

Distributing global data to 4 processes equally

• Global Data: 1st-32nd components of VECg at #0
• Local Data: 1st-8th components of VEC at each process
• Output of VEC at each process:

  PE#0                    PE#1                    PE#2                    PE#3
  before 0 1  101.        before 1 1  201.        before 2 1  301.        before 3 1  401.
  before 0 2  103.        before 1 2  203.        before 2 2  303.        before 3 2  403.
  before 0 3  105.        before 1 3  205.        before 2 3  305.        before 3 3  405.
  before 0 4  106.        before 1 4  206.        before 2 4  306.        before 3 4  406.
  before 0 5  109.        before 1 5  209.        before 2 5  309.        before 3 5  409.
  before 0 6  111.        before 1 6  211.        before 2 6  311.        before 3 6  411.
  before 0 7  121.        before 1 7  221.        before 2 7  321.        before 3 7  421.
  before 0 8  151.        before 1 8  251.        before 2 8  351.        before 3 8  451.

Page 95: Operations of Scatter/Gather (6/8)

On each process, ALPHA is added to each of the 8 components of VEC.

• On each process, the computation is done in the following way:

Fortran:
  real(kind=8), parameter :: ALPHA= 1000.
  do i= 1, N
    VEC(i)= VEC(i) + ALPHA
  enddo

C:
  double ALPHA=1000.;
  ...
  for(i=0;i<N;i++){
    VEC[i]= VEC[i] + ALPHA;
  }

• Results:

  PE#0                    PE#1                    PE#2                    PE#3
  after 0 1  1101.        after 1 1  1201.        after 2 1  1301.        after 3 1  1401.
  after 0 2  1103.        after 1 2  1203.        after 2 2  1303.        after 3 2  1403.
  after 0 3  1105.        after 1 3  1205.        after 2 3  1305.        after 3 3  1405.
  after 0 4  1106.        after 1 4  1206.        after 2 4  1306.        after 3 4  1406.
  after 0 5  1109.        after 1 5  1209.        after 2 5  1309.        after 3 5  1409.
  after 0 6  1111.        after 1 6  1211.        after 2 6  1311.        after 3 6  1411.
  after 0 7  1121.        after 1 7  1221.        after 2 7  1321.        after 3 7  1421.
  after 0 8  1151.        after 1 8  1251.        after 2 8  1351.        after 3 8  1451.

Page 96: Operations of Scatter/Gather (7/8)

Merging the results to the global vector with length=32

• Using MPI_Gather (inverse operation to MPI_Scatter)

Page 97: MPI_Gather

• Gathers together values from a group of processes; inverse operation to MPI_Scatter

• MPI_Gather (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
  – sendbuf   choice        I  starting address of sending buffer
  – scount    int           I  number of elements sent to each process
  – sendtype  MPI_Datatype  I  datatype of elements of sending buffer
  – recvbuf   choice        O  starting address of receiving buffer
  – rcount    int           I  number of elements received from each process
  – recvtype  MPI_Datatype  I  datatype of elements of receiving buffer
  – root      int           I  rank of root process
  – comm      MPI_Comm      I  communicator

• recvbuf is significant only on the root process.

[Figure] Gather: A0 on P#0, B0 on P#1, C0 on P#2, D0 on P#3 are collected as A0 B0 C0 D0 on the root process.

Page 98: Operations of Scatter/Gather (8/8)

Merging the results to the global vector with length=32

• Each process sends its components of VEC to VECg on root (#0 in this case).
• 8 components are gathered from each process to the root process.

[Figure] VEC (sendbuf, local data: length 8 on each of PE#0~PE#3) is gathered into VECg (recvbuf, global data) on root PE#0.

Fortran:
  call MPI_Gather                        &
       (VEC , N, MPI_DOUBLE_PRECISION,   &
        VECg, N, MPI_DOUBLE_PRECISION,   &
        0, <comm>, ierr)

C:
  MPI_Gather
      (VEC , N, MPI_DOUBLE,
       VECg, N, MPI_DOUBLE, 0, <comm>);

Page 99: <$O-S1>/scatter-gather.f/c example

  $> mpifccpx -Kfast scatter-gather.c
  $> mpifrtpx -Kfast scatter-gather.f
  $> (exec. 4 proc's) go4.sh

The "before" and "after" listings printed by each process are identical to those shown on the previous pages: "before" shows the 8 scattered components on each of PE#0~PE#3 (101.~151., 201.~251., 301.~351., 401.~451.), and "after" shows each value increased by 1000 (1101.~1151., ..., 1401.~1451.).

Page 100: MPI_Reduce_scatter

• MPI_Reduce + MPI_Scatter

• MPI_Reduce_scatter (sendbuf, recvbuf, rcount, datatype, op, comm)
  – sendbuf   choice        I  starting address of sending buffer
  – recvbuf   choice        O  starting address of receiving buffer
  – rcount    int           I  integer array specifying the number of elements in the result distributed to each process. The array must be identical on all calling processes.
  – datatype  MPI_Datatype  I  datatype of elements of sending/receiving buffer
  – op        MPI_Op        I  reduce operation: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND etc.
  – comm      MPI_Comm      I  communicator

[Figure] Reduce scatter: with blocks A, B, C, D on P#0~P#3, the reduced result op.A0-A3 goes to P#0, op.B0-B3 to P#1, op.C0-C3 to P#2, op.D0-D3 to P#3.
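The slides give no example for this function, so the following is a minimal sketch (not part of the lecture code) of how MPI_Reduce_scatter could be used: every process holds PeTot partial values, and after the call process i owns the sum of the i-th entries of all processes. The variable names are illustrative only.

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc, char **argv){
      int i, PeTot, MyRank;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      double *sendbuf = malloc(PeTot * sizeof(double));
      int    *rcounts = malloc(PeTot * sizeof(int));
      double  recvbuf;                       /* each process receives one element   */

      for(i=0; i<PeTot; i++){ sendbuf[i] = (double)(MyRank + i); }
      for(i=0; i<PeTot; i++){ rcounts[i] = 1; }   /* identical on all processes     */

      MPI_Reduce_scatter(sendbuf, &recvbuf, rcounts, MPI_DOUBLE,
                         MPI_SUM, MPI_COMM_WORLD);

      /* process i now holds the sum over all ranks r of sendbuf_r[i] = r + i */
      printf("%d: %f\n", MyRank, recvbuf);

      free(sendbuf); free(rcounts);
      MPI_Finalize();
      return 0;
  }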

Page 101: MPI_Allgather

• MPI_Gather + MPI_Bcast
  – Gathers data from all tasks and distributes the combined data to all tasks

• MPI_Allgather (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, comm)
  – sendbuf   choice        I  starting address of sending buffer
  – scount    int           I  number of elements sent to each process
  – sendtype  MPI_Datatype  I  datatype of elements of sending buffer
  – recvbuf   choice        O  starting address of receiving buffer
  – rcount    int           I  number of elements received from each process
  – recvtype  MPI_Datatype  I  datatype of elements of receiving buffer
  – comm      MPI_Comm      I  communicator

[Figure] Allgather: A0 on P#0, B0 on P#1, C0 on P#2, D0 on P#3 are combined so that every process ends up with A0 B0 C0 D0.
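A minimal sketch (not part of the lecture code) of the typical use: each process contributes one int and every process receives the whole array. The same pattern appears later in <$O-S1>/agv.c, where the local vector length "n" is gathered into "Rcounts"; the names "n" and "all" below are illustrative only.

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc, char **argv){
      int i, PeTot, MyRank;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      int  n   = MyRank + 1;                  /* some per-process value          */
      int *all = malloc(PeTot * sizeof(int));

      MPI_Allgather(&n, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

      for(i=0; i<PeTot; i++){                 /* all[i] = value of n on process i,  */
          printf("%d: all[%d]= %d\n", MyRank, i, all[i]);   /* on every process  */
      }
      free(all);
      MPI_Finalize();
      return 0;
  }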

Page 102: MPI_Alltoall

• Sends data from all to all processes: transformation of a dense matrix

• MPI_Alltoall (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, comm)
  – sendbuf   choice        I  starting address of sending buffer
  – scount    int           I  number of elements sent to each process
  – sendtype  MPI_Datatype  I  datatype of elements of sending buffer
  – recvbuf   choice        O  starting address of receiving buffer
  – rcount    int           I  number of elements received from each process
  – recvtype  MPI_Datatype  I  datatype of elements of receiving buffer
  – comm      MPI_Comm      I  communicator

[Figure] All-to-All: with P#0 holding A0 A1 A2 A3, P#1 holding B0 B1 B2 B3, etc., the result is P#0: A0 B0 C0 D0, P#1: A1 B1 C1 D1, P#2: A2 B2 C2 D2, P#3: A3 B3 C3 D3 (a transpose-like exchange of blocks).
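No example for MPI_Alltoall appears in these slides, so here is a minimal sketch (assumption: one element is exchanged per process pair). Process p sends sendbuf[q] to process q and receives from q into recvbuf[q], which is exactly the block exchange shown in the figure. The variable names are illustrative only.

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc, char **argv){
      int i, PeTot, MyRank;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      int *sendbuf = malloc(PeTot * sizeof(int));
      int *recvbuf = malloc(PeTot * sizeof(int));
      for(i=0; i<PeTot; i++){ sendbuf[i] = 100*MyRank + i; }   /* A0, A1, ...     */

      MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

      for(i=0; i<PeTot; i++){                 /* recvbuf[i] came from process i    */
          printf("%d received %d from %d\n", MyRank, recvbuf[i], i);
      }
      free(sendbuf); free(recvbuf);
      MPI_Finalize();
      return 0;
  }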

Page 103: Examples by Collective Comm.

• Dot Products of Vectors
• Scatter/Gather
• Reading Distributed Files
• MPI_Allgatherv

Page 104: Operations of Distributed Local Files

• In the Scatter/Gather example, PE#0 reads the global data, which is scattered to each process, and then the parallel operations are done.
• If the problem size is very large, a single processor may not be able to read the entire global data.
  – If the entire global data is decomposed into distributed local data sets, each process can read its local data.
  – If global operations are needed for a certain set of vectors, MPI functions such as MPI_Gather etc. are available.

Page 105: Reading Distributed Local Files: Uniform Vec. Length (1/2)

  >$ cd <$O-S1>
  >$ ls a1.*
     a1.0 a1.1 a1.2 a1.3          (a1x.all is decomposed into 4 files)

  >$ mpifccpx -Kfast file.c
  >$ mpifrtpx -Kfast file.f
  (modify go4.sh for 4 processes)
  >$ pjsub go4.sh

Page 106: Operations of Distributed Local Files

• Local files a1.0~a1.3 are originally from the global file a1x.all.

[Figure] a1x.all is decomposed into the four local files a1.0, a1.1, a1.2, a1.3.

Page 107: Reading Distributed Local Files: Uniform Vec. Length (2/2)  <$O-S1>/file.c

  int main(int argc, char **argv){
      int i;
      int PeTot, MyRank;
      MPI_Comm SolverComm;
      double vec[8];
      char FileName[80];
      FILE *fp;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      sprintf(FileName, "a1.%d", MyRank);          /* similar to "Hello"  */
      fp = fopen(FileName, "r");
      if(fp == NULL) MPI_Abort(MPI_COMM_WORLD, -1);

      for(i=0;i<8;i++){
          fscanf(fp, "%lf", &vec[i]);              /* local ID is 0-7     */
      }

      for(i=0;i<8;i++){
          printf("%5d%5d%10.0f\n", MyRank, i+1, vec[i]);
      }

      MPI_Finalize();
      return 0;
  }

Page 108: Typical SPMD Operation

  mpirun -np 4 a.out

  PE#0: "a.out" reads "a1.0"
  PE#1: "a.out" reads "a1.1"
  PE#2: "a.out" reads "a1.2"
  PE#3: "a.out" reads "a1.3"

Page 109: Non-Uniform Vector Length (1/2)

  >$ cd <$O-S1>
  >$ ls a2.*
     a2.0 a2.1 a2.2 a2.3
  >$ cat a2.0
  5          Number of components at each process
  201.0      Components
  203.0
  205.0
  206.0
  209.0

  >$ mpifccpx -Kfast file2.c
  >$ mpifrtpx -Kfast file2.f
  (modify go4.sh for 4 processes)
  >$ pjsub go4.sh

Page 110: Introduction to Programming by MPI for Parallel …nkl.cc.u-tokyo.ac.jp/14e/03-MPI/MPIprog-C.pdflecture2 • 12 nodes (192 cores), 15 min., active during class time • More jobs (compared

int main(int argc, char **argv){

int i, int PeTot, MyRank;

MPI_Comm SolverComm;

double *vec, *vec2, *vecg;

int num;

double sum0, sum;

char filename[80];

FILE *fp;

MPI_Init(&argc, &argv);

MPI_Comm_size(MPI_COMM_WORLD, &PeTot);

MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

sprintf(filename, "a2.%d", MyRank);

fp = fopen(filename, "r");

assert(fp != NULL);

fscanf(fp, "%d", &num);

vec = malloc(num * sizeof(double));

for(i=0;i<num;i++){fscanf(fp, "%lf", &vec[i]);}

for(i=0;i<num;i++){

printf(" %5d%5d%5d%10.0f¥n", MyRank, i+1, num, vec[i]);}

MPI_Finalize();

}

109

109

Non

-Uni

form

Vec

tor L

engt

h (2

/2)

“num

” is

diffe

rent

at e

ach

proc

ess

<$O-S1>/file2.c

MP

I Pro

gram

min

g

Page 111: How to generate local data

• Reading global data (N=NG)
  – Scattering to each process
  – Parallel processing on each process
  – (If needed) reconstruction of global data by gathering local data
• Generating local data (N=NL), or reading distributed local data
  – Generating or reading local data on each process
  – Parallel processing on each process
  – (If needed) reconstruction of global data by gathering local data
• In the future, the latter case is more important, but the former case is also introduced in this class for understanding the operations on global/local data.

Page 112: Examples by Collective Comm.

• Dot Products of Vectors
• Scatter/Gather
• Reading Distributed Files
• MPI_Allgatherv

Page 113: MPI_Gatherv, MPI_Scatterv

• MPI_Gather, MPI_Scatter
  – Length of message from/to each process is uniform
• MPI_XXXv extends the functionality of MPI_XXX by allowing a varying count of data from each process:
  – MPI_Gatherv
  – MPI_Scatterv
  – MPI_Allgatherv
  – MPI_Alltoallv

Page 114: MPI_Allgatherv

• Variable count version of MPI_Allgather
  – creates "global data" from "local data"

• MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)
  – sendbuf   choice        I  starting address of sending buffer
  – scount    int           I  number of elements sent to each process
  – sendtype  MPI_Datatype  I  datatype of elements of sending buffer
  – recvbuf   choice        O  starting address of receiving buffer
  – rcounts   int           I  integer array (of length group size) containing the number of elements that are to be received from each process (array: size= PETOT)
  – displs    int           I  integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i (array: size= PETOT+1)
  – recvtype  MPI_Datatype  I  datatype of elements of receiving buffer
  – comm      MPI_Comm      I  communicator

Page 115: MPI_Allgatherv (cont.)

• MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)
  – rcounts  int  I  integer array (of length group size) containing the number of elements to be received from each process (array: size= PETOT)
  – displs   int  I  integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i (array: size= PETOT+1)
  – These two arrays are related to the size of the final "global data"; therefore each process requires the information of these arrays (rcounts, displs).
  – Each process must have the same values for all components of both vectors.
  – Usually, stride(i)= rcounts(i)

Layout of recvbuf:

  rcounts[0]   rcounts[1]   rcounts[2]   ...   rcounts[m-2]   rcounts[m-1]
  PE#0         PE#1         PE#2         ...   PE#(m-2)       PE#(m-1)
  stride[0]    stride[1]    stride[2]    ...   stride[m-2]    stride[m-1]

  displs[0]= 0
  displs[1]= displs[0] + stride[0]
  ...
  displs[m]= displs[m-1] + stride[m-1]

  size[recvbuf]= displs[PETOT]= sum[stride]

Page 116: What MPI_Allgatherv is doing

Generating global data from local data.

[Figure] Local Data (sendbuf) of length N on PE#0~PE#3 is placed into Global Data (recvbuf): the block from PE#i has length rcounts[i] (= stride[i]) and starts at displs[i]; displs[4] marks the end of recvbuf.

Page 117: What MPI_Allgatherv is doing

Generating global data from local data.

[Figure] Same layout as the previous page, with stride[i]= rcounts[i] written explicitly: the block received from PE#i has length rcounts[i] and is placed at offset displs[i] in recvbuf.

Page 118: MPI_Allgatherv in detail (1/2)

• MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)
• rcounts
  – Size of the message from each PE: size of Local Data (length of the Local Vector)
• displs
  – Address/index of each local data in the vector of Global Data
  – displs(PETOT+1) = size of the entire Global Data (Global Vector)

  displs[0]= 0,  displs[k+1]= displs[k] + stride[k],  size[recvbuf]= displs[PETOT]= sum[stride]

Page 119: MPI_Allgatherv in detail (2/2)

• Each process needs the information of rcounts & displs
  – "rcounts" can be created by gathering the local vector length "N" from each process.
  – On each process, "displs" can be generated from "rcounts" on each process.
    stride[i]= rcounts[i]
  – Size of "recvbuf" is calculated by summation of "rcounts".

  displs[0]= 0,  displs[k+1]= displs[k] + stride[k],  size[recvbuf]= displs[PETOT]= sum[stride]

Page 120: Preparation for MPI_Allgatherv  <$O-S1>/agv.c

• Generating the global vector from "a2.0"~"a2.3".
• The length of each vector is 8, 5, 7, and 3, respectively. Therefore, the size of the final global vector is 23 (= 8+5+7+3).

Page 121: a2.0~a2.3

  PE#0 (a2.0)   PE#1 (a2.1)   PE#2 (a2.2)   PE#3 (a2.3)
  8             5             7             3
  101.0         201.0         301.0         401.0
  103.0         203.0         303.0         403.0
  105.0         205.0         305.0         405.0
  106.0         206.0         306.0
  109.0         209.0         311.0
  111.0                       321.0
  121.0                       351.0
  151.0

Page 122: Preparation: MPI_Allgatherv (1/4)  <$O-S1>/agv.c

  int main(int argc, char **argv){
      int i;
      int PeTot, MyRank;
      MPI_Comm SolverComm;
      double *vec, *vec2, *vecg;
      int *Rcounts, *Displs;
      int n;
      double sum0, sum;
      char filename[80];
      FILE *fp;

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
      MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

      sprintf(filename, "a2.%d", MyRank);
      fp = fopen(filename, "r");
      assert(fp != NULL);

      fscanf(fp, "%d", &n);                   /* n (NL) is different at each process */
      vec = malloc(n * sizeof(double));
      for(i=0;i<n;i++){
          fscanf(fp, "%lf", &vec[i]);
      }

Page 123: Preparation: MPI_Allgatherv (2/4)  <$O-S1>/agv.c

      Rcounts= calloc(PeTot, sizeof(int));
      Displs = calloc(PeTot+1, sizeof(int));

      printf("before %d %d", MyRank, n);
      for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

      MPI_Allgather(&n, 1, MPI_INT, Rcounts, 1, MPI_INT, MPI_COMM_WORLD);

      printf("after  %d %d", MyRank, n);
      for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

      Displs[0] = 0;

Rcounts on each PE: with PE#0 N=8, PE#1 N=5, PE#2 N=7, PE#3 N=3, MPI_Allgather gives Rcounts[0:3]= {8, 5, 7, 3} on every process.

Page 124: Preparation: MPI_Allgatherv (2/4)  <$O-S1>/agv.c

      Rcounts= calloc(PeTot, sizeof(int));
      Displs = calloc(PeTot+1, sizeof(int));

      printf("before %d %d", MyRank, n);
      for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

      MPI_Allgather(&n, 1, MPI_INT, Rcounts, 1, MPI_INT, MPI_COMM_WORLD);    /* Rcounts on each PE */

      printf("after  %d %d", MyRank, n);
      for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

      Displs[0] = 0;
      for(i=0;i<PeTot;i++){
          Displs[i+1] = Displs[i] + Rcounts[i];}                             /* Displs on each PE  */

      printf("CoundIndex %d ", MyRank);
      for(i=0;i<PeTot+1;i++){
          printf(" %d", Displs[i]);
      }
      MPI_Finalize();
      return 0;
  }

Page 125: Preparation: MPI_Allgatherv (3/4)

  > cd <$O-S1>
  > mpifccpx -Kfast agv.c
  (modify go4.sh for 4 processes)
  > pjsub go4.sh

  before 0 8 0 0 0 0
  after  0 8 8 5 7 3
  Displs 0 0 8 13 20 23
  before 1 5 0 0 0 0
  after  1 5 8 5 7 3
  Displs 1 0 8 13 20 23
  before 3 3 0 0 0 0
  after  3 3 8 5 7 3
  Displs 3 0 8 13 20 23
  before 2 7 0 0 0 0
  after  2 7 8 5 7 3
  Displs 2 0 8 13 20 23

Page 126: Preparation: MPI_Allgatherv (4/4)

• Only "recvbuf" is not defined yet.
• Size of "recvbuf" = "Displs[PETOT]"

  MPI_Allgatherv
      ( VEC , N, MPI_DOUBLE,
        recvbuf, rcounts, displs, MPI_DOUBLE,
        MPI_COMM_WORLD);
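A minimal sketch of the missing piece, using the variable names of agv.c as shown on the previous pages (vec, n, Rcounts, Displs, PeTot; "vecg" is the pointer already declared in agv.c). This is one possible way to finish the program, placed before MPI_Finalize, not the lecture's official solution:

      vecg = malloc(Displs[PeTot] * sizeof(double));      /* Displs[PeTot] = 23 here */

      MPI_Allgatherv(vec, n, MPI_DOUBLE,
                     vecg, Rcounts, Displs, MPI_DOUBLE, MPI_COMM_WORLD);

      /* vecg[0] ... vecg[Displs[PeTot]-1] now holds the global vector
         (length 23 = 8+5+7+3) on every process. */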

Page 127: Report S1 (1/2)

• Deadline: 17:00, October 11th (Sat), 2014.
  – Send files via e-mail at nakajima(at)cc.u-tokyo.ac.jp
• Problem S1-1
  – Read local files <$O-S1>/a1.0~a1.3, <$O-S1>/a2.0~a2.3.
  – Develop codes which calculate the norm ||x|| of the global vector for each case.
    • <$O-S1>/file.c, <$O-S1>/file2.c
• Problem S1-2
  – Read local files <$O-S1>/a2.0~a2.3.
  – Develop a code which constructs the "global vector" using MPI_Allgatherv.

Page 128: Report S1 (2/2)

• Problem S1-3
  – Develop a parallel program which calculates the following numerical integration using the "trapezoidal rule" by MPI_Reduce, MPI_Bcast etc.:

      ∫₀¹ 4/(1+x²) dx

  – Measure the computation time and the parallel performance.
• Report
  – Cover Page: Name, ID, and Problem ID (S1) must be written.
  – Less than two pages including figures and tables (A4) for each of the three sub-problems
    • Strategy, Structure of the Program, Remarks
  – Source list of the program (if you have bugs)
  – Output list (as small as possible)

Page 129: Introduction to Programming by MPI for Parallel …nkl.cc.u-tokyo.ac.jp/14e/03-MPI/MPIprog-C.pdflecture2 • 12 nodes (192 cores), 15 min., active during class time • More jobs (compared

128

128

128

•W

hat i

s M

PI ?

•Y

our F

irst M

PI P

rogr

am: H

ello

Wor

ld

•G

loba

l/Loc

al D

ata

•C

olle

ctiv

e C

omm

unic

atio

n•

Peer

-to-P

eer C

omm

unic

atio

n

MP

I Pro

gram

min

g

Page 130: Peer-to-Peer Communication (Point-to-Point Communication, one-to-one communication)

• What is P2P Communication ?
• 2D Problem, Generalized Communication Table
• Report S2

Page 131: 1D FEM: 12 nodes / 11 elem's / 3 domains

[Figure] The global 1D mesh with nodes 0-11 and elements 0-10 is partitioned into 3 domains; each domain is shown with the global numbers of its own nodes and elements.

Page 132: 1D FEM: 12 nodes / 11 elem's / 3 domains

Local ID: starting from 0 for node and elem at each domain

[Figure] Within each domain (#0, #1, #2) the nodes and elements are renumbered locally, starting from 0.

Page 133: 1D FEM: 12 nodes / 11 elem's / 3 domains

Internal/External Nodes

[Figure] Each domain (#0, #1, #2) consists of its internal nodes plus the external nodes that belong to the neighboring domains.

Page 134: Preconditioned Conjugate Gradient Method (CG)

  Compute r(0)= b-[A]x(0)
  for i= 1, 2, ...
    solve [M]z(i-1)= r(i-1)
    rho(i-1)= r(i-1) z(i-1)
    if i=1
      p(1)= z(0)
    else
      beta(i-1)= rho(i-1)/rho(i-2)
      p(i)= z(i-1) + beta(i-1) p(i-1)
    endif
    q(i)= [A]p(i)
    alpha(i)= rho(i-1)/p(i)q(i)
    x(i)= x(i-1) + alpha(i) p(i)
    r(i)= r(i-1) - alpha(i) q(i)
    check convergence |r|
  end

Preconditioner: Diagonal Scaling (Point-Jacobi Preconditioning)

  [M] = diag(D1, D2, ..., DN-1, DN)
  (an N x N diagonal matrix whose non-zero entries are the diagonal components of [A]; all off-diagonal entries are 0)

Page 135: Preconditioning, DAXPY

Local Operations by Only Internal Points: Parallel Processing is possible

  /*
  //-- {x}= {x} + ALPHA*{p}     DAXPY: double a{x} plus {y}
  //   {r}= {r} - ALPHA*{q}
  */
  for(i=0;i<N;i++){
    PHI[i]  += Alpha * W[P][i];
    W[R][i] -= Alpha * W[Q][i];
  }

  /*
  //-- {z}= [Minv]{r}
  */
  for(i=0;i<N;i++){
    W[Z][i] = W[DD][i] * W[R][i];
  }

[Figure] The 12 global nodes 0-11; each process operates only on its own internal points.

Page 136: Dot Products

Global summation needed: Communication ?

  /*
  //-- ALPHA= RHO / {p}{q}
  */
  C1 = 0.0;
  for(i=0;i<N;i++){
    C1 += W[P][i] * W[Q][i];
  }
  Alpha = Rho / C1;

[Figure] The 12 global nodes 0-11; each process computes only its local contribution to the dot product.
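A sketch of how the global summation can be completed with MPI_Allreduce, following the same pattern as allreduce.c earlier in this material. The names W, P, Q, N, Rho, Alpha come from the loop above; "C1_local" is an illustrative name introduced here:

  double C1_local = 0.0, C1 = 0.0;
  for(i=0; i<N; i++){                 /* N: number of internal points         */
      C1_local += W[P][i] * W[Q][i];  /* local contribution to {p}{q}         */
  }
  MPI_Allreduce(&C1_local, &C1, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  Alpha = Rho / C1;                   /* identical value on every process     */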

Page 137: Matrix-Vector Products

Values at External Points: P-to-P Communication

  /*
  //-- {q}= [A]{p}
  */
  for(i=0;i<N;i++){
    W[Q][i] = Diag[i] * W[P][i];
    for(j=Index[i];j<Index[i+1];j++){
      W[Q][i] += AMat[j]*W[P][Item[j]];
    }
  }

[Figure] Domain #1 with internal points 0-3 and external points 4, 5: the product needs the values of {p} at the external points.

Page 138: Mat-Vec Products: Local Op. Possible

[Figure] The global 12 x 12 matrix-vector product q = [A]p with rows and columns numbered 0-11.

Page 139: Mat-Vec Products: Local Op. Possible

[Figure] The same matrix-vector product, shown partitioned by rows among the processes.

Page 140: Mat-Vec Products: Local Op. Possible

[Figure] Each process's local part of the product, with local numbering 0-3 on every process.

Page 141: Mat-Vec Products: Local Op. #1

[Figure] On process #1, the local product uses the internal points 0-3 plus the external points 4 and 5 received from the neighboring domains.

Page 142: What is Peer-to-Peer Communication ?

• Collective Communication
  – MPI_Reduce, MPI_Scatter/Gather etc.
  – Communications with all processes in the communicator
  – Application area
    • BEM, Spectral Method, MD: global interactions are considered
    • Dot products, MAX/MIN: global summation & comparison

• Peer-to-Peer / Point-to-Point
  – MPI_Send, MPI_Recv
  – Communication with limited processes
    • Neighbors
  – Application area
    • FEM, FDM: localized method

[Figure] The three 1D-FEM domains #0, #1, #2, each exchanging data only with its neighboring domain(s).

Page 143: Collective/P2P Communications

Interactions with only neighboring processes/elements: Finite Difference Method (FDM), Finite Element Method (FEM)

[Figure] A partitioned mesh in which each subdomain communicates only with its neighboring subdomains.

Page 144: When do we need P2P comm.: 1D-FEM

Info in neighboring domains is required for FEM operations: matrix assembling, iterative method.

[Figure] The three 1D-FEM domains #0, #1, #2 with the boundary/external nodes that must be exchanged.

Page 145: Method for P2P Comm.

• MPI_Send, MPI_Recv
• These are "blocking" functions. "Deadlock" can occur with these "blocking" functions.
• A "blocking" MPI call means that the program execution will be suspended until the message buffer is safe to use.
• The MPI standards specify that a blocking SEND or RECV does not return until the send buffer is safe to reuse (for MPI_Send), or the receive buffer is ready to use (for MPI_Recv).
  – Blocking comm. guarantees "secure" communication, but it is very inconvenient.
• Please just remember that "there are such functions".

Page 146: MPI_Send/MPI_Recv

  if (my_rank.eq.0) NEIB_ID=1
  if (my_rank.eq.1) NEIB_ID=0
  ...
  call MPI_SEND (NEIB_ID, arg's)
  call MPI_RECV (NEIB_ID, arg's)

• This seems reasonable, but it stops at MPI_Send/MPI_Recv.
  – Sometimes it works (according to the implementation).

[Figure] PE#0 and PE#1 each exchange the value at the node shared with the neighboring domain.

Page 147: MPI_Send/MPI_Recv (cont.)

  if (my_rank.eq.0) NEIB_ID=1
  if (my_rank.eq.1) NEIB_ID=0
  ...
  if (my_rank.eq.0) then
    call MPI_SEND (NEIB_ID, arg's)
    call MPI_RECV (NEIB_ID, arg's)
  endif

  if (my_rank.eq.1) then
    call MPI_RECV (NEIB_ID, arg's)
    call MPI_SEND (NEIB_ID, arg's)
  endif

• It works ... but

[Figure] PE#0 and PE#1 exchanging their boundary values, as on the previous page.
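The pseudocode above is Fortran-style and leaves the arguments elided. The following is a hedged C sketch of the same ordered exchange for two processes; the buffers "sendbuf"/"recvbuf", the message length (4 doubles) and tag 0 are illustrative assumptions, and MyRank is assumed to come from MPI_Comm_rank as in the earlier examples:

  int NEIB_ID = (MyRank == 0) ? 1 : 0;
  double sendbuf[4], recvbuf[4];
  MPI_Status status;

  if (MyRank == 0) {
      MPI_Send(sendbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD);
      MPI_Recv(recvbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD, &status);
  } else if (MyRank == 1) {
      MPI_Recv(recvbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD, &status);
      MPI_Send(sendbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD);
  }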

Page 148: How to do P2P Comm. ?

• Using the "non-blocking" functions MPI_Isend & MPI_Irecv together with MPI_Waitall for synchronization
• MPI_Sendrecv is also available.

  if (my_rank.eq.0) NEIB_ID=1
  if (my_rank.eq.1) NEIB_ID=0
  ...
  call MPI_Isend (NEIB_ID, arg's)
  call MPI_Irecv (NEIB_ID, arg's)
  ...
  call MPI_Waitall (for Irecv)
  ...
  call MPI_Waitall (for Isend)

A single MPI_Waitall for both of MPI_Isend/MPI_Irecv is also possible.

[Figure] PE#0 and PE#1 exchanging their boundary values, as on the previous pages.
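A hedged C sketch of the non-blocking pattern above for the two-process case; the buffer names, the message length (4 doubles) and tag 0 are illustrative assumptions, and MyRank is assumed to come from MPI_Comm_rank as before:

  int NEIB_ID = (MyRank == 0) ? 1 : 0;
  double sendbuf[4], recvbuf[4];
  MPI_Request req_send, req_recv;
  MPI_Status  stat_send, stat_recv;

  MPI_Isend(sendbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD, &req_send);
  MPI_Irecv(recvbuf, 4, MPI_DOUBLE, NEIB_ID, 0, MPI_COMM_WORLD, &req_recv);

  MPI_Waitall(1, &req_recv, &stat_recv);   /* recvbuf may now be read    */
  MPI_Waitall(1, &req_send, &stat_send);   /* sendbuf may now be reused  */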

Page 149: MPI_Isend

• Begins a non-blocking send
  – Sends the contents of the sending buffer (starting from sendbuf, number of messages: count) to dest with tag.
  – Contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall.

• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  – sendbuf   choice        I  starting address of sending buffer
  – count     int           I  number of elements in sending buffer
  – datatype  MPI_Datatype  I  datatype of each sending buffer element
  – dest      int           I  rank of destination
  – tag       int           I  message tag. This integer can be used by the application to distinguish messages. Communication occurs if the tags of MPI_Isend and MPI_Irecv are matched. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I  communicator
  – request   MPI_Request   O  communication request array used in MPI_Waitall

Page 150: Communication Request: request (communication identifier)

• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  (arguments as on the previous page)
  – request   MPI_Request   O  communication request used in MPI_Waitall.
    The size of the array is the total number of neighboring processes.

• Just define the array.

Page 151: MPI_Irecv

• Begins a non-blocking receive
  – Receives the contents of the receiving buffer (starting from recvbuf, number of messages: count) from source with tag.
  – Contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.

• MPI_Irecv (recvbuf,count,datatype,source,tag,comm,request)
  – recvbuf   choice        I  starting address of receiving buffer
  – count     int           I  number of elements in receiving buffer
  – datatype  MPI_Datatype  I  datatype of each receiving buffer element
  – source    int           I  rank of source
  – tag       int           I  message tag. This integer can be used by the application to distinguish messages. Communication occurs if the tags of MPI_Isend and MPI_Irecv are matched. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I  communicator
  – request   MPI_Request   O  communication request array used in MPI_Waitall

Page 152: MPI_Waitall

• MPI_Waitall blocks until all comm's associated with request in the array complete. It is used for synchronizing MPI_Isend and MPI_Irecv in this class.
• At the sending phase, the contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall. At the receiving phase, the contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.
• MPI_Isend and MPI_Irecv can be synchronized simultaneously with a single MPI_Waitall if it is consistent.
  – The same request array should be used in MPI_Isend and MPI_Irecv.
• Its operation is similar to that of MPI_Barrier, but MPI_Waitall cannot be replaced by MPI_Barrier.
  – Possible troubles when using MPI_Barrier instead of MPI_Waitall: contents of request and status are not updated properly, very slow operations, etc.

• MPI_Waitall (count,request,status)
  – count    int          I    number of processes to be synchronized
  – request  MPI_Request  I/O  comm. requests used in MPI_Waitall (array size: count)
  – status   MPI_Status   O    array of status objects
      MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'

Page 153: Array of status objects: status

• MPI_Waitall (count,request,status)
  – count    int          I    number of processes to be synchronized
  – request  MPI_Request  I/O  comm. requests used in MPI_Waitall (array size: count)
  – status   MPI_Status   O    array of status objects
      MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'

• Just define the array.

Page 154: MPI_Sendrecv

• MPI_Send + MPI_Recv: not recommended, many restrictions

• MPI_Sendrecv (sendbuf,sendcount,sendtype,dest,sendtag,recvbuf,recvcount,recvtype,source,recvtag,comm,status)
  – sendbuf    choice        I  starting address of sending buffer
  – sendcount  int           I  number of elements in sending buffer
  – sendtype   MPI_Datatype  I  datatype of each sending buffer element
  – dest       int           I  rank of destination
  – sendtag    int           I  message tag for sending
  – recvbuf    choice        O  starting address of receiving buffer
  – recvcount  int           I  number of elements in receiving buffer
  – recvtype   MPI_Datatype  I  datatype of each receiving buffer element
  – source     int           I  rank of source
  – recvtag    int           I  message tag for receiving
  – comm       MPI_Comm      I  communicator
  – status     MPI_Status    O  array of status objects
      MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'

Page 155: RECV: receiving to external nodes

Receive continuous data into the receiving buffer from neighbors.
• MPI_Irecv (recvbuf, count, datatype, source, tag, comm, request)
  – recvbuf    choice         I    starting address of receiving buffer
  – count      int            I    number of elements in receiving buffer
  – datatype   MPI_Datatype   I    data type of each receiving buffer element
  – source     int            I    rank of source
[Figure: local mesh numbering on PE#0-PE#3; the halo meshes received from the neighbors are highlighted]

Page 156: SEND: sending from boundary nodes

Send continuous data to the sending buffer of neighbors.
• MPI_Isend (sendbuf, count, datatype, dest, tag, comm, request)
  – sendbuf    choice         I    starting address of sending buffer
  – count      int            I    number of elements in sending buffer
  – datatype   MPI_Datatype   I    data type of each sending buffer element
  – dest       int            I    rank of destination
[Figure: local mesh numbering on PE#0-PE#3; the boundary meshes sent to the neighbors are highlighted]

Page 157: Request, Status in C Language: Special Type of Arrays

• MPI_Sendrecv: status
• MPI_Isend: request
• MPI_Irecv: request
• MPI_Waitall: request, status

MPI_Status *StatSend, *StatRecv;

MPI_Request *RequestSend, *RequestRecv;

・・・

StatSend = malloc(sizeof(MPI_Status) * NEIBpetot);

StatRecv = malloc(sizeof(MPI_Status) * NEIBpetot);

RequestSend = malloc(sizeof(MPI_Request) * NEIBpetot);

RequestRecv = malloc(sizeof(MPI_Request) * NEIBpetot);

MPI_Status *Status;

・・・

Status = malloc(sizeof(MPI_Status));


Page 158: Files on Oakleaf-FX

Fortran
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/F/s2-f.tar .
>$ tar xvf s2-f.tar

C
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/C/s2-c.tar .
>$ tar xvf s2-c.tar

Confirm Directory
>$ ls
mpi
>$ cd mpi/S2
This directory is referred to as <$O-S2> in this course.
<$O-S2> = <$O-TOP>/mpi/S2

Page 159: Ex.1: Send-Recv a Scalar

• Exchange VAL (real, 8-byte) between PE#0 & PE#1

Isend/Irecv/Waitall:
  if (my_rank.eq.0) NEIB= 1
  if (my_rank.eq.1) NEIB= 0

  call MPI_Isend (VAL    ,1,MPI_DOUBLE_PRECISION,NEIB,…,req_send,…)
  call MPI_Irecv (VALtemp,1,MPI_DOUBLE_PRECISION,NEIB,…,req_recv,…)
  call MPI_Waitall (…,req_recv,stat_recv,…)   ! recv. buffer VALtemp can be used
  call MPI_Waitall (…,req_send,stat_send,…)   ! send buffer VAL can be modified
  VAL= VALtemp

MPI_Sendrecv:
  if (my_rank.eq.0) NEIB= 1
  if (my_rank.eq.1) NEIB= 0

  call MPI_Sendrecv (VAL    ,1,MPI_DOUBLE_PRECISION,NEIB,… &
                     VALtemp,1,MPI_DOUBLE_PRECISION,NEIB,…, status,…)
  VAL= VALtemp

Name of the recv. buffer could be "VAL", but it is not recommended.

Page 160: Ex.1: Send-Recv a Scalar (Isend/Irecv/Waitall, ex1-1.c)

#include <stdio.h>

#include <stdlib.h>

#include "mpi.h"

int main(int argc, char **argv){

int neib, MyRank, PeTot;

double VAL, VALx;

MPI_Status *StatSend, *StatRecv;

MPI_Request *RequestSend, *RequestRecv;

MPI_Init(&argc, &argv);

MPI_Comm_size(MPI_COMM_WORLD, &PeTot);

MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

StatSend = malloc(sizeof(MPI_Status) * 1);

StatRecv = malloc(sizeof(MPI_Status) * 1);

RequestSend = malloc(sizeof(MPI_Request) * 1);

RequestRecv = malloc(sizeof(MPI_Request) * 1);

if(MyRank == 0) {neib= 1; VAL= 10.0;}

if(MyRank == 1) {neib= 0; VAL= 11.0;}

MPI_Isend(&VAL , 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &RequestSend[0]);

MPI_Irecv(&VALx, 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &RequestRecv[0]);

MPI_Waitall(1, RequestRecv, StatRecv);

MPI_Waitall(1, RequestSend, StatSend);

VAL=VALx;

MPI_Finalize();

return 0; }

$> cd <$O-S2>

$> mpifccpx -Kfast ex1-1.c

$> pjsub go2.sh


Page 161: Ex.1: Send-Recv a Scalar (MPI_Sendrecv, ex1-2.c)

#include <stdio.h>

#include <stdlib.h>

#include "mpi.h"

int main(int argc, char **argv){

int neib;

int MyRank, PeTot;

double VAL, VALtemp;

MPI_Status *StatSR;

MPI_Init(&argc, &argv);

MPI_Comm_size(MPI_COMM_WORLD, &PeTot);

MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

if(MyRank == 0) {neib= 1; VAL= 10.0;}

if(MyRank == 1) {neib= 0; VAL= 11.0;}

StatSR = malloc(sizeof(MPI_Status));

MPI_Sendrecv(&VAL , 1, MPI_DOUBLE, neib, 0,

&VALtemp, 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, StatSR);

VAL=VALtemp;

MPI_Finalize();

return 0;

}

$> cd <$O-S2>

$> mpifccpx -Kfast ex1-2.c

$> pjsub go2.sh


Page 162: Ex.2: Send-Recv an Array (1/4)

• Exchange VEC (real, 8-byte) between PE#0 & PE#1
• PE#0 to PE#1
  – PE#0: send VEC(1)-VEC(11) (length=11)
  – PE#1: recv. as VEC(26)-VEC(36) (length=11)
• PE#1 to PE#0
  – PE#1: send VEC(1)-VEC(25) (length=25)
  – PE#0: recv. as VEC(12)-VEC(36) (length=25)
• Practice: develop a program for this operation.
[Figure: VEC(1)-VEC(36) on PE#0 and PE#1; the exchanged ranges are shaded]

Page 163: Practice: t1

• Initial status of VEC[:]:
  – PE#0  VEC[0-35]= 101,102,103,~,135,136
  – PE#1  VEC[0-35]= 201,202,203,~,235,236
• Confirm the results in the next page.
• Use the following two methods (a minimal C sketch of the first one is given right below):
  – MPI_Isend/Irecv/Waitall
  – MPI_Sendrecv
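A minimal C sketch of the Isend/Irecv/Waitall variant of this practice, assuming 2 processes and 0-based C indexing of a 36-element array (so PE#0 sends VEC[0..10] and receives into VEC[11..35], while PE#1 sends VEC[0..24] and receives into VEC[25..35]); it is an illustration, not the sample code distributed with the class:

  #include <stdio.h>
  #include "mpi.h"

  int main(int argc, char **argv){
    int i, rank, neib, len_send, len_recv, start_recv;
    double VEC[36];
    MPI_Request req[2];
    MPI_Status  stat[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    neib = 1 - rank;                       /* partner rank (0 <-> 1)            */

    /* initial values: 101..136 on PE#0, 201..236 on PE#1 */
    for (i=0; i<36; i++) VEC[i] = 100.0*(rank+1) + i + 1;
    for (i=0; i<36; i++) printf("%d #BEFORE# %d %5.0f\n", rank, i+1, VEC[i]);

    if (rank == 0) { len_send = 11; len_recv = 25; }
    else           { len_send = 25; len_recv = 11; }
    start_recv = len_send;                 /* received part follows the sent part */

    MPI_Isend(&VEC[0],          len_send, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&VEC[start_recv], len_recv, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, stat);

    for (i=0; i<36; i++) printf("%d #AFTER # %d %5.0f\n", rank, i+1, VEC[i]);
    MPI_Finalize();
    return 0;
  }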

Page 164: Estimated Results

0 #BEFORE# 1 101.

0 #BEFORE# 2 102.

0 #BEFORE# 3 103.

0 #BEFORE# 4 104.

0 #BEFORE# 5 105.

0 #BEFORE# 6 106.

0 #BEFORE# 7 107.

0 #BEFORE# 8 108.

0 #BEFORE# 9 109.

0 #BEFORE# 10 110.

0 #BEFORE# 11 111.

0 #BEFORE# 12 112.

0 #BEFORE# 13 113.

0 #BEFORE# 14 114.

0 #BEFORE# 15 115.

0 #BEFORE# 16 116.

0 #BEFORE# 17 117.

0 #BEFORE# 18 118.

0 #BEFORE# 19 119.

0 #BEFORE# 20 120.

0 #BEFORE# 21 121.

0 #BEFORE# 22 122.

0 #BEFORE# 23 123.

0 #BEFORE# 24 124.

0 #BEFORE# 25 125.

0 #BEFORE# 26 126.

0 #BEFORE# 27 127.

0 #BEFORE# 28 128.

0 #BEFORE# 29 129.

0 #BEFORE# 30 130.

0 #BEFORE# 31 131.

0 #BEFORE# 32 132.

0 #BEFORE# 33 133.

0 #BEFORE# 34 134.

0 #BEFORE# 35 135.

0 #BEFORE# 36 136.

0 #AFTER # 1 101.

0 #AFTER # 2 102.

0 #AFTER # 3 103.

0 #AFTER # 4 104.

0 #AFTER # 5 105.

0 #AFTER # 6 106.

0 #AFTER # 7 107.

0 #AFTER # 8 108.

0 #AFTER # 9 109.

0 #AFTER # 10 110.

0 #AFTER # 11 111.

0 #AFTER # 12 201.

0 #AFTER # 13 202.

0 #AFTER # 14 203.

0 #AFTER # 15 204.

0 #AFTER # 16 205.

0 #AFTER # 17 206.

0 #AFTER # 18 207.

0 #AFTER # 19 208.

0 #AFTER # 20 209.

0 #AFTER # 21 210.

0 #AFTER # 22 211.

0 #AFTER # 23 212.

0 #AFTER # 24 213.

0 #AFTER # 25 214.

0 #AFTER # 26 215.

0 #AFTER # 27 216.

0 #AFTER # 28 217.

0 #AFTER # 29 218.

0 #AFTER # 30 219.

0 #AFTER # 31 220.

0 #AFTER # 32 221.

0 #AFTER # 33 222.

0 #AFTER # 34 223.

0 #AFTER # 35 224.

0 #AFTER # 36 225.

1 #BEFORE# 1 201.

1 #BEFORE# 2 202.

1 #BEFORE# 3 203.

1 #BEFORE# 4 204.

1 #BEFORE# 5 205.

1 #BEFORE# 6 206.

1 #BEFORE# 7 207.

1 #BEFORE# 8 208.

1 #BEFORE# 9 209.

1 #BEFORE# 10 210.

1 #BEFORE# 11 211.

1 #BEFORE# 12 212.

1 #BEFORE# 13 213.

1 #BEFORE# 14 214.

1 #BEFORE# 15 215.

1 #BEFORE# 16 216.

1 #BEFORE# 17 217.

1 #BEFORE# 18 218.

1 #BEFORE# 19 219.

1 #BEFORE# 20 220.

1 #BEFORE# 21 221.

1 #BEFORE# 22 222.

1 #BEFORE# 23 223.

1 #BEFORE# 24 224.

1 #BEFORE# 25 225.

1 #BEFORE# 26 226.

1 #BEFORE# 27 227.

1 #BEFORE# 28 228.

1 #BEFORE# 29 229.

1 #BEFORE# 30 230.

1 #BEFORE# 31 231.

1 #BEFORE# 32 232.

1 #BEFORE# 33 233.

1 #BEFORE# 34 234.

1 #BEFORE# 35 235.

1 #BEFORE# 36 236.

1 #AFTER # 1 201.

1 #AFTER # 2 202.

1 #AFTER # 3 203.

1 #AFTER # 4 204.

1 #AFTER # 5 205.

1 #AFTER # 6 206.

1 #AFTER # 7 207.

1 #AFTER # 8 208.

1 #AFTER # 9 209.

1 #AFTER # 10 210.

1 #AFTER # 11 211.

1 #AFTER # 12 212.

1 #AFTER # 13 213.

1 #AFTER # 14 214.

1 #AFTER # 15 215.

1 #AFTER # 16 216.

1 #AFTER # 17 217.

1 #AFTER # 18 218.

1 #AFTER # 19 219.

1 #AFTER # 20 220.

1 #AFTER # 21 221.

1 #AFTER # 22 222.

1 #AFTER # 23 223.

1 #AFTER # 24 224.

1 #AFTER # 25 225.

1 #AFTER # 26 101.

1 #AFTER # 27 102.

1 #AFTER # 28 103.

1 #AFTER # 29 104.

1 #AFTER # 30 105.

1 #AFTER # 31 106.

1 #AFTER # 32 107.

1 #AFTER # 33 108.

1 #AFTER # 34 109.

1 #AFTER # 35 110.

1 #AFTER # 36 111.


Page 165: Ex.2: Send-Recv an Array (2/4)

if (my_rank.eq.0) then

call MPI_Isend (VEC( 1),11,MPI_DOUBLE_PRECISION,1,…,req_send,…)

call MPI_Irecv (VEC(12),25,MPI_DOUBLE_PRECISION,1,…,req_recv,…)

endif

if (my_rank.eq.1) then

call MPI_Isend (VEC( 1),25,MPI_DOUBLE_PRECISION,0,…,req_send,…)

call MPI_Irecv (VEC(26),11,MPI_DOUBLE_PRECISION,0,…,req_recv,…)

endif

call MPI_Waitall (…,req_recv,stat_recv,…)

call MPI_Waitall (…,req_send,stat_send,…)

It works, but the operations are complicated, it does not look like SPMD, and it is not portable.

Page 166: Ex.2: Send-Recv an Array (3/4)

if (my_rank.eq.0) then

NEIB= 1

start_send= 1

length_send= 11

start_recv= length_send + 1

length_recv= 25

endif

if (my_rank.eq.1) then

NEIB= 0

start_send= 1

length_send= 25

start_recv= length_send + 1

length_recv= 11

endif

call MPI_Isend &

(VEC(start_send),length_send,MPI_DOUBLE_PRECISION,NEIB,…,req_send,…)

call MPI_Irecv &

(VEC(start_recv),length_recv,MPI_DOUBLE_PRECISION,NEIB,…,req_recv,…)

call MPI_Waitall (…,req_recv,stat_recv,…)

call MPI_Waitall (…,req_send,stat_send,…)

This is "SPMD"!!

Page 167: Ex.2: Send-Recv an Array (4/4)

if (my_rank.eq.0) then

NEIB= 1

start_send= 1

length_send= 11

start_recv= length_send + 1

length_recv= 25

endif

if (my_rank.eq.1) then

NEIB= 0

start_send= 1

length_send= 25

start_recv= length_send + 1

length_recv= 11

endif

call MPI_Sendrecv &

(VEC(start_send),length_send,MPI_DOUBLE_PRECISION,NEIB,… &

VEC(start_recv),length_recv,MPI_DOUBLE_PRECISION,NEIB,…, status,…)


Page 168: Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• "length_send" of the sending process must be equal to "length_recv" of the receiving process.
  – PE#0 to PE#1, PE#1 to PE#0
• "sendbuf" and "recvbuf": different addresses

Page 169: Peer-to-Peer Communication

• What is P2P Communication?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2

Page 170: 2D FDM (1/5): Entire Mesh
[Figure: the entire 2D mesh]

Page 171: 2D FDM (5-point, central difference)

  \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = f

  \frac{\phi_W - 2\phi_C + \phi_E}{\Delta x^2} + \frac{\phi_S - 2\phi_C + \phi_N}{\Delta y^2} = f_C

[Figure: 5-point stencil with points W, C, E in the x-direction (spacing Δx) and S, C, N in the y-direction (spacing Δy)]
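As a concrete illustration of the 5-point discretization above, here is a minimal C sketch of the stencil applied to the interior of a grid surrounded by one layer of halo points; the function and array names (fdm_residual, phi, f, res, NX, NY, dx, dy) are assumptions for illustration, not lecture code:

  /* Minimal sketch (illustrative only): evaluate the residual of the
     5-point discretization on an NX x NY interior grid with a halo layer. */
  void fdm_residual(int NX, int NY, double dx, double dy,
                    const double *phi, const double *f, double *res)
  {
    /* index of mesh (i,j) in a (NX+2) x (NY+2) row-major array incl. halo */
    #define ID(i,j) ((j)*(NX+2)+(i))
    for (int j=1; j<=NY; j++){
      for (int i=1; i<=NX; i++){
        double lap =
          (phi[ID(i-1,j)] - 2.0*phi[ID(i,j)] + phi[ID(i+1,j)])/(dx*dx)   /* W, C, E */
        + (phi[ID(i,j-1)] - 2.0*phi[ID(i,j)] + phi[ID(i,j+1)])/(dy*dy);  /* S, C, N */
        res[ID(i,j)] = f[ID(i,j)] - lap;   /* residual of the discretized equation */
      }
    }
    #undef ID
  }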

Page 172: Decompose into 4 domains
[Figure: the 8x8 mesh (global IDs 1-64) decomposed into 4 domains]

Page 173: 4 domains: Global ID
[Figure: global IDs of the meshes in each of PE#0-PE#3]

Page 174: 4 domains: Local ID
[Figure: local IDs 1-16 within each of PE#0-PE#3]

Page 175: External Points: Overlapped Region
[Figure: each domain with its overlapped (halo) region; the 5-point stencil (W, C, E / S, C, N) is shown on the local meshes]

Page 176: External Points: Overlapped Region
[Figure: the external (halo) meshes of each domain, taken from the neighboring domains]

Page 177: Local ID of External Points?
[Figure: the halo meshes of each domain marked with "?": their local IDs are not yet defined]

Page 178: Overlapped Region
[Figure: overlapped regions between the four domains]

Page 179: Overlapped Region
[Figure: overlapped regions between the four domains]

Page 180: Peer-to-Peer Communication

• What is P2P Communication?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2

Page 181: Problem Setting: 2D FDM

• 2D region with 64 meshes (8x8)
• Each mesh has a global ID from 1 to 64
  – In this example, this global ID is considered as the dependent variable, such as temperature, pressure, etc.
  – Something like computed results
[Figure: the 8x8 mesh with global IDs 1-64]

Page 182: Problem Setting: Distributed Local Data

• 4 sub-domains.
• Info. of the external points (global ID of each mesh) is received from the neighbors.
  – PE#0, for example, receives the global IDs of its external meshes from PE#1 and PE#2.
[Figure: the four sub-domains PE#0-PE#3 and the global IDs exchanged across their boundaries]

Page 183: Operations of 2D FDM

  \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = f

  \frac{\phi_W - 2\phi_C + \phi_E}{\Delta x^2} + \frac{\phi_S - 2\phi_C + \phi_N}{\Delta y^2} = f_C

[Figure: the 8x8 global mesh with the 5-point stencil (W, C, E, N, S) overlaid]

Page 184: Operations of 2D FDM

  \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = f

  \frac{\phi_W - 2\phi_C + \phi_E}{\Delta x^2} + \frac{\phi_S - 2\phi_C + \phi_N}{\Delta y^2} = f_C

[Figure: the decomposed mesh; applying the stencil near a domain boundary requires meshes owned by the neighboring domain]

Page 185: Computation (1/3)

• On each PE, info. of the internal points (i=1-N(=16)) is read from the distributed local data, info. of the boundary points is sent to the neighbors, and it is received as info. of the external points.
[Figure: the four sub-domains PE#0-PE#3 with their internal, boundary, and external meshes]

Page 186: Computation (2/3): Before Send/Recv

Local ID: value (= global ID) on each PE. The external points (local IDs 17-24) are still unknown ("?").

PE#2:  1:33  2:34  3:35  4:36  5:41  6:42  7:43  8:44  9:49 10:50 11:51 12:52 13:57 14:58 15:59 16:60  17-24: ?
PE#3:  1:37  2:38  3:39  4:40  5:45  6:46  7:47  8:48  9:53 10:54 11:55 12:56 13:61 14:62 15:63 16:64  17-24: ?
PE#0:  1:1   2:2   3:3   4:4   5:9   6:10  7:11  8:12  9:17 10:18 11:19 12:20 13:25 14:26 15:27 16:28  17-24: ?
PE#1:  1:5   2:6   3:7   4:8   5:13  6:14  7:15  8:16  9:21 10:22 11:23 12:24 13:29 14:30 15:31 16:32  17-24: ?

[Figure: the four sub-domains with global IDs; the rows/columns to be received from the neighbors are marked]

Page 187: Computation (2/3): Before Send/Recv
(Same table and figure as the previous page.)

Page 188: Computation (3/3): After Send/Recv

Local ID: value (= global ID) on each PE. Local IDs 1-16 are unchanged; the external points 17-24 now hold the values received from the neighbors:

PE#2:  17:37 18:45 19:53 20:61   21:25 22:26 23:27 24:28
PE#3:  17:36 18:44 19:52 20:60   21:29 22:30 23:31 24:32
PE#0:  17:5  18:13 19:21 20:29   21:33 22:34 23:35 24:36
PE#1:  17:4  18:12 19:20 20:28   21:37 22:38 23:39 24:40

[Figure: the four sub-domains with the received values filled into the halo meshes]

Page 189: Peer-to-Peer Communication

• What is P2P Communication?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2

Page 190: Overview of Distributed Local Data: Example on PE#0
[Figure: value at each mesh (= global ID) vs. local ID on PE#0, with the neighbors PE#1 and PE#2]

Page 191: SPMD

PE#0-PE#3 each run the same program "a.out" with their own distributed local data sets:
  – "sqm.0"-"sqm.3": neighbors and communication tables (geometry)
  – "sq.0"-"sq.3":   global IDs of the internal points (results)

Page 192: 2D FDM: PE#0, Information at each domain (1/4)

Internal Points: meshes originally assigned to the domain.
[Figure: the internal meshes 1-16 of PE#0]

Page 193: 2D FDM: PE#0, Information at each domain (2/4)

Internal Points: meshes originally assigned to the domain.
External Points: meshes originally assigned to different domains, but required for computation of meshes in the domain (meshes in overlapped regions).
  – also called "sleeves" or "halo"
[Figure: internal meshes of PE#0 plus the external meshes taken from PE#1 and PE#2]

Page 194: 2D FDM: PE#0, Information at each domain (3/4)

Internal Points: meshes originally assigned to the domain.
External Points: meshes originally assigned to different domains, but required for computation of meshes in the domain (meshes in overlapped regions).
Boundary Points: internal points which are also external points of other domains (used in computations of meshes in other domains).
[Figure: internal, external, and boundary meshes of PE#0]

Page 195: 2D FDM: PE#0, Information at each domain (4/4)

Internal Points: meshes originally assigned to the domain.
External Points: meshes originally assigned to different domains, but required for computation of meshes in the domain (meshes in overlapped regions).
Boundary Points: internal points which are also external points of other domains (used in computations of meshes in other domains).
Relationships between Domains:
  – Communication table: external/boundary points
  – Neighbors
[Figure: internal, external, and boundary meshes of PE#0 and its neighbors PE#1 and PE#2]

Page 196: Description of Distributed Local Data

• Internal/External Points
  – Numbering: starting from the internal points, then the external points after that
• Neighbors
  – Share overlapped meshes
  – Number and ID of neighbors
• External Points
  – From where, how many, and which external points are received/imported?
• Boundary Points
  – To where, how many, and which boundary points are sent/exported?
[Figure: local numbering on PE#0: internal points 1-16, external points 17-24]
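To make this description concrete, a minimal C sketch of one distributed local data set is shown below; the struct layout and field names are assumptions chosen to mirror the arrays used in the following slides (NeibPETot, NeibPE, import_index/import_item, export_index/export_item), not code distributed with the lecture:

  /* Minimal sketch (assumed names): description of one distributed local data set. */
  typedef struct {
    int     N;              /* number of internal points                            */
    int     NP;             /* number of internal + external points                 */
    int     NeibPETot;      /* number of neighbors                                  */
    int    *NeibPE;         /* [NeibPETot]    ranks of the neighbors                */
    int    *import_index;   /* [NeibPETot+1]  CRS-style offsets into import_item    */
    int    *import_item;    /* local IDs of external points, grouped by neighbor    */
    int    *export_index;   /* [NeibPETot+1]  CRS-style offsets into export_item    */
    int    *export_item;    /* local IDs of boundary points, grouped by neighbor    */
    double *VAL;            /* [NP]  values: internal points first, then external   */
  } LocalData;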

Page 197: Overview of Distributed Local Data: Example on PE#0
[Figure: value at each mesh (= global ID) and local ID on PE#0, including the external points 17-24]

Page 198: Generalized Comm. Table: Send

• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – export_index[neib], neib= 0, NeibPETot-1
• ID of boundary points
  – export_item[k], k= 0, export_index[NeibPETot]-1
• Messages to each neighbor
  – SendBuf[k], k= 0, export_index[NeibPETot]-1

Page 199: SEND: MPI_Isend/Irecv/Waitall

[Figure: SendBuf layout: one contiguous segment per neighbor (neib#0 ... neib#3), each of length BUFlength_e, delimited by export_index[0]..export_index[4]]

for (neib=0; neib<NeibPETot;neib++){

for (k=export_index[neib];k<export_index[neib+1];k++){

kk= export_item[k];

SendBuf[k]= VAL[kk];

}} for (neib=0; neib<NeibPETot; neib++){

tag= 0;

iS_e= export_index[neib];

iE_e= export_index[neib+1];

BUFlength_e= iE_e -iS_e

ierr= MPI_Isend

(&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib],0,

MPI_COMM_WORLD, &ReqSend[neib])

} MPI_Waitall(NeibPETot, ReqSend, StatSend);

Boundary-point values are copied to the sending buffer: export_item (export_index[neib] : export_index[neib+1]-1) are sent to the neib-th neighbor.

Page 200: Generalized Comm. Table: Receive

• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – import_index[neib], neib= 0, NeibPETot-1
• ID of external points
  – import_item[k], k= 0, import_index[NeibPETot]-1
• Messages from each neighbor
  – RecvBuf[k], k= 0, import_index[NeibPETot]-1

Page 201: RECV: MPI_Isend/Irecv/Waitall

[Figure: RecvBuf layout: one contiguous segment per neighbor (neib#0 ... neib#3), each of length BUFlength_i, delimited by import_index[0]..import_index[4]]

for (neib=0; neib<NeibPETot; neib++){

tag= 0;

iS_i= import_index[neib];

iE_i= import_index[neib+1];

BUFlength_i= iE_i -iS_i

ierr= MPI_Irecv

(&RecvBuf[iS_i], BUFlength_i, MPI_DOUBLE, NeibPE[neib],0,

MPI_COMM_WORLD, &ReqRecv[neib])

} MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

for (neib=0; neib<NeibPETot;neib++){

for (k=import_index[neib];k<import_index[neib+1];k++){

kk= import_item[k];

VAL[kk]= RecvBuf[k];

}}

import_item (import_index[neib] : import_index[neib+1]-1) are received from the neib-th neighbor and copied from the receiving buffer into VAL.
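Putting the SEND and RECV sides together, the following is a minimal C sketch of one complete halo update built from the two loops above; the wrapper name halo_update and its argument list are assumptions for illustration, not part of the lecture code:

  /* Minimal sketch: one halo update using the generalized communication table. */
  #include "mpi.h"

  void halo_update(int NeibPETot, const int *NeibPE,
                   const int *export_index, const int *export_item,
                   const int *import_index, const int *import_item,
                   double *SendBuf, double *RecvBuf, double *VAL,
                   MPI_Request *ReqSend, MPI_Request *ReqRecv,
                   MPI_Status  *StatSend, MPI_Status  *StatRecv)
  {
    int neib, k;

    /* pack boundary points into the sending buffer */
    for (neib=0; neib<NeibPETot; neib++)
      for (k=export_index[neib]; k<export_index[neib+1]; k++)
        SendBuf[k] = VAL[export_item[k]];

    /* post one send and one receive per neighbor */
    for (neib=0; neib<NeibPETot; neib++){
      MPI_Isend(&SendBuf[export_index[neib]],
                export_index[neib+1]-export_index[neib], MPI_DOUBLE,
                NeibPE[neib], 0, MPI_COMM_WORLD, &ReqSend[neib]);
      MPI_Irecv(&RecvBuf[import_index[neib]],
                import_index[neib+1]-import_index[neib], MPI_DOUBLE,
                NeibPE[neib], 0, MPI_COMM_WORLD, &ReqRecv[neib]);
    }
    MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

    /* unpack received values into the external points */
    for (neib=0; neib<NeibPETot; neib++)
      for (k=import_index[neib]; k<import_index[neib+1]; k++)
        VAL[import_item[k]] = RecvBuf[k];

    MPI_Waitall(NeibPETot, ReqSend, StatSend);
  }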

Page 202: Relationship SEND/RECV

do neib= 1, NEIBPETOT

iS_i= import_index(neib-1) + 1

iE_i= import_index(neib )

BUFlength_i= iE_i + 1 -iS_i

call MPI_IRECV &

& (RECVbuf(iS_i), BUFlength_i,MPI_INTEGER, NEIBPE(neib),0,&

& MPI_COMM_WORLD, request_recv(neib),ierr)

enddo

do neib= 1, NEIBPETOT

iS_e= export_index(neib-1) + 1

iE_e= export_index(neib )

BUFlength_e= iE_e + 1 -iS_e

call MPI_ISEND &
& (SENDbuf(iS_e), BUFlength_e, MPI_INTEGER, NEIBPE(neib),0,&

& MPI_COMM_WORLD, request_send(neib),ierr)

enddo

• Consistency of IDs of sources/destinations, size and contents of messages!
• Communication occurs when NEIBPE(neib) matches.

Page 203: Relationship SEND/RECV (#0 to #3)

• Consistency of IDs of sources/destinations, size and contents of messages!
• Communication occurs when NEIBPE(neib) matches.

[Figure: Send on #0 toward its neighbors #1, #3, #5, #9 (NEIBPE(:)=1,3,5,9); Recv. on #3 from its neighbors #1, #0, #10 (NEIBPE(:)=1,0,10); the pair (#0, #3) matches]
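An illustrative way to verify this consistency at run time (an assumption of this write-up, not part of the lecture code) is to exchange the planned message length with each neighbor and compare it with the length expected from the import table:

  /* Illustrative consistency check: for each neighbor, exchange the planned
     send length and compare it with the expected receive length. */
  #include <stdio.h>
  #include "mpi.h"

  void check_table(int NeibPETot, const int *NeibPE,
                   const int *export_index, const int *import_index)
  {
    int neib;
    for (neib=0; neib<NeibPETot; neib++){
      int len_send = export_index[neib+1] - export_index[neib];
      int len_recv = import_index[neib+1] - import_index[neib];
      int len_peer;                       /* length the neighbor will send us */
      MPI_Status stat;
      MPI_Sendrecv(&len_send, 1, MPI_INT, NeibPE[neib], 99,
                   &len_peer, 1, MPI_INT, NeibPE[neib], 99,
                   MPI_COMM_WORLD, &stat);
      if (len_peer != len_recv)
        printf("mismatch with neighbor %d: expect %d, peer sends %d\n",
               NeibPE[neib], len_recv, len_peer);
    }
  }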

Page 204: Generalized Comm. Table (1/6)

[Figure: PE#0 local mesh: internal points 1-16, external points 17-24 imported from PE#1 and PE#2]

Distributed local data file for PE#0:
#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORT_index
4 8
#IMPORT_items
17
18
19
20
21
22
23
24
#EXPORT_index
4 8
#EXPORT_items
4
8
12
16
13
14
15
16

Page 205: Generalized Comm. Table (2/6)

#NEIBPEtot   number of neighbors (2)
#NEIBPE      IDs of the neighbors (1 2)
#NODE        number of ext+int points and of int points (24 16)
(remaining content of the file as on page 204)
[Figure: PE#0 local mesh with external points 17-24]

Page 206: Generalized Comm. Table (3/6)

#IMPORT_index: four external points (1st-4th items) are imported from the 1st neighbor (PE#1), and four (5th-8th items) from the 2nd neighbor (PE#2).
(remaining content of the file as on page 204)
[Figure: PE#0 local mesh with external points 17-24]

Page 207: Generalized Comm. Table (4/6)

#IMPORT_items: local IDs 17-20 are imported from the 1st neighbor (PE#1) (1st-4th items), and 21-24 from the 2nd neighbor (PE#2) (5th-8th items).
(remaining content of the file as on page 204)
[Figure: PE#0 local mesh with external points 17-24]

Page 208: Generalized Comm. Table (5/6)

#EXPORT_index: four boundary points (1st-4th items) are exported to the 1st neighbor (PE#1), and four (5th-8th items) to the 2nd neighbor (PE#2).
(remaining content of the file as on page 204)
[Figure: PE#0 local mesh with boundary points highlighted]

Page 209: Generalized Comm. Table (6/6)

#EXPORT_items: local IDs 4, 8, 12, 16 are exported to the 1st neighbor (PE#1) (1st-4th items), and 13, 14, 15, 16 to the 2nd neighbor (PE#2) (5th-8th items).
(remaining content of the file as on page 204)
[Figure: PE#0 local mesh with boundary points highlighted]

Page 210: Generalized Comm. Table (6/6)

An external point is only sent from its original domain. A boundary point could be referred to from more than one domain, and sent to multiple domains (e.g. the 16th mesh).
[Figure: PE#0 local mesh; mesh 16 is exported to both PE#1 and PE#2]

Page 211: Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• "length_send" of the sending process must be equal to "length_recv" of the receiving process.
  – PE#0 to PE#1, PE#1 to PE#0
• "sendbuf" and "recvbuf": different addresses

Page 212: Peer-to-Peer Communication

• What is P2P Communication?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2

Page 213: Sample Program for 2D FDM

$ cd <$O-S2>
$ mpifrtpx -Kfast sq-sr1.f
$ mpifccpx -Kfast sq-sr1.c
(modify go4.sh for 4 processes)
$ pjsub go4.sh

Page 214: Example: sq-sr1.c (1/6): Initialization

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <assert.h>

#include "mpi.h"

int main(int argc, char **argv){

int n, np, NeibPeTot, BufLength;

MPI_Status *StatSend, *StatRecv;

MPI_Request *RequestSend, *RequestRecv;

int MyRank, PeTot;

int *val, *SendBuf, *RecvBuf, *NeibPe;

int *ImportIndex, *ExportIndex, *ImportItem, *ExportItem;

char FileName[80], line[80];

int i, nn, neib;

int iStart, iEnd;

FILE *fp;

/*!C +-----------+

!C | INIT. MPI |

!C +-----------+

!C===*/MPI_Init(&argc, &argv);

MPI_Comm_size(MPI_COMM_WORLD, &PeTot);

MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);


Page 215: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)

/*!C +-----------+

!C | DATA file |

!C +-----------+

!C===*/sprintf(FileName, "sqm.%d", MyRank);

fp = fopen(FileName, "r");

fscanf(fp, "%d", &NeibPeTot);

NeibPe = calloc(NeibPeTot, sizeof(int));

ImportIndex = calloc(1+NeibPeTot, sizeof(int));

ExportIndex = calloc(1+NeibPeTot, sizeof(int));

for(neib=0;neib<NeibPeTot;neib++){

fscanf(fp, "%d", &NeibPe[neib]);

} fscanf(fp, "%d %d", &np, &n);

for(neib=1;neib<NeibPeTot+1;neib++){

fscanf(fp, "%d", &ImportIndex[neib]);}

nn = ImportIndex[NeibPeTot];

ImportItem = malloc(nn * sizeof(int));

for(i=0;i<nn;i++){

fscanf(fp, "%d", &ImportItem[i]); ImportItem[i]--;}

for(neib=1;neib<NeibPeTot+1;neib++){

fscanf(fp, "%d", &ExportIndex[neib]);}

nn = ExportIndex[NeibPeTot];

ExportItem = malloc(nn * sizeof(int));

for(i=0;i<nn;i++){

fscanf(fp, "%d", &ExportItem[i]);ExportItem[i]--;}


Page 216: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code as on page 215)

sqm.0:
#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORTindex
4 8
#IMPORTitems
17 18 19 20 21 22 23 24
#EXPORTindex
4 8
#EXPORTitems
4 8 12 16 13 14 15 16

Page 217: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code and data file as on pages 215-216)

np: number of all meshes (internal + external)
n:  number of internal meshes

Page 218: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code and data file as on pages 215-216)

Page 219: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code and data file as on pages 215-216)

Page 220: RECV/Import: PE#0
(data file as on page 216)
[Figure: PE#0 and its neighbors; the external points 17-24 (IMPORT items) are received from PE#1 and PE#2]

Page 221: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code and data file as on pages 215-216)

Page 222: Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
(code and data file as on pages 215-216)

Page 223: SEND/Export: PE#0
(data file as on page 216)
[Figure: PE#0 and its neighbors; the boundary points 4, 8, 12, 16 and 13, 14, 15, 16 (EXPORT items) are sent to PE#1 and PE#2]

Page 224: Example: sq-sr1.c (3/6): Reading distributed local data files (sq.*)

sprintf(FileName, "sq.%d", MyRank);

fp = fopen(FileName, "r");

assert(fp != NULL);

val = calloc(np, sizeof(*val));

for(i=0;i<n;i++){

fscanf(fp, "%d", &val[i]);

}

n:   number of internal points
val: global ID of the meshes; val on the external points is unknown at this stage.

sq.0:
1 2 3 4 9 10 11 12 17 18 19 20 25 26 27 28

[Figure: PE#0's internal meshes and their values (= global IDs)]

Page 225: Example: sq-sr1.c (4/6): Preparation of sending/receiving buffers

/*!C
!C +--------+

!C | BUFFER |

!C +--------+

!C===*/SendBuf = calloc(ExportIndex[NeibPeTot], sizeof(*SendBuf));

RecvBuf = calloc(ImportIndex[NeibPeTot], sizeof(*RecvBuf));

for(neib=0;neib<NeibPeTot;neib++){

iStart = ExportIndex[neib];

iEnd = ExportIndex[neib+1];

for(i=iStart;i<iEnd;i++){

SendBuf[i] = val[ExportItem[i]];

}}

Info. of boundary points is written into the sending buffer (SendBuf). Info. sent to NeibPe[neib] is stored in SendBuf[ExportIndex[neib]:ExportIndex[neib+1]-1].


Sending Buffer is nice ...

[Figure: local node numbering on PE#0 and its neighbors PE#1, PE#2; the boundary points to be sent are marked]

Numbering of these boundary nodes is not continuous, therefore the following procedure of MPI_Isend cannot be applied directly:
・Starting address of sending buffer
・XX messages from that address

for (neib=0; neib<NeibPETot; neib++){
  tag = 0;
  iS_e = export_index[neib];
  iE_e = export_index[neib+1];
  BUFlength_e = iE_e - iS_e;

  ierr = MPI_Isend(&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib], 0,
                   MPI_COMM_WORLD, &ReqSend[neib]);
}


Communication Pattern using 1D Structure

[Figure: 1D decomposition with halo regions between neighboring sub-domains]

Dr. Osni Marques (Lawrence Berkeley National Laboratory)


Example: sq-sr1.c (5/6)
SEND/Export: MPI_Isend

/*
!C +-----------+
!C | SEND-RECV |
!C +-----------+
!C===*/
  StatSend    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  StatRecv    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  RequestSend = malloc(sizeof(MPI_Request) * NeibPeTot);
  RequestRecv = malloc(sizeof(MPI_Request) * NeibPeTot);

  for(neib=0;neib<NeibPeTot;neib++){
    iStart    = ExportIndex[neib];
    iEnd      = ExportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Isend(&SendBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestSend[neib]);
  }

  for(neib=0;neib<NeibPeTot;neib++){
    iStart    = ImportIndex[neib];
    iEnd      = ImportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Irecv(&RecvBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestRecv[neib]);
  }

[Figure: 8x8 global mesh (64 nodes) decomposed into four 4x4 sub-domains PE#0-PE#3]


SEND/Export: PE#0

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORTindex
4 8
#IMPORTitems
17 18 19 20 21 22 23 24
#EXPORTindex
4 8
#EXPORTitems
4 8 12 16 13 14 15 16

[Figure: local node numbering on PE#0 and its neighbors PE#1, PE#2]


SEND: MPI_Isend/Irecv/Waitall

[Figure: layout of SendBuf; the block for neighbor neib#0 ... neib#3 starts at export_index[neib] and has length BUFlength_e = export_index[neib+1] - export_index[neib]]

for (neib=0; neib<NeibPETot; neib++){
  for (k=export_index[neib]; k<export_index[neib+1]; k++){
    kk = export_item[k];
    SendBuf[k] = VAL[kk];
  }
}

for (neib=0; neib<NeibPETot; neib++){
  tag = 0;
  iS_e = export_index[neib];
  iE_e = export_index[neib+1];
  BUFlength_e = iE_e - iS_e;

  ierr = MPI_Isend(&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib], 0,
                   MPI_COMM_WORLD, &ReqSend[neib]);
}

MPI_Waitall(NeibPETot, ReqSend, StatSend);

export_item (export_index[neib]:export_index[neib+1]-1) are sent to the neib-th neighbor


Copied to sending buffers


Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• "length_send" of the sending process must be equal to "length_recv" of the receiving process (a minimal example follows below).
  – PE#0 to PE#1, PE#1 to PE#0
• "sendbuf" and "recvbuf": different addresses
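A minimal illustration of this matching rule (a sketch, not part of sq-sr1.c; the count NN, the datatype MPI_DOUBLE, and the variable names are assumptions made here): rank 0 posts a send and rank 1 posts a receive with exactly the same count, into two different arrays:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv){
  enum { NN = 4 };                   /* length_send on #0 == length_recv on #1 */
  double sendbuf[NN], recvbuf[NN];   /* different arrays: different addresses  */
  int myrank, i;
  MPI_Status  stat;
  MPI_Request req;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  if(myrank == 0){
    for(i=0;i<NN;i++) sendbuf[i] = (double)(i+1);
    MPI_Isend(sendbuf, NN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &stat);
  } else if(myrank == 1){
    MPI_Irecv(recvbuf, NN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &stat);
    for(i=0;i<NN;i++) printf("recvbuf[%d]= %f\n", i, recvbuf[i]);
  }

  MPI_Finalize();
  return 0;
}

Run with two processes; if the receive count on rank 1 were smaller than the incoming message, MPI would report a truncation error.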


Relationship SEND/RECV

do neib= 1, NEIBPETOT
  iS_i= import_index(neib-1) + 1
  iE_i= import_index(neib  )
  BUFlength_i= iE_i + 1 - iS_i

  call MPI_IRECV                                                  &
 &   (RECVbuf(iS_i), BUFlength_i, MPI_INTEGER, NEIBPE(neib), 0,   &
 &    MPI_COMM_WORLD, request_recv(neib), ierr)
enddo

do neib= 1, NEIBPETOT
  iS_e= export_index(neib-1) + 1
  iE_e= export_index(neib  )
  BUFlength_e= iE_e + 1 - iS_e

  call MPI_ISEND                                                  &
 &   (SENDbuf(iS_e), BUFlength_e, MPI_INTEGER, NEIBPE(neib), 0,   &
 &    MPI_COMM_WORLD, request_send(neib), ierr)
enddo

• Consistency of IDs of sources/destinations, size, and contents of messages!
• Communication occurs when NEIBPE(neib) matches (a simple run-time check is sketched below)
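This consistency can also be verified at run time. The sketch below is an addition for illustration, not part of the lecture code; the argument names follow sq-sr1.c. Each process exchanges its send (export) length with every neighbor and checks that it equals the corresponding receive (import) length:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Verify that my export length to each neighbor equals that neighbor's
   export length back to me, i.e. my import length from it.              */
static void check_comm_table(int NeibPeTot, const int *NeibPe,
                             const int *ImportIndex, const int *ExportIndex,
                             int MyRank)
{
  int neib;
  int *lenSend = malloc(NeibPeTot * sizeof(int));
  int *lenPeer = malloc(NeibPeTot * sizeof(int));
  MPI_Request *req  = malloc(2 * NeibPeTot * sizeof(MPI_Request));
  MPI_Status  *stat = malloc(2 * NeibPeTot * sizeof(MPI_Status));

  for(neib=0; neib<NeibPeTot; neib++){
    lenSend[neib] = ExportIndex[neib+1] - ExportIndex[neib];
    MPI_Isend(&lenSend[neib], 1, MPI_INT, NeibPe[neib], 99,
              MPI_COMM_WORLD, &req[2*neib]);
    MPI_Irecv(&lenPeer[neib], 1, MPI_INT, NeibPe[neib], 99,
              MPI_COMM_WORLD, &req[2*neib+1]);
  }
  MPI_Waitall(2*NeibPeTot, req, stat);

  for(neib=0; neib<NeibPeTot; neib++){
    int lenRecv = ImportIndex[neib+1] - ImportIndex[neib];
    if(lenPeer[neib] != lenRecv){
      printf("rank %d: length mismatch with neighbor %d (import %d, peer exports %d)\n",
             MyRank, NeibPe[neib], lenRecv, lenPeer[neib]);
      MPI_Abort(MPI_COMM_WORLD, 1);
    }
  }
  free(lenSend); free(lenPeer); free(req); free(stat);
}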


Relationship SEND/RECV (#0 to #3)

• Consistency of IDs of sources/destinations, size, and contents of messages!
• Communication occurs when NEIBPE(neib) matches

[Figure: Send on #0 with NEIBPE(:)= 1,3,5,9 and Recv. on #3 with NEIBPE(:)= 1,0,10; the pair #0 and #3 communicates because each rank appears in the other's NEIBPE list]


Example: sq-sr1.c (5/6)
RECV/Import: MPI_Irecv

/*
!C +-----------+
!C | SEND-RECV |
!C +-----------+
!C===*/
  StatSend    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  StatRecv    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  RequestSend = malloc(sizeof(MPI_Request) * NeibPeTot);
  RequestRecv = malloc(sizeof(MPI_Request) * NeibPeTot);

  for(neib=0;neib<NeibPeTot;neib++){
    iStart    = ExportIndex[neib];
    iEnd      = ExportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Isend(&SendBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestSend[neib]);
  }

  for(neib=0;neib<NeibPeTot;neib++){
    iStart    = ImportIndex[neib];
    iEnd      = ImportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Irecv(&RecvBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestRecv[neib]);
  }

[Figure: 8x8 global mesh (64 nodes) decomposed into four 4x4 sub-domains PE#0-PE#3]


RECV/Import: PE#0

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORTindex
4 8
#IMPORTitems
17 18 19 20 21 22 23 24
#EXPORTindex
4 8
#EXPORTitems
4 8 12 16 13 14 15 16

[Figure: local node numbering on PE#0 and its neighbors PE#1, PE#2]


RECV: MPI_Isend/Irecv/Waitall

[Figure: layout of RecvBuf; the block for neighbor neib#0 ... neib#3 starts at import_index[neib] and has length BUFlength_i = import_index[neib+1] - import_index[neib]]

for (neib=0; neib<NeibPETot; neib++){
  tag = 0;
  iS_i = import_index[neib];
  iE_i = import_index[neib+1];
  BUFlength_i = iE_i - iS_i;

  ierr = MPI_Irecv(&RecvBuf[iS_i], BUFlength_i, MPI_DOUBLE, NeibPE[neib], 0,
                   MPI_COMM_WORLD, &ReqRecv[neib]);
}

MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

for (neib=0; neib<NeibPETot; neib++){
  for (k=import_index[neib]; k<import_index[neib+1]; k++){
    kk = import_item[k];
    VAL[kk] = RecvBuf[k];
  }
}

Copied from receiving buffer


import_item (import_index[neib]:import_index[neib+1]-1) are received from the neib-th neighbor


Example: sq-sr1.c (6/6)
Reading info. of external points from receiving buffers

  MPI_Waitall(NeibPeTot, RequestRecv, StatRecv);

  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    for(i=iStart;i<iEnd;i++){
      val[ImportItem[i]] = RecvBuf[i];
    }
  }

  MPI_Waitall(NeibPeTot, RequestSend, StatSend);

/*
!C +--------+
!C | OUTPUT |
!C +--------+
!C===*/
  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    for(i=iStart;i<iEnd;i++){
      int in = ImportItem[i];
      printf("RECVbuf%8d%8d%8d\n", MyRank, NeibPe[neib], val[in]);
    }
  }

  MPI_Finalize();
  return 0;
}

Contents of RecvBuf are copied to values at external points.


Example: sq-sr1.c (6/6)
Writing values at external points (same code block as shown above)



Results (PE#0)

RECVbuf 0 1 5

RECVbuf 0 1 13

RECVbuf 0 1 21

RECVbuf 0 1 29

RECVbuf 0 2 33

RECVbuf 0 2 34

RECVbuf 0 2 35

RECVbuf 0 2 36

RECVbuf 1 0 4

RECVbuf 1 0 12

RECVbuf 1 0 20

RECVbuf 1 0 28

RECVbuf 1 3 37

RECVbuf 1 3 38

RECVbuf 1 3 39

RECVbuf 1 3 40

RECVbuf 2 3 37

RECVbuf 2 3 45

RECVbuf 2 3 53

RECVbuf 2 3 61

RECVbuf 2 0 25

RECVbuf 2 0 26

RECVbuf 2 0 27

RECVbuf 2 0 28

RECVbuf 3 2 36

RECVbuf 3 2 44

RECVbuf 3 2 52

RECVbuf 3 2 60

RECVbuf 3 1 29

RECVbuf 3 1 30

RECVbuf 3 1 31

RECVbuf 3 1 32



Results (PE#1)

RECVbuf 0 1 5

RECVbuf 0 1 13

RECVbuf 0 1 21

RECVbuf 0 1 29

RECVbuf 0 2 33

RECVbuf 0 2 34

RECVbuf 0 2 35

RECVbuf 0 2 36

RECVbuf 1 0 4

RECVbuf 1 0 12

RECVbuf 1 0 20

RECVbuf 1 0 28

RECVbuf 1 3 37

RECVbuf 1 3 38

RECVbuf 1 3 39

RECVbuf 1 3 40

RECVbuf 2 3 37

RECVbuf 2 3 45

RECVbuf 2 3 53

RECVbuf 2 3 61

RECVbuf 2 0 25

RECVbuf 2 0 26

RECVbuf 2 0 27

RECVbuf 2 0 28

RECVbuf 3 2 36

RECVbuf 3 2 44

RECVbuf 3 2 52

RECVbuf 3 2 60

RECVbuf 3 1 29

RECVbuf 3 1 30

RECVbuf 3 1 31

RECVbuf 3 1 32



Results (PE#2)

RECVbuf 0 1 5

RECVbuf 0 1 13

RECVbuf 0 1 21

RECVbuf 0 1 29

RECVbuf 0 2 33

RECVbuf 0 2 34

RECVbuf 0 2 35

RECVbuf 0 2 36

RECVbuf 1 0 4

RECVbuf 1 0 12

RECVbuf 1 0 20

RECVbuf 1 0 28

RECVbuf 1 3 37

RECVbuf 1 3 38

RECVbuf 1 3 39

RECVbuf 1 3 40

RECVbuf 2 3 37

RECVbuf 2 3 45

RECVbuf 2 3 53

RECVbuf 2 3 61

RECVbuf 2 0 25

RECVbuf 2 0 26

RECVbuf 2 0 27

RECVbuf 2 0 28

RECVbuf 3 2 36

RECVbuf 3 2 44

RECVbuf 3 2 52

RECVbuf 3 2 60

RECVbuf 3 1 29

RECVbuf 3 1 30

RECVbuf 3 1 31

RECVbuf 3 1 32




Results (PE#3)

RECVbuf 0 1 5

RECVbuf 0 1 13

RECVbuf 0 1 21

RECVbuf 0 1 29

RECVbuf 0 2 33

RECVbuf 0 2 34

RECVbuf 0 2 35

RECVbuf 0 2 36

RECVbuf 1 0 4

RECVbuf 1 0 12

RECVbuf 1 0 20

RECVbuf 1 0 28

RECVbuf 1 3 37

RECVbuf 1 3 38

RECVbuf 1 3 39

RECVbuf 1 3 40

RECVbuf 2 3 37

RECVbuf 2 3 45

RECVbuf 2 3 53

RECVbuf 2 3 61

RECVbuf 2 0 25

RECVbuf 2 0 26

RECVbuf 2 0 27

RECVbuf 2 0 28

RECVbuf 3 2 36

RECVbuf 3 2 44

RECVbuf 3 2 52

RECVbuf 3 2 60

RECVbuf 3 1 29

RECVbuf 3 1 30

RECVbuf 3 1 31

RECVbuf 3 1 32



Distributed Local Data Structure for Parallel Computation

• A distributed local data structure for domain-to-domain communications has been introduced, which is appropriate for such applications with sparse coefficient matrices (e.g. FDM, FEM, FVM, etc.).
  – SPMD
  – Local Numbering: Internal pts to External pts
  – Generalized communication table
• Everything is easy, if a proper data structure is defined (see the sketch below):
  – Values at boundary pts are copied into sending buffers
  – Send/Recv
  – Values at external pts are updated through receiving buffers
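The three steps of the update can be collected into one routine. The following sketch assumes the communication-table arrays of sq-sr1.c and packages one halo update of a double-precision array val as a function (the packaging, and allocating buffers inside the routine, are illustrations, not the form used in the lecture code):

#include <mpi.h>
#include <stdlib.h>

/* One halo update with a generalized communication table. */
static void halo_update(double *val,
                        int NeibPeTot, const int *NeibPe,
                        const int *ImportIndex, const int *ImportItem,
                        const int *ExportIndex, const int *ExportItem)
{
  int neib, i;
  double *SendBuf = malloc(ExportIndex[NeibPeTot] * sizeof(double));
  double *RecvBuf = malloc(ImportIndex[NeibPeTot] * sizeof(double));
  MPI_Request *ReqSend  = malloc(NeibPeTot * sizeof(MPI_Request));
  MPI_Request *ReqRecv  = malloc(NeibPeTot * sizeof(MPI_Request));
  MPI_Status  *StatSend = malloc(NeibPeTot * sizeof(MPI_Status));
  MPI_Status  *StatRecv = malloc(NeibPeTot * sizeof(MPI_Status));

  /* (1) values at boundary points are copied into the sending buffer */
  for(i=0; i<ExportIndex[NeibPeTot]; i++) SendBuf[i] = val[ExportItem[i]];

  /* (2) Send/Recv: one message per neighbor */
  for(neib=0; neib<NeibPeTot; neib++){
    MPI_Isend(&SendBuf[ExportIndex[neib]],
              ExportIndex[neib+1]-ExportIndex[neib], MPI_DOUBLE,
              NeibPe[neib], 0, MPI_COMM_WORLD, &ReqSend[neib]);
    MPI_Irecv(&RecvBuf[ImportIndex[neib]],
              ImportIndex[neib+1]-ImportIndex[neib], MPI_DOUBLE,
              NeibPe[neib], 0, MPI_COMM_WORLD, &ReqRecv[neib]);
  }
  MPI_Waitall(NeibPeTot, ReqRecv, StatRecv);

  /* (3) values at external points are updated from the receiving buffer */
  for(i=0; i<ImportIndex[NeibPeTot]; i++) val[ImportItem[i]] = RecvBuf[i];

  MPI_Waitall(NeibPeTot, ReqSend, StatSend);

  free(SendBuf); free(RecvBuf);
  free(ReqSend); free(ReqRecv); free(StatSend); free(StatRecv);
}

In a real application the buffers and request/status arrays would normally be allocated once and reused for every update.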


Initial Mesh

[Figure: initial 5x5 mesh, nodes numbered 1-25]


Three Domains

[Figure: the 25-node mesh partitioned into three domains #PE0, #PE1, #PE2 (global node numbers, including each domain's external points)]


Three Domains

[Figure: the three domains with local node numbers and the corresponding global node numbers]


PE#0: sqm.0: fill in the ○'s

[Figure: local/global node numbering on #PE0, #PE1, #PE2]

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
13 8   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○ …
#EXPORTindex
○ ○
#EXPORTitems
○ …


PE#1: sqm.1: fill in the ○'s

[Figure: local/global node numbering on #PE0, #PE1, #PE2]

#NEIBPEtot
2
#NEIBPE
0 2
#NODE
14 8   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○ …
#EXPORTindex
○ ○
#EXPORTitems
○ …


PE#2: sqm.2: fill in the ○'s

[Figure: local/global node numbering on #PE0, #PE1, #PE2]

#NEIBPEtot
2
#NEIBPE
1 0
#NODE
15 9   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○ …
#EXPORTindex
○ ○
#EXPORTitems
○ …


[Figure: local/global node numbering on the three domains #PE0, #PE1, #PE2]


Procedures

• Number of Internal/External Points
• Where do External Pts come from?
  – IMPORTindex, IMPORTitems
  – Sequence of NEIBPE
• Then check destinations of Boundary Pts.
  – EXPORTindex, EXPORTitems
  – Sequence of NEIBPE
• "sq.*" are in <$O-S2>/ex
• Create "sqm.*" by yourself
• Copy <$O-S2>/a.out (by sq-sr1.c) to <$O-S2>/ex
• pjsub go3.sh


Report S2 (1/2)

• Parallelize the 1D code (1d.c) using MPI
• Read the entire element number, and decompose into sub-domains in your program
• Measure parallel performance (a timing skeleton is sketched below)
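One common way to measure it (a sketch only; the solver-loop placeholder and the variable names are not taken from 1d.c) is to bracket the measured section with MPI_Wtime and report the maximum elapsed time over all processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv){
  double t0, t1, elapsed, elapsed_max;
  int myrank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  MPI_Barrier(MPI_COMM_WORLD);       /* start all processes together     */
  t0 = MPI_Wtime();

  /* ... solver loop of the parallelized 1D code goes here ...           */

  t1 = MPI_Wtime();
  elapsed = t1 - t0;

  /* the slowest process determines the parallel performance             */
  MPI_Allreduce(&elapsed, &elapsed_max, 1, MPI_DOUBLE, MPI_MAX,
                MPI_COMM_WORLD);
  if(myrank == 0) printf("elapsed time: %e [s]\n", elapsed_max);

  MPI_Finalize();
  return 0;
}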


Report S2 (2/2)

• Deadline: 17:00, October 11th (Sat), 2014.
  – Send files via e-mail to nakajima(at)cc.u-tokyo.ac.jp
• Problem
  – Apply the "Generalized Communication Table"
  – Read the entire elem. #, decompose into sub-domains in your program
  – Evaluate parallel performance
    • You need a huge number of elements to get excellent performance.
    • Fix the number of iterations (e.g. 100) if computations cannot be completed.
• Report
  – Cover Page: Name, ID, and Problem ID (S2) must be written.
  – Less than eight pages including figures and tables (A4).
    • Strategy, Structure of the Program, Remarks
  – Source list of the program (if you have bugs)
  – Output list (as small as possible)
