Introduction to Programming by MPI for Parallel FEM
Report S1 & S2 in C
Kengo Nakajima

Programming for Parallel Computing (616-2057)
Seminar on Advanced Computing (616-4009)

Motivation for Parallel Computing (and this class)
• Large-scale parallel computers enable fast computing in large-scale scientific simulations with detailed models. Computational science develops new frontiers of science and engineering.
• Why parallel computing?
– faster & larger
– "larger" is more important from the viewpoint of "new frontiers of science & engineering", but "faster" is also important.
– + more complicated
– Ideal: Scalable
• Solving an Nx-scale problem using Nx computational resources during the same computation time.
MPI Programming

Overview
• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication
What is MPI? (1/2)
• Message Passing Interface
• A "specification" of a message passing API for distributed memory environments
– Not a program, not a library
• http://phase.hpcc.jp/phase/mpi-j/ml/mpi-j-html/contents.html
• History
– 1992: MPI Forum
– 1994: MPI-1
– 1997: MPI-2 (MPI I/O)
– 2012: MPI-3 (Fault Resilience, Asynchronous Collectives)
• Implementations
– mpich: ANL (Argonne National Laboratory); OpenMPI; MVAPICH
– H/W vendors
– C/C++, FORTRAN, Java; Unix, Linux, Windows, Mac OS
What is MPI? (2/2)
• "mpich" (free) is widely used
– supports the MPI-2 spec (partially)
– MPICH2 after Nov. 2005
– http://www-unix.mcs.anl.gov/mpi/
• Why is MPI so widely used as a de facto standard?
– Uniform interface through the MPI Forum
• Portable: can work on any type of computer
• Can be called from Fortran, C, etc.
– mpich
• free, supports every architecture
• PVM (Parallel Virtual Machine) was also proposed in the early '90s, but is not as widely used as MPI.
References
• W. Gropp et al., Using MPI (second edition), MIT Press, 1999.
• M.J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2003.
• W. Gropp et al., MPI: The Complete Reference Vol. I, II, MIT Press, 1998.
• http://www-unix.mcs.anl.gov/mpi/www/
– API (Application Interface) of MPI
How to Learn MPI (1/2)
• Grammar
– 10-20 functions of MPI-1 will be taught in this class
• although there are many convenient capabilities in MPI-2
– If you need further information, you can find it on the web, in books, and from MPI experts.
• Practice is important
– Programming
– "Running the codes" is the most important part
• Be familiar with, or "grab", the idea of SPMD/SIMD operations
– Single Program/Instruction, Multiple Data
– Each process does the same operation on different data
• Large-scale data is decomposed, and each part is computed by a separate process
– Global/Local Data, Global/Local Numbering
SPMD

mpirun -np M <Program>

PE #0: Program / Data #0
PE #1: Program / Data #1
PE #2: Program / Data #2
…
PE #M-1: Program / Data #M-1

You understand 90% of MPI if you understand this figure.
PE: Processing Element (processor, domain, process)
Each process does the same operation on different data. Large-scale data is decomposed, and each part is computed by a separate process. Ideally, a parallel program is no different from a serial one except for communication.
Some Technical Terms
• Processor, Core
– Processing unit (H/W); processor = core for single-core processors
• Process
– Unit of MPI computation, nearly equal to "core"
– Each core (or processor) can host multiple processes (but this is not efficient)
• PE (Processing Element)
– PE originally means "processor", but it is sometimes used to mean "process" in this class. Moreover, it can mean "domain" (next).
– On multicore processors, PE generally means "core".
• Domain
– domain = process (= PE); each of the "MD" in "SPMD"; each data set
• The process ID of MPI (ID of PE, ID of domain) starts from "0"
– if you have 8 processes (PEs, domains), the IDs are 0-7
How to Learn MPI (2/2)
• NOT so difficult.
• Therefore, 2-3 lectures are enough for just learning the grammar of MPI.
• Grab the idea of SPMD!
Schedule
• MPI
– Basic Functions
– Collective Communication
– Point-to-Point (or Peer-to-Peer) Communication
• 90 min. x 4-5 lectures
– Collective Communication
• Report S1
– Point-to-Point/Peer-to-Peer Communication
• Report S2: Parallelization of a 1D code
– At this point, you are almost an expert in MPI programming.
• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication
Logging in to Oakleaf-FX

ssh t71**@oakleaf-fx.cc.u-tokyo.ac.jp

Create a directory:
>$ cd
>$ mkdir 2014summer    (your favorite name)
>$ cd 2014summer

In this class, this top directory is called <$O-TOP>. Files are copied to this directory. Under this directory, S1, S2, S1-ref are created:
<$O-S1> = <$O-TOP>/mpi/S1
<$O-S2> = <$O-TOP>/mpi/S2

Oakleaf-FX, ECCS2012
Copying files on Oakleaf-FX

Fortran
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/F/s1-f.tar .
>$ tar xvf s1-f.tar

C
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/C/s1-c.tar .
>$ tar xvf s1-c.tar

Confirmation
>$ ls
mpi
>$ cd mpi/S1

This directory is called <$O-S1>.
<$O-S1> = <$O-TOP>/mpi/S1
First Example

hello.f:
implicit REAL*8 (A-H,O-Z)
include 'mpif.h'
integer :: PETOT, my_rank, ierr

call MPI_INIT (ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)

write (*,'(a,2i8)') 'Hello World FORTRAN', my_rank, PETOT

call MPI_FINALIZE (ierr)
stop
end

hello.c:
#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
    int n, myid, numprocs, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    printf("Hello World %d\n", myid);
    MPI_Finalize();
}
Compiling hello.f/c

FORTRAN
>$ cd <$O-S1>
>$ mpifrtpx -Kfast hello.f
"mpifrtpx": the required compiler & libraries for FORTRAN90 + MPI are included

C
>$ mpifccpx -Kfast hello.c
"mpifccpx": the required compiler & libraries for C + MPI are included
Running Jobs
• Batch Jobs
– Only batch jobs are allowed.
– Interactive execution of jobs is not allowed.
• How to run
– write a job script
– submit the job
– check job status
– check results
• Utilization of computational resources
– 1 node (16 cores) is occupied by each job.
– Your node is not shared by other jobs.
Job Script
• <$O-S1>/hello.sh
• Scheduling + shell script

#!/bin/sh
#PJM -L "node=1"            Number of Nodes
#PJM -L "elapse=00:10:00"   Computation Time
#PJM -L "rscgrp=lecture2"   Name of "QUEUE"
#PJM -g "gt82"              Group Name (Wallet)
#PJM -j
#PJM -o "hello.lst"         Standard Output
#PJM --mpi "proc=4"         MPI Process #
mpiexec ./a.out             Execs

Process counts and node counts:
8 proc's:   "node=1",  "proc=8"
16 proc's:  "node=1",  "proc=16"
32 proc's:  "node=2",  "proc=32"
64 proc's:  "node=4",  "proc=64"
192 proc's: "node=12", "proc=192"
Submitting Jobs

>$ cd <$O-S1>
>$ pjsub hello.sh
>$ cat hello.lst
Hello World 0
Hello World 3
Hello World 2
Hello World 1
Available QUEUEs
• The following 2 queues are available.
• 1 Tofu unit (12 nodes) can be used
– lecture
• 12 nodes (192 cores), 15 min., valid until the end of October, 2014
• Shared by all "educational" users
– lecture2
• 12 nodes (192 cores), 15 min., active during class time
• More jobs (compared to lecture) can be processed upon availability.
Tofu Interconnect
• Node Group
– 12 nodes
– A-/C-axis: 4 nodes on a system board; B-axis: 3 boards
• 6D: (X,Y,Z,A,B,C)
– ABC 3D mesh: within each node group: 2×2×3
– XYZ 3D mesh: connection of node groups: 10×5×8
• Job submission according to network topology is possible:
– Information about the "XYZ" coordinates used is available after execution.
Submitting & Checking Jobs
• Submitting jobs:            pjsub SCRIPT_NAME
• Checking status of jobs:    pjstat
• Deleting/aborting:          pjdel JOB_ID
• Checking status of queues:  pjstat --rsc
• Detailed info. of queues:   pjstat --rsc -x
• Number of running jobs:     pjstat --rsc -b
• Limitation of submission:   pjstat --limit

[z30088@oakleaf-fx-6 S2-ref]$ pjstat
Oakleaf-FX scheduled stop time: 2012/09/28(Fri) 09:00:00 (Remain: 31days 20:01:46)
JOB_ID   JOB_NAME  STATUS   PROJECT  RSCGROUP  START_DATE      ELAPSE    TOKEN  NODE:COORD
334730   go.sh     RUNNING  gt61     lecture   08/27 12:58:08  00:00:05  0.0    1
Basic/Essential Functions

implicit REAL*8 (A-H,O-Z)
include 'mpif.h'
integer :: PETOT, my_rank, ierr

call MPI_INIT (ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)

write (*,'(a,2i8)') 'Hello World FORTRAN', my_rank, PETOT

call MPI_FINALIZE (ierr)
stop
end

#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
    int n, myid, numprocs, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    printf("Hello World %d\n", myid);
    MPI_Finalize();
}

• 'mpif.h', "mpi.h": essential include files ("use mpi" is also possible in Fortran 90)
• MPI_Init: initialization
• MPI_Comm_size: number of MPI processes (mpirun -np XX <prog>)
• MPI_Comm_rank: process ID, starting from 0
• MPI_Finalize: termination of MPI processes
Differences between FORTRAN and C
• (Basically) the same interface
– In C, UPPER and lower case are considered different
• e.g.: MPI_Comm_size
– "MPI" is in UPPER case
– The first character of the function name after "MPI_" is in UPPER case.
– Other characters are in lower case.
• In Fortran, the return value ierr has to be added at the end of the argument list.
• C needs special types for variables:
– MPI_Comm, MPI_Datatype, MPI_Op, etc.
• MPI_INIT is different:
– call MPI_INIT (ierr)
– MPI_Init (int *argc, char ***argv)
What's going on?
• mpiexec starts up 4 MPI processes ("proc=4")
– A single program runs on four processes.
– Each process writes the value of myid.
• The four processes do the same operations, but the values of myid differ.
• The output of each process is different.
• That is SPMD!

#!/bin/sh
#PJM -L "node=1"            Number of Nodes
#PJM -L "elapse=00:10:00"   Computation Time
#PJM -L "rscgrp=lecture"    Name of "QUEUE"
#PJM -g "gt64"              Group Name (Wallet)
#PJM -j
#PJM -o "hello.lst"         Standard Output
#PJM --mpi "proc=4"         MPI Process #
mpiexec ./a.out             Execs

#include "mpi.h"
#include <stdio.h>
int main(int argc, char **argv)
{
    int n, myid, numprocs, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    printf("Hello World %d\n", myid);
    MPI_Finalize();
}
mpi.h, mpif.h
• These headers define various types of parameters and variables for MPI & their initial values.
• The name of each variable starts with "MPI_".
• The values of these parameters and variables cannot be changed by users.
• Users must not declare variables starting with "MPI_" in their own programs.
MPI_Init
• Initializes the MPI execution environment (required)
• It is recommended to put this BEFORE all other statements in the program.

MPI_Init (argc, argv)
MPI_Finalize
• Terminates the MPI execution environment (required)
• It is recommended to put this AFTER all other statements in the program.
• Please do not forget this.

MPI_Finalize ()
MPI_Comm_size
• Determines the size of the group associated with a communicator
• Not required, but a very convenient function

MPI_Comm_size (comm, size)
– comm   MPI_Comm  I  communicator
– size   int       O  number of processes in the group of the communicator
What is a Communicator?
• A group of processes for communication
• A communicator must be specified in an MPI program as the unit of communication
• All processes belong to a group named "MPI_COMM_WORLD" (default)
• Multiple communicators can be created, and complicated operations are possible.
– e.g. computation, visualization
• Only "MPI_COMM_WORLD" is needed in this class.

MPI_Comm_Size (MPI_COMM_WORLD, PETOT)
MPI_COMM_WORLD
The communicator in MPI. One process can belong to multiple communicators (figure: COMM_MANTLE, COMM_CRUST, COMM_VIS).
Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"

Target Application
• Coupling between "Ground Motion" and "Sloshing of Tanks for Oil-Storage"
– "One-way" coupling from "Ground Motion" to "Tanks".
– Displacement of the ground surface is given as forced displacement of the bottom surface of the tanks.
– 1 Tank = 1 PE (serial)
Deformation of the ground surface is given as boundary conditions at the bottom of the tanks.
2003 Tokachi Earthquake (M8.0)
Fire accident of oil tanks due to long-period ground motion (surface waves) developed in the basin of Tomakomai.

Seismic Wave Propagation, Underground Structure
Simulation Codes
• Ground Motion (Ichimura): Fortran
– Parallel FEM, 3D Elastic/Dynamic
• Explicit forward Euler scheme
– Each element: a 2m×2m×2m cube
– 240m×240m×100m region
• Sloshing of Tanks (Nagashima): C
– Serial FEM (Embarrassingly Parallel)
• Implicit backward Euler, Skyline method
• Shell elements + inviscid potential flow
– D: 42.7m, H: 24.9m, T: 20mm
– Frequency: 7.6 sec.
– 80 elements in circumference, 0.6m mesh in height
– Tank-to-Tank: 60m, 4×4
• Total number of unknowns: 2,918,169
Three Communicators
• meshGLOBAL%MPI_COMM: all processes — basement #0-#3 and tank #0-#8
• meshBASE%MPI_COMM: basement #0-#3
• meshTANK%MPI_COMM: tank #0-#8
• meshGLOBAL%my_rank = 0-3 (basement processes), 4-12 (tank processes)
• meshBASE%my_rank = 0-3 on basement processes, -1 on tank processes
• meshTANK%my_rank = 0-8 on tank processes, -1 on basement processes
MPI_Comm_rank
• Determines the rank of the calling process within the communicator
– the "ID of an MPI process" is sometimes called its "rank"

MPI_Comm_rank (comm, rank)
– comm   MPI_Comm  I  communicator
– rank   int       O  rank of the calling process in the group of comm, starting from "0"
MPI_Abort
• Aborts the MPI execution environment

MPI_Abort (comm, errcode)
– comm     MPI_Comm  I  communicator
– errcode  int       O  error code
MPI_Wtime
• Returns the elapsed time on the calling processor

time = MPI_Wtime ()
– time  double  O  time in seconds since an arbitrary time in the past

double Stime, Etime;
Stime = MPI_Wtime();
(…)
Etime = MPI_Wtime();
Example of MPI_Wtime

$> cd <$O-S1>
$> mpifccpx -O1 time.c
$> mpifrtpx -O1 time.f
(modify go4.sh, 4 processes)
$> pjsub go4.sh

Process ID / Time:
0 1.113281E+00
3 1.113281E+00
2 1.117188E+00
1 1.117188E+00
MPI_Wtick
• Returns the resolution of MPI_Wtime
• depends on hardware and compiler

time = MPI_Wtick ()
– time  double  O  time in seconds of the resolution of MPI_Wtime

Fortran:
implicit REAL*8 (A-H,O-Z)
include 'mpif.h'
…
TM = MPI_WTICK ()
write (*,*) TM
…

C:
double Time;
…
Time = MPI_Wtick();
printf("%5d%16.6E\n", MyRank, Time);
…
Example of MPI_Wtick

$> cd <$O-S1>
$> mpifccpx -O1 wtick.c
$> mpifrtpx -O1 wtick.f
(modify go1.sh, 1 process)
$> pjsub go1.sh
MPI_Barrier
• Blocks until all processes in the communicator have reached this routine.
• Mainly for debugging; huge overhead; not recommended for real code.

MPI_Barrier (comm)
– comm  MPI_Comm  I  communicator
• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication
Data Structures & Algorithms
• A computer program consists of data structures and algorithms.
• They are closely related. In order to implement an algorithm, we need to specify an appropriate data structure for it.
– We can even say that "Data Structures = Algorithms".
– Some people may not agree, but I (KN) think it is true for scientific computations, from my experience.
• Appropriate data structures for parallel computing must be specified before starting parallel computation.
SPMD: Single Program, Multiple Data
• There are various types of "parallel computing", and there are many algorithms.
• The common issue is SPMD (Single Program, Multiple Data).
• Ideally, parallel computing is done in the same way as serial computing (except for communications).
– It is required to distinguish processing steps with communications from those without communications.
What is a data structure appropriate for SPMD?

PE #0: Program / Data #0
PE #1: Program / Data #1
PE #2: Program / Data #2
PE #3: Program / Data #3
Data Structure for SPMD (1/2)
• SPMD: large data is decomposed into small pieces, and each piece is processed by each processor/process.
• Consider the following simple computation on a vector Vg with length Ng (=20):

int main(){
    int i, Ng;
    double Vg[20];
    Ng = 20;
    for(i=0; i<Ng; i++){
        Vg[i] = 2.0*Vg[i];
    }
    return 0;
}

• If you compute this using four processors, each processor stores and processes 5 (=20/4) components of Vg.
Data Structure for SPMD (2/2)
• i.e., each process runs:

int main(){
    int i, Nl;
    double Vl[5];
    Nl = 5;
    for(i=0; i<Nl; i++){
        Vl[i] = 2.0*Vl[i];
    }
    return 0;
}

• Thus, a "single program" can execute parallel processing.
– In each process, the components of "Vl" are different: Multiple Data.
– Computation using only "Vl" (as far as possible) leads to efficient parallel computation.
– The program is no different from the serial one (on the previous page).
Global & Local Data
• Vg
– the entire domain
– "Global Data", with "Global ID" from 1 to 20
• Vl
– for each process (PE, processor, domain)
– "Local Data", with "Local ID" from 1 to 5
– Efficient utilization of local data leads to excellent parallel efficiency.
Idea of Local Data in C
Vg: Global Data
• 0th-4th components on PE#0
• 5th-9th components on PE#1
• 10th-14th components on PE#2
• 15th-19th components on PE#3
Each of these four sets corresponds to the 0th-4th components of Vl (local data), where the local IDs are 0-4.

PE#0: Vg[ 0]-Vg[ 4]  →  Vl[0]-Vl[4]
PE#1: Vg[ 5]-Vg[ 9]  →  Vl[0]-Vl[4]
PE#2: Vg[10]-Vg[14]  →  Vl[0]-Vl[4]
PE#3: Vg[15]-Vg[19]  →  Vl[0]-Vl[4]
Global & Local Data
• Vg
– the entire domain
– "Global Data", with "Global ID" from 1 to 20
• Vl
– for each process (PE, processor, domain)
– "Local Data", with "Local ID" from 1 to 5
• Please pay attention to the following:
– How to generate Vl (local data) from Vg (global data)
– How to map components from Vg to Vl, and from Vl to Vg
– What to do if Vl cannot be calculated on each process in an independent manner
– Processing that is as localized as possible leads to excellent parallel efficiency:
• data structures & algorithms for that purpose.
• What is MPI?
• Your First MPI Program: Hello World
• Global/Local Data
• Collective Communication
• Peer-to-Peer Communication
What is Collective Communication? (集団通信 / グループ通信: group communication)
• Collective communication is the process of exchanging information among multiple MPI processes in a communicator: one-to-all or all-to-all communications.
• Examples
– Broadcasting control data
– Max, Min
– Summation
– Dot products of vectors
– Transformation of dense matrices
Examples of Collective Communications (1/4)
Broadcast: P#0 holds A0, B0, C0, D0; after the broadcast, every process (P#0-P#3) holds A0, B0, C0, D0.
Scatter: P#0 holds A0, B0, C0, D0; after the scatter, P#0 holds A0, P#1 holds B0, P#2 holds C0, P#3 holds D0.
Gather: the reverse of Scatter; the pieces A0, B0, C0, D0 held by P#0-P#3 are collected onto P#0.
Examples of Collective Communications (2/4)
Allgather: P#0 holds A0, P#1 holds B0, P#2 holds C0, P#3 holds D0; afterwards, every process holds A0, B0, C0, D0.
All-to-All: P#0 holds A0, A1, A2, A3; P#1 holds B0-B3; P#2 holds C0-C3; P#3 holds D0-D3. Afterwards, P#0 holds A0, B0, C0, D0; P#1 holds A1, B1, C1, D1; P#2 holds A2, B2, C2, D2; P#3 holds A3, B3, C3, D3.
Examples of Collective Communications (3/4)
Reduce: each process P#k holds Ak, Bk, Ck, Dk; the root P#0 receives op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3.
Allreduce: as Reduce, but every process receives op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3.
Examples of Collective Communications (4/4)
Reduce scatter: each process P#k holds Ak, Bk, Ck, Dk; afterwards, P#0 holds op.A0-A3, P#1 holds op.B0-B3, P#2 holds op.C0-C3, P#3 holds op.D0-D3.
Examples Using Collective Communication
• Dot products of vectors
• Scatter/Gather
• Reading distributed files
• MPI_Allgatherv
Global/Local Data
• The data structure of parallel computing based on SPMD, where large-scale "global data" is decomposed into small pieces of "local data".
Domain Decomposition/Partitioning
(Figure: large-scale data is decomposed into pieces of local data, which are computed with communication among domains.)
• A PC with 1GB RAM can execute an FEM application with up to 10^6 meshes
– 10^3 km × 10^3 km × 10^2 km (SW Japan): 10^8 meshes by 1km cubes
• Large-scale data: domain decomposition, parallel & local operations
• Global computation: communication among domains is needed
Local Data Structure
• It is important to define a proper local data structure for the target computation (and its algorithm)
– Algorithms = Data Structures
• This is the main objective of this class!
Global/Local Data
• The data structure of parallel computing based on SPMD, where large-scale "global data" is decomposed into small pieces of "local data".
• Consider the dot product of the following vectors VECp and VECs with length=20, computed in parallel using 4 processors:

VECp[ 0]= 2      VECs[ 0]= 3
    [ 1]= 2          [ 1]= 3
    [ 2]= 2          [ 2]= 3
     …                …
    [17]= 2          [17]= 3
    [18]= 2          [18]= 3
    [19]= 2          [19]= 3
<$O-S1>/dot.f, dot.c

dot.f:
implicit REAL*8 (A-H,O-Z)
real(kind=8),dimension(20):: &
     VECp, VECs

do i= 1, 20
  VECp(i)= 2.0d0
  VECs(i)= 3.0d0
enddo

sum= 0.d0
do ii= 1, 20
  sum= sum + VECp(ii)*VECs(ii)
enddo

stop
end

dot.c:
#include <stdio.h>
int main(){
    int i;
    double VECp[20], VECs[20];
    double sum;

    for(i=0; i<20; i++){
        VECp[i]= 2.0;
        VECs[i]= 3.0;
    }
    sum = 0.0;
    for(i=0; i<20; i++){
        sum += VECp[i] * VECs[i];
    }
    return 0;
}
<$O-S1>/dot.f, dot.c (do it on ECCS2012)

>$ cd <$O-S1>
>$ cc -O3 dot.c
>$ f90 -O3 dot.f
>$ ./a.out
1 2. 3.
2 2. 3.
3 2. 3.
…
18 2. 3.
19 2. 3.
20 2. 3.
dot product 120.
MPI_Reduce
• Reduces values on all processes to a single value
– Summation, product, max, min, etc.

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)
– sendbuf   choice        I  starting address of send buffer
– recvbuf   choice        O  starting address of receive buffer; type is defined by "datatype"
– count     int           I  number of elements in send/receive buffer
– datatype  MPI_Datatype  I  data type of elements of send/receive buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
– op        MPI_Op        I  reduce operation
      MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND etc.
      Users can define operations with MPI_OP_CREATE.
– root      int           I  rank of the root process
– comm      MPI_Comm      I  communicator

(Diagram: each process P#k holds Ak, Bk, Ck, Dk; the root P#0 receives op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3.)
Send/Receive Buffers (Sending/Receiving)
• Arrays serving as a "send (sending) buffer" and a "receive (receiving) buffer" often appear in MPI.
• The addresses of the "send buffer" and the "receive buffer" must be different.
Example of MPI_Reduce (1/2)

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

Global max value of X0 goes to X1 on process #0:
double X0, X1;
MPI_Reduce(&X0, &X1, 1, MPI_DOUBLE, MPI_MAX, 0, <comm>);

Global max values of X0[i] go to XMAX[i] on process #0 (i=0-3):
double X0[4], XMAX[4];
MPI_Reduce(X0, XMAX, 4, MPI_DOUBLE, MPI_MAX, 0, <comm>);
Example of MPI_Reduce (2/2)

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

Global summation of X0 goes to XSUM on process #0:
double X0, XSUM;
MPI_Reduce(&X0, &XSUM, 1, MPI_DOUBLE, MPI_SUM, 0, <comm>);

double X0[4];
MPI_Reduce(&X0[0], &X0[2], 2, MPI_DOUBLE, MPI_SUM, 0, <comm>);
• Global summation of X0[0] goes to X0[2] on process #0.
• Global summation of X0[1] goes to X0[3] on process #0.
MPI_Bcast
• Broadcasts a message from the process with rank "root" to all other processes of the communicator.

MPI_Bcast (buffer, count, datatype, root, comm)
– buffer    choice        I/O  starting address of buffer; type is defined by "datatype"
– count     int           I    number of elements in the buffer
– datatype  MPI_Datatype  I    data type of elements of the buffer
      FORTRAN: MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
      C: MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
– root      int           I    rank of the root process
– comm      MPI_Comm      I    communicator

(Diagram: P#0 holds A0, B0, C0, D0; after the broadcast, every process holds A0, B0, C0, D0.)
MPI_Allreduce
• MPI_Reduce + MPI_Bcast
• Summations (of dot products) and MAX/MIN values are typically needed on every process.

call MPI_Allreduce (sendbuf, recvbuf, count, datatype, op, comm)
– sendbuf   choice        I  starting address of send buffer
– recvbuf   choice        O  starting address of receive buffer; type is defined by "datatype"
– count     int           I  number of elements in send/receive buffer
– datatype  MPI_Datatype  I  data type of elements of send/receive buffer
– op        MPI_Op        I  reduce operation
– comm      MPI_Comm      I  communicator

(Diagram: each process P#k holds Ak, Bk, Ck, Dk; afterwards, every process holds op.A0-A3, op.B0-B3, op.C0-C3, op.D0-D3.)
"op" of MPI_Reduce/Allreduce

MPI_Reduce (sendbuf, recvbuf, count, datatype, op, root, comm)

• MPI_MAX, MPI_MIN: max, min
• MPI_SUM, MPI_PROD: summation, product
• MPI_LAND: logical AND
Local Data (1/2)
• Decompose the vector with length=20 into 4 domains (processes).
• Each process handles a vector with length=5.

VECp[ 0]= 2      VECs[ 0]= 3
    [ 1]= 2          [ 1]= 3
    [ 2]= 2          [ 2]= 3
     …                …
    [17]= 2          [17]= 3
    [18]= 2          [18]= 3
    [19]= 2          [19]= 3
7676
Local Data (2/2)
•1st-5th components of the original global vector go to the 1st-5th components of PE#0, 6th-10th -> PE#1, 11th-15th -> PE#2, 16th-20th -> PE#3.

PE#0: VECp[ 0]~VECp[ 4], VECs[ 0]~VECs[ 4]
PE#1: VECp[ 5]~VECp[ 9], VECs[ 5]~VECs[ 9]
PE#2: VECp[10]~VECp[14], VECs[10]~VECs[14]
PE#3: VECp[15]~VECp[19], VECs[15]~VECs[19]

On each process, the local arrays are:
VECp[0]= 2      VECs[0]= 3
    [1]= 2          [1]= 3
    [2]= 2          [2]= 3
    [3]= 2          [3]= 3
    [4]= 2          [4]= 3
But ...
•It is too easy !! Just decomposing and renumbering from 1 (or 0).
•Of course, this is not enough. Further examples will be shown in the latter part.

[Figure: global vector Vg[0]~Vg[19] mapped to local vectors Vl[0]~Vl[4] on PE#0, PE#1, PE#2, PE#3]
Example: Dot Product (1/3)
<$O-S1>/allreduce.c

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){
	int i, N;
	int PeTot, MyRank;
	double VECp[5], VECs[5];
	double sumA, sumR, sum0;

	MPI_Init(&argc, &argv);
	MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
	MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

	sumA= 0.0;
	sumR= 0.0;
	N=5;
	for(i=0;i<N;i++){
		VECp[i]= 2.0;
		VECs[i]= 3.0;
	}

	sum0= 0.0;
	for(i=0;i<N;i++){
		sum0 += VECp[i] * VECs[i];
	}

Local vector is generated at each local process.
Example: Dot Product (2/3)
<$O-S1>/allreduce.c

	MPI_Reduce(&sum0, &sumR, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
	MPI_Allreduce(&sum0, &sumA, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
	printf("before BCAST %5d %15.0F %15.0F\n", MyRank, sumA, sumR);

	MPI_Bcast(&sumR, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
	printf("after  BCAST %5d %15.0F %15.0F\n", MyRank, sumA, sumR);

	MPI_Finalize();
	return 0;
}
Example: Dot Product (3/3)
<$O-S1>/allreduce.c

	MPI_Reduce(&sum0, &sumR, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
	MPI_Allreduce(&sum0, &sumA, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

Dot product: summation of results of each process (sum0).
"sumR" has its value only on PE#0 after MPI_Reduce.
"sumA" has its value on all processes by MPI_Allreduce.

	MPI_Bcast(&sumR, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

"sumR" has its value on PE#1-#3 after MPI_Bcast.
Execute <$O-S1>/allreduce.f/c

$> mpifccpx -Kfast allreduce.c
$> mpifrtpx -Kfast allreduce.f
(modify go4.sh, 4 processes)
$> pjsub go4.sh

(my_rank, sumALLREDUCE, sumREDUCE)
before BCAST  0  1.200000E+02  1.200000E+02
after  BCAST  0  1.200000E+02  1.200000E+02
before BCAST  1  1.200000E+02  0.000000E+00
after  BCAST  1  1.200000E+02  1.200000E+02
before BCAST  3  1.200000E+02  0.000000E+00
after  BCAST  3  1.200000E+02  1.200000E+02
before BCAST  2  1.200000E+02  0.000000E+00
after  BCAST  2  1.200000E+02  1.200000E+02
Examples by Collective Comm.
•Dot Products of Vectors
•Scatter/Gather
•Reading Distributed Files
•MPI_Allgatherv
Global/Local Data (1/3)
•Parallelization of an easy process where a real number is added to each component of real vector VECg:

do i= 1, NG
  VECg(i)= VECg(i) + ALPHA
enddo

for (i=0; i<NG; i++){
  VECg[i]= VECg[i] + ALPHA
}
Global/Local Data (2/3)
•Configuration
–NG= 32 (length of the vector)
–ALPHA= 1000.
–Process # of MPI= 4
•Vector VECg has the following 32 components (<$T-S1>/a1x.all):

(101.0, 103.0, 105.0, 106.0, 109.0, 111.0, 121.0, 151.0,
 201.0, 203.0, 205.0, 206.0, 209.0, 211.0, 221.0, 251.0,
 301.0, 303.0, 305.0, 306.0, 309.0, 311.0, 321.0, 351.0,
 401.0, 403.0, 405.0, 406.0, 409.0, 411.0, 421.0, 451.0)
Global/Local Data (3/3)
•Procedure
① Reading vector VECg with length=32 from one process (e.g. 0th process)
  – Global Data
② Distributing vector components to 4 MPI processes equally (i.e. length=8 for each process)
  – Local Data, Local ID/Numbering
③ Adding ALPHA to each component of the local vector (with length= 8) on each process.
④ Merging the results to global vector with length= 32.

•Actually, we do not need parallel computers for such a kind of small computation.
Operations of Scatter/Gather (1/8)
Reading VECg (length=32) from a process (e.g. #0)

•Reading global data from #0 process

Fortran:
 include 'mpif.h'
 integer, parameter :: NG= 32
 real(kind=8), dimension(NG):: VECg

 call MPI_INIT (ierr)
 call MPI_COMM_SIZE (<comm>, PETOT , ierr)
 call MPI_COMM_RANK (<comm>, my_rank, ierr)

 if (my_rank.eq.0) then
   open (21, file= 'a1x.all', status= 'unknown')
   do i= 1, NG
     read (21,*) VECg(i)
   enddo
   close (21)
 endif

C:
#include <mpi.h>
#include <stdio.h>
#include <math.h>
#include <assert.h>

int main(int argc, char **argv){
	int i, NG=32;
	int PeTot, MyRank;
	double VECg[32];
	char filename[80];
	FILE *fp;

	MPI_Init(&argc, &argv);
	MPI_Comm_size(<comm>, &PeTot);
	MPI_Comm_rank(<comm>, &MyRank);

	fp = fopen("a1x.all", "r");
	if(!MyRank) for(i=0;i<NG;i++){
		fscanf(fp, "%lf", &VECg[i]);
	}
Operations of Scatter/Gather (2/8)
Distributing global data to 4 processes equally (i.e. length=8 for each process)

•MPI_Scatter
MPI_Scatter
•Sends data from one process to all other processes in a communicator
–scount-size messages are sent to each process

•MPI_Scatter (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
–sendbuf   choice       I  starting address of sending buffer
                           type is defined by "datatype"
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
           FORTRAN  MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_CHARACTER etc.
           C        MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR etc.
–recvbuf   choice       O  starting address of receiving buffer
–rcount    int          I  number of elements received from the root process
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–root      int          I  rank of root process
–comm      MPI_Comm     I  communicator

[Figure: Scatter — root P#0 holds (A0, B0, C0, D0); after Scatter, P#0 gets A0, P#1 gets B0, P#2 gets C0, P#3 gets D0. Gather is the inverse operation.]
MPI_Scatter (cont.)
•MPI_Scatter (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
–sendbuf   choice       I  starting address of sending buffer
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcount    int          I  number of elements received from the root process
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–root      int          I  rank of root process
–comm      MPI_Comm     I  communicator

•Usually
–scount = rcount
–sendtype= recvtype

•This function sends scount components starting from sendbuf (sending buffer) at process #root to each process in comm. Each process receives rcount components starting from recvbuf (receiving buffer).

[Figure: Scatter/Gather between P#0-P#3, as on the previous page]
Operations of Scatter/Gather (3/8)
Distributing global data to 4 processes equally

•Allocating receiving buffer VEC (length=8) at each process.
•8 components sent from sending buffer VECg of process #0 are received at each process #0-#3 as 1st-8th components of receiving buffer VEC.

call MPI_SCATTER (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm, ierr)

Fortran:
 integer, parameter :: N = 8
 real(kind=8), dimension(N) :: VEC
 ...
 call MPI_Scatter                     &
      (VECg, N, MPI_DOUBLE_PRECISION, &
       VEC , N, MPI_DOUBLE_PRECISION, &
       0, <comm>, ierr)

C:
 int N=8;
 double VEC[8];
 ...
 MPI_Scatter
      (VECg, N, MPI_DOUBLE,
       VEC , N, MPI_DOUBLE, 0, <comm>);
Operations of Scatter/Gather (4/8)
Distributing global data to 4 processes equally

•8 components are scattered to each process from root (#0)
•1st-8th components of VECg are stored as 1st-8th ones of VEC at #0, 9th-16th components of VECg are stored as 1st-8th ones of VEC at #1, etc.
–VECg: Global Data, VEC: Local Data

[Figure: VECg (sendbuf, global data on root PE#0) split into four blocks of 8, received into VEC (recvbuf, local data) on PE#0-PE#3]
Operations of Scatter/Gather (5/8)
Distributing global data to 4 processes equally

•Global Data: 1st-32nd components of VECg at #0
•Local Data: 1st-8th components of VEC at each process
•Each component of VEC can be written from each process in the following way:

do i= 1, N
  write (*,'(a, 2i8,f10.0)') 'before', my_rank, i, VEC(i)
enddo

for(i=0;i<N;i++){
  printf("before %5d %5d %10.0F\n", MyRank, i+1, VEC[i]);}
Operations of Scatter/Gather (5/8)
Distributing global data to 4 processes equally

•Global Data: 1st-32nd components of VECg at #0
•Local Data: 1st-8th components of VEC at each process
•Each component of VEC can be written from each process in the following way:

PE#0               PE#1               PE#2               PE#3
before 0 1 101.    before 1 1 201.    before 2 1 301.    before 3 1 401.
before 0 2 103.    before 1 2 203.    before 2 2 303.    before 3 2 403.
before 0 3 105.    before 1 3 205.    before 2 3 305.    before 3 3 405.
before 0 4 106.    before 1 4 206.    before 2 4 306.    before 3 4 406.
before 0 5 109.    before 1 5 209.    before 2 5 309.    before 3 5 409.
before 0 6 111.    before 1 6 211.    before 2 6 311.    before 3 6 411.
before 0 7 121.    before 1 7 221.    before 2 7 321.    before 3 7 421.
before 0 8 151.    before 1 8 251.    before 2 8 351.    before 3 8 451.
Operations of Scatter/Gather (6/8)
On each process, ALPHA is added to each of 8 components of VEC

•On each process, computation is in the following way:

real(kind=8), parameter :: ALPHA= 1000.
do i= 1, N
  VEC(i)= VEC(i) + ALPHA
enddo

double ALPHA=1000.;
...
for(i=0;i<N;i++){
  VEC[i]= VEC[i] + ALPHA;}

•Results:

PE#0               PE#1               PE#2               PE#3
after 0 1 1101.    after 1 1 1201.    after 2 1 1301.    after 3 1 1401.
after 0 2 1103.    after 1 2 1203.    after 2 2 1303.    after 3 2 1403.
after 0 3 1105.    after 1 3 1205.    after 2 3 1305.    after 3 3 1405.
after 0 4 1106.    after 1 4 1206.    after 2 4 1306.    after 3 4 1406.
after 0 5 1109.    after 1 5 1209.    after 2 5 1309.    after 3 5 1409.
after 0 6 1111.    after 1 6 1211.    after 2 6 1311.    after 3 6 1411.
after 0 7 1121.    after 1 7 1221.    after 2 7 1321.    after 3 7 1421.
after 0 8 1151.    after 1 8 1251.    after 2 8 1351.    after 3 8 1451.
Operations of Scatter/Gather (7/8)
Merging the results to global vector with length= 32

•Using MPI_Gather (inverse operation to MPI_Scatter)
MPI_Gather
•Gathers together values from a group of processes, inverse operation to MPI_Scatter

•MPI_Gather (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, root, comm)
–sendbuf   choice       I  starting address of sending buffer
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcount    int          I  number of elements received from each process
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–root      int          I  rank of root process
–comm      MPI_Comm     I  communicator

•recvbuf is on root process.

[Figure: Scatter/Gather between P#0-P#3, as on the previous pages]
Operations of Scatter/Gather (8/8)
Merging the results to global vector with length= 32

•Each process sends 8 components of VEC to VECg on root (#0 in this case).
•8 components are gathered from each process to the root process.

[Figure: VEC (sendbuf, local data) on PE#0-PE#3, each of length 8, gathered into VECg (recvbuf, global data) on root PE#0]

call MPI_Gather                      &
     (VEC , N, MPI_DOUBLE_PRECISION, &
      VECg, N, MPI_DOUBLE_PRECISION, &
      0, <comm>, ierr)

MPI_Gather
     (VEC , N, MPI_DOUBLE,
      VECg, N, MPI_DOUBLE, 0, <comm>);
<$O-S1>/scatter-gather.f/c example

$> mpifccpx -Kfast scatter-gather.c
$> mpifrtpx -Kfast scatter-gather.f
$> (exec. 4 proc's) go4.sh

(Output: the same "before" listing as on page 93 — VEC components 101.~151., 201.~251., 301.~351., 401.~451. on PE#0-PE#3 — followed by the same "after" listing as on page 94, with ALPHA=1000. added to every component.)
MPI_Reduce_scatter
•MPI_Reduce + MPI_Scatter

•MPI_Reduce_scatter (sendbuf, recvbuf, rcounts, datatype, op, comm)
–sendbuf   choice       I  starting address of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcounts   int          I  integer array specifying the number of elements in result distributed to each process. Array must be identical on all calling processes.
–datatype  MPI_Datatype I  datatype of elements of sending/receiving buffer
–op        MPI_Op       I  reduce operation
           MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND etc.
–comm      MPI_Comm     I  communicator

[Figure: Reduce scatter — each process P#0-P#3 starts with (Ai, Bi, Ci, Di); after the operation P#0 holds op.A0-A3, P#1 holds op.B0-B3, P#2 holds op.C0-C3, P#3 holds op.D0-D3]
MPI_Allgather
•MPI_Gather + MPI_Bcast
–Gathers data from all tasks and distributes the combined data to all tasks

•MPI_Allgather (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, comm)
–sendbuf   choice       I  starting address of sending buffer
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcount    int          I  number of elements received from each process
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–comm      MPI_Comm     I  communicator

[Figure: Allgather — P#0-P#3 each start with one block (A0, B0, C0, D0 respectively); after the operation every process holds (A0, B0, C0, D0)]
MPI_Alltoall
•Sends data from all to all processes: transformation of dense matrix

•MPI_Alltoall (sendbuf, scount, sendtype, recvbuf, rcount, recvtype, comm)
–sendbuf   choice       I  starting address of sending buffer
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcount    int          I  number of elements received from each process
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–comm      MPI_Comm     I  communicator

[Figure: All-to-All — before: P#0 holds (A0, A1, A2, A3), P#1 holds (B0, B1, B2, B3), etc.; after: P#0 holds (A0, B0, C0, D0), P#1 holds (A1, B1, C1, D1), etc. — a block transpose across processes]
Examples by Collective Comm.
•Dot Products of Vectors
•Scatter/Gather
•Reading Distributed Files
•MPI_Allgatherv
Operations of Distributed Local Files
•In the Scatter/Gather example, PE#0 reads global data, that is scattered to each processor, then parallel operations are done.
•If the problem size is very large, a single processor may not be able to read the entire global data.
–If the entire global data is decomposed into distributed local data sets, each process can read the local data.
–If global operations are needed for a certain set of vectors, MPI functions, such as MPI_Gather etc., are available.
Reading Distributed Local Files:
Uniform Vec. Length (1/2)

>$ cd <$O-S1>
>$ ls a1.*
   a1.0 a1.1 a1.2 a1.3      a1x.all is decomposed to 4 files.

>$ mpifccpx -Kfast file.c
>$ mpifrtpx -Kfast file.f
(modify go4.sh for 4 processes)
>$ pjsub go4.sh
Operations of Distributed Local Files
•Local files a1.0~a1.3 are originally from global file a1x.all.

[Figure: a1x.all split into a1.0, a1.1, a1.2, a1.3]
Reading Distributed Local Files:
Uniform Vec. Length (2/2)
<$O-S1>/file.c

int main(int argc, char **argv){
	int i;
	int PeTot, MyRank;
	MPI_Comm SolverComm;
	double vec[8];
	char FileName[80];
	FILE *fp;

	MPI_Init(&argc, &argv);
	MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
	MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

	sprintf(FileName, "a1.%d", MyRank);      /* similar to "Hello" */
	fp = fopen(FileName, "r");
	if(fp == NULL) MPI_Abort(MPI_COMM_WORLD, -1);

	for(i=0;i<8;i++){
		fscanf(fp, "%lf", &vec[i]); }        /* local ID is 0-7 */

	for(i=0;i<8;i++){
		printf("%5d%5d%10.0f\n", MyRank, i+1, vec[i]);
	}
	MPI_Finalize();
	return 0;
}
Typical SPMD Operation

mpirun -np 4 a.out

PE#0: "a.out" reads "a1.0"
PE#1: "a.out" reads "a1.1"
PE#2: "a.out" reads "a1.2"
PE#3: "a.out" reads "a1.3"
Non-Uniform Vector Length (1/2)

>$ cd <$O-S1>
>$ ls a2.*
   a2.0 a2.1 a2.2 a2.3
>$ cat a2.0
5          Number of components at each process
201.0      Components
203.0
205.0
206.0
209.0

>$ mpifccpx -Kfast file2.c
>$ mpifrtpx -Kfast file2.f
(modify go4.sh for 4 processes)
>$ pjsub go4.sh
Non-Uniform Vector Length (2/2)
<$O-S1>/file2.c

int main(int argc, char **argv){
	int i;
	int PeTot, MyRank;
	MPI_Comm SolverComm;
	double *vec, *vec2, *vecg;
	int num;
	double sum0, sum;
	char filename[80];
	FILE *fp;

	MPI_Init(&argc, &argv);
	MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
	MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

	sprintf(filename, "a2.%d", MyRank);
	fp = fopen(filename, "r");
	assert(fp != NULL);

	fscanf(fp, "%d", &num);                /* "num" is different at each process */
	vec = malloc(num * sizeof(double));
	for(i=0;i<num;i++){fscanf(fp, "%lf", &vec[i]);}

	for(i=0;i<num;i++){
		printf(" %5d%5d%5d%10.0f\n", MyRank, i+1, num, vec[i]);}

	MPI_Finalize();
}
How to generate local data
•Reading global data (N=NG)
–Scattering to each process
–Parallel processing on each process
–(If needed) reconstruction of global data by gathering local data

•Generating local data (N=NL), or reading distributed local data
–Generating or reading local data on each process
–Parallel processing on each process
–(If needed) reconstruction of global data by gathering local data

•In future, the latter case is more important, but the former case is also introduced in this class for understanding of operations of global/local data.
Examples by Collective Comm.
•Dot Products of Vectors
•Scatter/Gather
•Reading Distributed Files
•MPI_Allgatherv
MPI_Gatherv, MPI_Scatterv
•MPI_Gather, MPI_Scatter
–Length of message from/to each process is uniform

•MPI_XXXv extends functionality of MPI_XXX by allowing a varying count of data from each process:
–MPI_Gatherv
–MPI_Scatterv
–MPI_Allgatherv
–MPI_Alltoallv
MPI_Allgatherv
•Variable count version of MPI_Allgather
–creates "global data" from "local data"

•MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)
–sendbuf   choice       I  starting address of sending buffer
–scount    int          I  number of elements sent to each process
–sendtype  MPI_Datatype I  datatype of elements of sending buffer
–recvbuf   choice       O  starting address of receiving buffer
–rcounts   int          I  integer array (of length group size) containing the number of elements that are to be received from each process (array: size=PETOT)
–displs    int          I  integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i (array: size=PETOT+1)
–recvtype  MPI_Datatype I  datatype of elements of receiving buffer
–comm      MPI_Comm     I  communicator
MPI_Allgatherv (cont.)
•MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)
–rcounts   int  I  integer array (of length group size) containing the number of elements that are to be received from each process (array: size=PETOT)
–displs    int  I  integer array (of length group size). Entry i specifies the displacement (relative to recvbuf) at which to place the incoming data from process i (array: size=PETOT+1)
–These two arrays are related to the size of the final "global data", therefore each process requires the information of these arrays (rcounts, displs)
•Each process must have the same values for all components of both vectors
–Usually, stride(i)=rcounts(i)

[Figure: recvbuf layout — blocks of size rcounts[0] … rcounts[m-1] for PE#0 … PE#(m-1);
 displs[0]=0, displs[i+1]= displs[i] + stride[i], size[recvbuf]= displs[PETOT]= sum[stride]]
What MPI_Allgatherv is doing
Generating global data from local data

[Figure: local data (sendbuf, length N on each of PE#0-PE#3) copied into global data (recvbuf) at offsets displs[0]~displs[4]; block i has size rcounts[i]= stride[i]]
What MPI_Allgatherv is doing
Generating global data from local data

[Figure: as on the previous page, with stride[i]= rcounts[i] for each block]
MPI_Allgatherv in detail (1/2)
•MPI_Allgatherv (sendbuf, scount, sendtype, recvbuf, rcounts, displs, recvtype, comm)

•rcounts
–Size of message from each PE: size of local data (length of local vector)
•displs
–Address/index of each local data in the vector of global data
–displs(PETOT+1)= size of entire global data (global vector)

[Figure: recvbuf layout with rcounts/displs/stride, as on the previous pages]
MPI_Allgatherv in detail (2/2)
•Each process needs the information of rcounts & displs
–"rcounts" can be created by gathering local vector length "N" from each process.
–On each process, "displs" can be generated from "rcounts" on each process.
  •stride[i]= rcounts[i]
–Size of "recvbuf" is calculated by summation of "rcounts".

[Figure: recvbuf layout with rcounts/displs/stride, as on the previous pages]
Preparation for MPI_Allgatherv
<$O-S1>/agv.c

•Generating global vector from "a2.0"~"a2.3".
•Length of each vector is 8, 5, 7, and 3, respectively. Therefore, size of final global vector is 23 (= 8+5+7+3).
a2.0~a2.3

PE#0      PE#1      PE#2      PE#3
8         5         7         3
101.0     201.0     301.0     401.0
103.0     203.0     303.0     403.0
105.0     205.0     305.0     405.0
106.0     206.0     306.0
109.0     209.0     311.0
111.0               321.0
121.0               351.0
151.0
Preparation: MPI_Allgatherv (1/4)
<$O-S1>/agv.c

int main(int argc, char **argv){
	int i;
	int PeTot, MyRank;
	MPI_Comm SolverComm;
	double *vec, *vec2, *vecg;
	int *Rcounts, *Displs;
	int n;
	double sum0, sum;
	char filename[80];
	FILE *fp;

	MPI_Init(&argc, &argv);
	MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
	MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);

	sprintf(filename, "a2.%d", MyRank);
	fp = fopen(filename, "r");
	assert(fp != NULL);

	fscanf(fp, "%d", &n);                /* n(NL) is different at each process */
	vec = malloc(n * sizeof(double));
	for(i=0;i<n;i++){
		fscanf(fp, "%lf", &vec[i]);
	}
Preparation: MPI_Allgatherv (2/4)
<$O-S1>/agv.c

	Rcounts= calloc(PeTot, sizeof(int));
	Displs = calloc(PeTot+1, sizeof(int));

	printf("before %d %d", MyRank, n);
	for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

	MPI_Allgather(&n, 1, MPI_INT, Rcounts, 1, MPI_INT, MPI_COMM_WORLD);

	printf("after  %d %d", MyRank, n);
	for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

	Displs[0] = 0;

Rcounts on each PE:
PE#0 N=8, PE#1 N=5, PE#2 N=7, PE#3 N=3
→ MPI_Allgather → Rcounts[0:3]= {8, 5, 7, 3} on every PE
Preparation: MPI_Allgatherv (2/4)
<$O-S1>/agv.c

	Rcounts= calloc(PeTot, sizeof(int));
	Displs = calloc(PeTot+1, sizeof(int));

	printf("before %d %d", MyRank, n);
	for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

	MPI_Allgather(&n, 1, MPI_INT, Rcounts, 1, MPI_INT, MPI_COMM_WORLD);

	printf("after  %d %d", MyRank, n);
	for(i=0;i<PeTot;i++){printf(" %d", Rcounts[i]);}

	Displs[0] = 0;
	for(i=0;i<PeTot;i++){
		Displs[i+1] = Displs[i] + Rcounts[i];}   /* Displs on each PE */

	printf("Displs %d ", MyRank);
	for(i=0;i<PeTot+1;i++){
		printf(" %d", Displs[i]);
	}
	MPI_Finalize();
	return 0;
}
Preparation: MPI_Allgatherv (3/4)

> cd <$O-S1>
> mpifccpx -Kfast agv.c
(modify go4.sh for 4 processes)
> pjsub go4.sh

before 0 8 0 0 0 0
after  0 8 8 5 7 3
Displs 0 0 8 13 20 23

before 1 5 0 0 0 0
after  1 5 8 5 7 3
Displs 1 0 8 13 20 23

before 3 3 0 0 0 0
after  3 3 8 5 7 3
Displs 3 0 8 13 20 23

before 2 7 0 0 0 0
after  2 7 8 5 7 3
Displs 2 0 8 13 20 23
Preparation: MPI_Allgatherv (4/4)
•Only "recvbuf" is not defined yet.
•Size of "recvbuf" = "Displs[PETOT]"

	MPI_Allgatherv ( VEC , N, MPI_DOUBLE,
	                 recvbuf, Rcounts, Displs, MPI_DOUBLE,
	                 MPI_COMM_WORLD);
Report S1 (1/2)
•Deadline: 17:00 October 11th (Sat), 2014.
–Send files via e-mail at nakajima(at)cc.u-tokyo.ac.jp

•Problem S1-1
–Read local files <$O-S1>/a1.0~a1.3, <$O-S1>/a2.0~a2.3.
–Develop codes which calculate norm ||x|| of global vector for each case.
  •<$O-S1>/file.c, <$O-S1>/file2.c

•Problem S1-2
–Read local files <$O-S1>/a2.0~a2.3.
–Develop a code which constructs "global vector" using MPI_Allgatherv.
Report S1 (2/2)
•Problem S1-3
–Develop parallel program which calculates the following numerical integration using "trapezoidal rule" by MPI_Reduce, MPI_Bcast etc.
–Measure computation time, and parallel performance

    ∫₀¹ 4/(1+x²) dx

•Report
–Cover Page: Name, ID, and Problem ID (S1) must be written.
–Less than two pages including figures and tables (A4) for each of three sub-problems
  •Strategy, Structure of the Program, Remarks
–Source list of the program (if you have bugs)
–Output list (as small as possible)
•What is MPI ?
•Your First MPI Program: Hello World
•Global/Local Data
•Collective Communication
•Peer-to-Peer Communication
Peer-to-Peer Communication
Point-to-Point Communication (one-to-one communication)

•What is P2P Communication ?
•2D Problem, Generalized Communication Table
•Report S2
1D FEM: 12 nodes/11 elem's/3 domains

[Figure: a 1D mesh with global nodes 0-11 and elements 0-10, partitioned into 3 domains: nodes 0-4 / 3-8 / 7-11]
1D FEM: 12 nodes/11 elem's/3 domains
Local ID: Starting from 0 for node and elem at each domain

[Figure: domains #0, #1, #2 with local node and element IDs renumbered from 0 in each domain]
1D FEM: 12 nodes/11 elem's/3 domains
Internal/External Nodes

[Figure: the same partition; each domain holds its internal nodes plus external (halo) nodes shared with the neighboring domains]
Preconditioned Conjugate Gradient Method (CG)

Compute r(0)= b-[A]x(0)
for i= 1, 2, …
  solve [M]z(i-1)= r(i-1)
  ρ(i-1)= r(i-1) z(i-1)
  if i=1
    p(1)= z(0)
  else
    β(i-1)= ρ(i-1)/ρ(i-2)
    p(i)= z(i-1) + β(i-1) p(i-1)
  endif
  q(i)= [A]p(i)
  α(i)= ρ(i-1)/p(i)q(i)
  x(i)= x(i-1) + α(i)p(i)
  r(i)= r(i-1) - α(i)q(i)
  check convergence |r|
end

Preconditioner: Diagonal Scaling (Point-Jacobi Preconditioning)
[M]= diag(D1, D2, …, DN): diagonal components of [A]
Preconditioning, DAXPY
Local Operations by Only Internal Points: Parallel Processing is possible

/*
//-- {x}= {x} + ALPHA*{p}    DAXPY: double a{x} plus {y}
//   {r}= {r} - ALPHA*{q}
*/
for(i=0;i<N;i++){
	PHI[i] += Alpha * W[P][i];
	W[R][i] -= Alpha * W[Q][i];
}

/*
//-- {z}= [Minv]{r}
*/
for(i=0;i<N;i++){
	W[Z][i] = W[DD][i] * W[R][i];
}
Dot Products
Global Summation needed: Communication ?

/*
//-- ALPHA= RHO / {p}{q}
*/
C1 = 0.0;
for(i=0;i<N;i++){
	C1 += W[P][i] * W[Q][i];
}
Alpha = Rho / C1;
Matrix-Vector Products
Values at External Points: P-to-P Communication

/*
//-- {q}= [A]{p}
*/
for(i=0;i<N;i++){
	W[Q][i] = Diag[i] * W[P][i];
	for(j=Index[i];j<Index[i+1];j++){
		W[Q][i] += AMat[j]*W[P][Item[j]];
	}
}
Mat-Vec Products: Local Op. Possible

[Figures (4 slides): the 12x12 matrix-vector product decomposed into 4x12 blocks, one per process; the operation becomes purely local only if each process also holds the values of {p} at its external points (e.g. nodes 4 and 5 for domain #1)]
Wha
t is
Peer
-to-P
eer C
omm
unic
atio
n ?
•C
olle
ctiv
e C
omm
unic
atio
n–
MP
I_R
educ
e, M
PI_
Sca
tter/G
athe
r etc
.–
Com
mun
icat
ions
with
all
proc
esse
s in
the
com
mun
icat
or–
App
licat
ion
Are
a•
BE
M, S
pect
ral M
etho
d, M
D: g
loba
l int
erac
tions
are
con
side
red
•D
ot p
rodu
cts,
MA
X/M
IN: G
loba
l Sum
mat
ion
& C
ompa
rison
•P
eer-
toP
eer/P
oint
-to-P
oint
–M
PI_
Sen
d, M
PI_
Rec
eive
–C
omm
unic
atio
n w
ith li
mite
d pr
oces
ses
•N
eigh
bors
–A
pplic
atio
n A
rea
•FE
M, F
DM
: Loc
aliz
ed M
etho
d
01
23
4
40
12
3
12
3
30
12
0
40
12
35
30
12
4
#0
#1
#2
Collective/P2P Communications

Interactions with only Neighboring Processes/Elements
Finite Difference Method (FDM), Finite Element Method (FEM)
When do we need P2P comm.: 1D-FEM

Info in neighboring domains is required for FEM operations
Matrix assembling, Iterative Method
[Figure: 1D elements partitioned into domains #0, #1, #2 with shared nodes at the domain boundaries]
Method for P2P Comm.

• MPI_Send, MPI_Recv
• These are "blocking" functions. "Dead lock" occurs for these "blocking" functions.
• A "blocking" MPI call means that the program execution will be suspended until the message buffer is safe to use.
• The MPI standards specify that a blocking SEND or RECV does not return until the send buffer is safe to reuse (for MPI_Send), or the receive buffer is ready to use (for MPI_Recv).
  – Blocking comm. confirms "secure" communication, but it is very inconvenient.
• Please just remember that "there are such functions".
MPI_Send/MPI_Recv

if (my_rank.eq.0) NEIB_ID=1
if (my_rank.eq.1) NEIB_ID=0
…
call MPI_SEND (NEIB_ID, arg's)
call MPI_RECV (NEIB_ID, arg's)
…
• This seems reasonable, but it stops at MPI_Send/MPI_Recv.
  – Sometimes it works (according to implementation).
[Figure: PE#0 and PE#1 exchanging boundary values]
MPI_Send/MPI_Recv (cont.)

if (my_rank.eq.0) NEIB_ID=1
if (my_rank.eq.1) NEIB_ID=0
…
if (my_rank.eq.0) then
  call MPI_SEND (NEIB_ID, arg's)
  call MPI_RECV (NEIB_ID, arg's)
endif
if (my_rank.eq.1) then
  call MPI_RECV (NEIB_ID, arg's)
  call MPI_SEND (NEIB_ID, arg's)
endif
…
• It works ... but
How to do P2P Comm. ?

• Using "non-blocking" functions MPI_Isend & MPI_Irecv together with MPI_Waitall for synchronization
• MPI_Sendrecv is also available.

if (my_rank.eq.0) NEIB_ID=1
if (my_rank.eq.1) NEIB_ID=0
…
call MPI_Isend (NEIB_ID, arg's)
call MPI_Irecv (NEIB_ID, arg's)
…
call MPI_Waitall (for Irecv)
…
call MPI_Waitall (for Isend)

A single MPI_Waitall for both of MPI_Isend/MPI_Irecv is possible.
MPI_Isend

• Begins a non-blocking send
  – Sends the contents of the sending buffer (starting from sendbuf, number of messages: count) to dest with tag.
  – Contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall.

• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  – sendbuf   choice        I  starting address of sending buffer
  – count     int           I  number of elements in sending buffer
  – datatype  MPI_Datatype  I  datatype of each sending buffer element
  – dest      int           I  rank of destination
  – tag       int           I  message tag. This integer can be used by the
                               application to distinguish messages. Communication
                               occurs if tags of MPI_Isend and MPI_Irecv are
                               matched. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I  communicator
  – request   MPI_Request   O  communication request array used in MPI_Waitall
Communication Request: request

• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  – request   MPI_Request   O  communication request used in MPI_Waitall.
    Size of the array is the total number of neighboring processes.
• Just define the array.
MPI_Irecv

• Begins a non-blocking receive
  – Receives the contents of the receiving buffer (starting from recvbuf, number of messages: count) from source with tag.
  – Contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.

• MPI_Irecv (recvbuf,count,datatype,source,tag,comm,request)
  – recvbuf   choice        I  starting address of receiving buffer
  – count     int           I  number of elements in receiving buffer
  – datatype  MPI_Datatype  I  datatype of each receiving buffer element
  – source    int           I  rank of source
  – tag       int           I  message tag. Usually tag is set to "0" (in this class).
  – comm      MPI_Comm      I  communicator
  – request   MPI_Request   O  communication request array used in MPI_Waitall
MPI_Waitall

• MPI_Waitall blocks until all comm's, associated with request in the array, complete. It is used for synchronizing MPI_Isend and MPI_Irecv in this class.
• At sending phase, contents of the sending buffer cannot be modified before calling the corresponding MPI_Waitall. At receiving phase, contents of the receiving buffer cannot be used before calling the corresponding MPI_Waitall.
• MPI_Isend and MPI_Irecv can be synchronized simultaneously with a single MPI_Waitall if it is consistent.
  – Same request should be used in MPI_Isend and MPI_Irecv.
• Its operation is similar to that of MPI_Barrier, but MPI_Waitall cannot be replaced by MPI_Barrier.
  – Possible troubles using MPI_Barrier instead of MPI_Waitall: contents of request and status are not updated properly, very slow operations etc.

• MPI_Waitall (count,request,status)
  – count     int          I    number of processes to be synchronized
  – request   MPI_Request  I/O  comm. requests used in MPI_Waitall (array size: count)
  – status    MPI_Status   O    array of status objects
                                MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'
Array of status objects: status

• MPI_Waitall (count,request,status)
  – status    MPI_Status   O    array of status objects
                                MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'
• Just define the array.
MPI_Sendrecv

• MPI_Send + MPI_Recv: not recommended, many restrictions

• MPI_Sendrecv (sendbuf,sendcount,sendtype,dest,sendtag,recvbuf,
                recvcount,recvtype,source,recvtag,comm,status)
  – sendbuf    choice        I  starting address of sending buffer
  – sendcount  int           I  number of elements in sending buffer
  – sendtype   MPI_Datatype  I  datatype of each sending buffer element
  – dest       int           I  rank of destination
  – sendtag    int           I  message tag for sending
  – recvbuf    choice        I  starting address of receiving buffer
  – recvcount  int           I  number of elements in receiving buffer
  – recvtype   MPI_Datatype  I  datatype of each receiving buffer element
  – source     int           I  rank of source
  – recvtag    int           I  message tag for receiving
  – comm       MPI_Comm      I  communicator
  – status     MPI_Status    O  array of status objects
                                MPI_STATUS_SIZE: defined in 'mpif.h', 'mpi.h'
Fundamental MPI

RECV: receiving to external nodes
Recv. continuous data to recv. buffer from neighbors

• MPI_Irecv (recvbuf,count,datatype,source,tag,comm,request)
  – recvbuf   choice        I  starting address of receiving buffer
  – count     int           I  number of elements in receiving buffer
  – datatype  MPI_Datatype  I  datatype of each receiving buffer element
  – source    int           I  rank of source
[Figure: four subdomains PE#0-PE#3; external-point values are received from neighbors]
• MPI_Isend (sendbuf,count,datatype,dest,tag,comm,request)
  – sendbuf   choice        I  starting address of sending buffer
  – count     int           I  number of elements in sending buffer
  – datatype  MPI_Datatype  I  datatype of each sending buffer element
  – dest      int           I  rank of destination

Fundamental MPI

SEND: sending from boundary nodes
Send continuous data to send buffer of neighbors
[Figure: four subdomains PE#0-PE#3; boundary-point values are sent to neighbors]
Request, Status in C Language
Special TYPE of Arrays

• MPI_Sendrecv: status
• MPI_Isend:    request
• MPI_Irecv:    request
• MPI_Waitall:  request, status
MPI_Status *StatSend, *StatRecv;
MPI_Request *RequestSend, *RequestRecv;
・・・
StatSend = malloc(sizeof(MPI_Status) * NEIBpetot);
StatRecv = malloc(sizeof(MPI_Status) * NEIBpetot);
RequestSend = malloc(sizeof(MPI_Request) * NEIBpetot);
RequestRecv = malloc(sizeof(MPI_Request) * NEIBpetot);
MPI_Status *Status;
・・・
Status = malloc(sizeof(MPI_Status));
Files on Oakleaf-FX

Fortran
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/F/s2-f.tar .
>$ tar xvf s2-f.tar

C
>$ cd <$O-TOP>
>$ cp /home/z30088/class_eps/C/s2-c.tar .
>$ tar xvf s2-c.tar

Confirm Directory
>$ ls
mpi
>$ cd mpi/S2

This directory is called <$O-S2> in this course.
<$O-S2> = <$O-TOP>/mpi/S2
Ex.1: Send-Recv a Scalar

• Exchange VAL (real, 8-byte) between PE#0 & PE#1

if (my_rank.eq.0) NEIB= 1
if (my_rank.eq.1) NEIB= 0

call MPI_Isend (VAL    ,1,MPI_DOUBLE_PRECISION,NEIB,…,req_send,…)
call MPI_Irecv (VALtemp,1,MPI_DOUBLE_PRECISION,NEIB,…,req_recv,…)
call MPI_Waitall (…,req_recv,stat_recv,…)   Recv. buffer VALtemp can be used
call MPI_Waitall (…,req_send,stat_send,…)   Send buffer VAL can be modified
VAL= VALtemp

if (my_rank.eq.0) NEIB= 1
if (my_rank.eq.1) NEIB= 0

call MPI_Sendrecv (VAL    ,1,MPI_DOUBLE_PRECISION,NEIB,… &
                   VALtemp,1,MPI_DOUBLE_PRECISION,NEIB,…, status,…)
VAL= VALtemp

Name of recv. buffer could be "VAL", but not recommended.
Ex.1: Send-Recv a Scalar (Isend/Irecv/Waitall)
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main(int argc, char **argv){
int neib, MyRank, PeTot;
double VAL, VALx;
MPI_Status *StatSend, *StatRecv;
MPI_Request *RequestSend, *RequestRecv;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
StatSend = malloc(sizeof(MPI_Status) * 1);
StatRecv = malloc(sizeof(MPI_Status) * 1);
RequestSend = malloc(sizeof(MPI_Request) * 1);
RequestRecv = malloc(sizeof(MPI_Request) * 1);
if(MyRank == 0) {neib= 1; VAL= 10.0;}
if(MyRank == 1) {neib= 0; VAL= 11.0;}
MPI_Isend(&VAL , 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &RequestSend[0]);
MPI_Irecv(&VALx, 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, &RequestRecv[0]);
MPI_Waitall(1, RequestRecv, StatRecv);
MPI_Waitall(1, RequestSend, StatSend);
VAL=VALx;
MPI_Finalize();
return 0; }
$> cd <$O-S2>
$> mpifccpx –Kfast ex1-1.c
$> pjsub go2.sh
Ex.1: Send-Recv a Scalar (Sendrecv)
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
int main(int argc, char **argv){
int neib;
int MyRank, PeTot;
double VAL, VALtemp;
MPI_Status *StatSR;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
if(MyRank == 0) {neib= 1; VAL= 10.0;}
if(MyRank == 1) {neib= 0; VAL= 11.0;}
StatSR = malloc(sizeof(MPI_Status));
MPI_Sendrecv(&VAL , 1, MPI_DOUBLE, neib, 0,
&VALtemp, 1, MPI_DOUBLE, neib, 0, MPI_COMM_WORLD, StatSR);
VAL=VALtemp;
MPI_Finalize();
return 0;
}
$> cd <$O-S2>
$> mpifccpx –Kfast ex1-2.c
$> pjsub go2.sh
Ex.2: Send-Recv an Array (1/4)
• Exchange VEC (real, 8-byte) between PE#0 & PE#1
• PE#0 to PE#1
  – PE#0: send VEC(1)-VEC(11) (length=11)
  – PE#1: recv. as VEC(26)-VEC(36) (length=11)
• PE#1 to PE#0
  – PE#1: send VEC(1)-VEC(25) (length=25)
  – PE#0: recv. as VEC(12)-VEC(36) (length=25)
• Practice: Develop a program for this operation.
Practice: t1

• Initial status of VEC[:]:
  – PE#0  VEC[0-35]= 101,102,103,~,135,136
  – PE#1  VEC[0-35]= 201,202,203,~,235,236
• Confirm the results in the next page
• Using following two functions:
  – MPI_Isend/Irecv/Waitall
  – MPI_Sendrecv
t1: Estimated Results
0 #BEFORE# 1 101.
0 #BEFORE# 2 102.
0 #BEFORE# 3 103.
0 #BEFORE# 4 104.
0 #BEFORE# 5 105.
0 #BEFORE# 6 106.
0 #BEFORE# 7 107.
0 #BEFORE# 8 108.
0 #BEFORE# 9 109.
0 #BEFORE# 10 110.
0 #BEFORE# 11 111.
0 #BEFORE# 12 112.
0 #BEFORE# 13 113.
0 #BEFORE# 14 114.
0 #BEFORE# 15 115.
0 #BEFORE# 16 116.
0 #BEFORE# 17 117.
0 #BEFORE# 18 118.
0 #BEFORE# 19 119.
0 #BEFORE# 20 120.
0 #BEFORE# 21 121.
0 #BEFORE# 22 122.
0 #BEFORE# 23 123.
0 #BEFORE# 24 124.
0 #BEFORE# 25 125.
0 #BEFORE# 26 126.
0 #BEFORE# 27 127.
0 #BEFORE# 28 128.
0 #BEFORE# 29 129.
0 #BEFORE# 30 130.
0 #BEFORE# 31 131.
0 #BEFORE# 32 132.
0 #BEFORE# 33 133.
0 #BEFORE# 34 134.
0 #BEFORE# 35 135.
0 #BEFORE# 36 136.
0 #AFTER # 1 101.
0 #AFTER # 2 102.
0 #AFTER # 3 103.
0 #AFTER # 4 104.
0 #AFTER # 5 105.
0 #AFTER # 6 106.
0 #AFTER # 7 107.
0 #AFTER # 8 108.
0 #AFTER # 9 109.
0 #AFTER # 10 110.
0 #AFTER # 11 111.
0 #AFTER # 12 201.
0 #AFTER # 13 202.
0 #AFTER # 14 203.
0 #AFTER # 15 204.
0 #AFTER # 16 205.
0 #AFTER # 17 206.
0 #AFTER # 18 207.
0 #AFTER # 19 208.
0 #AFTER # 20 209.
0 #AFTER # 21 210.
0 #AFTER # 22 211.
0 #AFTER # 23 212.
0 #AFTER # 24 213.
0 #AFTER # 25 214.
0 #AFTER # 26 215.
0 #AFTER # 27 216.
0 #AFTER # 28 217.
0 #AFTER # 29 218.
0 #AFTER # 30 219.
0 #AFTER # 31 220.
0 #AFTER # 32 221.
0 #AFTER # 33 222.
0 #AFTER # 34 223.
0 #AFTER # 35 224.
0 #AFTER # 36 225.
1 #BEFORE# 1 201.
1 #BEFORE# 2 202.
1 #BEFORE# 3 203.
1 #BEFORE# 4 204.
1 #BEFORE# 5 205.
1 #BEFORE# 6 206.
1 #BEFORE# 7 207.
1 #BEFORE# 8 208.
1 #BEFORE# 9 209.
1 #BEFORE# 10 210.
1 #BEFORE# 11 211.
1 #BEFORE# 12 212.
1 #BEFORE# 13 213.
1 #BEFORE# 14 214.
1 #BEFORE# 15 215.
1 #BEFORE# 16 216.
1 #BEFORE# 17 217.
1 #BEFORE# 18 218.
1 #BEFORE# 19 219.
1 #BEFORE# 20 220.
1 #BEFORE# 21 221.
1 #BEFORE# 22 222.
1 #BEFORE# 23 223.
1 #BEFORE# 24 224.
1 #BEFORE# 25 225.
1 #BEFORE# 26 226.
1 #BEFORE# 27 227.
1 #BEFORE# 28 228.
1 #BEFORE# 29 229.
1 #BEFORE# 30 230.
1 #BEFORE# 31 231.
1 #BEFORE# 32 232.
1 #BEFORE# 33 233.
1 #BEFORE# 34 234.
1 #BEFORE# 35 235.
1 #BEFORE# 36 236.
1 #AFTER # 1 201.
1 #AFTER # 2 202.
1 #AFTER # 3 203.
1 #AFTER # 4 204.
1 #AFTER # 5 205.
1 #AFTER # 6 206.
1 #AFTER # 7 207.
1 #AFTER # 8 208.
1 #AFTER # 9 209.
1 #AFTER # 10 210.
1 #AFTER # 11 211.
1 #AFTER # 12 212.
1 #AFTER # 13 213.
1 #AFTER # 14 214.
1 #AFTER # 15 215.
1 #AFTER # 16 216.
1 #AFTER # 17 217.
1 #AFTER # 18 218.
1 #AFTER # 19 219.
1 #AFTER # 20 220.
1 #AFTER # 21 221.
1 #AFTER # 22 222.
1 #AFTER # 23 223.
1 #AFTER # 24 224.
1 #AFTER # 25 225.
1 #AFTER # 26 101.
1 #AFTER # 27 102.
1 #AFTER # 28 103.
1 #AFTER # 29 104.
1 #AFTER # 30 105.
1 #AFTER # 31 106.
1 #AFTER # 32 107.
1 #AFTER # 33 108.
1 #AFTER # 34 109.
1 #AFTER # 35 110.
1 #AFTER # 36 111.
Ex.2: Send-Recv an Array (2/4)

if (my_rank.eq.0) then
  call MPI_Isend (VEC( 1),11,MPI_DOUBLE_PRECISION,1,…,req_send,…)
  call MPI_Irecv (VEC(12),25,MPI_DOUBLE_PRECISION,1,…,req_recv,…)
endif

if (my_rank.eq.1) then
  call MPI_Isend (VEC( 1),25,MPI_DOUBLE_PRECISION,0,…,req_send,…)
  call MPI_Irecv (VEC(26),11,MPI_DOUBLE_PRECISION,0,…,req_recv,…)
endif

call MPI_Waitall (…,req_recv,stat_recv,…)
call MPI_Waitall (…,req_send,stat_send,…)
It works, but the operations are complicated. It does not look like SPMD, and it is not portable.

Ex.2: Send-Recv an Array (3/4)

if (my_rank.eq.0) then
NEIB= 1
start_send= 1
length_send= 11
start_recv= length_send + 1
length_recv= 25
endif
if (my_rank.eq.1) then
NEIB= 0
start_send= 1
length_send= 25
start_recv= length_send + 1
length_recv= 11
endif
call MPI_Isend &
(VEC(start_send),length_send,MPI_DOUBLE_PRECISION,NEIB,…,req_send,…)
call MPI_Irecv &
(VEC(start_recv),length_recv,MPI_DOUBLE_PRECISION,NEIB,…,req_recv,…)
call MPI_Waitall (…,req_recv,stat_recv,…)
call MPI_Waitall (…,req_send,stat_send,…)
This is "SPMD" !!

Ex.2: Send-Recv an Array (4/4)

if (my_rank.eq.0) then
NEIB= 1
start_send= 1
length_send= 11
start_recv= length_send + 1
length_recv= 25
endif
if (my_rank.eq.1) then
NEIB= 0
start_send= 1
length_send= 25
start_recv= length_send + 1
length_recv= 11
endif
call MPI_Sendrecv &
(VEC(start_send),length_send,MPI_DOUBLE_PRECISION,NEIB,… &
VEC(start_recv),length_recv,MPI_DOUBLE_PRECISION,NEIB,…, status,…)
Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• "length_send" of sending process must be equal to "length_recv" of receiving process.
  – PE#0 to PE#1, PE#1 to PE#0
• "sendbuf" and "recvbuf": different addresses
Peer-to-Peer Communication

• What is P2P Communication ?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2
2D FDM (1/5): Entire Mesh

2D FDM (5-point, central difference)

  ∂²φ/∂x² + ∂²φ/∂y² = f

  (φW − 2φC + φE)/Δx² + (φS − 2φC + φN)/Δy² = fC
Decompose into 4 domains
[Figure: 8x8 mesh with global IDs 1-64 decomposed into four 4x4 subdomains]
4 domains: Global ID
[Figure: global mesh IDs of the four subdomains PE#0-PE#3]

4 domains: Local ID
[Figure: each subdomain renumbered with local IDs 1-16]
External Points: Overlapped Region
[Figures: the 5-point stencil (W, C, E, N, S) at a boundary mesh of each subdomain refers to meshes of neighboring subdomains, i.e. the overlapped region]

Local ID of External Points ?
[Figure: how should the external points of each subdomain be numbered locally?]

Overlapped Region
[Figures: overlapped meshes among PE#0-PE#3]
Peer-to-Peer Communication

• What is P2P Communication ?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2
Problem Setting: 2D FDM

• 2D region with 64 meshes (8x8)
• Each mesh has global ID from 1 to 64
  – In this example, this global ID is considered as a dependent variable, such as temperature, pressure etc.
  – Something like computed results
[Figure: 8x8 mesh with global IDs 1-64]
Problem Setting: Distributed Local Data

• 4 sub-domains.
• Info. of external points (global ID of mesh) is received from neighbors.
  – PE#0 receives the marked meshes
[Figure: four subdomains PE#0-PE#3 with their overlapped external points]
Operations of 2D FDM

  ∂²φ/∂x² + ∂²φ/∂y² = f

  (φW − 2φC + φE)/Δx² + (φS − 2φC + φN)/Δy² = fC

[Figures: the 5-point stencil applied on the global mesh and on the decomposed subdomains]
Computation (1/3)

• On each PE, info. of internal pts (i=1-N(=16)) is read from distributed local data, info. of boundary pts is sent to neighbors, and it is received as info. of external pts.
[Figure: four subdomains with global IDs]
Computation (2/3): Before Send/Recv

[Tables: on each PE, local points 1-16 hold the values (= global IDs) of the internal meshes; external points 17-24 are still undefined ("?")]
Computation (3/3): After Send/Recv

[Tables: after communication, external points 17-24 on each PE hold the values (= global IDs) received from the neighboring subdomains]
Peer-to-Peer Communication

• What is P2P Communication ?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2
Overview of Distributed Local Data: Example on PE#0
[Figure: value at each mesh (= global ID) and local ID]
SPMD

PE#0: "a.out", "sqm.0", "sq.0"
PE#1: "a.out", "sqm.1", "sq.1"
PE#2: "a.out", "sqm.2", "sq.2"
PE#3: "a.out", "sqm.3", "sq.3"

Dist. Local Data Sets (Neighbors, Comm. Tables)
Dist. Local Data Sets (Global ID of Internal Points)
Geometry / Results
2D FDM: PE#0
Information at each domain (1/4)

Internal Points: meshes originally assigned to the domain
[Figure: PE#0 local mesh, local IDs 1-16]
2D FDM: PE#0
Information at each domain (2/4)

Internal Points: meshes originally assigned to the domain
External Points: meshes originally assigned to different domains, but required for computation of meshes in the domain (meshes in overlapped regions)
  ・Sleeves ・Halo

2D FDM: PE#0
Information at each domain (3/4)

Internal Points: meshes originally assigned to the domain

2D FDM: PE#0
Information at each domain (4/4)

Internal Points: meshes originally assigned to the domain
External Points: meshes originally assigned to different domains, but required for computation of meshes in the domain (meshes in overlapped regions)
Boundary Points: internal points, which are also external points of other domains (used in computations of meshes in other domains)

Relationships between Domains
Communication Table: External/Boundary Points
Neighbors
Description of Distributed Local Data

• Internal/External Points
  – Numbering: starting from internal pts, then external pts after that
• Neighbors
  – Share overlapped meshes
  – Number and ID of neighbors
• External Points
  – From where, how many, and which external points are received/imported ?
• Boundary Points
  – To where, how many, and which boundary points are sent/exported ?
Overview of Distributed Local Data: Example on PE#0
[Figure: value at each mesh (= global ID) and local ID on PE#0]
Generalized Comm. Table: Send

• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – export_index[neib], neib= 0, NeibPETot-1
• ID of boundary points
  – export_item[k], k= 0, export_index[NeibPETot]-1
• Messages to each neighbor
  – SendBuf[k], k= 0, export_index[NeibPETot]-1
SEND: MPI_Isend/Irecv/Waitall

SendBuf: | neib#0 | neib#1 | neib#2 | neib#3 |
         each segment of size BUFlength_e, with boundaries at
         export_index[0] ... export_index[4]
for (neib=0; neib<NeibPETot; neib++){
  for (k=export_index[neib]; k<export_index[neib+1]; k++){
    kk= export_item[k];
    SendBuf[k]= VAL[kk];
  }
}

for (neib=0; neib<NeibPETot; neib++){
  tag= 0;
  iS_e= export_index[neib];
  iE_e= export_index[neib+1];
  BUFlength_e= iE_e - iS_e;

  ierr= MPI_Isend
        (&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib], 0,
         MPI_COMM_WORLD, &ReqSend[neib]);
}
MPI_Waitall(NeibPETot, ReqSend, StatSend);
Boundary-point values are copied to the sending buffer: export_item (export_index[neib] : export_index[neib+1]-1) are sent to the neib-th neighbor.
Generalized Comm. Table: Receive

• Neighbors
  – NeibPETot, NeibPE[neib]
• Message size for each neighbor
  – import_index[neib], neib= 0, NeibPETot-1
• ID of external points
  – import_item[k], k= 0, import_index[NeibPETot]-1
• Messages from each neighbor
  – RecvBuf[k], k= 0, import_index[NeibPETot]-1
RECV: MPI_Isend/Irecv/Waitall

RecvBuf: | neib#0 | neib#1 | neib#2 | neib#3 |
         each segment of size BUFlength_i, with boundaries at
         import_index[0] ... import_index[4]
for (neib=0; neib<NeibPETot; neib++){
  tag= 0;
  iS_i= import_index[neib];
  iE_i= import_index[neib+1];
  BUFlength_i= iE_i - iS_i;

  ierr= MPI_Irecv
        (&RecvBuf[iS_i], BUFlength_i, MPI_DOUBLE, NeibPE[neib], 0,
         MPI_COMM_WORLD, &ReqRecv[neib]);
}
MPI_Waitall(NeibPETot, ReqRecv, StatRecv);

for (neib=0; neib<NeibPETot; neib++){
  for (k=import_index[neib]; k<import_index[neib+1]; k++){
    kk= import_item[k];
    VAL[kk]= RecvBuf[k];
  }
}

External-point values are copied from the receiving buffer: import_item (import_index[neib] : import_index[neib+1]-1) are received from the neib-th neighbor.
Relationship SEND/RECV

do neib= 1, NEIBPETOT
  iS_i= import_index(neib-1) + 1
  iE_i= import_index(neib  )
  BUFlength_i= iE_i + 1 - iS_i

  call MPI_IRECV &
 &  (RECVbuf(iS_i), BUFlength_i, MPI_INTEGER, NEIBPE(neib), 0, &
 &   MPI_COMM_WORLD, request_recv(neib), ierr)
enddo

do neib= 1, NEIBPETOT
  iS_e= export_index(neib-1) + 1
  iE_e= export_index(neib  )
  BUFlength_e= iE_e + 1 - iS_e

  call MPI_ISEND &
 &  (SENDbuf(iS_e), BUFlength_e, MPI_INTEGER, NEIBPE(neib), 0, &
 &   MPI_COMM_WORLD, request_send(neib), ierr)
enddo
• Consistency of IDs of sources/destinations, size, and contents of messages!
• Communication occurs when NEIBPE(neib) matches
Relationship SEND/RECV (#0 to #3)

• Consistency of IDs of sources/destinations, size, and contents of messages!
• Communication occurs when NEIBPE(neib) matches

[Diagram: rank #0, with NEIBPE(:)= 1,3,5,9, sends to #3; rank #3, with NEIBPE(:)= 1,0,10, receives from #0. The pair communicates because each rank lists the other as a neighbor.]
Generalized Comm. Table (1/6)

[Figure: PE#0's mesh with internal points 1–16 and external points 17–24, bordered by PE#1 and PE#2.]

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORT_index
4 8
#IMPORT_items
17 18 19 20 21 22 23 24
#EXPORT_index
4 8
#EXPORT_items
4 8 12 16 13 14 15 16
Generalized Comm. Table (2/6)

• #NEIBPEtot (2): number of neighbors
• #NEIBPE (1 2): IDs of the neighbors
• #NODE (24 16): ext+int points, int points
Generalized Comm. Table (3/6)

Four external points (1st–4th items of #IMPORT_items) are imported from the 1st neighbor (PE#1), and four (5th–8th items) are from the 2nd neighbor (PE#2).
Generalized Comm. Table (4/6)

• Items 1–4 of #IMPORT_items (17 18 19 20) are imported from the 1st neighbor (PE#1).
• Items 5–8 (21 22 23 24) are imported from the 2nd neighbor (PE#2).
Generalized Comm. Table (5/6)

Four boundary points (1st–4th items of #EXPORT_items) are exported to the 1st neighbor (PE#1), and four (5th–8th items) are exported to the 2nd neighbor (PE#2).
Generalized Comm. Table (6/6)

• Items 1–4 of #EXPORT_items (4 8 12 16) are exported to the 1st neighbor (PE#1); items 5–8 (13 14 15 16) are exported to the 2nd neighbor (PE#2).
• An external point is only sent from its original domain.
• A boundary point could be referred to by more than one domain, and sent to multiple domains (e.g. the 16th mesh).
Notice: Send/Recv Arrays

#PE0 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE1 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)
#PE1 send: VEC(start_send) ~ VEC(start_send+length_send-1)
#PE0 recv: VEC(start_recv) ~ VEC(start_recv+length_recv-1)

• “length_send” of the sending process must be equal to “length_recv” of the receiving process.
  – PE#0 to PE#1, PE#1 to PE#0
• “sendbuf” and “recvbuf”: different addresses
Peer-to-Peer Communication

• What is P2P Communication ?
• 2D Problem, Generalized Communication Table
  – 2D FDM
  – Problem Setting
  – Distributed Local Data and Communication Table
  – Implementation
• Report S2
Sample Program for 2D FDM

$ cd <$O-S2>
$ mpifrtpx -Kfast sq-sr1.f
$ mpifccpx -Kfast sq-sr1.c

(modify go4.sh for 4 processes)
$ pjsub go4.sh
Example: sq-sr1.c (1/6): Initialization

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "mpi.h"

int main(int argc, char **argv){
  int n, np, NeibPeTot, BufLength;
  MPI_Status *StatSend, *StatRecv;
  MPI_Request *RequestSend, *RequestRecv;
  int MyRank, PeTot;
  int *val, *SendBuf, *RecvBuf, *NeibPe;
  int *ImportIndex, *ExportIndex, *ImportItem, *ExportItem;
  char FileName[80], line[80];
  int i, nn, neib;
  int iStart, iEnd;
  FILE *fp;

/*
!C +-----------+
!C | INIT. MPI |
!C +-----------+
!C===*/
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &PeTot);
  MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
/*
!C +-----------+
!C | DATA file |
!C +-----------+
!C===*/
  sprintf(FileName, "sqm.%d", MyRank);
  fp = fopen(FileName, "r");

  fscanf(fp, "%d", &NeibPeTot);
  NeibPe      = calloc(NeibPeTot,   sizeof(int));
  ImportIndex = calloc(1+NeibPeTot, sizeof(int));
  ExportIndex = calloc(1+NeibPeTot, sizeof(int));
  for(neib=0;neib<NeibPeTot;neib++){
    fscanf(fp, "%d", &NeibPe[neib]);
  }
  fscanf(fp, "%d %d", &np, &n);

  for(neib=1;neib<NeibPeTot+1;neib++){
    fscanf(fp, "%d", &ImportIndex[neib]);}
  nn = ImportIndex[NeibPeTot];
  ImportItem = malloc(nn * sizeof(int));
  for(i=0;i<nn;i++){
    fscanf(fp, "%d", &ImportItem[i]); ImportItem[i]--;}

  for(neib=1;neib<NeibPeTot+1;neib++){
    fscanf(fp, "%d", &ExportIndex[neib]);}
  nn = ExportIndex[NeibPeTot];
  ExportItem = malloc(nn * sizeof(int));
  for(i=0;i<nn;i++){
    fscanf(fp, "%d", &ExportItem[i]); ExportItem[i]--;}
Example: sq-sr1.c (2/6): Reading distributed local data files (sqm.*)
Distributed local data file for PE#0 (sqm.0):

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
24 16
#IMPORTindex
4 8
#IMPORTitems
17 18 19 20 21 22 23 24
#EXPORTindex
4 8
#EXPORTitems
4 8 12 16 13 14 15 16
np: number of all meshes (internal + external)
n: number of internal meshes
RECV/Import: PE#0

[Figure: PE#0 receives its external points (local IDs 17–24) from PE#1 and PE#2.]
SEND/Export: PE#0

[Figure: PE#0 sends its boundary points (local IDs 4, 8, 12, 16 to PE#1 and 13, 14, 15, 16 to PE#2).]
sprintf(FileName, "sq.%d", MyRank);
fp = fopen(FileName, "r");
assert(fp != NULL);
val = calloc(np, sizeof(*val));
for(i=0;i<n;i++){
fscanf(fp, "%d", &val[i]);
}
Example: sq-sr1.c (3/6): Reading distributed local data files (sq.*)

n: number of internal points
val: global IDs of the meshes; val on external points is unknown at this stage.

[Figure: PE#0's internal points carry global IDs 1 2 3 4 9 10 11 12 17 18 19 20 25 26 27 28.]
Example: sq-sr1.c (4/6): Preparation of sending/receiving buffers

/*
!C +--------+
!C | BUFFER |
!C +--------+
!C===*/
  SendBuf = calloc(ExportIndex[NeibPeTot], sizeof(*SendBuf));
  RecvBuf = calloc(ImportIndex[NeibPeTot], sizeof(*RecvBuf));

  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ExportIndex[neib];
    iEnd   = ExportIndex[neib+1];
    for(i=iStart;i<iEnd;i++){
      SendBuf[i] = val[ExportItem[i]];
    }
  }
Information on the boundary points is written into the sending buffer (SendBuf). Information sent to NeibPe[neib] is stored in SendBuf[ExportIndex[neib]:ExportIndex[neib+1]-1].
Sending Buffer is nice ...

The numbering of these boundary nodes is not continuous, therefore the following procedure of MPI_Isend cannot be applied directly:
• starting address of the sending buffer
• XX messages from that address
for (neib=0; neib<NeibPETot; neib++){
  tag= 0;
  iS_e= export_index[neib];
  iE_e= export_index[neib+1];
  BUFlength_e= iE_e - iS_e;
  ierr= MPI_Isend
        (&SendBuf[iS_e], BUFlength_e, MPI_DOUBLE, NeibPE[neib], 0,
         MPI_COMM_WORLD, &ReqSend[neib]);
}
Communication Pattern using 1D Structure

[Figure: 1D domain decomposition with halo regions between neighboring sub-domains.]

Dr. Osni Marques (Lawrence Berkeley National Laboratory)
Example: sq-sr1.c (5/6): SEND/Export: MPI_Isend

/*
!C +-----------+
!C | SEND-RECV |
!C +-----------+
!C===*/
  StatSend    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  StatRecv    = malloc(sizeof(MPI_Status)  * NeibPeTot);
  RequestSend = malloc(sizeof(MPI_Request) * NeibPeTot);
  RequestRecv = malloc(sizeof(MPI_Request) * NeibPeTot);

  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ExportIndex[neib];
    iEnd   = ExportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Isend(&SendBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestSend[neib]);
  }

  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Irecv(&RecvBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestRecv[neib]);
  }
[Figure: global numbering 1–64 of the mesh distributed over PE#0–PE#3.]
Example: sq-sr1.c (5/6): RECV/Import: MPI_Irecv

  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    BufLength = iEnd - iStart;
    MPI_Irecv(&RecvBuf[iStart], BufLength, MPI_INT,
              NeibPe[neib], 0, MPI_COMM_WORLD, &RequestRecv[neib]);
  }
Example: sq-sr1.c (6/6): Reading information on external points from the receiving buffers

  MPI_Waitall(NeibPeTot, RequestRecv, StatRecv);
  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    for(i=iStart;i<iEnd;i++){
      val[ImportItem[i]] = RecvBuf[i];
    }
  }
  MPI_Waitall(NeibPeTot, RequestSend, StatSend);

/*
!C +--------+
!C | OUTPUT |
!C +--------+
!C===*/
  for(neib=0;neib<NeibPeTot;neib++){
    iStart = ImportIndex[neib];
    iEnd   = ImportIndex[neib+1];
    for(i=iStart;i<iEnd;i++){
      int in = ImportItem[i];
      printf("RECVbuf%8d%8d%8d\n", MyRank, NeibPe[neib], val[in]);
    }
  }
  MPI_Finalize();
  return 0;
}
Contents of RecvBuf are copied to the values at external points.
Results (PE#0–PE#3)

RECVbuf       0       1       5
RECVbuf 0 1 13
RECVbuf 0 1 21
RECVbuf 0 1 29
RECVbuf 0 2 33
RECVbuf 0 2 34
RECVbuf 0 2 35
RECVbuf 0 2 36
RECVbuf 1 0 4
RECVbuf 1 0 12
RECVbuf 1 0 20
RECVbuf 1 0 28
RECVbuf 1 3 37
RECVbuf 1 3 38
RECVbuf 1 3 39
RECVbuf 1 3 40
RECVbuf 2 3 37
RECVbuf 2 3 45
RECVbuf 2 3 53
RECVbuf 2 3 61
RECVbuf 2 0 25
RECVbuf 2 0 26
RECVbuf 2 0 27
RECVbuf 2 0 28
RECVbuf 3 2 36
RECVbuf 3 2 44
RECVbuf 3 2 52
RECVbuf 3 2 60
RECVbuf 3 1 29
RECVbuf 3 1 30
RECVbuf 3 1 31
RECVbuf 3 1 32
Distributed Local Data Structure for Parallel Computation

• A distributed local data structure for domain-to-domain communications has been introduced, which is appropriate for applications with sparse coefficient matrices (e.g. FDM, FEM, FVM, etc.).
  – SPMD
  – Local numbering: internal points, then external points
  – Generalized communication table
• Everything is easy, if a proper data structure is defined:
  – Values at boundary points are copied into sending buffers
  – Send/Recv
  – Values at external points are updated through receiving buffers
Initial Mesh

[Figure: 5×5 mesh, points numbered 1–25.]
Three Domains

[Figure: the 25-point mesh decomposed into three domains, PE#0–PE#2.]
Three Domains

[Figure: local numbering of each domain, with the corresponding global IDs, for PE#0, PE#1, and PE#2.]
PE#0: sqm.0: fill in the ○'s

#NEIBPEtot
2
#NEIBPE
1 2
#NODE
13 8   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○…
#EXPORTindex
○ ○
#EXPORTitems
○…
PE#1: sqm.1: fill in the ○'s

#NEIBPEtot
2
#NEIBPE
0 2
#NODE
8 14   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○…
#EXPORTindex
○ ○
#EXPORTitems
○…
PE#2: sqm.2: fill in the ○'s

#NEIBPEtot
2
#NEIBPE
1 0
#NODE
9 15   (int+ext, int pts)
#IMPORTindex
○ ○
#IMPORTitems
○…
#EXPORTindex
○ ○
#EXPORTitems
○…
Procedures

• Number of internal/external points
• Where do the external points come from ?
  – IMPORTindex, IMPORTitems
  – Sequence of NEIBPE
• Then check the destinations of the boundary points.
  – EXPORTindex, EXPORTitems
  – Sequence of NEIBPE
• “sq.*” are in <$O-S2>/ex
• Create “sqm.*” by yourself
• Copy <$O-S2>/a.out (built from sq-sr1.c) to <$O-S2>/ex
• pjsub go3.sh
t2
Rep
ort S
2 (1
/2)
•P
aral
leliz
e 1D
cod
e (1
d.c)
usi
ng M
PI
•R
ead
entir
e el
emen
t num
ber,
and
deco
mpo
se in
to s
ub-
dom
ains
in y
our p
rogr
am
•M
easu
re p
aral
lel p
erfo
rman
ce
Report S2 (2/2)

• Deadline: 17:00, October 11th (Sat), 2014.
  – Send files via e-mail to nakajima(at)cc.u-tokyo.ac.jp
• Problem
  – Apply the “Generalized Communication Table”.
  – Read the entire element number, and decompose it into sub-domains in your program.
  – Evaluate parallel performance.
    • You need a huge number of elements to get excellent performance.
    • Fix the number of iterations (e.g. 100) if computations cannot be completed.
• Report
  – Cover page: name, ID, and problem ID (S2) must be written.
  – Less than eight pages including figures and tables (A4).
    • Strategy, structure of the program, remarks
  – Source list of the program (if you have bugs)
  – Output list (as small as possible)