Parallel Programming languages on heterogeneous architectures4 .docx
7/25/2019 Parallel Programming languages on heterogeneous architectures4 .docx
1/26
Title
PARALLEL PROGRAMMING LANGUAGES ON HETEROGENEOUS ARCHITECTURES USING OPENMPC, OMPSS, OPENACC AND OPENMP
Título
LENGUAJES PARA PROGRAMACIÓN PARALELA EN ARQUITECTURAS HETEROGÉNEAS UTILIZANDO OPENMPC, OMPSS, OPENACC Y OPENMP
Authors
Esteban Hernández B.
Network engineer with a Master's degree in software engineering and free software construction, and minor degrees in applied mathematics and network software construction. He is currently pursuing doctoral studies in Engineering at Universidad Distrital and works as principal architect at NT. His work focuses on parallel programming, high performance computing, computational numerical simulation and numerical weather forecasting.
Contact: [email protected]
Gerardo de Jesús Montoya Gaviria
Engineer in meteorology from the University of Leningrad, with a doctorate in physical-mathematical sciences from Moscow State University. A pioneer in the area of meteorology in Colombia, in charge of the meteorology graduate program at the National University of Colombia, researcher and director of more than 12 graduate theses in the areas of dynamic meteorology, numerical forecasting, air quality, and the efficient use of climate and weather models. He is currently a full professor in the Faculty of Geosciences at the National University of Colombia.
Contact: [email protected]
Carlos Enrique Montenegro
Systems engineer with PhD and Master's degrees in informatics, director of the research group GIIRA, focused on social network analysis, e-learning and data visualization. He is currently an associate professor in the Engineering Faculty at Universidad Distrital.
Contact: [email protected]
Abstract: The field of parallel programming has seen a major new player arrive in the last 10 years. GPUs have taken on a relevant role in scientific computing because they offer high computing performance, low cost and simplicity of implementation. However, one of the most important challenges they pose is the programming languages used for these devices; the effort of recoding algorithms originally designed for CPUs is a significant problem. In this article we review three of the principal frameworks for programming CUDA devices and compare them with the new directives introduced in the OpenMP 4 standard, solving the Jacobi iterative method (Meijerink & Vorst, 1977; Wang, n.d.).
Resumen: En el campo de la programación paralela ha arribado un nuevo gran jugador en los últimos 10 años. Las GPUs han tomado una importancia relevante en la computación científica debido a que ofrecen alto rendimiento computacional, bajo costo y simplicidad de implementación; sin embargo, uno de los desafíos más grandes que poseen son los lenguajes utilizados para la programación de los dispositivos. El esfuerzo de reescribir algoritmos diseñados originalmente para CPUs es uno de los problemas más relevantes. En este artículo se revisan tres frameworks de programación para la tecnología CUDA y se realiza una comparación con el reciente estándar OpenMP versión 4, resolviendo el método iterativo de Jacobi (Meijerink & Vorst, 1977; Wang, n.d.).
Keywords: CUDA, Jacobi method, OmpSs, OpenACC, OpenMPC, OpenMP, Parallel Programming
Palabras clave: Método de Jacobi, OmpSs, OpenACC, OpenMP, Programación paralela.
I. INTRODUCTION
Over the last 10 years, massively parallel processors have used GPUs as the principal element of a new approach to parallel programming: the GPU has evolved from a graphics-specific accelerator into a general-purpose computing device, and this period can be considered the era of the GPU (Nickolls & Dally, 2010). However, the principal obstacle to broad adoption in the programmer community has been the lack of standards that allow programming the different existing hardware solutions in a unified form (Nickolls & Dally, 2010). The most important player in GPU solutions is Nvidia®, with the CUDA® programming language and its own compiler, nvcc (Hill & Marty, 2008), with thousands of installed solutions represented in the Top500 supercomputer list; portability, however, is the principal problem. Some community projects and hardware alliances have proposed solutions to resolve this issue: OmpSs, OpenACC and OpenMPC have emerged as the most promising ones (Vetter, 2012), all building on the OpenMP base model. In the last year, the OpenMP board released version 4 of the standard, which adds support for accelerator devices.
A. OmpSs
OmpSs (a programming model from the Barcelona Supercomputing Center based on OpenMP and StarSs) is a framework focused on the task-decomposition paradigm for developing parallel applications on cluster environments with heterogeneous architectures. It provides a set of compiler directives that can be used to annotate a sequential code, and additional features have been added to support the use of accelerators like GPUs. OmpSs is based on StarSs, a task-based programming model, and works by annotating a serial application with directives that are translated by the compiler. With it, the same program that runs sequentially in a node with a single GPU can run in parallel on multiple GPUs, either local (single node) or remote (cluster of GPUs). Besides performing a task-based parallelization, the runtime system moves the data as needed between the different nodes and GPUs, minimizing the impact of communication by using affinity scheduling, caching, and by overlapping communication with the computational tasks.
OmpSs is based on the OpenMP programming model with modifications to its execution and memory model. It also provides some extensions for synchronization, data motion and heterogeneity support.
1) Execution model: OmpSs uses a thread-pool execution model instead of the traditional OpenMP fork-join model. The master thread starts the execution and all other threads cooperate executing the work it creates (whether it comes from work-sharing or task constructs). Therefore, there is no need for a parallel region. Nesting of constructs allows other threads to generate work as well (Figure 1).
2) Memory model: OmpSs assumes that multiple address spaces may exist. As such, shared data may reside in memory locations that are not directly accessible from some of the computational resources. Therefore, all parallel code can only safely access private data and shared data which has been marked explicitly with the extended syntax. This assumption holds even for SMP machines, as the implementation may reallocate shared data to improve memory accesses (e.g., NUMA).
3) Extensions:
Function tasks: OmpSs allows annotating function declarations or definitions with a task directive, as in Cilk (Duran, Perez, Ayguadé, Badia, & Labarta, 2008). In this case, any call to the function creates a new task that will execute the function body. The data environment of the task is captured from the function arguments.
Dependency synchronization: OmpSs integrates the StarSs dependence support (Duran et al., 2008). It allows annotating tasks with three clauses: input, output and inout. They express, respectively, that a given task depends on some data produced before, that it will produce some data, or both. The syntax of the clauses allows specifying scalars, arrays, pointers and pointed data.
Figure 1. OmpSs execution model
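As a concrete illustration of the function-task and dependence clauses just described, the following C sketch annotates two functions OmpSs-style. The clause spellings (input/output) and the array-section syntax follow the description above and should be read as an illustrative assumption rather than verified OmpSs syntax; a plain C compiler simply ignores the pragmas, so the fragment also runs sequentially.

```c
#include <stdio.h>

/* Function task: any call creates a task; the data environment
   comes from the arguments (array sections are OmpSs-style). */
#pragma omp task input(a[0;n], b[0;n]) output(c[0;n])
void vec_add(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

#pragma omp task inout(c[0;n])
void vec_scale(float *c, float s, int n) {
    for (int i = 0; i < n; i++)
        c[i] *= s;
}

void demo(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, c[4];
    vec_add(a, b, c, 4);   /* task producing c */
    vec_scale(c, 0.5f, 4); /* runtime orders it after vec_add via the c dependence */
#pragma omp taskwait
    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]); /* prints: 2.5 2.5 2.5 2.5 */
}
```

Under the OmpSs runtime the two calls become tasks ordered by the dependence on c, with no explicit parallel region, matching the thread-pool execution model described above.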
B. OpenACC
OpenACC is an industry standard for heterogeneous computing proposed at the Supercomputing Conference 2011. OpenACC follows the OpenMP approach of annotating sequential code with compiler directives (pragmas), indicating those regions of code susceptible to being executed on the GPU.
The execution model targeted by OpenACC API-enabled implementations is host-directed execution with an attached accelerator device, such as a GPU. Much of a user application executes on the host. Compute-intensive regions are offloaded to the accelerator device under control of the host. The device executes parallel regions, which typically contain work-sharing loops, or kernels regions, which typically contain one or more loops which are executed as kernels on the accelerator. Even in accelerator-targeted regions, the host may orchestrate the execution by allocating memory on the accelerator device, initiating data transfer, sending the code to the accelerator, passing arguments to the compute region, queuing the device code, waiting for completion, transferring results back to the host, and deallocating memory (Figure 2). In most cases, the host can queue a sequence of operations to be executed on the device, one after the other (Wolfe, 2013).
The actual problems with OpenACC are its support for only the fork-join model and the fact that only commercial compilers support its directives (PGI, Cray and CAPS) (Wolfe, 2013; Reyes, López, Fumero, & Sande, 2012). In the last year only one open-source implementation, accULL, has appeared (Reyes & López-Rodríguez, 2012).
Figure 2. OpenACC execution model
C. OpenMPC
OpenMPC (OpenMP extended for CUDA) is a framework that hides the complexity of the programming and memory model from the user (Lee & Eigenmann, 2010). OpenMPC consists of a standard OpenMP API plus a new set of directives and environment variables to control important CUDA-related parameters and optimizations.
OpenMPC addresses two important issues in GPGPU programming: programmability and tunability. As a front-end programming model, OpenMPC provides programmers with abstractions of the complex CUDA programming model and high-level controls over various optimizations and CUDA-related parameters. OpenMPC includes a fully automatic compilation and user-assisted tuning system. In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants.
OpenMPC uses the Cetus compiler (Dave, Bae, Min, & Lee, 2009) for automatic source-to-source parallelization. The C source code goes through 3 levels of analysis:
Privatization
Reduction variable recognition
Induction variable substitution
OpenMPC adds a number of pragmas to annotate OpenMP parallel regions and select optimization regions. The added pragmas have the following form: #pragma cuda <<function>>.
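A hypothetical OpenMPC-style annotation of a reduction loop might look as follows. The #pragma cuda lines follow the <<function>> form described above; the specific directive names (ainfo, gpurun) are assumptions drawn from the OpenMPC literature, not verified here, and an ordinary compiler ignores both pragma families, so the code also runs as plain OpenMP or sequential C.

```c
/* OpenMP region annotated OpenMPC-style; the cuda pragmas steer how the
   OpenMPC translator maps this loop onto a CUDA kernel (names assumed). */
float sum_squares(const float *x, int n) {
    float sum = 0.0f;
#pragma cuda ainfo kernelid(0) procname(sum_squares)
#pragma cuda gpurun
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}
```

The point of the design is visible here: the parallelism is still expressed in standard OpenMP, and the CUDA-specific tuning knobs live in separate, ignorable pragmas.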
D. OpenMP release 4
OpenMP is the most widely used framework for programming parallel software with shared memory, with support in most of the existing compilers. With the explosion of multicore and manycore systems, OpenMP gained acceptance in the parallel programming community and among hardware vendors. From its creation to version 3 the focus of the API was CPU environments, but with the introduction of GPUs and vector accelerators the new release 4 includes support for external devices (OpenMP, 2013). Historically OpenMP has supported the Single Instruction Multiple Data (SIMD) model only through the fork-join model (Figure 3), but in this new release the task-based model (Duran et al., 2008; Podobas, Brorsson, & Faxén, 2010) has been introduced to gain performance through more parallelism on external devices. The most important directives introduced are target, teams and distribute. These directives permit a group of threads to be distributed onto a special device and the result to be copied back to host memory (Figure 3).
Figure 3. OpenMP 4 execution model
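A minimal sketch of the new directives: target offloads the region to a device, teams/distribute spread the iterations over the device's thread groups, and map() describes the copies between host and device memory. Without an offloading compiler the region simply runs on the host.

```c
/* SAXPY offloaded with the OpenMP 4 device constructs: x is copied to the
   device, y is copied in and back out (tofrom), matching Figure 3. */
void saxpy_target(int n, float a, const float *x, float *y) {
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```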
III. JACOBI ITERATIVE METHOD
Iterative methods are suitable for large-scale linear equations. There are three commonly used iterative methods: Jacobi's method, the Gauss-Seidel method and SOR iterative methods (Gravvanis, Filelis-Papadopoulos, & Lipitakis, 2013; Huang, Teng, Wahid, & Ko, 2009).
The last two iterative methods converge faster than Jacobi's method but lack parallelism; they have advantages over the Jacobi method only when implemented in sequential fashion and executed on traditional CPUs. On the other hand, Jacobi's iterative method has inherent parallelism: it is suitable for implementation on CUDA or vector accelerators to run concurrently on many cores. The basic idea of the Jacobi method is to convert the system Ax = b into an equivalent fixed-point system, and then solve Equation (1) and Equation (2). For a 3x3 system:

a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = b_2
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = b_3

solving each equation for the unknown on its diagonal gives:

x_1 = (b_1 - a_{12} x_2 - a_{13} x_3) / a_{11}
x_2 = (b_2 - a_{21} x_1 - a_{23} x_3) / a_{22}
x_3 = (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33}

Equation (1)
On each iteration we solve:

x_i^{(k)} = (1 / a_{ii}) (b_i - \sum_{j \neq i} a_{ij} x_j^{(k-1)})

Equation (2)
where the values from the (k-1)-st iteration are used to compute the values for the k-th iteration. The pseudocode for the Jacobi method (Dongarra, Golub, Grosse, Moler, & Moore, 2008) is:

Choose an initial guess x^{(0)} to the solution x
for k = 1, 2, ...
    for i = 1, 2, ..., n
        sigma = 0
        for j = 1, 2, ..., i-1, i+1, ..., n
            sigma = sigma + a_{ij} x_j^{(k-1)}
        end
        x_i^{(k)} = (b_i - sigma) / a_{ii}
    end
    check convergence; continue if necessary
end

This iterative method can be implemented in parallel form, using shared or distributed memory (Margaris, Souravlas, & Roumeliotis, 201…).
Figure 4. Parallel form of the Jacobi method on shared memory (Alsemmeri, n.d.)
IV. OPENMP IMPLEMENTATION

int n = LIMIT_N;
int m = LIMIT_M;
float A[n][m];     // the 2-D matrix
float Anew[n][m];
float y_v[n];      // b vector
// fill the matrix with initial conditions
...
#pragma omp parallel for shared(m, n, Anew, A)
for (int j = 1; j < n-1; j++)
{
    for (int i = 1; i < m-1; i++)
    {
        Anew[j][i] = 0.25f * ( A[j][i+1] + A[j][i-1]
                             + A[j-1][i] + A[j+1][i] );
        error = fmaxf( error, fabsf(Anew[j][i] - A[j][i]) );
    }
}
#pragma omp parallel for shared(m, n, Anew, A)
for (int j = 1; j < n-1; j++)
{
    for (int i = 1; i < m-1; i++)
    {
        A[j][i] = Anew[j][i];
    }
}
V. OPENACC IMPLEMENTATION

#pragma acc kernels
for (int j = 1; j < n-1; j++)
{
    for (int i = 1; i < m-1; i++)
    {
        A[j][i] = Anew[j][i];
    }
}
if (iter % 100 == 0) printf("%d, %0.6f\n", iter, error);
iter++;
} // ... print the result

In this implementation a new annotation appears, #pragma acc kernels, which indicates the region of code that will be executed on the accelerator. This simple annotation hides a complex CUDA kernel implementation: the copying of data from host to device and back, the definition of the grid of threads, and the data manipulation (Amorim & Haase, 2009; Sanders & Kandrot, 2011; Zhang et al., 2009).
VI. PURE CUDA IMPLEMENTATION (Wang, n.d.)

...
int dimB, dimT;
dimT = 256;
dimB = (dim / dimT) + 1;
float err = 1.0;
// set up the memory for the GPU
float *LU_d;
float *B_d;
float *diag_d;
float *X_d, *X_old_d;
float *tmp;
cudaMalloc( (void **) &B_d, sizeof(float) * dim );
cudaMalloc( (void **) &diag_d, sizeof(float) * dim );
cudaMalloc( (void **) &LU_d, sizeof(float) * dim * dim );
cudaMemcpy( LU_d, LU, sizeof(float) * dim * dim, cudaMemcpyHostToDevice );
cudaMemcpy( B_d, B, sizeof(float) * dim, cudaMemcpyHostToDevice );
cudaMemcpy( diag_d, diag, sizeof(float) * dim, cudaMemcpyHostToDevice );
cudaMalloc( (void **) &X_d, sizeof(float) * dim );
cudaMalloc( (void **) &X_old_d, sizeof(float) * dim );
cudaMalloc( (void **) &tmp, sizeof(float) * dim );
...
// call the cuda kernels
// 2. compute X from A and X_old
cudaMemcpy( X_old_d, X_old, sizeof(float) * dim, cudaMemcpyHostToDevice );
matMultVec<<<dimB, dimT>>>( LU_d, X_old_d, tmp, dim, dim ); // use X_old to compute LU * X_old, stored in tmp
substract<<<dimB, dimT>>>( B_d, tmp, X_d, dim );            // get (B - LU * X_old), stored in X_d
diaMultVec<<<dimB, dimT>>>( diag_d, X_d, dim );             // get the new X
// 3. copy the new X back to host memory
cudaMemcpy( X, X_d, sizeof(float) * dim, cudaMemcpyDeviceToHost );
// ...
...
        rowId += gridDim.x * blockDim.x;
    }
    __syncthreads();
}

// cuda kernel for vector subtraction
__global__ void substract(float *a_d,
                          float *b_d,
                          float *c_d,
                          int dim)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < dim)
    {
        c_d[tid] = a_d[tid] - b_d[tid];
        tid += gridDim.x * blockDim.x;
    }
}

// cuda kernel for the max-reduction of a vector
__global__ void VecMax(float *vec, int dim)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (dim > 1)
    {
        int mid = dim / 2;                 // get the half size
        if (tid < mid)                     // filter the active threads
        {
            if (vec[tid] < vec[tid + mid]) // take the larger of vec[tid] and vec[tid + mid]
                vec[tid] = vec[tid + mid]; // and store it in vec[tid]
        }
        // deal with the odd case
        if (dim % 2)                       // if dim is odd, care about the last element
        {
            if (tid == 0)                  // only vec[0] is compared with vec[dim-1]
            {
                if (vec[tid] < vec[dim - 1])
                    vec[tid] = vec[dim - 1];
            }
        }
        __syncthreads();                   // sync all threads
        dim /= 2;                          // halve the vector size
    }
}
The effort of writing three kernels, managing the logic of the grid dimensions, copying data from host to GPU and GPU to host, synchronizing the threads within the thread groups on the GPU, and handling details such as the thread-ID calculation demands deep knowledge of the hardware devices and a high programming effort.
VII. TEST TECHNIQUE
We take the three implementations of the Jacobi method (pure OpenMP, OpenACC, OpenMPC), run them on the SUT (System Under Test) of Table 1, and make multiple runs with square matrices of incremental sizes (Table 2), taking the processing time to analyze performance (Kim, n.d.; Sun & Gustafson, 1991) against the number of code lines needed for the algorithm.

Table 1. Characteristics of the System under Test
System:  Supermicro SYS-1027GR-TRF
CPU:     Intel® Xeon® 10-core E5-2680 V2 @ 2.80 GHz
Memory:  32 GB DDR3 1600 MHz
GPU:     NVIDIA Kepler K…
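The measurement procedure described above can be sketched as a small timing harness; time_sweep is a hypothetical helper name and the zero-initialized grid is illustrative only.

```c
#include <stdlib.h>
#include <time.h>

/* Time one Jacobi sweep over an n x n grid and return elapsed CPU seconds. */
double time_sweep(int n) {
    float *A = calloc((size_t)n * n, sizeof *A);
    float *Anew = calloc((size_t)n * n, sizeof *Anew);
    if (!A || !Anew) { free(A); free(Anew); return -1.0; }
    clock_t t0 = clock();
    for (int j = 1; j < n - 1; j++)
        for (int i = 1; i < n - 1; i++)
            Anew[j*n + i] = 0.25f * (A[j*n + i+1] + A[j*n + i-1] +
                                     A[(j-1)*n + i] + A[(j+1)*n + i]);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    free(A); free(Anew);
    return secs;
}

/* usage: for (int n = 256; n <= 4096; n *= 2) printf("%d %f\n", n, time_sweep(n)); */
```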
IX. CONCLUSION
The new devices for high performance computing need standardized methods and languages that permit interoperability, easy programming and well-defined interfaces for integrating the data interchange between the memory segments of the processors (CPUs) and the devices. It is also necessary for languages to support working with two or more devices in parallel, using the same code while running highly parallel segments in automatic form. OpenMP is the de facto standard for the shared-memory programming model, but its support for heterogeneous devices (GPUs, accelerators, FPGAs, etc.) is at a very early stage; the new frameworks and industrial APIs need help to grow and to mature the standard.
X. FINANCING
This research was developed with the authors' own resources, within the doctoral thesis process at Universidad Distrital.
XI. FUTURE WORKS
It is necessary to analyze the impact of these frameworks with multiple devices on the same host, for managing problems of memory locality, unified memory between CPU and devices, and I/O operations from the devices, using the most recent version of each standard framework. With the support for GPU devices in the following version of GCC 5, a compiler performance comparison between commercial and standard open-source solutions becomes possible. It is also necessary to review the impact of using clusters with multiple devices, message passing and MPI integration for high performance computing using the language extensions (Schaa & Kaeli, 2009).
REFERENCES
Dannert, T., Marek, A., & Rampp, M. (2013). Porting large HPC applications to GPU clusters: The codes GENE and VERTEX. Retrieved from http://arxiv.org/abs/1310.1…
Dongarra, J., Golub, G. H., Grosse, E., Moler, C., & Moore, K. (2008). Netlib and NA-Net: Building a scientific computing community. IEEE Annals of the History of Computing, 30(2).
Gupta, S., Xiang, P., & Zhou, H. (2013). Analyzing locality of memory references in GPU architectures. Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness. doi:10.11…
Reyes, R., & López-Rodríguez, I. (2012). accULL: An OpenACC implementation with CUDA and OpenCL support. Euro-Par 2012 Parallel Processing, 871–882. Retrieved from http://link.springer.com/chapter/10.1007/978-3-6…
Sanders, J., & Kandrot, E. (2011). CUDA by example: An introduction to general-purpose GPU programming. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:CUDA+by+Example#1
Schaa, D., & Kaeli, D. (2009). Exploring the multiple-GPU design space. 2009 IEEE International Symposium on Parallel & Distributed Processing, 1–12. doi:10.1109/IPDPS.2009.5161068
Sun, X.-H., & Gustafson, J. (1991). Toward a better parallel performance metric. Parallel Computing, 17, 1093–1109. Retrieved from http://www.sciencedirect.com/science/article/pii/S0167819105800286
Vetter, J. (2012). On the road to exascale: Lessons from contemporary scalable GPU systems. A*CRC Workshop on Accelerator Technologies. Retrieved from http://www.acrc.a-star.edu.sg/astaratireg_2012/Proceedings/Presentation - Jeffrey Vetter.pdf
Wang, Y. (n.d.). Jacobi iteration on GPU. Retrieved September 25, 201…