Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
Objective
• Assess how much work is needed to use the Intel Xeon Phi.
• Make minimal modifications, using only pragmas.
• Two applications:
– CalcuNetW: tests the MKL libraries.
– GammaMaps: tests pragmas.
• Two modes:
– Native: compiled to execute only on the Xeon Phi.
– Offload: uses host + Xeon Phi.
CalcuNetW: Calculate Measurements in Complex Networks
• Complex networks consist of sets of nodes (vertices) joined together in pairs by links (edges).
• For each network, the application calculates:
– Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs of the network.
– SC odd: accounts only for paths of odd length.
– SC even: accounts only for paths of even length.
– Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
– Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
– Network Communicability, C(G): the mean of all the C(p,q).
Mouriño J.C., Estrada E., Gomez A., “CalcuNetW: Calculate Measurements in Complex Networks”, Informe Técnico CESGA-2005-003.
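The slides do not spell out the formulas behind these measures. The sketch below assumes the usual matrix-function definitions: SC(i) = [exp(A)]_ii, with the even and odd parts given by the even and odd terms of the series exp(A) = Σ A^k / k!, bipartivity = Σ SC even / Σ SC, and C(p,q) = [exp(A)]_pq for an adjacency matrix A. The function names and the choice to average C(p,q) over distinct node pairs are illustrative assumptions, not the CalcuNetW implementation (which, according to the slides, relies on MKL).

```python
import math


def matmul(a, b):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_series(adj, terms=30):
    """Split exp(A) = sum_k A^k / k! into its even-k and odd-k parts."""
    n = len(adj)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 / 0!
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        for i in range(n):
            for j in range(n):
                target[i][j] += term[i][j]
        # Next term of the series: A^(k+1) / (k+1)!
        term = [[x / (k + 1) for x in row] for row in matmul(term, adj)]
    return even, odd


def network_measures(adj):
    """Hypothetical helper: the measures listed above for one small network."""
    n = len(adj)
    even, odd = walk_series(adj)
    sc_even = [even[i][i] for i in range(n)]
    sc_odd = [odd[i][i] for i in range(n)]
    sc = [e + o for e, o in zip(sc_even, sc_odd)]
    comm = [[even[p][q] + odd[p][q] for q in range(n)] for p in range(n)]
    # Assumption: C(G) averages C(p,q) over distinct node pairs p < q.
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    mean_comm = sum(comm[p][q] for p, q in pairs) / len(pairs) if pairs else 0.0
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": math.fsum(sc_even) / math.fsum(sc),
        "communicability": comm,
        "network_communicability": mean_comm,
    }
```

For a single edge between two nodes, `network_measures([[0, 1], [1, 0]])` gives a bipartivity of 1, as expected for a bipartite graph, while a triangle gives a value below 1. A truncated series is adequate only for small matrices; production code would more plausibly use an eigenvalue decomposition from a tuned linear algebra library.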
[Figure: CalcuNetW]
GammaMaps: A Figure-of-Merit in Radiation Therapy
[Figure: two dose grids with X, Y, Z axes; dose in voxel i,j,k]
GammaMaps: A Figure-of-Merit in Radiation Therapy
Workflow: Read doses → Initialise and normalise → Compute gamma → Store gamma
• Application in Fortran 90
• Parallelised using OpenMP
• Geometric algorithm*
• 512 x 512 x 128 = 33,554,432 voxels
• Auto-vectorization
• Pragmas for offload
* T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, “Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation,” Medical Physics, vol. 35, no. 3, p. 879, 2008.
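To make the "Compute gamma" step concrete, here is a minimal one-dimensional sketch of the gamma index in its commonly used form: for each reference point, take the minimum over evaluated points of sqrt(distance²/DTA² + dose difference²/ΔD²). It is a brute-force search, not the interpolation-free geometric algorithm of Ju et al. that the application uses, and the function name, parameter names and tolerance values are hypothetical.

```python
import math


def gamma_map_1d(ref, evl, spacing, dta, dose_tol):
    """Brute-force gamma index for two 1-D dose profiles on the same grid.

    ref, evl  -- reference and evaluated doses, one value per voxel
    spacing   -- voxel size (same length unit as dta)
    dta       -- distance-to-agreement tolerance (assumed parameter name)
    dose_tol  -- dose-difference tolerance (assumed parameter name)
    """
    gammas = []
    for i, d_ref in enumerate(ref):
        best = math.inf
        for j, d_evl in enumerate(evl):
            dist = (j - i) * spacing
            g2 = (dist / dta) ** 2 + ((d_evl - d_ref) / dose_tol) ** 2
            best = min(best, g2)
        gammas.append(math.sqrt(best))
    return gammas


# Illustrative values only: 1 mm voxels, 3 mm / 0.03 dose-unit tolerances.
profile = [0.0, 0.5, 1.0, 0.5, 0.0]
print(gamma_map_1d(profile, profile, spacing=1.0, dta=3.0, dose_tol=0.03))
```

The outer loop over reference voxels is independent per voxel, which is why this kind of computation suits OpenMP parallelisation over the 33,554,432 voxels mentioned above.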
Results of Experiments
Platform (remote access to Intel systems, Feb. 2013)
Host
  CPU Model         Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
  Nr. of cores      16
  Memory            32788 MB
  Operating System  Linux 2.6.32-279.el6.x86_64
  Compiler Version  2013U2
Intel Xeon Phi
  Model             Beta0 Engineering Sample
  Nr. of cores      61 at 1.09GHz
  Memory            7936 MB
  Operating System  MPSS Gold U1
  Compiler Version  2013U2
  GDDR Technology   GDDR5
  GDDR Frequency    2750000 KHz
Intel Xeon Phi Affinity Policies
Example: 8 threads (0 to 7) placed on 4 cores (C1 to C4), each with 4 hardware threads (HT1 to HT4).
Policy            C1        C2        C3      C4
COMPACT - FINE    0 1 2 3   4 5 6 7   -       -
SCATTER - FINE    0 4       1 5       2 6     3 7
BALANCED - FINE   0 1       2 3       4 5     6 7
BALANCED - CORE   {0,1}     {2,3}     {4,5}   {6,7}
• Type:
– Compact
– Scatter
– Balanced
• Granularity:
– Fine or Thread
– Core
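The three placement types can be summarised with a small model. The sketch below is a simplified illustration of which core each thread lands on under each policy. It is not Intel's runtime logic, the function name is hypothetical, and cores are numbered from 0 rather than C1. With the Intel OpenMP runtime these policies are normally selected through the KMP_AFFINITY environment variable (for example `granularity=fine,balanced`), which the slides do not show.

```python
def place_threads(policy, n_threads, n_cores, ht_per_core):
    """Return the core index (0-based) assigned to each thread.

    Simplified model of the compact, scatter and balanced policies;
    it ignores which hardware thread inside the core is used.
    """
    if n_threads > n_cores * ht_per_core:
        raise ValueError("more threads than hardware threads")
    if policy == "compact":
        # Fill every hardware thread of a core before moving to the next core.
        return [t // ht_per_core for t in range(n_threads)]
    if policy == "scatter":
        # Deal threads round-robin across the cores.
        return [t % n_cores for t in range(n_threads)]
    if policy == "balanced":
        # Spread threads evenly, keeping consecutive thread IDs on the same core.
        base, extra = divmod(n_threads, n_cores)
        placement = []
        for core in range(n_cores):
            placement.extend([core] * (base + (1 if core < extra else 0)))
        return placement
    raise ValueError("unknown policy: " + policy)


for policy in ("compact", "scatter", "balanced"):
    print(policy, place_threads(policy, n_threads=8, n_cores=4, ht_per_core=4))
```

For the 8-thread, 4-core example this reproduces the groupings in the table above: compact uses only two cores, scatter puts threads 0 and 4 together, and balanced puts threads 0 and 1 together. Granularity (fine versus core) then decides whether a thread is pinned to one hardware thread or may float among the hardware threads of its core.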
Results for CalcuNetW
[Figures: three result slides titled CalcuNetW]
Results for GammaMaps
[Figure: GammaMaps on the host. Elapsed time (s), axis from 0 to 1400, versus Nr. of threads, axis from 0 to 20. Series: Host, local-compact-core, local-compact-fine, local-scatter-fine, local-scatter-core]
GammaMaps
• Poor I/O performance on the Xeon Phi
Conclusions
• Using the MKL library is easy and requires no changes to the code.
• Simple pragmas in the code make it quick to start using the Xeon Phi.
• I/O performance is an issue on the Xeon Phi.
• 1 Xeon Phi ~ 1 Xeon E5-2680
• Improving performance further requires additional work.
Acknowledgements
The authors would like to thank Intel for providing access to the Intel Xeon Phi coprocessor.
Questions
The Team
Andrés Gómez
José Carlos Mouriño
Carmen Cotelo
Aurelio Rodríguez