J.J. Keijser, Nikhef Amsterdam, CT/PDP
Parallel Programming & Intel HPC Round Table visit
Jan Just Keijser, 23 April 2014
PPCES 2014
◦ Parallel Programming for Computational Engineering & Science
◦ 10-14 March
◦ RWTH Aachen

Intel EMEA HPC Roundtable
◦ 8-9 April
◦ Cambridge, UK
◦ I survived sitting in a British car with Tristan S. driving
Parallel Programming Course
1. Introduction, Parallel Computing Architectures, Serial Tuning.
2. Message Passing with MPI.
3. Shared Memory Programming with OpenMP.
4. OpenMP Programming for the MIC Architecture.
5. GPGPU Programming with OpenACC.
Day 1: serial tuning
Very instructive
Found a bug in the Intel C++ 2014 compiler: of the two code variants shown on the slide, one compiled well and the other ran 10 times slower.
[code samples from the slide not preserved in this transcript]
Day 2 & 3: MPI & OpenMP
MPI a bit boring, although very similar to OpenMP
OpenMP 4.0 looks promising:
◦ Annotate C/C++ or FORTRAN with #pragma's to help parallelisation
◦ New #pragma's “target” and “simd” to explicitly select an offload target and vectorisation
Portland Group (PG) compiler can compile OpenMP 4.0 code to multiple targets:
◦ CPU, Xeon Phi, CUDA (NVIDIA GPU's), OpenCL
Day 4: OpenMP on MIC
MIC = Intel Xeon Phi
First time that I have seen code that runs faster on a single Xeon Phi vs a dual Xeon E5 2697v2
Can use either Intel compiler or PG compiler
Again a bug discovered:
◦ Code that is offloaded to a Xeon Phi runs slower than code that is natively compiled for a Xeon Phi by a factor > 2
Day 5: OpenACC
Similar to OpenMP
Meant as an open standard to be able to annotate code to compile on GPU's
Thus far, only commercial compilers support this (SGI, Portland Group)
Intel HPC Round Table
NDA meeting, so Intel folks could talk freely
Latest news from Intel on CPUs, Xeon Phi's, storage, networking
Sessions on success stories with Intel stuff
The real reason to go is the ability to directly talk to Intel developers
◦ Talked to C++ compiler developer
◦ Talked to Xeon Phi developer
News on next-gen Xeon Phi
Also in socket format – replaces a regular Xeon
Will always use special memory for best performance, so caching data locally remains vital
Based on Atom cores instead of current Pentium-M cores: x86_64 compatible
GPUs are excellent for performing the Same operation (Instruction) on Multiple Data elements (SIMD)
Vector processing, anyone?
Remember this slide?
[slide image from “Massively Parallel Computing with CUDA” not preserved]
Available GPUs @ Nikhef
Arun: 2 x NVIDIA Tesla M2070
Pleedo: 2 x Intel Xeon Phi 5110P
Plofkip: 2 x NVIDIA Tesla M1060, 1 x NVIDIA GTX580, 1 x AMD Radeon HD7970 GHz
Nesse: Jos Vermaseren's Xeon Phi 7120P
Some engineering workstations now have AMD FirePro W5000's, which are actually quite nice
Which direction to take?
OpenMP 4.0 looks promising
NVIDIA vs Intel vs AMD?
◦ Most HPC centres use NVIDIA (Cartesius)
◦ However, look at raw performance: [comparison chart not preserved]