Guillimin HPC Users Meeting - December 2016
Guillimin HPC Users Meeting
December 15, 2016
McGill University / Calcul Québec / Compute Canada
Montréal, QC, Canada
• Please be kind to your fellow user meeting attendees
• Limit to two slices of pizza per person to start, please
• And please recycle your pop cans
• Thank you!
Outline
• Compute Canada News
• System Status
• Software Updates
• Training News
• Special Topic
    • Singularity as an alternative to Docker for HPC systems
Compute Canada News
• Cedar (GP2) and Graham (GP3) specifications:
    • https://docs.computecanada.ca/wiki/Migration2016:New_Systems
    • NDC: National Data Cyberinfrastructure (storage)
        • Cedar-Compute + Cedar-GPU + NDC-SFU
        • Graham-Compute + Graham-GPU + NDC-Waterloo
• 2017 Resource Allocation Competitions
    • RAC (RPP, Fast Tracks and RRG) reviews are under way
Storage and InfiniBand Status
• GPFS file system related downtimes:
    • Friday November 11 - fixed over the following week
    • Friday November 25 - fixed over the weekend
    • Monday November 28 - quick recovery on Nov. 29
    • Tuesday November 29 - recovery on Nov. 30 evening
    • Early December - intermittent slowness, with very quick recovery due to active and sustained monitoring
Storage and InfiniBand Status
• Contributing factors include:
    • Hardware issues: faulty InfiniBand network cables and switch modules
    • Ethernet core switch downtime
    • GPFS software: long waiters causing nodes to be expelled (temporarily losing access to GPFS), and well-intentioned monitoring scripts that put extra pressure on the system
• Fixes and remedies applied:
    • Reseated and replaced faulty network cables
    • Made the system more resilient: no more local DNS lookups via Ethernet, and fixed scripts, so that failures stay localized and do not spread to the whole system
• Guillimin's core elements are nearly 6 years old
Storage Status
• Space Management
    • /gs is nearly full: 97% used, 124 TB free (as of Dec. 15)
    • For better space management we continue to migrate cold data from disk to tape
        • Metadata remains on disk
        • Users can still access their files through the usual methods, but with increased latency
    • Storage space is a precious resource - manage it wisely!
        • Delete temporary files, compress large files that are not frequently accessed, tar many smaller files into collections, …
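The housekeeping tips above can be sketched as a short shell session. This is only an illustration under assumptions: the directory `storage_demo`, the file names and the `*.tmp` pattern are hypothetical placeholders, not paths from Guillimin.

```shell
#!/bin/sh
# Hypothetical demo of the three housekeeping tips, run in a throwaway directory.
set -eu
demo=storage_demo

# Create placeholder data: five small result files, one temp file, one larger file.
mkdir -p "$demo/results/run1" "$demo/tmp"
for i in 1 2 3 4 5; do
    echo "sample $i" > "$demo/results/run1/part_$i.txt"
done
echo "scratch" > "$demo/tmp/junk.tmp"
seq 1 10000 > "$demo/big_output.dat"

# 1. Delete temporary files (assumed here to be anything ending in .tmp).
find "$demo" -type f -name '*.tmp' -delete

# 2. Compress a large file that is rarely accessed (leaves big_output.dat.gz).
gzip -9f "$demo/big_output.dat"

# 3. Bundle many small files into one archive, check that the archive is
#    readable, and only then remove the originals.
tar -czf "$demo/run1.tar.gz" -C "$demo/results" run1
tar -tzf "$demo/run1.tar.gz" > /dev/null
rm -r "$demo/results/run1"

ls -l "$demo"
```

Listing the archive before deleting the originals is a deliberate choice: with `set -e`, the script stops before the `rm` if the archive cannot be read.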
New Software Installations
• Matlab R2016b (for users from McGill only)
• Matlab Distributed Computing Server (MDCS) R2016b
• Singularity/2.2
• Stacks 1.44 (Genomics)
• Intel Advisor/2017_update1 (analyzes vectorization and threading in code)
• LAMMPS/20161117 (Molecular Dynamics)
• OpenFOAM/2.4.0 (CFD)
• R-bundle-Bioconductor/3.3-R-3.3.1 (Bioinformatics)
• OligoArrayAux/3.3 (Genomics)
• Xerces-C++/3.1.4 (XML Parser)
Training News
• All upcoming events: calculquebec.eventbrite.ca
    • ---
• Recently completed:
    • Nov. 23 - Programmation en R intermédiaire [Intermediate R Programming] (U. Montreal)
    • Dec. 1 - Advanced and Parallel Python (McGill U.)
    • Dec. 5 - Easy GPU Programming with OpenACC (U. Montreal)
    • Dec. 6 - Introduction à la programmation en Python [Introduction to Programming in Python] (U. Sherb.)
• All materials from previous workshops are available online: wiki.calculquebec.ca/w/Formations/en
• All user meeting presentations are online at www.hpc.mcgill.ca
Other News
• Support Level Activity during the Holiday Period
    • December 23 to January 2 inclusive - returning January 3
    • Reduced level of access to general user support
    • All systems and services remain available and will be closely monitored
    • Priority and critical issues will be addressed
Happy Holidays! Joyeuses Fêtes!
User Feedback and Discussion
• Questions? Comments?
• We value your feedback. Contact us at:
• Guillimin Operational News for Users
    – Status Pages
        • http://www.hpc.mcgill.ca/index.php/guillimin-status
        • http://serveurscq.computecanada.ca (all CQ systems)
    – Follow us on Twitter
        • http://twitter.com/McGillHPC
Singularity as an alternative to Docker for HPC systems
December 15, 2016
McGill University / Calcul Québec / Compute Canada
Montréal, QC, Canada
Outline
• Overview of virtual machines and containers
• What is Singularity?
    • Container solution
    • Project lead: Gregory M. Kurtzer, LBNL
        • (figures in this slide deck are his)
• Why Singularity?
    • Mobility of Compute
    • Reproducibility
    • User Freedom
    • Supports traditional HPC
    • Able to run a newer software stack (that is, a whole Linux distribution except the kernel itself) on an older OS with minimal effort - or vice versa
Emulators, Virtual Machines, Containers
• Emulators
    • The whole machine, including the CPU, is emulated.
    • Examples: Bochs, QEMU (without KVM), OpenMSX
• Virtual Machines
    • Most of the machine is emulated, but CPU code runs mostly natively. There is a guest OS kernel.
    • Examples: VirtualBox, QEMU with KVM, VMware
• Containers
    • Code in a container interfaces directly with the host OS kernel. There is no guest OS kernel.
    • Examples: Docker, Singularity
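One way to see the "no guest OS kernel" point in practice is to compare the kernel release reported on the host with the one reported inside a container. The sketch below assumes an image named `centos7-ompi.img` in the current directory (the name used later in this deck); when Singularity or the image is not available it only prints the host kernel and skips the comparison.

```shell
#!/bin/sh
# Sketch: a container shares the host kernel, so `uname -r` should match.
# IMAGE is an assumed path; the check is skipped when Singularity or the image is absent.
IMAGE=${IMAGE:-centos7-ompi.img}

host_kernel=$(uname -r)
echo "host kernel:      $host_kernel"

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # Run uname inside the container; it talks to the same kernel as the host.
    container_kernel=$(singularity exec "$IMAGE" uname -r)
    echo "container kernel: $container_kernel"
    if [ "$host_kernel" = "$container_kernel" ]; then
        echo "same kernel: no guest OS kernel is running"
    fi
else
    echo "singularity or $IMAGE not found; skipping the container check"
fi
```

A virtual machine would typically report its own guest kernel release here, which is the distinction the slide draws.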
Virtual Machines
Docker-style Containers
Docker vs Singularity
Docker vs. Singularity
• Docker
    • Runs with a daemon that orchestrates everything
    • Primary use case: network service virtualization
        • More lightweight than VMs
    • But… Docker tries to emulate VMs in many respects:
        • Network isolation and other hardware isolation (using kernel namespaces and cgroups)
        • Virtualized but still dangerous “root” account inside the container
    • “udocker” on Guillimin removes “root” but is still fairly isolated and relatively heavyweight.
• Singularity
    • No daemon, only a launcher; the container runs as normal user-owned processes.
    • Only the file system namespace is virtualized, plus optionally (not by default) the PID namespace; no cgroups
    • So containers see the host network, InfiniBand, GPUs, etc.
Singularity workflow
Singularity workflow
On a system with root access (Linux laptop, Linux VM on Windows/Mac):
    sudo singularity create --size 1024 centos7-ompi.img
Followed by bootstrapping, for example:
    sudo singularity bootstrap centos7-ompi.img centos7-ompi.def
Or importing, for example:
    sudo singularity import tensorflow.img \
        docker://tensorflow/tensorflow:latest
See if it works:
    singularity shell centos7-ompi.img
    ls
    exit
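The slides do not show the contents of centos7-ompi.def. As a rough sketch only, a Singularity 2.x definition file for a CentOS 7 image might look like the one written below. The mirror URL and the package list are assumptions, and the sketch does not cover how /usr/bin/mpi-ring (used in the later slides) gets into the image.

```shell
#!/bin/sh
# Write a hypothetical centos7-ompi.def; the original file is not shown in the slides.
cat > centos7-ompi.def <<'EOF'
BootStrap: yum
OSVersion: 7
MirrorURL: http://mirror.centos.org/centos-%{OSVERSION}/%{OSVERSION}/os/$basearch/
Include: yum

%post
    # Assumed package set: OpenMPI from the CentOS repositories.
    yum -y install openmpi openmpi-devel

%runscript
    # Pass any arguments given to "singularity run" straight through.
    exec "$@"
EOF
echo "wrote centos7-ompi.def"
```

The header keys select the base distribution and the repository to bootstrap from, `%post` runs once inside the new image during bootstrap, and `%runscript` defines what `singularity run` executes.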
Singularity workflow
Next we copy the container to Guillimin:
    scp centos7-ompi.img [email protected]:
And log in to Guillimin:
    ssh [email protected]
Load the Singularity module:
    module load Singularity/2.2
Run a shell; $SCRATCH or other folders can be bound with -B:
    singularity shell centos7-ompi.img
    singularity shell -B $SCRATCH centos7-ompi.img
mpirun inside/outside the container:
    singularity exec centos7-ompi.img mpirun -n 2 /usr/bin/mpi-ring
    module load iomkl/2015b
    mpirun -n 2 singularity exec centos7-ompi.img /usr/bin/mpi-ring
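The difference between the two mpirun modes is easy to lose in the command lines, so here is a small dry-run sketch that only prints the command for each mode. The helper name `launch_cmd` and the `IMAGE`, `PROG` and `NP` variables are hypothetical; nothing is executed. In the "inside" mode the container's own mpirun starts the ranks within one container, while in the "outside" mode the host's mpirun (from the iomkl module) starts one container process per rank.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the launch command for each mode.
IMAGE=${IMAGE:-centos7-ompi.img}
PROG=${PROG:-/usr/bin/mpi-ring}
NP=${NP:-2}

launch_cmd() {
    case "$1" in
        # mpirun from inside the container starts all ranks in one container.
        inside)  echo "singularity exec $IMAGE mpirun -n $NP $PROG" ;;
        # mpirun from the host starts one container process per rank.
        outside) echo "mpirun -n $NP singularity exec $IMAGE $PROG" ;;
        *) echo "usage: launch_cmd inside|outside" >&2; return 1 ;;
    esac
}

launch_cmd inside
launch_cmd outside
```

In both modes the image path comes directly after `singularity exec`, followed by the command to run inside the container.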
OpenMPI with Singularity processes
Singularity workflow
Next we copy the container to Guillimin:
    scp centos7-ompi.img [email protected]:
And log in to Guillimin:
    ssh [email protected]
Load the Singularity module:
    module load Singularity/2.2
Run a shell; $SCRATCH or other folders can be bound with -B:
    singularity shell centos7-ompi.img
    singularity shell -B $SCRATCH centos7-ompi.img
mpirun inside/outside the container:
    singularity exec centos7-ompi.img mpirun -n 2 /usr/bin/mpi-ring
    module load iomkl/2015b
    mpirun -n 2 singularity exec centos7-ompi.img /usr/bin/mpi-ring
Singularity use within Guillimin
Early adopter, asked to install:
    NIAK, SIMEXP lab (Dr. Pierre Bellec, Pierre-Olivier Quirion)
    http://niak.simexp-lab.org/niak_installation.html