Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Initial Experiences with Deploying Singularity on a Cray XC Supercomputer
Andrew J. Younge, Kevin Pedretti {ajyoung, ktpedre}@sandia.gov
Outline
§ Overview of Containers
§ Containers in HPC
§ Why Singularity?
§ HPC Containers @ Sandia
  § Trilinos & ATDM apps
  § HPCG
§ Dev-ops Mechanisms
§ Initial Benchmarking
§ Conclusion
What are Containers?
§ “An object that can be used to hold or transport something.”
§ A way to package the components needed to run applications
  § Libraries, software, files, environment settings, etc.
§ OS-level virtualization
  § Relies on one OS kernel, aka “chroot on steroids”
  § cgroups for resource isolation, namespaces for process isolation, chroot for filesystem isolation
§ Different from host virtualization
  § A single OS kernel does all the hard work
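The process-isolation half of this picture is visible from user space. On Linux, each namespace a process belongs to appears as a symbolic link under `/proc/<pid>/ns` whose target looks like `mnt:[4026531840]`, and two processes share a namespace exactly when those inode numbers match. The sketch below reads those links. The helper names are our own, and the code is Linux-only. It is not part of any container runtime.

```python
import os
from typing import Dict, Set, Tuple


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target such as 'mnt:[4026531840]' into ('mnt', 4026531840)."""
    ns_type, sep, rest = target.partition(":")
    if not sep or not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return ns_type, int(rest[1:-1])


def namespaces(pid: str = "self") -> Dict[str, int]:
    """Map each entry of /proc/<pid>/ns (e.g. 'mnt', 'pid', 'user') to its namespace inode."""
    base = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(base)):
        _, inode = parse_ns_link(os.readlink(os.path.join(base, entry)))
        result[entry] = inode
    return result


def shared_namespaces(pid_a: str, pid_b: str) -> Set[str]:
    """Return the namespace kinds two processes have in common (same inode)."""
    a, b = namespaces(pid_a), namespaces(pid_b)
    return {kind for kind, inode in a.items() if b.get(kind) == inode}
```

For example, `shared_namespaces("self", "1")` run inside a container that has its own mount namespace will not include `'mnt'`. Reading another user's `/proc/<pid>/ns` entries may require elevated privileges.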
Containers in Industry
§ Containers are used to create large-scale, loosely coupled services
  § Each container runs one user process (“micro-services”)
  § e.g., 3 httpd containers, 2 DBs, 1 logger, etc.
§ Scaling is achieved through load balancers and provisioning
§ Many containers are packed onto hosts for increased system utilization
§ Helps with dev-ops issues
  § Same software environment for developing and deploying
  § Only image changes are pushed to production, not a whole new image (CoW)
  § Develop on a laptop, push to production servers
  § Interact with GitHub, similar to developer code bases
  § Upload images to a “hub” or “repository” from which they can be pulled and provisioned
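The “only image changes are pushed” point follows from how Docker-style registries store images. Layers are addressed by a SHA-256 digest of their content, so a client only needs to upload layers the registry does not already hold. The toy model below illustrates this. Byte strings stand in for layer tarballs, and the function names are hypothetical. It is not the Docker client's actual upload logic.

```python
import hashlib
from typing import Iterable, List, Set


def layer_digest(layer: bytes) -> str:
    """Content address of a layer: 'sha256:' followed by the hex SHA-256 of its bytes."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


def layers_to_push(local_layers: Iterable[bytes], registry_digests: Set[str]) -> List[str]:
    """Digests of local layers the registry does not already hold, in image order."""
    missing: List[str] = []
    for layer in local_layers:
        digest = layer_digest(layer)
        if digest not in registry_digests and digest not in missing:
            missing.append(digest)
    return missing


if __name__ == "__main__":
    base, v1, v2 = b"os base layer", b"app build 1", b"app build 2"
    registry = {layer_digest(base), layer_digest(v1)}  # what production already has
    print(layers_to_push([base, v2], registry) == [layer_digest(v2)])  # True: only the new layer
```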
Container features wanted in HPC
§ Developers prescribe the running software environment
  § “Bring-your-own-environment”
  § Not bound by vendor software delivery
  § Not bound by sysadmin support for additional libraries
  § Developers know best how to run their applications; users can just specify the container
§ Easy definition of application compilation & runtime setup
  § Integration with GitHub or other dev environments
  § Could enable better portability between architectures
Container features not wanted in HPC
§ Overhead: containers cannot slow down advanced-architecture supercomputers beyond reason
  § Posit: <5% may be OK; anything more is a big problem
§ Micro-services support and on-node resource partitioning
  § No need for cgroups to slice up individual compute nodes
  § HPC runs real applications, not services
§ Running as root!
§ Networking aspects can be left out
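To make the “<5%” posit concrete, overhead can be expressed as a fractional slowdown relative to a native run. The sketch below assumes two conventions. For throughput metrics such as HPCG's GFLOP/s, higher is better. For time-to-solution, lower is better. The function names and the 5% default budget are illustrative. They are not a measurement methodology from this work.

```python
def overhead(native: float, container: float, higher_is_better: bool = True) -> float:
    """Fractional slowdown of a containerized run relative to native.

    higher_is_better=True for throughput metrics (e.g. GFLOP/s),
    False for time-to-solution metrics (e.g. wall-clock seconds).
    A negative result means the container run was faster.
    """
    if native <= 0 or container <= 0:
        raise ValueError("measurements must be positive")
    if higher_is_better:
        return 1.0 - container / native
    return container / native - 1.0


def within_budget(native: float, container: float, budget: float = 0.05,
                  higher_is_better: bool = True) -> bool:
    """True if the overhead is at most `budget` (5% per the posit above)."""
    return overhead(native, container, higher_is_better) <= budget
```

Keeping the metric direction explicit avoids a common mistake: mixing a runtime ratio with a throughput ratio when comparing container and native runs.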
Container Vision @ Sandia
§ Support software development and testing on laptops, producing working builds that can run on HPC machines
  § May also leverage VM/binary translation
§ Let developers specify how to build the environment AND the application
  § Users just import the container and run it on the target platform
  § Many containers, but with different code “branches” for architectures, compilers, etc.
  § Not bound to vendor and sysadmin versions & release cycles
§ Want all the performance
§ Want to manage permutations of architectures and compilers
  § x86 & KNL, ARM, POWER9, etc.
  § Intel, GCC, LLVM
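Managing these permutations amounts to enumerating a build matrix and dropping combinations that make no sense. The sketch below is illustrative only. The `<arch>-<compiler>` tag format is hypothetical, and so is the example exclusion rule, which skips Intel compilers on non-x86 targets.

```python
from itertools import product
from typing import Callable, List, Sequence

ARCHS = ["x86_64", "knl", "arm64", "power9"]
COMPILERS = ["intel", "gcc", "llvm"]


def build_matrix(archs: Sequence[str], compilers: Sequence[str],
                 exclude: Callable[[str, str], bool] = lambda arch, cc: False) -> List[str]:
    """One hypothetical image tag per (architecture, compiler) pair that is not excluded."""
    return [f"{arch}-{cc}" for arch, cc in product(archs, compilers) if not exclude(arch, cc)]


def intel_only_on_x86(arch: str, cc: str) -> bool:
    """Example rule: skip Intel compilers on non-x86 targets."""
    return cc == "intel" and arch not in ("x86_64", "knl")


if __name__ == "__main__":
    for tag in build_matrix(ARCHS, COMPILERS, intel_only_on_x86):
        print(tag)
```

Each resulting tag could map to a code “branch” or a separate image in the registry. This keeps the permutations explicit rather than hand-maintained.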
Why Singularity?
§ Singularity is a simple container solution created by LBNL
§ Based on singleton container images
  § Not layered AUFS images or dev-mapper insanity
  § Image sharing & management made easy
§ Provides user namespaces
  § User ajyounge on the HPC system maps to ajyounge in the container
  § Running as root on HPC resources is not allowed!
§ Site filesystems can also be mounted
  § Bring in MPI libs or tuned libraries, etc.
§ Integration with existing scheduling systems
  § Make binaries available on compute nodes
§ No vendor lock-in: we want a portable HPC container solution
§ Supported in OpenHPC as of the 1.3.1 release (last week)
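The identity mapping described above (ajyounge on the host is ajyounge in the container) can be written in the Linux `uid_map` format. In that format, each line of `/proc/<pid>/uid_map` reads `ID-inside-ns ID-outside-ns length`. The sketch below translates host UIDs through such a map. It illustrates the kernel mechanism only and is not Singularity's own code. UID 12345 and the helper names are hypothetical. With a single identity entry, host root (UID 0) is simply unmapped.

```python
from typing import List, Optional, Tuple

UidRange = Tuple[int, int, int]  # (ID inside namespace, ID outside namespace, length)


def parse_uid_map(text: str) -> List[UidRange]:
    """Parse /proc/<pid>/uid_map content: one 'inside outside length' triple per line."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def host_to_container_uid(uid_map: List[UidRange], host_uid: int) -> Optional[int]:
    """Translate a host UID to the UID seen inside the namespace, or None if unmapped."""
    for inside, outside, length in uid_map:
        if outside <= host_uid < outside + length:
            return inside + (host_uid - outside)
    return None
```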
Singularity on Cray XC-series
§ Crays are special machines
  § Cray CNL is a read-only image with tmpfs mounts
  § Lustre or NFS over Cray DVS
  § Specialized Linux kernel without the standard feature sets
§ Had to modify CNL to build in the necessary kernel features
  § XC30 runs the 3.0.101 kernel (old)
  § Rebuild the Cray image with built-in features
    § Loopback device support and EXT3
§ Provision the new CNL to interactive nodes and compute nodes
§ Similar to the KVM on Cray effort (related work)
  § “Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters”
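Before provisioning, it helps to confirm that a rebuilt node image actually exposes the two features Singularity needed here: EXT3 and loopback devices. Below is a minimal sketch that parses `/proc/filesystems` and looks for `/dev/loop*`. The helper names are hypothetical, and this is not the procedure used on the XC30. Two caveats apply. `/proc/filesystems` lists only currently registered filesystems, so a module that has not been loaded yet would be missed. Loop device nodes may also not exist until the loop driver creates them.

```python
import glob
from typing import Iterable, List, Set


def supported_filesystems(proc_filesystems: str) -> Set[str]:
    """Parse /proc/filesystems: each line is an optional 'nodev' flag followed by a name."""
    return {line.split()[-1] for line in proc_filesystems.splitlines() if line.split()}


def missing_features(proc_filesystems: str, loop_devices: Iterable[str],
                     required_fs: Iterable[str] = ("ext3",)) -> List[str]:
    """List the kernel features (filesystems, plus 'loop') that this node appears to lack."""
    available = supported_filesystems(proc_filesystems)
    missing = [fs for fs in required_fs if fs not in available]
    if not list(loop_devices):
        missing.append("loop")
    return missing


def check_this_node() -> List[str]:
    """Run the check against the live kernel (Linux only); an empty list means OK."""
    with open("/proc/filesystems") as f:
        return missing_features(f.read(), glob.glob("/dev/loop[0-9]*"))
```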
Container Build #1: Trilinos MueLu
§ Trilinos provides math library packages for many applications of interest @ Sandia
§ Trilinos itself depends on numerous third-party libraries (TPLs)
§ Complex compilation steps can be condensed into a simple Dockerfile
§ Predictable & stable environment across deployments
§ Enables testing across multiple architectures and validation of TPL changes
```dockerfile
FROM ajyounge/dev-tpl
WORKDIR /opt/trilinos

# Copy the configure script into the image
COPY do-configure /opt/trilinos/

# Download, extract, and rename the Trilinos source in one layer,
# so that deleting the tarball actually keeps it out of the image
RUN wget -nv https://trilinos.org/oldsite/download/files/trilinos-12.8.1-Source.tar.gz -O /opt/trilinos/trilinos.tar.gz && \
    tar xf /opt/trilinos/trilinos.tar.gz -C /opt/trilinos/ && \
    rm -f /opt/trilinos/trilinos.tar.gz && \
    mv /opt/trilinos/trilinos-12.8.1-Source /opt/trilinos/trilinos && \
    mkdir /opt/trilinos/trilinos-build

# Load MPI and compile Trilinos in a single RUN: environment changes made by
# "module load" do not carry over to later RUN instructions.
# (Assumes the base image sets up environment modules for bash login shells.)
RUN /bin/bash -lc "module load mpi && /opt/trilinos/do-configure && cd /opt/trilinos/trilinos-build && make -j 3"

# Link in a tutorial directory, and then set the workdir
RUN ln -s /opt/trilinos/trilinos-build/packages/muelu/doc/Tutorial/src /opt/muelu-tutorial
WORKDIR /opt/muelu-tutorial
CMD ["/bin/bash"]
```
Container Build #2: HPCG
§ Straightforward container build for HPCG
§ Use a CentOS 7 image, install basic software
§ Install Intel Parallel Studio 2017
  § Silent configuration
  § Pull in site license or use trial
  § Clean up install files (>8 GB)
§ Extract and build HPCG 3.0 with CXX = mpiicpc
```dockerfile
FROM centos:7.2.1511
ARG intel_file=parallel_studio_xe_2017_update2

# Dependencies and MPICH
RUN yum update -y && yum groupinstall -y "Development Tools"
RUN yum install -y mpich-3.2 mpich-3.2-devel redhat-lsb

# Intel compiler install with site license and silent configuration
RUN mkdir -p /opt/intel/licenses
COPY USE_SERVER.lic /opt/intel/licenses/
COPY $intel_file.tgz /
COPY silent.cfg /silent.cfg
# Extract, install, and delete the installer in one layer. The tarball itself
# still lives in the COPY layer above; a multi-stage build would be needed to drop it.
RUN tar xvfz /$intel_file.tgz && \
    /$intel_file/install.sh --silent /silent.cfg && \
    rm -rf /$intel_file /$intel_file.tgz /silent.cfg
RUN echo "source /opt/intel/bin/compilervars.sh intel64" >> /etc/bashrc

# Build HPCG
COPY hpcg-3.0.tar.gz /opt/
RUN tar xvfz /opt/hpcg-3.0.tar.gz -C /opt/
COPY Make.Linux_intel_mpich /opt/hpcg-3.0/setup/
RUN mkdir -p /opt/hpcg-3.0/Linux_intel_mpich/
WORKDIR /opt/hpcg-3.0/Linux_intel_mpich
RUN ../configure Linux_intel_mpich
RUN /bin/bash -c "source /opt/intel/bin/compilervars.sh intel64 && make"
CMD ["/bin/bash"]
```
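The cleanup of the >8 GB installer only pays off if it happens inside the same layer that created the files. Image layers are stacked. Deleting a file in a later layer only hides it through a “whiteout” entry, and the bytes remain stored in the earlier layer. The simplified model below shows the effect. The layer contents and sizes (think GB) are hypothetical, and real union filesystems track more metadata than this.

```python
from typing import Dict, List, Set, Tuple

# A layer is modeled as (files added: path -> size, paths deleted via whiteout).
Layer = Tuple[Dict[str, int], Set[str]]


def image_sizes(layers: List[Layer]) -> Tuple[int, int]:
    """Return (size stored across all layers, size visible in the final filesystem)."""
    stored = sum(sum(added.values()) for added, _ in layers)
    visible: Dict[str, int] = {}
    for added, deleted in layers:
        for path in deleted:
            visible.pop(path, None)  # a whiteout hides the file but does not shrink earlier layers
        visible.update(added)
    return stored, sum(visible.values())


if __name__ == "__main__":
    # Installer copied in, extracted in one RUN, deleted in another RUN.
    separate = [({"/ps.tgz": 3}, set()), ({"/ps/": 8}, set()), ({}, {"/ps/", "/ps.tgz"})]
    # Extract, install, and delete inside one RUN: only the COPY layer is stored.
    combined = [({"/ps.tgz": 3}, set()), ({}, {"/ps.tgz"})]
    print(image_sizes(separate), image_sizes(combined))  # (11, 0) (3, 0)
```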
Dev-ops Pathway
[Diagram: GitLab Container Registry, Singularity Server, Cray Login Server, Cray CNL, Lustre/NFS]
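```shell
# Laptop: build the image and push it to the GitLab container registry
lap$ docker login gitlab.sandia.gov
lap$ docker build .
lap$ docker tag 0e5574283393 gitlab.sandia.gov/ajyounge/hpcg-container:latest
lap$ docker push gitlab.sandia.gov/ajyounge/hpcg-container:latest

# Singularity server: create an image file and import the container into it
ss$ sudo singularity create -s 12G hpcg-container.img
ss$ sudo singularity import hpcg-container.img docker://gitlab.sandia.gov/ajyounge/hpcg-container:latest

# Cray login node: copy the image over and launch 24 ranks on nodes 62 and 63
cray$ scp ss:~/hpcg-container.img .
cray$ aprun -n 24 -L 62,63 singularity exec hpcg-container.img ./xhpcg
```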
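The image pushed from the laptop and the `docker://` URI imported into Singularity must name the same registry and repository. A Docker reference without a registry hostname is resolved against Docker Hub. The sketch below shows how such a reference splits into registry, repository, and tag, using Docker's hostname heuristic: the first path component is a registry if it contains `.` or `:`, or is `localhost`. The helper is hypothetical and simplified. It ignores digests and Docker Hub's implicit `library/` prefix.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_docker_uri(uri: str) -> ImageRef:
    """Split 'docker://host/namespace/repo:tag' into (registry, repository, tag)."""
    ref = uri[len("docker://"):] if uri.startswith("docker://") else uri
    first, _, remainder = ref.partition("/")
    if remainder and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, remainder
    else:
        registry, path = "docker.io", ref  # no hostname: Docker Hub is assumed
    name, sep, tag = path.rpartition(":")
    if not sep or "/" in tag:  # no tag given
        name, tag = path, "latest"
    return ImageRef(registry, name, tag)
```

For example, `parse_docker_uri("docker://gitlab.sandia.gov/ajyounge/hpcg-container:latest")` yields the GitLab host, `ajyounge/hpcg-container`, and `latest`. A plain `ajyounge/hpcg-container` resolves to `docker.io` instead.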
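Dev-ops Pathway (new)
[Diagram: GitLab Container Registry, Cray Login Server, Cray CNL, Lustre/NFS]

```shell
# Laptop: build the image and push it to the GitLab container registry
lap$ docker login gitlab.sandia.gov
lap$ docker build .
lap$ docker tag 0e5574283393 gitlab.sandia.gov/ajyounge/hpcg-container:latest
lap$ docker push gitlab.sandia.gov/ajyounge/hpcg-container:latest

# Cray login node: pull straight from the registry (no separate Singularity server)
cray$ singularity pull --name hpcg-container.img docker://gitlab.sandia.gov/ajyounge/hpcg-container:latest
cray$ aprun -n 24 -L 62,63 singularity exec hpcg-container.img ./xhpcg
```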
Singularity + Cray Interconnect
§ Using the Volta testbed: Cray XC30, Ivy Bridge
§ Containers using TCP/IP need no changes
  § Use Cray's IP-over-Aries Ethernet device (ipogif0)
  § Better than 10 GbE performance (~32 Gb/s)
§ Intel MPI is not optimized for the Aries network
§ Bring Cray's MPI implementation into the container
  § Mount /opt/cray
  § Mount /var/opt/cray
  § Set LD_LIBRARY_PATH accordingly in the container
```shell
cray$ aprun -n 24 -L 63 singularity exec hpcg-container.img /bin/bash -c "\
export LD_LIBRARY_PATH=\
/opt/cray/ugni/6.0-1.0502.10863.8.29.ari/lib64:\
/opt/cray/xpmem/0.1-2.0502.64982.5.3.ari/lib64:\
/opt/cray/pmi/5.0.11/lib64:\
/opt/cray/udreg/2.3.2-1.0502.10518.2.17.ari/lib64:\
/opt/cray/mpt/7.5.1/gni/mpich-intel-abi/16.0/lib:\
/opt/cray/alps/5.2.4-2.0502.9822.32.1.ari/lib64:\
/opt/cray/wlm_detect/1.0-1.0502.64649.2.1.ari/lib64:\
/opt/intel/lib/intel64:\
$LD_LIBRARY_PATH && \
/opt/hpcg-3.0/Linux_intel_mpich/bin/xhpcg"
```
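Maintaining that colon-separated list by hand is error-prone, because the version strings are specific to Volta's software levels and change with every Cray update. A small helper could generate the value instead, for example from a wrapper script that emits the `export` line. The sketch below is hypothetical. The directory list is copied from the command above, and `must_exist` optionally drops directories missing on the current node. Empty entries are removed because the dynamic loader treats an empty `LD_LIBRARY_PATH` entry as the current directory.

```python
import os
from typing import Iterable, Optional

# Site-specific directories copied from the aprun example above; versions will differ elsewhere.
CRAY_LIB_DIRS = [
    "/opt/cray/ugni/6.0-1.0502.10863.8.29.ari/lib64",
    "/opt/cray/xpmem/0.1-2.0502.64982.5.3.ari/lib64",
    "/opt/cray/pmi/5.0.11/lib64",
    "/opt/cray/udreg/2.3.2-1.0502.10518.2.17.ari/lib64",
    "/opt/cray/mpt/7.5.1/gni/mpich-intel-abi/16.0/lib",
    "/opt/cray/alps/5.2.4-2.0502.9822.32.1.ari/lib64",
    "/opt/cray/wlm_detect/1.0-1.0502.64649.2.1.ari/lib64",
    "/opt/intel/lib/intel64",
]


def prepend_library_path(dirs: Iterable[str], existing: Optional[str] = None,
                         must_exist: bool = False) -> str:
    """Build an LD_LIBRARY_PATH with `dirs` first, then the existing entries.

    Duplicates and empty entries are dropped; an empty entry would
    otherwise make the loader search the current directory.
    """
    if existing is None:
        existing = os.environ.get("LD_LIBRARY_PATH", "")
    seen, parts = set(), []
    for d in list(dirs) + existing.split(":"):
        if not d or d in seen or (must_exist and not os.path.isdir(d)):
            continue
        seen.add(d)
        parts.append(d)
    return ":".join(parts)
```

Inside the container, `print("export LD_LIBRARY_PATH=" + prepend_library_path(CRAY_LIB_DIRS, must_exist=True))` would produce a line equivalent to the one in the `aprun` command, minus any directories absent on that node.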
HPCG Efficiency
HPCG Performance Summary
§ Singularity delivers near-native runtime performance
  § KVM is also good, but has a little more overhead (likely due to Intel MPI)
§ Scaling results TBD, but we expect the same
  § KVM scales at 90% of native at 786 cores; we expect Singularity to do better
§ Using Cray MPI & the Aries interconnect is key to getting near-native performance
  § Staying ABI-compatible for MPI is mandatory
§ Image deployment is the most likely source of overhead
  § Scalability of mounting loopback images on Lustre/NFS?
  § Read-only helps, but may not solve all problems
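Because ABI compatibility is what lets the containerized `xhpcg` pick up Cray's MPI at run time, one quick check is to run `ldd` on the binary inside the container, with `LD_LIBRARY_PATH` set as above. You can then confirm where the MPI library resolves. The sketch below parses that output. Its names and defaults are assumptions: it expects the soname `libmpi.so.12` (the one used by MPICH-ABI-compatible libraries) and the `/opt/cray/` prefix from the mounts described earlier.

```python
from typing import Dict, Optional


def parse_ldd(output: str) -> Dict[str, Optional[str]]:
    """Map each shared-library name in `ldd` output to its resolved path (None if not found)."""
    resolved: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vDSO or the dynamic loader itself
        name, _, target = line.partition("=>")
        target = target.strip()
        if not target or target.startswith("not found") or target.startswith("("):
            resolved[name.strip()] = None
        else:
            resolved[name.strip()] = target.split()[0]
    return resolved


def mpi_from(output: str, soname: str = "libmpi.so.12", prefix: str = "/opt/cray/") -> bool:
    """True if `soname` resolves to a library under `prefix` (e.g. Cray MPT's ABI build)."""
    path = parse_ldd(output).get(soname)
    return path is not None and path.startswith(prefix)
```

A typical invocation would be `singularity exec hpcg-container.img ldd /opt/hpcg-3.0/Linux_intel_mpich/bin/xhpcg`, with its output fed to `mpi_from`.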
Conclusion
§ Singularity works on Cray XC-series supercomputers
  § Modifications to CNL are necessary
  § Performance is near-native
§ Additional features are needed for clean deployment
  § Site-specific ENV variables
  § OverlayFS
§ Performance is near-native with HPCG
  § Not surprising
  § Uses Cray MPI and ABI compatibility
§ Singularity is ideal for HPC interoperability
Future Considerations
§ Container storage at scale
§ How to use other tuned libraries and site-specific software
  § ABI compatibility?
§ Can the HPC community agree on container interoperability?
  § Image formats, manifests, etc.
§ Multi-architecture support
  § Vendor support for laptop development?
Thanks!
Backup Slides
Intra-node communication: Cray MPI vs. Intel MPI (KVM)
XC30 HPCG KVM Scaling