HPC 101
HEAnet National Conference 2016
Paddy Doyle
Senior Sysadmin – Research IT / TCHPC (IT Services)
Date 2016-11-03
Trinity College Dublin, The University of Dublin
Brief Overview
Big picture
Motivation for HPC
What that means for software
What that means for hardware
Typical day as an HPC sysadmin
Big Picture
Large industry
– Circa $10 billion annual spend
Major vendors
– HP, IBM, Dell, SGI, Fujitsu, Intel
Largest HPC systems:
– Over 10,000,000 CPU cores
– Many 10,000s of nodes
– 100s of cabinets
– 15MW of power!
High Performance Computing in numbers
Measuring Performance: Top500.org
High Performance LINPACK benchmark
– Dense linear algebra
FLOPS: FLoating-point Operations Per Second
List of most powerful machines
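A rough sketch of how a LINPACK-style FLOPS figure is derived: solve a dense linear system, time it, and divide an operation count by the elapsed time. This is not the HPL code itself. It assumes only the leading-order (2/3)n^3 operation count for Gaussian elimination, and the function names and the test matrix are illustrative.

```python
import time


def lu_flop_count(n):
    """Leading-order operation count for solving a dense n x n system.

    Assumption: only the (2/3) n^3 term is counted; lower-order terms are ignored.
    """
    return (2.0 / 3.0) * n ** 3


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure_flops(n=60):
    """Time one dense solve and return the estimated FLOPS."""
    # Illustrative, diagonally dominant matrix so the solve is well behaved.
    a = [[float(n) if i == j else 1.0 / (1 + i + j) for j in range(n)]
         for i in range(n)]
    b = [1.0] * n
    start = time.perf_counter()
    solve_dense(a, b)
    # Guard against a zero reading from a very coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return lu_flop_count(n) / elapsed


if __name__ == "__main__":
    print(f"Estimated: {measure_flops():.3e} FLOPS (pure Python, single core)")
```

Pure Python will report a figure far below what an optimised BLAS achieves on the same hardware; the point is the method, not the number.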
Machine                  Performance   FLOPS
Typical PC               100 GFLOPS    100,000,000,000
Sunway TaihuLight (#1)   93 PFLOPS     93,000,000,000,000,000
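To put the two figures side by side, a small calculation using only the numbers quoted above. It assumes both machines sustain their quoted rates, which real workloads rarely do.

```python
# Figures taken from the slide, as exact integers.
PC_FLOPS = 100 * 10**9           # typical PC: 100 GFLOPS
TAIHULIGHT_FLOPS = 93 * 10**15   # Sunway TaihuLight: 93 PFLOPS


def speed_ratio(fast_flops, slow_flops):
    """How many times faster the first machine is than the second."""
    return fast_flops / slow_flops


def days_to_match_one_second(fast_flops, slow_flops):
    """Days the slower machine needs to do what the faster one does in 1 second."""
    seconds = fast_flops / slow_flops
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    print(f"Ratio: {speed_ratio(TAIHULIGHT_FLOPS, PC_FLOPS):,.0f}x")
    print(f"PC time for one TaihuLight-second: "
          f"{days_to_match_one_second(TAIHULIGHT_FLOPS, PC_FLOPS):.1f} days")
```

On these numbers the ratio is 930,000, so one second of work on the #1 system corresponds to roughly 10.8 days on the PC.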
Top 500 Performance Development
Currently Peta-scale; when will we reach Exa-scale?
Motivation for HPC
Bigger:
– memory-bound problems
Faster:
– CPU-bound problems
“HPC is the art of getting bigger things done faster” – D. Frost
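A small illustration of the "bigger" case: the memory needed just to hold a dense matrix of double-precision numbers grows with the square of its dimension. The 64 GiB per-node figure below is a hypothetical value chosen for the example, not a number from the talk.

```python
def dense_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to store an n x n matrix of 8-byte (double-precision) values."""
    return n * n * bytes_per_element


def nodes_needed(n, ram_per_node_bytes):
    """Minimum number of nodes whose combined RAM can hold the matrix.

    Ignores the operating system, buffers and any working copies.
    """
    total = dense_matrix_bytes(n)
    return -(-total // ram_per_node_bytes)  # ceiling division


if __name__ == "__main__":
    n = 100_000
    node_ram = 64 * 2**30  # hypothetical node with 64 GiB of RAM
    print(f"{n} x {n} doubles: {dense_matrix_bytes(n) / 1e9:.0f} GB")
    print(f"Nodes needed at 64 GiB each: {nodes_needed(n, node_ram)}")
```

A 100,000 x 100,000 matrix already needs 80 GB, more than a typical desktop holds, which is what pushes a memory-bound problem onto several nodes.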
What that means for software
Parallel languages and libraries
– MPI, OpenMP, CUDA, OpenCL, PGAS
– BLAS, MKL, ATLAS, FFTW, Boost, PLASMA, PETSc
System administration
– Resource manager, queuing system
– Uniform environments
– Parallel filesystem (100s or 1000s of client nodes)
Software must communicate between cores and compute nodes
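The split, compute and combine pattern behind much parallel software can be sketched on a single machine. This example uses Python's standard-library thread pool as a stand-in; it is not MPI or OpenMP, and Python threads will not speed up this CPU-bound work. All function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks (the 'scatter' step)."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done independently by each worker on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Scatter the data, compute partial results, then combine them ('reduce')."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))
```

On a real cluster the chunks would live on different nodes and the final combine step would travel over the network, which is where the communication cost comes from.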
What that means for hardware
Specialised hardware vs commodity servers
– Cray, IBM BlueGene
CPU: many-core, larger caches
Accelerator cards:
– GPGPU, Intel Xeon Phi
High-speed, low-latency networks
– InfiniBand (QDR 40, FDR 56, EDR 100 Gb/s; <1 µs latency)
– Topologies: fat-tree, torus
Parallel filesystem
– Fast spinning disk, flash drives, hierarchies
Many cores, fast networking
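Why both latency and bandwidth matter can be shown with a simple cost model: transfer time = latency + message size / bandwidth. This is a common simplification, not a measurement. The defaults below take 56 Gb/s and 1 µs from the figures above, using 1 µs as an upper bound for the quoted "<1 µs".

```python
def transfer_time_seconds(message_bytes,
                          latency_s=1e-6,
                          bandwidth_bits_per_s=56e9):
    """Estimated time to send one message under the latency + size/bandwidth model.

    Assumptions: a fixed per-message latency and full link bandwidth with no
    protocol overhead or contention.
    """
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


if __name__ == "__main__":
    for size in (8, 1_000_000, 1_000_000_000):
        t = transfer_time_seconds(size)
        print(f"{size:>13,} bytes: {t * 1e6:,.3f} microseconds")
```

For an 8-byte message almost all of the time is latency; for a 1 GB message almost all of it is bandwidth. Tightly coupled codes send many small messages, which is why low latency matters as much as raw speed.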
Typical day of an HPC sysadmin
[Occasionally] design, rack, install, provision new systems
What software do researchers need?
– ‘yum install’ or ‘./configure; make’
– Build gcc-6.2.0, then openmpi-2.0.1 using gcc, then boost-1.62 using both, THEN try to compile their software
– Compile scientific software (sometimes without Makefiles)
– Complex software stack!
Node / queue / network: health checks and auto-remediation
Tweak provisioning config (Salt, Ansible, Puppet, etc.)
“Why did my job fail?”
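The build chain described above (compiler first, then MPI, then Boost, then the researcher's code) is a dependency-ordering problem. A minimal sketch using Python's standard-library `graphlib` (Python 3.9+) follows. The package versions come from the slide; "research-code" is a hypothetical placeholder for the researcher's software.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of packages that must be built before it.
DEPENDENCIES = {
    "gcc-6.2.0": set(),
    "openmpi-2.0.1": {"gcc-6.2.0"},
    "boost-1.62": {"gcc-6.2.0", "openmpi-2.0.1"},
    # Hypothetical name standing in for the researcher's own software.
    "research-code": {"gcc-6.2.0", "openmpi-2.0.1", "boost-1.62"},
}


def build_order(dependencies):
    """Return the packages in an order where every dependency is built first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, package in enumerate(build_order(DEPENDENCIES), start=1):
        print(f"{step}. build {package}")
```

`TopologicalSorter` raises `graphlib.CycleError` if two packages depend on each other, which is a useful early warning for a complex software stack.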
Thank You