Post on 29-Mar-2015
Changhua, Li
National Astronomical Observatory of China
Short Report on the laohu GPU cluster usage at NAOC
Introduction to Laohu
Hardware configuration: 85 nodes + InfiniBand interconnect + 140 TB storage
Node hardware: Lenovo R740, 2 Xeon E5520 CPUs, 24 GB memory, 500 GB disk, 2 NVIDIA C1060 GPU cards.
The Laohu GPU cluster was built in 2009 with a peak single-precision performance of 160 TFLOPS. Total cost: 6 million RMB in 2009 (4/1, Ministry of Finance ZDYZ2008-2-A06/NAOC).
Laohu upgrade
C1060: 240 cores, 4 GB memory, 933 GFLOPS
In Sep. 2013 we bought 59 K20 GPU cards for 59 nodes, at a cost of 1.18 million RMB. The new Laohu configuration is therefore 59 hosts with one K20 GPU card each and 26 hosts with 3 C1060 GPU cards each. In theory, the peak single-precision performance is 280 TFLOPS.
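The quoted peaks can be sanity-checked from the per-card figures. A minimal sketch in Python, assuming a K20 single-precision peak of 3.52 TFLOPS (NVIDIA's published spec figure, not stated in these slides):

```python
# Sanity check of the quoted cluster peaks from per-card SP throughput.
C1060_TFLOPS = 0.933   # stated above: 933 GFLOPS per C1060
K20_TFLOPS = 3.52      # assumed from NVIDIA's K20 spec sheet

# Original 2009 system: 85 nodes x 2 C1060 cards each
original_peak = 85 * 2 * C1060_TFLOPS
# After the 2013 upgrade: 59 nodes x 1 K20 + 26 nodes x 3 C1060
upgraded_peak = 59 * K20_TFLOPS + 26 * 3 * C1060_TFLOPS

print(f"original: {original_peak:.1f} TFLOPS")   # ~158.6, quoted as 160
print(f"upgraded: {upgraded_peak:.1f} TFLOPS")   # ~280.5, quoted as 280
```

Both totals agree with the rounded figures given above.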
LAOHU Architecture
Laohu management system -- LSF
Platform LSF (Load Sharing Facility) is a suite of distributed resource management products that:
connects computers into a cluster (or "grid");
monitors system loads;
distributes, schedules and balances workload;
controls access and load by policies;
analyzes the workload;
provides a High Performance Computing (HPC) environment.
Laohu Queues for GPU job
GPU queues:
gpu_16: K20 hosts, max cores: 16, min cores: 4, total core limit: 32
gpu_8: K20 hosts, max cores: 8, min cores: 2, total core limit: 24
gpu_k20_test: K20 hosts, only 2 cores per job, total core limit: 3
gpu_c1060: C1060 hosts, max cores: 30, min cores: 2, total core limit: 66
gpu_c1060_test: C1060 hosts, only 3 cores per job, total core limit: 9
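The per-queue limits above can be encoded as data and checked before submitting. A small illustrative sketch (the queue figures are taken from this slide; the helper function itself is hypothetical, not part of LSF):

```python
# GPU queue limits as listed above: per-job min/max cores and the
# total core limit across all jobs in the queue.
GPU_QUEUES = {
    "gpu_16":         {"min": 4, "max": 16, "total": 32},
    "gpu_8":          {"min": 2, "max": 8,  "total": 24},
    "gpu_k20_test":   {"min": 2, "max": 2,  "total": 3},
    "gpu_c1060":      {"min": 2, "max": 30, "total": 66},
    "gpu_c1060_test": {"min": 3, "max": 3,  "total": 9},
}

def job_fits(queue: str, ncores: int) -> bool:
    """Return True if a single job of `ncores` respects the queue's per-job limits."""
    q = GPU_QUEUES[queue]
    return q["min"] <= ncores <= q["max"]

print(job_fits("gpu_16", 8))    # True: within 4..16
print(job_fits("gpu_8", 16))    # False: exceeds the per-job max of 8
```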
Laohu Queues for CPU job
CPU queues:
cpu_32: 25-32 nodes with 6 CPU cores per node (192 cores) per job. Up to two such jobs may run concurrently. Maximum running time: 1 week.
cpu_large: 8-22 nodes with 6 CPU cores per node (total: 48 cores). Multiple concurrent jobs allowed. Maximum running time: 1 week.
cpu_small: 2-8 nodes with 6 CPU cores per node per single job. Jobs may run until they fill 8 nodes / 48 CPU cores. Maximum running time: 1 week.
cpu_test: 1-5 nodes with 6 CPU cores per node (total: 30 cores). Jobs may run until they fill 5 nodes / 30 CPU cores. Maximum running time: 3 hours.
CPU job submit script
Sample 1: cpujob.lsf

#!/bin/sh
#BSUB -q cpu_32                               # job queue; modify as needed
#BSUB -a openmpi-qlc
#BSUB -R 'select[type==any] span[ptile=6]'    # resource requirement of host
#BSUB -o out.test                             # output file
#BSUB -n 132                                  # maximum number of CPU cores
mpirun.lsf --mca "btl openib,self" Gadget2wy WJL.PARAM   # modify for user's program
Exec method: bsub < cpujob.lsf
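Since every submission script follows the same #BSUB pattern, it can be generated programmatically. A sketch with a hypothetical helper `write_lsf_script` (only the queue name, resource string, and mpirun.lsf command come from the sample above; the helper is illustrative):

```python
import os
import tempfile

def write_lsf_script(path, queue, ncores, command, ptile=6):
    """Write an LSF submission script in the style of the cpujob.lsf sample."""
    lines = [
        "#!/bin/sh",
        f"#BSUB -q {queue}",                                  # job queue
        "#BSUB -a openmpi-qlc",
        f"#BSUB -R 'select[type==any] span[ptile={ptile}]'",  # host resources
        "#BSUB -o out.test",                                  # output file
        f"#BSUB -n {ncores}",                                 # max CPU cores
        command,
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

path = os.path.join(tempfile.gettempdir(), "cpujob.lsf")
write_lsf_script(path, "cpu_32", 132,
                 'mpirun.lsf --mca "btl openib,self" Gadget2wy WJL.PARAM')
print(open(path).read().splitlines()[1])  # → #BSUB -q cpu_32
```

The generated file is submitted the same way: bsub < cpujob.lsf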
GPU job submit script
Sample 2: gpujob.lsf

#!/bin/sh
#BSUB -q gpu_32                      # job queue
#BSUB -a openmpi-qlc
#BSUB -R 'select[type==any]'         # resource requirement of host
#BSUB -o out.test                    # output file
#BSUB -e out.err                     # error file
#BSUB -n 20                          # maximum number of CPU cores
mpirun.lsf --prefix "/usr/mpi/gcc/openmpi-1.3.2-qlc" -x "LD_LIBRARY_PATH=/export/cuda/lib:/usr/mpi/gcc/openmpi-1.3.2-qlc/lib64" ./phi-GRAPE.exe   # modify for user's program
Exec method: bsub < gpujob.lsf
LAOHU Monitoring
http://laohu.bao.ac.cn
LAOHU Software
CUDA 4.0 / CUDA 5.0
OpenMPI / Intel MPI, etc.
GCC 4.1 / GCC 4.5, Intel Compiler
Math libraries: BLAS, GSL, CFITSIO, FFT, ...
Gnuplot, PGPLOT
Gadget
LAOHU Users
[Pie chart: user share by affiliation: NAOC 55, International 20, KIAA 15, Others 10]
LAOHU CPU utilization ratio (2012)
[Chart: monthly CPU utilization, Jan.-Dec. 2012; average 74%]
LAOHU CPU utilization ratio (2013)
[Chart: monthly CPU utilization, Jan.-Nov. 2013; average 64%]
LAOHU Application List
1. N-body simulations (NBODY6++, phiGPU, galactic nuclei, star clusters)
2. N-body simulations (Gadget2, galactic dynamics)
3. Correlator (test only)
4. Gravitational microlensing
5. Local spiral formation through major mergers
6. Dark Energy Survey
7. TREND, Monte Carlo simulation of extreme-high-energy Extensive Air Showers (EAS)
8. Parallelization of the Herschel Interactive Processing Environment
9. HII region and PDR modeling based on the CLOUDY code
10. Reconstructing the primordial power spectrum and the dark energy equation of state
......
Title / Authors / Date / Download link

Astrophysical Supercomputing with "Green" GPU Clusters in Jülich and Beijing
Rainer, Peter. 03/2012. http://inside.hlrs.de/pdfs/inSiDE_spring2012.pdf

Loops formed by tidal tails as fossil records of a major merger
Wang, J.; Hammer, F.; Athanassoula, E.; Puech, M.; Yang, Y.; Flores, H. 02/2012. http://adsabs.harvard.edu/abs/2012A%26A...538A.121W

Made-to-measure galaxy models - III. Modelling with Milky Way observations
Long, R. J.; Mao, Shude; Shen, Juntai; Wang, Yougang. 09/2012. http://adsabs.harvard.edu/abs/2012arXiv1209.0145L

Made-to-measure galaxy models - II. Elliptical and lenticular galaxies
Long, R. J.; Mao, Shude. 04/2012. http://adsabs.harvard.edu/abs/2012MNRAS.421.2580L

A New Model for the Milky Way Bar
Wang, Yougang; Zhao, Hongsheng; Mao, Shude; Rich, R. M. 09/2012. http://adsabs.harvard.edu/abs/2012arXiv1209.0963W

On the Survivability and Metamorphism of Tidally Disrupted Giant Planets: the Role of Dense Cores
Liu, Shang-Fei; Guillochon, James; Lin, Douglas N. C.; Ramirez-Ruiz, Enrico. 11/2012. http://adsabs.harvard.edu/abs/2012arXiv1211.1971L

Interaction of Recoiling Supermassive Black Holes with Stars in Galactic Nuclei
Li, Shuo; Liu, F. K.; Berczik, Peter; Chen, Xian; Spurzem, Rainer. 03/2012. http://adsabs.harvard.edu/abs/2012ApJ...748...65L
LAOHU Achievements

High Performance massively parallel direct N-body simulations on large GPU clusters
Berczik, P.; Nitadori, K.; Zhong, S.; Spurzem, R.; Hamada, T.; Wang, X. W.; Berentzen, I.; Veles, A.; Ge, W. Proceedings of the International Conference on High Performance Computing.

Tidal disruptions of separated binaries in galactic nuclei
Amaro-Seoane, P.; Miller, M. C.; Kennedy, G. F. Monthly Notices of the Royal Astronomical Society.

Enhanced Accretion Rates of Stars on Supermassive Black Holes by Star-Disk Interactions in Galactic Nuclei
Just, A.; Yurin, D.; Makukov, M.; Berczik, P.; Omarov, C.; Spurzem, R.; Vilkoviskij, E. Y. The Astrophysical Journal.

Modeling the spatial distribution of neutron stars in the Galaxy
Taani, A.; Naso, L.; Wei, Y.; Zhang, C.; Zhao, Y. Astrophysics and Space Science.

Dynamics in Young Star Clusters: From Planets to Massive Stars
Olczak, C.; Spurzem, R.; Henning, T.; Kaczmarek, T.; Pfalzner, S.; Harfst, S.; Portegies Zwart, S. Advances in Computational Astrophysics: Methods, Tools, and Outcome.

Supermassive Black Hole Binaries in High Performance Massively Parallel Direct N-body Simulations on Large GPU Clusters
Spurzem, R.; Berczik, P.; Zhong, S.; Nitadori, K.; Hamada, T.; Berentzen, I.; Veles, A. Advances in Computational Astrophysics: Methods, Tools, and Outcome.

Mergers of Unequal-mass Galaxies: Supermassive Black Hole Binary Evolution and Structure of Merger Remnants
Khan, F. M.; Preto, M.; Berczik, P.; Berentzen, I.; Just, A.; Spurzem, R. The Astrophysical Journal.

Interaction of Recoiling Supermassive Black Holes with Stars in Galactic Nuclei
Li, S.; Liu, F. K.; Berczik, P.; Chen, X.; Spurzem, R. The Astrophysical Journal.

Full list: http://silkroad.bao.ac.cn/web/index.php/research/publications
HPC/GPU Training
Astronomy Cloud Project
Astronomy Cloud Architecture
THANKS!
Email: lich@nao.cas.cn