Changhua, Li

National Astronomical Observatory of China

Short Report on the laohu GPU cluster usage at NAOC

Introduction of Laohu

Hardware configuration: 85 nodes + InfiniBand + 140 TB storage

Node hardware: Lenovo R740, 2 Xeon E5520 CPUs, 24 GB memory, 500 GB disk, 2 NVIDIA C1060 GPU cards.

The Laohu GPU cluster was built in 2009; its peak single-precision performance is 160 TFLOPS. Total cost: 6 million RMB in 2009 (4/1 Min. of Finance ZDYZ2008-2-A06/NAOC).
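
The quoted 160 TFLOPS can be cross-checked from the node count and the per-card figure of 933 GFLOPS given for the C1060 in this report. A minimal sketch, assuming that every one of the 85 nodes carries two C1060 cards and that the cluster peak is a plain sum over cards; the variable names are illustrative:

```sh
#!/bin/sh
# Back-of-the-envelope peak for the 2009 configuration.
# Assumption: all 85 nodes hold 2 C1060 cards and per-card peaks simply add up.
nodes=85
cards_per_node=2
c1060_gflops=933   # single-precision peak per C1060, as quoted in this report

peak_gflops=$(( nodes * cards_per_node * c1060_gflops ))
echo "Estimated peak: $peak_gflops GFLOPS"
```

Under these assumptions the sum is about 158.6 TFLOPS, consistent with the rounded figure of 160 TFLOPS.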

Laohu upgrade

C1060: 240 cores, 4 GB memory, 933 GFLOPS

In Sep. 2013, we bought 59 K20 GPU cards for 59 nodes at a cost of 1.18 million RMB. The new Laohu configuration is therefore 59 hosts with one K20 GPU card each and 26 hosts with 3 C1060 GPU cards each. In theory, the peak single-precision performance is 280 TFLOPS.
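
The 280 TFLOPS estimate for the upgraded cluster can be reproduced in the same way. A rough sketch, assuming NVIDIA's published single-precision peak of about 3.52 TFLOPS for the K20 (a figure not given in this report) and 933 GFLOPS per C1060; the variable names are illustrative:

```sh
#!/bin/sh
# Estimated peak after the Sep. 2013 upgrade, in GFLOPS.
# Assumption: K20 single-precision peak of 3520 GFLOPS (vendor figure, not from this report).
k20_hosts=59
k20_per_host=1
k20_gflops=3520

c1060_hosts=26
c1060_per_host=3
c1060_gflops=933   # per-card figure quoted in this report

k20_total=$(( k20_hosts * k20_per_host * k20_gflops ))
c1060_total=$(( c1060_hosts * c1060_per_host * c1060_gflops ))
peak_gflops=$(( k20_total + c1060_total ))

echo "K20 hosts:   $k20_total GFLOPS"
echo "C1060 hosts: $c1060_total GFLOPS"
echo "Total:       $peak_gflops GFLOPS"
```

Under these assumptions the total is about 280.5 TFLOPS, in line with the stated 280 TFLOPS; roughly three quarters of it comes from the K20 cards.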

LAOHU Architecture

Laohu management system: LSF

Platform LSF (Load Sharing Facility) is a suite of distributed resource management products that:

Connects computers into a cluster (or “grid”);

Monitors system loads;

Distributes, schedules, and balances workload;

Controls access and load by policies;

Analyzes the workload;

Provides a High Performance Computing (HPC) environment.

Laohu Queues for GPU job

GPU queues: 

gpu_16: K20 host, max cores: 16, min cores: 4, total core limit: 32

gpu_8: K20 host, max cores: 8, min cores: 2, total core limit: 24

gpu_k20_test: K20 host, only 2 cores for one job, total core limit: 3

gpu_c1060: C1060 host, max cores: 30, min cores: 2, total core limit: 66

gpu_c1060_test: C1060 host, only 3 cores for one job, total core limit: 9
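
The per-job limits above can be checked before a job is submitted. The following is only a sketch: the function names and the "min max" encoding are hypothetical helpers, not part of LSF, and the numbers are copied from the queue list (the two test queues are treated as accepting exactly 2 and exactly 3 cores).

```sh
#!/bin/sh
# Print "min max" cores per job for a GPU queue (values from the queue list).
# Hypothetical helper, not an LSF command.
gpu_queue_limits() {
    case "$1" in
        gpu_16)         echo "4 16" ;;
        gpu_8)          echo "2 8" ;;
        gpu_k20_test)   echo "2 2" ;;
        gpu_c1060)      echo "2 30" ;;
        gpu_c1060_test) echo "3 3" ;;
        *)              return 1 ;;
    esac
}

# Return 0 if a request of $2 cores fits queue $1,
# 1 if it is out of range, 2 if the queue is not in the list.
check_gpu_request() {
    limits=$(gpu_queue_limits "$1") || { echo "unknown queue: $1" >&2; return 2; }
    min=${limits% *}
    max=${limits#* }
    if [ "$2" -lt "$min" ] || [ "$2" -gt "$max" ]; then
        echo "$1 accepts $min-$max cores per job, got $2" >&2
        return 1
    fi
    return 0
}

check_gpu_request gpu_16 16 && echo "gpu_16 with 16 cores: ok"
check_gpu_request gpu_8 20 || echo "gpu_8 with 20 cores: rejected"
```

The sketch checks only the per-job minimum and maximum; the total core limit of each queue depends on what else is running and is left to the scheduler.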

Laohu Queues for CPU job

CPU queues:

cpu_32: 25-32 nodes with 7/5 CPU cores per node (192 cores) per job. Up to two jobs may run. Maximum running time: 1 week.

cpu_large: 8-22 nodes with 7/5 CPU cores per node (total: 48 cores). Many jobs may run. Maximum running time: 1 week.

cpu_small: 2-8 nodes with 7/5 CPU cores per node per job. As many jobs may run as fill 8 nodes/48 CPU cores. Maximum running time: 1 week.

cpu_test: 1-5 nodes with 7/5 CPU cores per node (total: 30 cores). As many jobs may run as fill 5 nodes/30 CPU cores. Maximum running time: 3 hours.

CPU job submit script

Sample 1: cpujob.lsf

#!/bin/sh
#BSUB -q cpu_32 # job queue, modify according to user
#BSUB -a openmpi-qlc
#BSUB -R 'select[type==any] span[ptile=6]' # resource requirement of host
#BSUB -o out.test # output file
#BSUB -n 132 # the maximum number of CPUs
mpirun.lsf --mca "btl openib,self" Gadget2wy WJL.PARAM # modify for the user's program

Exec method: bsub < cpujob.lsf
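
In LSF, span[ptile=6] places up to 6 job slots on each host, so the slot count given with -n fixes how many hosts the job occupies. A small worked example with the values from Sample 1; the variable names are illustrative:

```sh
#!/bin/sh
# Number of hosts implied by Sample 1.
slots=132   # from "#BSUB -n 132"
ptile=6     # from "span[ptile=6]"

# Round up so that a partially filled last host still counts as one host.
hosts=$(( (slots + ptile - 1) / ptile ))
echo "$slots slots at $ptile per host -> $hosts hosts"
```

With these values the job spans 22 hosts; changing either number in the script changes the host count accordingly.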

GPU job submit script

Sample 2: gpujob.lsf

#!/bin/sh
#BSUB -q gpu_32 # job queue
#BSUB -a openmpi-qlc
#BSUB -R 'select[type==any]' # resource requirement of host
#BSUB -o out.test # output file
#BSUB -e out.err
#BSUB -n 20 # the maximum number of CPUs
mpirun.lsf --prefix "/usr/mpi/gcc/openmpi-1.3.2-qlc" -x "LD_LIBRARY_PATH=/export/cuda/lib:/usr/mpi/gcc/openmpi-1.3.2-qlc/lib64" ./phi-GRAPE.exe
# modify the line above for the user's program

Exec method: bsub < gpujob.lsf

LAOHU Monitoring

http://laohu.bao.ac.cn

LAOHU SOFT.

CUDA 4.0 / CUDA 5.0

OpenMPI / Intel MPI, etc.

GCC 4.1 / GCC 4.5

Intel Compiler

Math libraries: blas, gsl, cfitsio, fft, …

Gnuplot, pgplot

Gadget

LAOHU Users

[Pie chart of users by group. Legend: NAOC, International, KIAA, Others. Values: 55, 20, 15, 10.]

LAOHU CPU utilization ratio (2012)

[Chart: monthly CPU utilization, Jan. to Dec., on a 0-100% axis.]

2012 (Avg. 74%)

LAOHU CPU utilization ratio (2013)

[Chart: monthly CPU utilization, Jan. to Nov., on a 0-90% axis.]

2013 (Avg. 64%)

LAOHU Application List

1. NBODY simulations (NBODY6++, phiGPU, galactic nuclei, star clusters)
2. NBODY simulations (Gadget2, galactic dynamics)
3. Correlator (test only)
4. Gravitational microlensing
5. Formation of local spirals through major mergers
6. Dark energy survey
7. TREND: Monte Carlo simulation for extreme-high-energy Extensive Air Showers (EAS)
8. Parallelization of the Herschel Interactive Processing Environment
9. HII region and PDR modeling based on the CLOUDY code
10. Reconstructing the primordial power spectrum and the dark energy equation of state
……

Article title | Authors | Date | Download link
Astrophysical Supercomputing with “Green” GPU Clusters in Jülich and Beijing | Rainer, Peter | 03/2012 | http://inside.hlrs.de/pdfs/inSiDE_spring2012.pdf
Loops formed by tidal tails as fossil records of a major merger | Wang, J.; Hammer, F.; Athanassoula, E.; Puech, M.; Yang, Y.; Flores, H. | 02/2012 | http://adsabs.harvard.edu/abs/2012A%26A...538A.121W
Made-to-measure galaxy models - III Modelling with Milky Way observations | Long, R. J.; Mao, Shude; Shen, Juntai; Wang, Yougang | 09/2012 | http://adsabs.harvard.edu/abs/2012arXiv1209.0145L
Made-to-measure galaxy models - II. Elliptical and lenticular galaxies | Long, R. J.; Mao, Shude | 04/2012 | http://adsabs.harvard.edu/abs/2012MNRAS.421.2580L
A New Model for the Milky Way Bar | Wang, Yougang; Zhao, Hongsheng; Mao, Shude; Rich, R. M. | 09/2012 | http://adsabs.harvard.edu/abs/2012arXiv1209.0963W
On the Survivability and Metamorphism of Tidally Disrupted Giant Planets: the Role of Dense Cores | Liu, Shang-Fei; Guillochon, James; Lin, Douglas N. C.; Ramirez-Ruiz, Enrico | 11/2012 | http://adsabs.harvard.edu/abs/2012arXiv1211.1971L
Interaction of Recoiling Supermassive Black Holes with Stars in Galactic Nuclei | Li, Shuo; Liu, F. K.; Berczik, Peter; Chen, Xian; Spurzem, Rainer | 03/2012 | http://adsabs.harvard.edu/abs/2012ApJ...748...65L

LAOHU Achievement

Berczik, P., Nitadori, K., Zhong, S., Spurzem, R., Hamada, T., Wang, X.W., Berentzen, I., Veles, A., Ge, W., Proceedings of the International conference on High Performance Computing
High Performance massively parallel direct N-body simulations on large GPU clusters

Amaro-Seoane, P., Miller, M. C., Kennedy, G. F., Monthly Notices of the Royal Astronomical Society
Tidal disruptions of separated binaries in galactic nuclei

Just, A., Yurin, D., Makukov, M., Berczik, P., Omarov, C., Spurzem, R., Vilkoviskij, E. Y., The Astrophysical Journal
Enhanced Accretion Rates of Stars on Supermassive Black Holes by Star-Disk Interactions in Galactic Nuclei

Taani, A., Naso, L., Wei, Y., Zhang, C., Zhao, Y., Astrophysics and Space Science
Modeling the spatial distribution of neutron stars in the Galaxy

Olczak, C., Spurzem, R., Henning, T., Kaczmarek, T., Pfalzner, S., Harfst, S., Portegies Zwart, S., Advances in Computational Astrophysics: Methods, Tools, and Outcome
Dynamics in Young Star Clusters: From Planets to Massive Stars

Spurzem, R., Berczik, P., Zhong, S., Nitadori, K., Hamada, T., Berentzen, I., Veles, A., Advances in Computational Astrophysics: Methods, Tools, and Outcome
Supermassive Black Hole Binaries in High Performance Massively Parallel Direct N-body Simulations on Large GPU Clusters

Khan, F. M., Preto, M., Berczik, P., Berentzen, I., Just, A., Spurzem, R., The Astrophysical Journal
Mergers of Unequal-mass Galaxies: Supermassive Black Hole Binary Evolution and Structure of Merger Remnants

Li, S., Liu, F. K., Berczik, P., Chen, X., Spurzem, R., The Astrophysical Journal
Interaction of Recoiling Supermassive Black Holes with Stars in Galactic Nuclei

..... http://silkroad.bao.ac.cn/web/index.php/research/publications

HPC/GPU Training

Astronomy Cloud Project

Astronomy Cloud Architecture

THANKS!

Email: [email protected]