Using the IAC Chimera Cluster
Ángel de Vicente (Tel.: x5387)
SIE de Investigación y Enseñanza
Chimera overview
● Beowulf-type cluster
● Chimera: a monstrous creature made of the parts of multiple animals.
● Mailing list: [email protected]
● Web page: http://chimera
● Course on Adv. Prog. and Parallel Comp. (June 11-25)
Schematic View
Hardware Details
● Nodes:
  – 1 master node (EM64T)
  – 16 old i686 nodes: 32 Xeon 2.80 GHz (chi32)
  – 16 new EM64T nodes: 32 Xeon 3.20 GHz (chi64)
● RAM: 98 GB (master: 2 + chi32: 32 + chi64: 64)
● Disk: ~5 TB (master: 280 GB + chi32: 480 GB + chi64: 4.5 TB)
● Network: two independent Gigabit networks (one for user applications, one for admin, NFS, etc.)
Disk space
● User-available space:
  – (all) /home (NFS master): 50 GB
  – (all) /scratch (NFS master): 195 GB
  – (chi32) /local_scratch (local): 20 GB per node
  – (chi64) /mnt/pvfs2 (PVFS2 chi64): 3.9 TB
● /home quotas are to be implemented.
● Automatic deletion in the other partitions is to be implemented as well.
PVFS2 Introduction
● Stripes data across disks (those of chi64 in Chimera).
● Larger files can be created, and potential bandwidth is increased.
● Multiple user interfaces:
  – MPI-IO support
  – Traditional Linux file system
PVFS2 Example
● With MPI-IO (60 processors):
                        /scratch (NFS)   /mnt/pvfs2 (PVFS2)
    Write bandwidth     24 MB/s          892 MB/s
    Read bandwidth      116 MB/s         482 MB/s
● Traditional Linux file system (1 processor):
                        local disk   /scratch (NFS)   /mnt/pvfs2 (PVFS2)
    Write 900 MB        14.77 s      43.942 s         11.779 s
    Read 900 MB (wc)    6.401 s      10.007 s         45.942 s
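The single-processor timings above are easier to compare with the MPI-IO figures once they are converted to throughput. The following is a minimal sketch of that arithmetic (size divided by elapsed time); the helper name `throughput` is made up for illustration and is not part of any cluster tool.

```shell
# Effective throughput in MB/s = file size (MB) / elapsed time (s).
# "throughput" is a hypothetical helper used only for this illustration.
throughput() {
    # $1 = size in MB, $2 = elapsed time in seconds
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

# 900 MB timings quoted in the PVFS2 example (1 processor)
echo "write, local disk:     $(throughput 900 14.77) MB/s"
echo "write, /scratch (NFS): $(throughput 900 43.942) MB/s"
echo "write, /mnt/pvfs2:     $(throughput 900 11.779) MB/s"
echo "read,  local disk:     $(throughput 900 6.401) MB/s"
echo "read,  /scratch (NFS): $(throughput 900 10.007) MB/s"
echo "read,  /mnt/pvfs2:     $(throughput 900 45.942) MB/s"
```

Seen this way, a single process gets nowhere near the aggregate PVFS2 bandwidth measured with 60 processors, which is consistent with PVFS2 paying off mainly for parallel access.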
Modules package
● Dynamic modification of a user's environment:
  – PATH, MANPATH, etc.
● Shared and/or private modulefiles.
● Useful in managing different versions of applications.
● Very simple to use:
  – module help | avail | list | load | unload
● Use module commands in .bashrc to set up a common environment.
● Useful for dealing with chi32 vs. chi64.
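As an illustration of using modules from .bashrc to deal with chi32 vs. chi64, the sketch below picks a module name according to the architecture reported by `uname -m`. It rests on several assumptions: `mpich` is a hypothetical module name (run `module avail` to see the real ones), `module_for_arch` is a made-up helper, and the 32-bit environment is assumed to report an i?86 machine type.

```shell
# Hypothetical helper: choose the module name matching the current bitness.
# The only convention taken from the cluster is that 32-bit module names
# end in _32; everything else here is an assumption.
module_for_arch() {
    base=$1 arch=$2
    case "$arch" in
        i?86) echo "${base}_32" ;;   # 32-bit environment
        *)    echo "$base" ;;        # 64-bit environment (the default)
    esac
}

# In ~/.bashrc one could then write (left commented out, because the
# module command only exists where the Modules package is installed):
#   module load "$(module_for_arch mpich "$(uname -m)")"
echo "would load: $(module_for_arch mpich "$(uname -m)")"
```

Keeping the choice in one place means the environment and the loaded modules cannot drift apart in bitness.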
Compiling code
● Code compiled for 64 bits can only run on chi64.
● Code compiled for 32 bits can run on chi32, chi64 or chimera (chi32 + chi64).
● By default you log in to a 64-bit environment.
  – (check this by running uname -a)
● Modules are 64-bit by default; the names of 32-bit versions end with _32.
● The bitness of the environment and of the loaded modules should match.
Compiling code (2)
● Compiling example for 64 bits:
  – [angelv@chimera sieminar]$ mpicc -o cpi_64 cpi.c
  – [angelv@chimera sieminar]$ file cpi_64
    cpi_64: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
● Compiling example for 32 bits:
  – env32 puts us into a 32-bit environment
  – [angelv@chimera sieminar]$ module list   (verify 32-bit versions)
  – [angelv@chimera sieminar]$ mpicc -o cpi_32 cpi.c
  – [angelv@chimera sieminar]$ file cpi_32
    cpi_32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
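Before submitting, it can be handy to check a binary's bitness automatically rather than by eye. The sketch below classifies the description that `file` prints; `bitness_of` is a hypothetical helper, and the sample strings are shortened versions of the output shown above.

```shell
# Hypothetical helper: classify a `file` description as 32- or 64-bit.
bitness_of() {
    case "$1" in
        *"ELF 64-bit"*) echo 64 ;;
        *"ELF 32-bit"*) echo 32 ;;
        *)              echo unknown ;;
    esac
}

# Sample descriptions, shortened from the compile examples
bitness_of "ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV)"
bitness_of "ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)"

# On the cluster, something like the following would feed it real output
# (-b makes `file` omit the leading file name):
#   bitness_of "$(file -b ./cpi_64)"
```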
Submitting jobs to the cluster
● Chimera's queueing system:
  – Torque: Resource Manager
  – Maui: Scheduler
● Maui/Torque basic commands:
  – showq, qsub, checkjob, canceljob
● qsub needs a submission script:
  – [angelv@chimera sieminar]$ cat submitcpi
    #!/bin/sh
    NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
    cd $PBS_O_WORKDIR
    mpirun -np $NP -machinefile $PBS_NODEFILE ./cpi
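The NP line in the submission script works because Torque writes one line to $PBS_NODEFILE per allocated processor, so counting lines gives the number of MPI processes. The sketch below reproduces that count outside the queueing system with a stand-in node file; the host names node01 to node04 are invented, and the mpirun command is only printed, not executed.

```shell
# Stand-in for $PBS_NODEFILE: with nodes=4:ppn=2 each host appears twice.
# Host names are hypothetical.
nodefile=$(mktemp)
for node in node01 node02 node03 node04; do
    printf '%s\n%s\n' "$node" "$node"
done > "$nodefile"

# Same idea as in submitcpi: the number of lines is the number of processes.
# tr removes the padding that some wc implementations add.
NP=$(wc -l < "$nodefile" | tr -d ' ')

# Dry run: show the command the submission script would execute
echo "mpirun -np $NP -machinefile $nodefile ./cpi"

rm -f "$nodefile"
```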
Submitting jobs to the cluster (2)
● With qsub you specify:
  – the number of nodes required, the time required, the bitness of the nodes required, etc.
● Example submissions:
  – To chi64 (default):
    qsub -l nodes=4:ppn=2,walltime=03:00:00 submitcpi
  – To chi32:
    qsub -l nodes=4:ppn=2 -q chi32 submitcpi
  – To chimera:
    qsub -l nodes=4:ppn=2 -q chimera submitcpi
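The example submissions differ only in the queue given with -q, which is omitted for the default chi64 queue. The sketch below assembles those command lines as a dry run; `build_qsub` is a hypothetical helper, not a Torque command, and it only prints what would be submitted.

```shell
# Hypothetical helper: print a qsub command line without submitting it.
build_qsub() {
    queue=$1 nodes=$2 ppn=$3 script=$4
    cmd="qsub -l nodes=${nodes}:ppn=${ppn}"
    # chi64 is the default queue, so -q is only added for the others
    if [ "$queue" != "chi64" ]; then
        cmd="$cmd -q $queue"
    fi
    echo "$cmd $script"
}

build_qsub chi64   4 2 submitcpi
build_qsub chi32   4 2 submitcpi
build_qsub chimera 4 2 submitcpi
```

With nodes=4 and ppn=2, each of these requests 4 x 2 = 8 processors.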
Scheduling policies
● Current policies are NOT FIFO (see /usr/local/maui/maui.cfg); they take into account:
  – Time in queue
  – Expansion factor
  – Backfilling
  – Number of requested processors
  – Fairshare
● Max time for a job: 3.5 days for 128 processors.
● Usage of Beoiac (old cluster): 54.18% (last 2 years)
● “The early bird catches the worm!”
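To give a feel for the expansion factor mentioned above, the sketch below uses the definition commonly given for Maui, (time in queue + requested walltime) / requested walltime. This definition is an assumption here; the weights actually applied on Chimera are whatever maui.cfg sets, and `xfactor` is a made-up helper name.

```shell
# Expansion factor, assuming the usual Maui definition:
#   xfactor = (time in queue + requested walltime) / requested walltime
# "xfactor" is a hypothetical helper used only for this illustration.
xfactor() {
    # $1 = seconds spent in the queue, $2 = requested walltime in seconds
    awk -v q="$1" -v w="$2" 'BEGIN { printf "%.2f\n", (q + w) / w }'
}

# Two jobs that have both waited one hour in the queue:
echo "1-hour job:   $(xfactor 3600 3600)"
echo "3.5-day job:  $(xfactor 3600 302400)"   # 3.5 * 24 * 3600 s
```

Under this definition a short job's factor grows much faster than a long job's for the same wait, so requesting a realistic walltime helps a job move up the queue.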
Monitoring
● Graphical view of scheduling status (same output as showq, but perhaps easier to interpret):
  http://chimera/cgibin/mauistatus.pl
● Graphical view of different metrics of the cluster (are your allocated nodes really doing something?):
  http://chimera/ganglia/
Other resources at the IAC
● Condor system (~ 180 machines, ideal for parameter studies).
● Future CALP node (512 nodes, 20% exclusive to IAC)
References
● Beowulf.org (http://www.beowulf.org)
● Chimera at Wikipedia (http://en.wikipedia.org/wiki/Chimera_%28mythology%29)
● IAC mailing list (http://listas.iac.es/mailman/listinfo/beowulf)
● Chimera IAC web page (http://chimera/)
● IAC Course on Parallel Comp. (http://goya/SIE/forum/viewtopic.php?t=141)
● PVFS2 (http://www.pvfs.org)
● Modules package (http://modules.sourceforge.net)
● Maui (http://www.clusterresources.com/pages/products/mauiclusterscheduler.php)
● Torque (http://www.clusterresources.com/pages/products/torqueresourcemanager.php)
● Condor IAC web page (http://www.iac.es/sieinvens/SINFIN/Condor/)