Guide to Building your Linux
High-performance Cluster
Edmund Ochieng
March 2, 2012
Abstract
In the modern day, where computer simulation forms a critical part of research, high-performance clusters have become a necessity in nearly every educational or research institution.
This paper aims to give you the instructions you need to set up your own cluster. So if you are looking forward to setting one up, this is the guide for you.
This guide is prepared with climate simulation in mind. However, besides the software required for climate simulation, the steps required to set up the cluster remain more or less the same.
The setup aims to grant you the ability to run modelling, simulation and visualisation applications across multiple processors, probably more than you can have in a single server unit.
Contents

I Master node Configuration
1 Network configuration
  1.1 Internal interface configuration
  1.2 External interface configuration
2 MAC address acquisition
  2.1 System Documentation / Manuals
  2.2 Network Traffic Monitoring
  2.3 TFTP Configuration
3 DHCP configuration
4 Local Repository
5 EPEL Repository
6 NFS configuration
7 SSH Key Generation Script

II Software and Compiler installation and configuration
8 Torque configuration
9 Maui configuration
10 Compiler Installation
  10.1 GCC Compilers
  10.2 Intel Compilers
11 OpenMPI installation
  11.1 OpenMPI Compiled with GCC Compilers
  11.2 OpenMPI Compiled with Intel Compilers
12 Environment Modules installation
13 C3 Tools installation
14 Password Syncing
15 NetCDF, HDF5 and GrADS installation
16 NCL and NCO installation
17 R Statistical package installation

III Computing Node Installation
18 Node OS installation
19 Name resolution
Part I
Master node Configuration
1 Network configuration
1.1 Internal interface configuration
Set the network interface on which the DHCP service will listen for IP address requests to be static and to start on system boot. It should appear similar to the configuration below.
1. With a text editor of your choice, edit your master node's network configuration for the network interface that will be used to communicate with the other nodes in your cluster.
[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=static
HWADDR=00:16:36:E7:8B:A3
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
DHCP_HOSTNAME=master.cluster
2. Once the changes have been made, you can save the file and start the interface.
3. Finally, invoke the ifconfig command to confirm the settings are active, as illustrated below.
[root@master ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:16:36:E7:8B:A3
inet addr:192.168.10.1 Bcast:192.168.10.127 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:74 Memory:fdfc0000-fdfd0000
1.2 External interface configuration
The eth1 interface shall be connected to the organizational network and will acquire its network configuration via DHCP. So to have the interface working, all that needs to be done is to set the ONBOOT option in /etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the interface.
2 MAC address acquisition
The MAC address acquisition step is important as it allows the master node to uniquely identify the nodes that make up the cluster and, as a result, give them customized configuration.
Each network interface has a unique MAC address, which can be obtained either from the system manuals/documentation or by listening to the network traffic on the master node interface on which the DHCP daemon will be listening.
2.1 System Documentation / Manuals
This could either be printed on the hardware, as is the case on Sun servers and a couple of HP servers I've seen, or in the booklets provided alongside the server. However, this can at times be misleading. If that is the case, you can always listen on the network to obtain the desired MAC address.
2.2 Network Traffic Monitoring
Using the tcpdump command, we can acquire each interface's MAC address. For easy identification, turn on only one node at a time during the MAC address collection process.
From the tcpdump output below, we can identify the network interface MAC address of the first node as 00:1b:24:3d:f1:a3, since the column just before the second "greater than" symbol is 0.0.0.0.68 - which basically means it has no IP address and expects a response on UDP port 68.
[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps \
and ip broadcast
tcpdump: verbose output suppressed, use -v or -vv for full protocol
decode listening on eth0, link-type EN10MB (Ethernet), capture size
96 bytes
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 >
255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1
.67 > 255.255.255.255.68: UDP, length 300
Repeat the above process for all nodes to which you would like to issue static IP addresses.
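When many nodes are involved, picking the client MACs out of the capture by eye becomes tedious. A minimal sketch, assuming the one-packet-per-line tcpdump output format shown above: keep only lines whose source is 0.0.0.0.68 (a client that has no address yet) and print the first field, which is the source MAC. The here-document stands in for the live capture.

```shell
#!/bin/sh
# Extract the MAC addresses of DHCP clients from tcpdump-style output.
# Only DHCP discover/request packets come from 0.0.0.0.68, so those
# lines identify the nodes still waiting for an address.
awk '/0\.0\.0\.0\.68/ { print $1 }' <<'EOF'
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 > 255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1.67 > 255.255.255.255.68: UDP, length 300
EOF
# prints 00:1b:24:3d:f1:a3
```

The server's own reply (the second line, from 192.168.10.1.67) is filtered out automatically.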
2.3 TFTP Configuration
The TFTP service is essential for a PXE server to work, as it serves a netinstall kernel and a ramdisk to the clients when they attempt a network boot.
By default, tftp, which is managed by xinetd, is disabled. You can enable it by opening the configuration file and changing the value of the option "disable" from yes to no. Your completed configuration file should be similar to the one shown below.
1. Enable tftp which is part of the xinetd stack
[root@master ~]# vi /etc/xinetd.d/tftp
[root@master ~]# cat /etc/xinetd.d/tftp
# default: off
service tftp
{
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = -s /tftpboot
disable = no
per_source = 11
cps = 100 2
flags = IPv4
}
2. Once done, restart the xinetd service to start tftp alongside the other services on the next start.
[root@master ~]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
3. Check that a tftpboot directory has been created at the root of the directory tree, as shown below.
[root@master ~]# file /tftpboot/
/tftpboot/: directory
4. Create a directory tree into which the pxe files shall be placed.
[root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg
5. Copy the netboot kernel image and an initial ramdisk.
[root@master ~]# ls /distro/centos/images/pxeboot/
initrd.img README TRANS.TBL vmlinuz
[root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,
initrd.img} /tftpboot/pxe/
6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory, from where it will be accessible via the tftp daemon.
[root@master ~]# locate pxelinux.0
/usr/lib/syslinux/pxelinux.0
[root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/
‘/usr/lib/syslinux/pxelinux.0’ -> ‘/tftpboot/pxe/pxelinux.0’
NOTE: Keenly note the location of the pxelinux.0 file, as its relative path (i.e. from the tftp root directory - /tftpboot) will be used in the DHCP daemon configuration section.
7. Create a default boot configuration file for machines that may not have a specific boot file in the pxelinux.cfg directory.
[root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default
[root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default
# /tftpboot/pxe/pxelinux.cfg/default
prompt 1
timeout 100
default local
label local
LOCALBOOT 0
label install
kernel vmlinuz
append initrd=initrd.img network ip=dhcp lang=en_US keymap=us \
ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg \
loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal \
selinux=0
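Since flipping nodes between local boot and installation comes down to the "default" line, a quick hedged check of which label a pxelinux config will boot can save a reboot surprise. A minimal sketch, with a here-document standing in for a pxelinux.cfg file:

```shell
#!/bin/sh
# Print the label that a pxelinux configuration boots by default.
# The "default" directive's second field names the active label.
awk '$1 == "default" { print $2 }' <<'EOF'
prompt 1
timeout 100
default local
label local
  LOCALBOOT 0
label install
  kernel vmlinuz
EOF
# prints local
```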
8. Get the hexadecimal equivalent of the node's IP address, used to create a per-client pxe configuration.
[root@master pxelinux.cfg]# gethostip node01
node01 192.168.10.2 C0A80A02
[root@master pxelinux.cfg]# cp default C0A80A02
9. Copy the default file to a file named with the hex equivalent obtained above. Open the file and change the line default local to default install. This should commence installation on rebooting node01. The same should be done for all other nodes.
[root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default /tftpbo
ot/pxe/pxelinux.cfg/C0A80A02
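If gethostip is not at hand, the hex filename pxelinux looks for can be computed directly: each octet of the IP address becomes two uppercase hexadecimal digits. A minimal sketch:

```shell
#!/bin/sh
# Compute the pxelinux.cfg hex filename for a node's IP address,
# equivalent to what gethostip prints in its third column.
ip=192.168.10.2
# Split the dotted quad into four words and format each as two hex digits.
hex=$(printf '%02X%02X%02X%02X' $(echo "$ip" | tr '.' ' '))
echo "$hex"   # prints C0A80A02
```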
3 DHCP configuration
To issue static IP addresses via the DHCP daemon, the network interface hardware (or MAC) addresses collected in the MAC address collection section will be necessary.
DHCP daemon configuration for the cluster should be carried out as outlined in the steps below.
1. Enter the name of the interface on which the DHCP daemon will be listening.
[root@master ~]# cat /etc/sysconfig/dhcpd
# Command line options here
DHCPDARGS="eth0"
2. Create your DHCP configuration file from the sample file in the location below.
[root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample \
/etc/dhcpd.conf
cp: overwrite ‘/etc/dhcpd.conf’? y
3. You can edit your configuration to look more or less like mine, issuing addresses to the desired hosts using their MAC addresses as illustrated below.
[root@master ~]# cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;
allow booting;
allow bootp;
subnet 192.168.10.0 netmask 255.255.255.0 {
# --- default gateway
# option routers 192.168.0.1;
option subnet-mask 255.255.255.0;
# option nis-domain "domain.org";
option domain-name "cluster";
option domain-name-servers 192.168.10.1;
option time-offset 10800; # EAT
# option ntp-servers 192.168.1.1;
# option netbios-name-servers 192.168.1.1;
# range dynamic-bootp 192.168.10.4 192.168.10.20;
default-lease-time 21600;
max-lease-time 43200;
filename "pxe/pxelinux.0";
next-server 192.168.10.1;
# we want the nameserver to appear at a fixed address
host node01 {
hardware ethernet 00:1b:24:3d:f1:a3;
fixed-address 192.168.10.2;
option host-name "node01";
}
host node02 {
hardware ethernet 00:1b:24:3e:05:d1;
fixed-address 192.168.10.3;
option host-name "node02";
}
host node03 {
hardware ethernet 00:1b:24:3e:04:f6;
fixed-address 192.168.10.4;
option host-name "node03";
}
}
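The host blocks above all follow one pattern, so for larger clusters it is less error-prone to generate them from a table than to type each one. A hedged sketch, using the node names, MACs and addresses from this guide; adapt the table to your own hardware:

```shell
#!/bin/sh
# Emit dhcpd.conf host blocks from a "name mac ip" table.
# Adding a node then means adding one line to the here-document.
while read name mac ip; do
  printf 'host %s {\n  hardware ethernet %s;\n  fixed-address %s;\n  option host-name "%s";\n}\n' \
    "$name" "$mac" "$ip" "$name"
done <<'EOF'
node01 00:1b:24:3d:f1:a3 192.168.10.2
node02 00:1b:24:3e:05:d1 192.168.10.3
node03 00:1b:24:3e:04:f6 192.168.10.4
EOF
```

Redirect the output into the subnet block of /etc/dhcpd.conf, then restart dhcpd.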
4. Finally, save the configuration file and start the server.
[root@master ~]# service dhcpd start
Starting dhcpd: [ OK ]
5. Should starting the DHCP daemon fail, you can look at the logs in /var/log/messages and identify any DHCP daemon related errors. This could be done in any editor, but for better troubleshooting I'd proceed as below.
[root@master ~]# tail -f /var/log/messages
4 Local Repository
A local repository is very crucial in cases of poor Internet connectivity.
1. Create a directory on the system and copy all the contents of the installation disc into it.
[root@master ~]# mkdir -p /distro/centos
[root@master ~]# cp -ar /media/CentOS_5.6_Final/* /distro/centos
2. Create a new repository file that would point to the location created above.
[root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo
[Local]
name=CentOS-$releasever - Local
baseurl=file:///distro/centos
gpgcheck=0
enabled=1
3. Clear the cache and any other repository information saved locally.
[root@master ~]# yum clean all
4. Make a cache of the new available repositories.
[root@master ~]# yum makecache
5 EPEL Repository
The addition of the EPEL (Extra Packages for Enterprise Linux) repository was crucial in facilitating the installation of some of the software needed in the cluster whose installation from source was not quite a simple process, such as:
1. R - R Statistical package http://www.r-project.org/
2. NCO - NetCDF Operator http://nco.sourceforge.net/
3. CDO - Climate Data Operators
4. NCL - NCAR Command Language http://www.ncl.ucar.edu/Applications/rcm.shtml
5. GrADS - Grid Analysis and Display System http://www.iges.org/
This is done as illustrated below:
[root@master ~]# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5
/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-re
lease-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key
ID 217521f6
Preparing... ########################################### [100%]
1:epel-release ########################################### [100%]
6 NFS configuration
We shall export some of the master node's filesystems to reduce the need for repetitive configuration.
1. Populate the /etc/exports configuration file with the directories you wish to have exported via NFS.
[root@master ~]# vi /etc/exports
/distro *(ro,root_squash)
/home *(rw,root_squash)
/distro/centos *(ro,root_squash)
/distro/ks *(ro,root_squash)
/opt *(ro,root_squash)
/usr/local *(ro,root_squash)
/scratch *(rw,root_squash)
2. Start the NFS daemon, which should start successfully if your configuration is correct.
[root@master ~]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
3. Make the NFS daemon start automatically on system boot, and re-export the filesystems.
[root@master ~]# chkconfig nfs on
[root@master ~]# exportfs -vra
exporting *:/distro/centos
exporting *:/distro/ks
exporting *:/usr/local
exporting *:/scratch
exporting *:/distro
exporting *:/home
exporting *:/opt
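A malformed line in /etc/exports is an easy way to make the daemon fail silently for one directory, so a quick sanity check before running exportfs can help. A minimal hedged sketch: every non-blank line should be an absolute path followed by a client(options) spec. The here-document mirrors a few of the entries built above; point the command at /etc/exports in practice.

```shell
#!/bin/sh
# Flag exports lines that are not "/abs/path client(options)".
awk 'NF && ($1 !~ /^\// || $2 !~ /\(/) { bad++; print "suspect: " $0 }
     END { print (bad ? bad " suspect line(s)" : "exports look sane") }' <<'EOF'
/distro *(ro,root_squash)
/home *(rw,root_squash)
/scratch *(rw,root_squash)
EOF
# prints exports look sane
```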
7 SSH Key Generation Script
To allow jobs to be succesfully submitted to the cluster, passwordless ssh loginshould be possible for all users on the cluster. So the script below will createa key pair and copy it over to the authorized keys file in the .ssh/ directory ineach users home directory.
This shall be automated by the script below which we shall place in system-wide /etc/profile.d directory.
[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh
Listing 1: /etc/profile.d/passwordless-ssh.sh
#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#

if [ ! -d "${HOME}"/.ssh/ -o ! -f "${HOME}"/.ssh/id_dsa.pub ]
then
    echo -ne "Generating ssh keys:\t"
    ssh-keygen -t dsa -N "" -f "${HOME}"/.ssh/id_dsa
    if [ "$?" -eq 0 ]; then
        echo -e "[\033[32;1m done \033[0m]";
        cat "${HOME}"/.ssh/id_dsa.pub >> "${HOME}"/.ssh/authorized_keys
        chmod -R u+rwX,go= "${HOME}"/.ssh/
    else
        echo -e "[\033[35;1m failed \033[0m]"
    fi
fi
Part II
Software and Compiler installation and configuration
8 Torque configuration
1. Untar the source and execute the configure script with the options shown below.
[root@master src]# tar xvfz torque-2.4.14.tar.gz
[root@master src]# cd torque-2.4.14
[root@master torque-2.4.14]# mkdir build
[root@master torque-2.4.14]# cd build
[root@master build]# ../configure --help
[root@master build]# ../configure --prefix=/opt/torque --
enable-server --enable-mom --enable-clients --disable-gui
--with-rcp=scp
2. Compile the code to create the binaries by executing "make", followed by "make install" to install them.
[root@master build]# make
[root@master build]# make install
3. Add the path for the sbin directory to the root user’s .bashrc file.
[root@master torque-2.4.14]# echo "export PATH=/opt/torqu
e/sbin:\$PATH" >> /root/.bashrc
[root@master torque-2.4.14]# tail -n 1 ~/.bashrc
export PATH=/opt/torque/sbin:$PATH
4. Copy the pbs_mom script from the contrib/init.d directory of the installation source to /opt/torque/pbs_mom.init. Open the file in an editor of your choice and amend any erroneous paths.
[root@master torque-2.4.14]# cp contrib/init.d/pbs_mom \
/opt/torque/pbs_mom.init
[root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init
5. Copy the node_install.sh script into the torque install directory. It will be used to install pbs_mom on the computing nodes.
Listing 2: node_install.sh
#!/bin/bash
# /opt/torque/node_install.sh
# http://epico.escience-lab.org
# mailto: baro@democritos.it

TORQUEHOME=/opt/torque
TORQUEBIN=$TORQUEHOME/bin
MAUIBIN=/opt/maui/bin
SPOOL=/var/spool/torque

mkdir -vp $SPOOL
cd $SPOOL || exit

#===========================================================#
mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered
chmod -v 1777 spool undelivered

for s in prologue epilogue
do
    test -e $TORQUEHOME/scripts/$s && \
        ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done

#===========================================================#
cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF

#===========================================================#
echo master > server_name

#===========================================================#
cat << EOF > mom_priv/config
\$clienthost master
\$logevent 0x7f
\$usecp *:/u /u
\$usecp *:/home /home
\$usecp *:/scratch /scratch
EOF

#===========================================================#
MOM_INIT=/etc/init.d/pbs_mom
cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT
chkconfig --add pbs_mom
chkconfig pbs_mom on

# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)
egrep 'ulimit[[:space:]]+.*-l[[:space:]]' $MOM_INIT || \
perl -e 'while (<>) {
    print;
    if (/^[ \t]+start\)/){
        print << EOF;
#----------------------------------------------------#
# increase limits for infiniband stuff (no pam_limits aware)
# max locked memory, soft and hard limits for all PBS children
ulimit -H -l unlimited
ulimit -S -l 4096000
# stack size, soft and hard limits for all PBS children
ulimit -H -s unlimited
ulimit -S -s 1024000
#----------------------------------------------------#
EOF
    }
}' -i $MOM_INIT

#===========================================================#
cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF
#EOF
6. In an editor of your choice, enter the fully qualified domain name of your master node in the file below.
[root@master torque-2.4.14]# vi /var/spool/torque/server_name
master.cluster
7. Add your nodes and their properties into the nodes file as shown below.
[root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes
node01 np=4
node02 np=4
node03 np=4
8. Initialize the serverdb and start the torque pbs_server as shown below.
[root@master ~]# pbs_server -t create
[root@master ~]# service pbs_server start
Starting TORQUE Server: [ OK ]
9. Create a queue (or queues) to suit your configuration and make at least one of them the default using the torque qmgr command. An easier way is to create a file as below.
[root@master ~]# vi qmgr.cluster
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True
set server scheduling = True
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default
10. Load the file containing the qmgr configuration as illustrated below.
[root@master ~]# qmgr -c < qmgr.cluster
11. A printout of the pbs_server configuration looks as below.
[root@master ~]# qmgr -c ’p s’
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.cluster
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 26
12. Restart both the pbs_server on the master node and the pbs_mom on the nodes, then execute pbsnodes to see a printout of all free nodes.
[root@master ~]# pbsnodes
node01
state = free
np = 2
ntype = cluster
status = rectime=1308321567,varattr=,jobs=,state=free,
netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184
kb,availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,
nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.
el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
node02
state = free
np = 2
ntype = cluster
status = rectime=1308321569,varattr=,jobs=,state=free,
netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184
kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.
el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
node03
state = free
np = 2
ntype = cluster
status = rectime=1308321569,varattr=,jobs=,state=free,
netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184
kb,availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.
el5 #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
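Scripts that submit jobs often only care how many nodes are actually free. A minimal hedged sketch: pbsnodes prints one "state = ..." line per node, so counting the free ones is a one-line awk job. The here-document stands in for the live pbsnodes output, with one node deliberately marked down to make the count visible:

```shell
#!/bin/sh
# Count the nodes that pbsnodes reports as free.
# In practice: pbsnodes | awk '$1 == "state" && $3 == "free" { n++ } END { print n+0 }'
awk '$1 == "state" && $3 == "free" { n++ } END { print n+0 }' <<'EOF'
node01
     state = free
     np = 2
node02
     state = free
     np = 2
node03
     state = down
     np = 2
EOF
# prints 2
```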
9 Maui configuration
1. Untar, configure, make the binaries and install maui from source as shown in the next sequence of steps.
[root@master ~]# tar xvfz maui-3.3.1.tar.gz
[root@master ~]# cd maui-3.3.1
[root@master maui-3.3.1]# ./configure --help
[root@master maui-3.3.1]# ./configure --prefix=/opt/maui
--with-spooldir=/var/spool/maui --with-pbs=/opt/torque/
[root@master maui-3.3.1]# make
[root@master maui-3.3.1]# make install
2. Create a system user maui through which maui shall be run.
[root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon \
maui
3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and resource manager definition (RMCFG) as shown in the snippet below.
[root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg
# maui.cfg 3.3.1
SERVERHOST master
# primary admin must be first in list
ADMIN1 maui root
ADMIN3 ALL
# Resource Manager Definition
RMCFG[MASTER] TYPE=PBS
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
....
EOF
4. Copy the init script from the maui source package to /etc/init.d/ and edit the file, changing MAUI_PREFIX to point to your installation directory.
[root@master maui-3.3.1]# cp contrib/service-scripts/redhat. \
maui.d /etc/init.d/maui
[root@master maui-3.3.1]# vi /etc/init.d/maui
[root@master maui-3.3.1]# cat /etc/init.d/maui
#!/bin/sh
#
# maui This script will start and stop the MAUI Scheduler
#
# chkconfig: 345 85 85
# description: maui
#
ulimit -n 32768
# Source the library functions
. /etc/rc.d/init.d/functions
MAUI_PREFIX=/opt/maui
# let see how we were called
case "$1" in
start)
echo -n "Starting MAUI Scheduler: "
daemon --user maui $MAUI_PREFIX/sbin/maui
echo
;;
stop)
echo -n "Shutting down MAUI Scheduler: "
killproc maui
echo
;;
status)
status maui
;;
restart)
$0 stop
$0 start
;;
*)
echo "Usage: maui {start|stop|restart|status}"
exit 1
esac
5. Create a file maui.sh in the /etc/profile.d directory, add to it the environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it executable.
[root@master maui]# vi /etc/profile.d/maui.sh
[root@master maui]# chmod +x /etc/profile.d/maui.sh
10 Compiler Installation
Compilers are necessary in a cluster as they turn source code into executables that can be run by the computer. Of interest are C, C++ and Fortran compilers, the most popular of which are the GCC and Intel compilers. Another option is the PGI compilers, which we shall not install.
10.1 GCC Compilers
From the CentOS repositories, we shall install the GCC compilers using the yum package management utility.
[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 \
libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 \
compat-libstdc++.x86_64
10.2 Intel Compilers
For the Intel compilers, which may give better results depending on the scenario, we shall proceed with the installation as outlined below:
1. Visit the Intel website in your preferred web browser, register and download the Intel compilers for non-commercial use.
2. Move to the directory into which you downloaded the Intel C compilersand Fortran compilers.
3. Untar the tarballs and change directory into the created directory.
[root@master ~]# tar xvfz l_ccompxe_2011.4.191.tgz
[root@master ~]# cd l_ccompxe_2011.4.191
[root@master l_ccompxe_2011.4.191]# ./install.sh
[root@master ~]# tar xvfz l_fcompxe_2011.4.191.tgz
[root@master ~]# cd l_fcompxe_2011.4.191
[root@master l_fcompxe_2011.4.191]# ./install.sh
4. Execute the install.sh script and proceed as prompted.
11 OpenMPI installation
OpenMPI is an open-source implementation of the Message Passing Interface (MPI-2) library and facilitates communication/message interchange between processes in a High Performance Computing environment.
11.1 OpenMPI Compiled with GCC Compilers
1. Untar and compile the sources
[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran \
F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 \
--with-tm=/opt/torque/
2. Create binaries by running ”make”
[root@master build]# make
3. Finally, install the binaries into the system
[root@master build]# make install
11.2 OpenMPI Compiled with Intel Compilers
1. Untar and compile the sources as above. However, take keen notice of the values of the variables CC, CXX, FC and F77 compared to the same step when compiling with the GCC compilers above.
[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=icc CXX=icpc FC=ifort \
F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 \
--with-tm=/opt/torque/
2. Create binaries by running ”make”
[root@master build]# make
3. Finally, install the binaries into the system
[root@master build]# make install
12 Environment Modules installation
1. Obtain the environment modules source file, uncompress it and change directory into the created directory as below.
[root@master src]# tar xvfz modules-3.2.8a.tar.gz
[root@master src]# cd modules-3.2.8
2. Then compile the sources, specifying a prefix where they should be installed.
[root@master modules-3.2.8]# ./configure --prefix=/opt
Should you be running a 64-bit system and encounter an error indicating the tcl lib and include directories cannot be found, proceed as below.
[root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/
--with-tcl-inc=/usr/include/ --prefix=/opt
3. Then create binaries and install.
[root@master modules-3.2.8]# make
[root@master modules-3.2.8]# make install
4. Finally, copy the init scripts to the /etc/profile.d directory to make the module command available system-wide.
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash /etc/
profile.d/modules.sh
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_compl
etion /etc/profile.d/modules_bash_completion.sh
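The guide installs Environment Modules but never shows a modulefile, so here is a minimal hedged sketch of one for the GCC-built OpenMPI from section 11. The file path and the module name are hypothetical; the installation prefix matches the --prefix used earlier, and whether a man directory exists under it depends on your build.

Listing: hypothetical /opt/Modules/3.2.8/modulefiles/openmpi/1.4.2-gcc
#%Module1.0
proc ModulesHelp { } {
    puts stderr "OpenMPI 1.4.2 built with GCC 4.1.2"
}
module-whatis   "OpenMPI 1.4.2 (GCC build)"
prepend-path    PATH            /opt/openmpi/1.4.2/gcc/4.1.2/bin
prepend-path    LD_LIBRARY_PATH /opt/openmpi/1.4.2/gcc/4.1.2/lib
prepend-path    MANPATH         /opt/openmpi/1.4.2/gcc/4.1.2/share/man

With such a file in place, users would run "module load openmpi/1.4.2-gcc" to pick this MPI build over the Intel one.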
13 C3 Tools installation
1. Uncompress the C3 tools source package and execute the install script
[root@master src]# tar xvfz c3-4.0.1.tar.gz
[root@master src]# cd c3-4.0.1
[root@master c3-4.0.1]# ./Install-c3
2. Create a c3.conf configuration file defining a cluster name, the master node and the nodes in the cluster.
[root@master c3-4.0.1]# vi /etc/c3.conf
[root@master c3-4.0.1]# cat /etc/c3.conf
cluster cluster1 {
master:master
node0[1-3]
}
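For a different node count, the c3.conf above can be generated rather than typed; the node0[1-N] range syntax is expanded by the C3 tools themselves, so the file stays one line per range. A minimal hedged sketch, printing the file to stdout (redirect to /etc/c3.conf in practice); NODES=3 matches this guide's cluster:

```shell
#!/bin/sh
# Generate a minimal c3.conf for a cluster of NODES compute nodes.
NODES=3
printf 'cluster cluster1 {\n\tmaster:master\n\tnode0[1-%d]\n}\n' "$NODES"
```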
3. Create ssh keys to be used for passwordless login to the nodes of the cluster.
[root@master ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Created directory ’/root/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master
.cluster
4. Copy the ~/.ssh/id_dsa.pub contents to the authorized keys file on all nodes in the cluster. This is how to do it for a single node.
[root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01
The authenticity of host ’node01 (192.168.10.2)’ can’t be es
tablished. DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:
d7:ee:74:6c:8c:dd:da. Are you sure you want to continue conn-
ecting (yes/no)? yes
Warning: Permanently added ’node01,192.168.10.2’ (RSA) to the
list of known hosts.
root@node01’s password:
Now try logging into the machine, with "ssh ’root@node01’",
and check in:
.ssh/authorized_keys
to make sure we haven’t added extra keys that you weren’t
expecting.
5. Test that the key was successfully registered by attempting to log in to node01.
[root@master ~]# ssh node01
Last login: Fri Jun 17 12:53:28 2011
[root@node01 ~]# exit
logout
14 Password Syncing
User accounts and passwords should be the same on all nodes forming the cluster; however, we cannot have users create their password on every machine that makes up the cluster. We shall therefore create a script to effect this. In our case, we shall use the cpush command from the C3 tools package installed earlier.
Listing 3: password-push.sh
#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File: /root/bin
# Cron: min hour dom month dow root /etc/password-push.sh

for f in passwd shadow group; do
    /opt/c3-4/cpush /etc/"${f}" > /dev/null
done
However, bear in mind that rsync could be used to achieve the same.
15 NetCDF, HDF5 and GrADS installation
GrADS requires NetCDF and HDF5 as dependencies for its installation. Therefore, we shall install them all as a pack from the EPEL repositories.
[root@master ~]# yum -y install netcdf hdf5 grads
16 NCL and NCO installation
These too we shall have installed using the yum package manager, as below.
[root@master ~]# yum -y install ncl nco
17 R Statistical package installation
The R statistical package will be installed from the EPEL repositories to save us from the agony of installing a myriad of dependencies and for easy updating of the packages.
[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 \
libRmath.x86_64 libRmath-devel.x86_64
Part III
Computing Node Installation
18 Node OS installation
With the master node setup complete, installation of the nodes should be just a push of a button. However, a little understanding of node-ks.cfg is essential. It marks the packages tftp, openssh-server, openssh, xorg-x11-xauth, mc and strace for installation, and those with a preceding - sign for removal.
Thereafter, the post-installation section is executed, which removes unwanted services, creates a local repository, and installs on the nodes the GCC compilers available from the CentOS repositories.
Listing 4: node-ks.cfg
tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post --log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail; do
    chkconfig --del "${i}"
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/*

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" | \
    tee /etc/yum.repos.d/CentOS-Local.repo
yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 \
    libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64
Once the installation is complete, you can have a look at ks-post.log in root's home directory for any errors encountered while executing the post section of the kickstart file.
19 Name resolution
Finally, ensure that all the nodes in the cluster can resolve the names of the other nodes. You can either set up DNS on the master node or use the /etc/hosts file.
Should you need help setting up a DNS server, post your requests in the comments below.
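For the /etc/hosts route, the entries can be generated from the addressing scheme used throughout this guide (master at 192.168.10.1, node01 at .2, and so on). A minimal hedged sketch that prints the entries to stdout; append the output to /etc/hosts on every node, for example with cpush:

```shell
#!/bin/sh
# Generate /etc/hosts entries for the master and three compute nodes.
{
  printf '192.168.10.1\tmaster master.cluster\n'
  for i in 1 2 3; do
    # node01 is .2, node02 is .3, node03 is .4
    printf '192.168.10.%d\tnode0%d node0%d.cluster\n' $((i + 1)) "$i" "$i"
  done
}
```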