
Using The Cluster

What We’ll Be Doing

Add users
Run Linpack
Compile code
Compute Node Management

Add a User

Adding a User Account

Execute:

# useradd <username>

Output from 'useradd':

Creating user: gb
make: Entering directory `/var/411'
/opt/rocks/bin/411put --comment="#" /etc/auto.home
411 Wrote: /etc/411.d/etc.auto..home
Size: 514/207 bytes (encrypted/plain)
Alert: sent on channel 239.2.11.71 with master 10.1.1.1

/opt/rocks/bin/411put --comment="#" /etc/passwd
411 Wrote: /etc/411.d/etc.passwd
Size: 2565/1722 bytes (encrypted/plain)
Alert: sent on channel 239.2.11.71 with master 10.1.1.1

/opt/rocks/bin/411put --comment="#" /etc/shadow
411 Wrote: /etc/411.d/etc.shadow
Size: 1714/1093 bytes (encrypted/plain)
Alert: sent on channel 239.2.11.71 with master 10.1.1.1

/opt/rocks/bin/411put --comment="#" /etc/group
411 Wrote: /etc/411.d/etc.group
Size: 1163/687 bytes (encrypted/plain)
Alert: sent on channel 239.2.11.71 with master 10.1.1.1

make: Leaving directory `/var/411'

411 Secure Information Service

Secure NIS replacement

Distributes files within the cluster. The default 411 configuration distributes the user account files, but 411 can distribute any file to all nodes, as sketched below.
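For example, to push an arbitrary file to every node by hand, run 411put directly. A minimal sketch, using the same invocation seen in the 'useradd' output above; "/etc/myapp.conf" is a hypothetical file used only for illustration:

# /opt/rocks/bin/411put --comment="#" /etc/myapp.conf   # hypothetical file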

411 Secure Information Service

When a 411-monitored file changes, an alert is multicast. When a node receives an alert, it pulls the file associated with the alert.

Compute nodes periodically pull all files under the control of 411
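If a node misses an alert and looks out of sync, the pull can typically be forced by hand from that node. A sketch using the 411 client tool; verify the path and flags on your Rocks release:

# /opt/rocks/bin/411get --all   # on a compute node; flags may differ by version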

User Accounts

All user accounts are housed on the frontend under: /export/home/<username>

All nodes use 'autofs' to automatically mount the user's directory when the user logs into a node. This method provides a simple global file system.

On the frontend and every compute node, the user account is available at “/home/<username>”
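A quick way to confirm this from the frontend; "compute-0-0" and the user "gb" (from the 'useradd' output earlier) are stand-ins for your own node and account:

$ ssh compute-0-0 ls /home/gb   # triggers the automount on the node

This lists the home directory served from /export/home/gb on the frontend.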

Deleting a User

Use:

# userdel <username>

Note: the user's home directory (/export/home/<username>) will not be removed. For safety, it must be removed by hand.
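A sketch of the full removal, using the hypothetical user 'gb'; double-check the path before running, since rm -rf is unrecoverable:

# userdel gb
# rm -rf /export/home/gb   # only after verifying the data is no longer needed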

Running Linpack

Linpack

Linpack is a floating-point benchmark that solves a dense system of linear equations (its core kernel is matrix-matrix multiply). It measures sustained floating-point operations per second.

"Gigaflops" - 1 billion floating-point operations per second

This benchmark is used to rate the Top500 fastest supercomputers in the world

We use it as a comprehensive test of the system:

Stresses the CPU
Uses the MPICH layer
Sends a modest number of messages
Ensures a user can launch a job on all nodes
Can run through the queueing system, to also test the queueing system

Running Linpack From the Command Line

Make a 'machines' file. Execute 'vi machines' and input the following:

compute-0-0
compute-0-0

Get a test Linpack configuration file:

$ cp /var/www/html/rocks-documentation/3.2.0/examples/HPL.dat .

Login as a non-root user:

# su - <userid>

Run It

Load your ssh key into your environment:

$ ssh-agent $SHELL
$ ssh-add

Execute Linpack:

$ /opt/mpich/gnu/bin/mpirun -nolocal -np 2 \
    -machinefile machines /opt/hpl/gnu/bin/xhpl

Flags:

-nolocal : don't run Linpack on the host that is launching the job
-np 2 : give the job 2 processors
-machinefile : run the job on the nodes specified in the file 'machines'

Successful Linpack Output

The following parameter values will be used:

N      : 2000
NB     : 64
P      : 1
Q      : 2
PFACT  : Left Crout Right
NBMIN  : 8
NDIV   : 2
RFACT  : Right
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 80)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
  1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
  2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
  3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0

============================================================================
T/V        N     NB   P    Q    Time    Gflops
----------------------------------------------------------------------------
W11R2L8    2000  64   1    2    1.96    2.724e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.1049227 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0255037 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0055411 ...... PASSED

Running Linpack Through a Job Management System

Get a test SGE submission script:

$ cp /var/www/html/rocks-documentation/3.2.0/examples/sge-qsub-test.sh .

Examine the script. Most of the script concerns adding (and later removing) a temporary ssh key to your environment.
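A minimal sketch of that pattern (illustrative only; the actual script is more thorough):

eval `ssh-agent`   # start a temporary agent; sets SSH_AUTH_SOCK/SSH_AGENT_PID
ssh-add            # load the user's ssh key into the agent
# ... the mpirun job runs here ...
ssh-agent -k       # kill the agent, discarding the temporary key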

Important Part Of The Script

At the top, the requested number of processors:

#$ -pe mpi 2

In the middle, the job to run:

/opt/mpich/gnu/bin/mpirun -nolocal -np $NSLOTS \
    -machinefile $TMPDIR/machines \
    /opt/hpl/gnu/bin/xhpl

Submit the Job

Send the job off to SGE:

$ qsub sge-qsub-test.sh

Monitoring the Job

Command line:

$ qstat -f

queuename      qtype  used/tot.  load_avg  arch    states
----------------------------------------------------------------------------
compute-0-0q   BIP    2/2        99.99     glinux
    3 0 sge-qsub-t  bruno  r  06/03/2004 02:48:15  MASTER
    3 0 sge-qsub-t  bruno  r  06/03/2004 02:48:15  SLAVE


Job Output

SGE writes 4 files:

sge-qsub-test.sh.e0  - stderr for job '0'
sge-qsub-test.sh.o0  - stdout for job '0'
sge-qsub-test.sh.pe0 - stderr from the queueing system regarding job '0'
sge-qsub-test.sh.po0 - stdout from the queueing system regarding job '0'
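After the job completes, you can inspect these files directly. For example, for a job whose id is 3 (the id shown in the qstat output above), the suffix is 3 rather than 0:

$ cat sge-qsub-test.sh.o3   # the job's stdout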

Removing a Job from the Queue

Execute:

$ qdel <job id>

Find the job id with 'qstat -f':

queuename      qtype  used/tot.  load_avg  arch    states
----------------------------------------------------------------------------
compute-0-0q   BIP    2/2        99.99     glinux
    3 0 sge-qsub-t  bruno  r  06/03/2004 02:48:15  MASTER
    3 0 sge-qsub-t  bruno  r  06/03/2004 02:48:15  SLAVE

To remove the job above:

$ qdel 3

Monitoring SGE Via The Web

Set up access to the web server.

Local access: configure X with 'redhat-config-xfree86'.

Remote access: open the http port in "/etc/sysconfig/iptables", or use ssh port forwarding:

$ ssh root@stakkato.rocksclusters.org -L 8080:localhost:80

Then point your web browser to "http://localhost:8080"

Frontend Web Page

SGE Job Monitoring


Ganglia Monitoring


Scaling Up Linpack

Tell SGE to allocate more processors. Edit 'sge-qsub-test.sh' and change:

#$ -pe mpi 2

To:

#$ -pe mpi 4

Tell Linpack to use more processors. Edit 'HPL.dat' and change:

1 Ps

To:

2 Ps

The number of processors Linpack uses is P * Q. With the change above (P = 2, Q = 2), Linpack uses 2 * 2 = 4 processors, matching the 4 slots requested from SGE.

Scaling Up Linpack

Submit the larger job:

$ qsub sge-qsub-test.sh

To make Linpack use more memory (and increase performance), edit 'HPL.dat' and change:

1000 Ns

To:

4000 Ns

Linpack operates on an N * N matrix. Goal: consume 75% of the memory on each compute node.
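As a rough sizing check (assuming 8-byte double-precision matrix entries), the matrix alone needs about N * N * 8 bytes, so N = 4000 uses roughly 128 MB:

$ echo $(( 4000 * 4000 * 8 ))   # bytes needed for the N x N matrix
128000000

To target 75% of a node's memory, pick N near sqrt(0.75 * memory_in_bytes / 8).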

Using Linpack Over Myrinet

Get a test Myrinet SGE submission script:

$ cp /var/www/html/rocks-documentation/3.2.0/examples/sge-qsub-test-myri.sh .

Scale up the job in the same manner as described in the previous slides.

Submit the Myrinet-based job:

$ qsub sge-qsub-test-myri.sh

Executing Commands Across the Cluster

Collect 'ps' status: cluster-ps <regular expression>

To get the status of all the processes being executed by user 'bruno', execute:

cluster-ps bruno

Kill processes: cluster-kill <regular expression>

To kill all the Linpack jobs, execute:

cluster-kill xhpl

Execute any command-line executable: cluster-fork <command>

To restart the 'autofs' service on all compute nodes, execute:

cluster-fork "service autofs restart"

Executing Commands Across the Cluster

All cluster-* commands can query the database to generate a node list.

To restart the 'autofs' service only on the nodes in cabinet 1, execute:

cluster-fork --query="select name from nodes where rack=1" "service autofs restart"

Compile Code

Compile Test MPI Program with gcc

Compile cpi:

$ cp /opt/mpich/gnu/examples/cpi.c .
$ cp /opt/mpich/gnu/examples/Makefile .
$ make cpi
/opt/mpich/gnu/bin/mpicc -c cpi.c
/opt/mpich/gnu/bin/mpicc -o cpi cpi.o -lm

Run it:

$ /opt/mpich/gnu/bin/mpirun -nolocal -np 2 -machinefile machines $HOME/cpi/cpi
Process 0 on compute-2-1.local
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000650
Process 1 on compute-2-1.local


Compile MPI Code with Intel Compiler

Simply change 'gnu' to 'intel':

$ cp /opt/mpich/intel/examples/cpi.c $HOME
$ cp /opt/mpich/intel/examples/Makefile $HOME
$ make cpi
/opt/mpich/intel/bin/mpicc -c cpi.c
/opt/mpich/intel/bin/mpicc -o cpi cpi.o -lm

Bring In Your Own Code

FTP your code to the frontend. Let's compile it and try to run it!

Compute Node Management

Adding a Compute Node

Execute "insert-ethers".

If adding to a specific rack, for example cabinet 2:

# insert-ethers --cabinet=2

If adding to a specific location within a rack:

# insert-ethers --cabinet=2 --rank=4

Replacing a Dead Node

To replace node compute-0-4:

# insert-ethers --replace="compute-0-4"

Remove the dead node. Power up the new node and put the new node into "installation mode":

Boot with Rocks Base CD, PXE boot, etc.

The next node that issues a DHCP request will assume the role of compute-0-4

Removing a Node

If decommissioning a node:

# insert-ethers --remove="compute-0-2"

insert-ethers will remove all traces of compute-0-2 from the database and restart all relevant services. You will not be asked for any input.