
2/12/04

Distributed & Parallel Computing Cluster

Patrick McGuigan (mcguigan@cse.uta.edu)


DPCC Background

NSF-funded Major Research Instrumentation (MRI) grant

Goals

Personnel
– PI
– Co-PIs
– Senior Personnel
– Systems Administrator


DPCC Goals

Establish a regional Distributed and Parallel Computing Cluster at UTA (DPCC@UTA)

An inter-departmental and inter-institutional facility

Facilitate collaborative research that requires large-scale storage (terabytes to petabytes), high-speed access (gigabit or more), and mega processing (hundreds of processors)


DPCC Research Areas

Data mining / KDD – association rules, graph mining, stream processing, etc.

High Energy Physics – simulation, moving towards a regional DØ center

Dermatology / skin cancer – image database, lesion detection and monitoring

Distributed computing – grid computing, PICO

Networking – non-intrusive network performance evaluation

Software Engineering – formal specification and verification

Multimedia – video streaming, scene analysis

Facilitate collaborative efforts that need high-performance computing


DPCC Personnel

PI – Dr. Chakravarthy

Co-PIs
– Drs. Aslandogan, Das, Holder, Yu

Senior Personnel
– Paul Bergstresser, Kaushik De, Farhad Kamangar, David Kung, Mohan Kumar, David Levine, Jung-Hwan Oh, Gergely Zaruba

Systems Administrator
– Patrick McGuigan


DPCC Components

Establish a distributed memory cluster (150+ processors)

Establish a symmetric (shared-memory) multiprocessor system

Establish large, shareable, high-speed storage (hundreds of terabytes)


DPCC Cluster as of 2/1/2004

Located in 101 GACB

Inauguration 2/23 as part of E-Week

5 racks of equipment + UPS

[Photos of the cluster (scaled for presentation)]


DPCC Resources

97 machines
– 81 worker nodes
– 2 interactive nodes
– 10 IDE-based RAID servers
– 4 nodes supporting the Fibre Channel SAN

50+ TB storage
– 4.5 TB in each IDE RAID
– 5.2 TB in the FC SAN


DPCC Resources (continued)

1 Gb/s network interconnections
– core switch
– satellite switches

1 Gb/s SAN network

UPS


DPCC Layout

[Network diagram, summarized: a Foundry FastIron 800 core switch connects the interactive nodes master.dpcc.uta.edu and grid.dpcc.uta.edu, worker nodes node1–node32, IDE RAID servers raid3, raid6, raid8, raid9, and raid10, and five Linksys 3512 satellite switches. Each satellite switch serves a group of the remaining 49 worker nodes (node33–node81) together with one IDE RAID server (raid1, raid2, raid4, raid5, or raid7; 6 TB raw each). The Fibre Channel SAN consists of a Brocade 3200 FC switch (with four open ports), the ArcusII 5.2 TB array, gfsnode1–gfsnode3, and a lock server. The cluster reaches the campus network through a 100 Mb/s switch.]


DPCC Resource Details

Worker nodes
– Dual Xeon processors
  32 machines @ 2.4 GHz
  49 machines @ 2.6 GHz
– 2 GB RAM
– IDE storage
  32 machines @ 60 GB
  49 machines @ 80 GB
– Red Hat 7.3 Linux (2.4.20 kernel)


DPCC Resource Details (cont.)

RAID server
– Dual Xeon processors (2.4 GHz)
– 2 GB RAM
– 4 RAID controllers
  2-port controller (qty 1): mirrored OS disks
  8-port controllers (qty 3): RAID5 with hot spare
– 24 × 250 GB disks
– 2 × 40 GB disks
– NFS used to serve the worker nodes


DPCC Resource Details (cont.)

FC SAN
– RAID5 array
– 42 × 142 GB FC disks
– FC switch
– 3 GFS nodes
  Dual Xeon (2.4 GHz)
  2 GB RAM
  Global File System (GFS)
  Served to the cluster via NFS
– 1 GFS lock server


Using DPCC

Two nodes available for interactive use
– master.dpcc.uta.edu
– grid.dpcc.uta.edu

More nodes will likely be added to support other services (Web, DB access)

Access is through SSH (version 2 client)
– Freeware Windows clients are available (ssh.com)
– File transfers through SCP/SFTP, as in the example below
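A typical session from a Unix-like machine might look like this (the username and file names here are placeholders):

$ ssh yourname@master.dpcc.uta.edu                       # log in to an interactive node
$ scp results.tar.gz yourname@master.dpcc.uta.edu:~/     # copy a file to your home directory
$ sftp yourname@grid.dpcc.uta.edu                        # interactive file transfer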


Using DPCC (continued)

User quotas are not yet implemented on home directories, so be sensible in your usage.

Large data sets will be stored on the RAIDs (requires coordination with the systems administrator)

All storage is visible to all nodes.


Getting Accounts

Have your supervisor request an account

Account will be created

Bring your ID to 101 GACB to receive your password
– Keep your password safe

Log in to any interactive machine
– master.dpcc.uta.edu
– grid.dpcc.uta.edu

Use the yppasswd command to change your password (see the example below)

If you forget your password
– See me in my office and I will reset your password (with ID)
– Call or e-mail me and I will reset your password to the original password
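A password change looks roughly like this (the prompts are typical of yppasswd on a Red Hat 7.3 NIS client; exact wording may differ):

$ yppasswd
Changing NIS account information for mcguigan on master.cluster.
Please enter old password: ********
Please enter new password: ********
Please retype new password: ********
The NIS password has been changed on master.cluster.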


User environment

Default shell is bash
– Change it with ypchsh
– Customize your environment using startup files
  .bash_profile (login sessions)
  .bashrc (non-login sessions)
– Customize with statements like the following (a small sample follows below):
  export <variable>=<value>
  source <shell file>
– Much more information is in the man page
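As an illustration, a minimal ~/.bash_profile might look like this (the MPICH path is the one given later in this talk; the data-directory variable and /data11 path are just hypothetical examples):

# ~/.bash_profile - executed for login sessions
# pull in ~/.bashrc so login and non-login shells behave the same
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi

# put the MPICH tools on the PATH
export PATH=/usr/local/mpich-1.2.5/bin:$PATH

# hypothetical convenience variable for a shared data area (see the /dataxy naming later)
export DATA_DIR=/data11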


Program development tools

GCC 2.96
– C
– C++
– Java (gcj)
– Objective C
– CHILL

Java
– Sun J2SDK 1.4.2 (JDK)


Development tools (cont.)

Python
– python = version 1.5.2
– python2 = version 2.2.2

Perl
– version 5.6.1

Flex, Bison, gdb, …

If your favorite tool is not available, we'll consider adding it!


Batch Queue System

OpenPBS
– Server runs on master
– pbs_mom runs on the worker nodes
– Scheduler runs on master
– Jobs can be submitted from any interactive node
– User commands
  qsub – submit a job for execution
  qstat – determine the status of a job, queue, or server
  qdel – delete a job from the queue
  qalter – modify the attributes of a job
– Single queue (workq)


PBS qsub

qsub is used to submit jobs to PBS
– A job is represented by a shell script
– The shell script can alter the environment and proceed with execution
– The script may contain embedded PBS directives
– The script is responsible for starting parallel jobs (not PBS)


Hello World

[mcguigan@master pbs_examples]$ cat helloworld
echo Hello World from $HOSTNAME

[mcguigan@master pbs_examples]$ qsub helloworld
15795.master.cluster

[mcguigan@master pbs_examples]$ ls
helloworld  helloworld.e15795  helloworld.o15795

[mcguigan@master pbs_examples]$ more helloworld.o15795
Hello World from node1.cluster


Hello World (continued)

The job ID is returned from qsub

Default attributes allow the job to run with
– 1 node
– 1 CPU
– 36:00:00 of CPU time

Standard output and standard error streams are returned


Hello World (continued)

Environment of the job
– Defaults to the login shell (override with #! or the -S switch)
– Login environment variable list with PBS additions:
  PBS_O_HOST, PBS_O_QUEUE, PBS_O_WORKDIR, PBS_ENVIRONMENT, PBS_JOBID, PBS_JOBNAME, PBS_NODEFILE, PBS_QUEUE
– Additional environment variables may be transferred using the "-v" switch (see the example below)
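For instance, to pass a variable of your own into the job's environment (DATASET and the /data11 path are just illustrative names):

$ qsub -v DATASET=/data11/mydata helloworld

Inside the job script, $DATASET then expands to /data11/mydata.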


PBS Environment Variables

PBS_ENVIRONMENT
PBS_JOBCOOKIE
PBS_JOBID
PBS_JOBNAME
PBS_MOMPORT
PBS_NODENUM
PBS_O_HOME
PBS_O_HOST
PBS_O_LANG
PBS_O_LOGNAME
PBS_O_MAIL
PBS_O_PATH
PBS_O_QUEUE
PBS_O_SHELL
PBS_O_WORKDIR
PBS_QUEUE=workq
PBS_TASKNUM=1


qsub options

Output streams:
– -e (error output path)
– -o (standard output path)
– -j (join error + output as either output or error)

Mail options
– -m [aben] when to mail (abort, begin, end, none)
– -M who to mail

Name of job
– -N (15 printable characters max; the first must be alphabetic)

Which queue to submit the job to
– -q [name] (unimportant for now; there is only workq)

Environment variables
– -v pass specific variables
– -V pass all environment variables of the qsub environment to the job

Additional attributes
– -W specify dependencies

A combined example follows below.
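Putting a few of these options together (the e-mail address and script name are placeholders):

$ qsub -N myjob -j oe -o myjob.out -m ae -M someone@cse.uta.edu myscript

This submits myscript under the name myjob, merges stderr into stdout, writes the result to myjob.out, and sends mail on abort or end.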


Qsub options (continued)

The -l switch is used to specify needed resources
– Number of nodes
  nodes=x
– Number of processors
  ncpus=x
– CPU time
  cput=hh:mm:ss
– Walltime
  walltime=hh:mm:ss

See the man page for pbs_resources


Hello World

qsub -l nodes=1 -l ncpus=1 -l cput=36:00:00 -N helloworld -m a -q workq helloworld

Options can also be included in the script:

#PBS -l nodes=1
#PBS -l ncpus=1
#PBS -m a
#PBS -N helloworld2
#PBS -l cput=36:00:00

echo Hello World from $HOSTNAME


qstat

Used to determine the status of jobs, queues, and the server

$ qstat
$ qstat <job id>

Switches (an example follows below)
– -u <user> list jobs of the given user
– -f provides extra output
– -n shows the nodes given to a job
– -q shows the status of the queue
– -i shows idle jobs
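A typical check on a running job might look like this (the output below is illustrative; the exact columns depend on your OpenPBS version):

$ qstat 15795.master.cluster
Job id           Name        User      Time Use S Queue
---------------- ----------- --------- -------- - -----
15795.master     helloworld  mcguigan  00:00:01 R workq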


qdel & qalter

qdel is used to remove a job from a queue
– qdel <job ID>

qalter is used to alter the attributes of a currently queued job
– qalter <job id> attributes (attributes are specified as with qsub)
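For example, using the job ID from the earlier run (15796 is a made-up ID for a second, still-queued job):

$ qdel 15795.master.cluster                           # remove the job
$ qalter -l walltime=01:00:00 15796.master.cluster    # change a queued job's walltime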


Processing on a worker node

All RAID storage is visible to all nodes
– /dataxy, where x is the RAID ID and y is the volume (1-3)
– /gfsx, where x is the GFS volume (1-3)

Local storage on each worker node
– /scratch

Data-intensive applications should copy input data (when possible) to /scratch for manipulation and copy results back to RAID storage, as sketched below.
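A minimal job script following that pattern might look like this (the /data11 path, file names, and program are hypothetical; substitute your own):

#!/bin/sh
#PBS -N stage-example
#PBS -l nodes=1

# stage input from RAID storage to fast local disk
mkdir -p /scratch/$USER/$PBS_JOBID
cp /data11/input.dat /scratch/$USER/$PBS_JOBID/

# run against the local copy
cd /scratch/$USER/$PBS_JOBID
~/bin/myprogram input.dat > output.dat    # hypothetical program

# copy results back to RAID storage and clean up
cp output.dat /data11/
rm -rf /scratch/$USER/$PBS_JOBID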


Parallel Processing

MPI is installed on the interactive and worker nodes
– MPICH 1.2.5
– Path: /usr/local/mpich-1.2.5

Asking for multiple processors
– -l nodes=x
– -l ncpus=2x (two CPUs per dual-Xeon node)


Parallel Processing (continued)

A PBS node file is created when the job executes

It is available to the job via $PBS_NODEFILE

It is used to start processes on remote nodes via
– mpirun
– rsh


Using node file (example job)

#!/bin/sh

#PBS -m n

#PBS -l nodes=3:ppn=2

#PBS -l walltime=00:30:00

#PBS -j oe

#PBS -o helloworld.out

#PBS -N helloworld_mpi

NN=`cat $PBS_NODEFILE | wc -l`

echo "Processors received = "$NN

echo "script running on host `hostname`"

cd $PBS_O_WORKDIR

echo

echo "PBS NODE FILE"

cat $PBS_NODEFILE

echo

/usr/local/mpich-1.2.5/bin/mpirun -machinefile $PBS_NODEFILE -np $NN ~/mpi-example/helloworld


MPI

Shared memory vs. message passing

MPI
– A C-based library that allows programs to communicate
– Each cooperating execution runs the same program image
– Different images can do different computations based on the notion of rank
– MPI primitives allow for the construction of more sophisticated synchronization mechanisms (barrier, mutex)


helloworld.c

#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int rank, size;
    char host[256];
    int val;

    val = gethostname( host, 255 );
    if ( val != 0 ) {
        strcpy( host, "UNKNOWN" );
    }

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    printf( "Hello world from node %s: process %d of %d\n", host, rank, size );
    MPI_Finalize();
    return 0;
}


Using MPI programs

Compiling
– $ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c

Executing
– $ /usr/local/mpich-1.2.5/bin/mpirun <options> helloworld
– Common options:
  -np number of processes to create
  -machinefile list of nodes to run on
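End to end, an interactive test run might look like this (the output is illustrative; its order can vary between runs, and the node names are examples):

$ /usr/local/mpich-1.2.5/bin/mpicc -o helloworld helloworld.c
$ /usr/local/mpich-1.2.5/bin/mpirun -np 2 -machinefile mynodes helloworld
Hello world from node node1.cluster: process 0 of 2
Hello world from node node2.cluster: process 1 of 2

Here mynodes is a file listing one hostname per line; inside a PBS job, use $PBS_NODEFILE instead, as in the earlier example script.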


Resources for MPI

http://www-hep.uta.edu/~mcguigan/dpcc/mpi
– MPI documentation

http://www-unix.mcs.anl.gov/mpi/indexold.html
– Links to various tutorials

Parallel programming course