Page 1

Joint High Performance Computing Exchange (JHPCE) Cluster Overview

The JHPCE staff:
  Director: Fernando J. Pineda
  Technology Manager: Mark Miller
  Systems Engineer: Jiong Yang

[email protected]
[email protected]
http://www.jhpce.jhu.edu

Page 2

Introductions
- Name
- Department
- How do you plan on using the cluster?
- Will you be accessing the cluster from a Mac or a Windows system?
- What is your experience with Unix?
- Any experience using other clusters?

Page 3

What is a cluster?
- A collection of many powerful computers that can be shared with many users.

Page 4

Node (Computer) Components
- Each computer is called a "Node"
- Each node, just like a desktop/laptop, has
  - RAM
  - CPU
    - Cores
  - Disk space
- Unlike desktop/laptop systems, nodes do not make use of a display/mouse – they are used from a command-line interface
- Each node is about the size of a pizza box

Page 5

The JHPCE cluster components
- Joint High Performance Computing Exchange - pronounced "gypsy"
- Used for a wide range of biostatistics work – gene sequence analysis, population simulations
- Common applications: R, SAS, Stata, perl
- 56 Nodes
- Nodes range from having 2 CPUs to 4 CPUs, with between 4 and 64 cores. In total we have over 2000 cores.
- RAM ranges from 20 GB to 512 GB
- Disk space
  - Locally attached "scratch" disk space from 250 GB to 6 TB
  - Network Attached Storage (NFS) – 1000 TB

Page 6

Why would you use a cluster?
- You need resources not available on your local laptop
- You need to run a program (job) that will run for a very long time
- You need to run a job that can make use of parallel computing

Page 7

How do jobs get scheduled?
- We use a product called "Sun Grid Engine" (SGE) that schedules jobs
- Jobs are assigned to slots as they become available and meet the resource requirements of the job
- Jobs are submitted to queues
  - Shared Queue
  - Designated queues
- The cluster nodes can also be used interactively.

Page 8

How do you use the cluster?
- The JHPCE cluster is accessed using SSH.
- Use ssh to log in to "jhpce01.jhsph.edu"
- For Mac users, you can use ssh from a Terminal window. You also need XQuartz for X Window applications:
  http://xquartz.macosforge.org/landing/
- For PC users, you need to install an ssh client – such as Cygwin, PuTTY, or WinSCP:
  http://www.cygwin.com
  http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  http://winscp.net
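For example, logging in from a Mac Terminal or a Cygwin shell might look like the line below (a sketch: the -X flag enables X11 forwarding so X Window applications can display locally; replace <USER> with your JHPCE username):

  ssh -X <USER>@jhpce01.jhsph.edu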

Page 9

JHPCE user-model

(Architecture diagram, 10/2010)
- Workstations, desktops & laptops connect over a 100 Mbs link to the login servers (jhpce01/02)
- A "compute farm" of compute nodes (hosts) – 2316 compute cores in total – reached from the login servers via 1000 Mbs switches
- Each compute node has direct-attached local disks (internal): /tmp, /scratch/temp
- A storage farm of NFS-exported file systems:
  - /thumper (24TB raw) and /thumper2 (24TB raw) on Sunfire X4500 servers
  - /home (and archives) (70.5TB raw) on a Sun 7210 (plus expansion tray)
  - /nexsan (10.5TB raw) and /nexsan2 (14TB raw) on Nexsan SATABoys

F. Pineda, M. Newhouse

Page 10

Example 1 – Logging in
- Bring up Terminal
- Run: ssh <USER>@jhpce01.jhsph.edu
- 2-factor authentication
- Shell prompt

Page 11

Lab 1 - Logging In
- Bring up Terminal or start PuTTY
- Run: ssh <USER>@jhpce01.jhsph.edu
- Change your password with the "passwd" command
- Set up 2-factor authentication
  - http://jhpce.jhu.edu/knowledge-base/how-to/2-factor-authentication/
- Log in again with 2-factor authentication
- Set up ssh keys (a sketch follows below)
- Note: 25 GB limit on home directory.
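A minimal sketch of the ssh key setup, assuming an OpenSSH client on your local machine (run these on your workstation, not on the cluster):

  # generate a key pair if you do not already have one
  ssh-keygen -t rsa
  # copy the public key into your authorized_keys file on the cluster
  ssh-copy-id <USER>@jhpce01.jhsph.edu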

Page 12

General Linux/Unix Commands
- ls
  - ls -l
  - ls -al
- pwd
- cd
- date
- echo
- sleep
- control-C
- cp
- rm
- mv
- mkdir
- rmdir
- more, cat
- Editors: nano, vi
- Wildcards
- man
- Good overview of Unix: http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.html
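A short example session tying a few of these commands together (the directory and file names are just placeholders):

  pwd                       # show the current directory
  mkdir testdir             # make a new directory
  cd testdir
  echo "hello" > note.txt   # create a small file
  ls -l                     # long listing
  cat note.txt              # show its contents
  cd ..
  rm -r testdir             # remove the directory and its contents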

Page 13

Submitting jobs to the queue with Sun Grid Engine (SGE)
- qsub – allows you to submit a batch job to the cluster
- qrsh – allows you to establish an interactive job
- qstat – allows you to see the status of your job

Page 14

JHPCE user-model
(Architecture diagram repeated – see Page 9)

Page 15

Example/Lab 2 - Using the cluster

Example 2a – submitting a batch job
  cd class-scripts
  qsub -cwd script1
  qstat
  examine results files

Example 2b – using the interactive queue
  qrsh
  Note current directory
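The contents of script1 are not reproduced in the slides; a minimal batch script along these lines might look like the following sketch (the commands are examples, not the actual class script):

  #!/bin/bash
  # report where the job ran and when
  echo "Running on host: $(hostname)"
  date
  sleep 60
  date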

Page 16

Never run a job on the login node!
- Always use "qsub" or "qrsh" to make use of the compute nodes
- Jobs that are found running on the login node may be killed at will
- If you are going to be compiling programs, do so on a compute node via qrsh.

Page 17

Other useful SGE commands
- qstat – shows information about jobs you are running
- qhost – shows information about the nodes
- qdel – deletes your job

Page 18

A few options for "qsub"
• run the job in the current working directory (where qsub was executed) rather than the default (home directory)
  -cwd
• send the standard output (or error) stream to a different file
  -o path/filename
  -e path/filename
• receive notification via email when your job completes
  -m e
  -M [email protected]

F. Pineda, M. Newhouse
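Putting several of these options together in one submission might look like this (a sketch; the script name and log file paths are placeholders, and the logs directory must already exist):

  qsub -cwd -o logs/myjob.out -e logs/myjob.err -m e -M [email protected] myscript.sh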

Page 19

Requesting resources in qsub
- By default, when you submit a job, you are allotted 1 slot and are requesting 2 GB of RAM
- Note the .sge_request file
- If you need more, you can request more RAM and slots by 2 methods:
  - Options to the qsub command
    qsub -l mem_free=10G,h_vmem=11G . . .
  - Setting them up in your batch job script (a complete script sketch follows below)
    #$ -l mem_free=10.0G
    #$ -l h_vmem=11G
    #$ -m e
    #$ -M [email protected]
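For reference, a complete batch script with the SGE options embedded as #$ directives might look like the following (a sketch; the resource values and the script body are examples only):

  #!/bin/bash
  #$ -cwd
  #$ -l mem_free=10G,h_vmem=11G
  #$ -m e
  #$ -M [email protected]

  # the actual work goes here (placeholder command)
  echo "Job started on $(hostname)"
  ./my_analysis_program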

 

 

Page 20

Estimating RAM usage
- No easy formula until you've actually run something
- A good place to start is the size of the files you will be reading in
- If you are going to be running many of the same type of job, run one job, and use "qacct -j <jobid>" to look at "maxvmem" (see the example below).
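For example (the job ID below is a placeholder; use the ID that qsub printed when your job was submitted):

  # submit a single representative job first
  qsub -cwd myscript.sh
  # after it finishes, check its peak memory use
  qacct -j 1234567 | grep maxvmem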

Page 21

Multiple slots for Multi-threaded jobs

• Use the -pe local N option to request multiple slots on a cluster node for a multi-threaded job or session
• The mem_free and h_vmem limits need to be divided by the number of cores/slots requested. Examples:
  - A job which will use 8 cores and needs 40GB of RAM:
    qsub -cwd -pe local 8 -l mem_free=5G,h_vmem=6G myscript.sh
  - A job which will use 10 cores and needs 120GB of RAM:
    qsub -cwd -pe local 10 -l mem_free=12G,h_vmem=13G myscript2.sh

Note: The local parallel environment is a construct name particular to our cluster … and implies that your request is for that many slots on a single cluster node.

F. Pineda, M. Newhouse

Page 22

Types of parallelism
1. Embarrassingly parallel … http://en.wikipedia.org/wiki/Embarrassingly_parallel
2. Multi-core (or multi-threaded) – a single job using multiple CPU cores via program threads on a single machine (cluster node). Also see the discussion of fine-grained vs coarse-grained parallelism at http://en.wikipedia.org/wiki/Parallel_computing
3. Many CPU cores on many nodes using a Message Passing Interface (MPI) environment. Not used much on the JHPCE Cluster.

F. Pineda, M. Newhouse

Page 23

File size limitations
- There is a default file size limitation of 10GB for all jobs.
- If you are going to be creating files larger than 10GB, you need to use the "h_fsize" option.
- Example:
  $ qsub -cwd -l h_fsize=50G myscript.sh

Page 24

JHPCE user-model
(Architecture diagram repeated – see Page 9)

Page 25

Example of staging data to $TMPDIR

  #!/bin/bash
  #####################
  # stage the input data
  #####################
  # (1) copy 628MB of compressed data
  mkdir $TMPDIR/data
  cp ~/data/test.tar.Z $TMPDIR/data/.
  cd $TMPDIR/data

  # (2) decompress and untar the data
  tar -xzvf test.tar.Z
  du -ch $TMPDIR/data | grep total

  #####################
  # Do the processing
  #####################
  # in this case just move the data to a
  # temporary output directory and tar it again
  mkdir $TMPDIR/output
  mv $TMPDIR/data $TMPDIR/output
  cd $TMPDIR/output
  tar -czf results.tar.Z data

  #####################
  # save the results of the computation and quit
  #####################
  cp results.tar.Z $HOME/.
  exit 0

F. Pineda, M. Newhouse

Page 26

"How many jobs can I submit?"

As many as you want … but this should be less than 10,000. However, only a limited number of "slots" will run at once; the rest will have the queue wait state 'qw' and will run as your other jobs finish. In SGE, a slot generally corresponds to a single CPU core. Currently, the maximum number of standard queue slots per user is 200. The maximum number of slots per user may change depending on the availability of cluster resources or special needs and requests. There are dedicated queues for stakeholders, which may have custom configurations.

F. Pineda, M. Newhouse

Page 27

Example/Lab 3 - Running R on the cluster
- Note the "r" files – Script file and R file
- Submit the Script file
  - qsub -cwd plot1.sh
- Run R commands interactively
- X Windows Setup
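The contents of plot1.sh are not shown in the slides; a wrapper of this sort might look like the following sketch (the module name and output file name are assumptions, not the actual class script):

  #!/bin/bash
  #$ -cwd
  # load R (module name assumed) and run the R script non-interactively
  module load R
  R CMD BATCH plot1.R plot1.Rout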

Page 28

Example 4 - Transferring files to the cluster
- Transfer results back:
  scp [email protected]:class-scripts/plot1-R-results.pdf .
  Or use WinSCP, Filezilla…
- Look at the resulting PDF file
- Look at the PDF via xpdf

Page 29

Other queues
- "shared" queue – default queue
- "download" queue (jhpce03.jhsph.edu)
  - qrsh -l download
- "math" queue
  - qrsh -l math
- "sas" queue
  - qrsh -l sas

Page 30

Modules
Modules for R, SAS, Mathematica . . .
- module list
- module avail
- module load
- module unload
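A short example session (the module name below is a placeholder; run "module avail" to see what is actually installed on the cluster):

  module list         # show currently loaded modules
  module avail        # list available modules
  module load sas     # load a module (name assumed)
  module unload sas   # unload it again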

Page 31

Summary
- Get familiar with Linux
- Use ssh to connect to the JHPCE cluster
- Use qsub and qrsh to submit jobs
- Never run jobs on the login node
- Helpful resources
  - http://www.jhpce.jhu.edu/
  - [email protected] - System issues
  - [email protected] - Application issues
- Initial Login Process
  - You will receive an email with your login ID and initial password.
  - Log in to the cluster and change your password with the "passwd" command.
  - Home directory limited to 25GB.

Page 32

© 2014, Johns Hopkins University. All rights reserved.

Thanks for attending! Questions?