Post on 12-Jul-2020

Research Computing

UNIVERSITY OF COLORADO

The JANUS Computing Environment

Monte Lunacek

monte.lunacek@colorado.edu

rc-help@colorado.edu

Thursday, June 21, 12

What is JANUS?

November, 2011

1,368 Compute nodes

16,416 processors

~ 20 GB of available space

~ 800 TB of storage

2.8 GHz Intel Westmere

TFLOPS is a rate of execution, trillions of floating point operations per second

NUMA Architecture

Resource Management and “queues”

Parallel file systems

Different architectures

Explicit environment

Lots of ways to do something...

Online resources

www.rc.colorado.edu

Overview

Access

Login, file system, data transfer

Software

Supported software, dotkits, building software

Resource Management

Queues, Moab, and Torque

Running Jobs

Single-core, load-balanced, MPI, OpenMP

Questions

Access

Login Procedure

ssh <username>@login.rc.colorado.edu

Password: Yubikeys or Cryptocards

RC Filesystem

Home directory

/home/<user_name>

2 GB, Network File System (NFS)

Project space

/projects/<user_name>

250 GB, NFS

Scratch space

/lustre/janus_scratch/<user_name>

No quota, no backup

Lustre file system

Build software in project space (/projects/<user_name>)

Run software in scratch space (/lustre/janus_scratch/<user_name>)

Snapshot

Did you accidentally remove a file or directory?

$HOME/.snapshot/hourly.[0-12]
$HOME/.snapshot/nightly.[0-6]
$HOME/.snapshot/weekly.[0-7]

Example

rm $HOME/bugreport.csh
cp $HOME/.snapshot/weekly.0/bugreport.csh $HOME

Where?

$HOME/.snapshot
/projects/<user_name>/.snapshot
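Before copying a file back, it can help to see which snapshots still hold a copy. The sketch below is one way to do that. The function name list_snapshot_copies is hypothetical (not an RC tool), and the code only assumes the hourly.N, nightly.N and weekly.N directory names listed on the slide.

```shell
#!/bin/bash
# Hypothetical helper (not an RC-provided tool): print every snapshot copy
# of a file so you can choose which one to restore.
# Usage: list_snapshot_copies <snapshot_root> <path_relative_to_that_root>
list_snapshot_copies() {
    local root="$1" rel="$2" dir
    for dir in "$root"/hourly.* "$root"/nightly.* "$root"/weekly.*; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -e "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
        fi
    done
}
```

For example, list_snapshot_copies "$HOME/.snapshot" bugreport.csh prints one path per snapshot that contains the file; any of those paths can then be given to cp as in the example above.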

Lustre

Scalable, POSIX-compliant parallel file system designed for large, distributed-memory systems

Object Storage Targets (OST)

Store user file data

Object Storage Servers (OSS)

Control I/O access and handle network requests

Metadata Target (MDT)

Stores filenames, directories, permissions and file layout

Metadata Server (MDS)

Assigns the storage locations associated with each file in order to direct file I/O requests to the correct set of OSTs

[Diagram: MDS with its MDT and OSS with its OST, connected over IB.]

Metadata server (MDS) and target (MDT)

Object storage server (OSS) and target (OST)

File Access

[Diagram: IB network linking the MDS and MDT with the OSS and OST.]

Compute node requests storage location from the MDS

Compute node then interacts directly with OST

Striping

File - contiguous sequence of bytes

Key feature: the Lustre file system can distribute segments of a file across multiple OSTs using a technique called file striping.

A file is said to be striped when its contiguous sequence of bytes is separated into small chunks, or stripes, so that read and write operations can access multiple OSTs concurrently.
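The mapping from a byte offset to a stripe can be worked out with integer arithmetic. The sketch below assumes plain round-robin striping; the helper name stripe_slot is hypothetical, and the 32 MiB stripe size and stripe count of 3 are illustrative values. The result is a slot number among the file's own OSTs, not a real OST index.

```shell
#!/bin/bash
# Hypothetical helper: which of the file's stripes (0-based slot) holds a
# given byte offset, assuming simple round-robin striping.
stripe_slot() {
    local offset="$1" stripe_size="$2" stripe_count="$3"
    echo $(( (offset / stripe_size) % stripe_count ))
}

stripe_size=$((32 * 1024 * 1024))   # 32 MiB = 33554432 bytes (illustrative)
stripe_count=3                      # illustrative

for mib in 0 32 64 96; do
    offset=$((mib * 1024 * 1024))
    slot=$(stripe_slot "$offset" "$stripe_size" "$stripe_count")
    echo "offset ${mib} MiB -> slot ${slot}"
done
```

With these values the slots cycle 0, 1, 2 and then wrap back to 0, which is why reads and writes at different offsets can reach different OSTs at the same time.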

[Figure: /file shown whole and split into stripes.]

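File I/O

Serial: a single process writes /file

File-per-process: each process writes its own file (/file1, /file2, ..., /filen)

Shared file: all processes write to the same /file

Collective Buffering: Not currently supported on JANUS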

Single processor

[Figure: write speed (Mb/s), 0 to 800, versus stripe count (1, 2, 4, 8, 15, 30, 60), for transfer sizes of 1 MB and 32 MB.]

File per processor

[Figure: write speed (Mb/s), 0 to 12000, versus number of processors (files), 1 to 2048.]

Shared-file with striping

[Figure: write speed (Mb/s), 1000 to 7000, versus number of processors (files), 1 to 1024.]

Examples

bash-janus> mkdir temp_dir

bash-janus> lfs setstripe -c 3 temp_dir

bash-janus> touch temp_dir/temp_file

bash-janus> lfs getstripe temp_dir

temp_dir
stripe_count: 3 stripe_size: 33554432 stripe_offset: -1

temp_dir/temp_file
lmm_stripe_count: 3
lmm_stripe_size: 33554432
lmm_stripe_offset: 18
    obdidx    objid       objid       group
    18        12787913    0xc320c9    0
    7         12863377    0xc44791    0
    23        12496893    0xbeaffd    0

Data transfer

https://www.rc.colorado.edu/crcdocs/file-transfer

GridFTP

GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks

Globus Online

Large file transfers with “drag and drop archiving” to move data between long-term archival storage and compute systems

Utilities

scp, sftp, rsync

Good for small files

Access tips

Control Sockets

One-time passwords make multiple terminal sessions and file transfer painful.

mkdir -p ~/.ssh/sockets
cat >> ~/.ssh/config << EOF
Host login.rc*
  ControlMaster auto
  ControlPath ~/.ssh/sockets/%r@%h:%p
EOF

Mount Drive

http://macfusionapp.org/

Symbolic links

/project, /scratch

Software

Software support

[Diagram axes: RC expertise (less general) and user expertise.]

Unsupported software

Installation

Consulting

Advice on installing your software and any dependencies

Supported software

Select state-of-the-art software

Installation, verification, and training

Environment

To run an executable, you need to know where it is.

/opt/openmpi/1.4.4/bin/mpicxx

/opt/mpich2/1.5a2/bin/mpicxx

Which one does the command which mpicxx use?

PATH

What about libraries?

/opt/openmpi/1.4.4/lib/libmpi.so

/opt/mpich2/1.5a2/lib/libmpi.so

LD_LIBRARY_PATH
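A small experiment makes the PATH rule concrete: the shell scans PATH from left to right and runs the first match. The sketch below builds two stand-in mpicxx scripts in a temporary directory (hypothetical files that only print a label, not real compilers) and shows that reordering PATH changes which one is found. LD_LIBRARY_PATH is searched in the same left-to-right fashion for shared libraries.

```shell
#!/bin/bash
# Sketch: the first PATH entry that contains the command wins.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/openmpi/bin" "$demo_root/mpich2/bin"

# Two stand-in "mpicxx" scripts (hypothetical; they only print their origin).
for impl in openmpi mpich2; do
    printf '#!/bin/sh\necho "%s mpicxx"\n' "$impl" > "$demo_root/$impl/bin/mpicxx"
    chmod +x "$demo_root/$impl/bin/mpicxx"
done

# openmpi comes first in PATH, so its mpicxx is the one that is found.
PATH="$demo_root/openmpi/bin:$demo_root/mpich2/bin:$PATH"
hash -r                      # forget any cached command locations
first=$(command -v mpicxx)

# Prepending mpich2 changes the answer without touching any files.
PATH="$demo_root/mpich2/bin:$PATH"
hash -r
second=$(command -v mpicxx)

echo "first:  $first"
echo "second: $second"
```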

Dotkit

Manages your environment variables

use                     list packages in use
use -a                  list hidden packages in use
use <package_name>      add a package to environment
unuse <package_name>    remove package from environment
use -la                 list available packages
use -la <term>          list packages that contain <term>

Examples

use NCAR-Parallel-Intel

bash-janus> echo $PATH
/curc/tools/free/redhat_5_x86_64/parallel-netcdf-1.2.0_openmpi-1.4.5_intel-12.1.4/bin
/curc/tools/free/redhat_5_x86_64/openmpi-1.4.5_intel-12.1.4/bin
/curc/tools/free/redhat_5_x86_64/torque-2.5.8/bin
/curc/tools/free/redhat_5_x86_64/netcdf-4.1.3_intel-12.1.4_hdf-4.2.6_hdf5-1.8.8_openmpi-1.4.5/bin
/curc/tools/free/redhat_5_x86_64/hdf5-1.8.8_openmpi-1.4.5_intel-12.1.4/bin
/curc/tools/nonfree/redhat_5_x86_64/intel-12.1.4/composer_xe_2011_sp1.10.319/bin/intel64
/curc/tools/free/redhat_5_x86_64/sun_jdk-1.6.0_23-x86_64/bin
/curc/tools/free/redhat_5_x86_64/hdf-4.2.6_ics-2012.0.032/bin
/curc/tools/free/redhat_5_x86_64/szip-2.1/bin
/curc/tools/nonfree/redhat_5_x86_64/moab-6.1.5/bin

Building Software

I need the Boost C++ library for my software. Where should I build this?

/home/molu8455/projects/software/boost/1.49.0

Build on a compute node (e.g. qsub -I)

Ideas

Consider sharing this with your group.

How about your own dotkit?

Build your own dotkit

cat $HOME/.kits/TeachingHPC.dk

#c Teaching HPC
#d This contains the libraries I use for teaching HPC:
#d .openmpi-1.4.3_gcc-4.5.2_torque-2.5.8_ib
#d .hdf5-1.8.6

# Dependencies
dk_op -q .torque-2.5.8
dk_op -q .openmpi-1.4.3_gcc-4.5.2_torque-2.5.8_ib
dk_op -q .hdf5-1.8.6

# Variables
dk_alter HDF5_DIR /curc/tools/free/redhat_5_x86_64/hdf5-1.8.6
dk_alter BOOST_ROOT /home/molu8455/projects/software/boost/1.49.0

dk_alter LD_LIBRARY_PATH /home/molu8455/projects/software/boost/1.49.0/lib

Resource Management

Scheduling

[Figure: jobs numbered 1 to 7 laid out on a schedule; axes: Time and Nodes.]

Scheduling

[Figure: the same jobs 1 to 7 in a second schedule layout; axes: Time and Nodes.]

Moab and Torque

Moab

Brains of the operation

Comes up with the “schedule”

Torque

Reports information to Moab

Receives direction from Moab

Handles user requests

Provides job query facilities

Commands

showq -u <username>          Show jobs in the queue
canceljob <job_id> or ALL    Cancel your job(s)
checkjob <job_id>            Information about your job
qsub                         Submit jobs
showstart <job_id>           When will your job start?

qsub

Request a resource for your job

1) batch or 2) interactive

Makes environment variables available to your job

PBS_O_*
PBS_O_WORKDIR
PBS_NODEFILE

Options

-q <queue_name>
-l <resource_list>
-I interactive
-N <name>
-e <error_path>
-o <output_path>
-j <join_path>

Queues

Name            Nodes     Max Time    Node Sharing
janus-debug     1-480     1 hour
janus-short     1-480     4 hours
janus-long      1-80      7 days
janus-small     1-20      1 day
janus-normal    21-80     1 day
janus-wide      81-480    1 day

Running Jobs

Process

How many processors do I need?

Approximately how long will this take?

showstart 1024@30:00
showstart 16@16:00:00

Which queue best fits these criteria?

[Figure: Nodes versus Time sketch.]

Name            Nodes     Max Time    Node Sharing
janus-debug     1-480     1 hour
janus-short     1-480     4 hours
janus-long      1-80      7 days
janus-small     1-20      1 day
janus-normal    21-80     1 day
janus-wide      81-480    1 day
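The queue table can be read as a simple decision rule. The sketch below encodes the node ranges and time limits from the table in a hypothetical helper named pick_queue. Preferring the queue with the shortest time limit that fits is an assumption of this sketch, not RC policy, and wall time is taken in whole hours to keep the arithmetic simple.

```shell
#!/bin/bash
# Hypothetical helper: suggest a queue from the table above.
# Arguments: node count, wall time in whole hours (rounded up).
# Assumption: prefer the queue with the shortest time limit that fits.
pick_queue() {
    local nodes="$1" hours="$2"
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt 480 ]; then
        echo "none"
        return 1
    fi
    if [ "$hours" -le 1 ]; then
        echo "janus-debug"
    elif [ "$hours" -le 4 ]; then
        echo "janus-short"
    elif [ "$hours" -le 24 ]; then
        if [ "$nodes" -le 20 ]; then
            echo "janus-small"
        elif [ "$nodes" -le 80 ]; then
            echo "janus-normal"
        else
            echo "janus-wide"
        fi
    elif [ "$hours" -le 168 ] && [ "$nodes" -le 80 ]; then
        echo "janus-long"        # 7 days = 168 hours
    else
        echo "none"
        return 1
    fi
}
```

For example, pick_queue 16 16 suggests janus-small; showstart can then be used to compare the expected start time against other shapes of the same job.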

Serial Jobs

#!/bin/bash

#PBS -N example_1
#PBS -q janus-debug
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=1
#PBS -e errfile
#PBS -o outfile

cd $PBS_O_WORKDIR

# run trial 1 of the simulator
./simulator 1 > sim.1

Pack the node

#!/bin/bash

#PBS -N example_2
#PBS -q janus-debug
#PBS -l walltime=0:00:30,nodes=1:ppn=12

cd $PBS_O_WORKDIR

# start 12 trials in the background, one per core
./simulator 1 > sim.1 &
./simulator 2 > sim.2 &
./simulator 3 > sim.3 &
./simulator 4 > sim.4 &
./simulator 5 > sim.5 &
./simulator 6 > sim.6 &
./simulator 7 > sim.7 &
./simulator 8 > sim.8 &
./simulator 9 > sim.9 &
./simulator 10 > sim.10 &
./simulator 11 > sim.11 &
./simulator 12 > sim.12 &

# keep the job alive until every background trial has finished
wait
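The twelve launch lines follow one pattern, so they can also be written as a loop. The sketch below shows that pattern in a form that runs anywhere: the simulator function is a stand-in for the real ./simulator executable, and the temporary output directory is an assumption made so the example does not write into the current directory.

```shell
#!/bin/bash
# Stand-in for ./simulator so the sketch runs outside the cluster (hypothetical).
simulator() {
    echo "trial $1 finished"
}

outdir=$(mktemp -d)   # assumed scratch location for this demonstration
ntasks=12             # one task per core on a 12-core node

i=1
while [ "$i" -le "$ntasks" ]; do
    simulator "$i" > "$outdir/sim.$i" &   # launch each trial in the background
    i=$((i + 1))
done

wait   # block until every background trial has exited
echo "all $ntasks trials done"
```

In a real job script the function would be dropped and the loop body would call ./simulator directly, with the wait kept at the end so the job does not exit early.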

Multi-node serial jobs?

Consider using our load-balancing tool.

https://www.rc.colorado.edu/tutorials/loadbalance

#!/bin/bash
#PBS -N example_1
#PBS -q janus-debug
#PBS -l walltime=00:05:00
#PBS -l nodes=2:ppn=12

cd $PBS_O_WORKDIR

. /curc/tools/utils/dkinit
reuse LoadBalance

mpirun load_balance -f cmd_lines

Contents of cmd_lines:

./simulator 1 > sim.1
./simulator 2 > sim.2
./simulator 3 > sim.3
./simulator 4 > sim.4
./simulator 5 > sim.5
./simulator 6 > sim.6
./simulator 7 > sim.7
./simulator 8 > sim.8
./simulator 9 > sim.9
./simulator 10 > sim.10
...
./simulator 2000 > sim.2000
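A file with 2000 command lines is easier to generate than to type. The sketch below writes one command per line in the format shown on the slide; the file name cmd_lines and the trial count come from the slide, while the loop itself is just one possible way to produce the file.

```shell
#!/bin/bash
# Generate the command file read by "load_balance -f cmd_lines".
ntrials=2000
outfile="cmd_lines"

i=1
while [ "$i" -le "$ntrials" ]; do
    # Each line is a complete shell command, including its own redirect.
    echo "./simulator $i > sim.$i"
    i=$((i + 1))
done > "$outfile"

echo "wrote $ntrials commands to $outfile"
```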

MPI

#!/bin/bash

#PBS -N example_4
#PBS -q janus-debug
#PBS -l walltime=0:10:00
#PBS -l nodes=3:ppn=12

cd $PBS_O_WORKDIR
reuse .openmpi-1.4.5_intel-12.1.4

# run the simulator on all 36 cores (3 nodes x 12 cores)
mpirun -np 36 ./simulator

# alternative form from the slide, without an explicit process count:
# mpirun ./simulator
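Rather than hard-coding 36, the process count can be derived from the node file that Torque provides through PBS_NODEFILE, which lists one hostname per allocated processor slot. The sketch below is a hedged illustration: count_slots is a hypothetical helper, and the mock node file with made-up hostnames exists only so the example runs outside a job.

```shell
#!/bin/bash
# Hypothetical helper: count the processor slots listed in a node file.
count_slots() {
    grep -c . "$1"    # number of non-empty lines, one per slot
}

# Outside a job PBS_NODEFILE is unset, so build a mock file
# (made-up hostnames, 3 nodes x 12 slots) for demonstration.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    for host in node01 node02 node03; do
        i=0
        while [ "$i" -lt 12 ]; do
            echo "$host"
            i=$((i + 1))
        done
    done > "$PBS_NODEFILE"
fi

np=$(count_slots "$PBS_NODEFILE")
echo "mpirun -np $np ./simulator"   # printed, not executed, in this sketch
```

Inside a job script the mock section is unnecessary, and the last line would run mpirun instead of printing it.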

Non-Uniform Memory Access (NUMA)

Each socket has a dedicated memory area for high speed access

Also has an interconnect to other sockets for slower access to the other sockets' memory

[Diagram: two sockets, each with its own memory control and memory.]

MPI OpenMP / High Memory

#!/bin/bash

#PBS -N example_5
#PBS -q janus-debug
#PBS -l walltime=0:10:00
#PBS -l nodes=3:ppn=12

cd $PBS_O_WORKDIR
. /curc/tools/utils/dkinit
reuse .openmpi-1.4.5_intel-12.1.4

# Option 1: one MPI process per node, 12 OpenMP threads each
export OMP_NUM_THREADS=12
mpirun --bind-to-core --bynode --npernode 1 ./simulator

# Option 2: one MPI process per socket, 6 OpenMP threads each
export OMP_NUM_THREADS=6
mpirun --bind-to-socket --bysocket --npersocket 1 ./simulator
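Both hybrid layouts should keep every requested core busy, which is easy to check with arithmetic. The sketch below assumes two sockets of six cores per node, a layout inferred from ppn=12 and OMP_NUM_THREADS=6 per socket rather than stated on the slides.

```shell
#!/bin/bash
# Assumed node layout (inferred from ppn=12 and 6 threads per socket).
nodes=3
sockets_per_node=2
cores_per_socket=6

total_cores=$((nodes * sockets_per_node * cores_per_socket))

# Layout A: one MPI rank per node, threads fill the whole node.
ranks_a=$nodes
threads_a=$((sockets_per_node * cores_per_socket))

# Layout B: one MPI rank per socket, threads stay within that socket.
ranks_b=$((nodes * sockets_per_node))
threads_b=$cores_per_socket

echo "A: $ranks_a ranks x $threads_a threads = $((ranks_a * threads_a)) of $total_cores cores"
echo "B: $ranks_b ranks x $threads_b threads = $((ranks_b * threads_b)) of $total_cores cores"
```

Layout B keeps each rank's threads on one socket, so their memory accesses stay local to that socket's memory.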

Summary

Access

Use control sockets for login

Filesystem

Build software in /projects/<username>

Run your jobs in /lustre/janus_scratch/<user_name>

Recover files with .snapshot

Consider striping when using shared-file access.

Data Transfer

Large files: Globus Online, GridFTP

Smaller files: sftp, scp

Software

Build on compute node.

Manage environment with your own dotkits.

Resource Management

Familiarize yourself with the queues

When you have choices... showstart

Running Jobs

Request what you need and manage with LoadBalance

OpenMP: be aware of NUMA

Limit the number of processes per node for hybrid and high memory

Questions?

Collective buffering

At large core counts, I/O performance can be hindered by:

MDS contention (file-per-process)

file system contention (shared-file)

Use a subset of application processes to perform I/O.

Limits the number of files (file-per-process)

Limits the number of processes accessing file system resources (shared-file)

Offloads work from the file system to the application

A subset of processors write - reducing contention
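The idea of letting a subset of processes do the I/O can be illustrated by choosing one writer per group of ranks. The sketch below is only an illustration of the selection rule: is_aggregator and aggregator_of are hypothetical names, the group size of 12 (one writer per 12-core node) is an assumption, and a real application would make this choice inside its MPI code.

```shell
#!/bin/bash
# Hypothetical helpers illustrating "a subset of processes performs I/O".
is_aggregator() {
    # Rank $1 writes if it is the first rank in its group of size $2.
    [ $(( $1 % $2 )) -eq 0 ]
}

aggregator_of() {
    # The writer that collects data from rank $1, for group size $2.
    echo $(( ($1 / $2) * $2 ))
}

nranks=36
group=12      # assumed: one writer per 12-core node
writers=""

rank=0
while [ "$rank" -lt "$nranks" ]; do
    if is_aggregator "$rank" "$group"; then
        writers="$writers $rank"
    fi
    rank=$((rank + 1))
done

echo "writers:$writers"
echo "rank 17 sends its data to rank $(aggregator_of 17 "$group")"
```

With 36 ranks and a group size of 12, only three processes touch the file system instead of 36, which is the contention reduction described above.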
