Page 1:

SAN DIEGO SUPERCOMPUTER CENTER

Overview of UCSD’s Triton Resource

A cost-effective, high performance shared resource for research computing

Page 2:

What is the Triton Resource?

• A medium-scale high performance computing (HPC) and data storage system

• Designed to serve the needs of UC researchers:
  • Turn-key, cost-competitive access to a robust computing resource
  • Supports computing research, scientific & engineering computing, and large-scale data analysis
  • Lengthy proposals & long waits for access are not required
  • Supports short- or long-term projects
  • Flexible usage models are accommodated
  • Free of the equipment headaches and staffing costs associated with maintaining a dedicated cluster

Page 3:

Triton Resource Components

• High Performance Network
• Data Oasis: 2,000–4,000 terabytes of disk storage for research data
• Petascale Data Analysis Facility (PDAF): unique SMP system for analyzing very large datasets. 28 nodes, 256/512 GB of memory per node, 8 quad-core AMD Shanghai processors per node (32 cores/node).
• Triton Compute Cluster (TCC): medium-scale cluster system for general-purpose HPC. 256 nodes, 24 GB of memory per node, 2 quad-core Nehalem processors per node (8 cores/node).
• High Performance File System
• Connections to high bandwidth research networks & the Internet

Page 4:

Flexible Usage Models

• Shared-queue access
  • Compute nodes are shared with other users
  • Jobs are submitted to a queue and wait to run
  • Batch and interactive jobs are supported
  • User accounts are debited by the actual service units consumed by each job

• Dedicated compute nodes
  • Users can reserve a fixed number of compute nodes for exclusive access
  • Users are charged for 24x7 use of the nodes at 70% utilization; any utilization over 70% is a “bonus” (see the worked example below)
  • Nodes may be reserved on a monthly basis

• Hybrid
  • Dedicated nodes for core computing tasks, plus shared-queue access for overflow, jobs that are not time-critical, or jobs requiring higher core counts
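As a rough worked example of the dedicated-node charging model (assuming one service unit corresponds to one core-hour, which the slides do not state explicitly): reserving a single 8-core TCC node for a 30-day month would be charged at 8 cores × 24 hours × 30 days × 0.70 ≈ 4,032 SUs, and any cycles used beyond that 70% level are the “bonus” referred to above.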

Page 5:

Triton Resource Benefits

• Short lead time for project start-up
• Low waits in queue
• No lengthy proposal process
• Flexible usage models
• Access to HPC experts for setup, software optimization, and troubleshooting
• Avoid using research staff for sysadmin tasks
• Avoid headaches with maintenance, aging equipment, and project wind-down
• Access to a parallel high performance, high capacity storage system
• Access to high bandwidth research networks

Page 6:

Triton Affiliates & Partners Program

• TAPP is SDSC’s program for accessing the Triton Resource

• Two components:
  • Central campus purchase
  • Individual / department purchase

• Central campus purchase
  • Block purchase made by the central campus, then allocated out to individual faculty / researchers

• Individual purchase
  • Faculty / researchers / departments purchase cycles from grants or other funding

• Startup accounts
  • 1,000 SU accounts for evaluation are granted upon request

Page 7:

Contact for access/allocations:

Ron Hawkins, TAPP Manager
[email protected]
(858) 534-5045

Page 8:

Numerical Libraries on Triton

Mahidhar Tatineni, 04/22/2010

Page 9:

AMD Core Math Library (ACML)

• Installed on Triton as part of the PGI compiler installation.
• Covers BLAS, LAPACK, and FFT routines.
• The ACML user guide is at: /opt/pgi/linux86-64/8.0-6/doc/acml.pdf
• Example BLAS, LAPACK, and FFT codes are in: /home/diag/examples/ACML

Page 10:

BLAS Example Using ACML

• Compile and link as follows:

pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt

• Output:

-bash-3.2$ ./a.out
ACML example: dot product of two complex vectors using cdotu
------------------------------------------------------------
Vector x: ( 1.0000, 2.0000) ( 2.0000, 1.0000) ( 1.0000, 3.0000)
Vector y: ( 3.0000, 1.0000) ( 1.0000, 4.0000) ( 1.0000, 2.0000)
r = x.y = ( -6.000, 21.000)
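The slide shows only the link line and the program output. For orientation, here is a minimal sketch of what a blas_cdotu.c-style test program could look like against ACML's C interface; the acml.h complex type and the exact cdotu prototype are assumptions based on ACML's documented C interface (and can vary between ACML releases), so treat this as illustrative rather than the example shipped in /home/diag/examples/ACML.

/* blas_cdotu_sketch.c - illustrative only, not the example shipped on Triton.
 * Computes the unconjugated dot product of two single-precision complex
 * vectors via ACML's C interface to the BLAS routine cdotu.
 * Assumed compile line, mirroring the slide (acml.h is expected to be in the
 * PGI include path):
 *   pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu_sketch.c -lacml -lm -lpgftnrtl -lrt
 */
#include <stdio.h>
#include <acml.h>   /* assumed to declare cdotu() and the 'complex' struct type */

int main(void)
{
    /* Same data as in the output above: x = (1+2i, 2+i, 1+3i), y = (3+i, 1+4i, 1+2i).
     * The initializers assume acml.h's complex struct stores {real, imag} in that order. */
    complex x[3] = { {1.0f, 2.0f}, {2.0f, 1.0f}, {1.0f, 3.0f} };
    complex y[3] = { {3.0f, 1.0f}, {1.0f, 4.0f}, {1.0f, 2.0f} };
    int n = 3, incx = 1, incy = 1;

    complex r = cdotu(n, x, incx, y, incy);   /* r = sum_i x[i]*y[i], no conjugation */

    printf("r = x.y = (%8.3f, %8.3f)\n", r.real, r.imag);   /* should print (-6.000, 21.000) */
    return 0;
}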

Page 11:

LAPACK Example Using ACML

• Compile and link as follows:

pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt

• Output:

-bash-3.2$ ./a.out
ACML example: SVD of a matrix A using dgesdd
--------------------------------------------
Matrix A:
  -0.5700  -1.2800  -0.3900   0.2500
  -1.9300   1.0800  -0.3100  -2.1400
   2.3000   0.2400   0.4000  -0.3500
  -1.9300   0.6400  -0.6600   0.0800
Singular values of matrix A:
   3.9147   2.2959   1.1184   0.3237
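The source of lapack_dgesdd.c is not reproduced on the slide. As a rough, self-contained alternative, the sketch below computes the singular values of the same 4x4 matrix by calling the standard LAPACK entry point dgesdd_, which the ACML library also exports; the file name, the compile line, and the jobz = 'N' (singular-values-only) simplification are choices made here, not taken from the Triton example.

/* lapack_dgesdd_sketch.c - illustrative only, not the example shipped on Triton.
 * Computes the singular values of the 4x4 matrix shown above by calling the
 * Fortran LAPACK routine dgesdd_ (all arguments passed by reference).
 * Assumed compile line, mirroring the slide:
 *   pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd_sketch.c -lacml -lm -lpgftnrtl -lrt
 */
#include <stdio.h>
#include <stdlib.h>

extern void dgesdd_(const char *jobz, const int *m, const int *n,
                    double *a, const int *lda, double *s,
                    double *u, const int *ldu, double *vt, const int *ldvt,
                    double *work, const int *lwork, int *iwork, int *info);

int main(void)
{
    /* The matrix from the slide, stored column-major as LAPACK expects. */
    double a[16] = {
        -0.57, -1.93,  2.30, -1.93,   /* column 1 */
        -1.28,  1.08,  0.24,  0.64,   /* column 2 */
        -0.39, -0.31,  0.40, -0.66,   /* column 3 */
         0.25, -2.14, -0.35,  0.08    /* column 4 */
    };
    int m = 4, n = 4, lda = 4, ldu = 1, ldvt = 1, info = 0;
    double s[4];
    int iwork[8 * 4];                 /* dgesdd needs 8*min(m,n) integer workspace */

    /* Workspace query: lwork = -1 returns the optimal work size in wquery. */
    double wquery;
    int lwork = -1;
    dgesdd_("N", &m, &n, a, &lda, s, NULL, &ldu, NULL, &ldvt,
            &wquery, &lwork, iwork, &info);

    lwork = (int)wquery;
    double *work = malloc(lwork * sizeof(double));
    dgesdd_("N", &m, &n, a, &lda, s, NULL, &ldu, NULL, &ldvt,
            work, &lwork, iwork, &info);

    if (info == 0)   /* should reproduce the singular values shown above */
        printf("Singular values: %.4f %.4f %.4f %.4f\n", s[0], s[1], s[2], s[3]);
    else
        printf("dgesdd failed, info = %d\n", info);

    free(work);
    return 0;
}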

Page 12:

FFT Example Using ACML

• Compile and link as follows:

pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml

• Output:

-bash-3.2$ ./a.out
ACML example: FFT of a real sequence using ZFFT1D
--------------------------------------------------
Components of discrete Fourier transform:
  1   2.4836
  2  -0.2660
  3  -0.2577
  4  -0.2564
  5   0.0581
  6   0.2030
  7   0.5309
Original sequence as restored by inverse transform:
      Original   Restored
  1    0.3491     0.3491
  2    0.5489     0.5489
  3    0.7478     0.7478
  4    0.9446     0.9446
  5    1.1385     1.1385
  6    1.3285     1.3285
  7    1.5137     1.5137

Page 13:

Intel Math Kernel Library (MKL)

• Installed on Triton as part of the Intel compiler directory.
• Covers the BLAS, LAPACK, FFT, BLACS, and ScaLAPACK libraries.
• Most useful link: the Intel MKL link line advisor!
  http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
• Examples are in the following directory: /home/diag/examples/MKL

Page 14:

CBLAS example using MKL

• Compile as follows:

> export MKLPATH=/opt/intel/Compiler/11.1/046/mkl
> icc cblas_cdotu_subx.c common_func.c -I$MKLPATH/include \
    $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a \
    -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a \
    $MKLPATH/lib/em64t/libmkl_sequential.a \
    $MKLPATH/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread

• Run as follows:

[mtatineni@login-4-0 MKL]$ ./a.out cblas_cdotu_subx.d

  C B L A S _ C D O T U _ S U B   EXAMPLE PROGRAM

  INPUT DATA
  N=4
  VECTOR X   INCX=1
    ( 1.00, 1.00) ( 2.00, -1.00) ( 3.00, 1.00) ( 4.00, -1.00)
  VECTOR Y   INCY=1
    ( 3.50, 0.00) ( 7.10, 0.00) ( 1.20, 0.00) ( 4.70, 0.00)

  OUTPUT DATA
  CDOTU_SUB = ( 40.100, -7.100)
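The shipped cblas_cdotu_subx.c reads its vectors from the cblas_cdotu_subx.d data file passed on the command line. As a simpler illustration of the same call, a self-contained sketch using MKL's CBLAS interface might look like the following (the file name and hard-coded data are choices made here; link it with the MKL line shown above).

/* cdotu_sub_sketch.c - illustrative stand-in for the shipped cblas_cdotu_subx.c.
 * Computes the unconjugated complex dot product with MKL's CBLAS interface.
 */
#include <stdio.h>
#include <mkl.h>   /* declares cblas_cdotu_sub(), MKL_Complex8, and MKL_INT */

int main(void)
{
    /* Same data as the run above. */
    MKL_Complex8 x[4] = { {1.0f, 1.0f}, {2.0f, -1.0f}, {3.0f, 1.0f}, {4.0f, -1.0f} };
    MKL_Complex8 y[4] = { {3.5f, 0.0f}, {7.1f,  0.0f}, {1.2f, 0.0f}, {4.7f,  0.0f} };
    MKL_Complex8 dotu;
    MKL_INT n = 4, incx = 1, incy = 1;

    /* dotu = sum_i x[i]*y[i] (no conjugation); the result comes back through the last argument. */
    cblas_cdotu_sub(n, x, incx, y, incy, &dotu);

    printf("CDOTU_SUB = (%8.3f, %8.3f)\n", dotu.real, dotu.imag);   /* should print (40.100, -7.100) */
    return 0;
}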

Page 15:

LAPACK example using MKL

• Compile as follows:

ifort dgebrdx.f -I$MKLPATH/include \
    $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a \
    -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a \
    $MKLPATH/lib/em64t/libmkl_sequential.a \
    $MKLPATH/lib/em64t/libmkl_core.a -Wl,--end-group libaux_em64t_intel.a -lpthread

• Output:

[mtatineni@login-4-0 MKL]$ ./a.out < dgebrdx.d

  DGEBRD Example Program Results

  Diagonal
    3.6177   2.4161  -1.9213  -1.4265
  Super-diagonal
    1.2587   1.5262  -1.1895

Page 16:

ScaLAPACK example using MKL

• A sample test case (from the MKL examples) is in: /home/diag/examples/scalapack

• The makefile is set up to compile all the tests. Procedure:

module purge
module load intel
module load openmpi_mx
make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/046/mkl/lib/em64t

• Sample link line (to illustrate how to link for ScaLAPACK):

mpif77 -o ../xsdtlu_libem64t_openmpi_intel_noopt_lp64 psdtdriver.o psdtinfo.o psdtlaschk.o \
    psdbmv1.o psbmatgen.o psmatgen.o pmatgeninc.o \
    -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t \
    /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_scalapack_lp64.a \
    /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a \
    -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t \
    /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_intel_lp64.a \
    -Wl,--start-group /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_sequential.a \
    /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread

Page 17:

Profiling Tools on Triton

• FPMPI
  • MPI profiling library: /home/beta/fpmpi/fpmpi-2 (PGI + MPICH MX)

• TAU
  • Profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. Available on Triton, compiled with the PGI compilers:
    /home/beta/tau/2.19-pgi
    /home/beta/pdt/3.15-pgi

Page 18:

Using FPMPI on Triton

• The library is located in: /home/beta/fpmpi/fpmpi-2/lib

• Needs PGI and MPICH MX:

> module purge
> module load pgi
> module load mpich_mx

• Just relink with the library. For example:

/opt/pgi/mpichmx_pgi/bin/mpicc -o cpi cpi.o -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi
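For context, cpi is the classic MPI “compute pi” demonstration program, and no source changes are needed because FPMPI intercepts MPI calls at link time through the standard PMPI profiling interface. A minimal cpi-style program (a sketch, not the exact source behind the cpi.o above) is:

/* cpi_sketch.c - minimal stand-in for the classic MPI "cpi" example.
 * Approximates pi by midpoint-rule integration of 4/(1+x^2) on [0,1],
 * with the intervals divided among the MPI ranks.
 * Build and relink against FPMPI roughly as on the slide:
 *   /opt/pgi/mpichmx_pgi/bin/mpicc -c cpi_sketch.c
 *   /opt/pgi/mpichmx_pgi/bin/mpicc -o cpi cpi_sketch.o -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi
 */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const double PI25DT = 3.141592653589793238462643;
    int rank, size, i, n = 100000, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    double h, sum, x, mypi, pi, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d on %s\n", rank, name);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* shows up in the FPMPI profile */
    t0 = MPI_Wtime();

    h = 1.0 / (double)n;
    sum = 0.0;
    for (i = rank + 1; i <= n; i += size) {         /* each rank takes a strided subset */
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0) {
        printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", t1 - t0);
    }
    MPI_Finalize();
    return 0;
}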

Page 19:

Using FPMPI on Triton

• Run the code normally:

> mpirun -machinefile $PBS_NODEFILE -np 2 ./cpi
Process 1 on tcc-2-25.local
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.036982
Process 0 on tcc-2-25.local

• Creates output file (fpmpi_profile.txt) with profile data.

• Check /home/diag/FPMPI directory for more examples.

Page 20:

Sample FPMPI Output

Command: /mirage/mtatineni/TESTS/FPMPI/./cpi
Date: Wed Apr 21 16:44:04 2010
Processes: 2
Execute time: 0
Timing Stats: [seconds]   [min/max]             [min rank/max rank]
  wall-clock: 0 sec       0.000000 / 0.000000   0 / 0

Memory Usage Stats (RSS) [min/max KB]: 825/926

Average of sums over all processes
Routine      Calls  Time     Msg Length  %Time by message length
                                         0.........1........1........
                                                   K        M
MPI_Bcast  : 2      0.00179  4           0*00000000000000000000000000
MPI_Reduce : 1      0.0252   8           00*0000000000000000000000000

Page 21:

Sample FPMPI Output

Details for each MPI routine        Average of sums over all processes
                                    % by message length
                (max over           0.........1........1........
                 processes [rank])            K        M
MPI_Bcast:
  Calls     : 2        2 [ 0]       0*00000000000000000000000000
  Time      : 0.00179  0.00356 [ 1] 0*00000000000000000000000000
  Data Sent : 4        8 [ 0]
  By bin    : 1-4      [2,2]        [ 5.96e-06, 0.00356]
MPI_Reduce:
  Calls     : 1        1 [ 0]       00*0000000000000000000000000
  Time      : 0.0252   0.027 [ 0]   00*0000000000000000000000000
  Data Sent : 8        8 [ 0]
  By bin    : 5-8      [1,1]        [ 0.0235, 0.027]

Summary of target processes for point-to-point communication:
  1-norm distance of point-to-point with an assumed 2-d topology
  (Maximum distance for point-to-point communication from each process)
  0 0

Detailed partner data: source: dest1 dest2 ...
  Size of COMM_WORLD 2
  0:
  1:

Page 22:

About TAU

TAU is a suite of Tuning and Analysis Utilities: www.cs.uoregon.edu/research/tau

• An 11+ year project involving:
  • University of Oregon Performance Research Lab
  • LANL Advanced Computing Laboratory
  • Research Centre Jülich at ZAM, Germany

• Integrated toolkit for:
  • Performance instrumentation
  • Measurement
  • Analysis
  • Visualization

Page 23:

Using TAU

• Load the papi and tau modules

• Gather information for the profile run:
  • Type of run (profiling/tracing, hardware counters, etc.)
  • Programming paradigm (MPI/OMP)
  • Compiler (Intel/PGI/GCC…)

• Select the appropriate TAU_MAKEFILE based on your choices ($TAU/Makefile.*)

• Set up the selected PAPI counters in your submission script

• Run as usual and analyze using paraprof
  • You can transfer the database to your own PC to do the analysis

Page 24:

TAU Performance System Architecture

Page 25:

TAU: Example

Set up the TAU environment (this will be handled by modules in the next Triton software stack):

export PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH

Choose the TAU_MAKEFILE to use for your code. For example:

/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

So we set it up:

% export TAU_MAKEFILE=/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

And we compile using the wrapper provided by TAU:

% tau_cc.sh matmult.c
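matmult.c is the small matrix-multiply test program distributed with TAU; its source is not shown on the slide, so the following is only a rough stand-in (written as an MPI program, since the MPI TAU makefile was selected above) that tau_cc.sh would instrument in the same way.

/* matmult_sketch.c - rough stand-in for TAU's matmult example (not the shipped source).
 * Each MPI rank multiplies two small local matrices; the per-rank checksums are
 * reduced onto rank 0. Compile with the TAU wrapper exactly as on the slide
 * (e.g. tau_cc.sh matmult_sketch.c); running it then produces profile.* files
 * that paraprof can display.
 */
#include <stdio.h>
#include <mpi.h>

#define N 256

static double a[N][N], b[N][N], c[N][N];

int main(int argc, char *argv[])
{
    int rank, size, i, j, k;
    double local_sum = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Fill the input matrices with simple rank-dependent values. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = (double)(i + rank);
            b[i][j] = (double)(j + 1);
            c[i][j] = 0.0;
        }

    /* Naive matrix multiply: the loop nest TAU's instrumentation will attribute time to. */
    for (i = 0; i < N; i++)
        for (k = 0; k < N; k++)
            for (j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];

    for (i = 0; i < N; i++)
        local_sum += c[i][i];

    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("checksum of diagonals over %d ranks = %e\n", size, total);

    MPI_Finalize();
    return 0;
}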

Run the job through the queue normally. Analyze output using paraprof. (More detail in the Ranger part of the presentation).

Page 26:

Coming Soon on Triton

• Data Oasis version 0! We have the hardware on site and are working to get the Lustre filesystem set up (~350 TB).

• Upgrade of the entire software stack. A lot of the packages in /home/beta will become a permanent part of the stack (we have Rocks rolls for them). This will happen within a month.

• mpiP will be installed soon on Triton.

• PAPI/IPM needs the perfctr patch of the kernel. We need to integrate this into our stack (not in the current upgrade).