ATPESC 2017 IO Day
Getting your hands on Cori's Burst Buffer
Debbie Bard, Data and Analytics Services, NERSC
Scratch allocation
• 'type=scratch': duration just for the compute job (i.e. not 'persistent')
• 'access_mode=striped': visible to all compute nodes (i.e. not 'private') and striped across multiple BB nodes
– Actual distribution across BB nodes is in units of a (configurable) granularity, currently 80 GB at NERSC in wlm_pool, so 1000 GB rounds up to 13 granules and would normally be placed on 13 BB nodes
• Data 'stage_in' before the job starts and 'stage_out' after it ends

#!/bin/bash
#SBATCH -p regular -N 10 -t 00:10:00
#DW jobdw capacity=1000GB access_mode=striped type=scratch
#DW stage_in source=/lustre/inputs destination=$DW_JOB_STRIPED/inputs type=directory
#DW stage_in source=/lustre/file.dat destination=$DW_JOB_STRIPED/ type=file
#DW stage_out source=$DW_JOB_STRIPED/outputs destination=/lustre/outputs type=directory
srun my.x --indir=$DW_JOB_STRIPED/inputs --infile=$DW_JOB_STRIPED/file.dat \
    --outdir=$DW_JOB_STRIPED/outputs
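A quick sanity check you can add to such a script before the srun line (an illustrative addition, not part of the original example) is to query the allocation's mount point:

# Confirm the Burst Buffer allocation is mounted and check its size
echo "BB mount point: $DW_JOB_STRIPED"
df -h $DW_JOB_STRIPED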
Step 1: log onto Cori
• We have temporary NERSC user accounts (which expire at the end of the day) and a reservation of Haswell nodes on Cori.
• Using your training account (or your own NERSC account): ssh username@cori.nersc.gov
– Note that if you use your own NERSC account you won't be part of our compute reservation, but you can still submit jobs that run on the Burst Buffer.
Copy over the example scripts and test data
• Pull down the example scripts and the test data file onto your scratch space.
– We're using scratch (i.e. Lustre) because it's currently the only filesystem the Burst Buffer can access on Cori.
cd $SCRATCH
git clone https://github.com/NERSC/train.git
cd train/atpesc-IO-day/IntroToBB/
mkdir data
cp /global/cscratch1/sd/djbard/test1Gb.db data/
Run the first simple script!
• Go to the directory train/atpesc-IO-day/IntroToBB/examples and look at the example script "scratch.sh".
• Note: the reservation name originally shown on this slide was wrong, but it should be correct in the GitHub scripts. Use:
– #SBATCH -C haswell
– #SBATCH --reservation="atpesctrain"
• Submit the job using "sbatch scratch.sh".
• Use "sqs" and "squeue" to view the status of your job.
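For orientation, a minimal script along the lines of "scratch.sh" might look like the sketch below (the capacity and the test file are illustrative; check the actual script in the repo):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
#DW jobdw capacity=80GB access_mode=striped type=scratch
# Write a test file into the Burst Buffer allocation and list the contents
echo "hello from the burst buffer" > $DW_JOB_STRIPED/hello.txt
ls -l $DW_JOB_STRIPED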
View Burst Buffer status
• Use "scontrol show burst" to look at what the Burst Buffer is doing right now.
Stage in data to a scratch allocation
• Look at "stage_in.sh".
• Edit it to point to your OWN file/directory that you want to stage in.
• Submit the job.
• Check the output log file: did the directory stage in as expected?
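If you want a template against which to check your edits, a stage-in script presumably follows the same pattern as the earlier example (a sketch; the source path is a placeholder to replace with your own directory):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
#DW jobdw capacity=80GB access_mode=striped type=scratch
# Stage your own directory into the Burst Buffer before the job starts
#DW stage_in source=/global/cscratch1/sd/username/my_inputs destination=$DW_JOB_STRIPED/inputs type=directory
# Verify the staged data is visible from the compute nodes
ls -lR $DW_JOB_STRIPED/inputs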
Stage out data from a scratch allocation
• Look at "stage_out.sh".
• Edit it to point to your OWN scratch directory.
• Submit the job.
• Check your scratch directory: did "hello.txt" stage out as expected?
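As a cross-check, a stage-out script in the spirit of this exercise might look like the sketch below (the destination path is a placeholder for your own scratch directory):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
#DW jobdw capacity=80GB access_mode=striped type=scratch
# Copy the outputs directory back to Lustre after the job completes
#DW stage_out source=$DW_JOB_STRIPED/outputs destination=/global/cscratch1/sd/username/outputs type=directory
# Create the file that stage_out will copy back
mkdir -p $DW_JOB_STRIPED/outputs
echo "hello" > $DW_JOB_STRIPED/outputs/hello.txt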
Persistent reservations
• Using a persistent DataWarp instance:
– Lifetime different from the batch job
– Usable by any batch job (POSIX permissions permitting)
– name=xyz: the name of the persistent instance to use
[Diagram: persistent reservation lifecycle: create, use in another job, delete]
Create a persistent reservation
• Look at "create_persistent.sh".
• Edit it to rename the persistent reservation!
• Submit it, then use "scontrol show burst" to check it's been created.
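For reference, creating a persistent reservation from a batch script uses a #BB directive rather than #DW. A minimal sketch (myPR is a placeholder; rename it as the exercise asks, and treat the exact option spellings as assumptions to verify against the repo script):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
# Create a persistent reservation that outlives this batch job
#BB create_persistent name=myPR capacity=100GB access=striped type=scratch
# Nothing else to run: the directive is handled by the workload manager
exit 0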
Use a persistent reservation
• Look at "use_persistent.sh". Change:
– the PR name,
– the stage_in path,
– the variable $DW_PERSISTENT_STRIPED_name
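A sketch of what "use_persistent.sh" presumably contains, showing where each of those three edits lands (myPR and the stage_in source path are placeholders):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
# Attach the existing persistent reservation to this job
#DW persistentdw name=myPR
# Optional: stage a file into the persistent reservation
#DW stage_in source=/global/cscratch1/sd/username/file.dat destination=$DW_PERSISTENT_STRIPED_myPR/ type=file
# The mount-point variable embeds the PR name, so it changes when you rename the PR
ls -l $DW_PERSISTENT_STRIPED_myPR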
Destroy a persistent reservation
• Look at "destroy_persistent.sh".
• Change the name to your own PR name.
• Submit it, then use "scontrol show burst" to check it's been destroyed.
– If you get the name wrong, you'll get NO error messages.
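And the matching teardown, sketched under the same assumptions (again, myPR is a placeholder; remember that a wrong name fails silently):

#!/bin/bash
#SBATCH -p regular -N 1 -C haswell -t 00:05:00
#SBATCH --reservation=atpesctrain
# Destroy the persistent reservation; any data left in it is lost
#BB destroy_persistent name=myPR
exit 0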
Using the Burst Buffer interactively
• Look at "interactive_scratch.conf".
• Request an interactive session using "salloc":
salloc -N 1 -t 5 -C haswell --reservation="atpesctrain" --bbf="interactive_scratch.conf"
– Alternatively, on KNL with the interactive QOS:
salloc -N 1 -t 5 -C knl --qos=interactive --bbf="interactive_scratch.conf"
• Play around with the BB interactively!
– Try running IOR using the executable in your examples directory:
srun -n 8 ./IOR -a MPIIO -g -t 10MiB -b 100MiB -o $DW_JOB_STRIPED/IOR_file
• There is also "interactive_persistent.conf" for a PR.
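The file passed via --bbf simply holds the DataWarp directives that would otherwise sit at the top of a batch script, so "interactive_scratch.conf" is presumably a one-liner along these lines (the capacity is illustrative):

#DW jobdw capacity=80GB access_mode=striped type=scratch

The persistent variant would presumably use a persistentdw directive naming an existing PR instead.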
Using dwstat
• dwstat is a useful tool to learn exactly what's going on with the BB allocations.
– Only accessible from compute nodes.
• Look at "dwstat.sh" to see how to find which DW nodes *your* BB allocation is striped over.
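A sketch of the kind of commands "dwstat.sh" likely wraps (run from inside salloc or sbatch, since dwstat only works on compute nodes; the "dws" module name is an assumption about how the tool was exposed on Cori):

# Load the DataWarp client tools, then query the scheduler state
module load dws
dwstat most        # overview: pools, sessions, instances, configurations
dwstat fragments   # per-node pieces of each instance; the node column shows
                   # which DataWarp servers your allocation is striped over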
Running IOR on the BB
• Take a look at "run_IOR.sh".
– This script sets all the options to run IOR. You can edit it to run on a Burst Buffer scratch allocation, on Lustre scratch, or in your home directory.
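Only the -o target needs to change to compare filesystems; a sketch of the kind of edit meant here, reusing the IOR flags from the interactive exercise:

# Burst Buffer (inside a job with a #DW jobdw allocation)
srun -n 8 ./IOR -a MPIIO -g -t 10MiB -b 100MiB -o $DW_JOB_STRIPED/IOR_file
# Lustre scratch
srun -n 8 ./IOR -a MPIIO -g -t 10MiB -b 100MiB -o $SCRATCH/IOR_file
# Home directory (expect much lower performance; home is not tuned for parallel I/O)
srun -n 8 ./IOR -a MPIIO -g -t 10MiB -b 100MiB -o $HOME/IOR_file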