Transcript of Discover Cluster Upgrades: Hello Haswells and SLES11 SP3, Goodbye Westmeres, February 3, 2015 NCCS Brown Bag

Page 1

Discover Cluster Upgrades:

Hello Haswells and SLES11 SP3, Goodbye Westmeres

February 3, 2015, NCCS Brown Bag

Page 2

NASA Center for Climate Simulation

Agenda

• Discover Cluster Hardware Changes & Schedule – Brief Update

• Using Discover SCU10 Haswell / SLES11 SP3

• Q & A

Page 3

Discover Hardware Changes & Schedule Update

Page 4

Discover’s New Intel Xeon “Haswell” Nodes

• Discover’s Intel Xeon “Haswell” nodes:
  – 28 cores per node, 2.6 GHz
  – Usable memory: 120 GB per node, ~4.25 GB per core (128 GB total)
  – FDR InfiniBand (56 Gbps), 1:1 blocking
  – SLES11 SP3
  – NO SWAP space, but DO have lscratch and shmem disk space

• SCU10:
  – 720* Haswell nodes for general use (1,080 nodes total), 30,240 cores total, 1,229 TFLOPS peak total
  – *Up to 360 of the 720 nodes may be episodically allocated for priority work

• SCU11:
  – ~600 Haswell nodes, 16,800 cores total, 683 TFLOPS peak

Page 5

Discover Hardware Changes in a Nutshell

• January 30, 2015 (-70 TFLOPS):

– Removed: 516 Westmere (12-core) nodes (SCU3, SCU4)

• February 2, 2015 (+806 TFLOPS for general work):

– Added: ~720* Haswell (28-core) nodes (2/3 of SCU10)

• *Up to 360 of the 720 nodes may be episodically allocated to a priority project

• Week of February 9, 2015 (-70 TFLOPS):

– Removed: 516 Westmere (12-core) nodes (SCU1, SCU2)

– Removed: 7 oldest (‘Dunnington’) Dalis (dali02-dali08)

• Late February/early March 2015 (+713 TFLOPS for general work):

– Added: 600 Haswell (28-core) nodes (SCU11)

[Chart: TFLOPS for General User Work]

Page 6

Discover Node Count for General Work – Fall/Winter Evolution

Page 7

Discover Processor Cores for General Work – Fall/Winter Evolution

Page 8

Oldest Dali Nodes to Be Decommissioned

• The oldest Dali nodes (dali02 – dali08) will be decommissioned starting February 9 (plenty of newer Dali nodes remain).

• You should see no impact from the decommissioning of old Dali nodes, provided you have not been explicitly specifying one of the dali02 – dali08 node names when logging in.

Page 9

Using Discover SCU10 and Haswell / SLES11 SP3

Page 10

How to use SCU10

• 720 Haswell nodes on SCU10 available in sp3 partition

• To be placed on a login node with the SP3 development environment, after providing your NCCS LDAP password, specify “discover-sp3” at the “Host” prompt:

Host: discover-sp3

• However, you may submit to the sp3 partition from any login node.

Page 11

How to use SCU10

• To submit a job to the sp3 partition, use either:
  – Command line:
    sbatch --partition=sp3 --constraint=hasw myjob.sh
  – Or inline directives:
    #SBATCH --partition=sp3
    #SBATCH --constraint=hasw
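Putting these together, here is a minimal sketch of a complete sp3 job script; the job name, task count, time limit, and executable are illustrative placeholders, not NCCS-prescribed values:

#!/bin/bash
#SBATCH --job-name=hasw_test      # illustrative job name
#SBATCH --partition=sp3           # SLES11 SP3 / SCU10 partition
#SBATCH --constraint=hasw         # request Haswell nodes
#SBATCH --ntasks=56               # total tasks (example value)
#SBATCH --time=01:00:00           # walltime limit (example value)

# Launch the application; "./my_model" is a hypothetical executable.
srun ./my_model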

Page 12

Porting your work: the fine print…

• There is a small (but non-zero) chance your scripts and binaries will run with no changes at all.

• Nearly all scripts and binaries will require changes to make best use of SCU10.

Page 13

Porting your work: the fine print…

• There is a small (but non-zero) chance your scripts and binaries will run with no changes at all.

• Nearly all scripts and binaries will require changes to make best use of SCU10, sooo…

With great power comes great responsibility.

- Ben Parker (2002)

Page 14

Adjust for new core count

• Haswell nodes have 28 cores, 128 GB
  – More than 2x the memory per core of Sandy Bridge

• Specify total cores/tasks needed, not nodes.
  – Example, for Sandy Bridge nodes:
    #SBATCH --ntasks=800
    Not
    #SBATCH --nodes=50

• This allows SLURM to allocate whatever resources are available.
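As a hedged illustration of why this helps: a script keyed to --ntasks does not need to know how many nodes SLURM actually chooses (the executable name is hypothetical):

#!/bin/bash
#SBATCH --ntasks=800        # ask for 800 tasks; SLURM picks the nodes
#SBATCH --constraint=hasw

# SLURM sets SLURM_NTASKS and SLURM_JOB_NUM_NODES at run time, so the
# launch line below works whether the 800 tasks land on 29 Haswell
# nodes or on some other mix of available nodes.
echo "Running $SLURM_NTASKS tasks on $SLURM_JOB_NUM_NODES nodes"
srun ./my_model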

Page 15

If you must control the details…

• … still don’t use --nodes.

• If you need more than ~4 GB/core, use fewer cores/node.
  #SBATCH --ntasks-per-node=N…
  – Assumes 1 task/core (the usual case).

• Or specify required memory:
  #SBATCH --mem-per-cpu=N_MB…

• SLURM will figure out how many nodes are needed to meet this specification.
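A hedged sketch of both approaches for a memory-hungry job; the values are examples only, and the two directive sets are alternatives, not meant to be combined:

# Alternative A: fewer tasks per node (1 task per core assumed),
# so each task sees roughly twice the usual memory.
#SBATCH --ntasks=280
#SBATCH --ntasks-per-node=14     # half-populate the 28-core Haswell nodes

# Alternative B: state the memory requirement per task instead and
# let SLURM work out how many nodes that implies.
#SBATCH --ntasks=280
#SBATCH --mem-per-cpu=8192       # ~8 GB per task, in MB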

Page 16

Script changes summary

• Avoid specifying --partition unless absolutely necessary.
  – And sometimes not even then…

• Avoid specifying --nodes.
  – Ditto.

• Let SLURM do the work for you.
  – That’s what it’s there for, and it allows for better resource utilization.

Page 17

Source code changes

• You might not need to recompile…
  – … but the SP3 upgrade may require it.

• SCU10 hardware is brand new, possibly needing a recompile.
  – New features, e.g. AVX2 vector registers
  – SGI nodes, not IBM
  – FDR vs. QDR InfiniBand
  – NO SWAP SPACE!
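As a hedged sketch of a recompile that picks up the new AVX2 hardware (the module names, source file, and output name are placeholders; the Intel and GNU flags shown are the standard AVX2 options for those compilers):

# On a discover-sp3 login node, load a tested compiler/MPI pair.
# The module names below are placeholders; run "module avail" for the real ones.
module load intel-compiler intel-mpi

# Intel C (listed as working on SP3): target Haswell's AVX2 instructions.
mpiicc -O2 -xCORE-AVX2 -o my_model my_model.c

# GNU alternative with equivalent AVX2 targeting:
# mpicc -O2 -march=core-avx2 -o my_model my_model.c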

Page 18

And did I mention…

• … NO SWAP SPACE!

• This is critical.
  – When you run out of memory now, you won’t start to swap; your code will throw an exception.

• Ameliorated by the higher GB/core ratio…
  – … but we still expect some problems from this.

• Use policeme to monitor the memory requirements of your code.
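policeme is the NCCS-provided monitor; as a generic stand-in (not policeme itself), GNU time can report a process's peak resident memory, which gives a rough per-task requirement. The executable here is hypothetical:

# /usr/bin/time is GNU time, not the shell built-in "time"; -v prints
# verbose statistics, including the peak resident set size in kilobytes.
/usr/bin/time -v ./my_model 2>&1 | grep "Maximum resident set size"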

Page 19

If you do recompile…

• Current working compiler modules:
  – All Intel C compilers (Intel Fortran not yet tested)
  – gcc 4.5, 4.8.1, 4.9.1
  – g95 0.93

• Current working MPI modules:
  – SGI MPT
  – Intel MPI 4.1.1.036 and later
  – MVAPICH2 1.8.1, 1.9, 1.9a, 2.0, 2.0a, 2.1a
  – OpenMPI 1.8.1, 1.8.2, 1.8.3

Page 20

MPI “gotchas”

• Programs using old Intel MPI must be upgraded.

• MVAPICH2 and OpenMPI have only been tested on single-node jobs.

• All MPI modules (except SGI MPT) may experience stability issues when node counts are >~300.
  – Symptom: abnormally long MPI teardown times.

Page 21

cron jobs

• discover-cron is still at SP1.
  – When running SP3-specific code, you may need to ssh to an SP3 node for proper execution.
  – Not extensively tested yet.
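A hedged sketch of that workaround: have the discover-cron entry ssh to an SP3 login node and run the command there. The script path is a placeholder, and whether "discover-sp3" is directly reachable from the cron host should be verified with NCCS:

# crontab entry on discover-cron: at 03:00 daily, run the SP3-specific
# script on an SP3 node instead of locally.
0 3 * * * ssh discover-sp3 /path/to/sp3_only_job.sh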

Page 22

Sequential job execution

• Jobs may not execute in submission order.
  – Small and interactive jobs favored during the day.
  – Large jobs favored at night.

• If execution order is important, the dependencies must be specified to SLURM.

• Multiple dependencies can be specified with the --dependency option.
  – Can depend on start, end, failure, error, etc.

Page 23

Dependency example

# String to hold the job IDs.
job_ids=''

# Submit the first parallel processing job, save the job ID.
job_id=`sbatch job1.sh | cut -d ' ' -f 4`
job_ids="$job_ids:$job_id"

# Submit the second parallel processing job, save the job ID.
job_id=`sbatch job2.sh | cut -d ' ' -f 4`
job_ids="$job_ids:$job_id"

# Submit the third parallel processing job, save the job ID.
job_id=`sbatch job3.sh | cut -d ' ' -f 4`
job_ids="$job_ids:$job_id"

# Wait for the processing jobs to finish successfully, then
# run the post-processing job.
sbatch --dependency=afterok$job_ids postjob.sh

Page 24

Coming attraction: shared nodes

• SCU10 nodes will initially be exclusive: 1 job/node

• This is how we roll on discover now.

• May leave a lot of unused cores and/or memory.

• Eventually, SCU10 nodes (and maybe others) will be shared among jobs.
  – Same or different users.

• What does this mean?

Page 25

Shared nodes (future)

• You will no longer be able to assume that all of the node resources are for you.

• Specifying task and memory requirements will ensure SLURM gets you what you need.

• Your jobs must learn to “work and play well with others”.
  – Unexpected job interactions, esp. with I/O, may cause unusual behavior when nodes are shared.

Page 26

Shared nodes (future, continued)

• If you absolutely must have a minimum number of CPUs in a node, the --mincpus=N option to sbatch will ensure you get it.
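For example (illustrative values; myjob.sh is the same placeholder script name used earlier):

# Require at least 16 CPUs on any node this job is given,
# even when nodes are shared with other jobs.
sbatch --mincpus=16 --ntasks=64 myjob.sh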

Page 27

Questions & Answers

NCCS User Services: [email protected]

301-286-9120

https://www.nccs.nasa.gov

Thank you

Page 28

Supplemental Slides

Page 29

Discover Compute Nodes, February 3, 2015 (Peak ~1,629 TFLOPS)

• “Haswell” nodes, 28 cores per node, 2.6 GHz (new)

– SLES11 SP3

– SCU10, 4.5 GB memory per core (new)

• 720* nodes general use (1,080 nodes total), 30,240 cores total, 1,229 TFLOPS peak total (*360 nodes episodically allocated for priority work)

• “Sandy Bridge” nodes, 16 cores per node, 2.6 GHz (no change)
  – SLES11 SP1

– SCU8, 2 GB memory per core

• 480 nodes, 7,680 cores, 160 TFLOPS peak

– SCU9, 4 GB memory per core

• 480 nodes, 7,680 cores, 160 TFLOPS peak

• “Westmere” nodes, 12 cores per node, 2 GB memory per core, 2.6 GHz
  – SLES11 SP1
  – SCU1, SCU2 (SCUs 3, 4, and 7 already removed)

• 516 nodes, 6,192 cores total, 70 TFLOPS peak

Page 30

Discover Compute Nodes, March 2015 (Peak ~2,200 TFLOPS)

• “Haswell” nodes, 28 cores per node

– SLES11 SP3

– SCU10, 4.5 GB memory per core

• 720* nodes general use (1,080 nodes total), 30,240 cores total, 1,229 TFLOPS peak total (*360 nodes episodically allocated for priority work)

– SCU11, 4.5 GB memory per core (new)

• ~600 nodes, 16,800 cores total, 683 TFLOPS peak

• “Sandy Bridge” nodes, 16 cores per node (no change)
  – SLES11 SP1

– SCU8, 2 GB memory per core

• 480 nodes, 7,680 cores, 160 TFLOPS peak

– SCU9, 4 GB memory per core

• 480 nodes, 7,680 cores, 160 TFLOPS peak

• No remaining “Westmere” nodes

Page 31

Discover Compute transition timeline, Jan. 26 - Mar. 27, 2015 (timeline columns: Jan. 26-30, Feb. 2-6, Feb. 9-13, Feb. 17-20, Feb. 23-27, Mar. 2-27):

• SCU10 (SLES11 SP3, 1,080 nodes, 30,240 cores, Intel Haswell, 1,229 TF peak): SCU10 Integration, then SCU10 General Access: +720* Nodes.

SCU10 arrived in mid-November 2014. Following installation and resolution of initial power issues, the NCCS provisioned SCU10 with Discover images and integrated it with GPFS storage. NCCS stress testing and targeted high-priority use occurred in January 2015. (*360 nodes episodically allocated for priority work.)

• SCU 8 and 9 (SLES11 SP1, 960 nodes, 15,360 cores, Intel Sandy Bridge, 320 TF peak):

No changes during this period (January - March 2015). In November 2014, 480 nodes previously allocated for a high-priority project were made available for all user processing.

• SCU11 (SLES11 SP3, 600 nodes, 16,800 cores, Intel Haswell, 683 TF peak): SCU11 Integration (Physical Installation, Configuration, Stress Testing), then SCU11 General Access: +600 Nodes.

SCU11 (600 Haswell nodes) has been delivered and will be installed starting Feb. 9. The NCCS will then provision the system with Discover images and integrate it with GPFS storage. Power and I/O connections from Westmere SCUs 1, 2, 3, and 4 are needed for SCU11; thus, SCUs 1, 2, 3, and 4 must be removed prior to SCU11 integration.

• SCU 1, 2, 3, 4 Decommissioning (SLES11 SP1, 1,032 nodes, 12,384 cores, Intel Westmere, 139 TF peak): Drain 516 nodes, Remove 516 nodes; Drain 516 nodes, Remove 516 nodes.

To make room for the new SCU11 compute nodes, the nodes of Scalable Units 1, 2, 3, and 4 (12-core Westmeres installed in 2011) are being removed from operations during February. Removal of half of these nodes will coincide with general access to SCU10, and the remaining half during installation of SCU11.

Page 32

Discover “SBU” Computational Capacity for General Work – Fall/Winter Evolution

Page 33

Total Discover Peak Computing Capability as a Function of Time (Intel Xeon Processors Only)

Page 34

Total Number of Discover Intel Xeon Processor Cores as a Function of Time

Page 35

Storage Augmentations

• Dirac (Mass Storage) Disk Augmentation
  – 4 Petabytes usable (5 Petabytes “raw”), installed
  – Gradual data move starts the week of February 9 (many files and “inodes” to move)

• Discover Storage Expansion
  – 8 Petabytes usable (10 Petabytes “raw”), installed
  – For both general use and the targeted “Climate Downscaling” project
  – Phased deployment, including optimizing the arrangement of existing project and user nobackup space
