Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia...
![Page 1: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/1.jpg)
Nanco: a large HPC cluster for RBNI
(Russell Berrie Nanotechnology Institute)
Anne Weill – Zrahia
Technion, Computer Center
October 2008
![Page 2: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/2.jpg)
Resources needed for applications arising from Nanotechnology
Large memory: Tbytes
High floating point computing speed: Tflops
High data throughput: state of the art …
![Page 3: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/3.jpg)
SMP architecture
[Diagram: processors (P) sharing a single memory]
![Page 4: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/4.jpg)
Cluster architecture
Interconnection network
![Page 5: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/5.jpg)
Why not a cluster
Single SMP system easier to purchase/maintain
Ease of programming in SMP systems
![Page 6: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/6.jpg)
Why a cluster
Scalability
Total available physical RAM
Reduced cost
But …
![Page 7: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/7.jpg)
Having an application which exploits the parallel capabilities
Studying the application or applications which will run on the cluster
![Page 8: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/8.jpg)
Things to include in design

| Property of code | Essential component |
|---|---|
| CPU bound | Fast computing unit |
| Memory bound | Large memory, fast access |
| Global flow of data in parallel app | Fast interconnect |
![Page 9: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/9.jpg)
Our choices

| Property of code | Essential component | Choice |
|---|---|---|
| Computationally intensive, FP | Fast computing unit | 64 bit dual core, Opteron, Rev. F |
| Large matrices | Large memory, fast access | 8 GB/node |
| Finite element, spectral codes | Fast interconnect | Infiniband DDR (20 Gb/s, low latency) |
![Page 10: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/10.jpg)
Other requirements
Space, power, cooling constraints, strength of floors
Software configuration:
1. Operating system
2. Compilers & application development tools
3. Load balancing and job scheduling
4. System management tools
![Page 11: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/11.jpg)
Configuration
[Diagram: processors (P) grouped in pairs around a shared memory (M), with the nodes linked through an Infiniband switch]
![Page 12: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/12.jpg)
Before finalizing our choice …
One should check, on a similar system:
Single processor peak performance
Infiniband interconnect performance
SMP behaviour
Non-commercial parallel applications behaviour
![Page 13: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/13.jpg)
Parallel applications issues
Execution time
Parallel speedup $S_p = T_1 / T_p$
Scalability
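
As a reading aid for the speedup definition on this slide: $T_1$ is the execution time on one process and $T_p$ the time on $p$ processes. The efficiency $E_p$ below is the standard companion quantity; it is not written on the slide and is added here only for reference.

$$
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
$$

Ideal scaling corresponds to $S_p = p$, that is $E_p = 1$; values of $E_p$ well below 1 indicate that communication or serial sections dominate.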
![Page 14: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/14.jpg)
Benchmark design
Must give a good estimate of the performance of your application
Acceptance test: should match all its components
![Page 15: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/15.jpg)
Comparison of performance

| Computer | Carmel | Nanco |
|---|---|---|
| Lapack program, N=9000 | 487 Mflops | 3826.4 Mflops |

Ratio of 7.8!!
![Page 16: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/16.jpg)
Execution time of Monte-Carlo parallel code (MPI)

| Processes | Carmel | Nanco |
|---|---|---|
| 1 | 22042 (~6 hrs!) | 4389 (~1 hr) |
| 2 | 12246 | 1739 |
| 4 | 4809 | 1154.8 |
| 8 | 3540 | 642.12 |
| 16 | | 282.5 |
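
The timings on this slide can be turned into speedup and efficiency figures with a few lines of Python. This is a minimal sketch, not part of the original talk: it assumes the times are in seconds (consistent with the "~6 hrs" and "~1 hr" annotations) and that the single 16-process value belongs to Nanco. The names `TIMES`, `speedup`, `efficiency` and `report` are illustrative.

```python
# Execution times copied from the slide (assumed to be seconds).
# Assumption: the lone 16-process value (282.5) is a Nanco measurement;
# no 16-process Carmel value appears on the slide.
TIMES = {
    "Carmel": {1: 22042.0, 2: 12246.0, 4: 4809.0, 8: 3540.0},
    "Nanco": {1: 4389.0, 2: 1739.0, 4: 1154.8, 8: 642.12, 16: 282.5},
}


def speedup(times, p):
    """Parallel speedup S_p = T_1 / T_p."""
    return times[1] / times[p]


def efficiency(times, p):
    """Parallel efficiency E_p = S_p / p (1.0 means ideal scaling)."""
    return speedup(times, p) / p


def report(times_by_machine):
    """Return one formatted line per (machine, process count)."""
    lines = []
    for machine, times in times_by_machine.items():
        for p in sorted(times):
            lines.append(
                f"{machine:<7} p={p:<3} S_p={speedup(times, p):6.2f} "
                f"E_p={efficiency(times, p):5.2f}"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report(TIMES)))
```

Some of the resulting efficiencies exceed 1 (for example Nanco at 2 processes), which is worth checking against the measurement setup before drawing conclusions.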
![Page 17: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/17.jpg)
[Figure: Speedup of Parallel Monte Carlo. x-axis: number of processes (2 to 64); y-axis labelled "Execution time" (0 to 120); series: MILC]
![Page 18: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/18.jpg)
What did work
Running MPI code interactively
Running a serial job through the queue
Compiling C code with MPI
![Page 19: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/19.jpg)
What did not work
Compiling F90 or C++ code with MPI
Running MPI code through the queue
Queues do not do accounting per CPU
![Page 20: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/20.jpg)
Parallel performance results
Theoretical peak: 2.1 Tflops
Nanco performance on HPL: 0.58 Tflops
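
As a quick worked ratio from the two figures on this slide (the symbols $R_{\text{HPL}}$ and $R_{\text{peak}}$ are labels introduced here, not the author's notation):

$$
\frac{R_{\text{HPL}}}{R_{\text{peak}}} = \frac{0.58\ \text{Tflops}}{2.1\ \text{Tflops}} \approx 0.28
$$

That is, the HPL run reached roughly 28% of the theoretical peak.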
![Page 21: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/21.jpg)
Comparison with Sun Benchmark
[Figure: Comparison Sunbench vs nanco (pathscale), 2ppn. x-axis: number of processes (2, 4, 8, 16); y-axis: 0 to 3.5; series: MVH1, MILC, IGOR]
![Page 22: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/22.jpg)
Execution time: comparison of compilers
[Figure: MILC small, 2th/n. x-axis: processes (1, 2, 4, 8, 16, 32); y-axis: 0 to 2500; series: Sun-bench, Nanco-gcc3, Nanco-sunc, Nanco-path, Nanco-gcc4]
![Page 23: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/23.jpg)
[Figure: Parallel Speedup for MILC (2th/n). x-axis: processes (2, 4, 8, 16, 32, 64); y-axis: 0 to 120; series: SUN-bench, Nanco-sun, Nanco-path]
![Page 24: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/24.jpg)
Performance with different optimizations
[Figure: Execution time of MVH1 on nanco with 32 threads. x-axis: type of optimization; y-axis: execution time (0 to 300); series: VoltaireMPI+Pathscale, OpenMPI+opt.plac., OpenMPI+opt.plac.+tmp disk]
![Page 25: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/25.jpg)
Conclusions from acceptance tests
New gcc (gcc4) is faster than Pathscale for some applications
MPI collective communication functions are implemented differently in various MPI versions
Disk access times are crucial: use attached storage when possible
![Page 26: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/26.jpg)
Scheduling decisions
Assessing priorities between user groups
Assessing parallel efficiency of different job types (MPI, serial, OpenMP) / commercial software and designing special queues for them
Avoiding starvation by giving weight to the urgency parameter
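
The starvation point can be illustrated with a small Python sketch. The slides do not say which scheduler Nanco uses or how its urgency parameter is computed, so everything here is an assumption: urgency is modelled as time spent waiting in the queue, and `Job`, `group_priority`, `wait_hours` and `urgency_weight` are hypothetical names, not scheduler settings.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group_priority: float  # hypothetical base priority of the user's group
    wait_hours: float      # assumed urgency measure: hours spent in the queue


def effective_priority(job, urgency_weight=1.0):
    """Base group priority plus a term that grows with waiting time."""
    return job.group_priority + urgency_weight * job.wait_hours


def next_job(queue, urgency_weight=1.0):
    """Pick the queued job with the highest effective priority."""
    return max(queue, key=lambda job: effective_priority(job, urgency_weight))


if __name__ == "__main__":
    queue = [
        Job("job-from-high-priority-group", group_priority=10.0, wait_hours=0.5),
        Job("job-from-low-priority-group", group_priority=2.0, wait_hours=12.0),
    ]
    # With no weight on urgency, group priority alone decides.
    print(next_job(queue, urgency_weight=0.0).name)
    # With weight on urgency, the long-waiting job eventually wins.
    print(next_job(queue, urgency_weight=1.0).name)
```

With `urgency_weight=0.0` a steady stream of high-priority jobs could keep the low-priority job waiting indefinitely; a positive weight bounds that wait.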
![Page 27: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/27.jpg)
Observations during production mode
Assessing users' understanding of the machine: support in writing scripts and efficient parallelization
Lack of visualization tools: writing a script to show current usage of the cluster
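
A usage script of the kind mentioned on this slide might look like the sketch below. The actual script is not shown in the talk, so this is only an illustration: the per-node CPU counts would in practice come from the cluster's scheduler or monitoring tools, and here they are passed in as a plain dictionary. The node names and the functions `usage_bar` and `cluster_summary` are hypothetical.

```python
def usage_bar(used, total, width=40):
    """Render a text bar for used/total CPUs, followed by a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    filled = round(width * used / total)
    percent = 100 * used / total
    return f"[{'#' * filled}{'.' * (width - filled)}] {percent:3.0f}%"


def cluster_summary(nodes, width=40):
    """Return one line per node plus a cluster-wide total.

    nodes maps a node name to a (used_cpus, total_cpus) tuple.
    """
    lines = [
        f"{name:<8} {usage_bar(used, total, width)}"
        for name, (used, total) in sorted(nodes.items())
    ]
    used_all = sum(used for used, _ in nodes.values())
    total_all = sum(total for _, total in nodes.values())
    lines.append(f"{'TOTAL':<8} {usage_bar(used_all, total_all, width)}")
    return lines


if __name__ == "__main__":
    # Hypothetical snapshot; a real script would query the scheduler instead.
    snapshot = {"node01": (4, 4), "node02": (2, 4), "node03": (0, 4)}
    print("\n".join(cluster_summary(snapshot, width=20)))
```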
![Page 28: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/28.jpg)
Utilization of cluster
![Page 29: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/29.jpg)
Utilization of Nanco, Sep 08
[Figure: Utilization (daily), Sep 08. x-axis: date; y-axis: utilization (0 to 120)]
![Page 30: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/30.jpg)
Nanco jobs by type
[Figure: Nanco, Feb 2008, by job type. Categories: Scalar, Fullwave, Self dev. code]
![Page 31: Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute) Anne Weill – Zrahia Technion,Computer Center October 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649f3f5503460f94c5fc07/html5/thumbnails/31.jpg)
Conclusion
Correct benchmark design is crucial to test the capabilities of the proposed architecture
Acceptance tests make it possible to negotiate with vendors and give insights into future choices
Only after several weeks of running the cluster at full capacity can we make informed decisions on management of the cluster