Research Computing with Newton
Gerald Ragghianti
Newton HPC workshop
Sept. 3, 2010
What is the Newton Program?
• Research computing support
  • Infrastructure management
  • Consultation
  • Training
• Research Objectives
  • Effectiveness
  • Efficiency
  • Capability
User applications
Computational environment (OS, cluster management, software)
Computing hardware
Computing infrastructure (space, network, power, cooling)
Community organization (policies, membership)
The Newton cluster
• “Normal” Linux compute cluster
• 295 computers
• 2500 processors
• 5 TB RAM
• 40 Gbit/sec Infiniband
• 80 TB storage
[Diagram: cluster layout. Components shown: head node, interactive nodes, compute nodes, storage servers, and Lustre storage, linked by the Infiniband network and the Ethernet network, with a connection to the external network.]
Newton cluster machines
[Diagram: rack layout, Rack 1 through Rack 8, showing the following machines]
• Dell R410: tao000 to tao119
• Dell 1950: zeta00 to zeta31, gamma00 to gamma31
• Sun X2200M2: alpha00, alpha02 to alpha11, lustre0 to lustre4, head
• Dell R900: epsilon0, epsilon1
• Dell 1850: login0, login1, admin, isaac
• Dell R510: lustre-mds, lustre-oss-0 to lustre-oss-3, nfs-mrail0
• SunFire X4500 (thumper), SunFire X4540 (thumper-spanier)
• C6100
• EMC CX300 SAN
• Infiniband switches: Qlogic IB 122000, Qlogic IB 123000, Cisco Infiniband
• Ethernet switches: Dell 6248 Ethernet, PC 6248 switch, PC 3548 switch
• Other: KVM, console, PDU
Legend: server, storage server, compute node, login compute node, Infiniband switch, Ethernet switch, management, power distribution, empty
Getting started
• SSH to login.newton.utk.edu using NetID
• Transfer files with scp, sftp, or FileZilla
• Display graphics with X11, xorg, or Xming
• Requires X11 “tunneling” through SSH client
$ ssh [email protected]: ***************
[gragghia@newton1 ~]$ ls
Test.sge filename.txt
[gragghia@newton1 ~]$ w
10:36:49 up 32 days, 15:07, 20 users, load average: 1.98, 1.81, 1.88
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
gragghia pts/0 poltth Tue05 1:05 1.39s 1.39s -bash
mkzadd pts/1 bkg.engr.utk.edu Thu18 15:16m 0.06s 0.06s -bash
Krrrccc pts/2 ares.bio.utk.edu 03Aug10 3days 0.03s 0.03s -bash
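The login, file transfer, and X11 tunneling steps can be sketched as command lines. This is a minimal sketch assuming an OpenSSH client, where the -X option requests X11 forwarding. The NetID "your_netid" and the file name data.tar.gz are placeholders, not values from the workshop. The helper functions only print the commands, so the sketch can be run anywhere without contacting the cluster.

```shell
#!/bin/bash
# Placeholder NetID (an assumption): replace with your own.
NETID="your_netid"
HOST="login.newton.utk.edu"

# Print the login command; -X asks OpenSSH to tunnel X11 graphics.
connect_cmd() {
    echo "ssh -X ${NETID}@${HOST}"
}

# Print a copy command: $1 is a local file, $2 is the remote directory.
upload_cmd() {
    echo "scp $1 ${NETID}@${HOST}:$2"
}

connect_cmd
upload_cmd data.tar.gz '~/'
```

Copy the printed lines into a terminal to run them. On Windows, an X server such as Xming must be running before graphical programs will display.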
Environment management
• Modules utility
  • Manages environment variables and aliases
  • User chooses applications and libraries to use
  • Allows multiple versions to be available
• Example use:
  • See available modules: “module avail”
  • Load a module: “module add R”
  • Unload a module: “module unload R”
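To illustrate what "manages environment variables" means in practice, here is a conceptual sketch of the effect that loading and unloading a module has on PATH. It is not the real Modules implementation. The functions demo_module_add and demo_module_unload and the directory /opt/example/R-2.11 are hypothetical, invented for this example.

```shell
#!/bin/bash
# Conceptual stand-ins only; the real "module" command does much more.

# "Load": put the application's bin directory at the front of PATH.
demo_module_add() {
    PATH="$1/bin:$PATH"
}

# "Unload": strip that same entry from the front of PATH again.
demo_module_unload() {
    PATH="${PATH#"$1/bin:"}"
}

original_path=$PATH
demo_module_add /opt/example/R-2.11      # hypothetical install location
echo "first PATH entry: ${PATH%%:*}"
demo_module_unload /opt/example/R-2.11
[ "$PATH" = "$original_path" ] && echo "PATH restored"
```

Because each version of an application lives in its own directory, switching versions amounts to swapping which directory sits first in PATH.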
Resource Management: The Grid Engine
1. Accepts job requests
  • Executable to run
  • Execution time
  • Parallelization
  • RAM needed
2. Finds available resources (compute nodes)
3. Reserves and uses resources
4. Returns output
A simple job
1. Create a job request file.
2. Submit job
$ qsub job.sge
3. Monitor job
$ qstat -g t
4. View result log files

Job request file (job.sge):
#$ -q short*
#$ -cwd
#$ -N Test
uname -a
sleep 30
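The four steps can be put together in one script. This is a sketch rather than an official workflow: the scratch directory is an assumption made so the script can be tried anywhere, and submission is skipped when qsub is not installed. Grid Engine by default writes a job's standard output and standard error to files named after the job, here Test.o<jobid> and Test.e<jobid>.

```shell
#!/bin/bash
# Work in a scratch directory (an assumption, so the sketch is safe to run anywhere).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Step 1: create the job request file from the slide.
cat > job.sge <<'EOF'
#$ -q short*
#$ -cwd
#$ -N Test
uname -a
sleep 30
EOF

# Steps 2 and 3: submit and monitor, but only where Grid Engine is available.
if command -v qsub >/dev/null 2>&1; then
    qsub job.sge
    qstat -g t
else
    echo "qsub not found: run this on a cluster login node"
fi

# Step 4: list the log files, if the job has produced any yet.
ls Test.o* Test.e* 2>/dev/null || echo "no log files yet"
```

The "-cwd" line is what makes the log files appear in the directory the job was submitted from.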
More Sophistication: Array jobs
» Run the same job multiple times
1. Create data files (optional)
$ ~gragghia/workshop/make_datafiles.sh
2. Create a job request file with “-t” option:
#$ -q short*
#$ -cwd
#$ -N Array
#$ -t 1-10
md5sum data-$SGE_TASK_ID.dat
3. Submit job
$ qsub job.sge
4. Monitor job
$ qstat -g t
5. View result log files
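To see what the ten array tasks do, the same work can be simulated locally with a loop that sets SGE_TASK_ID by hand. This is only a sketch: the data files written here are stand-ins, since the contents produced by make_datafiles.sh are not shown in the slides, and the scratch directory is an assumption.

```shell
#!/bin/bash
# Scratch directory so the sketch does not touch real data (an assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in data files; the real ones come from make_datafiles.sh.
for i in $(seq 1 10); do
    echo "sample data $i" > "data-$i.dat"
done

# Grid Engine would run the job body once per task, each with its own
# SGE_TASK_ID from the "-t 1-10" range. Here the loop plays that role.
for SGE_TASK_ID in $(seq 1 10); do
    md5sum "data-$SGE_TASK_ID.dat"
done > checksums.txt

cat checksums.txt
```

On the cluster the tasks can run at the same time on different nodes, and each task writes its own log file instead of sharing one output file.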
A parallel job: MPI
1. Download the software:
$ wget http://newton.utk.edu/workshop/hello.tar
2. Extract the software:
$ tar -vxf hello.tar
3. Select MPI version:
$ module add openmpi/1.4.2/intel
4. Compile the application:
$ cd hello
$ make
5. Create a batch submit file
6. Submit the job

Batch submit file:
#$ -N Hello
#$ -q short*
#$ -cwd -V
#$ -pe openmpi* 16
mpirun hello
sleep 30
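Steps 5 and 6 can be sketched as follows. The file name hello.sge is an assumption, since the slides do not name the batch file, and the echo line is an addition for illustration. NSLOTS is the variable Grid Engine sets to the number of slots granted by the "-pe" request. The job is submitted only if qsub and a compiled ./hello are both present, so the sketch is harmless elsewhere.

```shell
#!/bin/bash
# Step 5: write the batch submit file (hello.sge is a hypothetical name).
cat > hello.sge <<'EOF'
#$ -N Hello
#$ -q short*
#$ -cwd -V
#$ -pe openmpi* 16
# NSLOTS is set by Grid Engine to the number of slots granted.
echo "Running on $NSLOTS slots"
mpirun hello
sleep 30
EOF

# Step 6: submit from the directory that holds the compiled program.
if command -v qsub >/dev/null 2>&1 && [ -x ./hello ]; then
    qsub hello.sge
else
    echo "not submitting: need qsub and a compiled ./hello in this directory"
fi
```

The "-V" option passes the submitting shell's environment to the job, which is how the settings made by "module add openmpi/1.4.2/intel" reach the compute nodes.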
Compiling and Installing Software
Example: Fractal generator
1. Find the software
2. Transfer to Newton
  • Direct: wget http://newton.utk.edu/workshop/gmandel.tgz
  • Indirect: Download to workstation and scp (sftp)
3. Extracting the source code
  1. Uncompressed: tar
  2. Compressed: gunzip or unzip
4. Configure the software:
$ ./configure --prefix=$HOME/gmandel
5. Compile: $ make
6. Install: $ make install
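Which extraction tool to use in step 3 depends on the archive type. The helper below is a small sketch that picks the tool from the file extension. The function name extract is hypothetical, and the mapping covers only the formats mentioned above.

```shell
#!/bin/bash
# Choose an extraction command from the file extension.
extract() {
    case "$1" in
        *.tar.gz|*.tgz) tar -xzf "$1" ;;   # compressed tar archive
        *.tar)          tar -xf "$1" ;;    # uncompressed tar archive
        *.gz)           gunzip "$1" ;;     # single gzip-compressed file
        *.zip)          unzip -q "$1" ;;   # zip archive
        *)  echo "unknown archive type: $1" >&2
            return 1 ;;
    esac
}

# Example use (run in the directory that holds the download):
#   extract gmandel.tgz
```

The patterns are checked in order, so ".tar.gz" and ".tgz" are matched before the plain ".gz" case.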
$ wget http://newton.utk.edu/workshop/gmandel.tgz
$ tar -vzxf gmandel.tgz
$ ./configure --prefix=$HOME/gmandel
$ make install
…
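After installing under a private prefix, the shell still has to be told where the new program lives. The sketch below assumes the usual configure convention of placing executables in the bin directory under the prefix. The slides do not confirm where gmandel installs its program, so treat the path as an assumption.

```shell
#!/bin/bash
# Install prefix used with ./configure in the example above.
PREFIX="$HOME/gmandel"

# Add $PREFIX/bin to PATH once, skipping it if it is already listed.
case ":$PATH:" in
    *":$PREFIX/bin:"*) ;;                  # already present, nothing to do
    *) PATH="$PREFIX/bin:$PATH" ;;
esac
export PATH

echo "first PATH entry: ${PATH%%:*}"
```

Putting these lines in a shell startup file makes the change apply to future logins.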
Commercial Applications
• Matlab
  • Graphical (interactive)
  • Batch mode (parallel): matlab -r <Function>
• SAS
• SPSS
$ module load matlab
$ matlab
$ matlab -r 'TestFunction'
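When Matlab runs inside a batch job there is no display and nobody to type "exit", so the command line is usually extended. The sketch below builds such a command. The -nodisplay and -nosplash options are standard Matlab startup flags but do not appear in the slides, and the helper name matlab_batch_cmd is hypothetical. The function only prints the command, so it can be tried without Matlab installed.

```shell
#!/bin/bash
# Print a Matlab command line suited to non-interactive use.
# $1 is the name of the Matlab function to run.
matlab_batch_cmd() {
    printf 'matlab -nodisplay -nosplash -r "%s; exit"\n' "$1"
}

matlab_batch_cmd TestFunction
```

Without the trailing "exit", Matlab stays at its prompt after the function returns, and a batch job would keep holding its slot until it is stopped.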
More Information
• Newton Program website: http://newton.utk.edu/
  • Program policies
  • Documentation
  • Meetings / support / consulting schedule
• Research Computing Mailing List: [email protected]
Visit http://oit.utk.edu/workshops/eval/
• Section ID: Newton_Cluster-5