Ben Rogers March 29, 2012
4/2/12 1
- What is a Compute Cluster?
- High Performance Computing
- High Throughput Computing
- The Helium Cluster
- Mapping Your Problem to a Cluster
- Gaining Access to the Helium Cluster
- Questions?
- Large number of computers
- Software that allows them to work together
- A tool for solving large computational problems that require more memory or CPU than is available on a single system
Photo: http://www.flickr.com/photos/fullaperture/5435786866/sizes/l/in/photostream/
- Using multiple computers in a coordinated way to solve a single problem
- Provides the ability to:
  ◦ Use 10s-1000s of cores on a single problem
  ◦ Access 10s-1000s of GB of RAM
- Likely to require substantial code modification to use a library such as MPI
- Common examples:
  ◦ Computational Fluid Dynamics
  ◦ Molecular Dynamics
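The coordination idea behind HPC can be sketched in miniature: one large problem is split into pieces, workers compute the pieces in parallel, and the partial results are combined. On a real cluster this is done with MPI ranks across nodes; the Python sketch below uses `multiprocessing` on a single machine purely to illustrate the pattern, and the sum-of-squares workload is an arbitrary stand-in.

```python
# Minimal sketch of HPC-style domain decomposition: one big problem
# (summing squares over a large range) split across worker processes,
# with partial results combined at the end. On a real cluster this
# role is played by MPI ranks on separate nodes; multiprocessing here
# only illustrates the coordination pattern on a single machine.
from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))  # each worker handles one chunk

def parallel_sum_of_squares(n, workers=4):
    step = n // workers
    # split [0, n) into `workers` contiguous chunks (last chunk takes the remainder)
    chunks = [(k * step, (k + 1) * step if k < workers - 1 else n)
              for k in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))  # combine partial results

if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))  # matches the serial sum
```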
- Using multiple computers in a coordinated way to solve many individual problems
- Provides the ability to:
  ◦ Analyze many data sets simultaneously
  ◦ Efficiently perform a parameter sweep
- Requires minimal code modifications
- Common examples:
  ◦ Image Analysis
  ◦ Genomics
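The HTC pattern is simpler than HPC: the same analysis runs independently on many inputs, with no communication between jobs. A minimal sketch, where `analyze` is a placeholder for real analysis code and a cluster scheduler would run each call as a separate batch job:

```python
# Minimal sketch of an HTC-style parameter sweep: one independent
# analysis per input, no coordination between jobs. `analyze` is a
# hypothetical stand-in for real work (image analysis, alignment, ...).
from multiprocessing import Pool

def analyze(param):
    # each "job" depends only on its own input
    return param, param ** 2

def parameter_sweep(params, workers=4):
    with Pool(workers) as pool:
        return dict(pool.map(analyze, params))  # collect independent results

if __name__ == "__main__":
    results = parameter_sweep(range(10))
    print(results)
```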
- Collaborative cluster
- CentOS 5 (Linux)
- 350+ compute nodes
- 3600+ processor cores
- 24-144GB of RAM per node
- 200TB+ storage
- 40Gb/s InfiniBand network
- Home account storage
  ◦ NFS
  ◦ 80TB total
  ◦ 1TB per user
- Shared scratch storage
  ◦ Lustre
  ◦ 144TB total
  ◦ Files deleted after 30 days
- Local scratch storage
  ◦ 600GB+ per compute node
- No backups!
- Is a cluster a good fit? If your problem:
  ◦ Is not tractable on your desktop system
  ◦ Requires more memory than your desktop has available
  ◦ Requires rapid turnaround of results that you can't achieve with a desktop system
  ◦ Would benefit from having jobs scheduled
  ◦ Would otherwise tie up your desktop computer
- ...then your problem may be a good candidate for a cluster!
- Next questions:
  ◦ Does your job run on Linux?
  ◦ Can your job run in batch mode?
  ◦ Is your job HPC or HTC?
- Next steps:
  ◦ Apply for cluster access
  ◦ Develop a strategy for running jobs
  ◦ Install software
  ◦ Develop job submission scripts
  ◦ Run your job
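The "develop job submission scripts" step usually means writing one small shell script per batch job for the cluster's scheduler. The slides don't name Helium's scheduler, so this sketch assumes an SGE-style `qsub` interface with `#$` directives, and `run_analysis` is a hypothetical command; substitute whatever your cluster and software actually use. For HTC work it is common to generate one such script per input file:

```python
# Sketch of the "one submission script per input" pattern for batch
# (HTC) jobs. ASSUMPTIONS: an SGE-style scheduler that reads "#$"
# directives, and a hypothetical `run_analysis` command -- replace
# both with whatever your cluster actually provides.
SCRIPT_TEMPLATE = """#!/bin/bash
#$ -N {job_name}
#$ -l h_vmem=2G
run_analysis {input_file}
"""

def make_job_script(input_file):
    """Return the text of a submission script for one input file."""
    return SCRIPT_TEMPLATE.format(job_name="job_" + input_file,
                                  input_file=input_file)

if __name__ == "__main__":
    # e.g. write one script per data set, then submit each with qsub
    for f in ["subject001.mgz", "subject002.mgz"]:
        print(make_job_script(f))
```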
- Run Freesurfer on 1000 MRIs
  ◦ Takes 20 hours per MRI
  ◦ Requires 2GB of memory per analysis
- Desktop analysis time
  ◦ 20 hours x 1000 MRIs = 20,000 hours: 2.3 years!
  ◦ But I have a quad-core desktop with 8GB: that's still over six months!
Freesurfer: http://surfer.nmr.mgh.harvard.edu/
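The desktop timing above can be checked with quick arithmetic:

```python
# Verify the serial and quad-core desktop estimates for the
# Freesurfer example: 1000 MRIs at 20 hours and 2GB each.
HOURS_PER_MRI = 20
N_MRIS = 1000

serial_hours = HOURS_PER_MRI * N_MRIS     # 20,000 hours
serial_years = serial_hours / (24 * 365)  # ~2.3 years

# A quad-core desktop with 8GB RAM fits four 2GB analyses at a time
quad_hours = serial_hours / 4
quad_months = quad_hours / 24 / 30        # ~7 months, i.e. "over six months"

print(serial_hours, round(serial_years, 1), round(quad_months, 1))
```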
- Good fit for cluster? Yes
- Type of problem: HTC
- Software: runs on Linux in batch mode
- Time to analyze on Helium: as little as 20 hours
  ◦ Actual time depends on cores available; likely complete within a week
  ◦ Possible to run all analyses simultaneously:
    1000 processor cores (Helium total: >3600)
    2000GB of memory (Helium total: >9000GB)
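The "as little as 20 hours" best case follows because 1000 single-core, 2GB jobs fit within Helium's published totals, so in principle every analysis could run at once:

```python
# Feasibility check for running all 1000 Freesurfer analyses at once,
# using the capacity figures from the Helium specs slide.
N_MRIS = 1000
MEM_PER_JOB_GB = 2
HELIUM_CORES = 3600    # slide says 3600+ processor cores
HELIUM_MEM_GB = 9000   # slide says >9000GB total RAM

cores_needed = N_MRIS        # one core per analysis
mem_needed = N_MRIS * MEM_PER_JOB_GB

# both fit, so best-case wall time is one analysis: 20 hours
print(cores_needed <= HELIUM_CORES, mem_needed <= HELIUM_MEM_GB)
```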
- What's the catch?
  ◦ Time and effort needed to understand how to run your analysis on Helium
  ◦ It's a shared resource:
    Jobs may wait in the queue
    Jobs may be evicted
- Now open to all U of I faculty, staff, and students
  ◦ Students must be sponsored by a faculty member
- Complete the form at:
  ◦ http://hpc.uiowa.edu/helium-access-agreement
- Access is typically granted within 1 business day
- Classes
  ◦ Introduction to Helium: April 17th, 2:00-4:00PM
- HPC office hours
  ◦ Wednesday and Thursday, 2:00-3:00PM
- Contact us:
  ◦ HPC-Sysadmins@iowa.uiowa.edu
- For additional details visit:
  ◦ http://www.hpc.uiowa.edu
- ben-rogers@uiowa.edu
- Additional information:
  http://www.hpc.uiowa.edu
  http://its.uiowa.edu/apps/services/service.aspx?id=67
  https://www.icts.uiowa.edu/confluence/display/ICTSit/Helium+Cluster+Quick+Start+Guide