A PETAFLOPS Supercomputer as a University Resource: IU’s Experience
Abhinav Thota, Team Lead, Scientific Applications
University IT Services, Indiana University
Outline
• Based on a paper published last year at SIGUCCS
• About Indiana University and Research Technologies
• Big Red II and the HPC ecosystem at IU
• HPC is not plug and play
• Big Red II's successful launch
• Evaluation of Big Red II and value assessment of HPC in general
• Return on investment
• The argument for investing in local HPC resources
• Questions – stop me anytime!
Indiana University and Research Technologies
• Indiana University (IU) is a large public university
• Founded in 1820
• Bloomington is the flagship campus of IU's eight campuses statewide
• 115,000 students statewide; 23,000 faculty and staff
• Research Technologies is a division of the University IT Services (UITS) department and a core component of the Pervasive Technology Institute (IUPTI), a collaborative center
• Provides research computing services to the university, including HPC, storage, gateways, and visualization
Indiana University
[Organization chart: Research Technologies, July 2017 (appointed staff only; see http://uits.iu.edu/scripts/ose.cgi?ltxt.help). Craig A. Stewart, Associate Dean, RT, and Executive Director, PTI. Units and leads shown include: Systems (Matthew Link, Director); Science Community Tools (Robert Henschel, Director); Visualization and Analytics (Eric Wernert, Director); Advanced Cyberinfrastructure (David Y. Hancock, Program Director); Community Engagement and Interoperability (Therese Miller, Program Director); High Performance Systems (Peggy Lindenlaub); High Performance File Systems (Stephen Simms); Research Storage (Charles McClary); Campus Bridging and Research Infrastructure (Joe Butler); Research Analytics (Scott Michael); High Throughput Computing (Robert Quick); Scientific Applications and Performance Tuning (Abhinav Thota); Advanced Parallel Applications (Raymond Sheppard); Advanced Visualization Lab (Michael Boyles); Advanced Biomedical IT Core (Richard Meraz); Digital Humanities Cyberinfrastructure (Tassie Gniady); National Center for Genome Analysis Support (Thomas Doak); Application Virtualization (Stephanie Cox); Research Data Services (M. Esen Tuna); Collaboration and Engagement Support (Winona Snapp-Childs); Scalable Compute Archives (Arvind Gopu); Jetstream Cyberinfrastructure (George Turner); Jetstream Project Management and Outreach.]
IU Campus Gates
Image: http://www.iuwc.indiana.edu/bloomington/
Bloomington, Indiana
Image: http://ois.iu.edu/img/content-photos/map.gif
IU’s HPC Journey
• We have heard the perspectives of large national supercomputer centers; here is how IU looks at HPC
• Over the last two decades:
– The first steps happened in 1997, after then-President Myles Brand set the goal for IU to be a leader in the use and application of IT
– In 2001, IU implemented the first 1 TFLOPS supercomputer owned by a US university
– In 2013, IU made a similar achievement at the 1 PFLOPS level with Big Red II, a Cray supercomputer
• Of course, there were other, faster supercomputers, but not owned by and operated for a single university
– This gave IU the freedom to pursue university priorities and further the university's mission
Investment in Big Red II
• Will focus on how the investment in Big Red II sped up positive changes
• Big Red II was purchased in 2013 and became the flagship machine
• As always, lots of promises were made :) (more about this in a bit)
– Unlikely as it sounds, they have all been kept!
• We promised that Big Red II would be widely adopted across the university
– The goal was 150 disciplines and sub-disciplines
• How was this accomplished? Definitely not an accident!
• What are some of the factors that can lead to this kind of success?
Centrally funded by IU
• Big Red II, and cyberinfrastructure at IU generally, are centrally funded by IU for the use of the university community
• This is actually not that common
• Many large universities employ some form of a chargeback model and make their HPC customers pay for access and usage
• Or at least require a proposal and a review process
• At IU, getting access is as easy as creating an email account – no proposal or justification required
Context – HPC is not just compute
• HPC – High Performance Computing
• There are a lot of other resources that support a supercomputer:
– Storage
• Stable permanent storage
• High performance parallel storage
• Archival storage
– Visualization tools and resources
– Supporting human resources
• Important and often overlooked part of the equation
Cyberinfrastructure at IU
• Big Red II: 1,020-node Cray XE6/XK7 system (676 GPU nodes)
• Big Red II+: 512-node Cray XC30 for Grand Challenge projects
• Karst: 256 Ivy Bridge nodes
• Carbonate: 96 large-memory Haswell nodes (256 GB RAM)
• Jetstream: XSEDE cloud offering
• Wrangler: XSEDE data analysis and storage system
• Torque/Moab & SLURM resource and workload managers
• A 6 petabyte parallel scratch file system
• A 3 petabyte home/permanent file system
• A 20+ petabyte tape archive
How HPC is utilized
• Generally:
– The operating system is some version of Linux
– There are different kinds of file systems attached, including parallel file systems
– Users access the HPC system remotely through an SSH-based terminal
– The workload on the machine is managed through a job scheduler (a minimal example job script is sketched below)
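To make the job-scheduler step concrete, here is a minimal sketch of what a batch job script can look like for a TORQUE/PBS-style scheduler (Torque/Moab and SLURM are the workload managers listed earlier). The job name, resource requests, queue, module, and application command are placeholders for illustration, not IU-specific settings.

```bash
#!/bin/bash
# Minimal TORQUE/PBS-style batch job script (all values are placeholders).
#PBS -N example_job              # job name
#PBS -l nodes=1:ppn=16           # one node, 16 cores per node
#PBS -l walltime=01:00:00        # one hour of wall-clock time
#PBS -q batch                    # queue name; varies by site
#PBS -j oe                       # merge stdout and stderr into one file

cd "$PBS_O_WORKDIR"              # run from the directory the job was submitted from
module load my_application       # hypothetical environment module
./my_application input.dat       # the actual computation

# Submit the job:    qsub job.sh
# Check its status:  qstat -u $USER
```

A SLURM script looks much the same, with `#SBATCH` directives and `sbatch`/`squeue` in place of `#PBS`, `qsub`, and `qstat`.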
Garden Variety Terminal
• Who’s excited to use a terminal?
Terminal is where it starts
• New users begin here
• And this is where they hit the first roadblock
• A lot of non-traditional HPC users are now using HPC
• Bottom line:
– The service we are providing is not user friendly by any stretch of the imagination
• R, Python and MATLAB are now mainstream HPC applications
• This is the environment into which Big Red II entered
HPC is a specialized field
• People don't know how HPC works until they start using it
• It's not an "I know how it works but haven't used it" situation
• Linux, remote terminals, batch-job-driven usage, and all the other moving parts make the learning curve steep
• Users have two options:
– Settle for what they can get out of a laptop
– Learn how to use a supercomputer
• Here is where great outreach, training and user support come into play
• How are other centers handling this problem?
Big Red II’s launch
• In 2013, Big Red II was replacing Big Red
• Old Big Red was in service from 2006 to 2013
• It was a big change for our users
• To address this:
– We had an early user phase lasting a few months
– A high-profile dedication ceremony, to get the word out
– Extensive outreach efforts
• Presentations and demos to our regular users in person
• Presentations and tables at campus events
GPU Outreach
• IU went from zero GPUs to 676 GPUs after Big Red II was launched
• We installed and made available the popular GPU applications like NAMD, LAMMPS, Amber, GROMACS, etc
• Back in 2013, there were very few full-fledged GPU applications
• In knowledge base articles, we explained why users should use the GPU version (a sketch of what such a job request looks like follows below)
• We declined to build CPU-only versions if there was a GPU version
• The situation gradually improved
[Screenshot from a KB article: https://kb.iu.edu/d/besc]
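As a taste of the kind of guidance those knowledge base articles gave, here is a rough sketch of how a GPU run might be requested through the batch system. The queue name, GPU resource syntax, and module name are placeholders that differ between sites and scheduler versions, so this should not be read as the exact Big Red II recipe.

```bash
#!/bin/bash
# Sketch of a GPU batch job (placeholder names; resource syntax varies by site).
#PBS -N namd_gpu
#PBS -l nodes=1:ppn=16:gpus=1    # request a node with one GPU (TORQUE syntax varies)
#PBS -l walltime=04:00:00
#PBS -q gpu                      # hypothetical GPU queue name

cd "$PBS_O_WORKDIR"
module load namd                 # hypothetical module for a CUDA-enabled NAMD build
namd2 +p16 +idlepoll input.namd > output.log   # +idlepoll is recommended for NAMD's CUDA builds
```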
Promises that were made
• Big Red II would benefit more than 150 disciplines and sub-disciplines
• Everyone thought that this was crazy at the time
• We are at 194 today; we hit 150 in January 2015
Disciplines – at 194 today
How we counted
• Created a web form to collect this information
– Contacted users to fill in the web form
– Integrated the web form into the account creation process
– Users self-identify from a drop-down list
How we got to 150
• Hearing from users and listening to them when making policy and purchase decisions
• Everyone was welcome – no barriers and roadblocks (just a click on a webpage)
• The big splash at launch
• Outreach
• User onboarding process
• Continued support, for both easy, quick questions and long-term consulting
• Training
Support Structure
• Basic user and application support
• Data analytics and statistical application support
• Visualization support
• Digital humanities
• Science gateways
• Separate admin teams for systems and storage
• We can't always meet and demo things to users before they get going
• But if we can, we don't miss that opportunity
• Many times, the user needs the resources but the learning curve stops them from diving in
Training
• About 25 workshops every year
– Intro to Unix
– Introductory HPC
– Intro to MPI
– Intermediate MPI
– R
– Python
– MATLAB
• And ad hoc, on-demand custom workshops for research groups who request them
– These are awesome – you are helping researchers use the systems efficiently
– Which makes them and us both happy!
Evaluation of the system
• The University IT Services department does a survey once a year
– This includes everything under the sun, like email, Wi-Fi, and of course HPC
• One section is for HPC questions
– The results for the last 25 years are available here: http://www.indiana.edu/~uitssur/
• Limited scope, due to the limited number of questions
• But we do surveys after every workshop
• It is really important to stay in touch with the user base and know what their experience is
Value Assessment of HPC investments
• Value assessment is hard
– Especially at a non-profit/educational institution
– A lot easier when you are selling a product for profit!
• We are doing this by measuring how widely the HPC resources are being used
– The number of disciplines and sub-disciplines, for example
• And how deeply some groups are using the HPC resources
– For example, a few groups doing research that uses HPC extensively
• Interviews
– We contracted with an assessment group at another university to do interviews; the report is available online
XDMoD Value Analytics for value assessment
• XDMoD – XD Metrics on Demand
• An NSF-funded, open source tool designed to audit and facilitate the utilization of XSEDE resources
• Metrics provided include:
– Resource utilization
– Resource performance
– Impact on scholarship and research
• IU is working on integrating scientific output and grant income into this tool
– For researchers who use HPC vs. those who don't
• This tool can plug into NSF and NIH grant databases
– Even if individual university/college grant data is not public
Value Analytics: A Financial Module for the Open XDMoD Project
• Talk by Ben Fulton
• Wednesday, July 12
• 2:00pm - 2:30pm
• Bolden 5
Return on Investment
• Existing studies show that HPC (and probably IT facilities in general) are likely to lead to increases in publications and grant income
• We calculated the total amount in grant funding that Big Red II users brought in over the last three fiscal years (this includes funding possibly unrelated to Big Red II)
Fiscal Year    All IU Grants    College of Arts and Sciences Grants
FY 2014        $11.7M           $6.5M
FY 2015        $24.5M           $11.2M
FY 2016        $39.8M           $14.0M
Let’s do some math
• Over the 5 year lifespan of Big Red II, hardware and personnel combined will cost about $15 million
• At the halfway mark, FY 2016 grant income was $40 million
• Assume that the total over the 5-year lifespan is $90 million
• About $30 million of that would come to IU as facilities and administration funds
• This is already double the total cost of operation of Big Red II (the arithmetic is spelled out below)
• Factor in the increased competitiveness on grants and the scientific value of research done using Big Red II
– You have a part qualitative, part quantitative, but overall winning argument about why you should invest in HPC
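Spelled out, the back-of-the-envelope arithmetic above looks like the following; the roughly one-third facilities-and-administration share is simply the ratio implied by the $30 million-of-$90 million figures on this slide, not an official IU rate.

```latex
\text{Estimated 5-year grant income:}\quad \$90\,\mathrm{M} \\
\text{Implied F\&A recovery:}\quad \$90\,\mathrm{M} \times \tfrac{1}{3} \approx \$30\,\mathrm{M} \\
\text{5-year cost of Big Red II (hardware + personnel):}\quad \$15\,\mathrm{M} \\
\text{F\&A recovery alone} \approx 2 \times \text{total cost of ownership}
```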
HPC that can scale
• Many institutions may be able to invest at some level in a local HPC system
• It does not need to have the capacity and capability to satisfy all the local needs
– IU, for example, cannot meet all the local demand; fair-share policies are in place
• If no investment in HPC is possible, investment in personnel to support access to federally funded open resources can be of great benefit
Building a small HPC center
• The basic components of an HPC environment are compute, a network file system that is backed up, and a parallel file system that is not backed up
• About 3 FTE ought to be able to support this in a barebones fashion
– No backup admins
– No support during non-business hours, and the usual disclaimers
– User support and outreach will need more staff! Most journeys begin here though, with enthusiastic staff doing support and outreach just because they want to!
NSF and DoE funded resources
• Sometimes it makes more financial sense to support 90% of your users and guide the rest on to shared national resources
• XSEDE
– Campus Champions program
– Startup allocations
• Department of Energy INCITE
– For even larger needs
Conclusion
• HPC is not just hardware; good support and training are needed for optimal usage
• Reducing the barriers to HPC access is proven to increase the breadth and diversity of users
– 194 disciplines at IU are using Big Red II
• Value assessment and calculating ROI for HPC investments is hard
– Associating grant income with HPC usage is a good start
• Smart investments in HPC/cyberinfrastructure lead to increases in grant income and publications
– It makes very good economic sense
Questions and Comments
• Contact: athota@iu.edu