NSF XSEDE Campus Champions and Extreme Scale Research Computing
D. Karres, Beckman Institute
J. Alameda, National Center for Supercomputing Applications
S. Kappes, National Center for Supercomputing Applications
Outline
• Research Computing @ Illinois
  – National Science Foundation Investments
    • Extreme Science and Engineering Discovery Environment (XSEDE)
    • Blue Waters
  – Research IT
    • Highlighted Core Services
  – Campus Champions
    • History/Motivation
    • Scope
    • Benefits
National Science Foundation Investments
• Some context: M. Parashar, NSF Town Hall, PEARC18, July 25, 2018
  – Leadership HPC
    • Blue Waters (through December 2019)
    • Phase 1: Frontera @ TACC (production ~mid 2019)
  – Innovative HPC
    • Allocated through XSEDE
    • Large scale, long-tail, data intensive, cloud
  – Services
    • XSEDE: supporting innovative HPC resources
    • XD Metrics Service (XDMoD)
    • Open Science Grid
Outline
• Research Computing @ Illinois
  – National Science Foundation Investments
    • Extreme Science and Engineering Discovery Environment (XSEDE)
    • Blue Waters
  – Research IT
    • Highlighted Core Services
  – Campus Champions
    • History/Motivation
    • Scope
    • Benefits
Slides adapted from:
Linda Akli, SURA
Assistant Director, Education, Training, Outreach
Manager, XSEDE Broadening Participation Program
XSEDE Overview, Fall 2018
What is XSEDE?
Foundation for a National CI Ecosystem
• Comprehensive suite of advanced digital services that federates with other high-end facilities and campus-based resources
Unprecedented Integration of Diverse Advanced Computing Resources
• Innovative, open architecture making possible the continuous addition of new technology capabilities and services
XSEDE Leadership
Mission and Goals
Mission: Accelerate scientific discovery
Strategic Goals:
• Deepen and Extend Use
  – Raise the general awareness of the value
  – Deepen use and extend use to new communities
  – Contribute to the preparation of current and next-generation scholars, researchers, and engineers
• Advance the Ecosystem
• Sustain the Ecosystem
Total Research Funding Supported by XSEDE 2.0
$1.97 billion in research supported by XSEDE 2.0, September 2016 - April 2018.
Research funding only. XSEDE leverages and integrates additional infrastructure, some funded by NSF (e.g. "Track 2" systems) and some not (e.g. Internet2).
By agency (in $M, share of total):
• NSF: 754.3 (38%)
• NIH: 432.0 (22%)
• DOE: 325.0 (16%)
• DOD: 175.1 (9%)
• DOC: 55.2 (3%)
• NASA: 38.2 (2%)
• All others: 187.7 (10%)
XSEDE Supports a Breadth of Research
• Earthquake Science
• Molecular Dynamics
• Nanotechnology
• Plant Science
• Storm Modeling
• Epidemiology
• Particle Physics
• Economic Analysis of Phone Network Patterns
• Large Scale Video Analytics (LSVA)
• Decision Making Theory
• Library Collection Analysis
• Replicating Brain Circuitry to Direct a Realistic Prosthetic Arm
XSEDE researchers visualize massive Joplin, Missouri tornado

Recovering Lost History
A collaboration of social scientists, humanities scholars, and digital researchers harnessed the power of high-performance computing to find and understand the historical experiences of black women by searching two massive databases of written works from the 18th through 20th centuries.
XSEDE Visualization and Data Resources
Visualization
• Visualization Portal: remote, interactive, web-based visualization
• iPython / Jupyter Notebook integration
• RStudio integration
Storage
• Resource file system storage: all compute/visualization allocations include access to limited disk and scratch space on the compute/visualization resource file systems to accomplish project goals.
• Archival storage: archival storage on XSEDE systems is used for large-scale persistent storage requested in conjunction with compute and visualization resources.
• Stand-alone storage: stand-alone storage allows storage allocations independent of a compute allocation.
Compute and Analytics Resources
• Bridges: featuring interactive on-demand access, tools for gateway building, and virtualization.
• Comet: hosting a variety of tools including Amber, Gaussian, GROMACS, LAMMPS, NAMD, and VisIt.
• Jetstream: a self-provisioned, scalable science and engineering cloud environment.
• Stampede2: Intel's innovative MIC technology on a massive scale.
• SuperMIC: equipped with Intel's Xeon Phi technology; the cluster consists of 380 compute nodes.
• Wrangler: a data analytics system combining database services, flash storage, long-term replicated storage, and an analytics server; offers iRODS data management, Hadoop service reservations, and database instances.
Science Gateways
The CIPRES science gateway: an NSF investment launching thousands of scientific publications with no sign of slowing down.
https://sciencenode.org/feature/cipres-one-facet-in-bold-nsf-vision.php?clicked=title
XSEDE High Throughput Computing Partnership: Open Science Grid
• Governed by the OSG consortium
• 126 institutions with ~120 active sites, collectively supporting usage of ~2,000,000 core hours per day
• High-throughput workflows with simple system and data dependencies are a good fit for OSG (see the sketch after this list)
• Access options:
  – OSG Connect: available to any researcher affiliated with a US institution and funded by a US funding agency
  – OSG Virtual Organizations such as CMS and ATLAS
  – XSEDE
• https://portal.xsede.org/OSG-User-Guide
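To make "high-throughput" concrete, the sketch below shows the shape of such a workload: many independent invocations of one small, self-contained script, each with its own parameter and no communication between jobs. The script name and parameters are hypothetical, not taken from the OSG documentation.

    # sweep_task.py - one independent task in a hypothetical parameter sweep
    import sys, json, math

    def simulate(x):
        # Stand-in for a small, self-contained computation with simple
        # system and data dependencies - the kind of task that maps well
        # to a high-throughput system like OSG.
        return sum(math.sin(x * k) for k in range(1, 100000))

    if __name__ == "__main__":
        x = float(sys.argv[1])  # each job gets a different parameter value
        print(json.dumps({"x": x, "result": simulate(x)}))

A submit system such as HTCondor would then queue many independent runs of this script (python sweep_task.py 0.1, python sweep_task.py 0.2, ...) across OSG sites.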
Accessing XSEDE - Allocations
• Allocation types: Startup, Education, Research, and Campus Champion
XSEDE User Support Resources
• Technical information
• Training
• Help Desk / Consultants
• Extended Collaborative Support Service
Workforce Development: Training
• XSEDE Training Course Catalog with all materials in a single location
• Course Calendar for viewing and registering for upcoming training events
• Online training materials relevant to XSEDE users
• Badges available for completing selected training
• Some events provide participation documentation
• Training roadmaps
PEARC19: pearc19.pearc.org
July 28 - Aug 1, 2019, Chicago, IL

Welcome to XSEDE!
Outline
• Research Computing @ Illinois
  – National Science Foundation Investments
    • Extreme Science and Engineering Discovery Environment (XSEDE)
    • Blue Waters
  – Research IT
    • Highlighted Core Services
  – Campus Champions
    • History/Motivation
    • Scope
    • Benefits
Blue Waters Overview
Blue Waters
• Most capable supercomputer on a university campus
• Managed by the Blue Waters Project of the National Center for Supercomputing Applications at the University of Illinois
• Funded by the National Science Foundation
Goal of the project: ensure researchers and educators can advance discovery in all fields of study
Blue Waters System: top-ranked system in all aspects of its capabilities, with an emphasis on sustained performance
• Built by Cray (2011 - 2012)
• 45% larger than any other system Cray has ever built
• By far the largest NSF GPU resource
• Ranks among the Top 10 HPC systems in the world in peak performance despite its age
• Largest memory capacity of any HPC system in the world: 1.66 PB (petabytes)
• One of the fastest file systems in the world: more than 1 TB/s (terabyte per second)
• Largest backup system in the world: more than 250 PB
• Fastest external network capability of any open science site: more than 400 Gb/s (gigabits per second)
Blue Waters Ecosystem
• Blue Waters System: processors, memory, interconnect, online storage, system software, programming environment
• Software: visualization, analysis, computational libraries, etc.
• SEAS: Software Engineering and Application Support
• Petascale applications: computing resource allocations
• User and production support: WAN connections, consulting, system management, security, operations, ...
• National Petascale Computing Facility
• EOT: Education, Outreach, and Training
• GLCPC: Great Lakes Consortium for Petascale Computing
• Hardware: external networking, IDS, backup storage, import/export, etc.
• Industry partners
Blue Waters Computing System
• 13.34 PFLOPS peak performance; 1.66 PB of memory
• Sonexion online storage: 26 usable PB at >1 TB/sec
• Spectra Logic nearline storage: 200 usable PB at 100 GB/sec
• 400+ Gb/sec WAN connectivity
• Scuba subsystem: storage configuration for user best access
• 10/40/100 Gb Ethernet switch, IB switch, external servers
Blue Waters Allocations: ~600 Active Users
• NSF PRAC, 80%
  – 30 - 40 teams; annual request for proposals (RFP) coordinated by NSF
  – Blue Waters project does not participate in the review process
• Illinois, 7%
  – 30 - 40 teams; biannual RFP
• GLCPC, 2%
  – 10 teams; annual RFP
• Education, 1%
  – Classes, workshops, training events, fellowships; continuous RFP
• Industry
• Innovation and Exploration, 5%
• Broadening Participation, a new category for underrepresented communities
Usage by Discipline and User
Data from the Blue Waters 2016-2017 Annual Report:
• Earth Sciences: 13.3%
• Physics: 12.3%
• Biophysics: 10.8%
• Astronomical Sciences: 10.4%
• Molecular Biosciences: 7.6%
• Stellar Astronomy and Astrophysics: 7.4%
• Atmospheric Sciences: 6.4%
• Chemistry: 5.2%
• Engineering: 4.9%
• Fluid, Particulate, and Hydraulic Systems: 4.5%
• Materials Research: 2.5%
• Planetary Astronomy: 2.5%
• Extragalactic Astronomy and Cosmology: 2.4%
• Galactic Astronomy: 2.1%
• Biochemistry and Molecular Structure and Function: 1.5%
• Biological Sciences: 1.5%
• Nuclear Physics: 1.3%
• Computer and Computation Research: 1.0%
• Neuroscience Biology: 0.8%
• Magnetospheric Physics: 0.5%
• Chemical, Thermal Systems: 0.3%
• Design and Computer-Integrated Engineering: 0.3%
• Climate Dynamics: 0.1%
• Environmental Biology: 0.1%
• Social, Behavioral, and Economic Sciences: 0.1%
• Other: 7.5%
Recent Science Highlights
• LIGO binary black hole observation verification
• 160-million-atom flu virus
• EF5 tornado simulation
• Arctic elevation maps
• Earthquake rupture
Support for Python and Containers
• Approximately 20% of Blue Waters users use Python.
• We provide over 260 Python packages and two Python versions.
• Support for GPUs, ML/DL, etc.
• Support for "Docker-like" containers using Shifter.
  – MPI across nodes with access to the native driver (see the sketch below).
  – Access to the GPU from the container.
• Support for Singularity.
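As an illustration of the "MPI across nodes" point above, here is a minimal sketch of a Python MPI check. It assumes mpi4py is among the provided Python packages; the launcher (e.g. aprun on a Cray system) and any Shifter/Singularity invocation are site-specific and not shown.

    # mpi_hello.py - report which node each MPI rank runs on (assumes mpi4py)
    import socket
    from mpi4py import MPI

    comm = MPI.COMM_WORLD            # communicator spanning all launched ranks
    rank = comm.Get_rank()           # this process's rank
    size = comm.Get_size()           # total number of ranks
    host = socket.gethostname()      # node this rank landed on

    # Rank 0 gathers every rank's hostname and prints a summary, confirming
    # that the job really spans multiple nodes.
    hosts = comm.gather(host, root=0)
    if rank == 0:
        print("%d ranks on %d distinct nodes" % (size, len(set(hosts))))

The same script works whether Python runs natively or inside a container image, as long as the container can see the native MPI driver mentioned above.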
Data Science and Machine Learning
Currently available libraries:
• TensorFlow 1.3.0 (see the example below)
In the pipeline:
• TensorFlow 1.4.x
• PyTorch
• Caffe2
• Cray ML acceleration
Data challenge: large training datasets
• Example/research data on Blue Waters: ImageNet
• Seeking datasets for:
  – Natural language processing (still looking for a data set large enough)
  – Biomedical data, e.g. UK Biobank (http://www.ukbiobank.ac.uk)
• Seeking user interests
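As a concrete reference point for the TensorFlow 1.3.0 entry, the following is a minimal sketch using the TensorFlow 1.x graph/session API; module and environment setup on the system is not shown, and the script is illustrative rather than taken from the Blue Waters documentation.

    # tf_check.py - tiny TensorFlow 1.x sanity check
    import tensorflow as tf
    from tensorflow.python.client import device_lib

    # Build a trivial graph: a 1x2 by 2x1 matrix product.
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    product = tf.matmul(a, b)

    with tf.Session() as sess:
        # Expect 1*3 + 2*4 = 11
        print("matmul result: %s" % sess.run(product)[0][0])

    # List the devices TensorFlow can see (CPUs, plus GPUs if available).
    print([d.name for d in device_lib.list_local_devices()])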
Blue Waters Summary
Outstanding computing system
• The largest installation of Cray's most advanced technology
• Extreme-scale Lustre file system with advances in reliability/maintainability
• Extreme-scale archive with advanced RAIT capability
Most balanced system in the open community
• Blue Waters is capable of addressing science problems that are memory, storage, compute, or network intensive, or any combination.
• Use of innovative technologies provides a path to future systems; NCSA is a leader in developing and deploying these technologies as well as contributing to community efforts.
Post Blue Waters
• Frontera: new leadership-class resource at the Texas Advanced Computing Center
  – Production mid-2019
  – More to come soon
Outline
• Research Computing @ Illinois
  – National Science Foundation Investments
    • Extreme Science and Engineering Discovery Environment (XSEDE)
    • Blue Waters
  – Research IT
    • Highlighted Core Services
  – Campus Champions
    • History/Motivation
    • Scope
    • Benefits
Research IT
• Research IT Portal: https://researchit.illinois.edu/
  – Aggregation of many sources of Research IT resources
  – Some provided as services to the campus-wide community
  – Many contributed by campus units
Highlighted Core Research IT Services
• Active Data Service (ADS)
  – Partnership between the Research Data Service (University Library), NCSA, and Tech Services
  – Mid-scale storage of actively used data
  – https://www.library.illinois.edu/rds/active-data-storage-overview/
Highlighted Core Research IT Services
• Amazon Web Services
  – Cloud services platform
  – Access coordinated by Tech Services
  – http://techservices.illinois.edu/services/amazon-web-services
Highlighted Core Research IT Services
• Data Management Consultations
  – Consultation with library-based subject experts on Data Management Plans, required for most funding proposals
  – Offered by the University Library
Highlighted Core Research IT Services
• IDEALS (Illinois Digital Environment for Access to Learning and Scholarship)
  – Digital repository for research and scholarship produced at the University of Illinois
  – Offered by the University Library
  – https://www.ideals.illinois.edu/
Highlighted Core Research IT Services
• Illinois Campus Cluster Program (ICCP)
  – High performance computing cluster available at Illinois
  – Offered by Research IT
  – Investor-based access
  – On-demand Research Computing as a Service access
  – https://campuscluster.illinois.edu/
Highlighted Core Research IT Services
• Training
  – Training opportunities from many sources, aggregated on the Research IT portal
  – Combination of live events and asynchronous training opportunities
  – https://researchit.illinois.edu/resources?categories=8&page=1
Upcoming Core Research IT Services
• Research Computing Applications and Software Development Consulting
  – Allocated resource; collaborate with one or more experts for up to 1 year on your research computing project
  – Apply online at https://researchit.illinois.edu/initiatives
• High Throughput Computing
  – New computational resource
  – Currently in pilot phase
Outline
• Research Computing @ Illinois
  – National Science Foundation Investments
    • Extreme Science and Engineering Discovery Environment (XSEDE)
    • Blue Waters
  – Research IT
    • Highlighted Core Services
  – Campus Champions
    • History/Motivation
    • Scope
    • Benefits
Campus Champions
The grassroots, ubiquitous community of practice
Slides adapted from:
Dana Brunson
XSEDE Campus Engagement co-manager
Asst. VP for Research Cyberinfrastructure
Director, High Performance Computing Center
Oklahoma State University
Campus Champions: Breadth
• 520+ champions from 260+ US academic and research-focused institutions help their local researchers to use CI, especially large scale and advanced computing.
• Every US state, every US EPSCoR jurisdiction
• FREE to join with letter of collaboration.
• Funded via XSEDE project (but no longer XSEDE-focused).
• All flavors of campus research computing professionals (directors, sysadmins, user support, trainers, coordinators, research software engineers, etc.).
• Plus new "affiliates" – from organizations that serve the larger community (XSEDE, Internet2, Globus, BDHubs, national HPC centers, etc.).
• https://www.xsede.org/campus-champions
Campus Champions
Research computing community facilitating computing- and data-intensive research and education
• Champions help their local researchers and educators find and use the advanced digital services that best meet their needs.
• Champions share CI challenges and solutions.
https://www.xsede.org/web/site/community-engagement/campus-champions
Evolution of Champions
TeraGrid 2004-2011
• Fall 2007: Planning for Champions began
• 2008: Campus Champions program officially started
• May 2008: First Champion selected
• December 2009: Champion Leadership team formed
XSEDE 1: 2011-2016
• August 2011: 100th Institution joined
• May 2012: Champion Fellows program began
• June 2013: 200th Champion joined
• July 2013: Domain and Student Champion program initiated
• January 2015: Regional Champion Program Initiated
• August 2015: 200th Institution joined
XSEDE 2: 2016-present
• July 2017: Elected first half of leadership team
• September 2017: 400th Champion joined
• April 2018: Champion institutions in every state and EPSCoR jurisdiction
• July 2018: Fully elected Leadership team
• Planning for sustainability beyond XSEDE
Current numbers:
• Total academic institutions: 244
  – Academic institutions in EPSCoR jurisdictions: 74
  – Minority Serving Institutions: 48
• Non-academic, not-for-profit organizations: 25
• Total Campus Champion institutions: 269
• Total number of Champions: 523
Campus Champions Growth (chart: number of champions per year, 2008-2018)
Synchronous Champions
Monthly Discussions
• 1st Tuesday: Champion Leadership team
• 2nd Tuesday: Community Chat
• 3rd Tuesday: All-Champions Call
– Guest speakers
– Community updates
• 4th Tuesday: Sustainability working group
Other Calls and Meetings
• Ad hoc special topics calls
• Face-to-face meetings at regional and national conferences
All-Champions monthly call
Asynchronous Champions (CY 2017)
Longest Threads in 2017
1. Data Management Initiatives on YOUR Campus - TGR/TGW?
2. Adjusted Peak Performance for HPC clusters
3. OS flavors in HPC
4. HPC systems login access only with VPN -- good idea?
5. AMD EPYC and Intel Skylake Pricing Extremes
6. Theoretical Peak Performance
7. Champions-style job board?
8. CephFS for HPC?
9. Successful Scheduling
10. Xeon Phi on Motherboard
11. cgroups Memory Leak in RHEL 6/7.x
12. Advice needed -- Gaussian software on an HPC
13. sysadmin internships for undergrads?
14. On Evaluation of Cluster Security and Patching
15. Cluster Environment Monitoring
16. Service ticketing/tracking software
17. HPC Steering Committee
18. If you had $50K...
19. University risk due to using external computing resources
20. Cloud Costing for NSF
21. High School Student Looking for HPC
22. cloud vs local cluster stats
23. Memory leak in VASP (GPU)?
24. Origins of the Data Management Plan
25. New Campus Champion
26. Onboarding of HPC users & its challenges
27. How to submit an extension via XRAS
Slack
• 14,425 total messages
• 11,752 direct messages
• 1,525 in public channel
Email list:
• Distinct contributors: 274
• NEW contributors: 128
• # messages: 1409
• # threads: 601
Champion Focus Group at PEARC18 Quotes:
What value do you gain from the Champions program?
• …It gives you a position not necessarily of authority but you have expertise of how to help someone with their work and they take you seriously.
• …When I talk to the CS faculty in their ivory towers I know exactly what the cutting edge is and can interact with them.
• I don’t think I could’ve survived without it. So having a job is a benefit
• It’s a very welcoming community…I came to the community 2 years ago…I have no hesitation about sending something to the champions list or the slack channel even if it’s a dumb question
• There are other communities that don’t receive my dumb questions well but not the Champions community
Campus Champions & Friends: Community of Communities
Photo credit: Tiffany Jolley
Plus: SC18 All-Champions Meeting with guest speaker Dan Stanzione
Tuesday 1:30-3:30 P.M., Room D220 (Dan at 2:15)
Questions?
• Thank you for your attention!
• Dean Karres ([email protected])
• Jay Alameda ([email protected])
• Sandie Kappes ([email protected])