Xsede for-nlhpc

November 17, 2013

XSEDE: an ecosystem of advanced digital services accelerating scientific discovery

John Towns, PI and Project Director, XSEDE; Director, Collaborative eScience Programs, NCSA

[email protected]

description

International Symposium NLHPC 2013: Innovation at the frontier of HPC

Title: XSEDE: an ecosystem of advanced digital services accelerating scientific discovery

Abstract: The XSEDE program (Extreme Science and Engineering Discovery Environment) has recently entered its third year of operation. In this talk we will discuss the vision, mission, and goals of this project and some of the distinguishing characteristics of the program. This will be accompanied by a review of the current status and a look ahead at where the program is headed over the next several years.

Transcript of Xsede for-nlhpc

Page 1: Xsede for-nlhpc

November 17, 2013

XSEDE: an ecosystem of advanced digital services accelerating scientific discovery

John Towns

PI and Project Director, XSEDE

Director, Collaborative eScience Programs, NCSA

[email protected]

Page 2: Xsede for-nlhpc

XSEDE – accelerating scientific discovery

XSEDE aspires to be the place to go to access digital research services.

Accelerate scientific discovery by enhancing the productivity of researchers, engineers, and scholars through the use of advanced digital services and infrastructure.

Page 3: Xsede for-nlhpc

Motivation for XSEDE:

• Scientific advancement across multiple disciplines requires a variety of resources and services

• XSEDE is about increased productivity of the community and providing expanded capabilities

– leads to more science

– is sometimes the difference between a feasible project and an impractical one

– lowers barriers to adoption

• XSEDE provides a comprehensive eScience infrastructure composed of expertly managed and evolving advanced heterogeneous digital resources and services integrated into a general-purpose infrastructure

Page 4: Xsede for-nlhpc

Innovation: proactively looking to expand scope of capabilities

• Striking a balance between providing stable, reliable services and fostering innovation – both in what we are doing and how we do it

• Campus Bridging use cases are mostly for capabilities we have not traditionally supported

• Novel and Innovative Projects team is seeking out new communities and identifying new capabilities necessary to support them

• Architecture design processes explicitly support innovation by the project – more importantly, facilitate innovation by the community

Our goal is to deliver new capabilities – and thus new science – faster

Page 5: Xsede for-nlhpc

XSEDE Factoids: high order bits

• 5 year, $121M project

– plus $9M, 5 year Technology Investigation Service (separate award from NSF)

– option for additional 5 years of funding upon major review after PY3

• No funding for major hardware

– coordination, support and creating a national/international eScience infrastructure

– coordinate allocations, training and documentation for >$100M of concurrent project awards from NSF

• ~140 FTE (~250 individuals) across 19 partner institutions

Page 6: Xsede for-nlhpc

XSEDE’s Strategic Goals

• Deepen and extend the use of the XSEDE ecosystem

– deepen use of XSEDE by existing researchers

– extend use of XSEDE to new communities

– prepare the current and next generation via education, training, and outreach

– raise the general awareness of the value of advanced digital services

• Advance the XSEDE infrastructure

– create an open and evolving infrastructure

– enhance the array of technical expertise and support services offered

• Sustain the XSEDE infrastructure

– sustain a reliable and secure infrastructure

– provide excellent user support services

– operate an effective and innovative virtual organization

Page 7: Xsede for-nlhpc

What is XSEDE?

• An ecosystem of advanced digital services

– support a growing portfolio of resources and services: advanced computing, high-end visualization, data analysis, and other resources and services

– interoperability with other infrastructures

• A virtual organization providing

– dynamic distributed infrastructure

– support services, and technical expertise to enable researchers, engineers, and scholars addressing the most important and challenging problems facing the nation and world

• A project funded by the National Science Foundation

Page 8: Xsede for-nlhpc

Total Research Funding Supported by XSEDE in FY2013

US$700 million in research supported by XSEDE in FY2013

Page 9: Xsede for-nlhpc

What do you mean by “Advanced Digital Services?”

• Often use the terms “resources” and “services”

– these should be interpreted very broadly

– most are likely not operated by XSEDE

• Examples of resources

– compute engines: HPC, HTC (high throughput computing), campus, departmental, research group, project, …

– data: simulation output, input files, instrument data, repositories, public databases, private databases, …

– instruments: telescopes, beam lines, sensor nets, shake tables, microscopes, …

– infrastructure: local networks, wide-area networks, …

• Examples of services

– collaboration: wikis, forums, telepresence, …

– data: data transport, data management, sharing, curation, provenance, …

– access/use: authentication, authorization, accounting, …

– coordination: meta-queuing, …

– support: helpdesk, consulting, ECSS, training, …

– and many more: education, outreach, community building, …

Page 10: Xsede for-nlhpc

XSEDE offers a variety of resources

• Leading-edge distributed memory systems

• Very large shared memory systems

• High throughput systems, including Open Science Grid (OSG)

• Visualization engines

• Accelerators like GPUs and Xeon Phis

Many scientific problems have components that call for use of more than one architecture.

Page 11: Xsede for-nlhpc

Approach to Other Infrastructures: Active Interactions

• OSG is a significant CI in the US

– Level 2 Service Provider in XSEDE

– the nation’s premier high-throughput computing infrastructure (complements traditional HPC resources inherited from TeraGrid)

– ties to CI (eScience infrastructure) providers internationally

• PRACE is a significant HPC CI in Europe

– PRACE represents both large scale HPC and distributed resources (subsumed DEISA in 2011)

– joint Summer School series

– working on joint call for collaborations support later this calendar year

• EGI is a significant HTC CI in Europe

– initiating organizational benchmarking effort

– identifying collaborating research teams spanning XSEDE-EGI

• HPC Wales

– Champions programs, Science Gateways

– training content

Page 12: Xsede for-nlhpc

Some Unexpected Challenges:

XSEDE is a socio-technical ecosystem

• Highly distributed organization

– challenges in managing a project that involves staff at 19 partner institutions

• A completely virtual organization

– breaking new ground from an organizational structure and management point of view

• Highly distributed engineering project

– developing new methodologies to adapt traditional practices to the unusual context of XSEDE

Page 13: Xsede for-nlhpc

XSEDE Software Engineering Processes

[Diagram: software engineering process flow. Stage labels: Requirements, Use Cases, Qualities; Architecture Design, Documentation; Architecture Review; Detailed Engineering Plans; Design, Development, Integration; Integration Test; Acceptance Test; User Documentation; Production Support. Focus labels: A&D, S&SE Focus; SD&I Focus; Operations Focus. Source labels: Users; Pilots; Service Providers (TEOS, ECSS); User Experience; Operations enhancements; Bug fixes, routine enhancements; Need for new capabilities. Output labels: documented architecture; production-ready software; supported software and services.]

Page 14: Xsede for-nlhpc

A Few Highlights from Past Year

Transitioning from a project in “start-up” mode to regular delivery of value to a broad community of researchers

• Facilitated broad range of ground-breaking research

• Added powerful new resources and transitioned users smoothly

– integrated/coordinated documentation

• Initiated development of undergraduate and graduate certificate and degree programs

• Campus Champions reached new heights

– 200 Champions at 147 institutions

• Delivered new or improved software, services and capabilities on a regular basis

– new POPS user interface, new GridFTP version, xdusage utility, new RT ticket system, UNICORE client and server, Globus Online services

Page 15: Xsede for-nlhpc

Current Status: excellent production operations, improving new services

• Recent annual review held in June

– very strong review from panel

– either doing well or on the right track to address issues

• Clear objectives of the panel

– make sure XSEDE is successful in major review next year

– position XSEDE as a leader in the community (beyond the scope of an NSF project)

– develop sustainability for XSEDE for the long term

Page 16: Xsede for-nlhpc

Objectives for Coming Year+

Accelerating the realization of the XSEDE vision

• Deliver new or improved software, services and capabilities on a regular basis

– XSEDE Wide Area Filesystem; Global Federated Filesystem; enhanced single sign-on; science gateway APIs; Canonical Use Case components

• Campus Bridging will promote "XSEDE Compatible" cluster build tools and use of Globus Online and GFFS for data movement and access

• Incorporate the third cadre of under-represented students into the XSEDE Scholars program

• Expand Champions Program to include Regional, Student, and Domain Champions

• Redesign and implement a new allocations request system

• Complete baseline architecture and expanded set of defined Use Cases

• Develop joint activities with industry

• Further develop relationships with other resource, service and infrastructure providers

Page 17: Xsede for-nlhpc

What keeps me up at night? - #3

Sustainability of funding (how apropos)

• XSEDE in year 3 of 5 year project

– Can we secure an additional 5 years of funding?

• How do we sustain after 10 years?

– Can we avoid disruptive nature of re-compete while still deriving best value?

– Will NSF even continue to fund such an activity?

• Is NSF definition of “sustainable” that they no longer fund it?

Page 18: Xsede for-nlhpc

Call for participation to be announced before SC13!

Page 19: Xsede for-nlhpc

Questions?

Page 20: Xsede for-nlhpc