
Page 1: NERSC Status Update for  NERSC User Group Meeting June 2006  William T.C Kramer kramer@nersc

NERSC Status Update for NERSC User Group Meeting

June 2006

William T.C. Kramer
kramer@nersc

510-486-7577

Ernest Orlando Lawrence Berkeley National Laboratory

Page 2

Outline

Page 3

Thanks for 10 Years of Help

• This is the 20th NUG meeting I have had the privilege of attending

• Throughout the past 10 years you all have provided NERSC with invaluable help and guidance

• NUG is unique within the HPC community

• NERSC and I are grateful for your help in making NERSC successful

Page 4

Science-Driven Computing Strategy 2006–2010

Page 5

NERSC Must Address Three Trends

• The widening gap between application performance and peak performance of high-end computing systems

• The recent emergence of large, multidisciplinary computational science teams in the DOE research community

• The flood of scientific data from both simulations and experiments, and the convergence of computational simulation with experimental data collection and analysis in complex workflows

Page 6

Science-Driven Systems

• Balanced and timely introduction of best new technology for complete computational systems (computing, storage, networking, analytics)

• Engage and work directly with vendors in addressing the SC requirements in their roadmaps

• Collaborate with DOE labs and other sites in technology evaluation and introduction

Page 7

Science-Driven Services

• Provide the entire range of services from high-quality operations to direct scientific support

• Enable a broad range of scientists to effectively use NERSC in their research

• Concentrate on resources for scaling to large numbers of processors, and for supporting multidisciplinary computational science teams

Page 8

Science-Driven Analytics

• Provide architectural and systems enhancements and services to more closely integrate computational and storage resources

• Provide scientists with new tools to effectively manipulate, visualize and analyze the huge data sets from both simulations and experiments

Page 9

National Energy Research Scientific Computing (NERSC) Center Division

NERSC Center Division Director: Horst Simon

Division Deputy: William Kramer

NERSC Center General Manager & High Performance Computing Department Head: William Kramer

• Science Driven System Architecture – John Shalf, Team Leader
• Computational Systems – James Craw, Group Leader
• Computer Operations & ESnet Support – Steve Lowe, Group Leader
• Science Driven Services – Francesca Verdier, Associate General Manager
• User Services – Jonathan Carter, Group Leader
• Science Driven Systems – Howard Walter, Associate General Manager
• HENP Computing – Craig Tull, Group Leader
• Mass Storage – Jason Hick, Group Leader
• Network, Security & Servers – Brent Draney, Group Leader
• Analytics – Wes Bethel, Team Leader (Matrixed – CRD)
• Open Software & Programming – David Skinner, Group Leader
• Accounts & Allocation Team – Clayton Bagwell, Team Leader

Page 10

NERSC Center

NERSC Center General Manager: William Kramer

Science Driven Systems – Howard Walter, Associate General Manager

• Computational Systems – James Craw, Group Leader: Matthew Andrews (.5), William Baird, Nick Balthaser, Scott Burrow (V), Greg Butler, Tina Butler, Nicholas Cardo, Thomas Langley, Rei Lee, David Paul, Iwona Sakrejda, Jay Srinivasan, Cary Whitney (HEP/NP), open positions (2)

• Computer Operations & ESnet Support – Steve Lowe, Group Leader: Richard Beard, Del Black, Aaron Garrett, Russell Huie (ES), Yulok Lam, Robert Neylan, Tony Quan (ES), Alex Ubungen

• Networking, Security, Servers & Workstations – Brent Draney, Group Leader: Elizabeth Bautista (DB), Scott Campbell, Steve Chan, Jed Donnelley, Craig Lant, Raymond Spence, Tavia Stone, open position (DB)

• Science Driven System Architecture Team – John Shalf, Team Leader: Andrew Canning (.25 – CRD), Chris Ding (.2 – CRD), Esmond Ng (.25 – CRD), Lenny Oliker (.25 – CRD), Hongzhang Shan (.5 – CRD), David Skinner (.5), E. Strohmaier (.25 – CRD), Lin Wang Wang (.5 – CRD), Harvey Wasserman, Mike Welcome (.15 – CRD), Katherine Yelick (.05 – CRD)

• Mass Storage – Jason Hick, Group Leader: Matthew Andrews (.5), Shreyas Cholia, Damian Hazen, Wayne Hurlbert, open position (1)

Science Driven Services – Francesca Verdier, Associate General Manager

• User Services – Jonathan Carter, Group Leader: Harsh Anand, Andrew Canning (.25 – CRD), Richard Gerber, Frank Hale, Helen He, Peter Nugent (.25 – CRD), David Skinner (.5), Mike Stewart, David Turner (.75)

• Open Software & Programming – David Skinner, Group Leader: Mikhail Avrekh, Tom Davis, RK Owen, open position (1) – Grid

• Analytics – Wes Bethel, Team Leader (.5 – CRD): Cecilia Aragon (.2 – CRD), Julian Borrill (.5 – CRD), Chris Ding (.3 – CRD), Peter Nugent (.25 – CRD), Christina Siegrist (CRD), Dave Turner (.25), open positions (1.5)

• Accounts & Allocations – Clayton Bagwell, Team Leader: Mark Heer, Karen Zukor (.5)

Key: V – vendor staff; CRD – matrixed staff from CRD; ES – funded by ESnet; HEP/NP – funded by the LBNL HEP and NP Divisions; DB – Division Burden

Page 11

2005-2006 Accomplishments

Page 12

Large-Scale Capability Computing Is Addressing New Frontiers

INCITE Program at NERSC in 2005:

• Turbulent Angular Momentum Transport; Fausto Cattaneo, University of Chicago

– Order-of-magnitude improvement in simulation of accretion in stars and in the lab

• Direct Numerical Simulation of Turbulent Non-premixed Combustion; Jackie Chen, Sandia Labs

– The first 3D direct numerical simulation of a turbulent H2/CO/N2–air flame with detailed chemistry; found new flame phenomena unseen in 2D

• Molecular Dynameomics; Valerie Daggett, University of Washington

– Simulated folds for 38% of all known proteins
– 2 TB protein fold database created

Comprehensive scientific support:
• 20–45% code performance improvements; 2M extra hours
• All projects relied heavily on NERSC visualization services

DOE Joule metric

Page 13

The Good

• Deployed Bassi – January 2006
– One of the fastest installations and acceptances
– Bassi is providing exceptional service

• Deployed NERSC Global File System – September 2005
– Upgraded – January 2006
– Excellent feedback from users

• Stabilized Jacquard – October 2005 to April 2006
– Resolved MCE errors
– Installed 40 more nodes

Page 14

The Good

• Improved PDSF
– Added processing and storage
– Converted hundreds of NFS file systems to a few GPFS file systems
– Access to NGF

• Increased archive storage function and performance
– Upgraded to HPSS 5.1 – April 2006
– More tape drives
– More cache disk
– 10 GE servers

• NERSC-5 procurement
– On schedule and below cost (to do the procurement)

• Continued network tuning


Page 16

The Good

• Continued network tuning

• Security
– Continued to avoid major incidents
– Good results from the “Site Assistance Visit” at LBNL
• LBNL and NERSC rated “outstanding”
• Still a lot of work to do – and some changes – before they return in a year

• Over-allocation issues (AY 05) solved
– Better queue responsiveness
– Stable time allocations

Page 17

The Good

• Other
– Thanks to ASCR, the NERSC budget appears stabilized
– Worked with others to help define HPC business practices
– Continued progress in influencing advanced HPC concepts
• Cell, Power, interconnects, software roadmaps, evaluation methods, working methods, …

Page 18

The Not So Good

• Took a long time to stabilize Jacquard
– Learned some lessons about lightweight requirements

• Upgrades on systems have not gone as well as we would have liked
– Extremely complex – and much is not controlled by NERSC

• Security attacks continue and are increasing in sophistication
– Can expect continued evolution

• User and NERSC database usage will be a point of focus

Page 19

The Jury Is Still Out

• Analytics ramp-up is taking longer than we desired
– NGF is a major step
– Some success stories, but we don’t have breadth

• Scalability of codes
– DOE expects a significant fraction (>50%?) of time to go to jobs >2,048-way for the first full year of NERSC-5
– Many of the most scalable applications are migrating to the LCFs, so some of the low-hanging fruit is already harvested
– Should be a continuing focus of NERSC and NUG

Page 20

2005–2006 Progress on Goals

Page 21

FY 04-06 Overall Goals

1. (Support for DOE Office of Science) Support and assist the DOE Office of Science in meeting its goals and obligations through the research, development, deployment, and support of high performance computing and storage resources and advanced mathematical and computer systems software.

2. (Systems and Services) Provide leading-edge, open High Performance Computing (HPC) systems and services to enable scientific discovery. NERSC will use its expertise and leadership in HPC to provide reliable, timely, and excellent services to its users.

3. (Innovative Assistance) Provide innovative scientific and technical assistance to NERSC's users. NERSC will work closely with the user community and together produce significant scientific results while making the best use of NERSC facilities.

4. (Respond to Scientific Needs) Be an advocate for NERSC users within the HPC community. Respond to science-driven needs with new and innovative services and systems.

Page 22

FY 04-06 Overall Goals

5. (Balanced Integration of New Products and Ideas) Judiciously integrate new products, technology, procedures, and practices into the NERSC production environment in order to enhance NERSC's ability to support scientific discovery.

6. (Advance Technology) Develop future cutting-edge strategies and technologies that will advance high performance scientific computing capabilities and effectiveness, allowing scientists to solve new and larger problems, and making HPC systems easier to use and manage.

7. (Export NERSC Knowledge) Export knowledge, experience, and technology developed at NERSC to benefit computer science and the high performance scientific computing community.

8. (Culture) Provide a facility that enables and stimulates scientific discovery by continually improving our systems, services, and processes. Cultivate a can-do approach to solving problems and making systems work, while maintaining high standards of ethics and integrity.

Page 23

2005–2006 Progress: 5-Year Plan Milestones

Page 24

5 Year Plan Milestones

• 2005
– NCS enters full service. – Completed
• Focus is on modestly parallel and capacity computing
• >15–20% of Seaborg
– WAN upgrade to 10 Gb/s. – Completed
– Upgrade HPSS to 16 PB; storage upgrade to support 10 GB/s for higher density and increased bandwidth. – Completed
– Quadruple the size of the visualization/post-processing server. – Completed

• 2006
– NCSb enters full service. – Completed
• Focus is on modestly parallel and capacity computing
• >30–40% of Seaborg – Completed; actually >85% of Seaborg SSP

Page 25

5 Year Plan Milestones

• 2006
– NERSC-5: initial delivery, with possibly a phasing of delivery. – Expected, but most will be in FY 07
• 3 to 4 times Seaborg in delivered performance – Over-achieved; more later
• Used for the entire workload and has to be balanced
– Replace the security infrastructure for HPSS and add native Grid capability to HPSS. – Completed and underway
– Storage and facility-wide file system upgrade. – Completed and underway

• 2007
– NERSC-5 enters full service. – Expected
– Storage and facility-wide file system upgrade. – Expected
– Double the size of the visualization/post-processing server. – If usage dictates

Page 26

Summary

• It is a good time to be in HPC

• NERSC has far more success stories than issues

• NERSC users are doing an outstanding job producing leading-edge science for the Nation
– More than 1,200 peer-reviewed papers for AY 05

• DOE is extremely supportive of NERSC and its users