Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex...

12
National Aeronautics and Space Administration www.nasa.gov Theme 2: Deep Dive HECC Project @ NAS 1 Piyush Mehrotra [email protected] Chief, NASA Advanced Supercomputing (NAS) Division Ames Research Center https://ntrs.nasa.gov/search.jsp?R=20180007294 2020-04-23T04:39:30+00:00Z

Transcript of Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex...

Page 1: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

National Aeronautics and Space Administration

www.nasa.gov

Theme 2: Deep DiveHECC Project @ NAS

1

Piyush [email protected]

Chief, NASA Advanced Supercomputing (NAS) DivisionAmes Research Center

https://ntrs.nasa.gov/search.jsp?R=20180007294 2020-04-23T04:39:30+00:00Z

Page 2: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

HECC Environment

2

Tape

(1 EB

capacity)

Cloud

Bursting

Servers

Archive

Pleiades

(7.57 PF)

Endeavour (32 TF

shared memory)

HelioCollaboration/

Data Portals

Global Online

Data Storage

(> 40 PB)

Aux Processing

(GPU, Data

Analysis)

hyperwall

(128-GPUs)NAS Facility

Extension (2019)

NEX

Specialized

systems

(TESS, PRACA)

Security

Devices

5 GB/sec 15 GB/sec5 GB/sec

380 GB/sec

Electra

(MSF 4.78 PF)

180 GB/sec

Mission StatementTo accelerate and enhance NASA's mission of space exploration, scientific discovery, and aeronautics research by continually ensuring optimal use of a productive high-end computing environment .

Page 3: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

HECC Services

3

HECC provides a suite of complimentary services to the user community to enhance the scientific and engineering results obtained from the hardware assets.• Systems – Customized solutions including compute and storage solutions to meet specific project or

mission requirements.

• Network – End-to-end network performance enhancements for user communities

• Application Performance and Productivity – Software solutions provided to research/engineering teams to better exploit installed systems.

• Visualization and Data Analysis – Custom visualization during traditional post-processing or concurrent during simulation to understand complex interactions of data.

• Data Analytics/Machine Learning – Exploitation of data sets through neural nets and emerging new techniques.

• Data Gateways – Custom data portals to support diverse projects

• Customized system support

Page 4: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Interactive Analysis of Petascale Ocean Data

4

• MITgcm runs, up to 70,000 cores, have produced > 5PB dataset

• Customized visualization framework and software to interactively analyze/explore the data

• Supports 2600 different layouts to address different questions

• Utilizes 2TB NVMESSDs (1 per hyperwall node)- Filesystem access at near-memory speeds- No latency penalties for seeks, unlike spinning disks- In-house developed software layer exposes NVMe devices

across hosts as unified block devices accessible over network

- Yields scalable aggregate performance allowing browsing through entire dataset cutting across the storage grain at interactive speeds.

GlobalView

DifferentDepths

AllScalars

Liked Scatterplot and map

Page 5: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

NASA Earth Exchange (NEX)A Collaborative environment to engage and enable Earth scientists

Collaboration Portal and Knowledgebase• Web server

• Database server

• 503 registered

members (both

NASA and non-

Nasa)

OpenNEX (Cloud Infrastructure)• 50+ TBs data

• Images

• Tutorials

• Workshops

Compute Resources

Sandbox: for

prototyping –

available to all

HPC: 720-core

dedicated queue +

access to rest of

HECC systems

Data RepositoryNFS storage - 2.3 PBs • Cached Datasets:

LANDSAT, Modis,

TRMM, …

• Available internally

and externally

Lustre storage – 3.2 PBs available internally

Page 6: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Heliophysics Portal Querying Integrated Database of Solar Flares

6

• Heliophysics Portal provides highlights of the latest solar events

• Multi-instrument database provides an integrated view of reported flares and ground-based observations

• Web interface to search for unique flare events based on their physical characteristics and other pre-defined criteria, in order to investigate their radiation properties, including extreme ultraviolet radiation and X-ray radiation.

• Data from three primary flare lists (NOAA, NASA, and Lockheed Martin) and event catalogs from spacecraft and ground-based observations are integrated into the database for data starting from 2002

• http://helioportal.nas.nasa.gov.

Page 7: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Commercial Cloud Trade Study

7

Goal: Evaluate the suitability of commercial clouds for HPC ApplicationsApproach• Workload: NPBs, six full-sized applications (ATHENA++, ECCO, ENZO, FVCore, WRF, OpenFOAM)• Systems: HECC systems Pleiades and Electra, Amazon Web Services (AWS), Penguin-on-Demand (POD) • Cost Basis: HECC – NAS full cost of running (HW/SW, power, maintenance, staff, and facility costs); AWS and POD – only

the compute costs from published rates and any publicly-known discounts (spot pricing, lease price, etc.)• Key Findings: Commercial clouds currently do not offer a viable, cost-effective approach for replacing in-house HPC

resources for NASA HPC applications. However, there may be use cases where a commercial cloud is a viable alternatives, e.g.,, specialized hardware

• Actions:- Continually evaluate the suitability of commercial clouds- Develop an environment to support bursting to commercial clouds (on a full-cost recovery basis) for S&E

projects – Phase 1 pilot project available September end.

Evaluating the Suitability of Commercial Clouds for NASA’s High Performance Computing Applications: A Trade Study: Chang et al, NAS Technical Report NAS-2018-01, May 2018https://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_Report_NAS-2018-01.pdf

Page 8: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Perf. & Cost Comparison: HECC - AWS

8

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

0

5,000

10,000

15,000

20,000

25,000

ATHENA (1024)

ATHENA (2048)

ECCO (120)

ECCO (240)

ENZO (196)

FVCore (1176)

nuWRF (1700)

OpenFOAM (4

8)

OpenFOAM (1

44)

OpenFOAM (2

88)

Perfo

rman

ce R

atio

Perfo

rman

ce (s

ec)

Application Performance Comparison

HECC time (sec) AWS time (sec) Perf. Ratio

Page 9: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Deep Dive: NAS HECC Project (1)

9

How does each division structure their archiving and what is its architecture?• Archive Systems sit on high performance networks (56 gigabit) as peers to the supercomputing

environment. • Archive Systems have some direct analysis capabilities (analyze in place)• System level automation moves data from archive disks to two tape copies in a reliable and

transparent ways streaming copies in 10’s of gigabytes/second

Do archives provide analysis tools? What are they?• Some amount of limited commercial licensed software provided by HECC: MATLAB, ID, …• Can install any software stack as needed by PI/user: Commercial, proprietary, open source, ..

How do user expectations drive any of their processes?• HEC C supports general purpose environments at no cost to users and special purpose (custom)

environments based on actual cost

Page 10: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Deep Dive: NAS HECC Project (2)

10

What requirements do they put on archiving that ensures its viability for how long?• HECC continually tracks different storage technologies and opportunistically upgrades: when significant

reductions in cost or improvements in performance become available• Largely driven by LTO tape costs• Current practice is to migrate existing data on tape from one generation to the next

How can computing centers help in data processing and data analytics?• Best-case scenario for NASA scientists is to site copies of datasets next to the large-scale compute so

as to increase productivity• Continually enhance online and archival storage at very low incremental costs

Page 11: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

Takeaway

11

NASA in-house HPC centers provide cost-effective environments for accessing, analyzing and archiving SMD observational and model datasets

§ Large-scale computational resources§ Enhanced I/O capabilities§ Large-scale online storage for quick access and archival systems for long-term

storage§ Support for visualization and data analytics§ (Will) work with cloud environments to provide hybrid resources§ Capability to provide specialized hardware/software systems for custom data

analysis requirements

Page 12: Theme 2: Deep Dive HECC Project @ NAS · concurrent during simulation to understand complex interactions of data. ... Amazon Web Services (AWS), Penguin-on-Demand (POD) ... • HEC

High End Computing Capability Portfolio 12

Comments/Questions?

[email protected]

https://www.nas.nasa.gov/hecc