Parallel IO in the Community Earth System Model

Transcript of Parallel IO in the Community Earth System Model

Page 1: Parallel IO in the Community Earth System Model

Parallel IO in CESM
Jim Edwards ([email protected])
NCAR, P.O. Box 3000, Boulder CO 80307-3000, USA
Workshop on Scalable IO in Climate Models, 27/02/2012

Parallel IO in the Community Earth System Model

Jim Edwards (NCAR), John Dennis (NCAR), Ray Loy (ANL), Pat Worley (ORNL)

Page 3: Parallel IO in the Community Earth System Model

• Some CESM 1.1 capabilities:
  – Ensemble configurations with multiple instances of each component
  – Highly scalable capability, proven to 100K+ tasks
  – Regionally refined grids
  – Data assimilation with DART

Page 4: Parallel IO in the Community Earth System Model

Prior to PIO

• Each model component was independent, with its own IO interface
• Mix of file formats:
  – NetCDF
  – Binary (POSIX)
  – Binary (Fortran)
• Gather-scatter method to interface with serial IO

Page 5: Parallel IO in the Community Earth System Model

Steps toward PIO

• Converge on a single file format: NetCDF selected
  – Self-describing
  – Lossless, with a lossy capability (netCDF-4 only)
  – Works with the current postprocessing tool chain

Page 6: Parallel IO in the Community Earth System Model

• Extension to parallel IO
• Reduce the single-task memory profile
• Maintain a single-file format that is independent of the decomposition
• Performance (a secondary issue)

Page 7: Parallel IO in the Community Earth System Model

• Parallel IO from all compute tasks is not the best strategy:
  – Data rearrangement is complicated, leading to numerous small and inefficient IO operations
  – MPI-IO aggregation alone cannot overcome this problem

Page 8: Parallel IO in the Community Earth System Model

Parallel I/O library (PIO)

• Goals:
  – Reduce per-MPI-task memory usage
  – Easy to use
  – Improve performance
• Write/read a single file from a parallel application (a usage sketch follows below)
• Multiple backend libraries: MPI-IO, NetCDF3, NetCDF4, pNetCDF, NetCDF+VDC
• Meta-IO library: potential interface to other general libraries
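
To make the write path concrete, here is a minimal Fortran sketch of typical PIO usage: initialize an I/O subsystem on a subset of tasks, describe how the distributed data maps onto the global array, create a netCDF file through PIO, and write the distributed field with PIO_write_darray. The routine and type names come from the PIO 1.x Fortran interface, but the exact argument lists vary between PIO versions, so treat the signatures as approximate.

    subroutine example_pio_write(comm, my_rank, ntasks, local_field, compdof, gdims)
      use pio                                     ! PIO Fortran module
      implicit none
      integer, intent(in) :: comm, my_rank, ntasks
      real(kind=8), intent(in) :: local_field(:)  ! this task's share of the global field
      integer, intent(in) :: compdof(:)           ! global offset of each local element
      integer, intent(in) :: gdims(3)             ! global dimensions, e.g. (/3600,2400,40/)

      type(iosystem_desc_t) :: ios
      type(file_desc_t)     :: file
      type(io_desc_t)       :: iodesc
      type(var_desc_t)      :: vard
      integer :: ierr, dimids(3), i
      character(len=1), parameter :: dimname(3) = (/'x','y','z'/)

      ! Use every 4th MPI task as an I/O task (an illustrative choice)
      call PIO_init(my_rank, comm, ntasks/4, 0, 4, PIO_rearr_box, ios)

      ! Describe how the computational decomposition maps onto the global array
      call PIO_initdecomp(ios, PIO_double, gdims, compdof, iodesc)

      ierr = PIO_createfile(ios, file, PIO_iotype_pnetcdf, 'example.nc', PIO_clobber)
      do i = 1, 3
         ierr = PIO_def_dim(file, dimname(i), gdims(i), dimids(i))
      end do
      ierr = PIO_def_var(file, 'T', PIO_double, dimids, vard)
      ierr = PIO_enddef(file)

      ! The caller states what to write; PIO handles rearrangement and disk access
      call PIO_write_darray(file, vard, iodesc, local_field, ierr)

      call PIO_closefile(file)
      call PIO_freedecomp(ios, iodesc)
      call PIO_finalize(ios, ierr)
    end subroutine example_pio_write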

Page 9: Parallel IO in the Community Earth System Model

[Architecture diagram: the CESM components (CAM atmospheric model, CLM land model, POP2 ocean model, CICE sea ice model, CISM land ice model), coupled through the CPL7 coupler, all perform their I/O through PIO, which in turn is layered on netcdf3, pnetcdf, netcdf4 (HDF5), VDC, and MPI-IO.]

Page 10: Parallel IO in the Community Earth System Model

PIO design principles

• Separation of concerns
• Separate computational and I/O decompositions
• Flexible user-level rearrangement
• Encapsulate expert knowledge

Page 11: Parallel IO in the Community Earth System Model

Separation of concerns

• What versus how
  – Concern of the user:
    • What to write/read to/from disk?
    • e.g.: "I want to write T, V, PS." (see the fragment below)
  – Concern of the library developer:
    • How to efficiently access the disk?
    • e.g.: "How do I construct I/O operations so that write bandwidth is maximized?"
• Improves ease of use
• Improves robustness
• Enables better reuse
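
In code, the user's side of this split is nothing more than naming the fields to be written. The fragment below assumes a file, variable descriptors, and decompositions already set up as in the earlier sketch; vard_T, vard_V, vard_PS and the iodesc names are illustrative, not part of CESM.

    ! The user states only *what* to write; PIO decides *how*.
    call PIO_write_darray(file, vard_T,  iodesc_3d, T_local,  ierr)
    call PIO_write_darray(file, vard_V,  iodesc_3d, V_local,  ierr)
    call PIO_write_darray(file, vard_PS, iodesc_2d, PS_local, ierr)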

Page 12: Parallel IO in the Community Earth System Model

Separate computational and I/O decompositions

[Diagram: rearrangement between the computational decomposition and the I/O decomposition; an illustrative decomposition sketch follows below.]
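
To illustrate how the computational decomposition is described to PIO, the sketch below builds the compdof mapping (the 1-based global offset of every locally owned element) for a simple latitude-band decomposition of an nx by ny field. The decomposition and names are invented for the example; very large grids would need a wider integer kind for the offsets.

    ! Illustrative only: each task owns a contiguous band of rows of an
    ! nx x ny global field; compdof(k) is the global offset of local element k.
    subroutine build_banded_compdof(my_rank, ntasks, nx, ny, compdof, nlocal)
      implicit none
      integer, intent(in)               :: my_rank, ntasks, nx, ny
      integer, allocatable, intent(out) :: compdof(:)
      integer, intent(out)              :: nlocal
      integer :: j0, j1, i, j, k

      ! Rows owned by this task (simple block distribution of the ny rows)
      j0 = my_rank * ny / ntasks + 1
      j1 = (my_rank + 1) * ny / ntasks
      nlocal = nx * (j1 - j0 + 1)
      allocate(compdof(nlocal))

      k = 0
      do j = j0, j1
         do i = 1, nx
            k = k + 1
            compdof(k) = (j - 1) * nx + i   ! 1-based global offset
         end do
      end do
    end subroutine build_banded_compdof

This compdof is what PIO_initdecomp consumes; the I/O decomposition (which tasks touch the file system, and how data is rearranged onto them) is chosen separately inside PIO.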

Page 13: Parallel IO in the Community Earth System Model

Flexible user-level rearrangement

• A single technical solution is not suitable for the entire user community:
  – User A: Linux cluster, 32-core job, 200 MB files, NFS file system
  – User B: Cray XE6, 115,000-core job, 100 GB files, Lustre file system
• Different compute environments require different technical solutions (see the sketch below).
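
One concrete way this flexibility shows up is in how many I/O tasks a run requests and how they are strided across the job. The fragment below is purely illustrative: the task counts, strides, and the PIO_init argument order follow the spirit of the PIO 1.x interface used in the earlier sketch and are assumptions, not recommended settings.

    ! Illustrative only: different environments want different I/O-task layouts.
    if (environment == 'user_A') then
       ! User A: 32-task cluster job writing ~200 MB files to NFS;
       ! a single I/O task is usually enough.
       call PIO_init(my_rank, comm, 1, 0, 1, PIO_rearr_box, ios)
    else
       ! User B: ~115,000-task Cray XE6 job writing ~100 GB files to Lustre;
       ! many I/O tasks spread across the machine (here one per 128 compute tasks).
       call PIO_init(my_rank, comm, max(ntasks/128, 1), 0, 128, PIO_rearr_box, ios)
    end if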

Page 14: Parallel IO in the Community Earth System Model

Writing distributed data (I)

[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]

+ Maximizes the size of individual I/O operations to disk
- Non-scalable user-space buffering
- Very large fan-in, requiring large MPI buffer allocations

Correct solution for User A

Page 15: Parallel IO in the Community Earth System Model

Writing distributed data (II)

[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]

+ Scalable user-space memory
+ Relatively large individual I/O operations to disk
- Very large fan-in, requiring large MPI buffer allocations

Page 16: Parallel IO in the Community Earth System Model

Writing distributed data (III)

[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]

+ Scalable user-space memory
+ Smaller fan-in -> modest MPI buffer allocations
- Smaller individual I/O operations to disk

Correct solution for User B

Page 17: Parallel IO in the Community Earth System Model

Encapsulate expert knowledge

• Flow-control algorithm
• Match the size of I/O operations to the file-system stripe size (a back-of-the-envelope sketch follows below)
  – Cray XT5/XE6 + Lustre file system
  – Minimize message-passing traffic at the MPI-IO layer
• Load balance disk traffic over all I/O nodes
  – IBM Blue Gene/{L,P} + GPFS file system
  – Utilizes Blue Gene specific topology information
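
As a rough illustration of the kind of knowledge being encapsulated (this is not PIO's actual algorithm, just a back-of-the-envelope sketch that assumes a 4 MB stripe size): if each I/O task writes one contiguous block of a field, matching that block to the stripe size suggests an I/O-task count along these lines.

    program stripe_sizing_sketch
      use iso_fortran_env, only: int64
      implicit none
      integer(int64) :: field_bytes, stripe_bytes
      integer :: num_iotasks

      field_bytes  = 3600_int64 * 2400_int64 * 40_int64 * 8_int64  ! ~2.76 GB per 3D POP field
      stripe_bytes = 4_int64 * 1024_int64 * 1024_int64             ! assumed 4 MB Lustre stripe
      num_iotasks  = int(field_bytes / stripe_bytes)               ! ~659 writers of one stripe each
      print *, 'suggested number of I/O tasks:', num_iotasks
    end program stripe_sizing_sketch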

Page 18: Parallel IO in the Community Earth System Model

Experimental setup

• Did we achieve our design goals?
• Impact of PIO features:
  – Flow control
  – Varying the number of IO tasks
  – Different general I/O backends
• Read/write a 3D POP-sized variable [3600x2400x40] (see the size estimate below)
• 10 files, 10 variables per file [max bandwidth reported]
• Using Kraken (Cray XT5) + Lustre filesystem
  – Used 16 of 336 OSTs
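
For scale (assuming 8-byte values, which the slide does not state): a single 3600 x 2400 x 40 field is 3600 * 2400 * 40 * 8 bytes, roughly 2.8 GB, so each 10-variable file is on the order of 28 GB and the 10-file test moves roughly 280 GB in total.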

Pages 19-23: 3D POP arrays [3600x2400x40]

[Five slides of read/write bandwidth results for the 3D POP array benchmark; the plots are not reproduced in this transcript.]

Page 24: Parallel IO in the Community Earth System Model

PIOVDC: parallel output to a VAPOR Data Collection (VDC)

• VDC: a wavelet-based, gridded data format supporting both progressive access and efficient data subsetting
• Progressive access: data may be read back at different levels of detail, permitting the application to trade off speed and accuracy
  – Think Google Earth: less detail when the viewer is far away, progressively more detail as the viewer zooms in
  – Enables rapid (interactive) exploration and hypothesis testing that can subsequently be validated with the full-fidelity data as needed
• Subsetting: arrays are decomposed into smaller blocks that significantly improve extraction of arbitrarily oriented subarrays
• Wavelet transform (see the toy example below)
  – Similar to Fourier transforms
  – Computationally efficient: O(n)
  – Basis for many multimedia compression technologies (e.g. MPEG-4, JPEG 2000)
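
To make the wavelet idea concrete, here is a toy single-level Haar transform in Fortran. It is not the wavelet family or blocking scheme used by VDC; it only shows the O(n) cost and the split into coarse averages plus detail coefficients that makes progressive, lossy reconstruction possible.

    ! Toy single-level Haar step: x(1:n) -> coarse(1:n/2) + detail(1:n/2).
    ! Dropping or coarsely quantizing the detail coefficients gives a lossy,
    ! lower-resolution approximation; keeping all of them is lossless.
    subroutine haar_step(n, x, coarse, detail)
      implicit none
      integer, intent(in)       :: n              ! assumed even
      real(kind=8), intent(in)  :: x(n)
      real(kind=8), intent(out) :: coarse(n/2), detail(n/2)
      integer :: i
      real(kind=8), parameter :: s = 1.0d0 / sqrt(2.0d0)

      do i = 1, n/2
         coarse(i) = (x(2*i-1) + x(2*i)) * s      ! local average (low-pass)
         detail(i) = (x(2*i-1) - x(2*i)) * s      ! local difference (high-pass)
      end do
    end subroutine haar_step

Applying the same step recursively to the coarse array costs n/2 + n/4 + ... operations, which is why the full multi-level transform remains O(n).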

Page 25: Parallel IO in the Community Earth System Model

Other PIO users

• Earth System Modeling Framework (ESMF)
• Model for Prediction Across Scales (MPAS)
• Geophysical High Order Suite for Turbulence (GHOST)
• Data Assimilation Research Testbed (DART)

Page 26: Parallel IO in the Community Earth System Model

Write performance on BG/L

[Bandwidth plot not reproduced; slide credited to Penn State University, April 26, 2010.]

Page 27: Parallel IO in the Community Earth System Model

Read performance on BG/L

[Bandwidth plot not reproduced; slide credited to Penn State University, April 26, 2010.]

Page 28: Parallel IO in the Community Earth System Model

100:1 compression with coefficient prioritization
1024^3 Taylor-Green turbulence, enstrophy field [P. Mininni, 2006]

[Side-by-side images: no compression vs. coefficient prioritization (VDC2); images not reproduced.]

Page 29: Parallel IO in the Community Earth System Model

4096^3 homogeneous turbulence simulation: volume rendering of the original enstrophy field and the 800:1 compressed field

Data provided by P.K. Yeung (Georgia Tech) and Diego Donzis (Texas A&M)

Original: 275 GB/field; 800:1 compressed: 0.34 GB/field

Page 30: Parallel IO in the Community Earth System Model

F90 code generation

A genf90.pl template:

    interface PIO_write_darray
    ! TYPE real,int
    ! DIMS 1,2,3
       module procedure write_darray_{DIMS}d_{TYPE}
    end interface

genf90.pl expands the TYPE and DIMS directives into an explicit set of module procedures, as in the generated example on the next page.

Page 31: Parallel IO in the Community Earth System Model

# 1 "tmp.F90.in"interface PIO_write_darray module procedure dosomething_1d_real module procedure dosomething_2d_real module procedure dosomething_3d_real module procedure dosomething_1d_int module procedure dosomething_2d_int module procedure dosomething_3d_intend interface

Page 32: Parallel IO in the Community Earth System Model

• PIO is open source
  – http://code.google.com/p/parallelio/
• Documentation using doxygen
  – http://web.ncar.teragrid.org/~dennis/pio_doc/html/

Page 33: Parallel IO in the Community Earth System Model

• Thank you

Page 34: Parallel IO in the Community Earth System Model

Existing I/O libraries

• netCDF3
  – Serial
  – Easy to implement
  – Limited flexibility
• HDF5
  – Serial and parallel
  – Very flexible
  – Difficult to implement
  – Difficult to achieve good performance
• netCDF4
  – Serial and parallel
  – Based on HDF5
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance

Page 35: Parallel IO in the Community Earth System Model

Existing I/O libraries (cont'd)

• Parallel-netCDF
  – Parallel
  – Easy to implement
  – Limited flexibility
  – Difficult to achieve good performance
• MPI-IO
  – Parallel
  – Very difficult to implement
  – Very flexible
  – Difficult to achieve good performance
• ADIOS
  – Serial and parallel
  – Easy to implement
  – BP file format: easy to achieve good performance
  – All other file formats: difficult to achieve good performance