HDF5: State of the Union

Quincey Koziol
koziol@hdfgroup.org

The HDF Group
www.hdfgroup.org

November 13, 2012

What is HDF5?

• A versatile data model that can represent very complex data objects and a wide variety of metadata.

• A completely portable file format with no limit on the number or size of data objects stored.

• An open source software library that runs on a wide range of computational platforms, from cell phones to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.

• A rich set of integrated performance features that allow for access time and storage space optimizations.

• Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.

HDF5 Technology Platform

• HDF5 Abstract Data Model
  • Defines the “building blocks” for data organization and specification
  • Files, Groups, Links, Datasets, Attributes, Datatypes, Dataspaces
• HDF5 Software
  • Tools
  • Language Interfaces
  • HDF5 Library
• HDF5 Binary File Format
  • Bit-level organization of HDF5 file
  • Defined by HDF5 File Format Specification

HDF5 Data Model

• Groups – provide structure among objects
• Datasets – where the primary data goes
  • Data arrays
  • Rich set of datatype options
  • Flexible, efficient storage and I/O
• Attributes, for metadata

Everything else is built essentially from these parts.
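The three building blocks listed on this slide can be pictured with a small in-memory sketch. It is a toy model in plain Python, not the HDF5 library or any of its language interfaces: the class names `Group` and `Dataset`, the `resolve` helper, and the `units` attribute value are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: an array of values plus attributes."""
    data: List[Any]
    dtype: str = "float64"
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: named links to groups or datasets."""
    links: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def resolve(self, path: str) -> Union["Group", Dataset]:
        """Follow a slash-separated path starting from this group."""
        node: Union[Group, Dataset] = self
        for name in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"{name!r}: parent is not a group")
            node = node.links[name]
        return node


# Build a tiny hierarchy: "/" (root) -> "TestData" -> "temp"
root = Group()
root.links["TestData"] = Group()
temp = Dataset(data=[3.1, 4.2, 3.6], attrs={"units": "degC"})  # units are illustrative
root.links["TestData"].links["temp"] = temp

found = root.resolve("/TestData/temp")
print(found.data, found.attrs["units"])
```

Groups only hold links and datasets only hold data, while both can carry attributes; that split is the point of the sketch.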

Structures to organize objects

[Figure: example file hierarchy. “Groups”: “/” (root) and “/TestData”. “Datasets”: two raster images, a palette, a 2-D array, a 3-D array, and a table.]

Table

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

Why use HDF5?

• Challenging data:
  • Application data that pushes the limits of what can be addressed by traditional database systems, XML documents, or in-house data formats.
• Software solutions:
  • For very large datasets, very fast access requirements, or very complex datasets.
  • To easily share data across a wide variety of computational platforms using applications written in different programming languages.
  • That take advantage of the many open-source and commercial tools that understand HDF5.
• Enabling long-term preservation of data.

Who uses HDF5?

• Examples of HDF5 user communities
  • Astrophysics
  • Astronomers
  • NASA Earth Science Enterprise
  • Dept. of Energy Labs
  • Supercomputing centers in US, Europe and Asia
  • Financial Institutions
  • NOAA
  • Manufacturing industries
  • Many others
• For a more detailed list, visit
  • http://www.hdfgroup.org/HDF5/users5.html

Topics

What's up with The HDF Group?

Library Update

Tools Update

HDF Java Products

Library development in the works

Other activities

Brief History of HDF

1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF.

Early 1990’s: NASA adopted HDF for the Earth Observing System project.

1996: DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF”. (The increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files.) “Big HDF” became HDF5.

1998: HDF5 was released with support from DOE Labs, NASA, and NCSA.

2006: The HDF Group spun off from the University of Illinois as a non-profit corporation.

The HDF Group

• Established in 1988
  • 18 years at University of Illinois’ National Center for Supercomputing Applications
  • 5 years as independent non-profit company, “The HDF Group”
• The HDF Group owns HDF4 and HDF5
  • HDF4 & HDF5 formats, libraries, and tools are open source and freely available with BSD-style license
• Currently employ 37 FTEs
  • Looking for more developers now!

The HDF Group Mission

To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.

Goals of The HDF Group

• Maintain and evolve HDF for sponsors and communities that depend on it

• Provide support to the HDF communities through consulting, training, tuning, development, research

• Sustain the company for the long term to assure data access over time

The HDF Group Services

• Helpdesk and Mailing Lists
  • Available to all users as a first level of support: [email protected]
• Priority Support
  • Rapid issue resolution and advice
• Consulting
  • Needs assessment, troubleshooting, design reviews, etc.
• Training
  • Tutorials and hands-on practical experience
• Enterprise Support
  • Coordinating HDF activities across departments
• Special Projects
  • Adapting customer applications to HDF
  • New features and tools
  • Research and Development

Members of the HDF support community

• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – Exascale FastForward w/Intel & EMC
• DOE – projects w/LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Sandia National Lab
• ITER – project with General Atomics
• A leading U.S. aerospace company
• University of Illinois/NCSA
• PSI/Dectris and DESY – European light sources
• Projects for petroleum industry, vehicle testing, weapons research, others
• “In kind” support

Income Profile – 2012

Total income: ~$3.7 million

• Commercial: 33%
• NASA & NOAA: 48%
• DOE: 17%
• Other Govt & Academic: 2%

New Directions We’re Taking

• High energy light source data storage
  • Projects with DESY and PSI/Dectris, to store data from European synchrotrons and particle accelerators
• Applications of HDF5 in the Bioinformatics field
  • Working with researchers at IRRI & Brown U.
• Synthesis of HDF5 and database storage w/Oracle
  • Exploring how to interact with HDF5 files through SQL interface

Cool recent application

Trillion Particle Simulation on NERSC’s hopper system

• VPIC with 100,000 nodes on hopper
• Achieved 27GB/s sustained rate to each 32TB HDF5 file (out of 35GB/s theoretical peak)
• http://1.usa.gov/Le0JF8
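The rates quoted for the VPIC run imply a fraction of peak and a rough write time per file. The sketch below only restates that arithmetic. It assumes decimal units (1 TB = 1000 GB) and that the sustained rate holds for the whole file; neither is stated on the slide.

```python
# Figures quoted on the slide
sustained_gb_per_s = 27.0   # sustained write rate
peak_gb_per_s = 35.0        # theoretical peak
file_size_tb = 32.0         # size of each HDF5 file

# Assumption (not from the slide): decimal units, 1 TB = 1000 GB
file_size_gb = file_size_tb * 1000.0

efficiency = sustained_gb_per_s / peak_gb_per_s
seconds_per_file = file_size_gb / sustained_gb_per_s

print(f"fraction of peak: {efficiency:.1%}")
print(f"time per file: {seconds_per_file / 60:.1f} minutes")
```

Under those assumptions the run reached roughly 77% of peak, and each file took about 20 minutes to write.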

Where We’ve Been

• Release 1.0
  • First “prototype” release in Oct, 1997
  • Incorporated core data model: datatypes, dataspaces & datasets and groups
  • Parallel support added in r1.0.1, in Jan, 1999
• Release 1.2.0 - Oct, 1999
  • Added support for bitfield, opaque, enumeration, variable-length and reference datatypes.
  • Added new ‘h5toh4’ tool
  • Lots of polishing
  • Performance optimizations

Where We’ve Been

• Release 1.4.0 - Feb, 2001
  • Added Virtual File Driver (VFD) API layer, with many drivers
  • Added ‘h4toh5’, h5cc tools, XML output to h5dump
  • Added array datatype
  • F90 & C++ API wrappers
  • Performance optimizations
• Release 1.6.0 - July, 2003
  • Generic Property API
  • Compact dataset storage
  • Added ‘h5diff’, ‘h5repack’, ‘h5jam’, ‘h5import’ tools
  • Performance optimizations

Where We’re At Now

• Release 1.8.0 - Feb, 2008
  • Features to support netCDF-4
    • Creation order indexing on links and attributes
    • Integer-to-floating point conversion support
    • NULL dataspace
  • More efficient group storage
  • External Links
  • New H5L (links) and H5O (objects) APIs
  • Shared Object Header Messages
  • Unicode (UTF-8) Support
  • Anonymous object creation
  • New tools: ‘h5mkgrp’, ‘h5stat’, ‘h5copy’
  • CMake build support
  • Performance optimizations

Repository Statistics

[Figure: HDF5 Library Source Code, Lines of Code by Date]

[Figure: HDF5 Library Source Code, Lines of Code by Release]

Software Engineering in HDF5

• We spend considerable effort in ensuring we produce very high quality code for HDF5.

• Current efforts:
  • Correctness regression testing
    • Nightly testing of >60 configurations on >20 machines
  • Performance regression testing
  • Applying static code analysis – Coverity, Klocwork
  • Memory leak detection – valgrind
  • Code coverage – coming soon

Performance Regression Test Suite

HDF5 1.8.9 minor release (May ‘12)

• Library:
  • New API routine to validate object paths: H5LTpath_valid()
  • New API routines to work with file images.
  • New feature to merge committed datatypes when copying them.
• Parallel I/O:
  • New API routine to set MPI atomicity: H5Fset_mpi_atomicity()
• Tools:
  • New features added to h5repack, h5stat & h5repack.
• Bugs fixed:
  • Many bugs fixed in library and tools.

HDF5 1.8.10 minor release (Nov ‘12)

• Library:
  • Reduced memory footprint for internal buffering.
  • Improved behavior of collective chunk I/O.
• Parallel I/O:
  • New API routine to query why collective I/O was broken: H5Pget_mpio_no_collective_cause()
• Tools:
  • New features added to h5import.
  • Retired some out of date performance tools.
• Bugs fixed:
  • Updated to latest autotools release.
  • Many fixes to tools, high-level library and FORTRAN.

Where We’ll Be Soon

• Release 1.10 - Overview
  • Beta release: Soon! (earlier target: November, 2011)
  • Stopped adding major features, fleshing out our current efforts now
  • Major Efforts:
    • Improved scalability of chunked dataset access
    • Single-Writer/Multiple-Reader (SWMR) Access
    • Improved fault tolerance
    • Initial support for asynchronous I/O

Where We’ll Be Soon

• Release 1.10 - Details
  • New chunked dataset indexing methods
  • Single-Writer/Multiple-Reader (SWMR) Access
  • Improved Fault Tolerance
    • Journaled Metadata Writing
    • Ordered Updates
  • Page-aligned and buffered metadata access
  • Persistent file free space tracking
  • Basic support for asynchronous I/O
    • Expanded Virtual File Driver (VFD) interface
    • Lazy metadata writes (in serial)
  • F2003 Support
  • Compressed group information
  • High-level “HPC” API
    • Collective I/O on multiple datasets
    • Metadata broadcast between processes
  • Performance optimizations
    • Avoid file truncation

Where We Might Get To

• Release 1.10 - Maybe?
  • Full C99 type support (long double, complex, boolean types, etc)
  • Support for runtime file format limits
  • Improved variable-length datatype storage
  • Virtual Object Layer
    • Abstraction layer to allow HDF5 objects to be stored in any container
    • Allows application to break “all collective” metadata modification limit in current HDF5

Where We’re Not Going

• We’re not changing multi-threaded concurrency support
  • Keep “global lock” on library
  • Will use asynchronous I/O heavily
  • Will be using threads internally though

Tool activities in the works

• New tool – h5watch
  • Display changes to a dataset, metadata and raw data
• New tool – h5compare
  • Rewritten and improved version of h5diff
• Improved code quality and testing
• Tools library: general purpose APIs for tools
  • Tools library currently only for our developers
  • Want to make it public so that people can use it in their products

HDF-Java 2.9 (Nov, ‘12)

• Maintenance release, no major new features
• Built with:
  • HDF4 2.8, HDF5 1.8.10, and Java 1.7
• Many bug fixes and extended regression tests
• New HDFView features:
  • Added feature to show groups/attributes in creation order
  • Exclude fill values in data calculation
  • Added 'reload' option to quickly close and reopen a file

HDF5 in the Future

“The best way to predict the future is to invent it.”

– Alan Kay

Plans, Guesses & Speculations: HPC

• Improve Parallel I/O Performance:
  • Continue to improve our use of MPI and parallel file system features
  • Reduce # of I/O accesses for metadata access
  • Integrate with in-situ/in-transit frameworks
  • Support asynchronous parallel I/O
  • Support Single-Writer/Multiple-Reader (SWMR) access in parallel

Plans, Guesses & Speculations: Safety

• Improve Journaled HDF5 File Access:
  • Journal raw data operations
  • Allow multi-operation journal transactions to be created by applications
  • Support fully asynchronous journal operations
  • Enable journaling for Parallel HDF5

Plans, Guesses & Speculations: Threadsafety

• Focus on asynchronous I/O, instead of improving multi-threaded concurrency of HDF5 library:
  • Library currently thread-safe, but not concurrent
  • Instead of improving concurrency, focus on heavily leveraging asynchronous I/O
  • Use internal multi-threading where possible

Plans, Guesses & Speculations: Grab-bag

• Improve data model:
  • Shared dataspaces
  • Attributes on dataspaces and datatypes
• Improve raw data chunk cache implementation
• More efficient storage and I/O of variable-length data, including compression

Exascale FastForward Research

• Whamcloud*, EMC & The HDF Group were recently awarded a contract for exascale storage research and prototyping:
  • http://www.hpcwire.com/hpcwire/2012-07-12/doe_primes_pump_for_exascale_supercomputers.html
• Using HDF5 data model and interface as top layer of next generation storage system for future exascale systems
• Laundry list of new features to prototype in HDF5
  • Pointer datatypes, asynchronous I/O, transactions, end-to-end consistency checking, query/index features, python wrappers, etc.

* Now part of Intel

Autotuning and Performance Tracing

• Why?
  • Because the dominant I/O support request at NERSC is poor I/O performance, many/most of which can be solved by enabling Lustre striping, or tuning another I/O parameter
  • Scientists shouldn’t have to figure this stuff out!
• Two Areas of Focus:
  • Evaluate techniques for autotuning HPC application I/O
    • File system, MPI, HDF5
  • Record and Replay HDF5 I/O operations

Autotuning HPC I/O

• Goal: Avoid tuning each application to each machine and file system
  • Create I/O autotuner library that can inject “optimal” parameters for I/O operations on a given system
• Apply to precompiled application binaries
  • Application can be dynamically linked with I/O autotuning library
  • No changes to application or HDF5 library
• Tested with several HPC applications already:
  • VPIC, GCRM, Vorpal
  • Up to 16x performance improvement, compared to system default settings
• See poster for more details!
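The search for “optimal” parameters can be pictured as trying candidate settings and keeping the fastest. The sketch below is a plain exhaustive search, not the autotuner described on the slide, whose search method is not given here. The parameter names `stripe_count` and `stripe_size_mb`, the candidate values, and the `fake_write_time` cost model are all assumptions made up for illustration; a real tuner would time actual I/O.

```python
import itertools
from typing import Callable, Dict, Iterable, Tuple

Params = Dict[str, int]


def autotune(space: Dict[str, Iterable[int]],
             measure: Callable[[Params], float]) -> Tuple[Params, float]:
    """Try every combination in `space`; return the fastest and its time."""
    names = sorted(space)
    best_params: Params = {}
    best_time = float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        elapsed = measure(params)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


def fake_write_time(params: Params) -> float:
    """Synthetic cost model (an assumption): fastest at 64 stripes of 8 MB."""
    return (abs(params["stripe_count"] - 64) / 64
            + abs(params["stripe_size_mb"] - 8) / 8
            + 1.0)


search_space = {
    "stripe_count": [1, 4, 16, 64, 128],
    "stripe_size_mb": [1, 4, 8, 32],
}
best, seconds = autotune(search_space, fake_write_time)
print(best, seconds)
```

Exhaustive search is only practical for small parameter spaces like this one; it is used here because it is the simplest correct baseline.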

Recording and Replaying HDF5

• Goal:
  • Extract an “I/O Kernel” from application, without examining application code
• Method:
  • Dynamically link library at run-time to record all HDF5 calls and parameters in “replay file”
  • Create parallel replay tool that uses recording to replay HDF5 operations

Recording and Replaying HDF5

• Benefits:
  • Easy to create I/O kernel from any application, even one with no source code
  • Can use replay file as accurate application I/O benchmark
  • Can move replay file to another system for comparison
  • Can autotune from replay, instead of application
• Challenges:
  • Serializing complex parameters to HDF5
  • Replay files can be very large
  • Accurate parallel replay is difficult
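The record-then-replay idea can be sketched in a few lines. The slides describe interposing on HDF5 calls by dynamically linking a library at run-time; this sketch swaps in a Python decorator instead, and is serial rather than parallel. The functions `create_dataset` and `write_element` are hypothetical stand-ins that act on a dictionary, not HDF5 routines. The “replay file” is a list of JSON lines held in memory, although the challenges listed above mention serializing parameters to HDF5.

```python
import json
from typing import Any, Callable, Dict, List

REGISTRY: Dict[str, Callable[..., Any]] = {}   # name -> unwrapped function
RECORDING: List[str] = []                      # one JSON line per call


def recorded(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap `func` so each call is logged to RECORDING before it runs."""
    REGISTRY[func.__name__] = func

    def wrapper(store: dict, *args: Any) -> Any:
        entry = {"call": func.__name__, "args": list(args)}
        RECORDING.append(json.dumps(entry))
        return func(store, *args)

    return wrapper


# Hypothetical stand-ins for I/O calls; they act on a dict, not a file.
@recorded
def create_dataset(store: dict, name: str, size: int) -> None:
    store[name] = [0] * size


@recorded
def write_element(store: dict, name: str, index: int, value: float) -> None:
    store[name][index] = value


def replay(lines: List[str], store: dict) -> None:
    """Re-issue every recorded call against a fresh store."""
    for line in lines:
        entry = json.loads(line)
        REGISTRY[entry["call"]](store, *entry["args"])


# "Application" run: calls are recorded as a side effect.
original: dict = {}
create_dataset(original, "temp", 3)
write_element(original, "temp", 1, 4.2)

# Replay needs only the recording, not the application code.
replayed: dict = {}
replay(RECORDING, replayed)
print(replayed == original)
```

Even in this toy, every argument must survive a round trip through the log format, which is a small-scale version of the serialization challenge listed above.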

Community Outreach

• Work with HPC community to serve their needs:
  • Focus on high-profile applications or “I/O kernels” and remove HDF5 bottlenecks discovered

You tell us!

Thank You!

Questions & Comments?
