
Big Applications: Simulations, Models, Visualization, …

Scientific data management for big computers and big data
http://hdf.ncsa.uiuc.edu/HDF5/

HDF5 (serial and/or parallel)

Parallel UDM

Software Stacks

Applications and readers, often customized for particular technical fields, enable users to create, manipulate, and view scientific and engineering data. With the support of intervening libraries, common interfaces, and HDF5, scientists and engineers in many fields are able to share data and software.

Specialized libraries and common interfaces use the HDF5 layer for data management and often provide specialized metadata, context, and tools for data transformation and exchange.

The HDF5 layer provides many data management functions, including machine-independent storage of all datatypes, metadata describing datatypes, user-defined attributes, and sophisticated subsetting and subsampling capabilities.
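As a rough illustration of the subsetting mentioned above, the sketch below models an HDF5-style hyperslab selection in plain Python. It borrows the four per-dimension parameters of the C function H5Sselect_hyperslab (start, stride, count, block) but does not call HDF5; the helper names dim_indices, hyperslab_indices, and read_subset_2d are hypothetical, and an ordinary nested list stands in for a dataset.

```python
from itertools import product


def dim_indices(start, stride, count, block):
    """Indices chosen in one dimension: `count` blocks of `block` elements,
    with block origins `stride` apart, beginning at `start`."""
    if count > 1 and block > stride:
        raise ValueError("blocks would overlap: block must not exceed stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab_indices(start, stride, count, block):
    """All index tuples in the selection (Cartesian product over dimensions)."""
    per_dim = [dim_indices(*p) for p in zip(start, stride, count, block)]
    return list(product(*per_dim))


def read_subset_2d(data, start, stride, count, block):
    """Gather the selected elements of a 2-D nested list as a smaller 2-D list."""
    rows = dim_indices(start[0], stride[0], count[0], block[0])
    cols = dim_indices(start[1], stride[1], count[1], block[1])
    return [[data[r][c] for c in cols] for r in rows]


# Hypothetical 6 x 8 "dataset" whose element at (r, c) is 10*r + c.
data = [[10 * r + c for c in range(8)] for r in range(6)]

# Every second row; two-column blocks whose origins are three columns apart.
subset = read_subset_2d(data, start=(0, 1), stride=(2, 3), count=(3, 2), block=(1, 2))
print(subset)  # [[1, 2, 4, 5], [21, 22, 24, 25], [41, 42, 44, 45]]
```

In real HDF5 code the same four arrays would be applied to a dataspace with H5Sselect_hyperslab before reading, so only the selected elements leave the file.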

Parallel HDF5 uses MPI-IO to provide parallel file system functionality and global file access.
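One common pattern with parallel HDF5 is to give each MPI process its own block of rows in a shared dataset. The sketch below computes such a row decomposition in plain Python; it makes no MPI or HDF5 calls, and row_block and check_tiling are hypothetical helpers. In an actual program each rank would, by assumption, pass its start and count to H5Sselect_hyperslab and write through a file opened with the MPI-IO driver (H5Pset_fapl_mpio); that wiring is not shown.

```python
def row_block(nrows, nprocs, rank):
    """Return (start, count) of the rows owned by `rank`.

    Rows are split as evenly as possible; the first `nrows % nprocs`
    ranks each get one extra row.
    """
    base, extra = divmod(nrows, nprocs)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def check_tiling(nrows, nprocs):
    """Verify the per-rank blocks cover every row exactly once, in order."""
    covered = []
    for rank in range(nprocs):
        start, count = row_block(nrows, nprocs, rank)
        covered.extend(range(start, start + count))
    return covered == list(range(nrows))


# 10 rows shared by 4 processes.
blocks = [row_block(10, 4, r) for r in range(4)]
print(blocks)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Because the blocks never overlap, every process can write its slab independently while the file still holds one global dataset.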

[Figure: HDF5 software stack. Applications (examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models) and readers (IDL); common interfaces and specialized libraries (SAF Lib, Sheaf, HDF-EOS); HDF5; the HDF5 virtual file layer (I/O drivers: MPI I/O, split files, stdio, custom); and storage (a file on a parallel file system, split metadata and raw data files, a single file, a user-defined device).]

Virtual File Layer

The HDF5 VFL, or virtual file layer, provides access to many different data input and output mechanisms. The standard (stdio), split, and MPI drivers read from and write to files on storage media; the stream driver reads and writes virtual files or streams of data.

The VFL also enables the creation of custom drivers, such as the stream driver, for specialized or user-defined situations.
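To make the driver idea concrete, here is a minimal Python sketch of a pluggable I/O layer. The class names FileDriver, MemoryDriver, and SplitDriver are hypothetical; HDF5's real VFL is a C table of callbacks (H5FD_class_t), and its actual split driver differs in detail. The sketch only shows that code above the layer talks to one interface while the driver decides where the bytes go, here keeping metadata and raw data in separate stores, as the split driver does with separate files.

```python
from abc import ABC, abstractmethod


class FileDriver(ABC):
    """Hypothetical minimal driver interface: byte-addressed read and write."""

    @abstractmethod
    def read(self, addr: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, addr: int, data: bytes) -> None: ...


class MemoryDriver(FileDriver):
    """Keeps the 'file' in a growable in-memory buffer (no storage media)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def read(self, addr: int, size: int) -> bytes:
        return bytes(self.buf[addr:addr + size])

    def write(self, addr: int, data: bytes) -> None:
        end = addr + len(data)
        if end > len(self.buf):  # grow with zero bytes up to `end`
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[addr:end] = data  # then overwrite the target range


class SplitDriver:
    """Sends metadata and raw data to two separate underlying drivers."""

    def __init__(self, meta: FileDriver, raw: FileDriver) -> None:
        self.parts = {"meta": meta, "raw": raw}

    def read(self, kind: str, addr: int, size: int) -> bytes:
        return self.parts[kind].read(addr, size)

    def write(self, kind: str, addr: int, data: bytes) -> None:
        self.parts[kind].write(addr, data)


split = SplitDriver(meta=MemoryDriver(), raw=MemoryDriver())
split.write("meta", 0, b"HDR")
split.write("raw", 4, b"\x01\x02")
print(split.read("meta", 0, 3))  # b'HDR'
print(split.read("raw", 0, 6))   # b'\x00\x00\x00\x00\x01\x02'
```

A file-backed or network-backed driver would implement the same two methods, so nothing above the layer would need to change.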

[Figure: Stream driver, across the network or to/from another application or library.]

Representative Technical Fields* in which HDF5 Is Used

* from selected HDF5 download registrations, 15 October 2001 through 22 February 2002

Tools

Various tools provide access to HDF5 files, including their data, metadata, and hierarchical structure, without requiring users to write new software.

HDFView, illustrated at the top of this image, displays the structure of a simple HDF5 file in one panel, raw data in another, and, if appropriate, an image or a portion of one in a third. The larger image is the full, independently generated gravity wave image.

HDF5 runs on almost all computers, including many parallel computers

Lawrence Livermore National Laboratory

National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Matter & the universe

Weather and climate

A 15-projector display wall (resolution 6400 x 3072) for viewing interactive applications and pre-computed animations at Lawrence Livermore National Laboratory.

[Figure: Total column ozone (Dobson units), August 24, 2001 and August 24, 2002; color scale from 60 to 610.]

Answering big questions … involves big data … on big computers.

The ASCI White system contains 8,192 interconnected processors. Its 6.2-terabyte (trillion-byte) memory is about 97,000 times that of a 64-MB PC. Its 7,000 disk drives, with 160 terabytes of storage space, have about 16,000 times the storage capacity of a desktop computer with a 10-GB hard disk.
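The ASCI White ratios can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, matching the "trillion byte" wording, 1 GB = 10^9 bytes, 1 MB = 10^6 bytes); with binary units the memory ratio would come out somewhat lower.

```python
TB = 10**12  # terabyte = trillion bytes, as stated above (decimal units assumed)
GB = 10**9
MB = 10**6

memory_ratio = 6.2 * TB / (64 * MB)    # ASCI White memory vs. a 64-MB PC
storage_ratio = 160 * TB / (10 * GB)   # ASCI White disk vs. a 10-GB hard disk

print(f"memory:  {memory_ratio:,.0f}x")   # memory:  96,875x
print(f"storage: {storage_ratio:,.0f}x")  # storage: 16,000x
```

The memory ratio of 96,875 rounds to the "about 97,000" quoted; the storage ratio is exactly 16,000.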

Life and nature

How do we describe big data? Store it? Find it? Share it? Mine it? Move it into, out of, and between computers?

A file format and software to describe, organize, store, share, and access big data:
• Store large, complex scientific and engineering data sets
• Retrieve complete or partial data, easily and quickly
• Enable parallel I/O, remote access, and specialized access
• A free, open standard developed by NCSA and the Lawrence Livermore, Sandia, and Los Alamos National Laboratories, with additional support from NASA

The name HDF5 derives from the term hierarchical data format. An HDF5 file is a hierarchically structured set of groups, datasets, and metadata.
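A small in-memory model can show what "a hierarchically structured set of groups, datasets, and metadata" means. In the sketch below, groups hold named members, datasets record only a shape and a type, and both carry attribute dictionaries; members are addressed by slash-separated paths from the root group, as in HDF5. The Group and Dataset classes and the example path and attribute are hypothetical stand-ins, not the HDF5 or h5py API, and no array data is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple, Union


@dataclass
class Dataset:
    """Hypothetical stand-in: records only shape, type, and attributes."""
    shape: Tuple[int, ...]
    dtype: str
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """Hypothetical group: named members plus its own attributes."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, object] = field(default_factory=dict)

    def create_group(self, path: str) -> "Group":
        """Create (or reuse) each group along a slash-separated path."""
        node = self
        for name in path.strip("/").split("/"):
            child = node.members.setdefault(name, Group())
            if not isinstance(child, Group):
                raise ValueError(f"{name!r} is a dataset, not a group")
            node = child
        return node

    def create_dataset(self, path: str, shape: Tuple[int, ...], dtype: str) -> Dataset:
        """Create a dataset, creating any missing parent groups first."""
        parent_path, _, name = path.strip("/").rpartition("/")
        parent = self.create_group(parent_path) if parent_path else self
        ds = Dataset(shape, dtype)
        parent.members[name] = ds
        return ds

    def get(self, path: str) -> Union["Group", Dataset]:
        """Look up a member by its slash-separated path."""
        node: Union[Group, Dataset] = self
        for name in path.strip("/").split("/"):
            if not isinstance(node, Group):
                raise KeyError(path)
            node = node.members[name]
        return node

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, Union["Group", Dataset]]]:
        """Yield (path, object) for every member, depth first, names sorted."""
        for name, obj in sorted(self.members.items()):
            path = f"{prefix}/{name}"
            yield path, obj
            if isinstance(obj, Group):
                yield from obj.walk(path)


root = Group()
ds = root.create_dataset("/run1/fields/density", shape=(64, 64, 64), dtype="float64")
ds.attrs["units"] = "g/cm^3"  # illustrative user-defined attribute
for path, obj in root.walk():
    print(path, type(obj).__name__)
# /run1 Group
# /run1/fields Group
# /run1/fields/density Dataset
```

Walking the tree this way is roughly what a browsing tool does when it shows a file's structure in one panel.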

Density gradient in the plasma causes the laser beam to self-focus and then split up into several "filaments".

Simulation of a NIF laser beam passing through a plasma.

Simulation by Bert Still, Visualization by Steve Langer, LLNL

HDF5 File Structure

Copyright 2002 by the Board of Trustees of the University of Illinois

HDF5

Courtesy of Arthur Mirin, LLNL

University of Illinois

NASA

National Science Foundation

DOE SciDAC

TriLab (LANL, LLNL, SNL), NASA

Visualization courtesy of John Shalf, NERSC/Lawrence Berkeley Laboratory, using data computed on the NERSC SP2 by Dennis Pollney and the Cactus Team, Albert Einstein Institute

Aerospace

Agricultural research

Air traffic control

Aircraft emissions database

Applied mathematics

Astrophysics

Astrophysics / supernovae

Atmospheric chemistry

Atmospheric physics

Bioengineering

CEM Simulation

Climatology / hydrology

Computational fluid dynamics

Computational physics

Computational physics / education

Computational physics and computational astrophysics

Computer modeling

Computer science

Data processing

Earth observation / atmospheric science

Earth science

Environment

Fast searching, sorting and retrieval

Film making special effects

Fluid mechanics

GIS

Geodetic Science

Geology

Gravitational physics

Hydrology

Information technology

Magnetic mass spectrometer development

Marine biology / ecology

Materials science

Meteorological data products

Meteorology

Microscopy

Molecular biology

Nano device simulation

Neutron scattering

Ocean color

Ocean remote sensing

Optics / optoelectronics

Petroleum engineering

Photonic band gap studies

Photonic crystals

Photonics

Post-fire erosion analysis

Protein crystallography, molecular modeling

Protostellar accretion discs

Remote sensing

SAR processing

Satellite / weather radar remote sensing

Satellite oceanography

Semiconductor process simulation

Software engineering, distributed systems

Space geodesy

Space physics

Surface water flow and sediment transport

Theoretical chemistry

Visualization

Volcanology

Water resources management

X-ray physics

Computers and operating systems include:

Mac OS X

MS Windows

UNIX

Linux

FreeBSD

OSF1

HP-UX

IBM SP

SGI IRIX64

Cray T3E

Cray SV1

Sun Solaris

IA-32 and IA-64

Clusters and high-performance computers include:

ASCI Red

ASCI Blue Mountain

ASCI Blue Pacific

ASCI White

Various experimental clusters

Other HDF5 sponsors include