Key Components of a Successful Earth Science Subsetter Architecture



ASDC Introduction

The Atmospheric Science Data Center (ASDC) at NASA Langley Research Center is responsible for the ingest, archive, and distribution of NASA Earth Science data in the areas of radiation budget, clouds, aerosols, and tropospheric chemistry. The ASDC specializes in atmospheric data that are important to understanding the causes and processes of global climate change and the consequences of human activities on the climate. The ASDC currently supports more than 44 projects and holds over 1,700 archived data sets, a number that grows daily. ASDC customers include scientists, researchers, federal, state, and local governments, academia, industry, application users, the remote sensing community, and the general public.

JAVA HDF Subsetter

Strategy & Innovation

Search and Subsetter Application Interface & Framework

Jennifer Perez, Walter Baskin, & Peter Piatko
NASA Langley Research Center, Hampton, VA

Goal #4

The ASDC will continue to foster innovation by actively assessing emerging technologies and their applicability to existing and projected customer needs and requirements in order to mitigate gaps in capability.

Conclusion

Key Components

Since the Atmospheric Science Data Center (ASDC) and the CALIPSO science team unveiled a new CALIPSO Search and Subset Application at the 2010 A-Train Symposium, atmospheric scientists have responded enthusiastically. Consistent with this goal, the template of this subsetter application architecture has since been applied to the distribution of Level 2 satellite data granules from Clouds and the Earth's Radiant Energy System (CERES) SSF swath datasets and Tropospheric Emission Spectrometer (TES) datasets. This lets science data users employ new tools to rapidly locate, subset, and order specific dataset parameters tailored to their requirements.

The 2013 ASDC Strategic Plan serves as a mission-focused plan with six defined goals, each with supporting objectives and tasks for implementation that emphasize the vision and support the mission and values of the ASDC.

There are four key components of a successful Earth science subsetter architecture. These are:

1. Interactive user interface that is tightly integrated with a PostgreSQL-PostGIS metadata database specifically tailored for the Science Product data granules to be subsetted.

2. Scalable workflow framework for scheduling potentially thousands of subset processes across a configurable number of cluster processing nodes.

3. Efficient subset application with high-speed access to archived data granules.

4. Robust metadata mining capability focused on obtaining high-resolution spatial and temporal metadata.

High Resolution Spatial Metadata Mined Directly From Archived HDF Data Granules

The original CALIPSO Level 1 LIDAR spatial metadata is defined by a LineString consisting of ten points. The Search and Subset Application uses LineString metadata constructed from approximately 50 points, greatly increasing the accuracy of two-dimensional bounding box queries near the poles.
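The effect of point density on a bounding box query can be illustrated with a small sketch. Everything here is an assumption made for illustration: the track is a synthetic sine curve rather than a real CALIPSO orbit, the geometry is treated as planar longitude/latitude, and the function names are hypothetical. In the application itself this kind of test would presumably be delegated to PostGIS.

```python
import math

def segment_hits_box(p, q, box):
    """Liang-Barsky clip test: does segment p->q touch box (west, south, east, north)?"""
    (x0, y0), (x1, y1) = p, q
    west, south, east, north = box
    t0, t1 = 0.0, 1.0
    for delta, lo, hi in ((x1 - x0, west - x0, east - x0),
                          (y1 - y0, south - y0, north - y0)):
        if delta == 0:
            # Segment is parallel to this axis: it must already lie inside the slab.
            if lo > 0 or hi < 0:
                return False
        else:
            ta, tb = lo / delta, hi / delta
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

def linestring_hits_box(points, box):
    """True if any segment of the LineString touches the box."""
    return any(segment_hits_box(a, b, box) for a, b in zip(points, points[1:]))

def sample_track(n):
    """Synthetic (not real) track that peaks at 82 degrees latitude, sampled at n points."""
    return [(-180.0 + 360.0 * i / (n - 1), 82.0 * math.sin(math.pi * i / (n - 1)))
            for i in range(n)]

# A small query box just under the highest latitude reached by the track.
polar_box = (-5.0, 81.5, 5.0, 83.0)
coarse_hit = linestring_hits_box(sample_track(10), polar_box)  # False: the chord cuts below the box
dense_hit = linestring_hits_box(sample_track(50), polar_box)   # True: the dense line follows the curve
```

With ten points, the straight chord across the top of the track passes below the box, so the granule is missed; with 50 points the LineString follows the curve closely enough to be found.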

(Green = metadata used by new Search and Subset Applications | Red = original metadata used in legacy data access applications)

The metadata currently provided for one-hour CERES Level 2 SSF granules assumes full coverage of the Earth within 20 degrees of the poles and steps along the granule footprint boundaries at ten-degree longitude intervals. A newer metadata mining technique directly detects the field-of-view positions of the observations along the edges of the granule footprint and applies Douglas-Peucker simplification to the resulting polygon. The updated hourly footprint polygon contains the same number of points as the original metadata polygon and is more accurate.
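For readers unfamiliar with Douglas-Peucker simplification, a minimal sketch follows. It is a textbook version operating on an open polyline in planar coordinates, not the ASDC implementation; the tolerance values and sample points are arbitrary, and a closed footprint ring would need to be split into open pieces before being passed in.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

edge = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0.02), (5, 0)]
simplified = douglas_peucker(edge, 0.1)
```

A larger tolerance removes more points; choosing it is a trade-off between polygon size and how closely the footprint is followed.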

In the ECS archive system, Level 2 Tropospheric Emission Spectrometer (TES) Ozone metadata assumes global coverage for each daily granule. The ASDC is currently working with the TES Science Team on a prototype search and subset application. The metadata database used in this prototype stores the observation location for every data entry in the granule as an array of points. Bounding box queries for observations over the entire mission consistently return results in less than five seconds. This ability to obtain any observation over the life of the mission within a few seconds is unprecedented.
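The per-observation query described above amounts to a point-in-box filter. The sketch below shows the logic only, assuming observations are held as hypothetical `(id, lon, lat)` tuples; it also handles a box that crosses the antimeridian. The speed reported for the prototype would come from the database and its spatial index, which this in-memory example does not attempt to reproduce.

```python
def in_box(lon, lat, west, south, east, north):
    """True if (lon, lat) lies in the box; west > east means the box crosses 180 degrees."""
    if not (south <= lat <= north):
        return False
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east

def query_observations(observations, west, south, east, north):
    """Return the (id, lon, lat) tuples whose location falls inside the box."""
    return [obs for obs in observations
            if in_box(obs[1], obs[2], west, south, east, north)]

# Hypothetical observation records: (id, lon, lat).
observations = [("a", 10.0, 5.0), ("b", 179.0, 0.0), ("c", -179.0, 1.0), ("d", 0.0, 60.0)]
pacific = query_observations(observations, 170.0, -10.0, -170.0, 10.0)
```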

The CALIPSO Search Subsetter User Interface automatically updates and displays the number of granules meeting the spatial and temporal constraints as the user changes them. This dynamic feedback provides a very positive user experience. New subset interfaces under development for CERES and TES datasets leverage this functionality.
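The count shown to the user can be thought of as a combined time-overlap and box-overlap test over granule metadata. The following is a rough sketch under assumed names: the `start`, `end`, and `bbox` fields are hypothetical, and granule extents are reduced to simple boxes rather than the LineString or polygon metadata the application actually stores.

```python
from datetime import datetime

def count_matching(granules, start, end, box):
    """Count granules whose time range and bounding box both overlap the request."""
    west, south, east, north = box
    count = 0
    for g in granules:
        overlaps_time = g["start"] <= end and g["end"] >= start
        g_west, g_south, g_east, g_north = g["bbox"]
        overlaps_space = (g_west <= east and g_east >= west and
                          g_south <= north and g_north >= south)
        if overlaps_time and overlaps_space:
            count += 1
    return count

# Hypothetical granule metadata records.
granules = [
    {"start": datetime(2010, 1, 1, 0), "end": datetime(2010, 1, 1, 1), "bbox": (-20, 0, 20, 40)},
    {"start": datetime(2010, 1, 1, 1), "end": datetime(2010, 1, 1, 2), "bbox": (100, -40, 140, 0)},
    {"start": datetime(2010, 1, 2, 0), "end": datetime(2010, 1, 2, 1), "bbox": (-20, 0, 20, 40)},
]
```

Re-running such a count each time a constraint changes is what gives the user immediate feedback on how large an order will be.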

Details of the resulting data granules are displayed on the ‘Confirm Request’ page. Users are able to download a list of granules that meet their search criteria, browse profile plots for each resulting granule, or submit an order to subset the granules based on their spatial-temporal inputs.

The CALIPSO Science Team provides browse images for their LIDAR data products. These profiles are easily accessed through links under each granule result on the ‘Confirm Request’ page.

Search and Subset Application Interface

Subsetter Workflow Framework

[Diagram: Web User Interface and Metadata Database; FTP Site / Web Server; SciFlo - Univa Grid Engine; processing nodes (Node1, Node2, Node …), each running the JAVA HDF Subsetter]

The Subsetter Framework is a generic framework for subset processing. It uses SciFlo as its workflow engine to drive the processing, and Univa Grid Engine as its resource scheduler, so that the subsetting can be scaled across a set of computational nodes.
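The idea of spreading one order's subset jobs over a configurable number of nodes can be sketched with a local worker pool. This is only a stand-in: it uses Python's `concurrent.futures` rather than SciFlo or Univa Grid Engine, whose interfaces are not shown in this document, and `subset_granule` is a hypothetical placeholder for a real subset job.

```python
from concurrent.futures import ThreadPoolExecutor

def subset_granule(granule_name, box):
    """Hypothetical placeholder for one subset job; returns the output file name."""
    return f"{granule_name}.subset"

def run_subset_order(granule_names, box, node_count):
    """Run one subset job per granule on a pool of `node_count` workers."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # map() returns results in the same order as the input granules.
        return list(pool.map(lambda name: subset_granule(name, box), granule_names))

outputs = run_subset_order(["granule_001", "granule_002", "granule_003"],
                           (-10, 10, 10, 30), node_count=2)
```

Changing `node_count` changes how many jobs run at once without touching the job code, which is the property the framework provides at cluster scale.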

The ASDC developed dedicated subsetters for the CALIPSO, CERES, and TES missions leveraging the HDF Group’s JAVA JNI libraries used in the open source HDFView application.

These subsetters are deployed on Univa Grid Engine processing nodes and are managed by the Subsetter Workflow Framework.

The subsetters have the capability to return subsetted files in NetCDF format.

Types of granules subsetted from each data provider:

CALIPSO: HDF4

CERES: HDF4

TES: HDF-EOS (HDF5 out)

Inspection of a CERES ES8 subset result file in the HDFView application

The ASDC subsetters leverage the Common Object Package and use specific methods in the Java HDF and HDF5 JNI Interfaces to directly access lower level functions in the C libraries.

(source of diagram: http://www.hdfgroup.org/hdf-java-html/hdf-object/)
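Array reads in the HDF libraries are generally expressed as a start offset plus a count along each dimension, so a spatial subset of a swath can be reduced to a list of contiguous index ranges. The sketch below shows that reduction in plain Python, assuming one latitude and longitude value per record; the function names are hypothetical and no HDF calls are made.

```python
def index_runs(flags):
    """Collapse booleans into (start, count) pairs, one per run of True values."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(flags) - start))
    return runs

def subset_runs(lats, lons, box):
    """Return (start, count) ranges of records that fall inside box (west, south, east, north)."""
    west, south, east, north = box
    flags = [south <= lat <= north and west <= lon <= east
             for lat, lon in zip(lats, lons)]
    return index_runs(flags)

ranges = subset_runs([0, 10, 20, 30, 40], [0, 1, 2, 3, 4], (0, 5, 10, 25))
```

Each `(start, count)` pair can then drive one read of every requested parameter, so only the records inside the box are pulled from the archived file.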

New HDF subset and file access capabilities recently developed through ASDC’s collaboration with data providers give science data users the ability to quickly subset and mine data from large archived files, and have set the stage for streaming desired data directly from archived files to a client’s visualization or analysis applications.

Future Work for Improving ASDC’s Subset and Science Data Access

Machine-to-Machine subset interfaces

Very high granularity in spatial/temporal metadata

Geospatial plots of subsetted dataset query results

Real-time browse images of dataset query results