Post on 14-Feb-2016
1
Gridded Data Sub-setting Services through the RDA at NCAR
Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak
3
Gridded Data Sub-setting Services Through the RDA at NCAR
• Research Data Archive (RDA) Overview
• Problem Background
• Required Infrastructure
• Current Services
• Future Directions
4
RDA Overview
• Total archive volume over 1.3 PB
• 8000+ unique users annually
Meteorological and Oceanographic Observations
Operational and Reanalysis model outputs
Remote Sensing Observations
Topography/Bathymetry, Vegetation, Land Use
5
Problem Background
[Figure: Reanalysis Data Volume. Data volume (TB), axis 0 to 80, for NNR (1996), ERA-40 (2003), NARR (2004), JRA-25 (2006), ERA-I (2011), CFSR (2011).]
6
Problem Background
• Large computational/storage resources needed
– Store data
– Extract desired data from large grids/files
– Convert data to desirable format(s)
Scientific data centers have these resources
Individual researchers generally don’t
7
Problem Background
• Goals
– Make data more accessible and easier to use for individual researchers
  • Reasonable access volumes
  • Desired data formats
  • User-defined parameters/grids
• Researchers stay focused on research
8
Required Infrastructure
Powerful Computing
NCAR HPC/DAV
Large Disk Storage (500 TB)
Rich and Detailed Metadata Databases (RDADB)
Generalized Software Tools
- Control system (RDAMS)
- Sub-setting
- Format conversion
Web Interface
Command Line Interface
9
Required Infrastructure
• Rich Metadata Databases (key ingredient)
Metadata DB
File attribute metadata: Name, Dataset, Location, Format
File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
Drive Interfaces
Support Efficient Backend Processing
Provide Scalability
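As a rough illustration of how file content metadata can support efficient backend processing, the sketch below uses an in-memory SQLite table to find only the files that hold a requested parameter and date range, so no other files need to be opened. The table layout, column names, file names, and the dataset label "dsX" are all hypothetical; none of this is the actual RDADB schema.

```python
import sqlite3

# Hypothetical schema: one row per (file, parameter) pair with its time coverage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_content ("
    "file_name TEXT, dataset TEXT, parameter TEXT, "
    "start_date TEXT, end_date TEXT)"
)
rows = [
    ("fileA.grb", "dsX", "T", "2011-01-01", "2011-01-31"),
    ("fileA.grb", "dsX", "RH", "2011-01-01", "2011-01-31"),
    ("fileB.grb", "dsX", "T", "2011-02-01", "2011-02-28"),
    ("fileC.grb", "dsX", "Vort", "2011-01-01", "2011-01-31"),
]
conn.executemany("INSERT INTO file_content VALUES (?, ?, ?, ?, ?)", rows)


def files_for_request(conn, dataset, parameters, start, end):
    """Return files holding any requested parameter that overlaps [start, end]."""
    placeholders = ",".join("?" for _ in parameters)
    query = (
        "SELECT DISTINCT file_name FROM file_content "
        "WHERE dataset = ? AND parameter IN (" + placeholders + ") "
        "AND start_date <= ? AND end_date >= ? ORDER BY file_name"
    )
    # ISO-8601 date strings compare correctly as plain text.
    args = [dataset, *parameters, end, start]
    return [row[0] for row in conn.execute(query, args)]


# Only the file covering temperature in mid-January is selected.
matches = files_for_request(conn, "dsX", ["T"], "2011-01-15", "2011-01-20")
```

The same lookup could drive a web form (which parameters and dates exist for a dataset) and the backend (which files to read), which is one way to read "Drive Interfaces" and "Support Efficient Backend Processing".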
10
Current Services
• Sub-setting available on 13 datasets
– ERA-I, CFSR, Operational Model, EaSM
– Also available on select observation sets
• Sub-setting options
– Parameter selection
– Spatial region selection (limited availability)
• Available output formats
– Native GRIB formats
– NetCDF format
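The two sub-setting options can be illustrated with a minimal sketch on a toy regular latitude/longitude grid: pick the requested parameters, then keep only the grid points inside a bounding box. The function name, the dictionary layout, and the sample values are assumptions made for illustration, not the RDA implementation, and the sketch does not handle regions that cross the longitude wrap-around.

```python
def subset_fields(fields, lats, lons, parameters, south, north, west, east):
    """Select parameters, then crop each field to the bounding box (inclusive).

    fields maps a parameter name to a 2-D list where values[i][j]
    is the value at lats[i], lons[j].
    """
    lat_idx = [i for i, lat in enumerate(lats) if south <= lat <= north]
    lon_idx = [j for j, lon in enumerate(lons) if west <= lon <= east]
    sub_lats = [lats[i] for i in lat_idx]
    sub_lons = [lons[j] for j in lon_idx]
    sub_fields = {
        name: [[fields[name][i][j] for j in lon_idx] for i in lat_idx]
        for name in parameters
    }
    return sub_lats, sub_lons, sub_fields


# Toy 3 x 4 grid with two parameters (values are made up).
lats = [30.0, 40.0, 50.0]
lons = [250.0, 260.0, 270.0, 280.0]
fields = {
    "T": [[i * 10 + j for j in range(4)] for i in range(3)],
    "RH": [[50 + i + j for j in range(4)] for i in range(3)],
}

sub_lats, sub_lons, sub_fields = subset_fields(
    fields, lats, lons, ["T"], south=35.0, north=55.0, west=255.0, east=275.0
)
```

Here 24 input values (two parameters on 12 grid points) reduce to 4 output values, which is the kind of volume reduction sub-setting is meant to deliver.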
11
Current Services
12
Current Services
• Sub-set requests
  • Processed in delayed mode
  • User notified by email when request is ready
  • Download data via server-provided wget scripts
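The delayed-mode workflow above can be sketched as a simple queue: a request is accepted immediately, processed later, and the user is notified once output is ready. The class names, status strings, and output file names below are hypothetical, and a list stands in for outgoing email; this is not the RDAMS control system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SubsetRequest:
    request_id: int
    email: str
    parameters: list
    status: str = "queued"
    output_files: list = field(default_factory=list)


class RequestQueue:
    def __init__(self):
        self._pending = deque()
        self.notifications = []  # stand-in for outgoing email
        self._next_id = 1

    def submit(self, email, parameters):
        """Accept a request right away; the work happens later."""
        req = SubsetRequest(self._next_id, email, list(parameters))
        self._next_id += 1
        self._pending.append(req)
        return req

    def process_next(self):
        """Process the oldest queued request and record a notification."""
        if not self._pending:
            return None
        req = self._pending.popleft()
        # Placeholder for the real sub-setting and format conversion step.
        req.output_files = [
            f"request{req.request_id}_{p}.nc" for p in req.parameters
        ]
        req.status = "ready"
        self.notifications.append(
            (req.email, f"Request {req.request_id} is ready")
        )
        return req


queue = RequestQueue()
request = queue.submit("user@example.org", ["T", "RH"])
status_before = request.status
queue.process_next()
```

Decoupling submission from processing lets the backend schedule heavy extraction work on shared computing resources without keeping the user's session open.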
13
Current Services
[Figure: Number of Unique Users and Requests for 2011 Gridded Sub-Sets. Series: #Unique Users, #Requests. X axis: Month (2011), Aug to Dec. Y axis: Count, 0 to 1000.]
14
Current Services
[Figure: Volume of Data Accessed and Output for 2011 Gridded Sub-Set Requests. Series: Data Accessed, Data Output. X axis: Month (2011), Aug to Dec. Y axis: Volume (TB), 0 to 300.]
15
Future Directions
• Spatial Interpolation
• Faster Request Processing (NWSC)
• Include More RDA Datasets
• Improved Access Portals
• Additional Output Formats
• Web Service Access
16
Summary
• Data Analysis Research Challenges
– Large and Growing Data Volumes
– Numerous Formats
• RDA – Supply “User Friendly” Data
– Parameter and Spatial Sub-Setting
– Format Conversion
– Improved and Additional Services
http://dss.ucar.edu
schuster@ucar.edu