Gridded Data Sub-setting Services through the RDA at NCAR

Post on 14-Feb-2016

Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak. Topics: Research Data Archive (RDA) Overview, Problem Background, Required Infrastructure, Current Services.

Transcript of Gridded Data Sub-setting Services through the RDA at NCAR

1

Gridded Data Sub-setting Services through the RDA at NCAR

Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak

3

Gridded Data Sub-setting Services Through the RDA at NCAR

• Research Data Archive (RDA) Overview
• Problem Background
• Required Infrastructure
• Current Services
• Future Directions

4

RDA Overview

• Total archive volume over 1.3 PB
• 8000+ unique users annually

Meteorological and Oceanographic Observations

Operational and Reanalysis model outputs

Remote Sensing Observations

Topography/Bathymetry, Vegetation, Land Use

5

Problem Background

[Figure: Reanalysis Data Volume. Data volume (TB), axis range 0 to 80, for NNR (1996), ERA-40 (2003), NARR (2004), JRA-25 (2006), ERA-I (2011), and CFSR (2011).]

6

Problem Background

• Large computational/storage resources needed
– Store data
– Extract desired data from large grids/files
– Convert data to desirable format(s)

Scientific data centers have these resources

Individual researchers generally don’t

7

Problem Background

• Goals
– Make data more accessible and easier to use for individual researchers
  • Reasonable access volumes
  • Desired data formats
  • User defined parameters/grids
– Researchers stay focused on research

8

Required Infrastructure

Powerful Computing

NCAR HPC/DAV

Large Disk Storage (500 TB)

Rich and Detailed Metadata Databases (RDADB)

Generalized Software Tools
– Control system (RDAMS)
– Sub-setting
– Format conversion

Web Interface

Command Line Interface

9

Required Infrastructure

• Rich Metadata Databases (key ingredient)

Metadata DB

File attribute metadata: Name, Dataset, Location, Format

File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)

Drive Interfaces

Support Efficient Backend Processing

Provide Scalability
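The way a metadata database can support efficient backend processing, by picking the files a request needs without opening any of them, can be sketched as follows. This is a minimal illustration only: `FileRecord`, `files_for_request`, the file names, and the paths are hypothetical and are not the RDADB schema. The content metadata is reduced to a set of parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    # File attribute metadata (field names follow the slide; layout is assumed)
    name: str
    dataset: str
    location: str
    fmt: str
    # File content metadata, simplified here to the parameters held in the file
    parameters: frozenset


def files_for_request(records, dataset, wanted_parameters):
    """Return records in `dataset` that hold at least one wanted parameter."""
    wanted = set(wanted_parameters)
    return [r for r in records if r.dataset == dataset and wanted & r.parameters]


# Hypothetical catalog entries, for illustration only
catalog = [
    FileRecord("file_a.grb", "dataset_1", "/archive/d1", "GRIB", frozenset({"T", "RH"})),
    FileRecord("file_b.grb", "dataset_1", "/archive/d1", "GRIB", frozenset({"Vort"})),
    FileRecord("file_c.grb", "dataset_2", "/archive/d2", "GRIB", frozenset({"T"})),
]

selected = files_for_request(catalog, "dataset_1", ["T", "PcpR"])
print([r.name for r in selected])
```

Only files that match the lookup would then be read by the sub-setting tools, which is what keeps the per-request work proportional to the data requested rather than to the archive size.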

10

Current Services

• Sub-setting available on 13 datasets
– ERA-I, CFSR, Operational Model, EaSM
– Also available on select observation sets

• Sub-setting options
– Parameter selection
– Spatial region selection (limited availability)

• Available output formats
– Native GRIB formats
– NetCDF format
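The two sub-setting options listed on this slide, parameter selection and spatial region selection, can be illustrated with a small sketch on an in-memory latitude/longitude grid. It assumes a regular grid stored as nested lists and a bounding box that does not cross the longitude wrap-around. The function name `subset_fields` and the sample values are hypothetical, and this is not the RDA's own sub-setting code.

```python
def subset_fields(fields, lats, lons, parameters, south, north, west, east):
    """Keep only the requested parameters and the grid points inside the box.

    fields maps a parameter name to a 2-D list indexed as [lat][lon].
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    subset = {
        p: [[fields[p][i][j] for j in cols] for i in rows]
        for p in parameters
        if p in fields
    }
    return [lats[i] for i in rows], [lons[j] for j in cols], subset


# Made-up 4 x 4 grid with two parameters
lats = [30.0, 35.0, 40.0, 45.0]
lons = [250.0, 255.0, 260.0, 265.0]
fields = {
    "T": [[10 * i + j for j in range(4)] for i in range(4)],
    "RH": [[0] * 4 for _ in range(4)],
}

sub_lats, sub_lons, sub = subset_fields(
    fields, lats, lons, ["T"], south=35.0, north=40.0, west=255.0, east=265.0
)
print(sub_lats, sub_lons, sub["T"])
```

Here the request keeps one of two parameters and 6 of 16 grid points, which is the kind of volume reduction that makes the output manageable for an individual researcher.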

11

Current Services

12

Current Services

• Sub-set requests
– Processed in delayed mode
– User notified by email when request is ready
– Download data via server provided wget scripts
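The last step of the delayed-mode workflow, handing the user a wget script for the finished output files, might look roughly like the sketch below. The function `make_wget_script` and the `example.org` URLs are hypothetical, and the script layout is an assumption rather than the format the RDA server produces. The `-c` flag is wget's standard option for resuming a partial download.

```python
import shlex


def make_wget_script(urls):
    """Build a shell script with one resumable wget call per output file."""
    lines = ["#!/bin/sh", "# Download script for one completed sub-set request"]
    # shlex.quote guards against shell metacharacters in the URLs
    lines += ["wget -c " + shlex.quote(url) for url in urls]
    return "\n".join(lines) + "\n"


# Placeholder URLs, not real RDA download locations
output_urls = [
    "https://example.org/requests/0001/subset_part1.grb",
    "https://example.org/requests/0001/subset_part2.grb",
]

script = make_wget_script(output_urls)
print(script)
```

Generating a script per request lets a user fetch many output files with one command once the notification email arrives.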

13

Current Services

[Figure: Number of Unique Users and Requests for 2011 Gridded Sub-Sets. Series: # Unique Users, # Requests. x-axis: Month (2011), Aug to Dec; y-axis: Count, 0 to 1000.]

14

Current Services

[Figure: Volume of Data Accessed and Output for 2011 Gridded Sub-Set Requests. Series: Data Accessed, Data Output. x-axis: Month (2011), Aug to Dec; y-axis: Volume (TB), 0 to 300.]

15

Future Directions

• Spatial Interpolation
• Faster Request Processing (NWSC)
• Include More RDA Datasets
• Improved Access Portals
• Additional Output Formats
• Web Service Access

16

Summary

• Data Analysis Research Challenges
– Large and Growing Data Volumes
– Numerous Formats

• RDA
– Supply “User Friendly” Data
– Parameter and Spatial Sub-Setting
– Format Conversion
– Improved and Additional Services

http://dss.ucar.edu

schuster@ucar.edu