1
Cyberinfrastructure to Promote Model-Data Integration
Robert Cook, Yaxing Wei, and Suresh S. Vannan
Oak Ridge National Laboratory
Presented at the Model-Data Fusion Workshop
NACP, Albuquerque, NM
February 5, 2013
2
Data Management Vision
Researchers spend more time doing science and less time doing data management
• Free data users from the productivity losses associated with
– lack of a central clearinghouse,
– incompatible formats, units, and parameter names,
– unwieldy file sizes and large non-aggregated collections
• For those making observations, a data system that enables
– planning, collecting, documenting, and quality-assuring data, and producing complete and clear metadata, including QA/QC information and data provenance
• For modelers, a data system that enables
– discovering, accessing, browsing, and comparing data with standard tools (GIS, visualization/analysis systems) without concern for format, location, or volume
– grid/distributed/cloud computing
3
ORNL DAAC
• Part of NASA’s Earth Observing System Data & Information System
• Terrestrial Ecology and Biogeochemical Dynamics Data
Tools for Discovery, Access, Extraction, and Visualization
Data Holdings (>900 products)
• NASA Field Campaigns
• Land Validation (remote sensing)
• Global and Regional Spatial Data
• Terrestrial Biogeochem. Model Code
Fire
http://daac.ornl.gov
4
Data Assimilation using Web Services
• Programmatically access MODIS and Daymet data without downloading full data files
[Diagram labels: MODIS Tiles, MODIS Web Service, Script (Tristan Quaife, University of Exeter); Daymet Tiles, Daymet Web Service, Script]
Terrestrial Biosphere Models
• SiB3
• LoTEC
• Can_IBIS
• ORCHIDEE
• LPJwsl
• TECO
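The idea of scripted access without downloading full data files can be sketched roughly as follows. This is a minimal illustration, not the MODIS or Daymet service interface: the endpoint `https://example.org/point-subset`, the parameter names (`lat`, `lon`, `vars`, `start`, `end`), and the CSV column layout are all assumptions made for the example. The actual URLs and parameters should be taken from the ORNL DAAC service documentation.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint used only for illustration; replace it with the
# documented URL of the web service you actually call.
BASE_URL = "https://example.org/point-subset"


def build_request_url(lat, lon, variables, start, end, base_url=BASE_URL):
    """Build a GET URL that asks for a time series at one location.

    The query parameter names are assumptions, not a documented API.
    """
    query = urlencode({
        "lat": f"{lat:.4f}",
        "lon": f"{lon:.4f}",
        "vars": ",".join(variables),
        "start": start,
        "end": end,
    })
    return f"{base_url}?{query}"


def parse_point_csv(text):
    """Parse a CSV time series into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def fetch_point_series(url):
    """Download and parse the series. Requires network access."""
    with urlopen(url, timeout=30) as response:
        return parse_point_csv(response.read().decode("utf-8"))
```

A model driver script would call `build_request_url` for each site or grid cell it needs, then pass the result to `fetch_point_series`, so only the requested subset crosses the network.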
5
North American Carbon Program Synthesis Framework
• What are the magnitudes and spatial distribution of carbon sources and sinks, and their uncertainties?
• What is the spatial pattern and magnitude of interannual variation in carbon fluxes?
• Are the various observations and modeling estimates of carbon fluxes consistent with each other, and if not, why?
6
Goal: Provide data management support for modeling and synthesis activities
Activities:
1. Coordinate data management activities with NACP modelers and synthesis groups;
2. Prepare and distribute model input data;
3. Provide data management support for model outputs;
4. Provide tools for accessing, subsetting, and visualization;
5. Provide data packages to evaluate model output; and
6. Support synthesis activities, including data support for workshops.
[Figure: Net Ecosystem Exchange, 2000-2006 (Hayes et al. 2012)]
http://nacp.ornl.gov/index.shtml
7
Pilot Study: Integrate Observations with Models using “Access Broker”
Users can access observational data and convert it to their specified format, spatial resolution, spatial extent, and temporal extent.
[Diagram labels: Request for Data; Original Observational Data; FTP/HTTP/…; Data Process; SCRIP (regrid); Customized Observational Data]
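The kind of customization described here (a requested spatial extent and a coarser spatial resolution) can be illustrated with a small sketch. It is not the Access Broker and does not use SCRIP: the function names `subset` and `coarsen` are hypothetical, and plain block averaging stands in for SCRIP's remapping, which is an assumption made to keep the example self-contained. The average is unweighted, so it ignores the change of cell area with latitude.

```python
def subset(grid, lats, lons, lat_range, lon_range):
    """Keep only the cells whose centre lies inside the requested extent.

    grid is a list of rows; lats and lons give the cell-centre coordinates
    of the rows and columns respectively.
    """
    rows = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    sub = [[grid[i][j] for j in cols] for i in rows]
    return sub, [lats[i] for i in rows], [lons[j] for j in cols]


def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks.

    Cells equal to None are treated as missing and skipped. Trailing rows
    or columns that do not fill a whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor if grid else 0
    out = []
    for bi in range(n_rows):
        out_row = []
        for bj in range(n_cols):
            block = [
                grid[i][j]
                for i in range(bi * factor, (bi + 1) * factor)
                for j in range(bj * factor, (bj + 1) * factor)
                if grid[i][j] is not None
            ]
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out
```

Applying `subset` before `coarsen` keeps the averaging cost proportional to the requested region instead of the full original grid.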
8
[Diagram labels: Original MODIS Data; MODIS Web Service; Model-Data Comparison Framework; Data Assimilation Framework. Credit: Stefano Nativi et al.]
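As a rough illustration of what a model-data comparison step might compute once observations and model output are on a common grid and time axis, the sketch below reports mean bias and root-mean-square error. The function name `compare_series` and the choice of these two metrics are assumptions for the example; the slides do not specify which statistics the framework uses.

```python
import math


def compare_series(model, observed):
    """Return the pair count, mean bias, and RMSE of model minus observation.

    Pairs in which either value is None (missing) are skipped.
    """
    diffs = [m - o for m, o in zip(model, observed) if m is not None and o is not None]
    if not diffs:
        raise ValueError("no overlapping model/observation pairs")
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"n": len(diffs), "bias": bias, "rmse": rmse}
```

A positive bias means the model overestimates the observed quantity on average, while RMSE summarizes the typical size of the mismatch regardless of sign.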
9
Data and Information Management for Model-Data Integration
• Goal is to ensure that the data, products, information, and tools required to address science questions are available in harmonized forms when needed
• Develop a data management capability that
– Reflects the needs of the user community,
– Is created in a reasonable time frame, and
– Is universally accepted as a value-added capability to those doing the work
10
Environmental Observations and Modeling
Observations & Experiments
Communication among data managers, those making the measurements / experimentalists, and modelers is critical
Data Center
Ecosystem Models
11
Characteristics of the Data System (1)
• Dedicated financial support for data management is essential
• Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products
• Based on a data management plan and a data policy
• Integrated system that delivers a suite of diverse products
• Establish standards (file, workflow, network) and promote interoperability
• Processes to assure and document data quality to allow proper interpretation and use
12
Characteristics of the Data System (2)
• Facilitate rapid exchange of data, products, and information, including large-volume data
• Promote the use of best practices to prepare and document data to share and archive
• Make efficient use of existing data management infrastructure and resources
• Ensure that finalized data and associated documentation are transferred to an appropriate archive
• Make numerical models (source code) and descriptions of the models available, along with model parameters and example input and output data (Thornton et al. 2005)