Unidata’s Common Data Model
John Caron
Unidata/UCAR
Nov 2006
Goals / Overview
• Look at the landscape of scientific datasets from a few thousand feet up.
• What semantics are needed to make these useful?
– georeferencing
– specialized subsetting
What’s a Data Model?
• An Abstract Data Model describes data objects and what methods you can use on them.
• An API is the interface to the Data Model for a specific programming language
• A file format is a way to persist the objects in the Data Model.
• An Abstract Data Model removes the details of any particular API and the persistence format.
Common Data Model Layers
[Figure: layer diagram with labels Data Access, Coordinate Systems, and Scientific Datatypes (Grid, Point, Radial, Trajectory, Swath, Station Profile)]
NetCDF-Java version 2.2 architecture
[Figure: architecture diagram with labels Application, Scientific Datatypes, Datatype Adapter, NetcdfDataset, CoordSystem Builder, NcML, NetcdfFile, I/O service provider, NetCDF-3, HDF5, GRIB, GINI, NIDS, NetCDF-4, Nexrad, DMSP, …, OPeNDAP, ADDE, THREDDS, Catalog.xml]
NetCDF-4 and Common Data Model (Data Access Layer)
I/O Service Provider Implementations
• General: NetCDF, HDF5, OPeNDAP
• Gridded: GRIB-1, GRIB-2
• Radar: NEXRAD level 2 and 3, DORADE
• Point: BUFR, ASCII
• Satellite: DMSP, GINI
• In development
– NOAA: GOES (Knapp/Nelson), many others
Coordinate Systems needed
• NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems
– so georeferencing is not part of the API
– Need conventions to specify (e.g. CF-1, COARDS, etc.)
• Contrast GRIB, HDF-EOS, other specialized formats
NetCDF Coordinate Variables
dimensions:
lat = 64;
lon = 128;
variables:
float lat(lat);
float lon(lon);
double temperature(lat,lon);
Coordinate Variables
– One-dimensional variable with the same name as its dimension
– Strictly monotonic values
– No missing values
The coordinates of a point (i,j,k) are
{CV1(i), CV2(j), CV3(k)}
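The three rules and the lookup {CV1(i), CV2(j), CV3(k)} can be checked mechanically. The following is a minimal Python sketch, not NetCDF-Java code: the `Variable` class and both helper functions are hypothetical names introduced for illustration, and NaN is assumed to mark a missing value.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Variable:
    """Hypothetical stand-in for a netCDF variable holding 1D values."""
    name: str
    dims: Tuple[str, ...]
    values: Sequence[float]


def is_coordinate_variable(var: Variable) -> bool:
    """Check the three rules: 1D and named after its dimension,
    no missing values (NaN in this sketch), strictly monotonic."""
    if var.dims != (var.name,):
        return False
    if any(math.isnan(v) for v in var.values):
        return False
    pairs = list(zip(var.values, var.values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def coordinates_of(index: Sequence[int], data_dims: Sequence[str],
                   coord_vars: Dict[str, Variable]) -> Tuple[float, ...]:
    """Look up each index in the coordinate variable that shares
    the name of the corresponding dimension."""
    return tuple(coord_vars[d].values[i] for d, i in zip(data_dims, index))


lat = Variable("lat", ("lat",), [-90.0, 0.0, 90.0])
lon = Variable("lon", ("lon",), [0.0, 120.0, 240.0])
cvs = {"lat": lat, "lon": lon}

# Coordinates of temperature[1][2] for temperature(lat, lon)
point = coordinates_of((1, 2), ("lat", "lon"), cvs)
```

Because each coordinate variable depends on exactly one index, the lookup is a plain per-dimension array access.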
Limitations of 1D Coordinate Variables
• Non lat/lon horizontal grids:
float temperature(y,x);
float lat(y, x);
float lon(y, x);
• Trajectory data:
float NKoreaRadioactivity(pt);
float lat(pt);
float lon(pt);
float altitude(pt);
float time(pt);
General Coordinates in CF-1.0
float P(y,x);
P:coordinates = "lat lon";
float lat(y, x);
float lon(y, x);
float Sr90(pt);
Sr90:coordinates = "lat lon altitude time";
Coordinate Systems (abstract)
• A Coordinate System for a data variable is a set of Coordinate Variables such that the coordinates of the (i,j,k) data point are
{CV1(i,j,k),CV2(i,j,k),CV3(i,j,k),CV4(i,j,k)…}
(previously {CV1(i), CV2(j), CV3(k)})
• The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
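A small Python sketch of this generalized definition follows. It is illustrative only: the `(dims, nested values)` tuple layout and the function names are assumptions made for the example, not part of any netCDF API. Each coordinate variable is indexed with just the dimensions it declares, and the subset rule is enforced before the lookup.

```python
from typing import Dict, Sequence, Tuple

# Assumed layout for this sketch: (dimension names, nested list of values)
CoordVar = Tuple[Tuple[str, ...], list]


def lookup(values, dims: Sequence[str], index: Dict[str, int]):
    """Index nested lists using only the dimensions this variable has."""
    for d in dims:
        values = values[index[d]]
    return values


def point_coordinates(index: Dict[str, int], data_dims: Sequence[str],
                      coord_vars: Dict[str, CoordVar]) -> tuple:
    """Return (CV1(...), CV2(...), ...) for one data point."""
    coords = []
    for name, (dims, values) in coord_vars.items():
        # Rule from the slide: coordinate dims must be a subset of data dims.
        if not set(dims) <= set(data_dims):
            raise ValueError(f"{name}{dims} is not a subset of {data_dims}")
        coords.append(lookup(values, dims, index))
    return tuple(coords)


data_dims = ("t", "y", "x")
coord_vars = {
    "time": (("t",), [0.0, 6.0]),
    "lat": (("y", "x"), [[10.0, 11.0], [20.0, 21.0]]),
    "lon": (("y", "x"), [[-5.0, -4.0], [-5.5, -4.5]]),
}
coords = point_coordinates({"t": 1, "y": 1, "x": 0}, data_dims, coord_vars)
```

The 1D case is recovered when every coordinate variable declares exactly one dimension.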
Need Coordinate Axis Types
float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);
float radialData(radial, gate);
float distance(gate);
float azimuth(radial);
float elevation(radial);
float time(radial);
The same??
float stationObs(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);
float trajectory(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);
Revised Coordinate Systems
1. Specify Coordinate Variables
2. Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation)
3. Specify connectivity (implicit or explicit) between data points
– Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.
Gridded Data
Connected means: neighbors in index space are neighbors in coordinate space
float gridData(t,z,y,x);
float time(t); // Time
float y(y); // GeoY
float x(x); // GeoX
float z(t,z,y,x); // Height or Pressure
• Cartesian coordinates
• All dimensions are connected
Coordinate Systems UML
Scientific Data Types
• Based on datasets Unidata is familiar with
– APIs are evolving
• How are data points connected?
• Intended to scale to large, multifile collections
• Intended to support “specialized queries”
– Space, Time
• Corresponding “standard” NetCDF file conventions
Gridded Data
float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float height(t,z,y,x);
• Cartesian coordinates
• All dimensions are connected
• x, y, z, time
• recently added runtime and ensemble
• refactored into GridDatatype interface
GridDatatype methods
CoordinateAxis getTaxis();
CoordinateAxis getXaxis();
CoordinateAxis getYaxis();
CoordinateAxis getZaxis();
Projection getProjection();
int[] findXYindexFromCoord( double x_coord, double y_coord);
LatLonRect getLatLonBoundingBox();
Array getDataSlice (Range[] …)
GridDatatype makeSubset (Range[] …)
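A method like findXYindexFromCoord can be cheap on a grid because connected, monotonic axes permit binary search. The Python sketch below shows that idea only. It is not the library's implementation: it assumes 1D, strictly increasing x and y axes, picks the nearest grid point, and returns -1 for a coordinate outside the axis range, all of which are choices made for this example.

```python
import bisect
from typing import List, Sequence


def find_index(axis: Sequence[float], value: float) -> int:
    """Index of the axis value nearest to `value`, or -1 if out of range.
    Assumes `axis` is strictly increasing."""
    if value < axis[0] or value > axis[-1]:
        return -1
    pos = bisect.bisect_left(axis, value)  # O(log n) thanks to monotonicity
    if pos == 0:
        return 0
    before, after = axis[pos - 1], axis[pos]
    return pos if (after - value) < (value - before) else pos - 1


def find_xy_index_from_coord(x_axis: Sequence[float], y_axis: Sequence[float],
                             x_coord: float, y_coord: float) -> List[int]:
    """Return [x_index, y_index] for a projection coordinate pair."""
    return [find_index(x_axis, x_coord), find_index(y_axis, y_coord)]


x_axis = [0.0, 10.0, 20.0, 30.0]
y_axis = [100.0, 200.0, 300.0]
inside = find_xy_index_from_coord(x_axis, y_axis, 12.0, 260.0)
outside = find_xy_index_from_coord(x_axis, y_axis, -5.0, 260.0)
```

The same search is not possible on a dimension that is not connected, where every point has to be examined.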
Radial Data
radialData(radial, gate) :
distance(gate)
azimuth(radial)
elevation(radial)
time(radial)
• Polar coordinates
• All dimensions are connected
• No separate time dimension
Swath
swathData(line,cell)
lat(line,cell)
lon(line,cell)
time(line)
z(line,cell) ??
• lat/lon coordinates
• no separate time dimension
• all dimensions are connected
Point Observation Data
Structure { lat, lon, z, time; v1, v2, ... } obs( pt);
• Set of measurements at the same point in space and time
• Point dimension not connected
float obs1(pt);
float obs2(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);
PointObsDataset Methods
// Iterator<StructureData>
Iterator getData(
LatLonRect boundingBox,
Date start, Date end);
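A query of this shape can be sketched in Python as a filtering iterator. This is an illustration under stated assumptions, not the NetCDF-Java API: `LatLonRect` and `PointObs` here are simplified stand-ins defined in the example, and longitude wrap-around is ignored. Because the point dimension is not connected, the sketch scans every observation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Iterator


@dataclass
class LatLonRect:
    """Simplified stand-in for a bounding box (no longitude wrap-around)."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


@dataclass
class PointObs:
    """One record: location, time, and the measured values v1, v2, ..."""
    lat: float
    lon: float
    z: float
    time: datetime
    values: Dict[str, float]


def get_data(obs: Iterable[PointObs], bbox: LatLonRect,
             start: datetime, end: datetime) -> Iterator[PointObs]:
    """Yield observations inside the box and the closed time range."""
    for ob in obs:
        if bbox.contains(ob.lat, ob.lon) and start <= ob.time <= end:
            yield ob


all_obs = [
    PointObs(40.0, -105.0, 1600.0, datetime(2006, 11, 1, 0), {"v1": 1.5}),
    PointObs(10.0, 20.0, 5.0, datetime(2006, 11, 1, 6), {"v1": 2.5}),
    PointObs(41.0, -104.0, 1500.0, datetime(2006, 11, 3, 0), {"v1": 3.5}),
]
box = LatLonRect(35.0, 45.0, -110.0, -100.0)
hits = list(get_data(all_obs, box, datetime(2006, 11, 1), datetime(2006, 11, 2)))
```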
Time series Station Data
Structure {
  name;
  lat, lon, z;
  Structure {
    time;
    v1, v2, ...
  } obs(*); // connected
} stn(stn); // not connected
StationObs Methods
// List<Station>
List getStations( LatLonRect boundingBox);
// Iterator<StructureData>
Iterator getData( Station s, Date start, Date end);
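The nested station structure suggests a two-step query: scan the stations, which are not connected, then binary-search the time series of one station, which is connected. The Python sketch below illustrates that split. The `Station` class, the plain-number time values, and the flat bounding-box arguments are simplifications assumed for the example and do not mirror the Java signatures.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Station:
    """Hypothetical station record with a time-sorted series."""
    name: str
    lat: float
    lon: float
    z: float
    times: List[float] = field(default_factory=list)   # sorted ascending
    values: List[float] = field(default_factory=list)  # one value per time


def get_stations(stations: List[Station], lat_min: float, lat_max: float,
                 lon_min: float, lon_max: float) -> List[Station]:
    """Stations are not connected, so every one is tested."""
    return [s for s in stations
            if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max]


def get_data(station: Station, start: float,
             end: float) -> List[Tuple[float, float]]:
    """Time is connected (sorted), so the range is found by bisection."""
    lo = bisect.bisect_left(station.times, start)
    hi = bisect.bisect_right(station.times, end)
    return list(zip(station.times[lo:hi], station.values[lo:hi]))


stations = [
    Station("A", 40.0, -105.0, 1600.0, [0, 60, 120, 180], [1.0, 2.0, 3.0, 4.0]),
    Station("B", 10.0, 20.0, 5.0, [0, 60], [9.0, 8.0]),
]
nearby = get_stations(stations, 35.0, 45.0, -110.0, -100.0)
series = get_data(nearby[0], 60, 120)
```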
Trajectory Data
Structure {
  lat, lon, z, time;
  v1, v2, ...
} obs(pt); // connected
Structure {
  name;
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(*); // connected
} traj(traj); // not connected
• pt dimension is connected
• Collection dimension not connected
Profiler/Sounding Station Data
Structure {
  name;
  lat, lon, time;
  Structure {
    z;
    v1, v2, ...
  } obs(*); // connected
} loc(nloc); // not connected
Structure {
  name;
  lat, lon;
  Structure {
    time;
    Structure {
      z;
      v1, v2, ...
    } obs(*); // connected
  } time(*); // connected
} stn(stn); // not connected
Unstructured Grid
float unstructGrid(t,z,pt);
float lat(pt);
float lon(pt);
float time(t);
float height(z);
• Pt dimension not connected
• Looks the same as point data
• Need to specify the connectivity explicitly
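One common way to state connectivity explicitly is a list of cells, each naming the point indices at its corners. The slides do not say which representation is intended, so the triangle list in this Python sketch is purely an assumption for illustration, as is the function name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def neighbors_from_cells(cells: Sequence[Tuple[int, ...]]) -> Dict[int, List[int]]:
    """Derive point-to-point neighbors from cells that list point indices.
    Two points are neighbors if they share at least one cell."""
    adjacency = defaultdict(set)
    for cell in cells:
        for a in cell:
            for b in cell:
                if a != b:
                    adjacency[a].add(b)
    return {point: sorted(nbrs) for point, nbrs in adjacency.items()}


# Two triangles sharing the edge between points 1 and 2 (hypothetical mesh)
triangles = [(0, 1, 2), (1, 2, 3)]
neighbors = neighbors_from_cells(triangles)
```

With such a table, neighbor lookups no longer depend on adjacency in index space along the pt dimension.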
Data Types Summary
• Data access through a standard API
• Convenient georeferencing
• Specialized subsetting methods
– Efficiency for large datasets
Payoff
N + M instead of N * M things on your TODO List!
[Figure: File Format #1, File Format #2, through File Format #N feed into the CDM, which feeds Visualization & Analysis]
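The N + M count follows from routing every format through one shared representation. The Python sketch below works the arithmetic and shows the pattern in miniature. The reader and application names and the dictionary used as the "common model" are invented for the example.

```python
from typing import Callable, Dict


def pieces_to_write(n_formats: int, m_apps: int, common_model: bool) -> int:
    """Adapters needed: one per format plus one per app with a common
    model, otherwise one per (format, app) pair."""
    return n_formats + m_apps if common_model else n_formats * m_apps


# N readers: each turns one raw format into the common representation.
readers: Dict[str, Callable] = {
    "format1": lambda raw: {"name": raw[0], "values": list(raw[1:])},
    "format2": lambda raw: {"name": raw["id"], "values": raw["data"]},
}
# M applications: each understands only the common representation.
apps: Dict[str, Callable] = {
    "mean": lambda ds: sum(ds["values"]) / len(ds["values"]),
    "count": lambda ds: len(ds["values"]),
}


def run(fmt: str, raw, app: str):
    """Any app works with any format without a dedicated converter."""
    return apps[app](readers[fmt](raw))


with_model = pieces_to_write(10, 5, common_model=True)
without_model = pieces_to_write(10, 5, common_model=False)
mean_from_format2 = run("format2", {"id": "t", "data": [1.0, 3.0]}, "mean")
```

For 10 formats and 5 applications this is 15 pieces of code instead of 50.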
THREDDS Data Server
[Figure: server architecture diagram with labels NetCDF file, OPeNDAP Server, WCS Service, Web Service, HTTP Tomcat Server, Datasets, Catalog.xml, hostname.edu, THREDDS Server, Application, NetCDF-Java library, IDD Data, and the services OPeNDAP, HTTPServer, WCS]
Next: DataType Aggregation
• Work at the CDM DataType level, know (some) data semantics
• Forecast Model Collection
– Combine multiple model forecasts into a single dataset with two time dimensions
– With NOAA/IOOS (Steve Hankin)
• Point/Station/Trajectory/Profile Data
– Allow space/time queries, return nested sequences
– Start from / standardize “Dapper conventions”
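As a rough illustration of a dataset with two time dimensions, the Python sketch below stacks several model runs into a table indexed by run time and forecast offset. The choice of those two axes, the hour-based integers, and the use of `None` for forecasts a run does not provide are assumptions of this example, not a description of the planned aggregation.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_runs(
    runs: Dict[int, Dict[int, float]]
) -> Tuple[List[int], List[int], List[List[Optional[float]]]]:
    """Combine runs keyed by run time (hours) into data[run][offset].
    Each run maps forecast offset (hours) to a value."""
    run_times = sorted(runs)
    offsets = sorted({h for forecast in runs.values() for h in forecast})
    data = [[runs[r].get(h) for h in offsets] for r in run_times]
    return run_times, offsets, data


# Two hypothetical model runs; the second one forecasts further ahead.
runs = {
    0: {0: 1.0, 6: 2.0},
    12: {0: 3.0, 6: 4.0, 12: 5.0},
}
run_times, offsets, data = aggregate_runs(runs)
```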
Forecast Model Collections
Conclusion
• Standardized Data Access in good shape
– HDF5, NetCDF, OPeNDAP
– Write an IOSP for proprietary formats (Java)
• But that’s not good enough!
• To do:
– Standard representations of coordinate systems
– Classifications of data types, standard services for them