NIEeS Workshop, Cambridge (UK), Sep 2002 Luca Cinquini for the Earth System Grid METADATA...

Luca Cinquini for the Earth System Grid NIEeS Workshop, Cambridge (UK), Sep 2002 METADATA DEVELOPMENT for the EARTH SYSTEM GRID Luca Cinquini (SCD/NCAR) for the Earth System Grid collaboration www.earthsystemgrid.org

Page 1: METADATA DEVELOPMENT for the EARTH SYSTEM GRID

NIEeS Workshop, Cambridge (UK), Sep 2002

Luca Cinquini (SCD/NCAR)
for the Earth System Grid collaboration

www.earthsystemgrid.org

Page 2: Metadata-centric view of ESG services

[Diagram built around a METADATA SERVICES box. Labels recovered from the slide:]

• Services: User Authentication and Authorization; Data Transport; System Monitoring and Control; Data Search & Discovery; Data Analysis & Visualization; Data Browsing
• Metadata types: Access and Authorization Metadata; Location Metadata; Logging Metadata; Content Metadata; Annotation & History Metadata; Aggregation Metadata; Cataloguing Metadata

Page 3: ESG Metadata Services: Goal, Functionality

• Goal: services responsible for the creation, management and utilization of metadata associated with geophysical data

• Functionality:
  – Metadata extraction (automatically, from files in different formats and according to various possible metadata standards)
  – Metadata conversion (from one standard to another)
  – Metadata aggregation (associated with data collections)
  – Metadata annotation (manually by humans)
  – Metadata validation (basic quality control of metadata)
  – Registration (population of metadata holdings)
  – Harvesting (combination of metadata from different repositories)
  – Metadata browsing and display (for humans)
  – Search and discovery of data through metadata
  – Metadata query (by agents or clients for data analysis and visualization)

Page 4: ESG Metadata Services Architecture

3-layer architecture:

• Metadata Holdings: physical metadata content, stored in a system of relational and/or native XML databases

• Core Metadata Services: modules and libraries that mediate all access to the Metadata Holdings (insert, update, delete, query) – expose an API that hides the specific implementation of the databases and query languages

• High Level Metadata Services: system of applications that make use of the Core Metadata Services to fulfill a specific atomic functionality – will be invoked by external clients
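
The layering described above can be illustrated with a minimal Python sketch. It is not ESG code: the class and function names (`InMemoryHolding`, `CoreMetadataService`, `discover_by_units`) are hypothetical, and an in-memory dictionary stands in for the relational or XML databases. The point is only that the high-level function never touches the holding directly.

```python
from typing import Callable, Dict, List


class InMemoryHolding:
    """Layer 1 (Metadata Holdings): hypothetical stand-in for a relational or XML database."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def write(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def erase(self, key: str) -> None:
        self._rows.pop(key, None)

    def rows(self) -> Dict[str, dict]:
        return dict(self._rows)


class CoreMetadataService:
    """Layer 2 (Core Metadata Services): the only code that talks to the holding."""

    def __init__(self, holding: InMemoryHolding) -> None:
        self._holding = holding

    def insert(self, key: str, record: dict) -> None:
        if key in self._holding.rows():
            raise KeyError(f"{key} is already registered")
        self._holding.write(key, record)

    def update(self, key: str, changes: dict) -> None:
        current = self._holding.rows()[key]  # raises KeyError for an unknown record
        self._holding.write(key, {**current, **changes})

    def delete(self, key: str) -> None:
        self._holding.erase(key)

    def query(self, predicate: Callable[[dict], bool]) -> List[str]:
        # Callers pass a plain predicate; no database query language leaks out.
        return sorted(k for k, rec in self._holding.rows().items() if predicate(rec))


def discover_by_units(core: CoreMetadataService, units: str) -> List[str]:
    """Layer 3 (High Level Metadata Services): one atomic function for external clients."""
    return core.query(lambda rec: rec.get("units") == units)


core = CoreMetadataService(InMemoryHolding())
core.insert("T", {"long_name": "temperature", "units": "K"})
core.insert("ps", {"long_name": "pressure", "units": "Pa"})
core.update("ps", {"units": "hPa"})
print(discover_by_units(core, "K"))
```

Swapping `InMemoryHolding` for a different store would leave `discover_by_units` unchanged, which is the design property the Core layer is meant to provide.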

Page 5: [Architecture diagram. Labels recovered from the slide, grouped by layer:]

• ESG CLIENTS API & USER INTERFACES: Search & Discovery; Administration; Browsing & Display; Analysis & Visualization
• HIGH LEVEL METADATA SERVICES: Metadata Extraction; Metadata Display; Metadata Browsing; Metadata Search, Query & Discovery; Metadata Annotation; Metadata Validation; Metadata Aggregation; Metadata Conversion; Metadata & Data Registration; Publishing
• CORE METADATA SERVICES: Metadata Access (update, insert, delete, query); Service Translation Library
• METADATA HOLDINGS: Replica Location Services; Metadata Cataloguing Services; XML DB; THREDDS catalogs

Page 6: ESG Metadata Services Current Development

Currently developing or evaluating the following technologies:

• Replica Location Services: database to manage and index multiple copies of the same data stored at different centers
• Metadata Cataloguing Services: relational database to store scientific metadata (developed for high energy physics and geophysical data)
• XML native databases (Apache Xindice)
• THREDDS (by Unidata): system for hierarchical cataloguing of datasets and associated metadata (http://www.unidata.ucar.edu/projects/THREDDS)
• NcML (Netcdf Markup Language): XML language for encoding of metadata associated with data in netcdf format (and more…)

Page 7: ESG Metadata Policy

• Premise: geophysical sciences are too broad and complex to impose a single, all-encompassing metadata standard that captures the relevant information for all datasets, projects, instruments and scientists

• ESG will not mandate use of any metadata schema or convention

• Allow data providers and scientists to use their metadata of choice; provide technologies and tools to store and access metadata through common services (MCS, XML DB, THREDDS catalogs)

• Encourage development and reuse of a limited set of domain-specific standards (climate data, radar data, airborne instrumentation etc.), encoding in XML (according to community-developed schemas), and interoperability and combination of schemas (XML namespaces, RDF, ontologies)

Page 8: Netcdf Markup Language (NcML)

Work in progress, collaboration between ESG, Unidata and the University of Florence

• Definition: XML representation for data following the netcdf model

• Features:
  – Express metadata associated with data in netcdf format
  – Definition of coordinates and coordinate systems (capturing netcdf conventions)
  – Aggregation/subsetting
  – Definition of new data, restructuring of existing data (virtual datasets)
  – Interoperability with openGIS and ISO
  – Also, possibly extend the model to other data formats (HDF, Grib etc.)

• Strategy: develop a system of XML schemas each covering a specific domain (advantages: more flexible, maintainable and extensible). Keep it simple!

Page 9: NcML: schemas architecture

[Diagram of the NcML schemas. Box labels recovered from the slide:]

• Netcdf core (generic netcdf data)
• Netcdf Coordinate Systems (netcdf conventions for coord, coord systems)
• Netcdf (virtual) dataset (operations on data)
• Netcdf Geo Coordinate Systems (geo-referenced coord systems)
• openGIS-ISO Reference Coordinate Systems
• Other schemas for openGIS-ISO

Page 10: NcML: core schema

• For XML encoding of metadata (and data) of any generic netcdf file
• Objects: Netcdf, Dimension, Variable, Attribute
• Beta version reference implementation as Java library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm)
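
The four objects named above can be pictured as a small object model. The sketch below is in Python and is not the Java reference implementation. The field names are assumptions modelled on the XML attributes used in this deck's examples (`name`, `length`, `shape`, `type`, `value`, `uri`), and `to_xml` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

NC_NS = "http://www.ucar.edu/schemas/netcdf"  # namespace URI taken from the deck's XML examples
ET.register_namespace("nc", NC_NS)


@dataclass
class Attribute:
    name: str
    type: str
    value: str


@dataclass
class Dimension:
    name: str
    length: int


@dataclass
class Variable:
    name: str
    shape: str  # space-separated dimension names, e.g. "lev yc xc"
    type: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Netcdf:
    uri: str
    dimensions: List[Dimension] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)

    def to_xml(self) -> str:
        """Serialize the object tree to an nc:netcdf element (hypothetical helper)."""
        root = ET.Element(f"{{{NC_NS}}}netcdf", {"uri": self.uri})
        for d in self.dimensions:
            ET.SubElement(root, f"{{{NC_NS}}}dimension",
                          {"name": d.name, "length": str(d.length)})
        for v in self.variables:
            var = ET.SubElement(root, f"{{{NC_NS}}}variable",
                                {"name": v.name, "shape": v.shape, "type": v.type})
            for a in v.attributes:
                ET.SubElement(var, f"{{{NC_NS}}}attribute",
                              {"name": a.name, "type": a.type, "value": a.value})
        return ET.tostring(root, encoding="unicode")


doc = Netcdf(
    uri="example.nc",
    dimensions=[Dimension("lev", 18)],
    variables=[Variable("lev", "lev", "float", [Attribute("units", "string", "km")])],
)
xml_text = doc.to_xml()
print(xml_text)
```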

Page 11: Example: two-dimensional latitude, longitude coordinate variables (CDL)

dimensions:
    xc = 128;
    yc = 64;
    lev = 18;

variables:
    float T(lev,yc,xc);
        T:long_name = "temperature";
        T:units = "K";
        T:coordinates = "lon lat";
    float xc(xc);
        xc:long_name = "x-coordinate in Cartesian system";
        xc:units = "m";
    float yc(yc);
        yc:long_name = "y-coordinate in Cartesian system";
        yc:units = "m";
    float lev(lev);
        lev:long_name = "altitude levels";
        lev:units = "km";
    float lon(yc,xc);
        lon:long_name = "longitude";
        lon:units = "degrees_east";
    float lat(yc,xc);
        lat:long_name = "latitude";
        lat:units = "degrees_north";

Page 12: NcML core schema

<?xml version="1.0" encoding="UTF-8"?>
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf"
    uri="http://www.scd.ucar.edu/vets/luca/netcdf/example.nc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ucar.edu/schemas/netcdf http://www.ucar.edu/schemas/netcdf.xsd">
  <nc:dimension length="128" name="xc"/>
  <nc:dimension length="64" name="yc"/>
  <nc:dimension length="18" name="lev"/>

  <nc:variable name="xc" shape="xc" type="float">
    <nc:attribute name="long_name" type="string" value="x cartesian coord"/>
    <nc:attribute name="units" type="string" value="m"/>
  </nc:variable>
  <nc:variable name="yc" shape="yc" type="float">
    <nc:attribute name="long_name" type="string" value="y cartesian coord"/>
    <nc:attribute name="units" type="string" value="m"/>
  </nc:variable>

Page 13: NcML core schema (continued)

  <nc:variable name="lev" shape="lev" type="float">
    <nc:attribute name="long_name" type="string" value="altitude levels"/>
    <nc:attribute name="units" type="string" value="km"/>
  </nc:variable>
  <nc:variable name="lon" shape="yc xc" type="float">
    <nc:attribute name="units" type="string" value="degrees_east"/>
  </nc:variable>
  <nc:variable name="lat" shape="yc xc" type="float">
    <nc:attribute name="units" type="string" value="degrees_north"/>
  </nc:variable>

  <nc:variable name="T" shape="lev yc xc" type="float">
    <nc:attribute name="long_name" type="string" value="temperature"/>
    <nc:attribute name="units" type="string" value="K"/>
    <nc:attribute name="coordinates" type="string" value="lat lon"/>
  </nc:variable>
</nc:netcdf>
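
As an illustration of how a client might consume such a document, the following Python sketch reads an abridged copy of the example with the standard-library ElementTree parser and resolves each variable's `shape` string into numeric dimension lengths. The function name `variable_shapes` is hypothetical, and the embedded XML keeps only a subset of the elements shown above.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.ucar.edu/schemas/netcdf"}

# Abridged copy of the example document (fewer variables and attributes).
NCML = """
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf">
  <nc:dimension length="128" name="xc"/>
  <nc:dimension length="64" name="yc"/>
  <nc:dimension length="18" name="lev"/>
  <nc:variable name="lon" shape="yc xc" type="float"/>
  <nc:variable name="T" shape="lev yc xc" type="float">
    <nc:attribute name="units" type="string" value="K"/>
  </nc:variable>
</nc:netcdf>
"""


def variable_shapes(ncml_text):
    """Map each variable name to a tuple of dimension lengths."""
    root = ET.fromstring(ncml_text.strip())
    sizes = {d.get("name"): int(d.get("length"))
             for d in root.findall("nc:dimension", NS)}
    return {v.get("name"): tuple(sizes[dim] for dim in v.get("shape").split())
            for v in root.findall("nc:variable", NS)}


shapes = variable_shapes(NCML)
print(shapes["T"])    # (18, 64, 128)
print(shapes["lon"])  # (64, 128)
```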

Page 14: NcML: coordinate systems schema

Generalization and unification of netcdf conventions for coordinates and coordinate systems

Page 15: Coordinate Systems extension to NcML

<nc:coordinateVariable name="xc" shape="xc" type="float">
  <nc:attribute name="long_name" type="string" value="x cartesian coord"/>
  <nc:attribute name="units" type="string" value="m"/>
</nc:coordinateVariable>
<nc:coordinateVariable name="yc" shape="yc" type="float">
  <nc:attribute name="long_name" type="string" value="y cartesian coord"/>
  <nc:attribute name="units" type="string" value="m"/>
</nc:coordinateVariable>
<nc:coordinateVariable name="lev" shape="lev" type="float">
  <nc:attribute name="long_name" type="string" value="altitude levels"/>
  <nc:attribute name="units" type="string" value="km"/>
</nc:coordinateVariable>
<nc:coordinateVariable name="lon" shape="yc xc" type="float">
  <nc:attribute name="units" type="string" value="degrees_east"/>
</nc:coordinateVariable>
<nc:coordinateVariable name="lat" shape="yc xc" type="float">
  <nc:attribute name="units" type="string" value="degrees_north"/>
</nc:coordinateVariable>

Page 16: Coordinate Systems extension to NcML

<nc:coordinateSystem name="implicit">
  <nc:coordinateAxis ref="xc"/>
  <nc:coordinateAxis ref="yc"/>
  <nc:coordinateAxis ref="lev"/>
</nc:coordinateSystem>
<nc:coordinateSystem name="geo">
  <nc:coordinateAxis ref="lon"/>
  <nc:coordinateAxis ref="lat"/>
  <nc:coordinateAxis ref="lev"/>
</nc:coordinateSystem>

<nc:variable name="T" shape="lev yc xc" type="float" coordinateSystems="implicit geo">
  <nc:attribute name="long_name" type="string" value="temperature"/>
  <nc:attribute name="units" type="string" value="K"/>
  <nc:attribute name="coordinates" type="string" value="lat lon"/>
</nc:variable>
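
A reader of this extension has to follow the `coordinateSystems` attribute of a variable to the named systems and then to their axes. The Python sketch below shows one way to do that lookup. It assumes that both "implicit" and "geo" are `nc:coordinateSystem` elements (as the attribute value "implicit geo" suggests) and that the fragment sits inside an `nc:netcdf` root that declares the namespace. The function name `axes_for` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.ucar.edu/schemas/netcdf"}

# Fragment wrapped in a root element so it parses on its own (an assumption).
NCML = """
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf">
  <nc:coordinateSystem name="implicit">
    <nc:coordinateAxis ref="xc"/>
    <nc:coordinateAxis ref="yc"/>
    <nc:coordinateAxis ref="lev"/>
  </nc:coordinateSystem>
  <nc:coordinateSystem name="geo">
    <nc:coordinateAxis ref="lon"/>
    <nc:coordinateAxis ref="lat"/>
    <nc:coordinateAxis ref="lev"/>
  </nc:coordinateSystem>
  <nc:variable name="T" shape="lev yc xc" type="float" coordinateSystems="implicit geo"/>
</nc:netcdf>
"""


def axes_for(ncml_text, variable_name):
    """Return {coordinate system name: [axis refs]} for one variable."""
    root = ET.fromstring(ncml_text.strip())
    systems = {
        cs.get("name"): [axis.get("ref") for axis in cs.findall("nc:coordinateAxis", NS)]
        for cs in root.findall("nc:coordinateSystem", NS)
    }
    for var in root.findall("nc:variable", NS):
        if var.get("name") == variable_name:
            names = var.get("coordinateSystems", "").split()
            return {name: systems[name] for name in names}
    raise KeyError(variable_name)


t_axes = axes_for(NCML, "T")
print(t_axes["geo"])  # ['lon', 'lat', 'lev']
```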

Page 17: Coordinate Systems extension to NcML

<nc:variable name="ps" shape="lev yc xc" type="float" coordinateSystems="implicit geo">
  <nc:attribute name="long_name" type="string" value="pressure"/>
  <nc:attribute name="units" type="string" value="Pa"/>
  <nc:attribute name="coordinates" type="string" value="lat lon"/>
</nc:variable>
<nc:coordinateSystem name="pressure">
  <nc:coordinateAxis ref="lon"/>
  <nc:coordinateAxis ref="lat"/>
  <nc:coordinateAxis ref="pressure"/>
</nc:coordinateSystem>
<nc:variable name="T" shape="lev yc xc" type="float" coordinateSystems="implicit geo pressure">
  <nc:attribute name="long_name" type="string" value="temperature"/>
  <nc:attribute name="units" type="string" value="K"/>
  <nc:attribute name="coordinates" type="string" value="lat lon"/>
</nc:variable>

Page 18: Aggregation in NcML

• XML naturally suited to represent aggregation of netcdf data
• Rules for representing an aggregation hierarchy:
  – Allow netcdf nodes to contain other netcdf nodes
  – Factor out (i.e. in the parent netcdf node) all common structure between two nodes
  – Structure defined in a netcdf node overrides that defined in a parent netcdf node
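
The three rules above amount to an inheritance scheme: a nested node starts from its parent's structure and replaces whatever it redefines. The Python sketch below applies that reading to dimensions only. The function name `effective_dimensions`, the namespace declaration on the root, and the childless `file2.nc` node are illustrative assumptions, not part of the NcML schemas.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.ucar.edu/schemas/netcdf"}

NCML = """
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf">
  <nc:dimension name="lat" length="64"/>
  <nc:dimension name="lon" length="128"/>
  <nc:dimension name="time" length="6"/>
  <nc:netcdf uri="file1.nc">
    <nc:dimension name="time" length="3"/>
  </nc:netcdf>
  <nc:netcdf uri="file2.nc"/>
</nc:netcdf>
"""


def effective_dimensions(node, inherited=None):
    """Return {uri: {dimension name: length}} for every netcdf node that has a uri."""
    dims = dict(inherited or {})           # start from the structure factored out in the parent
    for d in node.findall("nc:dimension", NS):
        dims[d.get("name")] = int(d.get("length"))  # a local definition overrides the parent's
    resolved = {}
    if node.get("uri") is not None:
        resolved[node.get("uri")] = dims
    for child in node.findall("nc:netcdf", NS):     # netcdf nodes may contain netcdf nodes
        resolved.update(effective_dimensions(child, dims))
    return resolved


resolved = effective_dimensions(ET.fromstring(NCML.strip()))
print(resolved["file1.nc"])  # {'lat': 64, 'lon': 128, 'time': 3}
print(resolved["file2.nc"])  # {'lat': 64, 'lon': 128, 'time': 6}
```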

Page 19: NcML aggregation over existing coordinate (time)

<nc:netcdf>
  <nc:dimension name="lat" length="64"/>
  <nc:dimension name="lon" length="128"/>
  <nc:dimension name="time" length="6"/>
  <nc:variable name="temperature" shape="lat lon time"/>
  <nc:variable name="humidity" shape="lat lon time"/>
  <nc:netcdf uri="file1.nc">
    <nc:dimension name="time" length="3"/>
    <nc:coordinateVariable name="time" shape="time">
      <nc:values separator=" ">10 20 30</nc:values>
    </nc:coordinateVariable>
  </nc:netcdf>
  <nc:netcdf uri="file2.nc">
    <nc:dimension name="time" length="3"/>
    <nc:coordinateVariable name="time" shape="time">
      <nc:values separator=" ">40 50 60</nc:values>
    </nc:coordinateVariable>
  </nc:netcdf>
</nc:netcdf>
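
To make the aggregation concrete, the Python sketch below concatenates the `time` values of the nested files in document order and checks the result against the length declared in the parent node (3 + 3 = 6). The function name `aggregate_coordinate` is hypothetical, the namespace declaration is added so the snippet parses on its own, and treating the values as floats is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.ucar.edu/schemas/netcdf"}

NCML = """
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf">
  <nc:dimension name="time" length="6"/>
  <nc:netcdf uri="file1.nc">
    <nc:dimension name="time" length="3"/>
    <nc:coordinateVariable name="time" shape="time">
      <nc:values separator=" ">10 20 30</nc:values>
    </nc:coordinateVariable>
  </nc:netcdf>
  <nc:netcdf uri="file2.nc">
    <nc:dimension name="time" length="3"/>
    <nc:coordinateVariable name="time" shape="time">
      <nc:values separator=" ">40 50 60</nc:values>
    </nc:coordinateVariable>
  </nc:netcdf>
</nc:netcdf>
"""


def aggregate_coordinate(ncml_text, name):
    """Concatenate a coordinate's values across child netcdf nodes, in document order."""
    root = ET.fromstring(ncml_text.strip())
    declared = next(int(d.get("length"))
                    for d in root.findall("nc:dimension", NS)
                    if d.get("name") == name)
    values = []
    for child in root.findall("nc:netcdf", NS):
        for cv in child.findall("nc:coordinateVariable", NS):
            if cv.get("name") != name:
                continue
            element = cv.find("nc:values", NS)
            separator = element.get("separator", " ")
            values.extend(float(tok) for tok in element.text.split(separator) if tok)
    if len(values) != declared:
        raise ValueError(f"{name}: parent declares {declared} values, children supply {len(values)}")
    return values


time_axis = aggregate_coordinate(NCML, "time")
print(time_axis)  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```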

Page 20: NcML aggregation over variables

<nc:netcdf>
  <nc:dimension name="lat" length="64"/>
  <nc:dimension name="lon" length="128"/>

  <nc:netcdf uri="file1.nc">
    <nc:variable name="temperature" shape="lat lon"/>
  </nc:netcdf>
  <nc:netcdf uri="file2.nc">
    <nc:variable name="humidity" shape="lat lon"/>
  </nc:netcdf>
</nc:netcdf>
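
For this kind of aggregation, a client mainly needs to know which file supplies which variable. A minimal Python sketch of that lookup follows. The function name `variable_sources` is hypothetical and the namespace declaration is added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.ucar.edu/schemas/netcdf"}

NCML = """
<nc:netcdf xmlns:nc="http://www.ucar.edu/schemas/netcdf">
  <nc:dimension name="lat" length="64"/>
  <nc:dimension name="lon" length="128"/>
  <nc:netcdf uri="file1.nc">
    <nc:variable name="temperature" shape="lat lon"/>
  </nc:netcdf>
  <nc:netcdf uri="file2.nc">
    <nc:variable name="humidity" shape="lat lon"/>
  </nc:netcdf>
</nc:netcdf>
"""


def variable_sources(ncml_text):
    """Map each variable name to the uri of the nested netcdf node that defines it."""
    root = ET.fromstring(ncml_text.strip())
    return {var.get("name"): child.get("uri")
            for child in root.findall("nc:netcdf", NS)
            for var in child.findall("nc:variable", NS)}


sources = variable_sources(NCML)
print(sources["humidity"])  # file2.nc
```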

Page 21: NcML double aggregation

<nc:netcdf>
  <nc:dimension name="lat" length="64"/>
  <nc:dimension name="lon" length="128"/>
  <nc:dimension name="time" length="6"/>

  <nc:netcdf uri="temp/">
    <nc:variable name="temperature" shape="lat lon time"/>
    <nc:netcdf uri="file1.nc">
      <nc:dimension name="time" length="3"/>
      <nc:coordinateVariable name="time" shape="time">
        <nc:values separator=" ">10 20 30</nc:values>
      </nc:coordinateVariable>
    </nc:netcdf>
    <nc:netcdf uri="file2.nc">
      <nc:dimension name="time" length="3"/>
      <nc:coordinateVariable name="time" shape="time">
        <nc:values separator=" ">40 50 60</nc:values>
      </nc:coordinateVariable>
    </nc:netcdf>
  </nc:netcdf>

Page 22: NcML double aggregation (continued)

  <nc:netcdf uri="humid/">
    <nc:variable name="humidity" shape="lat lon time"/>
    <nc:netcdf uri="file1.nc">
      <nc:dimension name="time" length="3"/>
      <nc:coordinateVariable name="time" shape="time">
        <nc:values separator=" ">10 20 30</nc:values>
      </nc:coordinateVariable>
    </nc:netcdf>
    <nc:netcdf uri="file2.nc">
      <nc:dimension name="time" length="3"/>
      <nc:coordinateVariable name="time" shape="time">
        <nc:values separator=" ">40 50 60</nc:values>
      </nc:coordinateVariable>
    </nc:netcdf>
  </nc:netcdf>

</nc:netcdf>

Page 23: Other NcML planned development

• Subsetting of data
• Compute derived data
• Extensions for interoperability with openGIS and ISO standards:
  – Establish a bond between Atmospheric Research and Geo-spatial communities
  – Allows import of NcML data into GIS tools, export of GIS data in netcdf format

Page 24: Conclusions

• ESG is very active in the research and development of metadata schemas, services and technologies

• We are very interested in collaborating with other projects and institutions on the definition and adoption of metadata standards for the geosciences, and in working on interoperability technologies among standards