INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES Mark Williams, University of Colorado.

Page 1:

INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES

Mark Williams, University of Colorado

Page 2:

The water information value ladder

Monitoring

Collation

Quality assurance

Aggregation

Analysis

Reporting

Forecasting

Distribution

Done poorly

Done poorly to moderately

Sometimes done well, by many groups, but could be vastly improved

>>> Increasing value >>>

Integration

Data >>> Information >>> Insight

Slide Courtesy CSIRO, BOM, WMO, Ilya, Dozier

Page 3:

Provenance and transparency

Page 4:

Page 5:

CZOs as platforms for research
Integrating satellite & ground measurements with modeling

CZO measurements provide the basis for advances in multiple Earth sciences

CZOs are DATA-RICH places to develop & test Earth system models

Page 6:

Challenges to CZO Data Management

Atmosphere

Biosphere

Hydrosphere

Lithosphere

Many Object & Data Types!
• Diverse media
• Sensor-based
  • Stationary
  • Mobile
  • Spectra/photos
• Sample-based
  • Sub-samples
  • Preparations/Fractions
• Numeric & Categorical

Hillslope Catchment Watershed

Minutes

Decades

Millennia

Eons

Page 7:

Sample Fractions for Soil Geochemistry
Adapting SESAR IGSN for CZO

Christina River CZO example

Flowchart (fraction hierarchy as far as it can be recovered):
• Ziplock (~500 g): bulk soil, horizon or depth increment
  • Al can (~70 g): for gamma counting 137Cs
  • DRY SIEVE 2 mm
    • >2 mm:
      (1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?); glass vial: plant detritus, milled
      (2) Remaining pebbles & rocks, hard grind; glass vial: pebbles, hard ground
    • <2 mm: glass vial: <2 mm fines, dry sieved
      • WET SIEVE, or DENSITY, or SETTLING (with or without sonication)
        The choice here is important. Do we want aggregates or not?
        • glass vial: sand + small detritus
        • glass vial: silt + clay

Analyses attached to the fractions in the flowchart: EA-IRMS, FTIR, SA, ICP-MS after Li-borate fusion, XRD (XRD? for one fraction), CEC; SPEX mill as a preparation step

Extractions: Dithionite-Citrate extraction, Na pyrophosphate extraction, Ammonium oxalate extraction

Page 8:

Overall Approach

• Do not reinvent the wheel! Build on
  – CUAHSI HIS, EarthChemDB, LTER, etc.
• Consistent data presentation on web
  – Metadata
  – Data values
• Central data system for data discovery
  – Harvested by SDSC (pull system)
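
As a rough illustration of what a pull system implies, the sketch below has a central harvester fetch each site's published file and keep a checksum so that unchanged files are skipped on the next run. The registry URLs, the `fetch` callable, and the checksum bookkeeping are assumptions made for this example; they do not describe the actual SDSC harvester.

```python
import hashlib


def harvest(registry, fetch, seen=None):
    """Pull each registered file and return those whose content changed.

    registry: dict mapping a CZO name to the URL of a published file (hypothetical).
    fetch:    callable taking a URL and returning the file content as bytes.
    seen:     dict of URL -> checksum remembered from the previous run.
    """
    seen = {} if seen is None else seen
    changed = {}
    for czo, url in registry.items():
        content = fetch(url)
        digest = hashlib.sha256(content).hexdigest()
        if seen.get(url) != digest:  # new or modified since the last run
            changed[czo] = content
            seen[url] = digest
    return changed, seen


# A dictionary stands in for the web so the sketch runs offline.
fake_web = {"https://example.org/boulder/water.csv": b"site,date,pH\n"}
registry = {"Boulder Creek": "https://example.org/boulder/water.csv"}

first, state = harvest(registry, fake_web.__getitem__)
second, state = harvest(registry, fake_web.__getitem__, state)
```

In this sketch the sites only publish; all scheduling and change detection sit with the central harvester, which is the point of a pull design.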

Page 9:

CZO data principles and policies

• Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site.

– Different philosophy than CUAHSI ODM
– Each CZO is master of its own data
  • We don’t care what goes on under the hood
  • Each site uses its own protocols, databases, etc.
  • Allows CZO to honor site legacy data

Page 10:

CZO data principles and policies

• Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted
• Metadata follows a prescribed format
  – Data managers just need rules to follow
• Easy to harvest by central portal
• Makes it simple at the site level so scientists comply
  – Addresses the chokepoint that is getting data/metadata from the scientists to data managers

Page 11:

Data Management Team

• David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS)
• Kerstin Lehnert, Columbia. PI on EarthChemDB
• Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS
• Mark Williams, CU-Boulder. PI Niwot Ridge LTER
• Anthony Aufdenkampe, co-I Christina River Basin CZO

Page 12:

Integrated CZO data system

Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data to emphasize the “critical zone” nature of our shared data sets

Page 13:

CZO Data Publication System

Data types: Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

Diagram components:
• Local CZO DBs, each with a web site (three shown)
• Standard CZO data display formats
• Standard CZO Services
• CZO Data Repository and Indexing (CZO Central)
• CZO Web-based Data Discovery System
• CZO Data Products
• CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling
• External cross-project registries: DataNet, NEON

Page 14:

Data Publication Process (for hydrologic time series)

Diagram components: CZO Display File, ODM, WaterML Service, OGC WFS Service, CZO Central Catalog, OGC CSW Service, Catalog Search Service, CZO Desktop

• Raw display file metadata is registered with the CZO data portal, to ensure that the original data is discoverable and downloadable.
• The WFS service is registered with the CZO data portal.
• The CZO Portal uses the OGC CSW (Catalog Services for the Web).
• The broader internet community accesses data using standard protocols.

Page 15:

CZO data interoperability: what does it mean?

Levels of interoperability:
• Find and download CZO resources: files and file collections, services, documents – organized by CZO thematic category and by type
• Data available in compatible semantics: ontologies, controlled vocabularies
• Data available via the same service interfaces (e.g. WFS, SOS) but different information models
• Compatibility at the level of domain information models and databases

Axis labels:
• Deeper integration: well-understood data with formal information models available via standard services
• Wider variety of data: different types of data collected by CZOs

System components:
• Data discovery portal
• Shared vocabularies and ontology management
• Service administration (CZO Central)
• CZO Desktop, others

Page 16:

Page 17:

Data disclaimer

Page 18:

Data Catalogue
• Biogeochemistry: Including: anything on C (Carbon), N (Nitrogen), P (Phosphorus) nutrients, microbes
• Climatology/Meteorology: Including: Met tower, temps, snow
• Ecology/Biology: Including: microbial, land use
• Geology/Chronology: Including: geologic, descriptions of rocks-mineralogy, CRN ages/rates
• Geomorphology: Including: topography, chronological data, sediment flux, fracture space
• Geophysics: Including: seismic refraction etc.
• Geospatial: Including: GIS/RS, imagery, geologic map, Gordon Gulch and GLV cameras

Page 19:

Water Chemistry
• Header group (/doc):
  - Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments
• Header group, column information
  – COL1. label=ValueAttribute, value=site
  – COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
  – COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=, ,methods=method1, etc.
• Header group, column (series) defaults that apply to all columns (e.g. site below)
• Data (/data)
• GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,,
• GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,,
• Automatically harvested using WaterML and EML
• ASCII format, metadata and comma-delimited data
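
To make the data section of the display file concrete, here is a minimal sketch of reading comma-delimited rows like the GREENLAKE4 examples above. It assumes the first two columns are the site and the date (per COL1 and COL2) and that an empty field means a missing value; the function name and the shortened sample row are illustrative, not part of the CZO specification.

```python
import csv
import io


def parse_data_rows(text):
    """Parse comma-delimited data rows; empty fields become None (missing)."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        site, date = fields[0], fields[1]
        # Remaining columns are numeric values; blank means "no measurement".
        values = [float(f) if f.strip() else None for f in fields[2:]]
        rows.append({"site": site, "date": date, "values": values})
    return rows


# Shortened version of the first sample row from the slide.
sample = "GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77\n"
rows = parse_data_rows(sample)
```

The date is kept as a string here because the sample values (e.g. 820311) do not obviously match the declared "YYYYMMDD hh:mm" format; a real reader would take the format from the header group.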

Page 20:

CZO Data Management Web Administration Interface

CZO data managers use this web-based system to register display files, edit service metadata, initiate data retrieval, validate the data against shared vocabularies, and update hydrologic time series services

The administration system will be extended to geochemical samples and other data

http://central.criticalzone.org

Page 21:

Services edited and validated by CZO data managers

Data managers control how their data is annotated.

Ingestion of display files is triggered on the server by the data manager.

Display file ingestion log

Editable service definitions and management interface for each CZO data service

Page 22:

CZO Central Catalog Statistics, March 24, 2011

(time series services only)

CZO Service        Sites   Variables   Values
Jemez River           14           1     154854
Boulder Creek          1          31      11834
Santa Catalina         5           6      59222
Luquillo               8          16     831098
Southern Sierra        8           4    1226330
Shale Hills            1          18     848624
Christina River       31           5    6870150
Total:                68          81   10002112

Page 23:

New Development: Central CZO Data Discovery Portal

Registered data are organized by CZO thematic categories

Page 24:

Display files from CZO web sites are registered to the data discovery portal automatically

In addition, display files of known types are expressed as data services, which are also registered in the portal

The portal is CSW-compliant (CSW=Catalog Services for the Web): can be federated with other catalogs including data.gov

Supports search by location, resource type, thematic category, keywords, plus full-text abstract search

Federation with CUAHSI HydroCatalog, to allow search of hydrologic data from ~70 networks
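
Because the portal is described as CSW-compliant, a keyword search can in principle be expressed as a standard CSW request. The sketch below only builds such a request URL; it assumes a CSW 2.0.2 server that accepts key-value GET requests with a CQL text constraint, and the endpoint `https://example.org/csw` is a placeholder, not the CZO portal's address.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def csw_getrecords_url(endpoint, keyword, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL for a full-text keyword search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # AnyText is the CSW queryable covering all text of a record.
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


url = csw_getrecords_url("https://example.org/csw", "snow")
query = parse_qs(urlsplit(url).query)  # decoded parameters, for inspection
```

Whether a given catalog accepts this exact form depends on the server; many also (or only) accept the XML POST encoding of the same request.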

Page 25:

Shared Vocabulary

Data types: Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

Diagram components:
• Local CZO DBs, each with a web site (three shown)
• Shared Vocabulary
• Standard CZO data display formats
• CZO Data Repository and Indexing (CZO Central)
• CZO Web-based Data Discovery System
• CZO Data Products
• CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling
• External cross-project registries: DataNet

Page 26:

CZO Shared Vocabulary System

Purpose: To promote the consistent use of terminology.

http://sv.critialzone.org

Builds on CUAHSI HIS

Page 27:

Data Managers and SV

Diagram components: SV Database, Data Managers, CSV Data File, Observation Database, Local CZO Website, XML SV List, Unknown Term Email, Request Term Web Page (step ❸)

Page 28:

Preferred vocabularies. Moderators to be designated by CZO with expertise in each category
• Variable names (extended from CUAHSI HIS)
• Units (extended from CUAHSI HIS) (e.g. m, g/L)
• Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output)
• Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
• Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
• Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
• Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
• KEY: CZO expands ODM controlled vocabularies to a larger audience using “preferred vocabularies”
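
The idea of checking data against preferred vocabularies can be sketched as follows. The vocabulary contents are taken from the examples in the list above; the field names, the case-insensitive matching, and the `snowpack` term are assumptions for illustration, and the real system's vocabularies are far larger.

```python
def find_unknown_terms(record, vocabularies):
    """Return the fields of a record whose term is not in the vocabulary.

    Fields without a vocabulary are ignored; matching is case-insensitive
    (an assumption of this sketch).
    """
    unknown = {}
    for field, term in record.items():
        allowed = vocabularies.get(field)
        if allowed is not None and term.lower() not in allowed:
            unknown[field] = term
    return unknown


vocabularies = {
    "units": {"m", "g/l"},
    "sample_type": {"stream water", "ground water", "rock", "soil"},
    "value_type": {"field observation", "derived value", "model output"},
}

record = {
    "units": "g/L",
    "sample_type": "snowpack",  # hypothetical term not yet in the vocabulary
    "value_type": "Field observation",
}

unknown = find_unknown_terms(record, vocabularies)
```

An unknown term would then go to a moderator for review rather than being silently accepted or rejected.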

Page 29:

Methods

1. Major problem for metadata

2. Solution: lookup table that is part of the controlled vocabulary

3. Three parts: sample collection, sample preparation, analytical procedure

4. Up and running, needs moderators
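
A methods lookup table with the three parts named above might look like the sketch below. The code `method1` echoes the placeholder used in the display file header; the entry's contents, the class, and the function are invented for illustration and do not reproduce the actual CZO table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One lookup-table entry with the three parts of a method."""
    collection: str   # how the sample was collected
    preparation: str  # how it was prepared
    analysis: str     # analytical procedure


# Hypothetical lookup table keyed by the method code cited in a data file.
METHODS = {
    "method1": Method(
        collection="bulk soil by depth increment",
        preparation="dry sieve 2 mm",
        analysis="ICP-MS after Li-borate fusion",
    ),
}


def describe(method_code):
    """Expand a method code into a readable description."""
    m = METHODS[method_code]
    return f"{m.collection}; {m.preparation}; {m.analysis}"
```

A data file then only needs to carry the short code, and the full description lives once in the controlled vocabulary.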

Page 30:

CZO Spatial Data

Data types: Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

Diagram components:
• Local CZO DBs, each with a web site (three shown)
• Spatial Data
• Standard CZO data display formats
• Standard CZO Services
• CZO Data Repository and Indexing (CZO Central)
• CZO Web-based Data Discovery System
• CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling

Page 31:

Metadata and Spatial View

Spatial View

• Metadata
  - Multi File control
• Spatial Extent
  - Ex: LiDAR flights, transects, etc.
  - Point data (collected at particular location).
  - Uses Google Maps API
  - KML functionality

Guo lab, UC Merced

Page 32:

Geochemical Samples (based on CZEN)

Diagram components:
• Local CZO DBs, each with a web site (three shown)
• Geochemical samples
• Standard CZO data display formats
• Geochemical web services, EarthChemDB
• Depth-resolved geochemistry
• CZO Web-based Geochemical DB
• EarthChem Data Engine & Portal
• CZO Desktop Applications: CZO Desktop, Matlab, R, Excel, ArcGIS, Modeling

Page 33:

CZO Chemistry Database Conceptual Model – (CZOCHEMDB)

Penn State lead

Main data (diagram boxes): Location (Watershed); Sampling Site (Soil / Water); Sample (Layer/Depth); Sub-samples 1, 2, … n; Preparat./Treatment; Analysis (Chemical, Phys. Minr, Others); Data

Metadata (diagram boxes): Loc_info/Climate; Methods; Sources; Precision; Var-Lookup/Unit

Other labels in the diagram: Geo-Info; Publication; Project; SMPL Time Series; Landuse/Veg.; Lab-Info; Person contributor; Preparation/Treatment; Sample; Country/State; Lab Analysis; Sub-Sample

Page 34:

Progress

Database is accessible at www.czo.psu.edu

PSU CZO students and post-docs have used template for data entry

Susan Melzar (Colorado State) has used template and data has been entered into database

Published data from Muhs et al. (2001), Harden (1987), White et al. (2008)

Current version contains 1391 records, representing 17,604 data values

Ran webinar August 24th to show database capabilities and usage of data entry template

15 participated, with representation from all 6 CZOs

User guide is in progress

Page 35:

Integration with EarthChemDB

Diagram components: datasets (original data & derived products); GCDM DB; GfG Data Entry; User Submission; External Databases; USGS; NAVDAT; GEOROC; Topical Data Collections

Kerstin Lehnert

Page 36:

EarthChem Portal

Partner databases shown, each sending XML: PetDB, USGS, GEOROC, NAVDAT, Others

Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas.

Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database. Similar to our ODM hydrology portal.

Page 37:

INTERNATIONAL GEOSAMPLE NUMBER

• Purpose: Unique identification for samples and related sampling features in the Earth Sciences
  – To allow unambiguous referencing of data to samples in publications and data systems
  – To allow tracking samples through repositories & labs
  – To allow integration of distributed data for samples

Example (D3-1):

Name   Location           Publication
D3-1   SEIR               ANDERSON, 1980
D3-1   North Fiji Basin   EISSEN 1994
D3-1   Shimada Smt        GRAHAM 1988
D3-1   Gorda Ridge        CLAGUE 1984
3-1    Lamont Smts        BATIZA 1982

Page 38:

Geoinformatics for Geochemistry

Sample hierarchy shown in the diagram (parent/child relations):
• Core
  • Core Section 1, Core Section 2, Core Section 3
    • Samples 1, 2, 3 within the core sections
      • Rock powder, Mineral conc., Leachate, Fossil separate, Microprobe mount

IGSNs shown in the diagram: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB

Page 39:

Roles named in the diagram: Registrar; Managing Agent; Registration Agents; Registrants

Organizations shown: IGSN International Organization; SESAR; Near Space Observatory (invented example); ExoPlanet (invented example); CZO; Geoscience Australia; USGS; IEDA; ICDP

Also shown: Repository; Analytical Lab; Investigator

Page 40:

ADAPTING IGSN for CZO
• Register any type of sample: pedons, hand specimens, mineral concentrates, etc. …
• Register any type of material: soil, rock, sediment, fluid, gas, bio ….
• Register ‘sample-related features’: sites, wells, cores, dredges …
• Register relations (parent – children): e.g. site → pedon → mineral
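
Registering parent-child relations can be pictured with the small in-memory registry below, which follows the site, pedon, mineral example. The class, its methods, and the identifiers are hypothetical; this is not the SESAR registration API, and the IDs are not real IGSNs.

```python
class SampleRegistry:
    """Toy registry: unique sample IDs with an optional parent link."""

    def __init__(self):
        self._kind = {}    # sample_id -> type of object (site, pedon, ...)
        self._parent = {}  # sample_id -> parent sample_id or None

    def register(self, sample_id, kind, parent=None):
        if sample_id in self._kind:
            raise ValueError(f"{sample_id} is already registered")
        if parent is not None and parent not in self._kind:
            raise KeyError(f"unknown parent {parent}")
        self._kind[sample_id] = kind
        self._parent[sample_id] = parent

    def lineage(self, sample_id):
        """Walk from a sample up to its top-level parent."""
        chain = []
        while sample_id is not None:
            chain.append((sample_id, self._kind[sample_id]))
            sample_id = self._parent[sample_id]
        return chain


reg = SampleRegistry()
reg.register("XXX000001", "site")
reg.register("XXX000002", "pedon", parent="XXX000001")
reg.register("XXX000003", "mineral concentrate", parent="XXX000002")
```

Because every object has its own identifier, a measurement on the mineral concentrate can be traced back to the pedon and the site it came from.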

Page 41:

Exploring A More General Data Model: ODM 2.0

• To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML
• Better support for samples and unique identifiers (IGSN/SESAR)
• Extensibility to table attributes
• Better annotation and provenance
• Enable integrated web service based publication of a broader class of CZO data

Page 42:

ODM 2.0 – Field Sensor Extension to support field sensor deployments and in situ observations
• Sensor deployment details
• Attributes of sensor
• Data series from sensor

Page 43:

ODM 2.0 – Provenance and Annotations Extensions

• Better support for storing provenance of observational data

Page 44:

General Extensibility

Provides the capability to record information (add fields) in tables that was not anticipated a priori
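
One common way to allow fields that were not anticipated is a side table of property/value pairs keyed to the main record. The sketch below shows that pattern with SQLite; it is an assumed illustration of the general idea, not the ODM 2.0 schema, and all table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Extension table: one row per extra property of a sample.
CREATE TABLE sample_extensions (
    sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
    property  TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (sample_id, property)
);
""")

conn.execute("INSERT INTO samples (sample_id, name) VALUES (1, 'pedon A')")
# A property nobody planned for when the samples table was designed.
conn.execute("INSERT INTO sample_extensions VALUES (1, 'sonicated', 'yes')")


def extensions(conn, sample_id):
    """Return the extra properties recorded for one sample as a dict."""
    rows = conn.execute(
        "SELECT property, value FROM sample_extensions WHERE sample_id = ?",
        (sample_id,),
    )
    return dict(rows.fetchall())
```

The trade-off of this pattern is that extension values are untyped text, so validation has to happen in the application or through a vocabulary of allowed properties.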

Page 45:

Diagram components:
• CZchemDB
• CZO-Central GeoChemDB [ODM 2.0]
• CZO-Services
• CZO Data Display Format
• CZO Web Discovery
• CZO Desktop
• GeoChemDB Search
• Web-based User Access
• Geochemical database
• EarthChem Portal (with USGS, NAVDAT, GEOROC)
• EarthChemXML
• Geochem Services (IEDA)
• GfG Data Validation & Ingest
• IEDA Long-Term Archiving Service
• IEDA Data Publication Service (DataCite)
• SESAR Sample Registration
• Other client systems

Page 46:

Where we are today

• Each site has a data manager
• Data sets are posted to the web
  – consistent metadata and ASCII format in progress
• We’ve prototyped harvesting data and posting to a central data portal
• Shared vocabulary system in place
• Developed protocol for unique sample ID
• Partnering with EarthChemDB
• Expanding ODM to become more general
• Way beyond what I thought possible

Page 47:

Work plan for next two years
• Extending the CZO data publication model to geochemical and GIS data; then to other types of data – towards deeper interoperability
• Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services)
  – Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation
  – Upgrade to WaterML 2 once approved as international standard (~Q3, 2011)
• Registering more hydrologic time series data via CZO Central
  – Regularly harvesting registered files and updating CZO services; keeping provenance information
• Enhancing parameter-based search across CZOs, with a shared parameter ontology
• Making CZO central data system more robust
  – Currently a single server with 24/7 monitoring; need redundant setup
• Enhancing role of Data Managers