DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS
Kerstin Lehnert
Data from Samples
Distributed data acquisition
  Different labs/researchers analyze the same sample or subsamples of it.
Distributed data publication
  Different data for the same sample are published in different papers.
Distributed data archiving
  Data for the same sample are kept in different data systems.
Integrated data access is required to maximize utility.
Geochemical Data
diverse
  hundreds of parameters
  thousands of materials
  vary with space and time over a range of more than ten orders of magnitude
complex
  mostly sample-based with complex relations among samples & subsamples
  distributed data acquisition (one sample analyzed in different labs by different researchers at different times)
  idiosyncratic data acquisition methods
Geoinformatics for Geochemistry
DATABASES
  thematic geochemical databases (PetDB, SedDB, VentDB)
DATA REPOSITORY
  Geochemical Resource Library
REGISTRIES
  System for Earth Sample Registration (SESAR)
  IEDA Data Publication Agent of the STD-DOI system (DataCite®)
  GeoPass: single sign-on authentication system
DATA ACCESS & ANALYSIS TOOLS
  GfG user interfaces
  EarthChem Data Engine (Portal)
GfG Architecture
[Diagram. Labels: EarthChem XML DB; Metadata catalog; datasets (original data & derived products); GCDM DB; External Databases (USGS, NAVDAT, GEOROC); EarthChem Portal; GfG Data Entry; User Submission; Topical Data Collections; Geochemical Resource Library.]
GeoChemical Data Model
[Diagram. Labels: observed value; publication; data source; method/DQ; sample; feature of interest; collection, geospatial; analysis; material; preparation, obs. point.]
Metadata
Geospatial: geographical coordinates, geographical names
Collection: sampling technique, field program
Description & Age: classification, texture, alteration, age
Data Quality: technique, instrument, laboratory, precision, reference material measurements, correction procedures
Standards for Data Access & Integration
WMS, WFS: for visualization tools
OAI-PMH: for joint data inventories
EarthChemML: for integration across geochemical data systems
For interoperability with other systems
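As an illustration of how OAI-PMH supports a joint data inventory, the sketch below builds a ListRecords harvesting request. The verb and the argument names (metadataPrefix, from, set) come from the OAI-PMH protocol; the endpoint URL and the helper name list_records_url are placeholders invented for this example and do not appear in the slides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a harvester."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL, from_date="2010-01-01")
print(url)
```

A harvester would fetch this URL from each participating system and merge the returned metadata records into the shared inventory.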
IEDA System-wide Inventory
[Diagram. Labels: Inventory; Expedition Metadata; Reference Metadata; Dataset Metadata; Geospatial Metadata; RSS feed; MGDS; SESAR; EarthChem; GRL; Geochem DBs; Object Registration; Object Metadata; Chemical Data; Cruise Info; DOI Registration.]
EarthChem Portal
[Diagram: partner databases (PetDB, USGS, GEOROC, NAVDAT, others) each send XML to the EarthChem Data Engine database; the EarthChem Data Engine provides search & visualization.]
Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas. Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal database.
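The data flow described above (partners send XML, the portal loads it into one database, queries run against that database) can be sketched as follows. This is a minimal illustration only: the XML element and attribute names, the table layout, the data values, and the function names are all invented here and are not EarthChemML or the actual portal schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical partner payloads with made-up values; this is NOT EarthChemML.
PARTNER_XML = {
    "PetDB": "<records><record sample='S1' parameter='SiO2' value='50.1' unit='wt%'/></records>",
    "GEOROC": (
        "<records>"
        "<record sample='S2' parameter='SiO2' value='48.7' unit='wt%'/>"
        "<record sample='S2' parameter='MgO' value='8.2' unit='wt%'/>"
        "</records>"
    ),
}


def load_portal_db(payloads):
    """Parse each partner's XML and load it into one central table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE obs (source TEXT, sample TEXT, parameter TEXT, value REAL, unit TEXT)"
    )
    for source, xml_text in payloads.items():
        root = ET.fromstring(xml_text)
        for rec in root.iter("record"):
            db.execute(
                "INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
                (source, rec.get("sample"), rec.get("parameter"),
                 float(rec.get("value")), rec.get("unit")),
            )
    return db


def search(db, parameter):
    """Query the central database only; partner systems are not contacted."""
    cur = db.execute(
        "SELECT source, sample, value FROM obs WHERE parameter = ? ORDER BY source, sample",
        (parameter,),
    )
    return cur.fetchall()


portal_db = load_portal_db(PARTNER_XML)
sio2 = search(portal_db, "SiO2")
print(sio2)
```

The point of the design is that a search never has to reach the partner systems at query time; it only sees what has already been loaded centrally.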
Access Levels
EarthChemML
EarthChem Repository: user submission
need tools that are easy to use and support the data flow from lab to publication
  ideally, represent ‘pipelines’ for data capture early in the data acquisition process
  tools need to include data validation and DQC procedures
offer citable data publication
  need data policies
  IEDA data publication service
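To make the validation requirement concrete, here is a minimal sketch of a check a submission tool might run on each row before accepting it. The field names, the rules, and the sample values are assumptions chosen for illustration; the slides do not specify the actual validation or DQC procedures.

```python
# Hypothetical required fields for one submitted observation.
REQUIRED = ("sample_name", "latitude", "longitude", "parameter", "value", "method")


def validate_row(row):
    """Return a list of problems found in one submitted row (empty if none)."""
    problems = [f"missing {name}" for name in REQUIRED if row.get(name) in (None, "")]
    lat, lon = row.get("latitude"), row.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    value = row.get("value")
    if isinstance(value, (int, float)) and value < 0:
        problems.append("negative concentration")
    return problems


# Made-up rows, not real data.
good = {"sample_name": "TEST-1", "latitude": 10.0, "longitude": -20.0,
        "parameter": "SiO2", "value": 50.1, "method": "XRF"}
bad = dict(good, latitude=141.0, method="")

print(validate_row(good))
print(validate_row(bad))
```

Running checks like these at capture time, rather than at publication, is what the 'pipeline' idea above is aiming at.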
STD-DOIs
The STD-DOI metadata are mainly Dublin Core elements, plus data specific elements.
The metadata are transmitted to the National Library via web service (HTTP/SOAP) and incorporated into the library catalogue.
The metadata may contain references to other objects (DOI, IGSN, ...):
  Element <RelatedIdentifier>: isCited, isParent, isChild, isDuplicate, …
STD-DOIs
The element <relatedIdentifier> can be used to point to other electronic objects:
  Point to the literature where the data set is interpreted.
  Point to samples from which the data were derived.
  Point to other datasets that belong to the same collection of datasets.
These links can be used by machines (e.g. data portals) to make search suggestions and thus aid discovery of data, literature and samples, or other added value services.
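The linking idea above can be sketched as a small metadata record that carries relatedIdentifier elements, plus a helper a portal could use to follow them. Only the element name and the relation names (isCited, isChild) come from the slides. The surrounding record layout, the attribute names, the identifiers, and the choice of which relation applies to which link are assumptions made for illustration, not the STD-DOI schema.

```python
import xml.etree.ElementTree as ET


def build_metadata(doi, title, related):
    """Build a toy metadata record; `related` holds (id_type, relation, value) tuples."""
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    ET.SubElement(resource, "title").text = title
    for id_type, relation, value in related:
        link = ET.SubElement(
            resource, "relatedIdentifier",
            relatedIdentifierType=id_type, relationType=relation,
        )
        link.text = value
    return resource


def related_of_type(resource, id_type):
    """Collect the linked identifiers of one type, e.g. all linked samples."""
    return [
        el.text for el in resource.iter("relatedIdentifier")
        if el.get("relatedIdentifierType") == id_type
    ]


# Placeholder identifiers; the relation assigned to each link is arbitrary.
record = build_metadata(
    "10.1234/example-dataset",
    "Example geochemical dataset",
    [("DOI", "isCited", "10.1234/example-paper"), ("IGSN", "isChild", "XXX000120")],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
print(related_of_type(record, "IGSN"))
```

A data portal could use something like related_of_type to suggest the samples or papers connected to a dataset that a user is viewing.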
STD-DOI System Architecture
Data DOIs
Information Discovery
Link to publication
Citation of data
IGSN points to sample
The International GeoSample Number
Ambiguous Sample Naming
Examples from the PetDB Database
Name   Location          Publication     Cruise
D3-1   SEIR              ANDERSON, 1980  VM3301 (Vema)
D3-1   North Fiji Basin  EISSEN 1994     Starmer 1 (Nadir)
D3-1   Shimada Smt       GRAHAM 1988     S1-79 (Sea Sounder)
D3-1   Gorda Ridge       CLAGUE 1984     KK2-83NP (Kana Keoki)
3-1    Lamont Smts       BATIZA 1982     RISE III (New Horizon)
Sample names are duplicated.
Sample names are modified or changed.
D3           Engel 1964
D-3          Scheidegger 1981, Schilling 1971
PD3          Tatsumoto 1965, 1966
PD-3         Hedge 1970, Muehlenbach 1972
PV D-3       Engel 1965
AMPH3D       Pineau 1976
AMPH-D3      MacDougall 1986
AMPH D-3     Sun 1980, Schilling 1975
AMPH 3-PD-3  Hart 1971
S-10         Subbarao 1972
Dredge sample 3, Amphitrite Cruise 1963/4
Provides & manages unique identifiers for samples
  IGSN - International Geo Sample Number
  Assigned upon registration of sample metadata
Catalogs & archives sample metadata
  Access to sample metadata via web site & web services
  Long-term preservation of metadata
  Link to sample archives
Facilitates links to data
  IGSN will be incorporated into persistent resolvable GUIDs
IGSN:SIO8JH3M4
International GeoSample Number
A Global Unique Identifier for Earth Samples
Strict syntax (9 characters, alphanumeric)
  First three characters are a unique user code (registered with SESAR)
  Last 6 characters are random numbers + letters
  Allows 2,176,782,336 sample identifiers per registrant
Does not replace personal or institutional names.
Applied to samples & sub-samples
  system tracks relations
www.geosamples.org
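The syntax rules above can be checked mechanically. The sketch below splits an IGSN into user code and sample part and reproduces the per-registrant capacity: 6 positions over 36 symbols (10 digits plus 26 letters) give 36^6 = 2,176,782,336. It assumes uppercase letters only and an optional "IGSN:" prefix as written on the slide; the function name parse_igsn is invented for this example and is not a SESAR API.

```python
import re
import string

# 10 digits + 26 uppercase letters = 36 symbols (uppercase-only is an assumption).
ALPHABET = string.digits + string.ascii_uppercase
IGSN_PATTERN = re.compile(r"[A-Z0-9]{9}")


def parse_igsn(text):
    """Split an IGSN into (user_code, sample_part); raise ValueError if malformed."""
    code = text.strip()
    if code.startswith("IGSN:"):
        code = code[len("IGSN:"):]
    if not IGSN_PATTERN.fullmatch(code):
        raise ValueError(f"not a valid 9-character IGSN: {text!r}")
    return code[:3], code[3:]


# Number of distinct 6-character sample parts available to one user code.
capacity_per_registrant = len(ALPHABET) ** 6

print(parse_igsn("IGSN:SIO8JH3M4"))
print(f"{capacity_per_registrant:,}")
```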
Name space
[Diagram: sample hierarchy with parent/child relations. Labels: Core; Core Section 1, Core Section 2, Core Section 3; Sample 1, Sample 2, Sample 3 under the sections; sub-samples (rock powder, mineral conc., leachate, fossil separate, microprobe mount). Each object carries its own IGSN: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB.]
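One way to picture how a registry tracks relations between samples and sub-samples is a simple tree in which every object has its own IGSN and a pointer to its parent. The class below is a hypothetical sketch, not SESAR's data model, and the pairing of the slide's example IGSNs with particular objects is arbitrary.

```python
class Sample:
    """A registered object (core, section, sample, or sub-sample) with its own IGSN."""

    def __init__(self, igsn, name, parent=None):
        self.igsn = igsn
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, igsn, name):
        """Register a sub-sample and record the parent/child relation both ways."""
        child = Sample(igsn, name, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Return the IGSNs from the top-level object down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.igsn)
            node = node.parent
        return chain[::-1]


# Arbitrary pairing of example IGSNs with objects, for illustration only.
core = Sample("XXX000120", "Core")
section = core.add_child("XXX0065B3", "Core Section 1")
sample = section.add_child("XXX9K23G6", "Sample 1")
powder = sample.add_child("XXX07ST4K", "Rock powder")

print(powder.lineage())
```

With the parent links in place, data measured on a rock powder can be traced back to the sample, section, and core it came from.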
Sample Types
“Sampling events” such as holes, cores, dredges, stratigraphic sections
“Individual samples”: specimens such as rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.
“Sub-samples” of any of the above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.
Sample Registration
Spreadsheet forms for batch loading
Interoperability (web services)
SESAR Web Site
Implementation Challenges
Diversity of users
  Large sampling campaigns (IODP, ICDP, ECS)
  Repositories
  Data systems
  Individual investigators
Diversity of sample types
Integration into existing policies, procedures, data systems
International scope
Connectivity in the field
Solutions
Schema improvements
Web-service based registration from client data systems
Distributed system of registration nodes (Trusted Agents)
Handle service for IGSNs (persistent, resolvable)
  http://dx.doi.org/18.2539/IGSN.SIO001234
Tools to facilitate registration
  iSESAR (registration via iPhone)
  eCollections (personal sample management)
  webCollections (hosting services for repositories)
IGSN International Consortium