SeaDataNet: A Pan-European Infrastructure for Ocean and Marine Data Management
www.seadatanet.org
Catherine Maillard
First Training Session
Ostende, February 12-17, 2007
Introduction to Oceanographic Data Management
2
Data management
[Diagram: the data management chain between OBSERVING SYSTEMS and ANALYSIS & MODELLING SYSTEMS. Labelled steps: Data Compilation, Data Formatting, Quality control checks, Safeguarding, CDI (data indexing in local archiving system), Catalogues, Data sets aggregation, Data discovery, Product generation. User access through the user's web browser or an analysis program.]
3
1. Data Compilation
The data never go directly to the data centres. The data manager therefore needs to:
- Locate the data sets not yet archived
- Request and get a copy of the missing data sets from the source laboratory/scientist
- Check that the data sets are properly documented
4
COMPILATION 1.1: Locate the data sets which are not yet archived
- Search in the cruise report (CSR) catalogue
- Or in the observation system directory (EDIOS)
- Or in EDMED or EDMERP
- A data set should be identified in at least one of these catalogues
- In addition, maintain regular direct contacts
5
COMPILATION 1.2: Get a copy of the missing data sets from the source laboratory/scientist
- Request a copy of the missing data sets identified as not archived, in any format
- Emphasize the importance of:
  - long-term archiving to follow up the environmental changes
  - integration in long time series of data of the same type: the availability of global/regional/thematic databases depends on all contributions
- Facilitate the use of these databases
- Get and safeguard the electronic file
- Digitization is sometimes necessary (GODAR)
6
COMPILATION 1.3: The mandatory meta-data
- Check that the data sets are properly documented, with the mandatory fields described
- A minimum of meta-data should be included in the data files, e.g.:
  - Reference to cruise or observation system and source laboratory
  - Sensor type
  - Parameter names and units, etc.
- Complete the missing information by asking questions to the originator
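
The documentation check above can be sketched in a few lines of Python. This is only an illustration: the field names in MANDATORY_FIELDS, the dictionary layout and the cruise identifier are assumptions made for the example, not the SeaDataNet mandatory field list.

```python
# Hypothetical mandatory fields, chosen only for illustration.
MANDATORY_FIELDS = ("cruise_reference", "source_laboratory", "sensor_type", "parameters")


def missing_metadata(record):
    """Return the mandatory items that are absent or empty in a metadata record."""
    missing = []
    for field in MANDATORY_FIELDS:
        if not record.get(field):
            missing.append(field)
    # Every parameter needs a unit as well as a name.
    for name, unit in (record.get("parameters") or {}).items():
        if not unit:
            missing.append(f"unit of {name}")
    return missing


example = {
    "cruise_reference": "CRUISE-001",   # invented identifier
    "source_laboratory": "",            # left empty by the originator
    "sensor_type": "CTD",
    "parameters": {"TEMP": "degC", "PSAL": ""},
}

for item in missing_metadata(example):
    print("Ask the originator for:", item)
```

The returned list is meant to drive the last step of the slide: each missing item becomes a question to the originator rather than a value guessed by the data manager.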
7
2 - Data Reformatting
In general the original formats of the data files cannot be used in data management:
- Incomplete/non-standardized meta-data
- Incompatibility with the input format of QC and other processing
- Need for a unique archiving format for safeguarding the data sets of the same type
The data management format, archiving format and dissemination/exchange format(s) may be, but are not necessarily, the same.
8
2 - Different Data Formats used
- Archiving format: can be one of the actual exchange formats, or a local format designed according to rules to ensure sustainability
- Exchange/Dissemination format(s): joint projects and interoperability require common exchange format(s)
- Data Management/processing
9
2.1: General rules for sustainability of an archiving format
The archiving format should:
- be independent from the computer (and libraries): RDBS are not appropriate
- ensure that any isolated data include enough meta-data to be processed (e.g. location and date)
- be compatible with, and include at least, the mandatory fields (meta-data) requested for the agreed exchange format(s)
- include additional textual or standardized “history” or “comment” fields to prevent any loss of information
- provide similar structure and meta-data for different data types such as vertical profiles and time series
These rules are normally followed also for exchange formats.
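
As a rough illustration of these rules, the sketch below writes one station as plain text lines that carry their own location, date, data type and history. The layout (the `*STATION`, `*DATE`, `*HISTORY` and `*COLUMNS` keywords) is invented for this example and is not the Medatlas, ODV or any other agreed format.

```python
def format_record(station):
    """Serialise one station as self-describing text lines (hypothetical layout)."""
    lines = [
        f"*STATION {station['id']}",
        f"*DATE {station['date']}",
        f"*LAT {station['lat']:.4f} LON {station['lon']:.4f}",
        f"*TYPE {station['data_type']}",
    ]
    # Free-text history lines keep track of what was done to the data.
    lines += [f"*HISTORY {entry}" for entry in station.get("history", [])]
    columns = station["columns"]
    lines.append("*COLUMNS " + " ".join(f"{name}[{unit}]" for name, unit in columns.items()))
    for row in station["rows"]:
        lines.append(" ".join(f"{value:g}" for value in row))
    return "\n".join(lines)


profile = {
    "id": "P1",
    "date": "2007-02-12",
    "lat": 43.0,
    "lon": 5.0,
    "data_type": "vertical profile",
    "history": ["converted from originator spreadsheet"],
    "columns": {"PRES": "dbar", "TEMP": "degC"},
    "rows": [(10.0, 13.52), (20.0, 13.40)],
}

print(format_record(profile))
```

A time series would reuse the same header lines with a different `*TYPE` and a time column instead of pressure, which is the "similar structure for different data types" rule. Being plain text, the record can be read without any particular library.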
10
2.2 - SeaDataNet Data Transport Formats
Obligatory formats:
- NetCDF (binary) for gridded data and 3D observation data such as ADCP
- (Modified) ODV spreadsheet for other data types (vertical profiles and time series)
Optional format:
- ASCII Medatlas as standard exchange format for the Mediterranean and Black Sea community
BODC leads the task to modify the present ODV and NetCDF formats for SeaDataNet use (QC flags, parameter semantics, etc., and conformity with the international standards).
Formatting exercises are used to assess the coherence and compatibility of exchange formats.
11
2.3 - Processing Formats
For data management (QC, cataloguing, selection, extraction, visualisation) the data can be:
- In the archiving format
- In a relational database system (RDBS): the RDBS presently most used in the community are ORACLE and MySQL
Note: an interface is needed between the software input format and the local data management system
12
3 - Quality Checks
What they do:
- Detect missing mandatory information
- Detect errors made during the transfer or reformatting
- Detect remaining outliers
- Detect duplicates
- Attach a quality flag to each numerical value
What they don’t do:
- The preliminary data calibration and validation made by the expert scientists
- Modify the data points
General rule:
- The tools for data QC are not unique (e.g. ODV and other local systems), but the procedures are compatible
- Any QC of a data set should be reported to the originator to give feedback and ask questions
How they are performed:
- Next presentation by Sissy
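
The principle "attach a flag, never modify the data point" can be sketched as follows. The flag names and the temperature range are assumptions made for the example. A real exchange format defines its own flag scale, and real QC procedures use more tests than a single range check.

```python
from dataclasses import dataclass

# Hypothetical flag values for illustration only.
GOOD, SUSPECT, MISSING = "good", "suspect", "missing"


@dataclass(frozen=True)
class FlaggedValue:
    value: object   # the original measurement, never altered
    flag: str       # the quality flag attached by the check


def flag_values(values, lower, upper):
    """Attach a quality flag to each value with a simple range check."""
    flagged = []
    for v in values:
        if v is None:
            flag = MISSING
        elif lower <= v <= upper:
            flag = GOOD
        else:
            flag = SUSPECT   # outlier: flagged, not corrected or removed
        flagged.append(FlaggedValue(v, flag))
    return flagged


temperatures = [13.5, 13.4, 99.9, None, 13.1]
result = flag_values(temperatures, lower=-2.5, upper=40.0)   # assumed bounds
for item in result:
    print(item.value, item.flag)
```

The outlier 99.9 stays in the data set with its flag, so the originator can still be asked about it and a later user can decide whether to keep it.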
13
4 - Safeguarding
The QCed data sets should be safeguarded in a perennial system for further use:
- 2 copies
- Follow up the backup when the system or the technology changes
- It is recommended to use the common computer infrastructure of the institutes to make the backup regular and automatic
The original, non-standardized and non-QCed data sets should also be safeguarded, for possible further checks by the data manager or the source scientists, but they are not to be disseminated.
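
A minimal sketch of the "2 copies" rule, assuming the backup is a plain file copy verified by checksum. The directory names and the choice of SHA-256 are illustrative assumptions. An institute's backup infrastructure would normally handle this automatically.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum of a file, read in chunks so large data sets fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safeguard(source, backup_dirs):
    """Copy source into each backup directory and verify every copy."""
    reference = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        target = Path(directory) / Path(source).name
        shutil.copy2(source, target)
        if sha256_of(target) != reference:
            raise IOError(f"copy {target} differs from {source}")
        copies.append(target)
    return copies


# Demonstration in a temporary directory (stand-in for two backup systems).
with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    data_file = work / "profile_qc.txt"
    data_file.write_text("10 13.52\n20 13.40\n")
    first, second = work / "copy_a", work / "copy_b"
    first.mkdir()
    second.mkdir()
    copies = safeguard(data_file, [first, second])
    print(len(copies), "verified copies")
```

Re-running the checksum comparison after a system or technology change is one simple way to follow up the backup.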
14
5 - Data Dissemination and service
- National data sets according to the national rules
- Aggregated data sets with other data sources
- Export the data in a unique exchange format
- With the appropriate documentation on:
  - the format and codes
  - the QC performed on the data
  - the source of the data and the conditions of use (license)
15
5 - Data aggregation
Data aggregation represents a service and a product:
- To answer data requests related to a geographical area or other selection criteria, independently from the source
- Interrogate the local data centre
- Complete with other sources
- Eliminate the duplicates
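
The aggregation steps can be sketched as below, assuming each station is a dictionary with date, position and data type. The duplicate key (date, rounded position, data type) is a simplification chosen for the example. It only catches exact duplicates, not the harder cases discussed later.

```python
def aggregate(local, others, bbox):
    """Merge station lists inside bbox, keeping the local copy of exact duplicates."""
    lon_min, lon_max, lat_min, lat_max = bbox
    merged = {}
    # Local data first, so they win when another source holds the same station.
    for source in [local, *others]:
        for st in source:
            inside = lon_min <= st["lon"] <= lon_max and lat_min <= st["lat"] <= lat_max
            if not inside:
                continue
            key = (st["date"], round(st["lat"], 3), round(st["lon"], 3), st["data_type"])
            merged.setdefault(key, st)   # ignore later stations with the same key
    return list(merged.values())


local = [
    {"source": "local", "date": "2005-06-01", "lat": 43.0, "lon": 5.0, "data_type": "CTD"},
    {"source": "local", "date": "2005-06-02", "lat": 60.0, "lon": 5.0, "data_type": "CTD"},
]
regional = [
    {"source": "regional", "date": "2005-06-01", "lat": 43.0, "lon": 5.0, "data_type": "CTD"},
    {"source": "regional", "date": "2005-06-03", "lat": 42.5, "lon": 6.0, "data_type": "XBT"},
]

selection = aggregate(local, [regional], bbox=(0.0, 10.0, 40.0, 45.0))
for station in selection:
    print(station["source"], station["date"], station["data_type"])
```

Here the second local station falls outside the requested area, and the regional copy of the first station is dropped as a duplicate, leaving one local and one regional station.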
16
Other data sources
- The other data centres of the consortium
- Regional and project databases:
  - ICES: North-East Atlantic
  - Medatlas 2002, Mater 1996-1999 (but some data included in Medatlas), MFS/MOON for RT
- The World Ocean Atlas: delayed mode data
- The Coriolis/Argo Server: real time data
- The satellite data
17
The consortium data
The Common Data Index (CDI) shows what is presently available in the data centres. It will be continuously updated during the project.
http://www.sea-search.net/cdi/ (also from the SeaDataNet website)
During the development phase (2006-2007) of the interoperable system by the Technical Task Team, each data centre is interrogated separately to get access to the data. Several data centres provide on line tools for data search and access, including geographical selection and web services.
18
Regional Databases
ICES
- http://www.ices.dk/ocean/
- ICES format
Medatlas 2002
- www.ifremer.fr/medar + Cdrom + ftp site
- Developed in the frame of the EU Medar project (a regional DAR)
- Data selection tools according to various criteria, including geographical search, available on the Cdrom
- Also available on line from several partner data centres
- Medatlas format
19
World Ocean Atlas 2005
http://www.nodc.noaa.gov/OC5/WOD05/pr_wod05.html
- Developed by US/NODC, WDC Washington, Ocean Climate Laboratory, in the frame of the IOC/GODAR project with the contribution of the other data centres
- Data, mainly delayed mode data, are available through an on line selection tool or on DVD (on request)
- All the fields can be interrogated for data selection. The possibility to select countries by group (for example to get all but one's own country, or all but the SDN consortium) is commonly used.
20
Data Types in WOA 2005
Type of observations:
- Ocean Station Data (OSD) [Bottle, low resolution CTD/XCTD, plankton data]
- High Resolution CTD/XCTD (CTD)
- Expendable (XBT) and Mechanical (MBT) Bathythermographs
- Autonomous Pinniped Bathythermographs (APB)
- Profiling Floats (PFL)
- Drifting Buoys (DRB)
- Moored Buoys (MRB) [TAO, PIRATA, others]
- Undulating Oceanographic Recorder (UOR) [Towed CTD]
- Glider data (GLD)
- Surface-Only (SUR) [Bucket, Thermosalinograph]
Parameters:
- Pressure, temperature, salinity + 23 bio-geochemical parameters + biological taxa
21
WOA 2005 export format
- US-NODC format
- Codes and standards different from SeaDataNet
- Tools available to process the data:
  - US/NODC tools in Fortran, C, Java to read the data
  - SeaDataNet/Ifremer tool to transcribe from WOA to Medatlas by a converter (presently available in Unix only)
  - ODV can visualise the data directly in WOA format
22
Coriolis/Argo Server
http://www.coriolis.eu.org/cdc/
The Coriolis/Argo server is one of the two Argo Global Data Assembly Centres (GDAC):
- synchronized on a daily basis with the US GODAE Data Centre (Monterey)
- serving daily real time data (+ gridded analyses) from the following national DACs including: Australian, Canadian, Chinese, French, Indian, Korean, Japanese, UK, and US, contributors from Chile, Costa-Rica, Germany, Morocco, Mexico, Norway, Netherlands, Russia, Spain and data from the GTS (sources difficult to establish)
- On line selection tools allow in-situ data to be visualized and downloaded
23
Data Types in Coriolis/Argo
- Vertical profiles, mainly from: XBT or XCTD from research or opportunity vessels; Argo profiling floats; anchored buoys or moorings; drifting buoys
- Trajectory data, mainly from: drifting buoys; Argo floats; vessels equipped with a thermosalinograph (GOSUD server)
- Many data but few parameters: P, T, S essentially
- Under development: integration in the SeaDataNet system
24
Export Formats from Coriolis/Argo
- Argo NetCDF: widely used in operational oceanography, designed for TS profiles
- ASCII: (quasi) Medatlas
Important: for Medatlas format extraction, do the data selection data type by data type, to avoid having all types grouped in the same file.
25
Duplicates problem for data dissemination and products preparation
Even if the data are checked for duplicates at the national levels, remaining problems may exist:
- Data insufficiently documented and attributed to two different sources
- PTS files and the same station with other parameters
- RT and DM profiles
- Data declassified by the Navies with poor meta-data
- Data sets from the GTS with decimated and poorly documented profiles
26
What can be done?
- Selection country by country (however difficult for the RT)
- Visualising ship tracks and trajectories and superimposing the position maps of cruises made in the same region in the same period
- In case of duplicate data sets, evaluate which is the best set of observations, the most complete and documented, etc.
Can lead to a lot of manual work in the QC.
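
Part of this manual work can be prepared automatically by listing pairs of stations that are close in space and time, for example a decimated RT profile and its DM counterpart. The sketch below is one possible way to do it. The tolerances (0.05 degree, one hour) and the "more levels is better" preference are assumptions for the example, and the candidate pairs still need to be reviewed by the data manager.

```python
from datetime import datetime, timedelta
from itertools import combinations


def candidate_duplicates(stations, max_deg=0.05, max_dt=timedelta(hours=1)):
    """Return pairs of station ids that are close in position and time."""
    pairs = []
    for a, b in combinations(stations, 2):
        close_in_space = (abs(a["lat"] - b["lat"]) <= max_deg
                          and abs(a["lon"] - b["lon"]) <= max_deg)
        close_in_time = abs(a["time"] - b["time"]) <= max_dt
        if close_in_space and close_in_time:
            pairs.append((a["id"], b["id"]))
    return pairs


def preferred(a, b):
    """Prefer the station with more levels, a crude proxy for 'most complete'."""
    return a if a["n_levels"] >= b["n_levels"] else b


stations = [
    {"id": "DM-1", "lat": 36.00, "lon": 15.00, "time": datetime(2006, 3, 1, 12, 0), "n_levels": 500},
    {"id": "RT-1", "lat": 36.01, "lon": 15.02, "time": datetime(2006, 3, 1, 12, 20), "n_levels": 20},
    {"id": "DM-2", "lat": 38.00, "lon": 15.00, "time": datetime(2006, 3, 1, 12, 0), "n_levels": 300},
]

pairs = candidate_duplicates(stations)
print(pairs)
print(preferred(stations[0], stations[1])["id"])
```

The pairwise comparison is quadratic in the number of stations, so for a large aggregation it would be applied region by region or period by period.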
27
28
Template for TA web page
All the images in the directory « Template_images »
29
Education and Outreach pages
SDN-EDU.html
30
Conclusive remarks
- SeaDataNet is developing basic tools for implementing the data management activities in conformity with internationally agreed protocols.
- The NODC/DNA of the 40 TAP use either the common tools or the existing local systems, but they should be inter-comparable and compatible.
- The present infrastructure is not yet stabilized with regard to standards and available software, but the main functionalities are available to ensure the data circulation from the start of the project.
- Any new information, result or software is made immediately available on the website.
- It is important to develop a local page to connect, using the ENEA template.