Data Management Overview - Xiamen University
Maciej Telszewski
IOCCP Director and Coordinator of GOOS Biogeochemistry Panel
With slides from: Meike Becker and Jay Pearlman
Data Management Overview
Institute of Oceanology of Polish Academy of Sciences, ul. Powstańców Warszawy 55, 81-712 Sopot, Poland
Phone: +48 58 731 16 10 / Fax: +48 58 551 21 30, www.ioccp.org
Biogeochemistry Panel
A broad schematic of a full value chain in sustained ocean observing programs
• System requirements: Applications/products, knowledge challenges, phenomena, EOVs, network design
• Observations: In situ and satellite observations
• Coordination: Global networks and global approaches
• Data systems: Assembly and dissemination
• Understanding: Scientific analysis, indicators
• Predicting / Modeling: Ocean forecast systems
• Assessing: Policy-relevant scientific assessments
• Services [informing]: Early warning, forecasts, short and long term direct advice
• Societal benefit from actionable information: Policy, public and private management and individual decisions
www.ioccp.org/foo
1st International GO2NE Summer School
2-8 September 2019, Xiamen, China
Always! Everything! With backup!
And keep it tidy!
So that you (and others) can still find things ten years from now
How to store data-related documentation?
In a way that nothing gets lost
and that can be understood by others
• Lab-books
• Cruise reports
• Excel spreadsheets or txt files with the data (to be submitted)
• netCDF files
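One way to check that a backup really holds everything is to compare checksums of the original files against the copies. The sketch below is only an illustration using the Python standard library; the function names and the folder layout are assumptions, not part of the lecture material.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(original: dict[str, str], backup: dict[str, str]) -> list[str]:
    """List files that are missing from the backup or differ from the original."""
    return sorted(
        name for name, checksum in original.items() if backup.get(name) != checksum
    )
```

Calling `changed_files(build_manifest(data_dir), build_manifest(backup_dir))` with two hypothetical folders returns an empty list when the backup is complete and identical.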
Open access to publications and data has become a formal requirement of
Funding agencies
Publishers
Researchers
Society
By traditional measures scientific data is already open!
researchers publish their results in peer-reviewed journals
they share data with one another
they present at conferences
collaborate on projects
Sharing data with colleagues and collaborators is the
basic principle of science!
Several international agreements exist for open data
• Good scientific practice in research and scholarship
European Science Foundation (ESF), 2000
It is vital that all primary and secondary data are stored in a secure and
accessible form.
• Principles for dissemination of scientific data
(ICSU/CODATA, 2000)
Scientific advances rely on full and open access to data. … Legal entities
should foster a balance between individual rights to data and the public
good of shared data.
• OECD Principles and Guidelines for Access to Research Data
from Public Funding (2007)
Databases are rapidly becoming an essential part of the infrastructure of the
global science system.
What has to be done prior to data archival?
› Get the data in compliance with SOPs!
› Ensure completeness and consistency (reformatting, standardised vocabulary, units)
› Quality control and quality assurance
› Documentation
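A primary quality-control step can be as simple as a range check that attaches a flag to every value. The following is a minimal sketch, not a community-agreed procedure: the flag values (2 = good, 4 = bad, 9 = missing), the parameter names and the plausible ranges are all assumptions chosen for illustration, so replace them with the scheme and limits required by your SOP or archive.

```python
import math

# Illustrative flag values only; use the flag scheme of your target archive.
GOOD, BAD, MISSING = 2, 4, 9

# Hypothetical plausible ranges per parameter (name includes the unit).
PLAUSIBLE_RANGE = {
    "temperature_degC": (-2.5, 40.0),
    "salinity": (0.0, 42.0),
}


def range_flag(parameter, value):
    """Return a quality flag for one value based on a simple range check."""
    if value is None or math.isnan(value):
        return MISSING
    low, high = PLAUSIBLE_RANGE[parameter]
    return GOOD if low <= value <= high else BAD


def flag_records(records):
    """Return copies of the records with a 'flags' entry added to each."""
    flagged = []
    for record in records:
        flags = {name: range_flag(name, record.get(name)) for name in PLAUSIBLE_RANGE}
        flagged.append({**record, "flags": flags})
    return flagged
```

Keeping the original value and adding a flag, rather than deleting suspicious values, leaves the decision to the data user and keeps the processing traceable.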
Types of best practice
Equipment User Manuals
come from developer/manufacturer
Good for assembly and deployment
Specs are often recorded in an unrealistic environment
Standard Operating Procedures (SOP)
Very comprehensive description of one parameter on one platform
Describe method and not nuances of specific design
Best Practices (guides / manuals, cookbooks, SOP etc)
Practical knowledge plus elements of two above categories
Often developed for specific environment, phenomenon
or platform
(Certified) Reference Materials and Standards
Provide trusted reference for calibration and quality control
Published Papers
Methodology/protocol described in
a published journal/book article
Best Practices Documents (OBPS templates available)
Written by practitioners for the community; often used as the basis for a published peer-reviewed article
Training Courses
Face-to-face/hands-on experience
https://www.oceanbestpractices.org
Why a System for Best Practices?
Best practices bring many benefits:
Quality and consistency of observations
Interoperability of data
Efficiency (don’t re-invent the wheel – cost saving)
Data traceability
Connections between data, models and applications
BUT:
Not all best practice knowledge is documented
They are scattered and can be hard to find
Not stored in a machine readable format
Can be lost when a project ends
Promising methods may not be shared
Work to create a best practice is often not acknowledged
An Ocean Best Practice
System is Needed
The Ocean Best Practices System
Vision: To have agreed and broadly adopted methods across ocean research, operations, and applications.
Mission: To provide a trusted system to support the collaborative development, sharing, and adoption of best practices across the ocean community.
Participating Organizations and Programs
Components of the Ocean Best Practices System
1) A trusted repository
2) Advanced technology, including text mining, natural language processing and semantic search
3) Sophisticated but user-friendly web interface
4) Peer-reviewed journal linked to the repository
5) Training materials supporting the users and their experience with the OBPS
6) A community forum for users and providers of best practices
[Figure: Ocean Best Practices matrix, organised by EOV, sub-variable, procedure and observing approach]
Observing Approach ☞ Ship-based Repeat Hydrography; Ship-based Underway Observations; Profiling Floats; Moored Fixed-point Observatories; Gliders; Ship-based Fixed-point Observatories; Satellite Remote Sensing; ...
Measurement technique ☞
EOV: INORGANIC CARBON; OXYGEN; NUTRIENTS; TRANSIENT TRACERS; PARTICULATE MATTER; NITROUS OXIDE; STABLE CARBON ISOTOPES; DISSOLVED ORGANIC MATTER; OCEAN COLOUR
Sub-variable (Inorganic Carbon): pCO2; DIC; Total Alkalinity; pH
Procedure: Deployment & sampling; Data retrieval & formatting; Calibration / validation; Reference materials & standards; Primary quality control ((Near) real-time, Delayed-mode); Secondary quality control
The Repository – hub of the system
FAIR: Findable, Accessible, Interoperable, Reusable
Discovery and access to relevant and tested methods
● Global, permanent, open access repository, hosted by IOC/UNESCO
● All elements of the ocean information value chain
● DOIs issued, version control, standard metadata, active links
● Templates supporting uniform submission and processing
● Notification services to keep track of updates
BP Webinar May 8 2018
Metadata
Without metadata - all the rest is useless…
Should provide all important information about the dataset.
What do you think should be included in a metadata document?
What would you want to know if you had to use data that someone else measured and processed?
Metadata – describing your data
who: Principal investigator(s) (PI), Project(s)
what: Data types, Parameter [unit]
where: Spatial coverage -> geographical positions
when: Temporal coverage ->
how: Methods
Title, Identifier (DOI)
Reference(s)
Quantities
Sampling event, Campaign, Location
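The who/what/where/when/how items above can be kept as a small structured record that is checked for gaps before submission. This is a hedged sketch only: the class name, field names and the completeness rule are assumptions for illustration and do not follow any particular archive's metadata standard.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset."""
    title: str = ""
    identifier: str = ""  # e.g. a DOI, once one has been issued
    principal_investigators: list[str] = field(default_factory=list)  # who
    projects: list[str] = field(default_factory=list)                 # who
    parameters: dict[str, str] = field(default_factory=dict)          # what: name -> unit
    spatial_coverage: str = ""                                        # where
    temporal_coverage: str = ""                                       # when
    methods: str = ""                                                 # how
    references: list[str] = field(default_factory=list)


def missing_fields(metadata: DatasetMetadata) -> list[str]:
    """Return the names of all fields that are still empty."""
    return [name for name, value in asdict(metadata).items() if not value]
```

Running `missing_fields` on a half-filled record gives a to-do list of what still has to be documented before the data go to an archive.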
Data archives store your data and metadata but….
What they don’t store (yet):
• Calibration sheets (pre and post deployment)
(for all sensors used in your data reduction)
• Certificates/Calibrations of calibration gases you used
• Specific documentation about your system setup
• Specific circumstances that might have influenced your measurements during the cruise/deployment
There is still a lot of information that only the PI has.
Make sure it doesn’t get lost!
Data archival – important note
Players in data management
› National data archives (e.g. National Oceanographic Data Centres)
› International data archives (e.g. World Data Centres, regional archives)
› Community-agreed data archives (e.g. Human Genome Project, CMIP5 (model intercomparison), OBIS (Ocean Biogeographic Information System))
› Portals/data harvesters: data from various sources (GEO, Copernicus Marine Environment Monitoring Service, Global Change Master Directory)
› Data products (OBIS, World Ocean Database, GLODAP, SOCAT)
In reality many archives are a mix of the above
National Responsibilities include:
• Receiving data from researchers, performing quality control, and archiving;
• Receiving data from buoys, ships and satellites on a daily basis, processing the data in a timely way, and providing outputs to various research and engineering users, forecasters, experiment managers, or to other centres participating in the data management plan for the data in question.
• Reporting the results of quality control directly to data collectors as part of the quality assurance module for the system.
• Participating in the development of data management plans and establishing systems to support major experiments, monitoring systems, fisheries advisory systems;
• Disseminating data on the Internet and through other means (and on CD-ROM, DVD, etc);
• Publishing statistical studies and atlases of oceanographic variables.
• Providing indicators for the different types of data being exchanged in order to track the progress.
National Oceanographic Data Centres
Examples for NODCs
IODE Ocean Data Portal - www.oceandataportal.org
International Council for Science (ICSU) :
World Data Center system (WDCs)
• Founded in 1931 to promote international scientific activity in the different branches of science and its application for the benefit of humanity
• One of the oldest non-governmental organizations
• More than 135 nations adhere to it
• ICSU established the World Data Center system in the 1950s
Mission: Data constitute the raw material of scientific understanding. The World Data Center system works to guarantee access to solar, geophysical and related environmental data. It serves the whole scientific community by assembling, scrutinizing, organizing and disseminating data and information.
Source: www.icsu.org
Network of ICSU WDCs
•Nuclear Radiation
Tokyo, Japan
•WDC Co-ordination Offices
Washington DC, USA
Beijing, China
•Meteorology
Asheville NC, USA
Beijing, China
Obninsk, Russia
•Oceanography
Obninsk, Russia
Silver Spring MD, USA
Tianjin, China
•Paleoclimatology
Boulder CO, USA
•Marine Geology and Geophysics
Boulder CO, USA
Moscow, Russia
•Remotely Sensed Land Data
Sioux Falls SD, USA
•Renewable Resources and Environment
Beijing, China
•Recent Crustal Movements
Ondrejov, Czech Republic
•Airglow
Mitaka, Japan
•Astronomy
Beijing, China
•Atmospheric Trace Gases
Oak Ridge TN, USA
•Aurora
Tokyo, Japan
•Cosmic Rays
Toyokawa, Japan
•Geology
Beijing, China
•Human Interactions in the Environment
Palisades NY, USA
•Ionosphere
Tokyo, Japan
•Earth Tides
Brussels, Belgium
•Geomagnetism
Copenhagen, Denmark
Edinburgh, UK
Kyoto, Japan
Colaba, India
•Glaciology
Boulder CO, USA
Cambridge, UK
Lanzhou, China
•Marine Environmental Sciences
Bremen, Germany
•Rotation of the Earth
Obninsk, Russia
Washington DC, USA
•Satellite Information
Greenbelt MD, USA
•Rockets and Satellites
Obninsk, Russia
•Seismology
Denver CO, USA
Beijing, China
•Solar Radio Emission
Nagano, Japan
•Space Science
Beijing, China
•Space Science Satellites
Kanagawa, Japan
•Solar Activity
Meudon, France
•Soils
Wageningen, The Netherlands
•Sunspot Index
Brussels, Belgium
•Solar Terrestrial Physics
Boulder CO, USA
Didcot Oxon, UK
Moscow, Russia
Haymarket, Australia
•Solid Earth Geophysics
Beijing, China
Boulder CO, USA
Moscow, Russia
• Data from various sources is made available in a structured manner
• Structure can be on the metadata level or data level
Portals / Data harvesters
Marine Data Infrastructure for the management of large and diverse sets of data deriving from in situ observations of the seas and oceans.
Portal / Data harvester
Portal / Data harvester
Organize and maintain data acquisition, in real time and delayed mode, of in-situ measurements necessary for operational oceanography.
Data products
• Often parameter-centered
• Global synthesis of raw data
• Gridded products with or without interpolation
• In uniform format with quality control (e.g. quality flags)
• Often periodically updated with new data (e.g. annual releases)
• Often with online viewers
• Downloadable in several formats (text, NetCDF, ODV)
• Often documented in ESSD articles
• Fair Data Use Statement
• Often a community activity with numerous contributors worldwide
Finally! Endelig! Wreszcie! Schließlich! Finalmente! 最終的に! Hopea! Äntligen!
• A global collection of data from 724 hydrographic cruises
• 45 306 stations
• 999 488 sampling depths
• 1972-2013: GEOSECS, TTO, WOCE, CLIVAR
• Corrected for biases
• Extensively documented
• Released January 2016
Data products - Ocean Interior Data Synthesis
Data products - Surface Ocean CO2 Atlas v6
23 million in situ surface ocean CO2 observations
[Figure: global maps, panels "V6 new A-E" and "V6 all A-E"; colour scale 260 to 440 µatm]
Possible problems in retrieving data
• Version conflicts (data is archived in many data centres at different stages, e.g. raw data, quality controlled, etc.)
• Poorly documented metadata and data (methods, units, unclear parameter definitions, etc.)
• Only metadata is available online; data has to be requested or a log-in is required
• Metadata is standardised but the data itself is not
• Cruise naming varies between countries, making it hard to identify the same cruise
• Date formats (mm/dd/yyyy; yy/mm/dd; dd/mm/yyyy etc.)
• Ways to report the position (Lat/Long, UTM)
• Different export formats (plain text, xml, netCDF, etc.)
• Different entities (one data set = data from one cruise, or data from one station, or data from one sample)
• Data set is too large to be downloaded (e.g. model data)
Keep that in mind when preparing your data for submission and when accessing data from others!
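The date-format problem in the list above is easy to demonstrate: the same text can mean two different days unless the format is documented. A minimal sketch using only the Python standard library is given below; the function name is hypothetical, and converting to ISO 8601 (yyyy-mm-dd) is shown as one common, unambiguous choice rather than a requirement of any specific archive.

```python
from datetime import datetime


def to_iso_date(text: str, fmt: str) -> str:
    """Parse a date string with an explicitly stated format and return yyyy-mm-dd."""
    return datetime.strptime(text.strip(), fmt).date().isoformat()


# The same string read with two of the formats listed on the slide:
as_us_style = to_iso_date("03/04/2019", "%m/%d/%Y")  # mm/dd/yyyy
as_eu_style = to_iso_date("03/04/2019", "%d/%m/%Y")  # dd/mm/yyyy
print(as_us_style)  # 2019-03-04
print(as_eu_style)  # 2019-04-03
```

Because the format has to be passed in explicitly, the sketch fails loudly (`ValueError`) on a string that does not match, instead of silently guessing.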
Carbon Data Management
Global Data Assembly Centre for Marine Biogeochemistry
• Central access to QCed EOV Inorganic Carbon data regardless of the source
• Collaboration - no competition
• Mirrored inventories ensure sustainability
• IOC UNESCO GOOS and IODE, UNESCO/SCOR's IOCCP, GEO’s Carbon and GHG Initiative, GOA-ON, SOLAS, GCP’s GCB, GO-SHIP, ATLANTOS, COPERNICUS, GDAC ARGO
Maciej: [email protected]
Our website: www.ioccp.org