Oceanographic Informatics in a Collaborative Environment

Data Management Special Session N12: Strategies for Improved Marine and Synergistic Data Access and Interoperability

19 December 2008, San Francisco, CA

P.H. Wiebe, R.C. Groman, C. Chandler, M.D. Allison, and D. Glover

Woods Hole Oceanographic Institution, Woods Hole, MA, USA

A Context

Data and information in oceanography are expanding at a rapid pace, and there is a significant need for more and better management tools and techniques to preserve and serve them.

Talk Objectives

• To discuss current developments and new directions to enable better opportunities for data discovery, integration, and synthesis of oceanographic data regardless of origin.

• To encourage comprehensive efforts to establish broadly based and accepted best practices in the quest to obtain new information about ocean physics, chemistry, biology, geology, and geophysics.

• To highlight some of the changes I have observed during the past four decades and strongly endorse the New Age that is fast approaching in the way we gather, store, access, and analyze information and data.

A Personal Context

I have worked throughout my career as a biological oceanographer on multi-investigator and multi-disciplinary programs and projects.

I realized early on that data and information management was an essential element in design, acquisition, and synthesis of data sets in the oceanographic scientific enterprise. But the technology (hardware/software), resources (funding), and mandates were not in place until recently to do it effectively.

The effort now is on more than data and information management. It involves what is termed “Data informatics”.

Informatics Defined

“Informatics is the science and engineering that occupies the gap between information and communications technology (ICT) systems and cyberinfrastructure (computers, grids, Web services, etc.), and the use of digital data, information, and related services for research and knowledge generation.”

From: Baker, D.N., C. E. Barton, W. K. Peterson, and P. Fox. 2008. Informatics and the 2007–2008 Electronic Geophysical Year. Eos. 89(48): 485-486.

Evolution of MOCNESS Data Acquisition

• 1976 CCR Program: HP2100

• 1982 WCR Program: CBM 8032

• 1999 GLOBEC Program: Windows PC

Sampling in the Cold-Core Ring Program

1976-1977

4 Cruises Total: PO, bio-process, & mapping

Sampling in the Warm-Core Ring Program

1981-1982

15 Cruises Total: 6 PO, 3 bio-process, 3 bio-mapping, 2 bio-process & mapping

Vessels: Knorr, Endeavor, Oceanus

Sampling in the U.S. GLOBEC Georges Bank Program - 1994-1999

122 Cruises Total: 31 broad-scale, 91 process and mooring.

Data Storage

• 1970s: Honeywell Sigma 7. Simple file storage plus the Sigma 7 Extended Database Management System. MOCNESS data only; terminal access.

• 1980s: Digital VAX 11/780. Flat file storage; all data; terminal access. Microcomputers with floppies and small hard drives.

• 1990s: Sun/Unix-Linux servers. GLOBEC Data & Information Management system; project specific; all data; web available. Microcomputers become the mainstay for labs.

• 2000s: Unix/Linux servers. BCO-DMO Data & Information Management system; multiple projects; web available.

The Biological and Chemical Oceanography Data Management Office (BCO-DMO)

BCO-DMO was created in late 2006 to serve investigators funded by the NSF Biological and Chemical Oceanography Sections to conduct marine chemical and ecological research. BCO-DMO provides open access to marine biogeochemical and ecological data and information, so that what is developed in the course of scientific research can easily be disseminated, protected, and stored on short and intermediate time frames. [www.bco-dmo.org]

Groman’s Theorems

Theorem 1: The probability that all the necessary data and information are collected and preserved to allow another researcher to properly use your data is inversely proportional to the time since the data were collected.

Corollary: Unless data and information are collected and preserved during the experiment (e.g., cruise), subsequent researchers will have a difficult time using those data.

Theorem 2: The longer the time since the data were collected the less likely the data will ever be considered “final” or available.

Conclusion: It is essential that data and information management begin with the start of a project or program.
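
One way to act on this conclusion is to check, while the cruise is still under way, that the descriptive information a later user would need has actually been recorded. The sketch below is only an illustration under stated assumptions, not BCO-DMO's actual procedure: the field names in `REQUIRED_FIELDS`, the `missing_metadata` helper, and the sample record values are all hypothetical.

```python
# Hypothetical checklist of descriptive fields a later user would need.
# These names are illustrative assumptions, not a BCO-DMO schema.
REQUIRED_FIELDS = ("cruise_id", "vessel", "instrument", "start_date", "units", "contact")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Example record filled in at sea; all values are made up for illustration.
tow_record = {
    "cruise_id": "EX-0001",
    "vessel": "Oceanus",
    "instrument": "MOCNESS",
    "start_date": "1999-05-12",
    "units": "",  # left blank, so it is reported as missing
}

# The "contact" key is absent entirely, so it is reported as well.
print(missing_metadata(tow_record))
```

Running a check like this before leaving the ship turns the corollary into a routine step: gaps are found while the people who can fill them are still on board.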

The Informatics Imperative

The rise in interdisciplinary oceanography and collaboration in ocean science have been emphasized by Powell (2008) and Briscoe (2008).

Powell: “Ocean science has long been interdisciplinary… Today, one can scarcely conceive of an oceanographic question that does not cut across disciplines.”

Briscoe: “Ocean science must head toward more collaboration, because many of the research and applications questions we face demand teams of scientists and engineers (and probably social scientists and economists)… Collaboration in the ocean sciences is critical to addressing emerging ocean problems, and is worth the effort.”

It will take data informatics to make it possible!

Powell, T.M. 2008. The rise of interdisciplinary oceanography. Oceanography. 21(3): 54-57.

Briscoe, M.G. 2008. Collaboration in the ocean sciences. Oceanography. 21(3): 58-65.

What has happened to cause a change?

• Computers more powerful and storage much larger.

• Software and software tools to handle data management now widely available.

• More multi-disciplinary research builds on the work of earlier programs, so the earlier data are needed for current and future work.

• Programs have policies that require data sharing in reasonable time frames (~2 years)

• Program managers are requiring that data from previous grants be made publicly accessible on the web before the next grant is funded.

Still resistance to sharing data – Why?

Scientist does not want others to use the data - fear of lost opportunities.

Scientist does not know how to do it.

Other reasons expressed:

Structural Impediments

• I’m not done publishing my papers based on the data.

• My graduate student is almost done analyzing the data.

• It’s not final yet.

• Lack of positive acknowledgment of data shared (give credit on par with papers? Need for DOIs).

Reasons for sharing data

A scientist’s data are not nearly as valuable by themselves as they are in the context of all the other data sets collected within a program.

Using others’ data within a program without sharing one’s own is not fair.

Data publishing with author-citable references is coming. Scientists will get credit for putting their data in public repositories.

There are real advantages to sharing.

Data Informatics

BASIN: an example of a prospective new program that will require all the data informatics and management techniques possible.

Semantic Web technologies:

• RDF: Resource Description Framework

• OWL: Web Ontology Language

• SPARQL: SPARQL Query Language for RDF
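
The idea behind RDF and SPARQL can be sketched without any library: RDF states facts as subject-predicate-object triples, and a SPARQL query matches triple patterns in which some positions are variables. The following is a toy illustration in plain Python, not real RDF or SPARQL syntax; the `ex:` identifiers, the sample facts, and the `match` helper are hypothetical.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple,
# mirroring the RDF data model. Identifiers are made up for illustration.
TRIPLES = [
    ("ex:tow42", "ex:collectedBy", "ex:MOCNESS"),
    ("ex:tow42", "ex:partOf", "ex:GLOBEC"),
    ("ex:tow57", "ex:collectedBy", "ex:MOCNESS"),
    ("ex:tow57", "ex:partOf", "ex:WCR"),
]


def match(pattern, triples=TRIPLES):
    """Return variable bindings for every triple that fits the pattern.

    Pattern terms starting with '?' are variables, as in SPARQL;
    all other terms must equal the triple's term exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No term mismatched, so this triple satisfies the pattern.
            results.append(binding)
    return results


# Roughly equivalent in spirit to:
#   SELECT ?tow WHERE { ?tow ex:partOf ex:GLOBEC }
print(match(("?tow", "ex:partOf", "ex:GLOBEC")))
```

Because every data set is reduced to the same three-part statements, a single query pattern can span facts that originated in different programs, which is the interoperability the Semantic Web approach aims for.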

Summary

• Research in oceanography proceeds along three major lines: field observation (FO), field and laboratory experimentation (EX), and modeling (MO). Data management and informatics (DM&I) until now have been an afterthought.

• Efforts like ecosystem-based management require the integration of oceanographic, biodiversity, fisheries, and other marine environmental data, as well as the development of analysis and assessment tools.

• The exponential increase in data sources and the proliferation and distributed nature of databases have created a fourth new and important line of marine research. Data management and informatics is now on par with the other lines of oceanographic research (Baker et al. 2008).

Figure: lines of research. Past: FO, EX, MO. Future: FO, EX, MO, DM&I.

Baker, D.N., C. E. Barton, W. K. Peterson, and P. Fox. 2008. Informatics and the 2007–2008 Electronic Geophysical Year. Eos. 89(48): 485-486.

Summary

• Research priorities include: more rapid and efficient data acquisition, enhanced data management, more effective data utilization and reuse, improved data visualization, and development of ontologies.

• The ultimate goal is to create a cyberinfrastructure for oceanography that enables open, transparent, interoperable access to data and information, regardless of their location.

Acknowledgments

Thanks To:

• Charlton Galvarino for his excellent skill in implementing the MapServer interface.

• Huan-Xiang Xu for his help during the metadata database design and his help in the initial loading of the database.

• Xiaoyan Ye for her help in the initial attempts to develop comprehensive search options, geospatial displays of all the data, and for updating software to take advantage of the new database.

• Julie Allen for her extensive help and support in implementing our BCO-DMO web site using Drupal and in using Cold Fusion to provide web access to the database.

• The National Science Foundation, which supported our work under grant numbers OCE-0646353 and ANT-0440777.
