DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research. Bob Cook, Environmental Sciences Division, Oak Ridge National Laboratory. February 6, 2013, NACP All-Investigator Meeting.


Page 1

DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research

Bob Cook
Environmental Sciences Division
Oak Ridge National Laboratory

February 6, 2013
NACP All-Investigator Meeting

Page 2

The DataONE Vision and Approach:

Providing universal access to data about life on earth and the environment that sustains it, as well as the tools needed by researchers.

1. Building community
2. Developing sustainable data discovery and interoperability solutions
3. Supporting researcher tools and services

Page 3

The long tail of orphan data

[Figure: data volume plotted against rank frequency of datatype. Labels: Specialized repositories (50%); Orphan data (50%). (B. Heidorn)]

Characteristics

• Big Science
• Large Volume
• Automated sensors
• Well described
• Well curated
• Easily Discovered

Characteristics

• Small Science
• Small Volume
• Poorly described
• Rarely Indexed
• Invisible to scientists
• Rarely Used
• Dark Data

• High spatial resolution
• Process based
• Theory Development
• Model Development
• Benchmarking

Page 4

✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare

Data & Metadata (EML)

https://dataone.org
http://dataup.cdlib.org/
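The "check for best practices" step can be pictured with a small sketch. The slide does not say which checks DataUp runs, so the three rules below (unnamed columns, duplicate column names, blank or missing cells) are assumptions chosen for illustration, and `check_table` is a hypothetical helper, not part of DataUp or DataONE.

```python
import csv
import io


def check_table(text):
    """Return a list of problems found in CSV text (hypothetical checks)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]

    # Assumed rule 1: every column needs a name.
    for position, name in enumerate(header, start=1):
        if not name:
            problems.append(f"column {position} has no name")

    # Assumed rule 2: column names must be unique.
    seen = set()
    for name in header:
        if name and name in seen:
            problems.append(f"duplicate column name: {name}")
        seen.add(name)

    # Assumed rule 3: every row has one non-blank value per column.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} fields, expected {len(header)}"
            )
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"row {line_no} has a blank cell")
    return problems


# Made-up sample data: the second data row is missing its temperature.
sample = "site,date,temp_c\nA,2013-02-06,4.1\nB,2013-02-06,\n"
print(check_table(sample))
```

A real checker would likely cover more (units, date formats, special characters), but the shape is the same: scan the table, report problems, and only then move on to writing metadata.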

Page 5

Model-Data Fusion: Harnessing Observations

• Sponsor Requirements for Data Management
• Credit for data through citation, DOI, and Data Citation Index
• Training in Data Management
• Improved tools for data preparation: DataUp
• Developing a metadata editor

Page 6

Model-Data Fusion: Data System Characteristics (1)

• Dedicated financial support for data management is essential
• Close coordination between the data group(s) and the producers (experimentalists) and users (modelers) of the data products
• Based on a data management plan and a data policy
• Integrated system that delivers a suite of diverse products
• Establish standards (file, workflow, network) and promote interoperability
• Processes to assure and document data quality to allow proper interpretation and use

Page 7

Model-Data Fusion: Data System Characteristics (2)

• Facilitate rapid exchange of data, products, and information, including large-volume data
• Promote the use of best practices to prepare and document data to share and archive
• Make efficient use of existing data management infrastructure and resources
• Ensure that finalized data and associated documentation are transferred to an appropriate archive
• Make numerical models (source code) and descriptions of the models available, along with model parameters and example input and output data (Thornton et al. 2005)

Page 8

Interoperability

[Diagram: Member Nodes (KNB, LTER, ORNL DAAC, CDL, USGS CSAS, DRYAD, and a group labeled "Future") connect to the Coordinating Nodes through metadata extraction into an internal metadata index. Metadata standards shown beside the Member Nodes: EML, FGDC, ISO, METS.]

Coordinating Nodes

• Virtual Portals
• Numerous search capabilities
• Metadata has link to data, which reside at Member Nodes
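The arrangement on this slide, where a central index holds metadata and each record links back to data held at a Member Node, can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the DataONE API: `MetadataRecord`, `MetadataIndex`, the node names, the identifiers, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str   # hypothetical dataset identifier
    title: str
    standard: str     # metadata standard, e.g. "EML", "FGDC", "ISO", "METS"
    member_node: str  # node where the data themselves reside
    data_url: str     # link from the metadata back to the data


class MetadataIndex:
    """Toy stand-in for a coordinating node's internal metadata index."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Metadata extracted from a member node is stored centrally;
        # the data stay where they are.
        self._records[record.identifier] = record

    def search(self, keyword):
        # One simple search capability: case-insensitive title match.
        needle = keyword.lower()
        return [r for r in self._records.values() if needle in r.title.lower()]

    def resolve(self, identifier):
        # Follow the metadata link to the node and URL that hold the data.
        record = self._records[identifier]
        return record.member_node, record.data_url


# Made-up records for illustration only.
index = MetadataIndex()
index.add(MetadataRecord("ds-001", "Soil respiration, site A", "EML",
                         "node-a", "https://node-a.example.org/data/ds-001"))
index.add(MetadataRecord("ds-002", "Leaf area index, site B", "FGDC",
                         "node-b", "https://node-b.example.org/data/ds-002"))

for hit in index.search("soil"):
    print(hit.identifier, index.resolve(hit.identifier))
```

The point of the design is that a search runs against one index regardless of which metadata standard a node uses, while downloads still go to the node that holds the data.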

Page 9

The long tail of orphan data

[Figure: data volume plotted against rank frequency of datatype. Labels: Specialized repositories (e.g. Remote Sensing, NEON); Orphan data. (B. Heidorn)]

“Most of the bytes are at the high end, but most of the datasets are at the low end” (Jim Gray)

Page 10

“Data intensive science” and the “80:20 rule”

[Figure, adapted from CENR-OSTP. Tiers: Remote sensing; Intensive science sites and experiments; Extensive science sites; Volunteer & education networks. Axis labels: Decreasing Spatial Coverage; Increasing Process Knowledge.]