Semantic Metadata for Scientific Data Access and Management Richard M. Keller, Ph.D. Group Lead for Information Sharing & Integration Intelligent Systems Division NASA Ames Research Center [email protected] http://sciencedesk.arc.nasa.gov/scidesk/ February 17, 2005 ROSES Workshop

Page 1

Semantic Metadata for Scientific Data Access and Management

Richard M. Keller, Ph.D.

Group Lead for Information Sharing & Integration

Intelligent Systems Division

NASA Ames Research Center

[email protected]

http://sciencedesk.arc.nasa.gov/scidesk/

February 17, 2005

ROSES Workshop

Page 2

Focus of Work

• Scientific data management, not data analysis

• Computational infrastructure for storing, locating, searching, integrating, and sharing scientific data

Page 3

Specific Problems

• Integrating Heterogeneous Scientific Data from Multiple Sources

• Searching/finding Relevant Scientific Data

• Organizing/indexing Data for Rapid, Intuitive Access

Page 4

Culprit: Inadequate Metadata

• Metadata is typically limited to essentials only (e.g. data format, instrument, date)

– inadequate for extensive indexing, precise searching

• Each data repository defines its own metadata, using its own terminology and data dictionary

– difficult to search across repositories

– difficult to integrate and combine datasets

• No common frame of reference for cross-repository comparison

Page 5

Common Approach

To facilitate storage, retrieval, integration, and comprehension of scientific data:

capture the semantic metadata that provides a rich context for each data product

Page 6

What is “semantic metadata”?

Semantic Metadata:

information relating to the context in which the scientific data are generated and used

– how?

– when?

– where?

– why?

– who?

Page 7

Collection of microbial mats in the field

Early Microbial Ecosystems Investigation

Trace gas production and consumption under “Early Earth” conditions

Greenhouse Incubator

Microbial mat (algae)

Detailed studies of mat biogeochemistry

• monitoring

• analysis

• experimentation

Geographically dispersed team of collaborators

B. Bebout, D. Des Marais, T. Hoehler, et al. (Code SSX)

Page 8

Semantic Context Surrounding Mat “4b” (“Semantic Network”)

[Figure: semantic network centered on mat sample “4b”]

Nodes: Spring Beach; Brad Bebout; Greenhouse; O2 Concentration; O2 Microsensor; HBC-2 Microbial culture; Culture prep B notes for Lee; Culture recipe; Mary Hogan; Electron Microscope

Links: collected-at; collected-by; stored-in; has-measurement; measured-with; has-culture; cultivated-by; has-recipe; has-image; imaged-with

Page 9

Semantic Network Structure

[Figure: example network with nodes of type culture, photo, measurement, site, instrument, sample, and hypothesis]

• Nodes: key information resources or organizational structures (describing people, places, measurements, hypotheses)

• Attributes: properties of resources (metadata), e.g., date, size, format

• Links: relationships among resources (e.g., “measured by”, “supports hypothesis”)

• Attached files: electronic products associated with resources (e.g., datasets, images, documents)

Ontology: specifies the types of nodes, attributes, and links defined for scientific investigation

Rules: add/modify nodes, links, and attributes in the network
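The four building blocks on this slide (nodes, attributes, links, attached files) can be sketched as a small Python data structure. This is only an illustration, not the SemanticOrganizer implementation: the names `Node`, `SemanticNetwork`, and `linked_from` are invented for this example, and the attribute value and file name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A key information resource (a person, place, measurement, ...)."""
    name: str
    node_type: str                                   # a type defined by the ontology
    attributes: dict = field(default_factory=dict)   # properties of the resource (metadata)
    files: list = field(default_factory=list)        # attached electronic products


class SemanticNetwork:
    """Nodes plus typed, directed links between them."""

    def __init__(self):
        self.nodes = {}       # node name -> Node
        self.links = set()    # (source name, link type, target name)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_link(self, source, link_type, target):
        # A link may only connect nodes that already exist in the network.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a link must be existing nodes")
        self.links.add((source, link_type, target))

    def linked_from(self, source, link_type):
        """Names of all nodes reachable from `source` over `link_type`."""
        return sorted(t for s, l, t in self.links
                      if s == source and l == link_type)


net = SemanticNetwork()
net.add_node(Node("Mat 4b", "sample"))
net.add_node(Node("Spring Beach", "site"))
net.add_node(Node("O2 Concentration", "measurement",
                  attributes={"format": "csv"},      # hypothetical attribute value
                  files=["o2_profile.csv"]))         # hypothetical attached file
net.add_link("Mat 4b", "collected-at", "Spring Beach")
net.add_link("Mat 4b", "has-measurement", "O2 Concentration")

print(net.linked_from("Mat 4b", "has-measurement"))
```

Storing links as (source, link type, target) triples keeps relationship lookups simple; a real system would presumably also check each link against the ontology.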

Page 10

Scientific Data Collection Ontology (partial)

[Figure: partial class hierarchy rooted at “Scientific Information Nodes”]

Classes shown: project; experiment; measurement; O2 concentration; N2 concentration; site; equipment; camera; gas chromatograph; stub; O2 microsensor; N2 microsensor; SEM; spectrometer; spectrograph; chromatogram; person; researcher; lab tech; sample; culture; document; image; photographic image; SEM image; micrograph; DNA sequence; other

Links shown: cultivated-from; cultivated-by; has-genetic-sequence; pictured-in
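One way to read an ontology like this is as a class hierarchy plus a set of permitted link types. The Python sketch below uses class and link names from the slide, but the parent/child pairings and the source/target types given for each link are assumptions made for illustration; the extracted slide text does not preserve the diagram's actual structure.

```python
# Assumed subclass relations (child -> parent); not confirmed by the slide text.
SUBCLASS_OF = {
    "photographic image": "image",
    "SEM image": "image",
    "researcher": "person",
    "lab tech": "person",
}

# Assumed link signatures: link type -> (source class, target class).
LINK_SIGNATURES = {
    "cultivated-by": ("culture", "person"),
    "cultivated-from": ("culture", "sample"),
    "has-genetic-sequence": ("culture", "DNA sequence"),
    "pictured-in": ("sample", "image"),
}


def is_a(node_type, ancestor):
    """True if node_type equals ancestor or is one of its descendants."""
    while node_type is not None:
        if node_type == ancestor:
            return True
        node_type = SUBCLASS_OF.get(node_type)
    return False


def link_allowed(link_type, source_type, target_type):
    """Check a proposed link against the ontology's link signatures."""
    if link_type not in LINK_SIGNATURES:
        return False
    source_class, target_class = LINK_SIGNATURES[link_type]
    return is_a(source_type, source_class) and is_a(target_type, target_class)


print(link_allowed("cultivated-by", "culture", "lab tech"))
print(link_allowed("cultivated-by", "culture", "SEM image"))
```

Because `is_a` walks up the hierarchy, a link declared for a general class (such as person) is also accepted for its subclasses (such as lab tech).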

Page 11

Benefits of Semantic Metadata Approach

• Semantic context provides a unifying framework for integrating data across data collections

• Sophisticated “semantic search” methods allow retrieval based on semantic relationships among data

• Intuitive data indexing, access, and organization schemes derive from semantic data models

• Formal semantic representation enables automated inference about the data
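To make “semantic search” and “automated inference” concrete, here is a minimal Python sketch over a set of (subject, link, object) triples. The triples are one plausible reading of the Mat “4b” diagram, and the `search` and `infer_measurement_sites` helpers, the rule itself, and the `from-site` link are all hypothetical; the slides do not describe the actual search or inference mechanisms.

```python
# Assumed reading of part of the Mat "4b" semantic network.
TRIPLES = {
    ("Mat 4b", "collected-at", "Spring Beach"),
    ("Mat 4b", "has-measurement", "O2 Concentration"),
    ("O2 Concentration", "measured-with", "O2 Microsensor"),
}


def search(triples, subject=None, link=None, obj=None):
    """Return all triples matching the given pattern; None is a wildcard."""
    return sorted(t for t in triples
                  if (subject is None or t[0] == subject)
                  and (link is None or t[1] == link)
                  and (obj is None or t[2] == obj))


def infer_measurement_sites(triples):
    """Hypothetical rule: a measurement of a sample is associated with
    the site where that sample was collected."""
    inferred = set()
    for sample, link, measurement in triples:
        if link != "has-measurement":
            continue
        for _, _, site in search(triples, subject=sample, link="collected-at"):
            inferred.add((measurement, "from-site", site))
    return inferred


# Inference adds derived links; search then retrieves by relationship.
facts = TRIPLES | infer_measurement_sites(TRIPLES)
print(search(facts, obj="Spring Beach"))
```

The final query finds everything related to Spring Beach, including the measurement that is connected to the site only through the inferred link.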

Page 12

Challenge

• Semantic metadata approach has been applied to small, PI-maintained data repositories

• Tremendous volume of earth and space science data is stored in huge, curated data repositories maintained by NASA, USGS, ESA, universities, and others.

• How to translate semantic metadata ideas to operate on the scale of large data repositories?

Seeking Collaborators!

Page 13

SemanticOrganizer System (Mat Sample: Spring-M4-b)

Page 14

Photo: SprM4b excised

Page 15

What is ScienceOrganizer?

• A Web-based collaborative knowledge management tool for distributed teams of scientific investigators

• Facilitates information sharing, integration, correlation

• A project information repository / digital library: users upload/download heterogeneous project information products -- images, datasets, documents, and various types of scientific records (describing samples, field sites, measurements, instruments, etc.)

• Features cross-linkage: enables rapid access to interrelated information; permits linking data and observations to scientific hypotheses

• Supports inference capabilities: permits formal reasoning about the repository contents

• A “project archive” system: tracks history of project team’s fieldwork, labwork, and associated data collection activities

Page 16

ScienceOrganizer Users

• ARC Microbial Ecosystems Group: field & lab science, experiments, data analysis.

• NAI Ecogenomics Focus Group: cross-discipline collaboration, data analysis.

• ARC Electron Microscopy Lab: electron microscopy image archiving, sample cataloging.

• MARTE Mission: analog Mars drilling mission, support for remote science data acquisition, storage, and access

• JSC Astrobiology Institute for the Study of Biomarkers: electron microscopy image archive, sample collection, cataloging, and storage; support for education & outreach.

• NIH/NASA Malaria Control Study: African malaria study - data collection and archiving.

• ASU/NSF Desert Microbial Survey: microbial survey; provides publicly accessible repository.

• Mobile Agents Demonstration Project: analog Mars surface exploration, support for remote science data acquisition, storage, and access

• Astrobionics Technology Integration: technology infusion program