Download - 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

Transcript
Page 1: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

1

Using Semantically-Enabled Data Frameworks for Data Integration in

Virtual Observatories

Peter Fox*

*HAO/ESSL/NCAR

Deborah McGuinness$#, Luca Cinquini%, Rob Raskin!, Krishna Sinha^

Patrick West*, Jose Garcia*, Tony Darnell*, James Benedict$, Don Middleton%, Stan Solomon*

$McGuinness Associates#Knowledge Systems and AI Lab, Stanford Univ.

%SCD/CISL/NCAR!JPL/NASA

^Virginia TechWork funded by NSF and NASA

Page 2: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

2

Paradigm shift for NASA

• From: Instrument-based• To: Measurement-based (to find and use data irrespective of which

instrument obtained it)• Requires: ‘bridging the discipline data divide’• Overall vision: To integrate information technology in support of

advancing measurement-based processing systems for NASA by integrating existing diverse science discipline and mission-specific data sources.

Science data Science data

Integration Semantics

SWEET++ ontology

Process-oriented semantic content represented in SWSL----------------------------Articulation axioms

E.g. Statistical AnalysisApplication

Page 3: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

3

Compilation of distribution of volcanic ash associated with large eruptions. Note the continental scale ash fall associated with Yellowstone eruption ~600,000 years ago. Geologic databases provide the information about the magnitude of the eruption, and its impact on atmospheric chemistry and reflectance associated with particulate matter requires integration of concepts that bridge terrestrial and atmospheric ontologies.

Page 4: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

4

Why we were led to semantics

• When we integrate, we integrate concepts, terms, etc.• In the past we would: ask, guess, research a lot, or give up• It’s pretty much about meaning• Semantics can really help find, access, integrate, use,

explain, trust…• What if you…

- could not only use your data and tools but remote colleague’s data and tools?

- understood their assumptions, constraints, etc and could evaluate applicability?

- knew whose research currently (or in the future) would benefit from your results?

- knew whose results were consistent (or inconsistent) with yours?…

Page 5: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

5

Page 6: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

6

F a il e d R i f t

S u p ra S u b d u c t i o n Z o n e C o m p l e x

C o r e C o m p l e x

F o r e A r c B a c k A rc A c t iv e A rc

T re n c hA c c r e t i o n a r y C o m p l e x

T e c t o n i c A sse m b l a g e

T e rra n eE r o s i o n P r o ce ss C o l l a p s e P r o c e s s

S t a b i l i z i n g E v e n t

U p l i f t P r o c e ss

E x te n si o n E v e n t

E x t e n si o n P ro c e ss

S p re a d i n g E v e n t

S p r e a d i n g P r o c e s s

A st h e n o s p h e re M e so s p h e re C o r e

E a rt h

L i t h o s p h e r eP l a te1 . . *1 . . *

O c e a n i c P l a te a u

O c e a n i c L i t h o sp h e r e

1 . . *1 . . *

M i d - O c e a n R i d g e

1 . . *1 . . *

T r a n sf o r m P r o c e s s

C o l l i s i o n E v e n t

S u b d u c t i o n E v e n t

S u b d u c t i o n P ro c e ss

A c t i v e C o n t i n e n t

C o n t i n e n ta l L i t h o sp h e r e

1 . . *1 . . *

S ta b l e C o n t i n e n t

1 . . *1 . . *

Page 7: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

7

Excerpt from Plate Tectonics Ont.

Page 8: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

9

Virtual Observatory schematicsa suite of software applications on a set of computers that allows users to uniformly find, access, and use resources (data, software, etc.) from a collection of distributed product repositories and service providers. a service that unites services and/or multiple repositories

• Conceptual examples: • In-situ: Virtual measurements

– Sensors, etc. everywhere – Related measurements

• Remote sensing: Virtual, integrative measurements– Data integration

• Both usage patterns lead to additional data management challenges at the source and for users; now managing virtual ‘datasets’

Page 9: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

10

What should a VO do?

• Make “standard” scientific research much more efficient.– Even the principal investigator (PI) teams should want to use them.– Must improve on existing services (mission and PI sites, etc.). VOs

will not replace these, but will use them in new ways.– Access for young researchers, non-experts, other disciplines,

• Enable new, global problems to be solved. – Rapidly gain integrated views, e.g. from the solar origin to the

terrestrial effects of an event.– Find meaningful data related to any particular observation or model.– (Ultimately) answer “higher-order” queries such as “Show me the data

from cases where a large coronal mass ejection observed by the Solar-Orbiting Heliospheric Observatory was also observed in situ.” (science-speak) or “What happens when the Sun disrupts the Earth’s environment” (general public)”

Page 10: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

11

The Astronomy approach

… … … …

VO App1

VO App2VO App3

DB2 DB3DBn

DB1

VOTable

Simple Image

Access Protocol

Simple Spectrum

Access Protocol

Simple Time Access

Protocol

VO layer

Limited interoperability

Lightweight semantics

Limited meaning, hard coded

Limited extensibility

Under review

Page 11: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

12

… … … …

VO1VO2

VO3

DB2 DB3DBn

DB1

Semantic mediation layer - VSTO - low level

Semantic mediation layer - mid-upper-level

Education, clearinghouses, other services, disciplines, etc.

Metadata, schema, data

Query, access and use of data

Semantic query, hypothesis and inference

Semantic interoperability

Added value

Added value

Added value

Added value

Mediation Layer• Ontology - capturing concepts of Parameters,

Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes

• Maps queries to underlying data• Generates access requests for metadata, data• Allows queries, reasoning, analysis, new

hypothesis generation, testing, explanation, etc.

Page 12: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

13

Integrative use-case:

Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.

Translate this into a complete query for data.What information do we have to extract from the use-

case?What information we can infer (and integrate)?

Returns data from instruments, indices and models!!

Page 13: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

14

Translating the Use-Case - non-monotonic?

Input

Physical properties: State of neutral atmosphere

Spatial:

• Above 100km

• Toward arctic circle (above 45N)

Conditions:

• High geomagnetic activity

Action: Return Data

Specification needed for query to CEDARWEB

Instrument

Parameter(s)

Operating Mode

Observatory

Date/time

Return-type: data

GeoMagneticActivity has ProxyRepresentation

GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)

Kp is a GeophysicalIndex hasTemporalDomain: “daily”

hasHighThreshold: xsd_number = 8

Date/time when KP => 8

Page 14: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

15

Translating the Use-Case - ctd.

Input

Physical properties: State of neutral atmosphere

Spatial:

Above 100km

Toward arctic circle (above 45N)

Conditions:

High geomagnetic activity

Action: Return Data

Specification needed for query to CEDARWEB

Instrument

Parameter(s)

Operating Mode

Observatory

Date/time

Return-type: data

NeutralAtmosphere is a subRealm of TerrestrialAtmosphere

hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.

hasSpatialDomain: [0,360],[0,180],[100,150]

hasTemporalDomain:

NeutralTemperature is a Temperature (which) is a Parameter

FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument

hasFilterCentralWavelength: Wavelength

hasLowerBoundFormationHeight: Height

ArcticCircle is a GeographicRegion

hasLatitudeBoundary:

hasLatitudeUpperBoundary:

GeoMagneticActivity has ProxyRepresentation

GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)

Kp is a GeophysicalIndex hasTemporalDomain: “daily”

hasHighThreshold: xsd_number = 8

Date/time when KP => 8

Page 15: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

16Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

Semantic filtering by domain or instrument hierarchy

Page 16: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

17

Page 17: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

18

Inferred plot type and return formats for data products

Page 18: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org

Web Service

Page 19: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

21

http://dataportal.ucar.edu/schemas/vsto_all.owl

Page 20: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

22

VSTO Notable progress

• Conceptual model and architecture developed by combined team; KR experts, domain experts, and software engineers

• Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year)

• Production portal released, includes security, etc. with community migration (and so far endorsement)

• VSTO ontology version 1.0, (vsto.owl) • Web Services encapsulation of semantic interfaces • More Solar Terrestrial use-cases to drive the completion of

the ontologies - filling out the instrument ontology• Using ontologies in other applications (volcanoes, climate, …)

Page 21: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

23

Developing ontologies

• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)

• Identify classes and properties (leverage controlled vocab.)– VSTO - narrower terms, generalized easily– Data integration - required broader terms– Adopted conceptual decomposition (SWEET) – Imported modules when concepts were orthogonal

• Minimal properties to start, add only when needed• Mid-level to depth - i.e. neither TD nor BU• Review, review, review, vet, vet, vet, publish - www.planetont.org

(experiences, results, lessons learned, AND your ontologies AND discussions)

• Only code them (in RDF or OWL) when needed (CMAP, …)• Ontologies: small and modular

Page 22: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

24

What has KR done for us?

• In addition to valued added noted previously - some of which is transparent

• Allowed scientists to get data from instruments they never knew of before

• Reduced the need for 8 steps to query to 3 and reduced choices at each stage

• Allowed augmentation and validation of data• Useful and related data provided without having to

be an expert to ask for it• Integration and use (e.g. plotting) based on

inference• Ask and answer questions not possible before

Page 23: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

25

• Scaling to large numbers of data providers• Crossing disciplines• Security, access to resources, policies• Branding and attribution (where did this data come

from and who gets the credit, is it the correct version, is this an authoritative source?)

• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)

• Data quality, preservation, stewardship, rescue• Interoperability at a variety of levels (~3)

Issues for Virtual Observatories

Semantics can help with many of these

Page 24: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

26

Final remarks

• Many geoscience VOs are in production• Unified workflow around Instrument/Parameter/ Data-Time• Working on event/phenomenon• To date, the ontology creation and evolution process is working well• Informatics efforts in Geosciences are exploding

– GeoInformatics Town Hall at EGU meeting Thu lunchtime, Apr. 19 2007 in Vienna and 3 geoinformatics sessions (US10, GI10/GI11 and NH12)

– VO conference - June 11-15 2007 in Denver, CO

– e-monograph to document state of VOs

– NEW Journal of Earth Science Informatics

– Special issue of Computers and Geosciences: “Knowledge Representation in Earth and Space Science Cyberinfrastructure”

• Ongoing activities for VOs through 2008 under the auspices of the Electronic Geophysical Year (eGY; www.egy.org)

• Contact [email protected], [email protected]

Page 25: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

27

Garage

Page 26: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

28

Page 27: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

29

Page 28: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

30

[SO2]

t,X

WOVODAT CDML

Data constr.

WOVODAT MD CMDL MD

Spectrometer

Mass Spec

MultiCollect. Mass Spec

Page 29: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

31

What about Earth Science?

• SWEET (Semantic Web for Earth and Environmental Terminology)

– http://sweet.jpl.nasa.gov– based on GCMD terms– modular using faceted and

integrative concepts

• VSTO (Virtual Solar-Terrestrial Observatory)

– http://vsto.hao.ucar.edu– captures observational data (from

instruments)– modular, using application domains

• GEON– http://www.geongrid.org– Planetary material, rocks, minerals,

elements– modular, in ‘packages’

• SESDI– http://sesdi.hao.ucar.edu– broad discipline coverage, diverse

data– highly modular

• MMI– http://marinemetadata.org– captures aspects of marine data,

ocean observing systems– partly modular, mostly by developed

project• GeoSciML

– http://www.opengis.net/GeoSciML/– is a GML (Geography ML)

application language for Geoscience– modular, in ‘packages’

• More…• Visit swoogle.umbc.edu and

planetont.org

Page 30: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

32

Page 31: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

33

Page 32: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

34

Page 33: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

35

Sector Field Mass Spectrometer

VG Sector 54

Secondary Ion Mass Spectrometer

CamecaSHRIMP

Accelerator Mass Spectrometer

Inductively Coupled Plasma Mass Spectrometer

Spark Source Mass Spectrometer

Glow Discharge Mass Spectrometer

Knudsen Effusion Mass Spectrometer

Mattauch Herzog Small Volume Mass Spectrometer

Resonance Ionizat ion Mass Spectrometer

Fourier Transform Ion Cyclotron Resonance Mass Spectrometer

Mass Spectrometer

Page 34: 1 Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories Peter Fox * * HAO/ESSL/NCAR Deborah McGuinness $#, Luca Cinquini.

36

Data Types

Volcanic System

Climate

Phenomenon Material

Instruments

ImportNASA: Semantic Web for Earth Science

Numerics Ontology

ImportNASA: Semantic Web for Earth Science

Units Ontology

ImportNASA: Semantic Web for Earth Science

Physical Property Ontology

ImportNASA: Semantic Web for Earth Science

Physical Phenomena Ontology

Planetary Material

Planetary Structure

Physical Properties

PlanetaryLocation

Geologic Time

GeoImage

PlanetaryPhenomenon

IMPORT EXISTINGONTOLOGIES

SWEET GEON