1
Using Semantically-Enabled Data Frameworks for Data Integration in Virtual Observatories
Peter Fox*
Deborah McGuinness$#, Luca Cinquini%, Rob Raskin!, Krishna Sinha^
Patrick West*, Jose Garcia*, Tony Darnell*, James Benedict$, Don Middleton%, Stan Solomon*
*HAO/ESSL/NCAR
$McGuinness Associates
#Knowledge Systems and AI Lab, Stanford Univ.
%SCD/CISL/NCAR
!JPL/NASA
^Virginia Tech
Work funded by NSF and NASA
2
Paradigm shift for NASA
• From: Instrument-based
• To: Measurement-based (to find and use data irrespective of which instrument obtained it)
• Requires: ‘bridging the discipline data divide’
• Overall vision: To integrate information technology in support of advancing measurement-based processing systems for NASA by integrating existing diverse science discipline and mission-specific data sources.
[Diagram labels: Science data (two sources); Integration semantics: SWEET++ ontology, process-oriented semantic content represented in SWSL, articulation axioms; Application, e.g. statistical analysis]
3
Compilation of the distribution of volcanic ash associated with large eruptions. Note the continental-scale ash fall associated with the Yellowstone eruption ~600,000 years ago. Geologic databases provide information about the magnitude of the eruption; assessing its impact on atmospheric chemistry and on reflectance associated with particulate matter requires integrating concepts that bridge terrestrial and atmospheric ontologies.
4
Why we were led to semantics
• When we integrate, we integrate concepts, terms, etc.
• In the past we would: ask, guess, research a lot, or give up
• It’s pretty much about meaning
• Semantics can really help find, access, integrate, use, explain, trust…
• What if you…
- could use not only your own data and tools but also remote colleagues’ data and tools?
- understood their assumptions, constraints, etc. and could evaluate applicability?
- knew whose research currently (or in the future) would benefit from your results?
- knew whose results were consistent (or inconsistent) with yours?…
5
6
[Diagram: plate tectonics ontology class diagram; several associations carry the multiplicity 1..*]
Classes shown: Failed Rift; Supra Subduction Zone Complex; Core Complex; Fore Arc; Back Arc; Active Arc; Trench; Accretionary Complex; Tectonic Assemblage; Terrane; Erosion Process; Collapse Process; Stabilizing Event; Uplift Process; Extension Event; Extension Process; Spreading Event; Spreading Process; Asthenosphere; Mesosphere; Core; Earth; Lithosphere; Plate; Oceanic Plateau; Oceanic Lithosphere; Mid-Ocean Ridge; Transform Process; Collision Event; Subduction Event; Subduction Process; Active Continent; Continental Lithosphere; Stable Continent
7
Excerpt from Plate Tectonics Ontology
9
Virtual Observatory schematics
A Virtual Observatory is a suite of software applications on a set of computers that allows users to uniformly find, access, and use resources (data, software, etc.) from a collection of distributed product repositories and service providers: a service that unites services and/or multiple repositories.
• Conceptual examples:
• In-situ: Virtual measurements
– Sensors, etc. everywhere
– Related measurements
• Remote sensing: Virtual, integrative measurements
– Data integration
• Both usage patterns lead to additional data management challenges at the source and for users; now managing virtual ‘datasets’
10
What should a VO do?
• Make “standard” scientific research much more efficient.
– Even the principal investigator (PI) teams should want to use them.
– Must improve on existing services (mission and PI sites, etc.). VOs will not replace these, but will use them in new ways.
– Access for young researchers, non-experts, other disciplines
• Enable new, global problems to be solved.
– Rapidly gain integrated views, e.g. from the solar origin to the terrestrial effects of an event.
– Find meaningful data related to any particular observation or model.
– (Ultimately) answer “higher-order” queries such as “Show me the data from cases where a large coronal mass ejection observed by the Solar and Heliospheric Observatory was also observed in situ” (science-speak) or “What happens when the Sun disrupts the Earth’s environment?” (general public)
11
The Astronomy approach
[Diagram: VO applications (VO App1, VO App2, VO App3) reach databases (DB1, DB2, DB3, … DBn) through a VO layer]
VO layer: VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol
Limited interoperability
Lightweight semantics
Limited meaning, hard coded
Limited extensibility
Under review
12
[Diagram: VOs (VO1, VO2, VO3) reach databases (DB1, DB2, DB3, … DBn) through semantic mediation layers; each level is marked “Added value”]
Semantic mediation layer - VSTO - low level
Semantic mediation layer - mid-upper-level
Education, clearinghouses, other services, disciplines, etc.
Levels: Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability
Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
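The mapping step in the mediation layer can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the repository names (`repo_a`, `repo_b`), the local parameter codes, and the `AccessRequest` shape are hypothetical and are not taken from VSTO. The sketch only shows the idea of turning one ontology concept into per-repository access requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """One request to one repository, in that repository's own vocabulary."""
    repository: str
    local_parameter: str
    start: str
    end: str


# Hypothetical lookup: ontology concept -> each repository's local name for it.
CONCEPT_TO_LOCAL = {
    "NeutralTemperature": {"repo_a": "tn", "repo_b": "neutral_temp_K"},
    "NeutralWind": {"repo_a": "vn"},
}


def mediate(concept, start, end):
    """Map a concept-level query to one access request per repository holding it."""
    local_names = CONCEPT_TO_LOCAL.get(concept, {})
    return [
        AccessRequest(repo, name, start, end)
        for repo, name in sorted(local_names.items())
    ]


access_requests = mediate("NeutralTemperature", "2000-01-01", "2000-01-02")
for req in access_requests:
    print(req)
```

The user asks in terms of the concept; only the lookup table knows how each repository spells it. A concept with no mapping yields no requests.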
13
Integrative use-case:
Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the Arctic Circle (above 45N) at any time of high geomagnetic activity.
Translate this into a complete query for data.
What information do we have to extract from the use-case?
What information can we infer (and integrate)?
Returns data from instruments, indices and models!!
14
Translating the Use-Case - non-monotonic?
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
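The chain from “high geomagnetic activity” to a date/time constraint can be sketched as follows. The index name, temporal domain, and threshold of 8 come from the slide. The class layout, the `PROXY_FOR` lookup, and the sample Kp values are assumptions for illustration only; the values are made up and are not observations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GeophysicalIndex:
    """A proxy representation of a phenomenon (class layout is an assumption)."""
    name: str
    temporal_domain: str
    high_threshold: float


# From the slide: Kp, hasTemporalDomain "daily", hasHighThreshold = 8.
KP = GeophysicalIndex(name="Kp", temporal_domain="daily", high_threshold=8)

# Hypothetical lookup: use-case condition -> its proxy index.
PROXY_FOR = {"GeoMagneticActivity": KP}


def dates_of_high_activity(condition, index_values):
    """Return the dates on which the proxy index meets its high threshold."""
    index = PROXY_FOR[condition]
    return [
        day
        for day, value in sorted(index_values.items())
        if value >= index.high_threshold
    ]


# Made-up daily values, for illustration only.
sample_kp = {
    date(2000, 1, 1): 3.0,
    date(2000, 1, 2): 8.0,
    date(2000, 1, 3): 9.0,
    date(2000, 1, 4): 7.7,
}
high_days = dates_of_high_activity("GeoMagneticActivity", sample_kp)
print(high_days)
```

The resulting dates would fill the Date/time slot of the query specification. The user never has to name Kp or know its threshold.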
15
Translating the Use-Case - ctd.
Input
Physical properties: State of neutral atmosphere
Spatial:
Above 100km
Toward arctic circle (above 45N)
Conditions:
High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.
hasSpatialDomain: [0,360],[0,180],[100,150]
hasTemporalDomain:
NeutralTemperature is a Temperature (which) is a Parameter
FabryPerotInterferometer is an Interferometer, (which) is an Optical Instrument (which) is an Instrument
hasFilterCentralWavelength: Wavelength
hasLowerBoundFormationHeight: Height
ArcticCircle is a GeographicRegion
hasLatitudeBoundary:
hasLatitudeUpperBoundary:
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
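The “is a” statements above support a simple inference: follow the links upward to find which query slot a term fills. This is a minimal sketch under stated assumptions. The links are copied from the slide, but the dict representation and the helper names (`ancestors`, `slot_for`) are hypothetical, and a real ontology would be queried through a reasoner rather than a hand-written loop.

```python
# Is-a links copied from the slide ("Optical Instrument" written as one
# identifier); storing them in a dict is an assumption of this sketch.
IS_A = {
    "FabryPerotInterferometer": "Interferometer",
    "Interferometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
    "NeutralTemperature": "Temperature",
    "Temperature": "Parameter",
}

# Two of the query slots listed under "Specification needed for query".
QUERY_SLOTS = ("Instrument", "Parameter")


def ancestors(cls):
    """Follow is-a links upward and return every superclass, nearest first."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def slot_for(term):
    """Return the query slot a term falls under, or None if it has none."""
    for slot in QUERY_SLOTS:
        if term == slot or slot in ancestors(term):
            return slot
    return None


fpi_chain = ancestors("FabryPerotInterferometer")
print(fpi_chain)
print(slot_for("NeutralTemperature"))
```

Terms with no is-a path to a slot, such as ArcticCircle, return None here; they would be handled by the spatial part of the query instead.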
16
Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Semantic filtering by domain or instrument hierarchy
17
18
Inferred plot type and return formats for data products
VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org
Web Service
21
http://dataportal.ucar.edu/schemas/vsto_all.owl
22
VSTO Notable progress
• Conceptual model and architecture developed by combined team: KR experts, domain experts, and software engineers
• Semantic framework developed and built with a small, cohesive, carefully chosen team in a relatively short time (deployments in 1st year)
• Production portal released, includes security, etc. with community migration (and so far endorsement)
• VSTO ontology version 1.0 (vsto.owl)
• Web Services encapsulation of semantic interfaces
• More Solar Terrestrial use-cases to drive the completion of the ontologies - filling out the instrument ontology
• Using ontologies in other applications (volcanoes, climate, …)
23
Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
– VSTO - narrower terms, generalized easily
– Data integration - required broader terms
– Adopted conceptual decomposition (SWEET)
– Imported modules when concepts were orthogonal
• Minimal properties to start, add only when needed
• Mid-level to depth - i.e. neither top-down (TD) nor bottom-up (BU)
• Review, review, review, vet, vet, vet, publish - www.planetont.org (experiences, results, lessons learned, AND your ontologies AND discussions)
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
24
What has KR done for us?
• In addition to value added noted previously - some of which is transparent
• Allowed scientists to get data from instruments they never knew of before
• Reduced the number of steps to query from 8 to 3 and reduced choices at each stage
• Allowed augmentation and validation of data
• Useful and related data provided without having to be an expert to ask for it
• Integration and use (e.g. plotting) based on inference
• Ask and answer questions not possible before
25
Issues for Virtual Observatories
• Scaling to large numbers of data providers
• Crossing disciplines
• Security, access to resources, policies
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Data quality, preservation, stewardship, rescue
• Interoperability at a variety of levels (~3)
Semantics can help with many of these
26
Final remarks
• Many geoscience VOs are in production
• Unified workflow around Instrument/Parameter/Date-Time
• Working on event/phenomenon
• To date, the ontology creation and evolution process is working well
• Informatics efforts in Geosciences are exploding
– GeoInformatics Town Hall at EGU meeting Thu lunchtime, Apr. 19 2007 in Vienna and 3 geoinformatics sessions (US10, GI10/GI11 and NH12)
– VO conference - June 11-15 2007 in Denver, CO
– e-monograph to document state of VOs
– NEW Journal of Earth Science Informatics
– Special issue of Computers and Geosciences: “Knowledge Representation in Earth and Space Science Cyberinfrastructure”
• Ongoing activities for VOs through 2008 under the auspices of the Electronic Geophysical Year (eGY; www.egy.org)
• Contact [email protected], [email protected]
27
Garage
28
29
30
[Diagram labels: [SO2]; t, X; WOVODAT; CDML; Data constr.; WOVODAT MD; CMDL MD; Spectrometer; Mass Spec; MultiCollect. Mass Spec]
31
What about Earth Science?
• SWEET (Semantic Web for Earth and Environmental Terminology)
– http://sweet.jpl.nasa.gov
– based on GCMD terms
– modular using faceted and integrative concepts
• VSTO (Virtual Solar-Terrestrial Observatory)
– http://vsto.hao.ucar.edu
– captures observational data (from instruments)
– modular, using application domains
• GEON
– http://www.geongrid.org
– Planetary material, rocks, minerals, elements
– modular, in ‘packages’
• SESDI
– http://sesdi.hao.ucar.edu
– broad discipline coverage, diverse data
– highly modular
• MMI
– http://marinemetadata.org
– captures aspects of marine data, ocean observing systems
– partly modular, mostly by developed project
• GeoSciML
– http://www.opengis.net/GeoSciML/
– is a GML (Geography ML) application language for Geoscience
– modular, in ‘packages’
• More…
• Visit swoogle.umbc.edu and planetont.org
32
33
34
35
Sector Field Mass Spectrometer
VG Sector 54
Secondary Ion Mass Spectrometer
Cameca
SHRIMP
Accelerator Mass Spectrometer
Inductively Coupled Plasma Mass Spectrometer
Spark Source Mass Spectrometer
Glow Discharge Mass Spectrometer
Knudsen Effusion Mass Spectrometer
Mattauch Herzog Small Volume Mass Spectrometer
Resonance Ionization Mass Spectrometer
Fourier Transform Ion Cyclotron Resonance Mass Spectrometer
Mass Spectrometer
36
[Diagram: ontology packages and imports]
Packages: Data Types; Volcanic System; Climate; Phenomenon; Material; Instruments
Imported from NASA Semantic Web for Earth Science: Numerics Ontology; Units Ontology; Physical Property Ontology; Physical Phenomena Ontology
Other packages shown: Planetary Material; Planetary Structure; Physical Properties; Planetary Location; Geologic Time; GeoImage; Planetary Phenomenon
Import existing ontologies: SWEET, GEON