Ontologies in Data and Application Integration – an Update
Kai Lin
Bertram Ludäscher
Knowledge-Based Information Systems Lab
Data and Knowledge Systems (DAKS)
San Diego Supercomputer Center
University of California San Diego
http://www.geongrid.org
GEON PI Meeting, VTech March 21—23rd 2004 2
Outline
1. Motivation
2. Ontology Cheat Sheet
3. Ontology-enabled Prototypes and Tools
4. Data & Service Registration (Structural + Semantic)
5. Scientific Workflows
Ontology Cheat Sheet (1/2)
• What is an ontology? An ontology usually …
  – specifies a theory (a set of models) by …
  – defining and relating …
  – concepts representing features of a domain of interest
• Also an overloaded (sometimes sloppy) term for:
  – Controlled vocabularies
  – Database schema (relational, XML, …)
  – Conceptual schema (ER, UML, …)
  – Thesauri (synonyms, broader term/narrower term)
  – Taxonomies
  – Informal/semi-formal representations
    • “Concept spaces”, “concept maps”
    • Labeled graphs / semantic networks (RDF)
  – Formal ontologies, e.g., in [Description] Logic (OWL)
    • “formalization of a specification” constrains possible interpretation of terms
A Multi-Hierarchical Rock Classification “Ontology” (GSC)
(Figure: rock classification hierarchies by Composition, Genesis, Fabric, and Texture)
Ontology Cheat Sheet (2/2)
• What are ontologies used for?
  – Conceptual models of a domain or application (communication means, system design, …)
  – Classification of …
    • concepts (taxonomy) and
    • data/object instances through classes
  – Analysis of ontologies, e.g.
    • Graph queries (reachability, path queries, …)
    • Reasoning (concept subsumption, consistency checking, …)
  – Targets for semantic data registration
  – Conceptual indexes and views for
    • searching,
    • browsing,
    • querying, and
    • integration of registered data
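The graph-query and classification uses listed above can be sketched in a few lines of Python. This is only an illustration, not the GEON implementation: the is-a edges and rock class names below are hypothetical, and subsumption is approximated by plain reachability over a taxonomy rather than by description-logic reasoning.

```python
from collections import deque

# Hypothetical is-a edges (child -> parents); names are illustrative only.
IS_A = {
    "Granite": ["PlutonicRock"],
    "PlutonicRock": ["IgneousRock"],
    "Basalt": ["VolcanicRock"],
    "VolcanicRock": ["IgneousRock"],
    "IgneousRock": ["Rock"],
}

def ancestors(concept):
    """Return every concept reachable from `concept` along is-a edges."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in IS_A.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def is_subsumed_by(sub, sup):
    """Taxonomy-only subsumption test: `sup` is `sub` itself or an ancestor."""
    return sub == sup or sup in ancestors(sub)

def classify(instances, concept):
    """Select data instances whose class falls under `concept`."""
    return [name for name, cls in instances if is_subsumed_by(cls, concept)]
```

For example, `classify([("s1", "Granite"), ("s2", "Shale")], "IgneousRock")` keeps only `s1`, which is the kind of concept-based selection a registered dataset would support.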
Application Example: Geologic Map Integration
(Figure: geologic maps of Nevada; labels: domain knowledge, knowledge representation, “Ontologies!?”)
GEON Metamorphism Equation: Geoscientists + Computer Scientists → Igneous Geoinformaticists (+/- energy, +/- a few hundred million years)
Geologic Map Integration in the Portal
• After registering datasets, ontologies (here: “classes”), and an application (“OMI”), the datasets can be searched and displayed in an integrated way.
Concept-Based Queries and Analysis
• After registering a source with one or more ontologies, concept-based queries and analysis can be launched
• Here: light-weight client-side processing (SVG)
Ontologies and Data Management
• Where do ontologies fit within data management architectures?
• Several answers, specifically:
  – An ontology is similar to a schema or conceptual model (if one exists), but is
  – Developed independently of a particular application
  – Probably given in a different language
  – Inherently more general
  – Usually not a very good schema (weak structure)
Ontologies and Data Management (watch out for Semantic Data Registration later)
(Figure labels: Schema, Conceptual Model, Ontology; Data, Metadata, Design Artifact; relation: “use concepts from (explicitly or implicitly)”)
Creating and Sharing Concept Maps (here: Seismology concept map & Cmap tool)
• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate from napkin drawings, to concept maps, to ontologies
Graph (RDF) Queries on Ontologies
visualisation
RQL Query: Show all “products”
Query Results
Community-Based Ontology Development
• Draft of a geochemistry ontology developed by scientists
Current concept maps and emerging ontologies:
1. Igneous Rocks/Plutons
2. Seismology
3. Geochemistry
Protégé (… not so ezOWL yet…)
Sparrow (a poor man’s OWL tool …)
Simple ASCII-based RDF and OWL entry and manipulation
Semantic Data Registration (joint work w/ Shawn Bowers)
What is Data/Ontology/… Registration?
• A mechanism by which data sources, ontologies, services, …
• … are published in a repository/registry
• for the purpose of “smart” discovery, querying, integration
Things to Register
• Data files (individual files)
  – Shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (has schema and can be queried)
  – Shapefile with schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications
Connecting Datasets to Ontologies
Dataset

Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

Ontology (snippet)

DataCollectionEvent       ⊑ contains.Measurement
Measurement               ⊑ measureOf.MeasurableItem ⊓ hasContext.MeasurementContext
MeasurementContext        ⊑ hasTime.DateTime ⊓ hasLocation.Location
MeasurableItem            ⊑ hasUnit.Unit ⊓ hasValue.UnitValue
SpeciesCount              ⊑ MeasurableItem ⊓ hasSpecies.Species ⊓ hasUnit.RatioUnit
…
SpeciesAbundance          ⊑ Measurement ⊓ measureOf.SpeciesCount
AbundanceCollectionEvent  ⊑ DataCollectionEvent ⊓ contains.SpeciesAbundance
Location                  ⊑ position.Coordinate
LTERSite                  ⊑ Location
SBLTERSite                ⊑ LTERSite ⊓ position.SBLTERCoordinate
{naples,…}                ⊑ SBLTERSite

How can we “register” the dataset to concepts in the ontology?
Step 1: Selecting Relevant Concepts

Dataset: same table as on the previous slide (columns Date, Site, Transect, SP_Code, Count)

Concepts from an Ontology
• DataCollectionEvent
  • AbundanceCollectionEvent
• Measurement
  • Abundance
    • SpeciesAbundance
• MeasurableItem
  • SpeciesCount
• Location
  • LTERSite
    • SBLTERSite
      • naples
• Species
  • …
• MeasurementContext
  • …
Step 2: Generate Object Model

(Figure: object model as a labeled graph. Nodes: AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, RatioUnit, RatioValue, DateTime, SBLTERSite. Edge labels: contains, measureOf, hasSpecies, hasUnit, hasValue, hasTime, hasLoc.)

Concepts from an Ontology: same list as in Step 1
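As a rough illustration of what a generated object model could look like, the sketch below turns one dataset row into a nested structure typed by the selected concepts. The column and concept names come from the slides; the function names, the choice of which node each property attaches to, and the unit label are assumptions made for this example only.

```python
# Two rows from the slide's dataset, keyed by its column names.
ROWS = [
    {"Date": "2000-09-08", "Site": "CARP", "Transect": 1, "SP_Code": "CRGI", "Count": 0},
    {"Date": "2000-09-28", "Site": "BULL", "Transect": 1, "SP_Code": "CYOS", "Count": 57},
]

def to_object_model(row):
    """Build one AbundanceCollectionEvent object from a dataset row.

    Where hasTime/hasLoc attach in the graph is an assumption of this sketch.
    """
    return {
        "type": "AbundanceCollectionEvent",
        "hasTime": {"type": "DateTime", "value": row["Date"]},
        "hasLoc": {"type": "SBLTERSite", "value": row["Site"]},
        "contains": {
            "type": "SpeciesAbundance",
            "measureOf": {
                "type": "SpeciesCount",
                "hasSpecies": {"type": "Species", "value": row["SP_Code"]},
                "hasUnit": {"type": "RatioUnit", "value": "count"},  # assumed unit label
                "hasValue": {"type": "RatioValue", "value": row["Count"]},
            },
        },
    }

def values_of(node, concept):
    """Collect the values of all nodes typed with `concept`, depth first."""
    found = []
    if node.get("type") == concept and "value" in node:
        found.append(node["value"])
    for child in node.values():
        if isinstance(child, dict):
            found.extend(values_of(child, concept))
    return found

events = [to_object_model(row) for row in ROWS]
```

Once rows are lifted into such objects, a concept-level question such as "all Species values" becomes `values_of(event, "Species")`, independent of the original column name `SP_Code`.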
Applications of Semantic Registration
• Mentioned before:
  – Smart data discovery, integration etc.
• New application:
  – Generating data transformations semi-automatically for chaining together computational services
Problem: Service Reusability
• Unless “designed to fit,” independent services are structurally incompatible
• Generally, the source output type will not be a subtype of the target input type
(Figure: SourceService with output port Ps, TargetService with input port Pt, and the desired connection Ps → Pt. The structural type of Ps is incompatible (⋠) with the structural type of Pt.)
Service Reusability
• A data transformation mapping is required to connect the services … artificially creating subtype compatibility
• If such a mapping exists, the services are “structurally feasible”
(Figure: SourceService port Ps and TargetService port Pt with the desired connection. The structural types are incompatible (⋠), but the structural type of Ps after applying the mapping is compatible (≺) with the structural type of Pt.)
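A toy version of structural feasibility, under strong simplifying assumptions: structural types are flat records (field name to type name), subtyping is width subtyping on records, and the mapping is a plain field renaming. The field names and helper functions are hypothetical and are not taken from the framework itself.

```python
def is_subtype(source, target):
    """Record (width) subtyping: source offers every field the target requires,
    with the same field type."""
    return all(source.get(field) == ftype for field, ftype in target.items())

def apply_mapping(struct_type, renames):
    """Transform a structural type by renaming its fields."""
    return {renames.get(field, field): ftype for field, ftype in struct_type.items()}

# Hypothetical structural types of the two ports.
SOURCE_OUT = {"cnt": "int", "lsp": "str"}    # output type of Ps
TARGET_IN = {"obs": "int", "phase": "str"}   # input type of Pt

# Hypothetical mapping that makes the connection structurally feasible.
MAPPING = {"cnt": "obs", "lsp": "phase"}

directly_compatible = is_subtype(SOURCE_OUT, TARGET_IN)
feasible = is_subtype(apply_mapping(SOURCE_OUT, MAPPING), TARGET_IN)
```

Here `directly_compatible` is `False` while `feasible` is `True`: the services cannot be wired together as they are, but they can once the mapping is applied to the source output.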
Service Reusability
• Idea:
  – annotate services with semantic types (concept expressions), primarily for discovery of services
(Figure: SourceService port Ps and TargetService port Pt, each annotated with a semantic type drawn from ontologies (OWL). For the desired connection, the semantic types are compatible (⊑).)
Service Reusability
• Services can be semantically compatible, but structurally incompatible
(Figure: SourceService port Ps and TargetService port Pt, each with a semantic type from ontologies (OWL) and a structural type. For the desired connection, the semantic types are compatible (⊑) while the structural types are incompatible (⋠); the structural type of Ps after applying a mapping is compatible (≺).)
The Ontology-Driven Framework (work w/ Shawn Bowers, SEEK)
(Figure: SourceService port Ps and TargetService port Pt are linked to ontologies (OWL) by registration mappings (output and input) between their structural and semantic types. For the desired connection, the semantic types are compatible (⊑); the registration mappings yield correspondences between the structural types, from which the transformation applied to Ps is generated.)
Example Generated Data Transformation (in XQuery)
• Based on the structural correspondences and certain assumptions, we derive the transformation query:
<cohortTable> {
  for $s in /population/sample
  return
    <measurement>
      { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
      { for $l in $s/lsp return <phase>{$l/text()}</phase> }
    </measurement>
} </cohortTable>
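To make the effect of this query concrete, here is a rough Python equivalent using the standard library's `xml.etree.ElementTree`. The sample input document is made up for illustration; only the element names (`population`, `sample`, `meas`, `cnt`, `lsp`, `cohortTable`, `measurement`, `obs`, `phase`) come from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the element values are illustrative only.
SAMPLE_INPUT = """
<population>
  <sample><meas><cnt>3</cnt><cnt>5</cnt></meas><lsp>larva</lsp></sample>
  <sample><meas><cnt>7</cnt></meas><lsp>adult</lsp></sample>
</population>
""".strip()

def to_cohort_table(population):
    """Mirror the XQuery: one <measurement> per sample, with an <obs> per
    meas/cnt value followed by a <phase> per lsp value."""
    table = ET.Element("cohortTable")
    for sample in population.findall("sample"):
        measurement = ET.SubElement(table, "measurement")
        for cnt in sample.findall("meas/cnt"):
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("lsp"):
            ET.SubElement(measurement, "phase").text = lsp.text
    return table

result = to_cohort_table(ET.fromstring(SAMPLE_INPUT))
print(ET.tostring(result, encoding="unicode"))
```

The structure of the loops follows the two nested `for` clauses of the query, so the same restructuring (flattening `meas/cnt` into `obs`, renaming `lsp` to `phase`) is visible in both versions.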
Scientific Workflows (Efrat Jaeger et al.)
Reverse Engineering a Scientific Workflow using the KEPLER Tool (Efrat Jaeger)
A Scientific Workflow in Kepler
Extract mineral composition for row Id.
Igneous Rock Diagrams information.
Rock Name.
Reverse-Engineered the Geological Map Integration in Kepler
DataMapper Sub-Workflow
Result launched via the BrowserUI actor
KEPLER and YOU
• Kepler …
  – is a community-based, cross-project, open source collaboration
  – for “minute made” application integration
  – using web (grid) services as basic building blocks
  – has a joint CVS repository, mailing lists, web site, …
  – is gaining momentum thanks to contributors and contributions
• BSD-style license allows commercial spin-offs
  – a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…
F I N – Questions?
Additional Material
The KEPLER GUI (Vergil from Ptolemy II)
Drag and drop utilities, director and actor libraries.
Running the workflow
Distributed Workflows in KEPLER
• Web and Grid Service plug-ins
  – WSDL
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
  – SRB
  – SSH, SCP
• Web Service Harvester
  – Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• XSLT and XQuery transformers to link non-fitting services together
• Web Service Deployment (…ongoing work…)
A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, dynamically customizes itself to implement and execute that method.
Configure - select service operation
Set Parameters and Commit
WS Actor after Instantiation
Web Service Harvester
• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.
Composing 3rd-Party WSs
Output of previous web service
User interaction & Transformations
Input of next web service
Providing DB Access through Kepler
• Database connection actor:
  – Opening a database connection and passing it to all actors accessing this database.
• Database query actor:
  – A generic actor that queries a database and provides its result.
• DBConnection type and DBConnectionToken:
  – A new IOPort type and a token to distinguish a database connection from any general type.
Database Connection Actor
OpenDBConnection actor:
• Input: database connection information.
• Output: A DBConnectionToken, a reference to a database connection instance, through a DBConnection output port.
Database Query Actor
Database Query actor:
Input: A query string (SQL) and a database connection reference.
Parameters: output type (XML, Record, or String); whether to output each row separately or all rows at once.
Process: Execute the query and produce results according to the parameters.
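The division of labor between the two actors can be imitated with Python's standard `sqlite3` module. This is not Kepler code (Kepler actors are written in Java on top of Ptolemy II); the function names, the `row_by_row` flag, and the in-memory database are assumptions of this sketch, which only mirrors the inputs, parameters, and outputs described above.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_db_connection(database=":memory:"):
    """Connection actor: open the connection once and hand the reference
    to every downstream actor that needs it."""
    return sqlite3.connect(database)

def query_actor(conn, sql, output_type="Record", row_by_row=False):
    """Query actor: run `sql` on the shared connection and format the result
    as records (dicts), strings, or XML fragments."""
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    if output_type == "Record":
        results = [dict(zip(columns, row)) for row in rows]
    elif output_type == "String":
        results = [", ".join(str(value) for value in row) for row in rows]
    elif output_type == "XML":
        # Assumes column names are valid XML element names.
        results = []
        for row in rows:
            elem = ET.Element("row")
            for col, value in zip(columns, row):
                ET.SubElement(elem, col).text = str(value)
            results.append(ET.tostring(elem, encoding="unicode"))
    else:
        raise ValueError(f"unknown output type: {output_type}")
    # Either emit rows one at a time (an iterator) or all at once (a list).
    return iter(results) if row_by_row else results
```

Passing the connection object explicitly plays the role of the DBConnectionToken: several query actors can share one open connection instead of each opening its own.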
Querying Example
Resource Description Framework (RDF)
Simple data model that consists of
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)
Data organized into triples (subject, property, value)
(Figure: SonomaRegion --locatedIn--> CaliforniaRegion, where SonomaRegion is the subject (a resource), locatedIn is the property (a resource), and CaliforniaRegion is the value (a resource))
locatedIn(SonomaRegion, CaliforniaRegion)
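The triple model is simple enough to mimic with a set of Python tuples. The sketch below is only a didactic stand-in: real RDF identifies resources by URIs and would normally be handled with an RDF library, and the second and third triples are hypothetical additions to the slide's single example.

```python
# Each triple is (subject, property, value).
TRIPLES = {
    ("SonomaRegion", "locatedIn", "CaliforniaRegion"),  # from the slide
    ("CaliforniaRegion", "locatedIn", "USRegion"),      # hypothetical extra triple
    ("SonomaRegion", "label", "Sonoma"),                # hypothetical string value
}

def match(subject=None, prop=None, value=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    )
```

For instance, `match(prop="locatedIn")` returns both containment triples, and `match(subject="SonomaRegion")` returns everything stated about that resource.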
RDF Schema
Adds a set of pre-defined properties to define classes and properties
Allows instances to be connected to classes
Sub-class and sub-property (is-a) relationships
(Figure: SonomaRegion --locatedIn--> CaliforniaRegion; both SonomaRegion and CaliforniaRegion have rdf:type Region)
Region is a class
locatedIn is a property
locatedIn connects Regions
OWL
Adds additional pre-defined properties to further constrain an ontology (see http://www.w3.org/TR/owl-guide/)
Note: RDF(S) and OWL use XML
Some graphic tools exist (e.g., Protégé)

<owl:Class rdf:ID="Vintage">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasVintageYear"/>
      <owl:cardinality>1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>

A Vintage is a class that is a subclass of an unnamed class whose instances always have exactly one hasVintageYear property.
Note the uglified XML syntax… The good news: meant for parsers, not humans!
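To give a feel for what the cardinality restriction on Vintage says, the sketch below checks it against a handful of triples. Note the simplification: OWL reasoners work under open-world semantics and would not treat a missing or duplicated value as an outright error in this way, so this closed-world count is an assumption of the example, as are the instance names.

```python
def violates_cardinality(triples, cls, prop, n):
    """Return instances of `cls` that do not have exactly `n` values for `prop`
    (closed-world count over the given triples)."""
    instances = sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)
    return [
        inst for inst in instances
        if sum(1 for s, p, _ in triples if s == inst and p == prop) != n
    ]

# Hypothetical instance data; only Vintage and hasVintageYear come from the slide.
DATA = [
    ("v1998", "rdf:type", "Vintage"),
    ("v1998", "hasVintageYear", "1998"),
    ("vBad", "rdf:type", "Vintage"),
    ("vBad", "hasVintageYear", "1999"),
    ("vBad", "hasVintageYear", "2000"),
]

offenders = violates_cardinality(DATA, "Vintage", "hasVintageYear", 1)
```

With this data, `offenders` contains only `vBad`, the instance carrying two hasVintageYear values.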