An Open-World Iterative Methodology for the Development and Evaluation of Semantically-Enabled...

An Open-World Iterative Methodology for the Development and Evaluation of Semantically-Enabled Applications IAAI - Session 23F Robert S. Engelmore Award* Lecture Standing in for Deborah L. McGuinness* July 17, 2013 Peter Fox (RPI) [email protected] , @taswegian Tetherless World Constellation

Page 1: An Open-World Iterative Methodology for the Development and Evaluation of Semantically-Enabled Applications IAAI - Session 23F Robert S. Engelmore Award*

An Open-World Iterative Methodology for the Development and Evaluation of Semantically-Enabled Applications

IAAI - Session 23F

Robert S. Engelmore Award* Lecture

Standing in for Deborah L. McGuinness*

July 17, 2013

Peter Fox (RPI) [email protected], @taswegian

Tetherless World Constellation

Page 2:

Outline

• Origins of this effort and the working premise

• Semantics in 2004

• Ontologies and the software and production!

• Semantics between 2004 and 2009

• The design and development methodology

• The expressivity and implementability balance and one more …

• 2009-2013 …

• And, a bit about what we are up to and where it is going…

Page 3:

Origins (1) …

• In 2000-2001 the need for capturing and preserving knowledge in Earth sciences data settings became very clear but the barriers were high

• In 2004 we started a virtual observatory project based on semantic technologies

Page 4:

Content: Coupling Energetics and Dynamics of Atmospheric Regions

Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, instruments, models, …

Page 5:

Content: Mauna Loa Solar Observatory

Near real-time data from Hawaii from a variety of solar instruments. Source for space weather, solar variability, and basic solar physics

Page 6:

Working premise

Scientists – actually ANYONE - should be able to access a global, distributed knowledge base of scientific data and information that:

• appears to be integrated

• appears to be locally available

• is accessible in a suitable vocabulary

Then BAM - Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD

Page 7:

Origins (2) …

• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines; we needed implementations

• We knew we also needed integration and provenance (but that came later)

• We aimed to push semantics into our systems to build new ‘prototypes’ but we ‘failed’ ;-)

Page 8:

Early days of VOs

[Diagram: virtual observatories VO1, VO2, VO3; databases DB1, DB2, DB3, … DBn; a question mark]

Page 9:

The Astronomy approach; data-types as a service

[Diagram: applications VO App1, VO App2, VO App3; VO layer with VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol; databases DB1, DB2, DB3, … DBn]

Lightweight semantics

Limited meaning, hard coded

Limited extensibility

Page 10:

In 2004

Page 11:

Design and Development

• Only develop ontologies that were required to answer specific use cases

• Use whatever ontologies were available**

• Would rules be needed?

• We ignored query*

Page 12:

Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.

Provide semantically-enabled, smart data query services via the Web for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
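The "any order, any combination" requirement in the technical use case can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Record` type, the `find` function, and the catalog entries (including the "Radar" instrument and its parameter) are hypothetical and do not come from the VSTO code base.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    day: date


# Hypothetical catalog entries, invented for this example.
CATALOG = [
    Record("Fabry-Perot", "NeutralTemperature", date(2000, 1, 5)),
    Record("Fabry-Perot", "NeutralWind", date(2000, 1, 6)),
    Record("Radar", "ElectronDensity", date(2000, 2, 1)),
]


def find(records, instrument=None, parameter=None, start=None, end=None):
    """Return the records that satisfy every constraint that was supplied.

    Constraints are keyword arguments, so callers can give them in any
    order and in any combination; an omitted constraint matches everything.
    """
    checks = []
    if instrument is not None:
        checks.append(lambda r: r.instrument == instrument)
    if parameter is not None:
        checks.append(lambda r: r.parameter == parameter)
    if start is not None:
        checks.append(lambda r: r.day >= start)
    if end is not None:
        checks.append(lambda r: r.day <= end)
    return [r for r in records if all(check(r) for check in checks)]


january = find(CATALOG, end=date(2000, 1, 31), start=date(2000, 1, 1),
               instrument="Fabry-Perot")
```

Treating each constraint as an optional, independent filter is one simple way to avoid hard-coding a fixed Instrument, then Date-Time, then Parameter workflow.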

Page 13:

Use Case example

• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series.

• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is a) observatory
– Fabry-Perot is a interferometer is a optical instrument is a instrument
– Non-vertical mode is a instrument operating mode
– January 2000 is a date-time range
– Time is a independent variable/ coordinate
– Time series is a data plot is a data product

Page 14:

Knowledge representation

• Statements as triples: {subject-predicate-object}
– interferometer is-a optical instrument
– Fabry-Perot is-a interferometer
– Optical instrument has focal length
– Optical instrument is-a instrument
– Instrument has instrument operating mode
– Instrument has measured parameter
– Instrument operating mode has measured parameter
– NeutralTemperature is-a temperature
– Temperature is-a parameter

• A query*: select all optical instruments which have operating mode vertical

• An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature

• ISWC paper award 2006, IAAI best paper (2007), Fox et al. 2009 in Computers and Geosciences.
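The triples, the query, and the inference on this slide can be sketched in plain Python. This is a minimal illustration rather than the VSTO implementation, which used OWL and a reasoner: the CamelCase term names and the helpers `ancestors`, `subclasses_of`, and `inherited_properties` are assumptions made for the example.

```python
# The slide's statements, written as (subject, predicate, object) triples.
TRIPLES = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
}


def ancestors(term, triples):
    """Every class reachable from term by following is-a links transitively."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def subclasses_of(cls, triples):
    """Query: every term that is, directly or indirectly, a kind of cls."""
    terms = {s for s, p, o in triples if p == "is-a"}
    return {t for t in terms if cls in ancestors(t, triples)}


def inherited_properties(term, triples):
    """Inference: properties stated on term or on any of its ancestors."""
    classes = {term} | ancestors(term, triples)
    return {o for s, p, o in triples if p == "has" and s in classes}


optical = subclasses_of("OpticalInstrument", TRIPLES)
fabry_perot_props = inherited_properties("FabryPerot", TRIPLES)
```

Nothing in the triples says directly that a Fabry-Perot has an operating mode; that follows only from walking the is-a chain up to Instrument, which is the kind of conclusion the slide calls an inference.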

Page 15:

Fox - APAC 2007, Driving e-research: Grids and Semantics

[Diagram: databases DB1, DB2, DB3, … DBn; VO Portal, Web Serv., VO API; Semantic mediation layer - VSTO - low level; Semantic mediation layer – mid-upper-level; Education, clearinghouses, other services, disciplines, etc.]

[Diagram labels, each marked "Added value": Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability]

Mediation Layer

• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes

• Maps queries to underlying data

• Generates access requests for metadata, data

• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.

Page 16:

Fox - APAC 2007, Driving e-research: Grids and Semantics

Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

Semantic filtering by domain or instrument hierarchy

Page 17:

Page 18:

Inferred plot type and return required axes data

Page 19:

Semantic Web Services

Page 20:

Fox - APAC 2007, Driving e-research: Grids and Semantics

Semantic Web Services

OWL document returned using VSTO ontology - can be used either syntactically or semantically

Page 21:

http://escience.rpi.edu/schemas/vsto_all.owl

Page 22:

Developing ontologies

• Use cases and small team

• Identify classes and minimal properties (leverage controlled vocab.)

• Review, vet, publish

• Only code them (in RDF or OWL) when needed (CMAP, …)

• Ontologies: small and modular

Page 23:

Implications and OWL 1.0

• Lack of numeric support meant that the rules and procedural logic were implemented in Java, i.e. in the code

• On several occasions the tools (not to be named) pushed us into OWL-Full, introduced inconsistencies, etc.

• Finally, they stabilized, and in 2005 (and again in 2006 and twice in 2007) we had stable releases

Page 24:

Expressivity VSTO 1.0

Page 25:

Expressivity VSTO dev. version

Page 26:

Yikes

Page 27:

Ontologies and the software

• Protégé 2.x and then 3.x built from our ontology on the web

• Java class generation

• Eclipse as a development environment

• Leveraged a portal code base (from the Earth System Grid project)

Page 28:

Implementation choices

• Our big challenge was time – in use cases and in the representation
– Depending on the level of granularity there were > 200,000 day-time records, and > 70,000,000 sub-day time intervals – no triple store could handle this**

• Reasoning in finite time does not mean 3-4 secs!

• We descoped our effort to delay use cases such as: find all neutral temperature data around the summer solstice for the last decade

• We chose a minimal time encoding in the ontology and delegated that to a relational DB
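The split described in the last bullet, with concepts in the ontology and time intervals in a relational database, might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `MEASURED_BY` mapping stands in for the semantic layer, and the `record` table, its columns, and its rows are hypothetical rather than the VSTO schema.

```python
import sqlite3

# Stand-in for the semantic layer: which instruments measure a parameter.
# In the real system this would come from the ontology; here it is a dict.
MEASURED_BY = {"NeutralTemperature": ["FabryPerot"]}

# Stand-in for the relational store that holds the (many) time intervals.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (instrument TEXT, start_time TEXT, end_time TEXT)"
)
conn.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        ("FabryPerot", "2000-01-05T00:00:00", "2000-01-05T06:00:00"),
        ("FabryPerot", "2000-02-01T00:00:00", "2000-02-01T06:00:00"),
        ("Radar", "2000-01-10T00:00:00", "2000-01-10T01:00:00"),
    ],
)


def records_for(parameter, start, end):
    """Resolve the parameter semantically, then let SQL do the time filtering.

    Returns rows whose interval overlaps the half-open range [start, end).
    ISO 8601 timestamps of equal format compare correctly as strings.
    """
    instruments = MEASURED_BY.get(parameter, [])
    if not instruments:
        return []
    placeholders = ",".join("?" for _ in instruments)
    sql = (
        "SELECT instrument, start_time, end_time FROM record "
        f"WHERE instrument IN ({placeholders}) "
        "AND start_time < ? AND end_time > ? "
        "ORDER BY start_time"
    )
    return conn.execute(sql, [*instruments, end, start]).fetchall()


january_rows = records_for(
    "NeutralTemperature", "2000-01-01T00:00:00", "2000-02-01T00:00:00"
)
```

The semantic lookup stays small (one entry per parameter), while the interval comparison, which is where the tens of millions of rows would live, is left to the database.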

Page 29:

Page 30:

Fox - APAC 2007, Driving e-research: Grids and Semantics

VSTO - semantics and ontologies in an operational environment: www.vsto.org

Web Service

Page 31:

Semantic Web Benefits (1)

Unified/ abstracted query workflow: Parameters, Instruments, Date-Time

Decreased input requirements for query: in one case reducing the number of selections from eight to three

Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics

Page 32:

Semantic Web Benefits (2)

Semantic query support: only coherent queries are exposed (portal and services)

Semantic integration: mediated understanding of coordinate systems, relationships, data synthesis, transformations; returns independent variables and related parameters

A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
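Exposing only coherent queries can be pictured as narrowing each selection list to the choices that remain valid given what has already been picked. The Python sketch below is a hedged illustration only: the `MEASURES` mapping and both helper functions are hypothetical, and the instrument and parameter names other than the Fabry-Perot and neutral temperature are invented.

```python
# Hypothetical instrument-to-parameter knowledge, standing in for the ontology.
MEASURES = {
    "FabryPerot": {"NeutralTemperature", "NeutralWind"},
    "Radar": {"ElectronDensity", "IonTemperature"},
}


def selectable_parameters(instrument=None):
    """Parameters a user may still choose, given the instrument picked so far."""
    if instrument is None:
        return set().union(*MEASURES.values())
    return set(MEASURES.get(instrument, ()))


def selectable_instruments(parameter=None):
    """Instruments a user may still choose, given the parameter picked so far."""
    return {
        name
        for name, params in MEASURES.items()
        if parameter is None or parameter in params
    }


offered = selectable_parameters("FabryPerot")
```

Because every remaining option is consistent with the earlier choice, a user cannot assemble a request such as an electron density plot from a Fabry-Perot, and fewer selections are needed to reach a valid query.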

Page 33:

Evaluation

• Formative and Summative

Page 34:

Iteration

• We needed the ability to evolve the ontology and not break the framework

• As we broadened re-use of these ontologies and creation of new ones

Page 35:

Maintenance

• Support for collaborative feedback, evolution

• Change management

• Support for ‘comments’ and ‘annotations’, i.e. self-documentation

• Package management: creation, dependency, consistency checking

Page 36:

Semantics between 2004 and 2009

• Ontologies were needed for data integration and provenance and mediation for data mining

• Protégé 3.x and then 4.0 came out

• SWOOP development was interrupted

• Cmap added OWL predicate support*

• SPARQL became a recommendation

• Triple stores exploded in use and capability

• Linked Open Data started to take off

• Pellet 2.0 came out

• We invaded OWLED 2006, 2007, 2009, (2010)

Page 37:

Semantics approach

• Use cases

• Stakeholders

• Distributed authority

• Access control

• Ontologies

• Maintaining Identity

Page 38:

Working with knowledge

[Diagram labels: Expressivity; Maintainability/ Extensibility; Implement-ability; Query; Rule execution; Inference]

Page 39:

Expressivity/ Implementation

Declarative Procedural

Linked open data URI/http/RDF *

Ontology encoded

Page 40:

Semantics between 2009 and 2013

• Semantic data frameworks (SeSF)

• Substantial knowledge provenance work

• Data quality, uncertainty and bias representations and applications
– Multi-sensor advisor**

• Applications:
– Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov ….

Page 41:

Anomaly Example: South Pacific Anomaly

MODIS Level 3 dataday definition leads to artifact in correlation

Page 42:

…is caused by an Overpass Time Difference

Page 43:

RuleSet Development

[DiffNEQCT:
  (?s rdf:type gio:RequestedService),
  (?s gio:input ?a),
  (?a rdf:type gio:DataSelection),
  (?s gio:input ?b),
  (?b rdf:type gio:DataSelection),
  (?a gio:sourceDataset ?a.ds),
  (?b gio:sourceDataset ?b.ds),
  (?a.ds gio:fromDeployment ?a.dply),
  (?b.ds gio:fromDeployment ?b.dply),
  (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
  (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
  notEqual(?a.neqct, ?b.neqct)
  ->
  (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)
]

Multi-sensor Data Synergy Advisor (NASA), Leptoukh, Lynnes, Zednik, et al.
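For readers less familiar with rule syntax, the logic of the DiffNEQCT rule can be paraphrased in Python. This is a sketch of the rule's intent, not the Advisor's implementation: the `Deployment` and `DataSelection` classes, the function name, and the platform names and crossing times are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    name: str
    sun_synchronous: bool
    neqct: Optional[str]  # nominal equatorial crossing time, e.g. "10:30"


@dataclass(frozen=True)
class DataSelection:
    dataset: str
    deployment: Deployment


def needs_different_neqct_advisory(inputs):
    """True if two inputs of a requested service come from sun-synchronous
    deployments whose nominal equatorial crossing times differ."""
    for i, a in enumerate(inputs):
        for b in inputs[i + 1:]:
            da, db = a.deployment, b.deployment
            if (
                da.sun_synchronous
                and db.sun_synchronous
                and da.neqct is not None
                and db.neqct is not None
                and da.neqct != db.neqct
            ):
                return True
    return False


# Hypothetical platforms and crossing times, for illustration only.
morning = Deployment("PlatformA", sun_synchronous=True, neqct="10:30")
afternoon = Deployment("PlatformB", sun_synchronous=True, neqct="13:30")

request = [
    DataSelection("dataset-a", morning),
    DataSelection("dataset-b", afternoon),
]
advisory = needs_different_neqct_advisory(request)
```

The declarative rule states the same pattern as a set of triple matches and leaves the pairing of inputs to the rule engine, whereas the procedural version has to spell out the pairwise loop.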

Page 44:

Advisor Knowledge Base

Advisor Rules test for potential anomalies, create association between service metadata and anomaly metadata in Advisor KB

Page 45:

Semantic Advisor Architecture

RPI

Multi-sensor Data Synergy Advisor (NASA), Leptoukh, Lynnes, Zednik, et al.

Page 46:

Advisory Report

• Tabular representation of the semantic equivalence of comparable data source and processing properties

• Advise of and describe potential data anomalies/bias

Page 47:

Advisory Report (Dimension Comparison Detail)

Page 48:

Going forward…

• No surprise – lots of application
– Water quality, environmental conditions
– Global change information system; traceable accounts
– Deep Carbon Observatory
– And … maturing ontologies

• Discrete and Continuous Additive effects (in Event Calculus) – driven by the desire to explore physical processes in Earth’s atmosphere when a volcano erupts

Page 49:

RPI Tetherless World Constellation tw.rpi.edu

Themes

• Future Web: Web Science, Policy, Social

• Xinformatics: Data Science, Semantic eScience, Data Frameworks

• Semantic Foundations: Knowledge Provenance, Inference, Trust

Lots of RDF; Web Infrastructure; Scaling and Distributed Query and Reasoning; AI, Rule reasoning; Visualizing SW ‘data’; Social SW, SMWiki; Policy Lang; Ontologies/Tools, …

Hendler, Fox, McGuinness + ~ 40 = Post-doc, Staff, Grad, UGrad

• Government Data

• Health care/Life Sciences

• Environmental Informatics

Luciano, Erickson

Page 50:

Respect vocabularies!

Page 51:

Discovering new data

Page 52:

Core and Framework Semantics - Multi-tiered interoperability

Page 53:

High-level architecture

Page 54:

Summary (thanks to AI)

• In 2004 we set out to build a prototype and ended up with a production semantic data framework
– Languages and tools served us well (standards)

• Even with modest expressivity we challenged the tools of the time and made many compromises

• All along the way, we evaluated our ontology developments and implementations to gauge the benefits of semantics

• Maintainability, esp. modularization, drove new expressivity needs

• Xinformatics and a repeatable methodology are the key (information models) - we continue to need to bridge computer science and application communities (“It’s the language, stupid”, i.e. semantics)

Page 55:

Page 57:

Ontology Spectrum

[Diagram: ontology spectrum, from lightweight to more expressive: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints]

Originally from AAAI 1999 - Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

Page 58:

Semantic Web Layers

http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/