Programming the Semantic Web
description
Transcript of Programming the Semantic Web
Programming the Semantic Web
Linyun FuJune 3, 2014
Adapted from Steffen Staab’s keynote at ESWC 2014
This presentation contains program code, engineering, speculation and worse.It may be viewed as offensive by some viewers.
Semantic Web ProgrammingLinked Data
SPARQL endpoint
RDF file Hypothesis:Semantic Web data is wonderful, but programming with Semantic Web has changed too little since 2000 and still is a mess.
We promise flexibility, but code still hard to
maintain.
Ontology
Linked Data
SPARQL endpoint
RDF file
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
Ontology
These is a mismatch between data engineering and programming approaches
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
„Outside“ Understanding
by the developer „Search + Code“ „Browse + Code“
Triple-object mapping Code generation Process of federation
LD vs endpoint Models of federation
Abstraction layers that facilitate a developer‘s
life
searching, finding, reusing all the complex data strcutres that we have
Programming with Data: What does it cost?
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
Ctool : Costs for building t many tools, shared; almost free
Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources
Cmap : Costs for mapping data structure for s sources to objects
Ccode : Actual costs for accessing/manipulating data n times
„Inside“Both „Outside“
SWOT of Semantic Web Programming
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
threat
opportunity
opportunity
weakness
Not a strength, yet!
strength/weakness
Strong in flexibilitySomewhat weak in
performance
„Outside“
as good as RelDB is not good enough!
Intermediate conclusionMinimize costs for setup:Clearn: Costs for learning how to use technology per developer dCdeu: Costs for data engineering/understanding s sources
Cmap : Costs for mapping data structure for s sources to objects
Minimize costs for core programming:Ccode : Actual costs for accessing/manipulating data n times
Costs for learning and understanding constitute a threat!
Need to be overcome!
We need flexible code to match flexible data structures
• i.e., Domain-specific languages for Semantic Web Programming
• XML programming example• Why Jena is not good enough
XML programming example: LINQ to XML
XElement contacts = new XElement("Contacts", new XElement("Contact", new XElement("Name", "Patrick Hines"), new XElement("Phone", "206-555-0144"), new XElement("Address", new XElement("Street1", "123 Main St"), new XElement("City", "Mercer Island"), new XElement("State", "WA"), new XElement("Postal", "68042") ) ) );
The Jena Approach
Task: List all records for each music artist
The Jamendo ontology
Accessing Artists Using Apache Jena
From artists to songs
Observations• SPARQL queries are strings• Results are strings• Requires good understanding of the data source
RDF Typing is lost
Related Work on RDF AccessStatic Typing Errors detected before
execution Misspelling discovered by
compiler! Anectode: 2nd place
because of misspelt var. Static types are form of
documentation Less knowledge about data
source required• Better IDE integration /
autocompletion
Code generation• Sommer• Winter• OntoMDEDynamic Typing E.g. ActiveRDF
(Oren et al 2007)) “convention over
configuration” dynamic
metaprogramming allows for slick code
Criticism
SEMANTIC WEB PROGRAMMING:
LITEQ – OUR OUTSIDE APPROACH
Programming against Jamendo
c1
Node Path Query Language Using Autocompletion
Exploration of classes
Exploration of relations
Querying for instances
Node Path Query Language Using Autocompletion
Exploration of classesExploration of relations
Node Path Query Language: Query Formulation
Exploration of classesExploration of relationsQuerying for instances
Type set of mo:MusicArtist
No definition or declaration needed
Node Path Query Language for Code Development
Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queries
All translated into SPARQL queries at• Development time• Type inference at compile time
(but also as part of IDE)• Querying again at run time
One language to bind them all
Node Path Query Language for Code Development
Exploration of classesExploration of relationsQuerying for instancesDeveloping code with queriesDeveloping code with new classes
All translated into SPARQL queries at• Development time• Run time update• Persistence!
NPQL
NPQL (Node Path Query Language)• Intensional Queries
Describing RDF classes and properties for reuse in IDE and in host language metaprogramming
• Extensional Queries Class instances and property instances
• Compilation to SPARQL for reuse of existing endpoints
Ongoing discussion about details of NPQL
LITEQNPQL (Node Path Query Language)• Intensional Queries• Extensional Queries• Compilation to SPARQL
LITEQ (Language Integrated Types, Extensions and Queries) • Implementation of NPQL as F# Type Provider in Visual Studio• Autocompletion using NPQL queries• Automatic typing
of extensional query resultsby intensional queries
Cost savings
Ctotal = t*Ctool + d*t*Clearn + s*Cdeu + s*Cmap + n*Ccode
Ctool : open sourceClearn: not free – though autocompletion reduces cognitive loadCdeu: not free – understanding the RDF schema from your IDECmap : 0Ccode : a lot less than for dotNet RDF (Apache Jena?!!)
little bit more than for a fictitious perfect object model
Halstead metrics for different tasks:
Conventional Semantic Web programming approaches waste up to 50% of your efforts!
Speculation 1
Ctotal
n (as in n*Ccode )
SemWeb coding efforts
RelDB/XML coding efforts
Applies to small n
Diff.
cos
ts fo
r le
arni
ng
Speculation 1: Using ontologies and RDF schemata, we can develop more efficiently using
the right tools!
Halstead metrics for different tasks:
If someone gives me a perfect RDF-to-OO mapping for freethen I will not care about whether it is RDF or RelDB
underneath!
Speculation 2
Ctotal
n (as in n*Ccode )
SemWeb coding efforts
RelDB/XML coding efforts
„Perfect“ object model shields
developer from database
ideosyncracies
Diff.
cos
ts fo
r se
tup
Speculation 2: For large programmes, our tools need to offer better support to reduce setup
costs!
CONCLUSION
Semantic Web: Make Developers More Productive
Linked Data
SPARQL endpoint
RDF file
„Inside“Data Mgmt
„Outside“
Data Mgmt
Eclipse
Visual Studio
....
Ontology
What I Think
• New programming languages vs. code generation + existing PLs– Mismatches between RDF and OO: Oren et al 2007,
Saathoff et al 2009– Only some ontologies can be perfectly mapped to
OO classes– A killer PL to come
References• C. Saathoff, S. Scheglmann, S. Schenk. Winter: Mapping RDF
to POJOs revisited.• E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker: ActiveRDF:
object-oriented semantic web programming. WWW 2007: 817-824
• S. Scheglmann, A. Scherp, S. Staab. Declarative Representation of Programming Access to Ontologies. In: 9th Extended Semantic Web Conference (ESWC2012), Heraklion, Greece, May 27-31, 2012.
• W. Cook, A. Ibrahim. Integrating Programming Languages & Databases: What’s the Problem?? http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.7169&rep=rep1&type=pdf