© Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Enabling Networked Knowledge
Lessons and requirements from a decade of deployed Semantic Web apps
Benjamin Heitmann, Richard Cyganiak, Conor Hayes, Stefan Decker
Funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Líon-2)
Input for this workshop
LEDP workshop CfP calls for:
– requirements
– patterns
– gaps in Linked Data standards + guidelines
Where should this input come from?
The Semantic Web: a decade is a long time
[Timeline figure: 2001 to 2011]
Choice of methodology?
Goal: patterns, requirements and gaps regarding LD
Data: 10 years of Semantic Web research
Which scientific approach fits? Empirical software engineering
Full IEEE transactions journal paper:
http://tinyurl.com/semweblessons
Overview
Empirical survey
Architecture: arch. pattern
LD standards: gaps
Software Eng. Process: shortcomings
Software engineering solutions
Empirical survey
Sources: 124 apps total
– Semantic Web Challenge (ISWC), 2003-2009: 101 apps
– Scripting for SemWeb Challenge (ESWC), 2006-2009: 23 apps
– includes industry & research apps
Checklist (12 questions)
Data collection:
1. own analysis of paper
2. validation by email
Empirical survey results
widespread support for SemWeb-specific features
clear difference to database-driven apps
big uptake of Linked Data principles and eco-system
integration requires human intervention
top 3 standards: RDF, OWL, SPARQL
top 3 vocabularies: FOAF, DC, SIOC
Conceptual architecture
Conceptual architecture:
– describes major design elements of a system (+ relations)
– domain specific (e.g. the Semantic Web)
– provides architectural pattern
– documents community consensus
Components of conceptual architecture
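[Diagram: components of the conceptual architecture, each with the share of surveyed apps that implement it]
User interface:
– Graph-based navigation interface (91%)
– Structured data authoring interface (29%)
Data integration:
– Data homogenisation service (74%)
– Data discovery service (30%)
RDF data handling:
– Graph access layer (100%)
– RDF store (88%)
– Graph query language service (77%)
Diagram annotations: "starting point", "decouple + specialise"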
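As a rough illustration of how these components might be decoupled, the sketch below wires a graph access layer to an in-memory RDF store and a homogenisation service. It is a minimal sketch under stated assumptions: every class, method and the `ex:fullName` property are hypothetical names chosen for this example, not taken from any surveyed application.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); all names here are illustrative.
Triple = tuple[str, str, str]


@dataclass
class RDFStore:
    """Minimal in-memory stand-in for an RDF store."""
    triples: set[Triple] = field(default_factory=set)

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        # None acts as a wildcard, similar to a variable in a triple pattern.
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


@dataclass
class HomogenisationService:
    """Rewrites predicates from source vocabularies to one target vocabulary."""
    predicate_map: dict[str, str]

    def homogenise(self, triple: Triple) -> Triple:
        s, p, o = triple
        return (s, self.predicate_map.get(p, p), o)


@dataclass
class GraphAccessLayer:
    """Single entry point that user interface components would talk to."""
    store: RDFStore
    homogeniser: HomogenisationService

    def load(self, triples: list[Triple]) -> None:
        for t in triples:
            self.store.add(self.homogeniser.homogenise(t))

    def names_of(self, subject: str) -> list[str]:
        return [o for _, _, o in self.store.match(s=subject, p="foaf:name")]


# ex:fullName is a made-up property standing in for a second vocabulary.
layer = GraphAccessLayer(
    RDFStore(),
    HomogenisationService({"ex:fullName": "foaf:name"}),
)
layer.load([
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:fullName", "A. Example"),
])
print(layer.names_of("ex:alice"))
```

Keeping the store and the homogenisation step behind the access layer means either can be swapped without touching the interface code, which is one reading of "decouple + specialise".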
LD gaps: publishing/consuming
all applications consume RDF
– 73% import API, 69% export API
– but: incompatible implementations
– LD principles in 2006 led to consolidation
embedding RDF: web for humans vs. web for machines
– 2008: introduction of RDFa
LD gaps: beyond open data
writing/changing/updating RDF data is difficult
71% of apps do not support data changes
Writing to remote RDF store:
– draft status in 2011: SPARQL Update
Restricting access (read/write):
– no standards
– no interoperability
– closest ideas (?): R/W design note, WebID
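To make "writing to a remote RDF store" concrete, here is a minimal sketch of a SPARQL Update request built with the Python standard library. The assumptions are: the endpoint URL and the helper name `build_insert_request` are hypothetical, and the `application/sparql-update` media type follows the SPARQL 1.1 Protocol as later standardised, not the 2011 draft. The request is only constructed, never sent, and it carries no access control, which matches the gap noted above.

```python
from urllib.request import Request


def build_insert_request(endpoint, subject, predicate, literal):
    """Build (but do not send) a SPARQL Update request that adds one triple."""
    # Minimal escaping of backslashes and double quotes only; a real client
    # would also need to handle newlines and other special characters.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    update = f'INSERT DATA {{ <{subject}> <{predicate}> "{escaped}" . }}'
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = build_insert_request(
    "http://example.org/sparql",  # hypothetical endpoint
    "http://example.org/alice",
    "http://xmlns.com/foaf/0.1/name",
    "Alice",
)
print(request.get_method())
print(request.data.decode("utf-8"))
```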
Software Eng. process shortcomings (1)
Integrating noisy RDF data:
– 60% semi-automatic integration
– this involves human intervention
– only 20% use automatic heuristics
– major part of Semantic Web specific code
Distribution of application logic:
– multiple components and standards
– queries (41%), rules (52%) or formal vocabularies
– hard to maintain
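One way to picture semi-automatic integration is a heuristic that merges only the confident matches and queues the rest for a person to decide. The sketch below is an illustration under assumptions, not a method from the survey: the function name, the thresholds and the sample labels are all invented, and string similarity from `difflib` stands in for whatever heuristics an application would really use.

```python
import difflib


def propose_matches(labels_a, labels_b, auto_threshold=0.9, review_threshold=0.6):
    """Split candidate label pairs into automatic merges and a manual review queue."""
    automatic, needs_review = [], []
    for a in labels_a:
        for b in labels_b:
            score = difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= auto_threshold:
                automatic.append((a, b))      # confident: merge without asking
            elif score >= review_threshold:
                needs_review.append((a, b))   # uncertain: human intervention
    return automatic, needs_review


# Illustrative labels, as they might appear in two differently curated sources.
automatic, needs_review = propose_matches(
    ["Tim Berners-Lee", "DERI Galway"],
    ["tim berners-lee", "DERI, NUI Galway", "W3C"],
)
print("merge automatically:", automatic)
print("ask a human:", needs_review)
```

The review queue is where the human effort goes; lowering `auto_threshold` trades that effort for a higher risk of wrong merges.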
Software Eng. process shortcomings (2)
Mismatch of data models between components
– graph versus relational or object oriented (90%)
– overhead in communication
– inconsistent round-trip conversion
– 3 way ORM needed?
[Diagram: the three data models: relational, object oriented, graph-based]
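The round-trip problem can be shown with a small example. The sketch below assumes a naive object mapping with exactly one name per person; the `Person` class, the helper functions and the sample triples are hypothetical. Converting a graph to that object and back silently drops everything the object model has no slot for.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


@dataclass
class Person:
    """Object-oriented view that allows exactly one name per person."""
    uri: str
    name: str


def graph_to_object(triples, uri):
    # A naive mapping keeps only the last foaf:name it encounters
    # and ignores every predicate the class does not model.
    name = ""
    for s, p, o in sorted(triples):
        if s == uri and p == "foaf:name":
            name = o
    return Person(uri, name)


def object_to_graph(person):
    return {(person.uri, "foaf:name", person.name)}


original = {
    ("ex:tim", "foaf:name", "Tim Berners-Lee"),
    ("ex:tim", "foaf:name", "Timothy Berners-Lee"),
    ("ex:tim", "foaf:knows", "ex:dan"),
}
round_tripped = object_to_graph(graph_to_object(original, "ex:tim"))
lost = original - round_tripped
print(sorted(lost))
```

A multi-valued property and an unmodelled relation are both lost here, which is the kind of inconsistency a mapping between graph, object and relational models would have to prevent.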
Software Eng. solutions (1)
More guidelines, best practices and design patterns:
current examples:
– Linked Data principles and publishing guidelines
– guidelines for naming of URIs
– Linked Data patterns collection
result: more interoperability, more coherent Web of Data
Software Eng. solutions (2)
More software libraries (beyond RDF storage!)
guidelines can be hardcoded in reusable libraries
good libraries can make complicated guidelines easy to use (see HTTP, SSL, SMTP and DNS lookups)
current examples:
– Any23, D2R Server, Semantic Web Client Library
Software Eng. solutions (3)
More software factories:
– create complete applications
– requires patterns + libraries
– or: “opinionated software”
components can be customised for domain
Interface, homogenisation and data discovery usually made from scratch
https://developers.facebook.com/docs/beta/opengraph/tutorial/
Summary
Empirical survey
Architecture: arch. pattern
LD standards: gaps
Software Eng. Process: shortcomings
Software engineering solutions
Full article:
http://tinyurl.com/semweblessons
Appendix: threats to validity
Representativeness:
– only complete applications were part of the challenges (not tools or libraries)
– apps needed to use real-world data
– submission of a paper describing the app was required
– the challenges extend over multiple years, which allows trends to be seen
Number of authors who verified checklist (65%):
– academic email addresses expire quickly
– we manually tried to find new email addresses
No source code was used:
– source code was not required for challenges due to e.g. IP issues
Table: Impl. details
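Columns: 2003 to 2009, then overall. Each cell lists the most frequent entries for that year.

Programming languages
2003: Java 60%, C 20%
2004: Java 56%, JS 12%
2005: Java 66%
2006: Java 10%, JS 15%, PHP 26%
2007: Java 50%, PHP 25%
2008: Java 43%, PHP 21%
2009: Java 46%, JS 23%, PHP 23%
overall: Java 48%, PHP 19%, JS 13%

RDF libraries
2003: (no entry)
2004: Jena 18%, Sesame 12%, Lucene 18%
2005: (no entry)
2006: RAP 15%, RDFLib 10%
2007: Sesame 33%, Jena 8%
2008: Sesame 17%, ARC 17%, Jena 13%
2009: Sesame 23%
overall: Sesame 19%, Jena 9%

SemWeb standards
2003: RDF 100%, OWL 30%
2004: RDF 87%, RDFS 37%, OWL 37%
2005: RDF 66%, OWL 66%, RDFS 50%
2006: RDF 89%, OWL 42%, SPARQL 15%
2007: RDF 100%, SPARQL 50%, OWL 41%
2008: RDF 100%, SPARQL 17%, OWL 10%
2009: RDF 100%, SPARQL 69%, OWL 46%
overall: RDF 96%, OWL 43%, SPARQL 41%

Schemas/vocabularies/ontologies
2003: RSS 20%, FOAF 20%, DC 20%
2004: DC 12%, SWRC 12%
2005: (no entry)
2006: FOAF 26%, RSS 15%, Bibtex 10%
2007: FOAF 41%, DC 20%, SIOC 20%
2008: FOAF 30%, DC 21%, DBpedia 13%
2009: FOAF 34%, DC 15%, SKOS 15%
overall: FOAF 27%, DC 13%, SIOC 7%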
Tables: Data integration and other properties

Data integration    2003  2004  2005  2006  2007  2008  2009
manual              30%   13%   0%    16%   9%    5%    4%
semi-automatic      70%   31%   100%  47%   58%   65%   61%
automatic           0%    25%   0%    11%   13%   4%    19%
not needed          0%    31%   0%    26%   20%   26%   16%

Property                  2003  2004  2005  2006  2007  2008  2009
Data creation             20%   37%   50%   52%   37%   52%   76%
Data import               70%   50%   83%   52%   70%   86%   73%
Data export               70%   56%   83%   68%   79%   86%   73%
Inferencing               60%   68%   83%   57%   79%   52%   42%
Decentralised sources     90%   75%   100%  57%   41%   95%   96%
Multiple owners           90%   93%   100%  89%   83%   91%   88%
Heterogeneous formats     90%   87%   100%  89%   87%   78%   88%
Data updates              90%   75%   83%   78%   45%   73%   50%
Linked Data principles    0%    0%    0%    5%    25%   26%   65%
Table: architectural analysis

year   apps  access  store  nav   homog  query  author  discovery
2003   10    100%    80%    90%   90%    80%    20%     50%
2004   16    100%    94%    100%  50%    88%    38%     25%
2005   6     100%    100%   100%  83%    83%    33%     33%
2006   19    100%    95%    89%   63%    68%    37%     16%
2007   24    100%    92%    96%   88%    88%    33%     54%
2008   23    100%    87%    83%   70%    78%    26%     30%
2009   26    100%    77%    88%   80%    65%    19%     15%
total  124   100%    88%    91%   74%    77%    29%     30%

Column key: apps = number of applications; access = graph access layer; store = RDF store; nav = graph-based navigation interface; homog = data homogenisation service; query = graph query language service; author = structured data authoring interface; discovery = data discovery service.
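The "total" row of the architectural analysis table looks like the per-year percentages weighted by the number of applications in each year. That reading is an assumption, since the slides do not say how the totals were computed. The short check below tests it for two columns, using figures copied from the table.

```python
# Per-year rows copied from the table:
# year -> (number of applications, RDF store %, data discovery service %)
rows = {
    2003: (10, 80, 50),
    2004: (16, 94, 25),
    2005: (6, 100, 33),
    2006: (19, 95, 16),
    2007: (24, 92, 54),
    2008: (23, 87, 30),
    2009: (26, 77, 15),
}


def weighted_share(column):
    """Average a percentage column, weighting each year by its number of apps."""
    total_apps = sum(row[0] for row in rows.values())
    return sum(row[0] * row[column] for row in rows.values()) / total_apps


total_apps = sum(row[0] for row in rows.values())
print(total_apps)                 # number of applications across all years
print(round(weighted_share(1)))   # RDF store, compare with the total row
print(round(weighted_share(2)))   # data discovery service, compare likewise
```

Both rounded values agree with the total row (88% and 30%), and the application counts sum to the 124 apps reported in the survey.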