Linked Data and Services
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
Institute AIFB
www.kit.edu
Linked Data and Services
Andreas Harth and Barry Norton
Outline
Motivation
Linked Data Principles
Query Processing over Linked Data
Linked Data Services (LIDS) and Linked Open Services (LOS)
Conclusion
Motivation
Semantic Web/Linked Data technologies are well-suited for data integration
11.04.2023
Data Integration
Interactive Data Exploration
Common Data Format/Access Protocol
!?
Linked Data Principles*
1. Use URIs to name things; not only documents, but also people, locations, concepts, etc.
2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
3. When someone looks up a URI we provide useful information; with 'useful' in the strict sense we usually mean structured data in RDF.
4. Include links to other URIs allowing agents (machines and humans) to discover more things
(*) http://www.w3.org/DesignIssues/LinkedData.html
Correspondence between thing-URI and source-URI
User Agent
Web Server
Thing-URI: http://www.polleres.net/foaf.rdf#me
Source-URI: http://www.polleres.net/foaf.rdf
User Agent to Web Server: HTTP GET on the source-URI
Web Server to User Agent: RDF
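For hash URIs such as the one on this slide, the correspondence can be computed locally, because HTTP clients never send the fragment to the server. A minimal sketch, assuming only Python's standard library (the function name `source_uri` is made up for illustration):

```python
from urllib.parse import urldefrag


def source_uri(thing_uri: str) -> str:
    """Return the document URI that is requested when a hash URI is looked up."""
    # The part after '#' is stripped client-side; what remains names the source.
    return urldefrag(thing_uri).url


example = source_uri("http://www.polleres.net/foaf.rdf#me")
```

Here `example` holds the source-URI shown on the slide, http://www.polleres.net/foaf.rdf.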
Correspondence between thing-URI and source-URI
User Agent
Web Server
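Thing-URI: http://dbpedia.org/resource/Gordon_Brown
User Agent to Web Server: HTTP GET on the thing-URI
Web Server to User Agent: 303 redirect to http://dbpedia.org/data/Gordon_Brown (RDF) or http://dbpedia.org/page/Gordon_Brown (HTML)
User Agent to Web Server: HTTP GET on the source-URI http://dbpedia.org/data/Gordon_Brown
Web Server to User Agent: RDF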
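With 303 redirects the source-URI cannot be derived from the thing-URI alone; the client has to ask the server. The sketch below simulates that exchange without network access. The `REDIRECTS` table and the functions `lookup` and `resolve_source` are hypothetical stand-ins for a real server and HTTP client, and exact matching on the Accept value is a simplification.

```python
# Hypothetical in-memory stand-in for the web server's redirect rules.
REDIRECTS = {
    ("http://dbpedia.org/resource/Gordon_Brown", "application/rdf+xml"):
        "http://dbpedia.org/data/Gordon_Brown",
    ("http://dbpedia.org/resource/Gordon_Brown", "text/html"):
        "http://dbpedia.org/page/Gordon_Brown",
}


def lookup(uri, accept):
    """Simulate one HTTP GET and return (status, redirect target or None)."""
    target = REDIRECTS.get((uri, accept))
    if target is not None:
        return 303, target
    return 200, None


def resolve_source(thing_uri, accept="application/rdf+xml", max_hops=5):
    """Follow 303 redirects until a URI answers with 200; that URI is the source."""
    uri = thing_uri
    for _ in range(max_hops):
        status, location = lookup(uri, accept)
        if status != 303:
            return uri
        uri = location
    raise RuntimeError("too many redirects")


rdf_source = resolve_source("http://dbpedia.org/resource/Gordon_Brown")
html_source = resolve_source("http://dbpedia.org/resource/Gordon_Brown", "text/html")
```

A machine agent asking for RDF ends up at the /data/ document, while a browser asking for HTML ends up at the /page/ document.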
Queries over Linked Data
SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}
?f ?n
SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
Data warehousing or materialisation-based approaches (MAT)
Querying Data Across Sources
15.03.2010
CRAWL INDEX SERVE
SELECT * FROM…
R S
Distributed query processing approaches (DQP)
R S
DQP on Linked Data
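Figure: traditional DQP evaluates SELECT * FROM… over relations R and S accessed via ODBC; DQP on Linked Data evaluates SELECT ?s WHERE… over triple patterns (TP) accessed via HTTP GET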
Query Processing Overview
Andreas Harth
Data Summaries for On-Demand Queries over Linked Data
TP(an:f#ah foaf:knows ?f)
SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}
TP(?f foaf:name ?n)
?f ?n
http://danbri.org/foaf.rdf#danbri Dan Brickley
For each triple pattern: select source(s), HTTP GET, RDF
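Once the RDF from the selected sources has been fetched, the triple patterns are matched and joined on shared variables. The following sketch shows that evaluation step over an in-memory list of triples; the helper names (`is_var`, `match`, `evaluate`) are illustrative and not taken from the talk, and fetching is left out entirely.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # setdefault binds a fresh variable or returns the existing value.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(patterns, triples):
    """Join the triple patterns left to right with nested loops."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


# Triples as they might be retrieved from the selected sources.
triples = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
]
query = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
rows = evaluate(query, triples)
```

With these two triples, `rows` contains the single result row shown on the slide.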
Problem: Source Selection for Triple Patterns
(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)
Given a triple pattern, which source can contribute bindings for the triple pattern?
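The eight forms above arise from each position being either a variable (?) or a constant (#). As a small illustration, the sketch below enumerates the forms and classifies a concrete pattern; `pattern_form` and `ALL_FORMS` are names introduced here for the example.

```python
from itertools import product


def pattern_form(pattern):
    """Render a triple pattern in the slide's notation: ?x variable, #x constant."""
    return "(" + " ".join(
        ("?" if term.startswith("?") else "#") + pos
        for term, pos in zip(pattern, "spo")
    ) + ")"


# All combinations of variable/constant over subject, predicate, object.
ALL_FORMS = [
    "(" + " ".join(mark + pos for mark, pos in zip(marks, "spo")) + ")"
    for marks in product("?#", repeat=3)
]

first = pattern_form(("an:f#ah", "foaf:knows", "?f"))
second = pattern_form(("?f", "foaf:name", "?n"))
```

The two patterns of the running example query fall into the forms (#s #p ?o) and (?s #p ?o).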
Keep index of properties and/or classes contained in sources
(?s #p ?o), (?s rdf:type #o)
Covers only queries containing schema-level elements
Commonly used properties select potentially too many sources
Schema-Level Indices [Stuckenschmidt et al. 2004]
SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}
SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
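A schema-level index can be sketched as two maps, from properties and from classes to the sources that use them. The code below is an illustrative reading of the slide, not the cited system; the source URIs and the function names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(sources):
    """sources maps a source URI to the (s, p, o) triples it contains."""
    by_property, by_class = defaultdict(set), defaultdict(set)
    for source, triples in sources.items():
        for s, p, o in triples:
            by_property[p].add(source)
            if p == RDF_TYPE:
                by_class[o].add(source)
    return by_property, by_class


def select_sources(pattern, by_property, by_class):
    """Return candidate sources, or None if the pattern has no schema-level element."""
    s, p, o = pattern
    if p.startswith("?"):
        return None  # the index only covers (?s #p ?o) and (?s rdf:type #o)
    if p == RDF_TYPE and not o.startswith("?"):
        return set(by_class.get(o, ()))
    return set(by_property.get(p, ()))


# Hypothetical sources for illustration.
sources = {
    "http://example.org/a.rdf": [("ex:a", "foaf:knows", "ex:b"),
                                 ("ex:a", RDF_TYPE, "foaf:Person")],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B")],
}
by_property, by_class = build_schema_index(sources)
knows_sources = select_sources(("?x", "foaf:knows", "?y"), by_property, by_class)
```

The limitation noted above is visible here: a widely used property such as foaf:knows would list nearly every FOAF source, and a pattern with a variable predicate gets no help from the index.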
Exploits correspondence between thing-URI and source-URI
Linked Data sources (aka RDF files) typically return triples whose subject corresponds to the source
Sometimes the sources also return triples whose object corresponds to the source
(#s ?p ?o), (#s #p ?o), (#s #p #o); (?s ?p #o), (?s #p #o)
Incomplete with respect to patterns, and also with respect to URI reuse across sources
Limited parallelism, unclear how to schedule lookups
Direct Lookup (DL) [Hartig et al. 2009]
SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}
SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
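Direct lookup needs no index: the sources are read off the URIs in the pattern itself. The sketch below handles only the hash-URI case by stripping the fragment; URIs that rely on 303 redirects would need an actual HTTP request. The function name `direct_lookup_sources` is introduced here for illustration.

```python
from urllib.parse import urldefrag


def direct_lookup_sources(pattern):
    """Return the sources implied by HTTP URIs in subject or object position."""
    s, p, o = pattern
    sources = []
    for term in (s, o):
        # Variables, prefixed names and literals yield no source.
        if term.startswith(("http://", "https://")):
            sources.append(urldefrag(term).url)
    return sources


by_subject = direct_lookup_sources(
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n"))
no_source = direct_lookup_sources(("?s", "foaf:knows", "?o"))
```

A pattern with variables in both subject and object position yields no source at all, which is the incompleteness with respect to patterns mentioned above.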
Combined description of schema-level and instance-level
Use approximation to reduce index size (incurs false positives)
Possible to use entire query for source selection
Parallel lookups since sources can be determined for the entire query
(?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o) and combinations of triple patterns
Approximate Data Summaries
SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n.}
SELECT ?x1 ?x2 WHERE { dblppub:HoganHP08 dc:creator ?a1. ?x1 owl:sameAs ?a1. ?x2 foaf:knows ?x1. }
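To illustrate how approximation trades index size for false positives, the sketch below hashes every term into a fixed number of buckets and records which sources fall into each bucket. This is a deliberately simple hash-bucket scheme chosen for illustration, not the summary structure from the talk; the class name `BucketSummary` and its parameters are assumptions.

```python
import zlib


class BucketSummary:
    """Approximate index: (position, hash bucket) -> set of source URIs."""

    def __init__(self, num_buckets=64):
        self.num_buckets = num_buckets
        self.buckets = {}

    def _bucket(self, term):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(term.encode("utf-8")) % self.num_buckets

    def add(self, source, triple):
        for pos, term in enumerate(triple):
            key = (pos, self._bucket(term))
            self.buckets.setdefault(key, set()).add(source)

    def candidates(self, pattern):
        """Sources that may match; None means the pattern has no constants."""
        result = None
        for pos, term in enumerate(pattern):
            if term.startswith("?"):
                continue
            hit = self.buckets.get((pos, self._bucket(term)), set())
            result = set(hit) if result is None else result & hit
        return result


summary = BucketSummary()
summary.add("http://danbri.org/foaf.rdf",
            ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"))
found = summary.candidates(("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n"))
```

The summary never misses a source that holds a matching triple, but distinct terms can share a bucket, so it may return sources that turn out to contribute nothing. Because it covers all positions, candidate sets for several triple patterns can be computed up front and the lookups issued in parallel.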
Implementation
Deploy wrappers "in the cloud"
Google App Engine: hosting of Java and Python webapps on Google’s Cloud infrastructure
Limited amount of processing time (6hrs/day)
Single-threaded applications
Suited for deploying wrappers
e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF
Linking Open Data Cloud 2007
Linking Open Data Cloud 2008
Linking Open Data Cloud 2009
Linking Open Data Cloud 2010
Geonames Services
{"weatherObservation": {"clouds":"broken clouds", "weatherCondition":"drizzle", "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010", "windDirection":30, "ICAO":"LESO", ...
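A wrapper that lifts such a JSON response to RDF could look roughly like the sketch below. Only the fields visible in the excerpt above are used; the vocabulary namespace `EX`, the subject URI and the function name are hypothetical and not part of the Geonames service.

```python
import json

EX = "http://example.org/weather#"  # hypothetical vocabulary namespace


def observation_to_triples(doc, subject):
    """Turn each key/value pair of the observation into one triple."""
    obs = doc["weatherObservation"]
    return [(subject, EX + key, value) for key, value in sorted(obs.items())]


# Subset of the response containing only fields shown in the excerpt.
sample = json.loads(
    '{"weatherObservation": {"clouds": "broken clouds", '
    '"weatherCondition": "drizzle", "windDirection": 30, "ICAO": "LESO"}}'
)
weather_triples = observation_to_triples(sample, "http://example.org/station/LESO")
```

The result is a flat list of triples about one subject. It says nothing yet about how that subject relates to the input of the service call, which is the gap the service principles later in the talk address.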
Linked Open Service Principles
REST Principles
1. Application state and functionality are divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
a) A constrained set of well-defined operations
b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input
LOS Weather Service
LOS Geo Resources
Resource-Based Linked Open Services
Linked Data
GET, Accept: text/html
303 REDIRECT /page
GET, Accept: application/rdf+xml (or text/n3)
303 REDIRECT /data
Linked Service
GET /weather, Accept: application/rdf+xml (or text/n3)
200 <rdf:Description>
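The two interaction styles on this slide can be summarised in one dispatch function. The sketch below is an assumption-laden simplification: it matches the Accept value exactly rather than parsing a full header with quality values, it treats every path other than /weather as a Linked Data resource, and the 406 response for unsupported types is added here for completeness. The function name `negotiate` is hypothetical.

```python
RDF_TYPES = ("application/rdf+xml", "text/n3")


def negotiate(path, accept):
    """Return (status, redirect target or body) for a GET request."""
    if path == "/weather":
        # Linked Service: the RDF description is returned directly.
        if accept in RDF_TYPES:
            return 200, "<rdf:Description>"
        return 406, None
    # Linked Data resource: redirect to the representation that fits.
    if accept == "text/html":
        return 303, "/page"
    if accept in RDF_TYPES:
        return 303, "/data"
    return 406, None


service_reply = negotiate("/weather", "text/n3")
data_reply = negotiate("/resource", "application/rdf+xml")
page_reply = negotiate("/resource", "text/html")
```

The difference is that the resource answers with a redirect to a document about it, whereas the service answers with the RDF output itself.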
Interlinking Data with Data from Services?
Linked Data Services
We’d like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
We’d like to use data services in software programs
2. LIDS need machine-readable descriptions of input and output
Compared to the naïve approach (assigning a URI to the service output):
Relationship between input and output is explicitly described
Dynamicity is supported
Multiple or no output resources can be linked to input
Interlink LIDS and Linked Data
Generate service URIs with input bindings obtained by evaluating: SELECT Xi WHERE Ti
sameAs: binding for i
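Generating service URIs from input bindings can be sketched as encoding each binding as query parameters of the service endpoint. The endpoint URI, the parameter names and the function `service_uris` below are hypothetical; the bindings would come from evaluating the service's input pattern over the available Linked Data.

```python
from urllib.parse import urlencode


def service_uris(endpoint, bindings):
    """Build one service URI per binding, with parameters in sorted order."""
    return [endpoint + "?" + urlencode(sorted(b.items())) for b in bindings]


# Hypothetical bindings, as if obtained from evaluating the input pattern.
bindings = [{"lng": "8.4", "lat": "48.1"}]
uris = service_uris("http://example.org/findNearby", bindings)
```

Sorting the parameters keeps the generated URI stable for a given binding, so the same input always links to the same service URI.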
Query Answering using LIDS and Linked Data
Query execution resolves URIs
=> enlarges data set
LIDS are interlinked
Query is executed again on new data set
Repeat until no new links or no new data
Combine results
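The iterative procedure above amounts to a fixed-point computation: keep resolving newly discovered URIs until the data set stops growing. The sketch below shows only the expansion loop, with a dictionary standing in for the Web; `expand_until_fixpoint`, `WEB`, the `max_rounds` safety limit and the example URIs are assumptions made for the illustration.

```python
def expand_until_fixpoint(seed, resolve, max_rounds=10):
    """Resolve URIs found in the data until no new URIs or no new data appear."""
    data, seen = set(seed), set()
    for _ in range(max_rounds):
        uris = {term for triple in data for term in triple
                if isinstance(term, str) and term.startswith("http")} - seen
        if not uris:
            break  # no new links
        seen |= uris
        size = len(data)
        for uri in uris:
            data |= set(resolve(uri))
        if len(data) == size:
            break  # no new data
    return data


# Hypothetical stand-in for the Web: URI -> triples returned on lookup.
WEB = {
    "http://example.org/a": [("http://example.org/a", "ex:link", "http://example.org/b")],
    "http://example.org/b": [("http://example.org/b", "ex:name", "B")],
}
seed = [("ex:list", "ex:topic", "http://example.org/a")]
expanded = expand_until_fixpoint(seed, lambda uri: WEB.get(uri, []))
```

In this toy run the loop needs two rounds of lookups before a third finds nothing new; the query would then be evaluated once more over the expanded data set.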
Experiment: Query Answering
Input: List of 562 (potential) universities from Facebook Graph API
Output: Facebook fans and DBpedia student numbers for 104 universities
PREFIX u: <http://openlids.org/universities.rdf#>
SELECT ?n ?f ?s WHERE {
  u:list foaf:topic ?u .
  ?u foaf:name ?n .
  ?u og:fan_count ?f .
  ?u d:numberOfStudents ?s
}
Linked * Services and PlanetData
Several areas seem likely to produce services:
Stream, inc. Sensor, resources (latest values)
Any others exposing dynamic resources
Dynamic computations, inc. on-the-fly quality assessments
Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions
Access control (OpenID, OAuth, etc.)
Finally, remaining areas could serve to complement LIDS/LOS alignment
Provenance