Friday talk 11.02.2011
![Page 1: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/1.jpg)
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Querying Live Linked Data
Mini Viva presentation (11.02.2011)
by Jürgen Umbrich
![Page 2: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/2.jpg)
Querying in the Linked Data space
millions of diverse but often interrelated data sources
“data everywhere” on the Web
no complete control over the data
[Figure: a query processor (QP) draws on a static side (crawl, index; Yars2, Virtuoso) and a dynamic side (live distributed querying)]
![Page 3: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/3.jpg)
Linked Data is Dynamic
Dataset: Web data ('08–'09)
24 weekly snapshots
4-hop neighborhood from Tim Berners-Lee's FOAF file
550K RDF/XML docs, 3.3M unique entities
[ Umbrich et al. 2010 ]
Findings (entity level)
68% static, 32% dynamic
[Figure: change frequency chart with shares 52%, 24%, 10% and 14% over the bins <1 week; >1 week, <=1 month; >1 month, <=3 months; >3 months, <=6 months]
![Page 4: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/4.jpg)
Accessing Linked Data
① Use URIs for things
② Use HTTP URIs so that people can look it up
③ Provide useful information, using standards (RDF, SPARQL)
④ Include links to other URIs
Direct correspondence between thing-URI and source-URI
http://umbrich.net/foaf.rdf#me -> HTTP GET -> http://umbrich.net/foaf.rdf (RDF/XML)
#me foaf:based_near http://dbpedia.org/resource/Galway
![Page 5: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/5.jpg)
Accessing Linked Data
Re-direct correspondence between thing-URI and source-URI
http://dbpedia.org/resource/Galway -> HTTP GET -> redirect to
http://dbpedia.org/data/Galway (RDF/XML)
http://dbpedia.org/page/Galway (HTML)
Direct correspondence between thing-URI and source-URI
http://umbrich.net/foaf.rdf#me -> HTTP GET -> http://umbrich.net/foaf.rdf (RDF/XML)
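The two thing-URI to source-URI correspondences can be sketched in a few lines of Python. This is only an illustration: the function name `source_uri` and the `REDIRECTS` table are assumptions made here, and the table stands in for real HTTP redirects so that the sketch runs without network access.

```python
from urllib.parse import urldefrag

# Hypothetical redirect table standing in for real HTTP redirects,
# so the sketch runs without network access.
REDIRECTS = {
    "http://dbpedia.org/resource/Galway": "http://dbpedia.org/data/Galway",
}


def source_uri(thing_uri, redirects=REDIRECTS):
    """Return the document URI that a lookup of thing_uri would end at."""
    # Direct correspondence: drop the fragment (#me) to get the document.
    doc_uri, _fragment = urldefrag(thing_uri)
    # Re-direct correspondence: follow the simulated redirect, if any.
    return redirects.get(doc_uri, doc_uri)


print(source_uri("http://umbrich.net/foaf.rdf#me"))
print(source_uri("http://dbpedia.org/resource/Galway"))
```

A real client would issue the HTTP GET and let the HTTP library follow the redirect; the lookup table only makes the two cases visible side by side.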
![Page 6: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/6.jpg)
The Problem
What are the query-relevant sources?
Example Query
SELECT ?friendLabel WHERE { juum:me foaf:knows ?f . ?f foaf:name ?friendLabel . }
Triple patterns: juum:me foaf:knows ?f . ?f foaf:name ?friendLabel .
Sources: umbrich.net/foaf.rdf, polleres.net/foaf.rdf, sw.deri.org/~aidanh/
![Page 7: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/7.jpg)
Source Selection Approaches
Index: Quad Store (e.g. Yars2)
Triple patterns: juum:me foaf:knows ?f . ?f foaf:name ?friendLabel .
[Figure: sources are fetched via HTTP GET into the index; the query returns “Aidan Hogan” and “Axel Polleres”]
![Page 8: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/8.jpg)
Source Selection Approaches
Quad Store (e.g. Yars2)
Direct execution / graph traversal [Hartig et al. 2009]
Triple patterns: juum:me foaf:knows ?f . ?f foaf:name ?friendLabel .
[Figure: sources are looked up via HTTP GET at query time; the query returns “Aidan Hogan” and “Axel Polleres”]
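A minimal sketch of the graph traversal idea for the example query, assuming a tiny in-memory stand-in for the Web. The `WEB` dictionary, the `example.org` friend URIs and the helper names `fetch` and `friend_names` are made up for illustration; this is not the implementation from [Hartig et al. 2009].

```python
from urllib.parse import urldefrag

KNOWS = "foaf:knows"
NAME = "foaf:name"
ME = "http://umbrich.net/foaf.rdf#me"

# Hypothetical in-memory "Web": document URI -> triples served there.
# The example.org URIs are invented for this sketch.
WEB = {
    "http://umbrich.net/foaf.rdf": [
        (ME, KNOWS, "http://example.org/aidan.rdf#me"),
        (ME, KNOWS, "http://example.org/axel.rdf#me"),
    ],
    "http://example.org/aidan.rdf": [
        ("http://example.org/aidan.rdf#me", NAME, "Aidan Hogan"),
    ],
    "http://example.org/axel.rdf": [
        ("http://example.org/axel.rdf#me", NAME, "Axel Polleres"),
    ],
}


def fetch(uri, web, log):
    """Simulated HTTP GET: strip the fragment, record the lookup, return triples."""
    doc, _fragment = urldefrag(uri)
    log.append(doc)
    return web.get(doc, [])


def friend_names(me, web):
    """Evaluate  me foaf:knows ?f . ?f foaf:name ?label  by following links."""
    log, names = [], []
    # Step 1: look up the URI that appears in the query.
    for s, p, o in fetch(me, web, log):
        if s == me and p == KNOWS:
            # Step 2: look up each binding of ?f to find its name.
            for s2, p2, o2 in fetch(o, web, log):
                if s2 == o and p2 == NAME:
                    names.append(o2)
    return sorted(names), log


names, lookups = friend_names(ME, WEB)
print(names, len(lookups))
```

Every binding of ?f costs one lookup, which is why query time in this approach grows with the number of sources touched.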
![Page 9: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/9.jpg)
Source Selection Approaches
Schema-Level Indices [Stuckenschmidt et al. 2004]
Data Summaries [Umbrich et al. 2010]
Inverted Indices [Heflin et al. 2010] (e.g. Sindice V1.0)
Quad Store (e.g. Yars2)
Direct execution / graph traversal [Hartig et al. 2009]
[Figure: approaches compared along two groups of dimensions. Query system: index size, query time. Results: recall, freshness]
![Page 10: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/10.jpg)
Approximate Data Summaries
Combined description of schema level and instance level
Use approximation to reduce index size (incurs false positives)
Index growth only with the number of sources
Multidimensional numerical dataspace
Hash-based data summaries
[Figure: numerical data space with axes s and o, each ranging from 1 to 30]
![Page 11: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/11.jpg)
Hash-based Data Summaries
[Figure: equi-width histogram over the s/o data space, axis ticks 1, 10, 20, 30]
① Input: triple + source information
juum:me foaf:knows ah:ah <http…foaf.rdf>
② Hash triples
[ 24 , 5 , 2 ] <http…foaf.rdf>
③ Insert hash-triple into dataspace and store source information with buckets
INS([ 24 , 5 , 2 ] , http…foaf.rdf )
④ Query for relevant sources
QUERY ( juum:me ?p ?o ) -> ( 24, ?, ? )
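The four steps can be sketched as a small equi-width histogram in Python. Everything concrete here is an assumption for illustration: the coordinate range 1..30 and bucket width 10 are read off the figure ticks, `zlib.crc32` is a stand-in for whatever hash function the actual system uses, and the class name `HashSummary` and the `example.org` source URI are invented.

```python
import zlib
from collections import defaultdict

SPACE = 30  # assumed coordinate range per dimension: 1..30
WIDTH = 10  # assumed bucket width of the equi-width histogram


def h(term):
    """Map an RDF term to a coordinate in 1..SPACE (crc32 is a stand-in hash)."""
    return zlib.crc32(term.encode("utf-8")) % SPACE + 1


def bucket(coord):
    """Index of the equi-width bucket that contains coord."""
    return (coord - 1) // WIDTH


class HashSummary:
    def __init__(self):
        # bucket coordinates -> source -> number of triples hashed there
        self.buckets = defaultdict(lambda: defaultdict(int))

    def insert(self, triple, source):
        """Steps 1-3: hash the triple and store the source with its bucket."""
        key = tuple(bucket(h(term)) for term in triple)
        self.buckets[key][source] += 1

    def query(self, pattern):
        """Step 4: pattern is (s, p, o) with None for variables."""
        want = [None if t is None else bucket(h(t)) for t in pattern]
        hits = set()
        for key, sources in self.buckets.items():
            if all(w is None or w == k for w, k in zip(want, key)):
                hits.update(sources)
        return hits


summary = HashSummary()
summary.insert(("juum:me", "foaf:knows", "ah:ah"), "http://example.org/foaf.rdf")
print(summary.query(("juum:me", None, None)))
```

Because many terms share a bucket, a query can return sources that hold no matching triple (false positives), but a source that was inserted for a matching triple is never missed.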
![Page 12: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/12.jpg)
QTree: Efficient source selection
[Figure: the same data space indexed by an equi-width histogram and by a QTree, axis ticks 1, 10, 20, 30]
Combination of histograms and R-trees, inheriting the benefits of both data structures
Optimal for sparse data
Buckets store cardinality and set of sources => top-k source ranking
e.g. R1,1 ( 1: { http://…/foaf.rdf } )
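A hedged sketch of how bucket cardinalities could drive a top-k source ranking. It assumes that each bucket matched by a query records a triple count per source; the slide only states that buckets store a cardinality and a set of sources, so the per-source counts, the function name `top_k_sources` and the sample data are assumptions.

```python
from collections import Counter


def top_k_sources(matching_buckets, k):
    """Rank sources by the summed cardinalities of the buckets they appear in."""
    scores = Counter()
    for bucket in matching_buckets:
        for source, cardinality in bucket.items():
            scores[source] += cardinality
    # Highest score first; ties are broken by URI to keep the order stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical buckets returned for one triple pattern.
buckets = [
    {"http://example.org/a.rdf": 2, "http://example.org/b.rdf": 1},
    {"http://example.org/b.rdf": 3},
]
print(top_k_sources(buckets, 1))
```

Only the k best-ranked sources would then be fetched, trading some recall for fewer HTTP lookups.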
![Page 13: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/13.jpg)
Evaluation: Source Selection
J. Umbrich, K. Hose, M. Karnstedt, A. Harth, A. Polleres. "Comparing Data Summaries for Processing Live Queries over Linked Data". In WWW Journal, Special Issue "Querying the Data Web", 2011.
![Page 14: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/14.jpg)
Source Selection Approaches
Schema-Level Indices [Stuckenschmidt et al. 2004]
Data Summaries [Umbrich et al. 2010]
Inverted Indices [Heflin et al. 2010] (e.g. Sindice V1.0)
Quad Store (e.g. Yars2)
Direct execution / graph traversal [Hartig et al. 2009]
[Figure: approaches compared along two groups of dimensions. Query system: index size, query time. Results: recall, freshness]
![Page 15: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/15.jpg)
Querying in the Linked Data Space
millions of diverse but often interrelated data sources
“data everywhere” on the Web
no complete control over the data
[Figure: a query processor (QP) draws on a static side (crawl, MAT, index) and a dynamic side (live distributed querying)]
Combined Query of RDF stores and the Linked Data Web
![Page 16: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/16.jpg)
Improved Query Time & Fresh Results
[Figure: query time plotted against the number of query executions, with curves for live querying, index querying and combined querying]
Combined querying: learning about source dynamics
Decrease query time by avoiding unnecessary HTTP lookups while still returning fresh results
![Page 17: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/17.jpg)
Current Research Question
How to combine querying of RDF stores and the Linked Data Web?
![Page 18: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/18.jpg)
Combined Query Processing
Live results on top of SPARQL stores
to decide (at query time) whether we access the static store or the Web resources
by integrating knowledge about the dynamics of sources into the query processor
[Figure: a query enters the Query Processor, which uses Source Selection and Dynamics components to draw on a SPARQL index (Yars2, Virtuoso) and the Linked Data Web, and returns live results]
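One possible shape of such a query-time decision, sketched under strong assumptions. The talk poses this as an open research question, so the rule below (fetch live once the cached copy is older than the source's estimated change interval) is an illustrative guess and not the author's method; `SourceInfo`, `choose_access` and `plan` are hypothetical names, and times are plain day counts.

```python
from dataclasses import dataclass


@dataclass
class SourceInfo:
    last_fetched: float          # day on which the stored copy was crawled
    mean_change_interval: float  # estimated days between changes (assumed known)


def choose_access(info, now):
    """Return "live" if the stored copy is likely stale, otherwise "store"."""
    age = now - info.last_fetched
    return "live" if age >= info.mean_change_interval else "store"


def plan(sources, now):
    """Decide per source whether to use the static store or an HTTP lookup."""
    return {uri: choose_access(info, now) for uri, info in sources.items()}


# Hypothetical sources: one changes weekly, one roughly every three months.
sources = {
    "http://example.org/sensor.rdf": SourceInfo(last_fetched=0, mean_change_interval=7),
    "http://example.org/city.rdf": SourceInfo(last_fetched=0, mean_change_interval=90),
}
print(plan(sources, now=30))
```

With these numbers only the frequently changing source triggers an HTTP lookup; the slowly changing one is answered from the store.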
![Page 19: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/19.jpg)
Mining Dynamic/Static Patterns
Goal: acquire knowledge about dynamic patterns (e.g. geo:lat, geo:long)
Considering the context of a node (e.g. the location value of a city vs. the location value of a GPS sensor)
Based on two datasets (started in March 2010)
Daily 3-hop neighborhood crawls from 20 seed URIs
Weekly snapshots over ~10 months
10% sampling from a billion-triple crawl (fixed URI list, contains ~2K web vocabularies)
Learn to predict change events
![Page 20: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/20.jpg)
Query Processor
Collaboration with Yuan (APEXLAB)
Elaboration on how dynamic query planning can support data access decisions, taking dynamic patterns into account
Investigation of one of the possible approaches
![Page 21: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/21.jpg)
Evaluation
Based on simulation
Using our dynamic mining dataset
Based on real-world data
Linked Stream Data effort
Using the gathered knowledge from our dynamic mining
Evaluation criteria
Query time (number of HTTP lookups)
Result freshness
Recall (number of results)
![Page 22: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/22.jpg)
How to combine querying of RDF stores and the Linked Data Web to return fresh results?
[Figure: Query Processor with Source Selection and Dynamics components over a SPARQL index, taking a query and returning live results]
Questions?
![Page 23: Friday talk 11.02.2011](https://reader035.fdocuments.us/reader035/viewer/2022062513/555066d5b4c905c0448b549d/html5/thumbnails/23.jpg)
Literature
[Hartig 2009] O. Hartig, Ch. Bizer, and J.-Ch. Freytag. Executing SPARQL Queries over the Web of Linked Data. In ISWC’09, 2009.
[Stuckenschmidt] H. Stuckenschmidt, R. Vdovjak, J. Broekstra, and G.-J. Houben. Towards distributed processing of RDF path queries. JWET, 2(2/3):207–230, 2005.
[Umbrich 2010] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, S. Decker. Towards Understanding Dataset Dynamics: Change Frequency of Linked Data Sources. LODW 2010 at WWW 2010, 2010.
[Heflin 2010] Y. Li and J. Heflin. Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources. In Proceedings of the 9th International Semantic Web Conference (ISWC2010), 2010.