© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Processing Queries on Top of Linked Data and Sensor Data
Cry Distribution
Marcel Karnstedt
Linked Data
Use URIs as names for things (documents, people, organisations, products, …)
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, typically structured data in RDF
Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/LinkedData
Accessing Linked Data
Linked Data in Practice
[Diagram: a user agent looks up http://www.polleres.net/foaf.rdf#me by sending an HTTP GET for http://www.polleres.net/foaf.rdf to the web server, which returns RDF]
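A minimal sketch of the lookup shown on this slide, using only the Python standard library. It builds the request but does not send it; the Accept header value and the function name build_lookup are assumptions made for illustration, not taken from the slides.

```python
from urllib.parse import urldefrag
from urllib.request import Request

def build_lookup(uri, accept="application/rdf+xml"):
    """Return the HTTP GET request a user agent would send for a (hash) URI."""
    # The fragment (#me) is not sent to the server; only the document URL is fetched.
    document_url, fragment = urldefrag(uri)
    request = Request(document_url, headers={"Accept": accept}, method="GET")
    return request, fragment

request, fragment = build_lookup("http://www.polleres.net/foaf.rdf#me")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
print("fragment naming the thing:", fragment)
```

Passing the request to urllib.request.urlopen would perform the actual GET; the returned RDF then has to be parsed and searched for statements about the full URI including the fragment.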
Forward References
[Diagram: the user agent sends an HTTP GET for http://dbpedia.org/resource/Gordon_Brown; the web server answers with a 303 redirect; a second HTTP GET on http://dbpedia.org/data/Gordon_Brown returns RDF. Also shown: http://dbpedia.org/page/Gordon_Brown]
Consumption - Essentials
Linked Data provides for a global data-space with a uniform API (due to RDF as the data model)
Access methods:
– Dereference URIs via HTTP GET (RDF/XML, RDFa, etc.)
– SPARQL (‘the SQL of RDF’)
– Data dumps (RDF/XML, etc.)
Metadata about LOD datasets: voiD (http://semanticweb.org/wiki/VoiD)
– Allows selecting datasets based on their characteristics (topic, license, interlinking, formats, etc.)
Consumption - Technologies
Basic Linked Data access mechanisms (HTTP interface & RDF parsing) are widely supported in all major platforms and languages, such as Java, PHP, C/C++/.NET, etc.
Inspect and debug tools:
– Command line tools (curl, rapper, etc.)
– Online tools: http://redbot.org/ (HTTP/low-level), http://sindice.com/developers/inspector (RDF/data-level)
SPARQL endpoints (generic and dataset-specific): http://esw.w3.org/SparqlEndpoints
Gimme URIs…!!
Distributed setup → need for a central point of access (indexer, aggregator)
Sindice, an index of the Web of Data http://sindice.com/
Sig.ma, Web of Data aggregator & browser http://sig.ma/
Relationship discovery http://relfinder.semanticweb.org/
But where is the DB…?! Complex queries, efficient storage, quick access, etc.
Ranking
Huge data, millions of sources, messiness, RDF model, …
Requires special ranking
– TF-IDF style, graph based (PageRank style), cardinality based (histogram style), etc.
Levels: triple, URI, document, source, domain, …
Skyline Ranking
Objects that are not “dominated” by other objects
– Scoring function on multiple attributes, no weighting
– In contrast to (multidimensional) top-k
No straightforward IR operation
[Figure: skyline examples with the dominated objects marked; axis labels: price, distance, age, time]
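The dominance test behind a skyline can be sketched in a few lines. This is a naive quadratic version for illustration only; the hotel data and the assumption that smaller is better in every attribute are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every attribute and better in at least one.
    Assumption: smaller values are better for every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(objects):
    """Return the objects that are not dominated by any other object."""
    return [o for o in objects if not any(dominates(p, o) for p in objects if p != o)]

# Hypothetical hotels as (price, distance) pairs.
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (70, 9)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

No weights are involved: (90, 6) drops out because (60, 5) is both cheaper and closer, and (70, 9) drops out because of (50, 8). A top-k query would instead need a scoring function that combines price and distance.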
Querying
SELECT ?f ?n
WHERE { an:f#ah foaf:knows ?f . ?f foaf:name ?n . }
[Result table with columns ?f and ?n]
SELECT ?x1 ?x2
WHERE {
  dblppub:HoganHP08 dc:creator ?a1, ?a2 .
  ?x1 owl:sameAs ?a1 .
  ?x2 owl:sameAs ?a2 .
  ?x1 foaf:knows ?x2 .
  ?x2 foaf:knows ?x1 .
}
Query Algebra
WHERE { ?s1 <p1> ?o1 . ?s2 ?p2 ?o2 . ?s2 <add> ?add . FILTER (edist(?o1, ?o2) < k) . FILTER (?p2 = <pred>) }
[Figure: operator tree for this query, built from the following operators:
– ξ([?s1··?o1 → ·<p1>·])
– ξ([?s2·?p2·?o2 → ···]) followed by σ(?p2=<pred>)
– ξ([?s2·?p2·?o2 → ·<pred>·])
– join condition (edist(?o1,?o2)<k)
– ω(?s2;[?s2·· ?add → · <add> ·])]
Querying Data Across Sources
Data warehousing or materialisation-based approaches (MAT)
CRAWL → INDEX → SERVE
– RDBMS: one big table, property tables, vertical storage, column stores
– Hybrid approaches, such as Virtuoso
– Native stores, such as YARS: special (simplified) structures, special indexes!
Indexing in YARS etc.
Index the different parts of a triple
– Ideally: all combinations
– With prefix support: only 6 (spo, sop, pso, pos, osp, ops)
Optimised for read-only access
Trade-off: storage vs. performance, read vs. write
Optional special indexes (full-text, string similarity, …)
Example triples: <x> name Xavier . <x> knows <friend> . <x> seeAlso <link> . <x> sameAs <y>
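A rough sketch of why six sorted orderings suffice once prefix lookups are available. The class TripleIndex and its in-memory sorted lists are illustrative assumptions, not the on-disk structures YARS actually uses.

```python
from bisect import bisect_left
from itertools import permutations

class TripleIndex:
    """Six sorted permutation indexes (spo, sop, pso, pos, osp, ops) over triples."""

    def __init__(self, triples):
        self.indexes = {}
        for order in permutations("spo"):
            name = "".join(order)
            # Reorder each triple according to the permutation, then sort.
            positions = ["spo".index(c) for c in order]
            self.indexes[name] = sorted(tuple(t[i] for i in positions) for t in triples)

    def lookup(self, order, prefix):
        """Return all entries of index `order` that start with the tuple `prefix`."""
        index = self.indexes[order]
        start = bisect_left(index, prefix)  # first entry >= prefix
        hits = []
        for entry in index[start:]:
            if entry[:len(prefix)] != prefix:
                break
            hits.append(entry)
        return hits

triples = [
    ("<x>", "name", "Xavier"),
    ("<x>", "knows", "<friend>"),
    ("<x>", "seeAlso", "<link>"),
    ("<x>", "sameAs", "<y>"),
]
idx = TripleIndex(triples)
print(idx.lookup("pos", ("knows",)))         # pattern: ?s knows ?o
print(idx.lookup("spo", ("<x>", "sameAs")))  # pattern: <x> sameAs ?o
```

Any triple pattern with bound positions can be answered by picking the ordering in which the bound positions come first. The price is that every triple is stored six times, which is the storage vs. performance trade-off named on the slide.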
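Live Queries
Live lookups, on-demand querying
[Diagram: a relational query (SELECT * FROM…) over relations R and S accessed via ODBC, side by side with an RDF query (SELECT ?s WHERE…) over triple patterns (TP) accessed via HTTP GET]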
Live Queries: Approaches
Andreas Harth Data
15.03.2010
Approaches:
– Direct lookups: dereferencing URIs, recursive
– Data summaries
[Diagram: the query SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n. } is split into the triple patterns TP (an:f#ah foaf:knows ?f) and TP (?f foaf:name ?n); for each pattern, source(s) are selected and fetched as RDF via HTTP GET; example result row: ?f = http://danbri.org/foaf.rdf#danbri, ?n = Dan Brickley]
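The direct-lookup approach can be sketched as follows for the two-pattern friend query. To keep the example runnable without network access, a hypothetical in-memory dictionary WEB stands in for HTTP GET plus RDF parsing; all example.org URIs and names are made up.

```python
# Hypothetical in-memory "Web": document URL -> triples served at that URL.
# A real engine would issue HTTP GET requests and parse the returned RDF.
WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice#me", "foaf:knows", "http://example.org/bob#me"),
        ("http://example.org/alice#me", "foaf:knows", "http://example.org/carol#me"),
    ],
    "http://example.org/bob": [("http://example.org/bob#me", "foaf:name", "Bob")],
    "http://example.org/carol": [("http://example.org/carol#me", "foaf:name", "Carol")],
}

def dereference(uri):
    """Simulate HTTP GET on the document part of a URI (fragment stripped)."""
    return WEB.get(uri.split("#")[0], [])

def friends_with_names(start):
    """Evaluate { start foaf:knows ?f . ?f foaf:name ?n } by direct lookups."""
    results = []
    # First triple pattern: look up the start URI itself.
    for s, p, o in dereference(start):
        if s == start and p == "foaf:knows":
            # Second triple pattern: look up each URI bound to ?f.
            for s2, p2, o2 in dereference(o):
                if s2 == o and p2 == "foaf:name":
                    results.append((o, o2))
    return results

print(friends_with_names("http://example.org/alice#me"))
```

Source selection here is implicit: the URIs bound so far decide which documents to fetch next. A data summary would instead be consulted up front to pick sources likely to contribute to each pattern.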
Federated/Distributed Querying
Federated feature in SPARQL 1.1
– Directly refer to sources
Automatically split/copy queries and forward
– Depends on query capabilities! Simple sources vs. SPARQL endpoints etc.
Similar issues as in central engines: indexing, local stores, …
New challenges: availability, guarantees, robustness, consistency, …
Basic federated Queries (time permitting)
http://www.w3.org/TR/sparql11-federated-query/
Will be integrated in Query spec
Essentially a new pattern: SERVICE
– Similar to GRAPH
– Allows delegating query parts to a specific (remote) endpoint
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N
FROM <http://www.w3.org/People/Berners-Lee/card>
WHERE {
  { <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?F . ?F foaf:name ?N }
  UNION
  { SERVICE <http://dblp.l3s.de/d2r/sparql>
    { [ foaf:maker <http://dblp.l3s.de/…/authors/Tim_Berners-Lee>, [ foaf:name ?N ] ] . } }
}
Total Decentralisation
Example query: Find all reviews for movies made in 1994 in central Europe!
[Diagram: DB; RDF: Geonames data; RDF: IMDB data; RDF: reviews]
Indexing in UniStore
Use distributed hash tables (DHT)
– Indexing of attributes = key for hashing
– Which attributes? All!
Example triples: <x> name Xavier . <x> knows <friend> . <x> seeAlso <link> . <x> sameAs <y>
– h(s) for subject lookup
– h(p1 || o1) for ?s pi ?o .
– h(p2 || o2) filter (?o ≥ v) ... (prefix search)
...trade-off storage vs. performance
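A simplified sketch of the two key types named above, h(s) and h(p || o). It assumes SHA-1 modulo a fixed peer count as a stand-in for the DHT's hash function; a plain cryptographic hash like this does not preserve order, so the prefix search mentioned on the slide would need a different, order-preserving scheme.

```python
import hashlib

def h(key, num_peers):
    """Map a key to a peer id. SHA-1 modulo the peer count stands in for a real DHT."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_peers

def index_entries(triple, num_peers):
    """Return the (key, peer) pairs under which one triple is stored."""
    s, p, o = triple
    keys = [s, p + "||" + o]  # h(s) for subject lookup, h(p || o) for ?s p o lookups
    return [(k, h(k, num_peers)) for k in keys]

triples = [("<x>", "name", "Xavier"), ("<x>", "knows", "<friend>")]
for t in triples:
    print(t, index_entries(t, num_peers=6))
```

Each triple is stored once per key, so indexing more attribute combinations multiplies storage and insert traffic. In return, all triples of one subject end up at the same peer, and a lookup needs only one hash computation to find it.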
Robustness: Parallel Execution
Goal: stateless processing → “push” approach
– Messages containing both plan and intermediate results (based on Mutant Query Plans [Papadimos et al. 02])
– Receiver peer is identified by applying the hash function
Multiple instances of the plan travel through the network
[Figure: peers p0 to p5 with the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)}, {(B,2),(C,1),(B,2),(C,4)}; three instances of the plan σ(A) ⋈ B travel in parallel and become {(A,2)} ⋈ B, {(A,3),(A,4)} ⋈ B and {(A,5)} ⋈ B; results returned to p0: {(A,5),(B,5)} and {(A,2,B,2,C,1),(A,2,B,2,C,4)}]
Less Bandwidth: Sequential Execution
All peers can be queried in a sequence
Decision at each peer: adaptive query processing
[Figure: peers p0 to p5 with the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)}, {(B,2),(C,1),(B,2),(C,4)}; a single plan instance visits the peers in sequence, growing from σ(A) ⋈ B to {(A,2)} ⋈ B, {(A,2),(A,3),(A,4)} ⋈ B, {(A,2),(A,3),(A,4),(A,5)} ⋈ B and {(A,2),(A,3),(A,4),(A,5,B,5)} ⋈ B; result returned to p0: {(A,2,B,2,C,1),(A,2,B,2,C,4),...}]
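The sequential strategy can be illustrated with a message that carries both the remaining route and the intermediate result, so that no peer keeps state. This sketch covers only the selection on A; the assignment of fragments to peers p1 to p3 and the predicate 2 ≤ value ≤ 5 are assumptions read off the figure, and the names Message and run_sequential are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """The plan travels with the data: peers still to visit plus results so far."""
    route: list
    selected: list = field(default_factory=list)

def process_at_peer(msg, fragments, predicate):
    """One stateless step: apply the selection locally, then pass the message on."""
    peer = msg.route[0]
    hits = [t for t in fragments[peer] if predicate(t)]
    return Message(route=msg.route[1:], selected=msg.selected + hits)

def run_sequential(route, fragments, predicate):
    msg = Message(route=list(route))
    while msg.route:
        msg = process_at_peer(msg, fragments, predicate)
    return msg.selected

# Assumed placement of the A fragments on three peers.
fragments = {
    "p1": [("A", 1), ("A", 2)],
    "p2": [("A", 3), ("A", 4)],
    "p3": [("A", 5), ("A", 6)],
}
in_range = lambda t: 2 <= t[1] <= 5
print(run_sequential(["p1", "p2", "p3"], fragments, in_range))
```

Only one message is in flight at any time, which saves bandwidth compared with sending a plan instance to every peer in parallel, at the cost of latency that grows with the length of the route.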
Mixed Query Execution
“Fire and forget”
May result in unpredictable behavior
A peer may see the same query multiple times
– Different data to process
– Different operators to process
[Figure: peers p0 to p5 with the fragments {(A,1),(A,2)}, {(A,3),(A,4)}, {(A,5),(A,6)}, {(B,5),(B,6)}, {(B,2),(C,1),(B,2),(C,4)}; plan instances carry σ(A) ⋈ B, {(A,2)} ⋈ B, {(A,2),(A,3),(A,4)} ⋈ B and {(A,2),(A,3),(A,4),(A,5)} ⋈ B; results returned to p0: {(A,5),(B,5)} and {(A,2,B,2,C,1),(A,2,B,2,C,4)}]
Reasoning
In Principle
Machine-interpretable representation of data allows for deductive reasoning
– Drawing conclusions from axioms and data
Web Ontology Language (OWL) provides constructs supporting entity consolidation
– sameAs, inverse-functional properties (mbox_sha1sum)
Reasoning can further be used to:
– Unite fractured data sets
– Disambiguate entities
– Check consistency of knowledge bases
Example
General idea: answer queries with implicit answers
Simplified example: :jeff rdf:type foaf:Person & :jeff foaf:knows :aidan
query: select ?x { ?x rdf:type foaf:Agent }
– foaf:Person rdfs:subClassOf foaf:Agent → :jeff as result
query: select ?x { ?x rdf:type foaf:Person }
– foaf:knows rdfs:range foaf:Person → :jeff and :aidan as result
Inverse-functional properties, sameAs, subPropertyOf etc.
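The two inferences in this example can be reproduced with a tiny forward-chaining loop. The sketch below implements only the RDFS subclass and range rules over plain string triples, using the standard rdf:type property; it is an illustration, not a complete RDFS reasoner.

```python
RDF_TYPE, SUBCLASS, RANGE = "rdf:type", "rdfs:subClassOf", "rdfs:range"

def forward_chain(triples):
    """Apply two RDFS rules (subclass and range) until no new triple is derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # rdfs9: (x type C), (C subClassOf D) => (x type D)
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # rdfs3: (x P y), (P range C) => (y type C)
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= facts:
            return facts
        facts |= new

def instances_of(facts, cls):
    return sorted(s for s, p, o in facts if p == RDF_TYPE and o == cls)

data = [
    (":jeff", "rdf:type", "foaf:Person"),
    (":jeff", "foaf:knows", ":aidan"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:knows", "rdfs:range", "foaf:Person"),
]
facts = forward_chain(data)
print(instances_of(facts, "foaf:Person"))  # [':aidan', ':jeff']
print(instances_of(facts, "foaf:Agent"))   # [':aidan', ':jeff']
```

With both axioms loaded at once, :aidan is also returned as a foaf:Agent, because the range rule first makes :aidan a foaf:Person and the subclass rule then applies to that derived fact. This chaining of derived facts is what makes materialisation grow quickly on large datasets.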
Problems…
Usually expensive, huge amount of “new” facts
Potential conflicts due to inconsistencies
Potentially infinite results
“Ontology hijacking”, e.g. foaf:Person subClassOf my:Person
– A new statement for each Person in the dataset
Non-distinguished variables: SELECT ?X { ?X :hasFather ?Y }
– No such triple in the data, but “every person has a father”?!
08445a31a78661b5c746feff39a9db6e4e2cc5cf: sha1-sum of ‘mailto:’
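The hash above points at a data-quality problem: when publishers leave the mailbox empty, everyone ends up with the same foaf:mbox_sha1sum, and consolidation on this inverse-functional property merges unrelated people. A small sketch with made-up profiles:

```python
import hashlib

def mbox_sha1sum(mailbox):
    """foaf:mbox_sha1sum: SHA-1 hex digest of the full mailto: URI."""
    return hashlib.sha1(("mailto:" + mailbox).encode("utf-8")).hexdigest()

# Hypothetical profiles; two publishers left the mailbox empty.
profiles = {"ex:alice": "alice@example.org", "ex:bob": "", "ex:carol": ""}

by_hash = {}
for person, mailbox in profiles.items():
    by_hash.setdefault(mbox_sha1sum(mailbox), []).append(person)

# Entities sharing a value of an inverse-functional property would be merged.
merged = [people for people in by_hash.values() if len(people) > 1]
print(merged)  # [['ex:bob', 'ex:carol']]
```

A reasoner that applies the inverse-functional semantics blindly would conclude ex:bob owl:sameAs ex:carol. One pragmatic guard, offered here only as a suggestion, is to blacklist known degenerate values before consolidation.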
Implementation
Materialise inferred triples: forward chaining
Query rewriting, recursive/iterative: backward chaining
On-the-fly with data summaries: ongoing research
Stateless query expansion: parallel sub-queries
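As a contrast to materialisation, query rewriting leaves the data untouched and expands the query instead. The sketch below rewrites a single type pattern into a UNION over all subclasses; the function names and the string-based pattern format are assumptions for illustration, and only rdfs:subClassOf is handled.

```python
def subclasses_of(cls, subclass_axioms):
    """Return cls plus all its direct and indirect subclasses."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_axioms:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def expand_type_pattern(var, cls, subclass_axioms):
    """Rewrite { var rdf:type cls } into a UNION over cls and its subclasses."""
    classes = sorted(subclasses_of(cls, subclass_axioms))
    branches = ["{ %s rdf:type %s }" % (var, c) for c in classes]
    return " UNION ".join(branches)

# (subclass, superclass) pairs.
axioms = [("foaf:Person", "foaf:Agent"), ("foaf:Organization", "foaf:Agent")]
print(expand_type_pattern("?x", "foaf:Agent", axioms))
```

Each branch of the resulting UNION is independent of the others, so the branches can be evaluated as parallel sub-queries.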
Query Expansion
[Figure panels: unexpanded query; map operators added; first mapping; expanded query]
Querying Sensor Data
Processing Paradigms
AnduIN
CQL
In-Network Query Processing
Example
Anomalies in sensor networks: sensors deliver measurements
[Figure: sensor nodes s1 to s41 positioned in the x/y plane]
Example
Anomalies in sensor networks: identify anomalies from the stream
[Figure: sensor nodes s1 to s41 positioned in the x/y plane]
Example
Anomalies in sensor networks: determine anomalous regions
[Figure: sensor nodes s1 to s41 positioned in the x/y plane]
Example
Anomalies in sensor networks: respect obstacles
[Figure: sensor nodes s1 to s41 positioned in the x/y plane]
Example
Anomalies in sensor networks: obstacles change regions
[Figure: sensor nodes s1 to s41 positioned in the x/y plane]
Example Scenario
Storms in California
Anomaly Degrees and Regions
Regions for different thresholds
Triangulated Wireframe Surface (TWS)
[Figure panels: degree plane at height 0.25; degree plane at height 0.4]
IN Region Detection
Focus on energy consumption
– But cost model supports multiple dimensions
Cost Estimation
In streams: continuous queries
Thus, query planning should be adaptive
– Not “optimise first, execute next” any more
– When and how to re-optimise
Forecast, e.g., by exponential smoothing
Decide between alternative query plans
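Simple exponential smoothing, as mentioned above, can be written in a few lines. The plan names, the threshold and the decision rule in choose_plan are purely hypothetical; the slides do not say how the forecast feeds into the choice between plans.

```python
def exponential_smoothing(observations, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [observations[0]]
    for x in observations[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def choose_plan(forecast_rate, threshold):
    """Hypothetical decision rule: switch plans when the forecast crosses a threshold."""
    return "in-network" if forecast_rate > threshold else "central"

# Made-up observed data rates per time window.
rates = [10.0, 20.0, 20.0, 40.0]
forecast = exponential_smoothing(rates, alpha=0.5)[-1]
print(forecast, choose_plan(forecast, threshold=25.0))  # 28.75 in-network
```

A larger alpha makes the forecast react faster to recent changes but also makes it noisier, which directly affects how often the running query would be re-optimised.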
Eval: Anomalies
Anomaly rate and size of sliding window
Eval: Anomalies /2
Number of leader nodes
Eval: Anomalous Region
Anomaly rate
Brief Wrap-Up
Different approaches for querying SemWeb data
Linked Data is inherently distributed
– ...but not inherently dynamic?!
Distributed approaches are promising:
– Support dynamic data
– Scalable
Query processing in sensor networks shows similarities
Scalability requirements suggest focusing on distributed approaches; resource limitations demand it