Query Rewriting in RDF Stream Processing

21
Query Rewriting in RDF Stream Processing Jean-Paul Calbimonte – Jose Mora – Oscar Corcho LSIR EPFL OEG UPM ESWC Heraklion, Greece. June 2016 @jpcik

Transcript of Query Rewriting in RDF Stream Processing

Page 1: Query Rewriting in RDF Stream Processing

Query Rewriting in RDF Stream Processing

Jean-Paul Calbimonte – Jose Mora – Oscar Corcho

LSIR EPFLOEG UPM

ESWC

Heraklion, Greece. June 2016

@jpcik

Page 2: Query Rewriting in RDF Stream Processing

2

Query Rewriting in

RDF Stream Processing

What?

Page 3: Query Rewriting in RDF Stream Processing

RDF Streams

3

s p oTriple

s p o t

Time-stamped triple

s p o

s p o

s p o

t

Time-stamped graph

Gi-1 Gi Gi+1 Gi+2 Gi+3

RDF stream

RSPS

W(S,t)

Page 4: Query Rewriting in RDF Stream Processing

Windowed Queries in RSP

4

:obs1 rdf:type ssn:Observation .:obs2 rdf:type ssn:Observation .:obs3 rdf:type ssn:Observation .

qG(x) ← ssn:Observation(x) :obs1,:obs2 ,:obs3

qW(S,t)(x) ← ssn:Observation(x)

:obs1,:obs2 [5]

:obs3,:obs4 [10]

:obs5,:obs6 [15]

g1[3][4]

[7]

[10]

[11]

[13]

:obs1 rdf:type ssn:Observation . g2 :obs2 rdf:type ssn:Observation .

g3 :obs3 rdf:type ssn:Observation .

g4 :obs4 rdf:type ssn:Observation . g5 :obs5 rdf:type ssn:Observation .

g6 :obs6 rdf:type ssn:Observation .

[1] S

G grap

h

WW size=5

qW(S,t)(x) ← ssn:observedBy(x,y)

No answers for this query: no matches in the stream

Page 5: Query Rewriting in RDF Stream Processing

RSP Query Languages

5

SELECT (MAX(?temp) AS ?maxtemp) ?sensorWHERE { STREAM :stream [RANGE 1h] { ?obs ssn:observationResult ?result; ssn:observedProperty cf-property:air_temperature; ssn:observedBy ?sensor. ?result ssn:hasValue ?obsValue. ?obsValue qu:numericalValue ?temp. }} GROUP BY ?sensor

SELECT ?obs WHERE { STREAM :stream [RANGE 5s] {?obs rdf:type ssn:Observation. }}

CQELSC-SPARQLSPARQLStream

EP-SPARQL

RSP languages

Example:Get the maximum air temperature observed in the last hour, grouped by sensor.

Page 6: Query Rewriting in RDF Stream Processing

6

Query Rewriting in

RDF Stream Processing

How?

Page 7: Query Rewriting in RDF Stream Processing

Query Rewriting

7

q’’(x) ← HumiditySensor(x)

Query Rewriting

q(x) ← Sensor(x) q’(x) ← Sensor(x)

Sensor

HumiditySensorTBox

Original QueryRewritten UCQ

ABox

Ontology expressiveness: e.g. RDFS, ELHIO, etc.

OBDA systems, query rewriters: Prexto, Requiem, Kyrie, etc.

Explosion of rewritings, complexity of queries

Page 8: Query Rewriting in RDF Stream Processing

Semantic Sensor Network Ontology

8

ssn:Sensor

ssn:Platform

ssn:FeatureOfInterest

ssn:Deployment

ssn:Property

cf-prop:air_temperature

ssn:observes

ssn:onPlatform

dul:Placedul:hasLocation

ssn:SensingDevicessn:inDeployment

ssn:MeasurementCapability

ssn:MeasurementProperty

geo:lat, geo:lngxsd:double

ssn:hasMeasurementProperty

ssn:Accuracy

ssn:ofFeature

aws:TemperatureSensor

aws:Thermistor

ssn:Latency

dim:Temperature

qu:QuantityKind

cf-prop:soil_temperature

cf-feat:Wind

cf-feat:Surface

cf-feat:Medium

cf-feat:aircf-feat:soil

dim:VelocityOrSpeed cf-prop:wind_speedcf-prop:rainfall_rate

aws:CapacitiveBead …

Page 9: Query Rewriting in RDF Stream Processing

Rewriting in RSP

9

:obs1,:obs2 [5]

:obs3,:obs4 [10]

:obs5,:obs6 [15]

g1[3][4]

[7]

[10]

[11]

[13]

:obs1 rdf:type ssn:Observation . g2 :obs2 rdf:type ssn:Observation .

g3 :obs3 rdf:type ssn:Observation .

g4 :obs4 rdf:type ssn:Observation .

g5 :obs5 rdf:type ssn:Observation .

g6 :obs6 rdf:type ssn:Observation .

[1]

q’’W(S,t)(x) ← ssn:Observation(x) Query

RewritingqW(S,t)(x) ← ssn:observedBy(x,y)

q’W(S,t)(x) ← ssn:observedBy(x,y)

Original QueryRewritten UCQ

ssn:Observation ssn:observedBy

S

W(S,t)

Page 10: Query Rewriting in RDF Stream Processing

Rewriting in RSP: Semantics

10

𝑞 :𝑞h (𝒙 )←𝑝1∧…∧𝑝𝑛(𝒙𝑛)

𝑐𝑒𝑟𝑡 (𝑞 ,𝒪 )= {𝜶∨𝑞∪𝒯∪𝒜⊨𝑞h(𝜶 )}

𝑐𝑒𝑟𝑡 (𝑞 ,𝒪 )= {𝜶∨𝑞 ′∪𝒜⊨𝑞h(𝜶)}

Stream:

𝑐𝑒𝑟𝑡 (𝑞𝑤 , ⟨𝒯 ,𝒜𝑤 ⟩ )= {𝜶∨𝑞𝑤∪𝒯∪𝒜𝑤⊨𝑞h(𝜶 )}

𝑐𝑒𝑟𝑡 (𝑞𝑤 , ⟨𝒯 ,𝒜𝑤 ⟩ )= {𝜶∨𝑞 ′𝑤∪𝒜𝑤⊨𝑞h(𝜶)}

Query

Certain answers

Ontology

rewrittenquery

get rid of TBox

Streaming Aboxes

Window

Evaluated only for a window

Now for RDF streams:

Page 11: Query Rewriting in RDF Stream Processing

Implementation: StreamQR

11

kyrie rewriter CQELS

OntologyTBOX

StreamQR

query registration

continuous answers

CQELSquery CQELSUCQ

RDF Stream

Github StreamQR: https://github.com/jpcik/streapler/tree/streamQR

Page 12: Query Rewriting in RDF Stream Processing

StreamQR: Rewriting Steps

12

preprocess

separate BGP & context

OntologyTBOX

Datalog query

CQELSquery

RewrittenCQELSUCQ

remove functional

terms

remove auxiliary

predicates

expand syntactic transformation

UCQ

CQ (BGP)

Page 13: Query Rewriting in RDF Stream Processing

StreamQR in Action

13

met:TemperatureObservation ssn:observedBy.aws:TemperatureSensormet:AirTemperatureObservation met:TemperatureObservation met:ThermistorObservation met:TemperatureObservation aws:Thermistor aws:TemperatureSensor aws:CapacitiveBead aws:TemperatureSensor

TBoxCONSTRUCT { ?o a :ObservedTemperature. }WHERE { STREAM :stream1 [RANGE 10ms] { ?o ssn:observedBy ?t . ?t a aws:TemperatureSensor. }}

Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)

kyrie rewriter

CQELS Query

CQ extraction & syntactic transformation

Page 14: Query Rewriting in RDF Stream Processing

StreamQR in Action

14

Q(?0) <- met:TemperatureObservation(?0)Q(?0) <- met:AirTemperatureObservation(?0)Q(?0) <- met:ThermistorObservation(?0)Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)Q(?0) <- aws:Thermistor(?1), ssn:observedBy(?0,?1)Q(?0) <- aws:CapacitiveBead(?1), ssn:observedBy(?0,?1)

Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)

met:TemperatureObservation ssn:observedBy.aws:TemperatureSensormet:AirTemperatureObservation met:TemperatureObservation met:ThermistorObservation met:TemperatureObservation aws:Thermistor aws:TemperatureSensor aws:CapacitiveBead aws:TemperatureSensor

TBox

Query expansion

Rewritten UCQ

Page 15: Query Rewriting in RDF Stream Processing

StreamQR in Action

15

Q(?0) <- met:TemperatureObservation(?0)Q(?0) <- met:AirTemperatureObservation(?0)Q(?0) <- met:ThermistorObservation(?0)Q(?0) <- aws:TemperatureSensor(?1), ssn:observedBy(?0,?1)Q(?0) <- aws:Thermistor(?1), ssn:observedBy(?0,?1)Q(?0) <- aws:CapacitiveBead(?1), ssn:observedBy(?0,?1)

PREFIX ssn: <http://purl.oclc.org/NET/ssnx/ssn#>PREFIX aws: <http://purl.oclc.org/NET/ssnx/meteo/aws#>PREFIX met: <http://purl.org/env/meteo#>CONSTRUCT { ?o a :ObservedTemperature. }WHERE { STREAM :stream1 [RANGE 10ms] { { ?o a met:TemperatureObservation } UNION { ?o a met:AirTemperatureObservation } UNION { ?o a met:ThermistorObservation } UNION { ?s a aws:TemperatureSensor . ?o ssn:observedBy ?s } UNION { ?s a aws:Thermistor . ?o ssn:observedBy ?s } UNION { ?s a aws:CapacitiveBead . ?o ssn:observedBy ?s }}CQELS rewritten

query: syntactic transformation

Rewritten UCQ

Page 16: Query Rewriting in RDF Stream Processing

Comparison: Rewriting vs No-rewriting

16

• Throughput under different input data loads• Modified version of the SRBench benchmark queries• Ontology based on AWS, which extends SSN-O

• Compared the throughput of StreamQR with CQELS (no rewiriting)• Tested under different input rates, ranging from 10 to 100K triples/s

• Under three different load conditions: such that only:• 10%, 50% and 90% of the input data matches the continuous query

10% of matches 90% of matches50% of matches

Reaches optimal t-put up to this point

Page 17: Query Rewriting in RDF Stream Processing

Varying input distributions

17

• The more matches, the more time the engine spends on evaluation • Up to 10K triples/s., StreamQR is capable to keep up with the input• Beyond that, the throughput degrades until it reaches a limit

Changning que input stream triple distribution

• Experiments under different input loads• Varying distribution of the types of triples

• 10, 20, 50, 80 and 90% of the triples match the query.

Page 18: Query Rewriting in RDF Stream Processing

Throughput for different queries

18

Different queries prducing from 2 to 180 rewritings

• Throughput decreases for queries that produce more rewrittings • Optimizations: existing techniques used in query rewriting and OBDA• Pruning queries that may not match any input. • For around 1K triples/s, it still reaches maximum throughput.

• Different queries -> UCQ with a different number of sub-queries. • 9 distinct queries that produce from 2 to over 180 sub-queries

Page 19: Query Rewriting in RDF Stream Processing

Comparing with TrOwl

19

• Removal operation known to be expensive in incremental materialization. • StreamQR sustains better throughput under fast input rates• Under lower input rates both are able to reach maximum throughput.

• TrOWL Provides incremental reasoning for ABoxes• Three settings:

• TrOWL without performing any reasoning (noreclassify)• Activating the reasoning, but allowing only additions• Including removals

Page 20: Query Rewriting in RDF Stream Processing

Conclusions

20

• Query answering over ontologies for RDF stream processors

• Novel approach that combines query rewriting techniques and an

RSP engines

• Implemented StreamQR: incorporates the kyrie into an existing RSP

• Still efficient in terms of throughput, for a large range of scenarios

• Plan to study other criteria such as correctness

• Exploring different expressiveness

• Find a good balance between efficiency and complexity.

• Research in approaches that combine rewriting and incremental

materialization

Page 21: Query Rewriting in RDF Stream Processing

Thanks a lot!¿Tienes preguntas?

Jean-Paul CalbimonteLSIR EPFL

@jpcik