An experience on empirical research about rdf stream

Dipartimento di Elettronica, Informazione e Bioingegneria An Experience on Empirical Research about RDF Stream Processing Daniele Dell’Aglio [email protected] Joint work with: Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle

description

The invited talk I gave at the EMPIRICAL 2014 workshop at ESWC 2014

Transcript of An experience on empirical research about rdf stream

Page 1: An experience on empirical research about rdf stream

Dipartimento di Elettronica, Informazione e Bioingegneria

An Experience on Empirical Research about RDF Stream Processing

Daniele Dell’Aglio – [email protected]

Joint work with: Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle

Page 2: An experience on empirical research about rdf stream


RDF Stream Processing in a nutshell

Continuous queries over RDF streams, i.e. infinite sequences of time-stamped RDF statements

Brings together the DSMS/CEP and Semantic Web research fields

Several prototypes – with similar models – are available today

Growing trend towards evaluating and comparing the existing systems

26 May 2014 - EMPIRICAL@ESWC2014

Daniele Dell'Aglio - Experimental research about RSPs

Page 3: An experience on empirical research about rdf stream


The CQL model for RSPs

Input stream → S2R operator → R2R operator → R2S operator → Output stream

S2R operator: converts the infinite stream of RDF elements into a finite set of mappings

– The window operators: time-based, tuple-based, …

R2R operator: transforms a set of mappings into another set of mappings

– SPARQL 1.0/1.1 queries

R2S operator: each set of mappings produced by the R2R operator is transformed and appended to the output stream

– Operators: RStream, DStream, IStream
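The chain of the three operator classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any of the systems discussed in this talk: the function names (`s2r_time_window`, `r2r_shared_rooms`, `r2s_rstream`, `r2s_istream`, `r2s_dstream`) are hypothetical, the R2R step is a hand-written stand-in for a SPARQL query, and the example stream with its timestamps is made up.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Timed = Tuple[Triple, int]  # (triple, timestamp)


def s2r_time_window(stream: Iterable[Timed], start: int, width: int) -> Set[Triple]:
    """S2R: keep the finite part of the stream with start <= timestamp < start + width."""
    return {triple for triple, ts in stream if start <= ts < start + width}


def r2r_shared_rooms(window: Set[Triple]) -> Set[str]:
    """R2R: stand-in for a SPARQL query; rooms that contain at least two people."""
    people_by_room: Dict[str, Set[str]] = {}
    for person, predicate, room in window:
        if predicate == ":isIn":
            people_by_room.setdefault(room, set()).add(person)
    return {room for room, people in people_by_room.items() if len(people) >= 2}


def r2s_rstream(current: Set[str], previous: Set[str]) -> Set[str]:
    """R2S, RStream: emit the whole current result."""
    return set(current)


def r2s_istream(current: Set[str], previous: Set[str]) -> Set[str]:
    """R2S, IStream: emit only what is new with respect to the previous result."""
    return set(current) - set(previous)


def r2s_dstream(current: Set[str], previous: Set[str]) -> Set[str]:
    """R2S, DStream: emit only what disappeared with respect to the previous result."""
    return set(previous) - set(current)


# Made-up input stream (timestamps are assumptions for this sketch).
stream = [
    ((":alice", ":isIn", ":hall"), 1),
    ((":bob", ":isIn", ":hall"), 3),
    ((":alice", ":isIn", ":kitchen"), 6),
    ((":bob", ":isIn", ":kitchen"), 9),
]

first = r2r_shared_rooms(s2r_time_window(stream, start=0, width=5))
second = r2r_shared_rooms(s2r_time_window(stream, start=5, width=5))
```

With these two consecutive windows, RStream and IStream both emit the new room for the second window, while DStream emits the room that is no longer part of the result.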

Page 4: An experience on empirical research about rdf stream

S2R - Time-based sliding window

[Figure: stream S with elements S1 … S12 along the time axis t; a window W(ω,β) with width ω and slide β selects the elements passed to the R2R operator]
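A minimal sketch of how the windows of W(ω,β) could be enumerated, with `width` standing for ω and `slide` for β. The half-open intervals [opening, opening + width), the `start` and `until` parameters and both function names are assumptions made for this illustration, not definitions taken from the slides.

```python
from typing import List, Sequence, Tuple


def window_bounds(width: int, slide: int, start: int, until: int) -> List[Tuple[int, int]]:
    """List the (opening, closing) bounds of the windows that close no later than `until`."""
    if slide <= 0:
        raise ValueError("slide must be positive")
    bounds = []
    opening = start
    while opening + width <= until:
        bounds.append((opening, opening + width))
        opening += slide  # the next window opens `slide` time units later
    return bounds


def window_content(elements: Sequence[Tuple[str, int]], bounds: Tuple[int, int]) -> List[str]:
    """Names of the (name, timestamp) elements with opening <= timestamp < closing."""
    opening, closing = bounds
    return [name for name, ts in elements if opening <= ts < closing]


sliding = window_bounds(width=5, slide=2, start=0, until=12)
tumbling = window_bounds(width=5, slide=5, start=0, until=12)
```

When the slide equals the width the windows do not overlap (a tumbling window); with a smaller slide, consecutive windows share part of their content.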

Page 7: An experience on empirical research about rdf stream

Implementations (oversimplified!)

C-SPARQL – RDF Store + Stream processor

[Diagram: continuous query → translator → RDF Store + Stream processor → continuous results]

CQELS – Implemented from scratch. Focus on performance

[Diagram: continuous query → Native RSP → continuous results]

SPARQLstream – Ontology-based stream query answering

[Diagram: continuous query → rewriter (with R2RML mappings) → DSMS/CEP → continuous results]

Page 8: An experience on empirical research about rdf stream

Same inputs, different outputs…

Let’s consider the following stream:

[Figure: stream S with elements S1 to S4 on the time axis t (ticks at 3, 6, 9, …): :alice :isIn :hall, :bob :isIn :hall, :alice :isIn :kitchen, :bob :isIn :kitchen]

And the continuous query:

– Where are Alice and Bob, when they are together?

– With a tumbling window W(ω=β=5), where ω is the width and β the slide

After 4 executions:

Execution   1st answer   2nd answer
1           :hall [6]    :kitchen [11]
2           :hall [5]    :kitchen [10]
3           :hall [6]    :kitchen [11]
4           - [7]        - [12]

Page 9: An experience on empirical research about rdf stream

The first hypothesis

All three systems show similar behaviours

Intuition: there are one or more parameters that are not taken into account by the model

As a consequence, the implementations can output different correct answers

Page 10: An experience on empirical research about rdf stream

The first hypothesis

HP1: it is possible to have a unique correct answer if we can control the time instant at which the sliding window operator starts to work (t0)

[Figure: the stream S (S1 to S4: :alice :isIn :hall, :bob :isIn :hall, :alice :isIn :kitchen, :bob :isIn :kitchen) on the time axis t, with the tumbling windows drawn for t0=0, t0=1 and t0=2]
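The effect of t0 can be illustrated with a small batch evaluation of the tumbling window W(ω=β=5). This is a sketch under assumptions: the exact timestamps of the four stream elements are not readable from the transcript, so the values 1, 3, 6 and 9 below are hypothetical, chosen so that the output agrees with the answers reported in this talk. The function name `answers`, the half-open window intervals and the "report when the window closes" behaviour are assumptions as well.

```python
from typing import Dict, List, Optional, Set, Tuple

# (timestamp, person, room); the timestamps are assumed, see the lead-in.
STREAM = [
    (1, ":alice", ":hall"),
    (3, ":bob", ":hall"),
    (6, ":alice", ":kitchen"),
    (9, ":bob", ":kitchen"),
]


def answers(stream, t0: int, width: int = 5, windows: int = 2) -> List[Tuple[Optional[str], int]]:
    """Evaluate "where are Alice and Bob together?" on consecutive tumbling windows.

    Returns one (room or None, closing time) pair per window, starting at t0.
    If several rooms qualified, only the first in sorted order would be returned.
    """
    result = []
    for i in range(windows):
        opening = t0 + i * width
        closing = opening + width
        people_by_room: Dict[str, Set[str]] = {}
        for ts, person, room in stream:
            if opening <= ts < closing:
                people_by_room.setdefault(room, set()).add(person)
        together = sorted(
            room for room, people in people_by_room.items()
            if {":alice", ":bob"} <= people
        )
        result.append((together[0] if together else None, closing))
    return result


for t0 in (0, 1, 2):
    print(t0, answers(STREAM, t0))
```

With the assumed timestamps, t0=0 yields :hall at 5 and :kitchen at 10, t0=1 yields :hall at 6 and :kitchen at 11, and t0=2 yields no room at 7 and 12, because each window then separates Alice from Bob.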

Page 11: An experience on empirical research about rdf stream

The experiment

We work on the difference between the time instant at which the stream starts (ts) and the query registration time (tq)

– At each execution, we check the result

– We estimated the delay between tq and t0

Black box approach

– we work on inputs/outputs

– the source code of all the systems

[Figure: timeline showing ts, tq and t0 for the RSP]

Page 12: An experience on empirical research about rdf stream

Observation and explanation

As a result, for each system:

– We identified the value of the t0 parameter

– We are able to produce the different results for each t0 value

Is it enough to claim that hypothesis 1 holds?

Exec     1st answer   2nd answer
1        :hall [6]    :kitchen [11]
2        :hall [5]    :kitchen [10]
3        :hall [6]    :kitchen [11]
4        - [7]        - [12]

Window   1st answer   2nd answer
t0=0     :hall [5]    :kitchen [10]
t0=1     :hall [6]    :kitchen [11]
t0=2     - [7]        - [12]
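Matching each observed execution against the per-t0 results can be sketched as a simple lookup. The data below is copied from the two tables above ("-" is encoded as `None`); the function name `explain` and the dictionary layout are hypothetical choices for this illustration.

```python
# Answers predicted for each window start, as (room or None, time) pairs.
EXPECTED = {
    0: [(":hall", 5), (":kitchen", 10)],
    1: [(":hall", 6), (":kitchen", 11)],
    2: [(None, 7), (None, 12)],
}

# Answers observed in the four executions.
OBSERVED = {
    1: [(":hall", 6), (":kitchen", 11)],
    2: [(":hall", 5), (":kitchen", 10)],
    3: [(":hall", 6), (":kitchen", 11)],
    4: [(None, 7), (None, 12)],
}


def explain(observed, expected):
    """For each execution, list the t0 values whose predicted answers equal the observed ones."""
    return {
        execution: [t0 for t0, predicted in expected.items() if predicted == result]
        for execution, result in observed.items()
    }


explanation = explain(OBSERVED, EXPECTED)
```

Every execution is explained by exactly one t0 value here; an execution with an empty list would be one that t0 alone cannot account for.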

Page 13: An experience on empirical research about rdf stream

Some considerations on the experiment

Comparison:

– We ran the experiment multiple times to collect instances and check them

Reproducibility: can other researchers reproduce the experiment?

– We released both the code and the data used for the experiment (see http://streamreasoning.org/Benchmarks/)

Repeatability: is the result universally valid?

– We changed inputs (streams and queries) and OS/JVM to verify if the hypothesis holds

– We repeated the experiment with different implementations (C-SPARQL, CQELS, etc.)

Page 14: An experience on empirical research about rdf stream

Something more on repeatability…

We made some assumptions on the setting

[Figure: the assumed setting (a single S2R window over a single stream, followed by R2R and R2S) and its possible extensions: from single to multi window, from single to multi stream, reasoning, static knowledge, multiple queries (q2)]

Page 15: An experience on empirical research about rdf stream

A more complex problem…

As a “side effect” of the first experiment, we discovered that the results of different systems are not the same:

C-SPARQL

Exec   1st answer   2nd answer
1      :hall [6]    :kitchen [11]
2      :hall [5]    :kitchen [10]
3      :hall [6]    :kitchen [11]
4      - [7]        - [12]

CQELS

Exec   1st answer   2nd answer
1      :hall [3]    :kitchen [9]
2      No answers
3      :hall [3]    :kitchen [9]
4      No answers

Intuition: t0 is not the only parameter our model lacks

Page 16: An experience on empirical research about rdf stream

The SECRET framework

[Figure: stream S with elements S1 … S12 along the time axis t; a window W(ω,β) with width ω and slide β feeds the R2R operator]

t0: When does the window start? (internal window param)

TICK: When are data stream elements added to the window? Triple-based vs graph-based

REPORT: When is the window content made available to the R2R operator? Non-empty content, Content-change, Window-close, Periodic
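The four REPORT options can be read as conditions that must all hold before the window content is handed to the R2R operator. The sketch below is a loose illustration of that reading, not the formal SECRET definitions: the class `ReportPolicy`, the function `should_report` and its parameters are hypothetical, and treating the enabled options as a conjunction is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ReportPolicy:
    window_close: bool = False    # report only at the instant the window closes
    non_empty: bool = False       # report only if the window has some content
    content_change: bool = False  # report only if the content differs from the last report
    period: int = 0               # report only every `period` time units (0 disables)


def should_report(
    policy: ReportPolicy,
    now: int,
    closes_at: int,
    content: FrozenSet[str],
    last_reported: Optional[FrozenSet[str]],
) -> bool:
    """Return True when every option enabled in `policy` is satisfied at time `now`."""
    if policy.window_close and now != closes_at:
        return False
    if policy.non_empty and not content:
        return False
    if policy.content_change and content == last_reported:
        return False
    if policy.period and now % policy.period != 0:
        return False
    return True


close_and_non_empty = ReportPolicy(window_close=True, non_empty=True)
on_change = ReportPolicy(content_change=True)
```

Two systems that share the same window definition but use different policies, for example `close_and_non_empty` and `on_change`, would report at different instants on the same input.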

Page 17: An experience on empirical research about rdf stream

SECRET and RSPs

HP2: given an input stream, a query, the value of t0 and a description of the RSP w.r.t. SECRET, we can determine the answer that will be provided by the system

To investigate it, we built software that evaluates the answer in batch and matches it against the one produced by the RSP
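The batch-evaluate-then-match idea can be sketched as follows. This is not the software built for the experiment: the function names are hypothetical, the stream timestamps (1, 3, 6, 9) are assumed, and the choice of a content-change report with a window covering the last `width` time units is an assumption made only to show how a different SECRET description leads to different predicted answers. The observed answers are the ones listed for CQELS earlier in this talk.

```python
# (timestamp, person, room); timestamps are assumed, unique and in increasing order.
STREAM = [
    (1, ":alice", ":hall"),
    (3, ":bob", ":hall"),
    (6, ":alice", ":kitchen"),
    (9, ":bob", ":kitchen"),
]

# Answers observed in the four CQELS executions ("No answers" is an empty list).
OBSERVED = {
    1: [(":hall", 3), (":kitchen", 9)],
    2: [],
    3: [(":hall", 3), (":kitchen", 9)],
    4: [],
}


def together(content):
    """Rooms in which both :alice and :bob appear in the given window content."""
    people_by_room = {}
    for _, person, room in content:
        people_by_room.setdefault(room, set()).add(person)
    return sorted(r for r, people in people_by_room.items() if {":alice", ":bob"} <= people)


def predict_on_content_change(stream, width):
    """Re-evaluate at every arrival; report non-empty answers that differ from the previous one."""
    reported, last = [], None
    for now, _, _ in stream:
        content = [element for element in stream if now - width < element[0] <= now]
        answer = together(content)
        if answer != last:
            if answer:
                reported.append((answer[0], now))
            last = answer
    return reported


def match(predicted, observed):
    """Compare the predicted answers with those of each execution."""
    return {execution: result == predicted for execution, result in observed.items()}


predicted = predict_on_content_change(STREAM, width=5)
verdict = match(predicted, OBSERVED)
```

Under these assumptions executions 1 and 3 match the prediction, while executions 2 and 4 do not and would be the ones to investigate further.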

Page 18: An experience on empirical research about rdf stream

Observation and analysis

We prepared a set of seven queries (to stress different parts of the sliding window)

We ran each query multiple times

Most of the time, we can foresee the answer that will be provided

[Table: queries Q1 to Q7 (rows) against CQELS, C-SPARQL and SPARQLstream (columns)]

Page 19: An experience on empirical research about rdf stream

Observation and analysis

We investigated the observations where there is no match, and discovered that they were due to errors in the implementations, such as:

– Initialization

– Slide parameter

– Window contents

– Internal timestamp management

Conclusion: HP2 seems to be valid in the considered setting

Page 20: An experience on empirical research about rdf stream

CSR-bench

The main outcome of our experience is CSR-bench, an extension of the SRBench benchmark

– More info at http://www.w3.org/wiki/CSRBench

Three main components:

– A common model for the RDF stream processor operational semantics

– An oracle (an automatic correctness validator), available at https://github.com/dellaglio/csrbench-oracle

– A test suite

Page 21: An experience on empirical research about rdf stream

References

Daniele Dell'Aglio, Marco Balduini, Emanuele Della Valle. On the need to include functional testing in RDF stream engine benchmarks. 1st International Workshop on Benchmarking RDF Systems (BeRSys2013)

Daniele Dell'Aglio, Jean-Paul Calbimonte, Marco Balduini, Óscar Corcho, Emanuele Della Valle: On Correctness in RDF Stream Processor Benchmarking. International Semantic Web Conference (2) 2013: 326-342

Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: A continuous query language for RDF data streams. IJSC 4(1) (2010) 3–25

Calbimonte, J.P., Jeung, H., Corcho, O., Aberer, K.: Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 8(1) (2012) 43–63

Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., Hauswirth, M.: A native and adaptive approach for unified processing of linked streams and linked data. In: ISWC. (2011) 370–388

Page 22: An experience on empirical research about rdf stream

Thank you! Questions?

An Experience on Empirical Research about RDF Stream Processing

Daniele Dell’Aglio

(DEIB, Politecnico di Milano)

[email protected]
