On web stream processing

Department of Informatics On web stream processing Daniele Dell’Aglio [email protected] http://dellaglio.org @dandellaglio Linköping, 22.11.2017

Transcript of On web stream processing

Page 1: On web stream processing

Department of Informatics

On web stream processing

Daniele Dell’Aglio

[email protected] http://dellaglio.org @dandellaglio

Linköping, 22.11.2017

Page 2: On web stream processing

RDF Stream Processing

Stream Processing: real-time processing of highly dynamic data

RDF & SPARQL: Semantic Web technologies for data exchange through the Web

Combined: RDF Stream Processing (RSP)

Page 3: On web stream processing

Finding agreements

Many topics

– RDF streams

– Stream reasoning

– Complex event processing

– Stream query processing

– Internet/web of things

Many studies

– Data models

– Query models

– Prototypes

– Benchmarks

– Datasets

W3C RSP community group (2013 – 2016)

– Effort to (discuss | formalise | standardise | combine | evangelise) the existing studies on RSP

– Outcomes

– Abstract model for RDF streams

– Requirements document for query languages of RDF streams

– More at: https://www.w3.org/community/rsp/

Page 4: On web stream processing

But...

W3C RSP sets some foundations and requirements, but:

– Standard protocols and exchange mechanisms for RDF streams are still missing

– We need generic and flexible solutions for making RDF streams available and exchangeable on the Web

Page 5: On web stream processing

The goal: a decentralized web of RSPs

[Figure: RSP engines spread over the web: MorphStreams, C-SPARQL, TrOWL, StreamRule, CQELS, Instans]

Q1: How can we let RSP engines interact and exchange streams on the web?

Page 6: On web stream processing

The goal: a decentralized web of RSPs in the web

[Figure: RSP engines (MorphStreams, C-SPARQL, StreamRule, Instans, CQELS, TrOWL) together with remote SPARQL endpoints]

Q2: How to integrate stream processing with background knowledge exposed remotely on the web?

Page 7: On web stream processing

EXCHANGING STREAMS ON THE WEB

Page 8: On web stream processing

How far are we?

Documents from RSP

– Abstract model of RDF streams

– Requirements for query languages for RDF streams

Protocols to exchange data streams on the web and internet

– WebSocket, MQTT

Description of the stream

– SSN

Interfaces to control RSP engines

Page 9: On web stream processing

Requirements

A framework for RDF stream exchange should

1. prioritize active paradigms for data stream exchange

2. enable the combination of streaming and stored data

3. enable the possibility to build reliable, distributed and scalable streaming applications

4. guarantee a wide range of operations over the streams

5. support the publication of information about the stream

6. support the exchange of a wide variety of streams

7. exploit as much as possible existing protocols and standards

Page 10: On web stream processing

WeSP

A framework to publish and exchange RDF streams on the web

• A model to serialise RDF streams

• A model to describe RDF streams

• A communication protocol

Page 11: On web stream processing

A model to serialise RDF streams

An RDF stream can be represented as an (infinite) ordered sequence of time-annotated data items (RDF graphs)…

... serialized in JSON-LD

[
  {
    "@id": "http://.../G1",
    "prov:generatedAt": "2016-16-12T00:01:00",
    "@graph": [
      { "@id": "http://.../a", "http://.../isIn": {"@id": "http://.../rRoom"} }
    ]
  },
  {
    "@id": "http://.../G2",
    "prov:generatedAt": "2016-16-12T00:03:00",
    "@graph": [
      { "@id": "http://.../b", "http://.../isIn": {"@id": "http://.../bRoom"} }
    ]
  },
  …
]

Compliant with RDF, as well as W3C RSP abstract data model

[Figure: stream S on a time axis t, with G1 = {:a :isIn :rRoom} at t = 1, G2 = {:b :isIn :bRoom} at t = 3, and G3 = {:c :talksIn :rRoom, :d :talksIn :bRoom} at t = 5]
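As a rough illustration of the serialisation model, the following Python sketch builds two time-annotated items as JSON-LD named graphs and serialises them with the standard `json` module. The helper `make_stream_item`, the `http://example.org/` namespace and the timestamps are assumptions made for this example; only the `@graph` structure and the `prov:generatedAt` key come from the slide.

```python
import json

EX = "http://example.org/"  # placeholder namespace, not from the slides


def make_stream_item(graph_id, triples, generated_at):
    """Build one time-annotated stream item as a JSON-LD named graph.

    triples is a list of (subject, predicate, object) IRI strings.
    """
    nodes = {}
    for subject, predicate, obj in triples:
        # Group all statements about the same subject into one node object.
        node = nodes.setdefault(subject, {"@id": subject})
        node.setdefault(predicate, []).append({"@id": obj})
    return {
        "@id": graph_id,
        "prov:generatedAt": generated_at,  # key name taken from the slide
        "@graph": list(nodes.values()),
    }


# Two items mirroring G1 and G2; the timestamps are made-up values.
stream = [
    make_stream_item(EX + "G1", [(EX + "a", EX + "isIn", EX + "rRoom")],
                     "2017-11-22T00:01:00"),
    make_stream_item(EX + "G2", [(EX + "b", EX + "isIn", EX + "bRoom")],
                     "2017-11-22T00:03:00"),
]
serialized = json.dumps(stream, indent=2)
print(serialized)
```

A real deployment would presumably also declare an `@context` so that the `prov:` prefix expands to a full IRI; that part is left out of this sketch.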

Page 12: On web stream processing

A model to describe RDF streams

A description of the RDF stream should be provided

• The identifier of the stream

• A description of the schema of the stream items

• Data item samples

• The location of the stream endpoint (e.g. WebSocket URL)

This description is provided through the RDF Stream Descriptor

• Serialised in RDF

• An extension of DCAT and SPARQL Service Descriptor

• Published according to the linked data principles

Page 13: On web stream processing

A communication protocol

Two interfaces

• Producer

• Consumer

We distinguish three types of actors (depending on the implemented interfaces)

• Stream source: implements Producer

• Stream transformer: implements Consumer and Producer

• Stream sink: implements Consumer

Page 14: On web stream processing

A communication protocol: push-based streams

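Actors: Consumer and Producer (the Producer exposes a Stream Descriptor endpoint and an RDF stream endpoint)

1. Consumer → Stream Descriptor endpoint: Get stream descriptor (SD)

2. Stream Descriptor endpoint → Consumer: SD

3. Consumer: Process SD

4. Consumer → RDF stream endpoint: Subscribe to stream

5. RDF stream endpoint → Consumer: Stream item, Stream item, Stream item…

6. Consumer: Process stream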
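The push-based exchange can be mimicked with a small in-memory simulation. This is only a sketch: the classes `StreamDescriptor`, `Producer` and `Consumer`, their method names and the example URLs are hypothetical and do not correspond to the WeSP implementations; a real producer would expose the descriptor over HTTP and the stream over, for example, WebSocket.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StreamDescriptor:
    stream_id: str  # identifier of the stream
    endpoint: str   # location of the stream endpoint


class Producer:
    def __init__(self, descriptor: StreamDescriptor) -> None:
        self.descriptor = descriptor
        self._subscribers: List[Callable[[dict], None]] = []

    def get_descriptor(self) -> StreamDescriptor:
        # Stands in for "Get stream descriptor (SD)".
        return self.descriptor

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        # Stands in for "Subscribe to stream".
        self._subscribers.append(callback)

    def publish(self, item: dict) -> None:
        # Push every new stream item to all subscribers.
        for callback in self._subscribers:
            callback(item)


class Consumer:
    def __init__(self) -> None:
        self.endpoint: Optional[str] = None
        self.received: List[dict] = []

    def connect(self, producer: Producer) -> None:
        sd = producer.get_descriptor()            # get the SD
        self.endpoint = sd.endpoint               # process the SD
        producer.subscribe(self.received.append)  # subscribe to the stream


producer = Producer(StreamDescriptor("http://example.org/stream",
                                     "ws://example.org/stream"))
consumer = Consumer()
consumer.connect(producer)
for graph_id in ("G1", "G2", "G3"):
    producer.publish({"@id": graph_id})
```

After the loop, `consumer.received` holds the three items in publication order, without the consumer ever asking for them.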

Page 15: On web stream processing

A communication protocol: pull-based streams

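Actors: Consumer and Producer (the Producer exposes a Stream Descriptor endpoint and an RDF stream endpoint)

1. Consumer → Stream Descriptor endpoint: Get stream descriptor (SD)

2. Stream Descriptor endpoint → Consumer: SD

3. Consumer: Process SD

4. Consumer → RDF stream endpoint: GET items

5. RDF stream endpoint → Consumer: Stream items

6. Consumer: Process stream

7. Steps 4 and 5 repeat (GET items, Stream items) for as long as the consumer keeps polling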
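A pull-based consumer repeatedly asks for items instead of waiting for them. The sketch below simulates this in memory; the `PullProducer` class and the offset cursor it returns are assumptions for illustration, since the slides only show repeated "GET items" requests and do not say how a consumer marks its position in the stream.

```python
from typing import List, Tuple


class PullProducer:
    def __init__(self) -> None:
        self._items: List[dict] = []

    def append(self, item: dict) -> None:
        self._items.append(item)

    def get_items(self, offset: int = 0) -> Tuple[List[dict], int]:
        """Return the items published since offset, plus the next offset."""
        batch = self._items[offset:]
        return batch, offset + len(batch)


producer = PullProducer()
processed: List[str] = []
offset = 0

producer.append({"@id": "G1"})
producer.append({"@id": "G2"})
batch, offset = producer.get_items(offset)  # first GET items
processed.extend(item["@id"] for item in batch)

producer.append({"@id": "G3"})
batch, offset = producer.get_items(offset)  # second GET items
processed.extend(item["@id"] for item in batch)

batch, offset = producer.get_items(offset)  # third GET items: nothing new
```

The last request returns an empty batch, which is the cost of polling when the stream is quiet.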

Page 16: On web stream processing

Protocols

The RDF Stream Descriptor is accessible through HTTP

The transmission of the stream can happen through different protocols

• HTTP chunked encoding

• WebSocket

• Message Queuing Telemetry Transport (MQTT)

• Server-Sent Events (SSE)

• HTTP

• ...

Page 17: On web stream processing

WeSP: Proofs of concept

C-SPARQL

• Stream transformer

• WeSP implemented as a wrapper

• https://github.com/dellaglio/csparql-wesp

CQELS

• Stream transformer

• Native implementation of WeSP

• https://github.com/cqels/CQELS-1.x/

TripleWave

• Stream source

• Native implementation of WeSP

• http://streamreasoning.github.io/TripleWave

Page 18: On web stream processing

TripleWave

TripleWave is open source

• Learn more at: https://streamreasoning.github.io/TripleWave/

[Figure: input? → TripleWave → RDF Streams (WebSocket | HTTP-chunk | etc.) and Stream Descriptor]

Page 19: On web stream processing

Feeding TripleWave

TripleWave supports a variety of data sources:

• RDF dumps with temporal information

• RDF with temporal information exposed through SPARQL endpoints

• Streams available on the Web

[Figure: TripleWave architecture. Labels: Web API, Web Service, SPARQL Endpoint, File, R2RML Mapping, Connector stream, Transform stream, Datagen stream, Scheduler stream, Graph stream. Modes: Conversion, Replay, Replay loop.]

Page 20: On web stream processing

Summary

WeSP: framework to exchange RDF streams on the web

– RDF to serialise the stream items

– RDF to describe the stream

– Application and communication protocols: HTTP, WebSocket, MQTT, etc.

– Interfaces to produce and consume RDF streams

What’s next?

– Relation with other technologies: LDN, Activity Streams, etc.

– Adoption

– Federated stream processing over the Web

Page 21: On web stream processing

COMBINING STREAMS AND BACKGROUND DATA

Page 22: On web stream processing

The goal: a decentralized web of RSPs in the web

[Figure: RSP engines (MorphStreams, C-SPARQL, StreamRule, Instans, CQELS, TrOWL) together with remote SPARQL endpoints]

Q2: How to integrate stream processing with background knowledge exposed remotely on the web?

Page 23: On web stream processing

Time-based sliding window

[Figure: stream S with items S1 to S12 on a time axis t; a window W(ω,β) of width ω advances by slide β, and each window position triggers an evaluation]
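A time-based sliding window W(ω, β) can be sketched in a few lines of Python. The function name `sliding_windows`, the half-open interval convention [open, close) and the choice of starting the first window at time 0 are assumptions of this example, not definitions taken from the slides.

```python
def sliding_windows(items, width, slide, start=0):
    """Yield (open, close, contents) for each window over a finite stream.

    items is a list of (timestamp, payload) pairs sorted by timestamp.
    Each window covers the half-open interval [open, close) with
    close = open + width, and consecutive windows are slide apart.
    """
    if not items:
        return
    last_timestamp = items[-1][0]
    open_t = start
    while open_t <= last_timestamp:
        close_t = open_t + width
        contents = [payload for t, payload in items if open_t <= t < close_t]
        yield open_t, close_t, contents
        open_t += slide


# Items at t = 1, 3, 5, with width ω = 4 and slide β = 2.
stream_items = [(1, "G1"), (3, "G2"), (5, "G3")]
windows = list(sliding_windows(stream_items, width=4, slide=2))
for open_t, close_t, contents in windows:
    print(f"[{open_t}, {close_t}): {contents}")
```

Because the slide is smaller than the width, consecutive windows overlap and the same item can take part in more than one evaluation.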

Page 24: On web stream processing

The setting

[Figure: a Window over the output of the RDF stream generator is joined (Join) with the Background data (SPARQL endpoint)]

Background data changes and is stored on the web

Accessing background data is costly

Is it possible to avoid continuous access to the background data?

Page 25: On web stream processing

Local view

How to cope with changes on the background data?

[Figure: a Window over the output of the RDF stream generator is joined (Join) with a Local view of the Background data (SPARQL endpoint)]

Page 26: On web stream processing

Maintenance process

Maintenance introduces a trade-off between response quality and time.

We propose to manage this trade-off by fixing the time dimension according to the query constraints and maximizing the freshness of the response.

[Figure: a Window over the output of the RDF stream generator is joined (Join) with a Local View of the Background data (SPARQL endpoint); a Maintenance process keeps the Local View up to date]

Page 27: On web stream processing

How to track background data changes?

Update streams

• stream with changes available to the query processor

• rarely available on the Web, e.g. Wikipedia, SPARQLPush

Data changes regularly

• data generated by automatic processes that refresh it periodically

• data warehouses, sensors

Data changes “randomly”

• Twitter user profiles, taxi status, financial updates

Page 28: On web stream processing

Requirements

The maintenance process:

1. should take into account the change rates of the data elements in the background data;

2. should consider the dynamicity of the change rate values;

3. should satisfy the Quality of Service constraints on responsiveness and freshness of the answer;

4. may consider the query and its definition.

Page 29: On web stream processing

A query-driven maintenance process

Query: WINDOW(S, ω, β) P_W JOIN SERVICE(BKG) P_S

[Figure: maintenance pipeline with numbered steps 1 to 4. Components: WINDOW clause, SERVICE clause, JOIN (producing Ω_join), Local View, Proposer (WSJ), Ranker (RND, LRU, WBM, SBM, IBM), Maintainer; the sets C and E are passed between them]

Page 30: On web stream processing

Terminology

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4. Legend: mappings from the WINDOW clause, mappings in the LOCAL VIEW, compatible mappings]

Best Before Time: the time at which an element will become stale; it is defined by:

Page 31: On web stream processing

WSJ

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4]

WSJ identifies the candidate set: the possibly stale local view mappings involved in the current evaluation.

WSJ analyzes the content of the current window evaluation and identifies the compatible mappings in the local view.

The possibly stale mappings are identified by analyzing the associated best before time
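The candidate selection can be illustrated with a short Python sketch. Several simplifications here are assumptions, not part of the slides: compatibility is reduced to equality on a single join variable, a mapping counts as possibly stale when its best before time is not later than the current time, and the function name `wsj_candidates` and the data layout are hypothetical.

```python
def wsj_candidates(window_mappings, local_view, now, join_var):
    """Return the local view entries that are compatible and possibly stale.

    window_mappings: list of dicts (variable -> value) from the window.
    local_view: list of dicts with a 'mapping' dict and a 'best_before' time.
    """
    # Values of the join variable seen in the current window evaluation.
    window_values = {m[join_var] for m in window_mappings if join_var in m}
    return [
        entry for entry in local_view
        if entry["mapping"].get(join_var) in window_values  # compatible
        and entry["best_before"] <= now                     # possibly stale
    ]


window_mappings = [{"user": "a"}, {"user": "b"}]
local_view = [
    {"mapping": {"user": "a", "followers": 10}, "best_before": 5},
    {"mapping": {"user": "b", "followers": 20}, "best_before": 9},
    {"mapping": {"user": "c", "followers": 30}, "best_before": 3},
]
candidates = wsj_candidates(window_mappings, local_view, now=7, join_var="user")
```

In this toy run only the entry for user "a" is a candidate: "b" is still fresh at time 7, and "c" is stale but does not join with the current window.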

Page 32: On web stream processing

WBM

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4, next to an empty table with columns V, L and Score]

WBM ranks the candidate set to determine which mappings to update.

The ranking is computed through two values: the renewed best before time and the remaining life time

The top k elements are selected to be refreshed. The value k is selected according to the responsiveness constraint.

Page 33: On web stream processing

WBM: renewed best before time

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4]

V    L    Score
3
4
1

When would the mappings become stale if refreshed now?

The renewed best before time V is computed as:

Page 34: On web stream processing

WBM: remaining life time and score

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4]

V    L    Score
3    3
4    1
1    3

In how many future evaluations is the mapping involved?

The remaining life time L is computed as:

WBM ranks the mappings by using a score:

Score=min(L,V)

The mapping with the highest score is selected for the maintenance
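The scoring step can be worked through in Python using the V and L values shown in the table on this slide. The mapping names `m1` to `m3`, the function `wbm_select`, the descending sort and the tie-breaking by input order are assumptions of this sketch; the slides give only Score = min(L, V) and state that the top k elements are refreshed.

```python
def wbm_select(candidates, k):
    """Rank candidates by Score = min(L, V) and return the top k names.

    candidates: list of (name, V, L) tuples, where V is the renewed best
    before time and L is the remaining life time.
    """
    scored = [(name, min(L, V)) for name, V, L in candidates]
    # Higher score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# (name, V, L) rows using the values from the slide's table.
candidates = [("m1", 3, 3), ("m2", 4, 1), ("m3", 1, 3)]
scores = {name: min(L, V) for name, V, L in candidates}
selected = wbm_select(candidates, k=1)
```

With these values the scores are 3, 1 and 1, so a budget of k = 1 refreshes only the first mapping.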

Page 35: On web stream processing

Experiments

Page 36: On web stream processing

Extensions: SBM

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4]

It exploits the fact that mappings may have n-n relations

• Each pair generates a join (e.g. BGP)

If is refreshed, there will be four fresh mappings

If is refreshed, there will be five fresh mappings

is selected for the maintenance

Page 37: On web stream processing

Extensions: SBM

[Figure: timeline t from 4 to 12 with the current time τ and windows W1 to W4]

It exploits the fact that mappings may have n-n relations

• A result is fresh if all the pairs are fresh (e.g. aggregations)

If is refreshed, there will be one fresh mapping

If is refreshed, there will be two fresh mappings

is selected for the maintenance


Page 38: On web stream processing

Other extensions

We developed other rankers:

IBM: combines WBM and SBM, taking into account the number of join mappings produced in both the present and future windows

FBA: dynamic allocations of the refresh operations among different evaluations

F rankers: extensions of the presented rankers to cope with queries with FILTER clauses on the subquery over the background data

Page 39: On web stream processing

Summary

We proposed using materialization to optimize the processing of continuous queries.

We proposed a policy that maximizes freshness under the time constraints of a continuous query.

We tested our policy against baseline policies (LRU and Random).

Future Work:

– Measuring the time overhead of maintenance

– Investigating more queries involving both remote SPARQL endpoints and streams.

– Dynamically estimating the change rate of users.

Page 40: On web stream processing

Acknowledgments

Page 41: On web stream processing

Conclusions

RDF (or semantic) streams are gaining momentum

• Several active research groups, working on querying and reasoning

• Prototypes, methods and applications

• Query languages, ontologies

• Use cases

However, the web dimension has only been slightly considered

Page 42: On web stream processing

What’s next?

We still need

• Infrastructures and standards to exchange (RDF) streams on the Web

• Agreements on languages to specify tasks over such streams

• Query languages richer than SPARQL not only to manage streams, but also to express higher-level operations

• Methods to manage reasoning tasks over streams

The Web dimension needs to be studied and understood

• Combination of remote streams and background data requires new solutions

• Not only queries, but also constraints over them (QoS)

Page 43: On web stream processing

Thank you! Questions?

On web stream processing

Daniele Dell’Aglio

[email protected]

http://dellaglio.org

@dandellaglio

Page 44: On web stream processing

Find more: Q1

• A. Mauri, J.-P. Calbimonte, D. Dell’Aglio, M. Balduini, E. Della Valle, K. Aberer: Where Are the RDF Streams?: On Deploying RDF Streams on the Web of Data with TripleWave. Poster at International Semantic Web Conference 2015.

• A. Mauri, J.-P. Calbimonte, D. Dell’Aglio, M. Balduini, M. Brambilla, E. Della Valle, K. Aberer: TripleWave: Spreading RDF Streams on the Web. Resource Paper at International Semantic Web Conference 2016.

• D. Dell'Aglio, D. Le Phuoc, A. Lê Tuấn, M. Intizar Ali, J.-P. Calbimonte: On a Web of Data Streams. DeSemWeb@ISWC 2017

Page 45: On web stream processing

Find more: Q2

• S. Dehghanzadeh, A. Mileo, D. Dell'Aglio, E. Della Valle, Shen Gao, A. Bernstein: Online View Maintenance for Continuous Query Evaluation. WWW (Companion Volume) 2015: 25-26

• S. Dehghanzadeh, D. Dell'Aglio, S. Gao, E. Della Valle, A. Mileo, A. Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325

• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data. ICWE 2016: 299-316

• S. Gao, D. Dell'Aglio, S. Dehghanzadeh, A. Bernstein, E. Della Valle, A. Mileo: Planning Ahead: Stream-Driven Linked-Data Access Under Update-Budget Constraints. International Semantic Web Conference (1) 2016: 252-270

• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data. DEBS 2017: 170-179
