On web stream processing


Department of Informatics


Daniele Dell’Aglio

dellaglio@ifi.uzh.ch http://dellaglio.org @dandellaglio

Linköping, 22.11.2017

RDF Stream Processing

Stream processing: real-time processing of highly dynamic data

RDF & SPARQL: Semantic Web technologies for data exchange through the Web

Stream processing + RDF & SPARQL = RDF Stream Processing (RSP)

Linköping, 22.11.2017 On web stream processing 2

Finding agreements

Many topics

– RDF streams

– Stream reasoning

– Complex event processing

– Stream query processing

– Internet/web of things

Many studies

– Data models

– Query models

– Prototypes

– Benchmarks

– Datasets

W3C RSP community group (2013 – 2016)

– Effort to (discuss | formalise | standardise | combine | evangelise) the existing studies on RSP

– Outcomes

– Abstract model for RDF streams

– Requirements document for query languages of RDF streams

– More at: https://www.w3.org/community/rsp/


But...

W3C RSP sets some foundations and requirements, but:

– Standard protocols and exchange mechanisms for RDF streams are still missing

– We need generic and flexible solutions for making RDF streams available and exchangeable on the Web


The goal: a decentralized web of RSPs

Figure: a network of RSP engines (MorphStreams, C-SPARQL, TrOWL, StreamRule, CQELS, Instans) exchanging streams with each other.

Q1: How can we let RSP engines interact and exchange streams on the web?


The goal: a decentralized web of RSPs in the web

Figure: the same network of RSP engines (MorphStreams, C-SPARQL, StreamRule, CQELS, TrOWL, Instans), now also connected to remote SPARQL endpoints.

Q2: How to integrate stream processing with background knowledge exposed remotely on the web?


EXCHANGING STREAMS ON THE WEB


How far are we?

Documents from the W3C RSP community group

– Abstract model of RDF streams

– Requirements for query languages for RDF streams

Protocols to exchange data streams on the web and internet

– WebSocket, MQTT

Description of the stream

– SSN

Interfaces to control RSP engines


Requirements

A framework for RDF stream exchange should

1. prioritize active paradigms for data stream exchange

2. enable the combination of streaming and stored data

3. enable building reliable, distributed and scalable streaming applications

4. guarantee a wide range of operations over the streams

5. support the publication of information about the stream

6. support the exchange of a wide variety of streams

7. exploit as much as possible existing protocols and standards


WeSP

A framework to publish and exchange RDF streams on the web

• A model to serialise RDF streams

• A model to describe RDF streams

• A communication protocol


A model to serialise RDF streams

An RDF stream can be represented as an (infinite) ordered sequence of time-annotated data items (RDF graphs)…

... serialized in JSON-LD

[
  { "@id": "http://.../G1",
    "@graph": { "@id": "http://.../a", "http://.../isIn": {"@id": "http://.../rRoom"} } },
  { "@id": "http://.../G1", "prov:generatedAt": "2016-16-12T00:01:00" },
  { "@id": "http://.../G2",
    "@graph": { "@id": "http://.../b", "http://.../isIn": {"@id": "http://.../bRoom"} } },
  { "@id": "http://.../G2", "prov:generatedAt": "2016-16-12T00:03:00" },
  …
]

Compliant with RDF, as well as with the W3C RSP abstract data model

Figure: RDF stream S along the time axis t, with G1 = {:a :isIn :rRoom} at t = 1, G2 = {:b :isIn :bRoom} at t = 3, and G3 = {:c :talksIn :rRoom, :d :talksIn :bRoom} at t = 5.
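A minimal Python sketch of the serialisation above, using only the standard library. Each stream item becomes a named graph (a node with `@id` and `@graph`) followed by a node that annotates the graph with its timestamp, mirroring the slide. The `http://example.org/` namespace stands in for the elided `http://.../` IRIs, `serialise_stream` is a hypothetical helper, and the timestamps are illustrative; a complete document would also need an `@context` that maps the `prov:` prefix.

```python
import json
from datetime import datetime, timedelta

# Hypothetical namespace standing in for the elided "http://.../" IRIs.
EX = "http://example.org/"


def serialise_stream(items):
    """Serialise (graph_name, triples, timestamp) items as a JSON-LD array.

    Items are emitted in timestamp order. Each item yields two nodes: the
    named graph (@id + @graph) and a metadata node with its timestamp.
    """
    doc = []
    for name, triples, ts in sorted(items, key=lambda it: it[2]):
        graph_id = EX + name
        nodes = [{"@id": EX + s, EX + p: {"@id": EX + o}} for s, p, o in triples]
        doc.append({"@id": graph_id, "@graph": nodes})
        doc.append({"@id": graph_id, "prov:generatedAt": ts.isoformat()})
    return json.dumps(doc, indent=2)


# Illustrative input: two stream items, deliberately given out of order.
t0 = datetime(2017, 1, 1)
stream = [
    ("G2", [("b", "isIn", "bRoom")], t0 + timedelta(minutes=3)),
    ("G1", [("a", "isIn", "rRoom")], t0 + timedelta(minutes=1)),
]
print(serialise_stream(stream))
```

Sorting by timestamp keeps the serialised array consistent with the ordered-sequence view of an RDF stream, even if items arrive out of order.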


A model to describe RDF streams

A description of the RDF stream should be provided

• The identifier of the stream

• A description of the schema of the stream items

• Data item samples

• The location of the stream endpoint (e.g. WebSocket URL)

This description is provided through the RDF Stream Descriptor

• Serialised in RDF

• An extension of DCAT and of the SPARQL Service Description

• Published according to the linked data principles
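A sketch of what such a descriptor could look like, built as a JSON-LD dictionary in Python. `dcat:Dataset`, `dcat:distribution`, `dcat:accessURL`, `dct:title` and `dct:conformsTo` are existing DCAT and Dublin Core terms; the `ex:sample` property, the IRIs and the WebSocket URL are hypothetical placeholders, since the actual RDF Stream Descriptor vocabulary is not shown in these slides.

```python
import json

# Prefixes for the (real) DCAT and Dublin Core vocabularies; "ex:" is hypothetical.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/vocab#",
}


def make_descriptor(stream_iri, title, schema_iri, samples, endpoint_url):
    """Build a JSON-LD RDF Stream Descriptor (illustrative structure only)."""
    return {
        "@context": CONTEXT,
        "@id": stream_iri,                       # identifier of the stream
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:conformsTo": {"@id": schema_iri},   # schema of the stream items
        "ex:sample": samples,                    # hypothetical property for item samples
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": endpoint_url},  # where the stream is served
        },
    }


sd = make_descriptor(
    "http://example.org/streams/rooms",
    "Room occupancy stream",
    "http://example.org/ontology/rooms",
    [{"@id": "http://example.org/a", "http://example.org/isIn": {"@id": "http://example.org/rRoom"}}],
    "ws://example.org/streams/rooms",
)
print(json.dumps(sd, indent=2))
```

Publishing this document at the stream IRI, dereferenceable over HTTP, would be one way to follow the linked data principles mentioned above.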


A communication protocol

Two interfaces

• Producer

• Consumer

We distinguish three types of actors (depending on the implemented interfaces)

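• Stream source: implements the Producer interface

• Stream transformer: implements both the Producer and the Consumer interfaces

• Stream sink: implements the Consumer interface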
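To make the roles concrete, here is a minimal sketch in Python, assuming the two interfaces can be modelled as abstract classes with an `items()` method (Producer) and a `consume()` method (Consumer). The class and method names are hypothetical, not the WeSP API; the point is only that the actor type follows from which interfaces an object implements.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Iterator, List

Item = dict  # a stream item, e.g. one JSON-LD named graph


class Producer(ABC):
    @abstractmethod
    def items(self) -> Iterator[Item]:
        """Emit stream items."""


class Consumer(ABC):
    @abstractmethod
    def consume(self, item: Item) -> None:
        """Receive one stream item."""


class ListSource(Producer):
    """Stream source: implements Producer only."""
    def __init__(self, data: List[Item]):
        self.data = data

    def items(self) -> Iterator[Item]:
        yield from self.data


class MapTransformer(Producer, Consumer):
    """Stream transformer: consumes items, applies fn, re-emits them."""
    def __init__(self, fn: Callable[[Item], Item]):
        self.fn = fn
        self.buffer = deque()

    def consume(self, item: Item) -> None:
        self.buffer.append(self.fn(item))

    def items(self) -> Iterator[Item]:
        while self.buffer:
            yield self.buffer.popleft()


class CollectSink(Consumer):
    """Stream sink: implements Consumer only."""
    def __init__(self):
        self.received: List[Item] = []

    def consume(self, item: Item) -> None:
        self.received.append(item)


# Wire source -> transformer -> sink.
source = ListSource([{"@id": "G1"}, {"@id": "G2"}])
transformer = MapTransformer(lambda it: {**it, "checked": True})
sink = CollectSink()
for item in source.items():
    transformer.consume(item)
for item in transformer.items():
    sink.consume(item)
print(sink.received)
```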


A communication protocol: push-based streams

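Actors: the Producer, which exposes a Stream Descriptor endpoint and an RDF stream endpoint, and the Consumer.

1. The consumer gets the stream descriptor (SD) from the Stream Descriptor endpoint.

2. The producer returns the SD, and the consumer processes it.

3. The consumer subscribes to the stream at the RDF stream endpoint.

4. The producer pushes stream items as they are generated, and the consumer processes the stream.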
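The push-based interaction can be simulated in memory as below. This is a sketch under stated assumptions: `PushProducer`, `PushConsumer`, their methods and the `endpoint` key of the descriptor are hypothetical, and plain method calls stand in for the HTTP request (stream descriptor) and the subscription (e.g. a WebSocket connection).

```python
from typing import Callable, Dict, List


class PushProducer:
    """In-memory stand-in for a producer (hypothetical API)."""
    def __init__(self, descriptor: Dict):
        self.descriptor = descriptor
        self.subscribers: List[Callable[[Dict], None]] = []

    def get_descriptor(self) -> Dict:
        # Stream Descriptor endpoint (an HTTP GET on the web).
        return self.descriptor

    def subscribe(self, callback: Callable[[Dict], None]) -> None:
        # RDF stream endpoint (e.g. opening a WebSocket connection).
        self.subscribers.append(callback)

    def publish(self, item: Dict) -> None:
        # Push each new item to every subscriber as soon as it is generated.
        for callback in self.subscribers:
            callback(item)


class PushConsumer:
    def __init__(self):
        self.endpoint = None
        self.processed: List[str] = []

    def start(self, producer: PushProducer) -> None:
        sd = producer.get_descriptor()      # 1. get the stream descriptor
        self.endpoint = sd["endpoint"]      # 2. process it ("endpoint" key is hypothetical)
        producer.subscribe(self.on_item)    # 3. subscribe to the stream

    def on_item(self, item: Dict) -> None:  # 4. process each pushed item
        self.processed.append(item["@id"])


producer = PushProducer({"endpoint": "ws://example.org/streams/rooms"})
consumer = PushConsumer()
consumer.start(producer)
producer.publish({"@id": "G1"})
producer.publish({"@id": "G2"})
print(consumer.processed)
```

The consumer never asks for items: after subscribing, the producer decides when data flows.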


A communication protocol: pull-based streams

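Actors: the Producer, which exposes a Stream Descriptor endpoint and an RDF stream endpoint, and the Consumer.

1. The consumer gets the stream descriptor (SD) from the Stream Descriptor endpoint.

2. The producer returns the SD, and the consumer processes it.

3. The consumer repeatedly sends "GET items" requests to the RDF stream endpoint; each response carries a batch of stream items, which the consumer processes.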
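A matching in-memory sketch of the pull-based variant, assuming the producer answers each "GET items" request with the items newer than a timestamp supplied by the consumer. The class names, methods and the cursor mechanism are hypothetical; the cursor logic assumes strictly increasing timestamps.

```python
from typing import Dict, List, Tuple


class PullProducer:
    """In-memory stand-in for a producer serving items on request (hypothetical API)."""
    def __init__(self, descriptor: Dict):
        self.descriptor = descriptor
        self.log: List[Tuple[int, Dict]] = []  # (timestamp, item), in timestamp order

    def get_descriptor(self) -> Dict:
        return self.descriptor

    def append(self, ts: int, item: Dict) -> None:
        self.log.append((ts, item))

    def get_items(self, since: float) -> List[Tuple[int, Dict]]:
        # "GET items": every item with a timestamp greater than `since`.
        return [(ts, item) for ts, item in self.log if ts > since]


class PullConsumer:
    def __init__(self, producer: PullProducer):
        self.producer = producer
        self.sd = producer.get_descriptor()   # get and keep the stream descriptor
        self.cursor = float("-inf")           # timestamp of the last item seen
        self.processed: List[str] = []

    def poll(self) -> int:
        """Issue one GET request and process the returned batch."""
        batch = self.producer.get_items(self.cursor)
        for ts, item in batch:
            self.processed.append(item["@id"])
            self.cursor = ts  # assumes strictly increasing timestamps
        return len(batch)


producer = PullProducer({"endpoint": "http://example.org/streams/rooms"})
consumer = PullConsumer(producer)
producer.append(1, {"@id": "G1"})
producer.append(3, {"@id": "G2"})
first = consumer.poll()    # both items
producer.append(5, {"@id": "G3"})
second = consumer.poll()   # only G3
third = consumer.poll()    # nothing new
print(first, second, third, consumer.processed)
```

Here the consumer controls the pace: items accumulate at the producer until the next request.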


Protocols

The RDF Stream Descriptor is accessible through HTTP

The transmission of the stream can happen through different protocols

• HTTP chunked encoding

• WebSocket

• Message Queuing Telemetry Transport (MQTT)

• Server-Sent Events (SSE)

• HTTP

• ...
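As an example of one transport from the list, the sketch below frames JSON-LD stream items as Server-Sent Events (served with the `text/event-stream` media type): each event carries an `id:` line and a `data:` line and ends with a blank line. The helper names are hypothetical, and `parse_sse` handles only the messages produced by `sse_frame`, not the full SSE grammar.

```python
import json
from typing import Dict, List


def sse_frame(item: Dict, event_id: str) -> str:
    """Frame one JSON-LD stream item as a Server-Sent Events message."""
    payload = json.dumps(item, separators=(",", ":"))  # one line: JSON escapes newlines
    return f"id: {event_id}\ndata: {payload}\n\n"      # a blank line ends the event


def parse_sse(text: str) -> List[Dict]:
    """Minimal parser: only handles the 'data: ' lines produced by sse_frame."""
    events = []
    for block in text.split("\n\n"):
        data = [line[len("data: "):] for line in block.split("\n") if line.startswith("data: ")]
        if data:
            events.append(json.loads("\n".join(data)))
    return events


wire = sse_frame({"@id": "G1"}, "1") + sse_frame({"@id": "G2"}, "2")
print(wire)
print(parse_sse(wire))
```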


WeSP: Proofs of concept

C-SPARQL

• Stream transformer

• WeSP implemented as a wrapper

• https://github.com/dellaglio/csparql-wesp

CQELS

• Stream transformer

• Native implementation of WeSP

• https://github.com/cqels/CQELS-1.x/

TripleWave

• Stream source

• Native implementation of WeSP

• http://streamreasoning.github.io/TripleWave


TripleWave

TripleWave is open source

• Learn more at: https://streamreasoning.github.io/TripleWave/

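Figure: TripleWave takes an input (which input? see "Feeding TripleWave") and publishes it as RDF streams (via WebSocket, HTTP chunked encoding, etc.) together with a Stream Descriptor.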


Feeding TripleWave

TripleWave supports a variety of data sources:

• RDF dumps with temporal information

• RDF with temporal information exposed through SPARQL endpoints

• Streams available on the Web

Figure: TripleWave architecture. Inputs: Web service, SPARQL endpoint, file. Components: Web API, connector stream, datagen stream, transform stream (R2RML mapping), graph stream, scheduler stream. Modes: conversion, replay, replay loop.
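The replay and replay loop modes can be sketched as follows, assuming the input is a list of (timestamp, graph) pairs extracted from an RDF dump. The function name and the one-unit gap between loops are hypothetical choices; a real replay would also pace the emission according to the timestamps, which is omitted here.

```python
from typing import Dict, Iterator, List, Tuple


def replay(dump: List[Tuple[int, Dict]], loops: int = 1) -> Iterator[Tuple[int, Dict]]:
    """Re-emit a time-annotated RDF dump as an ordered stream.

    With loops > 1 (a "replay loop"), the dump is repeated and each
    repetition is shifted in time so that timestamps keep increasing.
    """
    ordered = sorted(dump, key=lambda entry: entry[0])
    if not ordered:
        return
    # Duration of one pass plus one time unit (hypothetical gap between loops).
    span = ordered[-1][0] - ordered[0][0] + 1
    for k in range(loops):
        for ts, graph in ordered:
            yield ts + k * span, graph


dump = [(3, {"@id": "G2"}), (1, {"@id": "G1"})]
replayed = list(replay(dump, loops=2))
print([ts for ts, _ in replayed])  # [1, 3, 4, 6]
```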


Summary

WeSP: framework to exchange RDF streams on the web

– RDF to serialise the stream items

– RDF to describe the stream

– Application and communication protocols: HTTP, WebSocket, MQTT, etc.

– Interfaces to produce and consume RDF streams

What’s next?

– Relation with other technologies: LDN, Activity Streams, etc.

– Adoption

– Federated stream processing over the Web


COMBINING STREAMS AND BACKGROUND DATA


The goal: a decentralized web of RSPs in the web

Figure: a network of RSP engines (MorphStreams, C-SPARQL, StreamRule, CQELS, TrOWL, Instans), also connected to remote SPARQL endpoints.

Q2: How to integrate stream processing with background knowledge exposed remotely on the web?


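Time-based sliding window W(ω,β)

Figure: a time-based sliding window W(ω,β) over the stream S. The window has width ω and slides by β; at each evaluation, the query is evaluated over the stream items (S1 to S12 in the figure) whose timestamps fall within the current window.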
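A minimal sketch of how a time-based sliding window partitions a stream, assuming integer timestamps, half-open windows [t0 + kβ, t0 + kβ + ω) and a positive slide β (window semantics differ slightly across RSP engines). `sliding_windows` is a hypothetical helper; the example uses ω = 4 and β = 2, so consecutive windows overlap.

```python
from typing import Iterator, List, Tuple


def sliding_windows(stream: List[Tuple[int, str]], omega: int, beta: int,
                    t0: int = 0) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (open, close, items) for windows [t0 + k*beta, t0 + k*beta + omega).

    omega is the window width and beta the slide (beta must be > 0);
    windows overlap when beta < omega. Stops after the last window that
    opens at or before the latest timestamp.
    """
    if not stream:
        return
    t_end = max(ts for ts, _ in stream)
    k = 0
    while t0 + k * beta <= t_end:
        open_, close = t0 + k * beta, t0 + k * beta + omega
        yield open_, close, [item for ts, item in stream if open_ <= ts < close]
        k += 1


# Illustrative stream: items S1..S12 with timestamps 1..12.
stream = [(i, f"S{i}") for i in range(1, 13)]
windows = list(sliding_windows(stream, omega=4, beta=2))
for open_, close, items in windows[:3]:
    print(f"[{open_}, {close}): {items}")
```

With β = ω the same function produces tumbling (non-overlapping) windows.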


The setting

Figure: an RDF stream generator feeds a window, and the window content is joined with background data served by a SPARQL endpoint.

Background data changes over time, and it is stored on the web

Accessing background data is costly

Is it possible to avoid continuously accessing the background data?


Local view

How to cope with changes in the background data?

Figure: the window content is joined with a local view of the background data, instead of querying the SPARQL endpoint at every evaluation.
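The join against the local view can be sketched with the standard SPARQL notion of compatible solution mappings (two mappings are compatible when they agree on their shared variables). This assumes the local view is a list of mappings previously fetched from the endpoint; the function names and data are hypothetical, and a nested-loop join is used for brevity.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # SPARQL solution mapping: variable -> value


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join_with_local_view(window: List[Mapping], local_view: List[Mapping]) -> List[Mapping]:
    """Join window mappings with the local view; no remote SPARQL call is made."""
    return [{**w, **l} for w in window for l in local_view if compatible(w, l)]


# Illustrative data: the window says where users are; the local view keeps
# a local copy of (hypothetical) user profiles fetched earlier from the endpoint.
window = [{"user": "alice", "room": "rRoom"}, {"user": "bob", "room": "bRoom"}]
local_view = [{"user": "alice", "followers": "120"}, {"user": "carol", "followers": "8"}]
print(join_with_local_view(window, local_view))
```

The answer is only as fresh as the local copy: if alice's profile changed at the endpoint after it was fetched, the join result is stale, which is what the maintenance process addresses.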


Maintenance process

Maintenance introduces a trade-off between response quality and time.

We propose to manage this trade-off by fixing the time dimension according to the query constraints and maximizing the freshness of the response.

Figure: the window content is joined with the local view, while a maintenance process refreshes the local view from the background data (SPARQL endpoint).

Maintenance process


How to track background data changes?

Update streams

• streams of changes made available to the query processor

• rarely available on the Web, e.g. Wikipedia, sparqlPuSH

Data changes regularly

• data generated by automatic processes that refresh it periodically

• data warehouses, sensors

Data changes “randomly”

• Twitter user profiles, taxi status, financial updates


Requirements

The maintenance process:

1. should take into account the change rates of the data elements in the background data;

2. should consider the dynamicity of the change rate values;

3. should satisfy the Quality of Service constraints on responsiveness and freshness of the answer;

4. may consider the query and its definition.


A query-driven maintenance process

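Query: WINDOW(S, ω, β) P_W JOIN SERVICE(BKG) P_S

Figure: architecture of the maintenance process. The JOIN combines the mappings of the WINDOW clause with those of the SERVICE clause, served by the local view, producing Ω_join. A Proposer identifies the candidate set C of local view mappings that may need a refresh, a Ranker orders C and elects the set E of mappings to refresh, and a Maintainer updates them in the local view. Proposer: WSJ. Rankers: RND, LRU, WBM, SBM, IBM.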


Terminology

Figure: time axis t with four consecutive window evaluations W1 to W4 and the best before times τ of the mappings involved.

Best before time (τ): the time at which an element will become stale. It is defined by:

Legend of the figure: mappings from the WINDOW clause; mappings in the LOCAL VIEW; compatible mappings.
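The defining formula is not reproduced above. As one possible instantiation, assuming an element that changes regularly at times phase + n·interval (the "data changes regularly" case), its best before time is the first change after its last refresh. The functions below are a hypothetical sketch of that assumption, not the definition used in the original work.

```python
import math


def best_before_time(last_refresh: float, interval: float, phase: float = 0) -> float:
    """First change after last_refresh, for an element changing at phase + n*interval.

    Assumed model ("data changes regularly"); the slides' own formula is not shown.
    """
    n = math.floor((last_refresh - phase) / interval) + 1
    return phase + n * interval


def is_stale(now: float, last_refresh: float, interval: float, phase: float = 0) -> bool:
    """The local copy is stale from its best before time onwards."""
    return now >= best_before_time(last_refresh, interval, phase)


print(best_before_time(7, 5))                 # 10
print(is_stale(9, 7, 5), is_stale(10, 7, 5))  # False True
```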


WSJ

Figure: time axis t with window evaluations W1 to W4 and the best before times τ of the mappings.

WSJ identifies the candidate set: the possibly stale local view mappings involved in the current evaluation.

WSJ analyzes the content of the current window evaluation and identifies the compatible mappings in the local view.

The possibly stale mappings are identified by analyzing their best before times.
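A sketch of the WSJ step, under two simplifying assumptions: the window and the local view join on a single variable (`key`), and local view entries are identified by that variable's value. The function name and data are hypothetical.

```python
from typing import Dict, List


def wsj_candidates(window: List[Dict[str, str]], bbt: Dict[str, float],
                   now: float, key: str) -> List[str]:
    """Local view entries involved in the current evaluation that may be stale.

    window: mappings in the current window; bbt: best before time of each
    local view entry, indexed by its join-key value (a simplifying assumption).
    """
    involved = {w[key] for w in window}  # entries compatible with the window
    return [k for k, t in bbt.items() if k in involved and t <= now]  # possibly stale


window = [{"user": "alice"}, {"user": "bob"}]
bbt = {"alice": 5, "bob": 12, "carol": 3}
print(wsj_candidates(window, bbt, now=10, key="user"))  # ['alice']
```

carol is stale but not involved in the current evaluation, and bob is involved but still fresh, so neither is proposed.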


WBM

Figure: time axis t with window evaluations W1 to W4 and the best before times τ; a table with columns V, L and Score (still empty) lists the candidate mappings.

WBM ranks the candidate set to determine which mappings to update.

The ranking is computed through two values: the renewed best before time and the remaining life time

The top k elements are selected to be refreshed. The value k is selected according to the responsiveness constraint.


WBM: renewed best before time

Figure: time axis t with window evaluations W1 to W4 and the best before times τ, with the table of candidate mappings:

V  L  Score
3
4
1

When would the mappings become stale if refreshed now?

The renewed best before time V is computed as:


WBM: remaining life time and score

Figure: time axis t with window evaluations W1 to W4 and the best before times τ, with the table of candidate mappings:

V  L  Score
3  3  3
4  1  1
1  3  1

For how many future evaluations is the mapping involved?

The remaining life time L is computed as:

WBM ranks the mappings by using a score:

Score = min(L, V)

The mapping with the highest score is selected for the maintenance.
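The formulas for V and L are not reproduced above. The sketch below uses one plausible reading, stated as an assumption: evaluations happen at now + jβ (j ≥ 1) over windows [e - ω, e), L counts the future windows that still contain the window mapping, and V counts the future evaluations before a mapping refreshed now becomes stale again. Only Score = min(L, V) and the top-k selection come from the slides; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def remaining_life(ts: float, now: float, omega: float, beta: float) -> int:
    """L: future evaluations e = now + j*beta (j >= 1) whose window [e - omega, e) contains ts."""
    return max(0, math.floor((ts + omega - now) / beta))


def renewed_validity(renewed_bbt: float, now: float, beta: float) -> int:
    """V: future evaluations before the mapping, if refreshed now, becomes stale again."""
    return max(0, math.ceil((renewed_bbt - now) / beta) - 1)


def wbm_select(candidates: List[Tuple[str, int, int]], k: int) -> List[str]:
    """Rank (id, V, L) triples by min(L, V) and keep the top k (k = refresh budget)."""
    ranked = sorted(candidates, key=lambda c: min(c[1], c[2]), reverse=True)
    return [c[0] for c in ranked[:k]]


# The V/L values of the table above, with a budget of one refresh.
print(wbm_select([("m1", 3, 3), ("m2", 4, 1), ("m3", 1, 3)], k=1))  # ['m1']
```

Taking the minimum rewards mappings that are both needed for many future evaluations and stay fresh for many of them once refreshed; refreshing m2 would buy validity for evaluations it no longer takes part in.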


Experiments


Figure: time axis t with window evaluations W1 to W4 and the best before times τ of the mappings.

Extensions: SBM

It exploits the fact that mappings may have n-n relations

• Each compatible pair generates a join result (e.g. BGP)

If one candidate mapping is refreshed, there will be four fresh mappings

If another candidate mapping is refreshed, there will be five fresh mappings

The latter is selected for the maintenance
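For the join case, refreshing one local view mapping makes fresh every join result it participates in, so the choice can be driven by counting compatible window mappings per candidate. A hypothetical sketch, again assuming a single join variable:

```python
from collections import Counter
from typing import Dict, List


def sbm_select_join(window: List[Dict[str, str]], candidates: List[str],
                    key: str, k: int = 1) -> List[str]:
    """Pick the k stale local view entries whose refresh makes the most join results fresh.

    With an n-n join, refreshing one entry freshens one join result per
    compatible window mapping.
    """
    matches = Counter(w[key] for w in window)
    return sorted(candidates, key=lambda c: matches[c], reverse=True)[:k]


# "x" joins with four window mappings, "y" with five.
window = [{"room": "x"}] * 4 + [{"room": "y"}] * 5
print(sbm_select_join(window, ["x", "y"], key="room"))  # ['y']
```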


Figure: time axis t with window evaluations W1 to W4 and the best before times τ; the fresh results are highlighted.

Extensions: SBM

It exploits the fact that mappings may have n-n relations

• A result is fresh only if all the pairs involved are fresh (e.g. aggregations)

If one candidate mapping is refreshed, there will be one fresh mapping

If another candidate mapping is refreshed, there will be two fresh mappings

The latter is selected for the maintenance
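When a result is fresh only if all the mappings it depends on are fresh, the count has to account for the other stale dependencies. A hypothetical sketch, assuming each result is described by the set of local view mappings it depends on:

```python
from typing import Dict, Set


def fresh_after_refresh(results: Dict[str, Set[str]], stale: Set[str], refreshed: str) -> int:
    """Count results whose local view dependencies are all fresh after one refresh."""
    still_stale = stale - {refreshed}
    return sum(1 for deps in results.values() if not deps & still_stale)


def sbm_select_all_fresh(results: Dict[str, Set[str]], stale: Set[str]) -> str:
    """Pick the stale entry whose refresh yields the most fully fresh results."""
    return max(sorted(stale), key=lambda c: fresh_after_refresh(results, stale, c))


# Each result (e.g. an aggregate) depends on several local view entries.
results = {"r1": {"a"}, "r2": {"b"}, "r3": {"b"}, "r4": {"a", "b"}}
stale = {"a", "b"}
print(sbm_select_all_fresh(results, stale))  # b (frees r2 and r3; refreshing a frees only r1)
```

r4 stays stale whichever entry is refreshed, because it needs both.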


Other extensions

We developed other rankers:

IBM: combines WBM and SBM, taking into account the number of join mappings produced both in the current window and in future windows

FBA: dynamic allocation of the refresh operations among different evaluations

F rankers: extensions of the presented rankers to cope with queries with FILTER clauses on the subquery over the background data


Summary

We proposed using materialization (a local view) to optimize the processing of continuous queries.

We proposed a policy to maximize freshness under the time constraints of continuous queries.

We tested our policy against baseline policies (LRU and Random).

Future Work:

– Measuring the time overhead of maintenance

– Investigating more queries involving both remote SPARQL endpoints and streams.

– Dynamically estimating the change rate of users.


Acknowledgments


Conclusions

RDF (or semantic) streams are gaining momentum

• Several active research groups working on querying and reasoning

• Prototypes, methods and applications

• Query languages, ontologies

• Use cases

However, the web dimension has received little attention so far


What’s next?

We still need

• Infrastructures and standards to exchange (RDF) streams on the Web

• Agreements on languages to specify tasks over such streams

• Query languages richer than SPARQL not only to manage streams, but also to express higher-level operations

• Methods to manage reasoning tasks over streams

The Web dimension needs to be studied and understood

• Combination of remote streams and background data requires new solutions

• Not only queries, but also constraints over them (QoS)


Thank you! Questions?

On web stream processing

Daniele Dell’Aglio

dellaglio@ifi.uzh.ch

http://dellaglio.org

@dandellaglio


Find more: Q1

• A. Mauri, J.-P. Calbimonte, D. Dell’Aglio, M. Balduini, E. Della Valle, K. Aberer: Where Are the RDF Streams?: On Deploying RDF Streams on the Web of Data with TripleWave. Poster at International Semantic Web Conference 2015.

• A. Mauri, J.-P. Calbimonte, D. Dell’Aglio, M. Balduini, M. Brambilla, E. Della Valle, K. Aberer: TripleWave: Spreading RDF Streams on the Web. Resource Paper at International Semantic Web Conference 2016.

• D. Dell'Aglio, D. Le Phuoc, A. Lê Tuán, M. Intizar Ali, J.-P. Calbimonte: On a Web of Data Streams. DeSemWeb@ISWC 2017


Find more: Q2

• S. Dehghanzadeh, A. Mileo, D. Dell'Aglio, E. Della Valle, S. Gao, A. Bernstein: Online View Maintenance for Continuous Query Evaluation. WWW (Companion Volume) 2015: 25-26

• S. Dehghanzadeh, D. Dell'Aglio, S. Gao, E. Della Valle, A. Mileo, A. Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325

• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data. ICWE 2016: 299-316

• S. Gao, D. Dell'Aglio, S. Dehghanzadeh, A. Bernstein, E. Della Valle, A. Mileo: Planning Ahead: Stream-Driven Linked-Data Access Under Update-Budget Constraints. International Semantic Web Conference (1) 2016: 252-270

• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data. DEBS 2017: 170-179
