RDF Stream Processing Implementa · PDF file 2016-11-02 · RDF Stream Processing...
Tutorial on RDF Stream Processing 2016
M.I. Ali, J-P Calbimonte, D. Dell'Aglio, E. Della Valle, and A. Mauri
http://streamreasoning.org/events/rsp2016
RDF Stream Processing Implementations
Jean-Paul Calbimonte
[email protected] http://jeanpi.org @jpcik
http://dellaglio.org/
Share, Remix, Reuse — Legally
This work is licensed under the Creative Commons Attribution 3.0 Unported License.
You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
Under the following conditions:
• Attribution: You must attribute the work by inserting
– “[source http://streamreasoning.org/rsp2014]” at the end of each reused slide
– a credits slide stating: These slides are partially based on “RDF Stream Processing 2014”
by M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, E. Della Valle http://streamreasoning.org/rsp2014
To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
RSP for developers
• RDF Streams in practice
• RSP Query Engines
• Developing with an RSP Engine
• Handling Results
• RSP Services
RDF Streams in Practice
RSP: Keep the data moving
• Process data in-stream
• Not required to store
• Active processing model
input streams → RSP queries/rules → output streams/events
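A minimal sketch of the "keep the data moving" idea, in plain Java and not tied to any RSP engine: a continuous query is modelled here as a hypothetical push-based operator (`RunningAverage`, our own name) that updates its answer as each value arrives and keeps only a count and a sum, never the input stream itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleConsumer;

// Hypothetical push-based operator standing in for a continuous query:
// it folds each incoming value into O(1) state (count and sum) and emits
// an updated result downstream; the input values themselves are not kept.
final class RunningAverage implements DoubleConsumer {
    private final DoubleConsumer downstream; // where output events go
    private long count = 0;
    private double sum = 0.0;

    RunningAverage(DoubleConsumer downstream) {
        this.downstream = downstream;
    }

    @Override
    public void accept(double value) {
        count++;
        sum += value;
        downstream.accept(sum / count); // push a new result for every input
    }
}

final class InStreamDemo {
    public static void main(String[] args) {
        List<Double> outputs = new ArrayList<>();
        RunningAverage query = new RunningAverage(outputs::add);
        // Values are pushed to the query one at a time, as they arrive.
        for (double v : new double[] {35.4, 36.4, 37.4}) {
            query.accept(v);
        }
        System.out.println(outputs); // one running average per input value
    }
}
```

The operator is active (it reacts to each arrival) rather than waiting to be asked, which is the inversion of control that distinguishes continuous processing from one-off querying over stored data.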
RDF Streams
RDF Stream: an unbounded sequence $\dots, G_i, G_{i+1}, G_{i+2}, \dots, G_{i+n}, \dots$
Each element $G_i = \{(s_1,p_1,o_1), (s_2,p_2,o_2), \dots\}\,[t_i]$
• contains 1+ triples
• carries an implicit/explicit timestamp or interval
RDF streams in theory
How do I code this?
Use Web standards?
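One possible answer to "How do I code this?" is sketched below in plain Java, under our own assumptions: a hypothetical `Triple` record with string terms, a `StreamElement` pairing a set of one or more triples with a single explicit timestamp (intervals are not modelled), and the stream itself as a lazily generated, unbounded `Iterator`. None of these names come from a specific RSP engine.

```java
import java.util.Iterator;
import java.util.Set;

// Hypothetical RDF term representation: plain strings for s, p, o.
record Triple(String s, String p, String o) {}

// One stream element G_i: a non-empty set of triples plus a timestamp t_i.
record StreamElement(Set<Triple> graph, long timestamp) {
    StreamElement {
        if (graph.isEmpty()) {
            throw new IllegalArgumentException("a stream element needs 1+ triples");
        }
        graph = Set.copyOf(graph); // defensive, immutable copy
    }
}

// An unbounded sequence of elements, produced on demand and never materialized.
final class SyntheticRdfStream implements Iterator<StreamElement> {
    private long i = 0;

    @Override
    public boolean hasNext() {
        return true; // unbounded: there is always a next element
    }

    @Override
    public StreamElement next() {
        i++;
        String obs = ":observation" + i;
        Set<Triple> g = Set.of(
            new Triple(obs, "rdf:type", "om-owl:Observation"),
            new Triple(obs, "om-owl:procedure", ":sensor1"));
        return new StreamElement(g, 1000L * i); // explicit timestamp, grows with i
    }
}
```

Because `hasNext()` always returns `true`, consumers must bound their own work (for example with a window), which mirrors the unbounded sequence in the definition above.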
Linked Data on the Web
Web of Data
Linked Data
W3C Standards: RDF, SPARQL, etc.
Linked Data principles for RDF streams?
e.g. publish sensor data as RDF/Linked Data?
• URIs as names of things
• HTTP URIs
• Useful information when a URI is dereferenced
• Links to other URIs
[Diagram: users and applications on the Web]
Use the RDF model to continuously query real-time data streams?
• static vs. streams
• one-off vs. continuous
(Sensor) Data Streams on the Web
http://mesowest.utah.edu/
http://earthquake.usgs.gov/earthquakes/feed/v1.0/
http://swiss-experiment.ch
• Monitoring
• Alerts
• Notifications
• Hourly/daily updates
• A myriad of formats
• Ad-hoc access points
• Informal descriptions
• Semantics by convention
• Uneven use of standards
• Manual exploration
RDF Streams before RDF Streams
[Figure: Linking Open Data cloud diagram, 2011 (http://richard.cyganiak.de/2007/10/lod/), highlighting Linked Sensor Data, MetOffice and AEMET]
Sensor Data & Linked Data
An example: Linked Sensor Data (http://wiki.knoesis.org/index.php/LinkedSensorData)
[Table: dataset zip files and number of triples]
Example: Nevada dataset
- 7.86 GB in N-Triples format
- 248 MB zipped
Sensor Data & Linked Data
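[Excerpt of N-Triples from the dataset; most URIs were lost in extraction. Surviving literals: "30.0" (a typed measurement value) and "2003-03-31T05:10:00-07:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>]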
What do we get in these datasets? Nice triples, describing:
• what is measured
• the measurement
• the unit
• the sensor
• when it is measured
RDF Streams before RDF Streams
i.e. just use RDF: plain triples
:observation1 rdf:type om-owl:Observation .
:observation1 om-owl:observedProperty weather:_AirTemperature .
:observation1 om-owl:procedure :sensor1 .
:observation1 om-owl:result :obsresult1 .
:observation1 om-owl:resultTime "2015-01-01T10:00:01" .
:obsresult1 om-owl:floatValue 35.4 .

:observation2 rdf:type om-owl:Observation .
:observation2 om-owl:observedProperty weather:_AirTemperature .
:observation2 om-owl:procedure :sensor1 .
:observation2 om-owl:result :obsresult2 .
:observation2 om-owl:resultTime "2015-01-01T10:00:02" .
:obsresult2 om-owl:floatValue 36.4 .

• Where is the timestamp?
• What is the order in the RDF graph?
• How to store it? Appended to a file? Or to some RDF dataset?
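The ordering question can be made concrete with a small Java sketch (our own illustration, not taken from any RSP engine): an RDF graph is a set of triples, so it carries no notion of order, and the only way to recover a sequence here is to read the `om-owl:resultTime` values back out of the data and sort by them. The `Triple` record and string-based terms are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical triple with string terms.
record Triple(String s, String p, String o) {}

final class OrderFromData {
    // Returns the subjects that have an om-owl:resultTime, ordered by that time.
    // Times are compared as strings, which works only because the example values
    // share the fixed layout "yyyy-MM-ddTHH:mm:ss" with no time zone.
    static List<String> observationsByResultTime(Set<Triple> graph) {
        List<Triple> times = new ArrayList<>();
        for (Triple t : graph) { // a Set has no defined order: the graph alone says nothing about sequence
            if (t.p().equals("om-owl:resultTime")) {
                times.add(t);
            }
        }
        times.sort(Comparator.comparing(Triple::o));
        List<String> ordered = new ArrayList<>();
        for (Triple t : times) {
            ordered.add(t.s());
        }
        return ordered;
    }
}
```

Every consumer has to repeat this kind of lookup and sort, which is one reason RSP engines attach a timestamp to each stream element instead of leaving time buried in the triples.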
Feed an RDF Stream to an RSP engine
• Live non-RDF streams → ad-hoc conversion to RDF
• RDF and RDF datasets → continuous additions → RSP
• The RSP adds an (internal) timestamp on insertion, so what it processes is RDF + timestamps
Adding the timestamp on insertion is what is currently done in most RSPs.
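The pipeline above can be sketched as follows. This is a hypothetical Java illustration, not the API of any particular RSP: a non-RDF CSV reading is converted ad hoc into triples, and the engine side stamps each batch with its own clock when it is inserted, ignoring any time value already present in the data. The vocabulary reuses the earlier example; the CSV layout `sensorId,value` and all class names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

record Triple(String s, String p, String o) {}
record TimestampedTriple(Triple triple, long insertedAtMillis) {}

// Ad-hoc conversion: hard-coded knowledge of one CSV layout ("sensorId,value").
final class AdHocConverter {
    private long counter = 0;

    List<Triple> toRdf(String csvLine) {
        String[] f = csvLine.split(",");
        counter++;
        String obs = ":observation" + counter;
        String res = ":obsresult" + counter;
        return List.of(
            new Triple(obs, "rdf:type", "om-owl:Observation"),
            new Triple(obs, "om-owl:procedure", ":sensor" + f[0].trim()),
            new Triple(obs, "om-owl:result", res),
            new Triple(res, "om-owl:floatValue", f[1].trim()));
    }
}

// Engine side: adds an internal timestamp at insertion time.
final class InsertionTimestamper {
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final List<TimestampedTriple> buffer = new ArrayList<>();

    InsertionTimestamper(LongSupplier clock) {
        this.clock = clock;
    }

    // All triples inserted in the same call share one internal timestamp.
    void insert(List<Triple> triples) {
        long now = clock.getAsLong();
        for (Triple t : triples) {
            buffer.add(new TimestampedTriple(t, now));
        }
    }

    List<TimestampedTriple> contents() {
        return List.copyOf(buffer);
    }
}
```

Injecting the clock as a `LongSupplier` is a design choice for testability; in a live setting it would typically be `System::currentTimeMillis`.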
Feed an RDF Stream to C-SPARQL
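// C-SPARQL: a custom stream subclasses RdfStream and runs on its own thread
public class SensorsStreamer extends RdfStream implements Runnable {
    public void run() {
        ..
        while (true) {
            ...
            // each element is a quadruple: a triple plus a timestamp,
            // here the system time at which the triple is created
            RdfQuadruple q = new RdfQuadruple(subject, predicate, object,
                    System.currentTimeMillis());
            // hand the quadruple to the stream, which notifies its observers
            this.put(q);
        }
    }
}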
• Runnable: something to run on a thread
• RdfQuadruple: a timestamped triple, with the timestamp added by the streamer
• The stream is “observable” (Observer pattern), with a tightly coupled listener
• Data structure, execution and callbacks are mixed
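To illustrate the coupling described in these notes, here is a hypothetical Java sketch of the same shape, written without C-SPARQL (class and method names are ours, and the stream is bounded only so the example terminates): one class is at once the data structure, the thread body and the observable, and it calls its listeners synchronously, so a slow listener directly slows the producer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical quadruple: a triple plus a timestamp.
record Quad(String s, String p, String o, long timestamp) {}

// Data, execution and callbacks mixed in one class.
final class ObservableStream implements Runnable {
    private final List<Consumer<Quad>> listeners = new ArrayList<>();
    private final int elements; // bounded here so the example terminates

    ObservableStream(int elements) {
        this.elements = elements;
    }

    void addListener(Consumer<Quad> listener) {
        listeners.add(listener);
    }

    // Every listener runs on the producer's thread before put() returns,
    // so producer and consumers are tightly coupled.
    void put(Quad q) {
        for (Consumer<Quad> l : listeners) {
            l.accept(q);
        }
    }

    @Override
    public void run() {
        for (int i = 1; i <= elements; i++) {
            put(new Quad(":observation" + i, "om-owl:procedure", ":sensor1",
                         System.currentTimeMillis())); // timestamp added here
        }
    }
}
```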
Actor Model
Actor 1 and Actor 2 communicate through messages (m)
• Each actor has a mailbox, its own state and a behavior
• send: fire-and-forget; non-blocking response
• No shared mutable state
• Avoid blocking operators
• Lightweight objects
• Loose coupling
Implementations: e.g. Akka for Java/Scala
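The actor ideas above (mailbox, private state, behavior, fire-and-forget send) can be shown without Akka in a few lines of plain Java. This is a deliberately simplified, hypothetical sketch (one thread per actor, no supervision, no distribution) and not how Akka is implemented: the only way to reach the behavior is to drop a message into the actor's mailbox.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Minimal actor: a private mailbox drained by its own thread.
final class MiniActor<M> {
    private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    // Any state the behavior keeps is touched only by the worker thread.
    MiniActor(Consumer<M> behavior) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    behavior.accept(mailbox.take()); // one message at a time, FIFO
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when interrupted
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Fire-and-forget: enqueue and return immediately, without waiting.
    void tell(M message) {
        mailbox.add(message);
    }

    void stop() {
        worker.interrupt();
    }
}
```

Because messages are processed sequentially by a single thread, the behavior never needs locks; senders share nothing with it except the queue.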
RDF Stream
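Data structure: an infinite triple iterator
object DemoStreams {
  ...
  def streamTriples = {
    Iterator.from(1) map { i =>
      ...
      // `object` is a reserved word in Scala, so the object term is named obj
      new Triple(subject, predicate, obj)
    }
  }
}

Execution: asynchronous iteration
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // Future needs an implicit ExecutionContext
val f = Future(DemoStreams.streamTriples)
f.map { a => a.foreach { triple =>
  // do something
}}

Message passing: send each triple to an actor
f.map { a => a.foreach { triple =>
  someSink ! triple // someSink: an Akka ActorRef; ! is fire-and-forget
}}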
Immutable RDF stream:
• unbounded sequence
• avoid shared mutable state
• avoid concurrent writes
Ideas using Akka actors:
• Futures: non-blocking composition, concurrent computations, work with not-yet-computed results
• Actors: message-based, share-nothing, asynchronous, distributable
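For readers working in Java rather than Scala, the same three ideas (infinite triple iterator, asynchronous iteration with a future, message passing to a sink) can be sketched with `Stream.iterate`, `CompletableFuture` and a `BlockingQueue` standing in for an actor mailbox. This is an analogue, not a translation of any Akka API; the `Triple` record, the terms and the `limit` parameter (added only so the example terminates) are our assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

// Immutable triple: no setters, so instances can be shared across threads safely.
record Triple(String s, String p, String o) {}

final class DemoStreamsJava {
    // Data structure: a lazy, unbounded sequence of triples.
    static Stream<Triple> streamTriples() {
        return Stream.iterate(1, i -> i + 1)
                     .map(i -> new Triple(":observation" + i,
                                          "om-owl:procedure", ":sensor1"));
    }

    // Execution + message passing: iterate on another thread and send each
    // triple to a sink (here a queue playing the role of a mailbox).
    static CompletableFuture<Void> feed(BlockingQueue<Triple> sink, long limit) {
        return CompletableFuture.runAsync(() ->
            streamTriples().limit(limit).forEach(sink::add));
    }
}
```

The caller gets a future immediately and can compose further work on it, while the consumer reads from the queue at its own pace; producer and consumer share only the queue.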
RDF Stream
… other issues:
• Graph implementation?
• Timestamps: application vs. system?
• Serialization?

• Loose coupling
• Immutable data streams
• Asynchronous message passing
• Well-defined input/output
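The "application vs. system" timestamp question can be illustrated with a small, hypothetical Java sketch: the application timestamp is read from the data itself (here a `resultTime` literal like the ones in the earlier plain-triples example, assumed to be UTC because it has no offset), while the system timestamp is the arrival time assigned on receipt. The two can differ, and the gap between them is the delivery delay.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical triple with string terms.
record Triple(String s, String p, String o) {}

// An element carrying both notions of time.
record TimedTriple(Triple triple, Instant applicationTime, Instant systemTime) {
    // Positive when the element arrived after it was produced.
    long delayMillis() {
        return systemTime.toEpochMilli() - applicationTime.toEpochMilli();
    }
}

final class Timestamps {
    // Application time: parsed from the data (assumption: UTC, ISO local date-time).
    // System time: supplied by the receiver, e.g. Instant.now() on arrival.
    static TimedTriple stamp(Triple t, String resultTime, Instant arrival) {
        Instant app = LocalDateTime.parse(resultTime).toInstant(ZoneOffset.UTC);
        return new TimedTriple(t, app, arrival);
    }
}
```

Windows defined on application time reflect when things happened; windows on system time reflect when the engine saw them. Which one a query uses changes its results whenever data arrives late or out of order.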
Data stream characteristics
Data regularity
• Raw data is typically collected as time series
• Very regular structure
• Patterns can be exploited
E.g. mobile NO2 sensor readings (columns: timestamp, sensor, observed value, coordinates):
29-02-2016T16:41:24,47,369,46.52104,6.63579
29-02-2016T16:41:34,47,358,46.52344,6.63595
29-02-2016T16:41:44,47,354,46.52632,6.63634
29-02-2016T16:41:54,47,355,46.52684,6.63729
...
Data order
• Order of data is crucial
• Time is the key attribute for establishing an order among the data items
• Important for indexing
• Enables efficient time-based selection, filtering and windowing
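Both points (regular time series, time as the ordering key) can be sketched in Java on the NO2 readings above. Assumptions: the column order timestamp, sensor, observed value, then the two coordinate columns (read here as latitude and longitude), the `dd-MM-yyyy'T'HH:mm:ss` layout of the first column, and the hypothetical names `Reading`, `window` and `isRegular`. Because the input is ordered by time, a window scan can stop at the first reading past its end, and the regular structure shows up as a constant sampling interval.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

record Reading(LocalDateTime time, int sensor, int value, double lat, double lon) {}

final class No2Readings {
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("dd-MM-yyyy'T'HH:mm:ss");

    static Reading parse(String line) {
        String[] f = line.split(",");
        return new Reading(LocalDateTime.parse(f[0], TS), Integer.parseInt(f[1]),
            Integer.parseInt(f[2]), Double.parseDouble(f[3]), Double.parseDouble(f[4]));
    }

    // Readings with from <= time < to. Relies on the input being time-ordered,
    // so the scan stops at the first reading at or past the window end.
    static List<Reading> window(List<Reading> ordered, LocalDateTime from, LocalDateTime to) {
        List<Reading> out = new ArrayList<>();
        for (Reading r : ordered) {
            if (!r.time().isBefore(to)) {
                break;
            }
            if (!r.time().isBefore(from)) {
                out.add(r);
            }
        }
        return out;
    }

    // Regularity check: every gap between consecutive readings equals step.
    static boolean isRegular(List<Reading> ordered, Duration step) {
        for (int i = 1; i < ordered.size(); i++) {
            Duration gap = Duration.between(ordered.get(i - 1).time(), ordered.get(i).time());
            if (!gap.equals(step)) {
                return false;
            }
        }
        return true;
    }
}
```

When `isRegular` holds, a timestamp column could in principle be replaced by a start time plus a step, one example of exploiting the regular structure.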
Feed an RDF Stream to an RSP engine
• Live non-RDF streams → conversion to RDF, adding mappings to the data flow
• RDF and RDF datasets → RSP
• The RSP adds an (internal) timestamp on insertion
Continuous additions:
RDF +
times