Ontology-based Access to Sensor Data Streams
Jean-Paul Calbimonte
Supervisor: Oscar Corcho
Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid
PhD Thesis Defense
18.4.2013
Outline
Motivation
Background
Conclusions
Semantic stream query processing
Sensor metadata characterization
Ontology-based Access to Sensor Data Streams
Hypotheses & contributions
Challenges
Motivation
from Sensor Networks
to the Sensor Web
and the Semantic Sensor Web
Sensors
Image: http://www.flickr.com/photos/wouterh/2409251427/
data capture
different Sensor providers
transmission
data streams
Sensor Networks and the Web
Sensor Networks
users
applications
data streams
Volume
Velocity
Variety
WEB
Universal Web-based access to Sensor data
Querying the semantic sensor Web
e.g. publish sensor data as RDF/Linked Data?
URIs as names of things
HTTP URIs
useful information when URI is dereferenced
Link to other URIs
users
applications
WEB
1. Use ontology models to continuously query real-time data streams originating from sensors?
static vs. streams
one-off vs. continuous
Research questions & hypotheses
Research question 1:
Ontology models to query real-time sensor data streams?
Access heterogeneous SPEs (stream processing engines) using ontologies as an overarching data model?
SPARQL streaming extensions for querying data from SPEs?
H1: Sensor streaming data can be represented as instances of an ontology model.
H2: SPARQL can be extended with streaming operators and continuous processing.
H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings.
H4: Ontology-based streaming queries can be rewritten to abstract expressions, and these to concrete executable SPE queries.
H5: Query rewriting, with pull and push delivery, has an acceptable overhead.
Sensor Data: Observations
Citizen Science
Multiple publishers
Heterogeneity
Metadata quality
Characterizing semantic sensor metadata
users
applications
WEB
2. Characterize sensor data, deriving semantic metadata from the sensor observations?
different publishers
different metadata
publish streams
Search/query relevant data sources?
GSN
Research questions & hypotheses
Research question 2:
Data representation suitable for extracting data features that characterize a set of sensor streams?
Classification and mining techniques to characterize sensor data streams?
H6: Sensor data series can be analyzed to find characteristic patterns that make each series recognizable among other types.
H7: With slope representations, semantic properties such as the type of data can be learned with classification techniques at acceptable precision.
Contributions
1. Querying: SPARQLStream & R2RML
SPARQL extensions & formalization
rewriting to algebra expressions using declarative mappings
query rewriting using R2RML mappings
query evaluation pluggable to different SPEs
results data translation
2. Metadata: Sensor metadata characterization
data representation as slope distributions
characterize types of sensor data
classifying sensor time series
extract metadata features
derive semantic properties
Limitations
L1: Rewriting is suited to medium sampling throughput, e.g. environmental monitoring.
L2: Query expressivity is limited to that of the underlying SPEs.
L3: Adapters implemented for custom sources.
L4: Querying supports only simple entailment.
L5: Arbitrarily noisy sensor series allow no accurate characterization.
L6: Classification depends on the number of sensor time series in the training set.
L7: Data characterization is not computed in real time, but offline.
Background
Data streams
Continuous queries
Window
SPEs
Ontology-based data access
Sensor data streams & events
(temp, hum, pres) τi
milford1: . . . (35.6,87,4) τi-1, (36.2,89,4) τi, (37.2,88,4) τi+1 . . .
watford7: . . . (37.6,88,7) τi, (36.3,89,2) τi+1 . . .
stream tuples
event processing
Querying streams & events
windows: w1, w2
SELECT attribute FROM stream [NOW -10 MIN]
One-off queries: query → query processor over a database → query results
Continuous processing: streaming tuples → continuous query processor (SPE) → results, pushed to the client or returned on a pull request
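The window query above can be illustrated with a minimal Python sketch. It assumes tuples arrive in timestamp order and that `[NOW -10 MIN]` means "tuples whose timestamp lies in the last 600 seconds"; the class and field names (`StreamTuple`, `TimeWindow`) are hypothetical and are not part of any SPE mentioned in the slides.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    """One (temp, hum, pres) reading with its timestamp in seconds."""
    timestamp: float
    temp: float
    hum: float
    pres: float


class TimeWindow:
    """Time-based window: keeps tuples with timestamp in (now - range, now].

    Assumption: tuples are pushed in non-decreasing timestamp order.
    """

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self._buffer = deque()

    def push(self, tup):
        self._buffer.append(tup)

    def evaluate(self, now):
        # Evict tuples that have fallen out of the window.
        lower_bound = now - self.range_seconds
        while self._buffer and self._buffer[0].timestamp <= lower_bound:
            self._buffer.popleft()
        # Ignore tuples stamped after the evaluation instant.
        return [t for t in self._buffer if t.timestamp <= now]


if __name__ == "__main__":
    window = TimeWindow(range_seconds=600)  # [NOW - 10 MIN]
    for ts, reading in [(0, (35.6, 87, 4)), (300, (36.2, 89, 4)), (700, (37.2, 88, 4))]:
        window.push(StreamTuple(ts, *reading))
    # A continuous query re-evaluates the same window as time advances.
    for now in (700, 1000):
        print(now, [t.timestamp for t in window.evaluate(now)])
```

Calling `evaluate` repeatedly with a growing `now` mimics continuous processing: the query stays registered and each evaluation yields the content of the current window, whereas a one-off query would be evaluated once against stored data.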
Stream Processing Engines (SPE)
Data Stream Management Systems (DSMS)
Complex Event Processors (CEP)
Sensor Data Middleware
CQL/Stream
Borealis
TelegraphCQ
StreamMill
Cayuga
GEM
CEDR
NiagaraCQ
Rapide
Cosm
Hourglass
SStreamWare
GSN
IBM InfoSphere
Sybase CEP
Microsoft StreamInsight
Oracle CEP
Esper
StreamBase
Diverse query languages
Different query capabilities
Different query models
Extracting data from relational databases
WEB
Ontology-based data access
one-off SPARQL queries
data as RDF
relational database
RDB to RDF
mappings
static data
D2R
Morph
ODEMapster
Triplify
UltraWrap
Mastro
R2RML
W3C SSN Ontology
Summary
Existing SPEs available and producing data streams
Ontology-based access only for stored data
SPARQL query language not suitable for streams
SPEs are highly heterogeneous in models and queries
Semantic stream query processing
SPARQLStream
Query rewriting
RDF Stream
Mappings using R2RML
Execution over SPEs
RDF Streams
〈s,p,o〉
<aemet:observation1, qudt:hasNumericValue, "15.5">
<aemet:observation1, ssn:observedBy, aemet:Sensor3>
For streams?
(〈s,p,o〉, τ)
(<aemet:observation1, qudt:hasNumericValue, "15.5">, 34532)
timestamped triples
• Gutierrez et al. (2007) Introducing time into RDF. IEEE TKDE
• Rodríguez et al. (2009) Semantic management of streaming data. SSN
SPARQLStream extensions
SELECT (MAX(?temperature) AS ?maxtemp) ?sensor
WHERE {
  ?obs ssn:observedBy ?sensor.
  ?obs ssn:observationResult ?res.
  ?res aemet:hasAirTemperatureValue ?val.
  ?val qu:numericValue ?temperature.
}
GROUP BY ?sensor
SPARQLStream:
SELECT (MAX(?temp) AS ?maxtemp) ?sensor
FROM NAMED STREAM <http://aemet.linkeddata.es/observations.srdf> [NOW-1 HOURS]
WHERE {
  ?obs ssn:observedBy ?sensor.
  ?obs ssn:observationResult ?res.
  ?res aemet:hasAirTemperatureValue ?val.
  ?val qu:numericValue ?temp.
}
GROUP BY ?sensor
Named streams
Time windows
Other approaches: Streaming SPARQL (2008), C-SPARQL (2009), CQELS (2011), EP-SPARQL (2011), INSTANS (2012)
Streaming SPARQL execution approaches
Extend RDF for streaming data
Extend SPARQL for streaming RDF
Use a SPE internally for evaluation
Query rewriting to SPEs
RDF Streaming engine from scratch
Logic-programming based query evaluation
Similarities
Divergence
streams
DSMSs
CEPs
Middleware
SPARQLStream
Mapping SPE schemas and ontologies
wan7
timed: datetime PK
sp_wind: float
timed sp_wind
1 3.4
2 5.6
3 11.2
4 1.2
5 3.1
.. …
Queries
SELECT sp_wind FROM wan7 [NOW -5 HOUR] WHERE sp_wind >10
SPE
SPE data schemas
ssn:Observation
Ontology models
SPARQLStream Queries
Stream-to-ontology mappings
SELECT ?wspeed
FROM STREAM <SensorReadings.srdf> [NOW-5 HOUR]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?wspeed.
  FILTER (?wspeed > 10)
}
http://swissex.ch/data#Wan7/WindSpeed/ObsValue{timed}
sp_wind
http://swissex.ch/data#Wan7/WindSpeed/Observation{timed}
http://swissex.ch/data#Wan7/WindSpeed/ObsOutput{timed}
sweetSpeed:WindSpeed
Creating Mappings
wan7
timed: datetime PK
sp_wind: float
ssn:ObservationValue
qudt:numericValue
xsd:decimal
ssn:SensorOutput
ssn:Observation
ssn:hasValue
ssn:observationResult
ssn:Property
ssn:observedProperty
:Wan4WindSpeed a rr:TriplesMapClass;
  rr:tableName "wan7";
  rr:subjectMap [
    rr:template "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}";
    rr:class ssn:ObservationValue;
    rr:graph ssg:swissexsnow.srdf ];
  rr:predicateObjectMap [
    rr:predicateMap [ rr:predicate qudt:numericValue ];
    rr:objectMap [ rr:column "sp_wind"; rr:datatype xsd:decimal ] ].
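To make the intent of the mapping concrete, here is a minimal Python sketch of what it denotes: each `wan7` tuple yields a subject URI built from the template, a class assertion and a typed `qudt:numericValue`, all stamped with the tuple's `timed` value. The dictionary layout and function names are assumptions made for illustration; this is not an R2RML processor, and the approach in the slides rewrites queries rather than materializing triples like this.

```python
import re

# Hand-written mirror of the mapping above (hypothetical structure, not R2RML syntax).
MAPPING = {
    "table": "wan7",
    "subject_template": "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}",
    "class": "ssn:ObservationValue",
    # (predicate, source column, datatype)
    "predicate_object": [("qudt:numericValue", "sp_wind", "xsd:decimal")],
}


def expand_template(template, row):
    """Replace each {column} placeholder with the value of that column."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def row_to_triples(mapping, row):
    """Translate one stream tuple into the triples the mapping describes."""
    subject = expand_template(mapping["subject_template"], row)
    triples = [(subject, "rdf:type", mapping["class"])]
    for predicate, column, datatype in mapping["predicate_object"]:
        triples.append((subject, predicate, (str(row[column]), datatype)))
    return triples


def stream_to_rdf(mapping, rows):
    """Yield timestamped triples (triple, timestamp), one batch per tuple."""
    for row in rows:
        for triple in row_to_triples(mapping, row):
            yield triple, row["timed"]


if __name__ == "__main__":
    rows = [{"timed": 3, "sp_wind": 11.2}, {"timed": 4, "sp_wind": 1.2}]
    for item in stream_to_rdf(MAPPING, rows):
        print(item)
```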
W3C R2RML Mapping Language
Query rewriting
SPARQLStream query:
SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR TO NOW]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?windspeed.
  FILTER (?windspeed > 10)
}
Query rewriting with R2RML mappings yields an algebra expression:
π timed,sp_wind; ω 5 Hour; σ sp_wind>10; wan7
The algebra expression is translated to a query for each target SPE:
SNEE (DSMS): SELECT sp_wind FROM wan7 [FROM NOW-5 HOURS TO NOW] WHERE sp_wind >10
Esper (CEP): SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10
GSN (middleware): http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&c_min[0]=10&from=15/05/2012+05:00:00&to=15/05/2012+10:00:00
Cosm (middleware): http://api.cosm.com/v2/feeds/14321/datastreams/4?start=2012-05-15T05:00:00Z&end=2012-05-15T10:00:00Z
H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings.
H4: Ontology-based streaming queries can be rewritten to abstract expressions, and these to concrete executable SPE queries.
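The two-step idea (ontology-level query to abstract expression, abstract expression to engine-specific query) can be sketched in Python as follows. Everything here is a simplified assumption for illustration: the `StreamQuery` and `WindowedSelect` classes, the one-entry mapping table and the two serializers are hypothetical, and the target syntaxes are copied only from the SNEE and Esper strings shown on this slide, not from the engines' full grammars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamQuery:
    """Ontology-level query: one class, one property, one filter, one window."""
    rdf_class: str
    predicate: str
    filter_op: str
    filter_value: int
    window_hours: int


@dataclass(frozen=True)
class WindowedSelect:
    """Abstract expression: projection, selection and time window over a stream."""
    stream: str
    projection: tuple
    condition: str
    window_hours: int


# (class, predicate) -> (stream, column); stands in for the R2RML mapping.
MAPPING = {("ssn:ObservationValue", "qudt:numericValue"): ("wan7", "sp_wind")}


def rewrite(query, mapping):
    """Step 1: replace ontology terms by the stream and column they map to."""
    stream, column = mapping[(query.rdf_class, query.predicate)]
    condition = f"{column} {query.filter_op}{query.filter_value}"
    return WindowedSelect(stream, (column,), condition, query.window_hours)


def to_snee(expr):
    """Step 2a: serialize in the SNEE-style syntax shown on the slide."""
    cols = ", ".join(expr.projection)
    return (f"SELECT {cols} FROM {expr.stream} "
            f"[FROM NOW-{expr.window_hours} HOURS TO NOW] WHERE {expr.condition}")


def to_esper(expr):
    """Step 2b: serialize in the Esper-style syntax shown on the slide."""
    cols = ", ".join(expr.projection)
    return (f"SELECT {cols} FROM {expr.stream}"
            f".win:time({expr.window_hours} hour) WHERE {expr.condition}")


if __name__ == "__main__":
    query = StreamQuery("ssn:ObservationValue", "qudt:numericValue", ">", 10, 5)
    expression = rewrite(query, MAPPING)
    print(to_snee(expression))
    print(to_esper(expression))
```

Keeping the abstract expression separate from the serializers is what lets one rewriting step serve several engines: supporting another SPE in this sketch would only mean adding another serializer.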
Ontology-based query rewriting
SPARQLStream query processing
Client: sends a SPARQLStream query; receives [triples/bindings] by pull or push
Query rewriting: SPARQLStream query and R2RML mappings → algebra expression
Query processing: algebra expression → query executed on an SPE (SNEE, Esper, GSN, Cosm, other), which returns [tuples]
Data translation: [tuples] → [triples/bindings]
SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?windspeed.
  FILTER (?windspeed > 10)
}
π timed,sp_wind; ω 5 Hour; σ sp_wind>10; wan7
SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10
https://github.com/jpcik/morph-streams
H1: Sensor streaming data can be represented as instances of an ontology model.
H2: SPARQL can be extended with streaming operators and continuous processing.
Evaluation of query rewriting overhead
H5: Query rewriting, with pull and push delivery, has an acceptable overhead.
Native execution w/o rewriting
Execution with rewriting
Pull & Push delivery
End-to-end latency
Adapted Esper benchmark
Sensor metadata characterization
Representation
Classification
Metadata
Characterizing semantic sensor metadata
WEB
GSN
Air Pressure?
Air Temperature?
Already classified time series
Unclassified input series
compare
Deriving Semantic Metadata
Representation
Classification
Metadata
[Figure: two time series plots, x axis from 0 to 10]
Piecewise Linear Approximation
Reflect data trends
Apply with different resolutions
Applicable for different rates
Online computation cheap
Reduce numerosity
[Figure: a time series and its linear segments over time]
Linear Approximations
Key: segment slopes (angles)
Divide the angle space in sectors
[Figure: angle space divided into sectors a, b, c, d, with marks at -π/4, 0, π/4 and π/2]
distribution of angles in training set
1. compute linear approximation
2. compute slope distribution
3. K-nearest neighbor classification
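The three steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation evaluated in the thesis: the linear approximation uses fixed-length segments joining endpoints, the angle space (-π/2, π/2) is split into equal sectors, and distributions are compared with Euclidean distance. The function names, the sector count and the labels in the usage example are all hypothetical.

```python
import math
from collections import Counter


def segment_slopes(series, segment_length):
    """Step 1: approximate the series by fixed-length segments, return their angles."""
    angles = []
    for start in range(0, len(series) - segment_length, segment_length):
        rise = series[start + segment_length] - series[start]
        angles.append(math.atan2(rise, segment_length))
    return angles


def slope_distribution(angles, sectors=8):
    """Step 2: normalized histogram of angles over equal sectors of (-pi/2, pi/2)."""
    counts = [0] * sectors
    width = math.pi / sectors
    for angle in angles:
        index = min(int((angle + math.pi / 2) / width), sectors - 1)
        counts[index] += 1
    total = len(angles) or 1
    return [count / total for count in counts]


def distance(p, q):
    """Euclidean distance between two slope distributions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def knn_classify(sample, training, k=3):
    """Step 3: majority label among the k nearest training distributions."""
    nearest = sorted(training, key=lambda item: distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def characterize(series, segment_length=4, sectors=8):
    """Convenience wrapper: series -> slope distribution."""
    return slope_distribution(segment_slopes(series, segment_length), sectors)


if __name__ == "__main__":
    # Synthetic training series with made-up labels.
    slow = [4.0 + 0.001 * i for i in range(41)]
    fast = [10.0 * i for i in range(41)]
    training = [(characterize(slow), "air_pressure"), (characterize(fast), "wind_speed")]
    unknown = [3.7 + 0.002 * i for i in range(41)]
    print(knn_classify(characterize(unknown), training, k=1))
```

Because the representation only keeps the distribution of slopes, series of different lengths or sampling rates map to vectors of the same size, which is what makes a simple nearest-neighbor comparison possible.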
Experiments SwissEx
Confusion matrix SwissEx
Training-Test datasets
SwissExperiment, AEMET
Experiments AEMET
Confusion matrix AEMET
H6: Sensor data series can be analyzed to find characteristic patterns that make each series recognizable among other types.
Classification according to type
False positives (FPs) on subclasses of the same property
Evaluation vs SAX
H7: With slope representations, semantic properties such as the type of data can be learned through classification at acceptable precision.
Semantic Sensor Metadata
swissex:Sensor1
rdf:type ssn:Sensor;
ssn:onPlatform swissex:Station1;
ssn:observes cf-property:wind_speed.
swissex:Sensor2
rdf:type ssn:Sensor;
ssn:onPlatform swissex:Station1;
ssn:observes cf-property:air_temperature.
station1: sensor1, sensor2
W3C SSN Ontology
Derive semantic metadata properties
cf-property:wind_speed
    rdf:type dim:VelocityOrSpeed;
    rdfs:label "wind speed";
    ssn:isPropertyOf cf-feature:wind;
    qu:propertyType qu:scalar;
    qu:generalQuantityKind qu:speed.
Raw sensor data
Semantic metadata
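As a small illustration of the last step, the sketch below turns a predicted label into a sensor description shaped like the Turtle shown on this slide. The label-to-property table and the function name are hypothetical; only the `ssn:` and `cf-property:` terms that appear on the slide are used, and prefix declarations are omitted.

```python
# Hypothetical lookup from a classifier label to an observed-property term.
LABEL_TO_PROPERTY = {
    "wind_speed": "cf-property:wind_speed",
    "air_temperature": "cf-property:air_temperature",
}


def sensor_metadata_turtle(sensor, platform, label):
    """Emit an ssn:Sensor description for a sensor whose data type was predicted."""
    observed_property = LABEL_TO_PROPERTY[label]
    return (
        f"{sensor}\n"
        f"    rdf:type ssn:Sensor;\n"
        f"    ssn:onPlatform {platform};\n"
        f"    ssn:observes {observed_property}.\n"
    )


if __name__ == "__main__":
    print(sensor_metadata_turtle("swissex:Sensor1", "swissex:Station1", "wind_speed"))
```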
Conclusions
H1: Sensor streaming data can be represented as instances of an ontology model.
Mapping sensor data to ontology instances, e.g. SSN Ontology
H2: SPARQL can be extended with streaming operators and continuous processing.
SPARQLStream data model, extensions syntax, semantics
H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings.
SPARQLStream semantics of query rewriting to relational streaming algebra
usage of declarative mappings (W3C R2RML)
Calbimonte, Corcho & Gray. Enabling ontology-based access to streaming data sources. ISWC 2010
Gray, García-Castro, Kyzirakos, Karpathiotakis, Calbimonte, Page et al. A semantically enabled service architecture for mashups over streaming and stored data. ESWC 2011
Gray, Sadler, Kit, Kyzirakos, Karpathiotakis, Calbimonte, Page, García-Castro, et al. A semantic sensor web for environmental decision support applications. Sensors, MDPI, 2011
Calbimonte, Corcho & Gray. Ontology-based Access to Streaming Data. In Posters ESWC 2010
Conclusions
H4: Ontology-based streaming queries can be rewritten to abstract expressions, and these to concrete executable SPE queries.
Instantiate and execute on different SPEs: SNEE (DSMS), Esper (CEP), GSN & Cosm (middleware)
Available implementation, applied in different domains
H5: Query rewriting, with pull and push delivery, has an acceptable overhead.
SPARQLStream evaluation overhead with respect to native execution
Push & pull delivery evaluation
Calbimonte, Jeung, Corcho & Aberer. Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 2012.
Calbimonte & Corcho. Evaluating SPARQL Queries over RDF Streams. Linked Data Management: Principles and Techniques, CRC Press, 2013 (under review)
Zhang, Duc, Corcho & Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. ISWC 2012.
Ruckhaus, Calbimonte, García-Castro & Corcho. Short Paper: From Streaming Data to Linked Data–A Case Study with Bike Sharing Systems. ISWC SSN 2012
Conclusions
H6: Sensor data series can be analyzed to find characteristic patterns that make each series recognizable among other types.
Raw observations analysis: slope distribution representation, compared with state-of-the-art representations, i.e. SAX
H7: With slope representations, semantic properties such as the type of data can be learned with classification techniques at acceptable precision.
Evaluation of the classification task on real-world datasets (AEMET, SwissEx), in the presence of noisy data, deriving semantic metadata
Calbimonte, Yan, Jeung, Corcho & Aberer. Deriving Semantic Sensor Metadata from Raw Measurements. ISWC SSN 2012
Calbimonte, Jeung, Corcho, & Aberer. Semantic Sensor Data Search in a Large-Scale Federated Sensor Network. ISWC SSN 2011
Future directions
WEB
SPARQLStream queries
Publishing Linked Stream Data
Currently static
SPARQL streaming standards
Dereferencing streaming data
Query Federation
Distributed sensor data
Static and streaming sources
Stream Reasoning
query rewriting, expanding queries
Expressiveness
Integrate with the Web of Data
Inferencing
Future directions
WEB
Sensor pattern classification
Combine with query processing
Live data classification
Statistical & quality analysis
Integrate statistical analysis
Mappings to statistical models
Data quality filtering
Parallel Massive Stream Processing
Online stream analysis
Scalable stream processing
S4, Storm, Streamcloud
Heterogeneity
SSN Ontology with other ontologies
W3C SSN Ontology
tool for modeling our sensor data
combine with domain ontologies
Algebra construction
π timed,sp_wind; ω 5 Hour; σ sp_wind>10; wan7
windsensor1, windsensor2
Static optimization
π timed,sp_wind; ω 5 Hour; σ sp_wind>10; wan7
π timed,windvalue; ω 5 Hour; σ windvalue>10; windsensor1
π timed,windvalue; ω 5 Hour; σ windvalue>10; windsensor2
SPARQL Streaming extensions
SPARQL Stream features
SRBench
RDF Streams and SPARQLStream
RDF Stream
Time window
Window-Stream
Mappings
Subject, predicate, object
Evaluate query
Rewrite to algebra
Rewriting and Execution Process
Execution process
SRBench Datasets
real-world U.S. weather data (1)
first & largest sensor dataset in LOD
LinkedSensorData
LinkedSensorMetadata
~20k US weather stations, ~100k sensors
links to nearby locations in GeoNames
LinkedObservationData
hurricane & blizzard observations in the US
~1.73 billion RDF triples
~159 million observations
(1) http://mesowest.utah.edu
Name     Storm Type  Date                #Triples     #Observations  Data size
Bill     Hurricane   Aug. 17 – 22, 2009  231,021,108  21,272,790     ~15 GB
Ike      Hurricane   Sep. 01 – 13, 2008  374,094,660  34,430,964     ~34 GB
Gustav   Hurricane   Aug. 25 – 31, 2008  258,378,511  23,792,818     ~17 GB
Bertha   Hurricane   Jul. 06 – 17, 2008  278,235,734  25,762,568     ~13 GB
Wilma    Hurricane   Oct. 17 – 23, 2005  171,854,686  15,797,852     ~10 GB
Katrina  Hurricane   Aug. 23 – 30, 2005  203,386,049  18,832,041     ~12 GB
Charley  Hurricane   Aug. 09 – 15, 2004  101,956,760  9,333,676      ~7 GB
         Blizzard    Apr. 01 – 06, 2003  111,357,227  10,237,791     ~2 GB
SRBench Queries
17 queries
graph pattern matching: and, filter, union, optional
solution modifier: projection, distinct
query form: select, construct, ask
SPARQL 1.1: aggregate, subquery, select expr, property path
reasoning: subclass, subproperty, sameAs
streaming: time window, istream, dstream, rstream
data access: observations, sensor metadata, geonames, dbpedia
Query Features
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17
1. Graph pattern matching: A A,F,O A A,F A A,F,U A A A A A,F A,F,U A,F A,F,U A,F A,F A,F
2. Solution modifier: P,D P,D P P P P P,D P P P,D P,D P P P,D P P P
3. Query form: S S A S C S S S S S S S S S S S S
4. SPARQL 1.1: F,P A A,E,M,F A,S N A,E,M A,E,M A,S,M,F A,S,E,M,F,P A,E,M,F,P F,P A,E,M,P P P
5. Reasoning: C R C A C
6. Streaming: T T T T T T T,D T T T T T T T T
7. Dataset: O O O O O O O O,S O,S O,S O,S O,S,G O,S,G O,S,G O,S,D O,S,G,D S
Legend:
1. And, Filter, Union, Optional
2. Projection, Distinct
3. Select, Construct, Ask
4. Aggregate, Subquery, Negation, Expr in SELECT, assignMent, Functions&operators, PropertyPath
5. subClassOf, subpRopertyOf, owl:sameAs
6. Time-based window, Istream, Dstream, Rstream
7. LinkedObservationData, LinkedSensorMetadata, GeoNames, Dbpedia