Phd

Ontology-based Access to Sensor Data Streams Jean-Paul Calbimonte Supervisor: Oscar Corcho Ontology Engineering Group Facultad de Informática, Universidad Politécnica de Madrid PhD Thesis Defense 18.4.2013


Transcript of Phd

Page 1: Phd

Ontology-based Access to Sensor Data Streams

Jean-Paul Calbimonte

Supervisor: Oscar Corcho

Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid


PhD Thesis Defense

18.4.2013

Page 2: Phd

2

Outline

Motivation

Background

Conclusions

Semantic stream query processing

Sensor metadata characterization

Ontology-based Access to Sensor Data Streams

Hypotheses & contributions

Challenges

Page 3: Phd

Motivation

3

from Sensor Networks

to the Sensor Web

and the Semantic Sensor Web

Page 4: Phd

Sensors

http://www.flickr.com/photos/wouterh/2409251427/

data capture

different Sensor providers

transmission

. . . . . .

data streams

Page 5: Phd

Sensor Networks and the Web

5

Sensor Networks

users

applications

data streams

Volume

Velocity

Variety

WEB

Universal Web-based access to Sensor data

Page 6: Phd

Querying the semantic sensor Web

6

e.g. publish sensor data as RDF/Linked Data?

URIs as names of things

HTTP URIs

useful information when URI is dereferenced

Link to other URIs

users

applications

WEB

(1) Use ontology models to continuously query real-time data streams originating from sensors?

static vs. streams

one-off vs. continuous

Page 7: Phd

Research questions & hypotheses

7

Ontology models to query real-time sensor data streams?

Access heterogeneous SPEs using ontologies as an overarching data model?

SPARQL streaming extensions for querying data from SPEs (stream processing engines)?

1

H1: Sensor streaming data can be represented as instances of an ontology model

H2: SPARQL can be extended with streaming operators for continuous processing

H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings

H4: Ontology-based streaming queries can be translated to abstract expressions, then to concrete executable SPE queries

H5: Query rewriting with pull & push delivery has acceptable overhead

Page 8: Phd

Sensor Data: Observations

Citizen Science

Multiple publishers

Heterogeneity

Metadata quality

8

Page 9: Phd

Sensor data: observations

9

9

Page 10: Phd

Characterizing semantic sensor metadata

10

users

applications

WEB

(2) Characterizing sensor data, deriving semantic metadata from the sensor observations

different publishers

different metadata

publish streams

Search/query relevant data sources?

GSN

Page 11: Phd

Research questions & hypotheses

11

Data representation suitable for extracting data features that characterize a set of sensor streams?

Classification and mining techniques to characterize sensor data streams?

2

H6: Sensor data series can be analyzed to find characteristic patterns that make a series recognizable among other types

H7: From slope representations, semantic properties such as the type of data can be learned with classification techniques at acceptable precision

Page 12: Phd

Contributions

12

1. Querying: SPARQLStream & R2RML

SPARQL extensions & formalization

rewriting to algebra expressions using declarative mappings

results data translation

query evaluation pluggable to different SPEs

query rewriting using R2RML mappings

2. Metadata: Sensor metadata characterization

data representation as slope distributions

characterize types of sensor data

classifying sensor time series

extract metadata features

derive semantic properties

Page 13: Phd

Limitations

13

L1: Query rewriting suits medium sampling throughput, e.g. environmental monitoring.

L2: Query expressivity is limited to that of the underlying SPEs.

L3: Adapters implemented for custom sources.

L4: Querying supports only simple entailment.

L5: Arbitrarily noisy sensor series allow no accurate characterization.

L6: Classification depends on the number of sensor time series in the training set.

L7: Data characterization is not computed in real-time, but offline.

Page 14: Phd

14

Outline

Motivation

Background

Conclusions

Semantic stream query processing

Sensor metadata characterization

Ontology-based Access to Sensor Data Streams

Hypotheses & contributions

Challenges

Data Streams

Continuous queries

Window

SPEs

Ontology-based data access

Page 15: Phd

Sensor data streams & events

15

(temp,hum,pres) τi

milford1: (35.6,87,4) τi-1, (36.2,89,4) τi, (37.2,88,4) τi+1, . . .

watford7: (37.6,88,7) τi, (36.3,89,2) τi+1, . . .

. . .

stream tuples

event processing

Page 16: Phd

Querying streams & events

16

w1 w2

windows

SELECT attribute FROM stream [NOW -10 MIN]

streaming tuples

Query processor

query results

database

Continuous query

processor

query

push results

pull request

SPE

continuous processing

one-off queries
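The contrast between a one-off query and a continuous windowed query such as `SELECT attribute FROM stream [NOW -10 MIN]` can be illustrated with a minimal Python sketch. It is not taken from any SPE; `StreamTuple`, `window` and `continuous_query` are hypothetical names, and time is assumed to be measured in seconds.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class StreamTuple:
    timestamp: float  # assumed unit: seconds
    attribute: float


def window(tuples: Iterable[StreamTuple], now: float, width: float) -> List[StreamTuple]:
    """Select the tuples whose timestamp lies in (now - width, now]."""
    return [t for t in tuples if now - width < t.timestamp <= now]


def continuous_query(
    tuples: List[StreamTuple], eval_times: Iterable[float], width: float
) -> Iterator[Tuple[float, List[float]]]:
    """Re-evaluate the same windowed projection at every evaluation time."""
    for now in eval_times:
        yield now, [t.attribute for t in window(tuples, now, width)]


if __name__ == "__main__":
    stream = [StreamTuple(0, 1.0), StreamTuple(300, 2.0), StreamTuple(900, 3.0)]
    # 600 s corresponds to the 10-minute window of the slide query.
    for now, result in continuous_query(stream, [600, 1200], 600):
        print(now, result)
```

A one-off query would call `window` once; the continuous variant produces a new result for each evaluation time as the window slides.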

Page 17: Phd

Stream Processing Engines (SPE)

17

Data Stream Management Systems (DSMS)

Complex Event Processors (CEP)

Sensor Data Middleware

CQL/Stream

Borealis

TelegraphCQ

StreamMill

Cayuga

GEM

CEDR

NiagaraCQ

Rapide

Cosm

Hourglass

SStreamWare

GSN

IBM InfoSphere

Sybase CEP

Microsoft StreamInsight

Oracle CEP

Esper

StreamBase

Diverse query languages

Different query capabilities

Different query models

Page 18: Phd

Extracting data from relational databases

18

WEB

Ontology-based data access

one-off SPARQL queries

data as RDF

relational database

RDB to RDF

mappings

static data

D2R

Morph

ODEMapster

Triplify

UltraWrap

Mastro

R2RML

W3C SSN Ontology

Page 19: Phd

Summary

19

Existing SPEs available and producing data streams

Ontology-based access only for stored data

SPARQL query language not suitable for streams

SPEs are highly heterogeneous in models and queries

Page 20: Phd

20

Outline

Motivation

Background

Conclusions

Semantic stream query processing

Sensor metadata characterization

Ontology-based Access to Sensor Data Streams

Hypotheses & contributions

SPARQLStream

Challenges

Query rewriting

RDF Stream

Mappings using R2RML

Execution over SPEs

Page 21: Phd

RDF Streams

21

〈 s,p,o〉

<aemet:observation1, qudt:hasNumericValue, “15.5”>

<aemet:observation1, ssn:observedBy, aemet:Sensor3>

For streams?

(〈 s,p,o〉 ,τ)

(<aemet:observation1, qudt:hasNumericValue, “15.5”>,34532)

timestamped triples

• Gutierrez et al. (2007) Introducing time into RDF. IEEE TKDE

• Rodríguez et al. (2009) Semantic management of streaming data. SSN
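A minimal Python sketch of the timestamped-triple model (〈s,p,o〉, τ) may help make it concrete. The class and function names are hypothetical, and the integer timestamp follows the slide example (34532); nothing here reflects a particular engine's data structures.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TimestampedTriple(NamedTuple):
    triple: Triple
    timestamp: int  # the time point τ attached to the triple


def is_time_ordered(stream: List[TimestampedTriple]) -> bool:
    """An RDF stream is assumed here to have non-decreasing timestamps."""
    return all(a.timestamp <= b.timestamp for a, b in zip(stream, stream[1:]))


def graph_at(stream: List[TimestampedTriple], start: int, end: int) -> Set[Triple]:
    """Drop the timestamps of the triples in [start, end], yielding a plain RDF graph."""
    return {tt.triple for tt in stream if start <= tt.timestamp <= end}


if __name__ == "__main__":
    stream = [
        TimestampedTriple(
            Triple("aemet:observation1", "qudt:hasNumericValue", '"15.5"'), 34532
        ),
    ]
    print(is_time_ordered(stream), graph_at(stream, 34530, 34535))
```

Once a window has selected a finite set of triples, ordinary SPARQL graph pattern matching can be applied to the resulting graph.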

Page 22: Phd

SPARQLStream extensions

22

SELECT (MAX(?temperature) AS ?maxtemp) ?sensor
WHERE {
  ?obs ssn:observedBy ?sensor.
  ?obs ssn:observationResult ?res.
  ?res aemet:hasAirTemperatureValue ?val.
  ?val qu:numericValue ?temperature.
}
GROUP BY ?sensor

SELECT (MAX(?temp) AS ?maxtemp) ?sensor
FROM NAMED STREAM <http://aemet.linkeddata.es/observations.srdf> [NOW-1 HOURS]
WHERE {
  ?obs ssn:observedBy ?sensor.
  ?obs ssn:observationResult ?res.
  ?res aemet:hasAirTemperatureValue ?val.
  ?val qu:numericValue ?temp.
}
GROUP BY ?sensor

SPARQLStream

Named streams

Time windows

Other approaches: Streaming SPARQL (2008), C-SPARQL (2009), CQELS (2011), EP-SPARQL (2011), INSTANS (2012)

Page 23: Phd

Streaming SPARQL execution approaches

23

Extend RDF for streaming data

Extend SPARQL for streaming RDF

Use a SPE internally for evaluation

Query rewriting to SPEs

RDF Streaming engine from scratch

Logic-programming based query evaluation

~Similarities

Divergence

streams

DSMSs

CEPs

Middleware

SPARQLStream

Page 24: Phd

Mapping SPE schemas and ontologies

24

wan7

timed: datetime PK

sp_wind: float

timed  sp_wind
1      3.4
2      5.6
3      11.2
4      1.2
5      3.1
..     …

Queries

SELECT sp_wind FROM wan7 [NOW -5 HOUR] WHERE sp_wind >10

SPE

SPE data schemas

ssn:Observation

Ontology models

SPARQLStream Queries

Stream-to-ontology mappings

SELECT ?wspeed
FROM STREAM <SensorReadings.srdf> [NOW-5 HOUR]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?wspeed.
  FILTER (?wspeed > 10)
}

Page 25: Phd

http://swissex.ch/data#Wan7/WindSpeed/ObsValue{timed}

sp_wind

http://swissex.ch/data#Wan7/WindSpeed/Observation{timed}

  

http://swissex.ch/data#Wan7/WindSpeed/ObsOutput{timed}

  

sweetSpeed:WindSpeed

Creating Mappings

25

wan7

timed: datetime PK

sp_wind: float

ssn:ObservationValue

qudt:numericValue

xsd:decimal

ssn:SensorOutput

ssn:Observation

ssn:hasValue

ssn:observationResult

ssn:Property

ssn:observedProperty

:Wan4WindSpeed a rr:TriplesMapClass;
    rr:tableName "wan7";
    rr:subjectMap [
        rr:template "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}";
        rr:class ssn:ObservationValue;
        rr:graph ssg:swissexsnow.srdf
    ];
    rr:predicateObjectMap [
        rr:predicateMap [ rr:predicate qudt:numericValue ];
        rr:objectMap [ rr:column "sp_wind"; rr:datatype xsd:decimal ]
    ].

W3C R2RML Mapping Language
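To illustrate what such a mapping expresses, the following Python sketch expands the subject template for one `wan7` tuple and emits the two triples the mapping describes. It is a hand-written approximation for this single mapping, not an R2RML processor; `expand_template` and `row_to_triples` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

# Subject template copied from the mapping on this slide.
TEMPLATE = "http://swissex.ch/data#Wan7/WindSpeed/ObsValue/{timed}"


def expand_template(template: str, row: Dict[str, object]) -> str:
    """Replace each {column} placeholder with the value of that column in the row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def row_to_triples(row: Dict[str, object]) -> List[Tuple[str, str, str]]:
    """Produce the class triple and the numeric-value triple for one stream tuple."""
    subject = expand_template(TEMPLATE, row)
    return [
        (subject, "rdf:type", "ssn:ObservationValue"),
        (subject, "qudt:numericValue", f'"{row["sp_wind"]}"^^xsd:decimal'),
    ]


if __name__ == "__main__":
    for triple in row_to_triples({"timed": 3, "sp_wind": 11.2}):
        print(triple)
```

In a query rewriting setting the mapping is mostly used in the other direction, to decide which stream and columns a triple pattern refers to; generating triples from tuples corresponds to the data translation step.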

Page 26: Phd

Query rewriting

SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR TO NOW]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?windspeed.
  FILTER (?windspeed > 10)
}

SELECT sp_wind FROM wan7 [FROM NOW-5 HOURS TO NOW] WHERE sp_wind >10

π timed,sp_wind (σ sp_wind>10 (ω 5 Hour (wan7)))

SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10

http://montblanc.slf.ch:22001/multidata?vs[0]=wan7&field[0]=wind_speed_scalar_av&c_min[0]=10&from=15/05/2012+05:00:00&to=15/05/2012+10:00:00

http://api.cosm.com/v2/feeds/14321/datastreams/4?start=2012-05-15T05:00:00Z&end=2012-05-15T10:00:00Z

Query rewriting

R2RML

SNEE (DSMS)

Esper (CEP)

GSN (middleware)

Cosm (middleware)

26

H4: Ontology-based streaming queries can be translated to abstract expressions, then to concrete executable SPE queries

H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings

SPARQLStream
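The step from one abstract algebra expression to several concrete SPE queries can be sketched as follows. This is an illustrative Python sketch only: `WindowedSelect`, `to_snee` and `to_esper` are hypothetical names, and the generated strings simply reproduce the two query shapes shown on this slide rather than the full syntax of either engine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindowedSelect:
    """Abstract expression: projection over selection over a time window on a stream."""
    stream: str
    projection: Tuple[str, ...]
    condition: str
    window_hours: int


def to_snee(q: WindowedSelect) -> str:
    """Serialize in the SNEE-style form shown on the slide."""
    cols = ", ".join(q.projection)
    return (
        f"SELECT {cols} FROM {q.stream} "
        f"[FROM NOW-{q.window_hours} HOURS TO NOW] WHERE {q.condition}"
    )


def to_esper(q: WindowedSelect) -> str:
    """Serialize in the Esper-style form shown on the slide."""
    cols = ", ".join(q.projection)
    return (
        f"SELECT {cols} FROM {q.stream}.win:time({q.window_hours} hour) "
        f"WHERE {q.condition}"
    )


if __name__ == "__main__":
    query = WindowedSelect("wan7", ("sp_wind",), "sp_wind >10", 5)
    print(to_snee(query))
    print(to_esper(query))
```

Keeping the algebra expression independent of any target language is what lets new SPEs be supported by adding one more serializer.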

Page 27: Phd

27

Ontology-based query rewriting

Query rewriting

Query Processing

Client

SPARQLStream

[tuples]

[triples/bindings]

Algebra expression

R2RML Mappings

SPARQLStream query processing

SELECT ?windspeed
FROM STREAM <http://ssg4env.eu/SensorReadings.srdf> [NOW-5 HOUR]
WHERE {
  ?obs a ssn:ObservationValue;
       qudt:numericValue ?windspeed.
  FILTER (?windspeed > 10)
}

SELECT sp_wind FROM wan7.win:time(5 hour) WHERE sp_wind >10

π timed,sp_wind (σ sp_wind>10 (ω 5 Hour (wan7)))

Data translation

SNEE

Esper

GSN

Cosm

pull/push

https://github.com/jpcik/morph-streams

Other

H1: Sensor streaming data can be represented as instances of an ontology model

H2: SPARQL can be extended with streaming operators for continuous processing

Page 28: Phd

Evaluation of query rewriting overhead

28

H5: Query rewriting with pull & push delivery has acceptable overhead

Native execution w/o rewriting

Execution with rewriting

Pull & Push delivery

End-to-end latency

Adapted Esper benchmark

Page 29: Phd

29

Outline

Motivation

Background

Conclusions

Semantic stream query processing

Sensor metadata characterization

Ontology-based Access to Sensor Data Streams

Hypotheses & contributions

Representation

Challenges

Classification

Metadata

Page 30: Phd

Characterizing semantic sensor metadata

30

WEB

GSN

Air Pressure?

Air Temperature?

Already classified time series

Unclassified input series

compare

Page 31: Phd

Deriving Semantic Metadata

31

Representation

Classification

Metadata

Page 32: Phd

[Figure: a time series and its linear segments, plotted over time]

Piecewise Linear Approximation

32

Reflect data trends

Apply with different resolutions

Applicable for different rates

Online computation cheap

Linear segments

Time series

time

Reduce numerosity

Page 33: Phd

Linear Approximations

33

[Figure: segment angles assigned to sectors a, b, c, d; angle marks at -π/4, 0, π/4, π/2]

Key: segment slopes (angles)

Divide the angle space in sectors

distribution of angles in training set

1. compute linear approximation

2. compute slope distribution

3. K-nearest neighbor classification
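The slope-distribution and nearest-neighbor steps can be sketched in Python as follows. Several details are assumptions made for illustration: equal-width sectors over [-π/2, π/2], Euclidean distance between distributions, and majority voting among the k nearest training series. The function names and the example labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def slope_distribution(angles: Sequence[float], sectors: int = 4) -> List[float]:
    """Normalized histogram of segment angles over equal sectors of [-pi/2, pi/2]."""
    counts = [0] * sectors
    width = math.pi / sectors
    for angle in angles:
        index = min(int((angle + math.pi / 2) / width), sectors - 1)
        counts[index] += 1
    total = len(angles) or 1  # avoid division by zero for an empty series
    return [c / total for c in counts]


def knn_classify(
    query: Sequence[float],
    training: Sequence[Tuple[Sequence[float], str]],
    k: int = 1,
) -> str:
    """Label a distribution by majority vote among its k nearest training distributions."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Angles would come from segment slopes, e.g. math.atan(slope).
    query = slope_distribution([0.1, 0.2, 1.0, 0.3])
    training = [
        ([1.0, 0.0, 0.0, 0.0], "air_temperature"),
        ([0.0, 0.0, 0.8, 0.2], "wind_speed"),
    ]
    print(query, knn_classify(query, training))
```

Because the distribution is normalized, series of different lengths and sampling rates remain comparable.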

Page 34: Phd

Experiments SwissEx

Confusion matrix SwissEx

Training-Test datasets

SwissExperiment AEMET

34

Page 35: Phd

Experiments AEMET

Confusion matrix AEMET

H6: Sensor data series can be analyzed to find characteristic patterns that make a series recognizable among other types

35

Classification according to type

False positives on subclasses of the same property

Page 36: Phd

Evaluation vs SAX

36

H7: From slope representations, semantic properties such as the type of data can be learned with classification techniques at acceptable precision

Page 37: Phd

Semantic Sensor Metadata

swissex:Sensor1

rdf:type ssn:Sensor;

ssn:onPlatform swissex:Station1;

ssn:observes cf-property:wind_speed.

swissex:Sensor2

rdf:type ssn:Sensor;

ssn:onPlatform swissex:Station1;

ssn:observes cf-property:air_temperature.

37

station1

sensor1

sensor2

W3C SSN Ontology

Derive semantic metadata properties

cf-property:wind_speed
    rdf:type dim:VelocityOrSpeed;
    rdfs:label "wind speed";
    ssn:isPropertyOf cf-feature:wind;
    qu:propertyType qu:scalar;
    qu:generalQuantityKind qu:speed.

Raw sensor data

Semantic metadata
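Once a series has been classified, emitting the corresponding sensor description is mostly a matter of filling in a pattern like the one on this slide. The Python sketch below is a hypothetical helper (`sensor_metadata_turtle`) that formats such a description as a string; it performs no validation and assumes the prefixes are declared elsewhere.

```python
def sensor_metadata_turtle(sensor: str, platform: str, observed_property: str) -> str:
    """Format an SSN sensor description in the Turtle shape used on this slide."""
    return (
        f"{sensor}\n"
        f"    rdf:type ssn:Sensor;\n"
        f"    ssn:onPlatform {platform};\n"
        f"    ssn:observes {observed_property}.\n"
    )


if __name__ == "__main__":
    # The observed property would be the label predicted by the classifier.
    print(sensor_metadata_turtle(
        "swissex:Sensor1", "swissex:Station1", "cf-property:wind_speed"
    ))
```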

Page 38: Phd

38

Outline

Motivation

Background

Conclusions

Semantic stream query processing

Sensor metadata characterization

Ontology-based Access to Sensor Data Streams

Hypotheses & contributions

Challenges

Page 39: Phd

Conclusions

H1: Sensor streaming data can be represented as instances of an ontology model

H2: SPARQL can be extended with streaming operators for continuous processing

H3: Ontology-based streaming queries can be rewritten to relational-based queries using mappings

Mapping sensor data to ontology instances, e.g. SSN Ontology

SPARQLStream data model, extensions syntax, semantics

SPARQLStream semantics of query rewriting to relational streaming algebra

usage of declarative mappings (W3C R2RML)

Calbimonte, Corcho & Gray. Enabling ontology-based access to streaming data sources. ISWC 2010

Gray, García-Castro, Kyzirakos, Karpathiotakis, Calbimonte, Page et al. A semantically enabled service architecture for mashups over streaming and stored data. ESWC 2011

Gray, Sadler, Kit, Kyzirakos, Karpathiotakis, Calbimonte, Page, García-Castro, et al. A semantic sensor web for environmental decision support applications. Sensors, MDPI, 2011

Calbimonte, Corcho & Gray. Ontology-based Access to Streaming Data. In Posters ESWC 2010

39

Page 40: Phd

Conclusions

40

H4: Ontology-based streaming queries can be translated to abstract expressions, then to concrete executable SPE queries

Instantiate and execute on different SPEs: SNEE (DSMS), Esper (CEP), GSN & Cosm (middleware)

Available implementation, applied in different domains

H5: Query rewriting with pull & push delivery: evaluation of overhead

SPARQLStream evaluation overhead wrt. native execution

Push & pull delivery evaluation

Calbimonte, Jeung, Corcho & Aberer. Enabling Query Technologies for the Semantic Sensor Web. IJSWIS 2012.

Calbimonte & Corcho. Evaluating SPARQL Queries over RDF Streams. Linked Data Management: Principles and Techniques, CRC Press, 2013 (under review)

Zhang, Duc, Corcho & Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. ISWC 2012.

Ruckhaus, Calbimonte, García-Castro & Corcho. Short Paper: From Streaming Data to Linked Data–A Case Study with Bike Sharing Systems. ISWC SSN 2012

Page 41: Phd

Conclusions

41

H6: Sensor data series can be analyzed to find characteristic patterns that make a series recognizable among other types

H7: From slope representations, semantic properties such as the type of data can be learned with classification techniques at acceptable precision

Analysis of raw observations: slope distribution representation, compared with state-of-the-art representations such as SAX

Evaluation of the classification task on real-world datasets (AEMET, SwissEx) in the presence of noisy data; deriving semantic metadata

Calbimonte, Yan, Jeung, Corcho & Aberer. Deriving Semantic Sensor Metadata from Raw Measurements. ISWC SSN 2012

Calbimonte, Jeung, Corcho, & Aberer. Semantic Sensor Data Search in a Large-Scale Federated Sensor Network. ISWC SSN 2011

Page 42: Phd

Future directions

42

WEB

SPARQLStream queries

Publishing Linked Stream Data

Currently static

SPARQL streaming standards

Dereferencing streaming data

Query Federation

Distributed sensor data

Static and streaming sources

Stream Reasoning

query rewriting, expanding queries

Expressiveness

Integrate with the Web of Data

Inferencing

Page 43: Phd

Future directions

WEB

Sensor pattern classification

Combine with query processing

Live data classification

Statistical & quality analysis

Integrate statistical analysis

Mappings to statistical models

Data quality filtering

Parallel Massive Stream Processing

Online stream analysis

Scalable stream processing

S4, Storm, Streamcloud

Heterogeneity

43

Page 44: Phd

Ontology-based Access to Sensor Data Streams

Jean-Paul Calbimonte

Supervisor: Oscar Corcho

Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid

18.4.2013

PhD Thesis Defense

Page 45: Phd

45

Page 46: Phd

SSN Ontology with other ontologies

46

W3C SSN Ontology

tool for modeling our sensor data

combine with domain ontologies

Page 47: Phd

Algebra construction

47

timed,sp_wind

π

ω

σ sp_wind>10

5 Hour

wan7

windsensor1

windsensor2

Page 48: Phd

Static optimization

48

timed,sp_wind

π

ω

σ sp_wind>10

5 Hour

wan7

timed,windvalue

π

ω

σ windvalue>10

5 Hour

windsensor1

timed,windvalue

π

ω

σ windvalue>10

5 Hour

windsensor2

Page 49: Phd

SPARQL Streaming extensions

49

Page 50: Phd

SPARQL Stream features

50

Page 51: Phd

SRBench

51

Page 52: Phd

RDF Streams and SPARQLStream

52

RDF Stream

Time window

Window-Stream

Page 53: Phd

Mappings

53

Subject, predicate, object

Evaluate query

Page 54: Phd

Rewrite to algebra

54

Page 55: Phd

Rewriting and Execution Process

55

Page 56: Phd

Execution process

56

Page 57: Phd

SRBench Datasets

real-world U.S. weather data [1]

first & largest sensor dataset in LOD

57

LinkedSensorData

LinkedSensorMetadata

LinkedObservationData

~20k US weather stations, ~100k sensors

links to locations in GeoNames nearby

hurricane & blizzard observations in US

~1.73 billion RDF triples

~159 million observations

1 http://mesowest.utah.edu

Name     Storm Type  Date                #Triples     #Observations  Data size
Bill     Hurricane   Aug. 17 – 22, 2009  231,021,108  21,272,790     ~15 GB
Ike      Hurricane   Sep. 01 – 13, 2008  374,094,660  34,430,964     ~34 GB
Gustav   Hurricane   Aug. 25 – 31, 2008  258,378,511  23,792,818     ~17 GB
Bertha   Hurricane   Jul. 06 – 17, 2008  278,235,734  25,762,568     ~13 GB
Wilma    Hurricane   Oct. 17 – 23, 2005  171,854,686  15,797,852     ~10 GB
Katrina  Hurricane   Aug. 23 – 30, 2005  203,386,049  18,832,041     ~12 GB
Charley  Hurricane   Aug. 09 – 15, 2004  101,956,760  9,333,676      ~7 GB
-        Blizzard    Apr. 01 – 06, 2003  111,357,227  10,237,791     ~2 GB

Page 58: Phd

SRBench Queries

58

17 queries

graph pattern matching: and, filter, union, optional

solution modifier: projection, distinct

query form: select, construct, ask

SPARQL 1.1: aggregate, subquery, select expr, property path

reasoning: subclass, subproperty, sameAs

streaming: time window, istream, dstream, rstream

data access: observations, sensor metadata, geonames, dbpedia

Page 59: Phd

Query Features

59

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17

1. Graph pattern matching: A A,F,O A A,F A A,F,U A A A A A,F A,F,U A,F A,F,U A,F A,F A,F

2. Solution modifier: P,D P,D P P P P P,D P P P,D P,D P P P,D P P P

3. Query form: S S A S C S S S S S S S S S S S S

4. SPARQL 1.1: F,P A A,E,M,F A,S N A,E,M A,E,M A,S,M,F A,S,E,M,F,P A,E,M,F,P F,P A,E,M,P P P

5. Reasoning: C R C A C

6. Streaming: T T T T T T T,D T T T T T T T T

7. Dataset: O O O O O O O O,S O,S O,S O,S O,S,G O,S,G O,S,G O,S,D O,S,G,D S

1. And, Filter, Union, Optional

2. Projection, Distinct

3. Select, Construct, Ask

4. Aggregate, Subquery, Negation, Expr in SELECT, assignMent, Functions&operators, PropertyPath

5. subClassOf, subpRopertyOf, owl:sameAs

6. Time-based window, Istream, Dstream, Rstream

7. LinkedObservationData, LinkedSensorMetadata, GeoNames, Dbpedia