Give me purpose!
Why?
SenseML 2014 Keynote
Immanuel Schweizer
Background
• Immanuel Schweizer
• TU Darmstadt, Germany
• Telecooperation Lab
• Ubiquitous Computing
• Smart Urban Networks
• Graph-based optimization for P2P networks
• PhD Thesis
• Energy-efficient network protocols for wireless sensor networks
• Flow Control
• Topology Control
• Application: Urban Management
Inductive Loops
• >150 traffic lights
• ~3,000 sensors
• Two parameters
• Utilization
• Count
Street Cars
• ~10 sensors
• Deployed on streetcars
• Solar cells, Zigbee (868 MHz), temperature, GPS, …
Phones / Noisemap
• Noise pollution via microphone
• More than 2,000 installations
• 30 active users per day
• ~ 750,000 data points
• Gamification
• Calibration
da_sense
More sensors…
…more data!
And more data…
OpenSense (ETH Zurich, http://www.opensense.ethz.ch/trac/)
DeviceAnalyzer (University of Cambridge, https://deviceanalyzer.cl.cam.ac.uk/)
What do we do with all that data?
• Help with planning tasks
• Understand human activity
• Environmental models
• Detect events
• Track users
• Nowcasting / Forecasting
• …
Machine Learning
What's special about sensor data?
Where does sensor data come from?
Sensor Infrastructure
• High cost per sensor
• Mostly wired
• High quality of information
• Some kind of certification
Sensor Infrastructure
(Wireless) Sensor Networks
Wireless Sensor Networks
• Cheaper hardware
• Mostly wireless
• Battery-powered
• Mixed quality of information
• High diversity
Sensor Infrastructure
(Wireless) Sensor Networks
Mobile Sensing / User-generated Data
Mobile Sensing
• Easy development and deployment
• Almost no hardware cost
• Lack of control over quality of information
• Privacy
• Humans-in-the-loop
Sensor Infrastructure
(Wireless) Sensor Networks
Mobile Sensing / User-generated Data
[Figure axes: Quantity, Quality]
What's special about sensor data?
Heterogeneity
• Unstructured vs. Structured data
• Different hardware
• Different sensors
• Mobile phones vs. dedicated hardware
• Heterogeneity of data sources
• Spatial and time resolution
Quality-of-Information
• Low cost sensors
• Mobility
• Human-in-the-loop
• Faults
• Placement
• …
Preprocessing
• Data Fusion
• Integrating External Sources
• Filtering
• Approximation
• Fault Detection
• Manual Cleaning
• …
Example 1: Location
Example 2: Filtering Noisemap
Example 3: Road Network
• Traffic measurements
• Noise measurements
• Idea: Predict traffic, based on noise measurements
Road network data processing
Road Segment
• Road Characteristics: road type, surface type, maximum speed, oneway, number of lanes, etc.
• Road Segment Geometry: a polygon area in the WGS 84 coordinate system
• Selection Area Geometry: an area around the road segment, excluding the space near neighboring segments and the areas of surrounding buildings
• Average sound pressure level for a time interval
• Traffic level
• Weather conditions
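The slide does not say how the average sound pressure level per time interval is computed. A minimal sketch, assuming energetic averaging (the usual way to average decibel values) and fixed-length intervals; the function names and the sample records are hypothetical:

```python
import math
from collections import defaultdict


def energetic_mean_db(levels_db):
    """Average decibel values on the energy scale instead of arithmetically."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


def average_per_interval(records, interval_s):
    """records: iterable of (timestamp_s, level_db) pairs.

    Returns {interval_index: energetic mean in dB} for fixed-length intervals.
    """
    buckets = defaultdict(list)
    for timestamp_s, level_db in records:
        buckets[int(timestamp_s // interval_s)].append(level_db)
    return {idx: energetic_mean_db(levels) for idx, levels in sorted(buckets.items())}


# Hypothetical measurements: (seconds since start, sound level in dB)
records = [(0, 50.0), (30, 70.0), (65, 60.0), (90, 60.0)]
per_minute = average_per_interval(records, interval_s=60)
```

Note that the louder value dominates: 50 dB and 70 dB average to roughly 67 dB, not 60 dB.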
Road network data processing
OpenStreetMap
• Goal: create road segments automatically
• Largest free road network dataset
• OSM data format
• Node, way, relation
• Attributes
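The node/way/attribute structure can be read with the Python standard library. A minimal sketch, assuming OSM XML input; the sample ids and coordinates and the helper name `parse_osm` are made up for illustration and are not the parser used in the talk:

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML sample (hypothetical ids and coordinates).
OSM_SAMPLE = """<osm version="0.6">
  <node id="1" lat="49.8728" lon="8.6512"/>
  <node id="2" lat="49.8730" lon="8.6520"/>
  <node id="3" lat="49.8735" lon="8.6530"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/>
    <tag k="highway" v="residential"/>
    <tag k="lanes" v="2"/>
    <tag k="oneway" v="no"/>
  </way>
  <way id="11">
    <nd ref="2"/><nd ref="3"/>
    <tag k="highway" v="unclassified"/>
  </way>
</osm>"""


def parse_osm(xml_text):
    """Return (nodes, ways): node id -> (lat, lon), and a list of road ways."""
    root = ET.fromstring(xml_text)
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    ways = []
    for way in root.iter("way"):
        # Attributes are stored as key/value tags on the way.
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if "highway" not in tags:
            continue  # keep only ways that describe roads
        refs = [nd.get("ref") for nd in way.iter("nd")]
        coords = [nodes[r] for r in refs if r in nodes]
        ways.append({"id": way.get("id"), "tags": tags, "refs": refs, "coords": coords})
    return nodes, ways


nodes, ways = parse_osm(OSM_SAMPLE)
```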
Road network data processing
OSM: Non-planar topology
• Straightforward planarization not possible
• Road segment separated into multiple polylines
Road network data processing
• Misclassified road links
• Remove "unclassified" roads
• Filter by length
• Represent multiple ways as a single way
• Merge ways
• Missing common node
• Merge nodes within 5 cm of each other
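The clean-up rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the talk's implementation: coordinates are taken to be in a local metric system (metres), the minimum length threshold is hypothetical, and the greedy merge is a simple stand-in for a proper clustering step:

```python
import math

MERGE_DISTANCE_M = 0.05  # 5 cm, as stated on the slide


def merge_close_nodes(nodes, max_dist=MERGE_DISTANCE_M):
    """nodes: id -> (x, y) in metres. Returns id -> id of the node it is merged into."""
    representative = {}
    kept = []
    for node_id in sorted(nodes):
        x, y = nodes[node_id]
        for rep in kept:
            rx, ry = nodes[rep]
            if math.hypot(x - rx, y - ry) <= max_dist:
                representative[node_id] = rep
                break
        else:
            kept.append(node_id)
            representative[node_id] = node_id
    return representative


def way_length(refs, nodes):
    """Polyline length in metres."""
    return sum(
        math.hypot(nodes[b][0] - nodes[a][0], nodes[b][1] - nodes[a][1])
        for a, b in zip(refs, refs[1:])
    )


def clean_ways(ways, nodes, min_length_m):
    """Drop "unclassified" and very short ways; snap nearby nodes together."""
    rep = merge_close_nodes(nodes)
    cleaned = []
    for way in ways:
        if way["highway"] == "unclassified":
            continue
        refs = []
        for ref in way["refs"]:
            merged = rep[ref]
            if not refs or refs[-1] != merged:
                refs.append(merged)
        if way_length(refs, nodes) < min_length_m:
            continue
        cleaned.append({**way, "refs": refs})
    return cleaned


# Hypothetical data: nodes 2 and 3 are 3 cm apart and should become one node.
nodes = {1: (0.0, 0.0), 2: (50.0, 0.0), 3: (50.03, 0.0),
         4: (120.0, 0.0), 5: (0.0, 2.0), 6: (1.0, 2.0)}
ways = [
    {"id": "a", "highway": "residential", "refs": [1, 2]},
    {"id": "b", "highway": "residential", "refs": [3, 4]},
    {"id": "c", "highway": "unclassified", "refs": [1, 4]},
    {"id": "d", "highway": "service", "refs": [5, 6]},
]
cleaned = clean_ways(ways, nodes, min_length_m=5.0)
```

After the merge, ways "a" and "b" share node 2, so they can later be joined into a single way.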
Road network data processing
• Clean up
• Combine parallel ways of the same street
Road network data processing
• 2D geometry
• Based on number of lanes
$\mathit{SelectionArea} = A \setminus (B_1 \cup B_2 \cup \dots \cup B_n)$
Road network data processing
Spatial filter
• Which sound pressure records to include?
• Straightforward approach: select measurements based on proximity
• 2 spatial buffers around each segment
Road network data processing
• Exclude buildings
• Location accuracy: falsely included/excluded measurements
• Inward/outward offsetting
• Inward: minimize the number of included measurements that were recorded outside
• Outward: minimize the number of filtered-out measurements that were recorded inside
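The spatial filter and the offsetting trade-off can be illustrated with a deliberately simplified geometry. A minimal sketch, assuming the buffer A and the buildings B1..Bn are axis-aligned rectangles in a local metric coordinate system; real road buffers and building footprints are general polygons, and all names and numbers below are hypothetical:

```python
def offset_rect(rect, d):
    """Grow (d > 0) or shrink (d < 0) a rectangle (xmin, ymin, xmax, ymax) by d metres."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - d, ymin - d, xmax + d, ymax + d)


def contains(rect, point):
    xmin, ymin, xmax, ymax = rect
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def in_selection_area(point, buffer_rect, buildings, offset=0.0):
    """True if point lies in A minus the union of all buildings.

    offset < 0 offsets the selection area inward (A shrinks, buildings grow);
    offset > 0 offsets it outward (A grows, buildings shrink).
    """
    if not contains(offset_rect(buffer_rect, offset), point):
        return False
    return not any(contains(offset_rect(b, -offset), point) for b in buildings)


# Hypothetical segment buffer with one building cutting into it.
A = (0.0, 0.0, 100.0, 20.0)
buildings = [(40.0, 12.0, 60.0, 20.0)]

on_street = in_selection_area((10.0, 5.0), A, buildings)
in_building = in_selection_area((50.0, 15.0), A, buildings)
near_wall = in_selection_area((50.0, 11.5), A, buildings)
near_wall_inward = in_selection_area((50.0, 11.5), A, buildings, offset=-1.0)
just_outside = in_selection_area((101.0, 5.0), A, buildings)
just_outside_outward = in_selection_area((101.0, 5.0), A, buildings, offset=2.0)
```

Inward offsetting drops the measurement taken half a metre from the building wall; outward offsetting keeps the one recorded one metre beyond the buffer edge.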
Example 3: Road Network
What's special about sensor data?
Real-world data
• Classes for classification
• Sound Level
• Traffic Level
Example: Traffic Level
Real-world data
• Classes for classification
• Sound Level
• Traffic Level
• Evaluation
• Transferability
Example: Noise Pollution
[Diagram: noise pollution pipeline, parts 1 and 2]
Initial Dataset
• Noisemap: instances of noise data
• Point of Interest: geocoordinates
External Data Sources
• OpenStreetMap: extracting OSM information about nearby streets
• LinkedGeoData: extracting information about nearby buildings
• WeatherData: extracting weather information in the surrounding area
• Outputs: Data File, Object Data (RDF)
Additional Data
• Adding additional information
• SPARQL
• Attributes
Classification
• ARFF Writer
• Decision Tree Learning
• Final Model
Visualization
• Sound Level Prediction
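The ARFF Writer step turns the enriched instances into the ARFF text format used by Weka-style learners. A minimal sketch of such a writer; the attribute names, class labels, and rows are hypothetical and only illustrate the file layout:

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to ARFF text.

    attributes: list of (name, kind), where kind is "numeric" or a list of
    nominal values. rows: list of value tuples in attribute order.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values in braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
        else:
            lines.append("@attribute %s %s" % (name, kind))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical features: street type and lanes from OSM, temperature from weather data.
attributes = [
    ("road_type", ["residential", "primary"]),
    ("lanes", "numeric"),
    ("temperature_c", "numeric"),
    ("sound_level", ["low", "medium", "high"]),  # class attribute
]
rows = [
    ("residential", 2, 12.0, "low"),
    ("primary", 4, 7.5, "high"),
]
arff_text = to_arff("noise", attributes, rows)
```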
Evaluation
• Cross Validation
• Accuracy, Precision, Recall ~80%
• Other Models
• Same Resolution
• Same Input Data
• Difference?
• Human-readable rules
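For reference, the evaluation measures named above can be computed as follows. A minimal sketch with a plain k-fold split and binary labels; the labels and predictions are made up and do not reproduce the ~80% reported on the slide:

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k folds over n instances."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels (1 = "high" sound level) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall = accuracy_precision_recall(y_true, y_pred)
splits = list(k_fold_indices(10, 3))
```

In a full cross validation, a model is trained on each `train` split and the measures are averaged over the `test` splits.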
Transferability
• Perfect Model for Darmstadt
• No noise data in Nancy, France
• Same Features?
• External data sources
• Different regulations
• …
What's special about sensor data?
Pipeline
[Diagram: the pipeline from "Example: Noise Pollution"]
Pipelines
[Diagram: the pipeline from "Example: Noise Pollution" next to a second pipeline organized in three layers]
• Layers: Layer 1, Layer 2, Layer 3
• Data: Measurements, Traffic Data, OSM XML
• Components: Measurement Filter, Traffic Parser, OSM Parser, Training Set Builder, Machine Learning Model
Pipelines
• Standardized Toolbox
• RapidMiner++
• Generalize Components (with interfaces)
• Learn and share
• What parts can be generalized? Why?
• Share your experience about building these pipelines
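One way to read "generalize components (with interfaces)" is a shared stage interface that every pipeline step implements. A minimal sketch; the component names are borrowed from the pipeline diagram, but their behavior, the record fields, and the threshold are assumptions for illustration:

```python
from typing import Dict, Iterable, List, Protocol

Record = Dict[str, object]


class Stage(Protocol):
    """Common interface: every component maps records to records."""

    def process(self, records: List[Record]) -> List[Record]: ...


class MeasurementFilter:
    """Drops measurements whose reported GPS error is too large (hypothetical rule)."""

    def __init__(self, max_gps_error_m: float) -> None:
        self.max_gps_error_m = max_gps_error_m

    def process(self, records: List[Record]) -> List[Record]:
        return [r for r in records if r["gps_error_m"] <= self.max_gps_error_m]


class TrainingSetBuilder:
    """Turns measurements into (features, label) rows for a learner."""

    def process(self, records: List[Record]) -> List[Record]:
        return [
            {"features": [r["sound_level_db"]], "label": r["traffic_level"]}
            for r in records
        ]


class Pipeline:
    def __init__(self, stages: Iterable[Stage]) -> None:
        self.stages = list(stages)

    def run(self, records: List[Record]) -> List[Record]:
        for stage in self.stages:
            records = stage.process(records)
        return records


# Hypothetical measurements.
measurements = [
    {"sound_level_db": 62.0, "gps_error_m": 5.0, "traffic_level": "high"},
    {"sound_level_db": 48.0, "gps_error_m": 50.0, "traffic_level": "low"},
]
pipeline = Pipeline([MeasurementFilter(max_gps_error_m=20.0), TrainingSetBuilder()])
training_set = pipeline.run(measurements)
```

Because stages only depend on the shared interface, a filter written for one city or sensor type can be swapped without touching the rest of the pipeline.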
What's special about sensor data?
• Preprocessing
• Heterogeneity
• QoI
• Real-World
• Classes
• Evaluation
• Transferability
• Pipeline
• Share, learn, and standardize?
• More automation