Database Middleware for Sensor Networks
Sam Madden, Assistant Professor
Slides prepared with Wei Hong
2
Motivation
• Sensor networks (aka sensor webs, emnets) are here
  – Several widely deployed HW/SW platforms
    • Low power radio, small processor, RAM/Flash
  – Variety of (novel) applications: scientific, industrial, commercial
  – Great platform for mobile + ubicomp experimentation
• Real, hard research problems to be solved
  – Networking, systems, languages, databases
  – Central problem: ease of access, appropriate programming abstractions
I will summarize:
  – Low-level sensornet issues
  – A particular middleware architecture:
    • TinyDB + TASK
  – Current and future research middleware ideas
Berkeley Mote
Some Sensornet Apps
• redwood forest microclimate monitoring
• smart cooling in data centers (http://www.hpl.hp.com/research/dca/smart_cooling/)
• condition-based maintenance
• structural integrity
And More…
• Homeland security
• Container monitoring
• Mobile environmental apps
• Bird tracking
• Zebranet
• Home automation
• Etc!
4
Architectural Overview
Stable Store(DBMS)
Field Tools
Local Servers
Internet
Client Tools (GUIs, etc.)
External Tools
Sensor Network
TinyDB
Middleware
Middleware Issues:
• APIs for current + historical access?
• Which data when?
• How to act on data?
• Network and node status?
Directed Diffusion
COUGAR
5
Declarative Queries
• Programming Apps is Hard
  – Limited power budget
  – Lossy, low bandwidth communication
  – Require long-lived, zero admin deployments
  – Distributed Algorithms
  – Limited tools, debugging interfaces
• Queries abstract away much of the complexity
  – Burden on the database developers
  – Users get:
    • Safe, optimizable programs
    • Freedom to think about apps instead of details
6
TinyDB: Declarative Query Interface to Sensornets
• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL: TinySQL
• Power and data-acquisition based in-network optimization framework
• Extensible interface for aggregates, new types of sensors
7
Agenda
• Part 1: Sensor Networks (40 mins)
  – TinyOS
  – NesC
• Part 2: TinyDB + TASK (50 mins)
  – Data Model and Query Language
  – Software Architecture
• 30 minute break
• Part 3: Alternative Middleware Architectures + Research Directions (1:30 mins)
• Finish around 12
8
Part 1
• Sensornet Background
• Motes + Mote Hardware
  – TinyOS
  – Programming Model + NesC
• TinyOS Architecture
  – Major Software Subsystems
  – Networking Services
9
Sensor Networks: a hot topic
• New university courses
• New conferences
  – ACM SenSys, IEEE IPSN, etc.
• New industrial research lab projects
  – Intel, PARC, MSR, HP, Accenture, etc.
• Startup companies
  – Crossbow, Dust, Ember, Sensicast, Moteiv, etc.
• Media Buzz
  – Over 30 news articles since July 2002 covering Intel-Berkeley/UC Berkeley sensor network activities
  – One of 10 emerging technologies that will change the world (MIT Technology Review)
11
Why Now?
• Commoditization of radio hardware
  – Cellular and cordless phones, wireless communication
• Low cost -> many/tiny -> new applications!
• Real application for ad-hoc network research from the late 90’s
• Coming together of EE + CS communities
12
Motes
Mica Mote / Mica2Dot
• uProc: 4 MHz, 8 bit Atmel RISC
• Radio: 40 kbit 900/450/300 MHz or 250 kbit 2.4 GHz (MicaZ 802.15.4)
• Memory: 4 K RAM / 128 K Program Flash / 512 K Data Flash
• Power: 2 x AA or coin cell
Telos Mote
• uProc: 8 MHz, 16 bit TI RISC
• Radio: 250 kbit 2.4 GHz (802.15.4)
• Memory: 2 K RAM / 60 K Program Flash / 512 K Data Flash
• Power: 2 x AA
iMote
• uProc: 12 MHz, 16 bit ARM
• Radio: Bluetooth
• Memory: 64K SRAM / 512 K Data Flash
• Power: 2 x AA
13
History of Motes
• Initial research goal wasn’t hardware
  – Has since become more of a priority with emerging hardware needs, e.g.:
    • Power consumption
    • (Ultrasonic) ranging + localization: MIT Cricket, NEST Project
    • Connectivity with diverse sensors: UCLA sensor board
  – Even so, now on the 5th generation of devices
    • Costs down to ~$50/node (Moteiv, Dust)
    • Greatly improved radio quality
    • Multitude of interfaces: USB, Ethernet, CF, etc.
    • Variety of form factors, packages
14
Motes vs. Traditional Computing
• Embedded OS
• Lossy, Adhoc Radio Communication
• Sensing Hardware
• Severe Power Constraints
NesC/TinyOS
• NesC: a C dialect for embedded programming
  – Components, “wired together”
  – Quick commands and asynch events
• TinyOS: a set of NesC components
  – hardware components
  – ad-hoc network formation & maintenance
  – time synchronization
Think of the pair as a programming environment
16
Radio Communication
• Low Bandwidth Shared Radio Channel
  – ~40kBits on motes
  – Much less in practice
    • Encoding, Contention for Media Access (MAC)
• Very lossy: 30% base loss rate
  – Argues against TCP-like end-to-end retransmission
    • And for link-layer retries
• Generally, not well behaved
From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
17
Types of Sensors
• Sensors attach via daughtercard
• Weather
  – Temperature
  – Light x 2 (high intensity PAR, low intensity, full spectrum)
  – Air Pressure
  – Humidity
• Vibration
  – 2 or 3 axis accelerometers
• Tracking
  – Microphone (for ranging and acoustic signatures)
  – Magnetometer
• GPS
• RFID Reader
18
Non-Volatile Storage
• EEPROM
  – 512K off chip, 32K on chip
  – Writes at disk speeds, reads at RAM speeds
  – Interface: random access, read/write 256 byte pages
  – Maximum throughput ~10Kbytes / second
• MatchBox Filing System
  – Provides a Unix-like file I/O interface
  – Single, flat directory
  – Only one file being read/written at a time
19
Power Consumption and Lifetime
• Power typically supplied by a small battery
  – 1000-2000 mAH
  – 1 mAH = 1 milliamp current for 1 hour
• Typically at optimum voltage, current drain rates
  – Power = Watts (W) = Amps (A) * Volts (V)
  – Energy = Joules (J) = W * time
• Lifetime, power consumption varies by application
  – Processor: 5 mA active, 1 mA idle, 5 uA sleeping
  – Radio: 5 mA listen, 10 mA xmit/receive, ~20 mS / packet
  – Sensors: 1 uA -> 100’s mA, 1 uS -> 1 S / sample
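The lifetime arithmetic implied by these figures can be sketched roughly as follows. This is an illustration only: the 2000 mAh capacity is taken from the top of the range on the slide, the duty-cycle model (processor active plus radio listening while awake, processor sleeping otherwise) is an assumption, and the function names are hypothetical.

```python
def average_current_ma(awake_ma, sleep_ma, duty_cycle):
    """Time-weighted average current for a node that is awake `duty_cycle` of the time."""
    return awake_ma * duty_cycle + sleep_ma * (1.0 - duty_cycle)


def lifetime_days(capacity_mah, avg_current_ma):
    """Battery capacity divided by average drain, converted from hours to days."""
    return capacity_mah / avg_current_ma / 24.0


# Assumed model: awake = processor active (5 mA) + radio listening (5 mA),
# asleep = processor sleeping (5 uA = 0.005 mA). Sensor draw is ignored here.
AWAKE_MA = 5.0 + 5.0
SLEEP_MA = 0.005
CAPACITY_MAH = 2000.0

always_on_days = lifetime_days(CAPACITY_MAH, average_current_ma(AWAKE_MA, SLEEP_MA, 1.0))
ten_percent_days = lifetime_days(CAPACITY_MAH, average_current_ma(AWAKE_MA, SLEEP_MA, 0.1))

print(f"always on: {always_on_days:.1f} days")
print(f"10% duty cycle: {ten_percent_days:.1f} days")
```

Under these assumptions the sleep current is negligible, so lifetime grows almost in inverse proportion to the duty cycle.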
20
• Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
• Each mote can “hear” 10 other motes
• Process:
  – Wake up, collect samples (~1 second)
  – Listen to radio for messages to forward (~1 second)
  – Forward data
Energy Usage in A Typical Data Collection Scenario
[Figure: Power Consumption Breakdown: percentage of total power by hardware element (Radio, Sensors, Processor).]
[Figure: Processor Energy Breakdown: percentage of total energy by processing phase (Idle, Waiting for Radio, Waiting for Sensors, Sending).]
21
Sensors: Slow, Power Hungry, Noisy
[Figure: Time of Day vs. Light (Lux), 20:09 to 1:26: Chamber Sensor vs. Sensor 69, raw readings.]
[Figure: Time of Day vs. Light (Lux), 20:09 to 1:26: Chamber Sensor vs. Sensor 69 (Median of Last 10).]
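The "Median of Last 10" smoothing named in the second plot can be sketched as a sliding-window median. This is a minimal illustration, not the code used to produce the plot; the function name and the sample readings are made up.

```python
from collections import deque
from statistics import median


def sliding_median(readings, window=10):
    """Return, for each reading, the median of the most recent `window` readings."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    smoothed = []
    for value in readings:
        recent.append(value)
        smoothed.append(median(recent))
    return smoothed


# Hypothetical noisy light readings with two spikes (0 and 180).
noisy = [100, 101, 0, 99, 100, 180, 100, 101, 99, 100]
smoothed = sliding_median(noisy)
print(smoothed)
```

A median is used rather than a mean because isolated spikes do not shift it, which suits sensors that occasionally return wild values.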
22
TinyOS: Getting Started
• The TinyOS home page:
  – http://webs.cs.berkeley.edu/tinyos
  – Start with the tutorials!
• The CVS repository
  – http://sf.net/projects/tinyos
• The NesC Project Page
  – http://sf.net/projects/nescc
• Crossbow motes (hardware):
  – http://www.xbow.com
• Intel Imote
  – www.intel.com/research/exploratory/motes.htm
23
Part 2
The Design and Implementation of TinyDB
24
Part 2 Outline
• TinyDB Overview
• Data Model and Query Language
• TinyDB Java API and Scripting
• Demo with TinyDB GUI
• TinyDB Internals
• Extending TinyDB
• TinyDB Status and Roadmap
25
TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
• High level abstraction:
  – Data centric programming
  – Interact with sensor network as a whole
  – Extensible framework
• Under the hood:
  – Intelligent query processing: query optimization, power efficient execution
  – Fault Mitigation: automatically introduce redundancy, avoid problem areas
App
Sensor Network
TinyDB
Query, Trigger
Data
26
Feature Overview
• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes, commands and aggregates
• In-network, persistent storage
27
Architecture
[Figure: PC side: TinyDB GUI and TinyDB Client API, with a DBMS attached via JDBC. Mote side: TinyDB query processor running on the numbered nodes (0 to 8) of the sensor network.]
28
Data Model
• Entire sensor network as one single, infinitely-long logical table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
  – Sensor readings
  – Meta-data: node id, location, etc.
  – Internal states: routing tree parent, timestamp, queue length, etc.
• Nodes return NULL for unknown attributes
• On server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?
29
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]
30
Comparison with SQL
• Single table in FROM clause
• Only conjunctive comparison predicates in WHERE and HAVING
• No subqueries
• No column alias in SELECT clause
• Arithmetic expressions limited to column op constant
• Only fundamental difference: SAMPLE PERIOD clause
31
TinySQL Examples
“Find the sensors in bright nests.”
SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s
Sensors
Epoch | Nodeid | nestNo | Light
0 | 1 | 17 | 455
0 | 2 | 25 | 389
1 | 1 | 17 | 422
1 | 2 | 25 | 405
32
TinySQL Examples (cont.)
“Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s
Regions w/ AVG(sound) > 200
Epoch | region | CNT(…) | AVG(…)
0 | North | 3 | 360
0 | South | 3 | 520
1 | North | 3 | 370
1 | South | 3 | 520
SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s
33
Event-based Queries
• ON event SELECT …
• Run query only when interesting events happen
• Event examples
  – Button pushed
  – Message arrival
  – Bird enters nest
• Analogous to triggers but events are user-defined
34
Query over Stored Data
• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
  – CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
  – SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
  – SELECT field1, field2, … FROM name SAMPLE PERIOD d
35
Using the Java API
• SensorQueryer
  – translateQuery() converts TinySQL string into TinyDBQuery object
  – Static query optimization
• TinyDBNetwork
  – sendQuery() injects query into network
  – abortQuery() stops a running query
  – addResultListener() adds a ResultListener that is invoked for every QueryResult received
  – removeResultListener()
• QueryResult
  – A complete result tuple, or
  – A partial aggregate result, call mergeQueryResult() to combine partial results
• Key difference from JDBC: push vs. pull
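The push-versus-pull distinction can be illustrated with a toy listener registry. The sketch below is written in Python and is not the TinyDB Java API: `ToyNetwork`, its method names, and the result dictionary are all hypothetical stand-ins that only mirror the idea of registering a listener that is invoked for every result received.

```python
class ToyNetwork:
    """Hypothetical push-style result source (a stand-in, not TinyDBNetwork)."""

    def __init__(self):
        self._listeners = []

    def add_result_listener(self, listener):
        self._listeners.append(listener)

    def remove_result_listener(self, listener):
        self._listeners.remove(listener)

    def deliver(self, result):
        # Push model: the network calls the application when a result arrives,
        # instead of the application polling a cursor as it would with JDBC.
        for listener in list(self._listeners):
            listener(result)


received = []
on_result = received.append  # the listener is just a callable here

net = ToyNetwork()
net.add_result_listener(on_result)
net.deliver({"epoch": 0, "nodeid": 1, "light": 455})
net.remove_result_listener(on_result)
net.deliver({"epoch": 1, "nodeid": 1, "light": 422})  # nobody is listening now

print(received)
```

Push fits continuous queries because results arrive once per epoch for as long as the query runs, so there is no natural end of a result set to iterate toward.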
36
Writing Scripts with TinyDB
• TinyDB’s text interface
  – java net.tinyos.tinydb.TinyDBMain -run “select …”
  – Query results printed out to the console
  – All motes get reset each time new query is posed
• Handy for writing scripts with shell, perl, etc.
37
Using the GUI Tools
• Demo time
38
Inside TinyDB
TinyOS
Schema
Query Processor
Multihop Network
Query Processor: Filter (light > 400), Agg (avg(temp)), get(‘temp’)
Queries: SELECT AVG(temp) WHERE light > 400
Results: T:1, AVG: 225; T:2, AVG: 250
Tables; Samples: got(‘temp’)
Schema entry for temp:
• Name: temp
• Time to sample: 50 uS
• Cost to sample: 90 uJ
• Calibration Table: 3
• Units: Deg. F
• Error: ± 5 Deg F
• Get f: getTempFunc()
• …
TinyDB
• ~10,000 Lines Embedded C Code
• ~5,000 Lines (PC-Side) Java
• ~3200 Bytes RAM (w/ 768 byte heap)
• ~58 kB compiled code (3x larger than 2nd largest TinyOS Program)
39
Tree-based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
    • In-network aggregation
  – Relationship to indexing?
[Figure: routing tree over nodes A to F. The query Q: SELECT … is flooded from the root to every node, and results R:{…} flow back up the tree toward the root.]
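One common way to obtain such a tree is to let each node adopt, as its parent, the first neighbor from which it hears the query. The sketch below simulates that with a breadth-first flood. It is a simplified illustration rather than TinyDB's routing code, and the radio connectivity among nodes A to F is assumed, since the links in the figure are not recoverable.

```python
from collections import deque


def build_routing_tree(neighbors, root):
    """Flood from `root`; each node's parent is the first node it hears from."""
    parent = {root: None}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        for nb in neighbors[node]:
            if nb not in parent:  # first time this node hears the query
                parent[nb] = node
                frontier.append(nb)
    return parent


def path_to_root(parent, node):
    """Route a result hop by hop toward the root by following parent links."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


# Assumed connectivity (who can hear whom); A is the root.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E", "F"],
    "E": ["D"],
    "F": ["D"],
}

parent = build_routing_tree(neighbors, "A")
route_from_f = path_to_root(parent, "F")
print(parent)
print(route_from_f)
```

The same parent pointers serve all three uses listed on the slide: queries travel down them, and both raw results and partial aggregates travel up.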
40
Power Consumption and Lifetime
• Power typically supplied by a small battery
  – At full power, device will last 2-3 days -> Critical Constraint
• Lifetime, power consumption varies by application
  – Scales with “duty cycle”: amount of time on
  – Low data rate (< 1 sample / 30 secs): > 6 months possible from AA batteries
[Figure: current vs. time for Sensor A and Sensor B, cycling through Sleeping, Radio On/Processing, and Transmitting. The two sensors must synchronize!]
Fundamental challenge: distributed coordination with low power!
41
Time Synchronization
• All messages include a 5 byte time stamp indicating system time in ms
  – Synchronize (e.g. set system time to timestamp) with
    • Any message from parent
    • Any new query message (even if not from parent)
  – Punt on multiple queries
  – Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch dur = 0)
  – And lasts for WAKING_PERIOD ms
• Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
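The wake/sleep rule above amounts to a little modular arithmetic. The sketch below is an illustration under assumed values (a 1024 ms epoch and a 128 ms waking period); the function names are hypothetical and the clock values are invented.

```python
def is_awake(system_time_ms, epoch_dur_ms, waking_period_ms):
    """A node is awake from each epoch boundary for `waking_period_ms`."""
    return system_time_ms % epoch_dur_ms < waking_period_ms


def next_sleep_ms(system_time_ms, epoch_dur_ms):
    """Sleep exactly until the next epoch boundary (system time % epoch dur == 0)."""
    return epoch_dur_ms - (system_time_ms % epoch_dur_ms)


EPOCH_MS = 1024    # assumed epoch duration
WAKING_MS = 128    # assumed WAKING_PERIOD

# A node finishes its waking period believing the time is 2176 ms.
local_time = 2048 + WAKING_MS
sleep_before_sync = next_sleep_ms(local_time, EPOCH_MS)

# A message from its parent carries timestamp 2200 ms, so the node adopts it.
# The waking period is untouched; only the coming sleep gets shorter.
synced_time = 2200
sleep_after_sync = next_sleep_ms(synced_time, EPOCH_MS)

print(sleep_before_sync, sleep_after_sync)
```

Absorbing the correction in the sleep interval keeps every node's waking window the same length, so neighbors never miss each other because of a shortened listen period.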
42
Extending TinyDB
• Why extend TinyDB?
  – New sensor attributes
  – New control/actuation commands
  – New data processing logic: aggregates
  – New events
• Analogous to concepts in object-relational databases
43
Adding Attributes
• Types of attributes
  – Sensor attributes: raw or cooked sensor readings
  – Introspective attributes: parent, voltage, ram usage, etc.
  – Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.
44
Adding Attributes (cont)
• Interfaces provided by Attr component
  – StdControl: init, start, stop
  – AttrRegister
    • command registerAttr(name, type, len)
    • event getAttr(name, resultBuf, errorPtr)
    • event setAttr(name, val)
    • command getAttrDone(name, resultBuf, error)
  – AttrUse
    • command startAttr(attr)
    • event startAttrDone(attr)
    • command getAttrValue(name, resultBuf, errorPtr)
    • event getAttrDone(name, resultBuf, error)
    • command setAttrValue(name, val)
45
Adding Attributes (cont)
• Steps to adding attributes to TinyDB
  1) Create attribute nesC components
  2) Wire new attribute components to TinyDBAttr configuration
  3) Reprogram TinyDB motes
  4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the fly through TinyDB GUI
46
Adding Aggregates
• Step 1: wire new nesC components
47
Adding Aggregates (cont)
• Step 2: add entry to catalog.xml
<aggregate>
  <name>AVG</name>
  <id>5</id>
  <temporal>false</temporal>
  <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
</aggregate>
• Step 3 (optional): implement reader class in Java
  – a reader class interprets and finalizes aggregate state received from the mote network, returns final result as a string for display.
48
TinyDB Status
• Latest release with TinyOS 1.1 (9/03)
  – Install the task-tinydb package in TinyOS 1.1 distribution
  – First release in TinyOS 1.0 (9/02)
  – Widely used by research groups as well as industry pilot projects
• Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
  – Largest deployment: ~80 weather station nodes
  – Network longevity: 4-5 months
49
The Redwood Tree Deployment
• Redwood Grove in UC Botanical Garden, Berkeley
• Collect dense sensor readings to monitor climatic variations across
  – altitudes,
  – angles,
  – time,
  – forest locations, etc.
• Versus sporadic monitoring points with 30lb loggers!
• Current focus: study how dense sensor data affect predictions of conventional tree-growth models
50
Data from Redwoods
[Figure: Humidity vs. Time: Rel Humidity (%) for nodes 101, 104, 109, 110, 111.]
[Figure: Temperature vs. Time: Temperature (C) by date, from 7/7/03 9:40 to 7/9/03 11:03.]
Node heights (scale topped at 36m):
• 33m: 111
• 32m: 110
• 30m: 109, 108, 107
• 20m: 106, 105, 104
• 10m: 103, 102, 101
51
TASK
52
A SensorNet Dilemma
• Sensors still packaged like HeathKits
  – Pretty hard to cope with out of the box
• Bare metal encourages one-off applications
  – Inhibits reuse
• Deployment not intuitive
  – No configuration/monitoring tools
• SensorNet PhD Factor
  – Today ~2.5 PhDs needed to deploy a SensorNet
  – Needs to be Zero
53
TASK Design Requirements
• Ease of S/W Installation
• Deployment tools
• Reconfigurability
• Health/Mgmt Monitoring
• Network Reliability Guarantee
• Interpretable Sensor Results
• Tool Integration
• Audit Trails
• Lifetime estimates
~ For Developers ~
• Familiar API
• Extensibility of S/W
• Modular services
54
Tiny Application Sensor Kit
[Figure: TASK architecture. The TinyDB Sensor Network connects to the TASK Server on the SensorNet Appliance, backed by a Stable Store (DBMS); TASK Field Tools, TASK Client Tools, TaskView, and External Tools connect over the Internet.]
• Simplicity vs. Functionality
• Modularity
• Remote control
• Fault Tolerant
55
SensorNet Appliance
• Intelligent Gateway
  – Proxy for the sensornet
  – Distributes query
  – Stages results
  – Manages configuration
• Components
  – TASK Server
  – TinyDB Client (Java)
  – DBMS (PostgreSQL)
  – WebServer (Apache)
[Figure: SensorNet Appliance (SNA): the TinyDB Client connects the SensorNet to the TASK Server and DBMS; outside access is via http, ODBC, and other interfaces.]
56
Tools
• Field Tool
  – In-situ diagnostics
• TaskView
  – Integrated tool for management and monitoring
57
For more information
• http://triplerock.cs.berkeley.edu/tinydb
58
Part 3
Middleware Architecture and Research Topics
59
Architectural Overview
Stable Store(DBMS)
Field Tools
Local Servers
Internet
Client Tools (GUIs, etc.)
External Tools
Sensor Network
TinyDB
Middleware
60
What’s Left?
• TinyDB and TinyOS provide a reasonable low-level substrate
• TASK sufficient for many data collection apps
• But… there are other architecture issues
  – Efficiency concerns
    • Currently transmit readings from all sensors on each epoch
    • Variable, context sensitive rates…
  – Data quality issues
    • Missing and faulty sensors?
  – Architectural issues
    • Actuation / closed-loop issues
    • Disconnection, etc.
61
Sensor Network Research
• Very active research area
  – Can’t summarize it all
• Focus: database-relevant research topics
  – Some outside of Berkeley
  – Other topics that are itching to be scratched
  – But, some bias towards work that we find compelling
62
Topics
• Improving TinyDB Efficiency
  – In-network aggregation
  – Acquisitional Query Processing
• Alternative Architectures
  – Statistical Techniques
  – Heterogeneity
  – Intermittent Connectivity
• New features
  – In-network storage
  – Closing the loop
  – Integration with traditional databases
64
Tiny Aggregation (TAG)
• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in parallel programming
  – Communication reducing
    • Operator dependent benefit
  – Across nodes during same epoch
• Exploit query semantics to improve efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
65
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • local readings
    • readings from children
  – Outputs PSR during assigned comm. interval
    • Interval assigned based on depth in tree
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
[Figure: five-node routing tree (nodes 1 to 5) annotated with communication intervals 1 to 4.]
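The epoch schedule described above can be simulated in a few lines. This is a minimal sketch, not TinyDB's implementation: it assumes the interval number equals depth in the tree plus one, so the deepest nodes transmit first, and it assumes a five-node tree (node 1 is the root, 2 and 3 are its children, 4 is a child of 3, 5 is a child of 4). All names are hypothetical.

```python
def depths(parent):
    """Depth of every node in the routing tree (the root has depth 0)."""
    def depth(node):
        return 0 if parent[node] is None else 1 + depth(parent[node])
    return {node: depth(node) for node in parent}


def run_epoch(parent, local_psr, merge):
    """Return (interval, node, transmitted PSR) triples for one epoch."""
    d = depths(parent)
    psr = dict(local_psr)
    schedule = []
    # Deeper nodes are assigned later-numbered intervals, which occur first.
    for node in sorted(parent, key=lambda n: -d[n]):
        schedule.append((d[node] + 1, node, psr[node]))
        if parent[node] is not None:
            # The parent folds the child's PSR into its own before its turn.
            psr[parent[node]] = merge(psr[parent[node]], psr[node])
    return schedule


# Assumed tree: child -> parent.
parent = {1: None, 2: 1, 3: 1, 4: 3, 5: 4}
# SELECT COUNT(*): every node's local PSR is 1 and merging is addition.
count_schedule = run_epoch(parent, {n: 1 for n in parent}, lambda a, b: a + b)
for interval, node, value in count_schedule:
    print(f"interval {interval}: node {node} sends {value}")
```

Each node transmits exactly one PSR per epoch regardless of how many descendants it has, which is where the communication savings come from.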
66
Illustration: In-Network Aggregation
SELECT COUNT(*) FROM sensors
[Animation over six slides: a five-node routing tree (sensors 1 to 5) reports partial counts up the tree, one communication interval at a time within each sample period. Nodes sleep (zzz) outside their intervals. The final frame is tabulated below; an empty cell is a node that is awake but not transmitting.]
Interval # | Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Sensor 5
4 | zzz | zzz | zzz | | 1
3 | zzz | zzz | | 2 | zzz
2 | | 1 | 3 | zzz | zzz
1 | 5 | zzz | zzz | zzz | zzz
4 | zzz | zzz | zzz | | 1
72
Aggregation Framework
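• As in extensible databases, TinyDB supports any aggregation function conforming to:
$\mathrm{Agg}_n = \{f_{init}, f_{merge}, f_{evaluate}\}$
$f_{init}\{a_0\} \rightarrow \langle a_0 \rangle$
$f_{merge}\{\langle a_1 \rangle, \langle a_2 \rangle\} \rightarrow \langle a_{12} \rangle$
$f_{evaluate}\{\langle a_1 \rangle\} \rightarrow$ aggregate value
Example: Average
$AVG_{init}\{v\} \rightarrow \langle v, 1 \rangle$
$AVG_{merge}\{\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle\} \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$AVG_{evaluate}\{\langle S, C \rangle\} \rightarrow S/C$
Each bracketed value $\langle \cdot \rangle$ is a Partial State Record (PSR)
Restriction: Merge associative, commutative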
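The three functions for AVG translate directly into code. The sketch below is illustrative Python rather than the nesC or Java used by TinyDB, and the readings are invented; it also checks that merging in a different grouping gives the same answer, which is what the associativity and commutativity restriction guarantees.

```python
from functools import reduce


def avg_init(v):
    """Turn one reading into a partial state record <sum, count>."""
    return (v, 1)


def avg_merge(a, b):
    """Combine two partial state records."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """Produce the final aggregate value from the root's PSR."""
    total, count = psr
    return total / count


readings = [10, 20, 30, 40]  # hypothetical sensor values
psrs = [avg_init(v) for v in readings]

# Merge left to right, as a chain-shaped tree would.
chain = reduce(avg_merge, psrs)
# Merge pairwise, as a bushier tree would.
bushy = avg_merge(avg_merge(psrs[3], psrs[0]), avg_merge(psrs[2], psrs[1]))

print(avg_evaluate(chain), avg_evaluate(bushy))
```

Carrying <sum, count> instead of a running average is what makes the merge order irrelevant: averaging averages directly would weight subtrees incorrectly.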
73
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
Drives an API!
74
Use Multiple Parents
• Use graph structure
  – Increase delivery probability with no communication overhead
• For duplicate insensitive aggregates, or
• Aggs expressible as sum of parts
  – Send (part of) aggregate to all parents
    • In just one message, via multicast
  – Assuming independence, decreases variance
Example: SELECT COUNT(*)
[Figure: node A holds count c and reaches root R through parent B or C. With one parent, A sends c along a single path. With n = 2, A sends c/n to each of B and C.]
Single parent:
$P(\text{link xmit successful}) = p$
$P(\text{success from } A \rightarrow R) = p^2$
$E(cnt) = c \cdot p^2$
$Var(cnt) = c^2 \cdot p^2 \cdot (1 - p^2) \equiv V$
With # of parents = n:
$E(cnt) = n \cdot (c/n \cdot p^2)$
$Var(cnt) = n \cdot (c/n)^2 \cdot p^2 \cdot (1 - p^2) = V/n$
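These expectations and variances can be checked by enumerating every delivery outcome. The sketch below assumes, as the slide does, that paths fail independently and that each two-hop path succeeds with probability p squared; the values c = 10 and p = 0.9 are invented for illustration.

```python
from itertools import product


def exact_moments(parts, q):
    """Exact mean and variance of the delivered total when each part
    arrives independently with probability q."""
    mean = 0.0
    second_moment = 0.0
    for outcome in product([0, 1], repeat=len(parts)):
        prob = 1.0
        total = 0.0
        for arrived, part in zip(outcome, parts):
            prob *= q if arrived else (1.0 - q)
            total += part * arrived
        mean += prob * total
        second_moment += prob * total * total
    return mean, second_moment - mean * mean


c = 10.0      # assumed count held by node A
p = 0.9       # assumed per-link success probability
q = p * p     # success over the two hops from A to R

single_mean, single_var = exact_moments([c], q)
split_mean, split_var = exact_moments([c / 2, c / 2], q)

print(single_mean, single_var)
print(split_mean, split_var)
```

The expected count is unchanged by splitting, while the variance halves for n = 2, matching V/n.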
75
Multiple Parents Results
• Better than previous analysis expected!
• Losses aren’t independent!
• Insight: spreads data over many links
[Figure: Benefit of Result Splitting (COUNT query): Avg. COUNT with Splitting vs. No Splitting (2500 nodes, lossy radio model, 6 parents per node).]
[Figure: topologies with No Splitting, where one link is a critical link, and With Splitting.]
76
Acquisitional Query Processing (ACQP)
• TinyDB acquires AND processes data
  – Could generate an infinite number of samples
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected!
• Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.
77
ACQP: What’s Different?
• How should the query be processed?
  – Sampling as a first class operation
• How does the user control acquisition?
  – Rates or lifetimes
  – Event-based triggers
• Which nodes have relevant data?
  – Index-like data structures
• Which samples should be transmitted?
  – Prioritization, summary, and rate control
78
Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s
• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• At 1 sample / sec, total power savings could be as much as 3.5mW. Comparable to processor!
[Figure: three query plans over mag and light with selections (pred1) and (pred2). The Traditional DBMS plan samples mag and light together before filtering. The two ACQP plans interleave sampling and selection in different orders, one labeled Cheap and one labeled Costly.]
Correct ordering (unless pred1 is very selective and pred2 is not): the Cheap plan, which samples light and applies pred2 before sampling mag
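The ordering argument can be made concrete with an expected-cost calculation. The sketch below uses the two sampling energies from the slide (1500 uJ for mag, 90 uJ for light); the cost model (pay for the first sample always, pay for the second only when the first predicate passes) and all selectivity values are assumptions for illustration.

```python
def expected_energy_uj(first_cost, first_selectivity, second_cost):
    """Expected energy per epoch when the second attribute is sampled only
    if the predicate on the first attribute passes."""
    return first_cost + first_selectivity * second_cost


MAG_UJ = 1500.0
LIGHT_UJ = 90.0

# Case 1: both predicates pass half the time (assumed selectivities).
sample_both = MAG_UJ + LIGHT_UJ
light_first = expected_energy_uj(LIGHT_UJ, 0.5, MAG_UJ)
mag_first = expected_energy_uj(MAG_UJ, 0.5, LIGHT_UJ)

# Case 2: the exception on the slide. pred1(mag) almost never passes and
# pred2(light) always passes, so testing light first filters nothing.
light_first_exception = expected_energy_uj(LIGHT_UJ, 1.0, MAG_UJ)
mag_first_exception = expected_energy_uj(MAG_UJ, 0.01, LIGHT_UJ)

print(sample_both, light_first, mag_first)
print(light_first_exception, mag_first_exception)
```

A traditional optimizer that treats both attributes as already available has no reason to prefer either order; once sampling cost is part of the plan, the cheap sensor usually goes first.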
79
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s
• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
[Figure: two query plans. Traditional DBMS: sample mag and light, apply (mag>x), then WINMAX. ACQP: sample light, apply (light > MAX), and only then sample mag, apply (mag>x), and update WINMAX.]
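The pushdown can be sketched as follows: because MAX is an exemplary aggregate, a light reading that does not exceed the current maximum cannot change the result, so the expensive mag sample can be skipped for that epoch. This simplified illustration handles a single window and omits the 8 s window and slide; the function name and readings are hypothetical.

```python
def winmax_with_pushdown(epochs, threshold):
    """epochs: iterable of (light, sample_mag), where sample_mag is a zero-argument
    callable standing in for the costly magnetometer read.
    Returns (max light among epochs with mag > threshold, number of mag samples taken)."""
    current_max = None
    mag_samples = 0
    for light, sample_mag in epochs:
        if current_max is not None and light <= current_max:
            continue  # cannot raise MAX, so do not pay for a mag sample
        mag_samples += 1
        if sample_mag() > threshold:
            current_max = light
    return current_max, mag_samples


# Hypothetical (light, mag) readings for five epochs.
data = [(300, 50), (200, 90), (350, 10), (400, 70), (100, 99)]
epochs = [(light, (lambda m=mag: m)) for light, mag in data]

result, samples_taken = winmax_with_pushdown(epochs, threshold=40)
print(result, samples_taken)
```

In this run only three of the five epochs need a mag sample, and the answer is the same as filtering on mag first.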
81
Statistical Techniques
• Approximations, summaries, and sampling based on statistics and statistical models
• Applications:
  – Limited bandwidth and large number of nodes -> data reduction
  – Lossiness -> predictive modeling
  – Uncertainty -> tracking correlations and changes over time
  – Physical models -> improved query answering
82
TinyDB Retrospective
[Figure: an SQL-style query enters TinyDB, which distributes the query and, every time step, collects the query answer or data.]
• Declarative interface:
  – Sensor nets are not just for PhDs
  – Decrease deployment time
• Data aggregation:
  – Can reduce communication
83
Limitations of TinyDB approach
[Figure: an SQL-style query enters TinyDB, which distributes the query and, every time step, collects data; the process is redone every time the query changes (new query).]
• Query distribution:
  – Every node must receive query
• Data collection:
  – Every node must wake up at every time step
  – Data loss ignored
  – No quality guarantees
  – Wastes resources by ignoring correlations
84
Sensor net data is correlated
Spatial-temporal correlation
• Data is not i.i.d. -> shouldn’t ignore missing data
• Observing one sensor -> information about other sensors (and future values)
• Observing one type of reading -> information about other local readings
85
BBQ: Model-driven data acquisition
[Figure: a middleware layer takes an SQL-style query with desired confidence, consults a probabilistic model (example model: multidimensional Gaussian), issues a data gathering plan, and conditions on new observations to obtain a posterior belief; a transition model carries the belief forward in time to the next new query.]
Strengths of model-based data acquisition
• Observe fewer attributes
• Exploit correlations
• Reuse information between queries
• Directly deal with missing data
• Answer more complex (probabilistic) queries
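The "condition on new observations" step has a closed form for a Gaussian model. The sketch below shows the two-variable case using the standard conditional-Gaussian formulas; it is not taken from the BBQ implementation, and the means, variances, and covariance are invented numbers.

```python
def condition_on(mu1, mu2, s11, s12, s22, x2):
    """Posterior mean and variance of X1 after observing X2 = x2, for a
    bivariate Gaussian with means (mu1, mu2), variances (s11, s22), and
    covariance s12."""
    gain = s12 / s22
    posterior_mean = mu1 + gain * (x2 - mu2)
    posterior_var = s11 - gain * s12
    return posterior_mean, posterior_var


# Assumed prior: two correlated temperature sensors.
prior_mean_1, prior_var_1 = 20.0, 4.0
post_mean_1, post_var_1 = condition_on(
    mu1=prior_mean_1, mu2=18.0, s11=prior_var_1, s12=3.0, s22=4.0, x2=20.0
)

print(post_mean_1, post_var_1)
```

Reading sensor 2 alone moves the estimate for sensor 1 and shrinks its variance, which is why a correlated model lets the system observe fewer attributes.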
86
Probabilistic models and queries
User’s perspective:
Query
SELECT nodeId, temp ± 0.5°C, conf(.95)
FROM sensors
WHERE nodeId in {1..8}
System selects and observes subset of nodes
Observed nodes: {3,6,8}
Query result
Node | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Temp. | 17.3 | 18.1 | 17.4 | 16.1 | 19.2 | 21.3 | 17.5 | 16.3
Conf. | 98% | 95% | 100% | 99% | 95% | 100% | 98% | 100%
87
Supported queries
• Value query
  – $X_i \pm \varepsilon$ with prob. at least $1 - \delta$
• SELECT and Range query
  – $X_i \in [a,b]$ with prob. at least $1 - \delta$
  – which sensors have temperature greater than 25°C?
• Aggregation
  – average $\pm\, \varepsilon$ of subset of attribs. with prob. > $1 - \delta$
  – combine aggregation and selection
  – probability > 10 sensors have temperature greater than 25°C?
Queries require solution to integrals
• Many queries computed in closed-form
• Some require numerical integration/sampling
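For a single Gaussian attribute, the value and range queries reduce to evaluations of the normal CDF, which is one of the closed-form cases. The sketch below assumes a one-dimensional Gaussian belief with known mean and standard deviation; the function names, the decision rule in `needs_observation`, and the numbers are illustrative rather than drawn from the BBQ system.

```python
import math


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def confidence_within(eps, sigma):
    """P(|X - mean| <= eps) for a Gaussian with standard deviation sigma."""
    return math.erf(eps / (sigma * math.sqrt(2.0)))


def prob_in_range(a, b, mu, sigma):
    """P(a <= X <= b), the quantity a range query needs."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


def needs_observation(eps, sigma, delta):
    """True if the model alone cannot answer 'mean ± eps' with prob. >= 1 - delta."""
    return confidence_within(eps, sigma) < 1.0 - delta


# Value query "temp ± 0.5 with confidence 95%" against two assumed beliefs.
uncertain = needs_observation(eps=0.5, sigma=1.0, delta=0.05)
confident = needs_observation(eps=0.5, sigma=0.2, delta=0.05)
print(uncertain, confident)
```

When the check fails, the system has to acquire more readings to tighten the belief before it can answer at the requested confidence.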
88
Experimental results
• Redwood trees and Intel Lab datasets
• Learned models from data
  – Static model
  – Dynamic model: Kalman filter, time-indexed transition probabilities
• Evaluated on a wide range of queries
[Figure: Intel Lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, offices) with the positions of sensor nodes 1 to 54.]
89
Cost versus Confidence level
90
Obtaining approximate values
Query: True temperature value ± epsilon with confidence 95%
91
Next Step: Outliers and Unusual Events
• Once we have a model of the expected behavior, we can:
  – Detect unusual (low probability) events
  – Predict missing values
• Often, there are several “expected” behavior modes, which we want to differentiate between
  – E.g., if we can characterize failure modes, we can discard them
• Applying well known probabilistic techniques to allow TinyDB to deal with such issues.
[Figure: readings under two behavior modes, AC ON and AC OFF.]
92
IDSQ
• Similar idea: suppose you want to e.g., localize a vehicle in a field of sensors
• Idea: task sensors in order of best improvement to estimate of some value:
  – Choose leader(s)
    • Suppress subordinates
    • Task subordinates, one at a time
  – Until some measure of goodness (error bound) is met
See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.
93
Graphical Representation
Model location estimate as a point with 2-dimensional Gaussian uncertainty.
[Figure: uncertainty ellipse with its principal axis and two candidate sensors, S1 and S2.]
• S1, Residual 1: preferred because it reduces error along principal axis
• S2, Residual 2
• Area of residuals is equal
94
Lots of Other Work of This Flavor
• Precision / Energy Tradeoff: want nodes to sleep except when their data is needed
  – Olston et al. Approximate Caching. SIGMOD ‘03.
  – Cheng et al. Kalman Filters. SIGMOD ‘04.
  – Lazaridis and Mehrotra. Approximate Selection Queries over Imprecise Data. ICDE 2004.
  – UCI Quasar Project
• Timeliness + Real Time Constraints
  – John A. Stankovic et al. Real Time Communication and Coordination in Sensor Networks. Proceedings of the IEEE, 91(7), July 2003.
  – Tian He et al. SPEED: a stateless protocol (ICDCS’03)
95
In-Net Regression
• Linear regression: simple way to predict future values, identify outliers
• Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
  – E.g., node A readings vs. node B’s
  – Or, location (X,Y), versus temperature (e.g., over many nodes)
[Figure: X vs Y w/ Curve Fit: y = 0.9703x - 0.0067, $R^2$ = 0.947.]
Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.
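As a point of reference for the curve fit above, an ordinary least-squares line and a simple residual test for outliers look like this. This is a centralized, single-node sketch and not the distributed kernel method of the cited paper; the readings and the tolerance are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_outlier(x, y, slope, intercept, tolerance):
    """Flag a reading whose residual from the fitted line exceeds `tolerance`."""
    return abs(y - (slope * x + intercept)) > tolerance


# Hypothetical paired readings: node A (x) vs. node B (y).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
slope, intercept = fit_line(xs, ys)

print(slope, intercept)
print(is_outlier(5.0, 10.2, slope, intercept, tolerance=1.0))
print(is_outlier(5.0, 20.0, slope, intercept, tolerance=1.0))
```

The same fitted line serves both purposes named on the slide: evaluating it at a new x predicts a missing or future value, and a large residual marks a suspect reading.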
96
In-Net Regression (Continued)
• Problem: may require data from all sensors to build model
• Solution: partition sensors into overlapping “kernels” that influence each other
  – Run regression in each kernel
    • Requiring just local communication
  – Blend data between kernels
  – Requires some clever matrix manipulation
• End result: regressed model at every node
  – Useful in failure detection, missing value estimation
98
Heterogeneous Sensor Networks
• Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
• Still must be transparent and ad-hoc
• Key to scalability of sensor networks
• Interesting heterogeneities
  – Energy: battery vs. outlet power
  – Link bandwidth: Chipcon vs. 802.11x
  – Computing and storage: ATMega128 vs. Xscale
  – Pre-computed results
  – Sensing nodes vs. QP nodes
99
Computing Heterogeneity with TinyDB
• Separate query processing from sensing
  – Provide query processing on a small number of nodes
  – Attract packets to query processors based on “service value”
• Compare the total energy consumption of the network
  – No aggregation
  – All aggregation
  – Opportunistic aggregation
  – HSN proactive aggregation
Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
100
5x7 TinyDB/HSN Mica2 Testbed
101
Data Packet Saving
• How many aggregators are desired?
  – 11% aggregators achieve 72% of max data reduction
• Does placement matter?
  – Optimal placement 2/3 distance from sink.
[Figure: Data Packet Saving: % change in data packet count (0% to -50%) vs. number of aggregators (1 to 6, All (35)).]
[Figure: Data Packet Saving - Aggregator Placement: % change in data packet count (0% to -50%) vs. aggregator location (25, 27, 29, 31, All (35)).]
103
Occasionally Connected Sensornets
[Figure: separate groups of TinyDB query processors (TinyDB QP) reach the TinyDB Server over the internet through fixed gateways (GTWY) and mobile gateways (Mobile GTWY).]
104
Occasionally Connected Sensornets Challenges
• Networking support– Tradeoff between reliability, power
consumption and delay– Data custody transfer: duplicates?– Load shedding– Routing of mobile gateways
• Query processing– Operation placement: in-network vs. on mobile
gateways– Proactive pre-computation and data movement
• Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.
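The “duplicates?” question under data custody transfer can be made concrete with a small sketch. It is an illustration under stated assumptions, not the protocol from the cited paper: the Custodian class, the transfer function and the ack_lost flag are all hypothetical. The sender releases a bundle only after it sees an acknowledgement, so a lost acknowledgement causes a retransmission, and the receiver suppresses the duplicate by bundle id.

```python
class Custodian:
    """A node that can hold bundles in custody (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # bundle_id -> payload currently held in custody

    def accept(self, bundle_id, payload):
        """Take custody and acknowledge; a repeated bundle_id is ignored."""
        self.store.setdefault(bundle_id, payload)
        return True


def transfer(sender, receiver, bundle_id, ack_lost=False):
    """Hand one bundle to the receiver; return True once custody has moved."""
    ack = receiver.accept(bundle_id, sender.store[bundle_id])
    if ack and not ack_lost:
        del sender.store[bundle_id]  # release custody only after the ack arrives
        return True
    return False  # sender keeps the bundle and must retry later


mote, gateway = Custodian("mote"), Custodian("mobile gateway")
mote.store[1] = "temp=21"
transfer(mote, gateway, 1, ack_lost=True)  # ack lost: both sides hold bundle 1
transfer(mote, gateway, 1)                 # retry: duplicate ignored, sender releases
```

Keeping the bundle at the sender until the acknowledgement arrives trades extra storage and a possible retransmission for reliability, which is the tradeoff the slide lists first.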
105
Other Occasionally Connected Work
• Kevin Fall. Delay Tolerant Networks. SIGCOMM 2003.
• Juang et al. Energy efficient computing for wildlife tracking. ASPLOS 2002.
• Li et al. Sending messages to mobile users in disconnected ad-hoc wireless networks. MOBICOM 2000.
• Shah et al. Data Mules. SNPA 2003.
106
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
107
Distributed In-network Storage
• Collectively, sensornets have large amounts of in-network storage
• Good for in-network consumption or caching
• Challenges
– Distributed indexing for fast query dissemination
– Resilience to node or link failures
– Graceful adaptation to data skews
– Minimizing index insertion/maintenance cost
108
Example: DIM
• Functionality
– Efficient range query for multidimensional data.
• Approaches
– Divide sensor field into bins.
– Locality preserving mapping from m-d space to geographic locations.
– Use geographic routing such as GPSR.
• Assumptions
– Nodes know their locations and network boundary
– No node mobility
[Figure labels: E1 = <0.7, 0.8>, E2 = <0.6, 0.7>, Q1 = <.5-.7, .5-1>]
Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.
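The idea of a locality preserving mapping from attribute space to bins can be sketched as follows. This is a minimal illustration that assumes a k-d-tree-style split (halve each attribute in turn, one bit per split) over values normalized to [0, 1]; the details of the actual DIM scheme may differ, and the names zone_code and zones_for_range are hypothetical. The example reuses the events E1, E2 and the query Q1 from the slide.

```python
def zone_code(event, depth):
    """Map attribute values in [0, 1] to a bit string by halving each dimension in turn."""
    lo = [0.0] * len(event)
    hi = [1.0] * len(event)
    bits = []
    for i in range(depth):
        d = i % len(event)            # dimension split at this level
        mid = (lo[d] + hi[d]) / 2
        if event[d] < mid:
            bits.append("0")
            hi[d] = mid
        else:
            bits.append("1")
            lo[d] = mid
    return "".join(bits)


def zones_for_range(query, depth):
    """Return the codes of all zones that overlap the box [(lo, hi), ...]."""
    m = len(query)
    results = []

    def walk(code, lo, hi):
        # Prune zones [lo, hi) that cannot intersect the closed query range.
        if any(hi[d] <= query[d][0] or lo[d] > query[d][1] for d in range(m)):
            return
        if len(code) == depth:
            results.append(code)
            return
        d = len(code) % m
        mid = (lo[d] + hi[d]) / 2
        walk(code + "0", lo, hi[:d] + [mid] + hi[d + 1:])
        walk(code + "1", lo[:d] + [mid] + lo[d + 1:], hi)

    walk("", [0.0] * m, [1.0] * m)
    return results


print(zone_code((0.7, 0.8), 4))                        # E1 -> 1101
print(zone_code((0.6, 0.7), 4))                        # E2 -> 1100
print(zones_for_range([(0.5, 0.7), (0.5, 1.0)], 4))    # Q1 -> ['1100', '1101']
```

Because nearby attribute values share code prefixes, Q1 only needs to visit the two zones that hold E1 and E2. In a deployment each zone would be owned by a node at a known location and reached with geographic routing.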
109
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
110
Closing the Loop
• Challenge: want more than data collection
– Condition-based sensing, rate adjustment
– Condition-based actuation
• E.g.,
– Kansal et al. Sensor Uncertainty Reduction Using Low Complexity Actuation. IPSN 2004.
– Work from Qiong Luo (HKUST) et al. in CIDR.
– Various process control systems: ladder logic, SCADA, etc.
• Questions:
– Appropriate languages
– Resource contention on actuators
– Closed-loop safety concerns
111
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
112
Alternative Middleware: Integration into an Existing DBMS
113
Concluding Remarks
• Sensor networks are an exciting emerging technology, with a wide variety of applications
• Many research challenges in all areas of computer science
– Database community included
– Some agreement that a declarative interface is right
• TinyDB and other early work are an important first step
• But there’s lots more to be done!
– Real challenge is building appropriate middleware abstractions
114
Questions?
http://db.lcs.mit.edu/madden/middleware_tutorial.ppt
115
In-Network Join Strategies
• Types of joins:
– non-sensor -> sensor
– sensor -> sensor
• Optimization questions:
– Should the join be pushed down?
– If so, where should it be placed?
– What if a join table exceeds the memory available on one node?
116
Choosing Where to Place Operators
• Idea: choose a “join node” to run the operator
• Over time, explore other candidate placements
– Nodes advertise data rates to their neighbors
– Neighbors compute expected cost of running the join based on these rates
– Neighbors advertise costs
– Current join node selects a new, lower cost node
Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.
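The exploration loop above can be sketched as a greedy migration. This is a simplified, centralized simulation under assumed details, not the algorithm from the cited paper: the cost model (data rate times hop count, plus the join output rate times hops to the sink), the function names and the five-node topology are all hypothetical.

```python
from collections import deque


def hops_from(graph, start):
    """Breadth-first hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def placement_cost(graph, node, rates, sink, output_rate):
    """Assumed cost: traffic from each source to the join node, plus results to the sink."""
    dist = hops_from(graph, node)
    return sum(rate * dist[src] for src, rate in rates.items()) + output_rate * dist[sink]


def migrate(graph, start, rates, sink, output_rate):
    """Move the join to a strictly cheaper neighbor until none exists."""
    current = start
    while True:
        candidates = [current] + sorted(graph[current])
        best = min(candidates,
                   key=lambda n: placement_cost(graph, n, rates, sink, output_rate))
        if best == current:   # ties keep the current node, so the loop terminates
            return current
        current = best


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "sink"], "sink": ["d"]}
rates = {"a": 10, "d": 1}     # node a produces ten times more data than node d
print(migrate(graph, "sink", rates, "sink", output_rate=1))  # prints "a"
```

Starting at the sink, the join moves one hop at a time toward the high-rate source, which mirrors the slide's point that placement should follow the advertised data rates.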
117
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries
118
Adaptivity In Sensor Networks
• Queries are long running
• Selectivities change
– E.g. night vs day
• Network load and available energy vary
• All suggest that some adaptivity is needed
– Of data rates or granularity of aggregation when optimizing for lifetimes
– Of operator orderings or placements when selectivities change (cf. conditional plans for correlations)
• As far as we know, this is an open problem!
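One simple form of adapting operator orderings is to track how often each predicate passes and re-sort the predicates periodically. The sketch below is an illustration, not a TinyDB feature: the AdaptiveFilter class is hypothetical, and it uses the classic ordering for independent predicates, ascending cost / (1 - selectivity). Because evaluation short-circuits, later predicates only see readings that passed earlier ones, so the estimates are conditional on the current order.

```python
class AdaptiveFilter:
    """Evaluate a conjunction of predicates and reorder them from observed pass rates."""

    def __init__(self, predicates):
        # predicates: list of (name, cost, function) triples
        self.preds = list(predicates)
        self.seen = {name: 0 for name, _, _ in self.preds}
        self.passed = {name: 0 for name, _, _ in self.preds}

    def selectivity(self, name):
        # Fraction of evaluated readings that passed; 0.5 until data arrives.
        if self.seen[name] == 0:
            return 0.5
        return self.passed[name] / self.seen[name]

    def reorder(self):
        def rank(pred):
            name, cost, _ = pred
            drop = 1.0 - self.selectivity(name)
            # A predicate that never rejects anything goes last.
            return cost / drop if drop > 0 else float("inf")
        self.preds.sort(key=rank)

    def process(self, reading):
        for name, _, fn in self.preds:
            self.seen[name] += 1
            if not fn(reading):
                return False      # short-circuit on the first failure
            self.passed[name] += 1
        return True


f = AdaptiveFilter([
    ("temp", 1.0, lambda r: r["temp"] > 20),
    ("light", 1.0, lambda r: r["light"] > 100),
])
for _ in range(10):                       # night: warm but dark
    f.process({"temp": 25, "light": 0})
f.reorder()
print([name for name, _, _ in f.preds])   # ['light', 'temp']
```

At night the light predicate rejects every reading, so it moves to the front and the temperature sensor no longer needs to be checked for readings that will be dropped anyway.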
119
Multiple Queries and Work Sharing
• As sensornets evolve, users will run many queries simultaneously
– E.g., traffic monitoring
• Likely that queries will be similar
– But have different end points, parameters, etc.
• Would like to share processing, routing as much as possible
• But how? Again, an open problem.
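The slide leaves work sharing open. As one narrow illustration of what sharing could mean, the sketch below shares only sensor acquisition: when several queries are due in the same epoch, each attribute is sampled once and the reading is fanned out to every query that needs it. The query dictionaries, shared_sample and fake_sensor are hypothetical and do not describe an existing TinyDB mechanism; sharing of routing and in-network processing is not addressed.

```python
def shared_sample(queries, t_ms, read_sensor):
    """Serve every query due at time t_ms from one acquisition per attribute."""
    due = [q for q in queries if t_ms % q["period_ms"] == 0]
    attrs = set().union(*(q["attrs"] for q in due))
    # Each attribute is read from the hardware at most once per epoch.
    reading = {a: read_sensor(a) for a in sorted(attrs)}
    return {q["name"]: {a: reading[a] for a in q["attrs"]} for q in due}


queries = [
    {"name": "q1", "attrs": ["light", "temp"], "period_ms": 1000},
    {"name": "q2", "attrs": ["temp"], "period_ms": 2000},
]

reads = []

def fake_sensor(attr):
    reads.append(attr)                    # record each physical acquisition
    return {"light": 300, "temp": 21}[attr]

print(shared_sample(queries, 2000, fake_sensor))
print(len(reads))                         # 2 acquisitions instead of 3
```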