
Abstractions for Shared Sensor Networks

DMSN, September 2006

Michael J. Franklin

Mike Franklin, UC Berkeley EECS

Outline

• Perspective on shared infrastructure
  • Scientific Applications
  • Business Environments

• Data Cleaning as a Shared Service
• Other Core Services

• What’s core and what isn’t?

• Conclusions


Scientific Instruments

Cost: moderate
Users: one
Use: defense/navigation
Scheduling: ad hoc
Data Cleaning: cloth


Scientific Instruments

Cost: more
Users: one
Use: science
Scheduling: ad hoc
Data Cleaning: religious


Scientific Instruments

Cost: 100’s K$ (1880s $)
Users: 100’s
Use: science
Scheduling: by committee
Data Cleaning: grad students


Scientific Instruments

Cost: 100’s M$ (2010s $)
Users: 1000’s-millions
Use: science and education
Scheduling: mostly static - SURVEY
Data cleaning: mostly algorithmic

Key Point: Enabled by modern (future) Data Management!


Shared Infrastructure

• Sharing dictated by costs

  • Costs of hardware
  • Costs of deployment
  • Costs of maintenance

• Pooled Resource Management
  • Competitively Scheduled
  • Statically Scheduled (surveys)

• Data Cleaning
  • At the instrument
  • By the applications (or end users)

• Other Services


Shared Sensor Nets

• Macroscopes are expensive:

  • to design
  • to build
  • to deploy
  • to operate and maintain

They will be shared resources:
  - across organizations
  - across apps w/in organizations

Q: What are the right abstractions to support them?


Traditional Shared Data Mgmt

[Diagram: Operational Systems (Point of Sale, Inventory, Data Feeds, etc.) feed an Extract/Transform/Load pipeline, with Cleaning, Auditing, …, into a Data Warehouse and Data Marts, which serve Business Intelligence: Reports, Dashboards, ad hoc Queries.]

All users/apps see only cleaned data: a.k.a. “TRUTH”
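A minimal sketch (not from the talk) of the pattern this diagram captures: data is cleaned once, centrally, during the ETL step, so every downstream user and application reads the same cleaned copy. All names and cleaning rules below are hypothetical.

# Hypothetical, minimal ETL sketch: clean once during load so all
# downstream users/apps see only the cleaned ("TRUTH") data.

raw_feed = [
    {"sku": "A1", "qty": 10},
    {"sku": "A1", "qty": -3},     # impossible value from a faulty feed
    {"sku": "B2", "qty": 7},
    {"sku": "B2", "qty": 7},      # duplicate record
]

def extract(feed):
    return list(feed)

def transform(records):
    seen, cleaned = set(), []
    for r in records:
        key = (r["sku"], r["qty"])
        if r["qty"] < 0 or key in seen:   # drop bad and duplicate rows
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

def load(warehouse, records):
    warehouse.extend(records)

warehouse = []
load(warehouse, transform(extract(raw_feed)))
print(warehouse)   # every consumer reads this one cleaned copy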


Shared SensorNet Services

Data Cleaning, Scheduling, Monitoring, Actuation, Tasking/Programming, Evolution, Provisioning, Quality Estimation, Data Collection, Query & Reporting

We will need to understand the shared/custom tradeoffs for all of these.


Data Cleaning as a Shared Service


Some Data Quality Problems with Sensors

1. (Cheap) sensors are failure- and error-prone (and people want their sensors to be really cheap).

2. Device interface is too low level for applications.

3. They produce too much (uninteresting) data.

4. They produce some interesting data, and it’s hard to tell case #3 from case #4.

5. Sensitive to environmental conditions.


Problem 1a: Sensors are Noisy

• A simple RFID Experiment

• 2 adjacent shelves, 6 ft. wide

• 10 EPC-tagged items each, plus 5 moved between them

• RFID antenna on each shelf


Shelf RFID Test - Ground Truth


Actual RFID Readings

“Restock every time inventory goes below 5”


Prob 1b: Sensors “Fail Dirty”

• 3 temperature-sensing motes in the same room

[Figure: temperature readings from the three motes over time; one outlier mote diverges from the average of the others.]
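As a rough, hypothetical illustration of spotting a mote that "fails dirty" (not the method used in the experiment), each mote's reading can be compared against the median of the group and flagged when it drifts too far; the threshold below is arbitrary.

# Hypothetical sketch: flag a temperature mote whose reading drifts far
# from the group ("fails dirty") by comparing against the median.

from statistics import median

readings = {"mote1": 21.8, "mote2": 22.1, "mote3": 38.5}   # mote3 is failing

def outliers(readings, max_dev=5.0):
    mid = median(readings.values())                        # robust center
    return [m for m, v in readings.items() if abs(v - mid) > max_dev]

print(outliers(readings))   # ['mote3']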


Problem 2: Low-level Interface

Lack of good support for devices increases the complexity of sensor-based applications.


Problems 3 and 4: The Wheat from the Chaff

Shelf RFID reports (50 times/sec):
• there are 100 items on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• the 100 items are still on the shelf
• there are 99 items on the shelf
• the 99 items are still on the shelf
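A toy sketch (hypothetical, not from the talk) of one way to separate wheat from chaff in such a feed: forward a report only when the inventory count changes. This is exactly where problems 3 and 4 collide: a dropped reading also looks like a change, so naive suppression can manufacture false events.

# Hypothetical sketch: suppress the 50-per-second "still on the shelf"
# reports and emit only changes in the shelf count.

reports = [100, 100, 100, 100, 100, 99, 99, 99, 100, 100]   # shelf counts

def changes(counts):
    last = None
    for c in counts:
        if c != last:          # only a change in inventory is interesting
            yield c
            last = c

print(list(changes(reports)))  # [100, 99, 100]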


Problem 5: Environment

Read Rate vs. Distance Alien I2 Tag in a room on the 4th floor of Soda Hall

Read Rate vs. Distance using same reader and tag in the room next door


VICE: Virtual Device Interface [Jeffery et al., Pervasive 2006]

• Goal: Hide messy details of underlying physical devices.
  • Error characteristics
  • Failure
  • Calibration
  • Sampling Issues
  • Device Management
  • Physical vs. Virtual

• Fundamental abstractions:
  • Spatial & temporal granules

“Metaphysical Data Independence”
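The slide does not spell out the API; as a hedged sketch of the granule idea, a virtual device might expose which tags are present per spatial unit (a shelf) and temporal unit (a window) instead of raw per-antenna, per-epoch reads. The class and field names below are made up.

# Hypothetical sketch of a virtual device exposing spatial & temporal
# granules rather than raw per-reader, per-epoch RFID events.

class VirtualShelf:
    def __init__(self, readers, window_sec=5.0):
        self.readers = readers          # physical readers covering this shelf
        self.window_sec = window_sec    # temporal granule

    def tags_present(self, raw_events, now):
        """raw_events: (timestamp, reader_id, tag_id) tuples."""
        tags = set()
        for ts, reader, tag in raw_events:
            if reader in self.readers and 0 <= now - ts <= self.window_sec:
                tags.add(tag)           # any read in the window counts
        return tags

shelf = VirtualShelf(readers={"r1", "r2"})
events = [(0.5, "r1", "epc-17"), (2.0, "r2", "epc-42"), (9.0, "r1", "epc-99")]
print(sorted(shelf.tags_present(events, now=5.0)))   # ['epc-17', 'epc-42']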


VICE - A Virtual Device Layer

[Diagram: physical RFID devices sit below a “Virtual Device (VICE) API” layer.]

The VICE API is a natural place to hide much of the complexity arising from physical devices.


The VICE Query Pipeline

[Diagram: the VICE query pipeline. VICE stages (Clean, Smooth, Arbitrate, Validate, Analyze) operate over a single tuple, a window, or multiple receptors, moving toward greater generalization; later stages can join with stored data and feed on-line data mining.]
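A hedged sketch of the pipeline idea, with stage names taken from the slide but entirely placeholder logic: per-tuple cleaning, per-window smoothing, then arbitration across receptors.

# Hypothetical sketch: the pipeline as a chain of per-window stages.
# Stage names follow the slide; the bodies here are placeholders.

def clean(window):            # per-tuple: drop obviously bad readings
    return [r for r in window if r is not None]

def smooth(window):           # per-window: keep what was seen in the window
    return sorted(set(window))

def arbitrate(windows):       # across receptors: merge duplicate detections
    merged = set()
    for w in windows:
        merged.update(w)
    return sorted(merged)

def pipeline(windows_by_reader):
    staged = [smooth(clean(w)) for w in windows_by_reader]
    return arbitrate(staged)

reader_a = ["epc-1", None, "epc-2"]
reader_b = ["epc-2", "epc-3", None]
print(pipeline([reader_a, reader_b]))   # ['epc-1', 'epc-2', 'epc-3']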


RFID Smoothing w/Queries

[Figure: raw readings over time vs. smoothed output from a smoothing filter.]

• RFID data has many dropped readings
• Typically, use a smoothing filter to interpolate:

SELECT distinct tag_id
FROM RFID_stream [RANGE ‘5 sec’]
GROUP BY tag_id
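A rough procedural reading of the query above (hypothetical schema): a tag counts as present if it was read at least once in the trailing 5-second window, which masks isolated dropped readings.

# Hypothetical procedural equivalent of the windowed smoothing query:
# a tag is "present" if it was read at least once in the trailing window.

def smoothed_tags(readings, now, window_sec=5.0):
    """readings: list of (timestamp, tag_id) raw RFID reads."""
    return {tag for ts, tag in readings if 0 <= now - ts <= window_sec}

raw = [(0.1, "epc-7"), (1.2, "epc-7"), (3.9, "epc-7"),   # epc-7 read sporadically
       (4.5, "epc-9")]
print(sorted(smoothed_tags(raw, now=5.0)))   # ['epc-7', 'epc-9']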


After Vice Processing

“Restock every time inventory goes below 5”


Adaptive Smoothing [Jeffery et al., VLDB 2006]
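The slide itself is a figure; as a very loose illustration of the general idea of adaptive smoothing, and not the algorithm from the VLDB 2006 paper, the window can grow when a tag's reads are sparse and shrink when they are dense, trading dropped-reading repair against responsiveness to real transitions.

# Very rough illustration of adaptive smoothing (not the published
# algorithm): grow the window when reads are sparse, shrink when dense.

def adapt_window(window_sec, reads_in_window, target=3,
                 min_sec=1.0, max_sec=20.0):
    if reads_in_window < target:           # too few reads: smooth more
        window_sec = min(max_sec, window_sec * 2)
    elif reads_in_window > 2 * target:     # plenty of reads: smooth less
        window_sec = max(min_sec, window_sec / 2)
    return window_sec

w = 5.0
for reads in [1, 1, 4, 9, 8]:              # observed reads per window
    w = adapt_window(w, reads)
    print(reads, w)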


Ongoing Work: Spatial Smoothing

• With multiple readers, more complicated

[Figure: two rooms with two readers per room (A and B in one room, C and D in the other). Overlapping readers in a room raise reinforcement questions (A? B? A ∪ B? A ∩ B?); readers in different rooms raise arbitration questions (A? C?). All are addressed by a statistical framework!]
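The statistical framework is not shown on the slide; as a toy stand-in for arbitration only, each tag can be assigned to the room whose readers saw it most often in the current window. The reader-to-room mapping and names below are hypothetical.

# Toy sketch of arbitration across readers (not the slide's statistical
# framework): assign each tag to the room whose readers saw it most often.

from collections import Counter

READER_ROOM = {"A": "room1", "B": "room1", "C": "room2", "D": "room2"}

def arbitrate(reads):
    """reads: list of (reader_id, tag_id) events in one window."""
    votes = {}
    for reader, tag in reads:
        room = READER_ROOM[reader]
        votes.setdefault(tag, Counter())[room] += 1
    return {tag: rooms.most_common(1)[0][0] for tag, rooms in votes.items()}

window = [("A", "epc-5"), ("B", "epc-5"), ("C", "epc-5"),   # mostly room1
          ("C", "epc-8"), ("D", "epc-8")]                   # room2
print(arbitrate(window))   # {'epc-5': 'room1', 'epc-8': 'room2'}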


Problems with a single Truth

• If you knew what was going to happen, you wouldn’t need sensors
  • upside down airplane
  • ozone layer hole

• Monitoring vs. Needle-in-a-haystack

• Probability-based smoothing may remove unlikely, but real events!


Risks of too little cleaning

• GIGO
• Complexity - Burden on App Developers
• Efficiency (repeated work)
• Too much opportunity for error


Risks of too much cleaning

The appearance of a hole in the earth's ozone layer over Antarctica, first detected in 1976, was so unexpected that scientists didn't pay attention to what their instruments were telling them; they thought their instruments were malfunctioning.

National Center for Atmospheric Research

In fact, the data were rejected as unreasonable by data quality control algorithms


One Truth for Sensor Nets?

• How clean is “clean-enough”?
• How much cleaning is too much?
• Answers are likely to be:
  • domain-specific
  • sensor-specific
  • application-specific
  • user-specific
  • all of the above?

How to split between shared and application-specific cleaning?


Fuzzy Truth

One solution is to make the shared interface richer.

Probabilistic Data Management is also the key to “Calm Computing”.
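A hedged sketch of what a richer, "fuzzy" shared interface could look like: the cleaning layer emits readings annotated with a confidence rather than a single cleaned truth, and each application picks its own threshold. The schema below is made up.

# Hypothetical "fuzzy truth" interface: cleaned readings carry a
# confidence, and each application chooses its own cutoff.

cleaned_stream = [
    {"tag": "epc-7", "present": True, "confidence": 0.97},
    {"tag": "epc-9", "present": True, "confidence": 0.40},   # borderline
]

def present_tags(stream, min_confidence):
    return [r["tag"] for r in stream
            if r["present"] and r["confidence"] >= min_confidence]

print(present_tags(cleaned_stream, 0.9))   # strict inventory app: ['epc-7']
print(present_tags(cleaned_stream, 0.2))   # theft-alert app: ['epc-7', 'epc-9']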


Adding Quality Assessment

A. Das Sarma, S. Jeffery, M. Franklin, J. Widom, “Estimating Data Stream Quality for Object-Detection Applications”, 3rd Intl ACM SIGMOD Workshop on Information Quality in Info Sys, 2006


“Data Furnace” Architecture

Service Layer
  • Probabilistic Reasoning
  • Uncertainty Management
  • Data Model Learning
  • Complex Event Processing
  • Data Archiving and Streaming

Garofalakis et al., IEEE Data Engineering Bulletin, March 2006


Rethinking Service Abstractions

Data Cleaning, Scheduling, Monitoring, Actuation, Tasking/Programming, Evolution, Provisioning, Quality Estimation, Query-Data Collection

We will need to understand the shared/custom tradeoffs for all of these.


Conclusions

• Much current sensor research is focused on the “single user” or “single app” model.

• Sensor networks will be shared resources.

• Can leverage some ideas from current shared Data Management infrastructures.

• But, new solutions, abstractions, and architectures will be required.