Provenance in Sensornet Republishing
Unkyu Park and John Heidemann
University of Southern California, Information Science Institute
June 18, 2008
Why Sensornet Provenance?
• Growing amount of sensornet data
– In isolated sensornets?
– Today, reuse of data and collaboration are rare
• Sharing is important
– Use the Internet to share sensor data
– Multiple steps, different users
• Provenance for sensornets
– Support tracking data back to its source
– Encourage sharing
Sensor-Internet
• Goals
– Share and search across many independently running sensor networks
– Allow users to process and share transformed data
[Figure: Sensor data sharing. Sensors (motes, sensornets, mobile phones, or personal computers) sense the environment; the sensor store is the repository for all data; republishers transform existing data; sensor-search indexes data and supports sensornet discovery; users reach all of these over the Internet.]
[S. Reddy, G. Chen, B. Fulkerson, S. J. Kim, U. Park, N. Yau, J. Cho, M. Hansen, and J. Heidemann. Sensor-Internet Share and Search: Enabling Collaboration of Citizen Scientists. In Data Sharing and Interoperability on the World-wide Sensor Web, IPSN 2007, April 2007.]
How Can Sensornet Provenance Help?
Problem: a user detects an abnormality on the map
Q. What causes the problem?
1. Check the transformation
2. Check the input
3. Find an abnormal sensor reading
4. Check the transformation and input
5. Find that the problem lies in image recognition (74.3 → 94.3)
[Figure: processing chain. Temperature Sensor → Image Recognition → Raw (87.1, ?7.1, 87.?) → Digit Repair (fixing digits) → Corrected (87.1 ±0.1, 87.1 ±0.1, 87.5 ±0.5) → TempMap (interpolate point data into a complete temperature map). Map values shown: 94.3, 77.1, 87.3, 78.8, 58.4.]
Building Sensornet Ecosystem
• Collaborative processing
– Encourage users who use the same data to collaborate
– Participatory sensing
• Search over the provenance
– Exploit the provenance to identify high-quality sensor data
[Figure: Temperature Sensor → Raw (87.1, ?7.1, 87.?) → Digit Repair → TempMap]
Challenges in Sensornet Provenance
• Sensor data are distributed across many data providers
– Need: distributed data management and authorization
  • Locate the distributed sensor data
  • Support distributed authorization in tracking provenance
• Each sensor data item is often small
– Need: efficient provenance storage
  • Scale the provenance storage according to the sensor data size
• Sensor data keeps arriving
– Need: stream-aware provenance
  • Record the temporal location in the stream
Sensor Provenance: Goals and Contributions
• Goals
– End users can trace data back to the original source
– Observe each step of processing
• Contributions
– Provenance via a new linking scheme (distributed data management)
– User-centric access control (distributed authorization)
– Incremental compression (provenance storage)
– Stream-aware provenance
Outline
• Motivation
• Sensornet Provenance
• Evaluation
– Prototype deployment
– Storage cost
– Compression alternatives
– Ease-of-use provenance
Design Choices for Sensornet Provenance
• Representation
– annotation vs. inversion
– content vs. link
• Granularity
– tuple-level (fine-grained) vs. table-level
• Consistency (stream-aware provenance)
– timestamping to handle sensor data that keeps arriving
• Authorization
– The data generator controls data access
– Pass a “letter of reference” to the owner
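The consistency choice above can be illustrated with a small sketch. It assumes a hypothetical in-memory table in which every tuple carries a timestamp; `StreamTable`, `append`, and `latest` are illustrative names and not part of Sensorbase or the authors' system. Pinning a relative query ("the latest n readings") to a recorded timestamp lets it return the same tuples even after more data has arrived.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class StreamTable:
    """Hypothetical append-only table of (timestamp, value) tuples."""
    rows: List[Tuple[datetime, float]] = field(default_factory=list)

    def append(self, ts: datetime, value: float) -> None:
        self.rows.append((ts, value))

    def latest(self, n: int, as_of: Optional[datetime] = None) -> List[float]:
        """Return the n most recent values, ignoring tuples newer than as_of."""
        visible = [(ts, v) for ts, v in self.rows if as_of is None or ts <= as_of]
        visible.sort(key=lambda row: row[0])
        return [v for _, v in visible[-n:]]


table = StreamTable()
table.append(datetime(2008, 2, 24, 11, 0), 87.1)
table.append(datetime(2008, 2, 24, 12, 0), 87.5)

# Timestamp stored in the provenance when the republisher ran its query.
recorded = datetime(2008, 2, 24, 12, 0)
first_result = table.latest(1, as_of=recorded)

# The stream keeps arriving after the republisher has run.
table.append(datetime(2008, 2, 24, 13, 0), 94.3)

replayed = table.latest(1, as_of=recorded)  # pinned: same answer as before
unpinned = table.latest(1)                  # unpinned: sees the newer tuple
```

Without the recorded timestamp, replaying the query would silently return different input than the republisher actually used.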
Predecessor Links
• Purpose: locate sensor data across different administrative domains
• Fine-grained, annotation-based, timestamped links
– Source location
  • Location of the source repository
  • Table at that repository
  • Search from the table
– Timestamp
  • To replay a relative query and produce the same result
– Transformation
  • A pointer to a general description, source code, or executable programs
• An example
sb://sensorbase.org/soap/sensorbase2.wsdl?s=getData&a1="datetime,temperature"&a2=p_97_temperature&a3='sensorid="sum-in"'&a4=0&a5=1&t="2008-02-24 12:00:00"&x="http://www.isi.edu/ilense/siss/tempread.html"
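As a rough illustration, the example link can be split into the three parts listed above. The sketch assumes that the `t` parameter carries the timestamp and the `x` parameter the transformation pointer, with everything else forming the source location; that reading is an interpretation of the example, not a documented format, and `PredecessorLink` and `parse_predecessor_link` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PredecessorLink:
    source: str          # repository, table, and search (everything except t and x)
    timestamp: datetime  # when the query was issued
    transformation: str  # pointer to a description of the processing step


def parse_predecessor_link(link: str) -> PredecessorLink:
    """Split a link into source location, timestamp, and transformation.

    Assumes `t` is the timestamp and `x` the transformation pointer.
    """
    location, _, query = link.partition("?")
    source_params = []
    fields = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key in ("t", "x"):
            fields[key] = value.strip('"')
        else:
            source_params.append(pair)
    return PredecessorLink(
        source=location + "?" + "&".join(source_params),
        timestamp=datetime.strptime(fields["t"], "%Y-%m-%d %H:%M:%S"),
        transformation=fields["x"],
    )


example = (
    'sb://sensorbase.org/soap/sensorbase2.wsdl?s=getData'
    '&a1="datetime,temperature"&a2=p_97_temperature'
    "&a3='sensorid=\"sum-in\"'&a4=0&a5=1"
    '&t="2008-02-24 12:00:00"'
    '&x="http://www.isi.edu/ilense/siss/tempread.html"'
)
link = parse_predecessor_link(example)
```

Keeping the source location as an opaque string means the tracker only needs to understand the two provenance-specific parameters.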
Letter of Reference
• Purpose: provide easy-to-use authorization
• Sensor-store security model
– Public
– Case-by-case basis
• Letter of reference
– Contextual information about the data requestor
  • User’s activities: collaboration with others, data sharing activities
  • How the user encountered the provider’s data
• Authentication
– Provide this context to inform the data owner
– The owner will make a decision based on it
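A letter of reference could be modeled as a small record, sketched below. The four fields follow the items this deck lists for a generated letter (predecessor link, user account, target, user's activities); the class name, field names, `summary` method, and all example values are hypothetical. The decision itself stays with the data owner and is deliberately not automated here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LetterOfReference:
    """Context handed to a data owner; names here are illustrative only."""
    predecessor_link: str  # how the requestor encountered the data
    user_account: str      # who is asking
    target: str            # the source data being requested
    activities: List[str] = field(default_factory=list)  # sharing, collaboration

    def summary(self) -> str:
        """Render the letter as plain text for the owner to review."""
        lines = [
            f"Requestor: {self.user_account}",
            f"Requested data: {self.target}",
            f"Reached via: {self.predecessor_link}",
            "Activities:",
        ]
        lines += [f"  - {a}" for a in self.activities] or ["  (none recorded)"]
        return "\n".join(lines)


# Hypothetical example values.
letter = LetterOfReference(
    predecessor_link="sb://sensorbase.org/soap/sensorbase2.wsdl?s=getData&a2=p_97_temperature",
    user_account="alice",
    target="p_97_temperature",
    activities=["shares a republished temperature map"],
)
text = letter.summary()
```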
Outline
• Motivation
• Sensornet Provenance
• Evaluation
– Prototype deployment
– Storage cost
– Compression alternatives
– Ease-of-use provenance
Prototype Deployment
• Deployment
– Provenance system
– Sensors
– Sensor-store
• Prototype republishers
– Digit repair
– Digit repair with Image
– TempMap
[Figure: prototype republishing chain. West L.A. Temperature Publishing → Image Recognition → Raw (87.1, ?7.1, 87.?) → republishing → Digit Repair or Repair with Image (fixing digits) → Corrected (87.1 ±0.1, 87.1 ±0.1, 87.5 ±0.5) → republishing → TempMap (interpolate point data into a complete temperature map).]
Storage Alternatives
• Alternatives
– copy source
– uncompressed links
– compressed links
• Small source and data
– Copying source works well
– Uncompressed link is verbose, larger than the data
– With compression, cost equals that of copying source
[Figure: Digit Repair (small source and republished data)]
Benefits Depend on the Size of Source
• Copying source is expensive when source is large
• Compressed link works well in all three cases
[Figure: Repair with Image (large source and small republished data)]
[Figure: TempMap (small source and large republished data)]
Link Compression
• We showed that link compression is important, so what are the compression alternatives?
• Compression alternatives
– no compression
– per-link
– incremental
  • Exploit redundancy across predecessor links
  • 83% storage saving compared to no compression
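The difference between per-link and incremental compression can be sketched with Python's standard `zlib` module. The deck does not say which algorithm the system uses, so the streaming compressor below is a stand-in chosen only to show the idea of reusing redundancy across links; the function names and the sample links are hypothetical, and the sizes it produces are not the deck's measurements.

```python
import zlib
from typing import List


def per_link(links: List[bytes]) -> List[bytes]:
    """Compress every link independently."""
    return [zlib.compress(link) for link in links]


def incremental(links: List[bytes]) -> List[bytes]:
    """Compress links with one shared compressor so later links can
    reference text already seen in earlier ones."""
    comp = zlib.compressobj()
    # Z_SYNC_FLUSH emits everything buffered so far, giving one chunk per link.
    return [comp.compress(link) + comp.flush(zlib.Z_SYNC_FLUSH) for link in links]


def restore(chunks: List[bytes]) -> List[bytes]:
    """Inverse of incremental(): chunks must be decoded in order."""
    decomp = zlib.decompressobj()
    return [decomp.decompress(chunk) for chunk in chunks]


# Hypothetical links that differ only in the hour of the timestamp.
links = [
    ('sb://sensorbase.org/soap/sensorbase2.wsdl?s=getData'
     '&a1="datetime,temperature"&a2=p_97_temperature'
     f'&t="2008-02-24 {hour:02d}:00:00"').encode()
    for hour in range(24)
]

raw_size = sum(len(link) for link in links)
per_link_size = sum(len(c) for c in per_link(links))
incremental_size = sum(len(c) for c in incremental(links))
```

The trade-off in this sketch is that incrementally compressed chunks can only be decoded in order, whereas per-link chunks can be read independently.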
Ease-of-use: Provenance
• Provenance extension
– Sensorbase.org
– predecessor links
• Easy source tracking
– A single click lets the user track the source data
[Figure: provenance view showing a list of predecessor links, the source data, and the provenance of the source data]
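Following predecessor links back to the original source amounts to a walk over a graph. The sketch below assumes a hypothetical dictionary that maps each data item to its predecessors; the item names loosely mirror the TempMap, Corrected, and Raw data from the deck's example, and `trace_to_sources` is an illustrative name, not part of Sensorbase.

```python
from typing import Dict, List

# Hypothetical provenance store: each data item maps to its predecessors.
# Raw sensor data has no predecessors.
PROVENANCE: Dict[str, List[str]] = {
    "tempmap": ["corrected"],
    "corrected": ["raw"],
    "raw": [],
}


def trace_to_sources(item: str, store: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from item back to an original source.

    Assumes the provenance graph is acyclic.
    """
    predecessors = store.get(item, [])
    if not predecessors:
        return [[item]]
    paths = []
    for pred in predecessors:
        for tail in trace_to_sources(pred, store):
            paths.append([item] + tail)
    return paths


paths = trace_to_sources("tempmap", PROVENANCE)
```

In a distributed setting each lookup would be a request to a different sensor store, which is where authorization comes in.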
Ease-of-use: Authorization
Easy, user-centric, distributed access control
[Figure: authorization flow. If accessing source data requires authentication, the user is asked "have an account?" (Yes / No), and a letter of reference is generated (predecessor link, user account, target, user’s activities).]
Conclusions
• Sensor republishing will become an important means to share sensor data
• New provenance for sensornet
– Provenance via new linking scheme
– Easy, user-centric, distributed access control
– Compression makes the tuple-level provenance reasonable
• http://www.isi.edu/ilense/siss