How Dirty is your Data: The Duality between detecting Events and Faults
J. Gupchup, A. Terzis, R. Burns, A. Szalay
Department of Computer Science
Johns Hopkins University
Outline
Background, Problem Statement, Experiments, Results, Discussion
Application
Monitoring nesting conditions of Maryland box turtles
Science question: Do nesting conditions determine sex?
Important to correlate observations with environmental events (rain, snow, etc.)
Duality of Faults & Events
Data gathered from sensor networks contain faults
Delivering faulty data consumes resources and pollutes statistics
Need for fault detection techniques
Fault detection methods detect readings that deviate from “normal” or “expected” values
Environmental events:
– Scientifically interesting
– Deviate from the norm
Research Question(s)
Are “events” misclassified as “faults”?
What metrics could be used to quantify the misclassification?
How does the misclassification vary with:
– Type of fault
– Type of fault detection method
– Type of modality (moisture, temperature)
Is it possible to design a fault detection mechanism that minimizes the misclassification ?
Know Thy Faults
Short faults
– Sudden change in measurement
Noise faults
– Larger variations in amplitude than expected
– Little or no variation in amplitude (unresponsive)
Fault Detection Methods
SHORT Rule
– If $|X_i - X_{i-1}| > \delta_{SHORT}$, mark the current measurement as faulty (point method)
– $\delta_{SHORT}$ is established from domain knowledge
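A minimal Python sketch of the SHORT rule, assuming the readings arrive as a plain list and that the threshold is applied to the absolute jump, so sudden drops are caught as well as sudden rises. The function name `short_rule` is hypothetical, not taken from the original work.

```python
from typing import List, Sequence


def short_rule(x: Sequence[float], delta_short: float) -> List[bool]:
    """Point method: flag x[i] when it jumps by more than delta_short from x[i-1].

    Assumption: the absolute difference is compared with the threshold.
    The first reading has no predecessor and is never flagged.
    """
    flags = [False] * len(x)
    for i in range(1, len(x)):
        if abs(x[i] - x[i - 1]) > delta_short:
            flags[i] = True  # only the current reading is marked
    return flags
```

For `short_rule([20.0, 20.1, 35.0, 20.2], 5.0)` the result is `[False, False, True, True]`: an isolated spike trips the rule twice, once on the way up and once on the way back down, which is one way a point method can over-flag.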
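NOISE Rule
– Take W successive samples and compute their standard deviation $\sigma_W$
– If $\sigma_W \le \sigma_{train} - \sigma_{allow}$ or $\sigma_W \ge \sigma_{train} + \sigma_{allow}$, mark all W readings as faulty (block method)
– $\sigma_{train}$ and $\sigma_{allow}$ are established from training data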
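The NOISE rule can be sketched the same way. This version assumes non-overlapping windows of W samples, uses the population standard deviation for $\sigma_W$, and leaves a trailing partial window unchecked; all three choices, and the name `noise_rule`, are assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def noise_rule(x: Sequence[float], w: int, sigma_train: float, sigma_allow: float) -> List[bool]:
    """Block method: flag a whole window whose spread is too small or too large.

    Windows are non-overlapping (assumption); a trailing partial window is left unflagged.
    """
    if w <= 0:
        raise ValueError("window size must be positive")
    flags = [False] * len(x)
    for start in range(0, len(x) - w + 1, w):
        sigma_w = statistics.pstdev(x[start:start + w])
        if sigma_w <= sigma_train - sigma_allow or sigma_w >= sigma_train + sigma_allow:
            # Too flat (unresponsive sensor) or too noisy: mark all W readings.
            flags[start:start + w] = [True] * w
    return flags
```

With $\sigma_{train} = 0.5$ and $\sigma_{allow} = 0.3$, a window of four identical readings ($\sigma_W = 0$) is flagged as unresponsive, while `[1.0, 1.5, 1.0, 1.5]` ($\sigma_W = 0.25$) passes.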
Linear Least-Squares Estimation (LLSE)
– Estimate the expected value of a sensor's reading from the readings of other sensors using LLSE
– If $|X_{model} - X_{actual}| > \delta_{LLSE}$ for k of the node's neighbors, mark the reading as faulty (point method)
A. Sharma, L. Golubchik, and R. Govindan, “On the prevalence of sensor faults in real-world deployments,” IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2007.
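A sketch of the LLSE rule, assuming one linear model per neighbor, $X_{model} = a + b \cdot X_{neighbor}$, fitted by ordinary least squares on training data, and reading “for k of the node's neighbors” as “for at least k neighbors”. The pairwise model form and the helper names `fit_pairwise` and `llse_rule` are assumptions, not the exact formulation used in the cited paper.

```python
from typing import Dict, List, Sequence, Tuple


def fit_pairwise(neighbor: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of target ~ a + b * neighbor; returns (a, b)."""
    n = len(target)
    if n != len(neighbor) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx = sum(neighbor) / n
    my = sum(target) / n
    sxx = sum((x - mx) ** 2 for x in neighbor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(neighbor, target))
    if sxx == 0:
        raise ValueError("neighbor series is constant; slope is undefined")
    b = sxy / sxx
    return my - b * mx, b


def llse_rule(
    target: Sequence[float],
    neighbors: Dict[str, Sequence[float]],
    models: Dict[str, Tuple[float, float]],
    delta_llse: float,
    k: int,
) -> List[bool]:
    """Point method: flag reading i if at least k neighbor models disagree with it."""
    flags = []
    for i, actual in enumerate(target):
        votes = 0
        for name, (a, b) in models.items():
            predicted = a + b * neighbors[name][i]  # X_model from this neighbor
            if abs(predicted - actual) > delta_llse:
                votes += 1
        flags.append(votes >= k)
    return flags
```

Setting k to the number of neighbors flags a reading only when every neighbor disagrees with it; a smaller k makes the rule more aggressive.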
Evaluation Metrics
Misclassification error (μ) for point faults: $\mu = \frac{\text{event readings tagged as faults}}{\text{total event measurements}}$
Misclassification error (μ) for block faults: total misclassification $\mu = \sum_i D_i / \sum_i E_i$, where $E_i$ is the i-th event period and $D_i$ is the portion of that period misclassified as faulty
[Figure: timeline of an event period $E_i$ with its misclassified portion $D_i$]
Fault detection evaluation metric: false negative ratio = fraction of faults that go undetected
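The three metrics can be computed directly from boolean masks and time intervals. In this sketch the point-fault μ and the false negative ratio work on per-reading masks, while the block-fault μ works on (start, end) intervals in minutes, with $D_i$ taken as the overlap between event period i and the blocks flagged as faulty. The interval representation, the assumption that flagged blocks do not overlap one another, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in minutes, end exclusive


def point_misclassification(is_event: Sequence[bool], flagged: Sequence[bool]) -> float:
    """mu for point methods: event readings tagged as faults / all event readings."""
    n_event = sum(1 for e in is_event if e)
    if n_event == 0:
        raise ValueError("no event readings")
    tagged = sum(1 for e, f in zip(is_event, flagged) if e and f)
    return tagged / n_event


def _overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def block_misclassification(events: Sequence[Interval], faulty_blocks: Sequence[Interval]) -> float:
    """mu for block methods: sum_i D_i / sum_i E_i.

    E_i is the length of event period i; D_i is how much of it falls inside
    blocks marked faulty (blocks assumed not to overlap one another).
    """
    total_e = sum(end - start for start, end in events)
    if total_e <= 0:
        raise ValueError("event periods have zero total length")
    total_d = sum(_overlap(ev, blk) for ev in events for blk in faulty_blocks)
    return total_d / total_e


def false_negative_ratio(is_fault: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Fraction of ground-truth faults that the detector did not flag."""
    n_fault = sum(1 for f in is_fault if f)
    if n_fault == 0:
        raise ValueError("no faults in ground truth")
    missed = sum(1 for f, g in zip(is_fault, flagged) if f and not g)
    return missed / n_fault
```

For example, event periods `(0, 60)` and `(100, 160)` with faulty blocks `(30, 90)` and `(150, 170)` give $D_1 = 30$, $D_2 = 10$ and μ = 40/120 ≈ 0.33.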
Jug Bay Deployment Map
[Map: turtle nests (node labels 2, 5, 6) and weather station; coordinates 38.784607, -76.700460. Courtesy: Google Maps]
Dataset
Sensor data:
– Box temperature and soil moisture
– 3 motes from Jug Bay (see deployment map)
– 5 months of data (sampled every 10 min.)
– Training set: 1 month; test set: 4 months
Event ground truth (weather data):
– Precipitation data collected from a weather station ~700 m away (sampled every 15 min.)
– 21 major events (rainfall) occurred
– Total rainfall: 158 hours
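To score misclassification, each 10-minute sensor reading needs an event/no-event label derived from the 15-minute precipitation record. A minimal sketch, assuming timestamps in minutes, that a weather sample at time t reports rainfall over [t, t + 15), and that any nonzero rainfall counts as an event; the function name and these conventions are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def label_event_readings(
    sensor_times: Sequence[float],
    rain_times: Sequence[float],
    rain_amounts: Sequence[float],
    rain_step: float = 15.0,
) -> List[bool]:
    """Mark each sensor timestamp that falls in a weather interval with rain.

    rain_times must be sorted; sample j is assumed to cover
    [rain_times[j], rain_times[j] + rain_step).
    """
    labels = []
    for t in sensor_times:
        j = bisect_right(rain_times, t) - 1  # last weather sample starting at or before t
        covered = j >= 0 and t < rain_times[j] + rain_step
        labels.append(covered and rain_amounts[j] > 0)
    return labels
```

Readings that fall before the first weather sample, or in a gap in the weather record, are labeled as non-event readings.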
Fault Ground Truth
Start with a clean data set
Inject faults to establish ground truth
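The slides do not say how faults were injected. One plausible sketch assumes SHORT faults are isolated spikes of fixed magnitude added at random positions and NOISE faults are zero-mean Gaussian noise added over one window; the injection model, its parameters, and the function names are hypothetical. Each function returns the corrupted series together with a per-reading ground-truth mask.

```python
import random
from typing import List, Sequence, Tuple


def inject_short_faults(
    clean: Sequence[float], rate: float, magnitude: float, rng: random.Random
) -> Tuple[List[float], List[bool]]:
    """Add a +/- magnitude spike to each reading with probability `rate`."""
    data = list(clean)
    truth = [False] * len(data)
    for i in range(len(data)):
        if rng.random() < rate:
            data[i] += rng.choice((-1.0, 1.0)) * magnitude
            truth[i] = True
    return data, truth


def inject_noise_fault(
    clean: Sequence[float], start: int, w: int, sigma: float, rng: random.Random
) -> Tuple[List[float], List[bool]]:
    """Add N(0, sigma^2) noise to readings start .. start + w - 1."""
    data = list(clean)
    truth = [False] * len(data)
    for i in range(start, min(start + w, len(data))):
        data[i] += rng.gauss(0.0, sigma)
        truth[i] = True
    return data, truth
```

Passing a seeded `random.Random` keeps each injected test set reproducible across detection methods.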
Methodology
For each fault detection method and each modality:
– Use the first month's data to train
– Obtain model parameters
– Evaluate the method on the fault-injected test data
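The training step for the NOISE rule might look like the following. It assumes a 30-day “first month” at one reading every 10 minutes, takes $\sigma_{train}$ as the mean of the per-window standard deviations over the training data, and sets $\sigma_{allow}$ to c times their spread. The slides only say these parameters come from training data, so the estimators, the factor c, and the helper names are all hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

SAMPLES_PER_DAY = 24 * 6  # one reading every 10 minutes
TRAIN_DAYS = 30           # assumption: "first month" = 30 days


def split_train_test(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First month for training, the remainder for testing."""
    cut = TRAIN_DAYS * SAMPLES_PER_DAY
    return list(series[:cut]), list(series[cut:])


def train_noise_params(train: Sequence[float], w: int, c: float = 3.0) -> Tuple[float, float]:
    """Hypothetical estimators for sigma_train and sigma_allow.

    sigma_train: mean of the per-window standard deviations.
    sigma_allow: c times the spread of those per-window standard deviations.
    """
    if w <= 0:
        raise ValueError("window size must be positive")
    stds = [statistics.pstdev(train[s:s + w]) for s in range(0, len(train) - w + 1, w)]
    if not stds:
        raise ValueError("training data shorter than one window")
    return statistics.mean(stds), c * statistics.pstdev(stds)
```

A larger c widens the accepted band around $\sigma_{train}$, which trades fewer flagged windows for more missed faults.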
Soil Moisture: SHORT Rule
Reducing the number of misclassification errors increases false negatives
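The trade-off can be reproduced on a toy series. The sketch below uses synthetic numbers, not the Jug Bay data: it sweeps the SHORT threshold and reports the misclassification error and false negative ratio for each value. The series, its labels, and the helper names are invented for illustration, and the threshold is applied to the absolute jump as an assumption.

```python
from typing import List, Sequence, Tuple

# Synthetic toy series (not Jug Bay data): a rain event raises the reading
# at index 2, an injected spike sits at index 5, a small injected fault at 8.
TOY_X = [10, 10, 13, 13, 13, 19, 13, 13, 15, 13]
TOY_EVENT = [False, False, True, True, False, False, False, False, False, False]
TOY_FAULT = [False, False, False, False, False, True, False, False, True, False]


def short_flags(x: Sequence[float], delta: float) -> List[bool]:
    """SHORT rule on the absolute jump between consecutive readings."""
    return [i > 0 and abs(x[i] - x[i - 1]) > delta for i in range(len(x))]


def sweep(
    x: Sequence[float],
    is_event: Sequence[bool],
    is_fault: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, misclassification, false negative ratio) per threshold."""
    n_event = sum(is_event)
    n_fault = sum(is_fault)
    results = []
    for delta in thresholds:
        flags = short_flags(x, delta)
        mis = sum(1 for e, f in zip(is_event, flags) if e and f) / n_event
        fn = sum(1 for g, f in zip(is_fault, flags) if g and not f) / n_fault
        results.append((delta, mis, fn))
    return results
```

`sweep(TOY_X, TOY_EVENT, TOY_FAULT, [1, 4, 7])` gives `[(1, 0.5, 0.0), (4, 0.0, 0.5), (7, 0.0, 1.0)]`: raising the threshold stops the rain-driven jump from being flagged, but the small injected fault slips through first, and then every fault does.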
Misclassification: LLSE Method
| Modality | Misclassification error | False negatives |
|---|---|---|
| Box temperature | 0.3 % | 77.19 % |
| Soil moisture | 46.3 % | 50.03 % |
The higher misclassification for soil moisture can be caused by the spatial and temporal heterogeneity of the soil
Lessons Learned
There exists a tension between detecting Events and Faults
Fault detection algorithms need to take this into consideration
– Events can be misclassified as faults
Need for novel Fault Detection methods that are robust in the presence of Events
Need for Pattern Recognition techniques
Acknowledgements
Abhishek Sharma, Dept. of Computer Science, University of Southern California
Chris Swarth, Jug Bay Wetlands Sanctuary
Life Under Your Feet team
Marcus Chang, University of Copenhagen
(Courtesy: Andreas Terzis)
Questions?