How Dirty is your Data: The Duality between Detecting Events and Faults

J. Gupchup, A. Terzis, R. Burns, A. Szalay
Department of Computer Science, Johns Hopkins University






Outline

Background
Problem statement
Experiments
Results
Discussion


Application

Monitoring the nesting conditions of Maryland box turtles

Science question: Do nesting conditions determine sex?

Observations must be correlated with environmental events (rain, snow, etc.)


Duality of Faults & Events

Data gathered from sensor networks contain faults

Delivering faulty data consumes resources and pollutes statistics

Hence the need for fault detection techniques

Fault detection methods flag readings that deviate from “normal” or “expected” values

Environmental events:
– are scientifically interesting
– also deviate from the norm


Research Question(s)

Are “events” misclassified as “faults”?

What metrics could be used to quantify the misclassification?

How does the misclassification vary with:
– type of fault
– type of fault detection method
– modality (moisture, temperature)

Is it possible to design a fault detection mechanism that minimizes the misclassification?


Know Thy Faults

Short faults
– A sudden change in measurement

Noise faults
– Larger variations in amplitude than expected
– Little or no variation in amplitude (unresponsive sensor)


Fault Detection Methods

SHORT rule
– If |Xi − X(i−1)| > δSHORT, mark the current measurement as faulty (point method)
– δSHORT is established from domain knowledge
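The SHORT rule fits in a few lines of code. The following is a minimal Python sketch, not the authors' implementation: the function name `short_rule` and the threshold value in the usage line are hypothetical, and the sketch compares the absolute difference so that sudden drops are flagged as well as sudden jumps.

```python
def short_rule(readings, delta_short):
    """SHORT rule (point method): flag reading i as faulty if it differs
    from reading i-1 by more than delta_short. The first reading has no
    predecessor and is never flagged."""
    flags = [False] * len(readings)
    for i in range(1, len(readings)):
        if abs(readings[i] - readings[i - 1]) > delta_short:
            flags[i] = True
    return flags


# Made-up temperature series; 25.0 plays the role of a short fault.
temps = [20.1, 20.3, 25.0, 20.4, 20.6]
print(short_rule(temps, delta_short=2.0))  # [False, False, True, True, False]
```

In this naive form the reading right after a spike is flagged too, because it is compared against the faulty value.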

NOISE rule
– Take W successive samples
– If (σW ≤ σtrain − σallow) or (σW ≥ σtrain + σallow), mark all W readings as faulty (block method)
– σtrain and σallow are established from training data
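A minimal sketch of the NOISE rule as a block method. Several details are assumptions, not stated on the slide: windows are non-overlapping, a trailing partial window is left unchecked, σW is the population standard deviation, and the parameter values in the usage example are made up. The name `noise_rule` is hypothetical.

```python
import statistics


def noise_rule(readings, window, sigma_train, sigma_allow):
    """NOISE rule (block method). Split readings into non-overlapping
    windows of `window` samples; if a window's standard deviation is at or
    below sigma_train - sigma_allow (unresponsive) or at or above
    sigma_train + sigma_allow (too noisy), flag every reading in it.
    A trailing partial window is not checked."""
    flags = [False] * len(readings)
    for start in range(0, len(readings) - window + 1, window):
        sigma_w = statistics.pstdev(readings[start:start + window])
        if sigma_w <= sigma_train - sigma_allow or sigma_w >= sigma_train + sigma_allow:
            for i in range(start, start + window):
                flags[i] = True
    return flags


# Hypothetical data: a normal block, a noisy block, and a flat (stuck) block.
data = [1.0, 1.2, 0.9, 1.1, 5.0, -3.0, 6.0, -2.0, 1.0, 1.0, 1.0, 1.0]
print(noise_rule(data, window=4, sigma_train=0.15, sigma_allow=0.1))
```

Here the first window (σW ≈ 0.11) passes, while both the noisy window and the flat window are flagged, so the printed list is four `False` values followed by eight `True` values. The two-sided test is what lets a single rule catch both kinds of noise fault.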

Linear least-squares estimation (LLSE)
– Estimate the expected value of a sensor's reading from the readings of other sensors using LLSE
– If |Xmodel − Xactual| > δLLSE for k of the node's neighbors, mark the reading as faulty (point method)

A. Sharma, L. Golubchik, and R. Govindan, “On the prevalence of sensor faults in real world deployments,” IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2007.
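The LLSE check can be sketched as follows. This reflects one reading of the slide, not the implementation of Sharma et al.: it assumes that one linear least-squares estimator is fitted per neighbor on training data, x̂ = mean(x) + cov(x, y)/var(y) · (y − mean(y)), and that a reading is flagged when at least k neighbor-based estimates miss it by more than δLLSE. The function names and toy numbers are hypothetical.

```python
import statistics


def fit_llse(x_train, y_train):
    """Fit the linear least-squares estimator of x from one neighbor y:
    x_hat = mean_x + (cov(x, y) / var(y)) * (y - mean_y)."""
    mx = statistics.fmean(x_train)
    my = statistics.fmean(y_train)
    n = len(x_train)
    cov = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / n
    var = sum((b - my) ** 2 for b in y_train) / n
    if var == 0:
        raise ValueError("neighbor is constant in the training data")
    slope = cov / var
    return lambda y: mx + slope * (y - my)


def llse_rule(x_test, neighbors_test, estimators, delta_llse, k):
    """Flag x_test[t] if at least k neighbor-based estimates differ from
    it by more than delta_llse (point method)."""
    flags = []
    for t, actual in enumerate(x_test):
        disagree = sum(
            1 for est, ys in zip(estimators, neighbors_test)
            if abs(est(ys[t]) - actual) > delta_llse
        )
        flags.append(disagree >= k)
    return flags


# Toy training data: neighbor A reads x + 1, neighbor B reads 2x.
x_train = [10, 11, 12, 13, 14]
est_a = fit_llse(x_train, [11, 12, 13, 14, 15])
est_b = fit_llse(x_train, [20, 22, 24, 26, 28])
# At t = 1 the node reports 30 while both neighbors imply about 16.
print(llse_rule([15, 30, 16], [[16, 17, 17], [30, 32, 32]], [est_a, est_b], 1.0, 2))
```

Requiring agreement from k neighbors, rather than one, makes the check less sensitive to a single faulty or poorly correlated neighbor.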


Evaluation Metrics

Misclassification error (μ) for Point faults: μ = event readings tagged as faults / total event measurements

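Total misclassification: μ = ∑ᵢ Dᵢ / ∑ᵢ Eᵢ, where Eᵢ is the i-th event period and Dᵢ is the part of it tagged as faulty

Misclassification error (μ) for block faults: computed the same way, with Dᵢ the portion of Eᵢ covered by blocks of readings flagged as faulty

[Figure: event periods (Eᵢ) on a time axis, with the misclassified portion Dᵢ of each marked.]

Fault detection evaluation metric: false negative ratio = fraction of faults that fail to be detected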
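The three metrics can be computed directly from boolean labels and interval lists. This sketch assumes that event periods and flagged blocks are half-open (start, end) intervals on the same time axis, that Dᵢ is the overlap of Eᵢ with the flagged blocks, and that flagged blocks do not overlap one another; all function names are hypothetical.

```python
def point_misclassification(is_event, flagged):
    """Point methods: event readings tagged as faults / total event readings."""
    event_total = sum(is_event)
    tagged = sum(1 for e, f in zip(is_event, flagged) if e and f)
    return tagged / event_total if event_total else 0.0


def block_misclassification(event_periods, flagged_periods):
    """mu = sum_i D_i / sum_i E_i. E_i is the length of event period i and
    D_i the length of its overlap with flagged blocks. Intervals are
    half-open (start, end); flagged blocks must not overlap each other,
    otherwise shared time would be counted twice."""
    total_e = sum(end - start for start, end in event_periods)
    total_d = 0
    for es, ee in event_periods:
        for fs, fe in flagged_periods:
            total_d += max(0, min(ee, fe) - max(es, fs))
    return total_d / total_e if total_e else 0.0


def false_negative_ratio(is_fault, flagged):
    """Fraction of ground-truth faults that the detector failed to flag."""
    faults = sum(is_fault)
    missed = sum(1 for g, f in zip(is_fault, flagged) if g and not f)
    return missed / faults if faults else 0.0


# Two 10-minute events; one flagged block covers half of each.
print(block_misclassification([(0, 10), (20, 30)], [(5, 25)]))  # 0.5
```

Returning 0.0 when there are no events or no faults is a convention chosen for the sketch; an evaluation might prefer to report such cases as undefined.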


Jug Bay Deployment Map

[Map: turtle nests (labeled 2, 5 and 6) and the weather station; coordinates 38.784607, -76.700460. Courtesy: Google Maps]


Dataset

Sensor data:
– box temperature and soil moisture
– 3 motes from Jug Bay (see the deployment map)
– 5 months of data, sampled every 10 min
– training set: 1 month; test set: 4 months

Event ground truth (weather data):
– precipitation data collected from a weather station ~700 m away, sampled every 15 min
– 21 major events (rainfall) occurred
– total rainfall: 158 hours
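Turning the weather-station record into per-sample event labels requires aligning 15-minute precipitation intervals with 10-minute sensor samples. The slides do not say how this was done. The sketch below assumes that any 15-minute interval with non-zero rainfall belongs to an event, merges consecutive rainy intervals into one event period, and marks a sensor sample as an event reading if its timestamp falls inside a period. Timestamps are in minutes, and all names and data are hypothetical.

```python
def rain_periods(precip, step=15):
    """Merge consecutive rainy weather-station intervals into event periods.
    precip: list of (timestamp_min, rainfall) pairs sampled every `step`
    minutes; each pair is assumed to cover [timestamp, timestamp + step).
    Returns half-open (start, end) periods in minutes."""
    periods = []
    for ts, rain in precip:
        if rain > 0:
            if periods and periods[-1][1] == ts:
                # Rain continues: extend the current event period.
                periods[-1] = (periods[-1][0], ts + step)
            else:
                periods.append((ts, ts + step))
    return periods


def label_events(sensor_times, periods):
    """True for each sensor timestamp that falls inside an event period."""
    return [any(start <= t < end for start, end in periods) for t in sensor_times]


precip = [(0, 0.0), (15, 1.2), (30, 0.4), (45, 0.0), (60, 2.0)]  # made-up record
periods = rain_periods(precip)
print(periods)                                   # [(15, 45), (60, 75)]
print(label_events(range(0, 80, 10), periods))   # 10-minute sensor samples
```

Labels produced this way identify the event readings that the misclassification error μ is computed over.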


Faults Ground Truth

Start with a clean data set

Inject faults to establish ground truth
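Injecting faults into a clean series yields exact ground truth, because the index of every injected fault is known. The sketch below covers the fault types listed earlier (short, noisy, unresponsive). The magnitudes, the Gaussian noise model, and the "frozen at the first value" model for an unresponsive sensor are assumptions for illustration, and the function names are hypothetical.

```python
import random


def inject_short(readings, n, magnitude, rng):
    """Offset n randomly chosen single readings by +/- magnitude.
    Returns (faulty copy, set of ground-truth fault indices)."""
    faulty = list(readings)
    idx = rng.sample(range(len(readings)), n)
    for i in idx:
        faulty[i] += rng.choice((-1, 1)) * magnitude
    return faulty, set(idx)


def inject_noise(readings, start, length, sigma, rng):
    """Add zero-mean Gaussian noise (std sigma) to a block of readings."""
    faulty = list(readings)
    for i in range(start, start + length):
        faulty[i] += rng.gauss(0.0, sigma)
    return faulty, set(range(start, start + length))


def inject_unresponsive(readings, start, length):
    """Freeze a block at its first value (little or no variation)."""
    faulty = list(readings)
    for i in range(start, start + length):
        faulty[i] = readings[start]
    return faulty, set(range(start, start + length))


rng = random.Random(42)  # fixed seed so the injected faults are reproducible
clean = [20.0 + 0.1 * (i % 5) for i in range(50)]  # hypothetical clean series
with_short, short_truth = inject_short(clean, n=3, magnitude=5.0, rng=rng)
print(sorted(short_truth))
```

Each function returns a copy, so the same clean series can be reused to build separate test sets for each fault type and detection method.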


Methodology

For each fault detection method and each modality:
– use the first month's data for training
– obtain the model parameters
– evaluate the method on the fault-injected test data
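The training step can be sketched for the NOISE rule. This sketch makes three assumptions. A month is taken as 30 days (30 × 144 = 4320 samples at one reading per 10 minutes). σtrain is the mean window standard deviation over the training month. σallow is k times the spread of those window standard deviations. The slides only say that these parameters come from training data, so this particular rule and the names are hypothetical.

```python
import statistics

SAMPLES_PER_DAY = 24 * 60 // 10  # one reading every 10 minutes -> 144


def split_train_test(readings, train_days=30):
    """Chronological split: the first train_days of samples train, the rest test."""
    cut = train_days * SAMPLES_PER_DAY
    return readings[:cut], readings[cut:]


def train_noise_params(train, window, k=3.0):
    """Assumed rule: sigma_train = mean std over non-overlapping windows of
    the training data; sigma_allow = k * std of those window stds."""
    sigmas = [statistics.pstdev(train[s:s + window])
              for s in range(0, len(train) - window + 1, window)]
    return statistics.fmean(sigmas), k * statistics.pstdev(sigmas)


series = [0.25 + 0.01 * ((i * 7) % 3) for i in range(5000)]  # hypothetical data
train, test = split_train_test(series)
sigma_train, sigma_allow = train_noise_params(train, window=6)
print(len(train), len(test), sigma_train, sigma_allow)
```

Splitting chronologically, rather than sampling at random, keeps the test months unseen during training, which matches the train-then-evaluate order described above.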


Soil Moisture: SHORT Rule

Reducing the number of misclassification errors increases the number of false negatives
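The tension between misclassification and false negatives can be reproduced with a threshold sweep. The sketch below reruns the SHORT rule for several values of δSHORT on a short, made-up soil moisture series. The series contains a rainfall-driven rise (indices 2 to 4) and one injected spike (index 6). The data, the thresholds, and the function name are hypothetical.

```python
def sweep_short_threshold(readings, is_event, is_fault, thresholds):
    """Run the SHORT rule once per candidate delta_short and return a list
    of (delta_short, misclassification mu, false negative ratio)."""
    n_event = sum(is_event)
    n_fault = sum(is_fault)
    if n_event == 0 or n_fault == 0:
        raise ValueError("need at least one event reading and one fault")
    results = []
    for delta in thresholds:
        # SHORT rule: flag reading i if it differs from reading i-1 by > delta.
        flags = [False] + [abs(readings[i] - readings[i - 1]) > delta
                           for i in range(1, len(readings))]
        # mu: share of event readings wrongly tagged as faults.
        mu = sum(e and f for e, f in zip(is_event, flags)) / n_event
        # False negative ratio: share of injected faults left unflagged.
        fnr = sum(g and not f for g, f in zip(is_fault, flags)) / n_fault
        results.append((delta, mu, fnr))
    return results


# Hypothetical soil moisture series: rain raises moisture at indices 2-4,
# and a short fault (spike) is injected at index 6.
moisture = [0.20, 0.20, 0.27, 0.31, 0.30, 0.30, 0.36, 0.30, 0.30]
event = [False, False, True, True, True, False, False, False, False]
fault = [False] * 6 + [True] + [False, False]

for delta, mu, fnr in sweep_short_threshold(moisture, event, fault, [0.02, 0.05, 0.10]):
    print(f"delta={delta:.2f}  mu={mu:.2f}  false negatives={fnr:.2f}")
```

With these made-up numbers, raising δSHORT from 0.02 to 0.10 drives μ from 2/3 to 0, but the injected spike then goes undetected (false negative ratio 1.0). The rain-driven jump (0.07) is larger than the fault (0.06), so no single threshold separates them.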


Misclassification: LLSE Method

Modality          Misclassification error   False negatives
Box temperature   0.3 %                     77.19 %
Soil moisture     46.3 %                    50.03 %

Higher misclassification (as for soil moisture) can be caused by the spatial and temporal heterogeneity of the soil


Lessons Learned

There exists a tension between detecting Events and Faults

Fault detection algorithms need to take this into consideration: events can be misclassified as faults

Need for novel fault detection methods that are robust in the presence of events


Need for Pattern Recognition techniques


Acknowledgements

Abhishek Sharma, Dept. of Computer Science, University of Southern California

Chris Swarth, Jug Bay Wetlands Sanctuary

Life Under Your Feet team

Marcus Chang, University of Copenhagen

(Courtesy: Andreas Terzis)


Questions?