Research Article A Self-Learning Sensor Fault Detection...

9
Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2013, Article ID 712028, 8 pages http://dx.doi.org/10.1155/2013/712028 Research Article A Self-Learning Sensor Fault Detection Framework for Industry Monitoring IoT Yu Liu, 1 Yang Yang, 2 Xiaopeng Lv, 3 and Lifeng Wang 4 1 State Key Laboratory of Soſtware Development Environment, Beihang University, Beijing 100191, China 2 Information Center of Guangdong Power Grid Corporation, China Southern Power Grid, Guangzhou 510620, China 3 Beijing Guotie Huachen Communication & Infomation Technology Co., Ltd., Beijing 10070, China 4 Beijing Electronic Science and Technology Institute, Beijing 100070, China Correspondence should be addressed to Yu Liu; [email protected] Received 8 June 2013; Accepted 3 August 2013 Academic Editor: Zhongmei Zhou Copyright © 2013 Yu Liu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Many applications based on Internet of ings (IoT) technology have recently founded in industry monitoring area. ousands of sensors with different types work together in an industry monitoring system. Sensors at different locations can generate streaming data, which can be analyzed in the data center. In this paper, we propose a framework for online sensor fault detection. We motivate our technique in the context of the problem of the data value fault detection and event detection. We use the Statistics Sliding Windows (SSW) to contain the recent sensor data and regress each window by Gaussian distribution. e regression result can be used to detect the data value fault. Devices on a production line may work in different workloads and the associate sensors will have different status. We divide the sensors into several status groups according to different part of production flow chat. In this way, the status of a sensor is associated with others in the same group. We fit the values in the Status Transform Window (STW) to get the slope and generate a group trend vector. By comparing the current trend vector with history ones, we can detect a rational or irrational event. In order to determine parameters for each status group we build a self-learning worker thread in our framework which can edit the corresponding parameter according to the user feedback. Group-based fault detection (GbFD) algorithm is proposed in this paper. We test the framework with a simulation dataset extracted from real data of an oil field. Test result shows that GbFD detects 95% sensor fault successfully. 1. Introduction Internet of ings (IoT) has been paid more and more attention by the government, academe, and industry all over the world because of its great prospect [13]. In the IoT application field, intelligent industry is an important branch. A wired or wireless sensor network is the basic facility of the industry monitoring IoT. ese networks comprising of thousands of inexpensive sensors can report their values to the data center. e aim of the monitoring system is to guarantee the process of production. Fault detection is an important process for industry mon- itoring IoT, but it is a difficult and complex task because there are many factors that influence data and could cause faults. And faults are application and sensor type dependent [46]. Sensors in an industry monitoring IoT have three features: (1) Big : thousands of sensors on different devices are working together, (2) Multitypes: many physical quantities are needed to determine the production status, (3) Uncertainty: different workload is needed according to the production plan and some devices need shut down for examination. So the values of correlative sensors will change between different levels. From the data-centric view, we focus on the Outliers, Stuck-at faults and Spikes [7]. From the application and system view, we focus on the rational and irrational trend detections. A rational trend means the sensor value trans- forms from one level to another smoothly and it is caused by a rational reason, such as shut down a device. An irrational trend means value changed when something is wrong with a device. e typical two mistakes of the monitoring system are taking a rational trend for an Outlier, or ignoring an irrational trend while values are still in range.

Transcript of Research Article A Self-Learning Sensor Fault Detection...

Page 1: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2013 Article ID 712028 8 pageshttpdxdoiorg1011552013712028

Research ArticleA Self-Learning Sensor Fault Detection Framework forIndustry Monitoring IoT

Yu Liu1 Yang Yang2 Xiaopeng Lv3 and Lifeng Wang4

1 State Key Laboratory of Software Development Environment Beihang University Beijing 100191 China2 Information Center of Guangdong Power Grid Corporation China Southern Power Grid Guangzhou 510620 China3 Beijing Guotie Huachen Communication amp Infomation Technology Co Ltd Beijing 10070 China4 Beijing Electronic Science and Technology Institute Beijing 100070 China

Correspondence should be addressed to Yu Liu liuyupapergmailcom

Received 8 June 2013 Accepted 3 August 2013

Academic Editor Zhongmei Zhou

Copyright copy 2013 Yu Liu et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Many applications based on Internet of Things (IoT) technology have recently founded in industry monitoring area Thousands ofsensors with different types work together in an industry monitoring system Sensors at different locations can generate streamingdata which can be analyzed in the data center In this paper we propose a framework for online sensor fault detectionWemotivateour technique in the context of the problem of the data value fault detection and event detection We use the Statistics SlidingWindows (SSW) to contain the recent sensor data and regress each window by Gaussian distribution The regression result can beused to detect the data value fault Devices on a production line may work in different workloads and the associate sensors will havedifferent status We divide the sensors into several status groups according to different part of production flow chat In this waythe status of a sensor is associated with others in the same group We fit the values in the Status Transform Window (STW) to getthe slope and generate a group trend vector By comparing the current trend vector with history ones we can detect a rational orirrational event In order to determine parameters for each status group we build a self-learning worker thread in our frameworkwhich can edit the corresponding parameter according to the user feedback Group-based fault detection (GbFD) algorithm isproposed in this paper We test the framework with a simulation dataset extracted from real data of an oil field Test result showsthat GbFD detects 95 sensor fault successfully

1 Introduction

Internet of Things (IoT) has been paid more and moreattention by the government academe and industry all overthe world because of its great prospect [1ndash3] In the IoTapplication field intelligent industry is an important branchA wired or wireless sensor network is the basic facility ofthe industry monitoring IoT These networks comprising ofthousands of inexpensive sensors can report their valuesto the data center The aim of the monitoring system is toguarantee the process of production

Fault detection is an important process for industrymon-itoring IoT but it is a difficult and complex task because thereare many factors that influence data and could cause faultsAnd faults are application and sensor type dependent [4ndash6]Sensors in an industry monitoring IoT have three features

(1) Big thousands of sensors on different devices are workingtogether (2) Multitypes many physical quantities are neededto determine the production status (3) Uncertainty differentworkload is needed according to the production plan andsome devices need shut down for examination So the valuesof correlative sensors will change between different levels

From the data-centric view we focus on the OutliersStuck-at faults and Spikes [7] From the application andsystem view we focus on the rational and irrational trenddetections A rational trend means the sensor value trans-forms from one level to another smoothly and it is caused bya rational reason such as shut down a device An irrationaltrend means value changed when something is wrong with adeviceThe typical twomistakes of themonitoring system aretaking a rational trend for anOutlier or ignoring an irrationaltrend while values are still in range

2 Mathematical Problems in Engineering

Wk W3 W2 W1middot middot middot Wr Wc

Time

Valu

e

VcVcminusm

Figure 1 Estimation of the data distribution in the Statistics SlidingWindows (SSW) for one time instance

P

P-0-1 E-2

E-3

E-1V-1

V-2

V-3

P-8

V-5

T

T-0-1

P

P-1-1T

T-1-1

P-1

P

P-2-1T

T-2-1

P

P-3-1T

T-3-1

L

L-1-1

L

L-2-1

L

L-3-1

P-2

P-3

Figure 2 Typical flowchat of a gas processing plant

In this paper we propose a self-learning sensor faultdetection framework for industry monitoring IoT The datamodel design is described in Section 3 In Section 4 theframework and the core algorithm are discussed A simula-tion experiment base on real data is shown in Section 5

2 Related Work

Many researchers pay their attention to building a smartmonitoring system Bressan et al [8] created a solid routinginfrastructure through RPL Castellani et al [9] concentrateon the actual implementation of the communication technol-ogy and presented a lightweight implementation of an EXIlibrary Yuan et al [10] present a parallel distributed structuralhealth monitoring technology based on the wireless sensornetwork An IoT communication framework for distributedworldwide health care applications is maintained in [11] Allthese works are focused on the basic frameworks protocolsand communication technologies of monitoring systems butdiscussed less on sensors management

For modeling sensor network data Guestrin et al [12]propose a framework for the nodes in the network to col-laborate in order to fit a global function to each of their localmeasurementsThis is a parametric approximation technique

0 10 20 30 40 50 60

0

10

20

30

40

50

60

70

80

90

100

Status transform window

Time (s)

Pressure P1Pressure P2Temperature T1

Sens

or v

alue

(KPa

forP

and∘C

forT

)Figure 3 Status transformation process for two time instances

and has more parameters then our approach References[13 14] study the problem of computing order statistics in asensor network There has also been work on predicting andcaching the values generated by the sensors [15 16] whichcan result in significant communication savings But all theseapproaches are not fit our setting

A similar approach for sensor fault detection in streamingdata is described by Yamanishi et al [17] In contrast to ourwork their method does not operate on sliding windows butrather on the entire history of the data values Chan et al[18] extend the study of algorithms formonitoring distributeddata streams fromwhole data streams to a time-based slidingwindow but their focus is on presenting a communication-efficient algorithm

Ding et al [19] combined trajectories of all nodes and theparamealgorithm which requires low computational over-head The proposed algorithm compared its sensor readingwith the median value of its neighborsrsquo readings Gao et al[20] approach WSN fault detection problems using spatialcorrelation with the assumption of similar reading withincross range of neighbor nodes Krishnamachari and Iyengar[21] tried to solve the faulty node detection problem by usinglocalized event region and they assume that the system knowsthe location of sensor In an industry monitoring sensornetwork finding out a neighbor automatically is very hardIn our approach we separate sensors into groups accordingto the production flow charts

3 Data Model Design

This section focuses on the sensor data model design Wemodel sensors from the view of value for Outliers Stuck-atfaults and Spikes detection and from the application view forevent detection The events we are interested in are rationaltrend and irrational trend

Mathematical Problems in Engineering 3

Self-learning sensor fault detection

Input sensor value (ID value)

Thread status queue

Working threads pool

Self-learning threadIterateTask timer

Feature selection

Trend vector

Incremental clustering

NB classification

Parameter feedback

Detection threadParameter filter

Outlier Stuck Spikes Status trend

Sensor fault warning

Application DB

DB operationControl flowIO interface

Output interface

Figure 4 A self-learning sensor fault detection framework

31 Sensor Value For detection the Outliers Stuck-at faultsand Spikes we propose a statistics method Figure 1 shows themechanism of Statistics Sliding Windows (SSW) For sensor119904 the current value V

119888and the previous 119898 values form the

current windows 119882119888= V119888minus119898

V119888minus119898+1

V119888minus119898+2

V119888 In the

recent history we can define 119896windows with the same lengthand get the set 119867V = 119882

111988221198823 119882

119896 We estimate

the values in each sliding window by Gaussian distribution119882119894

rarr 119873(120583119894 1205902119894) (Formula (1)) If 1205902

119888= 0 a Stuck-at fault

is detected And when 1205902119888is big enough a Spikes fault may

happened

119882119894997888rarr 119873(120583 120590

2

)

120583 =sum119898

119896=1V119896

119898 1205902

=sum119898

119896=1(V119896minus 120583)2

119898

(1)

There is a buffer named119882119903in front of the current window

119882119888 With the new samples coming into 119882

119888 the obsolete

samples deserted by 119882119888will become a member of 119882

119903 When

the size of 119882119903reach 119898 the oldest window in 119867V will be

discarded and the current 119882119903will join 119867V as 1198821 With SSW

large numbers of historical sensor values are regressed to 119896

pairs of Gaussian characteristics In a real application weneed not hold all the recent values in the memory

32 Status Group In the industry monitoring IoT the statusof a production line may be uncertain With the differentmanufacturing techniques and different workloads somedevices in the production line may be shut down In thiscase the whole production line is still working but valuesof sensors monitoring the shutdown devices will run out of

range (Outliers) At the same time related devices may alsochange with the shutting down operation In an industrymonitoring IoT the rational status transformation of sensorsshould be recognized and ignored

Figure 2 shows a typical flow chat of a gas-processingplant P-1 P-2 and P-3 are three parallel subpipelines whichare controlled by V-1 V-2 and V-3 respectively Each sub-pipeline can be shut down independently and this operationwill affect the value of P-0-1 a pressure sensor on the maininput pipeline In our approach we deposit the sensors inthe goal-processing plant into several status groups Althoughthe application background is important to the dispositionmethod we can follow some common rules as follows

119866 = 1198781 1198782 119878

119899 119878

119894isin VALVE

119875 119899 le 10 (2)

(1) Sensors on the opposite side of a valve are not inthe same group A valve is a typical controller inthe industrial IoT All the devices on the backwardposition can be shut down by the proper valve Thesensors on the different sidemay be in different status

(2) If there are too many sensors which are controlledby one slave they should be divided into differentgroups In our approach we will not put more than10 sensors into one status group because the probablestatus space will expand acutely with the increasingsensor numbers In this case sensors can be groupedby their relative position with the most complexdevice because the complex device may lead to statuschange most possibly

4 Mathematical Problems in Engineering

(1) let119882119898 be the statistics sliding windows size(2) let 120576

119904be the outlier detection threshold for sensor S and let P be the

global list to keep all of the 120576119904

(3) let U and V be the global arrays to keep the last 10 gaussiandistribution characteristics for each sensor

(4) let119882119899 be the status transform windows size and 1205791198944 be the trend

vector merging threshold for group 119866119894 all the rational trend vectors

of 119866119894are stored in 119877

119894

(5) procedure GFD ()(6) init P loading 120576 from Appilcation DB(7) init U and V loading the last 10 distribution characteristics from

Application DB(8) create C DetectionThread threads 119862 = 119873119906119898

119888119901119906lowast 2

(9) start SelfLearningThread()(10) do(11) get a value set V

119894for sensors in a group 119866

119894

(12) find a idle DetectionThread(13) DetectionThread(V

119894)

(14) while not end(15) return(16) procedureDetectionThread(V

119894)

(17) if (IsStuck(V119894) or IsSpikes(V

119894))

(18) return(19) if (IsOutlier(V

119894) and not IsRatStatChange(V

119894))

(20) return(21) IsRatStatChange(V

119894)

(22) return(23) procedure IsStuck(V

119894)

(24) get 120590119894119895

2 for sensor j by 119879119888

(25) if (120590119894119895

2 = 0)(26) mark 119904

119894119895as a Stuck

(27) return(28) procedure IsSpikes(V

119894)

(29) get 120583119894119895and 120590

119894119895

2 for sensor j by 119879119888

(30) if (120583119894119895mean(119880(120583

119894)) lt 120585 and 120590

119894119895

2mean(119881(120590119894

2)) gt 120591)(31) mark 119904

119894119895as a Spikes

(32) return(33) procedure IsOutlier(V

119894)

(34) use U and V to caculate 120579119894119895

(35) if (120579119894119895gt 120576119894119895) mark 119904

119894119895as an Outlier

(36) return(37) procedure IsRatStatChange(V

119894)

(38) use the last 119899 values of 119904119894to fit the trend vector 119878

119894

(39) for each rational trend 119903119894in 119877119894

(40) if (cosine(119903119894 119878119894) lt 1205791198944)

(41) mark 119878119894as a rational status change 119878

119894119895= mean(119878

119894119895 119903119894119895)

(42) mark the max unstable sensor 119904119894119895as a sensor falut

(43) return(44) procudure SelfLearningThread()(45) while (true) do(46) load the user feedback(47) for each miss alart(48) if missed Outlier then 120576

119894119895= 120576119894119895divide 120582

(49) if missed irrational status change then 120579119894= 120579119894times 120582

(50) IncClustering(119878119894 null)

(51) for each false alart(52) if false Outlier then 120576

119894119895= 120576119894119895times 120582

(53) if false irrational status change then 120579119894= 120579119894divide 120582

(54) IncClustering(119878119894 119903119894119895)

(55) sleep(119879) continues

Algorithm 1 Continued

Mathematical Problems in Engineering 5

(56) return(57) procedure IncClustering(119878

119894119903119894119895)

(58) if 119903119894119895is not null then for each 119878

119894119895in 119878119894

(59) if there is cosine(119903119894 119878119894) lt 120579119894then 119878

119894119895= mean(119878

119894119895 119903119894119895)

(60) else add 119903119894119895into 119878

119894

(61) for each two 119878119894119895 119878119894119896in 119878119894

(62) if (cosine(119878119894119895 119878119894119896) lt 120579119894)

(63) 119878119894119895= mean(119878

119894119895 119878119894119896) remove 119878

119894119896

(64) return

Algorithm 1 A group-based fault detection Algorithm

(3) The production line is normally divided into manyunits according to the geographical position We canignore the relationship between sensors in differentsections

For the 11 sensors shown in Figure 2 we divide theminto four groups which are 119866

0= 119875

0minus1 1198790minus1

1198661

=

1198751minus1

1198711minus1

1198791minus1

1198662

= 1198752minus1

1198712minus1

1198792minus1

and 1198663

=

1198753minus1

1198713minus1

1198793minus1

1198660represents the status of the main input

pipeline 1198661 1198662 and 119866

3are status groups which are derived

from the three parallel subpipelines respectively

33 Status Model For a status group 119866 = 1198781 1198782 119878

119899 all

the sensors in 119866may have different trends when the produc-tion status changed Figure 3 shows a status transformationprocess 119875

1 1198752 and 119879

1are stable in the first 30 and the

last 20 seconds The interval from 30 s to 40 s is called theStatus Transform Window (STW) 119875

1and 119875

2increase to a

new level while 1198791keep the same trend And the slopes of 119875

1

and 1198752are different We use the trend vector which contains

all the slopes of one sensorsrsquos group to represent the statustransformation that is 119878 = 119896

1 1198962 119896

119899 The trend of a

sensor can be found by fitting its values by a specific size ofSTW Here we use the least-square method [22] (Formula(3)) to fit the values and record the rational trend vectorby the sensor index We use the key idea of incrementalclustering algorithm [23] to handle the trend vectors get thecosine angle between the current trend and existing vectorsrespectively If the angle is big enough then mark it a newtrend Otherwise a repeated trend is found and we only needto merge it with the closest vector In Section 4 the specificclustering method will be given for details

= 120572 + 120573119883

120572 =sum119884

119899minus

120573sum119883

119899 120573 =

119899sum119883119884 minus sum119883sum119884

119899sum1198832 minus (sum119883)2

(3)

4 Design of Architecture

In this section we propose the architecture for sensor faultdetection in industry monitoring at firstThen how to realizeour algorithm is discussed

Table 1 Simulation data description

Feature ValueSampling rate 60 sSensor count 5800Total samples 75168 millionStatus groups 1340Outliers 437Stuck-at amp Spikes 163RT missed 961IRT missed 35

41 A Self-Learning Framework The self-learning sensorfault detection architecture is shown in Figure 4 There arefour modules in our approach

(1) Application DB all the parameters are stored in theApplication DB including the threshold 120576

119904for sensor

119904 and the history statistics result 120583119894 1205902119894 The grouping

information is also serialized in this database ThisDB is the interface for high layer application whichcan get the sensor fault prediction and input the userfeedback

(2) DetectionThread it is a background service and con-tains the main detecting process Since our approachis an online detection a series of detection thread willbe created andmaintained by theworking thread pooland a related status queue When new data is comingthe working thread dispatcher a wakes up a pendingthread to handle it

(3) Self-Learning Thread the self-learning thread usesthe OS timer as a driver The user feedback about thedetection result will be rechecked by this thread torevise the trend vectors

42 The GbFD Algorithm Dividing the sensors into statusgroup is the key idea in our approach For group-based sensorfault detection we propose the (group-based fault detection)GbFD algorithm

The GbFD Algorithm 1 starts by initializing the globalparameters (line 6-7) then it instantiates the two coreprocesses SelfLearningThread andDetectionThreadThe input

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

2 Mathematical Problems in Engineering

Wk W3 W2 W1middot middot middot Wr Wc

Time

Valu

e

VcVcminusm

Figure 1 Estimation of the data distribution in the Statistics SlidingWindows (SSW) for one time instance

P

P-0-1 E-2

E-3

E-1V-1

V-2

V-3

P-8

V-5

T

T-0-1

P

P-1-1T

T-1-1

P-1

P

P-2-1T

T-2-1

P

P-3-1T

T-3-1

L

L-1-1

L

L-2-1

L

L-3-1

P-2

P-3

Figure 2 Typical flowchat of a gas processing plant

In this paper we propose a self-learning sensor faultdetection framework for industry monitoring IoT The datamodel design is described in Section 3 In Section 4 theframework and the core algorithm are discussed A simula-tion experiment base on real data is shown in Section 5

2 Related Work

Many researchers pay their attention to building a smartmonitoring system Bressan et al [8] created a solid routinginfrastructure through RPL Castellani et al [9] concentrateon the actual implementation of the communication technol-ogy and presented a lightweight implementation of an EXIlibrary Yuan et al [10] present a parallel distributed structuralhealth monitoring technology based on the wireless sensornetwork An IoT communication framework for distributedworldwide health care applications is maintained in [11] Allthese works are focused on the basic frameworks protocolsand communication technologies of monitoring systems butdiscussed less on sensors management

For modeling sensor network data Guestrin et al [12]propose a framework for the nodes in the network to col-laborate in order to fit a global function to each of their localmeasurementsThis is a parametric approximation technique

0 10 20 30 40 50 60

0

10

20

30

40

50

60

70

80

90

100

Status transform window

Time (s)

Pressure P1Pressure P2Temperature T1

Sens

or v

alue

(KPa

forP

and∘C

forT

)Figure 3 Status transformation process for two time instances

and has more parameters then our approach References[13 14] study the problem of computing order statistics in asensor network There has also been work on predicting andcaching the values generated by the sensors [15 16] whichcan result in significant communication savings But all theseapproaches are not fit our setting

A similar approach for sensor fault detection in streamingdata is described by Yamanishi et al [17] In contrast to ourwork their method does not operate on sliding windows butrather on the entire history of the data values Chan et al[18] extend the study of algorithms formonitoring distributeddata streams fromwhole data streams to a time-based slidingwindow but their focus is on presenting a communication-efficient algorithm

Ding et al [19] combined trajectories of all nodes and theparamealgorithm which requires low computational over-head The proposed algorithm compared its sensor readingwith the median value of its neighborsrsquo readings Gao et al[20] approach WSN fault detection problems using spatialcorrelation with the assumption of similar reading withincross range of neighbor nodes Krishnamachari and Iyengar[21] tried to solve the faulty node detection problem by usinglocalized event region and they assume that the system knowsthe location of sensor In an industry monitoring sensornetwork finding out a neighbor automatically is very hardIn our approach we separate sensors into groups accordingto the production flow charts

3 Data Model Design

This section focuses on the sensor data model design Wemodel sensors from the view of value for Outliers Stuck-atfaults and Spikes detection and from the application view forevent detection The events we are interested in are rationaltrend and irrational trend

Mathematical Problems in Engineering 3

Self-learning sensor fault detection

Input sensor value (ID value)

Thread status queue

Working threads pool

Self-learning threadIterateTask timer

Feature selection

Trend vector

Incremental clustering

NB classification

Parameter feedback

Detection threadParameter filter

Outlier Stuck Spikes Status trend

Sensor fault warning

Application DB

DB operationControl flowIO interface

Output interface

Figure 4 A self-learning sensor fault detection framework

31 Sensor Value For detection the Outliers Stuck-at faultsand Spikes we propose a statistics method Figure 1 shows themechanism of Statistics Sliding Windows (SSW) For sensor119904 the current value V

119888and the previous 119898 values form the

current windows 119882119888= V119888minus119898

V119888minus119898+1

V119888minus119898+2

V119888 In the

recent history we can define 119896windows with the same lengthand get the set 119867V = 119882

111988221198823 119882

119896 We estimate

the values in each sliding window by Gaussian distribution119882119894

rarr 119873(120583119894 1205902119894) (Formula (1)) If 1205902

119888= 0 a Stuck-at fault

is detected And when 1205902119888is big enough a Spikes fault may

happened

119882119894997888rarr 119873(120583 120590

2

)

120583 =sum119898

119896=1V119896

119898 1205902

=sum119898

119896=1(V119896minus 120583)2

119898

(1)

There is a buffer named119882119903in front of the current window

119882119888 With the new samples coming into 119882

119888 the obsolete

samples deserted by 119882119888will become a member of 119882

119903 When

the size of 119882119903reach 119898 the oldest window in 119867V will be

discarded and the current 119882119903will join 119867V as 1198821 With SSW

large numbers of historical sensor values are regressed to 119896

pairs of Gaussian characteristics In a real application weneed not hold all the recent values in the memory

32 Status Group In the industry monitoring IoT the statusof a production line may be uncertain With the differentmanufacturing techniques and different workloads somedevices in the production line may be shut down In thiscase the whole production line is still working but valuesof sensors monitoring the shutdown devices will run out of

range (Outliers) At the same time related devices may alsochange with the shutting down operation In an industrymonitoring IoT the rational status transformation of sensorsshould be recognized and ignored

Figure 2 shows a typical flow chat of a gas-processingplant P-1 P-2 and P-3 are three parallel subpipelines whichare controlled by V-1 V-2 and V-3 respectively Each sub-pipeline can be shut down independently and this operationwill affect the value of P-0-1 a pressure sensor on the maininput pipeline In our approach we deposit the sensors inthe goal-processing plant into several status groups Althoughthe application background is important to the dispositionmethod we can follow some common rules as follows

119866 = 1198781 1198782 119878

119899 119878

119894isin VALVE

119875 119899 le 10 (2)

(1) Sensors on the opposite side of a valve are not inthe same group A valve is a typical controller inthe industrial IoT All the devices on the backwardposition can be shut down by the proper valve Thesensors on the different sidemay be in different status

(2) If there are too many sensors which are controlledby one slave they should be divided into differentgroups In our approach we will not put more than10 sensors into one status group because the probablestatus space will expand acutely with the increasingsensor numbers In this case sensors can be groupedby their relative position with the most complexdevice because the complex device may lead to statuschange most possibly

4 Mathematical Problems in Engineering

(1) let119882119898 be the statistics sliding windows size(2) let 120576

119904be the outlier detection threshold for sensor S and let P be the

global list to keep all of the 120576119904

(3) let U and V be the global arrays to keep the last 10 gaussiandistribution characteristics for each sensor

(4) let119882119899 be the status transform windows size and 1205791198944 be the trend

vector merging threshold for group 119866119894 all the rational trend vectors

of 119866119894are stored in 119877

119894

(5) procedure GFD ()(6) init P loading 120576 from Appilcation DB(7) init U and V loading the last 10 distribution characteristics from

Application DB(8) create C DetectionThread threads 119862 = 119873119906119898

119888119901119906lowast 2

(9) start SelfLearningThread()(10) do(11) get a value set V

119894for sensors in a group 119866

119894

(12) find a idle DetectionThread(13) DetectionThread(V

119894)

(14) while not end(15) return(16) procedureDetectionThread(V

119894)

(17) if (IsStuck(V119894) or IsSpikes(V

119894))

(18) return(19) if (IsOutlier(V

119894) and not IsRatStatChange(V

119894))

(20) return(21) IsRatStatChange(V

119894)

(22) return(23) procedure IsStuck(V

119894)

(24) get 120590119894119895

2 for sensor j by 119879119888

(25) if (120590119894119895

2 = 0)(26) mark 119904

119894119895as a Stuck

(27) return(28) procedure IsSpikes(V

119894)

(29) get 120583119894119895and 120590

119894119895

2 for sensor j by 119879119888

(30) if (120583119894119895mean(119880(120583

119894)) lt 120585 and 120590

119894119895

2mean(119881(120590119894

2)) gt 120591)(31) mark 119904

119894119895as a Spikes

(32) return(33) procedure IsOutlier(V

119894)

(34) use U and V to caculate 120579119894119895

(35) if (120579119894119895gt 120576119894119895) mark 119904

119894119895as an Outlier

(36) return(37) procedure IsRatStatChange(V

119894)

(38) use the last 119899 values of 119904119894to fit the trend vector 119878

119894

(39) for each rational trend 119903119894in 119877119894

(40) if (cosine(119903119894 119878119894) lt 1205791198944)

(41) mark 119878119894as a rational status change 119878

119894119895= mean(119878

119894119895 119903119894119895)

(42) mark the max unstable sensor 119904119894119895as a sensor falut

(43) return(44) procudure SelfLearningThread()(45) while (true) do(46) load the user feedback(47) for each miss alart(48) if missed Outlier then 120576

119894119895= 120576119894119895divide 120582

(49) if missed irrational status change then 120579119894= 120579119894times 120582

(50) IncClustering(119878119894 null)

(51) for each false alart(52) if false Outlier then 120576

119894119895= 120576119894119895times 120582

(53) if false irrational status change then 120579119894= 120579119894divide 120582

(54) IncClustering(119878119894 119903119894119895)

(55) sleep(119879) continues

Algorithm 1 Continued

Mathematical Problems in Engineering 5

(56) return(57) procedure IncClustering(119878

119894119903119894119895)

(58) if 119903119894119895is not null then for each 119878

119894119895in 119878119894

(59) if there is cosine(119903119894 119878119894) lt 120579119894then 119878

119894119895= mean(119878

119894119895 119903119894119895)

(60) else add 119903119894119895into 119878

119894

(61) for each two 119878119894119895 119878119894119896in 119878119894

(62) if (cosine(119878119894119895 119878119894119896) lt 120579119894)

(63) 119878119894119895= mean(119878

119894119895 119878119894119896) remove 119878

119894119896

(64) return

Algorithm 1 A group-based fault detection Algorithm

(3) The production line is normally divided into manyunits according to the geographical position We canignore the relationship between sensors in differentsections

For the 11 sensors shown in Figure 2 we divide theminto four groups which are 119866

0= 119875

0minus1 1198790minus1

1198661

=

1198751minus1

1198711minus1

1198791minus1

1198662

= 1198752minus1

1198712minus1

1198792minus1

and 1198663

=

1198753minus1

1198713minus1

1198793minus1

1198660represents the status of the main input

pipeline 1198661 1198662 and 119866

3are status groups which are derived

from the three parallel subpipelines respectively

33 Status Model For a status group 119866 = 1198781 1198782 119878

119899 all

the sensors in 119866may have different trends when the produc-tion status changed Figure 3 shows a status transformationprocess 119875

1 1198752 and 119879

1are stable in the first 30 and the

last 20 seconds The interval from 30 s to 40 s is called theStatus Transform Window (STW) 119875

1and 119875

2increase to a

new level while 1198791keep the same trend And the slopes of 119875

1

and 1198752are different We use the trend vector which contains

all the slopes of one sensorsrsquos group to represent the statustransformation that is 119878 = 119896

1 1198962 119896

119899 The trend of a

sensor can be found by fitting its values by a specific size ofSTW Here we use the least-square method [22] (Formula(3)) to fit the values and record the rational trend vectorby the sensor index We use the key idea of incrementalclustering algorithm [23] to handle the trend vectors get thecosine angle between the current trend and existing vectorsrespectively If the angle is big enough then mark it a newtrend Otherwise a repeated trend is found and we only needto merge it with the closest vector In Section 4 the specificclustering method will be given for details

= 120572 + 120573119883

120572 =sum119884

119899minus

120573sum119883

119899 120573 =

119899sum119883119884 minus sum119883sum119884

119899sum1198832 minus (sum119883)2

(3)

4 Design of Architecture

In this section we propose the architecture for sensor faultdetection in industry monitoring at firstThen how to realizeour algorithm is discussed

Table 1 Simulation data description

Feature ValueSampling rate 60 sSensor count 5800Total samples 75168 millionStatus groups 1340Outliers 437Stuck-at amp Spikes 163RT missed 961IRT missed 35

41 A Self-Learning Framework The self-learning sensorfault detection architecture is shown in Figure 4 There arefour modules in our approach

(1) Application DB all the parameters are stored in theApplication DB including the threshold 120576

119904for sensor

119904 and the history statistics result 120583119894 1205902119894 The grouping

information is also serialized in this database ThisDB is the interface for high layer application whichcan get the sensor fault prediction and input the userfeedback

(2) DetectionThread it is a background service and con-tains the main detecting process Since our approachis an online detection a series of detection thread willbe created andmaintained by theworking thread pooland a related status queue When new data is comingthe working thread dispatcher a wakes up a pendingthread to handle it

(3) Self-Learning Thread the self-learning thread usesthe OS timer as a driver The user feedback about thedetection result will be rechecked by this thread torevise the trend vectors

42 The GbFD Algorithm Dividing the sensors into statusgroup is the key idea in our approach For group-based sensorfault detection we propose the (group-based fault detection)GbFD algorithm

The GbFD Algorithm 1 starts by initializing the globalparameters (line 6-7) then it instantiates the two coreprocesses SelfLearningThread andDetectionThreadThe input

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

Mathematical Problems in Engineering 3

Self-learning sensor fault detection

Input sensor value (ID value)

Thread status queue

Working threads pool

Self-learning threadIterateTask timer

Feature selection

Trend vector

Incremental clustering

NB classification

Parameter feedback

Detection threadParameter filter

Outlier Stuck Spikes Status trend

Sensor fault warning

Application DB

DB operationControl flowIO interface

Output interface

Figure 4 A self-learning sensor fault detection framework

31 Sensor Value For detection the Outliers Stuck-at faultsand Spikes we propose a statistics method Figure 1 shows themechanism of Statistics Sliding Windows (SSW) For sensor119904 the current value V

119888and the previous 119898 values form the

current windows 119882119888= V119888minus119898

V119888minus119898+1

V119888minus119898+2

V119888 In the

recent history we can define 119896windows with the same lengthand get the set 119867V = 119882

111988221198823 119882

119896 We estimate

the values in each sliding window by Gaussian distribution119882119894

rarr 119873(120583119894 1205902119894) (Formula (1)) If 1205902

119888= 0 a Stuck-at fault

is detected And when 1205902119888is big enough a Spikes fault may

happened

119882119894997888rarr 119873(120583 120590

2

)

120583 =sum119898

119896=1V119896

119898 1205902

=sum119898

119896=1(V119896minus 120583)2

119898

(1)

There is a buffer named119882119903in front of the current window

119882119888 With the new samples coming into 119882

119888 the obsolete

samples deserted by 119882119888will become a member of 119882

119903 When

the size of 119882119903reach 119898 the oldest window in 119867V will be

discarded and the current 119882119903will join 119867V as 1198821 With SSW

large numbers of historical sensor values are regressed to 119896

pairs of Gaussian characteristics In a real application weneed not hold all the recent values in the memory

32 Status Group In the industry monitoring IoT the statusof a production line may be uncertain With the differentmanufacturing techniques and different workloads somedevices in the production line may be shut down In thiscase the whole production line is still working but valuesof sensors monitoring the shutdown devices will run out of

range (Outliers) At the same time related devices may alsochange with the shutting down operation In an industrymonitoring IoT the rational status transformation of sensorsshould be recognized and ignored

Figure 2 shows a typical flow chat of a gas-processingplant P-1 P-2 and P-3 are three parallel subpipelines whichare controlled by V-1 V-2 and V-3 respectively Each sub-pipeline can be shut down independently and this operationwill affect the value of P-0-1 a pressure sensor on the maininput pipeline In our approach we deposit the sensors inthe goal-processing plant into several status groups Althoughthe application background is important to the dispositionmethod we can follow some common rules as follows

119866 = 1198781 1198782 119878

119899 119878

119894isin VALVE

119875 119899 le 10 (2)

(1) Sensors on the opposite side of a valve are not inthe same group A valve is a typical controller inthe industrial IoT All the devices on the backwardposition can be shut down by the proper valve Thesensors on the different sidemay be in different status

(2) If there are too many sensors which are controlledby one slave they should be divided into differentgroups In our approach we will not put more than10 sensors into one status group because the probablestatus space will expand acutely with the increasingsensor numbers In this case sensors can be groupedby their relative position with the most complexdevice because the complex device may lead to statuschange most possibly

4 Mathematical Problems in Engineering

(1) let119882119898 be the statistics sliding windows size(2) let 120576

119904be the outlier detection threshold for sensor S and let P be the

global list to keep all of the 120576119904

(3) let U and V be the global arrays to keep the last 10 gaussiandistribution characteristics for each sensor

(4) let119882119899 be the status transform windows size and 1205791198944 be the trend

vector merging threshold for group 119866119894 all the rational trend vectors

of 119866119894are stored in 119877

119894

(5) procedure GFD ()(6) init P loading 120576 from Appilcation DB(7) init U and V loading the last 10 distribution characteristics from

Application DB(8) create C DetectionThread threads 119862 = 119873119906119898

119888119901119906lowast 2

(9) start SelfLearningThread()(10) do(11) get a value set V

119894for sensors in a group 119866

119894

(12) find a idle DetectionThread(13) DetectionThread(V

119894)

(14) while not end(15) return(16) procedureDetectionThread(V

119894)

(17) if (IsStuck(V119894) or IsSpikes(V

119894))

(18) return(19) if (IsOutlier(V

119894) and not IsRatStatChange(V

119894))

(20) return(21) IsRatStatChange(V

119894)

(22) return(23) procedure IsStuck(V

119894)

(24) get 120590119894119895

2 for sensor j by 119879119888

(25) if (120590119894119895

2 = 0)(26) mark 119904

119894119895as a Stuck

(27) return(28) procedure IsSpikes(V

119894)

(29) get 120583119894119895and 120590

119894119895

2 for sensor j by 119879119888

(30) if (120583119894119895mean(119880(120583

119894)) lt 120585 and 120590

119894119895

2mean(119881(120590119894

2)) gt 120591)(31) mark 119904

119894119895as a Spikes

(32) return(33) procedure IsOutlier(V

119894)

(34) use U and V to caculate 120579119894119895

(35) if (120579119894119895gt 120576119894119895) mark 119904

119894119895as an Outlier

(36) return(37) procedure IsRatStatChange(V

119894)

(38) use the last 119899 values of 119904119894to fit the trend vector 119878

119894

(39) for each rational trend 119903119894in 119877119894

(40) if (cosine(119903119894 119878119894) lt 1205791198944)

(41) mark 119878119894as a rational status change 119878

119894119895= mean(119878

119894119895 119903119894119895)

(42) mark the max unstable sensor 119904119894119895as a sensor falut

(43) return(44) procudure SelfLearningThread()(45) while (true) do(46) load the user feedback(47) for each miss alart(48) if missed Outlier then 120576

119894119895= 120576119894119895divide 120582

(49) if missed irrational status change then 120579119894= 120579119894times 120582

(50) IncClustering(119878119894 null)

(51) for each false alart(52) if false Outlier then 120576

119894119895= 120576119894119895times 120582

(53) if false irrational status change then 120579119894= 120579119894divide 120582

(54) IncClustering(119878119894 119903119894119895)

(55) sleep(119879) continues

Algorithm 1 Continued

Mathematical Problems in Engineering 5

(56) return(57) procedure IncClustering(119878

119894119903119894119895)

(58) if 119903119894119895is not null then for each 119878

119894119895in 119878119894

(59) if there is cosine(119903119894 119878119894) lt 120579119894then 119878

119894119895= mean(119878

119894119895 119903119894119895)

(60) else add 119903119894119895into 119878

119894

(61) for each two 119878119894119895 119878119894119896in 119878119894

(62) if (cosine(119878119894119895 119878119894119896) lt 120579119894)

(63) 119878119894119895= mean(119878

119894119895 119878119894119896) remove 119878

119894119896

(64) return

Algorithm 1 A group-based fault detection Algorithm

(3) The production line is normally divided into manyunits according to the geographical position We canignore the relationship between sensors in differentsections

For the 11 sensors shown in Figure 2 we divide theminto four groups which are 119866

0= 119875

0minus1 1198790minus1

1198661

=

1198751minus1

1198711minus1

1198791minus1

1198662

= 1198752minus1

1198712minus1

1198792minus1

and 1198663

=

1198753minus1

1198713minus1

1198793minus1

1198660represents the status of the main input

pipeline 1198661 1198662 and 119866

3are status groups which are derived

from the three parallel subpipelines respectively

33 Status Model For a status group 119866 = 1198781 1198782 119878

119899 all

the sensors in 119866may have different trends when the produc-tion status changed Figure 3 shows a status transformationprocess 119875

1 1198752 and 119879

1are stable in the first 30 and the

last 20 seconds The interval from 30 s to 40 s is called theStatus Transform Window (STW) 119875

1and 119875

2increase to a

new level while 1198791keep the same trend And the slopes of 119875

1

and 1198752are different We use the trend vector which contains

all the slopes of one sensorsrsquos group to represent the statustransformation that is 119878 = 119896

1 1198962 119896

119899 The trend of a

sensor can be found by fitting its values by a specific size ofSTW Here we use the least-square method [22] (Formula(3)) to fit the values and record the rational trend vectorby the sensor index We use the key idea of incrementalclustering algorithm [23] to handle the trend vectors get thecosine angle between the current trend and existing vectorsrespectively If the angle is big enough then mark it a newtrend Otherwise a repeated trend is found and we only needto merge it with the closest vector In Section 4 the specificclustering method will be given for details

= 120572 + 120573119883

120572 =sum119884

119899minus

120573sum119883

119899 120573 =

119899sum119883119884 minus sum119883sum119884

119899sum1198832 minus (sum119883)2

(3)

4 Design of Architecture

In this section we propose the architecture for sensor faultdetection in industry monitoring at firstThen how to realizeour algorithm is discussed

Table 1 Simulation data description

Feature ValueSampling rate 60 sSensor count 5800Total samples 75168 millionStatus groups 1340Outliers 437Stuck-at amp Spikes 163RT missed 961IRT missed 35

41 A Self-Learning Framework The self-learning sensorfault detection architecture is shown in Figure 4 There arefour modules in our approach

(1) Application DB all the parameters are stored in theApplication DB including the threshold 120576

119904for sensor

119904 and the history statistics result 120583119894 1205902119894 The grouping

information is also serialized in this database ThisDB is the interface for high layer application whichcan get the sensor fault prediction and input the userfeedback

(2) DetectionThread it is a background service and con-tains the main detecting process Since our approachis an online detection a series of detection thread willbe created andmaintained by theworking thread pooland a related status queue When new data is comingthe working thread dispatcher a wakes up a pendingthread to handle it

(3) Self-Learning Thread the self-learning thread usesthe OS timer as a driver The user feedback about thedetection result will be rechecked by this thread torevise the trend vectors

42 The GbFD Algorithm Dividing the sensors into statusgroup is the key idea in our approach For group-based sensorfault detection we propose the (group-based fault detection)GbFD algorithm

The GbFD Algorithm 1 starts by initializing the globalparameters (line 6-7) then it instantiates the two coreprocesses SelfLearningThread andDetectionThreadThe input

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

4 Mathematical Problems in Engineering

(1) let119882119898 be the statistics sliding windows size(2) let 120576

119904be the outlier detection threshold for sensor S and let P be the

global list to keep all of the 120576119904

(3) let U and V be the global arrays to keep the last 10 gaussiandistribution characteristics for each sensor

(4) let119882119899 be the status transform windows size and 1205791198944 be the trend

vector merging threshold for group 119866119894 all the rational trend vectors

of 119866119894are stored in 119877

119894

(5) procedure GFD ()(6) init P loading 120576 from Appilcation DB(7) init U and V loading the last 10 distribution characteristics from

Application DB(8) create C DetectionThread threads 119862 = 119873119906119898

119888119901119906lowast 2

(9) start SelfLearningThread()(10) do(11) get a value set V

119894for sensors in a group 119866

119894

(12) find a idle DetectionThread(13) DetectionThread(V

119894)

(14) while not end(15) return(16) procedureDetectionThread(V

119894)

(17) if (IsStuck(V119894) or IsSpikes(V

119894))

(18) return(19) if (IsOutlier(V

119894) and not IsRatStatChange(V

119894))

(20) return(21) IsRatStatChange(V

119894)

(22) return(23) procedure IsStuck(V

119894)

(24) get 120590119894119895

2 for sensor j by 119879119888

(25) if (120590119894119895

2 = 0)(26) mark 119904

119894119895as a Stuck

(27) return(28) procedure IsSpikes(V

119894)

(29) get 120583119894119895and 120590

119894119895

2 for sensor j by 119879119888

(30) if (120583119894119895mean(119880(120583

119894)) lt 120585 and 120590

119894119895

2mean(119881(120590119894

2)) gt 120591)(31) mark 119904

119894119895as a Spikes

(32) return(33) procedure IsOutlier(V

119894)

(34) use U and V to caculate 120579119894119895

(35) if (120579119894119895gt 120576119894119895) mark 119904

119894119895as an Outlier

(36) return(37) procedure IsRatStatChange(V

119894)

(38) use the last 119899 values of 119904119894to fit the trend vector 119878

119894

(39) for each rational trend 119903119894in 119877119894

(40) if (cosine(119903119894 119878119894) lt 1205791198944)

(41) mark 119878119894as a rational status change 119878

119894119895= mean(119878

119894119895 119903119894119895)

(42) mark the max unstable sensor 119904119894119895as a sensor falut

(43) return(44) procudure SelfLearningThread()(45) while (true) do(46) load the user feedback(47) for each miss alart(48) if missed Outlier then 120576

119894119895= 120576119894119895divide 120582

(49) if missed irrational status change then 120579119894= 120579119894times 120582

(50) IncClustering(119878119894 null)

(51) for each false alart(52) if false Outlier then 120576

119894119895= 120576119894119895times 120582

(53) if false irrational status change then 120579119894= 120579119894divide 120582

(54) IncClustering(119878119894 119903119894119895)

(55) sleep(119879) continues

Algorithm 1 Continued

Mathematical Problems in Engineering 5

(56) return(57) procedure IncClustering(119878

119894119903119894119895)

(58) if 119903119894119895is not null then for each 119878

119894119895in 119878119894

(59) if there is cosine(119903119894 119878119894) lt 120579119894then 119878

119894119895= mean(119878

119894119895 119903119894119895)

(60) else add 119903119894119895into 119878

119894

(61) for each two 119878119894119895 119878119894119896in 119878119894

(62) if (cosine(119878119894119895 119878119894119896) lt 120579119894)

(63) 119878119894119895= mean(119878

119894119895 119878119894119896) remove 119878

119894119896

(64) return

Algorithm 1 A group-based fault detection Algorithm

(3) The production line is normally divided into manyunits according to the geographical position We canignore the relationship between sensors in differentsections

For the 11 sensors shown in Figure 2 we divide theminto four groups which are 119866

0= 119875

0minus1 1198790minus1

1198661

=

1198751minus1

1198711minus1

1198791minus1

1198662

= 1198752minus1

1198712minus1

1198792minus1

and 1198663

=

1198753minus1

1198713minus1

1198793minus1

1198660represents the status of the main input

pipeline 1198661 1198662 and 119866

3are status groups which are derived

from the three parallel subpipelines respectively

33 Status Model For a status group 119866 = 1198781 1198782 119878

119899 all

the sensors in 119866may have different trends when the produc-tion status changed Figure 3 shows a status transformationprocess 119875

1 1198752 and 119879

1are stable in the first 30 and the

last 20 seconds The interval from 30 s to 40 s is called theStatus Transform Window (STW) 119875

1and 119875

2increase to a

new level while 1198791keep the same trend And the slopes of 119875

1

and 1198752are different We use the trend vector which contains

all the slopes of one sensorsrsquos group to represent the statustransformation that is 119878 = 119896

1 1198962 119896

119899 The trend of a

sensor can be found by fitting its values by a specific size ofSTW Here we use the least-square method [22] (Formula(3)) to fit the values and record the rational trend vectorby the sensor index We use the key idea of incrementalclustering algorithm [23] to handle the trend vectors get thecosine angle between the current trend and existing vectorsrespectively If the angle is big enough then mark it a newtrend Otherwise a repeated trend is found and we only needto merge it with the closest vector In Section 4 the specificclustering method will be given for details

= 120572 + 120573119883

120572 =sum119884

119899minus

120573sum119883

119899 120573 =

119899sum119883119884 minus sum119883sum119884

119899sum1198832 minus (sum119883)2

(3)

4 Design of Architecture

In this section we propose the architecture for sensor faultdetection in industry monitoring at firstThen how to realizeour algorithm is discussed

Table 1 Simulation data description

Feature ValueSampling rate 60 sSensor count 5800Total samples 75168 millionStatus groups 1340Outliers 437Stuck-at amp Spikes 163RT missed 961IRT missed 35

41 A Self-Learning Framework The self-learning sensorfault detection architecture is shown in Figure 4 There arefour modules in our approach

(1) Application DB all the parameters are stored in theApplication DB including the threshold 120576

119904for sensor

119904 and the history statistics result 120583119894 1205902119894 The grouping

information is also serialized in this database ThisDB is the interface for high layer application whichcan get the sensor fault prediction and input the userfeedback

(2) DetectionThread it is a background service and con-tains the main detecting process Since our approachis an online detection a series of detection thread willbe created andmaintained by theworking thread pooland a related status queue When new data is comingthe working thread dispatcher a wakes up a pendingthread to handle it

(3) Self-Learning Thread the self-learning thread usesthe OS timer as a driver The user feedback about thedetection result will be rechecked by this thread torevise the trend vectors

42 The GbFD Algorithm Dividing the sensors into statusgroup is the key idea in our approach For group-based sensorfault detection we propose the (group-based fault detection)GbFD algorithm

The GbFD Algorithm 1 starts by initializing the globalparameters (line 6-7) then it instantiates the two coreprocesses SelfLearningThread andDetectionThreadThe input

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

Mathematical Problems in Engineering 5

(56) return(57) procedure IncClustering(119878

119894119903119894119895)

(58) if 119903119894119895is not null then for each 119878

119894119895in 119878119894

(59) if there is cosine(119903119894 119878119894) lt 120579119894then 119878

119894119895= mean(119878

119894119895 119903119894119895)

(60) else add 119903119894119895into 119878

119894

(61) for each two 119878119894119895 119878119894119896in 119878119894

(62) if (cosine(119878119894119895 119878119894119896) lt 120579119894)

(63) 119878119894119895= mean(119878

119894119895 119878119894119896) remove 119878

119894119896

(64) return

Algorithm 1 A group-based fault detection Algorithm

(3) The production line is normally divided into manyunits according to the geographical position We canignore the relationship between sensors in differentsections

For the 11 sensors shown in Figure 2 we divide theminto four groups which are 119866

0= 119875

0minus1 1198790minus1

1198661

=

1198751minus1

1198711minus1

1198791minus1

1198662

= 1198752minus1

1198712minus1

1198792minus1

and 1198663

=

1198753minus1

1198713minus1

1198793minus1

1198660represents the status of the main input

pipeline 1198661 1198662 and 119866

3are status groups which are derived

from the three parallel subpipelines respectively

33 Status Model For a status group 119866 = 1198781 1198782 119878

119899 all

the sensors in 119866may have different trends when the produc-tion status changed Figure 3 shows a status transformationprocess 119875

1 1198752 and 119879

1are stable in the first 30 and the

last 20 seconds The interval from 30 s to 40 s is called theStatus Transform Window (STW) 119875

1and 119875

2increase to a

new level while 1198791keep the same trend And the slopes of 119875

1

and 1198752are different We use the trend vector which contains

all the slopes of one sensorsrsquos group to represent the statustransformation that is 119878 = 119896

1 1198962 119896

119899 The trend of a

sensor can be found by fitting its values by a specific size ofSTW Here we use the least-square method [22] (Formula(3)) to fit the values and record the rational trend vectorby the sensor index We use the key idea of incrementalclustering algorithm [23] to handle the trend vectors get thecosine angle between the current trend and existing vectorsrespectively If the angle is big enough then mark it a newtrend Otherwise a repeated trend is found and we only needto merge it with the closest vector In Section 4 the specificclustering method will be given for details

= 120572 + 120573119883

120572 =sum119884

119899minus

120573sum119883

119899 120573 =

119899sum119883119884 minus sum119883sum119884

119899sum1198832 minus (sum119883)2

(3)

4 Design of Architecture

In this section we propose the architecture for sensor faultdetection in industry monitoring at firstThen how to realizeour algorithm is discussed

Table 1 Simulation data description

Feature ValueSampling rate 60 sSensor count 5800Total samples 75168 millionStatus groups 1340Outliers 437Stuck-at amp Spikes 163RT missed 961IRT missed 35

41 A Self-Learning Framework The self-learning sensorfault detection architecture is shown in Figure 4 There arefour modules in our approach

(1) Application DB all the parameters are stored in theApplication DB including the threshold 120576

119904for sensor

119904 and the history statistics result 120583119894 1205902119894 The grouping

information is also serialized in this database ThisDB is the interface for high layer application whichcan get the sensor fault prediction and input the userfeedback

(2) DetectionThread it is a background service and con-tains the main detecting process Since our approachis an online detection a series of detection thread willbe created andmaintained by theworking thread pooland a related status queue When new data is comingthe working thread dispatcher a wakes up a pendingthread to handle it

(3) Self-Learning Thread the self-learning thread usesthe OS timer as a driver The user feedback about thedetection result will be rechecked by this thread torevise the trend vectors

42 The GbFD Algorithm Dividing the sensors into statusgroup is the key idea in our approach For group-based sensorfault detection we propose the (group-based fault detection)GbFD algorithm

The GbFD Algorithm 1 starts by initializing the globalparameters (line 6-7) then it instantiates the two coreprocesses SelfLearningThread andDetectionThreadThe input

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

6 Mathematical Problems in Engineering

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(a) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(b) 119882119898 = 30119882119899 = 15

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Prec

ision

OtlrS amp S

RTIRT

(c) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(d) 119882119898 = 60119882119899 = 20

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

OtlrS amp S

RTIRT

Prec

ision

(e) 119882119898 = 90119882119899 = 25

Set1 Set2 Set3 Set40

01

02

03

04

05

06

07

08

09

1

Reca

ll

OtlrS amp S

RTIRT

(f) 119882119898 = 90119882119899 = 25

Figure 5 Four-folder cross-validation for precision (a c e) and recall (b d f) with different119882119898 and119882119899

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

Mathematical Problems in Engineering 7

of GbFD is defined as a set that contains one time samples fora status group

In the DetectionTread we first check the Stuck-at faultsand Spikes by calling the IsStuck and IsSpikes methods AndIsRatStaChange is called two times (line 19ndash21) to detectthe sensor status transformation When an Outlier wasdetected GbFD should give the conclusion whether it isa rational operation such as shut down the device and ifno Outlier happened we also need to check the abnormalstatus transformation The first calling of IsRatStaChange iscontrolled by the IsOutlier method with an AND logic nomore algorithm complexity is increased by the twice callingThe SelfLearningThread works as a background service Itsresponsibility is learning the user feedback to find out themissing detection and the false detection and adjusting therelational parameters For a new rational trend vector wewill add it to the proper group rational trend history by theIncClustering method which can recluster the grouped trendvectors according to the specified angle threshold

The time complexity of the GbFD algorithm is dominatedby IsOutlier and IsRatStuChange methods For a sensor ingroup119866 the detection computation takes119874((119882

119899+119889)times119882119898)119882119899 is the STW size 119889 is the length of 119866 and 119882119898 is theSSW size In a real application the two windows sizes aresteerable and 119889 is less than 20 for the most part Moreovera proper thread dispatch mechanism will guarantee GbFDto handle the real-time detection task And the self-learningprocess is more complex due to many iterative operationsbut it is a background service and does not require real-timeperformance

5 Experimental Evaluation

We built an application to evaluate our framework Thisapplication was implemented in JAVA about 2400 lines codeand used Oracle as the Application DB

51 Data Preparation We use real data from an oil field inChina This oil field has 20 oilgas treatment plants and all ofthe production equipments are monitored by sensors whichcan be mainly classified as temperature sensor pressuresensor and liquid level sensor The sampling rate of theproduction IoT is 60 seconds We obtained the sample dataof 10000 sensors between January 1st 8 PM to October 31th8AM 2012 Since the production lines are relatively stablewe filtered the original data by two steps Firstly someproduction units that never make a mistake were eliminatedSecondly for a plant we discard somedata in its stable period

Table 1 shows features of our simulation dataset Wechoose more than 751 million samples from 5800 sensorsAccording to the corresponding flow charts we separate thesensors into 1340 groups Each group has 433 sensors inaverage and the maximum group contains 8 sensors Weanalyze the user feedback history and find out four typicalerrors Outlier means that a value runs out of range and itis caused by the sensor failure We put Stuck-at faults andSpikes together The Rational Trend (RT) missed fault meansthat sensors submit an exceptional value after a rational user

operation such as shuting down a device for examinationAnd the Irrational Trend (IRT) missed fault means thatsomething wrong happened with the production line andthe sensor values changed with it but still suitable with thethreshold condition

52 Experiment We split the simulation data into fourdatasets according to the time sequence Each time we useoptional three data sets to train the GbFD algorithm and theremainder is used as the testing data

We use three pairs SSW size and STW size which are(30 15) (60 20) and (90 25) to run the four-folder cross-validation test The result is shown in Figure 5 When 119882

119898 =

30 and119882119899 = 15 each cross get a precision of about 80Thisresult is not good enough A satisfactory result is generatedin the next test 119882119898 = 60 and 119882119899 = 20 get a mean95 accuracy But with the increase of two windows sizethe accuracy of GbFD will go to an opposite direction Thisphenomenon indicates that the sizes of SSW and STW arestrong correlating with our algorithm Choosing the properwindows size can increase the detection sensitivity and thewindows size (60 20) is suitable with our simulation data

6 Conclusions

We present a self-learning sensor fault detection frameworkin this paper We propose a model which can represent thesensor value sensor relationship and sensor status transfor-mation GbFD algorithm is proposed to detect the sensorfault And we use real data from an oil field for validationExperimental results show that our system can detect 95 ofdata fault in the simulation datawhich contains 75168millionsamples from 5800 sensors

We will continue validate our approach on other datasetto find out the proper statistical sliding window size and sta-tus transform windows size in different application contextsOur goal is to build a sensor health management system forindustry IoT that includes not only sensor fault detectionbut also sensor lifecycle prediction and sensor inspectionmanagement

Acknowledgments

The authors would like to thank Jun Ma for helping themto dispose the simulation data This research is supported bythe National Natural Science Funds of China (61202238) theState Key Laboratory of Software Development EnvironmentFunds (SKLSDE-2011ZX-08) and the Fundamental ResearchFunds for the Central Universities (2012RC0209)

References

[1] Y Liu and G Zhou ldquoKey technologies and applications ofinternet of thingsrdquo in Proceedings of the 5th International Con-ference on Intelligent Computation Technology and Automation(ICICTA rsquo12) pp 197ndash200 Zhangjiajie China January 2012

[2] W Baoyun ldquoReview on Internet ofThingsrdquo Journal of ElectronicMeasurement and Instrument vol 23 no 12 pp 1ndash7 2009

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

8 Mathematical Problems in Engineering

[3] S Haller S Karnouskos and C Schroth ldquoThe internet ofthings in an enterprise contextrdquo in Future Internet vol 5468 ofLecture Notes in Computer Science pp 14ndash28 Springer BerlinGermany 2009

[4] I F AkyildizW Su Y Sankarasubramaniam and E Cayirci ldquoAsurvey on sensor networksrdquo IEEE Communications Magazinevol 40 no 8 pp 102ndash105 2002

[5] L Paradis and Q Han ldquoA survey of fault management inwireless sensor networksrdquo Journal of Network and SystemsManagement vol 15 no 2 pp 171ndash190 2007

[6] Y Zhang N Meratnia and P Havinga ldquoOutlier detectiontechniques for wireless sensor networks a surveyrdquo IEEE Com-munications Surveys and Tutorials vol 12 no 2 pp 159ndash1702010

[7] KNiN RamanathanMNHChehade et al ldquoSensor networkdata fault typesrdquo ACM Transactions on Sensor Networks vol 5no 3 pp 1ndash29 2009

[8] N Bressan L Bazzaco N Bui P Casari L Vangelista andM Zorzi ldquoThe deployment of a smart monitoring systemusing wireless sensor and actuator networksrdquo Proceedings ofInternational Conference on Smart Grid Communications pp49ndash54 2010

[9] A P Castellani M Gheda N Bui M Rossi and M ZorzildquoWeb services for the Internet of things through CoAP andEXIrdquo in Proceedings of the IEEE International Conference onCommunications Workshops (ICC rsquo11) pp 1ndash6 Kyoto JapanJune 2011

[10] S YuanX Lai X ZhaoXXu andL Zhang ldquoDistributed struc-tural health monitoring system based on smart wireless sensorand multi-agent technologyrdquo Smart Materials and Structuresvol 15 no 1 pp 1ndash8 2006

[11] N Bui and M Zorzi ldquoHealth care applications a solutionbased on the Internet of Thingsrdquo in Proceedings of the 4thInternational Symposium on Applied Sciences in Biomedical andCommunication Technologies (ISABEL rsquo11) article 131 October2011

[12] C Guestrin P Bodik R Thibaux M Paskin and S MaddenldquoDistributed regression an efficient framework for modelingsensor network datardquo in Proceedings of the 3rd InternationalSymposium on Information Processing in Sensor Networks (IPSNrsquo04) pp 1ndash10 Berkeley Calif USA April 2004

[13] J Considine M Hadjieleftheriou F Li J Byers and G KolliosldquoRobust approximate aggregation in sensor data managementsystemsrdquo ACM Transactions on Database Systems vol 34 no 1article 6 2009

[14] M B Greenwald and S Khanna ldquoPower-conserving compu-tation of order-statistics over sensor networksrdquo in Proceedingsof the 23rd ACM SIGMOD-SIGACT-SIGART Symposium onPrinciples of Database Systems (PODS rsquo04) pp 275ndash285 ParisFrance June 2004

[15] A Jain E Y Chang and Y-F Wang ldquoAdaptive stream resourcemanagement using Kalman Filtersrdquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 11ndash22 Paris France June 2004

[16] C Olston J Jiang and J Widom ldquoAdaptive filters for contin-uous queries over distributed data streamsrdquo in Proceedings ofthe ACM SIGMOD International Conference on Management ofData pp 563ndash574 June 2003

[17] K Yamanishi J-I Takeuchi and G Williams ldquoOn-line unsu-pervised outlier detection using finite mixtures with dis-counting learning algorithmsrdquo in Proceedings of the 6th ACM

SIGKDD International Conference on Knowledge Discovery andData Mining (KDD rsquo01) pp 320ndash324 August 2000

[18] H-L Chan T-W Lam L-K Lee and H-F Ting ldquoContinuousmonitoring of distributed data streams over a time-basedsliding windowrdquo Algorithmica vol 62 no 3-4 pp 1088ndash11112012

[19] M Ding D Chen K Xing and X Cheng ldquoLocalized fault-tolerant event boundary detection in sensor networksrdquo inProceedings of the 24th Annual Joint Conference of the IEEEComputer and Communications Societies (INFOCOM rsquo05) pp902ndash913 March 2005

[20] J-L Gao Y-J Xu and X-W Li ldquoWeighted-median baseddistributed fault detection forwireless sensor networksrdquo Journalof Software vol 18 no 5 pp 1208ndash1217 2007

[21] B Krishnamachari and S Iyengar ldquoDistributed Bayesian algo-rithms for fault-tolerant event region detection in wirelesssensor networksrdquo IEEE Transactions on Computers vol 53 no3 pp 241ndash250 2004

[22] D W Marquardt ldquoAn algorithm for least-squares estimation ofnonlinear parametersrdquo Journal of the Society For Industrial ampApplied Mathematics vol 11 no 2 pp 431ndash441 1963

[23] X Su Y Lan R Wan and Y Qin Y ldquoA fast incremental cluster-ing algorithmrdquo in Proceedings of the International Symposiumon Information Processing (ISIP rsquo09) pp 175ndash178 HuangshanChina 2009

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article A Self-Learning Sensor Fault Detection …downloads.hindawi.com/journals/mpe/2013/712028.pdf · In this paper, we propose a self-learning sensor fault detection framework

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of