Advanced Pattern Recognition for Optimal Bandwidth and ...sv4/papers/WorldComp2016_FINAL.pdf ·...

Advanced Pattern Recognition for Optimal Bandwidth

and Power Utilization for Wireless Intelligent Motes for

IoT Applications

Kenny C. Gross, Kalyan Vaidyanathan and SeyyedMajid Valiollahzadeh

Oracle Corporation

{kenny.gross,kalyan.vaidyanathan,majid.valiollahzadeh}@oracle.com

Abstract - New prognostic machine learning innovations for

wireless intelligent motes for Internet-of-Things (IoT)

applications are being developed and demonstrated for

enhancing the reliability, availability, and serviceability of

end-customer assets monitored with wireless sensors. For the

transmitter side, battery life is optimized by not transmitting

data during "uninteresting" times when signals are statistically

consistent with normal background dynamic activity, as

determined by a novel six-component sequential probability

ratio test (SPRT) described herein. This new six-component

SPRT pattern recognition test possesses significant

advantages over conventional threshold-limit tests and results in

superior prognostic performance with better battery life for

intelligent wireless mote based prognostic IoT applications.

Keywords: Prognostic Machine Learning, Intelligent

Wireless Motes

1 Introduction

Oracle has been building a portfolio of prognostics

innovations for pattern recognition applications for

processing, analysis, archival, and data mining of telemetry

signals coming from wired sensors in Oracle enterprise

servers and facilities cooling systems in large-scale enterprise

data centers. A recent trend in the industry is to eliminate the

fetters of hard-wired sensors. As a result, we see increasing

deployment of intelligent wireless sensors - more than just

transducers, the motes contain their own tiny Java (or other)

OS and have sufficient compute cycles to do preprocessing of

digitized signals from multiple physical transducers. In the

industry, these are being called Smart Motes, and are finding

wide use in Internet-of-Things (IoT) applications, not only in

data centers but also in industrial applications in

manufacturing, utilities, and transportation, in measurement of

a wide range of physical variables such as temperature,

vibration, humidity, barometric pressure, radiation level (for

numerous Homeland Security apps), light, sonar, etc.

Many types of variables that are being measured by Smart

Motes are episodic in nature. In such cases, there is some

normal “background” variation level that is not particularly

interesting. Then there are episodes of interesting events that

may be characterized by elevated levels, an increased

burstiness, the appearance of a trend or growth rate in

variables that are otherwise stationary (in the statistical sense),

or the appearance of dynamic phenomena that distinguish the

interesting events from the normal background variation

levels.

One could have Smart Motes transmit the data values

continuously, but doing so during uninteresting time periods

wastes bandwidth and wastes battery power for the Smart

Motes. A conventional approach to use the bandwidth more

wisely is to set thresholds for variables so that if the measured

level exceeds a threshold, then one transmits the data. This

conventional threshold-limit approach suffers from two

serious limitations:

1) It is difficult to decide where to set the threshold. For

noisy processes, if one sets the threshold too low, one gets

frequent “false alarm” trips. To avoid the false alarms, one

sets the thresholds higher. Then it is possible to miss

interesting activity.

2) The received data has gaps during the “uninteresting”

times. Most pattern recognition and machine learning

techniques that Oracle has developed or licensed for

intelligent “consumers” of telemetry data require uniformly

sampled signals. We have not been able to identify any

consumer processes that can accept multiple signals with

disparate gaps in their signatures.

The present paper leverages Oracle's experience with

prognostic statistical pattern recognition and introduces a

novel approach for transmitting and receiving digitized

signals from Smart Motes. The novel approach introduced

herein minimizes bandwidth and battery power for the Smart

Motes, yet eliminates gaps in data on the receiving end.

1.1 Transmitting Side:

Instead of using a fixed threshold to decide when to transmit

data, this paper applies a sequential detection algorithm called

the Sequential Probability Ratio Test (SPRT) to decide when

the events monitored by the wireless sensors are “interesting”

(which can include coming from a distribution that has a

higher or lower mean, or having a larger or smaller level of

variability than normal). Using a SPRT on the transmission

side brings the following advantages:

• The SPRT has user-configurable false-alarm and

missed-alarm probabilities (as opposed to threshold

limits, which have a “seesaw” tradeoff between

sensitivity and false alarms).

• The SPRT has the mathematically shortest decision

time for catching subtle anomalies in noisy process

variables.

It is important to note that the SPRT is not compute intensive

and is well suited to operate within the CPU constraints of the

Smart Motes (the SPRT comprises some simple algebraic

expressions, as shown in the mathematical underpinnings

section below).

1.2 Receiving Side:

On the receiving side, signals coming from Smart Motes

during their uninteresting times could have large gaps during

times when the wireless sensors are not transmitting. The

Smart Mote system described here generates synthetic

observations adhering to a target distribution that has the same

mean and variance as the parameter being monitored by the

Smart Mote during that variable's “background activity” time

period. The distribution defaults to normal, Gaussian white

noise, which is adequate for 99% of the anticipated

applications. By definition, the data characteristics are less

interesting during this time period, so that synthesis of an

approximation to background activity is adequate and, most

importantly, supplies synchronous observations so that

Oracle's existing, or any other, pattern recognition

algorithms can be used as consumers of the telemetry signals.

(As an option, the capability is provided to customize the

distribution for applications where the expected “background

activity” is different from Gaussian.)
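
The receiver-side gap filling described above can be sketched in a few lines of Python. This is a minimal illustration under the default Gaussian assumption; the function name `fill_gaps`, the use of `None` to mark samples that were not transmitted, and the fixed random seed are assumptions made for the example and are not part of the system described in this paper.

```python
import random

def fill_gaps(received, mean, std, seed=0):
    """Replace untransmitted samples (None) with synthetic Gaussian noise.

    mean and std are the background statistics of the monitored variable.
    """
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return [rng.gauss(mean, std) if y is None else y for y in received]

# Two transmitted "interesting" samples surrounded by untransmitted gaps.
received = [None, None, 5.2, 5.9, None]
filled = fill_gaps(received, mean=1.0, std=0.1)
```

Transmitted samples pass through unchanged, and every gap receives a synthetic value, so the output is uniformly sampled.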

2 SPRT Implementation

The Sequential Probability Ratio Test [1-3] is a

statistical hypothesis test that differs from standard fixed

sample tests in the way in which statistical observations are

employed. In familiar fixed-sample statistical tests, a given

number of observations are used to select one hypothesis from

one or more alternative hypotheses. The SPRT, however,

examines one observation at a time, then makes a decision as

soon as it has sufficient information to ensure that pre-

specified confidence bounds are met.

The basic approach taken by the SPRT technique is to

analyze successive observations of a discrete process. Let yn

represent a sample from the process at a given moment tn in

time. We'll assume for simplicity of illustration that the

sequence of values {Yn} = y1, y2, ..., yn comes from a

stationary process characterized by a Gaussian, white-noise

probability density function (pdf) with mean 0. (Note that

since we are dealing here with nominally stationary processes,

any process variables with a nonzero mean can be first

normalized to a mean of zero with no loss of generality).

The SPRT is a binary hypothesis test that analyzes

process observations sequentially to determine whether or not

the signal is consistent with normal behavior. When a SPRT

reaches a decision about current process behavior, i.e., that

the signal is behaving normally or abnormally, the decision is

reported and the test continues to process observations.

For each of the six types of tandem SPRT tests

developed here, normal process behavior is defined for data

signals corresponding to a Gaussian pdf with mean 0 and

variance σ². Normal signal behavior is referred to as the null

hypothesis, H0. We formulate six specific SPRT hypothesis

tests that are computed in parallel for the variable monitored

by each wireless Smart Mote. Although the default

implementation is described in terms of applying a SPRT that

applies to Gaussian time series, one can use a nonparametric

SPRT for signals that are not represented with a Gaussian

distribution.

Within a SPRT surveillance module, all 6 tandem

hypothesis tests are executed in parallel. Each test determines

whether the current sequence of process observations is

consistent with the null hypothesis vs. an alternative

hypothesis. Four of the tests, which have been applied in

open-literature publications previously (but for different types

of applications), are known as the positive mean test, the

negative mean test, the nominal variance test, and the inverse

variance test. For the positive mean test, the corresponding

alternative hypothesis, H1, is that the signal data

correspond to a Gaussian pdf with mean +M and variance

σ². For the negative mean test, the corresponding alternative

hypothesis, H2, is that the signal data correspond to a

Gaussian pdf with mean -M and variance σ². For the nominal

variance test, the corresponding alternative hypothesis, H3, is

that the signal data correspond to a Gaussian pdf with

mean 0 and variance Vσ² (with scalar factor V). For the

inverse variance test, the corresponding alternative

hypothesis, H4, is that the data correspond to a Gaussian

pdf with mean 0 and variance σ²/V.

The final two tandem SPRT tests are performed not on

the raw Smart Mote output variables described above, but on

the first difference function of the variable. For discrete time

series, the first difference function (i.e. difference between

each observation and the observation preceding it) gives an

estimate of the numerical derivative of the time series. During

uninteresting time periods, the observations in the first

difference function will be a nominally stationary random

process centered about zero, which is perfectly in concert with

our assumptions in employing a SPRT. If an upward or

downward trend should suddenly appear in the signal, SPRTs

number 5 and 6 monitor for an increase or decrease,

respectively, in the slope of the Smart Mote variable. As

such, if there is a decrease in the value of the variable, SPRT

alarms will be triggered for SPRTs 2 and 6. SPRT 2 will fire

a warning because the sequence of raw observations is

dropping with time. And SPRT 6 will fire because the slope

of the variable changes from zero to something less than zero.

The advantage of monitoring the mean SPRT and slope SPRT

in tandem is realized if the signal levels off to a new

stationary value (or plateau). At this point SPRT 2 will

continue firing because the new current value is different from

the value prior to the degradation; whereas the alarms from

SPRT 6 will cease because the slope returns to zero when the

raw signal reaches a plateau.
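
The behavior of the first difference function during a drop followed by a plateau can be illustrated with a short Python sketch. The helper name `first_difference` and the noise-free sample values are assumptions chosen only to make the effect visible.

```python
def first_difference(y):
    """Return y[k] - y[k-1] for each consecutive pair (one element shorter than y)."""
    return [b - a for a, b in zip(y, y[1:])]

# A signal that is flat, drops steadily, then levels off at a new plateau.
ramp_then_plateau = [0.0, 0.0, -1.0, -2.0, -3.0, -3.0, -3.0]
slope = first_difference(ramp_then_plateau)
```

In this sketch the slope is negative only while the signal is dropping and returns to zero on the plateau, while the raw values remain shifted from their original level.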

If SPRTs 3 or 4 should fire a warning, it means that the

variance of the sensed variable is increasing or decreasing,

respectively. An increasing variance that is not accompanied

by a change in mean (inferred from SPRTs 1 & 2 and SPRTs

5 & 6) can signify an episodic event that is “bursty” or

“spiky” with time. A decreasing variance that is not

accompanied by a change in mean is a common symptom of a

failing sensor that is characterized by an increasing time

constant. As such, having variance SPRTs available in

parallel with slope and mean SPRTs can provide a wealth of

supplementary diagnostic information that has not been

possible with conventional wireless sensors.

The SPRT technique provides a quantitative framework

that permits a decision to be made between the null hypothesis

and the foregoing six alternative hypotheses with specified

misidentification probabilities. If the SPRT accepts one of the

alternative hypotheses, an alarm flag is set and data is

transmitted. If all six tests accept the null hypothesis, it can be

concluded with a high degree of confidence that the data

represents normal background activity for the variable

monitored by the Smart Mote.

The SPRT operates as follows. At each time step in a

calculation, a test index is calculated and compared to two

stopping boundaries A and B (defined below). The test index

is equal to the natural log of a likelihood ratio (Ln), which for

a given SPRT is the ratio of the probability that the alternative

hypothesis for the test (Hj, where j is the appropriate subscript

for the SPRT in question) is true, to the probability that the

null hypothesis (H0) is true.

$$L_n = \frac{\text{probability of observed sequence } \{Y_n\} \text{ given } H_j \text{ true}}{\text{probability of observed sequence } \{Y_n\} \text{ given } H_0 \text{ true}} \qquad (1)$$

If the logarithm of the likelihood ratio is greater than or equal

to the logarithm of the upper threshold limit [i.e., ln(Ln) ≥ ln(B)], then it can be concluded that the alternative hypothesis

is true. If the logarithm of the likelihood ratio is less than or

equal to the logarithm of the lower threshold limit [i.e., ln(Ln)

≤ ln(A)], then it can be concluded that the null hypothesis is

true. If the log likelihood ratio falls between the two limits,

[i.e., ln(A) < ln(Ln) < ln(B)], then there is not yet enough

information to make a decision (and, incidentally, no other

statistical test could yet reach a decision with the same given

Type I and II misidentification probabilities).

The threshold limits are related to the misidentification

probabilities α and β by the following expressions:

$$A = \frac{\beta}{1-\alpha} \quad \text{and} \quad B = \frac{1-\beta}{\alpha}, \qquad (2)$$

where

α is the probability of accepting Hj when H0 is true (i.e., the

false-alarm probability), and

β is the probability of accepting H0 when Hj is true (i.e., the

missed-alarm probability).
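
As a minimal sketch, the decision boundaries and the three-way decision rule can be written in Python as follows, taking the lower boundary as ln(β/(1-α)) and the upper boundary as ln((1-β)/α). The function names `sprt_bounds` and `decide`, the returned labels, and the example values of α and β are illustrative assumptions.

```python
import math

def sprt_bounds(alpha, beta):
    """Return (ln A, ln B) with A = beta/(1-alpha) and B = (1-beta)/alpha."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def decide(index, ln_a, ln_b):
    """Map a SPRT index (log likelihood ratio) to a decision."""
    if index >= ln_b:
        return "Hj"        # accept the alternative hypothesis
    if index <= ln_a:
        return "H0"        # accept the null hypothesis
    return "continue"      # not yet enough information

ln_a, ln_b = sprt_bounds(alpha=0.01, beta=0.01)
```

With equal error probabilities the two boundaries are symmetric about zero; making α smaller raises the upper boundary, so more evidence is required before an alarm is declared.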

The first two SPRT tests for normal distributions examine the

mean of the process observations. The goal of the mean tests

is to declare that the system is degraded if the distribution of

observations exhibits a non-zero mean, e.g., a mean of either

+M or -M, where M is the pre-assigned system disturbance

magnitude for the mean test. Assuming that the sequence

{Yn} is corresponding to a Gaussian pdf, then the probability

that the null hypothesis H0 is true (i.e., mean 0 and variance

σ²) is given by [Ref. 4], resulting in:

$$P(y_1, y_2, \ldots, y_n \mid H_0) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \qquad (3)$$

Similarly, the probability for alternative hypothesis H1 (i.e.,

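mean M and variance σ²) is:

$$P(y_1, y_2, \ldots, y_n \mid H_1) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2 - 2\sum_{k=1}^{n} y_k M + \sum_{k=1}^{n} M^2\right)\right] \qquad (4)$$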

The ratio of the probabilities in Equations (3) and (4) gives the

likelihood ratio Ln for the positive mean test, Equation (5):

$$L_n = \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\left(M - 2y_k\right)\right] \qquad (5)$$
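
The cancellation behind the likelihood ratio for the positive mean test can be written out explicitly. When the H1 density is divided by the H0 density, the normalizing constants are identical and cancel, as do the terms in the sum of squared observations:

```latex
\begin{aligned}
L_n &= \frac{P(y_1,\ldots,y_n \mid H_1)}{P(y_1,\ldots,y_n \mid H_0)} \\
    &= \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2
       - 2M\sum_{k=1}^{n} y_k + nM^2\right)
       + \frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \\
    &= \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\left(M - 2y_k\right)\right].
\end{aligned}
```

Only a sum that is linear in the observations survives, which is why the test index can be updated with one multiply-add per sample.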

The SPRT index for the positive mean test (SPRTpos) is

given by taking the logarithm of the foregoing likelihood ratio:

$$\mathrm{SPRT}_{pos} = -\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\left(M - 2y_k\right) = \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(y_k - \frac{M}{2}\right) \qquad (6)$$

The SPRT index for the negative mean test (SPRTneg) can be

derived by substituting -M for each instance of M in Equations

(4) through (6), resulting in:

$$\mathrm{SPRT}_{neg} = \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(-y_k - \frac{M}{2}\right) \qquad (7)$$
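
Because both mean-test indices are running sums, each new observation contributes a single increment: (M/σ²)(y − M/2) for the positive mean test and (M/σ²)(−y − M/2) for the negative mean test. A minimal Python sketch of these per-sample increments follows; the function name `mean_sprt_increments` and the numeric values are illustrative assumptions.

```python
def mean_sprt_increments(y, m, sigma2):
    """Per-sample increments for the positive and negative mean tests.

    y is a mean-normalized observation, m the disturbance magnitude M,
    and sigma2 the background variance.
    """
    pos = (m / sigma2) * (y - m / 2.0)
    neg = (m / sigma2) * (-y - m / 2.0)
    return pos, neg

# An observation sitting exactly at +M pushes the positive index up
# and the negative index down.
pos, neg = mean_sprt_increments(y=1.0, m=1.0, sigma2=1.0)
```

In this sketch an observation near zero drives both indices downward, toward acceptance of the null hypothesis.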

The next two SPRT tests examine the variance of

the sequence. This capability gives the SPRT module the

ability to detect and quantitatively characterize changes in

variability for processes, which is vitally important for 6-

sigma QA/QC improvement initiatives. In the variance tests,

the system is declared to be degraded if the sequence exhibits

a change in variance by a factor of V or 1/V, where V, the

pre-assigned system disturbance magnitude for the variance

test, is a positive scalar. The probability that the alternative

hypothesis H3 is true (i.e., mean 0 and variance Vσ²) is given

by Equation (3) with σ² replaced by Vσ²:

$$P(y_1, y_2, \ldots, y_n \mid H_3) = \frac{1}{(2\pi V\sigma^2)^{n/2}} \exp\left[-\frac{1}{2V\sigma^2}\sum_{k=1}^{n} y_k^2\right] \qquad (8)$$

The likelihood ratio for the variance test is given by the ratio

of Equation (8) to Equation (3):

$$L_n = V^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\left(\frac{1-V}{V}\right)\sum_{k=1}^{n} y_k^2\right] \qquad (9)$$

The SPRT index for the nominal variance test (SPRTnom) is given by taking the logarithm of the likelihood ratio given in

Equation (9), to give:

$$\mathrm{SPRT}_{nom} = \frac{1}{2\sigma^2}\left(\frac{V-1}{V}\right)\sum_{k=1}^{n} y_k^2 - \frac{n}{2}\ln V \qquad (10)$$

The SPRT index for the inverse variance test (SPRTinv) can

be derived by substituting 1/V for each instance of V in

Equations (8) through (10), resulting in:

$$\mathrm{SPRT}_{inv} = \frac{1}{2\sigma^2}\left(1 - V\right)\sum_{k=1}^{n} y_k^2 + \frac{n}{2}\ln V \qquad (11)$$
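
The variance-test indices are also running sums, with a per-sample increment of (y²/(2σ²))((V−1)/V) − (1/2) ln V for the nominal variance test and (y²/(2σ²))(1−V) + (1/2) ln V for the inverse variance test. The Python sketch below shows these increments; the function name `variance_sprt_increments` and the numeric values are illustrative assumptions.

```python
import math

def variance_sprt_increments(y, v, sigma2):
    """Per-sample increments for the nominal and inverse variance tests.

    y is a mean-normalized observation, v the variance disturbance
    factor V, and sigma2 the background variance.
    """
    scaled = y * y / (2.0 * sigma2)
    nom = scaled * ((v - 1.0) / v) - 0.5 * math.log(v)
    inv = scaled * (1.0 - v) + 0.5 * math.log(v)
    return nom, inv

# A two-sigma observation with V = 2 is evidence for increased variance.
nom, inv = variance_sprt_increments(y=2.0, v=2.0, sigma2=1.0)
```

For V greater than 1, a large squared observation raises the nominal variance index and lowers the inverse variance index, while an observation near zero does the opposite.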

The tandem SPRT module performs mean and variance

SPRT tests on the raw process signal, and slope SPRT tests on its first difference

function. To initialize the module for analysis of a wireless

Smart Mote variable time series, the user specifies the system

disturbance magnitudes for the tests (M and V), the false-alarm

probability (α), and the missed-alarm probability (β). Then, during the training phase (before the first failure of a

component under test), the module calculates the mean and

variance of the monitored variable process signal. For most

monitored variables the mean of the raw observations for the

variable will be non-zero; in this case the mean calculated

from the training phase is used to normalize the signal during

the monitoring phase. The system disturbance magnitude for

the mean tests specifies the number of standard deviations (or

fractions thereof) that the distribution must shift in the positive

or negative direction to trigger an alarm. The system

disturbance magnitude for the variance tests specifies the

fractional change of the variance necessary to trigger an alarm.
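
A minimal Python sketch of the training and normalization steps is given below. The function names `train` and `normalize`, the use of the population variance, the sample values, and the choice of a two-standard-deviation disturbance magnitude are all assumptions made for illustration; the paper does not prescribe them.

```python
import statistics

def train(background):
    """Estimate mean and variance from fault-free training observations."""
    mean = statistics.fmean(background)
    var = statistics.pvariance(background)  # population variance (assumed)
    return mean, var

def normalize(y, mean):
    """Shift a monitored observation so the background mean is zero."""
    return y - mean

mean, var = train([9.0, 11.0, 10.0, 10.0])
# Disturbance magnitude M expressed as two standard deviations (assumed).
m = 2.0 * var ** 0.5
```

Here `normalize(12.0, mean)` gives 2.0, which is the value the mean tests would see during the monitoring phase.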

At the beginning of the monitoring phase, all six SPRT

indices are set to 0. Then, during each time step of the

calculation, the SPRT indices are updated using Equations (6),

(7), (10), and (11), where the two slope tests apply Equations (6) and (7) to the first difference function. Each SPRT index is then compared to the

upper [i.e., ln((1-β)/α)] and lower [i.e., ln(β/(1-α))] decision

boundaries, with these three possible outcomes: 1) the lower

limit is reached, in which case the process is declared healthy,

the test statistic is reset to zero, and sampling continues; 2) the

upper limit is reached, in which case the process is declared

degraded, an alarm flag is raised indicating a sensor or process

fault, the test statistic is reset to zero, and sampling continues;

or, 3) neither limit has been reached, in which case no

decision concerning the process can yet be made and the

sampling continues.
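
The three outcomes and the reset-to-zero behavior can be sketched as a short Python loop over per-sample index increments. The function name `run_sprt`, the returned labels, and the example increments are illustrative assumptions; any of the six tests could supply the increments.

```python
import math

def run_sprt(increments, alpha, beta):
    """Accumulate a SPRT index, resetting it to zero after each decision.

    Returns one entry per sample: "degraded", "healthy", or None when
    no decision has been reached yet.
    """
    ln_a = math.log(beta / (1.0 - alpha))   # lower decision boundary
    ln_b = math.log((1.0 - beta) / alpha)   # upper decision boundary
    index, decisions = 0.0, []
    for inc in increments:
        index += inc
        if index >= ln_b:
            decisions.append("degraded")    # alarm flag raised
            index = 0.0
        elif index <= ln_a:
            decisions.append("healthy")
            index = 0.0
        else:
            decisions.append(None)          # keep sampling
    return decisions

decisions = run_sprt([2.0, 2.0, 2.0, -3.0, -3.0], alpha=0.01, beta=0.01)
```

With α = β = 0.01 the boundaries sit at roughly ±4.6, so in this example the third sample crosses the upper boundary and the fifth sample crosses the lower one.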

The advantages of using a SPRT, as have been demonstrated

in previous Oracle prognostic systems that use wired sensors

[Refs 5-7], are twofold:

1. One can detect very subtle anomalies in noisy process

variables at the earliest possible time.

2. One can pre-specify quantitative false-alarm and

missed-alarm probabilities.

For the intelligent wireless mote IoT applications addressed in

this paper, we introduce tandem SPRTs in a novel application

that monitors “derivative SPRTs” in parallel with mean and

variance SPRTs that are performed on the time series

associated with the Smart Mote measured variable(s). This

new tandem-SPRT approach enables one to determine the

onset of interesting episodic events, saving both bandwidth

and transmitter battery power during periods of normal

background activity for IoT applications of intelligent wireless

motes. During such time periods, the receiver agent generates

synthesized observations that possess exactly the same mean,

variance, skewness, and kurtosis, as the original variable

during its “background activity” period. (Gaussian data is

generated by default; but the distribution can be customized to

match an empirical CDF for IoT applications wherein signals

with non-Gaussian noise are encountered).

Output from this technique is then consumed and processed

with standard pattern recognition algorithms that expect

uniform, synchronous time series as input.

3 Example Applications

The new technique is illustrated with an example that uses

three typical signals from prototype wireless Smart Motes.

Continuous variables are illustrated by the upper 3 subplots in

Fig. 1. The second set of 3 subplots shows digitized samples

after the signals pass through an A/D converter chip.

With conventional approaches, all of the data shown would be

transmitted continuously, even during the less interesting

“background activity” periods that can be seen in the figures.

Figure 2 shows SPRT indices indicating “interesting” episodes

of activity in the sensors (i.e. the distribution is significantly

different, with a pre-defined confidence factor, from the

distribution of the “background activity”). The lower three

subplots of Fig. 2 show the optimized transmission activity for

the Smart Motes.

Figure 3 shows the original raw data as seen by the

sensors, the “interesting” data that was transmitted via the

wireless network, and the reconstructed signals per this paper.

The background activity in the reconstructed signals is

statistically indistinguishable from the background activity in

the raw signals (matches in mean and variance). The

optimized reconstructed signals are now synchronously

sampled and are amenable to analysis by pattern recognition

“consumer” algorithms, such as those separately patented by

Oracle for proactive anomaly detection.

Fig. 1. SPRT-Based Wireless Smart Mote Example

Application

Fig. 2. SPRT Indices Indicate Interesting Episodes and

Trigger Data Transmission

Fig. 3. Reconstructed Signals on Receiving Side

4 Conclusions

For intelligent wireless mote IoT applications addressed in

this paper, we introduce a new six-component tandem SPRT

statistical machine learning approach embedded in a light-

weight algorithm that monitors “derivative SPRTs” in parallel

with mean and variance SPRTs computed on digitized time

series associated with the Smart Mote measured variable(s).

This new tandem-SPRT approach enables one to determine

the onset of interesting episodic events, saving both

bandwidth and transmitter battery power during periods of

normal background activity for IoT applications of intelligent

wireless motes. During such time periods, the receiver agent

generates synthesized observations that possess exactly the

same mean, variance, skewness, and kurtosis, as the original

variable during its “background activity” period. This new

approach leverages and extends Oracle's experience with real-

time prognostics from wired telemetry pattern-recognition

applications. For intelligent wireless motes for IoT prognostic

applications, the new tandem-SPRT data-transmission

actuator introduced herein optimizes bandwidth utilization,

minimizes battery power for the smart motes, yet eliminates

gaps in data on the receiving end.

5 References

[1] A. Wald. Sequential Analysis. John Wiley &

Sons, New York, NY, 1947.

[2] K. C. Gross and K. Humenik. "Nuclear Power

Plant Component Surveillance Implemented in SAS

Software," Proc. SAS Users Group Int’l. Conf. pp.

1127-1131, San Francisco, April 1989.

[3] K. C. Gross, R. Dhanekula, and K.

Vaidyanathan, “Novel Training Enhancements for

Advanced Statistical Pattern Recognition Used for

Electronic Prognostics of Enterprise Computing

Systems,” Proc. IEEE World Congress in

Computer Science, Computer Engineering, and

Applied Computing (WorldComp2011), Las

Vegas, NV (Aug 2011).

[4] K. Whisnant, K. C. Gross and N. Lingurovska,

“Proactive Fault Monitoring in Enterprise Servers,”

Proc. 2005 IEEE Intn'l Multiconference in Computer

Science & Computer Eng., Las Vegas, NV (June 2005).

[5] A. Urmanov and K. C. Gross, “Failure Avoidance in

Computer Systems,” Proc. 59th Meeting of the Society

for Machinery Failure Prevention Technology,

Virginia Beach, VA (Apr 18-21, 2005).

[6] K. Vaidyanathan and K. C. Gross, “Proactive

Detection of Software Anomalies through MSET,”

Proc. IEEE Workshop on Predictive Software Models

(PSM-2004), Chicago (Sept 17-19, 2004).

[7] K. C. Gross, K. W. Whisnant and A. Urmanov,

"Electronic Prognostics Through Continuous System

Telemetry," Proc. 60th Meeting of the Society for

Machinery Failure Prevention Technology, Virginia

Beach, VA (April 2006).
