Advanced Pattern Recognition for Optimal Bandwidth and ...sv4/papers/WorldComp2016_FINAL.pdf ·...
Advanced Pattern Recognition for Optimal Bandwidth
and Power Utilization for Wireless Intelligent Motes for
IoT Applications
Kenny C. Gross, Kalyan Vaidyanathan and SeyyedMajid Valiollahzadeh
Oracle Corporation
{kenny.gross,kalyan.vaidyanathan,majid.valiollahzadeh}@oracle.com
Abstract - New prognostic machine learning innovations for
wireless intelligent motes for Internet-of-Things (IoT)
applications are being developed and demonstrated for
enhancing the reliability, availability, and serviceability of
end-customer assets monitored with wireless sensors. For the
transmitter side, battery life is optimized by not transmitting
data during "uninteresting" times when signals are statistically
consistent with normal background dynamic activity, as
determined by a novel six-component sequential probability
ratio test (SPRT) described herein. This new six-component
SPRT pattern recognition test possesses significant
advantages over conventional threshold-limit tests and results in
superior prognostic performance with better battery life for
intelligent wireless mote based prognostic IoT applications.
Keywords: Prognostic Machine Learning, Intelligent
Wireless Motes
1 Introduction
Oracle has been building a portfolio of prognostics
innovations for pattern recognition applications for
processing, analysis, archival, and data mining of telemetry
signals coming from wired sensors in Oracle enterprise
servers and facilities cooling systems in large-scale enterprise
data centers. A recent trend in the industry is to eliminate the
fetters of hard-wired sensors. As a result, we see increasing
deployment of intelligent wireless sensors - more than just
transducers, the motes contain their own tiny Java (or other)
OS and have sufficient compute cycles to do preprocessing of
digitized signals from multiple physical transducers. In the
industry, these are being called Smart Motes, and are finding
wide use in Internet-of-Things (IoT) applications, not only in
data centers but also in industrial applications in
manufacturing, utilities, and transportation, in measurement of
a wide range of physical variables such as temperature,
vibration, humidity, barometric pressure, radiation level (for
numerous Homeland Security apps), light, sonar, etc.
Many types of variables that are being measured by Smart
Motes are episodic in nature. In such cases, there is some
normal “background” variation level that is not particularly
interesting. Then there are episodes of interesting events that
may be characterized by elevated levels, an increased
burstiness, the appearance of a trend or growth rate in
variables that are otherwise stationary (in the statistical sense),
or the appearance of dynamic phenomena that distinguish the
interesting events from the normal background variation
levels.
One could have Smart Motes transmit the data values
continuously, but doing so during uninteresting time periods
wastes bandwidth and wastes battery power for the Smart
Motes. A conventional approach to use the bandwidth more
wisely is to set thresholds for variables so that if the measured
level exceeds a threshold, then one transmits the data. This
conventional threshold-limit approach suffers from two
serious limitations:
1) It is difficult to decide where to set the threshold. For
noisy processes, if one sets the threshold too low, one gets
frequent “false alarm” trips. To avoid the false alarms, one
sets the thresholds higher. Then it is possible to miss
interesting activity.
2) The received data has gaps during the “uninteresting”
times. Most pattern recognition and machine learning
techniques that Oracle has developed or licensed for
intelligent “consumers” of telemetry data require uniformly
sampled signals. We have not been able to identify any
consumer processes that can take multiple signals that can
have disparate gaps in their signatures.
The present paper leverages Oracle experience with
prognostic statistical pattern recognition and introduces a
novel approach for transmitting and receiving digitized
signals from Smart Motes. The approach minimizes
bandwidth and battery power for the Smart Motes, yet
eliminates gaps in data on the receiving end.
1.1 Transmitting Side:
Instead of using a fixed threshold to decide when to transmit
data, this paper applies a sequential detection algorithm called
the Sequential Probability Ratio Test (SPRT) to decide when
the events monitored by the wireless sensors are “interesting”
(which can include coming from a distribution that has a
higher or lower mean, or having a larger or smaller level of
variability than normal). Using a SPRT on the transmission
side brings the following advantages:
• The SPRT has user-configurable false-alarm and
missed-alarm probabilities (as opposed to threshold
limits, which have a “seesaw” tradeoff between
sensitivity and false alarms).
• The SPRT has the mathematically shortest decision
time for catching subtle anomalies in noisy process
variables.
It is important to note that the SPRT is not compute intensive
and is well suited to operate within the CPU constraints of the
Smart Motes (the SPRT comprises some simple algebraic
expressions, as shown in the mathematical underpinnings
section below).
1.2 Receiving Side:
On the receiving side, signals coming from Smart Motes
during their uninteresting times could have large gaps during
times when the wireless sensors are not transmitting. The
Smart Mote system described here generates synthetic
observations adhering to a target distribution that has the same
mean and variance as the parameter being monitored by the
Smart Mote during that variable's “background activity” time
period. The distribution defaults to normal, Gaussian white
noise, which is adequate for 99% of the anticipated
applications. By definition, the data characteristics are less
interesting during this time period, so that synthesis of an
approximation to background activity is adequate and, most
importantly, supplies synchronous observations so that
Oracle's existing pattern recognition algorithms, or any
others, can be used as consumers of the telemetry signals.
(As an option, the capability is provided to customize the
distribution for applications where the expected “background
activity” is different from Gaussian.)
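The receiver-side gap filling described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `fill_gaps`, the dictionary format for received samples, and the example values are all assumptions, and the background mean and variance are assumed to be known to the receiver.

```python
import random

def fill_gaps(received, n_samples, mean, variance, seed=None):
    """Return a uniformly sampled series of length n_samples.

    received: dict mapping sample index -> transmitted value
              (hypothetical format chosen for this sketch).
    Indices missing from `received` are filled with synthetic Gaussian
    white noise having the given background mean and variance.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [
        received[i] if i in received else rng.gauss(mean, sigma)
        for i in range(n_samples)
    ]

# Example: only samples 3..5 were transmitted (an "interesting" episode).
received = {3: 9.1, 4: 9.8, 5: 9.4}
series = fill_gaps(received, n_samples=8, mean=5.0, variance=0.04, seed=1)
```

The result has one value per sampling instant, so a consumer that expects uniformly sampled input can process it directly. A non-Gaussian background would need a different generator in place of `rng.gauss`.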
2 SPRT Implementation
The Sequential Probability Ratio Test [1-3] is a
statistical hypothesis test that differs from standard fixed
sample tests in the way in which statistical observations are
employed. In familiar fixed-sample statistical tests, a given
number of observations are used to select one hypothesis from
one or more alternative hypotheses. The SPRT, however,
examines one observation at a time, then makes a decision as
soon as it has sufficient information to ensure that pre-
specified confidence bounds are met.
The basic approach taken by the SPRT technique is to
analyze successive observations of a discrete process. Let yn
represent a sample from the process at a given moment tn in
time. We'll assume for simplicity of illustration that the
sequence of values {Yn} = y0, y1 ... yn comes from a
stationary process characterized by a Gaussian, white-noise
probability density function (pdf) with mean 0. (Note that
since we are dealing here with nominally stationary processes,
any process variables with a nonzero mean can be first
normalized to a mean of zero with no loss of generality).
The SPRT is a binary hypothesis test that analyzes
process observations sequentially to determine whether or not
the signal is consistent with normal behavior. When a SPRT
reaches a decision about current process behavior, i.e., that
the signal is behaving normally or abnormally, the decision is
reported and the test continues to process observations.
For each of the six types of tandem SPRT tests
developed here, normal process behavior is defined for data
signals corresponding to a Gaussian pdf with mean 0 and
variance σ². Normal signal behavior is referred to as the null
hypothesis, H0. We formulate six specific SPRT hypothesis
tests that are computed in parallel for the variable monitored
by each wireless Smart Mote. Although the default
implementation is described in terms of applying a SPRT that
applies to Gaussian time series, one can use a nonparametric
SPRT for signals that are not represented with a Gaussian
distribution.
Within a SPRT surveillance module, all 6 tandem
hypothesis tests are executed in parallel. Each test determines
whether the current sequence of process observations is
consistent with the null hypothesis vs. an alternative
hypothesis. Four of the tests, which have been applied in
open-literature publications previously (but for different types
of applications), are known as the positive mean test, the
negative mean test, the nominal variance test, and the inverse
variance test. For the positive mean test, the corresponding
alternative hypothesis, H1, is that the signal data
correspond to a Gaussian pdf with mean +M and variance
σ². For the negative mean test, the corresponding alternative
hypothesis, H2, is that the signal data correspond to a
Gaussian pdf with mean -M and variance σ². For the nominal
variance test, the corresponding alternative hypothesis, H3, is
that the signal data correspond to a Gaussian pdf with
mean 0 and variance Vσ² (with scalar factor V). For the
inverse variance test, the corresponding alternative
hypothesis, H4, is that the data correspond to a Gaussian
pdf with mean 0 and variance σ²/V.
The final two tandem SPRT tests are performed not on
the raw Smart Mote output variables described above, but on
the first difference function of the variable. For discrete time
series, the first difference function (i.e. difference between
each observation and the observation preceding it) gives an
estimate of the numerical derivative of the time series. During
uninteresting time periods, the observations in the first
difference function will be a nominally stationary random
process centered about zero, which is perfectly in concert with
our assumptions in employing a SPRT. If an upward or
downward trend should suddenly appear in the signal, SPRTs
number 5 and 6 monitor for an increase or decrease,
respectively, in the slope of the Smart Mote variable. As
such, if there is a decrease in the value of the variable, SPRT
alarms will be triggered for SPRTs 2 and 6. SPRT 2 will fire
a warning because the sequence of raw observations is
dropping with time. And SPRT 6 will fire because the slope
of the variable changes from zero to something less than zero.
The advantage of monitoring the mean SPRT and slope SPRT
in tandem is realized if the signal levels off to a new
stationary value (or plateau). At this point SPRT 2 will
continue firing because the new current value is different from
the value prior to the degradation; whereas the alarms from
SPRT 6 will cease because the slope returns to zero when the
raw signal reaches a plateau.
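The plateau behavior described above can be seen in a small sketch of the first difference function. The signal values are made up for illustration, and the helper name `first_difference` is an assumption rather than anything named in the paper.

```python
def first_difference(values):
    """Difference between each observation and the one preceding it."""
    return [b - a for a, b in zip(values, values[1:])]

# Flat background, a downward ramp, then a new plateau.
signal = [10.0] * 4 + [9.0, 8.0, 7.0, 6.0] + [6.0] * 4
diff = first_difference(signal)
# diff is zero on both plateaus and -1.0 along the ramp, so a slope
# test sees the trend only while it lasts, while a mean test on the
# raw signal keeps seeing the shifted level.
```

In this idealized, noise-free case the first difference returns to zero as soon as the signal levels off, which is why the slope alarms would stop while the mean alarms continue.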
If SPRTs 3 or 4 should fire a warning, it means that the
variance of the sensed variable is increasing or decreasing,
respectively. An increasing variance that is not accompanied
by a change in mean (inferred from SPRTs 1 & 2 and SPRTs
5 & 6) can signify an episodic event that is “bursty” or
“spiky” with time. A decreasing variance that is not
accompanied by a change in mean is a common symptom of a
failing sensor that is characterized by an increasing time
constant. As such, having variance SPRTs available in
parallel with slope and mean SPRTs can provide a wealth of
supplementary diagnostic information that has not been
possible with conventional wireless sensors.
The SPRT technique provides a quantitative framework
that permits a decision to be made between the null hypothesis
and the foregoing six alternative hypotheses with specified
misidentification probabilities. If the SPRT accepts one of the
alternative hypotheses, an alarm flag is set and data is
transmitted. If all six of the null hypotheses are met, it can be
concluded with a high degree of confidence that the data
represents normal background activity for the variable
monitored by the Smart Mote.
The SPRT operates as follows. At each time step in a
calculation, a test index is calculated and compared to two
stopping boundaries A and B (defined below). The test index
is equal to the natural log of a likelihood ratio (Ln), which for
a given SPRT is the ratio of the probability that the alternative
hypothesis for the test (Hj, where j is the appropriate subscript
for the SPRT in question) is true, to the probability that the
null hypothesis (H0) is true.
$$L_n = \frac{\text{probability of observed sequence } \{Y_n\} \text{ given } H_j \text{ true}}{\text{probability of observed sequence } \{Y_n\} \text{ given } H_0 \text{ true}} \tag{1}$$
If the logarithm of the likelihood ratio is greater than or equal
to the logarithm of the upper threshold limit [i.e., ln(Ln) ≥
ln(B)], then it can be concluded that the alternative hypothesis
is true. If the logarithm of the likelihood ratio is less than or
equal to the logarithm of the lower threshold limit [i.e., ln(Ln)
≤ ln(A)], then it can be concluded that the null hypothesis is
true. If the log likelihood ratio falls between the two limits
[i.e., ln(A) < ln(Ln) < ln(B)], then there is not yet enough
information to make a decision (and, incidentally, no other
statistical test could yet reach a decision with the same given
Type I and II misidentification probabilities).
The threshold limits are related to the misidentification
probabilities α and β by the following expressions:
$$A = \frac{\beta}{1-\alpha} \quad \text{and} \quad B = \frac{1-\beta}{\alpha}, \tag{2}$$
where
α is the probability of accepting Hj when H0 is true (i.e., the
false-alarm probability), and
β is the probability of accepting H0 when Hj is true (i.e., the
missed-alarm probability).
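As a small worked example, the log boundaries can be computed from α and β using the expressions ln(β/(1−α)) and ln((1−β)/α) that the paper gives for the lower and upper decision boundaries. The function name `sprt_boundaries` and the example probabilities are assumptions for illustration.

```python
import math

def sprt_boundaries(alpha, beta):
    """Return (ln A, ln B) for false-alarm alpha and missed-alarm beta."""
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    log_a = math.log(beta / (1 - alpha))   # lower boundary: accept H0
    log_b = math.log((1 - beta) / alpha)   # upper boundary: accept Hj
    return log_a, log_b

log_a, log_b = sprt_boundaries(alpha=0.01, beta=0.01)
```

With equal probabilities the two boundaries are symmetric about zero; making α smaller raises the upper boundary, so more evidence is needed before an alarm is declared.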
The first two SPRT tests for normal distributions examine the
mean of the process observations. The goal of the mean tests
is to declare that the system is degraded if the distribution of
observations exhibits a non-zero mean, e.g., a mean of either
+M or -M, where M is the pre-assigned system disturbance
magnitude for the mean test. Assuming that the sequence
{Yn} corresponds to a Gaussian pdf, the probability of the
observed sequence under the null hypothesis H0 (i.e., mean 0
and variance σ²) is given by [Ref. 4]:
$$P(y_1, y_2, \ldots, y_n \mid H_0) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{3}$$
Similarly, the probability for alternative hypothesis H1 (i.e.,
mean M and variance σ²) is:
$$P(y_1, y_2, \ldots, y_n \mid H_1) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2 - 2\sum_{k=1}^{n} y_k M + \sum_{k=1}^{n} M^2\right)\right] \tag{4}$$
The ratio of the probability in Equation (4) to that in
Equation (3) gives the likelihood ratio Ln for the positive
mean test, Equation (5):
$$L_n = \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M(M - 2y_k)\right] \tag{5}$$
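The cancellation behind Equation (5) can be written out explicitly. The normalization constants of the two Gaussian densities are identical and cancel, as do the sums of squared observations, leaving only the terms that involve M:

```latex
\begin{align*}
L_n &= \frac{P(y_1,\ldots,y_n \mid H_1)}{P(y_1,\ldots,y_n \mid H_0)} \\
    &= \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{k=1}^{n} y_k^2
       - 2M\sum_{k=1}^{n} y_k + nM^2\right)
       + \frac{1}{2\sigma^2}\sum_{k=1}^{n} y_k^2\right] \\
    &= \exp\left[-\frac{1}{2\sigma^2}\sum_{k=1}^{n} M\,(M - 2y_k)\right]
\end{align*}
```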
The SPRT index for the positive mean test (SPRTpos) is
given by taking the logarithm of the foregoing likelihood ratio:
$$\mathrm{SPRT}_{pos} = -\frac{1}{2\sigma^2}\sum_{k=1}^{n} M(M - 2y_k) = \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(y_k - \frac{M}{2}\right) \tag{6}$$
The SPRT index for the negative mean test (SPRTneg) can be
derived by substituting -M for each instance of M in Equations
(4) through (6), resulting in:
$$\mathrm{SPRT}_{neg} = \frac{M}{\sigma^2}\sum_{k=1}^{n}\left(-y_k - \frac{M}{2}\right) \tag{7}$$
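Because Equations (6) and (7) are sums over observations, each index can be updated with one multiply and one add per sample. The sketch below illustrates this under the assumption that observations are already normalized to zero mean; the function name `mean_sprt_increments` and the sample values are made up for illustration.

```python
def mean_sprt_increments(y, m, sigma2):
    """Per-observation increments of the positive and negative mean indices.

    y: one zero-mean-normalized observation
    m: disturbance magnitude M (same units as y)
    sigma2: background variance
    """
    pos = (m / sigma2) * (y - m / 2.0)    # term of Equation (6)
    neg = (m / sigma2) * (-y - m / 2.0)   # term of Equation (7)
    return pos, neg

# Running sums over a short normalized sequence (made-up values).
observations = [0.1, -0.2, 0.9, 1.1, 1.0]
sprt_pos = sprt_neg = 0.0
for y in observations:
    d_pos, d_neg = mean_sprt_increments(y, m=1.0, sigma2=0.25)
    sprt_pos += d_pos
    sprt_neg += d_neg
```

Observations near +M push the positive index up and the negative index sharply down, which is the behavior the two tandem mean tests rely on.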
The next two SPRT tests examine the variance of
the sequence. This capability gives the SPRT module the
ability to detect and quantitatively characterize changes in
variability for processes, which is vitally important for
6-sigma QA/QC improvement initiatives. In the variance tests,
the system is declared to be degraded if the sequence exhibits
a change in variance by a factor of V or 1/V, where V, the
pre-assigned system disturbance magnitude for the variance
test, is a positive scalar. The probability of the observed
sequence under the alternative hypothesis H3 (i.e., mean 0 and
variance Vσ²) is given by Equation (3) with σ² replaced by Vσ²:
$$P(y_1, y_2, \ldots, y_n \mid H_3) = \frac{1}{(2\pi V)^{n/2}\sigma^n} \exp\left[-\frac{1}{2V\sigma^2}\sum_{k=1}^{n} y_k^2\right] \tag{8}$$
The likelihood ratio for the variance test is given by the ratio
of Equation (8) to Equation (3):
$$L_n = V^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\left(\frac{1-V}{V}\right)\sum_{k=1}^{n} y_k^2\right] \tag{9}$$
The SPRT index for the nominal variance test (SPRTnom) is
given by taking the logarithm of the likelihood ratio given in
Equation (9), to give:
$$\mathrm{SPRT}_{nom} = \frac{1}{2\sigma^2}\left(\frac{V-1}{V}\right)\sum_{k=1}^{n} y_k^2 - \frac{n}{2}\ln V \tag{10}$$
The SPRT index for the inverse variance test (SPRTinv) can
be derived by substituting 1/V for each instance of V in
Equations (8) through (10), resulting in:
$$\mathrm{SPRT}_{inv} = \frac{1}{2\sigma^2}(1-V)\sum_{k=1}^{n} y_k^2 + \frac{n}{2}\ln V \tag{11}$$
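The variance indices in Equations (10) and (11) can likewise be accumulated one observation at a time, since each sample contributes a term in y² plus a constant ∓(1/2) ln V. The following is a sketch under the assumptions that observations are normalized to zero mean and that V > 1; the function name and test values are hypothetical.

```python
import math

def variance_sprt_increments(y, v, sigma2):
    """Per-observation increments of the nominal and inverse variance indices.

    y: one zero-mean-normalized observation
    v: disturbance magnitude V (> 1 assumed here)
    sigma2: background variance
    """
    half_log_v = 0.5 * math.log(v)
    nom = (y * y / (2.0 * sigma2)) * ((v - 1.0) / v) - half_log_v  # Equation (10)
    inv = (y * y / (2.0 * sigma2)) * (1.0 - v) + half_log_v        # Equation (11)
    return nom, inv

# A large excursion pushes the nominal (increase) index up and the
# inverse (decrease) index down; a near-zero sample does the opposite.
big_nom, big_inv = variance_sprt_increments(y=2.0, v=2.0, sigma2=1.0)
small_nom, small_inv = variance_sprt_increments(y=0.0, v=2.0, sigma2=1.0)
```

A run of unusually large excursions therefore drives the nominal variance index toward the upper boundary, while a run of unusually quiet samples drives the inverse variance index there instead.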
The tandem SPRT module performs mean and variance SPRT
tests on the raw process signal and slope SPRT tests on its first
difference function. To initialize the module for analysis of a wireless
Smart Mote variable time series, the user specifies the system
disturbance magnitudes for the tests (M and V), the false-alarm
probability (α), and the missed-alarm probability (β).
Then, during the training phase (before the first failure of a
component under test), the module calculates the mean and
variance of the monitored variable process signal. For most
monitored variables the mean of the raw observations for the
variable will be non-zero; in this case the mean calculated
from the training phase is used to normalize the signal during
the monitoring phase. The system disturbance magnitude for
the mean tests specifies the number of standard deviations (or
fractions thereof) that the distribution must shift in the positive
or negative direction to trigger an alarm. The system
disturbance magnitude for the variance tests specifies the
fractional change of the variance necessary to trigger an alarm.
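The training and normalization steps just described can be sketched as below. The readings are made up, the helper names `train` and `normalize` are hypothetical, and the use of the population variance and of a 2-standard-deviation shift for M are assumptions of this sketch, not choices stated in the paper.

```python
from statistics import fmean, pvariance

def train(background):
    """Estimate mean and variance from background (training) data."""
    mu = fmean(background)
    return mu, pvariance(background, mu)

def normalize(y, mu):
    """Shift an observation so the background is centered on zero."""
    return y - mu

training = [20.1, 19.9, 20.0, 20.2, 19.8]   # made-up readings
mu, sigma2 = train(training)
m = 2.0 * sigma2 ** 0.5   # assumed choice: alarm on a 2-standard-deviation shift
```

During monitoring, each new observation would pass through `normalize` before being fed to the mean and variance tests, so that the zero-mean assumption of the SPRT formulas holds.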
At the beginning of the monitoring phase, all six SPRT
indices are set to 0. Then, during each time step of the
calculation, the SPRT indices are updated using Equations (6),
(7), (10), and (11). Each SPRT index is then compared to the
upper [i.e., ln((1-β)/α)] and lower [i.e., ln(β/(1-α))] decision
boundaries, with these three possible outcomes: 1) the lower
limit is reached, in which case the process is declared healthy,
the test statistic is reset to zero, and sampling continues; 2) the
upper limit is reached, in which case the process is declared
degraded, an alarm flag is raised indicating a sensor or process
fault, the test statistic is reset to zero, and sampling continues;
or, 3) neither limit has been reached, in which case no
decision concerning the process can yet be made and the
sampling continues.
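The three-outcome loop above can be sketched for a single SPRT index as follows. The function name `run_sprt`, the decision labels, and the increment values are assumptions for illustration; in practice the increments would come from the per-observation terms of the mean or variance index formulas.

```python
import math

def run_sprt(increments, alpha, beta):
    """Accumulate increments and report decisions with reset.

    increments: iterable of per-observation log-likelihood-ratio terms
    Returns a list of (step, decision) where decision is
    "degraded" (upper boundary) or "healthy" (lower boundary).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    index, decisions = 0.0, []
    for step, inc in enumerate(increments):
        index += inc
        if index >= upper:
            decisions.append((step, "degraded"))
            index = 0.0            # reset and keep sampling
        elif index <= lower:
            decisions.append((step, "healthy"))
            index = 0.0
        # otherwise: no decision yet, keep sampling
    return decisions

# Made-up increments: drift down (background), then drift up (event).
increments = [-2.0] * 3 + [2.5] * 4
decisions = run_sprt(increments, alpha=0.01, beta=0.01)
```

On a mote, a "degraded" decision from any of the six indices would be the trigger to start transmitting, while a stream of "healthy" decisions keeps the transmitter idle.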
The advantages of using a SPRT, as have been demonstrated
in previous Oracle prognostic systems that use wired sensors
[Refs 5-7], are twofold:
1. One can detect very subtle anomalies in noisy process
variables at the earliest possible time.
2. One can pre-specify quantitative false-alarm and
missed-alarm probabilities.
For the intelligent wireless mote IoT applications addressed in
this paper, we introduce tandem SPRTs in a novel application
that monitors “derivative SPRTs” in parallel with mean and
variance SPRTs that are performed on the time series
associated with the Smart Mote measured variable(s). This
new tandem-SPRT approach enables one to determine the
onset of interesting episodic events, saving both bandwidth
and transmitter battery power during periods of normal
background activity for IoT applications of intelligent wireless
motes. During such time periods, the receiver agent generates
synthesized observations that possess exactly the same mean,
variance, skewness, and kurtosis, as the original variable
during its “background activity” period. (Gaussian data is
generated by default; but the distribution can be customized to
match an empirical CDF for IoT applications wherein signals
with non-Gaussian noise are encountered).
Output from this technique is then consumed and processed
with standard pattern recognition algorithms that expect
uniform, synchronous time series as input.
3 Example Applications
The new technique is illustrated with an example that uses
three typical signals from prototype wireless Smart Motes.
Continuous variables are illustrated by the upper 3 subplots in
Fig. 1. The second set of 3 subplots shows digitized samples
after the signals pass through an A/D converter chip.
With conventional approaches, all of the data shown would be
transmitted continuously, even during the less interesting
“background activity” periods that can be seen in the figures.
Figure 2 shows SPRT indices indicating “interesting” episodes
of activity in the sensors (i.e. the distribution is significantly
different, with a pre-defined confidence factor, from the
distribution of the “background activity”). The lower three
subplots of Fig. 2 show the optimized transmission activity for
the Smart Motes.
Figure 3 shows the original raw data as seen by the
sensors, the “interesting” data that was transmitted via the
wireless network, and the reconstructed signals per this paper.
The background activity in the reconstructed signals is
statistically indistinguishable from the background activity in
the raw signals (matches in mean and variance). The
optimized reconstructed signals are now synchronously
sampled and are amenable to analysis by pattern recognition
“consumer” algorithms, such as those separately patented by
Oracle for proactive anomaly detection.
Fig. 1. SPRT-Based Wireless Smart Mote Example
Application
Fig. 2. SPRT Indices Indicate Interesting Episodes and
Trigger Data Transmission
Fig. 3. Reconstructed Signals on Receiving Side
4 Conclusions
For intelligent wireless mote IoT applications addressed in
this paper, we introduce a new six-component tandem SPRT
statistical machine learning approach embedded in a light-
weight algorithm that monitors “derivative SPRTs” in parallel
with mean and variance SPRTs computed on digitized time
series associated with the Smart Mote measured variable(s).
This new tandem-SPRT approach enables one to determine
the onset of interesting episodic events, saving both
bandwidth and transmitter battery power during periods of
normal background activity for IoT applications of intelligent
wireless motes. During such time periods, the receiver agent
generates synthesized observations that possess exactly the
same mean, variance, skewness, and kurtosis, as the original
variable during its “background activity” period. This new
approach leverages and extends Oracle's experience with real-
time prognostics from wired telemetry pattern-recognition
applications. For intelligent wireless motes for IoT prognostic
applications, the new tandem-SPRT data-transmission
actuator introduced herein optimizes bandwidth utilization,
minimizes battery power for the smart motes, yet eliminates
gaps in data on the receiving end.
5 References
[1] A. Wald. Sequential Analysis. John Wiley &
Sons, New York, NY, 1947.
[2] K. C. Gross and K. Humenik, “Nuclear Power
Plant Component Surveillance Implemented in SAS
Software,” Proc. SAS Users Group Int’l. Conf., pp.
1127-1131, San Francisco, April 1989.
[3] K. C. Gross, R. Dhanekula, and K.
Vaidyanathan, “Novel Training Enhancements for
Advanced Statistical Pattern Recognition Used for
Electronic Prognostics of Enterprise Computing
Systems,” Proc. IEEE World Congress in
Computer Science, Computer Engineering, and
Applied Computing (WorldComp2011), Las
Vegas, NV (Aug 2011).
[4] K. Whisnant, K. C. Gross and N. Lingurovska,
“Proactive Fault Monitoring in Enterprise Servers,”
Proc. 2005 IEEE Intn'l Multiconference in Computer
Science & Computer Eng., Las Vegas, NV (June 2005).
[5] A. Urmanov and K. C. Gross, “Failure Avoidance in
Computer Systems,” Proc. 59th Meeting of the Society
for Machinery Failure Prevention Technology,
Virginia Beach, VA (Apr 18-21, 2005).
[6] K. Vaidyanathan and K. C. Gross, “Proactive
Detection of Software Anomalies through MSET,”
Proc. IEEE Workshop on Predictive Software Models
(PSM-2004), Chicago (Sept 17-19, 2004).
[7] K. C. Gross, K. W. Whisnant and A. Urmanov,
“Electronic Prognostics Through Continuous System
Telemetry,” Proc. 60th Meeting of the Society for
Machinery Failure Prevention Technology, Virginia
Beach, VA (April 2006).