Active Inference and Dynamic Gaussian Bayesian Networks ...ml/pdfs/komurlu-aisgsb16.pdf · Active...

Active Inference and Dynamic Gaussian Bayesian Networks for BatteryOptimization in Wireless Sensor Networks

Caner Komurlu and Mustafa BilgicIllinois Institute of Technology

10 W 31st Street, Stuart BuildingChicago, Illinois 60616

[email protected], [email protected]

AbstractWireless sensor networks play a major role in smart grids andsmart buildings. They are not just used for sensing, but theyare also used as actuating. In terms of sensing they are usedto measure temperature, humidity, light, to detect motion, etc.Sensors are often operated on a battery and hence we oftenface a trade-off between obtaining frequent sensor readingsversus maximizing their battery life. There have been sev-eral approaches to maximizing their battery life from hard-ware level to software level such as reducing components en-ergy consumption, limiting node operation capabilities, us-ing power-aware routing protocols, and adding solar energysupport. In this paper, we introduce a novel approach: wemodel the sensor readings in a wireless network using a dy-namic Gaussian Bayesian network (dGBn) whose structureis automatically learned from data. dGBn allows us to inte-grate information across sensors and infer missing readingsmore accurately. Through active inference for dGBns, we areable to actively choose which sensors should be pulled for areading and which ones can stay in a power-saving mode ateach time step, maximizing prediction accuracy while stayingwithin the budgetary constraints on battery consumption.

IntroductionSmart buildings are increasingly more prevalent and they areused in various service areas such as factories, energy facili-ties, airports, harbors, schools, medical centers, governmentbuildings, military bases, etc. Properties of smart buildingsand their objectives show a wide range of variety: security,surveillance, adapting to weather conditions and quality ofservice.

Sensing is a major task in smart buildings. Arguably themost common sensing mechanism consists of wireless sen-sor networks (WSN). WSNs’ key components are sensornodes. They are mechanically independent and they commu-nicate through RF. They sense one or multiple specific phys-ical events, process their readings, and forward to a server.

As nodes in a WSN are mechanically independent, theyrely on their individual source of energy, most generally abattery. Lifespan of each sensor node is determined by howfast it consumes energy. Many studies have been carried outto increase lifespan of nodes in WSN. In addition to extend-ing battery life through physiochemical enhancements, some

Copyright © 2016, Association for the Advancement of ArtificialIntelligence (www.aaai.org). All rights reserved.

strategies have been proposed for decreasing energy expen-diture (Akyildiz et al. 2002).

A sensor node performs three essential tasks: sensing inwhich it converts a physical quantity to a reading, process-ing in which it treats readings and store them, and finallycommunicating in which it sends or receives data. Energyconsumption of sensing and processing is negligible besidecommunicating. Instead of gathering every reading fromeach node, readings from a subset can be fetched and re-mainders can be kept silent. The more frequent a sensorkeeps silent, the longer it survives.

The intuition behind keeping some sensors silent is to uti-lize the correlations between sensors to predict sensor read-ings that are not communicated to conserve energy. Collect-ing readings from a sensor or a group of sensors might giveenough data to estimate readings of another. Thereby, wecan save energy by not fetching true readings. Correlationsexisting between sensors should be considered not only inspace but in time as well. A sensor’s reading in the pastmight provide enough insight to estimate its value for someperiod with help of concurrent readings from other sensorsand their past readings as well.

In this paper, we propose a dynamic Gaussian Bayesiannetwork (dGBn) model for WSNs and utilize the correlationbetween sensors to minimize prediction error subject to howmany messages each sensor can send in a given time frame.Our contribution consists of two phases. First we model aWSN as a dGBn that represents correlations between sen-sors in space and time. Second, on top of our dGBn model,we select sensors to fetch readings on each time stamp withan intelligent strategy, which we call active inference. Activeinference dynamically selects random variables on dGBn,which are sensors in this context, for observation. This ob-servation is used as evidence to maximize accuracy. As aresult, active inference can help us to minimize battery us-age.

In the rest of the paper, we provide background on wire-less sensor networks and dynamic Bayesian network mod-els. Then we describe our dynamic Gaussian Bayesian net-work model and we formulate the active inference problemfollowed by a discussion of experimental methodology. Wefinally present results, discuss related work and conclude.

Background and ApproachProblem StatementLifespans of sensors in a typical wireless sensor network(WSN) depend on their battery consumption and commu-nication is the largest consumer of battery. To increase lifes-pan, an obvious solution is to reduce the frequency of mes-sages that a sensor sends to other sensors and/or to a centralserver. The downside of reducing the frequency of messagesis that it leads to missing information at various time stamps.One possible remedy is to use a model of sensor readingsand i) when the sensor reading is available, use the observedreading, and ii) when the sensor reading is missing due toreduced communication frequency, predict missing valuesusing the underlying model.

In this paper, we tackle the following problem: given asensor network, a predictive model for the sensor network,and a budget of how many messages can be sent per timestamp, determine the ideal sensors that should communicatetheir readings to a central location so that the overall errorover the network at each time stamp is minimized. More for-mally, let Y s

t be the sensor reading for time t and sensor s (ifit is communicated to the central server, then it is observed;if not, it is predicted), Yt be the sensor readings, both ob-served and predicted, of all sensors at time t (i.e, Yt is theunion of Y s

t for all s), Xst be the always-observed features

of the sensor s at time t, which includes sensor specific in-formation such as ID and location, Xt (i.e, Xt is the unionof Xs

t for all s), the set of observed features of all sensorsat time t, B the budget, t the current time, O the the set ofobserved readings up to time t (i.e., the communicated Y s

ivalues for 0 ≤ i < t), θ the underlying model, and Err theerror function over Y , the objective is:

argminS⊂Yt

Err(Yt|S,O,Xt, θ) s.t. |S| = B (1)

That is, find the subset of sensors at time t that shouldcommunicate their readings to the central location so thatthe error over all sensor readings, observed and predicted,is minimized for time t. Note that in this case, the error iscomputed over both observed and predicted sensors. The er-ror for observed sensors can be assumed to be zero (or in thecase of noisy sensor readings, the error can reflect the noisein readings). Alternatively, one can compute the error overonly the sensors that did not communicate their readings andhence need to be predicted:

argminS⊂Yt

Err(Yt \ S|S,O,Xt, θ) s.t. |S| = B (2)

We will discuss potential approaches to choosing S in ac-tive inference section below. There are a number of potentialchoices for θ. For example, one can train a temporal modelper sensor such as a Gaussian process model per sensor. An-other possibility is to use graphical models. We discuss thesenext.

Predictive Models for WSNsGaussian Processes One of the simplest and yet most ap-propriate strategy for modeling sensor readings over time

is Gaussian Processes (Rasmussen 2004). For WSN, a pos-sible approach is to train one Gaussian Process per sensorusing its past history of readings over time. The advantageis that training of a Gaussian process per sensor and usingit for prediction are both very fast. The disadvantages are,however, i) the correlations between sensors are not takeninto account, and ii) once the model is trained and fixed, theobserved readings at one time stamp at prediction time doesnot affect the predictions at other time stamps.

Dynamic Gaussian Bayesian Network We use dynamicGaussian Bayesian network modeling to handle the temporalbehavior of the system and to exploit spatio-temporal cor-relations between sensors. Each sensor is represented by aGaussian random variable Y s

t with conditional Gaussian dis-tributionN (β0+β

T ·Pa(Y st ), σi) wherePa(Y s

t ) is Y st ’s

parents. To explore independencies between sensors, thusfinding a DBN structure capturing correlations the most, weresort K2 structure learning proposed by Cooper and Her-skovits (1992).

To keep structure simplistic for temporal dependencies,we assumed that each random variable at time slice t, Y s

twill be independent from all other sensor values of the pre-vious time slice t − 1 given itself in the previous time slicet−1 (Y s

t ⊥ Yt−1 \{Y st−1}|Y s

t−1). Therefore we added Y st−1

to the parent list of Y st

Since exact inference methods are intractable (Koller andFriedman 2009) on DBNs, approximate inference methodsare generally resorted for prediction. We used Metropolis-Hastings (Hastings 1970) (MH) sampling for inference, anMCMC method which has found practice in many studies(Gilks, Richardson, and Spiegelhalter 1996). For each ran-dom variable, MH samples a value from a proposal distri-bution. Then using this newly sampled value and previouslysampled value, it computes an acceptance probability. If theacceptance probability exceeds a random threshold sampleduniformly from [0, 1] interval, then the new value is acceptedas the new state of the random variable, otherwise the previ-ous value is repeated in the sample.

Equation 3 shows how acceptance probability is com-puted and evaluated. Ys

t is random variable Y st ’s last ac-

cepted value, Y′st is newly sampled value,us is Y it ’s Markov

blanket and r is random threshold for α the acceptance prob-ability. Note that t represents here sampling step.

α(Y′st ,Yst,u

s) = min

{1,p(Y′st ,u

s)× q(Y′st |Yst,u

s)

p(Yst,u

s)× q(Yst|Y′st ,us)

}r ∼ U([0, 1])

Y′st+1 =

{Y′st , if α ≥ 1

Yst, otherwise

(3)

Active InferenceActive inference (Bilgic and Getoor 2009), as a method inthe context of statistical prediction such as classification orregression, is a technique of selective information gatheringwith the objective of maximizing prediction accuracy. Un-like active learning (Settles 2012) which collects label in-

formation during phase of training a statistical model, activeinference collects in the phase of prediction.

Active inference on Bayesian networks is selecting ran-dom variables to observe during inference for better pre-diction on the rest. Active inference formulation is perfectmatch for battery optimization in WSNs, because active in-ference determines which nodes should be observed (i.e.,which sensors should communicate their readings) to maxi-mize the prediction accuracy on the remaining ones, subjectto budgetary constraints.

Arguably, the simplest way is randomly selecting obser-vation set. Another method is to define a fixed frequency forthe sensors and hence they communicate their readings atfixed intervals, like once every hour. We call this approach,the sliding window selection approach. In this study we pro-pose impact-based selection method, in which variables areselected for observation based on their impact on predict-ing others. In following subsections, we describe these threeactive inference methods.

Random Selection At each prediction time t, random se-lection chooses B random sensors for observation. Then,the observed readings for these sensors, and all the past ob-served readings are used during inference to predict the sen-sor readings for the unobserved ones and error is computedusing Equations 1 and 2.

Sliding Window Selection This method, as a baseline al-ternative to random selection, selects sensors not randomlybut with respect to a predefined order. In details, the listof sensors is shuffled initially and for once, then it is splitinto equal width bins, b1, b2, . . . bm. Then at each predictiontime, a bin is selected in the original order, for observation.When we predict the time slice ti, we observe the bin bi.After mth time slice, we restart the sequence with b1. Tomake picture clear, a very simple example can be given. Inthe case of 50% of sensors are selected as evidence. Some25 sensors are selected for the first time slice, and the other25 are observed in the following. On the third time slice, thefirst 25 are observed again, and so on. An advantage of thisselection method is collecting information from each sensorin equal frequency, whereas random selection can choose asensor multiple times in consecutive slices.

Impact-Based Selection For the impact-based selectionapproach, we first define the impact of each random variableon others when it is observed. Note that when a sensor’sreading is observed, its immediate impact is on its unob-served neighbors in the network, i.e., its unobserved parentsand unobserved children in the Bayesian network. In Equa-tion 4 we define an impact function which incorporates βcoefficients of Y s

t and β coefficients of its children. ω1 is anindicator function which is 1 when corresponding parent isunobserved, 0 otherwise. ω1 is the vector of indicator func-tions of Pa(Y s

t ). ω2 is the set of unobserved children ofY st . The impact of observing Y s

t on its immediate neighborsis defined as:

Imp(xi) = |βTi | · ω1 +

∑j∈ω2

|β(i)j | (4)

argmaxS

∑Y st ∈S⊂Yt

Imp(Y st ) s.t.

∣∣S∣∣ = B (5)

Given this impact formulation, we would like to find asubset of random variables of which sum of impacts is max-imum, as shown in Equation 5. In literature, this problem isalso described as finding the maximum cut given parts of sizein a graph. This problem is discussed by Ageev and Sviri-denko (1999). They also show that it is APX-Hard, i.e. itdoes not admit an error guarantee with a polynomial-timeapproximation algorithm. Therefore, we propose a greedyalgorithm.

Algorithm 1 Greedy Impact Based Selection1: procedure GREEDY–IBS(Yt,B)2: for i = 1, 2, . . . B do3: maxI = 04: maxY = NIL5: for Y s

t ∈ Yt \ S do6: if Imp(Y s

t ) > maxI then7: maxI = Imp(Y s

t )8: maxY = Y s

t9: end if

10: end for11: S = S ∪ {maxY }12: for each Y s

t ∈ Pa(maxY ) do13: ω2(Y

st ) = ω2(Y

st ) \ {maxY }

14: end for15: for each Y s

t ∈ Child(maxY ) do16: ω1(Y

st )[maxY ] = 0

17: end for18: end for19: end procedure

Algorithm 1 describes our greedy impact based selectionmethod. Yt is the set of all random variables, B is the num-ber of random variables we can select for observation andS is the observation/evidence set. We loop over all randomvariables which are not yet in the evidence set, we find theone with the highest impact, we add it to evidence list andremove it from parent list of its children and child list ofits parents. The reason is obvious, a random variable has noimpact on an observed variable, therefore once a variableis observed, it has to be removed from neighborhood of itsneighbors and impact of all its neighbors should be recom-puted. We repeat this selection B times.

Experimental Design and EvaluationWe first describe the data on which we test our dGBn modeland active inference. Next we illustrate our baselines to as-sess our dGBn model and our active inference method.

DataIn our study we used Intel Research Lab sensor data (Desh-pande et al. 2004). It consists of temperature, humidity, light,and voltage readings collected from 60 sensors. These sen-sors were placed in an office environment and employed forsensing the environment in different frequencies.

In this article, we focused on temperature readings fordays 2, 3, 4, and 5 as these days show highest temperaturevariation. We split this period into equal bins of 30 minutes.We averaged readings of each sensor at each bin. In the end,we obtained a data set composed of one temperature valuefor each sensor at each time bin. We selected days 2, 3, 4 astraining set, and we used first 6 hours of the 5th day as testset.

Evaluation MetricsTo evaluate our dGBn model, our active inference method,and the baselines, we utilize Mean Absolute Error (MAE), asour predictions are in a continuous domain. MAE measuresdistance of each reading to real value and computes meanof distances of a set of readings. Note that in the context ofthis study, a reading can be an actual observation, as wellas a prediction. Based on our objective, we define the set ofreadings on which we compute the mean.

Mean Absolute Error on all readings In the first method,we refer to the objective function given in Equation 1. Inthis objective, we include all readings from the predictiontime to compute error. Hence we average absolute error overall readings, including observed ones, in order to exploreoverall prediction performance of the predictive model (i.e.Gaussian processes or dGBn). Equation 6 shows how wecompute average error of prediction over all sensors, givenErr as the distance of a reading to its actual value:

MEA(t) =1

|Yt|∑

Y st ∈Yt

|Err(Y st )| (6)

Mean Absolute Error on predicted readings In the sec-ond method, we refer to Equation 2 which minimizes erroron only predicted readings. Equation 7 shows how we com-pute average error.

MEA(t) =1

|Yt \ S|∑

Y st ∈Yt\S

|Err(Y st )| (7)

For stochastic methods, which are random selection andsliding window selection, we ran 5 trials and averaged MAEat each time slice over 5 trials. Because a time slice hasno specific effect on error reduction, we average MAE overtime slices and obtain an overall error for each budget B.

In order to see the effect of budget on error reduction, wetried various budget levels: 0%, 10%, 20%, 30%, 40%, and50%. To evaluate our dGBn model and our greedy impact-based selection method, we tried Gaussian processes withrandom selection, which we shortly call GP-RND. Then wetried our dGBn model with 3 different active inference meth-ods: random selection (dGBn-RND), sliding window se-lection (dGBn-SW), greedy impact-based selection (dGBn-IBS).

Experimental Results and DiscussionWe first present results for Gaussian processes followed bythe results for dynamic Gaussian Bayesian network model.Then, we compare Gaussian processes with dynamic Gaus-sian Bayesian networks. We present these results in tables.

In each table, columns are reserved for different budgets.The first row represents MAEs over all readings, whereas thesecond row shows MAEs over predicted readings only. Forall cases, we present results corresponding to MAE over allreadings and MAE over only predicted readings. We showthese results in bar plots.

Mean Absolute Error of GP-RND

Budget

0% 10% 20% 30% 40% 50%

All readings 0.56 0.50 0.45 0.39 0.34 0.28

Predicted readings 0.56 0.56 0.56 0.56 0.56 0.56

Table 1: Mean absolute error of random selection on Gaus-sian process with budgets from 0% to 50%.

Table 1 shows prediction error of Gaussian Process usingrandom sampling as active inference (GP-RND). In the firstrow, we see a strictly monotonic reduction in error as budgetincreases. This result is not surprising: as the observed sen-sor percentage increases, the error goes down because theerror on the observed ones are assumed to be zero. It is im-portant to note however that as the observed sensor rate goesup, so does the battery consumption. On the other hand, inthe second row, we notice that the error rate is invariant onbudget for MAE with predicted readings only. Since Gaus-sian process cannot incorporate evidence and cannot makeuse of correlations between sensors, more evidence will nothelp predicting a reading.

Mean Absolute Error of dGBn-RND

Budget

0% 10% 20% 30% 40% 50%

All readings 4.19 0.63 0.31 0.24 0.18 0.15


Table 2: Mean absolute error of random selection on dGBnwith budgets from 0% to 50%.

In Table 2, we present performance result of dGBn withrandom selection. In both rows we see a monotonic decreasein error rates as more evidence is provided. The decreasein error in the first row is not surprising again, as was ex-plained for the GP-RND result (Table 1). A decrease in errorcomputed on predicted readings demonstrates that our dGBnmodel is able to exploit relationships between sensor nodesand hence observed/collected readings help reduce the pre-diction error for the unobserved/not-collected readings.

In Table 3 we show the effect of sliding window selectionwith dGBn (dGBn-SW) on error reduction. We see similartrends as dGBn-RND results. More evidence yields less er-ror in regards to both error computation methods.

Table 4 shows our greedy impact-based selectionmethod’s contribution to our dGBn in error reduction. Wesee a monotonic decrease in both measures, except 30% onMAE averaged on unobserved sensors only, which is sameas 20%.

Mean Absolute Error of dGBn-SW

Budget

0% 10% 20% 30% 40% 50%

All readings 4.19 0.58 0.33 0.23 0.18 0.14


Table 3: Mean absolute error of sliding window selection ondGBn with budgets from 0% to 50%.

Mean Absolute Error of dGBn-IBS

Budget

0% 10% 20% 30% 40% 50%

All readings 4.19 0.31 0.27 0.23 0.18 0.12


Table 4: Mean absolute error of impact-based selection ondGBn with budgets from 0% to 50%.

Next, we compare various modeling and active inferenceresults side by side using bar plots. Figure 1 shows errorrates of GP-RND, dGBn-RND, dGBn-SW, and dGBn-IBS,referring to all readings including observed ones. Likewise,Figure 2 shows error rates with respect to predicted readingsonly. In these plots, Y axis represents the error rate in tem-perature. X axis is reserved for evidence rates. We did notinclude error rates on budget 0% since they are drasticallylarger on this budget, and this difference scales the plot in away that comparison of error rates on other budgets becomeinfeasible.

In Figure 1, surprisingly, dGBn-RND and dGBn-SW per-formed worse than GP-RND on budget of 10%. We can in-terpret that this budget was good enough for neither RNDnor SW on dGBn to reduce error as much as GP-RND cando with the same budget. Note that GP uses local attributesas well. However, on this rate we can see that dGBn-IBS out-performs all other model and method combinations includ-ing GP-RND. On other evidence rates, all active inferencemethods on dGBn outperform GP-RND. We can concludethat on overall prediction performance our dGBn model out-performs GP when at least 20% of sensors are observed. On30% and 40%, dGBn-IBS outperforms dGBn-RND and itcompetes with dGBn-SW. When budget is 50%, dGBn-IBSis the best model and method combination.

Figure 2 presents error rates of each model and methodcombination with respect to predicted readings only. Resultsin this figure show some similarities to those on Figure 1. Amajor difference in Figure 2 is that GP-RND is constant withrespect to budget. This is simply because GP-RND cannotexploit evidence to reduce error on predictions. On 30% and40%, dGBn-IBS is at least as good as dGBn-SW, and bet-ter than dGBn-RND and GP-RND. On remaining rates, it isbetter than all other model and method combinations.

We omitted results for GP with SW approach because,given enough trials, the results are expected to be similar toGP-RND results. The reason is that GP per sensor approach

does not exploit relationships between sensors.

0.00

0.20

0.40

0.60

10% 20% 30% 40% 50%

MEA

Evidence rates

Mean Absolute Error on All Readings

GP-RND dGBn-RND dGBn-SW dGBn-IBS

Figure 1: Mean absolute error of all model and method com-binations computed over all readings on budgets from 0% to50%

0.00

0.20

0.40

0.60

10% 20% 30% 40% 50%

MA

E

Evidence rate

Mean Absolute Error on Predicted Readings Only

GP-RND dGBn-RND dGBn-SW dGBn-IBS

Figure 2: Mean absolute error of all model and method com-binations computed over predicted readings only on budgetsfrom 0% to 50%

Related WorkMany studies addressed predictive models for energy savingon wireless sensor networks (WSN). Elnahrawy and Nath(2004) propose prediction by context awareness. They dis-cretize readings and use naıve Bayes classifier to predictthese discrete values. They use the geographical topologyof the sensor network as the basis of node neighborhood intheir model.

In a similar way, Wang and Deshpande (2008) clustersensors and discretize readings. Then they apply compres-sion techniques on descretized data to detect correlations be-tween sensors within each cluster. Using compression-basedcorrelations, they fit joint distributions for variables in eachcluster, and they incorporate the joint distributions in a prob-abilistic model, which they refer as decomposable model, topredict some sensors instead of reading them.

Deshpande et al. (2004) also apply a model based pre-dictive approach to saving energy. They train a multivariate

Gaussian distribution which models a wireless sensor net-work and let user run queries on their model with arbitraryconfidence intervals. Depending on target query and con-fidence interval, their model decides which sensors to ob-serve. Their approach involves the entire set of sensors asone multivariate distribution and they incorporate only ob-servations of the prediction time. In our case, we make useof current observations and all past observations.

In addition to modeling sensor networks for energy opti-mization, another approach is scheduling sensors for read-ing. Slijepcevic and Potkonjak (2001) addressed this ap-proach to seek an efficient way of clustering sensors so thateach cluster alone covers the entire area of surveillance, andthat clusters alternate for sensing one at a time. They turnthis problem into an area cover problem and maximize thecount of clusters so that they can keep as many sensors silentas possible at each turn.

Gaussian Bayesian networks are so far used in variouscontexts. Castillo et al. (1997) designed a model for damageassessment in concrete structures of buildings using Gaus-sian Bayesian network modeling. They used their model innumerical propagation of uncertainty in incremental formand they also proved that conditional means and variancesof nodes in their network are rational functions given evi-dence. Again Castillo, Menendez, and Sanchez-Cambronero(2008) used Gaussian Bayesian networks to model trafficflow.

Active inference was previously applied by Bilgic andGetoor (2009; 2010) for general graphs. They used activeinference to query the labels of a small number of care-fully chosen nodes in a network to condition the underlyingmodel on these collected labels to increase prediction perfor-mance on the remaining nodes. Active inference was used byChen et al. (2011) to analyze short chunks of video on whichthe underlying model can condition and make better predic-tions on remaining segments. Bilgic and Getoor (2010) usedactive inference along with an iterative classification algo-rithm. Chen et al. (2011) used it on hidden Markov models.Active inference was also discussed in the context of hid-den Markov models by Krause and Guestrin (2009). FinallyKrause and Guestrin, (2005b; 2005a; 2009), and Bilgic andGetoor (2011) formulated active inference as optimal valueof information on graphical models.

Limitations and Future DirectionsIn this work, the dGBn model of the sensor network utilizedonly the correlations between the sensor readings, i.e., theY st variables, ignoring the local attributes, Xs

t . Therefore,when no observation was made, the full network defaultedto the average prediction. In the future, we plan on incor-porating local attribute values into dBGn so that predictionaccuracy can be maximized, especially when the observednode ratio is small.

We utilized several active inference approaches in thispaper, including sliding-window and impact-based ap-proaches. There are other approaches that can be utilized asbaselines. One such baseline is the variance approach wheresensor readings with the highest prediction variance is cho-sen for observation. The motivation is that the higher the

prediction variance is, the higher the chance of incorrect pre-diction is.

Finally, nodes in a sensor network often sense more thanone measure. For example, in the Intel Research Lab data(Deshpande et al. 2004) that we used, the nodes sensetemperature, humidity, light, and voltage. We focused ononly the temperature readings in this paper. Training oneBayesian network model over multiple types of sensing,such as temperature and humidity, will enable the model toexploit correlations between different types of readings. Forexample, it is quite conceivable that humidity and tempera-ture readings are correlated and hence it makes perfect senseto exploit these relationships.

ConclusionsWe tackled the problem of simultaneously minimizing thebattery consumption in a wireless sensor network by limit-ing how frequently the sensors can communicate their read-ings to a central server and minimizing prediction error overthe not-communicated ones. We presented two predictivemodels, a Gaussian process model, and a dynamic Gaus-sian Bayesian network model, and several active inferenceapproaches that selectively gather sensing information fromnodes of a sensor network. We showed that by utilizing thedynamic Gaussian Bayesian network model that exploits thespatio-temporal correlations between sensor readings andperforming active inference formulated through the edgeweights on the Bayesian network model, we were able toreduce the prediction error drastically.

Acknowledgements This material is based upon worksupported by the National Science Foundation under grantno. 1125412.

ReferencesAgeev, A. A., and Sviridenko, M. I. 1999. Approximationalgorithms for maximum coverage and max cut with givensizes of parts. In Integer Programming and CombinatorialOptimization. Springer. 17–30.Akyildiz, I. F.; Su, W.; Sankarasubramaniam, Y.; andCayirci, E. 2002. Wireless sensor networks: a survey. Com-puter networks 38(4):393–422.Bilgic, M., and Getoor, L. 2009. Reflect and correct: A mis-classification prediction approach to active inference. ACMTransactions on Knowledge Discovery from Data 3(4):1–32.Bilgic, M., and Getoor, L. 2010. Active inference for collec-tive classification. In Twenty-Fourth Conference on ArtificialIntelligence (AAAI NECTAR Track), 1652–1655.Bilgic, M., and Getoor, L. 2011. Value of information lat-tice: Exploiting probabilistic independence for effective fea-ture subset acquisition. Journal of Artificial Intelligence Re-search (JAIR) 41:69–95.Castillo, E.; Gutierrez, J.; Hadi, A.; and Solares, C. 1997.Symbolic propagation and sensitivity analysis in gaussianbayesian networks with application to damage assessment.Artificial Intelligence in Engineering 11(2):173 – 181.

Castillo, E.; Menendez, J. M.; and Sanchez-Cambronero,S. 2008. Predicting traffic flow using bayesian networks.Transportation Research Part B: Methodological 42(5):482– 509.Chen, D.; Bilgic, M.; Getoor, L.; Jacobs, D.; Mihalkova, L.;and Yeh, T. 2011. Active inference for retrieval in cameranetworks. In Workshop on Person Oriented Vision.Cooper, G. F., and Herskovits, E. 1992. A bayesian methodfor the induction of probabilistic networks from data. Ma-chine learning 9(4):309–347.Deshpande, A.; Guestrin, C.; Madden, S. R.; Hellerstein,J. M.; and Hong, W. 2004. Model-driven data acquisitionin sensor networks. In Proceedings of the Thirtieth interna-tional conference on Very large data bases-Volume 30, 588–599. VLDB Endowment.Elnahrawy, E., and Nath, B. 2004. Context-aware sensors.In Wireless Sensor Networks. Springer. 77–93.Gilks, W. R.; Richardson, S.; and Spiegelhalter, D. J. 1996.Markov Chain Monte Carlo in Practice. InterdisciplinaryStatistics. Chapman & Hall/CRC.Hastings, W. K. 1970. Monte carlo sampling methods usingmarkov chains and their applications. Biometrika 57(1):97–109.Koller, D., and Friedman, N. 2009. Probabilistic GraphicalModels: Principles and Techniques. The MIT Press.Krause, A., and Guestrin, C. 2005a. Near-optimal nonmy-opic value of information in graphical models. In AnnualConference on Uncertainty in Artificial Intelligence.Krause, A., and Guestrin, C. 2005b. Optimal nonmyopicvalue of information in graphical models - efficient algo-rithms and theoretical limits. In International Joint Con-ference on Artificial Intelligence, 1339–1345.Krause, A., and Guestrin, C. 2009. Optimal value of infor-mation in graphical models. Journal of Artificial IntelligenceResearch 35:557–591.Rasmussen, C. E. 2004. Gaussian processes in ma-chine learning. In Advanced lectures on machine learning.Springer. 63–71.Settles, B. 2012. Active Learning. Synthesis Lectures on Ar-tificial Intelligence and Machine Learning. Morgan & Clay-pool.Slijepcevic, S., and Potkonjak, M. 2001. Power efficientorganization of wireless sensor networks. In Communica-tions, 2001. ICC 2001. IEEE International Conference on,volume 2, 472–476 vol.2.Wang, L., and Deshpande, A. 2008. Predictive modeling-based data collection in wireless sensor networks. Lec-ture Notes in Computer Science (including subseries LectureNotes in Artificial Intelligence and Lecture Notes in Bioin-formatics) 4913 LNCS:34–51.

Active Inference and Dynamic Gaussian Bayesian Networks ...ml/pdfs/komurlu-aisgsb16.pdf · Active...

Documents

Transcript of Active Inference and Dynamic Gaussian Bayesian Networks ...ml/pdfs/komurlu-aisgsb16.pdf · Active...