

Water Resour Manage (2016) 30:3107–3122
DOI 10.1007/s11269-016-1334-6

A Probabilistic Nonlinear Model for Forecasting Daily Water Level in Reservoir

Monidipa Das¹ · Soumya K. Ghosh¹ · V. M. Chowdary² · A. Saikrishnaveni² · R. K. Sharma³

Received: 26 May 2015 / Accepted: 24 April 2016 / Published online: 14 May 2016
© Springer Science+Business Media Dordrecht 2016

Abstract Accurate prediction and monitoring of water level in reservoirs is an important task for the planning, designing, and construction of river-shore structures, and in taking decisions regarding irrigation management and domestic water supply. In this work, a novel probabilistic nonlinear approach based on a hybrid Bayesian network model with exponential residual correction has been proposed for prediction of reservoir water level on a daily basis. The proposed approach has been implemented for forecasting daily water levels of Mayurakshi reservoir (Jharkhand, India), using a historic data set of 22 years. A comparative study has also been carried out with a linear model (ARIMA) and nonlinear approaches (ANN, standard Bayesian network (BN)) in terms of various performance measures. The predictions of the proposed approach are comparable with the observed values in every aspect of prediction, and the approach can be applied in case of scarce data, particularly when forcing parameters such as precipitation and other meteorological data are not available.

Keywords Nonlinear prediction · Surface water level · Reservoir · Bayesian network · Residual correction

Soumya K. Ghosh [email protected]

¹ Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India

² Regional Remote Sensing Centre - East, National Remote Sensing Centre, Indian Space Research Organization, Kolkata, India

³ Irrigation and Waterways Department, Kolkata, West Bengal, India

1 Introduction

Reservoirs are open-air storage areas where water is collected and kept in quantity so that it may be drawn off for use, especially to meet the uneven distribution of water both in space and time. The natural flow of streams and rivers varies greatly with the change in weather throughout a year and may lead to excess flows (floods) or low flows (droughts) in various regions (Postel and Richter 2012; Poff et al. 2007). In such situations, water reservoirs play a significant role in mitigating these natural disasters by impounding water during periods of higher flows, and permitting gradual release of water during periods of lower flows. Besides, these are extensively used as the source of water for drinking as well as irrigation. Therefore, accurate prediction of reservoir water level is essential for optimizing the various water management issues.

At the same time, prediction of water level in a reservoir is a very challenging task, since it depends not only on the stream flow volume but also on other parameters, like flow velocity, stream flow path, climate factors (rainfall, temperature etc.) and so on (Bates et al. 2008; Piao et al. 2010; Chamoglou et al. 2014). Streams that transport much suspended sediment severely shorten the reservoir life by rapidly reducing its storage capacity (Panagopoulos et al. 2008). Water in a reservoir may also be lost by surface evaporation (Christensen et al. 2004; Bates et al. 2008), by seepage into the surrounding soil or rocks, and by discharge through dam foundations, say, for crop irrigation purposes (McNider et al. 2014). Gross evaporation from water surfaces in the temperate and tropical climates may amount to a few meters a year, whereas in humid regions this loss is offset by direct precipitation, and thus the net surface loss becomes moderate or negligible. Thus, reservoir water level variations are complex outcomes of many of these environmental factors.

Although it is possible to identify sophisticated models taking into consideration the aforementioned parameters, it is preferable to have a model which simulates water level variations based on previously recorded data (Kisi et al. 2012). Therefore, reservoir water level forecasting using the records of past time series becomes an important issue in water resource planning.

For decades, several approaches have been proposed for forecasting reservoir water levels and river flows using traditional statistical methods and Artificial Intelligence (AI) techniques. The linear structure of conventional statistical methods (ARMA and ARIMA models) does not guarantee accuracy of prediction (Kisi et al. 2012). Therefore, owing to the strongly nonlinear, highly uncertain, and time-varying characteristics of the hydro-system, none of these models can be considered as a single superior model.

In the last decade, Artificial Intelligence (AI) techniques, such as Artificial Neural Networks (ANN), Genetic Algorithms (GA), and fuzzy theory have increasingly been applied in hydrological and water resources systems (Chang and Chang 2001; Altunkaynak 2007; Chaves and Chang 2008; Buyukyildiz et al. 2014). Most of these nonlinear prediction models are based on ANN (Chaves and Chang 2008; Ishak et al. 2012; Kisi et al. 2012) with less exploration of uncertainty management issues, and there is also a need for incorporating robustness in those ANN-based approaches (Maier et al. 2010). Ondimu and Murase (2007) applied ANN to forecast reservoir level by considering a feature group comprising rainfall, evaporation rate, river discharges and the water levels. Chang and Chang (2006) constructed an adaptive network-based fuzzy inference system (ANFIS) for forecasting water levels during flood periods. The prediction model showed accurate and reliable forecasting in the next three hours, and it also handled uncertainty in the data. Cimen and Kisi (2009) applied SVM and ANN models in modeling lake level fluctuations. In all these cases, the effectiveness of ANN is highly dependent on the understanding of the behavior between the variables as well as extensive knowledge about appropriate operation of the neural network (Mustafa et al. 2012).

In the present work, we have proposed a probabilistic nonlinear prediction model based on Bayesian network (BN), a powerful tool for representing and reasoning with uncertain knowledge (Russell et al. 2003). BNs are also able to model complex systems with a large number of variables in an efficient manner (Getoor et al. 2004). In our present study, the Bayesian network principle has been hybridized with a residual-correction module to achieve better prediction accuracy. The specific objectives of the present work can be listed as follows:

– Development of a nonlinear prediction model, based on the principles of Bayesian network, for forecasting reservoir water levels;

– Incorporation of a residual-correction module with the Bayesian network model to compensate for the scarcity of influencing factors in the network and improve the prediction accuracy;

– Evaluation of the proposed nonlinear approach in comparison with the traditional statistical models (ARIMA), standard BN, and ANN based time series forecasting techniques.

1.1 Problem Definition

The current problem of reservoir level prediction can formally be defined as follows:

– Given the historical data of daily water level $WL = \{wl_1, wl_2, \cdots, wl_d\}$ in a particular reservoir for the past $t$ years $y_1, y_2, \cdots, y_t$, where $d$ is the total number of days in the $t$ years (approximately, $d = 365 \times t$), the problem is to obtain a daily estimate of the water level in the same reservoir for the next year $y(t+1)$, denoted by the series $PWL = \{pwl_1, pwl_2, pwl_3, \cdots, pwl_k\}$. Here, each $pwl_i$ is the predicted water-level value for the day $(d + i)$, and $k$ is the total number of days in the prediction year $y(t+1)$.

The proposed BN-based approach with the incorporated residual correction module has been evaluated with a real-life data set to forecast the water level in Mayurakshi reservoir, India (central co-ordinate: 24°6.6′N, 87°18.9′E) for five future years (2008-2012). The experimental results show our proposed approach to be more accurate than the ARIMA models, standard Bayesian network (BN), and ANN by more than 60 %, 33 %, and 14 %, respectively.

2 Data set and Study Area

In this study, Mayurakshi reservoir (catchment area of 1860 sq. km) in Jharkhand state, India, is considered as the case study area (Fig. 1). The climate of the study area is tropical and it experiences three well defined seasons: (i) hot weather from March to June, (ii) rainy season from July to October, and (iii) winter season from November to February. The average annual rainfall in the study area is nearly 1400 mm. The reservoir has a live storage of 559.49 Mm³ at full reservoir level (FRL), i.e. 121.31 m above mean sea level (amsl), and dead storage of 49.86 Mm³ at dead storage level of 106.38 m as per the capacity survey conducted during the year 2001 (CWC 2015). The total culturable command area (CCA) is nearly 2.27 lakh ha. The water spread area of the Mayurakshi reservoir at full reservoir level (FRL) is around 68 sq. km. The daily water level data of this reservoir for a span of 22 years (1st January 1991 to 31st December 2012) has been collected from the office of the Irrigation and Waterways Department, Kolkata, India.


Fig. 1 Study area: Mayurakshi reservoir, India (central co-ordinate: 24°6.6′N, 87°18.9′E)

3 Bayesian Network Approach for Forecasting Reservoir Water Level

3.1 Bayesian Network

A Bayesian network (Aguilera et al. 2011) is a statistical multivariate model for a set of variables $X = \{X_1, \ldots, X_n\}$, which consists of the following two major components:

– Qualitative component: A directed acyclic graph (DAG), termed the causal dependency graph (CDG), where each vertex represents one of the variables in the model, and an edge linking two variables indicates the existence of statistical dependence between them.

– Quantitative component: A conditional distribution $p(X_i \mid pa(X_i))$ for each variable $X_i$, $i = 1, 2, \cdots, n$, given its parents in the graph, denoted as $pa(X_i)$.

In a Bayesian network, any node $X_i$, given its parents, is conditionally independent of its non-descendants, i.e.

$$p\left(X_i \mid pa(X_i), nd(X_i)\right) = p\left(X_i \mid pa(X_i)\right) \tag{1}$$

where $nd(X_i)$ is the set of non-descendants of $X_i$. Therefore, the dependency structure in the directed acyclic graph of a Bayesian network can be represented as the joint probability density function (PDF) of the variables by means of a factorization as a product of conditional/marginal probability distributions as follows:

$$p(x_1, x_2, \cdots, x_i, \cdots, x_n) = \prod_{i=1}^{n} p\left(x_i \mid par(X_i)\right) \tag{2}$$

where $x_i$ is a specific value for variable $X_i$ and $par(X_i)$ denotes the specific values of the variables in $pa(X_i)$.

Once the Bayesian network is learned, the probability distribution of a node given its parents is obtained, and even the other way round, the probability distribution of a parent node given its child nodes can also be determined.
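As a minimal sketch of the factorization in Eq. (2) and of reasoning in both directions, the following Python snippet builds a hypothetical two-node network in which the previous-day level is the parent of the present level. The variable names, the two states, and all probability values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical two-node network: previous-day level -> present level.
# Both variables take the discretized values "low" or "high"; all
# probabilities below are made-up illustration values.
STATES = ("low", "high")

p_prev = {"low": 0.7, "high": 0.3}            # marginal p(prev)
p_present_given_prev = {                       # conditional p(present | prev)
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.2, "high": 0.8},
}


def joint(prev, present):
    """Eq. (2) for this graph: p(prev, present) = p(prev) * p(present | prev)."""
    return p_prev[prev] * p_present_given_prev[prev][present]


def posterior_prev(present):
    """Parent given child, p(prev | present), by normalizing the joint."""
    evidence = sum(joint(prev, present) for prev in STATES)
    return {prev: joint(prev, present) / evidence for prev in STATES}


total = sum(joint(a, b) for a in STATES for b in STATES)
posterior = posterior_prev("high")
print(round(total, 10), posterior)
```

The joint probabilities sum to one, and `posterior_prev` shows the "other way round" direction: the distribution of the parent once the child has been observed.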


3.2 Process Flow

The proposed approach comprises four major steps, namely (A) Data Preprocessing, (B) Variable Selection using Correlation Analysis, (C) Causality Model Selection, and (D) Prediction System. The basic block diagram of the proposed prediction approach is shown in Fig. 2. The system takes the historical data of the past t years: y1, y2, · · · , yt as input, and forecasts the daily water levels for the next year (y(t + 1)).

3.2.1 Data Pre-Processing

The main purpose in this step is the discretization of water-level data to make it suitable for Bayesian network analysis. Initially, we have determined the maximum and minimum values of surface water level attained during the given historical period. Then, the entire range has been divided into a suitable number of sub-intervals to produce the desired discretized values.
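The discretization step can be sketched as follows in Python. The sketch assumes equal-width sub-intervals between the historical minimum and maximum; the paper only says the range is divided into "a suitable number" of sub-intervals, so the equal widths, the function name `discretize`, and the sample values are assumptions for illustration.

```python
def discretize(levels, n_bins=5):
    """Map each water level to an integer bin index in [0, n_bins - 1].

    Assumes equal-width bins spanning [min(levels), max(levels)].
    """
    lowest, highest = min(levels), max(levels)
    width = (highest - lowest) / n_bins
    if width == 0:  # constant series: everything falls in the first bin
        return [0 for _ in levels]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((x - lowest) / width), n_bins - 1) for x in levels]


sample_levels = [106.38, 110.0, 113.9, 117.0, 121.31]  # illustrative values
bins = discretize(sample_levels)
print(bins)
```

Clamping the top value keeps the historical maximum inside the last bin rather than creating an extra one.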

3.2.2 Variable Selection using Correlation Analysis

Fig. 2 Basic building blocks of the proposed probabilistic approach for water level prediction

The objective here is to select the variables that most strongly affect the present water level, and eventually simplify the Bayesian network topology. There can be several factors, like the water level in the previous day, average water level in previous m days, water level in the previous year, average water level in the previous m years, average water level in the previous month etc., which can affect the present level of water in a reservoir. In order to select the more influential factor between two candidates of a similar kind, the pairwise correlation analysis between the present water level and each of these candidate factors is performed separately. Subsequently, the factor that shows the higher correlation with the present water level is chosen in the analysis.

Table 1 Correlation between present water level and various influencing factors

| Candidate factor influencing present water level (on day d) | Correlation with present water level |
|---|---|
| v1: Water level in the previous year on day d | 0.9999 |
| v2: Water level in the previous day (d − 1) | 0.9970 |
| v3: Average water level in the previous month | 0.9236 |
| v4: Average water level in previous m (= 3) days | 0.9931 |
| v5: Average water level in the previous m (= 3) years on day d | 1.0000 |

For example, as shown in Table 1, among the five candidate variables (v1, v2, · · · , v5), v2 and v4 are of the same kind, but since the correlation of v2 with the present water level is higher, v2 will be selected as a more influential factor than v4. Similarly, v5 will be chosen instead of v1, and since v3 has no competitor, it will also be selected as one of the factors determining present water level in the reservoir. This step helps to properly build the causal dependency graph of the Bayesian network by eliminating redundant influencing factors from the network structure.
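The selection rule can be sketched in Python using the correlations reported in Table 1. The grouping labels ("recent days", "same day in previous years", "previous month") are hypothetical names for what the text calls candidates "of the same kind"; the paper does not name the groups.

```python
# Correlations from Table 1; the "kind" labels are assumed group names.
candidates = {
    "v1": ("same day in previous years", 0.9999),
    "v2": ("recent days", 0.9970),
    "v3": ("previous month", 0.9236),
    "v4": ("recent days", 0.9931),
    "v5": ("same day in previous years", 1.0000),
}


def select_variables(candidates):
    """Keep, for every kind, the candidate with the highest correlation."""
    best = {}
    for name, (kind, corr) in candidates.items():
        if kind not in best or corr > best[kind][1]:
            best[kind] = (name, corr)
    return sorted(name for name, _ in best.values())


selected = select_variables(candidates)
print(selected)
```

With the Table 1 values this keeps v2, v3, and v5, matching the choice described in the text.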

3.2.3 Causality Model Selection

In this step, we define the causal dependency graph (CDG) of the Bayesian network model, comprising the variables selected in the previous step. In our present study, three variables (or factors), namely water level of the previous day in the same year, average water level in the previous m (= 3) years on the same day, and average water level in the previous month in the same year have been identified as the variables with the most influence on present water level. The corresponding causal dependency graph is shown in Fig. 3.

In a generic sense, the Bayesian network model in such kinds of prediction is highly likely to be complex, involving several factors influencing the prediction variable. Therefore, a discriminative type (Santafe et al. 2007) of Bayesian network is chosen for the modeling purpose. The absence of other promising variables (influencing factors) and other possible causal dependencies is compensated by integrating a residual correction module (Module2) in the BN learning framework.

Fig. 3 A Causal Dependency Graph of BN, showing the dependency of present water level on the other influencing variables

3.2.4 Prediction System

The basic framework of the proposed prediction system has been depicted in Figs. 2 and 4. The entire prediction system consists of two key modules:

– Module1: Performing Bayesian analysis

– Module2: Performing residual correction

A. Module1 [Bayesian Analysis]

The purpose of this module is to learn the probabilistic relationships between the present water level and other variables, and then infer the water level for the prediction day. Once the network is trained, it is utilized to infer the value of present water level for a particular day under consideration. As shown in Fig. 2, Module1 consists of two major units: (a) Bayesian learning unit, and (b) Bayesian inference generation unit.

(a) Bayesian learning unit:

This unit takes as input the historical data of past years, and the causal dependency graph of the Bayesian network (Fig. 3) as defined during the causality model selection. The output of this unit is a trained Bayesian network with the captured probabilistic relationships for a given prediction year.

Fig. 4 Proposed prediction system based on hybrid Bayesian network model with residual correction

Figure 4 depicts the basic idea behind the proposed Bayesian learning mechanism. The basic framework of Bayesian analysis in this module is adopted from Das and Ghosh (2014). The network (causal dependency graph) is trained separately with the given data for each training year ($y(1), y(2), y(3), \cdots, y(t)$, where $t$ is the total number of available training years) to learn the associated probabilistic relationships among the variables during each year. As shown in the figure, the networks trained for the individual training years are denoted by $BN_{y(1)}, BN_{y(2)}, BN_{y(3)}, \cdots, BN_{y(t)}$ respectively. At the end of training for each year, the marginal and conditional probabilities obtained for each considered variable ($v$) are averaged in the following manner to get the corresponding probabilities for the prediction year $y(t+1)$:

$$P^{val_v}_{Ftab} = P^{val_v}_{Ftab} + \left(w_i \times P^{val_v}_{Ctab}\right) \tag{3}$$

where $P^{val_v}_{Ftab}$ is an element in the final probability table for variable $v$; $P^{val_v}_{Ctab}$ is the element in the probability table (of the current training year) for the same variable $v$; and $w_i$ is the weight assigned to the $i$-th training year, such that $\sum_{i=1}^{t} w_i = 1$. The weight $w_i$ is defined as follows:

$$w_i = \frac{\left(\dfrac{1}{d_i + 1}\right)}{\sum_{j=1}^{t} \left(\dfrac{1}{d_j + 1}\right)}, \tag{4}$$

where $d_i = [y(t+1) - y(i)]$, i.e. the temporal distance of the current training year $y_i$ from the prediction year $y(t+1)$. Equation (4) is based on the temporal autocorrelation property, which considers that the inter-variable relationships in the prediction year are more likely to be the same as those during the nearby training years. The proposed Bayesian analysis mechanism used in Module1 is presented as Algorithm 1.
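The weighting of Eq. (4) and the accumulation of Eq. (3) can be sketched as below. The sketch assumes the weight of a training year is proportional to 1/(d_i + 1), with d_i the number of years between that training year and the prediction year; the function names and the three small probability tables are hypothetical.

```python
def year_weights(training_years, prediction_year):
    """Eq. (4): normalized inverse-distance weights, one per training year."""
    inverse = [1.0 / ((prediction_year - year) + 1) for year in training_years]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_table(tables, weights):
    """Eq. (3): accumulate w_i times each yearly table into the final table."""
    final = [0.0] * len(tables[0])
    for table, weight in zip(tables, weights):
        for k, probability in enumerate(table):
            final[k] += weight * probability
    return final


weights = year_weights([2005, 2006, 2007], 2008)
# One made-up probability table (a single distribution) per training year.
yearly_tables = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]
final_table = weighted_table(yearly_tables, weights)
print(weights, final_table)
```

Because the weights sum to one, the weighted combination of yearly distributions is again a valid distribution, and the most recent training year contributes the most.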

(b) Bayesian inference generation unit:

This unit infers the value of present water level for a particular day (d) using the Bayesian network inference technique. For achieving this, the prediction year y(t + 1) and the relationships among the attributes as learnt in the previous unit are taken as input. It treats all the other variables (water level in the previous day, average water level in the previous m years, and average water level in the previous month) as evidence variables, and the value of present water level is inferred based on this evidence.

Now, suppose that for a particular day ($d$) in the prediction year $y(t+1)$ the value of present water level is to be predicted using the water level in the previous day $(d-1)$ $\left(WLPD^{y(t+1)}_{d-1}\right)$, the average water level in the previous $m$ years $\left(AWLPY^{y(t+1)-1}_{d}\right)$, and the average water level in the previous month $\left(AWLPM^{y(t+1)}\right)$. The values for present water level can be inferred using the following equation:

$$p\left(CWL^{d}_{y(t+1)} \mid WLPD^{y(t+1)}_{d-1}, AWLPY^{y(t+1)-1}_{d}, AWLPM^{y(t+1)}\right) = \frac{p\left(CWL^{d}_{y(t+1)}\right) \cdot p\left(WLPD^{y(t+1)}_{d-1}, AWLPY^{y(t+1)-1}_{d}, AWLPM^{y(t+1)} \mid CWL^{d}_{y(t+1)}\right)}{p\left(WLPD^{y(t+1)}_{d-1}, AWLPY^{y(t+1)-1}_{d}, AWLPM^{y(t+1)}\right)} \tag{5}$$

where $CWL^{d}_{y(t+1)}$ is the value of present water level for the day $d$.

Basically, $p\left(CWL^{d}_{y(t+1)} \mid WLPD^{y(t+1)}_{d-1}, AWLPY^{y(t+1)-1}_{d}, AWLPM^{y(t+1)}\right)$ is directly available from the Bayesian learning, since the network structure has been modeled as of discriminative type.

Now, if $SCWL$ is the set of all possible values for present water level on the day $d$, then the inferred value of present water level (on day $d$) becomes as follows:

$$Ipwl^{d}_{y(t+1)} = \max\left(p\left(CWL^{d}_{y(t+1)}(i) \mid WLPD^{y(t+1)}_{d-1}, AWLPY^{y(t+1)-1}_{d}, AWLPM^{y(t+1)}\right)\right), \quad \forall\, CWL^{d}_{y(t+1)}(i) \in SCWL \tag{6}$$

That means the value of $CWL^{d}_{y(t+1)} \in SCWL$ producing the highest conditional probability is treated as the value of present water level as obtained from the Bayesian inference generation.
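The selection in Eq. (6) amounts to picking the most probable discretized level given the evidence. A minimal Python sketch, assuming a learned conditional probability table stored as a dictionary keyed by the three evidence bins; the table entries and the name `infer_level_bin` are hypothetical.

```python
# Hypothetical learned table: evidence (previous-day bin, m-year-average bin,
# previous-month bin) -> distribution over the five present-level bins.
cpt = {
    (2, 2, 1): [0.05, 0.15, 0.60, 0.15, 0.05],
    (3, 2, 2): [0.02, 0.08, 0.30, 0.50, 0.10],
}


def infer_level_bin(cpt, evidence):
    """Return the present-level bin with the highest conditional probability."""
    distribution = cpt[evidence]
    return max(range(len(distribution)), key=lambda i: distribution[i])


inferred = infer_level_bin(cpt, (3, 2, 2))
print(inferred)
```

The sketch returns a bin index only; how a bin is mapped back to a water-level value is not specified in this section and is left out here.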

B. Module2 [Residual Estimation]

One of the main limitations of BNs is that, as the number of variables increases, the building process of the network and the parameter estimation require more and more data to maintain the accuracy (Ordonez Galan et al. 2009). To overcome this shortcoming, the present work has proposed a hybrid structure of Bayesian network learning, by performing residual correction over the inferred value.

A residual can be defined as a quantity that measures the deviation of an observed value of an element from the estimated function value. It is an observable estimate of unobserved statistical error and is also termed fitting error. Various methods, based on the Gauss-Seidel algorithm (Atkinson 2008), maximum a posteriori reconstruction (Fu and Qi 2008) etc., exist for performing residual correction. In the present study, a new approach, based on the principles of exponential average, has been proposed for residual correction. The residual value, as produced at the time of inference generation, is modified exponentially during the network learning, and the final value of the residual is utilized to compensate for the dearth of other necessary but unknown variables that might be present in the network topology.

This module has been integrated with the Bayesian network model for residual correction in an exponential manner. Each time the network is trained, the current residual value $\varepsilon_i$ is modified in the following manner:

$$\varepsilon_i = (\alpha E_{i-1}) + (1 - \alpha)\varepsilon_{i-1} \tag{7}$$

where $\alpha \in [0, 1]$ and $E_{i-1}$ is the error in prediction for the same day ($d$) in the year $y(i-1)$, calculated as follows:

$$E_{i-1} = \mathit{ActualValue} - pwl^{d}_{y(i-1)} \tag{8}$$

where $pwl^{d}_{y(i-1)}$ is the predicted value of water level for the day $d$ in the year $y(i-1)$. At the end of training with the data of $t$ past years, we get the final value of residual $\varepsilon_t$, and the predicted value of water level for the day $d$ in the year $y(t+1)$ becomes:

$$\begin{aligned}
pwl^{d}_{y(t+1)} &= Ipwl^{d}_{y(t+1)} + \varepsilon_t && \text{(9)} \\
&= Ipwl^{d}_{y(t+1)} + \alpha E_{t-1} + (1 - \alpha)\varepsilon_{t-1} \\
&= Ipwl^{d}_{y(t+1)} + \alpha\left[E_{t-1} + (1 - \alpha)E_{t-2}\right] + (1 - \alpha)^{2}\varepsilon_{t-2} \\
&= Ipwl^{d}_{y(t+1)} + \alpha\left[E_{t-1} + (1 - \alpha)E_{t-2} + \cdots + (1 - \alpha)^{(t-1)}E_{0}\right] + (1 - \alpha)^{t}\varepsilon_{0} && \text{(10)}
\end{aligned}$$

where $Ipwl^{d}_{y(t+1)}$ is the inferred value of the water level for the day $d$ in the year $y(t+1)$ as obtained by the Bayesian analysis.

The overall procedure for residual correction has been presented through Algorithm 2. The residual correction, as performed in this module, helps to compensate for the absence of various other factors in the considered Bayesian network topology (Fig. 3) which might have a significant influence on the present water level.
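The recursive update of Eq. (7) and its unrolled form in Eq. (10) can be checked against each other with a short Python sketch. The error values, the choice α = 0.5, the initial residual of zero, and the inferred level of 115.0 are illustrative assumptions; the paper does not state them in this section.

```python
def final_residual(errors, alpha, eps0=0.0):
    """Apply Eq. (7) once per training year; errors = [E_0, ..., E_{t-1}]."""
    eps = eps0
    for error in errors:
        eps = alpha * error + (1 - alpha) * eps
    return eps


def final_residual_unrolled(errors, alpha, eps0=0.0):
    """Residual part of Eq. (10): the recursion written as one weighted sum."""
    t = len(errors)
    weighted = sum((1 - alpha) ** k * errors[t - 1 - k] for k in range(t))
    return alpha * weighted + (1 - alpha) ** t * eps0


errors = [2.0, 1.0, 3.0]   # made-up same-day prediction errors E_0, E_1, E_2
eps_t = final_residual(errors, alpha=0.5)
inferred_level = 115.0     # made-up inferred value Ipwl for the prediction day
predicted_level = inferred_level + eps_t   # Eq. (9)
print(eps_t, predicted_level)
```

Both functions give the same residual, which illustrates why the correction is called exponential: the error of each earlier year is discounted by a further factor of (1 − α).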

4 Forecasting Reservoir Water Levels

In this work, we have proposed a nonlinear prediction model, based on a hybrid Bayesian network with an incorporated residual correction method, for forecasting surface water level in a reservoir. The theoretical foundations of the proposed approach have been described in the previous sections. Here, we present the empirical validation of the proposed approach, where a historical data set of 22 years has been used to predict the water level in Mayurakshi reservoir on a daily basis. The overall results are found to be encouraging.

4.1 Experimental Setup

The proposed BN-based approach has been implemented using MATLAB 7.12.0 (R2011a) in Windows 7 (32-bit Operating System, 2.40 GHz CPU, 2.00 GB RAM) and R-tool version 3.1.1 (32 bit). MATLAB has been utilized to implement the BN-based approaches, where the water level values have been discretized into five ranges. The average water level in the previous m years has been estimated considering the window size m = 3. The proposed approach has been evaluated in comparison with other popular linear as well as nonlinear prediction approaches like moving average, exponential model (Holt-Winters approach), automated ARIMA, standard BN, and ANN. MATLAB has been utilized to perform the time series forecast of water level using the feed-forward back propagation model of artificial neural network (ANN), and to implement the standard BN technique. On the other hand, the R-tool has been used for forecasting water level using different models of ARIMA, like Moving average, Holt-Winters method, Automated ARIMA etc. The same input combinations have been used for the proposed approach and all the other methods for carrying out the comparative study. The various combinations of training years and the corresponding prediction year, used in the experimentation, are as follows: ([1991-2007], 2008), ([1991-2008], 2009), ([1991-2009], 2010), ([1991-2010], 2011), and ([1991-2011], 2012), where in each tuple, the first element is the set of training years and the second element is the prediction year. However, our proposed approach is flexible enough to adjust to any other combination of training and testing years as well.
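The five training and prediction combinations follow an expanding-window pattern, which can be generated as in the Python sketch below. The function name `expanding_splits` is hypothetical.

```python
def expanding_splits(first_year, first_prediction_year, last_prediction_year):
    """Return (training_years, prediction_year) tuples with a growing window."""
    return [
        (list(range(first_year, prediction_year)), prediction_year)
        for prediction_year in range(first_prediction_year, last_prediction_year + 1)
    ]


splits = expanding_splits(1991, 2008, 2012)
for training_years, prediction_year in splits:
    print(f"[{training_years[0]}-{training_years[-1]}] -> {prediction_year}")
```

Each prediction year is trained on every year from 1991 up to the year before it, so the training window grows from 17 to 21 years.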

4.2 Results and Discussions

In this study, a nonlinear prediction model based on the hybrid Bayesian network with an incorporated residual correction method has been proposed for forecasting water levels in a reservoir. The model has been evaluated by comparing it with existing linear and nonlinear prediction models. Five statistical evaluation criteria (root mean square error (RMSE), mean absolute error (MAE), normalized root mean square deviation (NRMSD), coefficient of determination or R-squared ($R^2$) (Everitt 2002), and Pearson's correlation coefficient (CC)) have been used to assess the prediction performance of our proposed method. The details of these metrics can be found in 'Annexure A' of the supplementary document. The model performance indicators for the prediction period 2008-2012 are presented in Table 2.
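The five criteria can be sketched in Python as follows. Two choices here are assumptions, since the exact definitions are given only in the paper's Annexure A: NRMSD is taken as RMSE divided by the range of the observed values, expressed in percent, and R² is taken as the square of CC, which is consistent with the R² and CC rows of Table 2 (for example 0.97² ≈ 0.94). The sample series are made up.

```python
import math


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def nrmsd_percent(observed, predicted):
    # Assumption: normalize by the observed range and report in percent.
    return 100.0 * rmse(observed, predicted) / (max(observed) - min(observed))


def pearson_cc(observed, predicted):
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def r_squared(observed, predicted):
    # Assumption: R-squared as the squared Pearson correlation.
    return pearson_cc(observed, predicted) ** 2


observed = [110.0, 112.0, 114.0, 116.0]    # made-up observed levels
predicted = [111.0, 112.0, 113.0, 117.0]   # made-up predictions
scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "NRMSD (%)": nrmsd_percent(observed, predicted),
    "R2": r_squared(observed, predicted),
    "CC": pearson_cc(observed, predicted),
}
print(scores)
```

Lower RMSE, MAE, and NRMSD indicate a closer fit, while R² and CC closer to one indicate that predictions follow the observed variation.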

By analyzing the different outcomes, as shown in Table 2 and in Fig. 5, the following inferences can be drawn about the proposed probabilistic approach, i.e. the hybrid BN with residual correction:

i) From Table 2, it is evident that the proposed BN-based approach (using residual correction) has resulted in the least average RMSE and MAE, of 3.06 and 2.5 respectively, in comparison with the standard BN, statistical (ARIMA) and ANN models. The residual-correction mechanism has compensated for the unknown variables in the BN causal dependency graph and contributed to improved prediction accuracy.


Table 2 Comparative study of proposed approach (BN with residual correction) with existing prediction techniques (in terms of RMSE, MAE, NRMSD (%), R2 and CC)

| Error statistic | Prediction technique | 2008 | 2009 | 2010 | 2011 | 2012 |
|---|---|---|---|---|---|---|
| RMSE | Moving Average [ARIMA(0,0,1)] | 16.77 | 23.54 | 25.22 | 21.59 | 22.78 |
| RMSE | Exponential Model [Holt-Winters Approach] | 13.60 | 18.47 | 19.41 | 16.35 | 16.93 |
| RMSE | Automated ARIMA | 16.76 | 23.54 | 25.22 | 21.59 | 22.78 |
| RMSE | ANN (feed-forward back propagation) | 11.77 | 12.74 | 03.31 | 08.63 | 03.88 |
| RMSE | Standard BN (without residual correction) | 15.42 | 15.02 | 08.07 | 11.00 | 10.78 |
| RMSE | Proposed Approach (BN with residual correction) | 03.15 | 03.42 | 03.06 | 02.45 | 03.21 |
| MAE | Moving Average [ARIMA(0,0,1)] | 11.81 | 20.98 | 24.89 | 19.78 | 22.60 |
| MAE | Exponential Model [Holt-Winters Approach] | 09.57 | 15.32 | 18.98 | 14.60 | 16.69 |
| MAE | Automated ARIMA | 11.81 | 20.98 | 24.89 | 19.78 | 22.60 |
| MAE | ANN (feed-forward back propagation) | 09.43 | 09.77 | 02.42 | 06.19 | 03.24 |
| MAE | Standard BN (without residual correction) | 10.38 | 11.28 | 06.62 | 09.42 | 10.08 |
| MAE | Proposed Approach (BN with residual correction) | 02.67 | 02.56 | 02.66 | 01.97 | 02.72 |
| NRMSD | Moving Average [ARIMA(0,0,1)] | 41.50 | 58.20 | 153.81 | 74.46 | 145.12 |
| NRMSD | Exponential Model [Holt-Winters Approach] | 33.66 | 45.66 | 118.37 | 56.37 | 107.86 |
| NRMSD | Automated ARIMA | 41.50 | 58.20 | 153.81 | 74.46 | 145.12 |
| NRMSD | ANN (feed-forward back propagation) | 29.14 | 31.49 | 20.17 | 29.77 | 24.70 |
| NRMSD | Standard BN (without residual correction) | 38.17 | 37.14 | 49.24 | 37.93 | 68.66 |
| NRMSD | Proposed Approach (BN with residual correction) | 07.80 | 08.46 | 18.67 | 08.47 | 20.45 |
| R2 | Moving Average [ARIMA(0,0,1)] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| R2 | Exponential Model [Holt-Winters Approach] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| R2 | Automated ARIMA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| R2 | ANN (feed-forward back propagation) | 0.57 | 0.41 | 0.50 | 0.25 | 0.04 |
| R2 | Standard BN (without residual correction) | 0.02 | 0.00 | 0.03 | 0.16 | 0.18 |
| R2 | Proposed Approach (BN with residual correction) | 0.94 | 0.91 | 0.60 | 0.94 | 0.64 |
| CC | Moving Average [ARIMA(0,0,1)] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| CC | Exponential Model [Holt-Winters Approach] | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| CC | Automated ARIMA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| CC | ANN (feed-forward back propagation) | 0.76 | 0.64 | 0.71 | 0.50 | -0.20 |
| CC | Standard BN (without residual correction) | 0.14 | 0.08 | 0.17 | 0.40 | 0.43 |
| CC | Proposed Approach (BN with residual correction) | 0.97 | 0.96 | 0.77 | 0.97 | 0.80 |

ii) The low NRMSD values obtained for the proposed BN-based approach in every prediction year (∼8-20 %, about 13 % on average; Table 2) indicate its efficacy compared to the other techniques.

iii) From the NRMSD values in Table 2, it may be observed that the average NRMSD of the proposed approach is more than 60 percentage points lower than that of the statistical forecasting models (ARIMA models), and almost 15 percentage points lower than that of the ANN-based prediction technique. Moreover, hybridizing the BN-based approach with the residual correction module lowers the average NRMSD by about 33 percentage points with respect to the standard BN with no residual correction.

iv) In order to estimate the fitness of the forecasting methods, the R² values have been presented in Table 2. The higher the value of R² ∈ [0, 1], the better the model fits for prediction. It may be noted that the proposed hybrid BN-based approach provides higher R² values (0.60-0.94, above 0.9 in three of the five years), whereas those for the ANN, BN and ARIMA models are much lower.

v) Table 2 also shows the correlation coefficients (CCs) of the predicted values with respect to the observed data for the years 2008-2012 for each prediction technique. Here also the proposed probabilistic approach shows high correlation values (0.77-0.97) and outperforms the ANN, standard BN, and ARIMA models.

vi) Time series of the observed daily reservoir levels and the model forecasts for the six principal model configurations for the validation period 2008-2012 are shown in Fig. 5. From the figure, it is clear that the outcome of the proposed nonlinear approach matches the actual water level well in all prediction years, indicating better model efficiency.
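The averages quoted in inferences i) to iii) can be rechecked directly from Table 2. The short sketch below is only an arithmetic check: it assumes that the quoted improvements are differences between five-year average NRMSD values (in percentage points), which is our reading of the text rather than a stated definition. The variable and function names are illustrative.

```python
# Values copied from Table 2 (prediction years 2008-2012).
nrmsd = {
    "ARIMA(0,0,1)": [41.50, 58.20, 153.81, 74.46, 145.12],
    "Holt-Winters": [33.66, 45.66, 118.37, 56.37, 107.86],
    "ANN": [29.14, 31.49, 20.17, 29.77, 24.70],
    "Standard BN": [38.17, 37.14, 49.24, 37.93, 68.66],
    "Proposed": [7.80, 8.46, 18.67, 8.47, 20.45],
}
rmse_proposed = [3.15, 3.42, 3.06, 2.45, 3.21]
mae_proposed = [2.67, 2.56, 2.66, 1.97, 2.72]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Five-year average NRMSD (%) for each technique.
avg_nrmsd = {name: mean(values) for name, values in nrmsd.items()}

# Gap to the proposed approach, in percentage points of average NRMSD
# (assumed interpretation of the "% better" figures in the text).
gap = {
    name: avg_nrmsd[name] - avg_nrmsd["Proposed"]
    for name in nrmsd
    if name != "Proposed"
}

print("Average RMSE (proposed):", round(mean(rmse_proposed), 2))
print("Average MAE (proposed):", round(mean(mae_proposed), 2))
for name, value in gap.items():
    print(f"{name}: {value:.2f} percentage points above the proposed approach")
```

Under this reading, the average NRMSD of the proposed approach is about 12.77 %, and the gaps are roughly 82 (ARIMA(0,0,1)), 60 (Holt-Winters), 14 (ANN) and 33 (standard BN) percentage points, in line with the figures stated above.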

Therefore, in summary it can be stated that although the RMSE, MAE and NRMSD values in case of ANN and BN are similar, the prediction result improves substantially when the BN model is hybridized with residual correction. Since the nodes in a BN are modeled by means of probability distributions, risk and uncertainty are estimated more accurately than in models where only mean values are taken into consideration. Moreover, the residual correction module, incorporated in the proposed BN-based forecasting approach, helps to compensate for the absence of various other factors in the considered Bayesian network topology, which might have a significant influence on the present water level. Hence, the proposed approach, which captures the nonlinear behavior of reservoir level variations based on the historic data, is highly useful for formulating water resources management strategies. Besides, once the reservoir level is forecast, the corresponding reservoir capacity can also be obtained from the elevation-capacity rating curve (please refer to the ‘Annexure B’ in the supplementary document).

Overall, the proposed approach is comparable with the observed values on every aspect of prediction and can be applied in case of scarce data, particularly when the influencing meteorological parameters, such as precipitation, are not available.
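The conversion from a forecast level to a storage capacity can be illustrated by interpolating a tabulated rating curve. The following is a minimal sketch assuming piecewise-linear interpolation between tabulated points; the elevation and capacity values are hypothetical placeholders and are not the Mayurakshi rating curve from the supplementary document, and the function name `capacity_from_level` is introduced here only for illustration.

```python
from bisect import bisect_right

# Hypothetical elevation-capacity pairs (illustrative only, arbitrary units).
ELEVATION = [100.0, 105.0, 110.0, 115.0, 120.0]
CAPACITY = [0.0, 40.0, 110.0, 220.0, 380.0]


def capacity_from_level(level, elevations=ELEVATION, capacities=CAPACITY):
    """Linearly interpolate the storage capacity for a forecast water level."""
    if not elevations[0] <= level <= elevations[-1]:
        raise ValueError("level lies outside the tabulated rating curve")
    if level == elevations[-1]:
        return capacities[-1]
    # Index of the tabulated segment that contains the level.
    i = bisect_right(elevations, level) - 1
    x0, x1 = elevations[i], elevations[i + 1]
    y0, y1 = capacities[i], capacities[i + 1]
    return y0 + (y1 - y0) * (level - x0) / (x1 - x0)


# A forecast level halfway between two tabulated elevations.
print(capacity_from_level(107.5))
```

Levels outside the tabulated range are rejected rather than extrapolated, since a rating curve gives no information beyond its surveyed limits.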
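For readers who wish to reproduce this kind of comparison on their own series, the five measures of Table 2 can be computed as sketched below. The definitions are the common textbook ones and are assumptions here: NRMSD is taken as the RMSE expressed as a percentage of the observed range, CC as the Pearson correlation coefficient, and R² as the square of CC (which is consistent with the R² and CC rows of Table 2, e.g. 0.97² ≈ 0.94). The function name `error_statistics` is illustrative.

```python
import math


def error_statistics(observed, predicted):
    """Return RMSE, MAE, NRMSD (%), CC and R2 for two equal-length series."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("series must be non-empty and of equal length")

    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Assumed normalisation: RMSE as a percentage of the observed range.
    nrmsd = 100.0 * rmse / (max(observed) - min(observed))

    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cc = cov / math.sqrt(var_o * var_p)

    return {"rmse": rmse, "mae": mae, "nrmsd": nrmsd, "cc": cc, "r2": cc ** 2}


# Toy series with a constant offset of 0.5 between forecast and observation.
stats = error_statistics([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5])
print(stats)
```

A constant observed or predicted series makes the range or variance zero, so this sketch would raise a division error in that case; a production implementation would need to handle it explicitly.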

4.3 Novelty of the Work

The novelties in this work are threefold:

1) First of all, unlike the standard Bayesian network (BN), the proposed BN learning encompasses the temporal change in inter-variable relationships with consideration to the autocorrelation property. This helps in proper modeling of the change in inter-relationships among the variables from one year to another year.

2) Secondly, in the proposed learning approach, the Bayesian network has been hybridized with a residual correction module, providing a mechanism for tuning the captured probabilistic relationship at each temporal instant.

3) Moreover, the incorporation of the residual correction module in the proposed model offers a superior ability for learning from only the historic data of water level, especially in case of unavailability of data on influencing parameters such as rainfall, temperature, reservoir inflow-outflow, evaporation from water-body etc.
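The general idea of hybridizing a base forecaster with a residual correction step can be sketched as follows. This is not the formulation used in this work: it is a generic illustration in which an exponentially smoothed estimate of past residuals is added to each base forecast. The base forecasts stand in for the BN output, and the function name and the smoothing weight `alpha` are hypothetical.

```python
def residual_corrected_forecasts(observed, base_forecasts, alpha=0.5):
    """Add an exponentially smoothed residual to each base forecast.

    observed       -- actual levels, revealed one step at a time
    base_forecasts -- uncorrected forecasts for the same time steps
    alpha          -- hypothetical smoothing weight in (0, 1]
    """
    corrected = []
    smoothed_residual = 0.0  # no correction before any residual is known
    for actual, base in zip(observed, base_forecasts):
        # Forecast for this step uses only residuals from earlier steps.
        corrected.append(base + smoothed_residual)
        # Once the actual level is known, update the residual estimate.
        residual = actual - base
        smoothed_residual = alpha * residual + (1 - alpha) * smoothed_residual
    return corrected


# A base model with a constant bias of -2 is corrected progressively.
observed = [10.0, 11.0, 12.0, 13.0]
base = [8.0, 9.0, 10.0, 11.0]
print(residual_corrected_forecasts(observed, base))
# prints [8.0, 10.0, 11.5, 12.75]
```

In this toy case the forecast error shrinks from 2 to 0.25 over four steps, which illustrates how a correction term can absorb a systematic bias caused by variables missing from the base model.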


Fig. 5 Comparative study of predicted water level with actual water level in different prediction years: (a) 2008, (b) 2009, (c) 2010, (d) 2011, (e) 2012


5 Conclusions

This paper presents a probabilistic approach, based on Bayesian network, to predict the reservoir water level with a fair degree of accuracy. A novel architecture of BN learning, with a residual-correction mechanism, has been employed. The incorporated residual-correction mechanism helps to compensate for the non-available variables in the BN causal dependency graph and assists in improving the accuracy of prediction. The proposed approach has been evaluated in comparison with the conventional statistical methods (ARIMA models), and nonlinear approaches (ANN, standard BN), with respect to the long-term prediction of water levels in Mayurakshi reservoir, India. The high prediction accuracy demonstrates the effectiveness of the proposed prediction technique, using hybrid BN with residual correction. Moreover, the proposed approach provides an effective and timely mechanism for forecasting water levels in the reservoir, which can aid in water usage management, disaster monitoring etc.

In future, attempts can be made to find the best-fitting residual correction module for such a prediction approach. Various forcing meteorological parameters, such as rainfall, temperature etc., can also be considered in future studies.

Compliance with Ethical Standards

Conflict of Interests The authors have no conflict of interest.

References

Aguilera P, Fernandez A, Fernandez R, Rumı R, Salmeron A (2011) Bayesian networks in environmentalmodelling. Environ Modell Softw 26(12):1376–1388

Altunkaynak A (2007) Forecasting surface water level fluctuations of lake van by artificial neural networks.Water Resour Manag 21(2):399–408

Atkinson KE (2008) An introduction to numerical analysis. WileyBates B, Kundzewicz ZW, Wu S, Palutikof J et al (2008) Climate change and water, Intergovernmental Panel

on Climate Change (IPCC)Buyukyildiz M, Tezel G, Yilmaz V (2014) Estimation of the change in lake water level by artificial

intelligence methods. Water Resour Manag 28(13):4747–4763Chamoglou M, Papadimitriou T, Kagalou I (2014) Key-descriptors for the functioning of a mediterranean

reservoir: the case of the new lake karla-Greece. Environ Process 1(2):127–135Chang FJ, Chang YT (2006) Adaptive neuro-fuzzy inference system for prediction of water level in reservoir.

Adv Water Resour 29(1):1–10Chang LC, Chang FJ (2001) Intelligent control for modelling of real-time reservoir operation. Hydrol Process

15(9):1621–1634Chaves P, Chang FJ (2008) Intelligent reservoir operation system based on evolving artificial neural

networks. Adv Water Resour 31(6):926–936Christensen NS, Wood AW, Voisin N, Lettenmaier DP, Palmer RN (2004) The effects of climate change on

the hydrology and water resources of the colorado river basin. Clim Chang 62(1-3):337–363Cimen M, Kisi O (2009) Comparison of two different data-driven techniques in modeling lake level

fluctuations in Turkey. J Hydrol 378(3):253–262CWC (2015) Compendium on silting of reservoirs in India. cwc (central water commission) report. 2015.

ws & rs directorate, emo, cwc. new delhi. www.cwc.nic.in/main/downloads/CoSoR2015.pdf, [Online;Accessed 18-Jun-2015]

Das M, Ghosh SK (2014) A probabilistic approach for weather forecast using spatio-temporal inter-relationships among climate variables. In: 9Th IEEE international conference on industrial andinformation systems. IEEE, Gwalior, India, pp 15–17

Everitt BS (2002) The Cambridge Dictionary of Statistics, 2nd edn. Cambridge University Press. ISBN0-521-81099-X

Page 16: This file has been cleaned of potential threats. If you confirm that ...cse.iitkgp.ac.in/~monidipa.das/Papers/bnrc.pdf · basis. The proposed approach has been implemented for forecasting

3122 M. Das et al.

Fu L, Qi J (2008) A residual correction method for iterative reconstruction with inaccurate system model.In: 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro 2008. ISBI 2008.IEEE, pp 1311–1314

Getoor L, Rhee JT, Koller D, Small P (2004) Understanding tuberculosis epidemiology using structuredstatistical models. Artif Intell Med 30(3):233–256

Ishak WHW, Mahamud KRK, Norwawi NM (2012) Modelling reservoir water release decision usingtemporal data mining and neural network. Int J Emerg Technol Adv Eng 2(8):422–428

Kisi O, Shiri J, Nikoofar B (2012) Forecasting daily lake levels using artificial intelligence approaches.Comput Geosci 41:169–180

Maier HR, Jain A, Dandy GC, Sudheer KP (2010) Methods used for the development of neural networks forthe prediction of water resource variables in river systems: current status and future directions. EnvironModell Softw 25(8):891–909

McNider R, Handyside C, Doty K, Ellenburg W, Cruise J, Christy J, Moss D, Sharda V, HoogenboomG (2014) An integrated crop and hydrologic modeling system to estimate hydrologic impacts of cropirrigation demands. Environmental Modelling & Software

Mustafa MR, Isa MH, Rezaur RB (2012) Artificial neural networks modeling in water resources engineering:infrastructure and applications. World Acad Sci Eng Technol 62:341–349

Ondimu S, Murase H (2007) Reservoir level forecasting using neural networks: Lake naivasha. Biosyst Eng96(1):135–138

Ordonez Galan C, Matıas JM, Rivas T, Bastante F (2009) Reforestation planning using bayesian networks.Environ Modell Softw 24(11):1285–1292

Panagopoulos Y, Georgiou E, Grammatikogiannis A, Polizoi E, Mimikou M (2008) Impacts of humaninteraction on the sediment transport processes in the arachtos river basin, western Greece. Eur Water21(22):3–16

Piao S, Ciais P, Huang Y, Shen Z, Peng S, Li J, Zhou L, Liu H, Ma Y, Ding Y et al (2010) The impacts ofclimate change on water resources and agriculture in China. Nature 467(7311):43–51

Poff NL, Olden JD, Merritt DM, Pepin DM (2007) Homogenization of regional river dynamics by dams andglobal biodiversity implications. Proc Nat Acad Sci 104(14):5732–5737

Postel S, Richter B (2012) Rivers for life: managing water for people and nature. Island PressRussell SJ, Norvig P, Canny JF, Malik JM (2003) Artificial intelligence: A modern approach. Prentice Hall

SeriesSantafe G, Lozano JA, Larranaga P (2007) Discriminative vs. generative learning of bayesian network clas-

sifiers. In: Symbolic and Quantitative Approaches to Reasoning with Uncertainty. Springer, pp 453–464