Data Fusion Techniques for Improving Soil Moisture Deficit Using SMOS Satellite and WRF-NOAH Land Surface Model

Prashant K. Srivastava & Dawei Han & Miguel A. Rico-Ramirez & Deleen Al-Shrafany & Tanvir Islam

Received: 20 May 2013 / Accepted: 29 September 2013 / Published online: 15 October 2013
© Springer Science+Business Media Dordrecht 2013

Abstract Microwave remote sensing and mesoscale weather models have high potential to monitor global hydrological processes. The latest dedicated satellite soil moisture mission, SMOS, and the WRF-NOAH Land Surface Model (WRF-NOAH LSM) provide a flow of coarse resolution soil moisture data, which may be useful data sources for hydrological applications. In this study, four data fusion techniques, Linear Weighted Algorithm (LWA), Multiple Linear Regression (MLR), Kalman Filter (KF) and Artificial Neural Network (ANN), are evaluated for Soil Moisture Deficit (SMD) estimation using the SMOS and WRF-NOAH LSM derived soil moisture. The first method (and the simplest) uses a series of simple combinations of the SMOS and WRF-NOAH LSM soil moisture products, while the second uses a predictor equation formed by a dependent variable (Probability Distributed Model based SMD) and independent predictors (SMOS and WRF-NOAH LSM). The third and fourth techniques are based on rigorous calibration and validation and need proper optimisation for the final outputs, backed by strong non-linear statistical analysis. The performances of all the techniques are validated against the Probability Distributed Model based soil moisture deficit as a benchmark, estimated using ground based observed datasets. The high Nash-Sutcliffe Efficiencies observed between the fused datasets and the Probability Distributed Model SMD clearly demonstrate an improved performance over the individual products. However, the overall analysis indicates a higher capability of ANN and KF for data fusion than the LWA or MLR approach. These techniques serve as one of the first demonstrations that there is hydrologically relevant information in the coarse resolution SMOS satellite and WRF-NOAH LSM data, which could be used for hydrological applications.

Water Resour Manage (2013) 27:5069–5087
DOI 10.1007/s11269-013-0452-7

P. K. Srivastava (*) · D. Han · M. A. Rico-Ramirez · D. Al-Shrafany · T. Islam
WEMRC, Department of Civil Engineering, University of Bristol, Bristol, UK
e-mail: [email protected]

P. K. Srivastava
e-mail: [email protected]

T. Islam
National Oceanic and Atmospheric Administration (NOAA), College Park, MD, USA


Keywords SMOS · Soil moisture deficit · WRF-NOAH LSM · Data fusion · ANN · Kalman filter · Linear weighted algorithm · Probability Distributed Model

1 Introduction

The science of understanding and predicting soil moisture variability is extremely challenging, which complicates most flood and drought monitoring strategies (Jackson et al. 1999). This makes it necessary to develop improved estimates of soil moisture and related parameters by coupling land surface-atmosphere models in order to better predict its variability (Al-Shrafany et al. 2013). One of the key factors currently missing from flood/drought forecasts is the level of soil moisture saturation and/or deficit. Current estimates of deficit are produced by analyzing a range of variables such as rainfall, temperature and evaporation using water balance equations (Calder et al. 1983). The underlying hypothesis is that any drought/flood forecast would be much more accurate if the computer models included not just rainfall and river flow but also the level of soil moisture deficit (Alley 1984). Some available models, such as ROSETTA, can depict the degree of soil saturation using soil hydraulic parameters (Gupta et al. 2012). However, hydraulic data at fine spatial resolution are not always available, which complicates the problem. One of the most important indicators of the degree of soil saturation is the Soil Moisture Deficit (SMD), which is found to be inversely related to the soil moisture in the soil layer (Srivastava et al. 2012a). If the soil moisture deficit is high, there is less chance of flood but, conversely, a higher chance of drought if the same situation persists for a prolonged time (Norbiato et al. 2008).

Following recent missions such as the Advanced Microwave Scanning Radiometer for EOS (AMSR-E) and Soil Moisture and Ocean Salinity (SMOS), a flow of coarse spatial resolution soil moisture data is now available over the globe. The SMOS L-band (1.4 GHz) frequency is particularly useful because at this frequency the atmosphere is almost transparent and surface emission is strongly related to soil moisture over continental surfaces (Saleh et al. 2006). The other important data source is the ECMWF global soil moisture datasets, for which the Weather Research and Forecasting (WRF) model can be used in integration with the NOAH Land Surface Model (Hong et al. 2009). The WRF used here is a next-generation mesoscale numerical weather prediction system partly developed by the National Center for Atmospheric Research (NCAR) and currently in operational use at the National Centers for Environmental Prediction (NCEP) (Michalakes et al. 2001). The Land Surface Model plays an important role in building soil temperature and moisture profiles and canopy properties (Hogue et al. 2005). Nevertheless, an initial assessment of these soil moisture datasets for SMD retrieval is much needed at the local or regional scale. Recent studies show that SMOS soil moisture is highly correlated with SMD, indicating its potential for SMD retrieval (Srivastava et al. 2013b). However, the WRF-NOAH LSM has not yet been evaluated for SMD retrieval, which leaves the usability of this dataset in question. This work therefore also presents an initial assessment of the WRF-NOAH LSM in comparison with the SMOS soil moisture, along with the data fusion techniques.

Data fusion is the integration of multiple data and knowledge representing the same real-world object into a consistent, accurate, and useful representation (Stathaki 2008). This paper presents data fusion using the SMOS satellite and mesoscale model based SMD. Here, data fusion refers to combining datasets from two different sources to form a new composite dataset (Srivastava et al. 2013b). Techniques such as the Artificial Neural Network (ANN) have already been used in the remote sensing arena for data fusion (Dornfeld and DeVries 1990). Similarly, a number of studies have reported the use of the linear weighted algorithm (LWA) (Li et al. 2003), Multiple Linear Regression (MLR) (Kutner et al. 2004) and the Kalman filter (KF) (Sun 2004) for data fusion. However, their application to SMOS and mesoscale model based products is still unexplored. In this study, four techniques are presented which give the user a variety of options to detect, recognise and identify problems in data fusion and their implementation. Further, this paper aims to address the following research questions: (1) the choice of dataset between SMOS and WRF-NOAH LSM; (2) the performance of both soil moisture datasets with respect to SMD retrieval; and (3) if both perform acceptably, whether they can be fused for a better retrieval. This paper tries to answer most of the abovementioned research questions in a reasonable way.

2 Materials and Methodology

2.1 Case Study Area and Datasets

The Brue catchment (135.5 km²), located in the south-west of England, is chosen as the study area. The study area is illustrated in Fig. 1 with the Digital Elevation Model and land use. More details can be found in Srivastava et al. (2013e). All meteorological datasets are provided by the British Atmospheric Data Centre (BADC), UK. Despite the increasing computing power of desktop PCs, downscaling ECMWF datasets using WRF at a high spatial and temporal resolution is still time consuming. Hence, in this work only 1 year of data is taken into account, from February 2011 to January 2012. To obtain the benchmark SMD for this period, against which SMOS and WRF-NOAH LSM are compared, a rigorous calibration and validation have been performed with the Probability Distributed Model (PDM), in which the data from the first 24 months (Feb 2009 to Jan 2011) are used for calibration and the remaining 12 months (Feb 2011–Jan 2012) for validation. The validation PDM SMD data is then used as a benchmark for all the comparisons. In this study, the global European Centre for Medium Range Weather Forecasts (ECMWF) ERA interim reanalysis data are used and can be downloaded from http://www.ecmwf.int/. Recently, Srivastava et al. (2013e) reported a considerably higher performance of ECMWF than NCEP; hence ECMWF is selected in this study for simulating soil moisture.

Fig. 1 Study area with digital elevation model, land use and WRF domains (Srivastava et al. 2013c)

2.2 WRF-NOAH LSM

The Weather Research and Forecasting (WRF) system is a next generation model after the MM5 system (Skamarock and Klemp 2008). WRF is a grid based model that uses the finite differencing method to resolve the model dynamics at different pressure levels. Detailed information on the WRF model can be found in Michalakes et al. (2001). In this study, the WRF model is coupled with the NOAH Land Surface Model for retrieval of the volumetric soil moisture content (Noilhan and Planton 1989). Within WRF, the land surface scheme works as an input for atmospheric processes; in other words, it provides the forcing from the ground below to the lower part of the boundary layer, which is subsequently transported to the rest of the atmosphere (Mohan and Bhati 2011). The modified LSM is based on the Penman potential evaporation approach, with the multilayer soil model and the canopy model given by Pan and Mahrt (1987), an explicit canopy resistance formulation and the surface runoff scheme of Schaake et al. (1996). It is integrated with one canopy and snow layer. The soil water movement between the soil layers is governed by the mass conservation law and the Richards' equation. A surface energy budget equation is used to estimate the surface temperature (Chen and Dudhia 2001). A detailed description of the WRF-NOAH LSM can be found in Chen and Dudhia (2001). The WRF configuration used in this study is described in our previous paper (Srivastava et al. 2013e), with the addition of the NOAH LSM.

2.3 SMOS Soil Moisture and Downscaling

The SMOS mission is a joint program of the European Space Agency (ESA), the National Centre for Space Studies (CNES, Centre National d'Etudes Spatiales) and the Industrial Technological Development Centre (CDTI, Centro para el Desarrollo Tecnológico Industrial). The MIRAS instrument on the SMOS satellite provides global information on surface soil moisture with an accuracy of 4 % (Kerr et al. 2001). In this study, Level 2 SMOS soil moisture products are used. The SMOS product grid has a spatial resolution of ~40 km, with soil moisture retrieved in m³ m⁻³ (i.e. volumetric) (Kerr et al. 2012). For the comparison between the catchment and SMOS soil moisture, the SMOS pixel with its centroid over the Brue catchment is extracted and used in the subsequent analysis. The ascending SMOS data products are chosen over the catchment in order to minimise the factors impacting soil moisture retrieval, such as vertical soil-vegetation temperature gradients (Srivastava et al. 2013a). The downscaling schemes proposed for SMOS by Srivastava et al. (2013a) indicate that the downscaled SMOS soil moisture product could be a promising data source for the data fusion analysis. A detailed comparison of non-downscaled SMOS and WRF-NOAH LSM soil moisture with SMD is given in Section 3.2.

2.4 Probability Distributed Model

The Probability Distributed Model (PDM) is used in this study. It is a fairly general conceptual rainfall-runoff model which transforms rainfall and evaporation data to flow at the catchment outlet (Moore 2007; Liu and Han 2010). The PDM has been widely applied throughout the world, both for operational and design purposes (Bell and Moore 1998; Rees and Collins 2006; Cabus 2008). The model formulations are well suited to automatic parameter estimation. The SMD is derived to determine the effect of catchment drying on the actual evapotranspiration (ET). In this study, the PDM is used for SMD estimation through the PDM SMD routine (Moore 2007). More details about the PDM are provided in Moore (2007).

2.5 Data Fusion Techniques

The data fusion techniques used in this study are briefly explained in the following subsections with their theoretical backgrounds. All the techniques are implemented in the R programming language, an open source software environment for statistical computing and graphics (R Development 2010), on a Linux platform. The data fusion techniques comprise the linear weighted algorithm, multiple linear regression, Kalman filtering and the artificial neural network.

2.5.1 Linear Weighted Algorithm

In a linear combination model, the fused estimate is a weighted sum of the input systems, with the weight of each input determined from the results that the inputs return. This method is called the linear weighted algorithm. The weights of the linear combination are determined first and are then multiplied by the values of the input systems. The optimum weight is found by trying a series of values from 0 to 1: the weights of the two variables are varied in opposite directions with an interval of 0.1, so that they always sum to 1. The best-performing weights are then used for data fusion by the weighted sum. The linear weighted algorithm can be expressed as:

$$\mathrm{LWA} = \sum_{i=1}^{n} \left( w_1 x_i + w_2 y_i \right) \tag{1}$$

where $w_1$ and $w_2$ are weighting coefficients which must satisfy $w_1 + w_2 = 1$, and $n$ is the total number of data values. $x_i$ and $y_i$ represent the two variables, i.e. SMOS and WRF-NOAH LSM soil moisture respectively. The two weights are derived from both data sources used for the fusion; the weights giving the best performance are those obtained after optimization. In this study they are obtained by varying the values from 0 to 1. A total of 11 combinations are used, which are discussed in Section 3.3 under the results and discussion.
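The weight search described above can be sketched in a few lines of Python (the study itself used R). This is only an illustration under stated assumptions: the weights are applied sample by sample to produce a fused series, the candidate weights are scored with the Nash-Sutcliffe efficiency, and the input series are hypothetical toy values already on the same scale as the benchmark. The function names are illustrative and do not come from the paper.

```python
def nse(benchmark, estimate):
    """Nash-Sutcliffe efficiency of `estimate` against `benchmark`."""
    mean_b = sum(benchmark) / len(benchmark)
    num = sum((e - b) ** 2 for b, e in zip(benchmark, estimate))
    den = sum((b - mean_b) ** 2 for b in benchmark)
    return 1.0 - num / den


def lwa_fuse(x, y, w1):
    """Weighted combination of two series with w2 = 1 - w1."""
    w2 = 1.0 - w1
    return [w1 * xi + w2 * yi for xi, yi in zip(x, y)]


def best_lwa_weight(x, y, benchmark):
    """Try w1 = 0.0, 0.1, ..., 1.0 (11 combinations) and keep the best NSE."""
    candidates = [k / 10 for k in range(11)]
    scored = [(nse(benchmark, lwa_fuse(x, y, w1)), w1) for w1 in candidates]
    best_score, best_w1 = max(scored)
    return best_w1, best_score


# Hypothetical toy series, not data from the study.
source_a = [0.10, 0.20, 0.30, 0.40]
source_b = [0.30, 0.20, 0.50, 0.40]
benchmark = [0.5 * a + 0.5 * b for a, b in zip(source_a, source_b)]

w1, score = best_lwa_weight(source_a, source_b, benchmark)
```

Because the toy benchmark is built as an equal-weight blend, the search returns `w1 = 0.5`; with real data the selected weight depends on the scoring statistic chosen.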

2.5.2 Multiple Linear Regression Approach (MLR)

Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. One set of variables is considered explanatory ($x_i$), and the other variable is considered dependent ($y_i$) (Johnson and Wichern 2002). They are related by

$$y_i = x_i b + e_i, \tag{2}$$

where $i = 1, \ldots, n$; $y_i$ is the dependent variable, $x_i$ is a vector of $k$ independent predictors, $b$ is a vector of unknown parameters and $e_i$ is a stochastic disturbance. The PDM SMD is chosen as the dependent variable, while the SMOS and WRF-NOAH LSM soil moisture are the independent predictors. Through calibration and validation, the best vector of unknown parameters is determined for the data fusion.
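As an illustration of Eq. 2, the following Python sketch fits the parameter vector by ordinary least squares through the normal equations. It is a minimal sketch rather than the authors' R implementation: the intercept term, the function names and the toy numbers are all assumptions made for the example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(predictors, response):
    """Least squares fit via (X'X) b = X'y; a leading 1.0 adds an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, response)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_mlr(coefs, predictors):
    """Apply fitted coefficients (intercept first) to new predictor rows."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], p))
            for p in predictors]


# Hypothetical (source 1, source 2) soil moisture pairs and a synthetic
# response built from known coefficients; not data from the study.
predictors = [(0.1, 0.3), (0.2, 0.2), (0.3, 0.5), (0.4, 0.4), (0.5, 0.1)]
response = [0.08 - 0.1 * s - 0.05 * w for s, w in predictors]

coefs = fit_mlr(predictors, response)
fitted = predict_mlr(coefs, predictors)
```

In a calibration and validation setting, `fit_mlr` would be run on the calibration period only and `predict_mlr` applied to the validation period.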

2.5.3 Kalman Approach

The Kalman algorithm can be used for filtering, smoothing and simulation of linear state space models with exact diffuse initialization (Koopman 2012), here using the KFAS package on the R platform on Linux. The linear Gaussian state space model is given by

$$y_t = Z_t \alpha_t + \varepsilon_t \quad \text{(observation equation)} \tag{3}$$

$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t \quad \text{(transition equation)} \tag{4}$$

where $\varepsilon_t \sim N(0, H_t)$, $\eta_t \sim N(0, Q_t)$ and $\alpha_1 \sim N(a_1, P_1)$ are independent of each other. The term $y_t$ is the observed response, supplied as a time series object (or data coerced to one), in this case the SMOS and WRF-NOAH LSM soil moisture; $\alpha_t$ is the state vector, whose dynamics follow Eq. 4; $\varepsilon_t$ is the observation noise and $\eta_t$ is the stochastic input. All the system and covariance matrices ($Z_t$, $H_t$, $T_t$, $R_t$, $Q_t$) can be time-varying, and partially or totally missing observations $y_t$ are allowed (Koopman and Durbin 2003; Shumway et al. 2000). The KFAS functions require diagonal covariance matrices $Q_t$ and $H_t$, which are obtained here using the LDL decomposition (a closely related variant of the classical Cholesky decomposition). For the unknown elements of the initial state vector $\alpha_1$, KFAS uses exact diffuse initialisation (Koopman and Durbin 2000, 2003), in which the unknown initial states are given a zero mean and infinite variance. In the case of non-Gaussian observations, the observation equation has the form $p(y_t \mid \theta_t) = p(y_t \mid Z_t \alpha_t)$, where $p(y_t \mid \theta_t)$ is either a Poisson or a binomial distribution and $\theta$ denotes the parameter vector. In this study a non-Gaussian model with a binomial distribution is used. The initial values for the conditional mode $\theta$ are defined by $\log(\mathrm{mean}(y_t/(u_t - y_t)))$ for the binomial distribution, where $u$ is a vector specifying the number of trials at times $1, \ldots, n$ and $\theta_t = \log[\pi_t/(1 - \pi_t)]$, in which $\pi_t$ is the probability of success at time $t$. Here the value of $u_t$ is taken as 1, as suggested for non-Gaussian models by Koopman (2012). All functions of the KFAS algorithm use the univariate approach adopted from Shumway et al. (2000); as there are no matrix inversions in the univariate approach, it speeds up the process compared with traditional Kalman approaches (Koopman and Durbin 2003). More details of this approach are provided in Koopman (2012) and in the KFAS manual (Helske 2010).
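To make the observation and transition equations more concrete, the sketch below implements a much simpler case than the one used in the study: a scalar Gaussian local-level model ($Z_t = T_t = R_t = 1$) in which the two sources are treated as noisy observations of one state and are introduced one element at a time. It is written in Python rather than with KFAS, it does not implement the binomial model or exact diffuse initialisation (a large initial variance `p1` stands in for the diffuse prior), and the variances and data are hypothetical.

```python
def fuse_sequential(series, obs_vars, state_var, a1=0.0, p1=1e7):
    """Scalar Kalman filter with sequential (univariate) updates.

    series    : list of tuples, one observation per source at each time step;
                None marks a missing observation.
    obs_vars  : observation noise variance H for each source.
    state_var : state noise variance Q of the random-walk state.
    a1, p1    : initial state mean and (large) initial variance.
    Returns the filtered state mean at each time step.
    """
    a, p = a1, p1
    filtered = []
    for obs in series:
        for y, h in zip(obs, obs_vars):
            if y is None:          # missing observation: no update
                continue
            k = p / (p + h)        # Kalman gain, scalar so no matrix inversion
            a = a + k * (y - a)    # update state mean with the innovation
            p = p * (1.0 - k)      # update state variance
        filtered.append(a)
        p = p + state_var          # transition step: variance grows by Q
    return filtered


# Hypothetical toy observations from two sources; not data from the study.
toy_series = [(0.20, 0.40), (0.25, None), (0.30, 0.35)]
fused = fuse_sequential(toy_series, obs_vars=(1.0, 1.0), state_var=0.01)
```

With equal observation variances and a near-diffuse start, the first fused value is close to the average of the two sources; giving one source a smaller variance pulls the fused value towards it.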

2.5.4 Artificial Neural Network Technique

Artificial neural networks (ANNs) have proven to be a more powerful and self-adaptive method for pattern recognition and classification problems than traditional linear or simple nonlinear analyses (Islam et al. 2012; Srivastava et al. 2012b). The ANN used in this study follows the structure 2-2-1, i.e. 2 inputs, 2 hidden layers and 1 output. The network is trained with the Levenberg-Marquardt algorithm. The activation function of the hidden layer is sigmoid and that of the output layer is purelin (linear). The structure of the ANN employed is shown in Fig. 2. For the output layer a linear function is used; the relevant calculations for a hidden node are as follows (Anderson and Davis 1995):

$$O_a = h_{\mathrm{hidden}}\left( \sum_{p=1}^{P} i_{a,p} w_{a,p} + b_a \right) \tag{5}$$

where

$$h_{\mathrm{hidden}}(x) = \frac{1}{1 + e^{-x}} \tag{6}$$

where $O_a$ is the output of the current hidden layer node $a$; $P$ is the number of nodes in the previous hidden layer; $i_{a,p}$ is an input to node $a$ from either the previous hidden layer node $p$ or network input $p$; $w_{a,p}$ is the weight modifying the connection from either node $p$ to node $a$ or from input $p$ to node $a$; and $b_a$ is the bias. In the above equation, $h_{\mathrm{hidden}}(x)$ is the sigmoid activation function of the node. In an ANN, an appropriate normalisation of the training data is essential to avoid saturating the activation function, which is done here using (Zhang et al. 1998; Ishak et al. 2013):

$$z_{\mathrm{norm}} = \frac{z_o - \bar{z}}{z_{\max} - z_{\min}} \tag{7}$$

where $z_{\mathrm{norm}}$ is the normalised value, $z_o$ the original value, $\bar{z}$ the mean, $z_{\max}$ the maximum value and $z_{\min}$ the minimum value. The best values for the decay rate, hidden layers and iterations are obtained by optimizing these parameters using the method discussed in Srivastava et al. (2013a).
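The node calculation, the sigmoid activation and the normalisation above can be sketched as a forward pass in Python. Several things here are assumptions made for illustration only: the 2-2-1 structure is read as two inputs, one hidden layer with two sigmoid nodes and one linear output; the weights are arbitrary placeholders rather than trained values; and no training (Levenberg-Marquardt or otherwise) is implemented.

```python
import math


def normalise(values):
    """Normalisation (z - mean) / (max - min) applied to a list of values."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    return [(z - mean) / span for z in values]


def sigmoid(x):
    """Sigmoid activation 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: sigmoid hidden nodes followed by a linear output node."""
    hidden = [
        sigmoid(sum(w * i for w, i in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Placeholder weights for a 2-2-1 network; purely illustrative values.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [1.0, 1.0]
output_bias = 0.0

scaled = normalise([1.0, 2.0, 3.0])
output = forward([0.0, 0.0], hidden_weights, hidden_biases,
                 output_weights, output_bias)
```

With zero inputs and zero biases each hidden node outputs 0.5, so the linear output node returns the sum of its weights times 0.5.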

Fig. 2 Architecture of the ANN

2.6 Performance Indices

Three performance statistics, the Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970), the Root Mean Square Error (RMSE) and the percent bias (%Bias), are used as performance measures. The correlation, RMSE and standard deviation (SD) are also presented in the form of a Taylor diagram (Taylor 2001), implemented in the R programming language. It provides a way of graphically summarizing how closely a pattern matches observations. The Nash-Sutcliffe Efficiency (NSE) is estimated using:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{8}$$

where $x_i$ is the benchmark PDM SMD, $\bar{x}$ is its mean and $y_i$ is the estimated SMD.

The Root Mean Square Error (RMSE) is calculated using:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2} \tag{9}$$

The %Bias measures the positive or negative deviation of the estimated value from the benchmark value. It can be calculated using:

$$\%\mathrm{Bias} = 100 \times \left[ \frac{\sum (y_i - x_i)}{\sum x_i} \right] \tag{10}$$

where $\bar{x}$ is the mean of the benchmark PDM SMD and $\bar{y}$ is the mean of the estimated SMD; Eq. 10 is equivalent to $100 \times (\bar{y} - \bar{x})/\bar{x}$.
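The three indices can be written directly from their definitions. The Python sketch below (the study used R) takes `x` as the benchmark series and `y` as the estimated series; the function names and the toy numbers are illustrative only.

```python
import math


def nse(x, y):
    """Nash-Sutcliffe efficiency of estimate y against benchmark x."""
    mean_x = sum(x) / len(x)
    num = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    den = sum((xi - mean_x) ** 2 for xi in x)
    return 1.0 - num / den


def rmse(x, y):
    """Root mean square error between estimate y and benchmark x."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))


def pbias(x, y):
    """Percent bias: positive when y overestimates the benchmark on average."""
    return 100.0 * sum(yi - xi for xi, yi in zip(x, y)) / sum(x)


# Hypothetical toy values, not data from the study.
bench = [1.0, 2.0, 3.0, 4.0]
estim = [1.0, 2.0, 3.0, 5.0]

scores = (nse(bench, estim), rmse(bench, estim), pbias(bench, estim))
```

An NSE of 1 indicates a perfect match, while values at or below 0 mean the estimate is no better than the benchmark mean.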

3 Results and Discussion

3.1 Evaluation of Hydro-Meteorological Datasets

For the calibration of the PDM, 2 years of ground based observation data (February 1, 2009 to January 31, 2011) were used, while for validation 1 year of data (February 1, 2011 to January 31, 2012) was taken into account. The calibration procedure focuses in particular on the estimation of 13 PDM parameters to achieve a good agreement between simulated and observed discharge. The PDM parameters for the Brue catchment are discussed briefly in Srivastava et al. (2013d). The agreement of the model is judged using the Nash–Sutcliffe efficiency (NSE) as the objective function. The model was validated by simulating a 1 year continuous time series and comparing it with the measured flow time series over the validation period. The overall analysis indicates an NSE value of 0.84 during calibration and 0.81 for validation. The uncertainty analysis of the PDM simulations indicates a quite satisfactory performance of the model, with a p-factor (observations bracketed by the 95 % prediction uncertainty (95UB)) of 84 % and 71 % during calibration and validation respectively (Srivastava et al. 2013c). The soil moisture deficit is used to distribute the precipitation between the types of runoff, i.e. the slow and fast components. If the soil moisture deficit is low, a sizeable portion of the runoff will occur as fast runoff and the remainder as slow base flow (groundwater) runoff (Moore 2007). The observed and simulated flow time series during the calibration and validation periods are shown in Fig. 3.

The hydro-meteorological parameters used in this study comprise ETo (estimated from air temperature, dew point, wind speed and net radiation using the Penman-Monteith equation) and rainfall. To analyze the relationships (linear and nonlinear) between the input data used in this study, the Spearman (rs) and Pearson (r) correlation statistics are computed and shown in Fig. 4. The analysis suggests that there is a strong nonlinearity between rainfall and the soil moisture and ETo datasets, as the rs correlations are higher than the corresponding r values, while the remaining variables show strong linear correlations with each other, with comparable r and rs coefficients. As expected, rainfall shows a negative correlation with the SMD and a positive correlation with the soil moisture datasets. By contrast, ETo shows the opposite relations to those obtained with rainfall. ETo exhibits the highest degree of correlation with the WRF-NOAH LSM soil moisture, followed by rainfall, whereas SMOS soil moisture shows a lower correlation with the two variables (ETo and rainfall). A negative correlation is observed between ETo and rainfall, with a higher value of the rs coefficient indicating nonlinearity between these two datasets. The lower performance of SMOS soil moisture can be attributed to the mismatch in spatial resolution between the catchment and the SMOS soil moisture pixel. The SMOS soil moisture has a spatial resolution of ~40 km, while the NOAH LSM soil moisture is a downscaled product with a ~9 km spatial resolution, which is much closer to the scale of the catchment under study. This result suggests that the relationships with the abovementioned variables are more reliable when the products have spatial resolutions closer to that of the catchment. Hence, the downscaling scheme proposed by Srivastava et al. (2013a) is used in this study for all the subsequent analysis.
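The reasoning that a Spearman coefficient larger than the Pearson coefficient points to a monotonic but nonlinear relationship can be illustrated with a small Python sketch. The data below are synthetic (an exponential curve), not values from the study, and the helper names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson linear correlation coefficient r."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation rs: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Synthetic monotonic but nonlinear relationship.
u = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v = [math.exp(value) for value in u]

r = pearson(u, v)
rs = spearman(u, v)
```

For this curve the rank correlation is perfect while the linear correlation is visibly lower, which is the pattern the paragraph above interprets as nonlinearity.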

3.2 Soil Moisture Products

The time series depicting the soil moisture variations for the full hydrologic year are shown in Fig. 5. The comparisons between the ETo, SMOS and WRF-NOAH LSM soil moisture time series exhibit a high variability with the seasons. Considerably low soil moisture values are recorded during the summer season, while soil moisture normally peaks from December to February. In comparison, ETo shows that the increasing temperatures and high evaporation through April–May and August–September lead to a progressive drying of the soil. The rainfall data indicate that winter is the wettest season of the whole monitoring period. The ETo rates decrease in the winter season, and the accompanying rainfall wets up the soil profile, leading to a rise in the soil moisture curve. It is also observed that during very wet conditions, soil moisture and rainfall on average start to rise at approximately the same time. The increase in ETo after mid April or May can be linked to the reduction in soil moisture. Hence this may be a possible reason for the low soil moisture recorded from April to the beginning of August (usually the driest and warmest period of the year). The analysis of r and rs indicates comparable values between SMD and SMOS soil moisture, as discussed in Section 3.1; therefore only a linear curve fit is used here for deriving the empirical relationship between PDM SMD and the SMOS and WRF-NOAH LSM soil moisture. To estimate the performance of the soil moisture products, the goodness of fit statistics, i.e. NSE and RMSE, are calculated against the PDM SMD for all three datasets (SMOS non-downscaled, WRF-NOAH LSM and SMOS downscaled) and shown in Table 1. The scatter plots for the validation period are shown in Fig. 6. The results show that the WRF-NOAH LSM performs better than the non-downscaled SMOS product. However, after downscaling an improved performance is obtained using the SMOS soil moisture (Srivastava et al. 2013a). Owing to the higher performance of the downscaled product, only the downscaled SMOS soil moisture derived SMD is henceforth used in all the data fusion analysis.

Fig. 3 Flow time series during the calibration and validation with simulated hydrographs from the PDM for the 2 year calibration and 1 year validation

Fig. 4 Correlation matrix plots between PDM SMD, WRF-NOAH LSM soil moisture from ECMWF, SMOS soil moisture, ETo and rainfall (r = Pearson; rs = Spearman correlation coefficient)

3.3 Data Fusion Performances

From all the above analysis a clear physical connection between the PDM SMD and derivedSMD can be depicted and this can be used to build a pragmatic relationship between the twoparameters. Data fusion approach could be a possible option to combine both datasets for animproved retrieval of SMD. All the techniques need to be optimised for an optimum andstable results and thus require a comprehensive analysis of parameters before data fusion. Inthe case of LWA, a consideration of series changing weights (0–>1) is used for the model

Fig. 5 Time series depicting relationship between WRF-NOAH LSM using ECMWF and SMOS soilmoisture with ETo

Table 1 Performances statistics of SMD derived using WRF-NOAH LSM, SMOS Non-downscaled andSMOS downscaled soil moisture

Datasets Calibration Validation

NSE RMSE NSE RMSE

WRF-NOAH LSM 0.631 0.014 0.70 0.013

SMOS Non-downscaled 0.391 0.018 0.418 0.017

SMOS downscaled (Srivastava et al. 2013c) 0.797 0.012 0.751 0.011


Fig. 6 Scatter plots of SMD during the validation (a ECMWF; b SMOS Non-downscaled; c SMOS downscaled)


selection. This type of trial-and-error approach is commonly employed to select the best parameters for most linear weighted data fusion techniques. The performance with different weights for SMOS soil moisture and WRF-NOAH LSM is shown in the Taylor diagram (Fig. 7). The Taylor diagram is used here to show the performance of the LWA data fusion for each selected weight in comparison with the PDM SMD. The circle mark on the x-axis, called the reference point, represents a perfect fit between the algorithm results and the data. The position of each label, representing the result of a different run, is determined by the correlation R and the standard deviation (SD) of the fused data. The closer a label is to the reference point, the better the performance of the run. The tendency to over- or underestimate the PDM SMD is also indicated by the Taylor diagram: generally, when

Fig. 7 Taylor diagram for the performances of LWA with different weights (a calibration; b validation)


the standard deviation of the simulated data is higher than that of the observed values an overestimation can be predicted, and vice versa. The data show that for most of the weight combinations LWA gives an overestimation and a high correlation with the PDM SMD. It is evident from the diagram that the weight combination of 0.1 for SMOS and 0.9 for WRF-NOAH LSM is the most efficient and relatively skilful in fusing the datasets with fairly high performance, and can be considered the best data fusion combination for LWA. For ANN, we began with a very small network and varied several parameters, including the number of hidden layers, the number of nodes in the hidden layer and the decay rate. Here the training is based on minimising the total error, generally expressed as a function of the weights. To combat the problem of overfitting and reduced generalisation, a regularisation approach is employed using a weight decay function (Moody et al. 1995). In this study, to estimate the optimised value of the decay function, a series of decay values is tested, following a methodology similar to that of Srivastava et al. (2013a). The parameterisation of the ANN with respect to the weight decay rate shows that 10,000 iterations with a decay of 5×10⁻⁵ are sufficient for optimised performance with a two-hidden-layer architecture. In the case of the Kalman filter, all functions use a univariate approach, also known as sequential processing (Anderson and Moore 1979), in which the observations are introduced one element at a time. As there is no matrix inversion in KFAS, it provides more stable and faster results. For the Kalman approach, the best solutions are obtained with 10,000 iterations during the linearisation. For optimisation, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is used. The number of independent samples used in estimating the log-likelihood of the non-Gaussian state space object is fixed here to zero, which gives good starting values for the optimisation (Shumway et al. 2000).
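The trial-and-error weight selection described above for LWA can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes that the fused SMD is a convex combination of the two products (weights summing to 1) and that the best weight is the one giving the highest NSE against the PDM SMD. The function names and the short series are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe Efficiency of sim against obs."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    squared_deviation = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / squared_deviation


def lwa_fuse(smos_smd, wrf_smd, w_smos):
    """Weighted combination; WRF-NOAH LSM receives the weight 1 - w_smos (assumption)."""
    return [w_smos * s + (1.0 - w_smos) * m for s, m in zip(smos_smd, wrf_smd)]


def best_weight(pdm_smd, smos_smd, wrf_smd, steps=10):
    """Try SMOS weights 0, 1/steps, ..., 1 and keep the one with the highest NSE."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        score = nse(pdm_smd, lwa_fuse(smos_smd, wrf_smd, w))
        if best is None or score > best[1]:
            best = (w, score)
    return best


# Hypothetical series for illustration only (not the study data).
pdm_smd = [0.020, 0.050, 0.080, 0.060, 0.030]
smos_smd = [0.030, 0.040, 0.090, 0.070, 0.020]
wrf_smd = [0.015, 0.055, 0.075, 0.055, 0.035]

w_smos, score = best_weight(pdm_smd, smos_smd, wrf_smd)
print("selected SMOS weight:", w_smos, "NSE:", round(score, 4))
```

In practice the weight would be selected on the calibration period and then checked on the validation period, as is done in the Taylor diagrams of Fig. 7.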

The time series of the SMOS and WRF-NOAH LSM SMD fusion products obtained using LWA, MLR, Kalman and ANN are shown in Fig. 8. All the SMD time series demonstrate high spatial correlations with the seasonal cycle. The analysis of SMD indicates that winter is the wettest period of the year; however, from March to May a drier period is

Fig. 8 Time series of SMD obtained using data fusion techniques


reported. As SMD is closely related to changes in ETo, an increase in ETo is accompanied by a proportional development of SMD. The seasonal trend analysis indicates that the higher SMD during the summer season can be ascribed to higher ETo and lower rainfall during this period. However, as the season progresses from the drier to the wetter part of the year, i.e. from November to mid-February, a significantly lower SMD is recorded. The decrease in SMD during this period can be attributed to the lower temperatures and ETo through these months and to higher rainfall, which eventually builds up high soil moisture. A sudden decrease in SMD recorded during June can be attributed to a short-duration storm that wetted the soil profile and hence lowered the SMD. To diagnose the performance of the data fusion techniques, the three statistics mentioned in the methodology section are used. The goodness-of-fit statistics for all the data fusion techniques are given in Table 2, while their scatter plots with the 1:1 line are shown in Fig. 9. The NSE for the LWA is 0.880 with an RMSE of 0.008 during the calibration, and NSE=0.885 and RMSE=0.008 during the validation; similar statistics are shown by the MLR technique. The performance is slightly improved with the Kalman and ANN techniques, as the statistics indicate a very close relation between the PDM SMD and the SMD derived from the fused datasets. The NSE for the fused datasets in the case of ANN is 0.896 with an RMSE of 0.008 during the calibration, while in validation a marginally improved performance is obtained compared with the other techniques (NSE=0.911 and RMSE=0.008). The remaining techniques show slightly lower performance statistics during the validation: Kalman (NSE=0.902; RMSE=0.007), LWA (NSE=0.885; RMSE=0.008) and MLR (NSE=0.885; RMSE=0.008). All the techniques give very low %Bias values, indicating a good fit of the models and hence a very low tendency to under- or over-predict. A slight underestimation is obtained with the ANN technique, while the others generally show an over-prediction. The detailed statistical diagnosis shows that a strong linear relationship exists between the input datasets for the data fusion and the PDM SMD, which is why only a trivial improvement is observed with the non-linear techniques such as Kalman and ANN. The overall comparison of the non-fused datasets with the LWA, MLR, Kalman and ANN algorithms reveals that all the techniques can effectively fuse the datasets with an improved performance, implying that data fusion using Kalman and ANN marginally improves the SMOS and WRF-NOAH LSM products as compared to the original SMOS or WRF-NOAH LSM derived SMDs. However, the main limitation of Kalman and ANN is that they take more computational time than the linear weighted algorithms. LWA and MLR also show an improved performance and can be an alternative for data fusion in the absence of ANN or Kalman techniques, as they are more basic and easier to implement.
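The %Bias statistic discussed above can be sketched as follows. The sign convention is an assumption: positive values are taken to indicate over-prediction and negative values under-prediction, which is consistent with the reading of the validation results given in the text. The function name and the sample values are hypothetical.

```python
def pbias(obs, sim):
    """Percent bias of sim against obs.

    Assumed convention: positive means over-prediction,
    negative means under-prediction.
    """
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical values for illustration only (not the study data).
pdm_smd = [0.02, 0.05, 0.08, 0.06, 0.03]
fused_high = [1.1 * v for v in pdm_smd]  # every value 10 % too high
fused_low = [0.9 * v for v in pdm_smd]   # every value 10 % too low

print("over-predicting series :", round(pbias(pdm_smd, fused_high), 2))
print("under-predicting series:", round(pbias(pdm_smd, fused_low), 2))
```

Because positive and negative errors cancel in the sum, a low %Bias should be read together with NSE and RMSE rather than on its own.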

Table 2 NSE, RMSE and %Bias for the four data fusion techniques

Algorithms    Calibration                  Validation
              NSE      RMSE     %Bias      NSE      RMSE     %Bias
LWA           0.880    0.008    −1.104     0.885    0.008    0.584
MLR           0.881    0.008    −0.062     0.885    0.008    1.616
KF            0.905    0.007    −1.008     0.902    0.007    0.554
ANN           0.896    0.008    −1.380     0.911    0.008    −0.330


Fig. 9 Scatter plots for the estimated SMD during the calibration and validation (a MLR; b LWA; c ANN; d Kalman)


4 Conclusion

The study focuses on the utilization of SMOS soil moisture and the WRF-NOAH LSM for soil moisture deficit estimation using various data fusion techniques. Comparisons between the techniques are briefly presented for SMD estimation at a catchment scale. This study provides a first comprehensive evaluation of data fusion techniques for data users who wish to apply soil moisture in hydrological modelling at a local or regional scale. In this work, a synergistic evaluation of four data fusion techniques has been performed. The simulated SMD appears generally overestimated, especially with LWA, MLR and Kalman. In contrast, a slight underestimation is shown by the ANN technique. All the data fusion algorithms generally improve the data quality in comparison with the original SMOS or WRF-NOAH LSM data and hence are suitable for soil moisture deficit estimation in hydrological applications. The ANN-based data fusion marginally improves the soil moisture derived from the SMOS satellite and WRF-NOAH LSM and, during validation, outperforms all the other data fusion techniques. This study shows that LWA and MLR require less expertise and simpler algorithms, and hence less computational time, than the Kalman and ANN. The overall analysis indicates that these techniques can be used for data fusion and will provide hydrologists with valuable information on the applicability of SMOS and WRF-NOAH LSM for SMD estimation. However, further exploration of this potentially valuable data source by the scientific community is recommended so that useful understanding and knowledge can be accumulated in the technical literature. Furthermore, the land surface hydrology could be improved by adding a representation of soil moisture deficit and assimilating it in rainfall-runoff models. Future research will focus on the development of new algorithms for improved soil moisture retrieval. Further investigation should also be directed towards improving the input data currently available for data fusion, including the WRF-NOAH LSM parameterization schemes and a more accurate retrieval of SMOS soil moisture using modified radiative transfer equations.

Acknowledgments The authors would like to thank the Commonwealth Scholarship Commission, British Council, United Kingdom and Ministry of Human Resource Development, Government of India for providing the necessary support and funding for this research. The authors would like to acknowledge the British Atmospheric Data Centre and Environment Agency, United Kingdom for providing the ground datasets. The authors also acknowledge the Advanced Computing Research Centre at University of Bristol for providing access to the supercomputer facility (The Blue Crystal) and Linux R support.

References

Alley WM (1984) The Palmer drought severity index: limitations and assumptions. J Clim Appl Meteorol 23(7):1100–1109

Al-Shrafany D, Rico-Ramirez MA, Han D, Bray M (2013) Comparative assessment of soil moisture estimation from land surface model and satellite remote sensing based on catchment water balance. Meteorol Appl. doi:10.1002/met.1357

Anderson JA (1995) An introduction to neural networks, vol 1. MIT Press, Cambridge MA, p 650

Anderson B, Moore J (1979) Optimal filtering. Prentice-Hall, Englewood Cliffs

Bell V, Moore R (1998) A grid-based distributed flood forecasting model for use with weather radar data: Part 2. Case studies. Hydrol Earth Syst Sci 2(2/3):283–298

Cabus P (2008) River flow prediction through rainfall–runoff modelling with a probability-distributed model (PDM) in Flanders, Belgium. Agric Water Manag 95(7):859–868


Calder I, Harding R, Rosier P (1983) An objective assessment of soil-moisture deficit models. J Hydrol 60(1–4):329–355

Chen F, Dudhia J (2001) Coupling an advanced land surface-hydrology model with the Penn State-NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon Weather Rev 129(4):569–585

Dornfeld DA, DeVries M (1990) Neural network sensor fusion for tool condition monitoring. CIRP Ann Manuf Technol 39(1):101–105

Gupta M, Garg N, Joshi H, Sharma M (2012) Persistence and mobility of 2,4-D in unsaturated soil zone under winter wheat crop in sub-tropical region of India. Agric Ecosyst Environ 146(1):60–72

Helske J (2010) KFAS: Kalman filter and smoothers for exponential family state space models. R package version 06 0, URL http://CRAN.R-project.org/package=KFAS

Hogue TS, Bastidas L, Gupta H, Sorooshian S, Mitchell K, Emmerich W (2005) Evaluation and transferability of the Noah land surface model in semiarid environments. J Hydrometeorol 6(1):68–84

Hong S, Lakshmi V, Small EE, Chen F, Tewari M, Manning KW (2009) Effects of vegetation and soil moisture on the simulated land surface processes from the coupled WRF/Noah model. J Geophys Res 114(D18), D18118

Ishak A, Remesan R, Srivastava P, Islam T, Han D (2013) Error correction modelling of wind speed through hydro-meteorological parameters and mesoscale model: a hybrid approach. Water Resour Manag 27(1):1–23. doi:10.1007/s11269-012-0130-1

Islam T, Rico-Ramirez MA, Han D, Bray M, Srivastava PK (2012) Fuzzy logic based melting layer recognition from 3 GHz dual polarization radar: appraisal with NWP model and radio sounding observations. Theor Appl Climatol 112(1–2):317–338

Jackson TJ, Le Vine DM, Hsu AY, Oldak A, Starks PJ, Swift CT, Isham JD, Haken M (1999) Soil moisture mapping at regional scales using microwave radiometry: the Southern Great Plains Hydrology Experiment. Geosci Remote Sens IEEE Trans 37(5):2136–2151

Johnson RA, Wichern DW (2002) Applied multivariate statistical analysis, vol 4. Prentice Hall, Upper Saddle River

Kerr YH, Waldteufel P, Wigneron JP, Martinuzzi J, Font J, Berger M (2001) Soil moisture retrieval from space: the Soil Moisture and Ocean Salinity (SMOS) mission. Geosci Remote Sens IEEE Trans 39(8):1729–1735

Kerr YH, Waldteufel P, Richaume P, Wigneron JP, Ferrazzoli P, Mahmoodi A, Al Bitar A, Cabot F, Gruhier C, Juglea SE (2012) The SMOS soil moisture retrieval algorithm. Geosci Remote Sens IEEE Trans 50(5):1384–1403

Koopman SJ (2012) Time series analysis by state space methods, vol 38. Oxford University Press, Oxford, p 346

Koopman SJ, Durbin J (2000) Fast filtering and smoothing for multivariate state space models. J Time Ser Anal 21(3):281–296

Koopman SJ, Durbin J (2003) Filtering and smoothing of state vector for diffuse state–space models. J Time Ser Anal 24(1):85–98

Kutner MH, Nachtsheim C, Neter J (2004) Applied linear regression models. McGraw-Hill/Irwin; 5th edn (August 10, 2004), p 1396

Li XR, Zhu Y, Wang J, Han C (2003) Optimal linear estimation fusion. I. Unified fusion rules. Inf Theory IEEE Trans 49(9):2192–2208

Liu J, Han D (2010) Indices for calibration data selection of the rainfall-runoff model. Water Resour Res 46(4), W04512

Michalakes J, Chen S, Dudhia J, Hart L, Klemp J, Middlecoff J, Skamarock W (2001) Development of a next generation regional weather research and forecast model. In: Developments in Teracomputing: Proceedings of the Ninth ECMWF Workshop on the use of high performance computing in meteorology. World Scientific, pp 269–276

Mohan M, Bhati S (2011) Analysis of WRF model performance over subtropical region of Delhi, India. Adv Meteorol 2011:13. doi:10.1155/2011/621235

Moody J, Hanson S, Krogh A, Hertz JA (1995) A simple weight decay can improve generalization. Adv Neural Inf Process Syst 4:950–957

Moore R (2007) The PDM rainfall-runoff model. Hydrol Earth Syst Sci 11(1):483–499

Nash JE, Sutcliffe J (1970) River flow forecasting through conceptual models part I - a discussion of principles. J Hydrol 10(3):282–290

Noilhan J, Planton S (1989) A simple parameterization of land surface processes for meteorological models. Mon Weather Rev 117(3):536–549

Norbiato D, Borga M, Degli Esposti S, Gaume E, Anquetin S (2008) Flash flood warning based on rainfall thresholds and soil moisture conditions: an assessment for gauged and ungauged basins. J Hydrol 362(3):274–290

Pan H-L, Mahrt L (1987) Interaction between soil hydrology and boundary-layer development. Bound-Layer Meteorol 38(1):185–202


R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

Rees HG, Collins DN (2006) Regional differences in response of flow in glacier–fed Himalayan rivers to climatic warming. Hydrol Process 20(10):2157–2169

Saleh K, Wigneron J-P, De Rosnay P, Calvet J-C, Escorihuela MJ, Kerr Y, Waldteufel P (2006) Impact of rain interception by vegetation and mulch on the L-band emission of natural grass. Remote Sens Environ 101(1):127–139

Schaake JC, Koren VI, Duan Q-Y, Mitchell K, Chen F (1996) Simple water balance model for estimating runoff at different spatial and temporal scales. J Geophys Res D Atmos 101:7461–7475

Shumway RH, Stoffer DS (2000) Time series analysis and its applications, vol 549. Springer, New York

Skamarock WC, Klemp JB (2008) A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. J Comput Phys 227(7):3465–3485

Srivastava PK, Han D, Rico-Ramirez MA (2012a) Assessment of SMOS satellite derived soil moisture for soil moisture deficit estimation. Symposium on Prediction in Ungauged basin (PUBS) co-organized by Delft University of Technology, Delft, Netherlands and International Association of Hydrological Sciences (IAHS) dated 22–25 October 2012:1

Srivastava PK, Han D, Rico-Ramirez MA, Bray M, Islam T (2012b) Selection of classification techniques for land use/land cover change investigation. Adv Space Res 50(9):1250–1265. doi:10.1016/j.asr.2012.06.032

Srivastava PK, Han D, Ramirez MR, Islam T (2013a) Machine learning techniques for downscaling SMOS satellite soil moisture using MODIS land surface temperature for hydrological application. Water Resour Manag 27(8):3127–3144. doi:10.1007/s11269-013-0337-9

Srivastava PK, Han D, Rico-Ramirez MA (2013b) Data fusion techniques for an improved soil moisture retrieval using SMOS and WRF-NOAH Land surface model. SMOS land application workshop, ESA-ESRIN, Frascati, Italy, 25–27 February 2013

Srivastava PK, Han D, Rico-Ramirez MA, Islam T (2013c) Sensitivity and uncertainty analysis of mesoscale model downscaled hydro-meteorological variables for discharge prediction. Hydrol Process. doi:10.1002/hyp.9946

Srivastava PK, Han D, Rico Ramirez MA, Islam T (2013d) Appraisal of SMOS soil moisture at a catchment scale in a temperate maritime climate. J Hydrol. doi:10.1016/j.jhydrol.2013.06.021

Srivastava PK, Han D, Rico Ramirez MA, Islam T (2013e) Comparative assessment of evapotranspiration derived from NCEP and ECMWF global datasets through Weather Research and Forecasting model. Atmos Sci Lett 14(2):118–125. doi:10.1002/asl2.427

Stathaki T (2008) Image fusion: algorithms and applications. Academic Press, London, p 520

Sun S-l (2004) Multi-sensor optimal information fusion Kalman filters with applications. Aerosp Sci Technol 8(1):57–62

Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res 106(D7):7183–7192

Zhang G, Eddy Patuwo B, Hu MY (1998) Forecasting with artificial neural networks: the state of the art. Int J Forecast 14(1):35–62
