Performance of Gostatistical_18 Feb2006

  • 8/9/2019 Performance of Gostatistical_18 Feb2006

    ORIGINAL PAPER

    Darío Rojas-Avellaneda · José Luis Silván-Cárdenas

    Performance of geostatistical interpolation methods for modeling sampled data with non-stationary mean

    Published online: 18 February 2006 © Springer-Verlag 2006

    Abstract The measured ozone pollution peak in the atmosphere of the Mexico City region was used to study the effect of a non-stationary mean of the sampled data on geostatistical interpolation methods. To this end, the local mean of the sampled data was estimated through a linear regression of the measured values on the coordinates of the monitoring stations. The residuals obtained by removing this trend are treated as a set of stationary random variables. Several interpolation methods used in geostatistics were considered: inverse distance weighted, kriging, and artificial neural network techniques. To optimize and evaluate their performance, we fit interpolated values to sampled data and obtained the optimal values of the parameters defining each model, that is, the values that give the lowest mean RMSE between interpolated and measured data at 20 stations at 1500 hours for a set of 21 days of December 2001, which was chosen as the training set. The training set comprises all the days in December 2001 except days 3, 6, 9, 12, ..., 27, 30, which form the testing set. Once the optimal model is obtained, it is used to interpolate the values at the stations at 1500 hours for the testing days. The RMSE between interpolated and measured values at the monitoring stations was also evaluated for these testing values and is shown as a percentage in Table 2. These values, together with the defined generalization parameter G, can be used to evaluate the performance of the models and their ability to predict and reproduce the peak ozone concentrations. Scatter plots for the testing data are presented for each interpolation method. The ozone pollution levels obtained at 1500 hours on December 21 are interpreted using the wind field that prevailed in the region 1 h earlier on the same day.

    Keywords Interpolation · Non-stationary data · Kriging · Artificial neural network · Ozone pollution

    1 Introduction

    Two types of mathematical models are mainly used in urban and regional air quality studies: deterministic and statistical. Deterministic models are based on the fundamental description of atmospheric chemical and physical processes (Seinfeld and Pandis 1998, pp. 1193–1240), while statistical models are characterized by their direct use of air quality measurements to infer semi-empirical relationships (Seinfeld and Pandis 1998, pp. 1245–1283).

    A deterministic model involves the description of emission sources, meteorology, chemical transformation and removal processes. The concentration of a pollutant satisfies the advection–diffusion equation (Jacobson 2000, Eq. 3.30). Because the flows of interest in air pollution studies are turbulent, numerical solutions of this equation require that its variables be averaged in time and in space over grid volumes. Grid volume averaging is commonly performed under the assumption that each variable $u$ in the equation is decomposed into an average term $\bar{u}$ plus a sub-grid perturbation $u'$:

    $$u = \bar{u} + u'.$$

    The average of the sub-grid perturbation is assumed to be zero.

    Solutions of the equation for the average term can be found only through several semi-empirical assumptions (Seinfeld and Pandis 1998). Several methods have been proposed for their solution (Oran and Boris 1987; Jacobson 2000), including global finite differences, operator splitting, finite element methods and spectral methods. These methods determine mean pollutant concentrations in urban or rural areas, using average emission rates from point and area sources, and meteorological information.

    D. Rojas-Avellaneda (✉) · J. L. Silván-Cárdenas
    Centro de Investigación en Geografía y Geomática Ing. Jorge L. Tamayo, Contoy No. 137, Lomas de Padierna, Tlalpan, C.P. 14740, México, D.F., México
    E-mail: [email protected]: +52-55-26152289
    E-mail: [email protected]

    Stoch Environ Res Risk Assess (2006) 20: 455–467
    DOI 10.1007/s00477-006-0038-5

    The application of mathematical models of this kind is often limited by their complexity and by the need for precise knowledge of some values (such as initial and boundary conditions of concentrations, and source inventories and emission factors) and for detailed meteorological information.

    On the other hand, statistical models are based on the fact that air pollutant concentrations are inherently random, because of their dependence on the fluctuations of meteorological variables (such as wind direction and speed) and emission variables (such as emission rates, plume rise, and types of sources) (Seinfeld and Pandis 1998). Several types of statistical methods are frequently used in air pollution studies (Gilbert 1987). Frequency distributions are mostly used to assess the probability density function of the air pollutant concentration (Georgopoulos and Seinfeld 1982). Time series analysis methods are used to analyze data ordered in a time sequence (Simpson and Layton 1983). Spectral analysis techniques allow the identification of cycles in meteorological (Trivikrama et al. 1976) and air quality time-series measurements, and regression analysis techniques are used to study the concentration of a pollutant as a function of meteorological conditions and of the concentrations of other pollutants (Barone et al. 1978).

    Many air pollution studies have employed spatial interpolation methods to produce maps of air pollution concentrations. These studies are based primarily on distance-weighting methods (Falke and Husar 1996; De Leeuw et al. 1997; Phillips et al. 1997) and kriging (Mulholland et al. 1998; Phillips et al. 1997; Liu and Rossini 1996; Yi and Prybutok 1996). The inverse distance weighted (IDW) method typically assigns more weight to nearby points than to distant points (Phillips et al. 1997). Kriging is a regression-based technique that estimates values at unsampled locations using weights reflecting the correlation between data at two sampled locations or between a sample location and the location to be estimated (Wackernagel 2003).

    Both the IDW method and kriging directly use the coordinates of the sample points to perform interpolation, and kriging's performance in particular depends on the presence of spatial autocorrelation (values at nearby points are more similar than values at distant points). The accuracy of spatial interpolation with these methods depends on relatively high sampling densities of air quality monitors as well as on an appropriate spatial distribution of those monitors (Diem and Comrie 2002). Nevertheless, relatively accurate results were obtained recently with the IDW and kriging methods for estimating ambient ozone concentrations in some metropolitan areas in the United States, with relatively low sampling densities and non-uniformly spaced sampling locations (Mulholland et al. 1998; Liu and Rossini 1996). Since these conditions are similar to those of the monitoring locations in the Mexico City region, we also considered these two techniques to predict ozone pollution levels across this area. Kriging is more complex and has been used more widely than IDW methods. Some comparison studies (Phillips et al. 1997; Leenaers et al. 1990) have found kriging to outperform IDW methods.

    We have conducted a study with three interpolation procedures to predict ozone concentration values at unsampled locations in ambient air over Mexico City: IDW, kriging and artificial neural networks (ANN).

    The ANN is a nonlinear regression method coupled with an optimization routine, named back-propagation, for determining the parameters of the regression. The relationship between meteorology and air pollution is complex, nonlinear and potentially multi-scale in nature. Ambient ozone concentrations are determined by a complex interaction between radiative, chemical and meteorological processes. ANN are capable of modeling highly nonlinear relationships and can be trained to generalize accurately when presented with new, unseen data (Gardner and Dorling 1998). Multilayer perceptron neural networks have been shown to be effective alternatives to more traditional statistical techniques within the field of air quality prediction and, unlike them, make no prior assumptions concerning the data distribution. Results from the neural network were shown to be better than those obtained from regression analysis for prediction of maximum ozone concentrations in an industrialized urban area (Yi and Prybutok 1996). Comrie (1997) showed that the neural network approach is consistently better than regression models for ozone forecasting, although the gains in performance are only small to moderate. Other applications of multilayer perceptrons include forecasting atmospheric sulfur dioxide concentrations in a highly polluted industrialized area of Slovenia (Boznar et al. 1993) and predicting hourly NOx and NO2 concentrations from readily observable local meteorological data (Gardner and Dorling 1999).

    This paper is organized as follows. Section 2 provides a short overview of spatial interpolation methods. The simple IDW method is considered in Sect. 2.1, while the optimized IDW is outlined in Sect. 2.2. Section 2.3 describes the kriging model and defines its three cases: simple kriging (SK), ordinary kriging (OK), and Universal Kriging (UK). Section 2.3.1 outlines the procedure for obtaining the sample variograms and variogram models required by the kriging methods. Section 2.4 gives a brief introduction to ANN and a description of the basic algorithm for training a neural network, known as back-propagation. Section 3 describes the study area and the data used, together with the steps required to fit the different interpolation methods to measured data, in order to clarify the processes considered in the work. Section 3.1 describes the steps required for transforming a data set into one with zero mean using the conventional multiple regression model. Section 3.2 describes the interpolation process used in the present work. The training and testing processes are presented in Sects. 3.2.1 and 3.2.2, respectively.

    The results obtained by applying the algorithms to specific data are presented in Sect. 4, where the performance of the three methods is compared graphically or numerically. Finally, Sect. 5 presents the conclusions, with remarks on the advantages and disadvantages of each method.

    2 Spatial interpolation methods

    2.1 IDW

    The IDW interpolation method is commonly used for estimating air pollutant concentrations at locations between monitoring stations. IDW is based on the intuitive idea that nearer observations must have more influence on the estimated value than farther ones. It is a local method that estimates $Z$ at $x_0$ with the following expression:

    $$Z(x_0) = \frac{\sum_{i=1}^{N} w(d_i^{-1})\, z(x_i)}{\sum_{i=1}^{N} w(d_i^{-1})}, \tag{1}$$

    where $w(\cdot)$ is the weighting function of the inverse of the distance $d_i$ between the observation at $x_i$ and the interpolation point $x_0$. Equation 1 is referred to as the standard IDW interpolation for the simplest weight definition, $w_i(d_i^{-1}) = d_i^{-\alpha}$ with $\alpha = 1$ or 2. This weight is a monotonically decreasing function that vanishes as the distance tends to infinity.
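    As an illustration, Eq. 1 with the power-law weight can be sketched in a few lines of Python. The function name `idw`, the station coordinates and the ozone values below are hypothetical and serve only to show the mechanics; returning the sample value when the distance is zero is an assumption made here to avoid the divergent weight.

```python
import math

def idw(x0, points, values, alpha=2.0):
    """Standard IDW estimate at x0 with weights d_i**(-alpha) (Eq. 1)."""
    num = 0.0
    den = 0.0
    for p, z in zip(points, values):
        d = math.dist(x0, p)
        if d == 0.0:
            # Assumption: at a sampled location the weight diverges,
            # so the sample value itself is returned.
            return z
        w = d ** (-alpha)
        num += w * z
        den += w
    return num / den

# Hypothetical station coordinates (km) and ozone values (ppb).
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ozone = [100.0, 140.0, 120.0]
estimate = idw((5.0, 5.0), stations, ozone)
```

    In this toy layout the interpolation point is equidistant from the three stations, so all weights coincide and the estimate reduces to the arithmetic mean of the samples.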

    2.2 Optimized IDW

    In practice, it is desirable that the method be flexible enough to adapt to the data set by limiting the radius of influence of the weighting function and by exploring different decay exponents. The optimized IDW is an attempt to provide such flexibility, with the advantage that these two parameters are chosen optimally according to a minimum root mean square error (RMSE) criterion. The interpolated value $Z$ for the optimized IDW is obtained through the following expression,

    $$Z = \frac{\sum_{n=1}^{N} w_n Z_n}{\sum_{n=1}^{N} w_n}, \tag{2}$$

    with the weights given by

    $$w_n = \begin{cases} \dfrac{K}{1 + (K-1)\,(d_n/r)^{\alpha}}, & d_n \le r \\ 0, & d_n > r. \end{cases} \tag{3}$$

    The parameters $\alpha$ and $r$ are estimated by minimizing the RMSE between the measured and the estimated values, and $d_n$ is the discrete distance variable. The parameter $K$ is a scaling constant that makes the weight at $d = 0$ finite rather than infinite as in the standard case. Some weighting functions for $r = 10$ and $K = 10$ are shown in Fig. 1 to illustrate how the exponent $\alpha$ modulates the weighting function.
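    The following Python sketch illustrates how the cutoff radius and the decay exponent might be chosen by a minimum-RMSE criterion. It assumes the weight has the form K / (1 + (K - 1)(d/r)^alpha) inside the cutoff, and it uses a leave-one-out error over the stations with a plain grid search; both the error definition and the search strategy are assumptions for illustration, not necessarily the procedure used in this work. All function names are hypothetical.

```python
import math

def weight(d, r, alpha, K=10.0):
    """Assumed form of Eq. 3: K / (1 + (K - 1) * (d / r)**alpha) for d <= r, else 0."""
    if d > r:
        return 0.0
    return K / (1.0 + (K - 1.0) * (d / r) ** alpha)

def oidw(x0, points, values, r, alpha, K=10.0):
    """Eq. 2: weighted mean of the samples within radius r of x0."""
    pairs = [(weight(math.dist(x0, p), r, alpha, K), z)
             for p, z in zip(points, values)]
    total = sum(w for w, _ in pairs)
    if total == 0.0:
        return None  # no station inside the radius of influence
    return sum(w * z for w, z in pairs) / total

def loo_rmse(points, values, r, alpha, K=10.0):
    """Leave-one-out RMSE: estimate each station from the remaining ones."""
    sq_errors = []
    for i, (p, z) in enumerate(zip(points, values)):
        est = oidw(p, points[:i] + points[i + 1:],
                   values[:i] + values[i + 1:], r, alpha, K)
        if est is not None:
            sq_errors.append((est - z) ** 2)
    if not sq_errors:
        return float("inf")
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def fit(points, values, radii, alphas, K=10.0):
    """Grid search for the (r, alpha) pair with the lowest leave-one-out RMSE."""
    candidates = [(r, a) for r in radii for a in alphas]
    return min(candidates,
               key=lambda ra: loo_rmse(points, values, ra[0], ra[1], K))
```

    With this form the weight equals K at zero distance and 1 at the cutoff, and for large K it approaches the standard power law (r/d)^alpha away from the origin.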

    2.3 Kriging

    Environmental sciences have recently started to use geostatistics as a means to interpolate data and to explore forms of spatial variation. Pollution can be considered a regionalized variable $z(x)$ defined throughout a region $D$, so that kriging can be used to explore the way contaminants vary in space.

    Several kriging models can be used in environmental sciences (Goovaerts 1997). All kriging models consider $z(x)$ as a realization of a continuous random function (stochastic process) $Z(x)$, defined over the study area $D$, which satisfies the intrinsic hypothesis defined by two conditions:

    1. For any fixed displacement vector $h$, the expected value (or mean) of the increments is

    $$E\{Z(x) - Z(x+h)\} = m(h) \quad \forall\, x,\, x+h \in D.$$

    This condition assumes that the random function has a linear drift $m(h)$. If the drift is zero (i.e., $m = 0$), that is, if the random function has a constant mean, and this mean is known, the simple kriging (SK) model must be considered. If the mean is constant but unknown, the ordinary kriging (OK) model must be considered, in contrast to the Universal Kriging (UK) model, which considers an unknown, non-constant mean ($m(h) \neq 0$). Universal Kriging is the appropriate method to use with non-stationary data since, unlike OK, it incorporates a drift function to account for the trend of the data.

    2. The variance of the increments exists and is independent of the observation point $x$, that is,

    $$\mathrm{Var}\{Z(x) - Z(x+h)\} = 2\gamma(h) \quad \forall\, x,\, x+h \in D.$$

    The function $\gamma$ is called the semivariogram. It is said to be isotropic if it depends only on the modulus of $h$, and anisotropic if it depends on both the modulus and the direction of $h$. Anisotropy in the contamination data would require the use of an anisotropic semivariogram model. The sparse ozone sampling locations and the inappropriate spatial distribution of the monitors do not allow an anisotropic semivariogram model to be evaluated, so in this work we consider an isotropic model. The random variables $Z(x)$ and $Z(x+h)$ relate to the same attribute $z$, i.e., the concentration of a given contaminant, at two different locations $x$ and $x+h$.

    Fig. 1 Weighting functions with different decay exponents. Both cutoff and scaling parameters equal 10

    In order to estimate $Z_0 = Z(x_0)$ (i.e., the contaminant concentration at $x_0$), the particular case of an unknown non-constant mean is considered, and a linear estimator $Z_0^* = \sum_i \lambda_i Z_i$ is used. The UK model also assumes that the mean $m(x)$ can be written as a finite expansion:

    $$m(x_0) = \sum_{l=0}^{L} a_l f_l(x_0), \tag{4}$$

    where the $f_l(x_0)$ are known basis functions and the $a_l$ are fixed but unknown coefficients. The first basis function (case $l = 0$) is the constant function equal to 1, which guarantees that the constant-mean case (OK) is included in the model. The other functions are taken to be monomials in the coordinates $x$, so that $f_l(x_0) = x_0^l$.

    In this work the first two terms of the basis expansion are considered: $l = 0$, corresponding to a constant mean, and $l = 1$, corresponding to a linearly varying mean. The coefficients $a_l$ in Eq. 4 are determined through a linear regression of the sample data on the coordinates of the sample locations. The residual function $R(x)$, defined as the difference between $Z(x)$ and the estimated mean, is a random function with a constant mean equal to zero. For this function the simple kriging model must be considered.
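    The detrending step can be sketched as an ordinary least-squares fit of a plane to the station values. The sketch assumes two planar coordinates per station, so that the linear mean is written m(x, y) = a0 + a1 x + a2 y; the function names and the small Gaussian-elimination helper are hypothetical and chosen only to keep the example free of external libraries.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_linear_trend(coords, values):
    """Least-squares coefficients (a0, a1, a2) of m(x, y) = a0 + a1*x + a2*y."""
    rows = [(1.0, x, y) for x, y in coords]
    # Normal equations (F^T F) a = F^T z for the design matrix F.
    FtF = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Ftz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(3)]
    return solve(FtF, Ftz)

def trend(coef, x, y):
    """Estimated mean at (x, y)."""
    return coef[0] + coef[1] * x + coef[2] * y

def residuals(coords, values, coef):
    """R(x) = Z(x) - m(x) at every sample location."""
    return [z - trend(coef, x, y) for (x, y), z in zip(coords, values)]
```

    Because the regression includes a constant term, the residuals returned by `residuals` sum to zero over the sample locations, which matches the zero-mean property required of R(x).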

    The weighting factors $\lambda_i^0$ for the estimator $R_0^* = \sum_i \lambda_i^0 R_i$ are chosen so as to produce an unbiased estimator (i.e., $E[R_0^* - R_0] = 0$) with the lowest variance (i.e., $\mathrm{Var}[R_0^* - R_0]$ is minimum). These two conditions lead to a set of $N$ linear equations called the simple kriging system (SK system) (Chilès and Delfiner 1999; Eq. 3-2):

    $$\sum_{j=1}^{N} \lambda_j^0\, C(x_i, x_j) = C(x_i, x_0), \quad i = 1, 2, \ldots, N, \tag{5}$$

    where $C(x_i, x_j)$ is the covariance between $R(x_i)$ and $R(x_j)$; $R(x_i) = Z(x_i) - m(x_i)$ is the residual at sample point $x_i$; and $x_0$ is the interpolation point. The covariance is obtained from the variogram model and its parameters, sill and range, using the basic relation between the variogram and the corresponding covariance $C(h)$:

    $$\gamma(h) = C(0) - C(h) \tag{6}$$

    (Chilès and Delfiner 1999; Eq. 2.3), which is valid for stationary random functions and where $C(0) = \sigma^2$ is the variance.

    The value of $Z_0^*$ is then given by

    $$Z^*(x_0) = \sum_i \lambda_i^0 \left[ Z(x_i) - m(x_i) \right] + m(x_0). \tag{7}$$
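    A compact sketch of Eqs. 5 to 7 is given below. It assumes an exponential variogram with given sill and range, so that the covariance follows from Eq. 6 as C(h) = sill * exp(-h / range); this model choice, the parameter values and all function names are assumptions made for illustration. The mean is passed in as a callable `mean(point)`.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cov(h, sill=1.0, rng=10.0):
    """C(h) = C(0) - gamma(h) for an (assumed) exponential variogram."""
    return sill * math.exp(-h / rng)

def sk_weights(points, x0, sill=1.0, rng=10.0):
    """Solve the simple kriging system (Eq. 5) for the weights."""
    C = [[cov(math.dist(p, q), sill, rng) for q in points] for p in points]
    c0 = [cov(math.dist(p, x0), sill, rng) for p in points]
    return solve(C, c0)

def krige(points, values, x0, mean, sill=1.0, rng=10.0):
    """Eq. 7: krige the residuals and add the mean back at x0."""
    lam = sk_weights(points, x0, sill, rng)
    resid = [z - mean(p) for p, z in zip(points, values)]
    return sum(l * r for l, r in zip(lam, resid)) + mean(x0)
```

    Two limiting cases are useful checks: at a sampled location the estimate reproduces the measured value, and far from all stations the weights vanish and the estimate falls back to the mean.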

    2.3.1 Estimation of the variogram model

    In practice, the variogram model is estimated from the sample variogram $\gamma^*$. In this work we use the sample variogram proposed by Cressie and Hawkins (Cressie 1991; Eq. 2.4.12):

    $$2\gamma^*(h) = \frac{|N(h)|}{0.457\,|N(h)| + 0.494} \left[ \frac{1}{|N(h)|} \sum_{N(h)} \left| Z(x_i) - Z(x_j) \right|^{1/2} \right]^{4} \tag{8}$$
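    The Cressie and Hawkins estimator can be sketched in Python as follows. Here N(h) is taken to be the set of unordered station pairs whose separation lies within a tolerance `tol` of the lag; the binning rule, the tolerance and the function name are assumptions for illustration.

```python
import math

def cressie_hawkins(points, values, lag, tol):
    """Robust semivariogram estimate gamma*(h) at the given lag (Eq. 8)."""
    # Square roots of absolute differences over the pairs in N(h).
    roots = [math.sqrt(abs(values[i] - values[j]))
             for i in range(len(points))
             for j in range(i + 1, len(points))
             if abs(math.dist(points[i], points[j]) - lag) <= tol]
    n = len(roots)
    if n == 0:
        return None  # no pairs fall in this lag bin
    mean_root = sum(roots) / n
    two_gamma = mean_root ** 4 * n / (0.457 * n + 0.494)
    return 0.5 * two_gamma
```

    Averaging square-root differences before raising to the fourth power reduces the influence of a few large differences compared with averaging squared differences directly.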