
ENVIRONMETRICS

Environmetrics 2001; 12:147–159

Received 9 December 1999
Accepted 25 May 2000
Copyright © 2001 John Wiley & Sons, Ltd.

Statistical modelling and prediction of atmospheric pollution by particulate material: two nonparametric approaches

Claudio Silva1,*, Patricio Pérez2 and Alex Trier2

1 Universidad de Santiago de Chile, Facultad de Ciencia, Departamento de Matemática y Ciencia de la Computación, Casilla 307, Correo 2, Santiago, Chile

2 Universidad de Santiago de Chile, Facultad de Ciencia, Departamento de Física, Casilla 307, Correo 2, Santiago, Chile

SUMMARY

Atmospheric particles are one of the main factors of air pollution in Santiago, Chile. Inhalation of particulate material is known to lead to serious health problems, including respiratory illness and complications related thereto. Vehicular traffic, industrial activity and street dust are important sources of atmospheric particles. The public authorities in Santiago have been monitoring air pollution by means of a network of semi-automatic sampling stations. At one of these stations, located near the city centre close to Government House, both PM2.5 and PM10 particulate material concentrations have been measured continuously for several years. Here PM2.5 refers to particles having a diameter smaller than 2.5 microns and PM10 corresponds to particles smaller than 10 microns. Hourly averages of the concentrations are available. For the present work, hourly data recorded at intervals of 12 hours have been used. The aim is to describe and forecast these variables with satisfactory precision, including critical pollution episodes, both as a function of previous behaviour and of a set of meteorological variables, comprising wind speed and direction, ambient temperature and relative air humidity. Both non-parametric discriminant analysis and multivariate adaptive regression splines procedures have been applied. Highly satisfactory classification as well as forecasting results were achieved with these approaches, respectively. Copyright © 2001 John Wiley & Sons, Ltd.

KEY WORDS: environmental study; MARS models; non-parametric discrimination; particulate material; prediction

* Correspondence to: C. Silva Z., Universidad de Santiago de Chile, Facultad de Ciencia, Departamento de Matemática y Ciencia de la Computación, Casilla 307, Correo 2, Santiago, Chile. E-mail: csilva@lauca.usach.cl

Contract grant number: FONDECYT 1930096.
Contract grant number: FONDECYT 1970418.
Contract grant number: DICYT-USACH 9533SZ.

1. INTRODUCTION

Santiago de Chile (33.5°S, 70.8°W) is located in a valley enclosed by mountain ranges. The city centre has an elevation of 520 m. The metropolitan area of Santiago exceeds 15 000 km², with a population approaching 5.3 million (1992 census). Annual rainfall averages less than 400 mm. Prevailing wind direction is southwest into the city. Altitude thermal inversions occur frequently, compounding the usual phenomena of ground-level thermal inversion. Thermal inversion precludes vertical ventilation so that air pollutant concentrations are enhanced. The problem becomes especially acute in the months spanning the period of April to August; Figure 1 shows the general trend for the 1989–1999 period, as well as a least squares line ± standard deviation.

Figure 1. PM10 historic trend, Santiago de Chile. Mean monthly concentration vs month (Jan. 1989–Feb. 1999). Source: www.sesma.cl.

The concern here is with atmospheric particles which are classified as inhalable when dealing with the so-called PM10 fraction (particles smaller than 10 microns) or respirable (implying respirable to lung alveoli) when the PM2.5 fraction is dealt with, where 2.5 signifies particle sizes smaller than 2.5 microns.

The main sources of emission of particulate material are: intense vehicular traffic on unpaved streets, internal combustion motors and industrial processes like small foundries. The public authorities (Santiago Metropolitan Health Authority: SESMA) have been measuring air pollutant concentrations since 1988 through a network of automatic monitoring stations. In accordance with the information collected, emergency action may be taken to abate pollution, involving restrictions on citizen and industrial activity.

Academic research in Santiago on diverse aspects of atmospheric pollution goes back over two decades, involving elemental and chemical analysis of pollutants (Trier and Silva, 1984, 1987; Dinator, 1995) and pollution impact on health (Ruiz et al., 1988; Ostro et al., 1996; Sanhueza et al., 1999). The possibility of forecasting pollution episodes on the basis of visibility time series has been examined by Trier and Firinguetti (1994), while Rutllant and Garreaud (1995) have examined forecasting of pollution episodes by means of synoptic meteorological indicators.

In this paper we summarize results of two studies developed in 1995 and 1998, respectively. In Section 2 of the present work, forecasting is attempted by discriminant analysis incorporating both PM2.5 and PM10 particulate mass concentrations measured near Santiago city centre and the ground-level meteorological variables temperature, relative air humidity, wind speed and wind direction as measured at a local airport. Here, our objective was to predict, with adequate precision and practical anticipation, a discrete (three levels) particulate mass concentration based on few previous observations plus the meteorological information (Silva and Trier, 1995).

In Section 3, a convenient adaptation of MARS (Multivariate Adaptive Regression Splines; Friedman, 1991) has been worked out obtaining modelistic descriptions coherent with previous studies (Trier and Firinguetti, 1994; Silva and Trier, 1995; Pérez et al., 1998) along with satisfactory predictions (Silva et al., 1998).

2. DISCRIMINANT ANALYSIS

2.1. Parametric classification

A collection of N = 210 observations of four dimensions was available (Silva and Trier, 1995): $x_1$ = temperature, $x_2$ = relative air humidity, $x_3$ = wind speed and $x_4$ = PM2.5 mass concentration. These observations were spaced regularly at 12 hours from 19 July to 30 October 1993. It was deemed useful to define an auxiliary variable Y (Y = 0 if $x_4 \le 50$, Y = 1 if $50 < x_4 \le 150$ and Y = 2 if $x_4 > 150$) to represent 'normal', 'alarm' and 'emergency' conditions, respectively.

One can process such multivariate information using some classification algorithm associated with classical discriminant analysis. To this effect one combines b of these vectors (b = 1, 2, 3 or 4), thus generating (N − b) vectors x of dimension 4b with each of which one associates the Y-value pertaining to f forward periods (f = 1, 2, 3 or 4). Naturally, these new vectors are not stochastically independent and, furthermore, the variable Y is a discretized version, lagged f steps, of the variable $x_4$. This might require, strictly speaking, a reformulation of the theory of the classical discriminant analysis. Here we use, with due reserve, some existing algorithms.

Various combinations of b and f values were analyzed. For instance, b = 4 implies information including up to the previous 36 hours, and f = 3 implies classification (in terms of Y) 36 hours into the future.

[Schematic: the combined vector x (16 × 1) is associated with the value y (1 × 1).]
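The construction of the combined vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `discretize` and `build_lagged` and the synthetic series are hypothetical, and the sketch drops the last f periods (which have no observed label yet), a choice that is assumed here because it reproduces the 205 and 204 cases reported in Tables I and II for N = 210.

```python
def discretize(pm25):
    """Map a PM2.5 concentration to Y: 0 normal, 1 alarm, 2 emergency."""
    if pm25 <= 50:
        return 0
    if pm25 <= 150:
        return 1
    return 2


def build_lagged(observations, b, f):
    """Stack b consecutive 4-dim observations into one vector of dimension 4b
    and pair it with the class of the PM2.5 value f periods after the last one."""
    vectors, labels = [], []
    last_start = len(observations) - (b - 1) - f
    for start in range(last_start):
        window = observations[start:start + b]
        x = [value for obs in window for value in obs]
        target = observations[start + b - 1 + f]
        vectors.append(x)
        labels.append(discretize(target[3]))  # index 3 holds PM2.5
    return vectors, labels


# Synthetic series of 210 observations (temperature, humidity, wind speed, PM2.5).
series = [(15.0, 60.0, 2.0, float(t % 200)) for t in range(210)]
X, y = build_lagged(series, b=2, f=4)
print(len(X), len(X[0]))  # number of cases and dimension 4b
```

With b = 2 and f = 4 each case has dimension 8; changing to b = 3 gives dimension 12 and one case fewer.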

The results can be presented in cross-classification tables summarizing the actual classification of each case and the classification assigned by the procedure being employed. Basically, an observation x is classified in category h if $p(h \mid x)$ is the maximum of $p(j \mid x)$ for j = 1, 2, 3 where

$$p(j \mid x) = \frac{\exp\left(-\tfrac{1}{2} d_j^2(x)\right)}{\sum_{i=1}^{3} \exp\left(-\tfrac{1}{2} d_i^2(x)\right)}$$

for

$$d_i^2(x) = (x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + \ln |S_i|$$

That is to say, $d_i^2(x)$ is the generalized Mahalanobis distance of the observation x to the centroid of the ith sample. Under the assumption of multivariate normality, $p(j \mid x)$ would be the posterior probability of x belonging to the jth population.

Table I. Parametric classification PM2.5: b = 2, f = 4 (see text).

Obs. from      Classified in population                   Total
population     0             1             2
0              69 (74.2)     18 (19.4)      6 (6.4)        93 (100.0)
1              26 (28.6)     54 (59.3)     11 (12.1)       91 (100.0)
2               0 (0.0)       0 (0.0)      21 (100.0)      21 (100.0)
Total          95 (46.3)     72 (35.1)     38 (18.5)      205 (100.0)

Table II. Parametric classification PM2.5: b = 3, f = 4 (see text).

Obs. from      Classified in population                   Total
population     0             1             2
0              73 (78.5)     19 (20.4)      1 (1.1)        93 (100.0)
1              22 (24.2)     63 (69.2)      6 (6.6)        91 (100.0)
2               0 (0.0)       0 (0.0)      20 (100.0)      20 (100.0)
Total          95 (46.6)     82 (40.2)     27 (13.2)      204 (100.0)
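As a minimal sketch of this rule, assuming NumPy is available, the generalized distance and the resulting posterior probabilities can be computed as follows. The function names and the two-dimensional centroids and covariance matrices are hypothetical illustrations, not the authors' data; the analyses in the paper were run with SAS.

```python
import numpy as np


def generalized_distance(x, mean, cov):
    """d_i^2(x) = (x - mean)' S^{-1} (x - mean) + ln|S|."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return float(diff @ np.linalg.solve(cov, diff) + logdet)


def posteriors(x, means, covs):
    """p(j|x) = exp(-d_j^2 / 2) / sum_i exp(-d_i^2 / 2)."""
    d2 = np.array([generalized_distance(x, m, s) for m, s in zip(means, covs)])
    w = np.exp(-0.5 * (d2 - d2.min()))  # shift by the minimum for numerical stability
    return w / w.sum()


# Hypothetical group centroids and covariance matrices in two dimensions.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([6.0, 0.0])]
covs = [np.eye(2), np.eye(2), np.eye(2)]
p = posteriors(np.array([0.5, 0.0]), means, covs)
print(p.round(3), int(p.argmax()))  # the observation is assigned to the closest group
```

Subtracting the smallest distance before exponentiating leaves the ratio unchanged and avoids underflow when all distances are large.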

Taking a basis of two periods (b = 2) and forecasting four steps ahead (48 hours) one achieves the data in Table I with a global error rate of 22.16 per cent (25.8 per cent for class 0, 40.7 per cent for class 1 and 0 per cent for class 2). It is to be noted that no emergency 'escapes' the classification process, while there are only six false alarms out of 38.

Table II, taking a basis of three periods (b = 3) and forecasting four steps ahead (48 hours), results in data with a global error rate of 20.9 per cent (21.5 per cent for class 0, 30.8 per cent for class 1 and 0 per cent for class 2). Note that no emergency 'escapes' the classification process, while there is only one false alarm out of 27.

Since important administrative decisions are made in Santiago on the basis of the measured PM10 mass concentrations, it was deemed of interest to reanalyze the classification using the trichotomy defined by an auxiliary variable $Y_{10}$ ($y_{10} = 0$ if $x_4 \le 150$, $y_{10} = 1$ if $150 < x_4 \le 300$ and $y_{10} = 2$ if $x_4 > 300$), in correspondence with 'normal', 'alarm' and 'emergency' conditions. $x_4$ now stands for the PM10 mass concentration.

With a basis of three periods (b = 3) and forecasting four steps ahead (48 hours) the data are derived in Table III with a global error rate of 14.2 per cent (18.4 per cent for class 0, 24.2 per cent for class 1 and 0.0 per cent for class 2). Note that no emergency 'escapes' the classification process and that there would be no false alarms.
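The class error rates quoted above can be checked directly from the cell counts. The short Python sketch below does this for Tables I and III; the helper name `class_error_rates` is hypothetical, and reading the global rate as the unweighted mean of the three class rates is an assumption that happens to reproduce the 22.16 and 14.2 per cent figures (it does not reproduce the 20.9 per cent quoted for Table II).

```python
def class_error_rates(table):
    """Per-class error rate: share of each row that falls off the diagonal."""
    return [1.0 - row[i] / sum(row) for i, row in enumerate(table)]


# Counts from Tables I and III (rows: observed class, columns: assigned class).
table_1 = [[69, 18, 6], [26, 54, 11], [0, 0, 21]]
table_3 = [[80, 18, 0], [24, 75, 0], [0, 0, 7]]

for name, table in (("Table I", table_1), ("Table III", table_3)):
    rates = class_error_rates(table)
    global_rate = sum(rates) / len(rates)  # unweighted mean over the three classes
    print(name, [round(100 * r, 1) for r in rates], round(100 * global_rate, 2))
# Table I   -> [25.8, 40.7, 0.0] 22.16
# Table III -> [18.4, 24.2, 0.0] 14.2
```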

2.2. Non-parametric classification

One can restate the classification problem without assuming the hypothesis of multivariate normality. For this purpose one has recourse to some discriminant analysis procedure based on the estimation of probability density functions using kernel functions (Hand, 1982; Scott, 1992).


Table III. Parametric classification PM10: b = 3, f = 4 (see text).

Obs. from      Classified in population                   Total
population     0             1             2
0              80 (81.6)     18 (18.4)      0 (0.0)        98 (100.0)
1              24 (24.2)     75 (75.8)      0 (0.0)        99 (100.0)
2               0 (0.0)       0 (0.0)       7 (100.0)       7 (100.0)
Total         104 (51.0)     93 (45.6)      7 (3.4)       204 (100.0)

The classification of an observation x is based on the posterior probabilities of belonging to each group; these probabilities are calculated from the specific densities of the group using the training set.

The kernel method uses a fixed radius r and a kernel function $K_t$ to estimate the tth density in each observation x. The Epanechnikov proposal, for instance, is

$$K_t(z) = \begin{cases} c(t)\left(1 - \dfrac{z' V_t^{-1} z}{r^2}\right) & \text{if } z' V_t^{-1} z \le r^2 \\ 0 & \text{otherwise} \end{cases}$$

with

$$c(t) = \frac{1 + p/2}{v_t(r)}$$

where

$$v_t(r) = r^p |V_t|^{1/2} v_0 = r^p |V_t|^{1/2} \frac{\pi^{p/2}}{\Gamma\left(\frac{p}{2} + 1\right)}$$

is the volume of the p-dimensional ellipsoid $\{z \mid z' V_t^{-1} z \le r^2\}$ if $V_t$ is the variance-covariance within the tth group. The probability density for an observation x in that group is estimated by

$$f_t(x) = \frac{1}{n_t} \sum_{y \in G_t} K_t(x - y)$$

and the posterior probability of belonging to group t will be given by

$$p(t \mid x) = \frac{q_t f_t(x)}{\sum_h q_h f_h(x)}$$

where $q_h$ is the prior probability of x belonging to the hth group.

For the problem at hand, considering the PM2.5 data, taking a base of two periods and forecasting four steps ahead (48 hours) one obtains the data in Table IV, showing optimal classification.

Table IV. Non-parametric classification PM2.5: b = 2, f = 4 (see text).

Obs. from      Classified in population                   Total
population     0             1             2
0              93 (100.0)     0 (0.0)       0 (0.0)        93 (100.0)
1               0 (0.0)      91 (100.0)     0 (0.0)        91 (100.0)
2               0 (0.0)       0 (0.0)      20 (100.0)      20 (100.0)
Total          93 (100.0)    91 (100.0)    20 (100.0)     204 (100.0)

Applying the same non-parametric procedure to the PM10 data, one gets the results shown in Table V, again with optimal classification.

Table V. Non-parametric classification PM10: b = 2, f = 4 (see text).

Obs. from      Classified in population                   Total
population     0             1             2
0              98 (100.0)     0 (0.0)       0 (0.0)        98 (100.0)
1               0 (0.0)      99 (100.0)     0 (0.0)        99 (100.0)
2               0 (0.0)       0 (0.0)       7 (100.0)       7 (100.0)
Total          98 (100.0)    99 (100.0)     7 (100.0)     204 (100.0)
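The kernel estimate and the posterior probabilities of this section can be sketched as below, assuming NumPy. The function names, the radius and the two tiny training groups are hypothetical, chosen only so that the example is self-contained; the sketch estimates $V_t$ by the sample covariance of each group and is not a substitute for the SAS DISCRIM procedure used in the paper.

```python
import math
import numpy as np


def epanechnikov_density(x, group, r):
    """Kernel estimate f_t(x) for one group with fixed radius r."""
    n, p = group.shape
    cov = np.cov(group, rowvar=False).reshape(p, p)      # V_t, within-group covariance
    v0 = math.pi ** (p / 2) / math.gamma(p / 2 + 1)      # volume of the unit ball
    vol = r ** p * math.sqrt(np.linalg.det(cov)) * v0    # v_t(r)
    c = (1 + p / 2) / vol                                # c(t)
    diffs = x - group
    q = np.einsum("ij,ij->i", diffs @ np.linalg.inv(cov), diffs)  # z' V^{-1} z
    kernel = np.where(q <= r ** 2, c * (1 - q / r ** 2), 0.0)
    return kernel.sum() / n


def kernel_posteriors(x, groups, priors, r):
    """p(t|x) = q_t f_t(x) / sum_h q_h f_h(x)."""
    dens = np.array([q * epanechnikov_density(x, g, r) for g, q in zip(groups, priors)])
    total = dens.sum()
    return dens / total if total > 0 else np.full(len(groups), np.nan)


# Two hypothetical training groups in two dimensions.
group_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
group_b = group_a + 5.0
post = kernel_posteriors(np.array([0.4, 0.4]), [group_a, group_b], [0.5, 0.5], r=2.0)
print(post)  # all mass on the first group: no point of group_b lies within radius r
```

When no training point of any group lies within the radius, every density is zero and the sketch returns NaN rather than guessing a class.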

2.3. Comments

The adaptation of discriminant analysis methods, both parametric and non-parametric, to the predictive classification of environmental risk episodes in the city of Santiago, Chile, has yielded encouraging results. The performance of the non-parametric method is especially encouraging as: (i) the resulting classification is optimal and (ii) no distributional assumptions are required, making the approach suitable for analyzing complex real life situations.

The analyses reported here were made with the computational support of the DISCRIM procedure of the SAS (SAS Institute, 1988) package. It would be gratifying to develop an ad-hoc computational support to process atmospheric data dynamically on a continuous basis to provide authorities with an optimal decision procedure.


3. MARS MODELS

3.1. Multivariate adaptive regression splines (MARS)

Friedman (1991) is a milestone in statistical methodology, containing an exhaustive presentation of MARS models and illuminating commentaries by several distinguished statisticians. After that work the bibliography on theory and applications of these models has been abundant: Lewis and Stevens (1991), Chen et al. (1997), Friedman and Roosen (1995) are a few examples. MARS is a method for flexible modelling of high dimensional data that can be conceptualized as a generalization of recursive partitioning. It fits a nonlinear regression model based on an expansion in product spline basis functions.

3.1.1. Model for one predictor. For a single response variable y, a predictor variable x and a random additive error variable $\varepsilon$ we might assume a model $y = f(x) + \varepsilon$. By selecting K 'knots' $x = t_k$, k = 1, ..., K, we define K + 1 'regions' over the domain of x. We can associate to each knot a linear spline function generating a family of basis functions

$$\left\{B_k^{(q)}(x)\right\} = \begin{cases} x^j & j = 0, \ldots, q \\ (x - t_k)_+^q & k = 1, \ldots, K \end{cases} \qquad (1)$$

For a given order q of approximation, an estimate value

$$f_q(x) = \sum_{k=0}^{K+q} a_k B_k^{(q)}(x)$$

might be satisfactorily adjusted. Usually we take q = 3 to have continuity of $f_q$, $f_q'$ and $f_q''$. Given that the number and location of the knots as well as the values of the $a_k$ coefficients must be adequately selected, we have plenty of possibilities for improving the performance of the fitted values $f_q(x)$. To evaluate this model, Friedman proposed the statistic

$$\mathrm{GCV} = \text{Generalized cross validation} = A \times \sum_i \left(y_i - f_q(x_i)\right)^2 / N \qquad (2)$$

with $A = (1 - C(M)/N)^{-2}$ and $C(M) = 1 + \mathrm{trace}\left(B(B'B)^{-1}B'\right)$ being a complexity cost function (Friedman, 1991).
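To make equations (1) and (2) concrete, the following Python sketch, assuming NumPy, builds the truncated power basis for one predictor, fits the coefficients by least squares and evaluates GCV. The function names and the synthetic response are hypothetical; the knots are given rather than searched for, so this is only the evaluation step and not the full MARS procedure.

```python
import numpy as np


def truncated_power_basis(x, knots, q):
    """Columns x^0..x^q followed by (x - t_k)_+^q for each knot t_k."""
    powers = [x ** j for j in range(q + 1)]
    hinges = [np.maximum(x - t, 0.0) ** q for t in knots]
    return np.column_stack(powers + hinges)


def gcv(y, B):
    """GCV = A * mean squared residual, with A = (1 - C(M)/N)^(-2)."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    hat = B @ np.linalg.pinv(B)      # equals B (B'B)^{-1} B' when B has full column rank
    c_m = 1.0 + np.trace(hat)        # complexity cost C(M)
    return float(np.mean(resid ** 2) / (1.0 - c_m / n) ** 2)


# Synthetic response with a single kink at x = 5.
x = np.linspace(0.0, 10.0, 50)
y = np.abs(x - 5.0)
B_no_knot = truncated_power_basis(x, [], q=1)     # straight line only
B_one_knot = truncated_power_basis(x, [5.0], q=1)  # adds (x - 5)_+
print(gcv(y, B_no_knot), gcv(y, B_one_knot))
```

Here the model with a knot at 5 reproduces the response exactly, so its GCV is essentially zero even after the complexity penalty, while the straight line pays for its lack of fit.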

3.1.2. Extension to p predictors. For $x = (x_1, \ldots, x_p)$ we have families of basis functions $\left(B_{k_j}^{(q)}(x_j)\right)$, j = 1, ..., p, and an estimated value for the response variable can be constructed under the form (3) or equivalently (4) or (5)

$$f_q(x) = \sum_{k_1} \cdots \sum_{k_p} a_{k_1 \ldots k_p} \prod_{j=1}^{p} B_{k_j}^{(q)}(x_j) \qquad (3)$$

This can be accomplished through a basic algorithm such as the following:

(a) Fit the response y with a constant (i.e., find its mean).
(b) Pick the variable $x_j$ and the knot location which give the best fit in terms of residual sum of squares error, say: $f = a_0 + a_1 \cdot (x_j - t_j)_+ + a_2 \cdot (-x_j + t_j)_+$.
(c) Repeat these steps on every other variable: new functions may depend or not on the previous knot locations. Basis functions are tensor products of previous basis functions. A limit to the number of basis functions is set as a parameter of the algorithm.
(d) Following such forward construction, a backward elimination is performed removing individual terms that do not improve the fit enough to justify the increment on complexity.
(e) Fit the resulting model with a cubic spline so that it has continuous first derivatives as well as being continuous.
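Step (b) of this algorithm can be illustrated with a small exhaustive search, assuming NumPy. The function name `best_hinge_pair` and the synthetic data are hypothetical, candidate knots are restricted to interior data values, and the sketch covers a single forward step only: it builds no tensor products and performs no backward elimination.

```python
import numpy as np


def best_hinge_pair(X, y):
    """Pick the variable j and knot t minimising the residual sum of squares
    of a0 + a1*(x_j - t)_+ + a2*(t - x_j)_+."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[1:-1]:      # interior data values as candidate knots
            B = np.column_stack([
                np.ones(len(y)),
                np.maximum(X[:, j] - t, 0.0),
                np.maximum(t - X[:, j], 0.0),
            ])
            coef, *_ = np.linalg.lstsq(B, y, rcond=None)
            rss = float(np.sum((y - B @ coef) ** 2))
            if rss < best[0]:
                best = (rss, j, float(t))
    return best


# Synthetic data: the response depends on the second predictor only, with a knot at 4.
i = np.arange(40)
X = np.column_stack([(i % 7).astype(float), (i % 10).astype(float)])
y = np.maximum(X[:, 1] - 4.0, 0.0)
rss, j, t = best_hinge_pair(X, y)
print(j, t)  # selected variable index and knot location
```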

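$$f_q(x) = a_0 + \sum_m a_m \prod_{k=1}^{K_m} \left[s_{km}\left(x_{v(k,m)} - t_{km}\right)\right]_+^q \qquad (4)$$

with $s_{km} = \pm 1$. Collecting all the basis functions that involve identical predictor variables we have

$$f_q(x) = a_0 + \sum_{K_m = 1} f_i(x_i) + \sum_{K_m = 2} f_{ij}(x_i, x_j) + \sum_{K_m = 3} f_{ijk}(x_i, x_j, x_k) + \cdots \qquad (5)$$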

In essence, MARS can be characterized as a data driven procedure, instead of the more frequently used model driven procedures.

In summary, 'MARS produces a model for the response that automatically selects the variables appearing in the final equation. It also indicates whether a variable enters additively, or interacts with another variable, and furthermore selects the complexity of the relationship between the response and each variable (by the number of basis functions used). Finally, graphic displays allow the user to interpret the relationships between the response and the predictors' (De Veaux et al., 1993).

3.2. Application

For this study (Silva et al., 1998) we used N = 286 vectors of observed values of $x_1$ = temperature (°C), $x_2$ = relative humidity, $x_3$ = wind speed (m/s), $x_4$ = wind direction, $x_5$ = hour of registry (0 or 12 hours), $x_6$ = day of week and PM10 = level of concentration of particulate material with aerodynamic diameter less than or equal to 10 μm. These observations were taken every 12 hours from 1 May to 30 September 1994.

Three dummy variables were defined to handle $x_4$ (N–S, E–W, NE–SW and NW–SE wind directions), one dummy for $x_5$ (midnight/noon) and one for $x_6$ (Wednesday, Thursday and Friday versus Saturday, Sunday, Monday and Tuesday). The variables listed in Table I were standardized to μ = 5, σ = 1. Additional predictors for PM10 were obtained from y and $x_1, \ldots, x_5$ by application of operators lag1, lag2 and lag3. As important administrative decisions must be taken if PM10 > 300 (legal environmental emergency status) or if 150 < PM10 ≤ 300 (pre-emergency status), we introduced in our fitting a 'weight variable' w with value 1 if PM10 < 150, value 2 if 150 < PM10 ≤ 300 and value 3 if PM10 > 300.
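The weight variable and the lag operators are simple to express in code. The Python sketch below is an illustration only: the function names `weight` and `lag` and the sample values are hypothetical, and since the text leaves the case PM10 = 150 exactly unspecified, the sketch assumes it receives weight 1.

```python
def weight(pm10):
    """Weight variable w: 1 below the pre-emergency range, 2 inside it, 3 above it."""
    if pm10 > 300:
        return 3
    if pm10 > 150:
        return 2
    return 1  # assumption: PM10 = 150 exactly is treated as weight 1


def lag(series, k):
    """lag_k operator: the value observed k periods (k * 12 hours) earlier.
    The first k positions have no predecessor and are filled with None."""
    return [None] * k + list(series[:len(series) - k])


pm10 = [120.0, 160.0, 310.0, 90.0]
print([weight(v) for v in pm10])  # -> [1, 2, 3, 1]
print(lag(pm10, 1))               # -> [None, 120.0, 160.0, 310.0]
```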

In order to compare with other modelling procedures we have computed for each model three statistics: the generalized cross validation statistic GCV, the correlation coefficient $r = \mathrm{corr}(y_i, f_q(x_i))$ and the mean proportion of absolute estimation error

$$\mathrm{mpab} = \frac{1}{N} \sum_i \frac{|y_i - f_q(x_i)|}{y_i}.$$

Table VI. Comparison of three MARS models.

Model               gcv      r       mpab
1: p = 2, q = 1     0.937    0.59    0.14
2: p = 2, q = 2     1.104    0.50    0.15
3: p = 2, q = 3     1.091    0.44    0.16
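The two simpler statistics can be computed with the standard library alone. In the following sketch the function names and the three sample values are hypothetical; mpab follows the formula above and therefore assumes strictly positive observed values.

```python
import math


def correlation(y, fitted):
    """Pearson correlation r = corr(y_i, f_q(x_i))."""
    n = len(y)
    my, mf = sum(y) / n, sum(fitted) / n
    sxy = sum((a - my) * (b - mf) for a, b in zip(y, fitted))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mf) ** 2 for b in fitted)
    return sxy / math.sqrt(sxx * syy)


def mpab(y, fitted):
    """Mean proportion of absolute estimation error, (1/N) * sum |y - f| / y."""
    return sum(abs(a - b) / a for a, b in zip(y, fitted)) / len(y)


observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(correlation(observed, predicted), 3), round(mpab(observed, predicted), 3))
```

In this toy example two of the three predictions are off by 10 per cent and one is exact, so mpab is 0.2/3, about 0.067.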

3.2.1. Model 1: Forecasting 12 hours ahead (based on 12 and 24 hours lagged information). The MARS selection process (forward and backward) begins with 13 predictors: lag1($x_i$), lag2($x_i$), i = 1, ..., 5, $x_6$, lag1(y) and lag2(y); the predictors chosen were: lag1($x_1$), lag2($x_1$), lag1($x_2$), lag1($x_3$), $x_6$, lag1(y), lag2(y). They correspond to temperature (lag1 and lag2), relative humidity (lag1), day of week and PM10 (lag1 and lag2).

For this model we have (see Table VI) gcv = 0.9373, r = 0.59 and mpab = 0.14. This last value tells us that, on average, the percentage of error is 14 per cent, a result highly superior to those obtained with time series (Trier and Firinguetti, 1994) or neural networks (Pérez et al., 1998). Additionally, less than 10 per cent of the predicted values err more than 25 per cent and only three failed more than 50 per cent (not emergency days, fortunately). Figure 2 shows for observations 1 through 100, (a) the observed and predicted responses and (b) the proportion of absolute error. Figures 3–5 exhibit the interactions involved in this model.

3.2.2. Model 2: Forecasting 36 hours ahead (based on 36 and 48 hours lagged information). From the initial 13 predictors: lag3($x_i$), lag4($x_i$), i = 1, ..., 5, $x_6$, lag3(y) and lag4(y), the selection process picked up just seven predictors: lag2($x_1$), lag3($x_1$), lag2($x_2$), lag3($x_3$), $x_6$, lag2(y), lag3(y). They correspond to temperature (lag2 and lag3), relative humidity (lag2), wind velocity (lag3), day of week and PM10 (lag1 and lag2).

For this model we have gcv = 1.104, r = 0.50 and mpab = 0.15. Additionally, less than 10 per cent of the predicted values err more than 35 per cent and only five failed more than 50 per cent.

3.2.3. Model 3: Forecasting 36 hours ahead (based on 36 and 48 hours information). Beginning with 13 predictors: lag3($x_i$), lag4($x_i$), i = 1, ..., 5, $x_6$, lag3(y) and lag4(y), MARS chose the following predictors: lag3($x_1$), lag3($x_2$), lag4($x_3$), $x_6$, lag3(y), lag4(y). They correspond to temperature (lag3), relative humidity (lag3), wind velocity (lag4), day of week and PM10 (lag2 and lag3).

For this model we have gcv = 1.104, r = 0.50 and mpab = 0.16. This last value tells us that, on average, the percentage of error is 16 per cent. Additionally, less than 10 per cent of the predicted values err more than 36 per cent and only five failed more than 57 per cent.

3.3. Comments

As we can see in Table VI, the three models presented here are satisfactory in terms of approximate evaluation (one, two or three steps ahead) of level of PM10. For further anticipation, 48 hours or more, this is not true.

Figure 2. First MARS model. Responses and errors.

Figure 3. First MARS model, part (a).

Figure 4. First MARS model, part (b).

Figure 5. First MARS model, part (c).

The proportion of error is, in our experience, lower than that corresponding to other modelling approaches. In these three models the variables retained by MARS are two lagged values of PM10, day of week and one value of relative humidity (that with lower lag order). Temperature is present with two values in models 1 and 2, but only with one value in model 3; wind velocity is present only in models 2 and 3. In any case, the presence of persistence is evident in this modelling approach to PM10 as well as the impact of meteorological factors.

4. DISCUSSION

The procedures applied here have proven their efficiency for modelling and predicting atmospheric pollution by particulate material, outperforming other methodologies. For example, time series models involving transfer functions associated with meteorological variables gave us 40 per cent of mean proportion of error (Silva et al., 1994) and neural networks with a preliminary smoothing gave us 30 per cent approximately (Pérez et al., 1998). This last result is consistent with findings by other authors who write 'we find that MARS is in most cases both more accurate and much faster than neural networks' (De Veaux et al., 1993). Rutllant and Garreaud (1995) applied discriminant analysis to develop a Meteorological Air-Pollution-Potential Index (MAPPI) which forecasts 12 hours ahead, with 73 per cent accuracy, the high air pollution potential episodes. Non-parametric discrimination would be highly recommended for a purely categorical prediction or classification, with the additional advantage of easy access to standard software (SAS or equivalent). On the other hand, MARS modelling provides us with a quantitative prediction that is appealing for decision making; the computational support is more cumbersome than that used for nonparametric discriminant analysis.


ACKNOWLEDGEMENTS

The authors are grateful for financial support provided by grants FONDECYT 1930096, FONDECYT 1970418 and DICYT-USACH 9533SZ.

REFERENCES

Chen G, Abraham B, Bennett GW. 1997. Parametric and non-parametric modelling of time series – an empirical study. Environmetrics 8:63–74.

De Veaux RD, Psichogios DC, Ungar LH. 1993. A comparison of two nonparametric estimation schemes: MARS and neural networks. Computers and Chemical Engineering 17(8):819–837.

Dinator MI. 1995. A multiparametric study of suspended particulate matter in the atmosphere of Santiago, Chile. In Proceedings, 10th World Clean Air Congress, Espoo, Finland.

Friedman JH. 1991. Multivariate adaptive regression splines. Annals of Statistics 19:1–141.

Friedman JH, Roosen CB. 1995. An introduction to multivariate adaptive regression splines. Statistical Methods in Medical Research 4:197–217.

Hand DJ. 1982. Kernel Discriminant Analysis, Research Studies Press. Wiley: New York.

Lewis PAW, Stevens JG. 1991. Nonlinear modeling of time series using multivariate adaptive regression splines (MARS). Journal of the American Statistical Association 86:864–877.

Ostro B, Sánchez JM, Aranda C, Eskeland G. 1996. Air pollution and mortality. Results from a study in Santiago, Chile. Journal of Exposure Analysis and Environmental Epidemiology 6:97–114.

Pérez P, Trier A, Silva C, Montaño R. 1998. Prediction of atmospheric pollution by particulate matter using a neural network. In Proceedings of the 1997 Conference on Neural Information Processing, Dunedin, New Zealand, Vol. 2:1000–1004. Springer: Berlin.

Ruiz F, Videla L, Vargas N, Parra MA, Trier A, Silva C. 1988. Air pollution impact on phagocytic capacity of peripheral blood macrophages and antioxidant activity of plasma among school children. Archives of Environmental Health 43(4):286–291.

Rutllant J, Garreaud R. 1995. Meteorological air pollution potential for Santiago, Chile: towards an objective episode forecasting. Environmental Monitoring and Assessment 34:223–244.

Sanhueza P, Vargas C, Jimenez J. 1999. Mortalidad diaria en Santiago y su relación con la contaminación del aire. Revista Médica de Chile 127(2):235–242.

SAS Institute. 1988. SAS/STAT User's Guide, Version 6, 1st edn, Vol. 1. SAS Institute: Cary, NC.

Scott DW. 1992. Multivariate Density Estimation. Theory, Practice and Visualization. Wiley: New York.

Silva C, Firinguetti L, Trier A. 1994. Contaminación ambiental por partículas en suspensión: Modelamiento estadístico, Actas XXI Jornadas Nacionales de Estadística, 11–12, Concepción, Noviembre 1994.

Silva C, Trier A. 1995. Statistical modeling and prediction of the atmospheric pollution by particulate material. In Actas VII Congreso Internacional de Biomatemáticas, Buenos Aires, Oct. 1995; 382–388.

Silva C, Pérez P, Trier A, Montaño R. 1998. Modelamiento estadístico y predicción MARS de la contaminación ambiental por partículas en suspensión, Actas VII Congreso latinoamericano de probabilidad y Estadística Matemática, 74–75, Córdoba, Septiembre 1998.

Trier A, Firinguetti L. 1994. A time series investigation of visibility in an urban atmosphere – I. Atmospheric Environment 28(5):991–996.

Trier A, Silva C. 1984. Multivariate statistical analysis of atmospheric aerosol composition data for Santiago, Chile. In Aerosols, Science, Technology and Industrial Applications of Airborne Particles, Lui BYH, Pui DYH, Fissan HJ (eds). Elsevier: Amsterdam.

Trier A, Silva C. 1987. Inhalable atmospheric particulate matter in a semi-arid climate: the case of Santiago de Chile. Atmospheric Environment 21(4):977–983.