Optimization of the atmospheric pollution monitoring network at Santiago de Chile
Atmospheric Environment 37 (2003) 2337–2345
Claudio Silva a,*, Alexis Quiroz b
a Universidad de Chile, Facultad de Medicina, Escuela de Salud Pública, Casilla 70012, Correo 7, Santiago, Chile
b Ingeniero Estadístico, Universidad de Santiago de Chile, Chile
Received 7 November 2002; accepted 20 February 2003
Abstract
Environmental pollution is a problem affecting many cities on our planet, and Santiago de Chile is among those with the worst indices. For that reason, local authorities implemented a few years ago an air quality monitoring network with eight monitoring stations located across the whole city. These stations continuously collect information about the presence and level of atmospheric contaminants as well as meteorological indices.
As the budget for this activity is limited, enlarging the monitoring network as the city grows might be an inefficient decision. To evaluate alternative decisions, multiple criteria should be considered. A statistical evaluation of some low-cost modifications of the network becomes a valid research topic. This paper attempts to optimize Santiago's atmospheric monitoring network by excluding the least informative stations with respect to the variables under study: carbon monoxide (CO), airborne particulate material (PM10), ozone (O3) and sulfur dioxide (SO2). To accomplish this, an index of multivariate effectiveness, based on the Shannon information index, is applied to that network.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Effectiveness index; Monitoring network; Shannon information index; Atmospheric pollution
1. Introduction
The design of a new atmospheric pollution monitoring network and the evaluation of an existing network have attracted the attention of different researchers. One approach, followed by Zimmerman and Hormer (1991), Cressie (1990) and Mardia and Goodall (1993), focuses on the estimation of special attributes of a semivariogram. A different perspective, based on the use of Shannon information, was initiated with the results of Caselton and Zidek (1984), applied in a univariate setup by Sampson and Guttorp (1992) and Guttorp et al. (1993), and later extended to a multivariate context by Pérez-Abreu and Rodríguez (1996). These authors applied their results to a 15-day campaign collecting data on four gaseous pollutants at Mexico City (March 1992). Recently the efficiency of the air pollution monitoring networks of Helsinki and Brisbane has been thoroughly studied by Karppinen et al. (2000) and Morawska et al. (2002), respectively.
The Chilean capital city, Santiago (33.5°S, 70.8°W), is located in a valley enclosed by the Andes mountain range. The city centre has an elevation of 520 m. The metropolitan area of Santiago exceeds 15,000 km², with a population approaching 6.2 million (National Statistics Institute, 2001 estimate; www.ine.cl). Annual rainfall averages less than 400 mm. Temperature typically varies from an annual minimum of −2°C to an annual maximum of 35°C. The prevailing wind direction is southwest into the city. Thermal inversion precludes vertical ventilation, so that air pollutant concentrations are enhanced. This problem and its health consequences become especially acute in the April–August period.
The public authorities (Santiago Metropolitan Health Authority, SESMA) have been operating an air quality monitoring network since 1990; eight stations, distributed throughout the urban area, are components of this network (see Figs. 1 and 2). They continuously collect information on pollutants in the atmosphere and some meteorological variables. The main structure of each station is a METONE container with special inlets for air sampling and adequate accommodation for API gas analyzers and an R&P particle analyzer; daily automatic calibration is conducted using USA-EPA standards. A 10 m pole supports the instruments necessary for the collection of the meteorological parameters. A data logger saves the information at each station; every hour this material is sent electronically to a central computer. The use of the environmental software AIRVIRO at central and local levels provides fast and efficient data management.
AE International – Central & South America
* Corresponding author.
E-mail addresses: [email protected] (C. Silva), [email protected] (A. Quiroz).
1352-2310/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S1352-2310(03)00152-3
To control the serious public health problems associated with atmospheric pollution, the public authorities must take complicated long-term as well as daily decisions. Scientific, demographic, economic and political criteria are involved. Maintaining a monitoring network prepared to generate timely and reliable information is crucial. Because of budgetary limitations for this activity, enlarging the monitoring network in step with the city's growth in extension, human population and number of vehicles is not a simple decision. To evaluate alternative decisions, multiple criteria should be considered. A statistical evaluation of some low-cost modifications of the network becomes a relevant issue.
The main purpose of this work is to evaluate the possible exclusion of those monitoring stations appearing as "the least informative" and, if possible, to find an optimal configuration of stations, meaning a smaller set of stations that provides adequate information for administrative purposes. Pérez-Abreu and Rodríguez (1996) carried out similar work for Mexico City using a 15-day campaign collecting data on four gaseous pollutants. We use daily averages for the July 1998 values of:
(a) Carbon monoxide (CO; ppm).
(b) Airborne particulate material, fraction under 10 microns (PM10; micrograms per standard cubic meter).
(c) Ozone (O3; ppm × 1000).
(d) Sulfur dioxide (SO2; ppm × 1000).
Section 2 introduces some basic definitions and presents results required for our analyses. In Section 3 we apply these ideas to Santiago's atmospheric environment, based on the four environmental variables mentioned above. Special attention is devoted to validating the statistical assumptions.
2. Effectiveness of an environmental monitoring network
2.1. Shannon information index for an environmental
monitoring network
2.1.1. One variable of interest
Let us assume a collection of m locations, where stations continuously record the magnitude of a variable of interest X; a subset A of n of these stations is monitored and the complement B of m − n stations is not monitored. For each instant t (t = 1, …, T), let $M_t = (M_{t,1}, \ldots, M_{t,n})$ be the measurements of X at the monitored stations and $U_t = (U_{t,1}, \ldots, U_{t,m-n})$ those corresponding to the unmonitored stations. (Usually we will drop the subindex t.) Let $f_{M,U}$ represent the joint density of M and U, and let $f_M$, $f_U$ be the corresponding marginal densities. Then, the Shannon index of information (Shannon, 1948; Klir and Folger, 1988; Pérez-Abreu and Rodríguez, 1996) is defined as
$$I(M, U) = \int \ln\left[\frac{f_{M,U}(x, y)}{f_M(x)\, f_U(y)}\right] f_{M,U}(x, y)\, dx\, dy. \tag{1}$$
Note that, if M and U are independent, then $f_{M,U}(x, y) = f_M(x)\, f_U(y)$ and $I(M, U) = 0$. That is, the monitored stations do not provide information on the unmonitored ones.
Fig. 1. Map of the city of Santiago, Chile, displaying the geographic distribution of the eight stations forming the atmospheric monitoring network (map scale bar: 0–20 km). Dark areas correspond to the urban region. Contour lines at the extreme right represent the Andes Mountains; other hills surrounding Santiago are similarly marked. Source: www.sesma.cl. Station B (Providencia), Station F (Independencia), Station L (La Florida), Station M (Las Condes), Station N (Santiago Centro), Station O (Pudahuel), Station P (Cerrillos), and Station Q (El Bosque).
C. Silva, A. Quiroz / Atmospheric Environment 37 (2003) 2337–2345
If the m-dimensional vector (M, U) has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, then the Shannon information index reduces to
$$I(M, U) = -\frac{1}{2} \ln\left[\frac{\det(\Sigma)}{\det(\Sigma_{11})\, \det(\Sigma_{22})}\right], \tag{2}$$
where $\Sigma_{11}$ and $\Sigma_{22}$ represent the covariance matrices of M and U, respectively, and $\det(\cdot)$ indicates the determinant of the corresponding matrix.
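As an illustration of Eq. (2), the following minimal sketch evaluates the Gaussian form of the index from a covariance matrix and a list of monitored stations. The function name `gaussian_shannon_index` and the 3-station covariance matrix are hypothetical, chosen only for the example; they are not taken from the paper.

```python
import numpy as np


def gaussian_shannon_index(cov, monitored):
    """Eq. (2): I(M, U) = -1/2 * ln[det(S) / (det(S11) * det(S22))]."""
    cov = np.asarray(cov, dtype=float)
    monitored = np.asarray(monitored, dtype=int)
    unmonitored = np.setdiff1d(np.arange(cov.shape[0]), monitored)
    # slogdet returns (sign, log|det|), which avoids overflow/underflow in det.
    _, logdet_all = np.linalg.slogdet(cov)
    _, logdet_m = np.linalg.slogdet(cov[np.ix_(monitored, monitored)])
    _, logdet_u = np.linalg.slogdet(cov[np.ix_(unmonitored, unmonitored)])
    return -0.5 * (logdet_all - logdet_m - logdet_u)


# Hypothetical covariance: stations 0 and 1 are correlated (0.8),
# station 2 is independent of both.
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

index_dependent = gaussian_shannon_index(cov, monitored=[0, 2])    # station 1 left out
index_independent = gaussian_shannon_index(cov, monitored=[0, 1])  # station 2 left out
print(index_dependent, index_independent)
```

Leaving out the correlated station gives a positive index, whereas leaving out the independent station gives an index of zero, in line with the remark following Eq. (1).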
2.1.2. Two or more variables of interest
Let us assume that our interest is to use the same network to monitor r variables $X^1, \ldots, X^r$. One possibility is to evaluate and discuss, separately for each variable, Shannon's index $I(M^i, U^i)$, $i = 1, \ldots, r$. However, usually the optimum design based on one variable is not the best for the others. Fortunately, different strategies are available.
(A) Shannon joint information index: Let us assume that $M^i = \{M^i_1, \ldots, M^i_n\}$, $i = 1, \ldots, r$, correspond to the values of the variable $X^i$ at the n monitored stations and let $U^i = \{U^i_1, \ldots, U^i_{m-n}\}$, $i = 1, \ldots, r$, be the values of $X^i$ at the m − n unmonitored stations. The joint Shannon index of information is defined as
$$I(M^1 \ldots M^r, U^1 \ldots U^r) = \int \ln\left[\frac{f_{M^1 \ldots M^r, U^1 \ldots U^r}(x^1, \ldots, x^r, y^1, \ldots, y^r)}{f_{M^1 \ldots M^r}(x^1, \ldots, x^r)\, f_{U^1 \ldots U^r}(y^1, \ldots, y^r)}\right] f_{M^1 \ldots M^r, U^1 \ldots U^r}(x^1, \ldots, x^r, y^1, \ldots, y^r)\, dx^1 \cdots dx^r\, dy^1 \cdots dy^r. \tag{3}$$
Given multivariate normality, the Shannon joint information index is
$$I(M, U) = -\frac{1}{2} \ln\left[\frac{\det(\Sigma)}{\det(\Sigma_{11})\, \det(\Sigma_{22})}\right], \tag{4}$$
where $\Sigma$ corresponds to the covariance matrix of $\{M^i_1, \ldots, M^i_n, U^i_1, \ldots, U^i_{m-n};\ i = 1, \ldots, r\}$, and $\Sigma_{11}$ and $\Sigma_{22}$ are the covariance matrices of $M = \{M^i_1, \ldots, M^i_n;\ i = 1, \ldots, r\}$ and $U = \{U^i_1, \ldots, U^i_{m-n};\ i = 1, \ldots, r\}$, respectively.
[Figure 2: four panels plotting correlation (vertical axis) against distance to a reference station in metres (horizontal axis), with points labelled by station: variable CO, distance to station F; variable MP10, distance to station F; variable O3, distance to station Q; variable SO2, distance to station P.]
Fig. 2. Graph 2.1 is a plot of "the distance of each station to station F" against "the correlation between corresponding levels of CO". Graphs 2.2, 2.3 and 2.4 are similar for levels of MP10, O3 and SO2.
(B) Effectiveness index: Let us assume that the optimal composition of the collections A (monitored stations) and B (unmonitored stations) is unknown and our interest is to choose the n locations in A optimally from the set of $k = C^m_n = \binom{m}{n}$ possible configurations. Let j denote one of such configurations of A and B, and let $M^{(j)}$ and $U^{(j)}$ be the corresponding values of X. Then the optimal design is the configuration $j^*$ such that
$$I(M^{(j^*)}, U^{(j^*)}) = \max\{I(M^{(j)}, U^{(j)});\ j = 1, \ldots, k\}. \tag{5}$$
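The search in Eq. (5) can be sketched as an exhaustive enumeration of all C(m, n) configurations. This is an illustrative sketch only: the helper names `shannon_index` and `best_configuration` are hypothetical, the Gaussian form of Eq. (2) is assumed for the index, and the data are synthetic (31 simulated days at 4 stations, with station 1 built as a noisy copy of station 0).

```python
from itertools import combinations

import numpy as np


def shannon_index(cov, monitored):
    """Gaussian Shannon index of Eq. (2) for one monitored/unmonitored split."""
    monitored = list(monitored)
    unmonitored = [s for s in range(cov.shape[0]) if s not in monitored]

    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]

    return -0.5 * (logdet(monitored + unmonitored)
                   - logdet(monitored) - logdet(unmonitored))


def best_configuration(cov, n_monitored):
    """Score every C(m, n) configuration and return the maximiser of Eq. (5)."""
    m = cov.shape[0]
    scores = {subset: shannon_index(cov, subset)
              for subset in combinations(range(m), n_monitored)}
    return max(scores, key=scores.get), scores


# Synthetic data (assumption): station 1 nearly duplicates station 0.
rng = np.random.default_rng(0)
readings = rng.normal(size=(31, 4))
readings[:, 1] = readings[:, 0] + 0.1 * rng.normal(size=31)
cov = np.cov(readings, rowvar=False)

best, scores = best_configuration(cov, n_monitored=3)
left_out = sorted(set(range(4)) - set(best))
print("monitored:", best, "left out:", left_out)
```

In this toy setting the station left out is one of the two near-duplicates, since the remaining stations predict it almost perfectly. Exhaustive enumeration is practical here because the paper considers only 8 or 28 configurations; it would not scale to large m.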
On this basis, monitoring more than one variable may be handled through the following procedure to design and evaluate multivariate monitoring networks. For each variable $X^i$ let $I^i_j(M^i, U^i)$ be the Shannon index associated with the configuration $A(j_i)$, $i = 1, \ldots, r$; $j = 1, \ldots, k$. $A^*(j^*_i)$ will denote the configuration with maximal $I^i_j(M^i, U^i)$. Then
$$P_{ij}[A(j_i)] = \frac{\max\{I^i_j(M^i, U^i)\} - I^i_j(M^i, U^i)}{\max\{I^i_j(M^i, U^i)\}}, \quad j = 1, \ldots, k, \tag{6}$$
represents the loss of information corresponding to the design $A(j_i)$ with respect to the configuration $A^*(j^*_i)$, which is optimal when only the variable $X^i$ is considered.
For any $0 < p \le 1$ let $C^i_p = \{A(j_i) : P_{ij}(A(j_i)) \le p\}$ be the set of all possible designs, for the variable $X^i$, with loss of information less than or equal to p. Now we can define the index of effectiveness for the collection of r variables as
$$q^* = \sup\{1 - p : C^1_p \cap \cdots \cap C^r_p \neq \emptyset\}. \tag{7}$$
The quantity $q^*$ measures the ability of optimally monitoring r variables by considering a network of n stations chosen from a set of m available stations. In this sense a small value of $q^*$ indicates a low performance whereas a value close to 1 indicates a good performance.
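The definition of the effectiveness index can be unpacked into a form that is easier to evaluate. The following short derivation is a sketch, not part of the original text; it assumes that the k candidate configurations are indexed identically for every variable, so that $P_{ij}$ denotes the loss of configuration j for variable i.

```latex
\[
C^1_p \cap \cdots \cap C^r_p \neq \emptyset
\iff \exists\, j:\ P_{ij} \le p \ \text{for all } i
\iff p \ \ge\ \min_{1 \le j \le k}\ \max_{1 \le i \le r} P_{ij},
\]
so that
\[
q^* \;=\; 1 - \min_{1 \le j \le k}\ \max_{1 \le i \le r} P_{ij}.
\]
```

In words: for each configuration take its largest loss over the r variables, pick the configuration whose largest loss is smallest, and subtract that loss from 1.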
It is possible that some relations between variables
can be explained by the physical location of the stations;
therefore spatial sampling concepts will be elaborated.
2.2. Statistical data analysis
2.2.1. Normality assumptions
Basic descriptive statistics (see Table 1) and a Shapiro–Wilk test of normality were carried out for each variable at each station. SO2 shows non-normality at four stations, whereas CO, O3 and PM10 each show it at one station. As we need multivariate normality to apply the simplest form of Shannon's index of information, we explored the use of Box–Cox transformations (Atkinson and Cox, 1988; Broemeling, 1982) to reach univariate normality:
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\ \log(y) & \text{if } \lambda = 0. \end{cases} \tag{8}$$
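The Box–Cox step and the Shapiro–Wilk check can be sketched with SciPy as follows. This is an illustrative sketch under stated assumptions: the sample is synthetic (a lognormal draw standing in for 31 daily averages at one station), the helper `box_cox` is a hypothetical name, and λ is chosen by maximum likelihood through `scipy.stats.boxcox`, which may differ in detail from the software the authors used.

```python
import numpy as np
from scipy import stats


def box_cox(y, lam):
    """Box-Cox transform of Eq. (8) for strictly positive data y."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


# Synthetic right-skewed sample (assumption), 31 values as for July.
rng = np.random.default_rng(42)
sample = rng.lognormal(mean=2.0, sigma=0.5, size=31)

# With lmbda omitted, boxcox returns the transformed data and the
# maximum-likelihood estimate of lambda.
transformed, lam_hat = stats.boxcox(sample)

# Shapiro-Wilk p-values before and after the transformation.
_, p_raw = stats.shapiro(sample)
_, p_transformed = stats.shapiro(transformed)
print(f"lambda = {lam_hat:.3f}, p before = {p_raw:.3f}, p after = {p_transformed:.3f}")
```

The hand-written `box_cox` is included only to make the correspondence with Eq. (8) explicit; it reproduces SciPy's transformed values for the estimated λ.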
Table 1
Descriptive statistics for CO, PM10, O3 and SO2 at each station (based on daily averages for 1–31 July 1998)
B F L M N O P Q
CO
Mean 2.80 2.72 2.17 1.57 3.26 2.06 2.40 2.35
Std. dev. 0.74 0.85 0.59 0.41 1.43 1.16 1.01 0.72
Min. 1.71 1.42 1.08 0.80 0.74 0.55 0.34 1.09
Max. 4.71 4.67 3.24 2.48 6.71 5.38 4.07 4.05
PM10
Mean 109.39 126.06 159.53 94.58 137.06 124.79 126.33 133.78
Std. dev. 26.36 30.72 40.98 25.86 38.34 39.56 36.88 33.87
Min. 69.25 75.88 86.63 51.25 73.25 54.13 68.00 80.00
Max. 159.21 190.96 241.58 150.38 224.96 203.46 190.29 195.29
O3
Mean 5.61 5.95 11.09 11.35 8.15 11.11 14.20 8.51
Std. dev. 1.61 2.77 3.93 4.26 3.75 2.99 8.02 2.84
Min. 2.83 0.94 2.79 2.96 1.79 5.63 3.33 2.38
Max. 10.21 12.54 18.50 18.38 16.54 17.29 31.96 13.46
SO2
Mean 11.06 10.88 9.23 5.24 9.54 6.05 8.19 9.04
Std. dev. 4.86 5.29 4.96 2.75 4.95 3.58 4.62 4.03
Min. 3.04 2.63 3.75 1.54 1.96 1.64 1.88 3.38
Max. 21.04 21.42 24.63 13.33 21.29 13.96 17.25 17.83
The levels of these contaminants look almost uniform across the city. However, on closer inspection, we note that Las Condes (M), a station located in an upper-class neighborhood, shows the lowest levels for CO, PM10 and SO2, but has one of the worst levels of O3; Centro (N), the downtown station, is the worst for CO (second for PM10). On the other hand, La Florida (L), located in a middle-class neighborhood, appears as the worst location for both PM10 and SO2.
In Eq. (8), the parameter λ is estimated by maximum likelihood. Any lognormal variable is included in this family of transformations for λ = 0.
In Table 2 (left-hand side), we present the estimates of λ for each non-normal variable-station combination. All the transformed variables present p-values larger than 0.15 for the Shapiro–Wilk test. Having passed this necessary condition for multivariate normality, we implemented Mardia's test for joint multivariate p-normality (Mardia, 1985), based on the following definitions:
Multivariate skewness: $\beta_{1,p} = E\{(x - \mu)' \Sigma^{-1} (y - \mu)\}^3$;
Multivariate kurtosis: $\beta_{2,p} = E\{(x - \mu)' \Sigma^{-1} (x - \mu)\}^2$.
Under multivariate normality the values of these parameters are $\beta_{1,p} = 0$ and $\beta_{2,p} = p(p + 2)$, respectively.
To evaluate adequate test statistics we follow these steps:
(a) For each pair of observed vectors $x_i$ and $x_j$ ($i, j = 1, \ldots, n$), we compute
$$g_{ij} = (x_i - \bar{x})' S^{-1} (x_j - \bar{x}). \tag{9}$$
(b) Then
$$b_{1,p} = \frac{1}{n^2} \sum_{i,j=1}^{n} g_{ij}^3 \quad \text{and} \quad b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} g_{ii}^2. \tag{10}$$
(c) Finally, the test statistics with their corresponding sampling null distributions are
$$\frac{n}{6}\, b_{1,p} \sim \chi^2_{\frac{1}{6} p(p+1)(p+2)} \quad \text{and} \quad \frac{b_{2,p} - p(p + 2)}{\{8p(p + 2)/n\}^{1/2}} \sim N(0, 1). \tag{11}$$
For our present situation, $H_0: \beta_{1,4} = 0$ and $H_0: \beta_{2,4} = 24$. In Table 2 (right-hand side) we summarize the results of this analysis.
Now we can proceed to compute the Shannon index
under its simpler form.
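Steps (a) to (c) of Mardia's test can be sketched as below. Several choices here are assumptions rather than details given in the text: the covariance S uses divisor n, the kurtosis p-value is two-sided, the function name `mardia_test` is hypothetical, and the data are a synthetic 31 × 4 normal sample (31 days, p = 4 pollutants at one station).

```python
import numpy as np
from scipy import stats


def mardia_test(x):
    """Mardia's multivariate skewness and kurtosis tests, Eqs. (9)-(11)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    # Covariance with divisor n (assumption; the text only writes S).
    s = centered.T @ centered / n
    # g[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar), Eq. (9)
    g = centered @ np.linalg.solve(s, centered.T)
    b1 = (g ** 3).sum() / n ** 2          # sample skewness, Eq. (10)
    b2 = (np.diag(g) ** 2).sum() / n      # sample kurtosis, Eq. (10)
    # Null distributions of Eq. (11)
    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    p_skew = stats.chi2.sf(skew_stat, skew_df)
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    p_kurt = 2.0 * stats.norm.sf(abs(kurt_stat))  # two-sided (assumption)
    return {"g": g, "b1": b1, "b2": b2,
            "p_skewness": p_skew, "p_kurtosis": p_kurt}


# Synthetic multivariate normal sample (assumption).
rng = np.random.default_rng(1)
sample = rng.normal(size=(31, 4))
result = mardia_test(sample)
print(result["p_skewness"], result["p_kurtosis"])
```

A useful sanity check on the implementation is that, with divisor n in S, the diagonal terms $g_{ii}$ always sum to n·p.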
3. Design and application of an index of effectiveness for
Santiago atmospheric monitoring network
Using formula (2) we evaluated the Shannon index of information excluding one station at a time (eight possible configurations). We therefore obtain a value for each combination (variable by configuration); see Table 3.
From Table 3 we can see that, for variable CO, configuration number 2 presents the highest Shannon index of information: this means that the set of stations B, L, M, N, O, P, Q gives maximum information with respect to the excluded station F. Graph 2.1 displays, for each station pair including F, the distance (m) and the correlation for their CO values (Fig. 2). It is clear that, for six out of seven pairs, the correlations are at least 0.6 (even for stations that are rather distant from F). This means that, in terms of CO, station F could be removed, since most of its information would be preserved by the collection formed by stations B, L, M, N, O, P, Q.
Considering the variable PM10, the Shannon index of information shows that configuration number 2 (removing F and keeping stations B, L, M, N, O, P, Q) is optimal. In Graph 2.2 we have distances and correlations for each pair of stations that includes station F. From this figure we observe that, independently of the distance between station F and any of the remaining stations, all the correlations between PM10 values are high and statistically significant. Therefore, regarding PM10 and based on Shannon's index of information, station F could be removed, since the other stations give the maximum information compared with all other configurations.
Table 2
Box–Cox λ estimates for each non-normal variable (left) and p-values for Mardia's tests at each station (right)
Station   Contaminant (Box–Cox λ)          p-value under hypothesis
          CO     PM10    O3     SO2        H0: β1,4 = 0    H0: β2,4 = 24
B         .      .       .      0.5        0.5851          0.5902
F         .      .       .      0.4        0.6061          0.2148
L         .      .       .      −0.1       0.9756          0.1293
M         .      .       .      −0.1       0.9165          0.2552
N         .      .       .      0.3        0.2634          0.7460
O         0.1    .       .      −0.1       0.9473          0.1806
P         .      .       0.1    0.3        0.3143          0.2150
Q         .      −0.1    .      0.1        0.8994          0.1602
These p-values show that, after Box–Cox transformation, multivariate normality has been attained.
Similar discussion with respect to O3 shows that the
configuration B, F, L, M, N, O, P (ignoring station Q), is
the most efficient (see Table 4 and Graph 2.3).
Finally, for variable SO2, combining both criteria we
conclude that station P is the candidate for exclusion.
From these figures and the computation of Shannon’s
index of information we calculated the losses of informa-
tion, with respect to the optimal configuration, for all the
other configurations and for each variable. In short, the
idea is to find out a configuration of stations adequate to
monitor all the variables of interest with minimal loss of
information excluding one station, two, etc.
In Table 4 we observe that the fifth configuration is the one with the lowest loss of information as compared with the optimal configuration. Since such losses are lower than 0.21, we can say that the network effectiveness index is q* = 0.79; in other words, that configuration can monitor the collection of four studied variables with 79% effectiveness. Additionally, we notice that station N would be the "least informative", becoming a candidate for exclusion.
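The step from Table 3 to Table 4 and to the effectiveness index can be reproduced numerically. The sketch below copies the Shannon indices from Table 3, applies the relative loss of Eq. (6) column by column, and then takes the configuration whose largest loss is smallest; the variable names are illustrative only.

```python
import numpy as np

# Station left unmonitored in configurations 1..8 (Table 3).
excluded = ["B", "F", "L", "M", "N", "O", "P", "Q"]
variables = ["CO", "PM10", "O3", "SO2"]

# Shannon indices from Table 3 (rows: configurations, columns: variables).
shannon = np.array([
    [1.2530, 1.3206, 0.8928, 1.4450],
    [1.5262, 1.4051, 0.7929, 1.3904],
    [1.0377, 1.0486, 1.2465, 1.0112],
    [0.4784, 0.7981, 0.8042, 0.7429],
    [1.5032, 1.1648, 1.3250, 1.4782],
    [0.7292, 0.7506, 1.3533, 1.3318],
    [1.2422, 1.2073, 0.5077, 1.8646],
    [0.6628, 0.8616, 1.3637, 0.7858],
])

# Eq. (6): relative loss with respect to the best configuration per variable.
best_per_variable = shannon.max(axis=0)
loss = (best_per_variable - shannon) / best_per_variable

# Largest loss of each configuration over the four variables, then the
# configuration minimising it; q* is one minus that loss.
worst_loss = loss.max(axis=1)
best = int(worst_loss.argmin())
q_star = 1.0 - worst_loss[best]

print(f"configuration {best + 1}: leave out station {excluded[best]}, "
      f"q* = {q_star:.2f}")
```

The computed losses agree with Table 4 to the reported precision, and the selected configuration is the fifth one (station N left unmonitored).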
It is significant to note that regarding CO as the most important variable would lead us to choose the fifth configuration as the best, with an index of effectiveness of q* = 0.98. Similarly, if PM10 is considered the most important variable, then the best configuration is number 1, with an index of effectiveness of q* = 0.94; for O3 the sixth configuration is the best, with q* = 0.99, and for SO2 the best configuration is the fifth, with q* = 0.79.
Now, we might be interested in the possible exclusion of two stations from the network, in order to determine an optimal six-station configuration (see Table 5).
From this table we can notice that, for the CO variable, the optimum configuration is the fourth, whereas for variables PM10 and SO2 the optimum corresponds to the sixth configuration. Finally, for variable O3, the optimum configuration is the 16th. From that analysis, we computed the loss of information for each combination of stations as compared with the corresponding optimum for each variable. The results are summarized in Table 6.
Table 3
Shannon index of information for each variable and each combination of seven stations (leaving one out)
Configuration  Monitored stations  Unmonitored station  CO  PM10  O3  SO2
1 F,L,M,N,O,P,Q B 1.2530 1.3206 0.8928 1.4450
2 B,L,M,N,O,P,Q F 1.5262 1.4051 0.7929 1.3904
3 B,F,M,N,O,P,Q L 1.0377 1.0486 1.2465 1.0112
4 B,F,L,N,O,P,Q M 0.4784 0.7981 0.8042 0.7429
5 B,F,L,M,O,P,Q N 1.5032 1.1648 1.3250 1.4782
6 B,F,L,M,N,P,Q O 0.7292 0.7506 1.3533 1.3318
7 B,F,L,M,N,O,Q P 1.2422 1.2073 0.5077 1.8646
8 B,F,L,M,N,O,P Q 0.6628 0.8616 1.3637 0.7858
The last column of this table shows that, in terms of SO2, the most informative seven-station configuration is the 7th (keeping
unmonitored the station P); the 8th configuration is the most informative in terms of O3 and the second is the best for CO and PM10.
Table 4
Loss of information for each variable with respect to the optimal configuration
Configuration  Monitored stations  Unmonitored station  CO  PM10  O3  SO2
1 F,L,M,N,O,P,Q B 0.1790 0.0602 0.3453 0.2250
2 B,L,M,N,O,P,Q F 0 0 0.4185 0.2543
3 B,F,M,N,O,P,Q L 0.3200 0.2537 0.0859 0.4577
4 B,F,L,N,O,P,Q M 0.6865 0.4320 0.4102 0.6016
5 B,F,L,M,O,P,Q N 0.0150 0.1710 0.0284 0.2072
6 B,F,L,M,N,P,Q O 0.5222 0.4658 0.0076 0.2857
7 B,F,L,M,N,O,Q P 0.1861 0.1408 0.6277 0
8 B,F,L,M,N,O,P Q 0.5657 0.3868 0 0.5786
For each value of the Shannon index of information (Table 3), its relative difference with respect to the maximum of the corresponding column is reported here as the loss of information of that "variable by configuration" combination. The largest loss in each row (set in italics in the original layout) is the largest loss of information associated with that configuration. The minimum of these losses is 0.2072, implying that for a configuration of 7 stations the effectiveness would be 79%. Recommendation: leave station N unmonitored.
Inspection of these results shows that configuration 4 is optimal in the sense of having the minimal loss with respect to the optimal configuration corresponding to each variable, with configuration 19 practically tied (largest losses of 0.1978 and 0.2001, respectively); the index of effectiveness in this case is q* = 0.80. Therefore, this configuration is able to monitor our four pollution variables with an effectiveness of 80%.
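As a quick arithmetic check against Table 6, the largest loss of each of the two leading six-station configurations can be read off directly. The symbol $L_j$ for the largest loss of configuration j is introduced here only for illustration.

```latex
\[
L_4 = \max\{0,\ 0.0069,\ 0.1812,\ 0.1978\} = 0.1978, \qquad
L_{19} = \max\{0.1696,\ 0.1878,\ 0.1908,\ 0.2001\} = 0.2001,
\]
\[
q^* = 1 - \min_j L_j = 1 - 0.1978 \approx 0.80 .
\]
```

No other row of Table 6 has a largest loss below 0.1978, so configuration 4 (stations B and N unmonitored) attains the minimum, and configuration 19 (stations M and N unmonitored) trails it by only about 0.002.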
Additional explorations removing three stations were also conducted, with results of 60% effectiveness, but they are not reported here.
4. Conclusions
As a summary of our study covering four atmospheric
contaminants: PM10, O3, SO2 and CO across the eight
monitoring stations, we have:
(a) The best configuration (using only seven out of eight monitoring stations) is composed of stations B, F, L, M, O, P, Q, which can monitor the whole system with an effectiveness of 79%.
(b) If the target is a single variable, the effectiveness climbs to 98% (CO and configuration 5), 94% (PM10 and configuration 1), 99% (O3 and configuration 6) or 79% (SO2 and configuration 5).
(c) The best configuration with only six stations is formed by F, L, M, O, P and Q, with an 80% effectiveness in monitoring all four variables. Considering only CO, the effectiveness is 83% for configuration 4; for PM10 the effectiveness is 81% with configuration 6; an identical result is valid for SO2 alone; and for O3 the effectiveness is 79% for configuration 16.
(d) Removing more than two stations resulted in poor effectiveness (less than 60%).
Any statistical model is just one element of the array of criteria contributing to any major political decision-making process. The levels of all the
Table 5
Shannon index of information for each variable and each combination of six stations (leaving two stations out)
Configuration  Monitored stations  Unmonitored stations  CO  PM10  O3  SO2
1 L,M,N,O,P,Q B,F 1.4317 1.7291 1.3754 1.9404
2 F,M,N,O,P,Q B,L 1.6567 1.9752 1.1787 1.5265
3 F,L,N,O,P,Q B,M 1.6512 1.5042 1.3389 1.4447
4 F,L,M,O,P,Q B,N 2.2828 2.2189 1.4767 2.0309
5 F,L,M,N,P,Q B,O 1.7228 1.9198 1.7478 2.1545
6 F,L,M,N,O,Q B,P 2.0942 2.2343 1.3029 2.5317
7 F,L,M,N,O,P B,Q 1.5222 1.7985 1.7094 1.5646
8 B,M,N,O,P,Q F,L 1.8704 1.7092 1.6397 1.7074
9 B,L,N,O,P,Q F,M 1.9772 2.0363 1.2840 1.8932
10 B,L,M,O,P,Q F,N 1.9533 1.9929 1.6348 1.8858
11 B,L,M,N,P,Q F,O 1.8894 1.9354 1.4582 1.7448
12 B,L,M,N,O,Q F,P 2.1327 1.9204 1.1145 1.9599
13 B,L,M,N,O,P F,Q 1.7052 1.8421 1.6253 1.4290
14 B,F,N,O,P,Q L,M 1.3388 1.7487 1.4608 1.3971
15 B,F,M,O,P,Q L,N 2.0114 1.6715 1.7399 1.6786
16 B,F,M,N,P,Q L,O 1.2941 1.4433 1.8035 1.7174
17 B,F,M,N,O,Q L,P 1.7300 1.6769 1.5305 2.2076
18 B,F,M,N,O,P L,Q 1.2202 1.0318 1.6698 1.2159
19 B,F,L,O,P,Q M,N 1.8952 1.8146 1.4595 2.0251
20 B,F,L,N,P,Q M,O 1.1492 1.5427 1.6238 1.9398
21 B,F,L,N,O,Q M,P 1.7160 1.9833 1.0577 2.4188
22 B,F,L,N,O,P M,Q 1.1383 1.5785 1.5169 1.3056
23 B,F,L,M,P,Q N,O 1.7894 1.5078 1.6171 1.9194
24 B,F,L,M,O,Q N,P 1.3506 1.2075 1.6285 1.8358
25 B,F,L,M,O,P N,Q 1.6222 1.6092 1.7034 1.6004
26 B,F,L,M,N,Q O,P 1.4409 1.4405 1.7363 1.6826
27 B,F,L,M,N,P O,Q 1.0348 1.2479 1.5089 1.5460
28 B,F,L,M,N,O P,Q 1.3440 1.5807 1.7066 2.0062
The last column of this table shows that, in terms of SO2 and PM10 the most informative six-station configuration is the 6th (keep
unmonitored the stations B and P); the 16th configuration is the most informative in terms of O3 and 4th configuration is the best for
CO.
contaminants that we have considered are responses to the influence of multiple factors: presence of large and small industries, human population, size and characteristics of public and private transportation systems, topographic and meteorological conditions, etc. Therefore the multivariate approach, a joint vectorial picture involving these responses, should provide the most complete analysis from the information perspective.
July was chosen as the period of study because of two basic facts. The contamination problem in Santiago is permanent; levels may change throughout the year, but the correlational pattern is essentially constant. Additionally, public awareness of this problem increases when public health crises occur.
Looking at the geographical location of the stations, it becomes reasonable to consider that the Centro station might be the least informative, since part of its information might be redundant given the information collected at the surrounding stations; this analysis can be extended to the nearby Providencia station.
From a regional administration perspective, in order to improve the monitoring network using the same resources, our summary conclusion (remove stations N (Centro) and B (Providencia) and reallocate them to more informative areas) must be combined with nonstatistical considerations. For example, as the creation of new industries in Santiago has been strongly discouraged, it is reasonable to assign priority to areas with accelerated demographic development, as they demand more public and private transportation, larger commercial areas, construction of new highways, etc., all potential sources of additional pollution.
Table 6
Loss of information for each variable with respect to the optimal configuration when two of the eight stations are unmonitored
Configuration  Monitored stations  Unmonitored stations  CO  PM10  O3  SO2
1 L,M,N,O,P,Q B,F 0.3727 0.2261 0.2374 0.2336
2 F,M,N,O,P,Q B,L 0.2742 0.1160 0.3464 0.3970
3 F,L,N,O,P,Q B,M 0.2766 0.3268 0.2576 0.4294
4 F,L,M,O,P,Q B,N 0 0.0069 0.1812 0.1978
5 F,L,M,N,P,Q B,O 0.2452 0.1474 0.0308 0.1490
6 F,L,M,N,O,Q B,P 0.0825 0 0.2776 0
7 F,L,M,N,O,P B,Q 0.3331 0.1950 0.0522 0.3820
8 B,M,N,O,P,Q F,L 0.1806 0.2350 0.0908 0.3256
9 B,L,N,O,P,Q F,M 0.1337 0.0886 0.2880 0.2522
10 B,L,M,O,P,Q F,N 0.1442 0.1080 0.0935 0.2551
11 B,L,M,N,P,Q F,O 0.1722 0.1335 0.1915 0.3108
12 B,L,M,N,O,Q F,P 0.0656 0.1405 0.3820 0.2258
13 B,L,M,N,O,P F,Q 0.2529 0.1755 0.0988 0.4356
14 B,F,N,O,P,Q L,M 0.4134 0.2173 0.1900 0.4482
15 B,F,M,O,P,Q L,N 0.1188 0.2519 0.0353 0.3370
16 B,F,M,N,P,Q L,O 0.4331 0.3540 0 0.3216
17 B,F,M,N,O,Q L,P 0.2420 0.2494 0.1514 0.1280
18 B,F,M,N,O,P L,Q 0.4655 0.5382 0.0741 0.5198
19 B,F,L,O,P,Q M,N 0.1696 0.1878 0.1908 0.2001
20 B,F,L,N,P,Q M,O 0.4965 0.3095 0.0996 0.2338
21 B,F,L,N,O,Q M,P 0.2482 0.1123 0.4135 0.0446
22 B,F,L,N,O,P M,Q 0.5013 0.2935 0.1589 0.4843
23 B,F,L,M,P,Q N,O 0.2160 0.3252 0.1034 0.2419
24 B,F,L,M,O,Q N,P 0.4083 0.4595 0.0970 0.2749
25 B,F,L,M,O,P N,Q 0.2893 0.2797 0.0555 0.3678
26 B,F,L,M,N,Q O,P 0.3687 0.3552 0.0372 0.3354
27 B,F,L,M,N,P O,Q 0.5466 0.4414 0.1634 0.3894
28 B,F,L,M,N,O P,Q 0.4112 0.2925 0.0537 0.2076
For each value of the Shannon index of information (Table 5), its relative difference with respect to the maximum of the corresponding column is reported here for that "variable by configuration" combination. The largest loss in each row (set in italics in the original layout) is the largest loss of information associated with that configuration. The minimum of these losses is 0.1978, implying that for a configuration of 6 stations the effectiveness would be 80%. Recommendation: leave stations N and B unmonitored. Note that there is a practical tie between configurations 4 and 19.
Acknowledgements
The authors would like to thank Drs. P. Pérez and L. Firinguetti for very helpful comments and Mr. I. Olaeta (SESMA) for providing the raw information. C.S.'s work was partially supported by FONDECYT Grant No. 1010085.
References
Atkinson, A.C., Cox, D.R., 1988. Transformations. In:
Johnson, N., Kotz, S. (Eds.), Encyclopedia of Statistical
Sciences, Wiley, New York, Vol. 9. pp. 312–318.
Broemeling, L.D., 1982. Box–Cox Transformation. In: John-
son, N., Kotz, S. (Eds.), Encyclopedia of Statistical
Sciences, Wiley, New York, Vol. 1. pp. 306–307.
Caselton, W.F., Zidek, J.V., 1984. Optimal monitoring
networks designs. Statistics and Probability Letters 2,
223–227.
Cressie, N., 1990. Statistics for Spatial Data. Iowa
State University, Wiley-Interscience Publication, Wiley,
New York.
Guttorp, P., Le, N.D., Sampson, P.D., Zidek, J.V., 1993. Using
entropy in the redesign of an environmental monitoring
network. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate
Environmental Statistics. North-Holland, Amsterdam,
pp. 173–202.
Karppinen, A., et al., 2000. A modeling system for predicting
urban air pollution: comparison of model predictions with
the data of an urban measurement network in Helsinki.
Atmospheric Environment 34, 3735–3743.
Klir, G.J., Folger, T.A., 1988. Fuzzy sets, Uncertainty,
and Information, International Editions. Prentice-Hall,
Englewood Cliffs, NJ.
Mardia, K.V., 1985. Mardia’s test of multinormality. In:
Johnson, N., Kotz, S. (Eds.), Encyclopedia of Statistical
Sciences, Wiley, New York, Vol. 5. pp. 217–221.
Mardia, K.V., Goodall, C.R., 1993. Spatial–temporal analysis
of multivariate environmental monitoring data. In:
Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental
Statistics. North-Holland, Amsterdam, pp. 347–386.
Morawska, L., et al., 2002. Spatial variation of airborne
pollutant concentrations in Brisbane, Australia and its
potential impact on population exposure assessment. Atmo-
spheric Environment 36, 3545–3555.
Sampson, P.D., Guttorp, P., 1992. Nonparametric estimation
of nonstationary spatial covariance structure. Journal of the
American Statistical Association 87, 108–126.
Shannon, C.E., 1948. A mathematical theory of communication.
The Bell System Technical Journal 27, 379–423,
623–656.
Pérez-Abreu, V., Rodríguez, J.E., 1996. Index of effectiveness of
a multivariate environmental monitoring network. Environmetrics
7, 489–501.
Zimmerman, D.L., Hormer, K.E., 1991. A network design
criterion for estimating selected attributes of the semivario-
gram. Environmetrics 12, 425–441.