Optimization of the atmospheric pollution monitoring network at Santiago de Chile


Atmospheric Environment 37 (2003) 2337–2345


Claudio Silva a,*, Alexis Quiroz b

a Universidad de Chile, Facultad de Medicina, Escuela de Salud Pública, Casilla 70012, Correo 7, Santiago, Chile
b Ingeniero Estadístico, Universidad de Santiago de Chile, Chile

Received 7 November 2002; accepted 20 February 2003

Abstract

Environmental pollution is a problem affecting many cities on our planet, and Santiago de Chile is among those with the worst indices. For that reason, a few years ago local authorities implemented an air quality monitoring network with eight monitoring stations located across the whole city. These stations continuously collect information about the presence and level of atmospheric contaminants as well as meteorological indices.

As the budget for this activity is limited, enlarging the monitoring network as the city grows might be an inefficient decision. To evaluate alternative decisions, multiple criteria should be considered. A statistical evaluation of some low-cost modifications of the network therefore becomes a valid research topic. This paper attempts to optimize Santiago's atmospheric monitoring network by excluding the least informative stations with respect to the variables under study: carbon monoxide (CO), airborne particulate material (PM10), ozone (O3) and sulfur dioxide (SO2). To accomplish this, an index of multivariate effectiveness, based on the Shannon information index, is applied to that network.

© 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Effectiveness index; Monitoring network; Shannon information index; Atmospheric pollution

1. Introduction

The design of a new atmospheric pollution monitoring network and the evaluation of an existing network have attracted the attention of different researchers. One approach, followed by Zimmerman and Hormer (1991), Cressie (1990) and Mardia and Goodall (1993), focuses on the estimation of special attributes of a semivariogram. A different perspective, based on the use of Shannon information, was initiated with the results of Caselton and Zidek (1984), applied in a univariate setup by Sampson and Guttorp (1992) and Guttorp et al. (1993), and later extended to a multivariate context by Pérez-Abreu and Rodríguez (1996). These authors applied their results to a 15-day campaign collecting data on four gaseous pollutants in Mexico City (March 1992). Recently, the efficiency of the air pollution monitoring networks of Helsinki and Brisbane has been thoroughly studied by Karppinen et al. (2000) and Morawska et al. (2002), respectively.

The Chilean capital city, Santiago (33.5°S, 70.8°W), is located in a valley enclosed by the Andes mountain range. The city centre has an elevation of 520 m. The metropolitan area of Santiago exceeds 15,000 km², with a population approaching 6.2 million (National Statistics Institute, 2001 estimate; www.ine.cl). Annual rainfall averages less than 400 mm. Temperature typically varies from an annual minimum of −2°C to an annual maximum of 35°C. The prevailing wind direction is south-west into the city. Thermal inversion precludes vertical ventilation, so that air pollutant concentrations are enhanced. This problem and its health consequences become especially acute in the April–August period.

The public authorities (Santiago Metropolitan Health Authority, SESMA) have been operating an air quality monitoring network since 1990; eight stations, distributed throughout the urban area, are components of this network (see Figs. 1 and 2). They continuously collect information on pollutants in the atmosphere and some meteorological variables. The main structure of each station is a METONE container with special inlets for air sampling and adequate accommodation for API gas analyzers and an R&P particle analyzer; daily automatic calibration is conducted using US EPA standards. A 10 m pole supports the instruments necessary for the collection of the meteorological parameters. A data logger saves the information at each station; every hour this material is sent electronically to a central computer. The use of the environmental software AIRVIRO at central and local levels provides fast and efficient data management.

AE International – Central & South America
*Corresponding author.
E-mail addresses: [email protected] (C. Silva), [email protected] (A. Quiroz).
1352-2310/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S1352-2310(03)00152-3

To control the serious public health problems associated with atmospheric pollution, the public authorities must take complicated long-term as well as daily decisions, which involve scientific, demographic, economic and political criteria. Maintaining a monitoring network that generates timely and reliable information is crucial. Because of budgetary limitations for this activity, enlarging the monitoring network in step with the growth of the city in area, human population and number of vehicles is not a simple decision. To evaluate alternative decisions, multiple criteria should be considered. A statistical evaluation of some low-cost modifications of the network becomes a relevant issue.

The main purpose of this work is to evaluate the possible exclusion of those monitoring stations appearing as “the least informative” and, if possible, to identify an optimal configuration of stations, meaning a smaller set of stations that provides adequate information for administrative purposes. Pérez-Abreu and Rodríguez (1996) carried out a similar study for Mexico City using a 15-day campaign collecting data on four gaseous pollutants. We use daily averages for the July 1998 values of:

(a) Carbon monoxide (CO; ppm).
(b) Airborne particulate material, fraction under 10 microns (PM10; micrograms per standard cubic meter).
(c) Ozone (O3; ppm × 1000).
(d) Sulfur dioxide (SO2; ppm × 1000).

Section 2 introduces some basic definitions and presents the results required for our analyses. In Section 3 we apply these ideas to the Santiago atmospheric environment, based on the four environmental variables mentioned above. Special attention is devoted to validating the statistical assumptions.

2. Effectiveness of an environmental monitoring network

2.1. Shannon information index for an environmental monitoring network

2.1.1. One variable of interest

Let us assume a collection of m locations, where stations are continuously recording the magnitude of a variable of interest X; a subset A of n of these stations is monitored and the complement B of m − n stations is not monitored. For each instant t (t = 1, …, T), let $M_t = (M_{t,1}, \dots, M_{t,n})$ be the measurements of X at the monitored stations and $U_t = (U_{t,1}, \dots, U_{t,m-n})$ those corresponding to the unmonitored stations. (Usually we will drop the subindex t.)

Let $f_{M,U}$ represent the joint density of M and U, and let $f_M$, $f_U$ be the corresponding marginal densities. Then the Shannon index of information (Shannon, 1948; Klir and Folger, 1988; Pérez-Abreu and Rodríguez, 1996) is defined as

$$I(M,U) = \int \ln\left[\frac{f_{M,U}(x,y)}{f_M(x)\,f_U(y)}\right] f_{M,U}(x,y)\,dx\,dy. \tag{1}$$

Note that, if M and U are independent, then $f_{M,U}(x,y) = f_M(x)\,f_U(y)$ and $I(M,U) = 0$. That is, the monitored stations do not provide information on the unmonitored ones.

[Figure 1: map with a 0–20 km scale bar and compass rose.]

Fig. 1. Map of the city of Santiago, Chile, displaying the geographic distribution of the eight stations forming the atmospheric monitoring network. Dark areas correspond to the urban region. Contour lines at the extreme right represent the Andes Mountains; other hills surrounding Santiago are similarly marked. Source: www.sesma.cl. Station B (Providencia), Station F (Independencia), Station L (La Florida), Station M (Las Condes), Station N (Santiago Centro), Station O (Pudahuel), Station P (Cerrillos), and Station Q (El Bosque).


If the m-dimensional vector (M, U) has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, then the Shannon information index reduces to

$$I(M,U) = -\frac{1}{2}\ln\frac{\det(\Sigma)}{\det(\Sigma_{11})\,\det(\Sigma_{22})}, \tag{2}$$

where $\Sigma_{11}$ and $\Sigma_{22}$ represent the covariance matrices of M and U, respectively, and $\det(\cdot)$ indicates the determinant of the corresponding matrix.
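A minimal sketch of Eq. (2), assuming the covariance matrix is known, positive definite and supplied as nested lists. The helper names `det`, `submatrix` and `shannon_index` are illustrative and do not come from the paper. For two stations with unit variances and correlation ρ, the formula reduces to −½ ln(1 − ρ²), which the example evaluates for ρ = 0.8.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(cov, idx):
    """Rows and columns of `cov` selected by the index list `idx`."""
    return [[cov[i][j] for j in idx] for i in idx]


def shannon_index(cov, monitored):
    """Gaussian Shannon index of Eq. (2) for the given monitored indices."""
    unmonitored = [i for i in range(len(cov)) if i not in monitored]
    s11 = submatrix(cov, monitored)
    s22 = submatrix(cov, unmonitored)
    return -0.5 * math.log(det(cov) / (det(s11) * det(s22)))


# Two stations, unit variances, correlation 0.8 (illustrative values only).
example = shannon_index([[1.0, 0.8], [0.8, 1.0]], monitored=[0])
print(round(example, 4))  # about 0.5108
```

When the monitored and unmonitored blocks are uncorrelated, the ratio of determinants equals 1 and the index is 0, in agreement with the remark following Eq. (1).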

2.1.2. Two or more variables of interest

Let us assume that our interest is to use the same network to monitor r variables $X^1, \dots, X^r$. One possibility is to evaluate and discuss Shannon's index $I(M^i, U^i)$, $i = 1, \dots, r$, separately for each variable. However, the optimum design based on one variable is usually not the best for the others. Fortunately, different strategies are available.

(A) Shannon joint information index: Let us assume that $M^i = \{M^i_1, \dots, M^i_n\}$, $i = 1, \dots, r$, correspond to the values of the variable $X^i$ at the n monitored stations and let $U^i = \{U^i_1, \dots, U^i_{m-n}\}$, $i = 1, \dots, r$, be the values of $X^i$ at the m − n unmonitored stations. The joint Shannon index of information is defined as

$$I(M^1 \cdots M^r, U^1 \cdots U^r) = \int \ln\left[\frac{f_{M^1 \cdots M^r, U^1 \cdots U^r}(x_1, \dots, x_r, y_1, \dots, y_r)}{f_{M^1 \cdots M^r}(x_1, \dots, x_r)\, f_{U^1 \cdots U^r}(y_1, \dots, y_r)}\right] f_{M^1 \cdots M^r, U^1 \cdots U^r}(x_1, \dots, x_r, y_1, \dots, y_r)\, dx_1 \cdots dx_r\, dy_1 \cdots dy_r. \tag{3}$$

Given multivariate normality, the Shannon joint information index is

$$I(M,U) = -\frac{1}{2}\ln\frac{\det(\Sigma)}{\det(\Sigma_{11})\,\det(\Sigma_{22})}, \tag{4}$$

where $\Sigma$ corresponds to the covariance matrix of $\{M^i_1, \dots, M^i_n, U^i_1, \dots, U^i_{m-n};\ i = 1, \dots, r\}$, and $\Sigma_{11}$ and $\Sigma_{22}$ are the covariance matrices of $M = \{M^i_1, \dots, M^i_n;\ i = 1, \dots, r\}$ and $U = \{U^i_1, \dots, U^i_{m-n};\ i = 1, \dots, r\}$, respectively.

(B) Effectiveness index: Let us assume that the optimal composition of the collections A (monitored stations) and B (unmonitored stations) is unknown and that our interest is to choose the n locations in A optimally among the $k = \binom{m}{n}$ possible configurations. Let j denote one of such configurations of A and B, and let $M(j)$ and $U(j)$ be the corresponding values of X. Then the optimal design is the configuration $j^*$ such that

$$I(M(j^*), U(j^*)) = \max\{I(M(j), U(j));\ j = 1, \dots, k\}. \tag{5}$$

[Figure 2: four scatter plots of correlation (vertical axis) against distance in m (horizontal axis), with points labelled by station letter. Panels: Variable CO, distance to station F; Variable MP10, distance to station F; Variable O3, distance to station Q; Variable SO2, distance to station P.]

Fig. 2. Graph 2.1 is a plot of “the distance of each station to station F” against “the correlation between corresponding levels of CO”. Graphs 2.2, 2.3 and 2.4 are similar for levels of PM10 (labelled MP10 in the panel), O3 and SO2.

On this basis, monitoring more than one variable may be handled through the following procedure to design and evaluate multivariate monitoring networks. For each variable $X^i$ let $I^i_j(M^i, U^i)$ be the Shannon index associated with the configuration $A(j_i)$, $i = 1, \dots, r$, $j = 1, \dots, k$. $A^*(j^*_i)$ will denote the configuration with maximal $I^i_j(M^i, U^i)$. Then

$$P_{ij}[A(j_i)] = \frac{\max_j\{I^i_j(M^i, U^i)\} - I^i_j(M^i, U^i)}{\max_j\{I^i_j(M^i, U^i)\}}, \qquad j = 1, \dots, k, \tag{6}$$

represents the loss of information corresponding to the design $A(j_i)$ with respect to the configuration $A^*(j^*_i)$, which is optimal when only the variable $X^i$ is considered.

For any $0 < p \le 1$ let $C^i_p = \{A(j_i) : P_{ij}(A(j_i)) \le p\}$ be the set of all possible designs, for the variable $X^i$, with loss of information less than or equal to p. Now we can define the index of effectiveness for the collection of r variables as

$$q^* = \sup\{1 - p : C^1_p \cap \dots \cap C^r_p \neq \emptyset\}. \tag{7}$$

The quantity $q^*$ measures the ability to monitor r variables optimally with a network of n stations chosen from a set of m available stations. In this sense, a small value of $q^*$ indicates low performance, whereas a value close to 1 indicates good performance.
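One hedged reading of Eq. (7), assuming that the same k candidate configurations are evaluated for every variable, so that a design lies in the intersection exactly when its loss is at most p for all r variables at once. Under that assumption the supremum can be evaluated as a min–max over the table of losses:

```latex
% Assumption: the same configurations j = 1, ..., k are used for every variable i = 1, ..., r.
\begin{align*}
C^1_p \cap \dots \cap C^r_p \neq \emptyset
  &\iff \exists\, j:\ \max_{1 \le i \le r} P_{ij} \le p
   \iff \min_{1 \le j \le k}\ \max_{1 \le i \le r} P_{ij} \le p, \\
q^{*} &= \sup\{1 - p : C^1_p \cap \dots \cap C^r_p \neq \emptyset\}
       = 1 - \min_{1 \le j \le k}\ \max_{1 \le i \le r} P_{ij}.
\end{align*}
```

In words: take the largest loss in each configuration's row, then pick the configuration whose largest loss is smallest.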

It is possible that some relations between variables

can be explained by the physical location of the stations;

therefore spatial sampling concepts will be elaborated.

2.2. Statistical data analysis

2.2.1. Normality assumptions

Basic descriptive statistics (see Table 1) and a Shapiro–Wilk test of normality were carried out for each variable at each station. SO2 shows non-normality at four stations, whereas CO, O3 and PM10 each show similar behaviour at one station. As we need multivariate normality to apply the simplest form of Shannon's index of information, we explored the use of Box–Cox transformations (Atkinson and Cox, 1988; Broemeling, 1982) to reach univariate normality:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0, \\[1ex] \log(y) & \text{if } \lambda = 0, \end{cases} \tag{8}$$

Table 1
Descriptive statistics for CO, PM10, O3 and SO2 at each station (based on daily averages for 1–31 July 1998)

Variable  Statistic         B       F       L       M       N       O       P       Q
CO        Mean           2.80    2.72    2.17    1.57    3.26    2.06    2.40    2.35
          Std. dev.      0.74    0.85    0.59    0.41    1.43    1.16    1.01    0.72
          Min.           1.71    1.42    1.08    0.80    0.74    0.55    0.34    1.09
          Max.           4.71    4.67    3.24    2.48    6.71    5.38    4.07    4.05
PM10      Mean         109.39  126.06  159.53   94.58  137.06  124.79  126.33  133.78
          Std. dev.     26.36   30.72   40.98   25.86   38.34   39.56   36.88   33.87
          Min.          69.25   75.88   86.63   51.25   73.25   54.13   68.00   80.00
          Max.         159.21  190.96  241.58  150.38  224.96  203.46  190.29  195.29
O3        Mean           5.61    5.95   11.09   11.35    8.15   11.11   14.20    8.51
          Std. dev.      1.61    2.77    3.93    4.26    3.75    2.99    8.02    2.84
          Min.           2.83    0.94    2.79    2.96    1.79    5.63    3.33    2.38
          Max.          10.21   12.54   18.50   18.38   16.54   17.29   31.96   13.46
SO2       Mean          11.06   10.88    9.23    5.24    9.54    6.05    8.19    9.04
          Std. dev.      4.86    5.29    4.96    2.75    4.95    3.58    4.62    4.03
          Min.           3.04    2.63    3.75    1.54    1.96    1.64    1.88    3.38
          Max.          21.04   21.42   24.63   13.33   21.29   13.96   17.25   17.83

The levels of these contaminants look almost uniform across the city. However, on closer inspection, we can note that Las Condes (M), a station located in an upper-class neighborhood, shows the lowest levels for CO, PM10 and SO2, but has one of the worst levels of O3; Centro (N), the downtown station, is the worst for CO (second for PM10). On the other hand, La Florida (L), located in a middle-class neighborhood, appears as the worst location both for PM10 and SO2.


where the parameter $\lambda$ is estimated by maximum likelihood. Any lognormal variable is included in this family of transformations for $\lambda = 0$.

In Table 2 (left-hand side), we present the estimates of $\lambda$ for each non-normal variable-station combination.
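A minimal sketch of the Box–Cox transformation of Eq. (8), with λ estimated by a grid search over the standard Box–Cox profile log-likelihood. The grid step of 0.1 (chosen to match the one-decimal estimates in Table 2), the function names and the synthetic lognormal-like sample are assumptions made for illustration; the paper does not state which software or search method was used.

```python
import math
from statistics import NormalDist, pvariance


def box_cox(y, lam):
    """Box-Cox transform of strictly positive values, Eq. (8)."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(y)
    z = box_cox(y, lam)
    # pvariance is the maximum-likelihood variance estimate (divisor n).
    return -0.5 * n * math.log(pvariance(z)) + (lam - 1.0) * sum(
        math.log(v) for v in y
    )


def estimate_lambda(y, grid=None):
    """Grid-search maximum-likelihood estimate of lambda."""
    if grid is None:
        grid = [k / 10 for k in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
    return max(grid, key=lambda lam: profile_loglik(y, lam))


# Synthetic sample (not from the paper): exponentials of standard normal
# quantiles, i.e. an idealised lognormal sample of 31 "daily" values.
sample = [math.exp(NormalDist().inv_cdf((i + 0.5) / 31)) for i in range(31)]
lam_hat = estimate_lambda(sample)
```

For this idealised lognormal sample the log-likelihood is maximised at λ = 0, consistent with the remark that lognormal variables correspond to λ = 0.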

All the transformed variables present p-values larger than 0.15 for the Shapiro–Wilk test. Having passed this necessary condition for multivariate normality, we implemented Mardia's test for joint multivariate p-normality (Mardia, 1985), based on the following definitions (with x and y independent, identically distributed random vectors):

Multivariate skewness: $\beta_{1,p} = E\{(x - \mu)'\Sigma^{-1}(y - \mu)\}^3$,

Multivariate kurtosis: $\beta_{2,p} = E\{(x - \mu)'\Sigma^{-1}(x - \mu)\}^2$.

Under multivariate normality the values of these parameters are $\beta_{1,p} = 0$ and $\beta_{2,p} = p(p + 2)$, respectively.

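To evaluate the corresponding test statistics we follow these steps:

(a) For each pair of observed vectors $x_i$ and $x_j$ ($i, j = 1, \dots, n$; the case $i = j$ is needed for the kurtosis statistic) we compute

$$g_{ij} = (x_i - \bar{x})' S^{-1} (x_j - \bar{x}), \tag{9}$$

where $\bar{x}$ and $S$ denote the sample mean vector and the sample covariance matrix.

(b) Then

$$b_{1,p} = \frac{1}{n^2}\sum_{i,j=1}^{n} g_{ij}^3 \quad \text{and} \quad b_{2,p} = \frac{1}{n}\sum_{i=1}^{n} g_{ii}^2. \tag{10}$$

(c) Finally, the test statistics with their corresponding null sampling distributions are

$$\frac{n}{6}\, b_{1,p} \sim \chi^2_{\frac{1}{6}p(p+1)(p+2)} \quad \text{and} \quad \frac{b_{2,p} - p(p+2)}{\{8p(p+2)/n\}^{1/2}} \sim N(0, 1). \tag{11}$$

For our present situation, $H_0: \beta_{1,4} = 0$ and $H_0: \beta_{2,4} = 24$. In Table 2 (right-hand side) we summarize the results of this analysis.

Now we can proceed to compute the Shannon index under its simpler form.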
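As a worked check of the null distributions used in Mardia's tests, the constants can be evaluated for p = 4 pollutants per station. The value n = 31 (one daily average per day of July 1998, with no missing days) is an assumption, since the number of valid days per station is not stated explicitly.

```latex
% Assumption: n = 31 daily observations per station; p = 4 variables.
\begin{align*}
\text{degrees of freedom: } & \tfrac{1}{6}\,p(p+1)(p+2) = \tfrac{1}{6}\cdot 4 \cdot 5 \cdot 6 = 20,
  \quad \text{so } \tfrac{n}{6}\, b_{1,4} \sim \chi^2_{20}; \\
\text{kurtosis under } H_0\text{: } & p(p+2) = 4 \cdot 6 = 24; \\
\text{standard error of } b_{2,4}\text{: } & \{8p(p+2)/n\}^{1/2} = (192/31)^{1/2} \approx 2.49 .
\end{align*}
```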

3. Design and application of an index of effectiveness for Santiago atmospheric monitoring network

Using formula (2), we evaluated the Shannon index of information excluding one station at a time (eight possible configurations). We therefore obtain a value for each combination of variable and configuration (see Table 3).
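The candidate configurations can be enumerated mechanically. The sketch below assumes that configurations are numbered in lexicographic order of the excluded stations, which matches the ordering used in Tables 3 and 5; the function name `configurations` is illustrative.

```python
from itertools import combinations

# Station codes as used in Fig. 1 and Tables 1-6.
STATIONS = ["B", "F", "L", "M", "N", "O", "P", "Q"]


def configurations(stations, n_excluded):
    """Yield (number, monitored, excluded) for every way of leaving out
    `n_excluded` stations, numbered from 1 in lexicographic order."""
    for number, excluded in enumerate(combinations(stations, n_excluded), start=1):
        monitored = [s for s in stations if s not in excluded]
        yield number, monitored, list(excluded)


seven_station = list(configurations(STATIONS, 1))  # 8 configurations
six_station = list(configurations(STATIONS, 2))    # 28 configurations

for number, monitored, excluded in seven_station:
    print(number, ",".join(monitored), "| excluded:", ",".join(excluded))
```

Leaving one station out gives 8 configurations and leaving two out gives 28, the number of rows in the seven-station and six-station tables.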

From Table 3 we can see that, for variable CO, configuration number 2 presents the highest Shannon index of information: this means that the set of stations B, L, M, N, O, P, Q gives maximum information with respect to the excluded station F. Graph 2.1 (Fig. 2) displays, for each station pair including F, the distance (m) and the correlation between their CO values. It is clear that, for six out of seven pairs, the correlations are at least 0.6 (even for stations that are rather distant from F). This means that, in terms of CO, station F could be removed, since most of its information would be preserved by the collection formed by stations B, L, M, N, O, P, Q.

Considering the variable PM10, the Shannon index of information shows that configuration number 2 (removing F and keeping stations B, L, M, N, O, P, Q) is optimal. Graph 2.2 shows distances and correlations for each pair of stations that includes station F. From this figure we observe that, regardless of the distance between station F and any of the remaining stations, all the correlations between PM10 values are high and statistically significant. Therefore, regarding PM10 and based on Shannon's index of information, station F could be removed, since the other stations give the maximum information compared with all other configurations.

Table 2
Box–Cox λ estimates for each non-normal variable (left) and p-values for Mardia's tests at each station (right)

         Contaminant (Box–Cox λ)       p-value under hypothesis
Station  CO     PM10   O3     SO2      H0: β1,4 = 0   H0: β2,4 = 24
B        .      .      .      0.5      0.5851         0.5902
F        .      .      .      0.4      0.6061         0.2148
L        .      .      .      −0.1     0.9756         0.1293
M        .      .      .      −0.1     0.9165         0.2552
N        .      .      .      0.3      0.2634         0.7460
O        0.1    .      .      −0.1     0.9473         0.1806
P        .      .      0.1    0.3      0.3143         0.2150
Q        .      −0.1   .      0.1      0.8994         0.1602

These p-values show that, after Box–Cox transformation, the hypothesis of multivariate normality is not rejected.


A similar discussion with respect to O3 shows that the configuration B, F, L, M, N, O, P (ignoring station Q) is the most efficient (see Table 4 and Graph 2.3).

Finally, for variable SO2, combining both criteria we conclude that station P is the candidate for exclusion.

From these figures and the computation of Shannon's index of information we calculated the losses of information, with respect to the optimal configuration, for all the other configurations and for each variable. In short, the idea is to find a configuration of stations adequate to monitor all the variables of interest with minimal loss of information when excluding one station, two stations, etc.

In Table 4 we observe that the fifth configuration is the one with the lowest loss of information as compared with the optimal configuration. Since all of its losses are lower than 0.21, we can say that the network effectiveness index is $q^* = 0.79$; in other words, that configuration can monitor the collection of four studied variables with 79% effectiveness. Additionally, we notice that station N would be the “least informative”, becoming the candidate for exclusion.

It is worth noting that regarding CO as the most important variable would lead us to choose the fifth configuration as the best, with an index of effectiveness of $q^* = 0.98$. Similarly, if PM10 is considered the most important variable, then the best configuration is number 1, with an index of effectiveness of $q^* = 0.94$; for O3 the sixth configuration is the best, with $q^* = 0.99$; and for SO2 the best configuration is the fifth, with $q^* = 0.79$.

Now, we might be interested in the possible exclusion of two stations from the network and in determining an optimal six-station configuration (see Table 5).

From this table we can see that, for the CO variable, the optimum configuration is the fourth, whereas for variables PM10 and SO2 the optimum corresponds to the sixth configuration. Finally, for variable O3, the optimum configuration is the 16th. From that analysis, we computed the loss of information for each combination of stations as compared with the corresponding optimum for each variable. The results are summarized in Table 6.

Table 3
Shannon index of information for each variable and each combination of seven stations (leaving one out)

Configuration  Monitored        Not monitored  CO      PM10    O3      SO2
1              F,L,M,N,O,P,Q    B              1.2530  1.3206  0.8928  1.4450
2              B,L,M,N,O,P,Q    F              1.5262  1.4051  0.7929  1.3904
3              B,F,M,N,O,P,Q    L              1.0377  1.0486  1.2465  1.0112
4              B,F,L,N,O,P,Q    M              0.4784  0.7981  0.8042  0.7429
5              B,F,L,M,O,P,Q    N              1.5032  1.1648  1.3250  1.4782
6              B,F,L,M,N,P,Q    O              0.7292  0.7506  1.3533  1.3318
7              B,F,L,M,N,O,Q    P              1.2422  1.2073  0.5077  1.8646
8              B,F,L,M,N,O,P    Q              0.6628  0.8616  1.3637  0.7858

The last column of this table shows that, in terms of SO2, the most informative seven-station configuration is the 7th (leaving station P unmonitored); the 8th configuration is the most informative in terms of O3 and the second is the best for CO and PM10.

Table 4
Loss of information for each variable with respect to the optimal configuration

Configuration  Monitored        Not monitored  CO      PM10    O3      SO2
1              F,L,M,N,O,P,Q    B              0.1790  0.0602  0.3453  0.2250
2              B,L,M,N,O,P,Q    F              0       0       0.4185  0.2543
3              B,F,M,N,O,P,Q    L              0.3200  0.2537  0.0859  0.4577
4              B,F,L,N,O,P,Q    M              0.6865  0.4320  0.4102  0.6016
5              B,F,L,M,O,P,Q    N              0.0150  0.1710  0.0284  0.2072
6              B,F,L,M,N,P,Q    O              0.5222  0.4658  0.0076  0.2857
7              B,F,L,M,N,O,Q    P              0.1861  0.1408  0.6277  0
8              B,F,L,M,N,O,P    Q              0.5657  0.3868  0       0.5786

For each value of the Shannon index of information (Table 3), its relative difference with respect to the maximum of the corresponding column is reported here as the loss of information for that “variable by configuration” combination. The largest value in each row is the largest loss of information associated with that configuration. The minimum of these largest losses is 0.2072, implying that for a configuration of seven stations the effectiveness would be 79%. Recommendation: leave station N unmonitored.
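The losses in Table 4 and the resulting effectiveness can be reproduced from the Shannon indices of Table 3. The sketch below is one possible reading of Eqs. (6) and (7), taking the effectiveness as one minus the smallest worst-case loss over configurations; the names `SHANNON`, `losses` and `effectiveness` are hypothetical.

```python
# Shannon indices from Table 3: configuration -> (CO, PM10, O3, SO2).
SHANNON = {
    1: (1.2530, 1.3206, 0.8928, 1.4450),
    2: (1.5262, 1.4051, 0.7929, 1.3904),
    3: (1.0377, 1.0486, 1.2465, 1.0112),
    4: (0.4784, 0.7981, 0.8042, 0.7429),
    5: (1.5032, 1.1648, 1.3250, 1.4782),
    6: (0.7292, 0.7506, 1.3533, 1.3318),
    7: (1.2422, 1.2073, 0.5077, 1.8646),
    8: (0.6628, 0.8616, 1.3637, 0.7858),
}


def losses(table):
    """Relative loss of each entry with respect to its column maximum, Eq. (6)."""
    n_vars = len(next(iter(table.values())))
    best = [max(row[i] for row in table.values()) for i in range(n_vars)]
    return {
        cfg: tuple((best[i] - row[i]) / best[i] for i in range(n_vars))
        for cfg, row in table.items()
    }


def effectiveness(table):
    """Return (configuration, q*) where q* = 1 - min over configurations
    of the largest loss across variables (one reading of Eq. (7))."""
    worst = {cfg: max(vals) for cfg, vals in losses(table).items()}
    best_cfg = min(worst, key=worst.get)
    return best_cfg, 1.0 - worst[best_cfg]


best_cfg, q_star = effectiveness(SHANNON)
print(best_cfg, round(q_star, 2))  # configuration 5, q* of about 0.79
```

Applying the same two functions to the six-station Shannon indices would give the corresponding six-station losses and effectiveness.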


Inspection of these results shows that configurations 4 and 19 are practically tied in having the minimal worst-case loss with respect to the optimal configuration corresponding to each variable (0.1978 and 0.2001, respectively, in Table 6); the index of effectiveness in both cases is $q^* = 0.80$. Therefore, either configuration is able to monitor our four pollution variables with an effectiveness of 80%.

Additional explorations removing three stations were also conducted, with results of 60% effectiveness, but they are not reported here.

4. Conclusions

As a summary of our study covering four atmospheric contaminants (PM10, O3, SO2 and CO) across the eight monitoring stations, we have:

(a) The best configuration using only seven of the eight monitoring stations is composed of stations B, F, L, M, O, P, Q, which can monitor the whole system with an effectiveness of 79%.
(b) If the target is a single variable, the effectiveness is 98% (CO, configuration 5), 94% (PM10, configuration 1), 99% (O3, configuration 6) or 79% (SO2, configuration 5).
(c) The best configuration with only six stations is formed by F, L, M, O, P and Q, with an effectiveness of 80% for monitoring all four variables. Considering only CO, the effectiveness is 83% for configuration 4; for PM10 the effectiveness is 81% with configuration 6, and the identical result holds for SO2 alone; for O3 the effectiveness is 79% for configuration 16.
(d) Removing more than two stations resulted in poor effectiveness (less than 60%).

Any statistical model is just one element of the array of criteria contributing to any major political decision-making process. The levels of all the contaminants that we have considered are responses to the influence of multiple factors: the presence of large and small industries, human population, the size and characteristics of public and private transportation systems, topographic and meteorological conditions, etc. Therefore the multivariate approach, a joint vectorial picture involving these responses, should provide the most complete analysis from the information perspective.

Table 5
Shannon index of information for each variable and each combination of six stations (leaving two stations out)

Configuration  Monitored      Not monitored  CO      PM10    O3      SO2
1              L,M,N,O,P,Q    B,F            1.4317  1.7291  1.3754  1.9404
2              F,M,N,O,P,Q    B,L            1.6567  1.9752  1.1787  1.5265
3              F,L,N,O,P,Q    B,M            1.6512  1.5042  1.3389  1.4447
4              F,L,M,O,P,Q    B,N            2.2828  2.2189  1.4767  2.0309
5              F,L,M,N,P,Q    B,O            1.7228  1.9198  1.7478  2.1545
6              F,L,M,N,O,Q    B,P            2.0942  2.2343  1.3029  2.5317
7              F,L,M,N,O,P    B,Q            1.5222  1.7985  1.7094  1.5646
8              B,M,N,O,P,Q    F,L            1.8704  1.7092  1.6397  1.7074
9              B,L,N,O,P,Q    F,M            1.9772  2.0363  1.2840  1.8932
10             B,L,M,O,P,Q    F,N            1.9533  1.9929  1.6348  1.8858
11             B,L,M,N,P,Q    F,O            1.8894  1.9354  1.4582  1.7448
12             B,L,M,N,O,Q    F,P            2.1327  1.9204  1.1145  1.9599
13             B,L,M,N,O,P    F,Q            1.7052  1.8421  1.6253  1.4290
14             B,F,N,O,P,Q    L,M            1.3388  1.7487  1.4608  1.3971
15             B,F,M,O,P,Q    L,N            2.0114  1.6715  1.7399  1.6786
16             B,F,M,N,P,Q    L,O            1.2941  1.4433  1.8035  1.7174
17             B,F,M,N,O,Q    L,P            1.7300  1.6769  1.5305  2.2076
18             B,F,M,N,O,P    L,Q            1.2202  1.0318  1.6698  1.2159
19             B,F,L,O,P,Q    M,N            1.8952  1.8146  1.4595  2.0251
20             B,F,L,N,P,Q    M,O            1.1492  1.5427  1.6238  1.9398
21             B,F,L,N,O,Q    M,P            1.7160  1.9833  1.0577  2.4188
22             B,F,L,N,O,P    M,Q            1.1383  1.5785  1.5169  1.3056
23             B,F,L,M,P,Q    N,O            1.7894  1.5078  1.6171  1.9194
24             B,F,L,M,O,Q    N,P            1.3506  1.2075  1.6285  1.8358
25             B,F,L,M,O,P    N,Q            1.6222  1.6092  1.7034  1.6004
26             B,F,L,M,N,Q    O,P            1.4409  1.4405  1.7363  1.6826
27             B,F,L,M,N,P    O,Q            1.0348  1.2479  1.5089  1.5460
28             B,F,L,M,N,O    P,Q            1.3440  1.5807  1.7066  2.0062

This table shows that, in terms of SO2 and PM10, the most informative six-station configuration is the 6th (leaving stations B and P unmonitored); the 16th configuration is the most informative in terms of O3 and the 4th configuration is the best for CO.

July was chosen as the study period because of two basic facts. First, the contamination problem in Santiago is permanent; levels may change over the year, but the correlational pattern is essentially constant. Second, public awareness of this problem increases when public health crises occur.

Looking at the geographical location of the stations, it is reasonable to consider that the Centro station might be the least informative, since part of its information might be redundant given the information collected at the surrounding stations; this analysis can be extended to the nearby Providencia station.

From a regional administration perspective, in order to improve the monitoring network using the same resources, our summary conclusion (remove stations N (Centro) and B (Providencia) and relocate them to more informative areas) must be combined with nonstatistical considerations. For example, as the creation of new industries in Santiago has been strongly discouraged, it is reasonable to assign priority to areas with accelerated demographic development, as they demand more public and private transportation, larger commercial areas, construction of new highways, etc., all potential sources of additional pollution.

Table 6
Loss of information for each variable with respect to the optimal configuration when two of the eight stations are unmonitored

Configuration  Monitored stations  Loss of information
               Yes           No    CO       PM10     O3       SO2
1              L,M,N,O,P,Q   B,F   0.3727   0.22610  0.2374   0.2336
2              F,M,N,O,P,Q   B,L   0.2742   0.1160   0.3464   0.3970
3              F,L,N,O,P,Q   B,M   0.2766   0.3268   0.2576   0.4294
4              F,L,M,O,P,Q   B,N   0        0.0069   0.1812   0.1978
5              F,L,M,N,P,Q   B,O   0.2452   0.1474   0.0308   0.1490
6              F,L,M,N,O,Q   B,P   0.0825   0        0.2776   0
7              F,L,M,N,O,P   B,Q   0.3331   0.1950   0.0522   0.3820
8              B,M,N,O,P,Q   F,L   0.1806   0.2350   0.0908   0.3256
9              B,L,N,O,P,Q   F,M   0.1337   0.0886   0.2880   0.2522
10             B,L,M,O,P,Q   F,N   0.1442   0.1080   0.0935   0.2551
11             B,L,M,N,P,Q   F,O   0.1722   0.1335   0.1915   0.3108
12             B,L,M,N,O,Q   F,P   0.0656   0.1405   0.3820   0.2258
13             B,L,M,N,O,P   F,Q   0.2529   0.1755   0.0988   0.4356
14             B,F,N,O,P,Q   L,M   0.4134   0.2173   0.1900   0.4482
15             B,F,M,O,P,Q   L,N   0.1188   0.2519   0.0353   0.33670
16             B,F,M,N,P,Q   L,O   0.4331   0.3540   0        0.3216
17             B,F,M,N,O,Q   L,P   0.2420   0.2494   0.1514   0.1280
18             B,F,M,N,O,P   L,Q   0.4655   0.5382   0.0741   0.5198
19             B,F,L,O,P,Q   M,N   0.1696   0.1878   0.1908   0.2001
20             B,F,L,N,P,Q   M,O   0.4965   0.3095   0.0996   0.2338
21             B,F,L,N,O,Q   M,P   0.2482   0.1123   0.4135   0.0446
22             B,F,L,N,O,P   M,Q   0.5013   0.2935   0.1589   0.4843
23             B,F,L,M,P,Q   N,O   0.2160   0.3252   0.1034   0.2419
24             B,F,L,M,O,Q   N,P   0.4083   0.4595   0.0970   0.2749
25             B,F,L,M,O,P   N,Q   0.2893   0.2797   0.0555   0.3678
26             B,F,L,M,N,Q   O,P   0.3687   0.3552   0.0372   0.3354
27             B,F,L,M,N,P   O,Q   0.5466   0.4414   0.1634   0.3894
28             B,F,L,M,N,O   P,Q   0.4112   0.2925   0.0537   0.2076

For each value of the Shannon index of information (Table 5), its relative difference with respect to the maximum of the corresponding column is reported here for that “variable by configuration” combination. Each italic number corresponds to the largest loss of information associated with a given configuration. The minimum of these losses is 0.1978, implying that the effectiveness of a six-station configuration would be about 80%. Recommendation: leave stations N and B unmonitored. Note that there is a practical tie between configurations 4 and 19 (largest losses 0.1978 and 0.2001, respectively).
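The selection rule described in the note to Table 6 can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the loss is computed as (column maximum − index value) / column maximum, and it uses only a subset of the rows of Table 6 to stay short. The names `relative_loss`, `minimax_configuration` and `TABLE6_SUBSET` are hypothetical.

```python
# Subset of Table 6: configuration -> (unmonitored pair, losses for CO, PM10, O3, SO2).
# Values are copied from the table; the other configurations are omitted for brevity.
TABLE6_SUBSET = {
    4: ("B,N", (0.0, 0.0069, 0.1812, 0.1978)),
    5: ("B,O", (0.2452, 0.1474, 0.0308, 0.1490)),
    6: ("B,P", (0.0825, 0.0, 0.2776, 0.0)),
    10: ("F,N", (0.1442, 0.1080, 0.0935, 0.2551)),
    17: ("L,P", (0.2420, 0.2494, 0.1514, 0.1280)),
    19: ("M,N", (0.1696, 0.1878, 0.1908, 0.2001)),
}


def relative_loss(index_value, column_max):
    """Assumed definition of the loss: relative difference to the column maximum."""
    return (column_max - index_value) / column_max


def minimax_configuration(table):
    """Return the configuration whose largest loss over the variables is smallest."""
    worst = {cfg: max(losses) for cfg, (_, losses) in table.items()}
    best = min(worst, key=worst.get)
    return best, worst[best]


best_cfg, worst_loss = minimax_configuration(TABLE6_SUBSET)
effectiveness = 1.0 - worst_loss
print(f"configuration {best_cfg} (unmonitored: {TABLE6_SUBSET[best_cfg][0]}), "
      f"largest loss {worst_loss:.4f}, effectiveness {effectiveness:.0%}")
```

Taking the largest loss over the four pollutants guards against a configuration that performs well for one variable but poorly for another. Configuration 19 comes out a close second under the same rule, which is the practical tie mentioned in the note.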


Acknowledgements

The authors would like to thank Drs. P. Pérez and L. Firinguetti for very helpful comments and Mr. I. Olaeta (SESMA) for providing the raw information. C.S.'s work was partially supported by FONDECYT Grant No. 1010085.

References

Atkinson, A.C., Cox, D.R., 1988. Transformations. In: Johnson, N., Kotz, S. (Eds.), Encyclopedia of Statistical Sciences, Vol. 9. Wiley, New York, pp. 312–318.

Broemeling, L.D., 1982. Box–Cox transformation. In: Johnson, N., Kotz, S. (Eds.), Encyclopedia of Statistical Sciences, Vol. 1. Wiley, New York, pp. 306–307.

Caselton, W.F., Zidek, J.V., 1984. Optimal monitoring network designs. Statistics and Probability Letters 2, 223–227.

Cressie, N., 1990. Statistics for Spatial Data. Iowa State University, Wiley-Interscience Publication, Wiley, New York.

Guttorp, P., Le, N.D., Sampson, P.D., Zidek, J.V., 1993. Using entropy in the redesign of an environmental monitoring network. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental Statistics. North-Holland, Amsterdam, pp. 173–202.

Karppinen, A., et al., 2000. A modeling system for predicting urban air pollution: comparison of model predictions with the data of an urban measurement network in Helsinki. Atmospheric Environment 34, 3735–3743.

Klir, G.J., Folger, T.A., 1988. Fuzzy Sets, Uncertainty, and Information, International Editions. Prentice-Hall, Englewood Cliffs, NJ.

Mardia, K.V., 1985. Mardia’s test of multinormality. In: Johnson, N., Kotz, S. (Eds.), Encyclopedia of Statistical Sciences, Vol. 5. Wiley, New York, pp. 217–221.

Mardia, K.V., Goodall, C.R., 1993. Spatial–temporal analysis of multivariate environmental monitoring data. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental Statistics. North-Holland, Amsterdam, pp. 347–386.

Morawska, L., et al., 2002. Spatial variation of airborne pollutant concentrations in Brisbane, Australia and its potential impact on population exposure assessment. Atmospheric Environment 36, 3545–3555.

Sampson, P.D., Guttorp, P., 1992. Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association 87, 108–126.

Shannon, C.E., 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 379–423, 623–656.

Pérez-Abreu, V., Rodríguez, J.E., 1996. Index of effectiveness of a multivariate environmental monitoring network. Environmetrics 7, 489–501.

Zimmerman, D.L., Hormer, K.E., 1991. A network design criterion for estimating selected attributes of the semivariogram. Environmetrics 12, 425–441.
