Some methods for testing the homogeneity of rainfall records



Journal of Hydrology, 58 (1982) 11--27
Elsevier Scientific Publishing Company, Amsterdam -- Printed in The Netherlands

[3]

SOME METHODS FOR TESTING THE HOMOGENEITY OF RAINFALL RECORDS

T.A. BUISHAND

Royal Netherlands Meteorological Institute (K.N.M.I.), De Bilt (The Netherlands)

(Received June 25, 1981; accepted for publication August 19, 1981)

ABSTRACT

Buishand, T.A., 1982. Some methods for testing the homogeneity of rainfall records. J. Hydrol., 58: 11--27.

Cumulative deviations from the mean are often used in the analysis of homogeneity. Features of five tests on the cumulative deviations are discussed. Some of these tests have optimal properties in testing the null hypothesis of homogeneity against a shift in the mean at an unknown point.

Together with the classical von Neumann ratio the tests were applied to the annual amounts of 30-yr. rainfall records in The Netherlands. For a large number of records strong indications for a change in the mean were found. There were only small differences between the various test-statistics with respect to the number of records for which the null hypothesis was rejected.

INTRODUCTION

Homogeneous rainfall records are often required in hydrologic design. However, it frequently occurs that rainfall data over different periods are not comparable since the measured amount of rainfall depends on such factors as the type, height and exposure of the raingauge, which have not always been the same. Therefore many meteorological institutes maintain an archive with information on the raingauge sites and the instruments used. Unfortunately, it is often not possible to specify the nature of changes in the mean amount of rainfall from the station documentation. This is partly because it is not always known how a change in the instrument or in the raingauge site may influence the measured amount of rainfall and partly because it is highly questionable whether the station information gives a complete picture of the raingauge site during the period that the station has been in operation.

Because of the uncertainty about possible changes, graphical methods are often used in climatology and hydrology to obtain some insight into the homogeneity of a record. A popular tool is the double-mass curve (Searcy and Hardison, 1960), which is obtained by plotting the cumulative amounts

0022-1694/82/0000--0000/$02.75 © 1982 Elsevier Scientific Publishing Company


of the station under consideration against the cumulative amounts of a set of neighbouring stations. The plotted points tend to fall along a straight line under conditions of homogeneity. Instead of the double-mass curve one can also plot the cumulative deviations from some average value. The cumulative deviations have the advantage that changes in the mean amount of rainfall are more easily recognized (Craddock, 1979). The graph of the cumulative deviations is sometimes called a residual mass curve.

Though graphs are useful for the detection of shifts in the mean it is usually not obvious how real changes can be distinguished from purely random fluctuations. Therefore it is always necessary to test the significance of departures from homogeneity by statistical methods. Common statistical techniques in climatology and hydrology are reviewed in a publication on climatic change by the World Meteorological Organization (W.M.O., 1966). It is a surprising fact, however, that these statistical tests are not based on some characteristic of the cumulative sums in the graphical analysis.

The intention of this paper is to discuss some tests on the cumulative deviations. These tests are compared with the classical von Neumann ratio.

A study was made on properties of the test-statistics for a simple model with a shift in the mean. Further the usefulness of the tests was investigated for annual rainfall totals in The Netherlands for the period 1951--1980. First some features of the test-statistics are derived and then the application to the rainfall data is discussed.

STATISTICAL ANALYSIS OF HOMOGENEITY

In the introduction the need for statistical techniques was emphasized to test the homogeneity of rainfall records. Suppose that one wants to test the homogeneity of a sequence $Y_1, Y_2, \ldots, Y_n$. Under the null hypothesis $H_0$ it is usually assumed that the $Y_i$'s have the same mean. The form of the alternative hypothesis $H_1$ is generally rather vague since often no reliable prior information is available about possible changes in the mean.

Usually, some assumptions are made on the joint distribution of the $Y_i$'s. Most tests require that the $Y_i$'s be independent. This is not a serious restriction, since the tests are usually performed on consecutive seasonal or annual values which are approximately independent in many countries. The distributions of the test-statistics in this paper are derived for the case that the $Y_i$'s are stochastically independent and have a normal distribution. The tests can still be applied, however, when there are slight departures from normality.

In the literature about testing the homogeneity of rainfall records, hardly any attention is paid to the distribution of test-statistics under the alternative hypothesis. Generally, no information is given on the probability of rejecting the null hypothesis in relation to the magnitude of changes in the mean. In this paper, properties of test-statistics are illustrated for the case that the $Y_i$'s are normally distributed with mean:


$$E(Y_i) = \begin{cases} \mu, & i = 1, \ldots, m \\ \mu + \Delta, & i = m+1, \ldots, n \end{cases} \tag{1}$$

and variance:

$$\operatorname{var} Y_i = \sigma_y^2$$

The model assumes a jump in the mean of magnitude $\Delta$ after $m$ observations. In the sequel, examples are given of the probability of rejecting $H_0$ as a function of $\Delta$. Also, some remarks are made on the estimation of the change-point $m$.

The von Neumann ratio

The well-known von Neumann ratio is defined by:

$$N = \sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2 \Big/ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \tag{2}$$

in which $\bar{Y}$ stands for the average of the $Y_i$'s. Under the null hypothesis of a constant mean it can be shown that $E(N) = 2$. For a non-homogeneous record the mean of N tends to be smaller than 2. A table of percentage points of N for normally distributed samples is given by Owen (1962).

The von Neumann ratio is closely related to the first-order serial correlation coefficient (W.M.O., 1966). A comprehensive study of the effect of changes in the mean on the correlogram was made by Yevjevich and Jeng (1969).
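As an illustration of eq. 2, the following minimal sketch computes N for two short series. The function name and both series are hypothetical (they are not data from the paper); the second series contains the same values as the first, rearranged so that high and low values alternate.

```python
from typing import Sequence


def von_neumann_ratio(y: Sequence[float]) -> float:
    """Return N of eq. 2: squared successive differences over squared deviations."""
    n = len(y)
    mean = sum(y) / n
    numerator = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    denominator = sum((value - mean) ** 2 for value in y)
    return numerator / denominator


# Hypothetical series (illustrative only): a drop in level after the 5th value.
shifted = [12.0, 14.0, 11.0, 13.0, 15.0, 7.0, 5.0, 8.0, 6.0, 4.0]
# The same values rearranged so that high and low values alternate.
alternating = [12.0, 7.0, 14.0, 5.0, 11.0, 8.0, 13.0, 6.0, 15.0, 4.0]

n_shifted = von_neumann_ratio(shifted)
n_alternating = von_neumann_ratio(alternating)
print(round(n_shifted, 3), round(n_alternating, 3))
```

For the series with a shift the ratio falls well below 2, in line with the remark that the mean of N tends to be smaller than 2 for a non-homogeneous record; the alternating arrangement gives a ratio above 2.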

Cumulative deviations

Tests for homogeneity can be based on the adjusted partial sums or cumulative deviations from the mean:

$$S_0^* = 0; \qquad S_k^* = \sum_{i=1}^{k} (Y_i - \bar{Y}), \quad k = 1, \ldots, n \tag{3}$$

Note that $S_n^* = 0$. For a homogeneous record one may expect that the $S_k^*$'s fluctuate around zero since there is no systematic pattern in the deviations of the $Y_i$'s from their average value $\bar{Y}$. On the other hand, when $\Delta$ is negative in eq. 1 most values of $S_k^*$ are positive because the $Y_i$'s tend to be larger than $\bar{Y}$ if $i \le m$, and smaller than $\bar{Y}$ if $i > m$. A typical example is given in Fig. 1. For $\Delta$ positive the $S_k^*$'s tend to be negative.

Rescaled adjusted partial sums are obtained by dividing the $S_k^*$'s by the sample standard deviation:

$$S_k^{**} = S_k^* / D_y, \quad k = 0, \ldots, n \tag{4}$$

with

$$D_y^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / n$$

Fig. 1. Non-homogeneous time series with adjusted partial sums.

The values of the $S_k^{**}$'s are not influenced by a linear transformation of the data. For instance, if the amount of rainfall is expressed in metres instead of in millimetres, the $S_k^*$'s are diminished by a factor 1000 but the $S_k^{**}$'s remain unchanged. Therefore tests of homogeneity are based on the rescaled adjusted partial sums $S_k^{**}$.
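The invariance under a change of units can be checked numerically. The sketch below implements eqs. 3 and 4 with hypothetical function names and hypothetical annual totals (not data from the paper), once in millimetres and once in metres.

```python
import math
from typing import List, Sequence


def adjusted_partial_sums(y: Sequence[float]) -> List[float]:
    """Return [S_0*, ..., S_n*] of eq. 3."""
    mean = sum(y) / len(y)
    sums = [0.0]
    for value in y:
        sums.append(sums[-1] + (value - mean))
    return sums


def rescaled_adjusted_partial_sums(y: Sequence[float]) -> List[float]:
    """Return [S_0**, ..., S_n**] of eq. 4, with D_y computed with divisor n."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((value - mean) ** 2 for value in y) / n)
    return [s / d_y for s in adjusted_partial_sums(y)]


# Hypothetical annual totals in millimetres (illustrative values only).
rain_mm = [812.0, 790.0, 845.0, 801.0, 760.0, 742.0, 755.0, 731.0]
rain_m = [value / 1000.0 for value in rain_mm]

s_mm = adjusted_partial_sums(rain_mm)
s_m = adjusted_partial_sums(rain_m)
ss_mm = rescaled_adjusted_partial_sums(rain_mm)
ss_m = rescaled_adjusted_partial_sums(rain_m)
```

Up to floating-point rounding, `s_mm` is 1000 times `s_m`, whereas `ss_mm` and `ss_m` coincide; the last element of each list is zero, as noted below eq. 3.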

A statistic which is sensitive to departures from homogeneity is:

$$Q = \max_{0 \le k \le n} |S_k^{**}| \tag{5}$$

High values of Q are an indication of a change in level. Critical values for the test-statistic can be found in Table I. The percentage points in this table are based on 19,999 synthetic sequences of Gaussian random numbers. For $n \to \infty$ the critical values of Q can be obtained from a table of the Kolmogorov--Smirnov goodness-of-fit statistic, see the Appendix.

TABLE I

Percentage points of $Q/\sqrt{n}$ and $R/\sqrt{n}$

  n      Q/sqrt(n)                R/sqrt(n)
         90%     95%     99%      90%     95%     99%

  10     1.05    1.14    1.29     1.21    1.28    1.38
  20     1.10    1.22    1.42     1.34    1.43    1.60
  30     1.12    1.24    1.46     1.40    1.50    1.70
  40     1.13    1.26    1.50     1.42    1.53    1.74
  50     1.14    1.27    1.52     1.44    1.55    1.78
 100     1.17    1.29    1.55     1.50    1.62    1.86
   ∞     1.22    1.36    1.63     1.62    1.75    2.00


Another statistic which can be used for testing homogeneity is the range:

$$R = \max_{0 \le k \le n} S_k^{**} - \min_{0 \le k \le n} S_k^{**} \tag{6}$$

The range is an important quantity in studies on the storage capacity of reservoirs. Much work has been done on its statistical properties in relation to the famous Hurst phenomenon (Gomide, 1978).

Shifts in the mean usually give rise to high values of the range. A figure with percentage points of the distribution of R under the null hypothesis is given by Wallis and O'Connell (1973). Some percentage points are also given in Table I since it is not convenient to determine critical values from a graph.
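A minimal sketch of how Q and R (eqs. 5 and 6) could be computed and compared with Table I is given here. The function name and the 30-value series are hypothetical; the series is constructed with a drop in level after the 15th value and is not taken from the paper. Only the two critical values are copied from Table I.

```python
import math
from typing import Sequence, Tuple

# 95% points of Q/sqrt(n) and R/sqrt(n) for n = 30, copied from Table I.
Q_CRIT_95_N30 = 1.24
R_CRIT_95_N30 = 1.50


def q_and_r(y: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, R) of eqs. 5 and 6 from the rescaled adjusted partial sums."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    s = 0.0
    highest = lowest = 0.0  # S_0** = 0 takes part in both extremes
    for v in y:
        s += (v - mean) / d_y
        highest = max(highest, s)
        lowest = min(lowest, s)
    return max(highest, -lowest), highest - lowest


# Hypothetical year-by-year differences (mm): level +40 for 15 years, then -40.
wiggle = [30.0, -45.0, 10.0, 55.0, -50.0] * 3
series = [40.0 + w for w in wiggle] + [-40.0 + w for w in wiggle]

q, r = q_and_r(series)
n = len(series)
reject_q = q / math.sqrt(n) > Q_CRIT_95_N30
reject_r = r / math.sqrt(n) > R_CRIT_95_N30
print(round(q / math.sqrt(n), 2), reject_q, round(r / math.sqrt(n), 2), reject_r)
```

Because the partial sums of this artificial series never become negative, Q and R coincide here; for real records R is at least as large as Q.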

Worsley's likelihood ratio test

Consider again eq. 1 and assume that one wishes to test $\Delta = 0$ against $\Delta \ne 0$. If the position of the change-point $m$ is known Student's t-test can be used. In situations where no information about $m$ is available the test can be based on:

$$W = \max_{1 \le k \le n-1} |t_k| \tag{7}$$

where $t_k$ denotes Student's t for testing a difference in mean between the first $k$ and the last $(n - k)$ observations.

Critical values for the test-statistic can be obtained from a paper by Worsley (1979). The test is equivalent to the likelihood ratio test. It is also possible to give a relation between W and the weighted adjusted partial sums:

$$Z_k^* = [k(n-k)]^{-1/2} S_k^*, \quad k = 1, \ldots, n-1 \tag{8}$$

The largest weights are given to $S_1^*$ and $S_{n-1}^*$. The weights are relatively small for $k$ in the neighbourhood of $\tfrac{1}{2}n$. From eq. A-3 in the Appendix it is seen that the variance of $Z_k^*$ does not depend on $k$.

Dividing $Z_k^*$ by the sample standard deviation gives the weighted rescaled adjusted partial sums $Z_k^{**}$. Let

$$V = \max_{1 \le k \le n-1} |Z_k^{**}| \tag{9}$$

then some algebra shows (Worsley, 1979):

$$W = (n-2)^{1/2}\, V / (1 - V^2)^{1/2} \tag{10}$$

So there is a unique relation between V and W, which means that tests on V and W are equivalent.
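The relation in eq. 10 can be verified numerically. The sketch below computes V from the weighted rescaled adjusted partial sums and compares the resulting W with a direct evaluation of eq. 7. It assumes that $t_k$ is the usual pooled-variance two-sample t-statistic; the function names and the ten-value series are hypothetical.

```python
import math
from typing import Sequence, Tuple


def worsley_v_w(y: Sequence[float]) -> Tuple[float, float]:
    """Return (V, W): V from eq. 9 and W obtained from V through eq. 10."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    v_stat = 0.0
    s = 0.0
    for k in range(1, n):
        s += y[k - 1] - mean                     # S_k*
        z = s / math.sqrt(k * (n - k)) / d_y     # Z_k**
        v_stat = max(v_stat, abs(z))
    w = math.sqrt(n - 2) * v_stat / math.sqrt(1.0 - v_stat ** 2)
    return v_stat, w


def max_abs_t(y: Sequence[float]) -> float:
    """Direct evaluation of eq. 7, assuming the pooled-variance two-sample t."""
    n = len(y)
    best = 0.0
    for k in range(1, n):
        first, last = y[:k], y[k:]
        m1, m2 = sum(first) / k, sum(last) / (n - k)
        pooled = (sum((v - m1) ** 2 for v in first)
                  + sum((v - m2) ** 2 for v in last)) / (n - 2)
        t = (m1 - m2) / math.sqrt(pooled * (1.0 / k + 1.0 / (n - k)))
        best = max(best, abs(t))
    return best


# Hypothetical series (illustrative only) with a lower level in the second half.
series = [5.1, 4.8, 5.6, 5.0, 4.4, 3.9, 4.1, 3.5, 4.2, 3.8]
v, w = worsley_v_w(series)
w_direct = max_abs_t(series)
print(round(v, 4), round(w, 4), round(w_direct, 4))
```

The two routes agree up to rounding, which is what makes a test on V equivalent to a test on W.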


Bayesian procedures

Bayesian procedures for the detection of changes in the mean have been developed by Chernoff and Zacks (1964) and Gardner (1969). In the derivation of Bayesian tests it is assumed that the variance $\sigma_y^2$ is known. Gardner's statistic for a two-sided test on a shift in the mean at an unknown point can be written as:

$$\sum_{k=1}^{n-1} p_k \{S_k^* / \sigma_y\}^2 \tag{11}$$

where $p_k$ denotes the prior probability that the shift occurs just after the $k$th observation ($k = 1, \ldots, n-1$).

When the standard deviation is not known $\sigma_y$ can be replaced by the sample standard deviation. For $p_k$ independent of $k$ (uniform prior distribution) one obtains:

$$U = \frac{1}{n(n+1)} \sum_{k=1}^{n-1} \{S_k^{**}\}^2 \tag{12}$$

and for $p_k$ proportional to $1/[k(n-k)]$ one obtains:

$$A = \sum_{k=1}^{n-1} \{Z_k^{**}\}^2 \tag{13}$$

Large values of these test-statistics are an indication of departures from homogeneity. Critical values for U and A are given in Table II. The percentage points in this table are based on 19,999 synthetic sequences of Gaussian random numbers. The limiting distributions of U and A are those of certain test-statistics of the Cramér--von Mises type. The statistic U corresponds asymptotically with Smirnov's $\omega^2$ and the statistic A with the Anderson--Darling statistic, see the Appendix.

TABLE II

Percentage points of U and A

  n      U                          A
         90%      95%      99%      90%     95%     99%

  10     0.336    0.414    0.575    1.90    2.31    3.14
  20     0.343    0.447    0.662    1.93    2.44    3.50
  30     0.344    0.444    0.691    1.92    2.42    3.70
  40     0.341    0.448    0.693    1.91    2.44    3.66
  50     0.342    0.452    0.718    1.92    2.48    3.78
 100     0.341    0.457    0.712    1.92    2.48    3.82
   ∞     0.347    0.461    0.743    1.93    2.49    3.86
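As a rough sketch of how the statistics U and A (eqs. 12 and 13) could be evaluated, the code below uses the identity $\{Z_k^{**}\}^2 = \{S_k^{**}\}^2 / [k(n-k)]$ that follows from eq. 8. The function name and the 30-value series are hypothetical; only the two critical values are copied from Table II.

```python
from typing import Sequence, Tuple

# 95% points of U and A for n = 30, copied from Table II.
U_CRIT_95_N30 = 0.444
A_CRIT_95_N30 = 2.42


def bayesian_u_a(y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, A) of eqs. 12 and 13."""
    n = len(y)
    mean = sum(y) / n
    d_y_sq = sum((v - mean) ** 2 for v in y) / n   # D_y squared, divisor n
    s = 0.0
    u_sum = 0.0
    a_sum = 0.0
    for k in range(1, n):
        s += y[k - 1] - mean                 # S_k*
        s_rescaled_sq = s * s / d_y_sq       # {S_k**}^2
        u_sum += s_rescaled_sq
        a_sum += s_rescaled_sq / (k * (n - k))   # {Z_k**}^2
    return u_sum / (n * (n + 1)), a_sum


# Hypothetical year-by-year differences (mm): level +40 for 15 years, then -40.
wiggle = [30.0, -45.0, 10.0, 55.0, -50.0] * 3
series = [40.0 + w for w in wiggle] + [-40.0 + w for w in wiggle]

u, a = bayesian_u_a(series)
print(round(u, 3), u > U_CRIT_95_N30, round(a, 3), a > A_CRIT_95_N30)
```

For this artificial series with a pronounced shift both statistics exceed their 95% points.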


The power of tests on homogeneity

The probability of detecting changes in the mean of a sequence $Y_1, Y_2, \ldots, Y_n$ by statistical methods depends on how serious these changes are. When only a small change occurs during a short period of the sample record there is little chance that the tests will indicate non-homogeneity. On the other hand, for feasible test-statistics it is necessary that they should be able to indicate all relevant departures from homogeneity.

A study on the power of tests for a change in level at an unknown point was made by Sen and Srivastava (1975). These authors compared the likelihood ratio statistic with Bayesian procedures. In this paper the power of N, Q and W is discussed for testing $\Delta = 0$ against $\Delta \ne 0$.

For a particular test-statistic the probability of rejecting $H_0$ depends on the significance level $\alpha$, the value of $\Delta$, the standard deviation $\sigma_y$, the number of observations $n$ and the position of the change-point $m$. The dependence on $\Delta$ and $\sigma_y$ can be combined into one single parameter: $\Delta' = \Delta/\sigma_y$.

The power of N, Q and W was investigated for $\alpha = 0.05$ and $n = 30$. Comparisons between these test-statistics were based on their power function:

$$P(\Delta', m) = \Pr(H_0 \text{ is rejected} \mid \Delta', m) \tag{14}$$

If $\Delta = 0$, then $P(\Delta', m) = \alpha = 0.05$; for $\Delta \ne 0$ and $m$ fixed the power function increases monotonically with the absolute value of $\Delta'$. With $|\Delta'|$ growing, $P(\Delta', m)$ tends to 1, that is $H_0$ is rejected with probability 1.

To obtain the power of N, Q and W 1,999 sequences of 30 pseudo-random numbers were generated from a standard normal distribution. For each sequence the statistics N, Q and W were calculated and then the critical values were read from the ordered samples of the computed statistics. The powers were based on the same set of random numbers by calculating the test-statistics again after adding a constant $\Delta'$ to the last $(30 - m)$ numbers of each sequence.

Fig. 2. Simulated powers of the statistics N, Q and W for testing a change of level in the middle of a sequence ($\alpha = 0.05$; $m = 15$, $n = 30$).

Fig. 3. Simulated powers of the statistics N, Q and W for testing a change of level near the beginning of a sequence ($\alpha = 0.05$; $m = 5$, $n = 30$).

Simulated power functions of N, Q and W for $m = 15$ and $m = 5$ (which is equivalent to $m = 25$) are given in Figs. 2 and 3, respectively. Since the power functions are symmetric in $\Delta'$, non-negative values of $\Delta'$ are considered only. From the figures it is seen that the von Neumann ratio N is less powerful than Q and W both for $m = 5$ and $m = 15$. This is not surprising since N is not based on a specific form of the alternative hypothesis whereas Q and W are particularly designed for testing a change in level at an unknown point. For other departures from homogeneity N could be more powerful than Q and W.

Q is superior to W for $m = 15$, while the opposite holds for $m = 5$. In general, for $m$ in the neighbourhood of $\tfrac{1}{2}n$ the statistic Q is more powerful than W. On the other hand, W is more sensitive to changes at the beginning and at the end of the sequence. This is a consequence of the large values of the weights $[k(n-k)]^{-1/2}$ near the end-points.

The power of the Bayesian statistic U is comparable with that of Q and the power function of A is somewhat similar to that of W. For a single change in the mean the range R is less powerful than Q. But for two change-points the range usually gives a better test. A case with two change-points is discussed by Buishand (1981).
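The data generation method described above can be sketched for the statistic Q as follows. This is an illustration under stated assumptions rather than a reproduction of the original computations: the random number generator, the seed and the function names are choices made here, and the critical value is taken as the ordered null value at position $(1 - \alpha)(\text{sequences} + 1)$.

```python
import math
import random
from typing import List


def q_statistic(y: List[float]) -> float:
    """Q of eq. 5: maximum absolute rescaled adjusted partial sum."""
    n = len(y)
    mean = sum(y) / n
    d_y = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    s = 0.0
    q = 0.0
    for v in y:
        s += v - mean
        q = max(q, abs(s) / d_y)
    return q


def simulated_power(delta: float, m: int, n: int = 30,
                    sequences: int = 1999, alpha: float = 0.05,
                    seed: int = 1) -> float:
    """Estimate P(delta', m) of eq. 14 for Q from standard normal sequences."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(sequences)]
    null_q = sorted(q_statistic(y) for y in samples)
    # Critical value read from the ordered null sample.
    critical = null_q[round((1.0 - alpha) * (sequences + 1)) - 1]
    rejected = 0
    for y in samples:
        # Add the standardized shift to the last (n - m) numbers of the sequence.
        shifted = y[:m] + [v + delta for v in y[m:]]
        if q_statistic(shifted) > critical:
            rejected += 1
    return rejected / sequences


power_0 = simulated_power(0.0, 15)
power_1 = simulated_power(1.0, 15)
power_2 = simulated_power(2.0, 15)
print(power_0, power_1, power_2)
```

With no shift the rejection rate is close to the significance level by construction, and it grows with the size of the standardized shift.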

Estimation of the position of a change-point

Graphs of cumulative deviations are often used to determine the position of change-points. It is then assumed that something has happened at points where the cumulative sum plot shows a clear change of slope. For the model in eq. 1 the position of the maximum of $|S_k^*|$ or $|Z_k^*|$ can be taken as an estimate for the change-point $m$.

Let K be the value of $k$ for which $|S_k^*|$ reaches its maximum, i.e. $Q = |S_K^{**}|$. In the same way M is chosen such that $V = |Z_M^{**}|$. Asymptotic properties of the statistic M were derived by Hinkley (1970). Because of the slow convergence of M to its asymptotic distribution Hinkley's results are not applicable to most hydrological sequences.

Fig. 4. Distribution of the index for which $|S_k^*|$ reaches its maximum under the condition that the null hypothesis is rejected at the 5% level ($n = 30$, $|\Delta'| = 1.5$).

Fig. 5. Distribution of the index for which $|Z_k^*|$ reaches its maximum under the condition that the null hypothesis is rejected at the 5% level ($n = 30$, $|\Delta'| = 1.5$).

For $n = 30$ the probability distribution of K and M was obtained from the generated samples on which Figs. 2 and 3 were based. Only those samples were taken into account for which the null hypothesis of a constant mean was rejected at the 5% level. Figs. 4 and 5 give the distributions of K and M in the situation that $|\Delta'| = 1.5$. The distributions are given for two positions of the change-point: $m = 5$ and $m = 15$.

The peak in the empirical distributions of K and M always coincides with the position of the change-point $m$. For $m = 15$ the statistic K is less dispersed than M, while on the other hand M is superior if $m = 5$. The distribution of K is highly skewed when the change in the mean occurs at the beginning of the sequence. This can be roughly explained as follows.

Fig. 6 gives the means of $S_k^*$ and $Z_k^*$ (obtained from eq. A-1 in the Appendix) for $m = 5$ and $\Delta = -1.5$. The mean of $S_k^*$ rises quickly to its maximum at $k = 5$, but for $k > 5$ the mean drops down slowly. From the figure one reads for instance that $E(S_{10}^*)$ exceeds $E(S_k^*)$ for the first few values of $k$. Also from eq. A-2 it follows that $\operatorname{var}(S_{10}^*) > \operatorname{var}(S_k^*)$ for these $k$, and consequently for $s$ sufficiently large $\Pr(S_{10}^* > s) > \Pr(S_k^* > s)$. Since the probability of high values for $S_k^*$ at the beginning of the sequence is relatively small, it is very unlikely that $S_k^*$ reaches its maximum for $k < 5$. Therefore the distribution of K is positively skewed in this situation. This is not the case for the statistic M, since for the weighted adjusted partial sums $Z_k^*$ the curve of the mean is rather symmetric near the peak at $k = 5$, and the variance does not depend on $k$.

So in the situation of a single change-point the index for which $Z_k^*$ reaches its maximum (or minimum) has a rather symmetric distribution. When there are two change-points it may occur that the positions of the maximum and minimum of the $Z_k^*$'s have very skewed distributions (Buishand, 1981).

In Figs. 4 and 5 the magnitude of the jump in the mean was $1.5\sigma_y$. For larger jumps in the mean the distributions of K and M are more concentrated around the position of the change-point $m$. When there is only a small change the estimates of $m$ are widely scattered.
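The two estimators of the change-point can be sketched as follows. The function name and the twelve-value series are hypothetical; the series is built with a drop in level after the fifth value. The search runs over $k = 1, \ldots, n-1$, which is sufficient for K because $S_0^* = S_n^* = 0$.

```python
import math
from typing import Sequence, Tuple


def change_point_estimates(y: Sequence[float]) -> Tuple[int, int]:
    """Return (K, M): the indices k that maximize |S_k*| and |Z_k*|."""
    n = len(y)
    mean = sum(y) / n
    s = 0.0
    best_s = best_z = -1.0
    k_hat = m_hat = 0
    for k in range(1, n):
        s += y[k - 1] - mean                       # S_k*
        z = abs(s) / math.sqrt(k * (n - k))        # |Z_k*|
        if abs(s) > best_s:
            best_s, k_hat = abs(s), k
        if z > best_z:
            best_z, m_hat = z, k
    return k_hat, m_hat


# Hypothetical series (illustrative only): lower level from the 6th value on.
series = [9.0, 11.0, 10.0, 12.0, 10.5, 7.0, 6.5, 8.0, 7.5, 6.0, 7.2, 6.8]
k_hat, m_hat = change_point_estimates(series)
print(k_hat, m_hat)
```

Dividing by the sample standard deviation does not change the position of either maximum, so the unscaled sums can be used for locating the change-point.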

Fig. 6. Mean of $S_k^*$ and $Z_k^*$ for $n = 30$, $m = 5$ and $\Delta = -1.5$.


APPLICATION TO RAINFALL DATA

In the climatological network of the Royal Netherlands Meteorological Institute (K.N.M.I.) there are about 320 stations with daily rainfall registrations (about 1 gauge per 100 km²). The data from 1951 onwards are available on magnetic tape.

The homogeneity of the records from 264 stations was investigated for the period 1951--1980. Stations with long interruptions in the observations were not taken into account. To obtain a sequence of 30 yr. for each station, missing data (e.g. due to a change of observer or the damaging floods in February 1953) were supplemented from nearby stations.

The use of year-by-year differences

For the analysis of homogeneity the country was divided into a number of regions (Fig. 7). In the flat regions I, II, III and IV there is little variation in the local rainfall climate. Differences in the mean amount of rainfall are more pronounced in the small hilly region V. Fig. 8 gives for each region the annual means over consecutive 5-yr. periods. The figure shows that the early 1950's were rather dry. The very wet 1960's were followed by the dry 1970's.

The statistical tests were applied to the sequence of year-by-year differences:

$$Y_i = X_i - R_i \tag{15}$$

with $X_i$: amount of rainfall in year $i$ for the station under consideration; and $R_i$: average amount of rainfall in year $i$ for the other stations in the region.

In general, regional means are hardly sensitive to changes in the site of individual rainfall stations. Local changes in the observations of the station under consideration affect the means of the $X_i$'s and the $Y_i$'s in the same way. But since $\sigma_Y^2 < \sigma_X^2$, the $Y_i$'s are preferred for testing homogeneity.
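A minimal sketch of eq. 15 is given below. The data layout (a dictionary of equally long annual series per station), the station names and the rainfall values are all hypothetical.

```python
from typing import Dict, List


def year_by_year_differences(region: Dict[str, List[float]],
                             station: str) -> List[float]:
    """Return Y_i = X_i - R_i, with R_i averaged over the other stations."""
    target = region[station]
    others = [series for name, series in region.items() if name != station]
    differences = []
    for i, x_i in enumerate(target):
        r_i = sum(series[i] for series in others) / len(others)
        differences.append(x_i - r_i)
    return differences


# Hypothetical annual totals (mm) for three stations over four years.
region = {
    "A": [800.0, 900.0, 700.0, 850.0],
    "B": [780.0, 880.0, 690.0, 820.0],
    "C": [820.0, 940.0, 730.0, 860.0],
}
y_a = year_by_year_differences(region, "A")
print(y_a)
```

In this small example the annual totals of station A range over 200 mm, while the differences stay within 10 mm of zero, which illustrates why the $Y_i$'s have the smaller variance.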

In The Netherlands the standard height of the rain-gauge is 0.40 m and the climatological conditions are such that a station relocation or a change in the exposure of the gauge may lead to a decrease or increase in the annual mean of 5--10%. From Fig. 8 it is seen that in all regions the annual mean is approximately 800 mm. Since $\sigma_y$ is on average 45 mm, the standardized shift $\Delta'$ is nearly 1 for a change in the mean of 5%. For this value of $\Delta'$ it is seen from Fig. 2 that for $m = 15$ the probability of rejecting the null hypothesis varies from 0.27 to 0.67, depending on the test-statistic used. These probabilities are much smaller for a change in the mean near the end-points of the sequence. For $m = 5$ it follows from Fig. 3 that the probability of rejecting $H_0$ still differs substantially from 1 for jumps in the mean of 10% ($\Delta' \approx 2$).
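The standardized shifts quoted in this paragraph follow from the rounded figures given in the text (an annual mean of about 800 mm and $\sigma_y$ of about 45 mm). The arithmetic, written out as a worked example:

```latex
\Delta' = \frac{\Delta}{\sigma_y}
        = \frac{0.05 \times 800\ \mathrm{mm}}{45\ \mathrm{mm}} \approx 0.89
\quad \text{(5\% change)},
\qquad
\Delta' = \frac{0.10 \times 800\ \mathrm{mm}}{45\ \mathrm{mm}} \approx 1.78
\quad \text{(10\% change)}
```

These values are consistent with the rounded statements "nearly 1" and "approximately 2".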

For each test-statistic the number of records was counted for which the null hypothesis of a constant mean was rejected at a certain significance level. The results are given in Table III for the 5% level and in Table IV for the 1% level. Despite the low power of the test-statistics for relevant changes in the mean, there are many records with strong statistical evidence of non-homogeneity.

Fig. 7. Geographical regions used in the analysis of homogeneity.

Fig. 8. Average annual amounts over 5-yr. periods for the regions in Fig. 7.

TABLE III

Results of tests on homogeneity ($\alpha = 0.05$)

 Region   Total number    Number of stations for which H0 is rejected
          of stations     N     Q     R     W     U     A

 I             66         16    21    22    23    17    16
 II            71         22    25    31    20    23    22
 III           53         17    10    10    11    11    10
 IV            64         19    17    16    16    13    12
 V             10          5    --     1    --    --    --

 Total        264         79    73    80    70    64    60

TABLE IV

Results of tests on homogeneity ($\alpha = 0.01$)

 Region   Total number    Number of stations for which H0 is rejected
          of stations     N     Q     R     W     U     A

 I             66          9    11    10     7     9     9
 II            71         10    16    14    10     6     7
 III           53          3     6     4     4     4     4
 IV            64          6     8    11     6     6     6
 V             10         --    --    --    --    --    --

 Total        264         28    41    39    27    25    26

Discussion of the results

Table III shows that for each test-statistic there are approximately 70 significant values at the 5% level. Under the null hypothesis of a constant mean for all 264 records the expected number of significant values is 13. Provided that correlation between the $Y_i$'s from different stations can be neglected, twenty significant values would be highly unusual. This last assumption is questionable since there is always some correlation between the year-by-year differences of nearby stations. For instance, let A and B be two neighbouring stations in the same region and suppose that in a particular year the amount of rainfall at station A lies above the regional average. Then it is very likely that the annual amount of its neighbour B is also higher than the regional mean. Due to this correlation it is possible that in a particular region the number of significant values is much larger than the expected number under the null hypothesis. However, when all records are homogeneous it is very unlikely that this occurs in nearly all regions as is the case here.
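The expected number of significant values, and the chance of seeing many more, can be illustrated with a binomial calculation. This sketch relies on the proviso made in the text, namely that the tests at different stations are treated as independent; the function name is hypothetical.

```python
import math


def binomial_upper_tail(trials: int, p: float, at_least: int) -> float:
    """Pr(X >= at_least) for X ~ Binomial(trials, p)."""
    return sum(math.comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
               for k in range(at_least, trials + 1))


records = 264
alpha = 0.05

expected = records * alpha                           # about 13 records
tail_20 = binomial_upper_tail(records, alpha, 20)    # twenty or more rejections
tail_70 = binomial_upper_tail(records, alpha, 70)    # about the counts in Table III
print(round(expected, 1), tail_20, tail_70)
```

Under independence the probability of 70 or more rejections is vanishingly small; as the text points out, correlation between nearby stations weakens this argument within a single region.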

So it can be concluded that many records are not homogeneous. There are only small differences between the various test-statistics with respect to the number of significant values. The fact that this number is relatively high for the statistics N and R indicates that departures from homogeneity do not always consist of a single shift in the mean.

Though for a number of records there is statistical evidence of changes in the mean, it is often not possible to correct these records. To make sensible corrections it is necessary to know the causes of differences in the mean. For ten records with serious departures from homogeneity a careful examination of the station history was made to try to find a reason for these departures. Only in five cases was some indication found for a decrease or increase in the mean amount of rainfall. In one of these the situation of the raingauge site had been gradually improved and in four others there was a marked change in the slope of the cumulative sum plot coinciding with the date of a station relocation. But even for three of these four stations it was not quite clear why the change of location resulted in a considerable decrease or increase in the mean amount of rainfall.

Other methods for testing homogeneity

Sometimes the sequence of ratios $X_i/R_i$ is preferred for testing homogeneity (W.M.O., 1966). Instead of testing for a constant ratio between two quantities one can also test for a constant difference between their logarithms. The tests for homogeneity were repeated with the logarithms of the annual amounts, which gave the same results as the tests on the original annual amounts.

The tests were also done with a partition of The Netherlands into fifteen regions instead of five. For most stations the results were nearly identical. There were, however, a few stations for which one subdivision indicated serious departures from homogeneity whereas the other subdivision did not.

SUMMARY AND CONCLUSIONS

Characteristics of cumulative deviations from the mean can be used to test the homogeneity of rainfall records. As a first example two tests on the rescaled adjusted partial sums were introduced.

Weighted cumulative deviations were discussed to emphasize changes near the end-points of the sequence. It was pointed out that Worsley's likelihood ratio test for a shift in the mean in normal populations is equivalent to a test on the weighted adjusted partial sums.

Some attention was paid to Bayesian procedures for testing a change in level. The resulting test-statistics are simple quadratic forms of the rescaled adjusted partial sums.


It was shown by the data generation method that tests on the cumulative deviations are superior to the classical von Neumann ratio for a model with only one change in the mean. The tests were applied to annual data for 264 rainfall stations in The Netherlands. There was strong evidence of departures from homogeneity. The von Neumann ratio gave nearly the same results as the tests on the cumulative deviations.

ACKNOWLEDGEMENTS

The author wishes to thank his colleagues of the Climatological Branch for proposing this subject. He also would like to express his sincere gratitude to Messrs. A. Denkema and A.C. Patist for their work with the rainfall data.

APPENDIX

Properties of adjusted partial sums

When the $Y_i$'s have a normal distribution, then the adjusted partial sums are also normally distributed. For the model in eq. 1 it can readily be shown that:

$$
E(S_k^*) =
\begin{cases}
-\dfrac{k(n-m)}{n}\,\Delta, & k = 0, \ldots, m \\[1ex]
-\dfrac{m(n-k)}{n}\,\Delta, & k = m+1, \ldots, n
\end{cases}
\tag{A-1}
$$

and

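$$
\operatorname{var}(S_k^*) = \frac{k(n-k)}{n}\,\sigma_y^2, \qquad k = 0, \ldots, n \tag{A-2}
$$

So for the weighted adjusted partial sums $Z_k^*$ one obtains:

$$
\operatorname{var}(Z_k^*) = \frac{1}{n}\,\sigma_y^2, \qquad k = 1, \ldots, n-1 \tag{A-3}
$$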
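The moments in eqs. A-1 to A-3 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: independent observations with common variance, a shift of size `delta` after observation `m`, and the weighting $[k(n-k)]^{-1/2}$ for the weighted adjusted partial sums. The function name and the numerical values are hypothetical.

```python
import numpy as np

def adjusted_partial_sum_moments(n, m, delta, sigma):
    """Exact mean and variance of S_k* (k = 0..n) for independent
    observations with variance sigma^2 and a shift delta after m."""
    mean_y = np.where(np.arange(1, n + 1) <= m, 0.0, delta)  # mu = 0 w.l.o.g.
    centring = np.eye(n) - np.ones((n, n)) / n   # maps y to y - mean(y)
    cumsum = np.tril(np.ones((n, n)))            # partial sums for k = 1..n
    a = cumsum @ centring                        # S_k* = (a @ y)[k - 1]
    mean_s = np.concatenate(([0.0], a @ mean_y))
    # Cov(y) = sigma^2 I, so var(S_k*) = sigma^2 * (sum of squares of row k)
    var_s = np.concatenate(([0.0], sigma**2 * np.einsum("ij,ij->i", a, a)))
    return mean_s, var_s

n, m, delta, sigma = 30, 12, 50.0, 100.0         # hypothetical values
k = np.arange(n + 1)
mean_s, var_s = adjusted_partial_sum_moments(n, m, delta, sigma)

# Closed forms of eqs. A-1 and A-2
mean_formula = np.where(k <= m, -k * (n - m) * delta / n, -m * (n - k) * delta / n)
var_formula = k * (n - k) * sigma**2 / n

# Eq. A-3: dividing S_k* by sqrt(k (n - k)) gives a constant variance
var_z = var_s[1:n] / (k[1:n] * (n - k[1:n]))
```

The matrix computation and the closed forms agree to rounding error, and `var_z` equals `sigma**2 / n` for every k between 1 and n - 1.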

Asymptotic properties of the sequence $\{S_k^*\}$ in the situation that $\Delta = 0$ are used in heuristic derivations of the limiting distributions of the Kolmogorov--Smirnov and the Cramér--von Mises statistics (Doob, 1949; Anderson and Darling, 1952). Let:

$$
\tilde{Q} = \max_{0 \le k \le n} |S_k^*| \,/\, \sigma_y \tag{A-4}
$$

and

$$
\tilde{R} = \Bigl\{ \max_{0 \le k \le n} S_k^* - \min_{0 \le k \le n} S_k^* \Bigr\} \,/\, \sigma_y \tag{A-5}
$$

The limiting distribution of $\tilde{Q}/n$ is the same as that of the Kolmogorov--Smirnov statistic (Doob, 1949). A derivation of the asymptotic distribution of $\tilde{R}$ is given by Feller (1951). The distribution of the quadratic form in eq. 11 was investigated by Anderson and Darling (1952) to derive the limiting distribution of the Cramér--von Mises statistic.
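As an illustration of eqs. A-4 and A-5, the two quantities can be computed as in the following Python sketch. The function name `q_r_known_sigma` and the small data set are hypothetical, and the population standard deviation is treated as known, as in the definitions above.

```python
import numpy as np

def q_r_known_sigma(y, sigma_y):
    """Maximum absolute value (eq. A-4) and range (eq. A-5) of the
    adjusted partial sums S_0*, ..., S_n*, scaled by a known sigma_y."""
    y = np.asarray(y, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(y - y.mean())))  # S_0* = S_n* = 0
    q = np.max(np.abs(s)) / sigma_y
    r = (np.max(s) - np.min(s)) / sigma_y
    return q, r

# Deviations from the mean (3) are 1, -2, -1, 3, -1, so the
# adjusted partial sums are 0, 1, -1, -2, 1, 0.
q, r = q_r_known_sigma([4.0, 1.0, 2.0, 6.0, 2.0], sigma_y=2.0)
```

Here the largest absolute partial sum is 2 and the range is 3, so `q` is 1.0 and `r` is 1.5. Because the partial sums start and end at zero, the range can never be smaller than the maximum absolute value.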

Properties of rescaled adjusted partial sums

Because of the sample standard deviation in the denominator of eq. 4 the rescaled adjusted partial sums do not have a normal distribution. When $\Delta = 0$, the squares of the weighted rescaled adjusted partial sums $Z_k^{**}$ have a beta distribution with parameters ½ and ½n − 1 (Anis and Lloyd, 1976). Therefore:

$$
\operatorname{var}(Z_k^{**}) = E(Z_k^{**})^2 = \frac{1}{n-1}, \qquad k = 1, \ldots, n-1 \tag{A-6}
$$

and

$$
\operatorname{var}(S_k^{**}) = E(S_k^{**})^2 = \frac{k(n-k)}{n-1}, \qquad k = 0, \ldots, n \tag{A-7}
$$

Substitution of this expression for $E(S_k^{**})^2$ in the right-hand side of eq. 12 gives $E(U) = 1/6$ for all $n$. So $U/n$ and Smirnov's $\omega^2$ have the same mean. In the same way it is shown that $E(A) = 1$ in correspondence with the mean of the Anderson--Darling statistic.
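The two expectations can be verified with exact rational arithmetic. The Python sketch below assumes that eq. 12 has the form $U = [n(n+1)]^{-1} \sum_{k=1}^{n-1} (S_k^{**})^2$ and that $A = \sum_{k=1}^{n-1} (Z_k^{**})^2$; these forms are restated here as assumptions because the defining equations are not repeated in this appendix, and the function names are hypothetical.

```python
from fractions import Fraction

def expected_u(n):
    """E(U), assuming U = sum_{k=1}^{n-1} (S_k**)^2 / (n (n + 1))
    and E(S_k**)^2 = k (n - k) / (n - 1) from eq. A-7."""
    total = sum(Fraction(k * (n - k), n - 1) for k in range(1, n))
    return total / (n * (n + 1))

def expected_a(n):
    """E(A), assuming A = sum_{k=1}^{n-1} (Z_k**)^2
    and E(Z_k**)^2 = 1 / (n - 1) from eq. A-6."""
    return sum(Fraction(1, n - 1) for _ in range(1, n))

# Evaluate for a few record lengths, including n = 30 as used for the
# Dutch records.
results = {n: (expected_u(n), expected_a(n)) for n in (5, 30, 100)}
```

Under these assumptions the sum of k(n − k) over k = 1, ..., n − 1 equals n(n − 1)(n + 1)/6, so that the first expectation is 1/6 and the second is 1 for every n.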

Since for independent normal variates the sample standard deviation $D_y$ converges with probability 1 to $\sigma_y$, the statistics $Q$ and $R$ have the same limiting distribution as $\tilde{Q}$ and $\tilde{R}$, respectively, and the asymptotic distributions of $U/n$ and $A$ are identical to those of Smirnov's $\omega^2$ and the Anderson--Darling statistic.

REFERENCES

Anderson, T.W. and Darling, D.A., 1952. Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Stat., 23: 193--212.

Anis, A.A. and Lloyd, E.H., 1976. The expected value of the adjusted rescaled Hurst range of independent normal summands. Biometrika, 63: 111--116.

Buishand, T.A., 1981. The analysis of homogeneity of long-term rainfall records in The Netherlands. R. Neth. Meteorol. Inst. (K.N.M.I.), De Bilt, Sci. Rep. No. 81-7.

Chernoff, H. and Zacks, S., 1964. Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat., 35: 999--1018.

Craddock, J.M., 1979. Methods of comparing annual rainfall records for climatic purposes. Weather, 34: 332--346.

Doob, J.L., 1949. Heuristic approach to the Kolmogorov--Smirnov theorems. Ann. Math. Stat., 20: 393--403.

Feller, W., 1951. The asymptotic distribution of the range of sums of independent random variables. Ann. Math. Stat., 22: 427--432.

Gardner Jr., L.A., 1969. On detecting changes in the mean of normal variates. Ann. Math. Stat., 40: 116--126.

Gomide, F.L.S., 1978. Markovian inputs and the Hurst phenomenon. J. Hydrol., 37: 23--45.

Hinkley, D.V., 1970. Inference about the change-point in a sequence of random variables. Biometrika, 57: 1--17.

Owen, D.B., 1962. Handbook of Statistical Tables. Addison-Wesley, Reading, Mass.

Searcy, J.K. and Hardison, C.H., 1960. Double-mass curves. In: Manual of Hydrology: Part 1, General Surface Water Techniques. U.S. Geol. Surv., Water-Supply Pap., 1541-B: Washington, D.C., 31--59.

Sen, A. and Srivastava, M.S., 1975. On tests for detecting change in mean. Ann. Stat., 3: 98--108.

Wallis, J.R. and O'Connell, P.E., 1973. Firm reservoir yield -- How reliable are historic hydrological records? Hydrol. Sci. Bull., 18: 347--365.

W.M.O. (World Meteorological Organization), 1966. Climatic change. World Meteorol. Org., Geneva, Tech. Note 79.

Worsley, K.J., 1979. On the likelihood ratio test for a shift in location of normal populations. J. Am. Stat. Assoc., 74: 365--367.

Yevjevich, V. and Jeng, R.J., 1969. Properties of non-homogeneous hydrologic series. Colo. State Univ., Fort Collins, Colo., Hydrol. Pap. 32.