
Journal of Hydrology 407 (2011) 58–72


Systematic evaluation of autoregressive error models as post-processors for a probabilistic streamflow forecast system

Martin Morawietz, Chong-Yu Xu, Lars Gottschalk, Lena M. Tallaksen
Department of Geosciences, University of Oslo, P.O. Box 1047 Blindern, 0316 Oslo, Norway

Article info

Article history:
Received 20 December 2010
Received in revised form 2 May 2011
Accepted 5 July 2011
Available online 12 July 2011

This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Luis E. Samaniego, Associate Editor

Keywords:
Probabilistic forecast
Post-processor
Hydrologic uncertainty
Autoregressive error model
Ranked probability score
Bootstrap

0022-1694/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2011.07.007

Corresponding author: M. Morawietz. Tel.: +47 22854908; fax: +47 22854215. E-mail address: [email protected]

Summary

In this study, different versions of autoregressive error models are evaluated as post-processors for probabilistic streamflow forecasts. The post-processors account for hydrologic uncertainties that are introduced by the precipitation–runoff model. The post-processors are evaluated with the discrete ranked probability score (DRPS), and a non-parametric bootstrap is applied to investigate the significance of differences in model performance. The results show that differences in performance between most model versions are significant. For these cases it is found that (1) error models with state dependent parameters perform better than those with constant parameters, (2) error models with an empirical distribution for the description of the standardized residuals perform better than those with a normal distribution, and (3) procedures that use a logarithmic transformation of the original streamflow values perform better than those that use a square root transformation.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, the topic of probabilistic flow forecasting has gained increased attention in hydrological research and operational applications. The traditional method of flow forecasting has been based on using a deterministic meteorological forecast and transforming it through a deterministic hydrological model to attain a single deterministic flow value. However, it was recognised that such a forecast is often associated with considerable uncertainties, which need to be described as well in order to assist rational decision making.

A catalyst in this development has been the advent of meteorological ensemble forecasts (e.g. Molteni et al., 1996) that aim to describe the uncertainties of the meteorological forecasts. Many hydrological studies have focused on treating the input uncertainties of precipitation and temperature by using meteorological ensemble forecasts as inputs to hydrological models (see for example the review on ensemble flood forecasting by Cloke and Pappenberger (2009)). However, in order to obtain a proper probabilistic flow forecast, all relevant sources of uncertainty in the hydrological modelling process should be addressed, not only the input uncertainties of forecast precipitation and temperature.


Other uncertainties comprise the model uncertainty (model structure and parameters), uncertainty of the initial states of the hydrological model at the time of the forecast, as well as uncertainties of observed precipitation and temperature that drive the hydrological model up to the point where the forecast starts (these latter uncertainties influence the uncertainties of initial states).

Ideally, a probabilistic forecast system would treat all possible sources of uncertainty explicitly. However, this seems both theoretically and practically impossible (Cloke and Pappenberger, 2009). The complex interactions between the different sources of uncertainties and the character of the unknown make an explicit treatment impossible. Following the argument of Krzysztofowicz (1999), “... for the purpose of real-time forecasting it is infeasible, and perhaps unnecessary, to explicitly quantify every single source of uncertainty”. Similarly, with respect to parameter uncertainty, Todini (2004) states “that in flood forecasting problems one is definitely not interested in a parameter sensitivity analysis, but mainly focused at assessing the uncertainty conditional to the chosen model with its assumed parameter values”.

A compromise between explicit and lumped treatment of the different sources of uncertainty is laid out in the framework for probabilistic forecast described by Krzysztofowicz (1999). He proposes to treat those input variables that have the greatest impact on the uncertainty of the forecast explicitly; that means probability distributions of these variables are used as inputs to


the deterministic hydrological model. In the case of flow forecasting, these input variables are forecasts of precipitation and, in addition, forecasts of temperature in catchments where snow plays an important role for runoff formation. All other uncertainties are then treated together in a lumped form as hydrologic uncertainty with a so-called hydrologic uncertainty processor, also called post-processor.

When describing the hydrologic uncertainty through a hydrologic uncertainty processor for streamflow, the aim is to find the distribution of future observed streamflow at time t, Q_obs(t), conditional on the simulated streamflow at time t, Q_sim(t), that is attained when the true values of precipitation and temperature are used to drive the hydrological model. The distribution can be conditioned on additional variables, and, analogous to the use of observed river stage at time t = 0 for a hydrologic uncertainty processor for river stage forecasting (Krzysztofowicz and Kelly, 2000), the observed streamflow at time t = 0 when the forecast starts, Q_obs(0), is used in a hydrologic uncertainty processor for streamflow. The conditional distribution that is sought after is then φ(Q_obs(t) | Q_sim(t), Q_obs(0)).

The approach investigated in this paper is a direct estimation of the distribution φ(Q_obs(t) | Q_sim(t), Q_obs(0)). The distribution is described with the help of a first order autoregressive error model of the form d′_t = a d′_{t−1} + r ε_t, where d′_t is the model error of the deterministic hydrological model (observed minus simulated streamflow) at time t, a and r are the parameters of the error model, and ε_t is the residual error described through a probability distribution κ. Based on this equation, the distribution φ of the observed streamflow at time t, with time t − 1 = 0 as the time where the forecast starts (time of the last observed data), is given as:

φ(Q_obs(t) | Q_sim(t), Q_obs(0), Q_sim(0)) = κ( [(Q_obs(t) − Q_sim(t)) − a(Q_obs(0) − Q_sim(0))] / r )    (1)

Note that in this description the distribution is conditioned on an additional variable, Q_sim(0).

A similar approach of a direct estimation of the distribution φ(Q_obs(t) | Q_sim(t), Q_obs(0)) by using an autoregressive model was proposed by Seo et al. (2006). However, in their formulation the transformed observed streamflow itself is the autoregressive variable, not its error. Their autoregressive model then has the simulated streamflow at time t, Q_sim(t), as exogenous variable, but the model does not contain the simulated streamflow at time 0, Q_sim(0).

In general, autoregressive error models have been used in hydrology in many different contexts. For example, Engeland and Gottschalk (2002) and Kuczera (1983) use them in the framework of Bayesian parameter estimation, and Xu (2001) uses them to study the residuals of a water balance model. In the context of flow forecasting, they are used for updating of model outputs (e.g. Toth et al., 1999). However, when used for updating of model outputs, only the deterministic part of the error model is applied for correcting a deterministic forecast to another deterministic forecast. This paper analyses the use of an autoregressive error model as hydrologic uncertainty processor, also called post-processor, where not only the deterministic component of the error model is used as for model updating, but the full distribution is applied.

When using autoregressive models for the description ofstreamflow or streamflow errors, three aspects are important.

(1) In the simplest application, the parameters of the autoregressive model are assumed to be fixed. However, several authors propose the use of different parameters for different hydrological or meteorological states. Lundberg (1982) and Seo et al. (2006), for example, use different parameters for high and low flows, while Engeland and Gottschalk (2002) use a more detailed classification based on states of the variables temperature, precipitation and snow depth.

(2) In order to make the residuals homoscedastic, a transformation is often applied to the original observed and simulated streamflow values. Common transformations are the logarithmic transformation (e.g. Engeland and Gottschalk, 2002) or the square root transformation (Xu, 2001), which are (apart from a linear shift) special cases of the Box–Cox transformation (Box and Cox, 1964).

(3) The errors ε_t of the autoregressive model are usually assumed to be normally distributed. However, when an autoregressive error model is used as a hydrologic uncertainty processor to generate probabilistic forecasts, a violation of the distributional assumptions will distort the results of such a forecast. A straightforward alternative to solve this problem is the use of an empirical distribution defined through the empirical standardized residuals of the calibration period. Application of this approach for a hydrologic post-processor has to our knowledge so far not been described in the literature.

Another important aspect for a proper evaluation of the results is the assessment of the significance of differences found in the evaluation measures. Such an evaluation does not always have to be carried out through a formal analysis if sufficient experience with a certain subject can ensure that a subjective evaluation leads to a reliable assessment. However, as there is so far relatively little experience in hydrological research with forecast evaluation measures such as the discrete ranked probability score (DRPS), an explicit treatment of the uncertainty of these evaluation measures seems more appropriate. The flexible approach of bootstrap (Efron and Tibshirani, 1993) allows such an explicit evaluation of the uncertainty without the necessity of making distributional assumptions about the variable being evaluated.
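As a concrete illustration of this bootstrap idea, the sketch below resamples the per-forecast score differences of two model versions with replacement and derives a percentile interval for the mean difference. This is a minimal sketch, not the paper's implementation; the function name, the number of resamples and the 95% percentile interval are illustrative choices:

```python
import numpy as np

def bootstrap_score_difference(scores_a, scores_b, n_boot=10000, seed=0):
    """Paired non-parametric bootstrap of the mean score difference.

    scores_a, scores_b: per-forecast scores (e.g. DRPS values) of two
    model versions on the same forecast days.
    Returns the mean difference and a (2.5%, 97.5%) percentile interval.
    """
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a) - np.asarray(scores_b)
    n = diffs.size
    # Resample forecast days with replacement, recompute the mean difference
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return diffs.mean(), (lo, hi)
```

If the interval excludes zero, the difference in mean score between the two versions can be regarded as significant at the corresponding level.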

Based on these considerations, the main objective of this study is an evaluation of different versions of autoregressive error models as hydrologic uncertainty processors. The following aspects are investigated in particular:

(1) Use of state dependent parameters versus state independent parameters.

(2) Use of a logarithmic transformation of the original streamflow values versus a square root transformation.

(3) Use of a standard normal distribution for the description of the standardized residuals versus an empirical distribution.

In addition, the application of bootstrap to the forecast evaluation measures to evaluate the significance of the results is demonstrated, and a discussion on the discrete ranked probability score for the evaluation of probabilistic streamflow forecasts is included.

The study was carried out by evaluating eight different autoregressive error models as hydrologic uncertainty processors. A well-known precipitation–runoff model, the HBV model, was chosen as deterministic hydrological model to which the uncertainty processors were applied. The uncertainty processors were calibrated for 55 catchments in Norway and evaluated using the discrete ranked probability score in combination with a non-parametric bootstrap.

2. Methods

2.1. Deterministic hydrological model: HBV model

The HBV model (Bergström, 1976, 1992) can be characterised as a semi-distributed conceptual precipitation–runoff model.


Table 1
Aspects investigated.

(1) Parameters a_t and r_t:
    State dependent (SD):
        a_t = a_i(t) + b s_t for Q_sim(t) ≥ q_thresh; a_t = a*_i(t) + b* s_t for Q_sim(t) < q_thresh
        ln r_t = A_i(t) + B s_t for Q_sim(t) ≥ q_thresh; ln r_t = A*_i(t) + B* s_t for Q_sim(t) < q_thresh
    State independent (SI): a_t, r_t constant

(2) Transformation type:
    Log transformation (Log): o_t = ln Q_obs(t), s_t = ln Q_sim(t)
    Square root transformation (Sqrt): o_t = √Q_obs(t), s_t = √Q_sim(t)

(3) Distribution κ of the standardized residuals ε_t:
    Standard normal distribution (Norm)
    Empirical distribution (Emp)

Table 2
Model versions investigated in the study; abbreviations are defined in Table 1.

Version   Label: Parameters.Transformation.Distribution
1         SD.Log.Norm
2         SI.Log.Norm
3         SD.Sqrt.Norm
4         SI.Sqrt.Norm
5         SD.Log.Emp
6         SI.Log.Emp
7         SD.Sqrt.Emp
8         SI.Sqrt.Emp


It distinguishes different elevation zones based on the hypsographic curve, and for these elevation zones temperature and precipitation are adjusted according to temperature and precipitation gradients and a temperature threshold to distinguish between rain and snow. Within each elevation zone, different land use zones are distinguished by allowing different parameters for certain model processes. Each zone runs individual snow and soil moisture routines.

Since its original development in the early 1970s, the HBV model has been applied and modified in many different operational and research settings. The model version used in this study is the "Nordic" HBV model (Sælthun, 1996), which is used for operational flow forecasting at the Norwegian Water Resources and Energy Directorate (NVE). The model is run with daily time steps, with mean daily temperature and accumulated daily precipitation as model inputs and mean daily streamflow as model output. The model was calibrated by NVE for 117 catchments (Lawrence et al., 2009). The calibration was carried out as an automated model calibration using the parameter estimation software PEST (Doherty, 2004), which is based on an implementation of the Levenberg–Marquardt method (Levenberg, 1944; Marquardt, 1963). From the 117 catchments, 55 catchments together with the calibrated HBV models were selected for this study based on a sufficiently long period of common data. For these catchments, the HBV model calibration period was 1981–2000 and the validation period was the combined periods of 1961–1980 and 2001–2006. The Nash–Sutcliffe efficiency coefficients NE (Nash and Sutcliffe, 1970) for the validation period range from 0.50 to 0.90. Twenty catchments have a good model performance (NE ≥ 0.80), 26 catchments have an intermediate model performance (0.65 ≤ NE < 0.8), and 9 catchments have a relatively weak model performance (0.5 ≤ NE < 0.65).
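For reference, the Nash–Sutcliffe efficiency used above is one minus the ratio of the sum of squared simulation errors to the sum of squared deviations of the observations from their mean. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def nash_sutcliffe(q_obs, q_sim):
    """Nash-Sutcliffe efficiency NE = 1 - SSE / SST.

    1.0 is a perfect fit; 0.0 means the model performs no better than
    the mean of the observations.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    sse = np.sum((q_obs - q_sim) ** 2)              # squared model errors
    sst = np.sum((q_obs - q_obs.mean()) ** 2)       # variance of observations
    return 1.0 - sse / sst
```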

2.2. Versions of post-processors for the HBV model

The simulation errors of the deterministic precipitation–runoff model are described through an autoregressive error model:

d_t = a_t d_{t−1} + r_t ε_t    (2)

The simulation error d_t is defined as the difference between the transformed observed streamflow, o_t, and the transformed simulated streamflow, s_t:

d_t = o_t − s_t    (3)

Parameters a_t and r_t are the parameters of the error model, and ε_t is the standardized residual error described through a random variable with the probability density function κ.

Solving the error model for the observed streamflow yields:

o_t = s_t + a_t (o_{t−1} − s_{t−1}) + r_t ε_t    (4)

The density of the observed streamflow o_t conditional on s_t, o_{t−1} and s_{t−1} is then equal to the density of the value ε_t that corresponds to o_t through Eq. (4):

φ(o_t | s_t, o_{t−1}, s_{t−1}) = κ(ε_t)  with  ε_t = [(o_t − s_t) − a_t (o_{t−1} − s_{t−1})] / r_t    (5)

i.e.

φ(o_t | s_t, o_{t−1}, s_{t−1}) = κ( [(o_t − s_t) − a_t (o_{t−1} − s_{t−1})] / r_t )    (6)

Eq. (6) constitutes a post-processor. The aspects investigated are (1) parameter formulation, (2) transformation type and (3) distribution type (Table 1). Each of the three aspects has two possible realizations, and by combining these, in total eight model versions of post-processors are generated (Table 2).
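To illustrate how Eq. (4) turns into a probabilistic forecast, the sketch below draws an ensemble of transformed streamflow values for one time step, assuming the standard normal distribution for κ (the Norm variant) and state independent parameters. The function name and ensemble size are illustrative, not from the paper:

```python
import numpy as np

def forecast_ensemble(s_t, o_prev, s_prev, a, r, n_members=1000, seed=0):
    """Sample a one-step probabilistic forecast from the AR(1) error
    model of Eq. (4): o_t = s_t + a*(o_prev - s_prev) + r*eps.

    All values are in transformed space; for the Log transformation,
    np.exp() of the result gives streamflow in original units.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_members)          # kappa = N(0, 1)
    o_t = s_t + a * (o_prev - s_prev) + r * eps   # Eq. (4)
    return o_t
```

Quantiles of the ensemble (or of `np.exp(o_t)` after a log transformation) then give the forecast distribution of observed streamflow.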

In this study, the post-processors are applied to HBV-model outputs that are produced without additional updating procedures like e.g. Kalman filtering (Kalman, 1960) or variational data assimilation (Seo et al., 2003). Thus, updating is only done through the autoregressive error model itself. According to the classification of updating procedures by the World Meteorological Organization (1992), this updating is classified as an updating of output variables. In principle, the post-processors may also be applied to model outputs that include other updating procedures like updating of input variables, state variables or model parameters. In this case, the post-processors would need to be calibrated on model outputs that include these updating procedures.

2.2.1. State dependent parameter formulation

As state dependent parameter formulation, a parameter description used at the Norwegian Water Resources and Energy Directorate (NVE) is applied (Langsrud et al., 1998). State dependence of the parameters is realized in three ways:

(1) Firstly, the parameters a_t and r_t of the autoregressive error model are formulated to be linearly dependent on the transformed simulated streamflow s_t:

a_t = a_i(t) + b s_t    (7)
ln r_t = A_i(t) + B s_t    (8)

(2) Secondly, the parameters a_i(t) and A_i(t) of the linear relations can assume different values, depending on the states defined through the variables observed temperature T_t, observed precipitation P_t and simulated snow water equivalent SWE_t at time t. It is distinguished if the temperature is below or above 0 °C, if precipitation occurs or not, and if the snow water equivalent is above or below a certain threshold value swe_thresh; if the amounts of snow are below swe_thresh, the catchment is assumed to behave as a snow free catchment.


Table 3
States distinguished for different conditions of observed temperature T_t, observed precipitation P_t, simulated snow water equivalent SWE_t, and simulated streamflow Q_sim(t) at time t, and the corresponding parameters of the error model.

Meteorological and snow states                       i(t)   Q_sim(t) ≥ q_thresh   Q_sim(t) < q_thresh
T_t ≤ 0 °C                                           1      a_1, A_1              a*_1, A*_1
T_t > 0 °C and P_t = 0 mm and SWE_t ≤ swe_thresh     2      a_2, A_2              a*_2, A*_2
T_t > 0 °C and P_t = 0 mm and SWE_t > swe_thresh     3      a_3, A_3              a*_3, A*_3
T_t > 0 °C and P_t > 0 mm and SWE_t ≤ swe_thresh     4      a_4, A_4              a*_4, A*_4
T_t > 0 °C and P_t > 0 mm and SWE_t > swe_thresh     5      a_5, A_5              a*_5, A*_5
All states                                                  b, B                  b*, B*

Fig. 1. Schematic diagram of the empirical distribution function F(ε) of the standardized empirical residuals ε.


Through the combination of the three variables, five different states i(t) ∈ {1, 2, . . . , 5} are distinguished (Table 3). The classification into the five states is based on conceptual considerations about the presence or absence of different processes which may result in different error behaviour. For temperatures below zero, precipitation is accumulated in the snow storage and streamflow comes mainly from base flow. For temperatures above zero, different processes take place both in the model and in the real catchment, depending on whether precipitation is present or not and whether a snow pack is present or not.

(3) Thirdly, two different sets of parameters a_j, b, A_j, B, j = 1, . . . , 5, are used, depending on whether the simulated streamflow at time t is above or below a flow threshold q_thresh (Table 3).

Summarizing points 1–3, the state dependent parameters a_t and r_t are formulated as

a_t = a_i(t) + b s_t     for Q_sim(t) ≥ q_thresh
a_t = a*_i(t) + b* s_t   for Q_sim(t) < q_thresh    (9)

ln r_t = A_i(t) + B s_t     for Q_sim(t) ≥ q_thresh
ln r_t = A*_i(t) + B* s_t   for Q_sim(t) < q_thresh    (10)

The threshold for snow water equivalent was chosen as follows. A catchment is assumed to behave as snow free when the simulated snow cover falls below 10%. The threshold swe_thresh is then determined as the average snow water equivalent that corresponds to a snow cover of 10%.

As threshold q_thresh to distinguish between high and low flows, the 75th percentile of the observed streamflow of the calibration period is used.
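The state classification of Table 3, together with the high/low-flow split, can be sketched as a simple lookup. The function name and argument order are illustrative, and swe_thresh and q_thresh are assumed to have been precomputed as described above:

```python
def classify_state(T, P, SWE, q_sim, swe_thresh, q_thresh):
    """Return (i, high_flow): the meteorological/snow state i in
    {1, ..., 5} of Table 3 and whether the high-flow parameter set
    (Q_sim(t) >= q_thresh) applies.
    """
    if T <= 0.0:
        i = 1                                # freezing: baseflow-dominated
    elif P == 0.0:
        i = 2 if SWE <= swe_thresh else 3    # dry: snow-free vs snow-covered
    else:
        i = 4 if SWE <= swe_thresh else 5    # precip: snow-free vs snow-covered
    return i, q_sim >= q_thresh
```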

2.2.2. Empirical distribution function

The empirical distribution function is based on the empirical standardized residuals

ε_t = (d_t − a_t d_{t−1}) / r_t    (11)

of the error model, obtained by solving Eq. (2) for ε_t. The set of standardized empirical residuals ε_m, m ∈ {1, 2, . . . , M}, is calculated from all days m = 1, . . . , M of the calibration period after the parameters of the error model have been estimated. The empirical distribution function F(ε) (Fig. 1) of the standardized residual ε is then defined as the step function

F(ε) = (1/M) Σ_{m=1}^{M} I(ε_m ≤ ε)    (12)

where I(A) is an indicator variable that is 1 if A is true and 0 if A is false. For values of ε that lie outside the observed range of residuals, the non-exceedance probability is 0 or 1.

No further attempt was made to smooth the distribution function or use some other plotting position formula to determine the points of the distribution function. Given the large number of points, it is not expected that such refinements would lead to any measurable changes in the results.
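Eq. (12) amounts to sorting the calibration residuals and counting. A minimal sketch of the step function; the 0/1 behaviour outside the observed range falls out of the counting automatically (the function name is illustrative):

```python
import numpy as np

def empirical_cdf(residuals):
    """Build the step function F(eps) of Eq. (12):
    F(eps) = (1/M) * sum_m I(eps_m <= eps).
    """
    eps_sorted = np.sort(np.asarray(residuals, dtype=float))
    M = eps_sorted.size

    def F(eps):
        # number of calibration residuals <= eps, divided by M
        return np.searchsorted(eps_sorted, eps, side="right") / M

    return F
```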

2.3. Estimation of the parameters of the error model

2.3.1. Models with state independent parameters

For models with state independent parameters, Eq. (2) constitutes a simple linear regression model. Parameter a is estimated using ordinary least squares, and parameter r is estimated as the root mean square of the residuals of the regression model.
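A minimal sketch of this estimation for Eq. (2), using the no-intercept least squares slope and the root mean square of the residuals (the function name is illustrative):

```python
import numpy as np

def estimate_si_parameters(d):
    """Estimate the state independent parameters a and r of Eq. (2),
    d_t = a*d_{t-1} + r*eps_t, from a series of errors d.

    a: OLS slope of d_t on d_{t-1} (no intercept, following Eq. (2));
    r: root mean square of the regression residuals.
    """
    d = np.asarray(d, dtype=float)
    x, y = d[:-1], d[1:]
    a = np.sum(x * y) / np.sum(x * x)      # OLS without intercept
    resid = y - a * x
    r = np.sqrt(np.mean(resid ** 2))
    return a, r
```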

2.3.2. Models with state dependent parameters

For models with state dependent parameters (Eqs. (9) and (10)), Eq. (2) can be rewritten as:

d_t = a_i(t) d_{t−1} + b s_t d_{t−1} + exp(A_i(t) + B s_t) ε_t      for Q_sim(t) ≥ q_thresh
d_t = a*_i(t) d_{t−1} + b* s_t d_{t−1} + exp(A*_i(t) + B* s_t) ε_t  for Q_sim(t) < q_thresh    (13)

The first two terms on the right hand side of Eq. (13) comprise the deterministic component of the error model, while the third term constitutes the stochastic component.

The parameter estimation for this model type follows an iterative two-step procedure (Langsrud et al., 1998). In the first step, the parameters of the deterministic part of the error model, a_j, b, a*_j, b*, are estimated in a weighted linear regression. In the second step, the parameters of the stochastic component, A_j, B, A*_j, B*, are estimated in a generalized linear regression with logarithmic link function and Gamma distribution. Steps one and two are then repeated using the results from the second step to update the weights of the linear regression of the first step; repetitions are continued until the parameter estimates converge. A detailed description of the estimation procedure is given in Appendix A.
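The iteration can be sketched for a simplified single-state version of Eq. (13) (no classes i(t), no flow threshold). One deliberate simplification: the paper's Gamma GLM for the variance model is replaced here by a bias-corrected ordinary regression of the log squared residuals, which only approximates that step; all names are illustrative:

```python
import numpy as np

PSI_HALF_PLUS_LN2 = -1.2703628454614782  # E[ln eps^2] for eps ~ N(0, 1)

def estimate_sd_parameters(d, s, n_iter=20):
    """Iterative two-step estimation for a single-state version of
    Eq. (13): d_t = (a + b*s_t)*d_{t-1} + exp(A + B*s_t)*eps_t.

    Step 1: weighted least squares for (a, b) with weights 1/r_t^2.
    Step 2: regression of ln(residual^2) on s_t for (A, B), with a
    bias correction replacing the paper's Gamma GLM (approximation).
    """
    d = np.asarray(d, dtype=float)
    s = np.asarray(s, dtype=float)
    y = d[1:]
    X = np.column_stack([d[:-1], s[1:] * d[:-1]])
    A = B = 0.0
    for _ in range(n_iter):
        w = np.exp(-2.0 * (A + B * s[1:]))          # weights 1 / r_t^2
        Xw = X * w[:, None]
        a, b = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # step 1: WLS
        resid = y - X @ np.array([a, b])
        Z = np.column_stack([np.ones(resid.size), s[1:]])
        coef, *_ = np.linalg.lstsq(Z, np.log(resid ** 2 + 1e-300), rcond=None)
        A = (coef[0] - PSI_HALF_PLUS_LN2) / 2.0     # step 2: variance model
        B = coef[1] / 2.0
    return a, b, A, B
```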

2.3.3. Practical adjustments for the parameter estimation

2.3.3.1. Exclusion of the smallest streamflow values for 15 catchments. For several catchments, the smallest streamflow values were removed from the data series due to problems with the logarithmic transformation.


(1) For 13 catchments, the streamflow series of observed or simulated streamflow contained instances (days) with zero values. However, the logarithm is only defined for values larger than zero. Therefore, instances with zero values were removed from the data series of the respective catchments.

(2) For streamflow values Q that tend towards zero, Q → 0, the log transformed values tend towards minus infinity, ln Q → −∞. As a result, infinitesimal differences between Q_obs(t) and Q_sim(t) in the original scale can become arbitrarily large in the transformed variable space. For the residuals e_t = r_t ε_t of an autoregressive error model, this may then lead to the opposite of what the transformation should achieve; instead of harmonizing the residuals, the variance of the residuals becomes arbitrarily large for Q_obs(t) → 0 or Q_sim(t) → 0. While for the majority of catchments the logarithmic transformation worked well (see example plot of the squared residuals e_t² versus ln Q_sim(t) for catchment Losna, Fig. 2a), for 10 catchments an arbitrary increase of the squared residuals for the smallest streamflow values was found (e.g. catchment Reinsnosvatn, Fig. 2b). This led to a non-convergence of the calibration routine for the parameters A*_i and B* in these catchments. Based on visual inspection of the plots of the squared residuals e_t² versus ln Q_sim(t), instances with very small values of observed or simulated streamflow were therefore removed from the respective streamflow series.

Based on points (1) and (2) above, the streamflow series of 15 catchments were truncated by their smallest flow values. The catchments might also have been completely excluded from the study. However, this would have substantially reduced the number of catchments available. As the focus of this study is not on the smallest values of the flow spectrum but rather on the general range and high flows, it is considered that such a truncation is reasonable and that it is preferable to include these data series in the study.

2.3.3.2. Minimum number of data per class. For each of the 10 classes defined by meteorological and snow states and states of simulated streamflow (Table 3), a sufficient number of data should be available for estimation of the parameters of the respective class. Therefore a minimum requirement of 60 instances per class was introduced. In case the number is less, the respective class is

Fig. 2. Example plots of squared residuals e_t² = r_t² ε_t² versus the logarithm of simulated streamflow, for models with state dependent parameters (SD) and logarithmic transformation (Log). (a) Catchment Losna; the squared residuals show homoscedastic behaviour. (b) Catchment Reinsnosvatn; strong deviation from homoscedastic behaviour for small streamflow values.

merged with another class for estimation of common parameters.A merging scheme was developed based on the similarity ofparameter estimates of the different classes. The merging schemewas developed from 35 catchments that had sufficient data in all10 classes.
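The mechanics of enforcing such a minimum class size can be sketched generically. The study's actual merging scheme was derived from parameter similarity across 35 catchments and is not reproduced here; the fallback mapping below is a hypothetical stand-in that only illustrates how a class with too few instances is pooled with a partner class:

```python
# Hypothetical similarity-based fallback: each class names a partner class
# whose data it is pooled with when it holds fewer than 60 instances.
# The pairing below is an assumption for illustration, not the study's scheme.
MIN_COUNT = 60
FALLBACK = {3: 2, 5: 4, 7: 6, 9: 8}

def effective_class(cls, counts):
    """Return the class whose pooled data is used for parameter estimation."""
    while counts.get(cls, 0) < MIN_COUNT and cls in FALLBACK:
        cls = FALLBACK[cls]
    return cls

counts = {2: 500, 3: 12, 4: 80, 5: 59}
assert effective_class(3, counts) == 2   # 12 < 60 -> merged with class 2
assert effective_class(5, counts) == 4   # 59 < 60 -> merged with class 4
assert effective_class(4, counts) == 4   # enough data, kept as is
```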

In the final parameter estimation for the models with state dependent parameters, the complete number of 24 parameters was estimated in 43% of the estimations. For another 43%, the number of estimated parameters was reduced to 22. Twenty parameters were estimated in 11% of the estimations, and only in 3% of the estimations was the number of estimated parameters less than 20.

2.4. Evaluation of the error model

2.4.1. The discrete ranked probability score (DRPS)
The discrete ranked probability score (DRPS; Murphy, 1971; Toth et al., 2003) was chosen as forecast evaluation measure. It is a summary score that allows an evaluation of the forecast with respect to a number of user specified thresholds.

There are several definitions of the discrete ranked probability score. The essence of all definitions is the same, i.e. the DRPS evaluates the squared differences (pk − ok)² between the cumulative distribution function of the forecast, p, and the cumulative distribution function of a perfect forecast (observation), o, at some predefined thresholds xk, k = 1, 2, ..., K (Fig. 3).

Table 4 gives an overview of different definitions found in the literature. The different formulations lead to differences in range and orientation of the score. The first definition defines the DRPS as the sum of the squared differences (Murphy, 1971; Wilks, 1995), leading to a range of [0, K]. The second definition uses the mean of the squared differences (e.g. Toth et al., 2003), resulting in a standardized range of [0, 1]. For both definitions the orientation of the score is negative, i.e. the best score has the lowest value of zero. In the third definition, the score is inverted to a positive orientation. In addition, the range is adjusted by adding a constant of one. This third version was used in the original definition of the ranked probability score by Epstein (1969).

Since the definitions of the DRPS given above are linear transformations of each other, the information conveyed through the different scores is essentially the same. So far, none of the above definitions seems to have become a definitive standard. For this publication, the definition according to the second equation is chosen. On the one hand, a standardized range seems preferable over a range that is dependent on the number of thresholds as in definition 1. On the other hand, definition 2 is more direct and intuitive than definition 3. The latter may be an important aspect when probabilistic forecasting and forecast verification are to be communicated to a wider audience.

Fig. 3. Definition of the discrete ranked probability score. [Figure: non-exceedance probability (0.0–1.0) versus the thresholds x1, x2, ..., xK, showing the forecast CDF p, the observation CDF o, and the differences pk − ok.]

Table 4. Definitions of the discrete ranked probability score (DRPS) for a single event.

No | Formula (a) | Range | Orientation | Source
1 | Σ_{k=1}^{K} (pk − ok)² | [0, K] | Negative | Murphy (1971), Wilks (1995)
2 | (1/K) Σ_{k=1}^{K} (pk − ok)² | [0, 1] | Negative | Bougeault (2003) (b), Nurmi (2003), Toth et al. (2003), WWRP/WGNE Joint Working Group on Forecast Verification Research (2010)
3 | 1 − (1/K) Σ_{k=1}^{K} (pk − ok)² | [0, 1] | Positive | Epstein (1969) (c), Stanski et al. (1989)

(a) The formulas may use symbols or formulations different from the respective sources in the last column, but are mathematically equivalent.
(b) No explicit formula is given, but the definition is described in the text.
(c) Mathematical equivalence of the original definition by Epstein (1969) with the formula given here is demonstrated by Murphy (1971).

M. Morawietz et al. / Journal of Hydrology 407 (2011) 58–72

Following Toth et al. (2003), the DRPS for one event is calculated as follows. K thresholds x1 < x2 < ... < xK for the continuous variable X are selected. These thresholds define the events Ak = {X ≤ xk}, k = 1, 2, ..., K, with the corresponding forecast probabilities p1, p2, ..., pK. Analogously, for each event Ak a binary indicator variable ok is defined that indicates whether the event Ak occurs or not, i.e. ok = 1 if Ak occurs, and ok = 0 otherwise. The DRPS for one event is then calculated as

DRPSind = (1/K) · Σ_{k=1}^{K} (pk − ok)²   (14)

In this study, the continuous variable X is daily streamflow, and a single event is the daily streamflow occurring on a certain day in a certain catchment.
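Eq. (14) translates directly into code. A minimal sketch (the function name and the example numbers are ours, not from the paper):

```python
import numpy as np

def drps_ind(p, o):
    """Discrete ranked probability score for one event, Eq. (14).

    p : forecast probabilities P(X <= x_k) at the K thresholds
    o : binary indicators, o_k = 1 if the observation satisfies X <= x_k
    """
    p = np.asarray(p, dtype=float)
    o = np.asarray(o, dtype=float)
    return float(np.mean((p - o) ** 2))

# Forecast CDF at three thresholds; the observation lies between x_1 and x_2,
# so the perfect-forecast CDF is o = (0, 1, 1).
score = drps_ind([0.2, 0.6, 0.9], [0.0, 1.0, 1.0])  # ((0.2)^2+(0.4)^2+(0.1)^2)/3
```

For this example the score is (0.04 + 0.16 + 0.01)/3 ≈ 0.07.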

To evaluate the performance of a probabilistic forecast model in one catchment over a certain time period, the average of the DRPSind values for the individual days n = 1, 2, ..., N is calculated as

DRPS = (1/N) · Σ_{n=1}^{N} DRPSind,n   (15)

Furthermore, to evaluate and compare the overall performance of different forecast models over a number of catchments, the average of the DRPS values for catchments c = 1, 2, ..., C is calculated as

DRPS = (1/C) · Σ_{c=1}^{C} DRPSc   (16)

To actually calculate a DRPS for an event, the K thresholds x1, x2, ..., xK have to be specified. Jaun and Ahrens (2009), in their evaluation study of a probabilistic hydrometeorological forecast system, chose four thresholds (the 25th, 50th, 75th and 95th percentile of the historic streamflow record). However, when such a small number of thresholds is chosen, differences between the cumulative distribution functions of the forecast and the corresponding observation might not be captured adequately. Therefore, if the purpose of the study does not dictate the selection of certain specific thresholds, one should aim to select a relatively large number of thresholds that extend over the whole range of values. For this study, 99 thresholds were chosen as the 1st, 2nd, ..., 99th percentile of the flow record of each catchment. Note that the selection of the thresholds on the basis of flow quantiles also allows for a direct comparison of DRPS values from different catchments and justifies the calculation of averages of DRPS values

over several catchments as indicated by Eq. (16). The comparability over catchments is also reflected in the decomposition of the DRPS, where the uncertainty component assumes the same value in all catchments when flow quantiles are selected as thresholds (see Section 2.4.3).
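Combining Eqs. (14) and (15) with percentile thresholds, the evaluation loop can be sketched as follows. This is a sketch under our own assumptions: the synthetic gamma-distributed flow record and the `forecast_cdf`/`drps_series` names are illustrative, and the baseline shown is a simple climatological forecast rather than the study's post-processor:

```python
import numpy as np

def drps_series(forecast_cdf, q_obs_series, thresholds):
    """Average DRPS over N days (Eq. (15)); each day scored with Eq. (14)."""
    scores = []
    for n, q_obs in enumerate(q_obs_series):
        p = forecast_cdf(n, thresholds)               # forecast P(Q <= x_k)
        o = (q_obs <= thresholds).astype(float)       # perfect-forecast CDF
        scores.append(np.mean((p - o) ** 2))          # Eq. (14)
    return float(np.mean(scores))                     # Eq. (15)

# 99 thresholds: the 1st..99th percentiles of the catchment's flow record.
rng = np.random.default_rng(0)
flow_record = rng.gamma(2.0, 5.0, size=10_000)        # synthetic flow record
thresholds = np.percentile(flow_record, np.arange(1, 100))

# A climatological 'forecast' (the same empirical CDF every day) as baseline.
sorted_rec = np.sort(flow_record)

def clim_cdf(n, x):
    # Empirical CDF of the flow record, identical for every day n.
    return np.searchsorted(sorted_rec, x, side="right") / sorted_rec.size

score = drps_series(clim_cdf, flow_record[:500], thresholds)
```

For such a climatological baseline the average score should lie near the uncertainty component DUNC ≈ 0.168 derived in Section 2.4.3, since both its reliability and its resolution are close to zero.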

2.4.2. Significance of differences of the DRPS: bootstrap
The overall evaluation of the eight error models is done by calculating the average discrete ranked probability score over the 55 catchments, DRPS, according to Eq. (16) for each model version. In order to assess whether differences between the scores of the different model versions are significant, confidence intervals are estimated using a non-parametric bootstrap (Efron and Tibshirani, 1993). Because of the correlations between the DRPS values of different model versions for individual catchments (see Section 3.1, Figs. 5–7), constructing confidence intervals for the DRPS values directly would not be helpful, because the correlations would not allow a distinction of significant differences based on overlapping or non-overlapping confidence intervals. Instead, confidence intervals for the differences between DRPS values of different model versions are calculated as described in the following.

Let xc, c = 1, 2, ..., 55, be the DRPS values (Eq. (15)) for one version of the error model, say version 1, for the individual catchments 1, 2, ..., 55, and let yc, c = 1, 2, ..., 55, be the corresponding DRPS values for another version of the error model, say version 2. Then the mean discrete ranked probability scores over the 55 catchments (Eq. (16)) are DRPS(1) = (1/55) · Σ_{c=1}^{55} xc and DRPS(2) = (1/55) · Σ_{c=1}^{55} yc for model versions 1 and 2, respectively. Let now zc = xc − yc, c = 1, 2, ..., 55, be the differences between the DRPS of the two model versions in the individual catchments. Then the difference of the mean values DRPS(1) and DRPS(2) is equal to the mean of the differences, z̄:

DRPS(1) − DRPS(2) = (1/55) · Σ_{c=1}^{55} zc = z̄   (17)

Thus, estimating a confidence interval for the difference of the mean values is equivalent to estimating a confidence interval for the mean of the differences.

We now regard the differences zc, c = 1, 2, ..., 55, as a sample of an unknown population Z, which reflects the distribution of differences that might occur in general. A confidence interval for the mean value of samples of size 55 from this unknown population is estimated with a non-parametric bootstrap, using the sample of differences zc, c = 1, 2, ..., 55, as a surrogate of the unknown population Z. The steps are as follows.


Fig. 4. Overview of the 55 catchments and streamflow stations in Norway. Eight catchments are nested in a larger catchment.


(1) Generate a bootstrap sample by sampling 55 times with replacement from the original sample of zc values. The bootstrap sample consists of the 55 values z*i, i = 1, 2, ..., 55. (Note: the indices i of the bootstrap sample have no relation to the catchment numbers c; each value z*i can be equal to any of the values zc from the original sample.)

(2) Calculate the mean value of the bootstrap sample, z̄*, as

z̄* = (1/55) · Σ_{i=1}^{55} z*i   (18)

(3) Repeat steps 1 and 2 n times, n being the number of repetitions. For each new bootstrap sample, a new mean value of the differences, z̄*j, j = 1, 2, ..., n, is calculated according to Eq. (18).

From the sample of bootstrap replicates of the mean values z̄*j, j = 1, 2, ..., n, a confidence interval for the mean of the differences can now be derived. The most straightforward method, a simple percentile mapping (Efron and Tibshirani, 1993), is used. That means the 95% confidence interval is estimated through the 2.5th and 97.5th percentile of the distribution of z̄*j values as lower and upper limit of the confidence interval, respectively.

If the value of zero lies outside the confidence interval, the difference between the two model versions is regarded as significantly different from zero. A number of n = 100,000 bootstrap replications was used for this study.
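Steps (1)–(3) and the percentile mapping are compact in code. A sketch with synthetic differences zc (the values, seed and function name are ours; the study used n = 100,000 replications):

```python
import numpy as np

def bootstrap_ci_mean(z, n_boot=100_000, alpha=0.05, seed=1):
    """Percentile-bootstrap CI for the mean of the differences z_c."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    # Steps (1)-(3): resample with replacement and take the mean, n_boot times.
    idx = rng.integers(0, z.size, size=(n_boot, z.size))
    boot_means = z[idx].mean(axis=1)
    # Percentile mapping: 2.5th and 97.5th percentiles for a 95% interval.
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Synthetic DRPS differences for 55 catchments, centred away from zero.
z = np.random.default_rng(7).normal(loc=0.002, scale=0.003, size=55)
lo, hi = bootstrap_ci_mean(z, n_boot=20_000)
significant = not (lo <= 0.0 <= hi)  # zero outside the CI -> significant
```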

2.4.3. Decomposition of the DRPS
Analogously to the Brier score (Brier, 1950), the DRPS can be decomposed into three components. The Brier score decomposition (Murphy, 1973) into reliability (REL), resolution (RES) and uncertainty (UNC) is based on the calculation of mean observed frequencies conditioned/stratified on different forecast probabilities. To avoid problems with sparseness and to make the estimated conditional mean values less uncertain, it is recommendable to divide the interval of forecast probabilities [0, 1] into a finite set of non-overlapping bins l = 1, ..., L. When such stratification over bins of probabilities is used, the decomposition of the Brier score yields two extra components, the within-bin variance and within-bin covariance (Stephenson et al., 2008). Including these extra components in the resolution term, which is then labelled generalized resolution (GRES), the decomposition of the Brier score can be written as (Stephenson et al., 2008)

BSk = RELk − GRESk + UNCk   (19)

with

RELk = (1/N) · Σ_{l=1}^{L} Nl (p̄l − ōl)²   (20)



Fig. 5. DRPS values of models with state independent parameters (SI) versus DRPS values of models with state dependent parameters (SD) for the independent validation 1984–2005 (Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SI.Log.Norm) versus DRPS(SD.Log.Norm). (b) DRPS(SI.Sqrt.Norm) versus DRPS(SD.Sqrt.Norm). (c) DRPS(SI.Log.Emp) versus DRPS(SD.Log.Emp). (d) DRPS(SI.Sqrt.Emp) versus DRPS(SD.Sqrt.Emp).


Fig. 6. DRPS values of models with square root transformation (Sqrt) versus DRPS values of models with log transformation (Log) for the independent validation 1984–2005 (Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SD.Sqrt.Norm) versus DRPS(SD.Log.Norm). (b) DRPS(SI.Sqrt.Norm) versus DRPS(SI.Log.Norm). (c) DRPS(SD.Sqrt.Emp) versus DRPS(SD.Log.Emp). (d) DRPS(SI.Sqrt.Emp) versus DRPS(SI.Log.Emp).


Fig. 7. DRPS values of models with normal distribution (Norm) versus DRPS values of models with empirical distribution (Emp) for the independent validation 1984–2005 (Val.2). Each point shows the relation of the scores of two different models in one catchment. (a) DRPS(SD.Log.Norm) versus DRPS(SD.Log.Emp). (b) DRPS(SI.Log.Norm) versus DRPS(SI.Log.Emp). (c) DRPS(SD.Sqrt.Norm) versus DRPS(SD.Sqrt.Emp). (d) DRPS(SI.Sqrt.Norm) versus DRPS(SI.Sqrt.Emp).


GRESk = (1/N) · Σ_{l=1}^{L} Nl (ōl − ō)² − (1/N) · Σ_{l=1}^{L} Σ_{j=1}^{Nl} (plj − p̄l)²
        + (2/N) · Σ_{l=1}^{L} Σ_{j=1}^{Nl} (olj − ōl)(plj − p̄l)   (21)

UNCk = ō(1 − ō)   (22)

BSk is the Brier score for the forecast of the event Ak as defined in Section 2.4.1 for one catchment, averaged over a total number of N forecasts (days); Nl is the number of forecast probabilities that fall into the lth bin; plj, j = 1, ..., Nl, are the forecast probabilities falling into the lth bin, and olj denote the binary observations (0 or 1) corresponding to the plj; the average of the forecast probabilities of the lth bin is calculated as p̄l = (1/Nl) · Σ_{j=1}^{Nl} plj, and ōl = (1/Nl) · Σ_{j=1}^{Nl} olj is the corresponding average of the binary observations; ō = (1/N) · Σ_{l=1}^{L} Σ_{j=1}^{Nl} olj is the overall mean of the binary observations, i.e. the climatological base rate.

The DRPS from Eq. (15) can be formulated as the mean of the Brier scores over all thresholds k = 1, ..., K, i.e. DRPS = (1/K) · Σ_{k=1}^{K} BSk (Toth et al., 2003). Thus, a decomposition of the DRPS can be given as

DRPS = DREL − DGRES + DUNC   (23)

with DRPS reliability DREL = (1/K) · Σ_{k=1}^{K} RELk, DRPS generalized resolution DGRES = (1/K) · Σ_{k=1}^{K} GRESk, and DRPS uncertainty DUNC = (1/K) · Σ_{k=1}^{K} UNCk.

To calculate the DRPS decomposition components in this study, L = 10 equally spaced bins were chosen. Based on the selection of the thresholds xk as quantiles of the flow distribution, the uncertainty component DUNC assumes the same value in all catchments. With xk as the 1st, ..., 99th percentile, the theoretical value of the uncertainty component in this study is given as DUNC = (1/99) · Σ_{ō ∈ {0.01, ..., 0.99}} ō(1 − ō) = 16.665/99 ≈ 0.168.
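Eqs. (19)–(22) can be checked numerically. The sketch below is our own implementation of the Stephenson et al. (2008) decomposition for a single threshold k, with bin edges assumed at 0.0, 0.1, ..., 1.0 (matching L = 10 equally spaced bins); the recombination BSk = RELk − GRESk + UNCk is exact because the within-bin variance and covariance terms are included in GRES:

```python
import numpy as np

def brier_decomposition(p, o, n_bins=10):
    """REL, GRES, UNC of Eqs. (20)-(22) for forecasts p and binary obs o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    N = p.size
    o_bar = o.mean()
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)  # bin index per forecast
    rel = gres = 0.0
    for l in range(n_bins):
        mask = bins == l
        N_l = mask.sum()
        if N_l == 0:
            continue
        p_l, o_l = p[mask].mean(), o[mask].mean()
        rel += N_l * (p_l - o_l) ** 2                        # Eq. (20) term
        gres += (N_l * (o_l - o_bar) ** 2                    # Eq. (21): resolution
                 - ((p[mask] - p_l) ** 2).sum()              #  - within-bin variance
                 + 2 * ((o[mask] - o_l) * (p[mask] - p_l)).sum())  # + 2*covariance
    rel, gres = rel / N, gres / N
    unc = o_bar * (1.0 - o_bar)                              # Eq. (22)
    return rel, gres, unc

rng = np.random.default_rng(3)
p = rng.uniform(size=2000)
o = (rng.uniform(size=2000) < p).astype(float)  # well-calibrated synthetic forecasts
rel, gres, unc = brier_decomposition(p, o)
bs = np.mean((p - o) ** 2)
assert abs(bs - (rel - gres + unc)) < 1e-10     # Eq. (19) recombines exactly

# Theoretical DUNC of this section: mean of o(1 - o) over o = 0.01, ..., 0.99.
dunc = np.mean([q * (1 - q) for q in np.arange(1, 100) / 100])  # 16.665/99
```

Averaging the three components over the K = 99 thresholds then gives DREL, DGRES and DUNC of Eq. (23).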

2.5. Catchments and data

Fifty-five catchments distributed over the whole of Norway were selected (Fig. 4). The selection was based on a common period of data from 1.9.1961 to 31.12.2005. The catchments are spread relatively evenly over Norway with some clustering in the south-east. Catchment sizes vary from 6 to 15,450 km². The majority of the catchments (45) are smaller than 1000 km², and of these, 21 are between 100 and 300 km². The mean catchment elevations range from 181 to 1460 m above mean sea level. They are relatively evenly distributed over the range of elevations with some predominance in the interval 400–900 m. The mean annual runoff ranges from 370 to 3236 mm/year. The runoff values are also relatively evenly distributed over their range.

Table 5. Overview of the five validations carried out for each model version in each catchment.

Label | Validation type | Period for validation | Period for parameter estimation
Cal.1+2 | Dependent | 1+2 (1962–2005) | 1+2 (1962–2005)
Cal.1 | Dependent | 1 (1962–1983) | 1 (1962–1983)
Cal.2 | Dependent | 2 (1984–2005) | 2 (1984–2005)
Val.1 | Independent | 1 (1962–1983) | 2 (1984–2005)
Val.2 | Independent | 2 (1984–2005) | 1 (1962–1983)

The data used in the study are daily data of mean air temperature, accumulated precipitation and mean streamflow for the 55 catchments. Station measurements of mean daily temperature and accumulated daily precipitation have been interpolated to a 1 × 1 km grid by the Norwegian Meteorological Institute (Mohr and Tveito, 2008). Catchment values of mean daily temperature (Tt) and accumulated daily precipitation (Pt) are then extracted from the grid as mean values of the grid cells that lie within the boundaries of the respective catchments. The streamflow data are series of mean daily streamflow (Qobs(t)) from the Norwegian Water Resources and Energy Directorate.

2.6. Error model calibration and validation procedure

The HBV model was run for the complete period in the 55 catchments, generating time series of simulated streamflow Qsim(t) and simulated snow water equivalent SWEt. The first four months of the model run were discarded as spin-up period, and the remaining period (1.1.1962–31.12.2005) was kept to investigate the eight versions of the error model.

For each version of the error model, three different parameter sets were estimated from three different periods in each of the 55 catchments:

• Period 1: 1.1.1962–31.12.1983.
• Period 2: 1.1.1984–31.12.2005.
• Period 1 + 2: 1.1.1962–31.12.2005.

The eight versions were then evaluated in three dependent and two independent validations in each catchment (Table 5). In the dependent validations, the error model was applied to the same data that were used for estimation of the parameters of the error model. In the independent validations, the error model was applied to independent data that had not been used in the estimation of the parameters of the error model.

For each day of a validation period, a probabilistic forecast was generated using Eq. (6). Based on the forecast distributions u(ot|st, ot−1, st−1) and the actual observations ot, the discrete ranked probability score DRPS was calculated according to Eq. (15). This was done for all 55 catchments, all 8 model versions (Table 3) and all 5 validations (Table 5), i.e. altogether 2200 DRPS values were calculated.

For this study, a time step of one day was used as the interval between t − 1 and t, corresponding to the basic time step of the precipitation–runoff model. In general, the error models may also be applied as post-processors for longer lead times. This can then be done in two ways: either the error model is applied recursively with one parameter set estimated for the basic time step, or the error model is applied with different parameter sets that are estimated for different intervals between t − 1 and t.

3. Results

3.1. Model performances in the individual catchments

Figs. 5–7 show plots of the DRPS for the 55 catchments for one model version versus the DRPS for another model version for selected model combinations. All plots are for the independent validation for the period 1984–2005 (Val.2). The plots for the other four validations in Table 5 are similar to the corresponding plots of the independent validation 1984–2005 and are therefore not shown. Fig. 5 shows the four plots of state independent (SI) models versus the corresponding state dependent (SD) models, Fig. 6 shows the four plots of square root transformed (Sqrt) models versus the corresponding log transformed (Log) models, and Fig. 7 shows the four plots of models with normal distribution (Norm) versus the corresponding models with empirical distribution (Emp). The following characteristics can be seen:

(a) A strong correlation between DRPS values of different model versions is visible in all plots. The correlation varies to some degree from plot to plot, but overall all plots show a distinct correlation.

(b) The differences between DRPS values of different model versions in the same catchment (vertical deviations of the points from the 1:1 line) are in all plots considerably smaller than the range of DRPS values for one model version over the 55 catchments (maximum extent of the points in the x or y direction).

These two characteristics (a) and (b) show that a main influence on the performance of a probabilistic forecast that is based on post-processing a deterministic forecast with some kind of autoregressive error model lies in the performance of the deterministic forecast model, in combination with the strength of the autoregressive behaviour of the errors of the deterministic forecast model. In a catchment where the deterministic forecast model performs well or the autocorrelation of the errors of the deterministic forecast is high, the performance of the probabilistic forecast in terms of the DRPS will in general be good, and the influence of the specific implementation of the post-processor is of minor importance compared with a catchment where the deterministic forecast model performs weakly and the autocorrelation of the model errors is weak. This illustrates that the error model cannot 'cure' a poor deterministic forecast model. Though it may produce a probabilistic forecast that is well calibrated, the resolution of the forecast, which has a main influence on the values of the DRPS (see Section 3.4.1), will always be poor.

(c) As a third characteristic, the point clouds show a systematic shift from the 1:1 line in many of the plots. The shift is very distinct in some of the plots, for example Fig. 5b, while in other plots, for example Fig. 7b, the shift is not that obvious. Still, on closer inspection, most of the plots seem to exhibit a systematic shift. Such a systematic shift reflects an on average better performance of one version of the error model over another version with respect to the evaluation measure DRPS.



Fig. 8. Values of DRPS for the eight versions of the error model for the different periods with dependent (white background) and independent (grey background) validation.


3.2. Model performances averaged over all catchments

To further investigate and compare the average performance of the eight error models, the average discrete ranked probability score over all catchments, DRPS, was calculated according to Eq. (16). Fig. 8 shows the values of DRPS for all five validations.

For the three dependent validations (white backgrounds), it is apparent that the scores in period 2, Cal.2, are consistently better than those of the corresponding models in period 1, Cal.1. The scores for the dependent validation for the complete period, Cal.1 + 2, lie between the scores of periods 1 and 2. Thus, the period of data has a clear impact on the performance of the error models in terms of the absolute values of DRPS. The differences between the DRPS values of period 1 and period 2 for the same model version can be much larger than the differences between different model versions in the same period.

The scores for the two independent validations (grey backgrounds) show a similar behaviour to the dependent validations of the corresponding periods, in that the scores of period 2 are better for all model versions than those of period 1 for the corresponding error models. However, the differences are less pronounced than for the dependent validation.

The better performance of period 2 over period 1 found for both the dependent and independent validation may be explained through the correlation found between the performance of the post-processors in terms of DRPS and the Nash–Sutcliffe efficiency coefficients of the underlying HBV models (see Section 3.4.2). For most of the catchments (45 out of 55), the HBV model performance is better in period 2 than in period 1, and this is correlated with better performance of the post-processors in terms of DRPS, which is then reflected in the average DRPS as well.

When comparing scores between independent and dependent validations for the same period, there is no consistent change of the scores from the dependent to the independent validation. Though for period 2 the scores of the independent validation, Val.2, are somewhat worse than the corresponding scores of the dependent validation, Cal.2, the performance of the independent validation for period 1, Val.1, is more or less the same as for the dependent validation, Cal.1. This indicates that none of the eight versions of the error model is clearly over-parameterized in the sense that the model performance would strongly deteriorate in periods with independent data.

1 Colour for web version / black-and-white for print version.

3.3. Significance of differences in model performances

We will now look more specifically at the differences in model performance between the different model versions, namely the differences between (1) models with state dependent parameters (SD) and corresponding models with state independent parameters (SI), (2) models with square root transformation (Sqrt) and corresponding models with log transformation (Log), and (3) models with normal distribution (Norm) and corresponding models with empirical distribution (Emp). Figs. 9–11 show the differences between the average discrete ranked probability scores DRPS (Eq. (16)) of corresponding model versions according to aspects 3, 1 and 2, respectively. Error bars indicate the 95% bootstrap confidence intervals.

The results for the differences between models with normal distribution (Norm) and models with empirical distribution (Emp) (Fig. 9) are very clear. In all five validations, the differences between model versions with normal distribution (Norm) and the corresponding model versions with empirical distribution (Emp) are significantly different from zero, and all are above zero. This means that on average, models that use an empirical distribution perform significantly better than models that use the normal distribution. The degree of improvement in absolute values is largest for the SI.Sqrt.Norm model (magenta/grey¹ diamonds in Fig. 9), which shows the worst model performance with respect to DRPS in all five validations compared with the other model versions (magenta/grey diamonds in Fig. 8). The second largest improvement is given for the SI.Log.Norm model (blue/black diamonds in Fig. 9), which scores second worst of all models with normal distribution (red/grey circles in Fig. 8).

The results for the comparison of state dependent (SD) versus state independent (SI) models are also clear (Fig. 10). All differences are larger than zero, and all differences except one (Log.Emp models in Val.2) are significantly different from zero. That means that model versions with state dependent parameters (SD) perform on average significantly better than the corresponding model versions with state independent parameters (SI). The largest improvement of DRPS (blue/black diamonds in Fig. 10) is again given for the version with the worst model performance in terms of DRPS, SI.Sqrt.Norm (magenta/grey diamonds in Fig. 8). The second largest improvement is given for SI.Sqrt.Emp (magenta/grey diamonds in Fig. 10), which has the second poorest model performance in the class of SI models (magenta/grey pluses in Fig. 8).

The results for models with log transformation (Log) versus models with square root transformation (Sqrt) are more complex. For the SI models (diamonds in Fig. 11), all differences are positive and all except one (SI.Emp in Val.1) are significantly different from zero. That means that for state independent models, the model versions that use log transformation perform on average better than models with square root transformation. Again, the improvement is largest for the SI.Sqrt.Norm model (blue/black diamonds in Fig. 11), the model with the worst model performance in terms of DRPS (magenta/grey diamonds in Fig. 8). For the SD models, however (circles in Fig. 11), none of the differences is significantly different from zero. That means that no significant difference in model performance can be detected for log versus square root transformation in the case of state dependent models.



Fig. 9. Differences of values of DRPS between models with normal distribution (Norm) and models with empirical distribution (Emp) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.


Fig. 10. Differences of values of DRPS between models with state independent parameters (SI) and models with state dependent parameters (SD) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.


Fig. 11. Differences of values of DRPS between models with square root transformation (Sqrt) and models with log transformation (Log) with 95% bootstrap confidence intervals; the x-axis distinguishes the different periods with dependent (white background) and independent (grey background) validation.


3.4. Decomposition of the DRPS and correlations with catchment parameters

3.4.1. Decomposition of the DRPS
The decomposition of the DRPS in the individual catchments according to Eq. (23) is shown in Fig. 12 for the SD.Log.Norm model for the independent validation in period 1 (Val.1). The features highlighted below are representative of the behaviour of the equivalent plots for the other model versions and other validations.

The theoretical value of the uncertainty DUNC is indicated as a horizontal line. The uncertainty values calculated for the individual catchments show some deviation from the theoretical value. This is an artefact of the sampling uncertainty of the flow distributions in the different periods. To have a consistent basis, the same thresholds xk, based on the percentiles of the flow distribution of the complete period of data (period 1 + 2), were used in all DRPS calculations. As the flow distributions in the other periods are slightly different, uncertainty values for validations in period 1 and period 2 show slight deviations from the theoretical uncertainty value.

When comparing the influence of the reliability DREL and the generalized resolution DGRES on the final DRPS values, it is apparent that the main influence comes from the resolution component, while the reliability values DREL are all close to zero.
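Although Eq. (23) is not reproduced in this excerpt, the decomposition discussed here follows the familiar reliability–resolution–uncertainty pattern of Murphy (1973), extended by the generalized resolution of Stephenson et al. (2008). The following sketch illustrates the basic pattern for a single threshold; the function names are ours, the binning of forecast probabilities is an illustrative choice, and the extra components of the generalized resolution are not shown.

```python
import numpy as np

def brier_decomposition(p, o, bins=10):
    """Murphy (1973) decomposition of the Brier score for one threshold:
    BS = REL - RES + UNC (exact when forecasts are constant within bins).
    p: forecast probabilities of exceeding the threshold, o: binary outcomes."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    obar = o.mean()
    unc = obar * (1.0 - obar)                 # uncertainty: depends on climate only
    idx = np.minimum((p * bins).astype(int), bins - 1)
    rel = res = 0.0
    for k in range(bins):
        m = idx == k
        if m.any():
            nk, pk, ok = m.sum(), p[m].mean(), o[m].mean()
            rel += nk * (pk - ok) ** 2        # reliability (calibration) term
            res += nk * (ok - obar) ** 2      # resolution term
    return rel / p.size, res / p.size, unc

def drps_from_brier(p_rows, o_rows):
    """DRPS over K discrete thresholds as a sum of threshold-wise Brier scores."""
    return sum(np.mean((np.asarray(p) - np.asarray(o)) ** 2)
               for p, o in zip(p_rows, o_rows))
```

A per-catchment plot in the spirit of Fig. 12 would show the three components next to the total score for each catchment.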

3.4.2. Correlation between DRPS and catchment characteristics

Correlations of the DRPS values in the individual catchments with the catchment characteristics area and runoff coefficient, as well as with the Nash–Sutcliffe efficiency coefficients of the HBV models, were investigated. Correlations with the runoff coefficients were found to be very weak; the Pearson product-moment correlation coefficients for the different model versions and validations lie between 0.07 and 0.26. Moderate negative correlations were found for the correlation of DRPS with the logarithm of the catchment area; the correlation coefficients lie between −0.41 and −0.51. The strongest (negative) correlations were found for DRPS with the Nash–Sutcliffe efficiency coefficients; the correlation coefficients lie between −0.64 and −0.80. This reflects the generally increasing performance of the probabilistic forecasts in the individual catchments with increasing performance of the underlying deterministic precipitation–runoff model.
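The correlations reported above are plain Pearson product-moment coefficients; a minimal sketch with synthetic (entirely hypothetical) per-catchment values illustrates the computation, with area entering through its logarithm as in the study:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 55  # number of catchments in the study

# Hypothetical per-catchment values, for illustration only.
nse = rng.uniform(0.5, 0.95, n)                    # Nash-Sutcliffe efficiency
area = rng.lognormal(mean=6.0, sigma=1.0, size=n)  # catchment area [km2]
drps = 0.01 - 0.008 * nse + rng.normal(0.0, 0.0005, n)  # mock DRPS values

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

r_nse = pearson_r(drps, nse)            # strongly negative by construction
r_area = pearson_r(drps, np.log(area))  # area enters via its logarithm
```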

4. Discussion

4.1. Normal distribution versus empirical distribution of the standardized residuals

The models using an empirical distribution function to describe the standardized residuals were found to perform significantly


Fig. 12. Decomposition of the DRPS values for the 55 catchments for the SD.Log.Norm model in the independent validation for period 1 (Val.1); shown per catchment are DRPS, DUNC, DREL and −DGRES.


better than corresponding models that use a standard normal distribution. This finding is not too surprising when looking at quantile–quantile plots of the standardized empirical residuals (Fig. 13). Those plots show a clear deviation of the distribution of the standardized empirical residuals from the standard normal distribution. Thus, when using a standard normal distribution one cannot expect optimal results, and an empirical distribution function proved to be superior.
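The significance statements in this section rest on non-parametric bootstrap confidence intervals for differences of DRPS values (Figs. 9–11). A minimal percentile-bootstrap sketch for a paired score difference is given below; the paper's exact resampling scheme may differ, and the function name is ours:

```python
import numpy as np

def bootstrap_ci_of_mean_diff(score_a, score_b, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean difference of two
    paired score series (e.g. per-day DRPS of a Norm vs. an Emp model version).
    An interval that excludes zero indicates a significant difference."""
    rng = np.random.default_rng(seed)
    d = np.asarray(score_a, float) - np.asarray(score_b, float)  # paired differences
    idx = rng.integers(0, d.size, size=(n_boot, d.size))  # resample with replacement
    means = d[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

A 95% interval that lies entirely above zero indicates that the second model scores significantly lower (better) than the first.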

Still, some care has to be taken when assessing the approach of using an empirical distribution function as carried out in this study. When applying the same empirical distribution on each day, one implicitly assumes that the distribution of the standardized residuals is the same for each day. However, there is no theoretical basis that would warrant this assumption, and it seems plausible that, in the same way that the regression parameters of the autoregressive model may be different for different conditions, the standardized residuals might be described through different distributions for different conditions as well. In this case, the use

Fig. 13. Quantile–quantile plots for the empirical standardized residuals et, assuming a standard normal distribution as theoretical distribution, for the catchment Bulken (period 1.1.1962–31.12.2005, with parameters estimated from the same period). (a) Complete data. (b) The same plot as (a), but only displaying the central 95% of the data, leaving out the 2.5% smallest and 2.5% highest values of et.

of the same empirical distribution for each day, though superior to the use of the standard normal distribution, is still sub-optimal. Identifying different distributions for different conditions might be more difficult, as larger samples are necessary to clearly identify a distribution than to estimate regression parameters, and a further investigation of this aspect was beyond the scope of this study. Given the importance of using a correct distribution for generating a valuable (calibrated) probabilistic forecast, this is an aspect where further improvements of autoregressive hydrologic uncertainty processors could be made if different distributions were present and could be identified.
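Replacing the standard normal distribution by an empirical distribution function amounts to drawing quantiles from the calibration residuals themselves. A minimal sketch, assuming linear interpolation between order statistics at plotting positions k/(n+1) (the paper's exact construction of the empirical distribution function is not specified in this excerpt):

```python
import numpy as np

def empirical_quantile(residuals, u):
    """Quantile function of the empirical distribution of standardized
    residuals: interpolate between the order statistics at plotting
    positions k/(n+1); a sketch of replacing the standard normal
    quantile function. u outside [1/(n+1), n/(n+1)] is clamped."""
    x = np.sort(np.asarray(residuals, float))
    pp = np.arange(1, x.size + 1) / (x.size + 1.0)  # plotting positions
    return np.interp(u, pp, x)
```

For heavy-tailed residuals such as those visible in Fig. 13, the empirical upper quantiles exceed their standard normal counterparts, which is exactly the behaviour a fixed normal model cannot represent.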

4.2. State dependent parameters versus state independent parameters

Considering the simplicity of the autoregressive model of Eq. (2), it seems reasonable to assume that the parameters of the autoregressive model might not be constant but vary depending on the states of the model or the environment. Indeed, the results showed clearly that the simple autoregressive models with state independent parameters have a significantly poorer performance than the corresponding models with state dependent parameters. The state dependent parameterization chosen for this study is relatively detailed. It is possible that a simpler classification scheme might lead to an equivalent performance, or that alternative types of classifications may lead to similar or even better performances. Further investigation of these aspects was beyond the scope of this study. However, it was clearly shown that for an autoregressive error model used as hydrologic uncertainty processor, a significant improvement of the performance is achieved through a state dependent parameterization compared to a simple model with state independent parameters.

4.3. Log transformation versus square root transformation

For state independent models it was found that the models with logarithmic transformation perform significantly better than models with square root transformation. An explanation for this behaviour can be found when looking at plots of the standardized residuals et versus simulated streamflow Qsim(t). For most catchments, plots of models with logarithmic transformation show a fairly homoscedastic behaviour, while plots of models with square root transformation reveal a systematic increase of the variance of the residuals with increasing streamflow values. Fig. 14 gives an example of the plots for the catchment Fustvatn. The x-axis in these plots is scaled according to the rank of the simulated streamflow. This is done to assure a constant density of the points in the x-dimension. Otherwise, if the density in the x-direction is very inhomogeneous, as is usually the case for streamflow values on a linear or logarithmic scale, the visual impression might be distorted and a set of homoscedastic values may appear strongly non-homoscedastic. In Fig. 14a the standardized residuals for the state independent model with logarithmic transformation show a fairly homoscedastic behaviour, while in Fig. 14b the residuals for the state independent model with square root transformation show an increasing variance with increasing streamflow values. Thus, the assumption of a state independent variance σ is less justified for the SI models with square root transformation, and this is reflected in their inferior DRPS compared to the SI models with logarithmic transformation.

Fig. 14. Plots of the standardized residuals et versus the rank of the simulated streamflow Qsim(t) for models with state independent parameters for the catchment Fustvatn. (a) Model with logarithmic transformation of the original streamflow values. (b) Model with square root transformation of the original streamflow values.
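The rank-based diagnostic behind Fig. 14 can be summarized numerically as a variance profile over equal-count bins of the rank of simulated streamflow; a flat profile supports the assumption of a state independent σ, while a rising profile contradicts it. A minimal sketch (the function name is ours):

```python
import numpy as np

def std_by_rank_bins(q_sim, resid, n_bins=10):
    """Standard deviation of standardized residuals within equal-count bins
    of the rank of simulated streamflow (cf. the rank-scaled x-axis of
    Fig. 14); a roughly flat profile suggests homoscedasticity."""
    order = np.argsort(q_sim)            # rank-order the time steps
    r = np.asarray(resid, float)[order]
    bins = np.array_split(r, n_bins)     # equal-count bins along the rank axis
    return np.array([b.std(ddof=1) for b in bins])
```

For the square root transformation this profile rises with rank, as in Fig. 14b; for the logarithmic transformation it stays roughly flat.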

However, for models with state dependent parameters, there is no significant difference in performance between models with logarithmic transformation and models with square root transformation. The formulation of σt as dependent on the simulated streamflow (Eq. (10)), and the other flexibilities introduced with the state dependent formulation, can account for the more non-homoscedastic behaviour and other deficiencies that the SI models with square root transformation might have compared to the models with logarithmic transformation. The similar performance of Log and Sqrt models shows that the choice of transformation loses its importance for the models with state dependent parameters. It is likely that for a range of transformations of the Box–Cox type (Box and Cox, 1964) that lie in between the special cases of the logarithmic and square root transformation, the differences introduced by different transformations will be levelled out through the flexibility introduced with the state dependent formulation.
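The Box–Cox family referred to above contains the two transformations compared in this study as special cases: λ → 0 gives the logarithm, and λ = 0.5 is, up to an affine rescaling, the square root transformation. A one-line sketch:

```python
import numpy as np

def box_cox(q, lam):
    """Box-Cox transformation (Box and Cox, 1964): (q**lam - 1)/lam for
    lam != 0, and log(q) in the limit lam -> 0. lam = 0.5 corresponds,
    up to an affine rescaling, to the square root transformation."""
    q = np.asarray(q, dtype=float)
    return np.log(q) if lam == 0 else (q ** lam - 1.0) / lam
```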

5. Summary and conclusions

Eight different versions of autoregressive error models were investigated as hydrologic uncertainty processors for probabilistic streamflow forecasting. Evaluation with the discrete ranked probability score as forecast evaluation measure gave the following main findings.

(1) The variance of DRPS values for the same model version over different catchments is larger than differences between different model versions in the same catchment. This reflects the strong dependence of the quality of the probabilistic forecast on the quality of the underlying (updated) deterministic forecast.

(2) Given a certain catchment with its deterministic precipitation–runoff model, significant differences in model performance between different versions of the autoregressive hydrologic uncertainty processors could be detected:

(a) Models with state dependent parameters perform significantly better than corresponding models with state independent parameters.

(b) Models using an empirical distribution function to describe the standardized residuals perform significantly better than corresponding models using a standard normal distribution.

(c) For models with state independent parameters, those with a logarithmic transformation of the original streamflow values perform significantly better than those with a square root transformation. However, for models with state dependent parameters, this significance disappears and there is no difference in the performance of the logarithmic versus the square root transformation. The explanation is found in the flexibility that is introduced with the state dependent formulation, which can account for and alleviate the more non-homoscedastic behaviour that is found for the square root transformation.

The results give guidance when using an autoregressive error model as hydrologic uncertainty processor. If a simple model with constant parameters and the assumption of a standard normal distribution for the standardized residuals is chosen, the choice of transformation used to attain homoscedastic residuals is important, and the logarithmic transformation was in this study clearly superior to the square root transformation. If a more complex model is chosen, both the use of an empirical distribution function and a formulation with state dependent parameters will lead to an improved performance of the uncertainty processor. The best model performance is attained for models that use both an empirical distribution and state dependent parameters. For this type of model, and with the formulation of state dependence as used in this study, the transformation type no longer has a significant influence on the model performance.

Aspects that might lead to further improvements, not investigated in this study, are:

(1) The use of state dependent empirical distributions for the standardized residuals, analogous to the use of state dependent parameters.



(2) The investigation of alternative state dependent parameterization schemes. The scheme presented is relatively detailed. There might be less complex schemes that exhibit a performance equivalent to the one used in this study, or alternative schemes with better performance.

In addition to these findings, the study discussed the use of the discrete ranked probability score as an evaluation measure that allows evaluation over the whole range of streamflow values (apart from a certain discretization) and a direct averaging of the scores over several catchments for an overall evaluation and comparison of different methods.

Acknowledgements

We thank the Norwegian Water Resources and Energy Directorate (NVE) for the provision of the data, the ‘‘Nordic’’ HBV model program and computing facilities. We further thank Thomas Skaugen and Elin Langsholt for their information on the ‘‘Nordic’’ HBV model code and the uncertainty procedures operational at NVE. We are also very grateful to Lukas Gudmundsson and Klaus Vormoor for their discussions and comments, and we want to thank two anonymous reviewers for their valuable comments, which helped to improve the paper.

Appendix A. Parameter estimation for models with state dependent parameters

The iterative two-step procedure for parameter estimation for model versions with state dependent parameters works as follows. Input data to the calibration procedure are time series of the variables dt, dt−1, st, Tt, Pt, SWEt and Qsim(t), as defined in Section 2.2. In the beginning, the data set is divided into two subsets 1 and 2 for cases with Qsim(t) ≥ qthresh and Qsim(t) < qthresh, respectively. Parameters aj, b, Aj, B are derived from subset 1, and parameters aj*, b*, Aj*, B* are derived from subset 2. In the following paragraphs, derivation of the parameters for subset 1 is described. Derivation of the parameters for subset 2 is analogous.

• Step 1: The deterministic part of the error model is a linear model with response variable dt and two predictor variables dt−1 and zt = st dt−1:

dt = ai(t) dt−1 + b zt   (A1)

The parameters are estimated with ordinary least squares, which in the case of normal distribution of the residuals are equivalent to maximum likelihood estimates.

• Step 2: With the parameters estimated in the first step, the residuals of the deterministic component are calculated as

et = dt − ai(t) dt−1 − b zt   (A2)

Under the assumption of normal distribution of the residuals et, the squared residuals et² follow a Chi-square distribution with one degree of freedom, which is a special case of the Gamma distribution. Also, the expected value of the squared residuals is equal to the variance:

E(et²) = σt²   (A3)

With the relation for the standard deviation given in Eq. (10) it follows that:

ln σt² = 2 ln σt = 2Ai(t) + 2B st   (A4)

Combining Eqs. (A3) and (A4) leads to

ln(E(et²)) = 2Ai(t) + 2B st   (A5)

If the log-transformed expected values of the squared residuals et² are described through a linear predictor and the squared residuals themselves follow a Gamma distribution, then Eq. (A5) constitutes a generalized linear model (Faraway, 2006) with response variable et², predictor variable st, the natural logarithm as link function, and Gamma distribution of the response variable. The parameters of the generalized linear model are estimated using iteratively reweighted least squares (IRWLS), which are equivalent to maximum likelihood estimates (Faraway, 2006; McCullagh and Nelder, 1989).

• Iteration: Once parameters Aj and B are estimated, the variance σt² is calculated for each time step from Eq. (10). Then steps one and two are iterated, with the only modification that the parameters in step one are now estimated with a weighted linear regression using the reciprocal of the variance, 1/σt², as weights. The rationale is that cases with a larger variance in the linear regression of step one should receive less weight than cases with a smaller variance. The iterations are stopped once convergence of the parameters is obtained. Practice showed that the algorithm converges very quickly; ten iterations were enough to assure convergence of the parameters in this study.

For the sake of clarity, regression Eqs. (A1) and (A5) have been written without individual appearance of the parameters a1, …, a5 and A1, …, A5. However, individual appearance of the parameters, as is necessary for a formal derivation of the regression parameters, can be achieved by introducing five indicator variables

Ij,t = 1 for i(t) = j, and Ij,t = 0 for i(t) ≠ j;  j = 1, …, 5   (A6)

where i(t) is the number of the meteorological and snow state as defined in Table 1. Defining further the five variables dj,t−1 = Ij,t · dt−1, j = 1, …, 5, regression Eqs. (A1) and (A5) can be rewritten as

dt = a1 d1,t−1 + a2 d2,t−1 + a3 d3,t−1 + a4 d4,t−1 + a5 d5,t−1 + b zt   (A7)

and

ln(E(et²)) = 2A1 I1,t + 2A2 I2,t + 2A3 I3,t + 2A4 I4,t + 2A5 I5,t + 2B st   (A8)

respectively.

Under the assumption that the standardized residuals et in Eq. (13) are standard normally distributed, the ordinary least squares estimates of step one and the iteratively reweighted least squares estimates from step two are equivalent to maximum likelihood estimates. If the distributional assumptions are violated, the estimates are no longer strict maximum likelihood estimates. However, it is assumed that moderate violations of the distributional assumptions do not have a strong negative effect on the estimates of the parameters.
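The two-step procedure of Appendix A can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: the function name, the crude IRLS starting values and the variance floor are our choices. Step 2 exploits the fact that for a Gamma response with log link the IRLS working weights are constant, so each IRLS update reduces to an ordinary least squares fit of the working response.

```python
import numpy as np

def fit_state_dependent_error_model(d, d_lag, s, state, n_states=5, n_iter=10):
    """Sketch of the iterative two-step estimation of Appendix A.
    d: residual d_t; d_lag: d_{t-1}; s: predictor s_t;
    state: i(t) coded as integers 0, ..., n_states-1."""
    I = np.eye(n_states)[state]                        # indicator variables, Eq. (A6)
    X1 = np.column_stack([I * d_lag[:, None], s * d_lag])  # Eq. (A7): a_1..a_5 and b
    X2 = np.column_stack([2.0 * I, 2.0 * s])               # Eq. (A8): A_1..A_5 and B
    w = np.ones(d.size)                                # weights 1/sigma_t^2 (1 at start)
    for _ in range(n_iter):
        # Step 1: (weighted) least squares for the deterministic part, Eq. (A1)
        sw = np.sqrt(w)
        beta1, *_ = np.linalg.lstsq(X1 * sw[:, None], d * sw, rcond=None)
        e = d - X1 @ beta1                             # residuals, Eq. (A2)
        # Step 2: Gamma GLM with log link for e_t^2, Eq. (A5), via IRLS;
        # for Gamma/log-link the working weights are constant.
        y = np.maximum(e ** 2, 1e-12)                  # variance floor (our choice)
        beta2 = np.zeros(X2.shape[1])
        beta2[:n_states] = 0.5 * np.log(y.mean())      # crude starting value
        for _ in range(25):
            eta = X2 @ beta2
            mu = np.exp(eta)
            z = eta + (y - mu) / mu                    # IRLS working response
            beta2, *_ = np.linalg.lstsq(X2, z, rcond=None)
        sigma2 = np.exp(X2 @ beta2)                    # ln sigma_t^2, Eq. (A4)
        w = 1.0 / sigma2                               # weights for the next iteration
    return beta1, beta2                                # (a_1..a_5, b), (A_1..A_5, B)
```

With the indicator columns of Eqs. (A6)–(A8), the returned vectors hold (a1, …, a5, b) and (A1, …, A5, B) for one subset; the starred parameters of the second subset are obtained by the same call on the other data subset.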

References

Bergström, S., 1976. Development and Application of a Conceptual Runoff Model for Scandinavian Catchments. SMHI RHO 7. Swedish Meteorological and Hydrological Institute, Norrköping, Sweden.

Bergström, S., 1992. The HBV Model – its Structure and Applications. SMHI Reports Hydrology No. 4, Swedish Meteorological and Hydrological Institute, Norrköping, Sweden.

Bougeault, P., 2003. The WGNE Survey of Verification Methods for Numerical Prediction of Weather Elements and Severe Weather Events. Météo-France, Toulouse, France. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Bougeault/Bougeault_Verification-methods.htm> (accessed 26.11.10).

Box, G.E.P., Cox, D.R., 1964. An analysis of transformations. J. Roy. Stat. Soc. Ser. B 26 (2), 211–252.

Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78 (1), 1–3. doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.



Cloke, H.L., Pappenberger, F., 2009. Ensemble flood forecasting: a review. J. Hydrol. 375 (3–4), 613–626. doi:10.1016/j.jhydrol.2009.06.005.

Doherty, J., 2004. PEST: Model Independent Parameter Estimation. User Manual, fifth ed. Watermark Numerical Computing, Brisbane, Australia.

Engeland, K., Gottschalk, L., 2002. Bayesian estimation of parameters in a regional hydrological model. Hydrol. Earth Syst. Sci. 6 (5), 883–898. doi:10.5194/hess-6-883-2002.

Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York, NY, US.

Epstein, E.S., 1969. A scoring system for probability forecasts of ranked categories. J. Appl. Meteorol. 8 (6), 985–987. doi:10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.

Faraway, J.J., 2006. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Chapman & Hall/CRC, Boca Raton, FL, US.

Jaun, S., Ahrens, B., 2009. Evaluation of a probabilistic hydrometeorological forecast system. Hydrol. Earth Syst. Sci. 13 (7), 1031–1043. doi:10.5194/hess-13-1031-2009.

Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. J. Basic Eng. 82 (1), 35–64.

Krzysztofowicz, R., 1999. Bayesian theory of probabilistic forecasting via deterministic hydrological model. Water Resour. Res. 35 (9), 2739–2750. doi:10.1029/1999WR900099.

Krzysztofowicz, R., Kelly, K.S., 2000. Hydrologic uncertainty processor for probabilistic river stage forecasting. Water Resour. Res. 36 (11), 3265–3277. doi:10.1029/2000WR900108.

Kuczera, G., 1983. Improved parameter inference in catchment models: 1. Evaluating parameter uncertainty. Water Resour. Res. 19 (5), 1151–1162. doi:10.1029/WR019i005p01151.

Langsrud, Ø., Frigessi, A., Høst, G., 1998. Pure Model Error of the HBV Model. HYDRA Note No. 4, Norwegian Water Resources and Energy Directorate, Oslo, Norway.

Lawrence, D., Haddeland, I., Langsholt, E., 2009. Calibration of HBV Hydrological Models Using PEST Parameter Estimation. Report No. 1 – 2009, Norwegian Water Resources and Energy Directorate, Oslo, Norway.

Levenberg, K., 1944. A method for the solution of certain problems in least squares. Quart. Appl. Math. 2, 164–168.

Lundberg, A., 1982. Combination of a conceptual model and an autoregressive model for improving short time forecasting. Nord. Hydrol. 13 (4), 233–246.

Marquardt, D., 1963. An algorithm for least-squares estimation of non-linear parameters. J. Soc. Ind. Appl. Math. 11 (2), 431–441. doi:10.1137/0111030.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman & Hall, London, UK.

Mohr, M., Tveito, O.E., 2008. Daily temperature and precipitation maps with 1 km resolution derived from Norwegian weather observations. In: Extended Abstract from 17th Conference on Applied Climatology, American Meteorological Society, 11–14 August, Whistler, BC, Canada. <http://ams.confex.com/ams/pdfpapers/141069.pdf> (accessed 26.11.10).

Molteni, F., Buizza, R., Palmer, T.N., Petroliagis, T., 1996. The ECMWF ensemble prediction system: methodology and validation. Quart. J. Roy. Meteorol. Soc. 122 (529), 73–119. doi:10.1002/qj.49712252905.

Murphy, A.H., 1971. A note on the ranked probability score. J. Appl. Meteorol. 10 (1), 155–156. doi:10.1175/1520-0450(1971)010<0155:ANOTRP>2.0.CO;2.

Murphy, A.H., 1973. A new vector partition of the probability score. J. Appl. Meteorol. 12 (4), 595–600.

Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models: Part I – A discussion of principles. J. Hydrol. 10 (3), 282–290. doi:10.1016/0022-1694(70)90255-6.

Nurmi, P., 2003. Recommendations on the Verification of Local Weather Forecasts (at ECMWF Member States). Consultancy Report, ECMWF Operations Department. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Rec_FIN_Oct.pdf> (accessed 26.11.10).

Seo, D.-J., Koren, V., Cajina, N., 2003. Real-time variational assimilation of hydrologic and hydrometeorological data into operational hydrologic forecasting. J. Hydrometeor. 4 (3), 627–641.

Seo, D.-J., Herr, H.D., Schaake, J.C., 2006. A statistical post-processor for accounting of hydrologic uncertainty in short-range ensemble streamflow prediction. Hydrol. Earth Syst. Sci. Discuss. 3 (4), 1987–2035. doi:10.5194/hessd-3-1987-2006.

Stanski, H.R., Wilson, L.J., Burrows, W.R., 1989. Survey of common verification methods in meteorology. WMO World Weather Watch Technical Report No. 8, WMO/TD No. 358, second ed. Atmospheric Environment Service, Downsview, Canada. <http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/Stanski_et_al/Stanski_et_al.html> (accessed 26.11.10).

Stephenson, D.B., Coelho, C.A.S., Jolliffe, I.T., 2008. Two extra components in the Brier score decomposition. Weather Forecast. 23 (4), 752–757. doi:10.1175/2007WAF2006116.1.

Sælthun, N.R., 1996. The ‘‘Nordic’’ HBV model. Norwegian Water Resources and Energy Administration Publication No. 7, Oslo, Norway.

Todini, E., 2004. Role and treatment of uncertainty in real-time flood forecasting. Hydrol. Process. 18 (14), 2743–2746. doi:10.1002/hyp.5687.

Toth, E., Montanari, A., Brath, A., 1999. Real-time flood forecasting via combined use of conceptual and stochastic models. Phys. Chem. Earth, Part B 24 (7), 793–798. doi:10.1016/S1464-1909(99)00082-9.

Toth, Z., Talagrand, O., Candille, G., Zhu, Y., 2003. Probability and ensemble forecasts. In: Jolliffe, I.T., Stephenson, D.B. (Eds.), Forecast Verification: a Practitioner's Guide in Atmospheric Science. John Wiley & Sons, Chichester, UK, pp. 137–164.

Wilks, D.S., 1995. Statistical Methods in Atmospheric Sciences: an Introduction. International Geophysics Series, vol. 59. Academic Press, San Diego, CA, US.

World Meteorological Organization, 1992. Simulated Real-time Intercomparison of Hydrological Models. Operational Hydrology Report No. 38, WMO Publication No. 779, World Meteorological Organization, Geneva, Switzerland.

WWRP/WGNE Joint Working Group on Forecast Verification Research, 2010. Forecast verification: issues, methods and FAQ. <http://www.cawcr.gov.au/projects/verification/> (accessed 26.11.10).

Xu, C.-Y., 2001. Statistical analysis of parameters and residuals of a conceptual water balance model – methodology and case study. Water Resour. Manage. 15 (2), 75–92. doi:10.1023/A:1012559608269.