SAMS 2000 User's Manual.pdf


Stochastic Analysis,

Modeling, and Simulation (SAMS)

Version 2000 USER's MANUAL

J. D. Salas, N. Saada, C. H. Chung, W. L. Lane, and D. K. Frevert

October, 2000

Computing Hydrology Laboratory

Water Resources, Hydrologic and Environmental Sciences

Engineering Research Center

Fort Collins, Colorado

TECHNICAL REPORT No.10



1 Professor, Water Resources, Hydrologic and Environmental Sciences, Civil Engineering Department, Colorado State University.

2 Former graduate students, Water Resources, Hydrologic and Environmental Sciences, Civil Engineering Department, Colorado State University.

3 Consultant, Hydrology and Water Resources Engineering, 1091 Xenophon St., Golden, CO 80401-4218.

4 Hydraulic Engineer, Water Resources Services, Technical Service Center, U.S. Bureau of Reclamation, Denver, CO 80225.

Stochastic Analysis, Modeling, and

Simulation (SAMS)

Version 2000 - User's Manual

by

Jose D. Salas1, Nidhal Saada2, and Chen-hua Chung2

Water Resources, Hydrologic and Environmental Sciences

Department of Civil Engineering, Colorado State University

Fort Collins, Colorado, U.S.A

William L. Lane3

Consultant, Hydrology and Water Resources Engineering,

1091 Xenophon St., Golden, CO 80401-4218.

and

Donald K. Frevert4

U.S Department of Interior

Bureau of Reclamation

Denver, Colorado

U.S.A



TABLE OF CONTENTS

PREFACE

ACKNOWLEDGEMENTS

1. INTRODUCTION

2. DESCRIPTION OF SAMS
    2.1 General Overview
    2.2 Statistical Analysis of Data
    2.3 Fitting a Stochastic Model
    2.4 Generating Synthetic Series

3. DEFINITION OF STATISTICAL CHARACTERISTICS
    3.1 Basic Statistics
        3.1.1 Annual Data
        3.1.2 Seasonal Data
    3.2 Flood, Storage, and Drought Related Statistics
        3.2.1 Storage Related Statistics
        3.2.2 Drought Related Statistics
        3.2.3 Surplus Related Statistics

4. MATHEMATICAL MODELS
    4.1 Data Transformations and Standardization
    4.2 Univariate ARMA (p,q) Model
    4.3 Univariate GAR (1) Model
    4.4 Univariate PARMA (p,q) Model
    4.5 Multivariate MAR (p) Model
    4.6 Multivariate CARMA (p,q) Model
    4.7 Multivariate MPAR (p) Model
    4.8 Disaggregation Models
        4.8.1 General
        4.8.2 Model Formulations
    4.9 Model Testing

5. EXAMPLES
    5.1 Statistical Analysis of Data
    5.2 Stochastic Modeling and Generation of Data
        5.2.1 Univariate ARMA(p,q) Model
        5.2.2 Univariate GAR(1) Model
        5.2.3 Univariate PARMA(p,q) Model
        5.2.4 Multivariate MAR(p) Model
        5.2.5 Multivariate CARMA(p,q) Model
        5.2.6 Disaggregation Models

REFERENCES

APPENDIX A

APPENDIX B

APPENDIX C



PREFACE

Several computer packages have been developed since the 1970s for analyzing the stochastic characteristics of time series in general and hydrologic and water resources time series in particular. For instance, the LAST package was developed in 1977-1979 by the US Bureau of Reclamation (USBR) in Denver, Colorado. Originally the package was designed to run on a mainframe computer, but later it was modified for use on personal computers. While various additions and modifications have been made to LAST over the past twenty years, the package has not kept pace with either advances in time series modeling or advances in computer technology. These facts prompted USBR to promote the initial development of SAMS, a computer software package that deals with the Stochastic Analysis, Modeling, and Simulation of hydrologic time series, particularly annual and seasonal streamflow series. It is written in C and Fortran and runs under modern Windows operating systems such as Windows NT and Windows 98. This manual describes the current version of SAMS, denoted as SAMS 2000.

ACKNOWLEDGEMENTS

SAMS has been developed as a cooperative effort between USBR and Colorado State University (CSU) under the USBR Advanced Hydrologic Techniques Research Project through an Interagency Personal Agreement with Professor Jose D. Salas as Principal Investigator. Drs. W.L. Lane and D.K. Frevert provided additional expert guidance and supervision on behalf of USBR. Several former CSU graduate students collaborated in various parts of this project, including M.W. AbdelMohsen, who developed many of the Fortran codes, and M. Ghosh, who initiated the programming in C language, followed by Mr. Bradley Jones, Nidhal M. Saada, and Chen-Hua Chung. Acknowledgements are due to the funding agency and to the several students who collaborated in this project.



STOCHASTIC ANALYSIS, MODELING, AND SIMULATION

(SAMS 2000)

1. INTRODUCTION

Stochastic simulation of water resources time series in general and hydrologic time series

in particular has been widely used for several decades for various problems related to planning and

management of water resources systems. Typical examples are determining the capacity of a

reservoir, evaluating the reliability of a reservoir of a given capacity, evaluating the adequacy of

a water resources management strategy under various potential hydrologic scenarios, and evaluating

the performance of an irrigation system under uncertain irrigation water deliveries (Salas et al., 1980;

Loucks et al., 1981).

Stochastic simulation of hydrologic time series such as streamflow is typically based on

mathematical models. For this purpose a number of stochastic models have been suggested in the

literature (Salas, 1993; Hipel and McLeod, 1994). Using one type of model or another for a

particular case at hand depends on several factors such as the physical and statistical characteristics of

the process under consideration, data availability, the complexity of the system, and the overall

purpose of the simulation study. Given the historical record, one would like the model to reproduce

the historical statistics. This is why a standard step in streamflow simulation studies is to determine

the historical statistics. Once a model has been selected, the next step is to estimate the model

parameters, then to test whether the model represents reasonably well the process under

consideration, and finally to carry out the needed simulation study.

The advent of digital computers several decades ago led to the development of computer software for mathematical and statistical computations of varied degrees of sophistication. For instance, well known packages are IMSL, STATGRAPHICS, ITSM, MINITAB, SAS/ETS, SPSS, and MATLAB. These packages can be very useful for standard time series analysis of hydrological processes. However, despite the availability of such general purpose programs, specialized software for simulation of hydrological time series such as streamflow has been attractive for several reasons. One is the particular nature of hydrological processes, in which periodic properties are important in the mean, variance, covariance, and skewness. Another is that some hydrologic time series include complex characteristics such as long term dependence and memory. Still another is that many of the stochastic models useful in hydrology and water resources have been developed specifically to fit the needs of water resources, for instance temporal and spatial disaggregation models. Examples of software specifically oriented to hydrologic time series simulation are HEC-4 (U.S. Army Corps of Engineers, 1971), LAST (Lane and Frevert, 1990), and SPIGOT (Grygier and Stedinger, 1990).

The LAST package was developed during 1977-1979 by the U. S. Bureau of Reclamation

(USBR). Originally, the package was designed to run on a mainframe computer (Lane, 1979) but

later it was modified for use on personal computers (Lane and Frevert, 1990). While various

additions and modifications have been made to LAST over the past 20 years, the package has not

kept pace with either advances in time series modeling or advances in computer technology. This

is especially true of the computer graphics. These facts prompted USBR to promote the initial

development of the SAMS package. The first version of SAMS (SAMS-96.1) was released in 1996.

Since then, corrections and modifications were made based on feedback received from the users.

In addition, new functions and capabilities have been implemented.

SAMS 2000 has the following capabilities and limitations:

1. It analyzes annual and seasonal data. For seasonal data the maximum number of seasons (time intervals within a year) is 12.

2. It includes several types of transformation options to transform the original data to normal.

3. It includes a number of single site, multisite, and disaggregation stochastic models that have been widely used in the literature.

4. It includes two major modeling schemes for modeling and generation of complex river network

systems.

5. Maximum number of stations is 40.

6. Maximum number of stations for a group (for purposes of multivariate disaggregation) is 10.

7. Maximum number of years for the input data file is 600.

8. The number of samples that can be generated is unlimited.

9. The number of years that can be generated is unlimited.
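The size limits in the list above can be checked before an input file is prepared. The following is a minimal sketch in Python, not part of SAMS itself; the function name `check_sams_limits` and its arguments are hypothetical and only encode the numbers quoted in items 1, 5, 6, and 7.

```python
# Limits quoted in the capability list for SAMS 2000.
MAX_SEASONS = 12         # item 1: seasons (time intervals) per year
MAX_STATIONS = 40        # item 5: stations in total
MAX_GROUP_STATIONS = 10  # item 6: stations per disaggregation group
MAX_YEARS = 600          # item 7: years in the input data file


def check_sams_limits(n_seasons, n_stations, n_years, group_sizes=()):
    """Return a list of limit violations (empty if the data set fits).

    Hypothetical helper: the argument names are assumptions made for
    this illustration, not SAMS input-file keywords.
    """
    problems = []
    if not 1 <= n_seasons <= MAX_SEASONS:
        problems.append(f"seasons per year: {n_seasons} (max {MAX_SEASONS})")
    if not 1 <= n_stations <= MAX_STATIONS:
        problems.append(f"stations: {n_stations} (max {MAX_STATIONS})")
    if not 1 <= n_years <= MAX_YEARS:
        problems.append(f"years of record: {n_years} (max {MAX_YEARS})")
    for size in group_sizes:
        if size > MAX_GROUP_STATIONS:
            problems.append(f"group size: {size} (max {MAX_GROUP_STATIONS})")
    return problems
```

For example, a monthly data set with 40 stations and 600 years passes, while 13 seasons would be reported as a violation.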

The purpose of this manual is to provide a detailed description of the current version of

SAMS developed for the stochastic simulation of hydrologic time series such as annual and monthly

streamflows.



Fig. 1 SAMS main menu

Fig. 2 File menu

2. DESCRIPTION OF SAMS

In section 2.1, a general description of

SAMS is presented in which different operations

undertaken by SAMS are briefly explained.

Then, each operation is explained and illustrated

in subsequent sections more thoroughly.

2.1 General Overview

SAMS is a computer software package

that deals with the stochastic analysis, modeling,

and simulation of hydrologic time series. It is

written in C and Fortran and runs under modern

windows operating systems such as WINDOWS

NT and WINDOWS 98. The package consists of many menu option windows that enable the

user to choose among the options that are currently available. SAMS 2000 is a modified and

expanded version of SAMS-96.1. It consists of three primary application modules: 1) Statistical

Analysis of Data, 2) Fitting a Stochastic Model (includes parameter estimation and testing), and 3)

Generating Synthetic Series. Figure 1 shows the SAMS main menu. The user can select any of the

main modules by clicking on the desired option shown in this menu. Before running the

applications, the user must select (open) a file

that contains the (historical) input data. This can

be done by clicking on the "File Menu" option

shown on the top part of the main menu. This

will take the user to another menu, as shown in

Fig.2. Then the user may Open A File (select

a data file) and Display Current Data File

where the content of the opened file can be seen.

Examples of seasonal and annual input files are

shown in Appendices A and B, respectively.

SAMS has the capability of analyzing

single site and multisite annual and seasonal data



and the results of the analysis are presented in graphical or tabular forms or are written on output

files. The current version of SAMS can be applied to annual and seasonal data, such as quarterly

and monthly data.

The Statistical Analysis of Data module consists of data plotting, checking the normality

of the data, data transformation, and statistical characteristics of the data. Plotting the data may help

detect trends, shifts, outliers, or errors in the data. Probability plots are included for verifying

the normality of the data. The data can be transformed to normal by using different transformation

techniques. Currently, logarithmic, power, and Box-Cox transformations are available. SAMS

determines a number of statistical characteristics of the data. These include basic statistics such as

mean, standard deviation, skewness, serial correlations (for annual data), season-to-season

correlations (for seasonal data), annual and seasonal cross-correlations for multisite data, and

drought, surplus, and storage related statistics. These statistics are important in investigating the

stochastic characteristics of the data.
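The basic statistics named in this paragraph can be sketched as follows for a single annual series. This is an illustration in Python, not SAMS code; the estimators shown (n-1 divisor for the variance, moment-based skewness, lag-k correlation normalized by the lag-0 sum of squares) are common textbook choices and are an assumption here. The definitions SAMS actually uses are given in Section 3 of the manual.

```python
import math


def basic_statistics(x):
    """Mean, standard deviation, CV, skewness, max and min of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    std = math.sqrt(sum(d * d for d in dev) / (n - 1))  # assumed n-1 divisor
    m2 = sum(d * d for d in dev) / n                    # second central moment
    m3 = sum(d ** 3 for d in dev) / n                   # third central moment
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,          # undefined if the mean is zero
        "skew": m3 / m2 ** 1.5,
        "max": max(x),
        "min": min(x),
    }


def serial_correlation(x, lag):
    """Lag-k serial correlation coefficient of a series."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return ck / c0
```

Applied to each station in turn, the same two functions give the annual statistics; cross-correlations and the drought, surplus, and storage statistics are not covered by this sketch.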

The second main application of SAMS, Fitting a Stochastic Model, includes parameter

estimation and model testing for alternative univariate and multivariate stochastic models. The

following models are included: (1) univariate ARMA(p,q) model, where p and q can vary from 1

to 10, (2) univariate GAR(1) model, (3) univariate periodic PARMA(p,q) model, (4) univariate

seasonal disaggregation, (5) multivariate autoregressive MAR(p) model, (6) contemporaneous

multivariate CARMA(p,q) model, where p and q can vary from 1 to 10, (7) multivariate periodic

MPAR(p) model, (8) multivariate annual (spatial) disaggregation model, and (9) multivariate

temporal disaggregation model. Two estimation methods are available, namely the method of

moments (MOM) and the least squares method (LS). MOM is available for most of the models

while LS is available only for univariate ARMA, PARMA, and CARMA models. For CARMA

models, both the method of moments (MOM) and the method of maximum likelihood (MLE) are

available for estimation of the variance-covariance (G) matrix. Regarding multivariate annual

(spatial) disaggregation models, parameter estimation is based on Valencia-Schaake or Mejia-

Rousselle methods, while for annual to seasonal (temporal) disaggregation Lane's condensed method

is applied.

For stochastic simulation at several sites in a stream network system, a direct modeling approach based on multivariate autoregressive and CARMA processes is available for annual data, and a multivariate periodic autoregressive process is available for seasonal data. In addition, two schemes based on disaggregation principles are available. For this purpose, it is convenient to divide the stations into key stations, substations, and subsequent stations. Generally the key stations are the farthest downstream stations, substations are the next upstream stations, and subsequent stations are the next further upstream stations. In the first scheme, the annual flows at the key stations are added, creating an annual flow series at an artificial or index station. Subsequently, a univariate ARMA(p,q) model is fitted to the annual flows of the index station. Then, a spatial disaggregation model relating the annual flows of the index station to the annual flows of the key stations is fitted. Further, a disaggregation model relating the annual flows of the key stations to those of the substations, and another disaggregation model relating the annual flows of the substations to those of the subsequent stations, are fitted. In effect, this is a three-level (spatial) disaggregation procedure. In the second scheme a multivariate AR(p) model is fitted to the annual data of the key stations; the remaining models relating the annual flows at the key stations, substations, and subsequent stations are then fitted in the same manner as in the first scheme. Furthermore, if the objective of the modeling exercise is to generate seasonal data by using disaggregation approaches, then an additional temporal disaggregation model is fitted that relates the annual flows of a group of stations to the corresponding seasonal flows.
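The first step of the first scheme, building the index station, is a year-by-year sum of the key-station annual flows. A minimal Python sketch is given below; the function name and the dictionary layout (station name mapped to a list of annual flows) are assumptions made for illustration and do not reflect the SAMS input format.

```python
def index_station_flows(key_station_flows):
    """Sum the annual flows of the key stations year by year.

    key_station_flows: dict mapping a station name to its list of annual
    flows; all stations are assumed to cover the same years in the same
    order.
    """
    series = list(key_station_flows.values())
    if len({len(s) for s in series}) != 1:
        raise ValueError("all key stations must have the same record length")
    # zip(*series) yields one tuple per year with the flow at each station.
    return [sum(year_values) for year_values in zip(*series)]
```

The resulting series is the one to which the univariate ARMA(p,q) model of the first scheme would be fitted.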

The third main application of SAMS is Generating Synthetic Series, i.e. simulating

synthetic data. Data generation is based on the models, approaches, and schemes as mentioned

above. The model parameters for data generation can be those which are estimated by SAMS or

they can be provided by the user. If provided by the user, the program prompts the user to insert the

model type and then the model parameters. The statistical characteristics of the generated data are

presented in graphical or tabular forms along with the historical statistics of the data that was used

in fitting the generating model. The generated data including the "generated" statistics can be

displayed graphically or in table form, and be printed and/or written on specified output files. As

a matter of clarification, we will summarize here the overall data generation procedure for

generating seasonal data based on scheme 2:

(a) a multivariate AR(p) model is used to generate annual flows at the key stations;

(b) a spatial disaggregation model is used to disaggregate the generated annual flows at the key

stations into annual flows at the substations;



Fig.3 Statistical analysis menu

(c) a spatial disaggregation model is used to disaggregate the generated annual flows at the substations

into annual flows at the subsequent stations;

(d) a temporal disaggregation model is used to disaggregate the annual flows at a group of stations

into the corresponding seasonal flows at those stations.
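The order of operations in steps (a) to (d) can be illustrated with a toy chain in Python. This sketch makes strong simplifying assumptions and is not the SAMS algorithm: a univariate AR(1) process at a single key station stands in for the multivariate AR(p) model, and `disaggregate` is a hypothetical additive split (fixed weights plus zero-sum noise) rather than the Valencia-Schaake, Mejia-Rousselle, or Lane formulations. All parameter values are made up.

```python
import random


def generate_ar1(n_years, mean, std, phi, rng):
    """Generate annual flows from a stationary AR(1) process."""
    noise_std = std * (1.0 - phi ** 2) ** 0.5
    z = rng.gauss(0.0, std)  # start from the stationary distribution
    flows = []
    for _ in range(n_years):
        z = phi * z + rng.gauss(0.0, noise_std)
        flows.append(mean + z)
    return flows


def disaggregate(total, weights, noise_std, rng):
    """Split a total into parts that add back up to the total.

    weights must sum to one; the noise is centered so it sums to zero.
    Parts can be negative in this toy version.
    """
    noise = [rng.gauss(0.0, noise_std) for _ in weights]
    mean_noise = sum(noise) / len(noise)
    return [w * total + (e - mean_noise) for w, e in zip(weights, noise)]


rng = random.Random(1)
# (a) annual flows at one key station
key_annual = generate_ar1(50, 100.0, 20.0, 0.3, rng)
# (b) key station -> two substations; step (c) would repeat this one level up
sub_annual = [disaggregate(q, [0.6, 0.4], 2.0, rng) for q in key_annual]
# (d) annual flow at the first substation -> four seasons
seasonal = [disaggregate(parts[0], [0.25] * 4, 1.0, rng) for parts in sub_annual]
```

The point of the sketch is the structure: each level is generated conditionally on the level above it, and the disaggregated values add up to the value being disaggregated.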

In modeling and data generation of complex water resources systems involving many

stations, despite the versatility of SAMS 2000, keeping track of different options, components,

parameters, etc. involved can be a time consuming and confusing task. To help alleviate this

problem, a Status button (see Fig. 3) can be activated. The user can review the current

transformation, modeling, and generation status and related information by clicking on the Status

button in any menu or window.

2.2 Statistical Analysis of Data

Figure 3 shows the statistical analysis data menu. By selecting the annual or seasonal button

the user can specify the type of data to be analyzed. Then, the following operations can be selected:

1. Plot time series data.

2. Check normality and transform time series.

3. Statistical characteristics of time series.

In the following sections, we will examine and

illustrate each of these options.

Plot Time Series Data

Plotting of the data can help in detecting

trends, shifts, outliers, and errors in the data.

SAMS can plot the data as curve, stick, and bar

graphs. Figure 4 illustrates a time series plot for

annual data. The scale of the plot is determined

based on the sample maximum and minimum as

shown in the control bar at the bottom, but the user

can change it by keying in the desired graph scale

range. This enables the user to zoom in and out of the plot to examine the data and to check

the variability of the data graphically on screen. Note that if the station names or IDs are available

in the input data file, they will be shown on the plots or tables.



Fig.4 Plotting of annual time series

Check Normality and Transform Time series

SAMS tests the normality of the data by plotting the data on normal probability paper and by using the skewness test of normality. To examine the adequacy of a transformation, the theoretical generated distribution based on the transformation is plotted together with its historical sample counterpart, as shown in Fig. 5 for annual data. For seasonal data, the results of the seasonal skewness tests are presented in graphical and tabular formats. The critical values of the test are also shown on the screen as guides for checking whether the data is within the normal range. For example, if the sample skewness coefficient for a given season is less than or equal to the critical value, the hypothesis of normality of the data cannot be rejected. On the other hand, if the sample skewness coefficient is greater than the critical value, the hypothesis of normality is rejected. In addition, for the specified season, the normal probability plot for the transformed seasonal data and the comparison of the theoretical generated distribution and the sample distribution for that season are also displayed.
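The decision rule of the skewness test can be sketched in Python as follows. The critical value used here is the large-sample approximation in which the sample skewness of normal data is roughly normal with mean zero and variance 6/N; this approximation, the use of the absolute value of the skewness, and the default significance level are assumptions of the sketch. SAMS may rely on tabulated critical values instead, which differ for short records.

```python
import math
from statistics import NormalDist


def sample_skewness(x):
    """Moment-based sample skewness coefficient."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def skewness_test(x, alpha=0.10):
    """Return (skewness, critical value, normality_not_rejected)."""
    g = sample_skewness(x)
    # Large-sample approximation: g ~ Normal(0, 6/N) under normality.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(6.0 / len(x))
    return g, critical, abs(g) <= critical
```

For seasonal data the same function would be applied to the values of each season separately, giving one test result per season.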



Fig.5 Annual data transformation result

If the data at hand is not normal, one can check whether it can be normalized by a certain

transformation function. This can be done by clicking on "Transformations" button and a menu with

different types of transformations will appear. Fig. 6 shows the transformation menu for seasonal

data. The user can choose any type of transformation by simply clicking on the corresponding

button. Three types of transformations are available: logarithmic, power, and Box-Cox

transformations. The transformation can be done all at once for all seasons or on a season by season

basis. The user can choose any of the above transformations and accordingly key in the

transformation coefficients, then click the "Display" button to preview the transformation result.
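As an illustration of the three transformation types, the Python sketch below uses commonly used forms with a shift coefficient a: ln(x + a), (x + a)^b, and ((x + a)^lam - 1)/lam. These forms and the coefficient names are assumptions of the sketch; the exact definitions used by SAMS are those in the section on data transformations (Section 4.1).

```python
import math


def log_transform(x, a=0.0):
    """Logarithmic transformation y = ln(x + a); requires x + a > 0."""
    return [math.log(v + a) for v in x]


def power_transform(x, a=0.0, b=0.5):
    """Power transformation y = (x + a)^b."""
    return [(v + a) ** b for v in x]


def box_cox_transform(x, lam, a=0.0):
    """Box-Cox transformation; reduces to the logarithm when lam == 0."""
    if lam == 0:
        return log_transform(x, a)
    return [((v + a) ** lam - 1.0) / lam for v in x]
```

Trying several coefficient values and re-checking normality after each one mirrors the "Display" preview loop described above.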

Clicking on the "Accept Transformation" button will actually conduct the transformation for

the data of the current station and store the transformation type and coefficients in memory. From this

point, SAMS will recognize the transformed data as the default data and will process this data

instead of the original data. For clarification, suppose that the user has chosen to transform the

annual data for site 1 by a logarithmic transformation and accepted the transformation by clicking



Fig.6 Transformation menu for

seasonal data

on the "Accept Transformation" button.

Suppose further that the user wants to model

site 1 data with an ARMA (p,q) model. Then,

the ARMA model will be fitted to the

transformed data and not the original data.

The question that can be raised here is: can I

get the model to fit the original data (without

having to start the whole process over again)?

The answer is yes. You can get your original

data back by clicking again on the

"Transformation" button, then choose the "No

Transformation" button (shown at the bottom in

Fig.6), and then in the next window (refer to

Figs. 5 and 7) use "Accept Transformation" to

retrieve the original data.

The save option (refer to Figs. 5 and 7)

allows the user to save the transformation

parameters in a special file. Before clicking on save, remember to actually transform the data

by clicking on "Accept Transformation". Clicking on the "Save" button will prompt a file menu

and allow the user to select the file name (with the extension ".atr" or ".str" automatically attached

for annual and seasonal data, respectively) for storing the transformation parameters. This will

enable the user to access the transformation parameters at any other time. To understand this

convenient feature of SAMS, suppose that a user transformed the data and fitted the PARMA (1,1)

model to the data. Subsequently, the user wants to fit a different model to the transformed data.

Instead of doing the transformation process over again, the user can simply open the transformation

file which was saved previously. The user can access this file by clicking on the

"Transformation" button and then on the "Open File that Contains Transformation Parameters"

button. After the file has been opened, one must click on "Accept Transformation" to actually

transform the data. For multisite data, instead of clicking on "Accept Transformation" for each site,

the user can simply click (once) on "Transform all sites" to conduct the data transformation for all



Fig.7 Seasonal data transformation results

sites. Figure 7 shows an example of seasonal transformation results. In the example the

logarithmic transformation has been used with varying values of the coefficient a.

The steps that are usually involved in using the transformation window option presented in

Figs. 5 and 7 are summarized below:

1. To check normality of data and use transformation options:

    - Key in the proper site number.

    - Key in the season number (available for seasonal data only).

    - Click on "Transformation" button.



Fig.8 Annual statistical characteristics

menu

    - From the transformation menu (for instance see Fig. 6 for seasonal data), select a transformation type.

    - Click "Display" on the next window (for instance see Figs. 5 and 7).

    - Key in the transformation coefficients (if necessary) and click "Display". See the results and try other coefficients as needed.

2. To actually transform the data by using the selected transformation type and coefficients:

    - Click on "Accept Transformation" button.

3. To save the selected transformation type and coefficients in a file:

    - Click on "Save" button (you must previously have clicked on "Accept Transformation").

4. To transform data by loading the previously saved transformation parameter file:

    - Click on "Transformation" button and choose "Open File that Contains Transformation Parameters" to open the transformation coefficients file.

    - Click on "Transform all sites".

It is suggested that if transformations are needed for both annual and seasonal data, the user

should conduct annual data transformation before conducting seasonal data transformation.

Statistical Characteristics of Time Series

A number of statistical characteristics can be calculated for the original and transformed data.

They are available in graphical and tabular formats and can be saved in an output file. These are summarized below.

- For Annual Data:

    - Basic statistics such as mean, standard deviation, skewness coefficient, coefficient of variation, maximum, and minimum values.

    - Serial correlation coefficients.

    - Cross-correlation coefficients for multisite data.

    - Drought, surplus (flood), and



Fig.10 Window showing the season to season correlations of seasonal data

for up to 4 stations. If more than 4 stations (sites) are specified, say 7, then after viewing the

results for the first 4 stations, clicking on the "Next" button will enable one to view the results of the

remaining 3 stations.

2.3 Fitting a Stochastic Model

The LAST package included several programs serving different objectives regarding

stochastic modeling of time series. The basic procedure involved modeling and generating the

annual time series using a multivariate AR(1) or AR(2) model, then using a disaggregation model

to disaggregate the generated annual flows to their corresponding seasonal flows. In contrast,

SAMS has two major modeling strategies which are direct and indirect modeling. Direct modeling

means fitting a stationary model (univariate ARMA or multivariate AR or CARMA) directly to the



Fig.11 Stochastic modeling menu

annual data or fitting a periodic (seasonal) model (univariate PARMA or multivariate PAR) directly

to the seasonal data of the system at hand. Annual to seasonal disaggregation modeling, on the other

hand, is an indirect procedure since the modeling of seasonal data also involves modeling of the

corresponding annual data. Figure 11 displays the referred direct or indirect (using

disaggregation) modeling procedures under annual or seasonal categories. Regardless of whether the

input data available is annual or seasonal (for example monthly data), the user must select

the annual button if the final objective of the modeling exercise is to generate annual flows only.

Otherwise, if the objective is to generate monthly

quantities then the seasonal button must be

selected.

The following specific models are

currently available in SAMS under each

category:

1. For Annual Modeling:

    - Univariate ARMA(p,q) model.

    - Univariate GAR(1) model.

    - Multivariate AR(p) model (MAR).

    - Contemporaneous ARMA(p,q) model (CARMA).

    - Multivariate annual (spatial) disaggregation.

2. For Seasonal Modeling:

    - Univariate PARMA(p,q) model.

    - Univariate seasonal disaggregation.

    - Multivariate PAR(p) model (MPAR).

    - Multivariate seasonal disaggregation.

Figures 12 and 13 display the menus that can be used for selecting annual and seasonal

models, respectively. The user will need to click on the button corresponding to the desired model

and in turn a modeling menu will appear where the site number, the model order, etc. can be



Fig.12 Annual stochastic modeling

menu

specified. For example, Fig.14 shows a menu

that can be used to fit a PARMA(p,q) model.

Similar menus are available for ARMA, GAR(1),

MAR, CARMA, and MPAR models. The user

needs to specify the station(s) or site(s)

number(s). If standardization of the data is

desired, one must click on the "Standardize Data"

button. Generally, the modeling is performed

with data in which the mean is subtracted. Thus,

standardization implies that not only the mean

will be subtracted but in addition the data will be

further transformed to have a standard deviation

equal to one. For example, for the data of season

5 the mean for season 5 will be subtracted from

each data point, then each observed data point for

that season will be divided by the standard

deviation of the 5th season. As a result, the mean
and the standard deviation of the standardized
data of the 5th season will become equal to zero
and one, respectively. Then, the order of the model to be fitted can be selected by clicking on the "Enter

model order" button. For instance, one must enter p and q for ARMA models. In the case of MAR

or MPAR models, the user needs to key in the order p only. Subsequently, the method of estimation

of the model parameters must be selected.

Currently SAMS provides two methods of estimation, namely the method of moments
(MOM) and the least squares (LS) method. MOM is available for the ARMA(p,q), GAR(1),
MAR(p), PARMA(p,1), and MPAR(p) models while LS is available for ARMA(p,q), CARMA(p,q),
and PARMA(p,q) models. The LS method requires initial parameter estimates (starting points).
These starting points can be selected by the user, or the MOM parameter estimates can be used as
the starting points. For cases where the MOM estimates are not available, such as for the
PARMA(p,q) model where q>1, the MOM parameter estimates of the closest model will be used instead.



Fig.14 SAMS modeling menu

Fig.13 Seasonal stochastic modeling menu

For example, for the PARMA(3,3) model, the MOM estimates of the PARMA(3,1) model (including

zeros for the two remaining parameters) will be used as the starting points. For fitting CARMA(p,q)

models, the residual variance-covariance G matrix can be estimated using either the method of

moments (MOM) or the maximum likelihood estimation (MLE) method (Stedinger et al., 1985).

The estimated model parameters can be

saved in a file selected by the user. This can be

done by clicking on the "Save" button in the

estimation of parameters window and a menu

will appear in which the user can assign the file

name as shown in Fig.15. The file is written in

a certain format and it is recommended that the

user does not change or edit this file unless it is

necessary. Saving the parameters in a file is

important since this file will be used by SAMS in

the generation of data as we will see in the next



Fig.15 SAMS model parameter window

sections.

After the model has been fitted and the estimated parameters have been saved, it is

recommended that the fitted model be tested to ensure that it is appropriate for the data at hand. In

general, this can be done by testing the residuals and comparing the model and historical properties

of the data. SAMS has the ability to perform such testing. Testing of the residuals is an important

part of the modeling process by which the modeler can test whether the fitted model is adequate.

In all the models available in the current version of SAMS except the GAR(1) model, the basic

assumptions about the residuals are that they are normal and independent. SAMS performs certain

statistical tests to check the validity of these assumptions. The hypothesis that the residuals are

normally distributed is tested based on the skewness test of normality. The results are presented in

terms of rejecting or not rejecting the hypothesis. In addition, the residuals are plotted on normal



Fig.16 Testing the normality and the independence of the residuals

probability paper in order to check graphically whether the residuals are normally distributed. For

testing the independence of the residuals, the Portmanteau test of independence (Salas et al., 1980)

is utilized. The correlogram of the residuals is also plotted to help the user in checking the

independence of the residuals. Figure 16 shows an example of results of both normality and

independence tests of the residuals.
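The two residual checks described above can be sketched as follows. This is only an illustration, not the SAMS implementation: it assumes the large-sample approximation in which the skewness of a normal sample is approximately normal with variance 6/N, and it computes the Portmanteau statistic Q = N times the sum of squared residual autocorrelations, which the user would compare with a tabulated chi-square value with (max_lag - p - q) degrees of freedom. The function names and the 1.96 critical value (5% level) are assumptions made for this example.

```python
import math

def skewness(x):
    """Sample skewness using 1/N moment estimators."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def skewness_test(residuals, z_crit=1.96):
    """Large-sample skewness test of normality (assumed approximation).

    Returns (g, limit, not_rejected), where normality is not rejected
    when |g| <= z_crit * sqrt(6/N).
    """
    n = len(residuals)
    g = skewness(residuals)
    limit = z_crit * math.sqrt(6.0 / n)
    return g, limit, abs(g) <= limit

def autocorr(x, k):
    """Lag-k sample autocorrelation r_k = m_k / m_0."""
    n = len(x)
    mean = sum(x) / n
    m0 = sum((v - mean) ** 2 for v in x) / n
    mk = sum((x[t + k] - mean) * (x[t] - mean) for t in range(n - k)) / n
    return mk / m0

def portmanteau_q(residuals, max_lag):
    """Portmanteau statistic Q = N * sum of r_k^2 for k = 1..max_lag."""
    n = len(residuals)
    return n * sum(autocorr(residuals, k) ** 2 for k in range(1, max_lag + 1))
```

A large Q relative to the chi-square critical value suggests that the residuals are not independent; strongly alternating residuals, for instance, give a very large Q.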

Once the model has been fitted to the data, the moments (e.g., the theoretical covariance
structure) can be calculated based on the estimated parameters. Comparing the model and historical

covariance (correlation) structure is another method of testing. SAMS provides the user with the

ability to perform such comparisons. The user must click on "Comparing Model and Historical

Correlations" button and then a window will appear in which the theoretical and historical



Fig.17 Comparing the model and the historical correlograms

correlograms are presented in graphical or tabular format. Figure 17 is an example of graphical

comparison of model and historical month-to-month correlations. Additional examination of the
model can be made regarding model parsimony. The so-called Akaike Information Criterion (AIC)

may be used for this purpose. SAMS uses AIC for testing model parsimony when stationary ARMA

models are utilized.

Figure 18 illustrates the seasonal disaggregation menu when scheme 1 is chosen under

multivariate seasonal disaggregation (refer to Fig.13). In disaggregation modeling, the user should
conduct the process step by step following the order of the menus. The steps that have been completed will be
marked successively with relevant text or double arrows to keep the user informed. At the end of
disaggregation modeling, the user may click on "Definition of Spatial and Temporal Adjustment"

to define the "adjustment methods" (refer to Fig.19) and the corresponding system structure (refer



Fig.19 Spatial and temporal adjustment method menu

Fig.18 Seasonal disaggregation modeling menu

to Fig.20) for the stations (sites) that are subject to

modeling. This is necessary if adjustments are needed

for the generated series. The system structure for

adjustment usually depends upon the orders and

positions of the stations relative to each

other. This is important when adjustments need to be

done to the generated series based on spatial

disaggregation. Defining the system structure means specifying,
for each main river system, the sequence of stations
(sites) that make up the river network.

SAMS uses the concept of key stations and subkey

stations (substations and subsequent stations). A key

station is the farthest downstream station along a main

stream. For instance, station 1 is a key station in the

river system shown in Fig.21. Likewise, 2 and 3 are also key stations. On the other hand, if station



Fig.20 System structure input menu for key station and substations

1 did not exist (or were not used in the analysis), then stations 4 and 5 would become key

stations. Let us continue the explanation assuming that stations 1, 2, and 3 in Fig.21 are key

stations. Substations are the next upstream stations draining to a key station. For instance, stations

4 and 5 are substations draining to key station 1. Likewise, stations 6 and 7 and 8 and 9 are,

respectively, substations for key stations 2 and 3. Subsequent stations are the next upstream stations

draining into a substation. For instance, stations 11 and 12 are subsequent stations relative to

substation 5 and station 10 is a subsequent station regarding substation 4.

On the other hand, for defining a

"disaggregation configuration" SAMS uses the

concept of groups. As shown in Fig.22, a group

consists of one or more key stations and their

corresponding substations. Groups must be

defined in each disaggregation step. Each group

contains a certain number of stations to be

modeled in a multivariate fashion or "jointly" in

order to preserve their cross-correlations. For

instance, if a certain group has two key stations

and three substations, then the disaggregation

process will preserve the cross-correlations

between all the key and the substations. On the

other hand, if two separate groups are selected,

then the cross-correlations between the stations

that belong to the same group will be preserved,

but the cross-correlations between stations

belonging to different groups will not be

preserved.

The definition of a group is very important in the disaggregation process. For instance,

referring to Fig. 22, key stations 1 and 2 and substations 4, 5, 6, and 7 form one group in which the

flows of all these stations are modeled jointly in a multivariate framework, while key station 3 and

its substations 8 and 9 form another group. In this case, the cross-correlations between the stations



Fig.21 Schematic representation of a streamflow network

Fig.22 Disaggregation configuration input menu for key station and substations

within each group will be preserved but the cross-correlations among stations in different
groups will not be preserved. For example, in the above configuration, the cross-correlations
between stations 1 and 3 will not be preserved but the cross-correlations between stations 1
and 2 will be preserved. On the other hand, if all the stations are

defined in a single group, then the cross-correlations

between all the stations will be preserved. In the

final step of disaggregation, a group may contain

stations 4, 5, 10, 11, and 12. In the current version

of SAMS, the total combined number of stations in

any defined group must not exceed 10 stations.
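The station hierarchy and the group concept described above can be illustrated with a small data structure. This is a sketch under stated assumptions: the upstream links are those given in the text for Fig.21 (stations 1 to 12 only), and the names `UPSTREAM`, `validate_groups`, and `cross_correlation_preserved` are hypothetical helpers, not part of SAMS.

```python
MAX_STATIONS_PER_GROUP = 10  # limit stated for the current version of SAMS

# Downstream station -> stations draining directly into it (from the text on Fig.21).
UPSTREAM = {1: [4, 5], 2: [6, 7], 3: [8, 9], 4: [10], 5: [11, 12]}
KEY_STATIONS = [1, 2, 3]

def substations(key_station):
    """Next upstream stations draining to a key station."""
    return list(UPSTREAM.get(key_station, []))

def subsequent_stations(key_station):
    """Stations draining into the substations of a key station."""
    return [s for sub in substations(key_station) for s in UPSTREAM.get(sub, [])]

def validate_groups(groups):
    """Raise ValueError if any group exceeds the station limit."""
    for group in groups:
        if len(group) > MAX_STATIONS_PER_GROUP:
            raise ValueError(f"group {group} has more than {MAX_STATIONS_PER_GROUP} stations")

def cross_correlation_preserved(groups, site_a, site_b):
    """Cross-correlations are preserved only between stations in the same group."""
    return any(site_a in group and site_b in group for group in groups)

# Configuration discussed in the text: two groups of key stations and substations.
groups = [[1, 2, 4, 5, 6, 7], [3, 8, 9]]
validate_groups(groups)
```

With this configuration, `cross_correlation_preserved(groups, 1, 2)` is true while `cross_correlation_preserved(groups, 1, 3)` is false, matching the example in the text.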

After modeling the annual flows using the above



configuration, the annual flows can be disaggregated into seasonal flows. This is handled again by

using the concept of groups as was explained above. The user, for example, can choose stations 3,

8, 9, 17, 18, and 19 as one group. In this case, the annual flows for these stations will be

disaggregated into seasonal flows by a multivariate disaggregation model so as to preserve the

seasonal cross-correlations between all the stations.

Currently, SAMS has two schemes for modeling the key stations. The first scheme, denoted

as scheme 1 (see the modeling menus of Figs.12 and 13), will aggregate the annual flows of the key

stations that belong to a certain group, then use a univariate ARMA(p,q) to model the aggregated

flows, then the aggregated annual flows are disaggregated (spatially) back to each key station by

using the Valencia and Schaake or the Mejia and Rousselle disaggregation method. The second

scheme, denoted as scheme 2, will model the annual flows of the key stations belonging to a given

group by a multivariate MAR(p) model. Once the flows at key stations are modeled, the rest of the

procedure for generating annual flows at all substations and subsequent stations and then for

generating the seasonal flows at all stations is the same as in scheme 1 (as mentioned above).

Additional details about disaggregation modeling are shown in chapter 3, where a mathematical

description of the disaggregation methods is presented, and in chapter 4, where an example of

disaggregation modeling applied to real data is given.

2.4 Generating Synthetic Series

Data generation is an important subject in stochastic hydrology and has received a lot of

attention in hydrologic literature. Data generation is used by hydrologists for many purposes. These

include, for example, reservoir sizing, planning and management of an existing reservoir, and

reliability of a water resources system such as a water supply or irrigation system (Salas et al., 1980).
Stochastic data generation can aid in making key management decisions, especially in critical
situations such as extended drought periods (Frevert et al., 1989). The main philosophy behind

synthetic data generation is that synthetic samples are generated which preserve certain statistical

properties that exist in the natural hydrologic process (Lane and Frevert, 1990). As a result, each

generated sample and the historic sample are equally likely to occur in the future. The historic

sample is not more likely to occur than any of the generated samples (Lane and Frevert, 1990).

Generation of synthetic time series is based on the models, approaches and schemes

presented in section 2.3 of this manual. Once the model has been defined and the parameters have



Fig.23 SAMS generation menu

been estimated, one can generate synthetic samples based on this model. SAMS allows the user to

generate synthetic data and eventually compare important statistical characteristics of the historical

and the generated data. Such comparison is important for checking whether the model used in

generation is adequate or not. If important historical and generated statistics are comparable, then

one can argue that the model is adequate. The generated data is stored in a file. This allows the user

to further analyze the generated data as needed. Furthermore, when data generation is based on

spatial or temporal disaggregation, one may like to make adjustments to the generated data. This
may be necessary in many cases to ensure that the disaggregated quantities add up
to the original total quantity. For example, spatial adjustments may be necessary if the annual flow
at a key station is exactly the sum of the annual flows at the corresponding substations. Likewise,
in the case of temporal disaggregation, one may like to ensure that the monthly values
add up to the annual value. Various adjustment options are included in SAMS. Spatial and
temporal adjustments are described further in Section 4.8.2.
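As an illustration of what such an adjustment does, the sketch below rescales disaggregated values proportionally so that they add up to the given total. Proportional scaling is assumed here only for the example; the adjustment options actually offered by SAMS are those of Section 4.8.2, and the function name is hypothetical.

```python
def adjust_proportionally(parts, total):
    """Scale disaggregated values so that they sum to the given total.

    parts: generated seasonal (or substation) values
    total: annual (or key station) value they must add up to
    """
    current = sum(parts)
    if current == 0:
        raise ValueError("cannot rescale values that sum to zero")
    return [p * total / current for p in parts]

# Generated monthly values sum to 60, but the generated annual value is 66.
adjusted = adjust_proportionally([10.0, 20.0, 30.0], 66.0)
```

After the adjustment each value keeps its share of the total, and the adjusted values add up to the annual value.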

Figure 23 shows the data generation menu.

In this menu the user must specify necessary

information for the generation process. The type of

data to generate (either annual or seasonal) and the

type of modeling, which is either univariate (single

site) or multivariate (multisite) must be selected.

For example, if the user wants to generate annual

data at a single station by using an ARMA model,

then the option "Annual" and "Single site" must be

selected. On the other hand, to generate seasonal

data at several stations from a disaggregation model,

one must select "Seasonal" and "Multisite". In

addition, the data length (in years) and the number

of samples to be generated, and a seed number to

initiate the generation process need to be specified.

In this version of SAMS, both the number of

samples and the length of data to be generated are



unlimited. The user should consider however the computer time it will take to generate many

samples or very long samples especially if the generation is to be done for multisite seasonal data.

Furthermore, one of four options regarding the generation model, as shown in the dialog box

in Fig.23, must be chosen. One must select "Yes" if SAMS was used to fit the model from which

data are to be generated. On the other hand, if one would like to generate data using one of the

models available in SAMS, but the model was not fitted by SAMS, then the "No" option must be

selected. To illustrate this point further, let us assume that the user fitted an ARMA(1,1) model by
using an estimation method which is not available in the current version of SAMS, or by using a
different package, but wants to generate data using SAMS. Then, the user should select either the
first or the second "No" option to generate the required data. Another difference between the "Yes"
and the "No" options is that after generating the data SAMS will compare the generated and
historical statistics only if the "Yes" option is selected. In the second "No" option the user will open
a (parameter) file which must contain the model parameters. This parameter file has to be in a certain
format to be recognized by SAMS. The format of this file must be exactly the same as the format
of the parameter file that SAMS generates after fitting a stochastic model, as mentioned in section
2.3. To make sure of this, the user may like to run SAMS to generate a parameter file using the
desired model, then edit the parameter file to insert the new parameter set. Again for clarification,
let us consider the ARMA(1,1) model where a method different from those available in SAMS was
used to estimate the parameters. SAMS can be used to fit an ARMA(1,1) model to the same data
but using, say, MOM estimation. Then the MOM parameters can be saved in a file and the file
can be edited to replace the MOM parameters by the desired set of parameters. In this case, the user
needs to change the parameters $\phi_1$, $\theta_1$, and $\sigma^2(e)$ (refer to Section 4.2 for details). One must be aware
that this file must also contain the transformation parameters if transformation was used. Finally,

SAMS will generate data from the referred model based on the parameters contained in the edited

file.

After providing all the information needed for data generation, the user can click on the "Ok"

button shown in Fig.23. A generation menu will appear on the screen which will allow the user to

open the file which contains the model parameters. For example, Fig.24 will appear if the options

to generate single site and seasonal data were chosen. By clicking on the "Open Model Parameters

File" button, a window will appear which will allow the user to select the file that contains the model



Fig.24 Univariate seasonal generation menu

parameters as shown in Fig.24. After clicking on the "Generate and Save Data" button (also shown
in Fig.24) another menu will appear so that a file name (with the extension .gen automatically
attached) can be assigned to store the generated data. If the generation is based on a disaggregation
model, a menu as shown in Fig.19 will appear to remind the user about the adjustment methods
(which should have been read from the previously referred parameter file). One can also make

changes to the adjustment methods at this point. Next, if statistical analysis of the generated data

is desired, the "Statistical Analysis of Generated Data" button must be clicked on and another menu

box as in Fig.25 will appear which will enable one to view the results. For example, the time series

of the generated data will be shown by clicking on the "Plot Time Series" button. In the case of
analysis pertaining to drought, surplus, and storage related statistics, SAMS will ask the user to input
the desired threshold demand level, as shown in Fig.26. The default demand level is the sample
mean, but one can change it by keying in a fraction of the sample mean or the actual desired demand

level. The results of the statistical analysis of the generated data can be saved into a file by clicking



Fig.25 Seasonal statistical characteristics of generated data menu

Fig.26 Window regarding the demand level

on the "Save Statistical Analysis" button. This will create a file with the extension .gst automatically

attached to store the results. Note that the referred feature of the statistical comparison of the

historical and generated data can be also used for further testing and verifying whether the fitted

model performs as desired.

In estimating the generated statistics, the statistics of each generated sample are
first estimated; then the means and standard deviations of those statistics across all
generated samples are computed and compared with their historical counterparts. The results are

presented in graphical or tabular formats.
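The two-stage summary described above (statistics per generated sample, then their mean and standard deviation across samples) can be sketched as follows. The set of statistics and the use of 1/N estimators are assumptions made for this example, and the function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Standard deviation with divisor N (assumed for this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sample_statistics(sample):
    """Statistics of a single generated sample."""
    return {"mean": mean(sample), "std": std(sample),
            "max": max(sample), "min": min(sample)}

def summarize_generated(samples):
    """Mean and standard deviation of each statistic across all samples."""
    per_sample = [sample_statistics(s) for s in samples]
    return {name: (mean([p[name] for p in per_sample]),
                   std([p[name] for p in per_sample]))
            for name in per_sample[0]}

summary = summarize_generated([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
```

Each entry of `summary` holds the pair (mean across samples, standard deviation across samples), which can then be set against the corresponding historical statistic.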

Figure 27 shows a comparison of the

(observed) historical annual series and the



Fig.27 Time series plots of the historical and generated annual flows

generated series for one sample. The user can change the station number, sample number, and the

graph scale as needed. For annual series, the comparisons of the historical and generated mean,

standard deviation, skewness coefficient, coefficient of variation, and sample maximum and

minimum are presented in tabular form. For seasonal series, the comparisons are presented in both

graphical and tabular formats as shown in Fig.28. The comparisons of correlations for annual and

seasonal data may be presented in graphical or tabular formats as shown in Fig.29 (for seasonal

data). The comparisons of drought, surplus, and storage related statistics include the longest

drought, maximum deficit, longest surplus, maximum surplus, storage capacity, rescaled range, and

Hurst coefficient. Before showing these results, a window as in Fig.26 will pop up again to allow

the user to change the demand level if needed. The results are presented in tabular format and box

plots as shown in Fig.30. The box plots reflect the ratios of the means, quartiles, maximums, and



Fig.28 Comparison between the historical and the generated monthly mean and standard deviations

minimums of those statistics calculated from the generated series to the observed historical values.

The scale of the box plot can be adjusted by the user based on the ratio ranges provided in the dialog

box.



Fig.29 Comparisons of the historical and generated seasonal cross-correlations

Fig.30 Comparison of drought, surplus, and storage related statistics

Finally, the Status button has been added in all window menus in order to keep track of

all major results and options selected throughout the analysis, modeling, and generation exercise.



Fig.31 Example of update information regarding the transformation, modeling, and generation steps. This view is shown by clicking on Status

For example, by clicking on the Status button under any menu or window, the user can review the

transformation methods and coefficients utilized for each site, the fitted model including parameters

and adjustment options, etc., and information related to the data generation, such as that shown in Fig.31.



3 DEFINITION OF STATISTICAL CHARACTERISTICS

A time series process can be characterized by a number of statistical properties such as the

mean, standard deviation, coefficient of variation, skewness coefficient, season-to-season

correlations, autocorrelations, cross-correlations, and storage and drought related statistics. These

statistics are defined for both annual and seasonal data as shown below.

3.1 Basic Statistics

3.1.1 Annual Data

The mean and the standard deviation of a time series $y_t$ are estimated by

$$\bar{y} = \frac{1}{N} \sum_{t=1}^{N} y_t \tag{3.1}$$

and

$$s = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \bar{y})^2} \tag{3.2}$$

respectively, where $N$ is the sample size. The coefficient of variation is defined as $cv = s / \bar{y}$.
Likewise, the skewness coefficient is estimated by

$$g = \frac{\frac{1}{N} \sum_{t=1}^{N} (y_t - \bar{y})^3}{s^3} \tag{3.3}$$

The sample autocorrelation coefficients $r_k$ of a time series may be estimated by

$$r_k = \frac{m_k}{m_0} \tag{3.4}$$

where

$$m_k = \frac{1}{N} \sum_{t=1}^{N-k} (y_{t+k} - \bar{y})(y_t - \bar{y}) \tag{3.5}$$

and $k$ = time lag. Likewise, for multisite series, the lag-$k$ sample cross-correlations between site $i$
and site $j$, denoted by $r_k^{ij}$, may be estimated by

$$r_k^{ij} = \frac{m_k^{ij}}{\left( m_0^{ii} \, m_0^{jj} \right)^{1/2}} \tag{3.6}$$

where

$$m_k^{ij} = \frac{1}{N} \sum_{t=1}^{N-k} \left( y_{t+k}^{(i)} - \bar{y}^{(i)} \right) \left( y_t^{(j)} - \bar{y}^{(j)} \right) \tag{3.7}$$

in which $m_0^{ii}$ is the sample variance for site $i$.
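The annual statistics of Eqs. (3.1) through (3.7) can be computed with a few lines of code. The following is a minimal sketch that follows the 1/N estimators of this section; the function names are chosen for the example only.

```python
import math

def mean(y):
    return sum(y) / len(y)

def cross_cov(yi, yj, k):
    """Lag-k cross-covariance m_k^{ij} of Eq. (3.7), with divisor N."""
    n = len(yi)
    mi, mj = mean(yi), mean(yj)
    return sum((yi[t + k] - mi) * (yj[t] - mj) for t in range(n - k)) / n

def autocov(y, k):
    """Lag-k autocovariance m_k of Eq. (3.5)."""
    return cross_cov(y, y, k)

def basic_stats(y):
    """Mean, standard deviation, coefficient of variation, and skewness."""
    n = len(y)
    ybar = mean(y)
    s = math.sqrt(autocov(y, 0))
    g = (sum((v - ybar) ** 3 for v in y) / n) / s ** 3
    return {"mean": ybar, "std": s, "cv": s / ybar, "skew": g}

def autocorr(y, k):
    """Lag-k autocorrelation r_k of Eq. (3.4)."""
    return autocov(y, k) / autocov(y, 0)

def cross_corr(yi, yj, k):
    """Lag-k cross-correlation r_k^{ij} of Eq. (3.6)."""
    return cross_cov(yi, yj, k) / math.sqrt(autocov(yi, 0) * autocov(yj, 0))
```

For the short series 1, 2, 3, 4, 5 the variance is $m_0 = 2$ and $m_1 = 0.8$, so that $r_1 = 0.4$.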

3.1.2 Seasonal data

Seasonal hydrologic time series, such as monthly flows, are better characterized by seasonal



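statistics. Let $y_{\nu,\tau}$ be the seasonal time series, where $\nu$ represents years and $\tau$ seasons;
$\nu = 1, \ldots, N$ with $N$ = number of years, and $\tau = 1, \ldots, \omega$ with $\omega$ = number of seasons.
The mean and standard deviation for season $\tau$ can be estimated by

$$\bar{y}_\tau = \frac{1}{N} \sum_{\nu=1}^{N} y_{\nu,\tau} \tag{3.8}$$

and

$$s_\tau = \sqrt{\frac{1}{N} \sum_{\nu=1}^{N} (y_{\nu,\tau} - \bar{y}_\tau)^2} \tag{3.9}$$

respectively. The seasonal coefficient of variation is $cv_\tau = s_\tau / \bar{y}_\tau$. Similarly, the seasonal
skewness coefficient is estimated by

$$g_\tau = \frac{\frac{1}{N} \sum_{\nu=1}^{N} (y_{\nu,\tau} - \bar{y}_\tau)^3}{s_\tau^3} \tag{3.10}$$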

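The sample lag-$k$ season-to-season correlation coefficient may be estimated by

$$r_{k,\tau} = \frac{m_{k,\tau}}{\left( m_{0,\tau} \, m_{0,\tau-k} \right)^{1/2}} \tag{3.11}$$

where

$$m_{k,\tau} = \frac{1}{N} \sum_{\nu=1}^{N} (y_{\nu,\tau} - \bar{y}_\tau)(y_{\nu,\tau-k} - \bar{y}_{\tau-k}) \tag{3.12}$$

in which $m_{0,\tau}$ represents the sample variance for season $\tau$. Likewise, for multisite series, the
lag-$k$ sample cross-correlations between site $i$ and site $j$, for season $\tau$, $r_{k,\tau}^{ij}$, may be estimated by

$$r_{k,\tau}^{ij} = \frac{m_{k,\tau}^{ij}}{\left( m_{0,\tau}^{ii} \, m_{0,\tau-k}^{jj} \right)^{1/2}} \tag{3.13}$$

and

$$m_{k,\tau}^{ij} = \frac{1}{N} \sum_{\nu=1}^{N} \left[ y_{\nu,\tau}^{(i)} - \bar{y}_\tau^{(i)} \right] \left[ y_{\nu,\tau-k}^{(j)} - \bar{y}_{\tau-k}^{(j)} \right] \tag{3.14}$$

in which $m_{0,\tau}^{ii}$ represents the sample variance for season $\tau$ and site $i$. Note that in Eqs. (3.11)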

through (3.14) when , the terms, , , and



and a subsample $y_1, \ldots, y_n$ with $n \le N$. Form the sequence of partial sums $S_i$ as

$$S_i = S_{i-1} + (y_i - \bar{y}_n), \quad i = 1, \ldots, n \tag{3.15}$$

where $S_0 = 0$ and $\bar{y}_n$ is the sample mean of $y_1, \ldots, y_n$ which is determined by Eq. (3.1). Then,
the adjusted range $R_n^*$ and the rescaled adjusted range $R_n^{**}$ can be calculated by

$$R_n^* = \max(S_0, S_1, \ldots, S_n) - \min(S_0, S_1, \ldots, S_n) \tag{3.16}$$

and

$$R_n^{**} = \frac{R_n^*}{s_n} \tag{3.17}$$

respectively, in which $s_n$ is the standard deviation of $y_1, \ldots, y_n$ which is determined by Eq. (3.2).
Likewise, the Hurst coefficient for a series is estimated by

$$K = \frac{\ln(R_n^{**})}{\ln(n/2)}, \quad n > 2 \tag{3.18}$$

The calculation of the storage capacity is based on the sequent peak algorithm (Loucks
et al., 1981), which is equivalent to the Rippl mass curve method. The algorithm, applied to the
time series $y_i$, $i = 1, \ldots, N$, may be described as follows. Based on $y_i$ and the demand level $d$, a
new sequence $S_i'$ can be determined as

$$S_i' = \begin{cases} S_{i-1}' + d - y_i & \text{if positive} \\ 0 & \text{otherwise} \end{cases} \tag{3.19}$$

where $S_0' = 0$. Then the storage capacity is obtained as

$$S_c = \max[S_1', \ldots, S_N'] \tag{3.20}$$

Note that algorithms described in Eqs. (3.15) to (3.20) apply also to seasonal series. In
this case, the underlying seasonal series $y_{\nu,\tau}$ is simply denoted as $y_t$.
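The storage-related statistics of Eqs. (3.15) through (3.20) translate directly into code. The following sketch is an illustration only; the function names are assumptions made for the example, and the standard deviation uses the 1/N form of Eq. (3.2).

```python
import math

def rescaled_range_and_hurst(y):
    """Return (R*, R**, K) following Eqs. (3.15) to (3.18)."""
    n = len(y)
    if n <= 2:
        raise ValueError("the Hurst coefficient requires n > 2")
    ybar = sum(y) / n
    s_n = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    partial_sums = [0.0]  # S_0 = 0
    for v in y:
        partial_sums.append(partial_sums[-1] + (v - ybar))
    r_star = max(partial_sums) - min(partial_sums)
    r_star_star = r_star / s_n
    hurst = math.log(r_star_star) / math.log(n / 2.0)
    return r_star, r_star_star, hurst

def storage_capacity(y, d):
    """Sequent peak algorithm of Eqs. (3.19) and (3.20) for demand level d."""
    s_prev, capacity = 0.0, 0.0
    for v in y:
        s_prev = max(0.0, s_prev + d - v)  # reset to zero when not positive
        capacity = max(capacity, s_prev)
    return capacity
```

For the series 1, 3, 2, 4 with demand equal to the mean (2.5), the partial sums are 0, -1.5, -1, -1.5, 0, giving an adjusted range of 1.5, and the sequent peak algorithm also yields a storage capacity of 1.5.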

3.2.2 Drought Related Statistics

The drought-related statistics are also important in modeling hydrologic time series. For
the series $y_i$, $i = 1, \ldots, N$, the demand level $d$ may be defined as $d = \alpha \bar{y}$, $0 < \alpha \le 1$
(for example, for $\alpha = 1$, $d = \bar{y}$). A deficit occurs when $y_i < d$ consecutively during one or
more years until $y_i > d$ again. Such a deficit can be defined by its duration $L$, by its magnitude $M$,
and by its intensity $I = M/L$. Assume that $m$ deficits occur in a given hydrologic sample, then
the maximum deficit duration (longest drought or maximum run-length) is given by

$$L^* = \max(L_1, \ldots, L_m) \tag{3.21}$$

and the maximum deficit magnitude (maximum run-sum) is defined by

$$M^* = \max(M_1, \ldots, M_m) \tag{3.22}$$



In SAMS, the longest drought duration and the maximum deficit magnitude are estimated for

both annual and seasonal series.

3.2.3 Surplus Related Statistics

For our purpose here, surplus related statistics are simply the opposite of drought related
statistics. Considering the same threshold level $d$, a surplus occurs when $y_i > d$ consecutively
until $y_i < d$ again. Then, assuming that $m$ surpluses occur during a given time period $N$, the
maximum surplus period $L^*$ and maximum surplus magnitude $M^*$ may be determined also from
Eqs. (3.21) and (3.22).
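The drought and surplus run statistics can be sketched with a single pass over the series. In this illustration the magnitude of a run is taken as the accumulated departure from the demand level (the run-sum), and a value exactly equal to the demand level ends a run; both the function name and that tie-breaking choice are assumptions made for the example.

```python
def run_statistics(y, d, deficit=True):
    """Return (L*, M*): longest run length and largest run-sum.

    deficit=True  -> runs of y_i < d (droughts)
    deficit=False -> runs of y_i > d (surpluses)
    """
    runs = []  # list of (length, magnitude) pairs, one per run
    length, magnitude = 0, 0.0
    for v in y:
        in_run = v < d if deficit else v > d
        if in_run:
            length += 1
            magnitude += (d - v) if deficit else (v - d)
        elif length:
            runs.append((length, magnitude))
            length, magnitude = 0, 0.0
    if length:  # close a run that reaches the end of the series
        runs.append((length, magnitude))
    longest = max((r[0] for r in runs), default=0)
    largest = max((r[1] for r in runs), default=0.0)
    return longest, largest

flows = [5.0, 3.0, 2.0, 6.0, 4.0, 7.0, 1.0, 1.0, 1.0]
drought = run_statistics(flows, 4.5, deficit=True)
surplus = run_statistics(flows, 4.5, deficit=False)
```

For these flows and a demand level of 4.5 the longest drought lasts 3 periods with a maximum deficit of 10.5, while the longest surplus lasts 1 period with a maximum surplus of 2.5.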

4 MATHEMATICAL MODELS

4.1 Data Transformations and Standardization

In cases where the normality tests indicate that the observed series are not normally
distributed, the data have to be transformed to normal before applying the models. To normalize
the data, the following transformations are available in SAMS:

- Logarithmic transformation

$$Y = \ln(X + a) \tag{4.1}$$

- Power transformation

$$Y = (X + a)^b \tag{4.2}$$

- Box-Cox transformation

$$Y = \frac{(X + a)^b - 1}{b}, \quad b \neq 0 \tag{4.3}$$

where $Y$ is the normalized series, $X$ is the original observed series, and $a$ and $b$ are transformation
coefficients. Note that the logarithmic transformation is simply the limiting form of the Box-Cox
transform as the coefficient $b$ approaches zero. Also, the power transformation is a shifted and
scaled form of the Box-Cox transform. The variables $Y$ and $X$ can represent either annual or
seasonal data. For seasonal data $a$ and $b$ can be chosen to vary with the season. The normalized
data can then be standardized by subtracting the mean and dividing by the standard deviation
(standardization is actually an option in SAMS). For example, for seasonal series, the
standardization may be expressed as:

$$Z_{\nu,\tau} = \frac{Y_{\nu,\tau} - \bar{Y}_\tau}{S_\tau(Y)} \tag{4.4}$$

where $Z_{\nu,\tau}$ is the standardized series, and $\bar{Y}_\tau$ and $S_\tau(Y)$ are the mean and the standard deviation
of the transformed series for month $\tau$. Then, the stochastic models can be fitted to the
standardized series $Z_{\nu,\tau}$. For generating flows, the reverse procedure is followed. After
generating $Z_{\nu,\tau}$, then $Y_{\nu,\tau}$ can be obtained by

$$Y_{\nu,\tau} = \bar{Y}_\tau + S_\tau(Y) \, Z_{\nu,\tau} \tag{4.5}$$

and $X_{\nu,\tau}$ can be generated by applying the appropriate inverse transformation to the $Y_{\nu,\tau}$
process. For example, if $X_{\nu,\tau}$ was transformed by a natural log transformation, the process $X_{\nu,\tau}$
can be obtained from $Y_{\nu,\tau}$ by applying the following inverse transformation:

$$X_{\nu,\tau} = \exp(Y_{\nu,\tau}) - a \tag{4.6}$$
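The transformation, standardization, and reverse procedure can be sketched as a round trip. This is an illustration under stated assumptions: the function names and the string labels "log", "power", and "boxcox" are hypothetical, the power and Box-Cox inverses are derived algebraically from Eqs. (4.2) and (4.3), and the standard deviation uses divisor N.

```python
import math

def transform(x, kind, a=0.0, b=1.0):
    """Apply Eq. (4.1), (4.2), or (4.3) to a single value."""
    if kind == "log":
        return math.log(x + a)
    if kind == "power":
        return (x + a) ** b
    if kind == "boxcox":
        if b == 0:
            raise ValueError("Box-Cox requires b != 0; use the log transform instead")
        return ((x + a) ** b - 1.0) / b
    raise ValueError(f"unknown transformation: {kind}")

def inverse_transform(y, kind, a=0.0, b=1.0):
    """Invert the transformation; the log case is Eq. (4.6)."""
    if kind == "log":
        return math.exp(y) - a
    if kind == "power":
        return y ** (1.0 / b) - a
    if kind == "boxcox":
        return (b * y + 1.0) ** (1.0 / b) - a
    raise ValueError(f"unknown transformation: {kind}")

def standardize(values):
    """Eq. (4.4) for the values of one season; returns (z, mean, std)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values], m, s

def destandardize(z, m, s):
    """Eq. (4.5): recover the transformed values from standardized ones."""
    return [m + s * v for v in z]
```

For seasonal data, `standardize` would be applied separately to the values of each season, since the mean and standard deviation vary with the season.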

4.2 Univariate ARMA(p,q) Model

The ARMA(p,q) model may be expressed as:

$$\phi(B) \, Y_t = \theta(B) \, e_t \tag{4.7}$$

where $Y_t$ represents the streamflow process for year $t$, which is normally distributed with mean zero
and variance $\sigma^2(Y)$; $e_t$ is the uncorrelated noise term with mean zero and variance $\sigma^2(e)$, which
is also normally distributed; and $\phi(B)$ and $\theta(B)$ are polynomials in $B$ defined as

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{4.8a}$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \tag{4.8b}$$

where $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive parameters; $\theta_1, \theta_2, \ldots, \theta_q$ are the moving average
parameters; $B$ is the backward shift operator, i.e., $B^c Y_t = Y_{t-c}$; and $p$ and $q$ define the order of
the ARMA model.

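Method of moments (MOM) may be used in parameter estimation of ARMA(p,q)
models. For example, the moment estimators for the ARMA(1,0), ARMA(1,1) and ARMA(2,1)
models are shown below:

- ARMA(1,0) model:

$$Y_t = \phi_1 Y_{t-1} + e_t \tag{4.9}$$

$$\hat{\phi}_1 = \frac{m_1}{s^2} \tag{4.10}$$

$$\hat{\sigma}^2(e) = s^2 (1 - \hat{\phi}_1^2) \tag{4.11}$$

- ARMA(1,1) model:

$$Y_t = \phi_1 Y_{t-1} + e_t - \theta_1 e_{t-1} \tag{4.12}$$

$$\hat{\phi}_1 = \frac{m_2}{m_1} \tag{4.13}$$

$$\hat{\theta}_1 = \hat{\phi}_1 + \frac{s^2 - \hat{\phi}_1 m_1}{\hat{\phi}_1 s^2 - m_1} - \frac{1}{\hat{\theta}_1} \tag{4.14}$$

$$\hat{\sigma}^2(e) = \frac{\hat{\phi}_1 s^2 - m_1}{\hat{\theta}_1} \tag{4.15}$$

in which $\hat{\theta}_1$ can be obtained by solving Eq. (4.14).

- ARMA(2,1) model:

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t - \theta_1 e_{t-1} \tag{4.16}$$

$$\hat{\phi}_1 = \frac{m_1 m_2 - s^2 m_3}{m_1^2 - s^2 m_2} \tag{4.17}$$

$$\hat{\phi}_2 = \frac{m_1 m_3 - m_2^2}{m_1^2 - s^2 m_2} \tag{4.18}$$

$$\hat{\theta}_1 = \hat{\phi}_1 + \frac{s^2 - \hat{\phi}_1 m_1 - \hat{\phi}_2 m_2}{\hat{\phi}_1 s^2 + \hat{\phi}_2 m_1 - m_1} - \frac{1}{\hat{\theta}_1} \tag{4.19}$$

$$\hat{\sigma}^2(e) = \frac{\hat{\phi}_1 s^2 + \hat{\phi}_2 m_1 - m_1}{\hat{\theta}_1} \tag{4.20}$$

where $s^2$ is the variance of $Y_t$ and $m_k$ is the estimate of the lag-$k$ autocovariance of $Y_t$, which is
defined as $M_k = E[Y_t Y_{t-k}]$. In the foregoing models it is assumed that the mean has been removed,
i.e., $E(Y_t) = 0$. Note also that $s^2 = m_0$.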
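As an illustration of the moment estimators for the ARMA(1,1) model, the sketch below solves Eq. (4.14) for the moving average parameter by writing it as a quadratic in that parameter. Two points are assumptions of this example rather than statements from the manual: of the two reciprocal roots, the one with absolute value below one (the invertible solution) is selected, and the function name `mom_arma11` is hypothetical.

```python
import math

def mom_arma11(m0, m1, m2):
    """Moment estimates (phi1, theta1, sigma2_e) for an ARMA(1,1) model.

    m0, m1, m2: sample autocovariances at lags 0, 1, 2 (m0 = s^2).
    """
    phi = m2 / m1                      # Eq. (4.13)
    a = m0 - phi * m1                  # numerator of the ratio in Eq. (4.14)
    b = phi * m0 - m1                  # denominator of the ratio in Eq. (4.14)
    if b == 0:
        # No moving average component: the model reduces to ARMA(1,0).
        return phi, 0.0, a
    # Eq. (4.14) rearranged: theta^2 - c * theta + 1 = 0 with c = phi + a / b.
    c = phi + a / b
    disc = c * c - 4.0
    if disc < 0:
        raise ValueError("no real moment estimate of theta1 for these autocovariances")
    root = math.sqrt(disc)
    # The two roots are reciprocals; keep the one with |theta| <= 1 (assumed choice).
    theta = (c - root) / 2.0 if c > 0 else (c + root) / 2.0
    sigma2_e = b / theta               # Eq. (4.15)
    return phi, theta, sigma2_e
```

As a check, the theoretical autocovariances of an ARMA(1,1) process with parameters 0.6 and 0.3 and unit noise variance are 1.140625, 0.384375, and 0.230625 at lags 0, 1, and 2, and the function recovers the original parameters from them.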

However, the Least Squares (LS) method is generally a more efficient parameter
estimation method. In this method, the parameters $\phi$'s and $\theta$'s are estimated by minimizing the
sum of squares of the residuals defined by

$$F = \sum_{t=1}^{N} e_t^2 \tag{4.21}$$

where $N$ is the number of years of data. For the ARMA(p,q) model, the residuals are defined
as

$$e_t = Y_t - \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{i=1}^{q} \theta_i e_{t-i} \tag{4.22}$$

Once the $\phi$'s and $\theta$'s are determined, then the noise variance $\sigma^2(e)$ is determined by
$(1/N) \sum e_t^2$. The minimization of the sum of squares of Eq. (4.21) may be obtained by a

numerical scheme. Powell's algorithm has been commonly employed for least squares

estimation of parameters of ARMA models. The Powell algorithm (Gill et al, 1981 and

Himmelblau, 1972), is an expanded version of the univariate gradient search which is a useful

optimization technique that does not require derivatives. The moment estimates of ARMA(p,q)



models may be taken as the initial values in the search algorithm. The non-derivative

optimization techniques depend very much on the starting points when the objective function is

not convex. In these cases there is no guarantee that the solution found corresponds to the global

minimum. The solution may be improved by choosing a different starting point.To generate synthetic series from an ARMA model , Eq. (4.7) can be used. First, a

standard uncorrelated normal random variable is generated, then is calculated ast

et

(4.23)e et t= ( )

To generate the correlated series $Y_t$, the warm-up procedure is followed. In this procedure, values of $Y_t$ prior to $t = 1$ are assumed to be equal to the mean of the process (which is zero in this case). Thus, $Y_1, Y_2, \ldots, Y_{N+L}$ can be generated using Eq. (4.7) by generating $\varepsilon_{1-q}, \varepsilon_{2-q}, \varepsilon_{3-q}, \ldots$ from Eq. (4.23), where $N$ is the required length to be generated and $L$ is the warm-up length required to remove the effect of the initial assumptions on $Y_t$. $L$ is arbitrarily chosen as 50. The advantage of the warm-up procedure is that it can be used for low order and high order stationary and periodic models, while exact generation procedures available in the literature apply only to stationary ARMA models or to low order periodic models.
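A minimal sketch of the warm-up procedure for a stationary ARMA(p,q) model is given below. The function name and arguments are hypothetical (not the SAMS interface), and the sketch assumes that the first $L$ generated values are discarded so that $N$ values are returned.

```python
import random

def generate_arma(phi, theta, sigma_e, n, warmup=50, rng=None):
    """Generate n values of a zero-mean ARMA(p,q) series using a warm-up of `warmup` steps."""
    rng = rng or random.Random()
    p, q = len(phi), len(theta)
    y = [0.0] * p                                            # Y before t = 1 set to the mean (zero)
    e = [sigma_e * rng.gauss(0.0, 1.0) for _ in range(q)]    # e_{1-q}, ..., e_0 from Eq. (4.23)
    for _ in range(n + warmup):
        e_t = sigma_e * rng.gauss(0.0, 1.0)                  # Eq. (4.23)
        ar = sum(phi[i] * y[-1 - i] for i in range(p))
        ma = sum(theta[j] * e[-1 - j] for j in range(q))
        y.append(ar + e_t - ma)                              # ARMA recursion
        e.append(e_t)
    return y[p + warmup:]                                    # drop start-up values and warm-up
```

For example, `generate_arma([0.5], [0.3], 1.0, 100, rng=random.Random(1))` returns a reproducible series of 100 values.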

4.3 Univariate GAR(1) model

Gamma-autoregressive (GAR) models assume that the underlying series is dependent

with a gamma marginal distribution and the models do not require variable transformation.

SAMS provides modeling and data generation based on the GAR(1) model. The model

parameters are estimated based on a procedure suggested by Fernandez and Salas (1990).

The GAR(1) model can be expressed as (Lawrence and Lewis, 1981)

$$X_t = \phi X_{t-1} + \varepsilon_t \tag{4.24}$$

where $X_t$ is a gamma variable defined at time $t$, $\phi$ is the autoregression coefficient, and $\varepsilon_t$ is the independent noise term. $X_t$ is a three-parameter gamma distributed variable with marginal density function given by:

$$f_X(x) = \frac{\alpha^{\beta}\,(x-\lambda)^{\beta-1}\exp[-\alpha(x-\lambda)]}{\Gamma(\beta)} \tag{4.25}$$

where $\lambda$, $\alpha$, and $\beta$ are the location, scale, and shape parameters, respectively. Lawrence (1982) found that $\varepsilon_t$ can be obtained by the following scheme:

$$\varepsilon = \lambda(1-\phi) + \eta \tag{4.26}$$

where

$$\eta = \begin{cases} 0 & \text{if } M = 0 \\ \sum_{j=1}^{M} Y_j\,\phi^{U_j} & \text{if } M > 0 \end{cases} \tag{4.27}$$

where $M$ is an integer random variable, Poisson distributed with mean $-\beta \ln(\phi)$, and $U_j$, $j = 1, 2, \ldots$, are independent identically distributed (iid) random variables with uniform (0,1) distribution. Additionally, $Y_j$, $j = 1, 2, \ldots$, are iid random variables, exponentially distributed with mean $1/\alpha$.
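The noise scheme of Eqs. (4.26) and (4.27) can be sketched with the Python standard library as follows. All function names are hypothetical. Two choices are assumptions of this sketch and are not taken from the manual: the Poisson variate is drawn with Knuth's multiplication method (adequate for the small means that occur here), and the series is started from a draw of the marginal gamma distribution instead of a warm-up. The sketch requires $0 < \phi < 1$.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def gar1_noise(lam, alpha, beta, phi, rng):
    """Noise term of Eqs. (4.26)-(4.27)."""
    m = poisson(-beta * math.log(phi), rng)            # M ~ Poisson(-beta * ln(phi))
    # Y_j ~ exponential with mean 1/alpha (rate alpha), U_j ~ uniform(0, 1)
    eta = sum(rng.expovariate(alpha) * phi ** rng.random() for _ in range(m))
    return lam * (1.0 - phi) + eta

def generate_gar1(lam, alpha, beta, phi, n, rng=None):
    """Generate n values of the GAR(1) process of Eq. (4.24)."""
    rng = rng or random.Random()
    # gammavariate takes (shape, scale); the scale is 1/alpha in the notation of Eq. (4.25)
    x = lam + rng.gammavariate(beta, 1.0 / alpha)
    series = []
    for _ in range(n):
        x = phi * x + gar1_noise(lam, alpha, beta, phi, rng)
        series.append(x)
    return series
```

Because the noise is never smaller than $\lambda(1-\phi)$, every generated value stays at or above the location parameter $\lambda$, as expected for a three-parameter gamma variable.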

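The stationary GAR(1) process of Eq. (4.24) has four parameters, namely, $\lambda$, $\alpha$, $\beta$, and $\phi$. It may be shown that the relationships between the model parameters and the population moments of the underlying variable $X_t$ are:

$$\mu = \lambda + \frac{\beta}{\alpha} \tag{4.28}$$

$$\sigma^2 = \frac{\beta}{\alpha^2} \tag{4.29}$$

$$\gamma = \frac{2}{\sqrt{\beta}} \tag{4.30}$$

$$\rho_1 = \phi \tag{4.31}$$

where $\mu$, $\sigma^2$, $\gamma$, and $\rho_1$ are the mean, variance, skewness coefficient, and the lag-one autocorrelation coefficient, respectively.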
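For reference, Eqs. (4.28) through (4.31) can be solved explicitly for the parameters. The following inversion is a worked step added here for convenience; it assumes a positive skewness coefficient, $\gamma > 0$:

$$\beta = \frac{4}{\gamma^2}, \qquad \alpha = \frac{\sqrt{\beta}}{\sigma} = \frac{2}{\gamma\,\sigma}, \qquad \lambda = \mu - \frac{\beta}{\alpha} = \mu - \frac{2\sigma}{\gamma}, \qquad \phi = \rho_1$$

For example, $\mu = 10$, $\sigma^2 = 4$, $\gamma = 1$, and $\rho_1 = 0.5$ give $\beta = 4$, $\alpha = 1$, $\lambda = 6$, and $\phi = 0.5$.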

Based on results given by Kendall (1968), Wallis and O'Connell (1972), and Matalas (1966), and on extensive simulation experiments, Fernandez and Salas (1990) suggested the following estimation procedure:

$$\hat{\rho}_1 = \frac{r_1 N + 1}{N - 4} \tag{4.32}$$

$$\hat{\sigma}^2 = \frac{N - 1}{N - K}\, s^2 \tag{4.33}$$

$$K = \frac{N(1 - \hat{\rho}_1^2) - 2\hat{\rho}_1 (1 - \hat{\rho}_1^N)}{N (1 - \hat{\rho}_1)^2} \tag{4.34}$$

in which $r_1$ is the lag-1 sample autocorrelation coefficient and $s^2$ is the sample variance. In addition,

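$$\hat{\gamma} = \frac{\hat{\gamma}_0}{1 - 3.12\,\hat{\rho}_1^{3.7}\, N^{-0.49}} \tag{4.35}$$

where $\hat{\gamma}_0$ is the skewness coefficient suggested by Bobee and Robitaille (1975) as

$$\hat{\gamma}_0 = \frac{L\, g_1}{\sqrt{N}} \left[ A + B\, \frac{L^2}{N}\, g_1^2 \right] \tag{4.36}$$

in which $g_1$ is the sample skewness coefficient and the constants $A$, $B$, and $L$ are given by

$$A = 1 + 6.51\, N^{-1} + 20.2\, N^{-2} \tag{4.37}$$

$$B = 1.48\, N^{-1} + 6.77\, N^{-2} \tag{4.38}$$

and

$$L = \frac{N - 2}{\sqrt{N - 1}} \tag{4.39}$$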

respectively. Furthermore, the mean is estimated by the usual sample mean $\bar{x}$. Therefore, substituting the population statistics $\mu$, $\sigma^2$, $\gamma$, and $\rho_1$ in Eqs. (4.28) through (4.31) by the corresponding estimates $\bar{x}$, $\hat{\sigma}^2$, $\hat{\gamma}$, and $\hat{\rho}_1$ as suggested above and solving the equations simultaneously gives the MOM estimates of the GAR(1) model parameters. For more details, the interested reader is referred to Fernandez and Salas (1990).
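The estimation procedure of Eqs. (4.32) through (4.39), followed by the solution of Eqs. (4.28) through (4.31), can be sketched as below. The function names are hypothetical, and the sample statistics $s^2$, $g_1$, and $r_1$ are taken as inputs because their exact sample definitions are not given in this section. The sketch assumes $0 < \hat{\rho}_1 < 1$ and a positive skewness, and raises an error otherwise.

```python
import math

def gar1_corrected_statistics(n, s2, g1, r1):
    """Corrected variance, skewness and lag-1 correlation, Eqs. (4.32)-(4.39)."""
    rho = (r1 * n + 1.0) / (n - 4.0)                                     # Eq. (4.32)
    if not 0.0 < rho < 1.0:
        raise ValueError("the sketch needs 0 < rho1 < 1")
    k = (n * (1.0 - rho ** 2) - 2.0 * rho * (1.0 - rho ** n)) / (n * (1.0 - rho) ** 2)  # Eq. (4.34)
    var = (n - 1.0) / (n - k) * s2                                       # Eq. (4.33)
    a = 1.0 + 6.51 / n + 20.2 / n ** 2                                   # Eq. (4.37)
    b = 1.48 / n + 6.77 / n ** 2                                         # Eq. (4.38)
    l = (n - 2.0) / math.sqrt(n - 1.0)                                   # Eq. (4.39)
    gamma0 = l * g1 / math.sqrt(n) * (a + b * l ** 2 / n * g1 ** 2)      # Eq. (4.36)
    gamma = gamma0 / (1.0 - 3.12 * rho ** 3.7 * n ** -0.49)              # Eq. (4.35)
    return var, gamma, rho

def gar1_parameters(mean, var, gamma, rho):
    """Solve Eqs. (4.28)-(4.31) for (lambda, alpha, beta, phi)."""
    if gamma <= 0.0:
        raise ValueError("the sketch needs a positive skewness coefficient")
    beta = 4.0 / gamma ** 2            # from Eq. (4.30)
    alpha = math.sqrt(beta / var)      # from Eq. (4.29)
    lam = mean - beta / alpha          # from Eq. (4.28)
    return lam, alpha, beta, rho       # phi = rho1 by Eq. (4.31)
```

A typical call would be `gar1_parameters(xbar, *gar1_corrected_statistics(n, s2, g1, r1))`, where `xbar` is the sample mean.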

4.4 Univariate PARMA(p,q) Model

Stationary ARMA models have been widely applied in stochastic hydrology to annual

time series where the mean, variance, and the correlation structure do not depend on time.

Seasonal statistics such as the mean and standard deviation may be reproduced by a stationary

ARMA model by means of standardizing the underlying seasonal series. However, this

procedure does not account for the season-to-season correlations that are generally exhibited by

hydrologic time series such as monthly streamflows. Thus, periodic ARMA (PARMA) models

have been suggested in the literature for this purpose.

A PARMA(p,q) model may be expressed as (Salas, 1993):

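$$\Phi_\tau(B)\, Y_{\nu,\tau} = \Theta_\tau(B)\, \varepsilon_{\nu,\tau} \tag{4.40}$$

where $Y_{\nu,\tau}$ represents the streamflow process for year $\nu$ and season $\tau$; it has mean zero and variance $\sigma_\tau^2(Y)$ and is normally distributed; $\varepsilon_{\nu,\tau}$ is the uncorrelated noise term, which is normally distributed with mean zero and variance $\sigma_\tau^2(\varepsilon)$; and $\Phi_\tau(B)$ and $\Theta_\tau(B)$ are periodic polynomials in $B$ defined as

$$\Phi_\tau(B) = 1 - \phi_{1,\tau} B - \phi_{2,\tau} B^2 - \cdots - \phi_{p,\tau} B^p \tag{4.41a}$$

$$\Theta_\tau(B) = 1 - \theta_{1,\tau} B - \theta_{2,\tau} B^2 - \cdots - \theta_{q,\tau} B^q \tag{4.41b}$$

where $\phi_{1,\tau}, \ldots, \phi_{p,\tau}$ are the seasonal autoregressive parameters; $\theta_{1,\tau}, \ldots, \theta_{q,\tau}$ are the seasonal moving average parameters; $B$ is the backward shift operator, i.e., $B^c Y_{\nu,\tau} = Y_{\nu,\tau-c}$; and $p$ and $q$ define the order of the PARMA model.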

Method of moments (MOM) may be used in parameter estimation of low order

PARMA(p, q) models. In SAMS the MOM estimates are available for the PARMA(p,1) model.

For example, the moment estimators for the PARMA (1,1) and PARMA (2, 1) models are shown

below (Salas et al, 1982):

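- PARMA (1,1) model:

$$Y_{\nu,\tau} = \phi_{1,\tau} Y_{\nu,\tau-1} + \varepsilon_{\nu,\tau} - \theta_{1,\tau}\, \varepsilon_{\nu,\tau-1} \tag{4.42}$$

$$\hat{\phi}_{1,\tau} = \frac{m_{2,\tau}}{m_{1,\tau-1}} \tag{4.43}$$

$$\hat{\theta}_{1,\tau} = \hat{\phi}_{1,\tau} + \frac{s_\tau^2 - \hat{\phi}_{1,\tau}\, m_{1,\tau}}{\hat{\phi}_{1,\tau}\, s_{\tau-1}^2 - m_{1,\tau}} - \frac{\hat{\phi}_{1,\tau+1}\, s_\tau^2 - m_{1,\tau+1}}{(\hat{\phi}_{1,\tau}\, s_{\tau-1}^2 - m_{1,\tau})\, \hat{\theta}_{1,\tau+1}} \tag{4.44}$$

$$\hat{\sigma}_\tau^2(\varepsilon) = \frac{\hat{\phi}_{1,\tau+1}\, s_\tau^2 - m_{1,\tau+1}}{\hat{\theta}_{1,\tau+1}} \tag{4.45}$$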

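- PARMA (2,1) model:

$$Y_{\nu,\tau} = \phi_{1,\tau} Y_{\nu,\tau-1} + \phi_{2,\tau} Y_{\nu,\tau-2} + \varepsilon_{\nu,\tau} - \theta_{1,\tau}\, \varepsilon_{\nu,\tau-1} \tag{4.46}$$

$$\hat{\phi}_{1,\tau} = \frac{m_{2,\tau}\, m_{1,\tau-2} - s_{\tau-2}^2\, m_{3,\tau}}{m_{1,\tau-1}\, m_{1,\tau-2} - s_{\tau-2}^2\, m_{2,\tau-1}} \tag{4.47}$$

$$\hat{\phi}_{2,\tau} = \frac{m_{3,\tau}\, m_{1,\tau-1} - m_{2,\tau}\, m_{2,\tau-1}}{m_{1,\tau-1}\, m_{1,\tau-2} - s_{\tau-2}^2\, m_{2,\tau-1}} \tag{4.48}$$

$$\hat{\theta}_{1,\tau} = \hat{\phi}_{1,\tau} + \frac{s_\tau^2 - \hat{\phi}_{1,\tau}\, m_{1,\tau} - \hat{\phi}_{2,\tau}\, m_{2,\tau}}{\hat{\phi}_{1,\tau}\, s_{\tau-1}^2 - m_{1,\tau} + \hat{\phi}_{2,\tau}\, m_{1,\tau-1}} - \frac{\hat{\phi}_{1,\tau+1}\, s_\tau^2 - m_{1,\tau+1} + \hat{\phi}_{2,\tau+1}\, m_{1,\tau}}{(\hat{\phi}_{1,\tau}\, s_{\tau-1}^2 - m_{1,\tau} + \hat{\phi}_{2,\tau}\, m_{1,\tau-1})\, \hat{\theta}_{1,\tau+1}} \tag{4.49}$$

$$\hat{\sigma}_\tau^2(\varepsilon) = \frac{\hat{\phi}_{1,\tau+1}\, s_\tau^2 - m_{1,\tau+1} + \hat{\phi}_{2,\tau+1}\, m_{1,\tau}}{\hat{\theta}_{1,\tau+1}} \tag{4.50}$$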

where $s_\tau^2$ is the seasonal variance and $m_{k,\tau}$ is the estimate of the lag-$k$ season-to-season covariance of $Y_{\nu,\tau}$, which is equal to

$$M_{k,\tau} = E[Y_{\nu,\tau}\, Y_{\nu,\tau-k}] \tag{4.51}$$

because $E(Y_{\nu,\tau}) = 0$. Note also that $s_\tau^2 = m_{0,\tau}$.
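The sample counterpart of Eq. (4.51) and the autoregressive estimator of Eq. (4.43) can be sketched as follows. The function names are hypothetical. The sketch assumes that the data are stored as one list of seasonal values per year with the seasonal means already removed, that seasons are numbered from 0, that a lag reaching before season 0 refers to the previous year, and that each covariance is averaged over the years for which both values exist (the divisor used in SAMS is not stated in this section).

```python
def seasonal_covariance(y, k, tau):
    """Sample lag-k season-to-season covariance m_{k,tau} of mean-removed data.

    y[nu][tau] is the value for year nu and season tau (both counted from 0).
    """
    w = len(y[0])                                   # number of seasons
    flat = [v for year in y for v in year]          # year-by-year sequence of seasons
    prods = [flat[t] * flat[t - k] for t in range(tau, len(flat), w) if t - k >= 0]
    return sum(prods) / len(prods)

def parma11_phi(y):
    """Moment estimates of phi_{1,tau} for every season, Eq. (4.43)."""
    w = len(y[0])
    return [seasonal_covariance(y, 2, tau) / seasonal_covariance(y, 1, (tau - 1) % w)
            for tau in range(w)]
```

The moving average estimates of Eq. (4.44) are coupled across seasons, since each one depends on the estimate for the following season, so they are not included in this sketch.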

In a similar manner as for the ARMA(p,q) model, the Least Squares (LS) method can be used to estimate the model parameters of PARMA(p,q) models. In this case, the parameters $\phi$'s and $\theta$'s are estimated by minimizing the sum of squares of the residuals defined by

$$F = \sum_{\nu=1}^{N} \sum_{\tau=1}^{\omega} \varepsilon_{\nu,\tau}^2 \tag{4.52}$$

where $\omega$ is the number of seasons and $N$ is the number of years of data. For the PARMA(p,q) model, the residuals are defined as

$$\varepsilon_{\nu,\tau} = Y_{\nu,\tau} - \sum_{i=1}^{p} \phi_{i,\tau}\, Y_{\nu,\tau-i} + \sum_{i=1}^{q} \theta_{i,\tau}\, \varepsilon_{\nu,\tau-i} \tag{4.53}$$

Once the $\phi$'s and $\theta$'s are determined, the seasonal noise variance $\sigma_\tau^2(\varepsilon)$ can be estimated by $(1/N)\sum_{\nu=1}^{N} \varepsilon_{\nu,\tau}^2$. Alternatively, the method of moments can be applied, but this latter option is not yet available in the current version of SAMS. In using Powell's algorithm for obtaining the least squares estimates of the $\phi$'s and $\theta$'s, the moment estimates of low order PARMA(p,q) models such as PARMA(p,1) may be taken as the initial values in the search algorithm.

Generation of data from PARMA(p,q) models is carried out in a similar manner as for ARMA(p,q) models. The warm-up procedure can be used again to generate seasonal sequences of the process $Y_{\nu,\tau}$ by assuming that values of $Y_{\nu,\tau}$ prior to season 1 of year 1 are equal to zero and generating uncorrelated random sequences of $\varepsilon_{\nu,\tau}$ as needed, in a similar manner as for the ARMA(p,q) model. The warm-up period is taken as 50 years.
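For the PARMA(1,1) case, the seasonal generation with a warm-up period can be sketched as below. The function name and argument layout are hypothetical: `phi`, `theta`, and `sigma_e` are lists with one value per season, and the sketch assumes that the warm-up years are simply discarded.

```python
import random

def generate_parma11(phi, theta, sigma_e, n_years, warmup_years=50, rng=None):
    """Generate n_years of a zero-mean PARMA(1,1) process, one list of seasons per year."""
    rng = rng or random.Random()
    w = len(phi)                                    # number of seasons
    y_prev = 0.0                                    # Y before season 1 of year 1 set to zero
    e_prev = sigma_e[-1] * rng.gauss(0.0, 1.0)      # noise of the season preceding the start
    years = []
    for year in range(n_years + warmup_years):
        row = []
        for tau in range(w):
            e = sigma_e[tau] * rng.gauss(0.0, 1.0)  # seasonal noise, as in Eq. (4.23)
            y_prev = phi[tau] * y_prev + e - theta[tau] * e_prev
            e_prev = e
            row.append(y_prev)
        if year >= warmup_years:                    # keep only the years after the warm-up
            years.append(row)
    return years
```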

4.5 Multivariate MAR(p) Model

The MAR(p) model can be expressed as

$$\Phi(B)\, Y_t = \varepsilon_t \tag{4.54}$$

where $\Phi(B)$ is a square matrix of polynomials in $B$ which is defined as

$$\Phi(B) = I - \Phi_1 B - \Phi_2 B^2 - \cdots - \Phi_p B^p \tag{4.55}$$

in which $I$ is an ($n \times n$) identity matrix; $\Phi_j$, $j = 1, \ldots, p$, are $n \times n$ parameter matrices; $B^j$ is a scalar difference operator such that $B^j Z_t = Z_{t-j}$; $Y_t$ is an ($n \times 1$) column vector with elements $Y_t^{(i)}$, $i = 1, \ldots, n$; and $\varepsilon_t$ is an ($n \times 1$) vector of normally distributed noise terms with mean 0 and variance-covariance matrix $G$. The noises $\varepsilon_t$ are independent in time but are dependent in space, and $n$ is the number of sites. Such spatially correlated noise can be modeled by

$$\varepsilon_t = B\, \xi_t \tag{4.56}$$

where $\xi_t$ is an ($n \times 1$) vector of standardized normal variables independent in both time and space and $B$ is an ($n \times n$) parameter matrix.

It can be shown that the moment equations of the MAR(p) model are given by

$$M_0 = \sum_{i=1}^{p} \Phi_i M_i^T + G \tag{4.57}$$

$$M_k = \sum_{i=1}^{p} \Phi_i M_{k-i}, \quad k \geq 1 \tag{4.58}$$

where $M_k$ is the lag-$k$ cross covariance matrix of $Y_t$ defined as:

$$M_k = E[Y_t\, Y_{t-k}^T] \tag{4.59}$$

in which the superscript $T$ indicates a matrix transpose and $E(Y_t) = 0$. In finding the MOM estimates, Eq. (4.58) for $k = 1, \ldots, p$ is solved simultaneously for the parameter matrices $\Phi_j$, $j = 1, \ldots, p$, by substituting in Eq. (4.58) the population covariance matrices $M_k$, $k = 1, 2, \ldots, p$, by the sample covariance matrices $\hat{M}_k$, $k = 1, 2, \ldots, p$. Then Eq. (4.57) is used to estimate the variance-covariance matrix of the residuals $\hat{G}$. For example, the moment estimators of the MAR(1) model are:

$$\hat{\Phi}_1 = \hat{M}_1 \hat{M}_0^{-1} \tag{4.60}$$

$$\hat{G} = \hat{M}_0 - \hat{M}_1 \hat{M}_0^{-1} \hat{M}_1^T \tag{4.61}$$

in which superscript -1 indicates a matrix inverse.
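The MAR(1) estimators of Eqs. (4.60) and (4.61) can be sketched without external libraries as follows. The helper names are hypothetical, matrices are stored as lists of rows, and the inverse is computed by Gauss-Jordan elimination with partial pivoting, which is a choice of this sketch and not necessarily what SAMS uses.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[piv][c] == 0.0:
            raise ValueError("singular matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def mar1_moments(m0, m1):
    """Moment estimates of Phi_1 and G for a MAR(1) model from M0 and M1."""
    n = len(m0)
    phi1 = matmul(m1, inverse(m0))                      # Eq. (4.60)
    corr = matmul(phi1, transpose(m1))                  # M1 * M0^{-1} * M1^T
    g = [[m0[i][j] - corr[i][j] for j in range(n)] for i in range(n)]   # Eq. (4.61)
    return phi1, g
```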

After estimating $\Phi_j$, $j = 1, \ldots, p$, and $G$ as indicated above, $B$ of Eq. (4.56) can be determined from

$$\hat{G} = B B^T \tag{4.62}$$

The above matrix equation can have more than one solution. However, a unique solution can be obtained by assuming that $B$ is a lower triangular matrix. This solution, however, requires that $G$ be a positive definite matrix.
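The lower triangular solution of Eq. (4.62) is the Cholesky factor of $\hat{G}$. A minimal sketch, with the hypothetical function name `lower_cholesky`, is:

```python
import math

def lower_cholesky(g):
    """Lower triangular B with B * B^T = G, for a symmetric positive definite G."""
    n = len(g)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(b[i][k] * b[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("G is not positive definite")
                b[i][j] = math.sqrt(s)          # diagonal element
            else:
                b[i][j] = s / b[j][j]           # element below the diagonal
    return b
```

The error raised for a non-positive pivot corresponds to the requirement stated above that $G$ be positive definite.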



4.6 Multivariate CARMA(p,q) Model

When modeling multivariate hydrologic processes based on the full multivariate ARMA model, problems often arise in parameter estimation. The CARMA (Contemporaneous Autoregressive Moving Average) model was suggested as a simpler alternative to the full multivariate ARMA model (Salas et al., 1980). In the CARMA model, both the autoregressive and the moving average parameter matrices are assumed to be diagonal, so that a multivariate model can be decoupled into component univariate models. Thus, the model parameters $\Phi$ and $\Theta$ do not need to be estimated jointly; instead, they can be estimated independently for each single site by regular univariate ARMA model estimation procedures. This allows the best univariate ARMA model to be identified for each single station.

The CARMA(p, q) model can be expressed as

$$Z_t = \sum_{j=1}^{p} \Phi_j Z_{t-j} + \varepsilon_t - \sum_{j=1}^{q} \Theta_j\, \varepsilon_{t-j} \tag{4.63}$$

where $Z_t$ is a multi-dimensional vector of the normalized and mean corrected observations at time $t$, $\varepsilon_t$ is the multi-dimensional vector of noises (residuals) of the processes at time $t$, $\Phi_j$ are the diagonal autoregressive parameter matrices, and $\Theta_j$ are the diagonal moving average parameter matrices. Equation (4.63) can be decoupled into the model components as

$$Z_t^{(i)} = \sum_{j=1}^{p} \phi_j^{(i)} Z_{t-j}^{(i)} + \varepsilon_t^{(i)} - \sum_{j=1}^{q} \theta_j^{(i)}\, \varepsilon_{t-j}^{(i)} \tag{4.64}$$

Thus, Eq. (4.64) is the expression of a univariate ARMA(p,q) model for site $i$, such that the parameters $\phi_j^{(i)}$ and $\theta_j^{(i)}$ can be estimated by the regular ARMA model estimation methods.

The vector of residual (noise) terms $\varepsilon_t = [\varepsilon_t^{(1)}, \varepsilon_t^{(2)}, \ldots, \varepsilon_t^{(n)}]^T$ can be expressed as

$$\varepsilon_t = B\, \xi_t \tag{4.65}$$

where the random vector $\xi_t$ is uncorrelated in time and space, i.e., $E(\xi_t \xi_t^T) = I$. It may be shown that the variance-covariance matrix $G$ of the correlated series $\varepsilon_t$ is equal to

$$G = E(\varepsilon_t\, \varepsilon_t^T) = B B^T \tag{4.66}$$

Thus, a CARMA model implies that the cross-correlations between sites are carried through the

residuals.

Two methods are used for estimating the $G$ matrix:

1. The MLE estimate of $G$ is obtained by

$$\hat{G} = \frac{1}{N} \sum_{t=1}^{N} \hat{\varepsilon}_t\, \hat{\varepsilon}_t^T \tag{4.67}$$

where $\hat{\varepsilon}_t$ are the residuals calculated from each single site model by using the estimated parameters $\hat{\Phi}_j$ and $\hat{\Theta}_j$, and $N$ is the number of time steps in the record.

2. The moment (MOM) estimate of $G$ is computed from the moment estimator as a function of the given parameters and the cross-covariances of the data, i.e.,

$$\hat{G} = f(\Phi_m, \Theta_r, M_k) \tag{4.68}$$

where $M_k$ are the lag-$k$ variance-covariance matrices of the processes $Z$, $m = 1, \ldots, p$; $r = 1, \ldots, q$; and $k = 0, \ldots, \max(p, q) - 1$.
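The first of the two estimates, Eq. (4.67), is the average outer product of the residual vectors. A minimal sketch is shown below; the function name is hypothetical, and the residuals are assumed to be stored as one list per time step with one entry per site.

```python
def residual_covariance(residuals):
    """Average outer product of the residual vectors, as in Eq. (4.67).

    residuals[t][i] is the residual of site i at time step t.
    """
    n_steps = len(residuals)
    n_sites = len(residuals[0])
    return [[sum(r[i] * r[j] for r in residuals) / n_steps for j in range(n_sites)]
            for i in range(n_sites)]
```

For example, `residual_covariance([[1.0, 2.0], [3.0, 4.0]])` gives `[[5.0, 7.0], [7.0, 10.0]]`.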

A moment estimator of the $G$ matrix for a general CARMA model is obtained as follows. By multiplying both sides of Eq. (4.63) by $Z_t^T$ (the transpose of $Z_t$) one may obtain

$$Z_t Z_t^T = \Phi_1 Z_{t-1} Z_t^T + \cdots + \Phi_p Z_{t-p} Z_t^T + \varepsilon_t Z_t^T - \Theta_1 \varepsilon_{t-1} Z_t^T - \cdots - \Theta_q \varepsilon_{t-q} Z_t^T \tag{4.69}$$

Because $E(Z_t Z_{t-k}^T) = M_k$ and $E(\varepsilon_t \varepsilon_t^T) = G$, the lag-0, lag-1, ..., lag-$p$ moment equations for $M_0, M_1, \ldots, M_p$ can be obtained by taking expectations on both sides of Eq. (4.69) and of the analogous products with $Z_{t-k}^T$. Then, the $(i, j)$ elements of the moment matrices, $M_0^{ij}, M_1^{ij}, M_2^{ij}, \ldots, M_p^{ij}$, can be expressed as functions of $(\phi_1^{(i)}, \phi_1^{(j)})$, $(\phi_2^{(i)}, \phi_2^{(j)})$, ..., $(\phi_p^{(i)}, \phi_p^{(j)})$; $(\theta_1^{(i)}, \theta_1^{(j)})$, $(\theta_2^{(i)}, \theta_2^{(j)})$, ..., $(\theta_q^{(i)}, \theta_q^{(j)})$; and $G^{ij}$; which are the elements of the matrices $\Phi_1, \Phi_2, \ldots, \Phi_p$; $\Theta_1, \Theta_2, \ldots, \Theta_q$; and $G$; respectively. Analogously, another $p$ sets of equations for the $(j, i)$ elements $M_0^{ji}, M_1^{ji}, M_2^{ji}, \ldots, M_p^{ji}$ can be obtained by switching the site indices, because of the symmetric structure of the CARMA model moment matrices. Since $G^{ij} = G^{ji}$, and $M_0^{ij} = M_0^{ji}$ are estimated from the observed processes, a system of $2p+1$ linear equations with $2p+1$ unknowns, namely, $G^{ij}, M_1^{ij}, M_2^{ij}, \ldots, M_p^{ij}$, etc., is formed. Solving each system of linear equations indexed $(i, j)$, the matrix estimate $\hat{G}$ can be obtained.

To obtain letG ij

(4.70)Kij

ki

k

q

l

j

l

j

l

k

k l

j

0

1 1

1= = =

( )

and

(4.71)Kmij

mi

ki

k m

q

l

j

l

j

l

k m

k m l

j= + = + =

1 1

( )

where, m= 1, ...,pand . For instance, for a CARMA(3, q) model0 1j

= M M M M ij ij ij ij0 1 2 3, , ,

can be expressed as

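$$M_0^{ij} = \phi_1^{(i)} M_1^{ji} + \phi_2^{(i)} M_2^{ji} + \phi_3^{(i)} M_3^{ji} + K_0^{ij}\, G^{ij} \tag{4.72}$$

$$M_1^{ij} = \phi_1^{(i)} M_0^{ij} + \phi_2^{(i)} M_1^{ji} + \phi_3^{(i)} M_2^{ji} - K_1^{ij}\, G^{ij} \tag{4.73}$$

$$M_2^{ij} = \phi_1^{(i)} M_1^{ij} + \phi_2^{(i)} M_0^{ij} + \phi_3^{(i)} M_1^{ji} - K_2^{ij}\, G^{ij} \tag{4.74}$$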



$$\varepsilon_{\nu,\tau} = B_\tau\, \xi_{\nu,\tau} \tag{4.79}$$

where $\xi_{\nu,\tau}$ is an ($n \times 1$) vector of standardized normal variables independent in both time and space and $B_\tau$ is an ($n \times n$) parameter matrix.

The parameters of the MPAR(p) model are estimated by the MOM by substituting the sample moments into the moment equations in a similar manner as
