TIME SERIES MODELING USING MARKOV AND ARIMA MODELS MOHD KHAIRUL IDLAN BIN MUHAMMAD A report submitted in partial fulfillment of the requirements for the award of the degree of Master of Engineering (Civil Hydraulic & Hydrology) Faculty of Civil Engineering Universiti Teknologi Malaysia JANUARY 2012

TIME SERIES MODELING USING MARKOV AND ARIMA MODELS

MOHD KHAIRUL IDLAN BIN MUHAMMAD

A report submitted in partial fulfillment of the requirements for the award of the degree of

Master of Engineering (Civil – Hydraulic & Hydrology)

Faculty of Civil Engineering Universiti Teknologi Malaysia

JANUARY 2012

DEDICATION

Special dedication to my beloved father and mother

Mr. Muhammad bin Ismail

and

Madam Siti Maznah binti Abdullah

and

My inspiration…

Jazakumullahu khairan for all love and inspiration

throughout the entire creation of this thesis.

ACKNOWLEDGEMENT

Assalammualaikum w.b.t.

Alhamdulillah, all praise to Allah S.W.T for the gift of life and what I have achieved

today.

Appreciation goes to my family for their prayers and their moral and financial support. May Allah reward you abundantly.

My sincere and deepest gratitude goes to my supervisor, Dr. Sobri Harun for his

guidance, encouragement and support in completing this master project.

My gratitude to Dr. Muhammad Askari for his invaluable suggestions, guidance, and

encouragement.

Last but not least, to all my lecturers, classmates and friends: your help and support are truly appreciated and will be remembered forever, InsyaALLAH. Thank you all.

ABSTRACT

Streamflow forecasting plays an important role in flood mitigation and in water resources allocation and management. Inaccurate forecasts cause losses to water resources managers and users. The suitability of a forecasting method depends on the type and amount of available data. The objectives of this study are therefore to propose streamflow forecasting methods using Markov and ARIMA models and to assess the forecasting accuracy of the two models. Streamflow data for Sungai Bernam, Selangor were used. Minitab and Microsoft Excel were used to build the ARIMA and Markov models, respectively. The performance criteria used to assess the forecasting accuracy of the models were the Mean Absolute Percentage Error (MAPE), the Root Mean Squared Error (RMSE) and the Chi-square test of normality. The tentative ARIMA model that best fits the criteria and meets the requirements is ARIMA $(1,1,1)(0,1,1)_{12}$. On these performance criteria, the ARIMA model forecasts better than the Markov model in this study. Therefore, the ARIMA model is able to predict future monthly streamflow for Sungai Bernam accurately.

ABSTRAK

Streamflow forecasting plays an important role in flood control and water management. Inaccurate forecasts cause losses to water resources managers and to users. The suitability of a forecasting method depends on the type and amount of available data. The objectives of this study are therefore to propose streamflow forecasting methods using Markov and ARIMA models and to examine the forecasting accuracy of the Markov and ARIMA models. Streamflow data for Sungai Bernam were used. Minitab was used to build the ARIMA model and Microsoft Excel was used to build the Markov model. The performance criteria used in this study to examine the forecasting accuracy of the different models were the Mean Absolute Percentage Error (MAPE), the Root Mean Squared Error (RMSE) and the Chi-square test. The tentative ARIMA model that best fits the criteria and meets the requirements is ARIMA $(1,1,1)(0,1,1)_{12}$. On these performance criteria, the ARIMA model forecasts better than the Markov model. The ARIMA model is therefore able to predict future streamflow for Sungai Bernam accurately.

TABLE OF CONTENTS

CHAPTER TITLE

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Justification of the Study
1.4 Aim and Objectives
1.5 Scope of Study

2 LITERATURE REVIEW
2.1 Introduction
2.2 Time Series Model
2.3 Forecasting Time Series
2.4 Streamflow Forecasting Method
2.4.1 Markov Model
2.4.2 ARIMA Theory
2.4.3 ARIMA Algorithms
2.4.3.1 AR Model
2.4.3.2 MA Model
2.4.3.3 ARMA Model
2.4.3.4 ARIMA Model
2.5 Reviews on Markov Model
2.6 Reviews on ARIMA Model
2.7 Concluding Remarks

3 METHODOLOGY
3.1 Introduction
3.2 Markov Model
3.2.1 Statistical Parameters of Historical Data
3.2.2 Identification of Distribution
3.2.3 Generation of Random Numbers
3.2.4 Formulation of the Markov Model
3.3 ARIMA Model
3.3.1 Model Assumptions
3.3.1.1 Data Stationarity
3.3.1.2 Normal Distribution
3.3.1.3 Outlier
3.3.1.4 Missing Data
3.3.2 Model Procedure
3.3.2.1 Model Identification
3.3.2.2 Parameter Estimation
3.3.2.3 Diagnostic Checking
3.3.3 Minitab Procedure
3.4 Model Comparison and Forecast Evaluation Measures

4 RESULTS AND DISCUSSION
4.1 Introduction
4.2 Estimation of Missing Data Values
4.3 Markov Model
4.3.1 Statistical Parameters of Historical Data
4.3.2 Identification of Distribution
4.3.3 Generation of Random Numbers
4.3.4 Streamflow Generation of Markov Model
4.3.5 Validation of Markov Model
4.4 ARIMA Model
4.4.1 Model Identification
4.4.2 Parameter Estimation
4.4.3 Diagnostic Checking
4.4.4 Streamflow Generation of ARIMA Model
4.4.5 Validation of ARIMA Model
4.5 Model Comparison and Forecast Evaluation Measures

5 CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
5.2 Recommendations

REFERENCES

APPENDICES A-G

LIST OF TABLES

TABLE NO. TITLE

4.1 Parameters of Monthly Historical Data
4.2 Logarithmic Values of Observed Streamflow Data for 1960-1970
4.3 Generation of Random Number for Year 2006
4.4 Model Streamflow for Year 2006
4.5 Accuracy of the Markov Model
4.6 General Theoretical ACF and PACF of ARIMA models
4.7 Final Estimates of Parameter for ARIMA $(1,1,1)(1,1,1)_{12}$
4.8 Final Estimates of Parameter for ARIMA $(1,1,1)(0,1,1)_{12}$
4.9 Modified Box-Pierce (Ljung-Box) Chi-Square statistic for ARIMA $(1,1,1)(1,1,1)_{12}$
4.10 Modified Box-Pierce (Ljung-Box) Chi-Square statistic for ARIMA $(1,1,1)(0,1,1)_{12}$
4.11 LSE and RMSE Test for ARIMA Tentative Model
4.12 Model Streamflow for Year 2006-2007
4.13 Accuracy of the ARIMA Model
4.14 Accuracy of the model

LIST OF FIGURES

FIGURE NO. TITLE

2.1 Value of time series with forecast function at 50% probability limits
3.1 Flowchart of ARIMA modeling
4.1 Linear Regression of Two Streamflow for 1962
4.2 Linear Regression of Rainfall and Streamflow
4.3 Linear Regression of Two Streamflow for 1993
4.4 Descriptive Statistics of Sungai Bernam Data
4.5 Probability Density Function
4.6 Cumulative Distribution Function
4.7 Cumulative Distribution Function of the Log-normal Distribution
4.8 Comparison of Observed and Markov Flow
4.9 Flow Diagram of Box-Jenkins Methodology
4.10 Non-stationary data of Sg. Bernam streamflow
4.11 Stationary data of Sg. Bernam streamflow
4.12 ACF after non-seasonal difference
4.13 PACF after non-seasonal difference
4.14 ACF after seasonal difference
4.15 PACF after seasonal difference
4.16 Comparison of Observed and ARIMA Model Flow
4.17 Model Comparison
4.18 Streamflow for actual and model

LIST OF APPENDICES

APPENDIX TITLE

A Streamflow Data of Sungai Bernam 1960-2010
B Logarithmic of Observed Streamflow Data for 1960-2005
C Generation of Random Number for Year 2006-2010
D Markov Model Streamflow
E Performance Evaluation Procedure of Markov Model
F ARIMA Model Streamflow
G Performance Evaluation Procedure of ARIMA model

LIST OF ABBREVIATIONS

ACF - Autocorrelation Function

AD - Anderson Darling

AR - Autoregressive

ARIMA - Autoregressive Integrated Moving Average

DF - Degree of Freedom

K-S - Kolmogorov-Smirnov

LSE - Least Squared Error

MA - Moving Average

MAPE - Mean Absolute Percentage Error

PACF - Partial Autocorrelation Function

RMSE - Root Mean Square Error

R² - Coefficient of Determination

S - Standard Deviation

SE - Standard Error

Sg. - Sungai

χ² - Chi-square

CHAPTER 1

INTRODUCTION

1.1 Background of Study

According to Bowerman and O’Connell (1993), predictions of future events and

conditions are called forecasts, and the act of making such predictions is called

forecasting. In many types of organizations, forecasting is very important as predictions

of future events must be incorporated into the decision-making process. In forecasting

events that will occur in the future, information concerning events that have occurred in

the past must be relied upon.

In order to prepare forecasts, past data need to be analyzed to identify a pattern

that can be used to describe it. Then, this pattern is extrapolated or extended into the

future. This forecasting technique rests on the assumption that the pattern that has been

identified will continue in the future to give good predictions. If the data pattern that has

been identified does not persist in the future, this indicates that the forecasting technique

used is likely to produce inaccurate predictions (Bowerman and O’Connell, 1993).

Most forecasting problems involve time series data, and this study uses time series to prepare forecasts. A time series is formed from measurements of a variable taken at regular intervals over time. It can be regarded as a stochastic process, that is, a sequence of random variables. Hydrologic streamflow data fall under the category of time series (Gupta, 1989). Future values of a time series can be forecast from its current and past values, and this approach can be applied to streamflow (Box and Jenkins, 1976). Time series plots can reveal patterns such as random variation, trends, level shifts, periods or cycles, unusual observations, or a combination of these.

Streamflow forecasting plays an important role in flood mitigation and in water resources allocation and management. In water management, high-quality streamflow forecasts, used efficiently, can yield considerable economic and social benefits. Short-term forecasting, such as hourly and daily forecasting, is crucial for flood warning and defense, while long-term forecasting, based on monthly, seasonal or annual time series, is very useful for reservoir operation, irrigation management decisions, drought mitigation and the management of river treaties (Shalamu, 2009).

Recently, time series models have become quite popular in streamflow forecasting, owing to greater data availability from metering stations, real-time data retrieval, and increasing computational capability together with the development of more robust methods and computer techniques (Wang, 2006). Because hydrologic forecasting is important, a considerable number of forecasting models and methodologies have been developed and applied to streamflow. In this study, Markov and ARIMA models are used to model monthly streamflow processes.

The Markov process considers that the value of streamflow at one time is

correlated with the value of the streamflow at an earlier period (i.e. a serial or

autocorrelation exists in the time series). In a first-order Markov process, this correlation

exists between two successive values of the events (Gupta, 1989).

The first-order Markov model states that the value of a variable x in one time period depends on the value of x in the preceding time period plus a random component. Thus, the synthetic streamflow is a sequence of numbers, each consisting of two parts: a deterministic part and a random part (Gupta, 1989).

The Autoregressive Integrated Moving Average (ARIMA) model, often called the Box-Jenkins time series method, gives good accuracy for short-term forecasting but poorer accuracy for long-term forecasting: over a sufficiently long horizon, its forecasts usually tend to flatten. An ARIMA model ignores independent (explanatory) variables completely and uses only the past and present values of the dependent variable to produce accurate short-term forecasts (Hendranata, 2003).

ARIMA is suitable when the observations of a time series are statistically dependent on one another. The purpose of the model is to establish a good statistical relationship between the variable being predicted and its own historical values, so that forecasts can be made with the model (Hendranata, 2003).

1.2 Problem Statement

Many time series forecasting methods can be used to predict streamflow. However, not all of them produce accurate forecasts, and inaccurate forecasts cause losses to water resources managers and users. The suitability of a forecasting method depends on the type and amount of available data. The ARIMA and Markov models must therefore be examined to determine whether they can provide accurate and reasonable monthly streamflow forecasts. The accuracy of both models in forecasting monthly streamflow is tested and evaluated using statistical methods. Both modeling approaches are applied to the data set to investigate the behavior of the streamflow further. The results of the study can serve as a reference for flood control, since Markov and ARIMA models are best suited to short-term forecasting.

1.3 Justification of the Study

Monthly streamflow forecasting is an integral part of drought, irrigation and reservoir operation management. Stochastic data generation aims to provide alternative hydrologic data sequences that are likely to occur in the future, in order to assess the reliability of alternative system designs and policies and to understand the variability in future system performance. It is therefore important to develop a stochastic hydrologic model that generates monthly streamflows and thereby estimates future streamflows. It is hoped that such a model can help reduce water shortage problems. Forecasting can also be used to give warning of extreme events such as drought (Joomizan, 2010).

1.4 Aim and Objectives

The aim of this study is to forecast streamflow using an appropriate time series modeling approach. To achieve this aim, the following objectives have been identified:

1. To propose streamflow forecasting methods using Markov and ARIMA models.

2. To assess the forecasting accuracy of the Markov and ARIMA models.

1.5 Scope of Study

In this study, two time series models, the Markov model and the ARIMA model, are used to predict the behavior of streamflow. Streamflow data for Sungai Bernam, Selangor, for the period 1960 to 2010 were used to apply the models. The study area, located in southeast Perak and northeast Selangor, is semi-developed and covers 186 km².

Streamflow data were obtained from the Sg. Bernam station at Tanjung Malim (Station No. 3615412). The monthly streamflow data were collected from the Department of Irrigation and Drainage, Kuala Lumpur. Minitab 15 was used for the ARIMA model and Microsoft Excel for the Markov model.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Surface water hydrology is generally the basis for engineering design and for the assessment of water sources. High streamflow may cause disasters such as floods and erosion, and short-term forecasting is needed to manage them. Low streamflow, meanwhile, can disrupt water supply to domestic and industrial users, hydroelectric power generation and irrigation; here, long-term forecasting is useful. The ability to forecast streamflow accurately can therefore support water flow management and flood control.

Modeling and forecasting time series has long been practiced by using different

statistical methods. Forecasting models of time series that are commonly used are

ARIMA, moving average, exponential smoothing, regression analysis, and Fourier series

analysis. In this study, Markov and ARIMA models are used to predict monthly

streamflow.

2.2 Time Series Model

A time series is a time-oriented or chronological sequence of observations on a

variable of interest (Montgomery et al., 2008). Time series models have become popular

in recent years since the publication of the book by Box and Jenkins (1970), and the

subsequent development of computer software for applying these models (Bell, 1984).

The time can be a discrete value, a time interval or a continuous function. The

hydrologic data of streamflows, precipitation, groundwater or lake levels, water

temperatures, or oxygen concentration fall under the category of time series. These data

can be deterministic, random, or a combination of the two (Gupta, 1989).

Many conventional statistical methods traditionally deal with models in which

the observations are assumed to be independent. However, a great deal of data in

business, economics, engineering and natural sciences occur in the form of time series

where observations are dependent. The systematic approach available for answering the

mathematical and statistical questions posed by these series of dependent observations is

called time series analysis. The objective of time series analysis is generally to

understand and identify the stochastic process that produced the observed series and then

to forecast future values of a series from past values alone (Akgun, 2003).

The analysis of a time series, in the time domain, is performed by a parameter

known as the serial correlation coefficient or the autocorrelation coefficient. This

parameter indicates the dependence in successive values of a time series. This

coefficient is determined for successive values (elements) and also for elements that are

various time intervals apart, known as the lag period. A graph of the autocorrelation

coefficient against the lag period is known as the correlogram. If a correlogram shows

zero or nearly zero values for all lag periods, the process is purely random. A value close

to 1 will suggest a dominating deterministic process (Gupta, 1989).
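As a minimal sketch of how a correlogram is obtained, the Python fragment below computes the sample autocorrelation coefficient for a range of lag periods. The estimator shown (lagged products of deviations from the mean, divided by the total sum of squared deviations) is one common convention and is an assumption here, as are the function names and the flow values, which are illustrative and not Sungai Bernam data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: sum of products of deviations that are `lag` steps apart.
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom


def correlogram(series, max_lag):
    """Autocorrelation coefficients for lags 0, 1, ..., max_lag."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]


# Hypothetical monthly flows, for illustration only.
flows = [12.0, 15.0, 14.0, 10.0, 8.0, 9.0, 13.0, 16.0, 15.0, 11.0, 7.0, 9.0]
r = correlogram(flows, max_lag=3)
```

The coefficient at lag 0 is always 1; values near zero at all other lags would point to a purely random process.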

The analysis of a time series in the frequency domain is done by the spectral

density that identifies the cyclic nature or periodicity in the series. The density indicates

the cycle in the deterministic data. In a purely random process it oscillates randomly.

The purpose of streamflow synthesis, however, is not to analyze a time series but to

generate the data based on the series. This does not require the decomposition of the

time series by the analysis above but an understanding of its statistical properties to

reproduce series of similar statistical characteristics (Gupta, 1989).

2.3 Forecasting Time Series

Most forecasting problems involve the use of time series data. Montgomery et al.

(2008) stated that forecasting problems are often classified as short-term, medium-term,

and long-term. Short-term forecasting problems involve predicting events only a few

time periods (days, weeks, months) into the future. Medium-term forecasts extend from

one to two years into the future, and long-term forecasting problems can extend beyond

that by many years. Short-term and medium-term forecasts are used for operations

management and development of projects while long-term forecasts can be used for

strategic planning.

This study applies Markov and ARIMA models to long-term forecasting, although both are generally regarded as best suited to short-term forecasting. Normally, short-term and medium-term forecasts are based on identifying, modeling, and extrapolating the patterns found in historical data. These historical data usually exhibit inertia and do not change very drastically. Therefore, statistical methods are very useful for short-term and medium-term forecasting (Montgomery et al., 2008).

The use at time $t$ of available observations from a time series to forecast its value at some future time can provide a basis for (1) economic and business planning, (2) production planning, (3) inventory and production control, and (4) control and optimization of industrial processes (Box et al., 1994). As originally described by Brown (1962), forecasts are usually needed over a period known as the lead time, which varies with each problem. Usually, forecasts are made at time $t$ by using the current month's value $Y_t$ and the previous months' values $Y_1, Y_2, \ldots, Y_{t-1}$ to produce the forecasts $F_{t+1}, F_{t+2}, \ldots, F_{t+m}$ for the following $m$ months.

To assess forecasts, it is necessary to specify their accuracy. The accuracy of a forecast may be expressed by calculating a convenient set of probability limits on either side of it, such as 50% and 95% limits. This means that the realized value of the time series, when it eventually occurs, will lie within these limits with the stated probability. To illustrate, Figure 2.1 shows the values of a time series with forecasts made from origin $t$ for lead times $l$, together with the 50% probability limits.

Figure 2.1: Value of time series with forecast function at 50% probability limits

(Source: Box et al., 1994)
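Probability limits of this kind can be sketched as follows, assuming that the forecast errors are normally distributed with a known standard error. Both the normality assumption and the numbers used (a forecast of 20 with a standard error of 4) are illustrative and do not come from the study.

```python
from statistics import NormalDist


def probability_limits(forecast, std_error, probability):
    """Symmetric limits expected to contain the future value with the given
    probability, assuming normally distributed forecast errors."""
    # Two-sided limits: split the remaining probability equally between tails.
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return forecast - z * std_error, forecast + z * std_error


# Hypothetical forecast and standard error.
lower50, upper50 = probability_limits(20.0, 4.0, 0.50)
lower95, upper95 = probability_limits(20.0, 4.0, 0.95)
```

The 95% limits are wider than the 50% limits because a larger share of the possible outcomes must fall inside them.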

2.4 Streamflow Forecasting Method

Being a natural phenomenon, streamflow has a random component. But, it is not

fully random because it has been observed that a low flow tends to follow low flow and

a high flow tends to follow high flow. The word “stochastic” is used to denote the

randomness in statistics but in hydrology it refers to a partial random sequence as well.

Therefore, streamflow data, which form a time series, actually involve a

stochastic process. Various stochastic processes are used for generating the hydrologic

data (Gupta, 1989).

Stochastic modeling of hydrologic time series has been widely used for planning

and management of water resources systems such as for reservoir sizing and forecasting

the occurrence of future hydrologic events. For example, stochastic models are used to

generate synthetic series of water supply that may occur in the future which are then

utilized for estimating the probability distribution of key decision parameters such as

reservoir storage size. Furthermore, stochastic models can be used for forecasting water

supplies and water demands in days, weeks, months and years in advance (Fortin et al.,

2004).

The previous rainfall and streamflow records can be utilized as model inputs for

forecasting the next time step ahead of the streamflow (Mohd Shafiek et al., 2005). This

study employs the previous streamflow records to forecast the streamflow discharge of

the following month.

There are some stochastic models that can be utilized for synthetic generation

and forecasting of hydrological processes. Hydrologic processes such as monthly

streamflow may be well represented by stationary linear models such as Markov process

or autoregressive (AR) and autoregressive integrated moving average (ARIMA) models.

These models are usually capable of preserving the historical annual statistics, such as

the mean, variance, skewness and covariance (Fortin et al., 2004). In this study, Markov

and ARIMA models are used to predict future monthly streamflow.

2.4.1 Markov Model

The Markov process considers that the value of an event (i.e. streamflow) at one

time is correlated with the value of the event at an earlier period (i.e. a serial or

autocorrelation exists in the time series). In a first-order Markov process, this correlation

exists in two successive values of the events. The first order Markov model, which

constitutes the classic approach in synthetic hydrology, states that the value of a variable

x in one time period is dependent on the value of x in the preceding time period plus a

random component. Thus the synthetic flow for a stream represents a sequence of

numbers, each of which consists of two parts:

\[ x_i = d_i + e_i \tag{2.1} \]

where $x_i$ is the flow at the $i$th time (the $i$th number of the time series); $d_i$ is the deterministic part at the $i$th time; and $e_i$ is the random part at the $i$th time. The values of $e_i$ are tied to the historical data by ensuring that they belong to the same frequency distribution and possess similar statistical properties (mean, deviation, skewness) as the historical series (Gupta, 1989).
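A minimal sketch of this two-part structure is given below. It assumes one common lag-1 form from synthetic hydrology, in which the deterministic part is `mean + r1 * (previous - mean)` and the random part is a standard normal variate scaled by `std * sqrt(1 - r1**2)`. This specific form, the use of normal variates and all parameter values are assumptions for illustration, not the formulation or data of this study.

```python
import math
import random


def generate_lag1_flows(mean, std, r1, n, x0=None, seed=0):
    """Generate n synthetic flows, each the sum of a deterministic part
    (depending on the previous flow) and a random part."""
    rng = random.Random(seed)
    previous = mean if x0 is None else x0
    flows = []
    for _ in range(n):
        # Deterministic part: pull toward the mean, weighted by the
        # lag-1 serial correlation coefficient r1.
        deterministic = mean + r1 * (previous - mean)
        # Random part: scaled so that the generated series keeps the
        # historical standard deviation.
        random_part = rng.gauss(0.0, 1.0) * std * math.sqrt(1.0 - r1 ** 2)
        current = deterministic + random_part
        flows.append(current)
        previous = current
    return flows


# Hypothetical parameters: mean 10, standard deviation 2, r1 = 0.5.
synthetic = generate_lag1_flows(mean=10.0, std=2.0, r1=0.5, n=12)
```

With normal variates the generated values can become negative when the standard deviation is large relative to the mean, which is one reason a transformation of the flows is often applied before generation.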

The various forms and combinations of the deterministic and random components are recognized as different models. The single-season (annual) flow model of lag 1 is the simplest: it assumes that the magnitude of the current flow is significantly correlated with the previous flow value only. Multiple-season models, on the other hand, divide the yearly flow into seasons or months (Gupta, 1989).

The first-order Markov model has been successfully applied to many problems. Examples include modeling sequential data using Markov chains and solving control problems posed in the Markov decision process (MDP) framework. If the Markov model's parameters are estimated from data, the standard maximum likelihood estimates consider the first-order (single-step) transitions only. For many problems, however, the first-order conditional independence assumptions are not satisfied; as a result, the higher-order transition probabilities can be poorly approximated by the learned model (Joomizan, 2010).
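The maximum likelihood estimate mentioned above can be sketched for a discrete-state Markov chain: each transition probability is estimated as the number of observed single-step transitions from state i to state j divided by the number of departures from state i. The three flow classes and the state sequence below are hypothetical.

```python
def transition_matrix(states, n_states):
    """Maximum likelihood estimate of first-order transition probabilities
    from a sequence of integer state labels 0 .. n_states - 1."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count every single-step transition (current state, next state).
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left in the sample gets a row of zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical monthly classes: 0 = low, 1 = medium, 2 = high flow.
states = [0, 0, 1, 2, 2, 1, 0, 0, 1, 1, 2, 2]
P = transition_matrix(states, 3)
```

Only pairs of consecutive states enter the counts, which is why dependence on earlier months is not captured by this estimate.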

The assumption of first order Markovian processes for representing the inflow

process of a reservoir has generally been considered in the literature as adequate for

most purposes. The development of models incorporating other approaches results in

extremely complex transition probability matrices (Wurbs, 2005).

2.4.2 ARIMA Theory

ARIMA is an abbreviation of AutoRegressive Integrated Moving Average, a model introduced by Box and Jenkins (Box et al., 1994). Some authors therefore refer to this modeling approach as the Box-Jenkins model. The Box-Jenkins model is a stationary time series model. A time series generated from uncorrelated variables with zero mean and finite variance is called a 'white noise' series, from which many useful models can be constructed.

The ARIMA modeling is essentially an exploratory data-oriented approach that

has the flexibility of fitting an appropriate model which is adapted from the structure of

the data itself. The stochastic nature of the time series can be approximately modeled

with the aid of autocorrelation function and partial autocorrelation function; from which

information such as trend, random variables, periodic components, cyclic patterns and

serial correlation can be discovered. As a result, forecasts of the future values of the

series, with some degree of accuracy can be readily obtained (Ho and Xie, 1998).

Although ARIMA modeling is sophisticated in theory, the iterative model-building process, and hence accurate forecasting, is made simpler by user-friendly statistical software packages such as SAS, Statgraphics, Statistica and Minitab. An iterative three-stage process of model identification, parameter estimation and diagnostic checking is required to determine the adequacy of the proposed model (Ho and Xie, 1998).

2.4.3 ARIMA Algorithms

ARIMA contains three components: the autoregressive (AR), integrated (I) and moving average (MA) parts. The AR part describes the relationship between present and past observations. The MA part represents the autocorrelation structure of the error. The I part represents the level of differencing applied to the series to eliminate nonstationarity (Hasmida, 2009). The model is usually denoted by $(p,d,q)(P,D,Q)$, where $p$ is the order of the autoregressive component, $d$ is the order of differencing, $q$ is the order of the moving average component, and $(P,D,Q)$ are the corresponding seasonal orders.

2.4.3.1 AR Model

An AR($p$) model expresses the current value of a time series as a linear combination of the $p$ previous values and a white noise term (random shock). Bell (1984) expressed the current value of the time series for an AR($p$) model as:

\[ Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t \tag{2.2} \]

where $\phi_1, \ldots, \phi_p$ are the AR($p$) parameters, $a_t$ is the random shock at time $t$, normally distributed with zero mean and constant variance, and $p$ is the order of the model.

By introducing the backshift operator $B$, defined by $B Y_t = Y_{t-1}$, equation (2.2) can be written as:

\[ (1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = a_t \tag{2.3} \]

or $\phi(B) Y_t = a_t$, where $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$.
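The recursion in equation (2.2) can be sketched in Python as below. The start-up values of zero, the normally distributed shocks and the function name are assumptions made for illustration.

```python
import random


def simulate_ar(phi, n, sigma=1.0, seed=0):
    """Simulate n values of Y_t = phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + a_t."""
    rng = random.Random(seed)
    p = len(phi)
    # Zero start-up values (an assumption; in practice an initial
    # burn-in stretch is often discarded).
    y = [0.0] * p
    for _ in range(n):
        a_t = rng.gauss(0.0, sigma)  # random shock
        # y[-1] is Y_{t-1}, y[-2] is Y_{t-2}, and so on.
        y_t = sum(phi[j] * y[-(j + 1)] for j in range(p)) + a_t
        y.append(y_t)
    return y[p:]


# Hypothetical AR(2) parameters.
series = simulate_ar([0.5, 0.2], n=100)
```

Each simulated value depends on the series only through its p most recent values, which is the defining feature of the AR(p) model.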

2.4.3.2 MA Model

An MA($q$) model expresses the current value of a time series as a linear combination of the current and $q$ previous values of a white noise process. The (purely) moving average (MA) model is (Bell, 1984):

\[ Y_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.4} \]

or

\[ Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.5} \]

or $Y_t = \theta(B) a_t$, where $q$ is the order of the model and the $\theta$ coefficients are the MA($q$) parameters.
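One consequence of equation (2.4) that is useful when reading a correlogram is that the theoretical autocorrelation of an MA(q) process is zero beyond lag q. The sketch below computes it from the standard result rho_k = sum_j psi_j psi_{j+k} / sum_j psi_j^2, with psi_0 = 1 and psi_j = -theta_j. The function name and the value theta_1 = 0.5 are illustrative.

```python
def ma_autocorrelation(theta, max_lag):
    """Theoretical autocorrelations (lags 0..max_lag) of the MA(q) process
    Y_t = a_t - theta_1*a_{t-1} - ... - theta_q*a_{t-q}."""
    q = len(theta)
    # Weights applied to a_t, a_{t-1}, ..., a_{t-q}.
    psi = [1.0] + [-t for t in theta]
    gamma0 = sum(w * w for w in psi)  # variance, up to the shock variance
    rho = []
    for k in range(max_lag + 1):
        if k > q:
            # No shocks are shared by Y_t and Y_{t-k} once k exceeds q.
            rho.append(0.0)
        else:
            gamma_k = sum(psi[j] * psi[j + k] for j in range(q - k + 1))
            rho.append(gamma_k / gamma0)
    return rho


# Hypothetical MA(1) with theta_1 = 0.5.
rho = ma_autocorrelation([0.5], max_lag=3)
```

For this MA(1) example only the lag-1 autocorrelation is non-zero, equal to -theta_1 / (1 + theta_1^2).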

2.4.3.3 ARMA Model

To increase flexibility when fitting actual time series, the autoregressive and moving average operators are combined to give the ARMA($p,q$) model (Bell, 1984):

\[ Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.6} \]

which we write as:

\[ (1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.7} \]

or $\phi(B) Y_t = \theta(B) a_t$.

A series that is explained both by its own lagged values and by lagged noise terms follows an autoregressive moving average model of order $(p,q)$. This class of stationary time series models is of great importance and usefulness, especially in real-life situations. If the process is stationary, a suitable ARMA model can be used to represent the data. If it is nonstationary, differencing is applied to make the series stationary, which leads to the ARIMA model (Akgun, 2003).

2.4.3.4 ARIMA Model

The ARMA model (2.6) assumes that the series $Y_t$ is stationary. In practice $Y_t$ may well be nonstationary, but with a stationary first difference,

\[ Y_t - Y_{t-1} = (1 - B)\, Y_t. \]

If $(1 - B) Y_t$ is nonstationary, we may need to take the second difference,

\[ Y_t - 2Y_{t-1} + Y_{t-2} = (1 - B)\,[(1 - B)\, Y_t] = (1 - B)^2\, Y_t. \]

In general, we may need to take the $d$th difference $(1 - B)^d Y_t$ (although $d$ is rarely larger than 2). Substituting $(1 - B)^d Y_t$ for $Y_t$ in (2.7) yields the ARIMA($p,d,q$) model (Bell, 1984):

\[ (1 - \phi_1 B - \cdots - \phi_p B^p)\,(1 - B)^d\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.8} \]

or $\phi(B)(1 - B)^d Y_t = \theta(B) a_t$, where $d$ is the order of differencing.
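Differencing is straightforward to carry out directly. The sketch below applies (1 - B^lag)^d to a list of values. The `lag` argument is an addition for illustration that also covers differences taken a fixed number of periods apart, and the input values are made up.

```python
def difference(series, d=1, lag=1):
    """Apply the operator (1 - B**lag) to the series d times."""
    result = list(series)
    for _ in range(d):
        # Each pass subtracts the value `lag` periods earlier and
        # shortens the series by `lag` observations.
        result = [result[t] - result[t - lag] for t in range(lag, len(result))]
    return result


# A quadratic trend becomes constant after two differences.
values = [1, 4, 9, 16, 25]
first = difference(values)        # (1 - B) Y_t
second = difference(values, d=2)  # (1 - B)^2 Y_t
```

Each pass removes one order of polynomial trend, at the cost of losing `lag` observations from the start of the series.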

When a time series exhibits seasonality with period $s$, a multiplicative seasonal ARIMA$(p,d,q)(P,D,Q)_s$ model is advantageous. The seasonal time series is transformed into a stationary time series with non-periodic trend components. A multiplicative seasonal ARIMA model can be expressed as (Lee and Ko, 2011):

\[ (1 - \phi_1 B - \cdots - \phi_p B^p)\,(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})\,(1 - B)^d\,(1 - B^s)^D\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\,(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\, a_t \tag{2.9} \]

or $\phi(B)\,\Phi(B^s)\,(1 - B)^d (1 - B^s)^D\, Y_t = \theta(B)\,\Theta(B^s)\, a_t$, where $d$ and $D$ are the orders of nonseasonal and seasonal differencing, and $\Phi(B^s)$ and $\Theta(B^s)$ are the seasonal AR($P$) and MA($Q$) operators respectively, defined as:

\[ \Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps} \]

\[ \Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs} \]

where $\Phi_1, \ldots, \Phi_P$ are the seasonal AR($P$) parameters and $\Theta_1, \ldots, \Theta_Q$ are the seasonal MA($Q$) parameters.
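Because the operators are polynomials in B, the products in the seasonal model can be expanded by ordinary polynomial multiplication. The sketch below does this for the moving average side and the differencing side of an ARIMA (1,1,1)(0,1,1)12 model. The parameter values 0.5 and 0.8 are hypothetical, and the helper names are not from any library.

```python
def operator_poly(coeffs, step=1):
    """Coefficients of 1 - c_1*B**step - c_2*B**(2*step) - ...,
    returned as a list indexed by the power of B."""
    poly = [0.0] * (len(coeffs) * step + 1)
    poly[0] = 1.0
    for j, c in enumerate(coeffs, start=1):
        poly[j * step] = -c
    return poly


def multiply(a, b):
    """Product of two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


theta1, Theta1 = 0.5, 0.8  # hypothetical MA and seasonal MA parameters

# (1 - theta1*B)(1 - Theta1*B^12) = 1 - 0.5B - 0.8B^12 + 0.4B^13
ma_side = multiply(operator_poly([theta1]), operator_poly([Theta1], step=12))

# (1 - B)(1 - B^12) = 1 - B - B^12 + B^13
diff_side = multiply(operator_poly([1.0]), operator_poly([1.0], step=12))
```

The cross term at power 13 is what distinguishes the multiplicative form from simply adding a seasonal term to the nonseasonal operator.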

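To illustrate forecasting with ARIMA models, we shall use the nonseasonal model (2.8), written as:

\[ Y_{n+l} = \varphi_1 Y_{n+l-1} + \cdots + \varphi_{p+d} Y_{n+l-p-d} + a_{n+l} - \theta_1 a_{n+l-1} - \cdots - \theta_q a_{n+l-q} \tag{2.10} \]

for $t = n + l$, where $\varphi_1, \ldots, \varphi_{p+d}$ are the coefficients of the combined operator $\phi(B)(1 - B)^d$. We shall assume that we want to forecast $Y_{n+l}$ for $l = 1, 2, \ldots$ using the data $Y_n, Y_{n-1}, \ldots$. For simplicity, we assume for now that the data set is long enough that it may effectively be taken to extend into the infinite past.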

2.5 Reviews on Markov Model

Naadimuthu and Lee (1982) proposed a first-order (lag-one) serially correlated inflow model. This means that the inflow of each month depends only on the inflow of the previous month, forming a Markov chain. The Markov chain method is a stochastic method that can be used to produce new time series of inflow discharge based on the available time series of data (Adib and Majd, 2009).

According to Heiko (2000), Markov chains are stochastic processes that can be parameterized by empirically estimating transition probabilities between discrete states in the observed systems. A Markov chain of the first order is one in which each next state depends only on the immediately preceding one. Markov chains of second or higher order are processes in which the next state depends on two or more preceding ones.
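The empirical estimation of first-order transition probabilities can be sketched as follows. The state sequence and the function name `transition_matrix` are hypothetical. The states could, for instance, be low, medium and high flow classes.

```python
def transition_matrix(states, n_states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count every observed move from one state to the next.
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical sequence of discrete states (0 = low, 1 = medium, 2 = high).
states = [0, 0, 1, 2, 1, 0, 1, 1, 2, 2, 1, 0]
P = transition_matrix(states, 3)
for row in P:
    print([round(p, 3) for p in row])
```

Each row of the resulting matrix gives the estimated probabilities of the next state conditional on the current state only, which is the first-order Markov property described above.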

Dalphin (1987) developed a lag-1 month-to-month Markov streamflow model in

which families of three-parameter Weibull distributions describe monthly streamflow

probabilistically, conditioned on streamflow in the preceding month.

2.6 Reviews on ARIMA Model

Tang et al. (1991) stated that the ARIMA model is suited only to short-term forecasting since it builds its forecasts on previous observations. The ARIMA model needs long-memory series, that is, more inputs, to provide more accurate forecasts. For long-memory series, more training patterns result in more accurate forecasts. This Box-Jenkins model works poorly, or not at all, for short input series.

Ho and Xie (1998) showed that the ARIMA model is a viable alternative that gives satisfactory results for repairable system reliability forecasting. Ayob and Amat (2004) used ARIMA to represent water use behavior at Universiti Teknologi Malaysia. The ARIMA modeling method can also be applied to analyse long-period water quality and rainfall-runoff records for the Johor River (Hasmida, 2009).


Maia et al. (2008) demonstrated that ARIMA performed satisfactorily in forecasting interval series with either linear or non-linear behavior and is a useful forecasting alternative for interval-valued time series. However, the hybrid model combining ARIMA and an artificial neural network had better average performance.

A multiplicative seasonal autoregressive integrated moving average model was applied to monthly streamflow forecasting of the Zayandehrud River in western Isfahan province, Iran (Modarres, 2007). Nazuha (2010) used ARIMA to analyze monthly Malaysian crude oil production. In addition, Yurekli et al. (2004) used ARIMA to simulate monthly maximum data of Cekerek Stream.

2.7 Concluding Remarks

Various techniques can be utilized for synthetic generation and forecasting of hydrological processes. Stochastic models can provide alternative hydrologic data sequences that are likely to occur in the future, in order to assess the reliability of alternative system designs and policies and to understand the variability in future system performance.

Streamflow forecasting is an integral part of land management and water resources management. Hydrologic processes such as monthly streamflow may be well represented by stationary linear models such as the Markov process or autoregressive (AR) and autoregressive integrated moving average (ARIMA) models.

CHAPTER 3

METHODOLOGY

3.1 Introduction

Various stochastic processes are used for generating the hydrologic data of

streamflow. The models either developed or used in order to carry out this study are of

different types in terms of their purposes, capabilities, interfaces, inputs, and outputs.

These mainly include water balance model, reservoir simulation, and stochastic models.

Brief descriptions of the model development and the considerations associated with each of the models are presented in the following sections. The computations used the available historical data from the Department of Irrigation and Drainage. The relevant data are used in deriving the forecasting models. Markov and ARIMA modeling methods are proposed for streamflow forecasting of Sungai Bernam. The method used to determine the forecasting accuracy of these models is also discussed.


3.2 Markov Model

Gupta (1989) stated that the general Markov procedure of data synthesis comprises:

1. Determination of statistical parameters from the analysis of the historical

record

2. Identifying the frequency distribution of the historical data

3. Generating random numbers of the same distribution and statistical

characteristics

4. Constituting the deterministic part considering the persistence (influence

of previous flows) and combining with the random part.

3.2.1 Statistical Parameters of Historical Data

Four parameters that are important in a synthetic study are the mean flow, standard deviation, coefficient of skewness and correlation coefficient. The sample mean flow is (Gupta, 1989):

$$\bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i \tag{3.1}$$

where

$\bar{q}$ = mean observed (historical) flow

$N$ = total number of flow values

$q_i$ = $i$th observed flow


The sample estimate of the variance or standard deviation, S, which is a measure

of the variability of the data is given by (Gupta, 1989):

(3.2)

The sample of coefficient of skewness, g, which is a measure of the lack of symmetry, is

given by (Gupta, 1989):

(3.3)

The serial correlation coefficient is a measure of the extent to which a flow at

any time is affected by the flow at another time. The K-lag coefficient, in which the

effect extends by K time units is given by (Gupta, 1989):

(3.4)

The one-lag serial coefficient, in which the current flow is affected only by the previous flow, can be obtained by substituting K = 1. Additional lags should be included as long as they produce a model that explains more about the pattern of flows than one with fewer lags does (Fiering and Jackson, 1971).
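The four statistics of this section can be computed as in the sketch below. Because the printed forms of the equations are not reproduced here, the code assumes the usual sample estimators (an $N-1$ divisor for the standard deviation, the bias-adjusted skewness coefficient, and a lag-K coefficient normalised by the total sum of squares); the exact forms in Gupta (1989) may differ in detail. The function names and sample values are hypothetical.

```python
import math


def sample_stats(q):
    """Return mean, standard deviation and skewness of a flow record."""
    n = len(q)
    mean = sum(q) / n
    # Assumed estimator: unbiased variance with an (n - 1) divisor.
    s = math.sqrt(sum((x - mean) ** 2 for x in q) / (n - 1))
    # Assumed estimator: bias-adjusted coefficient of skewness.
    g = n * sum((x - mean) ** 3 for x in q) / ((n - 1) * (n - 2) * s ** 3)
    return mean, s, g


def serial_correlation(q, k=1):
    """Lag-k serial correlation coefficient (assumed common form)."""
    n = len(q)
    mean = sum(q) / n
    num = sum((q[i] - mean) * (q[i + k] - mean) for i in range(n - k))
    den = sum((x - mean) ** 2 for x in q)
    return num / den


# Hypothetical flows (m3/s), not data from this study.
flows = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, s, g = sample_stats(flows)
r1 = serial_correlation(flows, k=1)
print(mean, round(s, 4), round(g, 4), round(r1, 4))
```

Setting `k=1` gives the one-lag coefficient used by the Markov model; larger `k` values can be inspected to judge whether additional lags are worth including.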


3.2.2 Identification of Distribution

Generally, the distributions used in streamflow generation are the normal, log-normal and gamma families. The bell-shaped, or normal, distribution is the most extensively used in statistical applications because the sum of variables derived from any distribution tends to be normally distributed, according to the central limit theorem. To test normality, the historical values of flow are plotted against the percentage of values in the record that are equal to or greater than the plotted value. The flows are arranged in descending order. For each value xi, the percentage is computed as 100(n – i + 1) / n, where i is the rank of value xi and n is the number of historic values. If the plot is a straight line, the distribution is normal. The coefficient of skewness should also be close to zero, since the normal distribution has no skewness (Gupta, 1989).

The second distribution that is widely used in hydrology is the log-normal distribution. The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. This distribution is suitable for low-flow studies because small changes in low values produce large changes in their logarithmic values. A straight-line plot indicates the log-normal distribution, while the skewness calculated from the logarithms of the values should be close to zero (Gupta, 1989).

The gamma distribution is used when the historical records of flows or logarithms of flows show appreciable skewness. However, this distribution cannot be used when multiple lags exist, that is, when a flow is affected by many previous flows. Normally, historical data do not clearly fit any of these distributions. The choice is made based on the purpose, economics and any other considerations (Gupta, 1989).


3.2.3 Generation of Random Numbers

Gupta (1989) stated that random numbers can be obtained either from a computer-based pseudorandom-number generator or from random number tables. The random numbers should belong to the same distribution as the historical record so that the generated flows have similar characteristics. Normal random numbers have zero mean and unit standard deviation, while log-normal random numbers have both mean and standard deviation equal to one.

3.2.4 Formulation of the Markov Model

Formulation of the Markov model for annual flow (Gupta, 1989):

$$q_i = \bar{q} + r_1\,(q_{i-1} - \bar{q}) + S\,t_i\sqrt{1 - r_1^2} \tag{3.5}$$

where $q_i$ is the streamflow at the $i$th time; $\bar{q}$ is the mean of the recorded flow; $r_1$ is the lag-1 serial (autocorrelation) coefficient; $S$ is the standard deviation of the recorded flow; $t_i$ is a random variate from an appropriate distribution with a mean of zero and variance of unity; and $i$ is the $i$th position in the series, from 1 to $N$ years.

A model on the same lines for monthly flows, developed by Thomas and Fiering, has the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j}\,S_j\sqrt{1 - r_j^2} \tag{3.6}$$

where

$i$ = month in the series, measured from the beginning

$j$ = month in the year, $j$ = 1, 2, …, 12 for January to December

$q_{i,j}$ = flow in the $i$th month from the beginning, for the $j$th month of the year

$q_{i-1,j-1}$ = flow in the immediately preceding month

$\bar{q}_j$ = mean of flows of the $j$th month (12 values)

$r_j$ = correlation coefficient between flows of the $j$th month and the $(j-1)$th month (12 values)

$b_j$ = regression coefficient of flows of the $j$th month on flows of the $(j-1)$th month = $r_j S_j / S_{j-1}$ (12 values)

$S_j$ = standard deviation for the $j$th month (12 values)

$t_{i,j}$ = random normal deviate of zero mean and unit standard deviation
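A minimal sketch of a monthly generator of this kind is given below, assuming the form $q_{i,j} = \bar{q}_j + b_j(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j} S_j \sqrt{1 - r_j^2}$ with standard normal deviates. The function name `thomas_fiering`, the seed and the parameter values in the usage lines are hypothetical and are not the parameters of this study.

```python
import math
import random


def thomas_fiering(q_start, mean, sd, r, n_years, seed=0):
    """Generate n_years of monthly flows with a lag-1 seasonal Markov model.

    mean, sd and r each hold 12 monthly values; r[j] is the correlation
    between month j and the month before it. q_start is the flow of the
    month preceding the first generated month.
    """
    rng = random.Random(seed)
    flows = []
    q_prev = q_start
    for _ in range(n_years):
        for j in range(12):
            jp = (j - 1) % 12                    # previous month (Dec precedes Jan)
            b = r[j] * sd[j] / sd[jp]            # regression coefficient b_j
            t = rng.gauss(0.0, 1.0)              # random normal deviate
            deterministic = mean[j] + b * (q_prev - mean[jp])
            random_part = t * sd[j] * math.sqrt(1.0 - r[j] ** 2)
            q_prev = deterministic + random_part
            flows.append(q_prev)
    return flows


# Hypothetical parameters: identical statistics for every month.
synthetic = thomas_fiering(10.0, [10.0] * 12, [2.0] * 12, [0.5] * 12, n_years=2)
print(len(synthetic), [round(v, 2) for v in synthetic[:3]])
```

When the flows are modeled in logarithms, as for a log-normal record, the same recursion would be applied to the log values and the results converted back afterwards.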

3.3 ARIMA Model

ARIMA models have become common practice for the specification of stationary time-dependent input processes since the work of Box and Jenkins (1970). ARIMA models are usually used as discrete-time processes (Leemis, 1998), and hence the data from a trace are interpreted as a count process for ARIMA fitting. Some assumptions are made before applying the ARIMA model, and specific procedures must be followed when fitting ARIMA models to time series.


3.3.1 Model Assumptions

Before performing the ARIMA modelling, the following assumptions were made (Hasmida, 2009):

1. The data are stationary

2. The data are normally distributed

3. No outliers exist in the data

4. No data are missing

3.3.1.1 Data Stationarity

Classical Box-Jenkins models describe stationary time series. Thus, in order to tentatively identify a Box-Jenkins model, we must first determine whether the time series we wish to forecast is stationary. The stationarity of the monthly streamflow data was examined graphically: the original data were plotted against time, in months. A time series is stationary if its statistical properties (for example, the mean and the variance) are essentially constant through time (Bowerman and O’Connell, 1993). In other words, stationary models assume that the process remains in equilibrium about a constant mean level, which is the case when the plot shows the data fluctuating around a constant mean (Box et al., 1994). The other graphical method applied in the present study is examination of the ACF and PACF plots of the original data. Stationary data have randomly distributed ACF and PACF plots.


A transformation may be required for a nonstationary series, and this can be done by differencing (Box et al., 1994; Shumway, 1988). This process is the I (Integrated) component of the ARIMA modelling approach and is represented by d in ARIMA notation. The level of differencing depends on how far the data are from stationarity, and may be 0, 1, 2 or higher than 2. Level 0 means that no differencing is performed on the data, level 1 means that first differencing is needed, and level 2 means that second differencing is needed. Higher levels of differencing might be applied to nonstationary and complex data (Hasmida, 2009).

3.3.1.2 Normal Distribution

Normally distributed data follow a bell-shaped curve. The bell-shaped curve has several properties: it is concentrated in the center and decreases on either side, which means that the data have less of a tendency to produce unusually extreme values than some other distributions. In addition, the bell-shaped curve is symmetric, which means that the probability of deviations from the mean is comparable in either direction (Hasmida, 2009).

Data that are not normally distributed must be transformed. The transformation methods that can be applied are the natural logarithm transformation and the Box-Cox transformation. The Box-Cox method is applied if the logarithmic transformation is not capable of transforming the data to a normal distribution (Hasmida, 2009).
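The two transformations can be sketched together, since the Box-Cox transformation with parameter $\lambda = 0$ reduces to the natural logarithm and otherwise equals $(x^{\lambda} - 1)/\lambda$. The function name `box_cox` and the sample values are hypothetical, and the choice of $\lambda$ (normally made by the statistical software) is not shown.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limiting case lambda = 0 is the natural log transformation.
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


# Hypothetical positive flows (m3/s).
flows = [4.0, 9.0, 16.0]
print(box_cox(flows, 0))      # natural logarithms
print(box_cox(flows, 0.5))    # square-root type transform
```

Streamflow is strictly positive, so the positivity check should rarely trigger; it is included because both transformations are undefined for zero or negative values.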


3.3.1.3 Outlier

An outlier is an observation that lies outside the overall pattern of a distribution

(Moore and McCabe, 1999). The presence of an outlier always indicates some sort of

problem. This can be a case which does not fit the model under study or an error in

measurement. Outliers are often easy to spot in histograms. For example, the point on the far left in the above figure is an outlier. Such a data point should be removed because it is also a sign of nonstationary data (Hasmida, 2009).

3.3.1.4 Missing Data

Yaffee and McGee (2000) suggested that missing values in a data series should be replaced using a theoretically defensible algorithm. A crude missing-data replacement method is to plug in the mean of the overall series. A less crude algorithm is to use the mean of the period within the series in which the observation is missing. Another algorithm is to take the mean of the adjacent observations. In exponential smoothing, a missing value is often replaced by the one-step-ahead forecast from the previous observation. Other forms of interpolation employ linear splines, cubic splines, or step-function estimation of the missing data.
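As an illustration of the simpler replacement schemes, the sketch below fills interior gaps by linear interpolation between the nearest observed neighbours; for a single missing value this is the same as taking the mean of the adjacent observations. The function name `fill_linear` and the sample series are hypothetical, and this is not the regression-based method adopted in this study.

```python
def fill_linear(series):
    """Fill interior gaps (None) by linear interpolation between neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant step between two consecutive observed values.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    # Gaps before the first or after the last observation are left as None.
    return filled


# Hypothetical monthly flows with missing values marked as None.
record = [1.0, None, 3.0, None, None, 6.0]
print(fill_linear(record))
```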

In order to handle missing data for this study, linear regression between flow of

study area station and flow of adjacent station is used. If data still cannot be obtained,

regression between streamflow and rainfall for that station is used to get the missing

data.


3.3.2 Model Procedure

The ARIMA modeling procedure for fitting ARIMA models to time series,

which was developed by Box and Jenkins (1976), consists of three iterative steps: model

identification; parameter estimation; and diagnostic checking. Figure 3.1 depicts the

process of ARIMA modeling. The procedure is itemized as follows:

Figure 3.1: Flowchart of ARIMA modeling (Lee and Ko, 2011)

[Flowchart boxes: Original streamflow; Model identification; Parameter estimation; Diagnostic checking; Is adequate? (Yes / No); Streamflow]

3.3.2.1 Model Identification

First, determine whether the time series is stationary or nonstationary by examining a time series plot or the ACF. If large autocorrelations in the ACF do not die out, differencing may be required to give a constant mean. A seasonal pattern that repeats every kth time interval suggests taking the kth difference to remove a portion of the pattern. Most series should not require more than two difference operations or orders. Be careful not to overdifference. If spikes in the ACF die out rapidly, there is no need for further differencing.

Next, examine the ACF and PACF of the stationary data in order to identify which autoregressive or moving average terms are suggested. The following general guidelines (SPSS, 1993), based on the graphical method, were applied in the identification process:

i. Nonstationary series have an ACF that remains significant for half a dozen or more lags, rather than quickly declining to 0. Such a series must be differenced until it is stationary before it can be identified.

ii. Autoregressive processes have an exponentially declining ACF and spikes in the first one or more lags of the PACF. The number of spikes indicates the order of the autoregression.

iii. Moving average processes have spikes in the first one or more lags of the ACF and an exponentially declining PACF. The number of spikes indicates the order of the moving average.

iv. Mixed (ARMA) processes typically show exponential declines in both the ACF and the PACF.

At the identification stage, the sign of the ACF or PACF and the speed with which an exponentially declining ACF or PACF approaches 0 depend upon the sign and actual value of the AR and MA coefficients (SPSS, 1993).
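The sample ACF and PACF referred to in these guidelines can be computed as sketched below. The PACF is obtained from the ACF with the Durbin-Levinson recursion, which is one common way of computing it and is not necessarily the algorithm used by Minitab. The function names and the sample series are hypothetical.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev = []          # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        out.append(phi_kk)
    return out


# Hypothetical short series, for illustration only.
series = [2.0, 4.0, 3.0, 7.0, 6.0, 9.0, 8.0, 12.0]
print([round(v, 3) for v in acf(series, 3)])
print([round(v, 3) for v in pacf(series, 3)])
```

A spike is usually judged significant when it lies outside roughly $\pm 1.96/\sqrt{n}$, where $n$ is the series length; this rule of thumb is an addition here and is not taken from the guidelines above.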


3.3.2.2 Parameter Estimation

Once the tentative model is formulated, the related model parameters are estimated using the least squares scheme, in which the parameters are chosen so that the gradient of the forecasting errors on the historical data is zero. The primary objective of this parameter estimation is to minimize the forecasting error and determine both the model and its parameters (Lee and Ko, 2011). Each tentative ARIMA model parameter can be tested using t-values and p-values. The t-value is calculated by dividing the coefficient by its standard error.

3.3.2.3 Diagnostic Checking

A diagnostic test was then conducted to ensure that the essential modeling assumptions are satisfied for a given model. Once the parameters have been estimated, the accuracy of the tentative model is validated by examining the ACF and PACF of the residuals. The residuals should resemble a white noise process. Furthermore, the Q-statistic test is applied to confirm the tentative model (O’Donovan, 1983). If the calculated value Q exceeds the critical value of χ² obtained from the chi-square tables, the tentative model is inadequate (Lee and Ko, 2011).

At this stage, the Ljung-Box test is used to test whether the residuals are white noise. The null hypothesis is that the residuals are white noise. In other words, the residual series should be independent, homoscedastic (having constant variance), and normally distributed. The null hypothesis is rejected if the p-value of the chi-square statistic is less than the significance level, alpha, of 5%.
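The Ljung-Box statistic can be sketched as $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ autocorrelation of the residuals. The function name `ljung_box_q` and the residual series are hypothetical; the critical value 3.841 is the tabulated 5% chi-square value for one degree of freedom, and in practice the degrees of freedom are usually taken as the number of lags minus the number of estimated ARMA parameters.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(
            (residuals[t] - mean) * (residuals[t + k] - mean)
            for t in range(n - k)
        ) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, so they are clearly
# autocorrelated and should fail the white-noise check.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
critical = 3.841   # chi-square table value, 5% level, 1 degree of freedom
print(round(q_stat, 3), "inadequate" if q_stat > critical else "adequate")
```

Comparing Q with the tabulated critical value is equivalent to comparing the p-value with alpha: Q above the critical value corresponds to a p-value below 5% and hence rejection of the white-noise hypothesis.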


These steps are repeated until an adequate model is identified. When the steps in

ARIMA modeling are completed, a specific ARIMA model is applied to predict the

future monthly streamflow for 1 year ahead.

3.3.3 Minitab Procedures

To fit the ARIMA model, the statistical software Minitab version 15 was used. Using Minitab, the ARIMA modeling steps can be summarized as follows:

1. Identify the stationarity of the data

• If stationary, go to step No. 3

• If non-stationary, go to step No. 2

2. Apply the non-seasonal difference (d=1, k=1)

3. Identify the seasonal pattern of the data using the ACF

• If the ACF indicates a non-seasonal pattern, go to step No. 5

• If the ACF indicates a seasonal pattern, go to step No. 6

4. Identify the general theoretical PACF of the ARIMA model

5. Apply the seasonal difference (D=1, k=12; D=2, k=24)

6. Identify the general theoretical ACF and PACF of the ARIMA model

• If a seasonal pattern is still found in the ACF and PACF from step No. 6, go to step No. 5

• If a non-seasonal pattern is found in the ACF, go to step No. 7

7. Apply the rest of the procedure, which is estimation, diagnostic checking and forecasting according to step No. 6, until the best forecasting pattern is obtained.

3.4 Model Comparison and Forecast Evaluation Measures

In order to compare the forecasting accuracy of the different models, a

multicriterion performance evaluation procedure was used in this study. The following

indices were used to evaluate the performance of the models (Shalamu, 2009):

1. Mean Absolute Percentage Error (MAPE):

(3.7)

2. Root Mean Squared Error (RMSE):

(3.8)

3. Chi-Squared Test:

(3.9)


where,

Yi = the observed flow

Fi = the forecasted flow
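The three indices can be sketched as below. MAPE and RMSE follow their usual definitions. The chi-squared index is written with the forecast in the denominator, which is an assumption, since the printed form of Equation 3.9 is not reproduced here. The function names and sample values are hypothetical.

```python
import math


def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100.0 / n * sum(abs(y - f) / y for y, f in zip(observed, forecast))


def rmse(observed, forecast):
    """Root mean squared error, in the units of the flows."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


def chi_squared(observed, forecast):
    """Chi-squared index; dividing by the forecast is an assumed convention."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, forecast))


# Hypothetical observed (Y) and forecast (F) flows in m3/s.
Y = [10.0, 20.0]
F = [8.0, 25.0]
print(round(mape(Y, F), 2), round(rmse(Y, F), 3), round(chi_squared(Y, F), 2))
```

For all three indices a smaller value indicates a closer match between the forecast and the observed flows.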

CHAPTER 4

RESULT AND DISCUSSION

4.1 Introduction

This chapter gives a detailed description of the analysis of time series data using both the Markov and ARIMA modeling methods for streamflow forecasting. Most of the computation for the ARIMA and Markov models was carried out using Minitab and Microsoft Excel, respectively. Both methods are used to model the streamflow of Sungai Bernam at Tanjung Malim, Selangor (Station No. 3615412). The models are checked in order to obtain an adequate model for streamflow forecasting.

Data from January 1960 to December 2010 were used in deriving the stochastic and forecasting models. The 552 months of data from January 1960 to December 2005 are used as the calibration set for both models. The remaining 60 months of data, from January 2006 to December 2010, are used as the validation set.


4.2 Estimation of Missing Data Values

Some data values are missing from the streamflow series for Sungai Bernam at Tanjung Malim (Station No. 3615412). To handle the missing data in this study, linear regression between the flow at the study station and the flow at an adjacent station is used, with the regression line taken as the best way to predict y from x. Where streamflow data for Sungai Bernam at Tanjung Malim were missing, streamflow data from the adjacent station at Jam. Skc (Station No. 3813411) were used. For example, data are missing for January, February and March 1962. Streamflow observations from adjacent months (before and after the gap) at both stations are used to obtain the regression line for estimating the missing data. This is shown in Figure 4.1.

Figure 4.1: Linear Regression of Two Streamflow Station for 1962

The missing monthly data for Station Tanjung Malim for January, February and March 1962 can be filled in using the linear regression equation y = 0.126x + 2.513, with a coefficient of determination, R², of 0.845, where y and x represent the flow at Station Tanjung Malim (m³/s) and Station Jam. Skc (m³/s), respectively.
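The regression step can be sketched as follows. The least-squares fit is shown on hypothetical paired flows, and the reported equation y = 0.126x + 2.513 is then applied to a hypothetical Jam. Skc flow of 60 m³/s; neither the pairs nor that flow value come from the station records.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical paired monthly flows at the adjacent (x) and study (y) stations.
x_adjacent = [1.0, 2.0, 3.0]
y_study = [3.0, 5.0, 7.0]
print(fit_line(x_adjacent, y_study))

# Applying the equation reported above to a hypothetical adjacent-station flow.
x_missing_month = 60.0                      # m3/s at Jam. Skc (hypothetical)
estimate = 0.126 * x_missing_month + 2.513  # m3/s at Tanjung Malim
print(round(estimate, 3))
```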


If the data still cannot be obtained, for instance because the adjacent streamflow station also has missing data for that month, rainfall data from an adjacent station can be used to obtain a regression equation for estimating the missing streamflow data. For example, data are missing from February 1993 to May 1993 at both Tg. Malim and Jam. Skc. Rainfall observations from adjacent months (before and after the gap) at Station Ldg. Katoyang at Tg. Malim (Station No. 3714152) are used to obtain the regression equation with the flow data of Station Jam. Skc, as shown in Figure 4.2. The linear regression equation was found to be y = 0.146x + 10.43, with a coefficient of determination, R², of 0.603, where y represents the flow at Station Jam. Skc (m³/s) and x represents the rainfall at Station Ldg. Katayong (mm).

Figure 4.2: Linear Regression of Rainfall and Streamflow

Once the streamflow data for February 1993 to May 1993 at Station Jam. Skc are known, they can be used to estimate the missing data at Station Tg. Malim from the regression between the two streamflow series, using the linear regression equation y = 0.112x + 3.673, with a coefficient of determination, R², of 0.892, where y and x represent the flow at Station Tanjung Malim (m³/s) and Station Jam. Skc (m³/s), respectively. Figure 4.3 shows the regression line for this equation.


Figure 4.3: Linear Regression of Two Streamflow Station for 1993

After replacing all the missing data with appropriate estimation data from the

linear regression method, streamflow data of Sungai Bernam is shown in Appendix A.

4.3 Markov Model

Formulation of the Markov model is based on the procedure of data synthesis, which comprises: (1) determination of statistical parameters from the analysis of the historical record, (2) identifying the frequency distribution of the historical data, (3) generating random numbers of the same distribution and statistical characteristics and (4) constituting the deterministic part and combining it with the random part.


4.3.1 Statistical Parameters of Historical Data

The sample mean flow for the 612 months of data is 9.75 m³/s. The sample standard deviation, S, is 4.66, the skewness is 1.2, the standard error is 0.18863 and the coefficient of variation is 0.47828. These statistical parameters can be calculated using Microsoft Excel or obtained from the EasyFit software. The descriptive statistics obtained from EasyFit are shown in Figure 4.4.

Figure 4.4: Descriptive statistics of Sungai Bernam data

For calibration of the streamflow model, the parameters of the monthly historical data from January 1960 to December 2005, based on 552 values, are shown in Table 4.1.


Table 4.1: Parameters of Monthly Historical Data

| Month | $\bar{q}_j$ | $S_j^2$ | $S_j$ | $r_j$ | $S_{j-1}$ | $b_j$ | $\bar{q}_{j-1}$ |
|---|---|---|---|---|---|---|---|
| Jan | 0.049549 | 9.07979E-05 | 0.009529 | 0.4442686 | 4.189053605 | 0.001 | 0.06 |
| Feb | 0.04537 | 8.89268E-05 | 0.00943 | 0.4901265 | 3.639813919 | 0.001 | 0.05 |
| Mar | 0.046522 | 9.69723E-05 | 0.009847 | 0.5777814 | 3.363576896 | 0.002 | 0.05 |
| Apr | 0.05187 | 9.10128E-05 | 0.00954 | 0.408 | 3.69337796 | 0.001 | 0.05 |
| May | 0.054888 | 5.21161E-05 | 0.007219 | 0.303 | 3.822355866 | 0.001 | 0.05 |
| Jun | 0.0488 | 6.94571E-05 | 0.008334 | 0.515 | 2.990121105 | 0.001 | 0.05 |
| Jul | 0.046073 | 7.22414E-05 | 0.008499 | 0.541 | 3.349038581 | 0.001 | 0.05 |
| Aug | 0.047227 | 7.71759E-05 | 0.008785 | 0.585 | 3.27283605 | 0.002 | 0.05 |
| Sep | 0.053852 | 7.21758E-05 | 0.008496 | 0.406 | 3.447681936 | 0.001 | 0.05 |
| Oct | 0.059644 | 7.62886E-05 | 0.008734 | 0.369 | 3.761513315 | 0.001 | 0.05 |
| Nov | 0.065038 | 6.89806E-05 | 0.008305 | 0.294 | 4.175448792 | 0.001 | 0.06 |
| Dec | 0.059643 | 0.000101211 | 0.01006 | 0.3699155 | 4.738293291 | 0.001 | 0.07 |

4.3.2 Identification of Distribution

In this study, statistical goodness-of-fit tests are used to identify the probability distribution of the data. The Kolmogorov-Smirnov (K-S) test, the Anderson-Darling (AD) test and the Chi-squared test can be used for this purpose. The K-S test was preferred as it is more powerful and robust. Using the EasyFit application, the best-fitting distribution can be found. The K-S goodness-of-fit statistic for the normal distribution is 0.13466 (rank 42), while for the lognormal distribution it is 0.05954 (rank 2). The AD goodness-of-fit statistic for the normal distribution is 139.43 (rank 41), while for the lognormal distribution it is 34.169 (rank 6). The distribution taken as best fitting the streamflow data of Sungai Bernam is the lognormal distribution (Figure 4.5 and Figure 4.6).

[Figure: histogram of flow, q (m3/s), over the range 2 to 30, with the fitted Inv. Gaussian (3P) curve; vertical axis from 0 to 0.3]

Figure 4.5: Probability Density Function

The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. This distribution is suitable for low-flow studies because small changes in low values produce large changes in their logarithmic values.

[Figure: sample cumulative distribution of flow, q (m3/s), over the range 2 to 30, with the fitted Inv. Gaussian (3P) curve; vertical axis from 0 to 1]

Figure 4.6: Cumulative Distribution Function

As the distribution is log-normal, the logarithms of the values are used and the results are finally converted back to flows. As an example, the observed streamflow data in logarithmic values for 1960 to 1970 are shown in Table 4.2, while the data for the other years (1971-2005) can be found in Appendix B. These data act as the calibration set for obtaining the parameters of the historical data in order to model future streamflow.

Table 4.2: Logarithmic Values of Observed Streamflow Data for 1960-1970

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1960 | 0.056 | 0.051 | 0.058 | 0.064 | 0.055 | 0.046 | 0.057 | 0.049 | 0.058 | 0.057 | 0.063 | 0.065 |
| 1961 | 0.052 | 0.044 | 0.045 | 0.051 | 0.055 | 0.051 | 0.046 | 0.051 | 0.058 | 0.056 | 0.060 | 0.066 |
| 1962 | 0.059 | 0.046 | 0.056 | 0.057 | 0.055 | 0.046 | 0.045 | 0.049 | 0.056 | 0.069 | 0.075 | 0.058 |
| 1963 | 0.050 | 0.045 | 0.045 | 0.044 | 0.044 | 0.046 | 0.045 | 0.053 | 0.060 | 0.070 | 0.079 | 0.066 |
| 1964 | 0.056 | 0.047 | 0.050 | 0.052 | 0.056 | 0.045 | 0.057 | 0.048 | 0.060 | 0.058 | 0.065 | 0.064 |
| 1965 | 0.046 | 0.044 | 0.050 | 0.066 | 0.068 | 0.050 | 0.039 | 0.043 | 0.053 | 0.069 | 0.067 | 0.072 |
| 1966 | 0.058 | 0.048 | 0.052 | 0.058 | 0.045 | 0.052 | 0.053 | 0.054 | 0.057 | 0.072 | 0.077 | 0.072 |
| 1967 | 0.065 | 0.054 | 0.047 | 0.060 | 0.059 | 0.043 | 0.043 | 0.044 | 0.055 | 0.060 | 0.076 | 0.055 |
| 1968 | 0.040 | 0.037 | 0.031 | 0.041 | 0.059 | 0.050 | 0.043 | 0.043 | 0.055 | 0.057 | 0.058 | 0.060 |
| 1969 | 0.054 | 0.047 | 0.040 | 0.046 | 0.060 | 0.050 | 0.037 | 0.044 | 0.042 | 0.059 | 0.056 | 0.053 |
| 1970 | 0.054 | 0.034 | 0.036 | 0.045 | 0.052 | 0.038 | 0.043 | 0.045 | 0.054 | 0.055 | 0.058 | 0.061 |


4.3.3 Generation of Random Numbers

In this study, random numbers are generated using the Microsoft Excel function RAND( ). To obtain the random normal deviate, t, with mean equal to 1 and unit standard deviation, the inverse error function, erf⁻¹(z), is used:

(4.1)

Value of z can be obtained from cumulative distribution function (CDF) of the

log-normal distribution:

(4.2)

Figure 4.7: Cumulative distribution function of the log-normal distribution


(4.3)

Since log-normal random numbers have both mean and standard deviation equal to one, Equation 4.3 becomes:

(4.4)

If erf (x) = y, then erf -1 (y) = x. Let,

The value of t = ln x. Therefore,

(4.5)

As an example, the calculation procedure of random numbers generation for year

2006 is shown in Table 4.3, while the random numbers generation for other year (2007-

2010) can be found in Appendix C.
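The tabulated January 2006 values can be reproduced with the short sketch below. The relations used, $z = 2u - 1$ and $t = 1 + \sqrt{2}\,\mathrm{erf}^{-1}(z)$ for a uniform random number $u$, are inferred from the values in Table 4.3 and from the log-normal CDF with mean and standard deviation equal to one; they are assumptions about the missing equations rather than a transcription of them. Python has no built-in inverse error function, so it is obtained here from the standard normal quantile function.

```python
import math
from statistics import NormalDist


def deviate_from_uniform(u):
    """Map a uniform number u in (0, 1) to (z, erf^-1(z), t)."""
    z = 2.0 * u - 1.0
    # Identity: erf^-1(2u - 1) = Phi^-1(u) / sqrt(2), Phi = standard normal CDF.
    erf_inv = NormalDist().inv_cdf(u) / math.sqrt(2.0)
    # Inferred relation: t = ln x with mu = sigma = 1.
    t = 1.0 + math.sqrt(2.0) * erf_inv
    return z, erf_inv, t


# RAND() value tabulated for January 2006.
z, erf_inv, t = deviate_from_uniform(0.699645)
print(round(z, 6), round(erf_inv, 4), round(t, 4))
```

Rows with z close to ±1 (for example July and November) are not reproduced by an exact inverse error function, which suggests that a truncated approximation of erf⁻¹ was used for the tabulated values; this is an inference and is not stated in the text.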


Table 4.3: Generation of Random Number for Year 2006

| Month | RAND( ) | z | erf⁻¹(z) | $t_{i,j}$ |
|---|---|---|---|---|
| January | 0.699645 | 0.399289 | 0.370085 | 1.523379 |
| February | 0.45481 | -0.090379 | -0.08027 | 0.886483 |
| March | 0.063732 | -0.872536 | -1.0558 | -0.49313 |
| April | 0.224711 | -0.550577 | -0.53482 | 0.243657 |
| May | 0.236038 | -0.527923 | -0.50847 | 0.280915 |
| June | 0.471912 | -0.056176 | -0.04983 | 0.929536 |
| July | 0.999341 | 0.998683 | 1.443813 | 3.041859 |
| August | 0.533139 | 0.066278 | 0.058805 | 1.083163 |
| September | 0.095672 | -0.808656 | -0.91763 | -0.29772 |
| October | 0.044674 | -0.910651 | -1.15355 | -0.63136 |
| November | 0.997494 | 0.994989 | 1.429319 | 3.021363 |
| December | 0.407816 | -0.184368 | -0.16487 | 0.766834 |

4.3.4 Streamflow Generation of Markov Model

As an example, the calculation of the deterministic part, which accounts for persistence (the influence of previous flows), and its combination with the random part to develop the monthly streamflow model for 2006 are shown in Table 4.4, while the streamflow model for the other years (2007-2010) can be found in Appendix D.

The Markov model for monthly flows, developed by Thomas and Fiering, has the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j}\,S_j\sqrt{1 - r_j^2} \tag{4.6}$$


Equation 4.6 is used to develop the Markov model for monthly flows. The flow in the ith month from the beginning, for the jth month of the year, is modeled by combining the mean flow of the jth month of the year (January to December) with the deterministic and random components.
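As a check on the arithmetic, the random component and the model flow (in logarithmic units) for January 2006 can be recomputed from the tabulated values. The deterministic term is copied from Table 4.4 rather than recomputed, and the variable names are hypothetical.

```python
import math

# Values for January as tabulated in Tables 4.1, 4.3 and 4.4.
S_j = 0.009529                 # standard deviation of January flows (log units)
r_j = 0.4442686                # correlation with the preceding month
t_ij = 1.523379                # random deviate for January 2006
deterministic = 0.049541669    # deterministic component, copied from Table 4.4

# Random component: S_j * t_ij * sqrt(1 - r_j^2).
random_part = S_j * t_ij * math.sqrt(1.0 - r_j ** 2)

# Model flow in logarithmic units: deterministic plus random component.
q_log = deterministic + random_part
print(round(random_part, 3), round(q_log, 3))
```

The rounded results agree with the January row of Table 4.4 (random component 0.013, model flow 0.063 in logarithmic units).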

Table 4.4: Model Streamflow for Year 2006

| Month | Deterministic: $q_{i-1,j-1}$ | Deterministic: $\bar{q}_j + b_j(q_{i-1,j-1} - \bar{q}_{j-1})$ | Random: $t_{i,j}$ | Random: $S_j t_{i,j}\sqrt{1 - r_j^2}$ | Model flow: $q_{i,j}$ (log) | Model flow: $q_{i,j}$ (m3/s) |
|---|---|---|---|---|---|---|
| Jan | 0.049549 | 0.049541669 | 1.523379 | 0.013 | 0.063 | 13.533 |
| Feb | 0.063 | 0.045386033 | 0.886483 | 0.007 | 0.053 | 9.077 |
| Mar | 0.053 | 0.04653475 | -0.49313 | -0.004 | 0.043 | 5.641 |
| Apr | 0.043 | 0.051865643 | 0.243657 | 0.002 | 0.054 | 9.604 |
| May | 0.054 | 0.054889433 | 0.280915 | 0.002 | 0.057 | 10.807 |
| Jun | 0.057 | 0.048803168 | 0.929536 | 0.007 | 0.055 | 10.210 |
| Jul | 0.055 | 0.046082272 | 3.041859 | 0.022 | 0.068 | 16.422 |
| Aug | 0.068 | 0.04726108 | 1.083163 | 0.008 | 0.055 | 10.014 |
| Sep | 0.055 | 0.053859993 | -0.29772 | -0.002 | 0.052 | 8.642 |
| Oct | 0.052 | 0.059642058 | -0.63136 | -0.005 | 0.055 | 9.821 |
| Nov | 0.055 | 0.065034911 | 3.021363 | 0.024 | 0.089 | 32.326 |
| Dec | 0.089 | 0.059661808 | 0.766834 | 0.007 | 0.067 | 15.849 |

4.3.5 Validation of Markov Model

The streamflow generated by the Markov model is compared with the observed streamflow set aside as the validation set, which comprises 60 months of data from January 2006 to December 2010. Graphically, Figure 4.8 shows that the Markov model does not work well for streamflow forecasting for Sungai Bernam because it does not match the actual streamflow well.


Figure 4.8: Comparison of Observed and Markov Model Flow

The ability of Markov model in streamflow forecasting is inspected by using

some forecast evaluation measures like Root Mean Square Error (RMSE), Chi-square

Test and Mean Absolute Percentage Error (MAPE). The result of inspection is

summarized in Table 4.5 and the details of the calculation can be found in Appendix E.

Table 4.5: Accuracy of the Markov Model

| Performance evaluation procedure | Markov model |
|---|---|
| MAPE | 53.66 |
| RMSE | 7.29 |
| Chi-square test | 250.99 |


4.4 ARIMA Model

In this study, an appropriate tentative ARIMA model for the Sg. Bernam streamflow is investigated. Examination of the autocorrelation function (ACF) and partial autocorrelation function (PACF) provides a thorough basis for analyzing the behavior of the system over time and suggests the appropriate parameters to include in the model.

The tentative models are checked, and the best tentative model is selected for ARIMA streamflow forecasting. As mentioned in the previous chapter, ARIMA modeling follows three important stages, which are shown in the flow diagram of the Box-Jenkins methodology (Figure 4.9).

Figure 4.9: Flow Diagram of Box-Jenkins Methodology

[Flow diagram:
1. Tentative Identification (stationary and non-stationary time series; ACF and PACF)
2. Parameter Estimation (testing parameters)
3. Diagnostic Checking: is the model adequate? (white noise of residuals; normal distribution of residuals). Yes: proceed to step 4. No: return to an earlier stage.
4. Forecasting (forecast calculation)]

4.4.1 Model Identification

Identification involves examining plots of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) to determine whether the series is stationary, and then deciding which functional form and model best fit the data. In practice, the sample ACF and PACF are random variables and will not give exactly the same picture as the theoretical functions. This makes model identification more difficult and can involve much trial and error (Nazuha et al., 2010).

The most common way to check stationarity is to examine the time series plot of the data. Stationarity means that the data fluctuate around a constant mean. If the time series plot is found to be non-stationary, differencing needs to be applied. Figure 4.10 shows that the data are non-stationary, so a non-seasonal difference (d = 1, lag k = 1) is applied. Based on graphical examination, Figure 4.11 shows that the data are stationary after the non-seasonal difference.
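As a small illustration of the non-seasonal difference (d = 1, lag k = 1), the sketch below differences the twelve 1960 flows listed in Appendix A. The helper name `difference` is illustrative; the calculation is simply Yt - Yt-1.

```python
def difference(series, lag=1):
    """Return the lag-k difference: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Monthly flows for 1960 (m3/s), Jan to Dec, from Appendix A.
flows_1960 = [10.62, 8.38, 11.23, 14.08, 10.04, 6.68,
              11.01, 7.87, 11.11, 10.83, 13.83, 14.62]

d1 = difference(flows_1960, lag=1)
print([round(v, 2) for v in d1])
```

The differenced series has one value fewer than the original and fluctuates around zero, which is the behaviour described for Figure 4.11.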

[Time series plot of monthly streamflow, Yt (m3/s), against Year/Month, January 1960 onwards; points labelled by month number 1 to 12]

Figure 4.10: Non stationary data of Sg. Bernam streamflow

[Time series plot of differenced streamflow, d1-Yt (m3/s), against Year/Month, January 1960 onwards; points labelled by month number 1 to 12]

Figure 4.11: Stationary data of Sg. Bernam streamflow

The next step is to identify the values of p and q, the orders of the AR(p) and MA(q) components, for both the seasonal and non-seasonal parts of the series. For this purpose, the ACF and PACF coefficients are computed. Table 4.6 gives the general theoretical behaviour used to identify the likely model:

Table 4.6: General Theoretical ACF and PACF of ARIMA models

Model                                                            ACF                    PACF
MA(q): moving average of order q                                 Cuts off after lag q   Dies down
AR(p): autoregressive of order p                                 Dies down              Cuts off after lag p
ARMA(p,q): mixed autoregressive-moving average of order (p,q)    Dies down              Dies down
AR(p) or MA(q)                                                   Cuts off after lag q   Cuts off after lag p
No order AR or MA (White Noise or Random process)                No spike               No spike
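To make the "cuts off" and "dies down" patterns of Table 4.6 concrete, the sketch below computes a sample ACF in plain Python. The limit ±1.96/√n is a common large-sample approximation to the 5% significance limits and is an assumption here; the limits drawn by the statistical package may be computed differently. The synthetic sine series is purely illustrative.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def approx_limit(n):
    """Approximate 5% significance limit for a sample autocorrelation."""
    return 1.96 / math.sqrt(n)


# Illustrative series with a 12-month cycle (10 synthetic "years").
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = sample_acf(series, 24)
limit = approx_limit(len(series))
spikes = [k for k in range(1, 25) if abs(acf[k]) > limit]
print(round(acf[6], 2), round(acf[12], 2), spikes[:5])
```

A strongly seasonal series shows large positive autocorrelation at lags 12 and 24 and large negative autocorrelation at lag 6, which is the kind of pattern that motivates a seasonal difference.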

[Correlogram: Autocorrelation Function for d1-Yt, lags 1 to 65, with 5% significance limits for the autocorrelations]

Figure 4.12: ACF after non-seasonal difference

[Correlogram: Partial Autocorrelation Function for d1-Yt, lags 1 to 65, with 5% significance limits for the partial autocorrelations]

Figure 4.13: PACF after non-seasonal difference

As can be seen from Figures 4.12 and 4.13, both the ACF and the PACF die down gradually. Based on this pattern, the non-seasonal orders (p, d, q) are taken as ARIMA (1, 1, 1). The ACF correlogram also reveals a seasonal pattern in the data, so a seasonal difference (D = 1, lag k = 12) needs to be applied.
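Applying the seasonal difference on top of the non-seasonal difference corresponds to the operator (1 - B)(1 - B12), that is Yt - Yt-1 - Yt-12 + Yt-13. The sketch below checks this identity numerically on the 1960 and 1961 flows from Appendix A; the helper name is illustrative.

```python
def difference(series, lag):
    """Return the lag-k difference: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Monthly flows for 1960 and 1961 (m3/s), from Appendix A.
flows = [10.62, 8.38, 11.23, 14.08, 10.04, 6.68,
         11.01, 7.87, 11.11, 10.83, 13.83, 14.62,
         8.72, 5.95, 6.26, 8.40, 10.07, 8.50,
         6.84, 8.27, 11.27, 10.47, 12.04, 15.52]

# Non-seasonal difference (lag 1) followed by seasonal difference (lag 12).
w = difference(difference(flows, 1), 12)

# The same quantity written out from the expanded operator.
direct = [flows[t] - flows[t - 1] - flows[t - 12] + flows[t - 13]
          for t in range(13, len(flows))]

print(len(w), round(w[0], 2))
```

Each seasonal difference costs 12 observations and each non-seasonal difference costs one, so 24 months of data leave 11 doubly differenced values.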

[Correlogram: Autocorrelation Function for D1-d1-Yt, lags 1 to 65, with 5% significance limits for the autocorrelations]

Figure 4.14: ACF after seasonal difference

[Correlogram: Partial Autocorrelation Function for D1-d1-Yt, lags 1 to 65, with 5% significance limits for the partial autocorrelations]

Figure 4.15: PACF after seasonal difference

After applying the seasonal difference, the ACF in Figure 4.14 cuts off after lag 12, while the PACF in Figure 4.15 dies down. For seasonal ARIMA, the general notation is ARIMA (p, d, q)(P, D, Q)S. Based on this pattern, the seasonal orders (P, D, Q) are taken as (0, 1, 1) with S = 12, written (0, 1, 1)12. However, to make sure that the right model has been identified, a second tentative seasonal part, (1, 1, 1)12, is also considered. The two tentative models are therefore ARIMA (1,1,1)(0,1,1)12 and ARIMA (1,1,1)(1,1,1)12.

4.4.2 Parameter Estimation

Each parameter of a tentative ARIMA model can be tested using its t-value and p-value. The t-value is calculated by dividing the coefficient by its standard error. The standard error (SE) of a coefficient is the standard deviation of the estimate of that coefficient. It measures how precisely the data can estimate the coefficient's unknown value. Its value is always positive, and smaller values indicate a more precise estimate. The standard error of a coefficient helps determine whether the value of the coefficient is significantly different from zero. If the p-value associated with the t-statistic is less than the alpha level, we can conclude that the coefficient is significantly different from zero.

From Table 4.7, the standard error of the SAR 12 coefficient is large relative to the value of the coefficient itself, so its t-value of 1.26 is too small to declare statistical significance. The null hypothesis that a coefficient equals zero is rejected if |t| > tα/2,df, where df = n - np. For the SAR 12 parameter, tcalc (= 1.26) < ttable (= 2.25), and the resulting p-value (0.208) is much greater than the common alpha level. Therefore, the null hypothesis cannot be rejected, and we conclude that this coefficient does not differ significantly from zero. In Table 4.8, which gives the parameter estimates for ARIMA (1,1,1)(0,1,1)12, every parameter has |tcalc| > ttable (= 2.25) and a p-value less than the alpha level. Hence, the null hypothesis is rejected for each parameter, and we conclude that all coefficients are significantly different from zero.

Table 4.7: Final Estimates of Parameters for ARIMA (1,1,1)(1,1,1)12

Type Coefficient SE Coefficient T p

AR 1 0.2782 0.0520 5.35 0.000

SAR 12 0.0589 0.0467 1.26 0.208

MA 1 0.8765 0.0256 34.24 0.000

SMA 12 0.9537 0.0206 46.25 0.000

Table 4.8: Final Estimates of Parameters for ARIMA (1,1,1)(0,1,1)12

Type Coefficient SE Coefficient T p

AR 1 0.2894 0.0516 5.61 0.000

MA 1 0.8788 0.0248 35.41 0.000

SMA 12 0.9553 0.0184 51.98 0.000
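The t-values in Table 4.7 follow directly from dividing each coefficient by its standard error. The short check below uses the tabulated coefficients and standard errors together with the critical value of 2.25 quoted in the text; the dictionary layout and names are illustrative.

```python
# (coefficient, standard error) pairs from Table 4.7.
estimates = {
    "AR 1": (0.2782, 0.0520),
    "SAR 12": (0.0589, 0.0467),
    "MA 1": (0.8765, 0.0256),
    "SMA 12": (0.9537, 0.0206),
}
T_CRITICAL = 2.25  # critical value quoted in the text


def t_value(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


significant = {name: abs(t_value(c, se)) > T_CRITICAL
               for name, (c, se) in estimates.items()}

for name, (c, se) in estimates.items():
    print(f"{name:7s} t = {t_value(c, se):6.2f} significant: {significant[name]}")
```

Only SAR 12 fails the test, which is why the seasonal autoregressive term is dropped in the second tentative model.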


4.4.3 Diagnostic Checking

The next step of the Box-Jenkins modeling approach is diagnostic checking. It is aimed at examining the adequacy of the chosen tentative model by ensuring that the modeling assumptions are satisfied. Several procedures can be applied to check whether the model satisfies the stability or stationarity condition, as required in stochastic modeling works (Ayob and Amat, 2004).

For this stage, the Ljung-Box test is used to test whether the residuals are white noise. The null hypothesis is that the residuals are white noise. In other words, the residual series should be independent, homoscedastic (having constant variance) and normally distributed. The null hypothesis is rejected if the p-value of the Chi-Square statistic is less than the alpha level of 5%.

In this study, both tentative ARIMA models have p-values less than the alpha level at every lag tested (Table 4.9 and Table 4.10). Strictly, the null hypothesis is therefore rejected for both models, which indicates that some autocorrelation remains in the residuals. Because the two models behave almost identically in this test, it does not discriminate between them, and the choice between them is made using the error measures that follow.

Table 4.9: Modified Box-Pierce (Ljung-Box) Chi-Square statistic

for ARIMA (1,1,1)(1,1,1)12

Lag 12 24 36 48

Chi-Square 21.2 61.8 82.7 98.1

DF 8 20 32 44

p-Value 0.007 0.000 0.000 0.000


Table 4.10: Modified Box-Pierce (Ljung-Box) Chi-Square statistic

for ARIMA (1,1,1)(0,1,1)12

Lag 12 24 36 48

Chi-Square 23.1 62.2 82.7 97.9

DF 9 21 33 45

p-Value 0.006 0.000 0.000 0.000
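The p-values in Tables 4.9 and 4.10 are upper-tail probabilities of the chi-square distribution at the tabulated statistic and degrees of freedom. The sketch below evaluates them with the closed-form series for integer degrees of freedom, using only the standard library. The function name `chi2_sf` is illustrative, and the decision rule shown is the usual one: reject the white-noise hypothesis when the p-value is below 0.05.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of
    # x^(k - 1/2) / (1 * 3 * ... * (2k - 1)) for k = 1 .. (df - 1)/2
    total = 0.0
    term = math.sqrt(x)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Lag-12 statistics from Table 4.9 (DF = 8) and Table 4.10 (DF = 9).
p_model_a = chi2_sf(21.2, 8)
p_model_b = chi2_sf(23.1, 9)
for label, p in (("(1,1,1)(1,1,1)12", p_model_a), ("(1,1,1)(0,1,1)12", p_model_b)):
    print(f"ARIMA {label}: p = {p:.3f}, reject white noise at 5%: {p < 0.05}")
```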

In addition, the best tentative model can be determined by comparing the Least Square Error (LSE) and Root Mean Square Error (RMSE). The results for the tentative models are summarized in Table 4.11. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by the model. RMSE is also a good measure of accuracy. The smaller the values of LSE and RMSE, the more accurate the tentative model.

Table 4.11: LSE and RMSE Test for ARIMA Tentative Model

Test                             ARIMA (1,1,1)(1,1,1)12   ARIMA (1,1,1)(0,1,1)12
Least Square Error (LSE)         1798                     1760
Root Mean Square Error (RMSE)    5.5                      5.4
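The two rows of Table 4.11 are linked if the LSE is read as the sum of squared errors over the 60-month validation set, since RMSE is then √(LSE/n). This reading is an inference from the numbers rather than something stated in the text, and the check below is only a sketch under that assumption.

```python
import math

N_VALIDATION = 60  # assumed: the 60 monthly values of the validation set

lse = {
    "ARIMA (1,1,1)(1,1,1)12": 1798.0,
    "ARIMA (1,1,1)(0,1,1)12": 1760.0,
}

# RMSE implied by each sum of squared errors.
rmse = {name: math.sqrt(value / N_VALIDATION) for name, value in lse.items()}
best = min(rmse, key=rmse.get)

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f}")
print("lowest RMSE:", best)
```

Under this assumption the implied values round to the tabulated 5.5 and 5.4.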

Of the two tentative models, the one that best fits the criteria is therefore ARIMA (1,1,1)(0,1,1)12, and forecasting is based on this model. The model identified as the best fit for the Sg. Bernam streamflow is:

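\[ (1 - \phi_1 B)(1 - B)(1 - B^{12})\,Y_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})\,a_t \tag{4.7} \]

Rewriting the model, we have the following:

\[ (1 - \phi_1 B)(1 - B^{12} - B + B^{13})\,Y_t = (1 - \theta_2 B^{12} - \theta_1 B + \theta_1\theta_2 B^{13})\,a_t \]

\[ (1 - B^{12} - B + B^{13} - \phi_1 B + \phi_1 B^{13} + \phi_1 B^{2} - \phi_1 B^{14})\,Y_t = (1 - \theta_2 B^{12} - \theta_1 B + \theta_1\theta_2 B^{13})\,a_t \]

\[ (1 - B^{12} - (1 + \phi_1)B + (1 + \phi_1)B^{13} + \phi_1 B^{2} - \phi_1 B^{14})\,Y_t = (1 - \theta_1 B - \theta_2 B^{12} + \theta_1\theta_2 B^{13})\,a_t \]

\[ Y_t - (1 + \phi_1)Y_{t-1} + \phi_1 Y_{t-2} - Y_{t-12} + (1 + \phi_1)Y_{t-13} - \phi_1 Y_{t-14} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1\theta_2 a_{t-13} \]

\[ Y_t = (1 + \phi_1)Y_{t-1} - \phi_1 Y_{t-2} + Y_{t-12} - (1 + \phi_1)Y_{t-13} + \phi_1 Y_{t-14} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1\theta_2 a_{t-13} \]

Note that

AR 1: \(\phi_1 = 0.2894\)

MA 1: \(\theta_1 = 0.8788\)

SMA 12: \(\theta_2 = 0.9553\)

Substituting these values gives

\[ Y_t = (1 + 0.2894)Y_{t-1} - 0.2894\,Y_{t-2} + Y_{t-12} - (1 + 0.2894)Y_{t-13} + 0.2894\,Y_{t-14} + a_t - 0.8788\,a_{t-1} - 0.9553\,a_{t-12} + (0.8788 \times 0.9553)\,a_{t-13} \]

\[ Y_t = 1.2894\,Y_{t-1} - 0.2894\,Y_{t-2} + Y_{t-12} - 1.2894\,Y_{t-13} + 0.2894\,Y_{t-14} + a_t - 0.8788\,a_{t-1} - 0.9553\,a_{t-12} + 0.8395\,a_{t-13} \]

\[ Y_t = Y_{t-12} + [\,1.2894\,Y_{t-1} - 1.2894\,Y_{t-13} - 0.2894\,Y_{t-2} + 0.2894\,Y_{t-14}\,] + [\,a_t - 0.8788\,a_{t-1} - 0.9553\,a_{t-12} + 0.8395\,a_{t-13}\,] \tag{4.8} \]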
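Equation (4.8) can be turned into a one-step forecast by setting the unknown current shock at to its expected value of zero. The following is a minimal sketch under that assumption. The function name is illustrative, and the residuals are set to zero purely for demonstration rather than taken from the fitted model.

```python
PHI1 = 0.2894    # AR 1 coefficient (Table 4.8)
THETA1 = 0.8788  # MA 1 coefficient (Table 4.8)
THETA2 = 0.9553  # SMA 12 coefficient (Table 4.8)


def one_step_forecast(y, a):
    """Forecast Y_t from Equation (4.8) with the current shock a_t set to 0.

    y: observations up to t-1 (at least 14 values, oldest first)
    a: residuals aligned with y (same length)
    """
    ar_part = ((1 + PHI1) * y[-1] - PHI1 * y[-2] + y[-12]
               - (1 + PHI1) * y[-13] + PHI1 * y[-14])
    ma_part = -THETA1 * a[-1] - THETA2 * a[-12] + THETA1 * THETA2 * a[-13]
    return ar_part + ma_part


# Illustration: a series that repeats the 1960 flows (Appendix A) exactly,
# with zero residuals (an assumption made only for this example).
pattern = [10.62, 8.38, 11.23, 14.08, 10.04, 6.68,
           11.01, 7.87, 11.11, 10.83, 13.83, 14.62]
history = pattern * 2
residuals = [0.0] * len(history)

print(round(one_step_forecast(history, residuals), 2))
```

For a perfectly periodic history the bracketed trend terms cancel, so the forecast reduces to the value observed in the same month of the previous year.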

Equation (4.8) can be used for streamflow forecasting with the ARIMA model. Equation (4.8) also shows that the forecast for time period t is the sum of (1) the value of the time series in the same month of the previous year, Yt-12; (2) a trend component determined by the difference between the previous month's value and the value thirteen months earlier (Yt-1 and Yt-13), and the difference between the values fourteen and two months earlier (Yt-14 and Yt-2); and (3) the effects of the random shocks (or residuals) of periods t, t-1, t-12 and t-13.


4.4.4 Streamflow Generation of ARIMA Model

In this study, Minitab is used to develop the ARIMA model for monthly flows. As an example, the monthly streamflow model developed using Minitab for 2006 to 2007 is shown in Table 4.12, while the model streamflow for the other years (2008-2010) can be found in Appendix F.

Table 4.12: Model Streamflow for Year 2006-2007

i          Actual Flow (m3/s)   Model Flow (m3/s)   Residual   Fit       Coefficient
Jan 2006   13.08                9.6732              *          *         0.289364
Feb 2006   8.12                 7.1884              *          *         0.878761
Mac 2006   6.11                 7.2612              *          *         0.955283
Apr 2006   29.72                9.0165              *          *
May 2006   29.22                9.9281              *          *
Jun 2006   17.82                7.6110              *          *
Jul 2006   7.94                 6.7046              *          *
Aug 2006   9.95                 7.0851              *          *
Sep 2006   28.05                9.5168              *          *
Oct 2006   17.63                12.2889             *          *
Nov 2006   17.72                15.2005             *          *
Dec 2006   11.23                12.3581             *          *
Jan 2007   9.05                 7.9227              *          *
Feb 2007   6.80                 6.6970              -1.57988   7.5299
Mac 2007   7.62                 7.1341              -1.39072   7.6507
Apr 2007   13.46                8.9949              -1.05700   9.4570
May 2007   12.05                9.9369              -0.14946   10.2195
Jun 2007   11.38                7.6286              1.10867    7.3913
Jul 2007   13.06                6.7248              -1.04180   7.8818
Aug 2007   8.95                 7.1060              1.04920    7.2208
Sep 2007   9.36                 9.5379              0.26505    11.0050
Oct 2007   14.33                12.3101             -2.99026   13.4603
Nov 2007   14.26                15.2217             -3.58500   15.6250
Dec 2007   8.24                 12.3794             4.03841    11.4816


4.4.5 Validation of ARIMA Model

The streamflow generated by the ARIMA model is compared with the observed streamflow of the validation set, which comprises 60 monthly values from January 2006 to December 2010. Graphically, Figure 4.16 suggests that the ARIMA model works quite well for streamflow forecasting at Sungai Bernam, because many of the model values match the observed streamflow. The forecasting ability of the ARIMA model is then assessed using forecast evaluation measures.

Figure 4.16: Comparison Observed and ARIMA Model Flow

As in the validation of the Markov model, Root Mean Square Error (RMSE), the Chi-square test and Mean Absolute Percentage Error (MAPE) are used to examine the accuracy of the ARIMA model. The results are summarized in Table 4.13, and the details of the calculation can be found in Appendix G.

Table 4.13: Accuracy of the ARIMA Model

Performance Evaluation Procedure   ARIMA model
MAPE                               27.50
RMSE                               5.41
Chi-square test                    191.11

4.5 Model Comparison and Forecast Evaluation Measures

The Markov model is compared with the ARIMA model to assess their relative forecasting accuracy. The observed streamflow of the validation set, 60 monthly values from January 2006 to December 2010, is used as the benchmark for the comparison. From graphical examination of Figure 4.17, the ARIMA model is better for streamflow forecasting at Sungai Bernam because more of its values match the observed streamflow.

Most of the Markov model forecasts are higher than the observed values. In terms of accuracy, the Markov model is poorer than the ARIMA model because it cannot reproduce the pattern of the observed series. However, these high values can serve as a conservative reference guideline for preventing damage due to flooding. The Markov model could be used for short-term forecasting, such as hourly and daily forecasting, in order to give more accurate flood warnings.

Meanwhile, if the forecast streamflow is lower than the observed data, flood occurrence cannot be anticipated. Low-flow forecasts are nevertheless needed in some agricultural applications to make sure that plants have sufficient water and grow well.

Figure 4.17: Model Comparison

For short periods, the ARIMA model can reproduce a pattern similar to the observed one. ARIMA cannot forecast accurately for longer periods, as it is best suited to short-term forecasting; its forecasts usually tend to flatten out over sufficiently long horizons. Because it is good at short-term forecasting, the ARIMA model can also be used for flood control.


In order to compare the forecasting accuracy of the two models, the performance evaluation measures MAPE, RMSE and the Chi-square test are compared for the Markov and ARIMA models. Table 4.14 shows the results for each model.

Table 4.14: Accuracy of the model

Performance Evaluation Procedure   Markov model   ARIMA model
MAPE                               53.66          27.50
RMSE                               7.29           5.4156
Chi-squared test                   250.99         191.11

The model with the minimum values of MAPE, RMSE and the Chi-square statistic is the best for streamflow forecasting. The performance evaluation shows that ARIMA has the lower value for every measure. Therefore, in this study, the ARIMA model performs better than the Markov model for streamflow forecasting.
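To put the comparison in Table 4.14 into relative terms, the sketch below computes the percentage reduction of each measure when moving from the Markov model to the ARIMA model. The values are those tabulated; the variable names are illustrative.

```python
# (Markov, ARIMA) values from Table 4.14.
metrics = {
    "MAPE": (53.66, 27.50),
    "RMSE": (7.29, 5.4156),
    "Chi-square": (250.99, 191.11),
}

# Percentage reduction of each measure relative to the Markov model.
reduction = {name: 100.0 * (markov - arima) / markov
             for name, (markov, arima) in metrics.items()}

for name, value in reduction.items():
    print(f"{name}: ARIMA lower by {value:.1f}%")
```

The reduction is largest for MAPE, which is roughly halved, and smaller for RMSE and the chi-square statistic, at about a quarter each.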

One reason the ARIMA model is better than the Markov model in this study is that the historical data for Sg. Bernam are non-stationary. If the historical data were stationary, the Markov model might have an advantage, because it is a probabilistic method in which the transition from one state to another depends on probability. The Markov model cannot remove non-stationarity from the data, whereas the ARIMA model can transform non-stationary data into stationary data through differencing.

The ARIMA model is selected as the best fit because it has the minimum mean squared forecast error, a criterion often used in statistical practice. For forecasting one period ahead, Yt+1, the equation is as follows:

\[ Y_{t+1} = Y_{t-11} + [\,1.2894\,Y_{t} - 1.2894\,Y_{t-12} - 0.2894\,Y_{t-1} + 0.2894\,Y_{t-13}\,] + [\,a_{t+1} - 0.8788\,a_{t} - 0.9553\,a_{t-11} + 0.8395\,a_{t-12}\,] \tag{4.9} \]

Using Minitab, future values of the time series can readily be forecast from current and past values. Figure 4.18 compares the pattern of the actual and model streamflow for Sungai Bernam. The first 5 years, from January 2006 to December 2010, form the validation period. The time series plot reveals the cyclic pattern of the ARIMA model. The model flows follow the pattern of the observed streamflow quite well, although the data are non-stationary for several years.

[Time series plot of Yt-actual and Yt-model streamflow, Yt (m3/s), against Year/Month, January 2006 to 2015; points labelled by month number 1 to 12]

Figure 4.18: Streamflow for actual and model

The next 5 years, the 60 months from January 2011 to December 2015, are the streamflow forecast using the ARIMA model. The figure shows that the model can forecast well, but that the forecast repeats the same pattern over longer periods. This is because the ARIMA model is best suited to short-term forecasting, since its forecasts are based on previous observations. For short-term forecasting, a Box-Jenkins model can nicely reproduce the details of the original series, but ARIMA cannot forecast accurately for longer periods.
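The repeating long-range pattern can be reproduced by iterating the forecast equation with all future shocks set to zero. The sketch below does this for 60 months, starting from the 2009 and 2010 flows in Appendix A. The past residuals are also set to zero, which is an assumption made only for illustration (the fitted residuals are not reproduced here), so the numbers are not the forecasts of the report.

```python
PHI1, THETA1, THETA2 = 0.2894, 0.8788, 0.9553  # coefficients from Table 4.8


def forecast_arima(y, a, horizon):
    """Iterate the ARIMA (1,1,1)(0,1,1)12 forecast equation.

    Future shocks are replaced by their expected value, zero, and each
    forecast is fed back in as if it were an observation.
    """
    y, a = list(y), list(a)
    out = []
    for _ in range(horizon):
        nxt = ((1 + PHI1) * y[-1] - PHI1 * y[-2] + y[-12]
               - (1 + PHI1) * y[-13] + PHI1 * y[-14]
               - THETA1 * a[-1] - THETA2 * a[-12] + THETA1 * THETA2 * a[-13])
        y.append(nxt)
        a.append(0.0)
        out.append(nxt)
    return out


# Observed flows for 2009 and 2010 (m3/s), from Appendix A.
history = [9.73, 9.67, 15.10, 13.72, 8.75, 7.31,
           8.05, 9.03, 10.08, 7.99, 12.73, 6.88,
           6.83, 4.86, 4.36, 7.18, 6.17, 7.51,
           7.45, 8.04, 7.16, 6.30, 9.56, 11.01]
residuals = [0.0] * len(history)  # assumed, for illustration only

f = forecast_arima(history, residuals, 60)
year_on_year = [f[h] - f[h - 12] for h in range(48, 60)]
print([round(v, 4) for v in year_on_year])
```

In the final forecast year every month differs from the same month a year earlier by practically the same amount, so the forecast is the same seasonal shape repeated with a constant shift. No new information enters once the past shocks have dropped out of the equation.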

CHAPTER 5

CONCLUSION AND RECOMMENDATIONS

5.1 Conclusion

This study has fulfilled its objectives: to propose streamflow forecasting methods using Markov and ARIMA models, and to assess the forecasting accuracy of both models. The Box-Jenkins or ARIMA model is one of the most popular time series forecasting methods. The Markov model has its own advantages in forecasting.

In this study, the tentative model that best fits the criteria is ARIMA (1,1,1)(0,1,1)12. Analysis of the forecast values using the performance evaluation measures shows that the ARIMA model is better than the Markov model for forecasting the Sg. Bernam streamflow, with lower values for all measures used. Therefore, of the two models, ARIMA predicts the future monthly streamflow of Sungai Bernam more accurately.


The critical part of ARIMA modeling is identifying the best tentative model. The tentative model that has been identified is tested and checked to confirm that it is the best fit.

The Markov model also has an advantage in that it forecasts higher streamflow than the actual streamflow. High streamflow can cause disasters such as floods, so the Markov model can be used as a conservative tool for flood control.

Both the Markov and ARIMA models are suited to short-term forecasting. The results show that both models forecast well for the earlier part of the period but cannot forecast accurately for longer periods.

Although both models are good for short-term forecasting and poor for long-term forecasting, comparison between the two shows that ARIMA gives more accurate forecasts.

5.2 Recommendations

Based on the results, both the Markov and ARIMA models can be used for streamflow forecasting. However, there are some weaknesses that can be overcome. The following recommendations may increase the accuracy of streamflow forecasting:


1. The amount of data, or equivalently the number of training patterns, also affects forecast performance. For long-memory series, more training patterns result in more accurate forecasts. To forecast accurately, use a long input series.

2. To control floods efficiently, use the Markov model for short-term forecasting, because short-term forecasting is very useful for flood control.

3. Use the ARIMA model for short-term forecasting only, including for streamflow forecasting.

4. Compare the streamflow forecasts with those of other time series forecasting methods, such as exponential smoothing, regression analysis or Fourier series analysis.

5. Carry out the time series forecasting after removing outliers.

6. Use a hybrid model combining ARIMA and an artificial neural network for streamflow forecasting.


REFERENCES

Adib, A. and Majd, A. R. M. (2009). Optimization of Reservoir Volume by Yield Model

And Simulation of it by Dynamic Programming and Markov Chain Method.

American-Eurasian J. Agric. & Environ. Sci., 5(6), 796-803.

Akgun, B. (2003). Identification of Periodic Autoregressive Moving Average Models.

Middle East Technical University.

Ayob, K. and Amat, S. D. (2004). Water Use Trend at Universiti Teknologi Malaysia:

Application of Arima Model. Jurnal Teknologi, 41 (B): 47-56

Bell, W. R. (1984). An Introduction to Forecasting with Time Series Models. Insurance:

Mathematics and Economics 3, pp. 241-255.

Bowerman, B. L. and O’Connell, R. T. (1993). Forecasting and Time Series: An

Applied Approach. Third Edition. Duxbury Press.

Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control.

Holden Day, San Francisco.

Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis, Forecasting and Control.

Holden Day, San Francisco.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). Time Series Analysis:

Forecasting and Control. Third Edition. Prentice Hall.

Brown, R. G. (1962). Smoothing, Forecasting and Prediction of Discrete Time Series.

Prentice Hall, Englewood Cliffs, N. J.


Dalphin, R. J. (1987). Markov-Weibull Model of Monthly Streamflow. Journal of Water

Resources Planning and Management, Vol. 113, No. 1.

Fiering, M. B., and Jackson, B. B. (1971). Synthetic Streamflows. Water Resources

Monograph 1. American Geophysical Union. Washington, D. C.

Fortin, V., Perreault, L. and Salas, J. D. (2004). Retrospective Analysis and Forecasting

of Streamflows Using a Shifting Model. Journal of Hydrology, Vol. 296,

135-163.

Gupta, R. S. (1989). Hydrology and Hydraulic Systems. Prentice Hall, pp 343-350.

Hasmida, H. (2009). Water Quality Trend at The Upper Part of Johor River in Relation

to Rainfall and Runoff Pattern. Universiti Teknologi Malaysia.

Heiko, B. (2000). Markov Chain Model for Vegetation Dynamics. Ecological Modeling,

Vol. 126, pp. 139-154.

Hendranata, A. (2003). ARIMA (Autoregressive Moving Average). Manajemen

Keuangan Sektor Publik FEUI

Ho, S. L. and Xie, M. (1998). The Use of ARIMA Models for Reliability Forecasting

and Analysis. Computers ind. Engng, Vol. 35, Nos 1-2, pp. 213-216.

Joomizan, N. (2010). Reservoir Storage Simulation and Forecasting Models for Muda

Irrigation Scheme, Malaysia. Universiti Teknologi Malaysia.

Lee, C. and Ko, C. (2011). Short-term Load Forecasting Using Lifting Scheme and

ARIMA Models. Expert Systems with Applications, Vol. 38, pp. 5902-5911.

Leemis, L. (1998). Input Modeling. In Proceedings of the 1998 Winter Simulation

Conference, ed. D. J. Medeiros, E. F. Watson, J. S. Carson, and M. S.

Manivannan, 15–22. Piscataway, New Jersey: Institute of Electrical and

Electronics Engineers, Inc.

Maass, A., Hufschmidt, M. M., Dorfman, R., Thomas, H. A., Marglin, S. A. and Fair, G. M.

(1962). The Design of Water-Resource Systems. Harvard University Press,

Cambridge, Mass., pp 467

Maia, A. L. S., de Carvalho, F. de A. T. and Ludermir, T. B. (2008). Forecasting Models

for Interval-valued Time Series. Neurocomputing, Vol. 71, pp. 3344-3352.

Modarres, R. (2007). Streamflow Drought Time Series Forecasting. Stoch Environ Res

Risk Assess.

Mohd Shafiek, Y. Hishamuddin, J. and Sobri, H. (2005). Daily Streamflow Forecasting

Using Simplified Rule-Based Fuzzy Logic System. Journal-The Institution of

Engineers, Malaysia, Vol. 66, No. 4.

Montgomery, D. C., Jennings, C. L., Kulahci, M. (2008). Introduction to Time Series

Analysis and Forecasting. John Wiley & Sons, Inc.

Moore, D. S. and McCabe, G. P. (1999). Introduction to the Practice of Statistics. Third

Edition. New York: W. H. Freeman.

Naadimuthu, G. and Lee, E. S. (1982). Stochastic Modelling and Optimization of Water

Resources Systems. Mathematical Modelling, Vol. 3, pp. 117-136.

Nazuha, M., Ruzaidah, S. and Zamzulani, M. (2010). Malaysia Crude Oil Production

Estimation: an Application of ARIMA Model. International Conference on

Science and Social Research (CSSR 2010)


O’Donovan, T. M. (1983). Short Term Forecasting: An Introduction to the Box-Jenkins

Approach. New York: Wiley.

Shalamu, A. (2009). Monthly and Seasonal Streamflow Forecasting in the Rio Grande

Basin. New Mexico State University

Shumway, R. H. (1988). Applied Statistical Time Series Analysis. Prentice Hall,

Englewood Cliffs, New Jersey.

SPSS (1993). SPSS for Windows-Trend. Release 6.0.

Tang, Z., Almeida, C. and Fishwick, P. A. (1991). Time Series Forecasting Using

Neural Networks vs. Box-Jenkins Methodology. Simulation.

Wang, W. (2006). Stochasticity, Nonlinearity and Forecasting of Streamflow Processes.

IOS Press, Amsterdam.

Wurbs, R. A. (2005). Comparative Evaluation of Generalized River/Reservoir Systems

Models. Texas Water Resources Institute, TR-282, pp. 27-131.

Yaffee, R. and McGee, M. (2000). Introduction to Time Series Analysis and Forecasting

with Application of SAS and SPSS. Academic Press, Inc., New York.

Yurekli, K., Kurunc, A. and Simyek, H. (2004). Prediction of Daily Maximum

Streamflow Based on Stochastic Approaches. Journal of Spatial Hydrology,

Vol.4.


APPENDIX A

Streamflow Data of Sungai Bernam 1960-2010 (m3/s)

i Jan Feb Mac Apr May Jun Jul Aug Sep Oct Nov Dec
1960 10.62 8.38 11.23 14.08 10.04 6.68 11.01 7.87 11.11 10.83 13.83 14.62
1961 8.72 5.95 6.26 8.40 10.07 8.50 6.84 8.27 11.27 10.47 12.04 15.52
1962 11.98 6.66 10.28 11.06 10.07 6.62 6.36 7.79 10.54 16.97 21.14 11.29
1963 8.06 6.35 6.37 6.02 6.20 6.76 6.45 9.29 12.09 17.82 23.87 15.37
1964 10.31 7.08 7.90 8.73 10.35 6.44 11.09 7.52 12.39 11.24 14.58 14.11
1965 6.65 6.12 7.96 15.45 16.39 8.23 4.64 5.81 9.32 16.98 15.76 18.94
1966 11.41 7.38 8.96 11.25 6.45 8.70 9.38 9.49 11.04 19.17 22.08 18.80
1967 14.99 9.69 6.91 12.13 11.64 5.89 5.79 5.94 9.91 12.40 21.74 9.84
1968 4.87 4.12 2.94 5.14 11.72 8.16 5.64 5.91 9.91 11.05 11.56 12.16
1969 9.61 6.95 4.83 6.75 12.50 8.24 4.26 6.21 5.52 11.84 10.30 9.08
1970 9.69 3.49 3.83 6.48 8.71 4.41 5.73 6.49 9.79 9.96 11.16 12.61
1971 17.40 6.48 8.20 5.99 7.06 5.06 4.77 8.17 11.53 6.94 14.11 18.31
1972 8.09 8.04 7.84 9.31 11.60 9.16 5.88 5.75 8.74 11.88 16.21 9.22
1973 6.27 5.26 5.29 9.88 11.09 8.87 5.04 7.74 7.42 10.81 9.60 7.13
1974 3.62 7.44 6.80 8.65 9.52 7.24 7.54 7.79 8.43 7.11 8.15 8.68
1975 8.49 7.38 10.91 11.46 11.50 8.29 10.70 6.90 11.33 6.14 10.71 13.69
1976 7.88 4.36 5.59 6.05 4.92 7.92 5.94 8.74 7.55 14.43 12.72 7.66
1977 6.84 4.90 3.34 3.80 5.61 6.51 4.10 3.73 4.38 18.96 13.17 8.54
1978 5.14 3.54 3.36 6.39 7.93 3.67 3.99 2.99 4.72 7.86 20.75 5.60
1979 4.08 4.33 3.87 5.70 5.86 5.24 5.30 5.01 8.86 7.79 12.88 6.31
1980 4.05 3.95 4.91 5.01 10.07 9.37 6.51 8.79 9.64 12.48 8.48 19.07
1981 8.07 6.99 4.38 10.73 11.90 7.37 4.83 4.62 9.23 7.44 15.02 10.03
1982 4.95 3.93 5.26 8.94 9.40 6.94 4.84 6.55 7.73 10.01 18.93 8.03
1983 4.02 2.47 3.84 3.18 5.42 3.65 4.39 5.68 10.02 5.57 7.60 8.18
1984 5.86 9.04 8.07 8.10 9.83 9.93 5.98 5.07 5.64 7.82 14.58 19.78
1985 9.55 8.80 9.26 6.99 11.31 5.53 5.34 4.85 7.14 12.75 21.16 15.96
1986 7.95 5.79 4.94 8.87 6.31 4.32 3.21 3.68 7.39 14.19 13.57 8.74
1987 4.38 4.07 3.76 5.67 6.03 4.51 4.75 9.03 12.35 18.64 12.26 9.13
1988 4.81 7.86 6.48 6.22 9.92 12.09 8.45 8.57 18.19 9.24 11.01 7.39
1989 4.56 2.91 6.13 11.96 11.79 8.73 8.27 5.09 8.02 12.06 17.90 9.49
1990 5.80 3.14 2.65 3.46 10.15 5.94 4.55 3.31 6.83 16.42 15.36 11.83
1991 3.89 3.51 5.83 7.66 11.97 9.24 5.59 5.09 7.83 13.20 13.55 8.27
1992 7.37 6.28 5.56 7.18 9.65 5.64 6.94 6.01 6.43 7.65 12.25 8.49
1993 7.10 8.00 7.56 11.41 12.62 7.69 8.96 6.15 9.60 12.40 12.87 16.91
1994 10.66 10.16 10.32 10.59 10.87 10.78 8.43 10.89 16.08 14.22 13.97 16.15
1995 10.85 9.37 11.74 13.89 14.36 14.15 13.34 16.59 12.99 13.99 17.08 16.12
1996 12.29 11.50 12.12 18.45 15.91 16.73 13.12 14.24 13.39 20.88 17.08 29.78
1997 16.24 19.71 20.96 20.22 17.51 20.14 19.05 15.69 18.91 21.15 26.92 18.46
1998 15.95 16.02 14.69 14.42 14.90 15.72 16.14 20.16 23.75 19.72 27.30 18.59
1999 9.18 9.77 11.69 11.82 13.56 9.54 7.52 8.99 10.50 13.37 11.77 16.19
2000 14.05 11.67 15.70 12.87 9.26 7.45 4.21 9.25 8.85 9.31 14.58 19.88
2001 13.58 9.33 8.05 13.36 10.84 7.30 5.72 5.05 8.66 6.43 10.47 7.83
2002 6.04 4.35 4.84 11.45 12.75 7.89 6.99 7.26 8.96 15.31 16.94 7.30
2003 7.45 6.91 5.85 7.34 9.23 6.69 7.32 6.71 8.80 13.25 19.15 9.72
2004 8.08 6.60 6.67 8.68 11.12 4.13 6.49 4.44 11.62 14.57 21.65 8.07
2005 3.87 2.73 4.38 4.51 6.26 6.07 4.56 5.63 3.53 14.99 16.39 18.46
2006 13.08 8.12 6.11 29.72 29.22 17.82 7.94 9.95 28.05 17.63 17.72 11.23
2007 9.05 6.80 7.62 13.46 12.05 11.38 13.06 8.95 9.36 14.33 14.26 8.24
2008 11.29 6.76 9.58 12.86 9.73 12.28 10.89 7.83 9.85 13.14 16.74 10.96
2009 9.73 9.67 15.10 13.72 8.75 7.31 8.05 9.03 10.08 7.99 12.73 6.88
2010 6.83 4.86 4.36 7.18 6.17 7.51 7.45 8.04 7.16 6.30 9.56 11.01


APPENDIX B

Logarithm of Observed Streamflow Data for 1960-2005

i  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1960 0.056 0.051 0.058 0.064 0.055 0.046 0.057 0.049 0.058 0.057 0.063 0.065
1961 0.052 0.044 0.045 0.051 0.055 0.051 0.046 0.051 0.058 0.056 0.060 0.066
1962 0.059 0.046 0.056 0.057 0.055 0.046 0.045 0.049 0.056 0.069 0.075 0.058
1963 0.050 0.045 0.045 0.044 0.044 0.046 0.045 0.053 0.060 0.070 0.079 0.066
1964 0.056 0.047 0.050 0.052 0.056 0.045 0.057 0.048 0.060 0.058 0.065 0.064
1965 0.046 0.044 0.050 0.066 0.068 0.050 0.039 0.043 0.053 0.069 0.067 0.072
1966 0.058 0.048 0.052 0.058 0.045 0.052 0.053 0.054 0.057 0.072 0.077 0.072
1967 0.065 0.054 0.047 0.060 0.059 0.043 0.043 0.044 0.055 0.060 0.076 0.055
1968 0.040 0.037 0.031 0.041 0.059 0.050 0.043 0.043 0.055 0.057 0.058 0.060
1969 0.054 0.047 0.040 0.046 0.060 0.050 0.037 0.044 0.042 0.059 0.056 0.053
1970 0.054 0.034 0.036 0.045 0.052 0.038 0.043 0.045 0.054 0.055 0.058 0.061
1971 0.069 0.045 0.050 0.044 0.047 0.041 0.039 0.050 0.058 0.047 0.064 0.071
1972 0.050 0.050 0.049 0.053 0.059 0.053 0.043 0.043 0.052 0.059 0.067 0.053
1973 0.045 0.041 0.041 0.055 0.057 0.052 0.040 0.049 0.048 0.057 0.054 0.047
1974 0.035 0.048 0.046 0.052 0.054 0.048 0.049 0.049 0.051 0.047 0.050 0.052
1975 0.051 0.048 0.057 0.058 0.058 0.051 0.057 0.047 0.058 0.044 0.057 0.063
1976 0.049 0.038 0.042 0.044 0.040 0.050 0.044 0.052 0.049 0.064 0.061 0.049
1977 0.046 0.040 0.033 0.035 0.042 0.045 0.037 0.035 0.038 0.072 0.062 0.051
1978 0.041 0.034 0.033 0.045 0.050 0.035 0.036 0.032 0.039 0.049 0.075 0.042
1979 0.037 0.038 0.036 0.043 0.043 0.041 0.041 0.040 0.052 0.049 0.061 0.045
1980 0.037 0.036 0.040 0.040 0.055 0.053 0.045 0.052 0.054 0.060 0.051 0.072
1981 0.050 0.047 0.038 0.057 0.059 0.048 0.040 0.039 0.053 0.048 0.065 0.055
1982 0.040 0.036 0.041 0.052 0.053 0.047 0.040 0.046 0.049 0.055 0.072 0.050
1983 0.036 0.029 0.036 0.033 0.042 0.035 0.038 0.043 0.055 0.042 0.049 0.050
1984 0.043 0.053 0.050 0.050 0.055 0.055 0.044 0.041 0.043 0.049 0.065 0.073
1985 0.054 0.052 0.053 0.047 0.058 0.042 0.042 0.040 0.047 0.061 0.075 0.067
1986 0.050 0.043 0.040 0.052 0.045 0.038 0.033 0.035 0.048 0.064 0.063 0.052
1987 0.038 0.037 0.035 0.043 0.044 0.038 0.039 0.053 0.060 0.071 0.060 0.053
1988 0.040 0.049 0.045 0.045 0.055 0.060 0.051 0.051 0.071 0.053 0.057 0.048
1989 0.039 0.031 0.044 0.059 0.059 0.052 0.051 0.041 0.050 0.060 0.070 0.054
1990 0.043 0.032 0.030 0.034 0.055 0.044 0.039 0.033 0.046 0.068 0.066 0.059
1991 0.036 0.034 0.043 0.049 0.059 0.053 0.042 0.041 0.049 0.062 0.063 0.051
1992 0.048 0.045 0.042 0.047 0.054 0.043 0.047 0.044 0.045 0.049 0.060 0.051
1993 0.047 0.050 0.049 0.058 0.061 0.049 0.052 0.044 0.054 0.060 0.061 0.069
1994 0.056 0.055 0.056 0.056 0.057 0.057 0.051 0.057 0.067 0.064 0.063 0.067
1995 0.057 0.053 0.059 0.063 0.064 0.064 0.062 0.068 0.061 0.063 0.069 0.067
1996 0.060 0.058 0.060 0.071 0.067 0.068 0.062 0.064 0.062 0.075 0.069 0.086
1997 0.068 0.073 0.075 0.074 0.070 0.074 0.072 0.067 0.072 0.075 0.083 0.071
1998 0.067 0.067 0.065 0.064 0.065 0.067 0.067 0.074 0.079 0.073 0.083 0.071
1999 0.053 0.054 0.059 0.059 0.063 0.054 0.048 0.052 0.056 0.062 0.059 0.067
2000 0.064 0.059 0.067 0.061 0.053 0.048 0.037 0.053 0.052 0.053 0.065 0.073
2001 0.063 0.053 0.050 0.062 0.057 0.048 0.043 0.040 0.052 0.045 0.056 0.049
2002 0.044 0.038 0.040 0.058 0.061 0.050 0.047 0.048 0.052 0.066 0.069 0.048
2003 0.048 0.047 0.043 0.048 0.053 0.046 0.048 0.046 0.052 0.062 0.072 0.054
2004 0.050 0.046 0.046 0.052 0.058 0.037 0.045 0.038 0.059 0.065 0.076 0.050
2005 0.036 0.030 0.038 0.038 0.045 0.044 0.039 0.043 0.034 0.065 0.068 0.071
Mean 0.050 0.045 0.047 0.052 0.055 0.049 0.046 0.047 0.054 0.060 0.065 0.060


APPENDIX C

Generation of Random Numbers for Years 2006-2010

i  RAND()  $z$  $\mathrm{erf}^{-1}(z)$  $t_{i,j}$
Jan-06 0.699645 0.399289 0.370085 1.523379
Feb-06 0.45481 -0.090379 -0.08027 0.886483
Mar-06 0.063732 -0.872536 -1.0558 -0.49313
Apr-06 0.224711 -0.550577 -0.53482 0.243657
May-06 0.236038 -0.527923 -0.50847 0.280915
Jun-06 0.471912 -0.056176 -0.04983 0.929536
Jul-06 0.999341 0.998683 1.443813 3.041859
Aug-06 0.533139 0.066278 0.058805 1.083163
Sep-06 0.095672 -0.808656 -0.91763 -0.29772
Oct-06 0.044674 -0.910651 -1.15355 -0.63136
Nov-06 0.997494 0.994989 1.429319 3.021363
Dec-06 0.407816 -0.184368 -0.16487 0.766834
Jan-07 0.656401 0.312802 0.284724 1.402661
Feb-07 0.32176 -0.35648 -0.32724 0.537217
Mar-07 0.733219 0.466438 0.440226 1.622573
Apr-07 0.724521 0.449041 0.421663 1.596322
May-07 0.401592 -0.196816 -0.17623 0.750771
Jun-07 0.010641 -0.978717 -1.36824 -0.93498
Jul-07 0.096817 -0.806366 -0.91316 -0.2914
Aug-07 0.516508 0.033016 0.029268 1.041391
Sep-07 0.053638 -0.892724 -1.1059 -0.56398
Oct-07 0.222905 -0.554191 -0.53909 0.237618
Nov-07 0.612597 0.225195 0.2023 1.286095
Dec-07 0.663435 0.32687 0.298297 1.421856
Jan-08 0.143889 -0.712222 -0.75074 -0.0617
Feb-08 0.070315 -0.85937 -1.02497 -0.44952
Mar-08 0.523247 0.046495 0.041228 1.058306
Apr-08 0.919276 0.838551 0.978848 2.384299
May-08 0.705168 0.410335 0.381358 1.539321
Jun-08 0.237308 -0.525384 -0.50556 0.28503
Jul-08 0.877403 0.754806 0.819547 2.159015
Aug-08 0.425101 -0.149797 -0.13354 0.81114
Sep-08 0.402188 -0.195624 -0.17514 0.752312
Oct-08 0.338947 -0.322107 -0.29369 0.584661
Nov-08 0.687608 0.375216 0.345832 1.489081
Dec-08 0.014286 -0.971427 -1.34224 -0.89822
Jan-09 0.684203 0.368406 0.339046 1.479484
Feb-09 0.305343 -0.389314 -0.35998 0.490905
Mar-09 0.627906 0.255813 0.230738 1.326313
Apr-09 0.641724 0.283447 0.256729 1.36307
May-09 0.751243 0.502486 0.479699 1.678397
Jun-09 0.729118 0.458237 0.431438 1.610146
Jul-09 0.289185 -0.421629 -0.39299 0.444235
Aug-09 0.954236 0.908473 1.147587 2.622933
Sep-09 0.428914 -0.142173 -0.12667 0.820859
Oct-09 0.264273 -0.471453 -0.44563 0.369778
Nov-09 0.687481 0.374963 0.34558 1.488724
Dec-09 0.765445 0.530889 0.511878 1.723905
Jan-10 0.846072 0.692144 0.720449 2.018868
Feb-10 0.27472 -0.45056 -0.42327 0.401403
Mar-10 0.555255 0.110509 0.098252 1.138949
Apr-10 0.800866 0.601733 0.597223 1.8446
May-10 0.779092 0.558183 0.543827 1.769087
Jun-10 0.847218 0.694435 0.723842 2.023667
Jul-10 0.420992 -0.158017 -0.14097 0.800643
Aug-10 0.996074 0.992148 1.418338 3.005833
Sep-10 0.600695 0.20139 0.180416 1.255146
Oct-10 0.32158 -0.35684 -0.32759 0.536714
Nov-10 0.630127 0.260254 0.234893 1.332189
Dec-10 0.323203 -0.353593 -0.32439 0.541241
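The columns of this appendix appear to follow a fixed chain of transformations, which the Python sketch below tries to reproduce. The chain is inferred from the tabulated values and is not stated in the appendix itself: z = 2·RAND() − 1, an inverse error function evaluated with its Maclaurin series truncated after the z^11 term, and t = 1 + √2·erf⁻¹(z). The function names `erfinv_series` and `random_deviate` are hypothetical.

```python
import math

# Maclaurin coefficients of erf^-1(z) for the powers z, z^3, ..., z^11.
# Assumption: the table was produced with this six-term truncation.
_COEFFS = (
    1.0,
    math.pi / 12.0,
    7.0 * math.pi ** 2 / 480.0,
    127.0 * math.pi ** 3 / 40320.0,
    4369.0 * math.pi ** 4 / 5806080.0,
    34807.0 * math.pi ** 5 / 182476800.0,
)


def erfinv_series(z):
    """Approximate erf^-1(z) with the truncated Maclaurin series."""
    total = sum(c * z ** (2 * k + 1) for k, c in enumerate(_COEFFS))
    return 0.5 * math.sqrt(math.pi) * total


def random_deviate(u):
    """Map a uniform number u in (0, 1) to the columns (z, erf^-1(z), t)."""
    z = 2.0 * u - 1.0                 # rescale from (0, 1) to (-1, 1)
    e = erfinv_series(z)              # approximate inverse error function
    t = 1.0 + math.sqrt(2.0) * e      # assumed relation, inferred from the table
    return z, e, t


if __name__ == "__main__":
    print(random_deviate(0.699645))   # compare with the Jan-06 row
```

Under these assumptions, `random_deviate(0.699645)` gives values close to the Jan-06 row. Because the series is truncated, the result for |z| near 1 (for example Jul-06) is smaller in magnitude than an exact inverse error function would return.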


APPENDIX D

Markov Model Streamflow

Columns, in order:
Month, i
Deterministic Component: $q_{i-1,j-1}$
Deterministic Component: $\bar{q}_j + b_j (q_{i-1,j-1} - \bar{q}_{j-1})$
Random Component: $t_{i,j}$
Random Component: $S_j t_{i,j} \sqrt{1 - r_j^2}$
Model Flow: $q_{i,j}$ (Log)

Jan-06 0.050 0.049541669 1.523379 0.013 0.063
Feb-06 0.063 0.045386033 0.886483 0.007 0.053
Mar-06 0.053 0.04653475 -0.49313 -0.004 0.043
Apr-06 0.043 0.051865643 0.243657 0.002 0.054
May-06 0.054 0.054889433 0.280915 0.002 0.057
Jun-06 0.057 0.048803168 0.929536 0.007 0.055
Jul-06 0.055 0.046082272 3.041859 0.022 0.068
Aug-06 0.068 0.04726108 1.083163 0.008 0.055
Sep-06 0.055 0.053859993 -0.29772 -0.002 0.052
Oct-06 0.052 0.059642058 -0.63136 -0.005 0.055
Nov-06 0.055 0.065034911 3.021363 0.024 0.089
Dec-06 0.089 0.059661808 0.766834 0.007 0.067
Jan-07 0.067 0.049559131 1.402661 0.012 0.062
Feb-07 0.062 0.045384746 0.537217 0.004 0.050
Mar-07 0.050 0.046529892 1.622573 0.013 0.060
Apr-07 0.060 0.051883571 1.596322 0.014 0.066
May-07 0.066 0.054896185 0.750771 0.005 0.060
Jun-07 0.060 0.04880782 -0.93498 -0.007 0.042
Jul-07 0.042 0.046063986 -0.2914 -0.002 0.044
Aug-07 0.044 0.047223647 1.041391 0.007 0.055
Sep-07 0.055 0.053859658 -0.56398 -0.004 0.049
Oct-07 0.049 0.059640286 0.237618 0.002 0.062
Nov-07 0.062 0.065039039 1.286095 0.010 0.075
Dec-07 0.075 0.059650993 1.421856 0.013 0.073
Jan-08 0.073 0.049565308 -0.0617 -0.001 0.049
Feb-08 0.049 0.04536888 -0.44952 -0.004 0.042
Mar-08 0.042 0.046516145 1.058306 0.009 0.055
Apr-08 0.055 0.051878774 2.384299 0.021 0.073
May-08 0.073 0.05490011 1.539321 0.011 0.065
Jun-08 0.065 0.048815617 0.28503 0.002 0.051
Jul-08 0.051 0.046075966 2.159015 0.015 0.062
Aug-08 0.062 0.047251163 0.81114 0.006 0.053
Sep-08 0.053 0.053858046 0.752312 0.006 0.060
Oct-08 0.060 0.059649041 0.584661 0.005 0.064
Nov-08 0.064 0.065040692 1.489081 0.012 0.077
Dec-08 0.077 0.059652259 -0.89822 -0.008 0.051
Jan-09 0.051 0.049543394 1.479484 0.013 0.062
Feb-09 0.062 0.045385559 0.490905 0.004 0.049
Mar-09 0.049 0.046529249 1.326313 0.011 0.057
Apr-09 0.057 0.051881059 1.36307 0.012 0.064
May-09 0.064 0.054895021 1.678397 0.012 0.066
Jun-09 0.066 0.048816984 1.610146 0.012 0.060
Jul-09 0.060 0.046088968 0.444235 0.003 0.049
Aug-09 0.049 0.047231941 2.622933 0.019 0.066
Sep-09 0.066 0.053870927 0.820859 0.006 0.060
Oct-09 0.060 0.059649508 0.369778 0.003 0.063
Nov-09 0.063 0.065039672 1.488724 0.012 0.077
Dec-09 0.077 0.059652256 1.723905 0.016 0.076
Jan-10 0.076 0.049568162 2.018868 0.017 0.067
Feb-10 0.067 0.045391438 0.401403 0.003 0.049
Mar-10 0.049 0.046528015 1.138949 0.009 0.056
Apr-10 0.056 0.05187947 1.8446 0.016 0.068
May-10 0.068 0.05489742 1.769087 0.012 0.067
Jun-10 0.067 0.048817884 2.023667 0.014 0.063
Jul-10 0.063 0.046093027 0.800643 0.006 0.052
Aug-10 0.052 0.047235947 3.005833 0.021 0.069
Sep-10 0.069 0.053873657 1.255146 0.010 0.064
Oct-10 0.064 0.0596524 0.536714 0.004 0.064
Nov-10 0.064 0.065040467 1.332189 0.011 0.076
Dec-10 0.076 0.059651281 0.541241 0.005 0.065
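Each row of this appendix adds a deterministic term and a random term to obtain the model flow in log space. The Python sketch below shows one such step, reading the column headings as a monthly mean plus a regression on the previous month's departure plus a scaled random deviate. The function name `markov_step` is hypothetical, and the values used for `b_j`, `s_j` and `r_j` are placeholders chosen only for illustration, since the fitted parameters are not listed in this appendix.

```python
import math


def markov_step(q_prev, mean_prev, mean_j, b_j, s_j, r_j, t):
    """One step of the lag-one Markov recursion, in log space.

    q_prev    : previous month's flow, q_{i-1,j-1}
    mean_prev : mean of the previous calendar month
    mean_j    : mean of the current calendar month
    b_j       : regression coefficient between months j-1 and j
    s_j       : standard deviation of month j
    r_j       : correlation between months j-1 and j
    t         : random deviate t_{i,j}
    """
    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random_part = t * s_j * math.sqrt(1.0 - r_j ** 2)
    return deterministic, random_part, deterministic + random_part


if __name__ == "__main__":
    # Monthly means 0.050 (Jan) and 0.045 (Feb) are the rounded means of the
    # logarithmic data; b_j, s_j and r_j below are hypothetical placeholders.
    det, rnd, q = markov_step(
        q_prev=0.063, mean_prev=0.050, mean_j=0.045,
        b_j=0.03, s_j=0.009, r_j=0.3, t=0.886483,
    )
    print(det, rnd, q)

    # Jan-06 row of the table: the two tabulated components sum to the
    # tabulated model flow after rounding to three decimals.
    print(round(0.049541669 + 0.013, 3))
```

With placeholder parameters the output will not match the table exactly; the second print only checks that the tabulated components of the Jan-06 row add up to the tabulated model flow.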


APPENDIX E

Performance Evaluation Procedure of Markov Model

i  Actual Flow (m³/s)  Model Flow (m³/s)  MAPE  RMSE  Chi-square Test
Jan-06 13.08 13.533 3.462 0.205001 0.015148
Feb-06 8.12 9.077 11.786 0.915831 0.100896
Mar-06 6.11 5.641 7.681 0.220272 0.039051
Apr-06 29.72 9.604 67.685 404.6583 42.13488
May-06 29.22 10.807 63.015 339.0362 31.37171
Jun-06 17.82 10.210 42.706 57.91601 5.672621
Jul-06 7.94 16.422 106.822 71.93914 4.380738
Aug-06 9.95 10.014 0.644 0.00411 0.00041
Sep-06 28.05 8.642 69.192 376.688 43.59034
Oct-06 17.63 9.821 44.298 61.00476 6.211446
Nov-06 17.72 32.326 82.432 213.3443 6.599874
Dec-06 11.23 15.849 41.109 21.31845 1.345112
Jan-07 9.05 13.020 43.872 15.7641 1.210723
Feb-07 6.80 7.992 17.535 1.421735 0.177887
Mar-07 7.62 12.065 58.336 19.76006 1.637769
Apr-07 13.46 15.262 13.384 3.245474 0.212657
May-07 12.05 12.299 2.069 0.06214 0.005052
Jun-07 11.38 5.514 51.550 34.41474 6.241801
Jul-07 13.06 6.059 53.607 49.01556 8.089859
Aug-07 8.95 9.874 10.339 0.855987 0.086689
Sep-07 9.36 7.877 15.846 2.199906 0.27929
Oct-07 14.33 13.038 9.013 1.668293 0.127953
Nov-07 14.26 21.161 48.391 47.61862 2.250341
Dec-07 8.24 19.599 137.858 129.0379 6.58374
Jan-08 11.29 7.719 31.625 12.74858 1.651481
Feb-08 6.76 5.384 20.351 1.89258 0.3515
Mar-08 9.58 10.032 4.721 0.20453 0.020387
Apr-08 12.86 19.404 50.886 42.82308 2.206928
May-08 9.73 15.098 55.171 28.81687 1.908638
Jun-08 12.28 8.379 31.768 15.21902 1.816363
Jul-08 10.89 13.007 19.439 4.481376 0.344538
Aug-08 7.83 9.219 17.734 1.928084 0.209153
Sep-08 9.85 12.127 23.119 5.185165 0.427585
Oct-08 13.14 14.502 10.398 1.865887 0.12866
Nov-08 16.74 22.302 33.224 30.93221 1.386991
Dec-08 10.96 8.531 22.161 5.899487 0.691526
Jan-09 9.73 13.343 37.128 13.05078 0.97813
Feb-09 9.67 7.856 18.764 3.292187 0.41909
Mar-09 15.10 10.970 27.352 17.05788 1.554974
Apr-09 13.72 14.160 3.205 0.193326 0.013653
May-09 8.75 15.629 78.619 47.32323 3.027875
Jun-09 7.31 12.423 69.942 26.14056 2.104243
Jul-09 8.05 7.800 3.111 0.062732 0.008043
Aug-09 9.03 15.337 69.845 39.77884 2.593644
Sep-09 10.08 12.388 22.897 5.327024 0.430014
Oct-09 7.99 13.587 70.046 31.32252 2.305389
Nov-09 12.73 22.299 75.168 91.56381 4.106203
Dec-09 6.88 21.522 212.820 214.3895 9.961389
Jan-10 6.83 15.834 131.827 81.06853 5.119965
Feb-10 4.86 7.597 56.317 7.491085 0.98606
Mar-10 4.36 10.312 136.513 35.42605 3.435427
Apr-10 7.18 16.493 129.701 86.72375 5.258356
May-10 6.17 15.985 159.084 96.34386 6.026957
Jun-10 7.51 13.908 85.191 40.93266 2.943131
Jul-10 7.45 8.744 17.405 1.680274 0.192169
Aug-10 8.04 16.912 110.346 78.70912 4.65409
Sep-10 7.16 14.091 96.798 48.03522 3.408991
Oct-10 6.30 14.296 126.927 63.94217 4.472611
Nov-10 9.56 21.417 124.026 140.5846 6.564209
Dec-10 11.01 14.672 33.261 13.41047 0.914016

Overall: MAPE 53.659, RMSE 7.29, Chi-square 250.9884
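The per-month columns and the summary line of this appendix can be reproduced with a few lines of Python. The sketch below assumes, based on the tabulated numbers, that the MAPE column holds the absolute percentage error of each month, the RMSE column holds the squared error, and the chi-square column holds the squared error divided by the model flow. The summary values are then the mean of the first column, the square root of the mean of the second, and the sum of the third. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(actual, model):
    """Return per-month terms and overall MAPE, RMSE and chi-square."""
    # Absolute percentage error of each month, in percent.
    ape = [abs(a - m) / a * 100.0 for a, m in zip(actual, model)]
    # Squared error of each month.
    sq = [(a - m) ** 2 for a, m in zip(actual, model)]
    # Chi-square contribution: squared error over the model (expected) flow.
    chi = [s / m for s, m in zip(sq, model)]
    n = len(actual)
    return {
        "rows": list(zip(ape, sq, chi)),
        "mape": sum(ape) / n,
        "rmse": math.sqrt(sum(sq) / n),
        "chi_square": sum(chi),
    }


if __name__ == "__main__":
    # First three months of 2006 from the table above.
    result = evaluate([13.08, 8.12, 6.11], [13.533, 9.077, 5.641])
    for row in result["rows"]:
        print(row)
```

Small differences from the tabulated per-month values are expected, because the model flows printed in the table are rounded to three decimals.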


APPENDIX F

ARIMA Model Streamflow

i  Actual Flow (m³/s)  Model Flow (m³/s)  Residual  Fit  Coefficient
Jan-06 13.08 9.6732 * * 0.289364
Feb-06 8.12 7.1884 * * 0.878761
Mar-06 6.11 7.2612 * * 0.955283
Apr-06 29.72 9.0165 * *
May-06 29.22 9.9281 * *
Jun-06 17.82 7.6110 * *
Jul-06 7.94 6.7046 * *
Aug-06 9.95 7.0851 * *
Sep-06 28.05 9.5168 * *
Oct-06 17.63 12.2889 * *
Nov-06 17.72 15.2005 * *
Dec-06 11.23 12.3581 * *
Jan-07 9.05 7.9227 * *
Feb-07 6.80 6.6970 -1.57988 7.5299
Mar-07 7.62 7.1341 -1.39072 7.6507
Apr-07 13.46 8.9949 -1.05700 9.4570
May-07 12.05 9.9369 -0.14946 10.2195
Jun-07 11.38 7.6286 1.10867 7.3913
Jul-07 13.06 6.7248 -1.04180 7.8818
Aug-07 8.95 7.1060 1.04920 7.2208
Sep-07 9.36 9.5379 0.26505 11.0050
Oct-07 14.33 12.3101 -2.99026 13.4603
Nov-07 14.26 15.2217 -3.58500 15.6250
Dec-07 8.24 12.3794 4.03841 11.4816
Jan-08 11.29 7.9439 1.99786 9.9828
Feb-08 6.76 6.7182 -1.67458 8.3305
Mar-08 9.58 7.1553 2.57792 7.7005
Apr-08 12.86 9.0161 0.10621 10.9538
May-08 9.73 9.9581 -1.42906 11.4991
Jun-08 12.28 7.6499 -1.18154 7.8015
Jul-08 10.89 6.7460 -1.02019 7.3802
Aug-08 7.83 7.1273 0.57523 7.2148
Sep-08 9.85 9.5592 -0.37209 10.9121
Oct-08 13.14 12.3314 3.89633 13.0737
Nov-08 16.74 15.2429 3.01737 18.1226
Dec-08 10.96 12.4006 -4.56349 15.8535
Jan-09 9.73 7.9651 -1.32522 9.3852
Feb-09 9.67 6.7394 -0.91615 7.2661
Mar-09 15.10 7.1765 -1.58516 7.9552
Apr-09 13.72 9.0373 -3.54478 9.5648
May-09 8.75 9.9794 -3.07188 9.2719
Jun-09 7.31 7.6711 1.04294 5.7171
Jul-09 8.05 6.7673 -0.27357 6.7266
Aug-09 9.03 7.1485 2.58702 6.7039
Sep-09 10.08 9.5804 1.07675 11.0133
Oct-09 7.99 12.3526 4.26644 13.5536
Nov-09 12.73 15.2642 5.43986 18.4266
Dec-09 6.88 12.4218 -1.30056 16.6716
Jan-10 6.83 7.9864 -0.79690 11.1109
Feb-10 4.86 6.7607 -1.45611 8.5383
Mar-10 4.36 7.1978 -0.78662 8.6866
Apr-10 7.18 9.0586 -1.79769 10.5277
May-10 6.17 10.0006 -0.43999 10.7900
Jun-10 7.51 7.6923 -1.69829 8.1383
Jul-10 7.45 6.7885 3.62122 7.4688
Aug-10 8.04 7.1698 -1.95910 9.4791
Sep-10 7.16 9.6017 1.06042 11.3296
Oct-10 6.30 12.3739 -3.37562 14.6156
Nov-10 9.56 15.2854 -2.06698 16.6470
Dec-10 11.01 12.4431 1.18330 12.9267


APPENDIX G

Performance Evaluation Procedure of ARIMA Model

i  Actual Flow (m³/s)  Model Flow (m³/s)  MAPE  RMSE  Chi-square Test
Jan-06 13.08 9.6732 26.046 11.606 1.200
Feb-06 8.12 7.1884 11.473 0.868 0.121
Mar-06 6.11 7.2612 18.841 1.325 0.183
Apr-06 29.72 9.0165 69.662 428.633 47.538
May-06 29.22 9.9281 66.023 372.178 37.487
Jun-06 17.82 7.6110 57.290 104.224 13.694
Jul-06 7.94 6.7046 15.559 1.526 0.228
Aug-06 9.95 7.0851 28.793 8.208 1.158
Sep-06 28.05 9.5168 66.072 343.480 36.092
Oct-06 17.63 12.2889 30.303 28.547 2.323
Nov-06 17.72 15.2005 14.215 6.344 0.417
Dec-06 11.23 12.3581 10.029 1.269 0.103
Jan-07 9.05 7.9227 12.457 1.271 0.160
Feb-07 6.80 6.6970 1.515 0.011 0.002
Mar-07 7.62 7.1341 6.377 0.236 0.033
Apr-07 13.46 8.9949 33.173 19.937 2.217
May-07 12.05 9.9369 17.536 4.465 0.449
Jun-07 11.38 7.6286 32.965 14.073 1.845
Jul-07 13.06 6.7248 48.508 40.135 5.968
Aug-07 8.95 7.1060 20.594 3.397 0.478
Sep-07 9.36 9.5379 1.901 0.032 0.003
Oct-07 14.33 12.3101 14.095 4.080 0.331
Nov-07 14.26 15.2217 6.744 0.925 0.061
Dec-07 8.24 12.3794 50.235 17.134 1.384
Jan-08 11.29 7.9439 29.638 11.196 1.409
Feb-08 6.76 6.7182 0.618 0.002 0.000
Mar-08 9.58 7.1553 25.310 5.879 0.822
Apr-08 12.86 9.0161 29.890 14.776 1.639
May-08 9.73 9.9581 2.345 0.052 0.005
Jun-08 12.28 7.6499 37.705 21.438 2.802
Jul-08 10.89 6.7460 38.053 17.172 2.546
Aug-08 7.83 7.1273 8.975 0.494 0.069
Sep-08 9.85 9.5592 2.948 0.084 0.009
Oct-08 13.14 12.3314 6.129 0.648 0.053
Nov-08 16.74 15.2429 8.943 2.241 0.147
Dec-08 10.96 12.4006 13.144 2.075 0.167
Jan-09 9.73 7.9651 18.138 3.115 0.391
Feb-09 9.67 6.7394 30.306 8.588 1.274
Mar-09 15.10 7.1765 52.473 62.781 8.748
Apr-09 13.72 9.0373 34.130 21.927 2.426
May-09 8.75 9.9794 14.050 1.511 0.151
Jun-09 7.31 7.6711 4.940 0.130 0.017
Jul-09 8.05 6.7673 15.934 1.645 0.243
Aug-09 9.03 7.1485 20.836 3.540 0.495
Sep-09 10.08 9.5804 4.956 0.250 0.026
Oct-09 7.99 12.3526 54.601 19.032 1.541
Nov-09 12.73 15.2642 19.907 6.422 0.421
Dec-09 6.88 12.4218 80.550 30.712 2.472
Jan-10 6.83 7.9864 16.931 1.337 0.167
Feb-10 4.86 6.7607 39.109 3.613 0.534
Mar-10 4.36 7.1978 65.087 8.053 1.119
Apr-10 7.18 9.0586 26.164 3.529 0.390
May-10 6.17 10.0006 62.085 14.674 1.467
Jun-10 7.51 7.6923 2.428 0.033 0.004
Jul-10 7.45 6.7885 8.848 0.434 0.064
Aug-10 8.04 7.1698 10.824 0.757 0.106
Sep-10 7.16 9.6017 34.102 5.962 0.621
Oct-10 6.30 12.3739 96.411 36.892 2.981
Nov-10 9.56 15.2854 59.889 32.780 2.145
Dec-10 11.01 12.4431 13.016 2.054 0.165

Overall: MAPE 27.497, RMSE 5.416, Chi-square 191.114