Lump Semi Select

Bangladesh Short term Discharge Forecasting time series forecasting Tom Hopson A project supported by USAID


Page 1: Lump Semi Select

Bangladesh Short term Discharge Forecasting

time series forecasting

Tom Hopson

A project supported by USAID

Page 2: Lump Semi Select

Forecasting Probabilities

[Figure: two probability plots. Panel "Rainfall Probability": probability vs. Rainfall [mm]. Panel "Discharge Probability": probability vs. Discharge [m^3/s].]

Above-danger-level probability: 36%. Greater than climatological seasonal risk?

Page 3: Lump Semi Select
Page 4: Lump Semi Select

Data-Based Modeling

Linear Transfer Function Approach

Linear store: $Q = S/T$

Mass balance: $dS/dt = u - Q$

Combine to get: $T \, dQ/dt = u - Q$
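The step from this continuous store equation to a discrete-time recursion can be sketched for a single store. The sketch assumes an explicit (forward) finite difference with time step $\Delta t$, which the slides do not specify:

```latex
\begin{align*}
S = TQ \;&\Longrightarrow\; \frac{dS}{dt} = T\,\frac{dQ}{dt} = u - Q \\
T\,\frac{Q_t - Q_{t-1}}{\Delta t} &\approx u_{t-1} - Q_{t-1} \\
Q_t &\approx \frac{\Delta t}{T}\,u_{t-1} + \left(1 - \frac{\Delta t}{T}\right) Q_{t-1}
\end{align*}
```

Under this assumption a single store gives one rainfall coefficient, $\Delta t / T$, and one discharge coefficient, $1 - \Delta t / T$.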

For a catchment composed of linear stores in series and in parallel (using finite differences):

$Q_t = a_1 u_{t-1} + a_2 u_{t-2} + \dots + a_m u_{t-m} + b_1 Q_{t-1} + b_2 Q_{t-2} + \dots + b_n Q_{t-n}$

where $u$ is the effective catchment-averaged rainfall, derived from a non-linear rainfall filter, $u_t = (Q_t)^c R_t$

Reference: Beven, 2000

(=> Used for the lumped model)
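As a rough illustration of the transfer-function idea, the R sketch below simulates a single linear store with a one-day finite-difference step and recovers its two coefficients by least squares. The time constant `Tc`, the synthetic rainfall `u`, and the noise level are assumptions made for the example, not values from the project, and the non-linear rainfall filter is skipped (`u` is taken as given).

```r
set.seed(1)
n  <- 500
Tc <- 5                                    # assumed store time constant [days]
u  <- rgamma(n, shape = 0.5, scale = 10)   # synthetic effective rainfall

# Single store, dt = 1 day: Q_t = a1 * u_{t-1} + b1 * Q_{t-1}
a1 <- 1 / Tc
b1 <- 1 - 1 / Tc
Q  <- numeric(n)
Q[1] <- mean(u)
for (t in 2:n) Q[t] <- a1 * u[t - 1] + b1 * Q[t - 1]
Q_obs <- Q + rnorm(n, sd = 0.05)           # small observation noise

# Regress Q_t on lagged rainfall and lagged discharge (no intercept)
d   <- data.frame(Q = Q_obs[2:n], u1 = u[1:(n - 1)], Q1 = Q_obs[1:(n - 1)])
fit <- lm(Q ~ 0 + u1 + Q1, data = d)
print(coef(fit))                           # estimates of a1 and b1
```

With more stores in series or parallel, the same regression simply gains more lagged `u` and `Q` columns.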

Page 5: Lump Semi Select

Linear Transfer Function Approach (cont)

or for a 3-day forecast, say:

$Q_{t+3} = a_1 u_{t+2} + a_2 u_{t+1} + \dots + a_m u_{t-m} + b_1 Q_{t-1} + b_2 Q_{t-2} + \dots + b_n Q_{t-n}$

Our approach: for each day and forecast, use the AIC (Akaike information criterion) to optimize a’s, m, b’s, n, c, and precip smoothing

Residuals (model biases) are then corrected using an ARMA (auto-regressive moving average) model

=> Something available in R
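A minimal sketch of this two-stage idea in R, on synthetic data: choose the numbers of rainfall lags (`m`) and discharge lags (`n_ar`) by minimum AIC, then fit an ARMA(1,1) model to the residuals. The data, the lag limit `max_lag`, the helper `lagmat`, and the ARMA order are all assumptions for illustration. The example is one day ahead and does not optimize `c` or the precipitation smoothing.

```r
set.seed(2)
n <- 400
u <- rgamma(n, shape = 0.5, scale = 10)          # synthetic effective rainfall
e <- arima.sim(list(ar = 0.6), n = n, sd = 0.3)  # autocorrelated model error
Q <- numeric(n)
for (t in 3:n) Q[t] <- 0.15 * u[t - 1] + 0.05 * u[t - 2] + 0.7 * Q[t - 1]
Q <- Q + as.numeric(e)

max_lag <- 4
# Column j holds the series lagged by j days, aligned with y below
lagmat <- function(x, k) sapply(1:k, function(j) x[(max_lag + 1 - j):(n - j)])
y  <- Q[(max_lag + 1):n]
U  <- lagmat(u, max_lag)
QL <- lagmat(Q, max_lag)

# Fit every (m, n_ar) combination on the same rows and record its AIC
grid <- expand.grid(m = 1:max_lag, n_ar = 1:max_lag)
grid$aic <- NA
for (i in seq_len(nrow(grid))) {
  X <- cbind(U[, 1:grid$m[i], drop = FALSE], QL[, 1:grid$n_ar[i], drop = FALSE])
  grid$aic[i] <- AIC(lm(y ~ 0 + X))
}
best <- grid[which.min(grid$aic), ]
print(best)

# Refit the selected model, then model its residuals with an ARMA(1,1)
Xb  <- cbind(U[, 1:best$m, drop = FALSE], QL[, 1:best$n_ar, drop = FALSE])
fit <- lm(y ~ 0 + Xb)
res_fit  <- arima(residuals(fit), order = c(1, 0, 1),
                  include.mean = FALSE, method = "ML")
res_next <- as.numeric(predict(res_fit, n.ahead = 1)$pred)  # residual correction
```

The predicted residual `res_next` would be added to the transfer-function forecast for the next day.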

Page 6: Lump Semi Select

Autoregressive integrated moving average (ARIMA)

In time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalisation of an autoregressive moving average (ARMA) model. These models are fitted to time series data either to better understand the data or to predict future points in the series. They are applied in some cases where the data show evidence of non-stationarity, where an initial differencing step (corresponding to the "integrated" part of the model) can be applied to remove the non-stationarity.

The model is generally referred to as an ARIMA(p,d,q) model where p, d, and q are integers greater than or equal to zero and refer to the order of the autoregressive, integrated, and moving average parts of the model respectively. ARIMA models form an important part of the Box-Jenkins approach to time-series modelling.

Reference: Chatfield, 1996. The Analysis of Time Series: An Introduction. Texts in Statistical Science.
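To make the (p,d,q) notation concrete, the sketch below simulates a non-stationary series whose first differences follow an AR(1) process, then fits an ARIMA(1,1,0) model with R's `arima()`. The series length and the AR coefficient of 0.5 are arbitrary choices for the example.

```r
set.seed(3)
# Simulate an ARIMA(1,1,0) series: AR(1) increments, integrated once
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300)

# order = c(p, d, q); d = 1 asks arima() to difference the series once
fit <- arima(x, order = c(1, 1, 0))
print(fit)

# Forecast the next 5 points, with standard errors
fc <- predict(fit, n.ahead = 5)
print(fc$pred)
print(fc$se)
```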

Page 7: Lump Semi Select

Semi-distributed Model --

2-layer model for soil moisture states S1, S2

- Parameters to be estimated from FAO soil map of the world
- Solved with a 6 hr time-step (for daily 0Z discharge) using 4th-order Runge-Kutta semi-implicit scheme
- t_s1, tp, t_s2 time constants; r_s1, r_s2 reservoir depths
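The slides do not give the state equations, so the sketch below only illustrates the time-stepping idea. It assumes two linear stores in series (a hypothetical form, with `tp` treated as a percolation time constant and the reservoir depths ignored) and uses the classical explicit 4th-order Runge-Kutta step with a 6 hr time-step, not the semi-implicit variant named above. All parameter values are made up.

```r
# Hypothetical two-store dynamics (an assumption, not taken from the slides):
#   dS1/dt = P - S1/t_s1 - S1/tp
#   dS2/dt = S1/tp - S2/t_s2
deriv <- function(S, P, prm) {
  c(P - S[1] / prm$t_s1 - S[1] / prm$tp,
    S[1] / prm$tp - S[2] / prm$t_s2)
}

# One classical (explicit) RK4 step of size h
rk4_step <- function(S, P, prm, h) {
  k1 <- deriv(S, P, prm)
  k2 <- deriv(S + h / 2 * k1, P, prm)
  k3 <- deriv(S + h / 2 * k2, P, prm)
  k4 <- deriv(S + h * k3, P, prm)
  S + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
}

prm <- list(t_s1 = 2, tp = 5, t_s2 = 30)  # time constants [days], assumed
h   <- 0.25                               # 6 hr time-step, in days
S   <- c(0, 0)                            # initial storages S1, S2 [mm]
P   <- 10                                 # constant rainfall input [mm/day]

for (i in 1:(400 / h)) S <- rk4_step(S, P, prm, h)

# Outflow from both stores; at steady state it should balance the input P
Q <- S[1] / prm$t_s1 + S[2] / prm$t_s2
print(c(S1 = S[1], S2 = S[2], Q = Q))
```

Running to steady state under constant input is a quick mass-balance check on the integrator.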


Page 8: Lump Semi Select

Model selection -- Akaike information criterion

Akaike's information criterion (AIC), developed by Hirotsugu Akaike under the name "an information criterion" in 1971 and proposed in Akaike (1974), is a measure of the goodness of fit of an estimated statistical model. It is grounded in the concept of entropy and in effect offers a relative measure of the information lost when a given model is used to describe reality. It can be said to describe the tradeoff between bias and variance in model construction or, loosely speaking, between the precision and the complexity of the model.

The AIC is not a test of the model in the sense of hypothesis testing; rather, it is a tool for model selection. Given a data set, several competing models may be ranked according to their AIC, with the one having the lowest AIC being the best. From the AIC values one may infer that, e.g., the top three models are in a tie and the rest are far worse, but one should not assign a value above which a given model is 'rejected'.
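A small sketch of ranking by AIC in R: four polynomial regressions are fitted to the same synthetic data and sorted by their AIC difference from the best model. The data-generating curve and the candidate models are assumptions made for the example.

```r
set.seed(4)
x <- runif(100, -2, 2)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(100, sd = 0.5)  # true curve is quadratic

fits <- list(
  linear    = lm(y ~ x),
  quadratic = lm(y ~ poly(x, 2)),
  cubic     = lm(y ~ poly(x, 3)),
  quintic   = lm(y ~ poly(x, 5))
)

aic   <- sapply(fits, AIC)
delta <- sort(aic - min(aic))   # 0 for the best model; larger means worse
print(round(delta, 2))
```

Only the differences matter: models within a few AIC units of the best are effectively tied, while a model that is far behind can be set aside without any fixed rejection threshold.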

Page 9: Lump Semi Select

Model selection --

Akaike information criterion

AIC = 2k – 2 ln(L)

k = # of model parameters; L = maximized value of the likelihood function (for Gaussian errors, -2 ln(L) = n ln(RSS/n) + constant, where RSS is the sum of squared errors)

Bayesian information criterion

BIC = ln(n) k - 2 ln(L)

n = # of data points

The BIC penalty, ln(n) k, is more demanding than the AIC penalty, 2k, whenever ln(n) > 2, i.e. for n of 8 or more
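The two formulas can be checked by hand against R's built-in `AIC()` and `BIC()`. The sketch assumes a linear regression with Gaussian errors, so that the maximized log-likelihood can be written in terms of the sum of squared errors. The data are synthetic.

```r
set.seed(5)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)
k   <- length(coef(fit)) + 1    # intercept, slope, and the error variance

# Maximized Gaussian log-likelihood expressed through the RSS
logL <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)

aic_manual <- 2 * k - 2 * logL
bic_manual <- log(n) * k - 2 * logL

print(c(manual = aic_manual, builtin = AIC(fit)))
print(c(manual = bic_manual, builtin = BIC(fit)))
```

Note that R counts the error variance as a parameter, so `k` is one more than the number of regression coefficients.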

Page 10: Lump Semi Select

Model selection --

Cross-validation

-- The most robust (secure) approach, but also the most computationally demanding!

-- Set aside part of the data for testing and 'train' on the other part; it is best to cycle through the parts so that all of the data are used for testing.

e.g. if the data are divided in halves (the minimum), then 2X the computations are required!
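A minimal k-fold cross-validation sketch in R, using two folds to match the "halves" case. The helper `cv_mse`, the synthetic data, and the two candidate regressions are assumptions made for the example.

```r
set.seed(6)
n <- 200
d <- data.frame(p = rnorm(n), r = rnorm(n))
d$q <- 10 * d$p + rnorm(n)      # r is an irrelevant predictor

# Mean squared prediction error, averaged over k held-out folds
cv_mse <- function(formula, data, k = 2) {
  resp <- all.vars(formula)[1]                       # name of the response
  fold <- sample(rep(1:k, length.out = nrow(data)))  # random fold labels
  err  <- numeric(k)
  for (i in 1:k) {
    fit    <- lm(formula, data = data[fold != i, ])  # train on other folds
    pred   <- predict(fit, newdata = data[fold == i, ])
    err[i] <- mean((data[[resp]][fold == i] - pred)^2)
  }
  mean(err)
}

mse_small <- cv_mse(q ~ p, d, k = 2)
mse_big   <- cv_mse(q ~ p + r, d, k = 2)
print(c(small = mse_small, big = mse_big))
```

Each model is fitted k times, which is where the extra computational cost comes from.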

Page 11: Lump Semi Select

Model selection --

R commands

Fitting an auto-regressive model

> zz <- ar(x, order.max = 100, method = "yule-walker")

Fitting an ARIMA model

> zz <- arima(x, order = c(p, d, q))
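A short runnable sketch of both commands on a simulated series. The AR(2) coefficients and the series length are arbitrary. `ar()` chooses the order by AIC by default, which is what ties it to model selection here.

```r
set.seed(7)
# Simulate a stationary AR(2) series
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 1000)

# ar() fits orders 0..order.max and keeps the one with the lowest AIC
zz <- ar(x, order.max = 20, method = "yule-walker")
print(zz$order)       # selected order
print(zz$ar)          # fitted AR coefficients
print(head(zz$aic))   # AIC differences from the best order

# arima() fits one fixed order, here ARIMA(2,0,0)
fit <- arima(x, order = c(2, 0, 0))
print(coef(fit))
```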

Page 12: Lump Semi Select

Model selection --

1) Create three random normal vectors of data of length 200; call them p, r1, r2.

2) Create a 4th vector such that q = 10*p + r1.

3) Set aside ½ of the data in each vector.

4) Using linear regression, solve for q as a function of p using the first ½ of the data, and calculate the square error.

5) Next, again using linear regression, solve for q as a function of p and r2 using the first ½ of the data, and calculate the square error.

6) Compare the square errors of 4) and 5). Which one did you expect to be smaller?

7) Next, calculate the AIC of 4) and 5). Which one did you expect to be smaller?

8) Finally, with the coefficients determined in steps 4) and 5), estimate q for each model using the other ½ of the data, and calculate the square error.

9) Comparing the square errors, which one is smaller? Does that make sense?

Try this out in R!
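One possible way to work through the exercise in R is sketched below. The seed, the variable names `train` and `test`, and the helper `sse` are choices made for this sketch. The answers to the "which is smaller?" questions are left to the output.

```r
set.seed(8)
n  <- 200
p  <- rnorm(n); r1 <- rnorm(n); r2 <- rnorm(n)   # step 1
q  <- 10 * p + r1                                # step 2
d  <- data.frame(p, r2, q)
train <- d[1:(n / 2), ]                          # step 3: first half
test  <- d[(n / 2 + 1):n, ]                      #         second half

fit1 <- lm(q ~ p,      data = train)             # step 4
fit2 <- lm(q ~ p + r2, data = train)             # step 5

# Sum of squared errors of a fitted model on a given data set
sse <- function(fit, data) sum((data$q - predict(fit, newdata = data))^2)

sse_train <- c(m1 = sse(fit1, train), m2 = sse(fit2, train))  # step 6
aic       <- c(m1 = AIC(fit1),        m2 = AIC(fit2))         # step 7
sse_test  <- c(m1 = sse(fit1, test),  m2 = sse(fit2, test))   # steps 8-9

print(sse_train)
print(aic)
print(sse_test)
```

The training error of the larger model can never exceed that of the smaller one, since the models are nested. The AIC and the held-out error are the measures that can penalize the extra, irrelevant predictor r2.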