Data mining issues on improving the accuracy of the rainfall-runoff model for flood forecasting Jia...

Data mining issues on improving the accuracy of the rainfall-runoff model for flood forecasting
Jia Liu
Supervisor: Dr. Dawei Han
Email: Jia.Liu@bristol.ac.uk
WEMRC, Department of Civil Engineering
University of Bristol
24 May 2010


Page 1: Data mining issues on improving the accuracy of the rainfall-runoff model for flood forecasting Jia Liu Supervisor: Dr. Dawei Han Email: Jia.Liu@bristol.ac.uk.


Page 2

Outline

Introduction to the Probability Distributed Model (PDM)

Two data mining issues:

Selection of data for model calibration

Optimal data time interval in flood forecasting

Conclusions and Future work

Page 3

Introduction to rainfall-runoff model

Hydrological Cycle

[Diagram: Rainfall (and Evaporation) → Rainfall-Runoff Model → Runoff]

Rainfall-runoff model

A conceptual representation of the hydrological cycle

The fundamental work for any water research, e.g., real-time flood forecasting, land-use change evaluation and design of hydraulic structures

Page 4

Introduction to rainfall-runoff model

Probability Distributed Model (PDM) by Moore (1985)

13 model parameters to be calibrated: $f_c$, $T_d$, $c_{min}$, $c_{max}$, $b$, $b_e$, $k_g$, $b_g$, $S_t$, $k_1$, $k_2$, $k_b$, $q_c$
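As a rough illustration of how three of these parameters ($c_{min}$, $c_{max}$ and $b$) are commonly used, the sketch below evaluates the Pareto distribution of store capacity that PDM formulations often adopt, $F(c) = 1 - ((c_{max} - c)/(c_{max} - c_{min}))^b$. The slides do not show this formula, so treat it as an assumption about the model variant; the function name and the numerical values are hypothetical.

```python
def pareto_saturated_fraction(c, c_min, c_max, b):
    """Fraction of the catchment whose store capacity is <= c.

    Assumes a Pareto distribution of store capacity between c_min and c_max
    with shape exponent b (an assumption, not stated on the slides).
    """
    if c_max <= c_min:
        raise ValueError("c_max must exceed c_min")
    if b <= 0:
        raise ValueError("b must be positive")
    if c <= c_min:
        return 0.0
    if c >= c_max:
        return 1.0
    return 1.0 - ((c_max - c) / (c_max - c_min)) ** b


# Illustrative values only (capacities in mm); not calibrated parameters.
for depth in (0.0, 25.0, 50.0, 100.0):
    fraction = pareto_saturated_fraction(depth, c_min=0.0, c_max=100.0, b=0.5)
    print(depth, round(fraction, 3))
```

With b = 1 the capacities are uniformly distributed, so half of the catchment is saturated when the critical capacity reaches the midpoint between $c_{min}$ and $c_{max}$.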

Page 5

How to cope with the ‘data rich’ environment?

Data: large quantity + fast sampling rate

Questions proposed:

A. How to select the most appropriate data to calibrate the model?

1. How long should the data be? (Data length)

2. Which period should the data be selected from? (Data duration)

B. When used for forecasting, what is the most appropriate sampling rate? (Data time interval)

Page 6

Calibration data selection: data length and duration

The data used for model validation are often already determined.

We assume that the more similar the calibration data are to the validation data, the better the rainfall-runoff model should perform after calibration.

Good information quality of the calibration data set = information content similar to that of the validation data set

[Figure: Comparison of the information quality of the two data sets. Panels: calibration data set and validation data set; axes in m³/s and mm.]

Page 7

Calibration data selection: data length and duration

An index that reveals the similarity between the calibration and validation data sets can be used as a guide for calibration data selection for the rainfall-runoff model.

Information Cost Function (ICF)

The Information Cost Function (ICF) is an entropy-like function that gives a good estimate of the degree of disorder of a system.

Energy of detail: $E_j = \sum_k C_{j,k}^2$

Energy of approximation: $E_j = \sum_k S_{j,k}^2$

Percentile energy on each decomposition level: $P_j = E_j / \sum_j E_j$

$\mathrm{ICF} = -\sum_j P_j \ln P_j$

Fast Fourier Transform

Discrete Wavelet Decomposition

Flow Duration Curve

Liu, J., and D. Han (2010), Indices for calibration data selection of the rainfall-runoff model, Water Resour. Res., 46, W04512, doi:10.1029/2009WR008668.
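A minimal sketch of how an ICF value could be computed from a discrete wavelet decomposition and then used to compare candidate calibration periods with the validation period. It assumes an orthonormal Haar wavelet and treats the final approximation as one extra energy level; the slides specify neither choice, and the function names are hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Orthonormal Haar decomposition (assumed wavelet; not given on the slides).

    Returns (details, approximation), where details[j] holds the detail
    coefficients of decomposition level j + 1.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:  # pad odd-length input by repeating the last value
            approx.append(approx[-1])
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx


def icf(signal, levels=3):
    """Entropy-like index: -sum(P_j * ln P_j) over the energy shares P_j."""
    details, approx = haar_dwt(signal, levels)
    energies = [sum(c * c for c in d) for d in details]  # energy of detail
    energies.append(sum(s * s for s in approx))          # energy of approximation
    total = sum(energies)
    if total == 0:
        return 0.0
    shares = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in shares)


def closest_candidate(candidates, validation, levels=3):
    """Name of the candidate series whose ICF is nearest the validation ICF."""
    target = icf(validation, levels)
    return min(candidates, key=lambda name: abs(icf(candidates[name], levels) - target))
```

In use, `candidates` would map a label for each candidate calibration period to its flow series, and `closest_candidate` would return the label whose ICF is closest to that of the validation series. This is only one possible reading of "similar information content"; the cited paper should be consulted for the actual procedure.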

Page 8

Optimal data time interval – for the forecast mode

Sampling rate of model input data

Too slow: sampling theory gives the lower boundary $f_s \geq 2B$, where $f_s$ is the sampling frequency and $B$ is the signal bandwidth

Too fast: leading to numerical problems [Åström, 1968; Ljung, 1989]

Optimal time interval

Hypothetical curve: a positive relation between the data time interval and the forecast lead time

[Figure: model error as a function of data time interval and forecast lead time, with curves of error against time interval for a short lead time and a long lead time]
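The lower boundary from sampling theory can be turned into a bound on the data time interval: since the interval is $\Delta t = 1/f_s$, the condition $f_s \geq 2B$ is equivalent to $\Delta t \leq 1/(2B)$. The short sketch below expresses this check; the function names and the units (seconds and hertz) are assumptions made for illustration.

```python
def max_time_interval(bandwidth_hz):
    """Largest sampling interval (seconds) that still satisfies f_s >= 2B."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return 1.0 / (2.0 * bandwidth_hz)


def satisfies_sampling_bound(interval_s, bandwidth_hz):
    """True if sampling every interval_s seconds meets the lower boundary."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / interval_s >= 2.0 * bandwidth_hz


# Illustrative case: fastest signal component has a 2-hour period (B = 1/7200 Hz).
bandwidth = 1.0 / 7200.0
print(max_time_interval(bandwidth))               # upper limit on the interval, in seconds
print(satisfies_sampling_bound(900.0, bandwidth))   # 15-minute data
print(satisfies_sampling_bound(14400.0, bandwidth))  # 4-hour data
```

This only covers the "too slow" side of the slide; the "too fast" limit arises from numerical problems in model identification and has no equally simple closed form.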

Page 9

Optimal data time interval – for the forecast mode

Case study

Auto-Regressive Moving Average (ARMA) model for on-line updating

Four catchments are selected from Southwest England:

Catchment          AREA (km²)   LDP (km)   DPSBAR (m/km)
A  Bellever            21.5       13.5         94.9
B  Halsewater          87.8       19.4         85.7
C  Brue               135.2       22.6         71.1
D  Bishop_Hull        202.0       40.2         98.0

LDP: longest drainage path (km)

DPSBAR: mean drainage path slope (m/km)

[Figure: location maps of the four catchments (Bellever, Halsewater, Brue, Bishop_Hull)]
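The on-line updating step named on this slide can be illustrated with a deliberately simplified error-prediction sketch. It fits a first-order autoregressive model, AR(1), which is a special case of ARMA, to past simulation errors (observed minus simulated flow) and adds the predicted error to the simulated flows over the lead time. The model order, the function names and the numbers are assumptions; the slides do not give the ARMA configuration actually used.

```python
def fit_ar1(errors):
    """Least-squares AR(1) coefficient for e[t] = phi * e[t-1] + noise."""
    num = sum(cur * prev for cur, prev in zip(errors[1:], errors[:-1]))
    den = sum(prev * prev for prev in errors[:-1])
    return num / den if den else 0.0


def update_forecast(simulated_future, last_error, phi):
    """Add the AR(1)-predicted error to each simulated flow over the lead time."""
    updated = []
    err = last_error
    for q_sim in simulated_future:
        err = phi * err  # predicted error decays (for |phi| < 1) as lead time grows
        updated.append(q_sim + err)
    return updated


# Illustrative numbers only (flows in m3/s); not data from the case study.
observed = [10.0, 12.0, 15.0, 14.0]
simulated = [9.0, 11.5, 14.5, 13.75]
errors = [o - s for o, s in zip(observed, simulated)]
phi = fit_ar1(errors)
print(update_forecast([13.0, 12.0, 11.0], errors[-1], phi))
```

Because the predicted error shrinks with each step ahead when |phi| < 1, the correction matters most at short lead times, which is one reason the forecast lead time interacts with the choice of data time interval.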

Page 10

Optimal data time interval – for the forecast mode

Case study

The positive pattern between the optimal data time interval and the forecast lead time is found to be highly related to the catchment concentration time.


[Figure: four 3D plots with axes X, Y and Z, one per catchment (Bellever, Halsewater, Brue, Bishop_Hull)]
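One way to read an optimal data time interval off an error surface like the one plotted for each catchment is to take, for every forecast lead time, the interval with the smallest model error. The sketch below does this for a small table; the error values are synthetic placeholders (not results from the case study), and the units (lead time in hours, interval in minutes) and function names are assumptions.

```python
def optimal_intervals(error_surface):
    """Map each lead time to the data time interval with the smallest error.

    error_surface: {lead_time: {time_interval: error}}
    """
    return {lead: min(by_interval, key=by_interval.get)
            for lead, by_interval in error_surface.items()}


def is_positive_relation(optima):
    """True if the optimal interval never decreases as the lead time grows."""
    ordered = [optima[lead] for lead in sorted(optima)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Synthetic error surface for illustration only.
synthetic = {
    1: {15: 0.10, 30: 0.14, 60: 0.25},
    3: {15: 0.30, 30: 0.22, 60: 0.28},
    6: {15: 0.55, 30: 0.47, 60: 0.40},
}
optima = optimal_intervals(synthetic)
print(optima)                        # {1: 15, 3: 30, 6: 60}
print(is_positive_relation(optima))  # True
```

The second helper simply checks whether the selected intervals follow the positive pattern described on the slide; relating that pattern to the catchment concentration time would need the real error surfaces.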

Page 11

Conclusions and Future work

Selecting data with the most appropriate length, duration and time interval is of great significance in improving the model performance and helps to enhance the efficiency of data utilization in rainfall-runoff modelling and forecasting.

More research is needed to explore the applicability of the ICF index for calibration data selection and to verify the hypothetical curve of the optimal data time interval.

[Diagram: Weather Research & Forecasting (WRF) Model → Rainfall (and Evaporation), as real-time inputs → Rainfall-Runoff Model → Runoff, updated by observations]

Page 12

The End

Thank you for your attention!