DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED REGIONAL FLOOD FREQUENCY ANALYSIS TECHNIQUE

Kashif Aziz, BScEng, MEng

Student ID 16658598

A thesis submitted for fulfilment for the degree of

Doctor of Philosophy in Civil Engineering

Supervisory Panel:

Assoc Prof Ataur Rahman

Assoc Prof Gu Fang

Assoc Prof Surendra Shrestha

School of Computing, Engineering and Mathematics

University of Western Sydney, Australia

December 2014

Artificial Intelligence Based RFFA Aziz

University of Western Sydney II

ABSTRACT

Floods are among the worst natural disasters: they disrupt services, damage

infrastructure, crops and property, and sometimes cause loss of human life. In Australia,

the average annual flood damage exceeds $377 million, and infrastructure requiring

design flood estimates is worth over $1 billion per annum. The devastating 2010-11 flood in

Queensland alone caused flood damage over $5 billion.

Design flood estimation is required in numerous engineering applications, e.g., the design of

bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways, and in floodplain

modelling, flood insurance studies and flood damage assessment. For design flood

estimation, the most direct method is flood frequency analysis, which requires a long period of

recorded streamflow data at the site of interest. This is not a feasible option at many locations

due to absence or limitation of streamflow records. For these ungauged or poorly gauged

catchments, regional flood frequency analysis (RFFA) is adopted. The use of RFFA enables

the transfer of flood characteristics information from gauged to ungauged catchments. RFFA

essentially consists of two principal steps: (i) formation of regions; and (ii) development of

prediction equations.

For developing the regional flood prediction equations, the commonly used techniques

include the rational method, index flood method and quantile regression technique. These

techniques adopt a linear method of transforming inputs to outputs. Since hydrologic systems

are non-linear, RFFA techniques based on non-linear methods can be a better alternative to

linear methods. Among the non-linear methods, artificial intelligence based techniques have

been widely applied to various water resources engineering problems. However, their

application to RFFA is quite limited. Hence, this research focuses on the development of

artificial intelligence based RFFA methods for Australia. The non-linear techniques

considered in this thesis include artificial neural network (ANN), genetic algorithm based

artificial neural network (GAANN), gene-expression programming (GEP) and co-active neuro

fuzzy inference system (CANFIS).

This study uses data from 452 small to medium sized catchments from eastern Australia. In

the development/training of the artificial intelligence based RFFA models, the selected 452

catchments are divided into two parts randomly: (i) training data set consisting of 362

catchments; and (ii) validation data set consisting of 90 catchments. It has been found that a


RFFA model with two predictor variables i.e., catchment area and design rainfall intensity

provides more accurate flood quantile estimates than other models with a greater number of

predictor variables. The results show that when the data from all the eastern Australian states

are combined to form one region, the resulting ANN based RFFA model performs better than

models for other candidate regions, such as regions based on state boundaries,

geographical and climatic boundaries and the regions formed in the catchment characteristics

data space.
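
The random split-sample partition described above can be sketched as follows. This is a minimal illustration only: the catchment identifiers are placeholder integers, and the function name and the fixed random seed are assumptions introduced here, not details taken from the thesis.

```python
import random


def split_catchments(catchment_ids, n_validation, seed=0):
    """Randomly partition catchment IDs into training and validation lists.

    The seed is an assumption added only to make the sketch reproducible.
    """
    ids = list(catchment_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    validation = sorted(ids[:n_validation])
    training = sorted(ids[n_validation:])
    return training, validation


# Placeholder IDs standing in for the 452 study catchments.
catchments = list(range(1, 453))
training, validation = split_catchments(catchments, n_validation=90)
print(len(training), len(validation))
```

Keeping the validation catchments entirely out of model training is what makes the later comparison an independent test.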

In the training of the four artificial intelligence based RFFA models, no model performs the

best for all the six average recurrence intervals over all the adopted statistical criteria. Overall,

the ANN based RFFA model performs better than the three other models in the

training/calibration.

In this research, it has also been found that non-linear artificial intelligence based RFFA

techniques can be applied successfully to eastern Australian catchments. Among the four

artificial intelligence based models considered in this study, the ANN based RFFA model has

demonstrated the best performance based on independent split-sample validation, followed by the

GAANN based RFFA model. The ANN based RFFA model has been found to outperform the

ordinary least squares based RFFA model. Based on independent validation, the median

relative error values for the ANN based RFFA model are found to be in the range of 35% to

44% for eastern Australia, which is comparable to the generalised least squares regression and

region-of-influence based RFFA approach. The ANN based RFFA model exhibits no

noticeable spatial trend in the relative error values. Furthermore, the relative error values of

the ANN based RFFA model are found to be independent of catchment area.
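
The validation statistics quoted above can be illustrated with a short sketch. It assumes the relative error is the absolute difference between predicted and observed flood quantiles expressed as a percentage of the observed value; this definition and the sample values below are illustrative assumptions, not results or formulas taken from the thesis.

```python
from statistics import median


def relative_errors(q_pred, q_obs):
    """Absolute relative error (%) per catchment (assumed definition)."""
    return [abs(p - o) / o * 100.0 for p, o in zip(q_pred, q_obs)]


def median_ratio(q_pred, q_obs):
    """Median of the predicted-to-observed flood quantile ratios."""
    return median(p / o for p, o in zip(q_pred, q_obs))


# Made-up flood quantiles (m3/s) for four hypothetical validation catchments.
q_obs = [100.0, 200.0, 50.0, 400.0]
q_pred = [120.0, 150.0, 55.0, 400.0]

median_re = median(relative_errors(q_pred, q_obs))
ratio = median_ratio(q_pred, q_obs)
print(f"median RE = {median_re:.1f}%, median Qpred/Qobs = {ratio:.2f}")
```

A median ratio close to 1 indicates little systematic over- or under-estimation, while the median relative error summarises the typical size of the error regardless of sign.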

The findings of this research would help to recommend the most appropriate RFFA

techniques in the 4th edition of Australian Rainfall and Runoff, which is due to be published in

2015.


STATEMENT OF AUTHENTICITY

I certify that all materials presented in this thesis are of my own contribution, and that any

work adopted from other sources is duly cited and referenced as such. This thesis contains no

material that has been submitted for any award or degree at any other university or institution.

Kashif Aziz


ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude to Associate Professor Ataur Rahman, who is

not only a mentor of mine but a role model as well. This work would not have been possible

without his support, encouragement and, most importantly, patience during the completion

of this work. I am also grateful to Associate Professor Gu Fang and Associate Professor

Surendra Shrestha for their valuable advice, support and constructive feedback towards the

completion of this research. I could not be prouder of my academic roots and hope that I can

in turn pass on the research values and the dreams that my supervisors have given to me.

I would not have contemplated this road if not for my parents, Mr. and Mrs. Choudhry Abdul

Aziz (late), who instilled within me a love of knowledge and a spirit of struggle to achieve the

goal, all of which finds a place in this thesis. To my parents, thank you. I sincerely

acknowledge and appreciate the support and patience of my wife, Rabia Rehman, who looked

after me and our kids during this study. I am also thankful to my family and friends in

Australia and overseas for their prayers and encouragement.

To the staff and fellow students at the University of Western Sydney’s School of Computing,

Engineering and Mathematics, I am grateful for your help, encouragement and the company I

have enjoyed during my candidature. Thank you for welcoming me as a friend and for your

moral support.

I would like to acknowledge the technical and financial support of all the related Government

agencies for providing the resources towards the completion of this research.


Publications made (UNTIL June 2015) from

this study

Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2014). Application of Artificial Neural

Networks in Regional Flood Frequency Analysis: A Case Study for Australia, Stochastic

Environmental Research and Risk Assessment, 28, 3, 541-554.

Aziz, K., Rai, S., Rahman, A. (2014). Design flood estimation in ungauged catchments using

genetic algorithm based artificial neural network (GAANN) technique for Australia, Natural

Hazards, 77, 2, 805-821.

Aziz, K., Rahman, A., Shamseldin, A.Y., Shoaib, M. (2013). Co-Active Neuro Fuzzy

Inference System for Regional Flood Estimation in Australia, Journal of Hydrology and

Environment Research, 1, 1, 11-20.

Aziz, K., Sohail, R., Rahman, A. (2014). Application of Artificial Neural Networks and

Genetic Algorithm for Regional Flood Estimation in Eastern Australia, 35th Hydrology and

Water Resources Symposium, Perth, Engineers Australia, 24-27 Feb, 2014.

Aziz, K., Rahman, A., Shamseldin, A., Shoaib, M. (2013). Regional flood estimation in

Australia: Application of gene expression programming and artificial neural network

techniques, 20th International Congress on Modelling and Simulation, 1 to 6 December, 2013,

Adelaide, Australia, 2283-2289.

Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2012). Comparison of Artificial Neural

Networks and Adaptive Neuro-fuzzy Inference System for Regional Flood Estimation in

Australia, Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov

2012, Sydney, Australia.

Aziz, K., Rahman, A., Shrestha, S., Fang, G. (2011). Derivation of optimum regions for ANN

based RFFA in Australia, 34th IAHR World Congress, 26 June – 1 July 2011, Brisbane, 17-24.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2011). Application of Artificial Neural

Networks in Regional Flood Estimation in Australia: Formation of Regions Based on

Catchment Attributes, The Thirteenth International Conference on Civil, Structural and

Environmental Engineering Computing and CSC2011: The Second International Conference

on Soft Computing Technology in Civil, Structural and Environmental Engineering, Chania,


Crete, Greece, 6-9 September, 2011, 13 pp.

Aziz, K., Rahman, A., Fang, G., Haddad, K. and Shrestha, S. (2010). Design flood estimation

for ungauged catchments: Application of artificial neural networks for eastern Australia,

World Environmental and Water Resources Congress 2010, American Society of Civil

Engineers (ASCE), 16-20 May 2010, Providence, Rhode Island, USA, pp. 2841-2850.


TABLE OF CONTENTS

ABSTRACT………………………………………………………………………………… II

STATEMENT OF AUTHENTICITY…………………………………………………… IV

ACKNOWLEDGEMENT ……………………………………………………………… … V

PUBLICATIONS MADE (UNTIL DECEMBER 2014) FROM THIS

STUDY................................................................................................................................. VI

LIST OF FIGURES……………………………………………………………………… XII

LIST OF TABLES…………………………………………………………………… XVIII

LIST OF SYMBOLS…………………………………………………………………… XX

LIST OF ABBREVIATIONS…………………………………………………………….XXII

CHAPTER 1 .......................................................................................................................... 1

INTRODUCTION ................................................................................................................. 1

1.1 General ......................................................................................................................... 1

1.2 Background .................................................................................................................. 1

1.3 Need for this research .................................................................................................. 5

1.4 Scope and objectives of the study ................................................................................ 6

1.5 Research questions ....................................................................................................... 7

1.6 Summary of research undertaken in this thesis ............................................................ 8

1.7 Outline of the thesis ..................................................................................................... 9

CHAPTER 2 ........................................................................................................................ 12

REVIEW OF REGIONAL FLOOD FREQUENCY ANALYSIS METHODS .................. 12

2.1 General ....................................................................................................................... 12

2.2 Design flood estimation methods ............................................................................... 12

2.2.1 Streamflow-based flood estimation methods ............................................... 13

2.3 Techniques for RFFA .................................................................................................... 15

2.3.1 Linear techniques .............................................................................................. 15

2.3.2 Non-linear RFFA techniques ............................................................................ 21

2.4 Summary .................................................................................................................. 32

CHAPTER 3 ........................................................................................................................ 34

METHODOLOGY .............................................................................................................. 34

3.1 General........................................................................................................................... 34

3.2 Methods adopted in the study ........................................................................................ 34

3.2.1 Artificial neural network (ANN) ....................................................................... 35

3.2.2 Genetic algorithm based ANN (GAANN) ........................................................ 39

3.2.3 Gene-expression programming ......................................................................... 45

3.2.4 Co-active neuro fuzzy inference system (CANFIS) ......................................... 47


3.2.5 Quantile regression technique (QRT) ............................................................... 51

3.2.6 Cluster analysis ................................................................................................. 53

3.2.7 Principal component analysis (PCA) ................................................................ 55

3.2.8 Model validation technique ............................................................................... 55

3.3 Summary ........................................................................................................................ 56

CHAPTER 4 ........................................................................................................................ 57

SELECTION OF STUDY AREA AND DATA PREPARATION ..................................... 57

4.1 General........................................................................................................................... 57

4.2 Selection of study area ................................................................................................... 57

4.3 Selection of study catchments ....................................................................................... 58

4.3.1 Factors considered for selection of catchments ................................................ 58

4.4 Streamflow data preparation .......................................................................................... 59

4.4.1 Methods of streamflow data preparation .......................................................... 59

4.4.2 Tests for outliers ................................................................................................ 60

4.4.3 Trend analysis ................................................................................................... 60

4.4.4 Rating error analysis ......................................................................................... 61

4.5 Selection of catchment characteristics ........................................................................... 62

4.5.1 Selection criteria ............................................................................................... 62

4.5.2 Catchment characteristics considered in this thesis .......................................... 63

4.5.3 Rainfall intensity ............................................................................................... 63

4.5.4 Mean annual rainfall ......................................................................................... 64

4.5.5 Catchment area .................................................................................................. 64

4.5.6 Slope S1085 ...................................................................................................... 65

4.5.7 Mean annual evapo-transpiration ...................................................................... 66

4.6 Streamflow data preparation for various states ............................................................. 66

4.6.1 NSW and ACT .................................................................................................. 66

4.6.3 Queensland ........................................................................................................ 73

4.6.4 Victoria .............................................................................................................. 76

4.5 Flood frequency analysis ............................................................................................... 81

4.6 Summary of catchment characteristics data .................................................................. 82

4.7 Summary ........................................................................................................................ 83

CHAPTER 5 ........................................................................................................................ 84

SELECTION OF PREDICTOR VARIABLES FOR ARTIFICIAL INTELLIGENCE

BASED RFFA MODELS .......................................................................................... 84

5.1 General........................................................................................................................... 84

5.2 Initial selection of predictor variables for artificial intelligence based RFFA models .. 84

5.3 Selection of Predictor variables for ANN based RFFA models .................................... 88

5.4 Selection of predictor variables based on GEP models ................................................. 91

5.5 Summary ........................................................................................................................ 95

CHAPTER 6 ........................................................................................................................ 96


SELECTION OF REGIONS ............................................................................................... 96

6.1 General........................................................................................................................... 96

6.2 Description of candidate regions ................................................................................... 96

6.2.1 Selection of the best performing region based on state, geographic and climatic

boundaries .................................................................................................................. 97

6.3 Regions based on catchment characteristics data ........................................................ 100

6.3.1 Cluster analysis ............................................................................................... 100

6.3.2 Principal component analysis .......................................................................... 105

6.4 Summary ...................................................................................................................... 111

CHAPTER 7 ...................................................................................................................... 113

DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ........ 113

7.1 General......................................................................................................................... 113

7.2 Training of artificial intelligence based RFFA models ............................................... 114

7.3 Comparison of training and validation results ............................................................. 120

7.3.1 ANN ................................................................................................................ 120

7.3.2 GAANN .......................................................................................................... 123

7.3.3 GEP ................................................................................................................. 126

7.3.4 CANFIS .......................................................................................................... 129

7.4 Selection of the best performing artificial intelligence based RFFA model based on

training ..................................................................................................................... 131

7.5 Summary ...................................................................................................................... 133

CHAPTER 8 ...................................................................................................................... 134

VALIDATION OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ............. 134

8.1 General......................................................................................................................... 134

8.2 Validation of RFFA models ........................................................................................ 134

8.2.1 ANN ................................................................................................................ 134

8.2.2 GAANN .......................................................................................................... 138

8.2.3 GEP ................................................................................................................. 140

8.2.4 CANFIS .......................................................................................................... 143

8.3 Comparison of RFFA models based on validation data set ........................................ 145

8.3.1 Median Qpred/Qobs ratio .................................................................................... 145

8.3.2 Median RE (%) ............................................................................................... 147

8.3.3 Median CE ...................................................................................................... 149

8.3.5 Comparison of RFFA models based on RE (%) ranges .................................. 151

8.3.6 Selection of the best performing artificial intelligence based RFFA model ... 152

8.4 Performance of the finally selected artificial intelligence based RFFA model ........... 153

8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model ..................... 154

8.4.2 Catchment area vs RE ..................................................................................... 157

8.5 Comparison with QRT ................................................................................................ 158

8.6 Summary ...................................................................................................................... 159

CHAPTER 9 ...................................................................................................................... 161


SUMMARY, CONCLUSIONS AND RECOMMENDATIONS ..................................... 161

9.1 General......................................................................................................................... 161

9.2 Summary of the research undertaken in this thesis ..................................................... 161

9.3 Conclusions ................................................................................................................. 163

9.4 Recommendations for further research........................................................................ 164

REFERENCES .................................................................................................................. 166

APPENDICES ................................................................................................................... 182

APPENDIX A ................................................................................................................... 183

APPENDIX B .................................................................................................................... 205


List of Figures

Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia) ........................................................... 2

Figure 1.2 Aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012 (ABC News,

2012) ....................................................................................................................................................................... 3

Figure 1.3 Illustration of major steps in this research ............................................................................................. 9

Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998) .................................13

Figure 3.1 Different RFFA techniques adopted in this study .................................................................................34

Figure 3.2 Structure of typical natural neuron (Source:

http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/) ...............................................................................35

Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000) ........................................................36

Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005) ............................................................................43

Figure 3.5 Flow chart showing steps in GAANN model .......................................................................................44

Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN

architecture during a GAANN modelling ..............................................................................................................45

Figure 3.7 GEP expression tree (ET) .....................................................................................................................47

Figure 3.8 Fuzzy inference system (FIS) (Shi and Mizumoto, 2000) ....................................................................48

Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009) ....................................................50

Figure 4.1 Location of the selected study area (coloured parts of the map) ...........................................................57

Figure 4.2 Result of trend analysis (Station 219001). Here Vk is CUSUM test statistic defined in McGilchrist and

Woodyer, 1975 ........................................................................................................................................................68

Figure 4.3 Result of trend analysis – time series plot (Station 219001) .................................................................69

Figure 4.4 Histogram of rating ratios for 106 stations from NSW .........................................................................69

Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT ....................................70

Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT ....................................................70

Figure 4.7 Geographical distributions of 96 catchments from NSW and ACT ......................................................71

Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania ..............................72

Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania ..............................................72

Figure 4.10 Locations of selected catchments from Tasmania ..............................................................................73

Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD ............................75

Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD ............................................75


Figure 4.13 Locations of the selected 172 stations from QLD ...............................................................................76

Figure 4.14 Time series graph showing significant trends after 1995 ....................................................................78

Figure 4.15 CUSUM test plot showing significant trends after 1995 ....................................................................78

Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record lengths > 25

years) ......................................................................................................................................................................79

Figure 4.16 Distributions of streamflow record lengths of the selected 131 stations from Victoria ......................81

Figure 4.17 Distributions of catchment areas of the selected 131 catchments from Victoria ................................81

Figure 4.18 Geographical distributions of the selected 131 catchments from Victoria .........................................82

Figure 4.19 Locations of the study catchments ......................................................................................................82

Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions ......................................99

Figure 6.2 Median relative error (%) values for different ARIs for selected regions ...........................................100

Table 6.4 Regions/groups formation by cluster analysis......................................................................................101

Figure 6.3 Dendrogram using average linkage between groups ..........................................................................102

Figure 6.3 (a) Section of Dendrogram using average linkage between groups ....................................................103

Figure 6.3 (b) Section of Dendrogram using average linkage between groups ....................................................104

Figure 6.4 Scree plot from principal component analysis ....................................................................................107

Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1 .....................................................................107

Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2 .....................................................................108

Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions .............................................110

Figure 6.8 Median relative error (%) values for different ARIs for candidate regions ........................................111

Figure 6.9 Comparison of median relative error (%) values between combined data set and grouping based on K-

Means cluster analysis .........................................................................................................................................111

Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training data set .......114

Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on

training data set ....................................................................................................................................................115

Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models based on training

data set .................................................................................................................................................................116

Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 (training

data set) ................................................................................................................................................................117

Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20

(training data set) .................................................................................................................................................118



Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 (training

data set) ................................................................................................................................................................119

Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20

(training data set)


Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the ANN based RFFA

model ...................................................................................................................................................................121

Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the

ANN based RFFA model .....................................................................................................................................121


Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data sets for the ANN

based RFFA model...............................................................................................................................................122

Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA model for Q20 .....122

Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20 ..........................................123

Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data set ................123

Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the GAANN based

RFFA model.........................................................................................................................................................125

Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for

the GAANN based RFFA model .........................................................................................................................125

Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data sets for the

GAANN based RFFA model ...............................................................................................................................126

Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the GEP based

RFFA model.........................................................................................................................................................127


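Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for

the GEP based RFFA model ................................................................................................................................128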

Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data sets for the GEP

based RFFA model...............................................................................................................................................128

Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the CANFIS based

RFFA model.........................................................................................................................................................130


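Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for

the CANFIS based RFFA model ..........................................................................................................................130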



Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data sets for the

CANFIS based RFFA model ...............................................................................................................................131

Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 ...........135

Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model ..................................................136

Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model ........................................................137

Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20 .....138

Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model .............................................139

Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model ...................................................140

Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 ............141

Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model ....................................................142

Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model .........................................................143

Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20 ...144

Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model ...........................................145

Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model .................................................146

Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models .........148

Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models ....................150

Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models ...........................151

Figure 8.16 Spatial distribution of RE of ANN based model across NSW ..........................................................154

Figure 8.17 Spatial distribution of RE of ANN based model across VIC ............................................................155

Figure 8.18 Spatial distribution of RE of ANN based model across North QLD ................................................156

Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD ..........................................156

Figure 8.20 Spatial distribution of RE of ANN based model across QLD ...........................................................157

Figure 8.21 Spatial distribution of RE of ANN based model across TAS ...........................................................157

Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90 test catchments

.............................................................................................................................................................................158

Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q2 (training

data set) ................................................................................................................................................................206


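Figure B.2 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q5 (training

data set) ................................................................................................................................................................206

Figure B.3 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q10 (training

data set) ................................................................................................................................................................207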




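Figure B.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q50 (training

data set) ................................................................................................................................................................207

Figure B.5 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q100 (training

data set) ................................................................................................................................................................208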

Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q2










Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q2 (training

data set) ................................................................................................................................................................211


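Figure B.12 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q5 (training

data set) ................................................................................................................................................................211

Figure B.13 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q10 (training

data set) ................................................................................................................................................................212

Figure B.14 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q50 (training

data set) ................................................................................................................................................................212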

Figure B.15 Comparison of observed and predicted flood quantiles (training) for GEP based RFFA model for

Q100 (training data set) ..........................................................................................................................................213

Figure B.16 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model

for Q2 (training data set) .......................................................................................................................................213


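Figure B.17 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model

for Q5 (training data set) .......................................................................................................................................214

Figure B.18 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model

for Q10 (training data set) ......................................................................................................................................214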




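Figure B.19 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model

for Q50 (training data set) ......................................................................................................................................215

Figure B.20 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model

for Q100 (training data set) ....................................................................................................................................215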

Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for

Q2 .........................................................................................................................................................................216


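Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for

Q5 .........................................................................................................................................................................216

Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for

Q10 ........................................................................................................................................................................217

Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for

Q50 ........................................................................................................................................................................217

Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for

Q100 .......................................................................................................................................................................218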

Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model

for Q2 ....................................................................................................................................................................218


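Figure B.27 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model

for Q5 ....................................................................................................................................................................219

Figure B.28 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model

for Q10 ..................................................................................................................................................................219

Figure B.29 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model

for Q50 ..................................................................................................................................................................220

Figure B.30 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model

for Q100 .................................................................................................................................................................220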

Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for

Q2 .........................................................................................................................................................................221


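Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for

Q5 .........................................................................................................................................................................221

Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for

Q10 ........................................................................................................................................................................222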




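Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for

Q50 ........................................................................................................................................................................222

Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for

Q100 .......................................................................................................................................................................223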

Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model

for Q2 ....................................................................................................................................................................223


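Figure B.37 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model

for Q5 ....................................................................................................................................................................224

Figure B.38 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model

for Q10 ..................................................................................................................................................................224

Figure B.39 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model

for Q50 ..................................................................................................................................................................225

Figure B.40 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model

for Q100 .................................................................................................................................................................225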

Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA model for Q2 ......226

Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA model for Q5 ......226

Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA model for Q10 ....227

Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA model for Q50 ....227

Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA model for Q100 ...228


Figure B.46 Section of Dendrogram using average linkage between groups .......................................................229


Figure B.47 Section of Dendrogram using average linkage between groups .......................................................230



List of Tables

Table 3.1 Parameters used per run in GEP model ..................................................................................................47

Table 4.1 Summary statistics of the catchment characteristics data .......................................................................83

Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies .............................85

Table 5.2 Various candidate models and catchment characteristics used ..............................................................87

Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent test catchments .........89

Table 5.4 Rating on the basis of median Qpred/Qobs ratio ........................................................................................90

Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of Table 5.4 (ANN based

RFFA models) ........................................................................................................................................................90

Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio value using 90

independent test catchments (ANN based RFFA models) .....................................................................................91

Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error (RE) values using 90

independent test catchments (ANN based RFFA models) .....................................................................................91

Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent test catchments ..........92

Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the criteria of Table 5.4 for

GEP based RFFA models ......................................................................................................................................94

Table 5.10 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio values using 90

independent test catchments (for GEP based RFFA models) ................................................................................94

Table 5.11 Comparison of Model 1 and Model 2 on the basis of RE values using 90 independent test catchments

(for GEP based RFFA models) ..............................................................................................................................94

Table 6.1 Description of candidate regions ............................................................................................................97

Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions ................................................98

Table 6.3 Median relative error values (%) for seven ANN-based candidate regions ...........................................99

Table 6.4 Regions/groups formation by cluster analysis......................................................................................101

Table 6.5 ANN based RFFA model performances for cluster groupings A1 & A2.............................................105

Table 6.6 ANN based RFFA model performances for cluster groupings B1 & B2 .............................................105

Table 6.7 Eigenvalues and variance explained by the principal components ......................................................106

Table 6.8 Component matrix in principal component analysis ............................................................................106

Table 6.9 Descriptive statistics of standardised variables ....................................................................................107

Table 6.10 Grouping based on principal component analysis ..............................................................................109

Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions ................................................................110



Table 6.12 Median relative error (%) ...................................................................................................................110

Table 7.1 CE values of four artificial intelligence based RFFA models based on training data set .....................114

Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data

set .........................................................................................................................................................................115

Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models (training) ..........................116

Table 7.4 Comparison of training and validation results for the ANN based RFFA model.................................120

Table 7.5 Comparison of training and validation results for the GAANN based RFFA model ...........................124

Table 7.6 Comparison of training and validation results for the GEP based RFFA model ..................................127

Table 7.8 Comparison of training and validation results for the CANFIS based RFFA model ...........................129

Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to training .....................132

Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to agreement between

training and validation .........................................................................................................................................132

Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models .......................147

Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models

.............................................................................................................................................................................149

Table 8.3 Median CE values of the four artificial intelligence based RFFA models ...........................................150

Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial intelligence based RFFA

models ..................................................................................................................................................................152

Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern Australia..........................153

Table 8.6 Median Qpred/Qobs ratio values for seven ANN based candidate regions and QRT ..............................159

Table 8.7 Median relative error values (%) for seven ANN based candidate regions and QRT ..........................159

Table 8.8 Coefficient of efficiency (CE) values for seven ANN based candidate regions and QRT ...................159



List of symbols

A Catchment area (km²)

bj The threshold value associated with the node j

β0 Regression coefficient

C Runoff coefficient

CY Dimensionless runoff coefficient for ARI of Y years

d Sub-storm duration (h)

Ei Elevation at ith level (m)

E Mean annual areal evapotranspiration (mm)

f Activation function

g The binary gene

I Rainfall intensity (mm/s)

Itc,Y Average rainfall intensity for time of concentration tc and ARI of Y years (mm/h)

J Node in neural networks

L Mainstream length (km)

l Length of a chromosome

n Number of samples and points

pc Crossover rate

pm Mutation rate

Q Flood discharge (m³/s)

Q2 Flood peak discharge for 2 years average recurrence interval (ARI) (m³/s)

QE Estimated flow (m³/s)

QM Maximum measured flow (m³/s)

Qobs Observed flood quantile (m³/s)

Q̄ Mean of Qobs (m³/s)

Qpred Predicted flood quantile (m³/s)

QY Peak flow rate for an ARI of Y years (m³/s)

QT Peak flow rate for T years (m³/s)

R² Coefficient of determination

R Mean annual rainfall (mm/h)

S1085 Slope of central 75% of mainstream (m/km)

tc Time of concentration (h)

T Return period (average recurrence interval) (year)

Vk CUSUM test statistic



Wj An input vector of jth node

wij The connection weight from the ith node

w Value of a synaptic weight

xmaxabs Absolute maximum difference

xn nth Input variable

X An input vector



List of abbreviations

BoM Bureau of Meteorology

ACT Australian Capital Territory

AEP Annual exceedance probability

AM Annual maximum

ANFIS Adaptive neuro fuzzy inference system

ANN Artificial neural network

ARI Average recurrence interval

ARMA Autoregressive Moving Average

ARR Australian Rainfall and Runoff

AUSIFD Software for Intensity-frequency-duration

BP Backpropagation

BGLS Bayesian Generalised Least Square

BGLS-ROI Bayesian Generalised Least Square - Region-of-influence

BITRE Bureau of Infrastructure, Transport and Regional Economics

CANFIS Co-active neuro fuzzy inference system

CE Coefficient of efficiency

CD Compact Disc

DOW Department of Water

Elman Elman partial recurrent neural network

ETs Expression Trees

FFBP Feedforward Backpropagation

FFA Flood frequency analysis

FFN Fuzzy neural Network

FIS Fuzzy inference system

FLIKE Flood frequency analysis software

GA Genetic algorithm

GAANN Genetic algorithm based artificial neural network

GB Grubbs and Beck

GEP Gene expression programming

GIUH Geomorphologic Instantaneous Unit Hydrograph

GLS Generalised Least Square

IFM Index Flood Method

I. E. Australia Institution of Engineers Australia

IFD Intensity-frequency-duration or design rainfall depth

IM Instantaneous Maximum



LGP Linear Genetic Programming

LM Levenberg-Marquardt

LP3 Log Pearson Type 3

LR Logistic Regression

MATLAB MATrix LABoratory

MF Membership Function

MINITAB Statistical Package

MLFN Multilayer Feedforward Neural Network

MMD Monthly Maximum mean Daily

MSE Mean Squared Error

NCWE National Committee on Water Engineering

NFS Neuro Fuzzy System

NSW New South Wales

NRW Department of Natural Resources & Water

OLS Ordinary Least Square

PCA Principal Component Analysis

pdf Probability density function

PRM Probabilistic Rational Method

QRT Quantile Regression Technique

r Ratio of predicted and observed flood quantile

RE Relative error

RFFA Regional flood frequency analysis

ROI Region of Influence

RR(s) Rating ratio(s)

SDRR Summer Dominated Rainfall Region

SWMM Storm Water Management Model

TAS Tasmania

TDNN Time Delay Neural Network

TSK Takagi, Sugeno and Kang (Fuzzy model)

UK United Kingdom

USGS United States Geological Survey

VIC Victoria

WDRR Winter Dominated Rainfall Region



CHAPTER 1

INTRODUCTION

1.1 General

This thesis focuses on regional flood estimation using non-linear techniques based on artificial
intelligence. The techniques considered are the artificial neural network (ANN), genetic algorithm
based artificial neural network (GAANN), gene expression programming (GEP) and co-active neuro
fuzzy inference system (CANFIS). The thesis aims to explore and enhance these techniques so that
they can be applied to ungauged and poorly gauged catchments in Australia to obtain accurate
design flood estimates. This chapter presents the background to this research, the need for this
research, the research questions to be investigated, the research tasks undertaken and an outline
of the thesis.

1.2 Background

Floods are among the worst natural disasters: they disrupt services, damage infrastructure, crops
and property, and sometimes cause loss of human life. For example, the 2010-11 floods in
Queensland caused 35 deaths. Effects on industry and other production units, and health-related
costs due to flooding, add to the overall losses to the Australian economy. In Australia, the
average annual flood damage is worth over $377 million, and infrastructure requiring design flood
estimates is worth over $1 billion per annum (BITRE, Australia). The state of New South Wales
(NSW) alone has an average annual flood damage cost of over $172 million, which is almost 46% of
the average annual cost for Australia. The state of Queensland has the second largest flood
damage, with an average annual cost of $125 million. Importantly, the devastating 2010-11 flood in
Queensland caused damage of over $5 billion (Queensland Reconstruction Authority, 2011). Figure
1.1 shows flooding of the city of Ipswich in Queensland during the 2010-11 floods. Figure 1.2
shows an aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012.



Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia)

Floods are caused by factors such as heavy rainfall, snowmelt, dam break and cyclones. Catchment
and land use characteristics determine the magnitude of flooding from a given rainfall event.
Urbanisation and catchment clearing increase the flood risk for a given catchment. Flooding is a
serious problem not only in rural areas but also in urban areas, where runoff volume increases
because of the larger impervious area and the response time is shorter. Climate change has
increased the frequency and magnitude of extreme rainfall events, resulting in many devastating
floods in recent years (Ishak et al., 2013; Ishak and Rahman, 2014). The Australian Bureau of
Meteorology (BOM), in its State of the Climate 2014 report, stated “An increase in the number and
intensity of extreme rainfall events is projected for most regions”. This means that more extreme
floods can be expected in most regions of Australia (BOM, 2014).

Flood damage can be minimised by ensuring that drainage infrastructure has optimum capacity.
Underdesign of these structures increases flood damage costs, whereas overdesign incurs
unnecessary expense. The optimum design of drainage infrastructure depends largely on reliable
estimation of the design flood, which is a flood discharge associated with a given annual
exceedance probability (AEP).
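To make the link between an AEP and the average recurrence interval (ARI) used elsewhere in this thesis concrete, the following minimal sketch converts between the two. It assumes the commonly used relationship AEP = 1 − exp(−1/ARI), which holds when flood events are treated as arriving according to a Poisson process. That assumption and the function names are illustrative and are not taken from this thesis.

```python
import math


def ari_to_aep(ari_years: float) -> float:
    """Convert an ARI (years) to an AEP, assuming Poisson flood arrivals."""
    if ari_years <= 0:
        raise ValueError("ARI must be positive")
    return 1.0 - math.exp(-1.0 / ari_years)


def aep_to_ari(aep: float) -> float:
    """Inverse of ari_to_aep; AEP must lie strictly between 0 and 1."""
    if not 0.0 < aep < 1.0:
        raise ValueError("AEP must be between 0 and 1")
    return -1.0 / math.log(1.0 - aep)


# The six ARIs considered in this thesis (Q2 to Q100).
for ari in (2, 5, 10, 20, 50, 100):
    print(f"ARI {ari:>3} years -> AEP {ari_to_aep(ari):.4f}")
```

Under this assumption the two measures nearly coincide for rare floods (an ARI of 100 years gives an AEP close to 1%), but they differ noticeably for frequent floods such as Q2.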

Design flood estimation is required in numerous engineering applications, e.g. the design of
bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways, as
well as floodplain modelling, flood insurance studies and flood damage assessment tasks. For
design flood estimation, the most direct method is flood frequency analysis, which requires a long
period of recorded streamflow data at the site of interest. This is not a feasible option at many
locations because streamflow records are absent or limited. As at 1993, of the 12 drainage
divisions in Australia, seven did not have a stream with 20 or more years of data (Vogel et al.,
1993). Australian Rainfall and Runoff (ARR) 1987 recommended various design flood estimation
techniques for ungauged catchments for different regions of Australia (I. E. Aust., 1987, 2001).
Since 1987, the methods in ARR have not been upgraded, although an additional 20 years of
streamflow data have become available and there have been notable developments in both at-site
and regional flood frequency analysis techniques in Australia and internationally.

Figure 1.2 Aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012

(ABC News, 2012)

Different regional flood estimation methods have been proposed for different parts of

Australia (I. E. Aust., 1987). Among these, various forms of the rational method and the index

flood method are the most common. However, these methods have not been updated since

1987. Because of changing climatic conditions and improvements in regional flood estimation

methods in recent years, there is a need to look for new regional flood estimation techniques

for Australia. Some of the recent developments in regional flood estimation in Australia

include the L-moments based index flood method (Bates et al., 1998; Rahman et al., 1999),

various forms of regression techniques (Rahman, 2005; Haddad et al., 2006, 2008, 2009,

2014; Haddad and Rahman, 2012; Hackelbush et al., 2009; Zaman et al., 2012; Micevski et

al., 2014) and regional Monte Carlo simulation (Rahman et al., 2002; Caballero and Rahman,

2014).



Regional flood frequency analysis (RFFA) is the generic name given to techniques

that utilise streamflow data from gauged catchments in a region to estimate design floods

for poorly gauged or ungauged catchments. The use of RFFA enables the “transfer” of flood

characteristics information from gauged to ungauged catchments (Bloschl and Sivapalan,

1997; Pallard et al., 2009). The most commonly adopted RFFA methods have been described

in Cunnane (1988) and Hosking and Wallis (1997). RFFA essentially consists of two

principal steps: (a) formation of regions and (b) development of prediction equations.

Regions have traditionally been formed based on geographic, political, administrative or

physiographic boundaries (e.g. NERC, 1975; I. E. Aust., 1987). Regions have also been

formed in catchment characteristics data space using multivariate statistical techniques (e.g.

Acreman and Sinclair, 1986; Nathan and McMahon, 1990; Rao and Srinivas, 2008; Guse et

al., 2010). Regions can also be formed using a region-of-influence approach where a certain

number of catchments based on proximity in geographic or catchment attributes space are

pooled together based on some objective function to form an optimum region (e.g. Burn,

1990; Zrinji and Burn, 1994; Kjeldsen and Jones, 2009; Haddad and Rahman, 2012).

For developing the regional flood prediction equations, the commonly used techniques

include the rational method, index flood method and quantile regression technique (QRT).

The rational method has widely been adopted in estimating design floods for small ungauged

catchments (e.g. Mulvany, 1851; I. E. Aust., 1987; Jiapeng et al., 2003; Pegram and Parak,

2004; Rahman et al., 2011). The index flood method, which relies heavily on the identification of

homogeneous regions, has been widely adopted in many countries (Dalrymple,

1960; Hosking and Wallis, 1993; Bates et al., 1998; Rahman et al., 1999; Kjeldsen and Jones,

2010; Ishak et al., 2011). The QRT, proposed by the United States Geological Survey

(USGS), has been applied by many researchers using either an Ordinary Least Square (OLS)

or Generalized Least Square (GLS) regression technique (e.g. Benson, 1962; Thomas and

Benson, 1970; Tasker, 1980; Stedinger and Tasker, 1985; Tasker et al., 1986; Madsen et al.,

1997; Pandey and Nguyen, 1999; Bayazit and Onoz, 2004; Rahman, 2005; Griffis and

Stedinger, 2007; Ouarda et al., 2008; Kjeldsen and Jones, 2009; Haddad and Rahman, 2011;

Haddad et al., 2011, 2012).

Most of the above RFFA methods assume a linear relationship between flood statistics and

predictor variables in the log domain while developing the regional prediction equations.

However, most hydrologic processes are nonlinear and exhibit a high degree of spatial

and temporal variability, and a simple log transformation cannot guarantee

linearity in modeling. Therefore, there have been applications of artificial intelligence such as

artificial neural networks (ANN), genetic algorithm based ANN (GAANN), gene expression

programming (GEP) and co-active neuro-fuzzy inference system (CANFIS) based methods in

water resources engineering such as rainfall runoff modeling and hydrologic forecasting, but

there have been relatively few studies involving the application of these techniques to RFFA

(e.g. Daniell, 1991; Muttiah et al., 1997; Shu and Burn, 2004; Kothyari, 2004; Dawson et al.,

2006; Shu and Ouarda, 2007, 2008). Importantly, there has not been any known application of

artificial intelligence based techniques in RFFA in Australia. Application of these techniques

may help develop new and improved RFFA techniques for Australia. Unlike the regression based

approach, artificial intelligence based techniques do not impose a fixed model structure

on the data; rather, the model form is identified from the data itself.

This research seeks to fill the knowledge gap in RFFA by undertaking development and

testing of artificial intelligence based RFFA models using the most extensive and

comprehensive database that has become available in Australia as a part of the on-going

revision of the Australian Rainfall and Runoff.

1.3 Need for this research

Flood is one of the worst natural disasters, causing millions of dollars of damage each year in

Australia. To reduce flood damage, accurate design flood estimates are needed to design

infrastructures such as bridges, culverts and flood protection levees. Australia is the sixth

largest country in the world with numerous streams. Most of these streams are ungauged or

poorly gauged as monitoring of such a large number of streams is too expensive. Moreover,

many of these streams are located far away from townships. The design flood estimation in

small to medium sized ungauged catchments is of great economic significance (Pilgrim and

Cordery, 1993). The need for flood estimation on ungauged catchments is one of the most

important aspects in hydrologic practice as it covers a large number of catchments where

hundreds of infrastructures are built each year in Australia. The accuracy of the flood

estimation for ungauged catchments is important as an over-estimation would result in higher

construction cost and under-estimation would increase flood damage. Hence, development of

new and more accurate RFFA techniques is important since it will help to design adequate

infrastructure that will allow passage of flood water safely.



In Australia, linear modelling techniques have been adopted so far in developing RFFA

models. The application of non-linear techniques such as artificial intelligence-based methods

in RFFA may provide a viable alternative RFFA technique for Australia. This would assist in

benchmarking the results of traditional RFFA models by comparing the results derived by

artificial intelligence based RFFA models.

The findings of this research would help to recommend the most appropriate RFFA

techniques in the 4th edition of Australian Rainfall and Runoff, which is due to be published in

2015.

1.4 Scope and objectives of the study

The study focuses on the regional flood estimation problem; in particular, it

investigates whether artificial intelligence-based RFFA techniques can be applied to eastern

Australia. It requires carrying out a critical literature review on RFFA techniques, selection of

study catchments, collation of flood, climatic and catchment characteristics data, delineation

of regions, identification of the best set of predictor variables, training and validation of

artificial intelligence-based RFFA models and comparison with other RFFA techniques.

The objectives of this study are:

To carry out a critical literature review on RFFA methods with a particular emphasis

on non-linear artificial intelligence based techniques and to identify the gaps in the

current state of knowledge and further research opportunities in applying artificial

intelligence based techniques to the regional flood estimation problem.

To select study area and catchments from eastern Australia, to collate streamflow data,

to select catchment characteristics that govern flood generation process and prepare

the climatic and catchment characteristics data set for the RFFA modelling.

To select the best performing set of predictor variables for the artificial intelligence

based RFFA models.

To form different candidate regions based on (i) state boundaries (ii) climatic and

geographical boundaries and (iii) catchment characteristics data using multivariate



statistical techniques and identify the best performing region(s) for artificial

intelligence based RFFA modelling.

To train the artificial intelligence based RFFA models based on ANN, GAANN, GEP

and CANFIS.

To validate the artificial intelligence based RFFA models using the validation data set

and select the best performing model.

To compare the best performing artificial intelligence based RFFA model with linear

quantile regression technique.

To draw conclusions based on the results obtained in the study.

1.5 Research questions

This thesis seeks to answer the following research questions in relation to the

development of artificial intelligence based RFFA models for Australia.

Can artificial intelligence based techniques be applied in RFFA in Australia?

What is the best set of predictor variables for the development of artificial intelligence

based RFFA models in Australia?

What is the best region(s) in artificial intelligence based RFFA modelling for Australia

considering regions based on state boundaries, climatic and geographical boundaries

and regions formed in catchment characteristics data space using multivariate

statistical techniques?

How can various artificial intelligence based RFFA models be trained/calibrated?

Among different artificial intelligence based RFFA models (ANN, GAANN, GEP and

CANFIS), which one provides the most accurate flood quantile estimates for Eastern

Australia?

How do artificial intelligence based RFFA models compare with the linear quantile

regression technique?



1.6 Summary of research undertaken in this thesis

The main research tasks undertaken in this thesis to answer the research questions posed in

Section 1.5 are outlined below. Figure 1.3 illustrates major steps in this research.

Perform a literature review on RFFA and critically examine advantages and

disadvantages, limitations and assumptions associated with various RFFA techniques,

with a particular emphasis on non-linear artificial intelligence based techniques. Based

on the literature review, identify the gaps in the current state of knowledge and further

research opportunities in applying non-linear artificial intelligence based techniques to

regional flood estimation.

Select study area and catchments. Prepare streamflow data by filling gaps in the

annual maximum flood series, checking for outliers, rating curve error and trends.

Select catchment characteristics that govern flood generation and prepare the climatic

and catchment characteristics data set.

Select the best performing set of predictor variables for the artificial intelligence based

RFFA models by comparing various combinations of the initially selected candidate

catchment characteristics variables.

Form different candidate regions based on (i) state boundaries (ii) climatic and

geographical boundaries and (iii) catchment characteristics data using multivariate

statistical techniques. Compare the performances of the candidate regions and select

the best performing region for artificial intelligence based RFFA modelling.

Develop artificial intelligence based RFFA models based on ANN, GAANN, GEP and

CANFIS. Train the model using the training data set (80% of the selected catchments),

which involves minimisation of the mean squared error between the observed and

predicted flood quantiles by the model (being trained) for a given ARI for the training

data set. Evaluate the training of the model based on a number of statistical criteria:

plot of predicted and observed flood quantiles, median ratio of predicted and observed

flood quantiles, median relative error and coefficient of efficiency.



Validate the artificial intelligence based RFFA models using the validation data set

(20% of the selected catchments) and select the best performing model.

Compare the best performing artificial intelligence based RFFA model with linear

quantile regression technique.
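The training and validation tasks listed above can be illustrated with a minimal Python sketch. It shows a random 80%/20% split of catchments and three of the statistical criteria named above. The exact formulas adopted in this thesis are given in later chapters; the definitions below (ratio as predicted/observed, relative error as an absolute percentage, coefficient of efficiency as one minus the ratio of squared error to the variance of the observations) are commonly used forms and are assumed here. All function names and data values are hypothetical.

```python
import random
import statistics


def split_catchments(catchment_ids, train_fraction=0.8, seed=0):
    """Randomly split catchment IDs into training and validation sets."""
    ids = list(catchment_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def median_ratio(predicted, observed):
    """Median of Q_pred / Q_obs; values near 1 suggest little overall bias."""
    return statistics.median(p / o for p, o in zip(predicted, observed))


def median_relative_error(predicted, observed):
    """Median of |Q_pred - Q_obs| / Q_obs, expressed as a percentage (assumed form)."""
    return statistics.median(
        abs(p - o) / o * 100.0 for p, o in zip(predicted, observed)
    )


def coefficient_of_efficiency(predicted, observed):
    """1 - (sum of squared errors) / (sum of squared deviations of observations)."""
    mean_obs = statistics.fmean(observed)
    sse = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# 452 catchments split 80/20, matching the proportions used in this study.
train_ids, validation_ids = split_catchments(range(452))
print(len(train_ids), len(validation_ids))

# Hypothetical flood quantiles (m^3/s) for five validation catchments.
observed = [120.0, 45.0, 310.0, 80.0, 15.0]
predicted = [100.0, 50.0, 280.0, 95.0, 12.0]
print(median_ratio(predicted, observed))
print(median_relative_error(predicted, observed))
print(coefficient_of_efficiency(predicted, observed))
```

A perfect model would give a median ratio of 1, a median relative error of 0% and a coefficient of efficiency of 1, which makes these criteria convenient for comparing candidate models on the same validation set.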

Figure 1.3 Illustration of major steps in this research

1.7 Outline of the thesis

The research undertaken in this study is presented in nine chapters and four appendices, as

outlined below.

Chapter 1 presents a brief introduction to the research, including its background.

This chapter also presents the need for this research, the research

questions being examined and the main research tasks undertaken to answer the identified

research questions.

Chapter 2 presents a critical review of RFFA techniques with a particular emphasis on non-

linear techniques such as artificial neural network (ANN), co-active neuro-fuzzy inference

system (CANFIS), genetic algorithm (GA) based ANN (GAANN) and gene-expression

programming (GEP). At the beginning, various methods of flood estimation are discussed.



A review of linear methods, including the rational method, index flood method and regression

method, is then presented. The nonlinear artificial intelligence based methods are then

discussed with a particular emphasis on their applications to hydrology. The assumptions,

limitations, advantages and disadvantages of each of the RFFA methods are discussed. The

current state of knowledge in RFFA, in particular the artificial intelligence based methods, is

ascertained and the scope for further research is identified.

Chapter 3 describes the mathematical tools adopted in this study. First, ANN is discussed,

which is followed by a description of GAANN, GEP and CANFIS. The quantile regression

technique is then discussed. The principles of cluster analysis and principal component

analysis are then presented. Finally, the adopted model validation technique is discussed.

Chapter 4 presents selection of study area, study catchments and data preparation. First,

criteria for selection of study catchments are presented. The methods of streamflow data

preparation are discussed which include gap filling, outlier detection, trend analysis and rating

curve error analysis. Selection of catchment characteristics is then presented. The preparation

of annual maximum flood series data is then described. Estimation of flood quantiles for

average recurrence intervals of 2, 5, 10, 20, 50 and 100 years for the selected gauged

catchments by at-site flood frequency analysis is then presented. Finally, a summary of the

catchment characteristics data is provided.

Chapter 5 presents the results of selecting the set of predictor variables for the development

of artificial intelligence based RFFA models. First, an initial selection is made based on the

findings of previous studies. These candidate sets of predictor variables are then evaluated

using ANN and GEP based RFFA models. The final set of predictor variables is then selected.

Chapter 6 presents the formation of regions using ANN based RFFA modelling technique.

Regions/groupings are first formed on the basis of state, geographical and climatic

boundaries. In the second step, the regions are formed in the catchment characteristics data

space based on cluster analysis and principal component analysis. All these candidate regions

are then compared and the best performing region is finally selected.

Chapter 7 presents the development of artificial intelligence based RFFA models using

ANN, GAANN, GEP and CANFIS based on the selected predictor variables in Chapter 5 and

optimum region in Chapter 6. The model development involves training of the model using



part of the randomly selected data set. For this purpose, 80% (362 catchments) of the total 452

catchments are used to train the model (training data set) and the remaining 20% (90

catchments) are used to validate the model (validation data set). A number of statistical

criteria are adopted to assess the training of the four artificial intelligence based RFFA

models.

Chapter 8 presents the validation of the artificial intelligence based RFFA models and

quantile regression technique. Initially the four artificial intelligence based RFFA models are

compared with each other to select the best artificial intelligence based RFFA model.

Secondly, the best performing artificial intelligence based RFFA model is compared with the

quantile regression technique. The spatial distribution of the relative error for the finally

selected model is evaluated. Finally, the relationship of the relative error with catchment area

is investigated.

Chapter 9 presents the summary of the research undertaken in this thesis, conclusions and

recommendations for further research.

Appendix A presents the list of the study catchments. This provides the area of each

catchment and the period of streamflow records.

Appendix B presents additional results to supplement the discussion presented in the main

body of the thesis.



CHAPTER 2

REVIEW OF REGIONAL FLOOD FREQUENCY

ANALYSIS METHODS

2.1 General

Regional flood frequency analysis (RFFA) is the generic name given to describe techniques

which utilise data from gauged catchments (donor) in a region to estimate design floods for

poorly gauged and ungauged catchments (receiver). There are many RFFA techniques

ranging from simple approximate methods to complex artificial intelligence based techniques. The

rational method is based on runoff coefficients, which are developed and

applied on the principle of geographical contiguity. The index flood method is based on the concept

of homogeneous regions which share a common set of growth factors, while regression based

approaches are based on regional prediction equations. These methods are generally

developed based on linear models; however, there are non-linear RFFA methods that are

based on artificial intelligence such as artificial neural network (ANN). This chapter presents

a review of various RFFA methods, in particular the non-linear intelligence based techniques,

with a particular emphasis on the limitations of various methods, recent advancements and

scope for further developments.

2.2 Design flood estimation methods

Different methods can be used to estimate a design flood for a given annual exceedance

probability (AEP) or average recurrence interval (ARI) or return period (T). The ARI of the

annual peak streamflow at a given location changes if there are significant changes in the flow

patterns at that location, possibly caused by an impoundment or diversion of flow. The effect

of development (change of land use from forested or agricultural uses to commercial,

residential and industrial uses) on peak flows is generally much greater for low ARIs than for

higher ones. During larger floods, the upper soil column is generally fully saturated

and does not have the capacity to absorb much additional rainfall. Under these conditions,

essentially all of the rain that falls, whether on paved surfaces or on saturated soil, runs off

and becomes streamflow. The selection of a type of flood estimation method for a given



application largely depends on the data availability and the purpose of the flood estimates

(Hoang, 2001). Lumb and James (1976), Feldman (1979), James and Robinson (1986) and

Australian Rainfall and Runoff (ARR) (I. E. Australia, 1987) classified design flood

estimation methods into two broad categories: streamflow-based methods and rainfall-based

methods. These are discussed below and illustrated in Figure 2.1.

2.2.1 Streamflow-based flood estimation methods

Streamflow-based flood estimation methods base the analysis entirely on recorded data

from the stream-gauging station in question and are applicable to gauged catchments with a

reasonably long streamflow record. In these methods, the design floods for a given

AEP are estimated by undertaking a flood frequency analysis (FFA) of the observed

streamflow data. In this context, a gauged catchment means that streamflow records exist for

flood height and flood flow over a considerable period of time, normally 20 years or longer at

the location of interest so that the parameters of the assumed probability distribution can be

estimated with a reasonably high degree of confidence. The gauging locations are generally

found within a given large catchment and located at points of interest such as the

convergence of two major creeks or the outlet of the catchment. FFA and regional flood

frequency analysis (RFFA) are the most common streamflow-based methods and these are

discussed below. It should be noted that RFFA methods generally consider catchment

characteristics in estimation; however, FFA is solely dependent on streamflow records.

Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998)



Flood frequency analysis (FFA)

Flood frequency analysis (FFA) is a procedure of analysing the recorded flood data by

adopting statistical methods. Statistical techniques, such as FFA, are used to estimate the

AEP of flood or rainfall events. The ARI gives a general indication of how frequently a given

discharge/rainfall will be exceeded on average over a long period of time. The main

objective of this statistical analysis is to develop a relationship between the magnitude of

extreme flood events and their frequency of occurrence through the use of probability

distributions (Chow et al., 1988). For the analysis to be of practical use, simpler distributions

are often used to characterise the relation between flood magnitudes and their frequencies

(Rao and Hamed, 2000). This deals mainly with direct frequency analysis, where a record of

floods at or near the design site is available. The application of these methods is primarily

made to flood peaks. These may sometimes be applied to flood volumes or even monthly

maximum floods; however, little evidence is available on appropriate types of probability

distributions in these cases (I. E. Australia, 1998). In terms of using the flood data, annual

maximum flood data is more frequently adopted in FFA than the partial series flood data.
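As a rough illustration of at-site FFA, the sketch below fits a probability distribution to an annual maximum flood series and reads off flood quantiles. It is a minimal example under stated assumptions, not the procedure adopted in this thesis: the Gumbel (EV1) distribution and the method of moments are chosen only for brevity, the AEP is taken as 1/ARI for simplicity, and the flood series is hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Estimate Gumbel location and scale from the sample mean and standard deviation."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, aep):
    """Flood magnitude exceeded with annual probability `aep` (0 < aep < 1)."""
    if not 0.0 < aep < 1.0:
        raise ValueError("aep must lie strictly between 0 and 1")
    return location - scale * math.log(-math.log(1.0 - aep))


# Hypothetical annual maximum flood series (m^3/s), 20 years of record.
annual_maxima = [
    120.0, 85.0, 310.0, 60.0, 145.0, 220.0, 95.0, 410.0, 75.0, 180.0,
    130.0, 55.0, 260.0, 105.0, 340.0, 90.0, 160.0, 200.0, 70.0, 115.0,
]

location, scale = fit_gumbel_moments(annual_maxima)

# Assumption: AEP = 1 / ARI, used here only to keep the example short.
quantiles = {
    ari: gumbel_quantile(location, scale, 1.0 / ari)
    for ari in (2, 5, 10, 20, 50, 100)
}
for ari, q in quantiles.items():
    print(f"ARI {ari:>3} years: {q:.0f} m^3/s")
```

With only 20 years of record, the quantiles for the larger ARIs are extrapolations well beyond the data, which is one reason regional information is used to support at-site estimates.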

Regional flood frequency analysis (RFFA)

Regional flood frequency analysis (RFFA) is a means of transferring flood frequency

information from gauged catchments to another site on the basis of similarity in catchment

characteristics (I. E. Aust., 1987). This procedure is important for estimating design floods at

ungauged sites as this can stabilise site estimates using the regional relationships, particularly

for parameters such as skew, which is more prone to small-sample errors and data extremes.

In addition, regional relationships can mitigate the effects of outliers and can lead to more

reliable extrapolation of the flood frequency curve to rarer frequencies. Although RFFA is more

commonly applied to ungauged catchments, it can also be adopted to enhance the design

flood estimates at gauged sites where the record length is limited.

The use of RFFA enables the transfer of flood characteristics information from gauged to

ungauged sites if the donor catchments are hydrologically similar to the receiver ungauged

site. The last couple of decades have seen extensive research on RFFA, aimed at

developing new and more reliable techniques for flood estimation. Because of the vast field of

study and the diversity of climatic conditions and site characteristics, different researchers have

emphasized different issues relevant to RFFA. In the seventies and early eighties much

effort was spent on developing efficient at-site FFA procedures, but the late eighties proved to be

of considerable significance in developing new and improved RFFA techniques (e.g. Greis and

Wood, 1983; Potter, 1987; Kirby and Moss, 1987; Cunnane, 1987; NRC, 1988; and WMO,

1989). In the late eighties, many suggested comparing the existing RFFA methods

and seeking better information/data instead of developing new methods (Potter, 1987;

Bobee et al., 1993a).

Many RFFA methods involve two major steps: (1) grouping of sites into homogeneous

regions, and (2) developing a regional estimation method. The homogeneity of the regions formed

is the main factor governing the performance of many regional estimation

methods, in particular the index flood method. Geographically contiguous regions have been

used for a long time in hydrology, but have been criticised for being arbitrary. In

fact, geographical proximity does not guarantee hydrological similarity. During the last

five to ten years researchers have attempted to develop methods in which similarity between

sites is defined in a multidimensional space of catchment or statistical characteristics

(Douglas, 1995).

RFFA is needed to estimate design floods at locations where there is a lack of sufficient

recorded flood data. Recorded flood data are insufficient at many locations because it is

expensive to operate stream gauges and many streams are in remote locations.

Regional analyses can, to some extent, compensate for the lack of temporal data, but introduce

a spatial dimension which is not always well understood. Classical flood frequency analysis,

be it at-site or regional, has been criticised for lacking balance, for putting too much emphasis

on mathematical rigor while completely neglecting the understanding of the physical factors

that cause flood events (Klemes, 1993). According to Klemes (1993), “If more light is to be

shed on the probabilities of hydrological extremes then it will have to come from more

information on the physics of the phenomena involved, not from more mathematics.” This is a

fact which is difficult to argue against. RFFA, in particular the identification of the physical or

meteorological catchment characteristics that cause similarity in flood response, is a step in

the right direction (e.g. Bates et al., 1998).

2.3 Techniques for RFFA

2.3.1 Linear techniques

Three linear RFFA methods are very common and are currently in use in most parts of the

world:



Rational method;

Index flood method; and

Regression method.

Rational method

The rational method is a simple technique for estimating a design discharge from a small

watershed. The rational method was developed by Mulvany (1851) for small drainage basins

in urban areas. This method has been widely regarded as a deterministic method for

estimating the peak discharge from an individual storm. In Australian Rainfall and Runoff

(ARR), a probabilistic form of the rational method, known as the Probabilistic Rational Method, has

been recommended (I. E. Aust., 1987). Application of the rational method is based on a

simple formula that relates peak discharge with the average intensity of rainfall for a

particular length of time (the time of concentration), and catchment area. The formula is:

$Q_Y = 0.278\, C_Y\, I_{t_c,Y}\, A$ (3.1)

where

$Q_Y$ = peak discharge (m$^3$/s) for an average recurrence interval (ARI) of Y years;

$C_Y$ = runoff coefficient (dimensionless) for an ARI of Y years;

$A$ = area of catchment (km$^2$); and

$I_{t_c,Y}$ = average rainfall intensity (mm/h) for a design duration of $t_c$ hours and an ARI of Y years.
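A short worked example may help show how the formula is applied. The factor 0.278 (approximately 1/3.6) converts the product of rainfall intensity in mm/h and area in km$^2$ into m$^3$/s. The sketch below is illustrative only; the function name and the input values are hypothetical and are not taken from any catchment in this study.

```python
def rational_peak_discharge(runoff_coeff, intensity_mm_per_h, area_km2):
    """Peak discharge (m^3/s) from the rational formula Q = 0.278 * C * I * A."""
    if runoff_coeff < 0 or intensity_mm_per_h < 0 or area_km2 < 0:
        raise ValueError("inputs must be non-negative")
    return 0.278 * runoff_coeff * intensity_mm_per_h * area_km2


# Hypothetical inputs: C = 0.5, I = 20 mm/h over the time of concentration, A = 10 km^2.
q10 = rational_peak_discharge(
    runoff_coeff=0.5, intensity_mm_per_h=20.0, area_km2=10.0
)
print(f"Peak discharge = {q10:.1f} m^3/s")
```

For these inputs the estimate is about 27.8 m$^3$/s. The result scales linearly with each input, so any uncertainty in the runoff coefficient carries through directly to the estimated peak discharge.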

This model is based on the following assumptions:

The rainfall occurs uniformly over the drainage area;

The peak rate of runoff can be reflected by the rainfall intensity averaged over a time

period equal to the time of concentration of the drainage area; and

The frequency of runoff is the same as the frequency of the rainfall used in the

equation.

The use of the rational formula is subject to several limitations and procedural issues:



The most important limitation is that the only output from the method is a peak

discharge (the method provides only an estimate of a single point on the runoff

hydrograph).

The simplest application of the method permits, and indeed requires, a wide latitude of

subjective judgment by the user. Therefore, the results are difficult to

replicate.

The average rainfall intensities used in the formula have no time sequence relation to

the actual rainfall pattern during the storm.

The computation of tc should include the overland flow time, plus the time of flow in

open and/or closed channels to the point of design.

The runoff coefficient, $C_Y$, is usually estimated from a map of runoff coefficients, which is

produced on the assumption of geographical contiguity, i.e. that the runoff coefficients of nearby

catchments vary in a smooth fashion. This assumption is unlikely to be satisfied as

there is no guarantee that two nearby catchments are hydrologically similar.

Many users assume the entire drainage area is the value to be entered in the rational

method equation. In some cases, the runoff from only the interconnected impervious

area yields the larger peak flow rate.

In Australia, the Probabilistic Rational Method has been researched by Pilgrim and

McDermott (1982), Adams (1987), Weeks (1991), Rahman et al. (2008; 2011) and Pirozzi

et al. (2009). There has been limited independent validation of the Probabilistic Rational

Method and the user has little idea about the uncertainty in the estimated flood quantiles

obtained from this method (Rahman and Hollerbach, 2003).

There have been few attempts to improve the rational method using a more advanced

statistical treatment (e.g. Franchini et al., 2005).

Index flood method

The index flood method (IFM), introduced by Dalrymple (1960), is the most widely used

method of RFFA. It is based on the identification of a homogeneous region, within which the

probability distribution of annual maximum peak flows is invariant except for a scale factor

represented by the index flood (either the mean or median flood). Homogeneity with regard



to the index flood relies on the concept that the standardized flood peaks from individual sites

in the region follow a common probability distribution with identical parameter values. Of

all the methods discussed in this thesis, this approach involves the strongest assumption

on homogeneity.

The flood peak discharge with an assigned return period T at a selected site is

expressed as the product of two terms: the scale factor of the site (the index

flood) and the dimensionless growth factor, which has regional validity, i.e. it is fixed within a

region. In general, it is assumed that the index flood is the average of annual maximum flood

peak flows at the site of interest. For an ungauged site, the index flood is estimated from a

regional prediction equation that uses climate and catchment characteristics as predictor

variables.
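The product form described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in any of the cited studies: the regional growth factor is taken here as the plain average of the at-site ratios of flood quantile to index flood, and all names and numbers are hypothetical.

```python
def regional_growth_factors(site_quantiles, site_index_floods):
    """Average the dimensionless ratios Q_T / index flood across gauged sites."""
    aris = list(site_quantiles[0].keys())
    n_sites = len(site_quantiles)
    return {
        ari: sum(
            q[ari] / idx for q, idx in zip(site_quantiles, site_index_floods)
        ) / n_sites
        for ari in aris
    }


def index_flood_estimate(index_flood, growth_factors):
    """Q_T = index flood (site scale factor) x regional growth factor."""
    return {ari: index_flood * g for ari, g in growth_factors.items()}


# Hypothetical at-site flood quantiles (m^3/s), keyed by ARI in years.
site_quantiles = [
    {2: 50.0, 10: 120.0, 100: 250.0},
    {2: 20.0, 10: 50.0, 100: 110.0},
    {2: 110.0, 10: 240.0, 100: 520.0},
]
# Hypothetical index floods (mean annual maximum flood, m^3/s) at the same sites.
site_index_floods = [60.0, 25.0, 125.0]

growth = regional_growth_factors(site_quantiles, site_index_floods)

# For an ungauged site, the index flood would come from a regional prediction
# equation; a hypothetical value of 40 m^3/s is assumed here.
ungauged = index_flood_estimate(index_flood=40.0, growth_factors=growth)
for ari, q in ungauged.items():
    print(f"ARI {ari:>3} years: growth factor {growth[ari]:.2f}, Q = {q:.0f} m^3/s")
```

The sketch makes the homogeneity assumption visible: every site in the region, gauged or ungauged, shares the same growth factors, and only the index flood differs from site to site.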

The literature contains numerous studies on the identification of homogeneous groups of

catchments and the estimation of the growth factor (Reed et al., 1999; Burn and Goel, 2000;

Castellarin et al., 2001), and relatively few on estimating the index flood. Recent studies in

Australia (Bates et al., 1998; Rahman et al., 1999) assigned ungauged catchments to a

particular homogeneous group, identified through the use of L-moments (Hosking and Wallis,

1993), on the basis of catchment and climatic characteristics as opposed to geographical

proximity. However, a deficiency of this approach is that it requires 12

catchment/climatic descriptors. Therefore, its practical use is somewhat limited by

its complexity and the time needed to gather the relevant data. Internationally, Fill

and Stedinger (1998) and Jeong et al. (2008) both demonstrated that the IFM can provide

improved quantile estimates when different sources of error, such as sampling

error and error due to inter-station correlation, are reduced. As Australian hydrology is extremely diverse

and there is greater heterogeneity among catchments, the use of the IFM in Australia is limited

(Bates et al., 1998), as results obtained through the IFM would be subject to substantial error.

Therefore, a method is needed for Australia in which the assumption of homogeneity can be

relaxed and where heterogeneity can be accounted for by capturing the variability from site to

site within a region. Such an approach is the quantile regression technique, which is discussed

below.

Australian Rainfall & Runoff (ARR) (I. E. Aust., 1987) did not favour the IFM as a design

flood estimation technique. The IFM has been criticised on the basis that the coefficient of

variation of the flood series may vary approximately inversely with catchment area, thus

resulting in flatter flood frequency curves for larger catchments. This had particularly been



noticed in the case of humid catchments that differed greatly in size (Dawdy, 1961; Benson,

1962; Riggs, 1973; Smith, 1992).

L-moments based index flood methods have been widely researched in recent years (e.g.

Bates et al., 1998; Rahman et al., 1999; Zhang and Hall, 2004; Saf, 2009).

Regression method

The quantile regression technique (QRT) for flood estimation was proposed by the United

States Geological Survey (USGS). In this method a large number of gauged catchments are

selected from a region and flood quantiles are estimated from recorded streamflow data,

which are then regressed against climatic and catchment variables that are most likely to

govern the flood generation process. Studies by Benson (1962) suggested that the T-year flood

quantile could be estimated directly from catchment characteristics data by multiple

regression analysis. Unlike the index flood approach, this method is not based on a constant

coefficient of variation (Cv) of annual maximum flood series in the region. It has been noted

that the method can give design flood estimates that do not vary smoothly with T; however,

hydrological judgment can be exercised in situations such as these when flood frequency

curves need to be adjusted to increase smoothly with T.
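
The regression step of the QRT can be sketched as follows. This is a minimal illustration under stated assumptions: a log-linear (power law) model form with only two predictors (catchment area and a rainfall variable), hypothetical function names, and synthetic data generated from a known power law. It is not the model form or data set adopted in this study.

```python
import numpy as np

def fit_qrt_ols(area, rainfall, q_t):
    """Fit log10(Q_T) = b0 + b1*log10(area) + b2*log10(rainfall) by OLS."""
    X = np.column_stack([np.ones(len(area)), np.log10(area), np.log10(rainfall)])
    y = np.log10(q_t)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_qrt(coeffs, area, rainfall):
    """Apply the fitted equation to an (ungauged) catchment."""
    log_q = coeffs[0] + coeffs[1] * np.log10(area) + coeffs[2] * np.log10(rainfall)
    return 10.0 ** log_q

# Synthetic 'gauged' catchments: quantiles follow Q = 0.5 * A^0.7 * I^1.2 exactly.
area = np.array([10.0, 50.0, 120.0, 300.0, 800.0, 950.0])
rain = np.array([30.0, 45.0, 25.0, 60.0, 40.0, 55.0])
q20 = 0.5 * area ** 0.7 * rain ** 1.2

coeffs = fit_qrt_ols(area, rain, q20)
q20_ungauged = predict_qrt(coeffs, area=200.0, rainfall=50.0)
```

In practice a separate equation of this kind is fitted for each return period T, which is why the resulting estimates are not guaranteed to increase smoothly with T.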

The regression coefficients in the QRT are generally estimated by two methods:

Ordinary least squares approach (OLS)

Generalised least squares approach (GLS)

The OLS approach has traditionally been used by hydrologists to estimate the regression

coefficients in regional hydrological models. However, for the OLS model to be statistically

efficient and robust, the annual maximum flood series in the region must be uncorrelated, all

the sites in the region should have equal record length and all estimates of T-year events

should have equal variance. Since the annual maximum flow data in a region do not

generally satisfy these assumptions, the assumption that the model residual errors in OLS are

homoscedastic is violated and the OLS approach can provide distorted estimates of the

model’s predictive precision (model error) and the precision with which the regression model

coefficients are estimated (Stedinger and Tasker, 1985).

Stedinger and Tasker (1985) proposed the GLS procedure to overcome the above mentioned

problem with the OLS. This approach can be used to estimate the parameters of regional



hydrologic regression models and can produce more accurate results than the OLS, in

particular when the record length varies widely from site to site. In the GLS model, the

assumptions of equal variance of the T year events and zero cross-correlation for concurrent

flows are relaxed. Since its inception, a number of studies (e.g. Tasker,

1980; Kuczera, 1983; Tasker et al., 1986; Rosbjerg and Madsen, 1995; Madsen et al., 1997;

Pandey and Nguyen, 1999; Bayazit and Onoz, 2004; Griffis and Stedinger, 2007; and

Kjeldsen and Jones, 2009) have dealt with the QRT in a GLS regression framework. All

of these studies have looked at ways of minimising uncertainty in flood quantile estimation.
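
The difference between the OLS and GLS estimators can be sketched with the standard generalised least squares formula. The example below is illustrative only: the error covariance matrix is a hypothetical one in which the sampling variance shrinks with record length, the cross-correlation terms are set to zero for brevity, and the function name is an assumption rather than part of any published GLS package.

```python
import numpy as np

def gls_coefficients(X, y, error_cov):
    """Return beta = (X' L^-1 X)^-1 X' L^-1 y for error covariance matrix L."""
    L_inv_X = np.linalg.solve(error_cov, X)
    L_inv_y = np.linalg.solve(error_cov, y)
    return np.linalg.solve(X.T @ L_inv_X, X.T @ L_inv_y)

# Four hypothetical sites: intercept plus one predictor.
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Assumed covariance: constant model error variance plus a sampling variance
# that decreases with record length. Non-zero off-diagonal terms would be
# added here to represent cross-correlated concurrent flows.
record_years = np.array([10, 25, 40, 60])
error_cov = 0.01 * np.eye(4) + np.diag(0.5 / record_years)

beta_gls = gls_coefficients(X, y, error_cov)
beta_ols = gls_coefficients(X, y, np.eye(4))  # identity covariance reduces to OLS
```

With an identity covariance matrix the estimator reduces to OLS, which makes explicit the equal-variance, zero-correlation assumption that the GLS procedure relaxes.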

Regression-based methods for estimating flood quantiles have been a focus in Australia in

recent years, for example, the quantile regression technique (Rahman, 2005; Haddad et al., 2006,

2008, 2009, 2014) and the parameter regression technique (Hackelbusch et al., 2009; Haddad and

Rahman, 2012).

Different regional flood estimation methods have been proposed for different parts of

Australia (I. E. Aust., 1987, 2001). Among these, various forms of the rational method and the

index flood method are the most common. However, these methods have not been updated

since 1987. Because of changing climatic conditions and improvements in regional flood

estimation methods in recent years, there is a need to look for new regional flood estimation

techniques for different parts of Australia. Some of the recent developments in regional flood

estimation in Australia include the L-moments based index flood method (Bates et al., 1998;

Rahman et al., 1999) and various forms of regression techniques (Rahman, 2005; Haddad et al.,

2006, 2008, 2009; Hackelbusch et al., 2009).

Most of the above RFFA methods assume a linear relationship between flood statistics and

predictor variables. However, most hydrologic processes are nonlinear and exhibit a

high degree of spatial and temporal variability. Non-linear methods such as artificial

neural network (ANN), adaptive neuro fuzzy inference system

(ANFIS), co-active neuro fuzzy inference system (CANFIS), gene expression programming

(GEP), genetic algorithm (GA) and genetic algorithm based artificial neural network

(GAANN) have been applied in hydrology in different parts of the world. However, there has not been any

notable application of these techniques to the RFFA problem in Australia. Application of

nonlinear techniques may help in developing new and improved regional flood estimation methods

for Australia. Unlike regression-based approaches, these techniques do not impose any fixed model

structure on the data; rather, the data themselves identify the model form through the use of artificial

intelligence. Various nonlinear RFFA methods are discussed below.



2.3.2 Non-linear RFFA techniques

a) Artificial neural network (ANN)

An ANN is a mathematical or computational model that simulates the structure and/or

functional aspects of biological neural networks. Structurally, it is an interconnected group

of artificial neurons that process information using a connectionist approach to computation.

In most cases, an ANN is an adaptive system that changes its structure based on external or internal

information that flows through the network during the learning phase. An important aspect of

an ANN is its ability to model complex relationships between inputs and outputs or to find

patterns in data.

The development of ANN began approximately 70 years ago (McCulloch and Pitts, 1943),

inspired by a desire to understand the human brain and emulate its functioning. Within the last

decade, it has experienced a huge resurgence due to the development of more sophisticated

algorithms and the emergence of powerful computation tools. Extensive research has been

devoted to investigate the potential of ANN as computational tools that acquire, represent, and

compute a mapping from one multivariate input space to another (Wasserman, 1989).

The development of ANN techniques experienced a renaissance in the 1980s due

to the efforts of Hopfield (1982) in iterative auto-associative neural networks. A tremendous

growth in interest in this computational mechanism has occurred since Rumelhart et al.

(1986) rediscovered a mathematically rigorous theoretical framework for neural networks,

i.e., the back-propagation algorithm. Consequently, ANN has so far been applied to various fields

such as neurophysiology, physics, biomedical engineering, electrical engineering, computer

science, acoustics, cybernetics, robotics, image processing and finance.
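
The back-propagation algorithm referred to above can be sketched for a network with a single hidden layer. The following is a minimal illustration, assuming a tanh hidden layer, a linear output neuron, a mean squared error loss and plain batch gradient descent; the function name, learning rate and toy data are hypothetical and do not represent the network configuration used in this study.

```python
import numpy as np

def train_mlp(X, y, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network by gradient descent with back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: inputs -> hidden activations -> output.
        h = np.tanh(X @ W1 + b1)
        out = h @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error gradient from output to input.
        d_out = 2.0 * err / len(X)
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # derivative of tanh is 1 - tanh^2
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

# Toy nonlinear mapping: y = x^2 on [-1, 1].
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = X ** 2
params, losses = train_mlp(X, y)
```

The list `losses` records the training error at each epoch, which is the quantity normally monitored when choosing a stopping criterion.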

In the early nineties, ANN was applied successfully in hydrology. Initially, it was used

for rainfall-runoff modelling, streamflow forecasting, groundwater modelling, water quality,

water management policy, precipitation forecasting, hydrologic time series modelling and

reservoir operations.

Application of ANN in hydrology

Most hydrologic processes are highly nonlinear and exhibit a high degree of spatial and

temporal variability. They are further complicated by uncertainty in parameter estimates.

Hydrologists are often confronted with problems of prediction and estimation of quantities

such as runoff, precipitation, contaminant concentrations, and water stages. This kind of



information is required in hydrologic and hydraulic engineering design as well as water

resources management (ASCE, 2000).

Application of neural networks in hydrological modeling was inspired by the work on

forecasting mapping (predictors) for chaotic dynamic systems (Farmer and Sidorowich,

1987). It followed a theorem, proven by Takens et al. (1981), that there exists a

smooth function that is a predictor of a dynamic system featuring an attractor with a

finite fractal dimension.

The Task Committee on Application of ANN in Hydrology by ASCE (2000) stated that ANN

would have to be classified as empirical models. This approach is called a ‘‘model’’ as it has

many features in common with other modelling approaches in hydrology. Empirical models

treat hydrologic systems (such as a watershed) as a black-box and try to find a relationship

between historical inputs (rainfall, temperature, etc.) and outputs (such as watershed runoff

measured at a stream gauge). Lumped catchment models fall under this category (Blackie and

Eeles, 1985). These methods need long historical records and have no physical basis and, as

such, are not applicable to ungauged catchments. It was suggested that physical

understanding can be useful in selecting the appropriate neural network (ASCE, 2000). As

ANN is a heavily data-based technique, the committee suggested that optimal data may be

provided with limitation and with certain conditions based on existing sites.

An improvement over these kinds of models is offered by geomorphology-based models (e.g., Gupta

and Waymire, 1993; Corradini and Singh, 1985). These models represent the watershed

structure and the stream network well, but various assumptions concerning the linearity of

response of individual watershed units (streams and overland sections) need to be made.

ANN has been used in many rainfall and runoff forecasting applications. For example, Luk et

al. (2001) used an ANN model for rainfall forecasting in Australia. They identified

three types of ANN suitable for this application: multilayer feedforward neural network

(MLFN), Elman partial recurrent neural network (Elman) and time delay neural network

(TDNN). They found that these ANN models can make reasonable forecasts of rainfall one

time step (15 minutes) ahead for 16 gauges concurrently.

A different ANN approach was adopted by Zhang and Govindaraju (2003), who

applied geomorphology-based ANN (GANN) for estimation of direct runoff over watersheds

in Indiana, US. They concluded that GANN offers a promising step towards

elevating ANN from purely empirical models to models that are based on

geomorphology. In their analysis, they found GANN to outperform geomorphologic

instantaneous unit hydrograph (GIUH) models.

Abrahart and See (2007) concluded that the power of an ANN model depends on the

reduced set of inputs. Chokmani et al. (2008) compared the results from ANN and

multiple regression techniques for ice-affected streamflow estimation in Canada. They used

nine different variables as inputs and found ANN to outperform the regression

techniques.

There have been some applications of ANN models in RFFA. Muttiah et al. (1997) used ANN

for 2-year flood prediction in USA catchments. For each gauging station, the two-year

peak discharge, drainage area, basin elevation, and average slope were extracted from the file

(containing 150 variables for each gauging station) for statistical and neural network analysis.

They concluded that ANN can provide reasonable estimates of Q2 discharge with simpler

variable input (input vector reductions) requirements. They used a set of data from different

catchments in the USA. Kothyari (2004) used ANN for flood estimation in ungauged catchments

in India. He selected data from 97 catchments spread over a large part of India, with areas

ranging from 14.5 km² to 935,000 km². He considered five different catchment characteristics

as predictor variables including mean annual flood discharge, area, slope of catchment,

rainfall and vegetation cover. He compared two scenarios: scenario 1 with 12 neurons in the

hidden layer and scenario 2 with only 1 neuron in the hidden layer. He found that scenario 2

provided the best results with minimum error and best R² values for training, validation and

testing data sets. He also reported that an ANN model having a more complex architecture

than the one used in scenario 1 did not produce any better results. He suggested that the

results from ANN models can be improved if the region is based on hydro-meteorological

similarity.

Dawson et al. (2006) applied the ANN using different site descriptors for flood estimation at

ungauged catchments in UK. They found that ANN could be used to estimate flood statistics

for ungauged catchments quite successfully. While ANN had been trained in their study to

model T-year flood magnitudes derived from the Gumbel distribution, they could just as

easily be trained to model floods derived from any other distribution. Although it would have

been possible to use conventional statistical approaches to build models for predicting T-year

flood events, the ANN proved to be superior in their study. However, there were a few

caveats to be noted. Firstly, the ANN was heavily data dependent. This was highlighted by



improvements in skill achieved by training ANN on the full available data set instead of a

limited (urban) data set. Secondly, the ANN could not explicitly account for physical

processes, reducing confidence in model predictions. Finally, despite limiting the analysis to

those sites that had at least ten years of record, the limited data at certain sites meant that

some T-year flood events and index floods could be grossly under- or over-estimated. This

was exacerbated when the data included periods of long-term drought or above average long

term rainfall. In those cases, the ANN might be predicting the T-year flood event accurately

but, with only limited observed data, evaluation of skill could be problematic. Dawson et al.

(2006) recommended the partitioning of data on the basis of size, geology and climatic

conditions. They also recommended the application of other ANN models like radial basis

function networks and support vector machines.

Turan and Yurdusev (2009) applied feedforward backpropagation neural networks,

generalized regression neural networks and fuzzy logic to estimate unmeasured data using the

data of four runoff gauging stations on the Birs River in Switzerland. The performance of

these models was measured by the mean square error, coefficient of determination and

coefficient of efficiency to choose the best-fit model. Of the above-mentioned techniques,

they observed that the feedforward backpropagation (FFBP) model should be

selected over the other models for predicting station flows. Based on the

findings of this study, it was concluded that the best method should be sought to model river

flows based on the flow values of the rivers considered as the specific characteristics of the

basin which feeds the river and the climatic conditions which may vary year by year. Such

exercises may be useful in practice to estimate the missing values of a downstream station

from those of upstream stations.
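
The three performance measures named above can be written compactly. The sketch below assumes that the coefficient of determination is the squared Pearson correlation between observed and simulated flows and that the coefficient of efficiency is the Nash-Sutcliffe form; the cited study may have used different definitions, and the function names and flow values are hypothetical.

```python
import numpy as np

def mean_square_error(obs, sim):
    """Average squared difference between observed and simulated values."""
    return float(np.mean((obs - sim) ** 2))

def coefficient_of_determination(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)

def coefficient_of_efficiency(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    residual = np.sum((obs - sim) ** 2)
    spread = np.sum((obs - np.mean(obs)) ** 2)
    return float(1.0 - residual / spread)

# Hypothetical observed and simulated flows (m^3/s).
observed = np.array([1.0, 2.0, 3.0, 4.0])
simulated = np.array([1.2, 1.9, 3.1, 3.7])

mse = mean_square_error(observed, simulated)
r2 = coefficient_of_determination(observed, simulated)
ce = coefficient_of_efficiency(observed, simulated)
```

A model that simply predicts the observed mean scores zero on the coefficient of efficiency, which gives a useful reference point when comparing candidate models.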

The application of ANN requires careful consideration, as highlighted by Maier and Dandy

(2000), who reviewed 43 papers dealing with the use of ANN

models for the prediction and forecasting of water resources variables. They found that in all

but two of the papers reviewed, feedforward networks were used. The vast majority of these

networks were trained using the backpropagation algorithm. They mentioned that issues in

relation to the optimal division of the available data, data pre-processing and the choice of

appropriate model inputs were seldom considered. In addition, the process of choosing

appropriate stopping criteria and optimising network geometry and internal network

parameters was generally described poorly or carried out inadequately. All of the above



factors could result in non-optimal model performance and an inability to draw meaningful

comparisons between different models.

However, one limitation of ANN is that, like other empirical methods, they are unable to

reliably extrapolate beyond the range of the data used for model calibration (Flood and

Kartam, 1994; Minns and Hall, 1996; Tokar and Johnson, 1999). This well-known limitation

of data-driven models is primarily because they are not based on the underlying physics.

Physically-based models tend to perform better at model extrapolation for inputs that are

outside of the range of those used in the calibration data as the mass and energy constraints

they comply with may still result in an appropriate response. Accordingly, it can be very

difficult to determine when data-driven models, such as ANN, will fail to generalize and to

understand the range of applicability of the model. This is true for all the RFFA techniques

e.g. index flood method, rational method and regression based methods.

ANN has been used in various parts of the world; however, the application of ANN to RFFA

is very limited. In Australia, ANN has been used in hydrological problems other

than RFFA, but to the author’s knowledge, there is no notable ANN-based RFFA study in

Australia.

a) Genetic algorithm based artificial neural network (GAANN)

The genetic algorithm (GA) was invented by John Holland during the 1960s and 1970s (Holland,

1975) and was later popularized by one of his students, who solved a difficult

problem involving the control of gas pipeline transmission for his dissertation (Goldberg,

1989). The concept of GA evolved from the biological evolutionary process. The major

difference between GA and the classical optimization search techniques is that the GA works

with a population of possible solutions; whereas, the classical optimization techniques work

with a single solution (Jain et al., 2005). GA is based on the Darwinian-type survival of the

fittest strategy, whereby potential solutions to a problem compete and mate with each other in

order to produce increasingly stronger individuals. Each individual in the population

represents a potential solution to the problem that is to be solved and is referred to as a

chromosome (Rooji et al., 1996).
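
The population-based search described above can be sketched as a minimal real-coded GA. The choices made here (tournament selection, one-point crossover, Gaussian mutation, elitism, and all parameter values) are illustrative assumptions and not the configuration adopted in this study; the function name is hypothetical.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, generations=50,
                      crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimise `fitness` over real-valued chromosomes of length `n_genes`."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness)                  # lower value means fitter
        best_history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]                 # elitism: keep the fittest unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of three random individuals.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            child = p1[:]
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Gaussian mutation applied gene by gene.
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return best, best_history

def sphere(x):
    """Simple test function with its minimum of zero at the origin."""
    return sum(g * g for g in x)

best, history = genetic_algorithm(sphere, n_genes=3)
```

Because the fittest chromosome is copied unchanged into each new generation, the best fitness recorded in `history` can never get worse from one generation to the next.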

A number of selection techniques have been developed by various researchers, such as ‘roulette

wheel’ (Holland, 1975), ‘stochastic universal sampling’ (Baker, 1987), ‘sigma scaling or

truncation’ (Goldberg, 1989), ‘Boltzmann selection’ (de la Maza and Tidor, 1993), ‘rank

selection’ (Baker, 1985) and ‘tournament selection’ (Goldberg and Deb, 1991); however, their

success and utility depend upon the nature of the problem in hand.
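
As an illustration of one of these schemes, roulette wheel selection can be sketched as follows. The sketch assumes non-negative fitness values where a larger value is better; the function name and the example population are hypothetical.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(1)
population = ["a", "b", "c"]
fitnesses = [1.0, 3.0, 6.0]
picks = [roulette_wheel_select(population, fitnesses, rng) for _ in range(1000)]
```

Over many draws the selection frequencies approach the fitness shares, so fitter individuals are favoured without weaker ones being excluded outright.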

Application of GA in hydrology

In the fields of hydrology and water resources, although GA techniques have been used

widely to solve a number of water resources problems (Wang, 1991; Franchini, 1996;

Franchini and Galeati, 1997; Savic et al., 1999; Khu et al., 2001; Cheng et al., 2002), the

combined use of GA and ANN, i.e. GAANN, has not yet attracted much attention from researchers.

The probable reason is that the backpropagation (BP) algorithm is much simpler

and easier to understand than GA; hence, most of the ANN applications in the literature used the

backpropagation algorithm. GA and ANN hybrid applications in the water resources field are

limited. In one such hybrid application, Jain and Srinivasulu (2004) demonstrated that

GA is better than BP for training an ANN model to predict daily flows more accurately.

Morshed and Kaluarachchi (1998) conducted experiments to compare GA and BP in

streamflow and transport simulations. They reported better performance of BP over GA but

noted that their results were based on a single set of simulations and that more

research is therefore needed to develop GA as a complement to BP for situations where BP

may fail. See and Openshaw (1999) recombined a series of neural networks via a rule based

fuzzy logic model that has been optimized using a GA. Abrahart et al. (1999) also used a GA

to optimize the inputs to an ANN model used to forecast runoff from a small catchment. Rao

and Jamieson (1997) used hybrid neural network and genetic algorithm approach to

investigate the minimum-cost design of a pump-and-treat aquifer remediation scheme. Wu

and Chau (2006) applied neural networks and GA in flood forecasting. They applied the

model to a reach in the middle section of the Yangtze River in China. All three techniques,

i.e., ANN, GA and GAANN, were applied separately. They concluded that when care

was taken to avoid over-fitting problems, the hybrid GAANN model produced

more accurate flood predictions for the channel. According to the authors, models such as

ANN and the hybrid GAANN could be considered as feasible alternatives to conventional models and it

would be worth exploring different types of hybrid techniques.
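
The idea of using a GA in place of back-propagation to train an ANN can be sketched by treating the full set of network weights as one chromosome and the training error as the fitness. The example below is a simplified illustration only: the network size, truncation selection, uniform crossover, mutation settings and function names are all assumptions, and it does not reproduce the GAANN configuration of any cited study.

```python
import numpy as np

def ann_predict(weights, X, hidden=3):
    """Decode a flat weight vector into a one-hidden-layer network and run it."""
    n_in = X.shape[1]
    split = n_in * hidden
    W1 = weights[:split].reshape(n_in, hidden)
    b1 = weights[split:split + hidden]
    W2 = weights[split + hidden:split + 2 * hidden]
    b2 = weights[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def train_gaann(X, y, hidden=3, pop_size=40, generations=100, seed=0):
    """Evolve the network weights with a GA; fitness is the mean squared error."""
    rng = np.random.default_rng(seed)
    n_weights = X.shape[1] * hidden + 2 * hidden + 1
    pop = rng.normal(0.0, 1.0, (pop_size, n_weights))

    def mse(w):
        return float(np.mean((ann_predict(w, X, hidden) - y) ** 2))

    history = []
    for _ in range(generations):
        errors = np.array([mse(w) for w in pop])
        pop = pop[np.argsort(errors)]
        history.append(float(errors.min()))
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - 1:
            i, j = rng.integers(0, len(parents), 2)
            mask = rng.random(n_weights) < 0.5   # uniform crossover
            child = np.where(mask, parents[i], parents[j])
            mutate = rng.random(n_weights) < 0.2
            child = child + rng.normal(0.0, 0.1, n_weights) * mutate
            children.append(child)
        pop = np.vstack([pop[0], *children])     # elitism keeps the best network
    errors = np.array([mse(w) for w in pop])
    return pop[int(np.argmin(errors))], history

# Toy data: y = x^2 on [-1, 1].
X = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = (X ** 2).ravel()
best_weights, history = train_gaann(X, y)
```

No gradient information is used anywhere in this sketch, which is the main practical difference from back-propagation training.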

In the field of RFFA there are a few studies using back-propagation ANN (BPANN) (Dawson et al., 2005; Aziz et al.,

2013), but to the best of the author’s knowledge there has been no notable application of GAANN

in RFFA, especially using Australian conditions and data.

b) Gene-expression programming (GEP)



Gene-expression programming (GEP) is, like GA and genetic programming (GP), a genetic algorithm in that it

uses populations of individuals, selects them according to fitness, and introduces genetic

variation using one or more genetic operators (Mitchell, 1996). The fundamental difference

between the three algorithms resides in the nature of the individuals: in GA the individuals are

linear strings of fixed length (chromosomes); in GP the individuals are nonlinear entities of

different sizes and shapes (parse trees); and in GEP the individuals are encoded as linear

strings of fixed length (the genome or chromosomes) which are afterwards expressed as

nonlinear entities of different sizes and shapes (i.e., simple diagram representations or

expression trees).
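
The translation of a fixed-length linear chromosome into an expression tree can be sketched as follows. The sketch assumes the breadth-first (level by level) reading of the gene used in Karva notation, a small hypothetical function set in which "Q" denotes the square root, and a valid gene whose coding region is complete; the function names are illustrative.

```python
import math
from collections import deque

# Number of arguments taken by each function symbol; all other symbols are terminals.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}

def decode_gene(gene):
    """Build an expression tree by filling it level by level from the gene."""
    symbols = iter(gene)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding region of the gene

def evaluate(node, variables):
    """Evaluate the expression tree for given terminal values."""
    symbol, children = node
    if symbol not in ARITY:
        return variables[symbol]
    args = [evaluate(child, variables) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    return math.sqrt(args[0])  # "Q"

# The gene below encodes sqrt((a + b) * (c - d)); the trailing symbols are unused.
tree = decode_gene("Q*+-abcdaaab")
value = evaluate(tree, {"a": 1.0, "b": 3.0, "c": 10.0, "d": 6.0})
```

Because the chromosome length is fixed while the decoded tree can vary in size and shape, genetic operators can act on the linear string without producing structurally invalid programs.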

GEP is an evolutionary computing method that generates a ‘transparent’ and structured

representation of the rainfall-runoff system being studied. The nature of GEP allows the user

to gain additional information on how the system performs, i.e., gives an insight into the

relationship between input (e.g. rainfall and evaporation) and output (flood runoff) data. One

of the additional advantages of this approach over the neural combination method is the

model’s ability to represent itself in the form of mathematical expressions (Fernando et al.,

2009).

GEP, which is an extension of GP (Koza, 1992), is a search technique that evolves computer

programs (e.g., mathematical expressions, decision trees, polynomial constructs, and logical

expressions). Computer programs generated by GEP are encoded in linear chromosomes and

are then expressed or translated into expression trees (ETs). GEP is a comprehensive

genotype/phenotype system, with the genotype totally separated from the phenotype, whereas

in GP, genotype and phenotype are mixed together in a simple replicator system (Ferreira,

2001a, b; Guven and Aytek, 2009).

Application of GEP in hydrology

In water resources engineering, GP has been successfully applied in a few cases to solve

various problems. Giustolisi (2004) used GP to determine the Chezy resistance coefficient in

corrugated channels; Rabunal et al. (2007) applied GP and ANN to determine the unit

hydrograph of a typical urban basin; Guven et al. (2008) used the linear genetic programming

(LGP) approach for time-series modelling of daily flow rate; Guven and Gunal (2008)

successfully applied the GEP approach for prediction of local scour downstream of hydraulic

structures. These studies have drawn hydrologists towards investigating the use of GP in

estimating river flow data (Guven, 2009; Guven and Talu, 2010; Guven and Kisi, 2011).



Most recently, Kisi and Shiri (2011) forecasted precipitation using wavelet-genetic

programming; Azamathulla and Ghani (2011) predicted the longitudinal dispersion

coefficients in streams and Azamathulla et al. (2011) developed stage-discharge rating curves

of Pahang River by using GEP.

In the context of rainfall-runoff modelling, the combination modelling approach advocates the

synchronous use of simulated discharges obtained from a number of rainfall-runoff models to

produce an overall combined/integrated discharge output which can be used as an alternative

to that produced by a single rainfall-runoff model. At present only a limited number of studies

have dealt with the multi-model combination of hydrological models (Coulibaly et al., 2005;

See and Openshaw, 2002; Shamseldin and O'Connor, 1999; Shamseldin et al., 1997). The

emerging conclusion from these pioneering studies is that the combination modelling

approach has tremendous potential for improving the accuracy and reliability of hydrological

modelling forecasts and predictions. However, in these studies no attempts had been made to

explore the nature of the combination functions and their inner workings. Further, no

explanation had been provided to account for the drivers behind the improvements in the

modelling results, which is essential to advance the use of combination modelling approaches in the

field of hydrology.

Savic et al. (1999) applied the GP approach for rainfall-runoff modelling. They used the Kirkton

catchment in Scotland (UK) for flow prediction. They concluded that the results of the data-

driven approaches (GP and ANN) showed very good agreement with the conceptual

model results for which parameters were optimised using the best available optimisation

techniques. However, genetic programming seems to give more insight into the form of the

rainfall-runoff relationships than ANN because it explicitly gives the form of the function

identified. It also partially alleviates the problem of identifying the large number of

parameters necessary for conceptual model calibration. The number of GP parameters

(population size, crossover and mutation probability) is much smaller and does not necessarily

need to change for different rainfall-runoff problems.

Fernando et al. (2012) used GEP to forecast river flow for different catchments in China

and Ireland. They investigated the application of the novel data-driven technique of GEP

to develop one-day-ahead flow forecasting models for catchments with widely differing

characteristics. The outcome of the study was found to be positive. Although no comparisons

were made with forecasts from other models, the fact that these are transparent

models that can serve the general purpose of producing daily forecasts of high accuracy

is valuable. Fernando et al. (2009) applied GEP to develop a combined runoff estimate

model from conventional rainfall-runoff model output. They investigated the structure of

the combined model (ANN and GEP) and also the use of GEP to develop a combination

rainfall-runoff model through the process of symbolic regression. They developed the GEP

model using the daily simulated river flows of four other rainfall runoff models for the Chu

catchment located in Vietnam. They found that GEP can be successfully used to combine

model outputs from other basic rainfall-runoff models to develop one with greater accuracy.

The combination allows an insight into the components that make up the model in terms of

mathematical expressions thereby making the GEP model unlike its “black-box” counterparts

that have been used in the past to develop combination models. The mathematical expressions

generated by the programming process can be subsequently applied to other data sets not used

in the model development as well as to further investigate the contributions from each of the

sub-models.

The study most relevant to RFFA was conducted by Seckin and Guven (2012), in which

GEP and linear genetic programming (LGP), which are extensions of GP, in addition to

logistic regression (LR), were employed to forecast peak flood discharges. The

data from 543 gauged sites across Turkey were used for the study. Drainage area, elevation,

latitude, longitude, and return period were used as the inputs while the peak flood

discharge was the output. They found that the proposed LGP and GEP models provided

a fast and practical way of estimating the peak flood discharges. The results of their

study indeed encourage the use of genetic programming in other aspects of water

resources engineering studies. The proposed LGP and GEP models offer no restriction

since they do not employ predefined functions unlike most regression-based models.

The results of their study suggest that both genetic programming techniques, LGP and

GEP can be successfully applied in estimating the peak discharges of floods in RFFA.

As discussed, the application of GEP-based techniques in RFFA is very limited; moreover,

there is no significant study for RFFA based on GEP using Australian data.

c) Co-active neuro-fuzzy inference system (CANFIS)

Fuzzy logic is a form of multi-valued logic derived from fuzzy set theory to deal with

reasoning that is approximate rather than precise. In contrast with "crisp logic", where binary

sets have binary logic, fuzzy logic variables may have a membership value that is not only 0

or 1; that is, the degree of truth of a statement can range between 0 and 1 and is not

constrained to the two truth values of classic propositional logic. Furthermore, when linguistic

variables are used, these degrees may be managed by specific functions (Novak et al., 1999).
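
A membership function of the kind referred to above can be sketched with a simple triangular shape. The function name, the linguistic label and the break points below are hypothetical and chosen only to show how a degree of membership between 0 and 1 is obtained.

```python
def triangular_membership(x, left, peak, right):
    """Degree of membership in a triangular fuzzy set, between 0 and 1."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy set 'moderate rainfall intensity' spanning 0 to 10 mm/h
# with full membership at 5 mm/h.
degree_low = triangular_membership(2.5, 0.0, 5.0, 10.0)
degree_full = triangular_membership(5.0, 0.0, 5.0, 10.0)
degree_out = triangular_membership(12.0, 0.0, 5.0, 10.0)
```

A crisp set would return only 0 or 1 for the same inputs, whereas the fuzzy set grades the transition between the two.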

In the field of artificial intelligence, Neuro-fuzzy refers to combinations of artificial neural

networks and fuzzy logic. Neuro-fuzzy was proposed by Jang (1993). Neuro-fuzzy

hybridization results in a hybrid intelligent system that synergizes these two techniques by

combining the human-like reasoning style of fuzzy systems with the learning and

connectionist structure of neural networks. Neuro-fuzzy hybridization is widely termed

fuzzy neural network (FNN) or neuro-fuzzy system (NFS) in the literature. The adaptive

neuro fuzzy inference system (ANFIS) is a soft computing technique which makes use of the

benefits of both ANN and fuzzy systems. ANFIS serves as a basis for constructing a set of

fuzzy if-then rules with appropriate membership functions to generate the stipulated input-

output pairs.
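
The way a set of fuzzy if-then rules maps an input to an output can be sketched with a first-order Takagi-Sugeno inference step, which is the rule form commonly associated with ANFIS. The sketch assumes a single input, Gaussian membership functions and two hand-set rules; in ANFIS the membership and consequent parameters would be learned from data rather than fixed as they are here, and the function names are hypothetical.

```python
import math

def gaussian_membership(x, centre, sigma):
    """Degree to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_inference(x, rules):
    """Weighted average of linear rule outputs.

    Each rule is (centre, sigma, slope, intercept) and reads:
    IF x is near `centre` THEN y = slope * x + intercept.
    """
    weights = [gaussian_membership(x, c, s) for c, s, _, _ in rules]
    outputs = [a * x + b for _, _, a, b in rules]
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)

# Two hypothetical rules: 'low' inputs follow y = x, 'high' inputs give y = 5.
rules = [(0.0, 1.0, 1.0, 0.0), (10.0, 1.0, 0.0, 5.0)]
y_mid = sugeno_inference(5.0, rules)
y_low = sugeno_inference(0.0, rules)
```

Each rule contributes in proportion to how strongly the input satisfies its premise, so the overall response blends smoothly from one local linear model to the other.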

The generalized form of ANFIS is called CANFIS. In CANFIS, both neural networks (NN) and

the fuzzy inference system (FIS) play an active role in an effort to reach a specific goal. CANFIS

extends the single-output system of ANFIS to produce multiple outputs.

Application of CANFIS in hydrology

Hydrologic analysis is complicated by uncertainties caused by nature (e.g., climate, land

characteristics), limited data, and imprecise modelling. For instance, aquifer parameters are

obtained from a few locations that represent a small fraction of the total volume. Definition of

system boundaries and initial conditions also introduces uncertainty. Future stresses on the

system are also imprecisely known. The stochastic approach of uncertainty analysis considers

aquifer properties as random variables with known distributions. Thus, the outputs from a

stochastic model are also characterized by the statistical moments or the full probability

density function. However, the point in favour of fuzzy logic is; despite the theoretical

development of the stochastic approach, its practical application is rather limited, especially if

a point process model needs to be upscaled (Bogardi et al., 2003). Hydrological sciences

require temporal and spatial data sources for a proper understanding of the phenomenon

concerned. This information provides the foundation for the preparation, interpretation and deduction of logically acceptable conclusions. In many hydrological studies, numerical data

are pumped into mathematical models, especially through readily available computer

software, which may produce unreliable results if the background of the working mechanism



related to any natural hydrological phenomenon is not appreciated qualitatively through

verbal information (Sen, 2009).

Nayak et al. (2003) applied a neuro-fuzzy system to model the flow of the Baitarani River in

India and compared its performance with the ANN and autoregressive moving average

(ARMA) models. The appropriate input was selected by testing different combinations of

flows at different time lags. The study also investigated the issue of transformation of input

data (into normal domain) by comparing the performance of models developed on

transformed and non-transformed data prior to being used as inputs to the models. It was

observed that the model performance increased significantly by using the transformed input

data. The results of the study showed that the neuro-fuzzy model performed slightly better than the ANN and outperformed the ARMA model in terms of all performance indices.

Jacquin and Shamseldin (2006) developed two types of fuzzy rainfall runoff models based on

Takagi-Sugeno fuzzy inference systems. The developed models were applied to data from six catchments of diverse climatic characteristics, and their results were compared with those of the Simple Linear Model, the Linear Perturbation Model and the Nearest Neighbour Linear Perturbation Model. The study concluded that the FIS is a suitable alternative to the traditional methods of modelling the non-linear relationship between rainfall and runoff.

Talei et al. (2010a) evaluated rainfall-runoff modelling for a sub-catchment of the Kranji basin in Singapore using a neuro-fuzzy computational technique. The results of the ANFIS model were compared with those of a physically based model, the Storm Water Management Model (SWMM). It was found that the model with two inputs (rainfall at time t and at time t-1) gave the maximum coefficient of efficiency, and that the ANFIS model was comparable to SWMM in terms of goodness of fit. The potential of ANFIS for hydrological modelling was assessed by applying the ANFIS model to monthly inflows of Bhakra Dam in India (Lohani et al., 2012). The proposed ANFIS models were compared with ANN and autoregressive (AR) models in order to assess their performance. Karimi et al. (2013) employed two data-driven models, ANFIS and ANN, for predicting hourly sea levels for Darwin Harbor, Australia.

Firat and Gungor (2007) applied a neuro-fuzzy technique for flow estimation of the River Great Menderes in Turkey. The results were compared with the observed flows in order to evaluate the performance of the model in training and testing. Using a data set of 5844 daily runoff values, it was found that the ANFIS models were accurate, reliable and highly efficient, with minimum root mean square error values.

Oarda and Shu (2007) developed models for RFFA at ungauged sites using the neuro-fuzzy approach for the hydrometric station network of southern Quebec, Canada. They used 15 years of historical data from 151 gauging stations. It was found that the neuro-fuzzy approach provided a mechanism for integrating the two major steps of RFFA, regionalisation and estimation, into one system.

A comparative study of ANN and neuro-fuzzy techniques in continuous modelling of the daily and hourly behaviour of runoff was performed by Aqil et al. (2007). The data were derived from the Cilalawi River basin in Indonesia, which has a total drainage area of approximately 60.17 km². Forest, paddy field and perennial plantation dominate the land use in the river basin and account for 85% of the area. Two types of three-layer feedforward neural network (FFNN) models, each with one input layer, one hidden layer and one output layer, were developed in this study. Three different network architectures and training algorithms were investigated, namely Levenberg–Marquardt-FFNN, Bayesian regularization-FFNN and neuro-fuzzy. When compared with the Levenberg–Marquardt-FFNN and the Bayesian regularization-FFNN, the neuro-fuzzy model showed better generalization capability and adaptability in modelling complex rainfall–runoff dynamics.

ANFIS has been used in the field of hydrology in various parts of the world, but its application in RFFA has been very limited so far. The unique climatic and geographical conditions of Australia set it apart from the rest of the world with respect to the application of ANFIS and model development. There is no evidence of its application in RFFA in Australia to date.

2.4 Summary

This chapter has discussed various regional flood frequency analysis (RFFA) techniques with

a particular emphasis on non-linear techniques i.e. artificial neural network (ANN), co-active

neuro-fuzzy inference system (CANFIS), genetic algorithm (GA) and gene-expression

programming (GEP). It has been found that the RFFA is widely used in design flood

estimation for ungauged catchments. There are many RFFA methods in the literature having

specific assumptions and data requirements. In Australia (in particular in New South Wales and Victoria), a linear method, the Probabilistic Rational Method, has been the method of choice since 1987; this is likely to change in the new version of Australian Rainfall and Runoff. More recently, regression based RFFA methods have been widely investigated in



Australia. Most of the linear RFFA methods assume a linear relationship between flood statistics and predictor variables. However, most hydrologic processes are nonlinear and exhibit a high degree of spatial and temporal variability, so a simple log transformation (the most common form of transformation) cannot guarantee linearity in RFFA modelling. These processes are further complicated by uncertainty in parameter estimates. Increased

computing power has created new opportunities for hydrologists for the solution of complex

problems using non-linear intelligence based techniques such as ANN, CANFIS, GA and

GEP. These non-linear techniques have been widely used in rainfall and streamflow

forecasting; however, there have been only a few studies on RFFA that are based on these

techniques. In particular, there has been no major RFFA research in Australia based on these

non-linear techniques. Non-linear techniques for regional flood estimation could be powerful

methods of modelling as these do not impose a model structure on the data (i.e. they are

model free techniques).

The choice of non-linear model structure, grouping of data into meaningful regions, selection

of appropriate predictor variables, carefully designed model training, testing and validation

methods are key to the development of successful RFFA models based on various non-linear

techniques discussed in this chapter.

Non-linear techniques, especially ANN, have risen to prominence as a viable alternative to

many traditional water resources models, particularly in the field of forecasting hydrologic

variables. Some of the important features that have contributed to their popularity include

their ease of implementation, their ability to learn from examples without explicit knowledge

of the underlying physics and their powerful generalization abilities. However, one limitation

of the non-linear techniques is that they are data dependent and data driven models. However, unlike the most commonly used regression based models, non-linear techniques do not impose a fixed model structure.

As the Australian climate and geography are different from the rest of the world, and Australian hydrology is among the most variable, it is important to investigate the applicability of these non-linear

techniques in RFFA problems. Hence, this research focuses on the development and testing of

artificial intelligence based RFFA methods for Australia.



CHAPTER 3

METHODOLOGY

3.1 General

This chapter presents the statistical and mathematical tools adopted in this study to develop

the artificial intelligence based RFFA models and the quantile regression technique. Cluster analysis and principal component analysis, which are used to group the data in the catchment characteristics data space, are also described. The artificial neural network (ANN) method is presented first, followed by the genetic algorithm based ANN, gene-expression programming, the co-active neuro fuzzy inference system, the quantile regression technique, cluster analysis and principal component analysis. Finally, the adopted validation technique is presented.

3.2 Methods adopted in the study

Initially the RFFA methods based on artificial intelligence are discussed in detail. This covers

the features, fundamental concepts, mathematical equations and input data requirements for

each of these methods. Later, the linear techniques are discussed, with major emphasis on the quantile regression technique (QRT). These methods are presented in Figure 3.1.

Figure 3.1 Different RFFA techniques adopted in this study



3.2.1 Artificial neural network (ANN)

There are various types of ANN and their applications are found in many different fields of

science and engineering. Since the first neural model by McCulloch and Pitts (1943), there

have been developments of hundreds of different models considered as ANN. The differences

in them might be the functions, the accepted values, the topology, the learning algorithms, and

the like. Since the function of ANN is to process information, they are used mainly in fields

related to information processing. There are a wide variety of ANN that are used to model real

neural networks, and study behaviour and control in animals and machines, but also there are

ANN which are used for engineering purposes such as pattern recognition, forecasting, and

data compression.

ANN modelling is inspired by natural neurons, which receive signals through synapses located on the dendrites or membrane of the neuron, as shown in Figure 3.2. When the signals received are

strong enough (surpass a certain threshold), the neuron is activated and emits a signal through

the axon. This signal might be sent to another synapse, and might activate other neurons.

Figure 3.2 Structure of typical natural neuron (Source:

http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/)

Features and strengths of ANN

1. The most important aspect of ANN is its non-linearity.

2. ANN has the ability to perform input-output mapping in an intelligent manner, which helps in developing a relationship between the input and the desired output. ANN can adjust its parameters, known as weights, so that the difference between the actual output from the ANN and the desired output for a given input is minimized. This makes the ANN remarkable. There is some similarity between regression modelling and ANN, as both find an optimum set of coefficients to achieve an input-output transformation; however, ANN can use complex non-linear models in making such a transformation.

3. Adaptivity is a key characteristic of ANN: they can adapt their free parameters to changes in the surrounding environment.

Working structure of artificial neural network (ANN)

A neural network is built from two kinds of building blocks: neurons and weights. The behaviour of the network depends largely on the interaction between these building blocks. There are three types of neuron layers: input, hidden and output layers. Two layers of neurons communicate via a network of weighted connections. There are four types of weighted connections: feedforward,

feedback, lateral, and time-delayed connections. A typical configuration of a feedforward

three layer ANN can be seen in Figure 3.3.

Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000)

Various forms of architecture of ANN are discussed below:

Feedforward connections: For all the neural models, data from neurons of a lower layer are propagated forward to neurons of an upper layer via feedforward connection networks.

Feedback connections: Feedback networks bring data from neurons of an upper layer back to neurons of a lower layer. In other words, signals are passed between nodes through connection links.

Lateral connections: The connection strength is represented by the weight associated with each link. One typical example of a lateral network is the winner-takes-all circuit, which serves the important role of selecting the winner.



Time-delayed connections: Delay elements may be incorporated into the connections to

yield temporal dynamics models. They are more suitable for temporal pattern recognition.

The architecture of ANN represents the pattern of connection between nodes, its method of

determining the connection weights, and the activation function. Alkon (1989), Fausett

(1994), Caudill (1987, 1988 and 1989) presented a comprehensive description of ANN. As

mentioned above, a typical ANN consists of a number of nodes and these nodes are arranged

in a particular order, similar to that of biological neurons.

One way of classifying ANN is by the number of layers: single (Hopfield nets), bilayer

(Carpenter/Grossberg adaptive resonance networks), and multilayer (most backpropagation

networks). ANN can also be categorised based on the direction of information flow and

processing. In a feedforward network, the nodes are generally arranged in layers, starting

from a first input layer and ending at the final output layer. There can be several hidden

layers, with each layer having one or more nodes. Information passes from the input to the

output side. The nodes in one layer are connected to those in the next, but not to those in the

same layer. Thus, the output of a node in a layer depends only on the inputs it receives

from previous layers and the corresponding weights. On the other hand, in a recurrent ANN,

information flows through the nodes in both directions, from the input to the output side and

vice versa. Sometimes, lateral connections are used where nodes within a layer are also

connected (Smith, 1993; Wasserman, 1993; Lawrence, 1994; Bishop, 1995).
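The layer-by-layer flow of information in a feedforward network described above can be sketched as follows. This is a minimal illustration only: the 2-3-1 architecture, the weight and bias values and the use of the hyperbolic tangent as activation function are assumptions made for the example, and the function names `layer_forward` and `feedforward` are hypothetical.

```python
import math

def layer_forward(inputs, weights, biases, activation=math.tanh):
    """Propagate a vector through one fully connected layer.

    weights[j][i] is the weight from node i of the previous layer to node j
    of this layer; biases[j] is the threshold subtracted at node j.
    """
    return [
        activation(sum(w * x for w, x in zip(w_row, inputs)) - b)
        for w_row, b in zip(weights, biases)
    ]

def feedforward(inputs, layers):
    """Pass the inputs through each (weights, biases) pair, input side to output side."""
    signal = list(inputs)
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical 2-3-1 network: two inputs, three hidden nodes, one output node.
hidden = ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1])
output = ([[0.6, -0.3, 0.8]], [0.05])
prediction = feedforward([0.5, 0.9], [hidden, output])
print(prediction)
```

Each node uses only the outputs of the preceding layer, so there are no connections within a layer and no feedback, which is the defining property of the feedforward arrangement.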

The input or the first layer receives the input variables for the problem at hand. This consists

of all quantities that can influence the output. The input layer is thus transparent and is a

means of providing information to the network. The last or output layer consists of values

predicted by the network and thus represents model output. The number of hidden layers and

the number of nodes in each hidden layer are usually determined by a trial-and-error

procedure. The nodes within neighbouring layers of the network are fully connected by links.

A synaptic weight is assigned to each link to represent the relative connection strength of two

nodes at both ends in predicting the input-output relationship. These kinds of ANN can be

used in solving a wide variety of problems, such as storing and recalling data, classifying

patterns, performing general mapping from input pattern (space) to output pattern (space),

grouping similar patterns, or finding solutions to constrained optimization problems. The system input vector is composed of a number of causal variables that influence system behaviour, and the system output vector is composed of a number of resulting variables that represent the system behaviour (Theodoridis and Koutroumbas, 2009).



Mathematical treatment of ANN

The overall output value of a neuron can be expressed as below:

$$y_j = f(X \cdot W_j - b_j) \qquad (3.1)$$

where the input in the first layer forms an input vector:

$$X = [x_1, \ldots, x_i, \ldots, x_n] \qquad (3.2)$$

The sequence of weights leading to the node forms a weight vector:

$$W_j = [w_{1j}, \ldots, w_{ij}, \ldots, w_{nj}] \qquad (3.3)$$

where $j = 1, 2, \ldots, m$ and $m$ is the number of neurons.

Here, $w_{ij}$ represents the connection weight from the $i$th node in the preceding layer to the $j$th node. The output of node $j$, $y_j$, is obtained by computing the value of function $f$ with respect to the inner product of vectors $X$ and $W_j$ minus $b_j$, where $b_j$ is the threshold value, also called the bias, associated with this node. In ANN parlance, the bias $b_j$ of the node must be exceeded

before it can be activated.
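The computation of a single node output described above can be sketched in a few lines. The sketch assumes a logistic sigmoid as the activation function f purely for illustration; the input, weight and bias values are hypothetical and the function names are not from the study.

```python
import math

def logistic(z):
    """Logistic sigmoid, used here only as an example activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w_j, b_j, f=logistic):
    """Output of node j: f applied to the inner product of x and w_j minus the bias b_j."""
    if len(x) != len(w_j):
        raise ValueError("input and weight vectors must have the same length")
    net = sum(xi * wij for xi, wij in zip(x, w_j)) - b_j
    return f(net)

# Hypothetical values for a node with two inputs.
x = [0.4, 0.7]
w_j = [0.5, -0.2]
b_j = 0.06
# The weighted sum 0.4*0.5 + 0.7*(-0.2) just balances the bias,
# so the net input is approximately zero and y_j is close to 0.5.
y_j = neuron_output(x, w_j, b_j)
print(y_j)
```

Raising the weighted sum above the bias pushes the output towards 1, and lowering it pushes the output towards 0, which is the graded threshold behaviour referred to in the text.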

The sigmoid function is a bounded, monotonic, non-decreasing function that provides a

graded, nonlinear response. This function enables a network to map any nonlinear process.

The popularity of the sigmoid function is partially attributed to the simplicity of its derivative

that will be used during the training process. Some researchers also employ the bipolar

sigmoid and hyperbolic tangent as activation functions, both of which are transformed from

the sigmoid function. A number of such nodes are organized to form an ANN.
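The two properties mentioned above (a simple derivative, and the bipolar sigmoid and hyperbolic tangent being transformations of the sigmoid) can be made explicit. The relations below assume that "sigmoid" refers to the logistic form, written here as $\sigma(x)$; the symbols $\sigma$ and $g$ are introduced only for this illustration.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)

% Bipolar sigmoid: the logistic curve rescaled from (0, 1) to (-1, 1)
g(x) = 2\,\sigma(x) - 1 = \frac{1 - e^{-x}}{1 + e^{-x}}

% Hyperbolic tangent: the same rescaling with the argument doubled
\tanh(x) = 2\,\sigma(2x) - 1 = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
```

Because the derivative is expressed through the function value itself, it can be evaluated during training from the node output already computed, without a further exponential.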

The function f in (Equation 3.1) is called an activation function. Its functional form

determines the response of a node to the total input signal it receives. Typically the sigmoid

function is expressed as below:

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (3.4)$$



In the ANN modelling adopted in this study, the Levenberg-Marquardt method was used as the

training algorithm to minimize the mean squared error (MSE). The purpose of training an

ANN with a set of input and output data is to adjust the weights in the ANN to minimize the

MSE between the desired outputs and the ANN outputs. The degree of error increases with

the number of layers in the network and with the percentage change in the weights. However,

the degree of error is essentially independent of the number of weights per neuron and the

number of neurons per layer, as long as these numbers are large (close to 100 or more). The

data set was split into training and validation sub-sets. In this study, the testing data set was

selected randomly to produce a reasonable sample of different catchment types and sizes. A

feedforward ANN consisting of three layers (input, hidden and output layers) was used with

the training algorithm known as ‘backpropagation of error’. Three hidden-layered neural networks were selected, with 7, 3 and 1 neurons in these three layers, respectively. Two inputs, catchment area (A) and rainfall intensity with duration equal to the time of concentration (tc) and a given average recurrence interval (ARI), were used in the input layer, and the output layer had one output, the predicted flood quantile (Qpred). The transfer function used for all hidden layers and the output layer was the hyperbolic tangent sigmoid function (Equation 3.4). Transfer functions calculate a layer’s output from its net input. A maximum training

iteration of 20,000 was adopted. Each predictor and predictand was standardized to the range

of (0.05, 0.95), such that extreme flood events which exceeded the range of the training data

set could be modelled between the boundaries (0, 1) during testing. A learning rate of 0.05

was used together with a momentum constant of 0.95. MATLAB was used to perform the

ANN training. To select the best performing model, different combinations of hidden layers, algorithms and numbers of neurons were compared on the basis of the MSE value. In order to

obtain the best ANN-based model, the MSE values between the observed and predicted flood

quantiles were calculated and the training was undertaken to minimise this error. To avoid

over-training during the training of ANN model, the MSE values were also calculated for the

testing data set. If the testing MSE was increasing, even when the training MSE still was

decreasing, the training of the ANN was terminated. This ensured the training quality of the

ANN and avoided over-fitting.
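Two steps of the training procedure described above, the standardization of variables to the range (0.05, 0.95) and the stopping rule based on the testing MSE, can be sketched as follows. This is a minimal sketch and not the MATLAB implementation used in the study: it assumes a linear min-max mapping for the standardization, and the function names, catchment areas and MSE histories are hypothetical.

```python
def scale_to_range(values, lo=0.05, hi=0.95):
    """Linearly map values so that min(values) -> lo and max(values) -> hi (assumed mapping)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    span = hi - lo
    return [lo + span * (v - v_min) / (v_max - v_min) for v in values]

def stopping_iteration(train_mse, test_mse):
    """Return the first iteration at which the testing MSE rises while the training MSE still falls.

    Returns the index of the last iteration if this never happens.
    """
    for k in range(1, len(test_mse)):
        if test_mse[k] > test_mse[k - 1] and train_mse[k] < train_mse[k - 1]:
            return k
    return len(test_mse) - 1

# Hypothetical catchment areas (km^2) and MSE histories, for illustration only.
areas = [10.0, 55.0, 100.0]
scaled = scale_to_range(areas)
train_history = [0.30, 0.20, 0.15, 0.12, 0.10]
test_history = [0.35, 0.26, 0.22, 0.24, 0.27]
stop_at = stopping_iteration(train_history, test_history)
print(scaled, stop_at)
```

Leaving a margin of 0.05 at each end of the unit interval means that a testing value somewhat outside the training range still maps inside (0, 1), which is the motivation given in the text for this choice of range.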

3.2.2 Genetic algorithm based ANN (GAANN)

In this study the analysis was done using two different types of ANN, one using the

backpropagation technique and the other using genetic algorithm (GA) technique for

optimization.



The major difference between GA and the classical optimization search techniques is that the

GA works with a population of possible solutions; whereas, the classical optimization

techniques work with a single solution (Jain et al., 2005). GA is based on the Darwinian-type

survival of the fittest strategy, whereby potential solutions to a problem compete and mate

with each other in order to produce increasingly stronger individuals. Each individual in the

population represents a potential solution to the problem that is to be solved and is referred to

as a chromosome (Rooji et al., 1996). The basic working of GA is summarised in the diagram shown in Figure 3.4. An initial population of individuals (also called chromosomes) is created and, according to the objective function, the fitness values of all chromosomes are evaluated. From this initial population, parents are selected who mate to produce offspring (also called children). The genes of parents and children are mutated. The fittest among parents and children are sent to a new pool. The whole procedure is repeated until either of the two stopping criteria is met, i.e. the required number of generations has been reached or convergence has been achieved.

Chromosomes are the basic unit of population and represent the possible solution vector; they

are assembled from a set of genes that are generally binary digits, integers or real numbers

(Mitchell, 1996; Randy and Sue, 1998). A chromosome can be thought of as a vector $x$ consisting of $l$ genes $g_k$:

$$x = (g_1, g_2, \ldots, g_l), \quad g_k \in G \qquad (3.5)$$

where $l$ is referred to as the length of the chromosome. The genes may be binary ($G = \{0, 1\}$), integer ($G = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$) or real-valued ($G = \mathbb{R}$). In the last case, the real values are stored in a gene by means of a floating point representation (Rooji et al., 1996).

The three genetic operators in GA, selection, crossover (mating) and mutation, are the primary force that produces new and unique offspring having the same number of genes as their parents. The selection operator is used to select parents from the pool. The crossover (mating) operator is used to produce offspring from the selected parents: the parent chromosomes are mated to produce new offspring representing new solution vectors. As with the selection operator, various crossover techniques have been developed over the years, of which the best known are single-point crossover (simple crossover), two-point crossover and uniform crossover (Mitchell, 1996). In single-point crossover, a crossover point is selected arbitrarily at the identical location in the two parents and the alternate halves of the two parents are recombined to form two children having a new combination of gene values. The mutation operator is used to introduce changes in the genes of a chromosome. Mutation maintains diversity in the genes of a population and prevents premature convergence (Bowden et al., 2005). In the traditional binary GA, which uses binary digits as the gene values (i.e. 0 and 1), the value of the selected gene is inverted in mutation, i.e. a value of 0 is mutated to 1 and vice versa. However, in real-coded GA, two genes in a chromosome are selected and their values are swapped to introduce mutation.
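The crossover and mutation operators described above can be sketched as follows. This is an illustrative sketch rather than the LibGA routines used in the study; chromosomes are represented as plain Python lists, the function names are hypothetical, and a random number generator is passed in explicitly so that runs can be repeated.

```python
import random

def single_point_crossover(parent1, parent2, rng):
    """Cut both parents at the same random point and exchange the tails."""
    point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def uniform_crossover(parent1, parent2, rng):
    """Toss a coin at each gene position to decide which parent supplies the gene."""
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if rng.random() < 0.5:
            child1.append(g1)
            child2.append(g2)
        else:
            child1.append(g2)
            child2.append(g1)
    return child1, child2

def bit_flip_mutation(chromosome, rng):
    """Binary GA: invert one randomly chosen gene (0 becomes 1, 1 becomes 0)."""
    mutated = list(chromosome)
    pos = rng.randrange(len(mutated))
    mutated[pos] = 1 - mutated[pos]
    return mutated

def swap_mutation(chromosome, rng):
    """Real-coded GA: swap the values of two randomly chosen genes."""
    mutated = list(chromosome)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

rng = random.Random(1)
binary_a, binary_b = [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]
real_a = [0.12, -0.80, 0.45, 0.33]
print(single_point_crossover(binary_a, binary_b, rng))
print(uniform_crossover(binary_a, binary_b, rng))
print(bit_flip_mutation(binary_a, rng))
print(swap_mutation(real_a, rng))
```

Note that swap mutation only rearranges the values already present in a chromosome, whereas bit-flip mutation creates a value that was not there before.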

Combination of genetic algorithm and artificial neural network (GAANN)

The flow chart of the GAANN model is shown in Figure 3.5. An initial population is created with n chromosomes, where n is referred to as the population size. An objective function comprising a feedforward ANN model, with a complete description of its architecture, is defined. It reads the training patterns once at the start of the model run and stores them in memory for application to each chromosome. The total number of genes l of each chromosome equals the total number of synaptic weights of the ANN model.

$$\{g_1, g_2, \ldots, g_l\} = \{w_{i_f h_r}, w_{i_b h_r}, w_{h_f o_r}, w_{h_b o_r}\} \qquad (3.6)$$

where ‘w’ represents the value of a synaptic weight; subscript ‘i’ represents a node of the input layer, ‘h’ a node of the hidden layer and ‘o’ a node of the output layer; ‘f’ is the serial number of the node which forwards the information (i.e. f = 1, 2, 3, …); ‘r’ is the serial number of the node which receives the information (i.e. r = 1, 2, 3, …); ‘ib’ represents the bias node of the input layer and ‘hb’ the bias node of the hidden layer.

At the start of the model run, the fitness values of all the chromosomes of the population are evaluated by the ANN function. The real values stored in the genes of a chromosome are read as the respective weights of the ANN model. Figure 3.6 shows an example of the translation of the genes of a chromosome into the respective synaptic weights of an ANN model. The ANN performs feedforward calculations with the weights read from the genes of the forwarded chromosome as per Equation 3.6, and calculates the MSE. The inverse of the MSE is regarded as the fitness value of the chromosome. In this way, the fitness values of all chromosomes of the initial population are

calculated by ANN function.
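The translation of a chromosome into ANN weights and the evaluation of its fitness as the inverse of the MSE can be sketched as follows. The sketch makes several assumptions that are not stated in the text: genes are stored in the group order of Equation 3.6, each bias node emits a constant signal of 1, and the hyperbolic tangent is used as the activation function. The function names and the training patterns are hypothetical.

```python
import math

def decode(chromosome, n_in, n_hid, n_out):
    """Split a flat list of genes into the four weight groups of Equation 3.6."""
    expected = n_in * n_hid + n_hid + n_hid * n_out + n_out
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    genes = iter(chromosome)
    w_ih = [[next(genes) for _ in range(n_hid)] for _ in range(n_in)]   # input f -> hidden r
    w_bh = [next(genes) for _ in range(n_hid)]                          # input bias node -> hidden r
    w_ho = [[next(genes) for _ in range(n_out)] for _ in range(n_hid)]  # hidden f -> output r
    w_bo = [next(genes) for _ in range(n_out)]                          # hidden bias node -> output r
    return w_ih, w_bh, w_ho, w_bo

def predict(x, weights):
    """Feedforward pass; each bias node is assumed to emit a constant 1."""
    w_ih, w_bh, w_ho, w_bo = weights
    hidden = [math.tanh(sum(x[f] * w_ih[f][r] for f in range(len(x))) + w_bh[r])
              for r in range(len(w_bh))]
    return [math.tanh(sum(hidden[f] * w_ho[f][r] for f in range(len(hidden))) + w_bo[r])
            for r in range(len(w_bo))]

def fitness(chromosome, patterns, n_in, n_hid, n_out):
    """Fitness value FV = 1 / MSE over the training patterns."""
    weights = decode(chromosome, n_in, n_hid, n_out)
    sq_err = [(t - y) ** 2
              for x, target in patterns
              for t, y in zip(target, predict(x, weights))]
    mse = sum(sq_err) / len(sq_err)
    return float("inf") if mse == 0.0 else 1.0 / mse

# Hypothetical training patterns (inputs, targets) for a 2-2-1 network with 9 genes.
patterns = [([0.1, 0.2], [0.5]), ([0.3, 0.4], [0.5])]
chromosome = [0.4, -0.3, 0.2, 0.7, 0.1, -0.1, 0.9, -0.6, 0.05]
print(fitness(chromosome, patterns, 2, 2, 1))
```

For a 2-2-1 network the chromosome length is 4 + 2 + 2 + 1 = 9 genes, so the chromosome length is fixed entirely by the chosen architecture.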

The selection operator selects two parent chromosomes randomly. The roulette wheel operator with elitism is used in this model. Elitism is a scheme in which the best chromosome of each generation is carried over to the next generation in order to ensure that the best chromosome is not lost during the calculations. The selected parents are mated to produce two children having the same number of genes. The uniform crossover operator is used with a crossover rate of pc = 1.0. In uniform crossover, a toss is made at each gene position of an offspring and, depending on the result of the toss, the gene value of the first or the second parent is copied to the offspring. The genes of the children are then mutated using the swap mutation operator with a mutation rate of pm = 0.8. The mutated children are then evaluated by the ANN function to determine their fitness values. The fitness values of all four chromosomes (two parents and two children) are compared; the two chromosomes with the highest fitness values are sent to a new population and the other two are discarded. The evolutionary operators continue this loop of selection, crossover, mutation and replacement until the population size of the new pool is the same as that of the old pool. One generation cycle is complete at this stage, and the process is repeated until either of two stopping criteria is fulfilled, i.e. the maximum number of generations is reached or convergence has been achieved. The best chromosome tracked through all the generations is then sent to the ANN function. The genes of the best chromosome are read as the weights of the ANN model and represent its optimised weights. With these weights, the model is said to be fully trained. Finally, the training and testing sets are simulated using these weights (Sohail et al., 2005).

The GAANN is coded in the C language, and some subroutines of the LibGA package (Arthur and Rogers, 1995) for the evolutionary operators of GA have been used, with alterations to read and

process the negative real values.
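The generation cycle of the GAANN model described above (roulette wheel selection with elitism, uniform crossover with pc = 1.0, swap mutation with pm = 0.8 and replacement by the two fittest of each family of four) can be sketched in Python. This is not the C and LibGA code of the study: all function names are hypothetical, and `toy_fitness` is a stand-in for the ANN function that scores a chromosome against an arbitrary target vector so that the sketch runs on its own.

```python
import random

def uniform_crossover(p1, p2, rng):
    """Crossover rate pc = 1.0: every selected pair is mated, gene by gene."""
    pairs = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in zip(p1, p2)]
    return [a for a, _ in pairs], [b for _, b in pairs]

def swap_mutate(chromosome, rng, pm=0.8):
    """With probability pm, swap the values of two randomly chosen genes."""
    child = list(chromosome)
    if rng.random() < pm:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chromosome
    return population[-1]

def next_generation(population, fitness_fn, rng):
    """Fill a new pool of the same size from families of two parents and two children."""
    fitnesses = [fitness_fn(c) for c in population]
    elite = max(population, key=fitness_fn)       # elitism: the best chromosome is kept
    new_pool = [list(elite)]
    while len(new_pool) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = uniform_crossover(p1, p2, rng)
        family = [p1, p2, swap_mutate(c1, rng), swap_mutate(c2, rng)]
        family.sort(key=fitness_fn, reverse=True)
        new_pool.extend(list(member) for member in family[:2])  # two fittest survive
    return new_pool[:len(population)]

def toy_fitness(chromosome, target=(0.2, -0.5, 0.9, 0.1)):
    """Stand-in for the ANN function: inverse MSE against a hypothetical target."""
    mse = sum((g - t) ** 2 for g, t in zip(chromosome, target)) / len(target)
    return 1.0 / (mse + 1e-12)  # small constant guards against division by zero

rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(10)]
best_per_generation = []
for _ in range(20):
    best_per_generation.append(max(toy_fitness(c) for c in population))
    population = next_generation(population, toy_fitness, rng)
print(best_per_generation[0], best_per_generation[-1])
```

Because the elite chromosome is copied into every new pool, the best fitness recorded per generation can never decrease, which is the purpose of elitism stated in the text.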



Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005)

[Flow chart: Create population; Evaluate fitness values; Test: Convergence? (Yes: Stop; No: Selection, Crossover, Mutation)]



Figure 3.5 Flow chart showing steps in GAANN model

[Flow chart: Start; Define feed forward ANN; Create initial population of individuals, Population Size (PS) = n; Evaluate fitness values, FV = 1.0 / MSE; Select parents by roulette wheel method; Crossover parents by uniform crossover method with pc = 1.0; Mutate genes of children by swapping with pm = 0.8; Send 2 fittest individuals among 2 parents and 2 children to a new pool; PS of new pool = n? (YES/NO); Termination criteria satisfied? (YES/NO); Select best individual in all generations]



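Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN architecture during a GAANN modelling

[Figure panels (a) and (b): an ANN with input nodes i1 and i2, hidden nodes h1 and h2 and output node o1, and a chromosome whose genes are w(i1,h1), w(i1,h2), w(i2,h1), w(i2,h2), w(h1,o1), w(h2,o1)]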

3.2.3 Gene-expression programming

Gene-expression Programming (GEP) is used to perform a non-parametric symbolic

regression. Symbolic regression, although very similar to traditional parametric regression, does not start with a known function relating the dependent and independent variables, as the latter does. GEP programs are encoded as linear strings of fixed length (the genome or

chromosomes), which are afterwards expressed as nonlinear entities of different sizes and

shapes (Ferreira 2001a, b, 2006).

GEP automatically generates algorithms and expressions for the solution of problems, which

are coded as a tree structure with its leaves (terminals) and nodes (functions). The generated

candidates (programs) are evaluated against a “fitness function” and the candidates with




higher performance are then modified and re-evaluated. This modification-evaluation cycle is

repeated until an optimum solution is achieved. In GEP a population of individual combined

model solutions is created initially in which each individual solution is described by genes

(sub-models) which are linked together using a predefined mathematical operation (e.g.

addition). In order to create the next generation of model solutions, individual solutions from

the current generation are selected according to fitness which is based on the pre-chosen

objective function. These selected individual solutions are allowed to evolve using

evolutionary dynamics to create the individual solutions of the next generation. This process

of creating new generations is repeated until a certain stopping criterion is met (Fernando et

al., 2009).
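The idea of an individual solution made of several genes (sub-models) joined by a predefined linking operation can be sketched as follows. The three sub-models, the predictor names A (catchment area) and I (rainfall intensity) and the function name `link_genes` are hypothetical and serve only to show how the linking function combines the gene outputs.

```python
import math
import operator

def link_genes(sub_models, link=operator.add):
    """Combine the outputs of several sub-models (genes) with one linking operation."""
    def combined(**variables):
        outputs = [model(**variables) for model in sub_models]
        result = outputs[0]
        for value in outputs[1:]:
            result = link(result, value)
        return result
    return combined

# Hypothetical sub-models in terms of catchment area A and rainfall intensity I.
def gene1(A, I):
    return math.sqrt(A)

def gene2(A, I):
    return 0.5 * I

def gene3(A, I):
    return math.log(A)

model_add = link_genes([gene1, gene2, gene3])                # linked by addition
model_sub = link_genes([gene1, gene2, gene3], operator.sub)  # linked by subtraction
print(model_add(A=100.0, I=20.0), model_sub(A=100.0, I=20.0))
```

The same set of genes therefore yields different prediction equations depending on the linking function, which is why the linking function is fixed before a run.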

Two important components of the GEP include the chromosomes and the expression trees

(ETs). The ETs are the expression of the genetic information encoded in the chromosomes.

The process of information decoding from chromosomes to the ETs is called translation,

which is based on a kind of code and a set of rules. There exist very simple one-to-one relationships between the symbols of the chromosome and the functions or terminals they represent in the genetic code. To predict the flood quantiles, the set of independent variables (predictor variables) to be used in the individual prediction equations is first identified. Then a set of functions (e.g. e^x, x^a, sin(x), cos(x), ln(x), log(x), 10^x, etc.) and arithmetic operations (+, -, /, *) is defined. The terminals and the functions form the junctions in the tree of a

program.

In GEP, k-expressions (from Karva notation), which are fixed-length lists of symbols, are used to represent an ET as shown in Figure 3.7. These symbols are called chromosomes, and the

list is a gene. The Gene “sqrt, , ±, a, b, c and d” can be represented as ET as shown in

Figure 3.7. The GEP gene contains a head and a tail. Symbols that represent both functions and terminals may be present in the head, while the tail contains only terminals. The length of the head of the gene, h, is selected for each problem, while the length of the tail is a function of the length of

the head of the gene.
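The translation of a k-expression into an expression tree, and the dependence of the tail length on the head length, can be sketched as follows. The sketch assumes the relation usually given for GEP, t = h(n_max - 1) + 1, where n_max is the largest number of arguments taken by any function in the function set. The k-expression `Q*+-abcd` (with Q denoting the square root) is a standard illustration of Karva notation and is not necessarily the gene drawn in Figure 3.7; the terminal values are hypothetical.

```python
import math

# Number of arguments of each function symbol; symbols not listed are terminals.
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def tail_length(head_length, max_arity):
    """Tail length t = h * (n_max - 1) + 1, enough terminals to close any head."""
    return head_length * (max_arity - 1) + 1

def decode(k_expression):
    """Read symbols left to right and fill the tree level by level (breadth first)."""
    nodes = [[symbol, []] for symbol in k_expression]
    next_child = 1   # index of the next unused symbol
    current = 0      # index of the node whose arguments are being filled
    while current < next_child:
        symbol, children = nodes[current]
        for _ in range(ARITY.get(symbol, 0)):
            children.append(nodes[next_child])
            next_child += 1
        current += 1
    return nodes[0]

def evaluate(node, values):
    """Evaluate an expression tree for the given terminal values."""
    symbol, children = node
    args = [evaluate(child, values) for child in children]
    if symbol == "Q":
        return math.sqrt(args[0])
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "/":
        return args[0] / args[1]
    return values[symbol]

tree = decode("Q*+-abcd")  # represents sqrt((a + b) * (c - d))
result = evaluate(tree, {"a": 3, "b": 1, "c": 5, "d": 1})
print(result, tail_length(6, 2))
```

With a head size of 6 and functions taking at most two arguments, the relation gives a tail size of 7, which agrees with the head and tail sizes listed in Table 3.1.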

In order to obtain the best GEP model, the mean squared error between the observed and predicted flood quantiles was used as the ‘fitness function’, and the training was undertaken to minimise this error. The parameter settings in Table 3.1 were used to develop the combined model in GeneXproTools®.



Figure 3.7 GEP expression tree (ET)

[Expression tree with nodes sqrt, - and +, and terminals a, b, c and d]

Table 3.1 Parameters used per run in GEP model

Parameter   Description                     Value
P1          Chromosomes                     20
P2          No. of genes                    5
P3          Head size                       6
P4          Tail size                       7
P5          Fitness function error type     MSE
P6          Linking function                Subtraction
P7          Mutation rate                   0.044
P8          Function set                    +, -, *, /, x^2, x^3, sqrt, Exp, Ln, Sin, Cos, 3Rt, Atan, Pow, Pow10, Log, Log2
P9          Inversion rate                  0.1
P10         Gene recombination rate         0.1
P11         One point recombination rate    0.3
P12         Two point recombination rate    0.1
P13         Gene transposition rate         0.1
P14         Data type                       Floating-Type

3.2.4 Co-active neuro fuzzy inference system (CANFIS)

Fuzzy logic provides a different way to approach a control or classification problem. This

method focuses on what the system should do rather than trying to model how it works. The

procedure of developing a fuzzy inference system (FIS) using the framework of an adaptive

neural network is called an adaptive neuro fuzzy inference system (ANFIS). A typical FIS is

shown in Figure 3.8.

Consider the example of simple FIS with only two inputs x and y and one output z and

suppose that the rule base contains two fuzzy if-then rules of Takagi and Sugeno (1983).




Let A be a crisp set. An individual x from a universal set X is determined either to be a

member of A or a non-member of A. This can be expressed by:

$\mu_A(x): X \to \{0, 1\}$ (3.7)

Figure 3.8 Fuzzy inference system (FIS) (Shi and Mozimoto, 2000)

Fuzzy logic can be best understood using set membership where the membership values

represent the degrees with which each object is associated with the properties that are

distinctive to the collection. Formally, a fuzzy set A is defined as a collection of objects with

membership values between 0 (complete exclusion) and 1 (complete membership).

The membership grade of each element in X is determined through a membership function $\mu_A$,

which maps the elements of a universe of discourse X to the unit interval [0, 1].

$\mu_A: X \to [0, 1]$ (3.8)

By using approximate reasoning, a fuzzy logic description can be used to effectively model

the uncertainty and nonlinearity of a system (Shu et al., 2008). Approximate reasoning

provides decision support and expert systems bound by a minimum of rules, and it is the most

obvious implementation in the field of artificial intelligence.

Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1,

Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2



where A1, A2 and B1, B2 are the membership functions of inputs x and y respectively; p1, q1, r1

and p2, q2, r2 are the parameters of the output functions. The node functions in the same layer

are of the same function family, as described below:

Layer 1: Each node in this layer performs fuzzification and generates the membership grade of a

fuzzy set (A, B, C or D), which specifies the degree to which the given input belongs to one of

the fuzzy sets. The fuzzy sets are defined by membership functions (MFs).

Layer 2: Each node in this layer determines the MF of the whole input vector by

aggregating the fuzzified results of the individual scalar functions of every input variable.

The output of each node in this layer is obtained by multiplying the incoming signals and

represents the firing strength of a rule.

Layer 3: This layer has two components. The upper component applies the MFs to each of

the inputs while the lower component is a representation of the modular network that

computes, for each input, the sum of all the normalized firing strengths (Parthiban and

Subramanian, 2009).

Layer 4: The fourth layer calculates the weight normalization of the output of the two

components from the third layer and produces the output of the CANFIS network.
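The layer-by-layer computation described above can be sketched for the two-rule, two-input case. The sketch below is a plain first-order TSK inference step and does not reproduce the CANFIS software; the generalised bell MF form 1 / (1 + |(x - c) / a|^(2b)) is the commonly used definition, and the parameter values and function names are hypothetical.

```python
def bell(x, a, b, c):
    """Generalised bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def tsk_output(x, y, mf_x, mf_y, consequents):
    """First-order TSK inference for rules of the form f = p*x + q*y + r."""
    strengths = []
    outputs = []
    for (ax, bx, cx), (ay, by, cy), (p, q, r) in zip(mf_x, mf_y, consequents):
        # Fuzzification: membership grade of each input in the rule's fuzzy sets.
        grade_x = bell(x, ax, bx, cx)
        grade_y = bell(y, ay, by, cy)
        # Firing strength of the rule: product of the incoming membership grades.
        strengths.append(grade_x * grade_y)
        # Linear output function of the rule.
        outputs.append(p * x + q * y + r)
    # Normalised, firing-strength-weighted sum of the rule outputs.
    total = sum(strengths)
    return sum(w * f for w, f in zip(strengths, outputs)) / total


# Hypothetical MF parameters (a, b, c) for A1, A2 and B1, B2, and (p, q, r) per rule.
mf_x = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]
mf_y = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]
consequents = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]

z = tsk_output(1.0, 1.0, mf_x, mf_y, consequents)
print(z)
```

At x = y = 1 both rules fire equally in this example, so the output is the simple average of f1 = 2 and f2 = 5.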

Fuzzy rules and fuzzy sets in the CANFIS capture and store the regional information. The

training algorithm tunes the system parameters over the entire data space according to the

hybrid learning rules. This approach provides a general framework that combines two

techniques, the ANN and fuzzy systems. The CANFIS model provides nonlinear modelling

capability and requires no assumption of the underlying model. By utilizing the fuzzy

techniques, the linguistic relationship between the input and output can be expressed using the

fuzzy rules. Unlike the initialization of an ANN, which may require several rounds of random

selection, the initialization of a CANFIS can be performed using the one pass subtractive

clustering algorithm. A typical CANFIS model is shown in Figure 3.9.

The fuzzy neuron that applies membership functions (MFs) to inputs is the

fundamental component of CANFIS. The general bell and Gaussian functions are the two

commonly used MFs (Principe et al., 2000). The bell shaped membership function is used in

this study. The normalized axon/neuron in the network is used to expand the output into the

range of 0 to 1. One of the advantages of the fuzzy axon is that its MFs can be

modified through backpropagation during network training, which expedites

convergence. The modular neural network that applies functional rules to the inputs is the

second major component of CANFIS. The number of modular networks equals the number of

network outputs, and the number of processing elements in each network corresponds to the

number of MFs.

Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009)

The CANFIS also has a combiner axon that applies the MFs outputs to the modular network

outputs (Roger et al., 1997; Alecsandru et al., 2004). Finally, the combined outputs are

channelled through a final output layer and the error is backpropagated to both the MFs and

the modular networks. There are a total of five layers in the CANFIS similar to ANFIS and

the function of each layer is summarised as follows. The fuzzification of the input is performed by

each node in layer 1. Each node in this layer gives the membership grade of a fuzzy set (A1,

A2, B1 or B2) and specifies the degree to which the given input belongs to one of the fuzzy

sets. The input to layer 2 is the product of all the output pairs from layer 1. Two

components are present in the third layer of the network. The upper component of this

layer applies the membership functions to each of the inputs, while the lower component is a

representation of the modular network that computes, for each output, the sum of all the firing

strengths. The weight normalization of the outputs of the two components of the third layer is

performed in the fourth layer of the network and this produces the final output of the network

(Ishak and Trifiro, 2007).

The CANFIS model integrates adaptable fuzzy inputs with a modular neural network to

rapidly and accurately approximate complex functions. The TSK fuzzy model proposed by

Takagi, Sugeno and Kang (Takagi and Sugeno, 1985; Sugeno and Kang, 1988) is used in the

present study, since this type of fuzzy model best fits the multi-input, single output system

(Aytek, 2009).

For the CANFIS model development, model catchments were clustered based on model

variables (A, Itc_ARI) into several class values in layer 1 to build up fuzzy rules, and each fuzzy

rule was constructed through several parameters of membership function in layer 2. A fuzzy

inference system structure was generated from the data using subtractive clustering. This was

used in order to establish the rule base relationship between the inputs.

In order to obtain the best CANFIS models, the MSE was used as the ‘fitness function’, which

was based on the observed and predicted flood quantiles; the training was undertaken to

minimise this error. The Levenberg-Marquardt (LM) method was used as the training algorithm to

minimize the MSE. The CANFIS model was trained with a set of input and output data to adjust

the weights and to minimize the MSE between the desired outputs and the model outputs. The

testing data set was selected randomly to produce a reasonable sample of different catchment

types and sizes. Two inputs (A, Itc_ARI) were used in one input layer and one output layer with

one output (Qpred).

In the case of CANFIS, the bell membership function and the TSK neuro fuzzy model were

used, as this type of fuzzy model best fits the multi-input, single output system (Aytek, 2009).

The LM algorithm was used for the training of the CANFIS model. The stopping criterion for the

training of the CANFIS network was a maximum of 1000 epochs, and training was

set to terminate when the MSE dropped to the threshold value of 0.01.

3.2.5 Quantile regression technique (QRT)

A flood quantile is a probabilistic flood estimate for a selected ARI. The United States Geological

Survey (USGS) proposed a quantile regression technique (QRT) where a large number of

gauged catchments are selected from a region and flood quantiles are estimated from recorded

streamflow data, which are then regressed against catchment variables that are most likely to



govern the flood generation process. Studies by Benson (1962) suggested that T-year flood

peak discharges could be estimated directly from catchment characteristics (predictor

variables, X) by multiple regression analysis (Thomas and Benson, 1970; Stedinger

and Tasker, 1985; Haddad and Rahman, 2012):

$Q_T = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \ldots$ (3.9)

where the regression coefficients β are generally estimated by using an ordinary least squares

(OLS) or generalised least squares (GLS) regression. There have been various techniques and

many applications of regression models that have been adopted for hydrological regression.

Most of these methods are derived from the methodology set out by the USGS as described

above. The USGS has been applying the QRT for several decades. A well-known study using

the QRT with an OLS procedure was carried out by Thomas and Benson (1970). The study

tested four regions in the United States for design flood estimation using multiple regression

techniques that related streamflow characteristics to drainage-basin characteristics.

The OLS estimator has traditionally been used by hydrologists to estimate the regression

coefficients β in regional hydrological models. However, for the OLS model to be

statistically efficient and robust, the annual maximum flood series in the region must be

uncorrelated, all the sites in the region should have equal record length and all estimates of

T-year events should have equal variance. Since the annual maximum flow data in a region do not

generally satisfy these assumptions, the OLS approach can provide very distorted estimates of

the model’s predictive precision (model error) and the precision with which the regression

model coefficients are being estimated (Stedinger and Tasker, 1985).

In developing the QRT in this study, both the dependent and independent variables were

log-transformed to linearize Equation 3.9. An OLS regression was adopted to

develop prediction equations for each of the six flood quantiles using two predictor variables

(A, Itc_ARI). OLS is easy to implement, whereas GLS needs specialised

software; however, both provide similar results unless the data are highly correlated

(Haddad et al., 2008). The data sets for building and independent testing of the QRT model

were the same as those used for the non-linear models. The MINITAB 14 software was used to

develop the QRT models.
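The log-linear OLS fit described above can be sketched in a few lines of Python. Taking logarithms of the power-form model gives log Q_T = log β0 + β1 log A + β2 log I, which is linear in the coefficients. The sketch below is not the MINITAB procedure used in this study; the function names are hypothetical and the data are synthetic values generated from an assumed equation Q = 0.5 A^0.7 I^1.2, so that the fit can be checked against known coefficients.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_qrt(areas, intensities, quantiles):
    """OLS fit of log10(Q) = b0 + b1*log10(A) + b2*log10(I) via the normal equations."""
    rows = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(areas, intensities)]
    y = [math.log10(q) for q in quantiles]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


def predict_qrt(coeffs, area, intensity):
    """Back-transform the log-space prediction to a flood quantile."""
    b0, b1, b2 = coeffs
    return 10 ** (b0 + b1 * math.log10(area) + b2 * math.log10(intensity))


# Synthetic catchments (not data from this study).
areas = [10.0, 50.0, 100.0, 400.0, 900.0]
intensities = [30.0, 12.0, 20.0, 8.0, 15.0]
quantiles = [0.5 * a ** 0.7 * i ** 1.2 for a, i in zip(areas, intensities)]

coeffs = fit_qrt(areas, intensities, quantiles)
print(10 ** coeffs[0], coeffs[1], coeffs[2])
```

With real data the residuals are not zero, and the back-transformed prediction estimates the median rather than the mean of Q_T unless a bias correction is applied.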



3.2.6 Cluster analysis

In the process of formation of regions and to identify the groups of catchments in catchment

characteristics data space, two methods were adopted in this study: cluster analysis and

principal component analysis.

Clustering algorithms are generally divided into two categories: partitional

and hierarchical. Partitional clustering algorithms divide the data set into non-overlapping

groups; algorithms such as k-means, bisecting k-means and k-modes fall under this category.

Partitional clustering algorithms employ an iterative approach to group the data into a

pre-determined number of clusters, k, by minimising a cost function. Hierarchical

clustering, in contrast, creates clusters that have a predetermined ordering from top to bottom.

A number of methods of cluster analysis with different distance measures are used. One

problem in cluster analysis is that it generates different groupings with different methods of

cluster analysis. The question then arises which of these groupings is to be selected as the

‘acceptable grouping’. In selecting the ‘acceptable grouping’ the criterion was used that there

should be no chaining effect in the final clusters and there should be well defined grouping in

the final sets of clusters/groupings.

To overcome the problem arising from different dimensional units of the variables in

cluster analysis, the variables were standardized by transforming them to

z-scores (mean = 0 and standard deviation = 1). It was assumed that there could be

two groupings in cluster analysis so that each group contains a relatively large number of

stations, which is needed for successful calibration of the RFFA model using non-linear

techniques.
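The z-score standardisation described above can be sketched as follows. The use of the sample standard deviation (divisor n - 1), the function names and the example values are assumptions made for illustration only.

```python
import statistics


def z_scores(values):
    """Transform one variable to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def standardise_columns(rows):
    """Standardise each variable (column) of a table of catchments (rows)."""
    columns = [z_scores(column) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical catchments described by (area in km2, rainfall intensity in mm/h).
catchments = [(10.0, 30.0), (100.0, 20.0), (400.0, 10.0)]
standardised = standardise_columns(catchments)
print(standardised)
```

After this step a variable measured in km² and one measured in mm/h contribute to a distance measure on the same scale.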

The hierarchical cluster analysis

There are numerous ways in which clusters can be formed. Hierarchical clustering is one of

the most straightforward methods. A key component of the analysis is repeated calculation of

distance measures between objects, and between clusters once objects begin to be grouped

into clusters. The outcome is represented graphically as a dendrogram. For

this study, hierarchical clustering was used with the following methods:

Wards;

Median;



Baverage;

Waverage; and

Centroid.

Because the goal of this cluster analysis is to form groups of similar catchments, a

criterion needs to be selected to measure similarity or distance. Distance is a measure

of how far apart two objects are, while similarity measures how alike two objects are. For

cases that are alike, distance measures are smaller and similarity measures are larger. Some measures,

like the Euclidean distance, are suitable only for continuous variables, while others are

suitable only for categorical variables. There are also many specialized measures for binary

variables. In this study, different measures were adopted and the method with the best clusters

and the fewest outliers was selected for ANN modelling. For each of the above methods, the

following distance measure options were adopted:

Block;

Euclid;

Seuclid;

Correlation;

Cosine;

Chebychev;

Minkowski; and

Power.

Based on the above-mentioned criteria for selecting the best grouping, the cluster method ‘Wards’

with a distance measure option of ‘Block’ was adopted for the selection of regions based on the

hierarchical cluster analysis.

K-means clustering

K-means clustering is a partitioning method. The k-means function partitions data

into k mutually exclusive clusters and returns the index of the cluster to which it has assigned

each observation. Unlike hierarchical clustering, k-means clustering operates on actual

observations (rather than the larger set of dissimilarity measures), and creates a single level of

clusters. These distinctions mean that k-means clustering is often more suitable than hierarchical

clustering for large amounts of data (http://www.mathworks.com.au/help/stats/kmeans.html).
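A minimal sketch of the partitioning idea is given below, using the standard iterative (Lloyd) procedure with squared Euclidean distance. It is not the MATLAB implementation referred to above; the function name, the choice of initial centroids and the example points are assumptions for illustration.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid, update centroids, and repeat."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every observation.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
            )
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardised catchment characteristics forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels, centroids)
```

The result depends on the initial centroids, which is why implementations usually repeat the procedure from several starting points and keep the solution with the lowest cost.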

3.2.7 Principal component analysis (PCA)

At the second stage of selecting acceptable grouping as part of formation of regions, the

principal component analysis (PCA) was undertaken. PCA is basically a variable-reduction

technique that shares many similarities to exploratory factor analysis. Its aim is to reduce a

larger set of variables into a smaller set of artificial variables, called 'principal components',

which account for most of the variance in the original variables. The PCA transforms a set of

correlated variables into a new set of uncorrelated components, such that the first component

accounts for the largest amount of the total variation in the data; the second component, which

is uncorrelated with the first, accounts for the maximum amount of the remaining total

variation not already accounted for by the first component, and so on. In this

study PCA was undertaken using the statistical package SPSS. Variables used in PCA are

discussed in Chapters 4 and 5.
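The transformation described above can be written compactly. The formulation below is the standard correlation-matrix form of PCA and is given as an illustrative summary; it assumes that the p variables have been standardised to z-scores using the sample standard deviation and arranged in an n × p matrix Z (n catchments), and the symbols Z, R, v_j and λ_j are introduced here for illustration only.

```latex
\begin{aligned}
R &= \frac{1}{n-1}\, Z^{\top} Z, \\
R\, v_j &= \lambda_j v_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0, \\
\mathrm{PC}_j &= Z\, v_j, \\
\text{proportion of variance explained by } \mathrm{PC}_j
  &= \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k} = \frac{\lambda_j}{p}.
\end{aligned}
```

Here R is the correlation matrix of the variables, the eigenvectors v_j are mutually orthogonal so the components are uncorrelated, and the eigenvalue λ_j is the variance of the j-th component.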

3.2.8 Model validation technique

In this study, models/prediction equations were developed for each of the six flood quantiles,

corresponding to average recurrence intervals (ARIs) of 2, 5, 10, 20, 50 and 100 years. A split-sample

validation technique was adopted to test the performance of the developed models/prediction

equations, where the data set was divided into two parts: (i) training/modelling data set, which

includes 80% of the study catchments; and (ii) validation/testing data set, which includes 20%

of the study catchments. The artificial intelligence based RFFA models and QRT were first

developed using the training/modelling data set, which were then tested using the

validation/testing data set. This enabled an independent testing of the models/prediction

equations developed in this study.
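A random 80/20 split of this kind can be sketched as follows. The function name, the fixed random seed and the catchment identifiers are hypothetical; the sketch only illustrates that the two subsets are disjoint and together cover all study catchments.

```python
import random


def split_sample(catchment_ids, test_fraction=0.2, seed=0):
    """Randomly split catchments into training and validation subsets."""
    ids = list(catchment_ids)
    random.Random(seed).shuffle(ids)  # seeded so that the split is reproducible
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical catchment identifiers 0..99.
training, validation = split_sample(range(100))
print(len(training), len(validation))
```

Fixing the seed keeps the same catchments in the validation set for every model, so that the models are compared on identical data.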



3.3 Summary

This chapter provides a description of the statistical and mathematical tools adopted in this

study. These include ANN, GAANN, GEP, CANFIS, cluster analysis, principal component

analysis and quantile regression technique (QRT). The fundamental concepts, mathematical

equations and input data requirements for each of these methods are presented in this chapter.

The adopted split-sample validation technique is also described, which allowed an

independent testing of the models/prediction equations developed in this study.



CHAPTER 4

SELECTION OF STUDY AREA AND DATA

PREPARATION

4.1 General

This thesis focuses on design flood estimation in ungauged catchments using artificial

intelligence based methods. The regional flood frequency analysis (RFFA) method is based on

the streamflow and catchment characteristics data of a set of selected gauged catchments. It is

important that an appropriate set of catchments is selected and that the data are prepared following

standard procedures. This chapter presents the selection of the study area and catchments, and the collation of

streamflow and catchment characteristics data used in this research.

4.2 Selection of study area

This study selects eastern Australia as the study area since this part of Australia has the

highest density of stream gauging stations with good quality data. Eastern Australia

covers the states of Queensland (QLD), New South Wales (NSW), Victoria (VIC), Australian

Capital Territory (ACT) and Tasmania (TAS). The selected study area is shown in Figure 4.1.

Figure 4.1 Location of the selected study area (coloured parts of the map)



4.3 Selection of study catchments

4.3.1 Factors considered for selection of catchments

The following factors were considered in making the initial selection of the study catchments:

Catchment area

The flood frequency behaviour of large catchments has been shown to significantly differ

from smaller catchments, and since the RFFA method is intended for small to medium sized

catchments, the proposed method should be developed based on small to medium sized

catchments. Australian Rainfall and Runoff (ARR) (I. E. Aust., 1987) suggests an upper limit

of 1000 km² for small to medium sized catchments, which seems to be reasonable and was

adopted in this thesis.

Record length

For a stream gauging station, a long enough streamflow record is ideally needed to

characterize the underlying flood probability distribution with reasonable accuracy. In most

practical situations, streamflow records at many gauging stations in a given study area are not

long enough and hence a balance is required between obtaining a sufficient number of stations

(which captures greater spatial information) and a reasonably long record length (which

enhances accuracy of at-site flood frequency analysis). Selection of a cut-off record length

appears to be difficult as this can affect the total number of stations available to develop the

RFFA technique in a study area. For this study, the stations having a minimum of 10 years of

annual instantaneous maximum flow records were selected initially as ‘candidate stations’.

Regulation

Ideally, the selected streams should be unregulated, since major regulation affects the rainfall-

runoff relationship significantly (storage effects). Streams with minor regulation, such as

small farm dams and diversion weirs, may be included because this type of regulation is

unlikely to have a significant effect on annual maximum (AM) floods. Gauging stations on

streams subject to major upstream regulation were not included in this thesis.



Urbanisation

Urbanisation can affect flood behaviour dramatically (e.g. decreased infiltration losses and

increased flow velocity). Therefore catchments with more than 10% of the area affected by

urbanisation were not included in this thesis.

Landuse change

Major landuse changes, such as the clearing of forests or changing agricultural practices

modify the flood generation mechanisms and make streamflow records heterogeneous over

the period of record length. Catchments which have undergone major landuse changes over

the period of streamflow records were not included in the data set.

Quality of data

Most of the statistical analyses of flood data assume that the available data are essentially

error free; at some stations, this assumption may be grossly violated. Stations graded as ‘poor

quality’ or with specific comments by the gauging authority regarding quality of the data were

assessed in greater detail; if they were deemed to be of ‘low quality’, they were excluded from

the study.

4.4 Streamflow data preparation

4.4.1 Methods of streamflow data preparation

Missing observations in streamflow records at gauging locations are very common and one of

the elementary steps in any hydrological data analysis is to make decisions about dealing with

these missing data points. Missing records in the AM flood series were in-filled where the

extra data points can be estimated with sufficient accuracy to contribute additional

information rather than ‘noise’. In this research, the following methods were applied, following

the approach of Rahman (1997) and Haddad et al. (2010).

Method 1

In this method the monthly instantaneous maximum (IM) data was compared with monthly

maximum mean daily (MMD) data at the same station for years with data gaps. For a missing

month of instantaneous maximum flow corresponding to a month of very low maximum mean

daily flow, that was taken to indicate that the AM did not occur during that missing month.



Method 2

This method involved a linear regression of the AM mean daily flow series against the annual

instantaneous maximum series of the same station. Infilling of the gaps in the IM record was

performed using the developed regression equations. The regression was not used to extend the

overall period of record of instantaneous flow data, but only to infill the missing data points.

As Method 1 is more directly based on observed data for the missing month and involves

fewer assumptions, it was preferred over Method 2.
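The regression-based infilling of Method 2 can be sketched as below. The sketch assumes that, for prediction, the annual IM flow is expressed as a linear function of the AM mean daily flow fitted to the years in which both are available; the function names and flow values are hypothetical and the regression form used in this study may differ.

```python
def fit_line(x, y):
    """Least squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infill_im(mean_daily_series, im_series):
    """Fill missing (None) IM values from the mean daily series of the same years."""
    paired = [(m, i) for m, i in zip(mean_daily_series, im_series) if i is not None]
    slope, intercept = fit_line([m for m, _ in paired], [i for _, i in paired])
    # Only gaps inside the existing record are filled; the record is not extended.
    return [
        i if i is not None else slope * m + intercept
        for m, i in zip(mean_daily_series, im_series)
    ]


# Hypothetical flows (m3/s); the third year has no instantaneous maximum.
mean_daily = [10.0, 20.0, 30.0, 40.0]
instantaneous = [25.0, 45.0, None, 85.0]
filled = infill_im(mean_daily, instantaneous)
print(filled)
```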

4.4.2 Tests for outliers

In a set of annual maximum (AM) flood series there is a possibility of outliers being present.

An outlier is an observation that deviates significantly from the bulk of the data, which may

be due to errors in data collection or recording, or due to natural causes.

The method for treating outliers suggested in ARR (I. E. Aust., 1987) was not adopted here, as

it includes an adjustment for skew, employing somewhat ‘circular’ logic. Instead, the

procedure known as Grubbs and Beck (1972) method was adopted. The Grubbs and Beck

(1972) method is based on the principle of determining high and low outlier threshold values

by applying a one-sided 10% significance level test, which considers the sample size. The test

was developed by Grubbs and Beck (1972) for detecting a single outlier from a normal

distribution, but has been shown to be also applicable to the LP3 distribution.
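The calculation of the high and low outlier thresholds can be sketched as follows. The sketch assumes that the test is applied to the base-10 logarithms of the AM flows and uses a widely quoted polynomial approximation of the one-sided 10% critical value K_N as a function of sample size N (reported for roughly 5 ≤ N ≤ 150); neither the approximation nor the function names are taken from this study.

```python
import math
import statistics


def grubbs_beck_kn(n):
    """Approximate one-sided 10% critical value K_N for a sample of size n."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def outlier_thresholds(flows):
    """Low and high outlier thresholds computed from log10-transformed flows."""
    logs = [math.log10(q) for q in flows]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    kn = grubbs_beck_kn(len(flows))
    return 10 ** (mean - kn * sd), 10 ** (mean + kn * sd)


# Hypothetical AM flood series (m3/s).
flows = [100.0, 120.0, 90.0, 110.0, 95.0, 105.0, 115.0, 98.0, 102.0, 108.0]
low, high = outlier_thresholds(flows)
print(low, high)
```

Flows above the high threshold or below the low threshold are flagged as potential outliers for closer inspection rather than removed automatically.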

4.4.3 Trend analysis

Hydrological data for any flood frequency analysis, be it at-site or regional, should be

stationary, consistent and homogeneous. The AM flow series should not show any time trend

to satisfy the basic assumption of stationarity with traditional flood frequency analyses

methods. Thus, in this study, a trend analysis was carried out where possible to identify

stations showing significant trend and the stations which did not show any significant trend

were included in the primary data set for this study.

Two tests were initially applied to detect trend, the Mann–Kendall test (Kendall, 1970) and

the distribution-free CUSUM test (McGilchrist and Woodyer, 1975); both tests were applied at

the 5% significance level. The Mann-Kendall test is concerned with testing whether there is

an increase or decrease in a time series, whereas the CUSUM test concentrates on whether the

means in two parts of a record are significantly different. As a useful guide and in addition to



the trend tests, a simple time series plot and a cumulative flow graph of the station were also

used to detect shifts in the AM flood data. It should be noted that trends in a time series data

do not necessarily mean non-stationarity. In climate change research, non-stationarity means

significant changes in statistical properties of the time series data of a hydro meteorological

variable over time. Trends may not change statistical properties (such as mean and variance)

of a time series data significantly. Therefore, trend analysis cannot be used as a stationarity test;

however, trends may be an indicator of non-stationarity.
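The Mann-Kendall test statistic can be sketched as below. The sketch uses the usual normal approximation with a continuity correction and omits the variance correction for tied values, which is a simplifying assumption; the function name and the example series are hypothetical.

```python
import math


def mann_kendall(series):
    """Return the statistic S, the standardised Z and the two-sided p-value."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # no correction for ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, standard normal
    return s, z, p_value


# Hypothetical AM series with a steadily increasing trend.
s, z, p_value = mann_kendall([float(v) for v in range(1, 11)])
print(s, z, p_value)
```

A p-value below 0.05 corresponds to a significant trend at the 5% significance level adopted in this study.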

4.4.4 Rating error analysis

The rating curve used to convert measured flood levels to flood flow rates is based on

periodic measurements of flow areas and velocities over a range of flow magnitudes.

However, the range of observed flood levels generally exceeds the range of ‘measured’ flows,

thus requiring different degrees of extrapolation of well-established rating curves.

Any rating curve extrapolation errors are directly transferred into the largest observations in

the AM flood series, and use of extrapolated data in flood frequency analysis can thus result

in grossly inaccurate flood frequency estimates.

To assess the degree of rating curve related error for a given station, the AM flood series data

point for each year (estimated flow QE) was divided by the maximum measured flow (QM) to

obtain a rating ratio (RR) (see Equation 4.1). If the RR value is below or near 1, the

corresponding AM flow may be considered to be free of rating curve extrapolation error.

However, a RR value well above 1 indicates a rating curve error that can cause notable errors

in flood frequency analysis.

$\text{Rating Ratio (RR)} = \dfrac{Q_E}{Q_M}$ (4.1)

For any RFFA, a large number of stations with reasonably long record lengths are required

and hence a trade-off needs to be made between an extensive data set that includes stations

with very large RR values (and thus lower accuracy) and a smaller data set with RR values

restricted to what could be considered to be a “reasonable upper limit” of rating curve errors.

A working method to decide on a cut-off RR value was determined by looking at the average

RR value and the maximum RR value for each station in a region/state. Based on the results

from Victoria and NSW, the following cut-off values were found to represent a reasonable



compromise between accuracy at individual sites and total size of the regional data set: an

average RR value of 4 and a maximum RR value of 20.
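The screening of a station against these cut-off values can be sketched as follows. The function names and flows are hypothetical, and treating a station that sits exactly on a cut-off value as acceptable is an assumption.

```python
def rating_ratios(am_flows, max_measured_flow):
    """Rating ratio RR = Q_E / Q_M for each annual maximum flow."""
    return [q / max_measured_flow for q in am_flows]


def passes_rating_screen(am_flows, max_measured_flow, mean_limit=4.0, max_limit=20.0):
    """Check the average and maximum RR of a station against the cut-off values."""
    rr = rating_ratios(am_flows, max_measured_flow)
    return sum(rr) / len(rr) <= mean_limit and max(rr) <= max_limit


# Hypothetical stations, each with a maximum measured flow of 100 m3/s.
station_ok = passes_rating_screen([50.0, 100.0, 200.0], 100.0)
station_rejected = passes_rating_screen([100.0, 200.0, 2500.0], 100.0)
print(station_ok, station_rejected)
```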

4.5 Selection of catchment characteristics

Identification of the most relevant catchment characteristics is difficult as there is no objective

method for doing this. Many catchment characteristics are also highly correlated; including

many of them in a model can cause problems in statistical analysis, such as

multicollinearity, without providing any extra useful information.

The initial selection of candidate characteristics should be based on an evaluation of the success of catchment

characteristics used in past studies. All the possible catchment/climatic characteristics must be

considered from the past studies to make the selection for a given study. This can increase the

validity of the model to be developed. Rahman (1997) considered this aspect in detail from

over 20 previous studies to develop a reasonable starting point. But in RFFA, the significance

of characteristics may vary from region to region; therefore, no general inference about the

significance of a particular catchment characteristic can be made for a given region based on

the findings of other studies.

4.5.1 Selection criteria

The following guidelines were adopted in this study to select the catchment characteristics,

following the approach of Rahman (1997):

The characteristic should have a plausible role in flood generation.

They should be unambiguously defined.

Characteristics should be easily obtainable. When a simpler characteristic and a

complex one are correlated and have similar effects, then the simpler characteristic

should be chosen.

If a derived/combined characteristic is used, it should have a simple physical

interpretation.



The selected characteristics should not be highly correlated because this introduces

unstable parameters in multiple regression analysis.

The prediction performance of a particular characteristic in other regionalisation

studies should be examined as this might provide some information regarding the

importance of a characteristic.

4.5.2 Catchment characteristics considered in this thesis

The following five catchment characteristics were selected in this thesis on the basis of the criteria

mentioned in Section 4.5.1. They are described in detail in the following sections.

The candidate catchment/climatic characteristics are:

Design rainfall intensity (I_tc_ARI, mm/h);

Mean annual rainfall (R, mm);

Mean annual evapo-transpiration (E, mm);

Catchment area (A, km²); and

Slope of central 75% of mainstream S1085 (S, m/km).

4.5.3 Rainfall intensity

Rainfall intensity, with some appropriate duration and average recurrence interval (ARI), has

been found to be the most influential climatic characteristic in the previous RFFA studies.

There is no doubt that it is significant in the flood generation process. It is also quite easy to

obtain.

The use of rainfall intensity requires the selection of an appropriate duration and ARI. It

seems to be logical to use rainfall intensity with duration equal to the time of concentration

(tc), as applied in the rational method. However, the time of concentration (tc) differs for the

catchments in the study area due to variability in size and shape; i.e. it is virtually impossible

to select a storm having equal time of concentration which is representative of every

catchment in this thesis. It was therefore decided to include the following design rainfall

intensities in this study:

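(tc) duration, 2 years ARI (I_tc_2, mm/h);

(tc) duration, 5 years ARI (I_tc_5, mm/h);

(tc) duration, 10 years ARI (I_tc_10, mm/h);

(tc) duration, 20 years ARI (I_tc_20, mm/h);

(tc) duration, 50 years ARI (I_tc_50, mm/h); and

(tc) duration, 100 years ARI (I_tc_100, mm/h).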

All the basic design rainfall intensities data for the selected catchments were obtained from

ARR, Vol. 2 (I. E. Aust., 1987) and the software AUSIFD was used to obtain other design

rainfall intensities. AUSIFD is a software package widely used in Australia to derive design rainfalls.

For consistency, and ease of application, the formula recommended in ARR 1987 for Victoria

and eastern NSW, given by Equation 4.2, was adopted in this thesis to estimate time of

concentration tc (hours) from catchment area A (km²).

$t_c = 0.76 A^{0.38}$ (4.2)
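As a short worked illustration of Equation 4.2, tc = 0.76 A^0.38, the following sketch computes the time of concentration for a hypothetical 100 km² catchment; the function name and the chosen area are assumptions for illustration only.

```python
def time_of_concentration(area_km2):
    """Time of concentration (hours) from catchment area (km2): tc = 0.76 * A^0.38."""
    return 0.76 * area_km2 ** 0.38


tc_100 = time_of_concentration(100.0)
print(round(tc_100, 2))
```

For this hypothetical catchment, the design rainfall intensities would therefore be read for a duration of a little under four and a half hours.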

4.5.4 Mean annual rainfall

Mean annual rainfall has been adopted in many previous studies. Although it may not have a direct link with flood peaks, it can still have a secondary effect by acting as a surrogate for other catchment characteristics (e.g. vegetation). It is also quite easy to obtain. Thus, mean annual rainfall was included as a candidate predictor variable in this study. The mean annual rainfall data were obtained from the Australian Bureau of Meteorology CD. For each catchment, the mean annual rainfall value for the rainfall station closest to the catchment centroid was extracted.

4.5.5 Catchment area

Catchment area is the most frequently adopted morphometric characteristic and the main scaling factor in flood process studies, since it has a direct impact on the possible flood magnitude from a given storm event. Almost all of the reported RFFA studies have found catchment area to be very significant. One of the reasons why the area variable has been so useful in statistical hydrology is its association with other significant morphometric characteristics like slope, stream length, and stream order. Catchment areas of the selected catchments were measured by planimeter from 1:100,000 topographic maps. The derived areas were also compared to the values provided in the catchment database that contained the streamflow data provided by the stream gauging authority. Area was characterised by Anderson (1957) as the ‘devil’s own variable’, because almost every watershed characteristic is correlated with it. The mean annual flood is related not only to area but also to other morphometric characteristics (e.g. stream order, stream length), which are themselves directly proportional to area. The total volume of runoff (Q) is related to the area of the catchment (A) by a relationship of the general form:

$Q = cA^{m}$ (4.3)

where c is a coefficient and the exponent m varies from 0.5 to 1.00. Catchment area was included in this study as a candidate predictor variable.
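The effect of the exponent m in Equation 4.3 can be illustrated with a short sketch. The coefficient value used here is purely hypothetical; only the form Q = cA^m and the range of m are taken from the text.

```python
def runoff_power_law(area_km2: float, c: float, m: float) -> float:
    """Equation 4.3: Q = c * A**m."""
    return c * area_km2 ** m


def scaling_ratio(area_factor: float, m: float) -> float:
    """Factor by which Q changes when the area is multiplied by area_factor."""
    return area_factor ** m


# Hypothetical coefficient c = 1.0; doubling the area at both ends of the m range.
q_small = runoff_power_law(100.0, c=1.0, m=0.5)
q_large = runoff_power_law(200.0, c=1.0, m=0.5)
ratios = {m: scaling_ratio(2.0, m) for m in (0.5, 1.0)}
```

With m = 0.5, doubling the area increases Q by a factor of about 1.41, whereas with m = 1.0 it doubles Q, so smaller exponents imply that runoff grows less than proportionally with area.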

4.5.6 Slope S1085

Of the different measures of slope, S1085 is easily obtainable and has been reported to be the best measure for prediction of the mean flood (Benson, 1959). Thus, S1085 was used in this study. The S1085 measure excludes the extremes of slope that can be found at either end of the mainstream. It is the ratio of the difference in stream bed elevation between the points at 85% and 10% of the mainstream length from the catchment outlet to 75% of the mainstream length.

The following methodology was adopted to derive the S1085 values:

Catchment boundaries were plotted on 1:100,000 topographic maps for each gauged station.

The mainstream length was measured using an electronic map wheel. It was taken as the total distance from the outlet to the point where the stream intersects the catchment boundary. The longest path was chosen as the main stream of each catchment.

Elevations were then derived for the 10% and 85% mainstream length positions. The elevations at these positions were interpolated from either 10 m or 20 m contours.

S1085 values were determined from Equation 4.4.

$S1085 = \dfrac{E_2 - E_1}{0.75\,L}$ (4.4)

where $E_2$ is the elevation at the 0.85L position, $E_1$ is the elevation at the 0.10L position and L is the main stream length, with S1085 expressed in m/km. The slope S1085 is referred to as S henceforth.
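Equation 4.4 can be sketched as a small function. Elevations are assumed to be in metres and the mainstream length in kilometres, so that the result is in m/km; the function and argument names are illustrative.

```python
def slope_s1085(elev_10pct_m: float, elev_85pct_m: float, length_km: float) -> float:
    """Equation 4.4: S1085 = (E2 - E1) / (0.75 * L), in m/km.

    elev_10pct_m: stream bed elevation E1 at 0.10L from the outlet (m)
    elev_85pct_m: stream bed elevation E2 at 0.85L from the outlet (m)
    length_km:    mainstream length L (km)
    """
    if length_km <= 0:
        raise ValueError("mainstream length must be positive")
    return (elev_85pct_m - elev_10pct_m) / (0.75 * length_km)


# Hypothetical catchment: 150 m rise over the central 75% of a 20 km mainstream.
example_slope = slope_s1085(100.0, 250.0, 20.0)
```

For this hypothetical case the central reach is 15 km long, giving a slope of 10 m/km.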

4.5.7 Mean annual evapo-transpiration

Mean annual evapo-transpiration is the third influential climatic characteristic considered in

the flood generation process. Evapo-transpiration does not affect the flood peak directly but

can have a secondary effect by being a surrogate for other catchment characteristics. Evapo-

transpiration can be defined as the water lost from a water body through the combined effects

of evaporation and transpiration from catchment vegetation. In this study, mean annual areal potential evapo-transpiration data were used. The data were obtained from the Australian Bureau of Meteorology CD. For each catchment, the value at the catchment centroid was extracted.

4.6 Streamflow data preparation for various states

4.6.1 NSW and ACT

A total of 635 stations were selected from NSW and ACT initially. For in-filling the gaps,

Method 1 was preferred over Method 2 (see Section 4.4.1 for description of these methods)

for different catchments in NSW.

Trend analysis

Initially the Mann-Kendall test was applied to the stations. The results showed that some 11% of the stations had a decreasing trend, generally after 1990. Given the large number of stations showing a trend, time series plots and mass curves were prepared for these stations to detect visually whether significant changes in slope could be identified. A typical plot is shown in Figure 4.2. A simple time series plot (Figure 4.3) is useful in addition to trend tests for detecting and confirming shifts in data. Because these tests indicate that flood data are not independently and identically distributed from year to year, caution is needed when using short records to estimate long-term risks.

The fact that the last 10–15 years of data (after the late 1980s) showed a significant downward trend for many stations makes the inclusion of stations with short record lengths in flood frequency analysis questionable, as this could introduce significant bias in the results. Hence, it was decided that a station should have at least 25 years of streamflow data. After the introduction of a cut-off record length of 25 years, the number of eligible stations in NSW and ACT dropped to 106.
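The trend screening described above can be illustrated with a minimal sketch of the Mann-Kendall test applied to an annual maximum (AM) flood series. This is the standard textbook form of the test (normal approximation with tie correction) and is not necessarily the exact implementation used in the thesis; the example series is hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test."""
    x = list(series)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    # S counts later-minus-earlier pairs: +1 for an increase, -1 for a decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            s += (x[j] > x[i]) - (x[j] < x[i])
    # Variance of S, reduced for groups of tied values.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value


# Hypothetical AM series that decreases every year.
s_stat, z_stat, p_val = mann_kendall([95, 90, 84, 80, 71, 66, 60, 52, 47, 40])
```

A negative Z with a small p-value indicates a decreasing trend, which is the pattern reported for a share of the NSW stations.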

Checking for outliers in the AM flood series

The Grubbs and Beck (1972) method was adopted to check for the outliers. While the data

checking revealed many ‘outliers’ in the flood series, these did not preclude the use of the

remaining flood data in RFFA. The results of the outlier detection procedure are summarised

below:

40% of the stations were found to have low outliers. The maximum number of low

outliers detected in a data series was 9 and never exceeded 21% of the total number of

data points in a series.

Most of the detected low outliers occurred for stations located in low rainfall areas,

especially in the western parts of NSW.

31% of low outliers occurred in the years 1982, 1967 and 1994. This is not surprising

as there were severe droughts during these years; the maximum flows that occurred in

many rivers in these years were merely base flows, and not due to flood events.

47% of the stations did not show any outliers.

Only 5 stations had a high outlier.

The detected low outliers were treated as censored flows in flood frequency analysis using

ARR FLIKE (Kuczera and Franks, 2005).
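A minimal sketch of a Grubbs and Beck type outlier screen on an AM flood series is given below. It assumes the commonly quoted approximation for the 10% significance level frequency factor, K_N = -0.9043 + 3.345 sqrt(log10 N) - 0.4046 log10 N, applied to the log10 flows; the thesis does not state which form of K_N was used, and the example flows are hypothetical.

```python
import math
from statistics import mean, stdev


def grubbs_beck_thresholds(flows):
    """Return (low, high) outlier thresholds in flow units.

    Assumes all flows are positive and uses an approximate K_N
    for the 10% significance level (an assumption, see lead-in).
    """
    logs = [math.log10(q) for q in flows]
    log_n = math.log10(len(logs))
    k_n = -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n
    m, s = mean(logs), stdev(logs)
    return 10 ** (m - k_n * s), 10 ** (m + k_n * s)


def flag_outliers(flows):
    """Split a series into low and high outliers by the thresholds above."""
    low_t, high_t = grubbs_beck_thresholds(flows)
    low = [q for q in flows if q < low_t]
    high = [q for q in flows if q > high_t]
    return low, high


# Hypothetical AM series with one drought-year value that is only base flow.
am_series = [120.0, 150.0, 180.0, 200.0, 220.0, 250.0, 270.0, 300.0, 330.0, 360.0,
             400.0, 450.0, 500.0, 560.0, 620.0, 700.0, 780.0, 860.0, 950.0, 2.0]
low_outliers, high_outliers = flag_outliers(am_series)
```

Flows flagged as low outliers would then be passed to the flood frequency analysis as censored values rather than being deleted.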

Rating curve error

To assess the degree of rating curve related error for a given station, the rating ratio (RR) (see Equation 4.1) was adopted. In the remaining data set of 106 stations from NSW, many had RR values considerably greater than 1 (Figure 4.4). A cut-off RR value of 20 was adopted; any station having an average RR value greater than 4 and a maximum RR value greater than 20 was rejected. This reduced the eligible number of stations from 106 to 96.
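The rejection rule can be sketched as follows, assuming the RR values of each annual maximum at a station have already been computed from Equation 4.1. The function name and the example values are hypothetical.

```python
from statistics import mean


def reject_station(rating_ratios, mean_cutoff=4.0, max_cutoff=20.0):
    """Reject a station when its average RR exceeds mean_cutoff
    and its maximum RR exceeds max_cutoff."""
    return mean(rating_ratios) > mean_cutoff and max(rating_ratios) > max_cutoff


# Hypothetical stations: RR values of each annual maximum flow.
stations = {
    "station_a": [1.0] * 20 + [25.0],   # one extreme extrapolation, low average
    "station_b": [3.0, 6.0, 9.0, 30.0],  # high average and high maximum
    "station_c": [5.0, 6.0, 7.0],        # high average, moderate maximum
}
rejected = [name for name, rr in stations.items() if reject_station(rr)]
```

Because both conditions must hold, a station with a single heavily extrapolated flood but otherwise well-rated flows is retained.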

Final data set from NSW and ACT



A total of 635 stations were initially selected. After in-filling the gaps in the AM flood series,

trend analysis, introduction of a cut-off record length of 25 years, and consideration of rating

curve errors, only 96 stations remained, which represent about 15% of the initially selected

stations. The statistics of AM streamflow record lengths of these 96 stations are summarised

below:

Record lengths range from 25 to 74 years, mean 34 years, median 31 years and

standard deviation 10 years;

77% of the stations have record lengths in the range 25-35 years;

18% of the stations have record lengths in the range 40-55 years; and

5% of the stations have record lengths in the range 60-75 years.

The histogram of streamflow record lengths of the 96 stations from NSW and ACT is shown

in Figure 4.5.

Figure 4.2 Result of trend analysis (Station 219001): CUSUM test statistic Vk plotted against year (1940 to 2010), annotated "Significant shift downwards". Here Vk is the CUSUM test statistic defined in McGilchrist and Woodyer (1975)

Figure 4.3 Result of trend analysis – time series plot (Station 219001)

Figure 4.4 Histogram of rating ratios for 106 stations from NSW (frequency on a logarithmic axis against rating ratio, RR; annotated "Over 95% of rating ratios between 1 & 20")

The statistics of catchment areas of the selected 96 stations are summarised below:

Catchment areas range from 8 to 1010 km2, with an average value of 353 km2, median

of 267 km2 and a standard deviation of 276 km2;

53% of catchments have areas smaller than 300 km2;

38% of stations have areas in the range of 301 km2 to 800 km2; and

10% of stations have areas in the range of 801 km2 to 1010 km2.

(Figure 4.3 plot, Station 219001: annual maximum flow in m3/s against year, 1940 to 2010; annotated "Decrease in flow magnitude")



Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT (histogram of frequency against record length in years, in classes from 25 - 29 to >75)

The distribution of catchment areas is shown in Figure 4.6. The geographical distribution of

the finally selected 96 stations is shown in Figure 4.7. There is no station in far western NSW

that passed the selection criteria.

Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT (histogram of frequency against catchment area in km2, in classes from 0 - 25 to >1000)

4.6.2 Tasmania

A total of 73 stations were selected as candidates from Tasmania, each having a minimum of

10 years of streamflow record. For in-filling the gaps in the AM flood series, Method 1 was

preferred over Method 2 (these methods are described in Section 4.4.1). The following points

summarise the results of the in-filling of the AM flood series data for Tasmania:



18 data points from 23 stations were in-filled by comparing flow records (Method 1);

27 data points from 12 stations were in-filled by regression (Method 2); and

20% of stations did not have any missing record.

After in-filling the gaps, the stations were then checked for possible trends (Section 4.4.3

details the method). Only three stations showed trends. The relevant data for checking the

rating ratios for Tasmania was largely unavailable, and hence no rating error analysis was

undertaken. About 9% of the stations showed low outliers. The maximum number of low

outliers detected in a data series was one and never exceeded 4% of the total number of data

points in a series. The low outliers occurred in the years 1967, 1982 and 2001. About 75% of

the stations did not show any outliers. About 14% of the stations showed high outliers;

however, these data points were not removed as no data error was detected.

While obtaining catchment characteristics data, 7 stations were found to have significant

proportions of lake areas, and were thus excluded; this reduced the dataset to 56 stations.

From this, 3 catchments over 1590 km2 were excluded, thus the final dataset contained 53

stations.

Figure 4.7 Geographical distributions of 96 catchments from NSW and ACT

The streamflow record lengths of the selected stations range from 10 to 58 years (median: 21 years and mean: 24 years). Figure 4.8 shows the distribution of record lengths. Figure 4.9 presents the distribution of catchment areas of the selected catchments. The catchment areas range from 4.6 to 1590 km2 (median: 102 km2 and mean: 240 km2). Figure 4.10 shows the locations of the selected stations. There is a lack of stations in the southern and eastern parts of the state.

Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania (histogram of frequency by record length class, from 1 - 10 to 51 - 60)

Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania (histogram of frequency by catchment area class, from 0 - 25 to >1000)



Figure 4.10 Locations of selected catchments from Tasmania

4.6.3 Queensland

The streamflow data were obtained from the Department of Natural Resources & Water

(NRW), QLD. A total of 351 active and historical streamflow gauging station records were

provided by NRW. Gauge station metadata, AM flow records as well as the monthly and daily

records were supplied by the NRW for each station. Based on the adopted selection criteria,

the number of eligible stations was reduced to 289.

The streamflow data were in-filled by comparing flow records (Method 1) and/or regression

(Method 2). Method 1 was preferred over Method 2. Some years’ data could not be filled due

to many missing records. Some important statistics regarding the gap filling are:

81 data points were in-filled for 47 stations using Method 1;

413 data points were in-filled for 104 stations using Method 2; and

16% of stations did not have any missing records.

To check for outliers, the Grubbs and Beck (1972) method was used. Some important

statistics about the outlier detection are:



39% of stations were found to have low outliers; the maximum number of outliers

detected in a data series was 4 and never exceeded 10% of the total number of data

points in a series.

Most of the detected low outliers occurred in the midwestern and top parts of Queensland.

The bulk of the low outliers occurred in the years 1967, 1982 and 2001.

61% of stations did not have any outliers.

A total of 117 stations (7% of the stations) showed a significant trend, and were removed

from the database. As a result, 265 stations were retained.

Furthermore, only stations with a streamflow record length of 25 years or greater were selected. After the introduction of this cut-off, the number of catchments from QLD dropped to 172. Figure 4.11 provides a histogram of the record lengths of these 172 stations. Some important statistics of the streamflow record lengths are provided below:

The distribution of catchment areas of these catchments is shown in Figure 4.12. Some

important statistics of the catchment areas are summarised below:

24 catchments (9%) are smaller than 50 km2;

67 catchments (25%) are smaller than 100 km2;

47 catchments (18%) are in the range of 101 to 200 km2; and

37 catchments (14%) are larger than 600 km2.

The locations of the selected 172 stations are shown in Figure 4.13. There is no suitable

station located in the south-western part of Queensland.



Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD (histogram of frequency by record length class, from 1 - 10 to 91 - 100)

Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD (histogram of frequency by catchment area class, from 0 - 25 to 901 - 1000)



Figure 4.13 Locations of the selected 172 stations from QLD

4.6.4 Victoria

Based on the adopted selection criteria, a total of 415 stations were initially selected as

candidates from Victoria each having a minimum of 10 years of streamflow record.

For in-filling the gaps in the AM flood series, Method 1 was preferred over Method 2. The

following points summarise the results of the in-filling of the AM flood series data in

Victoria.

273 data points from 187 stations were in-filled by comparing flow records (Method

1);

60 data points from 44 stations were in-filled by regression (Method 2);

Regression equations used in gap filling showed high R2 values (range 0.82 – 0.99,

mean = 0.93 and SD = 0.041); and

10% of stations did not have any missing records.

After in-filling the gaps, the stations were then checked for possible trends, as discussed

below.



Trend analysis:

Initially the Mann-Kendall test was applied to the stations. The results were rather surprising as they revealed that some 20% of the stations had a decreasing trend. Given the large number of stations showing a trend, time series plots and mass curves were prepared for these stations to detect visually whether significant changes in slope could be identified. As an example, Figure 4.14 shows a significant overall downward trend for Station 230210, supporting the result from the Mann-Kendall test, and a noticeable decrease in AM flows from the late 1980s. To clarify this further, the CUSUM test was applied; the result was similar, with the plotted graph in Figure 4.15 showing a downward shift in the mean from 1995 onwards.
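The CUSUM statistic Vk plotted for the trend stations can be sketched as below. This follows the usual description of the distribution-free CUSUM test, in which Vk is the running sum of the signs of deviations from the median; the critical value of 1.36 sqrt(n) for the 5% level is an assumption taken from common descriptions of the test and is not stated in the thesis. The example series is hypothetical.

```python
import math
from statistics import median


def cusum_statistic(series):
    """Return the list of Vk values: running sum of sign(x_i - median)."""
    med = median(series)
    v_values, total = [], 0
    for x in series:
        total += (x > med) - (x < med)
        v_values.append(total)
    return v_values


def cusum_test(series, coefficient=1.36):
    """Return (max |Vk|, critical value, shift detected?)."""
    v_values = cusum_statistic(series)
    max_abs = max(abs(v) for v in v_values)
    critical = coefficient * math.sqrt(len(v_values))
    return max_abs, critical, max_abs > critical


# Hypothetical AM series: 20 high-flow years followed by 20 low-flow years.
am_flows = list(range(100, 80, -1)) + list(range(20, 0, -1))
vk = cusum_statistic(am_flows)
max_vk, critical_vk, shifted = cusum_test(am_flows)
```

In a plot of Vk against year, a step change in the mean appears as a peak at the year of the shift followed by a sustained decline, which is the kind of downward shift the CUSUM plot is used to identify.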

A simple time series plot was made in addition to the trend tests to detect and confirm shifts in data. Because these tests indicate that flood data are not independently and identically distributed from year to year, caution is needed when using short records to estimate long-term risks. The fact that the last 10–15 years of data (after the late 1980s) showed a significant downward trend for many stations (presumably due to the drier climate epoch we have entered) makes the inclusion of stations with short records in regionalisation studies quite questionable.

Finally, 21 stations from Victoria were removed due to the presence of significant trend. The

number of eligible stations remaining after the application of trend tests and the introduction

of a cut off length of 25 years, dropped to 144, which is only 35% of the initially selected 415

stations. This result shows that the effective dataset for RFFA in a given region is likely to be

substantially smaller than the primary data set.

Impact of rating curve error on flood frequency analysis:

In the remaining data set of 144 stations, many had rating ratios (RR) considerably greater

than 1 (RR is defined by Equation 4.1). For any RFFA study, a large number of stations with

reasonably long record lengths are required and hence a trade-off needs to be made between

an extensive data set that includes stations with very large RR values and a smaller data set

with RR values restricted to what could be considered to be a “reasonable upper limit”.

A working method to decide on a cut-off RR value was determined by looking at the average RR value and the maximum RR value for each station. From the histogram of RR values shown in Figure 4.15 it can be seen that 90% of the RR values for all the recorded annual maxima fall between 1 and 20. Thus it was decided that a cut-off RR value of 20 would be reasonable, and that any station having an average RR value greater than 4 and a maximum RR value greater than 20 would be rejected. Rating ratios significantly greater than one could magnify the errors in flood frequency quantile estimates but, on the other hand, rejecting all stations with RR greater than one would reduce the number of stations below the minimum required for meaningful RFFA to be undertaken. Adopting the cut-off values of RR mentioned above reduced the eligible number of stations from 144 to 131.

Figure 4.14 Time series graph showing significant trends after 1995 (Station 230210: annual maximum flow in ML/d against year, 1970 to 2010; annotated "Decrease in flow magnitude")

Figure 4.15 CUSUM test plot showing significant trends after 1995 (Station 230210: Vk against year, 1970 to 2005; annotated "Significant shift downwards")

Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record lengths > 25 years; frequency on a logarithmic axis against RR; annotated "90% of rating ratios between 1 & 20")

Outlier identification results

While the data checking revealed many ‘outliers’ in the flood series, these do not preclude the

use of the remaining flood data in RFFA. The results of the outlier detection procedure for

Victoria are summarised below.

43% of the stations were found to have low outliers. The maximum number of low outliers detected in a data series was 5 and never exceeded 19% of the total number of data points in a series.

Most of the detected low outliers occurred for stations which were located in low

rainfall areas, especially in the western part of Victoria.

31% of low outliers occurred in the years 1982 and 1967. This is not surprising as

there were severe droughts during these two years; the maximum annual flows that

occurred in many rivers in these years were merely base flows, and not due to flood

events.

55% of the stations did not show any outliers. Even the values in drought years (1982

and 1967) were not low enough to be treated as low outliers. The locations of most of

these stations are in the south-eastern part of Victoria.




Only 1 station shows a high outlier.

The detected outliers were treated as censored flows in flood frequency analysis using FLIKE (that is, the information that there was no flood in that year was taken into account).

Final data set from Victoria:

As noted earlier, a total of 415 stations, each with a minimum record length of 10 years, was

initially selected. After in-filling the gaps in the AM flood series, trend analysis, and

introduction of a cut-off record length of 25 years, only 131 stations remained, which

represents about one-third of the initially selected stations. The distribution of streamflow

record lengths of the selected 131 stations is shown in Figure 4.16. The statistics of record

lengths of these 131 stations are summarised below.

Record lengths range from 25 to 52 years, mean 32 years, median 32 years and standard

deviation 5 years;

87% of the stations have record lengths in the range 25-35 years;

8% of the stations have record lengths in the range 35-45 years; and

5% of the stations have record lengths in the range 50-55 years.

The catchment areas of the selected 131 catchments range from 3 to 997 km2 (mean: 321 km2 and median: 289 km2). The distribution of catchment areas is shown in Figure 4.17. The statistics of catchment areas of the selected 131 catchments are summarised below:

15 catchments (11%) are in the range of 3 to 50 km2;

11 catchments (8%) are in the range of 51 to 100 km2;

78 catchments (60%) are in the range of 101 to 499 km2; and

27 catchments (21%) are in the range of 500 to 997 km2.

The geographical distribution of the finally selected 131 stations is shown in Figure 4.18.

There is no station in north-western Victoria that passed the selection criteria. This region is

characterized by very low runoff and ephemeral streams.



4.5 Flood frequency analysis

For each of the selected stations, at-site flood frequency analysis was carried out using the ARR FLIKE software (Kuczera, 1999). The detected low flows were censored using the in-built facility in FLIKE. An LP3 distribution with the Bayesian fitting method was adopted to estimate flood quantiles for ARIs of 2, 5, 10, 20, 50 and 100 years. These flood quantiles were used as dependent/target variables in the RFFA adopted in this thesis.
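To illustrate what an LP3 quantile estimate involves, the sketch below fits an LP3 distribution by the method of moments on log10 flows and uses the Wilson-Hilferty approximation for the Pearson type 3 frequency factor. This is a deliberately simplified stand-in: it is not the Bayesian fitting procedure implemented in FLIKE, it does not handle censored low flows, and it assumes an annual exceedance probability of 1/ARI. All function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_skew(values):
    """Sample skewness coefficient with the usual n/((n-1)(n-2)) factor."""
    n, m, s = len(values), mean(values), stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def frequency_factor(skew, exceedance_prob):
    """Pearson type 3 frequency factor via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1.0 - exceedance_prob)
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the log-normal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantiles(am_flows, aris=(2, 5, 10, 20, 50, 100)):
    """Method-of-moments LP3 quantiles (flow units) for the given ARIs."""
    logs = [math.log10(q) for q in am_flows]
    m, s, g = mean(logs), stdev(logs), sample_skew(logs)
    return {ari: 10 ** (m + frequency_factor(g, 1.0 / ari) * s) for ari in aris}


# Hypothetical AM series (m3/s) for a single station.
quantiles = lp3_quantiles([35.0, 60.0, 82.0, 110.0, 140.0, 190.0, 260.0,
                           340.0, 480.0, 700.0, 95.0, 150.0, 210.0, 55.0, 400.0])
```

The six quantiles returned per station correspond to the dependent variables of the RFFA models, although the values used in the thesis come from the Bayesian fit rather than from this approximation.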

23

78

20

35

2

0

10

20

30

40

50

60

70

80

90

25 - 29 30 - 34 35 - 39 40 - 44 45 - 50 51 - 55


Fre

qu

en

cy

Figure 4.16 Distributions of streamflow record lengths of the selected 131 stations from Victoria

6

20

24

18

23

13

6

10

45

2

0

5

10

15

20

25

30

0 - 25 26 -

100

101 -

200

201 -

300

301 -

400

401 -

500

501 -

600

601 -

700

701 -

800

801 -

900

901 -

1000


Fre

qu

en

cy

Figure 4.17 Distributions of catchment areas of the selected 131 catchments from Victoria



Figure 4.18 Geographical distributions of the selected 131 catchments from Victoria

4.6 Summary of catchment characteristics data

For each of the selected catchments, five catchment characteristics data were obtained

following the procedures mentioned in section 4.5.2. Figure 4.19 shows the selected

catchments from NSW, ACT, VIC, QLD and TAS. The catchments from NSW and ACT will

be considered and discussed in this thesis as NSW.

Figure 4.19 Locations of the study catchments



The summary statistics of the catchment characteristics data set of the selected catchments are

provided in Table 4.1.

Table 4.1 Summary statistics of the catchment characteristics data

Variables | Range | Median | Mean | Standard deviation

Catchment area (A), km2 | 1.3 to 1900 | 255.5 | 329.4 | 277.3

Mean annual areal evapo-transpiration (E), mm/y | 410.1 to 1543.3 | 998.5 | 977.8 | 188.9

Mean annual rainfall (R), mm | 416 to 4348 | 1005.6 | 1185.8 | 603.5

Main stream slope (S), m/km | 0 to 197.7 | 7.7 | 11.3 | 16.8

Design rainfall intensity - 2 years ARI and time of concentration of tc hours (I_tc_2), mm/h | 2.9 to 43.1 | 8.9 | 10.9 | 6.1

Design rainfall intensity - 50 years ARI and time of concentration of tc hours (I_tc_50), mm/h | 5.4 to 757 | 17.7 | 23.4 | 36.7

Design rainfall intensity - 100 years ARI and time of concentration of tc hours (I_tc_100), mm/h | 6.0 to 91 | 20.1 | 24.5 | 14.0

4.7 Summary

A total of 452 catchments have been selected from eastern Australia as the study catchments for this thesis. Among them, 96, 131, 172 and 53 catchments have been selected from the states of NSW, VIC, QLD and TAS, respectively. The locations of these catchments are shown in Figure 4.19. The streamflow data have been prepared for these catchments. At-site flood quantiles have been estimated using the ARR FLIKE software for ARIs of 2, 5, 10, 20, 50 and 100 years, using an LP3 distribution with the Bayesian fitting method. For each of the selected catchments, data on five catchment characteristics have been extracted. These data are applied in the following chapters to develop and test artificial intelligence based RFFA techniques.



CHAPTER 5

SELECTION OF PREDICTOR VARIABLES FOR

ARTIFICIAL INTELLIGENCE BASED RFFA

MODELS

5.1 General

The focus of this thesis is to develop regional prediction models for design flood estimation using various artificial intelligence based techniques, namely artificial neural networks (ANN), adaptive neuro-fuzzy inference system (ANFIS), genetic algorithm (GA) and gene expression programming (GEP). In Chapter 4, five candidate predictor variables were selected for RFFA. This chapter focuses on the selection of the final set of predictor variables from these candidates for use in developing the artificial intelligence based RFFA models. In this chapter, predictor variables are selected based on the ANN and GEP based RFFA modelling, and it is assumed that the same set of predictor variables will be applicable to the GA and ANFIS based RFFA models.

5.2 Initial selection of predictor variables for artificial intelligence

based RFFA models

The variables adopted by similar previous RFFA studies were first examined (see Table 5.1). It was found that all the mentioned previous studies adopted catchment area and mean annual rainfall as predictor variables and hence these were included as candidate predictor variables in this thesis. Design rainfall intensity and evaporation were adopted by three previous Australian studies, and hence these were included in this study. Main stream slope was adopted by all but one study and hence it was included in this study. To use the design rainfall intensity, one needs a rainfall duration and an average recurrence interval (ARI); in this study, 6 different combinations of durations and ARIs were adopted. Hence, this study included a total of 10 predictor variables, six of which represent design rainfall intensity for different durations and ARIs. The correlations of these 10 variables are plotted in Figure 5.1, which shows that the 6 rainfall intensities are highly correlated. This indicates that the use of only one design rainfall intensity is desirable in the final prediction equation, since highly correlated variables add little extra information to the model. At the first stage of model development, different models based on various combinations of the initially selected predictor variables (A, I_tc_ARI, R, S, and E) were formed. The candidate models are shown in Table 5.2.

Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies

Authors | Country | Predictor variables adopted

Flavell (2012) | Australia | Catchment area, mean annual rainfall, mainstream slope, main-channel length, and 12 and 24 hours statistical rainfall totals.

Griffis and Stedinger (2007) | USA | Catchment area, mean annual rainfall, runoff measured, mainstream slope, main-channel length, forest cover, and storage measured as the percent of catchment area.

Haddad and Rahman (2012) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, stream density, mainstream slope, stream length, and forest cover.

Muttiah et al. (1997) | USA | Catchment areas, mean annual rainfall, and mean basin elevation.

Rahman (2005) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual rain days, mean annual Class A pan evaporation, mainstream slope, river bed elevation at the gauging station, maximum elevation difference in the basin, stream density, forest cover, and fraction quaternary sediment area.

Shu and Ouarda (2008) | Canada | Catchment area, mean annual rainfall, mainstream slope, fraction of the basin area covered with lakes and annual mean degree-days.

Riad et al. (2004) | Morocco | Catchment area and mean annual rainfall.

For the five predictor variables, there could be 31 different models (every non-empty subset of the five variables, 2^5 - 1 = 31). However, not all of these models are necessarily useful, since some combinations of variables would only result in weaker RFFA models. For example, catchment area has been found to be the most important predictor variable in almost all the previous RFFA studies, as shown in Table 5.1. The second most important predictor variable has been reported to be design rainfall intensity (e.g. Javelle et al., 2002; Jingyi and Hall, 2004). Hence, the combination of these two predictor variables is likely to result in a more significant prediction equation than that delivered by any other two variables. In fact, previous Australian RFFA studies have found that these two predictor variables generate the best RFFA prediction equation (e.g. Haddad and Rahman, 2012; Haddad et al., 2014).



Figure 5.1 Plot representing bi-variate correlations of the candidate predictor variables (matrix plot of evap, rain, area, slope, I_tc_2, I_tc_5, I_tc_10, I_tc_20, I_tc_50 and I_tc_100)

In this study, eight different models are considered as shown in Table 5.2, which contain

catchment area and design rainfall intensity and combinations of the other three predictor

variables. This approach, however, makes an assumption that there is no other combination of

predictor variables (from these five variables) that would deliver a better model than any one

of these eight models. This assumption seems to be justified.
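The model counts quoted above (31 possible models from five predictors, eight candidates once catchment area and design rainfall intensity are fixed) can be checked with a short enumeration. This is only an illustrative sketch: the variable symbols follow Table 5.2, the helper names are hypothetical, and the order in which the eight candidates are generated does not necessarily match the Model IDs in Table 5.2.

```python
from itertools import combinations

# Candidate predictors, named after the symbols used in Table 5.2.
PREDICTORS = ["A", "I_tc_ARI", "S", "E", "R"]
CORE = ["A", "I_tc_ARI"]      # retained in every candidate model
OPTIONAL = ["S", "E", "R"]    # added in every possible combination


def non_empty_subsets(items):
    """Return every non-empty subset of items as a tuple."""
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]


def candidate_models(core, optional):
    """Return the core set extended by each subset (including none) of optional."""
    extras = [()] + non_empty_subsets(optional)
    return [tuple(core) + e for e in extras]


all_models = non_empty_subsets(PREDICTORS)    # 2**5 - 1 combinations
shortlist = candidate_models(CORE, OPTIONAL)  # 2**3 combinations
print(len(all_models), len(shortlist))
```

Fixing the two core variables reduces the search from 2^5 - 1 to 2^3 models, which is the simplification the paragraph above relies on.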

ANN and GEP based RFFA models were developed for each of the eight combinations of

predictor variables based on 362 training/model catchments. The details of the training of the

ANN and GEP based RFFA models are presented in Chapter 7. The developed models

were then tested using 90 validation/test catchments. A prediction equation was developed for

each of the 2, 5, 10, 20, 50 and 100 year ARI flood quantiles. The set of predictor variables

giving the best results for the 90 independent test catchments was finally selected.

Table 5.2 Various candidate models and catchment characteristics used

Model ID Variables

1 A, I_tc_ARI

2 A, I_tc_ARI, S

3 A, I_tc_ARI, E

4 A, I_tc_ARI, R

5 A, I_tc_ARI, S, E

6 A, I_tc_ARI, R, E

7 A, I_tc_ARI, R, S

8 A, I_tc_ARI, R, S, E

Description of variables (details in Section 5.2):

A: catchment area

I_tc_ARI: design rainfall intensity

S: slope

E: evapo-transpiration

R: mean annual rainfall

The following statistical measures were used to compare various RFFA models:

Ratio between predicted and observed flood quantiles:

$$\text{Ratio of predicted and observed flood quantile} = \frac{Q_{pred}}{Q_{obs}} \qquad (5.1)$$

Relative error (RE):

$$RE\ (\%) = \mathrm{Abs}\left[\frac{Q_{pred} - Q_{obs}}{Q_{obs}} \times 100\right] \qquad (5.2)$$

Coefficient of efficiency (CE):

$$CE = 1 - \frac{\sum_{i=1}^{n} \left(Q_{obs} - Q_{pred}\right)^{2}}{\sum_{i=1}^{n} \left(Q_{obs} - \bar{Q}\right)^{2}} \qquad (5.3)$$

where Qpred is the flood quantile estimate from the ANN based or GEP based RFFA model,

Qobs is the at-site flood frequency estimate obtained from the LP3 distribution using a Bayesian

parameter fitting procedure (Kuczera, 1999) and Q̄ is the mean of Qobs. The median relative

error and median ratio values were used to measure the relative accuracy of a model. A median

Qpred/Qobs ratio closer to 1 indicates a closer match between the observed and predicted values,

and a smaller median relative error is desirable for a model. A CE value closer to 1 is better;

a value greater than 0.5 is regarded as acceptable.
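The three measures can be sketched in a few lines of Python. This is an illustration rather than the code used in the thesis: the function names and the four sample quantiles are hypothetical, and the CE function assumes the usual Nash-Sutcliffe form, with the mean of the observed values in the denominator.

```python
from statistics import mean, median


def ratio(q_pred, q_obs):
    """Equation 5.1: predicted over observed flood quantile."""
    return q_pred / q_obs


def relative_error(q_pred, q_obs):
    """Equation 5.2: absolute relative error in percent."""
    return abs((q_pred - q_obs) / q_obs * 100.0)


def coefficient_of_efficiency(q_pred, q_obs):
    """Equation 5.3, assumed here to take the Nash-Sutcliffe form."""
    q_bar = mean(q_obs)
    sse = sum((o - p) ** 2 for p, o in zip(q_pred, q_obs))
    spread = sum((o - q_bar) ** 2 for o in q_obs)
    return 1.0 - sse / spread


# Hypothetical quantiles (m^3/s) for four test catchments; not thesis data.
q_obs = [100.0, 200.0, 300.0, 400.0]
q_pred = [110.0, 180.0, 330.0, 400.0]

median_ratio = median(ratio(p, o) for p, o in zip(q_pred, q_obs))
median_re = median(relative_error(p, o) for p, o in zip(q_pred, q_obs))
ce = coefficient_of_efficiency(q_pred, q_obs)
print(round(median_ratio, 3), round(median_re, 2), round(ce, 3))
```

The ratio and RE are computed per catchment and then summarised by their median over the test catchments, whereas CE is a single value per quantile computed over all test catchments at once.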



5.3 Selection of predictor variables for ANN based RFFA models

In the first stage, various ANN based RFFA models were compared based on median

Qpred/Qobs ratio, RE and CE values. Table 5.3 shows the median Qpred/Qobs ratio, RE and CE

values for various ANN based RFFA models. The median Qpred/Qobs ratio values for the

different models range from 0.94 (Model 3 and Model 8)

to 1.69 (Model 6), with the best median Qpred/Qobs ratio value of 1.01 (Model 4) and a good but

slightly under-predicted value of 0.99 (Model 1). Models 2 to 8 produce some

very good median Qpred/Qobs ratio values, but for some ARIs they show notable variation; e.g.,

Model 3 produces a median Qpred/Qobs ratio value of 1.02 for Q5 but 1.31 and 1.41 for Q10 and

Q50, respectively (Table 5.3). Similarly, Model 6 median Qpred/Qobs ratio values range from 1.03 to 1.69.

Model 7 produces an overall median Qpred/Qobs ratio value of 1.17, with 36% over-prediction

for Q2 and 4% under-prediction for Q100. A clear inconsistency can be found in Models 5 to 8,

with overall median Qpred/Qobs ratio values of 1.10 to 1.27. In the case of Model 2, reasonably

good median Qpred/Qobs ratio values can be seen for all the ARIs except Q10, with an

overestimation of 31%, giving an overall median Qpred/Qobs ratio value of 1.11. Model 1,

consisting of A and I_tc_ARI, outperforms the other models, producing an overall median

Qpred/Qobs ratio value of 1.06 and ranging from 0.99 for Q5 to 1.14 for Q50. This model is

ranked as number 1 on the basis of median Qpred/Qobs ratio showing the consistency and good

estimates for all the ARIs.

The RE values for ANN based RFFA models for different ARIs range from 30.65% (Model

2) to 78.77% (Model 6) as shown in Table 5.3. Notably higher values can be seen for

Models 5, 6, 7 and 8, ranging from 38.28% to 78.77%. Models 3 and 4 produce RE values in

the range of 39.35% to 60.08%. However, for higher ARIs these two models are unable to maintain

this consistency, especially for Q50 (Model 3) and Q100 (Model 4) with RE values of 55% and 60%. It can be seen

that Models 1 and 2 outperform the other models with RE values ranging from 30.65% to

50.01% and the overall values of 39.74% to 44.07%. In case of Model 2, a higher RE value

can be seen for smaller ARIs but it produces good result for 20 years ARI. Model 1 dominates

Model 2 in terms of consistency and competitive RE values for all the ARIs. Hence, Model 1

is regarded as the top model in terms of RE value.

Furthermore, when comparing different models for CE values, it can be found that Models 1,

2, 3 and 4 outperform the remaining four models. A poor performance can be seen in case of

Models 5, 6, 7 and 8 both for smaller and higher ARIs. Models 3 and 4 perform closely except

for Q10 where CE value is 0.72 for Model 3 as compared to 0.56 for Model 4. Overall, Models



1 and 2 are found to perform well, each with an average CE value of 0.66. However, Model 1 exhibits

more consistency and better CE values for different ARIs when compared with closely

performing Model 2 as shown in Table 5.3. On the basis of results shown in Table 5.3, Model

1 (two variables) can be ranked as top model followed by Model 2 (three variables).

Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent

test catchments

Models

Quantiles Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8

CE

Q2 0.73 0.68 0.78 0.76 0.37 0.21 0.40 0.70

Q5 0.61 0.52 0.65 0.65 0.79 0.28 0.71 0.70

Q10 0.63 0.78 0.72 0.56 0.46 0.40 0.39 0.67

Q20 0.71 0.72 0.68 0.68 0.57 0.67 0.27 -0.19

Q50 0.68 0.68 0.59 0.62 -0.33 0.36 0.54 0.14

Q100 0.52 0.57 0.44 0.55 0.37 0.33 0.45 0.37

Average 0.66 0.66 0.64 0.63 0.37 0.38 0.46 0.40

Median 0.64 0.68 0.67 0.64 0.42 0.35 0.43 0.52

Qpred/Qobs (median)

Q2 1.04 1.09 0.94 1.26 1.37 1.19 1.36 1.20

Q5 0.99 1.03 1.02 1.13 1.13 1.09 1.08 1.24

Q10 1.02 1.31 1.31 1.07 1.34 1.41 1.21 1.06

Q20 1.04 1.09 1.06 1.01 1.26 1.07 1.19 0.94

Q50 1.14 1.06 1.41 1.16 1.17 1.69 1.22 1.11

Q100 1.10 1.09 1.14 1.30 1.37 1.03 0.96 1.03

Average 1.06 1.11 1.15 1.16 1.27 1.25 1.17 1.10

Median 1.04 1.09 1.14 1.16 1.27 1.19 1.19 1.10

RE (%) (median)

Q2 37.56 49.93 44.22 46.98 55.75 40.28 61.36 44.05

Q5 40.39 50.01 39.60 44.25 49.56 57.66 38.28 46.78

Q10 44.63 43.98 55.26 39.35 49.87 55.01 55.20 44.68

Q20 35.62 30.65 49.42 40.69 47.48 51.66 46.90 52.95

Q50 39.09 44.00 55.01 41.10 69.61 78.77 46.66 66.80

Q100 44.53 44.13 51.18 60.08 55.75 53.11 53.72 49.20

Median 39.74 44.07 50.30 42.68 52.81 54.06 50.31 47.99

Average 40.30 43.78 49.12 45.41 54.67 56.08 50.35 50.74

In the second stage, the ANN based RFFA models are ranked on the basis of median

Qpred/Qobs ratio values. A criterion is developed to rank the models for different ARIs and the

catchments are rated as ‘good’, ‘reasonable’, ‘bad’ and ‘very bad’ as shown in Tables 5.4 and

5.5. In this stage, two top ranked models found in the first stage (i.e. Models 1 and 2) are

selected for comparison.



From Table 5.5, Model 1 places more stations in the ‘good’ grouping than Model 2 for Q2 and Q5,

the two models are equal for Q50, and Model 2 is slightly ahead for Q10, Q20 and Q100. On the

other hand, Model 1 generally shows a higher number of stations in the ‘reasonable’ grouping and a

lower number of stations in the ‘bad’ and ‘very bad’ groupings. Thus it can be concluded that Model 1

outperforms Model 2 overall when catchments are rated on the basis of median Qpred/Qobs ratio.

Tables 5.6 and 5.7 compare the two best performing models, Model 1 and Model 2. As shown in

Table 5.6, Model 1 provides a median Qpred/Qobs ratio value closer to 1 than Model 2 except for

Q50 and Q100. Similarly, as shown in Table 5.7, Model 1 shows much smaller values of median RE

for Q2, Q5 and Q50, similar median RE values for Q10 and Q100 and a higher median RE value

for Q20. These results demonstrate that, overall, Model 1 outperforms Model 2 for the ANN based

RFFA models.

Table 5.4 Rating on the basis of median Qpred/Qobs ratio

Group Ratios (Median)

Very bad less than 0.25 and above 4

Bad 0.26-0.49 and 2-4

Reasonable 0.5-0.69 and 1.41-2

Good 0.7-1.4
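The rating bands of Table 5.4 can be expressed as a small classifier. The sketch below is illustrative only: the function names and sample ratios are hypothetical, and because the printed bands leave small gaps (for example between 0.25 and 0.26) and share the edge value 2, it assumes that a ratio falling in a gap takes the poorer neighbouring rating and that a shared edge takes the better one.

```python
from collections import Counter


def rate_ratio(ratio):
    """Rate a Qpred/Qobs ratio using the bands of Table 5.4.

    Assumption: ratios in the gaps between printed bands get the poorer
    neighbouring rating; the shared edge value 2.0 gets the better one.
    """
    if 0.7 <= ratio <= 1.4:
        return "good"
    if 0.5 <= ratio <= 2.0:
        return "reasonable"
    if 0.26 <= ratio <= 4.0:
        return "bad"
    return "very bad"


def count_ratings(ratios):
    """Count how many ratios fall in each rating group."""
    counts = Counter(rate_ratio(r) for r in ratios)
    return {k: counts.get(k, 0) for k in ("very bad", "bad", "reasonable", "good")}


# Hypothetical ratios for nine stations; not thesis data.
sample = [0.2, 0.3, 0.6, 0.9, 1.0, 1.3, 1.5, 2.5, 4.5]
print(count_ratings(sample))
```

Applying `count_ratings` to the station ratios of one model and one quantile would give one row of a table laid out like Table 5.5.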

Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of

Table 5.4 (ANN based RFFA models)

Model 1 Model 2

No. of stations No. of stations

Quantile Very bad Bad Reasonable Good Very bad Bad Reasonable Good

Q2 6 18 27 39 7 25 27 31

Q5 6 20 24 40 10 24 23 33

Q10 5 21 31 33 5 19 30 36

Q20 5 20 24 41 9 20 15 46

Q50 14 19 19 38 11 21 20 38

Q100 8 23 27 32 11 23 23 33

Overall (%) 9.7 26.8 33.6 49.3 11.7 29.2 30.5 48.0



Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio

value using 90 independent test catchments (ANN based RFFA models)

Quantiles Median Qpred/Qobs ratio

Model 1 Model 2

Q2 1.04 1.09

Q5 0.99 1.03

Q10 1.02 1.31

Q20 1.04 1.09

Q50 1.14 1.06

Q100 1.10 1.09

Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error

(RE) values using 90 independent test catchments (ANN based RFFA models)

Quantiles RE (median) (%)

Model 1 Model 2

Q2 37.56 49.93

Q5 40.39 50.01

Q10 44.63 43.98

Q20 35.62 30.65

Q50 39.09 44.01

Q100 44.53 44.13

5.4 Selection of predictor variables based on GEP models

In the first stage, various GEP based RFFA models are compared based on median Qpred/Qobs

ratio, RE and CE values. Table 5.8 shows the median Qpred/Qobs ratio, RE and CE values for

various GEP based RFFA models. The median Qpred/Qobs ratio values range from -0.62 (Model

6) to 2.07 (Model 8), with the best median Qpred/Qobs ratio value of 1.02 (for Model 1 and

Model 7). Other models produce some very good median Qpred/Qobs ratio values, but for some

ARIs they show notable variation; e.g., Model 4 produces a median Qpred/Qobs ratio value of 0.99

for Q5 but 1.49 and 1.42 for Q20 and Q10, respectively. Similarly, Model 8 median Qpred/Qobs

ratio values range from 0.02 to 2.07. Model 3 produces an overall median Qpred/Qobs ratio value

of 0.97, with 57% over-prediction for Q50 and 89% under-prediction for Q100. A clear

inconsistency can be found in Models 2 to 8, with overall median Qpred/Qobs ratio values of 0.71

to 1.22. In the case of Model 2, reasonably good values can be seen for most of the ARIs except

for Q20, with an overestimation of 54%, and Q100, with a poor median Qpred/Qobs ratio value

of 0.22. Model 1, consisting of variables A and I_tc_ARI,

outperforms the other models, producing an overall median Qpred/Qobs ratio value of 1.02 with a

range from 1.02 for Q20 and Q100 to 1.10 for Q5. Hence Model 1 can be ranked as number 1

on the basis of median Qpred/Qobs ratio.

Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent

test catchments

Models

Quantiles Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8

CE

Q2 0.49 0.63 0.59 0.65 0.59 0.62 0.73 0.55

Q5 0.67 0.64 0.67 0.68 0.27 0.56 0.69 0.46

Q10 0.56 0.39 -9.59 -5.10 -14.09 0.25 0.46 -0.77

Q20 0.67 -2.86 0.52 0.56 0.62 0.63 0.61 0.58

Q50 0.63 0.33 -0.87 0.54 0.49 -17.74 0.10 0.10

Q100 0.67 -0.01 -0.28 -0.29 -0.44 -0.20 -0.20 -0.27

Average 0.61 -0.15 -1.49 -0.49 -2.09 -2.65 0.40 0.11

Median 0.65 0.36 0.12 0.55 0.38 0.41 0.53 0.28

Qpred/Qobs (median)

Q2 1.07 1.30 0.94 1.13 0.93 1.07 0.98 1.21

Q5 1.10 0.98 0.99 0.99 1.24 1.05 1.02 1.50

Q10 1.04 1.13 1.08 1.42 0.94 1.29 1.35 1.18

Q20 1.02 1.54 1.13 1.49 1.26 1.20 1.32 1.33

Q50 1.05 1.25 1.57 1.18 1.20 -0.62 1.97 2.07

Q100 1.02 0.22 0.11 0.26 -0.06 0.27 0.10 0.02

Average 1.02 1.07 0.97 1.08 0.92 0.71 1.12 1.22

Median 1.03 1.19 1.03 1.16 1.07 1.06 1.17 1.27

RE (%) (median)

Q2 45.87 50.28 81.35 70.28 76.48 66.28 43.78 70.09

Q5 44.95 56.16 57.29 46.01 85.03 46.63 49.06 58.08

Q10 42.08 64.72 43.42 55.57 56.01 91.40 56.11 45.78

Q20 41.53 93.67 46.93 51.31 43.72 43.09 47.08 51.56

Q50 37.87 61.25 70.60 50.44 61.30 218.36 96.53 107.00

Q100 44.47 82.18 88.77 78.55 107.73 76.78 90.15 98.19

Median 43.27 62.98 63.94 53.44 68.89 71.53 52.59 64.09

Average 42.97 68.04 64.73 58.69 71.71 90.42 63.79 71.78

The RE values for various GEP based RFFA models for different ARIs range from 37.87%

(Model 1) to 218.36% (Model 6), as can be seen in Table 5.8. Notably higher RE values can

be seen for Models 5, 6 and 8, ranging from 43% to 218%. Models 4 and 7 produce RE values

in the range of 43% to 96%. Despite comparatively higher RE values, a degree of consistency can be

found for these two models. However, for higher ARIs these two models are unable to maintain this

consistency, especially for Q50 and Q100, where Model 7 gives RE values of 96% and 90%. Models 1, 2 and 3

outperform the other models with RE values ranging from 37% to 94% and overall (median) values

of 43% to 64%. In the case of Model 2, higher RE values can be seen for medium to higher

ARIs, but it produces comparatively good results for Q2 and Q5. Overall, Model 1 dominates Model 2 in

terms of consistency and competitive RE values for all the ARIs and hence Model 1 is ranked

number 1.

Furthermore, when comparing different models with respect to CE values, it is found that

Models 1, 2 and 7 outperform the remaining five models. A poor performance can be seen in

case of these five models. Overall, a good performance can be seen in case of small to

medium ARIs for Models 2 and 7; however, they perform poorly in case of higher ARIs.

Overall, Model 1 is found to perform well with an average CE value of 0.61. Moreover,

Model 1 exhibits more consistency and better CE values for different quantiles when

compared with closely performing Models 2 and 7, as shown in Table 5.8.

Hence, for GEP based RFFA models, Model 1 with two predictor variables (A, I_tc_ARI)

outperforms other models with respect to median Qpred/Qobs ratio, RE and CE values as

demonstrated in Table 5.8.

At the second stage, the GEP based RFFA models are ranked on the basis of median

Qpred/Qobs ratio values. As for the ANN based RFFA models, the criteria of Table 5.4

are used to rate the catchments as ‘good’,

‘reasonable’, ‘bad’ and ‘very bad’ for the different ARIs, as shown in Table 5.9. In this second stage, the two top ranked

models from stage 1 (Model 1 and Model 2) are selected for comparison. From Table

5.9, Model 1 outperforms Model 2 in terms of the ‘good’ grouping except for smaller ARIs.

Also, Model 1 shows an overall higher number of stations in the ‘reasonable’ grouping and a lower overall number of

stations in the ‘bad’ and ‘very bad’ groupings. Hence, it can be concluded that Model 1

outperforms Model 2 when catchments are rated on the basis of median Qpred/Qobs ratio.

Tables 5.10 and 5.11 show the comparison between the best performing Models 1 and 2.

As shown in Table 5.10, Model 1 provides a median Qpred/Qobs ratio value closer to 1

than Model 2 except for Q5. Similarly, as shown in Table 5.11, Model 1 shows much smaller values

of median RE for all the ARIs.

On the basis of results shown in Table 5.8, Model 1 (A, I_tc_ARI) can be ranked as top model

followed by Model 2 (A, I_tc_ARI, S) for the GEP based RFFA models.



Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the

criteria of Table 5.4 for GEP based RFFA models

Model 1 Model 2

No. of stations No. of stations

Quantile Very bad Bad Reasonable Good Very bad Bad Reasonable Good

Q2 20 24 17 29 7 28 23 32

Q5 14 21 23 32 15 25 18 32

Q10 13 21 24 32 15 25 21 29

Q20 24 32 19 15 29 21 18 16

Q50 18 23 21 27 13 23 23 31

Q100 17 21 28 24 31 26 14 19

Overall (%) 23.5 31.4 29.2 35.2 24.3 32.7 25.9 35.2

Table 5.10 Comparison of Models 1 and 2 on the basis of median Qpred/Qobs ratio values

using 90 independent test catchments (for GEP based RFFA models)

Quantiles Median Qpred/Qobs ratio

Model 1 Model 2

Q2 1.07 1.30

Q5 1.10 0.98

Q10 1.04 1.13

Q20 1.02 1.54

Q50 1.05 1.25

Q100 1.02 0.22

Table 5.11 Comparison of Models 1 and 2 on the basis of RE values using 90

independent test catchments (for GEP based RFFA models)

Quantiles RE (median) (%)

Model 1 Model 2

Q2 45.87 50.28

Q5 44.95 56.16

Q10 42.08 64.72

Q20 41.53 93.67

Q50 37.87 61.25

Q100 44.47 82.18



5.5 Summary

This chapter has examined various combinations of predictor variables to select the best set to

be adopted in the RFFA modelling. Two artificial intelligence based modelling techniques

(ANN and GEP) are used to develop the prediction equations using data of the selected 362

catchments. Independent testing is performed using 90 test catchments. Models are assessed

based on ratio between predicted and observed flood quantiles, percent relative error and

coefficient of efficiency. Based on the independent testing, it has been found that the ANN

and GEP based RFFA models with only two predictor variables (catchment area and design

rainfall intensity) outperform other models with a greater number of predictor variables. This

model would be easier to apply in practice as the data for two predictor variables can be

obtained relatively easily from the published maps and government websites. In the

subsequent analyses presented in the next chapters, these two predictor variables

(catchment area and design rainfall intensity) will be used.



CHAPTER 6

SELECTION OF REGIONS

6.1 General

In regional flood frequency analysis (RFFA), one of the key steps is to identify the

acceptable/optimum region(s) which consist(s) of a set of gauged catchments that may be

treated as homogeneous. Previous chapters cover the selection of study area, catchment data

and the predictor variables to be used in the RFFA presented in this study. This chapter

focuses on the formation and comparison of regions based on state, geographic and climatic

boundaries as well as based on the catchment attributes. These regions are tested by

developing RFFA models using artificial neural network (ANN) technique and the best

performing region is then selected (as the optimum region) based on the results of the

comparison of the alternative regions. This optimum region is then used to develop RFFA

models using all the selected artificial intelligence based RFFA methods considered in this

thesis.

6.2 Description of candidate regions

To identify the optimum regions for RFFA modelling in eastern Australia, a number of

candidate regions are formed as discussed below.

Regions based on state and geographic boundaries

Initially, each of the states of Victoria (VIC), New South Wales (NSW), Queensland (QLD)

and Tasmania (TAS) is treated as a separate region. The data for each of these regions are

discussed in detail in section 4.6. These states cover the eastern part of Australia (Figure 4.1).

These candidate regions are shown in Table 6.1.

Regions based on climatic boundaries

The northern part of Australia is dominated by summer rainfall and the southern part is mainly

dominated by winter rainfall. In this step, the data set is divided into two sub-sets, i.e. a summer

dominated rainfall region (SDRR) and a winter dominated rainfall region (WDRR).



Combined data set

Here, the data for all four states are combined to form one region. The details of all the

candidate regions based on state boundaries, geographic and climatic conditions are shown in

Table 6.1.

Table 6.1 Description of candidate regions

Region label Description of region No. of stations Abbreviated region name

1 New South Wales 96 NSW

2 Victoria 131 VIC

3 Queensland 172 QLD

4 Tasmania 53 TAS

5 Combined Data Set 452 Combined

6 Summer Dominated Rainfall Region 203 SDRR

7 Winter Dominated Rainfall Region 249 WDRR

6.2.1 Selection of the best performing region based on state, geographic and

climatic boundaries

In each of these candidate regions, the available data set is divided into two parts: (i) 80% for

training (training data set); and (ii) 20% for testing/validation (validation data set). These sets

are selected randomly from the respective grouping. For each grouping, the ANN-based

RFFA model is built and used to predict 2, 5, 10, 20, 50 and 100 years ARI flood quantiles for

the selected 20% test catchments. The structure, algorithm and other criteria of ANN based

analyses are kept uniform throughout the analysis and are explained in Chapter 3.
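The random 80%/20% partition described above can be sketched as follows. The thesis does not state how the random selection was implemented, so the function name, the fixed seed and the station labels are all assumptions made for illustration.

```python
import random


def split_train_test(station_ids, test_fraction=0.2, seed=0):
    """Randomly split station identifiers into training and test lists."""
    ids = list(station_ids)
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical station labels; 452 matches the combined data set in Table 6.1.
stations = [f"stn_{i:03d}" for i in range(452)]
train, test = split_train_test(stations)
print(len(train), len(test))
```

For the combined data set of 452 stations this yields 362 training and 90 test catchments, the same sizes as used in Chapter 5. The same function could be applied separately to each candidate region so that every region keeps its own 80/20 proportion.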

Three statistical measures i.e. Qpred/Qobs ratio, relative error (RE) and coefficient of efficiency

(CE) (as mentioned in section 5.2) are used to assess the model performance.

Table 6.2 summarises the median Qpred/Qobs ratio values for the seven candidate regions. For

the NSW candidate region, the median Qpred/Qobs ratio for Q10 is very small (0.17), which indicates a

significant under-estimation. Also, for this region, Q50 shows marked over-estimation with

a median Qpred/Qobs ratio of 1.82. For VIC candidate region, all the median Qpred/Qobs ratios

seem to be reasonable with a range of 0.86 to 1.49. For QLD region, both Q50 and Q100 show

an excellent median Qpred/Qobs ratio closer to 1.00 and median Qpred/Qobs ratios are in the range

of 0.98 to 1.48, which appear to be reasonable. For TAS region, Q50 shows notable

overestimation with a median Qpred/Qobs ratio value of 2.46.



For SDRR and WDRR, results are better than the individual states except for Q50 for the

WDRR, which shows a notable overestimation with a median Qpred/Qobs ratio of 2.02. It seems

that when the region size increases, the median Qpred/Qobs ratio values are more consistent

over different ARIs. When all the data sets are combined together, the median Qpred/Qobs ratio

values show remarkable improvement with a range of 0.99 to 1.14, which appears to be

satisfactory. There are smaller differences in the median Qpred/Qobs ratio values across various

ARIs for the combined data set as compared to other regions as illustrated in Figure 6.1.

Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions

Quantiles Candidate regions based on state, geographic and climatic boundaries

NSW VIC QLD TAS SDRR WDRR Combined

Q2 1.38 1.06 1.28 1.08 1.14 1.25 1.04

Q5 0.84 1.13 1.48 1.56 1.21 1.06 0.99

Q10 0.17 0.86 1.11 1.65 1.38 1.26 1.02

Q20 1.53 1.49 1.11 0.74 0.84 1.28 1.04

Q50 1.82 1.17 0.98 2.46 1.32 2.02 1.14

Q100 1.22 1.24 1.00 1.05 1.21 1.33 1.10

In terms of the median absolute relative error values (Table 6.3), for NSW, Q10 and Q50

show very high median relative error values of 91% and 82%, respectively. The best

results for NSW are found for Q2, Q5, Q20 and Q100, with median relative error values close to 50%. For the VIC

region, median relative error values for Q2, Q50 and Q100 are in the range of 67% to 78%, which

appear to be quite high. For the QLD region, median relative error values are in the range of 37%

to 58%, which is reasonably consistent across the various ARIs and the best result among the

individual states. For the TAS region, Q50 has a very high median relative error value (146%); for

the other ARIs the results are quite reasonable. There is a sharp increase and then decrease in

median relative error values from Q20 to Q100, which is unexpected. This indicates that for a very

small data set (the TAS region has only 53 stations) the ANN based RFFA model provides

inconsistent results across the various ARIs.

For SDRR and WDRR, the median relative error values are in the range of 29% to 57% and

43% to 102%, respectively. Here all the median relative error values are in the reasonable

range except for Q50 for WDRR region. When all the data are combined the median relative



error values are consistent across all the ARIs (in the range of 36% to 45%). There are smaller

differences in the median relative error values across various ARIs for the combined data set

as compared to other regions, as illustrated in Figure 6.2. These results clearly show that the

combined data set provides the smallest median relative error values among all the seven

candidate regions, which is also consistent in terms of median Qpred/Qobs ratio values as

discussed before.

Table 6.3 Median relative error values (%) for seven ANN-based candidate regions

Quantile Candidate regions based on state, geographic and climatic boundaries

NSW VIC QLD TAS SDRR WDRR Combined

Q2 48.21 78.05 42.42 65.77 52.40 48.50 37.56

Q5 51.94 40.89 50.24 55.52 29.87 53.03 40.39

Q10 91.52 39.75 37.67 64.61 52.79 43.88 44.63

Q20 53.17 55.58 37.67 38.19 43.12 52.75 35.62

Q50 82.08 73.75 57.90 146.47 57.66 102.13 39.09

Q100 50.00 66.88 58.45 15.28 54.85 67.72 44.53

Overall 62.82 59.15 47.39 64.31 48.45 61.34 40.30

Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions



Figure 6.2 Median relative error (%) values for different ARIs for selected regions

6.3 Regions based on catchment characteristics data

To identify regions/groups of catchments in catchment characteristics data space, two

methods are adopted in this thesis: cluster analysis and principal component analysis. These

methods have been discussed in Chapter 3. In the cluster and principal component analyses,

five catchment characteristics variables (catchment area, design rainfall intensity, mean

annual evapo-transpiration, mean annual rainfall and main stream slope) are adopted.

6.3.1 Cluster analysis

The hierarchical cluster analysis

Hierarchical clustering is one of the most straightforward methods. For this study the

hierarchical clustering is used with a combination of Wards-Block method, as discussed in

Chapter 3.

K-means clustering

In this method all variables are given equal weights. The best results obtained from cluster

analysis are summarised in Table 6.4; each method delivers two groupings: A1 (405 stations) and A2

(45 stations) from Wards-Block clustering, and B1 (362 stations) and B2 (90 stations) from

K-means clustering.

Table 6.4 Regions/groups formation by cluster analysis

Method Total no. of stations Grouping 1 Grouping 2 Out of cluster stations

Wards-Block cluster combination 452 405 (A1) 45 (A2) 2

K-Means cluster 452 362 (B1) 90 (B2) 0
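To make the K-means step concrete, the following is a minimal pure-Python sketch of the standard Lloyd's algorithm with two clusters and equally weighted variables. It is not the statistical package routine used to produce Table 6.4; the function names, the random initialisation and the six two-variable sample points are hypothetical, and the inputs are assumed to be already standardised.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance, giving every variable equal weight."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k=2, seed=0, max_iter=100):
    """Lloyd's algorithm: assign to nearest centre, then update centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)   # initial centres drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # keep the old centre if a cluster empties
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Hypothetical standardised characteristics for six catchments; not thesis data.
points = [(-1.0, -0.9), (-1.1, -1.0), (-0.9, -1.2),
          (1.0, 1.1), (1.2, 0.9), (0.9, 1.0)]
labels, centres = k_means(points)
print(labels)
```

With the five catchment characteristics of Section 6.3 each point would have five coordinates instead of two; the algorithm itself is unchanged.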



Figure 6.3 Dendrogram using average linkage between groups



Figure 6.3 (a) Section of Dendrogram using average linkage between groups



Figure 6.3 (b) Section of Dendrogram using average linkage between groups

In terms of median ratio values, for individual ARIs grouping A1 outperforms the other

groupings (A2, B1, and B2) except for Q20, where A2 performs better than A1 as shown in

Tables 6.5 and 6.6. When comparing the overall Qpred/Qobs ratio values, A1, B1 and B2

perform similarly (with median Qpred/Qobs ratio values 1.1 or 1.2); here, A2 performs quite

poorly with median Qpred/Qobs ratio value of 1.9. In terms of median relative error, grouping

A1 seems to produce consistent and reasonable results. For grouping A2, median relative

error values for Q50 and Q100 are very high (164% and 191%, respectively); comparatively high values can also be

seen for Q50 for grouping B1 and for Q5 and Q10 for grouping B2 in Tables 6.5

and 6.6. Overall, grouping A1 shows the best results among cluster groupings. However, if

both groupings A1 and A2 are compared (generated by Wards-Block cluster analysis method)

against groupings B1 and B2 (generated by K-means cluster analysis method), groupings B1

and B2 perform better than groupings A1 and A2. This shows that K-means cluster analysis

method has generated better groupings than the Wards-Block cluster analysis method.



Table 6.5 ANN based RFFA model performances for cluster groupings A1 & A2

Quantile Grouping A1 (405 stations) Grouping A2 (45 stations)

ARI Qpred/Qobs ratio (Median) RE (Median) (%) Qpred/Qobs ratio (Median) RE (Median) (%)

Q2 1.0 44.6 2.3 132.4

Q5 1.2 45.4 1.5 48.7

Q10 1.1 44.4 1.1 41.6

Q20 1.4 56.0 1.1 41.4

Q50 1.3 54.5 2.6 164.6

Q100 1.3 47.5 2.9 191.3

Overall 1.2 48.7 1.9 103.3

Table 6.6 ANN- based RFFA model performances for cluster groupings B1 & B2

Quantile Grouping B1 (362 stations) Grouping B2 (90 stations)

ARI Qpred/Qobs ratio (Median) RE (Median) (%) Qpred/Qobs ratio (Median) RE (Median) (%)

Q2 0.9 52.6 1.3 55.8

Q5 1.1 57.9 1.0 71.0

Q10 0.9 38.6 1.7 75.0

Q20 0.8 39.1 0.7 41.5

Q50 1.3 61.5 1.1 14.6

Q100 1.4 46.7 1.1 56.1

Overall 1.1 49.4 1.2 52.3

6.3.2 Principal component analysis

At the second stage, principal component analysis (PCA) is undertaken. The eigenvalue

and the percentage of variance explained for each of the five derived principal components are

listed in Table 6.7. The first two components have eigenvalues greater than 1 and account for

about 60% of the total variance. Component 3 has an eigenvalue only slightly below

1 (0.957). Nevertheless, since component one (PC1) and component two (PC2)

account for more than 50% of the variation in the data, PC1 and PC2 may be deemed to

be adequate in capturing the bulk of the information in the data. The plots of PC1 vs PC2 are

shown in Figures 6.5 and 6.6. In Figure 6.5, two groups are formed based on PC1: Group C1

with PC1 ≥ 0 and Group C2 with PC1 < 0. In Figure 6.6, two groups are similarly formed

based on PC2: Group D1 with PC2 ≥ 0 and Group D2 with PC2 < 0. Table 6.10 summarises

the performance of these groupings. Table 6.8 presents the component matrix for PC1 and PC2, and

Table 6.9 presents the descriptive statistics of the standardised variables used in this study.
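The grouping rule described above amounts to splitting the stations by the sign of one component score. The sketch below is illustrative only: the station labels and scores are hypothetical, and it assumes that a score of exactly zero falls in the first group (C1 or D1).

```python
def split_by_sign(scores):
    """Split station labels into (score >= 0, score < 0) groups."""
    non_negative = [s for s, v in scores.items() if v >= 0.0]
    negative = [s for s, v in scores.items() if v < 0.0]
    return non_negative, negative


# Hypothetical (PC1, PC2) scores for five stations; not thesis data.
pc_scores = {
    "stn_a": (0.8, -0.3),
    "stn_b": (-1.2, 0.4),
    "stn_c": (0.0, 1.1),
    "stn_d": (-0.1, -0.7),
    "stn_e": (2.3, 0.2),
}

c1, c2 = split_by_sign({s: pc[0] for s, pc in pc_scores.items()})  # split on PC1
d1, d2 = split_by_sign({s: pc[1] for s, pc in pc_scores.items()})  # split on PC2
print(c1, c2, d1, d2)
```

Each station therefore belongs to exactly one of C1/C2 and exactly one of D1/D2, so the two splits are alternative partitions of the same data set rather than four disjoint groups.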

Table 6.7 Eigenvalues and variance explained by the principal components

Component

Initial eigenvalues

Total % of variance Cumulative %

1 1.758 35.160 35.160

2 1.236 24.718 59.878

3 0.957 19.149 79.027

4 0.774 15.481 94.508

5 0.275 5.492 100.000
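The percentage columns of Table 6.7 follow directly from the eigenvalues, as the short check below illustrates. The eigenvalues are copied from the table; the variable names are hypothetical, and the "eigenvalue greater than 1" retention rule is referred to here by its common name, the Kaiser criterion, which the thesis itself does not use.

```python
# Eigenvalues copied from Table 6.7 (five standardised input variables).
eigenvalues = [1.758, 1.236, 0.957, 0.774, 0.275]

# For a PCA on standardised variables the eigenvalues sum to the number of variables.
total = sum(eigenvalues)
percent = [100.0 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Kaiser criterion: retain components whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]
print([round(p, 2) for p in percent], retained)
```

Small differences in the third decimal place relative to Table 6.7 are expected because the eigenvalues in the table are themselves rounded.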

Table 6.8 Component matrix in principal component analysis

Component

1 2

Zevap -0.042 0.451

ZI_12_2 0.899 -0.209

Zrain 0.906 -0.156

Zarea -0.253 -0.708

Zslope 0.249 0.68



Figure 6.4 Scree plot from principal component analysis

Table 6.9 Descriptive statistics of standardised variables

Mean Standard deviation No. of data points

Zevap 0.0141 1.019 360

ZI_12_2 -0.0413 0.976 360

Zrain -0.025 0.995 360

Zarea 0.017 1.036 360

Zslope 0.025 1.085 360
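The "Z" prefix in Tables 6.8 and 6.9 suggests that each variable was converted to z-scores before the PCA. A minimal sketch of that standardisation is given below; it assumes the usual definition (subtract the mean, divide by the sample standard deviation), which the thesis does not spell out, and the sample catchment areas are hypothetical.

```python
from statistics import mean, stdev


def standardise(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


# Hypothetical catchment areas in km^2; not thesis data.
areas = [12.0, 45.0, 150.0, 320.0, 980.0]
z_area = standardise(areas)
print([round(z, 3) for z in z_area])
```

Standardising puts variables with very different units and ranges, such as area and slope, on a common scale so that no single variable dominates the components.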


Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1



Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2

In each of these accepted candidate groupings, the available data set is divided into 80% for

training and 20% for testing. The same assessment criteria are used as described in Section

5.2.

Table 6.10 shows the results of the performance assessment of the PCA-based groupings.

With respect to median Qpred/Qobs ratio values, grouping D1 outperforms other PCA-based

groupings. With respect to median relative error values, grouping D1 is the best performer.

Overall, the groupings based on PC2 (i.e. groupings D1 and D2) outperform the groupings based

on PC1 (i.e. groupings C1 and C2).

Now, if the best groupings based on cluster analysis (B1 and B2) are compared with

the best PCA-based groupings (D1 and D2) in terms of relative error, they perform

quite similarly, with slightly better performance for the cluster analysis groupings B1 and B2.

Hence, it can be concluded that K-means cluster analysis generates the best performing

groups/regions in the catchment characteristics data space.

In the last step, the best catchment characteristics based groupings from the cluster analysis and PCA

are compared with the candidate regions based on geographic/state boundaries (Section

6.2.1). Tables 6.11 and 6.12 summarise the results for the different candidate regions.



Table 6.10 Grouping based on principal component analysis

Grouping based on PC1 (C1 and C2) and grouping based on PC2 (D1 and D2)

Quantile Grouping C1 Grouping C2 Grouping D1 Grouping D2

ARI Qpred/Qobs ratio (Median) RE (Median) (%) Qpred/Qobs ratio (Median) RE (Median) (%) Qpred/Qobs ratio (Median) RE (Median) (%) Qpred/Qobs ratio (Median) RE (Median) (%)

Q2 1.3 48.1 1.4 55.1 1.5 52.3 1.8 80.7

Q5 1.4 64.0 1.2 62.5 1.4 48.4 1.0 47.8

Q10 1.4 44.8 0.9 51.6 1.1 48.7 1.2 35.4

Q20 1.3 59.7 1.4 54.4 1.2 41.1 1.5 45.9

Q50 1.2 58.3 1.2 53.0 1.1 50.3 1.4 60.7

Q100 0.5 91.5 1.2 44.1 1.5 53.5 0.9 68.5

Overall 1.2 61.1 1.2 53.5 1.3 49.1 1.3 56.5

In terms of median Qpred/Qobs ratio values, the groupings based on both cluster analysis and
PCA outperform the groupings based on individual states, as shown in Table 6.11. However,
grouping A1 performs better than grouping D1 except for Q20 and Q100. In terms of
consistency and the overall median Qpred/Qobs ratio, grouping A1 is also found to perform
well. Finally, grouping A1 is compared with the combined data set. The two perform
almost similarly except for Q2 and Q10, but for the other ARIs the combined data set outperforms
grouping A1. The combined data set also shows overall consistency and a better average value
of the median Qpred/Qobs ratio. Hence, on the basis of the median Qpred/Qobs ratio, it can be
concluded that the combined data set performs better than all the other candidate regions.

Table 6.12 shows the median relative error values for groupings A1 and D1, the individual states and
the combined data set. All the groups based on state boundaries show poor performance
except for QLD, which shows better results for small to medium ARIs. However, this region
shows poor performance overall. When A1 is compared with D1, it is noticed that
the two groups perform similarly overall. A1 performs better for
smaller ARIs while D1 performs better for higher ARIs; overall, the grouping based on cluster
analysis outperforms the grouping based on PCA. Moreover, when the median relative error
values of grouping A1 and the combined data set are compared, the latter is found to
perform better except for Q2, as shown in Figure 6.8.



Hence, on the basis of the median Qpred/Qobs ratio and median relative error values, it can be
concluded that the combined data set performs better than all the other candidate regions and can be
used for final model development.

Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions

Quantiles   Grouping A1          Grouping D1   NSW   VIC   QLD   TAS   Combined
            (cluster analysis)   (PCA)
Q2          1.0                  1.5           1.4   1.1   1.3   1.1   1.4
Q5          1.2                  1.4           0.8   1.1   1.5   1.6   1.1
Q10         1.1                  1.1           0.2   0.9   1.1   1.6   1.2
Q20         1.4                  1.2           1.5   1.5   1.1   0.7   1.1
Q50         1.3                  1.1           1.8   1.2   1.0   2.5   1.1
Q100        1.3                  1.5           1.2   1.2   1.0   1.0   1.1
Overall     1.2                  1.3           1.2   1.2   1.2   1.4   1.1

Table 6.12 Median relative error (%)

Quantiles   Grouping A1          Grouping D1   NSW    VIC    QLD    TAS     Combined
            (cluster analysis)   (PCA)
Q2          44.6                 52.3          48.2   78.1   42.4   65.8    56.2
Q5          45.4                 48.4          51.9   40.9   50.2   55.5    41.4
Q10         44.4                 48.7          91.5   39.8   37.7   64.6    39.1
Q20         56.0                 41.1          53.2   55.6   37.7   38.2    37.2
Q50         54.5                 50.3          82.1   73.7   57.9   146.5   40.0
Q100        47.5                 53.5          50.0   66.9   58.4   15.3    39.6
Overall     48.7                 49.1          62.8   59.2   47.4   64.3    42.3

Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions



Figure 6.8 Median relative error (%) values for different ARIs for candidate regions

Figure 6.9 Comparison of median relative error (%) values between the combined data set and
the grouping based on K-Means cluster analysis

6.4 Summary

This chapter has focused on the application of artificial neural network (ANN) based regional
flood frequency analysis (RFFA) in eastern Australia with a particular focus on the formation
of regions. Regions/groupings are first formed on the basis of state/geographic boundaries and
climatic boundaries. In the second step, the regions are formed in the catchment
characteristics data space based on cluster analysis and principal component analysis. It has
been found that K-Means cluster analysis generates the best performing groups/regions in
the catchment characteristics data space. When compared with the geographic regions, some
state-based groupings perform worse than the K-Means cluster groupings. Overall, the best
ANN based RFFA model is achieved when the data of all 452 catchments are combined,
which gives an RFFA model with median relative error of 37% to 44%. Since all the
stations combined form the best performing region, this combined region will be used in the
subsequent chapters for building the other artificial intelligence based RFFA models.



CHAPTER 7

DEVELOPMENT OF ARTIFICIAL

INTELLIGENCE BASED RFFA MODELS

7.1 General

The previous two chapters have presented the selection of predictor variables and the optimum region
for the development of artificial intelligence based RFFA models for eastern Australia. This
chapter presents the development of RFFA models based on the selected predictor variables
and optimum region using four artificial intelligence based methods: artificial neural networks
(ANN), genetic algorithm based artificial neural networks (GAANN), gene-expression
programming (GEP) and co-active neuro-fuzzy inference system (CANFIS). A description of
these methods has been provided in Chapter 3.

The model development presented in this chapter involves training of a model using a
randomly selected part of the data set. For this purpose, 80% (362 catchments) of the total 452
catchments are used to train the model (training data set) and the remaining 20% (90
catchments) are used to validate the model (validation data set). In the traditional
hydrological model building sense, the
training/calibration of a model involves identification of a set of model parameters that allows
satisfactory transformation of selected model input(s) to model output(s). In the case of
hydrological models, the calibration is generally carried out by a ‘trial and error’ method.
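The random 80/20 division of the 452 catchments described in this section can be sketched as follows. This is a minimal Python illustration only, not the MATLAB code developed in this research; the function name split_catchments, the seed value and the use of integer catchment identifiers are assumptions made for the example.

```python
import random


def split_catchments(catchment_ids, train_fraction=0.8, seed=42):
    """Randomly split catchment identifiers into training and validation sets."""
    ids = list(catchment_ids)
    # A seeded generator keeps the (hypothetical) split reproducible.
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 452 catchments, identified here simply as 0..451 (an assumption).
train_ids, validation_ids = split_catchments(range(452))
print(len(train_ids), len(validation_ids))  # 362 90
```

Rounding 0.8 x 452 = 361.6 to the nearest integer gives the 362 training and 90 validation catchments quoted above.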

In this study, the artificial intelligence based models, which are basically black box type

models, are trained/calibrated using the training data set based on minimisation of the mean

squared error between the observed and predicted flood quantiles by the model (being trained)

for a given ARI for the training data set. The artificial intelligence based RFFA models are

also evaluated based on four criteria: median Qpred/Qobs ratio, plot of Qobs and Qpred, median

relative error (RE) and coefficient of efficiency (CE). This is initially done for the training

data set and then repeated for the validation data set. Models are ranked based on their relative

performances in relation to these criteria to identify the best trained/calibrated model.
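The three numerical criteria can be sketched in Python as below. The formulas are assumptions based on common usage rather than a transcription of the definitions given earlier in the thesis: RE is taken as the absolute relative error in percent, and CE as the Nash-Sutcliffe form (one minus the ratio of the sum of squared errors to the sum of squared deviations of Qobs from its mean). The three-catchment example values are made up for illustration.

```python
from statistics import mean, median


def median_ratio(q_obs, q_pred):
    """Median of the Qpred/Qobs ratio over all catchments."""
    return median(p / o for o, p in zip(q_obs, q_pred))


def median_abs_re(q_obs, q_pred):
    """Median absolute relative error in percent (assumed definition)."""
    return median(abs(p - o) / o * 100.0 for o, p in zip(q_obs, q_pred))


def coefficient_of_efficiency(q_obs, q_pred):
    """Nash-Sutcliffe style coefficient of efficiency (assumed definition)."""
    q_bar = mean(q_obs)
    sse = sum((o - p) ** 2 for o, p in zip(q_obs, q_pred))
    sst = sum((o - q_bar) ** 2 for o in q_obs)
    return 1.0 - sse / sst


# Made-up flood quantiles (m3/s) for three hypothetical catchments.
q_obs = [100.0, 200.0, 400.0]
q_pred = [110.0, 180.0, 500.0]

print(median_ratio(q_obs, q_pred))
print(median_abs_re(q_obs, q_pred))
print(coefficient_of_efficiency(q_obs, q_pred))
```

A ratio close to 1, an RE close to 0 and a CE close to 1 indicate better agreement between the predicted and observed quantiles.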



7.2 Training of artificial intelligence based RFFA models

Initially, each of the four artificial intelligence based RFFA models is trained using
MATLAB codes (developed as a part of this research) by minimising the mean squared error
between the observed and predicted flood quantiles for each of six ARIs (2, 5, 10, 20, 50 and
100 years). This is done using the training data set consisting of 362 catchments as mentioned
in Section 7.1. Table 7.1 and Figure 7.1 show the CE values for the ANN, GAANN, GEP and
CANFIS based RFFA models. Among these four models, the GAANN is found to have the
highest CE values for ARIs of 2, 5, 10 and 20 years. For ARIs of 50 and 100 years, the ANN
has the highest CE values. Considering all the six ARIs, GAANN has the highest CE value
(0.71) and the three other models have similar CE values in the range of 0.66 to 0.67.
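As an illustration of what minimising the mean squared error involves, the following minimal Python sketch trains a one-hidden-layer network by batch gradient descent. It is not the MATLAB code developed in this research: the network size, learning rate, number of epochs, the function name train_tiny_ann and the synthetic standardised predictors standing in for A and Itc_ARI are all assumptions made for the example.

```python
import math
import random


def train_tiny_ann(X, y, hidden=3, epochs=300, lr=0.05, seed=1):
    """Train a one-hidden-layer tanh network by batch gradient descent on MSE.

    Returns the fitted weights and the MSE recorded at the start of each epoch.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        sq_err = 0.0
        for xi, yi in zip(X, y):
            # Forward pass: hidden tanh layer, then a linear output.
            h = [math.tanh(sum(w * x for w, x in zip(W1[j], xi)) + b1[j])
                 for j in range(hidden)]
            pred = sum(W2[j] * h[j] for j in range(hidden)) + b2
            err = pred - yi
            sq_err += err * err
            g = 2.0 * err / n  # derivative of the MSE w.r.t. this prediction
            gb2 += g
            for j in range(hidden):
                gW2[j] += g * h[j]
                gz = g * W2[j] * (1.0 - h[j] ** 2)  # back through tanh
                gb1[j] += gz
                for k in range(d):
                    gW1[j][k] += gz * xi[k]
        history.append(sq_err / n)
        # Gradient descent update of all weights and biases.
        b2 -= lr * gb2
        for j in range(hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for k in range(d):
                W1[j][k] -= lr * gW1[j][k]
    return (W1, b1, W2, b2), history


# Synthetic, standardised stand-ins for the two predictors (not thesis data).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(80)]
y = [0.6 * a + 0.8 * i + rng.gauss(0, 0.1) for a, i in X]
_, mse_history = train_tiny_ann(X, y)
print(round(mse_history[0], 3), round(mse_history[-1], 3))
```

The recorded MSE history plays the same role as a training-state plot: training continues until the error on the training data set stops decreasing appreciably.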

Table 7.1 CE values of four artificial intelligence based RFFA models based on training
data set

ARI (years)   ANN    GAANN   GEP    CANFIS
2             0.59   0.76    0.69   0.64
5             0.73   0.79    0.72   0.67
10            0.64   0.76    0.73   0.75
20            0.71   0.76    0.65   0.73
50            0.70   0.57    0.61   0.53
100           0.64   0.63    0.57   0.62
Overall       0.67   0.71    0.66   0.66

Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training

data set
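The "Overall" row of Table 7.1 appears to be the arithmetic mean of the six ARI-specific CE values; this is an inference from the tabulated numbers rather than a stated definition. The short Python check below reproduces the row under that assumption.

```python
from statistics import mean

# CE values transcribed from Table 7.1 for ARIs of 2, 5, 10, 20, 50 and 100 years.
ce_training = {
    "ANN": [0.59, 0.73, 0.64, 0.71, 0.70, 0.64],
    "GAANN": [0.76, 0.79, 0.76, 0.76, 0.57, 0.63],
    "GEP": [0.69, 0.72, 0.73, 0.65, 0.61, 0.57],
    "CANFIS": [0.64, 0.67, 0.75, 0.73, 0.53, 0.62],
}

# Assumed definition: overall value = mean over the six ARIs, to two decimals.
overall_ce = {model: round(mean(values), 2) for model, values in ce_training.items()}
print(overall_ce)
# {'ANN': 0.67, 'GAANN': 0.71, 'GEP': 0.66, 'CANFIS': 0.66}
```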



Table 7.2 and Figure 7.2 show the median Qpred/Qobs ratio values for the four artificial

intelligence based RFFA models. The ANN based RFFA model shows the best performance

(i.e. Qpred/Qobs ratio value is closest to 1.00) for ARIs of 20, 50 and 100 years. Considering all

the six ARIs, the ANN outperforms the other three models with an overall Qpred/Qobs ratio

value of 1.09. The second best performance is demonstrated by the GEP (1.19), while the

GAANN and CANFIS perform similarly. In terms of consistency over the ARIs, GAANN,

GEP and CANFIS show very high Qpred/Qobs ratio values for some ARIs as can be seen in

Table 7.2. Here again, the ANN shows the best consistency over the ARIs.

Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA

models based on training data set

ARI (years) ANN GAANN GEP CANFIS

2 1.03 1.22 0.99 1.76

5 1.12 1.20 1.08 0.99

10 1.06 1.02 1.08 0.87

20 1.10 1.11 1.17 1.26

50 1.08 1.52 1.45 1.04

100 1.15 1.18 1.39 1.36

Overall 1.09 1.21 1.19 1.21

Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA

models based on training data set



Table 7.3 and Figure 7.3 show the median of the absolute relative error values for the ANN,

GAANN, GEP and CANFIS based RFFA models. It can be seen that ANN based RFFA

model outperforms the other models with a median RE value of 42.07% over all the six ARIs.

In some cases, the GAANN based RFFA model performs better than or equal to the ANN based

model i.e. for ARIs of 2, 5, 20 and 100 years; however, for 50 years ARI it shows a very high

RE (60%). In terms of consistency over the ARIs, ANN outperforms the other three models.

Both GEP and CANFIS have quite high RE values (GEP = 54.02%, CANFIS = 59.46%).

Importantly, CANFIS shows very high RE values for 2 years ARI (94.02%) and 50 years ARI

(71.94%). Overall, in terms of RE value, the ANN is the best performer, followed by the

GAANN, GEP and CANFIS.

Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models
(training)

ARI (years)   ANN     GAANN   GEP     CANFIS
2             43.75   40.92   73.30   94.02
5             39.53   39.31   43.91   43.55
10            39.14   41.01   43.25   45.27
20            40.38   40.29   54.61   46.07
50            43.32   60.00   54.22   71.94
100           46.30   45.28   54.82   55.89
Overall       42.07   44.47   54.02   59.46

Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models

based on training data set



The predicted and observed flood quantiles for the ANN based RFFA model for 20 years
ARI are shown in Figure 7.4 (the plots for the other five ARIs can be seen in Appendix B,
Figures B.1 to B.5). The 20 years ARI is adopted because it is the most frequently
applied ARI in design. These plots generally present a good agreement between the predicted
and observed flood quantiles; however, there is some overestimation by the ANN based
RFFA model when the observed flood quantiles are smaller than about 50 m3/s for all the
ARIs except 50 years. Most of the training catchments are within a narrow range of
variability from the 45-degree line except for a few outliers, in particular for higher
discharges. Overall, the ANN based RFFA model shows better training results for higher
discharges.

Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model

for Q20 (training data set)

Figure 7.5 shows the plot of predicted flood quantiles by the GAANN-based RFFA model

and the observed flood quantiles for 20 years ARI (the plots for the other five ARIs can be

seen in Appendix B, Figures B.6 to B.10). These plots show that GAANN based RFFA model

generally presents a good agreement between the observed and predicted flood quantiles;

however, for ARI of 50 years (Figure B.9) (and to some degree for ARI of 5 years), there is a

notable overestimation by the GAANN based RFFA model. Also, the 100 years ARI (Figure

B.10) shows a notable scatter around the 45-degree line, in particular for small and medium

discharges. Overall, the GAANN based RFFA model shows better training results for higher

discharges.



Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA

model for Q20 (training data set)

Figure 7.6 compares the predicted flood quantiles by the GEP based RFFA model with the
observed flood quantiles for 20 years ARI (Q20) (the plots for the other five ARIs can be seen
in Appendix B, Figures B.11 to B.15). Figure 7.6 generally presents a good agreement
between the predicted and observed flood quantiles. For the 2 and 5 years ARIs (Figures B.11
and B.12, respectively), there are a few outliers and for 50 and 100 years ARIs (Figures B.14
and B.15, respectively), there is noticeable overestimation by the GEP based RFFA model for
small to medium discharges. Overall, the GEP based RFFA model shows better training
results for higher discharges.

Figure 7.7 shows the plot of predicted flood quantiles by the CANFIS based RFFA model and

the observed flood quantiles for 20 years ARI (the plots for other ARIs can be seen in

Appendix B, Figures B.16 to B.20). Figure 7.7 shows an overestimation by the CANFIS
based RFFA model for smaller discharges for 20 years ARI. A very similar pattern can be
seen for ARI of 5 years (Figure B.17) and ARI of 100 years (Figure B.20). For ARI of 2 years
(Figure B.16) and ARI of 10 years (Figure B.18), a number of outliers can be seen plus a

noticeable scatter around the 45-degree line. For 50 years ARI (Figure B.19), the scatter

around the 45-degree line is significant. Overall, the CANFIS based RFFA model shows

better training results for higher discharges for all the ARIs except 50 years.



Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q20 (training data set)

Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q20 (training data set)




7.3 Comparison of training and validation results

7.3.1 ANN

The CE, median Qpred/Qobs ratio and median relative error values are compared in Table 7.4

for the training and validation datasets for the ANN based RFFA model. Figures 7.8, 7.9 and

7.10 compare the CE, median Qpred/Qobs ratio values and median relative error values,

respectively for the ANN based RFFA model. In terms of CE value, the best agreement

between the training and validation data sets is found for ARIs of 10, 20 and 50 years, a

reasonable degree of agreement is found for ARIs of 2 and 5 years and relatively poor

agreement is found for the ARI of 100 years where the CE value for the validation data set is

remarkably small. With respect to median Qpred/Qobs ratio value, the best agreement between

the training and validation data sets is found for 2 years ARI, a moderate agreement is noticed

for 10, 20, 50 and 100 years ARIs and a poor agreement is found for 5 years ARI. However,

for 5 years ARI, the validation data set gives a very good Qpred/Qobs ratio value (0.99). In relation

to the median relative error values, the best agreement between the training and validation

data sets is found for ARIs of 5 and 100 years, a moderate agreement for ARI of 50 years and

poor agreement for ARIs of 2 and 10 years. From these results, it is noted that the ANN based

RFFA model shows different degrees of agreement between the training and validation data

sets for different ARIs across the three criteria adopted here.

Table 7.4 Comparison of training and validation results for the ANN based RFFA model

              Training                                 Validation
ARI (years)   CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                     (median)           (median)              (median)           (median)
2             0.59   1.03               43.75          0.69   1.04               37.56
5             0.73   1.12               39.53          0.59   0.99               40.39
10            0.64   1.06               39.14          0.63   1.02               44.63
20            0.71   1.10               40.38          0.69   1.04               35.62
50            0.70   1.08               43.32          0.68   1.14               39.09
100           0.64   1.15               46.30          0.40   1.10               44.53
Overall       0.67   1.09               42.07          0.61   1.06               40.30
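The degree of agreement discussed for the ANN based RFFA model can be made explicit by tabulating the absolute difference between the training and validation values in Table 7.4 for each ARI. The Python sketch below does this. The thresholds separating best, moderate and poor agreement are not stated in the text, so only the differences themselves are computed; the function and variable names are illustrative.

```python
# Values transcribed from Table 7.4:
# ARI: (CE_train, ratio_train, RE_train, CE_val, ratio_val, RE_val)
ann_results = {
    2: (0.59, 1.03, 43.75, 0.69, 1.04, 37.56),
    5: (0.73, 1.12, 39.53, 0.59, 0.99, 40.39),
    10: (0.64, 1.06, 39.14, 0.63, 1.02, 44.63),
    20: (0.71, 1.10, 40.38, 0.69, 1.04, 35.62),
    50: (0.70, 1.08, 43.32, 0.68, 1.14, 39.09),
    100: (0.64, 1.15, 46.30, 0.40, 1.10, 44.53),
}


def train_validation_gaps(results):
    """Absolute training-validation difference per ARI for CE, ratio and RE."""
    gaps = {}
    for ari, (ce_t, r_t, re_t, ce_v, r_v, re_v) in results.items():
        gaps[ari] = {
            "CE": abs(ce_t - ce_v),
            "ratio": abs(r_t - r_v),
            "RE": abs(re_t - re_v),
        }
    return gaps


gaps = train_validation_gaps(ann_results)
for ari, g in gaps.items():
    print(ari, round(g["CE"], 2), round(g["ratio"], 2), round(g["RE"], 2))

# ARI with the largest gap for each criterion.
worst = {c: max(gaps, key=lambda a: gaps[a][c]) for c in ("CE", "ratio", "RE")}
print(worst)
```

The largest gaps fall at the 100 years ARI for CE, the 5 years ARI for the median ratio and the 2 years ARI for the median RE, which matches the ARIs identified in the text as showing poor agreement.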

Figures 7.11 to 7.13 show some example plots generated during the training of the ANN
based RFFA model. Figure 7.11 shows the regression plot for the ANN based RFFA model
for the training and validation data sets for Q20 (the plots for other ARIs can be seen in
Appendix B, Figures B.41 to B.45). Figure 7.12 shows the training state of the ANN based
RFFA model for Q20 using 20,000 epochs and Figure 7.13 shows the plot for validation of
results for Q20.

Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the

ANN based RFFA model

Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation

data sets for the ANN based RFFA model



Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data

sets for the ANN based RFFA model

Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA

model for Q20



Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20

[Plot: NN Output Vs Actual against Test Catchments; legend: NN output, Actual]

Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data

set

7.3.2 GAANN

In Table 7.5, the CE, median Qpred/Qobs ratio and median relative error values are compared

for the training and validation datasets for the GAANN based RFFA model. Figures 7.14,



7.15 and 7.16 compare the CE, median Qpred/Qobs ratio and median relative error values,

respectively for the GAANN based RFFA model. In terms of CE value, the best agreement

between the training and validation data sets is found for ARIs of 2, 5, 20 and 100 years, a

moderate degree of agreement is found for ARI of 10 years and a relatively poor agreement is

found for the ARI of 50 years (for this, the CE value is 0.38, which is remarkably low). With

respect to median Qpred/Qobs ratio value, the best agreement between the training and

validation data sets is found for ARIs of 10 and 100 years, a moderate agreement is noticed

for ARIs of 2 and 20 years and a poor agreement is found for ARIs of 5 and 50 years. For 50

years ARI, the validation data set shows a good Qpred/Qobs ratio value (0.95) as compared with

a very high value (1.52) for the training data set. This shows that a poor performance during

the training does not always give a poor performance in the validation. In relation to the

median relative error values, the best agreement between the training and validation data sets

is found for ARIs of 50 and 100 years, a moderate agreement for ARI of 20 years and a very

poor agreement for ARIs of 2, 5 and 10 years. In particular, the relative error values for the

validation data set are remarkably high compared with the training data set for ARIs of 2, 5

and 10 years. This shows that a good performance during model training does not guarantee a
similarly good performance during validation.

Table 7.5 Comparison of training and validation results for the GAANN based RFFA
model

              Training                                 Validation
ARI (years)   CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                     (median)           (median)              (median)           (median)
2             0.76   1.22               40.92          0.72   1.08               65.13
5             0.79   1.20               39.31          0.75   0.89               61.48
10            0.76   1.02               41.01          0.63   0.98               72.56
20            0.76   1.11               40.29          0.71   0.93               48.19
50            0.57   1.52               60.00          0.38   0.95               55.93
100           0.63   1.18               45.28          0.65   1.17               47.08
Overall       0.71   1.21               44.47          0.64   1.00               58.40




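Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the
GAANN based RFFA model

Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the GAANN based RFFA model

Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data
sets for the GAANN based RFFA model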

7.3.3 GEP

The CE, median Qpred/Qobs ratio and median relative error values for the GEP based RFFA

model are compared in Table 7.6 for the training and validation datasets. Figures 7.17, 7.18

and 7.19 compare the CE, median Qpred/Qobs ratio and median relative error values,

respectively for the GEP based RFFA model. In terms of CE value, the best agreement
between the training and validation data sets is found for ARIs of 5, 20 and 50 years, a
moderate degree of agreement is found for ARIs of 10 and 100 years and a relatively poor

agreement is found for the ARI of 2 years. The CE value for the validation data set for ARI of

2 years is quite low (0.49). With respect to median Qpred/Qobs ratio value, the best agreement

between the training and validation data sets is found for ARIs of 2, 5 and 10 years and a

moderate agreement is noticed for ARIs of 20, 50 and 100 years. In relation to median relative

error values, the best agreement between the training and validation data sets is found for

ARIs of 2, 5 and 10 years, a moderate agreement for ARIs of 20 and 100 years and a poor

agreement for ARI of 50 years. It should be noted that for 2 years ARI, both the training and

validation data sets exhibit a very high relative error value (73.3% and 69.38%).



Table 7.6 Comparison of training and validation results for the GEP based RFFA model

              Training                                 Validation
ARI (years)   CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                     (median)           (median)              (median)           (median)
2             0.69   0.99               73.30          0.49   1.07               69.38
5             0.72   1.08               43.91          0.67   1.10               44.95
10            0.73   1.08               43.25          0.56   1.04               42.08
20            0.65   1.17               54.61          0.67   0.89               47.61
50            0.61   1.45               54.22          0.63   1.05               37.87
100           0.57   1.39               54.82          0.67   1.02               44.47
Overall       0.66   1.19               54.02          0.61   1.03               44.47


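Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the
GEP based RFFA model

Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the GEP based RFFA model

Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data
sets for the GEP based RFFA model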



7.3.4 CANFIS

The CE, median Qpred/Qobs ratio and median relative error values are compared in Table 7.8

for the training and validation datasets for the CANFIS based RFFA model. Figures 7.20,

7.21 and 7.22 compare the CE, median Qpred/Qobs ratio and median relative error values,

respectively for the CANFIS based RFFA model. In terms of CE value, the best agreement
between the training and validation data sets is found for ARIs of 20, 50 and 100 years, a
reasonable degree of agreement is found for ARIs of 5 and 10 years and a significant

disagreement is found for the ARI of 2 years where the CE value for the validation data set is

-0.09 which is much smaller than 0.64 (the CE value for the training data set). With respect to

median Qpred/Qobs ratio value, the performance for both the training and validation data sets is

found to be in the acceptable range for all the ARIs except for 2 years. For 2 years ARI, the

Qpred/Qobs ratio value is relatively high for both the training data set (1.76) and validation data

set (2.81). Similarly, in relation to median relative error value, the performance of 5 and 10

years ARI is found to be the best; however, the worst performance is observed in the case of 2

years ARI for the training data set. Moreover, the best performance in the case of validation

data set is found for 20 years ARI, followed by 100 years ARI. The 2 years ARI shows a

relatively high median relative error value for the validation data set (180.77%). These results

show that the CANFIS based RFFA model is poorly trained/calibrated for 2 years ARI.

Table 7.8 Comparison of training and validation results for the CANFIS based RFFA
model

              Training                                 Validation
ARI (years)   CE     Qpred/Qobs ratio   RE (%)         CE      Qpred/Qobs ratio   RE (%)
                     (median)           (median)               (median)           (median)
2             0.64   1.76               94.02          -0.09   2.81               180.77
5             0.67   0.99               43.55          0.54    0.95               48.92
10            0.75   0.87               45.27          0.67    0.79               51.97
20            0.73   1.26               46.07          0.72    1.18               34.48
50            0.53   1.04               71.94          0.55    0.93               59.20
100           0.62   1.36               55.89          0.59    1.31               42.63
Overall       0.66   1.21               59.46          0.50    1.33               69.66




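Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the
CANFIS based RFFA model

Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the CANFIS based RFFA model

Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data
sets for the CANFIS based RFFA model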

7.4 Selection of the best performing artificial intelligence based

RFFA model based on training

The training of the four artificial intelligence based RFFA models has been presented in
Section 7.2 using 362 catchments. It has been found that none of the four models performs the
best in all the adopted assessment criteria over the six ARIs, which makes it difficult to select
the best trained/calibrated model. The performances of the four models are therefore assessed
in a heuristic manner, with each model ranked on the four different criteria shown in Table 7.9.
Four different ranks are used, with a relative score ranging from 4 to 1. If a model is ranked 1
for a criterion, it scores 4. For ranks of 2, 3 and 4, scores of 3, 2 and 1, respectively, are assigned.

Table 7.9 shows that the ANN based RFFA model has the highest score of 15, followed by
the GAANN with a score of 12. The GEP receives a score of 10, while the CANFIS receives
only 7, making it the least favourable model in terms of its performance during training. The
ANN based model is placed at rank 1 in 3 out of the 4 criteria. Hence, it is decided that the
ANN based RFFA model is the best performing artificial intelligence based model in terms of
training/calibration of the model.

Table 7.10 shows the ranking of the four artificial intelligence based RFFA models based on
the agreement between the training and validation results using three criteria. Four different
ranks are used with a relative score ranging from 4 to 1 as mentioned earlier. It is found that the ANN
and GEP based RFFA models both score 9, followed by the GAANN and CANFIS.



Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to
training

Criterion                       Rank 1   Rank 2   Rank 3         Rank 4
Scatter plot of Qobs vs Qpred   ANN      GAANN    CANFIS         GEP
Median Qpred/Qobs               ANN      GEP      GAANN/CANFIS   #
Median RE                       ANN      GAANN    GEP            CANFIS
Median CE                       GAANN    ANN      GEP/CANFIS     #

Overall Score: ANN-15, GAANN-12, GEP-10, CANFIS-7

Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to
agreement between training and validation

Criterion: Median Qpred/Qobs
  Rank 1: GEP (Best agreement: Q2, Q5, Q10, Q20; Moderate agreement: Q50, Q100; Poor agreement: none)
  Rank 2: ANN (Best agreement: Q2, Q10, Q100; Moderate agreement: Q20, Q50; Poor agreement: Q5)
  Rank 3: CANFIS ([...] Q10, Q20, Q50, Q100; Moderate agreement: none; Very poor agreement: Q2)
  Rank 4: GAANN ([...] Q10, Q20, Q100; Moderate agreement: Q5; [...] Q50)

Criterion: Median RE (%)
  Rank 1: GEP ([...] Q5, Q10, Q20, Q100; Moderate agreement: Q50; Poor agreement: none)
  Rank 2: ANN (Best agreement: Q5, Q100; Moderate agreement: Q50, Q20; Poor agreement: Q2, Q10)
  Rank 3: GAANN ([...] Q100; Moderate agreement: Q20; [...] Q2, Q5, Q10)
  Rank 4: CANFIS ([...] Q10, Q20, Q50, Q100; Moderate agreement: none; Significantly poor agreement: Q2)

Criterion: Median CE
  Rank 1: GAANN ([...] Q5, Q20, Q100; Moderate agreement: Q10; Poor agreement: Q50)
  Rank 2: ANN (Best agreement: Q10, Q20, Q50; Moderate agreement: Q2, Q5; Poor agreement: Q100)
  Rank 3: CANFIS ([...] Q20, Q50, Q100; Moderate agreement: Q5; Poor agreement: Q2)
  Rank 4: GEP ([...] Q20, Q50; Moderate agreement: Q10, Q100; Poor agreement: Q2)

Overall Score: ANN-9, GEP-9, GAANN-7, CANFIS-5
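The scoring rule used in this section (rank 1 scores 4, down to rank 4 scoring 1) can be checked against Table 7.10 with a few lines of Python. The dictionary below simply transcribes the rank order given in Table 7.10; the function name total_scores is an assumption made for the example, and ties (marked by # in Table 7.9) are not handled.

```python
# Rank order (rank 1 first) transcribed from Table 7.10.
rankings = {
    "Median Qpred/Qobs": ["GEP", "ANN", "CANFIS", "GAANN"],
    "Median RE": ["GEP", "ANN", "GAANN", "CANFIS"],
    "Median CE": ["GAANN", "ANN", "CANFIS", "GEP"],
}


def total_scores(rankings, n_models=4):
    """Sum the scores over all criteria, where rank r scores n_models + 1 - r."""
    scores = {}
    for ordered_models in rankings.values():
        for rank, model in enumerate(ordered_models, start=1):
            scores[model] = scores.get(model, 0) + (n_models + 1 - rank)
    return scores


scores = total_scores(rankings)
print(scores)
# {'GEP': 9, 'ANN': 9, 'CANFIS': 5, 'GAANN': 7}
```

The totals agree with the overall scores listed under Table 7.10.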



Overall, the ANN based RFFA model shows the best training/calibration and the CANFIS based
RFFA model the least favourable one.

7.5 Summary

In this chapter, four artificial intelligence based RFFA models (ANN, GAANN, GEP and

CANFIS) are developed. Some 80% (362 catchments) of the total 452 catchments are used to

train the model (training data set) and the remaining 20% (90 catchments) are used to validate

the model (validation data set). The selected artificial intelligence based models are basically

black box type models, which are trained/calibrated using the training data set, which involves

minimisation of the mean squared error between the observed and predicted flood quantiles

by the model (being trained) for a given ARI for the training data set. The artificial

intelligence based RFFA models are also evaluated based on four criteria: median Qpred/Qobs

ratio, plot of Qobs and Qpred, median relative error (RE) and coefficient of efficiency (CE).

This is initially done for the training data set and then repeated for the validation data set.

Models are ranked based on their relative performances in relation to these criteria to identify

the best trained/calibrated model.

It has been found that there is no model which performs the best for all the six ARIs over all

the adopted criteria. Overall, the ANN based RFFA model outperforms the three other models

(in terms of training/calibration). Hence, the ANN based RFFA model is the best calibrated

model.



CHAPTER 8

VALIDATION OF ARTIFICIAL INTELLIGENCE

BASED RFFA MODELS

8.1 General

Chapter 6 has discussed the formation of regions and selection of the best performing region

for RFFA in eastern Australia using artificial intelligence based methods. Based on the

available data of 452 natural catchments in NSW, VIC, QLD and TAS, it has been found that

the best results in RFFA can be obtained when data from these states are combined to form

one region. Chapter 5 has discussed the selection of the best set of predictor variables for the

RFFA model development. It has been found that two predictor variables i.e., catchment area

(A) and design rainfall intensity (Itc_ARI) deliver the best results in RFFA for eastern Australia.

Chapter 7 has developed/trained the RFFA models based on four artificial intelligence based

methods which are ANN, GAANN, GEP and CANFIS using data from 362 catchments. This

chapter presents the validation of these four RFFA models based on 90 independent test

catchments. The results based on these four models are also compared with QRT based RFFA

model. This chapter initially presents results in relation to each of the above four artificial

intelligence based models followed by an inter-comparison of these methods. Finally, the best

performing artificial intelligence based RFFA model is compared with the QRT based RFFA

model.

8.2 Validation of RFFA models

8.2.1 ANN

Figure 8.1 compares the predicted flood quantiles for the selected 90 test catchments from

the ANN based RFFA model with the observed flood quantiles for 20 years ARI (Q20). The

observed flood quantiles are estimated using an LP3 distribution and Bayesian parameter

estimation procedure as discussed in Chapter 4. It should be noted here that the observed

flood quantiles are not free from error; these are subject to data error (such as rating curve

extrapolation error), sampling error (due to limited record length of annual maximum flood

series data), error due to choice of flood frequency distribution and error due to selection of



parameter estimation method. This error undermines the usefulness of the validation statistics

(e.g. RE); however, this provides an indication of possible error of the developed RFFA

model as far as practical application of the RFFA model is concerned. The ratio Qpred/Qobs and

RE values are used for the assessment of models; however, the CE value is not very useful

here as the mean of observed flood quantile is not known.

Figure 8.1 shows a good agreement overall between the predicted and observed flood

quantiles; however, there is some overestimation by the ANN based RFFA model when the

observed flood quantiles are smaller than about 50 m3/s. Most of the test catchments are

within a narrow range of variability from the 45-degree line except for a few outliers. The

plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B

(Figures B.22 to B.25). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results

for ARIs of 50 and 100 years (Figures B.24 and B.25, respectively) exhibit some

overestimation by the ANN based RFFA model for smaller to medium discharges.

Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model

for Q20

Figure 8.2 shows the boxplot of relative error (RE) values of the selected test catchments for

ANN based RFFA model for different flood quantiles. It can be seen from Figure 8.2 that the

median RE values (represented by the thick black lines within the boxes) are located very



close to the zero RE line (indicated by 0 – 0 horizontal line in Figure 8.2), in particular for

ARIs of 2, 5, 10 and 20 years. However, for ARIs of 50 and 100 years, the median RE values

are located above the zero line with ARI of 50 years showing the highest departure, which

indicates an overestimation by the ANN based RFFA model. Overall, the ANN based RFFA

model produces nearly unbiased estimates of flood quantiles as the median RE values match

with the zero RE line quite closely as can be seen in Figure 8.2.

In terms of the spread of the RE (represented by the width of the box), ARIs of 50 and 100

years present the widest RE bands and ARIs of 2 and 5 years present the narrowest RE bands,

followed by ARIs of 20 and 10 years. The RE bands for ARIs of 50 and 100 years are

almost double the RE bands for ARIs of 2 and 5 years. This implies that the ANN based RFFA model

provides the most accurate flood quantile estimates for ARIs of 2 and 5 years, and the least

accurate flood quantiles for ARIs of 50 and 100 years. Overall, the boxplot in Figure 8.2

shows that better results in terms of RE values are achieved for the smaller ARIs (i.e. 2, 5, 10

and 20 years ARIs) as compared to higher ARIs for the ANN based RFFA model. Some

outliers (evidenced by notable overestimation with a positive RE) can be seen for all the

ARIs, which may need to be examined more closely for data errors or issues regarding the

hydrology and physical characteristics of these catchments; if these catchments are deemed to

be genuine outliers they should be removed to enhance the ANN based RFFA model;

however, this has not been undertaken in this thesis.
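The quantities read from such a boxplot (median, box width and outliers) can be computed directly from the signed RE values. The sketch below is illustrative only: the quartile convention, the 1.5 x IQR outlier rule and the sample RE values are assumptions, since the software settings used to draw the boxplots are not stated here.

```python
from statistics import median, quantiles

def box_summary(re_values):
    """Median, interquartile range (box width) and 1.5*IQR outliers of signed RE (%) values."""
    q1, _, q3 = quantiles(re_values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in re_values if v < lower or v > upper]
    return {"median": median(re_values), "iqr": iqr, "outliers": outliers}

# Hypothetical signed RE (%) values for two ARIs (not the thesis data).
re_by_ari = {
    2: [-20.0, -10.0, 0.0, 10.0, 20.0, 30.0, 250.0],
    50: [-40.0, -20.0, 10.0, 20.0, 40.0, 60.0, 80.0],
}

summary = {ari: box_summary(values) for ari, values in re_by_ari.items()}
```

In this made-up example the 50 years ARI has both a median further above zero and a wider box than the 2 years ARI, which is the pattern described above for the higher ARIs.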

[Boxplot: RE (%), from -300 to 300, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at RE = 0]

Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model



Figure 8.3 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments

for the ANN based RFFA model for different ARIs. The median Qpred/Qobs ratio values

(represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the

horizontal line in Figure 8.3), in particular for ARIs of 2, 5, 10 and 20 years. However, for an

ARI of 50 years (and to a lesser degree for an ARI of 100 years), the median Qpred/Qobs ratio

value is clearly located above the 1 – 1 line. These results indicate that the ANN based RFFA

model generally provides reasonably accurate flood quantiles with the expected Qpred/Qobs

ratio value very close to 1.00, although there is a noticeable overestimation for ARIs of 50

and 100 years. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 2 and 5 years

provide the lowest spread followed by ARIs of 20, 10, 100 and 50 years.

Considering the RE and Qpred/Qobs ratio values as discussed above, it can be concluded that

the ANN based RFFA model generally provides unbiased flood estimates for smaller to medium

ARIs (2 to 20 years); however, the model slightly overestimates the observed flood quantiles

for higher ARIs (50 to 100 years).

[Boxplot: ratio (Qpred/Qobs), from -3 to 3, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at a ratio of 1]

Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model



8.2.2 GAANN

Figures 8.4, 8.5 and 8.6 show the validation results for GAANN based RFFA model. Figure

8.4 shows the plot of predicted flood quantiles by the GAANN based RFFA model and the

observed flood quantiles for 20 years ARI. Figure 8.4 shows a greater scatter than Figure 8.1

(which represents ANN based RFFA model); in particular, there is an underestimation of the

flood quantiles by the GAANN based RFFA model for a few test catchments. Overall, the

scatter around the 45-degree line in Figure 8.4 is deemed reasonable for most of the test

catchments. The plots of predicted and observed flood quantiles for other ARIs can be seen in

Appendix B (Figures B.26 to B.30). The results are very similar for ARIs of 2, 5, 10 and 20

years. Results for ARIs of 50 and 100 years (Figures B.29 and B.30, respectively) exhibit

relatively better results by the GAANN based RFFA model, in particular for the higher

discharges.

Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA

model for Q20

Figure 8.5 shows the boxplot of RE (%) values for the GAANN based RFFA model. The

median RE values (represented by the black line within the boxes) match with the 0 – 0 line

very well for an ARI of 10 years and reasonably well for ARIs of 2, 20 and 50 years. For ARIs

of 5 and 100 years, the GAANN based RFFA model shows a noticeable underestimation and

overestimation, respectively. In terms of the RE band (represented by the spread of the box),

the ARI of 20 years shows the lowest spread followed by ARIs of 2, 5, 10, 50 and 100 years. The

RE band for the 100 years ARI is about double that for ARIs of 2 and 20 years. These results show that

in terms of RE, the best result overall is achieved for 20 years ARI for the GAANN based

RFFA model. Similar to ANN based RFFA model, the performance of GAANN based RFFA

model is relatively poor for the higher ARIs (i.e. 50 to 100 years). This is not unexpected as

estimation of flood quantiles for higher ARIs is associated with a greater degree of

uncertainty (e.g. Haddad and Rahman, 2012; Rahman et al., 2011).

[Boxplot: RE (%), from -300 to 300, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at RE = 0]

Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model

Figure 8.6 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments

for the GAANN based RFFA model for different ARIs. It is found that the median Qpred/Qobs

ratio values (represented by the thick black lines within the boxes) are located close to the 1 – 1

line (the horizontal line in Figure 8.6), in particular for ARIs of 2, 10, 20 and 50 years (the

best agreement is for an ARI of 10 years). However, for an ARI of 5 years, the median Qpred/Qobs

ratio value is located a short distance below the 1 – 1 line and for an ARI of 100 years, the

median Qpred/Qobs ratio value is located a short distance above the 1 – 1 line. These results

indicate a noticeable underestimation and overestimation of the flood quantiles by the

GAANN based RFFA model for ARIs of 5 and 100 years, respectively. In terms of the

spread of the Qpred/Qobs ratio values, the ARI of 20 years exhibits the lowest spread followed by

ARIs of 5, 2, 50, 10 and 100 years. Furthermore, the spreads of the Qpred/Qobs ratio values for

ARIs of 10 and 100 years are very similar and are markedly larger than those for ARIs of 2, 5 and 20 years.

[Boxplot: ratio (Qpred/Qobs), from -3 to 3, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at a ratio of 1]

Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model

8.2.3 GEP

Figure 8.7 compares the predicted flood quantiles for the selected 90 test catchments by the

GEP based RFFA model with the observed flood quantiles for 20 years ARI (Q20). Figure 8.7

generally presents a good agreement between the predicted and observed flood quantiles;

however, there is some overestimation by the GEP based RFFA model when the observed

flood quantiles are smaller than about 100 m3/s. Most of the test catchments are within a

narrow range of variability from the 45-degree line except for a few outliers. The plots of

predicted and observed flood quantiles for other ARIs can be seen in

Appendix B (Figures B.31 to B.35). The results are very similar for ARIs of 2, 5, 10 and 20

years. Results for ARIs of 50 and 100 years (Figures B.34 and B.35, respectively) exhibit

some overestimation by the GEP based RFFA model for smaller to medium discharges.



Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model

for Q20

Figure 8.8 shows the boxplot of relative error (RE) values of the selected test catchments for

GEP based RFFA model for different flood quantiles. It can be seen from Figure 8.8 that the

median RE values (represented by the thick black lines within the boxes) are located very

close to the zero RE line (indicated by 0 – 0 horizontal line in Figure 8.8), in particular for

ARIs of 2 and 10 years. However, for ARIs of 20, 50 and 100 years, the median RE values are

located above the zero line with ARI of 100 years showing the highest departure, which

indicates an overestimation by the GEP based RFFA model. Overall, the GEP based RFFA

model shows some overestimation bias in flood quantile estimates for higher ARIs.

In terms of the spread of the RE (represented by the width of the box), ARIs of 20, 50 and 100

years present the widest RE bands and ARIs of 5 and 10 years present the narrowest RE bands,

followed by the ARI of 2 years. The RE bands for ARIs of 20, 50 and 100 years are almost double

the RE bands for ARIs of 5 and 10 years. This implies that the GEP based RFFA model provides the

most accurate flood quantile estimates for ARIs of 5 and 10 years, and the least accurate flood

quantiles for ARIs of 20, 50 and 100 years. Overall, the boxplot in Figure 8.8 shows that

better results in terms of RE values are achieved for the smaller ARIs (i.e. 2, 5 and 10 years

ARIs) as compared to higher ARIs. Some outliers (evidenced by notable overestimation with

a positive RE) can be seen for all the ARIs, which may need to be examined more closely for



data errors or issues regarding the hydrology and physical characteristics of these catchments;

if these catchments are deemed to be genuine outliers they should be removed to enhance the

GEP based RFFA model; however, this has not been undertaken in this thesis.

[Boxplot: RE (%), from -300 to 300, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at RE = 0]

Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model

Figure 8.9 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments

for the GEP based RFFA model for different ARIs. The median Qpred/Qobs ratio values

(represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the

horizontal line in Figure 8.9), in particular for ARIs of 2, 5 and 10 years. However, for ARIs of

20, 50 and 100 years the median Qpred/Qobs ratio value is clearly located above the 1 – 1 line.

These results indicate that the GEP based RFFA model generally provides reasonably

accurate flood quantiles with the expected Qpred/Qobs ratio value very close to 1.00 for smaller

ARIs. However, there is a noticeable overestimation for ARIs of 20, 50 and 100 years. In terms

of the spread of the Qpred/Qobs ratio values, ARIs of 5 and 10 years provide the lowest spread

followed by ARIs of 2, 20, 50 and 100 years.

Considering the RE and Qpred/Qobs ratio values as discussed above, it can be concluded that

the GEP based RFFA model generally provides unbiased flood estimates for smaller to

medium ARIs (5 and 10 years); however, the model slightly overestimates the observed flood

quantiles for higher ARIs (50 to 100 years) and slightly underestimates them for the 2 years ARI.

Some outliers can be seen in the case of higher ARIs (e.g. 100 years), which may need to be

examined more closely for data errors or issues regarding the hydrology of the catchment; if

deemed to be genuine outliers they should be removed from the model; however, this has not

been done in this thesis.

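[Boxplot: ratio (Qpred/Qobs), from -4 to 4, plotted against flood quantiles Q2, Q5, Q10, Q20, Q50 and Q100 (axis label: ARI (years)), with a horizontal reference line at a ratio of 1]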

Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model

8.2.4 CANFIS

Figures 8.10, 8.11 and 8.12 show the validation results for CANFIS based RFFA model.

Figure 8.10 shows the plot of predicted flood quantiles by the CANFIS based RFFA model

and the observed flood quantiles for 20 years ARI. Figure 8.10 shows a greater scatter than

Figure 8.1 (which represents the ANN based RFFA model) for observed flood quantiles (Qobs) smaller than about

100 m3/sec; in particular, there is an overestimation of the flood quantiles by the

CANFIS based RFFA model for the test catchments with Qobs values smaller than 100 m3/sec.

Overall, the scatter around the 45-degree line in Figure 8.10 is deemed reasonable for most of

the test catchments with Qobs values greater than 100 m3/sec. The plots of predicted and

observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.36 to B.40).

The result for 2 years ARI is quite poor as can be seen in Figure B.36, with significant

overestimation by the CANFIS based RFFA model. The results for ARIs of 5, 10 and 20

years are very similar. Results for ARIs of 50 and 100 years (Figures B.39 and B.40,

respectively) exhibit some overestimation by the CANFIS based RFFA model for smaller to



medium discharges. For ARI of 50 years (Figure B.39), there is noticeable scatter at smaller

discharges.

Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA

model for Q20

Figure 8.11 shows the boxplot of RE (%) values for the CANFIS based RFFA model. The

median RE values (represented by the black line within the boxes) match with the 0 – 0 line

very well for ARIs of 5 and 50 years and reasonably well for the ARI of 20 years. For ARIs of 2

and 100 years, the CANFIS based RFFA model shows a noticeable overestimation. In

terms of the RE band (represented by the spread of the box), ARIs of 5, 10 and 20 years show

the lowest spread followed by ARIs of 50, 100 and 2 years. The RE band for the 100 years ARI is

about double that for ARIs of 5 and 10 years. The RE band for the 2 years ARI is about four times

that for ARIs of 5 and 10 years. These results show that in terms of RE, the best result

overall is achieved for 10 years ARI for the CANFIS based RFFA model.

Figure 8.12 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test

catchments for the CANFIS based RFFA model for different ARIs. It is found that the median

Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located

close to the 1 – 1 line (the horizontal line in Figure 8.12), in particular for ARIs of 2, 5, 10 and

20 years (the best agreement is for the ARI of 10 years). However, for ARIs of 50 and 100 years,

the median Qpred/Qobs ratio value is located a short distance above the 1 – 1 line. These results

indicate a noticeable overall overestimation of the flood quantiles by the CANFIS

based RFFA model for ARIs of 50 and 100 years. In terms of the spread of the Qpred/Qobs

ratio values, ARIs of 2 and 5 years exhibit the lowest spread followed by ARIs of 20, 10, 100

and 50 years. Furthermore, the spreads of the Qpred/Qobs ratio values for ARIs of 50 and 100 years are

very similar and are markedly larger than those for ARIs of 2, 5 and 20 years.

[Boxplot: RE (%), from -300 to 400, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at RE = 0]

Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model

8.3 Comparison of RFFA models based on validation data set

For selecting the best performing RFFA model, it is important to compare the results of these

models for independent test catchments. The following sub-sections compare the four

artificial intelligence based RFFA models based on Qpred/Qobs ratio, RE and CE values.

8.3.1 Median Qpred/Qobs ratio

Table 8.1 summarises the median Qpred/Qobs ratio values for the four different RFFA models.

For the ANN, the median Qpred/Qobs ratio values range from 0.99 to 1.14. For Q5 the median

Qpred/Qobs ratio value is 0.99, which indicates a small under-estimation by the ANN based

model. Also, for this model, Q50 and Q100 show over-estimation with median Qpred/Qobs ratio

values of 1.14 and 1.10, respectively. The best result is obtained for Q10 with a median



Qpred/Qobs ratio value of 1.02. In summary, the ANN based model shows a good median

Qpred/Qobs ratio value over all the ARIs (1.06) (at rank 3 among all the four models) and also

consistent values of median Qpred/Qobs ratio values for ARIs of 2, 5, 10 and 20 years.

[Boxplot: ratio (Qpred/Qobs), from -3 to 3, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100, with a horizontal reference line at a ratio of 1]

Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model

In case of the GAANN based RFFA model, the median Qpred/Qobs ratio values range from

0.89 (Q5) to 1.17 (Q100); all the median Qpred/Qobs ratio values seem to be within acceptable

range except for Q5, which is 0.89 indicating an underestimation by 11%. Similar to the ANN

based model, the best GAANN model is found for Q10 in terms of median Qpred/Qobs ratio

value. The GAANN based RFFA model provides an overall median Qpred/Qobs ratio value of

1.00, which is at rank 1 among the four models, but for the individual ARIs, less consistency

can be seen compared with the ANN based model.

In case of the GEP based RFFA model, all the flood quantiles seem to be performing well in

terms of median Qpred/Qobs ratio value except for Q20 and Q5 where 11% underestimation and

10% overestimation, respectively can be seen. The best median Qpred/Qobs ratio value (1.02) is

achieved for Q100 for the GEP based model followed by Q10 (1.04) and Q50 (1.05). These

results show that the GEP based model provides better results for higher ARIs (in particular

for 100 years ARI) as compared with all the three other models. The overall median ratio

value for all ARIs is 1.03 which is at rank 2 among the four models.



For the CANFIS based RFFA model, the best results in terms of median Qpred/Qobs ratio value

are obtained in case of Q5 (0.95) followed by Q50 (0.93). This model performs poorly for very

small and very high ARIs; however, for medium ARIs the performance of this model is quite

good. Overall, CANFIS based model provides median ratio values in the range of 0.79 to 2.81

that shows the highest degree of fluctuation among the four models. The overall median

Qpred/Qobs ratio value for the CANFIS model is 1.33, which is at rank 4 among the four

models.

Figure 8.13 plots the median Qpred/Qobs ratio values of all the four artificial intelligence based

RFFA models. It can be seen that in terms of consistency, the GEP based model is at rank 1

and ANN based model is at rank 2. The CANFIS based model is the poorest where the degree

of fluctuation among the ARIs is the highest.

Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models

ARI (years)   ANN    GAANN   GEP    CANFIS
2             1.04   1.08    1.07   2.81
5             0.99   0.89    1.10   0.95
10            1.02   0.98    1.04   0.79
20            1.04   0.93    0.89   1.18
50            1.14   0.95    1.05   0.93
100           1.10   1.17    1.02   1.31
Overall       1.06   1.00    1.03   1.33
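The overall values and the degree of fluctuation discussed above can be reproduced from the tabulated medians. The sketch below is a hedged illustration: it assumes the four data columns of Table 8.1 are ANN, GAANN, GEP and CANFIS (in that order, as implied by the text) and that the "Overall" row is the arithmetic mean over the six ARIs, which appears consistent with the tabulated values. The variable names are hypothetical.

```python
from statistics import mean

# Median Qpred/Qobs ratios from Table 8.1, ordered by ARI = 2, 5, 10, 20, 50, 100 years.
median_ratio = {
    "ANN":    [1.04, 0.99, 1.02, 1.04, 1.14, 1.10],
    "GAANN":  [1.08, 0.89, 0.98, 0.93, 0.95, 1.17],
    "GEP":    [1.07, 1.10, 1.04, 0.89, 1.05, 1.02],
    "CANFIS": [2.81, 0.95, 0.79, 1.18, 0.93, 1.31],
}

# Overall value per model (assumed to be the mean over the six ARIs).
overall = {model: mean(values) for model, values in median_ratio.items()}

# Fluctuation across ARIs, measured here simply as maximum minus minimum.
spread = {model: max(values) - min(values) for model, values in median_ratio.items()}

# Rank models by how far the overall value departs from the ideal ratio of 1.0.
ranking = sorted(overall, key=lambda model: abs(overall[model] - 1.0))
```

Under these assumptions the ranking comes out as GAANN, GEP, ANN and CANFIS, and CANFIS shows by far the largest spread, in line with the discussion in this sub-section.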

8.3.2 Median RE (%)

Table 8.2 summarises the median RE (%) values of the ANN, GAANN, GEP and CANFIS

based RFFA models. The median RE values are calculated based on the absolute RE values of

the individual test catchments. In case of ANN, median RE values range from 35.62% to

44.63%. The smallest and highest median RE values are found for ARIs of 20 and 10 years,

respectively. The ANN model shows the smallest median RE values for ARIs of 2 and 5

years among the four models. The ANN based RFFA model shows an overall median RE

value of 40.3%, which places it at rank 1 among the four RFFA models.

For the GAANN based RFFA model, higher median RE values can be observed for smaller

ARIs (2 to 10 years) whereas a better performance can be seen in case of higher ARIs. The

best value is obtained in case of Q100 (47.08%), whereas the highest median RE (%) value is

obtained for Q10 (72.56%). The overall median RE value over all the 6 ARIs for the GAANN



based model is found to be 58.4% (Table 8.2), which places it at rank 3 among the four RFFA

models.

[Line plot: median ratio (Qpred/Qobs), from 0 to 3, plotted against ARI (years) of 2, 5, 10, 20, 50 and 100; series: ANN, GAANN, GEP, CANFIS]

Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models

In case of the GEP model, median RE values range from 37.87% (Q50) to 69.38% (Q2). The

GEP based RFFA model seems to be performing well for higher ARIs. For 2 years ARI, it

performs very poorly. The GEP model shows the smallest median RE values for ARIs of 10

and 50 years among the four models. The overall median RE values over all the 6 ARIs for

the GEP based model is found to be 44.47% (Table 8.2), which places it at rank 2 among the

four RFFA models.

The CANFIS based RFFA model shows median RE values in the range of 34.48% (Q20) to

180.77% (Q2). The CANFIS model shows the smallest median RE values for ARIs of 20 and

100 years among the four models. The overall median RE values over all the 6 ARIs for the

CANFIS based model is found to be 69.66% (Table 8.2), which places it at rank 4 among the

four RFFA models.

Figure 8.14 plots the median RE values of all the four artificial intelligence based RFFA

models. It shows that the ANN based model shows the smallest degree of fluctuation in the



median RE values over all the six ARIs. The GAANN and GEP models show a similar degree

of fluctuation and the CANFIS shows the highest degree of fluctuation.

Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models

ARI (years)   ANN     GAANN   GEP     CANFIS
2             37.56   65.13   69.38   180.77
5             40.39   61.48   44.95   48.92
10            44.63   72.56   42.08   51.97
20            35.62   48.19   47.61   34.48
50            39.09   55.93   37.87   59.20
100           44.53   47.08   44.47   42.63
Overall       40.30   58.40   44.47   69.66

8.3.3 Median CE

Table 8.3 depicts the summary of median CE values of the ANN, GAANN, GEP and

CANFIS based RFFA models. In case of ANN based RFFA model, the median CE values

range from 0.40 (Q100) to 0.69 (Q2 and Q20). Overall, the ANN based model shows consistent values

except for Q100. The best results are obtained in the cases of Q2, Q20 and Q50. The ANN model

shows the highest median CE value for 50 years ARI among all the four models. In terms of

overall median CE value, the ANN is placed at rank 2 (jointly with the GEP model).

The GAANN based model shows median CE values in the range of 0.38 (Q50) to 0.75 (Q5).

The GAANN model shows the highest median CE values for ARI of 2 and 5 years. In terms

of overall median CE value, the GAANN is placed at rank 1 among the four models.

The GEP based model shows median CE values in the range of 0.49 (Q2) to 0.67 (Q5, Q20,

Q100). The GEP model shows the highest median CE values for ARI of 100 years. In terms of

overall median CE value, the GEP model is at rank 2 (jointly with the ANN based model).



Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models

The CANFIS based model provides poor results for Q2, with a negative median CE value. For

the other ARIs, the median CE values are in the range of 0.54 to 0.72. The CANFIS model

shows the highest median CE values for ARIs of 10 and 20 years. In terms of overall median

CE value, the CANFIS model is at rank 4 among the four models.

Figure 8.15 plots the median CE values of all the four artificial intelligence based RFFA

models. This plot shows that the lowest degree of fluctuation in the median CE values is

demonstrated by the GEP model followed by the ANN based model and the highest degree of

fluctuation is provided by the CANFIS model.

Table 8.3 Median CE values of the four artificial intelligence based RFFA models

ARI (years)   ANN    GAANN   GEP    CANFIS
2             0.69   0.72    0.49   -0.09
5             0.59   0.75    0.67   0.54
10            0.63   0.63    0.56   0.67
20            0.69   0.71    0.67   0.72
50            0.68   0.38    0.63   0.55
100           0.40   0.65    0.67   0.59
Overall       0.61   0.64    0.61   0.50



Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models

8.3.5 Comparison of RFFA models based on RE (%) ranges

When comparing different RFFA models, it is important to observe how many test

catchments fall within specified ranges of RE. For this purpose, RE (%) values (considering

its sign), are grouped into four classes as shown in Table 8.4. The selected arbitrary ranges of

RE (%) are (-10 to 10), (-20 to 20), (-50 to 50) and (-100 to 100).

In the range of -10 to 10 of RE (%), the ANN is placed at rank 1 with 22% of the 90 test

catchments falling in this range, followed by CANFIS (14%) and GAANN (12%). However,

in case of GEP, only 9 test catchments fall in this range, which is 10%.

In case of -20 to 20 of RE (%), a total of 32 (35%) test catchments fall under this category for

the ANN based model. Some 27% (25) of the test catchments fall in the range of -20 to 20 in

case of CANFIS based RFFA model, which is higher than the GEP (25%) and GAANN

(22%) based models. In this case, the ANN based model is placed again at rank 1 and the

GAANN at rank 4 among the four models.

In the range of -50 to 50 of RE (%), the CANFIS is placed at rank 1 with 61% of

the test catchments falling in this range, very closely followed by the ANN model where

60% of the test catchments fall in this range. The GEP is placed at rank 3 among the four

models, with 55% of the test catchments falling in this range, followed by the GAANN at

rank 4 with 47 test catchments (52% of the total) falling in this range.

In the range of -100 to 100 of RE (%), the ANN based RFFA model is again placed at rank 1

with 92% of the catchments falling in this range. This is followed by the CANFIS (77%),

GAANN (76%) and GEP (70%, or 63 test catchments) based RFFA models.

Overall, the ANN based model outperforms the three other models in terms of the

distributions of RE (%) values, which is followed by the CANFIS based model.

Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial intelligence based RFFA models

Models                 (-10 to 10)   (-20 to 20)   (-50 to 50)   (-100 to 100)
ANN                    20            32            54            83
% of test catchments   22            35            60            92
GAANN                  11            20            47            69
% of test catchments   12            22            52            76
GEP                    9             22            49            63
% of test catchments   10            25            55            70
CANFIS                 13            25            55            70
% of test catchments   14            27            61            77
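The grouping in Table 8.4 amounts to counting, for each model, how many test catchments have a signed RE within each symmetric range. A minimal sketch is given below; the function name and the ten RE values are hypothetical and serve only to show the calculation. Note that the classes are nested, so the counts can only increase from the narrowest to the widest range.

```python
def share_within(re_values, limit):
    """Count and percentage of catchments whose signed RE (%) lies within [-limit, +limit]."""
    count = sum(1 for v in re_values if -limit <= v <= limit)
    return count, 100.0 * count / len(re_values)

# Hypothetical signed RE (%) values for ten test catchments (not the thesis data).
re_values = [-85.0, -45.0, -18.0, -9.0, -3.0, 4.0, 12.0, 35.0, 60.0, 140.0]

# One (count, percentage) pair per RE range used in Table 8.4.
table = {limit: share_within(re_values, limit) for limit in (10, 20, 50, 100)}
```

Applying the same function to the 90 RE values of each model would give one row pair of Table 8.4 per model.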


8.3.6 Selection of the best performing artificial intelligence based RFFA

model

The four artificial intelligence based RFFA models have been compared in Section 8.2 and

Sections 8.3.1 to 8.3.5 based on the results from application of these models to 90 test

catchments. It has been found that none of the four models performs best on all the

assessment criteria, which makes it difficult to select the best model. The performances of the

four models are therefore assessed in a heuristic manner by ranking each model on the seven

criteria shown in Table 8.5. Four ranks are used with relative scores ranging from 4 to 1. If a model is

ranked 1 for a criterion, it scores 4. For ranks 2, 3 and 4, scores of 3, 2 and 1, respectively, are

used. Table 8.5 shows that the ANN based RFFA model has the highest score of 25, followed

by the GAANN with a score of 19. The GEP receives a score of 17, while the CANFIS

receives only 10 making it the least favourable model. The ANN based model is placed at

rank 1 in 5 out of 7 criteria. Hence, it is reasonable to conclude that the ANN based RFFA

model is the best performing artificial intelligence based model for eastern Australia.

Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern Australia

Criteria                        Rank 1   Rank 2    Rank 3   Rank 4
Scatter plot of Qobs vs Qpred   ANN      GEP       GAANN    CANFIS
Box plot of RE                  ANN      GEP       GAANN    CANFIS
Box plot of Qpred/Qobs          ANN      GAANN     CANFIS   GEP
Median Qpred/Qobs               GAANN    GEP       ANN      CANFIS
Median RE                       ANN      GEP       GAANN    CANFIS
Median CE                       GAANN    ANN/GEP   #        CANFIS
RE (%) ranges                   ANN      CANFIS    GAANN    GEP

Overall Scoring: ANN: 25, GAANN: 19, GEP: 17, CANFIS: 10
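The overall scores follow directly from the ranks in Table 8.5 and the scoring rule described above (rank 1 scores 4, down to rank 4 scoring 1). The short sketch below reproduces the totals. The data structure is a hypothetical encoding of the table, in which the joint rank 2 of ANN and GEP for the median CE is treated as both models receiving the rank 2 score and rank 3 being left unused.

```python
# Rank lists from Table 8.5; a tuple holds the models sharing a rank, an empty tuple an unused rank.
rankings = {
    "Scatter plot of Qobs vs Qpred": [("ANN",), ("GEP",), ("GAANN",), ("CANFIS",)],
    "Box plot of RE":                [("ANN",), ("GEP",), ("GAANN",), ("CANFIS",)],
    "Box plot of Qpred/Qobs":        [("ANN",), ("GAANN",), ("CANFIS",), ("GEP",)],
    "Median Qpred/Qobs":             [("GAANN",), ("GEP",), ("ANN",), ("CANFIS",)],
    "Median RE":                     [("ANN",), ("GEP",), ("GAANN",), ("CANFIS",)],
    "Median CE":                     [("GAANN",), ("ANN", "GEP"), (), ("CANFIS",)],
    "RE (%) ranges":                 [("ANN",), ("CANFIS",), ("GAANN",), ("GEP",)],
}

scores = {"ANN": 0, "GAANN": 0, "GEP": 0, "CANFIS": 0}
for ranked in rankings.values():
    for rank, models in enumerate(ranked, start=1):
        for model in models:
            scores[model] += 5 - rank  # rank 1 -> 4 points, ..., rank 4 -> 1 point

# Number of criteria on which each model is placed at rank 1.
first_places = {m: sum(m in r[0] for r in rankings.values()) for m in scores}
```

With this encoding the totals are 25, 19, 17 and 10 for ANN, GAANN, GEP and CANFIS, respectively, and the ANN is at rank 1 for five of the seven criteria.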

8.4 Performance of the finally selected artificial intelligence based

RFFA model

This section further evaluates the performance of the best performing artificial intelligence

based RFFA model, which is the ANN based RFFA model. Here, the spatial distributions of

the relative error (RE) values of the ANN based RFFA model for the 90 test catchments are

evaluated. Secondly, the relationship between the RE and catchment area is investigated.



8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model

Figure 8.16 shows the spatial distribution of RE values across NSW. Most of the test

catchments fall in the eastern part of NSW since not many catchments from

western NSW qualified for the study data set. Overall, the catchments in north-eastern NSW are

found to be exhibiting smaller RE values. Most importantly, Figure 8.16 does not show any

notable spatial pattern and in general test catchments with higher RE values are surrounded by

catchments with relatively small RE values.

Figure 8.16 Spatial distribution of RE of ANN based model across NSW

Figure 8.17 shows the distribution of RE values across the state of Victoria. Similar to NSW

there is no noticeable spatial trend of the RE values across the state. Figures 8.18, 8.19 and

8.20 show the spatial distribution of RE values across QLD. Figure 8.18 shows the RE values

across northern and northeastern parts of QLD. Generally, the catchments in this part of QLD

show relatively small RE values. Figure 8.19 shows the catchments in the southern and

southeastern parts of QLD. Most of the test catchments fall near the coastal area of QLD. The

catchments close to the NSW-QLD border are found to exhibit better results, with quite small RE

values. Figure 8.20 shows a full view of the spatial distribution of RE values

across QLD, which shows that there is no noticeable spatial trend in the RE values for QLD.



Figure 8.21 shows the spatial distribution of RE values across the state of TAS. Most of the

test catchments in TAS fall in the middle of TAS and away from coastal regions of TAS. No

spatial trend is observed in the RE values over TAS.

It should also be noted that there are some outlier catchments where RE values are quite high;

these catchments may need further investigation, which however is not undertaken in this

thesis.

Figure 8.17 Spatial distribution of RE of ANN based model across VIC



Figure 8.18 Spatial distribution of RE of ANN based model across North QLD

Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD



Figure 8.20 Spatial distribution of RE of ANN based model across QLD

Figure 8.21 Spatial distribution of RE of ANN based model across TAS

8.4.2 Catchment area vs RE

Figure 8.22 shows a plot between RE values and the area of the test catchments. Catchments

with areas in the range of 1 to 200 km2 generally show the smallest RE values. In the range of 200 to

400 km2, most catchments show small RE values except for two outliers where RE values are

greater than 500%. Importantly, there is no strong relationship between RE

and catchment area, as the coefficient of determination (R2) of the fitted regression line in

Figure 8.22 is only 0.06 (6%).
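The coefficient of determination of a fitted straight line measures the share of the variance in RE that is explained by catchment area. A minimal sketch of the calculation is given below, assuming an ordinary least-squares line. The function name, the catchment areas and the RE values are hypothetical and are not the data plotted in Figure 8.22.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # For a simple linear regression, R2 equals the squared correlation coefficient.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical catchment areas (km2) and absolute RE (%) values.
area = [10.0, 50.0, 120.0, 200.0, 350.0, 600.0, 900.0]
re_abs = [30.0, 55.0, 20.0, 45.0, 25.0, 60.0, 35.0]

r2 = r_squared(area, re_abs)
```

An R2 close to zero, as in this made-up example, means that catchment area explains very little of the variation in RE.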

8.5 Comparison with QRT

Finally, the ANN based RFFA model is compared with the QRT based models. Here the same

dataset is used for building and testing the ANN and QRT models. Based on the median

Qpred/Qobs ratio values as shown in Table 8.6, the ANN based RFFA model shows median

Qpred/Qobs ratio values closer to 1.00 compared with the QRT model for all the 6 ARIs.

Similarly, as shown in Table 8.7, the ANN based RFFA model shows smaller median RE

values than the QRT model for all the ARIs. Furthermore, in Table 8.8, the ANN based RFFA

model outperforms the QRT model with respect to CE values. These results demonstrate that

the ANN based RFFA model outperforms the QRT model considering all three evaluation

statistics. It should be noted here that the median RE values for the best ANN based RFFA

model developed here range from 35% to 44% (with a few cases where RE > 100%), which is

typical of Australian regional flood estimation methods (e.g., see Haddad et al., 2011;

Haddad and Rahman, 2012). Since RE is independent of catchment area, the model can be

applied to smaller as well as larger catchments up to 1000 km2.

Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90

test catchments



Table 8.6 Median Qpred/Qobs ratio values for seven ANN based candidate regions and QRT

Flood quantile   Median ratio (Qpred/Qobs)
                 ANN    QRT
Q2               1.04   1.15
Q5               0.99   1.06
Q10              1.02   1.35
Q20              1.04   1.13
Q50              1.14   1.19
Q100             1.10   1.28

Table 8.7 Median relative error values (%) for seven ANN based candidate regions and QRT

Flood quantile   Median RE (%)
                 ANN     QRT
Q2               37.56   65.38
Q5               40.39   45.35
Q10              44.63   57.62
Q20              35.62   42.64
Q50              39.09   48.71
Q100             44.53   51.72

Table 8.8 Coefficient of efficiency (CE) values for seven ANN based candidate regions and QRT

Flood quantile   CE
                 ANN    QRT
Q2               0.73   0.35
Q5               0.61   0.37
Q10              0.63   0.30
Q20              0.71   0.37
Q50              0.68   -8.42
Q100             0.52   0.38
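The comparison with the QRT can be checked quantile by quantile from the tabulated values. The sketch below uses the median ratios of Table 8.6 and the median RE values of Table 8.7; the variable names are hypothetical, and "closer to 1.00" is taken to mean a smaller absolute departure from 1.

```python
# Values from Tables 8.6 and 8.7, ordered Q2, Q5, Q10, Q20, Q50, Q100.
ratio_ann = [1.04, 0.99, 1.02, 1.04, 1.14, 1.10]
ratio_qrt = [1.15, 1.06, 1.35, 1.13, 1.19, 1.28]
re_ann = [37.56, 40.39, 44.63, 35.62, 39.09, 44.53]
re_qrt = [65.38, 45.35, 57.62, 42.64, 48.71, 51.72]

# Number of quantiles for which the ANN median ratio is closer to 1.0 than the QRT ratio.
ann_closer_to_one = sum(abs(a - 1.0) < abs(q - 1.0) for a, q in zip(ratio_ann, ratio_qrt))

# Number of quantiles for which the ANN median RE is smaller than the QRT median RE.
ann_smaller_re = sum(a < q for a, q in zip(re_ann, re_qrt))
```

Both counts equal six, that is, the ANN based model is better than the QRT on both statistics for every one of the six flood quantiles.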

8.6 Summary

In this chapter, four artificial intelligence based RFFA models (ANN, GAANN,

GEP and CANFIS) have been validated based on 90 independent test catchments. It has been

found that no model performs best for all six ARIs and for all seven

criteria (Table 8.5). Overall, however, the ANN based RFFA model is the best

performing model among the four artificial intelligence based RFFA models. The ANN based

RFFA model is found to outperform the ordinary least squares based quantile regression

technique.



The median relative error values for the finally selected ANN based RFFA model range from 35%

to 44%, which is slightly higher than the GLS regression based region-of-influence approach

(parameter regression technique) reported by Haddad and Rahman (2012) (relative error

ranging from 29% to 45%). However, the relative error values of both techniques are within

the expected error/variability of RFFA models, which is dependent on at-site flood frequency

analysis estimates (which have a high degree of sampling variability).

The ANN based RFFA model shows that there is no noticeable spatial trend in the relative

error values across the four states in eastern Australia. Furthermore, the relative error values are

independent of catchment area.

There are a few catchments where the ANN based RFFA model shows relatively high relative

error values (similar to the results by Haddad and Rahman, 2012). These catchments may

need further investigation, which however is not undertaken in this thesis.

To enhance the accuracy of regional flood estimation methods in eastern Australia, a larger

data set with longer streamflow record lengths would be needed as Australia is characterised

by a highly variable hydrology/flood regime. It is expected that the availability of such a

larger data set in future would enhance the accuracy of artificial intelligence based RFFA models

in eastern Australia.



CHAPTER 9

SUMMARY, CONCLUSIONS AND

RECOMMENDATIONS

9.1 General

This thesis has focused on the development and testing of non-linear artificial intelligence

based regional flood frequency analysis (RFFA) models. For this purpose, a database of 452

small to medium sized catchments from eastern Australia has been used. Four different

artificial intelligence based RFFA models have been considered in this research. These non-

linear RFFA models have also been compared with the linear ordinary least squares based

regression model. This chapter presents a summary of the research undertaken in this thesis,

conclusions and recommendations for further study.

9.2 Summary of the research undertaken in this thesis

Selection of study catchments and data preparation: This research selects eastern Australia

as the study area since it has the highest density of stream gauging stations in Australia. A

total of 452 catchments were selected from the study area, consisting of 96 catchments from

New South Wales and Australian Capital Territory, 131 catchments from Victoria, 172

catchments from Queensland and 53 catchments from Tasmania. The geographical locations

of the selected 452 catchments can be seen in Figure 4.19. These catchments are not affected

by major regulation and land use changes. These are small to medium-sized catchments, with

catchment areas in the range of 1.3 to 1900 km2 (mean: 329.4 km2). The annual maximum

flood series of the selected stations were prepared by adopting standard procedures (e.g. by

filling gaps in the data and by checking for rating curve error and trends). The annual

maximum flood record lengths of the selected stations range from 25 to 75 years (mean: 33

years). For each of the selected stations, at-site flood frequency analysis was carried out using

FLIKE software (Kuczera, 1999). The detected low flows were censored using the in-built

facility in FLIKE. An LP3 distribution with the Bayesian parameter estimation procedure

was adopted to estimate flood quantiles for six average recurrence intervals (i.e. 2, 5, 10, 20,

50 and 100 years). These flood quantiles were used as dependent/target variables in the



development of models using linear and non-linear methods. For each of the selected

catchments, data for five catchment characteristics were abstracted, which are catchment area,

mean annual areal evapo-transpiration, mean annual rainfall, main stream slope and design

rainfall intensity. The summary of these catchment characteristics data can be seen in Table

4.1.
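To make the at-site step concrete, the sketch below estimates flood quantiles for the six ARIs from an annual maximum series. It is only an illustration under stated assumptions: it fits the LP3 distribution by the method of moments on log10 flows and uses the Wilson-Hilferty approximation for the frequency factor, which is a simpler substitute for the Bayesian procedure in FLIKE adopted in this thesis. The function name and the flow series are hypothetical.

```python
from math import log10
from statistics import NormalDist, mean, stdev

def lp3_quantile(annual_max, ari_years):
    """Flood quantile for a given ARI from an annual maximum series.

    Method-of-moments LP3 fit on log10 flows (not the Bayesian fit used
    in FLIKE), with the Wilson-Hilferty frequency factor approximation.
    """
    logs = [log10(q) for q in annual_max]
    n = len(logs)
    m, s = mean(logs), stdev(logs)
    # Sample skew coefficient of the log flows
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    # Standard normal variate for an annual exceedance probability of 1/ARI
    z = NormalDist().inv_cdf(1.0 - 1.0 / ari_years)
    if abs(g) < 1e-6:
        k = z  # zero skew: LP3 reduces to the log-normal distribution
    else:
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g ** 2 / 36.0) ** 3 - 1.0)
    return 10 ** (m + k * s)

# Hypothetical annual maximum flows (m3/s); not data from this thesis.
annual_max = [120.0, 85.0, 310.0, 45.0, 150.0, 95.0, 520.0, 60.0, 200.0, 130.0]
quantiles = {t: lp3_quantile(annual_max, t) for t in (2, 5, 10, 20, 50, 100)}
```

With only ten values the sample skew is very uncertain; the record lengths of 25 to 75 years used in this study reduce, but do not remove, that sampling variability.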

Selection of predictor variables: From the selected five candidate catchment characteristics

variables, eight different combinations were formed. Each of these combinations contained

catchment area and design rainfall intensity and combinations of the remaining three predictor

variables (mean annual areal evapo-transpiration, mean annual rainfall and main stream slope)

as can be seen in Table 5.2. Two artificial intelligence based RFFA techniques (ANN and

GEP) were then used to develop prediction equations. From the selected 452 catchments, 90

catchments were selected randomly as test catchments, the remaining 362 catchments were

used to develop models. Models were assessed based on ratio between predicted and observed

flood quantiles, percent relative error and coefficient of efficiency. Based on the independent

testing, it was found that the ANN and GEP based RFFA model with only two predictor

variables (catchment area and design rainfall intensity) outperformed other models with a

greater number of predictor variables. This model would be easier to apply in practice as

these two predictor variables can be obtained relatively easily from the published maps

and government websites. In the subsequent analyses, these two predictor variables

(catchment area and design rainfall intensity) were used.
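The count of eight candidate models follows from keeping the two fixed predictors and adding every subset of the remaining three (2^3 = 8). A minimal sketch of that enumeration is given below; the ordering is an assumption and need not match the order used in Table 5.2.

```python
from itertools import combinations

FIXED = ("catchment area", "design rainfall intensity")
OPTIONAL = (
    "mean annual areal evapo-transpiration",
    "mean annual rainfall",
    "main stream slope",
)

def candidate_predictor_sets():
    """Return the two fixed predictors combined with every subset of the optional three."""
    sets = []
    for k in range(len(OPTIONAL) + 1):
        for extra in combinations(OPTIONAL, k):
            sets.append(FIXED + extra)
    return sets

for predictors in candidate_predictor_sets():
    print(", ".join(predictors))
```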

Formation of regions: From the selected 452 catchments covering four eastern Australian

states, different regions/groupings were formed. In the first step, regions were formed on the

basis of state/geographical and climatic boundaries. Here, seven different regions were

considered as can be seen in Table 6.1. In the second step, the regions were formed in the

catchment characteristics data space based on cluster analysis and principal component

analysis. Here, two regions were formed based on cluster analysis and four regions were

formed based on principal component analysis. It was found that K-Means cluster

analysis generated the best performing groupings in the catchment characteristics data space.

When compared with the geographical regions, some state-based regions performed more

poorly than the K-Means cluster groupings. Overall, the best ANN-based RFFA model was

achieved when the data from all 452 catchments were combined to form a single region.
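The grouping of catchments in the catchment characteristics data space can be illustrated with a plain K-Means (Lloyd's algorithm) sketch. This is not the software or configuration used in the thesis: initialising the centres from the first k points, the function name and the sample values are all assumptions made for illustration, and in practice the characteristics would normally be standardised first so that no single variable dominates the distance.

```python
from math import dist

def k_means(points, k, n_iter=50):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Centres are initialised from the first k points (a simple,
    deterministic choice made for this sketch).
    """
    centres = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each catchment to its nearest centre
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[c])  # an empty cluster keeps its centre
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return labels, centres

# Hypothetical standardised characteristics (two variables per catchment).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centres = k_means(points, k=2)
```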

Development of artificial intelligence based RFFA models: In the development/training of

the artificial intelligence based RFFA models, the selected 452 catchments were divided into



two parts randomly: (i) training data set consisting of 362 catchments; and (ii) validation data

set consisting of 90 catchments. The artificial intelligence based RFFA models were evaluated

based on four criteria: median Qpred/Qobs ratio, plot of Qobs and Qpred, median relative error and

coefficient of efficiency (Tables 7.9 and 7.10). It was found that no model performed the best

for all the six ARIs over all the adopted criteria. Overall, the ANN based RFFA model

outperformed the three other models in the training/calibration.
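The random division into 362 training and 90 validation catchments can be sketched as follows. The function name, the integer catchment identifiers and the seed are hypothetical; the thesis does not state how the random selection was implemented.

```python
import random

def split_catchments(catchment_ids, n_test=90, seed=0):
    """Randomly hold out n_test catchments; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    ids = list(catchment_ids)
    test = set(rng.sample(ids, n_test))
    train = [c for c in ids if c not in test]
    return train, sorted(test)

# 452 placeholder identifiers standing in for the study catchments.
train_ids, test_ids = split_catchments(range(452))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the same 90 test catchments across all candidate models, so that differences in the evaluation statistics reflect the models rather than the split.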

Validation of the artificial intelligence based RFFA models: The four artificial intelligence

based RFFA models (ANN, GAANN, GEP and CANFIS) were validated using the 90

independent test catchments. In the first step, the four artificial intelligence based RFFA

models were compared with each other. Based on seven different criteria (see

Table 8.5), it was found that there was no model which performed the best for all the six ARIs

based on all the seven criteria (Table 8.17). It was found that the ANN based RFFA model

was the best performing model among the four artificial intelligence based RFFA models. In

the second step, the ANN based RFFA model was compared with the ordinary least squares

based quantile regression technique. It was found that the ANN based RFFA model outperformed

the quantile regression technique.

The median relative error values for the finally selected ANN based RFFA model were found

to be in the range of 35% to 44%, which is comparable to the generalised least squares

regression and region-of-influence approach (parameter regression technique) which reported

relative error values in the range of 29% to 45% for eastern Australia (Haddad and Rahman,

2012). The ANN based RFFA model exhibited no noticeable spatial trend in the relative error

values on the map of the selected study area. Furthermore, the relative error values were

found to be independent of catchment area. There are a few catchments where the ANN based

RFFA model (and the other three artificial intelligence based RFFA models and the quantile

regression technique) showed quite high relative error values (similar to the results by Haddad

and Rahman, 2012). These catchments need further investigation, which however was not

undertaken in this thesis.

9.3 Conclusions

The following conclusions can be made from this research:

It has been found that non-linear artificial intelligence based RFFA techniques can be

applied successfully to eastern Australian catchments. Among the four artificial

intelligence based models, the ANN based RFFA model has been found to be the best



performing model, followed by the GAANN based RFFA model. The ANN based

RFFA model has been found to outperform the ordinary least squares based RFFA

model.

It has been shown that in the training of the four artificial intelligence based RFFA

models, no model performs the best for all the six ARIs over all the adopted criteria.

Overall, the ANN based RFFA model is found to outperform the three other models in

the training/calibration.

Based on independent validation, the median relative error values for the ANN based

RFFA model are observed to be in the range of 35% to 44% for eastern Australia, which

is comparable to the generalised least squares regression and region-of-influence based

RFFA approach.

It has been demonstrated that an RFFA model with two predictor variables, i.e.,

catchment area and design rainfall intensity provides more accurate flood quantile

estimates than other models with a greater number of predictor variables. The finally

selected ANN based RFFA model would be easier to apply in practice since data of

these two predictor variables can be obtained relatively easily from published maps and

government websites.

It has been shown that when the data from all the eastern Australian states are combined

to form one region, the resulting ANN based RFFA model performs better as compared

with other candidate regions such as regions based on state boundaries, geographical

and climatic boundaries and the regions formed in the catchment characteristics data

space.

The ANN based RFFA model exhibits no noticeable spatial trend in the relative error

values. Furthermore, the relative error values of the ANN based RFFA model are found

to be independent of catchment area.

9.4 Recommendations for further research

The ANN based RFFA model developed in this study is based on the catchments in the states

of New South Wales, Victoria, Queensland and Tasmania. In future research, the ANN based

RFFA model should be tested in other Australian states.



The ANN based RFFA model developed in this study is based on design rainfall data from

Australian Rainfall and Runoff (ARR) 1987. The ANN based RFFA model should be

calibrated and tested with the recently released design rainfall data by the Australian Bureau of

Meteorology.

In future research, a detailed investigation should be made of the catchments where relative error

values have been found to be quite high for all the modelling techniques adopted in this

research. In this regard, streamflow data of these catchments should be checked. Furthermore,

it should be checked whether these catchments have other special features which make them

significantly different to other catchments in the data set.

To enhance the accuracy of the ANN based RFFA model, a larger data set consisting of a

greater number of catchments and additional predictor variables (when available in future)

should be used to develop and test the ANN based RFFA model in future.

In future research, leave-one-out validation and Monte Carlo cross validation techniques

should be adopted to train and validate the ANN based RFFA model.
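The two validation schemes recommended above differ only in how the catchments are partitioned. The sketch below generates the index splits for each scheme; the function names and the number of Monte Carlo repeats are assumptions made for illustration, and the model fitting itself is omitted.

```python
import random

def leave_one_out(n):
    """Yield (train_indices, test_indices) with each catchment held out exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def monte_carlo_cv(n, n_test, n_repeats, seed=0):
    """Yield repeated random splits; test sets may overlap between repeats."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        test = set(rng.sample(range(n), n_test))
        train = [j for j in range(n) if j not in test]
        yield train, sorted(test)

# With 452 catchments, leave-one-out needs 452 model fits, whereas Monte Carlo
# cross validation needs only as many fits as there are repeats.
loo_splits = list(leave_one_out(452))
mc_splits = list(monte_carlo_cv(452, n_test=90, n_repeats=100))
```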



REFERENCES




ABC News (2011). Aerial shot of the flooded Queensland town of Ipswich. (accessed on 5th

August 2013). Accessible at http://www.abc.net.au.

ABC News (2011). Aerial shot of the flooded New South Wales town of Wagga Wagga.

(accessed on 5th August 2013). Accessible at http://www.abc.net.au.

Abrahart, R.J., See, L. and Kneale P.E. (1999). Using pruning algorithms and genetic

algorithms to optimize network architectures and forecasting inputs in a neural network

rainfall-runoff model. Journal of Hydroinformatics, 1, 103-114.

Acreman, M.C. and Sinclair, C.D. (1986) Classification of drainage basins according to their

physical characteristics and application for flood frequency analysis in Scotland, Journal of

Hydrology, 84(3), 365-380.

Adams, C.A. (1987). Design flood estimation for ungauged rural catchments in Victoria Road

Construction Authority, Victoria, Draft Technical Bulletin.

Alkon, D.L. (1989). Memory storage and neural systems. Scientific American, 261 (1), 42-50.

Alecsandru, C. and Ishak, S. (2004). Hybrid model-based and memory-based traffic

prediction system. Transportation Research Record: Journal of the Transportation Research

Board, 1879(1), 59-70.

Arthur, L.C. and Roger, L.W. (1995). LibGA for solving combinatorial optimization

problems. In L. Chambers (ed.), Practical handbook of Genetic Algorithms, CRC Press, Inc.

ASCE. (2000). Task Committee, 2000. Artificial neural networks in hydrology-I: Preliminary

concepts. Journal of Hydrologic Engineering, ASCE 5 (2), 115–123.

Aytek, A. (2009). Co-Active neuro-fuzzy inference system for evapotranspiration modelling.

Soft Computing, 13(7), 691-700.

Azamathulla, H.M., Ghani, A.A., Leow, C.S., Chang, C.K. and Zakaria, N.A. (2011). Gene-

expression programming for the development of a stage-discharge curve of the Pahang River.

Water Resources Management, 25(11), 2901–2916.

Azamathulla, H.M. and Ghani, A.A. (2011). Genetic programming for longitudinal dispersion

coefficients in streams. Water Resources Management, 25(6), 1537–1544.

Aziz, K., Rahman, A., Fang, G., Haddad, K. and Shrestha, S. (2010). Design flood estimation

for ungauged catchments: Application of artificial neural networks for eastern Australia.

World Environment and Water Resources Congress, ASCE, Providence, Rhode Island, USA.




Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2011). Artificial neural networks based

regional flood estimation methods for eastern Australia: Identification of optimum regions.

33rd Hydrology and Water Resources Symposium, 26 June-1 July 2011, Brisbane, Australia.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2012). Comparison of artificial neural

networks and adaptive neuro-fuzzy inference system for regional flood estimation in

Australia, Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov

2012, Sydney, Australia.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2013). Application of artificial neural

networks in regional flood frequency analysis: A case study for Australia, Stochastic

Environment Research & Risk Assessment, 28(3), 541-554.

Baker, J.E. (1985). Adaptive selection method for genetic algorithms. Proceedings of an

International Conference on Genetic Algorithms and their Applications, 100-111.

Baker, J.E. (1987). Reducing bias inefficiency in the selection algorithm. In J.J. Grefenstette

(ed.), Genetic algorithms and their applications, Proceedings of the second international

conference on genetic algorithms, Erlbaum.

Bates, B.C., Rahman, A., Mein, R.G. and Weinmann, P.E. (1998). Climatic and physical

factors that influence the homogeneity of regional floods in south-eastern Australia. Water

Resources Research, 34(12), 3369-3382.

Bayazit, M. and Onoz, B. (2004). Sampling variances of regional flood quantiles affected by

inter-site correlation, Journal of Hydrology, 291, 42-51.

Benson, M.A. (1962). Evolution of methods for evaluating the occurrence of floods. U.S.

Geological Survey Water Supply Paper, 30, 1580-A.

Bureau of Infrastructure, Transport and Regional Economics (BITRE) (2008). Analysis of the

Emergency Management Australia database. About Australia’s Regions, Department of

Infrastructure, Transport, Regional Development and Local Government, Australian

Government, Canberra, Table 30, 44 pp.

Bureau of Meteorology (2014). State of the Climate 2014. http://www.bom.gov.au/state-of-

the-climate/.

Bishop, C.M. (1995). Neural networks for pattern recognition, Oxford University Press.

Blöschl, G. and Sivapalan, M. (1997). Process controls on regional flood frequency:

Coefficient of variation and basin scale, Water Resources Research, 33, 2967-2980.

Blackie, J.R. and Eeles, C.W.O. (1985). Lumped catchment models. In: Hydrological

Forecasting (ed. by M. G. Anderson &T. P. Burt), 311-346. Wiley.




Bogardi, I., Bardossy, A., Duckstein, L. and Pongrácz, R. (2003). Fuzzy logic in hydrology

and water resources. In: Demicco, R.V., Klir, G.J. (Eds.), Fuzzy Logic in Geology. Elsevier

Academic Press, 153–190.

Bowden, G.J., Dandy, G.C. and Maier, H.R. (2005). Input determination for neural network

models in water resources applications. Part 1-background and methodology. Journal of

Hydrology, 301, 75-92.

Burn, D.H. (1990). Evaluation of regional flood frequency analysis with a region of influence

approach, Water Resources Research, 26(10), 2257-2265.

Burn, D.H. and Goel N.K. (2000). The formation of groups for regional flood frequency,

Journal of Hydrological Sciences, 45(1), 97-112.

Caballero, W.L. and Rahman, A. (2014). Development of regionalized joint probability

approach to flood estimation: A case study for New South Wales, Australia, Hydrological

Processes, 28, 4001-4010.

Castellarin, A., Burn, D.H. and Brath, A. (2001). Assessing the effectiveness of hydrological

similarity measures for regional flood frequency analysis, Journal of Hydrology, 241(3-4),

270-285.

Cheng, C.T., Ou, C.P. and Chau, K.W. (2002). Combining a fuzzy optimal model with a

genetic algorithm to solve multi-objective rainfall-runoff model calibration. Journal of

Hydrology, 268, 72-86.

Chokmani, K., Ouarda, B.M.J.T., Hamilton, S., Ghedira, M.H. and Gingras, H. (2008).

Comparison of ice-affected streamflow estimates computed using artificial neural networks

and multiple regression techniques. Journal of Hydrology, 349, 383–396.

Caudill, M. (1987). Neural networks primer, Part I, AI Expert, December, 46-52.

Caudill, M. (1988). Neural networks primer, Part II, AI Expert, No. February, 55-61.

Caudill, M. (1989). Neural networks primer, Part VII, AI Expert, No. May, 51 - 8.

Chow, V.T., Maidment, D.R. and Mays, L.W. (1988). Applied Hydrology, McGraw-Hill,

New York, NY.

Corradini, C. and Singh, V.P. (1985). Effect of spatial variability of effective rainfall on direct

runoff by a geomorphologic approach. Journal of Hydrology, 81, 27-43.

Cunnane, J.R. (1987). Review of Statistical Methods for Flood Frequency Estimation. V. P.

Singh (Ed.), in Hydrologic Frequency Modeling, D. Reidel, Dordrecht.



Cunnane, C. (1988). Methods and merits of regional flood frequency analysis, Journal of

Hydrology, 100, 269-290.

Dalrymple, T. (1960). Flood frequency analyses. U.S., Geological Survey Water Supply

Paper, 1543-A, 11-51.

Daniell, T.M. (1991). Neural networks – applications in hydrology and water resources

engineering. International Hydrology & Water Resources Symposium. Perth, Australia, 2-4

October, 1991.

de la Maza M. and Tidor B. (1993). An analysis of selection procedures with particular

attention paid to proportional and Boltzmann selection. In S. Forrest (ed.), Proceedings of the

fifth international conference on genetic algorithms.

Dawson, C.W., Abrahart, R.J., Shamseldin, A.Y. and Wilby, R.L. (2006). Flood estimation at

ungauged sites using artificial neural networks, Journal of Hydrology, 319, 391–409.

Dawdy, D.R. (1961). Variation of flood ratios with size of drainage area. U. S. Geol. Surv.

Prof. Pap. 424-C, Paper C36.

Douglas, B.C. (1995). U.S. National Report to IUGG, 1991-1994. Reviews of Geophysics, 33

Supplement. Online; available at http://www.agu.org/revgeophys/dougla01/dougla01.

(Accessed on 13 Nov, 2009).

Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. Monographs on

Statistics and Applied Probability. Chapman and Hall, New York.

Fausett, L. (1994). Fundamentals of neural networks, Englewood Cliffs, NJ: Prentice Hall.

Farmer, J.D. and Sidorowich, J. (1987). Predicting chaotic time series. Physical Review

Letter, 59(8), 845-848.

Feldman, A.D. (1979). Flood hydrograph and peak flow frequency analysis. (Technical Paper

No. 62). US Army Corps of Engineers, Institute for Water Resources, The Hydrologic

Engineering Centre.

Fernando, D.A.K., Shamseldin, A.Y. and Abrahart, R.J. (2009). Using gene expression

programming to develop a combined runoff estimate model from conventional rainfall-runoff

model outputs. 18th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009.

Ferreira, C. (2001a). Gene expression programming in problem solving, 6th Online World

Conference on Soft Computing in Industrial Applications (invited tutorial).

Ferreira, C. (2001b). Gene expression programming: a new adaptive algorithm for solving

problems. Complex Systems 13(2), 87–129.



Ferreira, C. (2006). Gene-expression programming; mathematical modeling by an artificial

intelligence. Springer, Berlin, Heidelberg, New York.

Flavell, D. (2012). Design flood estimation in Western Australia. Australian Journal of Water

Resources, Vol. 16 (1), 1-20.

Flood, I. and Kartam, N. (1994). Neural networks in civil engineering; Principles and

understanding. Journal of Computing in Civil Engineering, 8(2), 131-148, 194.

Franchini, M. (1996). Using a genetic algorithm combined with a local search method for the

automatic calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal

41(1), 21-40.

Franchini, M. and Galeati, G. (1997). Comparing several genetic algorithm schemes for the

calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal, 42 (3), 357-

379.

Franchini, M., Galeati, G. and Lolli, M. (2005). Analytical derivation of the flood frequency

curve through partial duration series analysis and a probabilistic representation of the runoff

coefficient, Journal of Hydrology, 303, 1–15.

Giustolisi, O. (2004). Using genetic programming to determine Chèzy resistance coefficient

in corrugated channels. Journal of Hydroinformatics, 6(3), 157–173.

Griffis, V.W. and Stedinger, J.R. (2007). The use of GLS regression in regional hydrologic

analyses. Journal of Hydrology, 344, 82-95.

Grubbs, F.E. and Beck, G. (1972). Extension of sample sizes and percentage points for

significance tests of outlying observations. Technometrics, 14, 847–854.

Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning.

Addison-Wesley, Reading, MA.

Goldberg, D.E. and Deb, K. (1991). A comparative analysis of selection schemes used in

genetic algorithms. In G. Rawlins (ed.), Foundations of genetic algorithms.

Gupta, V.K. and E. Waymire. (1993). A statistical analysis of mesoscale rainfall as a random

cascade. Journal of Applied Meteorology, 32(2), 251-267.

Guven, A. and Talu, N.E. (2010). Gene-expression programming for estimating suspended

sediment in Middle Euphrates Basin, Turkey. Clean Soil Air and Water, 38(12), 1159–1168.



Guven, A. and Kisi, O. (2011). Estimation of suspended sediment yield in natural rivers using

machine-coded linear genetic programming. Water Resources Management, 25(2), 691–704.

Hackelbusch A., Micevski T., Kuczera G., Rahman A. and Haddad K. (2009). Regional flood

frequency analysis for eastern New South Wales: A region of influence approach using

generalized least squares based parameter regression. In Proc. 31st Hydrology and Water

Resources Symp., Newcastle, Australia.

Haddad, K., Rahman, A. and Weinmann, P.E. (2006). Design flood estimation in ungauged

catchments by quantile regression technique: ordinary least squares and generalised least

squares compared. 30th Hydrology and Water Resources Symposium, The Institution of

Engineers Australia, 4-7 Dec 2006, Launceston.

Haddad, K., Rahman, A. and Weinmann, P.E. (2008). Development of a generalised least

squares based quantile regression technique for design flood estimation in Victoria, 31st

Hydrology and Water Resources Symp., Adelaide, 15-17 April 2008, 2546-2557.

Haddad, K., Pirozzi, J., McPherson, G., Rahman, A. and Kuczera, G. (2009). Regional flood

estimation technique for NSW: Application of generalised least squares quantile regression

technique. In Proc. 31st Hydrology and Water Resources Symp., Newcastle, Australia.

Haddad, K., Rahman, A., Weinmann, P.E., Kuczera, G. and Ball, J.E. (2010). Streamflow

data preparation for regional flood frequency analysis: Lessons from south-east Australia.

Australian Journal of Water Resources, 14, 1, 17-32.

Haddad, K., Rahman, A. and Stedinger, J.R. (2011). Regional flood frequency analysis using

bayesian generalized least squares: A comparison between quantile and parameter regression

techniques, Hydrological Processes, 25, 1-14.

Haddad, K. and Rahman, A. (2011). Regional flood estimation in New South Wales Australia

using generalised least squares quantile regression. Journal of Hydrologic Engineering,

ASCE, 16 (11), 920-925.

Haddad, K. and Rahman, A. (2012). Regional flood frequency analysis in eastern Australia:

Bayesian GLS regression-based methods within fixed region and ROI framework – Quantile

Regression vs. Parameter Regression Technique, Journal of Hydrology, 430-431 (2012), 142-

161.

Haddad, K., Rahman, A., Ling, F. (2014). Regional flood frequency analysis method for

Tasmania, Australia: A case study on the comparison of fixed region and region-of-influence

approaches, Hydrological Sciences Journal, DOI:10.1080/02626667.2014.950583.

Holland, J.H. (1975). Adaptation in natural and artificial systems. University of Michigan

Press, Ann Arbor, MI pp. 183.



Hoang, T.M.T. (2001). Joint probability approach to design flood estimation, PhD thesis,

Department of Civil Engineering, Monash University, Australia.

Hosking, J.R.M. and Wallis, J.R. (1993). Some statistics useful in regional frequency analysis,

Water Resources Research, 29(2), 271–281.

Hosking, J.R.M. and Wallis J.R. (1997). Regional frequency analysis – An approach based on

L-moments, Cambridge University Press, New York, 224 pp.

Hopfield, J. (1982). Neural networks and physical systems with emergent collective

computational abilities. Proceedings of the National Academy of Sciences of the USA,

79(8), 2554-2558.

Institution of Engineers Australia (I.E. Aust.) (1987, 2001). Australian rainfall and runoff: A

guide to flood estimation. Editor: D.H. Pilgrim, Vol.1, I. E. Aust., Canberra.

Ishak, E., Rahman, A., Westra, S., Sharma, A. and Kuczera, G. (2013). Evaluating the non-

stationarity of Australian annual maximum floods. Journal of Hydrology, 494, 134-145.

Ishak, E., Rahman, A. (2014). Detection of changes in flood data in Victoria, Australia over

1975-2011, Hydrology Research, doi:10.2166/nh.2014.064.

Ishak, E., Haddad, K., Zaman, M. and Rahman, A. (2011). Scaling property of regional floods

in New South Wales Australia, Natural Hazards, 58, 1155-1167.

Jain A. and Srinivasulu S. (2004). Development of effective and efficient rainfall-runoff

models using integration of deterministic, real-coded genetic algorithms and artificial neural

network techniques. Water Resources Research, 40, W04302.


Jain, A., Srinivasalu, S., Bhattacharjya, R.K. (2005). Determination of an optimal unit pulse

response function using real-coded genetic algorithm. Journal of Hydrology, 303, 199-214.

James, W. and Robinson, M.A. (1986). Continuous deterministic urban runoff modelling, in

C. Maksimovic and M. Radojkovic (Edition), Proceedings of the International Symposium on

Comparison of Urban Drainage Models with Real Catchment Data, Dubrovnik, Yugoslavia,

Pergamon Press, Oxford.

Jang, J.S.R. (1993). ANFIS: adaptive-network-based fuzzy inference system. IEEE

Transactions on Systems, Man and Cybernetics, 23(3), 665-685.

Jang, J.S.R., Sun, C.T. and Mizutani, E. (1997). Neuro-fuzzy and soft computing. Prentice-Hall,

New Jersey.



Javelle, P., Ouarda, B.M.J.T., Lang, M., Bobee, B., Galea, G. and Gresillon, J.M. (2002).

Development of regional flood-duration-frequency curves based on the index flood method,

Journal of Hydrology, 258, 249-259.

Jeong, D.I., Stedinger, J.R., Kim, Y., Sung, J.H. and Yoon, S.Y. (2008). Reflecting a Climate

Change Factor in Flood Frequency Analysis for Korean River Basins. Water Down Under,

Adelaide, Australia, 14-17 April.

Jiapeng, H., Zhongmin, L. and Zhongbo, Y. (2003). A modified rational formula for flood

design in small basins, Journal of the American Water Resources Association, 39(5), 1017-

1025.

Jingyi, Z. and Hall, M.J. (2004). Regional flood frequency analysis for the Gan-Ming River

basin in China, Journal of Hydrology, 296, 98–117.

Kendall, M.G. (1970). Rank Correlation Methods, 2nd Ed., New York: Hafner.

Khu, S.T., Liong, S.Y., Babovic, V., Madsen, H. and Muttil, N. (2001). Genetic programming

and its application in real-time runoff forecasting. Journal of the American Water Resources

Association, 37(2), 439-451.

Kirby, W. and Moss, M. (1987). Summary of flood frequency analysis in the United States.

Journal of Hydrology, 96, 5-14.

Kisi, O. and Shiri, J. (2011). Precipitation forecasting using wavelet-genetic programming and

wavelet-neuro-fuzzy conjunction models. Water Resources Management, 25(13), 3135–3152.

Kjeldsen, T.R. and Jones, D.A. (2010). Predicting the index flood in ungauged UK

catchments: On the link between data-transfer and spatial model error structure, Journal of

Hydrology, 387(1-2), 1-9.

Kjeldsen, T.R. and Jones, D. (2009). An exploratory analysis of error components in

hydrological regression modelling. Water Resources Research, 45, W02407.

Klemes, V. (1993). Probability of extreme hydrometeorological events - A different approach

in extreme hydrological events: Precipitation, Floods and Droughts, 167-176, IAHS, Publi.

Kothyari, U.C. (2004). Estimation of mean annual flood from ungauged catchments using

artificial neural networks. Hydrology: Science and Practice for the 21st Century. Volume 1,

British Hydrological Society.

Kuczera, G. (1983). A Bayesian surrogate for regional skew in flood frequency analysis.

Water Resources Research, 19, 3, 832-837.



Kuczera, G. (1999). Comprehensive at-site flood frequency analysis using Monte Carlo

Bayesian inference. Water Resources Research, 35, 5, 1551-1557.

Lawrence, W.T. (1994). Comparative analysis of data acquired by three narrow-band

airborne spectroradiometers over subboreal vegetation. Remote Sens. Environ., 47, 204-215.

Luk, K.C., Ball, J.E. and Sharma, A. (2001). An application of artificial neural networks for

rainfall forecasting. Mathematical and Computer Modelling, 33, 683-693.

Lumb, A.M. and James, L.D. (1976). Runoff files for flood hydrograph simulation. Journal of

the Hydraulics Division, ASCE, 1515-1531.

Madsen, H., Rosbjerg, D. and Harremoes, P. (1995). Application of the Bayesian approach in

regional analysis of extreme rainfalls. Stochastic Hydrology and Hydraulics, 9, 77-88.

Madsen, H., Pearson, C.P. and Rosbjerg, D. (1997). Comparison of annual maximum series

and partial duration series for modeling extreme hydrological events—2. Regional modeling.

Water Resources Research, 33(4), 771–781.

Maier, H.R. and Dandy, G.C. (2000). Neural networks for the prediction and forecasting of

water resources variables: a review of modelling issues and applications. Environmental

Modelling and Software, 15(1), 101-123.

McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous

activity. Bulletin of Mathematical Biophysics, 5, 115–133.

Micevski, T., Hackelbusch, A., Haddad, K., Kuczera, G., Rahman, A. (2014).

Regionalisation of the parameters of the log-Pearson 3 distribution: a case study for New

South Wales, Australia, Hydrological Processes, DOI: 10.1002/hyp.10147.

Minns, A. and Hall, M. (1996). Artificial neural networks as rainfall-runoff models.

Hydrological Sciences Journal, 41, 399-417.

Morshed, J. and Kaluarachchi, J.J. (1998). Application of artificial neural network and genetic

algorithm in flow and transport simulations. Advances in Water Resources, 22(2),

145-158.

Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press.

Muttiah, R.S., Srinivasan, R. and Allen, P.M. (1997). Prediction of two year peak stream

discharges using neural networks. Journal of the American Water Resources Association, 33

(3), 625–630.



Mulvany, T.J. (1851). On the use of self-registering rain and flood gauges, Inst. Civ. Eng.

(Ireland) Trans, 4(2), 1-8.

Nathan, R.J. and McMahon, T.A. (1990). Identification of homogeneous regions for the

purpose of regionalisation, Journal of Hydrology, 121, 217-238.

National Research Council (NRC). (1988). Estimating probabilities of extreme floods:

methods and recommended research. National Academy Press, Washington, D.C., 141.

Nayak, P.C. and Sudheer, K.P. (2004). A neuro-fuzzy computing technique for modelling

hydrological time series. Journal of Hydrology, 291(1–2), 52-66.

NERC. (1975). Flood studies report, Natural Environment Research Council (NERC), London.

Novak, V., Perfilieva, I. and Mockor, J. (1999). Mathematical principles of fuzzy logic.

Dordrecht: Kluwer Academic. ISBN 0-7923-8595-0.

Ouarda, T.B.M.J., Bâ, K.M., Diaz-Delgado, C., Cârsteanu, C., Chokmani, K., Gingras, H.,

Quentin, E., Trujillo, E. and Bobée, B. (2008). Intercomparison of regional flood frequency

estimation methods at ungauged sites for a Mexican case study, Journal of Hydrology, 348,

40-58.

Pallard, B., Castellarin, A. and Montanari, A. (2009). A look at the links between drainage

density and flood statistics, Hydrology and Earth System Sciences (HESS), 13, 1019-1029.

Pandey, G.R. and Nguyen, V.T.V. (1999). A comparative study of regression based methods

in regional flood frequency analysis. Journal of Hydrology, 225, 92-101.

Parthiban, L. and Subramanian, R. (2009). CANFIS - A computer aided diagnostic tool for

cancer detection. Journal of Biomedical Science and Engineering, 2, 323-335.

Pegram, G.G.S. and Parak, M. (2004). A review of the regional maximum flood and rational

formula using geomorphological information and observed floods, ISSN 0378-4738, Water

South Africa, 30(3), 377-392.

Pilgrim, D.H. and McDermott, G.E. (1982). Design floods for small rural catchments in

eastern New South Wales. Civil Engg Trans, Inst. Engrs Aust., CE24, 226-234.

Pilgrim, D.H. and Cordery, I. (1993). Flood Runoff, in Maidment, D.R., ed., Handbook of

Hydrology, McGraw-Hill, New York, 9, 9.1-9.42.

Pirozzi, J., Ashraf, M., Rahman, A. and Haddad, K. (2009). Design flood estimation for

ungauged catchments in eastern NSW: Evaluation of the probabilistic rational method. In

Proc. 31st Hydrology and Water Resources Symposium, Newcastle, Australia.



Principe, J.C., Euliano, N.R. and Lefebvre, W.C. (2000). Neural and adaptive systems, John

Wiley & Sons, Inc.

Queensland Reconstruction Authority (2011). Operation Queenslander: The State

Community, Economic and Environmental Recovery and Reconstruction Plan 2011–2013.

Queensland Reconstruction Authority, Queensland, Australia, March 2011, 48 pp.

Rahman, A. (1997). Flood Estimation for ungauged catchments: A regional approach using

flood and catchment characteristics, PhD thesis, Department of Civil Engineering, Monash

University.

Rabunal, J.R., Puertas, J., Suarez, J. and Rivero, D. (2007). Determination of the unit

hydrograph of a typical urban basin using genetic programming and artificial neural networks.

Hydrological Processes, 27(4), 476–485.

Rahman, A. (2005). A quantile regression technique to estimate design floods for ungauged

catchments in South-east Australia. Australian Journal of Water Resources, 9(1), 81-89.

Rahman, A., Bates, B.C., Mein, R.G. and Weinmann, P.E. (1999). Regional flood frequency

analysis for ungauged basins in south-eastern Australia. Australian Journal of Water

Resources, 3(2), 199-207, ISSN 1324-1583.

Rahman, A., Weinmann, P.E. and Mein, R.G. (1999). At-site frequency analysis: LP3-product

moment, GEV-L moment and GEV-LH moment procedures compared. Water 99 Joint

Congress, 715-720.

Rahman, A., Weinmann, P.E., Hoang, T.M.T. and Laurenson, E.M. (2002). Monte Carlo

Simulation of flood frequency curves from rainfall. Journal of Hydrology, 256 (3-4), 196-210.

ISSN 0022-1694.

Rahman, A. and Hollerbach, D. (2003). Study of runoff coefficients associated with the

probabilistic rational method for flood estimation in South-east Australia. In Proc. 28th Intl.

Hydrology and Water Resources Symp., I. E. Aust., Wollongong, Australia, 10-13 Nov. 2003,

1, 199-203.

Rahman, A., Haddad, K., Caballero, W. and Weinmann, P.E. (2008). Progress on the

enhancement of the Probabilistic Rational Method for Victoria in Australia. 31st Hydrology

and Water Resources Symp., Adelaide, 15-17 April 2008, 940-951.

Rahman, A., Haddad, K., Kuczera, G. and Weinmann, P.E. (2009). Regional flood methods

for Australia: data preparation and exploratory analysis. Australian Rainfall and Runoff

Revision Projects, Project 5 Regional Flood Methods, Report No. P5/S1/003, Nov 2009,

Engineers Australia, Water Engineering, pp. 181.



Rahman, A., Haddad, K., Zaman, M., Kuczera, G. and Weinmann, P.E. (2011). Design flood

estimation in ungauged catchments: A comparison between the Probabilistic Rational Method

and Quantile Regression Technique for NSW. Australian Journal of Water Resources, 14(2),

127-137.

Rahman, A., Haddad, K., Zaman, M., Ishak, E., Kuczera, G. and Weinmann, P.E. (2012).

Australian Rainfall and Runoff Revision Projects, Project 5 Regional flood methods, Stage 2

Report No. P5/S2/015, Engineers Australia, Water Engineering, pp. 319.

Rao, Z.F. and Jamieson, D.G. (1997). The use of neural networks and genetic algorithms for

design of groundwater remediation schemes. Hydrology and Earth System Sciences, 1(2),

345-356.

Rao, A.R. and Hamed, K.H. (2000). Flood frequency analysis. CRC Press, Florida, USA.

Riggs, H.C. (1973). Regional analyses of streamflow techniques. Techniques of water

resources investigations of the U.S. Geol. Surv., Book 4, Chapter B3, U.S.Geol. Surv.,

Washington D.C.

Reed, D.W. and Robson, A.J. (1999). Flood estimation handbook, vol. 3. Centre for Ecology

and Hydrology, UK.

Roger, J.S., Chuen-Tsai, S. and Eiji, M. (1997). Neuro-fuzzy and soft computing, Englewood

Cliffs, Prentice Hall.

Rooij, A.J.F.V., Jain, L.C. and Johnson, R.P. (1996). Neural network training using genetic

algorithms. World Scientific Publishing Co. Pty. Ltd., pp. 130.

Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by

error propagation. In Rumelhart, D. E., McClelland, J. L. and the PDP Research Group,

editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol.

1, 318-362. The MIT Press, Cambridge, MA.

Saf, B. (2009). Regional flood frequency analysis using L-Moments for the West

Mediterranean Region of Turkey. Water Resources Management, 23(3), 531–551.

Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to

rainfall-runoff modelling. Water Resources Management, 12, 219-231.

See, L., and Openshaw, S. (1999). Applying soft computing approaches to river level

forecasting. Hydrological Sciences Journal, 44(5), 763-778.

Sekin, N. and Guven, A. (2012). Estimation of peak flood discharges at ungauged sites across

Turkey, Water Resources Management, 26, 2569–2581.



Shamseldin, A.Y. (1997). Application of a neural network technique to rainfall-runoff

modeling. Journal of Hydrology, 199, 272–294.

Shi, Y. and Mizumoto, M. (2000). Some considerations on conventional neuro-fuzzy

learning algorithms by gradient descent method. Fuzzy Sets and Systems, 112, 51-63.

Shu, C. and Burn, D.H. (2004). Artificial neural network ensembles and their application in

pooled flood frequency analysis, Water Resources Research, 40(9), W09301,

doi:10.1029/2003WR002816.

Shu, C. and Ouarda, T.B.M.J. (2007). Flood frequency analysis at ungauged sites using

artificial neural networks in canonical correlation analysis physiographic space, Water

Resources Research, 43, W07438, doi:10.1029/2006WR005142.

Shu, C. and Ouarda, T.B.M.J. (2008). Regional flood frequency analysis at ungauged sites

using the adaptive neuro-fuzzy inference system. Journal of Hydrology, 349, 31-43.

Simonovic, S.P. (1992). Reservoir systems analysis: Closing gap between theory and

practice. Journal of Water Resources Planning and Management, 118(3), 262–280.

Smith, J.A. (1992). Representation of basin scale in flood peak distributions. Water Resources

Research, 28 (11), 2993-2999.

Smith, J.A. (1993). LAI Inversion using a back-propagation neural network trained with a

multiple scattering model. IEEE Transactions on Geoscience and Remote Sensing, 31(5),

1102-1106.

Stedinger, J.R. and Tasker, G.D. (1985). Regional hydrologic analysis - 1. Ordinary, weighted

and generalized least squares compared. Water Resources Research, 21, 1421-1432.

Takens, F. (1981). Detecting strange attractors in turbulence. In: D.A. Rand and L.-S. Young,

Editors, Dynamical systems and turbulence, Lecture Notes in Mathematics. Vol. 898,

Springer-Verlag, Berlin, pp. 366–381.

Takagi, T. and Sugeno, M. (1983). Derivation of fuzzy control rules from human operator’s

control actions. Proceedings of the IFAC symposium on fuzzy information, knowledge

representation and decision analysis.

Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to

modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15(1), 116-132.

Takagi, H. and Hayashi, I. (1991). Neural Network driven fuzzy reasoning. International

Journal of Approximate Reasoning, 5(3), 191-212.



Talei, A. and Chua, L.H.C. (2010a). A novel application of a neuro-fuzzy computational

technique in event-based rainfall–runoff modelling. Expert Systems with Applications,

37(12), 7456-7468.

Tasker, G.D. (1980). Hydrologic regression with weighted least squares. Water Resources

Research, 16(6), 1107-1113.

Tasker, G.D., Eychaner, J.H. and Stedinger J.R. (1986). Application of generalised least

squares in regional hydrologic regression analysis. US Geological Survey Water Supply

Paper, 2310, 107–115.

Tasker, G.D., Hodge, S.A. and Barks, C.S. (1996). Region of Influence regression for

estimating the 50-year flood at ungauged sites, Water Resources Bulletin, 32(1), 163-170.

Thandaveswara, B.S. and Sajikumar, N. (2000). Classification of river basins using artificial

neural networks. Journal of Hydrologic Engineering, 5 (3), 290–298.

Theodoridis, S. and Koutroumbas, K. (2009). Pattern Recognition, 4th Edition, Academic

Press, ISBN: 978-1-59749-272-0.

Thomas, D.M. and Benson, M.A. (1970). Generalization of streamflow characteristics from

drainage-basin characteristics, U.S. Geological Survey Water Supply Paper 1975, US

Government Printing Office.

Tokar, A.S. and Johnson, P.A. (1999). Rainfall-Runoff Modeling using Artificial Neural

Networks, J. Hydrologic Engineering, ASCE, 4(3), 232-239.

Turan, M.E. and Yurdusev, M.A. (2009). River flow estimation from upstream flow records

by artificial intelligence methods. Journal of Hydrology, 369, 71–77.

Vogel, R.M., McMahon, T.A. and Chiew, F.H.S. (1993). Flood flow frequency model

selection in Australia. Journal of Hydrology, 146, 421-449.

Wang, Q.J. (1991). The genetic algorithm and its application to calibrating conceptual

rainfall-runoff models. Water Resources Research, 27(9), 2467-2471.

Wasserman, P.D. (1989). Neural computing: theory and practice. Van Nostrand Reinhold,

New York.

Wasserman, P. (1993). Advanced methods in neural computing, Van Nostrand Reinhold,

ISBN 0-442-00461-3.

Weeks, W.D. (1991). Design floods for small rural catchments in Queensland, Civil

Engineering Transactions, IEAust, 33(4), 249-260.




Wu, C.L. and Chau, K.W. (2006). A flood forecasting neural network model with genetic

algorithm. International Journal of Environment and Pollution, 28(3-4), 261.

Zaman, M., Rahman, A. and Haddad, K. (2012). Regional flood frequency analysis in arid

regions: A case study for Australia. Journal of Hydrology, 475, 74-83.

Zhang, B. and Govindaraju, R.S. (2003). Geomorphology-based artificial neural networks for

estimation of direct runoff over watersheds. Journal of Hydrology, 273 (1), 18–34.

Zhang, Z. and Hall, D.B. (2004). Marginal models for zero inflated clustered data. Statistical

Modelling, 4, 161–180.

Zrinji, Z. and Burn, D.H. (1994). Flood frequency analysis for ungauged sites using a region

of influence approach. Journal of Hydrology, 153(1-4), 1-21.



APPENDICES



APPENDIX A

List of selected study catchments



Appendix A List of selected catchments

Table A1 Selected catchments from New South Wales

Station ID Station Name River Name Lat (°S) Long (°E) Area (km²) Record Length (years) Period of Record

201001 Eungella Oxley -28.36 153.29 213 49 1958 - 2006

203002 Repentance Coopers Ck -28.64 153.41 62 30 1977 - 2006

203012 Binna Burra Byron Ck -28.71 153.50 39 29 1978 - 2006

203030 Rappville Myrtle Ck -29.11 153.00 332 27 1980 - 2006

204025 Karangi Orara -30.26 153.03 135 37 1970 - 2006

204026 Bobo Nursery Bobo -30.25 152.85 80 29 1956 - 1984

204030 Aberfoyle Aberfoyle -30.26 152.01 200 29 1978 - 2006

204036 Sandy Hill(below Snake Cre Cataract Ck -28.93 152.22 236 54 1953 - 2006

204037 Clouds Ck Clouds Ck -30.09 152.63 62 35 1972 - 2006

204056 Gibraltar Range Dandahra Ck -29.49 152.45 104 31 1976 - 2006

204906 Glenreagh Orara -30.07 152.99 446 34 1973 - 2006

206009 Tia Tia -31.19 151.83 261 53 1955 - 2007

206025 near Dangar Falls Salisbury Waters -30.68 151.71 594 34 1973 - 2006

206026 Newholme Sandy Ck -30.42 151.66 8 33 1975 - 2007

207006 Birdwood(Filly Flat) Forbes -31.39 152.33 363 32 1976 - 2007

208001 Bobs Crossing Barrington -32.03 151.47 20 52 1955 - 2006

209001 Monkerai Karuah -32.24 151.82 203 34 1946 - 1979

209002 Crossing Mammy Johnsons -32.25 151.98 156 31 1976 - 2006

209003 Booral Karuah -32.48 151.95 974 38 1969 - 2006

209006 Willina Wang Wauk -32.16 152.26 150 36 1970 - 2005

209018 Dam Site Karuah -32.28 151.90 300 27 1980 - 2006

210011 Tillegra Williams -32.32 151.69 194 75 1932 - 2006

210014 Rouchel Brook (The Vale) Rouchel Brook -32.15 151.05 395 42 1960 - 2001

210017 Moonan Brook Moonan Brook -31.94 151.28 103 67 1941 - 2007





210022 Halton Allyn -32.31 151.51 205 65 1941 - 2005

210040 Wybong Wybong Ck -32.27 150.64 676 50 1956 - 2005

210042 Ravensworth Foy Brook -32.40 151.05 170 30 1967 - 1996

210044 Middle Falbrook(Fal Dam Si Glennies Ck -32.45 151.15 466 51 1957 - 2007

210068 Pokolbin Site 3 Pokolbin Ck -32.80 151.33 25 41 1965 - 2005

210076 Liddell Antiene Ck -32.34 150.98 13 37 1969 - 2005

210079 Gostwyck Paterson -32.55 151.59 956 33 1975 - 2007

210080 U/S Glendon Brook West Brook -32.47 151.28 80 31 1977 - 2007

211009 Gracemere Wyong -33.27 151.36 236 35 1973 - 2007

211013 U/S Weir Ourimbah Ck -33.35 151.34 83 30 1977 - 2006

212008 Bathurst Rd Coxs -33.43 150.08 199 55 1952 - 2006

212018 Glen Davis Capertee -33.12 150.28 1010 35 1972 - 2006

212040 Pomeroy Kialla Ck -34.61 149.54 96 27 1980 - 2004

213005 Briens Rd Toongabbie Ck -33.80 150.98 70 27 1980 - 2006

215004 Hockeys Corang -35.15 150.03 166 75 1930 - 2004

218002 Belowra Tuross -36.20 149.71 556 29 1955 - 1983

218005 D/S Wadbilliga R Junct Tuross -36.20 149.76 900 42 1965 - 2006

218007 Wadbilliga Wadbilliga -36.26 149.69 122 33 1975 - 2005

219003 Morans Crossing Bemboka -36.67 149.65 316 64 1944 - 2007

219017 near Brogo Double Ck -36.60 149.81 152 41 1967 - 2007

219022 Candelo Dam Site Tantawangalo Ck -36.73 149.68 202 36 1972 - 2007

219025 Angledale Brogo -36.62 149.88 717 30 1977 - 2006

220001 New Buildings Br Towamba -36.96 149.56 272 26 1955 - 1980

220003 Lochiel Pambula -36.94 149.82 105 41 1967 - 2005





220004 Towamba Towamba -37.07 149.66 745 37 1971 - 2007

221002 Princes HWY Wallagaraugh -37.37 149.71 479 36 1972 - 2007

222004 Wellesley (Rowes) Little Plains -37.00 149.09 604 65 1942 - 2006

222007 Woolway Wullwye Ck -36.42 148.91 520 57 1950 - 2006

222009 The Falls Bombala -36.92 149.21 559 43 1952 - 1994

222015 Jacobs Ladder Jacobs -36.73 148.43 187 27 1976 - 2002

222016 The Barry Way Pinch -36.79 148.40 155 31 1976 - 2006

222017 The Hut Maclaughlin -36.66 149.11 313 28 1979 - 2006

401009 Maragle Maragle Ck -35.93 148.10 220 56 1950 - 2005

401013 Jingellic Jingellic Ck -35.90 147.69 378 33 1973 - 2005

401015 Yambla Bowna Ck -35.92 146.98 316 31 1975 - 2005

410038 Darbalara Adjungbilly Ck -35.02 148.25 411 37 1969 - 2005

410048 Ladysmith Kyeamba Ck -35.20 147.51 530 48 1939 - 1986

410057 Lacmalac Goobarragandra -35.33 148.35 673 49 1958 - 2006

410061 Batlow Rd Adelong Ck -35.33 148.07 155 60 1948 - 2007

410062 Numeralla School Numeralla -36.18 149.35 673 43 1965 - 2007

410076 Jerangle Rd Strike-A-Light C -35.92 149.24 212 31 1975 - 2005

410088 Brindabella (No.2&No.3-Cab Goodradigbee -35.42 148.73 427 38 1968 - 2005

410112 Jindalee Jindalee Ck -34.58 148.09 14 30 1976 - 2005

410114 Wyangle Killimcat Ck -35.24 148.31 23 30 1977 - 2006

411001 Bungendore Mill Post Ck -35.28 149.39 16 25 1960 - 1984

411003 Butmaroo Butmaroo Ck -35.26 149.54 65 28 1979 - 2006

412050 Narrawa North Crookwell -34.31 149.17 740 34 1970 - 2003

412063 Gunning Lachlan -34.74 149.29 570 39 1961 - 1999





412081 near Neville Rocky Br Ck -33.80 149.19 145 33 1969 - 2001

412083 Tuena Tuena Ck -34.02 149.33 321 33 1969 - 2001

416003 Clifton Tenterfield Ck -29.03 151.72 570 28 1979 - 2006

416008 Haystack Beardy -29.22 151.38 866 35 1972 - 2006

416016 Inverell (Middle Ck) Macintyre -29.79 151.13 726 35 1972 - 2006

416020 Coolatai Ottleys Ck -29.23 150.76 402 28 1979 - 2006

416023 Bolivia Deepwater -29.29 151.92 505 28 1979 - 2006

418005 Kimberley Copes Ck -29.92 151.11 259 35 1972 - 2006

418014 Yarrowyck Gwydir -30.47 151.36 855 37 1971 - 2007

418017 Molroy Myall Ck -29.80 150.58 842 29 1979 - 2007

418021 Laura Laura Ck -30.23 151.19 311 29 1978 - 2006

418025 Bingara Halls Ck -29.94 150.57 156 28 1980 - 2007

418027 Horton Dam Site Horton -30.21 150.43 220 36 1972 - 2007

418034 Black Mountain Boorolong Ck -30.30 151.64 14 29 1976 - 2004

419010 Woolbrook Macdonald -30.97 151.35 829 28 1980 - 2007

419016 Mulla Crossing Cockburn -31.06 151.13 907 33 1974 - 2006

419029 Ukolan Halls Ck -30.71 150.83 389 27 1979 - 2005

419051 Avoca East Maules Ck -30.50 150.08 454 31 1977 - 2007

419053 Black Springs Manilla -30.42 150.65 791 33 1975 - 2007

419054 Limbri Swamp Oak Ck -31.04 151.17 391 33 1975 - 2007

420003 Warkton (Blackburns) Belar Ck -31.39 149.20 133 30 1976 - 2005

421026 Sofala Turon -33.08 149.69 883 34 1974 - 2007

421036 below Dam Site Duckmaloi -33.75 149.94 112 25 1956 - 1980

421050 Molong Bell -33.03 148.95 365 33 1975 - 2007



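Table A2 Selected catchments from Victoria

Station ID Station Name River Name Lat (°S) Long (°E) Area (km²) Record Length (years) Period of Record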



221207 Errinundra Errinundra -37.45 148.91 158 35 1971 - 2005

221209 Weeragua Cann(East Branch -37.37 149.20 154 33 1973 - 2005

221210 The Gorge Genoa -37.43 149.53 837 34 1972 - 2005

221211 Combienbar Combienbar -37.44 148.98 179 32 1974 - 2005

221212 Princes HWY Bemm -37.61 148.90 725 31 1975 - 2005

222202 Sardine Ck Brodribb -37.51 148.55 650 41 1965 - 2005

222206 Buchan Buchan -37.50 148.18 822 32 1974 - 2005

222210 Deddick (Caseys) Deddick -37.09 148.43 857 35 1970 - 2005

222213 Suggan Buggan Suggan Buggan -36.95 148.33 357 35 1971 - 2005

222217 Jacksons Crossing Rodger -37.41 148.36 447 30 1976 - 2005

223202 Swifts Ck Tambo -37.26 147.72 943 32 1974 - 2005

223204 Deptford Nicholson -37.60 147.70 287 32 1974 - 2005

224213 Lower Dargo Rd Dargo -37.50 147.27 676 33 1973 - 2005

224214 Tabberabbera Wentworth -37.50 147.39 443 32 1974 - 2005

225213 Beardmore Aberfeldy -37.85 146.43 311 33 1973 - 2005

225218 Briagalong Freestone Ck -37.81 147.09 309 35 1971 - 2005

225219 Glencairn Macalister -37.52 146.57 570 39 1967 - 2005

225223 Gillio Rd Valencia Ck -37.73 146.98 195 35 1971 - 2005

225224 The Channel Avon -37.80 146.88 554 34 1972 - 2005

226204 Willow Grove Latrobe -38.09 146.16 580 35 1971 - 2005

226205 Noojee Latrobe -37.91 146.02 290 46 1960 - 2005

226209 Darnum Moe -38.21 146.00 214 34 1972 - 2005

226217 Hawthorn Br Latrobe -37.98 146.08 440 34 1955 - 1988

226218 Thorpdale Narracan Ck -38.27 146.19 66 35 1971 - 2005





226222 Near Noojee (U/S Ada R Jun Latrobe -37.88 145.89 62 35 1971 - 2005

226226 Tanjil Junction Tanjil -38.01 146.20 289 46 1960 - 2005

226402 Trafalgar East Moe Drain -38.18 146.21 622 31 1975 - 2005

227200 Yarram Tarra -38.46 146.69 25 41 1965 - 2005

227205 Calignee South Merriman Ck -38.36 146.65 36 31 1975 - 2005

227210 Carrajung Lower Bruthen Ck -38.40 146.74 18 33 1973 - 2005

227211 Toora Agnes -38.64 146.37 67 32 1974 - 2005

227213 Jack Jack -38.53 146.53 34 36 1970 - 2005

227219 Loch Bass -38.38 145.56 52 32 1973 - 2004

227225 Fischers Tarra -38.47 146.56 16 33 1973 - 2005

227226 Dumbalk North Tarwineast Branc -38.50 146.16 127 36 1970 - 2005

227231 Glen Forbes South Bass -38.47 145.51 233 32 1974 - 2005

227236 D/S Foster Ck Jun Powlett -38.56 145.71 228 27 1979 - 2005

228212 Tonimbuk Bunyip -38.03 145.76 174 30 1975 - 2004

228217 Pakenham Toomuc Ck -38.07 145.46 41 29 1974 - 2002

229218 Watsons Ck Watsons Ck -37.67 145.26 36 26 1974 - 1999

230202 Sunbury Jackson Ck -37.58 144.74 337 31 1975 - 2005

230204 Riddells Ck Riddells Ck -37.47 144.67 79 32 1974 - 2005

230205 Bulla (D/S of Emu Ck Jun) Deep Ck -37.63 144.80 865 32 1974 - 2005

230211 Clarkefield Emu Ck -37.47 144.75 93 31 1975 - 2005

231200 Bacchus Marsh Werribee Ck -37.68 144.43 363 28 1978 - 2005

231213 Sardine Ck- O'Brien Cro Lerderderg Ck -37.50 144.36 153 47 1959 - 2005

231225 Ballan (U/S Old Western H) Werribee Ck -37.60 144.25 71 33 1973 - 2005

231231 Melton South Toolern Ck -37.91 144.58 95 27 1979 - 2005





232200 Little Little Ck -37.96 144.48 417 32 1974 - 2005

232210 Lal Lal Mooraboolwest Br -37.65 144.04 83 33 1973 - 2005

232213 U/S of Bungal Dam Lal Lal Ck -37.66 144.03 157 29 1977 - 2005

233211 Ricketts Marsh Birregurra Ck -38.30 143.84 245 31 1975 - 2005

233214 Forrest (above Tunnel) Barwoneast Branc -38.53 143.73 17 28 1978 - 2005

234200 Pitfield Woady Yaloak -37.81 143.59 324 34 1972 - 2005

235202 Upper Gellibrand Gellibrand -37.56 143.64 53 31 1975 - 2005

235203 Curdie Curdies -38.45 142.96 790 31 1975 - 2005

235204 Beech Forest Little Aire Ck -38.66 143.53 11 30 1976 - 2005

235205 Wyelangta Arkins Ck West B -38.65 143.44 3 28 1978 - 2005

235227 Bunkers Hill Gellibrand -38.53 143.48 311 32 1974 - 2005

235233 Apollo Bay- Paradise Barhameast Branc -38.76 143.62 43 29 1977 - 2005

235234 Gellibrand Love Ck -38.49 143.57 75 27 1979 - 2005

236205 Woodford Merri -38.32 142.48 899 32 1974 - 2005

236212 Cudgee Brucknell Ck -38.35 142.65 570 31 1975 - 2005

237207 Heathmere Surry -38.25 141.66 310 31 1975 - 2005

238207 Jimmy Ck Wannon -37.37 142.50 40 32 1974 - 2005

238219 Morgiana Grange Burn -37.71 141.83 997 33 1973 - 2005

401208 Berringama Cudgewa Ck -36.21 147.68 350 41 1965 - 2005

401209 Omeo Livingstone Ck -37.11 147.57 243 27 1968 - 1994

401210 below Granite Flat Snowy Ck -36.57 147.41 407 38 1968 - 2005

401212 Upper Nariel Nariel Ck -36.45 147.83 252 52 1954 - 2005

401215 Uplands Morass Ck -36.87 147.70 471 35 1971 - 2005

401216 Jokers Ck Big -36.95 141.47 356 52 1952 - 2005





401217 Gibbo Park Gibbo -36.75 147.71 389 35 1971 - 2005

401220 McCallums Tallangatta Ck -36.21 147.50 464 30 1976 - 2005

402203 Mongans Br Kiewa -36.60 147.10 552 36 1970 - 2005

402204 Osbornes Flat Yackandandah Ck -36.31 146.90 255 39 1967 - 2005

402206 Running Ck Running Ck -36.54 147.05 126 31 1975 - 2005

402217 Myrtleford Rd Br Flaggy Ck -36.39 146.88 24 36 1970 - 2005

403205 Bright Ovens Rivers -36.73 146.95 495 35 1971 - 2005

403209 Wangaratta North Reedy Ck -36.33 146.34 368 33 1973 - 2005

403213 Greta South Fifteen Mile Ck -36.62 146.24 229 33 1973 - 2005

403221 Woolshed Reedy Ck -36.31 146.60 214 30 1975 - 2004

403222 Abbeyard Buffalo -36.91 146.70 425 33 1973 - 2005

403224 Bobinawarrah Hurdle Ck -36.52 146.45 158 31 1975 - 2005

403226 Angleside Boggy Ck -36.61 146.36 108 32 1974 - 2005

403227 Cheshunt King -36.83 146.40 453 33 1973 - 2005

403233 Harris Lane Buckland -36.72 146.88 435 34 1972 - 2005

404206 Moorngag Broken -36.80 146.02 497 33 1973 - 2005

404207 Kelfeera Holland Ck -36.61 146.06 451 31 1975 - 2005

405205 Murrindindi above Colwells Murrindindi -37.41 145.56 108 31 1975 - 2005

405209 Taggerty Acheron -37.32 145.71 619 33 1973 - 2005

405212 Tallarook Sunday Ck -37.10 145.05 337 31 1975 - 2005

405214 Tonga Br Delatite -37.15 146.13 368 49 1957 - 2005

405215 Glen Esk Howqua -37.23 146.21 368 32 1974 - 2005

405217 Devlins Br Yea -37.38 145.48 360 31 1975 - 2005

405218 Gerrang Br Jamieson -37.29 146.19 368 47 1959 - 2005





405219 Dohertys Goulburn -37.33 146.13 694 39 1967 - 2005

405226 Moorilim Pranjip Ck -36.62 145.31 787 32 1974 - 2005

405227 Jamieson Big Ck -37.37 146.06 619 36 1970 - 2005

405229 Wanalta Wanalta Ck -36.64 144.87 108 36 1969 - 2005

405230 Colbinabbin Cornella Ck -36.61 144.80 259 33 1973 - 2005

405231 Flowerdale King Parrot Ck -37.35 145.29 181 32 1974 - 2005

405237 Euroa Township Seven Creeks -36.76 145.58 332 33 1973 - 2005

405240 Ash Br Sugarloaf Ck -37.06 145.05 609 33 1973 - 2005

405241 Rubicon Rubicon -37.29 145.83 129 33 1973 - 2005

405245 Mansfield Ford Ck -37.04 146.05 115 36 1970 - 2005

405248 Graytown Major Ck -36.86 144.91 282 35 1971 - 2005

405251 Ancona Brankeet Ck -36.97 145.78 121 33 1973 - 2005

405263 U/S of Snake Ck Jun Goulburn -37.46 146.25 327 31 1975 - 2005

405264 D/S of Frenchman Ck Jun Big -37.52 146.08 333 31 1975 - 2005

405274 Yarck Home Ck -37.11 145.60 187 29 1977 - 2005

406213 Redesdale Campaspe -37.02 144.54 629 30 1975 - 2004

406214 Longlea Axe Ck -36.78 144.43 234 34 1972 - 2005

406215 Lyal Coliban -36.96 144.49 717 32 1974 - 2005

406216 Sedgewick Axe Ck -36.90 144.36 34 26 1975 - 2005

406224 Runnymede Mount Pleasant C -36.55 144.64 248 30 1975 - 2004

406226 Derrinal Mount Ida Ck -36.88 144.65 174 28 1978 - 2005

407214 Clunes Creswick Ck -37.30 143.79 308 31 1975 - 2005

407217 Vaughan at D/S Fryers Ck Loddon -37.16 144.21 299 38 1968 - 2005

407220 Norwood Bet Bet Ck -37.00 143.64 347 33 1973 - 2005





407221 Yandoit Jim Crow Ck -37.21 144.10 166 33 1973 - 2005

407222 Clunes Tullaroop Ck -37.23 143.83 632 33 1973 - 2005

407230 Strathlea Joyces Ck -37.17 143.96 153 33 1973 - 2005

407246 Marong Bullock Ck -36.73 144.13 184 33 1973 - 2005

407253 Minto Piccaninny Ck -36.45 144.47 668 33 1973 - 2005

415207 Eversley Wimmera -37.19 143.19 304 31 1975 - 2005

415217 Grampians Rd Br Fyans Ck -37.26 142.53 34 33 1973 - 2005

415220 Wimmera HWY Avon -36.64 142.98 596 32 1974 - 2005

415226 Carrs Plains Richardson -36.75 142.79 130 31 1971 - 2001

415237 Stawell Concongella Ck -37.02 142.82 239 29 1977 - 2005

415238 Navarre Wattle Ck -36.90 143.10 141 30 1976 - 2005



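Table A3 Selected catchments from Tasmania

Station ID Station Name River Name Lat (°S) Long (°E) Area (km²) Record Length (years) Period of Record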



76 at Ballroom Offtake North Esk -41.50 147.39 335.0 74 1923 - 1996

159 D/S Rapid Arthur -41.12 145.08 1600.0 42 1955 - 1996

473 D/S Crossing Rv Davey -43.14 145.95 680.0 34 1964 - 1997

499 at Newbury Tyenna -42.71 146.71 198.0 33 1965 - 1997

852 at Strathbridge Meander -41.49 146.91 1025.0 24 1985 - 2008

1012 3.5 Km U/S Esperance Peak Rivulet -43.32 146.90 35.0 23 1975 - 1997

1200 at Whitemark Water Supply South Pats -40.09 148.02 21.0 22 1969 - 1990

2200 at The Grange Swan -42.05 148.07 440.0 33 1964 - 1996

2204 U/S Coles Bay Rd Bdg Apsley -41.94 148.24 157.0 24 1969 - 1992

2206 U/S Scamander Water Supply Scamander -41.45 148.18 265.0 28 1969 - 1996

2207 3 Km U/S Tasman Hwy Little Swanport -42.34 147.90 600.0 19 1971 - 1989

2208 at Swansea Meredith -42.12 148.04 88.0 27 1970 - 1996

2209 Tidal Limit Carlton -42.87 147.70 136.0 28 1969 - 1996

2211 U/S Brinktop Rd Orielton Rivulet -42.76 147.54 46.0 24 1973 - 1996

2213 D/S McNeils Rd Goatrock Ck -42.14 147.92 1.3 22 1975 - 1996

3203 at Baden Coal -42.43 147.45 55.0 26 1971 - 1996

4201 at Mauriceton Jordan -42.53 147.12 730.0 36 1966 - 2001

5200 at Summerleas Rd Br Browns -42.96 147.27 15.0 30 1963 - 1992

6200 D/S Grundys Ck Mountain -42.94 147.13 42.0 29 1968 - 1996

7200 Dover Ws Intake Esperance -43.34 146.96 174.0 29 1965 - 1993

14206 1.5 Km U/S of Mouth Sulphur Ck -41.11 146.03 23.0 29 1964 - 1992

14207 at Bannons Br Leven -41.25 146.09 495.0 35 1963 - 1997

14210 U/S Flowerdale R Juncti Inglis -41.00 145.63 170.0 21 1968 - 1988

14215 at Moorleah Flowerdale -40.97 145.61 150.0 31 1966 - 1996





14217 at Sprent Claytons Rivulet -41.26 146.17 13.5 26 1970 - 1995

14220 U/S Bass HWY Seabrook Ck -41.01 145.77 40.0 20 1977 - 1996

16200 U/S Old Bass Hwy Don -41.19 146.31 130.0 24 1967 - 1990

17200 at Tidal Limit Rubicon -41.26 146.57 255.0 31 1967 - 1997

17201 1.5KM U/S Tidal Limit Franklin Rivulet -41.26 146.61 131.0 20 1975 - 1994

18201 0.5 Km U/S Tamar Supply -41.26 146.94 135.0 19 1965 - 1983

18221 D/S Jackeys Marsh Jackeys Ck -41.68 146.66 29.0 27 1982 - 2008

18312 D/S Elizabeth R Junctio Macquarie -41.91 147.39 1900.0 19 1989 - 2007

19200 2.6KM U/S Tidal Limit Brid -41.02 147.37 134.0 32 1965 - 1996

19201 2KM U/S Forester Rd Bdg Great Forester -41.11 147.61 195.0 27 1970 - 1996

19204 D/S Yarrow Ck Pipers -41.07 147.11 292.0 25 1972 - 1996

304040 U/S Derwent Junction Florentine River -42.44 146.52 435.8 58 1951 - 2008

304125 Below Lagoon Travellers Rest River -42.07 146.25 43.6 25 1949 - 1973

304597 At Lake Highway Pine Tree Rivulet Ck -41.80 146.68 19.4 40 1969 - 2008

308145 At Mount Ficham Track Franklin River -42.24 145.77 757.0 56 1953 - 2008

308183 Below Jane River Franklin River -42.47 145.76 1590.3 22 1957 - 1978

308225 Below Darwin Dam Andrew River -42.22 145.62 5.3 21 1988 - 2008

308446 Below Huntley Gordon River -42.66 146.37 458.0 27 1953 - 1979

308799 B/L Alma Collingwood Ck -42.16 145.93 292.5 28 1981 - 2008

308819 Above Kelly Basin Rd Andrew River -42.22 145.62 4.6 26 1983 - 2008

310061 At Murchison Highway Que River -41.58 145.68 18.4 22 1987 - 2008

310148 Above Sterling Murchison River -41.76 145.62 756.3 28 1955 - 1982

310149 Below Sophia River Mackintosh River -41.72 145.63 523.2 27 1954 - 1980

310472 Below Bulgobac Creek Que River -41.62 145.58 119.1 32 1964 - 1995





315074 At Moina Wilmot River -41.47 146.07 158.1 46 1923 - 1968

315450 U/S Lemonthyme Forth River -41.61 146.13 311.0 46 1963 - 2008

316624 Above Mersey Arm River -41.69 146.21 86.0 37 1972 - 2008

318065 Below Deloraine Meander River -41.53 146.66 474.0 28 1969 - 1996

318350 Above Rocky Creek Whyte River -41.63 145.19 310.8 33 1960 - 1992



Table A4 Selected catchments from Queensland



102101A Fall Ck Pascoe -12.88 142.98 651 33 1968 - 2005

104001A Telegraph Rd Stewart -14.17 143.39 470 32 1970 - 2005

105105A Developmental Rd East Normanby -15.77 145.01 297 34 1970 - 2005

107001B Flaggy Endeavour -15.42 145.07 337 43 1959 - 2004

108002A Bairds Daintree -16.18 145.28 911 29 1969 - 2000

108003A China Camp Bloomfield -15.99 145.29 264 32 1971 - 2004

110003A Picnic Crossing Barron -17.26 145.54 228 80 1926 - 2005

110011B Recorder Flaggy Ck -16.78 145.53 150 44 1956 - 2003

110101B Freshwater Freshwater Ck -16.94 145.70 70 37 1922 - 1958

111001A Gordonvale Mulgrave -17.10 145.79 552 43 1917 - 1972

111003C Aloomba Behana Ck -17.13 145.84 86 28 1943 - 1970

111005A The Fisheries Mulgrave -17.19 145.72 357 34 1967 - 2004

111007A Peets Br Mulgrave -17.14 145.76 520 31 1973 - 2004

111105A The Boulders Babinda Ck -17.35 145.87 39 29 1967 - 2003

112001A Goondi North Johnstone -17.53 145.97 936 39 1929 - 1967

112002A Nerada Fisher Ck -17.57 145.91 15.7 75 1929 - 2004

112003A Glen Allyn North Johnstone -17.38 145.65 165 46 1959 - 2004

112004A Tung Oil North Johnstone -17.55 145.93 925 31 1967 - 2004

112101B U/S Central Mill South Johnstone -17.61 145.98 400 81 1917 - 2003

113004A Powerline Cochable Ck -17.75 145.63 95 32 1967 - 2001

114001A Upper Murray Murray -18.11 145.80 156 31 1971 - 2003

116005B Peacocks Siding Stone -18.69 145.98 368 36 1936 - 1971

116008B Abergowrie Gowrie Ck -18.45 145.85 124 51 1954 - 2004

116010A Blencoe Falls Blencoe Ck -18.20 145.54 226 40 1961 - 2000





116011A Ravenshoe Millstream -17.60 145.48 89 42 1963 - 2004

116012A 8.7KM Cameron Ck -18.07 145.34 360 41 1962 - 2002

116013A Archer Ck Millstream -17.65 145.34 308 42 1962 - 2003

116014A Silver Valley Wild -17.63 145.30 591 44 1962 - 2005

116015A Wooroora Blunder Ck -17.74 145.44 127 38 1967 - 2004

116017A Running Ck Stone -18.77 145.95 157 33 1971 - 2004

117002A Bruce HWY Black -19.24 146.63 256 31 1974 - 2004

117003A Bluewater Bluewater Ck -19.18 146.55 86 30 1974 - 2003

118101A Gleesons Weir Ross -19.32 146.74 797 44 1916 - 1959

118106A Allendale Alligator Ck -19.39 146.96 69 30 1975 - 2004

119006A Damsite Major Ck -19.67 147.02 468 25 1979 - 2003

120014A Oak Meadows Broughton -20.18 146.32 182 28 1971 - 1998

120102A Keelbottom Keelbottom Ck -19.37 146.36 193 38 1968 - 2005

120120A Mt. Bradley Running -19.13 145.91 490 30 1976 - 2005

120204B Crediton Recorder Broken -21.17 148.51 41 31 1957 - 1987

120206A Mt Jimmy Pelican Ck -20.60 147.69 545 27 1961 - 1987

120216A Old Racecourse Broken -21.19 148.45 100 36 1970 - 2005

120307A Pentland Cape -20.48 145.47 775 34 1970 - 2003

121001A Ida Ck Don -20.29 148.12 604 48 1958 - 2005

121002A Guthalungra Elliot -19.94 147.84 273 32 1974 - 2005

122004A Lower Gregory Gregory -20.30 148.55 47 33 1973 - 2005

124001A Caping Siding O'Connell -20.63 148.57 363 35 1970 - 2004

124002A Calen St Helens Ck -20.91 148.76 118 32 1974 - 2005

124003A Jochheims Andromache -20.58 148.47 230 29 1977 - 2005





125002C Sarich's Pioneer -21.27 148.82 757 43 1961 - 2005

125004B Gargett Cattle Ck -21.18 148.74 326 38 1968 - 2005

125005A Whitefords Blacks Ck -21.33 148.83 506 32 1974 - 2005

125006A Dam Site Finch Hatton Ck -21.11 148.63 35 29 1977 - 2005

126003A Carmila Carmila Ck -21.92 149.40 84 31 1974 - 2004

129001A Byfield Waterpark Ck -22.84 150.67 212 48 1953 - 2005

130004A Old Stn Raglan Ck -23.82 150.82 389 41 1964 - 2004

130108B Curragh Blackwater Ck -23.50 148.88 776 31 1973 - 2005

130207A Clermont Sandy Ck -22.80 147.58 409 40 1966 - 2005

130208A Ellendale Theresa Ck -22.98 147.58 758 37 1965 - 2001

130215A Lilyvale Lagoon Crinum Ck -23.21 148.34 252 29 1977 - 2005

130319A Craiglands Bell Ck -24.15 150.52 300 44 1961 - 2004

130321A Mt. Kroombit Kroombit Ck -24.41 150.72 373 41 1964 - 2004

130334A Pump Stn South Kariboe Ck -24.56 150.75 284 33 1973 - 2005

130335A Wura Dee -23.77 150.36 472 34 1972 - 2005

130336A Folding Hills Grevillea Ck -24.58 150.62 233 33 1973 - 2005

130348A Red Hill Prospect Ck -24.45 150.42 369 30 1976 - 2005

130349A Kingsborough Don -23.97 150.39 593 28 1977 - 2005

130413A Braeside Denison Ck -21.77 148.79 757 34 1972 - 2005

133003A Marlua Diglum Ck -24.19 151.16 203 36 1969 - 2004

135002A Springfield Kolan -24.75 151.59 551 40 1966 - 2005

135004A Dam Site Gin Gin Ck -24.97 151.89 531 40 1966 - 2005

136006A Dam Site Reid Ck -25.27 151.52 219 40 1966 - 2005

136102A Meldale Three Moon Ck -24.69 150.96 310 32 1949 - 1980





136107A Cania Gorge Three Moon Ck -24.73 151.01 370 26 1963 - 1988

136108A Upper Monal Monal Ck -24.61 151.11 92 43 1963 - 2005

136111A Dakiel Splinter Ck -24.75 151.26 139 41 1965 - 2005

136112A Yarrol Burnett -24.99 151.35 370 40 1966 - 2005

136202D Litzows Barambah Ck -26.30 152.04 681 85 1921 - 2005

136203A Brooklands Barker Ck -26.74 151.82 249 64 1941 - 2005

136301B Weens Br Stuart -26.50 151.77 512 66 1936 - 2005

137001B Elliott Elliott -24.99 152.37 220 52 1949 - 2004

137003A Dr Mays Crossing Elliott -24.97 152.42 251 30 1975 - 2004

137101A Burrum HWY Gregory -25.09 152.24 454 36 1967 - 2004

137201A Bruce HWY Isis -25.27 152.37 446 38 1967 - 2004

138002C Brooyar Wide Bay Ck -26.01 152.41 655 94 1910 - 2005

138003D Glastonbury Glastonbury Ck -26.22 152.52 113 81 1921 - 2006

138009A Tagigan Rd Tinana Ck -26.08 152.78 100 31 1975 - 2005

138010A Kilkivan Wide Bay Ck -26.08 152.22 322 97 1910 - 2006

138101B Kenilworth Mary -26.60 152.73 720 52 1921 - 1972

138102C Zachariah Amamoor Ck -26.37 152.62 133 83 1921 - 2005

138103A Knockdomny Kandanga Ck -26.40 152.64 142 34 1921 - 1954

138104A Kidaman Obi Obi Ck -26.63 152.77 174 42 1921 - 1963

138106A Baroon Pocket Obi Obi Ck -26.71 152.86 67 39 1941 - 1986

138107B Cooran Six Mile Ck -26.33 152.81 186 58 1948 - 2005

138110A Bellbird Ck Mary -26.63 152.70 486 45 1960 - 2004

138111A Moy Pocket Mary -26.53 152.74 820 39 1964 - 2004

138113A Hygait Kandanga Ck -26.39 152.64 143 34 1972 - 2005





140002A Coops Corner Teewah Ck -26.06 153.04 53 27 1975 - 2005

141001B Kiamba South Maroochy -26.59 152.90 33 65 1938 - 2004

141003C Warana Br Petrie Ck -26.62 152.96 38 41 1959 - 2004

141004B Yandina South Maroochy -26.56 152.94 75 27 1959 - 2004

141006A Mooloolah Mooloolah -26.76 152.98 39 33 1972 - 2004

142001A Upper Caboolture Caboolture -27.10 152.89 94 40 1966 - 2005

142201D Cashs Crossing South Pine -27.34 152.96 178 46 1918 - 1963

142202A Drapers Crossing South Pine -27.35 152.92 156 39 1966 - 2005

143010B Boat Mountain Emu Ck -26.98 152.29 915 31 1967 - 2005

143015B Damsite Cooyar Ck -26.74 152.14 963 35 1969 - 2005

143101A Mutdapily Warrill Ck -27.75 152.69 771 39 1915 - 1953

143102B Kalbar No.2 Warrill Ck -27.92 152.60 468 55 1913 - 1970

143103A Moogerah Reynolds Ck -28.04 152.55 190 36 1918 - 1953

143107A Walloon Bremer -27.60 152.69 622 36 1962 - 1999

143108A Amberley Warrill Ck -27.67 152.70 914 36 1962 - 2004

143110A Adams Br Bremer -27.83 152.51 125 29 1972 - 2004

143113A Loamside Purga Ck -27.68 152.73 215 28 1974 - 2004

143203C Helidon Number 3 Lockyer Ck -27.54 152.11 357 74 1927 - 2004

143208A Dam Site Fifteen Mile Ck -27.46 152.10 87 26 1957 - 1985

143209B Mulgowie2 Laidley Ck -27.73 152.36 167 31 1958 - 2004

143303A Peachester Stanley -26.84 152.84 104 77 1928 - 2005

143307A Causeway Byron Ck -27.13 152.65 79 26 1976 - 2005

145002A Lamington No.1 Christmas Ck -28.24 152.99 95 43 1910 - 1953

145003B Forest Home Logan -28.20 152.77 175 83 1918 - 2005





145005A Avonmore Running Ck -28.30 152.91 89 30 1923 - 1952

145010A 5.8KM Deickmans Br Running Ck -28.25 152.89 128 40 1966 - 2005

145011A Croftby Teviot Brook -28.15 152.57 83 38 1967 - 2005

145012A The Overflow Teviot Brook -27.93 152.86 503 39 1967 - 2005

145018A Up Stream Maroon Dam Burnett Ck -28.22 152.61 82 32 1971 - 2005

145020A Rathdowney Logan -28.22 152.87 533 32 1974 - 2005

145101D Lumeah Number 2 Albert -28.06 153.04 169 43 1911 - 1953

145102B Bromfleet Albert -27.91 153.11 544 85 1919 - 2005

145103A Good Dam Site Cainbable Ck -28.09 153.08 42 32 1963 - 2004

145107A Main Rd Br Canungra Ck -28.00 153.16 101 32 1974 - 2005

146002B Glenhurst Nerang -28.00 153.31 241 85 1920 - 2005

146003B Camberra Number 2 Currumbin Ck -28.20 153.41 24 55 1928 - 1982

146004A Neranwood Little Nerang Ck -28.13 153.29 40 35 1927 - 1961

146005A Chippendale Tallebudgera Ck -28.16 153.40 55 27 1927 - 1953

146010A Army Camp Coomera -28.03 153.19 88 43 1963 - 2005

146012A Nicolls Br Currumbin Ck -28.18 153.42 30 31 1971 - 2005

146014A Beechmont Back Ck -28.12 153.19 7 31 1972 - 2004

146095A Tallebudgera Ck Rd Tallebudgera Ck -28.15 153.40 56 29 1971 - 2004

416303C Clearview Pike Ck -28.81 151.52 950 48 1935 - 1987

416305B Beebo Brush Ck -28.69 150.98 335 36 1969 - 2005

416312A Texas Oaky Ck -28.81 151.15 422 35 1970 - 2004

416404C Terraine Bracker Ck -28.49 151.28 685 45 1953 - 2001

416410A Barongarook Macintyre Brook -28.44 151.46 465 32 1968 - 2001

422210A Tabers Bungil Ck -26.41 148.78 710 32 1967 - 2004





422301A Long Crossing Condamine -28.32 152.34 85 66 1912 - 1977

422302A Killarney Spring Ck -28.35 152.34 21 45 1910 - 1954

422303A Killarney Spring Ck South -28.36 152.34 10 45 1910 - 1954

422304A Elbow Valley Condamine -28.37 152.16 275 56 1916 - 1971

422306A Swanfels Swan Ck -28.16 152.28 83 85 1920 - 2004

422307A Kings Ck Kings Ck -27.90 151.91 334 42 1921 - 1966

422313B Emu Vale Emu Ck -28.23 152.23 148 58 1948 - 2005

422317B Rocky Pond Glengallan Ck -28.13 151.92 520 38 1954 - 1991

422319B Allora Dalrymple Ck -28.04 152.01 246 36 1969 - 2005

422321B Killarney Spring Ck -28.35 152.33 35 45 1960 - 2004

422326A Cranley Gowrie Ck -27.52 151.94 47 34 1970 - 2004

422332B Oakey Gowrie Ck -27.47 151.74 142 25 1969 - 2006

422334A Aides Br Kings Ck -27.93 151.86 516 35 1970 - 2004

422338A Leyburn Canal Ck -28.03 151.59 395 27 1975 - 2004

422341A Brosnans Barn Condamine -28.33 152.31 92 29 1977 - 2005

422394A Elbow Valley Condamine -28.37 152.14 325 32 1973 - 2004

913010A 16 Mile Waterhole Fiery Ck -18.88 139.36 722 29 1973 - 2004

915011A Mt Emu Plains Porcupine Ck -20.18 144.52 540 31 1972 - 2004

915206A Railway Crossing Dugald -20.20 140.22 660 31 1970 - 2004

915211A Landsborough HWY Williams -20.87 140.83 415 31 1971 - 2003

917104A Roseglen Etheridge -18.31 143.58 867 32 1967 - 2005

917107A Mount Surprise Elizabeth Ck -18.13 144.31 651 32 1969 - 2002

919005A Fonthill Rifle Ck -16.68 145.23 366 32 1969 - 2004

919013A Mulligan HWY McLeod -16.50 145.00 532 25 1973 - 2005





919201A Goldfields Palmer -16.11 144.78 533 30 1968 - 2004

919305B Nullinga Walsh -17.18 145.30 326 35 1957 - 1991

922101B Racecourse Coen -13.96 143.17 172 32 1968 - 2004

926002A Dougs Pad Dulhunty -11.83 142.42 332 30 1971 - 2004



APPENDIX B

Additional results on training and validation of RFFA models



Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model








Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA




Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model






Figure B.15 Comparison of observed and predicted flood quantiles (training) for GEP based RFFA model for Q100

Figure B.16 Comparison of observed and predicted flood quantiles (training) for CANFIS based




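Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q2

Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q5

Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q10

Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q50

Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q100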

Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q2





(Two scatter plots of Qpred (m3/sec) against Qobs (m3/sec); both axes logarithmic, 1 to 10000)





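Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q2

Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q5

Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q10

Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q50

Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q100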

Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS




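Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA model for Q2

Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA model for Q5

Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA model for Q10

Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA model for Q50

Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA model for Q100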



Figure B.46 Section of Dendrogram using average linkage between groups



Figure B.47 Section of Dendrogram using average linkage between groups
