DEVELOPMENT OF ARTIFICIAL
INTELLIGENCE BASED REGIONAL
FLOOD FREQUENCY ANALYSIS
TECHNIQUE
Kashif Aziz, BScEng, MEng
Student ID 16658598
A thesis submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy in Civil Engineering
Supervisory Panel:
Assoc Prof Ataur Rahman
Assoc Prof Gu Fang
Assoc Prof Surendra Shrestha
School of Computing, Engineering and Mathematics
University of Western Sydney, Australia
December 2014
Artificial Intelligence Based RFFA Aziz
University of Western Sydney II
ABSTRACT
Flooding is one of the worst natural disasters: it disrupts services, damages infrastructure,
crops and properties, and sometimes causes loss of human life. In Australia, the average
annual flood damage is worth over $377 million, and infrastructure requiring design flood
estimates is worth over $1 billion per annum. The devastating 2010-11 flood in Queensland
alone caused flood damage of over $5 billion.
Design flood estimation is required in numerous engineering applications, e.g., the design of
bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways, as
well as floodplain modelling, flood insurance studies and flood damage assessment. For design
flood estimation, the most direct method is flood frequency analysis, which requires a long
period of recorded streamflow data at the site of interest. This is not a feasible option at many
locations because streamflow records are absent or limited. For these ungauged or poorly
gauged catchments, regional flood frequency analysis (RFFA) is adopted. RFFA enables the
transfer of flood characteristics information from gauged to ungauged catchments. RFFA
essentially consists of two principal steps: (i) formation of regions; and (ii) development of
prediction equations.
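As a point of reference for the at-site flood frequency analysis mentioned above, the sketch below fits a Gumbel (EV1) distribution to a series of annual maximum flows by the method of moments and returns the flood quantile for a given ARI. This is an illustrative, swapped-in technique, not necessarily the distribution or fitting method used in this thesis. It also assumes that the ARI T can be treated as the return period of the annual maximum series, so that the annual exceedance probability is 1/T. The function name `gumbel_quantile` and the sample flows are hypothetical.

```python
import math
import statistics

# Euler-Mascheroni constant, used in the method-of-moments location parameter
EULER_GAMMA = 0.5772156649015329


def gumbel_quantile(annual_maxima, ari_years):
    """Flood quantile for an ARI of `ari_years`, from a Gumbel (EV1) distribution
    fitted by the method of moments to annual maximum flows (e.g. m^3/s).

    Assumption (hypothetical helper): AEP = 1 / ARI for the annual maximum series.
    """
    if ari_years <= 1:
        raise ValueError("ARI must be greater than 1 year")
    if len(annual_maxima) < 2:
        raise ValueError("at least two annual maxima are required")
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)      # sample standard deviation
    alpha = math.sqrt(6.0) * sd / math.pi     # Gumbel scale parameter
    u = mean - EULER_GAMMA * alpha            # Gumbel location parameter
    aep = 1.0 / ari_years                     # annual exceedance probability
    # Invert the Gumbel CDF F(q) = exp(-exp(-(q - u) / alpha)) at F = 1 - AEP
    return u - alpha * math.log(-math.log(1.0 - aep))


# Hypothetical annual maximum flood series (m^3/s) for a single gauged site
flows = [120.0, 85.0, 240.0, 60.0, 150.0, 310.0, 95.0, 180.0]
for ari in (2, 10, 100):
    print(f"Q{ari} ~ {gumbel_quantile(flows, ari):.1f} m^3/s")
```

In an RFFA setting, at-site quantiles of this kind typically serve as the "observed" values that regional prediction equations are fitted to and checked against.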
For developing regional flood prediction equations, the commonly used techniques include the
rational method, the index flood method and the quantile regression technique. These
techniques transform inputs to outputs using a linear method. Since hydrologic systems are
non-linear, RFFA techniques based on non-linear methods can be a better alternative to linear
methods. Among the non-linear methods, artificial intelligence based techniques have been
widely applied to various water resources engineering problems; however, their application to
RFFA is quite limited. Hence, this research focuses on the development of artificial
intelligence based RFFA methods for Australia. The non-linear techniques considered in this
thesis are artificial neural network (ANN), genetic algorithm based artificial neural network
(GAANN), gene-expression programming (GEP) and co-active neuro fuzzy inference system
(CANFIS).
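To illustrate the non-linear input-output mapping that distinguishes an ANN from the linear techniques above, the sketch below computes the forward pass of a feedforward network with one hidden layer of sigmoid nodes and a single linear output node. It is a minimal sketch, not the network configuration used in this thesis. The weights are hypothetical placeholders that a training procedure (e.g. backpropagation, or a genetic algorithm as in GAANN) would normally estimate, and the two inputs are assumed to be pre-scaled predictors such as catchment area and design rainfall intensity.

```python
import math


def sigmoid(x):
    """Logistic activation used by the hidden nodes."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_forward(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass of a one-hidden-layer feedforward ANN (hypothetical helper).

    hidden_weights[j] holds the weights from every input to hidden node j;
    output_weights[j] is the weight from hidden node j to the linear output node.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(hidden_weights, hidden_bias)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


# Hypothetical 2-2-1 network; inputs are assumed to be scaled predictor values
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [1.0, 1.0]
B_OUT = 0.0

y1 = ann_forward([1.0, 1.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
y2 = ann_forward([2.0, 2.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
# A linear model without intercept would give y2 == 2 * y1; the sigmoid layer does not
print(y1, y2, 2 * y1)
```

Doubling the inputs does not double the output, which is the non-linear behaviour that the linear regional techniques cannot represent directly.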
This study uses data from 452 small to medium sized catchments in eastern Australia. For the
development (training) of the artificial intelligence based RFFA models, the selected 452
catchments are randomly divided into two parts: (i) a training data set consisting of 362
catchments; and (ii) a validation data set consisting of 90 catchments. It has been found that an
RFFA model with two predictor variables, i.e., catchment area and design rainfall intensity,
provides more accurate flood quantile estimates than other models with a greater number of
predictor variables. The results show that when the data from all the eastern Australian states
are combined to form one region, the resulting ANN based RFFA model performs better than
those developed for other candidate regions, such as regions based on state boundaries,
geographical and climatic boundaries, and regions formed in the catchment characteristics
data space.
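The random division of the 452 catchments into 362 training and 90 validation catchments described above can be sketched as follows. The abstract does not state how the random selection was implemented, so the use of Python's `random.Random.sample` with a fixed seed (for a repeatable split), the helper name `split_catchments` and the placeholder catchment IDs are assumptions made for illustration only.

```python
import random


def split_catchments(catchment_ids, n_validation, seed=None):
    """Randomly partition unique catchment IDs into training and validation sets
    (hypothetical helper)."""
    ids = list(catchment_ids)
    if len(set(ids)) != len(ids):
        raise ValueError("catchment IDs must be unique")
    if not 0 < n_validation < len(ids):
        raise ValueError("n_validation must be between 1 and len(catchment_ids) - 1")
    rng = random.Random(seed)                   # seeded generator for a repeatable split
    validation = rng.sample(ids, n_validation)  # draw without replacement
    held_out = set(validation)
    training = [c for c in ids if c not in held_out]
    return training, validation


# Hypothetical IDs standing in for the 452 eastern Australian catchments
all_ids = [f"CATCH{i:03d}" for i in range(452)]
train_ids, valid_ids = split_catchments(all_ids, n_validation=90, seed=2014)
print(len(train_ids), len(valid_ids))  # 362 90
```

Fixing the seed is a design choice that makes the split reproducible, so that every candidate model is trained and validated on exactly the same catchments.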
During the training of the four artificial intelligence based RFFA models, no single model
performs best for all six average recurrence intervals (ARIs) across all the adopted statistical
criteria. Overall, however, the ANN based RFFA model performs better than the other three
models in training (calibration).
This research has also found that non-linear artificial intelligence based RFFA techniques can
be applied successfully to eastern Australian catchments. Among the four artificial intelligence
based models considered in this study, the ANN based RFFA model demonstrates the best
performance based on independent split-sample validation, followed by the GAANN based
RFFA model. The ANN based RFFA model has also been found to outperform the ordinary
least squares based RFFA model. Based on independent validation, the median relative error
values for the ANN based RFFA model are in the range of 35% to 44% for eastern Australia,
which is comparable to that of the generalised least squares regression and region-of-influence
based RFFA approach. The ANN based RFFA model exhibits no noticeable spatial trend in the
relative error values. Furthermore, its relative error values are found to be independent of
catchment area.
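The validation statistics referred to in this thesis (median relative error, median Qpred/Qobs ratio and CE) can be computed as in the sketch below. It assumes that the relative error for each validation catchment is the absolute difference between predicted and observed flood quantiles expressed as a percentage of the observed value, and that CE is the Nash-Sutcliffe coefficient of efficiency. The thesis's exact definitions may differ, and the function names and example values are hypothetical.

```python
import statistics


def _check(q_obs, q_pred):
    if len(q_obs) != len(q_pred) or not q_obs:
        raise ValueError("q_obs and q_pred must be non-empty and of equal length")


def relative_errors(q_obs, q_pred):
    """Absolute relative error (%) per catchment: |Qpred - Qobs| / Qobs * 100."""
    _check(q_obs, q_pred)
    return [abs(p - o) / o * 100.0 for o, p in zip(q_obs, q_pred)]


def median_relative_error(q_obs, q_pred):
    return statistics.median(relative_errors(q_obs, q_pred))


def median_ratio(q_obs, q_pred):
    """Median Qpred/Qobs; values near 1 indicate little systematic bias."""
    _check(q_obs, q_pred)
    return statistics.median([p / o for o, p in zip(q_obs, q_pred)])


def coefficient_of_efficiency(q_obs, q_pred):
    """Nash-Sutcliffe CE = 1 - sum((Qobs - Qpred)^2) / sum((Qobs - mean(Qobs))^2)."""
    _check(q_obs, q_pred)
    mean_obs = statistics.mean(q_obs)
    sse = sum((o - p) ** 2 for o, p in zip(q_obs, q_pred))
    sst = sum((o - mean_obs) ** 2 for o in q_obs)
    if sst == 0:
        raise ValueError("CE is undefined when all observed values are equal")
    return 1.0 - sse / sst


# Hypothetical observed and predicted Q20 values (m^3/s) at three validation catchments
obs = [100.0, 200.0, 400.0]
pred = [110.0, 180.0, 400.0]
print(median_relative_error(obs, pred), median_ratio(obs, pred), coefficient_of_efficiency(obs, pred))
```

Applied separately to the validation quantiles for each ARI, such functions yield one value per ARI, which is how ranges such as the median relative errors above are usually summarised.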
The findings of this research will help in recommending the most appropriate RFFA
techniques for the 4th edition of Australian Rainfall and Runoff, which is due to be published
in 2015.
STATEMENT OF AUTHENTICITY
I certify that all material presented in this thesis is my own work, and that any work adopted
from other sources is duly cited and referenced as such. This thesis contains no material that
has been submitted for any award or degree at any other university or institution.
Kashif Aziz
ACKNOWLEDGMENTS
I would like to express my heartfelt gratitude to Associate Professor Ataur Rahman, who is
not only my mentor but also a role model. This work would not have been possible without his
support, encouragement and, most importantly, his patience during the completion of this
work. I am also grateful to Associate Professor Gu Fang and Associate Professor Surendra
Shrestha for their valuable advice, support and constructive feedback towards the completion
of this research. I could not be prouder of my academic roots and hope that I can in turn pass
on the research values and the dreams that my supervisors have given to me.
I would not have contemplated this road if not for my parents, Mr. and Mrs. Choudhry Abdul
Aziz (late), who instilled within me a love of knowledge and a spirit of struggle to achieve my
goals, all of which find a place in this thesis. To my parents, thank you. I sincerely
acknowledge and appreciate the support and patience of my wife, Rabia Rehman, who looked
after me and our kids during this study. I am also thankful to my family and friends in
Australia and overseas for their prayers and encouragement.
To the staff and fellow students at the University of Western Sydney’s School of Computing,
Engineering and Mathematics, I am grateful for your help, encouragement and the company I
have enjoyed during my candidature. Thank you for welcoming me as a friend and for your
moral support.
I would like to acknowledge all the related government agencies for their technical and
financial support and for providing the resources towards the completion of this research.
PUBLICATIONS MADE (UNTIL JUNE 2015) FROM THIS STUDY
Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2014). Application of Artificial Neural
Networks in Regional Flood Frequency Analysis: A Case Study for Australia, Stochastic
Environmental Research and Risk Assessment, 28, 3, 541-554.
Aziz, K., Rai, S., Rahman, A. (2014). Design flood estimation in ungauged catchments using
genetic algorithm based artificial neural network (GAANN) technique for Australia, Natural
Hazards, 77, 2, 805-821.
Aziz, K., Rahman, A., Shamseldin, A.Y., Shoaib, M. (2013). Co-Active Neuro Fuzzy
Inference System for Regional Flood Estimation in Australia, Journal of Hydrology and
Environment Research, 1, 1, 11-20.
Aziz, K., Sohail, R., Rahman, A. (2014). Application of Artificial Neural Networks and
Genetic Algorithm for Regional Flood Estimation in Eastern Australia, 35th Hydrology and
Water Resources Symposium, Perth, Engineers Australia, 24-27 Feb, 2014.
Aziz, K., Rahman, A., Shamseldin, A., Shoaib, M. (2013). Regional flood estimation in
Australia: Application of gene expression programming and artificial neural network
techniques, 20th International Congress on Modelling and Simulation, 1 to 6 December, 2013,
Adelaide, Australia, 2283-2289.
Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2012). Comparison of Artificial Neural
Networks and Adaptive Neuro-fuzzy Inference System for Regional Flood Estimation in
Australia, Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov
2012, Sydney, Australia.
Aziz, K., Rahman, A., Shrestha, S., Fang, G. (2011). Derivation of optimum regions for ANN
based RFFA in Australia, 34th IAHR World Congress, 26 June – 1 July 2011, Brisbane, 17-
24.
Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2011). Application of Artificial Neural
Networks in Regional Flood Estimation in Australia: Formation of Regions Based on
Catchment Attributes, The Thirteenth International Conference on Civil, Structural and
Environmental Engineering Computing and CSC2011: The Second International Conference
on Soft Computing Technology in Civil, Structural and Environmental Engineering, Chania,
Crete, Greece, 6-9 September, 2011, 13 pp.
Aziz, K., Rahman, A., Fang, G., Haddad, K., Shrestha, S. (2010). Design flood estimation
for ungauged catchments: Application of artificial neural networks for eastern Australia,
World Environmental and Water Resources Congress 2010, American Society of Civil
Engineers (ASCE), 16-20 May 2010, Providence, Rhode Island, USA, pp. 2841-2850.
TABLE OF CONTENTS
ABSTRACT………………………………………………………………………………… II
STATEMENT OF AUTHENTICITY…………………………………………………… IV
ACKNOWLEDGEMENT ……………………………………………………………… … V
PUBLICATIONS MADE (UNTIL DECEMBER 2014) FROM THIS
STUDY................................................................................................................................. VI
LIST OF FIGURES……………………………………………………………………… XII
LIST OF TABLES…………………………………………………………………… XVIII
LIST OF SYMBOLS…………………………………………………………………… XX
LIST OF ABBREVIATIONS…………………………………………………………….XXII
CHAPTER 1 .......................................................................................................................... 1
INTRODUCTION ................................................................................................................. 1
1.1 General ......................................................................................................................... 1
1.2 Background .................................................................................................................. 1
1.3 Need for this research .................................................................................................. 5
1.4 Scope and objectives of the study ................................................................................ 6
1.5 Research questions ....................................................................................................... 7
1.6 Summary of research undertaken in this thesis ............................................................ 8
1.7 Outline of the thesis ..................................................................................................... 9
CHAPTER 2 ........................................................................................................................ 12
REVIEW OF REGIONAL FLOOD FREQUENCY ANALYSIS METHODS .................. 12
2.1 General ....................................................................................................................... 12
2.2 Design flood estimation methods ............................................................................... 12
2.2.1 Streamflow-based flood estimation methods ............................................... 13
2.3 Techniques for RFFA .................................................................................................... 15
2.3.1 Linear techniques .............................................................................................. 15
2.3.2 Non-linear RFFA techniques ............................................................................ 21
2.4 Summary .................................................................................................................. 32
CHAPTER 3 ........................................................................................................................ 34
METHODOLOGY .............................................................................................................. 34
3.1 General........................................................................................................................... 34
3.2 Methods adopted in the study ........................................................................................ 34
3.2.1 Artificial neural network (ANN) ....................................................................... 35
3.2.2 Genetic algorithm based ANN (GAANN) ........................................................ 39
3.2.3 Gene-expression programming ......................................................................... 45
3.2.4 Co-active neuro fuzzy inference system (CANFIS) ......................................... 47
3.2.5 Quantile regression technique (QRT) ............................................................... 51
3.2.6 Cluster analysis ................................................................................................. 53
3.2.7 Principal component analysis (PCA) ................................................................ 55
3.2.8 Model validation technique ............................................................................... 55
3.3 Summary ........................................................................................................................ 56
CHAPTER 4 ........................................................................................................................ 57
SELECTION OF STUDY AREA AND DATA PREPARATION ..................................... 57
4.1 General........................................................................................................................... 57
4.2 Selection of study area ................................................................................................... 57
4.3 Selection of study catchments ....................................................................................... 58
4.3.1 Factors considered for selection of catchments ................................................ 58
4.4 Streamflow data preparation .......................................................................................... 59
4.4.1 Methods of streamflow data preparation .......................................................... 59
4.4.2 Tests for outliers ................................................................................................ 60
4.4.3 Trend analysis ................................................................................................... 60
4.4.4 Rating error analysis ......................................................................................... 61
4.5 Selection of catchment characteristics ........................................................................... 62
4.5.1 Selection criteria ............................................................................................... 62
4.5.2 Catchment characteristics considered in this thesis .......................................... 63
4.5.3 Rainfall intensity ............................................................................................... 63
4.5.4 Mean annual rainfall ......................................................................................... 64
4.5.5 Catchment area .................................................................................................. 64
4.5.6 Slope S1085 ...................................................................................................... 65
4.5.7 Mean annual evapo-transpiration ...................................................................... 66
4.6 Streamflow data preparation for various states ............................................................. 66
4.6.1 NSW and ACT .................................................................................................. 66
4.6.3 Queensland ........................................................................................................ 73
4.6.4 Victoria .............................................................................................................. 76
4.5 Flood frequency analysis ............................................................................................... 81
4.6 Summary of catchment characteristics data .................................................................. 82
4.7 Summary ........................................................................................................................ 83
CHAPTER 5 ........................................................................................................................ 84
SELECTION OF PREDICTOR VARIABLES FOR ARTIFICIAL INTELLIGENCE
BASED RFFA MODELS .......................................................................................... 84
5.1 General........................................................................................................................... 84
5.2 Initial selection of predictor variables for artificial intelligence based RFFA models .. 84
5.3 Selection of predictor variables for ANN based RFFA models .................................... 88
5.4 Selection of predictor variables based on GEP models ................................................. 91
5.5 Summary ........................................................................................................................ 95
CHAPTER 6 ........................................................................................................................ 96
SELECTION OF REGIONS ............................................................................................... 96
6.1 General........................................................................................................................... 96
6.2 Description of candidate regions ................................................................................... 96
6.2.1 Selection of the best performing region based on state, geographic and climatic
boundaries .................................................................................................................. 97
6.3 Regions based on catchment characteristics data ........................................................ 100
6.3.1 Cluster analysis ............................................................................................... 100
6.3.2 Principal component analysis .......................................................................... 105
6.4 Summary ...................................................................................................................... 111
CHAPTER 7 ...................................................................................................................... 113
DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ........ 113
7.1 General......................................................................................................................... 113
7.2 Training of artificial intelligence based RFFA models ............................................... 114
7.3 Comparison of training and validation results ............................................................. 120
7.3.1 ANN ................................................................................................................ 120
7.3.2 GAANN .......................................................................................................... 123
7.3.3 GEP ................................................................................................................. 126
7.3.4 CANFIS .......................................................................................................... 129
7.4 Selection of the best performing artificial intelligence based RFFA model based on
training ..................................................................................................................... 131
7.5 Summary ...................................................................................................................... 133
CHAPTER 8 ...................................................................................................................... 134
VALIDATION OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ............. 134
8.1 General......................................................................................................................... 134
8.2 Validation of RFFA models ........................................................................................ 134
8.2.1 ANN ................................................................................................................ 134
8.2.2 GAANN .......................................................................................................... 138
8.2.3 GEP ................................................................................................................. 140
8.2.4 CANFIS .......................................................................................................... 143
8.3 Comparison of RFFA models based on validation data set ........................................ 145
8.3.1 Median Qpred/Qobs ratio .................................................................................... 145
8.3.2 Median RE (%) ............................................................................................... 147
8.3.3 Median CE ...................................................................................................... 149
8.3.5 Comparison of RFFA models based on RE (%) ranges .................................. 151
8.3.6 Selection of the best performing artificial intelligence based RFFA model ... 152
8.4 Performance of the finally selected artificial intelligence based RFFA model ........... 153
8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model ..................... 154
8.4.2 Catchment area vs RE ..................................................................................... 157
8.5 Comparison with QRT ................................................................................................ 158
8.6 Summary ...................................................................................................................... 159
CHAPTER 9 ...................................................................................................................... 161
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS ..................................... 161
9.1 General......................................................................................................................... 161
9.2 Summary of the research undertaken in this thesis ..................................................... 161
9.3 Conclusions ................................................................................................................. 163
9.4 Recommendations for further research........................................................................ 164
REFERENCES .................................................................................................................. 166
REFERENCES .................................................................................................................. 167
APPENDICES ................................................................................................................... 182
APPENDIX A ................................................................................................................... 183
APPENDIX B .................................................................................................................... 205
List of Figures
Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia) ........................................................... 2
Figure 1.2 Aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012 (ABC News,
2012) ....................................................................................................................................................................... 3
Figure 1.3 Illustration of major steps in this research ............................................................................................. 9
Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998) .................................13
Figure 3.1 Different RFFA techniques adopted in this study .................................................................................34
Figure 3.2 Structure of a typical natural neuron (Source:
http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/) ...............................................................................35
Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000) ........................................................36
Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005) ............................................................................43
Figure 3.5 Flow chart showing steps in GAANN model .......................................................................................44
Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN
architecture during a GAANN modelling ..............................................................................................................45
Figure 3.7 GEP expression tree (ET) .....................................................................................................................47
Figure 3.8 Fuzzy inference system (FIS) (Shi and Mizumoto, 2000) ....................................................................48
Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009) ....................................................50
Figure 4.1 Location of the selected study area (coloured parts of the map) ...........................................................57
Figure 4.2 Result of trend analysis (Station 219001). Here, Vk is the CUSUM test statistic defined in McGilchrist and
Woodyer (1975) .........................................................................................................................................................68
Figure 4.3 Result of trend analysis – time series plot (Station 219001) .................................................................69
Figure 4.4 Histogram of rating ratios for 106 stations from NSW .........................................................................69
Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT ....................................70
Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT ....................................................70
Figure 4.7 Geographical distribution of 96 catchments from NSW and ACT ......................................................71
Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania ..............................72
Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania ..............................................72
Figure 4.10 Locations of selected catchments from Tasmania ..............................................................................73
Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD ............................75
Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD ............................................75
Figure 4.13 Locations of the selected 172 stations from QLD ...............................................................................76
Figure 4.14 Time series graph showing significant trends after 1995 ....................................................................78
Figure 4.15 CUSUM test plot showing significant trends after 1995 ....................................................................78
Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record lengths > 25
years) ......................................................................................................................................................................79
Figure 4.16 Distribution of streamflow record lengths of the selected 131 stations from Victoria ......................81
Figure 4.17 Distribution of catchment areas of the selected 131 catchments from Victoria ................................81
Figure 4.18 Geographical distribution of the selected 131 catchments from Victoria .........................................82
Figure 4.19 Locations of the study catchments ......................................................................................................82
Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions ......................................99
Figure 6.2 Median relative error (%) values for different ARIs for selected regions ...........................................100
Table 6.4 Regions/groups formation by cluster analysis......................................................................................101
Figure 6.3 Dendrogram using average linkage between groups ..........................................................................102
Figure 6.3 (a) Section of Dendrogram using average linkage between groups ....................................................103
Figure 6.3 (b) Section of Dendrogram using average linkage between groups ....................................................104
Figure 6.4 Scree plot from principal component analysis ....................................................................................107
Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1 .....................................................................107
Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2 .....................................................................108
Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions .............................................110
Figure 6.8 Median relative error (%) values for different ARIs for candidate regions ........................................111
Figure 6.9 Comparison of median relative error (%) values between the combined data set and grouping based on
K-Means cluster analysis .........................................................................................................................................111
Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training data set .......114
Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on
training data set ....................................................................................................................................................115
Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models based on training
data set .................................................................................................................................................................116
Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 (training
data set) ................................................................................................................................................................117
Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20
(training data set) .................................................................................................................................................118
Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 (training data set)
Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20 (training data set)
Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the ANN based RFFA model
Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the ANN based RFFA model
Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data sets for the ANN based RFFA model
Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA model for Q20
Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20
Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data set
Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the GAANN based RFFA model
Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GAANN based RFFA model
Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data sets for the GAANN based RFFA model
Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the GEP based RFFA model
Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GEP based RFFA model
Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data sets for the GEP based RFFA model
Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the CANFIS based RFFA model
Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the CANFIS based RFFA model
Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data sets for the CANFIS based RFFA model
Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20
Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model
Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model
Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20
Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model
Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model
Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20
Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model
Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model
Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20
Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model
Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model
Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models
Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models
Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models
Figure 8.16 Spatial distribution of RE of ANN based model across NSW
Figure 8.17 Spatial distribution of RE of ANN based model across VIC
Figure 8.18 Spatial distribution of RE of ANN based model across North QLD
Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD
Figure 8.20 Spatial distribution of RE of ANN based model across QLD
Figure 8.21 Spatial distribution of RE of ANN based model across TAS
Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90 test catchments
Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q2 (training data set)
Figure B.2 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q5 (training data set)
Figure B.3 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q10 (training data set)
Figure B.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q50 (training data set)
Figure B.5 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q100 (training data set)
Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q2 (training data set)
Figure B.7 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q5 (training data set)
Figure B.8 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q10 (training data set)
Figure B.9 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q50 (training data set)
Figure B.10 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q100 (training data set)
Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q2 (training data set)
Figure B.12 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q5 (training data set)
Figure B.13 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q10 (training data set)
Figure B.14 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q50 (training data set)
Figure B.15 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q100 (training data set)
Figure B.16 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q2 (training data set)
Figure B.17 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q5 (training data set)
Figure B.18 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q10 (training data set)
Figure B.19 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q50 (training data set)
Figure B.20 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q100 (training data set)
Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q2
Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q5
Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q10
Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q50
Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q100
Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q2
Figure B.27 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q5
Figure B.28 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q10
Figure B.29 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q50
Figure B.30 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q100
Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q2
Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q5
Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q10
Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q50
Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q100
Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q2
Figure B.37 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q5
Figure B.38 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q10
Figure B.39 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q50
Figure B.40 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q100
Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA model for Q2
Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA model for Q5
Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA model for Q10
Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA model for Q50
Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA model for Q100
Figure B.46 Section of Dendrogram using average linkage between groups
Figure B.47 Section of Dendrogram using average linkage between groups
List of Tables
Table 3.1 Parameters used per run in GEP model
Table 4.1 Summary statistics of the catchment characteristics data
Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies
Table 5.2 Various candidate models and catchment characteristics used
Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent test catchments
Table 5.4 Rating on the basis of median Qpred/Qobs ratio
Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of Table 5.4 (ANN based RFFA models)
Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio value using 90 independent test catchments (ANN based RFFA models)
Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error (RE) values using 90 independent test catchments (ANN based RFFA models)
Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent test catchments
Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the criteria of Table 5.4 for GEP based RFFA models
Table 5.10 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio values using 90 independent test catchments (for GEP based RFFA models)
Table 5.11 Comparison of Model 1 and Model 2 on the basis of RE values using 90 independent test catchments (for GEP based RFFA models)
Table 6.1 Description of candidate regions
Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions
Table 6.3 Median relative error values (%) for seven ANN based candidate regions
Table 6.4 Regions/groups formation by cluster analysis
Table 6.5 ANN based RFFA model performances for cluster groupings A1 & A2
Table 6.6 ANN based RFFA model performances for cluster groupings B1 & B2
Table 6.7 Eigenvalues and variance explained by the principal components
Table 6.8 Component matrix in principal component analysis
Table 6.9 Descriptive statistics of standardised variables
Table 6.10 Grouping based on principal component analysis
Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions
Table 6.12 Median relative error (%)
Table 7.1 CE values of four artificial intelligence based RFFA models based on training data set
Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data set
Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models (training)
Table 7.4 Comparison of training and validation results for the ANN based RFFA model
Table 7.5 Comparison of training and validation results for the GAANN based RFFA model
Table 7.6 Comparison of training and validation results for the GEP based RFFA model
Table 7.8 Comparison of training and validation results for the CANFIS based RFFA model
Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to training
Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to agreement between training and validation
Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models
Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models
Table 8.3 Median CE values of the four artificial intelligence based RFFA models
Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial intelligence based RFFA models
Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern Australia
Table 8.6 Median Qpred/Qobs ratio values for seven ANN based candidate regions and QRT
Table 8.7 Median relative error values (%) for seven ANN based candidate regions and QRT
Table 8.8 Coefficient of efficiency (CE) values for seven ANN based candidate regions and QRT
List of symbols
A Catchment area (km2)
bj Threshold value associated with node j
β0 Regression coefficient
C Runoff coefficient
CY Dimensionless runoff coefficient for ARI of Y years
d Sub-storm duration (h)
Ei Elevation at ith level (m)
E Mean annual areal evapotranspiration (mm)
f Activation function
g The binary gene
I Rainfall intensity (mm/s)
Itc,Y Average rainfall intensity for time of concentration tc and ARI of Y years (mm/h)
J Node in neural networks
L Mainstream length (km)
l Length of a chromosome
n Number of samples or data points
pc Crossover rate
pm Mutation rate
Q Flood discharge (m3/s)
Q2 Flood peak discharge for 2 years average recurrence interval (ARI) (m3/s)
QE Estimated flow (m3/s)
QM Maximum measured flow (m3/s)
Qobs Observed flood quantile (m3/s)
Q̄ Mean of Qobs (m3/s)
Qpred Predicted flood quantile (m3/s)
QY Peak flow rate for an ARI of Y years (m3/s)
QT Peak flow rate for T years (m3/s)
R2 Coefficient of determination
R Mean annual rainfall (mm)
S1085 Slope of central 75% of mainstream (m/km)
tc Time of concentration (h)
T Return period (average recurrence interval) (year)
Vk CUSUM test statistic
Wj An input vector of jth node
wij Connection weight from node i to node j
w Value of a synaptic weight
xmaxabs Absolute maximum difference
xn nth input variable
X An input vector
List of abbreviations
BoM Bureau of Meteorology
ACT Australian Capital Territory
AEP Annual exceedance probability
AM Annual maximum
ANFIS Adaptive neuro fuzzy inference system
ANN Artificial neural network
ARI Average recurrence interval
ARMA Autoregressive Moving Average
ARR Australian Rainfall and Runoff
AUSIFD Software for Intensity-frequency-duration
BP Backpropagation
BGLS Bayesian Generalised Least Squares
BGLS-ROI Bayesian Generalised Least Squares - Region-of-influence
BITRE Bureau of Infrastructure, Transport and Regional Economics
CANFIS Co-active neuro fuzzy inference system
CE Coefficient of efficiency
CD Compact Disc
DOW Department of Water
Elman Elman partial recurrent neural network
ETs Expression Trees
FFBP Feedforward Backpropagation
FFA Flood frequency analysis
FFN Fuzzy neural network
FIS Fuzzy inference system
FLIKE Flood frequency analysis software
GA Genetic algorithm
GAANN Genetic algorithm based artificial neural network
GB Grubbs and Beck
GEP Gene expression programming
GIUH Geomorphologic Instantaneous Unit Hydrograph
GLS Generalised Least Squares
IFM Index Flood Method
I. E. Australia Institution of Engineers Australia
IFD Intensity-frequency-duration or design rainfall depth
IM Instantaneous Maximum
LGP Linear Genetic Programming
LM Levenberg-Marquardt
LP3 Log Pearson Type 3
LR Logistic Regression
MATLAB MATrix LABoratory
MF Membership Function
MINITAB Statistical Package
MLFN Multilayer Feedforward Neural Network
MMD Monthly Maximum mean Daily
MSE Mean Squared Error
NCWE National Committee on Water Engineering
NFS Neuro Fuzzy System
NSW New South Wales
NRW Department of Natural Resources & Water
OLS Ordinary Least Squares
PCA Principal Component Analysis
pdf Probability density function
PRM Probabilistic Rational Method
QRT Quantile Regression Technique
r Ratio of predicted to observed flood quantile (Qpred/Qobs)
RE Relative error
RFFA Regional flood frequency analysis
ROI Region of Influence
RR(s) Rating ratio(s)
SDRR Summer Dominated Rainfall Region
SWMM Storm Water Management Model
TAS Tasmania
TDNN Time Delay Neural Network
TSK Takagi, Sugeno and Kang (Fuzzy model)
UK United Kingdom
USGS United States Geological Survey
VIC Victoria
WDRR Winter Dominated Rainfall Region
CHAPTER 1
INTRODUCTION
1.1 General
This thesis focuses on regional flood estimation using various non-linear techniques based on
artificial intelligence. The non-linear techniques considered in this thesis are artificial
neural network (ANN), genetic algorithm based artificial neural network (GAANN), gene
expression programming (GEP) and co-active neuro fuzzy inference system (CANFIS). This
thesis aims to explore and enhance these non-linear techniques in regional flood estimation
so that they can be applied to ungauged and poorly gauged catchments in Australia to obtain
accurate design flood estimates. This chapter presents the background to this research, the
need for this research, the research questions investigated, the research tasks undertaken
and an outline of this thesis.
1.2 Background
Flooding is one of the worst natural disasters: it disrupts services, damages infrastructure,
crops and property, and sometimes causes loss of human life. For example, the 2010-11 floods
in Queensland caused 35 deaths. Effects on industry and other production units, together with
flood-related health costs, add to the overall losses to the Australian economy. In Australia,
the average annual flood damage is over $377 million, and annual investment in infrastructure
requiring design flood estimates is over $1 billion (BITRE, Australia). New South Wales (NSW)
alone has an average annual flood damage cost of over $172 million, almost 46% of the
Australian total. Queensland has the second largest flood damage, with an average annual cost
of $125 million. Importantly, the devastating 2010-11 flood in Queensland caused over $5
billion of flood damage (Queensland Reconstruction Authority, 2011). Figure 1.1 shows flooding
of Ipswich city in Queensland during the 2010-2011 floods. Figure 1.2 shows an aerial view of
the flooded south-western town of Wagga Wagga, NSW in March 2012.
Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia)
Floods are caused by factors such as heavy rainfall, snowmelt, dam breaks and cyclones.
Catchment and land use characteristics determine the magnitude of flooding from a given
rainfall event; urbanisation and catchment clearing increase the flood risk for a given
catchment. Flooding is a serious problem not only in rural areas but also in urban areas,
where increased impervious area raises runoff volume and shortens catchment response time.
Climate change has increased the frequency and magnitude of extreme rainfall events, resulting
in many devastating floods in recent years (Ishak et al., 2013; Ishak and Rahman, 2014). The
Australian Bureau of Meteorology (BOM), in its State of the Climate 2014 report, stated “An
increase in the number and intensity of extreme rainfall events is projected for most
regions”. This implies more extreme floods in most regions of Australia (BOM, 2014).
Flood damage can be minimised by providing drainage infrastructure of optimum capacity.
Under-design of these structures increases flood damage costs, whereas over-design incurs
unnecessary expense. The optimum design of drainage infrastructure depends largely on
reliable estimation of design floods, i.e. flood discharges associated with a given annual
exceedance probability (AEP).
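Because a design flood is tied to an AEP, the chance that it is exceeded at least once during a structure's design life can be worked out directly. The following Python sketch is illustrative only and is not part of the thesis methodology; it assumes that annual maximum floods are independent from year to year and uses the annual-series approximation AEP ≈ 1/ARI. The function names are hypothetical.

```python
"""Chance that a design flood is exceeded during a design life (illustrative sketch)."""


def aep_from_ari(ari_years: float) -> float:
    """Annual exceedance probability from ARI, using the approximation AEP = 1/ARI."""
    if ari_years <= 1.0:
        raise ValueError("ARI must exceed one year for this approximation")
    return 1.0 / ari_years


def exceedance_risk(aep: float, design_life_years: int) -> float:
    """P(at least one exceedance in n independent years) = 1 - (1 - AEP)^n."""
    if not 0.0 < aep < 1.0:
        raise ValueError("AEP must lie strictly between 0 and 1")
    return 1.0 - (1.0 - aep) ** design_life_years


if __name__ == "__main__":
    # Hypothetical example: a culvert sized for the 1% AEP (100-year ARI) flood
    aep = aep_from_ari(100.0)
    print(f"AEP = {aep:.3f}; risk over 50 years = {exceedance_risk(aep, 50):.3f}")
```

Under these assumptions, a structure designed for the 1% AEP flood still has roughly a 40% chance (about 0.395) of experiencing at least one larger flood over a 50-year life, which illustrates why the choice of design AEP is a trade-off between the costs of under- and over-design.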
Design flood estimation is required in numerous engineering applications, e.g. the design of
bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways,
as well as floodplain modelling, flood insurance studies and flood damage assessment. The most
direct method of design flood estimation is flood frequency analysis, which requires a long
period of recorded streamflow data at the site of interest. This is not feasible at many
locations because streamflow records are absent or limited. As at 1993, seven of the 12
drainage divisions in Australia did not have a stream with 20 or more years of data (Vogel et
al., 1993). Australian Rainfall and Runoff (ARR) 1987 recommended various design flood
estimation techniques for ungauged catchments in different regions of Australia (I. E. Aust.,
1987, 2001). Since 1987, the methods in ARR have not been upgraded, although an additional 20
years of streamflow data have become available and there have been notable developments in
both at-site and regional flood frequency analysis techniques in Australia and
internationally.
Figure 1.2 Aerial view of the flooded south-western town of Wagga Wagga, NSW in March 2012
(ABC News, 2012)
Different regional flood estimation methods have been proposed for different parts of
Australia (I. E. Aust., 1987). Among these, various forms of the rational method and the index
flood method are the most common. However, these methods have not been updated since
1987. Given changing climatic conditions and recent improvements in regional flood estimation
methods, new regional flood estimation techniques need to be developed and evaluated for
Australia. Some of the recent developments in regional flood estimation in Australia
include the L-moments-based index flood method (Bates et al., 1998; Rahman et al., 1999),
various forms of regression techniques (Rahman, 2005; Haddad et al., 2006, 2008, 2009,
2014; Haddad and Rahman, 2012; Hackelbush et al., 2009; Zaman et al., 2012; Micevski et
al., 2014) and regional Monte Carlo simulation (Rahman et al., 2002; Caballero and Rahman,
2014).
Regional flood frequency analysis (RFFA) is the generic name for techniques that use
streamflow data from gauged catchments in a region to estimate design floods for poorly
gauged or ungauged catchments. The use of RFFA enables the “transfer” of flood
characteristics information from gauged to ungauged catchments (Bloschl and Sivapalan,
1997; Pallard et al., 2009). The most commonly adopted RFFA methods have been described
in Cunnane (1988) and Hosking and Wallis (1997). RFFA essentially consists of two
principal steps: (a) formation of regions and (b) development of prediction equations.
Regions have traditionally been formed based on geographic, political, administrative or
physiographic boundaries (e.g. NERC, 1975; I. E. Aust., 1987). Regions have also been
formed in catchment characteristics data space using multivariate statistical techniques (e.g.
Acreman and Sinclair, 1986; Nathan and McMahon, 1990; Rao and Srinivas, 2008; Guse et
al., 2010). Regions can also be formed using a region-of-influence approach, in which a
number of catchments that are close to one another in geographic or catchment-attribute space
are pooled together, with the pool selected by optimising an objective function to form an
optimum region (e.g. Burn, 1990; Zrinji and Burn, 1994; Kjeldsen and Jones, 2009; Haddad and
Rahman, 2012).
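The region-of-influence idea can be sketched as a nearest-neighbour search in catchment-attribute space. The Python below is a simplified illustration under stated assumptions, not the procedure of any of the cited studies: attributes are standardised to z-scores, similarity is plain Euclidean distance, and the pool size k is fixed rather than chosen by optimising an objective function. The attribute values and function names are hypothetical.

```python
"""Region-of-influence (ROI) sketch: pool the k gauged catchments nearest to a
target site in standardised catchment-attribute space."""
import math
from statistics import mean, stdev


def standardise(rows):
    """Z-score each attribute so no single attribute dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # assumes no attribute is constant
    z_rows = [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    return z_rows, mus, sds


def region_of_influence(attrs, target, k):
    """Indices of the k catchments closest (Euclidean distance) to `target`."""
    z_rows, mus, sds = standardise(attrs)
    z_target = [(v - m) / s for v, m, s in zip(target, mus, sds)]
    dists = [math.dist(z, z_target) for z in z_rows]
    return sorted(range(len(attrs)), key=lambda i: dists[i])[:k]


if __name__ == "__main__":
    # Hypothetical attributes: [log10 area (km2), log10 design rainfall intensity (mm/h)]
    gauged = [[1.2, 1.5], [1.3, 1.6], [2.8, 0.9], [2.9, 1.0], [1.25, 1.55]]
    print(region_of_influence(gauged, target=[1.22, 1.52], k=3))  # [0, 4, 1]
```

Here the three catchments closest to the target site (indices 0, 4 and 1) would form its pooling group; in practice the pool size, the attributes and any attribute weights would be tuned against an objective function.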
For developing regional flood prediction equations, the commonly used techniques include the
rational method, the index flood method and the quantile regression technique (QRT). The
rational method has been widely adopted for estimating design floods for small ungauged
catchments (e.g. Mulvany, 1851; I. E. Aust., 1987; Jiapeng et al., 2003; Pegram and Parak,
2004; Rahman et al., 2011). The index flood method, which relies heavily on the identification
of homogeneous regions, has been widely adopted in many countries (Dalrymple, 1960; Hosking
and Wallis, 1993; Bates et al., 1998; Rahman et al., 1999; Kjeldsen and Jones, 2010; Ishak et
al., 2011). The QRT, proposed by the United States Geological Survey (USGS), has been applied
by many researchers using either Ordinary Least Squares (OLS) or Generalised Least Squares
(GLS) regression (e.g. Benson, 1962; Thomas and Benson, 1970; Tasker, 1980; Stedinger and
Tasker, 1985; Tasker et al., 1986; Madsen et al., 1997; Pandey and Nguyen, 1999; Bayazit and
Onoz, 2004; Rahman, 2005; Griffis and Stedinger, 2007; Ouarda et al., 2008; Kjeldsen and
Jones, 2009; Haddad and Rahman, 2011; Haddad et al., 2011, 2012).
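To make the log-domain linear model concrete, the following Python sketch fits a quantile regression equation of the form log10(Q_T) = b0 + b1 log10(A) + b2 log10(I) by ordinary least squares. It is an illustration only: the choice of catchment area A and design rainfall intensity I as predictors, the data and the function names are assumptions, and the GLS variant, which accounts for differing record lengths and inter-site correlation, is not shown.

```python
"""Quantile regression technique (QRT) sketch: OLS fit of
log10(Q_T) = b0 + b1*log10(A) + b2*log10(I) over gauged catchments."""
import math


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_qrt(q, area, intensity):
    """OLS coefficients [b0, b1, b2] from the normal equations (X^T X) b = X^T y."""
    X = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(area, intensity)]
    y = [math.log10(v) for v in q]
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(3)] for j in range(3)]
    xty = [sum(row[j] * yj for row, yj in zip(X, y)) for j in range(3)]
    return solve(xtx, xty)


def predict_qrt(coef, area, intensity):
    """Flood quantile at an ungauged site, back-transformed from log10 space."""
    b0, b1, b2 = coef
    return 10 ** (b0 + b1 * math.log10(area) + b2 * math.log10(intensity))


if __name__ == "__main__":
    # Hypothetical data: area (km2), design rainfall intensity (mm/h), Q10 (m3/s)
    area = [25.0, 110.0, 340.0, 60.0, 900.0]
    intensity = [30.0, 22.0, 18.0, 35.0, 12.0]
    q10 = [48.0, 130.0, 260.0, 95.0, 410.0]
    coef = fit_qrt(q10, area, intensity)
    print("b0, b1, b2 =", [round(c, 3) for c in coef])
    print("Q10 at 200 km2, I = 20 mm/h:", round(predict_qrt(coef, 200.0, 20.0), 1), "m3/s")
```

The model is linear only after the log transformation; this is the fixed structure that the non-linear methods discussed next do not impose.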
Most of the above RFFA methods assume a linear relationship between flood statistics and
predictor variables in the log domain when developing regional prediction equations.
However, most hydrologic processes are non-linear and exhibit a high degree of spatial and
temporal variability, and a simple log transformation cannot guarantee linearity. Artificial
intelligence based methods such as artificial neural networks (ANN), genetic algorithm based
ANN (GAANN), gene expression programming (GEP) and co-active neuro-fuzzy inference system
(CANFIS) have therefore been applied in water resources engineering, e.g. rainfall-runoff
modelling and hydrologic forecasting, but relatively few studies have applied these
techniques to RFFA (e.g. Daniell, 1991; Muttiah et al., 1997; Shu and Burn, 2004; Kothyari,
2004; Dawson et al., 2006; Shu and Ouarda, 2007, 2008). Importantly, there has been no known
application of artificial intelligence based techniques to RFFA in Australia. Applying these
techniques may help develop new and improved RFFA techniques for Australia. Unlike the
regression-based approach, artificial intelligence based techniques do not impose a fixed
model structure on the data; rather, the model form is identified from the data themselves.
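The contrast with a fixed regression form can be illustrated with a very small feedforward ANN whose input-output mapping is learned from data. The Python sketch below is hypothetical and is not the network, training algorithm or predictor set used in this thesis: it has one tanh hidden layer and a linear output, is trained by plain batch gradient descent on the mean squared error, and is fitted to synthetic data with a deliberately non-linear target.

```python
"""Tiny one-hidden-layer feedforward ANN trained by batch gradient descent
on the mean squared error (illustrative sketch only)."""
import math
import random


class TinyANN:
    """n_in inputs -> n_hidden tanh units -> one linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs, ys, epochs=2000, lr=0.05):
        n = len(xs)
        for _ in range(epochs):
            # Accumulate gradients of the MSE over the whole training set
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                out = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
                d_out = 2.0 * (out - y) / n  # dMSE/d(output) for this sample
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_h = d_out * self.w2[j] * (1.0 - hj * hj)  # back through tanh
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Gradient-descent step on every weight and bias
            self.b2 -= lr * g_b2
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j]
                self.b1[j] -= lr * g_b1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i]


if __name__ == "__main__":
    # Synthetic standardised predictors with a deliberately non-linear target
    rng = random.Random(1)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
    ys = [0.8 * x[0] + 0.5 * x[1] ** 2 for x in xs]
    net = TinyANN(n_in=2, n_hidden=4)
    before = net.mse(xs, ys)
    net.train(xs, ys)
    print(f"training MSE: {before:.4f} -> {net.mse(xs, ys):.4f}")
```

No functional form is specified in advance: the same code would fit a different relationship if the data changed, which is the property contrasted above with log-linear regression.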
This research seeks to fill this knowledge gap in RFFA by developing and testing artificial
intelligence based RFFA models using the most extensive and comprehensive database that has
become available in Australia as part of the ongoing revision of Australian Rainfall and
Runoff.
1.3 Need for this research
Flooding is one of the worst natural disasters, causing over $377 million of damage each year
on average in Australia. To reduce flood damage, accurate design flood estimates are needed
to design infrastructure such as bridges, culverts and flood protection levees. Australia is
the sixth largest country in the world and has numerous streams. Most of these streams are
ungauged or poorly gauged, as monitoring such a large number of streams is too expensive;
moreover, many of them are located far from townships. Design flood estimation in small to
medium sized ungauged catchments is of great economic significance (Pilgrim and Cordery,
1993). Flood estimation on ungauged catchments is one of the most important aspects of
hydrologic practice, as it covers a large number of catchments where hundreds of
infrastructure projects are built each year in Australia. The accuracy of flood estimation
for ungauged catchments matters because over-estimation results in higher construction cost
and under-estimation increases flood damage. Hence, developing new and more accurate RFFA
techniques is important, since they will help to design adequate infrastructure that allows
flood water to pass safely.
To date, RFFA models in Australia have been developed using linear modelling techniques. The
application of non-linear techniques such as artificial intelligence based methods in RFFA
may provide a viable alternative RFFA technique for Australia. Comparing the results of
artificial intelligence based RFFA models with those of traditional RFFA models would also
help to benchmark the traditional models.
The findings of this research would help to recommend the most appropriate RFFA
techniques in the 4th edition of Australian Rainfall and Runoff, which is due to be published in
2015.
1.4 Scope and objectives of the study
This study focuses on the regional flood estimation problem; in particular, it investigates
whether artificial intelligence based RFFA techniques can be applied to eastern Australia.
This requires a critical literature review of RFFA techniques, selection of study
catchments, collation of flood, climatic and catchment characteristics data, delineation of
regions, identification of the best set of predictor variables, training and validation of
artificial intelligence based RFFA models, and comparison with other RFFA techniques.
The objectives of this study are:
To carry out a critical literature review of RFFA methods, with a particular emphasis
on non-linear artificial intelligence based techniques, and to identify the gaps in the
current state of knowledge and further research opportunities for applying artificial
intelligence based techniques to the regional flood estimation problem.
To select the study area and catchments from eastern Australia, to collate streamflow data,
to select the catchment characteristics that govern the flood generation process and to
prepare the climatic and catchment characteristics data set for the RFFA modelling.
To select the best performing set of predictor variables for the artificial intelligence
based RFFA models.
To form different candidate regions based on (i) state boundaries (ii) climatic and
geographical boundaries and (iii) catchment characteristics data using multivariate
statistical techniques and identify the best performing region(s) for artificial
intelligence based RFFA modelling.
To train the artificial intelligence based RFFA models based on ANN, GAANN, GEP
and CANFIS.
To validate the artificial intelligence based RFFA models using the validation data set
and select the best performing model.
To compare the best performing artificial intelligence based RFFA model with linear
quantile regression technique.
To draw conclusions from the results obtained in the study.
1.5 Research questions
This thesis addresses the following research questions in relation to the
development of artificial intelligence based RFFA models for Australia.
Can artificial intelligence based techniques be applied to RFFA in Australia?
What is the best set of predictor variables for the development of artificial intelligence
based RFFA models in Australia?
Which region(s) are best for artificial intelligence based RFFA modelling in Australia,
considering regions based on state boundaries, climatic and geographical boundaries,
and regions formed in catchment characteristics data space using multivariate
statistical techniques?
How can the various artificial intelligence based RFFA models be trained/calibrated?
Among the different artificial intelligence based RFFA models (ANN, GAANN, GEP and
CANFIS), which one provides the most accurate flood quantile estimates for eastern
Australia?
How do artificial intelligence based RFFA models compare with the linear quantile
regression technique?
1.6 Summary of research undertaken in this thesis
The main research tasks undertaken in this thesis to answer the research questions posed in
Section 1.5 are outlined below. Figure 1.3 illustrates the major steps in this research.
Perform a literature review on RFFA and critically examine advantages and
disadvantages, limitations and assumptions associated with various RFFA techniques,
with a particular emphasis on non-linear artificial intelligence based techniques. Based
on the literature review, identify the gaps in the current state of knowledge and further
research opportunities for applying non-linear artificial intelligence based techniques to
regional flood estimation.
Select study area and catchments. Prepare streamflow data by filling gaps in the
annual maximum flood series, checking for outliers, rating curve error and trends.
Select catchment characteristics that govern flood generation and prepare the climatic
and catchment characteristics data set.
Select the best performing set of predictor variables for the artificial intelligence based
RFFA models by comparing various combinations of the initially selected candidate
catchment characteristics variables.
Form different candidate regions based on (i) state boundaries (ii) climatic and
geographical boundaries and (iii) catchment characteristics data using multivariate
statistical techniques. Compare the performances of the candidate regions and select
the best performing region for artificial intelligence based RFFA modelling.
Develop artificial intelligence based RFFA models based on ANN, GAANN, GEP and
CANFIS. Train each model using the training data set (80% of the selected catchments);
training minimises the mean squared error between the observed flood quantiles and those
predicted by the model for a given ARI over the training data set. Evaluate the training of
the model using a number of statistical criteria: plots of predicted versus observed flood
quantiles, the median ratio of predicted to observed flood quantiles, the median relative
error and the coefficient of efficiency.
Validate the artificial intelligence based RFFA models using the validation data set
(20% of the selected catchments) and select the best performing model.
Compare the best performing artificial intelligence based RFFA model with linear
quantile regression technique.
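The training objective and evaluation statistics listed above can be computed as in the following Python sketch. The exact definitions adopted in the thesis are not given in this section, so two are assumptions here: the relative error is taken as the absolute percentage error, |Q_pred - Q_obs| / Q_obs × 100, and the coefficient of efficiency is taken in the Nash-Sutcliffe form. The function names and data are hypothetical.

```python
"""Statistics for comparing predicted and observed flood quantiles (sketch)."""
from statistics import mean, median


def mse(pred, obs):
    """Mean squared error, the quantity minimised during training."""
    return mean((p - o) ** 2 for p, o in zip(pred, obs))


def ratio_median(pred, obs):
    """Median of r = Q_pred / Q_obs; 1.0 indicates no systematic bias."""
    return median(p / o for p, o in zip(pred, obs))


def relative_error_median(pred, obs):
    """Median absolute relative error (%), assuming RE = |Q_pred - Q_obs| / Q_obs x 100."""
    return median(abs(p - o) / o * 100.0 for p, o in zip(pred, obs))


def coefficient_of_efficiency(pred, obs):
    """Nash-Sutcliffe form: CE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    o_bar = mean(obs)
    sse = sum((o - p) ** 2 for p, o in zip(pred, obs))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical validation quantiles (m3/s)
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    print("median r:", ratio_median(predicted, observed))
    print("median RE (%):", relative_error_median(predicted, observed))
    print("CE:", round(coefficient_of_efficiency(predicted, observed), 3))
```

A median ratio near 1 indicates little systematic over- or under-estimation, while the median relative error and CE summarise the spread of the estimates.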
Figure 1.3 Illustration of major steps in this research
1.7 Outline of the thesis
The research undertaken in this study is presented in nine chapters and four appendices, as
outlined below.
Chapter 1 presents a brief introduction to the research, including its background, the need
for it, the research questions examined and the main research tasks undertaken to answer
them.
Chapter 2 presents a critical review of RFFA techniques with a particular emphasis on non-
linear techniques such as artificial neural network (ANN), co-active neuro-fuzzy inference
system (CANFIS), genetic algorithm (GA) based ANN (GAANN) and gene-expression
programming (GEP). The chapter begins with a discussion of various flood estimation methods.
A review of linear methods, including the rational method, index flood method and regression
method, is then presented, followed by a discussion of non-linear artificial intelligence
based methods with a particular emphasis on their applications to hydrology. The
assumptions, limitations, advantages and disadvantages of each RFFA method are discussed.
The current state of knowledge in RFFA, in particular of artificial intelligence based
methods, is ascertained and the scope for further research is identified.
Chapter 3 describes the mathematical tools adopted in this study. First, ANN is discussed,
which is followed by a description of GAANN, GEP and CANFIS. The quantile regression
technique is then discussed. The principles of cluster analysis and principal component
analysis are then presented. Finally, the adopted model validation technique is discussed.
Chapter 4 presents the selection of the study area and study catchments, and data preparation. First,
criteria for selection of study catchments are presented. The methods of streamflow data
preparation are discussed which include gap filling, outlier detection, trend analysis and rating
curve error analysis. Selection of catchment characteristics is then presented. The preparation
of annual maximum flood series data is then described. Estimation of flood quantiles for
average recurrence intervals of 2, 5, 10, 20, 50 and 100 years for the selected gauged
catchments by at-site flood frequency analysis is then presented. Finally, a summary of the
catchment characteristics data is provided.
Chapter 5 presents the results of selecting the set of predictor variables for the development
of artificial intelligence based RFFA models. First, an initial selection is made based on the
findings of previous studies. These candidate sets of predictor variables are then evaluated
using ANN and GEP based RFFA models. The final set of predictor variables is then selected.
Chapter 6 presents the formation of regions using ANN based RFFA modelling technique.
Regions/groupings are first formed on the basis of state, geographical and climatic
boundaries. In the second step, the regions are formed in the catchment characteristics data
space based on cluster analysis and principal component analysis. All these candidate regions
are then compared and the best performing region is finally selected.
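Forming regions in catchment-characteristics space can be sketched with a simple clustering routine. This Python example is an illustrative stand-in, not the thesis's cluster analysis (the specific clustering algorithm is not stated in this outline): it standardises each attribute and applies k-means with a deterministic farthest-point start. The attributes, data and function names are hypothetical, and the PCA step is not shown.

```python
"""Candidate regions by k-means clustering of standardised catchment
characteristics (illustrative stand-in for the thesis's cluster analysis)."""
import math
from statistics import mean, stdev


def standardise(rows):
    """Convert each attribute column to z-scores."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(rows, k, max_iter=100):
    """Return a cluster label in 0..k-1 for each catchment (row)."""
    pts = standardise(rows)
    # Deterministic maximin start: the first catchment, then repeatedly the
    # catchment farthest from all centroids chosen so far.
    cents = [pts[0]]
    while len(cents) < k:
        cents.append(max(pts, key=lambda p: min(math.dist(p, c) for c in cents)))
    labels = None
    for _ in range(max_iter):
        # Assign each catchment to its nearest centroid
        new = [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in pts]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        # Move each centroid to the mean of its members (keep it if empty)
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                cents[j] = [mean(col) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical catchments: [area (km2), mean annual rainfall (mm)]
    catchments = [[20, 1500], [35, 1400], [28, 1600], [800, 500], [650, 450], [720, 550]]
    print(kmeans(catchments, k=2))  # [0, 0, 0, 1, 1, 1]
```

Each resulting cluster would be one candidate region whose RFFA performance could then be compared with regions based on state or climatic boundaries.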
Chapter 7 presents the development of artificial intelligence based RFFA models using
ANN, GAANN, GEP and CANFIS, based on the predictor variables selected in Chapter 5 and the
optimum region identified in Chapter 6. Model development involves training each model on a
randomly selected part of the data set: 80% (362 catchments) of the total 452 catchments are
used to train the model (training data set) and the remaining 20% (90 catchments) are used to
validate the model (validation data set). A number of statistical criteria are adopted to
assess the training of the four artificial intelligence based RFFA models.
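A random 80/20 split of this kind can be sketched in Python as follows. The station IDs, the seed and the function name are hypothetical; rounding 0.8 × 452 = 361.6 to the nearest integer reproduces the 362/90 split quoted above.

```python
"""Random split of catchments into training and validation sets (sketch)."""
import random


def split_catchments(ids, train_frac=0.8, seed=None):
    """Shuffle catchment IDs and return (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))  # 0.8 * 452 = 361.6 -> 362
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    station_ids = [f"site_{i:03d}" for i in range(452)]  # hypothetical station IDs
    train, valid = split_catchments(station_ids, seed=42)
    print(len(train), len(valid))  # 362 90
```

Fixing the seed makes the split reproducible, so that all four models can be trained and validated on the same catchments.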
Chapter 8 presents the validation of the artificial intelligence based RFFA models and
quantile regression technique. First, the four artificial intelligence based RFFA models are
compared with each other to select the best one.
Second, the best performing artificial intelligence based RFFA model is compared with the
quantile regression technique. The spatial distribution of the relative error for the finally
selected model is evaluated. Finally, the relationship of the relative error with catchment area
is investigated.
Chapter 9 presents the summary of the research undertaken in this thesis, conclusions and
recommendations for further research.
Appendix A presents the list of the study catchments. This provides the area of each
catchment and the period of streamflow records.
Appendix B presents additional results to supplement the discussion presented in the main
body of the thesis.
CHAPTER 2
REVIEW OF REGIONAL FLOOD FREQUENCY
ANALYSIS METHODS
2.1 General
Regional flood frequency analysis (RFFA) is the generic name given to describe techniques
which utilise data from gauged catchments (donor) in a region to estimate design floods for
poorly gauged and ungauged catchments (receiver). There are many RFFA techniques
ranging from simple approximate methods to complex artificial intelligence based techniques.
The rational method, for example, uses runoff coefficients that are derived and applied on the
principle of geographical contiguity. The index flood method is based on the concept of
homogeneous regions that share a common set of growth factors, while regression-based
approaches are based on regional prediction equations. These methods are generally based on
linear models; however, there are also non-linear RFFA methods based on artificial
intelligence, such as artificial neural networks (ANN). This chapter presents
a review of various RFFA methods, in particular the non-linear intelligence based techniques,
with a particular emphasis on the limitations of various methods, recent advancements and
scope for further developments.
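The index flood idea can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of any cited study: the index flood is taken as the mean annual flood, the regional growth factor for a given ARI is taken as the record-length-weighted average of at-site growth factors in a homogeneous region, and the index flood at the ungauged site is assumed to have been estimated separately (e.g. by regression on catchment characteristics). All numbers and names are hypothetical.

```python
"""Index flood method sketch: Q_T(site) = index flood(site) x regional growth factor(T)."""


def regional_growth_factor(at_site_quantiles, at_site_index_floods, record_lengths):
    """Record-length-weighted mean of the at-site growth factors Q_T / index flood."""
    factors = [q / mu for q, mu in zip(at_site_quantiles, at_site_index_floods)]
    weighted = sum(f * n for f, n in zip(factors, record_lengths))
    return weighted / sum(record_lengths)


def index_flood_quantile(index_flood, growth_factor):
    """Design flood at the site of interest (same units as the index flood)."""
    return index_flood * growth_factor


if __name__ == "__main__":
    # Hypothetical homogeneous region: Q100 and mean annual flood (m3/s), record length (years)
    q100 = [310.0, 540.0, 196.0]
    mean_annual_flood = [100.0, 180.0, 70.0]
    years = [30, 45, 25]
    gf = regional_growth_factor(q100, mean_annual_flood, years)
    # Index flood at the ungauged site, assumed estimated from catchment characteristics
    print(f"growth factor = {gf:.2f}; Q100 = {index_flood_quantile(120.0, gf):.1f} m3/s")
```

The approach depends on the region being homogeneous: if the at-site growth factors differed widely, a single regional factor would be a poor description of any individual site.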
2.2 Design flood estimation methods
Different methods can be used to estimate a design flood for a given annual exceedance
probability (AEP), average recurrence interval (ARI) or return period (T). The ARI associated
with a given annual peak streamflow at a location changes if there are significant changes in
the flow pattern at that location, possibly caused by an impoundment or diversion of flow.
The effect of development (change of land use from forested or agricultural uses to
commercial, residential and industrial uses) on peak flows is generally much greater for
floods of low ARI than for those of high ARI. During these larger floods, the upper soil
column is generally fully saturated and does not have the capacity to absorb much additional
rainfall. Under these conditions, essentially all of the rain that falls, whether on paved
surfaces or on saturated soil, runs off and becomes streamflow. The selection of a flood
estimation method for a given application largely depends on the data availability and the
purpose of the flood estimates (Hoang, 2001). Lumb and James (1976), Feldman (1979), James
and Robinson (1986) and Australian Rainfall and Runoff (ARR) (I. E. Australia, 1987)
classified design flood estimation methods into two broad categories: streamflow-based
methods and rainfall-based methods. These are discussed below and illustrated in Figure 2.1.
2.2.1 Streamflow-based flood estimation methods
Streamflow-based flood estimation methods base the analysis entirely on data recorded at the
stream gauging station in question and are applicable to gauged catchments with a
considerably long streamflow record. In these methods, the design floods for a given
AEP are estimated by undertaking a flood frequency analysis (FFA) of the observed
streamflow data. In this context, a gauged catchment is one where records of flood height and
flood flow exist over a considerable period of time, normally 20 years or longer, at the
location of interest, so that the parameters of the assumed probability distribution can be
estimated with a reasonably high degree of confidence. Gauging stations are generally located
within a large catchment at points of interest such as the convergence of two major creeks
or the catchment outlet. FFA and regional flood
frequency analysis (RFFA) are the most common streamflow-based methods and these are
discussed below. It should be noted that RFFA methods generally consider catchment
characteristics in estimation; however, FFA is solely dependent on streamflow records.
Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998)
Flood frequency analysis (FFA)
Flood frequency analysis (FFA) is a procedure for analysing recorded flood data using
statistical methods. Statistical techniques such as FFA are used to estimate the AEP of flood
or rainfall events. The ARI gives a general indication of how frequently a given
discharge/rainfall will be exceeded on average over a long period of time. The main
objective of this statistical analysis is to develop a relationship between the magnitude of
extreme flood events and their frequency of occurrence through the use of probability
distributions (Chow et al., 1988). For the analysis to be of practical use, simpler distributions
are often used to characterise the relation between flood magnitudes and their frequencies
(Rao and Hamed, 2000). This deals mainly with direct frequency analysis, where a record of
floods at or near the design site is available. The application of these methods is primarily
made to flood peaks. These may sometimes be applied to flood volumes or even monthly
maximum floods; however, little evidence is available on appropriate types of probability
distributions in these cases (I. E. Australia, 1998). In terms of flood data, the annual
maximum flood series is more frequently adopted in FFA than the partial duration series.
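As an illustration of fitting a probability distribution to an annual maximum series, the following Python sketch fits a log Pearson Type 3 (LP3) distribution, one of the distributions named in this thesis's abbreviation list, by the method of moments on the log10 flows, and returns quantiles for given ARIs. It is a simplified sketch under stated assumptions, not the fitting procedure of FLIKE or of the thesis: it takes AEP = 1/ARI, uses the Wilson-Hilferty approximation for the frequency factor, and the flow data and function names are hypothetical.

```python
"""At-site FFA sketch: log Pearson Type 3 (LP3) fitted by the method of moments
to log10 annual maxima, with the Wilson-Hilferty frequency-factor approximation."""
import math
from statistics import NormalDist, mean, stdev


def skewness(xs):
    """Sample coefficient of skewness with the usual small-sample adjustment."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n * sum((x - m) ** 3 for x in xs) / ((n - 1) * (n - 2) * s ** 3)


def frequency_factor(g, ari):
    """Approximate Pearson III frequency factor K for skew g and ARI (years)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / ari)  # standard normal deviate
    if abs(g) < 1e-6:
        return z  # zero skew: K equals the normal deviate
    k = g / 6.0
    return (2.0 / g) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(annual_maxima, ari):
    """Flood quantile for the given ARI, in the units of the data."""
    logs = [math.log10(q) for q in annual_maxima]
    m, s, g = mean(logs), stdev(logs), skewness(logs)
    return 10 ** (m + frequency_factor(g, ari) * s)


if __name__ == "__main__":
    # Hypothetical 20-year annual maximum series (m3/s)
    amax = [120, 85, 240, 60, 150, 95, 310, 70, 180, 110,
            45, 200, 130, 90, 400, 75, 160, 105, 55, 220]
    for ari in (2, 5, 10, 20, 50, 100):
        print(f"ARI {ari:>3} years: {lp3_quantile(amax, ari):7.1f} m3/s")
```

With only 20 years of data, the skew estimate in particular is uncertain, which is one motivation for the regional approaches discussed next.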
Regional flood frequency analysis (RFFA)
Regional flood frequency analysis (RFFA) is a means of transferring flood frequency
information from gauged catchments to another site on the basis of similarity in catchment
characteristics (I. E. Aust., 1987). This procedure is important for estimating design floods
at ungauged sites, and regional relationships can also stabilise at-site estimates,
particularly of parameters such as skew, which are more prone to small-sample errors and data
extremes. In addition, regional relationships can mitigate the effects of outliers and can
lead to more reliable extrapolation of the flood frequency curve to rarer frequencies.
Although RFFA is more commonly applied to ungauged catchments, it can also be adopted to
enhance design flood estimates at gauged sites where the record length is limited.
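One common way of using regional information to stabilise the skew is to weight the at-site and regional skew estimates in inverse proportion to their mean squared errors, as in the US Bulletin 17B procedure. It is used here only to illustrate the point above and is not attributed to the cited sources. In this Python sketch the MSE values are taken as given inputs (estimating them is a separate problem), and the numbers and function name are hypothetical.

```python
"""Weighting an at-site skew with a regional skew (illustrative sketch)."""


def weighted_skew(station_skew, station_mse, regional_skew, regional_mse):
    """Inverse-MSE weighted skew:
    G_w = (MSE_regional * G_station + MSE_station * G_regional) / (MSE_station + MSE_regional).
    The less reliable estimate (larger MSE) receives the smaller weight.
    """
    if station_mse <= 0 or regional_mse <= 0:
        raise ValueError("MSE values must be positive")
    return (regional_mse * station_skew + station_mse * regional_skew) / (
        station_mse + regional_mse
    )


if __name__ == "__main__":
    # Hypothetical short record: noisy station skew, more reliable regional skew
    print(f"weighted skew = {weighted_skew(1.2, 0.30, 0.2, 0.10):.2f}")  # 0.45
```

Because the station skew from a short record is assumed here to be three times less reliable than the regional value, the weighted skew (0.45) lies much closer to the regional estimate.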
The use of RFFA enables the transfer of flood characteristics information from gauged to
ungauged sites if the donor catchments are hydrologically similar to the receiver ungauged
site. The last couple of decades have seen extensive research on RFFA, aimed at developing new,
improved and reliable flood estimation techniques. Because of the vast area of study and the
diversity of climatic conditions and site characteristics, different researchers have
emphasised different issues relevant to RFFA. In the seventies and early eighties, much
effort was spent on developing efficient at-site FFA procedures, whereas the late eighties
were significant for the development of new and improved RFFA techniques (e.g. Greis and
Wood, 1983; Potter, 1987; Kirby and Moss, 1987; Cunnane, 1987; NRC, 1988; and WMO,
1989). In the late eighties, many researchers suggested comparing the existing RFFA methods
and seeking better information/data instead of developing new methods (Potter, 1987;
Bobee et al., 1993a).
Many RFFA methods involve two major steps: (1) grouping of sites into homogeneous
regions, and (2) developing a regional estimation method. The homogeneity of the regions
formed is the main factor governing the performance of many regional estimation methods,
in particular the index flood methods. Geographically contiguous regions have been
used for a long time in hydrology, but have been criticised as arbitrary. Indeed,
geographical proximity does not guarantee hydrological similarity. During the last
five to ten years researchers have attempted to develop methods in which similarity between
sites is defined in a multidimensional space of catchment or statistical characteristics
(Douglas, 1995).
RFFA is needed to estimate design floods at locations where there is a lack of sufficient
recorded flood data. Recorded flood data are insufficient at many locations because stream
gauges are expensive to operate and many streams are in remote locations.
Regional analyses can, to some extent, compensate for the lack of temporal data, but
introduce a spatial dimension which is not always well understood. Classical flood frequency
analysis, be it at-site or regional, has been criticised for lacking balance, putting too much
emphasis on mathematical rigour while completely neglecting the understanding of the physical
factors that cause flood events (Klemes, 1993). According to Klemes (1993), “If more light is
to be shed on the probabilities of hydrological extremes then it will have to come from more
information on the physics of the phenomena involved, not from more mathematics.” This is
difficult to argue against. RFFA, in particular the identification of the physical or
meteorological catchment characteristics that cause similarity in flood response, is a step
in the right direction (e.g. Bates et al., 1998).
2.3 Techniques for RFFA
2.3.1 Linear techniques
Three linear RFFA methods are very common and are currently in use in most parts of the
world:
Rational method;
Index flood method; and
Regression method.
Rational method
The rational method is a simple technique for estimating a design discharge from a small
watershed. The rational method was developed by Mulvany (1851) for small drainage basins
in urban areas. This method has been widely regarded as a deterministic method for
estimating the peak discharge from an individual storm. In Australian Rainfall and Runoff
(ARR), a probabilistic form of the rational method, known as the Probabilistic Rational Method,
has been recommended (I. E. Aust., 1987). Application of the rational method is based on a
simple formula that relates peak discharge with the average intensity of rainfall for a
particular length of time (the time of concentration), and catchment area. The formula is:
$Q_Y = 0.278\,C_Y\,I_{t_c,Y}\,A$ (3.1)
where
$Q_Y$ = peak discharge (m³/s) for an average recurrence interval (ARI) of Y years;
$C_Y$ = runoff coefficient (dimensionless) for an ARI of Y years;
$A$ = catchment area (km²); and
$I_{t_c,Y}$ = average rainfall intensity (mm/h) for a design duration of $t_c$ hours and an ARI of Y years.
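Equation 3.1 can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the input values are invented rather than taken from any catchment in this thesis. The factor 0.278 (about 1/3.6) converts mm/h over km² into m³/s. The second function back-calculates $C_Y$ from an at-site flood quantile, which is how a probabilistic runoff coefficient can be derived at a gauged site.

```python
def rational_peak_discharge(c_y: float, i_tc_y: float, area_km2: float) -> float:
    """Peak discharge Q_Y (m^3/s) from Eq. 3.1: Q_Y = 0.278 * C_Y * I_tc,Y * A.

    c_y:      runoff coefficient for an ARI of Y years (dimensionless)
    i_tc_y:   average rainfall intensity (mm/h) for duration t_c and ARI of Y years
    area_km2: catchment area (km^2)
    """
    if area_km2 <= 0 or i_tc_y < 0 or c_y < 0:
        raise ValueError("area must be positive; C_Y and intensity non-negative")
    # 1 mm/h over 1 km^2 = 1000 m^3/h, i.e. about 0.278 m^3/s
    return 0.278 * c_y * i_tc_y * area_km2


def runoff_coefficient(q_y: float, i_tc_y: float, area_km2: float) -> float:
    """Back-calculate C_Y from an at-site flood quantile Q_Y (m^3/s) by inverting Eq. 3.1."""
    return q_y / (0.278 * i_tc_y * area_km2)


# Illustrative values only: C_10 = 0.4, I_tc,10 = 30 mm/h, A = 25 km^2
q10 = rational_peak_discharge(0.4, 30.0, 25.0)
print(round(q10, 2))  # 83.4
```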
This model is based on the following assumptions:
The rainfall occurs uniformly over the drainage area;
The peak rate of runoff can be reflected by the rainfall intensity averaged over a time
period equal to the time of concentration of the drainage area; and
The frequency of runoff is the same as the frequency of the rainfall used in the
equation.
The rational formula is subject to several limitations and procedural issues:
The most important limitation is that the only output from the method is a peak
discharge (the method provides only an estimate of a single point on the runoff
hydrograph).
Even the simplest application of the method permits, and indeed requires, wide latitude
for subjective judgment by the user; the results are therefore difficult to replicate.
The average rainfall intensities used in the formula bear no temporal relation to the
actual rainfall pattern during the storm.
The computation of tc should include the overland flow time, plus the time of flow in
open and/or closed channels to the point of design.
The runoff coefficient $C_Y$ is usually estimated from a map of runoff coefficients, which
is produced on the assumption of geographical contiguity, i.e. that the runoff coefficients
of nearby catchments vary smoothly. This assumption is unlikely to be satisfied, as
there is no guarantee that two nearby catchments are hydrologically similar.
Many users assume that the entire drainage area is the value of A to be entered in the
rational method equation. In some cases, however, the runoff from the interconnected
impervious area alone yields the larger peak flow rate.
In Australia, the Probabilistic Rational Method has been researched by Pilgrim and
McDermott (1982), Adams (1987), Weeks (1991), Rahman et al. (2008, 2011) and Pirozzi
et al. (2009). There has been limited independent validation of the Probabilistic Rational
Method, and the user has little idea of the uncertainty in the flood quantiles it produces
(Rahman and Hollerbach, 2003).
There have been few attempts to improve the rational method through a more advanced
statistical treatment (e.g. Franchini et al., 2005).
Index flood method
The index flood method (IFM), introduced by Dalrymple (1960), is the most widely used
method of RFFA. It is based on the identification of a homogeneous region, within which the
probability distribution of annual maximum peak flows is invariant except for a scale factor
represented by the index flood (either the mean or median flood). Homogeneity with regard
to the index flood relies on the concept that the standardized flood peaks from individual sites
in the region follow a common probability distribution with identical parameter values. Of
all the methods discussed in this thesis, this approach involves the strongest assumption
of homogeneity.
The flood peak discharge with return period T at a given site is expressed as the product of
two terms: the scale factor of the site (the index flood) and a dimensionless growth factor,
which has regional validity, i.e. it is fixed within a region. In general, the index flood is taken
as the mean of the annual maximum flood peaks at the site of interest. For an ungauged site,
the index flood is estimated from a regional prediction equation that uses climate and
catchment characteristics as predictor variables.
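The two-term structure $Q_T = \mu \cdot x_T$ (index flood times growth factor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any study cited here: each gauged site's annual maxima are standardised by the site mean, a record-length-weighted regional L-CV is computed, and a Gumbel growth curve is fitted by L-moments (one common choice; other distributions are often preferred). The data and the ungauged site's index flood are synthetic, and all function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def sample_l2(values):
    """Unbiased sample L-moment lambda_2 from probability-weighted moments b0 and b1."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n   # x[j] has rank j + 1
    return 2.0 * b1 - b0


def regional_growth_factor(sites, t_years):
    """Gumbel growth factor x_T for the pooled, mean-standardised annual maxima of a region.

    sites:   list of annual-maximum flood series, one per gauged site in the region
    t_years: return period T in years (> 1)
    """
    total_n = sum(len(s) for s in sites)
    # Record-length-weighted regional L-CV; each site is standardised by its own mean (index flood)
    l_cv = sum(len(s) * sample_l2(s) / (sum(s) / len(s)) for s in sites) / total_n
    alpha = l_cv / math.log(2.0)        # Gumbel scale for data with mean 1 (lambda_2 = alpha ln 2)
    xi = 1.0 - EULER_GAMMA * alpha      # Gumbel location giving a regional mean of 1
    return xi - alpha * math.log(-math.log(1.0 - 1.0 / t_years))


def index_flood_quantile(index_flood, sites, t_years):
    """Q_T = (index flood of the target site) x (regional growth factor x_T)."""
    return index_flood * regional_growth_factor(sites, t_years)


# Synthetic annual maxima (m^3/s) for three gauged sites assumed to form a homogeneous region
region = [
    [120, 85, 240, 160, 95, 310, 130],
    [40, 55, 32, 90, 61, 48],
    [700, 520, 980, 610, 1500, 820, 760, 590],
]
# Index flood of an ungauged site (assumed value, e.g. from a regional prediction equation)
print(round(index_flood_quantile(150.0, region, 100), 1))
```

Because the growth factor is shared by every site in the region, the homogeneity assumption discussed above is built directly into `regional_growth_factor`.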
The literature contains numerous studies on the identification of homogeneous groups of
catchments and the estimation of the growth factor (Reed et al., 1999; Burn and Goel, 2000;
Castellarin et al., 2001), and relatively few on estimating the index flood. Recent studies in
Australia (Bates et al., 1998; Rahman et al., 1999) assigned ungauged catchments to
homogeneous groups, identified through the use of L-moments (Hosking and Wallis, 1993),
on the basis of catchment and climatic characteristics rather than geographical proximity.
However, a deficiency of this approach was already evident in that it required 12
catchment/climatic descriptors; its practical use is therefore somewhat limited by its
complexity and the time needed to gather the relevant data. Internationally, Fill and
Stedinger (1998) and Jeong et al. (2008) both demonstrated that the IFM can provide
improved quantile estimates when different sources of error, such as sampling error and
error due to inter-station correlation, are reduced. Because Australia is hydrologically
extremely diverse, there is greater heterogeneity among catchments, and the use of the IFM
in Australia is limited (Bates et al., 1998), as IFM results would be subject to substantial error.
A method is therefore needed in Australia in which the assumption of homogeneity can be
relaxed and heterogeneity can be accounted for by capturing the site-to-site variability
within a region. One such approach is the quantile regression technique, which is discussed
below.
Australian Rainfall & Runoff (ARR) (I. E. Aust., 1987) did not favour the IFM as a design
flood estimation technique. The IFM has been criticised on the basis that the coefficient of
variation of the flood series may vary approximately inversely with catchment area, resulting
in flatter flood frequency curves for larger catchments. This has been noticed particularly for
humid catchments that differ greatly in size (Dawdy, 1961; Benson, 1962; Riggs, 1973;
Smith, 1992).
L-moments-based index flood methods have been widely researched in recent years (e.g.
Bates et al., 1998; Rahman et al., 1999; Zhang and Hall, 2004; Saf, 2009).
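L-moment-based methods rest on sample L-moments computed from probability-weighted moments (PWMs) $b_r$ of the ordered sample. The sketch below computes the unbiased PWMs and the L-moment ratios (L-CV, L-skewness, L-kurtosis) that are commonly used to summarise at-site data when forming and testing regions. It is a plain-Python illustration; the function names and the sample values are hypothetical.

```python
def pwm(sorted_x, r):
    """Unbiased probability-weighted moment b_r of an ascending-sorted sample."""
    n = len(sorted_x)
    total = 0.0
    for j, xj in enumerate(sorted_x):       # j = 0..n-1 corresponds to rank i = j + 1
        weight = 1.0
        for k in range(1, r + 1):
            weight *= (j + 1 - k) / (n - k)  # product of (i - k) / (n - k), k = 1..r
        total += weight * xj
    return total / n


def l_moment_ratios(values):
    """Sample L-mean, L-CV (t), L-skewness (t3) and L-kurtosis (t4)."""
    x = sorted(values)
    if len(x) < 4:
        raise ValueError("at least 4 observations are needed for lambda_4")
    b0, b1, b2, b3 = (pwm(x, r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"l1": l1, "l_cv": l2 / l1, "l_skew": l3 / l2, "l_kurt": l4 / l2}


# Synthetic annual maximum series (m^3/s)
print(l_moment_ratios([120, 85, 240, 160, 95, 310, 130]))
```

For a symmetric sample such as 1, 2, 3, 4, 5 the L-skewness is zero, which is a quick sanity check on the PWM weights.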
Regression method
The quantile regression technique (QRT) for flood estimation was proposed by the United
States Geological Survey (USGS). In this method, a large number of gauged catchments are
selected from a region and flood quantiles are estimated from recorded streamflow data;
these quantiles are then regressed against the climatic and catchment variables that are most
likely to govern the flood generation process. Studies by Benson (1962) suggested that the
T-year flood quantile could be estimated directly from catchment characteristics data by
multiple regression analysis. Unlike the index flood approach, this method does not assume a
constant coefficient of variation (Cv) of the annual maximum flood series in the region. It
has been noted that the method can give design flood estimates that do not vary smoothly
with T; in such situations, hydrological judgment can be exercised to adjust the flood
frequency curves so that they increase smoothly with T.
The regression coefficients in the QRT are generally estimated by two methods:
Ordinary least squares approach (OLS)
Generalised least squares approach (GLS)
The OLS approach has traditionally been used by hydrologists to estimate the regression
coefficients in regional hydrological models. However, for the OLS model to be statistically
efficient and robust, the annual maximum flood series in the region must be uncorrelated, all
sites in the region should have equal record lengths, and all estimates of T-year events
should have equal variance. Since annual maximum flow data in a region do not generally
satisfy these conditions, the OLS assumption of homoscedastic model residual errors is
violated, and the OLS approach can provide distorted estimates of the model’s predictive
precision (model error) and of the precision with which the regression model
coefficients are estimated (Stedinger and Tasker, 1985).
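A QRT model is usually linear in log space, e.g. $\log_{10} Q_T = \beta_0 + \beta_1 \log_{10} A + \beta_2 \log_{10} I$. The sketch below fits such a model by OLS using the normal equations $\hat\beta = (X^\top X)^{-1} X^\top y$. The predictors (catchment area and a design rainfall intensity) and all data are hypothetical; the quantiles are generated from a known model so that the fit can be checked. It uses plain Python with a small Gaussian-elimination solver rather than any specific library.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                          # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ols_fit(X, y):
    """OLS coefficients (X'X)^-1 X'y; each row of X starts with 1 for the intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


areas = [10.0, 50.0, 120.0, 300.0, 800.0, 25.0]        # km^2 (synthetic)
intensities = [40.0, 35.0, 28.0, 22.0, 18.0, 45.0]      # mm/h (synthetic)
# Synthetic Q_T generated from a known model so the recovered coefficients can be checked
q_t = [10 ** (0.5 + 0.7 * math.log10(a) + 1.2 * math.log10(i)) for a, i in zip(areas, intensities)]

X = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(areas, intensities)]
y = [math.log10(q) for q in q_t]
beta = ols_fit(X, y)
print([round(b, 3) for b in beta])  # [0.5, 0.7, 1.2]
```

Every site receives equal weight here, which is exactly the simplification criticised above when record lengths differ and concurrent flows are correlated.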
Stedinger and Tasker (1985) proposed the GLS procedure to overcome these problems with
OLS. This approach can be used to estimate the parameters of regional hydrologic regression
models and can produce more accurate results than OLS, particularly when record lengths
vary widely from site to site. In the GLS model, the assumptions of equal variance of the
T-year events and zero cross-correlation of concurrent flows are relaxed. A number of
studies (e.g. Tasker, 1980; Kuczera, 1983; Tasker et al., 1986; Rosbjerg and Madsen, 1995;
Madsen et al., 1997; Pandey and Nguyen, 1999; Bayazit and Onoz, 2004; Griffis and
Stedinger, 2007; Kjeldsen and Jones, 2009) have dealt with the QRT in a GLS regression
framework; all of these studies have examined ways of minimising uncertainty in flood
quantile estimation.
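In GLS, the residual covariance is modelled as $\Lambda = \gamma^2 I + \Sigma$, where $\gamma^2$ is the model error variance and $\Sigma$ is the sampling covariance of the at-site quantile estimates, and the coefficients are $\hat\beta = (X^\top \Lambda^{-1} X)^{-1} X^\top \Lambda^{-1} y$. The sketch below is a simplified illustration under loud assumptions: $\Sigma$ is built from a hypothetical rule (variance $\sigma^2/n_i$, constant cross-correlation $\rho$), and $\gamma^2$ is treated as known, whereas Stedinger and Tasker (1985) estimate it from the data. All data and names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the dense linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def sampling_covariance(record_lengths, sigma, rho):
    """Hypothetical simplification: Var_i = sigma^2 / n_i and Cov_ij = rho * sd_i * sd_j."""
    sd = [sigma / math.sqrt(n) for n in record_lengths]
    return [[sd[i] * sd[j] * (1.0 if i == j else rho) for j in range(len(sd))] for i in range(len(sd))]


def gls_fit(X, y, sampling_cov, gamma2):
    """GLS coefficients (X' L^-1 X)^-1 X' L^-1 y with L = gamma2 * I + sampling_cov."""
    n, p = len(X), len(X[0])
    lam = [[sampling_cov[i][j] + (gamma2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    linv_x = [solve(lam, [X[i][k] for i in range(n)]) for k in range(p)]   # columns of L^-1 X
    linv_y = solve(lam, y)
    a = [[sum(X[i][r] * linv_x[c][i] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * linv_y[i] for i in range(n)) for r in range(p)]
    return solve(a, b)


# Synthetic example: six sites with unequal record lengths and modest inter-site correlation
areas = [10.0, 50.0, 120.0, 300.0, 800.0, 25.0]
intensities = [40.0, 35.0, 28.0, 22.0, 18.0, 45.0]
record_lengths = [12, 45, 30, 60, 25, 18]
X = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(areas, intensities)]
noise = [0.03, -0.02, 0.01, -0.04, 0.02, 0.0]
y = [0.5 + 0.7 * row[1] + 1.2 * row[2] + e for row, e in zip(X, noise)]
cov = sampling_covariance(record_lengths, sigma=0.25, rho=0.3)
beta = gls_fit(X, y, cov, gamma2=0.01)
print([round(b, 3) for b in beta])
```

Sites with short records get larger sampling variance and therefore less weight, and correlated sites are not counted as independent information; with $\Sigma = 0$ the estimator reduces to OLS.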
Regression-based methods have been a focus of recent Australian research on flood quantile
estimation, for example the quantile regression technique (Rahman, 2005; Haddad et al., 2006,
2008, 2009, 2014) and the parameter regression technique (Hackelbusch et al., 2009; Haddad
and Rahman, 2012).
Different regional flood estimation methods have been proposed for different parts of
Australia (I. E. Aust., 1987, 2001). Among these, various forms of the rational method and the
index flood method are the most common. However, these methods have not been updated
since 1987. Because of changing climatic conditions and improvements in regional flood
estimation methods in recent years, there is a need to look for new regional flood estimation
techniques for different parts of Australia. Recent developments in regional flood
estimation in Australia include the L-moments-based index flood method (Bates et al., 1998;
Rahman et al., 1999) and various forms of regression techniques (Rahman, 2005; Haddad et al.,
2006, 2008, 2009; Hackelbusch et al., 2009).
Most of the above RFFA methods assume a linear relationship between flood statistics and
predictor variables. However, most hydrologic processes are nonlinear and exhibit a
high degree of spatial and temporal variability. Nonlinear methods such as artificial neural
networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), co-active neuro-fuzzy
inference systems (CANFIS), gene expression programming (GEP), genetic algorithms (GA)
and genetic algorithm based artificial neural networks (GAANN) have been applied in
hydrology in different parts of the world. However, there has not been any notable
application of these techniques to the RFFA problem in Australia. Nonlinear techniques may
help in developing new, improved regional flood estimation methods for Australia. Unlike
regression-based approaches, they do not impose a fixed model structure on the data; rather,
the data themselves identify the model form through the use of artificial intelligence.
Various nonlinear RFFA methods are discussed below.
2.3.2 Non-linear RFFA techniques
a) Artificial neural network (ANN)
An ANN is a mathematical or computational model that simulates the structure and/or
functional aspects of biological neural networks. Structurally, it is an interconnected group
of artificial neurons that processes information using a connectionist approach to computation.
In most cases, an ANN is an adaptive system that changes its structure based on external or
internal information that flows through the network during the learning phase. An important
aspect of ANNs is their ability to model complex relationships between inputs and outputs or
to find patterns in data.
The development of ANNs began in the 1940s (McCulloch and Pitts, 1943), inspired by a
desire to understand the human brain and emulate its functioning. ANNs have since
experienced a huge resurgence owing to the development of more sophisticated algorithms
and the emergence of powerful computational tools. Extensive research has been devoted to
investigating the potential of ANNs as computational tools that acquire, represent and
compute a mapping from one multivariate input space to another (Wasserman, 1989).
ANN techniques experienced a renaissance in the 1980s owing to the work of Hopfield (1982)
on iterative auto-associative neural networks. Interest in this computational mechanism grew
tremendously after Rumelhart et al. (1986) rediscovered a mathematically rigorous theoretical
framework for neural networks, i.e. the back-propagation algorithm. ANNs have since been
applied in fields such as neurophysiology, physics, biomedical engineering, electrical
engineering, computer science, acoustics, cybernetics, robotics, image processing and finance.
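As an illustration of the back-propagation idea, the sketch below trains a feedforward network with one hidden layer by online gradient descent on squared error, propagating the output error back through the hidden layer. It is a minimal plain-Python sketch, not the network configuration used later in this thesis: the tanh hidden units, linear output, learning rate, number of epochs and the synthetic, pre-scaled data are all assumptions.

```python
import math
import random


def train_mlp(inputs, targets, n_hidden=3, lr=0.1, epochs=2000, seed=1):
    """Train a one-hidden-layer network (tanh hidden, linear output) by online backpropagation.

    Returns a predict(x) function and the mean squared error recorded in each epoch.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    # w1[h]: weights from each input to hidden unit h (last entry is the bias)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # w2: weights from each hidden unit to the output (last entry is the bias)
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(wh, x)) + wh[-1]) for wh in w1]
        out = sum(w * h for w, h in zip(w2, hidden)) + w2[-1]
        return hidden, out

    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in zip(inputs, targets):
            hidden, out = forward(x)
            err = out - t                        # dE/d(out) for E = 0.5 * err^2
            sq_err += err * err
            # Hidden-layer deltas use the output weights before they are updated
            deltas = [err * w2[h] * (1.0 - hidden[h] ** 2) for h in range(n_hidden)]
            for h in range(n_hidden):
                w2[h] -= lr * err * hidden[h]
            w2[-1] -= lr * err
            for h in range(n_hidden):
                for i in range(n_in):
                    w1[h][i] -= lr * deltas[h] * x[i]
                w1[h][-1] -= lr * deltas[h]
        history.append(sq_err / len(inputs))

    return (lambda x: forward(x)[1]), history


# Synthetic, already-scaled inputs on a 5 x 5 grid; the target is a nonlinear function of them
inputs = [[a / 2, b / 2] for a in range(-2, 3) for b in range(-2, 3)]
targets = [0.6 * x1 - 0.4 * x2 * x2 for x1, x2 in inputs]
predict, history = train_mlp(inputs, targets, n_hidden=4, lr=0.05, epochs=1000)
print(f"MSE first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The number of hidden neurons (`n_hidden`) is the main architectural choice; it controls how much nonlinearity the network can represent and how many weights must be estimated from limited data.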
In the early 1990s, ANNs began to be applied successfully in hydrology, initially for
rainfall-runoff modelling, streamflow forecasting, groundwater modelling, water quality,
water management policy, precipitation forecasting, hydrologic time series modelling and
reservoir operations.
Application of ANN in hydrology
Most hydrologic processes are highly nonlinear and exhibit a high degree of spatial and
temporal variability. They are further complicated by uncertainty in parameter estimates.
Hydrologists are often confronted with problems of prediction and estimation of quantities
such as runoff, precipitation, contaminant concentrations, and water stages. This kind of
information is required in hydrologic and hydraulic engineering design as well as water
resources management (ASCE, 2000).
The application of neural networks in hydrological modelling was inspired by work on
forecasting mappings (predictors) for chaotic dynamic systems (Farmer and Sidorowich,
1987). This work followed a theorem, proven by Takens (1981), that there exists a
smooth function that is a predictor of a dynamic system featuring an attractor with a
finite fractal dimension.
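In practice, Takens-type results are exploited by feeding a network lagged copies of an observed series (a delay embedding) and asking it to predict a future value. The sketch below builds such input/target pairs; the embedding dimension, lag and forecast horizon are user choices (assumptions here), and the function name is hypothetical.

```python
def delay_embed(series, dim, lag=1, horizon=1):
    """Build (input vector, target) pairs from a scalar time series.

    Each input is [x(t - (dim-1)*lag), ..., x(t - lag), x(t)] and the target is x(t + horizon).
    """
    span = (dim - 1) * lag
    pairs = []
    for t in range(span, len(series) - horizon):
        window = [series[t - k * lag] for k in range(dim - 1, -1, -1)]   # oldest value first
        pairs.append((window, series[t + horizon]))
    return pairs


# Example: three lagged values are used to predict the next one
print(delay_embed([1, 2, 3, 4, 5, 6], dim=3))
# [([1, 2, 3], 4), ([2, 3, 4], 5), ([3, 4, 5], 6)]
```

A time delay neural network of the kind mentioned below uses essentially this arrangement of lagged inputs.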
The ASCE (2000) Task Committee on Application of ANN in Hydrology stated that ANNs
would have to be classified as empirical models. The approach is called a “model” because it
has many features in common with other modelling approaches in hydrology. Empirical models
treat hydrologic systems (such as a watershed) as a black box and try to find a relationship
between historical inputs (rainfall, temperature, etc.) and outputs (such as watershed runoff
measured at a stream gauge). Lumped catchment models fall under this category (Blackie and
Eeles, 1985). These methods need long historical records, have no physical basis and, as
such, are not applicable to ungauged catchments. The committee suggested that physical
understanding can be useful in selecting the appropriate neural network (ASCE, 2000). As
ANNs are a heavily data-based technique, the committee also suggested that optimal data may
be provided, subject to limitations and certain conditions based on existing sites.
An improvement over these kinds of models is offered by geomorphology-based models (e.g.
Gupta and Waymire, 1993; Corradini and Singh, 1985). These models represent the watershed
structure and the stream network well, but various assumptions concerning the linearity of
response of individual watershed units (streams and overland sections) must be made.
ANNs have been used in many rainfall and runoff forecasting applications. For example, Luk
et al. (2001) used ANN models for rainfall forecasting in Australia. They identified three
types of ANN suitable for this application: the multilayer feedforward neural network
(MLFN), the Elman partial recurrent neural network (Elman) and the time delay neural
network (TDNN). They found that these ANN models can make reasonable forecasts of
rainfall one time step (15 minutes) ahead for 16 gauges concurrently.
Zhang and Govindaraju (2003) took a different approach, applying a geomorphology-based
ANN (GANN) to estimate direct runoff from watersheds in Indiana, USA. They concluded that
GANN offers a promising step towards elevating ANNs from purely empirical models to
models based on geomorphology. In their analysis, GANN outperformed geomorphologic
instantaneous unit hydrograph (GIUH) models.
Abrahart and See (2007) concluded that the power of an ANN model depends on a
reduced set of inputs. Chokmani et al. (2008) compared ANN and multiple regression
techniques for ice-affected streamflow estimation in Canada. They used nine different
variables as inputs and found that the ANN outperformed the regression techniques.
There have been some applications of ANN models in RFFA. Muttiah et al. (1997) used ANNs
to predict the 2-year flood in USA catchments. For each gauging station, the 2-year peak
discharge, drainage area, basin elevation and average slope were extracted from a file
containing 150 variables per station for statistical and neural network analysis. They
concluded that ANNs can provide reasonable estimates of the Q2 discharge with simpler input
requirements (reduced input vectors). Kothyari (2004) used ANNs for flood estimation at
ungauged catchments in India. He selected data from 97 catchments spread over a large part
of India, with areas ranging from 14.5 km² to 935,000 km². He considered five catchment
characteristics as predictor variables: mean annual flood discharge, area, catchment slope,
rainfall and vegetation cover. He compared two scenarios: Scenario 1, with 12 neurons in the
hidden layer, and Scenario 2, with only one neuron in the hidden layer. Scenario 2 provided
the best results, with the minimum error and the best R² values for the training, validation and
testing data sets. He also reported that an ANN model with a more complex architecture than
that of Scenario 1 did not produce any better results. He suggested that the results from ANN
models can be improved if regions are formed on the basis of hydro-meteorological similarity.
Dawson et al. (2006) applied ANNs with different site descriptors to flood estimation at
ungauged catchments in the UK. They found that ANNs could be used to estimate flood
statistics for ungauged catchments quite successfully. Although the ANNs in their study were
trained to model T-year flood magnitudes derived from the Gumbel distribution, they could just
as easily be trained to model floods derived from any other distribution. Conventional
statistical approaches could also have been used to build models for predicting T-year flood
events, but the ANN proved superior in their study. However, a few caveats were noted.
Firstly, the ANN was heavily data dependent, as highlighted by the improvement in skill
achieved by training on the full available data set instead of a limited (urban) data set.
Secondly, the ANN could not explicitly account for physical processes, which reduced
confidence in model predictions. Finally, despite limiting the analysis to sites with at least
ten years of record, the limited data at certain sites meant that some T-year flood events and
index floods could be grossly under- or over-estimated. This was exacerbated when the data
included periods of long-term drought or above-average long-term rainfall. In those cases, the
ANN might be predicting the T-year flood event accurately, but with only limited observed
data, evaluating its skill could be problematic. Dawson et al. (2006) recommended
partitioning the data on the basis of size, geology and climatic conditions. They also
recommended applying other models, such as radial basis function networks and support
vector machines.
Turan and Yurdusev (2009) applied feedforward backpropagation neural networks,
generalized regression neural networks and fuzzy logic to estimate unmeasured data using
data from four runoff gauging stations on the Birs River in Switzerland. The performance of
these models was measured by the mean square error, the coefficient of determination and the
coefficient of efficiency in order to choose the best-fit model. Of these techniques, they found
that the feedforward backpropagation (FFBP) model should be selected over the others for
predicting station flows. They concluded that the best method for modelling river flows
should be sought case by case, since flow values reflect the specific characteristics of the
basin feeding the river and climatic conditions that may vary from year to year. Such
exercises may be useful in practice to estimate the missing values of a downstream station
from those of upstream stations.
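The three performance measures named above can be computed as in the sketch below. Definitions vary between studies, so these are assumptions rather than the exact formulas of Turan and Yurdusev (2009): here the coefficient of determination is taken as the squared Pearson correlation, and the coefficient of efficiency as the Nash–Sutcliffe form $1 - \mathrm{SSE}/\mathrm{SST}$. The function name and data are hypothetical.

```python
def performance(observed, predicted):
    """Mean squared error, squared correlation (R^2) and Nash-Sutcliffe coefficient of efficiency."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "mse": sse / n,
        "r2": cov * cov / (sst * var_p),   # insensitive to a constant bias in the predictions
        "ce": 1.0 - sse / sst,             # penalises bias; can be negative for poor models
    }


print(performance([10, 20, 30, 40], [12, 18, 33, 37]))
```

The two measures differ in an important way: a model that is perfectly correlated with the observations but systematically biased scores R² = 1, whereas its coefficient of efficiency is reduced by the bias, which is why both are often reported together.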
The application of ANNs requires careful consideration, as highlighted by Maier and Dandy
(2000), who reviewed 43 papers dealing with the use of ANN models for the prediction and
forecasting of water resources variables. They found that feedforward networks were used in
all but two of the papers reviewed, and that the vast majority of these networks were trained
using the backpropagation algorithm. They noted that issues relating to the optimal division
of the available data, data pre-processing and the choice of appropriate model inputs were
seldom considered. In addition, the process of choosing appropriate stopping criteria and
optimising the network geometry and internal network parameters was generally poorly
described or inadequately carried out. All of these factors could result in non-optimal model
performance and an inability to draw meaningful comparisons between different models.
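Three of the issues raised by Maier and Dandy (2000), namely data division, pre-processing and stopping criteria, can be made explicit in code. The sketch below is one simple way to handle each, under assumptions of our own: a seeded random split (stratified splits are often preferable for small RFFA data sets), standardisation using training-set statistics only (so that validation and test data do not leak into the scaling), and early stopping with a patience counter on the validation error. All names are hypothetical, and the standardiser assumes no predictor column is constant.

```python
import random


def split_data(records, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle with a fixed seed and split into training, validation and test subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:n_train + n_val], shuffled[n_train + n_val:]


def fit_standardiser(train_rows):
    """Return a function that scales rows using means and standard deviations of the training rows only."""
    cols = list(zip(*train_rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5 for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, sds)]


def early_stopping_epoch(val_errors, patience=10):
    """Epoch with the lowest validation error, stopping after `patience` epochs without improvement."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


train, val, test = split_data(list(range(10)))
print(len(train), len(val), len(test))
```

Recording the split seed, the scaling statistics and the stopping rule in this way also makes it possible to compare different models on an equal footing.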
One limitation of ANNs is that, like other empirical methods, they are unable to reliably
extrapolate beyond the range of the data used for model calibration (Flood and Kartam, 1994;
Minns and Hall, 1996; Tokar and Johnson, 1999). This well-known limitation of data-driven
models arises primarily because they are not based on the underlying physics. Physically
based models tend to extrapolate better to inputs outside the range of the calibration data, as
the mass and energy constraints they comply with may still produce an appropriate response.
Accordingly, it can be very difficult to determine when data-driven models such as ANNs will
fail to generalise, and to understand their range of applicability. This limitation applies
equally to other RFFA techniques, e.g. the index flood method, the rational method and
regression-based methods.
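A simple safeguard against unintended extrapolation is to check, before applying any regional model to an ungauged catchment, whether its predictor values lie inside the range spanned by the calibration catchments. The sketch below does this one variable at a time; it is a hypothetical helper with invented data, and a per-variable check cannot detect combinations of predictors that fall outside the joint data cloud even when each value is individually within range.

```python
def out_of_range(calibration_rows, names, target):
    """List the predictors of `target` lying outside the min-max range of the calibration catchments."""
    flagged = []
    for k, name in enumerate(names):
        column = [row[k] for row in calibration_rows]
        lo, hi = min(column), max(column)
        if not lo <= target[k] <= hi:
            flagged.append((name, target[k], lo, hi))
    return flagged


# Synthetic calibration set: catchment area (km^2) and design rainfall intensity (mm/h)
calibration = [[12, 40], [150, 32], [900, 25]]
print(out_of_range(calibration, ["area_km2", "intensity_mm_h"], [2500, 30]))
# [('area_km2', 2500, 12, 900)]
```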
ANNs have been used in various parts of the world; however, their application to RFFA
is very limited. In Australia, ANNs have been applied to hydrological problems other
than RFFA, but to the author’s knowledge there is no notable ANN-based RFFA study in
Australia.
b) Genetic algorithm based artificial neural network (GAANN)
The genetic algorithm (GA) was invented by John Holland during the 1960s and 1970s
(Holland, 1975) and was popularised by one of his students, who solved a difficult problem
involving the control of gas-pipeline transmission in his dissertation (Goldberg, 1989). The
GA concept is derived from biological evolution. The major difference between GA and
classical optimisation search techniques is that GA works with a population of possible
solutions, whereas classical optimisation techniques work with a single solution (Jain et al.,
2005). GA is based on a Darwinian-type survival-of-the-fittest strategy, whereby potential
solutions to a problem compete and mate with each other to produce increasingly stronger
individuals. Each individual in the population represents a potential solution to the problem
and is referred to as a chromosome (Rooji et al., 1996).
A number of selection techniques have been developed, such as ‘roulette wheel’ (Holland,
1975), ‘stochastic universal sampling’ (Baker, 1987), ‘sigma scaling or truncation’
(Goldberg, 1989), ‘Boltzmann selection’ (de la Maza and Tidor, 1993), ‘rank selection’
(Baker, 1985) and ‘tournament selection’ (Goldberg and Deb, 1991); however, their
success and utility depend on the nature of the problem at hand.
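Two of the selection schemes listed above are easy to state in code. The sketch below gives textbook forms of roulette wheel (fitness-proportionate) selection and tournament selection; it is an illustration in plain Python, with hypothetical function names, and it assumes that larger fitness is better and that fitness values are non-negative for the roulette wheel.

```python
import random


def roulette_wheel(population, fitness, rng):
    """Fitness-proportionate selection: P(select i) = f_i / sum(f)."""
    total = sum(fitness)
    pick = rng.random() * total            # uniform point on [0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running > pick:
            return individual
    return population[-1]                  # guard against floating-point round-off


def tournament(population, fitness, rng, k=2):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


rng = random.Random(0)
pop = ["a", "b", "c", "d"]
fit = [1.0, 4.0, 2.0, 3.0]
print([roulette_wheel(pop, fit, rng) for _ in range(5)], tournament(pop, fit, rng, k=3))
```

The tournament size `k` tunes selection pressure without rescaling fitness, whereas the roulette wheel depends directly on the fitness magnitudes, which is one reason why the suitability of each scheme depends on the problem.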
Application of GA in hydrology
In the fields of hydrology and water resources, although GA techniques have been used
widely to solve a number of water resources problems (Wang, 1991; Franchini, 1996;
Franchini and Galeati 1997; Savic et al., 1999; Khu et al., 2001; Cheng et al., 2002), the
combined use of GA and ANN, i.e. GAANN, has not yet attracted much attention from
researchers. A probable reason is that the backpropagation (BP) algorithm is much simpler
and easier to understand than GA; hence, most ANN applications in the literature have used
the BP algorithm, and GA–ANN hybrid applications in the water resources field are limited.
In one such hybrid study, Jain and Srinivasulu (2004) demonstrated that GA is better than BP
for training an ANN model to predict daily flows.
Morshed and Kaluarachchi (1998) conducted experiments to compare GA and BP in
streamflow and transport simulations. They reported better performance of BP than GA, but
noted that their results were based on a single set of simulations, and that more research is
needed to develop GA as a complementary alternative to BP for situations where BP may
fail. See and Openshaw (1999) recombined a series of neural networks via a rule-based
fuzzy logic model that had been optimised using a GA. Abrahart et al. (1999) also used a GA
to optimise the inputs to an ANN model used to forecast runoff from a small catchment. Rao
and Jamieson (1997) used a hybrid neural network and genetic algorithm approach to
investigate the minimum-cost design of a pump-and-treat aquifer remediation scheme. Wu
and Chau (2006) applied neural networks and GA to flood forecasting in a reach of the
middle section of the Yangtze River in China. All three techniques, i.e. ANN, GA and
GAANN, were applied separately. They concluded that, when care was taken to avoid
over-fitting, the hybrid GAANN model produced more accurate flood predictions for the
channel. According to the authors, models such as ANN and the hybrid GAANN could be
considered feasible alternatives to conventional models, and it would be worth exploring
different types of hybrid techniques.
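The core idea of GAANN, searching the network weights with a GA instead of computing gradients by backpropagation, can be sketched as below. This is a minimal illustration and not the GAANN formulation of any study cited here: the one-input network with three tanh hidden units, the real-valued chromosome (one gene per weight), size-3 tournament selection, uniform crossover, Gaussian mutation, elitism and the synthetic target $y = x^2$ are all assumptions.

```python
import math
import random

N_HIDDEN = 3
N_WEIGHTS = 3 * N_HIDDEN + 1      # per hidden unit: input weight, bias, output weight; plus output bias


def ann_output(weights, x):
    """One-input, one-output feedforward network with N_HIDDEN tanh units; weights is a flat list."""
    out = weights[-1]
    for h in range(N_HIDDEN):
        w_in, bias, w_out = weights[3 * h:3 * h + 3]
        out += w_out * math.tanh(w_in * x + bias)
    return out


def mse(weights, xs, ys):
    return sum((ann_output(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evolve(xs, ys, pop_size=40, generations=200, sigma=0.1, seed=3):
    """Real-coded GA that searches the network weights directly (no backpropagation)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        errors = [mse(w, xs, ys) for w in pop]
        elite = min(range(pop_size), key=lambda i: errors[i])
        history.append(errors[elite])
        new_pop = [pop[elite][:]]                 # elitism: the best individual survives unchanged
        while len(new_pop) < pop_size:
            # Tournament selection (size 3) of two parents; lower error means fitter
            p1 = min(rng.sample(range(pop_size), 3), key=lambda i: errors[i])
            p2 = min(rng.sample(range(pop_size), 3), key=lambda i: errors[i])
            # Uniform crossover, then Gaussian mutation applied to every gene
            child = [(pop[p1][g] if rng.random() < 0.5 else pop[p2][g]) + rng.gauss(0.0, sigma)
                     for g in range(N_WEIGHTS)]
            new_pop.append(child)
        pop = new_pop
    errors = [mse(w, xs, ys) for w in pop]
    best = min(range(pop_size), key=lambda i: errors[i])
    history.append(errors[best])
    return pop[best], history


# Synthetic one-dimensional example: learn y = x^2 on [-1, 1]
xs = [i / 4 for i in range(-4, 5)]
ys = [x * x for x in xs]
best, history = evolve(xs, ys)
print(f"best MSE: first generation {history[0]:.4f}, final {history[-1]:.4f}")
```

Because the GA only needs the error of each candidate network, it does not require differentiable activations and explores many weight sets in parallel; the price is many more network evaluations than gradient-based training.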
In the field of RFFA, there are few studies using BPANN (Dawson et al., 2005; Aziz et al.,
2013), and to the best of the author’s knowledge there has been no notable application of
GAANN to RFFA, especially using Australian conditions and data.
c) Gene-expression programming (GEP)
Like GA and genetic programming (GP), gene-expression programming (GEP) is a genetic
algorithm: it uses a population of individuals, selects them according to fitness, and
introduces genetic variation using one or more genetic operators (Mitchell, 1996). The fundamental difference
between the three algorithms resides in the nature of the individuals: in GA the individuals are
linear strings of fixed length (chromosomes); in GP the individuals are nonlinear entities of
different sizes and shapes (parse trees); and in GEP the individuals are encoded as linear
strings of fixed length (the genome or chromosomes) which are afterwards expressed as
nonlinear entities of different sizes and shapes (i.e., simple diagram representations or
expression trees).
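The genotype-to-phenotype step in GEP can be illustrated with Ferreira's Karva notation, in which a fixed-length linear gene is read level by level (breadth-first) to build an expression tree, and any symbols left over form a non-coding region. The sketch below decodes a gene, prints the resulting expression and evaluates it; the function set (`+ - * /` and `Q` for square root), the terminal names and the function names are illustrative assumptions. It also includes Ferreira's rule for the tail length, $t = h(n_{max} - 1) + 1$, which guarantees that any head can be completed by terminals from the tail.

```python
import math

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b, "Q": math.sqrt}   # Q = square root
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def tail_length(head_length, max_arity=2):
    """GEP tail length t = h * (n_max - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def decode(k_expression):
    """Translate a Karva string into a (symbol, children) tree, filling the tree breadth-first."""
    nodes = [(sym, []) for sym in k_expression]
    next_free, i = 1, 0
    while i < next_free:                       # only symbols reached so far belong to the tree
        sym, children = nodes[i]
        arity = ARITY.get(sym, 0)
        if next_free + arity > len(nodes):
            raise ValueError("gene too short to complete the expression tree")
        children.extend(nodes[next_free:next_free + arity])
        next_free += arity
        i += 1
    return nodes[0]                            # symbols after next_free are non-coding


def to_infix(node):
    sym, children = node
    if not children:
        return sym
    if sym == "Q":
        return f"sqrt({to_infix(children[0])})"
    return f"({to_infix(children[0])} {sym} {to_infix(children[1])})"


def evaluate(node, env):
    sym, children = node
    if sym not in OPS:
        return env[sym]                        # terminal: look up the input variable
    return OPS[sym](*(evaluate(c, env) for c in children))


tree = decode("Q*-+abcd")
print(to_infix(tree))                                   # sqrt(((a - b) * (c + d)))
print(evaluate(tree, {"a": 5, "b": 1, "c": 2, "d": 7}))   # 6.0
```

Because the evolved individual decodes to an explicit formula, the result can be read and checked by a hydrologist, which is the transparency advantage of GEP discussed below.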
GEP is an evolutionary computing method that generates a ‘transparent’ and structured
representation of the rainfall-runoff system being studied. The nature of GEP allows the user
to gain additional information on how the system performs, i.e., gives an insight into the
relationship between input (e.g. rainfall and evaporation) and output (flood runoff) data. An
additional advantage of this approach over the neural combination method is the model’s
ability to represent itself in the form of mathematical expressions (Fernando et al., 2009).
GEP, an extension of GP (Koza, 1992), is a search technique that evolves computer
programs (e.g., mathematical expressions, decision trees, polynomial constructs, and logical
expressions). Computer programs generated by GEP are encoded in linear chromosomes and
are then expressed or translated into expression trees (ETs). GEP is a comprehensive
genotype/phenotype system, with the genotype totally separated from the phenotype, whereas
in GP, genotype and phenotype are mixed together in a simple replicator system (Ferreira,
2001a, b; Guven and Aytek, 2009).
Application of GEP in hydrology
In water resources engineering, GP has been applied successfully in a few cases to solve
various problems. Giustolisi (2004) used GP to determine the Chezy resistance coefficient in
corrugated channels; Rabunal et al. (2007) applied GP and ANN to determine the unit
hydrograph of a typical urban basin; Guven et al. (2008) used the linear genetic programming
(LGP) approach for time-series modelling of daily flow rate; and Guven and Gunal (2008)
successfully applied the GEP approach to predict local scour downstream of hydraulic
structures. These studies have encouraged hydrologists to investigate the use of GP for
estimating river flow data (Guven, 2009; Guven and Talu, 2010; Guven and Kisi, 2011).
Most recently, Kisi and Shiri (2011) forecast precipitation using wavelet-genetic
programming, Azamathulla and Ghani (2011) predicted longitudinal dispersion
coefficients in streams, and Azamathulla et al. (2011) developed stage-discharge rating
curves for the Pahang River using GEP.
In the context of rainfall-runoff modelling, the combination modelling approach advocates the
synchronous use of simulated discharges obtained from a number of rainfall-runoff models to
produce an overall combined/integrated discharge output which can be used as an alternative
to that produced by a single rainfall-runoff model. At present only a limited number of studies
have dealt with the multi-model combination of hydrological models (Coulibaly et al., 2005;
See and Openshaw, 2002; Shamseldin and O'Connor, 1999; Shamseldin et al., 1997). The
emerging conclusion from these pioneering studies is that the combination modelling
approach has tremendous potential for improving the accuracy and reliability of hydrological
modelling forecasts and predictions. However, these studies made no attempt to explore the
nature and inner workings of the combination functions. Further, they provided no
explanation of the drivers behind the improvements in the modelling results, which is
essential to advance the use of combination modelling approaches in hydrology.
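The simplest forms of the combination approach can be written down directly. The sketch below combines discharge series simulated by several rainfall-runoff models using either a simple average or weights inversely proportional to each model's mean squared error over a calibration period. The inverse-MSE weighting is a hypothetical choice made for illustration (other studies fit weights by least squares, and GEP instead evolves an explicit, possibly nonlinear, combination function). It assumes no model has zero calibration error, and the data are synthetic.

```python
def combine(model_outputs, observed=None):
    """Combine simulated discharge series from several models.

    Without observations: simple average. With observations: weights proportional to 1 / MSE.
    """
    n_models = len(model_outputs)
    if observed is None:
        weights = [1.0 / n_models] * n_models
    else:
        inv_mse = []
        for sim in model_outputs:
            err = sum((s - o) ** 2 for s, o in zip(sim, observed)) / len(observed)
            inv_mse.append(1.0 / err)          # assumes err > 0 for every model
        total = sum(inv_mse)
        weights = [w / total for w in inv_mse]
    n_steps = len(model_outputs[0])
    combined = [sum(w * sim[t] for w, sim in zip(weights, model_outputs)) for t in range(n_steps)]
    return weights, combined


observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 22.0, 32.0, 42.0]   # synthetic: consistently 2 too high (MSE 4)
model_b = [9.0, 19.0, 29.0, 39.0]    # synthetic: consistently 1 too low (MSE 1)
weights, combined = combine([model_a, model_b], observed)
print(weights, combined)
```

In this toy case the two models err in opposite directions, so the combined series (error of 0.4 at every step) is more accurate than either model on its own; this offsetting of errors is one of the mechanisms thought to drive the gains reported for combination modelling.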
Savic et al. (1999) applied a genetic programming (GP) approach to rainfall-runoff modelling,
using the Kirkton catchment in Scotland (UK) for flow prediction. They concluded that the
results of the data-driven approaches (GP and ANN) showed very good agreement with the
results of a conceptual model whose parameters were optimised using the best available
optimisation techniques. However, genetic programming seems to give more insight into the form of the
rainfall-runoff relationships than ANN because it explicitly gives the form of the function
identified. It also partially alleviates the problem of identifying the large number of
parameters necessary for conceptual model calibration. The number of GP parameters
(population size, crossover and mutation probability) is much smaller and does not necessarily
need to change for different rainfall-runoff problems.
Fernando et al. (2012) used GEP to forecast river flow for different catchments in China and
Ireland. They investigated the application of this novel data-driven technique to develop
one-day-ahead flow forecasting models for catchments with widely differing characteristics.
The outcome of the study was positive. Although no comparisons were made with forecasts
from other models, the fact that these are transparent models that can produce daily
forecasts of high accuracy
is valuable. Fernando et al. (2009) applied GEP to develop a combined runoff estimation
model from the outputs of conventional rainfall-runoff models. They investigated the structure
of the combined model (ANN and GEP) and the use of GEP to develop a combination
rainfall-runoff model through symbolic regression. The GEP model was developed using the
daily river flows simulated by four other rainfall-runoff models for the Chu catchment in
Vietnam. They found that GEP can be successfully used to combine the outputs of basic
rainfall-runoff models into one of greater accuracy. The combination gives insight into the
components of the model in terms of mathematical expressions, which sets the GEP model
apart from the “black-box” counterparts used in the past to develop combination models. The
mathematical expressions generated by the programming process can subsequently be applied
to data sets not used in model development, and can be used to further investigate the
contributions of each sub-model.
The most relevant study to RFFA was conducted by Seckin and Guven (2012), who employed
GEP and linear genetic programming (LGP), both extensions of GP, together with logistic
regression (LR) to estimate peak flood discharges. Data from 543 gauged sites across Turkey
were used. Drainage area, elevation, latitude, longitude and return period were used as the
inputs, and the peak flood discharge was the output. They found that the proposed LGP and
GEP models provided a fast and practical way of estimating peak flood discharges, and their
results encourage the use of genetic programming in other areas of water resources
engineering. Unlike most regression-based models, the LGP and GEP models impose no
restriction on model form because they do not employ predefined functions. The results
suggest that both genetic programming techniques, LGP and GEP, can be successfully applied
to estimate peak flood discharges in RFFA.
As discussed, applications of GEP-based techniques in RFFA are very limited, and there has
been no significant GEP-based RFFA study using Australian data.
c) Co-active neuro-fuzzy inference system (CANFIS)
Fuzzy logic is a form of multi-valued logic, derived from fuzzy set theory, that deals with
reasoning that is approximate rather than precise. In contrast with "crisp logic", where sets
are binary, fuzzy logic variables may have membership values not only of 0
or 1; that is, the degree of truth of a statement can range between 0 and 1 and is not
constrained to the two truth values of classic propositional logic. Furthermore, when linguistic
variables are used, these degrees may be managed by specific functions (Novak et al., 1999).
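The difference between crisp and fuzzy membership can be illustrated with a short sketch. It assumes a triangular membership function, one common choice among many, and a hypothetical linguistic variable "rainfall intensity" with two hypothetical fuzzy terms; the term boundaries are invented for illustration only.

```python
def crisp_membership(x: float, lower: float, upper: float) -> float:
    """Classical (crisp) set: membership is either 0 or 1."""
    return 1.0 if lower <= x <= upper else 0.0


def triangular_membership(x: float, a: float, b: float, c: float) -> float:
    """Fuzzy set with a triangular membership function (a < b < c).

    Membership rises linearly from 0 at a to 1 at b, then falls back to 0 at c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic variable "rainfall intensity" (mm/h) with two fuzzy terms.
MODERATE = (10.0, 25.0, 40.0)
HEAVY = (30.0, 50.0, 70.0)

if __name__ == "__main__":
    rain = 35.0
    mu_moderate = triangular_membership(rain, *MODERATE)  # 5/15, about 0.333
    mu_heavy = triangular_membership(rain, *HEAVY)        # 5/20 = 0.25
    print(crisp_membership(rain, 30.0, 70.0))             # crisp "heavy": fully in or out
    print(round(mu_moderate, 3), mu_heavy)
    # Common fuzzy operators: AND as minimum, OR as maximum of membership degrees.
    print(min(mu_moderate, mu_heavy), max(mu_moderate, mu_heavy))
```

A rainfall of 35 mm/h is thus partly "moderate" and partly "heavy" at the same time, whereas a crisp set must place it entirely in one class.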
In the field of artificial intelligence, neuro-fuzzy refers to combinations of artificial neural
networks and fuzzy logic. Neuro-fuzzy hybridization results in a hybrid intelligent system that
synergizes these two techniques by combining the human-like reasoning style of fuzzy
systems with the learning and connectionist structure of neural networks. Neuro-fuzzy
hybridization is widely termed a fuzzy neural network (FNN) or neuro-fuzzy system (NFS) in
the literature. The adaptive neuro-fuzzy inference system (ANFIS), proposed by Jang (1993),
is a soft computing technique that makes use of the benefits of both ANN and fuzzy systems.
ANFIS serves as a basis for constructing a set of fuzzy if-then rules with appropriate
membership functions to generate the stipulated input-output pairs.
CANFIS is a generalized form of ANFIS. In CANFIS, both the neural network (NN) and the
fuzzy inference system (FIS) play an active role in reaching a specific goal. CANFIS extends
the single-output ANFIS to produce multiple outputs.
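The layered computation behind ANFIS can be sketched as a forward pass through a first-order Takagi-Sugeno fuzzy system: fuzzification, rule firing strengths (product of memberships), normalisation, linear rule consequents and summation. The sketch assumes Gaussian membership functions and lets several outputs share the same rule antecedents, which illustrates only the multiple-output idea described above; it is not the full CANFIS of the literature, does not include training, and all names and parameter values are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[int, ...]


def gaussian_mf(x: float, centre: float, sigma: float) -> float:
    """Gaussian membership function."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)


def sugeno_forward(
    inputs: Sequence[float],
    mfs: Sequence[Sequence[Tuple[float, float]]],
    consequents: Dict[Rule, List[List[float]]],
) -> List[float]:
    """Forward pass of a first-order Takagi-Sugeno system arranged in ANFIS-style layers.

    mfs[i] lists (centre, sigma) pairs for input i; each rule picks one MF per input.
    consequents[rule][o] holds coefficients [p_1, ..., p_n, r] of output o for that rule,
    so all outputs share the same rules (one output corresponds to the ANFIS case).
    """
    # Layer 1: membership degree of each input in each of its fuzzy sets.
    degrees = [[gaussian_mf(x, c, s) for c, s in mf_list] for x, mf_list in zip(inputs, mfs)]
    # Layer 2: firing strength of each rule = product of its memberships.
    rules = list(product(*(range(len(m)) for m in mfs)))
    strengths = [math.prod(degrees[i][k] for i, k in enumerate(rule)) for rule in rules]
    # Layer 3: normalised firing strengths (assumes at least one rule fires).
    total = sum(strengths)
    norm = [w / total for w in strengths]
    # Layers 4 and 5: weighted first-order consequents, summed for each output.
    n_outputs = len(consequents[rules[0]])
    outputs = []
    for o in range(n_outputs):
        y = 0.0
        for w, rule in zip(norm, rules):
            coeffs = consequents[rule][o]
            y += w * (sum(p * x for p, x in zip(coeffs[:-1], inputs)) + coeffs[-1])
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # Hypothetical example: two inputs, two Gaussian MFs each, hence four rules.
    mfs = [[(0.0, 1.0), (2.0, 1.0)], [(0.0, 1.0), (3.0, 1.0)]]
    # Every rule uses the same consequents, so the outputs reduce to 2*x1 + 3*x2 + 1 and 5.
    consequents = {rule: [[2.0, 3.0, 1.0], [0.0, 0.0, 5.0]] for rule in product(range(2), range(2))}
    print(sugeno_forward([1.0, 2.0], mfs, consequents))  # approximately [9.0, 5.0]
```

In ANFIS, training would adjust the membership parameters (centres, widths) and the consequent coefficients; here they are fixed so that the layered data flow is easy to follow.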
Application of CANFIS in hydrology
Hydrologic analysis is complicated by uncertainties caused by nature (e.g., climate, land
characteristics), limited data and imprecise modelling. For instance, aquifer parameters are
obtained from a few locations that represent a small fraction of the total volume. The definition
of system boundaries and initial conditions also introduces uncertainty, and future stresses on
the system are imprecisely known. The stochastic approach to uncertainty analysis treats
aquifer properties as random variables with known distributions, so the outputs of a
stochastic model are also characterized by statistical moments or by the full probability
density function. The argument in favour of fuzzy logic is that, despite the theoretical
development of the stochastic approach, its practical application is rather limited, especially if
a point process model needs to be upscaled (Bogardi et al., 2003). Hydrological sciences
require temporal and spatial data for a proper understanding of the phenomena concerned.
This information provides the foundation for the preparation, interpretation and deduction of
logically acceptable conclusions. In many hydrological studies, numerical data are fed into
mathematical models, especially through readily available computer software, which may
produce unreliable results if the background of the working mechanism
related to any natural hydrological phenomenon is not appreciated qualitatively through
verbal information (Sen, 2009).
Nayak et al. (2003) applied a neuro-fuzzy system to model the river flow of the Baitarani River
in India and compared its performance with ANN and autoregressive moving average (ARMA)
models. The appropriate inputs were selected by testing different combinations of flows at
different time lags. The study also investigated transforming the input data into the normal
domain, by comparing the performance of models developed on transformed and
non-transformed inputs. Model performance increased significantly when transformed input
data were used. The results showed that the neuro-fuzzy models performed slightly better
than the ANN models and outperformed the ARMA model in terms of all performance indices.
Jacquin and Shamseldin (2006) developed two types of fuzzy rainfall-runoff models based on
Takagi-Sugeno fuzzy inference systems and applied them to data from six catchments of
diverse climatic characteristics. The results were compared with those of the Simple Linear
Model, the Linear Perturbation Model and the Nearest Neighbour Linear Perturbation Model.
The study concluded that the FIS is a suitable alternative to traditional methods of modelling
the non-linear rainfall-runoff relationship.
Talei et al. (2010a) evaluated rainfall-runoff modelling for a sub-catchment of the Kranji basin
in Singapore using a neuro-fuzzy computational technique (ANFIS). The ANFIS results were
compared with those of the physically based storm water management model (SWMM). The
model with two inputs (rainfall at times t and t-1) gave the highest coefficient of efficiency,
and the ANFIS model was found to be comparable to SWMM in terms of goodness of fit. The
potential of ANFIS for hydrological modelling was also assessed by applying it to monthly
inflows of the Bhakra Dam in India (Lohani et al., 2012), where the ANFIS models were
compared with ANN and autoregressive (AR) models. Karimi et al. (2013) employed two
data-driven models, ANFIS and ANN, to predict hourly sea levels at Darwin Harbour, Australia.
Firat and Gungor (2007) applied a neuro-fuzzy technique to estimate flows of the Great
Menderes River in Turkey, comparing the results with observed flows to evaluate the training
and testing performance of the model. Using a data set of 5844 daily runoff
data, they found that the ANFIS models were accurate, reliable and highly efficient, with the
minimum root mean square error (RMSE) values.
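The studies above judge models by RMSE and by a "coefficient of efficiency". The sketch below computes both, assuming that the coefficient of efficiency is the Nash–Sutcliffe efficiency commonly used in hydrology; the individual studies may define their indices slightly differently.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (0 is a perfect fit)."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe coefficient of efficiency.

    1 is a perfect fit; 0 means the model is no better than the observed mean;
    negative values mean it is worse than the mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

For observed flows `[1, 2, 3, 4]` and simulated flows `[2, 3, 4, 5]`, the RMSE is 1.0 and the efficiency is 1 − 4/5 = 0.2: a constant bias of one unit leaves most of the variance unexplained even though the simulated pattern is correct.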
Oarda and Shu (2007) developed neuro-fuzzy models for RFFA at ungauged sites in the
hydrometric station network of southern Quebec, Canada, using 15 years of historical data
from 151 gauging stations. They found that the neuro-fuzzy approach provided a mechanism
for integrating the two major steps of RFFA, regionalisation and estimation, into one system.
Aqil et al. (2007) performed a comparative study of ANN and neuro-fuzzy models in
continuous modelling of the daily and hourly behaviour of runoff. The data were derived from
the Cilalawi River basin in Indonesia, which has a total drainage area of approximately
60.17 km². Forest, paddy fields and perennial plantations dominate land use in the basin,
accounting for 85% of its area. Two types of three-layer feedforward neural network (FFNN)
models, each with one input layer, one hidden layer and one output layer, were developed.
Three different network architectures and training algorithms were investigated, namely
Levenberg–Marquardt FFNN, Bayesian regularization FFNN and neuro-fuzzy. Compared with
the Levenberg–Marquardt FFNN and the Bayesian regularization FFNN, the neuro-fuzzy model
showed better generalization capability and adaptability in modelling complex rainfall–runoff
dynamics.
ANFIS has been used in hydrology in various parts of the world, but its application in RFFA
has so far been very limited. Australia's distinctive climatic and geographical conditions set it
apart from the rest of the world for the application of ANFIS and model development, and to
date there is no evidence of its application to RFFA in Australia.
2.4 Summary
This chapter has discussed various regional flood frequency analysis (RFFA) techniques, with
particular emphasis on non-linear techniques, namely artificial neural network (ANN), co-active
neuro-fuzzy inference system (CANFIS), genetic algorithm (GA) and gene-expression
programming (GEP). RFFA is widely used in design flood estimation for ungauged
catchments, and the literature contains many RFFA methods, each with specific assumptions
and data requirements. In Australia (in particular in New South Wales and Victoria), a linear
method, the Probabilistic Rational Method, has been the method of choice since 1987, but it
is likely to be replaced in the new version of Australian Rainfall and Runoff. More recently,
regression-based RFFA methods have been widely investigated in
Australia. Most linear RFFA methods assume a linear relationship between flood statistics
and predictor variables. However, most hydrologic processes are nonlinear and exhibit a high
degree of spatial and temporal variability, so a simple log transformation (the most common
form of transformation) cannot guarantee linearity in RFFA modelling. These processes are
further complicated by uncertainty in parameter estimates. Increased computing power has
created new opportunities for hydrologists to solve complex problems using non-linear,
intelligence-based techniques such as ANN, CANFIS, GA and GEP. These techniques have
been widely used in rainfall and streamflow forecasting; however, there have been only a few
RFFA studies based on them, and in particular no major RFFA research in Australia.
Non-linear techniques could be powerful methods for regional flood estimation, as they do
not impose a model structure on the data (i.e. they are model-free techniques).
The choice of non-linear model structure, grouping of data into meaningful regions, selection
of appropriate predictor variables, carefully designed model training, testing and validation
methods are key to the development of successful RFFA models based on various non-linear
techniques discussed in this chapter.
Non-linear techniques, especially ANN, have risen to prominence as a viable alternative to
many traditional water resources models, particularly for forecasting hydrologic variables.
Features that have contributed to their popularity include their ease of implementation, their
ability to learn from examples without explicit knowledge of the underlying physics, and their
powerful generalization ability. One limitation of non-linear techniques is that they are
data-driven models and therefore depend on the data available. However, unlike most
commonly used regression-based models, they do not impose a fixed model form.
As Australia's climate and geography differ from those of the rest of the world, and its
hydrology is among the most variable, it is important to investigate the applicability of these
non-linear techniques to RFFA problems. Hence, this research focuses on the development
and testing of artificial intelligence based RFFA methods for Australia.
CHAPTER 3
METHODOLOGY
3.1 General
This chapter presents the statistical and mathematical tools adopted in this study to develop
the artificial intelligence based RFFA models and the quantile regression technique. Cluster
analysis and principal component analysis, which are used to group the data in the catchment
characteristics data space, are also described. The artificial neural network (ANN) method is
presented first, followed by genetic algorithm based ANN, gene-expression programming,
co-active neuro-fuzzy inference system, quantile regression technique, cluster analysis and
principal component analysis. Finally, the adopted validation technique is presented.
3.2 Methods adopted in the study
The RFFA methods based on artificial intelligence are discussed first and in detail, covering
the features, fundamental concepts, mathematical equations and input data requirements of
each method. The linear techniques are then discussed, with major emphasis on the quantile
regression technique (QRT). The techniques are summarised in Figure 3.1.
Figure 3.1 Different RFFA techniques adopted in this study
3.2.1 Artificial neural network (ANN)
There are various types of ANN, with applications in many fields of science and engineering.
Since the first neural model by McCulloch and Pitts (1943), hundreds of different models
regarded as ANN have been developed. They may differ in their functions, accepted values,
topology, learning algorithms and so on. Since the function of an ANN is to process
information, ANNs are used mainly in fields related to information processing. A wide variety
of ANNs are used to model real neural networks and to study behaviour and control in
animals and machines, while others are used for engineering purposes such as pattern
recognition, forecasting and data compression.
ANN modelling is inspired by natural neurons, which receive signals through synapses located
on the dendrites or membrane of the neuron, as shown in Figure 3.2. When the signals
received are strong enough (surpass a certain threshold), the neuron is activated and emits a
signal through the axon. This signal might be sent to another synapse and might activate
other neurons.
Figure 3.2 Structure of typical natural neuron (Source:
http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/)
Features and strengths of ANN
1. The most important aspect of ANN is its non-linearity.
2. ANN has the ability to perform input-output mapping in an intelligent manner, which
helps in developing a relationship between the inputs and the desired output. An ANN
can adjust its parameters, known as weights, so that the difference between the actual
output of the ANN and the desired output for a given input is minimized. There is
some similarity between regression
modelling and ANN, as both find an optimum set of coefficients to achieve an
input-output transformation; however, ANN can use complex non-linear models to make
this transformation.
3. Adaptivity is a main characteristic of ANN: an ANN can adapt its free parameters to
changes in the surrounding environment.
Working structure of artificial neural network (ANN)
A neural network is built from two building blocks, neurons and weights, and the behaviour of
the network depends largely on the interaction between them. There are three types of
neuron layers: input, hidden and output layers. Neurons in two layers communicate via a
network of weighted connections. There are four types of weighted connections:
feedforward, feedback, lateral and time-delayed connections. A typical configuration of a
feedforward three-layer ANN is shown in Figure 3.3.
Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000)
The various forms of ANN connection are discussed below:
Feedforward connections: For all neural models, data from neurons of a lower layer are
propagated forward to neurons of an upper layer via feedforward connections.
Feedback connections: Feedback networks pass signals through connection links from
neurons of an upper layer back to neurons of a lower layer.
Lateral connections: These connect neurons within the same layer, and the connection
strength is represented by the weight associated with each link. A typical example of a lateral
network is the winner-takes-all circuit, which serves the important role of selecting the winner.
Time-delayed connections: Delay elements may be incorporated into the connections to
yield temporal dynamics models. They are more suitable for temporal pattern recognition.
The architecture of an ANN comprises the pattern of connections between nodes, the method
of determining the connection weights, and the activation function. Alkon (1989), Fausett
(1994) and Caudill (1987, 1988 and 1989) presented comprehensive descriptions of ANN. As
mentioned above, a typical ANN consists of a number of nodes arranged in a particular order,
analogous to biological neurons.
One way of classifying ANN is by the number of layers: single (Hopfield nets), bilayer
(Carpenter/Grossberg adaptive resonance networks) and multilayer (most backpropagation
networks). ANN can also be categorised by the direction of information flow and processing.
In a feedforward network, the nodes are generally arranged in layers, starting from the input
layer and ending at the output layer. There can be several hidden layers, each having one or
more nodes. Information passes from the input side to the output side. The nodes in one layer
are connected to those in the next, but not to those in the same layer. Thus, the output of a
node depends only on the inputs it receives from previous layers and the corresponding
weights. In a recurrent ANN, on the other hand, information flows through the nodes in both
directions, from the input side to the output side and vice versa. Sometimes lateral
connections are used, whereby nodes within a layer are also connected (Smith, 1993;
Wasserman, 1993; Lawrence, 1994; Bishop, 1995).
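The layer-by-layer flow of a feedforward network described above can be sketched in a few lines. This is a minimal illustration, not the MATLAB implementation used in the study: each layer is assumed to be fully connected to the previous one, nodes within a layer are not connected to each other, and each node subtracts its threshold (bias) from the weighted sum of its inputs. The names `layer_forward`, `feedforward` and `identity` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A layer is (weights, biases, activation); weights[j][i] links node i of the
# previous layer to node j of this layer.
Layer = Tuple[List[List[float]], List[float], Callable[[float], float]]


def layer_forward(x: Sequence[float], weights: List[List[float]], biases: List[float],
                  activation: Callable[[float], float]) -> List[float]:
    """Outputs of one fully connected layer; nodes in the same layer are not linked."""
    return [activation(sum(w * xi for w, xi in zip(w_j, x)) - b_j)
            for w_j, b_j in zip(weights, biases)]


def feedforward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Pass an input vector from the input side to the output side, one layer at a time."""
    out = list(x)
    for weights, biases, activation in layers:
        out = layer_forward(out, weights, biases, activation)
    return out


def identity(v: float) -> float:
    """Linear activation, often used for a regression output node."""
    return v


if __name__ == "__main__":
    # Hypothetical 2-2-1 network: tanh hidden layer, linear output node.
    net = [([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], math.tanh),
           ([[1.0, -1.0]], [0.0], identity)]
    print(feedforward([1.0, 2.0], net))
```

Because the output of each layer depends only on the previous layer's outputs, the whole network is evaluated by a single pass over the list of layers; a recurrent network would instead need to feed outputs back as inputs over successive time steps.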
The input (first) layer receives the input variables for the problem at hand, i.e. all quantities
that can influence the output. The input layer is thus transparent and is a means of providing
information to the network. The last (output) layer consists of the values predicted by the
network and thus represents the model output. The number of hidden layers and the number
of nodes in each hidden layer are usually determined by trial and error. The nodes within
neighbouring layers of the network are fully connected by links, and a synaptic weight is
assigned to each link to represent the relative connection strength of the two nodes at either
end in predicting the input-output relationship. Such ANNs can be used to solve a wide variety
of problems, such as storing and recalling data, classifying patterns, performing general
mappings from an input pattern (space) to an output pattern (space), grouping similar
patterns, or finding solutions to constrained optimization problems. In general, the network
relates a system input vector, composed of causal variables that influence system behaviour,
to a system output vector, composed of resulting variables that represent the system
behaviour (Theodoridis and Koutroumbas, 2009).
Mathematical treatment of ANN
The overall output value of a neuron can be expressed as:
$$ y_j = f\left(X \cdot W_j - b_j\right) \qquad (3.1) $$
where the inputs in the first layer form an input vector:
$$ X = [x_1, \ldots, x_i, \ldots, x_n] \qquad (3.2) $$
and the sequence of weights leading to node j forms a weight vector:
$$ W_j = [w_{1j}, \ldots, w_{ij}, \ldots, w_{nj}] \qquad (3.3) $$
where i = 1, 2, …, n indexes the n inputs and j = 1, 2, …, m indexes the m neurons.
Here, wij represents the connection weight from the ith node in the preceding layer to the jth
node. The output of node j, yj, is obtained by evaluating the function f at the inner product of
vectors X and Wj minus bj, where bj is the threshold value, also called the bias, associated with
this node. In ANN parlance, the bias bj of the node must be exceeded before the node can be
activated.
The sigmoid function is a bounded, monotonic, non-decreasing function that provides a
graded, nonlinear response, which enables a network to map nonlinear processes. The
popularity of the sigmoid function is partly attributed to the simplicity of its derivative, which is
used during the training process. Some researchers also employ the bipolar sigmoid and
hyperbolic tangent as activation functions, both of which are transformations of the sigmoid
function. A number of such nodes are organized to form an ANN.
The function f in (Equation 3.1) is called an activation function. Its functional form
determines the response of a node to the total input signal it receives. Typically the sigmoid
function is expressed as below:
$$ f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (3.4) $$
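The activation functions discussed above, and the node output of Equation 3.1, can be sketched as follows. The sketch shows the logistic sigmoid 1/(1 + e^−x) with its derivative f(x)(1 − f(x)), which is why the derivative is cheap during training, and the bipolar sigmoid (1 − e^−x)/(1 + e^−x), which equals tanh(x/2). Function names are hypothetical and the code is illustrative only.

```python
import math
from typing import Callable, Sequence


def logistic(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_derivative(x: float) -> float:
    """f'(x) = f(x) * (1 - f(x)): reuses the function value, so it is cheap in training."""
    fx = logistic(x)
    return fx * (1.0 - fx)


def bipolar_sigmoid(x: float) -> float:
    """(1 - e^-x) / (1 + e^-x), which equals tanh(x / 2); range (-1, 1)."""
    return math.tanh(x / 2.0)


def node_output(x: Sequence[float], w_j: Sequence[float], b_j: float,
                f: Callable[[float], float] = logistic) -> float:
    """Output of node j: f(X . W_j - b_j), with b_j acting as the threshold (bias)."""
    net = sum(xi * wij for xi, wij in zip(x, w_j)) - b_j
    return f(net)
```

For example, `node_output([1.0, 2.0], [0.5, 0.25], 1.0)` gives a net input of 0.5 + 0.5 − 1.0 = 0, and therefore an output of 0.5 with the logistic function: the weighted input exactly equals the threshold.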
In the ANN modelling adopted in this study, the Levenberg–Marquardt method was used as the
training algorithm to minimize the mean squared error (MSE). The purpose of training an ANN
with a set of input and output data is to adjust its weights to minimize the MSE between the
desired outputs and the ANN outputs. The degree of error increases with the number of layers
in the network and with the percentage change in the weights; however, it is essentially
independent of the number of weights per neuron and the number of neurons per layer, as
long as these numbers are large (close to 100 or more). The data set was split into training
and testing (validation) sub-sets, with the testing data set selected randomly to produce a
reasonable sample of different catchment types and sizes. A feedforward ANN consisting of
three layers (input, hidden and output layers) was used with the training algorithm known as
‘backpropagation of error’. Three hidden-layered neural networks were selected, with 7, 3 and
1 neurons in each of these three layers, respectively. Two inputs, catchment area (A) and
rainfall intensity for a duration equal to the time of concentration (tc) and a given average
recurrence interval (ARI), were used in one input layer, and one output layer had a single
output, the predicted flood quantile (Qpred). The hyperbolic tangent sigmoid function
(Equation 3.4) was used as the transfer function for all hidden layers and the output layer;
transfer functions calculate a layer’s output from its net input. A maximum of 20,000 training
iterations was adopted. Each predictor and predictand was standardized to the range
(0.05, 0.95), so that extreme flood events exceeding the range of the training data set could
be modelled within the boundaries (0, 1) during testing. A learning rate of 0.05 was used
together with a momentum constant of 0.95. MATLAB was used to perform the ANN training.
To select the best-performing model, different combinations of hidden layers, training
algorithms and numbers of neurons were compared on the basis of their MSE values. To
obtain the best ANN-based model, the MSE between the observed and predicted flood
quantiles was calculated and training was undertaken to minimise this error. To avoid
over-training, the MSE was also calculated for the testing data set; if the testing MSE started
to increase while the training MSE was still decreasing, training was terminated. This ensured
the training quality of the ANN and avoided over-fitting.
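Two data-handling steps described above, standardizing variables to (0.05, 0.95) and stopping training when the testing MSE starts to rise, can be sketched as follows. This is a minimal illustration under stated assumptions: scaling is assumed to be linear min-max scaling based on the training data range, and the stopping rule is applied to per-epoch MSE histories with no patience window. It is not the MATLAB code used in the study, and the function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def make_scaler(values: Sequence[float], lo: float = 0.05,
                hi: float = 0.95) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Linear min-max scaling of the training range onto [lo, hi], plus its inverse.

    Values moderately beyond the training maximum still map below 1, which leaves
    headroom for floods larger than any seen in training.
    """
    vmin, vmax = min(values), max(values)
    span = vmax - vmin

    def scale(v: float) -> float:
        return lo + (hi - lo) * (v - vmin) / span

    def unscale(s: float) -> float:
        return vmin + (s - lo) * span / (hi - lo)

    return scale, unscale


def early_stopping_epoch(train_mse: Sequence[float], test_mse: Sequence[float]) -> int:
    """Index of the epoch whose weights are kept under the stopping rule described above.

    Training stops at the first epoch k where the testing MSE rises while the training
    MSE is still falling, and the weights from epoch k - 1 are kept. If that never
    happens, the last epoch is returned.
    """
    for k in range(1, len(test_mse)):
        if test_mse[k] > test_mse[k - 1] and train_mse[k] < train_mse[k - 1]:
            return k - 1
    return len(test_mse) - 1
```

With a training range of 10 to 100, for instance, a value of 104 scales to about 0.99, which is still inside (0, 1). This is the headroom that the 0.05 and 0.95 limits are meant to provide. Practical implementations often wait several epochs before stopping to avoid reacting to noise in the testing MSE.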
3.2.2 Genetic algorithm based ANN (GAANN)
In this study, ANNs were trained in two different ways: one using the backpropagation
technique and the other using the genetic algorithm (GA) technique for optimization.
The major difference between GA and classical optimization search techniques is that GA
works with a population of possible solutions, whereas classical optimization techniques work
with a single solution (Jain et al., 2005). GA is based on the Darwinian survival-of-the-fittest
strategy, whereby potential solutions to a problem compete and mate with each other to
produce increasingly stronger individuals. Each individual in the population represents a
potential solution to the problem and is referred to as a chromosome (Rooji et al., 1996). The
basic working of GA is summarised in Figure 3.4. An initial population of individuals (also
called chromosomes) is created, and the fitness values of all chromosomes are evaluated
according to the objective function. From this initial population, parents are selected and
mated to produce offspring (also called children). The genes of parents and children are
mutated, and the fittest among the parents and children are sent to a new pool. The whole
procedure is repeated until either of two stopping criteria is met, i.e. the required number of
generations has been reached or convergence has been achieved.
Chromosomes are the basic units of the population and represent possible solution vectors;
they are assembled from a set of genes that are generally binary digits, integers or real
numbers (Mitchell, 1996; Randy and Sue, 1998). A chromosome can be thought of as a vector x
consisting of l genes gi:
$$ x = (g_1, g_2, \ldots, g_l), \quad g_i \in G \qquad (3.5) $$
where l is referred to as the length of the chromosome. The genes may be binary
(G = {0, 1}), integer (G = {…, −2, −1, 0, 1, 2, …}) or real-valued (G = ℝ). In the last case, the
real values are stored in a gene by means of a floating point representation (Rooji et al.,
1996).
The three genetic operators in GA, namely selection, crossover (mating) and mutation, are the
primary forces that produce new and unique offspring having the same number of genes as
their parents. The selection operator is used to select parents from the pool, and the
crossover (mating) operator is used to produce offspring from the selected parents. The
parent chromosomes are mated to produce new offspring representing new solution vectors.
As with the selection operator, various crossover techniques have been developed over the
years, the best known of which are single-point crossover (simple crossover), two-point
crossover and uniform crossover (Mitchell, 1996). In single-point crossover, a crossover point
is selected arbitrarily at the same location in two parents, and the alternate halves of the two
parents are recombined to form two children
having new combinations of gene values. The mutation operator is used to introduce changes
in the genes of a chromosome; it maintains the diversity of genes in a population and
prevents premature convergence (Bowden et al., 2005). In the traditional binary GA, which
uses binary digits (0 and 1) as gene values, mutation inverts the value of the selected gene,
i.e. 0 becomes 1 and vice versa. In a real-coded GA, however, two genes in a chromosome
are selected and their values are swapped to introduce mutation.
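The crossover and mutation operators described above can be sketched as small functions. This is an illustrative sketch, not the LibGA routines used in the study. It assumes chromosomes are Python lists and that a seeded `random.Random` instance supplies the randomness, which keeps runs reproducible. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def single_point_crossover(p1: Sequence, p2: Sequence,
                           rng: random.Random) -> Tuple[List, List]:
    """Cut both parents at the same random point and exchange the tails."""
    point = rng.randint(1, len(p1) - 1)  # cut strictly inside the chromosome
    return list(p1[:point]) + list(p2[point:]), list(p2[:point]) + list(p1[point:])


def bit_flip_mutation(chrom: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """Binary GA mutation: each gene is inverted (0 -> 1, 1 -> 0) with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def swap_mutation(chrom: Sequence[float], rng: random.Random) -> List[float]:
    """Real-coded GA mutation as described above: swap the values of two random genes."""
    c = list(chrom)
    i, j = rng.sample(range(len(c)), 2)  # two distinct gene positions
    c[i], c[j] = c[j], c[i]
    return c


if __name__ == "__main__":
    rng = random.Random(42)
    print(single_point_crossover([0] * 6, [1] * 6, rng))
    print(bit_flip_mutation([0, 1, 1, 0], 0.25, rng))
    print(swap_mutation([0.1, 0.2, 0.3, 0.4], rng))
```

Both children of a single-point crossover keep, at every gene position, the two values the parents held there, so crossover only recombines existing gene values. Mutation is what introduces values or arrangements that are new to the population.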
Combination of genetic algorithm and artificial neural network (GAANN)
The flow chart of the GAANN model is shown in Figure 3.5. The initial population is filled with
n chromosomes, where n is referred to as the population size. An objective function
comprising a feedforward ANN model, with a complete description of its architecture, is
defined. It reads the training patterns once at the start of the model run and stores them in
memory for application to each chromosome. The total number of genes l of each
chromosome equals the total number of synaptic weights of the ANN model:
$$ \{g_1, g_2, \ldots, g_l\} = \{w_{i_f h_r}, w_{i_b h_r}, w_{h_f o_r}, w_{h_b o_r}\} \qquad (3.6) $$
where w represents the value of a synaptic weight, subscript i represents a node of the input
layer, h a node of the hidden layer and o a node of the output layer, f is the serial number of
the node that forwards the information (i.e. f = 1, 2, 3, …), r is the serial number of the node
that receives the information (i.e. r = 1, 2, 3, …), ib represents the bias node of the input layer
and hb the bias node of the hidden layer.
At the start of the model run, the fitness values of all chromosomes in the population are
evaluated by the ANN function. The real values stored in the genes of a chromosome are read
as the respective weights of the ANN model; Figure 3.6 shows an example of this translation
of genes into synaptic weights. The ANN performs feedforward calculations with the weights
read from the genes of the forwarded chromosome as per Equation 3.6, and calculates the
MSE. The inverse of the MSE is regarded as the fitness value of the chromosome. In this way,
the fitness values of all chromosomes of the initial population are calculated by the ANN
function.
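The translation of a chromosome into network weights and the fitness value FV = 1/MSE can be sketched as below. The sketch assumes one hidden layer with tanh activation and a linear output layer, and a gene layout that follows the group order of Equation 3.6. Within each group, the genes are assumed to be stored row by row (forwarding node f, then receiving node r). Bias nodes are taken to output +1, so their weights are added rather than subtracted. These choices are illustrative assumptions, not the layout of the C code used in the study, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[List[float]], List[float], List[List[float]], List[float]]


def n_genes(n_in: int, n_hid: int, n_out: int) -> int:
    """Chromosome length l = number of synaptic weights, including bias-node weights."""
    return n_in * n_hid + n_hid + n_hid * n_out + n_out


def decode(genes: Sequence[float], n_in: int, n_hid: int, n_out: int) -> Params:
    """Split a flat chromosome into weight groups in the order of Eq. 3.6 (assumed row layout)."""
    if len(genes) != n_genes(n_in, n_hid, n_out):
        raise ValueError("chromosome length does not match the network architecture")
    pos = 0

    def take(k: int) -> List[float]:
        nonlocal pos
        block = list(genes[pos:pos + k])
        pos += k
        return block

    w_ih = [take(n_hid) for _ in range(n_in)]   # input node f -> hidden node r
    b_h = take(n_hid)                           # input-layer bias node -> hidden node r
    w_ho = [take(n_out) for _ in range(n_hid)]  # hidden node f -> output node r
    b_o = take(n_out)                           # hidden-layer bias node -> output node r
    return w_ih, b_h, w_ho, b_o


def forward(x: Sequence[float], params: Params) -> List[float]:
    """Feedforward pass: tanh hidden layer, linear output layer (assumed)."""
    w_ih, b_h, w_ho, b_o = params
    hidden = [math.tanh(sum(x[f] * w_ih[f][r] for f in range(len(x))) + b_h[r])
              for r in range(len(b_h))]
    return [sum(hidden[f] * w_ho[f][r] for f in range(len(hidden))) + b_o[r]
            for r in range(len(b_o))]


def chromosome_fitness(genes: Sequence[float],
                       patterns: Sequence[Tuple[Sequence[float], Sequence[float]]],
                       n_in: int, n_hid: int, n_out: int) -> float:
    """Fitness value FV = 1 / MSE over all training patterns and outputs."""
    params = decode(genes, n_in, n_hid, n_out)
    se, count = 0.0, 0
    for x, target in patterns:
        y = forward(x, params)
        se += sum((t - o) ** 2 for t, o in zip(target, y))
        count += len(target)
    mse = se / count
    return 1.0 / mse if mse > 0 else float("inf")
```

For a 2-3-1 network, the chromosome has 2×3 + 3 + 3×1 + 1 = 13 genes. A chromosome of all zeros predicts 0 for every pattern, so a single pattern with target 2 gives MSE = 4 and a fitness of 0.25.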
The selection operator selects two parent chromosomes at random, using the roulette wheel
operator with elitism, in which a chromosome's chance of selection is proportional to its
fitness. Elitism is a scheme in which the best chromosome of each generation is carried over
to the next generation to ensure that the best chromosome is not lost during the calculations.
The selected parents are mated to produce
two children having the same number of genes. The uniform crossover operator is used with a crossover rate of pc = 1.0: at each gene position of an offspring a coin is tossed and, depending on the result, the gene value of the first or the second parent is copied to the offspring. The genes of the children are then mutated with the swap mutation operator at a mutation rate of pm = 0.8, and the mutated children are evaluated by the ANN function to obtain their fitness values. The fitness values of all four chromosomes (two parents and two children) are compared; the two chromosomes with the highest fitness values are sent to the new population and the other two are discarded. The evolutionary operators repeat this loop of selection, crossover, mutation and replacement until the new pool has the same population size as the old pool. This completes one generation, and the process is repeated until either of two stopping criteria is met, i.e. the maximum number of generations is reached or convergence is achieved. The best chromosome tracked across all generations is then sent to the ANN function; its genes are read as the weights of the ANN model and represent the optimised weights, at which point the model is considered fully trained. Finally, the training and test sets are simulated using these weights (Sohail et al., 2005).
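One generation of the GA loop described above (roulette wheel selection, elitism, uniform crossover, swap mutation and best-two-of-four replacement) can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the LibGA-based C code of the study: the complementary second child, the reading of pm as the probability of one gene swap per child, the fixed generation budget standing in for the stopping criteria, and all function names are hypothetical.

```python
import random


def roulette_select(pop, fits, rng):
    """Pick one chromosome with probability proportional to its (positive) fitness."""
    pick = rng.uniform(0.0, sum(fits))
    running = 0.0
    for ind, fit in zip(pop, fits):
        running += fit
        if running >= pick:
            return ind
    return pop[-1]  # guard against floating-point round-off


def uniform_crossover(p1, p2, rng):
    """Toss a coin at each gene position; child 2 takes the other parent's gene (assumed)."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1)
            c2.append(g2)
        else:
            c1.append(g2)
            c2.append(g1)
    return c1, c2


def swap_mutation(child, pm, rng):
    """With probability pm (assumed meaning), swap the values of two random genes."""
    child = list(child)
    if len(child) > 1 and rng.random() < pm:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def next_generation(pop, fitness, rng, pc=1.0, pm=0.8):
    """Fill a new pool of the same size, keeping the best 2 of each parent/child family."""
    fits = [fitness(ind) for ind in pop]
    new_pool = [list(pop[fits.index(max(fits))])]  # elitism: carry the best chromosome over
    while len(new_pool) < len(pop):
        p1 = roulette_select(pop, fits, rng)
        p2 = roulette_select(pop, fits, rng)
        if rng.random() < pc:
            c1, c2 = uniform_crossover(p1, p2, rng)
        else:
            c1, c2 = list(p1), list(p2)
        family = [p1, p2, swap_mutation(c1, pm, rng), swap_mutation(c2, pm, rng)]
        family.sort(key=fitness, reverse=True)
        new_pool.extend(list(ind) for ind in family[:2])
    return new_pool[: len(pop)]


# Usage with a toy fitness that rewards genes close to 0.5.
rng = random.Random(0)
fit = lambda ind: 1.0 / (1e-6 + sum((g - 0.5) ** 2 for g in ind))
pop = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(10)]
for _ in range(50):  # fixed generation budget standing in for the stopping criteria
    pop = next_generation(pop, fit, rng)
best = max(pop, key=fit)
```

Because the elite chromosome is copied into every new pool, the best fitness can never decrease from one generation to the next.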
The GAANN is coded in C, and some subroutines of the LibGA package (Arthur and Rogers, 1995) have been used for the GA evolutionary operators, with alterations to read and process negative real values.
Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005)
[Flow chart: create population → evaluate fitness values → test for convergence (Yes: stop; No: selection → crossover → mutation → evaluate fitness values)]
Figure 3.5 Flow chart showing steps in GAANN model
[Flow chart: Start → define feed forward ANN → create initial population of individuals (population size PS = n) → evaluate fitness values (FV = 1.0/MSE) → select parents by roulette wheel method → crossover parents by uniform crossover method with pc = 1.0 → mutate genes of children by swapping with pm = 0.8 → send the 2 fittest individuals among 2 parents and 2 children to a new pool → PS of new pool = n? (NO: select parents again) → termination criteria satisfied? (NO: next generation; YES: select best individual in all generations)]
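Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN architecture during a GAANN modelling
[(a) network with input nodes i1, i2, hidden nodes h1, h2 and output node o1; (b) chromosome whose genes hold w(i1,h1), w(i1,h2), w(i2,h1), w(i2,h2), w(h1,o1), w(h2,o1)]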
3.2.3 Gene-expression programming
Gene-expression programming (GEP) is used to perform non-parametric symbolic regression. Although symbolic regression is very similar to traditional parametric regression, unlike the latter it does not start with a known function relating the dependent and independent variables. GEP programs are encoded as linear strings of fixed length (the genome or chromosomes), which are afterwards expressed as nonlinear entities of different sizes and shapes (Ferreira, 2001a, b, 2006).
GEP automatically generates algorithms and expressions for the solution of problems, which are coded as tree structures with leaves (terminals) and nodes (functions). The generated candidates (programs) are evaluated against a "fitness function", and the candidates with
higher performance are then modified and re-evaluated. This modification and evaluation cycle is repeated until an optimum solution is achieved. In GEP, a population of individual combined model solutions is created initially, in which each individual solution is described by genes (sub-models) linked together by a predefined mathematical operation (e.g. addition). To create the next generation of model solutions, individual solutions from the current generation are selected according to fitness, which is based on the pre-chosen objective function. The selected solutions are allowed to evolve, using evolutionary dynamics, into the individual solutions of the next generation. This process of creating new generations is repeated until a stopping criterion is met (Fernando et al., 2009).
Two important components of GEP are the chromosomes and the expression trees (ETs). The ETs are the expression of the genetic information encoded in the chromosomes. The process of decoding information from the chromosomes into the ETs is called translation, which is based on a code and a set of rules; the genetic code is a very simple one-to-one relationship between the symbols of the chromosome and the functions or terminals they represent. To predict the flood quantiles, the set of independent (predictor) variables to be used in the individual prediction equations must first be identified. Then a set of functions (e.g. e^x, x^a, sin(x), cos(x), ln(x), log(x), 10^x, etc.) and arithmetic operations (+, -, /, *) is defined. The functions and the terminals form the nodes and leaves of a program's tree.
In GEP, k-expressions (from Karva notation), which are fixed-length lists of symbols, are used to represent an ET, as shown in Figure 3.7. Each such list of symbols is a gene, and a chromosome consists of one or more genes. For example, the gene "sqrt, ×, −, +, a, b, c, d" is represented by the ET in Figure 3.7, which reads sqrt((a − b) × (c + d)). A GEP gene contains a head and a tail: the head may contain symbols representing both functions and terminals, while the tail contains only terminals. The head length h is chosen for each problem, while the tail length t is a function of h: t = h(n − 1) + 1, where n is the largest number of arguments (arity) of any function in the function set.
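Karva decoding can be illustrated with a short Python sketch. It reads a k-expression level by level, giving each function symbol as many children as its arity, and then evaluates the resulting ET. The gene used, sqrt × − + a b c d, is the Figure 3.7 example as read from the text; the reduced function set, the helper names and the variable values are illustrative assumptions, and the tail-length helper applies the standard GEP rule t = h(n − 1) + 1.

```python
import math

# Arity of each function symbol in this illustrative function set; any other symbol is a terminal.
ARITY = {"sqrt": 1, "*": 2, "-": 2, "+": 2}
OPS = {
    "sqrt": math.sqrt,
    "*": lambda a, b: a * b,
    "-": lambda a, b: a - b,
    "+": lambda a, b: a + b,
}


def karva_to_tree(gene):
    """Decode a k-expression level by level into (symbol, children) nodes.

    Symbols beyond those needed to complete the tree are non-coding and ignored.
    """
    nodes = [(sym, []) for sym in gene]
    next_free = 1  # index of the next symbol not yet attached to the tree
    i = 0
    while i < next_free:
        sym, children = nodes[i]
        k = ARITY.get(sym, 0)
        if next_free + k > len(nodes):
            raise ValueError("gene too short to complete the expression tree")
        children.extend(nodes[next_free:next_free + k])
        next_free += k
        i += 1
    return nodes[0]


def evaluate(node, env):
    """Evaluate an expression tree, looking terminals up in env."""
    sym, children = node
    if sym in OPS:
        return OPS[sym](*(evaluate(child, env) for child in children))
    return env[sym]


def tail_length(head, max_arity):
    """GEP tail length t = h(n - 1) + 1 for head length h and maximum arity n."""
    return head * (max_arity - 1) + 1


gene = ["sqrt", "*", "-", "+", "a", "b", "c", "d"]
tree = karva_to_tree(gene)
print(evaluate(tree, {"a": 5.0, "b": 1.0, "c": 2.0, "d": 2.0}))  # sqrt((5 - 1) * (2 + 2)) = 4.0
print(tail_length(6, 2))  # 7, consistent with the head and tail sizes listed in Table 3.1
```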
To obtain the best GEP model, the mean squared error (MSE) between the observed and predicted flood quantiles was used as the fitness function, and training was undertaken to minimise this error. The combined models were developed in GeneXproTools® using the parameter settings in Table 3.1.
Figure 3.7 GEP expression tree (ET)
Table 3.1 Parameters used per run in GEP model
Parameter  Description                     Value
P1         Chromosomes                     20
P2         Number of genes                 5
P3         Head size                       6
P4         Tail size                       7
P5         Fitness function error type     MSE
P6         Linking function                Subtraction
P7         Mutation rate                   0.044
P8         Function set                    +, -, *, /, x^2, x^3, sqrt, Exp, Ln, Sin, Cos, 3Rt, Atan, Pow, Pow10, Log, Log2
P9         Inversion rate                  0.1
P10        Gene recombination rate         0.1
P11        One-point recombination rate    0.3
P12        Two-point recombination rate    0.1
P13        Gene transposition rate         0.1
P14        Data type                       Floating-Type
3.2.4 Co-active neuro fuzzy inference system (CANFIS)
Fuzzy logic provides a different way to approach a control or classification problem: it focuses on what the system should do rather than trying to model how it works. Developing a fuzzy inference system (FIS) within the framework of an adaptive neural network is called an adaptive neuro fuzzy inference system (ANFIS). A typical FIS is shown in Figure 3.8.
Let A be a crisp set. An individual x from a universal set X is either a member or a non-member of A, which can be expressed by the characteristic function:
$$\chi_A : X \rightarrow \{0, 1\} \tag{3.7}$$
Figure 3.8 Fuzzy inference system (FIS) (Shi and Mozimoto, 2000)
Fuzzy logic can best be understood in terms of set membership, where the membership values represent the degrees to which each object is associated with the properties that are distinctive to the collection. Formally, a fuzzy set A is defined as a collection of objects with membership values between 0 (complete exclusion) and 1 (complete membership). The membership grade of each element in X is determined through a membership function μ_A, which maps the elements of a universe of discourse X to the unit interval [0, 1]:
$$\mu_A : X \rightarrow [0, 1] \tag{3.8}$$
By using approximate reasoning, a fuzzy logic description can effectively model the uncertainty and nonlinearity of a system (Shu et al., 2008). Approximate reasoning supports decision-making and expert systems built on a minimal set of rules, and it is one of the most obvious applications of fuzzy logic in artificial intelligence.
Consider the example of a simple FIS with two inputs x and y and one output z, and suppose that the rule base contains two fuzzy if-then rules of the type proposed by Takagi and Sugeno (1983):
Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2
where A1, A2 and B1, B2 are the membership functions of inputs x and y, respectively, and p1, q1, r1 and p2, q2, r2 are the parameters of the output functions. The node functions in the same layer belong to the same function family, as described below:
Layer 1: Each node in this layer performs fuzzification: it generates the membership grade of a fuzzy set (A, B, C or D), which specifies the degree to which the given input belongs to that fuzzy set. The fuzzy sets are defined by membership functions (MFs).
Layer 2: Each node in this layer determines the MF of the whole input vector by aggregating the fuzzified results of the individual scalar MFs of every input variable. The output of each node is obtained by multiplying the incoming signals and represents the firing strength of a rule.
Layer 3: This layer has two components. The upper component applies the MFs to each of the inputs, while the lower component is a representation of the modular network that computes, for each output, the sum of all the normalised firing strengths (Parthiban and Subramanian, 2009).
Layer 4: The fourth layer performs weight normalisation of the outputs of the two components of the third layer and produces the output of the CANFIS network.
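For the two-rule example above, the layer computations can be sketched in Python for a first-order TSK system with generalised bell MFs. This is a minimal illustration rather than the CANFIS implementation used in the study: the MF parameters (a, b, c) and rule coefficients are arbitrary, the function names are hypothetical, and it assumes both rules have a positive firing strength at the evaluated point, since the normalisation divides by their sum.

```python
import numpy as np


def bell_mf(x, a, b, c):
    """Generalised bell MF: 1 / (1 + |(x - c) / a|^(2b)), with values in [0, 1]."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def tsk_two_rules(x, y, mf_x, mf_y, consequents):
    """First-order TSK output for two rules 'if x is Ai and y is Bi then fi'.

    mf_x, mf_y: (a, b, c) bell parameters for A1, A2 and for B1, B2.
    consequents: (p, q, r) for each rule, so that fi = p*x + q*y + r.
    """
    # Layer 1: fuzzification of each input
    mu_a = [bell_mf(x, *prm) for prm in mf_x]
    mu_b = [bell_mf(y, *prm) for prm in mf_y]
    # Layer 2: firing strength of each rule as the product of its memberships
    w = np.array([mu_a[i] * mu_b[i] for i in range(2)])
    # Layer 3: normalised firing strengths
    w_bar = w / w.sum()
    # Layer 4: weighted sum of the linear rule consequents
    f = np.array([p * x + q * y + r for p, q, r in consequents])
    return float(w_bar @ f)


mf_x = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]  # A1 centred at 0, A2 centred at 5 (arbitrary)
mf_y = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]  # B1 centred at 0, B2 centred at 5 (arbitrary)
rules = [(1.0, 1.0, 0.0), (0.0, 0.0, 10.0)]
out = tsk_two_rules(1.0, 1.0, mf_x, mf_y, rules)  # near (0, 0), so rule 1 (f1 = 2) dominates
```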
Fuzzy rules and fuzzy sets in CANFIS capture and store the regional information. The training algorithm tunes the system parameters over the entire data space according to the hybrid learning rules. This approach provides a general framework that combines two techniques, ANN and fuzzy systems. The CANFIS model provides nonlinear modelling capability and requires no assumption about the underlying model. Using fuzzy techniques, the linguistic relationship between the input and output can be expressed through fuzzy rules. Unlike the initialisation of an ANN, which may require several rounds of random selection, the initialisation of a CANFIS can be performed using the one-pass subtractive clustering algorithm. A typical CANFIS model is shown in Figure 3.9.
In CANFIS, the fundamental component is the fuzzy neuron that applies membership functions (MFs) to the inputs. The generalised bell and Gaussian functions are the two commonly used MFs (Principe et al., 2000); the bell-shaped MF is used in this study. The normalised axon (neuron) in the network is used to scale the output into the range 0 to 1. One advantage of the fuzzy axon is that its MFs can be modified through back propagation during network training, which speeds up
the convergence. The modular neural network that applies functional rules to the inputs is the
second major component of CANFIS. The number of modular networks equals the number of
network outputs, and the number of processing elements in each network corresponds to the
number of MFs.
Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009)
CANFIS also has a combiner axon that applies the MF outputs to the modular network outputs (Roger et al., 1997; Alecsandru et al., 2004). Finally, the combined outputs are channelled through a final output layer, and the error is back-propagated to both the MFs and the modular networks. Like ANFIS, CANFIS has a total of five layers, whose functions are summarised as follows. Each node in layer 1 performs fuzzification of the input: its output is the membership grade of a fuzzy set (A1, A2, B1 or B2) and specifies the degree to which the given input belongs to that fuzzy set. The input to layer 2 is the product of all the output pairs from layer 1. The third layer has two components: the upper component applies the membership functions to each of the inputs, while the lower component is a representation of the modular network that computes, for each output, the sum of all the firing
strengths. The fourth layer performs weight normalisation of the outputs of the two components of the third layer, producing the final output of the network (Ishak and Trifiro, 2007).
The CANFIS model integrates adaptable fuzzy inputs with a modular neural network to approximate complex functions rapidly and accurately. The TSK fuzzy model proposed by Takagi, Sugeno and Kang (Takagi and Sugeno, 1985; Sugeno and Kang, 1988) is used in the present study, since this type of fuzzy model is best suited to multi-input, single-output systems (Aytek, 2009).
For the CANFIS model development, the model catchments were clustered on the model variables (A, Itc_ARI) into several classes in layer 1 to build up the fuzzy rules, and each fuzzy rule was constructed from several MF parameters in layer 2. A fuzzy inference system structure was generated from the data using subtractive clustering, in order to establish the rule base relationship between the inputs.
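A simplified version of subtractive clustering is sketched below to show how candidate rule centres can be picked directly from the data in one pass. It follows the usual potential-based formulation (a neighbourhood radius r_a for the potential and a squash radius of 1.5 r_a for potential reduction), but replaces the full accept/reject rules with a single cut-off ratio; the radius, the ratio and the function name are assumptions, not the settings used in this study.

```python
import numpy as np


def subtractive_clustering(x, ra=0.5, accept_ratio=0.15):
    """Return cluster centres chosen from the rows of x (data ideally scaled to [0, 1]).

    Each point's potential is the sum of Gaussian-like contributions from all points;
    the highest-potential point becomes a centre and potential near it is reduced.
    Simplified stop rule (assumed): best remaining potential < accept_ratio * first potential.
    """
    x = np.asarray(x, dtype=float)
    alpha = 4.0 / ra**2
    beta = 4.0 / (1.5 * ra) ** 2  # squash radius rb = 1.5 * ra
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)
    first = potential.max()
    centres = []
    for _ in range(len(x)):  # at most one centre per data point
        k = int(np.argmax(potential))
        pk = potential[k]
        if pk < accept_ratio * first:
            break
        centres.append(x[k].copy())
        # Reduce potential around the new centre so nearby points are not chosen again
        potential = potential - pk * np.exp(-beta * d2[k])
    return np.array(centres)


# Two made-up groups of points; one centre is expected in each group.
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
centres = subtractive_clustering(pts)
```

Each accepted centre would then seed one fuzzy rule, which is why no repeated random initialisation is needed.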
To obtain the best CANFIS models, the MSE between the observed and predicted flood quantiles was used as the fitness function, and training was undertaken to minimise this error using the Levenberg-Marquardt (LM) algorithm. The CANFIS model was trained with a set of input and output data, adjusting the weights to minimise the MSE between the desired outputs and the model outputs. The testing data set was selected randomly to give a reasonable sample of different catchment types and sizes. The network had one input layer with two inputs (A, Itc_ARI) and one output layer with a single output (Qpred).
For CANFIS, the bell MF and the TSK neuro fuzzy model were used, as noted above, and the network was trained with the LM algorithm. Training was set to stop after a maximum of 1000 epochs or when the MSE dropped to a threshold value of 0.01.
3.2.5 Quantile regression technique (QRT)
A flood quantile is a probabilistic flood estimate for a selected ARI. The United States Geological Survey (USGS) proposed a quantile regression technique (QRT) in which a large number of gauged catchments are selected from a region and flood quantiles are estimated from recorded streamflow data; these are then regressed against the catchment variables that are most likely to
govern the flood generation process. Benson (1962) suggested that T-year flood peak discharges could be estimated directly from catchment characteristics (predictor variables, X) by multiple regression analysis (Thomas and Benson, 1970; Stedinger and Tasker, 1985; Haddad and Rahman, 2012):
$$Q_T = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots \tag{3.9}$$
where the regression coefficients β are generally estimated using ordinary least squares (OLS) or generalised least squares (GLS) regression. Various regression techniques have been adopted in many applications of hydrological regression, and most of them derive from the USGS methodology described above. The USGS has been applying the QRT for several decades. A well-known study using the QRT with an OLS procedure was carried out by Thomas and Benson (1970), who tested four regions in the United States for design flood estimation using multiple regression techniques that related streamflow characteristics to drainage-basin characteristics.
The OLS estimator has traditionally been used by hydrologists to estimate the regression coefficients β in regional hydrological models. However, for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all sites in the region should have equal record lengths, and all estimates of T-year events should have equal variance. Since annual maximum flow data in a region do not generally satisfy these assumptions, the OLS approach can provide very distorted estimates of the model's predictive precision (model error) and of the precision with which the regression coefficients are estimated (Stedinger and Tasker, 1985).
In developing the QRT in this study, both the dependent and independent variables were log-transformed to linearise Equation 3.9. OLS regression was adopted to develop prediction equations for each of the six flood quantiles using two predictor variables (A, Itc_ARI). OLS is easy to implement, whereas GLS needs specialised software; however, both provide similar results unless the data are highly correlated (Haddad et al., 2008). The data sets for building and independently testing the QRT models were the same as those used for the non-linear models. The MINITAB 14 software was used to develop the QRT models.
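After log-transformation, Equation 3.9 with the two predictors used here becomes log Q_T = log β0 + β1 log A + β2 log Itc_ARI, which OLS can fit directly. The NumPy sketch below illustrates this; it uses base-10 logarithms and exactly generated synthetic data purely as a check, does not reproduce the MINITAB workflow or any coefficient from this study, and the function names are hypothetical.

```python
import numpy as np


def fit_qrt(area, intensity, q_t):
    """OLS fit of log10(Q_T) = log10(b0) + b1*log10(A) + b2*log10(I); returns (b0, b1, b2)."""
    X = np.column_stack([np.ones(len(q_t)), np.log10(area), np.log10(intensity)])
    coef, *_ = np.linalg.lstsq(X, np.log10(q_t), rcond=None)
    return 10 ** coef[0], coef[1], coef[2]


def predict_qrt(coefs, area, intensity):
    """Back-transform to the form of Equation 3.9: Q_T = b0 * A^b1 * I^b2."""
    b0, b1, b2 = coefs
    area = np.asarray(area, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    return b0 * area**b1 * intensity**b2


# Synthetic check: made-up catchments generated exactly from Q_T = 2 * A^0.7 * I^1.2.
area = np.array([10.0, 50.0, 100.0, 300.0, 800.0])      # catchment area (km2)
intensity = np.array([20.0, 35.0, 15.0, 50.0, 40.0])    # design rainfall intensity (mm/h)
q = 2.0 * area**0.7 * intensity**1.2
coefs = fit_qrt(area, intensity, q)                     # recovers (2.0, 0.7, 1.2)
```

In practice one such equation would be fitted for each of the six ARIs.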
3.2.6 Cluster analysis
In forming the regions and identifying groups of catchments in the catchment characteristics data space, two methods were adopted in this study: cluster analysis and principal component analysis.
Clustering algorithms are generally divided into two categories: partitional and hierarchical. Partitional clustering algorithms divide the data set into non-overlapping groups; algorithms such as k-means, bisecting k-means and k-modes fall under this category. They employ an iterative approach to group the data into a predetermined number k of clusters by minimising a cost function. Hierarchical clustering, by contrast, creates clusters that have a predetermined ordering from top to bottom (a hierarchy).
A number of cluster analysis methods with different distance measures are available. One problem is that different methods generate different groupings, which raises the question of which grouping should be selected as the 'acceptable grouping'. The criteria used here were that the final clusters should show no chaining effect and that the final set of clusters should be well defined.
To overcome the problem arising from the different dimensional units of the variables, the variables were standardised to z-scores (mean = 0 and standard deviation = 1) before the cluster analysis. It was further assumed that there would be two groupings, so that each group contains a relatively large number of stations, which is needed for successful calibration of the RFFA model using non-linear techniques.
The hierarchical cluster analysis
There are numerous ways in which clusters can be formed, and hierarchical clustering is one of the most straightforward. A key component of the analysis is the repeated calculation of distance measures between objects, and between clusters once objects begin to be grouped into clusters. The outcome is represented graphically as a dendrogram. In this study, hierarchical clustering was used with the following methods:
Wards;
Median;
Baverage;
Waverage; and
Centroid.
Because the goal of this cluster analysis is to form groups of similar catchments, a criterion is needed to measure similarity or distance. A distance measures how far apart two objects are, while a similarity measures how alike they are: for similar cases, distance measures are smaller and similarity measures are larger. Some measures, like the Euclidean distance, are suitable only for continuous variables, while others are suitable only for categorical variables, and there are many specialised measures for binary variables. In this study, different measures were tried, and the method giving the best clusters with the fewest outliers was selected for ANN modelling. For each of the above methods, the following distance measure options were adopted:
Block;
Euclid;
Seuclid;
Correlation;
Cosine;
Chebychev;
Minkowski; and
Power.
Based on the above criteria for selecting the best grouping, the 'Wards' cluster method with the 'Block' distance measure was adopted for the selection of regions based on hierarchical cluster analysis.
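The standardisation and hierarchical grouping steps can be sketched with NumPy and SciPy. Note one deliberate difference from the adopted option: SciPy documents Ward linkage as correctly defined only for Euclidean distances, so this sketch uses the Euclidean metric rather than the 'Block' (city-block) measure. The catchment attributes in the usage lines are made up and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def standardise(x):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def ward_groups(x, n_groups=2):
    """Ward hierarchical clustering on z-scores (Euclidean), cut into n_groups clusters."""
    tree = linkage(standardise(x), method="ward", metric="euclidean")
    return fcluster(tree, t=n_groups, criterion="maxclust")


# Hypothetical catchment attributes: [area (km2), design rainfall intensity (mm/h)]
attrs = [[12.0, 40.0], [15.0, 42.0], [11.0, 38.0], [600.0, 9.0], [650.0, 8.0], [700.0, 10.0]]
labels = ward_groups(attrs)  # one label (1 or 2) per catchment
```

The same tree could also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram mentioned above.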
K-means clustering
K-means clustering is a partitioning method. It partitions the data into k mutually exclusive clusters and returns the index of the cluster to which each observation has been assigned. Unlike hierarchical clustering, k-means clustering operates on the actual observations (rather than the larger set of dissimilarity measures) and creates a single level of
clusters. These differences mean that k-means clustering is often more suitable than hierarchical clustering for large amounts of data.
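The k-means procedure can be sketched with Lloyd's algorithm, the textbook way of alternating assignment and centre-update steps. The sketch is illustrative only: the random initial centres drawn with a fixed seed, the iteration limit and the function name are assumptions, and it does not necessarily match the k-means implementation of any particular software package.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centres) for k mutually exclusive clusters."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]  # random initial centres (assumed)
    for _ in range(n_iter):
        # Assign every observation to its nearest centre (squared Euclidean distance)
        d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if its cluster is empty)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment with the final centres: a single level of clusters
    labels = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    return labels, centres


pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])  # made-up data
labels, centres = kmeans(pts, 2)
```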
3.2.7 Principal component analysis (PCA)
At the second stage of selecting an acceptable grouping as part of the formation of regions, principal component analysis (PCA) was undertaken. PCA is a variable-reduction technique that shares many similarities with exploratory factor analysis. Its aim is to reduce a larger set of variables to a smaller set of artificial variables, called 'principal components', which account for most of the variance in the original variables. PCA transforms a set of correlated variables into a new set of uncorrelated components, such that the first component accounts for the largest amount of the total variation in the data; the second component, which is uncorrelated with the first, accounts for the maximum amount of the remaining variation not already accounted for by the first component; and so on. In this study, PCA was undertaken using the statistical package SPSS. The variables used in the PCA are discussed in Chapters 4 and 5.
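The transformation described above can be sketched as an eigen-decomposition of the correlation matrix of the standardised variables, a common default for PCA. This NumPy sketch does not reproduce SPSS options such as rotation or component-retention rules; the synthetic data and the function name are assumptions for illustration.

```python
import numpy as np


def pca(x):
    """PCA of standardised variables via the eigen-decomposition of the correlation matrix.

    Returns eigenvalues (largest first), loadings (one column per component) and scores.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)  # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = z @ eigvec  # component scores are mutually uncorrelated
    return eigval, eigvec, scores


rng = np.random.default_rng(3)
base = rng.normal(size=200)
data = np.column_stack([base + 0.1 * rng.normal(size=200),  # two strongly correlated variables
                        base + 0.1 * rng.normal(size=200),
                        rng.normal(size=200)])                # one independent variable
eigval, loadings, scores = pca(data)
explained = eigval / eigval.sum()  # share of the total variance carried by each component
```

Because the correlation matrix of p standardised variables has trace p, the eigenvalues sum to the number of variables, so `explained` gives each component's share directly.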
3.2.8 Model validation technique
In this study, models/prediction equations were developed for each of the six flood quantiles, i.e. for average recurrence intervals (ARIs) of 2, 5, 10, 20, 50 and 100 years. A split-sample validation technique was adopted to test the performance of the developed models/prediction equations, in which the data set was divided into two parts: (i) a training/modelling data set containing 80% of the study catchments; and (ii) a validation/testing data set containing the remaining 20%. The artificial intelligence based RFFA models and the QRT were first developed using the training/modelling data set and then tested using the validation/testing data set, which enabled an independent testing of the models/prediction equations developed in this study.
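An 80/20 split of the catchments can be sketched as follows. This is a purely random split with a hypothetical seed and made-up catchment IDs; any stratification by catchment type and size, such as that sought for the CANFIS testing set, would have to be added on top of it.

```python
import numpy as np


def split_catchments(ids, train_frac=0.8, seed=1):
    """Randomly split catchment IDs into training (train_frac) and testing sets."""
    ids = np.asarray(ids)
    perm = np.random.default_rng(seed).permutation(len(ids))
    n_train = int(round(train_frac * len(ids)))
    return ids[perm[:n_train]], ids[perm[n_train:]]


catchments = [f"C{i:03d}" for i in range(1, 11)]  # hypothetical IDs C001 to C010
train, test = split_catchments(catchments)
```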
3.3 Summary
This chapter provides a description of the statistical and mathematical tools adopted in this
study. These include ANN, GAANN, GEP, CANFIS, cluster analysis, principal component
analysis and quantile regression technique (QRT). The fundamental concepts, mathematical
equations and input data requirements for each of these methods are presented in this chapter.
The adopted split-sample validation technique is also described, which allowed an
independent testing of the models/prediction equations developed in this study.
CHAPTER 4
SELECTION OF STUDY AREA AND DATA PREPARATION
4.1 General
This thesis focuses on design flood estimation in ungauged catchments using artificial intelligence based methods. The regional flood frequency analysis (RFFA) method is based on the streamflow and catchment characteristics data of a set of selected gauged catchments, so it is important that an appropriate set of catchments is selected and that the data are prepared following standard procedures. This chapter presents the selection of the study area and catchments, and the collation of the streamflow and catchment characteristics data used in this research.
4.2 Selection of study area
This study selects eastern Australia as the study area since this part of Australia has the highest density of stream gauging stations with good quality data. Eastern Australia here covers the states of Queensland (QLD), New South Wales (NSW), Victoria (VIC) and Tasmania (TAS), and the Australian Capital Territory (ACT). The selected study area is shown in Figure 4.1.
Figure 4.1 Location of the selected study area (coloured parts of the map)
4.3 Selection of study catchments
4.3.1 Factors considered for selection of catchments
The following factors were considered in making the initial selection of the study catchments:
Catchment area
The flood frequency behaviour of large catchments has been shown to differ significantly from that of smaller catchments; since the RFFA method is intended for small to medium sized catchments, it should be developed using small to medium sized catchments. Australian Rainfall and Runoff (ARR) (I.E. Aust., 1987) suggests an upper limit of 1000 km² for small to medium sized catchments, which seems reasonable and was adopted in this thesis.
Record length
For a stream gauging station, a sufficiently long streamflow record is ideally needed to characterise the underlying flood probability distribution with reasonable accuracy. In most practical situations, however, the streamflow records at many gauging stations in a study area are not long enough, so a balance is required between obtaining a sufficient number of stations (which captures greater spatial information) and a reasonably long record length (which enhances the accuracy of at-site flood frequency analysis). Selecting a cut-off record length is difficult because it affects the total number of stations available to develop the RFFA technique in a study area. For this study, stations with a minimum of 10 years of annual instantaneous maximum flow records were initially selected as 'candidate stations'.
Regulation
Ideally, the selected streams should be unregulated, since major regulation affects the rainfall-
runoff relationship significantly (storage effects). Streams with minor regulation, such as
small farm dams and diversion weirs, may be included because this type of regulation is
unlikely to have a significant effect on annual maximum (AM) floods. Gauging stations on
streams subject to major upstream regulation were not included in this thesis.
Urbanisation
Urbanisation can affect flood behaviour dramatically (e.g. decreased infiltration losses and
increased flow velocity). Therefore catchments with more than 10% of the area affected by
urbanisation were not included in this thesis.
Landuse change
Major landuse changes, such as the clearing of forests or changes in agricultural practices, modify the flood generation mechanisms and make streamflow records heterogeneous over the period of record. Catchments that had undergone major landuse changes over the period of streamflow records were not included in the data set.
Quality of data
Most statistical analyses of flood data assume that the available data are essentially error free; at some stations, this assumption may be grossly violated. Stations graded as 'poor quality', or with specific comments by the gauging authority regarding data quality, were assessed in greater detail; if they were deemed to be of 'low quality', they were excluded from the study.
4.4 Streamflow data preparation
4.4.1 Methods of streamflow data preparation
Missing observations in streamflow records at gauging locations are very common, and one of the elementary steps in any hydrological data analysis is deciding how to deal with these missing data points. Missing records in the AM flood series were infilled only where the extra data points could be estimated with sufficient accuracy to contribute additional information rather than 'noise'. For this research, the following methods were applied, following the approach of Rahman (1997) and Haddad et al. (2010).
Method 1
In this method, the monthly instantaneous maximum (IM) data were compared with the monthly maximum mean daily (MMD) data at the same station for years with data gaps. Where a month with missing instantaneous maximum flow corresponded to a month of very low maximum mean daily flow, this was taken to indicate that the AM flood did not occur during that missing month.
Method 2
This method involved a linear regression between the annual maximum mean daily flow series and the annual instantaneous maximum series of the same station. Gaps in the IM record were then infilled using the developed regression equations. The regression was used only to infill missing data points, not to extend the overall period of record of the instantaneous flow data.
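Method 2 can be sketched as an ordinary least squares line fitted on the years where both series are recorded. The text does not state the regression direction or any transformation, so this sketch assumes IM is regressed on the annual maximum mean daily flow in untransformed units; the flows and the function name are made up.

```python
import numpy as np


def infill_im(im, md):
    """Infill missing annual instantaneous maxima (NaN) from annual maximum mean daily flows.

    Assumed model: IM = intercept + slope * MD, fitted by OLS on years where both exist.
    Only years with a missing IM and a recorded MD are filled (the record is not extended).
    """
    im = np.asarray(im, dtype=float)
    md = np.asarray(md, dtype=float)
    both = ~np.isnan(im) & ~np.isnan(md)
    slope, intercept = np.polyfit(md[both], im[both], 1)
    filled = im.copy()
    gaps = np.isnan(im) & ~np.isnan(md)
    filled[gaps] = intercept + slope * md[gaps]
    return filled


md = [10.0, 20.0, 30.0, 40.0, 50.0]        # annual maximum mean daily flow (m3/s), made up
im = [25.0, 45.0, np.nan, 85.0, 105.0]     # annual instantaneous maximum (m3/s), one gap
filled = infill_im(im, md)                 # the gap becomes about 2 * 30 + 5 = 65
```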
As Method 1 is more directly based on observed data for the missing month and involves
fewer assumptions, it was preferred over Method 2.
4.4.2 Tests for outliers
An annual maximum (AM) flood series may contain outliers. An outlier is an observation that deviates significantly from the bulk of the data, which may be due to errors in data collection or recording, or to natural causes.
The method for treating outliers suggested in ARR (I.E. Aust., 1987) was not adopted here, as it includes an adjustment for skew, employing somewhat 'circular' logic. Instead, the Grubbs and Beck (1972) method was adopted. This method determines high and low outlier threshold values by applying a one-sided 10% significance level test that takes account of the sample size. The test was developed by Grubbs and Beck (1972) for detecting a single outlier in a sample from a normal distribution, but has also been shown to be applicable to the LP3 distribution.
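The threshold calculation can be sketched on log10 flows, as is usual when the test is used with the LP3 distribution. The sketch assumes the widely used polynomial approximation of the one-sided 10% critical value, K_N = −0.9043 + 3.345 (log10 N)^0.5 − 0.4046 log10 N (from the US Bulletin 17B guidelines), rather than a table lookup; the AM series and the function name are made up.

```python
import numpy as np


def grubbs_beck_thresholds(flows):
    """Low and high outlier thresholds (same units as flows) at the 10% level.

    Works on log10 flows and uses the assumed approximation
    K_N = -0.9043 + 3.345 * sqrt(log10 N) - 0.4046 * log10 N.
    """
    y = np.log10(np.asarray(flows, dtype=float))
    n = y.size
    k_n = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)
    mean, sd = y.mean(), y.std(ddof=1)
    return 10 ** (mean - k_n * sd), 10 ** (mean + k_n * sd)


am = [120, 150, 130, 160, 140, 135, 145, 155, 125, 1]  # hypothetical AM series (m3/s)
low, high = grubbs_beck_thresholds(am)
low_outliers = [q for q in am if q < low]    # flags the value 1
high_outliers = [q for q in am if q > high]  # empty for this series
```

For N = 10 the approximation gives K_N ≈ 2.036, so the low threshold here is about 3.5 m3/s.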
4.4.3 Trend analysis
Hydrological data for any flood frequency analysis, whether at-site or regional, should be stationary, consistent and homogeneous. The AM flow series should not show any time trend, to satisfy the basic assumption of stationarity in traditional flood frequency analysis methods. Thus, in this study, a trend analysis was carried out where possible to identify stations showing a significant trend; only stations that did not show any significant trend were included in the primary data set.
Two tests were initially applied to detect trends: the Mann-Kendall test (Kendall, 1970) and the distribution-free CUSUM test (McGilchrist and Woodyer, 1975); both tests were applied at the 5% significance level. The Mann-Kendall test checks whether there is a monotonic increase or decrease in a time series, whereas the CUSUM test concentrates on whether the means of two parts of a record are significantly different. As a useful guide and in addition to
the trend tests, a simple time series plot and a cumulative flow graph were also prepared for
each station to detect shifts in the AM flood data. It should be noted that a trend in time
series data does not necessarily imply non-stationarity. In climate change research,
non-stationarity means a significant change over time in the statistical properties of the time
series of a hydrometeorological variable, and a trend may not change statistical properties such
as the mean and variance significantly. Therefore, trend analysis cannot be used on its own as a
stationarity test; however, a significant trend may be an indicator of non-stationarity.
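A minimal sketch of the two-sided Mann-Kendall test at the 5% level is given below. It is illustrative only and is not the code used in this thesis. It assumes the usual normal approximation for the S statistic (commonly considered adequate for about 10 or more values), with a continuity correction and the standard tie correction to Var(S). The function name is hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test.

    Returns (S, Z, p_value, significant) using the normal approximation
    with a continuity correction and the tie-corrected variance of S.
    """
    n = len(series)
    # S = number of increasing pairs minus number of decreasing pairs (i < j)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for each group of t tied values
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    if var_s == 0 or s == 0:
        z = 0.0
    else:
        # Continuity correction: move S one unit towards zero
        z = (s - math.copysign(1, s)) / math.sqrt(var_s)

    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value, p_value < alpha


# Hypothetical, steadily decreasing AM series of 20 years
am = list(range(40, 20, -1))
s, z, p, significant = mann_kendall(am)
print(s, significant)  # -> -190 True
```

A negative Z with p below 0.05 corresponds to the "significant decreasing trend" reported for some stations in Section 4.6.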
4.4.4 Rating error analysis
The rating curve used to convert measured flood levels to flood flow rates is based on
periodic measurements of flow areas and velocities over a range of flow magnitudes.
However, the range of observed flood levels generally exceeds the range of ‘measured’ flows,
thus requiring different degrees of extrapolation of well-established rating curves.
Any rating curve extrapolation errors are directly transferred into the largest observations in
the AM flood series, and use of extrapolated data in flood frequency analysis can thus result
in grossly inaccurate flood frequency estimates.
To assess the degree of rating curve related error for a given station, the AM flood series data
point for each year (estimated flow QE) was divided by the maximum measured flow (QM) to
obtain a rating ratio (RR) (see Equation 4.1). If the RR value is below or near 1, the
corresponding AM flow may be considered to be free of rating curve extrapolation error.
However, an RR value well above 1 indicates a likely rating curve error that can cause notable
errors in flood frequency analysis.
$$ \mathrm{Rating\ Ratio}\ (RR) = \frac{Q_E}{Q_M} \qquad (4.1) $$
For any RFFA, a large number of stations with reasonably long record lengths are required
and hence a trade-off needs to be made between an extensive data set that includes stations
with very large RR values (and thus lower accuracy) and a smaller data set with RR values
restricted to what could be considered to be a “reasonable upper limit” of rating curve errors.
A working cut-off RR value was determined by examining the average RR value and the maximum
RR value for each station in a region/state. Based on the results
from Victoria and NSW, the following cut-off values were found to represent a reasonable
compromise between accuracy at individual sites and total size of the regional data set: an
average RR value of 4 and a maximum RR value of 20.
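The rating ratio of Equation 4.1 and the station screening rule can be sketched as follows. This is an illustrative sketch with hypothetical function names and station data. It reads the rule literally as worded in this chapter: a station is rejected only when its average RR exceeds 4 *and* its maximum RR exceeds 20.

```python
def rating_ratios(am_flows, max_measured_flow):
    """Equation 4.1: RR = Q_E / Q_M for each AM flow Q_E."""
    return [q / max_measured_flow for q in am_flows]


def passes_rating_screen(am_flows, max_measured_flow,
                         avg_cutoff=4.0, max_cutoff=20.0):
    """Return True if the station is retained.

    Literal reading of the rule: reject only when the average RR exceeds
    avg_cutoff AND the maximum RR exceeds max_cutoff.
    """
    rr = rating_ratios(am_flows, max_measured_flow)
    avg_rr = sum(rr) / len(rr)
    return not (avg_rr > avg_cutoff and max(rr) > max_cutoff)


# Hypothetical stations: (AM flows in m3/s, maximum measured flow Q_M in m3/s)
station_a = ([50.0, 120.0, 200.0], 100.0)    # RR = 0.5, 1.2, 2.0 -> retained
station_b = ([500.0, 600.0, 2500.0], 100.0)  # RR = 5, 6, 25 -> rejected
print(passes_rating_screen(*station_a), passes_rating_screen(*station_b))
# -> True False
```

Changing `and` to `or` in `passes_rating_screen` gives the stricter alternative reading of the rule. That variant would remove more stations.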
4.5 Selection of catchment characteristics
Identification of the most relevant catchment characteristics is difficult as there is no objective
method for doing this. Moreover, many catchment characteristics are highly correlated; including
several of them in a model introduces multicollinearity, which causes problems in the statistical
analysis, while adding little extra useful information.
The evaluation and success of catchment characteristics used in past studies should be used as
a criterion for the initial selection of candidate characteristics, and all potentially relevant
catchment/climatic characteristics identified in past studies should be considered when making
the selection for a given study. This can increase the validity of the model to be developed.
Rahman (1997) considered this aspect in detail, reviewing over 20 previous studies to develop a
reasonable starting point. However, in RFFA the significance of characteristics may vary from
region to region; therefore, no general inference about the significance of a particular
catchment characteristic can be made for a given region based on the findings of other studies.
4.5.1 Selection criteria
The following guidelines, based on the approach of Rahman (1997), were adopted in this study to
select the catchment characteristics:
The characteristic should have a plausible role in flood generation.
The characteristic should be unambiguously defined.
Characteristics should be easily obtainable. When a simpler characteristic and a
complex one are correlated and have similar effects, the simpler characteristic
should be chosen.
If a derived/combined characteristic is used, it should have a simple physical
interpretation.
The selected characteristics should not be highly correlated because this introduces
unstable parameters in multiple regression analysis.
The prediction performance of a particular characteristic in other regionalisation
studies should be examined as this might provide some information regarding the
importance of a characteristic.
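The criterion that the selected characteristics should not be highly correlated can be screened with pairwise Pearson correlations. The sketch below is an illustration only and is not part of the documented procedure. The |r| > 0.8 threshold, the function names and the example values are all assumptions.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(characteristics, threshold=0.8):
    """Return (name1, name2, r) for every pair with |r| above threshold.

    characteristics maps a characteristic name to its values across the
    same set of catchments. The 0.8 threshold is an illustrative assumption.
    """
    flagged = []
    for (n1, v1), (n2, v2) in itertools.combinations(characteristics.items(), 2):
        r = pearson_r(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged


# Hypothetical values for five catchments (not data from this thesis)
data = {
    "A": [100.0, 200.0, 300.0, 400.0, 500.0],
    "I_tc_2": [20.0, 16.0, 12.0, 8.0, 4.0],
    "R": [900.0, 1200.0, 800.0, 1100.0, 1000.0],
}
print(highly_correlated_pairs(data))  # -> [('A', 'I_tc_2', -1.0)]
```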
4.5.2 Catchment characteristics considered in this thesis
The following five catchment characteristics were selected in this thesis on the basis of the
criteria in Section 4.5.1; each is described in detail in Sections 4.5.3 to 4.5.7.
The candidate catchment/climatic characteristics are:
Design rainfall intensity (I_tc_ARI, mm/h);
Mean annual rainfall (R, mm);
Mean annual evapo-transpiration (E, mm);
Catchment area (A, km2); and
Slope of central 75% of mainstream S1085 (S, m/km).
4.5.3 Rainfall intensity
Rainfall intensity, with an appropriate duration and average recurrence interval (ARI), has
been found to be the most influential climatic characteristic in previous RFFA studies.
It clearly plays a significant role in the flood generation process, and it is also quite easy
to obtain.
The use of rainfall intensity requires the selection of an appropriate duration and ARI. It
seems logical to use rainfall intensity with a duration equal to the time of concentration
(tc), as applied in the rational method. However, tc differs among the catchments in the study
area owing to variability in size and shape, so no single storm duration is representative of
every catchment in this thesis. It was therefore decided to include the following design rainfall
intensities, each with a duration equal to the catchment's own tc, in this study:
(tc) duration, 2 years ARI (I_tc_2, mm/h);
(tc) duration, 5 years ARI (I_tc_5, mm/h);
(tc) duration, 10 years ARI (I_tc_10, mm/h);
(tc) duration, 20 years ARI (I_tc_20, mm/h);
(tc) duration, 50 years ARI (I_tc_50, mm/h); and
(tc) duration, 100 years ARI (I_tc_100, mm/h).
The basic design rainfall intensity data for the selected catchments were obtained from
ARR, Vol. 2 (I. E. Aust., 1987), and the software AUSIFD was used to obtain the other design
rainfall intensities. AUSIFD is widely used in Australia to derive design rainfalls.
For consistency and ease of application, the formula recommended in ARR 1987 for Victoria
and eastern NSW, given by Equation 4.2, was adopted in this thesis to estimate the time of
concentration tc (hours) from the catchment area A (km2).
$$ t_c = 0.76\,A^{0.38} \qquad (4.2) $$
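Equation 4.2 is straightforward to apply in code. The resulting tc sets the storm duration at which each design rainfall intensity (I_tc_ARI) is read from the IFD data. This is a minimal sketch with a hypothetical function name; the IFD lookup itself, done with AUSIFD in this thesis, is not reproduced.

```python
def time_of_concentration(area_km2):
    """Equation 4.2 (ARR 1987, Victoria and eastern NSW): tc = 0.76 A^0.38.

    area_km2: catchment area A in km2. Returns tc in hours.
    """
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return 0.76 * area_km2 ** 0.38


# Areas spanning the range reported in Table 4.1 (1.3 to 1900 km2)
for area in (1.3, 255.5, 1900.0):
    print(f"A = {area:7.1f} km2 -> tc = {time_of_concentration(area):.2f} h")
```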
4.5.4 Mean annual rainfall
Mean annual rainfall has been adopted in many previous studies; although it may not have a
direct influence on flood peaks, it can still have a secondary effect by acting as a
surrogate for other catchment characteristics (e.g. vegetation). It is also quite easy to obtain.
Thus, mean annual rainfall was included as a candidate predictor variable in this study. The
mean annual rainfall data were obtained from an Australian Bureau of Meteorology CD. For all
the catchments, the mean annual rainfall value for the rainfall station closest to the centroid of
each catchment was extracted.
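Selecting the rainfall station closest to each catchment centroid can be sketched as follows. This is an illustration under stated assumptions: the thesis does not say how distances were measured, so great-circle (haversine) distance is assumed here, and the station identifiers, coordinates and rainfall values are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance (km) between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_station_rainfall(centroid, stations):
    """Return (station_id, mean_annual_rainfall_mm) of the closest station.

    centroid: (lat, lon) of the catchment centroid.
    stations: iterable of (station_id, lat, lon, mean_annual_rainfall_mm).
    """
    lat_c, lon_c = centroid
    best = min(stations, key=lambda s: haversine_km(lat_c, lon_c, s[1], s[2]))
    return best[0], best[3]


# Hypothetical rainfall stations (ids, coordinates and values are made up)
stations = [
    ("R001", -37.50, 145.20, 950.0),
    ("R002", -37.90, 145.60, 1100.0),
    ("R003", -36.80, 144.90, 620.0),
]
print(nearest_station_rainfall((-37.55, 145.25), stations))  # -> ('R001', 950.0)
```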
4.5.5 Catchment area
Catchment area is the most frequently adopted morphometric characteristic and the main
scaling factor in the flood process studies, since it has a direct impact on the possible flood
magnitude from a given storm event. Almost all of the reported RFFA studies have found
catchment area to be very significant. One of the reasons why the area variable has been so
useful in statistical hydrology is its association with other significant morphometric
characteristics like slope, stream length, and stream order. Catchment areas of the selected
catchments were measured by planimeter from 1:100,000 topographic maps. The derived
areas were also compared with the values in the catchment database that accompanied the
streamflow data supplied by the stream gauging authority. Area was characterised by
Anderson (1957) as the ‘devil’s own variable’, because almost every watershed characteristic
is correlated with it. The mean annual flood is also related to other morphometric
characteristics (e.g. stream order, stream length), which are themselves closely related to area.
The total volume of runoff (Q) is related to the catchment area (A) by the general form:
$$ Q = c\,A^{m} \qquad (4.3) $$
where c is a coefficient and the exponent m varies from 0.5 to 1.0. Catchment area was included
in this study as a candidate predictor variable.
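The parameters c and m of Equation 4.3 can be estimated from a set of catchments by ordinary least squares on the logarithms, since log10 Q = log10 c + m log10 A. The sketch below is illustrative only. Q and A are as defined for Equation 4.3, and the example data are synthetic values generated from assumed c = 2.0 and m = 0.7 rather than data from this study.

```python
import math


def fit_power_law(areas, flows):
    """Least-squares fit of Q = c * A**m in log-log space; returns (c, m)."""
    x = [math.log10(a) for a in areas]
    y = [math.log10(q) for q in flows]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    m = sxy / sxx                 # slope of the log-log line = exponent m
    c = 10 ** (my - m * mx)       # intercept back-transformed to coefficient c
    return c, m


# Synthetic catchments generated exactly from c = 2.0, m = 0.7
areas = [10.0, 50.0, 120.0, 400.0, 900.0]
flows = [2.0 * a ** 0.7 for a in areas]
c, m = fit_power_law(areas, flows)
print(round(c, 3), round(m, 3))  # -> 2.0 0.7
```

With real data the points scatter about the line, and the fitted m would be expected to fall in the 0.5 to 1.0 range quoted above.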
4.5.6 Slope S1085
Among the different measures of slope, S1085 is easily obtainable and has been reported to be
the best measure for prediction of the mean flood (Benson, 1959). Thus, S1085 was used in this
study. The S1085 measure excludes the extremes of slope that can be found at either end of the
mainstream. It is the difference in stream-bed elevation between the points at 85% and 10% of
the mainstream length (measured from the catchment outlet), divided by 75% of the
mainstream length.
The following methodology was adopted to derive the S1085 values:
Catchment boundaries were plotted on 1:100,000 topographic maps for each gauged
station.
The mainstream length was measured using an electronic map wheel, with the
mainstream taken as the total distance from the outlet to where the stream intersects
the catchment boundary. The longest path was chosen for each
catchment as the main stream of that catchment.
Elevations were then derived for the 10% and 85% mainstream length positions. The
elevations at these positions were interpolated from either 10 m or 20 m contours.
S1085 values were determined from Equation 4.4.
$$ \mathrm{S1085} = \frac{E_2 - E_1}{0.75\,L} \qquad (4.4) $$
where E_2 is the elevation (m) at the 0.85L position, E_1 is the elevation (m) at the 0.10L
position and L is the main stream length (km), so that S1085 is in m/km. The slope S1085 is
referred to as S henceforth.
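Given a longitudinal profile of the mainstream, Equation 4.4 can be evaluated as below. In this thesis the elevations were interpolated from map contours. This sketch instead assumes a hypothetical list of (distance from outlet in km, bed elevation in m) points with linear interpolation between them. The function names and the profile are illustrative.

```python
def interpolate_elevation(profile, distance):
    """Linearly interpolate bed elevation at a distance along the stream.

    profile: list of (distance_km_from_outlet, elevation_m), sorted by distance.
    """
    for (d0, e0), (d1, e1) in zip(profile, profile[1:]):
        if d0 <= distance <= d1:
            return e0 + (e1 - e0) * (distance - d0) / (d1 - d0)
    raise ValueError("distance outside the surveyed profile")


def s1085(profile):
    """Equation 4.4: S1085 = (E2 - E1) / (0.75 L), in m/km.

    L is taken as the last profile distance (outlet at 0 km); E1 and E2 are
    the elevations at 0.10 L and 0.85 L from the outlet.
    """
    length_km = profile[-1][0]
    e1 = interpolate_elevation(profile, 0.10 * length_km)
    e2 = interpolate_elevation(profile, 0.85 * length_km)
    return (e2 - e1) / (0.75 * length_km)


# Hypothetical 40 km mainstream profile (km from outlet, elevation in m)
profile = [(0.0, 100.0), (10.0, 130.0), (25.0, 220.0), (40.0, 600.0)]
print(round(s1085(profile), 2))  # -> 11.2
```

Here E1 = 112 m at 4 km and E2 = 448 m at 34 km. S1085 is therefore 336 m over 30 km, or 11.2 m/km. The steep upper reach beyond 34 km is excluded, as intended.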
4.5.7 Mean annual evapo-transpiration
Mean annual evapo-transpiration is the third climatic characteristic considered for its role in
the flood generation process. Evapo-transpiration does not affect the flood peak directly but
can have a secondary effect by acting as a surrogate for other catchment characteristics.
Evapo-transpiration is the water lost to the atmosphere through the combined effects of
evaporation (from soil and water surfaces) and transpiration from catchment vegetation. In this
study, mean annual areal potential evapo-transpiration data were used.
These data were obtained from the Australian Bureau of Meteorology CD. For all the
catchments, the value at the centroid of each catchment was extracted.
4.6 Streamflow data preparation for various states
4.6.1 NSW and ACT
A total of 635 stations were selected from NSW and ACT initially. For in-filling the gaps,
Method 1 was preferred over Method 2 (see Section 4.4.1 for description of these methods)
for different catchments in NSW.
Trend analysis
Initially the Mann-Kendall test was applied to the stations. The results showed that some 11%
of the stations had a decreasing trend, generally after 1990. Given the number of stations
showing a trend, time series plots and mass curves were prepared for these stations to check
visually whether significant changes in slope could be identified. A typical plot is shown in
Figure 4.2. A simple time series plot (Figure 4.3) is useful, in addition to the trend tests,
for detecting and confirming shifts in the data. As these tests indicate that the flood data are
not independently and identically distributed from year to year, caution is needed when using
short records to estimate long-term risks.
The fact that the last 10–15 years of data (after the late 1980s) showed a significant downward
trend for many stations makes the inclusion of stations with short record lengths in flood
frequency analysis questionable, as this could introduce significant bias in the results. Hence,
it was decided that a station should have at least 25 years of streamflow data. The number of
eligible stations in NSW and ACT after the introduction of a cut off record length of 25 years
dropped to 106.
Checking for outliers in the AM flood series
The Grubbs and Beck (1972) method was adopted to check for the outliers. While the data
checking revealed many ‘outliers’ in the flood series, these did not preclude the use of the
remaining flood data in RFFA. The results of the outlier detection procedure are summarised
below:
40% of the stations were found to have low outliers. The maximum number of low
outliers detected in a data series was 9 and never exceeded 21% of the total number of
data points in a series.
Most of the detected low outliers occurred for stations located in low rainfall areas,
especially in the western parts of NSW.
31% of low outliers occurred in the years 1982, 1967 and 1994. This is not surprising
as there were severe droughts during these years; the maximum flows that occurred in
many rivers in these years were merely base flows, and not due to flood events.
47% of the stations did not show any outliers.
Only 5 stations had a high outlier.
The detected low outliers were treated as censored flows in flood frequency analysis using
ARR FLIKE (Kuczera and Franks, 2005).
Rating curve error
To assess the degree of rating curve related error for a given station, the rating ratio (RR) (see
Equation 4.1) was adopted. In the remaining data set of 106 stations from NSW, many had
RR values considerably greater than 1 (Figure 4.4). A cut-off RR value of 20 was adopted;
any station having an average RR value greater than 4 and a maximum RR value greater than
20 was rejected. This reduced the eligible number of stations from 106 to 96.
Final data set from NSW and ACT
A total of 635 stations were initially selected. After in-filling the gaps in the AM flood series,
trend analysis, introduction of a cut-off record length of 25 years, and consideration of rating
curve errors, only 96 stations remained, which represent about 15% of the initially selected
stations. The statistics of AM streamflow record lengths of these 96 stations are summarised
below:
Record lengths range from 25 to 74 years, mean 34 years, median 31 years and
standard deviation 10 years;
77% of the stations have record lengths in the range 25-35 years;
18% of the stations have record lengths in the range 40-55 years; and
5% of the stations have record lengths in the range 60-75 years.
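The summary statistics reported above (range, mean, median, standard deviation and the percentage of stations in given record-length bands) can be computed as sketched below. This is an illustration with a hypothetical function name and made-up record lengths, not the NSW data set. Bands are treated as inclusive year ranges, and stations outside every listed band are simply not counted in any band.

```python
import statistics


def record_length_summary(lengths, bands):
    """Summary statistics of record lengths (years) and % of stations per band.

    bands: list of (lower, upper) inclusive ranges in years.
    """
    summary = {
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "sd": statistics.stdev(lengths),  # sample standard deviation
    }
    n = len(lengths)
    for lo, hi in bands:
        count = sum(lo <= v <= hi for v in lengths)
        summary[f"{lo}-{hi}"] = 100.0 * count / n
    return summary


# Hypothetical record lengths (years); not the NSW data set
lengths = [25, 27, 30, 31, 33, 35, 42, 48, 61, 74]
stats = record_length_summary(lengths, [(25, 35), (40, 55), (60, 75)])
print(stats["median"], stats["25-35"])  # -> 34.0 60.0
```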
The histogram of streamflow record lengths of the 96 stations from NSW and ACT is shown
in Figure 4.5.
[Plot: CUSUM statistic Vk vs year (1940-2010) for Station 219001, marking a significant downward shift]
Figure 4.2 Result of trend analysis (Station 219001). Here Vk is the CUSUM test statistic defined in
McGilchrist and Woodyer (1975)
Figure 4.3 Result of trend analysis – time series plot (Station 219001)
[Plot: histogram of rating ratios (RR) on a logarithmic frequency axis; over 95% of rating ratios lie between 1 and 20]
Figure 4.4 Histogram of rating ratios for 106 stations from NSW
The statistics of catchment areas of the selected 96 stations are summarised below:
Catchment areas range from 8 to 1010 km2, with an average value of 353 km2, median
of 267 km2 and a standard deviation of 276 km2;
53% of catchments have areas smaller than 300 km2;
38% of stations have areas in the range of 301 km2 to 800 km2; and
10% of stations have areas in the range of 801 km2 to 1010 km2.
[Plot for Figure 4.3: annual maximum flow (m3/s) vs year (1940-2010) for Station 219001, marking a decrease in flow magnitude]
[Plot: frequency vs record length (years), in 5-year classes from 25-29 to >75]
Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT
The distribution of catchment areas is shown in Figure 4.6. The geographical distribution of
the finally selected 96 stations is shown in Figure 4.7. There is no station in far western NSW
that passed the selection criteria.
[Plot: frequency vs catchment area (km2), in classes from 0-25 to >1000 km2]
Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT
4.6.2 Tasmania
A total of 73 stations were selected as candidates from Tasmania, each having a minimum of
10 years of streamflow record. For in-filling the gaps in the AM flood series, Method 1 was
preferred over Method 2 (these methods are described in Section 4.4.1). The following points
summarise the results of the in-filling of the AM flood series data for Tasmania:
18 data points from 23 stations were in-filled by comparing flow records (Method 1);
27 data points from 12 stations were in-filled by regression (Method 2); and
20% of stations did not have any missing record.
After in-filling the gaps, the stations were then checked for possible trends (Section 4.4.3
details the method). Only three stations showed trends. The relevant data for checking the
rating ratios for Tasmania were largely unavailable, and hence no rating error analysis was
undertaken. About 9% of the stations showed low outliers. The maximum number of low
outliers detected in a data series was one and never exceeded 4% of the total number of data
points in a series. The low outliers occurred in the years 1967, 1982 and 2001. About 75% of
the stations did not show any outliers. About 14% of the stations showed high outliers;
however, these data points were not removed as no data error was detected.
While obtaining catchment characteristics data, 7 stations were found to have significant
proportions of lake areas, and were thus excluded; this reduced the dataset to 56 stations.
From this, 3 catchments over 1590 km2 were excluded, thus the final dataset contained 53
stations.
Figure 4.7 Geographical distributions of 96 catchments from NSW and ACT
The streamflow record lengths of the selected stations range from 10 to 58 years (median: 21
years and mean: 24 years). Figure 4.8 shows the distribution of record lengths. Figure 4.9
presents the distribution of catchment areas of the selected catchments. The catchment areas
range 4.6-1590 km2 (median: 102 km2 and mean: 240 km2). Figure 4.10 shows the locations
of the selected stations. There is a lack of stations in the southern and eastern parts of the state.
[Plot: frequency vs record length (years), in 10-year classes from 1-10 to 51-60]
Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania
[Plot: frequency vs catchment area (km2), in classes from 0-25 to >1000 km2]
Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania
Figure 4.10 Locations of selected catchments from Tasmania
4.6.3 Queensland
The streamflow data were obtained from the Department of Natural Resources & Water
(NRW), QLD. A total of 351 active and historical streamflow gauging station records were
provided by NRW. Gauge station metadata, AM flow records as well as the monthly and daily
records were supplied by the NRW for each station. Based on the adopted selection criteria,
the number of eligible stations was reduced to 289.
The streamflow data were in-filled by comparing flow records (Method 1) and/or regression
(Method 2). Method 1 was preferred over Method 2. Some years’ data could not be filled due
to many missing records. Some important statistics regarding the gap filling are:
81 data points were in-filled for 47 stations using Method 1;
413 data points were in-filled for 104 stations using Method 2; and
16% of stations did not have any missing records.
To check for outliers, the Grubbs and Beck (1972) method was used. Some important
statistics about the outlier detection are:
39% of stations were found to have low outliers; the maximum number of outliers
detected in a data series was 4 and never exceeded 10% of the total number of data
points in a series.
Most of the detected low outliers occurred in the mid-western and northern parts of
Queensland.
The bulk of the low outliers occurred in the years 1967, 1982 and 2001.
61% of stations did not have any outliers.
A total of 117 stations (7% of the stations) showed a significant trend, and were removed
from the database. As a result, 265 stations were retained.
Furthermore, only stations with a streamflow record length of 25 years or more were selected.
After the introduction of this cut-off, the number of catchments from QLD dropped to
172. Figure 4.11 provides the histogram of record lengths of the 172 stations.
The distribution of catchment areas of these catchments is shown in Figure 4.12. Some
important statistics of the catchment areas are summarised below:
24 catchments (9%) are smaller than 50 km2;
67 catchments (25%) are smaller than 100 km2;
47 catchments (18%) are in the range of 101 to 200 km2; and
37 catchments (14%) are larger than 600 km2.
The locations of the selected 172 stations are shown in Figure 4.13. There is no suitable
station located in the south-western part of Queensland.
[Plot: frequency vs record length (years), in 10-year classes from 1-10 to 91-100]
Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD
[Plot: frequency vs catchment area (km2), in classes from 0-25 to 901-1000 km2]
Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD
Figure 4.13 Locations of the selected 172 stations from QLD
4.6.4 Victoria
Based on the adopted selection criteria, a total of 415 stations were initially selected as
candidates from Victoria, each having a minimum of 10 years of streamflow record.
For in-filling the gaps in the AM flood series, Method 1 was preferred over Method 2. The
following points summarise the results of the in-filling of the AM flood series data in
Victoria.
273 data points from 187 stations were in-filled by comparing flow records (Method
1);
60 data points from 44 stations were in-filled by regression (Method 2);
Regression equations used in gap filling showed high R² values (range 0.82-0.99,
mean = 0.93 and SD = 0.041); and
10% of stations did not have any missing records.
After in-filling the gaps, the stations were then checked for possible trends, as discussed
below.
Trend analysis:
Initially the Mann-Kendall test was applied to the stations. The results were rather surprising
as they revealed that some 20% of the stations had a decreasing trend. Given the number of
stations showing a trend, time series plots and mass curves were prepared for these stations to
check visually whether significant changes in slope could be identified.
As an example, Figure 4.14 shows a significant overall downward trend for Station 230210,
supporting the result from the Mann-Kendall test, and a noticeable decrease in AM flows
from the late 1980s. To clarify this further, the CUSUM test was applied; the result
was similar, with the plot in Figure 4.15 showing a downward shift in the
mean from 1995 onwards.
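The distribution-free CUSUM statistic plotted in these figures can be sketched as below. V_k is the running sum of the signs of each value's departure from the series median. The point where |V_k| peaks marks the most likely shift, which is where the plotted Vk curve turns. This sketch is illustrative. Significance is judged by comparing max|V_k| with tabulated critical values, which are not reproduced here, and the function names and example series are hypothetical.

```python
import statistics


def cusum_vk(series):
    """Distribution-free CUSUM statistic V_k (McGilchrist and Woodyer, 1975).

    V_k is the running sum of sgn(x_i - median) for i = 1..k.
    """
    med = statistics.median(series)
    vk, total = [], 0
    for x in series:
        total += (x > med) - (x < med)
        vk.append(total)
    return vk


def likely_change_point(years, series):
    """Return (year, V_k) where |V_k| is largest, i.e. the most likely shift.

    The returned year is the last year of the first segment.
    """
    vk = cusum_vk(series)
    k = max(range(len(vk)), key=lambda i: abs(vk[i]))
    return years[k], vk[k]


# Hypothetical AM series: ten higher years followed by ten lower years
years = list(range(1976, 1996))
flows = [800.0] * 10 + [300.0] * 10
print(likely_change_point(years, flows))  # -> (1985, 10)
```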
A simple time series plot was also used, in addition to the trend tests, to detect and confirm
shifts in the data. As these tests indicate that the flood data are not independently and
identically distributed from year to year, caution is needed when using short records to estimate
long-term risks. The fact that the last 10–15 years of data (after the late 1980s) showed a
significant downward trend for many stations (presumably owing to the drier climate epoch that
has recently set in) makes the inclusion of stations with short records in regionalisation studies
quite questionable.
Finally, 21 stations from Victoria were removed due to the presence of a significant trend. The
number of eligible stations remaining after the application of the trend tests and the introduction
of a cut-off record length of 25 years dropped to 144, which is only 35% of the initially selected
415 stations. This result shows that the effective data set for RFFA in a given region is likely to
be substantially smaller than the primary data set.
Impact of rating curve error on flood frequency analysis:
In the remaining data set of 144 stations, many had rating ratios (RR) considerably greater
than 1 (RR is defined by Equation 4.1). For any RFFA study, a large number of stations with
reasonably long record lengths are required and hence a trade-off needs to be made between
an extensive data set that includes stations with very large RR values and a smaller data set
with RR values restricted to what could be considered to be a “reasonable upper limit”.
A working cut-off RR value was determined by examining the average RR value and the maximum
RR value for each station. The histogram of RR values in Figure 4.15 shows that 90% of the RR
values for all the recorded annual maxima fall between 1 and 20. It was therefore decided that a
cut-off RR value of 20 would be reasonable, and that any station having an average RR value
greater than 4 and a maximum RR value greater than 20 would be rejected. Rating ratios
significantly greater than one could magnify the errors in flood frequency quantile estimates;
on the other hand, rejecting all stations with RR greater than one would reduce the number of
stations below the minimum required for a meaningful RFFA. Adopting the cut-off RR values
above reduced the eligible number of stations from 144 to 131.
[Plot: annual maximum flow (ML/d) vs year (1970-2010) for Station 230210, marking a decrease in flow magnitude]
Figure 4.14 Time series graph showing significant trends after 1995
[Plot: CUSUM statistic Vk vs year (1970-2005) for Station 230210, marking a significant downward shift]
Figure 4.15 CUSUM test plot showing significant trends after 1995
[Plot: histogram of rating ratios (RR) on a logarithmic frequency axis; 90% of rating ratios lie between 1 and 20]
Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record
lengths > 25 years)
Outlier identification results
While the data checking revealed many ‘outliers’ in the flood series, these do not preclude the
use of the remaining flood data in RFFA. The results of the outlier detection procedure for
Victoria are summarised below.
43% of the stations were found to have low outliers. The maximum number of low
outliers detected in a data series was 5 and never exceeded 19% of the total number of
data points in a series.
Most of the detected low outliers occurred for stations which were located in low
rainfall areas, especially in the western part of Victoria.
31% of low outliers occurred in the years 1982 and 1967. This is not surprising as
there were severe droughts during these two years; the maximum annual flows that
occurred in many rivers in these years were merely base flows, and not due to flood
events.
55% of the stations did not show any outliers. Even the values in drought years (1982
and 1967) were not low enough to be treated as low outliers. The locations of most of
these stations are in the south-eastern part of Victoria.
Only 1 station shows a high outlier.
The detected low outliers were treated as censored flows in the flood frequency analysis using
FLIKE (that is, the information that there was no flood in those years was retained rather than discarded).
Final data set from Victoria:
As noted earlier, a total of 415 stations, each with a minimum record length of 10 years, was
initially selected. After in-filling the gaps in the AM flood series, trend analysis, introduction
of a cut-off record length of 25 years and consideration of rating curve errors, only 131 stations
remained, which represents about one-third of the initially selected stations. The distribution of
streamflow record lengths of the selected 131 stations is shown in Figure 4.16. The statistics of
record lengths of these 131 stations are summarised below.
Record lengths range from 25 to 52 years, mean 32 years, median 32 years and standard
deviation 5 years;
87% of the stations have record lengths in the range 25-35 years;
8% of the stations have record lengths in the range 35-45 years; and
5% of the stations have record lengths in the range 50-55 years.
The catchment areas of the selected 131 catchments range from 3 to 997 km2 (mean: 321 km2
and median: 289 km2). The distribution of catchment areas is shown in Figure 4.17. The
statistics of catchment areas of the selected 131 catchments are summarised below:
15 catchments (11%) are in the range of 3 to 50 km2;
11 catchments (8%) are in the range of 51 to 100 km2;
78 catchments (60%) are in the range of 101 to 499 km2; and
27 catchments (21%) are in the range of 500 to 997 km2.
The geographical distribution of the finally selected 131 stations is shown in Figure 4.18.
There is no station in north-western Victoria that passed the selection criteria. This region is
characterized by very low runoff and ephemeral streams.
4.7 Flood frequency analysis
For each of the selected stations, at-site flood frequency analysis was carried out using ARR
FLIKE (Kuczera, 1999) software. The detected low flows were censored using in-built
facility in the FLIKE. A LP3 distribution with the Bayesian fitting method was adopted to
estimate flood quantiles for ARIs of 2, 5, 10, 20, 50 and 100 years. These flood quantiles were
used as dependent/target variables in the RFFA adopted in this thesis.
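To illustrate how an LP3 flood quantile is obtained for a given ARI, the sketch below fits the LP3 distribution by the method of moments on the log10-transformed AM series and uses the Wilson-Hilferty approximation for the frequency factor. This is a simplified stand-in, not the Bayesian procedure implemented in FLIKE: it assumes a complete AM series with no censored low flows and takes AEP = 1/ARI. The function name `lp3_quantile` and the example series are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lp3_quantile(am_flows, ari):
    """Flood quantile for a given ARI from an LP3 fit (hypothetical helper).

    Simplified sketch: method of moments on log10 flows, Wilson-Hilferty
    frequency factor, AEP taken as 1/ARI. Low-flow censoring (as done in
    FLIKE) and Bayesian fitting are NOT implemented here.
    """
    logs = [math.log10(q) for q in am_flows]
    n = len(logs)
    m, s = mean(logs), stdev(logs)
    # Bias-adjusted sample skewness of the log flows (needs n >= 3)
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    # Standard normal variate for non-exceedance probability 1 - 1/ARI
    z = NormalDist().inv_cdf(1.0 - 1.0 / ari)
    if abs(g) < 1e-9:
        k = z  # zero skew: LP3 reduces to the log-normal case
    else:
        # Wilson-Hilferty approximation of the Pearson III frequency factor
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10 ** (m + k * s)


# Hypothetical annual maximum series (m^3/s), for illustration only
am = [120, 85, 300, 45, 210, 160, 95, 400, 70, 130]
for ari in (2, 5, 10, 20, 50, 100):
    print(ari, round(lp3_quantile(am, ari), 1))
```

The method of moments is used here only because it is short and self-contained; the quantiles it produces will generally differ from Bayesian LP3 estimates for the same series.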
[Figure: histogram of frequency (number of stations) against record length (years), in bins from 25-29 to 51-55 years]
Figure 4.16 Distributions of streamflow record lengths of the selected 131 stations from Victoria
[Figure: histogram of frequency (number of catchments) against catchment area (km²), in bins from 0-25 to 901-1000 km²]
Figure 4.17 Distributions of catchment areas of the selected 131 catchments from Victoria
Figure 4.18 Geographical distributions of the selected 131 catchments from Victoria
4.6 Summary of catchment characteristics data
For each of the selected catchments, data for five catchment characteristics were obtained
following the procedures described in section 4.5.2. Figure 4.19 shows the selected
catchments from NSW, ACT, VIC, QLD and TAS. The catchments from NSW and the ACT are
considered together and referred to as NSW in this thesis.
Figure 4.19 Locations of the study catchments
The summary statistics of the catchment characteristics data of the selected catchments are
provided in Table 4.1.
Table 4.1 Summary statistics of the catchment characteristics data
Variable | Range | Median | Mean | Standard deviation
Catchment area (A), km² | 1.3 to 1900 | 255.5 | 329.4 | 277.3
Mean annual areal evapo-transpiration (E), mm/y | 410.1 to 1543.3 | 998.5 | 977.8 | 188.9
Mean annual rainfall (R), mm | 416 to 4348 | 1005.6 | 1185.8 | 603.5
Main stream slope (S), m/km | 0 to 197.7 | 7.7 | 11.3 | 16.8
Design rainfall intensity for 2 years ARI and duration equal to the time of concentration tc (I_tc_2), mm/h | 2.9 to 43.1 | 8.9 | 10.9 | 6.1
Design rainfall intensity for 5 years ARI and duration equal to the time of concentration tc (I_tc_5), mm/h | 3.6 to 54.5 | 11.4 | 13.9 | 8.0
Design rainfall intensity for 10 years ARI and duration equal to the time of concentration tc (I_tc_10), mm/h | 4.0 to 235.8 | 12.9 | 16.3 | 13.7
Design rainfall intensity for 20 years ARI and duration equal to the time of concentration tc (I_tc_20), mm/h | 4.6 to 70.1 | 15.0 | 18.3 | 10.5
Design rainfall intensity for 50 years ARI and duration equal to the time of concentration tc (I_tc_50), mm/h | 5.4 to 757 | 17.7 | 23.4 | 36.7
Design rainfall intensity for 100 years ARI and duration equal to the time of concentration tc (I_tc_100), mm/h | 6.0 to 91 | 20.1 | 24.5 | 14.0
4.7 Summary
A total of 452 catchments have been selected from eastern Australia for this study: 96, 131,
172 and 53 catchments from the states of NSW, VIC, QLD and TAS, respectively. The
locations of these catchments are shown in Figure 4.19. The streamflow data have been
prepared for these catchments, and at-site flood quantiles have been estimated for ARIs of 2,
5, 10, 20, 50 and 100 years by fitting the LP3 distribution with the Bayesian method in the
ARR FLIKE software. For each of the selected catchments, data for five catchment
characteristics have been extracted. These data are applied in the following chapters to
develop and test artificial intelligence based RFFA techniques.
CHAPTER 5
SELECTION OF PREDICTOR VARIABLES FOR
ARTIFICIAL INTELLIGENCE BASED RFFA
MODELS
5.1 General
The focus of this thesis is to develop regional prediction models for design flood estimation
using various artificial intelligence based techniques, namely artificial neural networks (ANN),
adaptive neuro-fuzzy inference system (ANFIS), genetic algorithm (GA) and gene expression
programming (GEP). In Chapter 4, five candidate predictor variables were selected for RFFA.
This chapter focuses on selecting, from these candidates, the final set of predictor variables to
be used in developing the artificial intelligence based RFFA models. The predictor variables
are selected here on the basis of ANN and GEP based RFFA modelling, and it is assumed that
the same set of predictor variables is applicable to the GA and ANFIS based RFFA models.
5.2 Initial selection of predictor variables for artificial intelligence
based RFFA models
The variables adopted by similar previous RFFA studies were first examined (see Table 5.1).
All of these previous studies adopted catchment area and mean annual rainfall as predictor
variables, and hence these were included as candidate predictor variables in this thesis.
Design rainfall intensity and evaporation were adopted by three previous Australian studies,
and hence these were also included. Main stream slope was adopted by all but one study, and
hence it was included as well. Design rainfall intensity depends on both the rainfall duration
and the average recurrence interval (ARI); in this study, six different combinations of
duration and ARI were adopted. Hence, a total of 10 predictor variables were considered, six
of which represent design rainfall intensities of different durations and ARIs. The bi-variate
correlations of these 10 variables are plotted in Figure 5.1, which shows that the six design
rainfall intensities are highly correlated with one another. This indicates that only one design
rainfall intensity should be used in the final prediction equation, since highly correlated
variables add little extra information to the model. At the first stage of model development,
different models were formed from various combinations of the initially selected predictor
variables (A, I_tc_ARI, R, S and E). The candidate models are shown in Table 5.2.
Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies
Authors | Country | Predictor variables adopted
Flavell (2012) | Australia | Catchment area, mean annual rainfall, mainstream slope, main-channel length, and 12 and 24 hours statistical rainfall totals
Griffis and Stedinger (2007) | USA | Catchment area, mean annual rainfall, runoff measured, mainstream slope, main-channel length, forest cover, and storage measured as the percent of catchment area
Haddad and Rahman (2012) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, stream density, mainstream slope, stream length, and forest cover
Muttiah et al. (1997) | USA | Catchment area, mean annual rainfall, and mean basin elevation
Rahman (2005) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual rain days, mean annual Class A pan evaporation, mainstream slope, river bed elevation at the gauging station, maximum elevation difference in the basin, stream density, forest cover, and fraction quaternary sediment area
Shu and Ouarda (2008) | Canada | Catchment area, mean annual rainfall, mainstream slope, fraction of the basin area covered with lakes, and annual mean degree-days
Riad et al. (2004) | Morocco | Catchment area and mean annual rainfall
For the five predictor variables, 31 different models are possible (the 2^5 - 1 non-empty
combinations). However, not all of these models are necessarily useful, since some
combinations of variables would only result in weaker RFFA models. For example, catchment
area has been found to be the most important predictor variable in almost all previous RFFA
studies, and it was adopted by every study listed in Table 5.1. The second most important
predictor variable has been reported to be design rainfall intensity (e.g. Javelle et al., 2002;
Jingyi and Hall, 2004). Hence, the combination of these two predictor variables is likely to
result in a stronger prediction equation than any other pair of variables. In fact, previous
Australian RFFA studies have found that these two predictor variables generate the best RFFA
prediction equation (e.g. Haddad and Rahman, 2012; Haddad et al., 2014).
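The count of 31 possible models, and the eight models of Table 5.2 that always contain catchment area and design rainfall intensity, can be reproduced by enumerating non-empty subsets of the five predictors. The sketch below is illustrative only; the variable labels follow Table 5.2, but the enumeration order differs from the Model IDs used there.

```python
from itertools import combinations

PREDICTORS = ["A", "I_tc_ARI", "R", "S", "E"]

# Every non-empty subset of the five predictors: 2**5 - 1 = 31 models
all_models = [
    combo
    for size in range(1, len(PREDICTORS) + 1)
    for combo in combinations(PREDICTORS, size)
]

# Keep only models containing both catchment area and design rainfall intensity
core = {"A", "I_tc_ARI"}
retained = [m for m in all_models if core.issubset(m)]

print(len(all_models))  # 31
print(len(retained))    # 8
for m in retained:
    print(", ".join(m))
```

Fixing the two core variables leaves 2^3 = 8 ways to add or omit R, S and E, which is why eight candidate models are considered.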
[Figure: matrix plot of pairwise scatter plots for evap, rain, area, slope, I_tc_2, I_tc_5, I_tc_10, I_tc_20, I_tc_50 and I_tc_100]
Figure 5.1 Plot representing bi-variate correlations of the candidate predictor variables
In this study, eight different models are considered, as shown in Table 5.2; each contains
catchment area and design rainfall intensity together with a combination of the other three
predictor variables (R, S and E). This approach assumes that no other combination of these
five variables would deliver a better model than the best of these eight models. Given the
findings of the previous studies discussed above, this assumption seems to be justified.
ANN and GEP based RFFA models were developed for each of the eight combinations of
predictor variables using 362 training (model) catchments. Details of the training of the ANN
and GEP based RFFA models are presented in Chapter 7. The developed models were then
tested using 90 validation (test) catchments. A prediction equation was developed for each of
the 2, 5, 10, 20, 50 and 100 years ARI flood quantiles. The set of predictor variables giving
the best results for the 90 independent test catchments was finally selected.
Table 5.2 Various candidate models and catchment characteristics used
Model ID | Variables
1 | A, I_tc_ARI
2 | A, I_tc_ARI, S
3 | A, I_tc_ARI, E
4 | A, I_tc_ARI, R
5 | A, I_tc_ARI, S, E
6 | A, I_tc_ARI, R, E
7 | A, I_tc_ARI, R, S
8 | A, I_tc_ARI, R, S, E
Description of variables (details in section 5.2): A: catchment area; I_tc_ARI: design rainfall intensity; S: slope; E: evapo-transpiration; R: mean annual rainfall.
The following statistical measures were used to compare the various RFFA models:
Ratio between predicted and observed flood quantiles:
$$\text{Ratio} = \frac{Q_{pred}}{Q_{obs}} \qquad (5.1)$$
Relative error (RE):
$$\text{RE}\,(\%) = \left| \frac{Q_{pred} - Q_{obs}}{Q_{obs}} \right| \times 100 \qquad (5.2)$$
Coefficient of efficiency (CE):
$$\text{CE} = 1 - \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{pred,i} \right)^2}{\sum_{i=1}^{n} \left( Q_{obs,i} - \bar{Q} \right)^2} \qquad (5.3)$$
where Qpred is the flood quantile estimate from the ANN based or GEP based RFFA model,
Qobs is the at-site flood frequency estimate obtained from the LP3 distribution using a
Bayesian parameter fitting procedure (Kuczera, 1999), $\bar{Q}$ is the mean of Qobs and n is the
number of test catchments. The median relative error and median ratio values were used to
measure the relative accuracy of a model. A Qpred/Qobs ratio of 1 indicates a perfect match
between the observed and predicted values, so a median ratio closer to 1 is desirable, as is a
smaller median relative error. A CE value of 1 indicates a perfect fit; values greater than 0.5
are considered acceptable.
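Equations (5.1) to (5.3), together with the median summaries used in the comparisons below, can be computed as in the following sketch. It assumes paired lists of predicted and at-site (observed) quantiles for the test catchments; the function name `evaluate` and the example values are hypothetical.

```python
from statistics import mean, median


def evaluate(q_pred, q_obs):
    """Median Qpred/Qobs ratio, median RE (%) and CE (hypothetical helper)."""
    pairs = list(zip(q_pred, q_obs))
    ratios = [p / o for p, o in pairs]                   # Eq. 5.1
    re_pct = [abs(p - o) / o * 100 for p, o in pairs]    # Eq. 5.2
    q_bar = mean(q_obs)                                  # mean of Qobs
    sse = sum((o - p) ** 2 for p, o in pairs)
    sst = sum((o - q_bar) ** 2 for o in q_obs)
    ce = 1 - sse / sst                                   # Eq. 5.3
    return median(ratios), median(re_pct), ce


# Three hypothetical test catchments (m^3/s)
print(evaluate([110, 180, 300], [100, 200, 300]))
```

Using medians rather than means for the ratio and RE keeps a few badly predicted catchments from dominating the summary, which is consistent with how the comparisons in this chapter are reported.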
5.3 Selection of Predictor variables for ANN based RFFA models
In the first stage, the various ANN based RFFA models were compared on the basis of median
Qpred/Qobs ratio, RE and CE values, which are shown in Table 5.3. For the ANN based
models, the median Qpred/Qobs ratio values range from 0.94 (Models 3 and 8) to 1.69
(Model 6), with the best value of 1.01 (Model 4) and a good but slightly under-predicted value
of 0.99 (Model 1). Models 2 to 8 produce some very good median Qpred/Qobs ratio values,
but for some ARIs they show notable variation; e.g., Model 6 produces a median Qpred/Qobs
ratio of 1.02 for Q50, but 1.24 and 1.57 for Q2 and Q20, respectively. Similarly, the median
Qpred/Qobs ratio values of Model 6 range from 1.03 to 1.69. Model 7 produces an overall
median Qpred/Qobs ratio of 1.17, with 36% over-prediction for Q2 and 4% under-prediction
for Q100. A clear inconsistency can be seen in these models, with overall median Qpred/Qobs
ratio values of 1.10 to 1.27. In the case of Model 2, reasonably good median Qpred/Qobs ratio
values can be seen for all the ARIs except Q10, which is over-estimated by 31%, and the
overall median Qpred/Qobs ratio is 1.11. Model 1, consisting of A and I_tc_ARI, outperforms
the other models, producing an overall median Qpred/Qobs ratio of 1.06, with values ranging
from 0.99 for Q5 to 1.14 for Q50. This model is therefore ranked first on the basis of median
Qpred/Qobs ratio, as it gives consistent and good estimates for all the ARIs.
The median RE values of the ANN based RFFA models for different ARIs range from 30.65%
(Model 2) to 78.77% (Model 6), as shown in Table 5.3. Notably higher values can be seen for
Models 5, 6, 7 and 8, ranging from 40.28% to 78.77%. Models 3 and 4 produce RE values in
the range of 39.35% to 60.08%, but for higher ARIs these two models are unable to maintain
this consistency, especially for Q50 and Q100, with RE values of up to 55% and 60%,
respectively. Models 1 and 2 outperform the other models, with RE values ranging from
30.65% to 50.01% and overall (median) values of 39.74% and 44.07%, respectively. Model 2
shows higher RE values for smaller ARIs, although it produces a good result for the 20 years
ARI. Model 1 dominates Model 2 in terms of consistency and competitive RE values for all
the ARIs, and hence Model 1 is regarded as the top model in terms of RE.
Furthermore, when the models are compared in terms of CE values, Models 1, 2, 3 and 4
outperform the remaining four models. Models 5, 6, 7 and 8 perform poorly for both smaller
and higher ARIs. Models 3 and 4 perform similarly except for Q10, where the CE value is 0.72
for Model 3 compared with 0.56 for Model 4. Overall, Models 1 and 2 perform well, with an
average CE value of 0.66. However, Model 1 exhibits more consistent and better CE values
across different ARIs than the closely performing Model 2, as shown in Table 5.3. On the
basis of the results shown in Table 5.3, Model 1 (two variables) can be ranked as the top
model, followed by Model 2 (three variables).
Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent test catchments
Quantile Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8
CE
Q2 0.73 0.68 0.78 0.76 0.37 0.21 0.40 0.70
Q5 0.61 0.52 0.65 0.65 0.79 0.28 0.71 0.70
Q10 0.63 0.78 0.72 0.56 0.46 0.40 0.39 0.67
Q20 0.71 0.72 0.68 0.68 0.57 0.67 0.27 -0.19
Q50 0.68 0.68 0.59 0.62 -0.33 0.36 0.54 0.14
Q100 0.52 0.57 0.44 0.55 0.37 0.33 0.45 0.37
Average 0.66 0.66 0.64 0.63 0.37 0.38 0.46 0.40
Median 0.64 0.68 0.67 0.64 0.42 0.35 0.43 0.52
Qpred/Qobs (median)
Q2 1.04 1.09 0.94 1.26 1.37 1.19 1.36 1.20
Q5 0.99 1.03 1.02 1.13 1.13 1.09 1.08 1.24
Q10 1.02 1.31 1.31 1.07 1.34 1.41 1.21 1.06
Q20 1.04 1.09 1.06 1.01 1.26 1.07 1.19 0.94
Q50 1.14 1.06 1.41 1.16 1.17 1.69 1.22 1.11
Q100 1.10 1.09 1.14 1.30 1.37 1.03 0.96 1.03
Average 1.06 1.11 1.15 1.16 1.27 1.25 1.17 1.10
Median 1.04 1.09 1.14 1.16 1.27 1.19 1.19 1.10
RE (%) (median)
Q2 37.56 49.93 44.22 46.98 55.75 40.28 61.36 44.05
Q5 40.39 50.01 39.60 44.25 49.56 57.66 38.28 46.78
Q10 44.63 43.98 55.26 39.35 49.87 55.01 55.20 44.68
Q20 35.62 30.65 49.42 40.69 47.48 51.66 46.90 52.95
Q50 39.09 44.00 55.01 41.10 69.61 78.77 46.66 66.80
Q100 44.53 44.13 51.18 60.08 55.75 53.11 53.72 49.20
Median 39.74 44.07 50.30 42.68 52.81 54.06 50.31 47.99
Average 40.30 43.78 49.12 45.41 54.67 56.08 50.35 50.74
In the second stage, the ANN based RFFA models are compared on the basis of median
Qpred/Qobs ratio values. A set of criteria (Table 5.4) is developed to rate the catchments as
‘good’, ‘reasonable’, ‘bad’ and ‘very bad’ for each ARI, and the resulting groupings are shown
in Table 5.5. In this stage, the two top ranked models from the first stage (i.e. Models 1 and 2)
are compared.
From Table 5.5, Model 1 outperforms Model 2 in terms of the ‘good’ grouping except for Q10
and Q20, where the differences are small. In addition, Model 1 has a higher number of stations
in the ‘reasonable’ grouping and fewer stations in the ‘bad’ and ‘very bad’ groupings. Thus,
Model 1 outperforms Model 2 when catchments are rated on the basis of median Qpred/Qobs
ratio. Tables 5.6 and 5.7 show the comparison between the two best performing models,
Models 1 and 2. As shown in Table 5.6, Model 1 provides median Qpred/Qobs ratio values
closer to 1 than Model 2, except for Q50. Similarly, as shown in Table 5.7, Model 1 shows
much smaller median RE values for Q2, Q5 and Q50, similar median RE values for Q10 and
Q100, and a higher median RE value for Q20. These results demonstrate that, overall, Model 1
outperforms Model 2 for the ANN based RFFA models.
Table 5.4 Rating on the basis of median Qpred/Qobs ratio
Group Ratios (median)
Very bad Less than 0.25 or above 4
Bad 0.26-0.49 or 2-4
Reasonable 0.5-0.69 or 1.41-2
Good 0.7-1.4
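The rating criteria of Table 5.4 can be applied to individual Qpred/Qobs ratios as in the sketch below. Table 5.4 leaves small gaps between the tabulated bands (e.g. between 0.25 and 0.26, or between 1.40 and 1.41); the way the inequalities below assign such boundary values is an assumption, as is the function name `rate_ratio`. The example ratios are hypothetical.

```python
from collections import Counter


def rate_ratio(ratio):
    """Rate a Qpred/Qobs ratio using the bands of Table 5.4 (hypothetical helper).

    Values in the small gaps between tabulated bands are assigned by the
    inequalities below; this boundary handling is an assumption.
    """
    if ratio < 0.25 or ratio > 4:
        return "very bad"
    if ratio < 0.5 or ratio > 2:
        return "bad"
    if ratio < 0.7 or ratio > 1.4:
        return "reasonable"
    return "good"


# Hypothetical Qpred/Qobs ratios for a handful of test catchments
ratios = [1.05, 0.62, 1.8, 3.1, 0.12, 0.95]
print(Counter(rate_ratio(r) for r in ratios))
```

Counting the ratings per quantile in this way gives station groupings of the form shown in Table 5.5.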
Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of
Table 5.4 (ANN based RFFA models)
Model 1 Model 2
No. of stations No. of stations
Quantile Very bad Bad Reasonable Good Very bad Bad Reasonable Good
Q2 6 18 27 39 7 25 27 31
Q5 6 20 24 40 10 24 23 33
Q10 5 21 31 33 5 19 30 36
Q20 5 20 24 41 9 20 15 46
Q50 14 19 19 38 11 21 20 38
Q100 8 23 27 32 11 23 23 33
Overall (%) 9.7 26.8 33.6 49.3 11.7 29.2 30.5 48.0
Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio
value using 90 independent test catchments (ANN based RFFA models)
Quantiles Median Qpred/Qobs ratio
Model 1 Model 2
Q2 1.04 1.09
Q5 0.99 1.03
Q10 1.02 1.31
Q20 1.04 1.09
Q50 1.14 1.06
Q100 1.10 1.09
Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error
(RE) values using 90 independent test catchments (ANN based RFFA models)
Quantiles RE (median) (%)
Model 1 Model 2
Q2 37.56 49.93
Q5 40.39 50.01
Q10 44.63 43.98
Q20 35.62 30.65
Q50 39.09 44.01
Q100 44.53 44.13
5.4 Selection of predictor variables based on GEP models
In the first stage, the various GEP based RFFA models are compared on the basis of median
Qpred/Qobs ratio, RE and CE values, which are shown in Table 5.8. The median Qpred/Qobs
ratio values range from 0.06 (Model 5) to 2.07 (Model 8), with the best value of 1.02 (Models
1 and 7). Other models produce some very good median Qpred/Qobs ratio values, but for some
ARIs they show notable variation; e.g., Model 4 produces a median Qpred/Qobs ratio of 0.99
for Q5, but 1.42 and 1.49 for Q10 and Q20, respectively. Similarly, the median Qpred/Qobs
ratio values of Model 8 range from 0.02 to 1.50. Model 3 produces an overall median
Qpred/Qobs ratio of 0.97, with 57% over-prediction for Q50 and 89% under-prediction for
Q100. A clear inconsistency can be found in these models, with overall median Qpred/Qobs
ratio values of 1.10 to 1.27. In the case of Models 2 and 3, reasonably good values can be seen
for all the ARIs except Q20 for Model 2, which is over-estimated by 54%, and Q100, for which
performance is poor (e.g. a median Qpred/Qobs ratio of 0.22 for Model 2). Model 1,
consisting of variables A and I_tc_ARI, outperforms the other models, producing an overall
median Qpred/Qobs ratio of 1.06, with values ranging from 1.02 (Q20 and Q100) to 1.10 (Q5).
Hence, Model 1 can be ranked first on the basis of median Qpred/Qobs ratio.
Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent test catchments
Quantile Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8
CE
Q2 0.49 0.63 0.59 0.65 0.59 0.62 0.73 0.55
Q5 0.67 0.64 0.67 0.68 0.27 0.56 0.69 0.46
Q10 0.56 0.39 -9.59 -5.10 -14.09 0.25 0.46 -0.77
Q20 0.67 -2.86 0.52 0.56 0.62 0.63 0.61 0.58
Q50 0.63 0.33 -0.87 0.54 0.49 -17.74 0.10 0.10
Q100 0.67 -0.01 -0.28 -0.29 -0.44 -0.20 -0.20 -0.27
Average 0.61 -0.15 -1.49 -0.49 -2.09 -2.65 0.40 0.11
Median 0.65 0.36 0.12 0.55 0.38 0.41 0.53 0.28
Qpred/Qobs (median)
Q2 1.07 1.30 0.94 1.13 0.93 1.07 0.98 1.21
Q5 1.10 0.98 0.99 0.99 1.24 1.05 1.02 1.50
Q10 1.04 1.13 1.08 1.42 0.94 1.29 1.35 1.18
Q20 1.02 1.54 1.13 1.49 1.26 1.20 1.32 1.33
Q50 1.05 1.25 1.57 1.18 1.20 -0.62 1.97 2.07
Q100 1.02 0.22 0.11 0.26 -0.06 0.27 0.10 0.02
Average 1.02 1.07 0.97 1.08 0.92 0.71 1.12 1.22
Median 1.03 1.19 1.03 1.16 1.07 1.06 1.17 1.27
RE (%) (median)
Q2 45.87 50.28 81.35 70.28 76.48 66.28 43.78 70.09
Q5 44.95 56.16 57.29 46.01 85.03 46.63 49.06 58.08
Q10 42.08 64.72 43.42 55.57 56.01 91.40 56.11 45.78
Q20 41.53 93.67 46.93 51.31 43.72 43.09 47.08 51.56
Q50 37.87 61.25 70.60 50.44 61.30 218.36 96.53 107.00
Q100 44.47 82.18 88.77 78.55 107.73 76.78 90.15 98.19
Median 43.27 62.98 63.94 53.44 68.89 71.53 52.59 64.09
Average 42.97 68.04 64.73 58.69 71.71 90.42 63.79 71.78
The median RE values of the various GEP based RFFA models for different ARIs range from
37.87% (Model 1) to 218.36% (Model 6), as can be seen in Table 5.8. Notably higher RE
values can be seen for Models 5, 6 and 8, ranging from 43% to 218%. Models 4 and 7 produce
RE values in the range of 43% to 96%. Despite comparatively high RE values, these two
models are fairly consistent; however, for higher ARIs they are unable to maintain this
consistency, especially for Q50 and Q100, with RE values of 96% and 88%. Models 1, 2 and 3
outperform the other models, with RE values ranging from 37% to 88% and overall values of
43% to 63%. Model 2 shows higher RE values for medium to higher ARIs, but it produces good
results for Q2 and Q5. Overall, Model 1 dominates Model 2 in terms of consistency and
competitive RE values for all the ARIs, and hence Model 1 is ranked first.
Furthermore, when the different models are compared with respect to CE values, Models 1, 2
and 7 outperform the remaining five models, which perform poorly. Models 2 and 7 perform
well for small to medium ARIs; however, they perform poorly for higher ARIs. Overall,
Model 1 performs well, with an average CE value of 0.61. Moreover, Model 1 exhibits more
consistent and better CE values across different quantiles than the closely performing
Models 2 and 7, as shown in Table 5.8.
Hence, for the GEP based RFFA models, Model 1 with two predictor variables (A, I_tc_ARI)
outperforms the other models with respect to median Qpred/Qobs ratio, RE and CE values, as
demonstrated in Table 5.8.
At the second stage, the GEP based RFFA models are ranked on the basis of median
Qpred/Qobs ratio values. As for the ANN based RFFA models, the catchments are rated as
‘good’, ‘reasonable’, ‘bad’ and ‘very bad’ using the criteria of Table 5.4, and the resulting
groupings are shown in Table 5.9. In this stage, the two top ranked models selected in the first
stage (Models 1 and 2) are compared. From Table 5.9, Model 1 outperforms Model 2 in terms
of the ‘good’ grouping except for smaller ARIs. Also, Model 1 has a higher number of stations
in the ‘reasonable’ grouping and fewer stations in the ‘bad’ and ‘very bad’ groupings. Hence,
Model 1 outperforms Model 2 when catchments are rated on the basis of median Qpred/Qobs
ratio.
Tables 5.10 and 5.11 show the comparison between the two best performing models, Models
1 and 2. As shown in Table 5.10, Model 1 provides median Qpred/Qobs ratio values closer to 1
than Model 2. Similarly, as shown in Table 5.11, Model 1 shows much smaller median RE
values for all the ARIs.
On the basis of the results shown in Table 5.8, Model 1 (A, I_tc_ARI) can be ranked as the top
model, followed by Model 2 (A, I_tc_ARI, S), for the GEP based RFFA models.
Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the
criteria of Table 5.4 for GEP based RFFA models
Model 1 Model 2
No. of stations No. of stations
Quantile Very bad Bad Reasonable Good Very bad Bad Reasonable Good
Q2 20 24 17 29 7 28 23 32
Q5 14 21 23 32 15 25 18 32
Q10 13 21 24 32 15 25 21 29
Q20 24 32 19 15 29 21 18 16
Q50 18 23 21 27 13 23 23 31
Q100 17 21 28 24 31 26 14 19
Overall (%) 23.5 31.4 29.2 35.2 24.3 32.7 25.9 35.2
Table 5.10 Comparison of Models 1 and 2 on the basis of median Qpred/Qobs ratio values
using 90 independent test catchments (for GEP based RFFA models)
Quantiles Median Qpred/Qobs ratio
Model 1 Model 2
Q2 1.07 1.30
Q5 1.10 0.98
Q10 1.04 1.13
Q20 1.02 1.54
Q50 1.05 1.25
Q100 1.02 0.22
Table 5.11 Comparison of Models 1 and 2 on the basis of RE values using 90
independent test catchments (for GEP based RFFA models)
Quantiles RE (median) (%)
Model 1 Model 2
Q2 45.87 50.28
Q5 44.95 56.16
Q10 42.08 64.72
Q20 41.53 93.67
Q50 37.87 61.25
Q100 44.47 82.18
5.5 Summary
This chapter has examined various combinations of predictor variables to select the best set to
be adopted in the RFFA modelling. Two artificial intelligence based modelling techniques
(ANN and GEP) are used to develop the prediction equations using data from 362 training
catchments, and independent testing is performed using 90 test catchments. Models are
assessed on the basis of the ratio between predicted and observed flood quantiles, the percent
relative error and the coefficient of efficiency. The independent testing shows that the ANN
and GEP based RFFA models with only two predictor variables (catchment area and design
rainfall intensity) outperform the models with a greater number of predictor variables. Such a
model is also easier to apply in practice, as data for these two predictor variables can be
obtained relatively easily from published maps and government websites. In the subsequent
analyses presented in the following chapters, these two predictor variables (catchment area
and design rainfall intensity) are used.
CHAPTER 6
SELECTION OF REGIONS
6.1 General
In regional flood frequency analysis (RFFA), one of the key steps is to identify acceptable
(optimum) regions, each consisting of a set of gauged catchments that may be treated as
homogeneous. The previous chapters covered the selection of the study area, the catchment
data and the predictor variables used in the RFFA presented in this study. This chapter
focuses on the formation and comparison of regions based on state, geographic and climatic
boundaries, as well as on catchment attributes. These regions are tested by developing RFFA
models using the artificial neural network (ANN) technique, and the best performing region is
then selected as the optimum region based on a comparison of the alternative regions. This
optimum region is then used to develop RFFA models using all the artificial intelligence based
RFFA methods considered in this thesis.
6.2 Description of candidate regions
To identify the optimum regions for RFFA modelling in eastern Australia, a number of
candidate regions are formed, as discussed below.
Regions based on state and geographic boundaries
Initially, each of the states of Victoria (VIC), New South Wales (NSW), Queensland (QLD)
and Tasmania (TAS) is treated as a separate region. The data for each of these regions are
discussed in detail in section 4.6. These states cover the eastern part of Australia (Figure 4.1).
These candidate regions are shown in Table 6.1.
Regions based on climatic boundaries
Northern Australia is dominated by summer rainfall, whereas southern Australia is mainly
dominated by winter rainfall. In this step, the data set is divided into two subsets, i.e. a
summer dominated rainfall region (SDRR) and a winter dominated rainfall region (WDRR).
Combined data set
Here, the data for all four states are combined to form one region. Details of all the
candidate regions based on state, geographic and climatic boundaries are shown in Table 6.1.
Table 6.1 Description of candidate regions
Region label Description of region No. of stations Abbreviated region name
1 New South Wales 96 NSW
2 Victoria 131 VIC
3 Queensland 172 QLD
4 Tasmania 53 TAS
5 Combined Data Set 452 Combined
6 Summer Dominated Rainfall Region 203 SDRR
7 Winter Dominated Rainfall Region 249 WDRR
6.2.1 Selection of the best performing region based on state, geographic and
climatic boundaries
In each of these candidate regions, the available data set is divided into two parts: (i) 80% for
training (training data set); and (ii) 20% for testing/validation (validation data set). These sets
are selected randomly from the respective grouping. For each grouping, an ANN based RFFA
model is built and used to predict the 2, 5, 10, 20, 50 and 100 years ARI flood quantiles for
the selected 20% test catchments. The structure, algorithm and other criteria of the ANN based
analyses are kept uniform throughout and are explained in Chapter 3.
Three statistical measures, i.e. the Qpred/Qobs ratio, relative error (RE) and coefficient of
efficiency (CE) (as described in section 5.2), are used to assess model performance.
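The random 80/20 division described above might be implemented as in the sketch below. The fixed seed, the rounding rule and the function name `split_catchments` are assumptions made for illustration, and the station identifiers are hypothetical. For 452 stations this rounding yields the 362/90 sizes used in Chapter 5.

```python
import random


def split_catchments(station_ids, train_fraction=0.8, seed=42):
    """Randomly split station IDs into training and test (validation) sets.

    Hypothetical helper; seed and rounding rule are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = rng.sample(station_ids, len(station_ids))
    n_train = round(train_fraction * len(station_ids))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical station identifiers for the combined region
stations = [f"ST{i:03d}" for i in range(452)]
train, test = split_catchments(stations)
print(len(train), len(test))
```

Applying the same function separately within each candidate region keeps the 80/20 proportion constant while letting the region sizes of Table 6.1 differ.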
Table 6.2 summarises the median Qpred/Qobs ratio values for the seven candidate regions. For
the NSW candidate region, the median Qpred/Qobs ratio for Q10 is very small (0.17),
indicating significant under-estimation. Also, for this region, Q50 shows remarkable
over-estimation, with a median Qpred/Qobs ratio of 1.82. For the VIC candidate region, all the
median Qpred/Qobs ratios seem to be reasonable, ranging from 0.86 to 1.49. For the QLD
region, both Q50 and Q100 show excellent median Qpred/Qobs ratios close to 1.00, and the
median Qpred/Qobs ratios range from 0.98 to 1.48, which appears reasonable. For the TAS
region, Q50 shows notable over-estimation, with a median Qpred/Qobs ratio of 2.46.
For SDRR and WDRR, the results are better than for the individual states, except for Q50 in
the WDRR, which shows notable over-estimation with a median Qpred/Qobs ratio of 2.02. It
seems that, as the region size increases, the median Qpred/Qobs ratio values become more
consistent across different ARIs. When all the data sets are combined, the median Qpred/Qobs
ratio values show remarkable improvement, ranging from 0.99 to 1.14, which appears
satisfactory. The differences in median Qpred/Qobs ratio values across ARIs are smaller for the
combined data set than for the other regions, as illustrated in Figure 6.1.
Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions
Quantiles Candidate regions based on state, geographic and climatic boundaries
NSW VIC QLD TAS SDRR WDRR Combined
Q2 1.38 1.06 1.28 1.08 1.14 1.25 1.04
Q5 0.84 1.13 1.48 1.56 1.21 1.06 0.99
Q10 0.17 0.86 1.11 1.65 1.38 1.26 1.02
Q20 1.53 1.49 1.11 0.74 0.84 1.28 1.04
Q50 1.82 1.17 0.98 2.46 1.32 2.02 1.14
Q100 1.22 1.24 1.00 1.05 1.21 1.33 1.10
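One simple way to quantify the consistency across ARIs discussed above is the spread (maximum minus minimum) of the median Qpred/Qobs ratios of each region in Table 6.2. This spread is an illustrative measure only and is not one of the statistics used in this thesis; the values below are copied from Table 6.2.

```python
# Median Qpred/Qobs ratios for Q2, Q5, Q10, Q20, Q50, Q100 (from Table 6.2)
table_6_2 = {
    "NSW": [1.38, 0.84, 0.17, 1.53, 1.82, 1.22],
    "VIC": [1.06, 1.13, 0.86, 1.49, 1.17, 1.24],
    "QLD": [1.28, 1.48, 1.11, 1.11, 0.98, 1.00],
    "TAS": [1.08, 1.56, 1.65, 0.74, 2.46, 1.05],
    "SDRR": [1.14, 1.21, 1.38, 0.84, 1.32, 1.21],
    "WDRR": [1.25, 1.06, 1.26, 1.28, 2.02, 1.33],
    "Combined": [1.04, 0.99, 1.02, 1.04, 1.14, 1.10],
}

# Spread of median ratios across the six ARIs; a smaller spread is more consistent
spread = {region: round(max(v) - min(v), 2) for region, v in table_6_2.items()}
for region, s in sorted(spread.items(), key=lambda kv: kv[1]):
    print(f"{region:9s} {s:.2f}")
```

On this measure the combined data set has by far the smallest spread, in line with the interpretation of Figure 6.1.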
In terms of the median absolute relative error values (Table 6.3), Q10 and Q50 show very high
values for NSW, 91% and 82%, respectively. The best results for NSW are found for Q20 and
Q100, with median relative error values close to 50%. For the VIC region, the median relative
error values for Q50 and Q100 are in the range of 66% to 78%, which appears quite high. For
the QLD region, median relative error values range from 37% to 58%, which is consistent
across the ARIs and the best result among the individual states. For the TAS region, Q50 has a
very high median relative error (146%), while the results for the other ARIs are quite
reasonable. The sharp increase in median relative error from Q20 to Q50, followed by a sharp
decrease from Q50 to Q100, is unexpected. This suggests that, for a very small data set (the
TAS region has only 53 stations), the ANN based RFFA model provides inconsistent results
across ARIs.
For SDRR and WDRR, the median relative error values are in the range of 29% to 57% and
43% to 102%, respectively. All of these values are within a reasonable range except for Q50 for
the WDRR region. When all the data are combined, the median relative error values are
consistent across all the ARIs (in the range of about 36% to 45%). The differences in median
relative error values across ARIs are smaller for the combined data set than for the other
regions, as illustrated in Figure 6.2. These results show that the combined data set gives the
smallest overall median relative error (40.30%) among the seven candidate regions, which is
consistent with the median Qpred/Qobs ratio values discussed above.
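The two summary statistics used in Tables 6.2 and 6.3 can be computed as in the minimal Python sketch below. It assumes that the relative error of a catchment is |Qpred - Qobs|/Qobs × 100 (the thesis's own definition is given in Section 5.4.1), and the function names are illustrative only. The "Overall" rows in Tables 6.3 and 7.1 to 7.3 appear to be the arithmetic mean of the six per-ARI medians (for example, the Combined column of Table 6.3 averages to 40.30%), which is what `overall` computes.

```python
from statistics import mean, median


def median_ratio(q_pred, q_obs):
    """Median of Qpred/Qobs over the test catchments (1.00 means no median bias)."""
    return median(p / o for p, o in zip(q_pred, q_obs))


def median_abs_relative_error(q_pred, q_obs):
    """Median of |Qpred - Qobs| / Qobs over the test catchments, in percent."""
    return median(abs(p - o) / o * 100.0 for p, o in zip(q_pred, q_obs))


def overall(per_ari_values):
    """'Overall' row: arithmetic mean of the six per-ARI median values."""
    return mean(per_ari_values)


# Hypothetical Q20 values (m3/s) for three test catchments
q_obs = [100.0, 200.0, 50.0]
q_pred = [120.0, 150.0, 50.0]
print(median_ratio(q_pred, q_obs))               # ratios 1.2, 0.75, 1.0 -> median 1.0
print(median_abs_relative_error(q_pred, q_obs))  # errors 20%, 25%, 0% -> median 20%

# Combined column of Table 6.3 (Q2 ... Q100)
print(round(overall([37.56, 40.39, 44.63, 35.62, 39.09, 44.53]), 2))
```

Using the median rather than the mean keeps a few badly predicted catchments from dominating either statistic.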
Table 6.3 Median relative error values (%) for seven ANN-based candidate regions
Quantile Candidate regions based on state, geographic and climatic boundaries
NSW VIC QLD TAS SDRR WDRR Combined
Q2 48.21 78.05 42.42 65.77 52.40 48.50 37.56
Q5 51.94 40.89 50.24 55.52 29.87 53.03 40.39
Q10 91.52 39.75 37.67 64.61 52.79 43.88 44.63
Q20 53.17 55.58 37.67 38.19 43.12 52.75 35.62
Q50 82.08 73.75 57.90 146.47 57.66 102.13 39.09
Q100 50.00 66.88 58.45 15.28 54.85 67.72 44.53
Overall 62.82 59.15 47.39 64.31 48.45 61.34 40.30
Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions
Figure 6.2 Median relative error (%) values for different ARIs for selected regions
6.3 Regions based on catchment characteristics data
To identify regions/groups of catchments in the catchment characteristics data space, two
methods are adopted in this thesis: cluster analysis and principal component analysis, both of
which are described in Chapter 3. Five catchment characteristics variables (catchment area,
design rainfall intensity, mean annual evapotranspiration, mean annual rainfall and main
stream slope) are used in both analyses.
6.3.1 Cluster analysis
Hierarchical cluster analysis
Hierarchical clustering is one of the most straightforward clustering methods. In this study,
hierarchical clustering is applied using the combined Wards-Block method described in
Chapter 3.
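The agglomerative idea behind hierarchical clustering can be illustrated with the naive Python sketch below, which uses Ward's merging criterion on Euclidean distances: at each step it merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares, n_a·n_b/(n_a + n_b)·||c_a - c_b||². This is an assumption-laden illustration, not the thesis implementation; how the "Block" part of the Wards-Block combination enters is described in Chapter 3 and is not reproduced here, and `ward_clusters` is a hypothetical name. The brute-force loop is only suitable for small illustrative data sets.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (illustration only).

    X: (n_catchments, n_variables) array of standardised catchment characteristics.
    Returns an integer cluster label for every catchment.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # start with one catchment per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                n_a, n_b = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares caused by merging a and b
                cost = n_a * n_b / (n_a + n_b) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # b > a, so index a is unaffected by the delete
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Two well-separated hypothetical groups of catchments
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(ward_clusters(X, n_clusters=2))
```

Recording the merge costs at each step, rather than only the final labels, would give the heights needed to draw a dendrogram such as Figure 6.3.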
K-means clustering
In this method, all variables are given equal weights. The best results obtained from the
cluster analyses are summarised in Table 6.4. Each method delivers two groupings: A1 (405
stations) and A2 (45 stations) from the Wards-Block clustering, with 2 stations left outside
any cluster, and B1 (362 stations) and B2 (90 stations) from the K-Means clustering.
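Giving all variables equal weight amounts to clustering their z-scores. The Python sketch below standardises the variables and runs Lloyd's K-means iteration (assign each catchment to the nearest centre, then move each centre to the mean of its members). The deterministic "maximin" initialisation, the iteration limit and the function names are assumptions for illustration; the thesis does not state how its initial cluster centres were chosen.

```python
import numpy as np


def standardise(X):
    """Convert each variable to z-scores so all variables carry equal weight."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmeans(Z, k, n_iter=100):
    """Lloyd's K-means on standardised data; returns (labels, centres)."""
    Z = np.asarray(Z, dtype=float)
    # Maximin initialisation (assumed): start at the first catchment, then repeatedly
    # add the catchment farthest from all centres chosen so far.
    centres = [Z[0]]
    while len(centres) < k:
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(Z[d2.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: mean of each cluster (keep the old centre if a cluster empties)
        new_centres = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]  # hypothetical catchments
labels, centres = kmeans(standardise(X), k=2)
print(labels)
```

Unlike the hierarchical approach, K-means must be told the number of groups in advance (two here, matching groupings B1 and B2).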
Table 6.4 Regions/groups formation by cluster analysis
Method                            Total no. of stations   Grouping 1   Grouping 2   Out-of-cluster stations
Wards-Block cluster combination   452                     405 (A1)     45 (A2)      2
K-Means cluster                   452                     362 (B1)     90 (B2)      0
Figure 6.3 Dendrogram using average linkage between groups
Figure 6.3 (a) Section of Dendrogram using average linkage between groups
Figure 6.3 (b) Section of Dendrogram using average linkage between groups
In terms of median Qpred/Qobs ratio values for individual ARIs, grouping A1 outperforms the
other groupings (A2, B1 and B2) except for Q20, where A2 performs better than A1 (Tables 6.5
and 6.6). When the overall median Qpred/Qobs ratio values are compared, A1, B1 and B2
perform similarly (1.1 to 1.2), whereas A2 performs quite poorly (1.9). In terms of median
relative error, grouping A1 produces consistent and reasonable results. For grouping A2, the
median relative errors for Q2, Q50 and Q100 are very high (132%, 164% and 191%,
respectively); similarly high values are seen for Q50 in grouping B1 and for Q5 and Q10 in
grouping B2 (Tables 6.5 and 6.6). Overall, grouping A1 shows the best results among the
cluster groupings. However, when groupings A1 and A2 (generated by the Wards-Block
method) are considered together against groupings B1 and B2 (generated by the K-means
method), B1 and B2 perform better. This indicates that the K-means method has generated
better groupings than the Wards-Block method.
Table 6.5 ANN-based RFFA model performances for cluster groupings A1 (405 stations) and A2 (45 stations)
Quantile  A1: median Qpred/Qobs  A1: median RE (%)  A2: median Qpred/Qobs  A2: median RE (%)
Q2 1.0 44.6 2.3 132.4
Q5 1.2 45.4 1.5 48.7
Q10 1.1 44.4 1.1 41.6
Q20 1.4 56.0 1.1 41.4
Q50 1.3 54.5 2.6 164.6
Q100 1.3 47.5 2.9 191.3
Overall 1.2 48.7 1.9 103.3
Table 6.6 ANN-based RFFA model performances for cluster groupings B1 (362 stations) and B2 (90 stations)
Quantile  B1: median Qpred/Qobs  B1: median RE (%)  B2: median Qpred/Qobs  B2: median RE (%)
Q2 0.9 52.6 1.3 55.8
Q5 1.1 57.9 1.0 71.0
Q10 0.9 38.6 1.7 75.0
Q20 0.8 39.1 0.7 41.5
Q50 1.3 61.5 1.1 14.6
Q100 1.4 46.7 1.1 56.1
Overall 1.1 49.4 1.2 52.3
6.3.2 Principal component analysis
At the second stage, principal component analysis (PCA) is undertaken. The eigenvalues and
the percentage of variance explained by each of the five principal components are listed in
Table 6.7. The first two components have eigenvalues greater than 1 and together account for
about 60% of the total variance. Component 3 has an eigenvalue (0.957) only slightly below 1;
however, since the first two components (PC1 and PC2) already explain the majority of the
variation in the data, PC1 and PC2 may be deemed adequate to capture the bulk of the
information. The scree plot is shown in Figure 6.4, and PC1 is plotted against PC2 in Figures
6.5 and 6.6. In Figure 6.5, two groups are formed based on PC1: Group C1 with PC1 ≥ 0 and
Group C2 with PC1 < 0. In Figure 6.6, two groups are similarly formed based on PC2: Group D1
with PC2 ≥ 0 and Group D2 with PC2 < 0. Table 6.8 presents the component matrix (the
loadings of the standardised variables on PC1 and PC2), and Table 6.9 gives descriptive
statistics of the standardised variables used in this study.
Table 6.7 Eigenvalues and variance explained by the principal components
Component  Initial eigenvalue  % of variance  Cumulative %
1 1.758 35.160 35.160
2 1.236 24.718 59.878
3 0.957 19.149 79.027
4 0.774 15.481 94.508
5 0.275 5.492 100.000
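Because the five variables are standardised, PCA here is an eigen-analysis of their correlation matrix, and the eigenvalues sum to 5 (as in Table 6.7). The Python sketch below shows that calculation, the percentage-of-variance columns, and the "eigenvalue greater than 1" retention rule (commonly called the Kaiser criterion) applied to the Table 6.7 eigenvalues. The function names are hypothetical, and scaling eigenvectors by the square root of their eigenvalues to obtain a "component matrix" is the usual convention in statistical packages, assumed here rather than confirmed by the thesis.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix; returns eigenvalues (descending) and loadings."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings = eigenvector * sqrt(eigenvalue); clip guards against tiny negative round-off
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, loadings


def variance_table(eigvals):
    """Percentage of total variance and cumulative percentage for each component."""
    eigvals = np.asarray(eigvals, dtype=float)
    pct = 100.0 * eigvals / eigvals.sum()
    return pct, np.cumsum(pct)


def kaiser_retain(eigvals):
    """Number of components whose eigenvalue exceeds 1."""
    return int(np.sum(np.asarray(eigvals, dtype=float) > 1.0))


table_6_7 = [1.758, 1.236, 0.957, 0.774, 0.275]
pct, cum = variance_table(table_6_7)
print(np.round(pct, 2), np.round(cum, 2))
print(kaiser_retain(table_6_7))  # PC1 and PC2 are retained
```

Small differences from the table in the second decimal place arise because the published eigenvalues are rounded to three decimals.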
Table 6.8 Component matrix in principal component analysis
Variable  PC1  PC2
Zevap -0.042 0.451
ZI_12_2 0.899 -0.209
Zrain 0.906 -0.156
Zarea -0.253 -0.708
Zslope 0.249 0.68
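The C/D groupings split the catchments by the sign of their score on one component. The Python sketch below uses the Table 6.8 loadings: multiplying a row of z-scores by the loading matrix gives, for an exact PCA, a positive multiple of each component score, so the sign (and hence the group) is unchanged. Because the published loadings are rounded, catchments with scores very close to zero could fall on a different side than in Figures 6.5 and 6.6. Assigning a score of exactly 0 to the first group (C1 or D1) is an assumption, and the two catchments are hypothetical.

```python
import numpy as np

# Component matrix from Table 6.8: rows in table order, columns PC1 and PC2
VARIABLES = ["Zevap", "ZI_12_2", "Zrain", "Zarea", "Zslope"]
LOADINGS = np.array([
    [-0.042, 0.451],
    [0.899, -0.209],
    [0.906, -0.156],
    [-0.253, -0.708],
    [0.249, 0.680],
])


def pc_scores(Z, loadings=LOADINGS):
    """Z @ loadings: proportional to the component scores, so their signs are preserved."""
    return np.asarray(Z, dtype=float) @ loadings


def split_by_sign(scores, component):
    """Label 1 (C1 or D1) if the score is >= 0, otherwise 2 (C2 or D2)."""
    return np.where(scores[:, component] >= 0.0, 1, 2)


# Two hypothetical catchments, already standardised, in the order of VARIABLES
Z = np.array([[0.0, 1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, -1.0]])
s = pc_scores(Z)
print(split_by_sign(s, 0))  # grouping based on PC1 (C1/C2)
print(split_by_sign(s, 1))  # grouping based on PC2 (D1/D2)
```

PC1 is dominated by design rainfall intensity and mean annual rainfall, and PC2 mainly by catchment area and slope, so the two splits partition the catchments along quite different physical directions.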
Figure 6.4 Scree plot from principal component analysis
Table 6.9 Descriptive statistics of standardised variables
Variable  Mean  Standard deviation  No. of data points
Zevap  0.0141  1.019  360
ZI_12_2  -0.0413  0.976  360
Zrain -0.025 0.995 360
Zarea 0.017 1.036 360
Zslope 0.025 1.085 360
Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1
Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2
For each of the accepted candidate groupings, the available data set is divided into 80% for
training and 20% for testing, and the same assessment criteria as described in Section 5.4.1
are used.
Table 6.10 shows the results of the performance assessment of the PCA-based groupings.
With respect to median Qpred/Qobs ratio values, grouping D1 outperforms the other PCA-based
groupings, and with respect to median relative error values, grouping D1 is also the best
performer (overall median relative error of 49.1%). Overall, the groupings based on PC2
(D1 and D2) outperform those based on PC1 (C1 and C2).
When the best cluster-analysis grouping (B1 and B2) is compared with the best PCA-based
grouping (D1 and D2) in terms of relative error, they perform quite similarly, with slightly
better performance for the cluster-analysis groupings B1 and B2. Hence, it can be concluded
that K-means cluster analysis generates the best-performing groups/regions in the catchment
characteristics data space.
In the last step, the better-performing groupings from the cluster analysis and PCA are
compared with the candidate regions based on state/geographic boundaries (Section 6.2.1).
Tables 6.11 and 6.12 summarise the results for the different candidate regions.
Table 6.10 Grouping based on principal component analysis
Groupings C1 and C2 are based on PC1; groupings D1 and D2 are based on PC2.
Quantile  C1: median Qpred/Qobs  C1: median RE (%)  C2: median Qpred/Qobs  C2: median RE (%)  D1: median Qpred/Qobs  D1: median RE (%)  D2: median Qpred/Qobs  D2: median RE (%)
Q2 1.3 48.1 1.4 55.1 1.5 52.3 1.8 80.7
Q5 1.4 64.0 1.2 62.5 1.4 48.4 1.0 47.8
Q10 1.4 44.8 0.9 51.6 1.1 48.7 1.2 35.4
Q20 1.3 59.7 1.4 54.4 1.2 41.1 1.5 45.9
Q50 1.2 58.3 1.2 53.0 1.1 50.3 1.4 60.7
Q100 0.5 91.5 1.2 44.1 1.5 53.5 0.9 68.5
Overall 1.2 61.1 1.2 53.5 1.3 49.1 1.3 56.5
In terms of median Qpred/Qobs ratio values, both the cluster-analysis and the PCA-based
groupings outperform the groupings based on individual states, as shown in Table 6.11.
Grouping A1 performs better than grouping D1 except for Q20 and Q50 (the two are equal for
Q10), and in terms of consistency and the overall median Qpred/Qobs ratio value, grouping A1
also performs well. Finally, grouping A1 is compared with the combined data set: A1 performs
better for Q2 and Q10, but the combined data set outperforms A1 for the other ARIs. The
combined data set also shows better overall consistency and a better overall median
Qpred/Qobs ratio value. Hence, on the basis of median Qpred/Qobs ratio values, it can be
concluded that the combined data set performs better than all the other candidate regions.
Table 6.12 shows the median relative error values for groupings A1 and D1, the individual
states and the combined data set. All the regions based on state boundaries perform poorly
except for QLD, which shows better results for small to medium ARIs; however, QLD performs
relatively poorly for the larger ARIs (Q50 and Q100). When A1 is compared with D1, both
groupings perform approximately similarly overall: A1 performs better for smaller ARIs, while
D1 performs better for larger ARIs, but overall the grouping based on cluster analysis
outperforms the grouping based on PCA. Moreover, when the median relative error values of
grouping A1 and the combined data set are compared, the latter performs better except for Q2,
as shown in Figure 6.8.
Hence, on the basis of both median Qpred/Qobs ratio and median relative error values, it can be
concluded that the combined data set performs better than all the other candidate regions and
can be used for final model development.
Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions
Quantiles  Grouping A1 (cluster analysis)  Grouping D1 (PCA)  NSW  VIC  QLD  TAS  Combined
Q2 1.0 1.5 1.4 1.1 1.3 1.1 1.4
Q5 1.2 1.4 0.8 1.1 1.5 1.6 1.1
Q10 1.1 1.1 0.2 0.9 1.1 1.6 1.2
Q20 1.4 1.2 1.5 1.5 1.1 0.7 1.1
Q50 1.3 1.1 1.8 1.2 1.0 2.5 1.1
Q100 1.3 1.5 1.2 1.2 1.0 1.0 1.1
Overall 1.2 1.3 1.2 1.2 1.2 1.4 1.1
Table 6.12 Median relative error (%) values for seven candidate regions
Quantiles  Grouping A1 (cluster analysis)  Grouping D1 (PCA)  NSW  VIC  QLD  TAS  Combined
Q2 44.6 52.3 48.2 78.1 42.4 65.8 56.2
Q5 45.4 48.4 51.9 40.9 50.2 55.5 41.4
Q10 44.4 48.7 91.5 39.8 37.7 64.6 39.1
Q20 56.0 41.1 53.2 55.6 37.7 38.2 37.2
Q50 54.5 50.3 82.1 73.7 57.9 146.5 40.0
Q100 47.5 53.5 50.0 66.9 58.4 15.3 39.6
Overall 48.7 49.1 62.8 59.2 47.4 64.3 42.3
Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions
Figure 6.8 Median relative error (%) values for different ARIs for candidate regions
Figure 6.9 Comparison of median relative error (%) values between the combined data set and
the grouping based on K-Means cluster analysis
6.4 Summary
This chapter has focused on the application of artificial neural network (ANN) based regional
flood frequency analysis (RFFA) in eastern Australia, with a particular focus on the formation
of regions. Regions/groupings are first formed on the basis of state/geographic and climatic
boundaries. In the second step, regions are formed in the catchment characteristics data space
using cluster analysis and principal component analysis. K-Means cluster analysis is found to
generate the best-performing groups/regions in the catchment characteristics data space. When
compared with the geographic regions, some state-based groupings perform worse than the
K-Means cluster groupings. Overall, the best ANN-based RFFA model is achieved when the data
of all 452 catchments are combined, which gives an RFFA model with median relative errors of
about 36% to 45% across the six ARIs. Since the combined set of stations forms the
best-performing region, it is used in the subsequent chapters for building the other artificial
intelligence based RFFA models.
CHAPTER 7
DEVELOPMENT OF ARTIFICIAL
INTELLIGENCE BASED RFFA MODELS
7.1 General
The previous two chapters presented the selection of predictor variables and the optimum
region for the development of artificial intelligence based RFFA models for eastern Australia.
This chapter presents the development of RFFA models based on the selected predictor
variables and optimum region using four artificial intelligence based methods: artificial neural
networks (ANN), genetic algorithm based artificial neural networks (GAANN), gene-expression
programming (GEP) and co-active neuro-fuzzy inference system (CANFIS). These methods are
described in Chapter 3.
The model development presented in this chapter involves training each model on a randomly
selected part of the data set. For this purpose, 80% (362 catchments) of the total 452
catchments are used to train the model (training data set), and the remaining 20% (90
catchments) are used to validate the model (validation data set). In the traditional hydrological
modelling sense, training/calibration of a model involves identifying a set of model parameters
that allows a satisfactory transformation of the selected model input(s) into model output(s).
For hydrological models, calibration is generally carried out by a ‘trial and error’ method.
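The random 80/20 division can be reproduced with the short Python sketch below. The fixed seed and the function name are assumptions added only to make the illustration repeatable; the thesis does not report how its random split was generated.

```python
import random


def split_catchments(catchment_ids, train_fraction=0.8, seed=42):
    """Randomly split catchment IDs into training and validation lists."""
    ids = list(catchment_ids)
    rng = random.Random(seed)  # fixed seed only so the split is reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train_ids, valid_ids = split_catchments(range(452))
print(len(train_ids), len(valid_ids))  # 362 90
```

Because 80% of 452 is 361.6, rounding gives the 362/90 division used in this chapter.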
In this study, the artificial intelligence based models, which are essentially black-box models,
are trained/calibrated on the training data set by minimising the mean squared error between
the observed flood quantiles and those predicted by the model for a given ARI. The models are
then evaluated using four criteria: median Qpred/Qobs ratio, plot of Qobs against Qpred, median
relative error (RE) and coefficient of efficiency (CE). This is done first for the training data
set and then repeated for the validation data set. The models are ranked by their relative
performance against these criteria to identify the best trained/calibrated model.
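Of the four criteria, CE is the only one not illustrated earlier. The Python sketch below assumes CE takes the common Nash-Sutcliffe form, one minus the ratio of the squared prediction errors to the squared deviations of the observations from their mean; the exact definition used in the thesis is given in Chapter 5. The function name and data are hypothetical.

```python
def coefficient_of_efficiency(q_obs, q_pred):
    """Nash-Sutcliffe form: 1 - SSE / (sum of squared deviations of Qobs from its mean)."""
    mean_obs = sum(q_obs) / len(q_obs)
    sse = sum((o - p) ** 2 for o, p in zip(q_obs, q_pred))
    sst = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - sse / sst


q_obs = [1.0, 2.0, 3.0]                                       # hypothetical quantiles
print(coefficient_of_efficiency(q_obs, [1.0, 2.0, 3.0]))      # perfect model -> 1.0
print(coefficient_of_efficiency(q_obs, [2.0, 2.0, 2.0]))      # predicts the mean -> 0.0
```

Under this definition a model that predicts worse than the mean of the observations gets CE < 0, which is how a negative validation CE (such as -0.09 for CANFIS at 2 years ARI) should be read.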
7.2 Training of artificial intelligence based RFFA models
First, each of the four artificial intelligence based RFFA models is trained using MATLAB
codes (developed as part of this research) by minimising the mean squared error between the
observed and predicted flood quantiles for each of six ARIs (2, 5, 10, 20, 50 and 100 years).
This is done using the training data set of 362 catchments, as mentioned in Section 7.1. Table
7.1 and Figure 7.1 show the CE values for the ANN, GAANN, GEP and CANFIS based RFFA
models. Among these four models, GAANN has the highest CE values for ARIs of 2, 5, 10 and
20 years, while ANN has the highest CE values for ARIs of 50 and 100 years. Considering all
six ARIs, GAANN has the highest overall CE value (0.71), and the three other models have
similar CE values in the range of 0.66 to 0.67.
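The training principle (adjust the weights to minimise the mean squared error for one ARI at a time) can be illustrated with the small Python/NumPy network below, trained by full-batch gradient descent with hand-coded backpropagation. The thesis used its own MATLAB codes; the single hidden layer of five tanh units, the learning rate, the number of epochs, the weight initialisation and the name `train_mlp` are all assumptions for illustration. Real predictors and flood quantiles may also need scaling before training, which is omitted here.

```python
import numpy as np


def train_mlp(X, y, n_hidden=5, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer tanh network trained by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 0.5, (p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - y               # prediction errors, shape (n,)
        history.append(float(np.mean(err ** 2)))
        g = 2.0 * err / n                   # d(MSE)/d(prediction)
        gW2, gb2 = H.T @ g, g.sum()
        gA = np.outer(g, W2) * (1.0 - H ** 2)  # back-propagate through tanh
        gW1, gb1 = X.T @ gA, gA.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(X_new):
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return predict, history


# Hypothetical standardised predictors and a target for one ARI
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 3))
y = X[:, 0] - 0.5 * X[:, 1]
predict, history = train_mlp(X, y)
print(history[0], history[-1])  # training MSE before and after training
```

GAANN differs mainly in how the weights are searched (a genetic algorithm instead of gradient steps), while the MSE objective stays the same.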
Table 7.1 CE values of four artificial intelligence based RFFA models based on training data set
ARI (years) ANN GAANN GEP CANFIS
2 0.59 0.76 0.69 0.64
5 0.73 0.79 0.72 0.67
10 0.64 0.76 0.73 0.75
20 0.71 0.76 0.65 0.73
50 0.70 0.57 0.61 0.53
100 0.64 0.63 0.57 0.62
Overall 0.67 0.71 0.66 0.66
Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training
data set
Table 7.2 and Figure 7.2 show the median Qpred/Qobs ratio values for the four artificial
intelligence based RFFA models. The ANN based RFFA model shows the best performance
(i.e. the Qpred/Qobs ratio value closest to 1.00) for ARIs of 20 and 100 years; for 50 years ARI,
CANFIS (1.04) is slightly closer to 1.00 than ANN (1.08). Considering all six ARIs, ANN
outperforms the other three models with an overall Qpred/Qobs ratio value of 1.09. The
second-best performance is given by GEP (1.19), while GAANN and CANFIS perform similarly
(1.21). In terms of consistency over the ARIs, GAANN, GEP and CANFIS show very high
Qpred/Qobs ratio values for some ARIs (e.g. 1.52 for GAANN and 1.45 for GEP at 50 years ARI,
and 1.76 for CANFIS at 2 years ARI), as can be seen in Table 7.2. Here again, ANN shows the
best consistency over the ARIs.
Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA
models based on training data set
ARI (years) ANN GAANN GEP CANFIS
2 1.03 1.22 0.99 1.76
5 1.12 1.20 1.08 0.99
10 1.06 1.02 1.08 0.87
20 1.10 1.11 1.17 1.26
50 1.08 1.52 1.45 1.04
100 1.15 1.18 1.39 1.36
Overall 1.09 1.21 1.19 1.21
Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA
models based on training data set
Table 7.3 and Figure 7.3 show the median absolute relative error values for the ANN, GAANN,
GEP and CANFIS based RFFA models. The ANN based RFFA model outperforms the other
models with an overall median RE value of 42.07% over the six ARIs. For ARIs of 2, 5, 20 and
100 years, the GAANN based RFFA model performs slightly better than the ANN based model;
however, for 50 years ARI it shows a very high RE (60%). In terms of consistency over the ARIs,
ANN outperforms the other three models. Both GEP and CANFIS have quite high RE values
(GEP = 54.02%, CANFIS = 59.46%); notably, CANFIS shows very high RE values for 2 years ARI
(94.02%) and 50 years ARI (71.94%). Overall, in terms of RE value, ANN is the best performer,
followed by GAANN, GEP and CANFIS.
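The ranking by RE and the consistency argument can be made explicit with the Python sketch below, which uses the per-ARI values from Table 7.3. Ranking by the mean of the per-ARI medians reproduces the "Overall" row, while measuring consistency as the spread (largest minus smallest per-ARI value) is an illustrative assumption rather than a criterion defined in the thesis.

```python
from statistics import mean

# Median RE (%) for ARIs of 2, 5, 10, 20, 50 and 100 years (training data set, Table 7.3)
MEDIAN_RE_TRAINING = {
    "ANN": [43.75, 39.53, 39.14, 40.38, 43.32, 46.30],
    "GAANN": [40.92, 39.31, 41.01, 40.29, 60.00, 45.28],
    "GEP": [73.3, 43.91, 43.25, 54.61, 54.22, 54.82],
    "CANFIS": [94.02, 43.55, 45.27, 46.07, 71.94, 55.89],
}


def rank_models(table):
    """Rank models by the mean of their per-ARI median RE (lower is better)."""
    overall = {model: mean(values) for model, values in table.items()}
    return sorted(overall, key=overall.get), overall


def spread(table):
    """Largest minus smallest per-ARI value: a simple (assumed) consistency measure."""
    return {model: max(values) - min(values) for model, values in table.items()}


ranking, overall = rank_models(MEDIAN_RE_TRAINING)
print(ranking)
print({model: round(value, 2) for model, value in overall.items()})
print({model: round(value, 2) for model, value in spread(MEDIAN_RE_TRAINING).items()})
```

ANN comes first on both measures, which matches the conclusion drawn from Table 7.3.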
Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models
(training)
ARI (years) ANN GAANN GEP CANFIS
2 43.75 40.92 73.3 94.02
5 39.53 39.31 43.91 43.55
10 39.14 41.01 43.25 45.27
20 40.38 40.29 54.61 46.07
50 43.32 60.00 54.22 71.94
100 46.30 45.28 54.82 55.89
Overall 42.07 44.47 54.02 59.46
Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models
based on training data set
The predicted and observed flood quantiles for the ANN based RFFA model for 20 years ARI
are shown in Figure 7.4 (the plots for the other five ARIs can be seen in Appendix B, Figures
B.1 to B.5). The 20 years ARI is adopted here because it is the most frequently applied ARI in
design. These plots generally show good agreement between the predicted and observed flood
quantiles; however, the ANN based RFFA model shows some overestimation when the
observed flood quantiles are smaller than about 50 m³/s, for all ARIs except 50 years. Most of
the training catchments plot within a narrow band around the 45-degree line, except for a few
outliers, in particular at higher discharges. Overall, the ANN based RFFA model shows better
training results for higher discharges.
Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q20 (training data set)
Figure 7.5 plots the flood quantiles predicted by the GAANN based RFFA model against the
observed flood quantiles for 20 years ARI (the plots for the other five ARIs can be seen in
Appendix B, Figures B.6 to B.10). These plots show that the GAANN based RFFA model
generally gives good agreement between the observed and predicted flood quantiles; however,
for ARI of 50 years (Figure B.9) (and to some degree for ARI of 5 years), there is notable
overestimation by the GAANN based RFFA model. The plot for 100 years ARI (Figure B.10)
also shows notable scatter around the 45-degree line, in particular for small and medium
discharges. Overall, the GAANN based RFFA model shows better training results for higher
discharges.
Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q20 (training data set)
Figure 7.6 compares the flood quantiles predicted by the GEP based RFFA model with the
observed flood quantiles for 20 years ARI (Q20) (the plots for the other five ARIs can be seen
in Appendix B, Figures B.11 to B.15). Figure 7.6 generally shows good agreement between the
predicted and observed flood quantiles. For the 2 and 5 years ARIs (Figures B.11 and B.12,
respectively), there are a few outliers, and for the 50 and 100 years ARIs (Figures B.14 and
B.15, respectively), there is noticeable overestimation by the GEP based RFFA model for small
to medium discharges. Overall, the GEP based RFFA model shows better training results for
higher discharges.
Figure 7.7 plots the flood quantiles predicted by the CANFIS based RFFA model against the
observed flood quantiles for 20 years ARI (the plots for the other ARIs can be seen in
Appendix B, Figures B.16 to B.20). Figure 7.7 shows overestimation by the CANFIS based
RFFA model for smaller discharges at 20 years ARI. A very similar pattern can be seen for
ARIs of 5 years (Figure B.17) and 100 years (Figure B.20). For ARIs of 2 years (Figure B.16)
and 10 years (Figure B.18), a number of outliers can be seen, together with noticeable scatter
around the 45-degree line. For 50 years ARI (Figure B.19), the scatter around the 45-degree
line is significant. Overall, the CANFIS based RFFA model shows better training results for
higher discharges for all ARIs except 50 years.
Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q20 (training data set)
Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q20 (training data set)
7.3 Comparison of training and validation results
7.3.1 ANN
The CE, median Qpred/Qobs ratio and median relative error values for the training and
validation data sets of the ANN based RFFA model are compared in Table 7.4, and Figures 7.8,
7.9 and 7.10 plot the CE, median Qpred/Qobs ratio and median relative error values,
respectively. In terms of CE value, the best agreement between the training and validation data
sets is found for ARIs of 10, 20 and 50 years, a reasonable degree of agreement for ARIs of 2
and 5 years, and relatively poor agreement for the ARI of 100 years, where the CE value for the
validation data set is remarkably small (0.40 compared with 0.64 for training). With respect to
median Qpred/Qobs ratio value, the best agreement is found for 2 years ARI, moderate
agreement for 10, 20, 50 and 100 years ARIs, and poor agreement for 5 years ARI; nevertheless,
for 5 years ARI the validation data set gives a very good Qpred/Qobs ratio value (0.99). For the
median relative error values, the best agreement is found for ARIs of 5 and 100 years,
moderate agreement for ARI of 50 years, and poor agreement for ARIs of 2 and 10 years. These
results show that the ANN based RFFA model exhibits different degrees of agreement between
the training and validation data sets for different ARIs across the three criteria adopted here.
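One way to put numbers on the qualitative "agreement" judgements above is the absolute difference between the training and validation values of a criterion at each ARI. The Python sketch below applies this to the CE values of Table 7.4; the gap measure and the function name are illustrative assumptions, not criteria used in the thesis.

```python
ARIS = [2, 5, 10, 20, 50, 100]
CE_TRAINING = [0.59, 0.73, 0.64, 0.71, 0.70, 0.64]    # Table 7.4, training
CE_VALIDATION = [0.69, 0.59, 0.63, 0.69, 0.68, 0.40]  # Table 7.4, validation


def agreement_gaps(train, valid, aris=ARIS):
    """Absolute training-validation difference for each ARI (smaller = better agreement)."""
    return {ari: abs(t - v) for ari, t, v in zip(aris, train, valid)}


gaps = agreement_gaps(CE_TRAINING, CE_VALIDATION)
print({ari: round(gap, 2) for ari, gap in gaps.items()})
print(min(gaps, key=gaps.get), max(gaps, key=gaps.get))  # best- and worst-agreeing ARI
```

The smallest gaps occur at 10, 20 and 50 years and the largest at 100 years, in line with the CE comparison above; the same function can be applied to the ratio and RE columns.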
Table 7.4 Comparison of training and validation results for the ANN based RFFA model
ARI (years)  Training: CE  Training: median Qpred/Qobs  Training: median RE (%)  Validation: CE  Validation: median Qpred/Qobs  Validation: median RE (%)
2 0.59 1.03 43.75 0.69 1.04 37.56
5 0.73 1.12 39.53 0.59 0.99 40.39
10 0.64 1.06 39.14 0.63 1.02 44.63
20 0.71 1.10 40.38 0.69 1.04 35.62
50 0.70 1.08 43.32 0.68 1.14 39.09
100 0.64 1.15 46.30 0.40 1.10 44.53
Overall 0.67 1.09 42.07 0.61 1.06 40.30
Figures 7.11 to 7.13 show example plots generated during the training of the ANN based
RFFA model. Figure 7.11 shows the regression plot of the ANN based RFFA model for the
training and validation data sets for Q20 (the plots for the other ARIs can be seen in
Appendix B, Figures B.41 to B.45). Figure 7.12 shows the training state of the ANN based
RFFA model for Q20 using 20,000 epochs, and Figure 7.13 shows the plot of the validation
results for Q20.
Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the
ANN based RFFA model
Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation
data sets for the ANN based RFFA model
Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data
sets for the ANN based RFFA model
Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA
model for Q20
Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20
[Plot: NN output vs actual values for the test catchments; legend: NN output, Actual]
Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data
set
7.3.2 GAANN
In Table 7.5, the CE, median Qpred/Qobs ratio and median relative error values are compared
for the training and validation datasets for the GAANN based RFFA model. Figures 7.14,
7.15 and 7.16 compare the CE, median Qpred/Qobs ratio and median relative error values,
respectively for the GAANN based RFFA model. In terms of CE value, the best agreement
between the training and validation data sets is found for ARIs of 2, 5, 20 and 100 years, a
moderate degree of agreement is found for ARI of 10 years and a relatively poor agreement is
found for the ARI of 50 years (for this, the CE value is 0.38, which is remarkably low). With
respect to median Qpred/Qobs ratio value, the best agreement between the training and
validation data sets is found for ARIs of 10 and 100 years, a moderate agreement is noticed
for ARIs of 2 and 20 years and a poor agreement is found for ARIs of 5 and 50 years. For 50
years ARI, the validation data set shows a good Qpred/Qobs ratio value (0.95) as compared with
a very high value (1.52) for the training data set. This shows that a poor performance during
the training does not always give a poor performance in the validation. In relation to the
median relative error values, the best agreement between the training and validation data sets
is found for ARIs of 50 and 100 years, a moderate agreement for ARI of 20 years and a very
poor agreement for ARIs of 2, 5 and 10 years. In particular, the relative error values for the
validation data set are remarkably high compared with the training data set for ARIs of 2, 5
and 10 years. This shows that a good performance during model training does not guarantee a
similar good performance during validation.
Table 7.5 Comparison of training and validation results for the GAANN based RFFA model
ARI (years)  Training: CE  Training: median Qpred/Qobs  Training: median RE (%)  Validation: CE  Validation: median Qpred/Qobs  Validation: median RE (%)
2 0.76 1.22 40.92 0.72 1.08 65.13
5 0.79 1.20 39.31 0.75 0.89 61.48
10 0.76 1.02 41.01 0.63 0.98 72.56
20 0.76 1.11 40.29 0.71 0.93 48.19
50 0.57 1.52 60.00 0.38 0.95 55.93
100 0.63 1.18 45.28 0.65 1.17 47.08
Overall 0.71 1.21 44.47 0.64 1.00 58.40
Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the
GAANN based RFFA model
Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the GAANN based RFFA model
Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data
sets for the GAANN based RFFA model
7.3.3 GEP
The CE, median Qpred/Qobs ratio and median relative error values for the GEP based RFFA
model are compared in Table 7.6 for the training and validation datasets. Figures 7.17, 7.18
and 7.19 compare the CE, median Qpred/Qobs ratio and median relative error values,
respectively for the GEP based RFFA model. In terms of CE value, the best agreement
between the training and validation data sets is found for ARIs of 5, 20 and 50 years, a
moderate degree of agreement is found for ARIs of 10 and 100 years and a relatively poor
agreement is found for the ARI of 2 years. The CE value for the validation data set for ARI of
2 years is quite low (0.49). With respect to median Qpred/Qobs ratio value, the best agreement
between the training and validation data sets is found for ARIs of 2, 5 and 10 years and a
moderate agreement is noticed for ARIs of 20, 50 and 100 years. In relation to median relative
error values, the best agreement between the training and validation data sets is found for
ARIs of 2, 5 and 10 years, a moderate agreement for ARIs of 20 and 100 years and a poor
agreement for ARI of 50 years. It should be noted that for 2 years ARI, both the training and
validation data sets exhibit very high relative error values (73.30% and 69.38%, respectively).
Table 7.6 Comparison of training and validation results for the GEP based RFFA model
ARI (years)  Training: CE  Training: median Qpred/Qobs  Training: median RE (%)  Validation: CE  Validation: median Qpred/Qobs  Validation: median RE (%)
2 0.69 0.99 73.30 0.49 1.07 69.38
5 0.72 1.08 43.91 0.67 1.10 44.95
10 0.73 1.08 43.25 0.56 1.04 42.08
20 0.65 1.17 54.61 0.67 0.89 47.61
50 0.61 1.45 54.22 0.63 1.05 37.87
100 0.57 1.39 54.82 0.67 1.02 44.47
Overall 0.66 1.19 54.02 0.61 1.03 47.73
Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the
GEP based RFFA model
Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the GEP based RFFA model
Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data
sets for the GEP based RFFA model
7.3.4 CANFIS
The CE, median Qpred/Qobs ratio and median relative error values for the training and
validation data sets of the CANFIS based RFFA model are compared in Table 7.8. Figures 7.20,
7.21 and 7.22 compare the CE, median Qpred/Qobs ratio and median relative error values,
respectively, for the CANFIS based RFFA model. In terms of CE value, the best agreement
between the training and validation data sets is found for ARIs of 20, 50 and 100 years, a
reasonable degree of agreement is found for ARIs of 5 and 10 years and a significant
disagreement is found for the ARI of 2 years, where the CE value for the validation data set
(-0.09) is much smaller than that for the training data set (0.64). With respect to the median
Qpred/Qobs ratio value, the performance for both the training and validation data sets is found
to be in the acceptable range for all the ARIs except 2 years, for which the ratio is relatively
high for both the training data set (1.76) and the validation data set (2.81). Similarly, in
relation to the median relative error value for the training data set, the performance for ARIs
of 5 and 10 years is found to be the best and that for 2 years ARI the worst. For the validation
data set, the best performance is found for 20 years ARI, followed by 100 years ARI, while the
2 years ARI shows a very high median relative error (180.77%). These results show that the
CANFIS based RFFA model is poorly trained/calibrated for 2 years ARI.
Table 7.8 Comparison of training and validation results for the CANFIS based RFFA model
              Training                        Validation
ARI (years)   CE     Median      Median       CE     Median      Median
                     Qpred/Qobs  RE (%)              Qpred/Qobs  RE (%)
2             0.64   1.76        94.02        -0.09  2.81        180.77
5             0.67   0.99        43.55        0.54   0.95        48.92
10            0.75   0.87        45.27        0.67   0.79        51.97
20            0.73   1.26        46.07        0.72   1.18        34.48
50            0.53   1.04        71.94        0.55   0.93        59.20
100           0.62   1.36        55.89        0.59   1.31        42.63
Overall       0.66   1.21        59.46        0.50   1.33        69.66
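The training/validation agreement in Table 7.8 is judged qualitatively in the text. One simple way to make such a comparison explicit, assumed here rather than taken from the thesis, is the absolute difference between the training and validation values of a statistic at each ARI. The sketch below applies it to the CE values of Table 7.8; the names are hypothetical.

```python
from typing import Dict

# CE values transcribed from Table 7.8 (CANFIS based RFFA model), keyed by ARI (years)
CE_TRAINING: Dict[int, float] = {2: 0.64, 5: 0.67, 10: 0.75, 20: 0.73, 50: 0.53, 100: 0.62}
CE_VALIDATION: Dict[int, float] = {2: -0.09, 5: 0.54, 10: 0.67, 20: 0.72, 50: 0.55, 100: 0.59}


def agreement_gaps(training: Dict[int, float], validation: Dict[int, float]) -> Dict[int, float]:
    """Absolute training-validation difference of a statistic for each ARI."""
    return {ari: abs(training[ari] - validation[ari]) for ari in training}


gaps = agreement_gaps(CE_TRAINING, CE_VALIDATION)
# ARIs ordered from closest to poorest training/validation agreement
ordered = sorted(gaps, key=gaps.get)
for ari in ordered:
    print(f"ARI {ari:>3} years: |CE_train - CE_valid| = {gaps[ari]:.2f}")
```

Sorting by this gap places the ARIs of 20, 50 and 100 years first and the 2 years ARI last (|0.64 - (-0.09)| = 0.73), in line with the grouping described in the text.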
Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the
CANFIS based RFFA model
Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and
validation data sets for the CANFIS based RFFA model
Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data
sets for the CANFIS based RFFA model
7.4 Selection of the best performing artificial intelligence based RFFA model based on training
The training of the four artificial intelligence based RFFA models using 362 catchments has
been presented in Section 7.2. None of the four models performs best in all the adopted
assessment criteria over the six ARIs, which makes it difficult to select the best
trained/calibrated model. The performances of the four models are therefore assessed in a
heuristic manner, in which each model is ranked against each of the four criteria shown in
Table 7.9. Four ranks are used, with a relative score ranging from 4 to 1: a model ranked 1 for
a criterion scores 4, and ranks of 2, 3 and 4 score 3, 2 and 1, respectively.
Table 7.9 shows that the ANN based RFFA model has the highest score of 15, followed by
the GAANN with a score of 12. The GEP receives a score of 10, while the CANFIS receives
only 7, making it the least favourable model in terms of its performance during training. The
ANN based model is placed at rank 1 in 3 of the 4 criteria. Hence, the ANN based RFFA
model is selected as the best performing artificial intelligence based model in terms of
training/calibration.
Table 7.10 shows the ranking of the four artificial intelligence based RFFA models based on
the agreement between the training and validation results using three criteria. The same four
ranks and scores (4 to 1) are used as described above. The ANN and GEP based RFFA models
both score 9, followed by the GAANN (7) and CANFIS (5).
Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to training
Criterion                       Rank 1   Rank 2   Rank 3         Rank 4
Scatter plot of Qobs vs Qpred   ANN      GAANN    CANFIS         GEP
Median Qpred/Qobs               ANN      GEP      GAANN/CANFIS   (none)
Median RE                       ANN      GAANN    GEP            CANFIS
Median CE                       GAANN    ANN      GEP/CANFIS     (none)
Overall Score: ANN-15, GAANN-12, GEP-10, CANFIS-7
Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to
agreement between training and validation
Median Qpred/Qobs
  Rank 1: GEP (best agreement: Q2, Q5, Q10, Q20; moderate agreement: Q50, Q100; poor agreement: none)
  Rank 2: ANN (best agreement: Q2, Q10, Q100; moderate agreement: Q20, Q50; poor agreement: Q5)
  Rank 3: CANFIS (best agreement: Q5, Q10, Q20, Q50, Q100; moderate agreement: none; very poor agreement: Q2)
  Rank 4: GAANN (best agreement: Q2, Q10, Q20, Q100; moderate agreement: Q5; very poor agreement: Q50)
Median RE (%)
  Rank 1: GEP (best agreement: Q2, Q5, Q10, Q20, Q100; moderate agreement: Q50; poor agreement: none)
  Rank 2: ANN (best agreement: Q5, Q100; moderate agreement: Q20, Q50; poor agreement: Q2, Q10)
  Rank 3: GAANN (best agreement: Q50, Q100; moderate agreement: Q20; very poor agreement: Q2, Q5, Q10)
  Rank 4: CANFIS (best agreement: Q5, Q10, Q20, Q50, Q100; moderate agreement: none; significantly poor agreement: Q2)
Median CE
  Rank 1: GAANN (best agreement: Q2, Q5, Q20, Q100; moderate agreement: Q10; poor agreement: Q50)
  Rank 2: ANN (best agreement: Q10, Q20, Q50; moderate agreement: Q2, Q5; poor agreement: Q100)
  Rank 3: CANFIS (best agreement: Q10, Q20, Q50, Q100; moderate agreement: Q5; poor agreement: Q2)
  Rank 4: GEP (best agreement: Q5, Q20, Q50; moderate agreement: Q10, Q100; poor agreement: Q2)
Overall Score: ANN-9, GEP-9, GAANN-7, CANFIS-5
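The scoring rule behind Tables 7.9 and 7.10 (rank 1 scores 4, down to rank 4 scoring 1, with the scores summed over the criteria) can be written as a short function. The sketch below uses the rank orders of Table 7.10. Representing a tie as a group of models sharing one rank, with an empty group for the unused rank, is an assumption about how entries such as GAANN/CANFIS in Table 7.9 are scored; the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Rank groups transcribed from Table 7.10 (index 0 = Rank 1). A group holding
# several models represents a tie; an empty group represents an unused rank.
TABLE_7_10: Dict[str, List[List[str]]] = {
    "Median Qpred/Qobs": [["GEP"], ["ANN"], ["CANFIS"], ["GAANN"]],
    "Median RE (%)": [["GEP"], ["ANN"], ["GAANN"], ["CANFIS"]],
    "Median CE": [["GAANN"], ["ANN"], ["CANFIS"], ["GEP"]],
}


def total_scores(rankings: Dict[str, List[List[str]]], n_ranks: int = 4) -> Dict[str, int]:
    """Rank 1 earns n_ranks points, rank 2 earns n_ranks - 1, ..., rank n_ranks earns 1."""
    scores: Dict[str, int] = defaultdict(int)
    for groups in rankings.values():
        for rank, group in enumerate(groups, start=1):
            for model in group:
                scores[model] += n_ranks + 1 - rank
    return dict(scores)


print(total_scores(TABLE_7_10))
```

Summing the scores in this way reproduces the overall scores of Table 7.10 (ANN 9, GEP 9, GAANN 7, CANFIS 5).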
Overall, the ANN based RFFA model shows the best training/calibration performance and the
CANFIS based model the least favourable.
7.5 Summary
In this chapter, four artificial intelligence based RFFA models (ANN, GAANN, GEP and
CANFIS) are developed. Some 80% (362 catchments) of the total 452 catchments are used to
train the models (training data set) and the remaining 20% (90 catchments) are used to validate
them (validation data set). The selected artificial intelligence based models are essentially
black box models. They are trained/calibrated using the training data set by minimising the
mean squared error between the observed flood quantiles and those predicted by the model
being trained, for a given ARI. The artificial intelligence based RFFA models are evaluated
based on four criteria: median Qpred/Qobs ratio, scatter plot of Qobs versus Qpred, median
relative error (RE) and coefficient of efficiency (CE). This is initially done for the training
data set and then repeated for the validation data set. Models are ranked based on their
relative performances in relation to these criteria to identify the best trained/calibrated model.
It has been found that no single model performs best for all six ARIs across all the adopted
criteria. Overall, the ANN based RFFA model outperforms the three other models in terms of
training/calibration; hence, the ANN based RFFA model is the best calibrated model.
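The 80/20 division of catchments and the mean squared error objective described in the summary above can be sketched as follows. How the 90 validation catchments were selected is not stated in this chapter, so the random shuffle, the seed and the catchment identifiers below are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def split_catchments(ids: Sequence[str], train_fraction: float = 0.8,
                     seed: int = 2014) -> Tuple[List[str], List[str]]:
    """Randomly split catchment IDs into training and validation sets (assumed procedure)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def mean_squared_error(q_obs: Sequence[float], q_pred: Sequence[float]) -> float:
    """Objective minimised during training for a given ARI."""
    return sum((o - p) ** 2 for o, p in zip(q_obs, q_pred)) / len(q_obs)


# Hypothetical identifiers standing in for the 452 study catchments
catchments = [f"C{i:03d}" for i in range(452)]
training, validation = split_catchments(catchments)
print(len(training), len(validation))
```

With 452 catchments, round(0.8 × 452) = 362 training catchments, leaving 90 for validation.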
CHAPTER 8
VALIDATION OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS
8.1 General
Chapter 6 has discussed the formation of regions and selection of the best performing region
for RFFA in eastern Australia using artificial intelligence based methods. Based on the
available data of 452 natural catchments in NSW, VIC, QLD and TAS, it has been found that
the best results in RFFA can be obtained when data from these states are combined to form
one region. Chapter 5 has discussed the selection of the best set of predictor variables for the
RFFA model development. It has been found that two predictor variables i.e., catchment area
(A) and design rainfall intensity (Itc_ARI) deliver the best results in RFFA for eastern Australia.
Chapter 7 has developed/trained the RFFA models based on four artificial intelligence based
methods, namely ANN, GAANN, GEP and CANFIS, using data from 362 catchments. This
chapter presents the validation of these four RFFA models based on 90 independent test
catchments. The results of these four models are also compared with those of the QRT based
RFFA model. This chapter initially presents results in relation to each of the above four artificial
intelligence based models followed by an inter-comparison of these methods. Finally, the best
performing artificial intelligence based RFFA model is compared with the QRT based RFFA
model.
8.2 Validation of RFFA models
8.2.1 ANN
Figure 8.1 compares the predicted flood quantiles for the selected 90 test catchments from
the ANN based RFFA model with the observed flood quantiles for 20 years ARI (Q20). The
observed flood quantiles are estimated using an LP3 distribution and Bayesian parameter
estimation procedure as discussed in Chapter 4. It should be noted here that the observed
flood quantiles are not free from error; they are subject to data error (such as rating curve
extrapolation error), sampling error (due to the limited record length of the annual maximum
flood series data), error due to the choice of flood frequency distribution and error due to the
selection of parameter estimation method. These errors undermine the usefulness of the
validation statistics (e.g. RE); however, the statistics still provide an indication of the possible
error of the developed RFFA model in practical application. The Qpred/Qobs ratio and RE
values are used for the assessment of models; however, the CE value is not very useful here
as the mean of the observed flood quantiles is not known.
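For reference, the LP3 quantile for a given ARI can be computed from the mean, standard deviation and skewness of the base-10 logarithms of the annual maximum flood series through a frequency factor K, as log10(Q) = mean + K × SD. The sketch below is a simplification under stated assumptions: it uses the Wilson-Hilferty approximation to K, takes AEP = 1/ARI for the annual maximum series, and assumes the log-space parameters have already been estimated. It does not reproduce the Bayesian parameter estimation procedure of Chapter 4, and all parameter values are hypothetical.

```python
from statistics import NormalDist


def lp3_frequency_factor(ari_years: float, skew: float) -> float:
    """Wilson-Hilferty approximation of the Pearson Type III frequency factor K."""
    if ari_years <= 1.0:
        raise ValueError("ARI must exceed 1 year")
    # Standard normal variate for non-exceedance probability 1 - AEP, with AEP = 1/ARI
    z = NormalDist().inv_cdf(1.0 - 1.0 / ari_years)
    if abs(skew) < 1e-9:
        return z  # zero skew reduces to the log-normal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(mean_log: float, sd_log: float, skew_log: float, ari_years: float) -> float:
    """Flood quantile for a given ARI from the moments of log10(annual maxima)."""
    return 10.0 ** (mean_log + lp3_frequency_factor(ari_years, skew_log) * sd_log)


# Hypothetical log10-space parameters for one gauged catchment
for ari in (2, 5, 10, 20, 50, 100):
    print(ari, round(lp3_quantile(2.0, 0.3, -0.2, ari), 1))
```

The Wilson-Hilferty form is a convenient closed-form approximation that is reasonable for moderate skew; exact Pearson Type III quantiles would need the inverse gamma distribution instead.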
Figure 8.1 shows a good overall agreement between the predicted and observed flood
quantiles; however, there is some over-estimation by the ANN based RFFA model when the
observed flood quantiles are smaller than about 50 m3/s. Most of the test catchments lie
within a narrow band around the 45-degree line, except for a few outliers. The plots of
predicted and observed flood quantiles for other ARIs can be seen in Appendix B
(Figures B.22 to B.25). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results
for ARIs of 50 and 100 years (Figures B.24 and B.25, respectively) exhibit some
over-estimation by the ANN based RFFA model for small to medium discharges.
Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q20
Figure 8.2 shows the boxplot of relative error (RE) values of the selected test catchments for
the ANN based RFFA model for different flood quantiles. It can be seen from Figure 8.2 that the
median RE values (represented by the thick black lines within the boxes) are located very
close to the zero RE line (indicated by the 0 – 0 horizontal line in Figure 8.2), in particular for
ARIs of 2, 5, 10 and 20 years. However, for ARIs of 50 and 100 years, the median RE values
are located above the zero line, with ARI of 50 years showing the highest departure, which
indicates an overestimation by the ANN based RFFA model. Overall, the ANN based RFFA
model produces nearly unbiased estimates of flood quantiles, as the median RE values match
the zero RE line quite closely.
In terms of the spread of the RE (represented by the height of the box, i.e. the interquartile
range), ARIs of 50 and 100 years present the widest RE band and ARIs of 2 and 5 years the
narrowest, followed by ARIs of 20 and 10 years. The RE bands for ARIs of 50 and 100 years
are almost double those for ARIs of 2 and 5 years. This implies that the ANN based RFFA
model provides the most accurate flood quantile estimates for ARIs of 2 and 5 years, and the
least accurate for ARIs of 50 and 100 years. Overall, the boxplot in Figure 8.2 shows that
better results in terms of RE values are achieved for the smaller ARIs (i.e. 2, 5, 10 and 20
years) than for the higher ARIs. Some outliers (evidenced by notable overestimation with a
positive RE) can be seen for all the ARIs; these may need to be examined more closely for data
errors or issues regarding the hydrology and physical characteristics of the catchments. If these
catchments are deemed to be genuine outliers, they should be removed to enhance the ANN
based RFFA model; however, this has not been undertaken in this thesis.
[Boxplot: RE (%) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at RE = 0]
Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model
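The box height used above to describe the spread of RE is normally the interquartile range (IQR), and points lying more than 1.5 × IQR beyond the quartiles are commonly drawn as outliers. The sketch below computes these quantities for one ARI. The 1.5 × IQR rule and the quartile method are assumptions, since plotting packages differ, and the RE values are illustrative rather than thesis data.

```python
import statistics
from typing import Dict, List, Sequence


def box_summary(values: Sequence[float], whisker: float = 1.5) -> Dict[str, object]:
    """Median, interquartile range (box height) and Tukey-fence outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    outliers: List[float] = [v for v in values if v < low_fence or v > high_fence]
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Illustrative signed RE values (%) for a handful of catchments at one ARI (not thesis data)
re_values = [-40.0, -20.0, -10.0, 0.0, 5.0, 10.0, 20.0, 30.0, 250.0]
print(box_summary(re_values))
```

Comparing the "iqr" value across ARIs gives a numerical counterpart to statements such as one RE band being about double another.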
Figure 8.3 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments
for the ANN based RFFA model for different ARIs. The median Qpred/Qobs ratio values
(represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the
horizontal line in Figure 8.3), in particular for ARIs of 2, 5, 10 and 20 years. However, for
ARI of 50 years (and to a lesser degree for ARI of 100 years), the median Qpred/Qobs ratio
value is clearly located above the 1 – 1 line. These results indicate that the ANN based RFFA
model generally provides reasonably accurate flood quantiles, with a median Qpred/Qobs ratio
value very close to 1.00, although there is a noticeable overestimation for ARIs of 50 and 100
years. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 2 and 5 years show the
lowest spread, followed by ARIs of 20, 10, 100 and 50 years.
Considering the RE and Qpred/Qobs ratio values discussed above, it can be concluded that the
ANN based RFFA model generally provides unbiased flood estimates for small to medium
ARIs (2 to 20 years); however, the model slightly overestimates the observed flood quantiles
for higher ARIs (50 and 100 years).
[Boxplot: ratio (Qpred/Qobs) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at a ratio of 1]
Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model
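The RE and Qpred/Qobs boxplots carry the same location information. With RE = 100 × (Qpred/Qobs - 1), RE is an increasing linear function of the ratio, so the median signed RE equals 100 × (median ratio - 1). This explains why Figures 8.2 and 8.3 point to the same ARIs for bias. The sketch below checks this on illustrative values (not thesis data); the RE definition is the assumption stated earlier. The identity does not hold for the median of absolute RE values used in Section 8.3.2.

```python
import statistics
from typing import Sequence, Tuple


def signed_re_from_ratio(ratio: float) -> float:
    """RE (%) = 100 * (Qpred - Qobs) / Qobs = 100 * (Qpred/Qobs - 1)."""
    return 100.0 * (ratio - 1.0)


def medians(ratios: Sequence[float]) -> Tuple[float, float]:
    """Median Qpred/Qobs ratio and median signed RE (%) for one ARI."""
    median_ratio = statistics.median(ratios)
    median_re = statistics.median(signed_re_from_ratio(r) for r in ratios)
    return median_ratio, median_re


# Illustrative ratios for six catchments at one ARI (not thesis data)
m_ratio, m_re = medians([0.8, 0.95, 1.0, 1.1, 1.3, 2.0])
print(m_ratio, m_re, signed_re_from_ratio(m_ratio))
```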
8.2.2 GAANN
Figures 8.4, 8.5 and 8.6 show the validation results for the GAANN based RFFA model.
Figure 8.4 shows the plot of the flood quantiles predicted by the GAANN based RFFA model
against the observed flood quantiles for 20 years ARI. Figure 8.4 shows a greater scatter than
Figure 8.1 (which represents the ANN based RFFA model); in particular, the GAANN based
RFFA model underestimates the flood quantiles for a few test catchments. Overall, the scatter
around the 45-degree line in Figure 8.4 is deemed reasonable for most of the test catchments.
The plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B
(Figures B.26 to B.30). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results
for ARIs of 50 and 100 years (Figures B.29 and B.30, respectively) are relatively better for the
GAANN based RFFA model, in particular for the higher discharges.
Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q20
Figure 8.5 shows the boxplot of RE (%) values for the GAANN based RFFA model. The
median RE values (represented by the black lines within the boxes) match the 0 – 0 line very
well for ARI of 10 years and reasonably well for ARIs of 2, 20 and 50 years. For ARIs of 5
and 100 years, the GAANN based RFFA model shows a noticeable underestimation and
overestimation, respectively. In terms of the RE band (represented by the spread of the box),
ARI of 20 years shows the lowest spread, followed by ARIs of 2, 5, 10, 50 and 100 years. The
RE band for 100 years ARI is about double those for ARIs of 2 and 20 years. These results
show that, in terms of RE, the best overall result is achieved for 20 years ARI for the GAANN
based RFFA model. Similar to the ANN based RFFA model, the performance of the GAANN
based RFFA model is relatively poor for the higher ARIs (i.e. 50 and 100 years). This is not
unexpected, as the estimation of flood quantiles for higher ARIs is associated with a greater
degree of uncertainty (e.g. Haddad and Rahman, 2012; Rahman et al., 2011).
[Boxplot: RE (%) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at RE = 0]
Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model
Figure 8.6 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments
for the GAANN based RFFA model for different ARIs. It is found that the median Qpred/Qobs
ratio values (represented by the thick black lines within the boxes) are located close to the
1 – 1 line (the horizontal line in Figure 8.6), in particular for ARIs of 2, 10, 20 and 50 years (the
best agreement being for ARI of 10 years). However, for ARI of 5 years, the median
Qpred/Qobs ratio value is located a short distance below the 1 – 1 line, and for ARI of 100
years, a short distance above it. These results indicate a noticeable overall underestimation and
overestimation of the flood quantiles by the GAANN based RFFA model for ARIs of 5 and 100
years, respectively. In terms of the spread of the Qpred/Qobs ratio values, ARI of 20 years
exhibits the lowest spread, followed by ARIs of 5, 2, 50, 10 and 100 years. Furthermore, the
spreads of the Qpred/Qobs ratio values for 10 and 100 years are very similar and remarkably
larger than those for 2, 5 and 20 years.
[Boxplot: ratio (Qpred/Qobs) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at a ratio of 1]
Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model
8.2.3 GEP
Figure 8.7 compares the predicted flood quantiles for the selected 90 test catchments by the
GEP based RFFA model with the observed flood quantiles for 20 years ARI (Q20). Figure 8.7
generally presents a good agreement between the predicted and observed flood quantiles;
however, there is some over-estimation by the GEP based RFFA model when the observed
flood quantiles are smaller than about 100 m3/s. Most of the test catchments lie within a
narrow band around the 45-degree line, except for a few outliers. The plots of predicted and
observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.31 to B.35).
The results are very similar for ARIs of 2, 5, 10 and 20 years. Results for ARIs of 50 and 100
years (Figures B.34 and B.35, respectively) exhibit some over-estimation by the GEP based
RFFA model for small to medium discharges.
Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q20
Figure 8.8 shows the boxplot of relative error (RE) values of the selected test catchments for
the GEP based RFFA model for different flood quantiles. It can be seen from Figure 8.8 that the
median RE values (represented by the thick black lines within the boxes) are located very
close to the zero RE line (indicated by the 0 – 0 horizontal line in Figure 8.8), in particular for
ARIs of 2 and 10 years. However, for ARIs of 20, 50 and 100 years, the median RE values are
located above the zero line, with ARI of 100 years showing the highest departure, which
indicates an overestimation by the GEP based RFFA model. Overall, the GEP based RFFA
model shows some overestimation bias in flood quantile estimates for higher ARIs.
In terms of the spread of the RE (represented by the height of the box), ARIs of 20, 50 and
100 years present the widest RE band and ARIs of 5 and 10 years the narrowest, followed by
ARI of 2 years. The RE bands for ARIs of 20, 50 and 100 years are almost double those for
ARIs of 5 and 10 years. This implies that the GEP based RFFA model provides the most
accurate flood quantile estimates for ARIs of 5 and 10 years, and the least accurate for ARIs of
20, 50 and 100 years. Overall, the boxplot in Figure 8.8 shows that better results in terms of
RE values are achieved for the smaller ARIs (i.e. 2, 5 and 10 years) than for the higher ARIs.
Some outliers (evidenced by notable overestimation with a positive RE) can be seen for all the
ARIs; these may need to be examined more closely for data errors or issues regarding the
hydrology and physical characteristics of the catchments. If these catchments are deemed to be
genuine outliers, they should be removed to enhance the GEP based RFFA model; however,
this has not been undertaken in this thesis.
[Boxplot: RE (%) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at RE = 0]
Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model
Figure 8.9 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments
for the GEP based RFFA model for different ARIs. The median Qpred/Qobs ratio values
(represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the
horizontal line in Figure 8.9), in particular for ARIs of 2, 5 and 10 years. However, for ARIs
of 20, 50 and 100 years, the median Qpred/Qobs ratio values are clearly located above the
1 – 1 line. These results indicate that the GEP based RFFA model generally provides
reasonably accurate flood quantiles for smaller ARIs, with a median Qpred/Qobs ratio value
very close to 1.00; however, there is a noticeable overestimation for ARIs of 20, 50 and 100
years. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 5 and 10 years show the
lowest spread, followed by ARIs of 2, 20, 50 and 100 years.
Considering the RE and Qpred/Qobs ratio values discussed above, it can be concluded that the
GEP based RFFA model generally provides unbiased flood estimates for small to medium
ARIs (5 and 10 years); however, the model slightly overestimates the observed flood quantiles
for higher ARIs (50 to 100 years) and slightly underestimates them for 2 years ARI. Some
outliers can be seen for the higher ARIs (e.g. 100 years), which may need to be examined more
closely for data errors or issues regarding the hydrology of the catchments; if deemed to be
genuine outliers, they should be removed from the model, which, however, has not been done
in this thesis.
[Boxplot: ratio (Qpred/Qobs) against ARI (years) for Q2, Q5, Q10, Q20, Q50 and Q100, with a horizontal reference line at a ratio of 1]
Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model
8.2.4 CANFIS
Figures 8.10, 8.11 and 8.12 show the validation results for the CANFIS based RFFA model.
Figure 8.10 shows the plot of the flood quantiles predicted by the CANFIS based RFFA model
against the observed flood quantiles for 20 years ARI. Figure 8.10 shows a greater scatter than
Figure 8.1 (which represents the ANN based RFFA model) for observed flood quantiles (Qobs)
smaller than about 100 m3/s; in particular, the CANFIS based RFFA model overestimates the
flood quantiles for the test catchments with Qobs values smaller than 100 m3/s. Overall, the
scatter around the 45-degree line in Figure 8.10 is deemed reasonable for most of the test
catchments with Qobs values greater than 100 m3/s. The plots of predicted and observed flood
quantiles for other ARIs can be seen in Appendix B (Figures B.36 to B.40). The result for 2
years ARI is quite poor, as can be seen in Figure B.36, with significant overestimation by the
CANFIS based RFFA model. The results for ARIs of 5, 10 and 20 years are very similar.
Results for ARIs of 50 and 100 years (Figures B.39 and B.40, respectively) exhibit some
overestimation by the CANFIS based RFFA model for small to medium discharges. For ARI of
50 years (Figure B.39), there is noticeable scatter at smaller discharges.
Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q20
Figure 8.11 shows the boxplot of RE (%) values for the CANFIS based RFFA model. The
median RE values (represented by the black lines within the boxes) match the 0 – 0 line very
well for ARIs of 5 and 50 years and reasonably well for ARI of 20 years. For ARIs of 2 and
100 years, the CANFIS based RFFA model shows a noticeable overestimation. In terms of the
RE band (represented by the spread of the box), ARIs of 5, 10 and 20 years show the lowest
spread, followed by ARIs of 50, 100 and 2 years. The RE band for 100 years ARI is about
double those for ARIs of 5 and 10 years, and the RE band for 2 years ARI is about four times
those for ARIs of 5 and 10 years. These results show that, in terms of RE, the best overall
result is achieved for 10 years ARI for the CANFIS based RFFA model.
Figure 8.12 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test
catchments for the CANFIS based RFFA model for different ARIs. It is found that the median
Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located
close to the 1 – 1 line (the horizontal line in Figure 8.12), in particular for ARIs of 2, 5, 10 and
20 years (the best agreement being for ARI of 10 years). However, for ARIs of 50 and 100
years, the median Qpred/Qobs ratio value is located a short distance above the 1 – 1 line. These
results indicate a noticeable overall overestimation of the flood quantiles by the CANFIS based
RFFA model for ARIs of 50 and 100 years. In terms of the spread of the Qpred/Qobs ratio
values, ARIs of 2 and 5 years exhibit the lowest spread, followed by ARIs of 20, 10, 100 and
50 years. Furthermore, the spreads of the Qpred/Qobs ratio values for 50 and 100 years are very
similar and remarkably larger than those for 2, 5 and 20 years.
[Boxplot: RE (%) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at RE = 0]
Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model
8.3 Comparison of RFFA models based on validation data set
For selecting the best performing RFFA model, it is important to compare the results of these
models for independent test catchments. The following sub-sections compare the four
artificial intelligence based RFFA models based on Qpred/Qobs ratio, RE and CE values.
8.3.1 Median Qpred/Qobs ratio
Table 8.1 summarises the median Qpred/Qobs ratio values for the four different RFFA models.
For the ANN, the median Qpred/Qobs ratio values range from 0.99 to 1.14. For Q5, the median
Qpred/Qobs ratio value is 0.99, which indicates a small under-estimation by the ANN based
model. For this model, Q50 and Q100 show over-estimation, with median Qpred/Qobs ratio
values of 1.14 and 1.10, respectively. The best result is obtained for Q10, with a median
Qpred/Qobs ratio value of 1.02. In summary, the ANN based model shows a good overall
median Qpred/Qobs ratio value across all the ARIs (1.06, rank 3 among the four models) and
consistent median Qpred/Qobs ratio values for ARIs of 2, 5, 10 and 20 years.
[Boxplot: ratio (Qpred/Qobs) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, with a horizontal reference line at a ratio of 1]
Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model
In the case of the GAANN based RFFA model, the median Qpred/Qobs ratio values range from
0.89 (Q5) to 1.17 (Q100); all the median Qpred/Qobs ratio values seem to be within an
acceptable range except for Q5, which is 0.89, indicating an underestimation of 11%. Similar to
the ANN based model, the best GAANN result in terms of median Qpred/Qobs ratio value is
found for Q10. The GAANN based RFFA model provides an overall median Qpred/Qobs ratio
value of 1.00, which is at rank 1 among the four models, but for the individual ARIs it shows
less consistency than the ANN based model.
In the case of the GEP based RFFA model, all the flood quantiles seem to perform well in
terms of median Qpred/Qobs ratio value except for Q20 and Q5, where an 11% underestimation
and a 10% overestimation, respectively, can be seen. The best median Qpred/Qobs ratio value
(1.02) is achieved for Q100, followed by Q10 (1.04) and Q50 (1.05). These results show that
the GEP based model provides better results for higher ARIs (in particular for 100 years ARI)
than the three other models. The overall median ratio value for all ARIs is 1.03, which is at
rank 2 among the four models.
For the CANFIS based RFFA model, the best results in terms of median Qpred/Qobs ratio value
are obtained for Q5 (0.95), followed by Q50 (0.93). This model performs poorly for very small
and very high ARIs; however, for medium ARIs, its performance is quite good. Overall, the
CANFIS based model provides median ratio values in the range of 0.79 to 2.81, which is the
highest degree of fluctuation among the four models. The overall median Qpred/Qobs ratio
value for the CANFIS model is 1.33, which is at rank 4 among the four models.
Figure 8.13 plots the median Qpred/Qobs ratio values of all the four artificial intelligence based
RFFA models. It can be seen that, in terms of consistency, the GEP based model is at rank 1
and the ANN based model is at rank 2. The CANFIS based model is the poorest, with the
highest degree of fluctuation among the ARIs.
Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models
              Median ratio (Qpred/Qobs)
ARI (years)   ANN    GAANN   GEP    CANFIS
2             1.04   1.08    1.07   2.81
5             0.99   0.89    1.10   0.95
10            1.02   0.98    1.04   0.79
20            1.04   0.93    0.89   1.18
50            1.14   0.95    1.05   0.93
100           1.10   1.17    1.02   1.31
Overall       1.06   1.00    1.03   1.33
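The ranks quoted for the overall median Qpred/Qobs ratio follow from how far each model's overall value in Table 8.1 lies from 1.0. A minimal sketch, assuming that distance is the ranking measure (the thesis does not state the measure explicitly); the names are hypothetical:

```python
from typing import Dict, List

# Overall median Qpred/Qobs ratios transcribed from Table 8.1
OVERALL_MEDIAN_RATIO: Dict[str, float] = {"ANN": 1.06, "GAANN": 1.00, "GEP": 1.03, "CANFIS": 1.33}


def rank_by_ratio(ratios: Dict[str, float]) -> List[str]:
    """Order models by the distance of their median Qpred/Qobs ratio from 1.0."""
    return sorted(ratios, key=lambda model: abs(ratios[model] - 1.0))


print(rank_by_ratio(OVERALL_MEDIAN_RATIO))
```

This gives GAANN, GEP, ANN and CANFIS, matching ranks 1 to 4 quoted in the text.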
8.3.2 Median RE (%)
Table 8.2 summarises the median RE (%) values of the ANN, GAANN, GEP and CANFIS
based RFFA models. The median RE values are calculated based on the absolute RE values of
the individual test catchments. In the case of the ANN, median RE values range from 35.62%
to 44.63%, with the smallest and highest values found for ARIs of 20 and 10 years,
respectively. The ANN model shows the smallest median RE values among the four models
for ARIs of 2 and 5 years. The ANN based RFFA model shows an overall median RE value of
40.30%, which places it at rank 1 among the four RFFA models.
For the GAANN based RFFA model, higher median RE values can be observed for smaller
ARIs (2 to 10 years), whereas a better performance can be seen for higher ARIs. The best
value is obtained for Q100 (47.08%), whereas the highest median RE value is obtained for Q10
(72.56%). The overall median RE value over all the 6 ARIs for the GAANN based model is
found to be 58.40% (Table 8.2), which places it at rank 3 among the four RFFA models.
[Line plot: median ratio (Qpred/Qobs) against ARI (years) for ARIs of 2, 5, 10, 20, 50 and 100 years, for the ANN, GAANN, GEP and CANFIS based RFFA models]
Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA
models
In the case of the GEP model, median RE values range from 37.87% (Q50) to 69.38% (Q2).
The GEP based RFFA model seems to perform well for higher ARIs, but it performs very
poorly for 2 years ARI. The GEP model shows the smallest median RE values among the four
models for ARIs of 10 and 50 years. The overall median RE value over all the 6 ARIs for the
GEP based model is found to be 44.47% (Table 8.2), which places it at rank 2 among the four
RFFA models.
The CANFIS based RFFA model shows median RE values in the range of 34.48% (Q20) to 180.77% (Q2). Among the four models, the CANFIS model shows the smallest median RE values for ARIs of 20 and 100 years. The overall median RE value over all six ARIs for the CANFIS based model is 69.66% (Table 8.2), which places it at rank 4 among the four RFFA models.
Figure 8.14 plots the median RE values of all four artificial intelligence based RFFA models. It shows that the ANN based model has the smallest degree of fluctuation in the median RE values over the six ARIs. The GAANN and GEP models show a similar degree of fluctuation, and the CANFIS shows the highest.
Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models
Median RE (%)
ARI (years) ANN GAANN GEP CANFIS
2 37.56 65.13 69.38 180.77
5 40.39 61.48 44.95 48.92
10 44.63 72.56 42.08 51.97
20 35.62 48.19 47.61 34.48
50 39.09 55.93 37.87 59.20
100 44.53 47.08 44.47 42.63
Overall 40.30 58.40 44.47 69.66
8.3.3 Median CE
Table 8.3 summarises the median CE values of the ANN, GAANN, GEP and CANFIS based RFFA models. For the ANN based RFFA model, the median CE values range from 0.40 (Q100) to 0.69 (Q2 and Q20). Overall, the ANN based model performs consistently across the ARIs except for Q100; the best results are obtained for Q2, Q20 and Q50. Among the four models, the ANN model shows the highest median CE value for the 50-year ARI. In terms of overall median CE value, the ANN is placed at rank 2 (jointly with the GEP model).
The GAANN based model shows median CE values in the range of 0.38 (Q50) to 0.75 (Q5). The GAANN model shows the highest median CE values for ARIs of 2 and 5 years. In terms of overall median CE value, the GAANN is placed at rank 1 among the four models.
The GEP based model shows median CE values in the range of 0.49 (Q2) to 0.67 (Q5, Q20 and Q100). The GEP model shows the highest median CE value for the 100-year ARI. In terms of overall median CE value, the GEP model is at rank 2 (jointly with the ANN based model).
Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models
The CANFIS based model provides poor results for Q2, with a negative median CE value (-0.09). For the other ARIs, the median CE values are in the range of 0.54 to 0.72. Among the four models, the CANFIS model shows the highest median CE values for ARIs of 10 and 20 years. In terms of overall median CE value, the CANFIS model is at rank 4 among the four models.
Figure 8.15 plots the median CE values of all four artificial intelligence based RFFA models. This plot shows that the GEP model has the lowest degree of fluctuation in the median CE values, followed by the ANN based model, while the CANFIS model has the highest.
Table 8.3 Median CE values of the four artificial intelligence based RFFA models
Median CE values
ARI (years) ANN GAANN GEP CANFIS
2 0.69 0.72 0.49 -0.09
5 0.59 0.75 0.67 0.54
10 0.63 0.63 0.56 0.67
20 0.69 0.71 0.67 0.72
50 0.68 0.38 0.63 0.55
100 0.40 0.65 0.67 0.59
Overall 0.61 0.64 0.61 0.50
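The coefficient of efficiency is commonly defined in the Nash-Sutcliffe form, CE = 1 - sum((Qobs - Qpred)^2) / sum((Qobs - mean(Qobs))^2). The sketch below assumes that form, applied across a set of test catchments for one ARI. How the thesis aggregates CE into the median values of Table 8.3 is not restated in this section, so the function and the example values are illustrative assumptions only.

```python
def coefficient_of_efficiency(q_obs, q_pred):
    """Nash-Sutcliffe style CE (assumed form): 1 - SSE / SST over a set of catchments."""
    if len(q_obs) != len(q_pred) or len(q_obs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(q_obs) / len(q_obs)
    sse = sum((o - p) ** 2 for o, p in zip(q_obs, q_pred))  # squared prediction errors
    sst = sum((o - mean_obs) ** 2 for o in q_obs)           # spread of the observations
    if sst == 0:
        raise ValueError("observed values are all equal; CE is undefined")
    return 1.0 - sse / sst


# Hypothetical observed quantiles (m3/s) and three sets of predictions
q_obs = [10.0, 20.0, 30.0, 40.0]
print(coefficient_of_efficiency(q_obs, q_obs))                     # perfect fit: 1.0
print(coefficient_of_efficiency(q_obs, [25.0, 25.0, 25.0, 25.0]))  # mean of Qobs: 0.0
print(coefficient_of_efficiency(q_obs, [40.0, 30.0, 20.0, 10.0]))  # worse than the mean: -3.0
```

Under this definition, a negative CE, such as that of CANFIS for the 2-year ARI, means the predictions fit worse than simply using the mean of the observed quantiles.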
Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models
8.3.5 Comparison of RFFA models based on RE (%) ranges
When comparing different RFFA models, it is important to observe how many test catchments fall within specified ranges of RE. For this purpose, the RE (%) values (considering their sign) are grouped into four classes, as shown in Table 8.4. The selected (arbitrary) ranges of RE (%) are -10 to 10, -20 to 20, -50 to 50 and -100 to 100. These ranges are nested, so a catchment counted in a narrower range is also counted in every wider range.
In the range of -10 to 10, the ANN is placed at rank 1, with 22% of the 90 test catchments falling in this range, followed by CANFIS (14%) and GAANN (12%). For GEP, only 9 test catchments (10%) fall in this range.
In the range of -20 to 20, a total of 32 (35%) test catchments fall in this category for the ANN based model. Some 27% (25) of the test catchments fall in this range for the CANFIS based RFFA model, which is higher than for the GEP (25%) and GAANN (22%) based models. In this case, the ANN based model is again placed at rank 1 and the GAANN at rank 4 among the four models.
In the range of -50 to 50, the CANFIS is placed at rank 1, with 61% of the test catchments falling in this range, very closely followed by the ANN model (60%). The GEP is placed at rank 3, with 55% of the test catchments falling in this range, followed by the GAANN at rank 4, with 47 test catchments (52% of the total) in this category.
In the range of -100 to 100, the ANN based RFFA model is again placed at rank 1, with 92% of the catchments falling in this range. It is followed by the CANFIS (77%), GAANN (76%) and GEP (63 test catchments, 70%) based RFFA models.
Overall, the ANN based model outperforms the three other models in terms of the distribution of RE (%) values, followed by the CANFIS based model.
Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial
intelligence based RFFA models
Models (-10 to 10) (-20 to 20) (-50 to 50) (-100 to 100)
ANN 20 32 54 83
% of test catchments 22 35 60 92
GAANN 11 20 47 69
% of test catchments 12 22 52 76
GEP 9 22 49 63
% of test catchments 10 25 55 70
CANFIS 13 25 55 70
% of test catchments 14 27 61 77
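The grouping used for Table 8.4 can be sketched as a count of signed RE values within each symmetric limit. In this sketch the limits are assumed to be inclusive, which is not stated in the text, and the eight RE values are hypothetical rather than taken from the test catchments.

```python
def count_within_ranges(re_values, limits=(10, 20, 50, 100)):
    """Count catchments whose signed RE (%) lies within +/- each limit (inclusive).

    The ranges are nested, so counts can only grow as the limit widens.
    Returns {limit: (count, percent of all catchments)}.
    """
    n = len(re_values)
    result = {}
    for lim in limits:
        count = sum(1 for re in re_values if -lim <= re <= lim)
        result[lim] = (count, 100.0 * count / n)
    return result


# Hypothetical signed RE (%) values for eight test catchments
re_values = [-5.0, 8.0, -15.0, 35.0, -60.0, 90.0, 150.0, -250.0]
for lim, (count, pct) in count_within_ranges(re_values).items():
    print(f"-{lim} to {lim}: {count} catchments ({pct:.0f}%)")
```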
8.3.6 Selection of the best performing artificial intelligence based RFFA model
The four artificial intelligence based RFFA models have been compared in Section 8.2 and Sections 8.3.1 to 8.3.5, based on the results of applying these models to 90 test catchments. None of the four models performs the best in all the assessment criteria, which makes it difficult to select the best model. The performances of the four models are therefore assessed in a heuristic manner, in which each model is ranked against the seven criteria listed in Table 8.5. Four ranks are used, with relative scores ranging from 4 to 1: a model ranked 1 for a criterion scores 4, and ranks 2, 3 and 4 score 3, 2 and 1, respectively. Where two models share a rank, each receives the score of that rank (ANN and GEP, jointly ranked 2 for median CE, each score 3). Table 8.5 shows that the ANN based RFFA model has the highest score of 25, followed by the GAANN with a score of 19. The GEP receives a score of 17, while the CANFIS receives only 10, making it the least favourable model. The ANN based model is placed at rank 1 in 5 out of 7 criteria. Hence, it is reasonable to conclude that the ANN based RFFA model is the best performing artificial intelligence based model for eastern Australia.
Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern
Australia
Criteria Rank 1 Rank 2 Rank 3 Rank 4
Scatter plot of Qobs Vs Qpred ANN GEP GAANN CANFIS
Box plot of RE ANN GEP GAANN CANFIS
Box plot of Qpred/Qobs ANN GAANN CANFIS GEP
Median Qpred/Qobs GAANN GEP ANN CANFIS
Median RE ANN GEP GAANN CANFIS
Median CE GAANN ANN/GEP # CANFIS
RE (%) ranges ANN CANFIS GAANN GEP
Overall Scoring: ANN: 25, GAANN: 19, GEP: 17, CANFIS: 10
# Rank 3 is not assigned for median CE because ANN and GEP share rank 2.
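The scoring scheme in Table 8.5 is simple enough to check mechanically. The sketch below transcribes the ranks from Table 8.5, converts each rank r to a score of 5 - r (rank 1 scores 4, rank 4 scores 1) and gives tied models the score of their shared rank. The dictionary layout and function name are illustrative choices, not part of the thesis.

```python
# Ranks transcribed from Table 8.5; ANN and GEP share rank 2 for median CE.
RANKS = {
    "Scatter plot of Qobs vs Qpred": {"ANN": 1, "GEP": 2, "GAANN": 3, "CANFIS": 4},
    "Box plot of RE": {"ANN": 1, "GEP": 2, "GAANN": 3, "CANFIS": 4},
    "Box plot of Qpred/Qobs": {"ANN": 1, "GAANN": 2, "CANFIS": 3, "GEP": 4},
    "Median Qpred/Qobs": {"GAANN": 1, "GEP": 2, "ANN": 3, "CANFIS": 4},
    "Median RE": {"ANN": 1, "GEP": 2, "GAANN": 3, "CANFIS": 4},
    "Median CE": {"GAANN": 1, "ANN": 2, "GEP": 2, "CANFIS": 4},
    "RE (%) ranges": {"ANN": 1, "CANFIS": 2, "GAANN": 3, "GEP": 4},
}


def total_scores(ranks, n_models=4):
    """Sum the scores (n_models + 1 - rank) of each model over all criteria."""
    totals = {}
    for by_model in ranks.values():
        for model, rank in by_model.items():
            totals[model] = totals.get(model, 0) + (n_models + 1 - rank)
    return totals


scores = total_scores(RANKS)
for model, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(model, score)  # ANN 25, GAANN 19, GEP 17, CANFIS 10
```

Running it reproduces the overall scores reported in Table 8.5. This also makes clear that the final ranking depends on the equal weighting of the seven criteria.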
8.4 Performance of the finally selected artificial intelligence based RFFA model
This section further evaluates the performance of the best performing artificial intelligence based RFFA model, which is the ANN based RFFA model. First, the spatial distribution of the relative error (RE) values of the ANN based RFFA model over the 90 test catchments is examined. Second, the relationship between RE and catchment area is investigated.
8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model
Figure 8.16 shows the spatial distribution of RE values across NSW. Most of the test catchments lie in the eastern part of NSW, since few catchments from western NSW qualified for the study data set. Overall, the catchments in north-eastern NSW exhibit smaller RE values. Most importantly, Figure 8.16 does not show any notable spatial pattern; in general, test catchments with higher RE values are surrounded by catchments with relatively small RE values.
Figure 8.16 Spatial distribution of RE of ANN based model across NSW
Figure 8.17 shows the distribution of RE values across the state of Victoria. Similar to NSW, there is no noticeable spatial trend in the RE values across the state. Figures 8.18, 8.19 and 8.20 show the spatial distribution of RE values across QLD. Figure 8.18 shows the RE values across the northern and north-eastern parts of QLD, where the catchments generally show relatively small RE values. Figure 8.19 shows the catchments in the southern and south-eastern parts of QLD, where most of the test catchments lie near the coast. The catchments close to the NSW-QLD border exhibit better results, with quite small RE values. Figure 8.20 shows a full view of the spatial distribution of RE values across QLD, indicating that there is no noticeable spatial trend in the RE values for QLD.
Figure 8.21 shows the spatial distribution of RE values across the state of TAS. Most of the test catchments in TAS lie in the central part of the state, away from the coast. No spatial trend is observed in the RE values over TAS.
It should also be noted that there are some outlier catchments with quite high RE values; these catchments may need further investigation, which, however, is not undertaken in this thesis.
Figure 8.17 Spatial distribution of RE of ANN based model across VIC
Figure 8.18 Spatial distribution of RE of ANN based model across North QLD
Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD
Figure 8.20 Spatial distribution of RE of ANN based model across QLD
Figure 8.21 Spatial distribution of RE of ANN based model across TAS
8.4.2 Catchment area vs RE
Figure 8.22 plots the RE values against the area of the test catchments. Catchments with areas in the range of 1 to 200 km2 generally show the smallest RE values. In the range of 200 to 400 km2, most catchments show small RE values, except for two outliers with RE values greater than 500%. Importantly, the relationship between RE and catchment area is weak: the coefficient of determination (R2) of the fitted regression line in Figure 8.22 is only 6%.
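The sketch below shows how a coefficient of determination like the one quoted for Figure 8.22 is obtained from paired (area, RE) values. It assumes a simple straight-line least-squares fit of RE on area. The regression form used for Figure 8.22 and whether RE is taken as signed or absolute are assumptions here, and the six data pairs are invented for illustration. R2 measures the share of the variation in RE explained by the line; a formal test of the slope would be a separate step.

```python
def r_squared(x, y):
    """R^2 of the ordinary least-squares line y = a + b*x (squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least three pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


# Hypothetical (area, RE) pairs; not the thesis data
area_km2 = [12.0, 85.0, 150.0, 320.0, 610.0, 980.0]
re_pct = [25.0, 60.0, 18.0, 40.0, 75.0, 30.0]
print(f"R^2 = {r_squared(area_km2, re_pct):.3f}")
```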
8.5 Comparison with QRT
Finally, the ANN based RFFA model is compared with the QRT based models. Here, the same data set is used for building and testing the ANN and QRT models. Based on the median Qpred/Qobs ratio values in Table 8.6, the ANN based RFFA model gives median ratios closer to 1.00 than the QRT model for all six ARIs. Similarly, as shown in Table 8.7, the ANN based RFFA model gives smaller median RE values than the QRT model for all the ARIs. Furthermore, Table 8.8 shows that the ANN based RFFA model outperforms the QRT model with respect to CE values. These results demonstrate that the ANN based RFFA model outperforms the QRT model on all three evaluation statistics. It should be noted that the median RE values for the best ANN based RFFA model developed here range from 35% to 44% (with a few cases where RE > 100%), which is typical of Australian regional flood estimation methods (e.g., see Haddad et al., 2011; Haddad and Rahman, 2012). Since RE is independent of catchment area, the model can be applied to smaller as well as larger catchments, up to 1000 km2.
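For readers unfamiliar with QRT, a common form regresses the logarithm of a flood quantile on the logarithms of catchment characteristics. The sketch below assumes log10(QT) = b0 + b1*log10(area) + b2*log10(I), with I the design rainfall intensity, and fits it by ordinary least squares through the normal equations. The exact QRT model form and predictor transforms used in this thesis are not restated in this section, so both are assumptions. The function names and the synthetic data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_qrt_ols(areas, intensities, quantiles):
    """OLS fit of log10(Q) = b0 + b1*log10(area) + b2*log10(I); returns [b0, b1, b2]."""
    rows = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(areas, intensities)]
    y = [math.log10(q) for q in quantiles]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[j] for r in rows) for j in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve_linear(xtx, xty)


def predict_qrt(coef, area, intensity):
    """Back-transform the fitted log-space equation to a flood quantile."""
    b0, b1, b2 = coef
    return 10 ** (b0 + b1 * math.log10(area) + b2 * math.log10(intensity))


# Hypothetical data generated from known coefficients, to show the fit recovers them
areas = [10.0, 50.0, 200.0, 800.0, 30.0]        # km2
intensities = [20.0, 35.0, 15.0, 50.0, 80.0]    # mm/h
q = [10 ** (0.5 + 0.7 * math.log10(a) + 1.2 * math.log10(i)) for a, i in zip(areas, intensities)]
coef = fit_qrt_ols(areas, intensities, q)
print([round(c, 6) for c in coef])  # approximately [0.5, 0.7, 1.2]
```

Fitting in log space keeps the model linear in its coefficients, which is what makes ordinary least squares applicable. An ANN, by contrast, can represent non-linear interactions between the same two predictors.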
Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90
test catchments
Table 8.6 Median Qpred/Qobs ratio values for the ANN based RFFA model and QRT
Median ratio (Qpred/Qobs)
Flood quantile ANN QRT
Q2 1.04 1.15
Q5 0.99 1.06
Q10 1.02 1.35
Q20 1.04 1.13
Q50 1.14 1.19
Q100 1.10 1.28
Table 8.7 Median relative error values (%) for the ANN based RFFA model and QRT
Flood quantile Median RE (%)
ANN QRT
Q2 37.56 65.38
Q5 40.39 45.35
Q10 44.63 57.62
Q20 35.62 42.64
Q50 39.09 48.71
Q100 44.53 51.72
Table 8.8 Coefficient of efficiency (CE) values for the ANN based RFFA model and QRT
CE
Flood quantile ANN QRT
Q2 0.73 0.35
Q5 0.61 0.37
Q10 0.63 0.30
Q20 0.71 0.37
Q50 0.68 -8.42
Q100 0.52 0.38
8.6 Summary
In this chapter, four artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS) have been validated using 90 independent test catchments. No model performs the best for all six ARIs and all seven criteria (Table 8.5). The ANN based RFFA model has been found to be the best performing of the four artificial intelligence based RFFA models, and it outperforms the ordinary least squares based quantile regression technique.
The median relative error values for the finally selected ANN based RFFA model range from 35% to 44%, which is slightly higher than those of the GLS regression based region-of-influence approach (parameter regression technique) reported by Haddad and Rahman (2012), whose relative errors range from 29% to 45%. However, the relative error values of both techniques are within the expected error/variability of RFFA models, which depends on at-site flood frequency analysis estimates that themselves have a high degree of sampling variability.
The ANN based RFFA model shows that there is no noticeable spatial trend in the relative
error values across four states in eastern Australia. Furthermore, the relative error values are
independent of catchment area.
There are a few catchments where the ANN based RFFA model shows relatively high relative error values (similar to the results of Haddad and Rahman, 2012). These catchments may need further investigation, which, however, is not undertaken in this thesis.
To enhance the accuracy of regional flood estimation methods in eastern Australia, a larger data set with longer streamflow records would be needed, as Australia is characterised by a highly variable hydrology/flood regime. It is expected that the availability of such a larger data set in future would enhance the accuracy of artificial intelligence based RFFA models in eastern Australia.
CHAPTER 9
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
9.1 General
This thesis has focused on the development and testing of non-linear artificial intelligence
based regional flood frequency analysis (RFFA) models. For this purpose, a database of 452
small to medium sized catchments from eastern Australia has been used. Four different
artificial intelligence based RFFA models have been considered in this research. These non-
linear RFFA models have also been compared with the linear ordinary least squares based
regression model. This chapter presents a summary of the research undertaken in this thesis,
conclusions and recommendations for further study.
9.2 Summary of the research undertaken in this thesis
Selection of study catchments and data preparation: This research selects eastern Australia as the study area since it has the highest density of stream gauging stations in Australia. A total of 452 catchments were selected from the study area: 96 from New South Wales and the Australian Capital Territory, 131 from Victoria, 172 from Queensland and 53 from Tasmania. The geographical locations of the selected 452 catchments are shown in Figure 4.19. These catchments are not affected by major regulation or land use change. They are small to medium-sized catchments, with areas ranging from 1.3 to 1900 km2 (mean: 329.4 km2). The annual maximum flood series of the selected stations were prepared by adopting a standard procedure (e.g. filling gaps in the data and checking for rating curve error and trends). The annual maximum flood record lengths of the selected stations range from 25 to 75 years (mean: 33 years). For each station, at-site flood frequency analysis was carried out using the FLIKE software (Kuczera, 1999). Detected low flows were censored using the in-built facility in FLIKE. A log-Pearson Type 3 (LP3) distribution with the Bayesian parameter estimation procedure was adopted to estimate flood quantiles for six average recurrence intervals (ARIs) (i.e. 2, 5, 10, 20, 50 and 100 years). These flood quantiles were used as the dependent/target variables in the development of models using linear and non-linear methods. For each of the selected catchments, data for five catchment characteristics were abstracted: catchment area, mean annual areal evapotranspiration, mean annual rainfall, main stream slope and design rainfall intensity. A summary of these catchment characteristics data is given in Table 4.1.
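The study fitted LP3 distributions with FLIKE's Bayesian procedure. As a much simpler illustration of what an LP3 quantile is, the sketch below instead uses the method of moments on log10 flows and the Wilson-Hilferty approximation to the Pearson Type 3 frequency factor. It is not the FLIKE method. It assumes the non-exceedance probability for an ARI of Y years is 1 - 1/Y, an annual-series convention adopted here for simplicity, and it makes no attempt to censor low flows. The example series is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lp3_quantiles(annual_maxima, aris=(2, 5, 10, 20, 50, 100)):
    """Method-of-moments LP3 quantiles (illustrative, not FLIKE's Bayesian fit)."""
    logs = [math.log10(q) for q in annual_maxima]
    n = len(logs)
    if n < 3:
        raise ValueError("need at least 3 annual maxima to estimate skew")
    m, s = mean(logs), stdev(logs)  # stdev uses the n-1 denominator
    # Bias-adjusted sample skewness of the log flows
    g = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in logs)
    quantiles = {}
    for ari in aris:
        z = NormalDist().inv_cdf(1.0 - 1.0 / ari)  # standard normal deviate (assumed AEP = 1/ARI)
        if abs(g) < 1e-9:
            k = z  # zero skew: LP3 reduces to log-normal
        else:
            # Wilson-Hilferty approximation to the Pearson Type 3 frequency factor
            k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
        quantiles[ari] = 10 ** (m + k * s)
    return quantiles


# Hypothetical annual maximum series (m3/s)
series = [55.0, 80.0, 120.0, 95.0, 300.0, 60.0, 150.0, 210.0, 75.0, 110.0]
for ari, q in lp3_quantiles(series).items():
    print(f"Q{ari} = {q:.1f} m3/s")
```

A Bayesian fit such as FLIKE's also quantifies parameter uncertainty and handles censored data, which the moment estimates above do not. That is one reason the two approaches can give different quantiles for short records.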
Selection of predictor variables: From the five candidate catchment characteristics, eight different combinations were formed. Each combination contained catchment area and design rainfall intensity together with a combination of the remaining three predictor variables (mean annual areal evapotranspiration, mean annual rainfall and main stream slope), as shown in Table 5.2. Two artificial intelligence based RFFA techniques (ANN and GEP) were then used to develop prediction equations. Of the 452 catchments, 90 were selected randomly as test catchments, and the remaining 362 were used to develop the models. Models were assessed based on the ratio between predicted and observed flood quantiles, percent relative error and coefficient of efficiency. Based on the independent testing, the ANN and GEP based RFFA models with only two predictor variables (catchment area and design rainfall intensity) outperformed the other models with a greater number of predictor variables. These models would be easier to apply in practice, as these two predictor variables can be obtained relatively easily from published maps and government websites. In the subsequent analyses, these two predictor variables (catchment area and design rainfall intensity) were used.
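The count of eight combinations follows from keeping the two fixed predictors and adding every subset of the other three (2^3 = 8 subsets, including the empty one). The sketch below enumerates them. The ordering it produces is illustrative and need not match the numbering used in Table 5.2.

```python
from itertools import combinations

FIXED = ("catchment area", "design rainfall intensity")
OPTIONAL = ("mean annual areal evapotranspiration", "mean annual rainfall", "main stream slope")


def predictor_sets():
    """All predictor sets: the two fixed variables plus any subset of the optional three."""
    sets = []
    for k in range(len(OPTIONAL) + 1):
        for extra in combinations(OPTIONAL, k):
            sets.append(FIXED + extra)
    return sets


for i, s in enumerate(predictor_sets(), start=1):
    print(i, ", ".join(s))
```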
Formation of regions: From the 452 catchments covering four eastern Australian states, different regions/groupings were formed. In the first step, regions were formed on the basis of state/geographical and climatic boundaries; here, seven different regions were considered, as shown in Table 6.1. In the second step, regions were formed in the catchment characteristics data space based on cluster analysis and principal component analysis; here, two regions were formed based on cluster analysis and four regions based on principal component analysis. K-Means cluster analysis was found to generate the best performing groupings in the catchment characteristics data space. When compared with the geographical regions, some state-based regions performed more poorly than the K-Means cluster groupings. Overall, the best ANN based RFFA model was achieved when the data from all 452 catchments were combined to form a single region.
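K-Means groups catchments in the catchment characteristics space by repeatedly assigning each catchment to its nearest cluster centre and moving each centre to the mean of its members. The sketch below is a plain Lloyd-style k-means with a deterministic farthest-point start. The choice of features, the standardisation step, the initialisation and the example values are assumptions, not the settings used in Chapter 6.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Scale each feature to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns a cluster label (0..k-1) for every point."""
    # Farthest-point initialisation: start at the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical catchments: [area (km2), mean annual rainfall (mm)]
catchments = [[15.0, 650.0], [22.0, 700.0], [850.0, 1800.0], [900.0, 1750.0], [18.0, 680.0]]
print(kmeans(standardise(catchments), k=2))
```

Standardising first stops the feature with the largest numeric range from dominating the distance calculation.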
Development of artificial intelligence based RFFA models: In the development/training of the artificial intelligence based RFFA models, the 452 catchments were divided randomly into two parts: (i) a training data set of 362 catchments; and (ii) a validation data set of 90 catchments. The artificial intelligence based RFFA models were evaluated based on four criteria: median Qpred/Qobs ratio, plot of Qobs against Qpred, median relative error and coefficient of efficiency (Tables 7.9 and 7.10). No model performed the best for all six ARIs over all the adopted criteria. Overall, the ANN based RFFA model outperformed the three other models in the training/calibration.
Validation of the artificial intelligence based RFFA models: The four artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS) were validated using the 90 independent test catchments. In the first step, the four models were compared with each other based on the seven criteria listed in Table 8.5. No model performed the best for all six ARIs on all seven criteria, and the ANN based RFFA model was found to be the best performing of the four. In the second step, the ANN based RFFA model was compared with the ordinary least squares based quantile regression technique, which it outperformed.
The median relative error values for the finally selected ANN based RFFA model were found to be in the range of 35% to 44%. This is comparable to the generalised least squares regression and region-of-influence approach (parameter regression technique), for which relative error values in the range of 29% to 45% were reported for eastern Australia (Haddad and Rahman, 2012). The ANN based RFFA model exhibited no noticeable spatial trend in the relative error values over the selected study area. Furthermore, the relative error values were found to be independent of catchment area. There are a few catchments where the ANN based RFFA model (as well as the other three artificial intelligence based RFFA models and the quantile regression technique) showed quite high relative error values (similar to the results of Haddad and Rahman, 2012). These catchments need further investigation, which, however, was not undertaken in this thesis.
9.3 Conclusions
The following conclusions can be made from this research:
It has been found that non-linear artificial intelligence based RFFA techniques can be
applied successfully to eastern Australian catchments. Among the four artificial
intelligence based models, the ANN based RFFA model has been found to be the best
performing model, followed by the GAANN based RFFA model. The ANN based
RFFA model has been found to outperform the ordinary least squares based RFFA
model.
It has been shown that in the training of the four artificial intelligence based RFFA
models, no model performs the best for all the six ARIs over all the adopted criteria.
Overall, the ANN based RFFA model is found to outperform the three other models in
the training/calibration.
Based on independent validation, the median relative error values for the ANN based
RFFA model are observed to be in the range of 35% to 44% for eastern Australia, which
is comparable to the generalised least squares regression and region-of-influence based
RFFA approach.
It has been demonstrated that an RFFA model with two predictor variables, i.e.,
catchment area and design rainfall intensity provides more accurate flood quantile
estimates than other models with a greater number of predictor variables. The finally
selected ANN based RFFA model would be easier to apply in practice since data of
these two predictor variables can be obtained relatively easily from published maps and
government websites.
It has been shown that when the data from all the eastern Australian states are combined
to form one region, the resulting ANN based RFFA model performs better as compared
with other candidate regions such as regions based on state boundaries, geographical
and climatic boundaries and the regions formed in the catchment characteristics data
space.
The ANN based RFFA model exhibits no noticeable spatial trend in the relative error
values. Furthermore, the relative error values of the ANN based RFFA model are found
to be independent of catchment area.
9.4 Recommendations for further research
The ANN based RFFA model developed in this study is based on catchments in the states of New South Wales, Victoria, Queensland and Tasmania. In future research, the ANN based RFFA model should be tested in other Australian states.
The ANN based RFFA model developed in this study is based on design rainfall data from Australian Rainfall and Runoff (ARR) 1987. The ANN based RFFA model should be calibrated and tested with the recently released design rainfall data from the Australian Bureau of Meteorology.
In future research, a detailed investigation should be made of the catchments where relative error values have been found to be quite high for all the modelling techniques adopted in this research. In this regard, the streamflow data of these catchments should be checked. Furthermore, it should be checked whether these catchments have other special features which make them significantly different from the other catchments in the data set.
To enhance the accuracy of the ANN based RFFA model, a larger data set consisting of a greater number of catchments and additional predictor variables (when available) should be used to develop and test the model in future.
In future research, leave-one-out validation and Monte Carlo cross validation techniques should be adopted to train and validate the ANN based RFFA model.
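Both recommended validation schemes can be expressed as generators of train/test splits, sketched below. The 90/362 hold-out size mirrors the split used in this thesis. The number of Monte Carlo repeats, the random seed and the integer catchment identifiers are placeholders. Model fitting inside each split is omitted.

```python
import random


def leave_one_out(ids):
    """Yield (training ids, test id): each catchment is held out exactly once."""
    for i, test_id in enumerate(ids):
        yield ids[:i] + ids[i + 1:], test_id


def monte_carlo_splits(ids, n_test, n_repeats, seed=0):
    """Yield (training ids, test ids) for repeated random hold-out splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(n_repeats):
        test = rng.sample(ids, n_test)
        test_set = set(test)
        train = [c for c in ids if c not in test_set]
        yield train, test


catchment_ids = list(range(452))  # hypothetical catchment identifiers
for train, test in monte_carlo_splits(catchment_ids, n_test=90, n_repeats=3):
    print(len(train), len(test))  # 362 90 on every repeat
```

Averaging a statistic such as median RE over many Monte Carlo splits reduces the dependence of the validation result on one particular random choice of 90 test catchments.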
REFERENCES
ABC News (2011). Aerial shot of the flooded Queensland town of Ipswich. (accessed on 5th
August 2013). Accessible at http://www.abc.net.au.
ABC News (2011). Aerial shot of the flooded New South Wales town of Wagga Wagga.
(accessed on 5th August 2013). Accessible at http://www.abc.net.au.
Abrahart, R.J., See, L. and Kneale, P.E. (1999). Using pruning algorithms and genetic
algorithms to optimize network architectures and forecasting inputs in a neural network
rainfall-runoff model. Journal of Hydroinformatics, 1, 103-114.
Acreman, M.C. and Sinclair, C.D. (1986). Classification of drainage basins according to their
physical characteristics and application for flood frequency analysis in Scotland. Journal of
Hydrology, 84(3), 365-380.
Adams, C.A. (1987). Design flood estimation for ungauged rural catchments in Victoria. Road
Construction Authority, Victoria, Draft Technical Bulletin.
Alkon, D.L. (1989). Memory storage and neural systems. Scientific American, 26 (1), 42-50.
Alecsandru, C. and Ishak, S. (2004). Hybrid model-based and memory-based traffic
prediction system. Transportation Research Record: Journal of the Transportation Research
Board, 1879(1), 59-70.
Arthur, L.C. and Roger, L.W. (1995). LibGA for solving combinatorial optimization
problems. In L. Chambers (ed.), Practical handbook of Genetic Algorithms, CRC Press, Inc.
ASCE Task Committee (2000). Artificial neural networks in hydrology. I: Preliminary
concepts. Journal of Hydrologic Engineering, ASCE, 5(2), 115–123.
Aytek, A. (2009). Co-Active neuro-fuzzy inference system for evapotranspiration modelling.
Soft Computing, 13(7), 691-700.
Azamathulla, H.M., Ghani, A.A., Leow, C.S., Chang, C.K. and Zakaria, N.A. (2011). Gene-
expression programming for the development of a stage-discharge curve of the Pahang River.
Water Resources Management, 25(11), 2901–2916.
Azamathulla, H.M. and Ghani, A.A. (2011). Genetic programming for longitudinal dispersion
coefficients in streams. Water Resources Management, 25(6), 1537–1544.
Aziz, K., Rahman, A., Fang, G., Haddad, K. and Shrestha, S. (2010). Design flood estimation
for ungauged catchments: Application of artificial neural networks for eastern Australia.
World Environment and Water Resources Congress, ASCE, Providence, Rhode Island, USA.
Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2011). Artificial neural networks based
regional flood estimation methods for eastern Australia: Identification of optimum regions.
33rd Hydrology and Water Resources Symposium, 26 June-1 July 2011, Brisbane, Australia.
Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2012). Comparison of artificial neural
networks and adaptive neuro-fuzzy inference system for regional flood estimation in
Australia. Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov
2012, Sydney, Australia.
Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2013). Application of artificial neural
networks in regional flood frequency analysis: A case study for Australia. Stochastic
Environmental Research and Risk Assessment, 28(3), 541-554.
Baker, J.E. (1985). Adaptive selection method for genetic algorithms. Proceedings of an
International Conference on Genetic Algorithms and their Applications, 100-111.
Baker, J.E. (1987). Reducing bias inefficiency in the selection algorithm. In J.J. Grefenstette
(ed.), Genetic algorithms and their applications, Proceedings of the second international
conference on genetic algorithms, Erlbaum.
Bates, B.C., Rahman, A., Mein, R.G. and Weinmann, P.E. (1998). Climatic and physical
factors that influence the homogeneity of regional floods in south-eastern Australia. Water
Resources Research, 34(12), 3369-3382.
Bayazit, M. and Onoz, B. (2004). Sampling variances of regional flood quantiles affected by
inter-site correlation. Journal of Hydrology, 291, 42-51.
Benson, M.A. (1962). Evolution of methods for evaluating the occurrence of floods. U.S.
Geological Survey Water Supply Paper 1580-A, 30 pp.
Bureau of Infrastructure, Transport and Regional Economics (BITRE) (2008). Analysis of the
Emergency Management Australia database. About Australia’s Regions, Department of
Infrastructure, Transport, Regional Development and Local Government, Australian
Government, Canberra, Table 30, 44 pp.
Bureau of Meteorology (2014). State of the Climate 2014. http://www.bom.gov.au/state-of-the-climate/.
Bishop, C.M. (1995). Neural networks for pattern recognition, Oxford University Press.
Blöschl, G. and Sivapalan, M. (1997). Process controls on regional flood frequency:
Coefficient of variation and basin scale. Water Resources Research, 33, 2967-2980.
Blackie, J.R. and Eeles, C.W.O. (1985). Lumped catchment models. In: Hydrological
Forecasting (ed. by M. G. Anderson & T. P. Burt), 311-346. Wiley.
Bogardi, I., Bardossy, A., Duckstein, L. and Pongrácz, R. (2003). Fuzzy logic in hydrology
and water resources. In: Demicco, R.V., Klir, G.J. (Eds.), Fuzzy Logic in Geology. Elsevier
Academic Press, 153–190.
Bowden, G.J., Dandy, G.C. and Maier, H.R. (2005). Input determination for neural network
models in water resources applications. Part 1-background and methodology. Journal of
Hydrology, 301, 75-92.
Burn, D.H. (1990). Evaluation of regional flood frequency analysis with a region of influence
approach, Water Resources Research, 26(10), 2257-2265.
Burn, D.H. and Goel, N.K. (2000). The formation of groups for regional flood frequency
analysis. Hydrological Sciences Journal, 45(1), 97-112.
Caballero, W.L. and Rahman, A. (2014). Development of regionalized joint probability
approach to flood estimation: A case study for New South Wales, Australia, Hydrological
Processes, 28, 4001-4010.
Castellarin, A., Burn, D.H. and Brath, A. (2001). Assessing the effectiveness of hydrological
similarity measures for regional flood frequency analysis, Journal of Hydrology, 241(3-4),
270-285.
Cheng, C.T., Ou, C.P. and Chau, K.W. (2002). Combining a fuzzy optimal model with a
genetic algorithm to solve multi-objective rainfall-runoff model calibration. Journal of
Hydrology, 268, 72-86.
Chokmani, K., Ouarda, T.B.M.J., Hamilton, S., Ghedira, M.H. and Gingras, H. (2008).
Comparison of ice-affected streamflow estimates computed using artificial neural networks
and multiple regression techniques. Journal of Hydrology, 349, 383-396.
Caudill, M. (1987). Neural networks primer, Part I. AI Expert, December, 46-52.
Caudill, M. (1988). Neural networks primer, Part II. AI Expert, February, 55-61.
Caudill, M. (1989). Neural networks primer, Part VII. AI Expert, May, 51-58.
Chow, V.T., Maidment, D.R. and Mays, L.W. (1988). Applied Hydrology, McGraw-Hill,
New York, NY.
Corradini, C. and Singh, V.P. (1985). Effect of spatial variability of effective rainfall on direct
runoff by a geomorphologic approach. Journal of Hydrology, 81, 27-43.
Cunnane, C. (1987). Review of statistical methods for flood frequency estimation. In: V.P.
Singh (Ed.), Hydrologic Frequency Modeling. D. Reidel, Dordrecht.
Cunnane, C. (1988). Methods and merits of regional flood frequency analysis, Journal of
Hydrology, 100, 269-290.
Dalrymple, T. (1960). Flood frequency analyses. U.S. Geological Survey Water-Supply
Paper 1543-A, 11-51.
Daniell, T.M. (1991). Neural networks – applications in hydrology and water resources
engineering. International Hydrology & Water Resources Symposium. Perth, Australia, 2-4
October, 1991.
de la Maza, M. and Tidor, B. (1993). An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection. In: S. Forrest (Ed.), Proceedings of the
Fifth International Conference on Genetic Algorithms.
Dawson, C.W., Abrahart, R.J., Shamseldin, A.Y. and Wilby, R.L. (2006). Flood estimation at
ungauged sites using artificial neural networks, Journal of Hydrology, 319, 391–409.
Dawdy, D.R. (1961). Variation of flood ratios with size of drainage area. U.S. Geological
Survey Professional Paper 424-C, C36.
Douglas, B.C. (1995). U.S. National Report to IUGG, 1991-1994. Reviews of Geophysics, 33
Supplement. Online; available at http://www.agu.org/revgeophys/dougla01/dougla01.
(Accessed on 13 Nov, 2009).
Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. Monographs on
Statistics and Applied Probability. Chapman and Hall, New York.
Fausett, L. (1994). Fundamentals of neural networks, Englewood Cliffs, NJ: Prentice Hall.
Farmer, J.D. and Sidorowich, J. (1987). Predicting chaotic time series. Physical Review
Letters, 59(8), 845-848.
Feldman, A.D. (1979). Flood hydrograph and peak flow frequency analysis. Technical Paper
No. 62, US Army Corps of Engineers, Institute for Water Resources, The Hydrologic
Engineering Center.
Fernando, D.A.K., Shamseldin, A.Y. and Abrahart, R.J. (2009). Using gene expression
programming to develop a combined runoff estimate model from conventional rainfall-runoff
model outputs. 18th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009.
Ferreira, C. (2001a). Gene expression programming in problem solving. 6th Online World
Conference on Soft Computing in Industrial Applications (invited tutorial).
Ferreira, C. (2001b). Gene expression programming: a new adaptive algorithm for solving
problems. Complex Systems 13(2), 87–129.
Ferreira, C. (2006). Gene Expression Programming: Mathematical Modeling by an Artificial
Intelligence. Springer, Berlin, Heidelberg, New York.
Flavell, D. (2012). Design flood estimation in Western Australia. Australian Journal of Water
Resources, 16(1), 1-20.
Flood, I. and Kartam, N. (1994). Neural networks in civil engineering. I: Principles and
understanding. Journal of Computing in Civil Engineering, 8(2), 131-148, 194.
Franchini, M. (1996). Using a genetic algorithm combined with a local search method for the
automatic calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal,
41(1), 21-40.
Franchini, M. and Galeati, G. (1997). Comparing several genetic algorithm schemes for the
calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal, 42(3), 357-379.
Franchini, M., Galeati, G. and Lolli, M. (2005). Analytical derivation of the flood frequency
curve through partial duration series analysis and a probabilistic representation of the runoff
coefficient, Journal of Hydrology, 303, 1–15.
Giustolisi, O. (2004). Using genetic programming to determine Chèzy resistance coefficient
in corrugated channels. Journal of Hydroinformatics, 6(3), 157–173.
Griffis, V.W. and Stedinger, J.R. (2007). The use of GLS regression in regional hydrologic
analyses. Journal of Hydrology, 344, 82-95.
Grubbs, F.E. and Beck, G. (1972). Extension of sample sizes and percentage points for
significance tests of outlying observations. Technometrics, 14, 847–854.
Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning.
Addison-Wesley, Reading, MA.
Goldberg, D.E. and Deb, K. (1991). A comparative analysis of selection schemes used in
genetic algorithms. In G. Rawlins (ed.), Foundations of genetic algorithms.
Gupta, V.K. and Waymire, E. (1993). A statistical analysis of mesoscale rainfall as a random
cascade. Journal of Applied Meteorology, 32(2), 251-267.
Guven, A. and Talu, N.E. (2010). Gene-expression programming for estimating suspended
sediment in Middle Euphrates Basin, Turkey. Clean - Soil, Air, Water, 38(12), 1159-1168.
Guven, A. and Kisi, O. (2011). Estimation of suspended sediment yield in natural rivers using
machine-coded linear genetic programming. Water Resources Management, 25(2), 691–704.
Hackelbusch, A., Micevski, T., Kuczera, G., Rahman, A. and Haddad, K. (2009). Regional flood
frequency analysis for eastern New South Wales: A region of influence approach using
generalized least squares based parameter regression. In: Proc. 32nd Hydrology and Water
Resources Symp., Newcastle, Australia.
Haddad, K., Rahman, A. and Weinmann, P.E. (2006). Design flood estimation in ungauged
catchments by quantile regression technique: ordinary least squares and generalised least
squares compared. 30th Hydrology and Water Resources Symposium, The Institution of
Engineers Australia, 4-7 Dec 2006, Launceston.
Haddad, K., Rahman, A. and Weinmann, P.E. (2008). Development of a generalised least
squares based quantile regression technique for design flood estimation in Victoria, 31st
Hydrology and Water Resources Symp., Adelaide, 15-17 April 2008, 2546-2557.
Haddad, K., Pirozzi, J., McPherson, G., Rahman, A. and Kuczera, G. (2009). Regional flood
estimation technique for NSW: Application of generalised least squares quantile regression
technique. In: Proc. 32nd Hydrology and Water Resources Symp., Newcastle, Australia.
Haddad, K., Rahman, A., Weinmann, P.E., Kuczera, G. and Ball, J.E. (2010). Streamflow
data preparation for regional flood frequency analysis: Lessons from south-east Australia.
Australian Journal of Water Resources, 14(1), 17-32.
Haddad, K., Rahman, A. and Stedinger, J.R. (2011). Regional flood frequency analysis using
Bayesian generalized least squares: A comparison between quantile and parameter regression
techniques. Hydrological Processes, 25, 1-14.
Haddad, K. and Rahman, A. (2011). Regional flood estimation in New South Wales, Australia
using generalised least squares quantile regression. Journal of Hydrologic Engineering,
ASCE, 16(11), 920-925.
Haddad, K. and Rahman, A. (2012). Regional flood frequency analysis in eastern Australia:
Bayesian GLS regression-based methods within fixed region and ROI framework - Quantile
Regression vs. Parameter Regression Technique. Journal of Hydrology, 430-431, 142-161.
Haddad, K., Rahman, A. and Ling, F. (2014). Regional flood frequency analysis method for
Tasmania, Australia: A case study on the comparison of fixed region and region-of-influence
approaches. Hydrological Sciences Journal, DOI:10.1080/02626667.2014.950583.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan
Press, Ann Arbor, MI, 183 pp.
Hoang, T.M.T. (2001). Joint probability approach to design flood estimation, PhD thesis,
Department of Civil Engineering, Monash University, Australia.
Hosking, J.R.M. and Wallis, J.R. (1993). Some statistics useful in regional frequency analysis.
Water Resources Research, 29(2), 271-281.
Hosking, J.R.M. and Wallis, J.R. (1997). Regional Frequency Analysis: An Approach Based on
L-moments. Cambridge University Press, New York, 224 pp.
Hopfield, J. (1982). Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences of the USA,
79(8), 2554-2558.
Institution of Engineers Australia (I.E. Aust.) (1987, 2001). Australian rainfall and runoff: A
guide to flood estimation. Editor: D.H. Pilgrim, Vol.1, I. E. Aust., Canberra.
Ishak, E., Rahman, A., Westra, S., Sharma, A. and Kuczera, G. (2013). Evaluating the non-
stationarity of Australian annual maximum floods. Journal of Hydrology, 494, 134-145.
Ishak, E. and Rahman, A. (2014). Detection of changes in flood data in Victoria, Australia over
1975-2011. Hydrology Research, doi:10.2166/nh.2014.064.
Ishak, E., Haddad, K., Zaman, M. and Rahman, A. (2011). Scaling property of regional floods
in New South Wales, Australia. Natural Hazards, 58, 1155-1167.
Jain, A. and Srinivasulu, S. (2004). Development of effective and efficient rainfall-runoff
models using integration of deterministic, real-coded genetic algorithms and artificial neural
network techniques. Water Resources Research, 40, W04302.
Jain, A., Srinivasulu, S. and Bhattacharjya, R.K. (2005). Determination of an optimal unit pulse
response function using real-coded genetic algorithm. Journal of Hydrology, 303, 199-214.
James, W. and Robinson, M.A. (1986). Continuous deterministic urban runoff modelling. In:
C. Maksimovic and M. Radojkovic (Eds.), Proceedings of the International Symposium on
Comparison of Urban Drainage Models with Real Catchment Data, Dubrovnik, Yugoslavia,
Pergamon Press, Oxford.
Jang, J.S.R. (1993). ANFIS: adaptive-network-based fuzzy inference system. IEEE
Transactions on Systems, Man and Cybernetics, 23(3), 665-685.
Jang, J.S.R., Sun, C.T. and Mizutani, E. (1997). Neuro-fuzzy and soft computing. Prentice-
Hall, New Jersey.
Javelle, P., Ouarda, T.B.M.J., Lang, M., Bobée, B., Galea, G. and Gresillon, J.M. (2002).
Development of regional flood-duration-frequency curves based on the index flood method.
Journal of Hydrology, 258, 249-259.
Jeong, D.I., Stedinger, J.R., Kim, Y., Sung, J.H. and Yoon, S.Y. (2008). Reflecting a Climate
Change Factor in Flood Frequency Analysis for Korean River Basins. Water Down Under,
Adelaide, Australia, 14-17 April.
Jiapeng, H., Zhongmin, L. and Zhongbo, Y. (2003). A modified rational formula for flood
design in small basins. Journal of the American Water Resources Association, 39(5), 1017-1025.
Jingyi, Z. and Hall, M.J. (2004). Regional flood frequency analysis for the Gan-Ming River
basin in China, Journal of Hydrology, 296, 98–117.
Kendall, M.G. (1970). Rank Correlation Methods, 2nd Ed., New York: Hafner.
Khu, S.T., Liong, S.Y., Babovic, V., Madsen, H. and Muttil, N. (2001). Genetic programming
and its application in real-time runoff forecasting. Journal of the American Water Resources
Association, 37(2), 439-451.
Kirby, W. and Moss, M. (1987). Summary of flood frequency analysis in the United States.
Journal of Hydrology, 96, 5-14.
Kisi, O. and Shiri, J. (2011). Precipitation forecasting using wavelet-genetic programming and
wavelet-neuro-fuzzy conjunction models. Water Resources Management, 25(13), 3135–3152.
Kjeldsen, T.R. and Jones, D.A. (2010). Predicting the index flood in ungauged UK
catchments: On the link between data-transfer and spatial model error structure, Journal of
Hydrology, 387(1-2), 1-9.
Kjeldsen, T.R. and Jones, D. (2009). An exploratory analysis of error components in
hydrological regression modelling. Water Resources Research, 45, W02407.
Klemes, V. (1993). Probability of extreme hydrometeorological events: A different approach.
In: Extreme Hydrological Events: Precipitation, Floods and Droughts, IAHS Publ., 167-176.
Kothyari, U.C. (2004). Estimation of mean annual flood from ungauged catchments using
artificial neural networks. Hydrology: Science and Practice for the 21st Century. Volume 1,
British Hydrological Society.
Kuczera, G. (1983). A Bayesian surrogate for regional skew in flood frequency analysis.
Water Resources Research, 19(3), 832-837.
Kuczera, G. (1999). Comprehensive at-site flood frequency analysis using Monte Carlo
Bayesian inference. Water Resources Research, 35(5), 1551-1557.
Lawrence, W.T. (1994). Comparative analysis of data acquired by three narrow-band
airborne spectroradiometers over subboreal vegetation. Remote Sensing of Environment, 47,
204-215.
Luk, K.C., Ball, J.E. and Sharma, A. (2001). An application of artificial neural networks for
rainfall forecasting. Mathematical and Computer Modelling, 33, 683-693.
Lumb, A.M. and James, L.D. (1976). Runoff files for flood hydrograph simulation. Journal of
the Hydraulics Division, ASCE, 1515-1531.
Madsen, H., Rosbjerg, D. and Harremoes, P. (1995). Application of the Bayesian approach in
regional analysis of extreme rainfalls. Stochastic Hydrology and Hydraulics, 9, 77-88.
Madsen, H., Pearson, C.P. and Rosbjerg, D. (1997). Comparison of annual maximum series
and partial duration series for modeling extreme hydrological events—2. Regional modeling.
Water Resources Research, 33(4), 771–781.
Maier, H.R. and Dandy, G.C. (2000). Neural networks for the prediction and forecasting of
water resources variables: a review of modelling issues and applications. Environmental
Modelling and Software, 15(1), 101-123.
McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous
activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Micevski, T., Hackelbusch, A., Haddad, K., Kuczera, G., Rahman, A. (2014).
Regionalisation of the parameters of the log-Pearson 3 distribution: a case study for New
South Wales, Australia, Hydrological Processes, DOI: 10.1002/hyp.10147.
Minns, A. and Hall, M. (1996). Artificial neural networks as rainfall-runoff models.
Hydrological Sciences Journal, 41, 399-417.
Morshed, J. and Kaluarachchi, J.J. (1998). Application of artificial neural network and genetic
algorithm in flow and transport simulations. Advances in Water Resources, 22(2),
145-158.
Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press.
Muttiah, R.S., Srinivasan, R. and Allen, P.M. (1997). Prediction of two year peak stream
discharges using neural networks. Journal of the American Water Resources Association, 33
(3), 625–630.
Mulvany, T.J. (1851). On the use of self-registering rain and flood gauges, Inst. Civ. Eng.
(Ireland) Trans, 4(2), 1-8.
Nathan, R.J. and McMahon, T.A. (1990). Identification of homogeneous regions for the
purpose of regionalisation, Journal of Hydrology, 121, 217-238.
National Research Council (NRC) (1988). Estimating probabilities of extreme floods:
methods and recommended research. National Academy Press, Washington, D.C., 141 pp.
Nayak, P.C. and Sudheer, K.P. (2004). A neuro-fuzzy computing technique for modelling
hydrological time series. Journal of Hydrology, 291(1–2), 52-66.
NERC (1975). Flood Studies Report. Natural Environment Research Council (NERC), London.
Novak, V., Perfilieva, I. and Mockor, J. (1999). Mathematical Principles of Fuzzy Logic.
Kluwer Academic, Dordrecht. ISBN 0-7923-8595-0.
Ouarda, T.B.M.J., Bâ, K.M., Diaz-Delgado, C., Cârsteanu, C., Chokmani, K., Gingras, H.,
Quentin, E., Trujillo, E. and Bobée, B. (2008). Intercomparison of regional flood frequency
estimation methods at ungauged sites for a Mexican case study, Journal of Hydrology, 348,
40-58.
Pallard, B., Castellarin, A. and Montanari, A. (2009). A look at the links between drainage
density and flood statistics, Hydrology and Earth System Sciences (HESS), 13, 1019-1029.
Pandey, G.R. and Nguyen, V.T.V. (1999). A comparative study of regression based methods
in regional flood frequency analysis. Journal of Hydrology, 225, 92-101.
Parthiban, L. and Subramanian, R. (2009). CANFIS: A computer aided diagnostic tool for
cancer detection. Journal of Biomedical Science and Engineering, 2, 323-335.
Pegram, G.G.S. and Parak, M. (2004). A review of the regional maximum flood and rational
formula using geomorphological information and observed floods. Water SA, 30(3),
377-392. ISSN 0378-4738.
Pilgrim, D.H. and McDermott, G.E. (1982). Design floods for small rural catchments in
eastern New South Wales. Civil Engg Trans, Inst. Engrs Aust., CE24, 226-234.
Pilgrim, D.H. and Cordery, I. (1993). Flood runoff. In: D.R. Maidment (Ed.), Handbook of
Hydrology, McGraw-Hill, New York, Chapter 9, 9.1-9.42.
Pirozzi, J., Ashraf, M., Rahman, A. and Haddad, K. (2009). Design flood estimation for
ungauged catchments in eastern NSW: Evaluation of the probabilistic rational method. In:
Proc. 32nd Hydrology and Water Resources Symposium, Newcastle, Australia.
Principe, J.C., Euliano, N.R. and Lefebvre, W.C. (2000). Neural and adaptive systems, John
Wiley & Sons, Inc.
Queensland Reconstruction Authority (2011). Operation Queenslander: The State
Community, Economic and Environmental Recovery and Reconstruction Plan 2011–2013.
Queensland Reconstruction Authority, Queensland, Australia, March 2011, 48 pp.
Rahman, A. (1997). Flood Estimation for ungauged catchments: A regional approach using
flood and catchment characteristics, PhD thesis, Department of Civil Engineering, Monash
University.
Rabuñal, J.R., Puertas, J., Suárez, J. and Rivero, D. (2007). Determination of the unit
hydrograph of a typical urban basin using genetic programming and artificial neural networks.
Hydrological Processes, 21(4), 476-485.
Rahman, A. (2005). A quantile regression technique to estimate design floods for ungauged
catchments in South-east Australia. Australian Journal of Water Resources, 9(1), 81-89.
Rahman, A., Bates, B.C., Mein, R.G. and Weinmann, P.E. (1999). Regional flood frequency
analysis for ungauged basins in south-eastern Australia. Australian Journal of Water
Resources, 3(2), 199-207. ISSN 1324-1583.
Rahman, A., Weinmann, P.E. and Mein, R.G. (1999). At-site frequency analysis: LP3-product
moment, GEV-L moment and GEV-LH moment procedures compared. Water 99 Joint
Congress, 715-720.
Rahman, A., Weinmann, P.E., Hoang, T.M.T. and Laurenson, E.M. (2002). Monte Carlo
simulation of flood frequency curves from rainfall. Journal of Hydrology, 256(3-4), 196-210.
ISSN 0022-1694.
Rahman, A. and Hollerbach, D. (2003). Study of runoff coefficients associated with the
probabilistic rational method for flood estimation in South-east Australia. In: Proc. 28th Intl.
Hydrology and Water Resources Symp., I.E. Aust., Wollongong, Australia, 10-13 Nov. 2003,
1, 199-203.
Rahman, A., Haddad, K., Caballero, W. and Weinmann, P.E. (2008). Progress on the
enhancement of the Probabilistic Rational Method for Victoria in Australia. 31st Hydrology
and Water Resources Symp., Adelaide, 15-17 April 2008, 940-951.
Rahman, A., Haddad, K., Kuczera, G. and Weinmann, P.E. (2009). Regional flood methods
for Australia: data preparation and exploratory analysis. Australian Rainfall and Runoff
Revision Projects, Project 5 Regional Flood Methods, Report No. P5/S1/003, Nov 2009,
Engineers Australia, Water Engineering, pp. 181.
Rahman, A., Haddad, K., Zaman, M., Kuczera, G. and Weinmann, P.E. (2011). Design flood
estimation in ungauged catchments: A comparison between the Probabilistic Rational Method
and Quantile Regression Technique for NSW. Australian Journal of Water Resources, 14(2),
127-137.
Rahman, A., Haddad, K., Zaman, M., Ishak, E., Kuczera, G. and Weinmann, P.E. (2012).
Australian Rainfall and Runoff Revision Projects, Project 5 Regional flood methods, Stage 2
Report No. P5/S2/015, Engineers Australia, Water Engineering, pp. 319.
Rao, Z.F. and Jamieson, D.G. (1997). The use of neural networks and genetic algorithms for
design of groundwater remediation schemes. Hydrology and Earth System Sciences, 1(2),
345-356.
Rao, A.R. and Hamed, K.H. (2000). Flood frequency analysis. CRC Press, Florida, USA.
Riggs, H.C. (1973). Regional analyses of streamflow characteristics. Techniques of Water-
Resources Investigations of the U.S. Geological Survey, Book 4, Chapter B3, U.S. Geological
Survey, Washington, D.C.
Reed, D.W. and Robson, A.J. (1999). Flood estimation handbook, vol. 3. Centre for Ecology
and Hydrology, UK.
Roger, J.S., Chuen-Tsai, S. and Eiji, M. (1997). Neuro-fuzzy and soft computing. Englewood
Cliffs, Prentice Hall.
Rooij, A.J.F.V., Jain, L.C. and Johnson, R.P. (1996). Neural network training using genetic
algorithms. World Scientific Publishing Co. Pty. Ltd., pp. 130.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by
error propagation. In: Rumelhart, D.E., McClelland, J.L. and the PDP Research Group
(Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition,
Vol. 1, 318-362. The MIT Press, Cambridge, MA.
Saf, B. (2009). Regional flood frequency analysis using L-Moments for the West
Mediterranean Region of Turkey. Water Resources Management, 23(3), 531–551.
Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to
rainfall-runoff modelling. Water Resources Management, 13, 219-231.
See, L., and Openshaw, S. (1999). Applying soft computing approaches to river level
forecasting. Hydrological Sciences Journal, 44(5), 763-778.
Sekin, N. and Guven, A. (2012). Estimation of peak flood discharges at ungauged sites across
Turkey, Water Resources Management, 26, 2569–2581.
Shamseldin, A.Y. (1997). Application of a neural network technique to rainfall-runoff
modeling. Journal of Hydrology, 199, 272–294.
Shi, Y. and Mizumoto, M. (2000). Some considerations on conventional neuro-fuzzy
learning algorithms by gradient descent method. Fuzzy Sets and Systems, 112, 51-63.
Shu, C. and Burn, D.H. (2004). Artificial neural network ensembles and their application in
pooled flood frequency analysis, Water Resources Research, 40(9), W09301,
doi:10.1029/2003WR002816.
Shu, C. and Ouarda, T.B.M.J. (2007). Flood frequency analysis at ungauged sites using
artificial neural networks in canonical correlation analysis physiographic space, Water
Resources Research, 43, W07438, doi:10.1029/2006WR005142.
Shu, C. and Ouarda, T.B.M.J. (2008). Regional flood frequency analysis at ungauged sites
using the adaptive neuro-fuzzy inference system. Journal of Hydrology, 349, 31-43.
Simonovic, S.P. (1992). Reservoir systems analysis: Closing gap between theory and
practice. Journal of Water Resources Planning and Management, 118(3), 262-280.
Smith, J.A. (1992). Representation of basin scale in flood peak distributions. Water Resources
Research, 28 (11), 2993-2999.
Smith, J.A. (1993). LAI inversion using a back-propagation neural network trained with a
multiple scattering model. IEEE Transactions on Geoscience and Remote Sensing, 31(5),
1102-1106.
Stedinger, J.R. and Tasker, G.D. (1985). Regional hydrologic analysis - 1. Ordinary, weighted
and generalized least squares compared. Water Resources Research, 21, 1421-1432.
Takens, F. (1981). Detecting strange attractors in turbulence. In: D.A. Rand and L.-S. Young,
Editors, Dynamical systems and turbulence, Lecture Notes in Mathematics. Vol. 898,
Springer-Verlag, Berlin, pp. 366–381.
Takagi, T. and Sugeno, M. (1983). Derivation of fuzzy control rules from human operator's
control actions. Proceedings of the IFAC Symposium on Fuzzy Information, Knowledge
Representation and Decision Analysis.
Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to
modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15(1), 116-132.
Takagi, H. and Hayashi, I. (1991). Neural network driven fuzzy reasoning. International
Journal of Approximate Reasoning, 5(3), 191-212.
Talei, A. and Chua, L.H.C. (2010a). A novel application of a neuro-fuzzy computational
technique in event-based rainfall–runoff modelling. Expert Systems with Applications,
37(12), 7456-7468.
Tasker, G.D. (1980). Hydrologic regression with weighted least squares. Water Resources
Research, 16(6), 1107-1113.
Tasker, G.D., Eychaner, J.H. and Stedinger, J.R. (1986). Application of generalised least
squares in regional hydrologic regression analysis. US Geological Survey Water-Supply
Paper 2310, 107-115.
Tasker, G.D., Hodge, S.A. and Barks, C.S. (1996). Region of influence regression for estimating
the 50-year flood at ungauged sites. Water Resources Bulletin, 32(1), 163-170.
Thandaveswara, B.S. and Sajikumar, N. (2000). Classification of river basins using artificial
neural networks. Journal of Hydrologic Engineering, 5 (3), 290–298.
Theodoridis, S. and Koutroumbas, K. (2009). Pattern Recognition, 4th Edition, Academic
Press, ISBN: 978-1-59749-272-0.
Thomas, D.M. and Benson, M.A. (1970). Generalization of streamflow characteristics from
drainage-basin characteristics. U.S. Geological Survey Water-Supply Paper 1975, U.S.
Government Printing Office.
Tokar, A.S. and Johnson, P.A. (1999). Rainfall-Runoff Modeling using Artificial Neural
Networks, J. Hydrologic Engineering, ASCE, 4(3), 232-239.
Turan, M.E. and Yurdusev, M.A. (2009). River flow estimation from upstream flow records
by artificial intelligence methods. Journal of Hydrology, 369, 71–77.
Vogel, R.M., McMahon, T.A. and Chiew, F.H.S. (1993). Flood flow frequency model
selection in Australia. Journal of Hydrology, 146, 421-449.
Wang, Q.J. (1991). The genetic algorithm and its application to calibrating conceptual
rainfall-runoff models. Water Resources Research, 27(9), 2467-2471.
Wasserman, P.D. (1989). Neural computing: theory and practice. Van Nostrand Reinhold,
New York.
Wasserman, P. (1993). Advanced methods in neural computing, Van Nostrand Reinhold,
ISBN 0-442-00461-3.
Weeks, W.D. (1991). Design floods for small rural catchments in Queensland, Civil
Engineering Transactions, IEAust, 33(4), 249-260.
Wu, C.L. and Chau, K.W. (2006). A flood forecasting neural network model with genetic
algorithm. International Journal of Environment and Pollution, 28(3-4), 261.
Zaman, M., Rahman, A. and Haddad, K. (2012). Regional flood frequency analysis in arid
regions: A case study for Australia. Journal of Hydrology, 475, 74-83.
Zhang, B. and Govindaraju, R.S. (2003). Geomorphology-based artificial neural networks for
estimation of direct runoff over watersheds. Journal of Hydrology, 273 (1), 18–34.
Zhang, Z. and Hall, D.B. (2004). Marginal models for zero inflated clustered data. Statistical
Modelling, 4, 161–180.
Zrinji, Z. and Burn, D.H. (1994). Flood frequency analysis for ungauged sites using a region
of influence approach. Journal of Hydrology, 153(1-4), 1-21.
APPENDICES
APPENDIX A
List of selected study catchments
Appendix A List of selected catchments
Table A1 Selected catchments from New South Wales
Station ID  Station Name  River Name  Lat (°S)  Long (°E)  Area (km²)  Record Length (years)  Period of Record
201001 Eungella Oxley -28.36 153.29 213 49 1958 - 2006
203002 Repentance Coopers Ck -28.64 153.41 62 30 1977 - 2006
203012 Binna Burra Byron Ck -28.71 153.50 39 29 1978 - 2006
203030 Rappville Myrtle Ck -29.11 153.00 332 27 1980 - 2006
204025 Karangi Orara -30.26 153.03 135 37 1970 - 2006
204026 Bobo Nursery Bobo -30.25 152.85 80 29 1956 - 1984
204030 Aberfoyle Aberfoyle -30.26 152.01 200 29 1978 - 2006
204036 Sandy Hill(below Snake Cre Cataract Ck -28.93 152.22 236 54 1953 - 2006
204037 Clouds Ck Clouds Ck -30.09 152.63 62 35 1972 - 2006
204056 Gibraltar Range Dandahra Ck -29.49 152.45 104 31 1976 - 2006
204906 Glenreagh Orara -30.07 152.99 446 34 1973 - 2006
206009 Tia Tia -31.19 151.83 261 53 1955 - 2007
206025 near Dangar Falls Salisbury Waters -30.68 151.71 594 34 1973 - 2006
206026 Newholme Sandy Ck -30.42 151.66 8 33 1975 - 2007
207006 Birdwood(Filly Flat) Forbes -31.39 152.33 363 32 1976 - 2007
208001 Bobs Crossing Barrington -32.03 151.47 20 52 1955 - 2006
209001 Monkerai Karuah -32.24 151.82 203 34 1946 - 1979
209002 Crossing Mammy Johnsons -32.25 151.98 156 31 1976 - 2006
209003 Booral Karuah -32.48 151.95 974 38 1969 - 2006
209006 Willina Wang Wauk -32.16 152.26 150 36 1970 - 2005
209018 Dam Site Karuah -32.28 151.90 300 27 1980 - 2006
210011 Tillegra Williams -32.32 151.69 194 75 1932 - 2006
210014 Rouchel Brook (The Vale) Rouchel Brook -32.15 151.05 395 42 1960 - 2001
210017 Moonan Brook Moonan Brook -31.94 151.28 103 67 1941 - 2007
210022 Halton Allyn -32.31 151.51 205 65 1941 - 2005
210040 Wybong Wybong Ck -32.27 150.64 676 50 1956 - 2005
210042 Ravensworth Foy Brook -32.40 151.05 170 30 1967 - 1996
210044 Middle Falbrook(Fal Dam Si Glennies Ck -32.45 151.15 466 51 1957 - 2007
210068 Pokolbin Site 3 Pokolbin Ck -32.80 151.33 25 41 1965 - 2005
210076 Liddell Antiene Ck -32.34 150.98 13 37 1969 - 2005
210079 Gostwyck Paterson -32.55 151.59 956 33 1975 - 2007
210080 U/S Glendon Brook West Brook -32.47 151.28 80 31 1977 - 2007
211009 Gracemere Wyong -33.27 151.36 236 35 1973 - 2007
211013 U/S Weir Ourimbah Ck -33.35 151.34 83 30 1977 - 2006
212008 Bathurst Rd Coxs -33.43 150.08 199 55 1952 - 2006
212018 Glen Davis Capertee -33.12 150.28 1010 35 1972 - 2006
212040 Pomeroy Kialla Ck -34.61 149.54 96 27 1980 - 2004
213005 Briens Rd Toongabbie Ck -33.80 150.98 70 27 1980 - 2006
215004 Hockeys Corang -35.15 150.03 166 75 1930 - 2004
218002 Belowra Tuross -36.20 149.71 556 29 1955 - 1983
218005 D/S Wadbilliga R Junct Tuross -36.20 149.76 900 42 1965 - 2006
218007 Wadbilliga Wadbilliga -36.26 149.69 122 33 1975 - 2005
219003 Morans Crossing Bemboka -36.67 149.65 316 64 1944 - 2007
219017 near Brogo Double Ck -36.60 149.81 152 41 1967 - 2007
219022 Candelo Dam Site Tantawangalo Ck -36.73 149.68 202 36 1972 - 2007
219025 Angledale Brogo -36.62 149.88 717 30 1977 - 2006
220001 New Buildings Br Towamba -36.96 149.56 272 26 1955 - 1980
220003 Lochiel Pambula -36.94 149.82 105 41 1967 - 2005
220004 Towamba Towamba -37.07 149.66 745 37 1971 - 2007
221002 Princes HWY Wallagaraugh -37.37 149.71 479 36 1972 - 2007
222004 Wellesley (Rowes) Little Plains -37.00 149.09 604 65 1942 - 2006
222007 Woolway Wullwye Ck -36.42 148.91 520 57 1950 - 2006
222009 The Falls Bombala -36.92 149.21 559 43 1952 - 1994
222015 Jacobs Ladder Jacobs -36.73 148.43 187 27 1976 - 2002
222016 The Barry Way Pinch -36.79 148.40 155 31 1976 - 2006
222017 The Hut Maclaughlin -36.66 149.11 313 28 1979 - 2006
401009 Maragle Maragle Ck -35.93 148.10 220 56 1950 - 2005
401013 Jingellic Jingellic Ck -35.90 147.69 378 33 1973 - 2005
401015 Yambla Bowna Ck -35.92 146.98 316 31 1975 - 2005
410038 Darbalara Adjungbilly Ck -35.02 148.25 411 37 1969 - 2005
410048 Ladysmith Kyeamba Ck -35.20 147.51 530 48 1939 - 1986
410057 Lacmalac Goobarragandra -35.33 148.35 673 49 1958 - 2006
410061 Batlow Rd Adelong Ck -35.33 148.07 155 60 1948 - 2007
410062 Numeralla School Numeralla -36.18 149.35 673 43 1965 - 2007
410076 Jerangle Rd Strike-A-Light C -35.92 149.24 212 31 1975 - 2005
410088 Brindabella (No.2&No.3-Cab Goodradigbee -35.42 148.73 427 38 1968 - 2005
410112 Jindalee Jindalee Ck -34.58 148.09 14 30 1976 - 2005
410114 Wyangle Killimcat Ck -35.24 148.31 23 30 1977 - 2006
411001 Bungendore Mill Post Ck -35.28 149.39 16 25 1960 - 1984
411003 Butmaroo Butmaroo Ck -35.26 149.54 65 28 1979 - 2006
412050 Narrawa North Crookwell -34.31 149.17 740 34 1970 - 2003
412063 Gunning Lachlan -34.74 149.29 570 39 1961 - 1999
412081 near Neville Rocky Br Ck -33.80 149.19 145 33 1969 - 2001
412083 Tuena Tuena Ck -34.02 149.33 321 33 1969 - 2001
416003 Clifton Tenterfield Ck -29.03 151.72 570 28 1979 - 2006
416008 Haystack Beardy -29.22 151.38 866 35 1972 - 2006
416016 Inverell (Middle Ck) Macintyre -29.79 151.13 726 35 1972 - 2006
416020 Coolatai Ottleys Ck -29.23 150.76 402 28 1979 - 2006
416023 Bolivia Deepwater -29.29 151.92 505 28 1979 - 2006
418005 Kimberley Copes Ck -29.92 151.11 259 35 1972 - 2006
418014 Yarrowyck Gwydir -30.47 151.36 855 37 1971 - 2007
418017 Molroy Myall Ck -29.80 150.58 842 29 1979 - 2007
418021 Laura Laura Ck -30.23 151.19 311 29 1978 - 2006
418025 Bingara Halls Ck -29.94 150.57 156 28 1980 - 2007
418027 Horton Dam Site Horton -30.21 150.43 220 36 1972 - 2007
418034 Black Mountain Boorolong Ck -30.30 151.64 14 29 1976 - 2004
419010 Woolbrook Macdonald -30.97 151.35 829 28 1980 - 2007
419016 Mulla Crossing Cockburn -31.06 151.13 907 33 1974 - 2006
419029 Ukolan Halls Ck -30.71 150.83 389 27 1979 - 2005
419051 Avoca East Maules Ck -30.50 150.08 454 31 1977 - 2007
419053 Black Springs Manilla -30.42 150.65 791 33 1975 - 2007
419054 Limbri Swamp Oak Ck -31.04 151.17 391 33 1975 - 2007
420003 Warkton (Blackburns) Belar Ck -31.39 149.20 133 30 1976 - 2005
421026 Sofala Turon -33.08 149.69 883 34 1974 - 2007
421036 below Dam Site Duckmaloi -33.75 149.94 112 25 1956 - 1980
421050 Molong Bell -33.03 148.95 365 33 1975 - 2007
Table A2 Selected catchments from Victoria
Station ID  Station Name  River Name  Lat (°S)  Long (°E)  Area (km²)  Record Length (years)  Period of Record
221207 Errinundra Errinundra -37.45 148.91 158 35 1971 - 2005
221209 Weeragua Cann(East Branch -37.37 149.20 154 33 1973 - 2005
221210 The Gorge Genoa -37.43 149.53 837 34 1972 - 2005
221211 Combienbar Combienbar -37.44 148.98 179 32 1974 - 2005
221212 Princes HWY Bemm -37.61 148.90 725 31 1975 - 2005
222202 Sardine Ck Brodribb -37.51 148.55 650 41 1965 - 2005
222206 Buchan Buchan -37.50 148.18 822 32 1974 - 2005
222210 Deddick (Caseys) Deddick -37.09 148.43 857 35 1970 - 2005
222213 Suggan Buggan Suggan Buggan -36.95 148.33 357 35 1971 - 2005
222217 Jacksons Crossing Rodger -37.41 148.36 447 30 1976 - 2005
223202 Swifts Ck Tambo -37.26 147.72 943 32 1974 - 2005
223204 Deptford Nicholson -37.60 147.70 287 32 1974 - 2005
224213 Lower Dargo Rd Dargo -37.50 147.27 676 33 1973 - 2005
224214 Tabberabbera Wentworth -37.50 147.39 443 32 1974 - 2005
225213 Beardmore Aberfeldy -37.85 146.43 311 33 1973 - 2005
225218 Briagalong Freestone Ck -37.81 147.09 309 35 1971 - 2005
225219 Glencairn Macalister -37.52 146.57 570 39 1967 - 2005
225223 Gillio Rd Valencia Ck -37.73 146.98 195 35 1971 - 2005
225224 The Channel Avon -37.80 146.88 554 34 1972 - 2005
226204 Willow Grove Latrobe -38.09 146.16 580 35 1971 - 2005
226205 Noojee Latrobe -37.91 146.02 290 46 1960 - 2005
226209 Darnum Moe -38.21 146.00 214 34 1972 - 2005
226217 Hawthorn Br Latrobe -37.98 146.08 440 34 1955 - 1988
226218 Thorpdale Narracan Ck -38.27 146.19 66 35 1971 - 2005
226222 Near Noojee (U/S Ada R Jun Latrobe -37.88 145.89 62 35 1971 - 2005
226226 Tanjil Junction Tanjil -38.01 146.20 289 46 1960 - 2005
226402 Trafalgar East Moe Drain -38.18 146.21 622 31 1975 - 2005
227200 Yarram Tarra -38.46 146.69 25 41 1965 - 2005
227205 Calignee South Merriman Ck -38.36 146.65 36 31 1975 - 2005
227210 Carrajung Lower Bruthen Ck -38.40 146.74 18 33 1973 - 2005
227211 Toora Agnes -38.64 146.37 67 32 1974 - 2005
227213 Jack Jack -38.53 146.53 34 36 1970 - 2005
227219 Loch Bass -38.38 145.56 52 32 1973 - 2004
227225 Fischers Tarra -38.47 146.56 16 33 1973 - 2005
227226 Dumbalk North Tarwineast Branc -38.50 146.16 127 36 1970 - 2005
227231 Glen Forbes South Bass -38.47 145.51 233 32 1974 - 2005
227236 D/S Foster Ck Jun Powlett -38.56 145.71 228 27 1979 - 2005
228212 Tonimbuk Bunyip -38.03 145.76 174 30 1975 - 2004
228217 Pakenham Toomuc Ck -38.07 145.46 41 29 1974 - 2002
229218 Watsons Ck Watsons Ck -37.67 145.26 36 26 1974 - 1999
230202 Sunbury Jackson Ck -37.58 144.74 337 31 1975 - 2005
230204 Riddells Ck Riddells Ck -37.47 144.67 79 32 1974 - 2005
230205 Bulla (D/S of Emu Ck Jun) Deep Ck -37.63 144.80 865 32 1974 - 2005
230211 Clarkefield Emu Ck -37.47 144.75 93 31 1975 - 2005
231200 Bacchus Marsh Werribee Ck -37.68 144.43 363 28 1978 - 2005
231213 Sardine Ck- O'Brien Cro Lerderderg Ck -37.50 144.36 153 47 1959 - 2005
231225 Ballan (U/S Old Western H) Werribee Ck -37.60 144.25 71 33 1973 - 2005
231231 Melton South Toolern Ck -37.91 144.58 95 27 1979 - 2005
232200 Little Little Ck -37.96 144.48 417 32 1974 - 2005
232210 Lal Lal Mooraboolwest Br -37.65 144.04 83 33 1973 - 2005
232213 U/S of Bungal Dam Lal Lal Ck -37.66 144.03 157 29 1977 - 2005
233211 Ricketts Marsh Birregurra Ck -38.30 143.84 245 31 1975 - 2005
233214 Forrest (above Tunnel) Barwoneast Branc -38.53 143.73 17 28 1978 - 2005
234200 Pitfield Woady Yaloak -37.81 143.59 324 34 1972 - 2005
235202 Upper Gellibrand Gellibrand -37.56 143.64 53 31 1975 - 2005
235203 Curdie Curdies -38.45 142.96 790 31 1975 - 2005
235204 Beech Forest Little Aire Ck -38.66 143.53 11 30 1976 - 2005
235205 Wyelangta Arkins Ck West B -38.65 143.44 3 28 1978 - 2005
235227 Bunkers Hill Gellibrand -38.53 143.48 311 32 1974 - 2005
235233 Apollo Bay- Paradise Barhameast Branc -38.76 143.62 43 29 1977 - 2005
235234 Gellibrand Love Ck -38.49 143.57 75 27 1979 - 2005
236205 Woodford Merri -38.32 142.48 899 32 1974 - 2005
236212 Cudgee Brucknell Ck -38.35 142.65 570 31 1975 - 2005
237207 Heathmere Surry -38.25 141.66 310 31 1975 - 2005
238207 Jimmy Ck Wannon -37.37 142.50 40 32 1974 - 2005
238219 Morgiana Grange Burn -37.71 141.83 997 33 1973 - 2005
401208 Berringama Cudgewa Ck -36.21 147.68 350 41 1965 - 2005
401209 Omeo Livingstone Ck -37.11 147.57 243 27 1968 - 1994
401210 below Granite Flat Snowy Ck -36.57 147.41 407 38 1968 - 2005
401212 Upper Nariel Nariel Ck -36.45 147.83 252 52 1954 - 2005
401215 Uplands Morass Ck -36.87 147.70 471 35 1971 - 2005
401216 Jokers Ck Big -36.95 141.47 356 52 1952 - 2005
401217 Gibbo Park Gibbo -36.75 147.71 389 35 1971 - 2005
401220 McCallums Tallangatta Ck -36.21 147.50 464 30 1976 - 2005
402203 Mongans Br Kiewa -36.60 147.10 552 36 1970 - 2005
402204 Osbornes Flat Yackandandah Ck -36.31 146.90 255 39 1967 - 2005
402206 Running Ck Running Ck -36.54 147.05 126 31 1975 - 2005
402217 Myrtleford Rd Br Flaggy Ck -36.39 146.88 24 36 1970 - 2005
403205 Bright Ovens -36.73 146.95 495 35 1971 - 2005
403209 Wangaratta North Reedy Ck -36.33 146.34 368 33 1973 - 2005
403213 Greta South Fifteen Mile Ck -36.62 146.24 229 33 1973 - 2005
403221 Woolshed Reedy Ck -36.31 146.60 214 30 1975 - 2004
403222 Abbeyard Buffalo -36.91 146.70 425 33 1973 - 2005
403224 Bobinawarrah Hurdle Ck -36.52 146.45 158 31 1975 - 2005
403226 Angleside Boggy Ck -36.61 146.36 108 32 1974 - 2005
403227 Cheshunt King -36.83 146.40 453 33 1973 - 2005
403233 Harris Lane Buckland -36.72 146.88 435 34 1972 - 2005
404206 Moorngag Broken -36.80 146.02 497 33 1973 - 2005
404207 Kelfeera Holland Ck -36.61 146.06 451 31 1975 - 2005
405205 Murrindindi above Colwells Murrindindi -37.41 145.56 108 31 1975 - 2005
405209 Taggerty Acheron -37.32 145.71 619 33 1973 - 2005
405212 Tallarook Sunday Ck -37.10 145.05 337 31 1975 - 2005
405214 Tonga Br Delatite -37.15 146.13 368 49 1957 - 2005
405215 Glen Esk Howqua -37.23 146.21 368 32 1974 - 2005
405217 Devlins Br Yea -37.38 145.48 360 31 1975 - 2005
405218 Gerrang Br Jamieson -37.29 146.19 368 47 1959 - 2005
405219 Dohertys Goulburn -37.33 146.13 694 39 1967 - 2005
405226 Moorilim Pranjip Ck -36.62 145.31 787 32 1974 - 2005
405227 Jamieson Big Ck -37.37 146.06 619 36 1970 - 2005
405229 Wanalta Wanalta Ck -36.64 144.87 108 36 1969 - 2005
405230 Colbinabbin Cornella Ck -36.61 144.80 259 33 1973 - 2005
405231 Flowerdale King Parrot Ck -37.35 145.29 181 32 1974 - 2005
405237 Euroa Township Seven Creeks -36.76 145.58 332 33 1973 - 2005
405240 Ash Br Sugarloaf Ck -37.06 145.05 609 33 1973 - 2005
405241 Rubicon Rubicon -37.29 145.83 129 33 1973 - 2005
405245 Mansfield Ford Ck -37.04 146.05 115 36 1970 - 2005
405248 Graytown Major Ck -36.86 144.91 282 35 1971 - 2005
405251 Ancona Brankeet Ck -36.97 145.78 121 33 1973 - 2005
405263 U/S of Snake Ck Jun Goulburn -37.46 146.25 327 31 1975 - 2005
405264 D/S of Frenchman Ck Jun Big -37.52 146.08 333 31 1975 - 2005
405274 Yarck Home Ck -37.11 145.60 187 29 1977 - 2005
406213 Redesdale Campaspe -37.02 144.54 629 30 1975 - 2004
406214 Longlea Axe Ck -36.78 144.43 234 34 1972 - 2005
406215 Lyal Coliban -36.96 144.49 717 32 1974 - 2005
406216 Sedgewick Axe Ck -36.90 144.36 34 26 1975 - 2005
406224 Runnymede Mount Pleasant C -36.55 144.64 248 30 1975 - 2004
406226 Derrinal Mount Ida Ck -36.88 144.65 174 28 1978 - 2005
407214 Clunes Creswick Ck -37.30 143.79 308 31 1975 - 2005
407217 Vaughan at D/S Fryers Ck Loddon -37.16 144.21 299 38 1968 - 2005
407220 Norwood Bet Bet Ck -37.00 143.64 347 33 1973 - 2005
407221 Yandoit Jim Crow Ck -37.21 144.10 166 33 1973 - 2005
407222 Clunes Tullaroop Ck -37.23 143.83 632 33 1973 - 2005
407230 Strathlea Joyces Ck -37.17 143.96 153 33 1973 - 2005
407246 Marong Bullock Ck -36.73 144.13 184 33 1973 - 2005
407253 Minto Piccaninny Ck -36.45 144.47 668 33 1973 - 2005
415207 Eversley Wimmera -37.19 143.19 304 31 1975 - 2005
415217 Grampians Rd Br Fyans Ck -37.26 142.53 34 33 1973 - 2005
415220 Wimmera HWY Avon -36.64 142.98 596 32 1974 - 2005
415226 Carrs Plains Richardson -36.75 142.79 130 31 1971 - 2001
415237 Stawell Concongella Ck -37.02 142.82 239 29 1977 - 2005
415238 Navarre Wattle Ck -36.90 143.10 141 30 1976 - 2005
Table A3 Selected catchments from Tasmania
Station ID  Station Name  River Name  Lat (°S)  Long (°E)  Area (km²)  Record Length (years)  Period of Record
76 at Ballroom Offtake North Esk -41.50 147.39 335.0 74 1923 - 1996
159 D/S Rapid Arthur -41.12 145.08 1600.0 42 1955 - 1996
473 D/S Crossing Rv Davey -43.14 145.95 680.0 34 1964 - 1997
499 at Newbury Tyenna -42.71 146.71 198.0 33 1965 - 1997
852 at Strathbridge Meander -41.49 146.91 1025.0 24 1985 - 2008
1012 3.5 Km U/S Esperance Peak Rivulet -43.32 146.90 35.0 23 1975 - 1997
1200 at Whitemark Water Supply South Pats -40.09 148.02 21.0 22 1969 - 1990
2200 at The Grange Swan -42.05 148.07 440.0 33 1964 - 1996
2204 U/S Coles Bay Rd Bdg Apsley -41.94 148.24 157.0 24 1969 - 1992
2206 U/S Scamander Water Supply Scamander -41.45 148.18 265.0 28 1969 - 1996
2207 3 Km U/S Tasman Hwy Little Swanport -42.34 147.90 600.0 19 1971 - 1989
2208 at Swansea Meredith -42.12 148.04 88.0 27 1970 - 1996
2209 Tidal Limit Carlton -42.87 147.70 136.0 28 1969 - 1996
2211 U/S Brinktop Rd Orielton Rivulet -42.76 147.54 46.0 24 1973 - 1996
2213 D/S McNeils Rd Goatrock Ck -42.14 147.92 1.3 22 1975 - 1996
3203 at Baden Coal -42.43 147.45 55.0 26 1971 - 1996
4201 at Mauriceton Jordan -42.53 147.12 730.0 36 1966 - 2001
5200 at Summerleas Rd Br Browns -42.96 147.27 15.0 30 1963 - 1992
6200 D/S Grundys Ck Mountain -42.94 147.13 42.0 29 1968 - 1996
7200 Dover Ws Intake Esperance -43.34 146.96 174.0 29 1965 - 1993
14206 1.5 Km U/S of Mouth Sulphur Ck -41.11 146.03 23.0 29 1964 - 1992
14207 at Bannons Br Leven -41.25 146.09 495.0 35 1963 - 1997
14210 U/S Flowerdale R Juncti Inglis -41.00 145.63 170.0 21 1968 - 1988
14215 at Moorleah Flowerdale -40.97 145.61 150.0 31 1966 - 1996
14217 at Sprent Claytons Rivulet -41.26 146.17 13.5 26 1970 - 1995
14220 U/S Bass HWY Seabrook Ck -41.01 145.77 40.0 20 1977 - 1996
16200 U/S Old Bass Hwy Don -41.19 146.31 130.0 24 1967 - 1990
17200 at Tidal Limit Rubicon -41.26 146.57 255.0 31 1967 - 1997
17201 1.5KM U/S Tidal Limit Franklin Rivulet -41.26 146.61 131.0 20 1975 - 1994
18201 0.5 Km U/S Tamar Supply -41.26 146.94 135.0 19 1965 - 1983
18221 D/S Jackeys Marsh Jackeys Ck -41.68 146.66 29.0 27 1982 - 2008
18312 D/S Elizabeth R Junctio Macquarie -41.91 147.39 1900.0 19 1989 - 2007
19200 2.6KM U/S Tidal Limit Brid -41.02 147.37 134.0 32 1965 - 1996
19201 2KM U/S Forester Rd Bdg Great Forester -41.11 147.61 195.0 27 1970 - 1996
19204 D/S Yarrow Ck Pipers -41.07 147.11 292.0 25 1972 - 1996
304040 U/S Derwent Junction Florentine River -42.44 146.52 435.8 58 1951 - 2008
304125 Below Lagoon Travellers Rest River -42.07 146.25 43.6 25 1949 - 1973
304597 At Lake Highway Pine Tree Rivulet Ck -41.80 146.68 19.4 40 1969 - 2008
308145 At Mount Ficham Track Franklin River -42.24 145.77 757.0 56 1953 - 2008
308183 Below Jane River Franklin River -42.47 145.76 1590.3 22 1957 - 1978
308225 Below Darwin Dam Andrew River -42.22 145.62 5.3 21 1988 - 2008
308446 Below Huntley Gordon River -42.66 146.37 458.0 27 1953 - 1979
308799 B/L Alma Collingwood Ck -42.16 145.93 292.5 28 1981 - 2008
308819 Above Kelly Basin Rd Andrew River -42.22 145.62 4.6 26 1983 - 2008
310061 At Murchison Highway Que River -41.58 145.68 18.4 22 1987 - 2008
310148 Above Sterling Murchison River -41.76 145.62 756.3 28 1955 - 1982
310149 Below Sophia River Mackintosh River -41.72 145.63 523.2 27 1954 - 1980
310472 Below Bulgobac Creek Que River -41.62 145.58 119.1 32 1964 - 1995
315074 At Moina Wilmot River -41.47 146.07 158.1 46 1923 - 1968
315450 U/S Lemonthyme Forth River -41.61 146.13 311.0 46 1963 - 2008
316624 Above Mersey Arm River -41.69 146.21 86.0 37 1972 - 2008
318065 Below Deloraine Meander River -41.53 146.66 474.0 28 1969 - 1996
318350 Above Rocky Creek Whyte River -41.63 145.19 310.8 33 1960 - 1992
Table A4 Selected catchments from Queensland
Station ID  Station Name  River Name  Lat (°S)  Long (°E)  Area (km²)  Record Length (years)  Period of Record
102101A Fall Ck Pascoe -12.88 142.98 651 33 1968 - 2005
104001A Telegraph Rd Stewart -14.17 143.39 470 32 1970 - 2005
105105A Developmental Rd East Normanby -15.77 145.01 297 34 1970 - 2005
107001B Flaggy Endeavour -15.42 145.07 337 43 1959 - 2004
108002A Bairds Daintree -16.18 145.28 911 29 1969 - 2000
108003A China Camp Bloomfield -15.99 145.29 264 32 1971 - 2004
110003A Picnic Crossing Barron -17.26 145.54 228 80 1926 - 2005
110011B Recorder Flaggy Ck -16.78 145.53 150 44 1956 - 2003
110101B Freshwater Freshwater Ck -16.94 145.70 70 37 1922 - 1958
111001A Gordonvale Mulgrave -17.10 145.79 552 43 1917 - 1972
111003C Aloomba Behana Ck -17.13 145.84 86 28 1943 - 1970
111005A The Fisheries Mulgrave -17.19 145.72 357 34 1967 - 2004
111007A Peets Br Mulgrave -17.14 145.76 520 31 1973 - 2004
111105A The Boulders Babinda Ck -17.35 145.87 39 29 1967 - 2003
112001A Goondi North Johnstone -17.53 145.97 936 39 1929 - 1967
112002A Nerada Fisher Ck -17.57 145.91 15.7 75 1929 - 2004
112003A Glen Allyn North Johnstone -17.38 145.65 165 46 1959 - 2004
112004A Tung Oil North Johnstone -17.55 145.93 925 31 1967 - 2004
112101B U/S Central Mill South Johnstone -17.61 145.98 400 81 1917 - 2003
113004A Powerline Cochable Ck -17.75 145.63 95 32 1967 - 2001
114001A Upper Murray Murray -18.11 145.80 156 31 1971 - 2003
116005B Peacocks Siding Stone -18.69 145.98 368 36 1936 - 1971
116008B Abergowrie Gowrie Ck -18.45 145.85 124 51 1954 - 2004
116010A Blencoe Falls Blencoe Ck -18.20 145.54 226 40 1961 - 2000
116011A Ravenshoe Millstream -17.60 145.48 89 42 1963 - 2004
116012A 8.7KM Cameron Ck -18.07 145.34 360 41 1962 - 2002
116013A Archer Ck Millstream -17.65 145.34 308 42 1962 - 2003
116014A Silver Valley Wild -17.63 145.30 591 44 1962 - 2005
116015A Wooroora Blunder Ck -17.74 145.44 127 38 1967 - 2004
116017A Running Ck Stone -18.77 145.95 157 33 1971 - 2004
117002A Bruce HWY Black -19.24 146.63 256 31 1974 - 2004
117003A Bluewater Bluewater Ck -19.18 146.55 86 30 1974 - 2003
118101A Gleesons Weir Ross -19.32 146.74 797 44 1916 - 1959
118106A Allendale Alligator Ck -19.39 146.96 69 30 1975 - 2004
119006A Damsite Major Ck -19.67 147.02 468 25 1979 - 2003
120014A Oak Meadows Broughton -20.18 146.32 182 28 1971 - 1998
120102A Keelbottom Keelbottom Ck -19.37 146.36 193 38 1968 - 2005
120120A Mt. Bradley Running -19.13 145.91 490 30 1976 - 2005
120204B Crediton Recorder Broken -21.17 148.51 41 31 1957 - 1987
120206A Mt Jimmy Pelican Ck -20.60 147.69 545 27 1961 - 1987
120216A Old Racecourse Broken -21.19 148.45 100 36 1970 - 2005
120307A Pentland Cape -20.48 145.47 775 34 1970 - 2003
121001A Ida Ck Don -20.29 148.12 604 48 1958 - 2005
121002A Guthalungra Elliot -19.94 147.84 273 32 1974 - 2005
122004A Lower Gregory Gregory -20.30 148.55 47 33 1973 - 2005
124001A Caping Siding O'Connell -20.63 148.57 363 35 1970 - 2004
124002A Calen St Helens Ck -20.91 148.76 118 32 1974 - 2005
124003A Jochheims Andromache -20.58 148.47 230 29 1977 - 2005
125002C Sarich's Pioneer -21.27 148.82 757 43 1961 - 2005
125004B Gargett Cattle Ck -21.18 148.74 326 38 1968 - 2005
125005A Whitefords Blacks Ck -21.33 148.83 506 32 1974 - 2005
125006A Dam Site Finch Hatton Ck -21.11 148.63 35 29 1977 - 2005
126003A Carmila Carmila Ck -21.92 149.40 84 31 1974 - 2004
129001A Byfield Waterpark Ck -22.84 150.67 212 48 1953 - 2005
130004A Old Stn Raglan Ck -23.82 150.82 389 41 1964 - 2004
130108B Curragh Blackwater Ck -23.50 148.88 776 31 1973 - 2005
130207A Clermont Sandy Ck -22.80 147.58 409 40 1966 - 2005
130208A Ellendale Theresa Ck -22.98 147.58 758 37 1965 - 2001
130215A Lilyvale Lagoon Crinum Ck -23.21 148.34 252 29 1977 - 2005
130319A Craiglands Bell Ck -24.15 150.52 300 44 1961 - 2004
130321A Mt. Kroombit Kroombit Ck -24.41 150.72 373 41 1964 - 2004
130334A Pump Stn South Kariboe Ck -24.56 150.75 284 33 1973 - 2005
130335A Wura Dee -23.77 150.36 472 34 1972 - 2005
130336A Folding Hills Grevillea Ck -24.58 150.62 233 33 1973 - 2005
130348A Red Hill Prospect Ck -24.45 150.42 369 30 1976 - 2005
130349A Kingsborough Don -23.97 150.39 593 28 1977 - 2005
130413A Braeside Denison Ck -21.77 148.79 757 34 1972 - 2005
133003A Marlua Diglum Ck -24.19 151.16 203 36 1969 - 2004
135002A Springfield Kolan -24.75 151.59 551 40 1966 - 2005
135004A Dam Site Gin Gin Ck -24.97 151.89 531 40 1966 - 2005
136006A Dam Site Reid Ck -25.27 151.52 219 40 1966 - 2005
136102A Meldale Three Moon Ck -24.69 150.96 310 32 1949 - 1980
136107A Cania Gorge Three Moon Ck -24.73 151.01 370 26 1963 - 1988
136108A Upper Monal Monal Ck -24.61 151.11 92 43 1963 - 2005
136111A Dakiel Splinter Ck -24.75 151.26 139 41 1965 - 2005
136112A Yarrol Burnett -24.99 151.35 370 40 1966 - 2005
136202D Litzows Barambah Ck -26.30 152.04 681 85 1921 - 2005
136203A Brooklands Barker Ck -26.74 151.82 249 64 1941 - 2005
136301B Weens Br Stuart -26.50 151.77 512 66 1936 - 2005
137001B Elliott Elliott -24.99 152.37 220 52 1949 - 2004
137003A Dr Mays Crossing Elliott -24.97 152.42 251 30 1975 - 2004
137101A Burrum HWY Gregory -25.09 152.24 454 36 1967 - 2004
137201A Bruce HWY Isis -25.27 152.37 446 38 1967 - 2004
138002C Brooyar Wide Bay Ck -26.01 152.41 655 94 1910 - 2005
138003D Glastonbury Glastonbury Ck -26.22 152.52 113 81 1921 - 2006
138009A Tagigan Rd Tinana Ck -26.08 152.78 100 31 1975 - 2005
138010A Kilkivan Wide Bay Ck -26.08 152.22 322 97 1910 - 2006
138101B Kenilworth Mary -26.60 152.73 720 52 1921 - 1972
138102C Zachariah Amamoor Ck -26.37 152.62 133 83 1921 - 2005
138103A Knockdomny Kandanga Ck -26.40 152.64 142 34 1921 - 1954
138104A Kidaman Obi Obi Ck -26.63 152.77 174 42 1921 - 1963
138106A Baroon Pocket Obi Obi Ck -26.71 152.86 67 39 1941 - 1986
138107B Cooran Six Mile Ck -26.33 152.81 186 58 1948 - 2005
138110A Bellbird Ck Mary -26.63 152.70 486 45 1960 - 2004
138111A Moy Pocket Mary -26.53 152.74 820 39 1964 - 2004
138113A Hygait Kandanga Ck -26.39 152.64 143 34 1972 - 2005
140002A Coops Corner Teewah Ck -26.06 153.04 53 27 1975 - 2005
141001B Kiamba South Maroochy -26.59 152.90 33 65 1938 - 2004
141003C Warana Br Petrie Ck -26.62 152.96 38 41 1959 - 2004
141004B Yandina South Maroochy -26.56 152.94 75 27 1959 - 2004
141006A Mooloolah Mooloolah -26.76 152.98 39 33 1972 - 2004
142001A Upper Caboolture Caboolture -27.10 152.89 94 40 1966 - 2005
142201D Cashs Crossing South Pine -27.34 152.96 178 46 1918 - 1963
142202A Drapers Crossing South Pine -27.35 152.92 156 39 1966 - 2005
143010B Boat Mountain Emu Ck -26.98 152.29 915 31 1967 - 2005
143015B Damsite Cooyar Ck -26.74 152.14 963 35 1969 - 2005
143101A Mutdapily Warrill Ck -27.75 152.69 771 39 1915 - 1953
143102B Kalbar No.2 Warrill Ck -27.92 152.60 468 55 1913 - 1970
143103A Moogerah Reynolds Ck -28.04 152.55 190 36 1918 - 1953
143107A Walloon Bremer -27.60 152.69 622 36 1962 - 1999
143108A Amberley Warrill Ck -27.67 152.70 914 36 1962 - 2004
143110A Adams Br Bremer -27.83 152.51 125 29 1972 - 2004
143113A Loamside Purga Ck -27.68 152.73 215 28 1974 - 2004
143203C Helidon Number 3 Lockyer Ck -27.54 152.11 357 74 1927 - 2004
143208A Dam Site Fifteen Mile Ck -27.46 152.10 87 26 1957 - 1985
143209B Mulgowie2 Laidley Ck -27.73 152.36 167 31 1958 - 2004
143303A Peachester Stanley -26.84 152.84 104 77 1928 - 2005
143307A Causeway Byron Ck -27.13 152.65 79 26 1976 - 2005
145002A Lamington No.1 Christmas Ck -28.24 152.99 95 43 1910 - 1953
145003B Forest Home Logan -28.20 152.77 175 83 1918 - 2005
145005A Avonmore Running Ck -28.30 152.91 89 30 1923 - 1952
145010A 5.8KM Deickmans Br Running Ck -28.25 152.89 128 40 1966 - 2005
145011A Croftby Teviot Brook -28.15 152.57 83 38 1967 - 2005
145012A The Overflow Teviot Brook -27.93 152.86 503 39 1967 - 2005
145018A Up Stream Maroon Dam Burnett Ck -28.22 152.61 82 32 1971 - 2005
145020A Rathdowney Logan -28.22 152.87 533 32 1974 - 2005
145101D Lumeah Number 2 Albert -28.06 153.04 169 43 1911 - 1953
145102B Bromfleet Albert -27.91 153.11 544 85 1919 - 2005
145103A Good Dam Site Cainbable Ck -28.09 153.08 42 32 1963 - 2004
145107A Main Rd Br Canungra Ck -28.00 153.16 101 32 1974 - 2005
146002B Glenhurst Nerang -28.00 153.31 241 85 1920 - 2005
146003B Camberra Number 2 Currumbin Ck -28.20 153.41 24 55 1928 - 1982
146004A Neranwood Little Nerang Ck -28.13 153.29 40 35 1927 - 1961
146005A Chippendale Tallebudgera Ck -28.16 153.40 55 27 1927 - 1953
146010A Army Camp Coomera -28.03 153.19 88 43 1963 - 2005
146012A Nicolls Br Currumbin Ck -28.18 153.42 30 31 1971 - 2005
146014A Beechmont Back Ck -28.12 153.19 7 31 1972 - 2004
146095A Tallebudgera Ck Rd Tallebudgera Ck -28.15 153.40 56 29 1971 - 2004
416303C Clearview Pike Ck -28.81 151.52 950 48 1935 - 1987
416305B Beebo Brush Ck -28.69 150.98 335 36 1969 - 2005
416312A Texas Oaky Ck -28.81 151.15 422 35 1970 - 2004
416404C Terraine Bracker Ck -28.49 151.28 685 45 1953 - 2001
416410A Barongarook Macintyre Brook -28.44 151.46 465 32 1968 - 2001
422210A Tabers Bungil Ck -26.41 148.78 710 32 1967 - 2004
422301A Long Crossing Condamine -28.32 152.34 85 66 1912 - 1977
422302A Killarney Spring Ck -28.35 152.34 21 45 1910 - 1954
422303A Killarney Spring Ck South -28.36 152.34 10 45 1910 - 1954
422304A Elbow Valley Condamine -28.37 152.16 275 56 1916 - 1971
422306A Swanfels Swan Ck -28.16 152.28 83 85 1920 - 2004
422307A Kings Ck Kings Ck -27.90 151.91 334 42 1921 - 1966
422313B Emu Vale Emu Ck -28.23 152.23 148 58 1948 - 2005
422317B Rocky Pond Glengallan Ck -28.13 151.92 520 38 1954 - 1991
422319B Allora Dalrymple Ck -28.04 152.01 246 36 1969 - 2005
422321B Killarney Spring Ck -28.35 152.33 35 45 1960 - 2004
422326A Cranley Gowrie Ck -27.52 151.94 47 34 1970 - 2004
422332B Oakey Gowrie Ck -27.47 151.74 142 25 1969 - 2006
422334A Aides Br Kings Ck -27.93 151.86 516 35 1970 - 2004
422338A Leyburn Canal Ck -28.03 151.59 395 27 1975 - 2004
422341A Brosnans Barn Condamine -28.33 152.31 92 29 1977 - 2005
422394A Elbow Valley Condamine -28.37 152.14 325 32 1973 - 2004
913010A 16 Mile Waterhole Fiery Ck -18.88 139.36 722 29 1973 - 2004
915011A Mt Emu Plains Porcupine Ck -20.18 144.52 540 31 1972 - 2004
915206A Railway Crossing Dugald -20.20 140.22 660 31 1970 - 2004
915211A Landsborough HWY Williams -20.87 140.83 415 31 1971 - 2003
917104A Roseglen Etheridge -18.31 143.58 867 32 1967 - 2005
917107A Mount Surprise Elizabeth Ck -18.13 144.31 651 32 1969 - 2002
919005A Fonthill Rifle Ck -16.68 145.23 366 32 1969 - 2004
919013A Mulligan HWY McLeod -16.50 145.00 532 25 1973 - 2005
919201A Goldfields Palmer -16.11 144.78 533 30 1968 - 2004
919305B Nullinga Walsh -17.18 145.30 326 35 1957 - 1991
922101B Racecourse Coen -13.96 143.17 172 32 1968 - 2004
926002A Dougs Pad Dulhunty -11.83 142.42 332 30 1971 - 2004
APPENDIX B
Additional results on training and validation of RFFA models
Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q2 (training data set)
Figure B.2 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q5 (training data set)
Figure B.3 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q10 (training data set)
Figure B.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q50 (training data set)
Figure B.5 Comparison of observed and predicted flood quantiles for ANN based RFFA model
for Q100 (training data set)
Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q2 (training data set)
Figure B.7 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q5 (training data set)
Figure B.8 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q10 (training data set)
Figure B.9 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q50 (training data set)
Figure B.10 Comparison of observed and predicted flood quantiles for GAANN based RFFA
model for Q100 (training data set)
Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q2 (training data set)
Figure B.12 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q5 (training data set)
Figure B.13 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q10 (training data set)
Figure B.14 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q50 (training data set)
Figure B.15 Comparison of observed and predicted flood quantiles for GEP based RFFA model
for Q100 (training data set)
Figure B.16 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q2 (training data set)
Figure B.17 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q5 (training data set)
Figure B.18 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q10 (training data set)
Figure B.19 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q50 (training data set)
Figure B.20 Comparison of observed and predicted flood quantiles for CANFIS based RFFA
model for Q100 (training data set)
Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based
RFFA model for Q2
Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based
RFFA model for Q5
Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based
RFFA model for Q10
Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based
RFFA model for Q50
Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based
RFFA model for Q100
Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN
based RFFA model for Q2
Figure B.27 Comparison of observed and predicted flood quantiles (validation) for GAANN
based RFFA model for Q5
[Plot: Qpred (m3/s) versus Qobs (m3/s), logarithmic axes from 1 to 10000]
Figure B.28 Comparison of observed and predicted flood quantiles (validation) for GAANN
based RFFA model for Q10
Figure B.29 Comparison of observed and predicted flood quantiles (validation) for GAANN
based RFFA model for Q50
[Plot: Qpred (m3/s) versus Qobs (m3/s), logarithmic axes from 1 to 10000]
Figure B.30 Comparison of observed and predicted flood quantiles (validation) for GAANN
based RFFA model for Q100
Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based
RFFA model for Q2
Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based
RFFA model for Q5
Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based
RFFA model for Q10
Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based
RFFA model for Q50
Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based
RFFA model for Q100
Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS
based RFFA model for Q2
Figure B.37 Comparison of observed and predicted flood quantiles (validation) for CANFIS
based RFFA model for Q5
Figure B.38 Comparison of observed and predicted flood quantiles (validation) for CANFIS
based RFFA model for Q10
Figure B.39 Comparison of observed and predicted flood quantiles (validation) for CANFIS
based RFFA model for Q50
Figure B.40 Comparison of observed and predicted flood quantiles (validation) for CANFIS
based RFFA model for Q100
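Figures B.6 to B.40 compare observed and predicted flood quantiles on logarithmic axes, where a perfect prediction would fall on the 1:1 line. The sketch below is a minimal, hypothetical way to summarise such a scatter numerically, assuming the log10 ratio Qpred/Qobs as the measure and a factor-of-two band as the tolerance; neither choice is taken from the thesis, and `log_agreement` is an illustrative name only.

```python
import math


def log_agreement(q_obs, q_pred, factor=2.0):
    """Summarise scatter about the 1:1 line on log axes (hypothetical helper).

    Returns (bias, rms, within):
      bias   -- mean of log10(Qpred / Qobs); negative means under-prediction
      rms    -- root-mean-square of log10(Qpred / Qobs)
      within -- fraction of sites whose prediction lies within `factor`
                of the observed quantile (assumed tolerance band)
    """
    if not q_obs or len(q_obs) != len(q_pred):
        raise ValueError("q_obs and q_pred must be non-empty and equal in length")
    # Log-ratio of predicted to observed quantile for each site.
    logs = [math.log10(p / o) for o, p in zip(q_obs, q_pred)]
    n = len(logs)
    bias = sum(logs) / n
    rms = math.sqrt(sum(d * d for d in logs) / n)
    bound = math.log10(factor)
    within = sum(1 for d in logs if abs(d) <= bound) / n
    return bias, rms, within


# Illustrative (made-up) quantiles in m3/s for three sites.
print(log_agreement([10.0, 100.0, 1000.0], [10.0, 200.0, 100.0]))
```

The same function can be applied separately to the training and validation sets of each model and return period, so the plotted comparisons can be backed by comparable numbers.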
Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA
model for Q2
Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA
model for Q5
Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA
model for Q10
Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA
model for Q50
Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA
model for Q100
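Figures B.41 to B.45 are regression plots relating predicted to observed quantiles for the training and validation data of the ANN based model. As a hedged illustration only, the sketch below fits an ordinary least-squares line to log10-transformed values and reports the intercept, slope and coefficient of determination; the log transform and the helper name `log_regression` are assumptions, not the procedure used to produce these figures.

```python
import math


def log_regression(q_obs, q_pred):
    """Least-squares fit of log10(Qpred) = a + b * log10(Qobs) (hypothetical helper).

    Returns (a, b, r2): intercept, slope and coefficient of determination.
    A perfect model gives a = 0, b = 1 and r2 = 1.
    """
    if len(q_obs) < 2 or len(q_obs) != len(q_pred):
        raise ValueError("need at least two paired observations")
    x = [math.log10(v) for v in q_obs]
    y = [math.log10(v) for v in q_pred]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Centred sums of squares and cross-products.
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("observed and predicted values must each vary")
    b = sxy / sxx
    a = my - b * mx
    # For simple linear regression with an intercept, R^2 = r^2.
    r2 = sxy * sxy / (sxx * syy)
    return a, b, r2


# Made-up example: predictions consistently twice the observed quantiles.
print(log_regression([1.0, 10.0, 100.0, 1000.0], [2.0, 20.0, 200.0, 2000.0]))
```

Fitting the training and validation subsets separately and comparing their slopes and R^2 values mirrors, in numbers, the side-by-side comparison these figures present.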
Figure B.46 Section of dendrogram using average linkage between groups
Figure B.47 Section of dendrogram using average linkage between groups
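The dendrograms in Figures B.46 and B.47 are built with average linkage between groups, in which the distance between two clusters is the mean of all pairwise distances between their members. The sketch below is a minimal pure-Python illustration of that merge rule, assuming Euclidean distance on hypothetical catchment attribute vectors; the distance measure and any standardisation used in the thesis are not stated here, and `average_linkage` is an illustrative name.

```python
import itertools
import math


def average_linkage(points):
    """Agglomerative clustering with average linkage between groups.

    Cluster distance = mean Euclidean distance over all cross-cluster pairs
    (Euclidean distance is an assumption for this sketch). Returns the merge
    history as (cluster_a, cluster_b, distance) tuples, where each cluster is
    a sorted tuple of point indices. This is the sequence a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the closest pair; ties go to the first pair enumerated.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Two made-up, well-separated groups of catchments in a 2-D attribute space.
for step in average_linkage([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(step)
```

This brute-force version recomputes every cluster distance at each step, which is adequate for illustration; library routines use more efficient update formulas for the same linkage rule.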